You are here: Home > Dive Into Python > Dynamic functions > plural.py, stage 1 | << >> | ||||
Dive Into PythonPython from novice to pro |
So you're looking at words, which at least in English are strings of characters. And you have rules that say you need to find different combinations of characters, and then do different things to them. This sounds like a job for regular expressions.
Example 17.1. plural1.py
import re def plural(noun): if re.search('[sxz]$', noun): return re.sub('$', 'es', noun) elif re.search('[^aeioudgkprt]h$', noun): return re.sub('$', 'es', noun) elif re.search('[^aeiou]y$', noun): return re.sub('y$', 'ies', noun) else: return noun + 's'
OK, this is a regular expression, but it uses a syntax you didn't see in Chapter 7, Regular Expressions. The square brackets mean “match exactly one of these characters”. So [sxz] means “s, or x, or z”, but only one of them. The $ should be familiar; it matches the end of string. So you're checking to see if noun ends with s, x, or z. | |
This re.sub function performs regular expression-based string substitutions. Let's look at it in more detail. |
Example 17.2. Introducing re.sub
>>> import re >>> re.search('[abc]', 'Mark') <_sre.SRE_Match object at 0x001C1FA8> >>> re.sub('[abc]', 'o', 'Mark') 'Mork' >>> re.sub('[abc]', 'o', 'rock') 'rook' >>> re.sub('[abc]', 'o', 'caps') 'oops'
Example 17.3. Back to plural1.py
import re def plural(noun): if re.search('[sxz]$', noun): return re.sub('$', 'es', noun) elif re.search('[^aeioudgkprt]h$', noun): return re.sub('$', 'es', noun) elif re.search('[^aeiou]y$', noun): return re.sub('y$', 'ies', noun) else: return noun + 's'
Example 17.4. More on negation regular expressions
>>> import re >>> re.search('[^aeiou]y$', 'vacancy') <_sre.SRE_Match object at 0x001C1FA8> >>> re.search('[^aeiou]y$', 'boy') >>> >>> re.search('[^aeiou]y$', 'day') >>> >>> re.search('[^aeiou]y$', 'pita') >>>
Example 17.5. More on re.sub
>>> re.sub('y$', 'ies', 'vacancy') 'vacancies' >>> re.sub('y$', 'ies', 'agency') 'agencies' >>> re.sub('([^aeiou])y$', r'\1ies', 'vacancy') 'vacancies'
This regular expression turns vacancy into vacancies and agency into agencies, which is what you wanted. Note that it would also turn boy into boies, but that will never happen in the function because you did that re.search first to find out whether you should do this re.sub. | |
Just in passing, I want to point out that it is possible to combine these two regular expressions (one to find out if the rule applies, and another to actually apply it) into a single regular expression. Here's what that would look like. Most of it should look familiar: you're using a remembered group, which you learned in Section 7.6, “Case study: Parsing Phone Numbers”, to remember the character before the y. Then in the substitution string, you use a new syntax, \1, which means “hey, that first group you remembered? put it here”. In this case, you remember the c before the y, and then when you do the substitution, you substitute c in place of c, and ies in place of y. (If you have more than one remembered group, you can use \2 and \3 and so on.) |
Regular expression substitutions are extremely powerful, and the \1 syntax makes them even more powerful. But combining the entire operation into one regular expression is also much harder to read, and it doesn't directly map to the way you first described the pluralizing rules. You originally laid out rules like “if the word ends in S, X, or Z, then add ES”. And if you look at this function, you have two lines of code that say “if the word ends in S, X, or Z, then add ES”. It doesn't get much more direct than that.
<< Dynamic functions |
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | |
plural.py, stage 2 >> |