As stated earlier, regular expressions use the backslash character ("\") to indicate special forms or to allow special characters to be used without invoking their special meaning. This conflicts with Python's usage of the same character for the same purpose in string literals.
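As a small sketch of this conflict, consider the regular-expression escape \b, which means "word boundary". It is used here as an illustrative example and is not otherwise discussed in this section. In an ordinary Python string literal, Python itself turns \b into a backspace character before the re module ever sees the pattern.

```python
import re

# In an ordinary string literal, Python turns "\b" into a backspace
# character (\x08) before the re module ever sees the pattern.
plain = "\bcat\b"
print(len(plain))                        # 5: backspace, c, a, t, backspace

# Doubling the backslash keeps it in the string, so the re module
# receives \b and treats it as a word-boundary assertion.
escaped = "\\bcat\\b"
print(len(escaped))                      # 7

text = "the cat sat"
print(re.search(plain, text))            # None: text has no backspace characters
print(re.search(escaped, text).group())  # cat
```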
Let's say you want to write a RE that matches the string
"\section", which might be found in a LaTeX file. To figure
out what to write in the program code, start with the desired string
to be matched. Next, you must escape any backslashes and other
metacharacters by preceding them with a backslash, resulting in the
string "\\section". This resulting string is what must be passed
to re.compile(). However, to express it as a Python string
literal, both backslashes must be escaped again:
|Characters|Stage|
|\section|Text string to be matched|
|\\section|Escaped backslash for re.compile()|
|"\\\\section"|Escaped backslashes for a string literal|
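The escaping stages can be checked directly. The following is a minimal sketch. It assumes a sample LaTeX fragment, \section{Introduction}, that is not taken from the text. It shows that the literal "\\\\section" yields the string \\section, which matches the text \section. It also shows why escaping only once fails: the RE engine would then read \s as its "whitespace" class.

```python
import re

# The Python literal "\\\\section" produces the 9-character
# string \\section, which is what re.compile() needs to receive.
pattern_source = "\\\\section"
print(pattern_source)        # \\section
print(len(pattern_source))   # 9

# The RE engine reads \\ as one literal backslash.
pattern = re.compile(pattern_source)

latex = "\\section{Introduction}"     # the text \section{Introduction}
print(pattern.search(latex).group())  # \section

# Escaping only once is not enough: the RE engine then sees \section,
# reads \s as "any whitespace character", and finds no match here.
print(re.search("\\section", latex))  # None
```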
In short, to match a literal backslash, one has to write
'\\\\' as the RE string, because the regular expression
must be "\\", and each backslash must be expressed as
"\\" inside a regular Python string literal. In REs that
contain many backslashes, this leads to long runs of doubled
backslashes and makes the resulting strings difficult to understand.
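Here is a short sketch of matching a literal backslash with the four-backslash literal. The Windows-style path used as input is only an assumed example. Writing the literal with only two backslashes hands the RE engine a lone trailing backslash. The engine rejects that as an incomplete escape.

```python
import re

# The RE \\ matches one literal backslash; as an ordinary Python
# literal, each of its two backslashes must be doubled again: "\\\\".
backslash = re.compile("\\\\")
print(len(backslash.pattern))   # 2: the RE engine receives \\

path = "C:\\Users\\docs"        # the string C:\Users\docs
print(backslash.split(path))    # ['C:', 'Users', 'docs']

# Doubling only once hands the RE engine a lone trailing backslash,
# which is an incomplete escape and raises re.error.
try:
    re.compile("\\")
    failed = False
except re.error:
    failed = True
print(failed)                   # True
```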
The solution is to use Python's raw string notation for regular
expressions; backslashes are not handled in any special way in
a string literal prefixed with "r", so
r"\n" is a
two-character string containing "\" and "n", while
"\n" is a one-character string containing a newline.
Regular expressions will often be written in Python
code using this raw string notation.
|Regular string|Raw string|
|"\\\\section"|r"\\section"|
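The following sketch checks the raw-string rules described above. It compares the ordinary literal "\\\\section" from the LaTeX example with its raw-string equivalent, r"\\section". The sample text \section{Introduction} is again an assumed input.

```python
import re

# A raw string leaves backslashes alone: r"\n" is a backslash and an "n",
# while "\n" is a single newline character.
print(len(r"\n"), len("\n"))         # 2 1

# The escaped literal and the raw literal build the same pattern string.
regular = "\\\\section"
raw = r"\\section"
print(regular == raw)                # True

latex = r"\section{Introduction}"    # raw strings work for the text, too
print(re.match(raw, latex).group())  # \section
```

The raw form needs half as many backslashes as the ordinary literal, and the saving grows with every backslash the pattern contains.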