Using the Rx Toolkit Komodo IDE only

Tutorial

Feature Showcase

Komodo's Rx Toolkit is a tool for building, editing and debugging regular expressions. Build a regular expression in the Regular Expression pane, and enter the sample search string in the Search Text pane. Adjust the regular expression as necessary to produce the desired matches in the Match Results pane.

If you are new to Regular Expressions, see the Regular Expressions Primer. In addition, online references for Python, Perl, PHP, Ruby, and Tcl regular expressions are available via the Rx Toolkit's Help menu.

To open the Rx Toolkit, click in the toolbar, or select Tools|Rx Toolkit.

Rx Toolkit Quick Reference

Selecting a Language

Regular expression syntax is similar in most programming languages, but there are some important differences. The Rx Toolkit can evaluate regular expressions using:

Select the regular expression engine you want to use from the drop-down list in the Regular Expression panel. You will need the requisite Perl, PHP, and Ruby interpreters installed to use those engines. Komodo has embedded interpreters for JavaScript and Python.

Creating Regular Expressions

Type or paste your expression in the Regular Expression pane. A regular expression can include metacharacters, anchors, quantifiers, digits, and alphanumeric characters.

Many of the display characteristics available in the Editor Pane can be enabled in the Regular Expression field. However, these characteristics must be manually enabled via key bindings. For example, to display line numbers in the Regular Expression field, press 'Ctrl'+'Shift'+'6' ('Cmd'+'Shift'+'6' on Mac OS X) if the default key binding scheme is in effect.

Note: Do not enclose regular expressions in forward slashes ("/"). The Rx Toolkit does not recognize enclosing slashes.

Adding Metacharacters to a Regular Expression

The Shortcuts menu provides a list of all of the metacharacters that are valid in the selected language.

To add a metacharacter to a regular expression:

  1. Click Shortcuts to the right of the Regular Expression pane.
  2. Select a metacharacter from the list. The metacharacter is added to the Regular Expression pane.

If you already know the metacharacter you need, just type it in the Regular Expression pane.

Setting the Match Type

The buttons at the top of the Rx Toolkit Window determine which function is used to match the regular expression to the search text. The options are based on module-level functions of Python's re module. Choose from the following options:

Adding Modifiers to a Regular Expression

Add modifiers to regular expression by selecting one or more of the check boxes to the right of the Regular Expression pane:

Note: You must use the Modifiers check boxes to add modifiers to a regular expression. The Rx Toolkit does not recognize modifiers entered in the Regular Expression pane.

Evaluating Regular Expressions

A debugged regular expression correctly matches the intended patterns and provides information about which variable contains which pattern.

If there is a match...

If there is no match...

Match Results

If a regular expression collects multiple words, phrases or numbers and stores them in groups, the Match Results pane displays details of the contents of each group.

The Match Results pane is displayed with as many as four columns, depending on which match type is selected. The Group column displays a folder for each match; the folders contain numbered group variables. The Span column displays the length in characters of each match. The Value column lists the values of each variable.

Modifier Examples

This section shows some sample regular expressions with various modifiers applied. In all examples, the default match type (Match All) is assumed:

Using Ignore Case

The Ignore case modifier ignores alphabetic case distinctions while matching. Use this when you do not want to specify the case in the pattern you are trying to match.

To match the following test string...

Testing123

...you could use the following regular expression with Ignore case selected:

^([a-z]+)(\d+)

The following results are displayed in the Match Results pane:

Discussion

This regular expression matches the entire test string.

The ^ matches the beginning of a string. The [a-z] matches any lowercase letter from "a" to "z". The + matches any lowercase letter from "a" to "z" one or more times. The Ignore case modifier lets the regular expression match any uppercase or lowercase letters. Therefore ^([a-z]+) matches "Testing". The (\d+)matches any digit one or more times, so it matches "123".

Using Multi-Line Mode

The Multi-line modifier allows ^ and $ to match next to newline characters. Use this when a pattern is more than one line long and has at least one newline character.

To match the subject part of the following test string...

"okay?"

...you could use the following regular expression with Multi-line selected:

^(\"okay\?\")

The following results are displayed in the Match Results pane:

Discussion

This regular expression matches the entire test string.

The ^ matches the beginning of any line. The \" matches the double quotes in the test string. The string matches the literal word "okay". The \? matches the question mark "?". The \" matches the terminal double quotes. There is only one variable group in this regular expression, and it contains the entire test string.

Using Single-Line Mode

The Single-line modifier mode allows "." to match newline characters. Use this when a pattern is more than one line long, has at least one newline character, and you want to match newline characters.

To match the following test string...

Subject: Why did this
work?

...you could use the following regular expression with Single-line selected:

(:[\t ]+)(.*)work\?

The following results are displayed in the Match Results pane:

Discussion

This regular expression matches everything in the test string following the word "Subject", including the colon and the question mark.

The (\s+) matches any space one or more times, so it matches the space after the colon. The (.*) matches any character zero or more times, and the Single-line modifier allows the period to match the newline character. Therefore (.*) matches "Why did this <newline> match". The \? matches the terminal question mark "?".

Using Multi-line Mode and Single-line Mode

To match more of the following test string...

Subject: Why did this
work?

...you would need both the Multi-line and Single-line modifiers selected for this regular expression:

([\t ]+)(.*)^work\?

The following results are displayed in the Match Results pane:

Discussion

This regular expression matches everything in the test string following the word "Subject", including the colon and the question mark.

The ([\t ]+) matches a Tab character or a space one or more times, which matches the space after the colon. The (.*) matches any character zero or more times, which matches "Why did this <newline>". The ^work matches the literal "work" on the second line. The \? matches the terminal question mark "?".

If you used only the Single-line modifier, this match would fail because the caret "^" would only match the beginning of a string.

If you used only the Multi-line modifier, this match would fail because the period "." would not match the newline character.

Using Verbose

The Verbose modifier ignores whitespace and comments in the regular expression. Use this when you want to pretty print and/or add comments to a regular expression.

To match the following test string...

testing123

...you could use the following regular expression with the Verbose modifier selected:

(.*?)  (\d+)  # this matches testing123

The following results are displayed in the Match Results pane:

Discussion

This regular expression matches the entire test string.

The .* matches any character zero or more times, the ? makes the * not greedy, and the Verbose modifier ignores the spaces after the (.*?). Therefore, (.*?) matches "testing" and populates the "Group 1" variable. The (\d+) matches any digit one or more times, so this matches "123" and populates the "Group 2" variable. The Verbose modifier ignores the spaces after (\d+) and ignores the comments at the end of the regular expression.

Using Regular Expressions

Once a regular expression has been built and debugged, you can add it to your code by copying and pasting the regular expression into the Komodo Editor Pane. Each language is a little different in the way it incorporates regular expressions. The following are examples of regular expressions used in Perl, Python, PHP and Tcl.

Perl

This Perl code uses a regular expression to match two different spellings of the same word. In this case the program prints all instances of "color" and "colour".

while($word = <STDIN>){
    print "$word" if ($word =~ /colou?r/i );
}

The metacharacter "?" specifies that the preceding character, "u", occurs zero or one times. The modifier "i" (ignore case) that follows /colou?r/ means that the regular expression will match $word, regardless of whether the specified characters are uppercase or lowercase (for example, Color, COLOR and CoLour will all match).

Python

This Python code uses a regular expression to match a pattern in a string. In Python, regular expressions are available via the re module.

import re
m = re.search("Java[Ss]cript", "in the JavaScript tutorial")
if m:
    print "matches:", m.group()
else:
    print "Doesn't match."

The re.search() function returns a match object if the regular expression matches; otherwise, it returns none. The character class "[Ss]" is used to find the word "JavaScript", regardless of whether the "s" is capitalized. If there is a match, the script uses the group() method to return the matching strings. Otherwise the program prints "Doesn't Match".

Tcl

This Tcl code uses a regular expression to match all lines in a document that contain a URL.

while {[gets $doc line]!=-1} {
   if {regexp -nocase {www\..*\.com} $line} {
       puts $line

This while loop searches every line in a file for any instance of a URL and displays the results. Tcl implements regular expressions using the regexp and regsub commands. In the example shown above, the regexp is followed by the -nocase option, which specifies that the following regular expression should match, regardless of case. The regular expression attempts to match all web addresses. Notice the use of backslashes to include the literal dots (.) that follow "www" and precede "com".

PHP

This PHP code uses a Perl Compatible Regular Expressions(PCRE) to search for valid phone numbers in the United States and Canada; that is, numbers with a three-digit area code, followed by an additional seven digits.

$numbers = array("777-555-4444",
                  "800-123-4567",
                 "(999)555-1111",
                 "604.555.1212",
                 "555-1212",
                 "This is not a number",
                 "1234-123-12345",
                 "123-123-1234a",
                 "abc-123-1234");

function isValidPhoneNumber($number) {
    return preg_match("/\(?\d{3}\)?[-\s.]?\d{3}[-\s.]\d{4}$/x", $number);
}

foreach ($numbers as $number) {
    if (isValidPhoneNumber($number)) {
        echo "The number '$number' is valid\n";
    } else {
        echo "The number '$number' is not valid\n";
    }
}

This PHP example uses the preg_match function for matching regular expressions. Other PCRE functions are also available. If the function isValidPhone returns true, the program outputs a statement that includes the valid phone number. Otherwise, it outputs a statement advising that the number is not valid.

Ruby

This Ruby code uses a regular expression that does simple email addresses validation.

puts 'Enter your email address:'
cmdline = gets.chomp
addrtest = /^\w{1,30}\@\w{1,30}\.\w{2,4}$/i
if addrtest.match(cmdline)
  puts 'Address is valid.'
else
  puts 'Address is NOT valid.'
  tries = tries - 1
end

The regular expression is stored in the addrtest variable, which is compared to the cmdline variable using the match method. The regular expression specifies that the address must have:

It will accept some addresses that are not fully RFC compliant, and will not accept long user or domain names. A more robust regular expression is:

 /^([a-z0-9]+[._]?){1,}[a-z0-9]+\@(([a-z0-9]+[-]?){1,}[a-z0-9]+\.){1,}[a-z]{2,4}$/i

This does not limit the length of the username or domain. It also enforces some additional requirements:

Writing a regular expression that is fully compliant with RFC 822 is quite a bit more complicated.