Perl Tutorial
(Komodo IDE only)
Overview
Before You Begin
This tutorial assumes:
- That ActivePerl build 623 or greater is installed on your system. ActivePerl is a free distribution of the core Perl language. See Komodo’s Installation Guide for configuration instructions.
- That you have a connection to the Internet.
- That you are interested in Perl. You don’t need to have previous knowledge of Perl; the tutorial will walk you through a simple program and suggest some resources for further information.
Perl Tutorial Scenario
You have exported a number of email messages to a text file. You want to extract the name of the sender and the contents of each email, and convert the result to XML format. You intend to eventually transform the XML to HTML using XSLT. To create an XML file from a text source file, you will use a Perl program that parses the data and places it within XML tags. In this tutorial you will:
- Install a Perl module for parsing text files containing comma-separated values.
- Open the Perl Tutorial Project and associated files.
- Analyze parse.pl, the Perl program included in the Tutorial Project.
- Generate output by running the program.
- Debug the program using the Komodo debugger.
Installing Perl Modules Using PPM
One of the great strengths of Perl is the wealth of free modules available for extending the core Perl distribution. ActivePerl includes the Perl Package Manager (PPM), which makes it easy to browse, download and update Perl modules from module repositories on the Internet. These modules are added to the core ActivePerl installation.
Running the Perl Package Manager
The Text::CSV_XS Perl module is necessary for this tutorial. To install it using PPM:
- Open the Run Command dialog box: select Tools > Run Command.
- In the Run field, enter the command:
ppm install Text::CSV_XS
- Click the Run button to run the command. PPM connects to the default repository, downloads the necessary files and installs them.
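To confirm from Perl itself that the install worked, a short script can try to load the module and report its version. The sketch below is an illustration rather than part of the tutorial files. The module list is only an example, and strict is included because every Perl installation ships with it.

```perl
use strict;
use warnings;

# Modules to check: the one installed above, plus strict, which ships
# with every Perl and so should always be found.
my @modules = ('Text::CSV_XS', 'strict');

my %installed;
for my $name (@modules) {
    # A string eval lets require take a module name held in a variable.
    # It returns 1 if the module loads, or undef (with $@ set) if not.
    if (eval "require $name; 1") {
        my $version = $name->VERSION;
        $installed{$name} = defined $version ? $version : 'unknown';
        print "$name is installed (version $installed{$name})\n";
    }
    else {
        print "$name is NOT installed\n";
    }
}
```

A string eval is acceptable here because the module names are fixed in the script. Avoid passing untrusted input to it.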
About PPM
- PPM can be run directly from the command line with the ppm command. Enter ppm help for more information on command-line options.
- By default, PPM accesses the Perl Package repository at http://ppm.activestate.com. The ActiveState repository contains binary versions of most packages available from CPAN, the Comprehensive Perl Archive Network.
- More information about PPM is available in the online documentation. PPM documentation is also included with your ActivePerl distribution.
- On Linux systems where ActivePerl has been installed by the super-user (i.e. root), most users will not have permission to install packages with PPM. Run ppm as root at the command line to install packages.
Tip: It is also possible to install Perl modules without PPM using the CPAN shell. See the CPAN FAQ for more information.
Opening Files
Open the Perl Tutorial Project
Select Help > Tutorials > Perl Tutorial.
The tutorial project will open in the Places sidebar.
Open the Perl Tutorial Files
On the Projects sidebar, double-click the files parse.pl, mailexport.xml and mailexport.txt. These files open in the Editor Pane, and each file's name is displayed on a tab at the top of the pane.
Overview of the Tutorial Files
- mailexport.txt: This file was generated by exporting the contents of an email folder (using the email program’s own Export function) to a comma-separated text file. Notice that the key to the file contents (the field names) is listed on the first line. The Perl program will use this line as a reference when parsing the email messages.
- parse.pl: This is the Perl program that will parse mailexport.txt and generate mailexport.xml.
- mailexport.xml: This file was generated by parse.pl, using mailexport.txt as input. When you run parse.pl (in Run the Program to Generate Output), this file will be regenerated.
Analyzing the Program
Introduction
In this step, you will examine the Perl program on a line-by-line basis. Ensure that line numbers are enabled in Komodo (View > View Line Numbers), and that the parse.pl file is displayed in the Komodo Editor Pane.
Setting Up the Program
Line 1 - Shebang Line
- Komodo analyzes this line for hints about what language the file contains.
- Warning messages are enabled with the “-w” switch.
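The -w switch turns on warnings for the whole program, including modules that did not ask for them. Modern code more often uses the lexically scoped use warnings pragma, which the sketch below uses instead of -w. It captures the kind of message warnings produce, here for an undefined variable used in a string. The variable names are made up for illustration.

```perl
use strict;
use warnings;   # lexically scoped counterpart of the -w switch

# Collect warnings in an array instead of letting them go to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

my $subject;                         # declared but never assigned, so undef
my $line = "Subject: " . $subject;   # using undef in a string triggers a warning

print "Caught: $_" for @warnings;
```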
Tip: Syntax elements are displayed in different colors. You can adjust the display options for language elements in the Preferences dialog box.
Lines 2 to 4 - External Modules
- These lines load external Perl modules used by the program.
- Perl module files have a “.pm” extension; “use strict” loads the “strict.pm” module, which is part of the core Perl distribution.
- “use Text::CSV_XS” loads the module installed in Installing Perl Modules Using PPM.
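The mapping from a module name to its .pm file can be seen directly. When use loads a module, Perl replaces each “::” with a directory separator and appends “.pm”. It then searches the directories listed in @INC and records the file it found in the %INC hash. The sketch below prints that information for two core modules. It then applies the same rule to Text::CSV_XS without loading it.

```perl
use strict;
use warnings;

# "use Some::Module" searches the directories in @INC for Some/Module.pm,
# loads it, and records the file it found in %INC.
my %loaded_from;
for my $module ('strict', 'warnings') {
    (my $file = "$module.pm") =~ s{::}{/}g;
    $loaded_from{$module} = $INC{$file};
    print "$module => $file => $INC{$file}\n";
}

# The same naming rule means Text::CSV_XS is looked up as Text/CSV_XS.pm
# under one of the @INC directories.
(my $csv_file = 'Text::CSV_XS.pm') =~ s{::}{/}g;
print "Text::CSV_XS is looked up as $csv_file\n";
```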
Writing the Output Header
Lines 6 to 7 - Open Files
- Input and output files are opened; if the output file does not exist, it is created.
- Scalar variables, indicated by the “$” symbol, hold the filehandles for the two files.
- Strict mode (enabled by loading “strict.pm” in line 2) requires that variables be declared before use, in the form “my $variable”.
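The two points above can be tried in isolation. The sketch below opens one file for reading and one for writing, and shows that the output file is created if it did not exist. It then shows strict rejecting an undeclared variable. It works in a temporary directory, and the file names input.txt and output.xml are hypothetical stand-ins, not the tutorial's files.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Work in a throwaway directory; the file names are hypothetical.
my $dir     = tempdir(CLEANUP => 1);
my $infile  = File::Spec->catfile($dir, 'input.txt');
my $outfile = File::Spec->catfile($dir, 'output.xml');

# Create a small input file so there is something to open for reading.
open(my $setup, '>', $infile) or die "Can't write $infile: $!";
print $setup "From,Subject\n";
close($setup) or die "Can't write $infile: $!";

# '<' opens for reading and fails if the file is missing;
# '>' opens for writing, creating the file (or emptying an existing one).
open(my $in,  '<', $infile)  or die "Can't read $infile: $!";
open(my $out, '>', $outfile) or die "Can't write $outfile: $!";
my $created = -e $outfile ? 1 : 0;
print "Output file created: $created\n";
close($in);
close($out) or die "Can't write $outfile: $!";

# Under strict, an undeclared variable is a compile-time error.
# A string eval compiles the code at run time, so the error can be caught.
my $compiled     = eval q{ use strict; $undeclared = 1; 1 };
my $strict_error = $compiled ? '' : $@;
print "strict says: $strict_error";
```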
Tip: Scalar variables store “single” items; their symbol (“$”) is shaped like an “s”, for scalar.
Lines 9 to 13 - Print the Header to the Output File
- “<<” is a “here document” indicator that introduces the string to be printed.
- The label “EOT” is arbitrary and user-defined; it follows “<<” to start the string and is repeated, alone on its own line, to end it.
- The second EOT, on line 13, ends the string.
- Lines 10 and 11 are data that will be printed to the output file.
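The here-document mechanics can be tested on their own by printing into an in-memory string instead of a file. In this sketch, the XML declaration and the <messages> root element are hypothetical stand-ins for whatever header parse.pl writes.

```perl
use strict;
use warnings;

# Print into an in-memory string instead of a real file.
my $xml = '';
open(my $out, '>', \$xml) or die "Can't open in-memory handle: $!";

# Everything between "<<EOT;" and the line holding only "EOT" is printed
# as-is. A bare <<EOT behaves like a double-quoted string, so $variables
# inside it would be interpolated; <<'EOT' would not interpolate.
print $out <<EOT;
<?xml version="1.0"?>
<messages>
EOT

close($out);
print $xml;
```

The closing EOT must start at the beginning of its line and match the label exactly.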
Setting Up Input Variables
Lines 15 to 16 - Assign Method Call to Scalar Variable
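- The method “new” creates a Text::CSV_XS parser object, which is assigned to the scalar variable $csv.
- The method “new” is contained in the module Text::CSV_XS.
- The argument {binary => 1} turns on binary mode, which allows quoted fields to contain binary characters, including line feeds and carriage returns.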
Tip: Good Perl code is liberally annotated with comments (indicated by the “#” symbol).
Lines 18 to 19 - Method “getline”
- “getline” is a method of the Text::CSV_XS object stored in the $csv scalar variable.
- The “getline” method reads the first line of mailexport.txt (through the $in filehandle), parses the line into fields, and assigns a reference to the resulting array to the $fields variable.
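The sketch below shows the shape of what getline hands back: an array reference holding one parsed line of fields. It does not use Text::CSV_XS. Instead it substitutes the core module Text::ParseWords, whose parse_line function splits a line on commas and honors double quotes, so it runs without any extra install. This swap is for illustration only. Unlike a real CSV parser, parse_line also treats single quotes and backslashes specially, and it cannot handle fields containing embedded newlines, so it is not a replacement for Text::CSV_XS. The read_fields helper and the sample data are hypothetical.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Made-up input: the first line holds the field names, as in mailexport.txt.
my $data = qq{From,Subject,Body\n"Doe, Jane",Hello,See you soon\n};
open(my $in, '<', \$data) or die "Can't open in-memory data: $!";

# Hypothetical helper: read one line and split it on commas, honoring
# double quotes. Wrapping the list in [ ] returns an array reference,
# the same kind of value getline returns. Returns undef at end of input.
sub read_fields {
    my ($fh) = @_;
    my $line = <$fh>;
    return unless defined $line;
    chomp $line;
    return [ parse_line(',', 0, $line) ];
}

my $fields = read_fields($in);   # field names
my $record = read_fields($in);   # first data record
print "Field names: @$fields\n";
print "First record has ", scalar(@$record), " fields\n";
```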
Starting the Processing Loop
Line 21 - “while” Loop
- The “while” statement repeats the enclosed block as long as its condition is true.
- The condition is “1”, which is always true, so the loop would repeat endlessly without an explicit exit.
- The logic for breaking out of the loop is on line 25
- The loop is enclosed in braces; the opening brace is on line 21, the closing brace on line 51
Tip: Click on the minus symbol to the left of line 21. The entire section of nested code will be collapsed. This is Code Folding.
Tip: Click the mouse pointer on line 21. Notice that the opening brace changes to a bold red font. The closing brace on line 51 is displayed the same way.
Lines 22 to 25 - Extracting a Line of Input Data
- The “getline” method extracts one line of data from the input file and places a reference to its fields in the $record scalar variable.
- If “getline” returns an empty array, the input file has been fully processed; the program exits the loop and proceeds to line 52.
Tip: Array variables store lists of items indexed by number; their symbol (“@”) is shaped like an “a”, for array.
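The loop structure on lines 21 to 25 can be tried with a few lines of made-up input. The loop condition is always true, and a last statement inside the loop ends it once the input runs out. For simplicity, this sketch reads lines with Perl's built-in <$in> operator instead of getline, so its end-of-input test checks for undef rather than for an empty array.

```perl
use strict;
use warnings;

# Three made-up input lines; the loop should process each one, then stop.
my $data = "first\nsecond\nthird\n";
open(my $in, '<', \$data) or die "Can't open in-memory data: $!";

my @seen;
while (1) {                          # the condition is always true...
    my $line = <$in>;
    last unless defined $line;       # ...so the exit test lives inside the loop
    chomp $line;
    push @seen, $line;
}
# Execution continues here after "last".
print "Processed ", scalar(@seen), " lines\n";
```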
Converting Characters with a Regular Expression
Lines 27 to 31 - “foreach”
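- The “foreach” loop cycles through the elements of the array referenced by $record (written @$record). Each element in turn is aliased to the loop variable, so a substitution applied to it changes the array itself.
- The regular expressions on lines 29 and 30 find the characters “<” and “&” and replace them with the entity references “&lt;” and “&amp;”. “<” and “&” are reserved characters in XML.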
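The escaping step can be sketched on a made-up record held in an array reference. Note the order of the substitutions: “&” is replaced first. If “<” were replaced first, the “&” in the new “&lt;” would then be turned into “&amp;lt;”. This sketch shows the general technique and is not a copy of lines 29 and 30.

```perl
use strict;
use warnings;

# A made-up record, held as an array reference.
my $record = ['Jane Doe', 'Q&A: is 1 < 2?'];

# foreach aliases $_ to each element, so the substitutions modify the
# array in place. "&" is escaped first so that the "&" introduced by
# "&lt;" is not escaped a second time.
foreach (@$record) {
    s/&/&amp;/g;
    s/</&lt;/g;
}
print "$_\n" for @$record;
```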
Tip: Komodo’s Regular Expression (Rx) Toolkit is a powerful tool for creating and debugging regular expressions. See Regular Expressions for more information.
Combining Field Reference and Field Data
Lines 33 to 35 - Hash Slice
- Line 35 uses a hash slice to pair each field name read on line 19 (in @$fields) with the corresponding value in @$record, producing a hash keyed by field name.
Tip: Hash variables are indicated by the symbol “%” and store lists of items indexed by string.
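A hash slice assigns a whole list of values to a whole list of keys in one statement. The sketch below pairs made-up field names with one made-up record. The hash name %email is hypothetical and not necessarily the name used in parse.pl.

```perl
use strict;
use warnings;

# Made-up field names (from the first line) and one data record.
my $fields = ['From', 'Subject', 'Body'];
my $record = ['Jane Doe', 'Hello', 'See you soon'];

# @email{...} is a hash slice: the first key gets the first value,
# the second key the second value, and so on.
my %email;
@email{@$fields} = @$record;

print "$_: $email{$_}\n" for @$fields;
```

Both arrays must be in the same order, because the slice pairs keys and values purely by position.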
Writing Data to the Output File
Lines 37 to 50 - Writing Data to the Output File
- One line at a time, lines from the input file are processed and written to the output file.
- Portions of the data line (stored in the $record scalar variable) are extracted based on the corresponding text in the field reference (the first line in the input file, stored in the $fields variable).
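As a rough sketch of this step, the code below writes one already-escaped record as XML, using each field name as an element name. The <message> wrapper, the field names and the values are hypothetical. Lines 37 to 50 of parse.pl may instead pick out specific fields by name. Using field names directly as tags also assumes they are valid XML names, for example that they contain no spaces.

```perl
use strict;
use warnings;

# One made-up record, already escaped and keyed by field name.
my $fields = ['From', 'Subject'];
my %email  = (From => 'Jane Doe', Subject => 'Q&amp;A');

my $xml = '';
open(my $out, '>', \$xml) or die "Can't open in-memory handle: $!";

# Wrap the record in a <message> element and each value in an element
# named after its field. All tag names here are hypothetical.
print $out "<message>\n";
for my $name (@$fields) {
    print $out "  <$name>$email{$name}</$name>\n";
}
print $out "</message>\n";
close($out);

print $xml;
```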
Closing the Program
Line 51 - Closing the Processing Loop
- At line 51, processing will loop back to the opening brace on line 21.
- The logic to exit the loop is on line 25.
Lines 52 to 54 - Ending the Program
- Line 52 prints the closing tag to the XML file.
- Line 53 closes the output file or, if it cannot, fails with the error “Can’t write mailexport.xml”.
- Line 54 closes the input file. It is not necessary to check the status when closing the input file because this only fails if the program contains a logic error.
Run the Program to Generate Output
To start, you will simply generate the output by running the program through the debugger without setting any breakpoints.
- Clear the contents of mailexport.xml: Click on the “mailexport.xml” tab in the Editor Pane. Delete the contents of the file - you will regenerate it in the next step. Save the file.
- Run the Debugger: Click on the “parse.pl” tab in the editor. From the menu, select Debug > Go/Continue. In the Debugging Options dialog box, click OK to accept the defaults.
- View the contents of mailexport.xml: Click on the “mailexport.xml” tab in the editor. Komodo informs you that the file has changed. Click OK to reload the file.
Debugging the Program
In this step you’ll add breakpoints to the program and “debug” it. Adding breakpoints lets you run the program in chunks, making it possible to watch variables and view output as it is generated. Before you begin, ensure that line numbering is enabled in Komodo (View > View Line Numbers).
Tip: Debugger commands can be accessed from the Debug menu, by shortcut keys, or from the Debug Toolbar. For a summary of debugger commands, see Debugger Command List.
Tip: What do the debugger commands do?
- Step In: executes the current line of code and pauses at the following line.
- Step Over: executes the current line of code. If the line calls a function or method, the function or method runs to completion without the debugger stepping into it, and the debugger pauses at the line that follows the original line.
- Step Out: when the debugger is within a function or method, executes the rest of it without stepping through the code line by line. The debugger stops on the line of code following the function or method call in the calling program.
- Set a breakpoint: On the “parse.pl” tab, click in the grey margin immediately to the left of the code on line 9 of the program. This will set a breakpoint, indicated by a red circle.
- Run the Debugger: Select Debug > Go/Continue. In the Debugging Options dialog box, click OK to accept the defaults. The debugger will process the program until it encounters the first breakpoint.
- Watch the debug process: A yellow arrow on the breakpoint indicates the position at which the debugger has halted. Click on the “mailexport.xml” tab. Komodo informs you that the file has changed. Click OK to reload the file.
- View variables: In the Bottom Pane, see the Debug tab. The variables “$in” and “$out” appear in the Locals tab.
- Line 9 - Step In: Select Debug > Step In. “Step In” is a debugger command that causes the debugger to execute the current line and then stop at the next processing line (notice that the lines between 9 and 13 are raw output indicated by “here” document markers).
- Line 16 - Step In: On line 16, the processing transfers to the module Text::CSV_XS. Komodo opens the file CSV_XS.pm and stops the debugger at the active line in the module.
- Line 61 - Step Out: Select Debug > Step Out. The Step Out command will make the debugger execute the function in Text::CSV_XS and pause at the next line of processing, which is back in parse.pl on line 19.
- Line 19 - Step Over: Select Debug > Step Over. The debugger will process the function in line 19 without opening the module containing the “getline” function.
- Line 22 - Set Another Breakpoint: After the debugger stops at line 21, click in the grey margin immediately to the left of the code on line 22 to set another breakpoint.
- Line 22 - Step Out: It appears that nothing happened. However, the debugger actually completed one iteration of the “while” loop (from lines 21 to 51). To see how this works, set another breakpoint at line 37, and Step Out again. The debugger will stop at line 37. On the Debug Session tab, look at the data assigned to the $record variable. Then Step Out, and notice that $record is no longer displayed, and the debugger is back on line 21. Step Out again, and look at the $record variable: it now contains data from the next record in the input file.
- Line 37 - Stop the Debugger: Select Debug > Stop to stop the Komodo debugger.
Note: The Perl debugger will not break on certain parts of control structures, such as lines containing only braces ({ }). With Perl 5.6 and earlier, the debugger will also not break at the start of while, until, for, or foreach statements.
Tip: Output was not written to mailexport.xml after every iteration of the while loop, because Perl buffers output to files internally. Setting the special Perl variable $| to a true value turns on “autoflush” for the currently selected output filehandle, so output is written out immediately.
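Because $| applies to the currently selected output filehandle, the usual idiom is to select the handle, set $|, and then restore the previous selection. The sketch below does this for a file in a temporary directory, whose name is hypothetical. It then checks that the data is already on disk before the file is closed.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'buffer-demo.xml');   # hypothetical name

open(my $out, '>', $path) or die "Can't write $path: $!";

# $| applies to the currently selected output handle, so select $out,
# turn autoflush on, then restore the previous selection (usually STDOUT).
my $previous = select($out);
$| = 1;
select($previous);

print $out "<message/>\n";

# With autoflush on, the file already holds the data before $out is closed.
my $size_before_close = -s $path;
print "Bytes on disk before close: $size_before_close\n";

close($out) or die "Can't write $path: $!";
```

An equivalent and often more readable form is use IO::Handle; followed by $out->autoflush(1);.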
More Perl Resources
Documentation
There is a wealth of documentation available for Perl. The first source for language documentation is the Perl distribution installed on your system. To access the documentation contained in the Perl distribution:
- Open the Run Command dialog box (Tools > Run Command), and then type perldoc perldoc. A description of the perldoc command will be displayed on your screen. Perldoc is used to navigate the documentation contained in your Perl distribution.
Documentation for the latest version of ActivePerl is available online at: http://docs.activestate.com/activeperl/
Tutorials and Reference Sites
There are many Perl tutorials and beginner Perl sites on the Internet, such as learn.perl.org, which provides book reviews, tips, and access to Perl news lists and books.