ActiveState::Scineplex - Perl extension to access Scineplex code lexer.


  use ActiveState::Scineplex qw(Annotate);
  $color_info = Annotate($code, $lang, %options);


Scineplex is a C library for heuristic parsing of source code in various languages. Scineplex is based on the Scintilla sources. The ActiveState::Scineplex module provides a Perl interface to this library.

Currently this module implements an interface consisting of one function, Annotate, which returns a scineplex-driven colorization for one or more lines of source code. It either returns a string giving the colorization or throws an exception.
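
Because Annotate reports failure by throwing, callers will usually want to trap the exception. The following is a minimal sketch, not part of the module: try_annotate is a hypothetical helper name, and it loads the module with require at run time so that a missing installation is reported the same way as a failed annotation.

```perl
use strict;
use warnings;

# Hypothetical wrapper (not provided by ActiveState::Scineplex).
# Returns ($result, $error); exactly one of the two is defined.
sub try_annotate {
    my ($code, $lang, %options) = @_;
    my $result = eval {
        # Load at run time so a missing module is caught by the eval too.
        require ActiveState::Scineplex;
        ActiveState::Scineplex::Annotate($code, $lang, %options);
    };
    return ($result, undef) if defined $result;
    return (undef, $@ || 'Annotate returned no result');
}

my ($color_info, $error) = try_annotate('$abc = 3;', 'perl');
if (defined $error) {
    warn "could not annotate: $error";
} else {
    print $color_info;
}
```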

    $color_info = Annotate($code, $lang, %options);

The $code argument is one or more lines of source code to be analyzed, passed as a single string. The lines may be separated by any newline sequence.

The $lang argument can be one of 'perl', 'python', 'ruby', 'vbscript', or 'xslt'. The default is 'perl'.

Additional %options can be passed as key/value pairs. The following options are supported (defaults in parentheses):

    outputFormat => 'html' | 'json' | 'line' | 'classic' ('line')
    parsingStartState => number (0) 
    DumpSource => 0 | 1 (0)
    DumpEndState => 0 | 1 (0)
    DumpFoldLevels => 0 | 1 (0)
    StopAfterDataSectionLine1 => 0 | 1 (0)

The outputFormat is the most important option. In classic mode, Annotate echoes back each character at the start of a line, followed by separating white-space and its style value:

    $res = Annotate('$abc = 3;', 'perl', outputFormat => 'classic');
    print $res;
    $       12
    a       12
    b       12
    c       12
    chr(32) 0
    =       10
    chr(32) 0
    3       4
    ;       10
    chr(10) 0

Symbolic names for the numeric style values can be looked up in the %SCE_TOKEN hash (exportable). For example $SCE_TOKEN{perl}{12} is the string "SCE_PL_SCALAR".
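
Classic output is easy to post-process. As an illustration, the sketch below parses the sample output shown above into character/style pairs and tallies the characters per style; parse_classic is a hypothetical helper, not part of the module, and it assumes that non-printing characters always appear in the chr(N) form seen in the sample. The resulting style numbers could then be looked up in %SCE_TOKEN.

```perl
use strict;
use warnings;

# Hypothetical helper: turn 'classic' output into [character, style] pairs.
sub parse_classic {
    my ($text) = @_;
    my @pairs;
    for my $line (split /\n/, $text) {
        next unless $line =~ /\S/;    # skip empty lines
        my ($char, $style) = $line =~ /^\s*(\S+)\s+(\d+)\s*$/
            or die "unrecognized line: $line";
        # Decode the chr(N) notation used for white-space characters.
        $char = chr($1) if $char =~ /^chr\((\d+)\)$/;
        push @pairs, [$char, $style];
    }
    return @pairs;
}

# The sample output from the classic-mode example.
my $classic = <<'END';
$       12
a       12
b       12
c       12
chr(32) 0
=       10
chr(32) 0
3       4
;       10
chr(10) 0
END

my @pairs = parse_classic($classic);

# Count how many characters carry each style value.
my %count;
$count{ $_->[1] }++ for @pairs;
print "style $_: $count{$_} character(s)\n" for sort { $a <=> $b } keys %count;
```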

Setting outputFormat to line gives a terser output, in which each numeric style is represented by a single character: the one whose ASCII code is the style value plus the ASCII code of '0'. Style 0 therefore appears as '0' and style 12 as '<':

    $res = Annotate('$abc = 3;', 'perl', outputFormat => 'line');
    print $res;
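
The encoding rule can be sketched in a few lines. The helpers style_to_char and char_to_style are hypothetical names, not part of the module, and the style list is simply the one from the classic example; the exact layout of real line output (for example, how line ends are reported) is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helpers illustrating the documented encoding rule.
sub style_to_char { my ($style) = @_; return chr(ord('0') + $style); }
sub char_to_style { my ($char)  = @_; return ord($char) - ord('0'); }

# Style values taken from the classic-mode example for '$abc = 3;' plus newline.
my @styles  = (12, 12, 12, 12, 0, 10, 0, 4, 10, 0);
my $encoded = join '', map { style_to_char($_) } @styles;

print "$encoded\n";    # prints "<<<<0:04:0"

# Decoding reverses the mapping.
my @decoded = map { char_to_style($_) } split //, $encoded;
```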

Setting outputFormat to html returns an HTML-encoded string containing the original code wrapped in span tags with generic class names like "variable", "operator", etc. This kind of output is designed to be wrapped in pre tags, and styled with a CSS file that contains rules like

    pre span.comments {
      color: #696969;
      font-style: italic;
    }
Default text is not placed in a span tag.
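
As a rough sketch of how such a fragment might be embedded in a page: wrap_in_page below is a hypothetical helper, and the fragment is a made-up stand-in for the string Annotate would return in html mode (only the "comments" class name comes from the CSS rule above).

```perl
use strict;
use warnings;

# Hypothetical helper: wrap an already HTML-encoded fragment in a minimal
# page with the stylesheet inlined and the code inside pre tags.
sub wrap_in_page {
    my ($fragment, $css) = @_;
    return join "\n",
        '<html><head>',
        "<style>\n$css</style>",
        '</head><body>',
        "<pre>$fragment</pre>",
        '</body></html>',
        '';
}

my $css = <<'CSS';
pre span.comments {
  color: #696969;
  font-style: italic;
}
CSS

# Made-up fragment for illustration only; a real one comes from Annotate.
my $fragment = '<span class="comments"># a comment</span>';

my $page = wrap_in_page($fragment, $css);
print $page;
```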

Setting outputFormat to json returns a JSON array of arrays. Each inner array contains a generic style label together with the position of the span: [$tag, $line, $col, $len]. The returned JSON array is also valid Perl code and can be converted to a Perl array using Perl's builtin eval function.


    $res = Annotate('$abc = 3;', 'perl', outputFormat => 'json');
    print $res;
    $array = eval $res;
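
When the code being annotated comes from an untrusted source, a JSON parser may be preferable to a string eval. The sketch below uses decode_json from the core JSON::PP module and assumes the returned string is strict JSON. The sample string is made up to match the documented [$tag, $line, $col, $len] shape; its numbers, and whether positions count from 0 or 1, are assumptions.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Made-up sample in the documented shape; real data comes from
# Annotate(..., outputFormat => 'json').
my $res = '[["variable",0,0,4],["operator",0,5,1]]';

my $spans = decode_json($res);

for my $span (@$spans) {
    my ($tag, $line, $col, $len) = @$span;
    print "$tag: line $line, column $col, length $len\n";
}
```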

The parsingStartState setting should be used only when you know that the code starts with a given style, such as lines 3-5 of a multi-line string.

The DumpSource flag is used only with line output. It is intended mostly for human consumption, and produces output like the following:

    $res = Annotate('$abc = 3;', 'perl', DumpSource=>1);
    print $res;
    $abc = 3;

The DumpEndState flag is used only in line mode, and gives the styles for whichever characters constitute the line-end sequence:

    $res = Annotate(qq(\$abc = 3;\r\n), 'perl', DumpSource=>1, DumpEndState=>1);
    print $res;
    $abc = 3;

The DumpFoldLevels flag is used only in line mode, and gives the fold levels as a 4-hex-digit sequence in a leading column.

    $res = Annotate(qq(if(1) {\n\$abc = 3;\n}\n), 'perl', DumpSource=>1, DumpFoldLevels=>1);
    print $res;
    2400 if(1) {
    0401 $abc = 3;
    0401 }
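
The hex values can be unpacked with a few bit operations. This sketch assumes the column holds Scintilla's raw fold-level value, which the sample values above appear to match; the constants are Scintilla's, while decode_fold_level is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;

# Scintilla fold-level constants.
use constant {
    SC_FOLDLEVELBASE       => 0x400,
    SC_FOLDLEVELWHITEFLAG  => 0x1000,
    SC_FOLDLEVELHEADERFLAG => 0x2000,
    SC_FOLDLEVELNUMBERMASK => 0x0FFF,
};

# Hypothetical helper: split a 4-hex-digit fold level into its parts.
sub decode_fold_level {
    my ($hex) = @_;
    my $value = hex $hex;
    my %info = (
        # Nesting depth relative to the base level.
        depth  => ($value & SC_FOLDLEVELNUMBERMASK) - SC_FOLDLEVELBASE,
        # True when the line starts a foldable region.
        header => ($value & SC_FOLDLEVELHEADERFLAG) ? 1 : 0,
        # True when the line is blank.
        blank  => ($value & SC_FOLDLEVELWHITEFLAG) ? 1 : 0,
    );
    return \%info;
}

for my $hex (qw(2400 0401)) {
    my $info = decode_fold_level($hex);
    print "$hex: depth $info->{depth}, header $info->{header}, blank $info->{blank}\n";
}
```

Under this reading, 2400 marks a fold header at depth 0 and 0401 marks an ordinary line at depth 1.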

The StopAfterDataSectionLine1 flag is used only for Perl code in line mode.


Info on scintilla available at


Copyright (C) 2005 by ActiveState Software Inc.