Perl 5 version 32.0 documentation

split

  • split /PATTERN/,EXPR,LIMIT
  • split /PATTERN/,EXPR
  • split /PATTERN/
  • split

    Splits the string EXPR into a list of strings and returns the list in list context, or the size of the list in scalar context. (Prior to Perl 5.11, it also overwrote @_ with the list in void and scalar context. If you target old perls, beware.)
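
    The difference between list and scalar context can be sketched as follows. The record string and variable names are made up for illustration, and a perl recent enough that scalar-context split no longer touches @_ is assumed.

```perl
use strict;
use warnings;

my $record = 'alice:x:1000:1000';   # hypothetical colon-separated record

# List context: split returns the fields themselves.
my @fields = split /:/, $record;

# Scalar context: split returns only the number of fields.
my $count = split /:/, $record;

print "fields: @fields\n";   # fields: alice x 1000 1000
print "count:  $count\n";    # count:  4
```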

    If only PATTERN is given, EXPR defaults to $_.

    Anything in EXPR that matches PATTERN is taken to be a separator that separates the EXPR into substrings (called "fields") that do not include the separator. Note that a separator may be longer than one character or even have no characters at all (the empty string, which is a zero-width match).

    The PATTERN need not be constant; an expression may be used to specify a pattern that varies at runtime.
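
    As a rough illustration of a pattern chosen at run time, the sketch below interpolates a separator held in a variable and also passes a precompiled qr// object. The variable names and sample strings are assumptions; \Q...\E is used so that the separator is matched literally rather than as regex syntax.

```perl
use strict;
use warnings;

my $sep  = '.';                 # hypothetical separator, known only at run time
my $addr = '192.168.0.1';

# \Q...\E quotes the separator so that '.' does not mean "any character".
my @octets = split /\Q$sep\E/, $addr;

# A compiled pattern can be handed to split as an ordinary expression.
my $comma = qr/\s*,\s*/;
my @items = split $comma, 'x , y,z';

print join('|', @octets), "\n";   # 192|168|0|1
print join('|', @items),  "\n";   # x|y|z
```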

    If PATTERN matches the empty string, the EXPR is split at the match position (between characters). As an example, the following:

        print join(':', split(/b/, 'abc')), "\n";

    uses the b in 'abc' as a separator to produce the output a:c. However, this:

        print join(':', split(//, 'abc')), "\n";

    uses empty string matches as separators to produce the output a:b:c; thus, the empty string may be used to split EXPR into a list of its component characters.

    As a special case for split, the empty pattern given in match operator syntax (//) specifically matches the empty string, which is contrary to its usual interpretation as the last successful match.

    If PATTERN is /^/, then it is treated as if it used the multiline modifier (/^/m), since it isn't much use otherwise.
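
    One common use of the /^/ special case is breaking a multi-line string into lines while keeping each newline. A minimal sketch, with a made-up input string:

```perl
use strict;
use warnings;

my $text = "first\nsecond\nthird\n";   # hypothetical multi-line input

# /^/ is treated as /^/m, so the string is split at the start of each line.
# The match is zero-width, so every field keeps its trailing newline.
my @lines = split /^/, $text;

print scalar(@lines), "\n";   # 3
print $lines[1];              # second
```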

    /m and any of the other pattern modifiers valid for qr (summarized in qr/STRING/msixpodualn in perlop) may be specified explicitly.

    As another special case, split emulates the default behavior of the command line tool awk when the PATTERN is either omitted or a string composed of a single space character (such as ' ' or "\x20", but not e.g. / /). In this case, any leading whitespace in EXPR is removed before splitting occurs, and the PATTERN is instead treated as if it were /\s+/; in particular, this means that any contiguous whitespace (not just a single space character) is used as a separator. However, this special treatment can be avoided by specifying the pattern / / instead of the string " ", thereby allowing only a single space character to be a separator. In earlier Perls this special case was restricted to the use of a plain " " as the pattern argument to split; in Perl 5.18.0 and later this special case is triggered by any expression which evaluates to the simple string " ".

    As of Perl 5.28, this special-cased whitespace splitting works as expected in the scope of use feature 'unicode_strings'. In previous versions, and outside the scope of that feature, it exhibits The Unicode Bug in perlunicode: characters that are whitespace according to Unicode rules but not according to ASCII rules can be treated as part of fields rather than as field separators, depending on the string's internal encoding.

    If omitted, PATTERN defaults to a single space, " ", triggering the previously described awk emulation.
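
    The contrast between the awk-style ' ' pattern and the literal / / pattern may be easier to see side by side. The input line below is invented for the example (ASCII whitespace only).

```perl
use strict;
use warnings;

my $line = "  alpha   beta\tgamma ";   # hypothetical input with mixed whitespace

# awk emulation: leading whitespace is dropped, then split on /\s+/.
my @awk = split ' ', $line;

# Literal single space: every space is a separator, and the tab is not.
my @single = split / /, $line;

print join('|', @awk), "\n";   # alpha|beta|gamma
print scalar(@single), "\n";   # 6
```

    With / /, the two leading spaces and the run of three spaces each yield empty fields, and the tab stays inside a field; only the trailing empty field is stripped.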

    If LIMIT is specified and positive, it represents the maximum number of fields into which the EXPR may be split; in other words, LIMIT is one greater than the maximum number of times EXPR may be split. Thus, the LIMIT value 1 means that EXPR may be split a maximum of zero times, producing a maximum of one field (namely, the entire value of EXPR). For instance:

        print join(':', split(//, 'abc', 1)), "\n";

    produces the output abc, and this:

        print join(':', split(//, 'abc', 2)), "\n";

    produces the output a:bc, and each of these:

        print join(':', split(//, 'abc', 3)), "\n";
        print join(':', split(//, 'abc', 4)), "\n";

    produces the output a:b:c.
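
    A positive LIMIT is handy when only the first separator is significant. The sketch below splits a made-up name=value setting with LIMIT 2, so that later = characters stay in the value.

```perl
use strict;
use warnings;

my $setting = 'PATH=/usr/bin:/opt/x=y/bin';   # hypothetical setting string

# LIMIT 2: at most one split, so the value keeps any later '=' characters.
my ($name, $value) = split /=/, $setting, 2;

print "$name\n";    # PATH
print "$value\n";   # /usr/bin:/opt/x=y/bin
```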

    If LIMIT is negative, it is treated as if it were instead arbitrarily large; as many fields as possible are produced.

    If LIMIT is omitted (or, equivalently, zero), then it is usually treated as if it were instead negative but with the exception that trailing empty fields are stripped (empty leading fields are always preserved); if all fields are empty, then all fields are considered to be trailing (and are thus stripped in this case). Thus, the following:

        print join(':', split(/,/, 'a,b,c,,,')), "\n";

    produces the output a:b:c, but the following:

        print join(':', split(/,/, 'a,b,c,,,', -1)), "\n";

    produces the output a:b:c:::.

    In time-critical applications, it is worthwhile to avoid splitting into more fields than necessary. Thus, when assigning to a list, if LIMIT is omitted (or zero), then LIMIT is treated as though it were one larger than the number of variables in the list; for the following, LIMIT is implicitly 3:

        my ($login, $passwd) = split(/:/);
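
    To make the implicit LIMIT visible, the following sketch compares the list assignment with an explicit LIMIT of 3 on the same input. The passwd-style record is an assumption for illustration.

```perl
use strict;
use warnings;

my $entry = 'root:x:0:0:root:/root:/bin/sh';   # hypothetical passwd-style record

# Two variables on the left, so split behaves as if LIMIT were 3.
my ($login, $passwd) = split /:/, $entry;

# Writing the LIMIT out shows where the work stops: the third field holds
# the unsplit remainder, which the assignment above simply discards.
my @explicit = split /:/, $entry, 3;

print "$login $passwd\n";   # root x
print "$explicit[2]\n";     # 0:0:root:/root:/bin/sh
```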

    Note that splitting an EXPR that evaluates to the empty string always produces zero fields, regardless of the LIMIT specified.
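
    The empty-string rule, and the related rule that fields which are all empty count as trailing, can be checked with a short sketch (variable names are illustrative):

```perl
use strict;
use warnings;

# An empty EXPR yields no fields, whatever the LIMIT.
my @empty     = split /,/, '';
my @empty_neg = split /,/, '', -1;

# A lone separator yields two empty fields; without a LIMIT both are
# treated as trailing and stripped, with a negative LIMIT both survive.
my @only_sep     = split /,/, ',';
my @only_sep_neg = split /,/, ',', -1;

printf "%d %d %d %d\n",
    scalar @empty, scalar @empty_neg,
    scalar @only_sep, scalar @only_sep_neg;   # 0 0 0 2
```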

    An empty leading field is produced when there is a positive-width match at the beginning of EXPR. For instance:

        print join(':', split(/ /, ' abc')), "\n";

    produces the output ":abc". However, a zero-width match at the beginning of EXPR never produces an empty field, so that:

        print join(':', split(//, ' abc'));

    produces the output " :a:b:c" (rather than ": :a:b:c"); the first field is the leading space itself.

    An empty trailing field, on the other hand, is produced when there is a match at the end of EXPR, regardless of the length of the match (of course, unless a non-zero LIMIT is given explicitly, such fields are removed, as in the last example). Thus:

        print join(':', split(//, ' abc', -1)), "\n";

    produces the output " :a:b:c:".

    If the PATTERN contains capturing groups, then for each separator, an additional field is produced for each substring captured by a group (in the order in which the groups are specified, as per backreferences); if any group does not match, then it captures the undef value instead of a substring. Also, note that any such additional field is produced whenever there is a separator (that is, whenever a split occurs), and such an additional field does not count towards the LIMIT. Consider the following expressions evaluated in list context (each returned list is provided in the associated comment):

        split(/-|,/, "1-10,20", 3)
        # ('1', '10', '20')

        split(/(-|,)/, "1-10,20", 3)
        # ('1', '-', '10', ',', '20')

        split(/-|(,)/, "1-10,20", 3)
        # ('1', undef, '10', ',', '20')

        split(/(-)|,/, "1-10,20", 3)
        # ('1', '-', '10', undef, '20')

        split(/(-)|(,)/, "1-10,20", 3)
        # ('1', '-', undef, '10', undef, ',', '20')
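
    Capturing groups are useful when the separators themselves carry meaning. The sketch below is one possible way to tokenize a small arithmetic string while keeping the operators, and to drop the undef placeholders left by groups that did not participate in a match. The input strings are invented for the example.

```perl
use strict;
use warnings;

# Keep the operators by capturing them; surrounding whitespace is
# outside the group, so it is consumed with the separator and discarded.
my @tokens = split /\s*([-+*\/])\s*/, '3 + 4*5 - 6';

# With alternative groups, the group that did not match yields undef.
# Filtering with grep leaves only the fields and the captured separators.
my @clean = grep { defined } split /(-)|(,)/, '1-10,20';

print join(' ', @tokens), "\n";   # 3 + 4 * 5 - 6
print join(' ', @clean),  "\n";   # 1 - 10 , 20
```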