This is Info file perl.info, produced by Makeinfo version 1.68 from the input file bigperl.texi.

File: perl.info,  Node: perltrap,  Next: perlport,  Prev: perlsec,  Up: Top

Perl traps for the unwary
*************************

NAME
====

perltrap - Perl traps for the unwary

DESCRIPTION
===========

The biggest trap of all is forgetting to use the -w switch; see *Note Perlrun: perlrun,.  The second biggest trap is not making your entire program runnable under `use strict'.  The third biggest trap is not reading the list of changes in this version of Perl; see *Note Perldelta: perldelta,.

Awk Traps
---------

Accustomed *awk* users should take special note of the following:

   * The English module, loaded via

          use English;

     allows you to refer to special variables (like $/) with names (like $RS), as though they were in *awk*; see *Note Perlvar: perlvar, for details.

   * Semicolons are required after all simple statements in Perl (except at the end of a block).  Newline is not a statement delimiter.

   * Curly brackets are required on ifs and whiles.

   * Variables begin with "$", "@" or "%" in Perl.

   * Arrays index from 0.  Likewise string positions in substr() and index().

   * You have to decide whether your array has numeric or string indices.

   * Hash values do not spring into existence upon mere reference.

   * You have to decide whether you want to use string or numeric comparisons.

   * Reading an input line does not split it for you.  You get to split it to an array yourself.  And the split() operator has different arguments than *awk*'s.

   * The current input line is normally in $_, not $0.  It generally does not have the newline stripped.  ($0 is the name of the program executed.)  See *Note Perlvar: perlvar,.

   * $<DIGIT> (such as $1) does not refer to fields; it refers to substrings matched by the last match pattern.

   * The print() statement does not add field and record separators unless you set $, and $\.  You can set $OFS and $ORS if you're using the English module.
   * You must open your files before you print to them.

   * The range operator is "..", not comma.  The comma operator works as in C.

   * The match operator is "=~", not "~".  ("~" is the one's complement operator, as in C.)

   * The exponentiation operator is "**", not "^".  "^" is the XOR operator, as in C.  (You know, one could get the feeling that *awk* is basically incompatible with C.)

   * The concatenation operator is ".", not the null string.  (Using the null string would render `/pat/ /pat/' unparsable, because the third slash would be interpreted as a division operator; the tokenizer is in fact slightly context sensitive for operators like "/", "?", and ">".  And in fact, "." itself can be the beginning of a number.)

   * The next, exit, and continue keywords work differently.

   * The following variables work differently:

                Awk       Perl
                ARGC      $#ARGV or scalar @ARGV
                ARGV[0]   $0
                FILENAME  $ARGV
                FNR       $. - something
                FS        (whatever you like)
                NF        $#Fld, or some such
                NR        $.
                OFMT      $#
                OFS       $,
                ORS       $\
                RLENGTH   length($&)
                RS        $/
                RSTART    length($`)
                SUBSEP    $;

   * You cannot set $RS to a pattern, only a string.

   * When in doubt, run the *awk* construct through a2p and see what it gives you.

C Traps
-------

Cerebral C programmers should take note of the following:

   * Curly brackets are required on if's and while's.

   * You must use `elsif' rather than `else if'.

   * The break and continue keywords from C become in Perl last and next, respectively.  Unlike in C, these do not work within a `do { } while' construct.

   * There's no switch statement.  (But it's easy to build one on the fly.)

   * Variables begin with "$", "@" or "%" in Perl.

   * printf() does not implement the "*" format for interpolating field widths, but it's trivial to use interpolation of double-quoted strings to achieve the same effect.

   * Comments begin with "#", not "/*".

   * You can't take the address of anything, although a similar operator in Perl is the backslash, which creates a reference.

   * `ARGV' must be capitalized.
     `$ARGV[0]' is C's `argv[1]', and `argv[0]' ends up in $0.

   * System calls such as link(), unlink(), rename(), etc. return nonzero for success, not 0.

   * Signal handlers deal with signal names, not numbers.  Use `kill -l' to find their names on your system.

Sed Traps
---------

Seasoned *sed* programmers should take note of the following:

   * Backreferences in substitutions use "$" rather than "\".

   * The pattern matching metacharacters "(", ")", and "|" do not have backslashes in front.

   * The range operator is "...", rather than comma.

Shell Traps
-----------

Sharp shell programmers should take note of the following:

   * The backtick operator does variable interpolation without regard to the presence of single quotes in the command.

   * The backtick operator does no translation of the return value, unlike *csh*.

   * Shells (especially *csh*) do several levels of substitution on each command line.  Perl does substitution in only certain constructs such as double quotes, backticks, angle brackets, and search patterns.

   * Shells interpret scripts a little bit at a time.  Perl compiles the entire program before executing it (except for BEGIN blocks, which execute at compile time).

   * The arguments are available via @ARGV, not $1, $2, etc.

   * The environment is not automatically made available as separate scalar variables.

Perl Traps
----------

Practicing Perl programmers should take note of the following:

   * Remember that many operations behave differently in a list context than they do in a scalar one.  See *Note Perldata: perldata, for details.

   * Avoid barewords if you can, especially all lowercase ones.  You can't tell by just looking at it whether a bareword is a function or a string.  By using quotes on strings and parentheses on function calls, you won't ever get them confused.

   * You cannot discern from mere inspection which builtins are unary operators (like chop() and chdir()) and which are list operators (like print() and unlink()).
     (User-defined subroutines can be *only* list operators, never unary ones.)  See *Note Perlop: perlop,.

   * People have a hard time remembering that some functions default to $_, or @ARGV, or whatever, but that others which you might expect to do not.

   * The <FH> construct is not the name of the filehandle, it is a readline operation on that handle.  The data read is assigned to $_ only if the file read is the sole condition in a while loop:

          while (<FH>) { }                  # same as the next line
          while (defined($_ = <FH>)) { }
          <FH>;  # data discarded!

   * Remember not to use "=" when you need "=~"; these two constructs are quite different:

          $x =  /foo/;
          $x =~ /foo/;

   * The `do {}' construct isn't a real loop that you can use loop control on.

   * Use my() for local variables whenever you can get away with it (but see *Note Perlform: perlform, for where you can't).  Using `local()' actually gives a local value to a global variable, which leaves you open to unforeseen side-effects of dynamic scoping.

   * If you localize an exported variable in a module, its exported value will not change.  The local name becomes an alias to a new value but the external name is still an alias for the original.

Perl4 to Perl5 Traps
--------------------

Practicing Perl4 programmers should take note of the following Perl4-to-Perl5 specific traps.

They're crudely ordered according to the following list:

Discontinuance, Deprecation, and BugFix traps
     Anything that's been fixed as a perl4 bug, removed as a perl4 feature or deprecated as a perl4 feature with the intent to encourage usage of some other perl5 feature.

Parsing Traps
     Traps that appear to stem from the new parser.

Numerical Traps
     Traps having to do with numerical or mathematical operators.

General data type traps
     Traps involving perl standard data types.

Context Traps - scalar, list contexts
     Traps related to context within lists, scalar statements/declarations.

Precedence Traps
     Traps related to the precedence of parsing, evaluation, and execution of code.

General Regular Expression Traps using s///, etc.
     Traps related to the use of pattern matching.

Subroutine, Signal, Sorting Traps
     Traps related to the use of signals and signal handlers, general subroutines, and sorting, along with sorting subroutines.

OS Traps
     OS-specific traps.

Interpolation Traps
     Traps having to do with how things get interpolated within certain expressions, statements, contexts, or whatever.

DBM Traps
     Traps specific to the use of `dbmopen()', and specific dbm implementations.

Unclassified Traps
     Everything else.

If you find an example of a conversion trap that is not listed here, please submit it to Bill Middleton <`wjm@best.com'> for inclusion.  Also note that at least some of these can be caught with the `use warnings' pragma or the -w switch.

Discontinuance, Deprecation, and BugFix traps
---------------------------------------------

Anything that has been discontinued, deprecated, or fixed as a bug from perl4.

   * Discontinuance

     Symbols starting with "_" are no longer forced into package main, except for $_ itself (and `@_', etc.).

          package test;
          $_legacy = 1;

          package main;
          print "\$_legacy is ",$_legacy,"\n";

          # perl4 prints: $_legacy is 1
          # perl5 prints: $_legacy is

   * Deprecation

     Double-colon is now a valid package separator in a variable name.  Thus these behave differently in perl4 vs. perl5, because the packages don't exist.

          $a=1;$b=2;$c=3;$var=4;
          print "$a::$b::$c ";
          print "$var::abc::xyz\n";

          # perl4 prints: 1::2::3 4::abc::xyz
          # perl5 prints: 3

     Given that `::' is now the preferred package delimiter, it is debatable whether this should be classed as a bug or not.  (The older package delimiter, "'", is used here.)

          $x = 10 ;
          print "x=${'x}\n" ;

          # perl4 prints: x=10
          # perl5 prints: Can't find string terminator "'" anywhere before EOF

     You can avoid this problem, and remain compatible with perl4, if you always explicitly include the package name:

          $x = 10 ;
          print "x=${main'x}\n" ;

     Also see precedence traps, for parsing $:.

   * BugFix

     The second and third arguments of `splice()' are now evaluated in scalar context (as the Camel says) rather than list context.
          sub sub1{return(0,2) }          # return a 2-element list
          sub sub2{ return(1,2,3)}        # return a 3-element list
          @a1 = ("a","b","c","d","e");
          @a2 = splice(@a1,&sub1,&sub2);
          print join(' ',@a2),"\n";

          # perl4 prints: a b
          # perl5 prints: c d e

   * Discontinuance

     You can't do a goto into a block that is optimized away.  Darn.

          goto marker1;

          for(1){
          marker1:
              print "Here I is!\n";
          }

          # perl4 prints: Here I is!
          # perl5 dumps core (SEGV)

   * Discontinuance

     It is no longer syntactically legal to use whitespace as the name of a variable, or as a delimiter for any kind of quote construct.  Double darn.

          $a = ("foo bar");
          $b = q baz ;
          print "a is $a, b is $b\n";

          # perl4 prints: a is foo bar, b is baz
          # perl5 errors: Bareword found where operator expected

   * Discontinuance

     The archaic while/if BLOCK BLOCK syntax is no longer supported.

          if { 1 } {
              print "True!";
          }
          else {
              print "False!";
          }

          # perl4 prints: True!
          # perl5 errors: syntax error at test.pl line 1, near "if {"

   * BugFix

     The `**' operator now binds more tightly than unary minus.  It was documented to work this way before, but didn't.

          print -4**2,"\n";

          # perl4 prints: 16
          # perl5 prints: -16

   * Discontinuance

     The meaning of `foreach{}' has changed slightly when it is iterating over a list which is not an array.  This used to assign the list to a temporary array, but no longer does so (for efficiency).  This means that you'll now be iterating over the actual values, not over copies of the values.  Modifications to the loop variable can change the original values.

          @list = ('ab','abc','bcd','def');
          foreach $var (grep(/ab/,@list)){
              $var = 1;
          }
          print (join(':',@list));

          # perl4 prints: ab:abc:bcd:def
          # perl5 prints: 1:1:bcd:def

     To retain Perl4 semantics you need to assign your list explicitly to a temporary array and then iterate over that.  For example, you might need to change

          foreach $var (grep(/ab/,@list)){

     to

          foreach $var (@tmp = grep(/ab/,@list)){

     Otherwise changing $var will clobber the values of @list.
     (This most often happens when you use $_ for the loop variable, and call subroutines in the loop that don't properly localize $_.)

   * Discontinuance

     split with no arguments now behaves like `split ' '' (which doesn't return an initial null field if $_ starts with whitespace); it used to behave like `split /\s+/' (which does).

          $_ = ' hi mom';
          print join(':', split);

          # perl4 prints: :hi:mom
          # perl5 prints: hi:mom

   * BugFix

     Perl 4 would ignore any text which was attached to an -e switch, always taking the code snippet from the following arg.  Additionally, it would silently accept an -e switch without a following arg.  Both of these behaviors have been fixed.

          perl -e'print "attached to -e"' 'print "separate arg"'

          # perl4 prints: separate arg
          # perl5 prints: attached to -e

          perl -e

          # perl4 prints:
          # perl5 dies: No code specified for -e.

   * Discontinuance

     In Perl 4 the return value of push was undocumented, but it was actually the last value being pushed onto the target list.  In Perl 5 the return value of push is documented, but it has changed: it is the number of elements in the resulting list.

          @x = ('existing');
          print push(@x, 'first new', 'second new');

          # perl4 prints: second new
          # perl5 prints: 3

   * Deprecation

     Some error messages will be different.

   * Discontinuance

     Some bugs may have been inadvertently removed.  :-)

Parsing Traps
-------------

Perl4-to-Perl5 traps having to do with parsing.

   * Parsing

     Note the space between . and =

          $string . = "more string";
          print $string;

          # perl4 prints: more string
          # perl5 prints: syntax error at - line 1, near ". ="

   * Parsing

     Better parsing in perl 5

          sub foo {}
          &foo
          print("hello, world\n");

          # perl4 prints: hello, world
          # perl5 prints: syntax error

   * Parsing

     "if it looks like a function, it is a function" rule.

          print ($foo == 1) ? "is one\n" : "is zero\n";

          # perl4 prints: is zero
          # perl5 warns: "Useless use of a constant in void context" if using -w

   * Parsing

     String interpolation of the `$#array' construct differs when braces are used around the name.

          @a = (1..3);
          print "${#a}";

          # perl4 prints: 2
          # perl5 fails with syntax error

          @a = (1..3);
          print "$#{a}";

          # perl4 prints: {a}
          # perl5 prints: 2

Numerical Traps
---------------

Perl4-to-Perl5 traps having to do with numerical operators, operands, or output from same.

   * Numerical

     Formatted output and significant digits

          print 7.373504 - 0, "\n";
          printf "%20.18f\n", 7.373504 - 0;

          # Perl4 prints:
          # 7.375039999999996141
          # 7.37503999999999614

          # Perl5 prints:
          # 7.373504
          # 7.37503999999999614

   * Numerical

     This specific item has been deleted.  It demonstrated how the auto-increment operator would not catch when a number went over the signed int limit.  Fixed in version 5.003_04.  But always be wary when using large integers.  If in doubt:

          use Math::BigInt;

   * Numerical

     Assignment of return values from numeric equality tests no longer yields 0 in perl5 when the test evaluates to false.  Logical tests now return a null (empty) string instead of 0.

          $p = ($test == 1);
          print $p,"\n";

          # perl4 prints: 0
          # perl5 prints:

     Also see `General Regular Expression Traps using s///, etc.' in this node for another example of this new feature.

   * Bitwise string ops

     When bitwise operators which can operate upon either numbers or strings (`& | ^ ~') are given only strings as arguments, perl4 would treat the operands as bitstrings so long as the program contained a call to the vec() function.  perl5 treats the string operands as bitstrings.  (See `Bitwise String Operators', *Note Perlop: perlop, for more details.)
          $fred = "10";
          $barney = "12";
          $betty = $fred & $barney;
          print "$betty\n";
          # Uncomment the next line to change perl4's behavior
          # ($dummy) = vec("dummy", 0, 0);

          # Perl4 prints:
          # 8

          # Perl5 prints:
          # 10

          # If vec() is used anywhere in the program, both print:
          # 10

General data type traps
-----------------------

Perl4-to-Perl5 traps involving most data-types, and their usage within certain expressions and/or context.

   * (Arrays)

     Negative array subscripts now count from the end of the array.

          @a = (1, 2, 3, 4, 5);
          print "The fourth element of the array is $a[3] also expressed as $a[-2] \n";

          # perl4 prints: The fourth element of the array is 4 also expressed as
          # perl5 prints: The fourth element of the array is 4 also expressed as 4

   * (Arrays)

     Setting `$#array' lower now discards array elements, and makes them impossible to recover.

          @a = (a,b,c,d,e);
          print "Before: ",join('',@a);
          $#a =1;
          print ", After: ",join('',@a);
          $#a =3;
          print ", Recovered: ",join('',@a),"\n";

          # perl4 prints: Before: abcde, After: ab, Recovered: abcd
          # perl5 prints: Before: abcde, After: ab, Recovered: ab

   * (Hashes)

     Hashes get defined before use

          local($s,@a,%h);
          die "scalar \$s defined" if defined($s);
          die "array \@a defined" if defined(@a);
          die "hash \%h defined" if defined(%h);

          # perl4 prints:
          # perl5 dies: hash %h defined

     Perl will now generate a warning when it sees defined(@a) and defined(%h).

   * (Globs)

     glob assignment from variable to variable will fail if the assigned variable is localized subsequent to the assignment

          @a = ("This is Perl 4");
          *b = *a;
          local(@a);
          print @b,"\n";

          # perl4 prints: This is Perl 4
          # perl5 prints:

   * (Globs)

     Assigning undef to a glob has no effect in Perl 5.  In Perl 4 it undefines the associated scalar (but may have other side effects including SEGVs).

   * (Scalar String)

     Changes in unary negation (of strings).  This change affects both the return value and what it does to auto(magic)increment.
          $x = "aaa";
          print ++$x," : ";
          print -$x," : ";
          print ++$x,"\n";

          # perl4 prints: aab : -0 : 1
          # perl5 prints: aab : -aab : aac

   * (Constants)

     perl 4 lets you modify constants:

          $foo = "x";
          &mod($foo);
          for ($x = 0; $x < 3; $x++) {
              &mod("a");
          }
          sub mod {
              print "before: $_[0]";
              $_[0] = "m";
              print " after: $_[0]\n";
          }

          # perl4:
          # before: x after: m
          # before: a after: m
          # before: m after: m
          # before: m after: m

          # Perl5:
          # before: x after: m
          # Modification of a read-only value attempted at foo.pl line 12.
          # before: a

   * (Scalars)

     The behavior is slightly different for:

          print "$x", defined $x

          # perl 4: 1
          # perl 5:

   * (Variable Suicide)

     Variable suicide behavior is more consistent under Perl 5.  Perl5 exhibits the same behavior for hashes and scalars, that perl4 exhibits for only scalars.

          $aGlobal{ "aKey" } = "global value";
          print "MAIN:", $aGlobal{"aKey"}, "\n";
          $GlobalLevel = 0;
          &test( *aGlobal );

          sub test {
              local( *theArgument ) = @_;
              local( %aNewLocal );
              # perl 4 != 5.001l,m
              $aNewLocal{"aKey"} = "this should never appear";
              print "SUB: ", $theArgument{"aKey"}, "\n";
              $aNewLocal{"aKey"} = "level $GlobalLevel";   # what should print
              $GlobalLevel++;
              if( $GlobalLevel<4 ) {
                  &test( *aNewLocal );
              }
          }

          # Perl4:
          # MAIN:global value
          # SUB: global value
          # SUB: level 0
          # SUB: level 1
          # SUB: level 2

          # Perl5:
          # MAIN:global value
          # SUB: global value
          # SUB: this should never appear
          # SUB: this should never appear
          # SUB: this should never appear

Context Traps - scalar, list contexts
-------------------------------------

   * (list context)

     The elements of argument lists for formats are now evaluated in list context.  This means you can interpolate list values now.

          @fmt = ("foo","bar","baz");
          format STDOUT=
          @<<<<< @||||| @>>>>>
          @fmt;
          .
          write;

          # perl4 errors:  Please use commas to separate fields in file
          # perl5 prints: foo     bar      baz

   * (scalar context)

     The `caller()' function now returns a false value in a scalar context if there is no caller.  This lets library files determine if they're being required.
          caller() ? (print "You rang?\n") : (print "Got a 0\n");

          # perl4 errors: There is no caller
          # perl5 prints: Got a 0

   * (scalar context)

     The comma operator in a scalar context is now guaranteed to give a scalar context to its arguments.

          @y= ('a','b','c');
          $x = (1, 2, @y);
          print "x = $x\n";

          # Perl4 prints:  x = c   # Thinks list context interpolates list
          # Perl5 prints:  x = 3   # Knows scalar uses length of list

   * (list, builtin)

     `sprintf()' funkiness (array argument converted to scalar array count).  This test could be added to t/op/sprintf.t

          @z = ('%s%s', 'foo', 'bar');
          $x = sprintf(@z);
          if ($x eq 'foobar') {print "ok 2\n";} else {print "not ok 2 '$x'\n";}

          # perl4 prints: ok 2
          # perl5 prints: not ok 2

     printf() works fine, though:

          printf STDOUT (@z);
          print "\n";

          # perl4 prints: foobar
          # perl5 prints: foobar

     Probably a bug.

Precedence Traps
----------------

Perl4-to-Perl5 traps involving precedence order.

Perl 4 has almost the same precedence rules as Perl 5 for the operators that they both have.  Perl 4, however, seems to have had some inconsistencies that made the behavior differ from what was documented.

   * Precedence

     LHS vs. RHS of any assignment operator.  LHS is evaluated first in perl4, second in perl5; this can affect the relationship between side-effects in sub-expressions.

          @arr = ( 'left', 'right' );
          $a{shift @arr} = shift @arr;
          print join( ' ', keys %a );

          # perl4 prints: left
          # perl5 prints: right

   * Precedence

     These are now semantic errors because of precedence:

          @list = (1,2,3,4,5);
          %map = ("a",1,"b",2,"c",3,"d",4);
          $n = shift @list + 2;   # first item in list plus 2
          print "n is $n, ";
          $m = keys %map + 2;     # number of items in hash plus 2
          print "m is $m\n";

          # perl4 prints: n is 3, m is 6
          # perl5 errors and fails to compile

   * Precedence

     The precedence of assignment operators is now the same as the precedence of assignment.  Perl 4 mistakenly gave them the precedence of the associated operator.  So you now must parenthesize them in expressions like

          /foo/ ? ($a += 2) : ($a -= 2);

     Otherwise

          /foo/ ? $a += 2 : $a -= 2

     would be erroneously parsed as

          (/foo/ ? $a += 2 : $a) -= 2;

     On the other hand,

          $a += /foo/ ? 1 : 2;

     now works as a C programmer would expect.

   * Precedence

          open FOO || die;

     is now incorrect.  You need parentheses around the filehandle.  Otherwise, perl5 leaves the statement as its default precedence:

          open(FOO || die);

          # perl4 opens or dies
          # perl5 errors: Precedence problem: open FOO should be open(FOO)

   * Precedence

     perl4 gives the special variable, $:, precedence, where perl5 treats `$::' as main package

          $a = "x"; print "$::a";

          # perl 4 prints: -:a
          # perl 5 prints: x

   * Precedence

     perl4 had buggy precedence for the file test operators vis-a-vis the assignment operators.  Thus, although the precedence table for perl4 leads one to believe `-e $foo .= "q"' should parse as `((-e $foo) .= "q")', it actually parses as `(-e ($foo .= "q"))'.  In perl5, the precedence is as documented.

          -e $foo .= "q"

          # perl4 prints: no output
          # perl5 prints: Can't modify -e in concatenation

   * Precedence

     In perl4, keys(), each() and values() were special high-precedence operators that operated on a single hash, but in perl5, they are regular named unary operators.  As documented, named unary operators have lower precedence than the arithmetic and concatenation operators `+ - .', but the perl4 variants of these operators actually bind tighter than `+ - .'.  Thus, for:

          %foo = 1..10;
          print keys %foo - 1

          # perl4 prints: 4
          # perl5 prints: Type of arg 1 to keys must be hash (not subtraction)

     The perl4 behavior was probably more useful, if less consistent.

General Regular Expression Traps using s///, etc.
-------------------------------------------------

All types of RE traps.

   * Regular Expression

     `s'$lhs'$rhs'' now does no interpolation on either side.  It used to interpolate $lhs but not $rhs.
     (And still does not match a literal '$' in string)

          $a=1;$b=2;
          $string = '1 2 $a $b';
          $string =~ s'$a'$b';
          print $string,"\n";

          # perl4 prints: $b 2 $a $b
          # perl5 prints: 1 2 $a $b

   * Regular Expression

     `m//g' now attaches its state to the searched string rather than the regular expression.  (Once the scope of a block is left for the sub, the state of the searched string is lost)

          $_ = "ababab";
          while(m/ab/g){
              &doit("blah");
          }
          sub doit{local($_) = shift; print "Got $_ "}

          # perl4 prints: blah blah blah
          # perl5 prints: infinite loop blah...

   * Regular Expression

     Currently, if you use the `m//o' qualifier on a regular expression within an anonymous sub, all closures generated from that anonymous sub will use the regular expression as it was compiled when it was used the very first time in any such closure.  For instance, if you say

          sub build_match {
              my($left,$right) = @_;
              return sub { $_[0] =~ /$left stuff $right/o; };
          }

     build_match() will always return a sub which matches the contents of $left and $right as they were the first time that build_match() was called, not as they are in the current call.

     This is probably a bug, and may change in future versions of Perl.

   * Regular Expression

     If no parentheses are used in a match, Perl4 sets $+ to the whole match, just like $&.  Perl5 does not.

          "abcdef" =~ /b.*e/;
          print "\$+ = $+\n";

          # perl4 prints: bcde
          # perl5 prints:

   * Regular Expression

     substitution now returns the null string if it fails

          $string = "test";
          $value = ($string =~ s/foo//);
          print $value, "\n";

          # perl4 prints: 0
          # perl5 prints:

     Also see `Numerical Traps' in this node for another example of this new feature.
   * Regular Expression

     `s`lhs`rhs`' (using backticks) is now a normal substitution, with no backtick expansion

          $string = "";
          $string =~ s`^`hostname`;
          print $string, "\n";

          # perl4 prints: <the local hostname>
          # perl5 prints: hostname

   * Regular Expression

     Stricter parsing of variables used in regular expressions

          s/^([^$grpc]*$grpc[$opt$plus$rep]?)//o;

          # perl4: compiles w/o error
          # perl5: compile error: Scalar found where operator expected ..., near "$opt$plus"

     An added component of this example, apparently from the same script, is the actual value of the s'd string after the substitution.  `[$opt]' is a character class in perl4 and an array subscript in perl5

          $grpc = 'a';
          $opt  = 'r';
          $_ = 'bar';
          s/^([^$grpc]*$grpc[$opt]?)/foo/;
          print ;

          # perl4 prints: foo
          # perl5 prints: foobar

   * Regular Expression

     Under perl5, `m?x?' matches only once, like `?x?'.  Under perl4, it matched repeatedly, like `/x/' or `m!x!'.

          $test = "once";
          sub match { $test =~ m?once?; }
          &match();
          if( &match() ) {
              # m?x? matches more than once
              print "perl4\n";
          } else {
              # m?x? matches only once
              print "perl5\n";
          }

          # perl4 prints: perl4
          # perl5 prints: perl5

Subroutine, Signal, Sorting Traps
---------------------------------

The general group of Perl4-to-Perl5 traps having to do with Signals, Sorting, and their related subroutines, as well as general subroutine traps.  Includes some OS-Specific traps.

   * (Signals)

     Barewords that used to look like strings to Perl will now look like subroutine calls if a subroutine by that name is defined before the compiler sees them.

          sub SeeYa { warn"Hasta la vista, baby!" }
          $SIG{'TERM'} = SeeYa;
          print "SIGTERM is now $SIG{'TERM'}\n";

          # perl4 prints: SIGTERM is now main'SeeYa
          # perl5 prints: SIGTERM is now main::1

     Use -w to catch this one

   * (Sort Subroutine)

     reverse is no longer allowed as the name of a sort subroutine.

          sub reverse{ print "yup "; $a <=> $b }
          print sort reverse a,b,c;

          # perl4 prints: yup yup yup yup abc
          # perl5 prints: abc

   * warn() won't let you specify a filehandle.
     Although it _always_ printed to STDERR, warn() would let you specify a filehandle in perl4.  With perl5 it does not.

          warn STDERR "Foo!";

          # perl4 prints: Foo!
          # perl5 prints: String found where operator expected

OS Traps
--------

   * (SysV)

     Under HPUX, and some other SysV OSes, one had to reset any signal handler, within the signal handler function, each time a signal was handled with perl4.  With perl5, the reset is now done correctly.  Any code relying on the handler _not_ being reset will have to be reworked.

     Since version 5.002, Perl uses sigaction() under SysV.

          sub gotit {
              print "Got @_... ";
          }
          $SIG{'INT'} = 'gotit';

          $| = 1;
          $pid = fork;
          if ($pid) {
              kill('INT', $pid);
              sleep(1);
              kill('INT', $pid);
          } else {
              while (1) {sleep(10);}
          }

          # perl4 (HPUX) prints: Got INT...
          # perl5 (HPUX) prints: Got INT... Got INT...

   * (SysV)

     Under SysV OSes, `seek()' on a file opened to append (`>>') now does the right thing w.r.t. the fopen() manpage.  In particular, when a file is opened for append, it is impossible to overwrite information already in the file.

          open(TEST,">>seek.test");
          $start = tell TEST ;
          foreach(1 .. 9){
              print TEST "$_ ";
          }
          $end = tell TEST ;
          seek(TEST,$start,0);
          print TEST "18 characters here";

          # perl4 (solaris) seek.test has: 18 characters here
          # perl5 (solaris) seek.test has: 1 2 3 4 5 6 7 8 9 18 characters here

Interpolation Traps
-------------------

Perl4-to-Perl5 traps having to do with how things get interpolated within certain expressions, statements, contexts, or whatever.

   * Interpolation

     @ now always interpolates an array in double-quotish strings.

          print "To: someone@somewhere.com\n";

          # perl4 prints: To: someone@somewhere.com
          # perl5 errors: In string, @somewhere now must be written as \@somewhere

   * Interpolation

     Double-quoted strings may no longer end with an unescaped $ or @.
          $foo = "foo$";
          $bar = "bar@";
          print "foo is $foo, bar is $bar\n";

          # perl4 prints: foo is foo$, bar is bar@
          # perl5 errors: Final $ should be \$ or $name

     Note: perl5 DOES NOT error on the terminating @ in $bar

   * Interpolation

     Perl now sometimes evaluates arbitrary expressions inside braces that occur within double quotes (usually when the opening brace is preceded by `$' or `@').

          @www = "buz";
          $foo = "foo";
          $bar = "bar";
          sub foo { return "bar" };
          print "|@{w.w.w}|${main'foo}|";

          # perl4 prints: |@{w.w.w}|foo|
          # perl5 prints: |buz|bar|

     Note that you can `use strict;' to ward off such trappiness under perl5.

   * Interpolation

     The construct "this is $$x" used to interpolate the pid at that point, but now apparently tries to dereference $x.  $$ by itself still works fine, however.

          print "this is $$x\n";

          # perl4 prints: this is XXXx   (XXX is the current pid)
          # perl5 prints: this is

   * Interpolation

     Creation of hashes on the fly with `eval "EXPR"' now requires either both `$''s to be protected in the specification of the hash name, or both curlies to be protected.  If both curlies are protected, the result will be compatible with perl4 and perl5.  This is a very common practice, and should be changed to use the block form of `eval{}' if possible.

          $hashname = "foobar";
          $key = "baz";
          $value = 1234;
          eval "\$$hashname{'$key'} = q|$value|";
          (defined($foobar{'baz'})) ?  (print "Yup") : (print "Nope");

          # perl4 prints: Yup
          # perl5 prints: Nope

     Changing

          eval "\$$hashname{'$key'} = q|$value|";

     to

          eval "\$\$hashname{'$key'} = q|$value|";

     causes the following result:

          # perl4 prints: Nope
          # perl5 prints: Yup

     or, changing to

          eval "\$$hashname\{'$key'\} = q|$value|";

     causes the following result:

          # perl4 prints: Yup
          # perl5 prints: Yup
          # and is compatible for both versions

   * Interpolation

     perl4 programs which unconsciously rely on the bugs in earlier perl versions.
          perl -e '$bar=q/not/; print "This is $foo{$bar} perl5"'

          # perl4 prints: This is not perl5
          # perl5 prints: This is perl5

   * Interpolation

     You also have to be careful about array references.

          print "$foo{"

          # perl 4 prints: {
          # perl 5 prints: syntax error

   * Interpolation

     Similarly, watch out for:

          $foo = "array";
          print "\$$foo{bar}\n";

          # perl4 prints: $array{bar}
          # perl5 prints: $

     Perl 5 is looking for `$array{bar}' which doesn't exist, but perl 4 is happy just to expand $foo to "array" by itself.  Watch out for this especially in eval's.

   * Interpolation

     `qq()' string passed to eval

          eval qq(
              foreach \$y (keys %\$x\) {
                  \$count++;
              }
          );

          # perl4 runs this ok
          # perl5 prints: Can't find string terminator ")"

DBM Traps
---------

General DBM traps.

   * DBM

     Existing dbm databases created under perl4 (or any other dbm/ndbm tool) may cause the same script, run under perl5, to fail.  The build of perl5 must have been linked with the same dbm/ndbm as the default for `dbmopen()' to function properly without tie'ing to an extension dbm implementation.

          dbmopen (%dbm, "file", undef);
          print "ok\n";

          # perl4 prints: ok
          # perl5 prints: ok (IFF linked with -ldbm or -lndbm)

   * DBM

     Existing dbm databases created under perl4 (or any other dbm/ndbm tool) may cause the same script, run under perl5, to fail.  The error generated when exceeding the limit on the key/value size will cause perl5 to exit immediately.

          dbmopen(DB, "testdb",0600) || die "couldn't open db! $!";
          $DB{'trap'} = "x" x 1024;  # value too large for most dbm/ndbm
          print "YUP\n";

          # perl4 prints:
          # dbm store returned -1, errno 28, key "trap" at - line 3.
          # YUP

          # perl5 prints:
          # dbm store returned -1, errno 28, key "trap" at - line 3.

Unclassified Traps
------------------

Everything else.
   * require/do trap using returned value

     If the file doit.pl has:

          sub foo {
              $rc = do "./do.pl";
              return 8;
          }
          print &foo, "\n";

     And the do.pl file has the following single line:

          return 3;

     Running doit.pl gives the following:

          # perl 4 prints: 3 (aborts the subroutine early)
          # perl 5 prints: 8

     Same behavior if you replace do with require.

   * split on empty string with LIMIT specified

          $string = '';
          @list = split(/foo/, $string, 2);

     Perl4 returns a one element list containing the empty string but Perl5 returns an empty list.

As always, if any of these are ever officially declared as bugs, they'll be fixed and removed.

File: perl.info,  Node: perlunicode,  Next: perllocale,  Prev: perlform,  Up: Top

Unicode support in Perl
***********************

NAME
====

perlunicode - Unicode support in Perl

DESCRIPTION
===========

Important Caveat
----------------

WARNING: The implementation of Unicode support in Perl is incomplete.  The following areas need further work.

Input and Output Disciplines
     There is currently no easy way to mark data read from a file or other external source as being utf8.  This will be one of the major areas of focus in the near future.

Regular Expressions
     The existing regular expression compiler does not produce polymorphic opcodes.  This means that the determination on whether to match Unicode characters is made when the pattern is compiled, based on whether the pattern contains Unicode characters, and not when the matching happens at run time.  This needs to be changed to adaptively match Unicode if the string to be matched is Unicode.

`use utf8' still needed to enable a few features
     The utf8 pragma implements the tables used for Unicode support.  These tables are automatically loaded on demand, so the utf8 pragma need not normally be used.

     However, as a compatibility measure, this pragma must be explicitly used to enable recognition of UTF-8 encoded literals and identifiers in the source text.
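The last caveat, that `use utf8' is what makes the parser recognize UTF-8 encoded literals, can be observed with a short sketch.  This example is not part of the original text.  It feeds the same two UTF-8 bytes to the parser twice through string `eval', once without the pragma and once with it.  It assumes a Perl in which a string `eval' of a byte string honours `use utf8', which holds when the `unicode_eval' feature (enabled by `use v5.16' or later) is off:

```perl
use strict;
use warnings;

# "\xC3\xA9" is a two-byte string: the UTF-8 encoding of U+00E9 (e with acute).
my $encoded = "\xC3\xA9";

# Compile the same literal twice through string eval.  Without 'use utf8'
# the parser sees two separate bytes; with it, the bytes are decoded as UTF-8
# into a single character.
my $without_utf8 = eval qq{ length "$encoded" };
defined $without_utf8 or die $@;

my $with_utf8 = eval qq{ use utf8; length "$encoded" };
defined $with_utf8 or die $@;

print "without use utf8: $without_utf8, with use utf8: $with_utf8\n";
```

Under the assumption above, the literal has length 2 without the pragma and length 1 with it.  String `eval' is used only so the sketch can show both behaviours in a single file.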
Byte and Character semantics
----------------------------

Beginning with version 5.6, Perl uses logically wide characters to represent strings internally. This internal representation of strings uses the UTF-8 encoding.

In the future, Perl-level operations can be expected to work with characters rather than bytes, in general.

However, as strictly an interim compatibility measure, Perl v5.6 aims to provide a safe migration path from byte semantics to character semantics for programs. For operations where Perl can unambiguously decide that the input data is characters, Perl now switches to character semantics. For operations where this determination cannot be made without additional information from the user, Perl decides in favor of compatibility and chooses to use byte semantics.

This behavior preserves compatibility with earlier versions of Perl, which used byte semantics in all Perl operations, but only as long as none of the program's inputs are marked as being a source of Unicode character data. Such data may come from filehandles, from calls to external programs, from information provided by the system (such as %ENV), or from literals and constants in the source text.

If the -C command-line switch is used (or the ${^WIDE_SYSTEM_CALLS} global flag is set to 1), all system calls will use the corresponding wide-character APIs. This is currently only implemented on Windows.

Regardless of the above, the bytes pragma can always be used to force byte semantics in a particular lexical scope. See *Note Bytes: (pm.info)bytes,.

The utf8 pragma is primarily a compatibility device that enables recognition of UTF-8 in literals encountered by the parser. It may also be used for enabling some of the more experimental Unicode support features. Note that this pragma is only required until a future version of Perl in which character semantics will become the default. This pragma may then become a no-op. See *Note Utf8: (pm.info)utf8,.
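The following sketch shows the bytes pragma, mentioned above, forcing byte semantics in a single lexical scope. It assumes Perl 5.6 or later, and the variable names are illustrative.

The same string reports one character under the default character semantics. Inside the `use bytes' block it reports three, the bytes of its internal UTF-8 encoding.

```perl
use strict;
use warnings;

# U+263A is a single character above 255, so Perl stores this
# string internally in UTF-8.
my $smiley = "\x{263A}";

# Outside the bytes pragma, length() counts characters.
my $char_length = length $smiley;

# Inside a block with "use bytes", length() counts the bytes of the
# internal representation. The pragma is lexically scoped, so it stops
# applying at the closing brace.
my $byte_length = do {
    use bytes;
    length $smiley;
};

print "characters: $char_length\n";
print "bytes:      $byte_length\n";
```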
Unless mentioned otherwise, Perl operators will use character semantics when they are dealing with Unicode data, and byte semantics otherwise. Thus, character semantics for these operations apply transparently. If the input data came from a Unicode source (for example, by adding a character encoding discipline to the filehandle from which it came, or from a literal UTF-8 string constant in the program), character semantics apply; otherwise, byte semantics are in effect. To force byte semantics on Unicode data, the bytes pragma should be used.

Under character semantics, many operations that formerly operated on bytes change to operating on characters. For ASCII data this makes no difference, because UTF-8 stores ASCII in single bytes. For any character greater than `chr(127)', however, the character may be stored in a sequence of two or more bytes, all of which have the high bit set. By and large the user need not worry about this, because Perl hides it. A character in Perl is logically just a number ranging from 0 to 2**32 or so. Larger characters encode to longer sequences of bytes internally, but again, this is just an internal detail which is hidden at the Perl level.

Effects of character semantics
------------------------------

Character semantics have the following effects:

   * Strings and patterns may contain characters that have an ordinal value larger than 255.

     Presuming you use a Unicode editor to edit your program, such characters will typically occur directly within the literal strings as UTF-8 characters. You can also specify a particular character with an extension of the `\x' notation: put the hexadecimal code within curly braces after the `\x'. For instance, a Unicode smiley face is `\x{263A}'.
     A character in the Latin-1 range (128..255) should be written `\x{ab}' rather than `\xab'. The former will turn into a two-byte UTF-8 code, while the latter will continue to be interpreted as generating an 8-bit byte rather than a character. In fact, if the `use warnings' pragma or the -w switch is turned on, it will produce a warning that you might be generating invalid UTF-8.

   * Identifiers within the Perl script may contain Unicode alphanumeric characters, including ideographs. (You are currently on your own when it comes to using the canonical forms of characters; Perl doesn't (yet) attempt to canonicalize variable names for you.)

   * Regular expressions match characters instead of bytes. For instance, "." matches a character instead of a byte. (However, the `\C' pattern is provided to force a match of a single byte ("`char'" in C, hence `\C').)

   * Character classes in regular expressions match characters instead of bytes, and match against the character properties specified in the Unicode properties database. So `\w' can be used to match an ideograph, for instance.

   * Named Unicode properties and block ranges may be used as character classes via the new `\p{}' (matches property) and `\P{}' (doesn't match property) constructs. For instance, `\p{Lu}' matches any character with the Unicode uppercase property, while `\p{M}' matches any mark character. Single-letter properties may omit the braces, so that can also be written `\pM'. Many predefined character classes are available, such as `\p{IsMirrored}' and `\p{InTibetan}'.

   * The special pattern `\X' matches any extended Unicode sequence (a "combining character sequence" in Standardese), where the first character is a base character and subsequent characters are mark characters that apply to the base character. It is equivalent to `(?:\PM\pM*)'.

   * The tr/// operator translates characters instead of bytes. It can also be forced to translate between 8-bit codes and UTF-8.
     For instance, if you know your input is in Latin-1, you can say:

          while (<>) {
              tr/\0-\xff//CU;        # latin1 char to utf8
              ...
          }

     Similarly, you could translate your output with

          tr/\0-\x{ff}//UC;          # utf8 to latin1 char

     No, s/// doesn't take /U or /C (yet?).

   * Case translation operators use the Unicode case translation tables when provided character input. Note that `uc()' translates to uppercase, while `ucfirst()' translates to titlecase (for languages that make the distinction). Naturally the corresponding backslash sequences have the same semantics.

   * Most operators that deal with positions or lengths in the string will automatically switch to using character positions, including `chop()', `substr()', `pos()', `index()', `rindex()', `sprintf()', `write()', and `length()'. Operators that specifically don't switch include `vec()', `pack()', and `unpack()'. Operators that really don't care include `chomp()', as well as any other operator that treats a string as a bucket of bits, such as `sort()', and the operators dealing with filenames.

   * The `pack()'/`unpack()' letters "c" and "C" do not change, since they're often used for byte-oriented formats. (Again, think "`char'" in the C language.) However, there is a new "U" specifier that will convert between UTF-8 characters and integers. (It works outside of the utf8 pragma too.)

   * The `chr()' and `ord()' functions work on characters. This is like `pack("U")' and `unpack("U")', not like `pack("C")' and `unpack("C")'. In fact, the latter are how you now emulate byte-oriented `chr()' and `ord()' under utf8.

   * And finally, `scalar reverse()' reverses by character rather than by byte.

Character encodings for input and output
----------------------------------------

[XXX: This feature is not yet implemented.]

CAVEATS
=======

As of yet, there is no method for automatically coercing input and output to some encoding other than UTF-8. This is planned in the near future, however.
Whether an arbitrary piece of data will be treated as "characters" or "bytes" by internal operations cannot be divined at the current time.

Use of locales with utf8 may lead to odd results. Currently there is some attempt to apply 8-bit locale info to characters in the range 0..255, but this is demonstrably incorrect for locales that use characters above that range (when mapped into Unicode). It will also tend to run slower. Avoidance of locales is strongly encouraged.

SEE ALSO
========

*Note Bytes: (pm.info)bytes,, *Note Utf8: (pm.info)utf8,, `"${^WIDE_SYSTEM_CALLS}"', *Note Perlvar: perlvar,

File: perl.info, Node: perlvar, Next: perlsub, Prev: perlopentut, Up: Top

Perl predefined variables
*************************

NAME
====

perlvar - Perl predefined variables

DESCRIPTION
===========

Predefined Names
----------------

The following names have special meaning to Perl. Most punctuation names have reasonable mnemonics, or analogs in the shells. Nevertheless, if you wish to use long variable names, you need only say

          use English;

at the top of your program. This will alias all the short names to the long names in the current package. Some even have medium names, generally borrowed from *awk*.

If you don't mind the performance hit, variables that depend on the currently selected filehandle may instead be set by calling an appropriate object method on the IO::Handle object. (Summary lines below for this contain the word HANDLE.) First you must say

          use IO::Handle;

after which you may use either

          method HANDLE EXPR

or more safely,

          HANDLE->method(EXPR)

Each method returns the old value of the IO::Handle attribute. The methods each take an optional EXPR, which, if supplied, specifies the new value for the IO::Handle attribute in question. If not supplied, most methods do nothing to the current value, except for autoflush(), which will assume a 1 for you, just to be different.
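As a sketch of the `HANDLE->method(EXPR)' style just described, the following example uses the standard IO::Handle module. It turns on autoflush for STDOUT and sets the output field separator, then restores both, using the old values that each method returns.

One assumption about your installed IO::Handle: current versions expect the truly global settings, such as output_field_separator, to be called on the class rather than on a particular handle.

```perl
use strict;
use warnings;
use IO::Handle;

# Sets $| for STDOUT to 1 and returns its previous value.
my $old_autoflush = STDOUT->autoflush(1);
my $now_autoflush = $|;      # STDOUT is the selected handle, so this reads its flag

# $, is global, so the method is called on the class.
my $old_ofs = IO::Handle->output_field_separator(':');
my $now_ofs = $,;

# Restore the previous settings using the returned old values.
STDOUT->autoflush($old_autoflush);
IO::Handle->output_field_separator($old_ofs);
```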
Because loading in the IO::Handle class is an expensive operation, you should learn how to use the regular built-in variables.

A few of these variables are considered "read-only". This means that if you try to assign to this variable, either directly or indirectly through a reference, you'll raise a run-time exception.

The following list is ordered by scalar variables first, then the arrays, then the hashes.

$ARG

$_
     The default input and pattern-searching space. The following pairs are equivalent:

          while (<>) {...}    # equivalent only in while!
          while (defined($_ = <>)) {...}

          /^Subject:/
          $_ =~ /^Subject:/

          tr/a-z/A-Z/
          $_ =~ tr/a-z/A-Z/

          chomp
          chomp($_)

     Here are the places where Perl will assume $_ even if you don't use it:

        * Various unary functions, including functions like ord() and int(), as well as all the file tests (-f, -d) except for -t, which defaults to STDIN.

        * Various list functions like print() and unlink().

        * The pattern matching operations m//, s///, and tr/// when used without an `=~' operator.

        * The default iterator variable in a foreach loop if no other variable is supplied.

        * The implicit iterator variable in the grep() and map() functions.

        * The default place to put an input record when a `<FH>' operation's result is tested by itself as the sole criterion of a while test. Outside a while test, this will not happen.

     (Mnemonic: underline is understood in certain operations.)

$<digits>
     Contains the subpattern from the corresponding set of capturing parentheses from the last pattern match, not counting patterns matched in nested blocks that have been exited already. (Mnemonic: like \digits.) These variables are all read-only and dynamically scoped to the current BLOCK.

$MATCH

$&
     The string matched by the last successful pattern match (not counting any matches hidden within a BLOCK or eval() enclosed by the current BLOCK). (Mnemonic: like & in some editors.) This variable is read-only and dynamically scoped to the current BLOCK.
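The dynamic scoping of $<digits> and $& can be seen in a short sketch. A successful match inside a nested block sets $1 only until that block is exited. After that, the enclosing block sees the outer match's values again. The strings and patterns are arbitrary examples.

```perl
use strict;
use warnings;

my ($inner_capture, $outer_capture, $outer_match);

if ("outer text" =~ /(out)er/) {
    {
        # This successful match is current only inside this nested block.
        $inner_capture = $1 if "inner text" =~ /(inn)er/;
    }
    # Back in the enclosing block, the outer match is current again.
    $outer_capture = $1;
    $outer_match   = $&;
}

print "inside nested block: $inner_capture\n";
print "after leaving it:    $outer_capture ($outer_match)\n";
```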
     The use of this variable anywhere in a program imposes a considerable performance penalty on all regular expression matches. See `BUGS' in this node.

$PREMATCH

$`
     The string preceding whatever was matched by the last successful pattern match (not counting any matches hidden within a BLOCK or eval enclosed by the current BLOCK). (Mnemonic: ``' often precedes a quoted string.) This variable is read-only.

     The use of this variable anywhere in a program imposes a considerable performance penalty on all regular expression matches. See `BUGS' in this node.

$POSTMATCH

$'
     The string following whatever was matched by the last successful pattern match (not counting any matches hidden within a BLOCK or eval() enclosed by the current BLOCK). (Mnemonic: `'' often follows a quoted string.) Example:

          $_ = 'abcdefghi';
          /def/;
          print "$`:$&:$'\n";         # prints abc:def:ghi

     This variable is read-only and dynamically scoped to the current BLOCK.

     The use of this variable anywhere in a program imposes a considerable performance penalty on all regular expression matches. See `BUGS' in this node.

$LAST_PAREN_MATCH

$+
     The last bracket matched by the last search pattern. This is useful if you don't know which one of a set of alternative patterns matched. For example:

          /Version: (.*)|Revision: (.*)/ && ($rev = $+);

     (Mnemonic: be positive and forward looking.) This variable is read-only and dynamically scoped to the current BLOCK.

@+
     This array holds the offsets of the ends of the last successful submatches in the currently active dynamic scope. `$+[0]' is the offset into the string of the end of the entire match. This is the same value as what the pos function returns when called on the variable that was matched against. The nth element of this array holds the offset of the nth submatch, so `$+[1]' is the offset past where $1 ends, `$+[2]' the offset past where $2 ends, and so on. You can use `$#+' to determine how many subgroups were in the last successful match.
     See the examples given for the `@-' variable.

$MULTILINE_MATCHING

$*
     Set to 1 to do multi-line matching within a string, or to 0 to tell Perl that it can assume that strings contain a single line, for the purpose of optimizing pattern matches. Pattern matches on strings containing multiple newlines can produce confusing results when $* is 0. Default is 0. (Mnemonic: * matches multiple things.) This variable influences the interpretation of only `^' and `$'. A literal newline can be searched for even when `$* == 0'.

     Use of $* is deprecated in modern Perl, supplanted by the `/s' and `/m' modifiers on pattern matching.

input_line_number HANDLE EXPR

$INPUT_LINE_NUMBER

$NR

$.
     The current input record number for the last file handle from which you just read() (or called a seek or tell on). The value may be different from the actual physical line number in the file, depending on what notion of "line" is in effect; see $/ on how to change that. An explicit close on a filehandle resets the line number. Because `<>' never does an explicit close, line numbers increase across ARGV files (but see examples in `eof', *Note Perlfunc: perlfunc,). Consider this variable read-only: setting it does not reposition the seek pointer; you'll have to do that on your own. Localizing $. has the effect of also localizing Perl's notion of "the last read filehandle". (Mnemonic: many programs use "." to mean the current line number.)

input_record_separator HANDLE EXPR

$INPUT_RECORD_SEPARATOR

$RS

$/
     The input record separator, newline by default. This influences Perl's idea of what a "line" is. Works like *awk*'s RS variable, including treating empty lines as a terminator if set to the null string. (An empty line cannot contain any spaces or tabs.) You may set it to a multi-character string to match a multi-character terminator, or to undef to read through the end of file. Setting it to `"\n\n"' means something slightly different than setting it to "", if the file contains consecutive empty lines.
     Setting it to "" will treat two or more consecutive empty lines as a single empty line. Setting it to `"\n\n"' will blindly assume that the next input character belongs to the next paragraph, even if it's a newline. (Mnemonic: / delimits line boundaries when quoting poetry.)

          undef $/;           # enable "slurp" mode
          $_ = <FH>;          # whole file now here
          s/\n[ \t]+/ /g;

     Remember: the value of $/ is a string, not a regex. *awk* has to be better for something. :-)

     Setting $/ to a reference to an integer, a scalar containing an integer, or a scalar that's convertible to an integer will attempt to read records instead of lines, with the maximum record size being the referenced integer. So this:

          $/ = \32768; # or \"32768", or \$var_containing_32768
          open(FILE, $myfile);
          $_ = <FILE>;

     will read a record of no more than 32768 bytes from FILE. If you're not reading from a record-oriented file (or your OS doesn't have record-oriented files), then you'll likely get a full chunk of data with every read. If a record is larger than the record size you've set, you'll get the record back in pieces.

     On VMS, record reads are done with the equivalent of sysread, so it's best not to mix record and non-record reads on the same file. (This is unlikely to be a problem, because any file you'd want to read in record mode is probably unusable in line mode.) Non-VMS systems do normal I/O, so it's safe to mix record and non-record reads of a file.

     See also `"Newlines"', *Note Perlport: perlport,. Also see $..

autoflush HANDLE EXPR

$OUTPUT_AUTOFLUSH

$|
     If set to nonzero, forces a flush right away and after every write or print on the currently selected output channel. Default is 0 (regardless of whether the channel is really buffered by the system or not; $| tells you only whether you've asked Perl explicitly to flush after each write). STDOUT will typically be line buffered if output is to the terminal and block buffered otherwise.
     Setting this variable is useful primarily when you are outputting to a pipe or socket, such as when you are running a Perl program under rsh and want to see the output as it's happening. This has no effect on input buffering. See `getc', *Note Perlfunc: perlfunc, for that. (Mnemonic: when you want your pipes to be piping hot.)

output_field_separator HANDLE EXPR

$OUTPUT_FIELD_SEPARATOR

$OFS

$,
     The output field separator for the print operator. Ordinarily the print operator simply prints out its arguments without further adornment. To get behavior more like *awk*, set this variable as you would set *awk*'s OFS variable to specify what is printed between fields. (Mnemonic: what is printed when there is a "," in your print statement.)

output_record_separator HANDLE EXPR

$OUTPUT_RECORD_SEPARATOR

$ORS

$\
     The output record separator for the print operator. Ordinarily the print operator simply prints out its arguments as is, with no trailing newline or other end-of-record string added. To get behavior more like *awk*, set this variable as you would set *awk*'s ORS variable to specify what is printed at the end of the print. (Mnemonic: you set $\ instead of adding "\n" at the end of the print. Also, it's just like $/, but it's what you get "back" from Perl.)

$LIST_SEPARATOR

$"
     This is like $, except that it applies to array and slice values interpolated into a double-quoted string (or similar interpreted string). Default is a space. (Mnemonic: obvious, I think.)

$SUBSCRIPT_SEPARATOR

$SUBSEP

$;
     The subscript separator for multidimensional array emulation. If you refer to a hash element as

          $foo{$a,$b,$c}

     it really means

          $foo{join($;, $a, $b, $c)}

     But don't put

          @foo{$a,$b,$c}      # a slice--note the @

     which means

          ($foo{$a},$foo{$b},$foo{$c})

     Default is "\034", the same as SUBSEP in *awk*. If your keys contain binary data there might not be any safe value for $;. (Mnemonic: comma (the syntactic subscript separator) is a semi-semicolon.
     Yeah, I know, it's pretty lame, but $, is already taken for something more important.)

     Consider using "real" multidimensional arrays as described in *Note Perllol: perllol,.

$OFMT

$#
     The output format for printed numbers. This variable is a half-hearted attempt to emulate *awk*'s OFMT variable. There are times, however, when *awk* and Perl have differing notions of what counts as numeric. The initial value is "%.ng", where n is the value of the macro DBL_DIG from your system's `float.h'. This is different from *awk*'s default OFMT setting of "%.6g", so you need to set $# explicitly to get *awk*'s value. (Mnemonic: # is the number sign.)

     Use of $# is deprecated.

format_page_number HANDLE EXPR

$FORMAT_PAGE_NUMBER

$%
     The current page number of the currently selected output channel. Used with formats. (Mnemonic: % is page number in *nroff*.)

format_lines_per_page HANDLE EXPR

$FORMAT_LINES_PER_PAGE

$=
     The current page length (printable lines) of the currently selected output channel. Default is 60. Used with formats. (Mnemonic: = has horizontal lines.)

format_lines_left HANDLE EXPR

$FORMAT_LINES_LEFT

$-
     The number of lines left on the page of the currently selected output channel. Used with formats. (Mnemonic: lines_on_page - lines_printed.)

@-
     `$-[0]' is the offset of the start of the last successful match. `$-[n]' is the offset of the start of the substring matched by the n-th subpattern, or undef if the subpattern did not match.

     Thus after a match against $_, $& coincides with `substr $_, $-[0], $+[0] - $-[0]'. Similarly, $n coincides with `substr $_, $-[n], $+[n] - $-[n]' if `$-[n]' is defined, and $+ coincides with `substr $_, $-[$#-], $+[$#-] - $-[$#-]'. One can use `$#-' to find the last matched subgroup in the last successful match. Contrast with `$#+', the number of subgroups in the regular expression. Compare with `@+'.

     This array holds the offsets of the beginnings of the last successful submatches in the currently active dynamic scope.
     `$-[0]' is the offset into the string of the beginning of the entire match. The nth element of this array holds the offset of the nth submatch, so `$-[1]' is the offset where $1 begins, `$-[2]' the offset where $2 begins, and so on. You can use `$#-' to determine how many subgroups were in the last successful match. Compare with the `@+' variable.

     After a match against some variable $var:

          $` is the same as substr($var, 0, $-[0])
          $& is the same as substr($var, $-[0], $+[0] - $-[0])
          $' is the same as substr($var, $+[0])
          $1 is the same as substr($var, $-[1], $+[1] - $-[1])
          $2 is the same as substr($var, $-[2], $+[2] - $-[2])
          $3 is the same as substr($var, $-[3], $+[3] - $-[3])

format_name HANDLE EXPR

$FORMAT_NAME

$~
     The name of the current report format for the currently selected output channel. Default is the name of the filehandle. (Mnemonic: brother to $^.)

format_top_name HANDLE EXPR

$FORMAT_TOP_NAME

$^
     The name of the current top-of-page format for the currently selected output channel. Default is the name of the filehandle with _TOP appended. (Mnemonic: points to top of page.)

format_line_break_characters HANDLE EXPR

$FORMAT_LINE_BREAK_CHARACTERS

$:
     The current set of characters after which a string may be broken to fill continuation fields (starting with ^) in a format. Default is " \n-", to break on whitespace or hyphens. (Mnemonic: a "colon" in poetry is a part of a line.)

format_formfeed HANDLE EXPR

$FORMAT_FORMFEED

$^L
     What formats output as a form feed. Default is \f.

$ACCUMULATOR

$^A
     The current value of the write() accumulator for format() lines. A format contains formline() calls that put their result into $^A. After calling its format, write() prints out the contents of $^A and empties it. So you never really see the contents of $^A unless you call formline() yourself and then look at it. See *Note Perlform: perlform, and `formline()', *Note Perlfunc: perlfunc,.

$CHILD_ERROR

$?
     The status returned by the last pipe close, backtick command, successful call to wait() or waitpid(), or from the system() operator. This is just the 16-bit status word returned by the wait() system call (or else is made up to look like it). Thus, the exit value of the subprocess is really (`$? >> 8'), `$? & 127' gives which signal, if any, the process died from, and `$? & 128' reports whether there was a core dump. (Mnemonic: similar to *sh* and *ksh*.)

     Additionally, if the `h_errno' variable is supported in C, its value is returned via $? if any `gethost*()' function fails.

     If you have installed a signal handler for `SIGCHLD', the value of $? will usually be wrong outside that handler.

     Inside an END subroutine $? contains the value that is going to be given to exit(). You can modify $? in an END subroutine to change the exit status of your program. For example:

          END {
              $? = 1 if $? == 255;  # die would make it 255
          }

     Under VMS, the pragma `use vmsish 'status'' makes $? reflect the actual VMS exit status, instead of the default emulation of POSIX status.

     Also see `Error Indicators' in this node.

$OS_ERROR

$ERRNO

$!
     If used numerically, yields the current value of the C errno variable, with all the usual caveats. (This means that you shouldn't depend on the value of $! to be anything in particular unless you've gotten a specific error return indicating a system error.) If used as a string, yields the corresponding system error string. You can assign a number to $! to set errno if, for instance, you want `"$!"' to return the string for error n, or you want to set the exit value for the die() operator. (Mnemonic: What just went bang?)

     Also see `Error Indicators' in this node.

$EXTENDED_OS_ERROR

$^E
     Error information specific to the current operating system. At the moment, this differs from $! under only VMS, OS/2, and Win32 (and for MacPerl). On all other platforms, $^E is always just the same as $!.
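The following sketch shows how to read $? and $!. It assumes a system where $^X (the path of the running perl) can be run as a child process. The exit value 3 and the missing file name are arbitrary, hypothetical examples.

$? is decoded with the shift and masks given in its entry. $! is compared against ENOENT from the standard Errno module rather than against a message string, because the message text differs between systems.

```perl
use strict;
use warnings;
use Errno qw(ENOENT);

# Run a child perl that exits with status 3; system() stores the wait status in $?.
system($^X, '-e', 'exit 3');
my $exit_value = $? >> 8;      # high byte: the child's exit value
my $signal     = $? & 127;     # low 7 bits: signal that killed it, if any
my $core_dump  = $? & 128;     # bit 7: whether a core dump was produced

# A failing open() sets errno; $! reads it back as a number or as a string.
my $missing = 'no-such-file-' . $$ . '.txt';   # hypothetical name that should not exist
my $opened  = open(my $fh, '<', $missing);
my $errno   = $! + 0;          # numeric form: the C errno value
my $message = "$!";            # string form: the system's error text

print "child exit value: $exit_value (signal $signal)\n";
print "open failed: $message\n" unless $opened;
```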
     Under VMS, $^E provides the VMS status value from the last system error. This is more specific information about the last system error than that provided by $!. This is particularly important when $! is set to *EVMSERR*.

     Under OS/2, $^E is set to the error code of the last call to the OS/2 API, either via CRT or directly from perl.

     Under Win32, $^E always returns the last error information reported by the Win32 call `GetLastError()', which describes the last error from within the Win32 API. Most Win32-specific code will report errors via $^E. ANSI C and Unix-like calls set errno, and so most portable Perl code will report errors via $!.

     Caveats mentioned in the description of $! generally apply to $^E, also. (Mnemonic: Extra error explanation.)

     Also see `Error Indicators' in this node.

$EVAL_ERROR

$@
     The Perl syntax error message from the last eval() operator; it also holds the message passed to die() if the evaluated code died. If null, the last eval() parsed and executed correctly (although the operations you invoked may have failed in the normal fashion). (Mnemonic: Where was the syntax error "at"?)

     Warning messages are not collected in this variable. You can, however, set up a routine to process warnings by setting `$SIG{__WARN__}' as described below.

     Also see `Error Indicators' in this node.

$PROCESS_ID

$PID

$$
     The process number of the Perl running this script. You should consider this variable read-only, although it will be altered across fork() calls. (Mnemonic: same as shells.)

$REAL_USER_ID

$UID

$<
     The real uid of this process. (Mnemonic: it's the uid you came from, if you're running setuid.)

$EFFECTIVE_USER_ID

$EUID

$>
     The effective uid of this process. Example:

          $< = $>;            # set real to effective uid
          ($<,$>) = ($>,$<);  # swap real and effective uid

     (Mnemonic: it's the uid you went to, if you're running setuid.) $< and $> can be swapped only on machines supporting setreuid().

$REAL_GROUP_ID

$GID

$(
     The real gid of this process.
     If you are on a machine that supports membership in multiple groups simultaneously, gives a space-separated list of groups you are in. The first number is the one returned by getgid(), and the subsequent ones by getgroups(), one of which may be the same as the first number.

     However, a value assigned to $( must be a single number used to set the real gid. So the value given by $( should not be assigned back to $( without being forced numeric, such as by adding zero.

     (Mnemonic: parentheses are used to group things. The real gid is the group you left, if you're running setgid.)

$EFFECTIVE_GROUP_ID

$EGID

$)
     The effective gid of this process. If you are on a machine that supports membership in multiple groups simultaneously, gives a space-separated list of groups you are in. The first number is the one returned by getegid(), and the subsequent ones by getgroups(), one of which may be the same as the first number.

     Similarly, a value assigned to $) must also be a space-separated list of numbers. The first number sets the effective gid, and the rest (if any) are passed to setgroups(). To get the effect of an empty list for setgroups(), just repeat the new effective gid; that is, to force an effective gid of 5 and an effectively empty setgroups() list, say ` $) = "5 5" '.

     (Mnemonic: parentheses are used to group things. The effective gid is the group that's right for you, if you're running setgid.)

     $<, $>, $( and $) can be set only on machines that support the corresponding *set[re][ug]id()* routine. $( and $) can be swapped only on machines supporting setregid().

$PROGRAM_NAME

$0
     Contains the name of the program being executed. On some operating systems assigning to $0 modifies the argument area that the *ps* program sees. This is more useful as a way of indicating the current program state than it is for hiding the program you're running. (Mnemonic: same as *sh* and *ksh*.)
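$( and $) are space-separated lists when read as strings but single numbers when used numerically. The sketch below splits them apart. It only reads the variables, because assigning to them requires the privileges and platform support described above. On systems without supplementary groups, the lists may hold just one value.

```perl
use strict;
use warnings;

# In string context, $( and $) give the gid followed by the supplementary groups.
my ($real_gid, @real_groups)           = split ' ', $(;
my ($effective_gid, @effective_groups) = split ' ', $);

# In numeric context, each yields only the first number (getgid()/getegid()).
my $real_numeric      = $( + 0;
my $effective_numeric = $) + 0;

print "real gid $real_gid, groups: @real_groups\n";
print "effective gid $effective_gid, groups: @effective_groups\n";
```

Adding zero, as above, is also how to force the numeric value before assigning it back to $(.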
$[
     The index of the first element in an array, and of the first character in a substring. Default is 0, but you could theoretically set it to 1 to make Perl behave more like *awk* (or Fortran) when subscripting and when evaluating the index() and substr() functions. (Mnemonic: [ begins subscripts.)

     As of release 5 of Perl, assignment to $[ is treated as a compiler directive, and cannot influence the behavior of any other file. Its use is highly discouraged.

$]
     The version + patchlevel / 1000 of the Perl interpreter. This variable can be used to determine whether the Perl interpreter executing a script is in the right range of versions. (Mnemonic: Is this version of perl in the right bracket?) Example:

          warn "No checksumming!\n" if $] < 3.019;

     See also the documentation of `use VERSION' and `require VERSION' for a convenient way to fail if the running Perl interpreter is too old.

     The use of this variable is deprecated. The floating-point representation can sometimes lead to inaccurate numeric comparisons. See $^V for a more modern representation of the Perl version that allows accurate string comparisons.

$COMPILING

$^C
     The current value of the flag associated with the -c switch. Mainly of use with *-MO=...* to allow code to alter its behavior when being compiled, for example to AUTOLOAD at compile time rather than with the normal, deferred loading. See `perlcc' in this node. Setting `$^C = 1' is similar to calling `B::minus_c'.

$DEBUGGING

$^D
     The current value of the debugging flags. (Mnemonic: value of -D switch.)

$SYSTEM_FD_MAX

$^F
     The maximum system file descriptor, ordinarily 2. System file descriptors are passed to exec()ed processes, while higher file descriptors are not. Also, during an open(), system file descriptors are preserved even if the open() fails. (Ordinary file descriptors are closed before the open() is attempted.)
     The close-on-exec status of a file descriptor will be decided according to the value of $^F when the corresponding file, pipe, or socket was opened, not at the time of the exec().

$^H
     WARNING: This variable is strictly for internal use only. Its availability, behavior, and contents are subject to change without notice.

     This variable contains compile-time hints for the Perl interpreter. At the end of compilation of a BLOCK, the value of this variable is restored to the value it had when the interpreter started to compile the BLOCK.

     When perl begins to parse any block construct that provides a lexical scope (e.g., eval body, required file, subroutine body, loop body, or conditional block), the existing value of $^H is saved, but its value is left unchanged. When the compilation of the block is completed, it regains the saved value. Between the points where its value is saved and restored, code that executes within BEGIN blocks is free to change the value of $^H.

     This behavior provides the semantic of lexical scoping, and is used in, for instance, the `use strict' pragma.

     The contents should be an integer; different bits of it are used for different pragmatic flags. Here's an example:

          sub add_100 { $^H |= 0x100 }

          sub foo {
              BEGIN { add_100() }
              bar->baz($boon);
          }

     Consider what happens during execution of the BEGIN block. At this point the BEGIN block has already been compiled, but the body of foo() is still being compiled. The new value of $^H will therefore be visible only while the body of foo() is being compiled.

     Substitution of the above BEGIN block with:

          BEGIN { require strict; strict->import('vars') }

     demonstrates how `use strict 'vars'' is implemented. Here's a conditional version of the same lexical pragma:

          BEGIN { require strict; strict->import('vars') if $condition }

%^H
     WARNING: This variable is strictly for internal use only. Its availability, behavior, and contents are subject to change without notice.

     The %^H hash provides the same scoping semantic as $^H.
     This makes it useful for implementation of lexically scoped pragmas.

$INPLACE_EDIT
$^I
     The current value of the inplace-edit extension. Use undef to disable inplace editing. (Mnemonic: value of -i switch.)

$^M
     By default, running out of memory is an untrappable, fatal error. However, if suitably built, Perl can use the contents of $^M as an emergency memory pool after die()ing. Suppose that your Perl were compiled with -DPERL_EMERGENCY_SBRK and used Perl's malloc. Then

          $^M = 'a' x (1 << 16);

     would allocate a 64K buffer for use in an emergency. See the INSTALL file in the Perl distribution for information on how to enable this option. To discourage casual use of this advanced feature, there is no *Note English: (pm.info)English, long name for this variable.

$OSNAME
$^O
     The name of the operating system under which this copy of Perl was built, as determined during the configuration process. The value is identical to `$Config{'osname'}'. See also *Note Config: (pm.info)Config, and the -V command-line switch documented in *Note Perlrun: perlrun,.

$PERLDB
$^P
     The internal variable for debugging support. The meanings of the various bits are subject to change, but currently indicate:

     0x01    Debug subroutine enter/exit.
     0x02    Line-by-line debugging.
     0x04    Switch off optimizations.
     0x08    Preserve more data for future interactive inspections.
     0x10    Keep info about source lines on which a subroutine is defined.
     0x20    Start with single-step on.
     0x40    Use subroutine address instead of name when reporting.
     0x80    Report `goto &subroutine' as well.
     0x100   Provide informative "file" names for evals based on the place they were compiled.
     0x200   Provide informative names to anonymous subroutines based on the place they were compiled.

     Some bits may be relevant at compile-time only, some at run-time only. This is a new mechanism and the details may change.

$LAST_REGEXP_CODE_RESULT
$^R
     The result of evaluation of the last successful `(?{ code })' regular expression assertion (see *Note Perlre: perlre,).
     May be written to.

$EXCEPTIONS_BEING_CAUGHT
$^S
     Current state of the interpreter. Undefined if parsing of the current module/eval is not finished (may happen in $SIG{__DIE__} and $SIG{__WARN__} handlers). True if inside an eval(), otherwise false.

$BASETIME
$^T
     The time at which the program began running, in seconds since the epoch (beginning of 1970). The values returned by the -M, -A, and -C filetests are based on this value.

$PERL_VERSION
$^V
     The revision, version, and subversion of the Perl interpreter, represented as a string composed of characters with those ordinals. Thus in Perl v5.6.0 it equals `chr(5) . chr(6) . chr(0)' and will return true for `$^V eq v5.6.0'. Note that the characters in this string value can potentially be in the Unicode range.

     This can be used to determine whether the Perl interpreter executing a script is in the right range of versions. (Mnemonic: use ^V for Version Control.) Example:

          warn "No \"our\" declarations!\n" if $^V and $^V lt v5.6.0;

     See the documentation of `use VERSION' and `require VERSION' for a convenient way to fail if the running Perl interpreter is too old.

     See also $] for an older representation of the Perl version.

$WARNING
$^W
     The current value of the warning switch, initially true if -w was used, false otherwise, but directly modifiable. (Mnemonic: related to the -w switch.) See also *Note Warnings: (pm.info)warnings,.

${^WARNING_BITS}
     The current set of warning checks enabled by the `use warnings' pragma. See the documentation of warnings for more details.

${^WIDE_SYSTEM_CALLS}
     Global flag that enables system calls made by Perl to use wide character APIs native to the system, if available. This is currently only implemented on the Windows platform.

     This can also be enabled from the command line using the -C switch.

     The initial value is typically 0 for compatibility with Perl versions earlier than 5.6, but may be automatically set to 1 by Perl if the system provides a user-settable default (e.g., `$ENV{LC_CTYPE}').
     The bytes pragma always overrides the effect of this flag in the current lexical scope. See *Note Bytes: (pm.info)bytes,.

$EXECUTABLE_NAME
$^X
     The name that the Perl binary itself was executed as, from C's `argv[0]'. This may not be a full pathname, nor even necessarily in your path.

$ARGV
     Contains the name of the current file when reading from <>.

@ARGV
     The array @ARGV contains the command-line arguments intended for the script. `$#ARGV' is generally the number of arguments minus one, because `$ARGV[0]' is the first argument, not the program's command name itself. See $0 for the command name.

@INC
     The array @INC contains the list of places that the `do EXPR', require, or use constructs look for their library files. It initially consists of the arguments to any -I command-line switches, followed by the default Perl library, probably `/usr/local/lib/perl', followed by ".", to represent the current directory. If you need to modify this at runtime, you should use the `use lib' pragma to get the machine-dependent library properly loaded also:

          use lib '/mypath/libdir/';
          use SomeMod;

@_
     Within a subroutine the array @_ contains the parameters passed to that subroutine. See *Note Perlsub: perlsub,.

%INC
     The hash %INC contains entries for each filename included via the do, require, or use operators. The key is the filename you specified (with module names converted to pathnames), and the value is the location of the file found. The require operator uses this hash to determine whether a particular file has already been included.

%ENV
$ENV{expr}
     The hash %ENV contains your current environment. Setting a value in ENV changes the environment for any child processes you subsequently fork() off.

%SIG
$SIG{expr}
     The hash %SIG contains signal handlers for signals. For example:

          sub handler {        # 1st argument is signal name
              my($sig) = @_;
              print "Caught a SIG$sig--shutting down\n";
              close(LOG);
              exit(0);
          }

          $SIG{'INT'}  = \&handler;
          $SIG{'QUIT'} = \&handler;
          ...
          $SIG{'INT'}  = 'DEFAULT';   # restore default action
          $SIG{'QUIT'} = 'IGNORE';    # ignore SIGQUIT

     Using a value of `'IGNORE'' usually has the effect of ignoring the signal, except for the `CHLD' signal. See *Note Perlipc: perlipc, for more about this special case.

     Here are some other examples:

          $SIG{"PIPE"} = "Plumber";   # assumes main::Plumber (not recommended)
          $SIG{"PIPE"} = \&Plumber;   # just fine; assume current Plumber
          $SIG{"PIPE"} = *Plumber;    # somewhat esoteric
          $SIG{"PIPE"} = Plumber();   # oops, what did Plumber() return??

     Be sure not to use a bareword as the name of a signal handler, lest you inadvertently call it.

     If your system has the sigaction() function then signal handlers are installed using it. This means you get reliable signal handling. If your system has the SA_RESTART flag it is used when signal handlers are installed. This means that system calls for which restarting is supported continue rather than returning when a signal arrives. If you want your system calls to be interrupted by signal delivery then do something like this:

          use POSIX ':signal_h';

          my $alarm = 0;
          sigaction SIGALRM, new POSIX::SigAction sub { $alarm = 1 }
              or die "Error setting SIGALRM handler: $!\n";

     See *Note POSIX: (pm.info)POSIX,.

     Certain internal hooks can also be set using the %SIG hash. The routine indicated by `$SIG{__WARN__}' is called when a warning message is about to be printed. The warning message is passed as the first argument. The presence of a __WARN__ hook causes the ordinary printing of warnings to STDERR to be suppressed. You can use this to save warnings in a variable, or turn warnings into fatal errors, like this:

          local $SIG{__WARN__} = sub { die $_[0] };
          eval $proggie;

     The routine indicated by `$SIG{__DIE__}' is called when a fatal exception is about to be thrown. The error message is passed as the first argument.
     When a __DIE__ hook routine returns, the exception processing continues as it would have in the absence of the hook, unless the hook routine itself exits via a goto, a loop exit, or a die(). The __DIE__ handler is explicitly disabled during the call, so that you can die from a __DIE__ handler. Similarly for `__WARN__'.

     Due to an implementation glitch, the `$SIG{__DIE__}' hook is called even inside an eval(). Do not use this to rewrite a pending exception in `$@', or as a bizarre substitute for overriding CORE::GLOBAL::die(). This strange action at a distance may be fixed in a future release so that `$SIG{__DIE__}' is only called if your program is about to exit, as was the original intent. Any other use is deprecated.

     `__DIE__'/`__WARN__' handlers are very special in one respect: they may be called to report (probable) errors found by the parser. In such a case the parser may be in an inconsistent state, so any attempt to evaluate Perl code from such a handler will probably result in a segfault. This means that handlers which may be invoked for warnings or errors raised while parsing Perl should be written with extreme caution, like this:

          require Carp if defined $^S;
          Carp::confess("Something wrong") if defined &Carp::confess;
          die "Something wrong, but could not load Carp to give backtrace...
               To see backtrace try starting Perl with -MCarp switch";

     Here the first line will load Carp *unless* it is the parser that called the handler. The second line will print a backtrace and die if Carp was available. The third line will be executed only if Carp was not available.

     See `die', `warn', and `eval' in *Note Perlfunc: perlfunc,, and *Note Warnings: (pm.info)warnings, for additional information.

Error Indicators
----------------

The variables `$@', $!, $^E, and $? contain information about different types of error conditions that may appear during execution of a Perl program.
The variables are shown ordered by the "distance" between the subsystem which reported the error and the Perl process. They correspond to errors detected by the Perl interpreter, C library, operating system, or an external program, respectively.

To illustrate the differences between these variables, consider the following Perl expression, which uses a single-quoted string:

     eval q{
         open PIPE, "/cdrom/install |";
         @res = <PIPE>;
         close PIPE or die "bad pipe: $?, $!";
     };

After execution of this statement all 4 variables may have been set.

`$@' is set if the string to be eval-ed did not compile (this may happen if open or close were imported with bad prototypes), or if Perl code executed during evaluation die()d. In these cases the value of $@ is the compile error, or the argument to die (which will interpolate $! and $?!). (See also *Note Fatal: (pm.info)Fatal,, though.)

When the eval() expression above is executed, open(), `<PIPE>', and close are translated to calls in the C run-time library and thence to the operating system kernel. $! is set to the C library's errno if one of these calls fails.

Under a few operating systems, $^E may contain a more verbose error indicator, such as in this case, "CDROM tray not closed." Systems that do not support extended error messages leave $^E the same as $!.

Finally, $? may be set to a non-0 value if the external program `/cdrom/install' fails. The upper eight bits reflect specific error conditions encountered by the program (the program's exit() value). The lower eight bits reflect the mode of failure, like signal death and core dump information. See wait(2) for details. In contrast to $! and $^E, which are set only if an error condition is detected, the variable $? is set on each wait or pipe close, overwriting the old value. This is more like `$@', which on every eval() is always set on failure and cleared on success.

For more details, see the individual descriptions at `$@', $!, $^E, and $?.
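The following sketch shows three of the four indicators on a live system. It runs a child perl (via $^X) that exits with status 3 so that $? can be decoded into its exit value, signal, and core-dump parts; it deliberately opens a path that is assumed not to exist in order to set $!; and it uses eval/die to set and then clear $@. $^E is left out because its contents are platform-specific. The path and all variable names are illustrative.

```perl
use strict;
use warnings;

# $? after running an external program: a child perl that exits with status 3.
system($^X, '-e', 'exit 3');
my $child_status = $?;
my $exit_value   = $child_status >> 8;              # upper eight bits: exit() value
my $signal_num   = $child_status & 127;             # low seven bits: killing signal, if any
my $dumped_core  = ($child_status & 128) ? 1 : 0;   # bit 7: core dump flag
print "exit=$exit_value signal=$signal_num core=$dumped_core\n";

# $! after a failed C library call (errno); this path is assumed not to exist.
my $opened    = open(my $fh, '<', '/no/such/dir/for/this/demo');
my $errno_msg = $opened ? '' : "$!";    # copy $! at once, before anything resets it
print "open failed: $errno_msg\n" unless $opened;

# $@ is set when eval'ed code dies, and cleared again by a successful eval.
eval { die "bad pipe\n" };
my $caught = $@;
eval { 1 };
my $cleared = ($@ eq '') ? 1 : 0;
print "caught: $caught";
print "cleared after success: $cleared\n";
```

Copying $? and $! into ordinary variables immediately is a deliberate choice: both are overwritten by later system calls, waits, and pipe closes.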
Technical Note on the Syntax of Variable Names
----------------------------------------------

Variable names in Perl can have several formats. Usually, they must begin with a letter or underscore, in which case they can be arbitrarily long (up to an internal limit of 251 characters) and may contain letters, digits, underscores, or the special sequence `::' or `''. In this case, the part before the last `::' or `'' is taken to be a *package qualifier*; see *Note Perlmod: perlmod,.

Perl variable names may also be a sequence of digits or a single punctuation or control character. These names are all reserved for special uses by Perl; for example, the all-digits names are used to hold data captured by backreferences after a regular expression match. Perl has a special syntax for the single-control-character names: It understands `^X' (caret X) to mean the control-X character. For example, the notation $^W (dollar-sign caret W) is the scalar variable whose name is the single character control-W. This is better than typing a literal control-W into your program.

Finally, new in Perl 5.6, Perl variable names may be alphanumeric strings that begin with control characters (or better yet, a caret). These variables must be written in the form `${^Foo}'; the braces are not optional. `${^Foo}' denotes the scalar variable whose name is a control-F followed by two o's. These variables are reserved for future special uses by Perl, except for the ones that begin with `^_' (control-underscore or caret-underscore). No control-character name that begins with `^_' will acquire a special meaning in any future version of Perl; such names may therefore be used safely in programs. `$^_' itself, however, *is* reserved.

Perl identifiers that begin with digits, control characters, or punctuation characters are exempt from the effects of the package declaration and are always forced to be in package main.
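A small sketch of this rule, using a hypothetical package name Demo::Pkg. Inside that package an ordinary `our' variable belongs to Demo::Pkg, while $_ and reserved names such as %ENV and @INC still denote the variables in package main. References are compared with == to check whether two names refer to the same variable.

```perl
use strict;
use warnings;

package Demo::Pkg;    # hypothetical package name, for illustration only

our $ordinary = 'mine';    # ordinary identifier: qualified by the current package

# Punctuation names and reserved names such as ENV and INC resolve to
# package main, even though the current package is Demo::Pkg.
my $underscore_is_main = (\$_   == \$main::_)   ? 1 : 0;
my $env_is_main        = (\%ENV == \%main::ENV) ? 1 : 0;
my $inc_is_main        = (\@INC == \@main::INC) ? 1 : 0;
my $ordinary_is_pkg    = (\$ordinary == \$Demo::Pkg::ordinary) ? 1 : 0;

print "\$_ in main: $underscore_is_main, %ENV in main: $env_is_main\n";

package main;
```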
A few other names are also exempt:

     ENV        STDIN
     INC        STDOUT
     ARGV       STDERR
     ARGVOUT
     SIG

In particular, the new special `${^_XYZ}' variables are always taken to be in package main, regardless of any package declarations presently in scope.

BUGS
====

Due to an unfortunate accident of Perl's implementation, `use English' imposes a considerable performance penalty on all regular expression matches in a program, regardless of whether they occur in the scope of `use English'. For that reason, saying `use English' in libraries is strongly discouraged. See the Devel::SawAmpersand module documentation from CPAN (http://www.perl.com/CPAN/modules/by-module/Devel/) for more information.

Having to even think about the $^S variable in your exception handlers is simply wrong. `$SIG{__DIE__}' as currently implemented invites grievous and difficult-to-track-down errors. Avoid it and use an `END{}' or CORE::GLOBAL::die override instead.
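A minimal sketch of the CORE::GLOBAL::die alternative suggested above. It assumes the override is installed in a BEGIN block, so that die() calls compiled afterwards use it; the @died_with log is a hypothetical name. The override records each message and then hands off to the real built-in through CORE::die, so an enclosing eval still catches the exception as usual.

```perl
use strict;
use warnings;

our @died_with;    # hypothetical log of every message passed to die()

BEGIN {
    # Installed at compile time so that die() calls compiled later use it.
    no warnings 'once';
    *CORE::GLOBAL::die = sub {
        push @died_with, join('', @_);
        CORE::die(@_);    # hand off to the real built-in
    };
}

# The override sees explicit die() calls, including those inside eval.
eval { die "disk full\n" };
print "logged: $died_with[0]";
print "caught: $@";
```

Unlike a `$SIG{__DIE__}' hook, such an override sees only explicit calls to die() in code compiled after it is installed; fatal errors raised internally by Perl (for example, division by zero) do not pass through it.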