Perl has been called the duct tape of the Internet, and will likely remain so. In the words of its creator, Larry Wall, perl makes easy things easy, and hard things possible. It is a rich language that helps you program all manner of sysadmin tasks quickly, grow them as your needs grow, and maintain them well throughout their lifetime.
The goal of this course is to introduce you to perl. By going through this course, you will be able to create simple programs in perl for your everyday needs as a system administrator. In addition, the examples in this course material can be used as a starting point for more complex scripts that you want to build but did not know how to begin. The history section should give you some perspective about perl.
This material is organized so that most concepts are first used in an example and explained in detail afterwards. This way, you can try a few things, play around with the snippets, and look forward to the gory details to follow. The exercises usually test what has been explained, but also introduce a few things that have not been explicitly covered, so that you have an incentive to try them out and see what you get.
Finally, this course is not a substitute for programming, nor is it a substitute for the documentation that comes with perl! The more you code, the better you can program. What is not obvious is that the more you read, the less you need to program.
It is mandatory that you read the perl documentation available on your system. At the very least, you should try to read all the manual pages mentioned in this course. Reasonably competent system administrators can implement a lot of their common tasks with minor modifications to the program snippets in the documentation. The resources section gives you details on where to look for complete and authoritative information.
Perl is currently the most convenient and popular language for writing Web/CGI programs, system administration scripts, database applications for the web, and even complex data mining, archival and retrieval. Yahoo, Amazon, Dejanews, CDROM.COM, Netscape, Mozilla, Apache.org, Synopsys, Paul Ingram group, and Compaq are some of the companies where perl is used extensively. Microsoft has homed in on perl for its internal scripting technologies research and development. Needless to say, almost every Unix shop uses perl. Why?
Before perl was available on Unix, programmers wanting to build home-brew solutions usually resorted to a mix of shell, awk, sed, etc. for quick jobs, and to C only when speed and efficiency were needed. The shell tools saved a lot of time and development effort. However, programming quick tasks in C is not fun. Shell scripts, on the other hand, don't scale well when data has to be processed repeatedly. Larry Wall conceived perl when he found that the shell and C lacked what he needed. Not content with writing just another program, he wrote perl so that he could reuse it for another project. He then released it on Usenet. The rest, as they say, is history.
Perl is designed to mimic the flexibility of the C language and its power to manipulate everything in the machine. At the same time, perl was designed to help the programmer quickly prototype an idea and whip up a working solution much faster than in most other languages.
Perl was originally used for manipulating text data (perl stands for Practical Extraction and Report Language). But it excelled at data transformation, file manipulation and all manner of system tasks, and quietly filled a huge niche of everyday programming for everyday tasks.
Perl is a distilled essence of Unix. It is a language built to emulate the best features of sh, sed, awk, C and many others. It adheres to the Unix philosophy of keeping tools simple and building complexity through the way the tools are strung together into a solution. The only difference is that all the powerful tools of Unix are now available as native constructs in perl. This saves time, increases speed, and reduces errors by keeping the complexity in one place: the language, instead of the program. This has also made it possible for perl programs to run virtually unchanged on all kinds of operating systems!
Perl resembles English more than you expect it to. This is by design. It borrows a lot of concepts from natural languages. For example, it uses visually distinct ways to refer to different types of variables: single values, lists, and relationships. By stark contrast, most computer languages do not let you figure out the type of a variable from its name.
Also, perl is designed to be learned once, used many times. You typically learn a small subset of perl when you start, and learn more concepts as you go. The key feature of perl, like a natural language, is that you are able to program as you learn, much like a child and an adult are able to communicate reasonably well with their own levels of competence in spoken language.
Perl allows local ambiguity in programs. This means that you can operate on some things implicitly and trust perl to do the right thing when the program runs. The programs are thus shorter, easier to read and make better sense. Contrast the following paragraphs:
1. Mary is 18 years old. Vijay is 19. Mary and Vijay meet every day for music lessons. Mary and Vijay see Vinni and Vicki every day after practice. After meeting Vinni and Vicki every day, Mary and Vijay go for a movie with Vinni and Vicki.
2. Mary is 18 years old. Vijay is 19. They meet every day for music lessons. After the practice they see Vinni and Vicki and they all go for a movie.
The first paragraph is what programs in most other languages look like. The second paragraph is what an equivalent perl program might look like. Perl also borrows extensively from the best of other languages. Here is a simple table (courtesy of postings on the Usenet newsgroup comp.lang.perl.misc):
++++++++++++++++++++++++++++++++++++++++++++++++++++
Feature                            Ancestor(s)
++++++++++++++++++++++++++++++++++++++++++++++++++++
range operator (..)                awk, sed
math operators (+,*,/)             FORTRAN
match operator (=~)                awk
scalars as number/string           sh, awk, lisp
varying length strings             BASIC, awk
substr                             awk
lists                              lisp, APL, shell
slices                             Ada, FORTRAN
statement modifiers                BASIC-PLUS
glob('*')                          csh
blocks                             Algol
#comments                          shell
system functions                   Unix, libc
$ for variables                    shell
quotes (', ", and `)               shell
m//, s//                           sed
sort                               qsort from libc
do, if, while, for                 C
foreach                            shell
OO setup                           python
UNIVERSAL class                    smalltalk?
unless, until                      BASIC-PLUS
require                            LISP
\u, \U, \l, \L                     vi
$0 is changeable                   sendmail
\w, \s                             emacs
formats                            FORTRAN, COBOL, BASIC
\e, $%                             troff
grep, map                          LISP
BEGIN, END                         awk
chr, ord                           Pascal
-e, -f, -d                         /bin/test from Unix!
pack 'u'                           uuencode
and, or, not                       REXX
autoloading in modules             lisp
/i flag                            grep
$package'variable (obsolete)       Ada
open syntax                        shell
[] and {} dynamic structs          python
sub arguments (variadic)           shell, lisp
tied arrays                        BASIC-PLUS
system calls, networking           Unix, C
Most programming languages have a minimal set of constructs (called orthogonal) and one way to do a particular task. Perl, by contrast, was designed with redundancy: it has multiple constructs that do similar things. This has led to the perl motto ``There's more than one way to do it'', abbreviated to TMTOWTDI and pronounced ``Tim Toady''. This redundancy also makes experimentation possible and keeps programming from becoming a boring chore. It also makes perl programming accessible to all levels of programmers. The better you get, the more concise and clear your programs get, and the more you start to use common idioms.
The first version of perl was released in 1987. After successive refinements, version 4 of perl was released in 1991. This coincided with the first release of the Camel book, Programming Perl.
Perl version 4 quickly became very popular. As more people started using perl for more than a few simple tasks, the limitations of the language made it difficult to add features. To prevent perl from forking into many versions, perl was completely rewritten and released as version 5. Compared with perl version 4, perl version 5 was more extensible and contained features for large-scale programming. It also added completely new features such as lexical variables and closures, an overhauled regular expression engine, references, and much more. Version 5 also supported more operating systems. It gained a clean database abstraction (DBI), a Tk port to perl, and a Win32 port for PCs running Microsoft operating systems (this port has since been integrated into the core perl distribution in source form).
For the most current updates and feature list for perl, see the distribution, which is always available at http://www.perl.com/CPAN/src. If you have a complete perl installation AND you're using perl 5.005 or later, perldoc perlhist should tell you everything you want to know.
We will start our session with a simple example. Before going into the example, we need a digression on how perl runs your program, and also on how you run a perl program.
Perl is an interpreted language that also compiles your program. That is, perl reads your source program and compiles it into an internal format (a tree of opcodes) before running it. This means that your program can be run without first being converted into a separate binary: your source program is the executable. Because perl is interpreted, you can greatly reduce development and build time. Because it is also compiled, it is much faster than traditional interpreted languages like BASIC or Tcl.
You run your perl programs just like you run Unix shell scripts. You can call perl with your program as its argument. Alternatively, you can specify the location of your perl binary on the first (#!) line of the script and run the script directly. We will use /usr/bin/perl as the location of the perl binary in all our examples, but please replace that with the appropriate location for your site before running programs from this document. Here is the listing of a simple example program:
 1  #!/usr/bin/perl -w
 2  use strict;
 3  my $who;
 4  my $day = (localtime)[6];
 5
 6  print "What's your name? ";
 7  chomp( $who = <STDIN> );
 8
 9  if ($day == 5 ) {
10      print "Have a nice weekend, $who!\n";
11  }
12  else {
13      print "Have a nice day, $who!\n";
14  }
In line 1, we specify that this program is a perl script (on Unix-like systems only), and the -w switch turns on warnings. This makes your kernel arrange for perl to run this program when it is invoked by its name. In line 2, we tell perl to use strict checking of variable declarations, symbolic references and barewords. This is not mandatory. However, we will always use it to catch errors and silly mistakes early. In lines 3 and 4 we declare two lexical variables. Line 4 also sets the variable $day to the day of the week. The day of the week is the seventh element of the list returned by perl's localtime function, which mimics the standard C library function of the same name. Our code uses the index 6 to pull the seventh element because perl arrays are indexed starting at 0. (Very old versions of perl let you change the starting index, but you should not rely on that.)
Line 6 prints a prompt string to standard output, which is usually your terminal for interactive scripts like this example. Line 7 does several things. First, it uses the perl angle-bracket (readline) operator, <STDIN>, to read one line from standard input. That line is assigned to the variable $who. The function chomp then removes the trailing newline from $who. At the end of line 7, we have the user's input (minus the newline character) stored in the variable $who.
Lines 9 through 14 illustrate conditional branching in perl, which is similar to the C or shell `if' statement. In line 9, we check if the value of $day is 5. The standard C library returns a number between 0 and 6 for the day of the week, 0 being Sunday. Thus, we check if today is Friday, and print the appropriate wish in line 10 or line 13. At line 14, the program ends, and perl does its normal cleanup and exits back to the calling program (our shell).
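The (localtime)[6] expression in line 4 takes a list slice of localtime's return value. Because the weekday changes from day to day, the sketch below uses gmtime on the fixed time 0, the Unix epoch, which fell on Thursday, 1 January 1970 (UTC). That value is predictable. The @day_names lookup table is our own illustration and is not part of localtime.

```perl
#!/usr/bin/perl -w
use strict;

# Our own lookup table: index 0 is Sunday, as in the C library.
my @day_names = qw(Sunday Monday Tuesday Wednesday Thursday Friday Saturday);

# gmtime and localtime return the same 9-element list; element 6 is the weekday.
my $epoch_day = (gmtime(0))[6];   # time 0 is Thursday, 1 January 1970 (UTC)
my $today     = (localtime)[6];   # today's weekday in your local time zone

print "The epoch fell on a $day_names[$epoch_day]\n";
print "Today is $day_names[$today]\n";
```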
To run the above program, we will save it into a file and invoke it. Following a convention, we name it wish.plx (plx is the current custom for naming a perl `executable'). Make the program executable by running chmod +x wish.plx. Now, assuming you have the program in your current working directory, type:
./wish.plx
This program illustrates quite a lot of perl. Let's go over the major features it uses, line by line, along with pointers to the appropriate sections of the manual pages that come with perl:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Line     Description
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1        O/S will invoke perl to run your program (perlrun)
2        Make perl check for common mistakes (strict)
3,4      Variable declarations (perlfunc, perldata)
6        print function (perlfunc, perldata)
7        chomp, <> operator (perlfunc, perlop, perlvar)
9..14    conditional branching (perlsyn)
         == operator (perlop)
         string interpolation (perldata, perlop)
         braces {} (perlsyn, under `Compound Statements')
Almost all programs and scripts manipulate data. This is fairly obvious. What is less obvious is that the basic data types your programming language offers largely determine which programs you can write easily and write well. Most programming languages give you a basic set of data types and constructs to build complex data types from them. Languages also differ in how much bookkeeping it takes to build complex data structures from basic data types.
Perl provides you with three basic but powerful data types. Unlike those of most other languages, these data types grow and shrink dynamically, and you never have to worry about memory allocation or deallocation. Perl does it all for you. The three fundamental data types in perl are called Scalars, Lists and Hashes.
A scalar is the fundamental data type in perl. A scalar holds a single value. This value may be a string, a number, a filehandle or a reference to another perl data type. Strings are sequences of characters. Numbers may be integers or floating point. Filehandles are special values used as place-holders to refer to open files in your program. In short, a scalar value can be equated to the English word 'the'. Scalar variables are prefixed with a dollar sign ($). You can store apparently different types of values in a scalar variable, but perl treats them all as one type and converts between strings and numbers as necessary.
Here are some examples:
$a = 'this';      #stores the string 'this' in $a
$nirvana = 42;    #stores the number 42 in $nirvana
$ref = \$a;       #stores a reference to the variable $a in $ref
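The small program below shows perl following a reference and converting a scalar between string and number on demand. It is a minimal sketch, and the variable names are our own. ${$ref} is one way to dereference a reference to a scalar. We avoid the name $a here because perl's sort uses it.

```perl
#!/usr/bin/perl -w
use strict;

my $text = 'this';
my $ref  = \$text;                # a reference to the variable $text
print "Through the reference: ${$ref}\n";   # ${$ref} follows the reference

my $count = '42';                 # a string...
my $next  = $count + 1;           # ...used as a number: 43
my $label = $next . ' items';     # a number used as a string: '43 items'
print "$label\n";
```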
You can build a scalar from other scalars through numeric and string operations. The most common operation used for building scalars is string interpolation. In a double-quoted string, each scalar variable is automatically replaced by its value. Here is an example:
$a = 20;
$b = 22;
$c = $a + $b;
$answer = "The sum of $a and $b " . "is $c\n";
print $answer;
This prints: The sum of 20 and 22 is 42
The `+' operator is the familiar numeric addition. The `.' operator is the string concatenation operator: it joins its left and right operands and returns the result.
As you may have noticed, a semicolon terminates a perl statement or declaration. You can also group multiple statements into a block by enclosing them in curly braces.
$bar = 'abc' x 4;
What does $bar contain after the above statement? Try it!
How do you generate the full path of a file given its name and the directory in which it is found? Given the directory in $dir and the filename in $file, what is the perl statement that will store the full name in $full_path?
A literal list is a collection of scalar values. Lists are stored in perl arrays. Thus, an array is a list each of whose elements contains a scalar value. This is an important point that will help you manipulate list elements without confusion. As with scalars, lists can be built dynamically, and their size can be increased or decreased by adding, deleting or splicing elements at will. Array names act like the English word `these'. You prefix an array name with the @ character. However, to get a single (scalar) element of an array, you provide the index of the element within square brackets and prefix the name of the array with a $. Here are some examples:
my($borderline, @living, @server_ports);
$borderline = 'prions';
@living = ('plants', "animals", 'viruses', $borderline);
@server_ports = qw(http smtp pop3 telnet ftp);
Note that each element of the list is a scalar. You can add/modify/delete them using different perl functions and operators. Here are a few examples:
#Initializing/Adding to arrays
my(@a) = ('this', 'that', 'and');   #define an array named `a'
push @a, 'others';
print "@a\n";                       #prints `this that and others'
#Modifying existing elements
$a[0] = 'you';    #$a[0] is the first element of @a. It is a scalar.
$a[1] = 'me';
print "@a\n"; #prints `you me and others'
my $size = @a;         #$size gets 4, the number of elements in @a!
$size = scalar(@a);    #same as above
pop @a;     #removes the last element of the array (and returns its value)
@a = ();    #deletes everything in @a
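Besides push and pop, the splice function mentioned earlier removes and inserts elements anywhere in an array. Here is a minimal sketch. The host names are made up.

```perl
#!/usr/bin/perl -w
use strict;

my @hosts = ('alpha', 'beta', 'gamma', 'delta');   # made-up host names

# splice(ARRAY, OFFSET, LENGTH, LIST) removes LENGTH elements starting at
# OFFSET, puts LIST in their place, and returns the removed elements.
my @removed = splice(@hosts, 1, 2, 'epsilon');
print "@hosts\n";      # alpha epsilon delta
print "@removed\n";    # beta gamma

# With no replacement list, splice only deletes.
splice(@hosts, 0, 1);
print "@hosts\n";      # epsilon delta
```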
Exercise: create an array @valid_users to contain the user-names root, admin and http. Use push to add sys to the above array. What happens if you use unshift instead of push? Try printing out the array in both cases and find out!
What does undef(@array) do?
The final perl data structure we will see is a hash. A hash is very much like a list, but it is indexed by strings (a list is indexed by number). A hash is like a database indexed by a single key field. Hashes are initialized by specifying the key and value in pairs. For example:
%colors = ( 'red' => '#FF0000', 'green' => '#00FF00');
Hash keys are strings and hash values are scalars. To refer to a single value, you put the key within curly braces and use a $ prefix, just as for any scalar. Here is an example of adding another element to the above hash and using a value stored in it:
$colors{'blue'} = '#0000FF';
print( qq(<BODY BGCOLOR="$colors{red}">Red</BODY>) );
Here is how it works. %colors is the hash; its name is colors. The key for which we want to create a value is blue, so the value lives at colors{'blue'}. However, hash values are scalars, so we prefix it with a $. Thus, $colors{'blue'} refers to this value. Similarly, $colors{red} refers to the value stored with the key 'red'. You can leave out the quotes there because perl automatically quotes a simple word used as a key inside the curly braces.
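The sketch below puts the hash pieces together. It adds a key, looks a value up with and without quotes around the key, and counts the entries. It is only an illustration. It uses keys in scalar context, which returns the number of keys, and it sorts the keys before printing because hash order is not fixed.

```perl
#!/usr/bin/perl -w
use strict;

my %colors = ( 'red' => '#FF0000', 'green' => '#00FF00' );
$colors{'blue'} = '#0000FF';            # add a third key/value pair

# A bareword key is quoted for you, so these two lines print the same thing.
print "red is $colors{red}\n";
print "red is $colors{'red'}\n";

my $how_many = keys %colors;            # keys in scalar context: the count, 3
print "The hash has $how_many entries\n";

foreach my $name (sort keys %colors) {  # sorted, since hash order is not fixed
    print "$name => $colors{$name}\n";
}
```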
Try each of the statements below and see whether the result matches the comments. (Perl ignores everything from a '#' to the end of the line, because that is a comment.)
$dozens = int( 97/12 ); # gets 8
$_ = 'A single sentence.';
$l = length($_);    #$l is now 18
$is = substr($_, 9, 4);    #$is is now 'sent'
$_ =~ tr/st/tp/;                        #$_ is now 'A tingle tenpence.'
$_ =~ s/t/s/;                           #$_ is now 'A single tenpence.'
print uc($_);                           #prints 'A SINGLE TENPENCE.'
$pi = sprintf("%.4f", atan2(1, 1)*4);   #$pi gets '3.1416'
@a = (1, 2, 3);
$last = pop @a;    #$last gets 3
unshift @a, 0;     #@a is now (0,1,2)
@sorted = sort('jack', 'jill', 'fred', 'barney');
print "@sorted";    #prints `barney fred jack jill'
%h = ('emacs' => 'RMS', 'perl' => 'Larry', 'bind' => 'Vixie');
@software = keys %h;
@authors = values %h;
while ( ($k, $v) = each %h ) {
    print "$k was the brainchild of $v\n";
}
Verify that the above code gives the following output (not necessarily in the same order):
emacs was the brainchild of RMS
perl was the brainchild of Larry
bind was the brainchild of Vixie
Almost everything in perl is an expression. An expression is a basic unit of a perl program that returns a result. For example, the print statement in perl is actually an expression that returns a value (true if the output succeeded):
$result = print("Foo\n");
A perl statement is merely an expression evaluated for its side effects. For example, we almost never need the return value of print. Thus, we write the print statement as below, and ignore the result:
print "Result of previous stmt = $result\n";
Expressions can not only return results, but can also be assigned to under appropriate conditions. When the return value of an expression is merely assigned to something else, the expression is said to be used as an rvalue. In contrast, when you assign to an expression, it is said to be used in an lvalue context. Quite a few perl functions and operators can act as lvalues. This is quite unlike most other languages, so you may need to try a few examples to get familiar with this concept:
1  $_ = "ABC\n";
2  print substr($_,1,1);    #prints 'B'
3  substr($_, 1, 1) = 'C';
4  print;                   #prints 'ACC'
In line 1, we assign the value ``ABC\n'' to the perl builtin variable $_. The variable $_ is the default value used in quite a lot of perl constructs where an argument is not explicitly provided.
In line 2, we use a perl function substr. This function takes 3 arguments:
substr( EXPR, OFFSET, LEN)
The first argument is an expression. The second argument is an offset and the third argument is the length. substr returns the substring of length LEN in EXPR starting at offset OFFSET. String offsets start at 0, like most other offsets in perl. Thus, line 2 prints the character ``B'' from ``ABC\n''.
In line 3, we see that substr, when assigned to, refers to the part of $_ that begins at offset 1 and has length 1. This is the part of ``ABC\n'' that starts at ``B'' and ends right there. When we assign 'C' to this expression, perl does something very natural: it replaces the substring ``B'' in ``ABC\n'' with ``C''. Thus, the original string is converted to ``ACC\n''! This may take some getting used to, because you will find very few equivalents of such flexibility in other languages.
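The replacement does not have to be the same length as the substring it replaces. Perl grows or shrinks the string as needed. Here is a small sketch. The variable name is our own.

```perl
#!/usr/bin/perl -w
use strict;

my $greeting = "Hello, world\n";

# Assign 7 characters to a 5-character substring: the string grows.
substr($greeting, 0, 5) = 'Goodbye';
print $greeting;       # Goodbye, world

# Assigning an empty string deletes the substring (here, the comma).
substr($greeting, 7, 1) = '';
print $greeting;       # Goodbye world
```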
Expressions can also return different things based on the context in which they are called! The two major types of context are described below.
A scalar context expects or returns a single scalar value. If you use an expression in a scalar context, the expression or its return value(s) are coerced into a scalar. For example:
$count = @lines;
Here, @lines is an expression that returns a list of all elements contained in the array @lines. This expression is forced into a scalar context by the assignment statement. In a scalar context, this gives the number of elements of the array @lines. Thus, $count will really contain the number of elements in the array @lines.
A list context expects or returns a list of scalars. If you use an expression in a list context, the expression or its return value(s) are coerced into a list. For example:
@lines = <STDIN>;
Here, @lines provides a list context to the expression <STDIN>. This in turn makes <STDIN> slurp the entire standard input until end-of-file and return it as a list of lines. You signal end-of-file with CTRL-D at a Unix terminal or CTRL-Z on Windows. Thus, if you were to type 10 lines in the terminal followed by a CTRL-D after this statement, @lines would contain 10 elements. Each element would hold the respective line you entered, including its trailing newline.
In this example:
@one = 1;
@one provides a list context to the single value 1. The value 1 is coerced into a one element list whose first and only value is 1. Thus, the result here is that @one will now contain one element with value 1.
This works for lists in general, but there is a special case you should be aware of. In a scalar context, a literal list behaves like the C comma operator: it evaluates each element and returns the last one. Here is an example to illustrate this important distinction:
@a = (12, 0, 32, -23);
$b = @a;
print "b = $b\n";
$c = (12, 0, 32, -23);
print "c = $c\n";
This prints:
b = 4
c = -23
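Context also explains a common surprise: putting parentheses around the variable on the left of an assignment turns a scalar assignment into a list assignment. The sketch below shows this and also shows reverse, which does different things in the two contexts. The server names are made up.

```perl
#!/usr/bin/perl -w
use strict;

my @servers = ('www', 'mail', 'dns');   # made-up host names

my $count   = @servers;     # scalar context: the number of elements, 3
my ($first) = @servers;     # list context (note the parentheses): 'www'

my $backwards = reverse 'stressed';   # scalar context: reverses the string
my @same      = reverse 'stressed';   # list context: reverses a one-element list

print "count = $count, first = $first\n";
print "$backwards / @same\n";         # desserts / stressed
```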
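Try the following, and compare what localtime returns in list context with what it returns in scalar context:
$\ = "\n";                      #force a newline on every print
print localtime(time);          #list context
print scalar localtime(time);   #scalar context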
Perl predefined variables are builtin variables that automatically take on certain `sensible' values at runtime. As we noted before, statements are expressions that return value(s). In the absence of an explicit assignment, some of the expressions take default arguments. In addition, some of the perl expressions may return their results into certain default variables. In other cases, changing the settings of some internal variables will make other perl functions behave differently.
This may all seem rather confusing, but it helps us write uncluttered code. You will understand these variables better with usage, and will someday actually depend on them! The following are a few of the many builtin variables in perl, each followed by examples:
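@ARGV holds the command-line arguments passed to your program, starting with $ARGV[0] (the program name is not included). For example, assuming print_help_and_exit() is a subroutine you have written yourself:
print_help_and_exit() if ( @ARGV && $ARGV[0] eq '-h' );
if ( -f $ARGV[0] ) {
    die "File $ARGV[0] exists already! Won't overwrite!\n";
}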
Here is a program that prints all arguments passed to it.
#!/usr/bin/perl -w
use strict;
my $arg;

foreach (@ARGV) {
    $arg++;
    print "Argument $arg: $_\n";
}
%ENV holds your environment variables, keyed by name. To print them all:
while ( my ($key, $value) = each %ENV ) {
    print "$key=$value\n";
}
@INC holds the list of directories in which perl looks for modules and libraries. To print them:
foreach (@INC) {
    print "$_\n";
}
Here is a short snippet to find out where any of the standard perl modules are located. We take ``CPAN.pm'' as an example:
$file = 'CPAN.pm';
foreach (@INC) {
    print "Found $file under $_/$file\n" if ( -f "$_/$file" );
}
You can modify this variable within your program. When you tell perl to use or require a library (e.g. IO::File), perl searches all the directories in @INC for the module. This is very useful for perl libraries that you cannot install into your system's perl installation. You install them under a directory of your own and add that directory to @INC using the use lib pragma.
For example, if you have installed the latest cool whiz-bang version of Foo::Bar under your $HOME/lib directory, here is what you would do:
use lib '/my/home/dir/lib';
use Foo::Bar;
{
    #...whatever...
}
$_ is perl's default variable. Sometimes you use a perl construct that expects or returns a value without saying where the value comes from or goes to. If the -w switch gives you no error or warning, perl has understood what you wanted and has stored the value in, or retrieved it from, a default place. $_ is the most common default place when the expected or returned value is a scalar.
Example:
while ( <FH> ) {
    @fields = split;
}

In the above example, which is a standard perl idiom, the <FH> operator returns a single line of input from the file opened on the filehandle FH. The line is not explicitly assigned to anything in the while loop conditional, so perl automatically places it in $_. split takes up to three arguments: the pattern to split on, the string to split, and the maximum number of fields to return. If the pattern is omitted, split splits on whitespace. If the string is omitted, it splits $_. If the limit is omitted, there is no limit. This makes the code above look much better than the one below, which doesn't rely on any defaults:

while ( defined($var = <FH>) ) {
    @fields = split " ", $var;
}

Older versions of perl (before 5.12) also stored the result of split in @_ when you did not assign it to anything. Modern perl no longer does this, so always assign the result of split explicitly. @_ has a more important use: in a subroutine, @_ contains all the arguments passed to that subroutine. Note that perl subroutines can take a variable number of arguments on each invocation, and @_ is automatically sized accordingly. Each call gets its own @_, and the caller's @_ is restored as soon as the subroutine call ends!
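The following sketch shows both defaults at work: a bare split on $_, and a subroutine that accepts any number of arguments through @_. The subroutine name total and the sample data are hypothetical.

```perl
#!/usr/bin/perl -w
use strict;

# A bare split splits $_ on whitespace, ignoring leading whitespace.
$_ = "  root   0   0 ";
my @fields = split;                 # ('root', '0', '0')
print scalar(@fields), " fields, first is $fields[0]\n";

# total() adds up however many numbers it is given; @_ holds this call's arguments.
sub total {
    my $sum = 0;
    $sum += $_ foreach @_;
    return $sum;
}

print total(1, 2, 3), "\n";         # 6
print total(10, 20), "\n";          # 30
```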
When you use the <> operator to read data from a file, perl automatically stores the current line number of that file in the variable $. (dollar-dot). How does perl know where a line ends and the next one begins? That is what the input record separator variable, $/, is for! As with most perl predefined variables, it has a default value. Here is a way to read a whole file into a single scalar:
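$/ = undef;    #no record separator: the next read returns the whole file
open(INPUT, '/var/adm/messages') || die "/var/adm/messages: $!\n";
$slurp = <INPUT>;
close INPUT;

In the absence of an explicit assignment to $/, perl uses ``\n'' as the record separator between lines. If you undefine this variable, as above, you can read whole files at a time. (Setting $/ to the empty string '' is different: it puts perl into paragraph mode, in which records are separated by one or more blank lines.)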
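You can try $. and $/ without touching a real log file. The sketch below reads from an in-memory file: since perl 5.8, open can treat a reference to a string as a file. It also uses local, which gives $/ a temporary value that is restored when the enclosing block ends. That way, the rest of the program still reads line by line. The sample data is made up.

```perl
#!/usr/bin/perl -w
use strict;

my $data = "first line\nsecond line\nthird line\n";   # made-up file contents

# Line by line: $/ is "\n" by default, and $. counts the lines read so far.
open(my $fh, '<', \$data) or die "Can't open in-memory file: $!\n";
my $lines_read = 0;
while (my $line = <$fh>) {
    $lines_read = $.;
}
close $fh;

# Slurp: with $/ undefined, a single read returns everything.
my $slurp;
{
    local $/;                      # undef inside this block only
    open(my $in, '<', \$data) or die "Can't open in-memory file: $!\n";
    $slurp = <$in>;
    close $in;
}

print "Read $lines_read lines; slurped ", length($slurp), " characters\n";
```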
Similarly, every print statement appends the value of the builtin variable $\ (the output record separator) to every record you write. This variable is undefined by default, so nothing is added, but you can change it. See the -l switch (often combined with -n or -p) in perlrun for more usage information.
When you invoke a perl program, two things happen. First, the process that invokes the program (usually your shell) forks itself. The calling program continues as the parent process, and the fork call returns the child's PID to it. The child process then runs the perl interpreter on your program. The name of your program is available to it as $0 at runtime. Similarly, the PID of your program is available as $$ at runtime.
Type the following example into a test program, say, me.plx:
#!/usr/bin/perl -w
print "I am called as $0\n";
print "My PID is $$\n";
If you run it as, say, /your/home/me.plx, you will get something like this:
I am called as /your/home/me.plx
My PID is 2506
Perl lets you interact with your O/S mostly through system calls, for which it provides perl functions of the same name (open, chdir, unlink, and so on). If a system call fails for any reason, perl makes the actual system error available in the $! variable. You can use it in two ways. If you use it as a number, it gives you the actual errno. If you use it as a string, it gives you the system error string. So if you ever get an O/S error code, you can find out exactly what it means using perl. Here's an example:
/usr/bin/perl -e '$! = 2; print $!'
Here is how this works. The assignment sets $! as a number (errno 2). print then uses $! as a string, which yields the corresponding system error string. On most Unix systems errno 2 is ENOENT, so the above statement prints: No such file or directory
More typically, you use it like this:
open(FOO, '/some/file') or die "/some/file: $!\n";

chdir('/for/bidden') or die "Can't cd /for/bidden: $!\n";

if ( !unlink('/read/only/dir/file') ) {
    my $err = $!;         #save the error first: the logging call may change $!
    log_it_somewhere();   #your own logging routine, for example
    die "Can't delete /read/only/dir/file: $err\n";
}
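The sketch below shows $! in both contexts after a failed open. It assumes /no/such/file does not exist on your system, and it uses the core Errno module for the ENOENT constant. It copies $! right away, because almost any later system call, even a print, may change it.

```perl
#!/usr/bin/perl -w
use strict;
use Errno qw(ENOENT);

my $path = '/no/such/file';        # assumed not to exist
my ($errno, $message) = (0, '');

unless (open(my $fh, '<', $path)) {
    $errno   = $! + 0;             # numeric context: the errno value
    $message = "$!";               # string context: the error text
}

print "errno $errno: $message\n";
print "That is ENOENT\n" if $errno == ENOENT;
```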
When you call a program from within perl, usually with back-ticks (``), qx{} or the system function, perl makes the status returned by the command available in $?. $? is also set when you close a pipe to or from another program. The value in $? combines three things: the exit status of the command, the signal (if any) that killed it, and whether the program dumped core while dying. Errors that occur inside an eval block in perl itself (for example, a die) are reported in a different variable, $@.
Example:
`/etc/nowhere/hostname`;
$error  = $? >> 8;
$signal = $? & 127;
$core   = $? & 128;

print "Exit status of child: $error\n";
print "Caught signal $signal\n" if $signal;
print "No core dumps\n" unless $core;
This prints something like the following (the exact values depend on your system):
Exit status of child: 1
No core dumps
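The status of a command that cannot be found varies from system to system, so here is a more predictable sketch. It assumes a POSIX sh is in your PATH and asks it to exit with status 3. It then shows that an error inside eval lands in $@, not in $?.

```perl
#!/usr/bin/perl -w
use strict;

# Run a child shell that simply exits with status 3.
system('sh', '-c', 'exit 3');

my $exit_status = $? >> 8;     # high byte: the command's exit status
my $signal      = $? & 127;    # low 7 bits: the signal that killed it, 0 if none
my $core        = $? & 128;    # bit 7: set if the child dumped core
print "exit status $exit_status, signal $signal, core ", ($core ? 'yes' : 'no'), "\n";

# A die inside eval is caught, and its message is left in $@.
eval { die "something broke\n" };
my $eval_error = $@;
print "eval failed: $eval_error" if $eval_error;
```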
When you run any program, you typically run it as yourself, which really means as your uid and gid. However, you may run programs that are setuid to some fixed user id, or setgid to some fixed group. In such cases, the program runs with the effective uid/gid of the setuid/setgid program, even though you keep your own real uid/gid. Perl makes the real uid available as $< and the effective uid as $>. Here is an example:
perl -e 'print "UID = $<\n", "Effective UID = $>\n";'
The $( (real gid) and $) (effective gid) variables have slightly different semantics because you can belong to multiple groups. In string context, each returns the gid first, followed by a space-separated list of all the groups to which you belong. See perlvar for more information.
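The short sketch below prints your real and effective user ids and splits the string value of $( into the real gid and the list of groups. The output naturally depends on who runs it.

```perl
#!/usr/bin/perl -w
use strict;

print "Real UID = $<, effective UID = $>\n";

# In string context, $( holds the real gid followed by all your groups.
my ($gid, @groups) = split ' ', $(;
print "Real GID = $gid\n";
print "Member of groups: @groups\n";
```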
To do anything useful with your data, you will need to operate on it. Perl provides the standard crop of operators and more. Here is a rundown of some of them. For more details, read perlop.
Most of the mathematical operators are available within the standard perl interpreter. The following table summarizes some standard operators:
+    Numeric addition
-    Subtraction
*    Multiplication
/    Division (floating point)
%    Modulus operator
All of the above operators can also be combined with the assignment operator (as in +=, -=, *=, /= and %=) to shorten your code. Here are a few examples you can try:
$total_size = $total_size + $size;
$total_size += $size;    #gives same result as above
$usage_pct = 100.0*($disk_capacity - $disk_free)/$disk_capacity;
$seconds_since_midnight = time() % 86400; #relative to GMT!
$free_space_left = $current_free - $file_size;
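Because / always does floating point division, splitting a count into units needs int() as well as %. The sketch below turns the seconds-since-midnight value from above into hh:mm:ss. hms is a hypothetical helper name, and the disk sizes are made-up numbers.

```perl
use strict;
use warnings;

# Split a count of seconds into hours, minutes and seconds using
# integer division (int of /) and the modulus operator %.
sub hms {
    my ($secs) = @_;
    my $hours   = int($secs / 3600);
    my $minutes = int(($secs % 3600) / 60);
    my $seconds = $secs % 60;
    return sprintf('%02d:%02d:%02d', $hours, $minutes, $seconds);
}

my $seconds_since_midnight = time() % 86400;    # relative to GMT, as above
print "GMT time now: ", hms($seconds_since_midnight), "\n";

my ($disk_capacity, $disk_free) = (2048, 512);  # made-up sizes
my $usage_pct = 100.0 * ($disk_capacity - $disk_free) / $disk_capacity;
printf "disk usage: %.1f%%\n", $usage_pct;      # disk usage: 75.0%
```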
To compare two numeric values, you use the numeric comparison operators in perl. These are very similar to those in C. Here are these operators, without much explanation. Try them out.
+++++++++++++++++++++++++++++++++++++++++++++++++++
Operator   Return Value
+++++++++++++++++++++++++++++++++++++++++++++++++++
==         true if left and right side are numerically equal
!=         true unless left side is equal to right side
<          true if left side is less than right side
>          true if left side is greater than right side
<=>        -1 if left side is less than right side (numerically)
           +1 if left side is greater than right side (numerically)
            0 if left side is equal to right side
           (useful for numeric sorting)
Examples:
if ( 2 == 2 ) { print "Yes, 2 == 2. What else did you expect?\n"; }
To compare two strings, perl provides a different set of operators. They behave like their numeric equivalents, except that they compare their operands as strings, character by character, in a fashion very similar to the strcmp C library function.
eq, ne   equality tests for strings (similar to == and !=)
lt, gt   less than, greater than for strings (similar to < and >)
cmp      similar to <=>, for strings
Here are some examples:
if ( 'Anakin' lt 'Darth_Vader' ) { print "Dark side looks bigger!\n"; }
print "Which file do you want to change? ";
chomp($file = <STDIN>);
if ( $file eq '/etc/passwd' ) { print "Turned to the dark side, did you?\n"; }
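The difference between the numeric and string operators shows up most clearly when sorting. Here is a minimal sketch, with made-up sizes. sort compares with cmp by default, so numbers sort as strings unless you pass a comparison block that uses <=>.

```perl
use strict;
use warnings;

my @sizes = (100, 9, 25, 1000);

# Default sort compares as strings (like cmp): "1000" lt "25" lt "9".
my @as_strings = sort @sizes;
# <=> compares numerically; $a and $b are the pair being compared.
my @as_numbers = sort { $a <=> $b } @sizes;
# Swap $a and $b for descending order.
my @largest_first = sort { $b <=> $a } @sizes;

print "cmp:  @as_strings\n";      # cmp:  100 1000 25 9
print "<=>:  @as_numbers\n";      # <=>:  9 25 100 1000
print "desc: @largest_first\n";   # desc: 1000 100 25 9
```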
The standard crop of logical operators are available in perl too. Logical operators return true or false. However, the meaning of true and false is different in perl than other languages, because perl considers strings and numbers to be the same data-type: Scalar. Here is a quick overview of truth as it applies to perl scalars:
The empty string ``'' is false. The string ``0'' (exactly that one character) is false. Any number that evaluates to 0 is false. Any undefined value is false. All else is true, including strings such as ``00'' and ``0.0''. Sometimes, this is surprising:
1  print "Yes\n" if ( "0.0" == '');   #"0.0" evaluates to 0
2  print "What?\n" if ( "0.0" );      #string "0.0" evaluates to true!
In line 1, we see that the string ``0.0'' is converted to 0 in the numeric context of the == operator. The empty string on the right side is similarly converted to 0, so the comparison is true and ``Yes'' is printed. However, in line 2, the string ``0.0'' is tested as a string, and by the rules above it is TRUE. Thus, line 2 prints ``What?'' as well.
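To check these rules on your own perl, a small sketch like the following prints how a handful of values behave in boolean context. truth is a hypothetical helper. '0E0' is included because it is numerically zero yet true as a string, an idiom you will meet in modules such as DBI.

```perl
use strict;
use warnings;

# truth() is a hypothetical helper that reports how perl treats a scalar
# in boolean context.
sub truth { return $_[0] ? 'true' : 'false' }

for my $value ('', '0', 0, '0.0', '00', ' ', '0E0') {
    printf "%-6s is %s\n", "'$value'", truth($value);
}
printf "%-6s is %s\n", 'undef', truth(undef);
```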
The perl logical operators are &&, || and !. The && and || operators short circuit, as in C: the second operand is evaluated only when it is needed to decide the result. Here are some examples:
$home = $ENV{HOME} || (getpwuid($<))[7] || die "No home directory!\n";
print "Your machine is wide open!\n" if ( $< && -r "/etc/shadow");
When you need to match a string with a pattern or make changes to it using a regular expression match and replace, you use the binding operator, =~. To negate the logical sense of a match, you use the !~ operator. Here are some examples:
$host = 'samba.org.au';
if ( $host =~ /\./ ) {
    print "$host seems to be fully qualified!\n";
    if ($host !~ /\.(com|org|edu|mil|gov|net)$/ ) {
        $country = $host;
        $country =~ s#.*\.##;   #remove everything except the TLD marker
        print "Its country of origin is: $country\n";
    }
}
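Instead of copying the host name and deleting everything up to the last dot, you can capture the last label directly with a group. Here is a sketch, with top_level_domain as a hypothetical helper. Like the example above, it assumes that any name ending in something other than the listed generic domains carries a country code.

```perl
use strict;
use warnings;

# Return the last dot-separated label of a host name, or undef if the
# name has no dot. The capture group replaces the s### step above.
sub top_level_domain {
    my ($host) = @_;
    return $host =~ /\.([^.]+)$/ ? lc($1) : undef;
}

for my $host ('samba.org.au', 'www.perl.com', 'localhost') {
    my $tld = top_level_domain($host);
    if (!defined $tld) {
        print "$host is not fully qualified\n";
    } elsif ($tld !~ /^(com|org|edu|mil|gov|net)$/) {
        print "$host: country code $tld\n";
    } else {
        print "$host: generic domain $tld\n";
    }
}
```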
Two operations that you do on numbers have analogues for strings. You may want to join strings together, much like adding numbers; perl's concatenation operator is the dot (.). Or you may want to repeat the same string multiple times, much like multiplying; perl's repetition operator is x. Here are the operators, by example:
$config_file = 'resolv.conf'; $file = '/etc/' . $config_file;
$recurse = 5;
$GNU = 'GNU' . (' Not Unix' x $recurse);
print "GNU expands recursively to: $GNU!\n";
Where do you use the 'x' operator? Well, here is a simple way to generate an attention grabbing notice:
use Sys::Hostname;
$stars = '*' x 79;
$host = hostname();
$wall = "ATTN: Machine $host going down. Please logoff NOW!";
print "$stars\n$wall\n$stars\n\n";
In addition to && and || for logical operations, perl provides the words and and or. These behave identically to && and || except that they have very low precedence. Precedence determines the order of evaluation within a single statement. Here is an example where not knowing the precedence might bite you (in fact, perl's and/or operators were designed just so that people don't make this mistake). Perl allows you to call functions without using parentheses around the arguments. If you need to open a file, here is how you'd do it with parentheses around the arguments, without checking the return value:
open(FOO, '/etc/passwd');
This can also be written conveniently as:
open FOO, '/etc/passwd';
These two function calls work exactly the same way. Now, if you need to add some error checking of the return value of the open call, you would do something like this:
open(FOO, 'bar') || die "bar: $!\n";
Unfortunately, the equivalent
open FOO, 'bar' || die "bar: $!\n";
never works as intended. Why? Well, this function call is exactly the same as:
open(FOO, 'bar' || die "bar: $!\n");
This is definitely not what we want. The string 'bar' is always true, so the die can never be executed! The unintended result is that if the open really bombs out, you would never catch it! This is a situation where or comes to the rescue:
open FOO, 'bar' or die "bar: $!\n";
This is clearer to the eye, and also works right.
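The same trap appears with the three-argument form of open and a lexical filehandle (a my variable instead of a bareword like FOO). In the sketch below, try_open is a hypothetical helper. The or form tests the result of the whole open call, whereas open my $fh, '<', $file || ... would only test $file.

```perl
use strict;
use warnings;

# Hypothetical helper that tries to open a file and reports the outcome.
sub try_open {
    my ($file) = @_;
    # 'or' has lower precedence than the list operator open, so the whole
    # call is tested. With || here, only $file would be tested.
    open my $fh, '<', $file or return "cannot open $file: $!";
    close $fh;
    return "opened $file";
}

print try_open('/etc/passwd'), "\n";
print try_open('/no/such/file'), "\n";
```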
Perl provides lots of high level operations for file manipulation that would take quite a lot of work to do in other languages. Perl provides an abstraction called Filehandle to refer to open files in your program. This is very much like the file pointer in C. Perl provides a single function called open that allows you to access almost any data source with an amazingly simple and familiar syntax.
Following the Unix convention, perl provides three default Filehandles that are direct analogues to C: STDIN, STDOUT and STDERR. In the absence of an explicit Filehandle, the magical diamond operator `<>' reads from the files named on the command line, or from STDIN if there are none. In the absence of an explicit Filehandle, your print statements print to STDOUT (you can override this with perl's select function). Some perl functions (namely warn and die) print to STDERR with no need for a Filehandle argument (pun intended). You can close the standard Filehandles if needed (say, in a daemon process) or redirect them within perl. Here are some examples where these Filehandles figure, even though you don't see them:
$next_line = <>;
print "This prints to your standard output!\n";
warn("No more disk space!\n") unless ($free_space > $file_size);
unlink("/") or die "Can't do that!\n";
die("Please run manually!\n") unless ( -t STDIN );
The first example implicitly uses STDIN (if your program was called without arguments). The next example shows the standard usage of the print statement. warn and die automatically write to STDERR. In the last example, we use the file test operator -t, which returns true if the given Filehandle is opened to a terminal.
Perl's open function wears many hats. Depending on the arguments you supply to it, you can open just about any file, in any mode, without having to specify all the excruciating details and without looking at the manuals for the right usage every time. Under traditional usage, open accepts two arguments: the Filehandle and the name of the file. But the name of the file can include the mode you want it opened in, as well as standard shell piping and redirection characters. This gives you enough flexibility to operate on pretty much anything the O/S offers. If you forget to close a file after using it, perl closes it automatically when it exits. If you open the same Filehandle again (for the same file or a different file altogether), the previously opened file is closed automatically. That's not all! If you call open with a single argument, perl takes the file name from the scalar variable with the same name as the Filehandle (for example, $FOO for FOO). Here are a few examples:
open(PASSWD, '/etc/passwd');
open(LOG, "> $logfile");
open(RCMD, "rsh $host uname -a 2>&1 |");
open(MAIL, "|/usr/lib/sendmail -oi -t");
The first example opens /etc/passwd for reading only. The second example opens the file named in $logfile for writing. In the third example, something even more interesting happens: the command uname -a is executed on a remote host whose name is contained in the variable $host, and its standard error AND standard output are available to you through the Filehandle RCMD! This obviates the need for intermediate files. Similarly, the last example opens a pipe to a sendmail process on the machine. By writing to the Filehandle MAIL in this example, you are actually sending data to the sendmail process. When you close this Filehandle, you will have sent an email from perl!
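Modern perls also accept a three-argument form of open that keeps the mode apart from the file name, so a name that happens to contain > or | is never mistaken for an instruction. Below is a sketch of the common modes. It assumes a Unix system with a writable /tmp and a uname command, and the log file name is hypothetical.

```perl
use strict;
use warnings;

my $logfile = "/tmp/demo_$$.log";          # hypothetical path

open(my $log, '>', $logfile) or die "can't write $logfile: $!\n";    # create/truncate
print $log "first line\n";
close($log);

open($log, '>>', $logfile) or die "can't append to $logfile: $!\n";  # append
print $log "second line\n";
close($log);

open(my $in, '<', $logfile) or die "can't read $logfile: $!\n";      # read
my @lines = <$in>;
close($in);

# '-|' runs a command and reads its output; the list form bypasses the shell.
open(my $cmd, '-|', 'uname', '-a') or die "can't run uname: $!\n";
my $uname = <$cmd>;
close($cmd);

print "log has ", scalar(@lines), " lines\n";
print "uname: $uname";
unlink $logfile;
```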
Here is a program fragment that will print the whole file in which it is contained:
open 0; print <0>;
This surprising fragment works as follows: open is called with one argument, 0. Because there is no second argument, perl takes the file name from the scalar variable of the same name, $0. $0 is, as we saw earlier, the name of the program itself. Thus, you are opening the program file itself with this open statement! The Filehandle to this file is 0.
In the print statement, the output Filehandle is STDOUT (or the currently selected output Filehandle). If you remember the way the diamond operator works, it gets you the next line in a scalar context, and all the remaining lines in a list context. The print function takes a list as its argument, which presents a list context to <0>, so <0> reads your entire program. See later for some more predefined Filehandles and how to use them.
Filehandles can also be stored in scalars, using many of the standard perl modules available with the perl distribution. Here is a simple fragment that uses the perl module IO::File (see Module Basics for more explanation of modules, classes and objects in perl):
0  #!/usr/bin/perl -w
1  use IO::File;
2  my $fh = IO::File->new;
3  $fh->open('/etc/resolv.conf') or die "resolv.conf: $!\n";
4  print STDOUT <$fh>;
5  $fh->close;
In line 1, we express our intent to use the IO::File module in our program. In line 2, we initialize a variable $fh with an object constructed by the new method of IO::File. If this statement succeeds, we now have a generic IO::File object with which we can manipulate files. The advantage of a Filehandle object held in a variable is that you can dictate its scope of usage and safely manipulate it without causing side-effects on the rest of the program. With the standard FILEHANDLE notation, you would usually create a Filehandle that has a global scope within your program.
Proceeding further, in line 3, we use the IO::File open method call to open a specific file. The arrow notation ( -> ) is used to access methods of an object or class. From now on, we can use $fh within the diamond operator to read from the file which was opened in line 3. Finally, after having printed the entire file to STDOUT (remember list context?), we close the Filehandle.
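IO::File can also take the file name and mode in the constructor, which lets you check for errors in one step. The sketch assumes a resolv.conf-style file with nameserver lines, and nameservers is a hypothetical helper name.

```perl
use strict;
use warnings;
use IO::File;

# Hypothetical helper: return the nameserver addresses listed in a
# resolv.conf-style file.
sub nameservers {
    my ($path) = @_;
    my $fh = IO::File->new($path, 'r') or die "can't open $path: $!\n";
    my @servers;
    while (my $line = $fh->getline) {
        push @servers, $1 if $line =~ /^\s*nameserver\s+(\S+)/;
    }
    $fh->close;
    return @servers;
}

if (-r '/etc/resolv.conf') {
    print "$_\n" for nameservers('/etc/resolv.conf');
}
```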
There are certain Filehandles that perl makes available to you without an explicit open. If you run a perl program with some arguments, perl makes the arguments that follow the program name available to your program in the array @ARGV. If your program leaves @ARGV untouched and you use the diamond operator (<>) to read data, perl treats each of those arguments as a file name, opens the files in order, and supplies their contents when you use the <> operator! Here is a simple example that emulates the Unix cat command in some ways:
#!/usr/bin/perl -w
while ( <> ) {
    print;
}
When you call this program without any arguments, perl uses STDIN as the input file when you read data using <>. If you call this program with some filenames as arguments, perl cycles through each of them and prints their contents to STDOUT (remember that STDOUT is the default Filehandle for the print statement)! How do you know when one of the files ends? You can use the eof function to find out. Which Filehandle is currently open? Perl provides it as the Filehandle named ARGV. What is the name of the file currently being read? It is in $ARGV. Here is how you test this:
#!/usr/bin/perl -w
while ( <> ) {
    next unless eof;
    print "File is $ARGV\n";
}
The next statement skips processing the current line unless it is the last line of the file (which makes the eof function return true!).
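Combining eof with $. gives a per-file line count. Closing ARGV at the end of each file resets $., as described in perlfunc under eof. In this sketch, lines_per_file is a hypothetical helper that uses local @ARGV so it can be handed a list of files. Empty files do not appear in its result, because the loop body never runs for them.

```perl
use strict;
use warnings;

# Hypothetical helper: return (file => line count) pairs for the given files.
sub lines_per_file {
    local @ARGV = @_;             # <> will open these files in order
    my %lines;
    while (<>) {
        if (eof) {                # eof without parens: end of the current file
            $lines{$ARGV} = $.;   # $. counts lines within the current file...
            close ARGV;           # ...because closing ARGV resets it
        }
    }
    return %lines;
}

if (@ARGV) {
    my %lines = lines_per_file(@ARGV);
    printf "%6d %s\n", $lines{$_}, $_ for sort keys %lines;
}
```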
There are occasions when your program needs some small amount of input that you'd rather keep in a file, but you don't want the script to hard code the name of the file, or you don't want to carry the file around with the program. The Filehandle DATA is what you need in such cases. Perl normally reads your program up to the end of the file. If it reads a line that says __END__ (without any other characters), it stops reading the program right there. Anything that follows is available to your program through the DATA Filehandle. Here is an example:
#!/usr/bin/perl -w
print <DATA>;
__END__
This line three erros.
This line ends input.
The open and close on the above Filehandles happens automatically, so you don't need to do that explicitly.
We know what kinds of variables are out there in perl. But there are rules for making legal variable names as well as rules governing how they are interpolated within strings.
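Variable names can contain the following characters: [a-zA-Z0-9_]. That is, you can use letters, digits and underscores within variable names. The first character of a name you choose cannot be a digit (names like $1 are reserved for perl's special variables). As mentioned before, you can embed variable names within strings to avoid much hassle in building complex strings. You define a plain string using single or double quotes, as we have seen in the examples. Here are the basic rules, by example: single-quoted strings are taken literally (only \' and \\ are special), double-quoted strings interpolate variables and backslash escapes such as \n, and back-ticks or qx{} interpolate the string and then run it as a command, returning its output: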
$ss = 'He said, \'She said, "Shut Up!" \'... ';
$tobe = "To be"; $q = "$tobe or not $tobe is the question!\n";
$out = qx{echo '$foo'};   #perl interpolates $foo; $out holds the command's output
In addition to the standard quoting characters, perl provides additional syntax to allow you to simplify creation of strings with embedded quotes. These are the q{}, qq{}, qx{} and qr{} operators. These operators are flexible in that you can use ANY character as the quoting character. For example, instead of the curly braces, you can use the # character as quoting character:
$something = q#Single quoted#; $nother = qq#Not '$something'#;
Thus, quoting operators allow you to embed the normal quotes within your strings without needing to escape them with backslashes galore.
Examples:
$crazy = 'Please don\'t use \'\' within this string'; $ok = q{Please don't use '' within this string};
$foo = "<A HREF=\"mailto:$address\">Mail us</A>";
is better written as:
$foo = qq{<A HREF="mailto:$address">Mail us</A>};
$ip_patt = qr{^\d+\.\d+\.\d+\.\d+}; print "Yes!\n" if ( '127.0.0.1' =~ /$ip_patt/ );
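Because a qr{} object can be interpolated into another pattern, you can build larger regexes out of named pieces. The sketch below is an illustration only: the patterns and the classify helper are hypothetical, and the hostname pattern is deliberately loose.

```perl
use strict;
use warnings;

# Smaller qr{} pieces can be interpolated into bigger patterns.
my $octet   = qr{\d{1,3}};                               # 1 to 3 digits
my $ip_patt = qr{^$octet\.$octet\.$octet\.$octet$};      # dotted quad
my $label   = qr{[A-Za-z0-9-]+};
my $fqdn    = qr{^$label(?:\.$label)+$};                 # at least one dot

# Order matters: an IP address would also match $fqdn, so test it first.
my @patterns = ( [ ip => $ip_patt ], [ hostname => $fqdn ] );

# classify() is a hypothetical helper built for this example.
sub classify {
    my ($thing) = @_;
    for my $p (@patterns) {
        my ($name, $re) = @$p;
        return $name if $thing =~ $re;
    }
    return 'unknown';
}

print "$_ is ", classify($_), "\n" for '127.0.0.1', 'samba.org.au', 'hello world';
```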
Perl provides features for you to interact with the operating system. The most common constructs used in such cases are the system function and the qx or ` ` operator. However, perl also has built-in shortcuts for many of these tasks that avoid running O/S commands at all (useful when you want to make scripts portable across different O/S-es). Here are some examples:
chomp( $hostname = qx{ hostname }); print "Host = $hostname\n";
use Sys::Hostname;   #need to run h2ph after install
print "Host = ", hostname, "\n";
system("rm $file");
system("mv $file1 $file2");
However, this is better written as:
unlink($file) or die "can't remove $file: $!\n";
rename($file1, $file2) or die "can't rename: $!\n";
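Core modules cover the remaining shell favorites. Here is a sketch, assuming a Unix system with a writable /tmp. File::Copy provides copy and move (move falls back to copying when rename cannot cross file systems). When you do need an external command, the list form of system bypasses the shell. The file names are hypothetical.

```perl
use strict;
use warnings;
use File::Copy qw(copy move);

my $file1 = "/tmp/demo_src_$$";     # hypothetical file names
my $file2 = "/tmp/demo_dst_$$";

open(my $fh, '>', $file1) or die "can't create $file1: $!\n";
print $fh "hello\n";
close($fh);

# Core module calls: no shell involved, and a real error in $! on failure.
copy($file1, "$file1.bak") or die "copy failed: $!\n";
move($file1, $file2)       or die "move failed: $!\n";

# The list form of system skips the shell, so spaces or metacharacters
# in file names are harmless.
system('ls', '-l', $file2) == 0
    or die "ls exited with status ", $? >> 8, "\n";

unlink($file2, "$file1.bak") == 2 or warn "cleanup incomplete: $!\n";
```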
A daemon is different from normal programs: it should not have a controlling terminal, and it should be immune to signals sent to the shell or program that launched it. If you close all standard Filehandles, the process will still have a controlling terminal. It will also inherit a working directory, which you want to set to / so that it does not keep a file system busy. Here is one way to do it:
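use POSIX qw(setsid);
chdir('/') or die "can't chdir to /: $!";
defined(my $pid = fork) or die "can't fork: $!";
exit if $pid;                      #parent exits; the child carries on
setsid();                          #new session, no controlling terminal
close(STDIN); close(STDOUT); close(STDERR);
#reopen STDIN, STDOUT etc. if needed..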
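The setsid call is imported from the POSIX module (it may not be fully implemented on some O/S-es). setsid() makes the program its own process group leader (and session leader) and detaches it from its controlling terminal. Calling it after fork matters: setsid fails in a process that is already a process group leader, and the forked child never is one.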
Perl also provides direct analogues to the C standard library calls. This way, you don't need to program in C or invoke unix commands to get at data that you would very easily get through the C standard library:
These functions allow you to work with time related values. In list context, localtime() converts a time value, as returned by perl's time() function, into a 9-element list of time attributes in your local timezone. In scalar context, it returns a string much like the output of the unix date command. Here are examples:
($second, $minute, $hour, $month_day, $month, $year, $weekday, $day_of_year, $isdst) = localtime( time );
$date = scalar(localtime);
If you don't provide an argument, localtime uses the result of a time() call as its argument. Two important points about the list context version of localtime concern the month value and the year: the month value goes from 0 through 11! The year value is the number of years since 1900, so it only looks like a two digit year for dates in the 1900s (the year 2000 is returned as 100). But NO, it is NOT a Y2K bug! Thus, to get the full year, you would do something like:
$full_year = (localtime)[5] + 1900;
And yes, perl IS Y2K compliant, as much as your O/S is, though perl programs may not be, depending on how you wrote them.
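Putting the month and year corrections together, here is a sketch that formats a time stamp by hand and, for comparison, with strftime from the POSIX module. iso_local is a hypothetical helper name.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: format an epoch time as YYYY-MM-DD HH:MM:SS in the
# local timezone, correcting the year (offset from 1900) and month (0-11).
sub iso_local {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year) = localtime($epoch);
    return sprintf('%04d-%02d-%02d %02d:%02d:%02d',
                   $year + 1900, $mon + 1, $mday, $hour, $min, $sec);
}

my $now = time;
print "by hand:  ", iso_local($now), "\n";
# POSIX::strftime accepts the same list that localtime returns.
print "strftime: ", strftime('%Y-%m-%d %H:%M:%S', localtime($now)), "\n";
```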
These functions allow you to get the password file/NIS entries from within perl. You can look up a single entry by key with getpwnam (by login name) or getpwuid (by numeric uid). Or you can cycle through the entire list using getpwent.
$root_shell = (getpwuid(0))[8];   #field 8 is the login shell (7 is the home directory)
print "Blech!\n" unless $root_shell =~ /bash/;
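To cycle through every entry, call getpwent until it returns an empty list. Here is a sketch that groups login names by shell, assuming a Unix system. setpwent rewinds the list and endpwent releases it.

```perl
use strict;
use warnings;

# Fields: 0 name, 2 uid, 7 home directory, 8 login shell.
my %by_shell;
setpwent();
while (my @pw = getpwent()) {
    my ($name, $shell) = @pw[0, 8];
    push @{ $by_shell{$shell || '(none)'} }, $name;
}
endpwent();

for my $shell (sort keys %by_shell) {
    printf "%-20s %s\n", $shell, join(' ', sort @{ $by_shell{$shell} });
}
```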
These functions allow you to get at the file meta information. These have similar semantics to the unix system calls of the same name.
use File::stat;
$s = stat("/etc/passwd");
print "/etc/passwd Last modified at: ", scalar(localtime $s->mtime), "\n";
In spite of the above functions being available from within perl, most of us shell out from perl to do things like ``rm'', ``mv'' or ``ln''. In most cases, you don't have to. Here are some examples:
unlink("/tmp/myfile") or die "cannot remove /tmp/myfile: $!\n";
rename("/tmp/oldfile", "/tmp/newfile") or die "cannot rename /tmp/oldfile: $!\n";
Example:
chown 0, 0, '/etc/passwd', '/etc/shadow';
chmod 0600, '/etc/shadow';
Here is an example that finds all text files within the current directory:
opendir(DIR, '.') or die "can't open .: $!\n";
while ( defined($file = readdir(DIR)) ) {
    next unless -T $file;     #works because the names are relative to '.'
    print "text file: $file\n";
}
closedir(DIR);
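readdir returns bare names, so the example above works only because the directory is '.'. Here is a sketch for an arbitrary directory that joins each name to its path before the file tests. text_files is a hypothetical helper, and (stat)[7] is the size field of the 13-element list that stat returns.

```perl
use strict;
use warnings;

# Hypothetical helper: list [name, size] for the text files in a directory.
sub text_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "can't opendir $dir: $!\n";
    my @found;
    while (defined(my $name = readdir($dh))) {
        my $path = "$dir/$name";              # tests need the full path
        next unless -f $path && -T $path;     # plain file that looks like text
        push @found, [ $name, (stat($path))[7] ];
    }
    closedir($dh);
    return sort { $a->[0] cmp $b->[0] } @found;
}

printf "%-30s %8d bytes\n", @$_ for text_files('.');
```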
Regular expressions are powerful tools that match a pattern in a string value. Regular expressions allow us to extract the parts of the input data that are most relevant to us, and also allow us to transform them into any other form we need. If you are familiar with the unix grep command, you have used regular expressions already. Perl's support for regular expressions is built into the core language, so it is fast and flexible. Regular expressions (regexes) are abstractions of general patterns you are looking for, so they can get a bit terse and hairy to read. Perl's regex syntax is, however, rich and supports extensions that allow you to write perfectly readable regexes.
The following Metacharacters allow you to match different types and amount of text:
.          match ANY character (except a newline)
\s, \S     whitespace, non-whitespace
\w, \W     word, non-word character (word = a-zA-Z_0-9)
\d, \D     digit, non-digit
^, $       beginning/end of the string (of each line, with the /m modifier)
*          match zero or more of preceding expression
+          match one or more of preceding expression
?          match zero or one time
{n,m}      match from n to m repetitions of preceding expression
()         grouping
[]         character class (eg. a thru z is [a-z])
|          alternation
$1, $2 ... text matched by each group (available after the match)
For exact descriptions see perlre. For now, we will explain a few of these Metacharacters with examples in the following sections.
The simplest regex is a plain string. If you use it to match something, it will succeed only if your input data contains the exact same string as the regex. However, within your pattern (regex) you can use Metacharacters to match huge amounts of data in a few characters of the regex. Here is a simple example of some entries in a logfile:
Jun 14 22:06:31 indus.fell.com in.ftpd[492]: connect from 146.223.45.6
Jul 13 12:30:07 indus.fell.com in.telnetd[570]: connect from 10.0.15.21
This is a log of telnet/ftp sessions initiated to the machine indus.fell.com, which happens to be a linux box. This is similar to most syslog entries you will encounter, in that you may want to extract different parts of this data for different purposes. Our aim in the following examples is to construct a regular expression that matches three things: the address of the client machine, the service on this server, and the PID of the process that serviced the request. Right now, we are interested only in telnet/ftp connections. We know that the daemons are in.ftpd and in.telnetd. Here is one way to find the client IP address in the second line.
/connect from 10.0.15.21/
Unfortunately, this will only match connections originating from 10.0.15.21 (actually it will also match 1000115021, because an unescaped dot matches any character; we'll see later how to change that). What if you want to match ANY ip address? This is where metacharacters come to the rescue. The metacharacter \d signifies a digit. The next regular expression will match any IP address:
/connect from ([\d\.]+)/
The square brackets allow us to match a class of characters. In our case, the class comprises a digit (\d) and a literal dot (.) character. The plus (+) following this character class asks the expression to match a digit or a literal dot one or more times. Unfortunately, our expression not only matches valid IP addresses but spurious values as well (example: 345.567.890111.11)! In our case, we are sure the logfile will not contain such bogus matches, but in general, we will have to specify the pattern to match as exactly as possible. You also see the entire IP address pattern enclosed within brackets. Why?
Regular expressions just match. However, in practice, you might want a global match out of which you need only a subset of characters for further processing. In such cases, back-references allow you to store parts of matches and retrieve them after a match. This is what makes perl regexes really powerful. Perl stores each submatch enclosed within brackets () in internal variables named $1, $2.. etc.
Back-references allow substitution and data reduction. In the above example of matching an IP address, the bracketed sub-pattern contains the IP address (but only when the whole pattern matches)! Thus, here is one way to make a list of all unique IP addresses that connected to your machine:
my($ip, %connections, $n);
open(MESSAGES, '/var/log/secure') or die("can't open logfile: $!\n");
while ( <MESSAGES> ) {
    next unless /in\.telnetd.+connect from ([\d\.]+)/;   #XXX
    $connections{ $1 }++;
}
close(MESSAGES);
foreach $ip ( keys %connections ) {
    printf("%-15s connected %5d times\n", $ip, $connections{$ip});
}
In the line marked '#XXX' we do two things: we reject all lines that do not seem to have the string in.telnetd in them (the dot character, ., is a metacharacter that matches ANY character; to make a literal match for the ``.'' in in.telnetd we need to prefix it with a backslash to escape the character). Next, we store the IP address on matched lines. The very next line keeps a count of connections keyed by IP address. The IP address is stored in the variable $1 at the end of a successful match, and we use it as the key. At the end, we print out formatted cumulative statistics. Here is the result of the program on a sample machine:
146.225.32.42   connected     2 times
10.0.15.2       connected    21 times
10.0.15.254     connected     1 times
10.0.15.3       connected     4 times
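To print the busiest clients first, sort the hash keys by their counts. The sketch below counts both the ftp and telnet daemons. connection_counts is a hypothetical helper, and the sample lines are modelled on the log excerpt above (the third line is invented for the example).

```perl
use strict;
use warnings;

# Hypothetical helper: count connections per client IP in a list of log
# lines, returning [ip, count] pairs with the busiest client first.
sub connection_counts {
    my %connections;
    for (@_) {
        next unless /in\.(?:telnetd|ftpd)\[\d+\]: connect from ([\d.]+)/;
        $connections{$1}++;
    }
    return map { [ $_, $connections{$_} ] }
           sort { $connections{$b} <=> $connections{$a} or $a cmp $b }
           keys %connections;
}

my @log = (
    'Jun 14 22:06:31 indus.fell.com in.ftpd[492]: connect from 146.223.45.6',
    'Jul 13 12:30:07 indus.fell.com in.telnetd[570]: connect from 10.0.15.21',
    'Jul 13 12:31:40 indus.fell.com in.telnetd[571]: connect from 10.0.15.21',
);
printf "%-15s connected %5d times\n", @$_ for connection_counts(@log);
```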
The printf statement allows us to format our output in a way very similar to the printf() standard library call in C.
The important concept with perl regular expressions is that perl tries ALL possibilities for a match to succeed. This is done through back-tracking and bumping-along which is very similar to what we do when we solve a maze problem: if we hit a wall, we backtrack to the last place where we had a choice of paths. After we backtrack to this point, we abandon our failed path and continue along another. In our example, when we match the subexpression ``in\.telnetd'', perl does something like the following:
The first two characters of the hostname ``indus.fell.com'' match the first two characters of our pattern. However, the next literal character d does NOT match the literal ``.'' in our pattern \.! Now perl doesn't declare a failure at this point! It now tries to bump along to the next character in the target string (which happens to be 'n') and tries the pattern. It fails immediately since the character n does not match our subexpression's first character, i. This happens until it reaches the right place ``in.telnetd''. At this point the first subexpression in\.telnetd matches exactly. Now the regex match proceeds to conclusion because it does succeed for this line.
Perl will not attempt to find all matches in a string. It will stop at the very first match. In addition, even if the pattern will match multiple places, perl will match at the earliest point in the target string. Here is an example:
Writing c-shell scripts is a sure way to go to hell!
If we try to match /hell/ in this example, it would NOT match the last word in the example. It matches right in the middle of ``c-shell'', because that is the earliest place where the match succeeds! This is an important point that will help you avoid spurious matches. How do we match the word ``hell'' in the above example? The pattern /\bhell/ will do. \b is a zero-width assertion that matches only at a word boundary: a position with a \w character on one side and a \W character (or the start or end of the string) on the other. The position between the ``s'' and the ``h'' in ``c-shell'' is not a word boundary, so the match fails there, and the regex match algorithm bumps along until it finds hell :-)
When you specify a + to match multiple characters, perl will match as many characters as it can in the beginning. If later parts of the pattern cause the match to fail, perl will backtrack into the submatch by one character and retry the failed match from the same point. This is best described by an example string and pattern:
STRING:   All that is gold does not grow old
PATTERN1: /old/
PATTERN2: /.+old/
Pattern 1 will match the ``old'' within the word ``gold'' in the string. This follows from the explanation in the previous section. Pattern 2 will however match the sub-pattern ``old'' at the very last word! This is because the + character is greedy. Thus, .+ gobbles up the entire string at the beginning. The sub-pattern ``old'' now fails, so perl backtracks the .+ to contain all but the last character. This fails too. Perl backtracks again, and fails. The next backtracking places the start of match before the ``o'' in ``old''. This matches with the sub-pattern ``old'' and perl reports success. In this case, the ``old'' in the regex matches the last word.
As with other things, regex match in perl returns different values depending on the context in which you match. Here are the general rules:
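scalar context   returns true (1) if the pattern matched, false ('') if not
list context     returns the substrings matched by the groups ($1, $2, ...),
                 or (1) if the pattern has no groups; with /g, returns all matches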
When we introduce brackets in our regex, perl stores the subtext that matched each bracketed sub-expression in the internal variables $1, $2 etc. after every successful match. In a list context, the bracketed matches are also returned as a list. Here is an example:
$_ = 'All that is gold does not grow old';
print "SCALAR: $1\n" if /(.+)old/;
@foo = /(old)(.+old)/;
print "LIST: @foo\n";
prints:
SCALAR: All that is gold does not grow
LIST: old  does not grow old
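List context is also the easiest way to pull several fields out of a line in one statement. With the /g modifier, a list-context match returns every match. Here is a sketch using the ftp log line from earlier:

```perl
use strict;
use warnings;

my $line = 'Jun 14 22:06:31 indus.fell.com in.ftpd[492]: connect from 146.223.45.6';

# List context without /g: the groups come back as a list (empty on failure).
my ($time, $daemon, $pid) = $line =~ /(\d+:\d+:\d+).*?(in\.\w+)\[(\d+)\]/;

# Scalar context: just true or false.
my $matched = ($line =~ /connect from/) ? 'yes' : 'no';

# List context with /g: the group's text from every match in the line.
my @numbers = $line =~ /(\d+)/g;

print "time=$time daemon=$daemon pid=$pid matched=$matched\n";
print "all numbers: @numbers\n";
```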
Here are some basic examples that use some simple patterns to match various things you would commonly extract from input data:
if ( 'One word' =~ /\w+/ ) { print "Matched $&\n"; } #Matched One
$_ = 'One value: +23.45'; if ( /[-+]?\d+/ ) { print "Matched $&\n"; } #"Matched +23"
if ( 12345 =~ /^\d{3,5}$/ ) { print "Number within range\n"; }
$_ = 'brave fools embark on travel through bare desert'; print $& if /foo.*bar/;
#prints "fools embark on travel through bar"
$_ = 'brave fools embark on travel through bare desert'; print $& if /foo.*?bar/;
#prints "fools embar"
/NFS server\s+(\S+)\s+not responding/;   #hostname can be retrieved as $1 (if match succeeds)
With greedy quantifiers in previous subexpressions, a later '*' will match zero times and still report success:
$_ = 'Has a long number 12437';
if ( /(.*)(\d*)/ ) { print "String: $1, number: $2\n"; } #gives "String: Has a long number 12437, number: "
Greediness, backtracking and 'first successful match' combine to produce non-intuitive results, if you're not careful.
$_ = 'Has a long number 12437'; if ( /(.*)(\d+)$/ ) { print "String: $1, number: $2\n"; } #gives "String: Has a long number 1243, number: 7" !!
The above expression is better written as
if ( /(.*?)(\d+)$/ ) { print "String: $1, number: $2\n"; }
$_ = 'your food is in the bar under the barn'; if ( /foo(.*)bar/ ) { print "matched: $1\n";} #gives "matched: d is in the bar under the"
Here is the complete specification for a perl regex match operation:
m/expr/gsimox;
You can choose to leave out the m (which stands for match, by the way) and just use /pattern/ which is what you normally do. However, perl allows you to use ANY character as the pattern delimiter, and allows you to write the regex in a more readable manner. Here are some regexes, all of which match the same pattern: finding the directory name of a file.
1. /(\/[^\s]+)\/[^\/\s]+/;
2. m,(/[^\s]+)/[^/\s]+,;
3. m{ (/[^\s]+)   #a slash followed by any non space character
      /           #start of filename part
      [^/\s]+     #a filename (assume no spaces in the filename)
    }x;
As we see from regex 1, match patterns can be very hairy. The reason we had all those leaning toothpicks (\/) is that the pattern was delimited by a /. In such cases, if you want to match a literal forward slash, you need to escape it with the \ character. Regex 2 is clearer because it uses commas to delimit the pattern. Thus, you don't have to escape the /. Even after this substantial improvement in readability, the pattern looks difficult. Regex 3 is probably the easiest for humans to parse. We don't offer any explanation, as it is self-evident. See below for more details on the /x modifier. With such powerful constructs perl allows you to match almost any type of pattern (nested patterns are one exception).
However, a match is not the only reason to use a regex. Once you perform a match, you can actually substitute whatever you matched, with anything else you may want to change it to. Here is the spec for the regex Substitution operator:
s{expr}{replacement}egsimox;
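The modifiers g, i, m, o, s and x specify different ways in which the match can be directed: g acts on every occurrence instead of just the first, i ignores case, m lets ^ and $ match at the start and end of each line, o compiles the pattern only once, s lets . match a newline too, and x allows whitespace and comments within the pattern. The one additional modifier you see is the /e modifier, which evaluates the replacement as perl code. Here are examples that illustrate some of them: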
$_ = "The path to my magic scripting language is /usr/bin/awk\n";
s{/(awk|sed|sh|csh|bash|ed)\b}{/perl};
print;
This prints ``The path to my magic scripting language is /usr/bin/perl''.
$val = 'something'; $new = 'somthinels';
while ( <> ) {
    print if s/$val/$new/o;   #/o: interpolate $val and compile the pattern only once
}
Perl version 5 introduced the ability to include arbitrary comments within a regex by specifying the x modifier. This allows you to write crystal clear regexes that you would otherwise have a hard time understanding on second glance. We have seen this in an example above. Here is another, more hairy example:
/^\w+\s+\d+\s+[\d:]+\s+.+?(in\.\w+)\[\d+\]:\s+connect\s+from\s+([\d\.]+)$/
Better written as:
m{
    ^\w+\s+\d+          #month and day
    \s+ [\d:]+          #time
    \s+ .+?             #ignore junk
    (in\.\w+)           #get the service daemon that was connected to
    \[\d+\]             #the PID within []
    :\s+connect\s+from\s+
    ([\d\.]+)$          #the originating client IP
}x;
The clarity that you get with the /x modifier is well worth the extra lines of code.
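To check the /x version of the pattern, you can run it against a sample log line. The line below is made up to resemble an inetd/TCP wrapper entry; your syslog format may differ.

```perl
use strict;
use warnings;

# Hypothetical syslog line for illustration only.
my $log = 'Jul 18 23:19:52 vtc in.telnetd[1234]: connect from 10.0.0.1';

my ($service, $client) = $log =~ m{
    ^\w+\s+\d+          # month and day
    \s+ [\d:]+          # time
    \s+ .+?             # host name and other junk
    (in\.\w+)           # the service daemon
    \[\d+\]             # the PID within []
    :\s+connect\s+from\s+
    ([\d\.]+)$          # the originating client IP
}x;

print "$service connection from $client\n";
# prints "in.telnetd connection from 10.0.0.1"
```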
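After a successful match, perl sets three special variables: $` holds the text before the match, $& the matched text itself, and $' the text after it. Example:
if ( 'Pre match Post' =~ /\s+match\s+/ ) {
    print "Pre match: $`\n";    #prints "Pre"
    print "Match : $&\n";       #prints " match "
    print "Post match: $'\n";   #prints "Post"
}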
The /e modifier allows you to substitute a matched pattern with the results of perl code within the substitution string! This is very powerful. Here is a simple example:
$_ = '2 candies at 35 cents = ';
s{
    (\d+)\D+(\d+)    #get numbers
    .*$
}{
    $& .             #append to end
    ($1 * $2) . ' cents'
}ex;
print;    #prints "2 candies at 35 cents = 70 cents"
Here is another example: if you want to change the IP addresses of several hosts, and you have a table mapping each old IP address to its new one, here is a simple way to do it:
%new_ip = ( '10.0.0.1'      => '10.1.1.1',
            '192.168.100.2' => '172.16.45.2' );
@old = ('10.0.0.1', '10.3.14.3', '192.168.100.2', '192.168.100.3');
@new = @old;    #foreach aliases $_ to each element, so work on a copy
foreach ( @new ) {
    #replace the IP if it is in the table, and flag the change
    s/([\d\.]+)/$new_ip{$1} ? $new_ip{$1} . ' <--- ' : $1/e;
}
print join("\n", @new), "\n";
This code snippet prints:
10.1.1.1 <---
10.3.14.3
172.16.45.2 <---
192.168.100.3
We have crafted the replacement to append ``<---'' to each changed address, so you can see clearly where the changes took place.
This brief introduction to regular expressions should help you craft simple regular expressions. For more details, consult perlre or the regular expressions book listed in Further reading.
Perl allows you to write free-form code, just like any other language. However, if you write large programs, or programs that behave in a variety of ways, you will want to group similar tasks together and re-use the same code fragments over and over again. Perl subroutines are designed for this type of abstraction.
Subroutines are perl's way of dividing a problem into manageable chunks. They are close analogues of functions in C. You define a subroutine in perl as follows:
sub my_sub_name {
    my(@arguments) = @_;
    #statements;
}
Subroutines in perl differ from equivalent concepts in other languages in two important respects: they take a variable number of arguments, and the arguments are NOT named. In addition, subroutines in perl can return anything they want (a scalar, a list or nothing).
If a subroutine does not explicitly return a value, and the calling statement or expression uses the subroutine in a context requiring a return value, the subroutine's LAST evaluated expression becomes the return value. Here is an example:
sub sum_two_numbers { $_[0] + $_[1]; }
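Here is a sketch of both properties together: a hypothetical min_max subroutine accepts any number of arguments and returns a two-element list through its last evaluated expression, with no explicit return.

```perl
use strict;
use warnings;

# Hypothetical helper: any number of numeric arguments in,
# a (minimum, maximum) list out.
sub min_max {
    my @sorted = sort { $a <=> $b } @_;
    ($sorted[0], $sorted[-1]);    # last evaluated expression is the return value
}

my ($lo, $hi) = min_max(7, 3, 42, 15);
print "min=$lo max=$hi\n";    # prints "min=3 max=42"
```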
All parameters passed to the subroutine arrive in the @_ array. However, these parameters are not copied: each element of @_ is an alias for the corresponding argument, so assigning to $_[0] directly changes the original variable in the caller.
To prevent this, and to work on private copies of the parameters, copy them into variables declared with 'my':
sub sum_two { my($arg1, $arg2) = @_; return $arg1 + $arg2; }
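A quick way to see the difference between aliasing and copying is to run both styles side by side. The two subroutine names below are made up for illustration.

```perl
use strict;
use warnings;

# $_[0] is an alias for the caller's variable, so this changes it.
sub double_in_place {
    $_[0] *= 2;
}

# Copying @_ into a my variable breaks the link to the caller.
sub double_copy {
    my ($n) = @_;
    $n *= 2;
    return $n;
}

my $x = 21;
double_in_place($x);        # $x is now 42

my $y = 21;
my $z = double_copy($y);    # $y stays 21, $z is 42

print "x=$x y=$y z=$z\n";   # prints "x=42 y=21 z=42"
```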
Dynamic scoping (local): variables that you do not declare with my are package (global) variables, accessible to the entire program/package. Applying local to one of them saves its current value and restores it when the enclosing block exits; until then, every subroutine called from that block sees the temporary value. Subroutines may overwrite globals, causing values to change in unpredictable ways, and globals used all over a program tend to have non-intuitive consequences. Data is not protected when you use dynamically scoped variables.
Lexical scoping (my) increases data privacy. When you declare a variable with the my operator, perl creates a new variable whose scope is the closest enclosing block. No other block can access it unless its value is passed as an argument. There are a few cases where local is unavoidable, such as temporarily changing perl's special variables like $/. In fact, the entire module export mechanism in perl is built on package (global) variables and the symbol table, which lexicals cannot provide.
Using my is almost always better than local.
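The following sketch shows the practical difference: a value set with local is visible to every subroutine called while it is in effect, while a my variable is invisible outside its own block. The variable and subroutine names are hypothetical.

```perl
use strict;
use warnings;

our $level = 'global';    # a package (global) variable

# Reads whatever value the global has at call time.
sub show_level { return $level }

sub with_local {
    local $level = 'local';    # temporarily replaces the global's value
    return show_level();       # called subs see 'local'
}

sub with_my {
    my $level = 'lexical';     # a new variable, private to this block
    return show_level();       # called subs still see 'global'
}

print with_local(), "\n";      # local
print with_my(), "\n";         # global
print show_level(), "\n";      # global: local restored the value on exit
```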
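#return the classful network address of an IP address
sub get_net {
    my($ip) = shift;
    #avoid naming lexicals $a and $b: sort uses those globals
    my($o1, $o2, $o3) = split /\./, $ip;
    return "$o1.0.0.0"   if $o1 < 128;    #class A
    return "$o1.$o2.0.0" if $o1 < 192;    #class B
    return "$o1.$o2.$o3.0";               #class C and above
}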
use IO::File;
my $fh = IO::File->new('/var/adm/messages', 'r')
    or die "/var/adm/messages: $!\n";
while ( <$fh> ) {
    next unless /SYSERR/;
    print "Mailer error: $_";    #$_ holds the matching log line
}
$fh->close;
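use File::Find;
$MAXSIZE = 1000000;    #bytes
$MAXAGE  = 5;          #days since last modification
sub wanted {
    return unless -f $_;    #only consider plain files
    return unless ( -s $_ > $MAXSIZE || -M $_ > $MAXAGE );
    print "Purging file: $File::Find::name\n";
    unlink $_;              #find() has already chdir'ed into the directory
}
find( \&wanted, "/var/tmp", "/usr/local/tmp");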
use File::stat;
$s = stat('/my/file') or die "/my/file: $!\n";
print "File size = ", $s->size, "\n";
print "Inode = ", $s->ino, "\n";
use Net::Ping;
$p = Net::Ping->new;    #ICMP pings (Net::Ping->new('icmp')) need root access
if ( ! $p->ping("vtc.teamtaos.com", 2) ) {    #2 second timeout
    print "Taos vtc is down!\n";
}
use Net::DNS::Resolver;
$res = Net::DNS::Resolver->new;
$query = $res->search('vtc.teamtaos.com');
if ($query) {
    foreach $rr ($query->answer) {
        next unless $rr->type eq "A";    #only address records
        print $rr->address, "\n";
    }
}
prints: 207.33.46.3 (as of Sun Jul 18 23:19:52 PDT 1999)
use Net::FTP;
$ftp = Net::FTP->new("ftp.cdrom.com") or die "Cannot connect: $@\n";
$ftp->login("anonymous", "me\@taos.com") or die "Login failed: ", $ftp->message;
$ftp->cwd("/pub/perl/CPAN");
$ftp->get("README.html");
$ftp->quit;
use LWP::Simple; $content = get('http://www.linux.org') || '';
use MIME::Base64 qw/decode_base64/;
$doc = '...';    #get_the_mime_encoded_part
$realdoc = decode_base64($doc);
print SOME_MSDOC_FH $realdoc;
use Text::Wrap qw(fill $columns);
$Text::Wrap::tabstop = 4;    #$tabstop is not in Text::Wrap's export list
$columns = 72;
print fill("\t", "", `cat /tmp/dead.letter`);
use Mail::Mailer qw(sendmail);    #make sendmail the default mailer
$mailer = Mail::Mailer->new;
my %headers = ( 'To' => 'me@taos.com', 'From' => 'me@taos.com' );
$headers{'Subject'} = "testing";
$mailer->open(\%headers);
print $mailer "This is a test\n\n";
$mailer->close;
To illustrate the fact that there's more than one way to do it in perl, we will take a very simple example: given some IP addresses, sort them by network and host number. The approaches described here are not the only ones; they were chosen for their gradation in algorithmic complexity and to show how easily you can grow your algorithms as you go.
Let us take the following list as a test example to be sorted:
@ip = ('223.1.3.4', '127.0.0.1', '192.168.100.1', '223.1.3.1');
The sorted output should look like:
127.0.0.1 192.168.100.1 223.1.3.1 223.1.3.4
The perl sort function accepts an optional subroutine name, subroutine reference or block, which it calls every time it needs to compare two elements of the input list. The subroutine may do anything you like, as long as it follows two rules: the two elements to compare are available to it as the package variables $a and $b, and it must return a negative number, zero or a positive number depending on whether $a should sort before, together with or after $b.
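The comparison rule is easiest to see with sort's block form, which works the same way as a named subroutine. Without a comparison, sort compares elements as strings; the sample numbers here are arbitrary.

```perl
use strict;
use warnings;

my @nums = (10, 9, 100, 1);

# Default sort compares as strings: 1 10 100 9
my @as_strings = sort @nums;

# The block sees the two elements as $a and $b; <=> returns
# -1, 0 or 1, which is exactly what sort expects.
my @as_numbers = sort { $a <=> $b } @nums;

print "@as_strings\n";    # 1 10 100 9
print "@as_numbers\n";    # 1 9 10 100
```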
This method uses the standard split function to extract the individual numbers that make up each IP address, then compares the corresponding bytes numerically. Because the or operator short-circuits, the comparison stops at the very first byte that differs.
sub numeric {
    my($a1, $a2, $a3, $a4) = split /\./, $a;
    my($b1, $b2, $b3, $b4) = split /\./, $b;
    $a1 <=> $b1 or $a2 <=> $b2 or $a3 <=> $b3 or $a4 <=> $b4;
}
@result = sort numeric @ip;
print "Sorted: @result\n";
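The pack function in perl lets you compact values into a tight binary string which you can unpack later. This conserves space AND gains a measure of efficiency in passing data around. Here, pack('C4', ...) turns each address into a four-byte string, most significant byte first, so a single string comparison (cmp) orders the addresses numerically.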
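To see why packing is needed, compare a plain string comparison of two addresses with a comparison of their packed forms. The two sample addresses are arbitrary.

```perl
use strict;
use warnings;

my ($x, $y) = ('9.255.255.255', '10.0.0.1');

# As plain strings, '9' sorts after '1', so 9.255.255.255
# wrongly appears to come after 10.0.0.1.
my $string_order = $x cmp $y;    # 1

# pack('C4', ...) yields four bytes, most significant first,
# so byte-wise string comparison matches numeric order.
my $packed_order = pack('C4', split /\./, $x) cmp pack('C4', split /\./, $y);    # -1

print "string: $string_order packed: $packed_order\n";
```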
sub packed {
    pack('C4', split(/\./, $a)) cmp pack('C4', split(/\./, $b));
}
@result = sort packed @ip;
print "Sorted: @result\n";
This is the same idea as above, but it caches the packed form of each IP address it has already seen, so each address is split and packed only once instead of on every comparison. This optimization will save you computation time when you have large sets of elements to sort.
{
    my %cache;
    sub cached {
        ($cache{$a} ||= pack('C4', split /\./, $a))
            cmp
        ($cache{$b} ||= pack('C4', split /\./, $b));
    }
}
@result = sort cached @ip;
print "Sorted: @result\n";
As mentioned before, this document is merely a primer. If you need to go deeper, the following resources will help you greatly.
perl manual pages: perlfaq (perldoc perlfaq)
`picking up perl' (http://www.ebb.org/PickingUpPerl/)
man perlstyle (for style issues)
Learning Perl (Randal Schwartz, Tom Christiansen)
Programming Perl (Larry Wall, Tom Christiansen, Randal Schwartz)
Perl in a Nutshell
Perl Cookbook (Tom Christiansen and Nathan Torkington)
Perl, the programmer's companion (Nigel Chapman)
comp.lang.perl.misc, comp.lang.perl.moderated
Perl home page: http://www.perl.com/
CPAN multiplexer: http://www.perl.com/CPAN
The Perl Journal: http://tpj.com/
Apache perl: http://perl.apache.org/
The perl oasis: http://www.oasis.leo.org/perl/00-index.html
Randal's columns: http://www.stonehenge.com/merlyn/UnixReview
Official guide to CGI programming
Scripting Languages (http://www.scriptics.com/people/john.ousterhout/scripting.html)
Larry Wall's interview (http://webreview.com/wr/pub/97/02/28/feature/index.html)
FMTEYEWTK (http://language.perl.com/info/documentation.html)