Website Technologies: Perl

In computer programming, Perl is a high-level, general-purpose, interpreted, dynamic programming language. Larry Wall, a linguist then working as a programmer at Unisys, originally developed Perl in 1987 as a general-purpose Unix scripting language to make report processing easier.[1][2] Since then, it has undergone many changes and revisions and has become widely popular among programmers. Larry Wall continued to oversee development of the core language and of its successor, Perl 6, which was renamed Raku in 2019.

Perl borrows features from other programming languages, including C, shell scripting (sh), AWK, sed and Lisp.[3] The language provides powerful text-processing facilities without the arbitrary data-length limits of many Unix tools of the time,[4] which makes it well suited to manipulating text files. It is also used for graphics programming, system administration, network programming, applications that require database access, and CGI programming on the Web. Because of its flexibility and adaptability, Perl is nicknamed "the Swiss Army chainsaw of programming languages".[5]

Overview

Perl is a general-purpose programming language originally developed for text manipulation and now used for a wide range of tasks including system administration, web development, network programming, GUI development, and more.

The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant, minimal).[16] Its major features include support for multiple programming paradigms (procedural, object-oriented, and functional styles), reference counting memory management (without a cycle-detecting garbage collector), built-in support for text processing, and a large collection of third-party modules.

According to Larry Wall, Perl has two slogans. The first is "There's more than one way to do it", commonly known as TMTOWTDI; the second is "Easy things should be easy and hard things should be possible".[17]

Features

The overall structure of Perl derives broadly from C. Perl is procedural in nature, with variables, expressions, assignment statements, brace-delimited code blocks, control structures, and subroutines.

Perl also takes features from shell programming. All variables are marked with leading sigils, which unambiguously identify the data type (scalar, array, hash, etc.) of the variable in context. Importantly, sigils allow variables to be interpolated directly into strings. Perl has many built-in functions that provide tools often used in shell programming, such as sorting and calling on system facilities, even though in the shell itself many of these tools are implemented by external programs.
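As a minimal sketch of how sigils work (the variable names are hypothetical), three variables can share one name while the sigil selects which one is meant, and all three can be interpolated into double-quoted strings:

```perl
use strict;
use warnings;

# Three variables share the name "item"; the sigil tells Perl which one is meant.
my $item = 'apple';                # scalar
my @item = ('pear', 'plum');       # array
my %item = (color => 'green');     # hash

# Sigils let variables be interpolated directly into double-quoted strings.
my $scalar_msg = "Scalar: $item";
my $array_msg  = "Array: @item";           # elements joined with a space by default
my $hash_msg   = "Hash value: $item{color}";

print "$scalar_msg\n$array_msg\n$hash_msg\n";
```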

Perl takes lists from Lisp, associative arrays (hashes) from AWK, and regular expressions from sed. These simplify and facilitate many parsing, text handling, and data management tasks.
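The following is a small illustrative sketch (the input sentence is made up) that combines the three borrowed features: a hash to count words, a regular expression to extract matches, and a sorted list of the distinct words:

```perl
use strict;
use warnings;

# Hypothetical input text; the words and counts are illustrative only.
my $text = "the cat sat on the mat with the hat";

# A hash (AWK-style associative array) maps each word to its count.
# split ' ' splits on runs of whitespace.
my %count;
$count{$_}++ for split ' ', $text;

# A regular expression (sed-style) collects every word ending in "at".
my @at_words = $text =~ /\b(\w+at)\b/g;

# A list (Lisp-style) of the distinct words, sorted alphabetically.
my @distinct = sort keys %count;

print "the appears $count{the} times\n";
print "words ending in 'at': @at_words\n";
print "distinct words: @distinct\n";
```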

In Perl 5, features were added that support complex data structures, first-class functions (i.e., closures as values), and an object-oriented programming model. These include references, packages, class-based method dispatch, and lexically scoped variables, along with compiler directives (for example, the strict pragma). A major additional feature introduced with Perl 5 was the ability to package code as reusable modules. Larry Wall later stated that "The whole intent of Perl 5's module system was to encourage the growth of Perl culture rather than the Perl core."[18]
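First-class functions and lexical scoping together give closures. In this minimal sketch, make_counter is a hypothetical helper that returns an anonymous subroutine capturing its own private counter:

```perl
use strict;
use warnings;

# make_counter is a hypothetical helper: it returns an anonymous subroutine
# (a closure) that captures its own private lexical $count.
sub make_counter {
    my ($start) = @_;
    my $count = $start;
    return sub { return $count++ };    # returns the current value, then increments
}

my $from_zero = make_counter(0);
my $from_ten  = make_counter(10);

# Each closure keeps its own state between calls.
$from_zero->() for 1 .. 3;       # returns 0, 1, 2; the counter is now 3
my $next_a = $from_zero->();     # 3
my $next_b = $from_ten->();      # 10, unaffected by the first counter

print "$next_a $next_b\n";
```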

All versions of Perl do automatic data typing and memory management. The interpreter knows the type and storage requirements of every data object in the program; it allocates and frees storage for them as necessary using reference counting (so it cannot deallocate circular data structures without manual intervention). Legal type conversions—for example, conversions from number to string—are done automatically at run time; illegal type conversions are fatal errors.
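The "manual intervention" for circular structures is commonly a weak reference. This sketch uses weaken from the core Scalar::Util module; the Node class is hypothetical and exists only to count how many objects get freed:

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my $freed = 0;    # counts how many Node objects have been destroyed

# Node is a hypothetical class used only to observe deallocation.
{
    package Node;
    sub new     { my ($class) = @_; return bless {}, $class }
    sub DESTROY { $freed++ }
}

# Case 1: a strong cycle. Each node holds a counted reference to the other,
# so neither count reaches zero when the variables go out of scope.
{
    my $first  = Node->new;
    my $second = Node->new;
    $first->{peer}  = $second;
    $second->{peer} = $first;
}
my $freed_after_cycle = $freed;    # 0: both nodes leaked

# Case 2: the same cycle with one link weakened. A weak reference does not
# add to the count, so both nodes are freed at the end of the block.
{
    my $first  = Node->new;
    my $second = Node->new;
    $first->{peer}  = $second;
    $second->{peer} = $first;
    weaken($second->{peer});
}
my $freed_after_weak = $freed;     # 2

print "strong cycle: $freed_after_cycle freed; weakened cycle: $freed_after_weak freed\n";
```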

Language structure

In Perl, the minimal Hello world program may be written as follows:
print "Hello, world!\n"

This prints the string Hello, world! and a newline, symbolically expressed by an n character whose interpretation is altered by the preceding escape character (a backslash).

The canonical form of the program is slightly more verbose:
#!/usr/bin/perl
print "Hello, world!\n";

The hash mark character introduces a comment in Perl, which runs up to the end of the line of code and is ignored by the compiler. The comment used here is of a special kind: it’s called the shebang line. This tells Unix-like operating systems where to find the Perl interpreter, making it possible to invoke the program without explicitly mentioning perl. (Note that on Microsoft Windows systems, Perl programs are typically invoked by associating the .pl extension with the Perl interpreter. In order to deal with such circumstances, perl detects the shebang line and parses it for switches,[24] so it is not strictly true that the shebang line is ignored by the compiler.)

The second line in the canonical form includes a semicolon, which is used to separate statements in Perl. With only a single statement in a block or file, a separator is unnecessary, so it can be omitted from the minimal form of the program—or more generally from the final statement in any block or file. The canonical form includes it because it is common to terminate every statement even when it is unnecessary to do so, as this makes editing easier: code can be added to or moved away from the end of a block or file without having to adjust semicolons.

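Version 5.10 of Perl introduced a say function that implicitly appends a newline character to its output. Because say is an optional feature, it must be enabled first, for example with use v5.10; at the top of the program (or, for one-liners, by running perl with the -E switch instead of -e):
use v5.10;
say 'Hello, world!'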

Data types

Perl has a number of fundamental data types. The most commonly used and discussed are scalars, arrays, hashes, filehandles and subroutines:
A scalar is a single value; it may be a number, a string or a reference.
An array is an ordered collection of scalars.
A hash, or associative array, is a map from strings to scalars; the strings are called keys and the scalars are called values.
A filehandle refers to a file, device, or pipe that is open for reading, writing, or both.
A subroutine is a piece of code that may be passed arguments, be executed, and return data.

Most variables are marked by a leading sigil, which identifies the data type being accessed (not the type of the variable itself); filehandles are the exception and have no sigil. The same name may be used for variables of different data types without conflict.
$foo # a scalar
@foo # an array
%foo # a hash
FOO # a file handle
&FOO # a constant (but the & is optional)
&foo # a subroutine (but the & is optional)

Filehandles and constants need not be uppercase, but uppercase is a common convention because there is no sigil to mark them. Both are global in scope, but filehandles are interchangeable with references to filehandles, which can be stored in scalars, which in turn permit lexical scoping. Doing so is encouraged in Damian Conway's Perl Best Practices. As a convenience, the open function in Perl 5.6 and newer autovivifies an undefined scalar into a filehandle reference.
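A minimal sketch of a lexical filehandle, assuming Perl 5.8 or newer: opening a reference to a string gives an in-memory filehandle, so the example needs no file on disk. The data is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical data held in a string; opening a reference to a scalar
# creates an in-memory filehandle.
my $data = "alpha\nbeta\ngamma\n";

my @lines;
{
    # open autovivifies the undefined lexical $fh into a filehandle reference.
    # Because $fh is lexical, the handle would also be closed automatically
    # when it goes out of scope at the end of this block.
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        push @lines, $line;
    }
    close $fh;
}

print join(',', @lines), "\n";
```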

Numbers are written in the bare form; strings are enclosed by quotes of various kinds.
$name = "joe";
$color = 'red';

$number1 = 42;
$number2 = '42';

# This evaluates to true
if ($number1 == $number2) { print "Numbers and strings of numbers are the same!"; }

$answer = "The answer is $number1"; # Variable interpolation: The answer is 42
$price = 'This device costs $42'; # No interpolation in single quotes

$album = "It's David Bowie's \"Heroes\""; # literal quotes inside a string;
$album = 'It\'s David Bowie\'s "Heroes"'; # same as above with single quotes;
$album = q(It's David Bowie's "Heroes"); # the quote-like operators q() and qq() allow
# almost any delimiter instead of quotes, to
# avoid excessive backslashing

$multilined_string = <<EOF;
This is my multilined string
note that I am terminating it with the "EOF" word.
EOF

Perl will convert strings into numbers and vice versa depending on the context in which they are used. In the following example the variables $n and $m hold strings, but they are treated as numbers when they are the arguments to the addition operator. This code prints the number '5', discarding non-numeric information for the operation, although the variable values remain the same; with use warnings enabled, Perl also reports that the arguments are not numeric. (The string concatenation operator is the period, not the + symbol.)
$n = '3 apples';
$m = '2 oranges';
print $n + $m;
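To contrast numeric and string context on the same two values, here is a small sketch; the numeric warning is switched off only inside one block so the addition runs quietly under use warnings:

```perl
use strict;
use warnings;

my $n = '3 apples';
my $m = '2 oranges';

my $sum;
{
    # Only the leading digits are used and the rest is discarded. Under
    # "use warnings" this normally emits "Argument ... isn't numeric", so the
    # warning is silenced for this block only.
    no warnings 'numeric';
    $sum = $n + $m;        # numeric context: 3 + 2
}
my $joined = $n . $m;      # string context: the period concatenates

print "$sum\n";            # 5
print "$joined\n";         # 3 apples2 oranges
```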

Perl also has a boolean context that it uses in evaluating conditional statements. The following values all evaluate as false in Perl:
$false = 0; # the number zero
$false = 0.0; # the number zero as a float
$false = 0b0; # the number zero in binary
$false = 0x0; # the number zero in hexadecimal
$false = '0'; # the string zero
$false = ""; # the empty string
$false = undef; # the return value from undef

All other values evaluate to true. This includes the odd self-describing literal string "0 but true", which is 0 as a number but true when used as a boolean. (Any non-numeric string would also have this property, but this particular string is exempted by Perl from numeric warnings.) A less explicit but more conceptually portable version of this string is '0E0' or '0e0', which does not rely on characters being evaluated as 0, because '0E0' is literally "zero times ten to the zeroth power."
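The rules above can be checked directly. This sketch evaluates a handful of values in boolean context; note that the strings '0.0' and '00' are true, because only the exact string '0' and the empty string are false:

```perl
use strict;
use warnings;

# Each entry pairs a display label with a value to test.
my @cases = (
    [ 'number 0'            => 0 ],
    [ 'float 0.0'           => 0.0 ],
    [ "string '0'"          => '0' ],
    [ 'empty string'        => '' ],
    [ 'undef'               => undef ],
    [ "string '0.0'"        => '0.0' ],
    [ "string '00'"         => '00' ],
    [ "string '0 but true'" => '0 but true' ],
    [ "string '0E0'"        => '0E0' ],
);

my %is_true;
for my $case (@cases) {
    my ($label, $value) = @$case;
    $is_true{$label} = $value ? 1 : 0;
    printf "%-20s %s\n", $label, $is_true{$label} ? 'true' : 'false';
}

# '0 but true' is exempt from the "isn't numeric" warning and numifies to 0.
my $zero = '0 but true' + 0;
```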

Evaluated boolean expressions also return scalar values. Although the documentation does not promise which particular true or false value is returned (and thus it cannot be relied on), many boolean operators return 1 for true and the empty string for false (which evaluates to zero in a numeric context). The defined() function reports whether a value is anything other than undef; in the examples above, defined($false) is true for every value except undef.

If a result of specifically 1 or 0 (as in C) is needed, some authors consider an explicit conversion necessary:
my $real_result = $boolean_result ? 1 : 0;

However, if the value is known to be either 1 or undef, an implicit numeric conversion can be used instead (adding 0 to undef does emit a "Use of uninitialized value" warning under use warnings):
my $real_result = $boolean_result + 0;
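A short sketch of what comparison operators actually return and how the explicit conversion normalizes arbitrary values; as_c_bool is a hypothetical helper name, not a built-in:

```perl
use strict;
use warnings;

my $t = (5 > 3);    # a true comparison returns 1
my $f = (5 < 3);    # a false comparison returns the special false value

# The built-in false value is the empty string in string context and 0 in
# numeric context; using it as a number does not trigger a warning.
my $f_as_string = "[$f]";    # "[]"
my $f_as_number = $f + 0;    # 0

# as_c_bool is a hypothetical helper applying the explicit conversion.
sub as_c_bool {
    my ($value) = @_;
    return $value ? 1 : 0;
}

my @normalized = map { as_c_bool($_) } $t, $f, undef, 'yes';    # (1, 0, 0, 1)
print "@normalized\n";
```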

A list is written by listing its elements, separated by commas, and enclosed by parentheses where required by operator precedence.
@scores = (32, 45, 16, 5);

It can be written many other ways as well, some straightforward and some less so:
# An explicit and straightforward way
@scores = ('32', '45', '16', '5');

# Equivalent to the above, but the qw() quote-like operator saves typing of
# quotes and commas and reduces visual clutter; almost any delimiter can be
# used instead of parentheses
@scores = qw(32 45 16 5);

# The split function returns a list of strings, breaking the expression
# wherever the regex separator matches.
# This may be useful for reading simple comma-separated values (CSV);
# fields containing quoted commas need a dedicated CSV parser.
@scores = split /,/, '32,45,16,5';

# It's also possible to use a postfix for loop, which aliases the
# $_ magic variable to the next value of the list during each
# iteration; this is pointless here, but similar idioms are widely used
# in some circumstances. The array is emptied first so that push does
# not append to the values assigned above.
@scores = ();
push @scores, $_ foreach 32, 45, 16, 5;
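As a quick check that these constructions agree, this sketch builds the list all five ways and compares each result with the first, element by element, using numeric equality (so '32' and 32 compare equal):

```perl
use strict;
use warnings;

my @literal    = (32, 45, 16, 5);
my @quoted     = ('32', '45', '16', '5');
my @from_qw    = qw(32 45 16 5);
my @from_split = split /,/, '32,45,16,5';
my @pushed;
push @pushed, $_ foreach 32, 45, 16, 5;

# Compare each construction with @literal: same length, same values.
my $all_equal = 1;
for my $list (\@quoted, \@from_qw, \@from_split, \@pushed) {
    $all_equal = 0 unless @$list == @literal;    # lengths in scalar context
    for my $i (0 .. $#literal) {
        $all_equal = 0 unless $list->[$i] == $literal[$i];
    }
}
print $all_equal ? "all five lists match\n" : "mismatch\n";
```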

A hash may be initialized from a list of key/value pairs:
%favorite = (
    joe => 'red',
    sam => 'blue'
);

The => operator is equivalent to a comma, except that it assumes quotes around the preceding token if it is a bare identifier: (joe => 'red') is the same as ('joe' => 'red'). It can therefore be used to elide quote marks, improving readability.
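A minimal sketch showing that the fat comma only affects quoting: both lists below are the same flat four-element list, and either can initialize a hash:

```perl
use strict;
use warnings;

my @with_fat_comma = (joe => 'red', sam => 'blue');
my @with_quotes    = ('joe', 'red', 'sam', 'blue');

# Both produce the same flat four-element list.
my $same = join(',', @with_fat_comma) eq join(',', @with_quotes);

my %favorite = @with_fat_comma;          # pairs become key/value entries
print "joe likes $favorite{joe}\n";      # joe likes red
```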

Individual elements of an array are accessed by providing a numerical index, in square brackets. Individual values in a hash are accessed by providing the corresponding key, in curly braces. The $ sigil identifies the accessed element as a scalar.
$scores[2] # an element of @scores
$favorite{joe} # a value in %favorite

Thus, a hash can also be specified by setting its keys individually:
$favorite{joe} = 'red';
$favorite{sam} = 'blue';

Multiple elements may be accessed by using the @ sigil instead (identifying the result as a list).
@scores[2, 3, 1] # three elements of @scores
@favorite{'joe', 'sam'} # two values in %favorite
@favorite{qw(joe sam)} # same as above
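The single-element and slice forms can be combined in one self-contained sketch, using the same example data as above; it also shows that a negative index counts from the end of the array:

```perl
use strict;
use warnings;

my @scores   = (32, 45, 16, 5);
my %favorite = (joe => 'red', sam => 'blue');

my $third     = $scores[2];                 # 16: indexes start at 0
my $last      = $scores[-1];                # 5: negative indexes count from the end
my @picked    = @scores[2, 3, 1];           # (16, 5, 45)
my @colors    = @favorite{'joe', 'sam'};    # ('red', 'blue')
my @colors_qw = @favorite{qw(joe sam)};     # same as above

print "$third $last | @picked | @colors\n";
```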

The number of elements in an array can be obtained by evaluating the array in scalar context. The $#array syntax, by contrast, gives the index of the last element in the array, not the number of elements.
$count = @friends; # Assigning to a scalar forces scalar context

$#friends; # The index of the last element in @friends
$#friends+1; # Usually the number of elements in @friends is one more
# than $#friends because the first element is at index 0,
# not 1. (Old code could change the base index through the
# $[ variable; this was long discouraged, and Perl 5.30
# made assigning a non-zero value to $[ a fatal error.)
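A small sketch with a hypothetical @friends array, showing scalar context, the explicit scalar() function, $#friends, and the fact that assigning to $#friends resizes the array:

```perl
use strict;
use warnings;

# Hypothetical list of names.
my @friends = qw(ann bob cy);

my $count      = @friends;            # 3: array in scalar context
my $also_count = scalar(@friends);    # 3: scalar() forces scalar context explicitly
my $last_index = $#friends;           # 2: index of the last element

# Assigning to $#friends resizes the array.
$#friends = 0;                        # keeps only 'ann'
my $after_truncate = @friends;        # 1

print "$count $also_count $last_index $after_truncate\n";
```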

There are a few functions that operate on entire hashes.
@names = keys %addressbook;
@addresses = values %addressbook;

# Every call to each returns the next key/value pair.
# All values will be eventually returned, but their order
# cannot be predicted.
while (($name, $address) = each %addressbook) {
print "$name lives at $address\n";
}

# Similar to the above, but sorted alphabetically
foreach my $next_name (sort keys %addressbook) {
print "$next_name lives at $addressbook{$next_name}\n";
}
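The fragments above assume an existing %addressbook. A self-contained version with hypothetical entries follows; keys and values return matching orders as long as the hash is not modified in between, and sorting the keys makes the report deterministic:

```perl
use strict;
use warnings;

# Hypothetical address book contents.
my %addressbook = (
    carol => '12 Elm St',
    alice => '3 Oak Ave',
    bob   => '7 Pine Rd',
);

my @names     = keys %addressbook;      # order is unpredictable
my @addresses = values %addressbook;    # same order as keys while the hash is unchanged

# Sorting the keys gives an alphabetical, repeatable report.
my @report;
foreach my $name (sort keys %addressbook) {
    push @report, "$name lives at $addressbook{$name}";
}
print "$_\n" for @report;
```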
