Using the Perl system() function

Every manual documents the Perl system() function slightly differently, each leaving out key bits of information. This is my own attempt to document the Perl system() function as well as provide some usage examples and misexamples.

The system() Basics.

The system() function executes an operating system command by forking a child process and waiting for the child process to complete, returning some sort of exit status for the child that you can test.
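
To make the "fork a child and wait" description concrete, here is a rough sketch of the same steps done by hand. The helper name my_system is my own invention, and the real system() does more than this (for instance, it ignores SIGINT and SIGQUIT in the parent while waiting), so treat it as an illustration rather than a replacement. The child command is perl itself, via $^X.

```perl
use strict;
use warnings;

# Hypothetical helper: a simplified imitation of what system() does on Unix.
sub my_system {
    my @command = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: replace this process with the requested command.
        exec { $command[0] } @command;
        exit 127;            # only reached if the exec itself failed
    }

    waitpid($pid, 0);        # Parent: block until the child has finished.
    return $?;               # the same packed status system() returns
}

my $status = my_system($^X, '-e', 'exit 3');
printf "child exit code: %d\n", $status >> 8;
```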

The arguments to the system() function can take three forms (parentheses optional):

  1. system($STRING);
  2. system($PROGRAM, @ARGUMENT_LIST);
  3. system { $PROGRAM } $FAUX_PROGRAM, @ARGUMENT_LIST

The first form is the most commonly used. Which form you use (or which form Perl will really use) depends on whether the command you're trying to execute contains shell metacharacters. Shell metacharacters include things like pipes, redirects and statement separators, as in: (ps -aux | column 1 > pids); date. Specifically, the characters:

& ; ` ' \ " | * ? ~ < > ^ ( ) [ ] { } $ \n \r

If the operating system command you're trying to execute uses any of these, you must use the first form, e.g.:

system("$tace < $tcmd > $tout");

In this case the sole string argument is passed to a command shell.

One key thing to remember is that once you introduce the system() function, your script is no longer portable, without modification, to Perl running under Windows and possibly to other Unix operating systems.

But which command shell?

An operating system like Unix has many command shells: sh, csh, tcsh, bash, zsh, etc. Which will be used? On Unix, Perl passes the string to /bin/sh -c (the path is fixed when Perl is built), not to the login shell of the user running the script as defined in the /etc/passwd file. The potential pitfall is that /bin/sh is not the same program on every system (it may be a Bourne shell, bash, dash or ksh), and if you develop a script as one user and then give it to someone else or install it as a system cron job running as another user, like 'root', the system() calls will run in a different environment. The builtin shell commands and the order of commands on $PATH may not be the same as when you were developing the script!
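
If you're unsure which shell a particular Perl installation hands a single-string command to, the standard Config module records it in its sh entry. A minimal sketch; the output depends on how your perl was built:

```perl
use strict;
use warnings;
use Config;

# $Config{sh} is the shell path chosen when this perl was built.
my $shell = $Config{sh};
print "system(\$string) commands are passed to: $shell\n";
```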

You can try to minimize this potential problem by adding an extra layer of shell to force the shell you want:

system("/bin/tcsh -c 'diff file file.old >& file.diff'");

But the preferred approach is:

  1. Only use shell operations that are common to all Unix shells (like simple redirection, pipes, etc.) if possible.
  2. Always use full path names to the operating system commands you invoke.
    (Although this makes your script less portable to other Unix systems, it's an important security concession.)

avoid:

system("chown $user.sgd newdata.txt");

embrace:

system("/usr/bin/chown $user:sgd newdata.txt");
system("/usr/ucb/chown $user.sgd newdata.txt");

The second form of the system() function.

In the second form of the system() function:

system($PROGRAM, @ARGUMENT_LIST);

no shell metacharacters are checked; you must simply be invoking a command on a list of arguments:

system('/usr/bin/cp', $from, $to);

This gains two potential efficiencies over the generic string form of the system() function. First, Perl doesn't need to invoke a shell to execute the command; it can do so directly via the Unix execvp call, so there's less process overhead at run time. Second, you don't necessarily end up doing as much string concatenation to build the command line.
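
A side effect of skipping the shell is that each list element reaches the command as exactly one argument, whatever characters it contains. Here is a small sketch of that; the file name is made up, and the "command" is perl itself ($^X) exiting with the number of arguments it received:

```perl
use strict;
use warnings;

# A made-up file name containing a space and a shell metacharacter.
my $name = 'monthly report; draft.txt';

# List form: no shell is involved, so $name arrives as one argument.
my $status = system($^X, '-e', 'exit scalar @ARGV', $name);
my $argc   = $status >> 8;

print "child saw $argc argument(s)\n";
```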

Perl pulls a fast one.

It turns out, if you use the first form and you don't use any shell metacharacters, Perl will split your string on whitespace and turn it into the second form to avoid the extra shell invocation:

system('mv /etc/group_comb /etc/group') => system('mv', '/etc/group_comb', '/etc/group')

So you don't necessarily need to worry about using the second form; just let Perl do the work for you, as long as nothing you interpolate into the string contains whitespace or shell metacharacters!
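
The whitespace split is worth seeing once, because it also applies to any variables you interpolate into the string. In this sketch a tiny helper script (written to a temporary file, purely for illustration) exits with its argument count. I'm assuming neither the path to perl ($^X) nor the temporary file path contains whitespace or metacharacters.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Helper script that exits with the number of arguments it was given.
my ($fh, $script) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} 'exit scalar @ARGV;';
close($fh) or die "can't close $script: $!";

my $file = 'my report.txt';    # note the space

# String form, no metacharacters: perl splits on whitespace itself.
my $string_argc = system("$^X $script $file") >> 8;

# List form: $file is passed through as a single argument.
my $list_argc = system($^X, $script, $file) >> 8;

print "string form: $string_argc argument(s), list form: $list_argc\n";
```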

The third form of the system() function.

The third form of the system() function:

system { $PROGRAM } $FAUX_PROGRAM, @ARGUMENT_LIST

is a variant on the second form that uses Perl's indirect object syntax to allow you to separate what command is invoked from how that command is seen in the process table via commands like 'ps' and 'top':

system { '/bin/sleep' } 'sleeping', 1000;

The opportunities to take advantage of this are rare but there is one special case of it that can be useful:

system { $args[0] } @args;

This forces the execvp form of the system() function regardless of whether @args contains multiple elements or just one element that only appears to contain shell metacharacters.
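
A quick sketch of the difference, assuming an echo command is on your $PATH and that no program literally named "echo hello" exists:

```perl
use strict;
use warnings;

my @args = ('echo hello');   # a single element containing a space

# One-element list: treated like the string form, so "echo" runs
# with the argument "hello".
my $plain = system(@args);

# Indirect-object form: no splitting and no shell. perl looks for a
# program literally named "echo hello" and fails to start it.
my $forced = do {
    no warnings 'exec';      # silence the expected "Can't exec" warning
    system { $args[0] } @args;
};

print "plain: $plain, forced: $forced\n";
```

A return value of -1 means the command could not be started at all.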

What exactly does system() return?

The system() function returns two numeric values folded into one. The first is the Unix signal (e.g. INT=2, QUIT=3, KILL=9), if any, that terminated the command. This is stored in the lower seven bits returned; the eighth bit is set if the command dumped core. The next higher eight bits contain the exit code for the command you executed. A return value of -1 means the command could not be started at all.

The simple approach is to just test against a return value of '0', which indicates no problems were detected:

embrace:

if (system("$sub_script $user $file") == 0) {
	print "success!\n";
	}

But don't turn this into an inverted logical operation:

avoid:

if (!system("$sub_script $user $file")) {
	print "success!\n";
	}

The result is a compound numeric value, not a logical value. This also reads inverted and is confusing. If you need more information on error, then you can break up the return code:

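my $status = system("rotate_quickly.pl");
my $signal = $status & 127;            # low seven bits: terminating signal, if any
my $dumped_core = $status & 128;       # eighth bit: set if the command dumped core
my $exit_code = ($status >> 8) & 0xff;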
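
If you decode statuses in more than one place, it may be worth wrapping the bit-twiddling in a small function. The following is only a sketch; describe_status is a name I made up, and the commands are perl one-liners run via $^X so the example doesn't depend on any particular program being installed.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a system() return value into readable text.
sub describe_status {
    my ($status) = @_;

    if ($status == -1) {
        return "failed to start: $!";
    }
    if ($status & 127) {
        my $core = ($status & 128) ? ' (core dumped)' : '';
        return sprintf('killed by signal %d%s', $status & 127, $core);
    }
    return sprintf('exited with code %d', $status >> 8);
}

print describe_status(system($^X, '-e', 'exit 3')), "\n";
print describe_status(system($^X, '-e', 'kill "TERM", $$; sleep 5')), "\n";
```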

Specific things to watch out for:

The only hope you have of writing truly portable Perl scripts with respect to the system() function is to avoid using it altogether:

Avoid using system() when there's already a Perl function that does the same thing.

avoid:

system("rm $dir/orf_geneontology.tmp");
system "mkdir $tempDir";
system("/bin/kill -HUP $pid");

embrace:

unlink("$dir/orf_geneontology.tmp");
mkdir $tempDir;
$number_killed = kill('HUP', $pid);
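
A further benefit of the builtins is that they report failure directly, with the reason in $!. Here is a self-contained sketch that works in a scratch directory; the directory layout is invented for the example:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $base    = tempdir(CLEANUP => 1);    # scratch area, removed at exit
my $tempDir = "$base/work";
my $tmpFile = "$tempDir/orf_geneontology.tmp";

mkdir($tempDir) or die "can't mkdir $tempDir: $!";

# Create an empty file so there is something to remove.
open(my $out, '>', $tmpFile) or die "can't create $tmpFile: $!";
close($out) or die "can't close $tmpFile: $!";

# unlink returns the number of files it removed.
my $removed = unlink($tmpFile);
die "can't unlink $tmpFile: $!" unless $removed == 1;
```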

Avoid using system() when there's already a Perl module that does the same thing.

avoid:

system("cp $datafile $cpfile");
system("/usr/bin/mv $tab_file $arcFile");

embrace:

use File::Copy;
copy($datafile, $cpfile);
move($tab_file, $arcFile);
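
Both copy() and move() return true on success and false on failure, so they can be checked the same way as the builtins. A small sketch using made-up file names in a scratch directory:

```perl
use strict;
use warnings;
use File::Copy qw(copy move);
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);        # scratch area, removed at exit
my ($datafile, $cpfile, $arcFile) =
    map { "$dir/$_" } qw(data.txt copy.txt archive.txt);

# Create a small source file to work with.
open(my $out, '>', $datafile) or die "can't create $datafile: $!";
print {$out} "gene\tproduct\n";
close($out) or die "can't close $datafile: $!";

copy($datafile, $cpfile) or die "copy failed: $!";
move($cpfile, $arcFile)  or die "move failed: $!";
```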

A personal style choice.

My personal style choice when using system() on a complicated command with switches, when no shell metacharacters are involved, is to combine the second form with Perl's super comma (more commonly called the fat comma), =>, as follows:

system('/gnu/cvs', '-d' => '/share/cvs', 'commit', '-m' => $dstr, "$cvsdir/genes.txt");

Where this syntax pays off is when you have some fixed options you can set up front and some variable options later on in the program:

my @backup_save_command = ("$oracle/nsr_backup", '-b' => $backup_pool, '-g' => $backup_group);
...
system(@backup_save_command, '-f' => $source_file);
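
It may help to see that => here is nothing more than a comma that reads better. This sketch only builds and prints the argument list rather than running anything; the command path and option values are placeholders.

```perl
use strict;
use warnings;

# Placeholder command and options, for illustration only.
my @backup_command = ('/hypothetical/nsr_backup', '-b' => 'nightly', '-g' => 'db');

# Later on: append the variable part.
my @full_command = (@backup_command, '-f' => '/data/genes.txt');

# Each switch and each value is its own list element.
print scalar(@full_command), " elements: @full_command\n";
```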

Variations on the system() theme.

There are several variations on the system() function theme about which you should be aware but I won't discuss in detail.

Backticks or qx//

Backticks, ``, work like system() but return the command's output instead of its exit status:

chomp(my $date = `/usr/bin/date`);
my $ps = `/usr/bin/ps -aux`; # get all the ps output as a single string

You should use backticks when you want the output of the system() call and don't need to check the error status.
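
If it later turns out you do care about the status, backticks also leave it in $?, packed the same way as the return value of system(). A minimal sketch, using a perl one-liner (via $^X) as the command:

```perl
use strict;
use warnings;

# Capture the command's output; the exit status lands in $?.
my $output    = `$^X -e "print 6 * 7"`;
my $exit_code = $? >> 8;

die "command failed with exit code $exit_code" if $? != 0;
print "captured: $output\n";
```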

The pipe form of the open() command.

The next variation on the system() call is the pipe form of the open() command:

embrace:

open(PS, "/usr/bin/ps -aux |") or die "can't run ps: $!";
while (<PS>) { print } # process the output of ps command one line at a time
close(PS);
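
The pipe form of open() also accepts a list, which skips the shell in the same way as the second form of system(). The sketch below uses a lexical filehandle and a perl one-liner (via $^X) as a stand-in for ps, so it doesn't depend on where ps lives:

```perl
use strict;
use warnings;

# Stand-in command: perl itself printing two lines.
my @command = ($^X, '-e', 'print "first\nsecond\n"');

# '-|' means "read from this command"; the list form bypasses the shell.
open(my $pipe, '-|', @command) or die "can't start $command[0]: $!";

my @lines;
while (my $line = <$pipe>) {
    chomp $line;
    push @lines, $line;      # process the output one line at a time
}

# close() on a pipe waits for the command and sets $?.
close($pipe) or die "command failed, status $?";
```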

This is generally preferable to creating a temporary file with the output and reading it back in:

avoid:

my $temporary_file = "/tmp/ps_output$$";
system("/usr/bin/ps -aux > $temporary_file");
open(PS, $temporary_file);
while (<PS>) { print } # process the output of ps command one line at a time
close(PS);
unlink($temporary_file);

The exec() function.

One frustrating thing about the documentation for system() is that it is often described as an exec() call that forks and many references are made to the exec() documentation. The reality is that most Perl programmers will use the system() function ten to a hundred times more often than they will ever use exec().

The exec() function is similar to system() except that it never returns. Instead it passes control to the operating system command that is invoked. It uses the same argument forms as system(). It is typically used in very short scripts that manipulate the Unix environment in some specific way and then invoke one command to run in that modified environment:

#!/usr/bin/perl -w

$ENV{LD_LIBRARY_PATH} = "$ENV{LD_LIBRARY_PATH}:/non_standard/lib";

exec "${0}_original", @ARGV;

The exec() call never returns if it succeeds, so it's considered an error to have any other Perl statement follow it, except for a die, warn or exit in case the exec() call itself fails for some reason, e.g. you don't have permission to run the replacement command:

exec('netscape');
die "oops, couldn't run netscape on this system: $!";

- cdl


Last updated 08/15/03