11. Portable Shell Programming

When writing your own checks, there are some shell-script programming techniques you should avoid in order to make your code portable. The Bourne shell and upward-compatible shells like the Korn shell and Bash have evolved over the years, and many features added to the original Unix Version 7 shell are now supported on all interesting porting targets. However, the following discussion between Russ Allbery and Robert Lipe is worth reading:

Russ Allbery:

The GNU assumption that /bin/sh is the one and only shell leads to a permanent deadlock. Vendors don't want to break users' existing shell scripts, and there are some corner cases in the Bourne shell that are not completely compatible with a Posix shell. Thus, vendors who have taken this route will never (OK…"never say never") replace the Bourne shell (as /bin/sh) with a Posix shell.

Robert Lipe:

This is exactly the problem. While most (at least most System V's) do have a Bourne shell that accepts shell functions, most vendor /bin/sh programs are not the Posix shell.

So while most modern systems do have a shell somewhere that meets the Posix standard, the challenge is to find it.

For this reason, part of the job of M4sh (see section Programming in M4sh) is to find such a shell. But to prevent trouble, if you're not using M4sh you should not take advantage of features that were added after Unix version 7, circa 1977 (see section Systemology); you should not use aliases, negated character classes, or even unset. # comments, while not in Unix version 7, were retrofitted in the original Bourne shell and can be assumed to be part of the least common denominator.

On the other hand, if you're using M4sh you can assume that the shell has the features that were added in SVR2 (circa 1984), including shell functions, return, unset, and I/O redirection for builtins. For more information, refer to http://www.in-ulm.de/~mascheck/bourne/. However, some pitfalls have to be avoided for portable use of these constructs; these will be documented in the rest of this chapter. See in particular Shell Functions and Limitations of Shell Builtins.

Some ancient systems have quite small limits on the length of the `#!' line; for instance, 32 bytes (not including the newline) on SunOS 4. However, these ancient systems are no longer of practical concern.

The set of external programs you should run in a configure script is fairly small. See (standards)Utilities in Makefiles section `Utilities in Makefiles' in GNU Coding Standards, for the list. This restriction allows users to start out with a fairly small set of programs and build the rest, avoiding too many interdependencies between packages.

Some of these external utilities have a portable subset of features; see Limitations of Usual Tools.

There are other sources of documentation about shells. The specification for the Posix Shell Command Language, though more generous than the restrictive shell subset described above, is fairly portable nowadays. Also please see the Shell FAQs.


11.1 Shellology

There are several families of shells, most prominently the Bourne family and the C shell family, which are deeply incompatible. If you want to write portable shell scripts, avoid members of the C shell family. The Shell difference FAQ includes a small history of Posix shells, and a comparison between several of them.

Below we describe some of the members of the Bourne shell family.

Ash

Ash is often used on GNU/Linux and BSD systems as a light-weight Bourne-compatible shell. Ash 0.2 has some bugs that are fixed in the 0.3.x series, but portable shell scripts should work around them, since version 0.2 is still shipped with many GNU/Linux distributions.

To be compatible with Ash 0.2:

Bash

To detect whether you are running Bash, test whether BASH_VERSION is set. To require Posix compatibility, run `set -o posix'. See (bash)Bash POSIX Mode section `Bash Posix Mode' in The GNU Bash Reference Manual, for details.

Bash 2.05 and later

Versions 2.05 and later of Bash use a different format for the output of the set builtin, designed to make evaluating its output easier. However, this output is not compatible with earlier versions of Bash (or with many other shells, probably). So if you use Bash 2.05 or later to execute configure, you'll need to use Bash 2.05 or later for all other build tasks as well.

Ksh

The Korn shell is compatible with the Bourne family and it mostly conforms to Posix. It has two major variants commonly called `ksh88' and `ksh93', named after the years of initial release. It is usually called ksh, but is called sh on some hosts if you set your path appropriately.

Solaris systems have three variants: /usr/bin/ksh is `ksh88'; it is standard on Solaris 2.0 and later. /usr/xpg4/bin/sh is a Posix-compliant variant of `ksh88'; it is standard on Solaris 9 and later. /usr/dt/bin/dtksh is `ksh93'. Variants that are not standard may be parts of optional packages. There is no extra charge for these packages, but they are not part of a minimal OS install and therefore some installations may not have them.

Starting with Tru64 Version 4.0, the Korn shell /usr/bin/ksh is also available as /usr/bin/posix/sh. If the environment variable BIN_SH is set to xpg4, subsidiary invocations of the standard shell conform to Posix.

Pdksh

A public-domain clone of the Korn shell called pdksh is widely available: it has most of the `ksh88' features along with a few of its own. It usually sets KSH_VERSION, except if invoked as /bin/sh on OpenBSD, and similarly to Bash you can require Posix compatibility by running `set -o posix'. Unfortunately, with pdksh 5.2.14 (the latest stable version as of January 2007) Posix mode is buggy and causes pdksh to depart from Posix in at least one respect:

 
$ echo "`echo \"hello\"`"
hello
$ set -o posix
$ echo "`echo \"hello\"`"
"hello"

The last line of output contains spurious quotes. This is yet another reason why portable shell code should not contain "`…\"…\"…`" constructs (see section Shell Substitutions).

Zsh

To detect whether you are running zsh, test whether ZSH_VERSION is set. By default zsh is not compatible with the Bourne shell: you must execute `emulate sh', and for zsh versions before 3.1.6-dev-18 you must also set NULLCMD to `:'. See (zsh)Compatibility section `Compatibility' in The Z Shell Manual, for details.

The default Mac OS X sh was originally Zsh; it was changed to Bash in Mac OS X 10.2.


11.2 Here-Documents

Don't rely on `\' being preserved just because it has no special meaning together with the next symbol. In the native sh on OpenBSD 2.7 `\"' expands to `"' in here-documents with unquoted delimiter. As a general rule, if `\\' expands to `\' use `\\' to get `\'.

With OpenBSD 2.7's sh:

 
$ cat <<EOF
> \" \\
> EOF
" \

and with Bash:

 
bash-2.04$ cat <<EOF
> \" \\
> EOF
\" \

Some shells mishandle large here-documents: for example, Solaris 10 dtksh and the UnixWare 7.1.1 Posix shell, which are derived from Korn shell version M-12/28/93d, mishandle braced variable expansion that crosses a 1024- or 4096-byte buffer boundary within a here-document. Only the part of the variable name after the boundary is used. For example, ${variable} could be replaced by the expansion of ${ble}. If the end of the variable name is aligned with the block boundary, the shell reports an error, as if you used ${}. Instead of ${variable-default}, the shell may expand ${riable-default}, or even ${fault}. This bug can often be worked around by omitting the braces: $variable. The bug was fixed in `ksh93g' (1998-04-30) but as of 2006 many operating systems were still shipping older versions with the bug.
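
For illustration, here is a minimal sketch of that workaround; the file and variable names are only examples:

 
# In a large here-document, prefer the unbraced form so that a braced
# expansion cannot straddle the shell's buffer boundary.
cat >conftest.subs <<EOF
prefix=$prefix
exec_prefix=$exec_prefix
EOF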

Many shells (including the Bourne shell) implement here-documents inefficiently. In particular, some shells can be extremely inefficient when a single statement contains many here-documents. For instance if your `configure.ac' includes something like:

 
if <cross_compiling>; then
  assume this and that
else
  check this
  check that
  check something else
  …
  on and on forever
  …
fi

A shell parses the whole if/fi construct, creating temporary files for each here-document in it. Some shells create links for such here-documents on every fork, so that the clean-up code they had installed correctly removes them. It is creating the links that can take the shell forever.

Moving the tests out of the if/fi, or creating multiple if/fi constructs, would improve the performance significantly. Anyway, this kind of construct is not exactly the typical use of Autoconf. In fact, it's not even recommended, because M4 macros can't look into shell conditionals, so we may fail to expand a macro when it was expanded before in a conditional path, and the condition turned out to be false at runtime, and we end up not executing the macro at all.


11.3 File Descriptors

Most shells, if not all (including Bash, Zsh, Ash), output traces on stderr, even for subshells. This might result in undesirable content if you meant to capture the standard-error output of the inner command:

 
$ ash -x -c '(eval "echo foo >&2") 2>stderr'
$ cat stderr
+ eval echo foo >&2
+ echo foo
foo
$ bash -x -c '(eval "echo foo >&2") 2>stderr'
$ cat stderr
+ eval 'echo foo >&2'
++ echo foo
foo
$ zsh -x -c '(eval "echo foo >&2") 2>stderr'
# Traces on startup files deleted here.
$ cat stderr
+zsh:1> eval echo foo >&2
+zsh:1> echo foo
foo

One workaround is to grep out uninteresting lines, hoping not to remove good ones.
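
A hedged sketch of that approach, reusing the ash example above:

 
(eval "echo foo >&2") 2>stderr
# Drop the `+ '-prefixed trace lines; whatever remains is, with luck, the
# real standard-error output of the inner command.
grep -v '^+' stderr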

If you intend to redirect both standard error and standard output, redirect standard output first. This works better with HP-UX, since its shell mishandles tracing if standard error is redirected first:

 
$ sh -x -c ': 2>err >out'
+ :
+ 2> err $ cat err
1> out

Don't try to redirect the standard error of a command substitution. It must be done inside the command substitution. When running `: `cd /zorglub` 2>/dev/null' expect the error message to escape, while `: `cd /zorglub 2>/dev/null`' works properly.

It is worth noting that Zsh (but not Ash nor Bash) makes it possible in assignments though: `foo=`cd /zorglub` 2>/dev/null'.

Some shells, like ash, don't recognize bi-directional redirection (`<>'). And even on shells that recognize it, it is not portable to use on fifos: Posix does not require read-write support for named pipes, and Cygwin does not support it:

 
$ mkfifo fifo
$ exec 5<>fifo
$ echo hi >&5
bash: echo: write error: Communication error on send

When catering to old systems, don't redirect the same file descriptor several times, as you are doomed to failure under Ultrix.

 
ULTRIX V4.4 (Rev. 69) System #31: Thu Aug 10 19:42:23 GMT 1995
UWS V4.4 (Rev. 11)
$ eval 'echo matter >fullness' >void
illegal io
$ eval '(echo matter >fullness)' >void
illegal io
$ (eval '(echo matter >fullness)') >void
Ambiguous output redirect.

In each case the expected result is of course `fullness' containing `matter' and `void' being empty. However, this bug is probably not of practical concern to modern platforms.

Solaris 10 sh will try to optimize away a : command in a loop after the first iteration, even if it is redirected:

 
$ for i in 1 2 3 ; do : >x$i; done
$ ls
x1

As a workaround, echo or eval can be used.
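
One way to spell that workaround (a sketch; the eval forces the redirection to be re-parsed on every iteration, and the echo variant leaves a newline in each file):

 
for i in 1 2 3; do eval ": >x$i"; done
for i in 1 2 3; do echo >x$i; done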

Don't rely on file descriptors 0, 1, and 2 remaining closed in a subsidiary program. If any of these descriptors is closed, the operating system may open an unspecified file for the descriptor in the new process image. Posix says this may be done only if the subsidiary program is set-user-ID or set-group-ID, but HP-UX 11.23 does it even for ordinary programs.

Don't rely on open file descriptors being open in child processes. In ksh, file descriptors above 2 which are opened using `exec n>file' are closed by a subsequent `exec' (such as that involved in the fork-and-exec which runs a program or script). Thus, using sh, we have:

 
$ cat ./descrips
#!/bin/sh -
echo hello >&5
$ exec 5>t
$ ./descrips
$ cat t
hello
$

But using ksh:

 
$ exec 5>t
$ ./descrips
hello
$ cat t
$

Within the process which runs the `descrips' script, file descriptor 5 is closed.

Don't rely on redirection to a closed file descriptor to cause an error. With Solaris /bin/sh, when the redirection fails, the output goes to the original file descriptor.

 
$ bash -c 'echo hi >&3' 3>&-; echo $?
bash: 3: Bad file descriptor
1
$ /bin/sh -c 'echo hi >&3' 3>&-; echo $?
hi
0

DOS variants cannot rename or remove open files, such as in `mv foo bar >foo' or `rm foo >foo', even though this is perfectly portable among Posix hosts.

A few ancient systems reserved some file descriptors. By convention, file descriptor 3 was opened to `/dev/tty' when you logged into Eighth Edition (1985) through Tenth Edition Unix (1989). File descriptor 4 had a special use on the Stardent/Kubota Titan (circa 1990), though we don't now remember what it was. Both these systems are obsolete, so it's now safe to treat file descriptors 3 and 4 like any other file descriptors.


11.4 File System Conventions

Autoconf uses shell-script processing extensively, so the file names that it processes should not contain characters that are special to the shell. Special characters include space, tab, newline, NUL, and the following:

 
" # $ & ' ( ) * ; < = > ? [ \ ` |

Also, file names should not begin with `~' or `-', and should contain neither `-' immediately after `/' nor `~' immediately after `:'. On Posix-like platforms, directory names should not contain `:', as this runs afoul of `:' used as the path separator.

These restrictions apply not only to the files that you distribute, but also to the absolute file names of your source, build, and destination directories.

On some Posix-like platforms, `!' and `^' are special too, so they should be avoided.

Posix lets implementations treat leading `//' specially, but requires leading `///' and beyond to be equivalent to `/'. Most Unix variants treat `//' like `/'. However, some treat `//' as a "super-root" that can provide access to files that are not otherwise reachable from `/'. The super-root tradition began with Apollo Domain/OS, which died out long ago, but unfortunately Cygwin has revived it.

While autoconf and friends are usually run on some Posix variety, they can be used on other systems, most notably DOS variants. This impacts several assumptions regarding file names.

For example, the following code:

 
case $foo_dir in
  /*) # Absolute
     ;;
  *)
     foo_dir=$dots$foo_dir ;;
esac

fails to properly detect absolute file names on those systems, because they can use a drivespec, and usually use a backslash as directory separator. If you want to be portable to DOS variants (at the price of rejecting valid but oddball Posix file names like `a:\b'), you can check for absolute file names like this:

 
case $foo_dir in
  [\\/]* | ?:[\\/]* ) # Absolute
     ;;
  *)
     foo_dir=$dots$foo_dir ;;
esac

Make sure you quote the brackets if appropriate and keep the backslash as first character (see Limitations of Shell Builtins).

Also, because the colon is used as part of a drivespec, these systems don't use it as path separator. When creating or accessing paths, you can use the PATH_SEPARATOR output variable instead. configure sets this to the appropriate value for the build system (`:' or `;') when it starts up.
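
For instance, a minimal sketch of prepending a directory to PATH in a configure-style script, where $PATH_SEPARATOR is already set; the directory name is only an illustration:

 
PATH=subdir$PATH_SEPARATOR$PATH
export PATH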

File names need extra care as well. While DOS variants that are Posixy enough to run autoconf (such as DJGPP) are usually able to handle long file names properly, there are still limitations that can seriously break packages. Several of these issues can be easily detected by the doschk package.

A short overview follows; problems are marked with SFN/LFN to indicate where they apply: SFN means the issues are only relevant to plain DOS, not to DOS under Microsoft Windows variants, while LFN identifies problems that exist even under Microsoft Windows variants.

No multiple dots (SFN)

DOS cannot handle multiple dots in file names. This is an especially important thing to remember when building a portable configure script, as autoconf uses a .in suffix for template files.

This is perfectly OK on Posix variants:

 
AC_CONFIG_HEADERS([config.h])
AC_CONFIG_FILES([source.c foo.bar])
AC_OUTPUT

but it causes problems on DOS, as it requires `config.h.in', `source.c.in' and `foo.bar.in'. To make your package more portable to DOS-based environments, you should use this instead:

 
AC_CONFIG_HEADERS([config.h:config.hin])
AC_CONFIG_FILES([source.c:source.cin foo.bar:foobar.in])
AC_OUTPUT
No leading dot (SFN)

DOS cannot handle file names that start with a dot. This is usually not important for autoconf.

Case insensitivity (LFN)

DOS is case insensitive, so you cannot, for example, have both a file called `INSTALL' and a directory called `install'. This also affects make; if there's a file called `INSTALL' in the directory, `make install' does nothing (unless the `install' target is marked as PHONY).
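
A minimal `Makefile' sketch of that precaution (the rule body is omitted):

 
# Declaring the target phony keeps a file or directory named `INSTALL'
# from satisfying `make install' on a case-insensitive file system.
.PHONY: install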

The 8+3 limit (SFN)

Because the DOS file system only stores the first 8 characters of the file name and the first 3 of the extension, those must be unique. That means that `foobar-part1.c', `foobar-part2.c' and `foobar-prettybird.c' all resolve to the same file name (`FOOBAR-P.C'). The same goes for `foo.bar' and `foo.bartender'.

The 8+3 limit is not usually a problem under Microsoft Windows, as it uses numeric tails in the short version of file names to make them unique. However, a registry setting can turn this behavior off. While this makes it possible to share file trees containing long file names between SFN and LFN environments, it also means the above problem applies there as well.

Invalid characters (LFN)

Some characters are invalid in DOS file names, and should therefore be avoided. In a LFN environment, these are `/', `\', `?', `*', `:', `<', `>', `|' and `"'. In a SFN environment, other characters are also invalid. These include `+', `,', `[' and `]'.

Invalid names (LFN)

Some DOS file names are reserved, and cause problems if you try to use files with those names. These names include `CON', `AUX', `COM1', `COM2', `COM3', `COM4', `LPT1', `LPT2', `LPT3', `NUL', and `PRN'. File names are case insensitive, so even names like `aux/config.guess' are disallowed.


11.5 Shell Pattern Matching

Nowadays portable patterns can use negated character classes like `[!-aeiou]'. The older syntax `[^-aeiou]' is supported by some shells but not others; hence portable scripts should never use `^' as the first character of a bracket pattern.
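
For instance, a minimal sketch using the portable negation; the pattern and message are only illustrations:

 
case $file in
  [!-]*) ;;                             # does not start with `-'
  *) echo "suspicious file name: $file" >&2 ;;
esac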

Outside the C locale, patterns like `[a-z]' are problematic since they may match characters that are not lower-case letters.


11.6 Shell Substitutions

Contrary to a persistent urban legend, the Bourne shell does not systematically split variables and back-quoted expressions, in particular on the right-hand side of assignments and in the argument of case. For instance, the following code:

 
case "$given_srcdir" in
.)  top_srcdir="`echo "$dots" | sed 's|/$||'`" ;;
*)  top_srcdir="$dots$given_srcdir" ;;
esac

is more readable when written as:

 
case $given_srcdir in
.)  top_srcdir=`echo "$dots" | sed 's|/$||'` ;;
*)  top_srcdir=$dots$given_srcdir ;;
esac

and in fact it is even more portable: in the first case of the first attempt, the computation of top_srcdir is not portable, since not all shells properly understand "`…"…"…`". Worse yet, not all shells understand "`…\"…\"…`" the same way. There is just no portable way to use double-quoted strings inside double-quoted back-quoted expressions (pfew!).

$@

One of the most famous shell-portability issues is related to `"$@"'. When there are no positional arguments, Posix says that `"$@"' is supposed to be equivalent to nothing, but the original Unix version 7 Bourne shell treated it as equivalent to `""' instead, and this behavior survives in later implementations like Digital Unix 5.0.

The traditional way to work around this portability problem is to use `${1+"$@"}'. Unfortunately this method does not work with Zsh (3.x and 4.x), which is used on Mac OS X. When emulating the Bourne shell, Zsh performs word splitting on `${1+"$@"}':

 
zsh $ emulate sh
zsh $ for i in "$@"; do echo $i; done
Hello World
!
zsh $ for i in ${1+"$@"}; do echo $i; done
Hello
World
!

Zsh handles plain `"$@"' properly, but we can't use plain `"$@"' because of the portability problems mentioned above. One workaround relies on Zsh's "global aliases" to convert `${1+"$@"}' into `"$@"' by itself:

 
test "${ZSH_VERSION+set}" = set && alias -g '${1+"$@"}'='"$@"'

Zsh only recognizes this alias when a shell word matches it exactly; `"foo"${1+"$@"}' remains subject to word splitting. Since this case always yields at least one shell word, use plain `"$@"'.

A more conservative workaround is to avoid `"$@"' if it is possible that there may be no positional arguments. For example, instead of:

 
cat conftest.c "$@"

you can use this instead:

 
case $# in
0) cat conftest.c;;
*) cat conftest.c "$@";;
esac

Autoconf macros often use the set command to update `$@', so if you are writing shell code intended for configure you should not assume that the value of `$@' persists for any length of time.

${10}

The 10th, 11th, … positional parameters can be accessed only after a shift. The 7th Edition shell reported an error if given ${10}, and Solaris 10 /bin/sh still acts that way:

 
$ set 1 2 3 4 5 6 7 8 9 10
$ echo ${10}
bad substitution
${var:-value}

Old BSD shells, including the Ultrix sh, don't accept the colon for any shell substitution, and complain and die. Similarly for ${var:=value}, ${var:?value}, etc.

${var=literal}

Be sure to quote:

 
: ${var='Some words'}

otherwise some shells, such as on Digital Unix V 5.0, die because of a "bad substitution".


Solaris /bin/sh has a frightening bug in its interpretation of this. Imagine you need to set a variable to a string containing `}'. This `}' character confuses Solaris /bin/sh when the affected variable was already set. This bug can be exercised by running:

 
$ unset foo
$ foo=${foo='}'}
$ echo $foo
}
$ foo=${foo='}'   # no error; this hints to what the bug is
$ echo $foo
}
$ foo=${foo='}'}
$ echo $foo
}}
 ^ ugh!

It seems that `}' is interpreted as matching `${', even though it is enclosed in single quotes. The problem doesn't happen using double quotes.

${var=expanded-value}

On Ultrix, running

 
default="yu,yaa"
: ${var="$default"}

sets var to `M-yM-uM-,M-yM-aM-a', i.e., the 8th bit of each char is set. You don't observe the phenomenon using a simple `echo $var' since apparently the shell resets the 8th bit when it expands $var. Here are two means to make this shell confess its sins:

 
$ cat -v <<EOF
$var
EOF

and

 
$ set | grep '^var=' | cat -v

One classic incarnation of this bug is:

 
default="a b c"
: ${list="$default"}
for c in $list; do
  echo $c
done

You'll get `a b c' on a single line. Why? Because there are no spaces in `$list': there are `M- ', i.e., spaces with the 8th bit set, hence no IFS splitting is performed!!!

One piece of good news is that Ultrix works fine with `: ${list=$default}'; i.e., if you don't quote. The bad news is that QNX 4.25 then sets list to the last item of default!

The portable way out consists in using a double assignment, to switch the 8th bit twice on Ultrix:

 
list=${list="$default"}

…but beware of the `}' bug from Solaris (see above). For safety, use:

 
test "${var+set}" = set || var={value}
${#var}
${var%word}
${var%%word}
${var#word}
${var##word}

Posix requires support for these usages, but they do not work with many traditional shells, e.g., Solaris 10 /bin/sh.

Also, pdksh 5.2.14 mishandles some word forms. For example if `$1' is `a/b' and `$2' is `a', then `${1#$2}' should yield `/b', but with pdksh it yields the empty string.

`commands`

Posix requires shells to trim all trailing newlines from command output before substituting it, so assignments like `dir=`echo "$file" | tr a A`' do not work as expected if `$file' ends in a newline.

While in general it makes no sense, do not substitute a single builtin with side effects, because Ash 0.2, trying to optimize, does not fork a subshell to perform the command.

For instance, if you wanted to check that cd is silent, do not use `test -z "`cd /`"' because the following can happen:

 
$ pwd
/tmp
$ test -z "`cd /`" && pwd
/

The result of `foo=`exit 1`' is left as an exercise to the reader.

The MSYS shell leaves a stray byte in the expansion of a double-quoted command substitution of a native program, if the end of the substitution is not aligned with the end of the double quote. This may be worked around by inserting another pair of quotes:

 
$ echo "`printf 'foo\r\n'` bar" > broken
$ echo "`printf 'foo\r\n'`"" bar" | cmp - broken
- broken differ: char 4, line 1

Upon interrupt or SIGTERM, some shells may abort a command substitution, replace it with a null string, and wrongly evaluate the enclosing command before entering the trap or ending the script. This can lead to spurious errors:

 
$ sh -c 'if test `sleep 5; echo hi` = hi; then echo yes; fi'
$ ^C
sh: test: hi: unexpected operator/operand

You can avoid this by assigning the command substitution to a temporary variable:

 
$ sh -c 'res=`sleep 5; echo hi`
         if test "x$res" = xhi; then echo yes; fi'
$ ^C
$(commands)

This construct is meant to replace ``commands`', and it has most of the problems listed under `commands`.

This construct can be nested, which is impossible to do portably with back quotes. Unfortunately it is not yet universally supported. Most notably, even recent releases of Solaris don't support it:

 
$ showrev -c /bin/sh | grep version
Command version: SunOS 5.10 Generic 121005-03 Oct 2006
$ echo $(echo blah)
syntax error: `(' unexpected

nor does IRIX 6.5's Bourne shell:

 
$ uname -a
IRIX firebird-image 6.5 07151432 IP22
$ echo $(echo blah)
$(echo blah)

If you do use `$(commands)', make sure that the commands do not start with a parenthesis, as that would cause confusion with a different notation `$((expression))' that in modern shells is an arithmetic expression not a command. To avoid the confusion, insert a space between the two opening parentheses.
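
For instance (a sketch for shells that support `$()' at all; the directory variable is only an illustration):

 
# The space after `$(' keeps the shell from misreading `$((' as the start
# of an arithmetic expansion.
build_dir=$( (cd "$dir" && pwd) )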

Avoid commands that contain unbalanced parentheses in here-documents, comments, or case statement patterns, as many shells mishandle them. For example, Bash 3.1, `ksh88', pdksh 5.2.14, and Zsh 4.2.6 all mishandle the following valid command:

 
echo $(case x in x) echo hello;; esac)
$((expression))

Arithmetic expansion is not portable as some shells (most notably Solaris 10 /bin/sh) don't support it.

Among shells that do support `$(( ))', not all of them obey the Posix rule that octal and hexadecimal constants must be recognized:

 
$ bash -c 'echo $(( 010 + 0x10 ))'
24
$ zsh -c 'echo $(( 010 + 0x10 ))'
26
$ zsh -c 'emulate sh; echo $(( 010 + 0x10 ))'
24
$ pdksh -c 'echo $(( 010 + 0x10 ))'
pdksh:  010 + 0x10 : bad number `0x10'
$ pdksh -c 'echo $(( 010 ))'
10

When it is available, using arithmetic expansion provides a noticeable speedup in script execution; but testing for support requires eval to avoid syntax errors. The following construct is used by AS_VAR_ARITH to provide arithmetic computation when all arguments are provided in decimal and without a leading zero, and all operators are properly quoted and appear as distinct arguments:

 
if ( eval 'test $(( 1 + 1 )) = 2' ) 2>/dev/null; then
  eval 'func_arith ()
  {
    func_arith_result=$(( $* ))
  }'
else
  func_arith ()
  {
    func_arith_result=`expr "$@"`
  }
fi
func_arith 1 + 1
foo=$func_arith_result
^

Always quote `^', otherwise traditional shells such as /bin/sh on Solaris 10 treat this like `|'.


11.7 Assignments

When setting several variables in a row, be aware that the order of the evaluation is undefined. For instance `foo=1 foo=2; echo $foo' gives `1' with Solaris /bin/sh, but `2' with Bash. You must use `;' to enforce the order: `foo=1; foo=2; echo $foo'.

Don't rely on the following to find `subdir/program':

 
PATH=subdir$PATH_SEPARATOR$PATH program

as this does not work with Zsh 3.0.6. Use something like this instead:

 
(PATH=subdir$PATH_SEPARATOR$PATH; export PATH; exec program)

Don't rely on the exit status of an assignment: Ash 0.2 does not change the status and propagates that of the last statement:

 
$ false || foo=bar; echo $?
1
$ false || foo=`:`; echo $?
0

and to make things even worse, QNX 4.25 just sets the exit status to 0 in any case:

 
$ foo=`exit 1`; echo $?
0

To assign default values, follow this algorithm:

  1. If the default value is a literal and does not contain any closing brace, use:
     
    : ${var='my literal'}
    
  2. If the default value contains no closing brace, has to be expanded, and the variable being initialized is not intended to be IFS-split (i.e., it's not a list), then use:
     
    : ${var="$default"}
    
  3. If the default value contains no closing brace, has to be expanded, and the variable being initialized is intended to be IFS-split (i.e., it's a list), then use:
     
    var=${var="$default"}
    
  4. If the default value contains a closing brace, then use:
     
    test "${var+set}" = set || var="has a '}'"
    

In most cases `var=${var="$default"}' is fine, but in case of doubt, just use the last form. See section Shell Substitutions, items `${var:-value}' and `${var=value}' for the rationale.


11.8 Parentheses in Shell Scripts

Beware of two opening parentheses in a row, as many shell implementations treat them specially. Posix requires that the command `((cat))' must behave like `(cat)', but many shells, including Bash and the Korn shell, treat `((cat))' as an arithmetic expression equivalent to `let "cat"', and may or may not report an error when they detect that `cat' is not a number. As another example, `pdksh' 5.2.14 misparses the following code:

 
if ((true) || false); then
  echo ok
fi

To work around this problem, insert a space between the two opening parentheses. There is a similar problem and workaround with `$(('; see Shell Substitutions.


11.9 Slashes in Shell Scripts

Unpatched Tru64 5.1 sh omits the last slash of command-line arguments that contain two trailing slashes:

 
$ echo / // /// //// .// //.
/ / // /// ./ //.
$ x=//
$ eval "echo \$x"
/
$ set -x
$ echo abc | tr -t ab //
+ echo abc
+ tr -t ab /
/bc

Unpatched Tru64 4.0 sh adds a slash after `"$var"' if the variable is empty and the second double-quote is followed by a word that begins and ends with slash:

 
$ sh -xc 'p=; echo "$p"/ouch/'
p=
+ echo //ouch/
//ouch/

However, our understanding is that patches are available, so perhaps it's not worth worrying about working around these horrendous bugs.


11.10 Special Shell Variables

Some shell variables should not be used, since they can have a deep influence on the behavior of the shell. In order to recover a sane behavior from the shell, some variables should be unset; M4sh takes care of this and provides fallback values, whenever needed, to cater for a very old `/bin/sh' that does not support unset (see section Portable Shell Programming).

As a general rule, shell variable names containing a lower-case letter are safe; you can define and use these variables without worrying about their effect on the underlying system, and without worrying about whether the shell changes them unexpectedly. (The exception is the shell variable status, as described below.)

Here is a list of names that are known to cause trouble. This list is not exhaustive, but you should be safe if you avoid the name status and names containing only upper-case letters and underscores.

?

Not all shells correctly reset `$?' after conditionals (see Limitations of Shell Builtins). Not all shells manage `$?' correctly in shell functions (see section Shell Functions) or in traps (see Limitations of Shell Builtins). Not all shells reset `$?' to zero after an empty command.

 
$ bash -c 'false; $empty; echo $?'
0
$ zsh -c 'false; $empty; echo $?'
1
_

Many shells reserve `$_' for various purposes, e.g., the name of the last command executed.

BIN_SH

In Tru64, if BIN_SH is set to xpg4, subsidiary invocations of the standard shell conform to Posix.

CDPATH

When this variable is set it specifies a list of directories to search when invoking cd with a relative file name that did not start with `./' or `../'. Posix 1003.1-2001 says that if a nonempty directory name from CDPATH is used successfully, cd prints the resulting absolute file name. Unfortunately this output can break idioms like `abs=`cd src && pwd`' because abs receives the name twice. Also, many shells do not conform to this part of Posix; for example, zsh prints the result only if a directory name other than `.' was chosen from CDPATH.

In practice the shells that have this problem also support unset, so you can work around the problem as follows:

 
(unset CDPATH) >/dev/null 2>&1 && unset CDPATH

You can also avoid output by ensuring that your directory name is absolute or anchored at `./', as in `abs=`cd ./src && pwd`'.

Configure scripts use M4sh, which automatically unsets CDPATH if possible, so you need not worry about this problem in those scripts.

CLICOLOR_FORCE

When this variable is set, some implementations of tools like ls attempt to add color to their output via terminal escape sequences, even when the output is not directed to a terminal, and can thus cause spurious failures in scripts. Configure scripts use M4sh, which automatically unsets this variable.

DUALCASE

In the MKS shell, case statements and file name generation are case-insensitive unless DUALCASE is nonzero. Autoconf-generated scripts export this variable when they start up.

ENV
MAIL
MAILPATH
PS1
PS2
PS4

These variables should not matter for shell scripts, since they are supposed to affect only interactive shells. However, at least one shell (the pre-3.0 UWIN Korn shell) gets confused about whether it is interactive, which means that (for example) a PS1 with a side effect can unexpectedly modify `$?'. To work around this bug, M4sh scripts (including `configure' scripts) do something like this:

 
(unset ENV) >/dev/null 2>&1 && unset ENV MAIL MAILPATH
PS1='$ '
PS2='> '
PS4='+ '

(actually, there is some complication due to bugs in unset; see Limitations of Shell Builtins).

FPATH

The Korn shell uses FPATH to find shell functions, so avoid FPATH in portable scripts. FPATH is consulted after PATH, but you still need to be wary of tests that use PATH to find whether a command exists, since they might report the wrong result if FPATH is also set.

GREP_OPTIONS

When this variable is set, some implementations of grep honor these options, even if the options include direction to enable colored output via terminal escape sequences, and the result can cause spurious failures when the output is not directed to a terminal. Configure scripts use M4sh, which automatically unsets this variable.

IFS

Long ago, shell scripts inherited IFS from the environment, but this caused many problems so modern shells ignore any environment settings for IFS.

Don't set the first character of IFS to backslash. Indeed, Bourne shells use the first character (backslash) when joining the components in `"$@"' and some shells then reinterpret (!) the backslash escapes, so you can end up with backspace and other strange characters.

The proper value for IFS (in regular code, not when performing splits) is a space, a tab, and a newline, in that order. The first character is especially important, as it is used to join the arguments in `$*'; however, note that traditional shells, but also bash-2.04, fail to adhere to this and join with a space anyway.

LANG
LC_ALL
LC_COLLATE
LC_CTYPE
LC_MESSAGES
LC_MONETARY
LC_NUMERIC
LC_TIME

You should set all these variables to `C' because so much configuration code assumes the C locale and Posix requires that locale environment variables be set to `C' if the C locale is desired; `configure' scripts and M4sh do that for you. Export these variables after setting them.
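
In a hand-written script the idiom looks roughly like this, shown here for two of the variables; the others follow the same pattern:

 
LC_ALL=C
export LC_ALL
LANG=C
export LANG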

LANGUAGE

LANGUAGE is not specified by Posix, but it is a GNU extension that overrides LC_ALL in some cases, so you (or M4sh) should set it too.

LC_ADDRESS
LC_IDENTIFICATION
LC_MEASUREMENT
LC_NAME
LC_PAPER
LC_TELEPHONE

These locale environment variables are GNU extensions. They are treated like their Posix brethren (LC_COLLATE, etc.) as described above.

LINENO

Most modern shells provide the current line number in LINENO. Its value is the line number of the beginning of the current command. M4sh, and hence Autoconf, attempts to execute configure with a shell that supports LINENO. If no such shell is available, it attempts to implement LINENO with a Sed prepass that replaces each instance of the string $LINENO (not followed by an alphanumeric character) with the line's number. In M4sh scripts you should execute AS_LINENO_PREPARE so that these workarounds are included in your script; configure scripts do this automatically in AC_INIT.

You should not rely on LINENO within eval or shell functions, as the behavior differs in practice. The presence of a quoted newline within simple commands can alter which line number is used as the starting point for $LINENO substitutions within that command. Also, the possibility of the Sed prepass means that you should not rely on $LINENO when quoted, when in here-documents, or when line continuations are used. Subshells should be OK, though. In the following example, lines 1, 9, and 14 are portable, but the other instances of $LINENO do not have deterministic values:

 
$ cat lineno
echo 1. $LINENO
echo "2. $LINENO
3. $LINENO"
cat <<EOF
5. $LINENO
6. $LINENO
7. \$LINENO
EOF
( echo 9. $LINENO )
eval 'echo 10. $LINENO'
eval 'echo 11. $LINENO
echo 12. $LINENO'
echo 13. '$LINENO'
echo 14. $LINENO '
15.' $LINENO
f () { echo $1 $LINENO;
echo $1 $LINENO }
f 18.
echo 19. \
$LINENO
$ bash-3.2 ./lineno
1. 1
2. 3
3. 3
5. 4
6. 4
7. $LINENO
9. 9
10. 10
11. 12
12. 13
13. $LINENO
14. 14
15. 14
18. 16
18. 17
19. 19
$ zsh-4.3.4 ./lineno
1. 1
2. 2
3. 2
5. 4
6. 4
7. $LINENO
9. 9
10. 1
11. 1
12. 2
13. $LINENO
14. 14
15. 14
18. 0
18. 1
19. 19
$ pdksh-5.2.14 ./lineno
1. 1
2. 2
3. 2
5. 4
6. 4
7. $LINENO
9. 9
10. 0
11. 0
12. 0
13. $LINENO
14. 14
15. 14
18. 16
18. 17
19. 19
$ sed '=' <lineno |
>   sed '
>     N
>     s,$,-,
>     t loop
>     :loop
>     s,^\([0-9]*\)\(.*\)[$]LINENO\([^a-zA-Z0-9_]\),\1\2\1\3,
>     t loop
>     s,-$,,
>     s,^[0-9]*\n,,
>   ' |
>   sh
1. 1
2. 2
3. 3
5. 5
6. 6
7. \7
9. 9
10. 10
11. 11
12. 12
13. 13
14. 14
15. 15
18. 16
18. 17
19. 20

In particular, note that `config.status' (and any other subsidiary script created by AS_INIT_GENERATED) might report line numbers relative to the parent script as a result of the potential Sed pass.

NULLCMD

When executing the command `>foo', zsh executes `$NULLCMD >foo' unless it is operating in Bourne shell compatibility mode and the zsh version is newer than 3.1.6-dev-18. If you are using an older zsh and forget to set NULLCMD, your script might be suspended waiting for data on its standard input.

PATH_SEPARATOR

On DJGPP systems, the PATH_SEPARATOR environment variable can be set to either `:' or `;' to control the path separator Bash uses to set up certain environment variables (such as PATH). You can set this variable to `;' if you want configure to use `;' as a separator; this might be useful if you plan to use non-Posix shells to execute files. See section File System Conventions, for more information about PATH_SEPARATOR.

PWD

Posix 1003.1-2001 requires that cd and pwd must update the PWD environment variable to point to the logical name of the current directory, but traditional shells do not support this. This can cause confusion if one shell instance maintains PWD but a subsidiary and different shell does not know about PWD and executes cd; in this case PWD points to the wrong directory. Use ``pwd`' rather than `$PWD'.

RANDOM

Many shells provide RANDOM, a variable that returns a different integer each time it is used. Most of the time, its value does not change when it is not used, but on IRIX 6.5 the value changes all the time. This can be observed by using set. It is common practice to use $RANDOM as part of a file name, but code shouldn't rely on $RANDOM expanding to a nonempty string.
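
A hedged sketch of that practice; the file name is only an illustration, and it does not depend on $RANDOM being nonempty:

 
# $$ provides the real uniqueness; $RANDOM, where it exists, merely makes
# collisions between concurrent runs less likely.
tmp=./conf$$-$RANDOM
(umask 077 && mkdir "$tmp") || exit 1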

status

This variable is an alias to `$?' for zsh (at least 3.1.6), hence read-only. Do not use it.


11.11 Shell Functions

Nowadays, it is difficult to find a shell that does not support shell functions at all. However, some differences should be expected.

Inside a shell function, you should not rely on the error status of a subshell if the last command of that subshell was exit or trap, as this triggers bugs in zsh 4.x; while Autoconf tries to find a shell that does not exhibit the bug, zsh might be the only shell present on the user's machine.

Likewise, the state of `$?' is not reliable when entering a shell function. This has the effect that using a function as the first command in a trap handler can cause problems.

 
$ bash -c 'foo(){ echo $?; }; trap foo 0; (exit 2); exit 2'; echo $?
2
2
$ ash -c 'foo(){ echo $?; }; trap foo 0; (exit 2); exit 2'; echo $?
0
2

DJGPP bash 2.04 has a bug in that return from a shell function which also used a command substitution causes a segmentation fault. To work around the issue, you can use return from a subshell, or `AS_SET_STATUS' as last command in the execution flow of the function (see section Common Shell Constructs).

Not all shells treat shell functions as simple commands impacted by `set -e', for example with Solaris 10 /bin/sh:

 
$ bash -c 'f(){ return 1; }; set -e; f; echo oops'
$ /bin/sh -c 'f(){ return 1; }; set -e; f; echo oops'
oops

Shell variables and functions may share the same namespace, for example with Solaris 10 /bin/sh:

 
$ f () { :; }; f=; f
f: not found

For this reason, Autoconf (actually M4sh, see section Programming in M4sh) uses the prefix `as_fn_' for its functions.

Handling of positional parameters and shell options varies among shells. For example, Korn shells reset and restore trace output (`set -x') and other options upon function entry and exit. Inside a function, IRIX sh sets `$0' to the function name.

It is not portable to pass temporary environment variables to shell functions. Solaris /bin/sh does not see the variable. Meanwhile, not all shells follow the Posix rule that the assignment must affect the current environment in the same manner as special built-ins.

 
$ /bin/sh -c 'func(){ echo $a;}; a=1 func; echo $a'
⇒
⇒
$ ash -c 'func(){ echo $a;}; a=1 func; echo $a'
⇒1
⇒
$ bash -c 'set -o posix; func(){ echo $a;}; a=1 func; echo $a'
⇒1
⇒1
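
A more portable sketch is to pass the value as an explicit argument (or to assign and export it beforehand) instead of relying on the temporary assignment:

 
func () { echo "$1"; }
a=1
func "$a"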

Some ancient Bourne shell variants with function support did not reset `$i, i >= 0', upon function exit, so effectively the arguments of the script were lost after the first function invocation. It is probably not worth worrying about these shells any more.

With AIX sh, a trap on 0 installed in a shell function triggers at function exit rather than at script exit; see Limitations of Shell Builtins.


11.12 Limitations of Shell Builtins

No, no, we are serious: some shells do have limitations! :)

You should always keep in mind that any builtin or command may support options, and therefore differ in behavior with arguments starting with a dash. For instance, even the innocent `echo "$word"' can give unexpected results when word starts with a dash. It is often possible to avoid this problem using `echo "x$word"', taking the `x' into account later in the pipe. Many of these limitations can be worked around using M4sh (see section Programming in M4sh).
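
A minimal sketch of that idiom, stripping the `x' again later in the pipe:

 
word=-n
echo "x$word" | sed 's/^x//'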

.

Use . only with regular files (use `test -f'). Bash 2.03, for instance, chokes on `. /dev/null'. Remember that . uses PATH if its argument contains no slashes. Also, some shells, including bash 3.2, implicitly append the current directory to this PATH search, even though Posix forbids it. So if you want to use . on a file `foo' in the current directory, you must use `. ./foo'.

Not all shells gracefully handle syntax errors within a sourced file. On one extreme, some non-interactive shells abort the entire script. On the other, zsh 4.3.10 has a bug where it fails to react to the syntax error.

 
$ echo 'fi' > syntax
$ bash -c '. ./syntax; echo $?'
./syntax: line 1: syntax error near unexpected token `fi'
./syntax: line 1: `fi'
1
$ ash -c '. ./syntax; echo $?'
./syntax: 1: Syntax error: "fi" unexpected
$ zsh -c '. ./syntax; echo $?'
./syntax:1: parse error near `fi'
0
!

The Unix version 7 shell did not support negating the exit status of commands with !, and this feature is still absent from some shells (e.g., Solaris /bin/sh). Other shells, such as FreeBSD /bin/sh or ash, have bugs when using !:

 
$ sh -c '! : | :'; echo $?
1
$ ash -c '! : | :'; echo $?
0
$ sh -c '! { :; }'; echo $?
1
$ ash -c '! { :; }'; echo $?
{: not found
Syntax error: "}" unexpected
2

Shell code like this:

 
if ! cmp file1 file2 >/dev/null 2>&1; then
  echo files differ or trouble
fi

is therefore not portable in practice. Typically it is easy to rewrite such code, e.g.:

 
cmp file1 file2 >/dev/null 2>&1 ||
  echo files differ or trouble

More generally, one can always rewrite `! command' as:

 
if command; then (exit 1); else :; fi
{...}

Bash 3.2 (and earlier versions) sometimes does not properly set `$?' when failing to write redirected output of a compound command. This problem is most commonly observed with `{…}'; it does not occur with `(…)'. For example:

 
$ bash -c '{ echo foo; } >/bad; echo $?'
bash: line 1: /bad: Permission denied
0
$ bash -c 'while :; do echo; done >/bad; echo $?'
bash: line 1: /bad: Permission denied
0

To work around the bug, prepend `:;':

 
$ bash -c ':;{ echo foo; } >/bad; echo $?'
bash: line 1: /bad: Permission denied
1

Posix requires a syntax error if a brace list has no contents. However, not all shells obey this rule; and on shells where empty lists are permitted, the effect on `$?' is inconsistent. To avoid problems, ensure that a brace list is never empty.

 
$ bash -c 'false; { }; echo $?' || echo $?
bash: line 1: syntax error near unexpected token `}'
bash: line 1: `false; { }; echo $?'
2
$ zsh -c 'false; { }; echo $?' || echo $?
1
$ pdksh -c 'false; { }; echo $?' || echo $?
0
break

The use of `break 2' etc. is safe.

case

You don't need to quote the argument; no splitting is performed.

You don't need the final `;;', but you should use it.

Posix requires support for case patterns with opening parentheses like this:

 
case $file_name in
  (*.c) echo "C source code";;
esac

but the ( in this example is not portable to many Bourne shell implementations, which is a pity for those of us using tools that rely on balanced parentheses. For instance, with Solaris /bin/sh:

 
$ case foo in (foo) echo foo;; esac
error-->syntax error: `(' unexpected

The leading `(' can be omitted safely. Unfortunately, there are contexts where unbalanced parentheses cause other problems, such as when using a syntax-highlighting editor that searches for the balancing counterpart, or more importantly, when using a case statement as an underquoted argument to an Autoconf macro. See section Dealing with unbalanced parentheses, for tradeoffs involved in various styles of dealing with unbalanced `)'.

Zsh handles pattern fragments derived from parameter expansions or command substitutions as though quoted:

 
$ pat=\?; case aa in ?$pat) echo match;; esac
$ pat=\?; case a? in ?$pat) echo match;; esac
match

Because of a bug in its fnmatch, Bash fails to properly handle backslashes in character classes:

 
bash-2.02$ case /tmp in [/\\]*) echo OK;; esac
bash-2.02$

This is extremely unfortunate, since you are likely to use this code to handle Posix or MS-DOS absolute file names. To work around this bug, always put the backslash first:

 
bash-2.02$ case '\TMP' in [\\/]*) echo OK;; esac
OK
bash-2.02$ case /tmp in [\\/]*) echo OK;; esac
OK

Many Bourne shells cannot handle closing brackets in character classes correctly.

Some shells also have problems with backslash escaping in case you do not want to match the backslash: both a backslash and the escaped character match this pattern. To work around this, specify the character class in a variable, so that quote removal does not apply afterwards, and the special characters don't have to be backslash-escaped:

 
$ case '\' in [\<]) echo OK;; esac
OK
$ scanset='[<]'; case '\' in $scanset) echo OK;; esac
$

Even with this, Solaris ksh matches a backslash if the set contains any of the characters `|', `&', `(', or `)'.

Conversely, Tru64 ksh (circa 2003) erroneously always matches a closing parenthesis if not specified in a character class:

 
$ case foo in *\)*) echo fail ;; esac
fail
$ case foo in *')'*) echo fail ;; esac
fail

Some shells, such as Ash 0.3.8, are confused by an empty case/esac:

 
ash-0.3.8 $ case foo in esac;
error-->Syntax error: ";" unexpected (expecting ")")

Posix requires case to give an exit status of 0 if no cases match. However, /bin/sh in Solaris 10 does not obey this rule. Meanwhile, it is unclear whether a case that matches, but contains no statements, must also change the exit status to 0. The M4sh macro AS_CASE works around these inconsistencies.

 
$ bash -c 'case `false` in ?) ;; esac; echo $?'
0
$ /bin/sh -c 'case `false` in ?) ;; esac; echo $?'
255
cd

Posix 1003.1-2001 requires that cd must support the `-L' ("logical") and `-P' ("physical") options, with `-L' being the default. However, traditional shells do not support these options, and their cd command has the `-P' behavior.

Portable scripts should assume neither option is supported, and should assume neither behavior is the default. This can be a bit tricky, since the Posix default behavior means that, for example, `ls ..' and `cd ..' may refer to different directories if the current logical directory is a symbolic link. It is safe to use cd dir if dir contains no `..' components. Also, Autoconf-generated scripts check for this problem when computing variables like ac_top_srcdir (see section Performing Configuration Actions), so it is safe to cd to these variables.

See section Special Shell Variables, for portability problems involving cd and the CDPATH environment variable. Also please see the discussion of the pwd command.

echo

The simple echo is probably the most surprising source of portability troubles. It is not possible to use `echo' portably unless both options and escape sequences are omitted. Don't expect any option.

Do not use backslashes in the arguments, as there is no consensus on their handling. For `echo '\n' | wc -l', the sh of Solaris outputs 2, but Bash and Zsh (in sh emulation mode) output 1. The problem is truly echo: all the shells understand `'\n'' as the string composed of a backslash and an `n'. Within a command substitution, `echo 'string\c'' will mess up the internal state of ksh88 on AIX 6.1 so that it will print the first character `s' only, followed by a newline, and then entirely drop the output of the next echo in a command substitution.

Because of these problems, do not pass a string containing arbitrary characters to echo. For example, `echo "$foo"' is safe only if you know that foo's value cannot contain backslashes and cannot start with `-'.

If this may not be true, printf is in general safer and easier to use than echo and echo -n. Thus, scripts where portability is not a major concern should use printf '%s\n' whenever echo could fail, and similarly use printf %s instead of echo -n. For portable shell scripts, instead, it is suggested to use a here-document like this:

 
cat <<EOF
$foo
EOF
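
Concretely, the printf-based replacements mentioned above look like this (a sketch, assuming printf behaves as Posix specifies):

 
printf '%s\n' "$foo"     # instead of echo "$foo"
printf %s "$foo"         # instead of echo -n "$foo"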

Alternatively, M4sh provides AS_ECHO and AS_ECHO_N macros which choose between various portable implementations: `echo' or `print' where they work, printf if it is available, or else other creative tricks in order to work around the above problems.

eval

The eval command is useful in limited circumstances, e.g., using commands like `eval table_$key=\$value' and `eval value=\$table_$key' to simulate a hash table when the key is known to be alphanumeric.
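
A minimal sketch of that idiom, assuming the key expands to something purely alphanumeric; the names are only illustrations:

 
key=color
value=blue
eval "table_$key=\$value"       # stores:  table_color=blue
eval "result=\$table_$key"      # fetches: result=blue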

You should also be wary of common bugs in eval implementations. In some shell implementations (e.g., older ash, OpenBSD 3.8 sh, pdksh v5.2.14 99/07/13.2, and zsh 4.2.5), the arguments of `eval' are evaluated in a context where `$?' is 0, so they exhibit behavior like this:

 
$ false; eval 'echo $?'
0

The correct behavior here is to output a nonzero value, but portable scripts should not rely on this.

You should not rely on LINENO within eval. See section Special Shell Variables.

Note that, even though these bugs are easily avoided, eval is tricky to use on arbitrary arguments. It is obviously unwise to use `eval $cmd' if the string value of `cmd' was derived from an untrustworthy source. But even if the string value is valid, `eval $cmd' might not work as intended, since it causes field splitting and file name expansion to occur twice, once for the eval and once for the command itself. It is therefore safer to use `eval "$cmd"'. For example, if cmd has the value `cat test?.c', `eval $cmd' might expand to the equivalent of `cat test;.c' if there happens to be a file named `test;.c' in the current directory; and this in turn mistakenly attempts to invoke cat on the file `test' and then execute the command .c. To avoid this problem, use `eval "$cmd"' rather than `eval $cmd'.

However, suppose that you want to output the text of the evaluated command just before executing it. Assuming the previous example, `echo "Executing: $cmd"' outputs `Executing: cat test?.c', but this output doesn't show the user that `test;.c' is the actual name of the copied file. Conversely, `eval "echo Executing: $cmd"' works on this example, but it fails with `cmd='cat foo >bar'', since it mistakenly replaces the contents of `bar' by the string `cat foo'. No simple, general, and portable solution to this problem is known.

exec

Posix describes several categories of shell built-ins. Special built-ins (such as exit) must impact the environment of the current shell, and need not be available through exec. All other built-ins are regular, and must not propagate variable assignments to the environment of the current shell. However, the group of regular built-ins is further distinguished by commands that do not require a PATH search (such as cd), in contrast to built-ins that are offered as a more efficient version of something that must still be found in a PATH search (such as echo). Posix is not clear on whether exec must work with the list of 17 utilities that are invoked without a PATH search, and many platforms lack an executable for some of those built-ins:

 
$ sh -c 'exec cd /tmp'
sh: line 0: exec: cd: not found

All other built-ins that provide utilities specified by Posix must have a counterpart executable that exists on PATH, although Posix allows exec to use the built-in instead of the executable. For example, contrast bash 3.2 and pdksh 5.2.14:

 
$ bash -c 'pwd --version' | head -n1
bash: line 0: pwd: --: invalid option
pwd: usage: pwd [-LP]
$ bash -c 'exec pwd --version' | head -n1
pwd (GNU coreutils) 6.10
$ pdksh -c 'exec pwd --version' | head -n1
pdksh: pwd: --: unknown option

When it is desired to avoid a regular shell built-in, the workaround is to use some other forwarding command, such as env or nice, that will ensure a path search:

 
$ pdksh -c 'exec true --version' | head -n1
$ pdksh -c 'nice true --version' | head -n1
true (GNU coreutils) 6.10
$ pdksh -c 'env true --version' | head -n1
true (GNU coreutils) 6.10
exit

The default value of exit is supposed to be $?; unfortunately, some shells, such as the DJGPP port of Bash 2.04, just perform `exit 0'.

 
bash-2.04$ foo=`exit 1` || echo fail
fail
bash-2.04$ foo=`(exit 1)` || echo fail
fail
bash-2.04$ foo=`(exit 1); exit` || echo fail
bash-2.04$

Using `exit $?' restores the expected behavior.

Some shell scripts, such as those generated by autoconf, use a trap to clean up before exiting. If the last shell command exited with nonzero status, the trap also exits with nonzero status so that the invoker can tell that an error occurred.

Unfortunately, in some shells, such as Solaris /bin/sh, an exit trap ignores the exit command's argument. In these shells, a trap cannot determine whether it was invoked by plain exit or by exit 1. Instead of calling exit directly, use the AC_MSG_ERROR macro that has a workaround for this problem.

export

The builtin export dubs a shell variable an environment variable. Each update of exported variables corresponds to an update of the environment variables. Conversely, each environment variable received by the shell when it is launched should be imported as a shell variable marked as exported.

Alas, many shells, such as Solaris /bin/sh, IRIX 6.3, IRIX 5.2, AIX 4.1.5, and Digital Unix 4.0, forget to export the environment variables they receive. As a result, two variables coexist: the environment variable and the shell variable. The following code demonstrates this failure:

 
#!/bin/sh
echo $FOO
FOO=bar
echo $FOO
exec /bin/sh $0

when run with `FOO=foo' in the environment, these shells print alternately `foo' and `bar', although they should print only `foo' and then a sequence of `bar's.

Therefore you should export again each environment variable that you update; the export can occur before or after the assignment.
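
For example (FOO here is just an illustrative name), the following is safe on all of the shells above:

 
FOO=bar
export FOO    # export again, even if FOO was inherited from the environment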

Posix is not clear on whether the export of an undefined variable causes the variable to be defined with the value of an empty string, or merely marks any future definition of a variable by that name for export. Various shells behave differently in this regard:

 
$ sh -c 'export foo; env | grep foo'
$ ash -c 'export foo; env | grep foo'
foo=
false

Don't expect false to exit with status 1: in native Solaris `/bin/false' exits with status 255.

for

To loop over positional arguments, use:

 
for arg
do
  echo "$arg"
done

You may not leave the do on the same line as for, since some shells improperly grok:

 
for arg; do
  echo "$arg"
done

If you want to explicitly refer to the positional arguments, given the `$@' bug (see section Shell Substitutions), use:

 
for arg in ${1+"$@"}; do
  echo "$arg"
done

But keep in mind that Zsh, even in Bourne shell emulation mode, performs word splitting on `${1+"$@"}'; see Shell Substitutions, item `$@', for more.

if

Using `!' is not portable. Instead of:

 
if ! cmp -s file file.new; then
  mv file.new file
fi

use:

 
if cmp -s file file.new; then :; else
  mv file.new file
fi

Or, especially if the else branch is short, you can use ||. In M4sh, the AS_IF macro provides an easy way to write these kinds of conditionals:

 
AS_IF([cmp -s file file.new], [], [mv file.new file])

This is especially useful in other M4 macros, where the then and else branches might be macro arguments.

Some very old shells did not reset the exit status from an if with no else:

 
$ if (exit 42); then true; fi; echo $?
42

whereas a proper shell should have printed `0'. But this is no longer a portability problem; any shell that supports functions gets it correct. However, it explains why some makefiles have lengthy constructs:

 
if test -f "$file"; then
  install "$file" "$dest"
else
  :
fi
printf

A format string starting with a `-' can cause problems. Bash interprets it as an option and gives an error. And `--' to mark the end of options is not good in the NetBSD Almquist shell (e.g., 0.4.6) which takes that literally as the format string. Putting the `-' in a `%c' or `%s' is probably easiest:

 
printf %s -foo

Bash 2.03 mishandles an escape sequence that happens to evaluate to `%':

 
$ printf '\045'
bash: printf: `%': missing format character

Large outputs may cause trouble. On Solaris 2.5.1 through 10, for example, `/usr/bin/printf' is buggy, so when using /bin/sh the command `printf %010000x 123' normally dumps core.

Since printf is not always a shell builtin, there is a potential speed penalty for using printf '%s\n' as a replacement for an echo that does not interpret `\' or leading `-'. With Solaris ksh, it is possible to use print -r -- for this role instead.
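
For instance, assuming `$msg' (an illustrative name) may contain backslashes or start with `-', such a replacement looks like:

 
printf '%s\n' "$msg"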

For a discussion of portable alternatives to both printf and echo, see Limitations of Shell Builtins.

pwd

With modern shells, plain pwd outputs a "logical" directory name, some of whose components may be symbolic links. These directory names are in contrast to "physical" directory names, whose components are all directories.

Posix 1003.1-2001 requires that pwd must support the `-L' ("logical") and `-P' ("physical") options, with `-L' being the default. However, traditional shells do not support these options, and their pwd command has the `-P' behavior.

Portable scripts should assume neither option is supported, and should assume neither behavior is the default. Also, on many hosts `/bin/pwd' is equivalent to `pwd -P', but Posix does not require this behavior and portable scripts should not rely on it.

Typically it's best to use plain pwd. On modern hosts this outputs logical directory names, which are what the user specified and are therefore usually what the user wants to see.

Also please see the discussion of the cd command.

read

No options are portable, not even `-r' (Solaris /bin/sh, for example, does not support it).

set

With the FreeBSD 6.0 shell, the set command (without any options) does not sort its output.

The set builtin faces the usual problem with arguments starting with a dash. Modern shells such as Bash or Zsh understand `--' to specify the end of the options (any argument after `--' is a parameter, even `-x' for instance), but many traditional shells (e.g., Solaris 10 /bin/sh) simply stop option processing as soon as a non-option argument is found. Therefore, use `dummy' or simply `x' to end the option processing, and use shift to pop it out:

 
set x $my_list; shift

Avoid `set -', e.g., `set - $my_list'. Posix no longer requires support for this command, and in traditional shells `set - $my_list' resets the `-v' and `-x' options, which makes scripts harder to debug.

Some nonstandard shells do not recognize more than one option (e.g., `set -e -x' assigns `-x' to the command line). It is better to combine them:

 
set -ex

The option `-e' has historically been underspecified, with enough ambiguities to cause numerous differences across various shell implementations. Perhaps the best reference is the proposal recommending a change to Posix 2008 to match ksh88 behavior. Note that mixing set -e and shell functions is asking for surprises:

 
set -e
doit()
{
  rm file
  echo one
}
doit || echo two

According to the recommendation, `one' should always be output regardless of whether the rm failed, because it occurs within the body of the shell function `doit' invoked on the left side of `||', where the effects of `set -e' are not enforced. Likewise, `two' should never be printed, since the failure of rm does not abort the function, such that the status of `doit' is 0.

The BSD shell has had several problems with the `-e' option. Older versions of the BSD shell (circa 1990) mishandled `&&', `||', `if', and `case' when `-e' was in effect, causing the shell to exit unexpectedly in some cases. This was particularly a problem with makefiles, and led to circumlocutions like `sh -c 'test -f file || touch file'', where the seemingly-unnecessary `sh -c '…'' wrapper works around the bug (see section Failure in Make Rules).

Even relatively-recent versions of the BSD shell (e.g., OpenBSD 3.4) wrongly exit with `-e' if a command within `&&' fails inside a compound statement. For example:

 
#! /bin/sh
set -e
foo=''
test -n "$foo" && exit 1
echo one
if :; then
  test -n "$foo" && exit 1
fi
echo two

does not print `two'. One workaround is to use `if test -n "$foo"; then exit 1; fi' rather than `test -n "$foo" && exit 1'. Another possibility is to warn BSD users not to use `sh -e'.

When `set -e' is in effect, a failed command substitution in Solaris /bin/sh cannot be ignored, even with `||'.

 
$ /bin/sh -c 'set -e; foo=`false` || echo foo; echo bar'
$ bash -c 'set -e; foo=`false` || echo foo; echo bar'
foo
bar

Portable scripts should not use `set -e' if trap is used to install an exit handler. This is because Tru64/OSF 5.1 sh sometimes enters the trap handler with the exit status of the command prior to the one that triggered the errexit handler:

 
$ sh -ec 'trap '\''echo $?'\'' 0; false'
0
$ sh -c 'set -e; trap '\''echo $?'\'' 0; false'
1

Thus, when writing a script in M4sh, rather than trying to rely on `set -e', it is better to append `|| AS_EXIT' to any statement where it is desirable to abort on failure.
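
For example, in an M4sh script (the command and file name are merely illustrative):

 
rm -f conftest.err || AS_EXIT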

Job control is not provided by all shells, so the use of `set -m' or `set -b' must be done with care. When using zsh in native mode, asynchronous notification (`set -b') is enabled by default, and using `emulate sh' to switch to Posix mode does not clear this setting (although asynchronous notification has no impact unless job monitoring is also enabled). Also, zsh 4.3.10 and earlier have a bug where job control can be manipulated in interactive shells, but not in subshells or scripts. Furthermore, some shells, like pdksh, fail to treat subshells as interactive, even though the parent shell was.

 
$ echo $ZSH_VERSION
4.3.10
$ set -m; echo $?
0
$ zsh -c 'set -m; echo $?'
set: can't change option: -m
$ (set -m); echo $?
set: can't change option: -m
1
$ pdksh -ci 'echo $-; (echo $-)'
cim
c
shift

Not only is shifting a bad idea when there is nothing left to shift, but in addition it is not portable: the shell of MIPS RISC/OS 4.52 refuses to do it.

Don't use `shift 2' etc.; while it was in the SVR1 shell (1983), it is absent from many pre-Posix shells.
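
A portable equivalent is simply to repeat shift:

 
shift; shift    # instead of the nonportable `shift 2'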

source

This command is not portable, as Posix does not require it; use . instead.

test

The test program is the way to perform many file and string tests. It is often invoked by the alternate name `[', but using that name in Autoconf code is asking for trouble since it is an M4 quote character.

The `-a', `-o', `(', and `)' operands are not present in all implementations, and have been marked obsolete by Posix 2008. This is because there are inherent ambiguities in using them. For example, `test "$1" -a "$2"' looks like a binary operator to check whether two strings are both non-empty, but if `$1' is the literal `!', then some implementations of test treat it as a negation of the unary operator `-a'.

Thus, portable uses of test should never have more than four arguments, and scripts should use shell constructs like `&&' and `||' instead. If you combine `&&' and `||' in the same statement, keep in mind that they have equal precedence, so it is often better to parenthesize even when this is redundant. For example:

 
# Not portable:
test "X$a" = "X$b" -a \
  '(' "X$c" != "X$d" -o "X$e" = "X$f" ')'

# Portable:
test "X$a" = "X$b" &&
  { test "X$c" != "X$d" || test "X$e" = "X$f"; }

test does not process options like most other commands do; for example, it does not recognize the `--' argument as marking the end of options.

It is safe to use `!' as a test operator. For example, `if test ! -d foo; …' is portable even though `if ! test -d foo; …' is not.

test (files)

To enable configure scripts to support cross-compilation, they shouldn't do anything that tests features of the build system instead of the host system. But occasionally you may find it necessary to check whether some arbitrary file exists. To do so, use `test -f' or `test -r'. Do not use `test -x', because 4.3BSD does not have it. Do not use `test -e' either, because Solaris /bin/sh lacks it. To test for symbolic links on systems that have them, use `test -h' rather than `test -L'; either form conforms to Posix 1003.1-2001, but older shells like Solaris 8 /bin/sh support only `-h'.

test (strings)

Posix says that `test "string"' succeeds if string is not null, but this usage is not portable to traditional platforms like Solaris 10 /bin/sh, which mishandle strings like `!' and `-n'.

Posix also says that `test ! "string"', `test -n "string"' and `test -z "string"' work with any string, but many shells (such as Solaris, AIX 3.2, UNICOS 10.0.0.6, Digital Unix 4, etc.) get confused if string looks like an operator:

 
$ test -n =
test: argument expected
$ test ! -n
test: argument expected

Similarly, Posix says that both `test "string1" = "string2"' and `test "string1" != "string2"' work for any pairs of strings, but in practice this is not true for troublesome strings that look like operators or parentheses, or that begin with `-'.

It is best to protect such strings with a leading `X', e.g., `test "Xstring" != X' rather than `test -n "string"' or `test ! "string"'.
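
For instance, here is a sketch of a portable non-emptiness check (the variable name is just for illustration):

 
var=something
if test "X$var" != X; then
  echo "var is not empty"
fi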

It is common to find variations of the following idiom:

 
test -n "`echo $ac_feature | sed 's/[-a-zA-Z0-9_]//g'`" &&
  action

to take an action when a token matches a given pattern. Such constructs should be avoided by using:

 
case $ac_feature in
  *[!-a-zA-Z0-9_]*) action;;
esac

If the pattern is a complicated regular expression that cannot be expressed as a shell pattern, use something like this instead:

 
expr "X$ac_feature" : 'X.*[^-a-zA-Z0-9_]' >/dev/null &&
  action

`expr "Xfoo" : "Xbar"' is more robust than `echo "Xfoo" | grep "^Xbar"', because it avoids problems when `foo' contains backslashes.

trap

It is safe to trap at least the signals 1, 2, 13, and 15. You can also trap 0, i.e., have the trap run when the script ends (either via an explicit exit, or the end of the script). The trap for 0 should be installed outside of a shell function, or AIX 5.3 /bin/sh will invoke the trap at the end of this function.

Posix says that `trap - 1 2 13 15' resets the traps for the specified signals to their default values, but many common shells (e.g., Solaris /bin/sh) misinterpret this and attempt to execute a "command" named - when the specified conditions arise. Posix 2008 also added a requirement to support `trap 1 2 13 15' to reset traps, as this is supported by a larger set of shells, but there are still shells like dash that mistakenly try to execute 1 instead of resetting the traps. Therefore, there is no portable workaround, except for `trap - 0', for which `trap '' 0' is a portable substitute.

Although Posix is not absolutely clear on this point, it is widely admitted that when entering the trap `$?' should be set to the exit status of the last command run before the trap. The ambiguity can be summarized as: "when the trap is launched by an exit, what is the last command run: that before exit, or exit itself?"

Bash considers exit to be the last command, while Zsh and Solaris /bin/sh consider that when the trap is run it is still in the exit, hence it is the previous exit status that the trap receives:

 
$ cat trap.sh
trap 'echo $?' 0
(exit 42); exit 0
$ zsh trap.sh
42
$ bash trap.sh
0

The portable solution is then simple: when you want to `exit 42', run `(exit 42); exit 42', the first exit being used to set the exit status to 42 for Zsh, and the second to trigger the trap and pass 42 as exit status for Bash. In M4sh, this is covered by using AS_EXIT.

The shell in FreeBSD 4.0 has the following bug: `$?' is reset to 0 by empty lines if the code is inside trap.

 
$ trap 'false

echo $?' 0
$ exit
0

Fortunately, this bug only affects trap.

Several shells fail to execute an exit trap that is defined inside a subshell, when the last command of that subshell is not a builtin. A workaround is to end the subshell with `exit $?', which is a builtin.

 
$ bash -c '(trap "echo hi" 0; /bin/true)'
hi
$ /bin/sh -c '(trap "echo hi" 0; /bin/true)'
$ /bin/sh -c '(trap "echo hi" 0; /bin/true; exit $?)'
hi

Likewise, older implementations of bash failed to preserve `$?' across an exit trap consisting of a single cleanup command.

 
$ bash -c 'trap "/bin/true" 0; exit 2'; echo $?
2
$ bash-2.05b -c 'trap "/bin/true" 0; exit 2'; echo $?
0
$ bash-2.05b -c 'trap ":; /bin/true" 0; exit 2'; echo $?
2
true

Don't worry: as far as we know true is portable. Nevertheless, it's not always a builtin (e.g., Bash 1.x), and the portable shell community tends to prefer using :. This has a funny side effect: when asked whether false is more portable than true, Alexandre Oliva answered:

In a sense, yes, because if it doesn't exist, the shell will produce an exit status of failure, which is correct for false, but not for true.

unset

In some nonconforming shells (e.g., Bash 2.05a), unset FOO fails when FOO is not set. You can use

 
FOO=; unset FOO

if you are not sure that FOO is set.

A few ancient shells lack unset entirely. For some variables such as PS1, you can use a neutralizing value instead:

 
PS1='$ '

Usually, shells that do not support unset need less effort to make the environment sane, so, for example, it is not a problem if you cannot unset CDPATH on those shells. However, Bash 2.01 mishandles `unset MAIL' in some cases and dumps core. So, you should do something like

 
( (unset MAIL) || exit 1) >/dev/null 2>&1 && unset MAIL || :

See section Special Shell Variables, for some neutralizing values. Also, see Limitations of Builtins, for the case of environment variables.

wait

The exit status of wait is not always reliable.



11.13 Limitations of Usual Tools

The small set of tools you can expect to find on any machine can still include some limitations you should be aware of.

awk

Don't leave white space before the opening parenthesis in a user function call. Posix does not allow this and GNU Awk rejects it:

 
$ gawk 'function die () { print "Aaaaarg!"  }
        BEGIN { die () }'
gawk: cmd. line:2:         BEGIN { die () }
gawk: cmd. line:2:                      ^ parse error
$ gawk 'function die () { print "Aaaaarg!"  }
        BEGIN { die() }'
Aaaaarg!

Posix says that if a program contains only `BEGIN' actions, and contains no instances of getline, then the program merely executes the actions without reading input. However, traditional Awk implementations (such as Solaris 10 awk) read and discard input in this case. Portable scripts can redirect input from `/dev/null' to work around the problem. For example:

 
awk 'BEGIN {print "hello world"}' </dev/null

Posix says that in an `END' action, `$NF' (and presumably, `$1') retain their value from the last record read, if no intervening `getline' occurred. However, some implementations (such as Solaris 10 `/usr/bin/awk', `nawk', or Darwin `awk') reset these variables. A workaround is to use an intermediate variable prior to the `END' block. For example:

 
$ cat end.awk
{ tmp = $1 }
END { print "a", $1, $NF, "b", tmp }
$ echo 1 | awk -f end.awk
a   b 1
$ echo 1 | gawk -f end.awk
a 1 1 b 1

If you want your program to be deterministic, don't depend on the order in which `for' iterates over an array:

 
$ cat for.awk
END {
  arr["foo"] = 1
  arr["bar"] = 1
  for (i in arr)
    print i
}
$ gawk -f for.awk </dev/null
foo
bar
$ nawk -f for.awk </dev/null
bar
foo

Some Awk implementations, such as HP-UX 11.0's native one, mishandle anchors:

 
$ echo xfoo | $AWK '/foo|^bar/ { print }'
$ echo bar | $AWK '/foo|^bar/ { print }'
bar
$ echo xfoo | $AWK '/^bar|foo/ { print }'
xfoo
$ echo bar | $AWK '/^bar|foo/ { print }'
bar

Either do not depend on such patterns (i.e., use `/^(.*foo|bar)/'), or use a simple test to reject such implementations.

On `ia64-hp-hpux11.23', Awk mishandles printf conversions after %u:

 
$ awk 'BEGIN { printf "%u %d\n", 0, -1 }'
0 0

AIX version 5.2 has an arbitrary limit of 399 on the length of regular expressions and literal strings in an Awk program.

Traditional Awk implementations derived from Unix version 7, such as Solaris /bin/awk, have many limitations and do not conform to Posix. Nowadays AC_PROG_AWK (see section Particular Program Checks) finds you an Awk that doesn't have these problems, but if for some reason you prefer not to use AC_PROG_AWK you may need to address them.

Traditional Awk does not support multidimensional arrays or user-defined functions.

Traditional Awk does not support the `-v' option. You can use assignments after the program instead, e.g., $AWK '{print v $1}' v=x; however, don't forget that such assignments are not evaluated until they are encountered (e.g., after any BEGIN action).
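
The following sketch illustrates that evaluation order: since the assignment appears after the program, `v' is still empty inside the BEGIN action, so this typically prints only `begin: ' with an empty value:

 
awk 'BEGIN { print "begin: " v }
     { print "line: " v }' v=x /dev/null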

Traditional Awk does not support the keywords delete or do.

Traditional Awk does not support the expressions a?b:c, !a, a^b, or a^=b.

Traditional Awk does not support the predefined CONVFMT variable.

Traditional Awk supports only the predefined functions exp, index, int, length, log, split, sprintf, sqrt, and substr.

Traditional Awk getline is not at all compatible with Posix; avoid it.

Traditional Awk has for (i in a) … but no other uses of the in keyword. For example, it lacks if (i in a) ….

In code portable to both traditional and modern Awk, FS must be a string containing just one ordinary character, and similarly for the field-separator argument to split.

Traditional Awk has a limit of 99 fields in a record. Some Awk implementations, like Tru64's, split the input even if you don't refer to any field in the script; to circumvent this problem, set `FS' to an unusual character and use split.
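
Here is a minimal sketch of that trick, assuming the real separator is a space and that `:' does not occur in the input:

 
echo 'a b c' | awk 'BEGIN { FS = ":" }
  { n = split($0, field, " "); print n, field[1] }'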

Traditional Awk has a limit of at most 99 bytes in a number formatted by OFMT; for example, OFMT="%.300e"; print 0.1; typically dumps core.

The original version of Awk had a limit of at most 99 bytes per split field, 99 bytes per substr substring, and 99 bytes per run of non-special characters in a printf format, but these bugs have been fixed on all practical hosts that we know of.

HP-UX 11.00 and IRIX 6.5 Awk require that input files have a line length of at most 3070 bytes.

basename

Not all hosts have a working basename. You can use expr instead.
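
For example, here is one rough sketch of a basename replacement using expr; the value of `$file' is illustrative, and this does not handle trailing slashes or an empty `$file':

 
file=/usr/local/bin/gcc
base=`expr "X$file" : '.*/\([^/]*\)$' \| "X$file" : 'X\(.*\)'`
echo "$base"    # prints `gcc'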

cat

Don't rely on any option.

cc

The command `cc -c foo.c' traditionally produces an object file named `foo.o'. Most compilers allow `-c' to be combined with `-o' to specify a different object file name, but Posix does not require this combination and a few compilers lack support for it. See section C Compiler Characteristics, for how to test for this feature with AC_PROG_CC_C_O.

When a compilation such as `cc -o foo foo.c' fails, some compilers (such as CDS on Reliant Unix) leave a `foo.o'.

HP-UX cc doesn't accept `.S' files to preprocess and assemble. `cc -c foo.S' appears to succeed, but in fact does nothing.

The default executable, produced by `cc foo.c', is named `a.out' under the usual Posix convention, but some platforms use other names; for example, DOS and Windows ports of the compilers commonly produce `a.exe' or `foo.exe'.

The C compiler's traditional name is cc, but other names like gcc are common. Posix 1003.1-2001 specifies the name c99, but older Posix editions specified c89 and anyway these standard names are rarely used in practice. Typically the C compiler is invoked from makefiles that use `$(CC)', so the value of the `CC' make variable selects the compiler name.

chgrp
chown

It is not portable to change a file's group to a group that the owner does not belong to.

chmod

Avoid usages like `chmod -w file'; use `chmod a-w file' instead, for two reasons. First, plain `-w' does not necessarily make the file unwritable, since it does not affect mode bits that correspond to bits in the file mode creation mask. Second, Posix says that the `-w' might be interpreted as an implementation-specific option, not as a mode; Posix suggests using `chmod -- -w file' to avoid this confusion, but unfortunately `--' does not work on some older hosts.

cmp

cmp performs a raw data comparison of two files, while diff compares two text files. Therefore, if you might compare DOS files, even if only checking whether two files are different, use diff to avoid spurious differences due to differences of newline encoding.

cp

Avoid the `-r' option, since Posix 1003.1-2004 marks it as obsolescent and its behavior on special files is implementation-defined. Use `-R' instead. On GNU hosts the two options are equivalent, but on Solaris hosts (for example) cp -r reads from pipes instead of replicating them.

Some cp implementations (e.g., BSD/OS 4.2) do not allow trailing slashes at the end of nonexistent destination directories. To avoid this problem, omit the trailing slashes. For example, use `cp -R source /tmp/newdir' rather than `cp -R source /tmp/newdir/' if `/tmp/newdir' does not exist.

The ancient SunOS 4 cp does not support `-f', although its mv does.

Traditionally, file timestamps had 1-second resolution, and `cp -p' copied the timestamps exactly. However, many modern file systems have timestamps with 1-nanosecond resolution. Unfortunately, `cp -p' implementations truncate timestamps when copying files, so this can result in the destination file appearing to be older than the source. The exact amount of truncation depends on the resolution of the system calls that cp uses; traditionally this was utime, which has 1-second resolution, but some newer cp implementations use utimes, which has 1-microsecond resolution. These newer implementations include GNU Core Utilities 5.0.91 or later, and Solaris 8 (sparc) patch 109933-02 or later. Unfortunately as of January 2006 there is still no system call to set timestamps to the full nanosecond resolution.

Bob Proulx notes that `cp -p' always tries to copy ownerships. But whether it actually does copy ownerships or not is a system dependent policy decision implemented by the kernel. If the kernel allows it then it happens. If the kernel does not allow it then it does not happen. It is not something cp itself has control over.

In Unix System V any user can chown files to any other user, and System V also has a non-sticky `/tmp'. That probably derives from the heritage of System V in a business environment without hostile users. BSD changed this to be a more secure model where only root can chown files and a sticky `/tmp' is used. That undoubtedly derives from the heritage of BSD in a campus environment.

GNU/Linux and Solaris by default follow BSD, but can be configured to allow a System V style chown. On the other hand, HP-UX follows System V, but can be configured to use the modern security model and disallow chown. Since it is an administrator-configurable parameter you can't use the name of the kernel as an indicator of the behavior.

date

Some versions of date do not recognize special `%' directives, and unfortunately, instead of complaining, they just pass them through, and exit with success:

 
$ uname -a
OSF1 medusa.sis.pasteur.fr V5.1 732 alpha
$ date "+%s"
%s
diff

Option `-u' is nonportable.

Some implementations, such as Tru64's, fail when comparing to `/dev/null'. Use an empty file instead.
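
For example (the file names are illustrative):

 
: > empty.tmp
diff empty.tmp file    # instead of comparing against `/dev/null'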

dirname

Not all hosts have a working dirname, and you should instead use AS_DIRNAME (see section Programming in M4sh). For example:

 
dir=`dirname "$file"`       # This is not portable.
dir=`AS_DIRNAME(["$file"])` # This is more portable.
egrep

Posix 1003.1-2001 no longer requires egrep, but many hosts do not yet support the Posix replacement grep -E. Also, some traditional implementations do not work on long input lines. To work around these problems, invoke AC_PROG_EGREP and then use $EGREP.

Portable extended regular expressions should use `\' only to escape characters in the string `$()*+.?[\^{|'. For example, `\}' is not portable, even though it typically matches `}'.

The empty alternative is not portable. Use `?' instead. For instance with Digital Unix v5.0:

 
> printf "foo\n|foo\n" | $EGREP '^(|foo|bar)$'
|foo
> printf "bar\nbar|\n" | $EGREP '^(foo|bar|)$'
bar|
> printf "foo\nfoo|\n|bar\nbar\n" | $EGREP '^(foo||bar)$'
foo
|bar

$EGREP also suffers the limitations of grep (see Limitations of Usual Tools).

expr

Not all implementations obey the Posix rule that `--' separates options from arguments; likewise, not all implementations provide the extension to Posix that the first argument can be treated as part of a valid expression rather than an invalid option if it begins with `-'. When performing arithmetic, use `expr 0 + $var' if `$var' might be a negative number, to keep expr from interpreting it as an option.
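
For instance (the value is illustrative):

 
var=-3
sum=`expr 0 + $var + 10`    # plain `expr $var + 10' might treat -3 as an option
echo "$sum"                 # prints 7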

No expr keyword starts with `X', so use `expr X"word" : 'Xregex'' to keep expr from misinterpreting word.

Don't use length, substr, match and index.

expr (`|')

You can use `|'. Although Posix does require that `expr ''' return the empty string, it does not specify the result when you `|' together the empty string (or zero) with the empty string. For example:

 
expr '' \| ''

Posix 1003.2-1992 returns the empty string for this case, but traditional Unix returns `0' (Solaris is one such example). In Posix 1003.1-2001, the specification was changed to match traditional Unix's behavior (which is bizarre, but it's too late to fix this). Please note that the same problem does arise when the empty string results from a computation, as in:

 
expr bar : foo \| foo : bar

Avoid this portability problem by avoiding the empty string.

expr (`:')

Portable expr regular expressions should use `\' to escape only characters in the string `$()*.0123456789[\^n{}'. For example, alternation, `\|', is common but Posix does not require its support, so it should be avoided in portable scripts. Similarly, `\+' and `\?' should be avoided.

Portable expr regular expressions should not begin with `^'. Patterns are automatically anchored so leading `^' is not needed anyway.

On the other hand, the behavior of the `$' anchor is not portable on multi-line strings. Posix is ambiguous whether the anchor applies to each line, as was done in older versions of GNU Coreutils, or whether it applies only to the end of the overall string, as in Coreutils 6.0 and most other implementations.

 
$ baz='foo
> bar'
$ expr "X$baz" : 'X\(foo\)$'

$ expr-5.97 "X$baz" : 'X\(foo\)$'
foo

The Posix standard is ambiguous as to whether `expr 'a' : '\(b\)'' outputs `0' or the empty string. In practice, it outputs the empty string on most platforms, but portable scripts should not assume this. For instance, the QNX 4.25 native expr returns `0'.

One might think that a way to get a uniform behavior would be to use the empty string as a default value:

 
expr a : '\(b\)' \| ''

Unfortunately this behaves exactly as the original expression; see the expr (`|') entry for more information.

Some ancient expr implementations (e.g., SunOS 4 expr and Solaris 8 /usr/ucb/expr) have a silly length limit that causes expr to fail if the matched substring is longer than 120 bytes. In this case, you might want to fall back on `echo|sed' if expr fails. Nowadays this is of practical importance only for the rare installer who mistakenly puts `/usr/ucb' before `/usr/bin' in PATH.

On Mac OS X 10.4, expr mishandles the pattern `[^-]' in some cases. For example, the command

 
expr Xpowerpc-apple-darwin8.1.0 : 'X[^-]*-[^-]*-\(.*\)'

outputs `apple-darwin8.1.0' rather than the correct `darwin8.1.0'. This particular case can be worked around by substituting `[^--]' for `[^-]'.

Don't leave, there is some more!

The QNX 4.25 expr, in addition to preferring `0' to the empty string, has a funny behavior in its exit status: it's always 1 when parentheses are used!

 
$ val=`expr 'a' : 'a'`; echo "$?: $val"
0: 1
$ val=`expr 'a' : 'b'`; echo "$?: $val"
1: 0

$ val=`expr 'a' : '\(a\)'`; echo "$?: $val"
1: a
$ val=`expr 'a' : '\(b\)'`; echo "$?: $val"
1: 0

In practice this can be a big problem if you are ready to catch failures of expr programs with some other method (such as using sed), since you may get the result twice. For instance

 
$ expr 'a' : '\(a\)' || echo 'a' | sed 's/^\(a\)$/\1/'

outputs `a' on most hosts, but `aa' on QNX 4.25. A simple workaround consists of testing expr and using a variable set to expr or to false according to the result.
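
A minimal sketch of that workaround, assuming the probe below resembles the expressions the script actually uses:

 
if expr a : '\(a\)' >/dev/null 2>&1; then
  my_expr=expr     # exit status is trustworthy even with parentheses
else
  my_expr=false    # e.g., QNX 4.25: always fall back on the sed branch
fi
$my_expr 'a' : '\(a\)' || echo 'a' | sed 's/^\(a\)$/\1/'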

Tru64 expr incorrectly treats the result as a number, if it can be interpreted that way:

 
$ expr 00001 : '.*\(...\)'
1

On HP-UX 11, expr only supports a single sub-expression.

 
$ expr 'Xfoo' : 'X\(f\(oo\)*\)$'
expr: More than one '\(' was used.
fgrep

Posix 1003.1-2001 no longer requires fgrep, but many hosts do not yet support the Posix replacement grep -F. Also, some traditional implementations do not work on long input lines. To work around these problems, invoke AC_PROG_FGREP and then use $FGREP.

find

The option `-maxdepth' seems to be GNU specific. Tru64 v5.1, NetBSD 1.5 and Solaris find commands do not understand it.

The replacement of `{}' is guaranteed only if the argument is exactly {}, not if it's only a part of an argument. For instance on Digital Unix, HP-UX 10.20, and HP-UX 11:

 
$ touch foo
$ find . -name foo -exec echo "{}-{}" \;
{}-{}

while GNU find reports `./foo-./foo'.

grep

Portable scripts can rely on the grep options `-c', `-l', `-n', and `-v', but should avoid other options. For example, don't use `-w', as Posix does not require it and Irix 6.5.16m's grep does not support it. Also, portable scripts should not combine `-c' with `-l', as Posix does not allow this.

Some of the options required by Posix are not portable in practice. Don't use `grep -q' to suppress output, because many grep implementations (e.g., Solaris) do not support `-q'. Don't use `grep -s' to suppress output either, because Posix says `-s' does not suppress output, only some error messages; also, the `-s' option of traditional grep behaved like `-q' does in most modern implementations. Instead, redirect the standard output and standard error (in case the file doesn't exist) of grep to `/dev/null'. Check the exit status of grep to determine whether it found a match.
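
For example, a portable stand-in for `grep -q pattern file' is:

 
if grep pattern file >/dev/null 2>&1; then
  echo "pattern found"
fi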

Some traditional grep implementations do not work on long input lines. On AIX the default grep silently truncates long lines on the input before matching.

Also, many implementations do not support multiple regexps with `-e': they either reject `-e' entirely (e.g., Solaris) or honor only the last pattern (e.g., IRIX 6.5 and NeXT). To work around these problems, invoke AC_PROG_GREP and then use $GREP.

Another possible workaround for the multiple `-e' problem is to separate the patterns by newlines, for example:

 
grep 'foo
bar' in.txt

except that this fails with traditional grep implementations and with OpenBSD 3.8 grep.

Traditional grep implementations (e.g., Solaris) do not support the `-E' or `-F' options. To work around these problems, invoke AC_PROG_EGREP and then use $EGREP, and similarly for AC_PROG_FGREP and $FGREP. Even if you are willing to require support for Posix grep, your script should not use both `-E' and `-F', since Posix does not allow this combination.

Portable grep regular expressions should use `\' only to escape characters in the string `$()*.0123456789[\^{}'. For example, alternation, `\|', is common but Posix does not require its support in basic regular expressions, so it should be avoided in portable scripts. Solaris and HP-UX grep do not support it. Similarly, the following escape sequences should also be avoided: `\<', `\>', `\+', `\?', `\`', `\'', `\B', `\b', `\S', `\s', `\W', and `\w'.

Posix does not specify the behavior of grep on binary files. An example where this matters is using BSD grep to search text that includes embedded ANSI escape sequences for colored output to terminals (`\033[m' is the sequence to restore normal output); the behavior depends on whether input is seekable:

 
$ printf 'esc\033[mape\n' > sample
$ grep . sample
Binary file sample matches
$ cat sample | grep .
escape
join

Solaris 8 join has bugs when the second operand is standard input, and when standard input is a pipe. For example, the following shell script causes Solaris 8 join to loop forever:

 
cat >file <<'EOF'
1 x
2 y
EOF
cat file | join file -

Use `join - file' instead.

ln

Don't rely on ln having a `-f' option. Symbolic links are not available on old systems; use `$(LN_S)' as a portable substitute.

For versions of the DJGPP before 2.04, ln emulates symbolic links to executables by generating a stub that in turn calls the real program. This feature also works with nonexistent files like in the Posix spec. So `ln -s file link' generates `link.exe', which attempts to call `file.exe' if run. But this feature only works for executables, so `cp -p' is used instead for these systems. DJGPP versions 2.04 and later have full support for symbolic links.

ls

The portable options are `-acdilrtu'. Current practice is for `-l' to output both owner and group, even though ancient versions of ls omitted the group.

On ancient hosts, `ls foo' sent the diagnostic `foo not found' to standard output if `foo' did not exist. Hence a shell command like `sources=`ls *.c 2>/dev/null`' did not always work, since it was equivalent to `sources='*.c not found'' in the absence of `.c' files. This is no longer a practical problem, since current ls implementations send diagnostics to standard error.

The behavior of ls on a directory that is being concurrently modified is not always predictable, because of a data race where cached information returned by readdir does not match the current directory state. In fact, MacOS 10.5 has an intermittent bug where readdir, and thus ls, sometimes lists a file more than once if other files were added or removed from the directory immediately prior to the ls call. Since ls already sorts its output, the duplicate entries can be avoided by piping the results through uniq.

mkdir

No mkdir option is portable to older systems. Instead of `mkdir -p file-name', you should use AS_MKDIR_P(file-name) (see section Programming in M4sh) or AC_PROG_MKDIR_P (see section Particular Program Checks).

Combining the `-m' and `-p' options, as in `mkdir -m go-w -p dir', often leads to trouble. FreeBSD mkdir incorrectly attempts to change the permissions of dir even if it already exists. HP-UX 11.23 and IRIX 6.5 mkdir often assign the wrong permissions to any newly-created parents of dir.

Posix does not clearly specify whether `mkdir -p foo' should succeed when `foo' is a symbolic link to an already-existing directory. The GNU Core Utilities 5.1.0 mkdir succeeds, but Solaris mkdir fails.

Traditional mkdir -p implementations suffer from race conditions. For example, if you invoke mkdir -p a/b and mkdir -p a/c at the same time, both processes might detect that `a' is missing, one might create `a', then the other might try to create `a' and fail with a File exists diagnostic. The GNU Core Utilities (`fileutils' version 4.1), FreeBSD 5.0, NetBSD 2.0.2, and OpenBSD 2.4 are known to be race-free when two processes invoke mkdir -p simultaneously, but earlier versions are vulnerable. Solaris mkdir is still vulnerable as of Solaris 10, and other traditional Unix systems are probably vulnerable too. This possible race is harmful in parallel builds when several Make rules call mkdir -p to construct directories. You may use install-sh -d as a safe replacement, provided this script is recent enough; the copy shipped with Autoconf 2.60 and Automake 1.10 is OK, but copies from older versions are vulnerable.

mkfifo
mknod

The GNU Coding Standards state that mknod is safe to use on platforms where it has been tested to exist; but it is generally portable only for creating named FIFOs, since device numbers are platform-specific. Autotest uses mkfifo to implement parallel testsuites. Posix states that behavior is unspecified when opening a named FIFO for both reading and writing; on at least Cygwin, this results in failure on any attempt to read or write to that file descriptor.

mktemp

Shell scripts can use temporary files safely with mktemp, but it does not exist on all systems. A portable way to create a safe temporary file name is to create a temporary directory with mode 700 and use a file inside this directory. Both methods prevent attackers from gaining control, though mktemp is far less likely to fail gratuitously under attack.

Here is sample code to create a new temporary directory safely:

 
# Create a temporary directory $tmp in $TMPDIR (default /tmp).
# Use mktemp if possible; otherwise fall back on mkdir,
# with $RANDOM to make collisions less likely.
: ${TMPDIR=/tmp}
{
  tmp=`
    (umask 077 && mktemp -d "$TMPDIR/fooXXXXXX") 2>/dev/null
  ` &&
  test -n "$tmp" && test -d "$tmp"
} || {
  tmp=$TMPDIR/foo$$-$RANDOM
  (umask 077 && mkdir "$tmp")
} || exit $?
mv

The only portable options are `-f' and `-i'.

Moving individual files between file systems is portable (it was in Unix version 6), but it is not always atomic: when doing `mv new existing', there's a critical section where neither the old nor the new version of `existing' actually exists.

On some systems moving files from `/tmp' can sometimes cause undesirable (but perfectly valid) warnings, even if you created these files. This is because `/tmp' belongs to a group that ordinary users are not members of, and files created in `/tmp' inherit the group of `/tmp'. When the file is copied, mv issues a diagnostic without failing:

 
$ touch /tmp/foo
$ mv /tmp/foo .
error-->mv: ./foo: set owner/group (was: 100/0): Operation not permitted
$ echo $?
0
$ ls foo
foo

This annoying behavior conforms to Posix, unfortunately.

Moving directories across mount points is not portable; use cp and rm instead.

DOS variants cannot rename or remove open files, and do not support commands like `mv foo bar >foo', even though this is perfectly portable among Posix hosts.

od

In Mac OS X 10.3, od does not support the standard Posix options `-A', `-j', `-N', or `-t', or the XSI option `-s'. The only supported Posix option is `-v', and the only supported XSI options are those in `-bcdox'. The BSD hexdump program can be used instead.

This problem no longer exists in Mac OS X 10.4.3.

rm

The `-f' and `-r' options are portable.

It is not portable to invoke rm without operands. For example, on many systems `rm -f -r' (with no other arguments) silently succeeds without doing anything, but it fails with a diagnostic on NetBSD 2.0.2.

A file might not be removed even if its parent directory is writable and searchable. Many Posix hosts cannot remove a mount point, a named stream, a working directory, or a last link to a file that is being executed.

DOS variants cannot rename or remove open files, and do not support commands like `rm foo >foo', even though this is perfectly portable among Posix hosts.

rmdir

Just as with rm, some platforms refuse to remove a working directory.

sed

Patterns should not include the separator (unless escaped), even as part of a character class. In conformance with Posix, the Cray sed rejects `s/[^/]*$//': use `s%[^/]*$%%'. Even when escaped, patterns should not include separators that are also used as sed metacharacters. For example, GNU sed 4.0.9 rejects `s,x\{1\,\},,', while sed 4.1 strips the backslash before the comma before evaluating the basic regular expression.

Avoid empty patterns within parentheses (i.e., `\(\)'). Posix does not require support for empty patterns, and Unicos 9 sed rejects them.

Unicos 9 sed loops endlessly on patterns like `.*\n.*'.

Sed scripts should not use branch labels longer than 7 characters and should not contain comments. HP-UX sed has a limit of 99 commands (not counting `:' commands) and 48 labels, which can not be circumvented by using more than one script file. It can execute up to 19 reads with the `r' command per cycle. Solaris /usr/ucb/sed rejects usages that exceed a limit of about 6000 bytes for the internal representation of commands.

Avoid redundant `;', as some sed implementations, such as NetBSD 1.4.2's, incorrectly try to interpret the second `;' as a command:

 
$ echo a | sed 's/x/x/;;s/x/x/'
sed: 1: "s/x/x/;;s/x/x/": invalid command code ;

Input should not have unreasonably long lines, since some sed implementations have an input buffer limited to 4000 bytes. Likewise, not all sed implementations can handle embedded NUL or a missing trailing newline.

Portable sed regular expressions should use `\' only to escape characters in the string `$()*.0123456789[\^n{}'. For example, alternation, `\|', is common but Posix does not require its support, so it should be avoided in portable scripts. Solaris sed does not support alternation; e.g., `sed '/a\|b/d'' deletes only lines that contain the literal string `a|b'. Similarly, `\+' and `\?' should be avoided.

Anchors (`^' and `$') inside groups are not portable.

Nested parentheses in patterns (e.g., `\(\(a*\)b*\)') are quite portable to current hosts, but were not supported by some ancient sed implementations like SVR3.

Some sed implementations, e.g., Solaris, restrict the special role of the asterisk `*' to one-character regular expressions and back-references, and the special role of interval expressions `\{m\}', `\{m,\}', or `\{m,n\}' to one-character regular expressions. This may lead to unexpected behavior:

 
$ echo '1*23*4' | /usr/bin/sed 's/\(.\)*/x/g'
x2x4
$ echo '1*23*4' | /usr/xpg4/bin/sed 's/\(.\)*/x/g'
x

The `-e' option is mostly portable. However, its argument cannot start with `a', `c', or `i', as this runs afoul of a Tru64 5.1 bug. Also, its argument cannot be empty, as this fails on AIX 5.3. Some people prefer to use `-e':

 
sed -e 'command-1' \
    -e 'command-2'

as opposed to the equivalent:

 
sed '
  command-1
  command-2
'

The following usage is sometimes equivalent:

 
sed 'command-1;command-2'

but Posix says that this use of a semicolon has undefined effect if command-1's verb is `{', `a', `b', `c', `i', `r', `t', `w', `:', or `#', so you should use semicolon only with simple scripts that do not use these verbs.

Commands inside { } brackets are further restricted. Posix says that they cannot be preceded by addresses, `!', or `;', and that each command must be followed immediately by a newline, without any intervening blanks or semicolons. The closing bracket must be alone on a line, other than white space preceding or following it.

Contrary to yet another urban legend, you may portably use `&' in the replacement part of the s command to mean "what was matched". All descendants of Unix version 7 sed (at least; we don't have first hand experience with older sed implementations) have supported it.

Posix requires that you must not have any white space between `!' and the following command. It is OK to have blanks between the address and the `!'. For instance, on Solaris:

 
$ echo "foo" | sed -n '/bar/ ! p'
error-->Unrecognized command: /bar/ ! p
$ echo "foo" | sed -n '/bar/! p'
error-->Unrecognized command: /bar/! p
$ echo "foo" | sed -n '/bar/ !p'
foo

Posix also says that you should not combine `!' and `;'. If you use `!', it is best to put it on a command that is delimited by newlines rather than `;'.

Also note that Posix requires that the `b', `t', `r', and `w' commands be followed by exactly one space before their argument. On the other hand, no white space is allowed between `:' and the subsequent label name.

If a sed script is specified on the command line and ends in an `a', `c', or `i' command, the last line of inserted text should be followed by a newline. Otherwise some sed implementations (e.g., OpenBSD 3.9) do not append a newline to the inserted text.

Many sed implementations (e.g., MacOS X 10.4, OpenBSD 3.9, Solaris 10 /usr/ucb/sed) strip leading white space from the text of `a', `c', and `i' commands. Prepend a backslash to work around this incompatibility with Posix:

 
$ echo flushleft | sed 'a\
>    indented
> '
flushleft
indented
$ echo foo | sed 'a\
> \   indented
> '
foo
   indented

Posix requires that with an empty regular expression, the last non-empty regular expression from either an address specification or substitution command is applied. However, busybox 1.6.1 complains when using a substitution command with a replacement containing a back-reference to an empty regular expression; the workaround is repeating the regular expression.

 
$ echo abc | busybox sed '/a\(b\)c/ s//\1/'
sed: No previous regexp.
$ echo abc | busybox sed '/a\(b\)c/ s/a\(b\)c/\1/'
b
sed (`t')

Some old systems have a sed that "forgets" to reset its `t' flag when starting a new cycle. For instance on MIPS RISC/OS, and on IRIX 5.3, if you run the following sed script (the `# a' through `# d' and `# 1' through `# 4' markers are annotations, not part of the actual text):

 
s/keep me/kept/g  # a
t end             # b
s/.*/deleted/g    # c
:end              # d

on

 
delete me         # 1
delete me         # 2
keep me           # 3
delete me         # 4

you get

 
deleted
delete me
kept
deleted

instead of

 
deleted
deleted
kept
deleted

Why? When processing line 1, (c) matches, therefore sets the `t' flag, and the output is produced. When processing line 2, the `t' flag is still set (this is the bug). Command (a) fails to match, but sed is not supposed to clear the `t' flag when a substitution fails. Command (b) sees that the flag is set, therefore it clears it, and jumps to (d), hence you get `delete me' instead of `deleted'. When processing line 3, `t' is clear, (a) matches, so the flag is set, hence (b) clears the flag and jumps. Finally, since the flag is clear, line 4 is processed properly.

There are two things one should remember about `t' in sed. Firstly, always remember that `t' jumps if some substitution succeeded, not only the immediately preceding substitution. Therefore, always use a fake `t clear' followed by a `:clear' on the next line, to reset the `t' flag where needed.

Secondly, you cannot rely on sed to clear the flag at each new cycle.

One portable implementation of the script above is:

 
t clear
:clear
s/keep me/kept/g
t end
s/.*/deleted/g
:end
sleep

Using sleep is generally portable. However, remember that adding a sleep to work around timestamp issues, with a minimum granularity of one second, doesn't scale well for parallel builds on modern machines with sub-second process completion.

sort

Remember that sort order is influenced by the current locale. Inside `configure', the C locale is in effect, but in Makefile snippets, you may need to specify LC_ALL=C sort.
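
For example (the file names are illustrative):

 
LC_ALL=C sort words.txt > sorted.txt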

tar

There are multiple file formats for tar; if you use Automake, the macro AM_INIT_AUTOMAKE has some options controlling which level of portability to use.

touch

If you specify the desired timestamp (e.g., with the `-r' option), touch typically uses the utime or utimes system call, which can result in the same kind of timestamp truncation problems that `cp -p' has.

On ancient BSD systems, touch or any command that results in an empty file does not update the timestamps, so use a command like echo as a workaround. Also, GNU touch 3.16r (and presumably all before that) fails to work on SunOS 4.1.3 when the empty file is on an NFS-mounted 4.2 volume. However, these problems are no longer of practical concern.

tr

Not all versions of tr handle all backslash character escapes. For example, Solaris 10 /usr/ucb/tr falls over, even though Solaris contains a more modern tr in other locations. Therefore, when using tr to delete newlines or carriage returns, it is more portable to use octal escapes, even though this ties the result to ASCII.

 
$ { echo moon; echo light; } | /usr/ucb/tr -d '\n' ; echo
moo
light
$ { echo moon; echo light; } | /usr/bin/tr -d '\n' ; echo
moonlight
$ { echo moon; echo light; } | /usr/ucb/tr -d '\012' ; echo
moonlight

Not all versions of tr recognize ranges of characters: at least Solaris /usr/bin/tr still fails to do so. But you can use /usr/xpg4/bin/tr instead.

 
$ echo "Hazy Fantazy" | LC_ALL=C /usr/bin/tr a-z A-Z
HAZy FAntAZy
$ echo "Hazy Fantazy" | LC_ALL=C /usr/xpg4/bin/tr a-z A-Z
HAZY FANTAZY

Posix requires tr to operate on binary files. But at least Solaris /usr/ucb/tr and /usr/bin/tr still fail to handle `\0' as the octal escape for NUL; these programs always discard all NUL bytes from the input. On Solaris, when using tr to process a binary file that may contain NUL bytes, it is necessary to use /usr/xpg4/bin/tr instead, or /usr/xpg6/bin/tr if that is available.

 
$ printf 'a\0b\n' | /usr/ucb/tr '\0' '~' | wc -c
3
$ printf 'a\0b\n' | /usr/xpg4/bin/tr '\0' '~' | wc -c
4
$ printf 'a\0b\n' | /usr/ucb/tr x x | wc -c
3
$ printf 'a\0b\n' | /usr/xpg4/bin/tr x x | wc -c
4


This document was generated on January 20, 2010 using texi2html 1.76.