Bash practices - Part 1: Input validation and local variables

Matthias Noback

March 6, 2017

Forgive me for my bad pun. As I mentioned in my previous Bash post, I’m going to show you some ways in which you can improve the design of Bash scripts. Again, it’s a weird language, and a lot of what’s below probably won’t feel natural to you. Anyway, here we go.

I started out with a piece of code that looked like this:

BUILD_DIR="build"

function clean_up() {
    rm -r "$BUILD_DIR"
}

clean_up

Function arguments

Inside a function you can use all global and environment variables, which easily leads to smelly code like the example above: clean_up behaves differently depending on what’s in the global variable BUILD_DIR. This makes the function unpredictable and error-prone, since BUILD_DIR may at some point not contain the name of a directory, or may even be an empty string. Usually we would fix this by providing the path to the directory we’d like to remove as an argument of the function call, like this:

function clean_up() {
    rm -r "$1"
}

clean_up "$BUILD_DIR"

You may recognize this $1 syntax from previous Bash encounters: the variables $1, $2, and so on are the arguments that you provide when you run the script at the command line. Likewise, inside a Bash function, $1, $2, and so on represent the arguments that the caller provided (by the way, I really like this symmetry between calling functions and running programs).
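To make the positional parameters a bit more tangible, here is a minimal sketch. The function name show_args is made up for this illustration; $# (the number of arguments) is standard Bash.

```shell
#!/usr/bin/env bash

# Hypothetical helper, for illustration only: reports what it received.
function show_args() {
    echo "count: $#"    # number of arguments passed to this call
    echo "first: $1"    # first argument of this call, not of the script
}

show_args "build" "dist"

# Calling it without arguments is not an error: $1 is simply empty.
show_args
```

Inside show_args, $1 and $# describe the function call, regardless of which arguments the script itself was started with.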

Passing the directory as an argument (although not a named, nor a typed argument) is good practice. It makes the function reusable. And equally important: predictable. Its behavior won’t be influenced by changes in global variables.

Input validation

The only problem so far is that the clean_up function doesn’t perform any input validation at all. You can even call this function without any argument and Bash won’t warn you about it: $1 will simply be an empty string…

In order to function correctly, the following pre-conditions need to be met:

Argument 1 needs to be provided.
It should represent the path to an existing directory.

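We can cover both with a single -d test: a missing or empty argument is not an existing directory either, so the test fails in that case too. However, Bash has no exceptions, so we can’t throw one if the directory doesn’t exist. The best thing we can do is follow Unix conventions: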

print an error message to stderr.
exit with a non-zero exit code.

That way, the process running our script knows that it encountered a problem. We print the error message to stderr so that other processes won’t mistake it for regular output in case the script is part of a longer chain of commands (e.g. command-a | command-b > output.txt).

function clean_up() {
    if [[ ! -d "$1" ]]; then
        echo "Argument 1 should be the path of an existing directory" 1>&2
        exit 1
    fi

    rm -r "$1"
}

Note that we echo and exit where we would normally like to throw an exception. Keep in mind that exit ends the entire script, not just the function. echo prints to stdout (file descriptor 1), but we’d like to print to stderr (file descriptor 2). We accomplish this by redirecting file descriptor 1 to file descriptor 2: 1>&2
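The difference between the two streams is easy to see once something captures the output. The following is only a sketch: the report function is hypothetical and exists just to write one line to each stream.

```shell
#!/usr/bin/env bash

# Hypothetical function, for illustration only: writes one line to each stream.
function report() {
    echo "regular output"              # stdout (file descriptor 1)
    echo "something went wrong" 1>&2   # stderr (file descriptor 2)
}

# Command substitution captures stdout only, just like a pipe would.
# The error message bypasses the capture and still reaches the terminal.
captured="$(report)"

echo "captured: $captured"
```

Only the regular output ends up in the captured variable, which is exactly why error messages belong on stderr.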

This starts to look like a reasonable function. However, $1 is still a bad variable name. It doesn’t explain what it represents. We’d rather call it directory. We can easily do so of course:

directory="$1"

if [[ ! -d "$directory" ]]; then
    #...
fi

rm -r "$directory"

Local, named variables

That’s much better already! However, variables are global by default. This means that once we set a variable inside a function, it will also be available outside that function:

function clean_up() {
    directory="$1"

    # ...
}

clean_up "build"

echo "$directory"

Bash has a way to mark variables as “local to the current scope”: put the local keyword in front of the variable name, like this: local directory="$1". However, I recommend using declare. Inside a function it makes the variable local too, and it has many more options, even allowing some rudimentary typing. Let’s use the -r option in this case to mark the variable as read-only (PHP could also benefit from such an option by the way).

function clean_up() {
    declare -r directory="$1"

    #...
}

clean_up "build"

# This will show an empty string:
echo "$directory"
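To compare the three ways of assigning a variable side by side, here is a small sketch. The function and variable names are invented for this example. The ${name-<unset>} expansion prints <unset> when the variable doesn’t exist in the current scope.

```shell
#!/usr/bin/env bash

function set_global() {
    leaked="visible outside"       # plain assignment: the variable is global
}

function set_local() {
    local scoped="only inside"     # local keyword: gone when the function returns
    declare -r also_scoped="$1"    # declare inside a function is local as well
    echo "inside: $scoped, $also_scoped"
}

set_global
set_local "build"

echo "leaked: ${leaked-<unset>}"
echo "scoped: ${scoped-<unset>}"
echo "also_scoped: ${also_scoped-<unset>}"
```

After both calls, only leaked still has a value. The other two variables no longer exist once set_local has returned.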

A handy debugging tip: call declare -p without arguments to print all the variables (including environment variables) that have been declared at that point in the script.
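declare -p also accepts variable names, which keeps the output manageable. A minimal sketch (this variant of clean_up only inspects its variable and deletes nothing):

```shell
#!/usr/bin/env bash

# Illustration only: this version of clean_up does not remove anything.
function clean_up() {
    declare -r directory="$1"

    # Print the declaration of one variable, including its attributes.
    declare -p directory
}

clean_up "build"
```

The output is the declaration itself, including the -r attribute, so you can see at a glance that the variable is read-only.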

For completeness’ sake, this is the full code of the final solution:

#!/usr/bin/env bash

function clean_up() {
    declare -r directory="$1"

    if [[ ! -d "$directory" ]]; then
        echo "Argument 1 should be the path of an existing directory" 1>&2
        exit 1
    fi

    rm -r "$directory"
}

clean_up "build"
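To see the Unix conventions from the caller’s side, here is a sketch that calls the function once with an existing directory and once without an argument. It assumes mktemp is available to create a throwaway directory. The parentheses run each call in a subshell, so the exit inside clean_up only ends that subshell.

```shell
#!/usr/bin/env bash

function clean_up() {
    declare -r directory="$1"

    if [[ ! -d "$directory" ]]; then
        echo "Argument 1 should be the path of an existing directory" 1>&2
        exit 1
    fi

    rm -r "$directory"
}

# Success: a temporary directory, created just for this demo, gets removed.
tmp_dir="$(mktemp -d)"
status=0
( clean_up "$tmp_dir" ) || status=$?
echo "existing directory: exit code $status"

# Failure: no argument at all. The error message goes to stderr,
# which is discarded here so that only the exit code is shown.
status=0
( clean_up ) 2>/dev/null || status=$?
echo "missing argument: exit code $status"
```

The first call reports exit code 0, the second one exit code 1, which is all a calling process needs to know to decide whether it can continue.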

Conclusion

In this article we’ve improved the design of the clean_up function. This is what you’d call a “command” function: it does something, it has side effects, and it may either succeed or fail, providing no particular return value. In another article I’ll show you a query function that needs fixing.
