Let's Build a Command Line Shell in C

15th June 2017

Let's build a command line shell in C. It won't be as useful as any of the real shells out there, so we'll call it Kash, the Kinda Aimless Shell.[1] Hopefully we'll learn something along the way.

Who Needs Kash?

A shell is a program that acts as an interface to an operating system. Shells can be graphical, like the Windows desktop, or command-line based, like Bash, the Bourne Again Shell that ships with most Linux distributions and with macOS.

Real-life shells are messy and complicated, but building a simple shell is a common assignment for computer science students and a worthwhile project for anyone who spends time staring at a command prompt. Taking a peek behind the curtain strips much of the mystery from the arcane spells and incantations of the command line. At the same time, it can leave us with a newfound appreciation for the work that went into building the (miraculously free) shells we use every day.

Petty Kash

Kash will be a ridiculously simple shell. It won't support scripting or quoting or globbing or piping or redirection or most of the other features we'd need in a real shell to be productive. It will support process execution and a handful of builtin commands.

You can find the finished code for Kash on GitHub. It includes some extra error checking that I've omitted below for clarity.

If you want to follow along step by step, note that we'll be starting at the bottom of the kash.c file and working our way upwards, function by function: each new function goes above the functions that call it, so that it's declared before it's used. I'll add #include statements as we need them and indicate when the code should compile.

Step 1 — The Loop

A shell is a command processor. Our shell needs to read in commands from the user, interpret them, and act on them — and it needs to keep on doing so, command after command after command.

Sounds like a job for an infinite loop.

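#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    while (true) {
        // The prompt doesn't end in a newline, so flush stdout to make
        // sure it appears before we block waiting for input.
        printf("> ");
        fflush(stdout);

        // Read a line and split it into whitespace-separated tokens.
        char *line = kash_read_line();
        char **tokens = kash_split_line(line);

        // Skip blank lines; otherwise run the command.
        if (tokens[0] != NULL) {
            kash_exec(tokens);
        }

        // Both the token array and the line buffer live on the heap.
        free(tokens);
        free(line);
    }
}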

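Inside this loop we print a prompt, read a line of input, split it into tokens, execute the command if the line wasn't blank, and finally free the memory we allocated before going round again.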

This code won't compile yet as we're missing three function definitions. We'll add two of them in the next step.

Step 2 — Parsing Input

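We use the POSIX getline() function, declared in <stdio.h>, to read a single line of input from stdin. Because we pass it a NULL pointer and a zero length, getline() allocates a buffer big enough for the whole line (including the trailing newline) and hands it back to us.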

char* kash_read_line() {
    char *line = NULL;
    size_t buflen = 0;
    getline(&line, &buflen, stdin);
    return line;
}

Yup, getline() isn't exactly an advertisement for the beauty and clarity of C, but it's what we've got.
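
One thing the simple version glosses over: getline() returns -1 when it hits end-of-file (for example, when you press Ctrl-D at an empty prompt) or a read error, and in that case the buffer doesn't hold a usable line. Here's a minimal sketch of a more defensive variant, assuming we'd rather return NULL and let the caller decide what to do. The name read_line_or_null() and its FILE * parameter are hypothetical, not part of Kash.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

// Hypothetical variant of kash_read_line(), not part of Kash: reads one
// line from `in` and returns it, or returns NULL on end-of-file or a
// read error instead of handing back a buffer with unusable contents.
char *read_line_or_null(FILE *in) {
    char *line = NULL;
    size_t buflen = 0;

    if (getline(&line, &buflen, in) == -1) {
        // getline() may have allocated a buffer even though it read
        // nothing, so free it before reporting failure.
        free(line);
        return NULL;
    }
    return line;
}
```

In main(), you could then leave the loop when it returns NULL; reading from stdin, that would let Ctrl-D at an empty prompt exit the shell, as it does in Bash.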

We'll need to free the memory allocated by getline() before we loop to the next command, or we'll end up with a memory leak. First, though, we need to split the input string into an array of individual tokens.

#include <string.h>

char** kash_split_line(char *line) {
    int length = 0;
    int capacity = 16;
    char **tokens = malloc(capacity * sizeof(char*));

    char *delimiters = " \t\r\n";
    char *token = strtok(line, delimiters);

    while (token != NULL) {
        tokens[length] = token;
        length++;

        // Grow the array as soon as it's full, so there's always a free
        // slot left for the terminating NULL.
        if (length >= capacity) {
            capacity = (int) (capacity * 1.5);
            tokens = realloc(tokens, capacity * sizeof(char*));
        }

        token = strtok(NULL, delimiters);
    }

    tokens[length] = NULL;
    return tokens;
}

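We use the strtok() function from <string.h> to split the string on whitespace (spaces, tabs, carriage returns, and newlines). Leading and trailing whitespace is ignored, and consecutive whitespace characters are treated as a single delimiter. Note that strtok() works in place: it overwrites the delimiter that ends each token with a '\0' and returns pointers into line itself, so no strings are copied.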

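The return value of kash_split_line() is a NULL-terminated array of string pointers (i.e. the final value in the array is always NULL), which is exactly the argument-list format that execvp() will expect in Step 3. As with kash_read_line(), we'll need to free the memory occupied by this array when we're finished with it. We free only the array, not the individual tokens, because the tokens point into the buffer returned by kash_read_line().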
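
To make strtok()'s in-place behaviour concrete, here's a minimal fixed-capacity sketch. split_fixed() is a hypothetical helper, not part of Kash: it stores the tokens in a caller-supplied array instead of growing one with realloc(), and silently drops any tokens beyond that array's capacity.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical fixed-capacity variant of kash_split_line(), not part of
// Kash. Splits `line` in place on whitespace and stores at most
// max_tokens - 1 token pointers in `tokens`, followed by a NULL.
// Extra tokens are silently dropped. Returns the number of tokens stored.
size_t split_fixed(char *line, char **tokens, size_t max_tokens) {
    const char *delimiters = " \t\r\n";
    size_t count = 0;

    if (max_tokens == 0) {
        return 0;  // no room even for the terminating NULL
    }

    // strtok() overwrites the delimiter that ends each token with '\0'
    // and returns a pointer into `line` itself; nothing is copied.
    char *token = strtok(line, delimiters);
    while (token != NULL && count < max_tokens - 1) {
        tokens[count++] = token;
        token = strtok(NULL, delimiters);
    }

    tokens[count] = NULL;  // execvp() expects a NULL-terminated argv
    return count;
}
```

Given the buffer "  ls   -l\t/tmp\n", it returns 3 and stores the pointers buf + 2, buf + 7, and buf + 10, followed by NULL; the first delimiter after each token has been overwritten with '\0'.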

This code still won't compile as it's missing a function definition for kash_exec(). We'll fix that in the next step.

Step 3 — Executing Commands

Finally we're ready to execute a command, which for now means launching a new process.

If you haven't met it before, the procedure for launching a new process on a Unix system is a little odd. We use the fork() system call to clone our initial process, leaving us with two nearly identical processes, a parent and a child. We then use one of the exec() family of functions in the child process to replace the running program with the program we actually want to run.[2]

#include <sys/wait.h>
#include <unistd.h>

void kash_exec(char **args) {
    pid_t child_pid = fork();

    if (child_pid == 0) {
        // Child process.
        execvp(args[0], args);
        // execvp() only returns if it failed.
        perror("kash");
        exit(1);
    } else if (child_pid > 0) {
        // Parent process.
        int status;
        do {
            waitpid(child_pid, &status, WUNTRACED);
        } while (!WIFEXITED(status) && !WIFSIGNALED(status));
    } else {
        // fork() failed.
        perror("kash");
    }
}

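In the child process, fork() returns 0, so we call execvp(). If args[0] doesn't contain a slash, execvp() searches the directories listed in PATH for a matching program, then replaces the child's running program with it, passing args as its argument list.[3] A successful execvp() never returns, so if we get past that line something went wrong: we print an error and exit the child. In the parent, fork() returns the child's process ID, and we call waitpid() to block until the child has either exited or been killed by a signal.[4] Because we pass WUNTRACED, waitpid() also returns if the child is merely stopped, in which case the loop simply waits again. If fork() returns a negative value, no child was created and we just print an error.

That's it, our code should now compile! If you've been following along you can type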

$ cc -o kash kash.c

to compile the code and

$ ./kash

to run it. Try typing in some simple commands like pwd, ls /, and echo foobar — they should all produce the expected output.

Okay, slight problem... Now that we've started it, we have no way of exiting the shell! We're trapped in the infinite loop in our main() function.

For now, hit Ctrl-C to kill the program. We'll fix this issue in the next step when we add support for builtins.
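
Kash throws the child's status away once waitpid() reports that it has terminated. If you're curious what the status macros are telling us, here's a standalone sketch that decodes it. run_and_get_status() is a hypothetical helper, and returning 127 for a failed exec and 128 plus the signal number for a killed child follows the conventions Bash uses for $?; Kash itself does neither.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper, not part of Kash: runs args[0] the same way
// kash_exec() does, but returns a status code instead of discarding it.
//   - normal exit:              the child's exit status (0-255)
//   - killed by a signal:       128 + the signal number
//   - fork()/waitpid() failure: -1
int run_and_get_status(char **args) {
    pid_t child_pid = fork();
    if (child_pid < 0) {
        perror("fork");
        return -1;
    }

    if (child_pid == 0) {
        execvp(args[0], args);
        perror("execvp");  // only reached if execvp() failed
        // 127 is the shell convention for "command not found"; we use it
        // for every exec failure here to keep things simple.
        _exit(127);
    }

    int status;
    do {
        // WUNTRACED also reports a stopped child; in that case keep waiting.
        if (waitpid(child_pid, &status, WUNTRACED) == -1) {
            perror("waitpid");
            return -1;
        }
    } while (!WIFEXITED(status) && !WIFSIGNALED(status));

    if (WIFEXITED(status)) {
        return WEXITSTATUS(status);
    }
    return 128 + WTERMSIG(status);
}
```

Running {"sh", "-c", "exit 3", NULL} through it, for example, returns 3. The child calls _exit() rather than exit() after a failed exec so that it doesn't flush any stdio buffers it inherited from the parent.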

Step 4 — Supporting Builtins

We'll add support for three builtin commands: exit, cd, and help. Each command will be handled by a dedicated function. Let's start with exit.

void kash_exit(char **args) {
    exit(0);
}

Each command function will have the same signature, accepting our array of string pointers as its single argument and returning void.

Our cd command to change the working directory will have a little more work to do.

void kash_cd(char **args) {
    if (args[1] == NULL) {
        fprintf(stderr, "kash: cd: missing argument\n");
    } else {
        if (chdir(args[1]) != 0) {
            perror("kash: cd");
        }
    }
}

Note that we couldn't have implemented cd as an external program because the working directory is a property of the shell process itself. An external program would simply have changed its own working directory and then exited, leaving the working directory of the shell untouched.
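
We can check this directly with a small experiment. The hypothetical helper below (not part of Kash) forks a child that calls chdir() and then exits, and the parent compares its own working directory before and after. The 4096-byte path buffers are an arbitrary assumption; getcwd() simply fails if the path is longer.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical demonstration, not part of Kash. Returns true if a child
// process successfully changed its own working directory to `dir` while
// the parent's working directory stayed exactly the same.
bool child_chdir_leaves_parent_unchanged(const char *dir) {
    char before[4096], after[4096];  // arbitrary size; getcwd() fails if too small
    if (getcwd(before, sizeof before) == NULL) {
        return false;
    }

    pid_t pid = fork();
    if (pid < 0) {
        return false;
    }
    if (pid == 0) {
        // This is all an external "cd" program could ever do: change the
        // working directory of its own process, then exit.
        _exit(chdir(dir) == 0 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)
            || WEXITSTATUS(status) != 0) {
        return false;  // the child's chdir() failed, or waiting failed
    }

    if (getcwd(after, sizeof after) == NULL) {
        return false;
    }
    return strcmp(before, after) == 0;
}
```

Calling child_chdir_leaves_parent_unchanged("/") returns true: the child really did change directory, but only its own.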

The help command will simply print a list of available builtins.

void kash_help(char **args) {
    char *helptext =
        "Kash - the Kinda Aimless Shell. "
        "The following commands are available:\n"
        "  cd       Change the working directory.\n"
        "  exit     Exit the shell.\n"
        "  help     Print this help text.\n";
    printf("%s", helptext);
}

We need some way of registering the association between a command name and its handler function — I'm going to do this using an array of structs.

struct builtin {
    char *name;
    void (*func)(char **args);
};

struct builtin builtins[] = {
    {"help", kash_help},
    {"exit", kash_exit},
    {"cd", kash_cd},
};

int kash_num_builtins() {
    return sizeof(builtins) / sizeof(struct builtin);
}
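
Here's the same table-driven dispatch as a self-contained sketch, with stand-in handlers that just record which one ran. Two small differences from Kash, both our own choices rather than the original design: the element count uses sizeof(builtins[0]), which keeps working if the struct type is ever renamed, and the lookup returns a bool so the caller knows whether to fall back to fork() and execvp().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Stand-in handlers for this sketch: each just records that it ran.
static const char *last_called = NULL;

static void demo_help(char **args) { (void) args; last_called = "help"; }
static void demo_cd(char **args) { (void) args; last_called = "cd"; }

struct builtin {
    const char *name;
    void (*func)(char **args);
};

static const struct builtin builtins[] = {
    {"help", demo_help},
    {"cd", demo_cd},
};

// Dividing by the size of one element keeps working even if the
// struct type is renamed.
#define NUM_BUILTINS (sizeof(builtins) / sizeof(builtins[0]))

// Runs args[0] as a builtin and returns true if it names one; returns
// false otherwise, in which case the caller would fork() and execvp().
bool run_builtin(char **args) {
    for (size_t i = 0; i < NUM_BUILTINS; i++) {
        if (strcmp(args[0], builtins[i].name) == 0) {
            builtins[i].func(args);
            return true;
        }
    }
    return false;
}
```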

We're almost done. We just need to add a loop to the top of our kash_exec() function to check for a builtin command before launching an external process.

void kash_exec(char **args) {
    // Builtins run inside the shell process itself, with no fork().
    for (int i = 0; i < kash_num_builtins(); i++) {
        if (strcmp(args[0], builtins[i].name) == 0) {
            builtins[i].func(args);
            return;
        }
    }

    pid_t child_pid = fork();

    if (child_pid == 0) {
        execvp(args[0], args);
        perror("kash");
        exit(1);
    } else if (child_pid > 0) {
        int status;
        do {
            waitpid(child_pid, &status, WUNTRACED);
        } while (!WIFEXITED(status) && !WIFSIGNALED(status));
    } else {
        perror("kash");
    }
}

That's it, we have a working shell! We can launch external processes, change the working directory, print our help text, and exit cleanly on demand.

A Kashless Society?

Kash is cute but I don't think the shell magnates of the Unix world have much to worry about just yet. We could continue adding features like quoting and globbing and redirection to make it a more practical tool, but I'm not going to, at least for now. Enough reinventing the wheel for one day.

Notes

1

Don't like the name? It could have been worse — I did briefly consider Shelly.

2

Either there was a very good reason for doing things this way in the early 1970s or the Bell Labs guys have been fucking with us for the past 50 years.

3

Run man 3 exec for more detail on the exec() family of functions.

4

Run man 2 waitpid for more detail on the waitpid() function.