
WOMM-Sh: A Minimal Unix Shell Built From Scratch

C | Process API & Shell Internals

mysh is a minimal Unix shell written in C, built as a hands-on exploration of the Process API chapter in OSTEP (Operating Systems: Three Easy Pieces). It reads commands from standard input, parses them into tokens, runs built-ins (cd, exit) in the parent process, and launches external programs via fork / execvp with optional stdout redirection (>).


Project Overview

mysh implements a classic read–eval–print loop: print mysh>, read a line with getline(3), parse into an argv-style token array, execute via built-ins or fork + execvp, and repeat until exit or EOF. The design is intentionally small — no job control, pipes, background jobs, or custom line editing.


Features

  • Interactive REPL with mysh> prompt
  • Built-in commands: cd, exit (run in the parent so cd affects the shell)
  • External programs via fork + execvp
  • Output redirection: command > file (the child creates or truncates the file and redirects stdout to it before execvp)

Requirements: GCC (or another C compiler) and a POSIX environment (Linux, macOS, or WSL on Windows).


Quick Start

make
./mysh

Example session:

mysh> pwd
/home/user/project
mysh> ls -l > listing.txt
mysh> cd src
mysh> exit

Architecture

flowchart TB
    subgraph repl["main.c — REPL"]
        A[getline stdin]
        B[split_line]
        C[execute_command]
        A --> B --> C
    end

    subgraph parse["parser.c"]
        P[strtok tokenization]
    end

    subgraph exec["executor.c"]
        D{built-in?}
        E[builtins.c]
        F[fork]
        G["child: redirect to file"]
        H[execvp]
        I[parent: waitpid]
        D -->|yes| E
        D -->|no| F
        F --> G --> H
        F --> I
    end

    B --> P
    C --> D

| Module | Files | Responsibility |
| --- | --- | --- |
| Entry / REPL | main.c | Prompt, input loop, call parser + executor |
| Parser | parser.c, parser.h | Split input into a NULL-terminated token array |
| Executor | executor.c, executor.h | Dispatch built-ins vs external commands |
| Built-ins | builtins.c, builtins.h | cd, exit in the parent process |

Project layout

WOMM-Sh/
├── Makefile
├── README.md
├── docs/
│   └── TECHNICAL.md
└── src/
    ├── main.c          # REPL loop
    ├── parser.c/h      # Tokenization
    ├── executor.c/h    # Built-ins, fork/exec, redirection
    └── builtins.c/h    # cd, exit

Module Highlights

Parser (split_line)

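Uses strtok(3) with the delimiter set " \t\r\n\a" (space, tab, carriage return, newline, and bell). Tokens are pointers into the input line, which strtok mutates in place, so there are no per-token copies. The pointer array starts at TOKEN_BUFSIZE (64) entries and grows with realloc when the token count exceeds that.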
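
A minimal sketch of a tokenizer along these lines is shown below. It is not the project's actual source: the TOKEN_DELIM macro name, the grow-by-TOKEN_BUFSIZE policy, and the exit-on-allocation-failure handling are assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOKEN_BUFSIZE 64
#define TOKEN_DELIM " \t\r\n\a"   /* assumed name for the delimiter set */

/* Split `line` in place and return a NULL-terminated array of pointers
 * into it. The caller frees the array only, never the individual tokens. */
char **split_line(char *line)
{
    size_t capacity = TOKEN_BUFSIZE;
    size_t count = 0;
    char **tokens = malloc(capacity * sizeof *tokens);
    if (tokens == NULL) {
        perror("mysh: malloc");
        exit(EXIT_FAILURE);
    }

    for (char *tok = strtok(line, TOKEN_DELIM); tok != NULL;
         tok = strtok(NULL, TOKEN_DELIM)) {
        tokens[count++] = tok;
        /* Grow early so one slot is always free for the final NULL. */
        if (count >= capacity) {
            capacity += TOKEN_BUFSIZE;   /* assumed growth policy */
            char **grown = realloc(tokens, capacity * sizeof *tokens);
            if (grown == NULL) {
                free(tokens);
                perror("mysh: realloc");
                exit(EXIT_FAILURE);
            }
            tokens = grown;
        }
    }
    tokens[count] = NULL;   /* argv-style terminator expected by execvp */
    return tokens;
}
```

Because the tokens alias the input buffer, the line must stay alive (and unmodified) until the command has finished executing.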

Built-ins

Registered via parallel name/function arrays:

char *builtin_str[] = { "cd", "exit" };
int (*builtin_func[])(char **) = { &builtin_cd, &builtin_exit };

| Command | Behavior | Return |
| --- | --- | --- |
| cd [dir] | chdir(args[1]) in parent; errors if no argument | 1 |
| exit | Stops the shell | 0 |
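
One way the parallel arrays could be used for dispatch is sketched here. The builtin_str and builtin_func tables come from the snippet above; the try_builtin and num_builtins helpers, the error messages, and the -1 "not a built-in" convention are hypothetical and only illustrate the lookup.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int builtin_cd(char **args);
int builtin_exit(char **args);

char *builtin_str[] = { "cd", "exit" };
int (*builtin_func[])(char **) = { &builtin_cd, &builtin_exit };

/* Number of registered built-ins, derived from the name table. */
static int num_builtins(void)
{
    return (int)(sizeof builtin_str / sizeof builtin_str[0]);
}

/* cd: runs in the shell process itself; returns 1 so the REPL continues. */
int builtin_cd(char **args)
{
    if (args[1] == NULL) {
        fprintf(stderr, "mysh: cd: missing argument\n");
    } else if (chdir(args[1]) != 0) {
        perror("mysh: cd");
    }
    return 1;
}

/* exit: returns 0, which the REPL treats as "stop". */
int builtin_exit(char **args)
{
    (void)args;   /* unused */
    return 0;
}

/* Hypothetical helper: returns the built-in's status, or -1 when args[0]
 * is not a built-in and should be launched as an external program. */
int try_builtin(char **args)
{
    for (int i = 0; i < num_builtins(); i++) {
        if (strcmp(args[0], builtin_str[i]) == 0) {
            return builtin_func[i](args);
        }
    }
    return -1;
}
```

Adding a new built-in in this scheme means appending one name and one function pointer at the same index in both arrays.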

Executor

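execute_command(char **args) checks args[0] against the built-in table first. If there is no match it calls fork(): the child handles any > redirection (truncating argv at the > token) and then calls execvp, while the parent blocks in waitpid until the child exits.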
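
The fork / execvp / waitpid path can be sketched as below. The launch_external name is hypothetical, and unlike execute_command (which returns 1 to keep the REPL running) this helper returns the child's exit status so the behavior is easy to observe. The 127 exit code for a failed exec follows common shell convention and is an assumption here.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run args[0] as an external program and wait for it.
 * Returns the child's exit status, or -1 if it could not be obtained. */
int launch_external(char **args)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("mysh: fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: execvp replaces the process image and only returns
         * on failure (for example, command not found). */
        execvp(args[0], args);
        perror("mysh");
        _exit(127);   /* _exit avoids flushing the parent's stdio buffers */
    }

    /* Parent: block until this specific child terminates. */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("mysh: waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```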


Execution Model

REPL sequence

sequenceDiagram
    participant User
    participant Main as main.c
    participant Parser as parser.c
    participant Exec as executor.c

    loop while status != 0
        Main->>User: printf "mysh> "
        User->>Main: getline
        alt EOF
            Main->>Main: break
        else line
            Main->>Parser: split_line(line)
            Parser-->>Main: token array
            Main->>Exec: execute_command(args)
            alt builtin exit
                Exec-->>Main: 0
            else builtin other / external
                Exec-->>Main: 1
            end
            Main->>Main: free(args)
        end
    end
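
The loop in this diagram can be sketched roughly as follows. To keep the example self-contained, the split_line + execute_command step is replaced by a hypothetical line_handler callback, and stop_on_exit is a stand-in handler; neither name comes from the project.

```c
#define _POSIX_C_SOURCE 200809L   /* exposes getline on strict compilers */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback standing in for split_line + execute_command:
 * returns 0 to stop the loop, nonzero to continue. */
typedef int (*line_handler)(char *line);

/* Stand-in handler: stops when the line starts with "exit". */
int stop_on_exit(char *line)
{
    return strncmp(line, "exit", 4) != 0;
}

/* Prompt on `out`, read lines from `in` until EOF or until `handle`
 * returns 0. Returns the number of lines handled. */
int repl(FILE *in, FILE *out, line_handler handle)
{
    char *line = NULL;   /* getline allocates and grows this buffer */
    size_t cap = 0;
    int status = 1;
    int handled = 0;

    while (status != 0) {
        fputs("mysh> ", out);
        fflush(out);     /* the prompt has no newline, so flush explicitly */
        if (getline(&line, &cap, in) == -1) {
            break;       /* EOF or read error */
        }
        status = handle(line);
        handled++;
    }
    free(line);          /* one buffer is reused across iterations */
    return handled;
}
```

In an interactive shell this would be called as repl(stdin, stdout, handler), where the handler tokenizes the line, executes it, and frees the token array.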

External command path

sequenceDiagram
    participant Parent as parent shell
    participant Child as child
    participant OS as kernel

    Parent->>OS: fork()
    OS-->>Child: pid == 0
    OS-->>Parent: pid > 0

    Child->>Child: scan args for ">"
    opt redirection found
        Child->>OS: open + dup2 to STDOUT
        Child->>Child: truncate argv at redirect
    end
    Child->>OS: execvp(program, args)

    Parent->>OS: waitpid(child)

I/O Redirection

Supported syntax (stdout only, one redirect per command):

program [arg ...] > filename

The child scans argv for >, opens the target file with O_WRONLY | O_CREAT | O_TRUNC, duplicates the descriptor onto STDOUT_FILENO with dup2, then NULL-terminates argv at the > token before calling execvp. Example: for ls -l > out.txt, execvp sees ["ls", "-l", NULL].
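
The redirection steps can be sketched as two small helpers. Both function names are hypothetical, and the 0644 file mode is an assumption, since the text above only names the open flags.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: find the first ">" in args, cut argv there, and
 * return the filename that follows it. Returns NULL if there is no
 * redirect or the filename is missing. */
char *split_redirect(char **args)
{
    for (int i = 0; args[i] != NULL; i++) {
        if (strcmp(args[i], ">") == 0) {
            char *path = args[i + 1];   /* NULL for input like "ls >" */
            args[i] = NULL;             /* execvp stops reading argv here */
            return path;
        }
    }
    return NULL;
}

/* Point stdout at `path`. Intended to run in the child before execvp.
 * Returns 0 on success, -1 on failure. */
int redirect_stdout(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644); /* mode assumed */
    if (fd < 0) {
        perror("mysh: open");
        return -1;
    }
    int rc = dup2(fd, STDOUT_FILENO);
    if (rc < 0) {
        perror("mysh: dup2");
    }
    if (fd != STDOUT_FILENO) {
        close(fd);   /* stdout now refers to the file; drop the extra fd */
    }
    return rc < 0 ? -1 : 0;
}
```

Doing this in the child rather than the parent means the shell's own stdout is never touched, so the next prompt still goes to the terminal.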

Not implemented: <, >>, 2>, pipes, or heredocs.


Build System

| Variable | Value |
| --- | --- |
| CC | gcc |
| CFLAGS | -Wall -Wextra -O2 |
| TARGET | mysh |

All four .c files compile and link in one invocation. Targets: make / make all, make clean.


POSIX Dependencies

| API | Used in |
| --- | --- |
| getline | main.c |
| fork, execvp, chdir | executor.c, builtins.c |
| waitpid | executor.c |
| open, dup2 | executor.c (redirection) |
| strtok, malloc, realloc | parser.c |

The code does not build natively on Windows; use WSL, MSYS2, or Cygwin instead.


Limitations & Extension Points

Current limitations

  • Whitespace-only tokenization (no quoting)
  • cd with no arguments is an error (no $HOME fallback)
  • No pipelines, background &, or export
  • Redirection only on external commands (not built-ins)
  • Parent always waits synchronously
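
As an illustration of how small some of these gaps are, a $HOME fallback for cd might look like the sketch below. The builtin_cd_home name and the error messages are hypothetical; this is not part of the current code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical cd variant: a missing argument means "go to $HOME".
 * Returns 1 in every case so the REPL keeps running. */
int builtin_cd_home(char **args)
{
    const char *target = args[1];
    if (target == NULL) {
        target = getenv("HOME");
        if (target == NULL) {
            fprintf(stderr, "mysh: cd: HOME not set\n");
            return 1;
        }
    }
    if (chdir(target) != 0) {
        perror("mysh: cd");
    }
    return 1;
}
```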

Natural extensions

| Feature | Approach |
| --- | --- |
| Pipes | pipe + two forks, dup2 between children |
| >> append | O_APPEND in open flags |
| Background jobs | Fork without wait; job table + WNOHANG |
| Quoted strings | Custom tokenizer instead of strtok |
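
The pipe extension could be approached along the lines of this sketch, which handles exactly one pipe between two commands. The run_pipeline name and its return convention (exit status of the right-hand command) are assumptions for illustration, not existing project code.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child side: wire one pipe end onto `target_fd`, then exec. Never returns. */
static void exec_with_fd(int fds[2], int end, int target_fd, char **argv)
{
    dup2(fds[end], target_fd);
    close(fds[0]);
    close(fds[1]);
    execvp(argv[0], argv);
    perror("mysh");
    _exit(127);
}

/* Hypothetical helper: run `left | right` and wait for both children.
 * Returns the exit status of `right`, or -1 on error. */
int run_pipeline(char **left, char **right)
{
    int fds[2];
    if (pipe(fds) < 0) {
        perror("mysh: pipe");
        return -1;
    }

    pid_t lpid = fork();
    if (lpid == 0) {
        exec_with_fd(fds, 1, STDOUT_FILENO, left);   /* stdout -> write end */
    }
    pid_t rpid = (lpid < 0) ? -1 : fork();
    if (rpid == 0) {
        exec_with_fd(fds, 0, STDIN_FILENO, right);   /* stdin <- read end */
    }

    /* Parent must close both ends, or `right` never sees EOF. */
    close(fds[0]);
    close(fds[1]);

    int status = 0;
    if (lpid > 0) {
        waitpid(lpid, NULL, 0);
    }
    if (rpid < 0) {
        perror("mysh: fork");
        return -1;
    }
    if (waitpid(rpid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Closing the unused pipe ends in every process is the step most often missed: a stray open write end keeps the reader blocked forever.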

Summary

This project demonstrates how a real shell uses the fork/exec split — built-ins in the parent, programs in a child — and why that design matters for cd, I/O redirection, and process isolation. Every piece was implemented in C with POSIX APIs only, aligned with OSTEP Chapter 5 (Process API).
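
The point about cd can be checked directly with a small experiment: a chdir performed in a forked child does not change the parent's working directory, which is why cd has to be a built-in. The function name below is hypothetical and the 4096-byte path buffers are an assumed upper bound.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a chdir done in a forked child left the parent's working
 * directory unchanged, 0 if it changed, -1 on error. */
int child_chdir_is_isolated(const char *dir)
{
    char before[4096], after[4096];   /* assumed large enough for the path */
    if (getcwd(before, sizeof before) == NULL) {
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* Child: changes only its own working directory, then exits. */
        _exit(chdir(dir) == 0 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        return -1;   /* the child's chdir itself failed */
    }

    if (getcwd(after, sizeof after) == NULL) {
        return -1;
    }
    return strcmp(before, after) == 0;
}
```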

