VCC: A Version Control System Built From Scratch
C++17 | Systems Programming & Developer Tooling
VCC is a custom-built version control system implemented entirely in C++. It replicates Git's core architecture — content-addressable object storage, SHA-1 hashing, blob/tree/commit graph, staging, history traversal, and time-travel checkouts — all without using any existing VCS libraries.
The Architecture
VCC is structured around six managers, each owning a specific domain of the version control lifecycle:
flowchart TD
WD[Working Directory] -->|add| SA(Staging Area / Index)
SA -->|write_tree| TR[Tree Object]
TR -->|commit| CM[Commit Object]
CM -->|update parent| P_CM[Parent Commit]
subgraph ".vcc/objects"
BL[Blob Object]
TR
CM
end
WD -.->|hashed to| BL
SA -.->|points to| BL
subgraph References
HEAD[HEAD .vcc/refs/heads/main]
end
CM <-->|pointed by| HEAD
CO[Checkout] -->|reads| CM
CO -->|restores| WD
| Manager | Responsibility |
|---|---|
| RepoManager | Initializes the .vcc directory structure and validates repository state |
| IndexManager | Hashes files into blobs, manages the staging area, respects .vccignore |
| TreeManager | Converts staged index entries into a deterministically sorted Tree object |
| CommitManager | Wraps a Tree with metadata (author, message, parent) into a Commit object |
| LogManager | Traverses the commit DAG backward from HEAD to render history |
| CheckoutManager | Restores the working directory to match any previous commit snapshot |
The Internal Database
When vcc init executes, it generates VCC's primary data store:
.vcc/
├── objects/ # Immutable object database (content-addressable)
│ ├── a1/ # Directory (first 2 chars of SHA-1)
│ │ └── b2c... # File (remaining 38 chars of SHA-1)
├── refs/
│ └── heads/
│ └── main # 40-char hash pointing to the latest commit
└── index # Mutable staging area (flat text file)
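The fan-out layout above (two-character directory, 38-character filename) can be sketched as a small path helper. This is a minimal illustration, not VCC's actual code: the function name object_path is hypothetical, and the forward-slash separator is an assumption.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not taken from VCC): maps a 40-character hex hash
// to its location inside the object database.
std::string object_path(const std::string& hash) {
    if (hash.size() != 40) {
        throw std::invalid_argument("expected a 40-character SHA-1 hex string");
    }
    // First 2 characters name the directory, the remaining 38 name the file.
    return ".vcc/objects/" + hash.substr(0, 2) + "/" + hash.substr(2);
}
```

Splitting on the first two characters keeps any single directory from accumulating every object in the repository.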
Deep Dive: The Object Model
VCC stores three object types inside .vcc/objects/, each addressed by the SHA-1 hash of its serialized content. This makes the entire database a content-addressable filesystem — objects are retrieved by what they contain, not by arbitrary IDs.
1. Blobs (Binary Large Objects)
Blobs store the exact byte-for-byte contents of a staged file. They carry zero metadata — no filename, no permissions, no timestamps. Rename src/main.cpp to src/app.cpp without changing the code? The blob hash stays identical.
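// Read the file's raw bytes; the filename itself is never hashed.
std::string content = read_file(filename);
SHA1 checksum;
checksum.update(content);
// The resulting digest becomes the blob's object ID.
std::string hash = checksum.final();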
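The snippet above calls a SHA1 class whose implementation is not shown here. As a rough, self-contained sketch of what such a class might compute, the function below implements the standard SHA-1 algorithm as a single free function. The name sha1_hex is hypothetical and is not part of VCC, which may structure its hashing differently.

```cpp
#include <array>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Rotate a 32-bit word left by n bits (0 < n < 32).
static std::uint32_t rotl(std::uint32_t x, int n) {
    return (x << n) | (x >> (32 - n));
}

// Hypothetical helper: SHA-1 digest of `data` as 40 lowercase hex characters.
std::string sha1_hex(const std::string& data) {
    std::array<std::uint32_t, 5> h = {0x67452301u, 0xEFCDAB89u, 0x98BADCFEu,
                                      0x10325476u, 0xC3D2E1F0u};

    // Padding: a 0x80 byte, zeros up to 56 mod 64, then the message length
    // in bits as a 64-bit big-endian integer.
    std::vector<std::uint8_t> msg(data.begin(), data.end());
    const std::uint64_t bit_len = static_cast<std::uint64_t>(data.size()) * 8;
    msg.push_back(0x80);
    while (msg.size() % 64 != 56) msg.push_back(0x00);
    for (int shift = 56; shift >= 0; shift -= 8) {
        msg.push_back(static_cast<std::uint8_t>(bit_len >> shift));
    }

    // Process the padded message in 64-byte blocks.
    for (std::size_t off = 0; off < msg.size(); off += 64) {
        std::uint32_t w[80];
        for (int i = 0; i < 16; ++i) {
            w[i] = (std::uint32_t(msg[off + 4 * i]) << 24) |
                   (std::uint32_t(msg[off + 4 * i + 1]) << 16) |
                   (std::uint32_t(msg[off + 4 * i + 2]) << 8) |
                    std::uint32_t(msg[off + 4 * i + 3]);
        }
        for (int i = 16; i < 80; ++i) {
            w[i] = rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
        }

        std::uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
        for (int i = 0; i < 80; ++i) {
            std::uint32_t f, k;
            if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
            else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
            else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
            else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }
            const std::uint32_t tmp = rotl(a, 5) + f + e + k + w[i];
            e = d; d = c; c = rotl(b, 30); b = a; a = tmp;
        }
        h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
    }

    char buf[41];
    std::snprintf(buf, sizeof buf, "%08x%08x%08x%08x%08x",
                  static_cast<unsigned>(h[0]), static_cast<unsigned>(h[1]),
                  static_cast<unsigned>(h[2]), static_cast<unsigned>(h[3]),
                  static_cast<unsigned>(h[4]));
    return std::string(buf);
}
```

Because only the bytes are hashed, calling sha1_hex on the contents of src/main.cpp and on an identical src/app.cpp yields the same 40-character ID.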
2. Trees (Directory Snapshots)
Trees map blobs to human-readable filenames, forming the directory hierarchy. TreeManager::write_tree() reads the index, sorts the entries alphabetically (for deterministic hashing), and constructs a payload of <mode> <type> <hash> <filename> entries.
Two directories with identical filenames and file contents produce identical Tree hashes, enabling automatic deduplication.
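To illustrate why sorting makes the Tree hash deterministic, here is a minimal sketch of payload construction. The IndexEntry struct, the build_tree_payload name, the 100644 mode value, and the newline separator are all assumptions made for illustration; only the <mode> <type> <hash> <filename> field order comes from this document.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical in-memory form of one staged index entry.
struct IndexEntry {
    std::string mode;      // e.g. "100644" (assumed, borrowed from Git)
    std::string hash;      // blob hash of the file contents
    std::string filename;  // path as recorded in the index
};

// Builds a tree payload whose bytes do not depend on staging order.
std::string build_tree_payload(std::vector<IndexEntry> entries) {
    // Byte-wise sort by filename; note that uppercase sorts before lowercase.
    std::sort(entries.begin(), entries.end(),
              [](const IndexEntry& a, const IndexEntry& b) {
                  return a.filename < b.filename;
              });

    std::string payload;
    for (const auto& e : entries) {
        payload += e.mode + " blob " + e.hash + " " + e.filename + "\n";
    }
    return payload;
}
```

Since the same set of entries always serializes to the same bytes, hashing the payload gives the same Tree ID no matter in which order files were added.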
3. Commits (History Nodes)
Commits bind a Tree to a moment in time. The payload is purely text-based:
tree 724505ffcdbe7324f617f3e166d3f44f17e0e34a
parent 4fc37118897576c907dce5f629a0802113b578bf
author Jyotishmoy Deka
added codes
Because the parent hash is embedded inside the payload, altering any historical commit changes its hash, which in turn changes the hash of every descendant commit. History is therefore tamper-evident: any rewrite can be detected by recomputing the hashes.
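The embedded parent line is also what makes history traversal possible. The sketch below walks backward from a starting commit by repeatedly reading that line. It uses an in-memory map as a stand-in for reading .vcc/objects/, and the names field and walk_history are hypothetical, not LogManager's actual interface.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Returns the text after "<key> " on the first line that starts with it,
// or an empty string if no such line exists. A message line that happens to
// begin with the same key would also match; a real parser would stop at the
// end of the header lines.
std::string field(const std::string& payload, const std::string& key) {
    std::istringstream in(payload);
    const std::string prefix = key + " ";
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, prefix.size(), prefix) == 0) {
            return line.substr(prefix.size());
        }
    }
    return "";
}

// Follows parent links from `head` until a commit with no parent is reached.
// `objects` maps commit hash -> commit payload (stand-in for the object store).
std::vector<std::string> walk_history(
        const std::map<std::string, std::string>& objects, std::string head) {
    std::vector<std::string> order;
    while (!head.empty()) {
        const auto it = objects.find(head);
        if (it == objects.end()) break;  // dangling reference: stop quietly
        order.push_back(head);
        head = field(it->second, "parent");
    }
    return order;
}
```

The result lists commits newest first, which matches the order a log command would typically print them in.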
Object Graph Visualization
The resulting DAG stores each distinct piece of content exactly once, because objects are deduplicated by their content hash:
flowchart TD
Commit2["Commit 2\n(added main.cpp)"] -->|Parent| Commit1["Commit 1\n(initial)"]
Commit2 -->|Tree| Tree2["Tree 2\n(Root Directory)"]
Commit1 -->|Tree| Tree1["Tree 1\n(Root Directory)"]
Tree2 -->|blob: main.cpp| BlobMain["Blob\n(main.cpp content)"]
Tree2 -->|blob: README.md| BlobReadme["Blob\n(README.md content)"]
Tree1 -->|blob: README.md| BlobReadme
Because README.md didn't change between Commit 1 and Commit 2, both trees reference the same blob hash, so the file's content is stored only once.
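The deduplication described above falls out of keying storage by hash: writing an object that already exists is a no-op. A minimal in-memory sketch follows. The ObjectStore class and its put method are hypothetical stand-ins for the on-disk database, and the caller is assumed to have already computed the hash.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical in-memory stand-in for .vcc/objects/.
class ObjectStore {
public:
    // Stores `content` under `hash`. Returns true only if a new object was
    // written; an existing object with the same hash is left untouched.
    bool put(const std::string& hash, const std::string& content) {
        return objects_.emplace(hash, content).second;
    }

    // Number of distinct objects currently stored.
    std::size_t size() const { return objects_.size(); }

private:
    std::map<std::string, std::string> objects_;
};
```

Committing an unchanged README.md a second time would call put with the same hash, return false, and leave the store's size unchanged.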
The Checkout Algorithm
Time-traveling to a previous commit (checkout <hash>) is a six-step deterministic process:
- Read Target: Locate the commit object in .vcc/objects/
- Parse Payload: Line-by-line parse to extract the 40-char tree hash from the tree prefix
- Read Tree: Locate the Tree object using the extracted hash
- Parse Entries: Iteratively parse <mode> <type> <hash> <filename> entries
- Reconstruct Working Directory: For every blob entry, load the binary content and overwrite the local file
- Move HEAD: Rewrite .vcc/refs/heads/main to point to the target commit hash
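The six steps above can be sketched end to end with in-memory maps standing in for the object database and the working directory. All names here (ObjectStore, WorkingDir, tree_hash_of, checkout) are hypothetical, and one newline-terminated entry per line is an assumption; a real implementation would read and write files instead.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

using ObjectStore = std::map<std::string, std::string>;  // hash -> content
using WorkingDir  = std::map<std::string, std::string>;  // filename -> bytes

// Step 2: scan the commit payload for the line starting with "tree ".
std::string tree_hash_of(const std::string& commit_payload) {
    std::istringstream in(commit_payload);
    std::string line;
    while (std::getline(in, line)) {
        if (line.rfind("tree ", 0) == 0) return line.substr(5);
    }
    throw std::runtime_error("commit payload has no tree line");
}

// Steps 1-6. `head` models the contents of .vcc/refs/heads/main.
// ObjectStore::at throws std::out_of_range if any object is missing.
WorkingDir checkout(const ObjectStore& objects, const std::string& commit_hash,
                    std::string& head) {
    const std::string& commit = objects.at(commit_hash);         // step 1
    const std::string& tree = objects.at(tree_hash_of(commit));  // steps 2-3

    WorkingDir restored;
    std::istringstream in(tree);
    std::string mode, type, hash, name;
    // Step 4: read "<mode> <type> <hash>", then the rest of the line as the
    // filename so that names containing spaces survive.
    while (in >> mode >> type >> hash && std::getline(in >> std::ws, name)) {
        if (type == "blob") {
            restored[name] = objects.at(hash);                   // step 5
        }
    }

    head = commit_hash;                                          // step 6
    return restored;
}
```

HEAD is updated last, so a missing object aborts the checkout before the reference moves.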
Command Reference
| Command | Usage | Description |
|---|---|---|
| init | .\vcc.exe init | Initialize a new VCC repository with the .vcc directory structure |
| add | .\vcc.exe add <file> | Hash file contents to a blob, store in objects, and update the index |
| write-tree | .\vcc.exe write-tree | Create a Tree object from the current staging area (returns SHA-1) |
| commit | .\vcc.exe commit "<msg>" | Snapshot the tree with metadata and update HEAD |
| log | .\vcc.exe log | Traverse and display the full commit history from HEAD |
| checkout | .\vcc.exe checkout <hash> | Restore working directory to match a specific commit |
Summary
This project demonstrates a deep understanding of how Git works internally — from SHA-1 content addressing and immutable object graphs to index management and working directory reconstruction. Every core concept — blobs, trees, commits, HEAD, checkout — was implemented from first principles in C++17 with raw file I/O and no external VCS libraries.