Master Git's internal structure by understanding its Directed Acyclic Graph. Learn how commits, branches, and refs work together to power version control.
Understanding Git's Directed Acyclic Graph
Every time you run git commit, git merge, or git rebase, you're manipulating a Directed Acyclic Graph (DAG). Yet most developers use Git through commands without understanding what's happening beneath the surface. That abstraction works fine until something breaks: you hit a merge conflict, accidentally rewrite public history, or find yourself in "detached HEAD" state with no idea how you got there.
Understanding Git's DAG structure transforms Git from a set of memorized commands into a predictable, powerful tool. Instead of asking "what command do I run?" you'll ask "what structure do I want to create?" This mental model helps you reason about complex situations, debug issues, and perform advanced operations with confidence.
In this guide, we'll explore Git's internal graph structure from first principles. You'll learn how commits, branches, and references fit together, how to visualize the DAG, and how to manipulate it safely. Whether you're a beginner confused by Git's internals or an intermediate developer looking to level up, you'll finish with a clear mental model of how Git works under the hood.
Understanding Git's Directed Acyclic Graph
Git stores your project's history as a Directed Acyclic Graph where commits are nodes and parent relationships are edges. This structure might sound abstract, but it's the foundation of everything Git does. Let's break down what this actually means and why it matters.
What Makes It a DAG?
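A Directed Acyclic Graph has two defining properties: its edges have a direction (in Git, they point from child to parent), and it contains no cycles (following parent pointers never leads back to a commit you've already visited). Git's commit graph adds one more convention: every commit has at least one parent except root commits, which have none. Most repositories have a single root, the initial commit, though orphan branches and merges of unrelated histories can introduce more. These properties aren't mathematical trivia; they fundamentally shape how Git works.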
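The direction of edges means there's always a clear "backward" path from any commit to the beginning of history. When you run git log, Git follows these parent pointers backward, showing you increasingly older commits. The absence of cycles guarantees this process eventually terminates: you won't loop forever trying to display history. Content addressing makes cycles practically impossible anyway, because a commit's hash covers its parents' hashes, so a commit can only name parents that already exist, never one of its own descendants. And because every non-root commit has a parent, every commit can be traced back to a root commit.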
Consider what happens when you create a merge commit with two parents. Git stores this as a single node with two edges pointing backward, one to each parent. This preserves the complete history of both branches while creating a single point where they converge. The graph structure makes parallel development natural and explicit.
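To make the parent-pointer walk concrete, here is a minimal Python sketch that models a small history as a dictionary from commit IDs to parent lists, including one merge commit. The short IDs and the `walk_history` function are illustrative assumptions, not Git's data structures, and real `git log` orders its output by commit date by default rather than in this breadth-first order. The `seen` set is what keeps the shared ancestor B from being listed twice after the merge.

```python
from collections import deque

# A toy commit graph: each commit ID maps to its list of parent IDs.
# IDs are short illustrative labels, not real SHA-1 hashes.
PARENTS = {
    "A": [],          # root commit: no parents
    "B": ["A"],
    "C": ["B"],       # main line
    "D": ["B"],       # feature line branching from B
    "M": ["C", "D"],  # merge commit: two parents
}

def walk_history(start: str, parents: dict[str, list[str]]) -> list[str]:
    """Follow parent edges backward from `start`, visiting each commit once."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        commit = queue.popleft()
        order.append(commit)
        for parent in parents[commit]:
            if parent not in seen:  # B is reachable via both C and D
                seen.add(parent)
                queue.append(parent)
    return order

print(walk_history("M", PARENTS))  # ['M', 'C', 'D', 'B', 'A']
```

Because the graph is acyclic and every commit is visited at most once, the walk always terminates.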
Git's Object Storage
Git implements this DAG using four core object types: blobs (file contents), trees (directory listings that map names to blobs and subtrees), commits (snapshots with metadata), and annotated tags (named objects that point to another object, usually a commit). When you stage a file, Git writes its contents to the object database as a blob. When you commit, Git creates tree objects representing your directory structure, then a commit object pointing to the root tree, and stores them alongside the blobs.
Each commit object records the hash of its root tree, the hashes of its parent commits, the author and committer (each with a name, email, and timestamp), and the commit message. The commit's own SHA-1 hash isn't stored inside the object; Git computes it from the object's content. If anything changes, even a single timestamp, the hash changes. This content-addressable storage lets Git detect corruption and verify integrity.
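The hashing rule can be sketched in a few lines of Python, assuming a repository that uses SHA-1 (Git can also be configured for SHA-256). Git hashes a header of the form `<type> <size>` followed by a NUL byte, then the raw content. The `git_object_hash` and `commit_body` helpers below are illustrative names, and the commit content is a made-up example; the empty blob and empty tree IDs are well-known constants in SHA-1 repositories.

```python
import hashlib

def git_object_hash(obj_type: str, body: bytes) -> str:
    """SHA-1 of Git's object header ("<type> <size>" plus a NUL byte) followed by the body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

def commit_body(committer_time: int) -> bytes:
    """Hypothetical root commit whose only variable field is the committer timestamp."""
    return (
        "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
        "author Jane Developer <jane@example.com> 1735689600 +0000\n"
        f"committer Jane Developer <jane@example.com> {committer_time} +0000\n"
        "\n"
        "Add user authentication feature\n"
    ).encode()

# Well-known IDs in SHA-1 repositories: the empty blob and the empty tree.
print(git_object_hash("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_object_hash("tree", b""))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904

# Changing one timestamp by a second yields a completely different commit ID.
print(git_object_hash("commit", commit_body(1735689600)) ==
      git_object_hash("commit", commit_body(1735689601)))  # False
```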
Here's what a commit object looks like when you print it with git cat-file -p <commit> (the hashes are illustrative):
tree 3a4b5c6d7e8f90a1b2c3d4e5f6a7b8c9d0e1f2a3
parent 9f8e7d6c5b4a39281706f5e4d3c2b1a098765432
author Jane Developer <jane@example.com> 1735689600 +0000
committer Jane Developer <jane@example.com> 1735689600 +0000

Add user authentication feature
The parent line creates the edge in our DAG. When you create another commit on top of this one, Git records this commit's hash as the new commit's parent, extending the chain. This is why commits are immutable: changing a commit's parent (or any other field) changes its hash, which in turn changes the parent recorded in each descendant, so every descendant must be recreated as a new commit.
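A small Python sketch shows the cascade. It builds a three-commit chain twice, changing only the root commit's message the second time, and hashes each commit the way a SHA-1 Git repository hashes objects. The `commit_id` and `build_chain` helpers, the fixed identity and timestamps, and the use of the empty tree for every commit are simplifying assumptions.

```python
import hashlib
from typing import Optional

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # reused by every commit here

def commit_id(tree: str, parent: Optional[str], message: str) -> str:
    """Hash a minimal commit object: header "commit <size>" + NUL, then the body."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")  # the DAG edge is part of the hashed content
    lines += [
        "author Jane Developer <jane@example.com> 1735689600 +0000",
        "committer Jane Developer <jane@example.com> 1735689600 +0000",
        "",
        message,
    ]
    body = ("\n".join(lines) + "\n").encode()
    return hashlib.sha1(f"commit {len(body)}\0".encode() + body).hexdigest()

def build_chain(root_message: str) -> list[str]:
    """Create three commits in a row; each one names the previous commit as its parent."""
    ids: list[str] = []
    parent: Optional[str] = None
    for message in [root_message, "Second commit", "Third commit"]:
        parent = commit_id(EMPTY_TREE, parent, message)
        ids.append(parent)
    return ids

original = build_chain("Initial commit")
reworded = build_chain("Initial commit (reworded)")
# Only the root commit changed, yet every descendant also gets a new ID.
print([old != new for old, new in zip(original, reworded)])  # [True, True, True]
```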
Working with the Graph Structure
Now that we understand what the DAG is, let's explore how Git represents and works with this structure in practice. References, branches, and specialized pointers all build on top of the commit graph to provide different ways to navigate and manipulate history.
References and Branches
References are human-readable names that point to commits. The most common reference is a branch, which is simply a movable pointer to a commit. When you create a branch with git branch feature-x, Git writes the SHA-1 hash of the current commit to .git/refs/heads/feature-x. (Git may later consolidate such files into .git/packed-refs, but the idea is the same.)
HEAD is a special reference that points to the branch you're currently working on. Normally, HEAD contains a symbolic reference to a branch name (like ref: refs/heads/main), but it can also point directly to a commit, which is called "detached HEAD" state. When you make a new commit, Git moves the branch that HEAD points to forward to the new commit; in detached HEAD state, it moves HEAD itself.
This design explains why branching in Git is instant and cheap: creating a branch just means writing a 41-byte file (a 40-character hexadecimal SHA-1 hash plus a newline). Switching branches updates HEAD and rewrites the files in your working directory and index that differ between the two commits. The project is never copied to create a branch.
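The following Python sketch models refs as a dictionary and HEAD as a string that holds either a symbolic reference (`ref: refs/heads/main`) or a bare commit ID. The `RefStore` class and its method names are hypothetical; real Git keeps these values in files under `.git/`, updates them with locking, and also updates the working tree on a switch, which this model ignores. The point is that creating a branch and committing only rewrite short strings.

```python
class RefStore:
    """Toy model of .git/HEAD plus branch refs (hypothetical, not Git's API)."""

    def __init__(self, initial_commit: str) -> None:
        self.refs = {"refs/heads/main": initial_commit}
        self.head = "ref: refs/heads/main"  # symbolic ref, as written in .git/HEAD

    def resolve_head(self) -> str:
        """The commit ID HEAD currently designates."""
        if self.head.startswith("ref: "):
            return self.refs[self.head[len("ref: "):]]
        return self.head  # detached: HEAD holds a commit ID directly

    def create_branch(self, name: str) -> None:
        """Creating a branch only records the current commit ID under a new name."""
        self.refs[f"refs/heads/{name}"] = self.resolve_head()

    def switch(self, name: str) -> None:
        self.head = f"ref: refs/heads/{name}"

    def detach(self, commit_id: str) -> None:
        self.head = commit_id

    def commit(self, new_id: str) -> None:
        """Advance the branch HEAD names, or HEAD itself when detached."""
        if self.head.startswith("ref: "):
            self.refs[self.head[len("ref: "):]] = new_id
        else:
            self.head = new_id

store = RefStore("a" * 40)
store.create_branch("feature-x")   # like `git branch feature-x`
store.commit("b" * 40)             # new commit while on main
print(store.refs["refs/heads/main"][:7])               # bbbbbbb
print(store.refs["refs/heads/feature-x"][:7])          # aaaaaaa
print(len(store.refs["refs/heads/feature-x"] + "\n"))  # 41
```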
Other important references include:
- Remote-tracking branches (refs/remotes/origin/main) track remote branches
- Tags point to specific commits and don't move
- The reflog, a local log kept alongside the refs, records every position HEAD (and each branch) has occupied
- Stashes are stored as commits referenced by refs/stash
Understanding references clarifies seemingly confusing behaviors. When you delete a branch with git branch -d, Git checks whether the branch is fully merged, meaning its tip is reachable from its upstream branch, or from HEAD if no upstream is set. If not, Git refuses to delete it and suggests git branch -D to force deletion. Commits on a force-deleted branch still exist in the object database until garbage collection removes them, but no branch reference leads to them anymore.
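The "fully merged" check boils down to an ancestry question: can the branch tip be reached from HEAD by following parent edges? Git exposes this question directly as `git merge-base --is-ancestor <commit> <commit>`. The Python sketch below answers it on a toy graph; the graph and the `is_ancestor` helper are illustrative assumptions, and Git's own implementation is considerably more optimized.

```python
# Toy history: main is A-B-C; feature branched at B and added D and E.
PARENTS = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"]}

def is_ancestor(candidate: str, descendant: str, parents: dict[str, list[str]]) -> bool:
    """True if `candidate` can be reached from `descendant` via parent edges.

    A commit counts as its own ancestor here.
    """
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == candidate:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return False

# With HEAD on main (C), the feature tip E is not merged, so `git branch -d` would refuse.
print(is_ancestor("E", "C", PARENTS))  # False
# After a merge commit M with parents C and E, E is reachable from HEAD.
merged = {**PARENTS, "M": ["C", "E"]}
print(is_ancestor("E", "M", merged))   # True
```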
Visualizing History
Visualizing the DAG helps you understand complex situations. Git provides built-in tools for this, and third-party tools offer even more powerful visualizations.
The most basic visualization is git log --graph --oneline --all, which shows commits as nodes with lines connecting them:
* a1b2c3d (HEAD -> main) Fix critical bug
| * e5f6a7b (feature-x) Add new feature
|/
* d4e5f6a Initial commit
More sophisticated visualizations are available with git log --graph --pretty=format:'%h %d %s' --abbrev-commit or dedicated tools like gitk, tig, or GitKraken. These tools make the DAG structure explicit, showing merge commits, branch points, and commit relationships.
When visualizing, look for important patterns:
- Linear chains represent sequential development
- Merge commits have multiple parents
- Branch points occur where divergent commits share a parent
- Unreachable commits (no path from any reference or reflog entry) become eligible for garbage collection
Understanding these patterns helps you reason about what happened in your repository. If you see a merge commit with two parents, you know two lines of development converged. Commits that no reference can reach won't appear even in git log --all; they remain in the object database (usually still recorded in the reflog) until garbage collection prunes them.
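Finding unreachable commits is a graph search: start from every reference, follow parent edges, and whatever you never visit is unreachable. The Python sketch below does this on a toy graph where X and Y stand in for commits abandoned by a rebase. It only approximates what `git fsck --unreachable` and garbage collection consider, since real Git also treats reflog entries and the index as starting points; the names here are illustrative.

```python
PARENTS = {
    "A": [], "B": ["A"], "C": ["B"],  # main: A-B-C
    "X": ["B"], "Y": ["X"],           # e.g. pre-rebase commits nothing points to anymore
}
REFS = {"refs/heads/main": "C"}

def reachable(refs: dict[str, str], parents: dict[str, list[str]]) -> set[str]:
    """All commits reachable from any ref by following parent edges."""
    seen: set[str] = set()
    stack = list(refs.values())
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen

unreachable = set(PARENTS) - reachable(REFS, PARENTS)
print(sorted(unreachable))  # ['X', 'Y']
```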
Practical DAG Manipulation
With a solid understanding of Git's graph structure, we can now explore how to safely manipulate the DAG for common workflows. The key is understanding which operations rewrite history (dangerous on shared branches) and which add new commits (safe).
Safe Operations
Any operation that adds new commits without changing existing ones is safe, even for shared history. Merging is the canonical example-creating a merge commit adds a new node to the graph without modifying existing nodes.
# Safe: creates a merge commit
git checkout main
git merge feature-branch
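Assuming main and feature-branch have diverged, this creates a new commit with two parents: the previous tip of main and the tip of feature-branch. No existing commits are modified, so this is safe even if others have based work on main. If main has no new commits since feature-branch split off, Git instead fast-forwards main to the tip of feature-branch without creating a merge commit; pass --no-ff to force one.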
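The choice between a fast-forward and a true merge commit can be sketched as two ancestry checks. In this Python model, `merge` returns the new tip of the current branch; the toy graph, the helper names, and the `merge(C,D)`-style label given to new merge commits are all assumptions for illustration (a real merge commit gets a hash and a tree produced by merging both sides). The only mutation is adding a new node; no existing entry changes.

```python
PARENTS = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}  # C and D diverge from B

def is_ancestor(candidate: str, descendant: str, parents: dict[str, list[str]]) -> bool:
    """True if `candidate` is reachable from `descendant` via parent edges."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == candidate:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return False

def merge(current: str, other: str, parents: dict[str, list[str]]) -> str:
    """Return the new tip of the current branch after merging `other` into it."""
    if is_ancestor(other, current, parents):
        return current                      # already up to date
    if is_ancestor(current, other, parents):
        return other                        # fast-forward: only the ref moves
    merge_id = f"merge({current},{other})"  # hypothetical label for the new node
    parents[merge_id] = [current, other]    # first parent is the current branch
    return merge_id

graph = {k: list(v) for k, v in PARENTS.items()}
print(merge("B", "C", graph))    # C  (fast-forward, no new commit)
tip = merge("C", "D", graph)
print(tip, graph[tip])           # merge(C,D) ['C', 'D']
```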
Cherry-picking also creates new commits rather than modifying existing ones:
# Safe: creates a new commit with the same changes
git cherry-pick abc1234
Git creates a new commit containing the changes from abc1234 but with a different hash, because its parent and committer timestamp differ (Git keeps the original author and author date). The original commit remains untouched in the graph.
Other safe operations include creating branches, reverting commits (which creates a new commit that undoes earlier changes), and fetching (which downloads new objects and updates remote-tracking references). These operations only add to the graph or move references.
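Cherry-picking and reverting can both be modeled as "compute a change, then apply it somewhere." The Python sketch below treats a snapshot as a dictionary from file paths to contents, computes the change a commit introduced relative to its parent, and applies it (or its inverse) on top of another snapshot to form the content of a new commit. The helper names are hypothetical, and the strict "old content must match" rule is a simplification: Git performs a three-way merge and can often apply changes that this sketch would flag as conflicts.

```python
from typing import Optional

Snapshot = dict[str, str]                                # path -> file contents
Change = dict[str, tuple[Optional[str], Optional[str]]]  # path -> (before, after); None = absent

def diff(before: Snapshot, after: Snapshot) -> Change:
    """The change a commit introduced relative to its parent's snapshot."""
    paths = before.keys() | after.keys()
    return {p: (before.get(p), after.get(p)) for p in paths if before.get(p) != after.get(p)}

def apply(base: Snapshot, change: Change) -> Snapshot:
    """Apply a change to a different base, producing content for a NEW commit."""
    result = dict(base)
    for path, (old, new) in change.items():
        if result.get(path) != old:
            raise ValueError(f"conflict in {path}")
        if new is None:
            result.pop(path, None)
        else:
            result[path] = new
    return result

def invert(change: Change) -> Change:
    """The change that undoes `change`, which is what a revert commit records."""
    return {p: (new, old) for p, (old, new) in change.items()}

parent = {"app.py": "v1"}
picked = {"app.py": "v1", "auth.py": "login"}   # stands in for commit abc1234
other_base = {"app.py": "v1", "README": "docs"}

print(apply(other_base, diff(parent, picked)))                # cherry-pick onto another base
print(apply(picked, invert(diff(parent, picked))) == parent)  # revert: True
```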
Advanced Scenarios
Rebasing rewrites history by creating new commits and abandoning the old ones. This is powerful but dangerous for shared history: anyone who has based work on the old commits must reconcile it with the rewritten ones, which often produces duplicate commits or conflicts.
# Dangerous for shared history: rewrites commits
git checkout feature-branch
git rebase main
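This operation finds the merge base of feature-branch and main, then replays each commit on feature-branch after that base, oldest first. The first replayed commit gets main's tip as its parent, and each later one gets the previously replayed commit as its parent. Each new commit carries the same changes but has a new hash. Finally, Git moves the feature-branch reference to the last new commit.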
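The replay step can be sketched in Python on a toy graph where main is A-B-C and feature-branch is D-E, branched from B. The `rebase` function below is a simplified model: it follows first parents only (so it assumes a linear feature branch), labels each replayed copy with a trailing apostrophe (D becomes D'), and doesn't model the file changes themselves. It does show the structural result: new commits chained onto main's tip, with the old ones left in the graph but no longer referenced.

```python
PARENTS = {
    "A": [], "B": ["A"], "C": ["B"],  # main: A-B-C
    "D": ["B"], "E": ["D"],           # feature-branch: D-E, branched from B
}

def ancestors(tip: str, parents: dict[str, list[str]]) -> set[str]:
    """`tip` plus everything reachable from it via parent edges."""
    seen: set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen

def rebase(branch_tip: str, onto: str, parents: dict[str, list[str]]) -> str:
    """Replay the branch's own commits onto `onto`; return the new branch tip."""
    onto_history = ancestors(onto, parents)
    to_replay = []
    commit = branch_tip
    while commit not in onto_history:      # stop at the merge base (B here)
        to_replay.append(commit)
        commit = parents[commit][0]        # first parent only: assumes a linear branch
    new_parent = onto
    for old in reversed(to_replay):        # oldest first
        new_id = old + "'"                 # hypothetical label: same change, new parent
        parents[new_id] = [new_parent]
        new_parent = new_id
    return new_parent                      # feature-branch now points here

graph = {k: list(v) for k, v in PARENTS.items()}
new_tip = rebase("E", "C", graph)
print(new_tip, graph["E'"], graph["D'"])   # E' ["D'"] ['C']
```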
Why is this dangerous? If someone else built work on your original feature-branch commits, they'll need to reconcile their work with the rebased commits. The old commits still exist (you can recover them from the reflog), but they're no longer referenced by a branch name.
Interactive rebasing provides even more power:
# Reorder, squash, or drop commits
git rebase -i HEAD~5
This opens an editor listing the last five commits, oldest first, so you can reorder them, squash several into one, edit them, or drop them entirely. Every commit from the first changed one onward is recreated, so the rewritten range gets new hashes.
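Conceptually, Git turns your edited todo list into a fresh sequence of commits. The Python sketch below processes a simplified todo list with only `pick`, `squash`, and `drop`, tracking just commit messages; the `run_todo` function is hypothetical. In real Git, `squash` also combines the commits' changes and opens an editor for the combined message (while `fixup` keeps only the first message).

```python
def run_todo(todo: list[tuple[str, str]]) -> list[str]:
    """Apply a simplified rebase todo list; return the messages of the rewritten commits."""
    rewritten: list[str] = []
    for action, message in todo:
        if action == "pick":
            rewritten.append(message)          # recreated on top of the previous one
        elif action == "squash":
            if not rewritten:
                raise ValueError("cannot squash without a previous commit")
            rewritten[-1] += "\n\n" + message  # fold into the previous new commit
        elif action == "drop":
            continue                           # commit is simply not recreated
        else:
            raise ValueError(f"unsupported action: {action}")
    return rewritten

todo = [  # oldest first, as in Git's todo list
    ("pick", "Add login form"),
    ("squash", "Fix typo in login form"),
    ("drop", "WIP debugging"),
    ("pick", "Add logout button"),
]
print(run_todo(todo))
```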
When you need to recover lost work, the reflog is your safety net:
# Show everywhere HEAD has been
git reflog
# Move the current branch to where HEAD was five moves ago
# (warning: --hard also discards uncommitted changes)
git reset --hard HEAD@{5}
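The reflog records every position HEAD has occupied in your local repository, including commits you've "lost" through rebasing or branch deletion. Garbage collection won't remove commits that reflog entries still point to, giving you time to recover from mistakes. Entries do expire eventually: by default after 90 days, or after 30 days for entries no longer reachable from the current branch tip (configurable via gc.reflogExpire and gc.reflogExpireUnreachable).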
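The `HEAD@{n}` syntax counts moves backward from the present: `HEAD@{0}` is where HEAD is now, `HEAD@{1}` where it was one move ago, and so on. The Python sketch below models that indexing with a hypothetical `Reflog` class; real reflog entries also carry the old and new values, an identity, and a timestamp, and they live only in your local repository.

```python
class Reflog:
    """Toy model of HEAD's reflog; HEAD@{0} is the newest entry."""

    def __init__(self) -> None:
        self._entries: list[tuple[str, str]] = []  # (commit ID, reason), oldest first

    def record(self, commit: str, reason: str) -> None:
        """Called whenever HEAD moves (commit, checkout, reset, rebase, ...)."""
        self._entries.append((commit, reason))

    def at(self, n: int) -> str:
        """Resolve HEAD@{n}: the commit HEAD pointed to n moves ago."""
        if not 0 <= n < len(self._entries):
            raise IndexError(f"HEAD@{{{n}}} is beyond the reflog")
        return self._entries[-1 - n][0]

log = Reflog()
log.record("a1b2c3d", "commit: Add login form")
log.record("e5f6a7b", "commit: Add logout button")
log.record("9f8e7d6", "rebase (finish): returning to refs/heads/feature")
print(log.at(0))  # 9f8e7d6  (where HEAD is now)
print(log.at(2))  # a1b2c3d  (two moves ago)
```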
Detached HEAD state occurs when HEAD points directly to a commit rather than a branch reference. This happens when you check out a specific commit hash or a tag. Any commits you create in this state aren't referenced by a branch name, so once you switch away they're reachable only through the reflog until garbage collection removes them. To preserve them, create a branch before switching:
git checkout -b rescue-branch
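The risk can be modeled as a reachability check. In the Python sketch below, a commit X is made while HEAD is detached at A; once HEAD moves back to main, X is reachable from a reference only if a branch was created for it first. The function and graph are illustrative assumptions; in real Git, X would also linger in HEAD's reflog for a while, which is often how lost detached commits get recovered.

```python
def reachable(refs: dict[str, str], parents: dict[str, list[str]]) -> set[str]:
    """All commits reachable from any ref by following parent edges."""
    seen: set[str] = set()
    stack = list(refs.values())
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen

def commit_while_detached(create_rescue_branch: bool) -> bool:
    """Return True if the detached-HEAD commit X is still reachable after switching away."""
    parents = {"A": [], "B": ["A"]}
    refs = {"refs/heads/main": "B"}
    head = "A"                    # detached: HEAD holds a commit ID (e.g. a tag's commit)
    parents["X"] = [head]         # new commit on top of the detached HEAD
    head = "X"                    # HEAD moves; no branch does
    if create_rescue_branch:
        refs["refs/heads/rescue-branch"] = head  # like `git checkout -b rescue-branch`
    head = "ref: refs/heads/main"  # switch back to main; nothing else remembers X
    return "X" in reachable(refs, parents)

print(commit_while_detached(create_rescue_branch=False))  # False
print(commit_while_detached(create_rescue_branch=True))   # True
```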
Key Takeaways
- Git stores history as a Directed Acyclic Graph where commits are nodes and parent references are directed edges. This structure provides integrity, enables efficient operations, and makes parallel development explicit.
- References are movable pointers to commits. Branches are simply references, and HEAD is a special reference pointing to your current location. Understanding this clarifies branch operations, detached HEAD, and how Git determines the current branch.
- Safe operations add new commits without modifying existing ones. Merging, cherry-picking, and branching are safe even for shared history because they only extend the graph or move references.
- Rebasing rewrites history by creating new commits. This is powerful for cleaning up history but dangerous for shared branches. Use rebase on local feature branches; avoid it on public branches.
- The reflog provides recovery from mistakes. Every position HEAD occupies is recorded locally, letting you restore lost commits until the entries expire. Before panicking about lost work, check the reflog.
Git's DAG structure isn't just an implementation detail; it's the foundation for everything Git does. Understanding this structure transforms Git from a set of mysterious commands into a comprehensible tool. You'll reason about complex situations with confidence, perform advanced operations safely, and debug issues that would otherwise be baffling.
Next time you encounter a merge conflict, a confusing rebase, or an unexpected Git state, visualize the DAG. Ask yourself: what does the graph look like? What references point where? What operation will produce the structure I want? This mental model will serve you well through every Git challenge you face.