Building a Recursive Descent HTML Parser in Go
Technical Deep-Dive | Browser Internals | March 2026
Your browser just parsed this page. Here's exactly how that works — and how I built the same thing from scratch in Go.
Why Build Your Own HTML Parser?
When I built my Go Browser Engine, I had a choice: use a third-party HTML parsing library, or build one from scratch.
I chose scratch.
Not because it was easier — it wasn't. But because building the parser yourself forces you to confront a question that most developers never ask:
What actually IS HTML? Not what it looks like — what is it?
The answer changes how you think about the web forever.
What the Parser Has to Do
Before writing a single line of code, you need to understand the contract. The HTML parser sits at stage 2 of the browser pipeline:
Raw HTML string (text)
        │
        ▼
┌─────────────────┐
│   HTML Parser   │  ← We are here
└─────────────────┘
        │
        ▼
*dom.Node tree (structured data)
Its job is deceptively simple: take a flat string of characters and produce a tree — a hierarchical data structure that represents the nesting of every element on the page.
Input: <html><body><h1 class="title">Hello</h1><p>World</p></body></html>
Output:
html
└── body
    ├── h1 [class="title"]
    │   └── "Hello"
    └── p
        └── "World"
That transformation — from linear text to recursive tree — is the entire job.
The Fundamental Insight: HTML is a Grammar
To parse anything, you first need to define precisely what it is. HTML, like all formal languages, has a grammar — a set of rules that describe what valid input looks like.
Here's a simplified grammar for the subset we care about:
document := nodeList
nodeList := node*
node := element | text
element := '<' tagName attributes '>' nodeList '</' tagName '>'
text := [any characters not starting with '<']
attributes := attribute*
attribute := name '=' '"' value '"'
Read this out loud: "A document is a list of nodes. A node is either an element or text. An element starts with <tagName attributes>, contains more nodes, and ends with </tagName>."
This recursive definition is the key insight. Elements contain nodes. Nodes can be elements. Elements can contain elements. This is why the parser is recursive descent — the grammar is recursive, so the parser naturally mirrors that structure.
The DOM: What We're Building Towards
The parser produces a tree of dom.Node objects. The node type is intentionally minimal — just two kinds of nodes, with the fields each needs:
type NodeType int

const (
	ElementNode NodeType = iota // e.g. <h1 class="title">
	TextNode                    // e.g. "Hello"
)

type Node struct {
	Type     NodeType
	TagName  string            // "h1", "div", "a" — ElementNode only
	Attr     map[string]string // {"class": "title", "href": "/"} — ElementNode only
	Text     string            // raw string content — TextNode only
	Children []*Node           // child nodes — ElementNode only
}
That's the entire data model. Everything the CSS engine, layout engine, and JavaScript runtime need flows through this structure.
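The parser code later in this article builds nodes through three helpers — dom.NewText, dom.NewElement, and AddChildren — that are never shown. Here is a minimal sketch of what they presumably look like (hypothetical; the repository's versions may differ, and the Node type is restated so the snippet compiles on its own):

```go
// Node and NodeType restated from above so this compiles standalone.
type NodeType int

const (
	ElementNode NodeType = iota
	TextNode
)

type Node struct {
	Type     NodeType
	TagName  string
	Attr     map[string]string
	Text     string
	Children []*Node
}

// NewElement creates an ElementNode with no children yet.
func NewElement(tagName string, attrs map[string]string) *Node {
	return &Node{Type: ElementNode, TagName: tagName, Attr: attrs}
}

// NewText creates a TextNode wrapping raw character data.
func NewText(text string) *Node {
	return &Node{Type: TextNode, Text: text}
}

// AddChildren appends child nodes and returns the receiver,
// so it can be chained directly after NewElement.
func (n *Node) AddChildren(children []*Node) *Node {
	n.Children = append(n.Children, children...)
	return n
}
```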
The Parser State
The parser has one job: consume the input string from left to right, one character at a time, never going back.
type Parser struct {
	input string // the full raw HTML string
	pos   int    // our current position — advances only forward
}
pos is the cursor. It starts at 0 and marches toward len(input). Every helper function moves it forward. The parser never backtracks.
Three low-level helper methods form the foundation of everything:
// Look at the current character without consuming it
func (p *Parser) nextChar() byte {
	return p.input[p.pos]
}

// Consume and return the current character, advance pos
func (p *Parser) consumeChar() byte {
	ch := p.input[p.pos]
	p.pos++
	return ch
}

// Consume characters while a condition holds
func (p *Parser) consumeWhile(test func(byte) bool) string {
	var result strings.Builder
	for p.pos < len(p.input) && test(p.nextChar()) {
		result.WriteByte(p.consumeChar())
	}
	return result.String()
}
With these three primitives, every higher-level parsing function is just a pattern of "look, decide, consume, repeat".
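Two more helpers appear in the code below without being defined: consumeWhitespace and consumeIdentifier. Both are thin specialisations of consumeWhile. A plausible sketch (the repository's versions may differ; the Parser primitives are restated compactly so the snippet runs on its own):

```go
import "strings"

// Parser and its primitives, restated compactly from above.
type Parser struct {
	input string
	pos   int
}

func (p *Parser) nextChar() byte    { return p.input[p.pos] }
func (p *Parser) consumeChar() byte { ch := p.input[p.pos]; p.pos++; return ch }

func (p *Parser) consumeWhile(test func(byte) bool) string {
	var b strings.Builder
	for p.pos < len(p.input) && test(p.nextChar()) {
		b.WriteByte(p.consumeChar())
	}
	return b.String()
}

// consumeWhitespace skips spaces, tabs, and newlines — a
// specialisation of consumeWhile that discards its result.
func (p *Parser) consumeWhitespace() {
	p.consumeWhile(func(ch byte) bool {
		return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r'
	})
}

// consumeIdentifier reads a run of letters and digits — enough for
// tag names ("h1", "div") and attribute names ("class", "href").
func (p *Parser) consumeIdentifier() string {
	return p.consumeWhile(func(ch byte) bool {
		return ch >= 'a' && ch <= 'z' ||
			ch >= 'A' && ch <= 'Z' ||
			ch >= '0' && ch <= '9'
	})
}
```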
Parsing a Text Node
The simplest case: we're not looking at a <, so everything until the next < is raw text.
func (p *Parser) parseText() *dom.Node {
	text := p.consumeWhile(func(ch byte) bool {
		return ch != '<'
	})
	return dom.NewText(text)
}
Input: "Hello, world!</p>"
        ^pos

consumeWhile(ch != '<'):
  'H' → consumed
  'e' → consumed
  'l' → consumed
  ...
  '!' → consumed
  '<' → STOP

Output: TextNode{Text: "Hello, world!"}
Remaining input: "</p>"
Simple. Greedy. Stop at the first <.
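The trace above can be reproduced directly. A runnable sketch — Parser and its primitives restated from the previous section, and parseText simplified to return the raw string rather than a dom node:

```go
import "strings"

// Restated compactly so this snippet runs standalone.
type Parser struct {
	input string
	pos   int
}

func (p *Parser) nextChar() byte    { return p.input[p.pos] }
func (p *Parser) consumeChar() byte { ch := p.input[p.pos]; p.pos++; return ch }

func (p *Parser) consumeWhile(test func(byte) bool) string {
	var b strings.Builder
	for p.pos < len(p.input) && test(p.nextChar()) {
		b.WriteByte(p.consumeChar())
	}
	return b.String()
}

// parseText without the dom dependency: greedily read everything
// up to (but not including) the next '<'.
func (p *Parser) parseText() string {
	return p.consumeWhile(func(ch byte) bool { return ch != '<' })
}
```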
Parsing Attributes
Attributes are the key-value pairs inside an opening tag: class="title" or href="/page".
func (p *Parser) parseAttribute() (string, string) {
	name := p.consumeIdentifier() // "class"
	p.consumeChar()              // skip '='
	p.consumeChar()              // skip opening '"'
	value := p.consumeWhile(func(ch byte) bool {
		return ch != '"' // read until closing '"'
	})
	p.consumeChar() // skip closing '"'
	return name, value
}

func (p *Parser) parseAttributes() map[string]string {
	attrs := make(map[string]string)
	for {
		p.consumeWhitespace() // skip whitespace BEFORE checking for '>'
		if p.nextChar() == '>' {
			break
		}
		name, value := p.parseAttribute()
		attrs[name] = value
	}
	return attrs
}
```

Note the order inside the loop: whitespace is consumed first, then the cursor is checked for `>`. Checking first would call parseAttribute on stray trailing whitespace like `<h1 >`.
Step through class="title" href="/page">:
pos → 'c' ... consumeIdentifier() → "class"
pos → '=' ... consumeChar() skips it
pos → '"' ... consumeChar() skips it
pos → 't' ... consumeWhile(≠'"') → "title"
pos → '"' ... consumeChar() skips it
attrs = {"class": "title"}
consumeWhitespace() skips ' '
pos → 'h' ... consumeIdentifier() → "href"
pos → '=' ... skip
pos → '"' ... skip
pos → '/' ... consumeWhile(≠'"') → "/page"
pos → '"' ... skip
attrs = {"class": "title", "href": "/page"}
pos → '>' — STOP
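That step-through can be executed as-is. A self-contained version (Parser, its primitives, and the attribute parsers restated compactly; behaviour matches the trace above):

```go
import "strings"

// Restated compactly so the attribute step-through runs standalone.
type Parser struct {
	input string
	pos   int
}

func (p *Parser) nextChar() byte    { return p.input[p.pos] }
func (p *Parser) consumeChar() byte { ch := p.input[p.pos]; p.pos++; return ch }

func (p *Parser) consumeWhile(test func(byte) bool) string {
	var b strings.Builder
	for p.pos < len(p.input) && test(p.nextChar()) {
		b.WriteByte(p.consumeChar())
	}
	return b.String()
}

func (p *Parser) consumeWhitespace() {
	p.consumeWhile(func(ch byte) bool { return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r' })
}

func (p *Parser) consumeIdentifier() string {
	return p.consumeWhile(func(ch byte) bool {
		return ch >= 'a' && ch <= 'z' || ch >= 'A' && ch <= 'Z' || ch >= '0' && ch <= '9'
	})
}

// parseAttribute reads one name="value" pair.
func (p *Parser) parseAttribute() (string, string) {
	name := p.consumeIdentifier()
	p.consumeChar() // '='
	p.consumeChar() // opening '"'
	value := p.consumeWhile(func(ch byte) bool { return ch != '"' })
	p.consumeChar() // closing '"'
	return name, value
}

// parseAttributes loops until the cursor rests on '>'.
func (p *Parser) parseAttributes() map[string]string {
	attrs := make(map[string]string)
	for {
		p.consumeWhitespace()
		if p.nextChar() == '>' {
			break
		}
		name, value := p.parseAttribute()
		attrs[name] = value
	}
	return attrs
}
```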
Parsing an Element — The Core Function
Now the full element parser. This is the heart of the recursive descent:
func (p *Parser) parseElement() *dom.Node {
	// 1. Consume '<'
	p.consumeChar()

	// 2. Read the tag name: "h1", "div", "a", etc.
	tagName := p.consumeIdentifier()

	// 3. Parse all attributes until we hit '>'
	attrs := p.parseAttributes()

	// 4. Consume '>'
	p.consumeChar()

	// 5. ← RECURSION: parse all child nodes
	children := p.parseNodes()

	// 6. Consume the closing tag: '</', tagName, '>'
	p.consumeChar()       // '<'
	p.consumeChar()       // '/'
	p.consumeIdentifier() // tagName (not validated — consumed and discarded)
	p.consumeChar()       // '>'

	// 7. Wrap everything into a Node and return
	return dom.NewElement(tagName, attrs).AddChildren(children)
}
Step 5 is everything. When parseElement encounters children, it calls parseNodes() — which calls parseElement() again for any child elements. The call stack mirrors the nesting of the HTML. This is what makes it recursive descent.
The Top-Level Loop: parseNodes
parseNodes is the orchestrator. It loops and dispatches to either parseElement or parseText based on a single character lookahead:
func (p *Parser) parseNodes() []*dom.Node {
	var nodes []*dom.Node
	for {
		p.consumeWhitespace()

		// Stop if we've reached EOF or a closing tag
		if p.pos >= len(p.input) || strings.HasPrefix(p.input[p.pos:], "</") {
			break
		}

		// Dispatch: '<' means element, anything else means text
		if p.nextChar() == '<' {
			nodes = append(nodes, p.parseElement())
		} else {
			nodes = append(nodes, p.parseText())
		}
	}
	return nodes
}
The decision at each step is made with one character of lookahead. That's it. No token buffer. No lookahead table. Just: is the next character a < or not?
The Call Stack is the Tree
This is the elegant part. When you trace through parsing <div><p>Hello</p></div>, the Go call stack at the deepest point looks exactly like the DOM tree being built:
%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '14px'}}}%%
sequenceDiagram
participant Top as Parse()
participant N1 as parseNodes L1
participant E1 as parseElement div
participant N2 as parseNodes L2
participant E2 as parseElement p
participant N3 as parseNodes L3
participant T as parseText()
Top->>N1: start
N1->>E1: sees div
E1->>N2: recurse children
N2->>E2: sees p
E2->>N3: recurse children
N3->>T: char != '<'
T-->>N3: TextNode Hello
N3-->>E2: [TextNode]
E2-->>N2: ElementNode p
N2-->>E1: [ElementNode p]
E1-->>N1: ElementNode div
N1-->>Top: [ElementNode div]
The recursion unwinds exactly as the closing tags are consumed. Each parseElement call stays on the stack until its matching </tag> is found. The tree builds itself from the inside out.
Full Parse Flow: State Machine View
%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '14px'}}}%%
stateDiagram-v2
[*] --> ParseNodes : Parse() called
ParseNodes --> ConsumeWS : loop start
ConsumeWS --> CheckExit
CheckExit --> Done : EOF or closing tag
CheckExit --> ParseElement : nextChar == '<'
CheckExit --> ParseText : nextChar != '<'
ParseElement --> ReadTag : skip '<'
ReadTag --> ParseAttrs : consumeIdentifier()
ParseAttrs --> ParseAttr : more attrs
ParseAttr --> ParseAttrs : loop
ParseAttrs --> SkipBracket : char == '>'
SkipBracket --> ParseNodes : recurse children
ParseNodes --> ClosingTag : closing tag found
ClosingTag --> ParseNodes : return node
ParseText --> ParseNodes : text node returned
Done --> [*] : return nodes
A Complete Worked Example
Let's trace <h1 class="title">Hello</h1> from start to finish:
Input: <h1 class="title">Hello</h1>
       ^pos=0

parseNodes() called
  consumeWhitespace() — nothing to skip
  nextChar() == '<' → call parseElement()

  parseElement():
    consumeChar() → '<'              pos=1
    consumeIdentifier() → "h1"       pos=3

    parseAttributes():
      consumeWhitespace() — skips ' '    pos=4
      nextChar() != '>' → parseAttribute()
        consumeIdentifier() → "class"    pos=9
        consumeChar() → '='              pos=10
        consumeChar() → '"'              pos=11
        consumeWhile(≠'"') → "title"     pos=16
        consumeChar() → '"'              pos=17
        return ("class", "title")
      consumeWhitespace() — nothing
      nextChar() == '>' → STOP
      return {"class": "title"}

    consumeChar() → '>'              pos=18

    parseNodes() ← RECURSE for h1's children
      nextChar() == 'H' (not '<') → parseText()
        consumeWhile(≠'<') → "Hello"     pos=23
        return TextNode{"Hello"}
      nextChar() == '<'
      input[pos:] starts with "</" → BREAK
      return [TextNode{"Hello"}]

    consumeChar() → '<'              pos=24
    consumeChar() → '/'              pos=25
    consumeIdentifier() → "h1"       pos=27
    consumeChar() → '>'              pos=28

    return ElementNode{
      TagName:  "h1",
      Attr:     {"class": "title"},
      Children: [TextNode{"Hello"}]
    }

parseNodes() returns [ElementNode{h1}]
Done. 28 characters processed. A complete subtree returned.
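The whole trace can also be verified mechanically. Here is the article's parser assembled into one self-contained snippet — types inlined, the dom constructors replaced with plain struct literals, and the undefined helpers (consumeWhitespace, consumeIdentifier) filled in as sketched earlier — so the h1 example parses end-to-end:

```go
import "strings"

// The article's parser, assembled standalone. Logic is unchanged;
// only the dom package dependency has been inlined.
type NodeType int

const (
	ElementNode NodeType = iota
	TextNode
)

type Node struct {
	Type     NodeType
	TagName  string
	Attr     map[string]string
	Text     string
	Children []*Node
}

type Parser struct {
	input string
	pos   int
}

func (p *Parser) nextChar() byte    { return p.input[p.pos] }
func (p *Parser) consumeChar() byte { ch := p.input[p.pos]; p.pos++; return ch }

func (p *Parser) consumeWhile(test func(byte) bool) string {
	var b strings.Builder
	for p.pos < len(p.input) && test(p.nextChar()) {
		b.WriteByte(p.consumeChar())
	}
	return b.String()
}

func (p *Parser) consumeWhitespace() {
	p.consumeWhile(func(ch byte) bool { return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r' })
}

func (p *Parser) consumeIdentifier() string {
	return p.consumeWhile(func(ch byte) bool {
		return ch >= 'a' && ch <= 'z' || ch >= 'A' && ch <= 'Z' || ch >= '0' && ch <= '9'
	})
}

func (p *Parser) parseText() *Node {
	return &Node{Type: TextNode, Text: p.consumeWhile(func(ch byte) bool { return ch != '<' })}
}

func (p *Parser) parseAttribute() (string, string) {
	name := p.consumeIdentifier()
	p.consumeChar() // '='
	p.consumeChar() // opening '"'
	value := p.consumeWhile(func(ch byte) bool { return ch != '"' })
	p.consumeChar() // closing '"'
	return name, value
}

func (p *Parser) parseAttributes() map[string]string {
	attrs := make(map[string]string)
	for {
		p.consumeWhitespace()
		if p.nextChar() == '>' {
			break
		}
		name, value := p.parseAttribute()
		attrs[name] = value
	}
	return attrs
}

func (p *Parser) parseElement() *Node {
	p.consumeChar() // '<'
	tag := p.consumeIdentifier()
	attrs := p.parseAttributes()
	p.consumeChar() // '>'
	children := p.parseNodes()
	p.consumeChar()       // '<'
	p.consumeChar()       // '/'
	p.consumeIdentifier() // closing tag name, discarded
	p.consumeChar()       // '>'
	return &Node{Type: ElementNode, TagName: tag, Attr: attrs, Children: children}
}

func (p *Parser) parseNodes() []*Node {
	var nodes []*Node
	for {
		p.consumeWhitespace()
		if p.pos >= len(p.input) || strings.HasPrefix(p.input[p.pos:], "</") {
			break
		}
		if p.nextChar() == '<' {
			nodes = append(nodes, p.parseElement())
		} else {
			nodes = append(nodes, p.parseText())
		}
	}
	return nodes
}

// Parse is the top-level entry point shown in the diagrams.
func Parse(input string) []*Node {
	return (&Parser{input: input}).parseNodes()
}
```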
What This Parser Intentionally Omits
Real HTML parsers (like the one in Chrome's Blink engine) are hundreds of thousands of lines of C++, handling every edge case in the HTML5 specification. This one is intentionally minimal:
| Feature | Status | Reason Omitted |
|---|---|---|
| Self-closing tags (<br/>, <img/>) | ❌ | Would need special-case void element list |
| <!DOCTYPE html> | ❌ Ignored | Not meaningful for rendering |
| HTML comments (<!-- -->) | ❌ | Would corrupt the parse on real pages |
| Unquoted attribute values | ❌ | Added complexity for rare cases |
| Error recovery | ❌ | Real browsers have massive recovery logic |
| Malformed / unclosed tags | ❌ Undefined | No error recovery |
These omissions are not bugs — they're scope decisions. A browser engine for learning doesn't need to handle <br> the same way Chrome does. It needs to demonstrate the architecture clearly.
The right parser for learning is the simplest one that makes the next stage possible.
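For the first row of the table, the "special-case void element list" is small enough to sketch. This is a hypothetical extension, not code from the repository — just the HTML5 void elements and the check parseElement would need in order to skip the child-recursion and closing-tag steps:

```go
import "strings"

// The HTML5 void elements: tags that never take a closing tag.
// (Hypothetical extension for illustration, not the repo's code.)
var voidElements = map[string]bool{
	"area": true, "base": true, "br": true, "col": true,
	"embed": true, "hr": true, "img": true, "input": true,
	"link": true, "meta": true, "source": true, "track": true,
	"wbr": true,
}

// isVoidElement reports whether a tag should be closed immediately
// after its attributes, with no children and no closing tag.
func isVoidElement(tag string) bool {
	return voidElements[strings.ToLower(tag)]
}
```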
Why Recursive Descent?
There are many parsing strategies. Why recursive descent?
Alternatives considered:
| Strategy | Tradeoff |
|---|---|
| Regex | Fast to write, impossible to maintain, cannot handle nesting |
| State machine (iterative) | Explicit stack management, more complex code |
| Parser generator (yacc, ANTLR) | Requires external tooling, hides the mechanism |
| Recursive descent | Code structure mirrors grammar, immediately readable |
With recursive descent, parseElement calls parseNodes calls parseElement. The grammar rule "elements contain nodes which contain elements" is directly expressed as "the function parseElement calls the function parseNodes which calls the function parseElement". There is no gap between the grammar definition and the implementation.
This is the most important property for a learning project: the code IS the explanation.
The Output Feeds Everything Downstream
Once the parser returns its root *dom.Node, every other engine stage consumes it:
*dom.Node (parser output)
    │
    ├──▶ javascript runtime — walks tree to find <script> tags
    │
    ├──▶ style engine — walks tree to match CSS rules to each node
    │
    ├──▶ layout engine — walks tree to compute X, Y, Width, Height
    │
    └──▶ paint / renderer — walks tree to draw text and rectangles on screen
The DOM is the contract between the parser and everything else. Build it wrong, and every downstream stage fails. Build it right, and you have the foundation of a browser.
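To make those "walks tree" arrows concrete, here is a sketch of the recursive walk every downstream stage performs — in this case collecting elements by tag name, the way a JavaScript runtime would gather <script> elements. findByTag is a hypothetical helper, not the repository's API, and the Node type is restated so the snippet compiles on its own:

```go
// Minimal Node restated so this walk compiles standalone.
type NodeType int

const (
	ElementNode NodeType = iota
	TextNode
)

type Node struct {
	Type     NodeType
	TagName  string
	Attr     map[string]string
	Text     string
	Children []*Node
}

// findByTag recursively collects every element with the given tag,
// in document order — the same shape of traversal the style,
// layout, and paint stages use.
func findByTag(n *Node, tag string) []*Node {
	var out []*Node
	if n.Type == ElementNode && n.TagName == tag {
		out = append(out, n)
	}
	for _, c := range n.Children {
		out = append(out, findByTag(c, tag)...)
	}
	return out
}
```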
Key Takeaways
- One character of lookahead is all you need to decide between element and text.
- The call stack IS the parse tree — recursion naturally handles nesting.
- consumeWhile(predicate) is the most powerful primitive — nearly every parsing operation is a specialisation of it.
- Scope your parser to your grammar. A minimal but correct parser is better than a complex one with hidden bugs.
- Recursive descent maps the grammar directly to code — making the implementation self-documenting.
Source Code
The full parser lives in internal/parser/html.go in the Go Browser Engine repository.
git clone https://github.com/jyotishmoy12/go-browser.git
cd go-browser
go test ./internal/parser/...