Visualizing the Trie: insertion and prefix search
A companion to Shipping sm2-flashcards. The Trie in app/trie.py is the project’s headline data structure, but how it actually evolves in memory during inserts — and how search_prefix walks it — isn’t obvious from the code alone.
These diagrams trace the exact sequence of _TrieNode allocations and child-pointer updates as three tags get inserted, then visualize the two-phase prefix search.
Step 1: Initialize the empty Trie
Before any data is inserted, the Trie consists of only a single, blank root node. It has no children, no associated card IDs, and is not a terminal node.
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#e1f5fe', 'edgeLabelBackground':'#ffffff', 'tertiaryColor': '#fff'}}}%%
graph TD
Root(("Root<br/>(is_terminal: F)"))
classDef terminal fill:#81d4fa,stroke:#01579b,stroke-width:2px;
classDef normal fill:#e1f5fe,stroke:#0277bd,stroke-width:1px;
class Root normal;
Step 2: Insert “py” with Card 10
When insert("py", 10) is called, the Trie creates a new path.
- It creates a node for
'p'. - It creates a node for
'y', marks it terminal, and adds Card 10.
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#e1f5fe', 'edgeLabelBackground':'#ffffff', 'tertiaryColor': '#fff'}}}%%
graph TD
Root(("Root<br/>(is_terminal: F)"))
P(("p<br/>(is_terminal: F)"))
Y(("y<br/>(is_terminal: T)<br/>[Cards: 10]"))
classDef terminal fill:#81d4fa,stroke:#01579b,stroke-width:2px;
classDef normal fill:#e1f5fe,stroke:#0277bd,stroke-width:1px;
class Y terminal;
class Root,P normal;
Root -- "children['p']" --> P
P -- "children['y']" --> Y
Step 3: Insert “pax” with Card 20
Next, insert("pax", 20). The Trie follows the existing 'p' branch and splits to a new 'a' branch.
- Follows Root →
p. - Creates a new branch for
'a'. - Creates a node for
'x', marks it terminal, and adds Card 20.
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#e1f5fe', 'edgeLabelBackground':'#ffffff', 'tertiaryColor': '#fff'}}}%%
graph TD
Root(("Root<br/>(is_terminal: F)"))
P(("p<br/>(is_terminal: F)"))
Y(("y<br/>(is_terminal: T)<br/>[Cards: 10]"))
A(("a<br/>(is_terminal: F)"))
X(("x<br/>(is_terminal: T)<br/>[Cards: 20]"))
classDef terminal fill:#81d4fa,stroke:#01579b,stroke-width:2px;
classDef normal fill:#e1f5fe,stroke:#0277bd,stroke-width:1px;
class Y,X terminal;
class Root,P,A normal;
Root -- "children['p']" --> P
P -- "children['y']" --> Y
P -- "children['a']" --> A
A -- "children['x']" --> X
Step 4: Insert “python” with Card 10
Finally, insert("python", 10). This tag reuses the entire existing path of "py".
- Follows Root →
p→y. - Continues the path, creating new nodes for
't','h','o','n'. - Marks
'n'terminal and adds Card 10.
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#e1f5fe', 'edgeLabelBackground':'#ffffff', 'tertiaryColor': '#fff'}}}%%
graph TD
Root(("Root<br/>(is_terminal: F)"))
P(("p<br/>(is_terminal: F)"))
A(("a<br/>(is_terminal: F)"))
X(("x<br/>(is_terminal: T)<br/>[Cards: 20]"))
Y(("y<br/>(is_terminal: T)<br/>[Cards: 10]"))
T(("t<br/>(is_terminal: F)"))
H(("h<br/>(is_terminal: F)"))
O(("o<br/>(is_terminal: F)"))
N(("n<br/>(is_terminal: T)<br/>[Cards: 10]"))
classDef terminal fill:#81d4fa,stroke:#01579b,stroke-width:2px;
classDef normal fill:#e1f5fe,stroke:#0277bd,stroke-width:1px;
class Y,X,N terminal;
class Root,P,A,T,H,O normal;
Root -- "children['p']" --> P
P -- "children['a']" --> A
A -- "children['x']" --> X
P -- "children['y']" --> Y
Y -- "children['t']" --> T
T -- "children['h']" --> H
H -- "children['o']" --> O
O -- "children['n']" --> N
Step 5: Visualizing search_prefix("py")
This is how autocomplete executes on the final structure above.
The search happens in two phases, highlighted in different colors:
- Prefix lookup (
_find) — travels down the orange arrows to find the node representing"py". The'a'branch (and the entire"pax"subtree) is never touched. - Collection walk (
_walk) — starting from theynode, DFS collects every terminal node in that subtree (the blue path). Returns both"py"and"python".
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#e1f5fe', 'edgeLabelBackground':'#ffffff', 'tertiaryColor': '#fff'}}}%%
graph TD
Root(("Root"))
P(("p"))
A(("a<br/>(Ignored)"))
X(("x<br/>(Ignored)"))
Y(("y<br/>[MATCH #1: py]"))
T(("t"))
H(("h"))
O(("o"))
N(("n<br/>[MATCH #2: python]"))
Root -- "Lookup 'p'" --> P
P -- "Lookup 'y'" --> Y
P -. "Ignored Branch" .-> A
A --> X
Y -. "DFS Walk" .-> T
T --> H
H --> O
O --> N
linkStyle 0 stroke:#ff9100,stroke-width:4px
linkStyle 1 stroke:#ff9100,stroke-width:4px
linkStyle 4 stroke:#2979ff,stroke-width:3px,stroke-dasharray: 5 5
linkStyle 5 stroke:#2979ff,stroke-width:3px,stroke-dasharray: 5 5
linkStyle 6 stroke:#2979ff,stroke-width:3px,stroke-dasharray: 5 5
linkStyle 7 stroke:#2979ff,stroke-width:3px,stroke-dasharray: 5 5
classDef searchPath fill:#ffe0b2,stroke:#ff9100,stroke-width:3px
classDef matchNode fill:#b3e5fc,stroke:#01579b,stroke-width:4px
classDef normal fill:#e1f5fe,stroke:#0277bd,stroke-width:1px
class Root,A,X,T,H,O normal
class P searchPath
class Y,N matchNode
Two takeaways
- Shared prefixes share memory. Inserting
"python"reuses the entirep → ypath from"py". That’s why the trie isO(total characters across all keys), notO(num_keys × avg_key_length). Insert a thousand variants of"graph-*"and you only pay once for the"graph-"prefix. - Search is two phases, not one. The prefix lookup is a tight
O(k)walk down the trie. The collection phase isO(r)whereris the number of matching nodes underneath. Branches that don’t share the prefix are pruned entirely from the work — you never even look at them. That’s the asymptotic difference vs. a naivefor word in vocab: if word.startswith(prefix)scan.
This is what tests/bench_trie.py measures in the project — Trie prefix search vs. the naive scan on a 5,000-tag corpus. The Trie wins by an order of magnitude on this size and the gap widens as the corpus grows.