TL;DR
grep returns text, not structure. An AI agent burns hundreds of tokens disambiguating definitions, imports, and false positives. Cartog parses code with tree-sitter, builds a symbol graph in SQLite, and answers queries ("where is X defined?", "who calls Y?") in microseconds, without reloading the source.
Let's see how an agent searches for text and how that relates to context.
An AI agent exploring a codebase uses grep and cat.
When it searches for UserService, it gets output like this:
src/auth.py:3: from services import UserService
src/auth.py:45: # UserService handles authentication
src/tests/test_auth.py:12: mock_user_service = UserService()
src/logs/config.py:8: logger.info("UserService initialized")
Out of those 4 results, only one is the actual definition of UserService.
The agent doesn't know which, so it has to read every file to figure out what's relevant.
Why are we talking about tokens?
An AI agent doesn't "read" a file the way a human does. Everything that flows between it and the code (grep output, cat output) is split into tokens and billed to the model. Those tokens fit into a limited context window. The more an agent spends exploring, the less it has left for reasoning.
Past a certain volume, accuracy degrades (the context rot phenomenon).
Let's count using the example above.
The agent runs grep UserService and gets back 4 lines (~80 tokens).
It doesn't know which is right, so it runs cat src/auth.py (~600 tokens),
then cat src/tests/test_auth.py (~400 tokens),
then cat src/logs/config.py (~300 tokens) to rule out the false positives.
Bottom line:
~1,400 tokens to answer "where is UserService defined?", when a human would have just scanned for the class or def prefix in the first output.
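The accounting above can be sketched in a few lines (the per-command counts are the illustrative estimates from this walkthrough, not measured values):

```python
# Token cost of one "where is UserService defined?" exploration.
# Counts are illustrative estimates, not measurements.
costs = {
    "grep UserService": 80,             # 4 matching lines
    "cat src/auth.py": 600,             # the actual definition
    "cat src/tests/test_auth.py": 400,  # false positive: a mock
    "cat src/logs/config.py": 300,      # false positive: a log line
}
total = sum(costs.values())
print(f"~{total} tokens for a single question")
```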
Worse:
This cost repeats for every question: "Who calls authenticate?", "What does UserService inherit from?", "What does verify_token do?"
Each question kicks off another grep → cat → cat → cat cycle. The context window fills up with redundant fragments before the agent has even started coding.
The fundamental problem: grep searches for strings, not concepts. It can't distinguish a class definition from an import, a comment, or a log line. The agent has to mentally reconstruct the code's structure from text fragments, and every fragment costs tokens.
Why is this a problem now? Agents have been using grep for years.
Because the cost has changed. When a human reads 4 files, they filter mentally in a few seconds. When an agent reads 4 files, it consumes context, and that context is limited, slow, and expensive. What used to be friction is now a bottleneck.
But then, how can a machine understand the structure of code without reading it like a human?
Tree-sitter is an incremental parser that produces a concrete syntax tree (CST) for any language.
Where a regex sees class UserService:, tree-sitter sees:
class_definition
name: identifier "UserService"
superclasses: argument_list
identifier "BaseService"
body: block
function_definition
name: identifier "authenticate"
parameters: ...
function_definition
name: identifier "refresh_token"
parameters: ...
The tree encodes structure:
UserService is a class, it inherits from BaseService, it contains two methods.
This isn't text anymore; it's structural knowledge.
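Tree-sitter itself is a library with bindings for many languages; for a dependency-free illustration of the same idea, Python's stdlib ast module extracts the identical structural facts from that snippet (a sketch of the concept, not cartog's actual extractor):

```python
import ast

source = """
class UserService(BaseService):
    def authenticate(self, token):
        return verify_token(token)

    def refresh_token(self, token):
        pass
"""

# The parser sees a class node, not a line of text.
cls = ast.parse(source).body[0]
name = cls.name                            # "UserService"
bases = [b.id for b in cls.bases]          # ["BaseService"]
methods = [n.name for n in cls.body
           if isinstance(n, ast.FunctionDef)]
print(name, bases, methods)
```

A regex matching `class UserService:` can only say the string exists; the tree says what it is, what it extends, and what it contains.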
Cartog uses tree-sitter with grammars for 8 languages today: Python, TypeScript, JavaScript, Rust, Go, Ruby, Java, and Markdown (Markdown documents are indexed as symbols without edges, useful for semantic search).
Each grammar produces a language-specific CST, and a dedicated extractor turns that CST into normalized symbols and relationships.
Tree-sitter gives one tree per file.
To connect those trees together (e.g. a call that crosses three modules, or a class that inherits from another in a different package), you need to name things and name the links between them.
From the CST, cartog extracts two types of entities:
- A symbol is a named element: a function, a class, an import, a type, an enum.
- An edge is a relationship between symbols:
calls when a function calls another, inherits when a class extends another, raises when a function throws an exception, imports when a module references another.
Each symbol gets a stable identifier of the form file_path:kind:qualified_name:
src/auth.py:class:UserService
src/auth.py:method:UserService.authenticate
src/auth.py:function:verify_token
This format survives line moves within a file; only a rename or a structural change invalidates the ID.
That's what allows the graph to be compared between two indexings without losing existing references.
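The ID scheme is easy to sketch (again using Python's stdlib ast in place of tree-sitter; the file_path:kind:qualified_name format is cartog's, the extraction code here is a hypothetical illustration):

```python
import ast

def symbol_ids(path: str, source: str) -> list[str]:
    """Emit stable IDs of the form file_path:kind:qualified_name."""
    ids = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef):
            ids.append(f"{path}:class:{node.name}")
            for m in node.body:
                if isinstance(m, ast.FunctionDef):
                    ids.append(f"{path}:method:{node.name}.{m.name}")
        elif isinstance(node, ast.FunctionDef):
            ids.append(f"{path}:function:{node.name}")
    return ids

src = """
class UserService:
    def authenticate(self): pass

def verify_token(): pass
"""
ids = symbol_ids("src/auth.py", src)
# The IDs carry no line numbers, so moving code around
# inside the file leaves them unchanged.
```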
In total, cartog recognizes about a dozen symbol kinds and as many edge kinds. The full list (symbol kinds, edge kinds, language coverage) lives in the cartog GitHub repo and evolves as new grammars are added.
With a graph of symbols and resolved edges, structural queries become possible.
Hereâs the kind of navigation the graph enables:
graph LR
AuthController -->|calls| authenticate["UserService.authenticate"]
authenticate -->|calls| verify_token
authenticate -->|calls| refresh_token
Admin -->|inherits| UserService
verify_token -->|raises| TokenExpiredError
A few concrete examples:
cartog refs UserService → Who uses UserService?
Returns every symbol with an edge pointing to UserService, grouped by kind (calls, imports, inherits).
cartog callees UserService.authenticate → What does this method call?
Follows outgoing calls edges to list direct dependencies.
cartog impact verify_token --depth 3 → What breaks if I change this function?
Walks back up the caller graph 3 levels deep. The grep equivalent would mean manually reading every caller file, then every file calling those callers, and so on.
cartog hierarchy BaseService → What's the inheritance tree?
Displays the full hierarchy: parent and child classes.
cartog outline src/auth.py → File structure
Lists every symbol in a file with its hierarchy, without having to read the file itself.
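Since the graph lives in SQLite, a transitive query like impact --depth 3 plausibly maps onto a recursive CTE over the edges table. A self-contained sketch with the stdlib sqlite3 module (table and column names are illustrative, not cartog's actual schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE edges(src TEXT, dst TEXT, kind TEXT);
CREATE INDEX edges_by_dst ON edges(dst, kind);
""")
con.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
    ("UserService.authenticate", "verify_token",             "calls"),
    ("UserService.authenticate", "refresh_token",            "calls"),
    ("AuthController.login",     "UserService.authenticate", "calls"),
    ("Admin",                    "UserService",              "inherits"),
])

# impact verify_token --depth 3: walk caller edges up to 3 levels.
impacted = con.execute("""
WITH RECURSIVE impact(sym, depth) AS (
    SELECT src, 1 FROM edges WHERE dst = ? AND kind = 'calls'
    UNION
    SELECT e.src, i.depth + 1
    FROM edges e JOIN impact i ON e.dst = i.sym
    WHERE e.kind = 'calls' AND i.depth < 3
)
SELECT sym, MIN(depth) FROM impact GROUP BY sym ORDER BY 2
""", ("verify_token",)).fetchall()
print(impacted)
```

Because the walk is a handful of index lookups over a precomputed table, it stays in the millisecond range even at depth 3, where the grep equivalent would mean re-reading files at every level.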
All this sounds great on paper. But does it actually change anything for an agent in practice?
To quantify these gains, a benchmark compares cartog with grep/cat across 13 typical AI-agent scenarios (definition lookup, caller tracing, impact analysis) on 5 languages, including a Python fixture of 69 files / 4k lines indexed in 95ms:
| Metric | grep/cat | cartog |
|---|---|---|
| Tokens per query | ~1,700 | ~280 |
| Recall | 78% | 97% |
| Latency: read (outline, refs) | variable | 8-14 µs |
| Latency: transitive analysis (impact depth-3) | N/A | 2.7-17 ms |
The biggest gains: tracing call chains (88% token reduction) and finding callers (95% reduction).
The agent consumes less context, gets more relevant results, and gets them in microseconds. It can ask structural questions about the code instead of reading it line by line.
Cartog leans on a deliberately minimal stack, picked so it can ship as a single binary running on the developerâs machine:
- Rust: a single binary, ~5 MB, cross-compiled for Linux, macOS (x86 + ARM) and Windows. No runtime to install, no JVM, no Node.
- tree-sitter: incremental, multi-language, structural parsing. Currently 8 embedded grammars (Python, TypeScript, JavaScript, Rust, Go, Ruby, Java, Markdown).
- SQLite (with rusqlite bundled): the entire graph in a single .cartog.db file (~1 MB for an average project). WAL mode allows concurrent reads, useful when the watcher and the MCP server run in parallel.
- sqlite-vec + FTS5: vector search (KNN) and full-text search (BM25) integrated into SQLite, no external server. Detailed in article 2.
- ONNX Runtime (via fastembed): embedding inference locally, on CPU. No external API; the code never leaves the machine.
- rmcp: MCP server over stdio to expose the graph to agents.
The strengths emerging from these choices:
- Zero infrastructure: one binary, one database file. No Postgres to provision, no Neo4j to administer.
- 100% local: no data sent to any external service. Compatible with proprietary code, NDAs, air-gapped environments.
- Microsecond latency: the graph is precomputed, queries are indexed SELECTs.
- Cross-platform: a single cargo install cartog covers the major OSes.
The storage model is extensible: an internal spec already describes an S3 mode (graph shared across CI, team, machines), with local SQLite as cache. The format stays the same; only the distribution layer changes.
Curious about the exact schema (tables, columns, indexes) and architectural decisions? Everything is documented in the cartog GitHub repo and the docs.rs page.
The CLI commands above are the visible face. So that an agent can use them without plumbing, cartog ships as a Claude Code plugin, which is the recommended path: one-command install, preconfigured MCP server, embedded agent skill. The plugin exposes 12 MCP tools: cartog_search, cartog_refs, cartog_callees, cartog_impact, cartog_hierarchy, cartog_outline, cartog_deps, cartog_changes, cartog_rag_index, cartog_rag_search, cartog_index, cartog_stats.
For other MCP clients (Cursor, Windsurf, Zed, etc.), cartog serve exposes the same tools over stdio.
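Most MCP clients register a stdio server through a JSON config; the exact file location and top-level key vary by client, so treat this as a hedged sketch rather than cartog's documented setup:

```json
{
  "mcpServers": {
    "cartog": {
      "command": "cartog",
      "args": ["serve"]
    }
  }
}
```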
A separately installable agent skill, via npx skills add jrollin/cartog, teaches the agent when to use which tool: which search to route to cartog_search rather than cartog_rag_search, how to chain tools for a refactor, which heuristic fallback to use when a symbol isnât found.
We have a graph of symbols. But a calls → verify_token edge doesn't say which verify_token symbol it points to.
Several may exist in the project. The graph is incomplete if we don't resolve that.
When tree-sitter sees verify_token() inside a method body, it creates an edge with target_name = "verify_token".
But pointing to which symbol exactly?
This ambiguity can be resolved with a heuristic cascade, from most specific to most general:
flowchart TD
A["verify_token: unresolved target"] --> B["Same file?"]
B -->|yes| R["✓ Resolved"]
B -->|no| C["Import path?"]
C -->|yes| R
C -->|no| D["Same directory?"]
D -->|yes| R
D -->|no| E["Unique in project?"]
E -->|yes| R
E -->|no| F["✗ Unresolved"]
A second pass then replays imports once theyâve been resolved, to recover cases where the import itself was ambiguous on the first round.
On our benchmarks, this heuristic resolves 25 to 37% of edges depending on the language. That's enough for simple projects, but clearly not enough for complex codebases full of homonyms and re-exports.
But IDEs let me rename classes or functions without breaking anything, don't they? How does that work, and why doesn't cartog do it right away?
IDEs solve this by querying a language server, a partial compiler that knows the project's exact semantics (types, scopes, re-exports), where cartog's heuristic only sees names and paths. Wiring cartog onto LSP raises resolution to 44-81%. That's the topic of a dedicated article later in the series.
What if the agent is looking for "the order validation logic" without knowing the function name? There has to be another approach…
The structural graph solves the problem of understanding code, but it requires knowing the exact name of the symbol you're looking for.
That's the topic of the next article in the series: semantic search over code.