ashlr
Blog
7 min read

Why I built a token-efficiency layer for Codex and Claude Code

Token counts are a real operating cost. ashlr is the open-source plugin that fixes the read-everything problem in Codex and Claude Code — cutting mean token usage by 57% across TypeScript, Python, and Rust reference repos.

releaseengineering

Token counts are a real operating cost. I started noticing it the same way you notice a slow memory leak: not all at once, but through accumulating evidence. The context window fills up faster than it should. Sessions hit the limit mid-refactor. The Anthropic invoice climbs past what feels proportionate to the actual work done.

The culprit is usually reads. When an AI coding agent opens a 15,000-token file to change three lines, it ships all 15,000 tokens to the model as context. Grep a large codebase for an import pattern and you might get 85,000 tokens of match output that the model skims and discards. Every native read, grep, and edit call transmits the full payload whether the model needs it or not.

That is the problem ashlr solves.


What it is

ashlr is an open-source Codex and Claude Code efficiency layer: 40 MCP tools, Codex workflow skills, 34 Claude Code slash commands, an animated Claude Code status line, and an optional hosted backend. Claude Code one-line install:

curl -fsSL https://plugin.ashlr.ai/install.sh | bash

The tools replace native read, grep, and edit workflows with compressed alternatives. ashlr__read returns a head+tail snip instead of the full file. ashlr__grep routes through a genome-aware retrieval index when one exists, and falls back to truncated ripgrep otherwise. ashlr__edit returns a diff summary instead of echoing the full before+after. Codex gets plugin packaging, MCP, skills, and nudge-first hooks; Claude Code gets auto redirects, slash commands, and the status line. The model gets enough context to do the work. The bill shrinks.


Why now

Two things converged. Codex and Claude Code both expose plugin/MCP surfaces, which means tools like this can run as standard MCP servers under compatible hosts — Codex packaging, Cursor, and Goose ports ship with the plugin today. And AI coding costs have matured from "toy" to "line item." Teams running AI coding agents across five engineers with heavy usage are spending real money. The efficiency question is no longer academic.


The moment it clicked

I was working in this repo — the ashlr-plugin codebase itself, 750 files, 149,462 lines — and ran a benchmark to see what the numbers actually looked like. Not rough estimates. Measured, reproducible numbers against real files.

Here is what the benchmark found on ashlr__read for a 15KB test file (server/tests/auth.test.ts, 10,846 bytes):

Bytes10,8461,623
Tokens2,709406
Ratio1.000.15

That is an 85% reduction on one file. For the CHANGELOG.md (41,100 bytes, 10,200 raw tokens), the ratio drops to 0.040 — the compressed view uses 406 tokens where the full file would have used 10,200. The model gets the structural information it needs; the padding disappears.

Across all file sizes in the benchmark:

ToolMean savingsp50 savings
TypeScript (vercel/ai)-62%
Python (pandas)-65%
Rust (tokio)-44%
Cross-language mean-57%

The overall -57% is the cross-language headline across TypeScript, Python, and Rust reference repos. The benchmark methodology is published in docs/benchmarks.md and docs/benchmarks-v2.json; the older self-repo large-read number is preserved in the raw benchmark JSON, but it is no longer the public headline.


What's free, what's paid

Everything in the free tier is the product. 40 MCP tools, 34 Claude Code slash commands, Codex workflow skills, the genome scribe loop, per-session token accounting, a calibration harness, a reproducible benchmark. MIT license. No account. Telemetry off by default.

Pro ($12/month or $120/year) adds hosted infrastructure for developers who need it: a cloud LLM summarizer so you do not need Ollama running locally, cross-machine stats sync, a live auto-updating badge. It does not remove or degrade anything in the free tier.

Team ($24/user/month) adds the shared CRDT genome — one authoritative retrieval index per repo, CRDT-merged across concurrent team members — plus org dashboards, policy packs, genome diffs on PRs, SSO, and audit log.

Enterprise covers on-prem deployment with private inference. Nothing about the free tier is crippled to push upgrades.


Who it's for

Individual developers today. If you use Codex or Claude Code heavily on real codebases, the free tier will measurably reduce your token consumption within the first session. Run bun run scripts/run-benchmark.ts --compare to measure your repo. In Codex sessions, inspect live savings after Ashlr tool calls with ashlr stats --json or ashlr__savings; in Claude Code, use /ashlr-benchmark or /ashlr-savings.

Teams via Pro. The shared genome becomes worth the overhead at three or more engineers working the same codebase — the retrieval index reflects everyone's edits, not just yours.

Enterprise via on-prem. If your company is sensitive about code leaving the VPC, the plugin's architecture supports private inference endpoints. The genome format is a public spec; nothing is locked to the hosted service.


Three things that surprised me building this

The session counter bug (v0.9.3). The status line was showing session +0 for many users even as lifetime totals kept climbing. The cause was subtle: Claude Code forwards CLAUDE_SESSION_ID to hooks but not to MCP server subprocesses. So the MCP server was writing savings to a PPID-hash bucket, and the status line was reading a CLAUDE_SESSION_ID bucket, and the two never intersected. The fix — candidateSessionIds() returning both and aggregating across them — sounds obvious in retrospect but required actually inspecting a live stats.json to see the orphaned bucket sitting there with 2,863 tokens that nobody was reading.

The `$` interpolation bug (v0.9.2). ashlr__multi_edit uses String.prototype.replace() for strict-mode edits. JavaScript's replace(string, string) interprets $&, $1, ` $ `, and $' in the replacement string. This silently corrupted any edit whose replacement contained a $` followed by certain characters — template literals, TypeScript generic constraints, currency strings. The fix was switching to slice + concat, which treats the replacement as a literal. The bug had been there since the tool launched; the security audit in v0.9.2 caught it.

Zombie processes (v1.0.1). Running /reload-plugins in Claude Code re-reads plugin manifests but does not kill MCP server subprocesses spawned by earlier reloads. If a terminal was opened before an upgrade, the old MCP server process keeps writing the old stats.json schema alongside the new writers. The v1 and v2 shapes are incompatible enough that the v1 writer was periodically wiping the sessions map that the v2 reader depended on. The fix was a fallback in pickSession() that surfaces the v1 singular counter when the v2 sessions map is empty. Full correctness requires restarting Claude Code, not just reloading plugins.


Open source

The full plugin — every tool, every skill, the genome format, the compression logic — is on GitHub at github.com/ashlrai/ashlr-plugin. The benchmark script is scripts/run-benchmark.ts and the seeded results are in docs/benchmarks-v2.json. The CI job at .github/workflows/ci.yml runs it on every push and opens a PR on Monday to refresh the data.

860 tests pass. 2 are skipped: a no-genome grep flake that only surfaces in the full-suite parallel environment (passes reliably in isolation) and a benchmark ratio assertion with the same root cause. Both are documented in-line.


Install

curl -fsSL https://plugin.ashlr.ai/install.sh | bash

For Codex:

git clone https://github.com/ashlrai/ashlr-plugin
cd ashlr-plugin && bun install
codex plugin marketplace add ashlrai/ashlr-plugin
codex plugin add ashlr@ashlr-marketplace
bun run scripts/cli.ts codex-doctor --json

For Claude Code:

/plugin marketplace add ashlrai/ashlr-plugin
/plugin install ashlr@ashlr-marketplace

Restart Claude Code. Run /ashlr-allow once to silence permission prompts. Run /ashlr-tour for a 60-second walkthrough on your actual codebase.

Full docs at plugin.ashlr.ai/docs. Pricing at plugin.ashlr.ai/pricing.

If the numbers look wrong on your codebase, run /ashlr-benchmark and open an issue. The methodology is public and the script is auditable.


Subscribe to updates

Get notified when we ship new releases and post engineering notes.

Subscribe on the status page
By Mason Wyatt