Hosted summarization
Server-side LLM compaction for large file reads — no local model required. How the hosted summarizer works, when it fires, and how it compares to local fallbacks.
Hosted summarization is a Pro and Team feature that runs file compaction on a server-side model rather than locally. It removes the need to install Ollama or configure a local LLM while keeping the same large-read compaction behavior.
What it does
When ashlr__read encounters a file above the LLM summarization threshold (~16 KB), it normally calls a local model endpoint (http://localhost:1234/v1 by default — LM Studio or Ollama). With a Pro token active, it instead calls https://api.ashlr.ai/llm/summarize, which runs hosted xAI Grok 4.3 inference and returns a structured summary with usage metadata.
The local plugin falls back to snipCompact truncation if the hosted call exceeds its 2-second timeout or the server is unreachable. The fallback is transparent: you always get output.
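The timeout-then-fallback behavior can be pictured with a minimal sketch. It is not the plugin's actual code: `withTimeout`, `summarizeWithFallback`, `Summarizer`, and `Fallback` are hypothetical names, and the hosted call is passed in as a plain function so the pattern stays visible.

```typescript
// Sketch only: every name here is hypothetical, not the plugin's real API.
type Summarizer = (text: string) => Promise<string>;
type Fallback = (text: string) => string;

// Reject if `p` has not settled within `ms` milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("timed out")), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

async function summarizeWithFallback(
  text: string,
  hosted: Summarizer,
  fallback: Fallback,
  timeoutMs: number = 2000, // the 2 s budget described above
): Promise<string> {
  try {
    return await withTimeout(hosted(text), timeoutMs);
  } catch {
    // Timeout or unreachable server: degrade to local truncation.
    return fallback(text);
  }
}
```

Because both failure modes land in the same `catch`, the caller receives a string either way, which matches the "you always get output" guarantee.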
When it fires
| Condition | Behavior |
|---|---|
| File < 2 KB | Returned verbatim — no summarization |
| File 2–16 KB | snipCompact (head+tail) — no LLM needed |
| File ≥ 16 KB, no Pro token | Local LLM if reachable; snipCompact fallback |
| File ≥ 16 KB, Pro token present | Hosted summarizer (2s timeout); snipCompact fallback |
| bypassSummary: true | snipCompact always; LLM bypassed |
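The table can be read as a small decision function. The sketch below is an illustration rather than the plugin's implementation: `chooseStrategy`, `ReadContext`, and `Strategy` are hypothetical names, 1 KB is assumed to be 1024 bytes, and `bypassSummary` is assumed to be checked first because the table says it applies "always".

```typescript
// Hypothetical names throughout; thresholds follow the table above,
// assuming 1 KB = 1024 bytes.
type Strategy = "verbatim" | "snipCompact" | "local-llm" | "hosted";

interface ReadContext {
  sizeBytes: number;
  hasProToken: boolean;
  localLlmReachable: boolean;
  bypassSummary?: boolean;
}

const VERBATIM_LIMIT = 2 * 1024;  // below this, return the file as-is
const LLM_THRESHOLD = 16 * 1024;  // at or above this, try a model

// Returns the first strategy attempted; the LLM strategies may still
// fall back to snipCompact at run time.
function chooseStrategy(ctx: ReadContext): Strategy {
  if (ctx.bypassSummary) return "snipCompact"; // assumption: checked first
  if (ctx.sizeBytes < VERBATIM_LIMIT) return "verbatim";
  if (ctx.sizeBytes < LLM_THRESHOLD) return "snipCompact";
  if (ctx.hasProToken) return "hosted";
  return ctx.localLlmReachable ? "local-llm" : "snipCompact";
}
```

For example, a 20 KB file read with a Pro token maps to `"hosted"`, while the same read with `bypassSummary: true` maps to `"snipCompact"`.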
Privacy
- File contents are sent to https://api.ashlr.ai/llm/summarize only when you have a Pro token and the file exceeds the threshold.
- Payloads are encrypted in transit (HTTPS). The server does not store file contents after the request completes.
- The session ID in the request header is an opaque hex value — never your user identity.
If you work on sensitive codebases and prefer not to send content to the hosted endpoint, set bypassSummary: true on any ashlr__read call or configure ASHLR_LLM_URL to point at your own local endpoint.
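One way to reason about where file contents can go is to write the endpoint choice down explicitly. The sketch below is an assumption-laden illustration: `resolveSummarizerUrl` is a hypothetical helper, and the precedence it encodes (bypass first, then `ASHLR_LLM_URL`, then the Pro token) is inferred from the paragraph above rather than confirmed plugin behavior.

```typescript
// Hypothetical helper. The precedence order is an assumption inferred
// from the text, not documented plugin behavior.
const HOSTED_URL = "https://api.ashlr.ai/llm/summarize";
const DEFAULT_LOCAL_URL = "http://localhost:1234/v1";

function resolveSummarizerUrl(
  env: Record<string, string | undefined>,
  hasProToken: boolean,
  bypassSummary: boolean = false,
): string | null {
  // With bypassSummary, no model endpoint is contacted at all.
  if (bypassSummary) return null;
  // Assumed: an explicit ASHLR_LLM_URL keeps traffic on your own endpoint.
  const override = env["ASHLR_LLM_URL"];
  if (override !== undefined && override !== "") return override;
  return hasProToken ? HOSTED_URL : DEFAULT_LOCAL_URL;
}
```

Under these assumptions, content reaches the hosted endpoint only when a Pro token is present, no override is set, and the read does not pass `bypassSummary: true`.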
Setup
No configuration needed. Activate by signing in:
The plugin detects the token and automatically routes eligible reads to the hosted summarizer. Run /ashlr-status to confirm cloud features are active.
Confidence badges
Every summarized result includes a confidence badge:
| Badge | Meaning |
|---|---|
| [ashlr confidence: high] | Full summary; model confident in coverage |
| [ashlr confidence: medium] | Partial coverage; some sections may be elided |
| [ashlr confidence: low] | Fallback to snipCompact; LLM timeout or file too large |
Pass bypassSummary: true to always get snipCompact output with a [ashlr confidence: high · bypassSummary:true recovers fidelity] marker.
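If you post-process tool output, the badge can be picked out with a regular expression. This is a sketch that assumes the badge appears literally in the returned text in the forms shown above; `parseConfidence` is a hypothetical helper, not something the plugin provides.

```typescript
// Hypothetical helper; assumes the badge text appears verbatim in the output.
type Confidence = "high" | "medium" | "low";

function parseConfidence(output: string): Confidence | null {
  // Matches "[ashlr confidence: high]" and the longer bypassSummary
  // variant, which carries extra text before the closing bracket.
  const match = /\[ashlr confidence: (high|medium|low)\b[^\]]*\]/.exec(output);
  if (match === null) return null;
  return match[1] as Confidence;
}
```

A caller could, for instance, re-read a file with `bypassSummary: true` whenever the parsed level is `"low"`.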
Local fallback (free tier / offline)
Without a Pro token, ashlr__read uses:
- Local LLM: if ASHLR_LLM_URL is set, or if LM Studio / Ollama is running on localhost:1234.
- snipCompact: head + tail truncation, no model required.
Run /ashlr-ollama-setup to configure a local model for offline summarization.
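To make "head + tail truncation" concrete, here is a minimal sketch of the general technique. It is not the real snipCompact: the function name `snipHeadTail`, the character budgets, and the omission marker are all assumptions chosen for illustration.

```typescript
// Illustrative only: the real snipCompact budgets and marker text are not
// specified here. `headChars`, `tailChars`, and the marker are assumptions.
function snipHeadTail(
  text: string,
  headChars: number = 800,
  tailChars: number = 400,
): string {
  // Short inputs pass through untouched.
  if (text.length <= headChars + tailChars) return text;
  const omitted = text.length - headChars - tailChars;
  const head = text.slice(0, headChars);
  const tail = text.slice(text.length - tailChars);
  return `${head}\n[... ${omitted} characters omitted ...]\n${tail}`;
}
```

Keeping both ends preserves the parts of a file that tend to carry the most orientation (imports and declarations at the top, exports or entry points at the bottom) without needing any model.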
Related
- ashlr__read: the tool that calls the summarizer
- snipCompact: the fallback algorithm
- Pro setup: configure your Pro token
- /ashlr-ollama-setup: configure a local model