Claude Token Optimization — Analysis & Action Plan
Claude Token Optimization — Analysis & Action Plan
Source Videos
| # | Title | Creator | Tips Extracted |
|---|---|---|---|
| V1 | Your Claude Limit Burns In 90 Minutes Because Of One ChatGPT Habit | Nate B Jones | 18 |
| V2 | 18 Claude Code Token Hacks in 18 Minutes | Nate Herk | 23 |
| Total | 41 |
Part 1: Combined Tip List
Category: CLAUDE.md / System Prompt
| # | Source | Tip | Confidence |
|---|---|---|---|
| SP-1 | V1-9 | Prune system prompt regularly — especially instructions written for older, less capable models | HIGH |
| SP-2 | V1-10 | Don’t load entire repo into context if you haven’t tested whether it’s still necessary | HIGH |
| SP-3 | V1-12 | For API builders: enable prompt caching for stable context (system prompts, tool definitions, reference docs) | HIGH |
| SP-4 | V2-12 | Keep CLAUDE.md under 200 lines — treat it as an index, not a document. Point to files by path rather than embedding content. | HIGH |
| SP-5 | V2-21 | Use CLAUDE.md as a “systems constitution” — store stable architecture decisions, not conversations | HIGH |
| SP-6 | V2-22 | Add token-aware routing rules to CLAUDE.md: use Haiku sub-agents for multi-file exploration | MEDIUM |
| SP-7 | V2-23 | Add an “applied learning” section to CLAUDE.md that self-updates with one-line bullets on repeated failures (watch for bloat) | LOW-MEDIUM |
Category: Conversation Discipline
| # | Source | Tip | Confidence |
|---|---|---|---|
| CD-1 | V1-3 | Start fresh sessions every 10–15 turns — every turn re-reads the entire history | HIGH |
| CD-2 | V1-6 | If exploratory, declare intent at top and summarize before switching to execution | HIGH |
| CD-3 | V2-1 | Use /clear between unrelated tasks — don’t carry context from topic A into topic B | HIGH |
| CD-4 | V2-5 | Edit original message and regenerate instead of sending follow-up corrections | HIGH |
| CD-5 | V2-14 | Run /compact manually at ~60% context capacity, not at the default 95% auto-compact threshold | HIGH |
| CD-6 | V2-15 | Compact or clear before stepping away — prompt cache expires after 5 minutes, causing full re-read on return | MEDIUM |
Category: File & Input Hygiene
| # | Source | Tip | Confidence |
|---|---|---|---|
| FH-1 | V1-1 | Convert documents to Markdown before feeding them to Claude (can reduce 100k+ tokens → 4-6k for a PDF) | HIGH |
| FH-2 | V1-2 | Avoid screenshots when text will do — screenshots are “terribly inefficient” | HIGH |
| FH-3 | V2-10 | Paste only the relevant section, not the whole file | HIGH |
| FH-4 | V2-13 | Name specific files and functions in prompts — don’t say “check the repo,” say “check verify_user() in auth.js” | HIGH |
Category: Workflow Design
| # | Source | Tip | Confidence |
|---|---|---|---|
| WF-1 | V1-4 | Separate exploration/thinking sessions from execution sessions — never mix the two | HIGH |
| WF-2 | V1-5 | Front-load your intent so the model can act in a single pass without clarification turns | HIGH |
| WF-3 | V1-11 | Match model tier to task — Opus for reasoning, Sonnet for execution, Haiku for polish | MEDIUM |
| WF-4 | V1-17 | Instrument every agent call: track input tokens, output tokens, model mix, cost ratio | HIGH (agent builders) |
| WF-5 | V2-4 | Batch multi-step instructions into a single prompt message | MEDIUM |
| WF-6 | V2-6 | Use plan mode before any real task to prevent wrong-path token waste | HIGH |
| WF-7 | V2-7 | Run /context and /cost to make invisible token consumption visible | HIGH |
| WF-8 | V2-8 | Set up a terminal status line to see real-time context usage as a progress bar | MEDIUM |
| WF-9 | V2-9 | Keep usage dashboard open; check every 20-40 minutes to pace yourself | LOW |
| WF-10 | V2-11 | Watch Claude work in real time and interrupt if it goes off track | HIGH |
| WF-11 | V2-17 | Pick right model per job: Sonnet for coding, Haiku for sub-agents/simple tasks, Opus sparingly (<20% of usage) | MEDIUM |
| WF-12 | V2-18 | Use sub-agents deliberately — delegate one-off tasks to Haiku; avoid multi-agent teams unless necessary | HIGH |
| WF-13 | V2-19 | Schedule heavy sessions and multi-agent runs for off-peak hours (afternoons, evenings, weekends) | LOW |
| WF-14 | V2-20 | Go heavy when near a reset and budget remains; step away when near limit with time remaining | MEDIUM |
Category: Tool Use
| # | Source | Tip | Confidence |
|---|---|---|---|
| TU-1 | V1-7 | Audit and prune plugins/connectors — each injects tokens into every turn whether used or not | HIGH |
| TU-2 | V1-8 | Run /context before typing — see what’s loaded at session start | HIGH |
| TU-3 | V1-13 | Route web searches through Perplexity (via MCP) rather than native Claude browsing — saves 10-50k tokens/search | HIGH |
| TU-4 | V2-2 | Disconnect MCP servers not actively in use | HIGH |
| TU-5 | V2-3 | Prefer CLIs over MCP servers when a CLI equivalent exists | MEDIUM |
| TU-6 | V2-16 | Deny permissions for shell commands that produce large outputs Claude doesn’t need | HIGH |
Category: Agent Memory / Retrieval
| # | Source | Tip | Confidence |
|---|---|---|---|
| AM-1 | V1-14 | Use indexed retrieval — never dump full document sets into the context window on every call | HIGH |
| AM-2 | V1-15 | Pre-process and pre-summarize reference documents before they enter context | HIGH |
| AM-3 | V1-16 | Scope each agent’s context to the minimum it needs for its specific role | HIGH |
Category: Infrastructure
| # | Source | Tip | Confidence |
|---|---|---|---|
| IN-1 | V1-18 | Build guardrails infrastructure: auto-Markdown conversion, index-first retrieval, minimum-viable context scoping | HIGH (teams) |
Part 2: Adversarial Debate — Cluster Verdicts
Cluster 1: Conversation Hygiene
Verdict: HIGH confidence — compaction and fresh-start discipline are correct as general principles. The 60% manual compact rule is actionable; the 10-15 turn rule is too coarse. Context entropy, not a counter, is the real signal. The “edit original and regenerate” tip (CD-4) is underused and eliminates a correction turn entirely.
Cluster 2: File & Input Hygiene
Verdict: HIGH confidence — strongest and cheapest wins in the list. FH-3 (paste only relevant section) and FH-4 (name specific files) require no tooling. FH-1 (Markdown conversion) requires quality conversion tooling — naive conversion can be worse than the original. FH-2 has legitimate exceptions for visual/UI problems.
Cluster 3: Tool & Plugin Discipline
Verdict: HIGH confidence in principle, MEDIUM for tactics — auditing tool lists is clearly correct. MCP-vs-CLI preference is useful but not absolute. TU-6 (deny large-output shell commands) is high-ROI and underappreciated. The cognitive overhead of frequent MCP lifecycle management is a real cost — set up defaults, not per-session toggling.
Cluster 4: System Prompt & CLAUDE.md
Verdict: HIGH confidence for pruning (SP-1, SP-4); MEDIUM for index pattern (SP-5); LOW-MEDIUM for self-updating applied learning section (SP-7). The 200-line ceiling is a proxy metric, not a mechanistic limit — optimize for signal density, not line count. The applied learning section (SP-7) introduces a feedback loop with no clear governance — monitor aggressively for drift.
Cluster 5: Workflow Design
Verdict: HIGH for WF-6 (plan mode) and WF-10 (watch and interrupt). These have clear, measurable payoffs. MEDIUM for explore/execute separation (WF-1) — correct as principle, impractical as strict rule. Batching (WF-5) is better for experienced users with well-formed prompts.
Cluster 6: Model Tier Routing
Verdict: HIGH confidence in principle; MEDIUM for specific assignments — tier labels are snapshots that change as models improve. The sub-agent Haiku pattern (WF-12) is the most concrete application. Calibrate routing based on observed error rate per model per task, not generic tier labels.
Cluster 7: Visibility & Instrumentation
Verdict: HIGH for /context + /cost (TU-2, WF-7); MEDIUM for status line (WF-8); LOW for dashboard every 20-40 minutes (WF-9). Dashboard-watching is reactive and breaks flow — ambient monitoring (status line) is strictly better.
Cluster 8: Retrieval & Memory Architecture
Verdict: HIGH confidence, LOW daily relevance for individual Claude Code users — these are production agentic system tips. Correct and important for building PPA/PhoneBuddy pipelines, not for daily conversational use.
Cluster 9: Usage Pacing
Verdict: MEDIUM for WF-20 (session reset strategy); LOW for WF-13 (off-peak scheduling) — the off-peak performance claim is undocumented by Anthropic. Rate limit behavior is plan-dependent, not load-based.
Cluster 10: Prompt Caching & Infrastructure
Verdict: HIGH confidence, LOW relevance for Claude Code users; HIGH relevance for API builders — prompt caching is an API feature. Irrelevant to conversational Claude Code use. Critical for ppa-api Phase 2.
Part 3: Master Top 5 Tips (Highest Daily ROI)
| Rank | Tip | Why |
|---|---|---|
| #1 | CD-5: /compact at 60%, not 95% | Purely mechanical, zero setup, immediately measurable. Better context quality per session. |
| #2 | FH-3: Paste only the relevant section | Highest-leverage behavior with lowest implementation cost. 98% token reduction on file inputs. |
| #3 | WF-6: Plan mode before any real task | Eliminates the most expensive failure mode — executing confidently in the wrong direction. |
| #4 | WF-10: Watch and interrupt in real time | Catching a wrong path at turn 3 vs turn 15 is a 3-7x token multiplier difference. No setup required. |
| #5 | SP-1 + SP-4: Prune system prompt / Keep CLAUDE.md lean | Compounds permanently across every future session. Highest long-term ROI of any single action. |
Honorable mention: CD-4 (edit original and regenerate) — eliminates correction turns entirely when caught early enough.
Part 4: Application to This Codebase
Current Token Budget at Session Start (RoadTrip workspace)
| Source | ~Tokens | Action |
|---|---|---|
| Deferred tools | 10.1k | Unavoidable framework overhead |
| RoadTrip CLAUDE.md | 2.5k | Reduce: 268 lines → target 170 |
| MEMORY.md (auto-mem) | 2.7k | Prune stale implementation-status entries |
| System tools | 7.9k | Unavoidable |
| Skills manifest | 1k | Acceptable |
| MCP (Railway) | 2.8k | Disconnect during non-Railway sessions |
| Total | ~27k | Target: ~22k (save ~5k per session) |
RoadTrip (research/planning workspace) — Biggest Opportunity
CLAUDE.md: 268 lines → target 170 lines
Bloat candidates:
| Section | ~Lines | Why |
|---|---|---|
| Documentation Style Guide (emojis, tone, example pattern) | ~30 | Aesthetic preference. Not correctness-critical. Move to a reference file. |
| Quick Reference + Common Scenarios examples | ~25 | Redundant with the alias descriptions above them |
| “Shell Integration Ready” note | ~5 | Informational, not instructional |
| PowerShell alias full usage examples (gpush, gpush-dry, etc.) | ~40 | Point to infra/RoadTrip_profile.ps1 instead of documenting inline |
| Plan Validation Process 5-step | ~15 | Better as a SKILL.md entry or linked document than always-on context |
| Session Logging alias list (log-help, log-start, log-end) | ~10 | Move to profile.ps1 comments |
MEMORY.md: prune stale entries
Entries that are code-derivable (not worth loading every session):
- Implementation status bullets (rules-engine DONE, auth-validator DONE, etc.) — read the code
- Python environment paths — run
which python - “Next session” goal entries that are past their session
Action: Move implementation status tracking out of MEMORY.md and into a docs/status.md that’s not auto-loaded.
MCP servers: Railway MCP is 2.8k tokens injected every turn. Only connect when doing Railway work. For research-only sessions, disconnect at session start.
PhoneBuddy (production) — Medium Opportunity
CLAUDE.md: 100 lines — acceptable but trimmable
| Section | ~Lines | Recommendation |
|---|---|---|
| File Structure tree | ~15 | Remove — derivable from ls. Just say “flat structure, no src/ nesting” |
| Docker + Azure Container Apps deploy sections | ~15 | Rarely needed. One line: “see Dockerfile for non-Railway deploy options” |
| Local Dev block | ~10 | README territory. Remove from CLAUDE.md. |
Estimated savings: 25-30 lines → from 100 to ~70.
Production system prompt (highest-ROI change):
PhoneBuddy uses Claude Haiku for call classification. The classification prompt in main.py was likely written for an older model version. Haiku’s current capabilities mean:
- Multi-shot examples can often be reduced or removed
- Explicit “think step by step” scaffolding is less necessary
- Intent classification instructions can be shorter
- Action: Audit the classification prompt in
main.py. Test with a 30% shorter prompt — if quality holds, the per-call cost drops proportionally. In production, this compounds across every call received.
ppa-api (production) — Minimal Immediate Opportunity, High Future Impact
CLAUDE.md: 54 lines — already lean. Leave as is.
Phase 2 preparation: enable prompt caching
When Phase 2 adds SRCGEEE pipeline calls, the following should be cached at the API layer:
- System prompt / pipeline role definition
- Tool definitions
- SRCGEEE phase descriptions
- Static reference context (routing rules, agent constitution)
Per Nate B Jones: “Prompt caching can give you a 90% discount on repeated content… $0.50/M vs $5/M for Opus standard.” For a pipeline that runs many times per day, this is the single highest-ROI architectural decision.
Indexed retrieval for Phase 2: When Phase 2 adds Retrieve phase, implement indexed retrieval (AM-1) — never dump full history into each call. Retrieve only the relevant caller history chunks.
Part 5: Immediate Action Items
Priority ordered by ROI × ease:
| # | Action | Where | Impact |
|---|---|---|---|
| 1 | Add /compact at 60% habit to personal workflow |
Personal habit | Immediate |
| 2 | Trim RoadTrip CLAUDE.md from 268 → ~170 lines | RoadTrip/CLAUDE.md |
Every session |
| 3 | Prune MEMORY.md — remove implementation-status entries | memory/MEMORY.md |
Every session |
| 4 | Disconnect Railway MCP during non-Railway sessions | MCP settings | Per session |
| 5 | Audit PhoneBuddy classification prompt in main.py |
PhoneBuddy/main.py |
Per API call |
| 6 | Add prompt caching to ppa-api Phase 2 plan | Phase 2 design | Future |
| 7 | Trim PhoneBuddy CLAUDE.md from 100 → ~70 lines | PhoneBuddy/CLAUDE.md |
Every session |
| 8 | Move Plan Validation Process from CLAUDE.md to SKILL.md | RoadTrip/CLAUDE.md |
Every session |
Appendix: Key Quotes
“Every time you take a turn in a conversation, you read it as sending one line back. But Claude reads it as sending the entire conversation back.” — Nate B Jones
“Keep [CLAUDE.md] under 200 lines. Treat this like an index route to where more data lives.” — Nate Herk
“Passing everything to every agent is architectural laziness and it has real costs both in tokens burned and frankly in degraded agent performance.” — Nate B Jones
“Most people don’t need a bigger plan — they need to stop resending their entire conversation history 30 times. It’s not a limits problem. It’s a context hygiene problem.” — Nate Herk
“If your system prompt, your tool definitions, your reference documents aren’t cached, what are you doing?” — Nate B Jones