Claude Token Optimization — Analysis & Action Plan

Source Videos

| # | Title | Creator | Tips Extracted |
|---|---|---|---|
| V1 | Your Claude Limit Burns In 90 Minutes Because Of One ChatGPT Habit | Nate B Jones | 18 |
| V2 | 18 Claude Code Token Hacks in 18 Minutes | Nate Herk | 23 |
| | Total | | 41 |

Part 1: Combined Tip List

Category: CLAUDE.md / System Prompt

| # | Source | Tip | Confidence |
|---|---|---|---|
| SP-1 | V1-9 | Prune system prompt regularly — especially instructions written for older, less capable models | HIGH |
| SP-2 | V1-10 | Don’t load entire repo into context if you haven’t tested whether it’s still necessary | HIGH |
| SP-3 | V1-12 | For API builders: enable prompt caching for stable context (system prompts, tool definitions, reference docs) | HIGH |
| SP-4 | V2-12 | Keep CLAUDE.md under 200 lines — treat it as an index, not a document. Point to files by path rather than embedding content. | HIGH |
| SP-5 | V2-21 | Use CLAUDE.md as a “systems constitution” — store stable architecture decisions, not conversations | HIGH |
| SP-6 | V2-22 | Add token-aware routing rules to CLAUDE.md: use Haiku sub-agents for multi-file exploration | MEDIUM |
| SP-7 | V2-23 | Add an “applied learning” section to CLAUDE.md that self-updates with one-line bullets on repeated failures (watch for bloat) | LOW-MEDIUM |
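As a sketch of the index pattern (SP-4/SP-5): a lean CLAUDE.md points at files by path rather than embedding their content. The paths below are borrowed from Part 4 of this doc; the layout is illustrative, not a prescribed template.

```markdown
# CLAUDE.md (index style, illustrative)

## Architecture decisions (stable, rarely edited)
- Flat structure, no src/ nesting
- Deploy target: Railway, main branch

## Where things live (read on demand, do not inline here)
- Shell aliases and helpers: infra/RoadTrip_profile.ps1
- Implementation status: docs/status.md (not auto-loaded)
- Plan validation steps: see the plan-validation SKILL.md entry
```

The point is that each line costs tokens every session, so anything Claude can read on demand belongs behind a path, not in the file.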

Category: Conversation Discipline

| # | Source | Tip | Confidence |
|---|---|---|---|
| CD-1 | V1-3 | Start fresh sessions every 10–15 turns — every turn re-reads the entire history | HIGH |
| CD-2 | V1-6 | If exploratory, declare intent at top and summarize before switching to execution | HIGH |
| CD-3 | V2-1 | Use /clear between unrelated tasks — don’t carry context from topic A into topic B | HIGH |
| CD-4 | V2-5 | Edit original message and regenerate instead of sending follow-up corrections | HIGH |
| CD-5 | V2-14 | Run /compact manually at ~60% context capacity, not at the default 95% auto-compact threshold | HIGH |
| CD-6 | V2-15 | Compact or clear before stepping away — prompt cache expires after 5 minutes, causing full re-read on return | MEDIUM |

Category: File & Input Hygiene

| # | Source | Tip | Confidence |
|---|---|---|---|
| FH-1 | V1-1 | Convert documents to Markdown before feeding them to Claude (can reduce 100k+ tokens → 4-6k for a PDF) | HIGH |
| FH-2 | V1-2 | Avoid screenshots when text will do — screenshots are “terribly inefficient” | HIGH |
| FH-3 | V2-10 | Paste only the relevant section, not the whole file | HIGH |
| FH-4 | V2-13 | Name specific files and functions in prompts — don’t say “check the repo,” say “check verify_user() in auth.js” | HIGH |
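The savings from FH-1/FH-3 can be sanity-checked with the common ~4-characters-per-token heuristic. This is an approximation, not Claude's actual tokenizer; the ratio between section and whole file is what matters, so the heuristic only needs to be consistent, not exact.

```python
# Rough token-savings estimate for FH-3 (paste only the relevant section).
# Uses the ~4 chars/token heuristic, an approximation of real tokenization.

def estimate_tokens(text: str) -> int:
    """Approximate token count: ~4 characters per token for English/code."""
    return max(1, len(text) // 4)

def paste_savings(whole_file: str, relevant_section: str) -> dict:
    """Compare pasting a whole file vs. only the section that matters."""
    full = estimate_tokens(whole_file)
    part = estimate_tokens(relevant_section)
    return {
        "whole_file_tokens": full,
        "section_tokens": part,
        "saved_tokens": full - part,
        "saved_pct": round(100 * (full - part) / full, 1),
    }

# Example: a 2,000-line file vs. the one 40-line function you actually need
whole = "x = 1\n" * 2000
section = "x = 1\n" * 40
print(paste_savings(whole, section))
```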

Category: Workflow Design

| # | Source | Tip | Confidence |
|---|---|---|---|
| WF-1 | V1-4 | Separate exploration/thinking sessions from execution sessions — never mix the two | HIGH |
| WF-2 | V1-5 | Front-load your intent so the model can act in a single pass without clarification turns | HIGH |
| WF-3 | V1-11 | Match model tier to task — Opus for reasoning, Sonnet for execution, Haiku for polish | MEDIUM |
| WF-4 | V1-17 | Instrument every agent call: track input tokens, output tokens, model mix, cost ratio | HIGH (agent builders) |
| WF-5 | V2-4 | Batch multi-step instructions into a single prompt message | MEDIUM |
| WF-6 | V2-6 | Use plan mode before any real task to prevent wrong-path token waste | HIGH |
| WF-7 | V2-7 | Run /context and /cost to make invisible token consumption visible | HIGH |
| WF-8 | V2-8 | Set up a terminal status line to see real-time context usage as a progress bar | MEDIUM |
| WF-9 | V2-9 | Keep usage dashboard open; check every 20-40 minutes to pace yourself | LOW |
| WF-10 | V2-11 | Watch Claude work in real time and interrupt if it goes off track | HIGH |
| WF-11 | V2-17 | Pick right model per job: Sonnet for coding, Haiku for sub-agents/simple tasks, Opus sparingly (<20% of usage) | MEDIUM |
| WF-12 | V2-18 | Use sub-agents deliberately — delegate one-off tasks to Haiku; avoid multi-agent teams unless necessary | HIGH |
| WF-13 | V2-19 | Schedule heavy sessions and multi-agent runs for off-peak hours (afternoons, evenings, weekends) | LOW |
| WF-14 | V2-20 | Go heavy when near a reset and budget remains; step away when near limit with time remaining | MEDIUM |
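WF-4 (instrument every agent call) can be as small as a ledger wrapped around each model call. A minimal sketch; the price table is a placeholder, so substitute current published per-million-token rates, and `record()` stands in for whatever your agent framework reports back as usage.

```python
# Minimal per-call instrumentation for WF-4: input tokens, output tokens,
# model mix, and cost. Prices below are placeholders, not current rates.
from dataclasses import dataclass, field

PRICE_PER_MTOK = {  # placeholder $/million tokens: (input, output)
    "haiku": (0.80, 4.00),
    "sonnet": (3.00, 15.00),
}

@dataclass
class UsageLedger:
    calls: list = field(default_factory=list)

    def record(self, model: str, in_tok: int, out_tok: int) -> float:
        """Log one model call and return its dollar cost."""
        p_in, p_out = PRICE_PER_MTOK[model]
        cost = (in_tok * p_in + out_tok * p_out) / 1_000_000
        self.calls.append({"model": model, "in": in_tok, "out": out_tok, "cost": cost})
        return cost

    def summary(self) -> dict:
        """Aggregate totals and per-model call counts."""
        return {
            "total_in": sum(c["in"] for c in self.calls),
            "total_out": sum(c["out"] for c in self.calls),
            "total_cost": round(sum(c["cost"] for c in self.calls), 4),
            "model_mix": {m: sum(1 for c in self.calls if c["model"] == m)
                          for m in {c["model"] for c in self.calls}},
        }

ledger = UsageLedger()
ledger.record("haiku", 12_000, 800)
ledger.record("sonnet", 40_000, 2_500)
print(ledger.summary())
```

The cost ratio worth watching is input vs. output: a high input share usually means context bloat, the exact failure mode this document is about.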

Category: Tool Use

| # | Source | Tip | Confidence |
|---|---|---|---|
| TU-1 | V1-7 | Audit and prune plugins/connectors — each injects tokens into every turn whether used or not | HIGH |
| TU-2 | V1-8 | Run /context before typing — see what’s loaded at session start | HIGH |
| TU-3 | V1-13 | Route web searches through Perplexity (via MCP) rather than native Claude browsing — saves 10-50k tokens/search | HIGH |
| TU-4 | V2-2 | Disconnect MCP servers not actively in use | HIGH |
| TU-5 | V2-3 | Prefer CLIs over MCP servers when a CLI equivalent exists | MEDIUM |
| TU-6 | V2-16 | Deny permissions for shell commands that produce large outputs Claude doesn’t need | HIGH |
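TU-6 maps onto Claude Code permission rules in settings.json. The deny patterns below are illustrative assumptions, not a tested config; verify the current permission-rule syntax in the Claude Code docs before copying.

```json
{
  "permissions": {
    "deny": [
      "Bash(find:*)",
      "Read(./node_modules/**)",
      "Read(./dist/**)"
    ]
  }
}
```

The targets to deny are whatever produces large output Claude rarely needs in your repo: dependency trees, build artifacts, recursive filesystem scans.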

Category: Agent Memory / Retrieval

| # | Source | Tip | Confidence |
|---|---|---|---|
| AM-1 | V1-14 | Use indexed retrieval — never dump full document sets into the context window on every call | HIGH |
| AM-2 | V1-15 | Pre-process and pre-summarize reference documents before they enter context | HIGH |
| AM-3 | V1-16 | Scope each agent’s context to the minimum it needs for its specific role | HIGH |
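AM-1's index-first pattern, reduced to a toy: score stored chunks against the query and pass only the top-k into context instead of every document on every call. A real system would use embeddings; keyword overlap keeps the sketch self-contained, and the sample chunks are made up for illustration.

```python
# Toy index-first retrieval for AM-1: retrieve top-k chunks, not everything.

def score(query: str, chunk: str) -> int:
    """Count shared lowercase words between query and chunk."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k highest-scoring chunks for this query."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

docs = [
    "auth flow: verify_user() checks the session token in auth.js",
    "billing: invoices are generated nightly by the cron worker",
    "deploy: Railway builds from the main branch on push",
]
print(retrieve("where is the session token verified in auth", docs, k=1))
```

Only the retrieved chunk enters the model call; the other documents cost zero tokens that turn.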

Category: Infrastructure

| # | Source | Tip | Confidence |
|---|---|---|---|
| IN-1 | V1-18 | Build guardrails infrastructure: auto-Markdown conversion, index-first retrieval, minimum-viable context scoping | HIGH (teams) |

Part 2: Adversarial Debate — Cluster Verdicts

Cluster 1: Conversation Hygiene

Verdict: HIGH confidence — compaction and fresh-start discipline are correct as general principles. The 60% manual compact rule is actionable; the 10-15 turn rule is too coarse. Context entropy, not a counter, is the real signal. The “edit original and regenerate” tip (CD-4) is underused and eliminates a correction turn entirely.

Cluster 2: File & Input Hygiene

Verdict: HIGH confidence — strongest and cheapest wins in the list. FH-3 (paste only relevant section) and FH-4 (name specific files) require no tooling. FH-1 (Markdown conversion) requires quality conversion tooling — naive conversion can be worse than the original. FH-2 has legitimate exceptions for visual/UI problems.

Cluster 3: Tool & Plugin Discipline

Verdict: HIGH confidence in principle, MEDIUM for tactics — auditing tool lists is clearly correct. MCP-vs-CLI preference is useful but not absolute. TU-6 (deny large-output shell commands) is high-ROI and underappreciated. The cognitive overhead of frequent MCP lifecycle management is a real cost — set up defaults, not per-session toggling.

Cluster 4: System Prompt & CLAUDE.md

Verdict: HIGH confidence for pruning (SP-1, SP-4); MEDIUM for the constitution pattern (SP-5); LOW-MEDIUM for the self-updating applied-learning section (SP-7). The 200-line ceiling is a proxy metric, not a mechanistic limit — optimize for signal density, not line count. The applied-learning section introduces a feedback loop with no clear governance — monitor aggressively for drift.

Cluster 5: Workflow Design

Verdict: HIGH for WF-6 (plan mode) and WF-10 (watch and interrupt). These have clear, measurable payoffs. MEDIUM for explore/execute separation (WF-1) — correct as principle, impractical as strict rule. Batching (WF-5) is better for experienced users with well-formed prompts.

Cluster 6: Model Tier Routing

Verdict: HIGH confidence in principle; MEDIUM for specific assignments — tier labels are snapshots that change as models improve. The sub-agent Haiku pattern (WF-12) is the most concrete application. Calibrate routing based on observed error rate per model per task, not generic tier labels.

Cluster 7: Visibility & Instrumentation

Verdict: HIGH for /context + /cost (TU-2, WF-7); MEDIUM for status line (WF-8); LOW for dashboard every 20-40 minutes (WF-9). Dashboard-watching is reactive and breaks flow — ambient monitoring (status line) is strictly better.

Cluster 8: Retrieval & Memory Architecture

Verdict: HIGH confidence, LOW daily relevance for individual Claude Code users — these are production agentic system tips. Correct and important for building PPA/PhoneBuddy pipelines, not for daily conversational use.

Cluster 9: Usage Pacing

Verdict: MEDIUM for WF-14 (session reset strategy); LOW for WF-13 (off-peak scheduling) — the off-peak performance claim is undocumented by Anthropic. Rate-limit behavior is plan-dependent, not load-based.

Cluster 10: Prompt Caching & Infrastructure

Verdict: HIGH confidence, LOW relevance for Claude Code users; HIGH relevance for API builders — prompt caching is an API feature. Irrelevant to conversational Claude Code use. Critical for ppa-api Phase 2.


Part 3: Master Top 5 Tips (Highest Daily ROI)

| Rank | Tip | Why |
|---|---|---|
| #1 | CD-5: /compact at 60%, not 95% | Purely mechanical, zero setup, immediately measurable. Better context quality per session. |
| #2 | FH-3: Paste only the relevant section | Highest-leverage behavior with lowest implementation cost. 98% token reduction on file inputs. |
| #3 | WF-6: Plan mode before any real task | Eliminates the most expensive failure mode — executing confidently in the wrong direction. |
| #4 | WF-10: Watch and interrupt in real time | Catching a wrong path at turn 3 vs turn 15 is a 3-7x token multiplier difference. No setup required. |
| #5 | SP-1 + SP-4: Prune system prompt / keep CLAUDE.md lean | Compounds permanently across every future session. Highest long-term ROI of any single action. |

Honorable mention: CD-4 (edit original and regenerate) — eliminates correction turns entirely when caught early enough.


Part 4: Application to This Codebase

Current Token Budget at Session Start (RoadTrip workspace)

| Source | ~Tokens | Action |
|---|---|---|
| Deferred tools | 10.1k | Unavoidable framework overhead |
| RoadTrip CLAUDE.md | 2.5k | Reduce: 268 lines → target 170 |
| MEMORY.md (auto-mem) | 2.7k | Prune stale implementation-status entries |
| System tools | 7.9k | Unavoidable |
| Skills manifest | 1k | Acceptable |
| MCP (Railway) | 2.8k | Disconnect during non-Railway sessions |
| Total | ~27k | Target: ~22k (save ~5k per session) |

RoadTrip (research/planning workspace) — Biggest Opportunity

CLAUDE.md: 268 lines → target 170 lines

Bloat candidates:

| Section | ~Lines | Why |
|---|---|---|
| Documentation Style Guide (emojis, tone, example pattern) | ~30 | Aesthetic preference. Not correctness-critical. Move to a reference file. |
| Quick Reference + Common Scenarios examples | ~25 | Redundant with the alias descriptions above them |
| “Shell Integration Ready” note | ~5 | Informational, not instructional |
| PowerShell alias full usage examples (gpush, gpush-dry, etc.) | ~40 | Point to infra/RoadTrip_profile.ps1 instead of documenting inline |
| Plan Validation Process 5-step | ~15 | Better as a SKILL.md entry or linked document than always-on context |
| Session Logging alias list (log-help, log-start, log-end) | ~10 | Move to profile.ps1 comments |

MEMORY.md: prune stale entries

Entries that are code-derivable (not worth loading every session):

  • Implementation status bullets (rules-engine DONE, auth-validator DONE, etc.) — read the code
  • Python environment paths — run which python
  • “Next session” goal entries that are past their session

Action: Move implementation status tracking out of MEMORY.md and into a docs/status.md that’s not auto-loaded.

MCP servers: Railway MCP is 2.8k tokens injected every turn. Only connect when doing Railway work. For research-only sessions, disconnect at session start.


PhoneBuddy (production) — Medium Opportunity

CLAUDE.md: 100 lines — acceptable but trimmable

| Section | ~Lines | Recommendation |
|---|---|---|
| File Structure tree | ~15 | Remove — derivable from ls. Just say “flat structure, no src/ nesting” |
| Docker + Azure Container Apps deploy sections | ~15 | Rarely needed. One line: “see Dockerfile for non-Railway deploy options” |
| Local Dev block | ~10 | README territory. Remove from CLAUDE.md. |

Estimated savings: 25-30 lines → from 100 to ~70.

Production system prompt (highest-ROI change): PhoneBuddy uses Claude Haiku for call classification. The classification prompt in main.py was likely written for an older model version. Haiku’s current capabilities mean:

  • Multi-shot examples can often be reduced or removed
  • Explicit “think step by step” scaffolding is less necessary
  • Intent classification instructions can be shorter

Action: Audit the classification prompt in main.py. Test with a 30% shorter prompt — if quality holds, the per-call cost drops proportionally. In production, this compounds across every call received.
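A back-of-envelope helper for that audit. The prices and call volume here are placeholders, not PhoneBuddy's real numbers; plug in the actual prompt length, traffic, and current Haiku input rate.

```python
# Estimate monthly savings from trimming the classification prompt.
# All inputs are placeholders, not measured PhoneBuddy figures.

def monthly_savings(prompt_tokens: int, trim_pct: float,
                    calls_per_month: int, price_per_mtok: float) -> float:
    """Dollars saved per month by trimming the prompt by trim_pct."""
    saved_per_call = prompt_tokens * trim_pct * price_per_mtok / 1_000_000
    return round(saved_per_call * calls_per_month, 2)

# e.g. a 2,000-token prompt, trimmed 30%, 10k calls/month, $0.80/M input
print(monthly_savings(2_000, 0.30, 10_000, 0.80))
```

Small per-call numbers are the point: the savings only matter because every single call pays the prompt cost.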

ppa-api (production) — Minimal Immediate Opportunity, High Future Impact

CLAUDE.md: 54 lines — already lean. Leave as is.

Phase 2 preparation: enable prompt caching

When Phase 2 adds SRCGEEE pipeline calls, the following should be cached at the API layer:

  • System prompt / pipeline role definition
  • Tool definitions
  • SRCGEEE phase descriptions
  • Static reference context (routing rules, agent constitution)

Per Nate B Jones: “Prompt caching can give you a 90% discount on repeated content… $0.50/M vs $5/M for Opus standard.” For a pipeline that runs many times per day, this is the single highest-ROI architectural decision.
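What caching the stable context looks like at the request level, sketched against Anthropic's published prompt-caching shape (system blocks marked with cache_control). The model id and SRCGEEE content are placeholders, and the field shapes should be verified against the current API reference before shipping; the function only builds the request payload, it does not call the API.

```python
# Sketch of SP-3 / Phase 2 prompt caching: mark the end of the stable
# prefix (system prompt + reference docs) so repeated runs reuse it.
# Shapes follow Anthropic's prompt-caching docs; verify before use.

def build_request(system_text: str, reference_docs: str, user_msg: str) -> dict:
    return {
        "model": "claude-3-5-haiku-latest",  # placeholder model id
        "max_tokens": 1024,
        "system": [
            {"type": "text", "text": system_text},
            {
                "type": "text",
                "text": reference_docs,
                # everything up to and including this block is cached
                "cache_control": {"type": "ephemeral"},
            },
        ],
        "messages": [{"role": "user", "content": user_msg}],
    }

req = build_request("You are the SRCGEEE pipeline router.",
                    "<phase descriptions and routing rules here>",
                    "Classify this caller request...")
print(req["system"][1]["cache_control"])
```

Only the varying user message sits after the cache marker, which is what makes the per-run discount work; note CD-6's caveat that the cache expires after a short idle window.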

Indexed retrieval for Phase 2: When Phase 2 adds Retrieve phase, implement indexed retrieval (AM-1) — never dump full history into each call. Retrieve only the relevant caller history chunks.


Part 5: Immediate Action Items

Priority ordered by ROI × ease:

| # | Action | Where | Impact |
|---|---|---|---|
| 1 | Add /compact at 60% habit to personal workflow | Personal habit | Immediate |
| 2 | Trim RoadTrip CLAUDE.md from 268 → ~170 lines | RoadTrip/CLAUDE.md | Every session |
| 3 | Prune MEMORY.md — remove implementation-status entries | memory/MEMORY.md | Every session |
| 4 | Disconnect Railway MCP during non-Railway sessions | MCP settings | Per session |
| 5 | Audit PhoneBuddy classification prompt in main.py | PhoneBuddy/main.py | Per API call |
| 6 | Add prompt caching to ppa-api Phase 2 plan | Phase 2 design | Future |
| 7 | Trim PhoneBuddy CLAUDE.md from 100 → ~70 lines | PhoneBuddy/CLAUDE.md | Every session |
| 8 | Move Plan Validation Process from CLAUDE.md to SKILL.md | RoadTrip/CLAUDE.md | Every session |

Appendix: Key Quotes

“Every time you take a turn in a conversation, you read it as sending one line back. But Claude reads it as sending the entire conversation back.” — Nate B Jones

“Keep [CLAUDE.md] under 200 lines. Treat this like an index route to where more data lives.” — Nate Herk

“Passing everything to every agent is architectural laziness and it has real costs both in tokens burned and frankly in degraded agent performance.” — Nate B Jones

“Most people don’t need a bigger plan — they need to stop resending their entire conversation history 30 times. It’s not a limits problem. It’s a context hygiene problem.” — Nate Herk

“If your system prompt, your tool definitions, your reference documents aren’t cached, what are you doing?” — Nate B Jones