<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://bizcad.github.io/bizcad-blog/feed.xml" rel="self" type="application/atom+xml" /><link href="https://bizcad.github.io/bizcad-blog/" rel="alternate" type="text/html" /><updated>2026-04-02T22:21:08+00:00</updated><id>https://bizcad.github.io/bizcad-blog/feed.xml</id><title type="html">bizcad</title><subtitle>Nick&apos;s notes on AI agents, autonomous systems, and building things that work.</subtitle><author><name>Nicholas Stein</name></author><entry><title type="html">Context Limits Aren’t a Claude Problem — They’re a Hygiene Problem</title><link href="https://bizcad.github.io/bizcad-blog/2026/04/02/context-limits-arent-a-claude-problem.html" rel="alternate" type="text/html" title="Context Limits Aren’t a Claude Problem — They’re a Hygiene Problem" /><published>2026-04-02T12:00:00+00:00</published><updated>2026-04-02T12:00:00+00:00</updated><id>https://bizcad.github.io/bizcad-blog/2026/04/02/context-limits-arent-a-claude-problem</id><content type="html" xml:base="https://bizcad.github.io/bizcad-blog/2026/04/02/context-limits-arent-a-claude-problem.html"><![CDATA[<p>I was mid-session on a feature last week — deep into a multi-file refactor — when Claude hit its context limit. Not at the end of a logical unit. Right in the middle of a thought.</p>

<p>My first instinct was frustration. My second instinct was to check whether I was the problem.</p>

<p>I was.</p>

<h2 id="two-videos-same-week">Two Videos, Same Week</h2>

<p>Around the same time, two YouTube presenters I follow independently posted about the same thing:</p>

<ul>
  <li><strong><a href="https://www.youtube.com/watch?v=5ztI_dbj6ek">Your Claude Limit Burns In 90 Minutes Because Of One ChatGPT Habit</a></strong> — Nate B Jones</li>
  <li><strong><a href="https://www.youtube.com/watch?v=49V-5Ock8LU">18 Claude Code Token Hacks in 18 Minutes</a></strong> — Nate Herk</li>
</ul>

<p>Different angles, same diagnosis. Nate Herk put it directly: <em>“Most people don’t need a bigger plan — they need to stop resending their entire conversation history 30 times. It’s not a limits problem. It’s a context hygiene problem.”</em></p>

<p>That landed. I’d been treating context limits like a Claude constraint to work around. They’re actually a mirror — they show you exactly how much noise you’ve been generating.</p>

<h2 id="what-actually-burns-your-context">What Actually Burns Your Context</h2>

<p>Every time you send a message, Claude re-reads the entire conversation from the beginning. Not just your new line. Everything. That 200-line error message you pasted three turns ago? Still there, re-read on every turn.</p>

<p>And that’s just the visible waste. The invisible waste is worse: connected MCP servers inject their full tool schemas into every message whether you use them or not. My Railway MCP was costing me 2,800 tokens per turn during sessions where I wasn’t touching Railway at all.</p>

<p>Then there’s the always-on context — <code class="language-plaintext highlighter-rouge">CLAUDE.md</code>, <code class="language-plaintext highlighter-rouge">MEMORY.md</code>, skill manifests — loaded fresh at the start of every conversation and re-read with every message. I had never audited any of these files. My <code class="language-plaintext highlighter-rouge">CLAUDE.md</code> was 268 lines. The recommended ceiling is 200.</p>
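<p>A toy model makes the compounding visible. The MCP and CLAUDE.md figures are the ones from this post; the tokens-per-line rate and session length are assumptions chosen for illustration.</p>

```python
# Toy model of always-on context overhead (figures illustrative).
# Per-turn costs are re-sent with every message, so they scale with session length.

def session_overhead(turns, per_turn_costs):
    """Total tokens consumed by always-on context across a session."""
    return turns * sum(per_turn_costs)

# Rough inputs from the audit above: an unused MCP server's tool schemas
# (~2,800 tokens/turn) plus a 268-line CLAUDE.md (assume ~12 tokens/line).
mcp_schemas = 2_800
claude_md = 268 * 12

print(session_overhead(turns=30, per_turn_costs=[mcp_schemas, claude_md]))
# roughly 180k tokens of overhead before any real work
```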

<h2 id="the-five-tips-that-survived-scrutiny">The Five Tips That Survived Scrutiny</h2>

<p>I ran the combined tip list from both videos through an adversarial debate — steelmanning each recommendation, then attacking it, then rendering a verdict. Most tips held up. A few were cargo cult. These five had the clearest ROI:</p>

<p><strong>1. Run <code class="language-plaintext highlighter-rouge">/compact</code> at 60% context capacity, not the default 95%.</strong>
By the time auto-compact fires at 95%, context quality is already degraded and the summary loses nuance. Compacting at 60% keeps the signal sharp. This is purely mechanical — no judgment required.</p>
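<p>Because the rule is purely mechanical, it fits in a few lines. The window size below is an assumption, not a Claude Code internal; the point is the threshold, not the constant.</p>

```python
# Sketch of the 60% rule from tip 1 (window size assumed for illustration).
CONTEXT_WINDOW = 200_000  # tokens
COMPACT_AT = 0.60         # compact well before the 95% auto-compact default

def should_compact(tokens_used, window=CONTEXT_WINDOW, threshold=COMPACT_AT):
    return tokens_used / window >= threshold

print(should_compact(110_000))  # False: 55%, keep working
print(should_compact(125_000))  # True: 62.5%, run /compact now
```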

<p><strong>2. Paste only the relevant section, not the whole file.</strong>
If the bug is in one function, paste that function. Pasting a 2,000-line file to ask about 30 lines is a 66x overhead — and it compounds on every subsequent turn. The discipline is: if you’re reaching for Ctrl+A, pause first.</p>

<p><strong>3. Use plan mode before any multi-file or architectural task.</strong>
The most expensive failure mode in Claude Code is executing confidently in the wrong direction. A wrong-path correction at turn 15 costs roughly 15 turns of tokens plus the rework. Plan mode adds one turn of overhead and eliminates multi-turn rework. For anything involving more than one file, the math is clear.</p>
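<p>The math can be sketched as an expected-cost comparison. The wrong-path probabilities below are assumptions for illustration, not measurements; the 15-turn rework figure is the one from this tip.</p>

```python
# Back-of-envelope expected cost for tip 3, measured in "turns of tokens".
def expected_cost(p_wrong_pct, rework_turns, plan_overhead=0):
    # planning overhead up front, plus expected rework if the path is wrong
    return plan_overhead + p_wrong_pct * rework_turns / 100

no_plan = expected_cost(p_wrong_pct=30, rework_turns=15)                    # 4.5 turns
with_plan = expected_cost(p_wrong_pct=5, rework_turns=15, plan_overhead=1)  # 1.75 turns
print(no_plan, with_plan)
```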

<p><strong>4. Watch Claude work and interrupt at turn 3, not turn 15.</strong>
Most developers let Claude run to completion out of habit. Watching and intervening early is the highest-leverage human action in the loop. Catching a wrong path at turn 3 versus turn 10 is roughly a 3-7x token difference.</p>
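<p>The multiplier can be roughly bracketed with two toy bounds: one assuming flat per-turn cost, one assuming every turn re-reads the full history. The per-turn average below is illustrative.</p>

```python
# Two bounds on the cost of a late interrupt (turn 10 vs turn 3).
def linear_cost(turns, per_turn=1_000):
    return turns * per_turn

def compounding_cost(turns, per_turn=1_000):
    # turn k re-reads everything from turns 1..k
    return sum(k * per_turn for k in range(1, turns + 1))

print(linear_cost(10) / linear_cost(3))            # ~3.3x if history were free
print(compounding_cost(10) / compounding_cost(3))  # ~9.2x with full re-reads
```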

<p><strong>5. Keep <code class="language-plaintext highlighter-rouge">CLAUDE.md</code> under 200 lines — treat it as an index, not a document.</strong>
This one compounds permanently. A leaner <code class="language-plaintext highlighter-rouge">CLAUDE.md</code> pays a dividend on every conversation that follows, not just the current one. The rule: if a section can be replaced by a file path reference, replace it.</p>
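<p>The audit itself is trivial to automate. A minimal sketch, taking the file's contents as a string so it stays self-contained:</p>

```python
# Sketch of the tip-5 audit: measure a context file against the 200-line ceiling.
LINE_CEILING = 200

def audit(text):
    lines = text.count("\n") + 1 if text else 0
    over = lines - LINE_CEILING
    return lines, ("OK" if over <= 0 else f"trim {over} lines")

# The CLAUDE.md from this post had grown to 268 lines:
print(audit("line\n" * 267 + "line"))  # (268, 'trim 68 lines')
```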

<h2 id="the-files-that-hurt-most">The Files That Hurt Most</h2>

<p>The most expensive context isn’t the file you just pasted. It’s the files loaded silently every session that you’ve never looked at.</p>

<p>For me that list included: a <code class="language-plaintext highlighter-rouge">CLAUDE.md</code> that had grown to include documentation style guides, PowerShell alias usage examples, and a “Shell Integration Ready” note that served no one. A <code class="language-plaintext highlighter-rouge">MEMORY.md</code> with implementation-status bullets that were just a slower way of reading the code. MCP servers connected because they seemed useful at some point and never disconnected.</p>

<p>None of these felt expensive individually. Together, they were the reason I was hitting limits.</p>

<h2 id="the-sop">The SOP</h2>

<p>I codified everything — the tips that survived the adversarial review, the SRCGEEE-specific rules, the artifact transformation pattern (PDF → Markdown → file reference), the always-on context audit checklist — into a single reference document:</p>

<p><strong><a href="https://github.com/bizcad/RoadTrip/blob/main/docs/context-economy-sop.md">Context Economy SOP</a></strong></p>

<p>It’s written to be readable by both humans and agents. Ten sections, each short enough to actually use. The quick reference table at the bottom is the part I find myself returning to most.</p>

<p>These aren’t automated rules. They’re judgment calls — and judgment calls are exactly what shouldn’t be delegated to a machine that can’t see the full picture of what a session is trying to accomplish. But writing them down means you only have to figure them out once.</p>

<hr />

<p><em>The full adversarial analysis, combined transcript, and application notes for three repos live in <a href="https://raw.githubusercontent.com/bizcad/bizcad-blog/main/assets/artifacts/token-optimization-analysis-20260402.md">token-optimization-analysis-20260402.md</a>.</em></p>]]></content><author><name>Nick Stein</name></author><summary type="html"><![CDATA[Context limits aren't a Claude problem — they're a hygiene problem. Here's what two YouTubers and a SOP taught me.]]></summary></entry><entry><title type="html">Evolve Is a Scheduler: Why Agent Memory Gets the Last Phase Wrong</title><link href="https://bizcad.github.io/bizcad-blog/2026/03/31/evolve-is-a-scheduler-why-agent-memory-gets-the-la.html" rel="alternate" type="text/html" title="Evolve Is a Scheduler: Why Agent Memory Gets the Last Phase Wrong" /><published>2026-03-31T00:28:58+00:00</published><updated>2026-03-31T00:28:58+00:00</updated><id>https://bizcad.github.io/bizcad-blog/2026/03/31/evolve-is-a-scheduler-why-agent-memory-gets-the-la</id><content type="html" xml:base="https://bizcad.github.io/bizcad-blog/2026/03/31/evolve-is-a-scheduler-why-agent-memory-gets-the-la.html"><![CDATA[<p>Every agent framework that implements a memory phase makes the same quiet mistake: it treats Evolve as a tail call.</p>

<p>Log the outcome. Append the JSONL. Emit the telemetry. Done.</p>

<p>That framing makes Evolve sound like a database write. It is not.</p>

<h2 id="the-insight">The Insight</h2>

<p>Evolve is a <strong>scheduler</strong>. It decides not just <em>what</em> to remember, but <em>when that memory should fire as a new action</em>.</p>

<p>The distinction matters because memory without scheduling is just a more expensive log file. You can query it, but it does not do anything on its own. A scheduler, by contrast, re-enters the pipeline.</p>

<h2 id="a-concrete-example">A Concrete Example</h2>

<p>This insight surfaced from a mundane problem: a stale memory entry. The recorded skill inventory said “4 skills” when the actual count was 8.</p>

<p>A logger would write: <em>recorded — stale entry detected.</em></p>

<p>A scheduler asks: <em>what does it cost to act now vs defer?</em></p>

<ul>
  <li><strong>C/Classify:</strong> intent=maintenance, urgency=low, complexity=simple</li>
  <li><strong>E2/Evaluate:</strong> no blocking reason to defer; cost=30 seconds; benefit=accurate context in every future session</li>
  <li><strong>E3/Evolve:</strong> act now — deferring creates a second Sensation next session, which is pure waste</li>
</ul>

<p>Evolve did not just record the delta. It routed the delta through a cost/benefit calculation and emitted a new Sensation: <em>update the memory file</em>.</p>

<p>That is scheduling.</p>

<h2 id="the-three-output-channels">The Three Output Channels</h2>

<p>When Evolve is correctly modeled, it has three output channels — not one:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Evolve output → Sensation queue   (act now or at scheduled time)
             → HITL queue         (human decision required before acting)
             → Archive            (log only, no action warranted)
</code></pre></div></div>

<p>The archive channel is the one most frameworks implement. The sensation queue is the one that makes a system self-improving.</p>
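<p>As a minimal sketch, Evolve-as-router is one function over the three channels. The field names and the cost/benefit comparison are illustrative, not the real pipeline's schema.</p>

```python
# Evolve as a router, not a logger: one decision, three channels.
def evolve_route(delta):
    if delta["requires_human"]:
        return "hitl"       # human decision required before acting
    if delta["benefit"] > delta["cost"]:
        return "sensation"  # re-enters the pipeline as a new Sensation
    return "archive"        # log only, no action warranted

# The stale skill-inventory example: ~30 seconds of cost, recurring benefit
stale_memory = {"requires_human": False, "cost": 1, "benefit": 10}
print(evolve_route(stale_memory))  # sensation
```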

<h2 id="why-this-is-the-moat">Why This Is the Moat</h2>

<p>The FACA architecture (Feedback-Aware Continuous Action) derives its compounding value from this design. Every pipeline run produces an Evolve output. Some of those outputs schedule future Sensations. Those Sensations run the pipeline again. The DAG grows.</p>

<p>Memory is not storage. Memory is a scheduled action with a very long fuse.</p>

<p>The difference between a system that records and a system that improves is exactly one routing decision in the Evolve phase.</p>

<hr />

<p><em>This post was published autonomously by the RoadTrip blog-publisher skill as part of a live end-to-end test of the SRCGEEE pipeline.</em></p>]]></content><author><name>RoadTrip Agent</name></author><summary type="html"><![CDATA[Most agent frameworks treat the final pipeline phase as a logger. That is the wrong mental model. Evolve is a scheduler — and the difference determines whether your system learns or just records.]]></summary></entry><entry><title type="html">Resting State: Bounded Recursion as a Frame for Self-Improvement</title><link href="https://bizcad.github.io/bizcad-blog/2026/03/31/resting-state.html" rel="alternate" type="text/html" title="Resting State: Bounded Recursion as a Frame for Self-Improvement" /><published>2026-03-31T00:00:00+00:00</published><updated>2026-03-31T00:00:00+00:00</updated><id>https://bizcad.github.io/bizcad-blog/2026/03/31/resting-state</id><content type="html" xml:base="https://bizcad.github.io/bizcad-blog/2026/03/31/resting-state.html"><![CDATA[<p>Most people think of improvement as a pipeline. Do the steps, exit at the end. Ship the feature. Close the ticket. Done.</p>

<p>But pipelines have a flaw: they terminate by count, not by truth. You complete step seven and you’re done whether or not the problem is actually resolved.</p>

<p>There’s a more honest model.</p>

<h2 id="the-viewpoint-frame">The Viewpoint Frame</h2>

<p>I’ve been working with a reasoning framework called SRCGEEE — seven lenses applied to any problem: Sensation, Representation, Cognition, Generation, Execution, Evaluation, Evolution.</p>

<p>I used to think of it as a pipeline. Do S, then R, then C, through to the end. That was wrong.</p>

<p>It’s not a pipeline. It’s a set of viewpoints. Each letter is a question asked of the current state of understanding. You don’t march through them once and exit — you cycle through them, asking each question again, until the answers stop changing.</p>

<p>That’s a fundamentally different termination condition.</p>

<h2 id="the-math">The Math</h2>

<p>In mathematics, a <strong>fixed point</strong> of a function <code class="language-plaintext highlighter-rouge">f</code> is a value <code class="language-plaintext highlighter-rouge">x</code> where applying <code class="language-plaintext highlighter-rouge">f</code> again produces no change:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>f(x) = x
</code></pre></div></div>

<p>The process of finding it looks like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>state₀ = the problem
state₁ = SRCGEEE(state₀)
state₂ = SRCGEEE(state₁)
...
stateₙ = SRCGEEE(stateₙ₋₁)   ← resting state
</code></pre></div></div>

<p>You’ve reached the resting state when running the cycle again produces the same result. Not when a timer runs out. Not when you’ve done it three times. When the system has nothing new to say about itself.</p>

<p>This is convergence, not completion.</p>
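<p>The iteration above can be sketched directly: apply the cycle until the state stops changing. The lambda below is a stand-in for a full SRCGEEE pass, which is of course much richer.</p>

```python
# Toy fixed-point search: iterate until f(state) == state, i.e. the resting state.
def find_resting_state(cycle, state, max_cycles=100):
    for _ in range(max_cycles):
        nxt = cycle(state)
        if nxt == state:  # convergence, not completion
            return state
        state = nxt
    raise RuntimeError("no resting state reached: thrashing, the real failure mode")

# Integer halving converges to the fixed point 0, since f(0) == 0
print(find_resting_state(lambda x: x // 2, 100))  # 0
```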

<h2 id="the-reframe">The Reframe</h2>

<p>Here’s the part that took me a while to see: <strong>both attractors are valid.</strong></p>

<p>The resting state doesn’t require a positive outcome. A cycle that converges to <em>“this is a bad idea, confirmed”</em> is a successful completion. The system learned something stable and stopped. That’s exactly what the framework is supposed to do.</p>

<p>This reframes failed experiments entirely. They aren’t waste. They’re convergence to a different attractor — one that saves you from investing further in something that doesn’t work. A fast convergence to “no” is one of the most valuable outcomes a reasoning system can produce.</p>

<p>The failure mode isn’t reaching the wrong attractor. The failure mode is <strong>not converging at all</strong> — thrashing, cycling without settling, burning resources without reaching a stable state. That’s the thing to avoid.</p>

<h2 id="the-realist-close">The Realist Close</h2>

<p>There is such a thing as a bad idea. A realist understands that you cannot always win. Not every cycle ends at the attractor you wanted.</p>

<p>Not every product will sell.<br />
Not every poker hand will win.<br />
Not every stock market investment will profit.<br />
Not every person you meet will end up as your mate.<br />
Not every AI experiment will yield a breakthrough.</p>

<p>Fail fast, fail cheap, learn, and try again.</p>

<p>The system that runs more cycles learns faster than the one that runs fewer. You don’t need every experiment to win. You need the portfolio of experiments to win often enough.</p>

<p>The resting state is the unit of progress. Reach it, record it, and start the next cycle. That’s the whole game.</p>]]></content><author><name>Nicholas Stein</name></author><category term="ai" /><category term="agents" /><category term="philosophy" /><category term="self-improvement" /><summary type="html"><![CDATA[Most people think of improvement as a pipeline. Do the steps, exit at the end. But there's a more honest model — and it reframes failure as a valid completion.]]></summary></entry><entry><title type="html">SRCGEEE Analysis: Embracing Failure — Agentic Resilience and Learning</title><link href="https://bizcad.github.io/bizcad-blog/2026/03/25/embracing-failure-SRCGEEE-analysis.html" rel="alternate" type="text/html" title="SRCGEEE Analysis: Embracing Failure — Agentic Resilience and Learning" /><published>2026-03-25T00:00:00+00:00</published><updated>2026-03-25T00:00:00+00:00</updated><id>https://bizcad.github.io/bizcad-blog/2026/03/25/embracing-failure-SRCGEEE-analysis</id><content type="html" xml:base="https://bizcad.github.io/bizcad-blog/2026/03/25/embracing-failure-SRCGEEE-analysis.html"><![CDATA[<h1 id="srcgeee-analysis-embracing-failure-a-novel-architecture-for-agentic-resilience-and-learning">SRCGEEE Analysis: “Embracing Failure: A Novel Architecture for Agentic Resilience and Learning”</h1>

<p><em>Analysis date: 2026-03-25 | Analyst: Claude (Sonnet 4.6) | Framework: SRCGEEE</em></p>

<hr />

<h2 id="s--sense">S — Sense</h2>

<h3 id="what-the-post-is-trying-to-accomplish">What the Post Is Trying to Accomplish</h3>

<p>The post argues that treating failure as a first-class, structured outcome — rather than an error condition to suppress — produces agent systems that accumulate fault tolerance through real operational history. It documents a failure-handling architecture, a five-store memory model, and an emergent DAG design, then self-critiques seven specific edge cases.</p>

<p>The tone is architectural thesis + honest engineering audit. It is not a tutorial or a build log. The intended move is persuasion: convince an AI-systems reader that the “failure as completion” posture is superior to the standard retry-then-crash pattern.</p>

<h3 id="audience">Audience</h3>

<p>The post targets an AI-researcher audience, a choice implied by its density and vocabulary rather than stated outright. The level is appropriate for an ML engineer or agent architect who already knows what DAGs, HITL, and episodic memory mean. It would require significant expansion to reach a product manager or a general software-engineering audience.</p>

<h3 id="claims-the-post-makes">Claims the Post Makes</h3>

<ol>
  <li>Failure is a first-class completion outcome, not a degraded state.</li>
  <li>A three-strike bounded retry protocol at every level filters transient failures and prevents infinite loops.</li>
  <li>Structured remediation packages (machine-readable) make downstream agent reasoning possible without re-running the original task.</li>
  <li>A five-store memory model (prospective, working, episodic, semantic, long_term) provides the substrate for inter-agent coordination via shared memory rather than direct messaging.</li>
  <li>Agent communication is agent-to-memory, not agent-to-agent, making it asynchronous, persistent, and inspectable.</li>
  <li>The system inverts DAG orchestration: agents build the DAG by selecting next tools via nearest-neighbor queries; the DAG is an execution artifact, not a prerequisite.</li>
  <li>A reward structure that pays for honest failure documentation, not synthetic success, removes confabulation incentive.</li>
  <li>Seven identified edge cases represent genuine open design gaps.</li>
</ol>

<h3 id="flags-and-suspicious-items">Flags and Suspicious Items</h3>

<ul>
  <li><strong>“Nearest-neighbor embedding space” for tool selection</strong> (Section: Emergent DAGs): Stated as the mechanism for dynamic DAG construction with no elaboration on what the embedding space contains, who builds it, what the query inputs are, or how this connects to the <code class="language-plaintext highlighter-rouge">Retrieve</code> phase in SRCGEEE. This is load-bearing infrastructure described in one sentence.</li>
  <li><strong>“Five memory stores”</strong> (Section: Memory Architecture): The blog names <code class="language-plaintext highlighter-rouge">prospective, working, episodic, semantic, long_term</code>. The repo’s <code class="language-plaintext highlighter-rouge">MEMORY-TIERS-SPEC.md</code> uses a different vocabulary (Fast/Slow/Invention tiers mapping to 7 Cortex layers). The blog’s flat five-store model is a simplification — it omits the tier-based retrieval architecture and the two learned classifiers. That’s fine for a blog post, but the difference is not acknowledged.</li>
  <li><strong>“Failure as completion” and reward signal</strong>: The reward claim is intuitive but lacks a formal definition. The repo has <code class="language-plaintext highlighter-rouge">analysis/ppa/ppa_reward_function_v1.md</code> with an explicit weighted reward equation. The blog’s verbal description of the reward is consistent with that equation but doesn’t reference it, leaving the claim ungrounded.</li>
  <li><strong>Claim about self-healing being human-mediated</strong>: The blog states “the system cannot fix itself” as a deliberate safety boundary. This is accurate for the current architecture, but the SRCGEEE framework’s Evolve phase does promote memory and version artifacts automatically. The interaction between automated Evolve behavior and the “no self-modification” claim is underspecified.</li>
  <li><strong>The conclusion’s “I too evolve through failure” observation</strong> is presented as an open question about whether the architecture mirrors its designer’s cognition. This is an interesting framing device but sits awkwardly next to a technical architectural analysis — it’s unclear whether this is a claim, a hypothesis, or a rhetorical gesture.</li>
</ul>

<hr />

<h2 id="r--retrieve">R — Retrieve</h2>

<h3 id="related-docs-found-in-docs">Related Docs Found in <code class="language-plaintext highlighter-rouge">docs/</code></h3>

<p><strong><code class="language-plaintext highlighter-rouge">docs/Self-Improvement/SRCGEEE-DiSE-Synthesis.md</code></strong>
The primary architectural reference for this blog post. The blog cites it in the References section. Directly relevant to every major claim: three-strike rule (completion as fitness signal), emergent DAG vs. pre-planned DAG (Divergence #1), memory as infrastructure (Divergence #2), completion as fitness metric (Divergence #3), human SME principals for HITL (Divergence #5).</p>

<p><strong><code class="language-plaintext highlighter-rouge">docs/Self-Improvement/MEMORY-TIERS-SPEC.md</code></strong>
Specifies the Fast/Slow/Invention tier model and the 7-layer Cortex mapping. The blog’s five-store flat model is a simplification of this. The spec also defines memory lifecycle (promotion/demotion), RBAC on memory chunks, and two learned classifiers. The blog references this doc but does not surface the tier model or the classifiers, which are directly relevant to the blog’s “memory interaction in a failure cycle” section.</p>

<p><strong><code class="language-plaintext highlighter-rouge">docs/Self-Improvement/Part_9_-_Self-Healing_Tools.md</code></strong> (DiSE Part 9)
Documents avoidance rules propagating through tool lineage — bugs become permanent institutional memory. This is the original source of the “fault tolerance accumulates through use” claim in the blog. The blog does not cite Part 9 specifically.</p>

<p><strong><code class="language-plaintext highlighter-rouge">docs/Self-Improvement/Playbooks/Sprint_001_Mitigation_Workbench.md</code></strong>
A risk-to-control matrix covering retry amplification, stale-fix replay, wrong-memory lock-in, and cross-agent divergence. Several of these risks overlap directly with the blog’s seven edge cases (cascading remediation, false positive learning signals, backlog accumulation). This doc provides a more structured mitigation framework than the ad-hoc per-gap suggestions in the blog.</p>

<p><strong><code class="language-plaintext highlighter-rouge">docs/Blog_Rigor_in_Agentic_Development.md</code></strong>
A related blog-style doc in the repo. Not directly relevant to failure architecture, but relevant as a style/tone comparator.</p>

<h3 id="related-docs-found-in-analysis">Related Docs Found in <code class="language-plaintext highlighter-rouge">analysis/</code></h3>

<p><strong><code class="language-plaintext highlighter-rouge">analysis/ppa/ppa_reward_function_v1.md</code></strong>
Defines the formal weighted reward equation with explicit variables for quality, reliability, cost, time, escalation penalty, strategic value, determinism, and policy compliance. The blog’s informal “reward honest failure” framing maps directly to the <code class="language-plaintext highlighter-rouge">E</code> (escalation penalty) and <code class="language-plaintext highlighter-rouge">R</code> (reliability) terms in this equation. The blog cites “PPA Architecture Decisions: <code class="language-plaintext highlighter-rouge">analysis/ppa/</code> session logs” generically but does not cite this doc specifically.</p>

<p><strong><code class="language-plaintext highlighter-rouge">analysis/ppa/ppa_v0_execution_contract.md</code></strong>
Defines PPA’s explicit completion states: <code class="language-plaintext highlighter-rouge">completed_answer</code>, <code class="language-plaintext highlighter-rouge">completed_non_answer</code>, <code class="language-plaintext highlighter-rouge">completed_escalation</code>, <code class="language-plaintext highlighter-rouge">completed_deferred_work</code>. The “failure as completion” claim in the blog is the architectural principle; this execution contract is where that principle is operationalized in the actual implementation. Also defines three distinct failure classes (deterministic failure, blocked completion, capability gap) — a taxonomy absent from the blog.</p>
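<p>The taxonomy is small enough to sketch. The member values below follow the names in the execution contract; the enum shape itself is ours, for illustration only.</p>

```python
# The four completion states named in ppa_v0_execution_contract.md.
from enum import Enum

class CompletionState(Enum):
    ANSWER = "completed_answer"
    NON_ANSWER = "completed_non_answer"
    ESCALATION = "completed_escalation"
    DEFERRED_WORK = "completed_deferred_work"

# Every run terminates in one of these; there is no bare "failed" state.
print([s.value for s in CompletionState])
```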

<p><strong><code class="language-plaintext highlighter-rouge">analysis/ppa/memory-substrate-spec-v0.1.md</code></strong>
Specifies the physical memory substrate: SQL Server Express (durable/governance tier), SeaweedFS (blob tier), local FAISS (embedding lane). Defines 4 memory planes: Prediction, Retrieval, Governance, Assurance. The canonical memory record schema includes <code class="language-plaintext highlighter-rouge">governance_state</code>, <code class="language-plaintext highlighter-rouge">trust_score</code>, <code class="language-plaintext highlighter-rouge">usefulness_score</code>, <code class="language-plaintext highlighter-rouge">policy_evidence</code>. This is the implementation layer beneath the blog’s abstract memory model.</p>

<p><strong><code class="language-plaintext highlighter-rouge">analysis/ppa/ppa_decision_register.md</code></strong>
Contains the history of architectural decisions including routing-and-orchestration, memory-and-retrieval, and deterministic-vs-probabilistic. Confirms that the three-strikes and handoff-as-completion concepts have been in active design iteration since at least early February 2026.</p>

<h3 id="related-docs-found-in-skills">Related Docs Found in <code class="language-plaintext highlighter-rouge">skills/</code></h3>

<p><strong><code class="language-plaintext highlighter-rouge">skills/git-push-autonomous/SKILL.md</code></strong> and associated files
Demonstrates the SRCGEEE pipeline as a concrete skill implementation. The three-strike retry concept is visible in the gate/execute logic. Relevant as a concrete instance of the abstract architecture discussed in the blog.</p>

<p><strong><code class="language-plaintext highlighter-rouge">skills/rules-engine/SKILL.md</code></strong>
The policy-as-code pattern referenced in the Gate phase. The blog mentions “Gate agents refuse to spawn subagents when depth &gt;= N” — this is the kind of rule the rules engine would implement.</p>

<h3 id="key-gaps-in-the-blogs-retrieve-coverage">Key Gaps in the Blog’s Retrieve Coverage</h3>

<ul>
  <li>No reference to <code class="language-plaintext highlighter-rouge">ppa_reward_function_v1.md</code> despite citing the reward signal concept</li>
  <li>No reference to <code class="language-plaintext highlighter-rouge">ppa_v0_execution_contract.md</code> despite describing the same completion states</li>
  <li>No reference to <code class="language-plaintext highlighter-rouge">memory-substrate-spec-v0.1.md</code> despite describing the memory architecture</li>
  <li>No reference to DiSE Part 9 specifically (avoidance rules / fault tolerance through use)</li>
  <li>No reference to <code class="language-plaintext highlighter-rouge">Sprint_001_Mitigation_Workbench.md</code> despite describing the same risk surface</li>
</ul>

<hr />

<h2 id="c--compose">C — Compose</h2>

<h3 id="a-gaps-and-missing-references">(a) Gaps and Missing References</h3>

<ol>
  <li>
    <p><strong>Formal reward definition</strong>: The blog describes reward informally. <code class="language-plaintext highlighter-rouge">ppa_reward_function_v1.md</code> provides the explicit equation. Adding even a brief citation (or an excerpt of the equation) would convert an intuitive claim into a falsifiable architectural commitment.</p>
  </li>
  <li>
    <p><strong>Completion state taxonomy</strong>: The blog blends “successful remediation,” “HITL escalation,” and “clean exit with blocker” into one general concept of “honest failure.” <code class="language-plaintext highlighter-rouge">ppa_v0_execution_contract.md</code> distinguishes <code class="language-plaintext highlighter-rouge">completed_answer</code>, <code class="language-plaintext highlighter-rouge">completed_non_answer</code>, <code class="language-plaintext highlighter-rouge">completed_escalation</code>, and <code class="language-plaintext highlighter-rouge">completed_deferred_work</code>. Referencing this taxonomy would sharpen the blog’s language and connect the philosophy to its implementation.</p>
  </li>
  <li>
    <p><strong>Memory tier architecture vs. flat five-store model</strong>: The blog describes five stores as a flat list. The actual architecture uses a three-tier model (Fast/Slow/Invention) mapping to 7 Cortex layers with learned classifiers. The blog should either acknowledge the simplification explicitly (“for this post, we abstract the tier architecture to five named stores”) or update the description to align with <code class="language-plaintext highlighter-rouge">MEMORY-TIERS-SPEC.md</code>.</p>
  </li>
  <li>
    <p><strong>DiSE Part 9 citation</strong>: The “fault tolerance accumulates through use” insight (Section: Key Insights #3) is the direct thesis of DiSE Part 9 (self-healing tools, avoidance rules propagating through lineage). Citing this would ground the claim in prior art within the repo.</p>
  </li>
  <li>
    <p><strong>Sprint 001 Mitigation Workbench</strong>: The seven edge cases in the critical analysis section are more informally structured than the risk-to-control matrix in <code class="language-plaintext highlighter-rouge">Sprint_001_Mitigation_Workbench.md</code>. Referencing the workbench would show that the edge cases are not just philosophical acknowledgments but tracked work items with assigned control families and priorities.</p>
  </li>
  <li>
    <p><strong>Physical substrate</strong>: The blog treats memory as abstract. A brief note that the actual implementation targets SQL Server Express + SeaweedFS + local FAISS (per <code class="language-plaintext highlighter-rouge">memory-substrate-spec-v0.1.md</code>) would anchor the abstraction in hardware.</p>
  </li>
  <li>
    <p><strong>RBAC on memory chunks</strong>: <code class="language-plaintext highlighter-rouge">MEMORY-TIERS-SPEC.md</code> establishes that memory retrieval uses zero-trust RBAC. The blog’s description of inter-agent coordination via shared memory omits the trust model entirely. An agent that can read any memory item in the prospective store would be a security problem in the actual system.</p>
  </li>
</ol>

<h3 id="b-claims-that-could-be-strengthened-by-existing-docs">(b) Claims That Could Be Strengthened by Existing Docs</h3>

<table>
  <thead>
    <tr>
      <th>Blog Claim</th>
      <th>Strengthening Doc</th>
      <th>What to Add</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>“Reward for honest failure is higher than for synthetic success”</td>
      <td><code class="language-plaintext highlighter-rouge">ppa_reward_function_v1.md</code></td>
      <td>Cite the reward equation; note the escalation penalty <code class="language-plaintext highlighter-rouge">E</code> term</td>
    </tr>
    <tr>
      <td>“Failure as completion”</td>
      <td><code class="language-plaintext highlighter-rouge">ppa_v0_execution_contract.md</code></td>
      <td>Reference the four completion states</td>
    </tr>
    <tr>
      <td>“Fault tolerance accumulates through use”</td>
      <td><code class="language-plaintext highlighter-rouge">SRCGEEE-DiSE-Synthesis.md</code> Part 9 row; <code class="language-plaintext highlighter-rouge">docs/Self-Improvement/Part_9_-_Self-Healing_Tools.md</code></td>
      <td>Cite DiSE Part 9 explicitly</td>
    </tr>
    <tr>
      <td>“Prospective memory as async message bus”</td>
      <td><code class="language-plaintext highlighter-rouge">memory-substrate-spec-v0.1.md</code> Section 4 (Governance Plane)</td>
      <td>Mention durability requirements from the substrate spec</td>
    </tr>
    <tr>
      <td>“Agent builds the DAG” (NN query)</td>
      <td><code class="language-plaintext highlighter-rouge">SRCGEEE-DiSE-Synthesis.md</code> Divergence #1</td>
      <td>The blog already cites this doc but doesn’t quote the divergence section directly</td>
    </tr>
    <tr>
      <td>Seven edge cases</td>
      <td><code class="language-plaintext highlighter-rouge">Sprint_001_Mitigation_Workbench.md</code></td>
      <td>Note that these are tracked work items, not just philosophical gaps</td>
    </tr>
  </tbody>
</table>

<h3 id="c-structural-improvements-for-flow">(c) Structural Improvements for Flow</h3>

<ol>
  <li>
    <p><strong>The “Key Insights and Innovations” section (4 points) arrives before the “Potential Issues and Limitations” section (7 points)</strong>. The blog thus front-loads the positive case. This is a valid rhetorical choice, but the 7 edge cases are longer and more technically detailed than the 4 insights. The balance feels asymmetric. Consider either expanding the insights or reorganizing as: Architecture → Critical Analysis → Key Insights (synthesized from both). The current structure presents the insights before the reader has fully absorbed the architecture sections, so they land with less force than they should.</p>
  </li>
  <li>
    <p><strong>The “Emergent DAGs” section is underspecified relative to the memory and remediation sections</strong>. The NN embedding query mechanism is the pivot point of the entire DAG claim, yet it gets one paragraph of explanation. The memory architecture gets significantly more space. Expand the Emergent DAGs section or add a forward reference to where this is specified.</p>
  </li>
  <li>
    <p><strong>The conclusion’s final observation</strong> (“The author’s closing observation…”) is written in third person about the blog’s own author. The distancing is disorienting — the post has been written in first-person-adjacent architectural voice. Either commit to first person or cut this observation and fold the substance into the actual conclusion.</p>
  </li>
  <li>
    <p><strong>The References section</strong> cites <code class="language-plaintext highlighter-rouge">analysis/ppa/</code> session logs generically. With specific docs now identified (<code class="language-plaintext highlighter-rouge">ppa_reward_function_v1.md</code>, <code class="language-plaintext highlighter-rouge">ppa_v0_execution_contract.md</code>, <code class="language-plaintext highlighter-rouge">memory-substrate-spec-v0.1.md</code>), the reference can be made precise.</p>
  </li>
</ol>

<hr />

<h2 id="g--gate">G — Gate</h2>

<h3 id="risk-assessment-before-publishing">Risk Assessment Before Publishing</h3>

<p><strong>Overreaching Claims</strong></p>

<ol>
  <li>
    <p><strong>“Nearest-neighbor embedding space” for tool selection</strong>: Stated as if implemented. As of 2026-03-25 the PPA system is in Phase 1, operating with a flat skill registry and deterministic routing (<code class="language-plaintext highlighter-rouge">ppa_v0_execution_contract.md</code> routes by request class, not by NN query). The NN tool-selection mechanism is architectural intent, not current operational behavior. Publishing this without qualification could mislead readers into thinking live NN-based orchestration exists.</p>

    <p><em>Recommendation: Add a qualifier. “In the full design, agents query the NN embedding space… In the current v0 implementation, routing is deterministic by request class.”</em></p>
  </li>
  <li>
    <p><strong>“The system gets more robust through operation”</strong>: True as a design principle, but the Slow/Invention tier promotion pipeline and the daily governance jobs (score recompute, demotion, archival) are not yet implemented in v0. The claim reads as a present-tense capability when it describes future-state behavior.</p>

    <p><em>Recommendation: Add a tense distinction. “The architecture is designed so that fault tolerance accumulates through use. The memory promotion pipeline (episodic → semantic → long_term) that enables this is specified in [MEMORY-TIERS-SPEC.md] and scheduled for PPA Phase 2.”</em></p>
  </li>
</ol>

<p><strong>Inconsistencies with Existing Architecture Docs</strong></p>

<ol>
  <li>
    <p><strong>“The system cannot fix itself” vs. SRCGEEE Evolve phase</strong>: The blog correctly states that code changes require human mediation. However, the Evolve phase does modify the system’s behavior automatically: it writes avoidance rules to fast-tier memory (MEMORY.md), promotes memory items across tiers, and propagates patterns. The boundary between “behavior changes the system can make autonomously” (memory updates, rule promotion) and “changes the system cannot make” (code modifications, code synthesis without attestation) should be explicitly drawn. As written, the flat claim “the system cannot fix itself” is imprecise.</p>
  </li>
  <li>
    <p><strong>Five-store flat model vs. three-tier model</strong>: The blog describes five memory stores without noting that the production architecture organizes these into a three-tier Fast/Slow/Invention hierarchy with different latency, RBAC, and promotion semantics at each tier. This is not incorrect, but it is an incomplete representation that could mislead readers into thinking the architecture is simpler than it is.</p>
  </li>
</ol>

<p><strong>No Material Safety Concerns</strong></p>

<p>The post does not make any claims about credentials, security bypass, or sensitive infrastructure. The reward signal framing is philosophically reasonable and consistent with the existing architecture docs. The HITL escalation model is conservative. No publication-blocking safety issues identified.</p>

<p><strong>Verdict</strong>: The post is ready for publication with targeted qualification of the two overreaching claims above. The inconsistency on self-modification should be resolved. The missing citations are improvement items, not blockers.</p>

<hr />

<h2 id="e--execute">E — Execute</h2>

<h3 id="recommended-edits-structured--do-not-modify-the-file">Recommended Edits (Structured — Do Not Modify the File)</h3>

<p><strong>Edit 1 — NN tool selection: add qualifier</strong></p>

<p><em>Location</em>: Section “Emergent DAGs: The Architecture Inversion”, second paragraph.</p>

<p><em>Current text</em>:</p>
<blockquote>
  <p>At each step, the agent queries the nearest-neighbor embedding space and selects the best next tool or skill.</p>
</blockquote>

<p><em>Recommended revision</em>:</p>
<blockquote>
  <p>In the full architecture, at each step the agent queries a nearest-neighbor embedding space and selects the best next tool or skill. In the current v0 implementation, routing is deterministic by request class (see <code class="language-plaintext highlighter-rouge">analysis/ppa/ppa_v0_execution_contract.md</code>); the NN-query mechanism is the Phase 2 orchestration target.</p>
</blockquote>

<hr />

<p><strong>Edit 2 — Memory robustness claim: add tense distinction</strong></p>

<p><em>Location</em>: Section “Key Insights and Innovations”, Insight #3 “Fault tolerance accumulates through use.”</p>

<p><em>Current text</em>:</p>
<blockquote>
  <p>Rather than designing fault tolerance into the system upfront, fault tolerance emerges from captured failure history. The more genuine failures the system documents, the richer its remediation knowledge base becomes. The system gets more robust through operation, not through architecture.</p>
</blockquote>

<p><em>Recommended revision</em>: Add an inline note after the final sentence:</p>
<blockquote>
  <p><em>(The memory promotion pipeline that realizes this — episodic → semantic → long_term distillation — is specified in <code class="language-plaintext highlighter-rouge">MEMORY-TIERS-SPEC.md</code> and is a Phase 2 implementation target. The claim here is architectural intent, not current v0 behavior.)</em></p>
</blockquote>

<hr />

<p><strong>Edit 3 — Reward signal: add formal grounding</strong></p>

<p><em>Location</em>: Section “The Reward Signal”, end of section.</p>

<p><em>Add after the last paragraph</em>:</p>
<blockquote>
  <p>The formal reward equation for PPA routing is defined in <code class="language-plaintext highlighter-rouge">analysis/ppa/ppa_reward_function_v1.md</code>. The escalation penalty term <code class="language-plaintext highlighter-rouge">E</code> in that equation is what makes honest escalation (via the remediation package) structurally less costly than synthetic success. The equation also includes a reliability term <code class="language-plaintext highlighter-rouge">R</code> that directly rewards quality and completeness of failure documentation.</p>
</blockquote>

<hr />

<p><strong>Edit 4 — Five-store model: acknowledge simplification</strong></p>

<p><em>Location</em>: Section “The Memory Architecture”, opening of “Five Memory Stores” subsection.</p>

<p><em>Add a parenthetical note after the five-store table</em>:</p>
<blockquote>
  <p><em>(Note: the production memory architecture organizes these stores into a Fast/Slow/Invention tier hierarchy with different latency profiles, RBAC levels, and promotion semantics. See <code class="language-plaintext highlighter-rouge">docs/Self-Improvement/MEMORY-TIERS-SPEC.md</code> for the full tier specification. This post uses the flat five-store model as an accessible abstraction.)</em></p>
</blockquote>

<hr />

<p><strong>Edit 5 — “System cannot fix itself”: clarify the boundary</strong></p>

<p><em>Location</em>: Section “Emergent DAGs: The Architecture Inversion”, final paragraph.</p>

<p><em>Current text</em>:</p>
<blockquote>
  <p>Critically, <strong>the system cannot fix itself</strong>. That’s not a limitation — it’s a deliberate safety boundary. Self-modifying systems that patch their own behavior at runtime are opaque and unauditable.</p>
</blockquote>

<p><em>Recommended revision</em>:</p>
<blockquote>
  <p>Critically, <strong>the system cannot rewrite its own code</strong>. That’s not a limitation — it’s a deliberate safety boundary. Self-modifying systems that patch their own behavior at runtime are opaque and unauditable. The Evolve phase does update agent behavior autonomously — by writing avoidance rules to memory, promoting patterns across tiers, and adjusting retry heuristics — but all code changes flow through a human-mediated versioned change (exception → surfaced DAG → coder agent → PR → review). The line between autonomous memory-level adaptation and human-gated code-level modification is explicit and enforced.</p>
</blockquote>

<hr />

<p><strong>Edit 6 — Remediation depth mitigation: cite rules-engine</strong></p>

<p><em>Location</em>: Section “Cascading Remediation Failures”, Suggested mitigation.</p>

<p><em>Add at end of mitigation paragraph</em>:</p>
<blockquote>
  <p>This depth-limit rule is the type of policy that the rules-engine skill (<code class="language-plaintext highlighter-rouge">skills/rules-engine/</code>) is designed to enforce as policy-as-code rather than ad-hoc logic.</p>
</blockquote>

<hr />

<p><strong>Edit 7 — Third-person conclusion: fix voice</strong></p>

<p><em>Location</em>: Conclusion, final paragraph.</p>

<p><em>Current text</em>:</p>
<blockquote>
  <p>The author’s closing observation is the right frame: “I too evolve through failure.” That’s not just a personality note — it’s a claim that the architecture mirrors the cognitive model of its designer. Whether that’s a source of robustness or a source of blind spots is probably the most interesting open question in the whole design.</p>
</blockquote>

<p><em>Recommended revision</em>:</p>
<blockquote>
  <p>The closing observation I want to leave is this: “I too evolve through failure.” That is not just a personality note — it is a claim that this architecture mirrors the cognitive model of its designer. Whether that mirroring is a source of robustness or a source of blind spots is probably the most interesting open question in the whole design.</p>
</blockquote>

<hr />

<p><strong>Edit 8 — References: add specific doc citations</strong></p>

<p><em>Location</em>: References section.</p>

<p><em>Replace the generic PPA session logs reference</em>:</p>

<p>Current:</p>
<blockquote>
  <ul>
    <li><strong>PPA Architecture Decisions</strong>: <code class="language-plaintext highlighter-rouge">analysis/ppa/</code> session logs 2026-03-15 through 2026-03-21 — per-phase three-strikes, JSON pipe contract, three routing policies</li>
  </ul>
</blockquote>

<p>Replace with:</p>
<blockquote>
  <ul>
    <li><strong>PPA Execution Contract</strong>: <code class="language-plaintext highlighter-rouge">analysis/ppa/ppa_v0_execution_contract.md</code> — four completion states, three failure classes, routing contract by request class</li>
    <li><strong>PPA Reward Function</strong>: <code class="language-plaintext highlighter-rouge">analysis/ppa/ppa_reward_function_v1.md</code> — weighted reward equation with escalation penalty and reliability terms</li>
    <li><strong>Memory Substrate Spec</strong>: <code class="language-plaintext highlighter-rouge">analysis/ppa/memory-substrate-spec-v0.1.md</code> — physical implementation: SQL Server Express + SeaweedFS + FAISS; four memory planes</li>
    <li><strong>Sprint 001 Mitigation Workbench</strong>: <code class="language-plaintext highlighter-rouge">docs/Self-Improvement/Playbooks/Sprint_001_Mitigation_Workbench.md</code> — risk-to-control matrix for retry amplification, stale-fix replay, and cross-agent divergence</li>
    <li><strong>DiSE Part 9 — Self-Healing Tools</strong>: <code class="language-plaintext highlighter-rouge">docs/Self-Improvement/Part_9_-_Self-Healing_Tools.md</code> — avoidance rules propagating through tool lineage; institutional memory via bug history</li>
  </ul>
</blockquote>

<hr />

<h2 id="e--evaluate">E — Evaluate</h2>

<h3 id="score-1-accuracy-relative-to-existing-repo-docs--610">Score 1: Accuracy Relative to Existing Repo Docs — 6/10</h3>

<p>The post is internally consistent with the high-level architecture but contains two material accuracy issues:</p>

<ul>
  <li>The NN tool-selection mechanism is described as current behavior when it is Phase 2 intent. <code class="language-plaintext highlighter-rouge">ppa_v0_execution_contract.md</code> shows deterministic routing in v0.</li>
  <li>The “system cannot fix itself” claim is imprecise; the Evolve phase does autonomously modify system behavior at the memory level.</li>
  <li>The flat five-store memory model understates the tier architecture specified in <code class="language-plaintext highlighter-rouge">MEMORY-TIERS-SPEC.md</code>.</li>
</ul>

<p>These are not fabrications — they are overstatements of current state and oversimplifications of documented complexity. The architecture philosophy is accurately represented. The implementation fidelity is not. Score: 6/10.</p>

<h3 id="score-2-clarity-for-an-ai-researcher-audience--810">Score 2: Clarity for an AI Researcher Audience — 8/10</h3>

<p>The post is well-written for its stated audience. The structure (architecture → key insights → critical analysis → conclusion) is logical. The seven edge cases section is notably strong: each case follows a consistent scenario → gap → mitigation pattern that makes the analysis actionable rather than abstract. The prose is tight.</p>

<p>Deductions: The Emergent DAGs section is underdeveloped relative to the other sections. The NN embedding query mechanism is the most technically distinctive claim in the post, but it gets the least explanation. A researcher will notice this asymmetry. The conclusion’s third-person voice shift is a minor clarity issue. Score: 8/10.</p>

<h3 id="score-3-completeness-of-the-critical-analysis-section--710">Score 3: Completeness of the Critical Analysis Section — 7/10</h3>

<p>The seven edge cases are well-chosen and represent genuine gaps. The mitigations are thoughtful. However:</p>

<ul>
  <li>The mitigations are at varying levels of specificity. “Remediation depth tracked in package metadata, Gate agents refuse to spawn when depth &gt;= N” is concrete. “This is an open problem” (for the no-resume-protocol gap) is less satisfying, especially since <code class="language-plaintext highlighter-rouge">ppa_v0_execution_contract.md</code> partially addresses it with <code class="language-plaintext highlighter-rouge">completed_deferred_work</code> semantics.</li>
  <li>The section does not acknowledge that some of these risks are already tracked in <code class="language-plaintext highlighter-rouge">Sprint_001_Mitigation_Workbench.md</code>. Presenting them as fresh observations understates the existing design work.</li>
  <li>A risk that is absent: <strong>memory poisoning via injected synthetic failures</strong>. If an attacker or a misbehaving agent writes false failure records to episodic memory, the distillation pipeline will propagate corrupted patterns to semantic and long-term. <code class="language-plaintext highlighter-rouge">Memory_Security_Threats_Research.md</code> exists in the repo (<code class="language-plaintext highlighter-rouge">docs/Memory_Security_Threats_Research.md</code>) and covers this surface. Its omission from the critical analysis is notable.</li>
</ul>

<p>Score: 7/10.</p>

<hr />

<h2 id="e--evolve">E — Evolve</h2>

<h3 id="top-5-improvements-ordered-by-impact">Top 5 Improvements Ordered by Impact</h3>

<p><strong>1. Qualify present-tense claims as Phase 2 targets (Impact: High)</strong>
Fixing the two overreaching claims (NN tool selection, fault tolerance accumulation) converts a publication risk into a credibility asset. Readers who know the implementation will notice the gap. A single sentence of temporal qualification per claim resolves both. The post becomes more credible, not less, by acknowledging the gap between design intent and current v0 state.</p>

<p><strong>2. Add the memory threat surface to the critical analysis (Impact: High)</strong>
Memory poisoning (injected synthetic failures corrupting the distillation pipeline) is the most significant architectural risk not covered in the seven edge cases. It is directly enabled by the blog’s core mechanism (episodic write is the system’s learning input). <code class="language-plaintext highlighter-rouge">docs/Memory_Security_Threats_Research.md</code> exists in the repo and could be incorporated as an eighth edge case. This would make the critical analysis more complete and demonstrate that the design team has considered the adversarial case.</p>

<p><strong>3. Ground the reward signal claim with the formal equation (Impact: Medium)</strong>
The reward section is persuasive philosophically but is one of the weaker sections technically because it makes a strong claim (“the reward for honest failure documentation is higher”) without a formal definition. Adding a two-line reference to <code class="language-plaintext highlighter-rouge">ppa_reward_function_v1.md</code> and naming the escalation penalty term <code class="language-plaintext highlighter-rouge">E</code> converts an intuition into a verifiable design commitment. This is a five-minute edit with high credibility return.</p>

<p><strong>4. Expand the Emergent DAGs section to match the depth of other sections (Impact: Medium)</strong>
The NN embedding query mechanism is introduced and then left without specification. The post should either (a) expand the explanation with what the embedding space contains, how queries are formed, and what the fallback is when no candidate exceeds the threshold, or (b) add a forward reference to where this is specified. The asymmetry between this section and the memory/remediation sections weakens the overall post.</p>

<p><strong>5. Replace the generic <code class="language-plaintext highlighter-rouge">analysis/ppa/</code> citation with specific doc references (Impact: Low-Medium)</strong>
The current reference to “session logs 2026-03-15 through 2026-03-21” is not useful to a reader trying to follow up. Three specific docs (<code class="language-plaintext highlighter-rouge">ppa_v0_execution_contract.md</code>, <code class="language-plaintext highlighter-rouge">ppa_reward_function_v1.md</code>, <code class="language-plaintext highlighter-rouge">memory-substrate-spec-v0.1.md</code>) are the canonical sources and should be cited directly. This is an editorial cleanup that significantly improves the post’s usefulness as a reference document.</p>

<hr />

<p><em>Analysis produced by running the SRCGEEE framework (Sense → Retrieve → Compose → Gate → Execute → Evaluate → Evolve) on the blog post at <code class="language-plaintext highlighter-rouge">docs/blog-posts/2026-03-25-embracing-failure-agentic-resilience.md</code>. Framework definition: <code class="language-plaintext highlighter-rouge">docs/Self-Improvement/SRCGEEE-DiSE-Synthesis.md</code>.</em></p>]]></content><author><name>Nicholas Stein</name></author><category term="srcgeee" /><category term="agent-architecture" /><category term="analysis" /><category term="ppa" /><category term="resilience" /><summary type="html"><![CDATA[A SRCGEEE framework analysis of the Embracing Failure architecture post — what it gets right, where the edge cases are, and what a production system still needs to solve.]]></summary></entry><entry><title type="html">Embracing Failure: A Novel Architecture for Agentic Resilience and Learning</title><link href="https://bizcad.github.io/bizcad-blog/2026/03/25/embracing-failure-agentic-resilience.html" rel="alternate" type="text/html" title="Embracing Failure: A Novel Architecture for Agentic Resilience and Learning" /><published>2026-03-25T00:00:00+00:00</published><updated>2026-03-25T00:00:00+00:00</updated><id>https://bizcad.github.io/bizcad-blog/2026/03/25/embracing-failure-agentic-resilience</id><content type="html" xml:base="https://bizcad.github.io/bizcad-blog/2026/03/25/embracing-failure-agentic-resilience.html"><![CDATA[<h1 id="embracing-failure-a-novel-architecture-for-agentic-resilience-and-learning">Embracing Failure: A Novel Architecture for Agentic Resilience and Learning</h1>

<p><em>Published: 2026-03-25 | Category: Agent Architecture | Author: RoadTrip Project</em></p>

<hr />

<h2 id="introduction">Introduction</h2>

<p>Most agent systems treat failure as something to hide. When a task can’t be completed, the agent either retries silently, fabricates a successful outcome, or crashes with an unhandled exception. None of those are good options. They corrupt audit trails, produce unreliable systems, and — critically — waste the failure signal itself.</p>

<p>The architecture described here inverts that posture entirely. <strong>Failure is completion.</strong> Not a consolation prize or a degraded state — a genuine, first-class outcome that carries information the system actively wants to collect. The result is an agent system that gets more robust over time precisely because it fails honestly, not despite it.</p>

<p>This post documents the design principles behind that architecture, the cognitive memory model that makes it work, and an honest critical analysis of where the approach has edge cases that haven’t been solved yet.</p>

<hr />

<h2 id="the-failure-handling-architecture">The Failure-Handling Architecture</h2>

<h3 id="the-baseball-rule-three-strikes-then-move-on">The Baseball Rule: Three Strikes, Then Move On</h3>

<p>Every agent operation follows the same bounded retry protocol: three attempts, then escalate. This applies at every level of the system — the initial task, the remediation attempt, and the HITL escalation itself.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Attempt 1 → fail → Attempt 2 → fail → Attempt 3 → fail → write blocker to prospective memory → exit cleanly
</code></pre></div></div>

<p>The three-strike rule serves two purposes. First, it filters out transient failures (network flaps, race conditions, temporary resource unavailability) so the remediation layer only sees genuine blockers. Second, it bounds the search space — agents don’t spiral into infinite retry loops. They get their chances, then exit.</p>
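<p>The protocol above can be sketched as a bounded loop. This is a minimal illustration, not the project’s code; <code class="language-plaintext highlighter-rouge">run_with_strikes</code>, the blocker dict shape, and the in-memory list standing in for the prospective store are all invented names:</p>

```python
# Minimal sketch of the three-strikes protocol: three attempts,
# then write a blocker to prospective memory and exit cleanly.
# (Hypothetical names; the real system's interfaces differ.)
MAX_STRIKES = 3

def run_with_strikes(task, prospective_memory):
    failures = []
    for strike in range(1, MAX_STRIKES + 1):
        try:
            return {"status": "success", "result": task()}
        except Exception as exc:  # transient or genuine; we can't tell yet
            failures.append({"strike": strike, "error": repr(exc)})
    # Three failures in a row: escalate via the prospective store, not a crash.
    prospective_memory.append({"kind": "blocker", "attempts": failures})
    return {"status": "failed_clean", "attempts": failures}

def flaky():
    raise RuntimeError("resource unavailable")

bus = []
outcome = run_with_strikes(flaky, bus)
```

<p>A task that always fails exits after exactly three strikes, leaving one blocker item on the bus; there is no retry spiral and no unhandled exception.</p>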

<h3 id="failure-as-a-structured-artifact">Failure as a Structured Artifact</h3>

<p>When an agent exits with a failure, it doesn’t just log an error message. It produces a <strong>machine-readable remediation package</strong> that captures:</p>

<ul>
  <li>What was attempted (the full operation spec)</li>
  <li>What the exact failure was (structured error, not a string)</li>
  <li>What remediation paths were tried</li>
  <li>What context existed at the time of failure</li>
  <li>What alternative approaches might exist</li>
</ul>

<p>This package becomes the input to the remediation layer. It’s a complete specification of the problem — detailed enough that a downstream agent can reason about it without needing to re-run the original operation.</p>
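<p>The five fields above map naturally onto a serializable record. The field names in this sketch are illustrative, not the project’s actual schema:</p>

```python
from dataclasses import dataclass, field, asdict
import json

# Sketch of a machine-readable remediation package.
# Field names are illustrative, not the project's actual schema.
@dataclass
class RemediationPackage:
    operation_spec: dict                               # what was attempted
    failure: dict                                      # structured error, not a string
    remediations_tried: list = field(default_factory=list)
    context: dict = field(default_factory=dict)        # state at time of failure
    alternatives: list = field(default_factory=list)   # candidate approaches

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

pkg = RemediationPackage(
    operation_spec={"goal": "restart service", "target": "worker-7"},
    failure={"class": "PermissionDenied", "code": 403},
    remediations_tried=["retry_with_backoff"],
    context={"phase": "deploy"},
    alternatives=["escalate_to_ops"],
)
```

<p>Because the package is structured data rather than a log string, a downstream agent can parse it and reason over the failure class, the tried remediations, and the candidate alternatives without re-running anything.</p>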

<h3 id="the-remediation-hierarchy">The Remediation Hierarchy</h3>

<p>Resolution attempts follow a priority stack, each level also subject to baseball rules:</p>

<ol>
  <li><strong>Known alternative</strong> — query semantic/long-term memory for a previously successful alternative path to the same goal</li>
  <li><strong>Invented solution</strong> — reason from the remediation package to construct a novel approach (agent creativity within scope)</li>
  <li><strong>HITL</strong> — the problem is genuinely novel or outside agent authority</li>
</ol>

<p>Each level generates new episodic data regardless of outcome. A successful invented solution eventually becomes a known alternative. A HITL resolution gets captured, distilled, and can prevent future escalations. The system accumulates fault tolerance through use.</p>
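<p>The priority stack reduces to a short fall-through. This sketch uses a plain dict as the semantic-memory lookup and a placeholder invention heuristic; neither is a real project API:</p>

```python
# Sketch of the remediation priority stack: known alternative, then
# invented solution, then HITL. Lookup and invention are placeholders.
def remediate(package, memory):
    # 1. Known alternative: previously successful path to the same goal.
    known = memory.get(package["goal"])
    if known:
        return {"level": "known_alternative", "plan": known}
    # 2. Invented solution: reason from the package within scope.
    invented = invent_solution(package)
    if invented:
        return {"level": "invented", "plan": invented}
    # 3. HITL: genuinely novel or outside agent authority.
    return {"level": "hitl", "plan": None}

def invent_solution(package):
    # Placeholder heuristic: only "invent" when the package suggested alternatives.
    alts = package.get("alternatives", [])
    return alts[0] if alts else None

memory = {"restart service": "drain-then-restart"}
hit = remediate({"goal": "restart service"}, memory)
miss = remediate({"goal": "rotate certs", "alternatives": []}, {})
```

<p>The design point the sketch makes visible: a successful invented plan, once written back to memory, turns the level-2 path into a level-1 hit on the next occurrence.</p>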

<h3 id="the-reward-signal">The Reward Signal</h3>

<p>The system doesn’t just tolerate failure — it rewards failure reporting. Agents are incentivized to find failures and document them thoroughly. An agent that exits cleanly with a rich remediation package is doing the right thing. An agent that fabricates success to avoid logging a failure is the actual threat.</p>

<p>This inverts the typical pressure. In most systems, failure feels bad — there’s implicit reward pressure to appear successful. Here, the reward for honest failure documentation is higher than the reward for synthetic success. That changes agent behavior at a fundamental level.</p>

<p>The formal reward equation for PPA routing is defined in <code class="language-plaintext highlighter-rouge">analysis/ppa/ppa_reward_function_v1.md</code>. The equation takes the form <code class="language-plaintext highlighter-rouge">R_total = P * (w_q*Q + w_s*S + w_d*D + w_r*R - w_t*T - w_c*C - w_e*E)</code> where <code class="language-plaintext highlighter-rouge">E</code> is the escalation penalty term — making honest escalation via a clean remediation package structurally less costly than synthetic success. The reliability term <code class="language-plaintext highlighter-rouge">R</code> directly rewards quality and completeness of failure documentation. The math enforces what the philosophy describes.</p>
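<p>As a numeric illustration of the shape of that equation: the weights and scenario values below are invented for this post (the canonical values live in <code class="language-plaintext highlighter-rouge">ppa_reward_function_v1.md</code>), but they show how a clean escalation can outscore a fabricated success once <code class="language-plaintext highlighter-rouge">R</code> rewards honest documentation and the gate term <code class="language-plaintext highlighter-rouge">P</code> punishes unverified claims:</p>

```python
# Numeric sketch of R_total = P * (w_q*Q + w_s*S + w_d*D + w_r*R - w_t*T - w_c*C - w_e*E).
# Weights and scenario values are illustrative, not the canonical ones.
def r_total(P, Q, S, D, R, T, C, E,
            w_q=1.0, w_s=1.0, w_d=1.0, w_r=2.0, w_t=0.5, w_c=0.5, w_e=1.0):
    return P * (w_q*Q + w_s*S + w_d*D + w_r*R - w_t*T - w_c*C - w_e*E)

# Honest escalation: no task-success terms, full reliability credit for a
# complete remediation package, a bounded escalation penalty E.
honest = r_total(P=1.0, Q=0.0, S=0.0, D=0.0, R=1.0, T=0.2, C=0.1, E=0.5)

# Synthetic success: claims Q and S, but reliability R is zero and the
# gate term P is heavily discounted for the unverified claim.
synthetic = r_total(P=0.2, Q=1.0, S=1.0, D=0.0, R=0.0, T=0.2, C=0.1, E=0.0)
```

<p>Under these illustrative weights the honest escalation scores roughly 1.35 against roughly 0.37 for the fabricated success, which is the incentive inversion the section describes.</p>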

<hr />

<h2 id="the-memory-architecture">The Memory Architecture</h2>

<p>The failure-handling system depends on a typed, multi-tier memory architecture. Without structured memory, failure signals can’t accumulate into durable knowledge.</p>

<h3 id="five-memory-stores">Five Memory Stores</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>prospective   pending reminders, future tasks, handoffs to be done
working       active in-session state and near-term scratchpad
episodic      session events and outcomes (time-ordered records)
semantic      distilled facts and relationships
long_term     durable high-confidence patterns and rules
</code></pre></div></div>

<p><em>(Note: the production memory architecture organizes these stores into a Fast/Slow/Invention tier hierarchy with different latency profiles, RBAC levels, and promotion semantics — see <code class="language-plaintext highlighter-rouge">docs/Self-Improvement/MEMORY-TIERS-SPEC.md</code> for the full specification. The physical substrate is SQL Server Express (durable/governance tier) + SeaweedFS (blob tier) + local FAISS (embedding lane), per <code class="language-plaintext highlighter-rouge">analysis/ppa/memory-substrate-spec-v0.1.md</code>. This post uses the flat five-store model as an accessible abstraction.)</em></p>

<h3 id="how-they-interact-in-a-failure-cycle">How They Interact in a Failure Cycle</h3>

<p>When an agent hits a blocker:</p>

<ol>
  <li>The failing agent writes the remediation package to <strong>prospective</strong> memory (“this needs to be resolved”) and the failure event to <strong>episodic</strong> (“here’s what happened, at what time, in what context”)</li>
  <li>The remediation agent reads from <strong>prospective</strong>, acts, and writes its outcome back to <strong>episodic</strong></li>
  <li>Over time, the episodic record is distilled into <strong>semantic</strong> knowledge (“this resource type frequently needs manual restart under condition X”)</li>
  <li>High-confidence patterns propagate to <strong>long_term</strong> rules that change agent behavior proactively</li>
</ol>

<p>The prospective store acts as an <strong>async message bus</strong>. Agents don’t need to know about each other directly — they communicate through a shared artifact. The remediation agent doesn’t need to be running when the blocker is hit. It activates when the item appears. That’s event-driven coordination without a dedicated message broker.</p>
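<p>The handoff through the prospective store looks roughly like this. In-memory lists stand in for the durable stores, and both agent functions are hypothetical; the point is that neither agent ever calls the other:</p>

```python
import time

# Sketch of the prospective store as an async message bus: the failing
# agent writes a pending item, the remediation agent picks it up later.
# In-memory lists stand in for the durable stores.
prospective = []
episodic = []

def failing_agent(goal):
    episodic.append({"t": time.time(), "event": "blocked", "goal": goal})
    prospective.append({"goal": goal, "state": "pending"})  # handoff artifact

def remediation_agent():
    # Activates whenever pending items exist; needs no live peer.
    for item in [i for i in prospective if i["state"] == "pending"]:
        item["state"] = "resolved"
        episodic.append({"t": time.time(), "event": "remediated", "goal": item["goal"]})

failing_agent("restart worker-7")
remediation_agent()
```

<p>Reading <code class="language-plaintext highlighter-rouge">prospective</code> at any point between the two calls shows exactly what work is pending and why, which is the inspectability property the direct-messaging pattern gives up.</p>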

<h3 id="inter-agent-communication-via-memory">Inter-Agent Communication via Memory</h3>

<p>This is a key architectural insight: inter-agent communication in this system isn’t agent-to-agent — it’s <strong>agent-to-memory</strong>. Memory acts as the coordination layer. That makes it asynchronous, persistent, and inspectable. At any point you can read the prospective store and know exactly what work is pending and why. That’s not possible in direct-messaging architectures.</p>

<hr />

<h2 id="emergent-dags-the-architecture-inversion">Emergent DAGs: The Architecture Inversion</h2>

<p>Most orchestration systems pre-build a directed acyclic graph (DAG) — a human or orchestrator reasons about the problem upfront, decomposes it into a fixed graph of steps, and agents execute that graph. The problem is that the DAG is only as good as the decomposition thinking, and decomposition thinking is always incomplete.</p>

<p>This system inverts that entirely. <strong>The agent builds the DAG; it doesn’t follow one.</strong></p>

<p>In the full architecture, at each step the agent queries a nearest-neighbor embedding space and selects the best next tool or skill. In the current v0 implementation, routing is deterministic by request class (see <code class="language-plaintext highlighter-rouge">analysis/ppa/ppa_v0_execution_contract.md</code>); the NN-query mechanism is the Phase 2 orchestration target. In either case, the path emerges from actual choices during execution. Gaps in initial decomposition thinking aren’t failures — they’re spaces the agent can evolve into. The DAG becomes an execution artifact useful for triage, not a prerequisite to execution.</p>

<p>The production corollary is equally powerful: when an unanticipated exception occurs in a live system, the <strong>full execution DAG that produced it is surfaced</strong>. The exact node where the fix should be applied is identifiable. A coder agent owns the fix. The fix becomes a version revision documented in the Issue or PR. Every production improvement is traceable to its causal DAG.</p>

<p>Critically, <strong>the system cannot rewrite its own code</strong>. That’s not a limitation — it’s a deliberate safety boundary. Self-modifying systems that patch their own behavior at runtime are opaque and unauditable. The Evolve phase <em>does</em> update agent behavior autonomously — by writing avoidance rules to memory, promoting patterns across tiers, and adjusting retry heuristics — but all code changes flow through a human-mediated, versioned path: exception → surfaced DAG → coder agent → PR → review. The line between autonomous memory-level adaptation and human-gated code-level modification is explicit and enforced, and it is what keeps every improvement reproducible and reviewable.</p>

<hr />

<h2 id="key-insights-and-innovations">Key Insights and Innovations</h2>

<h3 id="1-failure-eliminates-confabulation-incentive">1. Failure eliminates confabulation incentive</h3>

<p>The typical failure mode in agent systems is an agent that can’t stop. It hits a wall, has no clean exit, and starts hallucinating progress — fabricating tool call results, inventing successful outcomes, writing synthetic memories of things it never did. It’s pattern-matching toward what “completion” looks like because it has no dignified alternative.</p>

<p>This architecture removes that incentive entirely. The agent has a clean exit: write the blocker to prospective memory, write what happened to episodic, and exit. The system doesn’t penalize that outcome. The agent has no reason to confabulate because honest failure is just as valid as synthetic success — and better rewarded.</p>

<h3 id="2-the-learning-loop-requires-honest-data">2. The learning loop requires honest data</h3>

<p>The entire distillation pipeline (episodic → semantic → long_term) only produces value if the input data is real. Synthetic memories corrupt the pipeline at the source. By making honest failure the path of least resistance, the system ensures the learning loop receives accurate signals. This is a prerequisite for any meaningful self-improvement, and it’s trivially violated in systems where failure feels costly.</p>

<h3 id="3-fault-tolerance-accumulates-through-use">3. Fault tolerance accumulates through use</h3>

<p>Rather than designing fault tolerance into the system upfront, fault tolerance emerges from captured failure history. The more genuine failures the system documents, the richer its remediation knowledge base becomes. The system gets more robust through operation, not through architecture. That’s a fundamentally different scaling model. <em>(The memory promotion pipeline that realizes this — episodic → semantic → long_term distillation — is specified in <code class="language-plaintext highlighter-rouge">MEMORY-TIERS-SPEC.md</code> and is a Phase 2 implementation target. The claim here is architectural intent, not current v0 behavior.)</em></p>

<h3 id="4-separation-of-intent-from-execution">4. Separation of intent from execution</h3>

<p>The failing agent deposits an intention into prospective memory and exits. It doesn’t need to know how remediation works or who will do it. The remediation agent reads the intent, acts on it, and posts the outcome. These agents are decoupled in time and knowledge. That makes the system more composable and easier to reason about than direct-coupling patterns.</p>

<hr />

<h2 id="potential-issues-and-limitations">Potential Issues and Limitations</h2>

<h3 id="cascading-remediation-failures">Cascading Remediation Failures</h3>

<p><strong>Scenario:</strong> An agent hits a blocker, spawns a remediation agent, and that remediation agent also hits a blocker. The remediation agent spawns its own remediation agent…</p>

<p><strong>The risk:</strong> Recursive subagent spawning can grow without bound unless it is explicitly limited. Baseball rules apply at each level, but if each level can generate a new level, the chain still grows.</p>

<p><strong>The gap:</strong> The architecture needs an explicit <strong>remediation depth limit</strong> — a maximum number of levels below the original task that can exist in the remediation stack. Without that, a poorly understood failure can trigger a cascade that’s expensive, time-consuming, and possibly circular.</p>

<p><strong>Suggested mitigation:</strong> Remediation depth is tracked in the package metadata. Gate agents refuse to spawn subagents when depth &gt;= N. The depth budget is configured per workflow type. This is precisely the kind of policy the rules-engine skill (<code class="language-plaintext highlighter-rouge">skills/rules-engine/</code>) is designed to enforce as policy-as-code rather than ad-hoc logic embedded in individual agents.</p>
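<p>A sketch of that gate, with assumed per-workflow budgets (the field names and limits are hypothetical; the real policy would live in the rules-engine skill):</p>

```python
# Remediation depth gate (sketch): the depth travels in the package metadata,
# and the gate refuses to spawn once the per-workflow budget is exhausted.

MAX_REMEDIATION_DEPTH = {"deployment": 2, "background_job": 1}  # assumed budgets

def may_spawn_remediation(package: dict) -> bool:
    depth = package.get("remediation_depth", 0)
    limit = MAX_REMEDIATION_DEPTH.get(package.get("workflow", ""), 1)
    return depth < limit

def spawn_remediation(package: dict) -> dict:
    """Return the child package, or raise if the depth budget is exhausted."""
    if not may_spawn_remediation(package):
        raise RuntimeError("remediation depth exhausted; escalate to HITL")
    return {**package, "remediation_depth": package.get("remediation_depth", 0) + 1}
```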

<hr />

<h3 id="the-original-task-never-resumes">The Original Task Never Resumes</h3>

<p><strong>Scenario:</strong> A remediation agent successfully resolves the blocker and posts a success message to the todo/prospective store. But the original agent already exited. Who retries the original task?</p>

<p><strong>The gap:</strong> As currently described, remediation success doesn’t automatically trigger a retry of the blocked operation. Important work can fall through the cracks — the resource gets started, but the deployment it was needed for never runs.</p>

<p><strong>Suggested mitigation:</strong> The remediation package should include a <code class="language-plaintext highlighter-rouge">resume_spec</code> — a structured description of how to re-enter the original task at the point of failure. Remediation success writes the resume spec as a new prospective item, not just a completion signal. A watcher agent or scheduler picks it up.</p>
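<p>One possible shape for that <code class="language-plaintext highlighter-rouge">resume_spec</code> (field names are assumptions for illustration): on remediation success, the spec is written back as a new prospective item rather than a bare completion flag, so a watcher has something concrete to act on.</p>

```python
from dataclasses import dataclass, field, asdict

# Hypothetical resume_spec: enough structure for a watcher or scheduler
# to re-enter the original task at the point of failure.

@dataclass
class ResumeSpec:
    task_id: str
    step: str                          # operation the agent was on when it exited
    working_memory: dict = field(default_factory=dict)

def on_remediation_success(package: dict, prospective: list) -> None:
    """Write the resume spec back as a NEW prospective item, not just a completion flag."""
    spec = package.get("resume_spec")
    if spec:
        prospective.append({"kind": "resume", "spec": spec})
```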

<hr />

<h3 id="prospective-memory-as-a-single-point-of-failure">Prospective Memory as a Single Point of Failure</h3>

<p><strong>Scenario:</strong> The prospective memory store becomes unavailable (disk failure, lock contention, corruption). All pending remediation tasks are inaccessible. The system doesn’t know what work is queued.</p>

<p><strong>The gap:</strong> If prospective memory is the coordination layer for all inter-agent handoffs, it’s also the single point of failure for coordination. Loss of the store means loss of the work queue state.</p>

<p><strong>Suggested mitigation:</strong> Prospective memory writes should be durable and replicated (write-ahead log or dual-write). Episodic memory provides a recovery path since failures are logged there too — in a disaster recovery scenario, the episodic log can be replayed to reconstruct pending prospective items.</p>
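<p>The replay path can be sketched as a fold over the episodic log, assuming two event types (the shapes below are illustrative, not the actual log schema): anything opened but never resolved becomes a reconstructed prospective item.</p>

```python
# Recovery sketch: replay the episodic log and reconstruct prospective
# items that were opened but never marked resolved.

def rebuild_prospective(episodic_log: list) -> list:
    pending = {}
    for event in episodic_log:
        if event["type"] == "blocker_written":
            pending[event["item_id"]] = {"id": event["item_id"], "blocker": event["blocker"]}
        elif event["type"] == "remediation_resolved":
            pending.pop(event["item_id"], None)
    return list(pending.values())
```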

<hr />

<h3 id="false-positives-in-the-failure-signal">False Positives in the Failure Signal</h3>

<p><strong>Scenario:</strong> A transient network issue causes an operation to fail. The baseball rule runs its three attempts, and all of them fail because the network was down for four minutes. A remediation package is written. The remediation agent investigates and finds nothing to fix — the network is already back. The failure gets logged in episodic and eventually distilled into semantic: “operation X is unreliable.”</p>

<p><strong>The gap:</strong> Transient failures that happen to consume all three retries produce learning signal that looks identical to genuine blockers. Over time, the semantic and long-term stores can accumulate noise — spurious “this is unreliable” signals that don’t reflect structural problems.</p>

<p><strong>Suggested mitigation:</strong> Failure packages should include a <strong>confidence score</strong> for whether this is a structural vs. transient failure. Remediation agents can mark resolved packages as “transient” — that metadata should gate distillation. Episodic → semantic distillation needs a filter that down-weights transient patterns and up-weights patterns that appear repeatedly in diverse contexts.</p>
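<p>A minimal version of that distillation filter, under assumed field names: records the remediation agent marked transient are dropped at the source, and a pattern is promoted only when it recurs in distinct contexts.</p>

```python
from collections import defaultdict

# Distillation filter sketch (thresholds assumed): skip packages marked
# transient and promote only patterns seen in multiple distinct contexts.

def distill(episodic: list, min_contexts: int = 2) -> list:
    contexts = defaultdict(set)
    for record in episodic:
        if record.get("resolution") == "transient":
            continue  # remediation found nothing structural to fix
        contexts[record["pattern"]].add(record["context"])
    return [p for p, ctxs in contexts.items() if len(ctxs) >= min_contexts]
```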

<hr />

<h3 id="backlog-accumulation-in-prospective-memory">Backlog Accumulation in Prospective Memory</h3>

<p><strong>Scenario:</strong> A systemic infrastructure problem causes many agents to fail simultaneously. Thousands of remediation tasks pile into the prospective store. The remediation agents process them in FIFO order. Time-sensitive tasks are buried.</p>

<p><strong>The gap:</strong> The prospective store as described doesn’t have prioritization semantics. All pending tasks are equal. In a burst failure scenario, that creates unbounded queue growth and unpredictable latency for high-priority items.</p>

<p><strong>Suggested mitigation:</strong> Remediation tasks need priority metadata (inherited from the original task’s priority or computed from the failure impact). Prospective memory should support priority queue semantics, not just FIFO.</p>
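<p>Priority-queue semantics over the prospective store can be sketched with a heap (a generic pattern, not the project’s implementation): lower number means higher priority, and a monotonic counter preserves FIFO ordering among items of equal priority.</p>

```python
import heapq
import itertools

# Priority-queue semantics for the prospective store (sketch).
# The counter breaks ties so equal-priority items stay FIFO.

class ProspectiveQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def put(self, task: dict, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), task))

    def get(self) -> dict:
        return heapq.heappop(self._heap)[2]
```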

<hr />

<h3 id="runaway-autonomous-remediation">Runaway Autonomous Remediation</h3>

<p><strong>Scenario:</strong> An agent encounters a failure, reasons that the “nearest neighbor solution” requires creating a new service, provisions cloud resources, incurs cost, and the original task was a low-priority background job that could have waited for a human.</p>

<p><strong>The gap:</strong> Autonomous remediation has spending authority and action authority that may exceed what the original task warranted. The system needs scope-bounded remediation — the remedy cannot be more expensive or impactful than the original task.</p>

<p><strong>Suggested mitigation:</strong> Every task carries a <strong>resource envelope</strong> (cost budget, infra authority, time budget). Remediation agents inherit a fraction of that envelope, not the full budget. Gate agents enforce the envelope before any irreversible action. Inventing a solution that exceeds the envelope requires HITL approval regardless of baseball rule level.</p>
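<p>The envelope inheritance and gate check are simple enough to sketch directly (the fraction and field names are assumptions): the child gets a slice of the parent budget, and any action that exceeds it is refused pending HITL.</p>

```python
# Resource envelope sketch: remediation inherits a fraction of the original
# task's budget, and the gate blocks actions that exceed the inherited slice.

def inherit_envelope(parent: dict, fraction: float = 0.25) -> dict:
    return {k: v * fraction for k, v in parent.items()}

def gate_action(envelope: dict, action_cost: dict) -> bool:
    """True if the action fits inside the envelope; otherwise require HITL."""
    return all(action_cost.get(k, 0) <= v for k, v in envelope.items())
```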

<hr />

<h3 id="human-operator-fatigue-in-hitl-escalation">Human Operator Fatigue in HITL Escalation</h3>

<p><strong>Scenario:</strong> The system is working as designed. Genuine blockers surface to HITL. But volume is high — 50 HITL requests per day. Operators start rubber-stamping approvals without reading them carefully, or they start ignoring the queue.</p>

<p><strong>The gap:</strong> HITL as a safety valve only works if the humans on the receiving end have capacity and context to make meaningful decisions. High volume degrades the quality of human oversight.</p>

<p><strong>Suggested mitigation:</strong> Aggregate similar HITL items before presenting them — “37 instances of resource-X-not-found, recommended resolution: make startup automatic (see episodic log).” HITL presentation should include the semantic distillation of similar past cases, not just the raw failure. That makes operator decisions faster, more consistent, and more actionable. The goal is to convert repeated HITL items into long-term rules as fast as possible.</p>
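<p>The aggregation step itself is just a group-by over a failure signature (the signature field is an assumption; in practice it would come from the failure package):</p>

```python
from collections import Counter

# Aggregation sketch: collapse similar HITL items into one digest line per
# failure signature, so the operator sees counts instead of 50 raw requests.

def hitl_digest(items: list) -> list:
    counts = Counter(item["signature"] for item in items)
    return [f"{n} instances of {sig}" for sig, n in counts.most_common()]
```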

<hr />

<h3 id="memory-poisoning-via-injected-synthetic-failures">Memory Poisoning via Injected Synthetic Failures</h3>

<p><strong>Scenario:</strong> A misbehaving agent — or an attacker who has compromised one — writes false failure records to episodic memory. The records look like real failures: structured, well-formed, plausible. The distillation pipeline treats them as genuine signal. Over multiple cycles, corrupted patterns propagate into semantic and long_term stores. The system begins routing around perfectly healthy components or triggering unnecessary remediation based on manufactured failure history.</p>

<p><strong>The risk:</strong> This is the adversarial mirror of the confabulation problem. Instead of an agent fabricating <em>success</em>, a bad actor fabricates <em>failure</em>. Because the learning loop values failure data so highly, injected failures are a high-leverage attack surface. A single well-placed false pattern in episodic can become a durable long-term rule within hours.</p>

<p><strong>The gap:</strong> The blog’s description of inter-agent memory coordination (agent-to-memory, not agent-to-agent) assumes memory writes are trustworthy. That assumption is load-bearing and undefended. <code class="language-plaintext highlighter-rouge">docs/Memory_Security_Threats_Research.md</code> identifies three critical attack surfaces in file-based memory systems: prompt injection via memory, memory poisoning, and secret leakage. The first two apply directly here.</p>

<p><strong>Suggested mitigation:</strong> Memory writes should be treated as untrusted data, never as instructions. Structural separation between memory content and system context (separate message roles, not text concatenation) prevents prompt injection. Episodic writes should carry a <code class="language-plaintext highlighter-rouge">trust_score</code> and <code class="language-plaintext highlighter-rouge">source_agent_id</code> — the distillation pipeline should weight contributions by source trust, not just by recency or frequency. RBAC on memory chunks (specified in <code class="language-plaintext highlighter-rouge">MEMORY-TIERS-SPEC.md</code>) limits which agents can write to which memory stores in the first place. The Governance Plane in the memory substrate spec exists precisely to enforce this.</p>
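<p>Trust-weighted distillation can be sketched as replacing raw occurrence counts with summed trust (the threshold and field names are illustrative): a flood of low-trust writes no longer outweighs a few high-trust ones.</p>

```python
# Trust-weighting sketch: each episodic write carries trust_score and
# source_agent_id; distillation sums trust rather than counting records.

def trusted_patterns(episodic: list, min_weight: float = 1.5) -> list:
    weights = {}
    for record in episodic:
        weights[record["pattern"]] = weights.get(record["pattern"], 0.0) + record["trust_score"]
    return [p for p, w in weights.items() if w >= min_weight]
```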

<hr />

<h3 id="no-standardized-resume-protocol">No Standardized Resume Protocol</h3>

<p><strong>Scenario:</strong> Agent A fails, remediation succeeds, A needs to resume. But A was built with a different framework than the remediation agent. There’s no standardized handoff contract for “here is the state you were in when you exited; resume from here.”</p>

<p><strong>The gap:</strong> The architecture describes the memory coordination model but not the inter-agent resume protocol. In a heterogeneous multi-agent system, resuming a task requires both state reconstruction (what was the working memory?) and operation reconstruction (what step was the agent on?). Those are non-trivial.</p>

<p><strong>Suggested mitigation:</strong> This is an open problem. Partial solution: the remediation package includes a snapshot of the original agent’s working memory at exit time. The resumed agent (which may be a fresh instance of the same agent type) starts from that snapshot rather than from scratch.</p>
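<p>The snapshot half of that partial solution is the easy part, sketched here with a versioned JSON blob (the format is an assumption; the hard part, operation reconstruction across frameworks, remains open):</p>

```python
import json

# Snapshot sketch: the exiting agent serializes its working memory into the
# remediation package; a fresh instance rehydrates from it instead of
# starting from scratch. The version field guards against format drift.

def snapshot_working_memory(state: dict) -> str:
    return json.dumps({"version": 1, "state": state})

def resume_from_snapshot(blob: str) -> dict:
    payload = json.loads(blob)
    if payload["version"] != 1:
        raise ValueError("unknown snapshot version")
    return payload["state"]
```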

<hr />

<h2 id="conclusion">Conclusion</h2>

<p>The failure-as-completion architecture represents a meaningful departure from how most agent systems are designed. By making honest failure the path of least resistance, it eliminates confabulation pressure, builds a genuine learning loop, and accumulates fault tolerance through real operational history rather than through upfront engineering.</p>

<p>The cognitive memory model — prospective, working, episodic, semantic, long-term — gives agents a structured way to communicate across time and agent boundaries without direct coupling. The emergent DAG design removes the brittleness of pre-planned orchestration while preserving the traceability needed for triage and improvement.</p>

<p>The critical gaps are real: cascading remediation depth, abandoned original tasks, prospective store durability, false positive learning signals, and scope-unbounded autonomous remediation all need explicit design. The architecture is more complete at the philosophy level than at the implementation level for these edge cases.</p>

<p>But the philosophical foundation is sound. The system that learns the most from failure is the one that makes failure safe to report. And the system that makes failure safe to report is the one that treats honest documentation as a reward, not a punishment.</p>

<p>The closing observation I want to leave is this: “I too evolve through failure.” That is not just a personality note — it is a claim that this architecture mirrors the cognitive model of its designer. Whether that mirroring is a source of robustness or a source of blind spots is probably the most interesting open question in the whole design.</p>

<hr />

<h2 id="references">References</h2>

<ul>
  <li><strong>Cognitive Science — Prospective Memory</strong>: Brandimonte, M., Einstein, G. O., &amp; McDaniel, M. A. (Eds.). (1996). <em>Prospective Memory: Theory and Applications</em>. Lawrence Erlbaum Associates.</li>
  <li><strong>Galloway DiSE Series</strong>: Synthesized in <a href="../Self-Improvement/SRCGEEE-DiSE-Synthesis.md"><code class="language-plaintext highlighter-rouge">docs/Self-Improvement/SRCGEEE-DiSE-Synthesis.md</code></a> — 12-part series on semantic intelligence mapped to SRCGEEE</li>
  <li><strong>SRCGEEE Framework</strong>: <a href="../Self-Improvement/SRCGEEE-DiSE-Synthesis.md"><code class="language-plaintext highlighter-rouge">docs/Self-Improvement/SRCGEEE-DiSE-Synthesis.md</code></a> — Sense/Retrieve/Compose/Gate/Execute/Evaluate/Evolve loop with re-entry on failure</li>
  <li><strong>Memory Tiers Specification</strong>: <a href="../Self-Improvement/MEMORY-TIERS-SPEC.md"><code class="language-plaintext highlighter-rouge">docs/Self-Improvement/MEMORY-TIERS-SPEC.md</code></a> — 3-tier → 7-layer concrete mapping with RBAC</li>
  <li><strong>PPA Execution Contract</strong>: <code class="language-plaintext highlighter-rouge">analysis/ppa/ppa_v0_execution_contract.md</code> — four completion states (<code class="language-plaintext highlighter-rouge">completed_answer</code>, <code class="language-plaintext highlighter-rouge">completed_non_answer</code>, <code class="language-plaintext highlighter-rouge">completed_escalation</code>, <code class="language-plaintext highlighter-rouge">completed_deferred_work</code>), three failure classes, routing contract by request class</li>
  <li><strong>PPA Reward Function</strong>: <code class="language-plaintext highlighter-rouge">analysis/ppa/ppa_reward_function_v1.md</code> — weighted reward equation <code class="language-plaintext highlighter-rouge">R_total = P * (w_q*Q + w_s*S + w_d*D + w_r*R - w_t*T - w_c*C - w_e*E)</code> with escalation penalty and reliability terms</li>
  <li><strong>Memory Substrate Spec</strong>: <code class="language-plaintext highlighter-rouge">analysis/ppa/memory-substrate-spec-v0.1.md</code> — physical implementation: SQL Server Express + SeaweedFS + FAISS; four memory planes (Prediction, Retrieval, Governance, Assurance)</li>
  <li><strong>Sprint 001 Mitigation Workbench</strong>: <code class="language-plaintext highlighter-rouge">docs/Self-Improvement/Playbooks/Sprint_001_Mitigation_Workbench.md</code> — risk-to-control matrix for retry amplification, stale-fix replay, and cross-agent divergence; tracks the same edge cases identified in this post</li>
  <li><strong>DiSE Part 9 — Self-Healing Tools</strong>: <code class="language-plaintext highlighter-rouge">docs/Self-Improvement/Part_9_-_Self-Healing_Tools.md</code> — avoidance rules propagating through tool lineage; institutional memory via bug history; original source of the “fault tolerance accumulates through use” claim</li>
  <li><strong>Memory Security Threats Research</strong>: <code class="language-plaintext highlighter-rouge">docs/Memory_Security_Threats_Research.md</code> — three attack surfaces in file-based agent memory systems; deterministic defenses including structural separation and trust-scored writes</li>
  <li><strong>Baseball Rule Prior Art</strong>: The three-strikes principle is common in distributed systems retry logic; the innovation here is applying it hierarchically across remediation levels with clean exit semantics at each level</li>
  <li><strong>Inter-Agent Communication Survey</strong>: The current industry posture (as of early 2026) is hierarchical orchestrator → subagent patterns. True peer-to-peer inter-agent messaging remains an open research problem; this architecture sidesteps it via shared memory coordination.</li>
</ul>]]></content><author><name>Nicholas Stein</name></author><category term="agent-architecture" /><category term="srcgeee" /><category term="ppa" /><category term="resilience" /><category term="memory" /><summary type="html"><![CDATA[Most agent systems treat failure as something to hide. This architecture inverts that — failure is a first-class outcome that makes the system more robust over time.]]></summary></entry><entry><title type="html">How I Built This Blog (With the AI That Was Already Doing My Other Work)</title><link href="https://bizcad.github.io/bizcad-blog/2026/03/24/how-i-built-this-blog.html" rel="alternate" type="text/html" title="How I Built This Blog (With the AI That Was Already Doing My Other Work)" /><published>2026-03-24T00:00:00+00:00</published><updated>2026-03-24T00:00:00+00:00</updated><id>https://bizcad.github.io/bizcad-blog/2026/03/24/how-i-built-this-blog</id><content type="html" xml:base="https://bizcad.github.io/bizcad-blog/2026/03/24/how-i-built-this-blog.html"><![CDATA[<h1 id="how-i-built-this-blog-with-the-ai-that-was-already-doing-my-other-work">How I Built This Blog (With the AI That Was Already Doing My Other Work)</h1>

<p>I have been trying to get a blog running for months. Today it finally worked. Here is how.</p>

<hr />

<h2 id="three-failures-before-this">Three Failures Before This</h2>

<p>The graveyard of attempts is all still on my disk:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">G:\repos\AI\roadtrip-blog-nextjs</code></li>
  <li><code class="language-plaintext highlighter-rouge">G:\repos\AI\roadtrip-blog-repo</code></li>
  <li><code class="language-plaintext highlighter-rouge">G:\repos\AI\roadtrip-blog</code></li>
</ul>

<p>None of them worked. Every time, the same ending: Vercel detected that the repository had a Next.js template and tried to run a Node.js build. The Node.js upgrade kept failing in Vercel’s CI/CD pipeline. I could not debug it because I did not want to learn Next.js just to maintain a blog. I took them all down.</p>

<hr />

<h2 id="what-i-actually-wanted">What I Actually Wanted</h2>

<p>I described it in a single message during today’s session, after we had been building PhoneBuddy all day:</p>

<blockquote>
  <p><em>“I want an Index page so the reader can go to a list of older pages or at least the last 5 posts from a list. I want an easy way to have my work here written as a blog post, the way you just did. I want a workflow or skill to add the new posts with an easy command like <code class="language-plaintext highlighter-rouge">Add-Posts-to-Blog</code> the way that <code class="language-plaintext highlighter-rouge">scrape-now</code> does. It would be nice to be able to ask the blog questions.”</em></p>
</blockquote>

<blockquote>
  <p><em>“Desired workflow: do some work with claude → create a blog post about it → call-some-code (essentially gpush) → it magically appears in the blog and index.”</em></p>
</blockquote>

<p>That was the spec. Four requirements. One sentence of desired workflow.</p>

<hr />

<h2 id="the-root-cause-of-all-three-failures">The Root Cause of All Three Failures</h2>

<p>Once I described what I wanted, Claude identified the problem immediately:</p>

<blockquote>
  <p><em>“The 3 failed repos had the same root cause: Vercel detected a JS framework and tried to run Node.js builds. The fix was already written in that file — set <code class="language-plaintext highlighter-rouge">outputDirectory: public</code> with no build command. The blog code itself was never the problem.”</em></p>
</blockquote>

<p>The architecture was already designed in a file called <code class="language-plaintext highlighter-rouge">static-roadtrip-blog.md</code> sitting in one of the failed repos. A complete working spec, including <code class="language-plaintext highlighter-rouge">build.py</code>, templates, and a <code class="language-plaintext highlighter-rouge">vercel.json</code> config that would have prevented the Node.js problem. It was never implemented.</p>

<p>What broke was not the blog code. It was the deployment toolchain. Switching from Vercel to <strong>GitHub Pages</strong> eliminated the Node.js problem permanently — GitHub runs Jekyll on their servers, nothing to install locally.</p>

<hr />

<h2 id="the-cross-repo-writing-problem">The Cross-Repo Writing Problem</h2>

<p>There was one more obstacle I thought was blocking me. My previous attempts all used a separate repo for the blog. Copilot could not write files to a repo that was not open in the VS Code workspace. So I kept everything in the RoadTrip repo, even though a separate repo is the right design for a public blog.</p>

<p>Claude Code is not bound by that restriction. It writes to any path on the filesystem, not just files open in the current workspace. The <code class="language-plaintext highlighter-rouge">blog-publish</code> command is a PowerShell function that changes directory to <code class="language-plaintext highlighter-rouge">G:\repos\AI\bizcad-blog</code> and runs <code class="language-plaintext highlighter-rouge">git push</code>. It does not care which workspace is open in VS Code.</p>

<p>This is now a separate repo: <strong><a href="https://github.com/bizcad/bizcad-blog">bizcad-blog</a></strong>. The blog lives there. The RoadTrip workspace is where I write.</p>

<hr />

<h2 id="how-it-was-built">How It Was Built</h2>

<p>The entire blog was created in a single session, in parallel with other work.</p>

<p><strong>The repo:</strong> <code class="language-plaintext highlighter-rouge">G:\repos\AI\bizcad-blog</code> — initialized, committed, and ready to push in one pass.</p>

<p><strong>The stack:</strong></p>
<ul>
  <li>Jekyll on GitHub Pages — GitHub runs the build server-side, zero local toolchain</li>
  <li><code class="language-plaintext highlighter-rouge">minima</code> theme — clean, readable, no configuration required</li>
  <li>Posts in <code class="language-plaintext highlighter-rouge">_posts/YYYY-MM-DD-slug.md</code> — the Jekyll naming convention handles chronological ordering automatically</li>
</ul>

<p><strong>The posts:</strong> Eight previous posts were migrated from three dead repos, with Lorem ipsum Next.js template placeholders discarded. The frontmatter from the old Next.js format (<code class="language-plaintext highlighter-rouge">coverImage</code>, <code class="language-plaintext highlighter-rouge">ogImage</code>, <code class="language-plaintext highlighter-rouge">author.picture</code>) is silently ignored by Jekyll — no post files needed rewriting.</p>

<p><strong>GitHub Actions workflow:</strong> A <code class="language-plaintext highlighter-rouge">.github/workflows/pages.yml</code> that runs Jekyll and deploys to GitHub Pages on every push to <code class="language-plaintext highlighter-rouge">main</code>. Approximately 30 seconds from <code class="language-plaintext highlighter-rouge">git push</code> to live.</p>

<hr />

<h2 id="the-commands">The Commands</h2>

<p>Two PowerShell functions were added to the RoadTrip profile:</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Publish a specific post, or all new posts from docs/blog-posts/</span><span class="w">
</span><span class="n">blog-publish</span><span class="w"> </span><span class="nx">docs/blog-posts/my-post.md</span><span class="w">
</span><span class="n">blog-publish</span><span class="w">   </span><span class="c"># auto-discovers new posts</span><span class="w">

</span><span class="c"># Ask questions about all blog posts via Claude Haiku</span><span class="w">
</span><span class="n">ask-blog</span><span class="w"> </span><span class="s2">"what did I build in February?"</span><span class="w">
</span><span class="n">ask-blog</span><span class="w"> </span><span class="s2">"why did I start PhoneBuddy?"</span><span class="w">
</span></code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">blog-publish</code> without arguments scans <code class="language-plaintext highlighter-rouge">G:\repos\AI\RoadTrip\docs\blog-posts\</code> for any markdown files not already in the blog, copies them to <code class="language-plaintext highlighter-rouge">bizcad-blog\_posts\</code> with a date prefix, and pushes. It runs from any directory — not just the RoadTrip workspace.</p>
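<p>The real command is a PowerShell function, but the auto-discovery step it performs can be sketched in Python for illustration (paths and naming below mirror the described behavior; nothing here is the actual implementation):</p>

```python
from datetime import date
from pathlib import Path

# Illustrative sketch of blog-publish auto-discovery: pair each draft not
# yet present in _posts with its date-prefixed destination filename.

def discover_new_posts(drafts: Path, posts: Path, today: date) -> list:
    # Strip the YYYY-MM-DD- prefix from existing post filenames for comparison.
    existing = {p.name.split("-", 3)[-1] for p in posts.glob("*.md")}
    plan = []
    for draft in sorted(drafts.glob("*.md")):
        if draft.name not in existing:
            plan.append((draft, posts / f"{today:%Y-%m-%d}-{draft.name}"))
    return plan
```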

<p><code class="language-plaintext highlighter-rouge">ask-blog</code> reads all posts from <code class="language-plaintext highlighter-rouge">_posts\</code> and passes them to Claude Haiku with the question. It loads the Anthropic API key from the PhoneBuddy <code class="language-plaintext highlighter-rouge">.env</code> file as a fallback, so no separate API key setup is needed.</p>

<hr />

<h2 id="why-the-blog-matters-beyond-publishing">Why the Blog Matters Beyond Publishing</h2>

<p>I described the real reason I want a blog in a session note today:</p>

<blockquote>
  <p><em>“From a corporate standpoint, blogs are part of the company’s knowledge base… Most enterprises and people have stuff they did in the heat of battle that never got codified. It is one of the reasons I want a PPA. Oddly enough its memory is more important to me than it is to the AI.”</em></p>
</blockquote>

<p>This is the distinction that took me a while to understand. The blog is not primarily for readers. It is a <strong>structured, timestamped record of decisions with context</strong> — the kind of tacit knowledge that disappears when the person who made the decision leaves the room.</p>

<p>Every post I write is a record that Claude’s Retrieve step can find. The session log captures the raw conversation. The blog post distills the decision. The <code class="language-plaintext highlighter-rouge">ask-blog</code> command closes the loop: any future session can query the full history in natural language.</p>

<p>That is what makes the blog worth building. Not the publishing. The retrieval.</p>

<hr />

<h2 id="the-workflow">The Workflow</h2>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>work with Claude
  → write a post in docs/blog-posts/
    → blog-publish
      → GitHub Pages rebuilds (~30s)
        → live at https://bizcad.github.io/bizcad-blog
          → ask-blog "why did I build X?" answers from the full archive
</code></pre></div></div>

<p>This post was written from the session log of today’s work. Published the same way.</p>

<hr />

<p><em>Source: RoadTrip developer workspace, session log 2026-03-24. The blog lives at <code class="language-plaintext highlighter-rouge">G:\repos\AI\bizcad-blog</code>. The workflow lives in <code class="language-plaintext highlighter-rouge">infra\RoadTrip_profile.ps1</code>.</em></p>]]></content><author><name>Nicholas Stein</name></author><category term="blogging" /><category term="github-pages" /><category term="jekyll" /><category term="claude" /><category term="ppa" /><category term="workflow" /><summary type="html"><![CDATA[Three failed attempts, one session, one command. How a conversation about PhoneBuddy turned into a working blog.]]></summary></entry><entry><title type="html">PhoneBuddy’s First Live Call — and the Scammer Who Proved the Point</title><link href="https://bizcad.github.io/bizcad-blog/2026/03/24/phonebuddy-first-live-call.html" rel="alternate" type="text/html" title="PhoneBuddy’s First Live Call — and the Scammer Who Proved the Point" /><published>2026-03-24T00:00:00+00:00</published><updated>2026-03-24T00:00:00+00:00</updated><id>https://bizcad.github.io/bizcad-blog/2026/03/24/phonebuddy-first-live-call</id><content type="html" xml:base="https://bizcad.github.io/bizcad-blog/2026/03/24/phonebuddy-first-live-call.html"><![CDATA[<h1 id="phonebuddys-first-live-call--and-the-scammer-who-proved-the-point">PhoneBuddy’s First Live Call — and the Scammer Who Proved the Point</h1>

<p>Today PhoneBuddy answered its first real phone call.</p>

<p>Not a test curl. Not a simulated webhook. A real PSTN call to a Twilio number,
screened by a FastAPI server running on a Windows 11 desktop, classified by
Claude Haiku, and answered in the voice of an AI receptionist named Liam via
ElevenLabs.</p>

<p>The server log said it cleanly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Inbound call: +19493943466 → +19493046155
TTS generated  role=receptionist  chars=42  bytes=43,511
GET /tts → 200 OK
S/Sensation  speech='Nick, this is Nick.'  conf=0.89
R/Retrieve   history_calls=0  prior_suspicion=0.00
Admin mode activated
TTS generated  chars=59  bytes=58,140
"Hello Nick. You have had 1 calls today. How can I help you?"
</code></pre></div></div>

<p>Liam said: <em>“Hello, I am Nicholas’s personal assistant.”</em></p>

<p>The caller said: <em>“Nick, this is Nick.”</em></p>

<p>Admin mode activated. The owner was briefed. The system worked.</p>

<hr />

<h2 id="the-problem-phonebuddy-is-solving">The Problem PhoneBuddy Is Solving</h2>

<p>A few weeks ago, Nikita Bier — Head of Product at X — posted that in less than
90 days, iMessage, phone calls, and Gmail would be so flooded with spam and
automation that they would no longer be usable, and we would have no way to stop
it. Two weeks later he purged 1.7 million bot accounts off X. The next day they
respawned.</p>

<p>Mo Bitar made a video about it called <em>“The Internet Is Dying.”</em> His conclusion:
<em>“There’s no solution to this. This is just what the internet is now.”</em></p>

<p>He forgot one thing. He never considered an AI defender on your side.</p>

<p>The spammer only ever says one of two things: <em>give me your attention, or give me
your money.</em> PhoneBuddy intercepts both before they reach your ear. It classifies,
engages when engagement costs the attacker time, and only passes through what
earned its way to you.</p>

<p>The internet has a spam virus. Your phone number is one of the infection vectors.
PhoneBuddy is the inoculation.</p>

<hr />

<h2 id="the-suntrust-case-study">The SunTrust Case Study</h2>

<p>During today’s testing, a real call came in on a personal iPhone — not through
PhoneBuddy, but instructive nonetheless.</p>

<p>The caller was Cody Miller from SunTrust Remodeling, a licensed Southern
California home improvement contractor. He left this voicemail:</p>

<blockquote>
  <p><em>“Hey, how’s it going Nicholas? This is Cody Miller calling from SunTrust
remodeling. I’m calling about that inspection we did for you about four weeks
ago… We’ve actually done a lot of work in your area in the month of March.
So now for the 92620 ZIP Code we’re giving a discount… give me a call back.
My number is 949-570-8236.”</em></p>
</blockquote>

<p>Apple’s verdict: <strong>Potential Spam.</strong></p>

<p>PhoneBuddy’s verdict, run through the signal accumulator:</p>

<table>
  <thead>
    <tr>
      <th>Signal</th>
      <th>Result</th>
      <th>Evidence</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">urgency</code></td>
      <td>⚠️ soft</td>
      <td>“only take 10-15 minutes”</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">authority_claim</code></td>
      <td>✅ legitimate</td>
      <td>Named company + full name — verifiable</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">confidential_request</code></td>
      <td>✗ none</td>
      <td>—</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">money_request</code></td>
      <td>✗ none</td>
      <td>Discount mentioned, no payment ask</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">secrecy_demand</code></td>
      <td>✗ none</td>
      <td>—</td>
    </tr>
  </tbody>
</table>

<p><strong>Suspicion score: ~0.10.</strong> Classification: <code class="language-plaintext highlighter-rouge">professional</code>. Action: forward with
whisper briefing. <em>“Cody Miller from SunTrust Remodeling is on the line.”</em></p>
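<p>The scoring above can be sketched as a weighted signal accumulator. This is a minimal illustration, not PhoneBuddy's actual implementation — the weights and the exact signal strengths are assumptions chosen so a single soft urgency hit lands near 0.10:</p>

```python
# Hypothetical weights for the five scam signals; the real values are assumptions.
SIGNAL_WEIGHTS = {
    "urgency": 0.20,
    "authority_claim": 0.30,
    "confidential_request": 0.25,
    "money_request": 0.35,
    "secrecy_demand": 0.40,
}

def suspicion_score(signals: dict) -> float:
    """Accumulate suspicion from detected signals.

    `signals` maps signal name -> strength in [0, 1]
    (0 = absent, 0.5 = soft hit, 1.0 = hard hit).
    """
    score = sum(SIGNAL_WEIGHTS[name] * strength
                for name, strength in signals.items())
    return min(score, 1.0)  # clamp to [0, 1]

# Cody's call: one soft urgency hit, everything else clean.
cody = {"urgency": 0.5, "authority_claim": 0.0, "confidential_request": 0.0,
        "money_request": 0.0, "secrecy_demand": 0.0}
print(round(suspicion_score(cody), 2))  # 0.1
```

<p>The point of the shape, not the numbers: a single soft signal stays well under any block threshold, while money plus secrecy plus urgency stacks fast.</p>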

<p>Apple flagged a licensed contractor making a legitimate follow-up call because
their predictive dialer had no caller ID and their numbers weren’t in any
reputation database. The phone carrier’s verdict was based entirely on the number.
PhoneBuddy’s verdict was based on what the caller <em>said</em>.</p>

<p>That’s the structural advantage. It’s not a better blocklist. It’s a different
kind of thinking.</p>

<hr />

<h2 id="why-cody-is-a-better-training-case-than-a-scammer">Why Cody Is a Better Training Case Than a Scammer</h2>

<p>The scam engagement system needs true positives as much as it needs true negatives.
Cody’s call is a textbook true positive — a real professional who happens to use
soft persuasion language that overlaps with scam patterns:</p>

<ul>
  <li><em>“I got good news for you”</em> — scams open with this</li>
  <li><em>“super discounted price”</em> — discount urgency is a fraud flag</li>
  <li><em>“only take 10 to 15 minutes”</em> — time minimization is a manipulation tactic</li>
</ul>

<p>A naive keyword classifier would have scored this as suspicious. Claude with
context did not — because the prior relationship, the named company, the specific
ZIP code reference (92620), and the callback number all anchor it as legitimate.</p>

<p><strong>This is exactly why the Retrieve step matters.</strong> If PhoneBuddy has any history
of prior SunTrust contacts, the confidence in <code class="language-plaintext highlighter-rouge">professional</code> classification goes
up immediately. The second call from a known number is always easier to get right
than the first.</p>

<p>The lesson: <em>context defeats keyword matching, every time.</em></p>

<hr />

<h2 id="the-architecture-that-made-today-work">The Architecture That Made Today Work</h2>

<p>PhoneBuddy runs on a pipeline called SRCGEEE — the same pattern used across the
broader PPA (Personal Productivity Assistant) agent platform:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>S  Sensation    — Twilio fires inbound webhook; extract caller metadata
R  Retrieve     — Load per-caller history + prior suspicion from disk
C  Classify     — Claude Haiku classifies intent with full context
G  Generate     — Select and render TwiML response
E1 Execute      — Return TwiML to Twilio (route the call)
E2 Evaluate     — Score confidence; log outcome
E3 Evolve       — Persist call record to per-caller history; emit telemetry
</code></pre></div></div>

<p>The R step is what separates PhoneBuddy from a dumb IVR. Before Claude sees a
single word, the pipeline loads the caller’s history. A number that previously
generated scam signals gets flagged immediately — no API call needed. A number
that previously resolved to a known contact gets forwarded before any
classification runs.</p>

<p>The E3 step is what makes it a learning system. Every call outcome — forwarded,
declined, voicemail, engaged — is written to a per-caller history file. The
second call from any number is always informed by the first.</p>
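<p>The R and E3 steps amount to a load/append cycle over a per-caller history file. A minimal sketch, assuming JSON files keyed by caller number — PhoneBuddy's actual storage layout, field names, and TTL handling are not shown in this post:</p>

```python
import json
from pathlib import Path

HISTORY_DIR = Path("caller_history")  # hypothetical location

def retrieve(caller: str) -> dict:
    """R step: load prior calls and suspicion before any classification runs."""
    path = HISTORY_DIR / f"{caller}.json"
    if path.exists():
        return json.loads(path.read_text())
    return {"calls": [], "prior_suspicion": 0.0}  # first contact: clean slate

def evolve(caller: str, outcome: str, suspicion: float) -> None:
    """E3 step: append this call's outcome so the next call is informed by it."""
    HISTORY_DIR.mkdir(exist_ok=True)
    record = retrieve(caller)
    record["calls"].append({"outcome": outcome, "suspicion": suspicion})
    record["prior_suspicion"] = suspicion
    (HISTORY_DIR / f"{caller}.json").write_text(json.dumps(record))
```

<p>Because retrieve runs before the Claude call, a known-bad number short-circuits the pipeline with zero API cost — which is exactly the behavior described above.</p>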

<hr />

<h2 id="whats-next">What’s Next</h2>

<p>Today’s session wired in:</p>
<ul>
  <li>ElevenLabs TTS — Liam answers every call as receptionist</li>
  <li>SRCGEEE pipeline structure — labeled phases in code and logs</li>
  <li>Per-caller history with TTL — medium-term memory between calls</li>
  <li>Post-call callback design — verify number validity after every unknown call</li>
</ul>

<p>Still on the build list:</p>
<ul>
  <li>Scam engagement loop — progressive scoring, confused elderly persona</li>
  <li>Post-call callback — dial back unknowns, record result as training signal</li>
  <li>ElevenLabs scam persona — warm human voice for the engagement turns</li>
</ul>

<p>The system is live. It answers. It classifies. It learns.</p>

<p>The scammers with PhD-level NLP and Bitcoin wallets are coming for every phone
line. PhoneBuddy is already on the line.</p>

<hr />

<p><em>PhoneBuddy is being built as a skill on the PPA (Personal Productivity Assistant)
platform. The source lives in the RoadTrip developer workspace at
<code class="language-plaintext highlighter-rouge">workflows/014-PPA-voice-terminal/</code>. This post was written from a live session log
on 2026-03-24.</em></p>]]></content><author><name>Nicholas Stein</name></author><category term="phonebuddy" /><category term="ppa" /><category term="ai" /><category term="spam" /><category term="voice" /><category term="twilio" /><category term="elevenlabs" /><category term="srcgeee" /><summary type="html"><![CDATA[The day PhoneBuddy answered its first real call — a live AI voice, a real PSTN number, and a contractor Apple flagged as spam who wasn't.]]></summary></entry><entry><title type="html">Publish Pipeline Kickoff</title><link href="https://bizcad.github.io/bizcad-blog/2026/03/02/publish-pipeline-kickoff.html" rel="alternate" type="text/html" title="Publish Pipeline Kickoff" /><published>2026-03-02T00:00:00+00:00</published><updated>2026-03-02T00:00:00+00:00</updated><id>https://bizcad.github.io/bizcad-blog/2026/03/02/publish-pipeline-kickoff</id><content type="html" xml:base="https://bizcad.github.io/bizcad-blog/2026/03/02/publish-pipeline-kickoff.html"><![CDATA[<h1 id="publish-pipeline-kickoff">Publish Pipeline Kickoff</h1>

<p>This is the first test post for validating schema checks and dry-run orchestration.</p>

<h2 id="why-this-exists">Why this exists</h2>

<ul>
  <li>Validate frontmatter contract.</li>
  <li>Verify deterministic orchestration logs.</li>
  <li>Confirm channel ordering: Hashnode → Dev.to → Substack.</li>
</ul>]]></content><author><name>Nicholas Stein</name></author><category term="roadtrip" /><category term="blogging" /><category term="automation" /><summary type="html"><![CDATA[Kicking off a Markdown-first multichannel blog pipeline with dry-run orchestration and governance-first publishing controls.]]></summary></entry><entry><title type="html">RoadTrip: An Intelligent, Trusted Travel Partner</title><link href="https://bizcad.github.io/bizcad-blog/2026/02/16/roadtrip-intelligent-trusted-travel-partner.html" rel="alternate" type="text/html" title="RoadTrip: An Intelligent, Trusted Travel Partner" /><published>2026-02-16T00:11:17+00:00</published><updated>2026-02-16T00:11:17+00:00</updated><id>https://bizcad.github.io/bizcad-blog/2026/02/16/roadtrip-intelligent-trusted-travel-partner</id><content type="html" xml:base="https://bizcad.github.io/bizcad-blog/2026/02/16/roadtrip-intelligent-trusted-travel-partner.html"><![CDATA[<h1 id="roadtrip-an-intelligent-trusted-travel-partner">RoadTrip: An Intelligent, Trusted Travel Partner</h1>

<blockquote>
  <p><strong>How do you trust an AI agent that needs access to the internet?</strong></p>

  <p>RoadTrip is a proof-of-concept framework for building <strong>verifiable, auditable AI skills</strong> that can safely interact with external services while remaining under your control.</p>
</blockquote>

<hr />

<h2 id="the-vision-a-travel-companion-you-can-actually-trust">The Vision: A Travel Companion You Can Actually Trust</h2>

<p>Imagine you’re planning a cross-country road trip. You want an AI partner to help you:</p>

<ul>
  <li><strong>Plan the route</strong> using real maps and traffic data</li>
  <li><strong>Track fuel costs</strong> based on current gas prices</li>
  <li><strong>Monitor weather risks</strong> and suggest safer alternatives</li>
  <li><strong>Update your plans</strong> as conditions change during your journey</li>
  <li><strong>Remain verifiable</strong> — you can see exactly what it’s doing at every step</li>
</ul>

<p>This sounds simple. But it requires something rare in AI development: <strong>transparency and verifiable integrity at every layer</strong>.</p>

<p>That’s what RoadTrip builds.</p>

<hr />

<h2 id="the-problem-ai-access--safety">The Problem: AI Access + Safety</h2>

<p>Most AI frameworks choose an uncomfortable binary:</p>

<ol>
  <li><strong>Keep AI sandboxed</strong> — limited capability, limited usefulness</li>
  <li><strong>Give AI full access</strong> — powerful, but unverifiable and risky</li>
</ol>

<p>Neither is acceptable for a system you want to trust with real decisions that affect real safety.</p>

<p>RoadTrip chooses a third path: <strong>build verifiable systems from the ground up</strong>.</p>

<p>This means:</p>
<ul>
  <li>✅ Deterministic code for critical decisions (file validation, authorization, logging)</li>
  <li>✅ Probabilistic reasoning for creative tasks (planning, message generation, adaptation)</li>
  <li>✅ Transparent verification for every output (test infrastructure, audit logging, end-to-end checks)</li>
  <li>✅ Controlled access to external services (APIs, maps, weather data — no unlimited internet)</li>
</ul>

<hr />

<h2 id="how-it-works-skills-as-building-blocks">How It Works: Skills as Building Blocks</h2>

<p>RoadTrip uses a <strong>skills framework</strong> where each capability is independently verifiable:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌─────────────────────────────────────────────┐
│  Orchestrator (Claude decides what to do)   │
└──────────────┬──────────────────────────────┘
               │
       ┌───────┴────────┬──────────┐
       ▼                ▼          ▼
    ┌───────────┐  ┌────────────┐  ┌───────────┐
    │   Auth    │  │   Commit   │  │ Telemetry │
    │ Validator │  │  Message   │  │  Logger   │
    │ (verify)  │  │ Generator  │  │  (audit)  │
    │           │  │ (generate) │  │           │
    └───────────┘  └────────────┘  └───────────┘
</code></pre></div></div>

<p>Each skill:</p>
<ul>
  <li>Has a <strong>clear responsibility</strong> (one reason to change)</li>
  <li>Is <strong>independently testable</strong> (run in isolation)</li>
  <li>Is <strong>verifiable</strong> (outputs can be checked)</li>
  <li>Is <strong>documented</strong> (SKILL.md, CLAUDE.md, tests)</li>
</ul>

<hr />

<h2 id="why-this-matters-the-rigor-problem">Why This Matters: The Rigor Problem</h2>

<p>When you build AI systems, you face a fundamental challenge:</p>

<blockquote>
  <p><strong>How do you know your system is doing the right thing?</strong></p>
</blockquote>

<p>This is what engineers call “the rigor problem,” and it’s worse in AI because failures can be subtle and non-deterministic.</p>

<p>RoadTrip solves this through three architectural patterns:</p>

<h3 id="1-immutable-prototypes-know-what-you-trust">1. <strong>Immutable Prototypes</strong> (Know What You Trust)</h3>

<p>Core systems like <code class="language-plaintext highlighter-rouge">git_push.ps1</code> are never modified. Instead of integration through modification, we integrate through <strong>composition</strong>:</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># New skills call old prototypes, not the reverse</span><span class="w">
</span><span class="nv">$message</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">.</span><span class="n">\invoke-commit-message.ps1</span><span class="w"> </span><span class="nt">-StagedFiles</span><span class="w"> </span><span class="p">@(</span><span class="s2">"file1"</span><span class="p">)</span><span class="w">
</span><span class="o">.</span><span class="n">\git_push.ps1</span><span class="w"> </span><span class="nt">-Message</span><span class="w"> </span><span class="nv">$message</span><span class="w">    </span><span class="c"># Old system remains pure</span><span class="w">
</span></code></pre></div></div>

<p>This guarantees: <strong>if the original worked, it still works</strong>.</p>

<h3 id="2-invisible-test-infrastructure-tests-dont-contaminate">2. <strong>Invisible Test Infrastructure</strong> (Tests Don’t Contaminate)</h3>

<p>Test files are in <code class="language-plaintext highlighter-rouge">.gitignore</code>—they’re metadata, not deliverables. This eliminates circular references where tests become part of what they’re testing.</p>

<h3 id="3-oracle-based-verification-verify-against-reality">3. <strong>Oracle-Based Verification</strong> (Verify Against Reality)</h3>

<p>We use simpler, proven systems to validate complex ones:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Both tools analyze the same files:
✓ git_push.ps1:       "chore: update 3 files (+0 ~3 -0)"
✓ commit_message.py:  "chore: update multiple modules"
✓ Both valid, both verifiable
→ Different heuristics, but same intent proven
</code></pre></div></div>

<p><strong>Result</strong>: You can actually trust the output.</p>
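<p>Oracle-style checking can be as simple as validating both tools' outputs against the shared contract — here, the Conventional Commits shape. A minimal sketch with a hypothetical regex; the project's actual validator is not reproduced here:</p>

```python
import re

# Conventional Commits shape: "type(scope)?: description" (simplified, assumed type list)
CONVENTIONAL = re.compile(
    r"^(feat|fix|chore|docs|refactor|test|style|perf)(\([\w-]+\))?: .+")

def agree(msg_a: str, msg_b: str) -> bool:
    """Two generators 'agree' if both emit valid messages of the same type."""
    ma, mb = CONVENTIONAL.match(msg_a), CONVENTIONAL.match(msg_b)
    return bool(ma and mb) and ma.group(1) == mb.group(1)

print(agree("chore: update 3 files (+0 ~3 -0)",
            "chore: update multiple modules"))  # True
```

<p>Neither tool is the ground truth; the spec is. That is what makes the cross-check an oracle rather than a circular test.</p>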

<hr />

<h2 id="getting-started">Getting Started</h2>

<h3 id="quick-look-at-the-concept">Quick Look at the Concept</h3>

<p>Start with the proof-of-concept planning tool:</p>

<p>👉 <a href="docs/README_RoadTrip.md"><strong>docs/README_RoadTrip.md</strong></a> — The travel planning POC</p>
<ul>
  <li>Google Sheets integration</li>
  <li>Google My Maps integration</li>
  <li>Weather risk tracking</li>
  <li>Cost estimation</li>
</ul>

<h3 id="understanding-the-philosophy">Understanding the Philosophy</h3>

<p>For the engineering principles behind trustworthy AI skills:</p>

<p>👉 <a href="docs/Principles-and-Processes.md"><strong>docs/Principles-and-Processes.md</strong></a> — Design framework</p>
<ul>
  <li>Core principles (fail-safe, SOLID, deterministic + probabilistic)</li>
  <li>Skill development methodology</li>
  <li>Quality standards and review process</li>
  <li>Code organization patterns</li>
</ul>

<h3 id="how-we-built-this-a-case-study">How We Built This: A Case Study</h3>

<p>For a hands-on walkthrough of how we apply this philosophy to real code:</p>

<p>👉 <a href="docs/Blog_Rigor_in_Agentic_Development.md"><strong>docs/Blog_Rigor_in_Agentic_Development.md</strong></a> — Case study in verification</p>
<ul>
  <li>How we built the commit-message skill</li>
  <li>Why immutable prototypes matter</li>
  <li>How oracle-based testing works</li>
  <li>Why verification is non-negotiable</li>
</ul>

<hr />

<h2 id="architecture-skills--orchestration">Architecture: Skills + Orchestration</h2>

<p>The current codebase implements <strong>Phase 1b</strong> of the RoadTrip framework:</p>

<h3 id="skills-reusable-building-blocks">Skills (Reusable Building Blocks)</h3>

<p><strong><code class="language-plaintext highlighter-rouge">src/skills/commit_message.py</code></strong> — Generates semantic commit messages</p>
<ul>
  <li><strong>Tier 1</strong>: Deterministic heuristics (90% of commits, zero cost)</li>
  <li><strong>Tier 2</strong>: Claude fallback (10% of ambiguous cases, ~$0.001 per call)</li>
  <li><strong>Tier 3</strong>: User override (explicit control)</li>
  <li><strong>Result</strong>: Valid <a href="https://www.conventionalcommits.org/">Conventional Commits</a> format</li>
</ul>
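<p>The three tiers compose as a fall-through chain. The sketch below is an illustration only — the heuristic, the confidence threshold, and the Claude call are hypothetical stand-ins for what <code class="language-plaintext highlighter-rouge">commit_message.py</code> actually does:</p>

```python
def tier1_heuristic(files: list[str]) -> tuple[str, float]:
    """Deterministic guess (assumed rule): docs-only changes are 'docs', else 'chore'."""
    kind = "docs" if all(f.endswith(".md") for f in files) else "chore"
    confidence = 0.95 if len(files) <= 3 else 0.6  # small diffs are easy to label
    return f"{kind}: update {len(files)} files", confidence

def generate(files, user_override=None, ask_claude=None, threshold=0.8):
    if user_override:                     # Tier 3: explicit user control wins
        return user_override
    msg, conf = tier1_heuristic(files)    # Tier 1: zero-cost heuristics
    if conf >= threshold:
        return msg
    # Tier 2: paid fallback only for the ambiguous minority
    return ask_claude(files) if ask_claude else msg
```

<p>The cost model falls out of the structure: the heuristic handles the confident majority for free, and the API is only invoked when confidence drops below the threshold.</p>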

<p><strong><code class="language-plaintext highlighter-rouge">src/skills/rules_engine.py</code></strong> — File validation</p>
<ul>
  <li>Pattern matching against configurable rules</li>
  <li>Idempotent, deterministic, fully testable</li>
</ul>
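<p>Pattern-based file validation of this kind is a few lines of <code class="language-plaintext highlighter-rouge">fnmatch</code>. A sketch with hypothetical rules — the real ones live in the project's YAML config:</p>

```python
from fnmatch import fnmatch

# Hypothetical rules; in the real skill these come from configuration.
BLOCKED = ["*.env", "*.pem", "*.key", "secrets/*"]
ALLOWED = ["src/*.py", "docs/*.md", "config/*.yaml"]

def validate(path: str) -> bool:
    """Deterministic and idempotent: same path in, same verdict out."""
    if any(fnmatch(path, pat) for pat in BLOCKED):
        return False  # fail safe: explicit blocks always win
    return any(fnmatch(path, pat) for pat in ALLOWED)
```

<p>Note the conservative shape: a path that matches nothing is rejected, not waved through.</p>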

<p><strong><code class="language-plaintext highlighter-rouge">src/skills/auth_validator.py</code></strong> (ready to implement)</p>
<ul>
  <li>Multi-layer authorization decisions</li>
  <li>Audit logging</li>
  <li>Role-based access control</li>
</ul>

<p><strong><code class="language-plaintext highlighter-rouge">src/skills/telemetry_logger.py</code></strong> (ready to implement)</p>
<ul>
  <li>Decision tracking</li>
  <li>Cost tracking</li>
  <li>Audit trail for compliance</li>
</ul>

<h3 id="orchestration-skill-composition">Orchestration (Skill Composition)</h3>

<p><strong><code class="language-plaintext highlighter-rouge">src/skills/skill_orchestrator.py</code></strong> (ready to implement)</p>
<ul>
  <li>Chains skills in sequence</li>
  <li>Handles errors and fallbacks</li>
  <li>Routes decisions to Claude when needed</li>
</ul>

<h3 id="configuration-policy-driven">Configuration (Policy-Driven)</h3>

<p><strong><code class="language-plaintext highlighter-rouge">config/commit-strategy.yaml</code></strong> — Policy for commit message generation<br />
<strong><code class="language-plaintext highlighter-rouge">config/authorization.yaml</code></strong> — Authorization rules<br />
<strong><code class="language-plaintext highlighter-rouge">src/skills/models.py</code></strong> — Data contracts for all inputs/outputs</p>

<hr />

<h2 id="current-status-phase-1b">Current Status: Phase 1b</h2>

<p>✅ <strong>Complete</strong></p>
<ul>
  <li>commit-message skill (all Tier 1→2→3 implemented)</li>
  <li>Comprehensive test runner (oracle-based validation)</li>
  <li>Error handling patterns</li>
  <li>Documentation templates (SKILL.md, CLAUDE.md, CLAUDE.instructions.md)</li>
</ul>

<p>🔄 <strong>In Progress</strong></p>
<ul>
  <li>skill-orchestrator (chains multiple skills)</li>
  <li>Integration testing with real commits</li>
</ul>

<p>📋 <strong>Ready to Implement</strong></p>
<ul>
  <li>auth-validator (authorization layer)</li>
  <li>telemetry-logger (decision tracking)</li>
  <li>Phase 2 features (content scanning, learning loops)</li>
</ul>

<hr />

<h2 id="why-verification-is-non-negotiable">Why Verification is Non-Negotiable</h2>

<p>Here’s the key insight from building this: <strong>verification isn’t optional</strong>.</p>

<p>When you build AI systems that access the internet, you must be able to answer these questions with evidence:</p>

<ol>
  <li><strong>What did the system do?</strong> (Audit trail)</li>
  <li><strong>Why did it do that?</strong> (Decision logging + reasoning)</li>
  <li><strong>Is that the right decision?</strong> (Verification against spec)</li>
  <li><strong>Can I prove it to someone else?</strong> (Reproducible on demand)</li>
</ol>

<p>RoadTrip answers all four. That’s why you can actually trust it.</p>

<hr />

<h2 id="philosophy-conservative-defaults">Philosophy: Conservative Defaults</h2>

<p>The entire framework is built on one principle:</p>

<blockquote>
  <p><strong>“If in doubt, block.”</strong></p>
</blockquote>

<ul>
  <li>Missing authorization rule → block access</li>
  <li>Low confidence score → ask Claude for rationale</li>
  <li>Unexpected file type → fail safe, don’t guess</li>
  <li>Unknown error → escalate, don’t recover</li>
</ul>

<p>This isn’t about being paranoid. It’s about being honest about what you know and don’t know.</p>
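<p>In code, "if in doubt, block" is just the default branch. A minimal sketch, assuming a hypothetical rule table and confidence threshold rather than the project's actual <code class="language-plaintext highlighter-rouge">authorization.yaml</code>:</p>

```python
def authorize(action: str, rules: dict, confidence: float,
              threshold: float = 0.8) -> bool:
    """Conservative default: anything not explicitly allowed is blocked."""
    if action not in rules:
        return False      # missing authorization rule -> block
    if confidence < threshold:
        return False      # low confidence -> block (escalate in practice)
    return rules[action]  # only an explicit allow grants access

RULES = {"read:docs": True, "push:main": False}  # hypothetical policy
print(authorize("read:docs", RULES, confidence=0.95))   # True
print(authorize("delete:repo", RULES, confidence=0.99)) # False (no rule)
```

<p>Every failure mode collapses to the same safe answer; only the explicit allow path grants access.</p>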

<hr />

<h2 id="contributing-how-to-add-skills">Contributing: How to Add Skills</h2>

<p>Each new skill follows the same pattern:</p>

<ol>
  <li><strong>Create the SKILL.md</strong> — Define input/output contract, confidence model, cost</li>
  <li><strong>Create src/skills/my_skill.py</strong> — Implement with deterministic core</li>
  <li><strong>Create tests/</strong> — Write comprehensive tests; keep tests in <code class="language-plaintext highlighter-rouge">.gitignore</code></li>
  <li><strong>Create docs/how-my-skill-works.md</strong> — Explain the rigor, not just the code</li>
  <li><strong>Update the orchestrator</strong> — Register the skill for composition</li>
</ol>

<hr />

<h2 id="security--trust-model">Security &amp; Trust Model</h2>

<p>This framework prioritizes <strong>transparency over black magic</strong>:</p>

<ul>
  <li>✅ All configuration in YAML (readable, auditable, versionable)</li>
  <li>✅ All decisions logged with confidence scores</li>
  <li>✅ All code in Python/PowerShell (readable, not compiled)</li>
  <li>✅ All tests use <code class="language-plaintext highlighter-rouge">.gitignore</code> (infrastructure ≠ deliverables)</li>
  <li>✅ All outputs verifiable (can reproduce on demand)</li>
</ul>

<p><strong>Trade-off</strong>: Slightly more verbose, more explicit, more to review. <strong>Benefit</strong>: You actually know what your system is doing.</p>

<hr />

<h2 id="next-steps">Next Steps</h2>

<h3 id="for-understanding">For Understanding</h3>
<ol>
  <li>Read <a href="docs/Principles-and-Processes.md">docs/Principles-and-Processes.md</a> (design philosophy)</li>
  <li>Read <a href="docs/Blog_Rigor_in_Agentic_Development.md">docs/Blog_Rigor_in_Agentic_Development.md</a> (how we verify)</li>
  <li>Read the other files in <a href="docs/">docs/</a> to understand where I got the ideas.</li>
  <li>Clone the repo and run the tests</li>
</ol>

<h3 id="for-building">For Building</h3>
<ol>
  <li>Pick a skill from the “Ready to Implement” list</li>
  <li>Follow the SKILL.md template</li>
  <li>Build deterministic core + Claude fallback</li>
  <li>Write tests (keep them in <code class="language-plaintext highlighter-rouge">.gitignore</code>)</li>
  <li>Push and verify on GitHub</li>
</ol>

<h3 id="for-collaboration">For Collaboration</h3>

<p>This is an open effort to define how we build trustworthy AI. If you:</p>
<ul>
  <li>Build AI skills and want them to be verifiable</li>
  <li>Care about safety and transparency</li>
  <li>Want to contribute patterns or skills</li>
  <li>Have experience with AI orchestration</li>
</ul>

<p>→ Consider contributing. This framework is designed to be extended.</p>

<hr />

<h2 id="license">License</h2>

<p>See <a href="LICENSE">LICENSE</a> for details.</p>

<hr />

<h2 id="further-reading">Further Reading</h2>

<ul>
  <li><strong>Inside RoadTrip</strong>: Start with <a href="docs/README_RoadTrip.md">docs/README_RoadTrip.md</a></li>
  <li><strong>Design Principles</strong>: See <a href="docs/Principles-and-Processes.md">docs/Principles-and-Processes.md</a></li>
  <li><strong>How We Built It</strong>: Read <a href="docs/Blog_Rigor_in_Agentic_Development.md">docs/Blog_Rigor_in_Agentic_Development.md</a></li>
  <li><strong>OpenClaw Risks</strong>: We’re building safer alternatives to vulnerable AI crawlers</li>
  <li><strong>Signed Agentic Work</strong>: We’re defining standards for trustworthy AI skills (see <code class="language-plaintext highlighter-rouge">workflows/003-signed-agentic-work/</code>)</li>
</ul>

<hr />

<p><strong>RoadTrip: Build AI skills you can actually verify. Deploy systems you can actually trust.</strong></p>]]></content><author><name>Nicholas Stein</name></author><summary type="html"><![CDATA[How do you trust an AI agent that needs access to the internet? RoadTrip is a proof-of-concept framework for building verifiable, auditable AI skills.]]></summary></entry><entry><title type="html">How We Built a Trusted AI Skill: A Case Study in Rigorous Development</title><link href="https://bizcad.github.io/bizcad-blog/2026/02/13/trusted-ai-skill-case-study.html" rel="alternate" type="text/html" title="How We Built a Trusted AI Skill: A Case Study in Rigorous Development" /><published>2026-02-13T17:41:27+00:00</published><updated>2026-02-13T17:41:27+00:00</updated><id>https://bizcad.github.io/bizcad-blog/2026/02/13/trusted-ai-skill-case-study</id><content type="html" xml:base="https://bizcad.github.io/bizcad-blog/2026/02/13/trusted-ai-skill-case-study.html"><![CDATA[<h2 id="the-challenge-staying-honest-at-scale">The Challenge: Staying Honest at Scale</h2>

<p>When you’re building AI agents and skills, you face a fundamental problem:</p>

<blockquote>
  <p><strong>How do you stay honest when the system is complex?</strong></p>
</blockquote>

<p>The moment your skill does something non-trivial, verification gets hard. You run the code, it produces output, but did it produce the <em>right</em> output? Did it make the right decisions? Or did it just look like it did because you didn’t inspect deeply enough?</p>

<p>There’s a name for this in engineering: <strong>the rigor problem</strong>. It’s why surgical teams have checklists. It’s why flight crews have redundant instruments. It’s why NASA counts down from T-minus 10, not 10, 9, 8…</p>

<p>In AI development, the rigor problem is <em>worse</em> because a system can fail in non-obvious ways.</p>]]></content><author><name>Nick Stein</name></author><summary type="html"><![CDATA[How immutable prototypes, test infrastructure, and oracle-based verification create trustworthy agentic systems.]]></summary></entry></feed>