Status: Design sketch
Calciforge is itself a safety tool for AI agents, so the project has to be honest about how agent-written code fails. The lesson from recent Calciforge bugs, and from broader Rust-with-agents reports such as Cheng Huang’s write-up and the Hacker News discussion, is not “generate more code.” It is: keep contracts explicit, make feedback loops mechanical, and treat every generated test as suspicious until it proves which promise it protects.
This page is maintainer-facing. It records how Calciforge agents and humans should shape future work.
Before changing a boundary, write the contract in plain language:
This is especially important for model/provider adapters, security-proxy rewrites, channel routing, secret handling, installer paths, and doctor checks. Those are not “just implementation” areas. They are the castle doors.
Contracts do not need a formal language before they help. A short table in a test, ADR, or roadmap note is enough if it names the failure mode clearly.
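A contract table of that kind could be as small as the sketch below. The rows are illustrative placeholders rather than Calciforge's actual contracts, and `ContractRow` and `rows_missing_failure_mode` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContractRow:
    """One row of a plain-language contract table (hypothetical shape)."""
    boundary: str       # which door this row guards
    promise: str        # what callers may rely on
    failure_mode: str   # what goes wrong if the promise breaks


# Illustrative rows only; real contracts belong next to the code they guard.
CONTRACTS = [
    ContractRow(
        boundary="provider adapter",
        promise="a streaming request yields a well-formed response or a typed error",
        failure_mode="the stream ends silently and the caller hangs",
    ),
    ContractRow(
        boundary="secret handling",
        promise="secret values never appear in logs",
        failure_mode="a debug log line prints the raw token",
    ),
]


def rows_missing_failure_mode(rows):
    """Return the boundaries whose rows do not name a failure mode."""
    return [row.boundary for row in rows if not row.failure_mode.strip()]
```

A check like `rows_missing_failure_mode(CONTRACTS) == []` keeps the table honest: a row that cannot name its failure mode is not yet a contract.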
A single user story is the right default unit for agent implementation. A story can still cross several files, but it should have one visible outcome:
stream=true gets a valid response through the configured provider.”
doctor catches an ACP binary path that would fail at runtime.”
!secret list with its destination policy.”
When a branch starts solving three stories, split it. Calciforge already has enough moving parts; one PR should not become a second moving castle.
Rust helps because the compiler, formatter, linter, and tests can push back on agent mistakes. Use that. Every meaningful PR should name the narrowest checks that exercise the contract it changes.
For Calciforge, the usual progression is:
Generated code is allowed. Unreviewed generated behavior is not.
HN commenters pushed on the right weak spot: a large test count does not prove much if nobody can say what those tests protect. Calciforge should judge tests by contract value:
The failure discovery action plan calls these aggression tests when they deliberately search for likely future failures. That should become normal for security, gateway, channel, installer, and agent-adapter boundaries.
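As a rough illustration of an aggression test, the sketch below attacks one promise ("an Authorization value never survives redaction") with variants a future refactor could plausibly stop handling. `redact_authorization`, the sample secret, and the variant list are all hypothetical and are not taken from the Calciforge code base.

```python
import re

# Hypothetical redactor, not Calciforge code: masks the value of any
# Authorization header line in a block of header text.
_AUTH_LINE = re.compile(
    r"^([ \t]*authorization[ \t]*:[ \t]*).*$",
    re.IGNORECASE | re.MULTILINE,
)


def redact_authorization(text: str) -> str:
    """Replace everything after 'Authorization:' with a fixed marker."""
    return _AUTH_LINE.sub(lambda m: m.group(1) + "[REDACTED]", text)


SECRET = "sk-test-123"  # made-up value used only by this sketch

# Each variant probes a way the promise could quietly break.
HOSTILE_VARIANTS = [
    f"Authorization: Bearer {SECRET}",
    f"authorization: bearer {SECRET}",
    f"AUTHORIZATION:{SECRET}",
    f"  Authorization :   Bearer {SECRET}",
    f"Host: example.test\nAuthorization: Bearer {SECRET}\nAccept: */*",
]


def leaking_variants(variants):
    """Return every variant whose redacted form still contains the secret."""
    return [v for v in variants if SECRET in redact_authorization(v)]
```

The test earns its place because it names the promise it protects; a new variant is added whenever a bug or review finds another way the secret could slip through.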
One HN theme was that agents can make code grow faster than it becomes understandable. Calciforge should resist that. A useful abstraction should make the product contract easier to see:
If a change adds another “almost the same” path, treat that as architecture debt even when tests pass.
The article’s performance loop is worth copying: instrument, run, analyze, change one thing, and measure again. For Calciforge this applies to:
Do not merge speculative latency fixes that lack a baseline and a post-change measurement. We have enough guesses; keep the ones with numbers.
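A minimal sketch of the baseline-then-measure step follows. The helper names, the run count, and the 5% threshold are assumptions chosen for illustration; a real measurement would use whatever instrumentation the affected path already has.

```python
import statistics
import time


def sample_latency_ms(operation, runs=20):
    """Time `operation` several times and return per-run latencies in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def compare_medians(baseline_ms, candidate_ms, min_gain=0.05):
    """Summarize a before/after pair; `min_gain` is an assumed 5% threshold."""
    before = statistics.median(baseline_ms)
    after = statistics.median(candidate_ms)
    if before <= 0:
        raise ValueError("baseline median must be positive")
    gain = (before - after) / before  # fraction of baseline latency removed
    return {
        "before_ms": before,
        "after_ms": after,
        "gain": gain,
        "keep": gain >= min_gain,
    }
```

Collect the baseline samples before touching the code, change one thing, then collect the candidate samples the same way. Medians are used here because a single slow run should not decide the outcome.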
Use this checklist for branches substantially authored by an AI coding agent:
Near-term automation should make the good path easier:
scripts/check-scenarios.py to flag new adapter/provider/channel files that lack a high-risk scenario or boundary registry entry.
scripts/check-agent-pr-discipline.py if PR drift continues: changed boundary files should either link a scenario, add a test, or mark the exception explicitly.
doctor --live checks for first-class agents so the same path users test manually is exercised by automation.
Avoid these failure modes:
The bar is simple: every agent-aided change should leave the contract clearer than it found it. If the change only leaves more code, Calcifer is allowed to complain.