AI-Assisted Engineering Discipline

Status: Design sketch

Calciforge is itself a safety tool for AI agents, so the project has to be honest about how agent-written code fails. The lesson from recent Calciforge bugs, and from broader Rust-with-agents reports such as Cheng Huang’s write-up and the Hacker News discussion, is not “generate more code.” It is: keep contracts explicit, make feedback loops mechanical, and treat every generated test as suspicious until it proves which promise it protects.

This page is maintainer-facing. It records how Calciforge agents and humans should shape future work.

Useful Lessons to Import

Contracts before code

Before changing a boundary, write the contract in plain language:

This is especially important for model/provider adapters, security-proxy rewrites, channel routing, secret handling, installer paths, and doctor checks. Those are not “just implementation” areas. They are the castle doors.

Contracts do not need a formal language before they help. A short table in a test, ADR, or roadmap note is enough if it names the failure mode clearly.
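As a rough illustration of a contract written as a short table in a test, the sketch below pairs each input with the promised outcome and the failure mode that row guards against. Everything in it is hypothetical: `HeaderAction`, `classify_header`, and the header rules are invented for the example and are not Calciforge's actual security-proxy API.

```rust
/// Hypothetical outcome of a proxy rewrite for one outbound header.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HeaderAction {
    Forward,
    Redact,
    Drop,
}

/// Hypothetical rule set, for illustration only.
pub fn classify_header(name: &str) -> HeaderAction {
    match name.to_ascii_lowercase().as_str() {
        "authorization" | "x-api-key" => HeaderAction::Redact,
        "cookie" => HeaderAction::Drop,
        _ => HeaderAction::Forward,
    }
}

/// The contract as a table: input, promised outcome, failure mode it guards against.
pub const HEADER_CONTRACT: &[(&str, HeaderAction, &str)] = &[
    ("Authorization", HeaderAction::Redact, "bearer token leaks upstream"),
    ("X-API-KEY", HeaderAction::Redact, "case variant bypasses redaction"),
    ("Cookie", HeaderAction::Drop, "session cookie forwarded to provider"),
    ("Accept", HeaderAction::Forward, "benign header stripped, request breaks"),
];

/// Returns the named failure modes whose rows no longer hold.
pub fn broken_promises() -> Vec<&'static str> {
    HEADER_CONTRACT
        .iter()
        .filter(|(name, expected, _)| classify_header(name) != *expected)
        .map(|(_, _, failure)| *failure)
        .collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn header_contract_holds() {
        let broken = broken_promises();
        assert!(broken.is_empty(), "broken promises: {broken:?}");
    }
}
```

When a row fails, the test output names the failure mode rather than only the input, which is the part a reviewer needs in order to judge whether the promise or the code should change.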

One story per branch

A single user story is the right default unit for agent implementation. A story can still cross several files, but it should have one visible outcome:

When a branch starts solving three stories, split it. Calciforge already has enough moving parts; one PR should not become a second moving castle.

Feedback loops must be executable

Rust helps because the compiler, formatter, linter, and tests can push back on agent mistakes. Use that. Every meaningful PR should name the narrowest checks that exercise the contract it changes.

For Calciforge, the usual progression is:

  1. reproduce the failure or write the contract test,
  2. make the smallest behavioral change,
  3. run focused tests for the touched boundary,
  4. run the relevant docs/ratchet/doctor checks,
  5. commit,
  6. perform adversarial review on the diff.

Generated code is allowed. Unreviewed generated behavior is not.

Test quality over test count

Hacker News (HN) commenters pushed on the right weak spot: a large test count does not prove much if nobody can say what those tests protect. Calciforge should judge tests by contract value:

The failure discovery action plan calls tests that deliberately search for likely future failures aggression tests. Such tests should become normal for security, gateway, channel, installer, and agent-adapter boundaries.
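One possible shape for an aggression test at an installer boundary is sketched below. Instead of checking one known-bad input, it generates a family of hostile paths and reports any that slip through. The check `stays_inside_root` and the generator `hostile_paths` are assumptions made for this example, not existing Calciforge functions.

```rust
use std::path::{Component, Path};

/// Hypothetical installer check: accept only relative paths that cannot
/// climb out of the install root. Backslashes are rejected outright so the
/// result does not depend on the host platform's separator rules.
pub fn stays_inside_root(candidate: &str) -> bool {
    if candidate.is_empty() || candidate.contains('\\') {
        return false;
    }
    Path::new(candidate)
        .components()
        .all(|c| matches!(c, Component::Normal(_) | Component::CurDir))
}

/// Generates hostile inputs: absolute paths, bare climbs, and climbs hidden
/// behind harmless-looking prefixes, at several depths.
pub fn hostile_paths() -> Vec<String> {
    let mut out = vec!["/etc/passwd".to_string(), "..".to_string(), String::new()];
    for depth in 1..=4 {
        let climb = "../".repeat(depth);
        out.push(format!("{climb}etc/passwd"));
        out.push(format!("bin/{climb}{climb}escape"));
        out.push(format!("./{climb}escape"));
        out.push(format!("bin\\{}escape", "..\\".repeat(depth)));
    }
    out
}

/// Returns every hostile path the check wrongly accepted.
pub fn escapes_found() -> Vec<String> {
    hostile_paths()
        .into_iter()
        .filter(|p| stays_inside_root(p))
        .collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn no_hostile_path_is_accepted() {
        let escapes = escapes_found();
        assert!(escapes.is_empty(), "accepted hostile paths: {escapes:?}");
    }

    #[test]
    fn ordinary_relative_path_is_accepted() {
        assert!(stays_inside_root("bin/tool"));
    }
}
```

The second test matters as much as the first: a check that rejects everything would pass the aggression test while breaking the installer.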

Human-readable abstractions matter

One HN theme was that agents can make code grow faster than it becomes understandable. Calciforge should resist that. A useful abstraction should make the product contract easier to see:

If a change adds another “almost the same” path, treat that as architecture debt even when tests pass.

Measure before tuning

The performance loop in Cheng Huang’s write-up is worth copying: instrument, run, analyze, change one thing, and measure again. For Calciforge this applies to:

Do not merge speculative latency fixes that lack a baseline and a post-change measurement. We have enough guesses; keep the ones with numbers.
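A minimal sketch of what "a baseline and a post-change measurement" could look like in code follows, using only the standard library. The names `median_nanos`, `LatencyEvidence`, and `measure_change` are hypothetical, and wall-clock medians from `std::time::Instant` are a stand-in for whatever benchmarking harness the project actually prefers.

```rust
use std::time::Instant;

/// Runs `work` `runs` times and returns the median wall-clock time in nanoseconds.
pub fn median_nanos<F: FnMut()>(runs: usize, mut work: F) -> u128 {
    assert!(runs > 0, "need at least one run");
    let mut samples: Vec<u128> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            work();
            start.elapsed().as_nanos()
        })
        .collect();
    samples.sort_unstable();
    samples[samples.len() / 2]
}

/// Hypothetical record attached to a latency PR: one number before, one after.
#[derive(Debug, Clone, Copy)]
pub struct LatencyEvidence {
    pub baseline_ns: u128,
    pub after_ns: u128,
}

impl LatencyEvidence {
    /// Fraction of baseline time saved; negative means a regression.
    pub fn gain(&self) -> f64 {
        if self.baseline_ns == 0 {
            return 0.0; // Nothing measurable to improve on.
        }
        1.0 - (self.after_ns as f64 / self.baseline_ns as f64)
    }

    /// True when the measured gain meets the threshold the PR claims.
    pub fn supports_merge(&self, min_gain: f64) -> bool {
        self.gain() >= min_gain
    }
}

/// Measures the old and new code paths under the same number of runs.
pub fn measure_change<B: FnMut(), A: FnMut()>(
    runs: usize,
    before: B,
    after: A,
) -> LatencyEvidence {
    LatencyEvidence {
        baseline_ns: median_nanos(runs, before),
        after_ns: median_nanos(runs, after),
    }
}
```

A PR description could then quote the two numbers and the computed gain directly. Medians from a handful of runs are noisy, so a sketch like this is a floor for evidence, not a replacement for a proper benchmark when the claimed gain is small.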

Required PR Checklist for Agent-Authored Changes

Use this checklist for branches substantially authored by an AI coding agent:

Where This Should Become Automation

Near-term automation should make the good path easier:

Anti-Patterns

Avoid these failure modes:

The bar is simple: every agent-aided change should leave the contract clearer than it found it. If the change only leaves more code, Calcifer is allowed to complain.