The 8-Stage Quality Pipeline: How Nevo Ensures Code Quality

Why Quality Gates Matter

In autonomous AI systems, code quality cannot be an afterthought. When an AI agent writes code without human review, the verification system becomes the most critical component of the entire architecture. Nevo solves this with an 8-stage quality pipeline — a chain of seven specialized AI agents that inspect every piece of work before it ships.

The pipeline runs automatically after every task completion. No manual trigger required. No way to skip it.

The 8 Stages

Stage 1: Write

The initial implementation. A subagent receives a story from the PRD framework with clear acceptance criteria and writes the code. This is where the work begins — but far from where it ends.

Stage 2: Typecheck

A Haiku-tier agent runs the type checker (TypeScript's tsc --noEmit or equivalent). Fast, cheap, catches type errors immediately. If types don't pass, the code goes back for correction before anything else runs.
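As a rough illustration of this gate, the sketch below treats the type checker's exit status as the pass/fail signal. It is written in Python purely for illustration; the source does not say how Nevo invokes the checker. `GateResult`, `run_command`, and `typecheck_gate` are hypothetical names, and calling the checker through `npx` is an assumption.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class GateResult:
    passed: bool
    output: str


# A runner takes a command and returns (exit_code, combined_output).
Runner = Callable[[Sequence[str]], Tuple[int, str]]


def run_command(cmd: Sequence[str]) -> Tuple[int, str]:
    """Run a command and capture its exit code and output."""
    proc = subprocess.run(list(cmd), capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def typecheck_gate(
    runner: Runner = run_command,
    cmd: Sequence[str] = ("npx", "tsc", "--noEmit"),  # assumed invocation
) -> GateResult:
    """Pass only if the type checker exits with status 0."""
    code, output = runner(cmd)
    return GateResult(passed=(code == 0), output=output)
```

Keeping the runner injectable means the gate logic can be exercised without a TypeScript project on disk, and the captured output can be handed back to the implementing agent for correction.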

Stage 3: Test

A Sonnet-tier agent runs the test suite. If tests are missing for new code, this agent writes them first. Test failures block progression — no exceptions.

Stage 4: Lint

A Haiku-tier agent runs the linter. Style violations, unused imports, formatting issues — all caught here. Fast and deterministic.
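Stages 2 through 4 behave like blocking gates run in a fixed order. A minimal sketch of that idea, assuming each gate can be reduced to a function returning pass or fail, and assuming a lint failure blocks in the same way as a type or test failure (the text states this explicitly only for types and tests). `run_gates` is a hypothetical helper, not part of Nevo.

```python
from typing import Callable, List, Optional, Tuple

# A check returns True when the gate passes.
Check = Callable[[], bool]


def run_gates(gates: List[Tuple[str, Check]]) -> Optional[str]:
    """Run gates in order and stop at the first failure.

    Returns the name of the failing gate, or None if every gate passed.
    Later gates are never invoked once an earlier one fails.
    """
    for name, check in gates:
        if not check():
            return name
    return None
```

Called with gates named typecheck, test, and lint in that order, a failing typecheck returns "typecheck" without ever running the test suite or the linter.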

Stage 5: Critique

This is where it gets interesting. An Opus-tier agent — the most capable model — reviews the code against the Karpathy rubric: simplicity, surgical changes, goal-driven execution. This isn't just checking for bugs. It evaluates whether the code is good — readable, maintainable, well-architected.

Stage 6: Refine

Issues from the critique are addressed. The implementing agent fixes what the critic found. This creates a feedback loop: write → critique → fix → re-critique. Up to 3 iterations.
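The critique-and-fix loop can be sketched as a bounded iteration. This is an illustration under assumptions: the critic is modeled as a function returning a list of open issues, the fixer as a function returning revised code, and one "iteration" is counted as one fix followed by one re-critique. `refine`, `critique`, and `fix` are hypothetical names.

```python
from typing import Callable, List, Tuple

MAX_ITERATIONS = 3  # from the text: up to 3 critique-and-fix rounds


def refine(
    code: str,
    critique: Callable[[str], List[str]],
    fix: Callable[[str, List[str]], str],
    max_iterations: int = MAX_ITERATIONS,
) -> Tuple[str, List[str], bool]:
    """Alternate fix and re-critique until clean or out of iterations.

    Returns (code, open_issues, needs_escalation).
    """
    issues = critique(code)
    for _ in range(max_iterations):
        if not issues:
            break
        code = fix(code, issues)
        issues = critique(code)  # re-critique the revised code
    return code, issues, bool(issues)
```

The third return value is what would hand control to Stage 7: it is true only when issues remain after the last allowed iteration.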

Stage 7: Escalate

If 3 iterations of critique-and-fix haven't resolved all issues, the escalation chain activates. Two fresh agents enter:

  • Code Researcher (Sonnet) — researches current best practices for the patterns in question
  • Fresh Reviewer (Opus) — reviews the code with zero iteration bias, seeing it for the first time

Fresh eyes catch what fatigued ones miss.
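One way to picture "zero iteration bias" is in what each escalation agent is allowed to see. In the sketch below, the researcher receives the open issues so it knows which patterns to look up, while the fresh reviewer receives only the code and none of the critique history. Both input choices are assumptions about the design, and `EscalationReport` and `escalate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EscalationReport:
    research_notes: List[str]
    fresh_findings: List[str]


def escalate(
    code: str,
    open_issues: List[str],
    researcher: Callable[[str, List[str]], List[str]],
    fresh_reviewer: Callable[[str], List[str]],
) -> EscalationReport:
    """Gather input from the two fresh agents for the arbiter."""
    # The researcher is told which issues are still open (assumed).
    notes = researcher(code, open_issues)
    # The fresh reviewer sees the code only, with no prior critiques.
    findings = fresh_reviewer(code)
    return EscalationReport(research_notes=notes, fresh_findings=findings)
```

Enforcing the isolation in the function signature, rather than by convention, makes it hard to leak earlier critique rounds into the fresh review by accident.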

Stage 8: Arbiter

The Quality Arbiter (Opus) makes the final call: APPROVE or DENY. It synthesizes the critic's findings, the researcher's recommendations, and the fresh reviewer's assessment. The verdict is on the suggestions still open after escalation. If approved, it may include cherry-picked guidance for minor improvements. If denied, the code meets the quality bar as-is: the remaining suggestions are cosmetic, not substantive.

Model Routing: The Right Brain for the Job

Not every stage needs the most powerful model. Nevo routes each stage to the optimal tier:

Stage        | Model Tier | Why
Typecheck    | Haiku      | Deterministic, fast, binary pass/fail
Test         | Sonnet     | Needs to understand code intent, write missing tests
Lint         | Haiku      | Rule-based, fast
Critique     | Opus       | Requires deep architectural judgment
Fresh Review | Opus       | Needs unbiased, expert-level assessment
Arbiter      | Opus       | Final judgment requires highest reasoning

This routing keeps cost down without sacrificing quality: simple checks go to fast, cheap models, and complex judgment goes to the most capable one.
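The routing itself amounts to a static lookup from stage to tier. A minimal sketch, using the stage and tier names from the table; the `Tier` enum, the `ROUTING` mapping, the lowercase stage keys, and `tier_for` are hypothetical and not taken from Nevo's configuration.

```python
from enum import Enum


class Tier(Enum):
    HAIKU = "haiku"
    SONNET = "sonnet"
    OPUS = "opus"


# Stage-to-tier assignments as listed in the routing table.
ROUTING = {
    "typecheck": Tier.HAIKU,
    "test": Tier.SONNET,
    "lint": Tier.HAIKU,
    "critique": Tier.OPUS,
    "fresh_review": Tier.OPUS,
    "arbiter": Tier.OPUS,
}


def tier_for(stage: str) -> Tier:
    """Look up the model tier for a stage, failing loudly if unconfigured."""
    try:
        return ROUTING[stage]
    except KeyError:
        raise ValueError(f"no tier configured for stage {stage!r}") from None
```

Raising on an unknown stage, rather than falling back to a default tier, keeps a misnamed stage from silently running on the wrong model.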

The Key Insight

The quality pipeline isn't a feature you enable — it's a structural guarantee. Every task triggers it. Every piece of code passes through it. The pipeline evaluates only code quality — it never adds features or expands scope. It is the immune system of the codebase.

And when the pipeline catches a novel error pattern? That feeds into the error-to-rule pipeline, generating a permanent preventive rule. The system doesn't just catch problems — it evolves to prevent them.
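The text does not describe how the error-to-rule pipeline identifies a pattern or where rules are stored, so the following is only a sketch of the general idea: a rule is recorded the first time a pattern is seen, and repeat sightings add nothing. `RuleBook` and its methods are hypothetical, and matching patterns by normalized string is an assumption made to keep the example small.

```python
from typing import Dict, List


class RuleBook:
    """Keeps one preventive rule per distinct error pattern."""

    def __init__(self) -> None:
        self._rules: Dict[str, str] = {}

    def learn(self, pattern: str, rule: str) -> bool:
        """Record a rule for a novel pattern.

        Returns True if the pattern was new and a rule was added,
        False if a rule for it already exists.
        """
        key = pattern.strip().lower()  # naive normalization (assumed)
        if key in self._rules:
            return False
        self._rules[key] = rule
        return True

    def rules(self) -> List[str]:
        """Return all recorded rules in insertion order."""
        return list(self._rules.values())
```

In this shape, the rule list is what would be fed back to the writing agent on later tasks so that the same mistake is not repeated.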