Using an Agent‑as‑a‑Judge to Fix AI‑Written Unit Tests

Max Roche
June 18, 2026
3 min. read

The Problem

Does Claude constantly write bad unit tests for you?

In our repo, Claude kept producing tests with the same recurring issues:

  • Using Task.sleep instead of XCTestExpectation, leading to flaky tests
  • Writing far too many tests (often the same test expressed 10 different ways)
  • Producing tautological tests that always pass but don't actually test anything
  • Ignoring good dependency‑injection patterns
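To make the first issue concrete, here is a minimal sketch of the expectation-based pattern. `ProfileLoader` and its `load(completion:)` method are hypothetical stand-ins invented for this example, not code from our repo; the XCTest calls are the standard ones.

```swift
import XCTest

// Hypothetical class under test: reports its result on a background queue.
final class ProfileLoader {
    func load(completion: @escaping @Sendable (String) -> Void) {
        DispatchQueue.global().async { completion("profile") }
    }
}

final class ProfileLoaderTests: XCTestCase {
    // Anti-pattern, kept as a comment so nobody copies it:
    //   loader.load { result = $0 }
    //   try await Task.sleep(nanoseconds: 500_000_000)  // guesses how long the work takes
    //   XCTAssertEqual(result, "profile")
    // If the guess is too short the test fails on a slow machine;
    // if it is too long, every run pays the full delay.

    func testLoadDeliversProfile() {
        let delivered = expectation(description: "completion is called")

        ProfileLoader().load { value in
            XCTAssertEqual(value, "profile")
            delivered.fulfill()
        }

        // Returns as soon as fulfill() runs; the timeout is only an upper bound.
        wait(for: [delivered], timeout: 1.0)
    }
}
```

The test waits for the event itself rather than for a fixed amount of time, so it should not depend on how fast the machine is.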

Despite trying all kinds of prompts, I couldn't get a Claude agent to reliably write solid unit tests. Reviewing its output started to feel very familiar — and very repetitive.

Every review cycle looked something like this:

  • "Don't use Task.sleep — always prefer XCTestExpectation."
  • "You don't need this many tests."
  • "This test isn't actually asserting anything meaningful."

Claude and I would do this dance for two or three iterations before the tests finally came out looking nice and clean 🧼

The Aha Moment

Then it hit me: I could automate myself out of this loop.

What I really needed wasn't a better prompt — it was a second agent.

I created a unit‑test‑reviewer agent whose only job is to review generated tests and look for:

  • Flaky async patterns
  • Redundant or duplicate tests
  • Tautological assertions
  • Poor dependency‑injection practices
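Two of these checks, tautological assertions and dependency injection, are easiest to see side by side. The sketch below is illustrative only: `Clock`, `FixedClock`, and `TrialChecker` are hypothetical names made up for the example.

```swift
import XCTest

// Hypothetical dependency: anything that can report the current time.
protocol Clock {
    func now() -> Date
}

// Test double that always returns the date it was given.
struct FixedClock: Clock {
    let date: Date
    func now() -> Date { date }
}

// Hypothetical type under test. The clock is injected, so tests control "now"
// instead of depending on the real system time.
struct TrialChecker {
    let clock: any Clock
    let trialEnd: Date

    var isTrialActive: Bool { clock.now() < trialEnd }
}

final class TrialCheckerTests: XCTestCase {
    private let trialEnd = Date(timeIntervalSince1970: 1_000)

    // Tautological shape the reviewer should reject (comment only):
    //   let clock = FixedClock(date: trialEnd)
    //   XCTAssertEqual(clock.now(), trialEnd)
    // It only proves the stub returns what it was handed; TrialChecker never runs.

    func testTrialIsActiveBeforeEnd() {
        let clock = FixedClock(date: trialEnd.addingTimeInterval(-1))
        let checker = TrialChecker(clock: clock, trialEnd: trialEnd)
        XCTAssertTrue(checker.isTrialActive)
    }

    func testTrialIsInactiveAtEnd() {
        let clock = FixedClock(date: trialEnd)
        let checker = TrialChecker(clock: clock, trialEnd: trialEnd)
        XCTAssertFalse(checker.isTrialActive)
    }
}
```

Both real tests would fail if the comparison in `isTrialActive` were flipped or changed to `<=`, which is a quick way to tell a meaningful assertion from a tautological one.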

This agent doesn't rewrite tests. It simply reviews them and returns a PASS or FAIL, along with concrete reasons for any failure.

Wiring It Together with a Skill

Once I had the two agents, I just needed to connect them. I did this using a skill called create-unit-tests.

Here's how the flow works:

  1. The unit‑test‑writing agent generates tests for the current changes
  2. The unit‑test‑reviewer agent reviews those tests
  3. If the review returns PASS → we're done ✅
  4. If the review returns FAIL → the failure reasons are fed back into step 1

This loop continues until the reviewing agent returns a PASS.
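The control flow can be sketched in a few lines of Swift. This is a model of the loop, not the skill itself: `ReviewVerdict`, `LoopResult`, and `createUnitTests` are hypothetical names, the two agents are stood in for by plain closures, and the `maxRounds` cap is an assumption added here so the sketch cannot spin forever.

```swift
import Foundation

// Hypothetical shape of the reviewer's answer: PASS, or FAIL with reasons.
enum ReviewVerdict: Equatable {
    case pass
    case fail(reasons: [String])
}

// Hypothetical summary of one run of the loop.
struct LoopResult: Equatable {
    let tests: String
    let rounds: Int
    let passed: Bool
}

/// Runs write -> review until the reviewer passes or `maxRounds` is used up.
/// `write` receives the reasons from the previous review (empty on round 1).
func createUnitTests(
    maxRounds: Int,
    write: (_ feedback: [String]) -> String,
    review: (_ tests: String) -> ReviewVerdict
) -> LoopResult {
    precondition(maxRounds > 0, "need at least one round")
    var feedback: [String] = []
    var tests = ""
    for round in 1...maxRounds {
        tests = write(feedback)
        switch review(tests) {
        case .pass:
            return LoopResult(tests: tests, rounds: round, passed: true)
        case .fail(let reasons):
            feedback = reasons  // fed back into the writer on the next round
        }
    }
    return LoopResult(tests: tests, rounds: maxRounds, passed: false)
}

// Stub agents: the writer drops Task.sleep once it has been told to.
let result = createUnitTests(
    maxRounds: 5,
    write: { feedback in
        feedback.isEmpty ? "test using Task.sleep" : "test using XCTestExpectation"
    },
    review: { tests in
        tests.contains("Task.sleep")
            ? .fail(reasons: ["Don't use Task.sleep; prefer XCTestExpectation."])
            : .pass
    }
)
print(result.rounds, result.passed)  // prints "2 true"
```

Keeping the reviewer's output to a verdict plus reasons is what makes the loop simple: the writer only ever needs the list of reasons from the previous round.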

Crucially, the feedback is always specific and consistent — the same things I used to comment on manually, every single time.

The Result

Hooking up two agents with a skill like this has been a huge win for the quality and consistency of the unit tests we write. It also saves me a lot of time as I no longer need to review the "first pass" of unit tests.

Next up: applying this paradigm to other parts of our workflow where human review patterns are predictable and repeatable.
