Agentic coding, the practice of leveraging AI agents to generate, modify, and maintain code, has matured to the point where it accounts for an increasingly significant share of the world's production code.
This fundamental change raises critical questions that the software industry must address: How much of this AI-generated code is being tested? What role does testing play when machines are writing our software? And perhaps most provocatively, is traditional testing becoming obsolete?
The Testing Imperative: Why We Test in the First Place
Before we can assess the future of testing, we must understand its foundational purposes. Testing serves several critical functions:
Core Assertions:
Expected Output Validation: Ensuring the code produces correct results from known inputs
Exception Handling: Verifying appropriate error responses to unexpected inputs
Error Processing: Confirming graceful handling of processing failures
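The three core assertions above can be sketched as ordinary unit tests. The `parse_price` helper and its rules below are hypothetical, chosen purely to illustrate each category; the tests are plain functions runnable with pytest or directly as a script:

```python
# A minimal sketch of the three core assertions, using a hypothetical
# parse_price helper.
def parse_price(text: str) -> float:
    """Parse a price string such as '$19.99' into a float."""
    if not isinstance(text, str):
        raise TypeError("price must be a string")
    cleaned = text.strip().lstrip("$")
    if not cleaned:
        raise ValueError("empty price string")
    return float(cleaned)


def test_expected_output():
    # Expected output validation: a known input yields the correct result.
    assert parse_price("$19.99") == 19.99


def test_exception_handling():
    # Exception handling: an unexpected input type raises a clear error.
    try:
        parse_price(None)
    except TypeError:
        return
    raise AssertionError("expected TypeError for non-string input")


def test_error_processing():
    # Error processing: a processing failure is reported gracefully.
    try:
        parse_price("$ ")
    except ValueError:
        return
    raise AssertionError("expected ValueError for an empty price")


if __name__ == "__main__":
    test_expected_output()
    test_exception_handling()
    test_error_processing()
    print("all three core assertions hold")
```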
For me, these all converge into a single requirement: behavioural mapping. Perhaps most importantly, testing serves as a behavioural specification of our systems. This is why I am such a strong supporter of Behaviour-Driven Development (BDD): BDD tests become the living documentation of what the system does in known scenarios. More on this later.
The Challenge of Agentic Coding Platforms
The proliferation of code and SaaS generation platforms presents challenges we have not had to consider before. These platforms have fundamentally altered the relationship between developers and the systems they create.
The Delegation Paradox
Because these platforms can create working products from prompts alone, the behaviour of entire systems is often delegated to AI agents. Our prompts no longer consider the intricate mechanics that go into a functional SaaS application: navigation patterns, authentication flows, data presentation layers, and user interface components. All of these critical decisions are deferred to the AI ecosystem, which decides and provides what it deduces is the best solution for the given prompt.
This delegation creates a peculiar dynamic. As we become less involved in these foundational decisions, both the requestor and the AI ecosystem focus almost exclusively on the expected behaviour, i.e. the specific functionality we prompted for. When we assess the output, we test that it "works," and more often than not, the core flow does work on the first attempt.
The Success Trap
The apparent success of these AI-generated solutions creates a reinforcing cycle. The more these platforms produce expected and working functions, UI and UX flows, the more we rely on them to produce even more complex systems. We become comfortable with the black box nature of the generation process because the visible outcomes meet our immediate needs.
But this success masks a deeper question: Is all of this code truly bug-free?
Bug-Free Code?
The ratio of bugs to lines of code varies widely, but a commonly cited industry average is 15-50 bugs per 1,000 lines of code. I would argue that AI-generated code is likely to carry a comparable defect rate to a typical human-only codebase. Current AI coding systems are fundamentally predictive: they can only produce patterns that resemble patterns they have seen before. They are not capable of true variance, and few of them have any real understanding of the underlying constructs they return.
The risk here is that a flawed pattern learned during training can manifest in countless production systems, creating widespread vulnerabilities that are difficult to detect and patch.
This would mean that while AI ecosystems can produce syntactically correct code that passes basic checks, they can introduce subtle logical errors, edge-case failures, and architectural inconsistencies that only emerge under specific conditions, or worse, in production scenarios.
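As a hand-rolled illustration of this point (the `average_rating` function below is hypothetical, not taken from any real AI output), code can pass a happy-path check while hiding an edge-case failure that only surfaces under specific conditions:

```python
# Code that looks correct and passes a quick "it works" check.
def average_rating(ratings: list[float]) -> float:
    """Return the mean of a list of product ratings."""
    return sum(ratings) / len(ratings)


# The happy path works, so a basic check succeeds:
assert average_rating([4.0, 5.0, 3.0]) == 4.0

# But a specific condition, e.g. a product with no reviews yet,
# crashes with ZeroDivisionError, likely in production:
try:
    average_rating([])
except ZeroDivisionError:
    print("edge case: empty ratings list crashes")
```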
Rethinking Testing Paradigms
Agentic coding should prompt a critique of our testing pillars. I am a little disappointed that the proponents of these frameworks and their implementers are not emphasising the risks that the AI ecosystem poses. So here is my take.
Is TDD Dead in the Age of Agentic Coding?
Test-Driven Development faces unique challenges when AI is writing the code:
The Specification Problem: TDD relies on developers writing failing tests first, then implementing code to make them pass. However, if an AI agent generates both the tests and the implementation, are we truly following TDD principles?
The Feedback Loop: The red-green-refactor cycle assumes human insight drives the process. When machines handle the refactoring, the learning and design benefits of TDD may be lost.
The Intent Gap: TDD's value lies partly in forcing developers to think through requirements before coding. If AI agents & LLMs generate code without this deliberative process, we may miss the conceptual benefits of TDD.
BDD: Where Does It Stand?
Behaviour-Driven Development may actually become more crucial in agentic coding environments:
Living Specifications: BDD scenarios (written in the Gherkin syntax) serve as clear, human-readable specifications that AI agents can also use to generate and validate code.
Stakeholder Communication: As code generation becomes more automated, the need to communicate business behaviour to stakeholders becomes even more critical. I do not know a better vehicle for this communication than BDD.
Acceptance Criteria: BDD bakes in acceptance criteria. That is, it provides all the conditions for validating AI-generated code against business requirements.
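As a sketch of the idea, here is a hypothetical Gherkin scenario alongside hand-rolled step implementations in plain Python. In practice a tool such as Cucumber, behave, or pytest-bdd would bind the scenario text to the steps; the `Account` class and the scenario are invented for illustration:

```python
# A Gherkin scenario: human-readable, and equally usable by an AI agent
# as a specification to generate and validate code against.
SCENARIO = """
Feature: Account withdrawal
  Scenario: Withdrawal within the available balance
    Given an account with a balance of 100
    When the user withdraws 30
    Then the remaining balance is 70
"""


class Account:
    def __init__(self, balance: float):
        self.balance = balance

    def withdraw(self, amount: float) -> None:
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


def run_scenario() -> float:
    # Each step of the scenario maps to one line of executable code.
    account = Account(balance=100)  # Given
    account.withdraw(30)            # When
    return account.balance          # Then: expect 70


assert run_scenario() == 70
```

The scenario doubles as living documentation: when the implementation drifts from the specified behaviour, the bound steps fail, and the failure reads in business language rather than in stack traces.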
DDD: Who Defines the Domain Now?
Domain-Driven Design faces perhaps the most fundamental challenge:
The Domain Expert Dilemma: If AI ecosystems are making implementation decisions, who ensures the domain model remains accurate and meaningful?
Bounded Context Erosion: AI ecosystems may blur the boundaries between different domains, creating coupling that domain experts never intended or were unaware of.
Ubiquitous Language: How do we maintain a shared understanding between domain experts and AI ecosystems that, without doubt, will interpret concepts differently?
So clearly, I believe that testing isn't dead in the age of agentic coding. However, it MUST evolve. The fundamental need to validate that our systems work correctly, handle errors gracefully, and meet business requirements remains unchanged.
What's changing is how we approach these validation challenges. I believe that the future of testing in agentic environments will likely involve:
Higher-level behavioural validation rather than low-level unit testing
AI-augmented test generation that can explore edge cases more comprehensively than humans
Continuous validation of AI model behaviour and code generation patterns
Enhanced focus on specification quality as the primary control mechanism for AI coding ecosystems
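The second point can be sketched with nothing but the standard library: randomised inputs probe a function far more broadly than a handful of hand-written examples. Property-based tools such as Hypothesis do this systematically; the `slugify` function below is a hypothetical example:

```python
# Machine-assisted edge-case exploration: assert properties that must
# hold for EVERY input, then throw many random inputs at the code.
import random
import string


def slugify(title: str) -> str:
    """Lower-case a title and join its words with hyphens."""
    words = "".join(c if c.isalnum() else " " for c in title.lower()).split()
    return "-".join(words)


def fuzz_slugify(trials: int = 1000, seed: int = 0) -> None:
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits + string.punctuation + " "
    for _ in range(trials):
        title = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 40)))
        slug = slugify(title)
        # Properties that must hold for any input, not just known examples:
        assert slug == slug.lower()
        assert " " not in slug
        assert not slug.startswith("-") and not slug.endswith("-")


fuzz_slugify()
```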
What are your thoughts on the future of testing in the Agentic coding ecosystem? How is your organisation adapting its testing practices?