There’s a debate happening right now about whether AI makes engineering better or worse. I think it’s the wrong debate. The better question is: what was your engineering like before AI showed up?
Because AI doesn’t have opinions about quality. It doesn’t care whether your team writes clear specs, runs rigorous reviews, or actually validates what it ships. It just moves faster in whichever direction you were already pointed. If you were building carefully, it helps you build carefully, faster. If you were cutting corners, it helps you cut corners at scale.
AI is an amplifier. And most engineering leaders haven’t stopped to ask what exactly they’re amplifying.
The Amdahl’s Law Problem
Amdahl’s Law is usually invoked in conversations about parallelism: if only part of a task can be sped up, the overall speedup is limited by the part that can’t. It applies just as cleanly here, and I haven’t seen enough people make this argument explicitly.
Good engineering requires that you actually understand and validate what the AI produces. Bad engineering skips that step. AI shortens only the steps that don’t require deep validation, so it hands a proportionally bigger speedup to the engineers who weren’t validating anyway.
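A back-of-the-envelope sketch makes the gap concrete. The numbers are illustrative assumptions, not measurements: suppose careful validation takes 40 percent of a conscientious engineer’s cycle time and 5 percent of a careless one’s, and AI makes everything else five times faster.

```python
def amdahl_speedup(serial_fraction: float, accelerated_speedup: float) -> float:
    """Amdahl's Law: the overall speedup when only the non-serial
    fraction of the work gets accelerated."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / accelerated_speedup)

# Illustrative assumptions, not measurements: AI makes code production
# 5x faster; the validation fraction of the cycle isn't accelerated.
careful = amdahl_speedup(serial_fraction=0.40, accelerated_speedup=5.0)
careless = amdahl_speedup(serial_fraction=0.05, accelerated_speedup=5.0)

print(f"careful engineer:  {careful:.1f}x overall")   # ~1.9x
print(f"careless engineer: {careless:.1f}x overall")  # ~4.2x
```

Under those assumptions the careless engineer looks more than twice as fast, and the gap only widens as the tool improves: in the limit, the careful engineer tops out at 2.5x while the careless one approaches 20x.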
The gap between good and bad engineering just got wider.
This reframes the whole conversation. The question isn’t whether AI produces good or bad code. It’s whether the humans using it are doing the hard work of understanding what came out. That work (reading carefully, reasoning about behavior, catching drift before it compounds into something unmanageable) isn’t something AI speeds up at all. It requires the same concentration it always did. Probably more, because there’s simply more code to read.
The bottleneck moved. Most teams haven’t noticed yet.
The Incentive Problem Is Worse Than the Tool Problem
I’ve watched something uncomfortable happen in teams over the past year: engineers who were doing genuinely good work before AI have since opted to ship three times as much mediocre work, rather than ten percent more excellent work.
I don’t blame them. That’s a rational response to incentives.
If your delivery culture rewards velocity over correctness, if your code review is a rubber stamp, if nobody tracks alignment between what was specified and what was actually built, then of course engineers will use AI to go faster rather than go deeper. You built a system that rewards throughput. They’re delivering throughput. The tool is just making the mismatch more visible.
I’ve seen this in review meetings. PRs are bigger, they arrive faster, and the diff makes less intuitive sense because nobody fully authored it. The reviewer is now reconstructing intent from generated output, which is a harder cognitive task than reviewing code a human wrote with full knowledge of the surrounding system. You’ve shifted the cognitive load downstream without changing downstream capacity.
That is a leadership problem. Not an AI problem.
Spec Drift
There’s a phenomenon I’ve started calling verification debt, related to but distinct from the technical debt most of us are already tracking. Technical debt is visible, at least in principle: you can point at the legacy module, the missing abstraction, the thing you’re going to fix next quarter. Verification debt is more insidious. It lives in the gap between what you think the system does and what it actually does.
The mechanism is straightforward. When code can be produced faster than the thinking that should precede it, you get more code and proportionally less specification. The shape of the system changes before anyone has updated the mental model of the system. Tests written against an earlier version of intended behavior now verify the wrong things, or nothing meaningful at all. Spec, tests, and implementation quietly drift apart.
AI is exceptional at creating that gap without obvious warning signs, because the code it produces often looks correct, compiles cleanly, and passes existing tests. The problem is invisible until it isn’t.
The DORA research has consistently shown that the highest-performing engineering teams combine speed with stability, and that stability is primarily a function of process discipline: clear change management, reliable testing, and fast recovery mechanisms. None of that comes from the tool. All of it comes from the conditions leaders create. AI changes the speed at which code enters the system. It doesn’t change what happens when that code is misaligned.
The solution isn’t to use AI less. It’s to treat specification and alignment as first-class engineering work, which most teams still don’t.
The Four Questions You Should Be Asking Your Teams
I want to be concrete, because “strengthen your engineering process” is the kind of advice that sounds good and changes nothing.
When I look at how AI is actually changing the work, four questions separate teams that are getting stronger from those quietly accumulating debt:
- Who owns the specification? Not the AI. Not the product manager via a ticket. Somebody on the engineering side has to be accountable for a written, current description of what the system is supposed to do. If that document doesn’t exist, or nobody reads it, or it hasn’t been touched in months, you are flying blind. AI makes that worse by an order of magnitude.
- What does your code review actually verify? If the answer is “roughly whether it does what the PR description says,” that’s not enough anymore. Reviewers need to be asking: does this change preserve the alignment of the system? Does the behavior described in the spec still match the behavior in the tests? Does the test still mean something? Good code review has always required this. AI makes skipping it far more costly.
- Is validation a gate or a formality? Teams treating a green CI pipeline as a proxy for correctness have been getting away with it for years. With AI-generated code, it becomes actively dangerous, because the AI will write tests that pass without validating much. A passing test suite is not evidence of a correct system. It’s evidence of a test suite that passes. Those are different things; the sketch after this list shows what that looks like in practice. Charity Majors has written about this more sharply than I can: testing strategy has to be intentional or it’s just confidence theater.
- Are your engineers slowing down anywhere? This is the most important question and the hardest to answer honestly. AI lets people move faster across the board. The engineers using it well are choosing to spend some of that reclaimed time going deeper: on design, on review, on genuine understanding of what they’re shipping. The engineers who aren’t are spending all of it on velocity. You can usually tell which is which by looking at their PRs over the last three months. The question is whether you’re looking.
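Here is the kind of test I mean in the validation question above. It’s a minimal hypothetical; the PriceService class and its repo are invented for illustration. The test pins down the mock’s configuration rather than the system’s specified behavior, so it stays green no matter what the spec says.

```python
import unittest
from unittest import mock

class PriceService:
    """Hypothetical service invented for this example."""
    def __init__(self, repo):
        self.repo = repo

    def total(self, item_ids):
        # Silent bug: an unknown item prices as None and is dropped,
        # when the spec might well say it should be a hard failure.
        return sum(self.repo.price(i) or 0 for i in item_ids)

class TestPriceService(unittest.TestCase):
    def test_total_is_green(self):
        repo = mock.Mock()
        repo.price.return_value = 10  # the test decides the answer...
        service = PriceService(repo)
        # ...and then verifies the answer it decided. This passes no
        # matter how the system handles unknown items; it validates
        # the mock's wiring, not the specified behavior.
        self.assertEqual(service.total(["a", "b"]), 20)

if __name__ == "__main__":
    unittest.main()
```

A reviewer asking the alignment question would catch this quickly: the spec’s answer for an unknown item is never exercised, so the green checkmark certifies nothing about it.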
The Leader’s Job Didn’t Get Simpler
There’s a fantasy version of AI in software engineering where the machines write the code and humans do the “interesting parts.” I find this fantasy genuinely appealing. I also think it fundamentally misunderstands what the interesting parts are.
The interesting parts of software engineering aren’t “above” the code. They’re tangled up with it. The judgment about whether a behavior is correct requires understanding the system. The decision about which tradeoff to make requires understanding both the tradeoff and its downstream consequences. Those things require someone who has built an accurate mental model of what the system is actually doing. AI doesn’t build that model. It generates statistically plausible code based on what similar systems have looked like before.
Your engineers are still the ones who have to build that model. Your job is to create the conditions in which they actually do.
That means resisting the pressure to measure productivity by output volume alone. (I know that’s hard when your own leadership is looking at deployment frequency and feature counts. I’m not pretending otherwise.) It means maintaining review processes that have real teeth, even when it slows things down. It means insisting on specs that are current and owned. It means asking, regularly, whether your team understands the systems they’re shipping, or whether they’re shepherding code they couldn’t fully explain under pressure.
Will Larson’s framing on sustainable pace applies here too: the teams that compound over time are the ones who build durable understanding, not just durable output. AI can help you build output faster. It cannot substitute for the understanding.
I don’t think those leadership responsibilities changed because of AI. I think AI made the consequences of ignoring them arrive faster.
A Note on the Historical Cycle Argument
The “we’ve seen this before” argument is popular right now, and it’s not wrong. Every decade has had its version of “expertise is no longer necessary”: fourth-generation languages, CASE tools, visual programming environments, offshore arbitrage. And every time, the complexity was still there waiting, deferred and disguised but not dissolved.
That pattern will hold this time too. The interesting question isn’t whether complexity will reassert itself. It will. The interesting question is: when it does, will your team have the skills and the discipline to handle it? Or will you have spent two years in a mode where nobody had to think that hard, and you’re now facing a hairball that nobody really understands?
That’s not a question for your engineers. That’s a question for you.
What I’d Actually Change
If I were designing this from scratch today, three things would be non-negotiable.
- First: every meaningful service or domain gets a living spec document, written in plain language, that engineers are expected to read before touching the code and update before merging changes. Not a wiki page nobody has opened in a year. A document with a named owner, reviewed on a cadence. The act of writing down what a system should do is not bureaucracy. It’s the forcing function that makes verification possible. A minimal skeleton follows this list.
- Second: code review checklists get updated to explicitly ask about behavioral alignment. Not just “does this work” but “does this still match what we said it should do.” That question forces the reviewer to look at the spec, and looking at the spec forces someone to update it when the behavior has changed intentionally. That loop, closed deliberately, is what keeps spec drift from compounding.
- Third: I’d track verification debt the same way I’d track technical debt: named, visible, in the backlog. When tests are superficial, when specs are stale, when nobody can clearly articulate what a system is supposed to do, that gets surfaced explicitly. Not because I’m idealistic about backlogs, but because naming something changes how seriously a team takes it. Invisible debt grows. Named debt gets paid down.
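To make the spec document from the first point concrete, here is one hypothetical skeleton. Every name and field in it is invented for illustration; the shape matters far less than the named owner, the review cadence, and the explicitly listed drift.

```
Spec: payments-service (hypothetical example)
Owner: <named engineer>    Last reviewed: <date, on a fixed cadence>

What this system does:
  Accepts charge requests, validates them against account limits,
  and records a ledger entry for every accepted charge.

What it must never do:
  - Double-charge a retried request.
  - Accept a charge that exceeds the account's configured limit.

Known drift (verification debt, tracked in the backlog):
  - Tests still assume the old per-day limit; behavior moved to a
    rolling window.
```

Notice that one page folds in all three non-negotiables: the living spec, the alignment question a reviewer can check against, and the verification debt, named.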
None of this is complicated. None of it is new. It’s the work good engineering has always required. The difference is that without it, AI will surface the gap much, much faster.
Amplifiers are neutral. What you point them at is not.