Coding has always involved an increasing number of abstractions - from manually writing machine code to assemblers, compilers, interpreters, frameworks, libraries, and so on - each layer hiding lower-level implementation details so we can focus on higher-level design, getting more done, quicker.
It makes sense, then, to assume AI agents are just another abstraction to master. We plan out our architectures in plain English, letting the abstraction handle the lower-level syntactic implementations. A compiler for natural language.
Except... I don't really think this is the case. I might go as far as to say it's dangerous to assume so. To assume the speed gains from using an agent come with the same drawbacks as any other abstraction before it. It's not something that'll go away when the models get better, either.
I think it's a fundamental flaw of our new programming language. Or rather, the flaw of not having one at all.
I'd like to preface this with a bit of context. Unlike many who write introspective blog posts about AI, I wasn't a seasoned veteran by the time these tools came around. I started writing code roughly the same time as the initial release of ChatGPT. I went from occasionally asking it questions to see if it would hallucinate, to regularly having a chat window open while I code (a sort of artificial rubber duck), to having an agent in a pane in my terminal and IDE writing and reading code on its own.
Each step felt necessary to keep at pace with the evolving job market. The future is agentic! Roll up your sleeves to not fall behind! My first job interview involved no "manual" coding at all - it was a test of how well I could write instructions for ChatGPT.
The next step is seemingly detaching from the code entirely, focusing entirely on the prompt interface and the harness. It's certainly what the model providers advocate for, what the enthusiasts encourage, what the latest tools are designed around, how the most starred project on GitHub was written, and what results in the fastest workflow. In a way, it makes sense. If AI is just another abstraction, it makes sense to adapt and embrace the new layer. After all, we don't read the assembly that comes out of a compiler, do we?
I don't think it's that simple.
The determinism dispute
The ideal source code.
A fundamental property of a good abstraction is that it's deterministic. The way it translates higher-level code to lower-level code will always be the same given the same conditions. 1+1 will always equal 2 (... maybe not in older versions of Python). Programming languages differ from natural ones in that it can be formally proven that a given piece of source code will always be translated to the same target representation.
You simply cannot do this with an agentic interface. Not because it's a limitation of our current state-of-the-art LLMs, but because the issue is that English (or any natural language) is fundamentally non-deterministic. A given sentence can be interpreted any number of ways, and they can all be correct/incorrect, with no way to exhaustively prove one way or the other. Lawyers have known for a while the lengths we'll go to while attempting to write English that is unambiguous in its interpretation - and all the ways it can still fail.
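To make the contrast concrete, here's a toy sketch. The compiler half is real Python behaviour; the "LLM" is a stand-in I made up for illustration - a real model samples tokens, but the point is the same: the ambiguity lives in the English, not in the model.

```python
import random

SRC = "total = sum(range(10))"

# A compiler is a pure function of its input: same source in, same
# bytecode out, provably, every single run.
bytecode_a = compile(SRC, "<prompt>", "exec").co_code
bytecode_b = compile(SRC, "<prompt>", "exec").co_code
assert bytecode_a == bytecode_b

# An LLM, by contrast, samples from a probability distribution. This
# toy_llm is a hypothetical stand-in, not a real model - it "reads"
# the prompt "sum the first ten numbers" two defensible ways:
def toy_llm(prompt: str) -> str:
    interpretations = [
        "total = sum(range(10))",     # 0..9  -> total = 45
        "total = sum(range(1, 11))",  # 1..10 -> total = 55
    ]
    return random.choice(interpretations)

# Same prompt, different code - and both readings are "correct"
# English. Across enough samples, both will very likely appear.
samples = {toy_llm("sum the first ten numbers") for _ in range(20)}
```

Both outputs are valid interpretations of the sentence; no amount of model improvement removes the ambiguity from the sentence itself.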
A CPU can't hire a lawyer to interpret the prompt for it, though. It needs deterministic, unambiguous code. Which means we need to entrust something to derive determinism from the non-deterministic. In my opinion, trusting the agent to parse a prompt into deterministic code, resolving all of the implicit assumptions and ambiguities along the way, starts resembling delegation more so than it does abstraction.
The delegation distinction
How I'll know when AGI has been achieved.
Once you start looking at it like this, you'll realise all of the problems that come with delegating to a human start resurfacing as well. Misunderstandings, missing specificity, mistaken assumptions, lack of context - they all manifest in the resulting code regardless of model quality, because the model still can't read your mind (or it can, in which case we'll have bigger issues to worry about). Maybe you weren't even thinking of certain problems to begin with; they only came up during implementation. You discover friction as you codify your intent in your text editor - possibly even things that force you to rethink core assumptions, learning a thing or two on the way. The agent is now forced to make those decisions for you.
Traditional well-designed abstractions don't work this way. The decisions they make are small, contained, and deterministic. The intention of the program is still yours to implement, using the simpler higher-level primitives instead of complex lower-level ones. The number of implicit decisions a given line of Python code makes is relatively small - but a single word in a prompt can completely change what the resulting code, and as a result the final program, looks like.
Someone's going to need to give the final approval on the decisions the agent made as a result of this process. For as long as the model providers do not assume legal responsibility for any damages caused by agents, the buck stops with a human developer to deliver code proven to work.
The developer's dilemma
Can I sue Claude? Asking for a friend.
If you zoom out, you'll realise that this isn't really anything new. Project managers don't review every individual line of code their developers write, since it isn't really necessary - as long as the code passes the success criteria, we defer to the developer's judgement when it comes to implementation details. The crucial difference, however, is that the developer assumes responsibility for how they executed on the spec, and for the implicit assumptions made during the process.
You can try to automate as much of the review process as you can. You can give the agent success criteria, tests that it must pass, a feedback loop for it to catch its own mistakes, a reference implementation to draw inspiration from, another agent reviewing its work, and another agent reviewing that agent, among any numerous ways to try to reduce ambiguity. However, unless the spec is sufficiently detailed as to start resembling source code itself, there will inevitably arise one of these uncaught ambiguities in time.
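The generate-test-retry loop described above can be sketched in a few lines. Everything here is a hypothetical stub - `toy_agent` stands in for a real API call - but the loop shape, and its blind spot, are the point.

```python
def toy_agent(spec, feedback):
    """Hypothetical agent stub: misreads the spec at first,
    corrects itself once given test feedback."""
    if feedback is None:
        return "def add(a, b): return a - b"  # plausible-looking mistake
    return "def add(a, b): return a + b"

def run_tests(code):
    """Execute the generated code and check it against success criteria."""
    ns = {}
    exec(code, ns)
    try:
        assert ns["add"](2, 3) == 5
        return True, ""
    except AssertionError:
        return False, "add(2, 3) should equal 5"

def agent_loop(spec, max_attempts=3):
    feedback = None
    for _ in range(max_attempts):
        code = toy_agent(spec, feedback)
        ok, feedback = run_tests(code)
        if ok:
            # Only behaviour the tests express is verified; any
            # ambiguity outside the tests survives untouched.
            return code
    raise RuntimeError("spec not satisfied within attempt budget")
```

The loop converges here because the test happened to cover the mistake. Anything the tests don't express - naming, performance, security, edge cases you didn't anticipate - passes through the loop unchecked, which is exactly the uncaught ambiguity the paragraph above describes.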
The agent will make assumptions. It cannot be held responsible for those assumptions. You wrote the prompt for the agent. You must make sure the agent did not assume incorrectly. Therefore, you have to read what it wrote if you are to reasonably assume ownership and liability for the result.
The peculiar phenomenon
I appreciate the honesty.
This is where we need to consider the core architecture behind agentic tooling - LLMs.
Traditionally, when reading code written in the before times, it was a given that a human had written said code - and usually with intention behind it. Failure modes were predictable, and even code snippets online were copied off of sites with trust systems (e.g. StackOverflow) or from tutorials written by framework/library authors or otherwise demonstrated to work. It was assumed that the author of a given line of code could explain why they included it - or, at the very least, knew that the line existed in the codebase at all.
An LLM does not work this way. It writes code with no intention - only through probabilistic optimisation. It cannot, by definition, explain its actions with any sort of intent behind its reasoning after the code has already been written. For throwaway code, that's probably OK. But what about code generated by your prompt that you didn't account for? The architecture it derived on its own? The accumulated weight of all of its autonomous decisions? Who can we attribute these decisions to? The answer is... nobody.
The difficulty of reviewing "nobody's code" varies wildly. It depends on the prompt, the language, the framework, the scope, the harness, the number of tokens in your agent's context, and sometimes even the time of day. Such is the nature of these non-deterministic wonders.
I would wager that, generally, it's more difficult. AI code is - on average - buggier, slower, more verbose, and more insecure than human-written code. The more you delegate to agents, detaching from the code in the name of speeding up your workflow, the more you concede the mental map of the architecture of your project to... probabilistic chance. It gets exponentially harder to review as the codebase grows, partly because of fatigue, but also because of cognitive debt.
Even with improved models reducing (but not entirely eliminating) error rates, the speed at which these tools can generate code leads to fatigue on the part of the reviewer. It gets even worse when you're reviewing code generated by others - since you might end up putting more effort into the review than the author did into the prompt itself.
The cognitive crisis
Such fuzzy connections between intent and code have already led to widespread issues in the open-source community, including some projects disabling pull requests from external contributors entirely, while others have tried to preserve intent by advocating for contributors to submit prompts instead of code. Some of this is a result of agents allowing spam to be generated faster by those with misaligned intentions (or no intention at all), but I can't help but wonder - how much comes from people submitting code they couldn't have written themselves in the first place?
As a programmer who's still learning, this hits particularly close to home. Anthropic itself has shown that AI can inhibit learning when used incorrectly, and this is something I've experienced first-hand when getting too carried away with agents. AI provides an easy way out orders of magnitude more effective than "copying off of StackOverflow" ever could, and with junior hiring down 73%, not pressing that button feels like conceding your chance to move fast and stay visible to recruiters.
It gets even murkier due to another emergent property of LLMs - sycophancy. When out of your depth, it becomes incredibly hard to distinguish whether an LLM is giving you sound advice, or whether it's just telling you what you want to hear. When it sounds so smart, so authoritative, you start to subconsciously trust its judgement - which only makes it double down even more.
The agent thinks your idea is genius. The agent thinks your architecture is beautiful. The agent thinks your implementation is sound. The agent thinks you should make that pull request...
I think we need to slow down.
A cautious correction
In a sense, LLMs are truly the first developments of their kind. They represent natural language processing and agentic tool use capabilities that were strictly the stuff of science-fiction flicks even a couple of years ago. In another sense... they're all too familiar. We've known the issues with outsourcing, non-determinism, code review, and technical debt when it comes to delegating to humans - and they're all coming up again with agents.
Over the past couple of months, I've been standing at ground zero of an explosion of discourse surrounding the future of my career, seemingly before it has even begun. The hype cycle might say otherwise, but the solution that's worked for me isn't some hyper-harness, gigabyte of markdown files, or overengineered orchestration layer. It has been simply... slowing down.
For as long as I'm the one responsible for my code, I'd like to be the one setting the pace. I'd like to be the one forming the understanding, deciding on implementation details, learning new things, and deciding when, and how, to delegate responsibly. I'll be the one having all the fun.
Will I have a career in 3-6 months? Will that prediction be extended by 3-6 months in 3-6 months? Is it over for me? Is it just getting started? I don't know. All I know is the thing that got me into programming was so that I could peek under the hood and understand what made the engine tick, so that maybe one day, I could write one of my own.
I intend to continue doing that.
I'll just have a smarter, slightly more untrustworthy rubber duck helping me through it.
A rubber duck generated by Claude Sonnet 4.6
Read more about a project I didn't vibe-code.