Coding has always involved an increasing number of abstractions - from manually writing machine code to assemblers, compilers, interpreters, frameworks, libraries, and so on - each layer hiding lower-level implementation details so we can focus on higher-level design, getting more done, quicker.
It makes sense, then, to assume AI agents are just another abstraction to master. We plan out our architectures in plain English, letting the abstraction handle the lower-level syntactic implementations. A compiler for natural language.
Except... I don't really think this is the case. I might go as far as to say it's dangerous to assume so. To assume the speed gains from using an agent come with the same drawbacks as any other abstraction before it. It's not something that'll go away when the models get better, either.
I think it's a fundamental flaw of our new programming language. Or rather, the flaw of not having one at all.
I'd like to preface this with a bit of context. Unlike many who write introspective blog posts about AI, I wasn't a seasoned veteran by the time these tools came around. I started writing code roughly the same time as the initial release of ChatGPT. I went from occasionally asking it questions to see if it would hallucinate, to regularly having a chat window open while I code (a sort of artificial rubber duck), to having an agent in a pane in my terminal and IDE writing and reading code on its own.
Each step felt necessary to keep at pace with the evolving job market. The future is agentic! Roll up your sleeves to not fall behind! My first job interview involved no "manual" coding at all - it was a test of how well I could write instructions for ChatGPT.
The next step is seemingly detaching from the code entirely, focusing entirely on the prompt interface and the harness. It's certainly what the model providers advocate for, what the enthusiasts encourage, what the latest tools are designed around, how the most starred project on GitHub was written, and what results in the fastest workflow. In a way, it makes sense. If AI is just another abstraction, it makes sense to adapt and embrace the new layer. After all, we don't read the assembly that comes out of a compiler, do we?
I don't think it's that simple.
The determinism dispute
The ideal source code.
A fundamental property of a good abstraction is that it's deterministic. The way it translates higher-level code to lower-level code will always be the same given the same conditions. 1+1 will always equal 2 (... maybe not in older versions of Python). Programming languages differ from natural ones in that it can be formally proven that a given piece of source code will always be translated to the same target representation.
You simply cannot do this with an agentic interface. Not because it's a limitation of our current state-of-the-art LLMs, but because the issue is that English (or any natural language) is fundamentally non-deterministic. A given sentence can be interpreted any number of ways, and they can all be correct/incorrect, with no way to exhaustively prove one way or the other. Lawyers have known for a while the lengths we'll go to while attempting to write English that is unambiguous in its interpretation - and all the ways it can still fail.
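To make the contrast concrete, here's a toy sketch. The compiler half is real Python behaviour; the "LLM" is a stand-in I made up for illustration - a real model samples tokens, but the point is the same: the ambiguity lives in the English, not in the model.

```python
import random

SRC = "total = sum(range(10))"

# A compiler is a pure function of its input: same source in, same
# bytecode out, provably, every single run.
bytecode_a = compile(SRC, "<prompt>", "exec").co_code
bytecode_b = compile(SRC, "<prompt>", "exec").co_code
assert bytecode_a == bytecode_b

# An LLM, by contrast, samples from a probability distribution. This
# toy_llm is a hypothetical stand-in, not a real model - it "reads"
# the prompt "sum the first ten numbers" two defensible ways:
def toy_llm(prompt: str) -> str:
    interpretations = [
        "total = sum(range(10))",     # 0..9  -> total = 45
        "total = sum(range(1, 11))",  # 1..10 -> total = 55
    ]
    return random.choice(interpretations)

# Same prompt, different code - and both readings are "correct"
# English. Across enough samples, both will very likely appear.
samples = {toy_llm("sum the first ten numbers") for _ in range(20)}
```

Both outputs are valid interpretations of the sentence; no amount of model improvement removes the ambiguity from the sentence itself.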
A CPU can't hire a lawyer to interpret the prompt for it, though. It needs deterministic, unambiguous code. Which means we need to entrust something to derive determinism from the non-deterministic. In my opinion, trusting the agent to parse a prompt into deterministic code, resolving all of the implicit assumptions and ambiguities along the way, starts resembling delegation more so than it does abstraction.
The delegation distinction
How I'll know when AGI has been achieved.
Once you start looking at it like this, you'll realise all of the problems that come with delegating to a human start resurfacing as well. Misunderstandings, missing specificity, mistaken assumptions, lack of context - they all manifest in the resulting code regardless of model quality, because the model still can't read your mind (or it can, in which case we'll have bigger issues to worry about). Maybe you weren't even thinking of certain problems to begin with; they only came up during implementation. You discover friction as you codify your intent in your text editor - possibly even things that force you to rethink core assumptions, learning a thing or two on the way. The agent is now forced to make those decisions for you.
Traditional well-designed abstractions don't work this way. The decisions they make are small, contained, and deterministic. The intention of the program is still yours to implement, using the simpler higher-level primitives instead of complex lower-level ones. The number of implicit decisions a given line of Python code makes is relatively small - but a single word in a prompt can completely change what the resulting code, and as a result the final program, looks like.
Someone's going to need to give the final approval on the decisions the agent made as a result of this process. For as long as the model providers do not assume legal responsibility for any damages caused by agents, the buck stops with a human developer to deliver code proven to work.
The developer's dilemma
Can I sue Claude? Asking for a friend.
If you zoom out, you'll realise that this isn't really anything new. Project managers don't review every individual line of code their developers write, since it isn't really necessary - as long as the code passes the success criteria, we defer to the developer's judgement when it comes to implementation details. The crucial difference, however, is that the developer assumes responsibility for how they executed on the spec, and for the implicit assumptions made during the process.
You can try to automate as much of the review process as you can. You can give the agent success criteria, tests that it must pass, a feedback loop for it to catch its own mistakes, a reference implementation to draw inspiration from, another agent reviewing its work, and another agent reviewing that agent, among any numerous ways to try to reduce ambiguity. However, unless the spec is sufficiently detailed as to start resembling source code itself, there will inevitably arise one of these uncaught ambiguities in time.
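The generate-test-retry loop described above can be sketched in a few lines. Everything here is a hypothetical stub - `toy_agent` stands in for a real API call - but the loop shape, and its blind spot, are the point.

```python
def toy_agent(spec, feedback):
    """Hypothetical agent stub: misreads the spec at first,
    corrects itself once given test feedback."""
    if feedback is None:
        return "def add(a, b): return a - b"  # plausible-looking mistake
    return "def add(a, b): return a + b"

def run_tests(code):
    """Execute the generated code and check it against success criteria."""
    ns = {}
    exec(code, ns)
    try:
        assert ns["add"](2, 3) == 5
        return True, ""
    except AssertionError:
        return False, "add(2, 3) should equal 5"

def agent_loop(spec, max_attempts=3):
    feedback = None
    for _ in range(max_attempts):
        code = toy_agent(spec, feedback)
        ok, feedback = run_tests(code)
        if ok:
            # Only behaviour the tests express is verified; any
            # ambiguity outside the tests survives untouched.
            return code
    raise RuntimeError("spec not satisfied within attempt budget")
```

The loop converges here because the test happened to cover the mistake. Anything the tests don't express - naming, performance, security, edge cases you didn't anticipate - passes through the loop unchecked, which is exactly the uncaught ambiguity the paragraph above describes.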
The agent will make assumptions. It cannot be held responsible for those assumptions. You wrote the prompt for the agent. You must make sure the agent did not assume incorrectly. Therefore, you have to read what it wrote if you are to reasonably assume ownership and liability for the result.
The peculiar phenomenon
I appreciate the honesty.
This is where we need to consider the core architecture behind agentic tooling - LLMs.
Traditionally, when reading code written in the before times, it was a given that a human had written said code - and usually with intention behind it. Failure modes were predictable, and even code snippets online were copied off of sites with trust systems (e.g. StackOverflow) or from tutorials written by framework/library authors or otherwise demonstrated to work. It was assumed that the author of a given line of code could explain why they included it - or, at the very least, knew that the line existed in the codebase at all.
An LLM does not work this way. It writes code with no intention - only through probabilistic optimisation. It cannot, by definition, explain its actions with any sort of intent behind its reasoning after the code has already been written. For throwaway code, that's probably OK. But what about code generated by your prompt that you didn't account for? The architecture it derived on its own? The accumulated weight of all of its autonomous decisions? Who can we attribute these decisions to? The answer is... nobody.
The difficulty of reviewing "nobody's code" varies wildly. It depends on the prompt, the language, the framework, the scope, the harness, the number of tokens in your agent's context, and sometimes even the time of day. Such is the nature of these non-deterministic wonders.
I would wager that, generally, it's more difficult. AI code is - on average - buggier, slower, more verbose, and more insecure than human-written code. The more you delegate to agents, detaching from the code in the name of speeding up your workflow, the more you concede the mental map of the architecture of your project to... probabilistic chance. It gets exponentially harder to review as the codebase grows, partly because of fatigue, but also because of cognitive debt.
Even with improved models reducing (but not entirely eliminating) error rates, the speed at which these tools can generate code leads to fatigue on the part of the reviewer. It gets even worse when you're reviewing code generated by others - since you might end up putting more effort into the review than the author did into the prompt itself.
The cognitive crisis
Such fuzzy connections between intent and code have already led to widespread issues in the open-source community, including some projects disabling pull requests from external contributors entirely, while others have tried to preserve intent by advocating for contributors to submit prompts instead of code. Some of this is a result of agents allowing spam to be generated faster by those with misaligned intentions (or no intention at all), but I can't help but wonder - how much comes from people submitting code they couldn't have written themselves in the first place?
As a programmer who's still learning, this hits particularly close to home. Anthropic itself has shown that AI can inhibit learning when used incorrectly, and this is something I've experienced first-hand when getting too carried away with agents. AI provides an easy way out orders of magnitude more effective than "copying off of StackOverflow" ever could, and with junior hiring down 73%, not pressing that button feels like conceding your chance to move fast and stay visible to recruiters.
It gets even murkier due to another emergent property of LLMs - sycophancy. When out of your depth, it becomes incredibly hard to distinguish whether an LLM is giving you sound advice, or whether it's just telling you what you want to hear. When it sounds so smart, so authoritative, you start to subconsciously trust its judgement - which only makes it double down even more.
The agent thinks your idea is genius. The agent thinks your architecture is beautiful. The agent thinks your implementation is sound. The agent thinks you should make that pull request...
I think we need to slow down.
A cautious correction
In a sense, LLMs are truly the first developments of their kind. They represent natural language processing and agentic tool use capabilities that were strictly the stuff of science-fiction flicks even a couple of years ago. In another sense... they're all too familiar. We've known the issues with outsourcing, non-determinism, code review, and technical debt when it comes to delegating to humans - and they're all coming up again with agents.
Over the past couple of months, I've been standing at ground zero of an explosion of discourse surrounding the future of my career, seemingly before it has even begun. The hype cycle might say otherwise, but the solution that's worked for me isn't some hyper-harness, gigabyte of markdown files, or overengineered orchestration layer. It has been simply... slowing down.
For as long as I'm the one responsible for my code, I'd like to be the one setting the pace. I'd like to be the one forming the understanding, deciding on implementation details, learning new things, and deciding when, and how, to delegate responsibly. I'll be the one having all the fun.
Will I have a career in 3-6 months? Will that prediction be extended by 3-6 months in 3-6 months? Is it over for me? Is it just getting started? I don't know. All I know is the thing that got me into programming was so that I could peek under the hood and understand what made the engine tick, so that maybe one day, I could write one of my own.
I intend to continue doing that.
I'll just have a smarter, slightly more untrustworthy rubber duck helping me through it.
A rubber duck generated by Claude Sonnet 4.6
Read more about a project I didn't vibe-code.