Every modern (and not so modern) software development method hinges on one thing: requirements are not fully known up front, and even when known, they change over time. From this follows the goal of "good" code: code that is easy to change.
Do current LLM based agents generate code which is easy to change? My gut feeling is no at the moment. Until they do, I'd argue code generated by agents is only good for prototypes. Once you can ask your agent to change a feature and be 100% sure it won't break other features, then you don't care what the code looks like.
All the hype is about how fast code can be produced. But the actual bottleneck has always been the cost of specifying intent clearly enough that the result is changeable, testable, and correct, AND that what you build actually brings value.
> Once you can ask your agent to change a feature and be 100% sure it won't break other features, then you don't care what the code looks like.
That bar is unreasonably high.
Right now, if I ask a senior engineer to change a feature in a mature codebase, I only have perhaps 70% certainty they won't break other features. Tests help, but only up to a point.
This bar only seems high because the bar in most companies is already unreasonably low. We had decades of research into functional programming, formal methods and specification languages. However, code monkey culture was cheaper and much more readily available. Enterprise software development has always been a race to the bottom, and the excitement for "vibe coding" is just the latest manifestation of its careless, thoughtless approach to programming.
> functional programming, formal methods and specification languages
Haha. Tell me you've never done professional software development without telling me, etc. None of those things are solutions to the actual problem, which is: does the code deliver the business value it's supposed to?
There are limits to how badly such a senior can screw up, or, more likely, forget some corner-case situation. And they are on top of their own code and the whole codebase, getting better each time, changing only what's needed, reverting unnecessary changes, seeing the bigger picture. That's (also) what seniority is.
An LLM brings an illusion of that: a statistical model that may or may not hit what you need. Ask a senior the same question twice and they will be better at the task the second time. An LLM will simply produce different output, maybe.
Do you feel like you have full control over what's happening here? Business has an absolutely insatiable lust for control, and IT systems are the area of any business that the C-suite always feels it has the least control over.
Reproducibility and general trust are not marginal concerns but the core of good delivery. Just read this thread: LLMs have zero of either.
But if push comes to shove, any other engineer can step in and debug your senior engineer's code. That's why we insist on people writing easy-to-change code.
With auto-generated code that almost no one will check or debug by hand, you want at least compiler-level exactitude. Then changing "the code" is as easy as asking your code generator for new things. If people have to debug its output, then it does not help in making maintainable software unless it also generates "good" code.
This is the brake on “AI will replace all developers”.
Coding is a correctness-discovery process. For a real product you need to build it to discover the right thing. As the product matures, those constraints increase in granularity, down to tighter bits of code (security, performance, etc.).
You can have AI write 100% of the code, but more mature products care about more and more specific low-level requirements.
The cases where you can let an agent swarm just go are those specified very well by years of work (like the Anthropic C compiler).
I am constantly getting LLMs to change features and fix bugs. The key is to micromanage the LLM and its context, and read the changes. It's slower than vibe coding but faster than coding by hand, and it results in working, maintainable software.
The comments explain the nuance there pretty well:
> This study had 16 participants, with a mix of previous exposure to AI tools - 56% of them had never used Cursor before, and the study was mainly about Cursor.
> My intuition here is that this study mainly demonstrated that the learning curve on AI-assisted development is high enough that asking developers to bake it into their existing workflows reduces their performance while they climb that learning curve.
Giving people a tool they have no experience with and expecting them to be productive feels... odd?
That's a good point. I myself am the easiest person to fool.
I knocked together a quick analysis of my commit graphs going back several years, if you're interested: https://mccormick.cx/gh/
My average leading up to 2023 was around 2k commits per year. 2023 I started using ChatGPT and I hit my highest commits so far that year at 2,600. 2024 I moved to a different country, which broke my productivity. I started using aider at the end of 2024 and in 2025 I again hit my highest commits ever at 2,900. This year is looking pretty solid.
From this it looks to me like I'm at least 1.4x more productive than before.
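For anyone who wants to run the same kind of per-year commit count on their own history, it falls straight out of `git log` (a quick sketch; the throwaway-repo setup exists only so the snippet is self-contained, and the dates and names are made up):

```shell
# Self-contained sketch: build a temporary repo with commits in two
# different years, then count commits per year. The final pipeline is
# the useful part; point it at a real repo instead.
cd "$(mktemp -d)"
git init -q
git -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m "one" --date="2024-06-01T12:00:00"
git -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m "two" --date="2025-06-01T12:00:00"
# Print each commit's author date as a year, then tally per year:
git log --pretty=%ad --date=format:%Y | sort | uniq -c
```

On a real repo this gives one count line per year, which is enough to eyeball a before/after trend like the one above.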
As a freelancer I have to track issues closed and hours pretty closely so I can give estimates and updates to clients. My baseline was always "two issues closed per working day". These are issues I create myself (full stack, self-managed freelancer) so the average granularity has stayed roughly constant.
This morning I closed 8 issues on a client project. I estimate I am averaging around 4 issues per working day these days. I know this because I have to actually close the issues each day. So on that metric my productivity has roughly doubled.
I believe those studies for sure. I think there is nuance to using these tools well, and I think a lot of people are going backwards and introducing more bugs than progress through vibe coding. I do not think I have gone backwards, and the metrics I have available seem to agree with that assessment.
Love your approach and that you actually have "before vs. after" numbers to back it up!
I personally also use AI in a similar way, strongly guiding it instead of vibe-coding. It reduces frustration because it surely "types" faster and better than me, including figuring out some syntax nuances.
But often I jump in and do some parts by myself. Either "starting" something (creating a directory, file, method etc.) to let the LLM fill in the "boring" parts, or "finishing" something by me filling in the "important" parts (like business logic etc.).
I think it's way easier to retain authorship and codebase understanding this way, and it's more fun as well (for me).
But in the industry right now there is a heavy push for "vibe coding".
I'd add in "code is easier to write than it is to read" - hence abstraction layers designed to present us with higher level code, hiding the complex implementations.
But LLMs are both really good at writing code _and_ reading code. However, they're not great at knowing when to stop - either finishing early and leaving stuff broken, over-engineering and adding in stuff that's not needed or deciding it's too hard and just removing stuff it deems unimportant.
I've found a TDD approach (with not just unit tests but high-level end-to-end behaviour-driven tests) works really well with them. I give them a high-level feature specification (remember Gherkin specifications?) and tell it to make that pass (with unit tests for any intermediate code it writes), make sure it hasn't broken anything (by running the other high-level tests) then, finally, refactor. I've also just started telling it to generate screenshots for each step in the feature, so I can quickly evaluate the UI flow (inspired by Simon Willison's Rodney tool).
Now I don't actually need to care if the code is easy to read or easy to change - because the LLM handles the details. I just need to make sure that when it says "I have implemented Feature X" that the steps it has written for that feature actually do what is expected and the UI fits the user's needs.
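To make the behaviour-first loop concrete, here's a minimal sketch of the kind of Given/When/Then spec that gets written before the agent touches the implementation. Everything here is hypothetical (the `signup` function and its rules are invented for illustration, not from any real project):

```python
# Hypothetical sketch: the Given/When/Then spec below is written first,
# then the agent is asked to produce an implementation that makes it pass.

def signup(users: set, email: str) -> set:
    """The kind of minimal implementation the agent would produce."""
    if email in users:
        raise ValueError("already registered")
    users.add(email)
    return users

# Feature: user signup (Gherkin-style steps as comments)
# Given an empty user store
users = set()
# When a new user signs up
signup(users, "a@example.com")
# Then the store contains that user
assert "a@example.com" in users
# And a duplicate signup is rejected
try:
    signup(users, "a@example.com")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
assert duplicate_rejected
```

The point is that the spec stays readable as behaviour ("a duplicate signup is rejected") regardless of how the generated implementation looks underneath.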
> Do current LLM based agents generate code which is easy to change?
Yes, if that's your goal and you take steps to achieve that goal while working with agents.
That means figuring out how to prompt them, providing them good examples (they'll work better in a codebase which is already designed to afford future changes since they imitate existing patterns) and keeping an eye on what they're doing so you can tell them "rewrite that like X" when they produce something bad.
> Once you can ask your agent to change a feature and be 100% sure they won't break other features
We won't be able to be 100% sure with LLMs, but maybe proper engineering around evals gets us to an acceptable level of quality based on the blast radius/safety profile.
I'd also argue that we should be pushing towards tracer bullets as a development concept, and less towards prototypes, which are nice but meant to be thrown away, and people might not actually throw them away.
The clean-room auto-porting after a messy exploratory prototyping session would be a nice pattern, nonetheless.