I recall someone comparing stories of LLMs doing something useful to "I have a Canadian girlfriend" stories. Not trying to discredit anyone or be a pessimist, but can anyone elaborate on how exactly they use these agents while working on interdependent projects in multi-team settings, e.g. in regulated industries?
They are not a silver bullet or truly “you don’t need to know how to code anymore” tools. I’ve done a ton of work with Claude Code this year. I’ve gone from a “maybe one ticket a week” tier React developer to someone who’s shipped entire new frontend feature sets, while also managing a team. I’ve used LLMs to prototype these features rapidly and tear down the barrier to entry on a lot of simple problems that are historically too big to be a single-dev item, and clear out the backlog of “nice to haves” that compete with the real bread and butter of my business. This prototyping and “good enough” development has been massively impactful in my small org, where the hard problems come from complex interactions between distributed systems, monitoring across services, and lots of low-level machine traffic. LLMs let me solve the easy problems and spend my most productive hours working with people to break down the hard problems into easy problems that I can solve later or pass off to someone on my team.
I’ve also used LLMs to get into other people’s codebases, refactor ancient tech debt, and shore up test suites from years ago that are filled with garbage and copy/paste. On testing alone, LLMs are super valuable for throwing edge cases at your code and seeing what you assumed vs. what an entropy machine would throw at it.
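To give a flavour of what I mean by edge cases, here’s a made-up example (not from my codebase), assuming a small parseDuration("1h30m") helper and a Jest-style runner; the LLM will happily churn out the boring-but-valuable cases I’d otherwise skip:

    // Hypothetical example: edge-case tests an LLM might suggest for a made-up
    // parseDuration("1h30m") -> seconds helper (Jest-style runner assumed).
    import { parseDuration } from "./parseDuration";

    describe("parseDuration edge cases", () => {
      test("handles plain seconds", () => {
        expect(parseDuration("90s")).toBe(90);
      });
      test("handles mixed units", () => {
        expect(parseDuration("1h30m")).toBe(5400);
      });
      test("rejects empty input", () => {
        expect(() => parseDuration("")).toThrow();
      });
      test("rejects negative values", () => {
        expect(() => parseDuration("-5m")).toThrow();
      });
      test("rejects unknown units", () => {
        expect(() => parseDuration("10y")).toThrow();
      });
    });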
LLMs absolutely are not a 10x improvement in productivity on their own. They 100% cannot solve some problems in a sensible, tractable way, and they frequently do stupid things that waste time and would ruin a poor developer’s attempts at software engineering. However, they absolutely also lower the barrier to entry and dethrone “pure single tech” (i.e. backend only, frontend only, “I don’t know Kubernetes”, or other limited-scope) software engineers who’ve previously benefited from super-specialized knowledge guarding their place in the business.
Software as a discipline has shifted so far from “build functional, safe systems that solve problems” to “I make 200k bike-shedding JIRA tickets that require an army of product people to come up with and manage” that LLMs can be valuable if only for their ability to compress roles and give people with a sense of ownership the tools they need to operate the way a whole team would have 10 years ago.
> However, they absolutely also lower the barrier to entry and dethrone “pure single tech” (i.e. backend only, frontend only, “I don’t know Kubernetes”, or other limited-scope) software engineers who’ve previously benefited from super-specialized knowledge guarding their place in the business.
This argument gets repeated frequently, but to me it seems to be missing a final, actionable conclusion.
If one "doesn't know Kubernetes", what exactly are they supposed to do now, having LLM at hand, in a professional setting? They still "can't" asses the quality of the output, after all. They can't just ask the model, as they can't know if the answer is not misleading.
Assuming we are not expecting people to operate with implicit delegation of responsibility to the LLM (something that is ultimately not possible anyway - taking blame is a privilege humans will keep for the foreseeable future), I guess the argument in the form above collapses to "it's easier to learn new things now"?
But this does not eliminate (or reduce) the need for specialization of knowledge on the employee side, and there is only so much you can specialize in.
The bottleneck has maybe shifted right somewhat (from the time/effort of the learning stage to the cognition and memory limits of an individual), but one could argue that the output on the other side of the funnel (of learn -> understand -> operate -> take responsibility for) didn't necessarily widen that much.
> If one "doesn't know Kubernetes", what exactly are they supposed to do now, having LLM at hand, in a professional setting? They still "can't" asses the quality of the output, after all. They can't just ask the model, as they can't know if the answer is not misleading.
This is the fundamental problem that all these cowboy devs don't even consider. They talk about churning out huge amounts of code as if it were an intrinsically good thing. Reminds me of those awful VB6 desktop apps people kept churning out. VB6 sure made tons of people n-times more productive, but it also led to loads of legacy systems that no one wanted to touch because they were built by people who didn't know what they were doing. LLMs-for-code are another tool in the same category.
>They still "can't" asses the quality of the output, after all. They can't just ask the model, as they can't know if the answer is not misleading.
Wasn't this a problem before AI? If I took a book or online tutorial and followed it, could I be sure it was teaching me the right thing? I would need to make sure I understood it, that it made sense, that it worked when I changed things around, and I would need to combine multiple sources. That still needs to be done. You can ask the model, and you'll have to judge the answer, the same as if you asked another human. You have to make sure you are in a realm where you are learning, but aren't so far out that you can easily be misled. You do need to test out explanations and seek multiple sources, of which AI is only one.
An AI can hallucinate and just make things up, but the chance that different sessions with different AIs lead to the same hallucinations, consistently building upon each other, is low enough not to be worth worrying about.
If you don’t know k8s, or any tech really, you can RTFM, you can generate or apply some premade manifests, you can feed the errors into the LLM and ask about them, you can Google the error message; you can do a lot of things. Oftentimes, in the “real world” of software engineering, you learn by having zero idea of how to do something to start with, and you gradually come up with ideas from screwing around with a particular tool or prototyping a solution and seeing how well it works.
I agree that some of the above basically amounts to: it’s easier to learn new things. Which itself might sound ho-hum, but it really is a fundamental responsibility of software engineers to learn new things, understand new and complex problems, and learn how to do things correctly and repeatably. LLMs unquestionably help with this, even with their tendency to hallucinate: usually proof by contradiction (or the failure of an over-confident chaos machine) is even better than just having a thing that spits out perfect solutions without needing the operator to understand them.
However, I will say that there is a very large gulf between learning how to reason about complex systems or code and learning how to use the entropy machine to produce nominally acceptable work. Pure reliance and delegation of responsibility to the AI will torpedo a lot of projects that a good engineer could solve, and no amount of lines of code makes up for a poorly conceived product or a brittle implementation that the LLM later stumbles over. Good engineering principles are more important than ever, and the developer has to force the LLM to conform to those.
There are many things to question about agentic coding: whether it’s truly cost/effort effective, whether it saves time, whether it makes you worse at problem solving by handing you facile half-solutions that wither in the face of the chaos of the real world, etc. But it clearly isn’t a technology that “doesn’t do ANYTHING useful”, as some HN posters claim.
I don’t think the conclusion is right. Your org might still require enough React knowledge to keep you gainfully employed as a pure React dev, but if all you did was change some forms, that is now something pretty much anyone can do. The value of good FE architecture has increased, if anything, since you will be adding code more quickly. Making sure the LLM doesn’t stupidly couple stuff together is quite important for long-term success.
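To make the coupling point concrete, here’s a minimal hypothetical sketch (the component and prop names are invented): keep the data fetching out of the LLM-generated form so the form stays reusable and testable:

    // Hypothetical sketch: the form only knows its props, not where the data
    // comes from. Left unchecked, an LLM will often fetch inside the form.
    import * as React from "react";

    type ContactFormProps = {
      initialEmail: string;
      onSubmit: (email: string) => Promise<void>;
    };

    export function ContactForm({ initialEmail, onSubmit }: ContactFormProps) {
      const [email, setEmail] = React.useState(initialEmail);
      return (
        <form
          onSubmit={async (e) => {
            e.preventDefault();
            await onSubmit(email); // the caller decides which API this hits
          }}
        >
          <input value={email} onChange={(e) => setEmail(e.target.value)} />
          <button type="submit">Save</button>
        </form>
      );
    }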
It really depends on whether coding agents are closer to a "compiler" or not. Very few amongst us verify assembly code. If the program runs and does the thing, we just assume it did the right thing.
> someone who’s shipped entire new frontend feature sets, while also managing a team. I’ve used LLMs to prototype these features rapidly and tear down the barrier to entry on a lot of simple problems that are historically too big to be a single-dev item, and clear out the backlog of “nice to haves” that compete with the real bread and butter of my business. This prototyping and “good enough” development has been massively impactful in my small org
Has any senior React dev code reviewed your work? I would be very interested to see what they have to say about the quality of your code. It's a bit like using LLMs to medically self-diagnose and claiming it works because you are healthy.
Ironically enough, it does seem that the only workforce AI will be shrinking is devs themselves. I guess in 2025, everyone can finally code.
I follow at least one GitHub repo (a well-respected one that's made the HN front page) where everything is now Claude-coded. Things do move fast, but I'm seriously underwhelmed by the quality. I've raised a few concerns; some were taken in, others seem to have been shut down with an explanation Claude produced that IMO makes no sense, but which was taken at face value.
This matches my personal experience. I was asked to help with a large Swift iOS app without knowing Swift. I had access to a frontier agent, and I was able to consistently knock out a couple of tickets per week for about a month, until the fire was out and the actual team could take over. Code review by the owners means the result isn't terrible, but it's not great either. I left the experience none the wiser: I gained very little knowledge of Swift, iOS development, or the project. Management was happy with the productivity boost.
I think it's fleeting, and I dread a time when most code is produced that way, with the humans accumulating very little institutional knowledge and not knowing enough to properly review things.
I'm just one data point. Me being unimpressed should not be used to judge their entire work. I feel like I have a pretty decent understanding of a few small corners of what they're doing, and find it a bad omen that they've brushed aside some of my concerns. But I'm definitely not knowledgeable enough about the rest of it all.
What concerns me, generally, is this: if the experts (and I do consider them experts) can use frontier AI to look very productive, but upon close inspection of something you (in this case I) happen to be knowledgeable about, the result is not that great (built on shaky foundations), what about all the vibe-coded stuff built by non-experts?
The actual project itself (a PXE server written in Go that works on macOS) - https://github.com/pxehost/pxehost - ChatGPT produced the working v1 of this in one message.
There was much tweaking, testing, and refactoring (often done manually) before releasing it.
Where AI helps is that it’s possible to try 10-20 different such prototypes per day.
The end result is: 1) much more handwritten code gets produced, because when I get a working prototype I usually want to go over every detail personally; 2) I can write code across much more diverse technologies; 3) the code is better, because each of its components is the best of many attempts, since attempts are so cheap.
I can give more examples if you like, but I hope that is what you are looking for.
I appreciate the effort, and that's a nice-looking project. That's similar to the gains I've gotten as well with greenfield projects (I use Codex too!). However, it's not as grandiose as the posts in the Canadian-girlfriend category.
I had some .csproj files that only worked with MSBuild/VSBuild that I wanted to make compatible with dotnet. Copilot does a pretty good job of updating these and identifying the ones more likely to break (say, web projects compared to plain DLLs). It isn't simple fire-and-forget, but it did make the conversion possible without me needing to do as much research into what was changing.
Is that a net benefit? Without AI, if I really wanted to do that conversion, I would have had to become much more familiar with the inner workings of csproj files. That is a benefit I've lost, but it also would have taken longer, so much longer that I might not have decided to do the conversion at all. My job doesn't really have a need for someone that deeply specialized in csproj, and it isn't a particular interest of mine, so letting AI handle it while being able to answer a few questions to sate my curiosity seemed a great compromise.
A second example: it works great as a better option than a rubber duck. I noticed some messy programming where, basically, OOP had been abandoned in favor of one massive class doing far too much work. I needed to break it down, and talking with the AI about it helped come up with some design patterns that worked well. The AI wasn't good enough to do the refactoring in one go, but it helped talk through the pros and cons of a few design patterns and was able to create test examples so I could get a feel for what it would look like when done. Also, when I finished, I had the AI review it, and it caught a few typos that weren't compile errors before I even got to the point of testing it.
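To give a rough idea of the kind of extraction we talked through (all names here are invented for illustration, not the real code): the branching moved out of the god class into small strategy objects that are trivial to unit test on their own.

    // Hypothetical sketch of the strategy-style extraction (not the real code).
    interface PricingStrategy {
      price(baseAmount: number): number;
    }

    class StandardPricing implements PricingStrategy {
      price(baseAmount: number): number {
        return baseAmount;
      }
    }

    class DiscountedPricing implements PricingStrategy {
      constructor(private readonly discount: number) {}
      price(baseAmount: number): number {
        return baseAmount * (1 - this.discount);
      }
    }

    // The former god class now only orchestrates; the branching lives in the
    // strategies, which can be tested in isolation.
    class OrderProcessor {
      constructor(private readonly pricing: PricingStrategy) {}
      total(items: Array<{ amount: number }>): number {
        return items.reduce((sum, i) => sum + this.pricing.price(i.amount), 0);
      }
    }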
None of these were things the AI could do on its own, and they definitely aren't areas where I would have just blindly trusted some vibe-coded output, but overall it was a productivity increase well worth the $20 or so cost.
(Now, one may argue that is the subsidized cost, and the unsubsidized cost would not have been worthwhile. To that, I can only say I'm not versed enough in the costs to be sure, but the argument does seem plausible.)
I was at a podiatrist yesterday who explained that what he's trying to do is to "train" an LLM agent on the articles and research papers he's published to create a chatbot that can provide answers to the most common questions more quickly than his reception team can.
He's also using it to speed up writing his reports to send to patients.
Longer term, he was also quite optimistic on its ability to cut out roles like radiologists, instead having a software program interpret the images and write a report to send to a consultant. Since the consultant already checks the report against any images, the AI being more sensitive to potential issues is a positive thing: giving him the power to discard erroneous results rather than potentially miss something more malign.
> Longer term, he was also quite optimistic on its ability to cut out roles like radiologists, instead having a software program interpret the images and write a report to send to a consultant.
As a medical imaging tech, I think this is a terrible idea. At least for the test I perform, a lot of redundancy and double-checking is necessary because results can easily be misleading without a diligent tech or critical-thinking on the part of the reading physician. For instance, imaging at slightly the wrong angle can make a normal image look like pathology, or vice versa.
Maybe other tests are simpler than mine, but I doubt it. If you've ever asked an AI a question about your field of expertise and been amazed at the nonsense it spouts, why would you trust it to read your medical tests?
> Since the consultant already checks the report against any images, the AI being more sensitive to potential issues is a positive thing: giving him the power to discard erroneous results rather than potentially miss something more malign.
Unless they had the exact same schooling as the radiologist, I wouldn't trust the consultant to interpret my test, even if paired with an AI. There's a reason this is a whole specialized field -- because it's not as simple as interpreting an EKG.
I work in insurance - regulated, human capital heavy, etc.
Three examples for you:
- our policy agent extracts all coverage limits and policy details into a data ontology. This saves 10-20 mins per policy, and it is more accurate and consistent than our humans (a rough sketch of what this looks like is below, after the list)
- our email drafting agent will pull all relevant context on an account whenever an email comes in. It will draft a reply or an email to someone else based on context and workflow. Over half of our emails are now sent without meaningfully modifying the draft, up from 20% two months ago. Hundreds of hours saved per week, now spent on more valuable work for clients.
- our certificates agent notes when a certificate of insurance is requested over email and automatically handles the necessary checks and follow-up options or resolution. It will likely save us around $500k this year.
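As promised above, here is a minimal sketch of roughly how the policy-extraction step works, assuming a generic callModel function and invented field names; the real pipeline has more validation and a human-review fallback.

    // Hypothetical sketch of the policy-extraction agent. callModel() stands in
    // for whatever LLM client is actually used; the field names are invented.
    interface PolicyExtraction {
      policyNumber: string;
      carrier: string;
      effectiveDate: string;          // ISO date
      generalLiabilityLimit: number;  // USD
      perOccurrenceLimit: number;     // USD
    }

    async function extractPolicy(
      policyText: string,
      callModel: (prompt: string) => Promise<string>,
    ): Promise<PolicyExtraction> {
      const prompt =
        "Extract these fields from the insurance policy below and reply with " +
        "JSON only: policyNumber, carrier, effectiveDate, " +
        "generalLiabilityLimit, perOccurrenceLimit.\n\n" + policyText;

      const parsed = JSON.parse(await callModel(prompt)) as PolicyExtraction;

      // Cheap sanity checks before anything lands in the ontology; failures
      // get routed to a human instead of silently accepted.
      if (!parsed.policyNumber || !(parsed.generalLiabilityLimit > 0)) {
        throw new Error("extraction failed validation, route to human review");
      }
      return parsed;
    }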
We also now increasingly share prototypes as a way to discuss ideas, because the cost to vibe-code something illustrative is very low, and it’s often much higher fidelity to have the conversation around something visual than around a written document.
Thanks for that. It's a really interesting data point. My takeaway, which I've already felt and which I suspect anyone dealing with insurance shares, is that the industry is wildly outdated. Which I guess offers a lot of low-hanging fruit where AI could be useful. Other than the email drafting, it really seems like all of that should have been handled by just normal software decades ago.
A big win for 'normal software' here would be to have authentication as a multi-party/agent approval process: have the client of the insurance company request the automated delivery of certified documents to some other company's email.
> "draft" clearly implies a human will will double-check.
The wording does imply this, but since the whole point was to free the human from reading all the details and relevant context about the case, how would this double-checking actually happen in reality?
> the whole point was to free the human from reading all the details and relevant context about the case
That's your assumption.
My read of that comment is that it's much easier to verify and approve (or modify) the message than it is to write it from scratch. The second sentence does confirm a person then modifies it in half the cases, so there is some manual work remaining.
The “double checking” is a step to make sure there’s someone low-level to blame. Everyone knows the “double-checking” in most of these systems will be cursory at best, for most double-checkers. It’s a miserable job to do much of, and with AI, it’s a lot of what a person would be doing. It’ll be half-assed. People will go batshit crazy otherwise.
On the off chance it’s not for that reason, productivity requirements will be increased until you must half-ass it.
The real question is how do you enforce that the human is reviewing and double-checking?
When the AI gets "good enough" and the review becomes largely rubber-stamping (and 50% is pretty close to that), you run the risk that a good percentage of the reviews are approved without real checks.
This is why nuclear operators and security scanning operators have regular "awareness checks". Is something like this also being done, and if so what is the failure rate of these checks?
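One way such a check could look here, sketched very roughly (the names and numbers are made up, not anything the parent described): seed a small fraction of drafts with a deliberate error and measure how often reviewers catch it.

    // Hypothetical sketch of a seeded-error audit: corrupt a small fraction of
    // AI drafts on purpose and measure how often reviewers flag them.
    interface Draft {
      id: string;
      body: string;
      seededError: boolean;
    }

    function maybeSeedError(draft: Draft, rate = 0.02): Draft {
      if (Math.random() >= rate) return draft;
      return {
        ...draft,
        // Plant an obviously wrong dollar figure ("$$" escapes a literal "$").
        body: draft.body.replace(/\$[\d,]+/, "$$1,000,000"),
        seededError: true,
      };
    }

    function reviewerCatchRate(
      audited: Array<{ seededError: boolean; flaggedByReviewer: boolean }>,
    ): number {
      const seeded = audited.filter((a) => a.seededError);
      if (seeded.length === 0) return 1; // nothing seeded yet, nothing to measure
      return seeded.filter((a) => a.flaggedByReviewer).length / seeded.length;
    }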
Years ago I worked at an insurance company where the whole job was doing this - essentially reading through long PDFs with mostly unrelated information and extracting 3-4 numbers of interest. It paid terribly, and few people who worked there cared about doing a good job. I’m sure mistakes were constantly being made.
I think we are at the stage of the "AI bubble" that is equivalent to saying: it is 1997, and 18% of U.S. households have internet access. Obviously the internet is not working out, or 90%+ of households would already have internet access if it was going to be as big a deal as some claim.
I work at a place that is doing nothing like this, and it seems obvious to me we are going to be put out of business in the long run. This is just adding a power law on top of a power law. Winner take all. What I currently do will be done by software engineers and agents in 10 years or less. Gemini is already much smarter than I am. I am going to end up at a factory or Walmart, if I can get in.
The "AI bubble" is a mass delusion of people in denial of this reality. There is no bubble. The market has just priced all this forward as it should. There is a domino effect of automation that hasn't happened yet because your company still has to interface with stupid companies like mine that are betting on the hand loom. Just have to wait for us to bleed out and then most people will never get hired for white collar work again.
It amuses me when someone asks who is going to want the factory jobs in the US if we reshore production. Me and all the other very average people who get displaced out of white-collar work and don't want to be homeless, that's who.
"More valuable" work is just 2026 managerial class speak for "place holder until the agent can take over the task".
That sounds a lot like "LLMs are finally powerful enough technology to overcome our paper/PDF-based business". Solving problems that frankly had no business existing in 2020.
Here’s some anecdata from the B2B SaaS company I work at:
- The product team is generating some code with LLMs, but everything has to go through human review and developers are expected to "know" what they committed - so it hasn't been a major time-saver, but we can spin up quicker and explore more edge cases before getting into the real work
- The marketing team is using LLMs to generate initial outlines and drafts - but even low-stakes/quick-turnaround content (like LinkedIn posts and paid ads) still needs to be reviewed for accuracy, brand voice, etc. Projects get started quicker but still go through various human reviews before customers/the public see them
- Similarly, the sales team can generate outreach messaging slightly faster, but they still have to review it for accuracy, targeting, personalization, etc. Meeting/call summaries are pretty much 'magic' and accurate enough when you need to analyze transcripts. You can still fall back on the actual recording for clarification.
- We're able to spin up demos much faster with 'synthetic' content/sites/visuals that are good-enough for a sales call but would never hold up in production
---
All that being said, the value seems to be in speeding up discovery of the actual work, but someone still needs to actually do the work. We have customers, we've built a brand, and we're subject to SLAs and other regulatory frameworks, so we can't just let some automated workflow do whatever it wants without a ton of guardrails. We're seeing similar feedback from our customers regarding the LLM features (RAG) we've added to the product, if that helps.
Lately, it seems like all the blogs have shifted away from talking about productivity and are now talking about how much they "enjoy" working with LLMs.
If firing up old coal plants and skyrocketing RAM prices and $5000 consumer GPUs and violating millions of developers' copyrights and occasionally coaxing someone into killing themselves is the cost of Brian From Middle Management getting to Enjoy Programming Again instead of having to blame his kids for not having any time on the weekends, I guess we have no choice but to oblige him his little treat.
This kind of take I find genuinely baffling. I can't see how anybody working with current frontier models isn't finding them a massive performance boost. No they can't replace a competent developer yet, but they can easily at least double your productivity.
Careful code review and a good pull request flow are important, just as they were before LLMs.
People thought they were doubling their productivity and then real, actual studies showed they were actually slower. These types of claims have to be taken with entire quarries of salt at this point.
No, I wouldn't say it's super complex. I make custom 3D engines. It's just that you and I were probably never in any real competition anyway, because it's not super common to do what I do.
I will add that LLMs are very mediocre, bordering on bad, at any challenging or interesting 3D engine stuff. They're pretty decent at answering questions about surface API stuff (though, inexplicably, they're really shit at OpenGL which is odd because it has way more code out there written in it than any other API) and a bit about the APIs' structure, though.
I really don't know how effective LLMs are at that but also that puts you in an extremely narrow niche of development, so you should keep that in mind when making much more general claims about how useful they are.
My bigger point was that not everyone who is skeptical about supposed productivity gains and their veracity is in competition with you. I think any inference you made beyond that is a mistake on your part.
(I did do web development and distributed systems for quite some time, though, and I suspect while LLMs are probably good at tutorial-level stuff for those areas it falls apart quite fast once you leave the kiddy pool.)
P.S.:
I think it's very ironic that you say one should be careful not to speak in general terms about things that depend heavily on context, when you clearly seem to believe that all developers must see the same kind of (perceived) productivity gains you have.
You discount the value of being intimately familiar with each line of code, and with the design decisions and tradeoffs, that comes from having written the bloody thing yourself.
It is negative value for me to have a mediocre machine do that job for me: I will still have to maintain the result, yet I will have learned absolutely nothing from the experience of building it.
This to me seems like saying you can learn nothing from a book unless you yourself wrote it. You can read the code the LLM writes the same way you can read the code your colleagues write. Moreover, you have to tell it pretty explicitly what to write for it to be very useful. You're still designing what it's doing; you just don't have to write every line.
"Reading is the creative center of a writer’s life.” — Stephen King, On Writing
You need to design the code in order to tell the LLM how to write it. The LLM can help with this, but generally it's better to have a full plan in place to give it beforehand. I've said it before elsewhere, but I think this argument will eventually sound like the people arguing you don't truly know how to code unless you're writing assembly for everything. I mean, sure, hand-written assembly can be more efficient, but who has the time to bother in a post-compiler world?
Good point! You should generate a website for them with "why AI is not good" articles. Have it explore all possible angles. Make it a detective-style story with appealing characters.
I would also take those studies with a grain of salt at this point, or at least take into consideration that a model from even a few months ago might produce significantly different results than the current frontier models would.
And in my personal experience it definitely helps in some tasks, and as someone who doesn't actually enjoy the actual coding part that much, it also adds some joy to the job.
Recently I've also been using it to write design docs, which is another aspect of the job that I somewhat dreaded.
I think the bigger takeaway from those studies was that they were a clear sign that whatever productivity coefficient people were imagining back then was a figment of their imagination, so it's useful to take that lesson forward. If people are saying they're 2x as productive with LLMs, it's still likely that a large part of that is hyperbole, whatever model they're working with.
It's the psychology of it that's important, not the tool itself; people are very bad at understanding where they're spending their time and cannot accurately assess the rate at which they work because of it.
I like coming up with the system design and the low level pseudo code, but actually translating it to the specific programming language and remembering the exact syntax or whatnot I find pretty uninspiring.
Same with design docs, more or less: translating my thoughts into proper, professional English adds a layer I don't really enjoy (since I'm not exactly great at it), as does stuff like formatting, generating a nice-looking diagram, etc.
Just today I wrote a pretty decent design doc that took me two hours instead of the usual week+ slog/procrastination, and it was actually fairly enjoyable.
Churning out 2x as much code is not doubling productivity. Can you perform at the same level as a dev who is considered 2x as productive as you? That's the real metric: the quality-to-quantity ratio of your code, the bugs caused by your PRs, your actual understanding of the code in your PRs, the ability to think slow, the ability to deal with fires, the ability to quickly deal with breaking changes accidentally caused by your changes.
Churning out more code per day is not the goal. There is no point merging code that doesn't fully work, isn't properly tested, or that other humans (or you) cannot understand.
Why is that the real metric? If you can turn a 1x dev into a 2x dev that's a huge deal, especially if you can also turn the original 2x dev into a 4x dev.
And far from "churning out code" my work is better with LLMs. Better tested, better documented, and better organized because now I can do refactors that just would have taken too much time before. And more performant too because I can explore more optimization paths than I had time to before.
Refusing to use LLMs now is like refusing to use compilers 20 years ago. It might be justified in some specific cases but it's a bad default stance.
The answer to "Can you perform at the same level as a dev who is considered 2x as productive as you?" is self-explanatory. If your answer is negative, you are not 2x as productive
Seriously, I’m lucky if 10% of what I do in a week is writing code. I’m doubly lucky if, when I do, it doesn’t involve touching awful corporate horse-shit like low-code products that are allergic to LLM aid, plus multiple git repos, plus knowledge scattered across a bunch of “cloud” dashboards and SaaS product configs. By the time I’ve prompted all that external crap in, I could have just written what I wanted to write.