Green trees beside a body of water from above. Photo by Maurits Bausenhart on Unsplash

Large language models (LLMs), especially the ones marketed as Artificial Intelligence, are all the rage but still searching for a niche where the tools actually make sense. It seems that code generation isn’t actually a good fit for them.

In doesn’t equal out

LLMs are non-deterministic - you don’t always get the same output from the same prompt. This isn’t great for creating code, since you may have to keep rolling the dice to get something that works. An area where non-deterministic output makes more sense is output that is going to be interpreted by a person. For instance, when summarising an article or working out sentiment, it doesn’t matter if you get slightly different answers as long as the user can understand what is meant. “I’m unhappy with the service you’re providing” could be summarised as “upset” or “unhappy” or “angry” - the word itself isn’t necessarily the most meaningful to a user as long as they understand that the sentiment is negative.
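To make that concrete, here’s a minimal Python sketch of temperature sampling, the mechanism behind the non-determinism - the vocabulary and scores are invented for illustration, not taken from any real model:

```python
import math
import random

# Toy next-token scores for summarising a complaint
# (the vocabulary and logits are made up for illustration)
logits = {"upset": 2.1, "unhappy": 2.0, "angry": 1.8, "fine": 0.2}

def sample_next_token(logits, temperature=0.8):
    """Sample one token from a temperature-scaled softmax distribution."""
    scaled = {tok: math.exp(score / temperature) for tok, score in logits.items()}
    total = sum(scaled.values())
    weights = [v / total for v in scaled.values()]
    return random.choices(list(scaled), weights=weights)[0]

# Same scores, repeated runs - the sampled token varies
for _ in range(5):
    print(sample_next_token(logits))
```

Any temperature above zero means repeated runs can pick different tokens. For a sentiment label that’s harmless; for generated code it means the same prompt can produce structurally different programs.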

This is one use case where LLMs are pretty useful in my day job: condensing transcripts into a short summary so call centre agents can get the gist of a customer contact. The summary doesn’t need to be deterministic, since the agents will parse and understand the snippet.

Non-deterministic code generation

The non-deterministic characteristic of LLMs doesn’t seem like a great fit for code generation, and instead feels more like rolling dice to see if you get the outcome you wanted. If you had some code created by an LLM but it wasn’t quite right, re-rolling the dice with a new prompt means you could get a completely unexpected outcome, rather than just an improvement to the section which needed changing. It also means you may end up with multiple contrasting code styles, making the code more complicated to understand later on. The promised LLM time-saving benefit doesn’t really pan out if you need to keep re-rolling the dice until you get something close to what you were aiming for.

Costs

Not being able to consistently generate the desired code also means increasing the cost of code creation. Right now, while companies are competing to be the market leader, tokens are cheap enough, but the venture capital (VC) backers will eventually want to make back some of the vast amounts of money being spent. Looking at Uber as an example, once it was the dominant choice in the lift-sharing space, its prices increased as VCs stopped subsidising the growth phase of the app and instead started recouping the outlay. Re-rolling your code generation to iron out bugs once LLM companies enter the levelling-out phase may end up being a very expensive pattern - you’re not saving money if each roll of the dice costs £100.
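To see how the pattern compounds, here’s a back-of-the-envelope sketch - every number in it (price, token count, attempts) is an assumption for illustration, not real pricing:

```python
# Hypothetical figures for illustration - real pricing varies by provider
price_per_1k_tokens = 0.06   # £ per 1,000 tokens (assumed)
tokens_per_attempt = 4_000   # prompt + generated code per attempt (assumed)
attempts = 5                 # re-rolls until the output is acceptable (assumed)

cost_per_attempt = tokens_per_attempt / 1_000 * price_per_1k_tokens
total = cost_per_attempt * attempts

print(f"Per attempt: £{cost_per_attempt:.2f}")   # £0.24
print(f"For {attempts} attempts: £{total:.2f}")  # £1.20
```

Cheap today - but multiply that total by a post-subsidy price rise and by every developer on a team re-rolling several times a day, and the dice get expensive.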

Evaluating output

LLM code generation is often quoted as saving time (even though this seems to be only a perceived gain - studies show that involving LLMs in coding tasks actually increases the amount of time taken), but the writing-code part of software development is not really the part which needs speeding up: the thinking and planning phase before writing code is the difficult and time-consuming part. If you’ve already done all the thinking and planning before coding - i.e. you’re out of the prototype/experimentation phase - then using an LLM to generate code which also needs to be tweaked for correctness isn’t going to save you much time.

Reviewing the output is also not straightforward - you have to know a lot about what you’re trying to do to be able to assess whether the generated code is correct or not. There’s no point in being able to generate a lot of code in a short amount of time if it contains a large number of bugs, or worse, doesn’t really do what you wanted in the first place. Having to do a lot of thinking both before and after generating code feels like outsourcing, and paying for, the easiest part of software development.
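As a purely hypothetical illustration of why review needs that knowledge, here’s the kind of plausible-looking snippet an LLM might produce - it reads fine at a glance but contains a subtle bug:

```python
def average_response_time(times_ms):
    """Mean response time in ms, ignoring missing readings (None)."""
    total = 0
    for t in times_ms:
        if t is not None:
            total += t
    # Bug: divides by the full list length, so None entries
    # silently drag the average down instead of being excluded.
    return total / len(times_ms)

print(average_response_time([100, 200, None, 300]))  # 150.0 - should be 200.0
```

Spotting that requires already knowing exactly what the function should do - which is the thinking the LLM was supposed to save you from.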

When are LLMs a good fit?

So, with these characteristics, what are good use cases for LLMs? One area could be a quick prototype project on a well-trodden path that will be thrown away - working in a known area means there will be plenty of quality training data, and you can easily review the output. The throwaway nature of prototypes means that the quirks of LLM code, like inconsistent code styles or the odd bug, won’t end up causing stability issues in production.

Another area which could be a good fit is when results can be interpreted - as mentioned earlier, when a person is going to evaluate the output. This means the output from LLMs doesn’t need to be exact, which actually fits their characteristics fairly well.

…and the elephant in the room

The issues with code generation aside, the massive issue with LLMs is that they don’t credit the data they’ve been trained on, even copyrighted work which hasn’t been explicitly approved for use, and that they’re catastrophic environmentally. 2025 was the second-hottest year on record, in part driven by the massive energy demand of LLM companies. While it’s still difficult to put figures on the exact cost of LLMs, the indirect impact can be measured. For instance, OpenAI has announced plans to reach 30 gigawatts of data centre infrastructure - the equivalent of the electricity consumption of the entire country of Spain, 21st in the list of electricity consumption by country. This increase in energy demand, mostly powered by existing fossil fuel usage, completely negates any gains from recent increases in electric vehicle (EV) usage.
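For a sense of scale (a rough calculation, assuming the 30 gigawatts runs continuously and taking Spain’s annual electricity consumption as roughly 250 TWh - both figures are approximations):

```python
# Rough comparison - both figures are approximations
capacity_gw = 30                  # OpenAI's announced data centre target
hours_per_year = 24 * 365

annual_twh = capacity_gw * hours_per_year / 1_000   # GWh -> TWh
spain_twh = 250                   # Spain's approximate annual consumption (assumed)

print(f"30 GW running continuously: ~{annual_twh:.0f} TWh/year")   # ~263 TWh
print(f"That's ~{annual_twh / spain_twh:.1f}x Spain's annual electricity use")
```

Even with generous assumptions, the announced build-out alone is on the scale of a mid-sized country’s entire grid.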