On top of that -- rebranding "prompt engineering" as "context engineering" and pretending it's anything different is ignorant at best and destructively dumb at worst.
Like how all squares are rectangles but not all rectangles are squares: prompt engineering is context engineering, but context engineering also includes other optimisations that are not prompt engineering.
That all said, I don’t disagree with your overall point about the state of AI. The industry is so full of smoke and mirrors these days that it’s really hard to separate the actual novel uses of “AI” from the bullshit.
The other is that they intentionally forced LLMs to do things we know they're bad at (following algorithms step by step, tasks that require more context than is available, etc.) without allowing them to solve the problem in the way they're optimized for (writing code that implements the algorithm).
A cynical read is that the paper is the only AI achievement Apple has managed in the past few years.
(There is another: they managed not to lose MLX people to Meta)
One thing I’ve noticed about this AI bubble is just how much people are sharing and comparing notes. So I don’t think the issue is people being too arrogant (or whatever label you’d prefer to use) to agree on a way to use them.
From what I’ve seen, the problem is more technical in nature. People have built this insanely advanced thing (LLMs) and are now trying to hammer this square peg into a round hole.
The problem is that LLMs are an incredibly big breakthrough, but they’re still incredibly dumb technology in most ways. So 99% of the applications that people use them for are just a layering of hacks.
With an API, there’s generally only one way to call it. With a stick of RAM, there’s generally only one way to use it. But to make RAM and APIs useful, you need to call upon a whole plethora of other technologies too. With LLMs, it’s just hacks on top of hacks. And because it seemingly works, people move on before they question whether a hack will still work in a month’s time. Or a year’s time. Or a decade later. Because who cares, when the technology will already be old next week anyway.
LLMs are extremely subtle; they are intellectual chameleons, which is enough to break many a person's brain. They respond by mirroring how they were prompted, which is subtle enough to be lost on the majority. The key is approaching them as statistical language constructs that use mirroring as the mechanism to generate their replies.
I am very successful with them, yet my techniques seem to trigger endless debate. I treat LLMs as method actors and they respond in character and with their expected skills and knowledge. Yet when I describe how I do this, I get unwanted emotional debate, as if I'm somehow insulting others through my methods.
It is different. There are usually two main parts to the prompt:
1. The context.
2. The instructions.
The context part has to be optimized to be as small as possible, while still including all the necessary information. It can also be compressed via, e.g., LLMLingua.
On the other hand, the instructions part must be optimized to be as detailed as possible, because otherwise the LLM will fill the gaps with possibly undesirable assumptions.
So "context engineering" refers to engineering the context part of the prompt, while "prompt engineering" could refer to either engineering of the whole prompt, or engineering of the instructions part of the prompt.
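The split above can be sketched in code. Everything here is illustrative: `compress_context` is a stand-in for a real compressor such as LLMLingua (whose actual API I'm not invoking), and the section headers and task details are made up for the example.

```python
# Sketch of the two-part prompt: minimal context + maximally detailed
# instructions. All names and formats here are hypothetical.

def compress_context(context: str, max_chars: int = 500) -> str:
    """Placeholder compressor: naive truncation to max_chars.
    A real pipeline would use semantic compression (e.g. LLMLingua)."""
    return context[:max_chars]

def build_prompt(context: str, instructions: str) -> str:
    # Context part: as small as possible while still complete.
    small_context = compress_context(context)
    # Instructions part: as detailed as possible, so the model has
    # no gaps to fill with its own assumptions.
    return (
        "## Context\n" + small_context + "\n\n"
        "## Instructions\n" + instructions
    )

prompt = build_prompt(
    context="(project README, relevant source files, API docs...)",
    instructions=(
        "1. Add a `--dry-run` flag to the CLI.\n"
        "2. Do not modify existing tests.\n"
        "3. Output a unified diff only."
    ),
)
print(prompt)
```

Under this framing, "context engineering" is work on the first argument of `build_prompt`, while "prompt engineering" may mean work on either the second argument or the whole assembled string.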
What I described not only applies to using AI for coding, but to most of the other use cases as well.
> Now I've never found typing to be the bottle neck in software development, even before modern IDEs, so I'm struggling to see where all the lift is meant to be with this tech.
There are many ways to use AI for coding. You could use something like Claude Code for more granular updates, or just copy and paste your entire code base into, e.g., Gemini, and have it one-shot a new feature (though I like to prompt it to make a checklist first, then generate step by step).
And it is not only about typing: it is also about debugging, refactoring, figuring out how a certain thing works, etc. Nowadays I barely write any code by hand, and I also offload most of the debugging and other miscellaneous tasks to LLMs. They are simply much faster and more convenient at connecting all the dots, making sure nothing is missed, etc.
If you assume any error rate of consequence (and you will get one, especially if temperature isn't zero), the mistakes compound over thousands of moves, and at larger disk counts you'd start to hit context limits too.
Ask a human to repeatedly execute the Tower of Hanoi algorithm for a similar number of steps and see how many will do so flawlessly.
They didn't measure "the diminishing returns of 'deep learning'" -- they measured the limitations of asking a model to act as a dumb interpreter, repeatedly, with a parameter set that would ensure errors over time.
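To put rough numbers on the compounding-error point, here is a quick back-of-the-envelope calculation. The 0.1% per-move error rate is an illustrative assumption, not a figure from the paper; only the move count (2^n - 1 for the optimal n-disk solution) is a known fact.

```python
# The optimal Tower of Hanoi solution for n disks takes 2**n - 1 moves.
# Assuming an independent per-move error probability, the chance of a
# flawless run shrinks geometrically with the move count.

def p_flawless(n_disks: int, per_move_error: float) -> float:
    moves = 2 ** n_disks - 1
    return (1 - per_move_error) ** moves

# With a hypothetical 0.1% per-move error rate:
for n in (5, 10, 15):
    moves = 2 ** n - 1
    print(f"{n} disks, {moves} moves: P(flawless) = {p_flawless(n, 0.001):.4g}")
```

Even at that tiny error rate, the success probability collapses somewhere between 10 and 15 disks, which is exactly the regime where step-counting benchmarks report "failure".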
That a paper this poor got released at all was shocking.
Maybe we should look to science and start using the term pseudo-engineering to dismiss the frivolous terms. I don’t really like that though, since pseudoscience has an invalidating connotation, whereas, e.g., prompt engineering is not a lesser or invalid form of engineering - it’s simply not engineering at all, and no more or less “valid”. It’s like calling yourself a “canine engineer” when teaching your dog to do tricks.