
    Context Engineering for Agents

    (rlancemartin.github.io)
    114 points 0x79de | 18 comments
    ares623 ◴[] No.44461351[source]
    Another article handwaving or underselling the effects of hallucination. I can't help but draw parallels to layer 2 attempts from crypto.
    replies(1): >>44462031 #
    1. FiniteIntegral ◴[] No.44462031[source]
    Apple released a paper showing the diminishing returns of "deep learning," specifically when it comes to math. For example, the models have a hard time solving the Tower of Hanoi problem past 6-7 discs, and that's without even imposing the restriction of optimal solutions. The agents they tested would hallucinate steps and couldn't follow simple instructions.
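
For reference (my own sketch, not from the paper or the article): the recursive solution is trivial to state, but the optimal move count grows as 2^n - 1, so executing it move-by-move gives a model hundreds or thousands of chances to slip.

```python
def hanoi(n: int, src: str = "A", aux: str = "B", dst: str = "C") -> list[tuple[str, str]]:
    """Return the optimal move list for n discs from src to dst: 2**n - 1 moves."""
    if n == 0:
        return []
    # Move n-1 discs out of the way, move the largest, then restack.
    return (hanoi(n - 1, src, dst, aux)
            + [(src, dst)]
            + hanoi(n - 1, aux, src, dst))

# 7 discs already requires 127 moves; 10 discs requires 1023.
```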

    On top of that -- rebranding "prompt engineering" as "context engineering" and pretending it's anything different is ignorant at best and destructively dumb at worst.

    replies(7): >>44462128 #>>44462410 #>>44462950 #>>44464219 #>>44464240 #>>44464924 #>>44465232 #
    2. hnlmorg ◴[] No.44462128[source]
    Context engineering isn’t a rebranding. It’s a widening of scope.

    Just as all squares are rectangles but not all rectangles are squares: prompt engineering is context engineering, but context engineering also includes other optimisations that are not prompt engineering.

    That all said, I don’t disagree with your overall point about the state of AI. The industry is so full of smoke and mirrors these days that it’s really hard to separate the genuinely novel uses of “AI” from the bullshit.

    replies(1): >>44463531 #
    3. senko ◴[] No.44462410[source]
    That's one reading of that paper.

    The other is that they intentionally forced LLMs to do things we know they're bad at (following algorithms, tasks that require more context than is available, etc.) without allowing them to solve the problem the way they're optimized to (writing code that implements the algorithm).

    A cynical read is that the paper is the only AI achievement Apple has managed to do in the past few years.

    (There is another: they managed not to lose MLX people to Meta)

    4. OJFord ◴[] No.44462950[source]
    Let's just call all aspects of LLM usage 'x-engineering' to professionalise it, even while we're barely starting to figure it out.
    replies(1): >>44463885 #
    5. bsenftner ◴[] No.44463531[source]
    Context engineering is the continual struggle of software engineers to explain themselves, in an industry composed of weak communicators that interrupt to argue before statements are complete, do not listen because they want to speak, and speak over one another. "How to use LLMs" is going to be argued forever simply because those arguing are simultaneously not listening.
    replies(1): >>44463815 #
    6. hnlmorg ◴[] No.44463815{3}[source]
    I really don’t think that’s a charitable interpretation.

    One thing I’ve noticed about this AI bubble is just how much people are sharing and comparing notes. So I don’t think the issue is people being too arrogant (or whatever label you’d prefer) to agree on a way to use them.

    From what I’ve seen, the problem is more technical in nature. People have built this insanely advanced thing (LLMs) and are now trying to hammer a square peg into a round hole.

    The problem is that LLMs are an incredibly big breakthrough, but they’re still incredibly dumb technology in most ways. So 99% of the applications that people use it for are just a layering of hacks.

    With an API, there’s generally only one way to call it. With a stick of RAM, there’s generally only one way to use it. But to make RAM and APIs useful, you need to call upon a whole plethora of other technologies too. With LLMs, it’s just hacks on top of hacks. And because it seemingly works, people move on before they question whether the hack will still work in a month’s time. Or a year’s. Or a decade. Because who cares, when the technology will already be old next week anyway.

    replies(1): >>44463968 #
    7. antonvs ◴[] No.44463885[source]
    It’s fitting, since the industry is largely driven by hype engineering.
    replies(1): >>44465329 #
    8. bsenftner ◴[] No.44463968{4}[source]
    It's not a charitable opinion, and it's not people being arrogant either. It's that the software industry's members were never taught how to communicate effectively, and because of that, their attempts to explain create arguments and confusion. We have people making declarations with very little acknowledgement of prior declarations.

    LLMs are extremely subtle; they are intellectual chameleons, which is enough to break many a person's brain. They respond in a reflection of how they were prompted, which is so subtle it is lost on the majority. The key is to approach them as statistical language constructs that use mirroring behavior to generate their replies.

    I am very successful with them, yet my techniques seem to trigger endless debate. I treat LLMs as method actors and they respond in character and with their expected skills and knowledge. Yet when I describe how I do this, I get unwanted emotional debate, as if I'm somehow insulting others through my methods.

    replies(2): >>44464189 #>>44464306 #
    9. swader999 ◴[] No.44464189{5}[source]
    That's an interesting and unique perspective. I'd like to hear more.
    10. koakuma-chan ◴[] No.44464219[source]
    > On top of that -- rebranding "prompt engineering" as "context engineering" and pretending it's anything different is ignorant at best and destructively dumb at worst.

    It is different. There are usually two main parts to the prompt:

    1. The context.

    2. The instructions.

    The context part has to be optimized to be as small as possible, while still including all the necessary information. It can also be compressed via, e.g., LLMLingua.

    On the other hand, the instructions part must be optimized to be as detailed as possible, because otherwise the LLM will fill the gaps with possibly undesirable assumptions.

    So "context engineering" refers to engineering the context part of the prompt, while "prompt engineering" could refer to either engineering of the whole prompt, or engineering of the instructions part of the prompt.
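
Concretely, that split might look something like this (a hedged sketch of my own; the section labels and the crude character-budget trimming are stand-ins, not anything standardized, and the trimming is a toy substitute for real compression such as LLMLingua):

```python
def build_prompt(context_chunks: list[str], instructions: str,
                 max_context_chars: int = 4000) -> str:
    """Assemble a prompt with a size-budgeted context section and a
    verbose instructions section, per the split described above."""
    # Keep only as much context as fits the budget, in priority order.
    kept, used = [], 0
    for chunk in context_chunks:
        if used + len(chunk) > max_context_chars:
            break
        kept.append(chunk)
        used += len(chunk)
    return ("## Context\n" + "\n".join(kept)
            + "\n\n## Instructions\n" + instructions)
```

The asymmetry is the point: the context section gets trimmed toward the minimum, while the instructions section is left as detailed as the author cares to make it.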

    replies(1): >>44464288 #
    11. sitkack ◴[] No.44464240[source]
    At this point all of Apple's AI take-down papers have serious flaws. This one has been beaten to death. Finding citations is left to the reader.
    12. 0x445442 ◴[] No.44464288[source]
    I'm getting on in years so I'm becoming progressively more ignorant on technical matters. But with respect to something like software development, what you've described sounds a lot like creating a detailed design or even pseudocode. Now, I've never found typing to be the bottleneck in software development, even before modern IDEs, so I'm struggling to see where all the lift is meant to be with this tech.
    replies(1): >>44464417 #
    13. janto ◴[] No.44464306{5}[source]
    Ouija boards with statistical machinery :)
    14. koakuma-chan ◴[] No.44464417{3}[source]
    > But with respect to something like software development, what you've described sounds a lot like creating a detailed design or even pseudocode.

    What I described not only applies to using AI for coding, but to most of the other use cases as well.

    > Now I've never found typing to be the bottleneck in software development, even before modern IDEs, so I'm struggling to see where all the lift is meant to be with this tech.

    There are many ways to use AI for coding. You could use something like Claude Code for more granular updates, or just copy and paste your entire code base into, e.g., Gemini, and have it oneshot a new feature (though I like to prompt it to make a checklist, and generate step by step).

    And it's not only about typing; it's also about debugging, refactoring, figuring out how a certain thing works, etc. Nowadays I not only barely write any code by hand, but I also offload most of the debugging and other miscellaneous tasks to LLMs. They are simply much faster and more convenient at connecting all the dots, making sure nothing is missed, etc.
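
The "make a checklist, then generate step by step" pattern mentioned above could be sketched like this (purely illustrative; the function names and prompt wording are mine, and `call_llm` would be whatever chat-completion API you use):

```python
def build_checklist_prompt(feature: str, codebase: str) -> str:
    """Phase 1: ask the model to plan before writing any code."""
    return (f"{codebase}\n\n"
            f"Before implementing '{feature}', produce a numbered checklist "
            "of the changes required. Do not write any code yet.")

def build_step_prompt(checklist: str, step_no: int) -> str:
    """Phase 2: ask for one checklist item at a time."""
    return (f"Here is the agreed checklist:\n{checklist}\n\n"
            f"Now implement step {step_no} only, then stop.")
```

Splitting planning from generation keeps each response small and reviewable, instead of betting everything on a single one-shot answer.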

    15. vidarh ◴[] No.44464924[source]
    The paper in question is atrocious.

    If you assume any per-step error rate of consequence (and you will get one, especially if temperature isn't zero), failures at larger disc counts become inevitable, and you'd start to hit context limits too.

    Ask a human to repeatedly execute the Tower of Hanoi algorithm for a similar number of steps and see how many do so flawlessly.

    They didn't measure "the diminishing returns of 'deep learning'"; they measured the limitations of asking a model to act as a dumb interpreter, repeatedly, with a parameter set that would ensure errors over time.
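
Back-of-the-envelope (my numbers, purely illustrative): even a 99.9% per-move success rate collapses fast, because the optimal solution takes 2^n - 1 moves.

```python
def flawless_prob(n_discs: int, p: float = 0.999) -> float:
    """Probability of a flawless run if each of the 2**n - 1 moves
    independently succeeds with probability p (a toy assumption)."""
    moves = 2 ** n_discs - 1
    return p ** moves

# At 7 discs (127 moves) a flawless run is still likely; at 10 discs
# (1023 moves) it's a coin flip at best; at 15 discs (32767 moves)
# it's essentially impossible.
```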

    That a paper this poor got released at all is shocking.

    16. skeeter2020 ◴[] No.44465232[source]
    We used to call both of these "being good with the Google". Equating it to engineering is both hilarious and insulting.
    replies(1): >>44466397 #
    17. klabb3 ◴[] No.44465329{3}[source]
    The dilution of the term isn’t good for engineering, and we don’t really have many backup terms to switch to.

    Maybe we should look to science and start using the term ”pseudo-engineering” to dismiss the frivolous usages. I don’t really like that though, since ”pseudoscience” has an invalidating connotation, whereas e.g. prompt engineering is not a lesser or invalid form of engineering; it’s simply not engineering at all, no more or less ”valid”. It’s like calling yourself a ”canine engineer” when teaching your dog to do tricks.

    18. triyambakam ◴[] No.44466397[source]
    It is a stretch, but not semantically wrong. Strictly, engineering is the practical application of science; we could say that studying how a model behaves under use is itself a science, and that applying that science is therefore engineering.