Context Engineering for Agents

(rlancemartin.github.io)

114 points 0x79de | 1 comments | 01 Jul 25 10:56 UTC | HN request time: 0.237s | source

Show context

ares623 ◴[04 Jul 25 05:22 UTC] No.44461351[source]▶

Another article handwaving or underselling the effects of hallucination. I can't help but draw parallels to layer 2 attempts from crypto.

replies(1): >>44462031 #

FiniteIntegral ◴[04 Jul 25 07:27 UTC] No.44462031[source]▶

>>44461351 #

Apple released a paper showing the diminishing returns of "deep learning" specifically when it comes to math. For example, it has a hard time solving the Tower of Hanoi problem past 6-7 discs, and that's not even giving it the restriction of optimal solutions. The agents they tested would hallucinate steps and couldn't follow simple instructions.

On top of that -- rebranding "prompt engineering" as "context engineering" and pretending it's anything different is ignorant at best and destructively dumb at worst.

replies(7): >>44462128 #>>44462410 #>>44462950 #>>44464219 #>>44464240 #>>44464924 #>>44465232 #

1. vidarh ◴[04 Jul 25 14:34 UTC] No.44464924[source]▶

>>44462031 #

The paper in question is atrocious.

If you assume any kind of error rate of consequence, and you will get that, especially if temperature isn't zero, and at larger disk sizes you'd start to hit context limits too.

Ask a human to repeatedly execute the Tower of Hanoi algorithm for similar number of steps and see how many will do so flawlessly.

They didn't measure "the diminishing returns of 'deep learning'"- they measured limitations of asking a model to act as a dumb interpreter repeatedly with a parameter set that'd ensure errors over time.

For a paper that poor to get released at all was shocking.

↑