But it still feels more like a small incremental improvement than a radical change, and I still feel its limitations constantly.
Like... it gives me the sort of decent-but-uninspired solution I'd expect it to generate, without predictably walking me through a bunch of obvious wrong turns that I have to correct one by one, the way I would have had to with earlier models.
And that's certainly not nothing and makes the experience of using it much nicer, but I'm still going to roll my eyes anytime someone suggests that LLMs are the clear path to imminently available AGI.
It was reverse engineering ~550MB of Hermes bytecode from a React Native app, with each function split into a separate file for grep-ability and LLM compatibility.
The others would all start off right, then quickly default to just grepping randomly for what they expected it to be, which failed quickly. 2.5 traced the function all the way back to the networking call and provided the expected response payload.
All the others hallucinated the networking response I was trying to figure out. 2.5 provided it exactly, enough for me to intercept the request and use the response it provided to get what I wanted to show up.
Even Sonnet 3.7 was able to do refactoring work on my codebase that Sonnet 3.6 could not.
Really not seeing the "LLMs not improving" story.
awk '/^=> \[Function #/ {       # a header line marks the start of a new function
  if (out) close(out);          # close the previous function file, if any
  fn = $0; sub(/^.*#/, "", fn); sub(/ .*/, "", fn);  # extract the function number
  out = "function_" fn ".txt"   # open a new per-function output file
}
{ if (out) print > out }' bundle.hasm  # copy every line into the current file
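The split can be exercised end to end on a toy input. The two-function bundle.hasm below is a made-up stand-in (the `=> [Function #N ...]` header format and the opcode names are assumptions for illustration, not the disassembler's exact output):

```shell
# Tiny stand-in for the disassembled bundle (format assumed).
cat > bundle.hasm <<'EOF'
=> [Function #1 of 2]
LoadConstString r0, "hello"
Ret r0
=> [Function #2 of 2]
LoadConstUndefined r0
Ret r0
EOF

# Same splitting logic: one file per function, named by function number.
awk '/^=> \[Function #/ {
  if (out) close(out);
  fn = $0; sub(/^.*#/, "", fn); sub(/ .*/, "", fn);
  out = "function_" fn ".txt"
}
{ if (out) print > out }' bundle.hasm

# Grep-ability: find which function touches a given string or opcode.
grep -l LoadConstString function_*.txt   # → function_1.txt
```

Each output file keeps its header line, so a grep hit can be traced straight back to the function number in the original bundle.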
Quick example of the output it gave and its process.
I'm wondering how much Gemini 2.5 being "amazing" comes from Sonnet 3.7 being such a disappointment.