
QueensGambit
Hi HN, OP here. I'd appreciate feedback from folks with deep model knowledge on a few technical claims in the essay. I want to make sure I'm getting the fundamentals right.

1. On o1's arithmetic handling: I claim that when o1 multiplies large numbers, it generates Python code rather than calculating the product internally (see the sketch after this list). I don't have full transparency into o1's internals. Is this accurate?

2. On model stagnation: I argue that fundamental model capabilities (especially code generation) have plateaued, and that tool orchestration is masking this. Do folks with hands-on experience building/evaluating models agree?

3. On alternative architectures: I suggest graph transformers that preserve semantic meaning at the word level as one possible path forward. For those working on novel architectures: what approaches look promising? Are graph-based architectures, sparse attention, or hybrid systems actually being pursued seriously in research labs?
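
For point 1, here is the pattern I have in mind, as a minimal, self-contained sketch. No real model or API is called: generated_code is a hard-coded stand-in for the kind of snippet I believe o1 emits for a code-execution tool, and run_tool stands in for that tool.

    import io, contextlib

    # Hypothetical stand-in for what the model emits: a Python snippet for a
    # code-execution tool, instead of the product written out digit by digit.
    generated_code = "print(987654321987654321 * 123456789123456789)"

    def run_tool(code: str) -> str:
        """Execute the generated snippet and capture stdout (no sandboxing; demo only)."""
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(code, {})
        return buf.getvalue().strip()

    print("tool result :", run_tool(generated_code))
    print("ground truth:", 987654321987654321 * 123456789123456789)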

Would love to know your thoughts!

XenophileJKO
Point 2 is 1000% not true. The models have gotten better at the overall act of coding, and they have also gotten WAY better at USING tools. This isn't about tool orchestration frameworks; it's about knowing how and when to use tools effectively, and that skill lives largely inside the model. I would also say this is a fundamental model capability.

This improved think->act->sense loop that they now form dramatically increases the possible utility of the models. We are just starting to see this with GPT-5 and the Claude 4+ series of models.
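
To be concrete about what I mean by the loop, here is a toy sketch. The tools are stubs and fake_model is a hard-coded policy; in the real systems, deciding when to act and which tool to call is done by the LLM itself, which is the capability I'm pointing at.

    from dataclasses import dataclass
    from typing import Callable, Dict, List, Optional

    # Stubbed tools; real agents expose shells, editors, search, etc.
    TOOLS: Dict[str, Callable[[str], str]] = {
        "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),  # demo only
    }

    @dataclass
    class Step:
        thought: str
        tool: Optional[str]        # None means "answer now"
        tool_input: Optional[str]

    def fake_model(task: str, observations: List[str]) -> Step:
        """Hard-coded policy standing in for the model's own tool-use decision."""
        if not observations:
            return Step("I should compute this precisely.", "calculator", task)
        return Step(f"The tool returned {observations[-1]}.", None, None)

    def agent_loop(task: str, max_steps: int = 5) -> str:
        observations: List[str] = []
        for _ in range(max_steps):
            step = fake_model(task, observations)       # think
            if step.tool is None:
                return step.thought                     # done
            result = TOOLS[step.tool](step.tool_input)  # act
            observations.append(result)                 # sense
        return "<step budget exhausted>"

    print(agent_loop("123456789 * 987654321"))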

emp17344
Yes, the models have gotten better at using tools, but tech companies have also poured an insane amount of money into improving the tools and integrating them with LLMs. Is the gain because the models themselves have actually improved, or because the tools and integration methods have improved? I don't think anyone actually knows.
XenophileJKO
The models have improved. They are using "arbitrary tools" better.
emp17344
I don’t know what you mean, because arbitrary tools don’t integrate with LLMs in the first place. Are you referring to MCP?
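
To spell out why I'm asking: as far as I know, a tool only becomes usable by an LLM once someone describes it declaratively (a name, a description, and a JSON-Schema-style parameter spec) and writes the dispatch glue, whether through a vendor's function-calling API or an MCP server. A rough, vendor-neutral sketch with made-up names:

    import json

    # Declarative description handed to the model; names here are made up.
    weather_tool = {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }

    def get_weather(city: str) -> str:
        return f"{city}: 18C, overcast"          # stubbed implementation

    DISPATCH = {"get_weather": get_weather}

    # Stand-in for the structured call a model emits after seeing the schema above.
    model_tool_call = {"name": "get_weather", "arguments": json.dumps({"city": "Berlin"})}

    args = json.loads(model_tool_call["arguments"])
    print(DISPATCH[model_tool_call["name"]](**args))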