QueensGambit No.45683114
Hi HN, OP here. I'd appreciate feedback from folks with deep model knowledge on a few technical claims in the essay. I want to make sure I'm getting the fundamentals right.

1. On o1's arithmetic handling: I claim that when o1 multiplies large numbers, it generates and runs Python code rather than computing the product internally. I don't have full transparency into o1's internals. Is this accurate? (A toy sketch of the pattern I mean follows this list.)

2. On model stagnation: I argue that fundamental model capabilities (especially code generation) have plateaued, and that tool orchestration is masking this. Do folks with hands-on experience building/evaluating models agree?

3. On alternative architectures: I suggest graph transformers that preserve semantic meaning at the word level as one possible path forward. For those working on novel architectures: what approaches look promising? Are graph-based architectures, sparse attention, or hybrid systems actually being pursued seriously in research labs? (The second sketch below shows the kind of graph-masked attention I have in mind.)
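
For claim 1, here is a toy sketch of the pattern I'm describing: the "model" answers a multiplication question by emitting code, and a host loop executes it and reads the result back. Everything here is my own illustration; the prompt format, the fenced-code convention, and the fake_model stub are invented for demonstration and are not o1's actual internals or OpenAI's API.

    import re

    def fake_model(prompt: str) -> str:
        # Stand-in for a model response; a real model would generate this
        # text. The hypothesis: rather than multiplying internally, the
        # model emits code for a sandbox to run.
        return (
            "To compute this reliably, I'll use code:\n"
            "```python\n"
            "result = 123456789 * 987654321\n"
            "```"
        )

    def run_tool_loop(prompt: str) -> int:
        response = fake_model(prompt)
        # Extract the first fenced code block the "model" emitted.
        match = re.search(r"```python\n(.*?)```", response, re.DOTALL)
        if match is None:
            raise ValueError("model answered directly; no code emitted")
        namespace = {}
        exec(match.group(1), namespace)  # the "tool execution" step
        return namespace["result"]

    print(run_tool_loop("What is 123456789 * 987654321?"))
    # 121932631112635269, produced by the executed code, not the model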
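And for claim 3, a minimal numpy sketch of the kind of graph-masked attention I have in mind: standard scaled dot-product attention, but scores are masked to the edges of a word-level graph, so each token only mixes with its graph neighbours. The sentence, the edges, and the dimensions are toy choices of mine, not any lab's actual architecture.

    import numpy as np

    rng = np.random.default_rng(0)

    tokens = ["the", "cat", "sat", "on", "the", "mat"]
    n, d = len(tokens), 8

    # Hand-written dependency-style edges over the words, plus self-loops.
    edges = [(0, 1), (1, 2), (2, 3), (3, 5), (4, 5)]
    adj = np.eye(n, dtype=bool)
    for i, j in edges:
        adj[i, j] = adj[j, i] = True

    x = rng.normal(size=(n, d))                  # toy word embeddings
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv

    scores = q @ k.T / np.sqrt(d)
    scores[~adj] = -np.inf                       # attend only along edges
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ v                            # neighbour-only mixing

    print(weights.round(2))                      # zeros off the graph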

Would love to know your thoughts!

cpa No.45686265
I don't think 2 is true: when an OpenAI model won a gold medal at the math olympiad, it did so without tools or web search, just pure inference. Such a feat definitely would not have happened with o1.
simonw No.45686475
Yeah, I confirmed this at the time. Neither OpenAI nor Gemini used tools as part of their IMO gold medal performances.

Here's OpenAI's tweet about this: https://twitter.com/SebastienBubeck/status/19465776504050567...

> Just to spell it out as clearly as possible: a next-word prediction machine (because that's really what it is here, no tools no nothing) just produced genuinely creative proofs for hard, novel math problems at a level reached only by an elite handful of pre‑college prodigies.

My notes: https://simonwillison.net/2025/Jul/19/openai-gold-medal-math...

They DID use tools for the International Collegiate Programming Contest (ICPC), though: https://twitter.com/ahelkky/status/1971652614950736194

> For OpenAI, the models had access to a code execution sandbox, so they could compile and test out their solutions. That was it though; no internet access.
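
For anyone curious what that setup looks like in practice, here's a rough sketch of a run-and-test harness of that general shape. This is my own illustration, not OpenAI's actual sandbox: the solution, the sample tests, and the use of python3 are invented for the example, and a real sandbox would add resource limits and network isolation on top.

    import pathlib
    import subprocess
    import tempfile

    # A stand-in for model-written code: read two ints, print their sum.
    solution = "a, b = map(int, input().split())\nprint(a + b)\n"

    # Invented sample tests, standing in for a judge's input/output pairs.
    tests = [("1 2\n", "3"), ("10 -4\n", "6")]

    def judge(source: str) -> bool:
        # Run the candidate against each test with a timeout and compare
        # its stdout to the expected answer.
        path = pathlib.Path(tempfile.mkdtemp()) / "sol.py"
        path.write_text(source)
        for stdin, expected in tests:
            proc = subprocess.run(
                ["python3", str(path)], input=stdin,
                capture_output=True, text=True, timeout=5,
            )
            if proc.returncode != 0 or proc.stdout.strip() != expected:
                return False
        return True

    print(judge(solution))  # True; the model can iterate until this passes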

emp17344 No.45686838
We still have next to no real information on how the models achieved the gold medal. It’s a little early to be confirming anything, especially when the main source is a Twitter thread initiated by a company known for “exaggerating” the truth.
simonw No.45687260
If you're not going to believe researchers when they tell you how they did something then sure, we don't know how they did it.

Given how much bad press OpenAI got just last week[1] when one of their execs clumsily (and, I would argue, misleadingly) described a model achievement and then had to walk it back amid widespread headlines about their dishonesty, those researchers have a VERY strong incentive to tell the truth.

[1] https://techcrunch.com/2025/10/19/openais-embarrassing-math/

emp17344 No.45687774
Any company will apologize when they receive bad press. That’s basic corporate PR, not integrity.
simonw No.45687806
It illustrates that there is a real risk to lying about research results: if you get caught, it's embarrassing.

It's also worth taking professional integrity into account. Even if OpenAI's culture didn't value the truth, individual researchers would still care about being honest.

emp17344 No.45687880
This exact statement could be said about literally any corporation or organization. And yet, corporations still lie and mislead, because deception helps them make money and acquire funding.

In OpenAI’s case, this isn’t exactly the first time they’ve been caught doing something ethically misguided:

https://techcrunch.com/2025/01/19/ai-benchmarking-organizati...

simonw No.45688978
That story feels very different to me from straight-up lying about whether a mathematical competition result used tools or not.