←back to thread

60 points QueensGambit | 2 comments | | HN request time: 0.001s | source
Show context
QueensGambit ◴[] No.45683114[source]
Hi HN, OP here. I'd appreciate feedback from folks with deep model knowledge on a few technical claims in the essay. I want to make sure I'm getting the fundamentals right.

1. On o1's arithmetic handling: I claim that when o1 multiplies large numbers, it generates Python code rather than calculating internally. I don't have full transparency into o1's internals. Is this accurate?

2. On model stagnation: I argue that fundamental model capabilities (especially code generation) have plateaued, and that tool orchestration is masking this. Do folks with hands-on experience building/evaluating models agree?

3. On alternative architectures: I suggest graph transformers that preserve semantic meaning at the word level as one possible path forward. For those working on novel architectures - what approaches look promising? Are graph-based architectures, sparse attention, or hybrid systems actually being pursued seriously in research labs?

Would love to know your thoughts!

replies(10): >>45686080 #>>45686164 #>>45686265 #>>45686295 #>>45686359 #>>45686379 #>>45686464 #>>45686479 #>>45686558 #>>45686559 #
1. anonymoushn ◴[] No.45686359[source]
I don't really know what you mean by "preserve semantic meaning at the word level." The significant misunderstanding about tokenization present elsewhere in the article is concerning, given that the proposed path forward is to do with replacing tokenization somehow.
replies(1): >>45687203 #
2. remich ◴[] No.45687203[source]
Right, words don't have semantic meaning on their own, that meaning is derived from surrounding context. "Cat" is both an animal and a bash command.