ComputeBench: Instruction-following benchmarks for long, step-by-step arithmetic
(notdian.github.io)
1 point | notdian | 2 comments | 20 Dec 25 19:52 UTC
1. notdian | 20 Dec 25 19:52 UTC | No. 46338994 | >>46338993 (OP)
I vibecoded this after seeing models do amazing things yet still drift on simple recursive steps. It tracks three metrics: exact match, answer accuracy, and prefix correctness. Feedback welcome.
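For readers wondering how the three metrics might differ, here is a rough Python sketch. It is not taken from the ComputeBench code. The definitions are assumptions: exact match means the whole step trace equals the reference, answer accuracy looks only at the final value, and prefix correctness is the fraction of reference steps matched before the first divergence. The function names and the toy recurrence `x -> (3*x + 1) % 100` are hypothetical, made up for illustration.

```python
from typing import List, Sequence


def exact_match(pred: Sequence[int], ref: Sequence[int]) -> bool:
    """True only if every step of the predicted trace equals the reference."""
    return list(pred) == list(ref)


def answer_correct(pred: Sequence[int], ref: Sequence[int]) -> bool:
    """True if the final values agree, regardless of intermediate steps."""
    return bool(pred) and bool(ref) and pred[-1] == ref[-1]


def prefix_correctness(pred: Sequence[int], ref: Sequence[int]) -> float:
    """Fraction of reference steps matched before the first divergence."""
    if not ref:
        return 1.0 if not pred else 0.0
    matched = 0
    for p, r in zip(pred, ref):
        if p != r:
            break
        matched += 1
    return matched / len(ref)


def reference_trace(start: int, steps: int) -> List[int]:
    """Hypothetical toy task: apply x -> (3*x + 1) % 100 repeatedly."""
    trace = []
    x = start
    for _ in range(steps):
        x = (3 * x + 1) % 100
        trace.append(x)
    return trace


ref = reference_trace(7, 5)   # [22, 67, 2, 7, 22]
pred = [22, 67, 3, 7, 22]     # drifts at step 3, then lands on the right answer

print(exact_match(pred, ref))         # False
print(answer_correct(pred, ref))      # True
print(prefix_correctness(pred, ref))  # 0.4
```

Under these assumed definitions, a trace can reach the right final answer while still failing exact match and scoring low on prefix correctness, which is one way step-level drift would show up separately from answer accuracy.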