165 points distalx | 1 comment
ilaksh[dead post] ◴[] No.43948635[source]
[flagged]
1. andy99 ◴[] No.43949010[source]
Today's failure modes are identical to those from 2023. I agree with the now-deleted post that there has been essentially no progress. Benchmark scores (if you think they are a relevant proxy for anything) have obviously increased, but the jump is, for example, from 50% to 90% (and probably less drastic than that), not the jump from 99% to 99.999% you'd need for real assurance that a widely used system won't make mistakes.

As in 2023, everything is still a demo; there's nothing that could be considered reliable.