
421 points sohkamyung | 1 comment
scarmig No.45669929
If you dig into the actual report (I know, I know, how passé), you see how they get the numbers. Most of the errors are "sourcing issues": the AI assistant doesn't cite a claim, or it (shocking) cites Wikipedia instead of the BBC.

Other issues: the report doesn't even say which particular models it's querying, aside from saying it's the consumer tier [ETA: discovered they do list this in an appendix]. And it leaves off Anthropic (in my experience, by far the best at this type of task), favoring Perplexity and (perplexingly) Copilot. The article also intermingles claims from the recent report and the one covering research conducted a year ago, leaving out critical context that... things have changed.

This article contains significant issues.

replies(7): >>45669943 >>45670942 >>45671401 >>45672311 >>45672577 >>45675250 >>45679322
1. impossiblefork No.45675250
Yes, but the problems with processing human writing are huge, so even if this article is bad, something like the problem they claim exists is very real. LLMs misunderstand individual sentences, lose track of who said what, etc., even in the best models, including GPT-5, when they're asked to analyze normal human-written discussions like the ones we have here.

Much of this is probably solvable, but it is very much not solved.