423 points sohkamyung | 2 comments
scarmig ◴[] No.45669929[source]
If you dig into the actual report (I know, I know, how passé), you see how they get the numbers. Most of the errors are "sourcing issues": the AI assistant doesn't cite a claim, or it (shocking) cites Wikipedia instead of the BBC.

Other issues: the report doesn't even say which particular models it's querying [ETA: discovered they do list this in an appendix], beyond saying it's the consumer tier. And it leaves out Anthropic (in my experience, by far the best at this type of task) while favoring Perplexity and (perplexingly) Copilot. The article also intermingles claims from the recent report with claims from the one conducted a year earlier, leaving out the critical context that... things have changed.

This article contains significant issues.

replies(7): >>45669943 #>>45670942 #>>45671401 #>>45672311 #>>45672577 #>>45675250 #>>45679322 #
afavour ◴[] No.45669943[source]
> or it (shocking) cites Wikipedia instead of the BBC.

No... the problem is that it cites Wikipedia articles that don't exist.

> ChatGPT linked to a non-existent Wikipedia article on the “European Union Enlargement Goals for 2040”. In fact, there is no official EU policy under that name. The response hallucinates a URL but also, indirectly, an EU goal and policy.
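
This kind of citation is easy to verify mechanically: the MediaWiki query API flags titles that don't exist. A rough sketch (Python with the requests library; everything apart from the API endpoint itself is illustrative):

    import requests

    def wikipedia_article_exists(title: str) -> bool:
        # Ask the MediaWiki API about the title instead of trusting a generated URL.
        resp = requests.get(
            "https://en.wikipedia.org/w/api.php",
            params={"action": "query", "titles": title, "format": "json"},
            timeout=10,
        )
        pages = resp.json()["query"]["pages"]
        # Missing titles come back under page id "-1" with a "missing" flag.
        return all("missing" not in page for page in pages.values())

    # The report's example title; per the report, no such article exists.
    print(wikipedia_article_exists("European Union Enlargement Goals for 2040"))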

replies(6): >>45670006 #>>45670093 #>>45670094 #>>45670184 #>>45670903 #>>45672812 #
shinycode ◴[] No.45670903[source]
I used Perplexity for searches and clicked on every source it gave. Depending on the model, anywhere from 20% to 100% of the URLs I tested did not exist. I kept querying the LLM about it, and it finally told me that it had generated "the most probable" URLs for the topic in question, based on ones it knows exist. Useless.
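
For what it's worth, the check I did by hand is easy to script; a minimal sketch (Python with the requests library; the URL list is a placeholder for whatever sources the model actually cited):

    import requests

    urls = [
        "https://example.com/cited-source",  # paste the model's cited URLs here
    ]

    for url in urls:
        try:
            # HEAD is cheap; some servers reject it, so fall back to GET.
            r = requests.head(url, allow_redirects=True, timeout=10)
            if r.status_code in (403, 405):
                r = requests.get(url, allow_redirects=True, timeout=10)
            status = r.status_code
        except requests.RequestException as exc:
            status = f"error: {exc}"
        print(status, url)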
replies(1): >>45670989 #
smrq ◴[] No.45670989[source]
I share your opinion on the results, but why would you trust the LLM explanation for why it does what it does?
replies(1): >>45671491 #
shinycode ◴[] No.45671491{3}[source]
I don’t trust it at all. I wanted to know whether it would be able to explain its own results. The mere fact that it displayed sources and links made me trust it, until I checked and was horrified. I wanted to know whether they were old links that had broken or changed, but apparently not.
replies(1): >>45672119 #
macintux ◴[] No.45672119{4}[source]
You said:

>...it finally told me that it had generated "the most probable" URLs for the topic in question, based on ones it knows exist.

smrq is asking why you would believe that explanation. The LLM doesn't necessarily know why it's doing what it's doing, so that could be another hallucination.

Your answer:

> ...I wanted to know whether they were old links that had broken or changed, but apparently not

Leads me to believe that you misunderstood smrq's question.

replies(1): >>45674436 #
shinycode ◴[] No.45674436[source]
No, I got the question; I said I wanted to see what kind of explanation it would give me. Of course it can hallucinate that explanation as well. The bottom line is that I don’t trust it, and the source links are fake (not broken or obsolete).