421 points sohkamyung | 8 comments
1. simonw ◴[] No.45669931[source]
Page 10 onwards of this PDF shows concrete examples of the mistakes: https://www.bbc.co.uk/aboutthebbc/documents/news-integrity-i...

> ChatGPT / CBC / Is Türkiye in the EU?

> ChatGPT linked to a non-existent Wikipedia article on the “European Union Enlargement Goals for 2040”. In fact, there is no official EU policy under that name. The response hallucinates a URL but also, indirectly, an EU goal and policy.

replies(1): >>45670526 #
2. brabel ◴[] No.45670526[source]
It did exist but got removed: https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletio...

Quite an omission not to even check for that, and it makes me think that was done intentionally.

replies(2): >>45670612 #>>45671354 #
3. sharkjacobs ◴[] No.45670612[source]
Removed because it was an AI-generated article which cited made-up sources.

Hey, that gives me an idea though: subagents which check whether cited sources exist, and create them whole cloth if they don't.
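
The checking half is easy enough to sketch; something like this (untested, assumes a HEAD request that doesn't 404 counts as the source "existing", and leaves the whole-cloth generation as an exercise):

    # Sketch: check that each cited URL actually resolves.
    import urllib.error
    import urllib.request

    def source_exists(url: str, timeout: float = 10.0) -> bool:
        req = urllib.request.Request(
            url, method="HEAD", headers={"User-Agent": "citation-checker"}
        )
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return resp.status < 400
        except urllib.error.HTTPError as e:
            return e.code < 400
        except OSError:
            return False

    for url in ["https://en.wikipedia.org/wiki/European_Union"]:
        print(url, "exists" if source_exists(url) else "missing")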

replies(2): >>45670734 #>>45670807 #
4. 1899-12-30 ◴[] No.45670734{3}[source]
Or subagents that check each link to see whether it actually supports the claim it's cited for.
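
A crude sketch of the plumbing for that, too (the real "does this page support the claim" judgement would need an LLM reading the page; this just checks that the claim's key terms appear at all):

    # Crude stand-in: fetch the page and look for the claim's key terms.
    # A real verifier would hand the page text and the claim to an LLM judge.
    import re
    import urllib.request

    def page_mentions(url: str, claim: str) -> bool:
        html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "replace")
        text = re.sub(r"<[^>]+>", " ", html).lower()
        terms = [w for w in re.findall(r"\w+", claim.lower()) if len(w) > 3]
        return bool(terms) and all(t in text for t in terms)
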
5. jpadkins ◴[] No.45670807{3}[source]
you shouldn't automate what the CIA already does!
6. simonw ◴[] No.45671354[source]
It's probably for the best that chat interfaces avoid making direct HTTP calls to sources at run-time to confirm that they don't 404 - imagine how much extra traffic that could add to an internet ecosystem which is suffering from badly written crawlers already.

(Not to mention plenty of sites have added robots.txt rules deliberately excluding known AI user-agents now.)
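
For illustration, here's the kind of robots.txt I mean, checked with Python's standard-library parser (GPTBot and CCBot are real crawler user-agents; the file contents are just a made-up example, not any particular site's):

    # Illustrative robots.txt excluding a couple of known AI crawlers,
    # plus a check of what a well-behaved fetcher would be allowed to do.
    from urllib.robotparser import RobotFileParser

    robots_txt = """\
    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    User-agent: *
    Allow: /
    """

    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())

    url = "https://example.com/some-article"
    print("GPTBot allowed:", rp.can_fetch("GPTBot", url))        # False
    print("Browser allowed:", rp.can_fetch("Mozilla/5.0", url))  # True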

replies(1): >>45671648 #
7. magackame ◴[] No.45671648{3}[source]
Wouldn't it be the same number of requests as a regular person researching something the old way?

replies(1): >>45672729 #
8. simonw ◴[] No.45672729{4}[source]
If you watch the thinking panel in ChatGPT with GPT-5 Thinking, it often consults dozens of pages in response to a single prompt.