
421 points sohkamyung | 21 comments
1. visarga ◴[] No.45669657[source]
I recently tried to get Gemini to collect fresh news and show it to me, and instead of using search it hallucinated everything wholesale: titles, abstracts, and links. Not just once, but multiple times. I'm now wary of using Gemini for anything related to web search.

Here is a sample:

> [1] Google DeepMind and Harvard researchers propose a new method for testing the ‘theory of mind’ of LLMs - Researchers have introduced a novel framework for evaluating the "theory of mind" capabilities in large language models. Rather than relying on traditional false-belief tasks, this new method assesses an LLM’s ability to infer the mental states of other agents (including other LLMs) within complex social scenarios. It provides a more nuanced benchmark for understanding if these systems are merely mimicking theory of mind through pattern recognition or developing a more robust, generalizable model of other minds. This directly provides material for the construct_metaphysics position by offering a new empirical tool to stress-test the computational foundations of consciousness-related phenomena.

> https://venturebeat.com/ai/google-deepmind-and-harvard-resea...

The link does not work, and the title turns up nothing in Google Search either.
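One cheap defense against this failure mode is to mechanically verify every URL the model emits before trusting its summary. A minimal sketch using only the standard library (the regex and helper names are my own, not anything Gemini provides; HEAD may be rejected by some servers, in which case a GET fallback would be needed):

```python
import re
import urllib.error
import urllib.request

URL_RE = re.compile(r"https?://[^\s)\]>\"']+")

def extract_urls(text: str) -> list[str]:
    """Pull every http(s) URL out of a block of model output."""
    return URL_RE.findall(text)

def url_resolves(url: str, timeout: float = 5.0) -> bool:
    """Return True only if the server answers with a non-error status.

    urlopen raises HTTPError for 4xx/5xx (a URLError subclass), so a
    hallucinated link that 404s is caught below and reported as dead.
    """
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except (urllib.error.URLError, ValueError):
        return False

# Flag every link the model cited that does not actually resolve:
# dead = [u for u in extract_urls(answer) if not url_resolves(u)]
```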

replies(8): >>45669725 #>>45670064 #>>45670405 #>>45670834 #>>45671889 #>>45673663 #>>45676497 #>>45678588 #
2. luckydata ◴[] No.45669725[source]
Gemini is notoriously bad at tool calling and it's also widely speculated that 3.0 will put an emphasis on fixing that.
3. wat10000 ◴[] No.45670064[source]
They can be good for search, but you must click through the provided links and verify that they actually say what it says they do.
replies(2): >>45670239 #>>45670408 #
4. bloppe ◴[] No.45670239[source]
The problem is that 90% of people will not do that once they've satisfied their confirmation bias. Hard to say whether that's going to be better or worse than the current echo chamber effects of the Internet. I'm still holding out for better, but this is certainly shaking that assumption.
replies(1): >>45674753 #
5. Yizahi ◴[] No.45670405[source]
But an LLM can't collect anything. It can only generate the most likely next characters, one after another. What exactly did you expect from it?
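That framing is literally the decoding step: at each position the model emits scores over its vocabulary and one token is chosen. A toy, standalone version of that step (an illustration, not any real model's code) shows why, without a search tool, a URL-shaped token sequence is just a high-probability continuation:

```python
import math
import random

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token(logits, temperature=1.0):
    """Pick the next token id from a distribution over the vocabulary."""
    if temperature == 0:
        # Greedy decoding: always the single most likely token.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax([x / temperature for x in logits])
    return random.choices(range(len(probs)), weights=probs, k=1)[0]
```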
replies(2): >>45670781 #>>45671984 #
6. reaperducer ◴[] No.45670408[source]
> They can be good for search, but you must click through the provided links and verify that they actually say what it says they do.

Then they're not very good at search.

It's like saying the proverbial million monkeys at typewriters are good at search because eventually they type something right.

replies(1): >>45671376 #
7. layer8 ◴[] No.45670781[source]
Current LLM offerings use realtime web search to collect information and answer questions.
8. HWR_14 ◴[] No.45670834[source]
Why would you want Gemini to do this instead of just going to a news site (or several news sites) and reading the headlines they wrote?
replies(2): >>45672488 #>>45674954 #
9. wat10000 ◴[] No.45671376{3}[source]
Huh? All the classic search engines required you to click through the results and read them. There's nothing wrong with that. What's different is that LLMs will give you a summary that might make you think you can get away with not clicking through anymore. This is a mistake. But that doesn't mean that the search itself is bad. I've had plenty of cases where an LLM gave me incorrect summaries of search results, and plenty of cases where it found stuff I had a hard time finding on my own because it was better at figuring out what to search for.
10. mckngbrd ◴[] No.45671889[source]
What version of Gemini were you using? That is, were you calling it via the API, or through the Gemini or AI Studio web apps?

Not every LLM app has access to web / news search capabilities turned on by default. This makes a huge difference in what kind of results you should expect. Of course, the AI should be aware that it doesn't have access to web / news search, and it should tell you as much rather than hallucinating fake links. If access to web search was turned on, and it still didn't properly search the web for you, that's a problem as well.
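For the API route, grounding has to be requested explicitly. A configuration sketch using the `google-genai` Python SDK as documented at the time of writing (class names may differ in other SDK versions; assumes a `GEMINI_API_KEY` in the environment, so it is not runnable as-is):

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Summarize today's top AI research news, with source links.",
    config=types.GenerateContentConfig(
        # Without this tool the model has no web access and can only
        # pattern-match plausible-looking headlines and URLs.
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
# Grounded responses also expose the URLs actually consulted via
# response.candidates[0].grounding_metadata.
```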

replies(1): >>45672458 #
11. bongodongobob ◴[] No.45671984[source]
LLMs have been able to search the web for a couple years now.
12. visarga ◴[] No.45672458[source]
Gemini 2.5 Pro and it was this month, so probably the latest version.
13. visarga ◴[] No.45672488[source]
I wanted to use the agentic powers of the model to dig for specific kinds of news, using iterative search as well. When LLMs use tools correctly, I think this kind of search is more powerful than plain web search, and it has better semantic capabilities. In a way, I wanted to build my own LLM-powered news feed.
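The setup described here is essentially a tool loop: the model proposes a query, a real search backend executes it, and the results are fed back until the model is done. A toy skeleton of that loop, where `ask_model` and `run_search` are placeholders for real LLM and search-API wrappers, not any actual SDK:

```python
from typing import Callable

def news_agent(
    ask_model: Callable[[str], str],   # placeholder: wraps an LLM call
    run_search: Callable[[str], str],  # placeholder: wraps a search API
    topic: str,
    max_rounds: int = 3,
) -> str:
    """Let the model iteratively refine queries against a *real* search
    tool, so every headline in the final digest traces back to fetched
    results instead of being generated from thin air."""
    notes = ""
    for _ in range(max_rounds):
        query = ask_model(
            f"Topic: {topic}\nNotes so far:\n{notes}\n"
            "Reply with ONE search query, or DONE."
        )
        if query.strip() == "DONE":
            break
        notes += f"\n[{query}]\n{run_search(query)}"
    return ask_model(
        f"Write a short news digest strictly from these "
        f"search results:\n{notes}"
    )
```

The key design point is that the model never fabricates sources: it only chooses queries and summarizes whatever `run_search` actually returned.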
replies(2): >>45673296 #>>45674800 #
14. HWR_14 ◴[] No.45673296{3}[source]
That makes sense. Thanks for explaining!
15. burnte ◴[] No.45673663[source]
About 75% of the time I look at the Gemini answer, it's wrong. Maybe 80%. Sometimes it's a little wrong, like giving the correct answer for a different product/item, or getting a business's opening hours wrong. There's a local business I took my wife to; Gemini told her it's open Monday to Friday, but it's actually open Tuesday to Saturday, so we showed up on a Monday to find them closed. But sometimes it's insanely wrong, making up dozens of false "facts". My wife looks more carefully now. Even my boss will say "Gemini says X, so it's probably Y" these days.
16. hunterpayne ◴[] No.45674753{3}[source]
So this probably is valid. However, so is Gell-Mann amnesia, and both phenomena happen a lot. There are topics where one side is the group of people who have tried to understand a problem and the other side is people who either haven't or won't, out of emotion. Acting as if it's all confirmation bias feels good, but it probably isn't the best way to look at the media.
17. SrslyJosh ◴[] No.45674800{3}[source]
> I wanted to use the agentic powers of the model

Do you have an in-depth understanding of how those "agentic powers" are implemented? If not, you should probably research it yourself. Understanding what's underneath the buzzwords will save you some disappointment in the future.

replies(1): >>45677345 #
18. ModernMech ◴[] No.45674954[source]
They're selling it as having this ability, so it really doesn't matter what people want. We should be holding these companies to account for selling software that doesn't live up to what they say it does.
19. thebytefairy ◴[] No.45676497[source]
I'm not able to reproduce something like this. What prompt were you using? Asking it for today's top news gets it to use Google search and provide valid links.
20. visarga ◴[] No.45677345{4}[source]
I think I do; I've been in ML for 12 years and have followed transformers since their invention. I've also been using LLMs daily, personally, since they appeared.
21. anigbrowl ◴[] No.45678588[source]
This isn't something you can easily build on your own either, as getting any kind of news feed via API (even for local personal use) is almost prohibitively expensive unless you're willing to scrape.