The command I ran was `curl -s https://r.jina.ai/https://www.lawfaremedia.org/article/anna-... | cb | ai -m gpt-5-mini summarize this article in one paragraph`. r.jina.ai pulls the page text as markdown, cb just wraps it in a ``` code fence, and ai is my own LLM CLI: https://github.com/david-crespo/llm-cli.
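Wrapped as a shell function, the same pipeline looks roughly like this (a minimal sketch; the `summarize` name, the default-model handling, and everything about `ai` beyond the `-m` flag shown in the original command are assumptions for illustration):

```sh
# Hypothetical wrapper around the pipeline above. cb and ai are the
# author's own tools; only `ai -m <model>` is taken from the original command.
summarize() {
  local url="$1"
  local model="${2:-gpt-5-mini}"   # assumed default, matching the command above
  # r.jina.ai returns the page text as markdown, cb fences it,
  # and ai sends it to the chosen model with the trailing prompt.
  curl -s "https://r.jina.ai/$url" \
    | cb \
    | ai -m "$model" summarize this article in one paragraph
}

# usage (hypothetical): summarize https://www.lawfaremedia.org/article/anna-...
```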
All of them seem pretty good to me, though at 6 cents per summary, using Sonnet regularly for this purpose would be excessive. Note that reasoning was left on the default setting in each case; I think that means the GPT-5 mini one did no reasoning but the other two did.
GPT-5 one paragraph: https://gist.github.com/david-crespo/f2df300ca519c336f9e1953...
GPT-5 three paragraphs: https://gist.github.com/david-crespo/d68f1afaeafdb68771f5103...
GPT-5 mini one paragraph: https://gist.github.com/david-crespo/32512515acc4832f47c3a90...
GPT-5 mini three paragraphs: https://gist.github.com/david-crespo/ed68f09cb70821cffccbf6c...
Sonnet 4.5 one paragraph: https://gist.github.com/david-crespo/e565a82d38699a5bdea4411...
Sonnet 4.5 three paragraphs: https://gist.github.com/david-crespo/2207d8efcc97d754b7d9bf4...
This is rarely an issue with SOTA models like Sonnet 4.5, Opus 4.1, or GPT-5 Thinking. But those are expensive, so all the companies use cut-rate models or little to no test-time compute (TTC) to save on cost and to go faster.
Similarly, I've had PMs blindly copy/paste summaries into larger project notes and ultimately create tickets based on either a misunderstanding by the LLM or a straight-up hallucination. I've repeatedly had conversations where a PM asks "when do you think Xyz will be finished?" and I have to respond "where and when did we even discuss Xyz? I'm not even sure what Xyz means in this context, so clarification would help." Then they just delete the ticket/bullet once they realize they never bothered to sanity-check what they were pasting.
The ways they fail are often surprising if your baseline is "these are thinking machines". If your baseline is what I wrote above (say, because you read the "Attention Is All You Need" paper), none of it's surprising.
My own mental model (condensed to a single phrase) is that LLMs are extremely convincing (on the surface) autocomplete. So far, this model has not disappointed me.
The result was just trash. It would do exactly as you say: condense the information, but with no semblance of a "summary". It would just pick random phrases or keywords from the release notes and string them together; the output had no meaning or clarity, it just seemed garbled.
And it wasn't for lack of trying; I kept trying to get a suitable result out of the AI well past the amount of time it would have taken me to summarize it myself.
The more I use these tools the more I feel their best use case is still advanced autocomplete.