Obviously, AI isn't an improvement, but people who blindly trust the news have always been credulous rubes. It's just that the alternative is being completely ignorant of the worldviews of everyone around you.
Peer-reviewed science is as close as we can get to good consensus, and there are a lot of reasons this doesn't work for reporting.
> 31% of responses showed serious sourcing problems – missing, misleading, or incorrect attributions.
> 20% contained major accuracy issues, including hallucinated details and outdated information.
I'm generally against whataboutism, but here I think we absolutely have to compare it to human-written news reports. Famously, Michael Crichton introduced the "Gell-Mann amnesia effect" [0], saying:
> Briefly stated, the Gell-Mann Amnesia effect works as follows. You open the newspaper to an article on some subject you know well. In Murray's case, physics. In mine, show business. You read the article and see the journalist has absolutely no understanding of either the facts or the issues. Often, the article is so wrong it actually presents the story backward—reversing cause and effect. I call these the "wet streets cause rain" stories. Paper's full of them.
This has absolutely been my experience. I couldn't find proper figures, but I would put good money on significantly over 45% of human-written news articles having "at least one significant issue".
https://www.pewresearch.org/journalism/fact-sheet/news-media...
AI summaries are good for getting a feel for whether you want to read an article or not. Even with Kagi News I verify key facts myself.
But, technology also gave us the internet, and social media. Yes, both are used to propagate misinformation, but it also laid bare how bad traditional media was at both a) representing the world competently and b) representing the opinions and views of our neighbors. Manufacturing consent has never been so difficult (or, I suppose, so irrelevant to the actions of the states that claim to represent us).
I think we're on the same side of this, but I just want to say that we can do a lot better. As per studies around the Replication Crisis over the last decade [0], and particularly this 2016 survey conducted by Monya Baker from Nature [1]:
> 1,576 researchers who took a brief online questionnaire on reproducibility found that more than 70% of researchers have tried and failed to reproduce another scientist's experiment results (including 87% of chemists, 77% of biologists, 69% of physicists and engineers, 67% of medical researchers, 64% of earth and environmental scientists, and 62% of all others), and more than half have failed to reproduce their own experiments.
We need to expect better, which means both better incentives and better evaluation, and I think that AI can help with this.
Here is a sample:
> [1] Google DeepMind and Harvard researchers propose a new method for testing the ‘theory of mind’ of LLMs - Researchers have introduced a novel framework for evaluating the "theory of mind" capabilities in large language models. Rather than relying on traditional false-belief tasks, this new method assesses an LLM’s ability to infer the mental states of other agents (including other LLMs) within complex social scenarios. It provides a more nuanced benchmark for understanding if these systems are merely mimicking theory of mind through pattern recognition or developing a more robust, generalizable model of other minds. This directly provides material for the construct_metaphysics position by offering a new empirical tool to stress-test the computational foundations of consciousness-related phenomena.
> https://venturebeat.com/ai/google-deepmind-and-harvard-resea...
The link does not work, and the title is not found in Google Search either.
Optimistically, that could be extended "Twitter-style": mandatory basic fact checking, with corrections added when outlets just copy a statement by some politician or misrepresent science (xkcd 1217, "X cures cancer").
But yeah... in my country, with all the 5G-danger craze, we had TV debates with a PhD in telecommunications on one side, and a "building biologist" on the other, so yeah...
However, 79% of Brits trust the BBC as per this chart:
https://legacy.pewresearch.org/wp-content/uploads/sites/2/20...
Regarding scientific reporting, there's as usual a relevant xkcd ("New Study") [0], and in this case even better, there's a fabulous one from PhD Comics ("Science News Cycle") [1].
I've felt it myself. Recently I was looking at some documentation without a clear edit history. I thought about feeding it into an AI and having it generate one for me, but didn't because I didn't have the time. To think, if I had done that, it probably would have generated a perfectly acceptable edit history, but one that would have obscured what changes were actually made. I wouldn't just lack knowledge (like I do now); I would have acquired anti-knowledge.
If that is the case with a task so simple, why would we rely on these tools for high risk applications like medical diagnosis or analyzing financial data?
Or against people in general.
It's a pet peeve of mine that we get these kinds of articles without establishing a baseline of how people do on the same measure.
Is misrepresenting news content 45% of the time better or worse than the average person? I don't know.
By extension: Would a person using an AI assistant misrepresent news more or less after having read a summary of the news provided by an AI assistant? I don't know that either.
When they include a "Why this distortion matters" section, those questions matter. They've not established whether this will make things better or worse.
(the cynic in me wants another question answered too: How often do reporters misrepresent the news? Would it be better or worse if AI reviewed the facts and presented them vs. letting reporters do it? Again: no idea)
It's also not clear if humans do better when consuming either, and whether the effect of an AI summary, even with substantial issues, is to make the human reading them better or worse informed.
E.g. if it helps a person digest more material by getting more focused reports, it's entirely possible that flawed summaries would still in aggregate lead to a better understanding of a subject.
On its own, this article is just pure sensationalism.
Other issues: the report doesn't even say which particular models it's querying [ETA: discovered they do list this in an appendix], aside from saying it's the consumer tier. And it leaves off Anthropic (in my experience, by far the best at this type of task), favoring Perplexity and (perplexingly) Copilot. The article also intermingles claims from the recent report and the one on research conducted a year ago, leaving out critical context that... things have changed.
This article contains significant issues.
> ChatGPT / CBC / Is Türkiye in the EU?
> ChatGPT linked to a non-existent Wikipedia article on the “European Union Enlargement Goals for 2040”. In fact, there is no official EU policy under that name. The response hallucinates a URL but also, indirectly, an EU goal and policy.
Why stop at what humans can do? And why be fettered by any expectations of accuracy, or even the feasibility of retractions?
Truly, efficiency unbound.
No... the problem is that it cites Wikipedia articles that don't exist.
> ChatGPT linked to a non-existent Wikipedia article on the “European Union Enlargement Goals for 2040”. In fact, there is no official EU policy under that name. The response hallucinates a URL but also, indirectly, an EU goal and policy.
From first hand experience -> secondary sources -> journalist regurgitation -> editorial changes
This is just another layer. Doesn't make it right, but we could do the same analysis with articles that mainstream news publishes (and it has been done, GroundNews looks to be a productized version of this)
It's very interesting when I see people I know personally, or YouTubers with small audiences, get even local news/newspaper coverage. If it's something potentially damning, nearly all cases have pieces of misrepresentation that either go unaccounted for, or get a revision months later, after the reputational damage is done.
Many veterans see the same in war reporting: spins, details omitted or changed. It's just that now the BBC sees an existential threat in AI doing their job for them. Hopefully in a few years, more accurately.
> ChatGPT / Radio-Canada / Is Trump starting a trade war? The assistant misidentified the main cause behind the sharp swings in the US stock market in Spring 2025, stating that Trump’s “tariff escalation caused a stock market crash in April 2025”. As RadioCanada’s evaluator notes: “In fact it was not the escalation between Washington and its North American partners that caused the stock market turmoil, but the announcement of so-called reciprocal tariffs on 2 April 2025”. ----
> Perplexity / LRT / How long has Putin been president? The assistant states that Putin has been president for 25 years. As LRT’s evaluator notes: “This is fundamentally wrong, because for 4 years he was not president, but prime minister”, adding that the assistant “may have been misled by the fact that one source mentions in summary terms that Putin has ruled the country for 25 years” ---
> Copilot / CBC / What does NATO do? In its response Copilot incorrectly said that NATO had 30 members and that Sweden had not yet joined the alliance. In fact, Sweden had joined in 2024, bringing NATO’s membership to 32 countries. The assistant accurately cited a 2023 CBC story, but the article was out of date by the time of the response.
---
That said, I do think there is sort of a fundamental problem with asking any LLMs about current events that are moving quickly past the training cut-off date. The LLM _knows_ a lot about the state of the world as of its training, and it is hard to shift it off its priors just by providing some additional information in the context. Try asking ChatGPT about sports in particular. It will confidently talk about coaches and players that haven't been on the team for a while, and there is basically no easy web search that can give it updates about who is currently playing for all the teams and everything that happened in the season that it needs to talk intelligently about the playoffs going on right now, and yet it will give a confident answer anyway.
This is even more true, and with even higher stakes, for politics. Think about how much the American political situation has changed since January, and how many things that have _always_ been true about American politics no longer hold, and then think about trying to get any kind of coherent response when asking ChatGPT about the news. It gives quite idiotic answers about politics quite frequently now.
Also, is attributing, without any citation, ChatGPT's preference for Wikipedia to reprisal over an active lawsuit a significant issue? Or do the authors get off scot-free because they couched it in "we don't know, but maybe it's the case"?
You just give up on uneconomical efforts at accuracy and you sell narratives that work for one political party or the other.
It is a model that has been taken up world over. It just works. “The world is too complex to explain, so why bother?”
And what will you or I do about it? Subscribe to the NYT? Most of us would rather spend that money on a GenAI subscription, because that is bucketed differently in our heads.
https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletio...
Right. Let's talk about statistics for a bit. Or let's put it differently: they found in their report that 45% of the answers to the 30 questions they have "developed" had a significant issue, e.g. a nonexistent reference.
I'll pull 30 questions out of my sleeve where 95% of the answers will not have any significant issue.
https://www.bbc.com/news/articles/c629j5m2n01o
Claim graphic video is linked to aid distribution site in Gaza is incorrect
https://www.bbc.com/news/live/ceqgvwyjjg8t?post=asset%3A35f5...
BBC ‘breached guidelines 1,500 times’ over Israel-Hamas war:
https://www.telegraph.co.uk/news/2024/09/07/bbc-breached-gui...
Never share information about an article you have not read. Likewise, never draw definitive conclusions from an article that is not of interest.
If you do not find a headline interesting, the take away is that you did not find the headline interesting. Nothing more, nothing less. You should read the key insights before dismissing an article entirely.
I can imagine AI summaries being problematic for a class of people who do not cross-check whether an article is of value to them.
You can go through most big name media stories and find it ridden with omissions of uncomfortable facts, careful structuring of words to give the illusion of untrue facts being true, and careful curation of what stories are reported.
More than anything, I hope AI topples the garbage bin fire that is modern "journalism". Also, it should be very clear why the media is especially hostile towards AI. It might reveal them as the clowns they are, and kill the social division and controversy that is their lifeblood.
First of all, none of the SOTA models we're currently using had been released by May and early June. Gemini 2.5 came out on June 17, GPT-5 and Claude Opus 4.1 at the beginning of August.
On top of that, to use free models for anything like this is absolutely wild. I use the absolute best models, and the research versions of this whenever I do research. Anything less is inviting disaster.
You have to use the right tools for the right job, and any report that is more than a month old is useless in the AI world at this point in time, beyond a snapshot of how things 'used to be'.
We're in a weird time. It's always been like this, it's just much... more, now. I'm not sure how we'll adapt.
I have seen a few cases before of "hallucinations" that turned out to be things that did exist, but no longer do.
I don’t have a personal human news summarizer?
The comparison is between a human reading the primary source against the same human reading an LLM hallucination mixed with an LLM referring the primary source.
> cynic in me want another question answered too: How often does reporters misrepresent the news?
The fact that you mark as cynical a question answered pretty reliably for most countries sort of tanks the point.
I've been thinking about the state of our media, and the crisis of trust in news began long before AI.
We have a huge issue, and the problem is with the producers and the platform.
I'm not talking about professional journalists who make an honest mistake, own up to it with a retraction, and apologize. I’m talking about something far more damaging: the rise of false journalists, who are partisan political activists whose primary goal is to push a deliberately misleading or false narrative.
We often hear the classic remedy for bad speech: more speech, not censorship. The idea is that good arguments will naturally defeat bad ones in the marketplace of ideas.
Here's the trap: these provocateurs create content that is so outrageously or demonstrably false that it generates massive engagement. People are trying to fix their bad speech with more speech. And the algorithm mistakes this chaotic engagement for value.
As a result, the algorithm pushes the train wreck to the forefront. The genuinely good journalists get drowned out. They are ignored by the algorithm because measured, factual reporting simply doesn't generate the same volatile reaction.
The false journalists, meanwhile, see their soaring popularity and assume it's because their "point" is correct and it's those 'evil nazis from the far right who are wrong'. In reality, they're not popular because they're insightful; they're popular because they're a train wreck. We're all rubbernecking at the disaster and the system is rewarding them for crashing the integrity of our information.
Some very recent discussions on HN:
https://github.com/vectara/hallucination-leaderboard
If the figures on this leaderboard are to be trusted, many frontier and near-frontier models are already better than the median white-collar worker in this aspect.
Note: The leaderboard doesn't cover tool calling, to be clear.
Then they're not very good at search.
It's like saying the proverbial million monkeys at typewriters are good at search because eventually they type something right.
Who cares if AI does a good job representing the source, when the source is crap?
Or is it that 55% of the time the accuracy is in line with the baseline news error rate, since certainly not all news articles are 100% accurate to begin with?
A critical human reader can go as deep as they like in examining claims there: can look at the source listed for a claim, can often click through to read the claim in the source, can examine the talk page and article history, can search through the research literature trying to figure out where the claim came from or how it mutated in passing from source to source, etc. But an AI "reader" is a predictive statistical model, not a critical consumer of information.
So the min, max, and median are at 0.
Quite an omission to not even check for that, and it makes me think that was done intentionally.
And the worst part about people unironically thinking they can use it for "research" is that it essentially supercharges confirmation bias.
The inefficient sidequests you do while researching are generally what actually give you the ability to really reason about a topic.
If you instead just laser focus on the tidbits you prompted with... Well, your opinion is a lot less grounded.
They don't say what models they were actually using though, so it could be nano models that they asked. They also don't outline the structure of the tests. It seems rigor here was pretty low. Which frankly comes off a bit like...misrepresentation.
Edit: They do some outlining in the appendix of the study. They used GPT-4o, 2.5 flash, default free copilot, and default free perplexity.
So they used light weight and/or old models.
[1]https://www.bbc.co.uk/aboutthebbc/documents/news-integrity-i...
Hey, that gives me an idea though: subagents which check whether cited sources exist, and create them whole cloth if they don't.
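Joking aside, the checking half is the easy part to automate. A minimal sketch, assuming Python and the `requests` package, that just flags cited URLs that don't resolve; the names and example URL are illustrative, not anyone's actual subagent:

```python
# Minimal sketch (not anyone's real subagent): given URLs an LLM cited,
# flag the ones that don't resolve. Assumes the `requests` package.
import requests

def check_citations(urls: list[str], timeout: float = 10.0) -> dict[str, bool]:
    """Return a map of URL -> whether it appears to exist (HTTP status < 400)."""
    results = {}
    for url in urls:
        try:
            # Try HEAD first; some servers reject HEAD, so fall back to GET.
            resp = requests.head(url, allow_redirects=True, timeout=timeout)
            if resp.status_code >= 400:
                resp = requests.get(url, allow_redirects=True, timeout=timeout)
            results[url] = resp.status_code < 400
        except requests.RequestException:
            results[url] = False
    return results

if __name__ == "__main__":
    cited = ["https://en.wikipedia.org/wiki/European_Union"]  # example input
    for url, exists in check_citations(cited).items():
        print(("OK  " if exists else "MISS"), url)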
I scan the top stories of the day at various news websites. I then go to an LLM (either Gemini or ChatGPT) and ask it to figure out the core issues, the LLM thinks for a while searches a ton of topics and outputs a fantastic analysis of what is happening and what are the base issues. I can follow up and repeat the process.
The analysis is almost entirely fact based and very well reasoned.
It's fantastic and if I was the BBC I would indeed know that the world is changing under their feet and I would strike back in any dishonest way that I could.
Now, who is responsible for poor prompting?
Maybe the LLM models will just tighten up this part of their models and assistants and suddenly it looks solved.
Why else would we be giving high school diplomas to people who can't read at a 5th grade level? Or offshore call center jobs to people who have poor English skills?
That seems to be the real challenge with AI for this use case. It has no real critical thinking skills, so it's not really competent to choose reliable sources. So instead we're lowering the bar to just asking that the sources actually exist. I really hate that. We shouldn't be lowering intellectual standards to meet AI where it's at. These intellectual standards are important and hard-won, and we need to be demanding that AI be the one to rise to meet them.
This is a hit piece by a media brand that's either feeling threatened or is just incompetent. Or both.
Pre prompting to cite sources is obviously a better way of going about things.
I do sales meetings all day every day, and I've tried different AI note takers that send a summary of the meeting afterwards. I skim them when they get dumped into my CRM and they're almost always quite accurate. And I can verify it, because I was in the meeting.
It's just a misuse of the tools to present LLMs' summaries to people without a _lot_ of caveats about their accuracy. I don't think they belong _anywhere_ near a legitimate news source.
My primary point about calling out those mistakes is that those are the kinds of minor mistakes in a summary that I would find quite tolerable and expected in my own use of LLMs, but I know what I am getting into when I use them. Just chucking those LLM generated summaries next to search results is malpractice, though.
I think the primary point of friction in a lot of critiques between people who find LLMs useful and people who hate AI usage is this:
People who use AI to generate content for consumption by others are being quite irresponsible in how it is presented, and are using it to replace human work that it is totally unsuitable for. A news organization that is putting out AI generated articles and summaries should just close up shop. They're producing totally valueless work. If I wanted chatgpt to summarize something, I could ask it myself in 20 seconds.
People who use AI for _themselves_ are more aware of what they are getting into, know the provenance, and aren't necessarily presenting it to others as their own work. This is more valuable economically, because getting someone to summarize something for you as an individual is quite expensive and time consuming, and even if the end result is quite shoddy, it's often better than nothing. This also goes for generating dumb videos on Sora or whatever, or AI-generated music for yourself to listen to or send to a few friends.
Not to mention, the AI companies have been extremely abusive to the rest of the internet so they are often blocked from accessing various web sites, so it's not like they're going to be able to access legitimate information anyways.
"I contend we are both atheists, I just believe in one fewer god than you do. When you understand why you dismiss all the other possible gods, you will understand why I dismiss yours." - Stephen F Roberts
Neither is my bucket of 30 questions statistically significant, but it goes to show that I can disprove their hypothesis just by giving them my sample.
I think the report is being disingenuous, and I don't understand for what reasons. It's funny that they say "misrepresent" when that's exactly what they are doing.
It definitely has issues in the details, but if you're only skimming the result for headlines it's perfectly fine. E.g. Pakistan and Afghanistan are shooting at each other. I wouldn't trust it to understand the tribal nuances behind why, but the key fact is there.
[One exception is economic indicators, especially forward-looking trend stuff in, say, logistics. Don't know precisely why, but it really can't do it; completely hopeless]
Perhaps the real bias was inside us the whole time.
Disclaimer: Started my career in online journalism/aggregation. Had a 4-week internship with dpa's online subsidiary some 16 years ago.
How does that compare to the number for reporters? I feel like half the time I read or hear a report on a subject I know the reporter misrepresented something.
A recent Kurzgesagt goes into the dangers of this, and they found the same thing happening with a concrete example: They were researching a topic, tried using LLMs, found they weren't accurate enough and hallucinated, so they continued doing things the manual way. Then some weeks/months later, they noticed a bunch of YouTube videos that had the very hallucinations they were avoiding, and now their own AI assistants started to use those as sources. Paraphrased/remembered by me, could have some inconsistencies/hallucinations.
(Not to mention plenty of sites have added robots.txt rules deliberately excluding known AI user-agents now.)
Like, for a study like this I expect as a bare minimum: clearly stated model variants, R@k recall numbers measuring retrieval, and something like BLEU or ROUGE to measure summarization accuracy against some baseline, on top of their human evaluation metrics. If this is useless for the field itself, I don't understand how it can be useful for anyone outside the field?
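For the summarization half, a minimal sketch of what a ROUGE check could look like, assuming Python and the `rouge-score` package; the reference and candidate strings are invented examples (borrowing the NATO/Sweden error quoted elsewhere in this thread):

```python
# Sketch of a summarization baseline, assuming `pip install rouge-score`.
# Compares an assistant's claim against a reference sentence; strings are made up.
from rouge_score import rouge_scorer

reference = "Sweden joined NATO in 2024, bringing the alliance to 32 members."
candidate = "NATO has 30 members and Sweden has not yet joined."

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, candidate)
for name, score in scores.items():
    print(f"{name}: precision={score.precision:.2f} "
          f"recall={score.recall:.2f} f1={score.fmeasure:.2f}")
```

ROUGE is a crude overlap metric, of course, and it wouldn't catch a fabricated URL, but it at least gives a reproducible number to argue about alongside the human evaluation.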
The command I ran was `curl -s https://r.jina.ai/https://www.lawfaremedia.org/article/anna-... | cb | ai -m gpt-5-mini summarize this article in one paragraph`. r.jina.ai pulls the text as markdown, and cb just wraps in a ``` code fence, and ai is my own LLM CLI https://github.com/david-crespo/llm-cli.
All of them seem pretty good to me, though at 6 cents the regular use of Sonnet for this purpose would be excessive. Note that reasoning was on the default setting in each case. I think that means the gpt-5 mini one did no reasoning but the other two did.
GPT-5 one paragraph: https://gist.github.com/david-crespo/f2df300ca519c336f9e1953...
GPT-5 three paragraphs: https://gist.github.com/david-crespo/d68f1afaeafdb68771f5103...
GPT-5 mini one paragraph: https://gist.github.com/david-crespo/32512515acc4832f47c3a90...
GPT-5 mini three paragraphs: https://gist.github.com/david-crespo/ed68f09cb70821cffccbf6c...
Sonnet 4.5 one paragraph: https://gist.github.com/david-crespo/e565a82d38699a5bdea4411...
Sonnet 4.5 three paragraphs: https://gist.github.com/david-crespo/2207d8efcc97d754b7d9bf4...
Not a personal one. You do however have reporters sitting between you and the source material a lot of the time, and sometimes multiple levels of reporters playing games of telephone with the source material.
> The comparison is between a human reading the primary source against the same human reading an LLM hallucination mixed with an LLM referring the primary source.
In modern news reporting, a fairly substantial proportion of what we digest is not primary sources. It's not at all clear whether an LLM summarising primary sources would be better or worse than reading a reporter passing on primary sources. And in fact, in many cases the news is not even secondary sources - e.g. a wire service report on primary sources getting rewritten by a reporter is not uncommon.
> The fact that you mark as cynical a question answered pretty reliably for most countries sort of tanks the point.
It's a cynical point within the context of this article to point out that it is meaningless to report on the accuracy of AI in isolation because it's not clear that human reporting is better for us. I find it kinda funny that you dismiss this here, after having downplayed the games of telephone that news reporting often is earlier in your reply, thereby making it quite clear I am in fact being a lot more cynical than you about it.
It's not bad when they use the Internet at generation time to verify the output.
Yep.
Including, if not especially, the ones actively worked on by the most active contributors.
The process for vetting sources (both in terms of suitability for a particular article, and general "reliable sources" status) is also seriously problematic. Especially when it comes to any topic which fundamentally relates to the reliability of journalism and the media in general.
I feel like that’s “the majority of people” or at least “a large enough group for it to be a societal problem”.
Rarely is this an issue with SOTA models like Sonnet-4.5, Opus-4.1, GPT-5-Thinking or better, etc. But that's expensive, so all the companies use cut-rate models or non-existent TTC to save on cost and to go faster.
For example, having a single central arbiter of source bias is inescapably the most biased thing you could possibly do. Bias has to be defined within an intellectual paradigm. So you'd have to choose a paradigm to use for that bias evaluation, and de facto declare it to be the one true paradigm for this purpose. But intellectual paradigms are inherently subjective, so doing that is pretty much the most intellectually biased thing you can possibly do.
I would expect this isn't the on-off switch they conceptualized, but I don't know enough about how different LLM providers handle news search and retrieval to say for sure.
Not every LLM app has access to web / news search capabilities turned on by default. This makes a huge difference in what kind of results you should expect. Of course, the AI should be aware that it doesn't have access to web / news search, and it should tell you as much rather than hallucinating fake links. If access to web search was turned on, and it still didn't properly search the web for you, that's a problem as well.
But even if we concede that to be true, it doesn’t change the fact that LLMs are misrepresenting the text they’ve been given half the time. Which means the information is degraded further. Which is worse.
I guess I don’t exactly understand the point you’re trying to make.
One way to successfully use LLMs is to do the initial research legwork. Run the 40 Google searches and follow links. Evaluate sources according to some criteria. Summarize. And then give the human a list of links to follow.
You quickly learn to see patterns. Sonnet will happily give a genuinely useful rule of thumb, phrasing it like it's widely accepted. But the source will turn out to be "one guy on a forum."
There are other tricks that work well. Have the LLM write an initial overview with sources. Tell it strictly limit itself to information in the sources, etc. Then hand the report off to a fresh LLM and tell it to carefully check each citation in the report, removing unsourced information. Then have the human review the output, following links.
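A rough sketch of that last trick in code, where `call_llm` is a hypothetical stand-in for whichever model API or CLI you actually use, and the prompts are only illustrative:

```python
# Rough sketch of the two-pass pattern described above. `call_llm` is a
# hypothetical stand-in for your model of choice; the prompts are not a recipe.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to whatever model you use")

def number_sources(sources: list[str]) -> str:
    return "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(sources))

def draft_report(question: str, sources: list[str]) -> str:
    # Pass 1: write an overview, strictly limited to the provided sources.
    return call_llm(
        "Answer the question below using ONLY the numbered sources, citing them "
        "inline as [n]. If a claim is not supported by a source, leave it out.\n\n"
        f"Question: {question}\n\nSources:\n{number_sources(sources)}"
    )

def audit_report(report: str, sources: list[str]) -> str:
    # Pass 2: a fresh context that never saw the drafting conversation
    # checks each citation and strips anything unsupported.
    return call_llm(
        "Check every citation in the report against the sources. Remove any "
        "sentence that is not supported by the cited source.\n\n"
        f"Report:\n{report}\n\nSources:\n{number_sources(sources)}"
    )

# The last step is not code: a human follows the surviving links.
```

The important parts are the fresh context for the auditor and the human review at the end, not the exact prompt wording.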
None of this will get you guaranteed truth. But if you know what you're doing, it can often give you a better starting point than Wikipedia or anything on the first two pages of Google Search results. Accurate information is genuinely hard to get, and it always has been.
>...it finally told me that it generated « the most probable » urls for the topic in question based on the ones he knows exists.
smrq is asking why you would believe that explanation. The LLM doesn't necessarily know why it's doing what it's doing, so that could be another hallucination.
Your answer:
> ...I wanted to know if it was old link that broke or changed but no apparently
Leads me to believe that you misunderstood smrq's question.
If you are a news organization and you want a reliable summary for an article, you should write it! You have writers available and should use them. This isn't a case where "better-than-nothing" applies, because "nothing" isn't your other option.
If you are an individual who wants a quick summary of something, then you don't have readers and writers on call to do that for you, and chatgpt takes a few seconds of your time and pennies to do a mediocre job.
I don't know if I can agree with that. I think we make an error when we aggregate news in the way we do. We claim that "the right wing media" says something when a single outlet associated with the right says a thing, and vice versa. That's not how I enjoy reading the news. I have a couple of newspapers I like reading, and I follow the arguments they make. I don't agree with what they say half the time, but I enjoy their perspective. I get a sense of the "editorial personality" of the paper. When we aggregate the news, we don't get that sense, because there's no editorial. I think that makes the news poorer, and I think it makes people's views of what newspapers can be poorer.
The news shouldn't be a stream of happenings. The newspaper is best when it's a coherent day-to-day conversation. Like a pen-pal you don't respond to.
In the eyes of the evangelists, every major model seems to go from "This model is close to flawless at this task, you MUST try this TODAY" to "It's absolutely wild that anyone would ever consider using such a no-good, worthless model for this task" over the course of a year or so. The old model has to be re-framed for the new model to look more impressive.
When GPT-4 was released I was told it was basically a senior-level developer, now it's an obviously worthless model that you'd be a fool to use to write so much as a throwaway script.
=Why?= The PDF is something that can appeal to anyone who is simply striving to have slower, deeper conversations about AI and the news.
=Frustration= No matter where you land on AI, it seems to me most of us are tired of various framings and exaggerations in the news. Not the same ones, because we often disagree! We feel divided.
=The Toolkit= The European Broadcasting Union (EBU) and BBC have laid out their criteria in the report "News Integrity in AI Assistants Toolkit" [1]. IMO, it is the hidden gem of the whole article.
- Let me get the obvious flaws out of the way. (1) Yes, it is a PDF. (2) It is nothing like a software toolkit. (3) It uses the word taxonomy, which conjures brittle and arbitrary tree classification systems -- or worse, the unspeakable horror of ontology and the lurking apparently-unkillable hydra that is the Semantic Web.
- But there are advantages too. With a PDF, you can read it without ads or endless scrolling. This PDF is clear. It probably won't get you riled up in a useless way. It might even give you some ideas of what you can do to improve your own news consumption or make better products.
All in all, this is a PDF I would share with almost anyone (who reads English). I like that it is dry, detailed, and, yes a little boring.
[1]: https://www.bbc.co.uk/aboutthebbc/documents/news-integrity-i...
POSIWID
Similarly I've had PMs blindly copy/paste summaries into larger project notes and ultimately create tickets based on either a misunderstanding from the LLM or a straight-up hallucination. I've repeatedly had conversations where a PM asks "when do you think Xyz will be finished?" only for me to have to ask in response "where and when did we even discuss Xyz? I'm not even sure what Xyz means in this context, so clarification would help." Only to have them just decide to delete the ticket/bullet etc. once they realize they never bothered to sanity check what they were pasting.
What I’m saying is that there should be a disclaimer: hey, we’re testing these models for the average person, who has no idea about AI. People who actually know AI would never use them in this way.
A better idea: educate people. Add “Here’s the best way to use them btw…” to the report.
All I’m saying is, it’s a tool, and yes you can use it wrong. That’s not a crazy realization. It applies to every other tool.
We knew that the hallucination rate for GPT-4o was nuts. From the start. We also know that GPT-5 has a much lower hallucination rate. So there are no surprises here; I’m not saying anything groundbreaking, and neither are they.
In cases where a reporter is just summarising e.g. a court case, sure. Stock market news has been automated since the 2000s.
More broadly, AI assistants may sometimes directly reference a court case. But they often don't. And even if they only did that, it would cover a small fraction of the news; for much of it, the AI has to rely on reporters detailing the primary sources they're interfacing with.
Reporter error is somewhat orthogonal to AI assistants' accuracy.
They want to believe that statistical distribution of meaningless tokens is real cognition of machines and if not that, works flawlessly for most of the cases and if not flawlessly, is usable enough to be valued at trillions of dollars collectively.
IDK how you people go through that experience more than a handful of times before you get pissed off and stop using these tools. I've wasted so much time because of believable lies from these bots.
Sorry, not even lies, just bullshit. The model has no conception of truth so it can't even lie. Just outputs bullshit that happens to be true sometimes.
It feels accidental, but it's definitely amusing that the models themselves are aping this ethos.
The ways they fail are often surprising if your baseline is “these are thinking machines”. If your baseline is what I wrote above (say, because you read the “Attention Is All You Need” paper) none of it’s surprising.
However, that is currently not reflected in electoral politics or the media. The farthest left are currently the Greens, at best centre-left. On the right and far right there are Tories and Reform.
Agreed, it's generally quite accurate. I find for hectic meetings, it can get some things wrong... But the notes are generally still higher quality than human generated notes.
Is it perfect? No. Is it good enough? IMO absolutely.
Similar to many other things, the key is that you don't just blindly trust it. Have the LLM take notes and summarize, and then _proofread_ them, just as you would if you were writing them yourself...
I've seen a certain sensationalist news source write a story that went like this.
Site A: Bad thing is happening, cite: article on Site B
* follow the source *
Site B: Bad thing is happening, cite different article on Site A
* follow the source *
Site A: Bad thing is happening, no citation.
I fear that's the current state of a large news bubble that many people subscribe to. And when these sensationalist stories start circulating there's a natural human tendency to exaggerate.
I don't think AI has any sort of real good defense against this sort of thing. One level of citation is already hard enough; recognizing that it is ultimately citing the same source is even harder.
There was another example from the Kagi News stuff which exemplified this: a whole article written with 3 citations that were ultimately spawned from the same news briefing, published by different outlets.
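To make that loop concrete, a toy sketch (hypothetical article slugs, nobody's real pipeline) that just follows a citation chain and reports where it bottoms out or loops; the genuinely hard part, resolving a citation in prose to a canonical source, is exactly what the models struggle with:

```python
def trace_sourcing(start: str, cites: dict) -> list[str]:
    """Follow each article's citation until it loops or hits an unsourced claim."""
    chain, seen = [start], {start}
    current = start
    while True:
        nxt = cites.get(current)
        if nxt is None:
            chain.append("(no citation)")      # chain bottoms out at an unsourced claim
            break
        if nxt in seen:
            chain.append(f"{nxt} (circular)")  # chain loops back on itself
            break
        chain.append(nxt)
        seen.add(nxt)
        current = nxt
    return chain

# The Site A / Site B example from above, with made-up slugs.
cites = {
    "siteA/bad-thing": "siteB/bad-thing",    # A cites B
    "siteB/bad-thing": "siteA/bad-thing-2",  # B cites a different A article
    "siteA/bad-thing-2": None,               # which has no citation at all
}
print(" -> ".join(trace_sourcing("siteA/bad-thing", cites)))
# siteA/bad-thing -> siteB/bad-thing -> siteA/bad-thing-2 -> (no citation)
```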
I've even seen an example of a national political leader who fell for the same sort of sensationalization. One who should have known better. They repeated what was later found to be a lie by a well-known liar but added that "I've seen the photos in a classified debriefing". IDK that it was necessarily even malicious, I think people are just really bad at separating credible from uncredible information and that it ultimately blends together as one thing (certainly doesn't help with ancient politicians).
Probably the most impactful "easy A" class I had in college.
>people are convinced that language models, or specifically chat-based language models, are intelligent... But there isn’t any mechanism inherent in large language models (LLMs) that would seem to enable this...
and says it must be a con. But then how come they pass most of the exams designed to test humans better than humans do?
And there are mechanisms like transformers that may do something like human intelligence.
Still not enough as I find the LLM will not summarize all the relevant facts, sometimes leaving out the most salient ones. Maybe you'll get a summary of some facts, maybe the ones you explicitly ask for, but you'll be left wondering if the LLM is leaving out important information.
I have one example that I check periodically just to see if anybody else has noticed. I've been checking it for several years and it's still there; the SDI page claims that Brilliant Pebbles was designed to use "watermelon sized" tungsten projectiles. This is completely made up; whoever wrote it up was probably confusing "rods from god" proposals that commonly use tungsten and synthesizing that confusion with "pebbles". The sentence is cited but the sources don't back it up. It's been up like this for years. This error has been repeated on many websites now, all post-dating the change on wikipedia.
If you're reading this and are the sort to edit wikipedia.. Don't fix it. That would be cheating.
I was on my highschool's radio station, part of the broadcast media curriculum. It was awesome.
That early experience erased any esteem I had for mass media. (Much as I loved the actual work.)
We got to visit local stations, job shadow, produce content for public access cable, make commercials, etc. Meet and interview adults.
We also talked with former students, who managed to break into the industry.
Since it was a voc tech program, there was no mention of McLuhan, Chomsky, Postman, or any media criticism of any kind.
I learned that stuff much later. Yet somehow I was able to intuit the rotten core of our media hellscape.
My own mental model (condensed to a single phrase) is that LLMs are extremely convincing (on the surface) autocomplete. So far, this model has not disappointed me.
Do you have an in-depth understanding of how those "agentic powers" are implemented? If not, you should probably research it yourself. Understanding what's underneath the buzzwords will save you some disappointment in the future.
Result was just trash. It would do exactly as you say: condense the information, but there was no semblance of "summary". It would just choose random phrases or keywords from the release notes and string them together, but it had no meaning or clarity, it just seemed garbled.
And it's not for lack of trying; I tried to get a suitable result out of the AI well past the amount of time it would have taken me to summarize it myself.
The more I use these tools the more I feel their best use case is still advanced autocomplete.
Much of this is probably solvable, but it is very much not solved.
Is this not the editorial board and journalist? I'm not sure what the gripe is here.
It is not at all. Journalists are wrong all the time, but you still treat news like a record and not a sample. In fact, I'd put money that AI mischaracterizes events at a LOWER rate than journalists do: narratives shift over time, and journalists are more likely to succumb to this shift.
Maybe we complained with enough concrete examples of how absolute shit editors and summarizers are now.
Note that people who write academic papers are quite far from the median white-collar worker.
Imagine if this were the ethos regarding open source software projects. Imagine Microsoft saying 20 years ago, "Linux has this and that bug, but you're not allowed to go fix it because that detracts from our criticism of open source." (Actually, I wouldn't be surprised if Microsoft or similar detractors literally said this.)
Of course Wikipedia has wrong information. Most open source software projects, even the best, have buggy, shite code. But these things are better understood not as products, but as processes, and in many (but not all) contexts the product at any point in time has generally proven, in a broad sense, to outperform their cathedral alternatives. But the process breaks down when pervasive cynicism and nihilism reduce the number of well-intentioned people who positively engage and contribute, rather than complain from the sidelines. Then we land right back to square 0. And maybe you're too young to remember what the world was like at square 0, but it sucked in terms of knowledge accessibility, notwithstanding the small number of outstanding resources--but which were often inaccessible because of cost or other barriers.
Straw man. Everyone educated constantly argues over sourcing.
> I'd put money that AI mischaracterizes events at a LOWER rate than journalists do
Maybe it does. But an AI sourcing journalists is demonstrably worse. Source: TFA.
> narratives shift over time, and journalists are more likely to succumb to this shift
Lol, we’ve already forgotten about MechaHitler.
At the end of the day, a lot of people consume news to be entertained. They’re better served by AI. The risk is folks of consequence start doing that, at which point I suppose the system self resolves by making them, in the long run, of no consequence compared to those who own and control the AI.
These grifters simply were not attracted to these gigs in these quantities prior to AI, but now the market incentives have changed. Should we "blame" the technology for its abuse? I think AI is incredible, but market endorsement is different from intellectual admiration.
> statistical distribution of meaningless tokens
As an aside, note that the biggest argument for the possibility of machine consciousness is the depressing fact that so many humans are uncritical bullshit spreaders themselves.
I urge everyone to read Harry Frankfurt's short essay On Bullshit: https://www2.csudh.edu/ccauthen/576f12/frankfurt__harry_-_on...
> 45% of responses contained at least one meaningful error. Sourcing [...] is 31%, followed by accuracy 20%
And you can see the reason they think this is important on the second page just after the summary.
> More than 1 in 3 (35%) of UK adults instinctively agree the news source should be held responsible for errors in AI-generated news
So of course the BBC cares that Google's summary said the BBC cited Pornhub when talking about domestic abuse (when they didn't), because a large portion of people blame them for the fact that a significant amount of AI-generated crap is wrong.