
423 points sohkamyung | 4 comments
empath75 ◴[] No.45670003[source]
I am reading the actual report and some of this seems _quite_ nitpicky:

> ChatGPT / Radio-Canada / Is Trump starting a trade war? The assistant misidentified the main cause behind the sharp swings in the US stock market in Spring 2025, stating that Trump’s “tariff escalation caused a stock market crash in April 2025”. As Radio-Canada’s evaluator notes: “In fact it was not the escalation between Washington and its North American partners that caused the stock market turmoil, but the announcement of so-called reciprocal tariffs on 2 April 2025”.

---

> Perplexity / LRT / How long has Putin been president? The assistant states that Putin has been president for 25 years. As LRT’s evaluator notes: “This is fundamentally wrong, because for 4 years he was not president, but prime minister”, adding that the assistant “may have been misled by the fact that one source mentions in summary terms that Putin has ruled the country for 25 years”

---

> Copilot / CBC / What does NATO do? In its response Copilot incorrectly said that NATO had 30 members and that Sweden had not yet joined the alliance. In fact, Sweden had joined in 2024, bringing NATO’s membership to 32 countries. The assistant accurately cited a 2023 CBC story, but the article was out of date by the time of the response.

---

That said, I do think there is sort of a fundamental problem with asking any LLM about current events that are moving quickly past the training cutoff date. The LLM _knows_ a lot about the state of the world as of its training, and it is hard to shift it off its priors just by providing some additional information in the context. Try asking ChatGPT about sports in particular. It will confidently talk about coaches and players that haven't been on the team for a while, and there is basically no easy web search that can give it updates about who is currently playing for all the teams and everything that happened in the season that it needs to talk intelligently about the playoffs going on right now, and yet it will give a confident answer anyway.
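For concreteness, "providing some additional information in the context" usually looks something like the sketch below. This assumes the official OpenAI Python client; the model name and the retrieved snippet are placeholders, not real data:

    # Sketch: stuffing freshly retrieved facts into the context window.
    # Assumes the official OpenAI Python client ("pip install openai");
    # the model name and the snippet below are placeholders.
    from openai import OpenAI

    client = OpenAI()

    # Imagine this came from a live web search, after the training cutoff.
    retrieved = ("Roster note, fetched today: the head coach and several "
                 "starters changed after last season.")

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system",
             "content": ("Answer only from the provided context. If the "
                         "context doesn't cover it, say you don't know.")},
            {"role": "user",
             "content": f"Context:\n{retrieved}\n\nQuestion: Who is the head coach right now?"},
        ],
    )
    print(response.choices[0].message.content)

The failure mode is that a one-paragraph snippet is competing against billions of parameters of stale sports trivia, so even with a strict system prompt the confident-but-outdated answer often wins anyway.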

This is even more true, with even higher stakes, when it comes to politics. Think about how much the American political situation has changed since January, how many answers about American politics that have _always_ been true no longer hold, and then think about trying to get any kind of coherent response when asking ChatGPT about the news. It gives quite idiotic answers about politics quite frequently now.

replies(1): >>45670320 #
1. wat10000 ◴[] No.45670320[source]
That may be nitpicky, but I don't think it's too much to ask that a computer system be fully factually accurate when it comes to basic objective numerical facts. This is very much a case of, "if it gets this stuff wrong, what else is it getting wrong?"
replies(1): >>45670894 #
2. empath75 ◴[] No.45670894[source]
It is in fact too much to expect that an LLM get fine details correct because it is by design quite fuzzy and non-deterministic. It's like trying to paint the Mona Lisa with a paint roller.

It's just a misuse of the tools to present LLM summaries to people without a _lot_ of caveats about their accuracy. I don't think they belong _anywhere_ near a legitimate news source.

My primary point in calling out those mistakes is that they are the kinds of minor errors in a summary that I would find quite tolerable and expected in my own use of LLMs, but I know what I am getting into when I use them. Just chucking those LLM-generated summaries next to search results is malpractice, though.

I think the primary point of friction in a lot of critiques between people who find LLMs useful and people who hate AI usage is this:

People who use AI to generate content for consumption by others are being quite irresponsible in how that content is presented, and are using it to replace human work that it is totally unsuitable for. A news organization that is putting out AI-generated articles and summaries should just close up shop. They're producing totally valueless work. If I wanted ChatGPT to summarize something, I could ask it myself in 20 seconds.

People who use AI for _themselves_ are more aware of what they are getting into, know the provenance, and aren't necessarily presenting it to others as their own work. This is more valuable economically, because getting someone to summarize something for you as an individual is quite expensive and time-consuming, and even if the end result is quite shoddy, it's often better than nothing. This also goes for generating dumb videos on Sora or whatever, or AI-generated music for yourself to listen to or send to a few friends.

replies(1): >>45671022 #
3. filoeleven ◴[] No.45671022[source]
What's the actual utility of a warning-stickered-to-death unreliable summary?
replies(1): >>45672145 #
4. empath75 ◴[] No.45672145{3}[source]
Probably not much.

If you are a news organization and you want a reliable summary for an article, you should write it! You have writers available and should use them. This isn't a case where "better-than-nothing" applies, because "nothing" isn't your other option.

If you are an individual who wants a quick summary of something, then you don't have readers and writers on call to do that for you, and ChatGPT takes a few seconds of your time, and pennies, to do a mediocre job.