423 points sohkamyung | 10 comments
1. Narciss ◴[] No.45670278[source]
> All participating organizations then generated responses to each question from each of the four AI assistants. This time, we used the free/consumer versions of ChatGPT, Copilot, Perplexity and Gemini. Free versions were chosen to replicate the default (and likely most common) experience for users. Responses were generated in late May and early June 2025.

First of all, none of the SOTA models we're currently using were released in May and early June. Gemini 2.5 came out on June 17, GPT 5 & Claude Opus 4.1 at the beginning of August.

On top of that, to use free models for anything like this is absolutely wild. I use the absolute best models, and the research versions of this whenever I do research. Anything less is inviting disaster.

You have to use the right tools for the right job, and any report that is more than a month old is useless in the AI world at this point in time, beyond a snapshot of how things 'used to be'.

replies(5): >>45670334 #>>45670358 #>>45670859 #>>45670920 #>>45672440 #
2. Signez ◴[] No.45670334[source]
I think you are missing the point: it's mainly to highlight that the models that most people use, i.e. free versions with default settings, output a large number of factual errors, even when they are asked to base their answers on specific sources of information (as explained in their methodology document).
replies(1): >>45672834 #
3. biophysboy ◴[] No.45670358[source]
If they used a paid version, their study would not represent how most people use AI (with the free version)
replies(1): >>45672817 #
4. layer8 ◴[] No.45670859[source]
> to use free models for anything like this is absolutely wild

It would be wild if they’d used anything else, because the free models are what most people use, and the concern is about how AI influences the general population.

5. filoeleven ◴[] No.45670920[source]
> On top of that, to use free models for anything like this is absolutely wild. I use the absolute best models, and the research versions of this whenever I do research. Anything less is inviting disaster.

"I contend we are both atheists, I just believe in one fewer god than you do. When you understand why you dismiss all the other possible gods, you will understand why I dismiss yours." - Stephen F Roberts

replies(1): >>45672861 #
6. dns_snek ◴[] No.45672440[source]
Ah, the "you're using the wrong model" fallacy (is there a name for this?)

In the eyes of the evangelists, every major model seems to go from "This model is close to flawless at this task, you MUST try this TODAY" to "It's absolutely wild that anyone would ever consider using such a no-good, worthless model for this task" over the course of a year or so. The old model has to be re-framed for the new model to look more impressive.

When GPT-4 was released I was told it was basically a senior-level developer, now it's an obviously worthless model that you'd be a fool to use to write so much as a throwaway script.

replies(1): >>45672801 #
7. Narciss ◴[] No.45672801[source]
Not an evangelist for AI at all, I just love it as a tool for my creativity, research and coding.

What I’m saying is that there should be a disclaimer: hey, we’re testing these models for the average person, who has no idea about AI. People who actually know AI would never use them in this way.

A better idea: educate people. Add “Here’s the best way to use them btw…” to the report.

All I’m saying is, it’s a tool, and yes you can use it wrong. That’s not a crazy realization. It applies to every other tool.

We knew that the hallucination rate for gpt 4o was nuts. From the start. We also know that gpt-5 has a much lower hallucination rate. So there are no surprises here, I’m not saying anything groundbreaking, and neither are they.

8. Narciss ◴[] No.45672817[source]
But they’re using a free version that’s not even out there anymore. This is my problem - it came out already dated.
9. Narciss ◴[] No.45672834[source]
Is it true of the latest free models? Just saying that the report started already dated.
10. Narciss ◴[] No.45672861[source]
It ain’t a God, it’s a tool.

One knife does not cut potatoes. Doesn’t mean that all knives don’t cut potatoes. Use the right tool for the job.

Though I do love a well placed quote