AI assistants misrepresent news content 45% of the time

In my experience there is a big difference between good models and weak ones. Quick test with this long article I read recently: https://www.lawfaremedia.org/article/anna--lindsey-halligan-...

The command I ran was `curl -s https://r.jina.ai/https://www.lawfaremedia.org/article/anna-... | cb | ai -m gpt-5-mini summarize this article in one paragraph`. r.jina.ai pulls the text as markdown, and cb just wraps in a ``` code fence, and ai is my own LLM CLI https://github.com/david-crespo/llm-cli.

All of them seem pretty good to me, though at 6 cents the regular use of Sonnet for this purpose would be excessive. Note that reasoning was on the default setting in each case. I think that means the gpt-5 mini one did no reasoning but the other two did.

GPT-5 one paragraph: https://gist.github.com/david-crespo/f2df300ca519c336f9e1953...

GPT-5 three paragraphs: https://gist.github.com/david-crespo/d68f1afaeafdb68771f5103...

GPT-5 mini one paragraph: https://gist.github.com/david-crespo/32512515acc4832f47c3a90...

GPT-5 mini three paragraphs: https://gist.github.com/david-crespo/ed68f09cb70821cffccbf6c...

Sonnet 4.5 one paragraph: https://gist.github.com/david-crespo/e565a82d38699a5bdea4411...

Sonnet 4.5 three paragraphs: https://gist.github.com/david-crespo/2207d8efcc97d754b7d9bf4...