
440 points by pseudolus | 1 comment
Havoc No.45063050
Not sure what these guys are studying, but I can tell you that in the real world there's essentially zero AI rollout in the accounting world for anything serious.

We've got access to some fancy enterprise copilot version, deep research, MS office integration and all that jazz. I use it diligently every day...to make me a summary of today's global news.

When I try to apply it to actual accounting work, it hallucinates left, right & center on stuff that can't be wrong. Millions and millions off. That's how you get the taxman to kick down your door. Even a simple "are these two numbers the same" check produces false positives so often that it's impossible to trust. So now I've got a review tool whose output I can't trust? It's like a programming language where the equality (==) operator has a built-in 20% random number generator and you're supposed to write mission-critical code with it.
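To make that analogy concrete, here's a toy Python sketch (mine, not the commenter's) of what an equality check with a 20% failure rate would mean for reconciliation work:

    import random

    def flaky_eq(a, b, error_rate=0.2):
        """Equality check that randomly lies ~20% of the time,
        per the analogy above."""
        if random.random() < error_rate:
            return not (a == b)  # silently return the wrong answer
        return a == b

    # Reconciling a ledger with this is hopeless: over 40 line items,
    # the chance that every comparison is correct is 0.8**40.
    print(0.8 ** 40)  # ~0.000133, i.e. about 0.013%

The point being: a tool whose primitive comparisons are probabilistic compounds its errors across every line item it touches.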

coffeefirst No.45064413
I keep trying to get it to review my personal credit card statements. I have my own budget-tracking app that I made, and sometimes there are discrepancies. Resolving these by hand is annoying, and an LLM should be able to do it: scrape the PDF, compare the records to mine, find the delta.
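For contrast, the deterministic version of that comparison step is only a few lines. A minimal sketch (the CSV layout and field names here are my assumptions, not the commenter's actual app):

    import csv
    from collections import Counter

    def load(path):
        """Load (date, amount, merchant) rows from a CSV export."""
        with open(path, newline="") as f:
            return Counter((r["date"], r["amount"], r["merchant"])
                           for r in csv.DictReader(f))

    statement = load("statement.csv")   # parsed from the card statement
    ledger = load("my_records.csv")     # my own budget app's records

    # The delta: rows on the statement I never recorded, and vice versa.
    print("On statement only:", list(statement - ledger))
    print("In my records only:", list(ledger - statement))

The only genuinely hard part is scraping the PDF into rows; once you have structured records, the diff itself needs no model at all.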

I've tried multiple models over the course of six months. Yesterday one told me I'd made a brilliant observation, but none has managed to pin down a single real anomaly. Once it told me the charges were from Starbucks when I hadn't been to a Starbucks; it's just that Starbucks is a probable output when analyzing credit card statements.

And I'm only dealing with a list of 40 records that I can check by hand, with zero consequences if I get it wrong beyond my personal budgeting being off by 1%.

I can't imagine trusting any business that leans on this for jobs it isn't suited to.

phkahler No.45065833
>> I keep trying to get it to review my personal credit card statements. I have my own budget-tracking app that I made, and sometimes there are discrepancies. Resolving these by hand is annoying, and an LLM should be able to do it: scrape the PDF, compare the records to mine, find the delta.

This is a perfect example of what people don't understand (or on HN keep forgetting). LLMs do NOT follow instructions; they predict the next word in the text and spit it out. The process is somewhat random, and it certainly does not include an interpreter (executive function?) to execute instructions, even natural-language ones.
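A toy sketch of that sampling step (an illustration of the idea, not any particular model's internals) shows where the randomness enters:

    import random

    # Pretend next-token distribution after "the two totals are":
    probs = {"equal": 0.55, "different": 0.25, "reconciled": 0.20}

    def next_token(probs):
        """Sample the next token from the model's distribution.
        This draw, not any rule-following, is the whole 'decision'."""
        return random.choices(list(probs), weights=probs.values())[0]

    # Run it a few times: sometimes "equal", sometimes not,
    # regardless of what the actual numbers are.
    print([next_token(probs) for _ in range(5)])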

seanmcdirmid No.45079443
There are models that are tuned to follow instructions, and it kind of works, in a non-deterministic way: as if you had a very unreliable junior who could sort of follow instructions but didn't have great attention to detail yet.