440 points | pseudolus | 7 comments
Havoc | No.45063050
Not sure what these guys are studying, but I can tell you that in the real world there's essentially zero AI rollout in the accounting world for anything serious.

We've got access to some fancy enterprise Copilot version, deep research, MS Office integration and all that jazz. I use it diligently every day... to make me a summary of today's global news.

When I try to apply it to actual accounting work, it hallucinates left, right & center on stuff that can't be wrong. Millions and millions off. That's how you get the taxman to kick down your door. Even a simple "are these two numbers the same" check produces false positives so often that it's impossible to trust. So now I've got a review tool whose output I can't trust? It's like a programming language where the equality operator (==) has a built-in 20% random number generator and you're supposed to write mission-critical code with it.
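
To make the analogy concrete, a toy sketch (the 20% is my rhetorical figure from above, not a measured error rate, and flaky_eq is made up):

    import random

    def flaky_eq(a, b, error_rate=0.2):
        # Correct comparison most of the time; the rest of the time
        # it's a coin flip, which is roughly what a review tool you
        # can't trust gives you.
        if random.random() < error_rate:
            return random.choice([True, False])
        return a == b

    # Even when the books actually balance, some checks will
    # randomly "fail" (and some real mismatches will pass).
    ledger_a = [1_000_000.00, 2_500_000.00, 42.17]
    ledger_b = [1_000_000.00, 2_500_000.00, 42.17]
    print([flaky_eq(x, y) for x, y in zip(ledger_a, ledger_b)])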

replies(14): >>45063417 #>>45063575 #>>45063964 #>>45064042 #>>45064413 #>>45064732 #>>45065017 #>>45065089 #>>45065569 #>>45065576 #>>45068813 #>>45069627 #>>45076092 #>>45093899 #
1. coffeefirst | No.45064413
I keep trying to get it to review my personal credit card statements. I have my own budget-tracking app that I made, and sometimes there are discrepancies. Resolving them by hand is annoying, and an LLM should be able to do it: scrape the PDF, compare the records to mine, find the delta.
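
For what it's worth, the deterministic version of the compare step is tiny. A sketch assuming both sides have already been exported to CSV with date/description/amount columns (the file and column names are hypothetical, and getting clean rows out of the PDF is the genuinely fiddly part):

    import csv
    from collections import Counter

    def load(path):
        # Hypothetical layout: a CSV with date, description, amount columns.
        with open(path, newline="") as f:
            return Counter(
                (row["date"], row["description"], row["amount"])
                for row in csv.DictReader(f)
            )

    statement = load("statement.csv")    # extracted from the card PDF
    my_records = load("my_budget.csv")   # from my own tracking app

    # Multiset difference in both directions: charges the bank has
    # that I don't, and entries I have that the bank doesn't.
    print("on statement only:", statement - my_records)
    print("in my records only:", my_records - statement)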

I've tried multiple models over the course of six months. Yesterday it told me I'd made a brilliant observation, but it hasn't managed to pin down a single real anomaly. Once it told me the charges were from Starbucks when I hadn't been to a Starbucks; it's just that Starbucks is a probable output when analyzing credit card statements.

And I'm only dealing with a list of 40 records that I can check by hand, with zero consequences if I get it wrong beyond my personal budgeting being off by 1%.

I can't imagine trusting any business that leans on this for jobs it's so clearly unsuited to.

replies(1): >>45065833 #
2. phkahler | No.45065833
>> I keep trying to get it to review my personal credit card statements. I have my own budget-tracking app that I made, and sometimes there are discrepancies. Resolving them by hand is annoying, and an LLM should be able to do it: scrape the PDF, compare the records to mine, find the delta.

This is a perfect example of what people don't understand (or, on HN, keep forgetting). LLMs do NOT follow instructions; they predict the next word in a text and spit it out. The process is somewhat random, and it certainly does not include an interpreter (executive function?) that executes instructions, even natural-language instructions.
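
A toy sketch of what "predict the next word" means mechanically: softmax over scores, then a weighted dice roll. The vocabulary and scores below are made up; the point is that nothing in the loop parses or executes an instruction.

    import math, random

    def sample_next(logits, temperature=1.0):
        # Softmax over the scores, then draw one token at random.
        # No interpreter anywhere, just weighted dice.
        weights = [math.exp(v / temperature) for v in logits.values()]
        return random.choices(list(logits), weights=weights)[0]

    # Made-up scores for the token after "the two numbers are":
    logits = {"equal": 2.1, "different": 1.9, "Starbucks": 0.3}
    print(sample_next(logits))  # usually "equal", but not always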

replies(2): >>45065941 #>>45079443 #
3. coffeefirst | No.45065941
Agreed. I keep trying stuff because I feel like I’m missing whatever magic people are talking about.

So far, I’ve found nothing of value besides natural language search.

replies(1): >>45071094 #
4. balder1991 | No.45071094
Yeah, if you go to a subreddit like ClaudeAI, you convince yourself there's something you don't know, because they keep telling people it's their prompts' fault if the LLM isn't turning them into billionaires.

But then you read more of the comments and you see it's really different interpretations from different people. Some "prompt maximalists" believe that perfect prompting is the key to unlocking the model's full potential and that any failure is user error. They tend to be the most vocal, and they create the sense that there's a hidden secret or "magic formula" you're missing.

replies(1): >>45072088 #
5. Jensson | No.45072088
It's basically stone soup: people won't believe it can be done, but then you put a stone in water and boil it, and you tell people that if they aren't getting a nice soup they aren't doing it right; just put in all these other ingredients that aren't required but really help, and you'll get this awesome soup!

Then someone says that isn't stone soup, that they just did all the work without the stone! But that's just a stone hater; how can you not see this awesome soup made by the stone?

replies(1): >>45074390 #
6. canonistically | No.45074390
I think it's more like lottery winners giving "buy lottery tickets" as financial advice.

It's clear at this point that any meaning found in LLM output is projected there by the user. Some people, by virtue of several intertwined factors, can get some acceleration out of these tools, but most can't. It becomes like a football fan convinced that their rituals are essential to the team's victory.

Add in the generally very low understanding of machine learning (or even basic formal logic), and people go from a realistic emulation of conversation to magical thinking about having a mind in a box.

Either that or I'm taking crazy pills, because sometimes it feels like that.

7. seanmcdirmid | No.45079443
There are models that are tuned to follow instructions, and it kind of works, in a non-deterministic way: like having a very unreliable junior who can sort of follow instructions but doesn't have very good attention yet.