
337 points | throw0101c | 3 comments
oytis No.44609364
I just hope when (if) the hype is over, we can repurpose the capacities for something useful (e.g. drug discovery etc.)
alphazard No.44609712
The rest of the world has not caught up to current LLM capabilities. If it all stopped tomorrow and we couldn't build anything more intelligent than what we have now, there would still be years of work automating away toil across various industries.
1. sterlind No.44609887
my experience using LLM-powered tools (e.g. copilot in agent mode) has been underwhelming. like, shockingly so. things like cd-ing to the wrong dir where a script is located and getting lost, disregarding my instructions to run ./tests.ps1 and running `dotnet test` instead, writing syntactically incorrect scripts and failing to correct them, and getting overwhelmed by verbose logs. sometimes it even fails to understand the semantic meaning of my prompts.

whereas my experience describing my problem and actually asking the AI is much, much smoother.

I'm not convinced the "LLM+scaffolding" paradigm will work all that well. output quality degrades with context length, and even the models with huge context windows don't seem to use them all that effectively. RAG searches often give lackluster results, and the models fundamentally seem to do poorly at using commands to accomplish tasks.
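the retrieval step of that scaffolding can be sketched in a toy form (bag-of-words cosine similarity standing in for learned embeddings; all chunk text here is illustrative). the point is that the top-scoring chunk is whatever shares surface vocabulary with the query, which is one reason results can be lackluster:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    # Rank chunks by lexical overlap with the query; a real RAG
    # pipeline would use embedding vectors instead, but the shape
    # of the retrieval step is the same.
    q = Counter(query.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: cosine(q, Counter(c.lower().split())),
        reverse=True,
    )
    return scored[:k]

chunks = [
    "run ./tests.ps1 from the scripts directory",
    "dotnet test runs the unit test suite",
    "logging configuration for verbose output",
]
print(retrieve("how do I run the tests", chunks))
```

retrieval here is driven entirely by shared tokens like "run" and "the"; a query phrased differently from the relevant chunk simply misses it.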

I think fundamental model advances are needed to make most things more than superficially automatable: better planning and goal-directed behavior, a more organic connection to RAG context, automatic gym synthesis, and RL-based fine-tuning that holds up to distribution shift.

I think that will come, but I think if LLMs plateau here they won't have much more impact than Google Search did in the '90s.

2. break_the_bank No.44610156
I'm curious which model you were using when you ran into the cd-ing bug?

I'd give building with Sonnet 4 a fair shot. It's really good; not accurate all the time, but pretty good.

3. fragmede No.44612609
> won't have much more impact than Google Search did in the '90s.

Given that Google was founded in '98 and is now one of the biggest tech companies in the world, I'm not sure what you mean by that.