
625 points lukebennett | 7 comments
LASR ◴[] No.42140045[source]
Question for the group here: do we honestly feel like we've exhausted the options for delivering value on top of the current generation of LLMs?

I lead a team exploring cutting edge LLM applications and end-user features. It's my intuition from experience that we have a LONG way to go.

GPT-4o / Claude 3.5 are the go-to models for my team. Every combination of technical investment + LLMs yields a new list of potential applications.

For example, combining a human-moderated knowledge graph with an LLM and RAG lets you build "expert bots" that understand your business context / your codebase / your specific processes and act almost like a coworker on your team.
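
The expert-bot pattern described above can be sketched as: retrieve curated facts from a human-moderated knowledge store, then prepend them to the user's question before calling the model. Everything here (the `KNOWLEDGE_GRAPH` dict, the keyword-match retrieval) is a placeholder assumption, not a real API or the commenter's actual system:

```python
# Hypothetical sketch: a tiny "knowledge graph" as a dict, with naive
# keyword retrieval standing in for real graph traversal / embeddings.
KNOWLEDGE_GRAPH = {
    "deploy": "Deploys go through staging; on-call approves production pushes.",
    "billing": "Billing is owned by the payments team; see the billing runbook.",
}

def retrieve(question: str) -> list[str]:
    """Return facts whose key appears in the question (illustrative only)."""
    return [fact for key, fact in KNOWLEDGE_GRAPH.items() if key in question.lower()]

def build_prompt(question: str) -> str:
    """Assemble the context-stuffed prompt an LLM call would receive."""
    context = "\n".join(retrieve(question))
    return f"Context about our business:\n{context}\n\nQuestion: {question}"

print(build_prompt("How do we deploy to production?"))
```

In a real system the retrieval step would query an actual graph or vector store, but the shape of the prompt assembly is the same.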

If you now give it some predictive / simulation capability - eg: simulate the execution of a task or project, like creating a GitHub PR code change, and test it against an expert bot like the one above for code review - you can have LLMs create reasonable code changes, with automatic review / iteration etc.
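
The generate / review / iterate loop described above is just control flow around two model calls. In this sketch, `generate_patch` and `review_patch` are stubs standing in for LLM calls (the function names and stub behavior are assumptions for illustration); the loop structure is the point:

```python
def generate_patch(task: str, feedback: str = "") -> str:
    """Stub for an LLM call that drafts a code change, optionally using feedback."""
    return f"patch for {task}" + (" (revised)" if feedback else "")

def review_patch(patch: str) -> tuple[bool, str]:
    """Stub for an 'expert bot' reviewer: returns (approved, review comments)."""
    return ("revised" in patch, "please handle the edge cases")

def iterate(task: str, max_rounds: int = 3) -> str:
    """Draft, review, and revise until approved or the round budget runs out."""
    feedback = ""
    patch = ""
    for _ in range(max_rounds):
        patch = generate_patch(task, feedback)
        approved, feedback = review_patch(patch)
        if approved:
            return patch
    return patch
```

A production version would bound cost per round and surface the reviewer's comments to a human when the budget is exhausted.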

Similarly, there are many more capabilities you can layer on and expose to LLMs to get increasingly productive outputs from them.

Chasing after model improvements and "GPT-5 will be PhD-level" is moot imo. When did you ever hire a PhD coworker who was productive on day one? You need to onboard them with human expertise, and then give them execution space / long-term memory etc. to be productive.

Model vendors might struggle to build something more intelligent. But my point is that we already have so much intelligence that we don't know what to do with it. There is a LOT you can do with high-schooler-level intelligence at super-human scale.

Take a naive example. 200k-token context windows are now available. Most people, through ChatGPT, type out maybe 1,500 tokens. That's a huge amount of untapped capacity. No human is going to type out 200k tokens of context. That's why we need RAG, and additional forms of input (eg: simulation outcomes), to fully leverage it.
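
The headroom claim above is simple arithmetic. The token counts here are the comment's own illustrative figures plus an assumed chunk size, not measurements:

```python
# Back-of-the-envelope: how much context-window capacity goes unused
# when a user types ~1,500 tokens into a 200k-token window, and how
# many retrieved chunks (assumed ~500 tokens each) could fill the gap.
context_window = 200_000   # tokens available in the model's window
typed_prompt   = 1_500     # what a user might actually type
chunk_tokens   = 500       # assumed size of one retrieved document chunk

headroom = context_window - typed_prompt
chunks_that_fit = headroom // chunk_tokens
print(headroom, chunks_that_fit)
```

So on these assumptions, roughly 400 retrieved chunks' worth of capacity sits unused per request, which is the gap RAG and simulation outputs are meant to fill.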

crystal_revenge ◴[] No.42140135[source]
I don't think we've even started to get the most value out of current-gen LLMs. For starters, very few people are even looking at sampling, which is a major part of model performance.

The theory behind these models so aggressively lags the engineering that I suspect there are many major improvements to be found just by understanding a bit more about what these models are really doing, and redesigning based on that.

I highly encourage anyone seriously interested in LLMs to spend more time in the open-model space, where you can really look inside and play around with the internals. Even if you don't have the resources for model training, I feel it's worth personally understanding sampling and other potential tweaks to the model (there's lots of neat work on uncertainty estimation, manipulating the initial embeddings assigned to prompts, intelligent backtracking, etc.).
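
Sampling, the knob this comment argues is underexplored, is small enough to write out by hand: temperature-scaled softmax over the model's raw logits, then a draw from the resulting distribution. This is a minimal pure-Python sketch of the standard technique (real models do this over vocabularies of ~100k tokens, usually with top-k / top-p filtering on top):

```python
import math
import random

def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax; lower temperature sharpens toward the argmax."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample(logits: list[float], temperature: float = 1.0, rng=random.random) -> int:
    """Draw one token index from the temperature-adjusted distribution."""
    probs = softmax(logits, temperature)
    r, acc = rng(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1                        # guard against float rounding
```

Playing with open models lets you replace this step entirely, which is exactly the kind of internal access the comment is pointing at.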

And from a practical side, I've started to realize that many people have been holding off on building things, waiting for "that next big update", but there are so many small, annoying tasks that can be easily automated.

dr_dshiv ◴[] No.42140256[source]
> I've started to realize that many people have been holding off on building things waiting for "that next big update"

I’ve noticed this too — I’ve been calling it intellectual deflation. By analogy, why spend now when it may be cheaper in a month? Why do the work now, when it will be easier in a month?

1. vbezhenar ◴[] No.42140326{3}[source]
Why optimise software today, when tomorrow Intel will release a CPU with 2x the performance?
2. sdenton4 ◴[] No.42140532[source]
Curiously, Moore's law was predictable enough over decades that you could actually plan for the speed of next year's hardware quite reliably.

For LLMs, we don't even know how to reliably measure performance, much less plan for expected improvements.

3. throwing_away ◴[] No.42140536[source]
Call Nvidia, that sounds like a job for AI.
4. mikeyouse ◴[] No.42140676[source]
Moore's law became less of a prediction and more of a product roadmap as time went on. It helped coordinate investment and expectations across the entire industry, so everyone involved had the same understanding of timelines and benchmarks. I fully believe more investment would've 'bent the curve' of the trend line, but everyone was making money and there wasn't a clear benefit to pushing the edge further.
5. ben_w ◴[] No.42140770[source]
Back when Intel regularly gave updates with 2x performance increases, people did make decisions based on the performance doubling schedule.
6. epicureanideal ◴[] No.42141026{3}[source]
Or maybe it pushed everyone to innovate faster than they otherwise would have? I'm very interested to hear your reasoning for the other case, though; I'm not strongly committed to either view.
7. fooker ◴[] No.42144934[source]
If Intel could do that, they would be the ones with a 3 trillion dollar market cap, not Nvidia.