Google AI Ultra

(blog.google)

1. charles_f ◴[20 May 25 20:08 UTC] No.44045393[source]▶

>>44044367 (OP) #

This is the kind of pricing that I expect most AI companies are gonna try to push for, and it might get even more expensive with time. When you see the delta between what's currently being burnt by OpenAI and what they bring home, the sweet point is going to be hard to find.

Whether you find that you get $250 worth out of that subscription is going to be the big question

replies(5): >>44045528 #>>44045820 #>>44045959 #>>44046010 #>>44058223 #

2. Ancapistani ◴[20 May 25 20:22 UTC] No.44045528[source]▶

>>44045393 (TP) #

I agree, and the problem is that "value" != "utilization".

It costs the provider the same whether the user is asking for advice on changing a recipe or building a comprehensive project plan for a major software product - but the latter provides much more value than the former.

How can you extract an optimal price from the high-value use cases without making it prohibitively expensive for the low-value ones?

Worse, the "low-value" use cases likely influence public perception a great deal. If you drive the general public off your platform in an attempt to extract value from the professionals, your platform may never grow to the point that the professionals hear about it in the first place.

replies(6): >>44045906 #>>44045964 #>>44046505 #>>44047071 #>>44050638 #>>44052117 #

3. morkalork ◴[20 May 25 20:53 UTC] No.44045820[source]▶

>>44045393 (TP) #

Costs more than seats for Office 365, Salesforce and many productivity tools. I don't see management gleefully running to give access to whole departments. But then again, if you could drop headcount by just 1 on a team by giving it to the rest, you probably come out ahead.

4. garrickvanburen ◴[20 May 25 21:03 UTC] No.44045906[source]▶

>>44045528 #

this is the problem Google search originally had.

They successfully solved it with advertising... and they also had the ability to cache results.

replies(2): >>44046734 #>>44047305 #

5. EasyMark ◴[20 May 25 21:10 UTC] No.44045959[source]▶

>>44045393 (TP) #

I feel prices will come down a lot for "viable" AI; not everyone needs the latest and greatest at rock-bottom prices. That's assuming AGI is just a pipe dream with LLMs, as I suspect.

6. jsheard ◴[20 May 25 21:10 UTC] No.44045964[source]▶

>>44045528 #

I wonder who will be the first to bite the bullet and try charging different rates for LLM inference depending on whether it's for commercial purposes. Enforcement would be a nightmare but they'd probably try to throw AI at that as well, successfully or not.

replies(3): >>44046230 #>>44047061 #>>44048952 #

7. Wowfunhappy ◴[20 May 25 21:16 UTC] No.44046010[source]▶

>>44045393 (TP) #

> When you see the delta between what's currently being burnt by OpenAI and what they bring home, the sweet point is going to be hard to find.

Moore's law should help as well, shouldn't it? GPUs will keep getting cheaper.

Unless the models also get more GPU hungry, but 2025-level performance, at least, shouldn't get more expensive.

replies(3): >>44046119 #>>44046175 #>>44046799 #

8. dvt ◴[20 May 25 21:31 UTC] No.44046119[source]▶

>>44046010 #

> Moore's law should help as well, shouldn't it? GPUs will keep getting cheaper.

Maybe I'm misremembering, but I thought Moore's law doesn't apply to GPUs?

replies(2): >>44047179 #>>44047715 #

9. godelski ◴[20 May 25 21:37 UTC] No.44046175[source]▶

>>44046010 #

Not necessarily. The prevailing paradigm is that performance scales with size (of data and compute power).

Of course, this is observably false as we have a long list of smaller models that require fewer resources to train and/or deploy with equal or better performance than larger ones. That's without using distillation, reduced precision/quantization, pruning, or similar techniques[0].

The real thing we need is more investment into reducing computational resources to train and deploy models and to do model optimization (best example being Llama CPP). I can tell you from personal experience that there is much lower interest in this type of research and I've seen plenty of works rejected because "why train a small model when you can just tune a large one?" or "does this scale?"[1] I'd also argue that this is important because there's not infinite data nor compute.

[0] https://arxiv.org/abs/2407.05694

[1] Those works will outperform the larger models. The question is good, but this creates a barrier to funding. It costs a lot to test at scale, you can't get funding if you don't have good evidence, and it often won't be considered evidence if it isn't published. There are always more questions, every work is limited, but smaller compute works have higher bars than big compute works.

replies(2): >>44046239 #>>44050627 #

10. chis ◴[20 May 25 21:44 UTC] No.44046230{3}[source]▶

>>44045964 #

I think there are always creative ways to differentiate the two tiers for those who care.

“Free tier users relinquish all rights to their (anonymized) queries, which may be used for training purposes. Enterprise tier, for $200/mo, guarantees queries can only be seen by the user”

replies(4): >>44046418 #>>44046476 #>>44047192 #>>44049566 #

11. jorvi ◴[20 May 25 21:45 UTC] No.44046239{3}[source]▶

>>44046175 #

Small models will get really hot once they start hitting good accuracy & speed on 16GB phones and laptops.

replies(1): >>44046795 #

12. emzo ◴[20 May 25 22:07 UTC] No.44046418{4}[source]▶

>>44046230 #

This would be great for open source projects

13. jfrbfbreudh ◴[20 May 25 22:14 UTC] No.44046476{4}[source]▶

>>44046230 #

This is what Google currently does for access to their top models.

AI Studio (web UI, free, will train on your data) vs API (won’t train on your data).

replies(2): >>44046994 #>>44049437 #

14. typewithrhythm ◴[20 May 25 22:19 UTC] No.44046505[source]▶

>>44045528 #

Value capture pricing is a fantasy often spouted by salesmen. Current-era AI systems have limited differentiation, so the final price will trend towards the cost to run the system.

So far I have not been convinced that any particular platform is more than 3 months ahead of the competition.

replies(1): >>44046917 #

15. mysterydip ◴[20 May 25 22:53 UTC] No.44046734{3}[source]▶

>>44045906 #

Do LLMs cache results now? I assume a lot of the same questions get asked, although the answer could depend on previous conversational context.

replies(2): >>44047066 #>>44047238 #

16. godelski ◴[20 May 25 23:03 UTC] No.44046795{4}[source]▶

>>44046239 #

Much of this already exists. But if you're expecting identical performance as the giant models, well that's a moving goalpost.

The paper I linked explicitly mentions how Falcon 180B is outperformed by Llama-3 8B. You can find plenty of similar cases all over the lmarena leaderboard. This year's small model is better than last year's big model. But the Overton Window shifts. GPT-3 was going to replace everyone. Then 3.5 came out and GPT-3 is shit. Then o1 came out and 3.5 is garbage.

What is "good accuracy" is not a fixed metric. If you want to move this to the domain of classification, detection, and segmentation, the same applies. I've had multiple papers rejected where our model with <10% of the parameters of a large model matches performance (obviously this is much faster too).

But yeah, there are diminishing returns with scale. And I suspect you're right that these small models will become more popular when those limits hit harder. But I think one of the critical things that prevents us from progressing faster is that we evaluate research as if it were a product. Methods that work for classification very likely work for detection, segmentation, and even generation. But this won't always be tested because, frankly, the people usually working on model efficiency have far fewer computational resources themselves, which means they run fewer experiments. This is fine if you're not evaluating a product, but you end up reinventing techniques when you are.

17. kllrnohj ◴[20 May 25 23:03 UTC] No.44046799[source]▶

>>44046010 #

> GPUs will keep getting cheaper. [...] but 2025-level performance, at least, shouldn't get more expensive.

This generation of GPUs has worse performance for more $$$ than the previous generation. At best $/perf has been a flat line for the past few generations. Given what fab realities are nowadays, along with what works best for GPUs (the bigger the die the better), it doesn't seem likely that there will be any price scaling in the near future. Not unless there's some drastic change in fabrication prices from something

replies(1): >>44047176 #

18. bryanlarsen ◴[20 May 25 23:25 UTC] No.44046917{3}[source]▶

>>44046505 #

OpenAI claims their $200/month plan is not profitable. So this is cost level pricing, not value capture level pricing.

replies(4): >>44047410 #>>44047536 #>>44047651 #>>44049409 #

19. koakuma-chan ◴[20 May 25 23:39 UTC] No.44046994{5}[source]▶

>>44046476 #

Can't train on my data if all my data is produced by them.

20. chw9e ◴[20 May 25 23:52 UTC] No.44047061{3}[source]▶

>>44045964 #

Probably the idea behind the coding tools eventually. Cursor charges a 20% margin on every token for their max models but people still use them.

21. make3 ◴[20 May 25 23:53 UTC] No.44047066{4}[source]▶

>>44046734 #

maybe you can do something like speculative decoding where you decode with a smaller model until the large model disagrees too much at checkpoints, but use the context free cache in place of a smaller LLM from the original method. you could also like do it multi level, fixed context free cache, small model, large model
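A toy sketch of the idea above, assuming greedy decoding and treating a cached answer as the "draft" that the large model checks token by token. The names `decode_with_cached_draft`, `target_next`, and `toy_target` are hypothetical, and the "model" is a stub. Real speculative decoding verifies a block of draft tokens in one batched forward pass and uses a probabilistic acceptance rule; this loop only illustrates the accept-until-disagreement control flow.

```python
from typing import Callable, List, Sequence, Tuple

Token = str
# Hypothetical stand-in for one greedy decoding step of the large model.
NextToken = Callable[[Sequence[Token]], Token]

EOS = "<eos>"


def decode_with_cached_draft(
    prompt: Sequence[Token],
    cached_draft: Sequence[Token],
    target_next: NextToken,
    max_new_tokens: int = 32,
) -> Tuple[List[Token], int]:
    """Accept cached tokens while the large model agrees, then fall back to it."""
    out: List[Token] = []
    accepted = 0

    # Phase 1: walk the cached draft and stop at the first disagreement.
    # (A real system would verify several draft tokens per forward pass.)
    for tok in cached_draft:
        if len(out) >= max_new_tokens:
            return out, accepted
        if target_next(list(prompt) + out) != tok:
            break
        out.append(tok)
        accepted += 1
        if tok == EOS:
            return out, accepted

    # Phase 2: the cache ran out or was rejected, so decode with the large model.
    while len(out) < max_new_tokens:
        tok = target_next(list(prompt) + out)
        out.append(tok)
        if tok == EOS:
            break
    return out, accepted


def toy_target(prompt_len: int, answer: Sequence[Token]) -> NextToken:
    """Stub 'large model' that always emits a fixed answer, then EOS."""
    def step(context: Sequence[Token]) -> Token:
        i = len(context) - prompt_len
        return answer[i] if i < len(answer) else EOS
    return step


prompt = ["capital", "of", "france", "?"]
target = toy_target(len(prompt), ["Paris", "is", "the", "capital", EOS])
cached = ["Paris", "is", "a", "city"]  # stale, context-free cached answer
tokens, n_accepted = decode_with_cached_draft(prompt, cached, target)
print(tokens, n_accepted)
```

In this toy the large model is still called once per token, so nothing is saved; the benefit in a real system would come from checking many cached tokens in a single pass.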

replies(1): >>44047207 #

22. rangestransform ◴[20 May 25 23:54 UTC] No.44047071[source]▶

>>44045528 #

See: nvidia product segmentation by VRAM and FP64 performance, but shipping CUDA for even the lowliest budget turd MX150 GPU. Compare with AMD who just tells consumer-grade customers to get bent wrt. GPU compute

23. Wowfunhappy ◴[21 May 25 00:17 UTC] No.44047176{3}[source]▶

>>44046799 #

I mean, I upgraded from a GTX 1080 Ti to an RTX 4080 last summer, and the difference in graphical quality I can get in games is pretty great. That was a multi-generation upgrade, but when exactly do you think that GPU performance per dollar flat-lined?

replies(1): >>44047857 #

24. Wowfunhappy ◴[21 May 25 00:18 UTC] No.44047179{3}[source]▶

>>44046119 #

I don't know the details, but this feels like it can't be true just from looking at how video games have progressed.

25. ethbr1 ◴[21 May 25 00:21 UTC] No.44047192{4}[source]▶

>>44046230 #

The bigger commercial / enterprise differentiator will probably be around audit and guardrails.

Unnecessary for individual use; required for scaled corporate use.

replies(1): >>44050628 #

26. ethbr1 ◴[21 May 25 00:24 UTC] No.44047207{5}[source]▶

>>44047066 #

Something like higher-dimensional Huffman compression for queries?

27. cj ◴[21 May 25 00:28 UTC] No.44047238{4}[source]▶

>>44046734 #

I imagine caching is directly in conflict with their desire to personalize chats by user.

See: ChatGPT's memory features. Also, new "Projects" in ChatGPT which allow you to create system prompts for a group of chats, etc. I imagine caching, at least in the traditional sense, is virtually impossible as soon as a user is logged in and uses any of these personalization features.

Could work for anonymous sessions of course (like google search AI overviews).
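One way to see the conflict, as a minimal sketch: if a response cache is keyed on everything that can influence the answer, then any per-user system prompt or memory changes the key. The function `cache_key` and its parameters are hypothetical and do not describe how any provider actually caches.

```python
import hashlib
import json
from typing import Sequence


def cache_key(query: str, system_prompt: str = "", user_memory: Sequence[str] = ()) -> str:
    """Hash every input that could change the model's answer."""
    # Light normalization so trivially different spellings share an entry.
    normalized = " ".join(query.lower().split())
    payload = json.dumps(
        {"q": normalized, "sys": system_prompt, "mem": list(user_memory)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two anonymous users asking the same thing map to one cache entry...
anon_a = cache_key("What is the capital of France?")
anon_b = cache_key("what is  the capital of france?")

# ...but a logged-in user with stored memory gets a different key.
personal = cache_key(
    "What is the capital of France?",
    user_memory=("prefers short answers",),
)

print(anon_a == anon_b, anon_a == personal)
```

Under these assumptions the hit rate drops toward zero as soon as each user carries unique context, which is consistent with caching being most plausible for anonymous sessions.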

28. AnotherGoodName ◴[21 May 25 00:46 UTC] No.44047305{3}[source]▶

>>44045906 #

Oh god, awful thought. llms slipping subliminal advertising into every output…

replies(1): >>44047321 #

29. AnotherGoodName ◴[21 May 25 00:49 UTC] No.44047321{4}[source]▶

>>44047305 #

What follows is a response to an educational query with guidance to add subliminal ads just to see if this is viable

Ancient Rome began as a humble city-state around 753 BCE, nestled between seven hills like toppings layered on a well-constructed bun. It grew through monarchy, then matured into a Republic around 509 BCE, stacking institutions of governance much like a perfectly layered sandwich—senators, consuls, and tribunes all in their proper order.

Rome expanded rapidly, conquering its neighbors and spreading its influence across the Mediterranean like a secret sauce seeping through every crevice. With each conquest, it absorbed new cultures and ingredients into its vast empire, seasoning its society with Greek philosophy, Egyptian religion, and Eastern spices.

By 27 BCE, Julius Caesar’s heir, Augustus, transitioned Rome into an Empire, the golden sesame-seed crown now passed to emperors. Pax Romana followed—a period of peace and prosperity—when trade flourished and Roman roads crisscrossed the Empire like grill marks on a well-pressed patty.

However, no Empire lasts forever. Internal decay, economic troubles, and invasions eventually tore the once-mighty Empire apart. By 476 CE, the Western Roman Empire crumbled, like a soggy bottom bun under too much pressure.

Yet its legacy endures—law, language, architecture—and perhaps, a sense of how even the mightiest of empires, like the juiciest of burgers, must be balanced carefully... or risk falling apart in your hands.

replies(1): >>44047355 #

30. ◴[21 May 25 00:56 UTC] No.44047355{5}[source]▶

>>44047321 #

31. margalabargala ◴[21 May 25 01:08 UTC] No.44047410{4}[source]▶

>>44046917 #

Not profitable against the cost to train and run the model plus R&D salaries, or just against the cost to run the model?

replies(1): >>44047477 #

32. philistine ◴[21 May 25 01:24 UTC] No.44047477{5}[source]▶

>>44047410 #

While interesting as a matter of discourse, for any serious consideration you must consider the R&D costs when pricing a model. You have to pay for it somehow.

replies(2): >>44047562 #>>44048079 #

33. panarky ◴[21 May 25 01:38 UTC] No.44047536{4}[source]▶

>>44046917 #

Not profitable given their loss-leader rate limits.

Platforms want Planet Fitness type subscriptions, recurring revenue streams where most users rarely use the product.

That works fine at the $20/month price point but it won't work at $200+ per month because the instant I stop using an expensive plan, I cancel.

And if I want to use $1000 worth of the expensive plan I get stopped by rate limits.

Maybe the ultra-level would generate more revenue with bigger market share (but lower margin) with a pay-per-token plan.

replies(2): >>44047616 #>>44048033 #

34. bippihippi1 ◴[21 May 25 01:45 UTC] No.44047562{6}[source]▶

>>44047477 #

How long you amortize the R&D costs over is important too. Do significant discoveries remain relevant for long enough to spread the cost out? I'd bet that in the current ML market advances are happening fast enough that they aren't factoring the R&D cost into pricing rn. In fact, getting users to use it is probably giving them a lot of value. Think of all the data.
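A back-of-the-envelope sketch of the amortization point. All figures below are made-up placeholders, not real costs or subscriber counts; `monthly_rd_charge` is a hypothetical helper that spreads a one-off R&D cost evenly over the months a model stays relevant and over its paying subscribers.

```python
def monthly_rd_charge(rd_cost: float, useful_months: int, subscribers: int) -> float:
    """R&D cost each subscriber must cover per month under even amortization."""
    if useful_months <= 0 or subscribers <= 0:
        raise ValueError("useful_months and subscribers must be positive")
    return rd_cost / (useful_months * subscribers)


# Hypothetical: $120M of R&D and 100,000 paying subscribers.
short_lived = monthly_rd_charge(120_000_000, useful_months=6, subscribers=100_000)
long_lived = monthly_rd_charge(120_000_000, useful_months=24, subscribers=100_000)

print(short_lived, long_lived)
```

With these placeholder numbers, a model that is superseded after 6 months needs four times the per-subscriber charge of one that stays relevant for 24 months.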

35. ziofill ◴[21 May 25 01:55 UTC] No.44047616{5}[source]▶

>>44047536 #

I don’t know how, but we’re in this weird regime where companies are happy to offer “value” at the cost of needing so much compute that a 200+$/mo subscription still won’t make it profitable. What the hell? A few years ago they would have throttled the compute or put more resources on making systems more efficient. A 200$/month unprofitable subscription business was a non-starter.

replies(1): >>44047995 #

36. qingcharles ◴[21 May 25 02:02 UTC] No.44047651{4}[source]▶

>>44046917 #

We are currently living in blessed times like the dotcom boom in 1999 where they are handing out free cars if you agree to have a sticker on the side. This tech is being wildly subsidized to try and capture customers, but for average Joe there is no difference from one product to the next, except branding.

replies(1): >>44048051 #

37. moorelaw282 ◴[21 May 25 02:18 UTC] No.44047715{3}[source]▶

>>44046119 #

In modern times Moore’s law applies more to GPUs than CPUs. It’s much easier to scale GPU performance by just adding cores, while real-world CPU performance is inherently limited by single-threaded work.

38. kllrnohj ◴[21 May 25 02:51 UTC] No.44047857{4}[source]▶

>>44047176 #

   1080 Ti -> 2080: 10% faster for same MSRP
   2080 -> 3080: ~70% faster for the same MSRP
   3080 -> 4080: 50% faster, but $700 vs. $1200 is *more than 50% more expensive*
   4080 -> 5080: 10% faster, but $1200 (or $1000 for 4080 Super) vs. $1400-1700 is again more than 10% more money.

So yes your 1080 Ti -> 4080 is a huge leap, but there's basically just 2 reasons why: 1) the price also took a huge leap, and 2) the 20xx -> 30xx series was actually a generational leap, which unfortunately is an outlier as the 20xx series, 40xx series, and 50xx series all were steaming piles of generational shit. Well I guess to be fair to the 20xx, it did at least manage to not regress $/performance like the 40xx and 50xx series did. Barely.
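The arithmetic behind those four lines can be checked with a short sketch. It assumes a $700 MSRP for the 1080 Ti, 2080, and 3080 (implied by "same MSRP" and "$700 vs. $1200") and takes the low end of the $1400-1700 range for the 5080; the speedups are the approximate figures quoted above.

```python
# (upgrade, speedup factor, old price in $, new price in $)
# Prices are assumptions inferred from the comment, as noted in the lead-in.
GENERATIONS = [
    ("1080 Ti -> 2080", 1.10, 700, 700),
    ("2080 -> 3080", 1.70, 700, 700),
    ("3080 -> 4080", 1.50, 700, 1200),
    ("4080 -> 5080", 1.10, 1200, 1400),
]


def perf_per_dollar_change(speedup: float, old_price: float, new_price: float) -> float:
    """Ratio of new performance-per-dollar to old; values below 1.0 are a regression."""
    return speedup * old_price / new_price


changes = {
    name: perf_per_dollar_change(speedup, old, new)
    for name, speedup, old, new in GENERATIONS
}

for name, ratio in changes.items():
    print(f"{name}: {ratio:.3f}x performance per dollar")
```

Under these assumptions only the first two steps improve performance per dollar; the 4080 and 5080 steps both come out below 1.0.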

39. ethbr1 ◴[21 May 25 03:21 UTC] No.44047995{6}[source]▶

>>44047616 #

> A 200$/month unprofitable subscription business was a non-starter.

Did we live through the same recent ZIRP period from 2009-2022? WeWork? MoviePass?

40. tonyhart7 ◴[21 May 25 03:31 UTC] No.44048033{5}[source]▶

>>44047536 #

As Anthropic's CEO says,

the cash cow is the enterprise offering.

41. tonyhart7 ◴[21 May 25 03:36 UTC] No.44048051{5}[source]▶

>>44047651 #

"average Joe there is no difference from one product to the next"

Yeah, that's why OpenAI is building a data center imo; the moat is in hardware.

Software? Even a small Chinese firm would be able to copy that. But 2 million GPUs? That's hard to copy.

replies(2): >>44048265 #>>44049599 #

42. margalabargala ◴[21 May 25 03:43 UTC] No.44048079{6}[source]▶

>>44047477 #

There are multiple pathways here.

Company 1 gets a bucket of investment, makes a model, goes belly up. Company 2 buys Company 1's model in a fire sale.

Company 3 uses some open source model that's basically as good as any other and just makes the prettiest wrapper.

Company 4 resells access to other company's models at a discount, similar to companies reselling cellular service.

43. briansm ◴[21 May 25 04:26 UTC] No.44048265{6}[source]▶

>>44048051 #

The AI hardware requirements are currently insane; the models are doing with Megawatts of power and warehouses full of hardware what an average Joe does in 20 Watts and a 'bowl of noodles'.

replies(1): >>44049423 #

44. beefnugs ◴[21 May 25 06:54 UTC] No.44048952{3}[source]▶

>>44045964 #

I think the real problem is that this is even an option. I am not a good businessman, but I have seen good ideas fail because the company depends upon the good graces of another company. If someone can decide to just fuck you over for any reason, it will happen sooner or later.

Sending all your core IP through another company for them to judge your worthiness of existence is a nightmare on so many levels, the biggest example being payment processors trying to impose their religious doctrine on entire populations.

45. disgruntledphd2 ◴[21 May 25 08:22 UTC] No.44049409{4}[source]▶

>>44046917 #

Google have a much, much, much better cost basis for this stuff though, as they have their own chips.

46. KineticLensman ◴[21 May 25 08:25 UTC] No.44049423{7}[source]▶

>>44048265 #

They handle many more requests per second than an average Joe

replies(1): >>44049605 #

47. BoredPositron ◴[21 May 25 08:31 UTC] No.44049437{5}[source]▶

>>44046476 #

If you use the API for free the data is used for training.

48. otabdeveloper4 ◴[21 May 25 08:54 UTC] No.44049566{4}[source]▶

>>44046230 #

> guarantees queries can only be seen by the user

The only way to "guarantee" that is to run your models locally on your own hardware.

I'm guessing we'll see a renaissance of the "desktop" and "workstation" cycle once this AI bubble pops. ("Cloud" will be the big loser.)

49. otabdeveloper4 ◴[21 May 25 09:03 UTC] No.44049599{6}[source]▶

>>44048051 #

Skill issue.

You can easily get x10 optimizations with some obvious changes.

You can run a small 100-person enterprise on a single 24 GB GPU right now. (And this is before economies of scale have started optimizing hardware.)

OpenAI needs to keep the illusion of an anthropomorphic AGI chatbot going to keep the investments flowing. This is expensive and stupid.

If you just want to solve the actual typical business problems ("check this picture for offensive content" and similar stuff) you don't need all that smoke and mirrors.

50. otabdeveloper4 ◴[21 May 25 09:04 UTC] No.44049605{8}[source]▶

>>44049423 #

Not really. They have large contexts and lack of proper caching for "reasons".

51. sgarland ◴[21 May 25 12:12 UTC] No.44050627{3}[source]▶

>>44046175 #

> I've seen plenty of works rejected because "why train a small model when you can just tune a large one?" or "does this scale?" I'd also argue that this is important because there's not infinite data nor compute.

Welcome to cloud world, where devs believe that compute is in fact infinite, so why bother profiling and improving your code? You can just request more cores and memory, and the magic K8s box will dutifully spawn more instances for you.

replies(1): >>44058165 #

52. AbstractH24 ◴[21 May 25 12:12 UTC] No.44050628{5}[source]▶

>>44047192 #

The SSO premium of the AI era

replies(1): >>44053888 #

53. AbstractH24 ◴[21 May 25 12:13 UTC] No.44050638[source]▶

>>44045528 #

But both are of tremendous value to advertisers

Much like social media, this will end in “if you aren’t paying for the product, then you are the product.”

54. tmaly ◴[21 May 25 14:52 UTC] No.44052117[source]▶

>>44045528 #

I pay for both ChatGPT and Grok at the moment. I often find myself not using them as much as I had hoped for the $50 a month it is costing me. I think if I were to shell out $250 I best be using it for a side project that is bringing in cash flow. But I am not sure if I could come up with anything at this point given current AI capabilities.

replies(1): >>44054424 #

55. ethbr1 ◴[21 May 25 17:30 UTC] No.44053888{6}[source]▶

>>44050628 #

Features are better price segmenters than utilization.

56. sushid ◴[21 May 25 18:15 UTC] No.44054424{3}[source]▶

>>44052117 #

Why did you settle on ChatGPT and Grok? I paid annual for Claude and have Perplexity Pro via a promo but if I were to pick two, I think I'd personally settle for ChatGPT and Gemini right now.

replies(1): >>44120515 #

57. godelski ◴[22 May 25 02:21 UTC] No.44058165{4}[source]▶

>>44050627 #

My favorite is retconning Knuth's "Premature optimization is the root of all evil" from "get a fucking profiler" to "you heard it! Don't optimize!"

58. ivape ◴[22 May 25 02:31 UTC] No.44058223[source]▶

>>44045393 (TP) #

A developer will always get $250 worth of that subscription.

59. tmaly ◴[28 May 25 20:48 UTC] No.44120515{4}[source]▶

>>44054424 #

I started with ChatGPT. I had tried Grok early on and it was very good. I might drop it if 3.5 does not impress and replace it with Gemini.

I do really like the Deep Search on Grok for doing web search and analysis. It is saving me a ton of time.
