Google AI Ultra

(blog.google)

charles_f ◴[20 May 25 20:08 UTC] No.44045393[source]▶

This is the kind of pricing that I expect most AI companies are gonna try to push for, and it might get even more expensive with time. When you see the delta between what's currently being burnt by OpenAI and what they bring home, the sweet point is going to be hard to find.

Whether you actually get $250 worth out of that subscription is going to be the big question.

replies(5): >>44045528 #>>44045820 #>>44045959 #>>44046010 #>>44058223 #

1. Wowfunhappy ◴[20 May 25 21:16 UTC] No.44046010[source]▶

>>44045393 #

> When you see the delta between what's currently being burnt by OpenAI and what they bring home, the sweet point is going to be hard to find.

Moore's law should help as well, shouldn't it? GPUs will keep getting cheaper.

Unless the models also get more GPU hungry, but 2025-level performance, at least, shouldn't get more expensive.

replies(3): >>44046119 #>>44046175 #>>44046799 #

2. dvt ◴[20 May 25 21:31 UTC] No.44046119[source]▶

>>44046010 (TP) #

> Moore's law should help as well, shouldn't it? GPUs will keep getting cheaper.

Maybe I'm misremembering, but I thought Moore's law doesn't apply to GPUs?

replies(2): >>44047179 #>>44047715 #

3. godelski ◴[20 May 25 21:37 UTC] No.44046175[source]▶

>>44046010 (TP) #

Not necessarily. The prevailing paradigm is that performance scales with size (of data and compute power).

Of course, this is observably false as we have a long list of smaller models that require fewer resources to train and/or deploy with equal or better performance than larger ones. That's without using distillation, reduced precision/quantization, pruning, or similar techniques[0].

The real thing we need is more investment into reducing the computational resources needed to train and deploy models, and into model optimization (the best example being llama.cpp). I can tell you from personal experience that there is much lower interest in this type of research, and I've seen plenty of works rejected because "why train a small model when you can just tune a large one?" or "does this scale?"[1] I'd also argue that this is important because there's not infinite data nor compute.

[0] https://arxiv.org/abs/2407.05694

[1] Those works will outperform the larger models. The question is good, but this creates a barrier to funding. It costs a lot to test at scale, you can't get funding if you don't have good evidence, and it often won't be considered evidence if it isn't published. There are always more questions and every work is limited, but smaller compute works face higher bars than big compute works.

replies(2): >>44046239 #>>44050627 #

4. jorvi ◴[20 May 25 21:45 UTC] No.44046239[source]▶

>>44046175 #

Small models will get really hot once they start hitting good accuracy & speed on 16GB phones and laptops.

replies(1): >>44046795 #

5. godelski ◴[20 May 25 23:03 UTC] No.44046795{3}[source]▶

>>44046239 #

Much of this already exists. But if you're expecting performance identical to the giant models, well, that's a moving goalpost.

The paper I linked explicitly mentions how Falcon 180B is outperformed by Llama-3 8B. You can find plenty of similar cases all over the LMArena leaderboard. This year's small model is better than last year's big model. But the Overton Window shifts. GPT-3 was going to replace everyone. Then 3.5 came out and GPT-3 is shit. Then o1 came out and 3.5 is garbage.

What is "good accuracy" is not a fixed metric. If you want to move this to the domain of classification, detection, and segmentation, the same applies. I've had multiple papers rejected where our model with <10% of the parameters of a large model matches performance (obviously this is much faster too).

But yeah, there are diminishing returns with scale. And I suspect you're right that these small models will become more popular when those limits hit harder. But I think one of the critical things that prevents us from progressing faster is that we evaluate research as if it were a product. Methods that work for classification very likely work for detection, segmentation, and even generation. But this won't always be tested because, frankly, the people working on model efficiency usually have far fewer computational resources themselves, which means they run fewer experiments. This is fine if you're not evaluating a product, but you end up reinventing techniques when you are.
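
To put the 8B vs. 180B comparison next to the 16GB devices mentioned upthread, here is a back-of-the-envelope estimate of the memory needed just to hold model weights. It is only a sketch under simple assumptions: it multiplies parameter count by bits per weight and ignores the KV cache, activations, and runtime overhead. `weight_memory_gb` is a hypothetical helper, the parameter counts are the nominal sizes from the model names, and the bit widths are illustrative choices, not figures from the linked paper.

```python
def weight_memory_gb(params_billions, bits_per_weight):
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same bit width; KV cache,
    activations, and runtime overhead are not counted.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Nominal sizes taken from the model names mentioned in the thread.
    for name, params in [("8B model", 8), ("180B model", 180)]:
        for bits in (16, 8, 4):  # illustrative precisions
            gb = weight_memory_gb(params, bits)
            print(f"{name} at {bits}-bit: about {gb:.0f} GB of weights")
```

Under these assumptions an 8B model at 4 bits per weight needs about 4 GB for weights, while a 180B model needs about 90 GB even at 4 bits, so only the former has a chance of fitting on a 16GB device.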

6. kllrnohj ◴[20 May 25 23:03 UTC] No.44046799[source]▶

>>44046010 (TP) #

> GPUs will keep getting cheaper. [...] but 2025-level performance, at least, shouldn't get more expensive.

This generation of GPUs has worse performance for more $$$ than the previous generation. At best, $/perf has been a flat line for the past few generations. Given what fab realities are nowadays, along with what works best for GPUs (the bigger the die the better), it doesn't seem likely that there will be any price scaling in the near future. Not unless there's some drastic change in fabrication prices from something

replies(1): >>44047176 #

7. Wowfunhappy ◴[21 May 25 00:17 UTC] No.44047176[source]▶

>>44046799 #

I mean, I upgraded from a GTX 1080 Ti to an RTX 4080 last summer, and the difference in graphical quality I can get in games is pretty great. That was a multi-generation upgrade, but when exactly do you think that GPU performance per dollar flat-lined?

replies(1): >>44047857 #

8. Wowfunhappy ◴[21 May 25 00:18 UTC] No.44047179[source]▶

>>44046119 #

I don't know the details, but this feels like it can't be true just from looking at how video games have progressed.

9. moorelaw282 ◴[21 May 25 02:18 UTC] No.44047715[source]▶

>>44046119 #

In modern times Moore’s law applies more to GPUs than CPUs. It’s much easier to scale GPU performance by just adding cores, while real-world CPU performance is inherently limited by single-threaded work.

10. kllrnohj ◴[21 May 25 02:51 UTC] No.44047857{3}[source]▶

>>44047176 #

   1080 Ti -> 2080: 10% faster for same MSRP
   2080 -> 3080: ~70% faster for the same MSRP
   3080 -> 4080: 50% faster, but $700 vs. $1200 is *more than 50% more expensive*
   4080 -> 5080: 10% faster, but $1200 (or $1000 for 4080 Super) vs. $1400-1700 is again more than 10% more money.

So yes your 1080 Ti -> 4080 is a huge leap, but there's basically just 2 reasons why: 1) the price also took a huge leap, and 2) the 20xx -> 30xx series was actually a generational leap, which unfortunately is an outlier as the 20xx series, 40xx series, and 50xx series all were steaming piles of generational shit. Well I guess to be fair to the 20xx, it did at least manage to not regress $/performance like the 40xx and 50xx series did. Barely.
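
As a rough check on the numbers above, the per-generation figures can be chained into cumulative performance and performance per dollar. This is a sketch, not a benchmark: the speedups and prices are the ones quoted in this comment, and the $700 price for the 1080 Ti and 2080 is an assumption inferred from "same MSRP". `GENERATIONS` and `perf_per_dollar` are names made up for the illustration.

```python
# Speedups and prices as quoted in the comment above.
# Assumption: the 1080 Ti and 2080 are priced at $700, inferred from "same MSRP".
GENERATIONS = [
    # (card, speedup over previous card, low price in $, high price in $)
    ("1080 Ti", 1.00, 700, 700),
    ("2080", 1.10, 700, 700),
    ("3080", 1.70, 700, 700),
    ("4080", 1.50, 1200, 1200),
    ("5080", 1.10, 1400, 1700),
]


def perf_per_dollar(generations):
    """Chain relative speedups into cumulative performance (1080 Ti = 1.0)
    and divide by price. Returns one dict per card."""
    rows = []
    perf = 1.0
    for card, speedup, low, high in generations:
        perf *= speedup
        rows.append({
            "card": card,
            "perf": perf,
            # Best case uses the low price, worst case the high price.
            "perf_per_kusd_best": 1000 * perf / low,
            "perf_per_kusd_worst": 1000 * perf / high,
        })
    return rows


if __name__ == "__main__":
    for row in perf_per_dollar(GENERATIONS):
        print(f'{row["card"]:>8}: {row["perf"]:.2f}x perf, '
              f'{row["perf_per_kusd_worst"]:.2f} to '
              f'{row["perf_per_kusd_best"]:.2f} perf per $1000')
```

With these inputs, performance per dollar rises through the 3080 and then falls for both the 4080 and the 5080, which matches the regression described above.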

11. sgarland ◴[21 May 25 12:12 UTC] No.44050627[source]▶

>>44046175 #

> I've seen plenty of works rejected because "why train a small model when you can just tune a large one?" or "does this scale?" I'd also argue that this is important because there's not infinite data nor compute.

Welcome to cloud world, where devs believe that compute is in fact infinite, so why bother profiling and improving your code? You can just request more cores and memory, and the magic K8s box will dutifully spawn more instances for you.

replies(1): >>44058165 #

12. godelski ◴[22 May 25 02:21 UTC] No.44058165{3}[source]▶

>>44050627 #

My favorite is retconning Knuth's "Premature optimization is the root of all evil" from "get a fucking profiler" to "you heard it! Don't optimize!"
