Google AI Ultra

(blog.google)

320 points mfiguiere | 1 comments | 20 May 25 18:20 UTC


charles_f ◴[20 May 25 20:08 UTC] No.44045393[source]▶

This is the kind of pricing that I expect most AI companies are gonna try to push for, and it might get even more expensive with time. When you see the delta between what's currently being burnt by OpenAI and what they bring home, the sweet point is going to be hard to find.

Whether you find that you get $250 worth out of that subscription is going to be the big question.

replies(5): >>44045528 #>>44045820 #>>44045959 #>>44046010 #>>44058223 #

Wowfunhappy ◴[20 May 25 21:16 UTC] No.44046010[source]▶

>>44045393 #

> When you see the delta between what's currently being burnt by OpenAI and what they bring home, the sweet point is going to be hard to find.

Moore's law should help as well, shouldn't it? GPUs will keep getting cheaper.

Unless the models also get more GPU hungry, but 2025-level performance, at least, shouldn't get more expensive.

replies(3): >>44046119 #>>44046175 #>>44046799 #

godelski ◴[20 May 25 21:37 UTC] No.44046175[source]▶

>>44046010 #

Not necessarily. The prevailing paradigm is that performance scales with size (of data and compute power).

Of course, this is observably false as we have a long list of smaller models that require fewer resources to train and/or deploy with equal or better performance than larger ones. That's without using distillation, reduced precision/quantization, pruning, or similar techniques[0].

The real thing we need is more investment into reducing computational resources to train and deploy models and to do model optimization (best example being llama.cpp). I can tell you from personal experience that there is much lower interest in this type of research and I've seen plenty of works rejected because "why train a small model when you can just tune a large one?" or "does this scale?"[1] I'd also argue that this is important because there's not infinite data nor compute.

[0] https://arxiv.org/abs/2407.05694

[1] Those works will outperform the larger models. The question is good, but this creates a barrier to funding. It costs a lot to test at scale, you can't get funding if you don't have good evidence, and it often won't be considered evidence if it isn't published. There are always more questions, every work is limited, but smaller compute works have higher bars than big compute works.

replies(2): >>44046239 #>>44050627 #

sgarland ◴[21 May 25 12:12 UTC] No.44050627[source]▶

>>44046175 #

> I've seen plenty of works rejected because "why train a small model when you can just tune a large one?" or "does this scale?" I'd also argue that this is important because there's not infinite data nor compute.

Welcome to cloud world, where devs believe that compute is in fact infinite, so why bother profiling and improving your code? You can just request more cores and memory, and the magic K8s box will dutifully spawn more instances for you.

replies(1): >>44058165 #

godelski ◴[22 May 25 02:21 UTC] No.44058165[source]▶

>>44050627 #

My favorite is retconning Knuth's "Premature optimization is the root of all evil" from "get a fucking profiler" to "you heard it! Don't optimize!"
