
Google is winning on every AI front

(www.thealgorithmicbridge.com)
993 points by vinhnx | 3 comments
thunderbird120 No.43661807
This article doesn't mention TPUs anywhere. I don't think it's obvious to people outside of Google's ecosystem just how extraordinarily good the JAX + TPU ecosystem is. Google has several structural advantages over other major players, but the largest one is that they roll their own compute solution, which is actually very mature and competitive. TPUs are extremely good at both training and inference[1], especially at scale. Google's ability to tailor their mature hardware to exactly what they need gives them a massive leg up on the competition. AI companies fundamentally have to answer the question "what can you do that no one else can?". Google's hardware advantage provides an actual answer to that question, one that can't be erased the next time someone drops a new model onto huggingface.
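
To give a flavor (a toy sketch of my own, not anything from the article; the model and shapes are made up): the same JAX code runs unchanged on CPU, GPU, or TPU, because jax.jit lowers it to XLA, the compiler stack co-designed with the TPU hardware.

    # Toy sketch; all shapes and the model are invented for illustration.
    import jax
    import jax.numpy as jnp

    print(jax.devices())  # on a Cloud TPU VM this lists TpuDevice entries

    @jax.jit
    def predict(params, x):
        # toy two-layer MLP forward pass
        h = jnp.tanh(x @ params["w1"] + params["b1"])
        return h @ params["w2"] + params["b2"]

    k1, k2, kx = jax.random.split(jax.random.PRNGKey(0), 3)
    params = {
        "w1": jax.random.normal(k1, (128, 256)), "b1": jnp.zeros(256),
        "w2": jax.random.normal(k2, (256, 10)),  "b2": jnp.zeros(10),
    }
    x = jax.random.normal(kx, (32, 128))
    print(predict(params, x).shape)  # (32, 10); compiled once, then cached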

[1]https://blog.google/products/google-cloud/ironwood-tpu-age-o...

replies(12): >>43661870 >>43661974 >>43663154 >>43663455 >>43663647 >>43663720 >>43663956 >>43664320 >>43664354 >>43672472 >>43673285 >>43674134
mike_hearn No.43663720
TPUs aren't necessarily a pro. They go back 15 years and don't seem to have yielded any kind of durable advantage. Developing them is expensive, and their architecture was often over-fit to yesterday's algorithms, which is why they've been through so many redesigns. Their competitors have routinely moved much faster using CUDA.

Once the space settles down, the balance might tip towards specialized accelerators, but NVIDIA has plenty of room to make specialized silicon and cut prices too. Google has yet to prove that the TPU investment is worth it.

replies(4): >>43663930 >>43664015 >>43666501 >>43668095
alienthrowaway No.43666501
> Developing them is expensive

So are the electricity and cooling costs at Google's scale. Improving perf-per-watt efficiency can pay for itself, and the fact that they keep iterating on it suggests it's not a negative-return exercise.
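
To put rough numbers on that (every figure below is invented, purely to illustrate the scaling):

    # Back-of-envelope; all numbers are made-up assumptions.
    chips = 100_000            # hypothetical accelerator fleet
    watts_per_chip = 400       # hypothetical draw incl. cooling overhead
    usd_per_kwh = 0.08         # hypothetical industrial electricity rate

    fleet_kw = chips * watts_per_chip / 1000
    annual_usd = fleet_kw * 24 * 365 * usd_per_kwh
    print(f"annual power bill: ${annual_usd:,.0f}")  # ~$28M

    # A 20% perf-per-watt win does the same work on ~17% less energy:
    print(f"saved per year: ${annual_usd * (1 - 1/1.2):,.0f}")  # ~$4.7M

The savings scale linearly with fleet size, so the bigger the fleet, the easier the custom-silicon investment is to justify.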

replies(1): >>43666633
mike_hearn No.43666633
TPUs probably can pay for themselves, especially given NVIDIA's huge margins. But that isn't a given just because Google keeps funding them. When I worked there, Google routinely funded all kinds of things without even the foggiest idea of whether they were profitable. There was just a really strong philosophical commitment to doing everything in-house, no matter what.
replies(1): >>43670801
marsten No.43670801
> When I worked there, Google routinely funded all kinds of things without even the foggiest idea of whether they were profitable.

You're talking about small-money bets. The technical infrastructure group at Google makes a lot of them, to explore options or hedge risks, but they only scale the things that make financial sense. They aren't dumb people after all.

The TPU was a small-money bet for quite a few years until this latest AI boom.

replies(1): >>43671908
mike_hearn No.43671908
Maybe it's changed. I'm going back a long way, but part of my view on this was shaped by an internal white paper, written by an engineer maybe circa 2010, that analyzed the cost of building a Gmail clone on commodity tech versus Google's in-house approach. He didn't even look at people costs, just hardware, and the commodity stack smoked Gmail's on cost without much difference in features (the analysis focused on storage and serving, not spam filtering, where there was no comparably good commodity solution).

The cost delta was massive and really quite astounding to see spelled out, because it was hardly talked about internally even after the paper was written. And if you took into account the very high comp Google engineers got, even back then when it was lower than today, the delta became comic. If Gmail had been a normal business, it'd have been outcompeted on price and gone broke instantly; the cost disadvantage was that huge.

The people who built Gmail were far from dumb, but they just weren't being measured on cost efficiency at all. The same issue could be seen at all levels of the Google stack at that time. For instance, one reason for Gmail's cost problem was that the underlying shared storage systems, like replicated BigTables, were very expensive compared to more ordinary SANs. And Google's insistence on being able to take clusters offline at will, with very little notice, required a higher replication factor than a normal company would have used. There were certainly benefits in terms of rapid iteration on advanced datacenter tech, but did every product really need such advanced datacenters to begin with? Probably not. The products I worked on didn't seem to.
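
To make the replication point concrete (the factors and prices below are invented, not Google's actual numbers):

    # Toy illustration; factors and prices are hypothetical assumptions.
    logical_pb = 10              # hypothetical logical dataset, in PB
    usd_per_pb_year = 100_000    # hypothetical all-in cost per raw PB/year

    for replicas, why in [(2, "ordinary redundancy"),
                          (4, "tolerate whole clusters going offline")]:
        raw = logical_pb * replicas
        print(f"x{replicas} ({why}): {raw} PB raw, "
              f"${raw * usd_per_pb_year:,.0f}/yr")

Raw footprint, and with it cost, scales linearly with the replication factor, so a policy like "any cluster can vanish at any time" shows up directly on the bill.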

Occasionally we'd get a reality check when acquiring companies and discovering they ran competitive products on what was for Google an unimaginably thrifty budget.

So Google was certainly willing to scale up things that made financial sense only if you were in an environment totally unconstrained by normal budgets. Perhaps the hardware divisions operate differently, but it was true of the software side at least.