
Google is winning on every AI front

(www.thealgorithmicbridge.com)
993 points vinhnx | 1 comments
thunderbird120 ◴[] No.43661807[source]
This article doesn't mention TPUs anywhere. I don't think it's obvious to people outside of Google's ecosystem just how extraordinarily good the JAX + TPU ecosystem is. Google has several structural advantages over other major players, but the largest one is that they roll their own compute solution, which is actually very mature and competitive. TPUs are extremely good at both training and inference[1], especially at scale. Google's ability to tailor their mature hardware to exactly what they need gives them a massive leg up on the competition. AI companies fundamentally have to answer the question "what can you do that no one else can?". Google's hardware advantage provides an actual answer to that question, one which can't be erased the next time someone drops a new model onto Hugging Face.

[1]https://blog.google/products/google-cloud/ironwood-tpu-age-o...

replies(12): >>43661870 #>>43661974 #>>43663154 #>>43663455 #>>43663647 #>>43663720 #>>43663956 #>>43664320 #>>43664354 #>>43672472 #>>43673285 #>>43674134 #
jxjnskkzxxhx ◴[] No.43664320[source]
I've used JAX quite a bit and it's so much better than TF/PyTorch.

Now for the life of me, I still haven't been able to understand what a TPU is. Is it Google's marketing term for a GPU? Or is it something different entirely?

replies(3): >>43664408 #>>43666281 #>>43668478 #
1. mota7 ◴[] No.43668478[source]
There's basically a difference in philosophy. GPU chips have a bunch of cores, each of which is semi-capable, whereas TPU chips have (effectively) one enormous core.

So GPUs have ~120 small systolic arrays, one per SM (aka a tensor core), plus passable off-chip bandwidth (aka 16 lanes of PCIe).

Whereas TPUs have one honking big systolic array, plus large amounts of off-chip bandwidth.

This roughly translates to GPUs being better if you're doing a bunch of different small-ish things in parallel, but TPUs are better if you're doing lots of large matrix multiplies.
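
To make the "many small arrays vs. one big array" point concrete, here's a rough pure-Python sketch of how a large matrix multiply gets chopped into tile-sized pieces. It's only an illustration: the function names and the tile sizes (16 and 128) are made up for the example, not real hardware specs, and it ignores everything about scheduling, memory and bandwidth.

```python
import math


def tile_passes(m, k, n, tile):
    """Count the tile x tile block multiplies needed for (m x k) @ (k x n).

    `tile` stands in for the edge length of a hypothetical multiply unit;
    it is an assumption for illustration, not a real hardware figure.
    """
    return math.ceil(m / tile) * math.ceil(k / tile) * math.ceil(n / tile)


def tiled_matmul(a, b, tile):
    """Multiply a (m x k) by b (k x n) one tile x tile block at a time."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                # One "pass": a block of `a` times a block of `b`,
                # accumulated into the matching block of the output.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = 0
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        out[i][j] += acc
    return out


if __name__ == "__main__":
    # Same 1024^3 multiply, two hypothetical unit sizes.
    print(tile_passes(1024, 1024, 1024, 16))   # many small block multiplies
    print(tile_passes(1024, 1024, 1024, 128))  # far fewer large ones
```

With a small unit, the one big multiply turns into a huge number of block multiplies that have to be spread across many units and stitched back together; with a big unit, the same multiply is a handful of passes. Conversely, if your matrices are smaller than the big unit, most of it sits idle, which is roughly the trade-off described above.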