Google is winning on every AI front

(www.thealgorithmicbridge.com)

thunderbird120 ◴[12 Apr 25 06:13 UTC] No.43661807[source]▶

>>43661235 (OP) #

This article doesn't mention TPUs anywhere. I don't think it's obvious to people outside of Google's ecosystem just how extraordinarily good the JAX + TPU ecosystem is. Google has several structural advantages over other major players, but the largest one is that they roll their own compute solution, which is actually very mature and competitive. TPUs are extremely good at both training and inference[1], especially at scale. Google's ability to tailor their mature hardware to exactly what they need gives them a massive leg up on the competition. AI companies fundamentally have to answer the question "what can you do that no one else can?". Google's hardware advantage provides an actual answer to that question which can't be erased the next time someone drops a new model onto huggingface.

[1]https://blog.google/products/google-cloud/ironwood-tpu-age-o...

replies(12): >>43661870 #>>43661974 #>>43663154 #>>43663455 #>>43663647 #>>43663720 #>>43663956 #>>43664320 #>>43664354 #>>43672472 #>>43673285 #>>43674134 #

1. jxjnskkzxxhx ◴[12 Apr 25 13:46 UTC] No.43664320[source]▶

>>43661807 #

I've used Jax quite a bit and it's so much better than tf/pytorch.

Now for the life of me, I still haven't been able to understand what a TPU is. Is it Google's marketing term for a GPU? Or is it something different entirely?

replies(3): >>43664408 #>>43666281 #>>43668478 #

2. JLO64 ◴[12 Apr 25 13:56 UTC] No.43664408[source]▶

>>43664320 (TP) #

TPUs (short for Tensor Processing Units) are Google’s custom AI accelerator hardware, which is completely separate from GPUs. I remember that they introduced them in 2015ish, but I imagine that they’re really starting to pay off with Gemini.

https://en.wikipedia.org/wiki/Tensor_Processing_Unit

replies(1): >>43665051 #

3. jxjnskkzxxhx ◴[12 Apr 25 15:09 UTC] No.43665051[source]▶

>>43664408 #

Believe it or not, I'm also familiar with Wikipedia. It says that they're optimized for low precision and high throughput. To me this sounds like a GPU with a specific optimization.
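To make "low precision" concrete, here is a minimal sketch of one reduced-precision format, bfloat16, which keeps only the top 16 bits of a float32 (1 sign bit, 8 exponent bits, 7 mantissa bits). The choice of bfloat16 and the helper names are illustrative assumptions, not something stated in this thread, and the sketch truncates where real hardware would typically round.

```python
import struct


def to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x by truncating its float32 encoding.

    Truncation is a simplification for illustration; hardware usually rounds.
    """
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 >> 16  # keep sign, 8 exponent bits, top 7 mantissa bits


def from_bfloat16_bits(bits16: int) -> float:
    """Expand a 16-bit bfloat16 pattern back to a Python float."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits16 << 16))
    return value


def round_trip(x: float) -> float:
    """Show what value survives after storing x as bfloat16."""
    return from_bfloat16_bits(to_bfloat16_bits(x))


if __name__ == "__main__":
    print(round_trip(1.0))        # exactly representable
    print(round_trip(3.14159))    # only about 2-3 decimal digits survive
    print(round_trip(1 + 2**-8))  # the small increment is lost entirely
```

Halving the bits per value roughly halves the memory traffic per operand, which is one reason reduced precision and high throughput tend to go together.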

replies(5): >>43665307 #>>43665332 #>>43666084 #>>43667601 #>>43670252 #

4. flebron ◴[12 Apr 25 15:39 UTC] No.43665307{3}[source]▶

>>43665051 #

Perhaps this chapter can help? https://jax-ml.github.io/scaling-book/tpus/

It's a chip (and associated hardware) that can do linear algebra operations really fast. XLA and TPUs were co-designed, so as long as what you are doing is expressible in XLA's HLO language (https://openxla.org/xla/operation_semantics), the TPU can run it, and in many cases run it very efficiently. TPUs have different scaling properties than GPUs (think sparser but much larger communication), no graphics hardware inside them (no shader hardware, no raytracing hardware, etc), and a different control flow regime ("single-threaded" with very-wide SIMD primitives, as opposed to massively-multithreaded GPUs).

replies(1): >>43671862 #

5. ◴[12 Apr 25 15:41 UTC] No.43665332{3}[source]▶

>>43665051 #

6. crazygringo ◴[12 Apr 25 16:57 UTC] No.43666084{3}[source]▶

>>43665051 #

I mean yes. But GPUs also have a specific optimization, for graphics. This is a different optimization.

7. 317070 ◴[12 Apr 25 17:18 UTC] No.43666281[source]▶

>>43664320 (TP) #

Way back when, most of a GPU was for graphics. Google decided to design a completely new chip, which focused on the operations for neural networks (mainly vectorized matmul). This is the TPU.

It's not a GPU, as there is no graphics hardware there anymore. Just memory and very efficient cores, capable of doing massively parallel matmuls on the memory. The instruction set is tiny, basically only capable of doing transformer operations fast.

Today, I'm not sure how much graphics an A100 GPU can still do. But I guess the answer is "too much"?

replies(1): >>43667669 #

8. kgwgk ◴[12 Apr 25 20:15 UTC] No.43667601{3}[source]▶

>>43665051 #

Did you also read just after that "without hardware for rasterisation/texture mapping"? Does that sound like a _G_PU?

9. kcb ◴[12 Apr 25 20:25 UTC] No.43667669[source]▶

>>43666281 #

Less and less with each generation. The A100 has 160 ROPs, a 5090 has 176, the H100 and GB100 have just 24.

10. mota7 ◴[12 Apr 25 22:39 UTC] No.43668478[source]▶

>>43664320 (TP) #

There's basically a difference in philosophy. GPU chips have a bunch of cores, each of which is semi-capable, whereas TPU chips have (effectively) one enormous core.

So GPUs have ~120 small systolic arrays, one per SM (a.k.a. a tensor core), plus passable off-chip bandwidth (i.e. 16 lanes of PCIe).

Whereas TPUs have one honking big systolic array, plus large amounts of off-chip bandwidth.

This roughly translates to GPUs being better if you're doing a bunch of different small-ish things in parallel, and TPUs being better if you're doing lots of large matrix multiplies.
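For readers unfamiliar with the term, a systolic array can be sketched as a grid of multiply-accumulate cells that pass operands to their neighbours every cycle. The toy simulation below is an assumption-laden illustration: it uses an output-stationary layout (each cell owns one output element), which is chosen for simplicity and is not claimed to match how any particular TPU or GPU is actually wired. The function name `systolic_matmul` is hypothetical.

```python
def systolic_matmul(A, B):
    """Multiply A (n x k) by B (k x m) on a simulated n x m systolic array.

    Rows of A enter from the left edge and columns of B from the top edge,
    each skewed by one cycle per row/column. Every cycle, each cell multiplies
    the two values passing through it and adds the product to its accumulator.
    Returns (result, number_of_cycles).
    """
    n, k_dim, m = len(A), len(B), len(B[0])
    acc = [[0] * m for _ in range(n)]    # one accumulator per cell
    a_reg = [[0] * m for _ in range(n)]  # A value currently held in each cell
    b_reg = [[0] * m for _ in range(n)]  # B value currently held in each cell
    cycles = n + m + k_dim - 2           # time for the last operands to arrive

    for t in range(cycles):
        # Walk from bottom-right to top-left so each cell reads its
        # neighbour's value from the previous cycle, not the current one.
        for i in reversed(range(n)):
            for j in reversed(range(m)):
                if j == 0:
                    k = t - i  # skewed feed from the left edge
                    a_in = A[i][k] if 0 <= k < k_dim else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:
                    k = t - j  # skewed feed from the top edge
                    b_in = B[k][j] if 0 <= k < k_dim else 0
                else:
                    b_in = b_reg[i - 1][j]
                a_reg[i][j] = a_in
                b_reg[i][j] = b_in
                acc[i][j] += a_in * b_in
    return acc, cycles


if __name__ == "__main__":
    result, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(result, cycles)
```

The point of the layout is that operands are fetched from memory once at the edges and then reused as they travel through the grid, so a larger array amortizes each fetch over more multiplies. That is consistent with the claim above that one big array favours large matrix multiplies.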

11. jibal ◴[13 Apr 25 05:13 UTC] No.43670252{3}[source]▶

>>43665051 #

You asked a question, people tried to help, and you lashed out at them in a way that makes you look quite bad.

12. jxjnskkzxxhx ◴[13 Apr 25 11:01 UTC] No.43671862{4}[source]▶

>>43665307 #

Thank you for the answer! You see, up until now I had never appreciated that a GPU does more than matmuls... And that first reference, what a find :-)

Edit: And btw, another question that I had had before was what's the difference between a tensor core and a GPU, and based on your answer, my speculative answer to that would be that the tensor core is the part inside the GPU that actually does the matmuls.
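A rough way to picture that last point: a tensor core is commonly described as a unit that performs a fused multiply-accumulate, D = A x B + C, on small fixed-size tiles, and a large matmul is built by looping that one primitive over tiles. The sketch below assumes a 4 x 4 tile purely for illustration (real tile shapes vary by hardware generation and data type), and `mma_tile` and `tiled_matmul` are hypothetical names.

```python
TILE = 4  # illustrative tile edge; an assumption, not a hardware spec


def mma_tile(a, b, c):
    """Return a @ b + c for TILE x TILE tiles: the one 'hardware' primitive."""
    return [
        [c[i][j] + sum(a[i][k] * b[k][j] for k in range(TILE)) for j in range(TILE)]
        for i in range(TILE)
    ]


def get_tile(M, row, col):
    """Slice the TILE x TILE block of M whose top-left corner is (row, col)."""
    return [r[col:col + TILE] for r in M[row:row + TILE]]


def tiled_matmul(A, B):
    """Multiply A by B using only mma_tile; dimensions must be multiples of TILE."""
    n, k_dim, m = len(A), len(B), len(B[0])
    assert n % TILE == 0 and k_dim % TILE == 0 and m % TILE == 0
    C = [[0] * m for _ in range(n)]
    for row in range(0, n, TILE):
        for col in range(0, m, TILE):
            acc = [[0] * TILE for _ in range(TILE)]
            # Accumulate partial products along the shared dimension.
            for k in range(0, k_dim, TILE):
                acc = mma_tile(get_tile(A, row, k), get_tile(B, k, col), acc)
            for i in range(TILE):
                C[row + i][col:col + TILE] = acc[i]
    return C
```

On a GPU the tile loops would be spread across many cores running in parallel, with the rest of the chip handling everything that is not a matmul.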
