
507 points martinald | 2 comments
ekelsen ◴[] No.45052850[source]
The math on the input tokens is definitely wrong. It claims each instance (8 GPUs) can handle 1.44 million tokens/sec of input. Let's check that out.

1.44e6 tokens/sec * 37e9 bytes/token / 3.3e12 bytes/sec/GPU = ~16,000 GPUs

And that's assuming a more likely 1 byte per parameter.

So the article is only off by a factor of at least 1,000. I didn't check any of the rest of the math, but that probably has some impact on their conclusions...
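For anyone who wants to rerun the estimate, here is a minimal Python sketch of the same arithmetic. It takes the comment's figures at face value and assumes, as the formula above implicitly does, that all 37e9 bytes of weights (37e9 parameters at 1 byte each) are read from memory once per input token. The constant and function names are made up for illustration.

```python
# Back-of-the-envelope check of the estimate above.
# All figures come from the comment; the names are illustrative only.

TOKENS_PER_SEC = 1.44e6      # input throughput the article claims per instance
BYTES_PER_TOKEN = 37e9       # assumption: 37e9 parameters * 1 byte, read once per token
BYTES_PER_SEC_PER_GPU = 3.3e12  # memory bandwidth per GPU used in the comment
GPUS_PER_INSTANCE = 8


def gpus_needed(tokens_per_sec, bytes_per_token, bytes_per_sec_per_gpu):
    """GPUs required if memory bandwidth is the only limit."""
    return tokens_per_sec * bytes_per_token / bytes_per_sec_per_gpu


required = gpus_needed(TOKENS_PER_SEC, BYTES_PER_TOKEN, BYTES_PER_SEC_PER_GPU)
shortfall = required / GPUS_PER_INSTANCE

print(f"{required:,.0f} GPUs needed, {shortfall:,.0f}x an 8-GPU instance")
# prints "16,145 GPUs needed, 2,018x an 8-GPU instance"
```

Under these assumptions the gap comes out at roughly 2,000x, which is where "at least 1,000" comes from. Changing `BYTES_PER_TOKEN` is the quickest way to see how strongly the result depends on the bytes-read-per-token assumption.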

replies(5): >>45052936 #>>45052942 #>>45052964 #>>45053047 #>>45053166 #
1. endtime ◴[] No.45052964[source]
> 37e9 bytes/token

This doesn't quite sound right...isn't a token just a few characters?

replies(1): >>45053116 #
2. ◴[] No.45053116[source]