
28 points by addaon | 1 comment | source
tuananh ◴[] No.42190811[source]
it's 16TB of DDR5 btw
replies(1): >>42190905 #
metadat ◴[] No.42190905[source]
Yes, 128x128.

Good for a database, maybe.

What else?
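
(For anyone checking the arithmetic, a quick Python sketch, assuming "128x128" means 128 DIMMs of 128 GB each — my reading of the comment, not something the poster confirmed.)

    # Capacity check under the assumption of 128 DIMMs x 128 GB each.
    dimm_count = 128
    dimm_size_gb = 128
    total_gb = dimm_count * dimm_size_gb            # 16,384 GB
    print(f"{total_gb} GB = {total_gb / 1024} TB")  # 16384 GB = 16.0 TB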

replies(6): >>42191283 #>>42191285 #>>42191559 #>>42191737 #>>42191960 #>>42192376 #
HeatrayEnjoyer ◴[] No.42191960[source]
A half dozen GPT-4 instances
replies(1): >>42195659 #
metadat ◴[] No.42195659[source]
LLM inference processors (GPUs) don't use DDR; they use special, costly stacked HBM RAM mounted directly on the GPU package.

I tested running Llama on a 512GB machine; it's rather slow and inefficient, maybe 1 token/sec.
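
(Back-of-envelope on why DDR-based inference crawls — a sketch under assumptions, not a benchmark: dense-model generation is roughly memory-bandwidth bound, since every generated token has to stream all the weights out of RAM. The model size and bandwidth figures below are illustrative guesses, not specs for this machine.)

    # tokens/sec ~= usable memory bandwidth / bytes of weights read per token
    def tokens_per_sec(model_size_gb: float, bandwidth_gb_s: float) -> float:
        return bandwidth_gb_s / model_size_gb

    model_size_gb = 140  # e.g. a ~70B-parameter model at fp16 (assumed)

    # Ballpark bandwidths (assumed): server DDR5 roughly 100-400 GB/s per
    # socket, HBM-class GPU memory roughly 2000-3000 GB/s.
    print(tokens_per_sec(model_size_gb, 100))   # ~0.7 tok/s -- consistent with "maybe 1 token/sec"
    print(tokens_per_sec(model_size_gb, 3000))  # ~21 tok/s on HBM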