H100s can be $2 an hour, so $192 an hour for the full cluster. They report 22k tokens per second, so ~80 million an hour; that's $16 an hour of revenue at $0.2 per million. Maybe a bit more for input tokens, but it seems a long way off.

replies(1): >>45066003 #

5. randomjoe2 ◴[29 Aug 25 15:39 UTC] No.45065518[source]▶

>>45065147 #

To many people, "local" doesn't mean "on metal" anymore.

replies(3): >>45065653 #>>45065663 #>>45067202 #

6. DSingularity ◴[29 Aug 25 15:42 UTC] No.45065549[source]▶

>>45065147 #

I guess local for him is independent/private.

7. monsieurbanana ◴[29 Aug 25 15:50 UTC] No.45065653{3}[source]▶

>>45065518 #

I missed that train

replies(1): >>45065843 #

8. mwcz ◴[29 Aug 25 15:51 UTC] No.45065663{3}[source]▶

>>45065518 #

"On metal" is muddied too. I've heard people refer to web apps running in an OCI container as being "bare metal" deployment, as opposed to AWS or whatever hosting platform.

That's silly, but the idea that "local" is not the opposite of remote is even sillier.

replies(2): >>45065742 #>>45065883 #

9. ffsm8 ◴[29 Aug 25 15:56 UTC] No.45065742{4}[source]▶

>>45065663 #

You can run an OCI container on bare metal though. It doesn't stop being run on bare metal just because you're running in kernel namespaces, aka a Docker container.

Lots of people were advocating for running their k8s on bare metal servers to maximize the performance of their containers

Now whether that applies to your conversation... I've no clue, too little context ( ｡ ŏ ﹏ ŏ )

replies(1): >>45066076 #

10. vFunct ◴[29 Aug 25 16:03 UTC] No.45065843{4}[source]▶

>>45065653 #

My basement server is really confused by all this...

replies(1): >>45069674 #

11. dtech ◴[29 Aug 25 16:06 UTC] No.45065883{4}[source]▶

>>45065663 #

If you define bare metal as not being under a VM, it fits. OCI on Linux is cgroups, so that counts as not a VM, I'd say. Or at least it's a layer closer to the metal than a typical VM running OCI images.

Is a Java app running on Linux bare metal?

12. zipy124 ◴[29 Aug 25 16:15 UTC] No.45066003[source]▶

>>45065503 #

I think you misread. That's 22k tokens per second per node, so per 8 H100s. With 12 nodes they get 264k tokens per second, or 950 million an hour. This gets you to roughly $0.202 per million at $2 per GPU per hour.
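
For anyone who wants to check the two readings side by side, here is a rough sketch of the arithmetic. All inputs are the figures quoted in this thread ($2 per H100 per hour, 8 GPUs per node, 12 nodes, 22k tokens per second); the function name and constants are just illustrative, and nothing here comes from the original report beyond those numbers.

```python
# Figures quoted in the thread; treat them as assumptions, not measurements.
GPU_PRICE_PER_HOUR = 2.00   # USD per H100 per hour
GPUS_PER_NODE = 8
NODES = 12
TOKENS_PER_SECOND = 22_000  # the reported throughput figure


def cost_per_million_tokens(tokens_per_second: float, cost_per_hour: float) -> float:
    """Hardware cost in USD to generate one million tokens."""
    millions_per_hour = tokens_per_second * 3600 / 1_000_000
    return cost_per_hour / millions_per_hour


# 96 GPUs at $2 each
cluster_cost = GPU_PRICE_PER_HOUR * GPUS_PER_NODE * NODES

# Reading 1: 22k tokens/s is the throughput of the whole cluster.
whole_cluster = cost_per_million_tokens(TOKENS_PER_SECOND, cluster_cost)

# Reading 2: 22k tokens/s is per node, so the cluster does 12x that.
per_node = cost_per_million_tokens(TOKENS_PER_SECOND * NODES, cluster_cost)

print(f"cluster cost:            ${cluster_cost:.0f}/hour")
print(f"22k tok/s whole cluster: ${whole_cluster:.2f} per million tokens")
print(f"22k tok/s per node:      ${per_node:.3f} per million tokens")
```

Under the first reading the hardware cost is about 12x the $0.2 per million price; under the second it lands almost exactly on it.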

13. okasaki ◴[29 Aug 25 16:20 UTC] No.45066076{5}[source]▶

>>45065742 #

In my opinion, if you're running k8s on bare metal, that's "k8s on bare metal" but still "<your app> on kubernetes", not "<your app> on bare metal".

replies(1): >>45067194 #

14. ffsm8 ◴[29 Aug 25 17:44 UTC] No.45067194{6}[source]▶

>>45066076 #

Sorry, but then your opinion is just plain wrong

Bare metal in the context of running software is a technical term with a clear meaning that hasn't become contested like "AI" or "Crypto" - and that meaning is that the software is running directly on the hardware.

As k8s isn't virtualization, processes spawned by its orchestrator are still running on bare metal. It's the whole reason why containers are more efficient compared to virtual machines

replies(3): >>45067264 #>>45067809 #>>45083341 #

15. bee_rider ◴[29 Aug 25 17:45 UTC] No.45067202{3}[source]▶

>>45065518 #

Local doesn’t need to be “on metal,” but I’m still confused as to what they are saying. Are they running some local cloud system?

16. bee_rider ◴[29 Aug 25 17:50 UTC] No.45067264{7}[source]▶

>>45067194 #

Bare metal as in, no operating system? Does Linux really get in the way of these LLM inference engines?

replies(1): >>45067336 #

17. ffsm8 ◴[29 Aug 25 17:56 UTC] No.45067336{8}[source]▶

>>45067264 #

No, as I said in my previous comment: bare metal as in not a virtual machine

https://en.m.wikipedia.org/wiki/Bare-metal_server

replies(1): >>45068930 #

18. mystifyingpoi ◴[29 Aug 25 18:37 UTC] No.45067809{7}[source]▶

>>45067194 #

I think both of you are correct.

Of course, a process running inside a Kubernetes Pod on a bare-metal node will show up in `top` if I run it on the node directly. In those terms, it is running directly on hardware.

But when I deploy this Pod, I'm not interacting with the OS in any way. I'm interacting with the Kubernetes apiserver, telling it what to run, not really caring about the operating system underneath. In those terms, the application is running "in k8s".

19. pessimizer ◴[29 Aug 25 20:20 UTC] No.45068930{9}[source]▶

>>45067336 #

Note that this is a term whose meaning has been expanded to refer to non-VPS servers very recently. Bare-metal has traditionally meant "without an operating system." It did not mean "a server that is an actual server," because that was the default.

It also does not always "clearly" have this new meaning. Somebody who is used to running programs directly (with no intermediate OS) on hardware might not understand what you're saying, or might ask you to clarify, and you probably shouldn't feel put upon by a totally understandable misinterpretation.

edit: Especially when you keep repeating "directly on hardware" when you mean "not on a VM." VMs also run on hardware. You're saying that you're only running on one OS instead of an OS in your OS.

20. mwcz ◴[31 Aug 25 14:20 UTC] No.45083341{7}[source]▶

>>45067194 #

This discussion made me realize that I have a head canon definition of "bare metal" that applies more to the programming environment than the deployment environment. It would exclude any runtime translation to the native instruction set, such as a VM, bytecode VM, language interpreter, etc. Basically identical in meaning to "static compilation", so I'll update my brain to the conventional meaning.
