Open WebUI, SillyTavern, and other frontends can access any OpenAI-compatible server, and on Nvidia cards you have a wealth of options that will run one of those servers for you: llama.cpp (or the Ollama wrapper), of course, but also the faster vLLM and SGLang inference engines. Buy one of these, slap SGLang or vLLM on it, and point your devices at your machine's local IP address.
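For reference, "point your devices at it" looks something like this in practice; a minimal sketch using the `openai` Python client, assuming a vLLM or llama.cpp server is listening on port 8000 (the IP, port, and model name are placeholders for whatever your setup actually uses):

```python
from openai import OpenAI

# Point the standard OpenAI client at the local server instead of api.openai.com.
# The IP, port, and model name below are placeholders -- substitute your own.
client = OpenAI(
    base_url="http://192.168.1.50:8000/v1",  # your machine's local IP + server port
    api_key="not-needed",                    # local servers usually ignore the key
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",  # whatever model the server loaded
    messages=[{"role": "user", "content": "Hello from my living-room GPU box."}],
)
print(response.choices[0].message.content)
```

Open WebUI and SillyTavern do the same thing under the hood: you just paste that base URL into their connection settings.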
I'm mildly skeptical about performance here: they aren't saying what the memory bandwidth is, and that will have a major impact on tokens per second. If it's anywhere close to the 4090's, or even the M2 Ultra's, 128GB of Nvidia is a steal at $3k. Getting that much VRAM on anything non-Apple used to cost tens of thousands of dollars.
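To see why bandwidth is the number that matters, here's the usual back-of-envelope estimate: during decode you stream the entire set of weights once per generated token, so tokens per second tops out around bandwidth divided by model size. The figures below (1008 GB/s for the 4090, 800 GB/s for the M2 Ultra, ~40GB for a 70B at Q4) are my own ballpark numbers, not anything from the announcement:

```python
# Rough decode-speed ceiling: each generated token reads all the weights once,
# so tok/s is at most memory_bandwidth / model_size. Real throughput lands lower.
model_size_gb = 40  # ~70B parameters at Q4 (roughly 0.55-0.6 bytes per param)

candidates = [
    ("RTX 4090", 1008),        # spec-sheet bandwidth, GB/s
    ("M2 Ultra", 800),         # spec-sheet bandwidth, GB/s
    ("mystery box", 500),      # pure guess for the unannounced machine
]
for name, bandwidth_gb_s in candidates:
    print(f"{name}: ~{bandwidth_gb_s / model_size_gb:.0f} tok/s ceiling")
```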
(They're also mentioning running the large models at Q4, which will definitely hurt the model's intelligence vs FP8 or BF16. But most people running models on Macs run them at Q4, so I guess it's a valid comparison. You can at least run a 70B at FP8 on one of these, even with fairly large context, which I think will be the sweet spot.)
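As a sanity check on that "sweet spot" claim, here's the memory budget I have in mind; the KV-cache math assumes a Llama-3-70B-style architecture (80 layers, 8 KV heads with GQA, head dim 128), which is my assumption, not something from the announcement:

```python
# Does 70B at FP8 plus a long context fit in 128GB? Back-of-envelope check.
total_gb = 128
weights_gb = 70  # 70B params at FP8 = ~70GB

# KV cache per token = 2 (K and V) * layers * kv_heads * head_dim * bytes/elem.
# Architecture numbers below assume a Llama-3-70B-style model with GQA.
layers, kv_heads, head_dim, bytes_per_elem = 80, 8, 128, 1  # FP8 KV cache
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem

context_tokens = 64_000
kv_gb = kv_bytes_per_token * context_tokens / 1e9
print(f"weights {weights_gb}GB + KV cache {kv_gb:.1f}GB "
      f"= {weights_gb + kv_gb:.1f}GB of {total_gb}GB")
# -> roughly 80GB used, leaving headroom for activations and the OS.
```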