
    Devstral

    (mistral.ai)
    701 points mfiguiere | 12 comments
    1. christophilus No.44058247
    What hardware are y'all using when you run these things locally? I was thinking of pre ordering the Framework desktop[0] for this purpose, but I wouldn't mind having a decent laptop that could run it (ideally Linux).

    [0] https://frame.work/desktop

    replies(4): >>44058269 #>>44058281 #>>44058363 #>>44058499 #
    2. snitty No.44058269
    I think your options are generally:

    0) A desktop PC with one or more graphics cards, or
    1) A Mac with Apple Silicon.

    3. tripplyons No.44058281
    All Hands AI has instructions for running Devstral locally on a MacBook using LM Studio: https://docs.all-hands.dev/modules/usage/llms/local-llms#ser...

    The same page also gives instructions for running the model through vLLM on a GPU, but that path doesn't seem to support quantization, so it may require multiple GPUs; the instructions say "with at least 2 GPUs".
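    Back-of-envelope arithmetic shows why the vLLM path would ask for at least two GPUs. The parameter count below is an assumption (Devstral Small is in the ~24B class, not a figure from this thread), and 24 GiB stands in for a typical single consumer card:

```python
# Rough check of why the vLLM instructions ask for at least 2 GPUs.
# ASSUMPTION: a ~24B-parameter dense model served unquantized in bf16;
# activations and KV cache are ignored, so this is a lower bound.

def weights_gib(n_params: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GiB."""
    return n_params * bytes_per_param / 2**30

need = weights_gib(24e9, 2.0)  # 2 bytes per parameter at bf16
print(f"bf16 weights: ~{need:.0f} GiB")  # ~45 GiB before any KV cache

# A single 24 GiB card can't hold that, so the weights get sharded
# (e.g. tensor parallelism) across two or more GPUs.
print("fits on one 24 GiB GPU:", need <= 24)  # False
```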

    4. klooney No.44058363
    AMD is going to be off the beaten path; you're likely to have more success, and less tedious plumbing trouble, with Nvidia.
    replies(1): >>44058385 #
    5. lolinder No.44058385
    Does Nvidia have integrated memory options that allow you to get up to 64GB+ of VRAM without stringing together a bunch of 4090s?

    For local LLMs Apple Silicon has really shown the value of shared memory, even if that comes at the cost of raw GPU power. Even if it's half the speed of an array of GPUs, being able to load the mid-sized models at all is a huge plus.
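    The point about loading mid-sized models at all can be put in rough numbers. A sketch, assuming a ~24B-parameter model (an illustrative figure, not one from the thread) and the usual approximate bytes-per-weight for each format:

```python
# Approximate bytes per weight at common quantization levels.
# ASSUMPTION: a ~24B-parameter model; the q4/q8 figures ignore the small
# per-block overhead that real GGUF quant formats carry.
BYTES_PER_PARAM = {"fp16": 2.0, "q8": 1.0, "q4": 0.5}

def model_gib(n_params: float, fmt: str) -> float:
    """Approximate model size in GiB for a given weight format."""
    return n_params * BYTES_PER_PARAM[fmt] / 2**30

for fmt in ("fp16", "q8", "q4"):
    print(f"{fmt}: ~{model_gib(24e9, fmt):.0f} GiB")
```

    Even at q8 (~22 GiB) the weights alone nearly fill a 24 GiB card before any KV cache, while a 64-128 GiB unified-memory machine loads the same model with headroom, just at lower GPU throughput.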

    replies(2): >>44058797 #>>44058947 #
    6. zackify No.44058499
    M4 Max, 128 GB RAM.

    LM Studio MLX with the full 128k context.

    It works well, but initial prompt processing takes a long time, about a minute.

    I wouldn't buy a laptop for this; I'd wait for the new 32 GB AMD GPU that's coming out.

    If you do want a laptop: I consider even my M4 Max too slow for more than occasional use.

    It runs hot and the battery drains fast. You really have to use it docked for full speed.
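    The memory cost of that 128k context is mostly KV cache. A rough estimate, where the layer count, KV-head count, and head dimension are assumptions typical of models in this size class, not confirmed Devstral figures:

```python
# KV-cache size estimate for long contexts.
# ASSUMPTIONS: 40 layers, 8 KV heads of dim 128 (grouped-query attention,
# plausible for a ~24B model), fp16 cache -- not confirmed Devstral figures.

def kv_cache_gib(ctx: int, layers: int = 40, kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_val: int = 2) -> float:
    """Approximate K+V cache size in GiB for a given context length."""
    per_token = layers * 2 * kv_heads * head_dim * bytes_per_val  # K and V
    return ctx * per_token / 2**30

print(f"131k context: ~{kv_cache_gib(131072):.0f} GiB")  # ~20 GiB on top of the weights
```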

    replies(3): >>44058814 #>>44062894 #>>44112621 #
    7. kookamamie No.44058797
    Not quite, but I do have an RTX 6000 Ada, which has 48GB.
    8. pram No.44058814
    Yep, I have an M4 Max Studio with 128GB of RAM; even the Q8 GGUF fits in memory with 131k context. Memory pressure at 45% lol
    9. karolist No.44058947
    RTX Pro 6000 Blackwell has 96GB VRAM.
    replies(1): >>44061498 #
    10. lolinder No.44061498
    It also costs 4x the entire Framework Desktop for just the card. If you're doing something professional that's probably worth it, but it's not a clear winner in the enthusiast space.
    11. discordance No.44062894
    How many tokens per second are you both getting?
    12. bicepjai No.44112621
    Do you also have a tokens-per-second metric?