    899 points georgehill | 12 comments
    samwillis ◴[] No.36216196[source]
    ggml and llama.cpp are such good platforms for local LLMs; having some financial backing to support development is brilliant. We should be concentrating as much as possible on doing local inference (and training) based on private data.

    I want a local ChatGPT fine-tuned on my personal data, running on my own device, not in the cloud. Ideally open source too; llama.cpp is looking like the best bet to achieve that!

    replies(6): >>36216377 #>>36216465 #>>36216508 #>>36217604 #>>36217847 #>>36221973 #
    1. brucethemoose2 ◴[] No.36216377[source]
    If MeZO gets implemented, we are basically there: https://github.com/princeton-nlp/MeZO
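
    Roughly, the trick is a zeroth-order (SPSA-style) estimator: two forward passes with a shared random perturbation stand in for backprop, so no gradient or optimizer buffers are needed. A minimal sketch of the idea on a toy problem (names like loss_fn, eps, lr are illustrative, not the repo's API):

        # Minimal sketch of a MeZO-style zeroth-order step on a toy problem.
        # Two forward passes with a shared random direction z estimate the
        # projected gradient; the update moves along z only.
        import numpy as np

        def mezo_step(theta, loss_fn, eps=1e-3, lr=1e-2, seed=0):
            z = np.random.default_rng(seed).standard_normal(theta.shape)
            loss_plus = loss_fn(theta + eps * z)       # forward pass 1
            loss_minus = loss_fn(theta - eps * z)      # forward pass 2
            grad_proj = (loss_plus - loss_minus) / (2 * eps)
            return theta - lr * grad_proj * z

        theta = np.ones(4)
        for step in range(2000):                       # toy "fine-tune" toward zero
            theta = mezo_step(theta, lambda p: float(np.sum(p ** 2)), seed=step)
        print(theta)
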
    replies(1): >>36216988 #
    2. moffkalast ◴[] No.36216988[source]
    Basically there, with what kind of VRAM and processing requirements? I doubt anyone running on a CPU can fine tune in a time frame that doesn't give them an obsolete model when they're done.
    replies(1): >>36217136 #
    3. nl ◴[] No.36217136[source]
    According to the paper it fine tunes at the speed of inference (!!)

    This would make fine-tuning a quantized 13B model achievable in ~0.3 seconds per training example on a CPU.

    replies(6): >>36217261 #>>36217324 #>>36217354 #>>36217827 #>>36218026 #>>36218841 #
    4. moffkalast ◴[] No.36217261{3}[source]
    Wow, if that's true then it's genuinely a complete game changer for LLMs as a whole. You probably mean more like 0.3s per token, not per example, but that's still more like one or two minutes per training case, not a day for 4 cases like it is now.
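
    Quick sanity check on that estimate (the example length is an assumption on my part):

        # 0.3 s per token over an assumed 200-400-token training example
        # works out to roughly one to two minutes per example.
        for tokens in (200, 400):
            print(f"{tokens} tokens -> {0.3 * tokens / 60:.1f} min")  # 1.0 min, 2.0 min
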
    5. valval ◴[] No.36217324{3}[source]
    I think more importantly, what would the fine tuning routine look like? It's a non-trivial task to dump all of your personal data into any LLM architecture.
    6. f_devd ◴[] No.36217354{3}[source]
    MeZO assumes a smooth parameter space, so you probably won't be able to do it with INT4/8 quantization; it likely needs fp8 or smoother.
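
    A tiny illustration of why (the step size and perturbation scale are made-up numbers): an eps-sized MeZO perturbation can be smaller than one quantization step, so it simply rounds away.

        # A MeZO-scale perturbation smaller than one quantization step
        # disappears entirely after rounding, so the two forward passes see
        # identical weights and the gradient estimate is zero.
        import numpy as np

        def quantize(x, scale):
            return np.round(x / scale) * scale     # symmetric uniform quantizer

        scale = 0.01                               # assumed quantization step size
        w = np.array([0.12, -0.34, 0.56])          # a few weights
        eps_z = 1e-4 * np.array([1.0, -1.0, 1.0])  # MeZO-style perturbation
        print(quantize(w + eps_z, scale) - quantize(w, scale))  # [0. 0. 0.]
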
    7. isoprophlex ◴[] No.36217827{3}[source]
    If you go through the drudgery of integrating with all the existing channels (mail, Teams, Discord, Slack, traditional social media, texts, ...), such rapid fine-tuning speeds could enable an always-up-to-date personality construct, modeled on you.

    Which is my personal holy grail towards making myself unnecessary; it'd be amazing to be doing some light gardening while the bot handles my coworkers ;)

    replies(2): >>36217987 #>>36221420 #
    8. ◴[] No.36217987{4}[source]
    9. gliptic ◴[] No.36218026{3}[source]
    I cannot find any such numbers in the paper. What the paper says is that MeZO converges much slower than SGD, and each step needs two forward passes.

    "As a limitation, MeZO takes many steps in order to achieve strong performance."

    10. sp332 ◴[] No.36218841{3}[source]
    It's the same memory footprint as inference. It's not that fast, and the paper mentions some optimizations that could still be done.
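
    As I understand it, the inference-sized footprint comes from never materialising the perturbation or any gradient buffers: the random direction can be regenerated from its seed, so the weights are perturbed and restored in place. A sketch of that trick, again on a toy loss with illustrative names:

        # Perturb weights in place with a seeded RNG, then replay the same
        # seed to undo the perturbation and apply the update, so no extra
        # copy of the noise or the weights is ever stored.
        import numpy as np

        def perturb_inplace(theta, seed, scale):
            theta += scale * np.random.default_rng(seed).standard_normal(theta.shape)

        theta = np.ones(4)
        seed, eps, lr = 42, 1e-3, 1e-2
        loss = lambda p: float(np.sum(p ** 2))

        perturb_inplace(theta, seed, +eps)                  # now at theta + eps*z
        loss_plus = loss(theta)
        perturb_inplace(theta, seed, -2 * eps)              # now at theta - eps*z
        loss_minus = loss(theta)
        grad_proj = (loss_plus - loss_minus) / (2 * eps)
        perturb_inplace(theta, seed, eps - lr * grad_proj)  # back to theta, then step along z
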
    replies(1): >>36220688 #
    11. nl ◴[] No.36220688{4}[source]
    Yes you are right.

    I completely misread that!

    12. vgb2k18 ◴[] No.36221420{4}[source]
    > while the bot handles my coworkers

    Or it handles their bots ;)