Does anyone have a good understanding of how 2B models can be useful in production? What tasks are you using them for? I wonder what tasks you can fine-tune them on to get 95-99% results (if anything).
replies(7):
I mean, they could be better (to put it nicely), but there is a legitimate use-case for them and I'd love to see more work in this space.
https://machinelearning.apple.com/research/introducing-apple...
From a draft article of mine, experimenting with open-source text embeddings (the scores are distances, so lower means a closer match):
./match venture capital
purchase 0.74005488647684
sale 0.80926752301733
place 0.81188663814236
positive sentiment 0.90793311875207
negative sentiment 0.91083707598925
time 0.9108697315425
./store silicon valley
./match venture capital
silicon valley 0.7245139487301
purchase 0.74005488647684
sale 0.80926752301733
place 0.81188663814236
positive sentiment 0.90793311875207
negative sentiment 0.91083707598925
time 0.9108697315425
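
For anyone who wants to play with the same idea, here's a minimal sketch of the store/match loop in Python. To be clear, everything in it is my assumption, not the commenter's actual tool: I'm using the sentence-transformers library with the all-MiniLM-L6-v2 model, and treating scores as cosine distance (lower = closer), which is how the output above is ordered.

    # Minimal sketch of the ./store / ./match demo above.
    # Assumptions (mine, not the original poster's): the sentence-transformers
    # library, the all-MiniLM-L6-v2 model, and cosine distance (lower = closer).
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # The "stored" terms, mirroring the session above.
    store = ["silicon valley", "purchase", "sale", "place",
             "positive sentiment", "negative sentiment", "time"]

    def match(query: str) -> None:
        # Embed the query and every stored term, then print the terms
        # sorted by cosine distance, closest first.
        q = model.encode(query, convert_to_tensor=True)
        docs = model.encode(store, convert_to_tensor=True)
        dists = (1 - util.cos_sim(q, docs))[0]
        for term, d in sorted(zip(store, dists.tolist()), key=lambda p: p[1]):
            print(f"{term} {d:.5f}")

    match("venture capital")

The exact numbers will differ from the session above since we don't know which embedding model produced it, but the ranking should come out similar.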
Of course, you need to figure out what these black boxes actually understand. For sentiment analysis, for example, instead of matching against "positive" and "negative" you might get better results with anchor terms like "kawaii" and "student debt", depending on how the embedding model internalized positives and negatives from its training data.
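
To make that concrete, here's a hedged sketch of sentiment-as-nearest-anchor on top of the same assumed setup as the previous snippet. The anchor terms are the comment's own examples; per the point being made, you'd have to probe what a given model actually associates with positive/negative before trusting them.

    # Sentiment as nearest-anchor matching, continuing the sketch above.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model, as above
    anchors = {"kawaii": "positive", "student debt": "negative"}

    def sentiment(text: str) -> str:
        # Embed the text and the anchor terms, then return the label
        # of whichever anchor is most similar.
        t = model.encode(text, convert_to_tensor=True)
        a = model.encode(list(anchors), convert_to_tensor=True)
        sims = util.cos_sim(t, a)[0]  # higher similarity = closer anchor
        best = list(anchors)[int(sims.argmax())]
        return anchors[best]

    print(sentiment("this plush toy is adorable"))  # expected: positive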