Could anyone break down the steps further?
I mean, they could be better (to put it nicely), but there is a legitimate use-case for them and I'd love to see more work in this space.
https://machinelearning.apple.com/research/introducing-apple...
From an article I have in draft, experimenting with open-source text embeddings:
./match venture capital
purchase 0.74005488647684
sale 0.80926752301733
place 0.81188663814236
positive sentiment 0.90793311875207
negative sentiment 0.91083707598925
time 0.9108697315425
./store sillicon valley
./match venture capital
sillicon valley 0.7245139487301
purchase 0.74005488647684
sale 0.80926752301733
place 0.81188663814236
positive sentiment 0.90793311875207
negative sentiment 0.91083707598925
time 0.9108697315425
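For reference, here is a minimal sketch of how a store/match tool like the one above could work. This is not the original script: the embedding model ("all-MiniLM-L6-v2" via sentence-transformers) and the metric (cosine distance, smaller = closer) are assumptions, so the absolute scores won't match the transcript.

    # store_match.py -- hypothetical reconstruction, not the original tool
    # pip install sentence-transformers numpy
    import sys, json, os
    import numpy as np
    from sentence_transformers import SentenceTransformer

    MODEL = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model, not the author's
    DB = "store.json"                                # assumed storage format

    def load():
        if not os.path.exists(DB):
            return {}
        with open(DB) as f:
            return json.load(f)

    def store(term):
        db = load()
        db[term] = MODEL.encode(term).tolist()  # embed the term and persist it
        with open(DB, "w") as f:
            json.dump(db, f)

    def match(query):
        q = MODEL.encode(query)
        # cosine distance: 0 = same direction, larger = less similar
        dists = {}
        for term, vec in load().items():
            v = np.array(vec)
            dists[term] = 1 - float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
        for term, d in sorted(dists.items(), key=lambda kv: kv[1]):
            print(term, d)

    if __name__ == "__main__":
        cmd, text = sys.argv[1], " ".join(sys.argv[2:])
        store(text) if cmd == "store" else match(text)

Usage would be something like "python store_match.py store sillicon valley" followed by "python store_match.py match venture capital"; the ranking should behave like the transcript above, but the numbers depend entirely on which embedding model is used.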
Of course you need to figure out what these black boxes understand. For sentiment analysis, for example, instead of having it match against "positive" and "negative" you might make the matching terms "kawaii" and "student debt", depending on how the text embedding internalized positives and negatives from its training data.
Custom silicon would solve that, but nobody wants to build custom silicon for a data format that will go out of fashion before the production run is done.
I know it's not ChatGPT-4, but I've tried other very small models that run on CPU only and had better results.
Maybe you can share some comparative examples?
Here's the same prompt given to smollm2:135m.
The quality of the second set of results is not fantastic: the data isn't public, and it repeats itself, mentioning income a few times. I don't think I would use either of these models for accurate data, but I was surprised at the truncated results from bitnet.
Smollm2:360M returned better-quality results with no repetition, but it did suggest things which didn't fit the brief exactly (public data given location only).
Edit:
I tried the same query on the live demo site and got much better results. Maybe something went wrong on my end?
>Marine Le Pen, a prominent figure in France, won the 2017 presidential election despite not championing neoliberalism. Several factors contributed to her success: (…)
What data did they train their model on?
By using 4 ternary weights per 8 bits, the model is not quite as space-efficient as it could be in terms of information density: each ternary weight carries log2(3) ≈ 1.58 bits, so (4*1.58)/8 ≈ 0.79, versus (5*1.58)/8 ≈ 0.99 for the densest packing (3^5 = 243 values fit in one byte). There is currently no hardware acceleration for operating on 5 trits packed into 8 bits, so the weights have to be packed and unpacked in software, and packing 5 weights into 8 bits requires slower, more complex packing/unpacking algorithms than the simple 2-bits-per-weight layout.
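To make the trade-off concrete, here is a small illustration (not bitnet's actual code) of the two schemes: 4 trits per byte in fixed 2-bit fields, which unpack with shifts and masks, versus 5 trits per byte encoded as a base-3 number, which needs a divide/modulo per weight.

    # Hypothetical illustration of the two packing schemes; not bitnet's actual code.
    # Ternary weights {-1, 0, +1} are stored here as {0, 1, 2}.

    def pack4(trits):
        # 4 trits -> 1 byte, 2 bits each; the value 3 is wasted, hence ~0.79 density
        assert len(trits) == 4
        b = 0
        for i, t in enumerate(trits):
            b |= (t & 0b11) << (2 * i)  # plain shift/mask
        return b

    def unpack4(b):
        return [(b >> (2 * i)) & 0b11 for i in range(4)]

    def pack5(trits):
        # 5 trits -> 1 byte as a base-3 number; 3**5 = 243 <= 256, ~0.99 density
        assert len(trits) == 5
        b = 0
        for t in reversed(trits):
            b = b * 3 + t
        return b

    def unpack5(b):
        out = []
        for _ in range(5):
            out.append(b % 3)  # per-weight divide/modulo, no SIMD fast path
            b //= 3
        return out

    assert unpack4(pack4([0, 1, 2, 1])) == [0, 1, 2, 1]
    assert unpack5(pack5([2, 0, 1, 1, 2])) == [2, 0, 1, 1, 2]

The base-3 layout buys one extra weight per byte (the jump from ~0.79 to ~0.99 density above), but the per-weight division, or a lookup table standing in for it, is what current hardware has no fast path for, whereas the 2-bit layout maps directly onto shift-and-mask operations that existing SIMD units handle well.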
Context: https://web.archive.org/web/20030830105202/http://www.catb.o...