←back to thread

203 points amazonhut | 1 comments | | HN request time: 0.236s | source
Show context
ndai ◴[] No.45248065[source]
I’m curious where you got your training data? I will look myself, but saw this and thought I’d ask. I have a CPU-first, no-backprop architecture that works very well on classification datasets. It can do single‑example incremental updates which might be useful for continuous learning. I made a toy demo to train on tiny.txt and it can predict next characters, but I’ve never tried to make an LLM before. I think my architecture might work well as an on-device assistant or for on-premises needs, but I want to work with it more before I embarrass myself. Any open-source LLM training datasets you would recommend?
replies(2): >>45248077 #>>45248081 #
1. kachapopopow ◴[] No.45248077[source]
huggingface has plenty of openai and antrophic user to assistant chains, beware there are dragons (hallucinations), but good enough for instruction training. I actually recommend distilling kimi k2 instead for instruction following capabilities.