How many epochs did you train for? 100k hours is not a lot for an LLM; feels like the bitter lesson applies here.
replies(1):
It's also a tiny model by LLM standards, at 150M parameters. The goal wasn't really to reach state of the art but to show how vastly the performance of a single language model architecture can differ when you just change the tokenizer. A rough sketch of that setup is below.
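For illustration only, here's a minimal sketch of the general idea (hold the model architecture fixed, vary only the tokenizer), assuming a Hugging Face-style text setup; the tokenizer names and model sizes are hypothetical, not from this thread:

```python
# Illustrative sketch (hypothetical setup, not the authors' code):
# build the same small GPT-2-style architecture twice, changing only
# the tokenizer, then train each under an identical recipe and compare.
from transformers import AutoTokenizer, GPT2Config, GPT2LMHeadModel

def build_model(tokenizer_name: str) -> GPT2LMHeadModel:
    tok = AutoTokenizer.from_pretrained(tokenizer_name)
    config = GPT2Config(
        vocab_size=len(tok),                 # the only run-to-run difference
        n_embd=768, n_layer=12, n_head=12,   # architecture held fixed
    )
    return GPT2LMHeadModel(config)

model_a = build_model("gpt2")               # BPE tokenizer
model_b = build_model("bert-base-uncased")  # WordPiece tokenizer
```

Since the embedding and output layers scale with vocabulary size, the parameter counts differ slightly between runs even with the backbone fixed, which is worth reporting alongside any comparison.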