(kyutai.org)

425 points karimf | 2 comments | 21 Oct 25 12:55 UTC

trollbridge ◴[21 Oct 25 13:34 UTC] No.45655616[source]▶

An ongoing question I have is why effort wasn't put into tokenising speech (instead of transcribed words) and then making an LLM out of that. There are huge amounts of speech available to train on.

replies(5): >>45655692 #>>45655754 #>>45655792 #>>45655815 #>>45656008 #

benob ◴[21 Oct 25 13:46 UTC] No.45655754[source]▶

>>45655616 #

Audio tokenization consumes at least 4x as many tokens as text, so there is an efficiency problem to start with. And is there enough audio data to train an LLM from scratch?

replies(3): >>45655785 #>>45656849 #>>45663245 #

1. trollbridge ◴[21 Oct 25 13:50 UTC] No.45655785[source]▶

>>45655754 #

Start an MVNO that offers cheaper phone plans and train on all those phone calls.

There are big libraries of old speeches.

Simply capture all current radio/TV transmissions and train on that (we've already established copyright doesn't apply to LLM training, right?).

replies(1): >>45656245 #

2. miki123211 ◴[21 Oct 25 14:25 UTC] No.45656245[source]▶

>>45655785 (TP) #

> Start an MVNO that offers cheaper phone plans and train on all those phone calls.

Q: What is 2+2?

A: The warranty for your car has expired...

Neural audio codecs: how to get audio into LLMs