Show HN: Dia, an open-weights TTS model for generating realistic dialogue

(github.com)

652 points toebee | 1 comments | 21 Apr 25 17:07 UTC

toebee ◴[21 Apr 25 17:07 UTC] No.43754125[source]▶

Hey HN! We’re Toby and Jay, creators of Dia. Dia is a 1.6B-parameter open-weights model that generates dialogue directly from a transcript.

Unlike TTS models that generate each speaker turn and stitch them together, Dia generates the entire conversation in a single pass. This makes it faster, more natural, and easier to use for dialogue generation.

It also supports audio prompts — you can condition the output on a specific voice/emotion and it will continue in that style.

Demo page comparing it to ElevenLabs and Sesame-1B: https://yummy-fir-7a4.notion.site/dia

We started this project after falling in love with NotebookLM’s podcast feature. But over time, the voices and content started to feel repetitive. We tried to replicate the podcast feel with existing TTS APIs, but the results did not sound like human conversation.

So we decided to train a model ourselves. We had no prior experience with speech models and had to learn everything from scratch, from large-scale training to audio tokenization. It took us a bit over 3 months.

Our work is heavily inspired by SoundStorm and Parakeet. We plan to release a lightweight technical report to share what we learned and accelerate research.

We’d love to hear what you think! We are a tiny team, so open source contributions are extra welcome. Please feel free to check out the code and share any thoughts or suggestions with us.

replies(11): >>43754718 #>>43754758 #>>43755567 #>>43756264 #>>43756302 #>>43757244 #>>43757317 #>>43757653 #>>43758343 #>>43758672 #>>43768981 #

dangoodmanUT ◴[21 Apr 25 23:38 UTC] No.43757653[source]▶

>>43754125 #

I know it’s taboo to ask, but I must: where’s the dataset from? I'm very eager to play around with audio models myself, but I find existing datasets limiting.

replies(2): >>43758141 #>>43765198 #

zelphirkalt ◴[22 Apr 25 01:01 UTC] No.43758141[source]▶

>>43757653 #

Why would that be a taboo question to ask? It should be the question we always ask when presented with a model, and in some cases we should probably reject the model based on that information.

replies(1): >>43758318 #

dangoodmanUT ◴[22 Apr 25 01:35 UTC] No.43758318[source]▶

>>43758141 #

Because generally the person asking this question is trying to cancel the model maker.

replies(3): >>43758511 #>>43760208 #>>43773535 #

fennecfoxy ◴[23 Apr 25 15:52 UTC] No.43773535[source]▶

>>43758318 #

Well, presumably, since they're individuals and not a business, the legal consequences are much less severe. Public opinion still won't be great, but since when was it ever, for any new thing?

If I cut up a song or TV show & put it on YouTube (and screech about fair use/parody law), then that's fine, but people will balk at something like this.

AI is here, people.