Play 3.0 mini – A lightweight, reliable, cost-efficient Multilingual TTS model

(play.ht)

258 points amrrs | 2 comments | 14 Oct 24 19:16 UTC

phkahler [14 Oct 24 20:10 UTC] No.41841397

Sounds quite good, but this prompt is NOT what I'd expect an automated system to feed into it:

“I’ve successfully processed your order and I’d like to confirm your product ID. It is A as in Alpha, 1, 2, 3, B as in Bravo, 5, 6, 7, Z as in Zulu, 8, 9, 0, X as in X-ray.”

Phone numbers and other examples were read nicely, but apparently a string of alphanumerics for an order number isn't handled well yet.
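
For illustration, the kind of expansion the quoted prompt already contains could be done by a preprocessing step, so that the caller only passes the raw ID. This is a minimal sketch under that assumption; `spell_out_id` and the `NATO` table are hypothetical names and are not part of Play.ht's API or pipeline.

```python
import string

# Hypothetical lookup table: letter -> spelling-alphabet word.
NATO = {
    "A": "Alpha", "B": "Bravo", "C": "Charlie", "D": "Delta", "E": "Echo",
    "F": "Foxtrot", "G": "Golf", "H": "Hotel", "I": "India", "J": "Juliet",
    "K": "Kilo", "L": "Lima", "M": "Mike", "N": "November", "O": "Oscar",
    "P": "Papa", "Q": "Quebec", "R": "Romeo", "S": "Sierra", "T": "Tango",
    "U": "Uniform", "V": "Victor", "W": "Whiskey", "X": "X-ray",
    "Y": "Yankee", "Z": "Zulu",
}


def spell_out_id(product_id: str) -> str:
    """Expand a raw alphanumeric ID into text that is easy to read aloud."""
    parts = []
    for ch in product_id.upper():
        if ch in NATO:
            # Letters become "A as in Alpha".
            parts.append(f"{ch} as in {NATO[ch]}")
        elif ch in string.digits:
            # Digits are read one at a time.
            parts.append(ch)
        # Anything else (dashes, spaces) is treated as a separator and dropped.
    return ", ".join(parts)


print(spell_out_id("A123B567Z890X"))
```

With a step like this, an automated system would send only `A123B567Z890X` and the spoken form would be generated before synthesis.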

replies(3): >>41841433 #>>41841899 #>>41842302 #

1. diggan [14 Oct 24 21:38 UTC] No.41842302

>>41841397 #

> Phone numbers and others were read nicely

The phone numbers were not naturally read at all. A human would have read a grouping of 123-456-789 like "123", "456", "789", but instead the model generated something like "123", "45", "6789". Listen to the RSVP example again and you'll know what I mean. The pacing is generally off for normal text too, but extra noticeable for the numbers.

My hunch would be that it's because of tokenization, but I wouldn't be able to say that's the issue for sure. Sounds like it though :)

replies(1): >>41851695 #

2. bryananderson [15 Oct 24 18:39 UTC] No.41851695

>>41842302 (TP) #

In this case it’s not tokenization. I wrote the text preprocessing code that deals with spacing these numbers. This is good feedback. It’s optimized for US-style 10-digit phone numbers, and it should be more flexible than that. For example, if I was reading a US phone number such as (123) 456-7890 over the phone and wanted to make sure it was heard correctly, I’d say “123”, “456”, “78”, “90”. But a 9-digit phone number should be spaced as you said.
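
The grouping rule described above can be sketched as follows. This is only an illustration of the two cases mentioned in the thread (10 digits as 3-3-2-2, 9 digits as 3-3-3), not the actual preprocessing code; `group_phone_digits` is a hypothetical name, and the fallback for other lengths is an assumption.

```python
import re


def group_phone_digits(text: str) -> list[str]:
    """Split a phone number into the chunks a person would read aloud."""
    # Keep only the digits, dropping parentheses, spaces, and dashes.
    digits = re.sub(r"\D", "", text)

    if len(digits) == 10:
        sizes = [3, 3, 2, 2]   # US style: "123", "456", "78", "90"
    elif len(digits) == 9:
        sizes = [3, 3, 3]      # "123", "456", "789"
    else:
        # Assumption: other lengths are left as a single group.
        return [digits] if digits else []

    groups = []
    pos = 0
    for size in sizes:
        groups.append(digits[pos:pos + size])
        pos += size
    return groups


print(group_phone_digits("(123) 456-7890"))
print(group_phone_digits("123-456-789"))
```

Each group could then be joined with a pause marker or comma before the text is handed to the model, so that the pacing follows the grouping rather than the raw digit string.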