
678 points georgemandis | 1 comment
b0a04gl No.44378320
it's still decoding the same speech and matching phonemes either way, but speeding the audio up reduces how many seconds they bill you for. so you may be hacking their billing logic more than the model itself.
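here's the billing arithmetic as a rough sketch. it assumes the bill is linear in the duration of the uploaded file, and the $0.006/minute rate is just an assumed placeholder, so check your provider's actual pricing and rounding rules.

```python
# assumed per-minute rate for illustration only; not an authoritative price
PRICE_PER_MINUTE = 0.006


def billed_cost(duration_s: float, speed: float = 1.0,
                price_per_minute: float = PRICE_PER_MINUTE) -> float:
    """Cost of transcribing `duration_s` seconds of audio sped up by `speed`x.

    Assumes billing is proportional to the sped-up file's length, so the
    bill scales with 1 / speed.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    billed_seconds = duration_s / speed  # a 2x file is half as long
    return billed_seconds / 60 * price_per_minute


if __name__ == "__main__":
    one_hour = 3600
    for speed in (1.0, 1.5, 2.0, 3.0):
        print(f"{speed:.1f}x -> ${billed_cost(one_hour, speed):.4f}")
```

under this model the saving is 1 - 1/speed, so 2x halves the bill and a 40% saving corresponds to roughly 1.67x. whether accuracy holds up at those speeds is a separate question.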

it also means the longer you talk, the more you pay, even if you're conveying the same amount of information. so if your speech has long pauses or you talk slowly, you may be subsidizing inefficiency.
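one cheap way to stop paying for pauses is to cut silent stretches before upload. a minimal sketch, assuming mono samples normalized to [-1.0, 1.0] and a simple RMS energy gate; `trim_silence`, the 20 ms frame size, and the 0.01 threshold are all hypothetical choices, and a real pipeline would more likely use a proper voice activity detector.

```python
import math
from typing import List, Sequence


def trim_silence(samples: Sequence[float], sample_rate: int,
                 frame_ms: int = 20, rms_threshold: float = 0.01) -> List[float]:
    """Drop fixed-size frames whose RMS energy falls below `rms_threshold`.

    Assumes mono PCM samples in [-1.0, 1.0]. Frame size and threshold are
    illustrative defaults, not tuned values.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    kept: List[float] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= rms_threshold:  # keep only frames loud enough to be speech
            kept.extend(frame)
    return kept


if __name__ == "__main__":
    sr = 8000
    # 1 s of a 440 Hz tone as a stand-in for speech, then a 1 s pause
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
    audio = tone + [0.0] * sr
    trimmed = trim_silence(audio, sr)
    print(len(audio) / sr, "->", len(trimmed) / sr, "seconds")  # prints 2.0 -> 1.0 seconds
```

cutting every quiet frame also removes the short gaps between words, which could hurt transcription, so in practice you'd probably only collapse pauses longer than some minimum instead of dropping all of them.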

makes me think the next big compression might be in delivery cadence: auto-optimize tone and pacing before sending the audio to the LLM. feed it fast, flat synthetic speech with no emotion, just dense words. you lose human warmth but might save something like 40% on cost.