425 points karimf | 1 comment
1. liqilin1567 No.45666380
Out of curiosity, would it be possible to attach pitch, emotion, and tone information to each word as text-based metadata during ASR, so that the ASR output retains this metadata?
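
To make the question concrete, here is a rough sketch of what such per-word metadata could look like once serialized as text. Everything in it is hypothetical: the `AnnotatedWord` fields, the `to_tagged_text` helper, the bracketed tag format, and the sample values are made up for illustration and are not taken from any real ASR system.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedWord:
    """One recognized word plus hypothetical prosody labels."""
    text: str
    pitch_hz: float  # assumed: mean fundamental frequency over the word
    emotion: str     # assumed: label from some separate classifier
    tone: str        # assumed: e.g. "rising", "falling", "flat"


def to_tagged_text(words: list[AnnotatedWord]) -> str:
    """Serialize words as plain text with an inline metadata tag per word."""
    return " ".join(
        f"{w.text}[pitch={w.pitch_hz:.0f}Hz|emotion={w.emotion}|tone={w.tone}]"
        for w in words
    )


# Made-up sample values, only to show the shape of the output.
words = [
    AnnotatedWord("really", 220.4, "surprised", "rising"),
    AnnotatedWord("nice", 180.0, "happy", "falling"),
]
print(to_tagged_text(words))
```

The point of an inline format like this is that anything downstream that only consumes text would still see the pitch, emotion, and tone labels next to each word. Whether an ASR model can produce reliable labels at word granularity is a separate question.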