This is great, thank you for sharing. I work on these APIs at OpenAI, it's a surprise to me that it still works reasonably well at 2/3x speed, but on the other hand for phone channels we get 8khz audio that is upsampled to 24khz for the model and it still works well. Note there's probably a measurable decrease in transcription accuracy that worsens as you deviate from 1x speed. Also we really need to support bigger/longer file uploads :)
replies(2):