OpenAI charges by the minute, so speed up your audio

(george.mand.is)

692 points georgemandis | 1 comments | 25 Jun 25 13:17 UTC


w-m ◴[25 Jun 25 15:21 UTC] No.44378345[source]▶

By transcribing a talk by Andrej, you already picked the most challenging case possible, speed-wise. His natural talking speed is already >=1.5x that of a normal human. He's one of those speakers for whom you absolutely have to set your YouTube speed back down to 1x to follow what's going on.

In the spirit of getting more out of an OpenAI minute, don't send it any silence.

E.g.

    ffmpeg -i video-audio.m4a \
      -af "silenceremove=start_periods=1:start_duration=0:start_threshold=-50dB:\
                         stop_periods=-1:stop_duration=0.02:stop_threshold=-50dB,\
                         apad=pad_dur=0.02" \
      -c:a aac -b:a 128k output_minpause.m4a -y

will cut the talk down from 39m31s to 31m34s, by replacing any silence (with a -50dB threshold) longer than 20ms with a 20ms pause. And to keep with the spirit of your post, I measured only that the input file got shorter; I didn't look at all at the quality of the transcription you get from feeding it the shorter version.
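To put that reduction in numbers, here is a small sketch that converts the two durations quoted above into seconds and reports the difference. The helper name `to_seconds` is made up for illustration, and it only handles the `<minutes>m<seconds>s` form used in this comment.

```shell
# Hypothetical helper: convert a "39m31s"-style duration to seconds.
# Only handles the <minutes>m<seconds>s form, without leading zeros.
to_seconds() {
  _m=${1%%m*}        # text before the first "m"  -> minutes
  _rest=${1#*m}      # text after the first "m"   -> e.g. "31s"
  _s=${_rest%s}      # drop the trailing "s"      -> seconds
  echo $(( _m * 60 + _s ))
}

before=$(to_seconds 39m31s)    # original talk length
after=$(to_seconds 31m34s)     # length after silenceremove
saved=$(( before - after ))

# Integer arithmetic only, so the percentage is rounded down.
echo "saved ${saved}s ($(( saved / 60 ))m$(( saved % 60 ))s), $(( saved * 100 / before ))% shorter"
# prints: saved 477s (7m57s), 20% shorter
```

So roughly a fifth of the billed minutes were silence, at least at this threshold.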

replies(12): >>44378492 #>>44378769 #>>44378939 #>>44378971 #>>44380884 #>>44380906 #>>44381352 #>>44382788 #>>44382864 #>>44384720 #>>44388923 #>>44388970 #

georgemandis ◴[25 Jun 25 15:33 UTC] No.44378492[source]▶

>>44378345 #

Oooh fun! I had a feeling there was more ffmpeg wizardry I could be leaning into here. I'll have to try this later—thanks for the idea!

replies(1): >>44378587 #

w-m ◴[25 Jun 25 15:43 UTC] No.44378587[source]▶

>>44378492 #

In the meantime I realized that the apad part is nonsensical - it pads the end of the stream, not at each silence-removed cut. I wanted to get angry at o3 for proposing this, but then I had a look at the silenceremove= documentation myself: https://ffmpeg.org/ffmpeg-filters.html#silenceremove

Good god. You couldn't make that any more convoluted and hard-to-grasp if you wanted to. You gotta love ffmpeg!

I now think this might be a good solution:

    ffmpeg -i video-audio.m4a \
           -af "silenceremove=start_periods=1:stop_periods=-1:stop_duration=0.15:stop_threshold=-40dB:detection=rms" \
           -c:a aac -b:a 128k output.m4a -y

replies(3): >>44379478 #>>44380996 #>>44388001 #

1. dylan604 ◴[26 Jun 25 14:48 UTC] No.44388001[source]▶

>>44378587 #

if you did it in 2 passes, you could find the cut points using silencedetect, use a bunch of -ss/-t/-i based on those segments, and apad each segment with a -filter_complex chain that ends in a concat. it would be a wonderfully gnarly command for very little benefit. but it could be done
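A rough sketch of what that two-pass idea might look like, under several assumptions: the function names `keep_segments` and `build_filter` are hypothetical, the log is assumed to contain the usual `silence_start: X` / `silence_end: Y` lines that silencedetect writes to stderr, and the segments are cut with `atrim` inside the filtergraph rather than with repeated -ss/-t/-i inputs as suggested above. The threshold and duration are borrowed from the earlier command. This has not been run against the actual talk.

```shell
# Pass 1 (assumed invocation, not executed here): log silences to a file.
#   ffmpeg -i video-audio.m4a -af silencedetect=noise=-40dB:d=0.15 -f null - 2> silence.log

# Hypothetical helper: read a silencedetect log on stdin and print
# "start end" pairs for the audio to keep (the gaps between silences).
# Optional $1 is the total duration, used to emit the final segment.
keep_segments() {
  awk -v total="$1" '
    BEGIN { prev = 0 }
    /silence_start:/ {
      for (i = 1; i < NF; i++) if ($i == "silence_start:") start = $(i + 1)
      if (start + 0 > prev + 0) print prev, start   # audio before this silence
    }
    /silence_end:/ {
      for (i = 1; i < NF; i++) if ($i == "silence_end:") prev = $(i + 1)
    }
    END { if (total != "" && total + 0 > prev + 0) print prev, total }
  '
}

# Hypothetical helper: turn "start end" pairs into a -filter_complex string
# that trims each segment, pads it with a short pause, and concatenates.
# Optional $1 is the pad duration in seconds (default 0.02).
build_filter() {
  awk -v pad="${1:-0.02}" '
    {
      printf "[0:a]atrim=start=%s:end=%s,asetpts=PTS-STARTPTS,apad=pad_dur=%s[a%d];", $1, $2, pad, NR - 1
      labels = labels "[a" (NR - 1) "]"
    }
    END { if (NR > 0) printf "%sconcat=n=%d:v=0:a=1[out]\n", labels, NR }
  '
}

# Pass 2 (assumed invocation, not executed here):
#   filter=$(keep_segments 2371 < silence.log | build_filter 0.02)
#   ffmpeg -i video-audio.m4a -filter_complex "$filter" -map "[out]" -c:a aac -b:a 128k output.m4a -y
```

For a talk with hundreds of pauses the generated filtergraph gets very long, which is presumably the "wonderfully gnarly" part.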
