OpenAI charges by the minute, so speed up your audio

(george.mand.is)

692 points georgemandis | 1 comments | 25 Jun 25 13:17 UTC


w-m ◴[25 Jun 25 15:21 UTC] No.44378345

By transcribing a talk by Andrej, you already picked the most challenging case possible, speed-wise: his natural speaking pace is already at least 1.5x that of a typical person. He's one of the people for whom you absolutely have to set YouTube playback back down to 1x to follow what's going on.

In the spirit of getting more out of each OpenAI minute, don't send it any silence.

E.g.

    ffmpeg -i video-audio.m4a \
      -af "silenceremove=start_periods=1:start_duration=0:start_threshold=-50dB:\
                         stop_periods=-1:stop_duration=0.02:stop_threshold=-50dB,\
                         apad=pad_dur=0.02" \
      -c:a aac -b:a 128k output_minpause.m4a -y

cuts the talk down from 39m31s to 31m34s by replacing any silence (below a -50dB threshold) longer than 20ms with a 20ms pause. And in keeping with the spirit of your post, I only measured that the input file got shorter; I didn't look at all at the quality of the transcription produced from the shorter version.
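To make the idea concrete, here is a minimal Python sketch of what "replace any silence longer than 20ms with a 20ms pause" means on raw samples. It is a simplified, per-sample approximation, assuming float samples in [-1.0, 1.0]; ffmpeg's silenceremove is more sophisticated (it measures level over a short window), and the function name `compress_silence` is hypothetical.

```python
def compress_silence(samples, sample_rate, threshold_db=-50.0, max_pause_s=0.02):
    """Shorten every run of near-silent samples to at most max_pause_s seconds.

    Simplified sketch (not ffmpeg's algorithm): a sample counts as silent if
    its absolute amplitude, with full scale = 1.0, is below threshold_db dBFS.
    """
    threshold = 10 ** (threshold_db / 20)              # -50 dBFS ~= 0.0032 linear
    max_pause = int(round(sample_rate * max_pause_s))  # 20 ms expressed in samples
    out = []
    silent_run = 0
    for x in samples:
        if abs(x) < threshold:
            silent_run += 1
            # Keep only the first max_pause samples of each silent stretch.
            if silent_run <= max_pause:
                out.append(x)
        else:
            silent_run = 0
            out.append(x)
    return out


# Toy signal at 1 kHz: 0.1 s "speech", 0.5 s of silence, 0.1 s "speech".
sr = 1000
audio = [0.5] * 100 + [0.0] * 500 + [0.5] * 100
trimmed = compress_silence(audio, sr)
print(len(audio) / sr, "s ->", len(trimmed) / sr, "s")  # prints: 0.7 s -> 0.22 s
```

Silent stretches shorter than 20ms pass through untouched, so brief gaps such as zero crossings inside speech are not affected. For scale, the 39m31s → 31m34s result above is 2371 s → 1894 s, roughly 20% fewer minutes to pay for.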


jwrallie ◴[26 Jun 25 06:30 UTC] No.44384720

>>44378345

In my own experience with whisper.cpp, normalizing the audio and removing silence not only shortens processing time significantly but also improves transcription quality a lot, since silence can trigger hallucinations. You can also do this graphically in Audacity if you don't want to deal with the command line. You don't need any special hardware to run whisper.cpp either: with the small model, just about any computer can do it if you're willing to wait a bit (still less than the length of the audio).
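A minimal sketch of that preprocessing, written as a Python function that builds an ffmpeg command. Assumptions: it reuses the silenceremove settings from the parent comment, uses ffmpeg's `loudnorm` filter with default targets for normalization (the comment doesn't say which normalization was used), and converts to the 16 kHz mono 16-bit WAV that whisper.cpp's README asks for. The function name and file names are hypothetical.

```python
import shlex

# Silence settings copied from the parent comment's ffmpeg one-liner.
SILENCE_FILTER = (
    "silenceremove=start_periods=1:start_duration=0:start_threshold=-50dB:"
    "stop_periods=-1:stop_duration=0.02:stop_threshold=-50dB"
)


def build_whisper_prep_cmd(src: str, dst: str) -> list[str]:
    """Return an ffmpeg argv that trims silence, normalizes loudness and
    converts to 16 kHz mono 16-bit PCM WAV for whisper.cpp (hypothetical helper)."""
    audio_filter = ",".join([
        SILENCE_FILTER,       # collapse pauses longer than 20 ms
        "apad=pad_dur=0.02",  # trailing 20 ms pad, as in the parent comment
        "loudnorm",           # EBU R128 loudness normalization, default targets
    ])
    return [
        "ffmpeg", "-y", "-i", src,
        "-af", audio_filter,
        "-ar", "16000",       # whisper.cpp expects 16 kHz...
        "-ac", "1",           # ...mono...
        "-c:a", "pcm_s16le",  # ...16-bit PCM WAV
        dst,
    ]


if __name__ == "__main__":
    cmd = build_whisper_prep_cmd("meeting.m4a", "meeting_16k.wav")
    print(shlex.join(cmd))
```

Silence is removed before normalization so the -50dB threshold applies to the original levels rather than to a boosted noise floor. To actually run it, pass the list to `subprocess.run(cmd, check=True)` and feed the resulting WAV to whisper.cpp.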

One half-interesting, half-depressing observation: at my workplace, every meeting recording I transcribed this way shrank to almost 2/3 of its original length once the silence was cut. It makes you think about the efficiency (or lack of it) of holding long(ish) meetings.


dogprez ◴[26 Jun 25 15:43 UTC] No.44388493

>>44384720

Others have pointed out the value of silence, but I just wanted to say it saddens me when humanity is misclassified as inefficiency. The other day Sam Altman joked about how much energy is wasted by people saying "thanks" to ChatGPT. The corollary is how much human energy is "wasted" on humans thanking each other. When you judge something as inefficient, you are really judging what is valuable, and that is a very biased judgement that isn't necessarily aligned with what makes us thrive. =) (<-- a wasteful smiley)


Philip-J-Fry ◴[26 Jun 25 17:35 UTC] No.44389488

>>44388493

Well, humans saying thanks to each other isn't wasted energy. It has a real effect on our relationships.

People say thank you to AI because these systems are presented as human-like chatbots, but in reality it has almost no effect on how well they respond to our queries.

Saying thank you to ChatGPT is no less wasteful than saying thank you to Windows for opening the calculator.

I don't think anyone is trying to draw any parallels between that inefficiency and real humans saying thank you?
