OpenAI charges by the minute, so speed up your audio

(george.mand.is)

w-m ◴[25 Jun 25 15:21 UTC] No.44378345[source]▶

By transcribing a talk by Andrej, you already picked the most challenging case possible, speed-wise. His natural talking speed is already >=1.5x that of a normal human. He is one of those people for whom you absolutely have to set your YouTube speed back down to 1x to follow what's going on.

In the spirit of getting more out of an OpenAI minute, don't send it any silence.

E.g.

    ffmpeg -i video-audio.m4a \
      -af "silenceremove=start_periods=1:start_duration=0:start_threshold=-50dB:\
                         stop_periods=-1:stop_duration=0.02:stop_threshold=-50dB,\
                         apad=pad_dur=0.02" \
      -c:a aac -b:a 128k output_minpause.m4a -y

will cut the talk down from 39m31s to 31m34s by replacing any silence (with a -50dB threshold) longer than 20ms with a 20ms pause. And in keeping with the spirit of your post, I only measured that the input file got shorter; I didn't look at all at the quality of the transcription you get from feeding it the shorter version.
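
For a rough sense of what that buys you, the two durations above can be turned into a saving. This is only a sketch: the helper names `to_seconds` and `saved_permille` are made up for illustration, and it assumes the bill scales linearly with audio length, as the title suggests.

```shell
# Convert minutes and seconds to whole seconds.
# Pass plain decimal numbers without leading zeros (shell arithmetic
# would read a leading 0 as octal).
to_seconds() {
  echo $(( $1 * 60 + $2 ))
}

# Share of the original duration that was removed, in tenths of a percent
# (integer arithmetic, rounded down).
saved_permille() {
  echo $(( ($1 - $2) * 1000 / $1 ))
}

before=$(to_seconds 39 31)   # original talk
after=$(to_seconds 31 34)    # after silence removal
echo "removed $(( before - after )) s, about $(saved_permille "$before" "$after") per mille"
# prints: removed 477 s, about 201 per mille
```

So roughly 20% of the audio, just under 8 minutes, never gets sent at all.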

replies(12): >>44378492 #>>44378769 #>>44378939 #>>44378971 #>>44380884 #>>44380906 #>>44381352 #>>44382788 #>>44382864 #>>44384720 #>>44388923 #>>44388970 #

1. jwrallie ◴[26 Jun 25 06:30 UTC] No.44384720[source]▶

>>44378345 #

From my own experience with whisper.cpp, normalizing the audio and removing silence not only shortens the processing time significantly, but also greatly improves the quality of the transcription, as silence can lead to hallucinations. You can do that graphically with Audacity too, if you do not want to deal with the command line. You also do not need any special hardware to run whisper.cpp: with the small model, literally any computer should be able to do it if you can wait a bit (less than the audio length).
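
A minimal sketch of that preprocessing with ffmpeg instead of Audacity, under stated assumptions: `build_filter` is a hypothetical helper, `loudnorm` is just one possible way to normalize (the comment does not say which method was used), and the `silenceremove` settings are borrowed from the parent comment. The file names are placeholders.

```shell
# Hypothetical helper (not from the thread): prints an ffmpeg -af filter chain.
#   $1 = silence threshold in dB (default -50, as in the parent comment)
#   $2 = length in seconds of the pause to keep (default 0.02)
build_filter() {
  threshold_db="${1:--50}"
  min_pause="${2:-0.02}"
  # Assumption: normalize loudness first (loudnorm with its default targets),
  # so the silence threshold is applied to the normalized signal.
  printf 'loudnorm,silenceremove=start_periods=1:start_duration=0:start_threshold=%sdB:stop_periods=-1:stop_duration=%s:stop_threshold=%sdB\n' \
    "$threshold_db" "$min_pause" "$threshold_db"
}

# Example use (needs ffmpeg installed; not run here):
#   ffmpeg -i meeting.m4a -af "$(build_filter)" \
#     -ar 16000 -ac 1 -c:a pcm_s16le meeting.wav
```

The output options in the usage comment convert to 16 kHz mono 16-bit WAV, which is the input format the whisper.cpp examples expect. The threshold and pause length will likely need tuning per recording.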

One half interesting / half depressing observation I made is that at my workplace any meeting recording I tried to transcribe in this way had its length reduced to almost 2/3 when cutting off the silence. Makes you think about the efficiency (or lack of it) of holding long(ish) meetings.

replies(3): >>44384975 #>>44385016 #>>44388493 #

2. d1sxeyes ◴[26 Jun 25 07:12 UTC] No.44384975[source]▶

>>44384720 (TP) #

1/3 of the meeting is silence? That’s a good thing. It’s allowing people time to think over what they’re hearing, there are pauses to allow people to contribute or participate. What do you think a better percentage of silent time would be?

replies(1): >>44386518 #

3. sudhirj ◴[26 Jun 25 07:21 UTC] No.44385016[source]▶

>>44384720 (TP) #

If a human meeting had a lot of silence (assuming it's between words and not before / after), I would consider it a very efficient meeting where there was just enough information exchanged with adequate absorption, processing and response time.

4. jwrallie ◴[26 Jun 25 11:54 UTC] No.44386518[source]▶

>>44384975 #

Good point. Somehow, if I think of a 30-minute meeting, 10 minutes of silence sounds great, but seeing a 1-hour block disappear from a 3-hour recording makes me want to use that “free” hour to do something else.

Well, I think silence is not the real problem with a 3-hour meeting!

replies(1): >>44386995 #

5. literalAardvark ◴[26 Jun 25 12:58 UTC] No.44386995{3}[source]▶

>>44386518 #

If people could speak continuously for an entire meeting then that meeting would be better off as an email. Meetings are for bouncing half-formed ideas around and coagulating them into something greater.

There MUST be time to think.

6. dogprez ◴[26 Jun 25 15:43 UTC] No.44388493[source]▶

>>44384720 (TP) #

Others pointed out the value of silence, but I just wanted to say it saddens me when humanity is misclassified as inefficiency. The other day Sam Altman made a jest about how much energy is wasted by people saying "thanks" to ChatGPT. The corollary is how much human energy is wasted on humans saying thanks to each other. When making a judgement about inefficiency one is making a judgement on what is valuable, a very biased judgement that isn't necessarily aligned with what makes us thrive. =) (<-- a wasteful smiley)

replies(2): >>44389094 #>>44389488 #

7. kristianbrigman ◴[26 Jun 25 16:49 UTC] No.44389094[source]▶

>>44388493 #

I’ll remember that you told me thanks. Will ChatGPT? (Honestly curious… it’s possible)

replies(2): >>44389146 #>>44389542 #

8. Salgat ◴[26 Jun 25 16:56 UTC] No.44389146{3}[source]▶

>>44389094 #

I say thanks for my own well-being too.

9. Philip-J-Fry ◴[26 Jun 25 17:35 UTC] No.44389488[source]▶

>>44388493 #

Well, humans saying thanks to each other isn't wasted energy. It has a real effect on our relationships.

People say thank you to AIs because they are portrayed as human-like chat bots, but in reality it has almost no effect on how effectively they respond to our queries.

Saying thank you to ChatGPT is no less wasteful than saying thank you to Windows for opening the calculator.

I don't think anyone is trying to draw any parallels between that inefficiency and real humans saying thank you?

10. rz2k ◴[26 Jun 25 17:40 UTC] No.44389542{3}[source]▶

>>44389094 #

I get the impression that it sets a tone that encourages creative, more open ended responses.

I think this is the reverse of confrontation with the LLM. Typically, if you get a really dumb response, it is better to abandon the conversation and start over completely than it is to tell the LLM why it is wrong. Once you start arguing, they start getting stupider and respond with even faultier logic as they try to appease you.

I suppose it makes sense if the training involves alternate models of discourse resembling two educated people in a forum with shared intellectual curiosity and a common goal, or two people having a ridiculous internet argument.
