685 points georgemandis | 4 comments
w-m ◴[] No.44378345[source]
By transcribing a talk by Andrej, you picked the most challenging case possible, speed-wise. His natural talking speed is already >=1.5x that of a normal human. He's one of those people for whom you absolutely have to set your YouTube speed back down to 1x to follow what's going on.

In the spirit of getting more out of an OpenAI minute, don't send it any silence.

E.g.

    ffmpeg -i video-audio.m4a \
      -af "silenceremove=start_periods=1:start_duration=0:start_threshold=-50dB:\
                         stop_periods=-1:stop_duration=0.02:stop_threshold=-50dB,\
                         apad=pad_dur=0.02" \
      -c:a aac -b:a 128k output_minpause.m4a -y
will cut the talk down from 39m31s to 31m34s by replacing any silence (with a -50dB threshold) longer than 20ms with a 20ms pause. And in keeping with the spirit of your post, I measured only that the input file got shorter; I didn't look at all at the quality of the transcription you get from feeding it the shorter version.
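
For a sense of scale, the two durations quoted above work out to roughly a fifth of the audio removed. A minimal sketch of that arithmetic in Python (the helper name `to_seconds` is made up for illustration, and it only handles the `MMmSSs` form used in this comment):

```python
def to_seconds(stamp: str) -> int:
    """Convert a duration written like '39m31s' into whole seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + int(seconds)


original = to_seconds("39m31s")  # duration before silence removal
trimmed = to_seconds("31m34s")   # duration after silence removal
saved = original - trimmed
percent_saved = 100 * saved / original

print(f"saved {saved // 60}m{saved % 60:02d}s ({percent_saved:.1f}% shorter)")
# prints: saved 7m57s (20.1% shorter)
```

This only says how much shorter the file is; it says nothing about transcription quality.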
replies(12): >>44378492 #>>44378769 #>>44378939 #>>44378971 #>>44380884 #>>44380906 #>>44381352 #>>44382788 #>>44382864 #>>44384720 #>>44388923 #>>44388970 #
behnamoh ◴[] No.44378939[source]
> His natural talking speed is already >=1.5x that of a normal human. He's one of those people for whom you absolutely have to set your YouTube speed back down to 1x to follow what's going on.

I wonder if there's a way to automatically detect how "fast" a person talks in an audio file. I know it's subjective, and people talk at different paces even within one recording, but it'd be cool to know when OP's trick fails (they mention 4x ruined the output; maybe for Karpathy that would happen at 2x).
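
One crude way to put a number on it, assuming you already have a transcript and the audio duration: count words per minute and compare against a baseline. This is only a sketch. The function names are hypothetical, the 150 wpm baseline is an assumed ballpark for conversational English rather than a measured value, and the example numbers are made up.

```python
def words_per_minute(transcript: str, duration_seconds: float) -> float:
    """Average speaking rate: whitespace-separated words per minute of audio."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return len(transcript.split()) * 60 / duration_seconds


def relative_pace(wpm: float, baseline_wpm: float = 150.0) -> float:
    """Speaking rate as a multiple of an assumed baseline (150 wpm by default)."""
    return wpm / baseline_wpm


# Made-up example: 450 words spoken in 2 minutes of audio.
sample = " ".join(["word"] * 450)
wpm = words_per_minute(sample, 120)
print(f"{wpm:.0f} wpm, {relative_pace(wpm):.2f}x the assumed baseline")
```

An average like this hides pauses and bursts, so removing silence first (or measuring per segment) would probably give a more useful figure.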

replies(7): >>44379087 #>>44379461 #>>44379539 #>>44380162 #>>44380831 #>>44383231 #>>44387266 #
varispeed ◴[] No.44379461[source]
It's a shame platforms don't generally support speeds greater than 2x. One of my "superpowers", or a curse, is that I cannot stand a normal speaking pace. When I watch lectures, I always go for maximum speed, and that is still too slow for me. I wish platforms offered 4x, done properly (with minimal artefacts).
replies(10): >>44379513 #>>44379536 #>>44379612 #>>44379810 #>>44379982 #>>44380594 #>>44380830 #>>44381970 #>>44384356 #>>44387197 #
1. mrmuagi ◴[] No.44379536[source]
All audiobooks are like this for me. I tried it for lectures, but if I'm taking handwritten notes, my writing can't keep up.

I wonder if there are negative side effects of this, though. Do you notice that interacting with people who speak slower requires a greater deal of patience?

replies(3): >>44379957 #>>44380513 #>>44383539 #
2. colechristensen ◴[] No.44379957[source]
No, but a little. I struggle with people who repeat every point of what they're saying to you several times, or who, when you say "you told me exactly this the last time we spoke", cannot be stopped from retelling the whole thing verbatim. Usually in those situations, though, there are some potential cognitive issues, so you can only be understanding.
3. hamburglar ◴[] No.44380513[source]
I once attended a live talk by Leslie Lamport and as he talked, I had the overwhelming feeling that something was wrong, and was thinking “did he have a stroke or something?” but then I realized I had just always watched his lectures online and had become accustomed to listening to him at 2x.
4. userbinator ◴[] No.44383539[source]
> I wonder if there are negative side effects of this, though. Do you notice that interacting with people who speak slower requires a greater deal of patience?

You are basically training your brain to work faster, and I suspect that causes some changes in the structure of your memory; if someone speaks too slowly, I'll be more likely to forget what they said earlier, compared to if they quickly gave me the entire sentence.