
669 points | georgemandis
karpathy:
Omg long post. TLDR from an LLM for anyone interested:

Speed your audio up 2–3× with ffmpeg before sending it to OpenAI’s gpt-4o-transcribe: the shorter file uses fewer input tokens, cuts costs by roughly a third, and processes faster with little quality loss (4× is too fast). A sample yt-dlp → ffmpeg → curl script shows the workflow.
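(For the curious, a minimal sketch of that pipeline; the URL and filenames here are placeholders, the real script is in the linked post, and the download/API calls are shown commented out:)

```shell
# Sketch of the yt-dlp -> ffmpeg -> curl workflow described above.
# URL and filenames are hypothetical placeholders.
SPEED=3   # 2-3x is the sweet spot per the post; 4x loses too much quality
URL="https://example.com/talk"   # placeholder video URL

# 1) Extract audio only:
#    yt-dlp -x --audio-format mp3 -o talk.mp3 "$URL"

# 2) Speed the audio up. Recent ffmpeg builds accept atempo values up to
#    100; older builds cap atempo at 2.0, so 3x would need a chain like
#    "atempo=2.0,atempo=1.5":
#    ffmpeg -i talk.mp3 -filter:a "atempo=${SPEED}" -y talk-fast.mp3

# 3) Transcribe the shorter file:
#    curl https://api.openai.com/v1/audio/transcriptions \
#      -H "Authorization: Bearer $OPENAI_API_KEY" \
#      -F file=@talk-fast.mp3 -F model=gpt-4o-transcribe

echo "atempo=${SPEED}"
```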

;)

georgemandis:
Hahaha. Okay, okay... I will watch it now ;)

(Thanks for your good sense of humor)

karpathy:
I like that your post deliberately gets to the point first and then (optionally) expands later; I think it's a good and generally underutilized format. I often advise people to structure their emails the same way: first cut to the chase with the specific ask, then give more context optionally below.

It's not my intention to bloat the information or its delivery, but I also don't quite know how to follow this format in this kind of talk, because it's not so much about relaying specific information (like your final script here) as it is a collection of prompts back to the audience, things to think about.

My companion tweet to this video on X included a brief TLDR/summary where I tried, but I didn't think it was very reflective of the talk; it was more a list of topics covered.

Anyway, I am overall a big fan of doing more compute at "creation time" to compress other people's time at "consumption time", and I think it's the respectful and kind thing to do.

georgemandis:
I watched your talk. There are so many more interesting ideas in there that resonated with me that the summary (unsurprisingly) skipped over. I'm glad I watched it!

LLMs as the operating system, the way you interface with vibe-coding (smaller chunks) and the idea that maybe we haven't found the "GUI for AI" yet are all things I've pondered and discussed with people. You articulated them well.

I think some formats, like a talk, don't lend themselves easily to meaningful summaries. It's about giving the audience things to think about, to your point. It's storytelling whose whole is greater than the sum of its parts, and that's why we still do it.

My post is, at the end of the day, really more about a neat trick to optimize transcriptions. This particular video might be a great example of why you may not always want to do that :)

Anyway, thanks for the time and thanks for the talk!