(george.mand.is)

666 points georgemandis | 1 comment | 25 Jun 25 13:17 UTC


mt_ [25 Jun 25 23:05 UTC] No.44382623

You can just paste the YouTube video link into Google AI Studio and ask it to transcribe the video with speaker labels. You can even ask it to add useful visual clues, because the model is multimodal for video too.
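For anyone who wants the same thing from a script rather than the AI Studio UI, here is a minimal sketch against the Gemini API's REST `generateContent` endpoint using only the Python standard library. It assumes that the API accepts a public YouTube URL as a `file_data` part (as Google's video-understanding docs describe), that `gemini-2.5-flash` is an available model id, and that the API key can be sent in the `x-goog-api-key` header. The prompt wording and the function names `build_transcription_request` and `transcribe` are hypothetical, not the commenter's.

```python
import json
import urllib.request

# Assumption: model id and endpoint follow Google's public Gemini API docs;
# check the current docs before relying on either.
GEMINI_MODEL = "gemini-2.5-flash"  # assumed model id
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{GEMINI_MODEL}:generateContent"
)

# Hypothetical prompt asking for speaker labels plus visual cues.
PROMPT = (
    "Transcribe this video. Label each speaker (Speaker 1, Speaker 2, ...). "
    "Where it helps, add bracketed notes on visual cues such as what is on "
    "the slides or the speaker's body language."
)


def build_transcription_request(video_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a generateContent request for a public YouTube URL."""
    body = {
        "contents": [
            {
                "parts": [
                    {"text": PROMPT},
                    # The video is referenced by URL; nothing is uploaded.
                    {"file_data": {"file_uri": video_url}},
                ]
            }
        ]
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def transcribe(video_url: str, api_key: str) -> str:
    """Send the request and join the text parts of the first candidate."""
    req = build_transcription_request(video_url, api_key)
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    # Expected response shape: candidates[0].content.parts[*].text
    parts = payload["candidates"][0]["content"]["parts"]
    return "".join(p.get("text", "") for p in parts)
```

Usage would look like `transcribe("https://www.youtube.com/watch?v=VIDEO_ID", os.environ["GEMINI_API_KEY"])`, where the video id and the environment variable name are placeholders. Building the request separately from sending it makes the payload easy to inspect before spending any API quota.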


MaxDPS [26 Jun 25 01:09 UTC] No.44383325

>>44382623

Can I ask what you mean by “useful visual clues”?


mt_ [26 Jun 25 06:25 UTC] No.44384694

>>44383325

What the speaker is showing on their slides, what their body language is, and so on.


OpenAI charges by the minute, so speed up your audio