
456 points ph4evers | 1 comment

I've been working on a little side project that combines Duolingo-like listening comprehension exercises with real content.

Every video is transcribed to get much better transcripts than the closed captions. I keep only the high-quality transcripts, and afterwards an LLM selects only plausible segments for the exercises. This seems to work well for quality control and to be reliable enough for these short exercises.
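
For anyone curious what a filter like this could look like, here is a rough sketch and not the project's actual code. The `Segment` type, the `confidence` score, the thresholds, and the `is_plausible` callback (standing in for the LLM check) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    text: str
    confidence: float  # hypothetical transcriber confidence in [0, 1]


def select_segments(
    segments: List[Segment],
    is_plausible: Callable[[str], bool],  # stand-in for the LLM check
    min_confidence: float = 0.9,          # assumed threshold
    max_words: int = 15,                  # assumed cap to keep exercises short
) -> List[Segment]:
    """Keep confident, short segments that the plausibility check accepts."""
    kept = []
    for seg in segments:
        if seg.confidence < min_confidence:
            continue  # drop low-quality transcription
        if len(seg.text.split()) > max_words:
            continue  # too long for a short exercise
        if is_plausible(seg.text):
            kept.append(seg)
    return kept
```

The cheap checks run first so that the expensive plausibility call only sees segments that already passed them.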

Would love your thoughts!

1. tom1337 No.43544565
I wonder if this could be used as something like early reCAPTCHA. Have a machine do the transcriptions, and for the parts where it's not entirely sure, just let users play the game and then accept what most users chose as the correct solution. Later on, train your automatic transcriber on this.
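
A minimal sketch of the "accept what most users chose" step, assuming answers arrive as plain strings. The function name `consensus` and the `min_votes` and `min_share` thresholds are made up for illustration.

```python
from collections import Counter
from typing import List, Optional


def consensus(
    answers: List[str],
    min_votes: int = 3,      # assumed minimum number of answers
    min_share: float = 0.6,  # assumed minimum share for the top answer
) -> Optional[str]:
    """Return the majority answer, or None if agreement is too weak."""
    # Normalize case and whitespace so trivial variants count as one answer.
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if len(normalized) < min_votes:
        return None
    top, count = Counter(normalized).most_common(1)[0]
    return top if count / len(normalized) >= min_share else None
```

Segments where this returns `None` would stay unlabeled until more people have played them; only the agreed-upon ones would go into the training data.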