
456 points ph4evers | 2 comments

I've been working on a little side project that combines Duolingo-like listening comprehension exercises with real content.

Every video is transcribed to get much better transcripts than the closed captions. I filter for high-quality transcripts, and afterwards an LLM selects only plausible segments for the exercises. This seems to work well for quality control and to be reliable enough for these short exercises.

Would love your thoughts!

mcjiggerlog ◴[] No.43544059[source]
Really cool idea! I tried a few Spanish ones (I speak Spanish) and unfortunately it was incorrectly marking things as wrong on 2/5 videos I did!
replies(1): >>43544314 #
ph4evers ◴[] No.43544314[source]
That's a bit unfortunate, sorry about that!

I only checked English, French, Dutch and German and assumed that Spanish would be OK. Was this for drag & drop? And do you maybe have the video? Maybe I need to tune the quality threshold specifically for Spanish videos.

replies(1): >>43544370 #
1. mcjiggerlog ◴[] No.43544370[source]
I actually did the same video on desktop and the same answers worked fine! Screenshots of it failing in an android webview, but passing on desktop firefox: https://imgur.com/a/vALlFdH.
replies(1): >>43544497 #
2. ph4evers ◴[] No.43544497[source]
Oh wow, I think this is a cross-platform bug where I dumbly assumed that strings were equal without normalizing them. I'll fix it! Thanks!
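
For anyone curious what "normalizing" could look like here, a minimal sketch follows. It assumes the mismatch comes from the two platforms encoding accented characters differently (for example "ñ" as one code point versus "n" plus a combining tilde), which the thread does not confirm. The function names `normalize_answer` and `answers_match` are hypothetical and not taken from the project.

```python
import unicodedata


def normalize_answer(text: str) -> str:
    """Bring an answer into a canonical form before comparing (hypothetical helper)."""
    # NFKC composes base letter + combining mark into one code point
    # and folds compatibility characters such as the non-breaking space.
    text = unicodedata.normalize("NFKC", text)
    # Trim the ends and collapse any run of whitespace to a single space.
    text = " ".join(text.split())
    # casefold() is a more aggressive lower() intended for caseless matching.
    return text.casefold()


def answers_match(given: str, expected: str) -> bool:
    """Compare two answers after normalization instead of using raw equality."""
    return normalize_answer(given) == normalize_answer(expected)


# "mañana" typed with a combining tilde vs. the precomposed letter:
decomposed = "man\u0303ana"
precomposed = "ma\u00f1ana"
print(decomposed == precomposed)               # False: different code points
print(answers_match(decomposed, precomposed))  # True after normalization
```

Accents are kept on purpose, so "manana" still does not match "mañana"; whether a language exercise should be that strict is a separate design choice.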