
458 points ph4evers | 2 comments

I've been working on a little side project that combines Duolingo-style listening-comprehension exercises with real content.

Every video is re-transcribed to get much better transcripts than the closed captions. I filter for high-quality transcripts, and an LLM then selects only plausible segments for the exercises. This seems to work well for quality control and seems reliable enough for these short exercises.
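The two-stage filter described above could look something like this minimal sketch. Note the confidence threshold, the `Segment` fields, and the `is_plausible` heuristic (standing in for the LLM plausibility check) are all assumptions for illustration, not the author's actual implementation:

```python
# Hypothetical quality-control pass over transcribed segments.
# Each segment is assumed to carry its text plus a confidence score
# from the transcription model (e.g. an average token probability).

from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    confidence: float  # assumed in 0..1, higher is better


def is_plausible(seg: Segment) -> bool:
    """Stand-in for the LLM plausibility check: a simple length
    heuristic drops fragments too short or too long for a short
    listening exercise."""
    words = seg.text.split()
    return 3 <= len(words) <= 25


def select_exercise_segments(segments, min_confidence=0.85):
    """Stage 1: keep only segments the transcriber was confident
    about. Stage 2: keep only plausible exercise candidates."""
    confident = [s for s in segments if s.confidence >= min_confidence]
    return [s for s in confident if is_plausible(s)]


segs = [
    Segment("uh", 0.95),                      # confident but too short
    Segment("the weather was lovely that day", 0.92),
    Segment("garbled noise here", 0.40),      # low transcriber confidence
]
print([s.text for s in select_exercise_segments(segs)])
```

In practice the second stage would be a prompt to an LLM rather than a word count, but the shape is the same: cheap confidence filtering first, then a more expensive plausibility pass on what survives.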

Would love your thoughts!

1. gwd ◴[] No.43544718[source]
One more thing, just in general: Some people are complaining that some languages work better than others. This seems to be a common issue now with the availability of AI (both voice recognition and LLMs): there's a temptation to expand into as many languages as possible, simply because you can.

My advice would be to have languages default to an "alpha" state, and only progress them to "beta" and "1.0" state when they reach certain milestones, as defined by community feedback.

replies(1): >>43545105 #
2. ph4evers ◴[] No.43545105[source]
Agreed. That's why the exercises are only available for a few selected languages. But even there it can be tricky: the model is less confident in Dutch than in English, so I have to experiment with how to balance content variety against quality.