
678 points georgemandis | 2 comments
55555 ◴[] No.44379409[source]
This seems like a good place for me to complain about the fact that the automatically generated subtitle files YouTube creates are horribly malformed. Every sentence appears twice. In many subtitle files, the timestamp ranges also overlap one another, so each sentence shows up in two different, overlapping ranges. It's absolutely bizarre and has been like this for years, possibly forever. Here's an example. I apologize that it's not in English; I don't know if this issue affects English. https://pastebin.com/raw/LTBps80F
replies(1): >>44383686 #
1. xenator ◴[] No.44383686[source]
Seems like Thai. Thai translation and recognition are about 10 years behind compared to the other languages I deal with in my everyday life. The good news, though, is that Russian was at the same level years ago, and now it is near perfect.
replies(1): >>44384176 #
2. 55555 ◴[] No.44384176[source]
Well, the weird thing is that their speech-to-text honestly recognizes 97% of words correctly. The subtitle content is pretty much perfect. It's just the formatting that's awful.
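
For anyone who wants to clean these files up, here is a minimal sketch of the kind of post-processing that might help. It assumes the cues have already been parsed into (start, end, text) tuples with times in seconds, and that a repeated sentence shows up as a consecutive cue with identical text. The `Cue` type and the `clean_cues` name are made up for illustration; this is not anything YouTube provides, and real files may need fuzzier matching.

```python
from typing import List, Tuple

# Hypothetical cue shape: (start_seconds, end_seconds, text)
Cue = Tuple[float, float, str]


def clean_cues(cues: List[Cue]) -> List[Cue]:
    """Merge consecutive duplicate cues and remove overlaps between neighbours."""
    cleaned: List[Cue] = []
    for start, end, text in sorted(cues):
        text = text.strip()
        if not text:
            continue  # skip empty cues
        if cleaned and cleaned[-1][2] == text:
            # Same sentence again: fold it into the previous cue,
            # keeping the later of the two end times.
            prev_start, prev_end, _ = cleaned[-1]
            cleaned[-1] = (prev_start, max(prev_end, end), text)
            continue
        if cleaned and cleaned[-1][1] > start:
            # Previous cue runs into this one: clip it at this cue's start.
            prev_start, _, prev_text = cleaned[-1]
            cleaned[-1] = (prev_start, start, prev_text)
        cleaned.append((start, end, text))
    return cleaned


if __name__ == "__main__":
    sample = [
        (0.0, 2.0, "hello"),
        (1.5, 3.0, "hello"),
        (2.5, 5.0, "world"),
        (4.0, 6.0, "world"),
    ]
    print(clean_cues(sample))
    # [(0.0, 2.5, 'hello'), (2.5, 6.0, 'world')]
```

Clipping the earlier cue rather than delaying the later one is an arbitrary choice here; it keeps each sentence appearing at its original start time, at the cost of possibly shortening the cue before it.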