
176 points Bogdanp | 2 comments
1. internet_points ◴[] No.45653750[source]
What tools/data do you use for POS tagging? I'm guessing it has to be fast, to run without a Google data center :)
replies(1): >>45653951 #
2. marginalia_nu ◴[] No.45653951[source]
I'm using RDRPOSTagger[1], though I've optimized the code a bit so that it's not just algorithmically efficient but also uses the language in a way that is fast. It isn't perfect, but it's good enough to be useful.

Language detection and sentence splitting are the other two slow bits of processing.

[1] https://github.com/datquocnguyen/RDRPOSTagger
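
For anyone unfamiliar with the approach: RDRPOSTagger is a ripple-down-rules tagger, meaning it assigns each word an initial tag from a lexicon and then walks a tree of exception rules that may override that tag. Below is a minimal Python sketch of that idea only. It is not RDRPOSTagger's actual code or API, and not the optimized version mentioned above; the lexicon, the two rules, and every name in it are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical toy lexicon; a real tagger would learn this from a corpus.
LEXICON = {"the": "DT", "old": "JJ", "can": "MD", "rusts": "VBZ"}
DEFAULT_TAG = "NN"  # assumed fallback for unknown words


@dataclass
class Context:
    word: str
    tag: str       # initial tag of this word
    prev_tag: str  # initial tag of the previous word ("" at sentence start)


@dataclass
class Rule:
    condition: Callable[[Context], bool]
    tag: Optional[str]                    # None means "keep the current tag"
    except_rule: Optional["Rule"] = None  # tried when this rule fires
    if_not_rule: Optional["Rule"] = None  # tried when this rule does not fire


def apply_rules(root: Rule, ctx: Context) -> str:
    """Walk the rule tree; the last rule that fires decides the tag."""
    tag = ctx.tag
    node: Optional[Rule] = root
    while node is not None:
        if node.condition(ctx):
            if node.tag is not None:
                tag = node.tag
            node = node.except_rule
        else:
            node = node.if_not_rule
    return tag


# Two illustrative exception rules hanging off an always-true root.
adverb_rule = Rule(lambda c: c.word.endswith("ly") and c.tag == DEFAULT_TAG, "RB")
noun_can_rule = Rule(lambda c: c.tag == "MD" and c.prev_tag in ("DT", "JJ"), "NN",
                     if_not_rule=adverb_rule)
ROOT = Rule(lambda c: True, None, except_rule=noun_can_rule)


def tag_sentence(words: List[str]) -> List[str]:
    # Pass 1: initial tags from the lexicon.
    initial = [LEXICON.get(w.lower(), DEFAULT_TAG) for w in words]
    # Pass 2: correct each tag using context built from the initial tags.
    result = []
    for i, word in enumerate(words):
        prev_tag = initial[i - 1] if i > 0 else ""
        result.append(apply_rules(ROOT, Context(word.lower(), initial[i], prev_tag)))
    return result
```

Calling `tag_sentence("the old can rusts".split())` retags "can" from modal to noun because the preceding word is an adjective. Tagging is one dictionary lookup plus a short tree walk per token, which is why this style of tagger can be fast without heavy hardware.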