(github.com)

279 points matthewolfe | 1 comments | 30 Jun 25 12:33 UTC

TokenDagger is a drop-in replacement for OpenAI’s Tiktoken (the tokenizer behind Llama 3, Mistral, GPT-3.*, etc.). It’s written in C++17 with thin Python bindings, keeps the exact same BPE vocab/special-token rules, and focuses on raw speed.

I’m teaching myself LLM internals by re-implementing the stack from first principles. Profiling Tiktoken’s Python/Rust implementation showed that a lot of time was spent on regex matching. Most of my perf gains come from a) using a faster JIT-compiled regex engine; and b) simplifying the algorithm to forgo regex matching for special tokens entirely.
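To illustrate point (b), here is a minimal sketch of finding special tokens with plain substring search instead of a regex. It is not TokenDagger's actual implementation; the function name `split_on_special_tokens` and its tuple-based return shape are assumptions made up for this example.

```python
def split_on_special_tokens(text, special_tokens):
    """Split text into (segment, is_special) pieces using substring search.

    Hypothetical helper for illustration only; not part of any library.
    """
    pieces = []
    pos = 0
    while pos < len(text):
        # Find the earliest special token at or after pos.
        # On a tie at the same position, prefer the longer token.
        best_start, best_token = len(text), None
        for token in special_tokens:
            if not token:
                continue  # an empty token would match everywhere
            start = text.find(token, pos)
            if start == -1:
                continue
            if start < best_start or (
                start == best_start and len(token) > len(best_token)
            ):
                best_start, best_token = start, token
        # Ordinary text before the special token (or up to the end).
        if best_start > pos:
            pieces.append((text[pos:best_start], False))
        if best_token is None:
            break
        pieces.append((best_token, True))
        pos = best_start + len(best_token)
    return pieces
```

Only the pieces marked `False` would then need the regex pre-tokenizer and BPE merges; the pieces marked `True` map directly to their reserved token IDs.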

Benchmarking code is included. Notable results:
- 4x faster code sample tokenization on a single thread.
- 2-3x higher throughput when tested on a 1GB natural language text file.
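For readers who want to check throughput numbers like these themselves, a rough harness might look like the sketch below. This is not the benchmark shipped in the repository; `measure_throughput` is a hypothetical helper, and `encode` stands in for whatever tokenizer callable is being compared.

```python
import time


def measure_throughput(encode, texts, repeats=3):
    """Return best-of-N throughput in MB/s of UTF-8 input for `encode`.

    Hypothetical helper for illustration; `encode` is any callable that
    takes a string (for example, a tokenizer's encode method).
    """
    total_bytes = sum(len(t.encode("utf-8")) for t in texts)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for t in texts:
            encode(t)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero elapsed time on tiny inputs.
    return total_bytes / max(best, 1e-9) / 1e6
```

Running the same `texts` through two tokenizers and comparing the two results gives a relative speedup. Taking the best of several repeats reduces noise from other processes on the machine.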

npalli ◴[30 Jun 25 13:13 UTC] No.44422888[source]▶

>>44422480 (OP) #

Kudos. I think (in the short term at least) there is a large amount of perf optimization to be found by coding parts of the AI/ML infrastructure in C++ like this, not as a rewrite (god no!) but as drop-in fixes for key bottlenecks. Any time I see someone (seems Chinese engineers are good at this) put something out in C++, there's a good chance some solid engineering tradeoffs have been made and a dramatic improvement will be seen.

replies(4): >>44424382 #>>44424572 #>>44424990 #>>44427963 #

matthewolfe ◴[30 Jun 25 15:16 UTC] No.44424382[source]▶

>>44422888 #

Agreed. A former mentor of mine told me a nice way of viewing software development:

1. Make it work.
2. Make it fast.
3. Make it pretty.

Transformers & LLMs have been developed to a point where they work quite well. I feel as though we're at a stage where most substantial progress is being made on the performance side.

replies(3): >>44424439 #>>44425934 #>>44426074 #

diggan ◴[30 Jun 25 15:22 UTC] No.44424439[source]▶

>>44424382 #

Heh, seems the people I've been learning from have been biased away from beauty, as I know that as "Make It Work, Make It Right, Make It Fast".

replies(5): >>44424671 #>>44425051 #>>44425719 #>>44428459 #>>44428747 #

abybaddi009 ◴[30 Jun 25 15:44 UTC] No.44424671[source]▶

>>44424439 #

What's the difference between make it work and make it right? Aren't they the same thing?

replies(4): >>44424691 #>>44424757 #>>44424773 #>>44424931 #

robertfw ◴[30 Jun 25 15:53 UTC] No.44424757[source]▶

>>44424671 #

Making it work can be a hacky, tech-debt-laden implementation. Making it right involves refactoring/rewriting with an eye towards maintainability, testability, etc.

Show HN: TokenDagger – A tokenizer faster than OpenAI's Tiktoken