
279 points matthewolfe | 1 comments

TokenDagger is a drop-in replacement for OpenAI’s Tiktoken (the tokenizer behind Llama 3, Mistral, GPT-3.*, etc.). It’s written in C++17 with thin Python bindings, keeps the exact same BPE vocab/special-token rules, and focuses on raw speed.

I’m teaching myself LLM internals by re-implementing the stack from first principles. Profiling Tiktoken’s Python/Rust implementation showed that a lot of time was spent on regex matching. Most of my perf gains come from a) using a faster JIT-compiled regex engine; and b) simplifying the algorithm to forgo regex matching of special tokens entirely.
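To illustrate point (b), here is a minimal Python sketch of locating special tokens with plain substring search instead of a regex. It is not TokenDagger's actual code (which is C++); the function name `split_on_special_tokens` and the example token string are assumptions made for illustration only.

```python
def split_on_special_tokens(text, special_tokens):
    """Split text into (chunk, is_special) pairs using plain substring search.

    Hypothetical helper for illustration; no regex engine is involved.
    """
    pieces = []
    pos = 0
    while pos < len(text):
        # Find the earliest occurrence of any special token at or after pos.
        best_idx, best_tok = -1, None
        for tok in special_tokens:
            if not tok:
                continue  # skip empty strings, which would match everywhere
            idx = text.find(tok, pos)
            if idx == -1:
                continue
            # Prefer the earliest match; on a tie, prefer the longer token.
            if (best_tok is None or idx < best_idx
                    or (idx == best_idx and len(tok) > len(best_tok))):
                best_idx, best_tok = idx, tok
        if best_tok is None:
            # No special token left: the rest is ordinary text.
            pieces.append((text[pos:], False))
            break
        if best_idx > pos:
            pieces.append((text[pos:best_idx], False))
        pieces.append((best_tok, True))
        pos = best_idx + len(best_tok)
    return pieces
```

Only the ordinary (non-special) chunks would then need to go through the regex pre-tokenizer and BPE merge step; the special chunks map directly to their reserved IDs.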

Benchmarking code is included. Notable results show:
- 4x faster code sample tokenization on a single thread.
- 2-3x higher throughput when tested on a 1GB natural language text file.
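For anyone who wants to sanity-check throughput numbers like these on their own data, a rough Python harness might look like the sketch below. This is not the benchmarking code shipped with the project; `measure_throughput` is a hypothetical helper, and `tokenize` stands for any callable that takes a string (for example, a tokenizer's encode method).

```python
import time


def measure_throughput(tokenize, text, repeats=3):
    """Return the best observed throughput of tokenize(text) in MB/s.

    Hypothetical helper: takes the fastest of several runs to reduce noise.
    """
    n_bytes = len(text.encode("utf-8"))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        tokenize(text)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    # Guard against a zero reading on very small inputs.
    return n_bytes / max(best, 1e-9) / 1e6


if __name__ == "__main__":
    # Stand-in "tokenizer" (whitespace split) so the sketch runs on its own.
    sample = "hello world " * 100_000
    print(f"{measure_throughput(str.split, sample):.1f} MB/s")
```

Running the same harness with two tokenizers on the same input gives a like-for-like ratio, which is more meaningful than the absolute MB/s figure on any one machine.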

npalli ◴[] No.44422888[source]
Kudos. I think (in the short term at least) there is a large amount of perf optimization to be found by coding parts of the whole AI/ML infrastructure in C++ like this one, not as a rewrite (god no!) but as a drop-in that fixes key bottlenecks. Anytime I see someone (seems Chinese engineers are good at this) put something out in C++, there's a good chance some solid engineering tradeoffs have been made and a dramatic improvement will be seen.
replies(4): >>44424382 #>>44424572 #>>44424990 #>>44427963 #
matthewolfe ◴[] No.44424382[source]
Agreed. A former mentor of mine told me a nice way of viewing software development:

1. Make it work. 2. Make it fast. 3. Make it pretty.

Transformers & LLMs have been developed to a point where they work quite well. I feel as though we're at a stage where most substantial progress is being made on the performance side.

replies(3): >>44424439 #>>44425934 #>>44426074 #
diggan ◴[] No.44424439[source]
Heh, seems the people I've been learning from have been biased away from beauty, as I know it as "Make It Work, Make It Right, Make It Fast".
replies(5): >>44424671 #>>44425051 #>>44425719 #>>44428459 #>>44428747 #
abybaddi009 ◴[] No.44424671[source]
What's the difference between make it work and make it right? Aren't they the same thing?
replies(4): >>44424691 #>>44424757 #>>44424773 #>>44424931 #
stavros ◴[] No.44424691[source]
Yeah, if it's not right, it doesn't work.
replies(3): >>44424765 #>>44424906 #>>44425794 #
gabrielhidasy ◴[] No.44425794[source]
Depends on your definition of "right" and "work". It could be a big ball of mud that always returns exactly the required response (so it 'works'), but be hellishly hard to change and very picky about dependencies and environment (so it's not 'right').
replies(1): >>44425846 #
1. stavros ◴[] No.44425846[source]
Nope, it's right, but it's not pretty.