I’m teaching myself LLM internals by re-implementing the stack from first principles. Profiling TikToken’s Python/Rust implementation showed a lot of time was spent doing regex matching. Most of my perf gains come from (a) using a faster JIT-compiled regex engine, and (b) simplifying the algorithm to forgo regex matching for special tokens entirely.
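Concretely, (b) amounts to scanning for special tokens with plain string search and only regex-splitting the ordinary text in between. A minimal Python sketch of the idea (the function and names are mine, not the actual implementation):

```python
def split_on_specials(text: str, specials: list[str]) -> list[tuple[str, bool]]:
    """Yield (chunk, is_special) pairs using plain string search,
    so the main regex never has to know about special tokens."""
    chunks = []
    pos = 0
    while pos < len(text):
        # Find the earliest special token at or after pos.
        best = None
        for tok in specials:
            i = text.find(tok, pos)
            if i != -1 and (best is None or i < best[0]):
                best = (i, tok)
        if best is None:
            chunks.append((text[pos:], False))
            break
        i, tok = best
        if i > pos:
            chunks.append((text[pos:i], False))  # ordinary text: regex-split later
        chunks.append((tok, True))               # special token: direct ID lookup
        pos = i + len(tok)
    return chunks
```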
Benchmarking code is included. Notable results:

- 4x faster code sample tokenization on a single thread.
- 2-3x higher throughput when tested on a 1GB natural language text file.
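For reference, a throughput number like the 1GB test can be reproduced with a harness as simple as this (the file name and encoder choice are placeholders; the included benchmarking code is the real measurement):

```python
import time

def measure_throughput(encode, path: str) -> float:
    """Return tokenization throughput in MB/s for a given encode() callable."""
    with open(path, "rb") as f:
        data = f.read()
    text = data.decode("utf-8", errors="replace")
    start = time.perf_counter()
    encode(text)
    elapsed = time.perf_counter() - start
    return (len(data) / 1e6) / elapsed

# e.g. with tiktoken as the baseline:
# import tiktoken
# enc = tiktoken.get_encoding("cl100k_base")
# print(measure_throughput(enc.encode, "large_corpus.txt"))
```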
1. Make it work. 2. Make it fast. 3. Make it pretty.
Transformers & LLMs have been developed to a point where they work quite well. I feel as though we're at a stage where most substantial progress is being made on the performance side.
Iteration speed trumps all in research. Most of what Python does is launch GPU operations; if you're seeing slowdowns from Pythonland, then you're doing something terribly wrong.
Python is an excellent (and yes, fast!) language for orchestrating and calling ML stuff. If C++ code is needed, call it as a module.
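Calling into compiled code from Python really is that cheap. A minimal ctypes sketch (the library name and C signature here are made up for illustration):

```python
import ctypes

# Load a hypothetical compiled library exposing:
#   extern "C" double dot(const double* a, const double* b, size_t n);
lib = ctypes.CDLL("./libfastmath.so")
lib.dot.argtypes = [ctypes.POINTER(ctypes.c_double),
                    ctypes.POINTER(ctypes.c_double),
                    ctypes.c_size_t]
lib.dot.restype = ctypes.c_double

def dot(a: list[float], b: list[float]) -> float:
    # Marshal Python lists into C arrays and call the compiled routine.
    ArrayN = ctypes.c_double * len(a)
    return lib.dot(ArrayN(*a), ArrayN(*b), len(a))
```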
My mentor used to say it's the difference between a screw and glue.
You can glue some things together and prove that it works, but eventually you learn that anytime you had to break something to fix it, you should've used a screw.
It's a trade-off in coupling: the glue binds tightly over the entire surface, but a screw concentrates the loads, so it needs maintenance to stay tight.
You only really know which is "right" if you test it to destruction.
All of that advice is probably sounding dated now; even in materials science the glue might be winning (see the Tesla bumper or Lotus Elise bonding videos - every screw is extra grams).
1. Make it
2. Make it work
3. Make it work better
(Different circumstances have different nuances about what "better" means; it isn't always performance optimization. Some do substitute "faster" for "better" here, but I think it loses generality then.)

If you think Python is a bad language for AI integrations, try writing one in a compiled language.
ML researchers aren’t using Python because they are dumb. They use it because what takes 8 lines in Java can be done in 2 or 3 (including `import json`) in Python, for example.
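The cliché example, import included (the file name is hypothetical):

```python
import json

config = json.load(open("settings.json"))  # parsed straight into dicts/lists
```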
Firmitas, utilitas, venustas - Strong, useful, and beautiful.
So great there are 8 of them. 800% better than all the rest!
> If you think Python is a bad language for AI integrations, try writing one in a compiled language.
I'll take this challenge, all day, every day, so long as I and the hypothetical 'move fast and break things' crowd have equal "must run in prod" and "must be understandable by some other human" qualifiers.
What type is `array`? Don't worry your pretty head about it, feed it whatever type you want and let Sentry's TypeError sort it out <https://github.com/openai/whisper/blob/v20250625/whisper/aud...> Oh, sorry, and you wanted to know what `pad_or_trim` returns? Well that's just, like, your opinion man
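For what it's worth, the fix being asked for is a couple of annotations. Something like this (an illustrative sketch, not the actual whisper source; 480_000 stands in for whisper's N_SAMPLES, i.e. 30 s at 16 kHz):

```python
import numpy as np

def pad_or_trim(array: np.ndarray, length: int = 480_000) -> np.ndarray:
    """Zero-pad or trim `array` so its last axis is exactly `length` samples."""
    if array.shape[-1] > length:
        return array[..., :length]
    pad = length - array.shape[-1]
    return np.pad(array, [(0, 0)] * (array.ndim - 1) + [(0, pad)])
```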
I'm still teaching them Python.