268 points by Areibman | 4 comments

Hey HN! Tokencost is a utility library for estimating LLM costs. There are hundreds of different models now, and they all have their own pricing schemes. It’s difficult to keep up with the pricing changes, and it’s even more difficult to estimate how much your prompts and completions will cost until you see the bill.

Tokencost works by counting the number of tokens in the prompt and completion messages and multiplying that count by the model’s per-token price. Under the hood, it’s really just a cost dictionary and some utility functions for getting the prices right. It also accounts for different tokenizers and float precision errors.
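
A minimal sketch of that loop (not Tokencost's actual API; the model name and prices below are illustrative placeholders, not current rates):

    import tiktoken

    # Hypothetical per-token USD prices (input, output) -- placeholders, not real rates.
    PRICES = {"gpt-4-0613": (30e-6, 60e-6)}

    def estimate_cost(prompt: str, completion: str, model: str) -> float:
        enc = tiktoken.encoding_for_model(model)
        in_price, out_price = PRICES[model]
        return (len(enc.encode(prompt)) * in_price
                + len(enc.encode(completion)) * out_price)

    print(estimate_cost("Hello, world!", "Hi there!", "gpt-4-0613"))

A real implementation would also carry prices as Decimal (or integer fractions of a cent) rather than floats, which is the precision issue mentioned above.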

Surprisingly, most model providers don't actually report how much you spend until your bills arrive. We built Tokencost internally at AgentOps to help users track agent spend, and we decided to open source it to help developers avoid nasty bills.

refulgentis [dead post] No.40714113
[flagged]
1. J_Shelby_J No.40718153
I’m not sure if the Python tiktoken library has the o200k tokenizer for gpt-4o, but I would imagine it does. So this library does support gpt-4o, at least.
replies(1): >>40718266
2. refulgentis No.40718266
Yes, tiktoken has it, and no, this library doesn't support gpt-4o.

It is exactly as bad a situation as I laid out.

It is a tiktoken wrapper that only does cl100k, doesn't bother with anything beyond that, not even the message frame tokens, and claims to calculate costs for 400 LLMs.

replies(1): >>40719389
3. J_Shelby_J No.40719389
> tiktoken.encoding_for_model(model)

Calling this with model == 'gpt-4o' will encode with o200k, no?
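
(Quick check, assuming a recent tiktoken -- the o200k_base encoding and the gpt-4o mapping were added in tiktoken 0.7.0:)

    import tiktoken

    # On tiktoken >= 0.7.0 this resolves gpt-4o to o200k_base;
    # older versions don't know the model and raise a KeyError instead.
    enc = tiktoken.encoding_for_model("gpt-4o")
    print(enc.name)  # "o200k_base"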

But yes, I do agree with you. I had a hard time implementing non-tiktoken tokenizers for my project. I ended up manually adding tokenizer.json files into my repo.[1] The other option is downloading from HF, but the official repos where a model's tokenizer.json lives require agreeing to their terms to access, so it requires an HF key and accepting the terms. Not a good experience for a consumer of the package.
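
(A minimal sketch of that in-repo approach using the `tokenizers` library -- the path is a hypothetical example, not the actual layout in [1]:)

    from tokenizers import Tokenizer

    # Load a tokenizer.json vendored into the repo: no HF key, no terms to accept.
    tok = Tokenizer.from_file("src/models/llama3/tokenizer.json")  # hypothetical path
    print(len(tok.encode("Hello, world!").ids))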

> Message frame tokens?

Do you mean the chat template tokens? Oh, that's another good point. Yeah, it counts OpenAI prompt tokens, but you're right, it doesn't count chat template tokens, so that's another source of inaccuracy. I solved this by using a Jinja templating engine to render the full prompt [2] (a toy sketch follows after the links below). Granted, both llama.cpp and mistral-rs do this on the backend, so it's purely for counting tokens. I guess it would make sense to add a function to convert tokens to dollars.

[1] https://github.com/ShelbyJenkins/llm_utils/tree/main/src/mod... [2] https://github.com/ShelbyJenkins/llm_utils/blob/main/src/pro...
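
(A toy sketch of that render-then-count approach -- not the code in [2]; the ChatML-style template is illustrative, not any particular model's real template, and tiktoken stands in for the model's own tokenizer:)

    import jinja2
    import tiktoken

    # Illustrative ChatML-style chat template.
    CHATML = (
        "{% for m in messages %}"
        "<|im_start|>{{ m.role }}\n{{ m.content }}<|im_end|>\n"
        "{% endfor %}"
        "<|im_start|>assistant\n"
    )

    messages = [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hello!"},
    ]

    # Render the full prompt, then count tokens on the rendered string,
    # so the template overhead is included in the count.
    prompt = jinja2.Template(CHATML).render(messages=messages)
    enc = tiktoken.get_encoding("cl100k_base")
    print(len(enc.encode(prompt)))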

replies(1): >>40723841
4. refulgentis No.40723841
>> tiktoken.encoding_for_model(model)
> Calling this with model == 'gpt-4o' will encode with o200k, no?

No, it will never use o200k. I don't know how to describe where the problem is located without sounding aggro, apologies: read below, i.e. the rest of the method.

They copied demo code for tiktoken that has an allowlist without gpt-4o in it, because the demo code is from before 4o.

The demo code has an allowlist that does string matching, and if the model isn't one of five models, none of which is gpt-4o, it says "eh, if it starts with gpt-4, just use gpt-4-0613" and makes a recursive call.
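
From memory, the shape of that demo code is roughly this (a paraphrase with the allowlist abbreviated, not the exact snippet):

    import tiktoken

    def num_tokens_from_messages(messages, model):
        if model in {"gpt-3.5-turbo-0613", "gpt-4-0314", "gpt-4-0613"}:  # no gpt-4o
            encoding = tiktoken.encoding_for_model(model)  # cl100k_base for all of these
            total = 3  # every reply is primed with an assistant header
            for m in messages:
                total += 3  # per-message frame tokens
                for value in m.values():
                    total += len(encoding.encode(value))
            return total
        elif "gpt-4" in model:
            # "gpt-4o" contains "gpt-4", so it silently recurses with
            # gpt-4-0613 -- and therefore counts with cl100k, never o200k.
            return num_tokens_from_messages(messages, model="gpt-4-0613")
        raise NotImplementedError(model)

    print(num_tokens_from_messages([{"role": "user", "content": "hi"}], "gpt-4o"))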

You can't really blame them, because all they did was copy demo code that OpenAI published before gpt-4o, but I hope you get a giggle out of what an extreme clown car this situation is. It's a really bad, paper-thin, out-of-date tiktoken wrapper that can only do cl100k and claims support for 400 LLMs.

Really bonkers.

I know you have to read the whole method to get it, but people really shouldn't have just been like "my word! it's mean to say they don't get it!" -- it's horrible.

https://github.com/AgentOps-AI/tokencost/blob/e1d52dbaa3ada2...