I'd like to see a math/logic bench appear for tokenization schemes that captures this. BPB/perplexity is fine, but it's not everything.
To the extent we've already found that to be the case, it's perhaps the weirdest part of this whole "paradigm shift."
Then a system samples from that distribution, typically with randomness, and some of the optimizations used when serving them introduce further nondeterminism, but it's important to understand that the models themselves are not random.
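A toy sketch of that separation (purely illustrative, not any real LLM's API; the "model" here is just a hash-seeded stand-in): the model maps a prompt deterministically to a distribution over tokens, and randomness only enters at the sampling step.

```python
import hashlib
import math
import random

def model_logits(prompt):
    """Stand-in for a neural net: the same prompt always yields the same logits."""
    seed = int.from_bytes(hashlib.sha256(prompt.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.gauss(0, 1) for _ in range(5)]  # toy 5-token vocabulary

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# The model is deterministic: identical prompts give identical distributions.
p1 = softmax(model_logits("2 + 2 ="))
p2 = softmax(model_logits("2 + 2 ="))
assert p1 == p2

# The randomness lives in the sampling step, outside the model itself.
sampler = random.Random()
token_a = sampler.choices(range(5), weights=p1)[0]
token_b = sampler.choices(range(5), weights=p1)[0]
# token_a and token_b may differ, even though p1 == p2.

# Greedy decoding (always take the argmax) removes that randomness entirely.
assert p1.index(max(p1)) == p2.index(max(p2))
```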
It's best to assume that the relationship between input and output of an LLM is not deterministic, similar to something like using a Google search API.
but from time to time this does require doing arithmetic correctly (to add two exponents or whatever), so it would be nice to be able to trust that.
i imagine there are other uses for basic arithmetic too, like QA applications over data that quotes statistics and such.
It sounds weird, but try writing your problem in LaTeX - I don’t know why, I’ve found a couple models to be incredibly capable at solving mathematical problems if you write them in LaTeX.
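To make the tip concrete, here's a made-up example of the kind of rewrite meant: instead of pasting "solve 3 to the power of 2x plus 1 equals 27 to the power of x minus 2" as plain prose, phrase it in LaTeX:

```latex
Solve for $x$:
\[
  3^{2x+1} = 27^{x-2}
\]
```

The hypothesis would be that models have seen a lot of math written this way (textbooks, arXiv, Stack Exchange), so the notation itself may nudge them toward more careful step-by-step solving.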