I am exploring methods to detect and categorize undocumented special tokens (or "delimiters") used by LLMs.
As part of this effort, I have built a prototype, deLLMiter, which I am still refining.
This method rests on the hypothesis that first-order expressions correlate with more complex, hierarchical forms of expression (https://glthr.com/llm-delimiters-and-higher-order-expression...).
My goal with this project is to improve the security of LLMs.