I'm effectively a complete layman in this (although I do see some parallels to physical positional encoders, which is interesting), so at first read this entire thing went WAAAAY over my head. At first glance it seemed way overcomplicated just to encode position, so I figured I was missing something. ChatGPT was super helpful in explaining spiking neural networks to me, so I just spent 20 minutes asking it to explain this, and I feel like I actually learned something.
Then at the end I asked ChatGPT how this all relates to how it operates and it was interesting to see things like:
>Tokens as Subword Units: I use a tokenization method called Byte Pair Encoding (BPE), which breaks text into subword units.
I don't know if it's accurate or not, but it's wild seeing it talk about how it works.
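For anyone else wondering what "breaks text into subword units" actually means in practice, here's a toy Python sketch of the core BPE training idea: repeatedly merge the most frequent adjacent pair of symbols so common character sequences become single tokens. This is just the textbook merge loop for illustration, not ChatGPT's actual (byte-level) tokenizer, and the example words are made up.

```python
from collections import Counter

def learn_bpe(words, num_merges):
    """Toy BPE: learn merge rules from a tiny word list."""
    # Start with each word split into individual characters.
    vocab = {w: list(w) for w in words}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across all words.
        pairs = Counter()
        for symbols in vocab.values():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace the best pair with a single merged symbol everywhere.
        for w, symbols in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            vocab[w] = merged
    return vocab, merges

if __name__ == "__main__":
    words = ["lower", "lowest", "newer", "newest", "wider"]
    vocab, merges = learn_bpe(words, num_merges=6)
    print(merges)  # learned merge rules, most frequent pairs first
    print(vocab)   # each word now split into learned subword units
```

Run it and you can watch frequent endings like "er" and "est" fuse into single units, which is the "subword" part of the quote above.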