
147 points fzliu | 1 comment
Straw No.45071026
In the theoretical section, they fit a polynomial at around 40 dimensions and extrapolate it out to thousands. Why trust a polynomial fit to extrapolate across two orders of magnitude? And why would we expect the relationship to be polynomial rather than exponential in the first place? Most quantities like this grow exponentially with dimension.

In fact, I think we can do it in d = 2k dimensions if we're willing to use arbitrarily precise query vectors.

Embed our points as (sin(theta), cos(theta), sin(2 theta), cos(2 theta), ..., sin(k theta), cos(k theta)), with theta uniformly spaced around the circle, and we should be able to select any k of them.

Using a few more dimensions we can then ease the precision requirements on the query.
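
For what it's worth, here is a quick numerical check of that construction (my own sketch, not from the paper; the grid size, n, and k below are arbitrary). With the points on the degree-k trigonometric moment curve, the trig polynomial f(theta) = -prod_{i in S} (1 - cos(theta - theta_i)) / 2 has degree k, is <= 0 everywhere, and is exactly 0 at the chosen angles, so its non-constant Fourier coefficients form a 2k-dimensional query whose dot product is maximized exactly on the chosen subset S:

    # Sketch under the assumptions above; all names are my own.
    import numpy as np

    def embed(thetas, k):
        # Map angles onto the 2k-dim curve (sin j*t, cos j*t for j = 1..k).
        cols = []
        for j in range(1, k + 1):
            cols += [np.sin(j * thetas), np.cos(j * thetas)]
        return np.stack(cols, axis=1)

    def query_for(chosen_thetas, k):
        # f(t) = -prod (1 - cos(t - t_i)) / 2 is a degree-k trig polynomial,
        # so fitting it on a dense grid with the 2k basis functions plus a
        # constant recovers its coefficients up to float precision.
        grid = np.linspace(0.0, 2.0 * np.pi, 4096, endpoint=False)
        f = -np.prod([(1.0 - np.cos(grid - t)) / 2.0 for t in chosen_thetas], axis=0)
        design = np.hstack([np.ones((grid.size, 1)), embed(grid, k)])
        coef, *_ = np.linalg.lstsq(design, f, rcond=None)
        return coef[1:]  # drop the constant term; it doesn't change the ranking

    n, k = 1000, 5
    thetas = 2.0 * np.pi * np.arange(n) / n
    points = embed(thetas, k)            # n points in 2k dimensions

    rng = np.random.default_rng(0)
    subset = rng.choice(n, size=k, replace=False)
    q = query_for(thetas[subset], k)

    scores = points @ q
    topk = np.argsort(scores)[-k:]
    assert set(topk) == set(subset)      # the chosen k points are exactly the top-k

The score gap between the chosen points and their nearest excluded neighbours shrinks as n grows, which is exactly where the precision requirement on the query comes from; padding with a few extra dimensions, as suggested above, is one way to relax it.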

replies(2): >>45071266 >>45072420
1. namibj No.45071266
In practice you hit further problems, because you don't have these synthetic top-k tasks; you have open-domain documents and queries to support. And if you want more than just getting the top-k correct, i.e. an actual inclusion/exclusion boundary between what should and should not be matched, you'll hit the same bounds that limit context length via the key/query dimensionality in a transformer's attention layers, as I mentioned about 6 weeks ago: https://news.ycombinator.com/item?id=44570650