"hat" gives a range of poses
The backend is written in Swift and hosted on a single Mac Mini. It runs nearest-neighbor search on the GPU over ~3M product images.
No vector DB, just pure matrix multiplications. Since we aren't doing approximate nearest neighbors but instead sorting all results by distance, it's possible to show different "variety" levels by changing the stride over the sorted search results.
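To make the stride idea concrete, here is a minimal sketch in plain Python, not the actual Swift/GPU code. It assumes cosine similarity on nonzero vectors, which may differ from the distance the real backend uses, and the names `ranked_neighbors` and `pick_with_variety` are made up for illustration.

```python
import math


def normalize(v):
    # Scale a vector to unit length (assumes v is not all zeros).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def ranked_neighbors(query, database):
    """Return every database index, sorted from most to least similar."""
    q = normalize(query)
    # Equivalent to one row of a matrix product: dot the query with each
    # normalized database vector. On a GPU this is a single matmul.
    sims = [sum(a * b for a, b in zip(q, normalize(row))) for row in database]
    return sorted(range(len(database)), key=lambda i: -sims[i])


def pick_with_variety(ranking, count, stride):
    """Take every `stride`-th entry of the full ranking, up to `count`."""
    return ranking[::stride][:count]


# Toy 2-D "feature vectors" (hypothetical data).
database = [[1, 0], [0.9, 0.1], [0.7, 0.3], [0.5, 0.5], [0.3, 0.7], [0, 1]]
ranking = ranked_neighbors([1, 0], database)
low_variety = pick_with_variety(ranking, 3, 1)   # the three closest matches
high_variety = pick_with_variety(ranking, 3, 2)  # every other match
```

With stride 1 you get the tightest cluster of results; a larger stride reaches further down the sorted list, which only works because the full ranking is exact rather than a truncated approximate top-k.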
Nearest neighbors are computed in a latent vector space. I also trained the model that produces the vectors in pure Swift.
The underlying data is about 2TB scraped from https://www.shopltk.com/.
All the code is at https://github.com/unixpickle/LTKlassifier
I think using a better model to produce feature vectors could achieve this, or perhaps even finetuning the feature model to match human preferences.