Would you mind sharing how you trained the model to produce the vectors? Are you using a vision transformer under the hood with contrastive training against price, product category, etc.?
EDIT: I see that the training script is included in the repo and you are using a CNN. Inspiring work!