
379 points | Sirupsen | 3 comments
1. softwaredoug No.40920438
Having worked with Simon, I can say he knows his sh*t. We talked a lot about what the ideal search stack would look like when we worked together at Shopify on search (him more infra, me more ML+relevance). I told him I just want a thing in the cloud that provides my retrieval arms, lets me express ranking in a fluent "py-data" first way, and gets out of my way.

My ideal is that turbopuffer ultimately is like a Polars dataframe where all my ranking is expressed in my search API. I could lazily express some lexical or embedding similarity, then boost by various attributes (recency, popularity, etc.) to get a first pass, again all just with dataframe math. Then I compute features for a reranking model I run on my side - more dataframe math - and it "just works": it runs all this as some kind of query execution DAG and stays out of my way.
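
To make that concrete, here is a rough sketch in plain Python. It is not turbopuffer's API and not Polars; `LazyRanking`, `with_score`, `top`, and the `bm25` / `days_old` / `popularity` fields are all made-up names for illustration. The point is only the shape: each call records a step, and nothing runs until `collect()`.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict  # one candidate document with its attributes and scores


@dataclass(frozen=True)
class LazyRanking:
    """Hypothetical lazy query: records steps, runs nothing until collect()."""

    steps: tuple = ()

    def with_score(self, name: str, fn: Callable[[Row], float]) -> "LazyRanking":
        # Record a derived column (a similarity, a boost, a feature, ...).
        return LazyRanking(self.steps + (("with_score", name, fn),))

    def top(self, by: str, k: int) -> "LazyRanking":
        # Record a "keep the k best rows by this column" step.
        return LazyRanking(self.steps + (("top", by, k),))

    def collect(self, rows: list[Row]) -> list[Row]:
        # Execute the recorded steps in order on copies of the input rows.
        out = [dict(r) for r in rows]
        for op, name, arg in self.steps:
            if op == "with_score":
                for r in out:
                    r[name] = arg(r)
            elif op == "top":
                out = sorted(out, key=lambda r: r[name], reverse=True)[:arg]
        return out


# Toy candidates; in the imagined setup these would come from retrieval.
docs = [
    {"id": "a", "bm25": 2.0, "days_old": 10, "popularity": 0.9},
    {"id": "b", "bm25": 3.0, "days_old": 400, "popularity": 0.1},
    {"id": "c", "bm25": 1.0, "days_old": 1, "popularity": 0.5},
]

# First pass: lexical score boosted by recency, plus popularity.
query = (
    LazyRanking()
    .with_score("recency", lambda r: 1.0 / (1.0 + r["days_old"] / 30.0))
    .with_score(
        "first_pass",
        lambda r: r["bm25"] * (1.0 + r["recency"]) + r["popularity"],
    )
    .top(by="first_pass", k=2)
)

first_pass = query.collect(docs)
print([r["id"] for r in first_pass])
```

A real engine would presumably plan and push these steps down to the index instead of looping over rows client-side; the list of recorded steps here just stands in for that query execution DAG.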

replies(2): >>40922580 #>>40934181 #
2. bkitano19 No.40922580
+1, I had the good fortune to work with him at a previous startup and meet up in person. Our convo very much broadened my perspective on engineering as a career and a craft, and I'm always excited to see what he's working on. Good luck Simon!
3. snthpy No.40934181
Could you give an example of what you mean by _fluent "py-data" first way_?

Do you mean a fluent API like `data.transform().filter()...`, that sort of thing?