Yeah, we found even the delay of non-local LLMs to be prohibitive. We started using Claude for the "smartest" recs and for profile generation from preferences, and it was so slow: on the order of a minute for a first visit, and still 20-30s on repeat visits even after storing a "profile" (essentially your notion of memoized heuristics) in local storage to come back to.
We ended up finding a middle ground between that and ~simonw's no-AI-but-fast approach: use Flash for fast semantic understanding of preferences and recs, at degraded quality compared with a frontier model.
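The memoized-profile idea above can be sketched roughly like this. Everything here is hypothetical (the class, the TTL, the builder callback); it just shows the pattern of caching the expensive model output in a localStorage-style key-value store so repeat visits skip the slow call:

```python
import time

class ProfileCache:
    """Hypothetical sketch: cache an expensive LLM-built profile per user."""

    def __init__(self, ttl_seconds=86400):
        self.store = {}        # stand-in for window.localStorage
        self.ttl = ttl_seconds

    def get_or_build(self, user_id, build_profile):
        """Return a cached profile if fresh, else build and cache one."""
        entry = self.store.get(user_id)
        if entry is not None:
            profile, stored_at = entry
            if time.time() - stored_at < self.ttl:
                return profile          # fast path: no model call
        profile = build_profile()       # slow path: e.g. a frontier-model call
        self.store[user_id] = (profile, time.time())
        return profile

cache = ProfileCache()
calls = []
profile = cache.get_or_build("u1", lambda: calls.append(1) or {"likes": ["rust"]})
profile = cache.get_or_build("u1", lambda: calls.append(1) or {"likes": ["rust"]})
# the second call hits the cache, so the builder only ran once
```

The speedup comes entirely from the fast path; the first visit still eats the full model latency.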
> And forward-thinking sites might try to make that process easier, with special APIs/docs/recipe-interchanges for all users' agents to share their progress on popular needs.
HN is that! Our exploration was made 1000% easier because they have an API that's "good enough" for most information.
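For anyone curious, the HN Firebase API really is this simple to use. The endpoints below are the real public ones; the helper names are ours:

```python
import json
from urllib.request import urlopen

# Public, unauthenticated HN API (the helper functions are our own naming).
API = "https://hacker-news.firebaseio.com/v0"

def item_url(item_id):
    """Build the URL for a single story/comment/job item."""
    return f"{API}/item/{item_id}.json"

def top_story_ids(limit=10):
    """Fetch the current front-page story ids (network call)."""
    with urlopen(f"{API}/topstories.json") as resp:
        return json.load(resp)[:limit]

def fetch_item(item_id):
    """Fetch one item as a dict with keys like 'title', 'score', 'kids'."""
    with urlopen(item_url(item_id)) as resp:
        return json.load(resp)
```

Items link to their children via `kids`, so a recs agent can walk whole comment trees with nothing more than repeated `fetch_item` calls.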