
134 points samuel246 | 5 comments
1. necovek ◴[] No.44458350[source]
On top of the other things mentioned (caching always introduces complexity with lifetime tracking, and thus can't make things simple), the article's got it the wrong way around.

When code has abstract interfaces for data access, introducing caching can be simpler (but still not simple), because it stays localized in the implementation of the abstraction, which either caches or doesn't.
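A minimal sketch of what I mean, with made-up names: the caller only ever sees `get()`, and caching lives entirely inside one implementation of that interface.

```python
class SlowStorage:
    """Stands in for a slow backend (disk, network). Names are illustrative."""
    def __init__(self, data):
        self._data = data
        self.reads = 0  # count backend hits, just for illustration

    def get(self, key):
        self.reads += 1
        return self._data[key]


class CachedStorage:
    """Exposes the same get() interface, but memoizes results in a dict."""
    def __init__(self, backend):
        self._backend = backend
        self._cache = {}

    def get(self, key):
        if key not in self._cache:
            self._cache[key] = self._backend.get(key)
        return self._cache[key]


backend = SlowStorage({"a": 1})
storage = CachedStorage(backend)
storage.get("a")
storage.get("a")  # second call is served from the cache; backend hit only once
```

Swapping `CachedStorage` for `SlowStorage` (or back) never changes the callers, which is the "simpler but not simple" part.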

But caching itself is not an abstraction (you can perfectly well do caching without any abstractions, and it's frequently done exactly that way).

replies(1): >>44458432 #
2. movpasd ◴[] No.44458432[source]
I think you and the article are referring to abstractions over different concerns.

The concern you're talking about is the actual access to the data. My understanding of the article is that it's about how caching algorithms can abstract the concern of minimising retrieval cost.

So in some ways you're coming at it from opposite directions. You're talking about a prior of "disk by default" and saying that a good abstraction lets you insert cache layers above that, whereas for the author the base case is "manually managing the layers of storage".

replies(2): >>44458485 #>>44458887 #
3. foldU ◴[] No.44458485[source]
This is correct, I appreciate you for putting it so coherently :). I think I didn’t make it clear enough in the piece that I’m coming from a stance of fast access being table stakes, and the question being about how that’s accomplished.
replies(1): >>44458995 #
4. necovek ◴[] No.44458887[source]
The language used is seriously confusing here.

Algorithms can't really abstract anything since they are, well, just algorithms (formal descriptions of how a computation should be done).

Looking at the author's examples again, I think almost everybody would say that caching is used in both:

  if data_id in fast_storage:
      return fast_storage.get(data_id)
  else:
      data = slow_storage.get(data_id)
      fast_storage.set(data_id, data)
      return data
and

  # Uses fast storage or slow storage just like above, but behind the get() method.
  return storage.get(data_id)
The first one does not build an abstraction over storage and the second one does, but both are "caching" data internally.
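To make the two snippets above concretely runnable (the `KV` wrapper and the class names here are mine, invented to match the `get`/`set` API in the snippets):

```python
class KV:
    """Minimal get/set wrapper around a dict, matching the snippets' API."""
    def __init__(self, data=None):
        self._d = dict(data or {})
    def get(self, k):
        return self._d.get(k)
    def set(self, k, v):
        self._d[k] = v
    def __contains__(self, k):
        return k in self._d


fast_storage = KV()
slow_storage = KV({"x": 42})

def get_inline(data_id):
    # First snippet: caching visible at the call site, no abstraction.
    if data_id in fast_storage:
        return fast_storage.get(data_id)
    data = slow_storage.get(data_id)
    fast_storage.set(data_id, data)
    return data


class CachingStorage:
    """Second snippet: identical logic, hidden behind a single get()."""
    def __init__(self, fast, slow):
        self._fast, self._slow = fast, slow

    def get(self, data_id):
        if data_id in self._fast:
            return self._fast.get(data_id)
        data = self._slow.get(data_id)
        self._fast.set(data_id, data)
        return data


storage = CachingStorage(KV(), KV({"x": 42}))
```

Both return the same data and both populate a fast tier; only the second hides the tiers behind an interface.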

While there are generic implementations of caching algorithms, and we can consider those abstractions, "caching" is a broader term than those implementations, and is specifically not an abstraction (the fact that some caching implementation abstracts something does not make all caching an abstraction).

Edit: Let me also point out that "abstract the concern of minimising retrieval cost" is not caching: I can say that eg. a simple interface with a method FastGet(id) does the former, and it need not use any caching if the underlying structure is fast enough and eg. directly in memory.
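For instance (a hypothetical `FastGet`-style interface, sketched in Python): retrieval here is fast purely because the data sits in memory, and there is no caching anywhere.

```python
class InMemoryIndex:
    """Fast retrieval with zero caching: the data already lives in memory."""
    def __init__(self, records):
        # Build an id -> record map once; every lookup after that is O(1).
        self._by_id = {r["id"]: r for r in records}

    def fast_get(self, id):
        # Nothing is cached and nothing expires; it's just a dict lookup.
        return self._by_id[id]
```

So minimising retrieval cost and caching are separable concerns.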

5. necovek ◴[] No.44458995{3}[source]
"Caching" is the idea of storing the result of an expensive computation in storage that is faster to read from than redoing the original computation (in very generic computer terms; the "computation" can simply be fetching from the network or from slower local storage).

What you describe as "caching algorithms" are not really caching algorithms, but cached object lifetime management algorithms (LRU, LFU...).
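To illustrate the distinction: an LRU policy decides only which cached objects to evict, not how or where values are fetched. A rough sketch (my own minimal implementation, not any particular library's):

```python
from collections import OrderedDict

class LRU:
    """Lifetime management only: tracks recency and evicts, nothing else."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used
```

Note there is no slow/fast storage in sight; plugging this policy into an actual cache is a separate concern.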

"Abstraction" is a higher-level, simplified view of a set of concepts, yet caching is a single concept. See eg. https://en.wikipedia.org/wiki/Abstraction_(computer_science)

It sounds like you are trying to redefine both what "caching" means (tying it to implementations of particular algorithms) and what "abstraction" means.

We should be very deliberate with the language we use, and our main goal should be to make it simpler to understand, not harder — I believe you are doing the latter here.