
134 points samuel246 | 3 comments
ckdot2 ◴[] No.44458190[source]
"I think now caching is probably best understood as a tool for making software simpler" - that's cute. Caching might be beneficial in many cases, but if there's one thing it doesn't do, it's make software simpler. There's that famous quote, "There are only two hard things in Computer Science: cache invalidation and naming things," and, sure, it's a bit ironic, but there's some truth in it.
replies(11): >>44458265 #>>44458365 #>>44458502 #>>44459091 #>>44459123 #>>44459372 #>>44459490 #>>44459654 #>>44459905 #>>44460039 #>>44460321 #
EGreg ◴[] No.44458502[source]
I never understood the fuss about cache invalidation or naming things.

Both are not that difficult, honestly.

Aren't there a lot harder things out there?

replies(8): >>44458592 #>>44458650 #>>44458692 #>>44458868 #>>44458913 #>>44459031 #>>44459481 #>>44459828 #
ninalanyon ◴[] No.44459828[source]
Really? Have you tried building any substantial program that makes use of caching and succeeded in invalidating the cache both correctly and efficiently? It's not all about simple things like disk access. Caching is also useful in software that models complex hardware, where properties depend on multitudes of interconnected calculated values that are time-consuming to calculate and where you cannot predict which ones the client will ask for next.
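A minimal Python sketch of the kind of dependency-aware invalidation described here (all names hypothetical): derived values memoize their results and are invalidated transitively whenever one of their inputs changes.

```python
class DerivedCache:
    """Caches derived values; when an input changes, every cached value
    that (directly or transitively) depends on it is invalidated."""

    def __init__(self):
        self.values = {}      # input name -> raw value
        self.cache = {}       # derived name -> memoized result
        self.compute = {}     # derived name -> (fn, input names)
        self.dependents = {}  # name -> set of derived names that use it

    def define(self, name, fn, deps):
        self.compute[name] = (fn, deps)
        for d in deps:
            self.dependents.setdefault(d, set()).add(name)

    def set_input(self, name, value):
        self.values[name] = value
        self._invalidate(name)

    def _invalidate(self, name):
        # Cascade: derived values may themselves feed other derived values.
        for dep in self.dependents.get(name, ()):
            if dep in self.cache:
                del self.cache[dep]
                self._invalidate(dep)

    def get(self, name):
        if name in self.values:
            return self.values[name]
        if name not in self.cache:
            fn, deps = self.compute[name]
            self.cache[name] = fn(*(self.get(d) for d in deps))
        return self.cache[name]
```

This is the easy in-process version; the hard part the comment alludes to is doing the same thing correctly when the dependency graph is large and the recomputation is expensive.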
replies(2): >>44460212 #>>44461595 #
1. logical42 ◴[] No.44461595[source]
I agree with you, but you are omitting a key aspect of the problem that makes this hard. If you have a single server serving from a local sqlite database, caching and invalidating the cache is trivially easy.

It becomes far more difficult when you have N servers, all of which could potentially serve the same data. Local caches can then easily become stale, and even if you have a propagation mechanism, you cannot guarantee against network failures or latency issues.
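For illustration, the "trivially easy" single-server case might look like this sketch (hypothetical class, Python): a read-through cache over a local SQLite table, where invalidation is just a dict delete because this one process is the only writer.

```python
import sqlite3

class LocalCache:
    """Read-through cache over a local SQLite table. Since every write
    goes through this single process, invalidating on write is enough."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
        self.cache = {}

    def put(self, k, v):
        self.db.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (k, v))
        self.cache.pop(k, None)  # sole writer, so this is sufficient

    def get(self, k):
        if k not in self.cache:
            row = self.db.execute(
                "SELECT v FROM kv WHERE k = ?", (k,)).fetchone()
            self.cache[k] = row[0] if row else None
        return self.cache[k]
```

With N servers, that `cache.pop` on one machine does nothing for the N-1 other local caches, which is exactly where the difficulty starts.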

But suppose you have an HA redis instance that you store your cached data in. Even with a write-through cache, you basically need to implement a two-phase commit to ensure consistency.
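Not a full two-phase commit, but a sketch of the ordering idea behind the write-through pattern mentioned here (in-memory dicts stand in for the database and the shared cache; names hypothetical): invalidate first, then do the durable write, and only repopulate the cache after the write succeeds, so a crash between the phases leaves a cache miss rather than stale data.

```python
class WriteThroughCache:
    """Write-through sketch: cache entry is dropped before the write and
    repopulated only after the database write succeeds."""

    def __init__(self, db, cache):
        self.db, self.cache = db, cache

    def write(self, key, value):
        self.cache.pop(key, None)  # phase 1: invalidate shared cache
        self.db[key] = value       # phase 2: durable write
        self.cache[key] = value    # repopulate only on success

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.db.get(key)  # read-through on miss
        return self.cache[key]
```

Even this ordering can race with a concurrent reader repopulating the cache between phases 1 and 2, which is why the comment reaches for two-phase commit in the first place.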

replies(1): >>44462375 #
2. EGreg ◴[] No.44462375[source]
See the thread below. Caches only become stale if you refuse to implement any sort of push architecture. And even then, frequent batch polling with etags will quickly resolve it (e.g., right after you use N entries, poll to see which of the N have changed).
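A minimal sketch of that batch-polling step (Python; `fetch_etags` and `fetch_entry` are hypothetical callbacks standing in for the server round trips): ask for just the current etags of the N keys in one request, then re-fetch only the entries whose etags changed.

```python
def refresh_stale(entries, fetch_etags, fetch_entry):
    """entries maps key -> (etag, value). One batch request fetches the
    current etags for all keys; only mismatched entries are re-fetched."""
    current = fetch_etags(list(entries))           # one round trip for N keys
    for key, (etag, value) in list(entries.items()):
        if current.get(key) != etag:               # etag mismatch -> stale
            entries[key] = (current[key], fetch_entry(key))
    return entries
```

The win is that an unchanged entry costs one etag comparison instead of a full re-fetch.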

Bloom filters, Prolly trees, etc. can speed the batch polling up. But a push architecture obviates the need for it and prevents caches from going stale; it's called eventual consistency.

(Well, unless of course there is a network partition and you can't communicate with the machine holding the actual source of truth; then, yes, your local cache will be stale. To catch up on missed updates you will need to request all updates since a certain marker you saved. If the upstream has had a ton of updates since your marker and can't send the entire history, you might have to purge your cache for that specific table / collection, yeah.)
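The reconnect logic in that parenthetical can be sketched like this (Python; `fetch_updates` is a hypothetical callback that returns `(updates, new_marker)`, or `None` when the saved marker is older than the history the upstream still has):

```python
def catch_up(cache, marker, fetch_updates):
    """Request all updates since `marker`. If upstream has truncated its
    history past our marker, purge the local cache and start over."""
    result = fetch_updates(marker)
    if result is None:        # history truncated upstream
        cache.clear()         # purge: everything re-fetched on next access
        return None
    updates, new_marker = result
    cache.update(updates)     # apply missed updates in order
    return new_marker
```

The purge path is the "worst case" the comment concedes: correctness is preserved, at the cost of refilling that cache from scratch.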

replies(1): >>44476174 #
3. logical42 ◴[] No.44476174[source]
I would love to be able to over-withdraw money from my bank account by having someone on the other side of the country take money out of my account at an ATM at the exact same time as me, thanks to the bank's storage system being eventually consistent. But that's not the case.