134 points samuel246 | 86 comments
1. ckdot2 ◴[] No.44458190[source]
"I think now caching is probably best understood as a tool for making software simpler" - that's cute. Caching might be beneficial in many cases, but if there's one thing it doesn't do, it's simplify software. There's that famous quote "There are only two hard things in Computer Science: cache invalidation and naming things.", and, sure, it's a bit ironic, but there's some truth in there.
replies(11): >>44458265 #>>44458365 #>>44458502 #>>44459091 #>>44459123 #>>44459372 #>>44459490 #>>44459654 #>>44459905 #>>44460039 #>>44460321 #
2. bell-cot ◴[] No.44458265[source]
(You forgot off-by-1 errors.)

All software has to name things, and count. Caching (including invalidation) is best understood as a liability. If you can foist it off on your CPU and OS and DB, good for you. Programming whatever you're actually trying to get done is already hard enough.

replies(2): >>44459048 #>>44464137 #
3. whateveracct ◴[] No.44458365[source]
caching often does simplify software though when done well

and - as the OP suggests - it works best when the cache is a well-defined abstraction with properties and rules about how it works

just because "caching" is mentioned in a meme doesn't mean it can't be true that it can simplify software

replies(7): >>44458520 #>>44458529 #>>44458670 #>>44458723 #>>44458778 #>>44458835 #>>44460773 #
4. EGreg ◴[] No.44458502[source]
I never understood the fuss about cache invalidation or naming things.

Both are not that difficult, honestly.

Aren't there a lot of harder things out there?

replies(8): >>44458592 #>>44458650 #>>44458692 #>>44458868 #>>44458913 #>>44459031 #>>44459481 #>>44459828 #
5. meesles ◴[] No.44458520[source]
> caching often does simplify software though when done well

I have to push back here, I think this is objectively untrue. By definition, a system or piece of code where you add a condition under which something else happens (a cache) that behaves differently from the uncached path increases complexity.

I'm not saying it's wrong to cache things or that they aren't useful, but I think they absolutely are an abstraction and an optimization at the cost of complexity. Good code bases hide complexity from the devs all the time, so it's not a question of whether you can code it away, but rather how difficult it is to troubleshoot the internals of the system.

6. PaulHoule ◴[] No.44458529[source]
Trying some other way to explicitly manage multiple storage tiers could get pretty complicated.
7. gryfft ◴[] No.44458650[source]
It's a little bit tongue in cheek; no one is seriously suggesting it's harder than P vs NP or the problem of consciousness. But there's something a bit "death and taxes" to the inevitability that any large enough project is going to have some corner cases involving these old chestnuts.

Heck you can probably prove that any system for naming things is either inconsistent or incomplete.

replies(1): >>44459457 #
8. jameshart ◴[] No.44458670[source]
If you hide caching away as an implementation detail behind an abstraction, it comes back and bites you as a leaky abstraction later.

Look at how CPU cache line behaviors radically change the performance of superficially similar algorithms.

Look at how query performance for a database server drops off a cliff the moment the working cache no longer fits in memory.

Hiding complexity can be a simplification, until you exceed the bounds of the simplification and the complexity you hid demands your attention anyway.

replies(1): >>44459308 #
9. Valodim ◴[] No.44458692[source]
In my experience, the larger the software you write, the truer these become. At some point all obvious names will have collisions, and getting caching right is crucial to do but difficult to achieve because it transcends the entire stack.

You could group these two things into "getting the data model right" as the single hard thing, perhaps that rings more true to you :)

10. aswanson ◴[] No.44458723[source]
I guess simplification needs to include "at what level" as a qualifier.
11. ckdot2 ◴[] No.44458778[source]
That abstraction is another layer though. And additional layers are additional complexity. So, if you add another layer, the software is less simple than before. You might need caching in your software; I don't doubt that. But there's simply no way it makes the software simpler, except if you assume some unfortunate starting point where you could replace highly complex performance optimizations in your existing code with a simpler cache solution. But then the statement should be "refactoring makes your code simpler".
replies(1): >>44459415 #
12. moritzwarhier ◴[] No.44458835[source]
Getting cache keys or caching events wrong is easy and a nightmare.

But getting them right can easily cross the boundary of purely optimizing performance towards simplifying public API of something. I think this is true.

I'd imagine an involved example where semantics and caching really start to offer a trade-off.

Imagine that somehow querying the actual meteorological data is quite expensive, and consider this badly written pseudocode (equals sign denoting default parameters):

- measureCurrentTemperature()

- retrieveAccurateTemperatureForNanoSecond(momentInTime)

-> cached abstractions which would access cached data:

- getTemperature(moment = now(), tolerance = 1min)

- getCurrentTemperature(tolerance = MIN_TOLERANCE)

I know, reality is much more complicated, and using time (seeing it as quasi-continuous) as a caching parameter is already stretching it so far.

Just a stupid example that came to my mind.

I've bitten myself in the ass with caching rasterized representations of images more than once, where the inputs were SVG images or limited formats that convert to SVG.

13. quuxplusone ◴[] No.44458868[source]
For "only two hard problems," read "two candidates for among the hardest problems (but we feel strongly that these are indeed good candidates)," or something along those lines, more or less.

It's also possible that these used to be the only two hard problems at the time the aphorism was first recorded, but the underlying state of the world has changed since then and the aphorism, as recorded, is no longer current.

14. IshKebab ◴[] No.44458913[source]
Cache invalidation isn't hard in theory. It's just one of those things that is very easy to get subtly wrong and difficult to test.

Think about all those times your program isn't building and `make clean` fixes it.

replies(1): >>44459653 #
15. gpderetta ◴[] No.44459031[source]
Naming things is of course a bit tongue in cheek. But cache invalidation is hard. For example, allegedly MESI is one of the hardest things to validate in processor design.
16. yxhuvud ◴[] No.44459048[source]
Off-by-one errors are not part of the original quote; they're just a later add-on to make it funny.

They also tend not to be very hard.

replies(2): >>44459283 #>>44459892 #
17. bloppe ◴[] No.44459091[source]
"Two programs could have similar behaviour but structured very differently, the difference being that one utilizes caching as an abstraction and one explicitly has the concept of different tiers of storage."

The author is comparing "off-the-shelf" caching with custom caching. They're coming from the assumption that you must be caching somehow and arguing that the word "caching" should be understood to mean only particular approaches to the general idea of caching. And obviously the whole point of the general idea is to optimize things.

It's a rhetorical mess

18. heikkilevanto ◴[] No.44459123[source]
Caching is simple, yes. The hard part is in the last word, invalidation. Even that is manageable for a single process. But as soon as you have multiple (threads / processes / nodes / data centers) updating the data, it does get quite complex, pretty fast.

Likewise, naming things is simple as long as you are alone, or in a small team. But as soon as there are multiple organizations, each with their own traditions, it gets tricky. Just witness the eternal flame wars about camelCase, PascalCase, snake_case, kebab-case, and UPPER_CASE. It is almost as hopeless a culture clash as Emacs vs Vi vs PowerPoint...

(I leave the off-by-one errors as an exercise for the reader)

replies(4): >>44459345 #>>44460065 #>>44461431 #>>44467003 #
19. TeMPOraL ◴[] No.44459283{3}[source]
Except when they're part of some base assumptions in the domain, or dozens of layers of abstraction below you. Then they are hard to prevent from happening.
20. atq2119 ◴[] No.44459308{3}[source]
CPUs are still a great example for how caching simplifies things.

There's a long history in computer architecture of cores and accelerators that don't have a cache but instead rely on explicitly programmed local scratchpads. They are universally more difficult to program than general purpose CPUs because of that.

replies(1): >>44459559 #
21. TeMPOraL ◴[] No.44459345[source]
I'd say this is not the "naming things" that's hard. Beyond picking a common identifier format in the team, there are at least two dimensions that are much harder:

- The language dimension - choice of words that are good enough for the purpose, and not confusing. For example, "Manager" is as ambiguous as it gets, it can mean many things, except we've been using it long enough that there's a more specific shape of meaning[0] for that word in code/program architecture contexts - so you still would use it instead of, say, "Coordinator", which would raise all kinds of questions that "Manager" no longer does.

- The epistemological dimension - whether the word you chose correctly names the concept you meant, and whether the concept you meant is actually the right one to describe the thing you're trying to describe. Ultimately, this is the hard thing at the root of philosophy. In practice, it manifests like e.g. choice between digging into some obscure branches of mathematics to correctly name the thing "endofunctor" or something, or calling it "Square" and saying "fuck it, we'll clarify the exceptions in the comments".

--

[0] - I mean "more specific" in the sense it's distinct from the other meanings and somewhat narrow - but still it's fuzzy as heck and you can't describe it fully in words; it's basically tacit knowledge.

replies(1): >>44460060 #
22. Traubenfuchs ◴[] No.44459372[source]
I never understood this meme.

We use caching a lot, anything that gets cached can only be written by one service each. The writing services emit cache invalidation messages via SNS that cache users must listen to via SQS, to clear/update their cache.

Alternatively we cache stuff with just a TTL, when immediate cache invalidation is not important.

Where‘s the struggle?

replies(8): >>44459400 #>>44459529 #>>44459632 #>>44459774 #>>44461198 #>>44463192 #>>44464161 #>>44465957 #
23. porridgeraisin ◴[] No.44459400[source]
Here's one: everybody invalidating and refreshing their cache at the same time can cause a thundering herd problem.
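A minimal sketch of two common mitigations (names and numbers here are hypothetical, not from any comment above): jittering each entry's TTL so expirations don't line up, and a per-key "single-flight" lock so only one caller recomputes a miss while the rest wait, instead of the whole herd hitting the backend at once.

```python
import random
import threading
import time

class JitteredCache:
    """Read-through cache with TTL jitter and single-flight refresh."""

    def __init__(self, loader, ttl=60.0, jitter=0.2):
        self.loader = loader            # expensive backend fetch
        self.ttl = ttl
        self.jitter = jitter            # spread expiry by +/- 20%
        self.store = {}                 # key -> (value, expires_at)
        self.locks = {}                 # key -> per-key refresh lock
        self.guard = threading.Lock()

    def get(self, key):
        entry = self.store.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]             # fresh hit, no lock needed
        with self.guard:
            lock = self.locks.setdefault(key, threading.Lock())
        with lock:                      # only one thread refreshes this key
            entry = self.store.get(key)
            if entry and entry[1] > time.monotonic():
                return entry[0]         # another thread refreshed while we waited
            value = self.loader(key)
            ttl = self.ttl * (1 + random.uniform(-self.jitter, self.jitter))
            self.store[key] = (value, time.monotonic() + ttl)
            return value
```

With jitter, entries written at the same moment still expire at slightly different times, so the refresh load is smeared out rather than arriving as one spike.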
24. whateveracct ◴[] No.44459415{3}[source]
additional layers (or software in general) are not inherently additional complexity
replies(1): >>44459477 #
25. TeMPOraL ◴[] No.44459457{3}[source]
> no one is seriously suggesting it's harder than P=NP or the problem of consciousness.

Well, I for one feel that "naming things" ultimately boils down to the latter, which may or may not be harder than the former.

26. TeMPOraL ◴[] No.44459477{4}[source]
In some sense they are, since establishing an abstraction is strictly additive. Abstractions help manage complexity.
27. TOGoS ◴[] No.44459481[source]
There is a secret technique, called content-addressing[1], which elegantly solves both of them at once.

A lot of people haven't caught on, and try to cache things using ambiguous names, hence the struggle to invalidate their caches when the meaning changes.

[1] This can be applied even if you don't know the content yet; you just have to unambiguously name the inputs to the function that produces it. You might not know what all the inputs are, and then you have to start adding stuff like "unknown-unknown-2025-07-03T16", but it'll still basically work.
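A minimal sketch of the content-addressing idea (the function names here are hypothetical): the cache key is a hash of the inputs that produce the value, so when an input changes the key changes too, the old entry simply becomes unreachable, and there is no separate invalidation step.

```python
import hashlib
import json

def content_key(fn_name, inputs):
    """Derive an unambiguous key from the function name and its inputs."""
    payload = json.dumps({"fn": fn_name, "in": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

cache = {}

def cached_call(fn_name, fn, **inputs):
    # Different inputs hash to different keys, so stale entries are
    # never returned -- they just stop being looked up.
    key = content_key(fn_name, inputs)
    if key not in cache:
        cache[key] = fn(**inputs)
    return cache[key]
```

The trade-off is that old entries accumulate rather than being invalidated, so eviction becomes a garbage-collection problem instead of a correctness problem.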

28. ◴[] No.44459490[source]
29. hmottestad ◴[] No.44459529[source]
Does SQS guarantee delivery to all clients? If it does then that’s doing a lot of heavy lifting for you.

If it doesn’t guarantee delivery, then I believe you will at some point have a client that reads a cached value thinking it’s still valid because the invalidation message got lost in the network.

replies(1): >>44459789 #
30. hmottestad ◴[] No.44459559{4}[source]
I’m sure the CPU designers would love it if they didn’t need several different layers of cache. Or no cache at all. Imagine if memory IOPS were as fast as L1 cache, no need for all that dedicated SRAM on the chip or worry about side channel attacks.
replies(1): >>44466730 #
31. williamdclt ◴[] No.44459632[source]
You don’t support read-your-own-write and your cache data might be stale for arbitrarily long. These relaxed consistency constraints make caching a lot easier. If that’s acceptable to your use cases then you’re in a great place! If not… well, at scale you often need to find a way for it to be acceptable anyway
32. throwaway150 ◴[] No.44459653{3}[source]
> Think about all those times your program isn't building and `make clean` fixes it.

I don't think that's a good example. I've worked with more than 20 different build tools by now, and I cannot recall a single instance where the problem actually came down to cache invalidation. Every time I dug into it, the real cause was something else: a mistake in the build script, an incorrectly declared dependency, or something similar.

So when you say "think about all those times", not a single such time comes to my mind!

replies(2): >>44460506 #>>44463400 #
33. hatthew ◴[] No.44459654[source]
If you have a system with "slow storage", caching is a way to optimize that to "storage that is sometimes fast".

If you have a system with "slow storage" and "fast storage", caching is a way to abstract that away to just "storage".

The author is arguing that the latter is the default way we should think about the concept of caching, which is a valid opinion to have.
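That framing can be sketched in a few lines (this is an illustrative toy, not the article's code): two tiers, "slow" and "fast", hidden behind one storage interface. Callers just read and write "storage"; the tiering is an internal detail.

```python
class TieredStorage:
    """One storage interface over a fast tier and a slow tier."""

    def __init__(self, slow):
        self.slow = slow        # e.g. a dict standing in for disk/DB
        self.fast = {}          # small in-memory tier

    def read(self, key):
        if key in self.fast:
            return self.fast[key]
        value = self.slow[key]
        self.fast[key] = value  # promote to the fast tier on a miss
        return value

    def write(self, key, value):
        self.slow[key] = value  # write-through keeps both tiers consistent
        self.fast[key] = value
```

The write-through policy sidesteps invalidation entirely for single-process use; the hard cases discussed elsewhere in the thread appear once other writers can touch `slow` behind this object's back.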

34. pton_xd ◴[] No.44459774[source]
> Where‘s the struggle?

If there are no real consequences for reading stale data, and your writes are infrequent enough, then indeed you're lucky and have a relatively simple problem.

35. maccard ◴[] No.44459789{3}[source]
Eventually. The problem is that eventually delivering that message will result in clients assuming that it will always be the same, when it’s not.
36. ninalanyon ◴[] No.44459828[source]
Really? Have you tried building any substantial program that makes use of caching and succeeded in invalidating the cache both correctly and efficiently? It's not all about simple things like disk access, caching is also useful in software that models complex hardware where properties depend on multitudes of interconnected calculated values that are time consuming to calculate and where you cannot predict which ones the client will ask for next.
replies(2): >>44460212 #>>44461595 #
37. tombert ◴[] No.44459892{3}[source]
They're not hard but I will say that when I was writing an app that was using both JavaScript and Julia, I kept getting off-by-one errors because Julia starts at 1 instead of 0.

Really the only time in my entire professional career that off-by-one errors have actually given me headaches.

replies(1): >>44464019 #
38. ◴[] No.44459905[source]
39. AdieuToLogic ◴[] No.44460039[source]
> There's that famous quote "There are only two hard things in Computer Science: cache invalidation and naming things.", and, sure, it's a bit ironical, but there's some truth in there.

The joke form of this quote goes along the lines of:

  There are only two hard things in Computer Science: cache 
  invalidation, naming things, and off-by-one errors.
:-D
replies(3): >>44461109 #>>44462642 #>>44464042 #
40. Xss3 ◴[] No.44460060{3}[source]
I try to name things descriptively in simple terms and often end up with NamesAboutThisLong, once they get too long i know the thing is doing too much and some refactoring is needed for readability.

I also avoid letting the reader make assumptions. HasPlayerJumpedRecently is bad. What does recently mean? HasPlayerJumpedInLastTenMs is better, even if it's a bit long...Which highlights that it should probably be refactored into a more flexible value; MsSincePlayerLastJumped.

If you aren't assuming a time var with Ms is milliseconds you aren't doing games dev, so that one slides with me.

replies(2): >>44460618 #>>44461814 #
41. gblargg ◴[] No.44460065[source]
I figured the naming issue is deciding how much context. A name might begin inside an organization but need to endure a wider area. If you make all names so long and context-free that they can work in any context, they become unwieldy. Also it can be hard to realize some of the implicit context and what needs to be differentiated with the name. Where server used to suffice, now you need server-a and server-b.
42. EGreg ◴[] No.44460212{3}[source]
Yes, I have. I've actually built a general-purpose cache system. Several, in fact:

Here's one for PHP: https://github.com/Qbix/Platform/blob/main/platform/classes/...

And here is for the Web: https://github.com/Qbix/Platform/blob/dc95bd85fa386e45546b0b...

I also discuss caching in the context of HTTP: https://community.qbix.com/t/caching-and-sessions/192

  WRITING ABOUT CACHING AND INVALIDATION
You should especially start here: https://community.qbix.com/t/qbix-websites-loading-quickly/2...

And then you can read about incremental updates: https://community.qbix.com/t/streams-plugin-messages-subscri...

replies(1): >>44460904 #
43. ◴[] No.44460321[source]
44. degamad ◴[] No.44460506{4}[source]
I think the idea is that if make clean allows the build to complete correctly, then the underlying issue is that the "mistake in the build script, incorrectly declared dependency, or similar" was causing the cached build results to not be invalidated when they should have.
45. dmkolobov ◴[] No.44460618{4}[source]
Wow cool, you just summed up something I’ve found myself doing subconsciously in the past few years. Thanks!

I used to be quite fond of short identifiers, especially ones that make the signs "line up"… until I worked with code long enough that I forgot what I did and had to read it again.

46. fastball ◴[] No.44460773[source]
Caching is a performance improvement. There is no software that requires caching, therefore it is always something being added on top of the business logic that is fundamentally required. As such, a cache is increasing complexity by nature of its existence.

The only scenario where it would simplify software is if a bunch of complex (non-cache) things are being done to improve perf, and a cache would be the simpler solution. But in that case the simplifying step is not adding a cache, it is removing complex things that aren't actually required. After that you add a cache to improve performance (which increases complexity but is worth it for this imagined use-case). But maybe you remove the complex perf shenanigans, and realize that perf is still "good enough" even without a cache, keeping your software even simpler.

47. necovek ◴[] No.44460904{4}[source]
Your implementations are not cache systems: they are key-value store abstractions which can be used for caching. There is a simpler one in most languages (IIRC, named "hashes" in PHP).

Caching becomes hard when such stores are used in distributed or concurrent contexts.

An example: imagine Qcache being used when fetching data from an SQL database. And data in the database changes with a direct SQL query from another process.

How will your key-value store know that the value in it is stale and needs refreshing?

replies(1): >>44460947 #
48. EGreg ◴[] No.44460947{5}[source]
Yes, they are cache systems. And as you say, they can be used for caching.

Caching systems know when something needs updating several ways. One is pubsub: When the information is loaded from the cache, you want to set up a realtime websocket or webhook for example, to reflect any changes since the cached info was shown. If you can manage to have a server running, you can simply receive new data (deltas) and update your cached data — in that case you can even have it ready and never stale by the time it is shown.

If you can't run a server and don't want to receive pushes, then another approach simply involves storing the latest ordinal or state for EACH cached item locally (ETags do this) and then, before rendering (or right after you do), bulk-sending the tags and seeing what has been updated, then pulling just that. The downside is that you may have a flash of old content if you show it optimistically.

If you combine the two methods, you can easily get an up-to-date syndication of a remote mutable data store.

My whole framework is built around such abstractions, to take care of them for people.
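The etag batch-revalidation approach described above can be sketched roughly like this (the `Origin` interface and its `changed`/`fetch` methods are hypothetical stand-ins, not the framework's API): keep (value, etag) pairs locally, ask the origin in one bulk call which etags are stale, and re-pull only those entries.

```python
class Origin:
    """Stand-in for the authoritative store: key -> (value, etag)."""

    def __init__(self, data):
        self.data = data

    def changed(self, etags):
        # Bulk check: return the keys whose etag differs from the caller's.
        return [k for k, tag in etags.items() if self.data[k][1] != tag]

    def fetch(self, key):
        return self.data[key]

class EtagCache:
    def __init__(self, origin):
        self.origin = origin
        self.local = {}         # key -> (value, etag), possibly stale

    def get_many(self, keys):
        missing = [k for k in keys if k not in self.local]
        known = {k: self.local[k][1] for k in keys if k in self.local}
        stale = self.origin.changed(known) if known else []
        for k in missing + stale:
            self.local[k] = self.origin.fetch(k)   # re-pull only what changed
        return {k: self.local[k][0] for k in keys}
```

Note the window the parent comment concedes: between `changed` and `fetch`, or if the bulk check is skipped, a reader can still see the stale value.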

replies(2): >>44461090 #>>44461131 #
49. necovek ◴[] No.44461090{6}[source]
We can agree to disagree on this being a "caching system" and it "knowing when something needs updating" (from the API, I am guessing it's a "set" method).

Phrases like "simply" and "never stale" are doing a lot of heavy lifting. Yet you haven't answered a very simple question I posted above: how would Q_Cache structure handle a direct SQL write query from another process on the database it is being used as a cache for?

SQL databases don't run websockets to external servers, nor do they fire webhooks. If you are running a server which you are hitting with every update instead of doing direct SQL on the DB, there's always a small amount of time where a cache user can see stale data before this server manages to trigger the update of the cache.

replies(2): >>44461123 #>>44462335 #
50. dcminter ◴[] No.44461109[source]
I rather like the snark of:

there's two hard problems in computer science: we only have one joke and it's not funny.

Apparently⁰ by Philip Scott Bowden¹

⁰ https://martinfowler.com/bliki/TwoHardThings.html

¹ https://x.com/pbowden/status/468855097879830528

replies(1): >>44467814 #
51. EGreg ◴[] No.44461123{7}[source]
Simple. Use the principles I said above, for distributed and asynchronous systems.

The SQL server should have a way to do pubsub. It can then notify your PHP webhook via HTTP, or run a script on the command line, when a row changes. It should have an interface to subscribe to these changes.

If your SQL data store lacks the ability to notify others that a row changed, then there’s your main problem. Add that functionality. Make a TRIGGER in SQL for instance and use an extension.

If MySQL really lacks this ability then just put node.js middleware in front of it that will do this for you. And make sure that all mysql transactions from all clients go through that middleware. Now your combined MySQL+Node server is adequate to help clients avoid stale data.

As I already said, if you for some reason refuse to handle push updates, then you have to store the latest row state (etags) in your PHP cache (apcu). And then when you access a bunch of cached items on a request, at the end collect all their ids and etags and simply bulk query node.js for any cached items that changed. You can use bloom filters or merkle trees or prolly trees to optimize the query.

Joel Gustafson had a great article on this and now uses it in gossiplog: https://joelgustafson.com/posts/2023-05-04/merklizing-the-ke...

replies(2): >>44461603 #>>44461641 #
52. ◴[] No.44461131{6}[source]
53. motorest ◴[] No.44461198[source]
> I never understood this meme.

If you don't understand how and why and when eventual consistency is a problem, you will never understand why cache invalidation is hard.

By the sound of your example, you only handle scenarios where naive approaches to cache invalidation serve your needs, and you don't even have to deal with problems caused by spikes to origin servers. That's perfectly fine.

Others do. They understand the meme. You can too if you invest a few minutes reading up on the topic.

54. yashasolutions ◴[] No.44461431[source]
Don't bring a PowerPoint to a Vi/Emacs fight...
55. logical42 ◴[] No.44461595{3}[source]
I agree with you, but you are omitting a key aspect of the problem which makes this hard. If you have a single server serving from a local sqlite database, caching and invalidating the cache is trivially easy.

It becomes way more difficult when you have N servers all of which could potentially serve the same data. Local caches could then, yes easily become stale, and even if you have a propagation mechanism, you couldn't guarantee against network failures or latency issues.

But suppose you have an HA redis instance that you store your cached data in. Even with a write-through cache, you basically need to implement a 2-phase commit to ensure consistency.

replies(1): >>44462375 #
56. logical42 ◴[] No.44461603{8}[source]
Adding change data capture to a database isn't exactly trivial.
replies(1): >>44462249 #
57. tsimionescu ◴[] No.44461641{8}[source]
You're just pushing much of the complexity to that notification layer.

What if an SQL update changes a million rows, and you had cached the sum of those million rows? Should it send a million notifications? Should it re-run the sum for you? What if it's a complex operation and it takes a while to re-compute, and another update arrives before the re-compute finishes?

And of course, you will always have some partitions, so you will occasionally need to re-query the data anyway and re-establish any kind of pubsub system. And the complexity in efficiently reconciling two views of a large amount of data is a major issue, that you are just brushing off as "you can do various things to maybe optimize the query".

replies(1): >>44462223 #
58. TeMPOraL ◴[] No.44461814{4}[source]
That's a great example. Personally, even beyond gamedev, units are an exception for me - they need to be somewhere, whether in type or in identifier name (or ideally both), or else bad things happen.

But that's what I meant by the epistemological aspect - what is "recently"? Still, "LastTenMs" is arguably even worse - why ten? Did you really mean "since last frame"? Then say "SinceLastFrame". Did you mean "since 2x update time delta"? Then maybe make a PlayerJumpCooldown constant, but then maybe it's not about cooldowns, so... here's when I'd probably say screw this, let's just track MsSincePlayerLastJumped and add a HasPlayerJumpedRecently() -> bool as a helper, with a fat interface comment explaining what "recently" means - and go back to talk with the game designers to give a more specific name for the "recently" thing here.

Point being, this is deceptively hard - doubly so because going all perfectionist about it is also bad.

replies(1): >>44462462 #
59. EGreg ◴[] No.44462223{9}[source]
Yes, if you really want to show something as ambitious as a sum of a million rows, then you should absolutely move to an event-based, push-based architecture. Because once you’ve calculated the sum, the least work you would need to do is to periodically look at batches of UPDATES to rows, and update your sum based on that (rather than, say, running SUM again on the whole entire table). You can either pull a digest periodically, or — you can handle incremental updates by listening for pushes.

All the difficulties you cite, including the “optimizations” I mentioned, stem from not using push, and insisting on periodically polling something.

If you step back, your whole problem is that you are using a system that only has a pull and not a push architecture. You're trying to put a bandaid on that core fact. You chose a relational database system that you refuse to build extensions for (as opposed to pretty much any other system) in order to insist "nyeh nyeh you haven't solved cache invalidation".

Caching is just one part of sync. The part that stores a non-authoritative, slightly out of date copy locally. And sync, or replication, is just one part of an eventually consistent system. I mean, even MySQL supports replication, so just hook your client up to that protocol (encrypted in transit) and boom, now you can update your cache.

Here’s the thing. If you use any network protocol that is open, eg HTTP, then yea it’s solved. Because you can have webhooks and just spin up a server to handle them, and update your cache. A row is removed? Decrement your sum. A row is added? Increment it.

You are just insisting that “no, our system has to be clunky and we refuse to implement webhooks / websockets / sockets / push / mysql replication client / updates of any kind, now solve it for me to be as good as a system which has push capabilities.”
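The push-based aggregate described above (a sum maintained from row-level deltas rather than recomputed with a full scan) can be sketched as follows; this is an illustrative toy, not the commenter's framework:

```python
class CachedSum:
    """Cached aggregate kept current by applying row-change deltas."""

    def __init__(self, rows):
        self.total = sum(rows.values())   # one full computation up front

    def on_insert(self, value):
        self.total += value

    def on_delete(self, value):
        self.total -= value

    def on_update(self, old, new):
        self.total += new - old
```

The catch, as the parent comment notes, is that this only stays correct if every change is delivered exactly once and in order; a lost or duplicated delta silently corrupts the aggregate until it is recomputed from scratch.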

60. EGreg ◴[] No.44462249{9}[source]
Well then maybe consider making an SQL replication client that would ingest the changes (like most databases, MySQL writes to a streaming append-only log before it is compacted). Just parse the log and act on the changes.

Not trivial, really? Here, enjoy: https://github.com/krowinski/php-mysql-replication

replies(2): >>44476165 #>>44476178 #
61. EGreg ◴[] No.44462335{7}[source]
I will answer your question directly, after having explained the core issue.

MySQL and PostgreSQL do in fact have push. It is called replication. The protocol is documented. So, much like a webhook with the HTTP protocol, you can ingest changes on a table with a long-running PHP client and then simply update your apcu cache entries if they are still being stored and match that row. And all your PHP processes (managed by php-fpm or FrankenPHP or whatever) will benefit from that atomic apcu update.

There is nothing that says only another mysql server can be a mysql replication client.

Boom, solved. Even in your case. Note that the SQL database did in fact expect you to open a socket, and then it writes its log to that socket. That’s all the streaming events you need.

But now you're gonna say "but but no, you can't have a long-running PHP client, we're talking about a shared hosting environment where the host disables your ability to run a long PHP service that opens a socket". Well, look. Most of the difficulty is that you're intentionally trying to use the most clunky architectures just to "prove" that "cache invalidation is really hard."

Shall we continue with Naming next? :)

replies(2): >>44463708 #>>44466341 #
62. EGreg ◴[] No.44462375{4}[source]
See the thread below. Caches only become stale if you refuse to implement any sort of push architecture. And even then, frequent batch-polling with etags will quickly resolve it. (eg right after you used N entries, poll to see which of the N have changed).

Bloom filters, Prolly trees, etc. can speed the batch polling up. But a push architecture obviates the need for it and prevents caches from becoming stale; that's eventual consistency.

(Well, of course unless there is a network partition and you can’t communicate with the machine holding the actual source of truth then yeah, your local cache will be stale. And to catch up on missed updates you will need to request all updates since a certain marker you saved. If the upstream had a ton of updates since your marker and can’t send the entire history since your marker then you might have to purge your cache for that specific table / collection, yeah.).

replies(1): >>44476174 #
63. Xss3 ◴[] No.44462462{5}[source]
I admit my example is slightly contrived, and LastTen still raises questions re frames and time handling, especially as it's so small, oops! Let's pretend I said LastHundred and it's using a framerate-independent method.

If i were to helperise it with a bool I'd use the intent for the bool, CanPlayerJumpAgain or something (again...contrived example).

64. AndrewOMartin ◴[] No.44462642[source]
Which leads to

> I don't see what's so hard about DNS, it's just cache invalidation and naming things.

replies(2): >>44462857 #>>44462981 #
65. manyaoman ◴[] No.44462857{3}[source]
... and avoiding off-by-one errors.
66. dcminter ◴[] No.44462981{3}[source]
Oh that's gooood. Got a cite or is it yours?
67. graealex ◴[] No.44463192[source]
That's because relying on a TTL simplifies the concept of caching and makes invalidation trivial, but also inflexible.

It's used in DNS, which already was an example here. There is no way to be sure clients see an updated value before the end of the TTL. As a result, you have to use very conservative TTLs. It's very inefficient.
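A toy illustration of that staleness window, in Python. The injected `now` clock and `loader` are hypothetical stand-ins for a real resolver, just to show that a TTL cache has no invalidation hook at all:

```python
import time

class TTLCache:
    """Minimal TTL cache: an entry expires `ttl` seconds after it was
    loaded. There is no way to invalidate earlier, so an upstream write
    stays invisible until the entry ages out (the DNS problem above)."""

    def __init__(self, loader, ttl):
        self.loader = loader
        self.ttl = ttl
        self.entries = {}  # key -> (expiry_time, value)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        hit = self.entries.get(key)
        if hit is not None and now < hit[0]:
            return hit[1]                        # may be stale!
        value = self.loader(key)                 # expired or missing: reload
        self.entries[key] = (now + self.ttl, value)
        return value
```

With a 30-second TTL, a lookup 10 seconds after an upstream change still returns the old value; only after the TTL elapses does the new one appear.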

replies(1): >>44467115 #
68. IshKebab ◴[] No.44463400{4}[source]
> an incorrectly declared dependency

That literally IS cache invalidation. Incremental build systems are caching previous artefacts. They're supposed to invalidate the artefact if a dependency changes. But you forgot to declare the dependency... so it doesn't get invalidated when it should.
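A minimal sketch of the point, assuming a toy `build` function (all names hypothetical) that keys its cache on the contents of only the *declared* dependencies:

```python
import hashlib

def build(target, declared_deps, files, cache, compile_fn):
    """Rebuild `target` only when a *declared* dependency changed.
    A dependency you forgot to declare never invalidates the cache,
    so the stale artefact is served forever."""
    key = hashlib.sha256(
        "\0".join(files[d] for d in sorted(declared_deps)).encode()
    ).hexdigest()
    cached_key, artefact = cache.get(target, (None, None))
    if cached_key != key:
        artefact = compile_fn(files)     # the real build reads ALL files
        cache[target] = (key, artefact)
    return artefact
```

If `config.h` is read by the compile step but missing from `declared_deps`, editing it changes nothing in the cache key, and the stale artefact keeps being returned until a declared file changes too.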

69. ◴[] No.44463708{8}[source]
70. bobthepanda ◴[] No.44464019{4}[source]
I think that is from a time when popular programming languages varied in their behavior and also when people were writing for loops with incrementation all the time.

A lot of languages have just settled on zero indexing, and many now have some variation of for/each or for/of that would eliminate a lot of potential ways to encounter this error.

replies(1): >>44464952 #
71. SAI_Peregrinus ◴[] No.44464042[source]
My favorite variation only really works in text:

There are three hard problems in Computer Science:

1) Cache invalidation

2) Naming th3) Concurings

rency

4) Off-by-one errors

72. Cthulhu_ ◴[] No.44464137[source]
If you omit the off-by-1 error from the two hard things joke, you're still off by 1, right? Kind of?
73. Cthulhu_ ◴[] No.44464161[source]
> Where‘s the struggle?

> anything that gets cached can only be written by one service each

How do you guarantee it's only written by one service each? Sounds like locking across network boundaries, which is not easy.

> The writing services emit cache invalidation messages via SNS that cache users must listen to via SQS

SNS and SQS are both nontrivial services (at least you don't have to build / maintain them I suppose) that require training to use effectively and avoid any possible footguns

I think you're underestimating the complexity in your own solution, and you're probably lucky that some of the harder problems have already been solved for you.

74. tombert ◴[] No.44464952{5}[source]
Yeah, I mostly will do map/reduce/filter when possible, and obviously in those cases indexes don’t matter. It could start at index 12345 for all I care with that stuff.

Occasionally, though, I need to use the same index across multiple items, there's no trivial way to zip them, and at that point I have to use an old-school for loop. That's when the 1-index vs 0-index bites me.

75. tengbretson ◴[] No.44465957[source]
I've never really understood it either. In my experience, in order for a cache to be a possible solution to a given problem at all, you must either:

1. Be content with/resilient to the possibility of stale data.

2. Gatekeep all reads and writes (for some subset of the key space) through a single thread.

That's basically it.

76. necovek ◴[] No.44466341{8}[source]
I wasn't going to say any of that ;) Yes, Postgres does have a WAL you can subscribe to, but then you are really building a read replica. There are many nuances to doing that right and you are downplaying how hard it is (it's not about the API, it's about ACID).

More importantly, you are pushing all the complex logic to this imaginary DB replication client: your "caching system" is still nothing more than a KV store.

replies(1): >>44467481 #
77. atq2119 ◴[] No.44466730{5}[source]
Sure, but we were talking about the perspective of software developers. The hardware designers take on complexity so that the software developer's work can be simpler.
78. Pet_Ant ◴[] No.44467003[source]
Even caching is not simple as the resources get consumed and you need an eviction policy. Like in a Maven cache, keep no more than one old version around of library to allow for upgrade windows.
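For example, a minimal LRU eviction policy in Python (illustrative only; not Maven's actual keep-one-old-version policy):

```python
from collections import OrderedDict

class LRUCache:
    """Keep at most `capacity` entries; when full, evict the least
    recently used one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)         # mark as recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the LRU entry
```

Even this toy shows the extra decisions caching drags in: capacity, recency bookkeeping, and which entry to sacrifice.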
79. ahoka ◴[] No.44467115{3}[source]
You can’t be sure even after the TTL to be fair.
80. EGreg ◴[] No.44467481{9}[source]
Well, what are the “real” caching systems, if my system can be used for caching but isn’t “real”? This feels like a no-true-Scotsman argument.

replies(1): >>44467718 #
81. necovek ◴[] No.44467718{10}[source]
Caching system would be what you described as a response to my example case: your key-value store combined together with the entire mechanism to watch for DB writes (eg with triggers or the replication log) and tie them back to cached values which need to have their lifetime managed.
82. aorth ◴[] No.44467814{3}[source]
Just remembered another one: there are 10 types of people in the world: those who understand binary and those who don't. :)
83. logical42 ◴[] No.44476165{10}[source]
That means they did implement change data capture, and only exposed it for a very specific use case.
84. logical42 ◴[] No.44476174{5}[source]
I would love it if I could over-withdraw money from my bank account by having someone on the other side of the country take money out of my account at an ATM at the exact same time as me, because the bank's storage system was eventually consistent, but that's not the case.
85. logical42 ◴[] No.44476178{10}[source]
You didn't implement change data capture. You're using theirs.
replies(1): >>44477231 #
86. EGreg ◴[] No.44477231{11}[source]
I’m also not making my own programming language, I’m using PHP. And I’m using MySQL. What is your point?

Are you saying that “invalidating caches is hard… if you insist on using only systems with no way to push updates, and oh if they have a way to push updates then you’re using someone’s library to consume updates so it doesn’t count?”