Depends on what angle you are interested in. If you are interested in continual learning for something like mitigating model drift, so that a model can stay up-to-date and the goal is to attain speed-ups during training, see these works:
Compared to other continual learning methods on ImageNet-1K, SIESTA requires 7x-60x less compute and achieves the same performance as a model trained in an offline/batch manner. It also works for arbitrary distributions, unlike many continual learning methods that only work for specific distributions (and hence don't really match any real-world use case):
https://yousuf907.github.io/siestasite/
In this one, we focused on mitigating the drop in performance when a system encounters a new distribution, which resulted in roughly a 16x speed-up: https://yousuf907.github.io/sgmsite/
In this one, we show that the strategy for creating multi-modal LLMs like LLaVA is identical to a two-task continual learning setup, and we note that many LLMs, once they become multi-modal, forget a large amount of the original LLM's capabilities. We demonstrate that continual learning methods can mitigate that drop in accuracy, enabling the multi-modal task to be learned without impairing uni-modal performance: https://arxiv.org/abs/2410.19925 [We have a couple of approaches that are better now that will be out in the next few months]
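To make the two-task framing concrete, here is a minimal, purely illustrative sketch (not the paper's method): task 1 is the text-only LLM, task 2 is multi-modal fine-tuning with image features projected into the token space LLaVA-style, and a generic continual learning trick (rehearsing text-only batches during multi-modal training) is used to reduce forgetting. The model, data generators, and `rehearsal_prob` value are all hypothetical toy choices.

```python
# Toy sketch: multi-modal fine-tuning as task 2 of a two-task continual
# learning problem, with simple rehearsal of text-only (task 1) batches.
# Everything here (model size, data, rehearsal rate) is illustrative only.
import random
import torch
import torch.nn as nn

class TinyMultimodalLM(nn.Module):
    """Toy stand-in for an LLM with a bolted-on vision projector."""
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.vision_proj = nn.Linear(128, dim)  # maps image features into token space
        self.backbone = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab)

    def forward(self, tokens, image_feats=None):
        x = self.embed(tokens)
        if image_feats is not None:
            # Prepend projected image features as a pseudo-token (LLaVA-style).
            x = torch.cat([self.vision_proj(image_feats).unsqueeze(1), x], dim=1)
        h, _ = self.backbone(x)
        return self.head(h)

def text_batch(bs=8, seq=16, vocab=1000):
    tokens = torch.randint(0, vocab, (bs, seq))
    return tokens, None, tokens           # input, image features, target

def multimodal_batch(bs=8, seq=16, vocab=1000):
    tokens = torch.randint(0, vocab, (bs, seq))
    return tokens, torch.randn(bs, 128), tokens

model = TinyMultimodalLM()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
rehearsal_prob = 0.25  # fraction of steps replaying text-only (task 1) data

for step in range(100):
    if random.random() < rehearsal_prob:
        tokens, image_feats, target = text_batch()        # rehearse task 1
    else:
        tokens, image_feats, target = multimodal_batch()  # learn task 2
    logits = model(tokens, image_feats)
    logits = logits[:, -target.size(1):, :]  # drop the image pseudo-token position
    loss = loss_fn(logits.reshape(-1, logits.size(-1)), target.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

With `rehearsal_prob = 0`, this collapses to naive multi-modal fine-tuning, which is exactly the regime where uni-modal performance tends to degrade; the rehearsal mixing is the simplest continual learning mitigation one could swap in here.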
It really depends on what you are interested in. For production AI, the real need is computational efficiency and keeping strong models up-to-date. Not many labs besides mine are focusing on that.
Currently, I'm focused on continual learning for creating systems beyond LLMs that incrementally learn meta-cognition, and on using continual learning to explain how memory consolidation works in mammals and why we have REM phases during sleep. That's more of a cognitive science contribution, so the constraints on the algorithms differ since the goal differs.