SIESTA: Efficient Online Continual Learning with Sleep

Md Yousuf Harun1*, Jhair Gallardo1*, Tyler L. Hayes1†, Ronald Kemker2, Christopher Kanan3
1Rochester Institute of Technology, 2United States Space Force, 3University of Rochester

[* Equal contribution; † Now at NAVER LABS Europe]

Motivation

Most deep neural networks are trained once and then evaluated. In contrast, continual learning mimics how humans continually acquire new knowledge throughout their lifespan. Most continual learning research has focused on mitigating a phenomenon called catastrophic forgetting, in which neural networks forget past information. Despite remarkable progress toward alleviating catastrophic forgetting, existing algorithms remain compute-intensive and ill-suited for many resource-constrained real-world applications such as edge devices, mobile phones, robots, AR/VR, and virtual assistants. For continual learning to make a real-world impact, continual learning systems must be computationally efficient and rival traditional offline learning systems retrained from scratch as the dataset grows in size.

Towards that goal, we propose a novel online continual learning algorithm named SIESTA (Sleep Integration for Episodic STreAming). SIESTA uses a wake/sleep framework for training, which is well aligned with the needs of on-device learning. The major goal of SIESTA is to advance compute-efficient continual learning so that DNNs can be updated efficiently using far less time and energy. The principal innovations of SIESTA are: (1) rapid online updates using a rehearsal-free, backpropagation-free, and data-driven network update rule during its wake phase, and (2) expedited memory consolidation using a compute-restricted rehearsal policy during its sleep phase. SIESTA is far more computationally efficient than existing methods, enabling continual learning on ImageNet-1K in under 2 hours on a single GPU; moreover, in the augmentation-free setting it matches the performance of the offline learner, a milestone critical to driving adoption of continual learning in real-world applications.

SIESTA Outperforms Prior Art on the ImageNet-1K Dataset


SIESTA requires 7x-60x fewer network updates, 10x less memory, and 2x-20x fewer parameters than other methods. It needs only 1.9 hours to learn the full ImageNet-1K dataset, whereas other methods require many hours or even days on the same hardware!

How Efficient is SIESTA?

Our method, SIESTA, outperforms existing continual learning methods for class-incremental learning on ImageNet-1K while requiring fewer network updates and using fewer parameters, as denoted by circle size.


SIESTA Achieves "Zero Forgetting" - A Milestone

SIESTA matches the performance of the offline model while outperforming existing state-of-the-art methods such as ER, DER, and REMIND by large margins in continual learning on the ImageNet-1K dataset. In the augmentation-free setting, Cochran’s Q test reveals no significant difference between SIESTA’s final accuracy in the continual iid and class-incremental settings and that of the offline learner (p = 0.08). Therefore, SIESTA achieves “zero forgetting” by matching the performance of the offline model.
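For readers unfamiliar with the test, Cochran’s Q compares several learners on the same set of examples using only per-example correct/incorrect indicators. The sketch below shows how such a comparison could be run with statsmodels; it is not the authors’ evaluation code, and the three correctness arrays are hypothetical placeholders filled with random values.

import numpy as np
from statsmodels.stats.contingency_tables import cochrans_q

rng = np.random.default_rng(0)
N = 50_000  # e.g., the size of the ImageNet-1K validation set

# Placeholder 0/1 arrays: 1 if the learner classified test image i correctly.
correct_offline = rng.integers(0, 2, N)
correct_iid     = rng.integers(0, 2, N)
correct_cil     = rng.integers(0, 2, N)

# Rows = test images, columns = learners.
table = np.column_stack([correct_offline, correct_iid, correct_cil])
result = cochrans_q(table, return_object=True)
print(f"Q = {result.statistic:.3f}, p = {result.pvalue:.3f}")
# A large p-value (e.g., > 0.05) means we cannot reject the hypothesis that
# all learners have the same accuracy on these examples.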


SIESTA is Capable of Working with Arbitrary Orderings

In general, iid (shuffled) orderings do not cause catastrophic forgetting; at the other extreme, an ordering sorted by category causes severe catastrophic forgetting in conventional algorithms. When switching from the iid to the class-incremental setting, existing methods such as ER and REMIND fail to maintain their performance and forget more severely. In contrast, SIESTA maintains performance similar to the offline learner and achieves "zero forgetting" in both settings, demonstrating its robustness to data ordering.


SIESTA is Performant on Four Benchmark Datasets

SIESTA outperforms the state-of-the-art online continual learning method REMIND on four benchmark datasets. SIESTA learns the large-scale ImageNet-1K dataset (1.2M training samples) 3.4x faster than REMIND on the same hardware. Moreover, SIESTA provides a 4.4x speedup over REMIND on another large-scale dataset, Places365-Standard (1.8M training samples), using the same hardware.


Efficiency in the Large-Scale Dataset Regime

As the dataset grows, the gap in GFLOPs between SIESTA and existing methods widens significantly; SIESTA becomes far more efficient than other methods in the large-scale regime.


Online Updates with Offline Consolidation


An illustration of the online updates with offline consolidation paradigm. While awake, the agent performs online learning; while asleep, it performs computationally constrained offline learning. The wake/sleep cycles alternate. The proposed paradigm thus combines two existing paradigms: class-incremental batch learning and online learning. SIESTA operates in this framework.
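The control flow of this paradigm can be summarized with a short schematic. The sketch below is purely illustrative: the names (online_update, cache, should_sleep, rehearse) are hypothetical placeholders for the operations described above, not the authors’ API.

def continual_learning_loop(stream, agent, sleep_budget):
    """Alternate a wake phase (online learning on a stream) with a sleep
    phase (compute-constrained offline consolidation)."""
    for x, y in stream:                               # wake: one labeled sample at a time
        agent.predict(x)                              # inference on the current sample
        agent.online_update(x, y)                     # cheap, backpropagation-free update
        agent.cache(x, y)                             # store a compressed copy for rehearsal
        if agent.should_sleep():                      # e.g., after every K new classes
            agent.rehearse(num_updates=sleep_budget)  # constrained offline learning
    return agent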

How Does SIESTA Algorithm Work?


A high-level overview of SIESTA. During the Wake Phase, it transforms raw inputs into intermediate feature representations using network H. The inputs are then compressed with tensor quantization and cached. Then, weights belonging to recently seen classes in network F are updated with a running class mean using the output vectors from G. Finally, inference is performed on the current sample. During the Sleep Phase, a sampler uses a rehearsal policy to choose which examples should be reconstructed from the cached data for each mini-batch. Then, networks G and F are updated with backpropagation in a supervised manner. The wake/sleep cycles alternate.
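To make the two update rules concrete, here is a minimal PyTorch-style sketch under simplifying assumptions: it operates directly on feature vectors h produced by the frozen network H, omits the tensor quantization and caching step, and uses toy dimensions. It illustrates the mechanism described above rather than reproducing the authors’ implementation, and cache_sampler is a hypothetical stand-in for the rehearsal policy.

import torch
import torch.nn as nn
import torch.nn.functional as F_nn

embed_dim, hidden_dim, num_classes = 512, 512, 1000

G = nn.Sequential(nn.Linear(embed_dim, hidden_dim), nn.ReLU())  # plastic middle layers "G"
F_out = nn.Linear(hidden_dim, num_classes, bias=False)          # output layer "F"
class_counts = torch.zeros(num_classes)                         # samples seen per class

@torch.no_grad()
def wake_update(h, label):
    # Backprop-free wake update: the row of F for this class tracks a running
    # mean of G's output vector; inference is then run on the current sample.
    z = G(h)
    class_counts[label] += 1
    F_out.weight[label] += (z - F_out.weight[label]) / class_counts[label]
    return torch.argmax(F_out(z))

def sleep_phase(cache_sampler, num_updates, batch_size=256, lr=0.1):
    # Compute-constrained rehearsal: a fixed budget of supervised mini-batch
    # updates to G and F on examples drawn from the cache by the rehearsal policy.
    opt = torch.optim.SGD(list(G.parameters()) + list(F_out.parameters()), lr=lr)
    for _ in range(num_updates):
        h_batch, y_batch = cache_sampler(batch_size)
        loss = F_nn.cross_entropy(F_out(G(h_batch)), y_batch)
        opt.zero_grad()
        loss.backward()
        opt.step()

In deployment, wake_update would be called once per streamed sample, while sleep_phase would be invoked periodically with a budget of num_updates chosen to balance accuracy against compute, as discussed in the sleep-length study below.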

Sleep Enhances Learning

We ask the question “What is the impact of sleep on SIESTA’s ability to learn and remember?”. Examining the pre-sleep and post-sleep performance of SIESTA on ImageNet-1K, we see that performance after sleep is consistently higher than before sleep for all increments. Therefore, sleep greatly benefits online continual learning in DNNs.


Impact of Sleep Length

We study the impact of sleep length by varying the number of updates (m) during each sleep period, with SIESTA sleeping every 100 classes. Performance improves as sleep length increases; however, longer sleep also requires more updates, so accuracy must be balanced against efficiency.


Criteria for an Efficient Continual Learner

We argue that an ideal continual learner should have the following characteristics:

1. It should be capable of online learning and inference in a compute and memory constrained environment.

2. It should rival (or exceed) an offline learner, regardless of the structure of the training data stream.

3. It should be significantly more computationally efficient than training from scratch.

4. It should make no additional assumptions that constrain the supervised learning task, e.g., using task labels during inference.

Our method, SIESTA, meets all these criteria and thus aligns with real-world applications.

BibTeX

@article{harun2023siesta,
  title     = {{SIESTA}: Efficient Online Continual Learning with Sleep},
  author    = {Md Yousuf Harun and Jhair Gallardo and Tyler L. Hayes and Ronald Kemker and Christopher Kanan},
  journal   = {Transactions on Machine Learning Research},
  issn      = {2835-8856},
  year      = {2023},
  url       = {https://openreview.net/forum?id=MqDVlBWRRV},
}