GRASP: A Rehearsal Policy for Efficient Online Continual Learning

Rochester Institute of Technology, University of Rochester


Motivation

Continual learning (CL) in deep neural networks (DNNs) involves incrementally accumulating knowledge in a DNN from a growing data stream. A major challenge in CL is that non-stationary data streams cause catastrophic forgetting of previously learned abilities. A popular solution is rehearsal: storing past observations in a buffer and then sampling the buffer to update the DNN. Uniform sampling in a class-balanced manner is highly effective, and better sample selection policies have been elusive. Here, we propose a new sample selection or rehearsal policy called GRASP (GRAdually Select less Prototypical) that selects the most prototypical (easy) samples first and then gradually selects less prototypical (harder) samples. GRASP has little additional compute or memory overhead compared to uniform selection, enabling it to scale to large datasets. Compared to 17 other rehearsal policies, GRASP achieves higher accuracy in CL experiments on ImageNet. Compared to uniform balanced sampling, GRASP achieves the same performance with 40% fewer updates. We also show that GRASP is effective for CL on five text classification datasets. GRASP has potential to supplant expensive periodic retraining and make on-device CL more efficient.

What Is A Rehearsal Policy?


A rehearsal policy governs construction of mini-batches for updating a deep neural network (DNN) during rehearsal. Goal: Efficiently train a DNN with fewer network updates to reach maximum performance.
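
To make the baseline concrete, below is a minimal sketch of the uniform class-balanced policy that GRASP is compared against throughout. The dict-of-lists buffer layout and the function name are illustrative assumptions, not the exact implementation used in the paper.

import random

def uniform_balanced_batch(buffer, batch_size):
    # Uniform class-balanced policy: pick a class uniformly at random,
    # then pick a stored sample from that class, until the batch is full.
    # `buffer` is assumed to map class label -> list of stored samples
    # (raw images or cached latent features).
    classes = list(buffer.keys())
    batch = []
    while len(batch) < batch_size:
        c = random.choice(classes)
        batch.append(random.choice(buffer[c]))
    return batch

GRASP replaces this uniform draw with an easy-to-hard ordering over each class's stored samples (see "How Does GRASP Work?" below).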

Compute Comparison

GRASP matches the best accuracy of the uniform balanced policy while requiring 40% fewer gradient descent updates for class incremental learning with SIESTA on ImageNet-1K.


Time Comparison

GRASP matches the best accuracy of the uniform balanced policy while requiring 36% less training time for class incremental learning with SIESTA on ImageNet-1K.


How Efficient Is GRASP?


Existing rehearsal policies are computationally expensive and difficult to scale, whereas GRASP is far more efficient and scalable. Unlike GRASP, existing SOTA policies require substantially longer training time than uniform sampling.

GRASP Outperforms Existing SOTA


GRASP outperforms 17 other rehearsal policies in class incremental learning on ImageNet-300 using SIESTA with latent rehearsal. Here, we compare GRASP against the other high-performing policies.

Class Incremental Learning

GRASP outperforms uniform balanced rehearsal in class incremental learning on ImageNet-1K using SIESTA for both veridical and latent rehearsal settings.


Continual IID Learning

GRASP outperforms uniform balanced in continual IID (independent and identically distributed) experiments on ImageNet-1K using SIESTA and latent rehearsal.


Storage Constraints Analysis

GRASP outperforms other rehearsal policies under varied storage constraints in class incremental learning on ImageNet-300 using SIESTA and latent rehearsal.


Compute Constraints Analysis

GRASP surpasses compared methods under varied compute constraints in class incremental learning on ImageNet-300 using SIESTA and latent rehearsal.


GRASP Works Well With Various CL Methods

GRASP outperforms the uniform balanced rehearsal policy when integrated with various rehearsal-based continual learning methods.


GRASP Shows Efficacy For Various Settings

GRASP surpasses the uniform balanced policy for both latent and veridical rehearsal, with or without buffer constraints, when integrated with the SIESTA algorithm.


How Does GRASP Work?


GRASP is based on the hypothesis that choosing only easy or only hard samples is suboptimal and that the DNN benefits from a curriculum that combines both. GRASP first selects the most prototypical (easy) samples from the rehearsal buffer and then gradually selects less prototypical (harder) samples, where easy samples are closest to the class mean and hard samples are farthest from it. We illustrate how GRASP works compared to the uniform random policy. The class mean is denoted by ⭐. Selected samples are indicated by 🔴.
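
The selection rule can be sketched in a few lines. The snippet below assumes that each buffered sample has an embedding (e.g., from the DNN's penultimate layer) and uses Euclidean distance to the class mean, following the description above; the function name and the exact per-class interleaving schedule are illustrative assumptions rather than the paper's precise implementation.

import numpy as np

def grasp_order(features, labels):
    # features: (N, D) embeddings of buffered samples; labels: (N,) class ids.
    # Within each class, sort samples by distance to the class mean (prototype),
    # so the most prototypical (easy) samples come first.
    order_per_class = {}
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        prototype = features[idx].mean(axis=0)
        dists = np.linalg.norm(features[idx] - prototype, axis=1)
        order_per_class[c] = idx[np.argsort(dists)]  # easy -> hard

    # Interleave classes so each pass stays class-balanced while the overall
    # curriculum moves gradually from prototypical to less prototypical samples.
    order = []
    longest = max(len(v) for v in order_per_class.values())
    for i in range(longest):
        for idx_sorted in order_per_class.values():
            if i < len(idx_sorted):
                order.append(idx_sorted[i])
    return np.array(order)

Rehearsal mini-batches are then drawn by walking this ordering, so early updates use easy samples and later updates gradually mix in harder ones.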

Why Is GRASP More Effective?


While learning new classes, the representations of old classes change abruptly and drift over time. This abrupt change in old representations causes catastrophic forgetting of old knowledge and is difficult to correct without longer training. As shown in the figure, existing methods exhibit higher representation drift. These methods mostly prioritize difficult samples; for instance, MIR and ASER select samples with maximum interference. Consequently, the old representations are excessively perturbed, especially in the early stage of rehearsal, as indicated by the sharp rise in MSE (mean squared error). In contrast, GRASP reduces representation drift by learning from subsets of increasing difficulty. We measure representation drift as the MSE between penultimate-layer embedding vectors across consecutive training iterations.
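
The drift metric itself is simple. The sketch below assumes a fixed probe set of old-class samples whose penultimate-layer embeddings are recomputed at consecutive training iterations; the function name and tensor shapes are assumptions for illustration.

import torch

def representation_drift(prev_embeddings, curr_embeddings):
    # prev_embeddings, curr_embeddings: (num_probe_samples, dim) tensors of
    # penultimate-layer features for the same probe samples at iterations
    # t-1 and t. A larger value means old representations moved more.
    return torch.mean((curr_embeddings - prev_embeddings) ** 2).item()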

Acknowledgements

This work was supported in part by NSF awards #1909696, #2326491, and #2125362.

News

GRASP has been accepted at the Conference on Lifelong Learning Agents (CoLLAs), 2024 🎉

BibTeX

@article{harun2023grasp,
  title     = {GRASP: A Rehearsal Policy for Efficient Online Continual Learning},
  author    = {Harun, Md Yousuf and Gallardo, Jhair and Chen, Junyu and Kanan, Christopher},
  journal   = {arXiv preprint arXiv:2308.13646},
  year      = {2023}
}