ABOUT ME

Hello! I am a 5th-year PhD student in the Chester F. Carlson Center for Imaging Science at the Rochester Institute of Technology (RIT) in Rochester, NY. I currently work in the kLab under the supervision of Dr. Christopher Kanan. My research focuses on deep learning, with an emphasis on continual/lifelong machine learning. My work advancing the state of the art in continual deep learning has been published at NeurIPS, TMLR, CVPRW, and CoLLAs.

I received an MS in Electrical Engineering from the University of Hawaii and a BS in Electrical Engineering from Khulna University of Engineering and Technology. During my MS, I worked on deep learning applied to medical imaging. My prior work has been published at IEEE NanoMed and ASRM conferences and in the Reproductive BioMedicine Journal.

🔮 You can find my CV here.

LATEST NEWS

Sep 2024: Our paper "What Variables Affect Out-Of-Distribution Generalization in Pretrained Models?" got accepted at NeurIPS 2024! (25.8% acceptance rate) 🔥
Sep 2024: Our paper "Overcoming the Stability Gap in Continual Learning" got accepted at Transactions on Machine Learning Research (TMLR) 2024! 🎉
April 2024: Our paper "GRASP: A Rehearsal Policy for Efficient Online Continual Learning" got accepted at the Conference on Lifelong Learning Agents (CoLLAs) 2024! 🎉
Nov 2023: Our paper "SIESTA: Efficient Online Continual Learning with Sleep" got accepted at Transactions on Machine Learning Research (TMLR) 2023! 🎉
Nov 2023: Won the "Best Student Abstract Award" at the IEEE Western New York Image & Signal Processing Workshop 2023. 🎉
Oct 2023: Gave an invited talk on "Towards Efficient Continual Learning in Deep Neural Networks" at RIT Center for Human-aware Artificial Intelligence (CHAI) Seminar Series.
April 2023: Our paper "How Efficient Are Today's Continual Learning Algorithms?" got accepted at the CLVision Workshop at CVPR 2023! 🎉

RESEARCH

Controlling Neural Collapse Enhances Out-of-Distribution Detection and Transfer Learning

Md Yousuf Harun, Jhair Gallardo, Christopher Kanan

Out-of-distribution (OOD) detection and OOD generalization are widely studied in deep learning, yet their relationship remains poorly understood. We empirically show that the degree of Neural Collapse (NC) in a network layer affects these two objectives in opposite ways: stronger NC improves OOD detection but hurts OOD generalization, while weaker NC does the opposite. This trade-off suggests that a single feature space cannot excel at both tasks simultaneously. To address this, we develop a theoretical framework linking NC to these objectives and propose a method that controls NC across layers, using entropy regularization for OOD generalization and a fixed Simplex ETF projector for OOD detection.
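As a rough illustration of one ingredient above, the sketch below constructs a fixed Simplex ETF matrix; it assumes NumPy, and the function name, dimensions, and the quick check are illustrative rather than details taken from the paper.

    import numpy as np

    def simplex_etf(num_classes: int, feat_dim: int, seed: int = 0) -> np.ndarray:
        # Build a (feat_dim x num_classes) Simplex ETF: columns have unit norm
        # and pairwise cosine similarity -1/(num_classes - 1).
        assert feat_dim >= num_classes, "need feat_dim >= num_classes"
        rng = np.random.default_rng(seed)
        # Random orthonormal basis U in R^{feat_dim x num_classes}.
        U, _ = np.linalg.qr(rng.standard_normal((feat_dim, num_classes)))
        center = np.eye(num_classes) - np.ones((num_classes, num_classes)) / num_classes
        return np.sqrt(num_classes / (num_classes - 1)) * U @ center

    # Quick check of the ETF geometry.
    M = simplex_etf(num_classes=10, feat_dim=512)
    G = M.T @ M
    print(np.round(np.diag(G)[:3], 3))   # ~1.0 (unit-norm columns)
    print(np.round(G[0, 1], 3))          # ~ -0.111 = -1/(10 - 1)

Under strong neural collapse, last-layer class means converge to exactly this kind of configuration, which is why a fixed ETF is a natural reference geometry for a detection head.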

A Good Start Matters: Enhancing Continual Learning with Data-Driven Weight Initialization

Md Yousuf Harun, Christopher Kanan

Continual learning systems must efficiently learn new concepts while preserving prior knowledge. However, randomly initializing classifier weights for new categories causes instability and high initial loss, requiring prolonged training. Inspired by neural collapse, we propose a data-driven weight initialization strategy that uses a least-squares analytical solution to align classifier weights with learned features. This reduces loss spikes and accelerates adaptation.
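To make the idea concrete, here is a minimal sketch of an analytical least-squares initialization, assuming you already have features and labels for the new categories; the ridge term and variable names are my own additions, not details from the paper.

    import numpy as np

    def least_squares_init(features, labels, num_classes, ridge=1e-3):
        # Initialize classifier weights W (feat_dim x num_classes) so that
        # features @ W approximates one-hot targets, instead of starting
        # from a random initialization for the new categories.
        n, d = features.shape
        Y = np.eye(num_classes)[labels]              # one-hot targets (n x C)
        # Ridge-regularized normal equations: W = (X^T X + lambda*I)^{-1} X^T Y
        A = features.T @ features + ridge * np.eye(d)
        return np.linalg.solve(A, features.T @ Y)

    # Toy usage with random stand-in "learned" features for 5 new classes.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 64))
    y = rng.integers(0, 5, size=200)
    W = least_squares_init(X, y, num_classes=5)
    print(W.shape)                                   # (64, 5)

Starting from weights aligned with the feature statistics of the new classes is what avoids the large initial loss spike described above.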

Improving Multimodal Large Language Models Using Continual Learning

Shikhar Srivastava, Md Yousuf Harun, Robik Shrestha, Christopher Kanan

Generative LLMs gain multimodal capabilities by integrating pre-trained vision models, but this often leads to linguistic forgetting. We analyze this issue in the LLaVA MLLM through a continual learning lens and evaluate five methods to mitigate forgetting. Our approach reduces linguistic forgetting by up to 15% while preserving multimodal accuracy. We also demonstrate its robustness on sequential vision-language tasks, maintaining linguistic skills while acquiring new multimodal abilities.

NeurIPS 2024: What Variables Affect Out-of-Distribution Generalization in Pretrained Models?

Md Yousuf Harun*, Kyungbok Lee*, Jhair Gallardo, Giri Krishnan, Christopher Kanan
[* denotes equal contribution]

Embeddings produced by pre-trained deep neural networks (DNNs) are widely used; however, their efficacy for downstream tasks can vary substantially. We study the factors influencing out-of-distribution (OOD) generalization of pre-trained DNN embeddings through the lens of the tunnel effect hypothesis, which posits that deeper DNN layers compress representations and hinder OOD performance. Contrary to earlier work, we find that the tunnel effect is not universal. Our results emphasize the danger of generalizing findings from toy datasets to broader contexts.

CoLLAs 2024: GRASP: A Rehearsal Policy for Efficient Online Continual Learning

Md Yousuf Harun, Jhair Gallardo, Junyu Chen, Christopher Kanan

In this work, we propose a new sample selection (rehearsal) policy called GRASP (GRAdually Select less Prototypical) for efficient continual learning (CL). GRASP is a dynamic rehearsal policy that progressively selects harder samples over time to efficiently update deep neural networks on large-scale data streams in CL settings. GRASP is the first method to outperform uniform balanced sampling on both large-scale vision and NLP datasets. GRASP has the potential to supplant expensive periodic retraining and make on-device CL more efficient.
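A rough sketch of the "gradually select less prototypical" idea is below, assuming stored embeddings with class labels; the scoring and scheduling here are simplified stand-ins, not the exact GRASP procedure.

    import numpy as np

    def grasp_like_order(embeddings, labels):
        # Rank each stored sample by its distance to the class mean (its
        # "prototypicality"), then return a global ordering that goes from
        # most prototypical (easy) to least prototypical (hard).
        scores = np.empty(len(labels), dtype=float)
        for c in np.unique(labels):
            idx = np.where(labels == c)[0]
            d = np.linalg.norm(embeddings[idx] - embeddings[idx].mean(axis=0), axis=1)
            scores[idx] = np.argsort(np.argsort(d))  # within-class rank keeps classes balanced
        return np.argsort(scores)

    # Toy usage: rehearse in this order, easy samples early, harder ones later.
    rng = np.random.default_rng(0)
    emb = rng.standard_normal((1000, 32))
    lab = rng.integers(0, 10, size=1000)
    order = grasp_like_order(emb, lab)
    print(order[:10])

Ranking within each class keeps rehearsal roughly class-balanced while the difficulty of the selected samples grows over time.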

TMLR 2024: Overcoming the Stability Gap in Continual Learning

Md Yousuf Harun, Christopher Kanan

Pre-trained deep neural networks (DNNs) are widely deployed by industry to make business decisions and serve users; however, a major problem is model decay. To mitigate model decay, DNNs are retrained from scratch, which is computationally expensive. In this work, we study how continual learning can overcome model decay and reduce computational costs. We identify the stability gap as a major obstacle in our setting, study how to mitigate it, and test a variety of hypotheses. This leads us to discover a method that vastly reduces the stability gap and greatly increases computational efficiency.

TMLR 2023: SIESTA: Efficient Online Continual Learning with Sleep

Md Yousuf Harun*, Jhair Gallardo*, Tyler L. Hayes, Ronald Kemker, Christopher Kanan
[* denotes equal contribution; also presented at the Journal Track of CoLLAs 2024]

For continual learning (CL) to make a real-world impact, CL systems need to be computationally efficient and rival traditional offline learning systems retrained from scratch. Toward that goal, we propose a novel online CL algorithm named SIESTA. SIESTA uses a wake/sleep framework for training, which is well aligned with the needs of on-device learning. SIESTA is far more computationally efficient than existing methods, enabling CL on ImageNet-1K in under 2 hours; moreover, it achieves "zero forgetting" by matching the performance of the joint model, a milestone critical to driving adoption of CL in real-world applications.
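The wake/sleep pattern can be sketched at a very high level, as below; the buffer, what gets updated in each phase, and the schedule are placeholders, since the actual SIESTA components (e.g., compressed feature storage) are described in the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    buffer_feats, buffer_labels = [], []   # stand-in for a compressed feature buffer

    def wake_step(x, y):
        # Wake phase: store a (compressed) representation and perform only a
        # cheap online update, e.g., of the output layer.
        buffer_feats.append(x)
        buffer_labels.append(y)

    def sleep_phase(num_rehearsal=32):
        # Sleep phase: offline rehearsal over the buffer to consolidate
        # knowledge, e.g., gradient updates on deeper layers.
        idx = rng.integers(0, len(buffer_feats), size=num_rehearsal)
        return [(buffer_feats[i], buffer_labels[i]) for i in idx]

    # Alternate wake and sleep as the data stream arrives.
    for t in range(100):
        wake_step(rng.standard_normal(8), int(rng.integers(0, 5)))
        if (t + 1) % 50 == 0:              # periodically "go to sleep"
            rehearsal_batch = sleep_phase()

The intent of the split is that per-sample work during the wake phase stays cheap, while heavier consolidation is deferred to periodic sleep phases.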

CVPRW 2023: How Efficient Are Today's Continual Learning Algorithms?

Md Yousuf Harun, Jhair Gallardo, Tyler L. Hayes, Christopher Kanan

Continual learning (CL) research has focused on catastrophic forgetting, but a major motivation for CL is to efficiently update deep neural networks (DNNs) with new data rather than retrain from scratch as the dataset grows over time. We study the computational efficiency of existing CL methods and find that many are as expensive as training offline models from scratch, which defeats the efficiency motivation of CL.

PUBLICATIONS

Pre-Prints

Peer-Reviewed Papers

Poster

Dissertation