ABOUT ME

Hello! I am a PhD candidate at Rochester Institute of Technology (RIT), where I conduct research under the supervision of Dr. Christopher Kanan. My research focuses on deep learning, with an emphasis on continual/lifelong machine learning. My work has been published at NeurIPS, ICML, TMLR, CVPRW, and CoLLAs, advancing the state of the art in deep learning and continual learning. Broadly, my work is driven by the goal of building efficient, reliable AI systems for real-world applications, ranging from lightweight, on-device solutions to large-scale foundation models and LLMs.

🔮 You can find my CV here.
🔮 Here is my

NEWS

Aug 2025: Our ICML paper "Controlling Neural Collapse Enhances Out-of-Distribution Detection and Transfer Learning" was accepted as an oral presentation at the ICCV 2025 Workshop on Building Foundation Models You Can Trust (T2FM)! 🔥
Jun 2025: Our paper "A Good Start Matters: Enhancing Continual Learning with Data-Driven Weight Initialization" has been accepted to CoLLAs 2025! ๐ŸŽ‰
Jun 2025: Our paper "Improving Multimodal Large Language Models Using Continual Learning" has been accepted to CoLLAs 2025! ๐ŸŽ‰
May 2025: Our paper "Controlling Neural Collapse Enhances Out-of-Distribution Detection and Transfer Learning" was accepted at ICML 2025! 🔥
Apr 2025: Successfully defended my dissertation proposal and advanced to candidacy. 😊
Sep 2024: Our paper "What Variables Affect Out-Of-Distribution Generalization in Pretrained Models?" was accepted at NeurIPS 2024! 🔥

RESEARCH

ICML 2025: Controlling Neural Collapse Enhances Out-of-Distribution Detection and Transfer Learning

Md Yousuf Harun, Jhair Gallardo, Christopher Kanan
[Oral at T2FM Workshop @ ICCV 2025]

Out-of-distribution (OOD) detection and OOD generalization are widely studied in deep learning, yet their relationship remains poorly understood. We empirically show that the degree of Neural Collapse (NC) in a network layer affects these two objectives in opposite ways: stronger NC improves OOD detection but hurts OOD generalization, while weaker NC does the reverse. This trade-off suggests that a single feature space cannot serve both tasks well. To address this, we develop a theoretical framework linking NC to these objectives and propose a method that controls NC across layers, using entropy regularization for OOD generalization and a fixed Simplex ETF projector for OOD detection.
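The two components above can be sketched as follows. This is a minimal, illustrative sketch, not the paper's implementation: the feature dimensions, the 0.1 loss weight, and the exact form of the entropy term (here applied to a softmax over backbone features) are assumptions.

```python
# Illustrative sketch: a frozen Simplex ETF projector (to strengthen NC where it
# helps OOD detection) plus an entropy regularizer (to weaken NC where it hurts
# OOD generalization). Dimensions and weights are assumptions, not the paper's.
import torch
import torch.nn.functional as F

def simplex_etf(num_classes: int, feat_dim: int) -> torch.Tensor:
    """Build a (feat_dim x num_classes) Simplex ETF matrix with equiangular columns."""
    assert feat_dim >= num_classes
    U, _ = torch.linalg.qr(torch.randn(feat_dim, num_classes))  # orthonormal columns
    M = U @ (torch.eye(num_classes) - torch.full((num_classes, num_classes), 1.0 / num_classes))
    return (num_classes / (num_classes - 1)) ** 0.5 * M

def entropy_regularizer(features: torch.Tensor) -> torch.Tensor:
    """Negative entropy of a softmax over features; adding it to the loss raises entropy."""
    probs = F.softmax(features, dim=1)
    return (probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()

feats = torch.randn(32, 512)                       # backbone features (batch x dim)
etf = simplex_etf(num_classes=100, feat_dim=512)   # fixed projector, never trained
logits = feats @ etf                               # ETF-projected logits for OOD detection
labels = torch.randint(0, 100, (32,))
loss = F.cross_entropy(logits, labels) + 0.1 * entropy_regularizer(feats)
```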

CoLLAs 2025: A Good Start Matters: Enhancing Continual Learning with Data-Driven Weight Initialization

Md Yousuf Harun, Christopher Kanan

Continual learning systems must efficiently learn new concepts while preserving prior knowledge. However, randomly initializing classifier weights for new categories causes instability and high initial loss, requiring prolonged training. Inspired by Neural Collapse, we propose a data-driven weight-initialization strategy based on an analytical least-squares solution that aligns the new weights with learned features. This reduces loss spikes and accelerates adaptation.
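A minimal sketch of the idea, assuming a frozen feature extractor and a standard linear classifier head (the exact solver and any regularization used in the paper are not reproduced here):

```python
# Sketch: initialize the weights of newly added classes from a least-squares fit
# between backbone features and one-hot targets, instead of random initialization.
import torch

def data_driven_init(features: torch.Tensor, labels: torch.Tensor, num_new: int) -> torch.Tensor:
    """features: (N, d) embeddings of new-class samples; labels: (N,) with values in [0, num_new)."""
    targets = torch.nn.functional.one_hot(labels, num_new).float()  # (N, num_new)
    W = torch.linalg.lstsq(features, targets).solution              # closed-form solution, (d, num_new)
    return W.T                                                      # (num_new, d), matches nn.Linear.weight

# Hypothetical usage when the classifier is expanded with 10 new categories.
feats = torch.randn(1024, 512)             # embeddings of new-class samples
labels = torch.randint(0, 10, (1024,))
new_head = torch.nn.Linear(512, 10, bias=False)
with torch.no_grad():
    new_head.weight.copy_(data_driven_init(feats, labels, num_new=10))
```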

CoLLAs 2025: Improving Multimodal Large Language Models Using Continual Learning

Shikhar Srivastava, Md Yousuf Harun, Robik Shrestha, Christopher Kanan

Generative LLMs gain multimodal capabilities by integrating pre-trained vision models, but this often leads to linguistic forgetting. We analyze this issue in the LLaVA MLLM through a continual learning lens, evaluating five methods for mitigating forgetting. Our approach reduces linguistic forgetting by up to 15% while preserving multimodal accuracy. We also demonstrate its robustness on sequential vision-language tasks, maintaining linguistic skills while acquiring new multimodal abilities.

NeurIPS 2024: What Variables Affect Out-of-Distribution Generalization in Pretrained Models?

Md Yousuf Harun*, Kyungbok Lee*, Jhair Gallardo, Giri Krishnan, Christopher Kanan
[* denotes equal contribution]

Embeddings produced by pre-trained deep neural networks (DNNs) are widely used; however, their efficacy for downstream tasks can vary greatly. We study the factors influencing out-of-distribution (OOD) generalization of pre-trained DNN embeddings through the lens of the tunnel effect hypothesis, which suggests that deeper DNN layers compress representations and hinder OOD performance. Contrary to earlier work, we find the tunnel effect is not universal. Our results emphasize the danger of generalizing findings from toy datasets to broader contexts.
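A minimal sketch of the kind of layer-wise probing used to look for a tunnel: linear probes are fit on embeddings taken at different depths of a pretrained network, so that falling probe accuracy on a downstream (OOD) dataset at deeper layers would signal a tunnel. The model choice, layer names, and closed-form probe are assumptions, not the paper's exact protocol.

```python
# Sketch: probe embeddings from successive residual stages of a pretrained ResNet-18.
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.DEFAULT).eval()

def embed(x: torch.Tensor, upto: str) -> torch.Tensor:
    """Global-average-pooled features after the named residual stage."""
    h = model.maxpool(model.relu(model.bn1(model.conv1(x))))
    for name in ["layer1", "layer2", "layer3", "layer4"]:
        h = getattr(model, name)(h)
        if name == upto:
            break
    return h.mean(dim=(2, 3))  # (N, channels)

@torch.no_grad()
def probe_accuracy(x: torch.Tensor, y: torch.Tensor, upto: str) -> float:
    """Fit a closed-form least-squares linear probe; a real study would use a held-out split."""
    feats = embed(x, upto)
    targets = nn.functional.one_hot(y, int(y.max()) + 1).float()
    W = torch.linalg.lstsq(feats, targets).solution
    return (feats @ W).argmax(dim=1).eq(y).float().mean().item()

# Tiny random stand-in for a downstream OOD dataset.
x, y = torch.randn(64, 3, 224, 224), torch.randint(0, 5, (64,))
for layer in ["layer1", "layer2", "layer3", "layer4"]:
    print(layer, probe_accuracy(x, y, layer))
```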

CoLLAs 2024: GRASP: A Rehearsal Policy for Efficient Online Continual Learning

Md Yousuf Harun, Jhair Gallardo, Junyu Chen, Christopher Kanan

In this work, we propose a new rehearsal (sample selection) policy called GRASP (GRAdually Select less Prototypical) for efficient continual learning (CL). GRASP is a dynamic rehearsal policy that progressively selects harder samples over time to efficiently update deep neural networks on large-scale data streams in CL settings. GRASP is the first method to outperform uniform balanced sampling on both large-scale vision and NLP datasets. GRASP has the potential to supplant expensive periodic retraining and make on-device CL more efficient.
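A minimal sketch of the core idea, assuming cached embeddings for the buffered samples (the official implementation's per-class scheduling is not reproduced here):

```python
# Sketch: rank buffered samples by distance to their class prototype (mean embedding)
# and rehearse the most prototypical samples first, gradually moving to harder ones.
import torch

def grasp_order(embeddings: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Return sample indices ordered from most to least prototypical."""
    dists = torch.empty(len(labels))
    for c in labels.unique():
        mask = labels == c
        proto = embeddings[mask].mean(dim=0)                  # class prototype
        dists[mask] = (embeddings[mask] - proto).norm(dim=1)  # distance to prototype
    return dists.argsort()                                    # prototypical (easy) -> hard

emb = torch.randn(1000, 128)                  # cached embeddings of buffered samples
lab = torch.randint(0, 10, (1000,))
for step, idx in enumerate(grasp_order(emb, lab).split(64)):
    rehearsal_batch = emb[idx], lab[idx]      # ...update the model on this minibatch
```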

TMLR 2024: Overcoming the Stability Gap in Continual Learning

Md Yousuf Harun, Christopher Kanan

Pre-trained deep neural networks (DNNs) are widely deployed by industry to make business decisions and serve users; however, a major problem is model decay. To mitigate model decay, DNNs are retrained from scratch, which is computationally expensive. In this work, we study how continual learning could overcome model decay and reduce computational costs. We identify the stability gap as a major obstacle in our setting, study how to mitigate it, and test a variety of hypotheses. This leads us to discover a method that vastly reduces the stability gap and greatly increases computational efficiency.
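For context, a minimal sketch of how the stability gap is usually quantified (an assumption about the evaluation protocol, not the paper's mitigation method): track accuracy on previously learned data at every update step while new data is being learned, and measure the worst transient drop relative to the accuracy just before the update.

```python
# Sketch: depth of the transient accuracy dip on old data during new-task training.
import torch

def stability_gap(old_task_acc_per_step: torch.Tensor) -> float:
    """old_task_acc_per_step[0] is accuracy just before new-task training begins."""
    return float(old_task_acc_per_step[0] - old_task_acc_per_step.min())

trace = torch.tensor([0.80, 0.55, 0.60, 0.70, 0.78, 0.81])  # hypothetical accuracy trace
print(f"stability gap: {stability_gap(trace):.2f}")          # 0.25
```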

TMLR 2023: SIESTA: Efficient Online Continual Learning with Sleep

Md Yousuf Harun*, Jhair Gallardo*, Tyler L. Hayes, Ronald Kemker, Christopher Kanan
[* denotes equal contribution, also presented at the Journal Track of CoLLAs 2024]

For continual learning (CL) to make a real-world impact, CL systems need to be computationally efficient and rival traditional offline learning systems retrained from scratch. Towards that goal, we propose a novel online CL algorithm named SIESTA. SIESTA uses a wake/sleep framework for training, which is well aligned with the needs of on-device learning. SIESTA is far more computationally efficient than existing methods, enabling CL on ImageNet-1K in under 2 hours; moreover, it achieves "zero forgetting" by matching the performance of the joint model, a milestone critical to driving adoption of CL in real-world applications.
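A minimal sketch of the wake/sleep structure, with assumed components (frozen backbone, embedding buffer, linear head); it is not the released SIESTA code:

```python
# Sketch: cheap online updates during the wake phase, rehearsal-based consolidation
# of buffered embeddings during the sleep phase.
import torch
import torch.nn as nn

class WakeSleepLearner:
    def __init__(self, backbone: nn.Module, head: nn.Linear, buffer_size: int = 10_000):
        self.backbone, self.head = backbone.eval(), head   # frozen backbone, trainable head
        self.buffer: list[tuple[torch.Tensor, int]] = []   # stored (embedding, label) pairs
        self.buffer_size = buffer_size
        self.opt = torch.optim.SGD(self.head.parameters(), lr=0.01)

    @torch.no_grad()
    def wake(self, x: torch.Tensor, y: torch.Tensor) -> None:
        """Online step: cache embeddings and nudge class weights toward them."""
        for zi, yi in zip(self.backbone(x), y.tolist()):
            self.buffer.append((zi, yi))
            self.head.weight[yi] += 0.01 * (zi - self.head.weight[yi])  # prototype-style update
        self.buffer = self.buffer[-self.buffer_size:]

    def sleep(self, steps: int = 100, batch: int = 64) -> None:
        """Offline consolidation: SGD on the head over rehearsed buffered embeddings."""
        for _ in range(steps):
            idx = torch.randint(0, len(self.buffer), (batch,))
            z = torch.stack([self.buffer[i][0] for i in idx])
            y = torch.tensor([self.buffer[i][1] for i in idx])
            loss = nn.functional.cross_entropy(self.head(z), y)
            self.opt.zero_grad(); loss.backward(); self.opt.step()
```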

CVPRW 2023: How Efficient Are Today's Continual Learning Algorithms?

Md Yousuf Harun, Jhair Gallardo, Tyler L. Hayes, Christopher Kanan

Continual learning (CL) research has focused on catastrophic forgetting, but a major motivation for CL is efficiently updating deep neural networks (DNNs) with new data rather than retraining from scratch as the dataset grows over time. We study the computational efficiency of existing CL methods and find that many are as expensive as training offline models from scratch, which defeats this efficiency motivation.

PUBLICATIONS

Peer-Reviewed Papers

Dissertation