Md Yousuf Harun

About Me

Hello! I am a PhD candidate at Rochester Institute of Technology (RIT), where I conduct research under the supervision of Dr. Christopher Kanan. My doctoral research focuses on continual/lifelong machine learning for large-scale vision, language, and multimodal models. It addresses a fundamental challenge in AI: how to create systems that remain adaptable, scalable, and trustworthy in dynamic environments.
My work has been published in top-tier AI venues, including NeurIPS, ICML, CVPR-W, TMLR, and CoLLAs, and advances the state of the art in deep learning and continual learning. I am passionate about developing efficient, reliable, and generalizable AI systems for real-world applications, ranging from lightweight, on-device solutions to large-scale foundation models such as LLMs and VLMs.
I am actively seeking full-time AI/ML Research Scientist or Engineer positions. If you are hiring or aware of relevant openings, I would love to connect; please reach out via email.

News

Mar 2026: New! Our paper "Unlocking ImageNet's Multi-Object Nature: Automated Large-Scale Multilabel Annotation" was accepted to CVPR 2026 Findings!
Aug 2025: Our ICML paper "Controlling Neural Collapse Enhances Out-of-Distribution Detection and Transfer Learning" was selected for an oral presentation at the ICCV 2025 Workshop on Building Foundation Models You Can Trust (T2FM).
Jun 2025: Our paper "A Good Start Matters: Enhancing Continual Learning with Data-Driven Weight Initialization" was accepted to CoLLAs 2025!
Jun 2025: Our paper "Improving Multimodal Large Language Models Using Continual Learning" was accepted to CoLLAs 2025!
May 2025: Our paper "Controlling Neural Collapse Enhances Out-of-Distribution Detection and Transfer Learning" was accepted at ICML 2025.
Apr 2025: Successfully defended my dissertation proposal and advanced to candidacy.
Sep 2024: Our paper "What Variables Affect Out-of-Distribution Generalization in Pretrained Models?" was accepted at NeurIPS 2024.
Sep 2024: Our paper "Overcoming the Stability Gap in Continual Learning" was accepted to TMLR.
Apr 2024: Our paper "GRASP: A Rehearsal Policy for Efficient Online Continual Learning" was accepted at CoLLAs 2024.
Nov 2023: Our paper "SIESTA: Efficient Online Continual Learning with Sleep" was accepted to TMLR.
Nov 2023: Won the Best Student Abstract Award at IEEE Western New York Image & Signal Processing Workshop (WNYISPW).
Apr 2023: Our paper "How Efficient Are Today's Continual Learning Algorithms?" was accepted at the CLVision Workshop @ CVPR 2023.
May 2021: Joined kLab at RIT.
Aug 2020: Started the Ph.D. program in Imaging Science at RIT.
May 2020: Completed my M.S. in Electrical Engineering at the University of Hawaiʻi at Mānoa.
May 2016: Completed my B.S. in Electrical & Electronic Engineering at KUET, Bangladesh.

Research

CVPR 2026 Findings: Unlocking ImageNet's Multi-Object Nature: Automated Large-Scale Multilabel Annotation

Junyu Chen, Md Yousuf Harun, Christopher Kanan

CVPR 2026 Findings

The original ImageNet benchmark enforces a single-label assumption, despite many images depicting multiple objects. This leads to label noise and limits the richness of the learning signal. Multi-label annotations more accurately reflect real-world visual scenes, where multiple objects co-occur and contribute to semantic understanding, enabling models to learn richer and more robust representations. While prior efforts (e.g., ReaL, ImageNetV2) have improved the validation set, the training set still lacks scalable, high-quality multi-label annotations. To this end, we present an automated pipeline that converts the ImageNet training set into a multi-label dataset without human annotations.

ICML 2025: Controlling Neural Collapse Enhances Out-of-Distribution Detection and Transfer Learning

Oral at ICCV 2025 T2FM Workshop

Md Yousuf Harun, Jhair Gallardo, Christopher Kanan

ICML 2025

Out-of-distribution (OOD) detection and OOD generalization are widely studied in deep learning, yet their relationship remains poorly understood. We empirically show that the degree of Neural Collapse (NC) in a network layer is inversely related to these objectives: stronger NC improves OOD detection but hurts generalization, while weaker NC does the opposite. This trade-off suggests that a single feature space cannot serve both objectives simultaneously. To address this, we develop a theoretical framework linking NC to these objectives and propose a method that controls NC across layers, using entropy regularization for OOD generalization and a fixed Simplex ETF projector for OOD detection.
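For readers unfamiliar with the simplex equiangular tight frame (ETF) mentioned above, it has a standard closed-form construction. The NumPy sketch below is my own illustration (not code from the paper): it builds a simplex ETF and makes its defining property checkable, namely unit-norm class vectors with pairwise cosine similarity exactly -1/(C-1).

```python
import numpy as np

def simplex_etf(num_classes: int) -> np.ndarray:
    """Construct a simplex equiangular tight frame (ETF): num_classes
    unit-norm vectors whose pairwise cosine similarity is exactly
    -1/(num_classes - 1), the geometry that neural collapse converges to."""
    C = num_classes
    centering = np.eye(C) - np.ones((C, C)) / C
    return np.sqrt(C / (C - 1)) * centering  # columns are the ETF vectors

etf = simplex_etf(4)
```

In practice such targets are embedded into the feature dimension via an orthonormal map; a fixed projector onto targets of this form is one way to realize the idea described above.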

CoLLAs 2025: A Good Start Matters: Enhancing Continual Learning with Data-Driven Weight Initialization

Md Yousuf Harun, Christopher Kanan

CoLLAs 2025

Continual learning systems must efficiently learn new concepts while preserving prior knowledge. However, randomly initializing classifier weights for new categories causes instability and high initial loss, requiring prolonged training. Inspired by neural collapse, we propose a data-driven weight initialization strategy using a least-squares analytical solution, aligning weights with learned features. This reduces loss spikes and accelerates adaptation.
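The least-squares initialization idea can be illustrated in a few lines: given the features of new-class samples and their labels, solve a closed-form regression onto one-hot targets instead of drawing random weights. This NumPy sketch is my own minimal rendering of the idea, not the paper's implementation.

```python
import numpy as np

def lstsq_head_init(features: np.ndarray, labels: np.ndarray,
                    num_classes: int) -> np.ndarray:
    """Initialize classifier weights for new classes by solving the
    least-squares problem min_W ||features @ W - Y||^2, where Y holds
    one-hot label targets, instead of drawing W at random."""
    Y = np.eye(num_classes)[labels]                # one-hot targets
    W, *_ = np.linalg.lstsq(features, Y, rcond=None)
    return W                                       # (feat_dim, num_classes)
```

Because the solution already aligns class scores with the feature geometry, the initial loss starts near its post-training value rather than at the random-init level.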

CoLLAs 2025: Improving Multimodal Large Language Models Using Continual Learning

Shikhar Srivastava, Md Yousuf Harun, Robik Shrestha, Christopher Kanan

CoLLAs 2025

Generative LLMs gain multimodal capabilities by integrating pre-trained vision models, but this often leads to linguistic forgetting. We analyze this issue in LLaVA MLLM through a continual learning lens, evaluating five methods to mitigate forgetting. Our approach reduces linguistic forgetting by up to 15% while preserving multimodal accuracy. We also demonstrate its robustness in sequential vision-language tasks, maintaining linguistic skills while acquiring new multimodal abilities.

NeurIPS 2024: What Variables Affect Out-of-Distribution Generalization in Pretrained Models?

Md Yousuf Harun*, Kyungbok Lee*, Jhair Gallardo, Giri Krishnan, Christopher Kanan

* denotes equal contribution.

NeurIPS 2024

Embeddings produced by pre-trained deep neural networks (DNNs) are widely used; however, their efficacy for downstream tasks can vary widely. We study the factors influencing out-of-distribution (OOD) generalization of pre-trained DNN embeddings through the lens of the tunnel effect hypothesis, which suggests deeper DNN layers compress representations and hinder OOD performance. Contrary to earlier work, we find the tunnel effect is not universal. Our results emphasize the danger of generalizing findings from toy datasets to broader contexts.

CoLLAs 2024: GRASP: A Rehearsal Policy for Efficient Online Continual Learning

Md Yousuf Harun, Jhair Gallardo, Junyu Chen, Christopher Kanan

CoLLAs 2024

In this work, we propose GRASP (GRAdually Select less Prototypical), a new rehearsal (sample-selection) policy for efficient continual learning (CL). GRASP is a dynamic policy that progressively selects harder samples over time to efficiently update deep neural networks on large-scale data streams in CL settings. GRASP is the first method to outperform uniform balanced sampling on both large-scale vision and NLP datasets, and it has the potential to supplant expensive periodic retraining and make on-device CL more efficient.
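As a rough illustration of "gradually selecting less prototypical" samples, one can rank each class's samples by distance to its mean embedding and visit them from most to least prototypical. The sketch below is my own approximation of that intuition; the paper's actual policy and difficulty measure may differ.

```python
import numpy as np

def grasp_order(embeddings: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Order rehearsal samples from most to least prototypical per class,
    using distance to the class-mean embedding as a difficulty proxy."""
    order = []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        proto = embeddings[idx].mean(axis=0)          # class prototype
        dists = np.linalg.norm(embeddings[idx] - proto, axis=1)
        order.append(idx[np.argsort(dists)])          # easy (close) -> hard (far)
    return np.concatenate(order)
```

A rehearsal loop would then draw from the front of this ordering early in training and work toward the tail as learning progresses.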

TMLR 2024: Overcoming the Stability Gap in Continual Learning

Also presented at the Journal Track of CoLLAs 2025

Md Yousuf Harun, Christopher Kanan

TMLR 2024

Pre-trained deep neural networks (DNNs) are widely deployed by industry to make business decisions and serve users; however, a major problem is model decay. To mitigate it, DNNs are retrained from scratch, which is computationally expensive. In this work, we study how continual learning could overcome model decay and reduce computational costs. We identify the stability gap as a major obstacle in our setting, test a variety of hypotheses about how to mitigate it, and discover a method that vastly reduces the stability gap while greatly increasing computational efficiency.

TMLR 2023: SIESTA: Efficient Online Continual Learning with Sleep

Also presented at the Journal Track of CoLLAs 2024

Md Yousuf Harun*, Jhair Gallardo*, Tyler L. Hayes, Ronald Kemker, Christopher Kanan

* denotes equal contribution.

TMLR 2023

For continual learning (CL) to make a real-world impact, CL systems need to be computationally efficient and rival traditional offline learning systems retrained from scratch. Toward that goal, we propose SIESTA, a novel online CL algorithm. SIESTA uses a wake/sleep framework for training that is well aligned with the needs of on-device learning. It is far more computationally efficient than existing methods, enabling CL on ImageNet-1K in under 2 hours; moreover, it achieves "zero forgetting" by matching the performance of the joint model, a milestone critical to driving adoption of CL in real-world applications.
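The wake/sleep framing can be caricatured in a few lines: cheap online updates plus sample caching during wake, then offline rehearsal over the cache during sleep. Everything in this sketch (class name, update rule, buffer design) is my own assumption for illustration; it is not SIESTA's implementation.

```python
import numpy as np

class WakeSleepLearner:
    """Toy wake/sleep continual learner: a linear head over fixed features."""

    def __init__(self, feat_dim: int, num_classes: int, lr: float = 0.1):
        self.W = np.zeros((feat_dim, num_classes))
        self.lr = lr
        self.buffer = []                          # cached (feature, label) pairs

    def _update(self, x, y):
        target = np.zeros(self.W.shape[1]); target[y] = 1.0
        err = x @ self.W - target                 # squared-error gradient step
        self.W -= self.lr * np.outer(x, err)

    def wake_step(self, x, y):
        self._update(x, y)                        # cheap online update
        self.buffer.append((x, y))                # cache the sample for sleep

    def sleep(self, epochs: int = 20):
        for _ in range(epochs):                   # offline rehearsal/consolidation
            for x, y in self.buffer:
                self._update(x, y)

    def predict(self, x) -> int:
        return int(np.argmax(x @ self.W))
```

The design choice the paper highlights is that the expensive consolidation work is batched into sleep phases, so the wake phase stays cheap enough for on-device use.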

CVPR Workshop 2023: How Efficient Are Today's Continual Learning Algorithms?

Md Yousuf Harun, Jhair Gallardo, Tyler L. Hayes, Christopher Kanan

CVPR Workshop 2023

Continual learning (CL) research has focused on catastrophic forgetting, but a major motivation for CL is efficiently updating deep neural networks (DNNs) with new data rather than retraining from scratch when the dataset grows over time. We study the computational efficiency of existing CL methods and find that many are as expensive as training offline models from scratch, defeating a central motivation for CL.

Publications

Peer-Reviewed Papers

Best Poster Award

  • M.Y. Harun, J. Gallardo, C. Kanan. Prioritized Training on Rehearsal Samples for Efficient Online Continual Learning. In: IEEE Western New York Image & Signal Processing Workshop (WNYISPW), 2023

Links

Curriculum Vitae · Google Scholar · GitHub · LinkedIn