Controlling Neural Collapse Enhances Out-of-Distribution Detection and Transfer Learning

1Rochester Institute of Technology, 2University of Rochester

Published at ICML 2025

Motivation

Out-of-distribution (OOD) detection and OOD generalization are widely studied in Deep Neural Networks (DNNs), yet their relationship remains poorly understood. We empirically show that the degree of Neural Collapse (NC) in a network layer affects these two objectives in opposite ways: stronger NC improves OOD detection but degrades generalization, while weaker NC enhances generalization at the cost of detection. This trade-off suggests that a single feature space cannot simultaneously serve both tasks. To address this, we develop a theoretical framework linking NC to OOD detection and generalization. We show that entropy regularization mitigates NC to improve generalization, while a fixed Simplex Equiangular Tight Frame (ETF) projector enforces NC for better detection. Based on these insights, we propose a method that controls NC at different DNN layers. In experiments, our method excels at both tasks across many OOD datasets and DNN architectures.

Core Insight: Neural Collapse Relates to OOD Detection & Generalization


In this paper, we show that there is a close inverse relationship between OOD detection and OOD generalization with respect to the degree of representation collapse in DNN layers. The plot illustrates this relationship for a VGG17 pre-trained on ImageNet-100, using four OOD datasets and measuring collapse and OOD performance at various layers. For OOD detection, there is a strong positive Pearson correlation (R = 0.77) with the degree of neural collapse (NC1) in a DNN layer, whereas for OOD generalization, there is a strong negative correlation (R = −0.60). We rigorously examine this inverse relationship and propose a method to control NC at different layers.
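
To make the per-layer measurement concrete, the sketch below computes one common formulation of the NC1 metric: the ratio of within-class to between-class variability, tr(Σ_W Σ_B†)/C, averaged over classes. This is a standard definition from the neural-collapse literature and may differ in minor details from the exact metric used in the paper.

```python
# Minimal NC1 sketch (assumption: NC1 = tr(Sigma_W @ pinv(Sigma_B)) / C,
# a standard formulation from the neural-collapse literature).
import numpy as np

def nc1(features: np.ndarray, labels: np.ndarray) -> float:
    """features: (N, d) embeddings from one layer; labels: (N,) class ids."""
    classes = np.unique(labels)
    global_mean = features.mean(axis=0)
    d = features.shape[1]
    sigma_w = np.zeros((d, d))   # within-class covariance
    sigma_b = np.zeros((d, d))   # between-class covariance
    for c in classes:
        fc = features[labels == c]
        mu_c = fc.mean(axis=0)
        dev_w = fc - mu_c
        sigma_w += dev_w.T @ dev_w / len(fc)
        dev_b = (mu_c - global_mean)[:, None]
        sigma_b += dev_b @ dev_b.T
    sigma_w /= len(classes)
    sigma_b /= len(classes)
    # Lower NC1 => stronger collapse (within-class variability vanishes
    # relative to the separation between class means).
    return float(np.trace(sigma_w @ np.linalg.pinv(sigma_b)) / len(classes))
```

Computing this value for each analyzed layer and correlating it with per-layer OOD detection or generalization performance (e.g., via scipy.stats.pearsonr) reproduces the kind of analysis shown in the plot.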

Controlling Neural Collapse


Mitigating Neural Collapse (NC) in the encoder improves OOD generalization, while promoting NC in the projector enhances OOD detection.

Method Overview

Method overview diagram.
  • A single feature space cannot effectively support both OOD detection and generalization.
  • To address this, we control Neural Collapse (NC) at different layers of the network:
    • Layer for OOD generalization:
      • We introduce entropy regularization to mitigate NC in the encoder, improving feature diversity for generalization.
      • We develop a theoretical framework that explains how entropy regularization mitigates NC. In particular, we show that as representations collapse, their entropy diverges to negative infinity.
      • We implement entropy regularization using nearest-neighbor-based density estimation (a minimal estimator sketch appears after this list).
    • Layer for OOD detection:
      • We use a fixed simplex Equiangular Tight Frame (ETF) projector to induce NC in the final layer, improving feature compactness for detection.
      • The projector is a two-layer MLP configured to satisfy the ETF constraints (equinorm and maximum equiangularity) and remains frozen during training (a construction sketch also appears after this list).
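
The page states that entropy regularization is implemented with nearest-neighbor-based density estimation. Below is a minimal PyTorch sketch of one such estimator, a Kozachenko-Leonenko-style k-NN entropy estimate (up to additive constants); the exact estimator, the choice of k, and the loss weighting are assumptions rather than the paper's precise recipe.

```python
import torch

def knn_entropy(z: torch.Tensor, k: int = 4) -> torch.Tensor:
    """Kozachenko-Leonenko-style entropy estimate, dropping additive constants.

    z: (N, d) batch of encoder embeddings. Larger k-th-neighbor distances
    indicate a more spread-out (higher-entropy) feature distribution.
    """
    n, d = z.shape
    dists = torch.cdist(z, z)                       # (N, N) pairwise distances
    # k-th nearest neighbor, excluding the point itself (self-distance is 0).
    knn_dist = dists.topk(k + 1, largest=False).values[:, k]
    return d * torch.log(knn_dist + 1e-12).mean()

# Hypothetical training objective: subtracting the entropy term means that
# minimizing the total loss pushes encoder entropy up, mitigating collapse.
# loss = cross_entropy_loss - lambda_ent * knn_entropy(encoder_features)
```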
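For the detection layer, a simplex ETF with C classes in dimension d ≥ C can be written as M = sqrt(C/(C−1)) · U (I_C − (1/C) 1 1ᵀ), where U is a d×C partial orthogonal matrix; its columns are equinorm with pairwise cosine similarity −1/(C−1). The sketch below builds such a matrix and freezes it as a linear projection. The paper describes a two-layer MLP projector satisfying these constraints, so treat this single-layer variant as an illustrative simplification.

```python
import torch
import torch.nn as nn

def simplex_etf(num_classes: int, dim: int) -> torch.Tensor:
    """Return a (dim, num_classes) simplex-ETF matrix: unit-norm columns with
    pairwise cosine similarity -1/(C-1) (maximum equiangularity)."""
    assert dim >= num_classes
    u, _ = torch.linalg.qr(torch.randn(dim, num_classes))   # partial orthogonal U
    center = torch.eye(num_classes) - torch.ones(num_classes, num_classes) / num_classes
    scale = (num_classes / (num_classes - 1)) ** 0.5
    return scale * u @ center

class FrozenETFProjector(nn.Module):
    """Hypothetical frozen head that projects encoder features onto fixed ETF
    directions, encouraging collapse in the detection layer."""
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.register_buffer("etf", simplex_etf(num_classes, feat_dim))  # never trained

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return z @ self.etf                                  # (N, num_classes)
```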

Neural Collapse Correlates with Entropy


The stronger the neural collapse (i.e., the lower the NC1 value), the lower the entropy, and vice versa. We analyze different layers of VGG17 networks pre-trained on the ImageNet-100 (ID) dataset. R denotes the Pearson correlation coefficient.
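
The reported R can be reproduced by pairing each layer's NC1 value with its estimated entropy and computing a Pearson coefficient, for example with scipy. The numbers below are placeholders for illustration, not values from the paper.

```python
from scipy.stats import pearsonr

# One value per analyzed layer, computed e.g. with the nc1(...) and
# knn_entropy(...) sketches above (placeholder values for illustration).
nc1_per_layer = [3.1, 2.4, 1.8, 1.1, 0.6]
entropy_per_layer = [41.0, 36.5, 30.2, 22.8, 15.1]

r, _ = pearsonr(nc1_per_layer, entropy_per_layer)
print(f"Pearson R between NC1 and entropy across layers: {r:.2f}")
```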

Qualitative Results: Encoder vs. Projector


UMAP visualization of embeddings. The projector embeddings exhibit much stronger NC (NC1 = 0.393) than the encoder embeddings (NC1 = 2.175), as indicated by the formation of compact clusters around class means. Embeddings are from a VGG17 pre-trained on ImageNet-100; for clarity, we highlight 10 ImageNet classes with distinct colors.
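
A minimal sketch of producing this kind of plot with the umap-learn package; the UMAP hyperparameters and plotting details here are illustrative, not the paper's exact settings.

```python
import numpy as np
import umap                      # pip install umap-learn
import matplotlib.pyplot as plt

def plot_umap(embeddings: np.ndarray, labels: np.ndarray, title: str) -> None:
    """Project (N, d) embeddings to 2-D with UMAP and color points by class."""
    coords = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1).fit_transform(embeddings)
    plt.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="tab10", s=3)
    plt.title(title)
    plt.show()

# e.g., plot_umap(encoder_feats, class_ids, "Encoder (NC1 = 2.175)")
#       plot_umap(projector_feats, class_ids, "Projector (NC1 = 0.393)")
```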

UMAP Visualization of ID & OOD Data

The projector exhibits a greater separation between ID and OOD embeddings than the encoder. For clarity, we show ImageNet-10 as ID data and NINCO-64 as OOD data.


Energy Score Distribution of ID & OOD Data

The projector exhibits a greater separation between ID and OOD energy scores than the encoder. For ID and OOD datasets, we show ImageNet-100 and NINCO-64, respectively.

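The energy score referenced in these plots is the standard logit-based OOD score E(x) = −T · logsumexp(f(x)/T); lower energy indicates more ID-like inputs. The sketch below computes it together with FPR95 (false-positive rate at 95% true-positive rate). The temperature T = 1 and the thresholding convention are assumptions, not necessarily the paper's exact settings.

```python
import torch

def energy_score(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Standard energy score; lower (more negative) => more ID-like."""
    return -temperature * torch.logsumexp(logits / temperature, dim=1)

def fpr_at_95_tpr(id_scores: torch.Tensor, ood_scores: torch.Tensor) -> float:
    """FPR95, using negative energy as the ID-ness score (higher => more ID).

    The threshold keeps 95% of ID samples classified as ID; FPR95 is the
    fraction of OOD samples that also clear that threshold.
    """
    id_conf, ood_conf = -id_scores, -ood_scores
    threshold = torch.quantile(id_conf, 0.05)   # 95% of ID confidences exceed this
    return float((ood_conf >= threshold).float().mean())
```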

Energy Score Distribution - Flowers-102 OOD Dataset

The projector creates a greater separation between ID and OOD data and achieves a lower FPR95 than the encoder.


Energy Score Distribution - STL-10 OOD Dataset

The projector better separates ID and OOD data, achieving a lower FPR95 than the encoder.


Quantitative Results: Encoder vs. Projector

  • We train various DNNs on the ImageNet-100 dataset (ID) and evaluate OOD detection and generalization on eight OOD datasets. Reported results are averaged across the eight OOD datasets.
  • (a) The projector intensifies NC and becomes a better OOD detector than the encoder.
  • (b) The encoder mitigates NC and becomes a better OOD generalizer than the projector.
  • (c) The projector exhibits lower NC1 values (i.e., stronger NC) than the encoder.

(a) OOD Detection


(b) OOD Generalization


(c) Neural Collapse (NC1)

Combined OOD Performance

Comparison with Baseline

  • We train various DNNs on the ImageNet-100 dataset (ID) and evaluate OOD detection and generalization on the same eight OOD datasets. Reported results are averaged across the eight OOD datasets.
  • Compared to baselines that do not control NC, our method consistently improves both OOD detection and generalization across diverse DNN architectures.

OOD Detection


OOD Generalization


Analyzing Entropy Regularization & L2 Normalization

(a) Entropy regularization reduces neural collapse (indicated by higher NC1 values) in the encoder.

(b) Entropy regularization increases the entropy of encoder embeddings; without it, the entropy remains unchanged.

(c) Entropy regularization increases the effective rank of encoder embeddings; without it, the effective rank remains as low as the number of classes (i.e., 10 ImageNet classes).

(d) L2 normalization increases neural collapse (indicated by lower NC1 values) in the projector.
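
Here, "effective rank" presumably follows the usual definition of Roy & Vetterli (2007): the exponential of the Shannon entropy of the normalized singular values of the embedding matrix. A minimal sketch, treating that definition as an assumption:

```python
import torch

def effective_rank(features: torch.Tensor) -> float:
    """Effective rank of an (N, d) embedding matrix: exp of the entropy of the
    normalized singular-value distribution (Roy & Vetterli, 2007)."""
    s = torch.linalg.svdvals(features)
    p = s / s.sum()
    entropy = -(p * torch.log(p + 1e-12)).sum()
    return float(torch.exp(entropy))
```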

Acknowledgments

This work was partly supported by NSF awards #2326491, #2125362, and #2317706.

Related Work

Check out our NeurIPS 2024 paper "What Variables Affect Out-of-Distribution Generalization in Pretrained Models?", where we present a comprehensive study of OOD generalization through the lens of the Tunnel Effect Hypothesis, which is closely linked to intermediate Neural Collapse.

BibTeX

@inproceedings{harun2025controlling,
  title     = {Controlling Neural Collapse Enhances Out-of-Distribution Detection and Transfer Learning},
  author    = {Harun, Md Yousuf and Gallardo, Jhair and Kanan, Christopher},
  booktitle = {International Conference on Machine Learning},
  year      = {2025}
}