Embeddings from pre-trained deep neural networks (DNNs) are widely used across
computer vision; however, their efficacy on downstream tasks varies greatly.
We seek to understand which variables affect out-of-distribution
(OOD) generalization. We do this through the lens of the tunnel
effect hypothesis, which states that after training an over-parameterized DNN, its
layers form two distinct groups. The first consists of the initial DNN layers that
produce progressively more linearly separable representations, and the second
consists of the deeper layers that compress these representations and hinder OOD
generalization. Earlier work convincingly demonstrated that the tunnel effect exists
for DNNs trained on low-resolution images (e.g., CIFAR-10) and suggested that
it was universally applicable. Here, we study the magnitude of the tunnel effect
when the DNN architecture, training dataset, image resolution, augmentations, and
OOD dataset are varied. We show that in some cases the tunnel effect is completely
mitigated, thereby refuting the claim that the hypothesis is universally applicable. Through
extensive experiments with 10,584 trained linear probes, we find that each variable
plays a role, but some have more impact than others. Our results caution against
extrapolating findings from models trained on toy datasets as if they were
universally applicable.
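
The core measurement behind these results is layer-wise linear probing: freeze a pretrained backbone, extract embeddings at several depths, and fit a linear classifier per depth on an OOD dataset; the tunnel shows up as probe accuracy peaking at an intermediate layer and then declining. The sketch below illustrates the idea only; the backbone (torchvision ResNet-18), the probed layers, the pooled features, and the random stand-in data are assumptions for illustration, not the paper's exact protocol behind its 10,584 probes.

```python
# Minimal sketch of layer-wise linear probing (illustrative, not the authors' exact setup).
import torch
from torchvision.models import resnet18
from torch.utils.data import DataLoader, TensorDataset
from sklearn.linear_model import LogisticRegression

def collect_features(model, layer_names, loader, device="cpu"):
    """Run the frozen model and cache spatially pooled activations at each probed layer."""
    feats = {name: [] for name in layer_names}
    labels, hooks = [], []
    modules = dict(model.named_modules())

    def make_hook(name):
        def hook(_, __, out):
            # Global-average-pool spatial dims so every layer yields one vector per image.
            feats[name].append(out.flatten(2).mean(-1).detach().cpu())
        return hook

    for name in layer_names:
        hooks.append(modules[name].register_forward_hook(make_hook(name)))
    model.eval().to(device)
    with torch.no_grad():
        for x, y in loader:
            model(x.to(device))
            labels.append(y)
    for h in hooks:
        h.remove()
    y = torch.cat(labels).numpy()
    return {n: torch.cat(f).numpy() for n, f in feats.items()}, y

def probe_accuracy(X, y):
    """Fit a linear probe on half the embeddings and report accuracy on the other half."""
    n = len(y) // 2
    clf = LogisticRegression(max_iter=2000).fit(X[:n], y[:n])
    return clf.score(X[n:], y[n:])

if __name__ == "__main__":
    model = resnet18(weights="IMAGENET1K_V1")          # frozen pretrained backbone
    layer_names = ["layer1", "layer2", "layer3", "layer4"]  # probe points along depth

    # Stand-in "OOD" data: random tensors in place of a real OOD image dataset.
    x = torch.randn(256, 3, 64, 64)
    y = torch.randint(0, 10, (256,))
    ood_loader = DataLoader(TensorDataset(x, y), batch_size=64)

    feats, labels = collect_features(model, layer_names, ood_loader)
    for name in layer_names:
        # A drop in accuracy at deeper layers is the signature of the tunnel.
        print(name, probe_accuracy(feats[name], labels))
```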