The current mainstream view in neuroscience and machine learning is that neural networks compress representations into low-dimensional manifolds [Op de Beeck et al., 2001, Gao and Ganguli, 2015, Gallego et al., 2017, Ansuini et al., 2019, Recanatesi et al., 2019]. A recent study challenges this view by arguing that neural networks benefit from high-dimensional representations [Elmoznino and Bonner, 2022].
In contrast to these positions, we argue that learning in deep neural networks optimizes signal-to-noise processing. According to this view, neural networks may benefit from both feature compression and expansion to (i) enhance the signal, (ii) suppress the noise, and (iii) map input dimensions onto output categories. We further speculate that nonlinearities (e.g., in activation functions) facilitate this process.
If SNRs are optimized through learning, a causal relationship should exist between the signal-to-noise ratio (SNR) and the behavioral performance of a network (e.g., its classification accuracy). To test this hypothesis, we introduce a geometrically derived SNR expression. Unlike the SNR expression in [Sorscher et al., 2022], ours can be applied to neural representations associated with predictions on unseen data. We then computed the SNR to quantify the separability of category-based manifolds across layers of neural processing, and evaluated it with and without noisy input fluctuations, as well as under linear and nonlinear transformations (Linear, ReLU, and Sigmoid).
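The exact expression is not reproduced here, but the geometric intuition behind such an SNR can be sketched as follows: the signal is the squared distance between the centroids of two category manifolds, and the noise is the within-class variance along the centroid axis. This is a minimal Fisher-style sketch under assumed Gaussian clouds, not the paper's own derivation.

```python
import numpy as np

rng = np.random.default_rng(0)

def snr(X_a, X_b):
    """Geometric SNR between two category manifolds:
    squared centroid distance (signal) over the mean
    within-class variance along the centroid axis (noise)."""
    mu_a, mu_b = X_a.mean(axis=0), X_b.mean(axis=0)
    delta = mu_b - mu_a
    u = delta / np.linalg.norm(delta)      # unit vector along the signal
    signal = delta @ delta                 # squared centroid distance
    noise = 0.5 * (np.var(X_a @ u) + np.var(X_b @ u))
    return signal / noise

# Two Gaussian "manifolds" in 50 dimensions, unit isotropic noise,
# centroids separated by 4 along the first axis
d = 50
mu_b = np.zeros(d); mu_b[0] = 4.0
X_a = rng.normal(np.zeros(d), 1.0, size=(500, d))
X_b = rng.normal(mu_b, 1.0, size=(500, d))
print(snr(X_a, X_b))   # close to 4**2 / 1 = 16
```

With this definition, larger centroid separations or tighter class clouds both raise the SNR, which is the separability the section measures across layers.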
Our results show that increasing noise fluctuations raises dimensionality but lowers both the SNR and the accuracy, demonstrating that higher dimensionality is not better in all conditions. In addition, we observe a causal relationship between SNR and accuracy in a perturbative analysis that gradually shortens the distance between the centroids of different category-based manifolds without affecting the dimensionality of the data.
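A perturbation of this kind can be sketched as follows: translating each class cloud toward the shared midpoint scales the centroid distance while leaving the within-class covariance, and hence a participation-ratio measure of dimensionality, untouched. The data, SNR definition, and nearest-centroid readout here are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def participation_ratio(X):
    # Dimensionality as the participation ratio of the
    # within-class covariance eigenvalues.
    lam = np.linalg.eigvalsh(np.cov((X - X.mean(0)).T))
    return lam.sum() ** 2 / (lam ** 2).sum()

d = 30
X_a = rng.normal(0.0, 1.0, size=(400, d))
X_b = rng.normal(0.0, 1.0, size=(400, d)) + 4.0 * np.eye(d)[0]

results = []
for alpha in (1.0, 0.5, 0.25):
    mu_a, mu_b = X_a.mean(0), X_b.mean(0)
    mid = 0.5 * (mu_a + mu_b)
    # Translate each cloud toward the midpoint: centroid distance is
    # scaled by alpha, within-class covariance is unchanged.
    A = X_a + (1.0 - alpha) * (mid - mu_a)
    B = X_b + (1.0 - alpha) * (mid - mu_b)

    delta = B.mean(0) - A.mean(0)
    u = delta / np.linalg.norm(delta)
    snr = delta @ delta / (0.5 * (np.var(A @ u) + np.var(B @ u)))

    # Nearest-centroid classification accuracy on the perturbed clouds
    X = np.vstack([A, B])
    y = np.r_[np.zeros(len(A)), np.ones(len(B))]
    pred = (np.linalg.norm(X - B.mean(0), axis=1)
            < np.linalg.norm(X - A.mean(0), axis=1)).astype(float)
    acc = (pred == y).mean()

    dim = 0.5 * (participation_ratio(A) + participation_ratio(B))
    results.append((alpha, snr, acc, dim))
```

Running this, SNR and accuracy both fall as alpha shrinks while the participation ratio stays constant, mirroring the dissociation between dimensionality and separability described above.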
Moreover, nonlinear activation functions (ReLU and Sigmoid vs. Linear) yielded larger SNRs, which correlated with higher accuracy. The highest SNRs were obtained when activations fell within the region of maximum curvature of the sigmoid function, underscoring the role of nonlinearities. We are currently exploring to what extent purpose-designed functions with strong nonlinearities facilitate learning in neural networks with noisy data and robustness against adversarial examples.
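The region of maximum curvature of the sigmoid can be located numerically. As a simplifying assumption, the magnitude of the second derivative σ''(x) = σ(x)(1 − σ(x))(1 − 2σ(x)) is used here as a proxy for curvature; its extrema sit near |x| ≈ 1.317.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5.0, 5.0, 200001)
s = sigmoid(x)
d2 = s * (1.0 - s) * (1.0 - 2.0 * s)   # closed form of sigma''(x)
x_star = x[np.argmax(np.abs(d2))]      # location of maximum |sigma''|
print(round(abs(x_star), 3))           # 1.317
```

Activations concentrated around these two symmetric points experience the strongest bending of the nonlinearity, which is one way to read the observation that the highest SNRs coincide with the sigmoid's maximum-curvature region.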
Furthermore, we are testing whether early stopping, aimed at avoiding overfitting during training, can be improved by monitoring the SNR.
H. Op de Beeck, J. Wagemans, and R. Vogels. Inferotemporal neurons represent low-dimensional configurations of parameterized shapes. Nat Neurosci, 4(12):1244-1252, Dec 2001.
P. Gao and S. Ganguli. On simplicity and complexity in the brave new world of large-scale neuroscience. Curr Opin Neurobiol, 32:148-155, Jun 2015.
J. A. Gallego, M. G. Perich, L. E. Miller, and S. A. Solla. Neural Manifolds for the Control of Movement. Neuron, 94(5):978-984, Jun 2017.
A. Ansuini, A. Laio, J. H. Macke, and D. Zoccolan. Intrinsic dimension of data representations in deep neural networks. Advances in Neural Information Processing Systems, 32, 2019.
S. Recanatesi, M. Farrell, M. Advani, T. Moore, G. Lajoie, and E. Shea-Brown. Dimensionality compression and expansion in deep neural networks. arXiv preprint arXiv:1906.00443, 2019.
E. Elmoznino and M. F. Bonner. High-performing neural network models of visual cortex benefit from high latent dimensionality. bioRxiv, pages 2022-07, 2022.
B. Sorscher, S. Ganguli, and H. Sompolinsky. Neural representational geometry underlies few-shot concept learning. Proc Natl Acad Sci U S A, 119(43):e2200800119, Oct 2022.