Deeper neural networks consistently outperform their shallower counterparts across a wide range of architectures and datasets. However, the theoretical understanding of this phenomenon remains superficial. While explanations such as representational power, input space folding, optimization conditioning, the neural tangent kernel, and hierarchical feature learning have been proposed, they all have glaring deficiencies. We develop a new narrative that postulates the following. From the viewpoint of any given layer, the other layers act as a lens that applies importance weights to the input, facilitating the emergence of features without global scope that would otherwise be overlooked. Gradient methods simply update the parameters of a layer to best fit the data as importance-weighted by this lens made of the other layers. Individual neuron parameters serve the dual role of ‘separator for data lensed by other layers’ and ‘component of lens for other layers.’ This enables a positive feedback loop that forms the core mechanism of feature learning under gradient descent in deep networks. We support our narrative with theoretical analysis and empirical results on the recently proposed Deep Linearly Gated Network (DLGN), an architecture combining elements of deep linear and ReLU networks. Our perspective further provides new insights into the effects of architectural modifications such as pruning, skip-connections, and momentum-based optimization.
Prof. Harish is currently an assistant professor in the Data Science and Artificial Intelligence (DSAI) department of IIT Madras. His primary areas of interest include machine learning, statistical learning theory, and optimization. He was previously a research scientist at IBM Research Labs and a postdoctoral researcher at the University of Michigan. He completed his PhD in the Computer Science and Automation (CSA) department of the Indian Institute of Science (IISc), Bangalore, under the guidance of Prof. Shivani Agarwal. Earlier, he pursued his Master's degree under the supervision of Prof. Chiranjib Bhattacharyya.