Neural Kernels Without Tangents
Vaishaal Shankar, Alex Fang, Wenshuo Guo, Sara Fridovich-Keil, Jonathan Ragan-Kelley, Ludwig Schmidt, Benjamin Recht
We investigate the connections between neural networks and simple building blocks in kernel space. In particular, using well-established feature space tools such as direct sum, averaging, and moment lifting, we present an algebra for creating “compositional” kernels from bags of features. We show that these operations correspond to many of the building blocks of “neural tangent kernels (NTK)”. Experimentally, we show that test error is correlated between neural network architectures and their associated kernels. We construct a simple neural network architecture, using only 3x3 convolutions, 2x2 average pooling, and ReLU, that is trained with SGD and MSE loss and achieves 96% accuracy on CIFAR-10; its corresponding compositional kernel achieves 90% accuracy. We also use our constructions to investigate the relative performance of neural networks, NTKs, and compositional kernels in the small dataset regime. In particular, we find that compositional kernels outperform NTKs, and neural networks outperform both kernel methods.
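A minimal sketch of how such kernel-space operations might look on Gram matrices, assuming a normalized base kernel. The helper names (`direct_sum`, `bag_average`, `gaussian_lift`) and the exponential form of the moment lifting are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

# Hypothetical helpers (names are ours) acting on Gram matrices; a sketch of
# the kind of kernel algebra described above, not the paper's exact recipe.

def direct_sum(K1, K2):
    # Concatenating two feature maps adds their kernels.
    return K1 + K2

def bag_average(K, bag_size):
    # Averaging features over consecutive bags of `bag_size` inputs turns
    # the Gram matrix into its block-wise mean.
    n = K.shape[0] // bag_size
    return K.reshape(n, bag_size, n, bag_size).mean(axis=(1, 3))

def gaussian_lift(K):
    # One plausible moment lifting: an elementwise exponential, which is a
    # valid kernel when inputs are normalized so that diag(K) == 1.
    return np.exp(K - 1.0)

# Toy composition on a random linear base kernel.
X = np.random.randn(8, 5)
K = X @ X.T
K /= np.sqrt(np.outer(np.diag(K), np.diag(K)))  # normalize: diag(K) == 1
K_lifted = gaussian_lift(direct_sum(K, K) / 2)
print(K_lifted.shape)          # (8, 8)
print(bag_average(K, 2).shape) # (4, 4)
```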
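And a minimal PyTorch sketch of a network in the family the abstract describes, built only from the stated pieces (3x3 convolutions, 2x2 average pooling, ReLU) and trained with SGD on MSE loss. The width, depth, and optimizer settings here are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

# A sketch network using only 3x3 convolutions, 2x2 average pooling, and
# ReLU. Width, depth, and learning rate are illustrative assumptions.
def make_net(width=64, num_classes=10):
    return nn.Sequential(
        nn.Conv2d(3, width, 3, padding=1), nn.ReLU(),
        nn.AvgPool2d(2),                              # 32x32 -> 16x16
        nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
        nn.AvgPool2d(2),                              # 16x16 -> 8x8
        nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
        nn.AvgPool2d(2),                              # 8x8 -> 4x4
        nn.Conv2d(width, num_classes, 3, padding=1),  # per-class feature maps
        nn.AvgPool2d(2), nn.AvgPool2d(2),             # 4x4 -> 1x1 logits
        nn.Flatten(),
    )

# One SGD step on the MSE loss against one-hot targets, as the abstract states.
net = make_net()
opt = torch.optim.SGD(net.parameters(), lr=0.1, momentum=0.9)
x = torch.randn(16, 3, 32, 32)  # CIFAR-10-shaped batch
y = nn.functional.one_hot(torch.randint(0, 10, (16,)), num_classes=10).float()
opt.zero_grad()
loss = nn.functional.mse_loss(net(x), y)
loss.backward()
opt.step()
```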