Understanding and Generalizing Contrastive Learning from the Inverse Optimal Transport Perspective

Liangliang Shi, Gu Zhang, Haoyu Zhen, Jintao Fan, Junchi Yan

Previous research on contrastive learning (CL) has primarily focused on pairwise views, learning representations by attracting positive samples and repelling negative ones. In this work, we aim to understand and generalize CL from a point set matching perspective rather than as a comparison between two points. Specifically, we formulate CL as a form of inverse optimal transport (IOT), a bilevel optimization procedure in which the outer minimization learns the representations and the inner minimization learns the coupling (i.e., the matching probability matrix) between the point sets. By adjusting the degree to which the constraints in the inner minimization are relaxed, we obtain three contrastive losses and show that InfoNCE, the dominant contrastive loss in the literature, is one of them. This reveals a new and more general algorithmic framework for CL. Additionally, the soft matching scheme in IOT induces a uniformity penalty that enhances representation learning and is akin to CL's uniformity. Results on vision benchmarks show the effectiveness of our derived loss family and the new uniformity term.
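
To make the bilevel reading concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes an entropy-regularized inner problem over a cosine-similarity matrix, with a hypothetical temperature `tau` and an outer loss that is the negative log of the coupling's diagonal (the true matches). Under these assumptions, keeping only the row-sum constraints yields a row-wise softmax coupling, and the outer loss then coincides with InfoNCE; keeping both row and column constraints yields a Sinkhorn-style coupling. All function names and the toy data are illustrative.

```python
import numpy as np

def row_softmax_coupling(sim, tau):
    # Inner problem with only row-sum constraints (assumed form):
    # each row of the coupling is a softmax over similarities.
    logits = sim / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def sinkhorn_coupling(sim, tau, n_iters=500):
    # Inner problem with both row- and column-sum constraints (assumed form):
    # alternate row and column rescaling of the Gibbs kernel.
    k = np.exp((sim - sim.max()) / tau)
    v = np.ones(k.shape[1])
    for _ in range(n_iters):
        u = 1.0 / (k @ v)
        v = 1.0 / (k.T @ u)
    return u[:, None] * k * v[None, :]

def matching_loss(p):
    # Outer loss: negative log-probability assigned to the true matches,
    # which are assumed to lie on the diagonal.
    return -np.mean(np.log(np.diag(p)))

def info_nce(sim, tau):
    # Standard InfoNCE with positives on the diagonal.
    logits = sim / tau
    m = logits.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    return np.mean(lse - np.diag(logits))

# Toy data: two noisy "views" of the same 8 points, L2-normalized.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
y = x + 0.1 * rng.normal(size=(8, 16))
x /= np.linalg.norm(x, axis=1, keepdims=True)
y /= np.linalg.norm(y, axis=1, keepdims=True)
sim = x @ y.T
tau = 0.5

p_row = row_softmax_coupling(sim, tau)
p_sink = sinkhorn_coupling(sim, tau)
loss_row = matching_loss(p_row)    # equals InfoNCE under the assumptions above
loss_sink = matching_loss(p_sink)  # set-level variant with both marginals enforced
```

In this sketch the couplings are normalized so that rows (and, for the Sinkhorn variant, columns) sum to one; rescaling to marginals of 1/N would shift the loss only by a constant. In a training loop, the outer minimization would backpropagate `matching_loss` through the encoder that produces `x` and `y`.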