AdaTune: Adaptive Tensor Program Compilation Made Efficient

Menghao Li, Minjia Zhang, Chi Wang, Mingqin Li

In this paper, we present AdaTune, a new method that significantly reduces the optimization time of tensor programs for high-performance deep learning (DL) inference. In particular, we propose an adaptive evaluation method that uses statistical criteria to terminate costly hardware measurements early without losing much accuracy. We further devise a surrogate model with uncertainty quantification that lets the optimization better adapt to hardware and model heterogeneity. Finally, we introduce a contextual optimizer that adaptively controls the balance between exploration and exploitation, improving the effectiveness of the search over the transformation space. We evaluate and compare the levels of optimization obtained by a state-of-the-art DL compiler and by AdaTune. Experimental results show that AdaTune obtains up to 115% higher GFLOPS than the baseline under the same optimization time budget. Furthermore, AdaTune reaches the same optimization quality 1.3x to 3.9x faster than the state of the art for a range of models across different hardware architectures.
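A small sketch can make the idea of early-terminating a hardware measurement more concrete. It rests on an assumption made only for illustration: a measurement is treated as "statistically stable" once the coefficient of variation (standard deviation divided by mean) of the timing samples collected so far falls below a threshold. The function `adaptive_measure` and its parameters `min_repeats`, `max_repeats`, and `cv_threshold` are hypothetical, and AdaTune's actual stopping rule may differ.

```python
import math
import statistics
from typing import Callable, List


def adaptive_measure(run_once: Callable[[], float],
                     min_repeats: int = 3,
                     max_repeats: int = 10,
                     cv_threshold: float = 0.05) -> List[float]:
    """Repeat a timing measurement, stopping early once the samples look stable.

    Hypothetical stopping rule (an assumption, not necessarily AdaTune's):
    stop when stdev / mean of the samples so far drops below cv_threshold.
    """
    samples: List[float] = []
    for _ in range(max_repeats):
        samples.append(run_once())  # one costly run on the target hardware
        if len(samples) >= min_repeats:
            mean = statistics.mean(samples)
            # Coefficient of variation; treat a non-positive mean as unstable.
            cv = statistics.stdev(samples) / mean if mean > 0 else math.inf
            if cv < cv_threshold:
                break  # samples are consistent enough; skip remaining runs
    return samples


# A perfectly stable kernel stops after min_repeats runs,
# while a noisy one uses the full max_repeats budget.
stable = adaptive_measure(lambda: 2.0)
noise = iter([1.0, 2.0] * 5)
noisy = adaptive_measure(lambda: next(noise))
print(len(stable), len(noisy))
```

The saving comes from spending few repetitions on candidates whose timings are already consistent, while keeping the full budget for noisy ones whose mean would otherwise be unreliable.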