Sampling from a k-DPP without looking at all items

Daniele Calandriello,Michal Derezinski,Michal Valko

Determinantal point processes (DPPs) are a useful probabilistic model for selectinga small diverse subset out of a large collection of items, with applications insummarization, recommendation, stochastic optimization, experimental design andmore. Given a kernel function and a subset size k, our goal is to sample k outof n items with probability proportional to the determinant of the kernel matrixinduced by the subset (a.k.a. k-DPP). Existing k-DPP sampling algorithms requirean expensive preprocessing step which involves multiple passes over all n items,making it infeasible for large datasets. A nau00efve heuristic addressing this problem isto uniformly subsample a fraction of the data and perform k-DPP sampling only onthose items, however this method offers no guarantee that the produced sample willeven approximately resemble the target distribution over the original dataset. In thispaper, we develop alpha-DPP, an algorithm which adaptively builds a sufficiently large uniformsample of data that is then used to efficiently generate a smaller set of k items,while ensuring that this set is drawn exactly from the target distribution definedon all n items. We show empirically that ouralgorithm produces a k-DPP sample after observing only a small fraction of allelements, leading to several orders of magnitude faster performance compared tothe state-of-the-art. Our implementation of alpha-DPP is provided at https://github.com/guilgautier/DPPy/.