Certified Data Removal from Machine Learning Models
Chuan Guo, Tom Goldstein, Awni Hannun, Laurens Van Der Maaten
Good data stewardship requires removal of data at the request of the data’s owner. This raises the question of whether and how a trained machine-learning model, which implicitly stores information about its training data, should be affected by such a removal request. Is it possible to “remove” data from a machine-learning model? We study this problem by defining certified removal: a very strong theoretical guarantee that a model from which data is removed cannot be distinguished from a model that never observed the data to begin with. We develop a certified-removal mechanism for linear classifiers and empirically study learning settings in which this mechanism is practical.