Self-supervised learning through the eyes of a child
Emin Orhan, Vaibhav Gupta, Brenden M. Lake
Within months of birth, children develop meaningful expectations about the world around them. How much of this early knowledge can be explained through generic learning mechanisms applied to sensory data, and how much of it requires more substantive innate inductive biases? Addressing this fundamental question in its full generality is currently infeasible, but we can hope to make real progress in more narrowly defined domains, such as the development of high-level visual categories, thanks to improvements in data collecting technology and recent progress in deep learning. In this paper, our goal is precisely to achieve such progress by utilizing modern self-supervised deep learning methods and a recent longitudinal, egocentric video dataset recorded from the perspective of three young children (Sullivan et al., 2020). Our results demonstrate the emergence of powerful, high-level visual representations from developmentally realistic natural videos using generic self-supervised learning objectives.


