Center for Signal and Information Processing Seminar
Friday, December 3, 2021
3:00pm - 4:00pm
BlueJeans link: https://bluejeans.com/4658604304
Speaker: Qi Lei
Speaker's Title: Associate Research Scholar
Speaker's Affiliation: Princeton University, Department of Electrical and Computer Engineering
Seminar Title: Provable Representation Learning: The Importance of Task Diversity and Pretext Tasks
Abstract of Talk: Modern machine learning models are transforming applications in various domains at the expense of a large amount of hand-labeled data. In contrast, humans and animals first establish their concepts or impressions from data observations. The learned concepts then help them to learn specific tasks with minimal external instructions. Accordingly, we argue that deep representation learning seeks a similar procedure: 1) to learn a data representation that filters out irrelevant information from the data; 2) to transfer the data representation to downstream tasks with few labeled samples and simple models. In this talk, we study two forms of representation learning: supervised pre-training from multiple tasks and self-supervised learning.
Supervised pre-training uses a large labeled source dataset to learn a representation, then trains a simple (linear) classifier on top of the representation. We prove that supervised pre-training can pool the data from all source tasks to learn a good representation that transfers to downstream tasks (possibly with covariate shift) with few labeled examples. We extensively study different settings where the representation reduces the model capacity in various ways.
Self-supervised learning creates auxiliary pretext tasks that do not require labeled data to learn representations. These pretext tasks are created solely using input features, such as predicting a missing image patch, recovering the color channels of an image, or predicting missing words. Surprisingly, predicting this known information helps in learning a representation useful for downstream tasks. We prove that under an approximate conditional independence assumption, self-supervised learning provably learns representations that linearly separate downstream targets.
For both frameworks, representation learning provably and drastically reduces sample complexity for downstream tasks.
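As a rough illustration of the supervised pre-training idea in the abstract (not the speaker's algorithm or analysis), the toy sketch below assumes a purely linear setting: every task is a noisy linear regression whose weight vector lies in a shared k-dimensional subspace. All names, dimensions, and the noise level are assumptions chosen for illustration. The shared subspace is estimated by stacking per-task least-squares solutions and taking their top-k singular vectors, which is a simple stand-in for joint training over all source tasks. A downstream task with only 10 labeled samples is then fit once on the learned features and once on the raw inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not taken from the talk).
d, k = 50, 3             # input dimension, representation dimension
T, n_source = 40, 200    # number of source tasks, samples per source task
n_target = 10            # labeled samples available for the downstream task
noise = 0.1              # standard deviation of the label noise

# Ground-truth shared representation: a d x k matrix with orthonormal columns.
B, _ = np.linalg.qr(rng.standard_normal((d, k)))


def sample_task(n, w):
    """Draw n samples from the linear task y = x^T B w + noise."""
    X = rng.standard_normal((n, d))
    y = X @ B @ w + noise * rng.standard_normal(n)
    return X, y


# Step 1: fit ordinary least squares on each source task separately.
thetas = []
for _ in range(T):
    X, y = sample_task(n_source, rng.standard_normal(k))
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    thetas.append(theta)
Theta = np.column_stack(thetas)  # shape (d, T)

# Step 2: pool all source tasks. The top-k left singular vectors of the
# stacked estimates approximate the span of the shared representation B.
U, _, _ = np.linalg.svd(Theta, full_matrices=False)
B_hat = U[:, :k]

# Step 3: downstream task with few labels.
w_new = rng.standard_normal(k)
X_tr, y_tr = sample_task(n_target, w_new)
X_te, y_te = sample_task(2000, w_new)

# Linear head on the learned k-dimensional features.
head, *_ = np.linalg.lstsq(X_tr @ B_hat, y_tr, rcond=None)
# Baseline on raw d-dimensional inputs; with n_target < d this is the
# minimum-norm least-squares solution.
raw, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)

err_rep = np.mean((X_te @ B_hat @ head - y_te) ** 2)
err_raw = np.mean((X_te @ raw - y_te) ** 2)
print(f"test MSE with learned representation: {err_rep:.4f}")
print(f"test MSE on raw features:             {err_raw:.4f}")
```

In this sketch the downstream learner only has to estimate k = 3 coefficients instead of d = 50, which is the kind of sample-complexity reduction the abstract refers to. The nonlinear and covariate-shift settings mentioned in the abstract are not covered here.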
Speaker biosketch: Qi Lei is an associate research scholar in the Princeton ECE department. She received her Ph.D. from the Oden Institute for Computational Engineering & Sciences at UT Austin. She visited the Institute for Advanced Study (IAS)/Princeton for the Theoretical Machine Learning Program. Before that, she was a research fellow at the Simons Institute for the Foundations of Deep Learning Program. Her research aims to develop sample- and computation-efficient machine learning algorithms and to bridge the theoretical and empirical gap in machine learning. Qi has received several awards, including the Outstanding Dissertation Award, the National Initiative for Modeling and Simulation Graduate Research Fellowship, the Computing Innovation Fellowship, and the Simons-Berkeley Research Fellowship.
Last revised November 18, 2021