Generative Models as a Data Source for Multiview Representation Learning

Ali Jahanian Xavier Puig Yonglong Tian Phillip Isola
MIT Computer Science and Artificial Intelligence Laboratory

[Paper]      [Code]

Generative models are now capable of producing highly realistic images that look nearly indistinguishable from the data on which they are trained. This raises the question: if we have good enough generative models, do we still need datasets? We investigate this question in the setting of learning general-purpose visual representations from a black-box generative model rather than directly from data. Given an off-the-shelf image generator without any access to its training data, we train representations from the samples output by this generator. We compare several representation learning methods that can be applied to this setting, using the latent space of the generator to generate multiple "views" of the same semantic content. We show that for contrastive methods, this multiview data can naturally be used to identify positive pairs (nearby in latent space) and negative pairs (far apart in latent space). We find that the resulting representations rival those learned directly from real data, but that good performance requires care in the sampling strategy applied and the training method. Generative models can be viewed as a compressed and organized copy of a dataset, and we envision a future where more and more "model zoos" proliferate while datasets become increasingly unwieldy, missing, or private. This paper suggests several techniques for dealing with visual representation learning in such a future.
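The contrastive setup described above can be sketched in a few lines: sample anchor latents, perturb them slightly to obtain "positive" views (nearby in latent space), treat all other samples in the batch as negatives, and score the pairs with an InfoNCE loss. The snippet below is a minimal, self-contained illustration, not the paper's implementation: the generator and encoder are stand-in random linear maps, and the names (`G`, `W`, `sigma`) are placeholders for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a real pipeline would use an off-the-shelf image
# generator (e.g. a pretrained GAN) and a deep encoder; here both are random
# linear maps so the sketch runs on its own.
latent_dim, image_dim, embed_dim = 16, 64, 8
G = rng.normal(size=(latent_dim, image_dim))   # "generator": latent z -> image
W = rng.normal(size=(image_dim, embed_dim))    # "encoder": image -> embedding

def embed(z):
    """Generate an image from z, encode it, and L2-normalize the embedding."""
    v = (z @ G) @ W
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: matched pairs sit on the diagonal of the similarity
    matrix; every off-diagonal pair acts as a negative."""
    logits = anchors @ positives.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

n, sigma = 32, 0.1                              # sigma: radius of the latent "view"
z = rng.normal(size=(n, latent_dim))            # anchor latents
z_pos = z + sigma * rng.normal(size=z.shape)    # nearby latents = positive views
loss = info_nce(embed(z), embed(z_pos))
print(f"InfoNCE loss: {loss:.3f}")
```

In a real system, the latent perturbation would be tuned or learned (the paper's point that "good performance requires care in the sampling strategy"), and pixel-space augmentations can be composed on top of the generated views.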


Conceptual Figures

Different ways of creating views for contrastive learning:

Different ways of learning representations:

Different ways of creating latent transformations combined with pixel transformations:

Reference

@article{jahanian2021generative,
  title={Generative Models as a Data Source for Multiview Representation Learning},
  author={Jahanian, Ali and Puig, Xavier and Tian, Yonglong and Isola, Phillip},
  journal={arXiv preprint arXiv:2106.05258},
  year={2021}
}

Acknowledgements
Author A.J. thanks Kamal Youcef-Toumi, Boris Katz, and Antonio Torralba for their support. We thank Antonio Torralba and Tongzhou Wang for helpful discussions.

This research was supported in part by IBM through the MIT-IBM Watson AI Lab. The research was also partly sponsored by the United States Air Force Research Laboratory and the United States Air Force Artificial Intelligence Accelerator and was accomplished under Cooperative Agreement Number FA8750-19-2-1000. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the United States Air Force or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.
