Talk on Recent Advances in Vision-Language Pre-training

Event Date/Time
May 3, 2021, 4:00pm
Event Location
Zoom
Dr. Zhe Gan will give an invited talk on Recent Advances in Vision-Language Pre-training next Monday (5/3) at 4pm PT. Everyone is welcome to join the talk: https://ucsc.zoom.us/j/91616813658?.
Abstract: With the advent of models such as OpenAI CLIP and DALL-E, transformer-based vision-language pre-training (VLP) has become an increasingly active research topic. In this talk, I will first briefly review UNITER, one of the best-performing VLP models, and then share some of the most recent work from our team that extends UNITER for better generalization, robustness, and efficiency. On generalization, I will present our NeurIPS 2020 Spotlight paper VILLA, which uses adversarial training to improve performance on standard vision-language tasks. On robustness, I will present our recent MANGO work, which carefully and systematically examines the performance of VLP models on a suite of 9 robustness-oriented VQA benchmarks. On efficiency, I will present our recent work that uses the lottery ticket hypothesis to investigate the parameter redundancy of VLP models. Finally, I will briefly discuss challenges and future directions for vision-language pre-training.
Bio: Dr. Zhe Gan is a Senior Researcher at Microsoft. He received his PhD from Duke University in 2018. Before that, he received his Master's and Bachelor's degrees from Peking University in 2013 and 2010, respectively. His current research interests include vision-language representation learning, self-supervised pre-training, and adversarial machine learning. He regularly serves as an Area Chair for multiple top-tier AI conferences, including NeurIPS, ICLR, ICML, ACL, and AAAI.