Gaurav Mittal

I am a Principal Researcher at Microsoft Cloud+AI where I work on the research and development of Computer Vision and Machine Learning related products.

I am currently pursuing Multimodal research and development as part of the Responsible AI pillar under Azure Cognitive Services, supporting products like Azure AI Content Safety and Azure OpenAI. Previously, I worked on AutoML as part of the Microsoft Custom Vision Service. I did my Master's in Computer Vision at the Robotics Institute, Carnegie Mellon University, where I worked with Prof. Kris Kitani on model compression and Hypernetworks. I have also worked at Amazon as a Software Development Engineer. I graduated with a B.Tech. in Computer Science and Engineering from Indian Institute of Technology Ropar with the President of India Gold Medal. In 2016, I won the Microsoft Imagine Cup India and the Indian National Academy of Engineering Innovate Student Project Award for my project SpotGarbage.

Email / Google Scholar / Twitter / GitHub / LinkedIn

News

July 2022 : Paper accepted at ECCV 2022 (Oral) (details coming soon).
May 2022 : Outstanding Reviewer for CVPR 2022.
May 2022 : Reviewer for ECCV 2022.
Mar 2022 : GateHUB accepted at CVPR 2022.
Feb 2022 : Patent granted, Patent No.: US 11,238,885 B2.
Dec 2021 : Reviewer for CVPR 2022.

Research

I'm interested in computer vision, multimodal machine learning, automated machine learning (AutoML), semi-/self-supervised learning and meta learning.

	GateHUB: Gated History Unit With Background Suppression for Online Action Detection Junwen Chen, Gaurav Mittal, Ye Yu, Yu Kong, Mei Chen CVPR, 2022 paper GateHUB introduces a novel gated cross-attention mechanism, along with future-augmented history and a background suppression objective, to outperform all existing methods on the online action detection task across all public benchmarks.
	Unsupervised Few-Shot Action Recognition via Action-Appearance Aligned Meta-Adaptation Jay Patravali, Gaurav Mittal, Ye Yu, Fuxin Li, Mei Chen ICCV, 2021 (Oral Presentation, 3% acceptance rate) paper First Unsupervised Meta-learning algorithm for Video Few-Shot action recognition. It comprises a novel Action-Appearance Aligned Meta-adaptation (A3M) module that learns to focus on the action-oriented video features in relation to the appearance features via explicit few-shot episodic meta-learning over unsupervised hard-mined episodes.
	MUSE: Feature Self-Distillation with Mutual Information and Self-Information Yu Gong, Ye Yu, Gaurav Mittal, Greg Mori, Mei Chen BMVC, 2021 paper A novel information-theoretic approach to introduce dependency among features of a deep convolutional neural network (CNN). It jointly improves the expressivity of all features extracted from different layers in a CNN using Additive Information and Multiplicative Information.
	BLT: Balancing Long-Tailed Datasets with Adversarially-Perturbed Images Jedrzej Kozerawski, Victor Fragoso, Nikolaos Karianakis, Gaurav Mittal, Matthew Turk, Mei Chen ACCV, 2020 paper / code / video A novel data augmentation technique that uses gradient ascent to generate extra training samples for tail classes in a long-tail class distribution, improving the generalization performance of an image classifier on real-world datasets exhibiting a long tail. BLT avoids dedicated generative networks for image generation, thereby significantly reducing training time and compute.
	HyperSTAR: Task-Aware Hyperparameters for Deep Networks Gaurav Mittal, Chang Liu, Nikolaos Karianakis, Victor Fragoso, Mei Chen, Yun Fu (* Equal Contribution) CVPR, 2020 (Oral Presentation, 5.7% acceptance rate) paper / video A task-aware method to warm-start Hyperparameter Optimization (HPO) methods by predicting the performance of a hyperparameter configuration via a learned task (dataset) representation.
	Animating Face using Disentangled Audio Representations Gaurav Mittal, Baoyuan Wang WACV, 2020 paper / video / arXiv To make talking head generation robust to emotional and noise variations, we propose an explicit audio representation learning framework that disentangles audio sequences into various factors such as phonetic content, emotional tone, background noise and others. When conditioned on the disentangled content representation, the mouth movement generated by our model is significantly more accurate than that of previous approaches (without disentangled learning) in the presence of noise and emotional variations.
	Interactive Image Generation Using Scene Graphs Gaurav Mittal, Shubham Agrawal, Anuva Agarwal, Sushant Mehta, Tanya Marwah (Equal Contribution) ICLR, 2019 DeepGenStruct Workshop paper / arXiv Proposed a method to generate an image incrementally based on a sequence of scene graphs such that the image content generated in previous steps is preserved and the cumulative image is modified as per the newly provided scene information.
	Attentive Semantic Video Generation Using Captions Tanya Marwah, Gaurav Mittal, Vineeth N Balasubramanian (* Equal Contribution) ICCV, 2017 paper / code / arXiv Proposed a network architecture that learns long-term and short-term context of the video data and uses attention to align the information with accompanying text to perform variable length semantic video generation on unseen caption combinations.
	Sync-DRAW: Automatic Video Generation using Deep Recurrent Attentive Architectures Gaurav Mittal, Tanya Marwah, Vineeth N Balasubramanian (* Equal Contribution) ACM Multimedia, 2017 (Oral Presentation, 7.5% acceptance rate) paper / arXiv Combines a variational autoencoder (VAE) with a recurrent attention mechanism to create a temporally dependent sequence of frames that are gradually formed over time.
	SpotGarbage: Smartphone App to Detect Garbage using Deep Learning Gaurav Mittal, Kaushal B Yagnik, Mohit Garg, Narayanan C Krishnan ACM UbiComp, 2016 video / code / paper / dataset Designed a fully convolutional network to detect and coarsely segment garbage regions in an image. Built a smartphone app, SpotGarbage, deploying the CNN to make on-device detections. Also introduced a new Garbage-In-Images (GINI) dataset.
	Supervised deep segmentation network for brain extraction Apoorva Sikka, Gaurav Mittal, Deepthi R Bathula, Narayanan C Krishnan (Equal Contribution) ICVGIP, 2016 paper Proposed a novel encoder-decoder network for brain extraction from T1-weighted MR images. The model operates on full 3D volumes, simplifying pre- and post-processing operations, to efficiently provide a voxel-wise binary mask delineating the brain region.

Service

Reviewer at ICLR 2022, ECCV 2022, NeurIPS 2022, AAAI 2021
Outstanding Reviewer at ICCV 2021, CVPR 2021, CVPR 2022
Program Committee member of IEEE Workshop on Computer Vision for Microscopy Image Analysis (CVMI) held in conjunction with CVPR 2022, CVPR 2021, CVPR 2020
Program Chair of Workshop on Neural Architecture Search for Computer Vision in the Wild (NASFW) held in conjunction with WACV 2020

Respectfully copied from Jon Barron's website.