Gaurav Mittal

I am a Researcher at Microsoft Cloud+AI where I work on the research and development of Computer Vision and Machine Learning related products.

I am currently working in AutoML as part of the Microsoft Custom Vision Service. I did my Master's in Computer Vision at the Robotics Institute, Carnegie Mellon University, where I worked with Prof. Kris Kitani on model compression and Hypernetworks. I have worked at Amazon as a Software Development Engineer. I graduated with a B.Tech. in Computer Science and Engineering from the Indian Institute of Technology Ropar with the President of India Gold Medal. I won the Microsoft Imagine Cup India and the Indian National Academy of Engineering Innovate Student Project Award in 2016 for my project SpotGarbage.

Email  /  CV  /  Google Scholar  /  Twitter  /  Github

Research

I'm interested in computer vision, automated machine learning (AutoML), self-supervised learning, and meta-learning.

BLT: Balancing Long-Tailed Datasets with Adversarially-Perturbed Images
Jedrzej Kozerawski, Victor Fragoso, Nikolaos Karianakis, Gaurav Mittal, Matthew Turk, Mei Chen
ACCV, 2020
paper / code / video

A novel data augmentation technique that uses gradient ascent to generate extra training samples for the tail classes of a long-tailed class distribution, improving the generalization performance of an image classifier on real-world long-tailed datasets. BLT avoids dedicated generative networks for image generation, thereby significantly reducing training time and compute.

HyperSTAR: Task-Aware Hyperparameters for Deep Networks
Gaurav Mittal*, Chang Liu*, Nikolaos Karianakis, Victor Fragoso, Mei Chen, Yun Fu (* Equal Contribution)
CVPR, 2020 (Oral Presentation, 5.7% acceptance rate)
paper / video

A task-aware method to warm-start Hyperparameter Optimization (HPO) methods by predicting the performance of a hyperparameter configuration via a learned task (dataset) representation.

Animating Face using Disentangled Audio Representations
Gaurav Mittal, Baoyuan Wang
WACV, 2020
paper / video / arXiv

To make talking-head generation robust to emotional and noise variations in speech, we propose an explicit audio representation learning framework that disentangles audio sequences into factors such as phonetic content, emotional tone, background noise and others. When conditioned on the disentangled content representation, the mouth movement generated by our model is significantly more accurate than that of previous approaches (without disentangled learning) in the presence of noise and emotional variations.

Interactive Image Generation Using Scene Graphs
Gaurav Mittal*, Shubham Agrawal*, Anuva Agarwal*, Sushant Mehta*, Tanya Marwah* (* Equal Contribution)
ICLR, 2019 DeepGenStruct Workshop
paper / arXiv

Proposed a method to generate an image incrementally based on a sequence of scene graphs such that the image content generated in previous steps is preserved and the cumulative image is modified as per the newly provided scene information.

Attentive Semantic Video Generation Using Captions
Tanya Marwah*, Gaurav Mittal*, Vineeth N Balasubramanian (* Equal Contribution)
ICCV, 2017
paper / code / arXiv

Proposed a network architecture that learns the long-term and short-term context of video data and uses attention to align this information with the accompanying text, enabling variable-length semantic video generation on unseen caption combinations.

Sync-DRAW: Automatic Video Generation using Deep Recurrent Attentive Architectures
Gaurav Mittal*, Tanya Marwah*, Vineeth N Balasubramanian (* Equal Contribution)
ACM Multimedia, 2017 (Oral Presentation, 7.5% acceptance rate)
paper / arXiv

Combines a variational autoencoder (VAE) with a recurrent attention mechanism to create a temporally dependent sequence of frames that are gradually formed over time.

SpotGarbage: Smartphone App to Detect Garbage using Deep Learning
Gaurav Mittal, Kaushal B Yagnik, Mohit Garg, Narayanan C Krishnan
ACM UbiComp, 2016
video / code / paper / dataset

Designed a fully convolutional network to detect and coarsely segment garbage regions in an image. Built a smartphone app, SpotGarbage, that deploys the CNN to make on-device detections. Also introduced a new Garbage-In-Images (GINI) dataset.

Supervised deep segmentation network for brain extraction
Apoorva Sikka*, Gaurav Mittal*, Deepthi R Bathula, Narayanan C Krishnan (* Equal Contribution)
ICVGIP, 2016
paper

Proposed a novel encoder-decoder network for brain extraction from T1-weighted MR images. The model operates on full 3D volumes, simplifying pre- and post-processing operations, to efficiently provide a voxel-wise binary mask delineating the brain region.

Service
Workshop on Neural Architecture Search for Computer Vision in the Wild (NASFW)
WACV 2020

Respectfully copied from Jon Barron's website.