Datagen at CVPR

Our team recently returned from CVPR 2022, and it was an incredible experience. We were blown away by the people, the technology, and the papers, and we even made time to explore New Orleans, eat a few beignets, and listen to some jazz.

The People

It takes a village to run a booth and experience everything that CVPR has to offer at the same time. Our Datagen team included our CTO, algorithm engineers, product and marketing staff, and our VP R&D, to name a few. We met people from all over the globe, including Saudi Arabia, Mexico, South Korea, and Japan, and enjoyed spending time with each of them. It was exciting to learn how they see and use synthetic data, what challenges they face, and what they need from synthetic data for their everyday work.

We had over 500 conversations with both academics and engineers working in industry. We discussed with our visitors where synthetic data is needed: humans in context, face recognition (including face liveness), generative algorithms that add attributes to faces, and verticals such as in-cabin automotive, home security, metaverse AR/VR, smart office, and fitness. We also spoke about the challenges of using synthetic data in academia, such as the difficulty of rendering at scale, the lack of 3D artists, and how much effort it takes to create the many identities needed for research.

The Technology

Synthetic data was definitely trending at CVPR, with topics ranging from how best to use it to applying it for data-centric AI. Everyone was interested in and curious about what we do. But there were also many other cool ideas and papers presented. NeRF seemed to dominate and is making progress towards closing the gap in a variety of ways. There were over 50 papers on NeRF alone! Some of the most exciting advances are fewer images and faster training, accurate reflections, light control, material control, HDR, and large-scale (whole-block) scenes. For more, see our blog post on 6 papers to watch at CVPR 2022.

The Research

Our CTO and co-founder, Gil Elbaz, presented at the 7th Workshop on Benchmarking Multi-Target Tracking: How far can synthetic data take us. He spoke about synthetic data and our recent benchmark on leveraging synthetic data for hands-on-wheel detection, and how it can be used to train driver monitoring systems. Gil demonstrated the use of synthetic photorealistic in-cabin data, created on the Datagen platform, to train a driver monitoring system (DMS) that detects whether the driver’s hands are on the wheel. The experiment used synthetic data to train a lightweight neural network to detect when the driver removes their hands from the wheel. He reported results similar to those achieved by training on real data. This showcases the ability of human-centric synthetic data to generalize well to the real world and to help train algorithms in computer vision settings where data from the target domain is scarce or hard to collect.

Jonathan Laserson, Datagen’s Head of AI Research, presented at Machine Learning with Synthetic Data on Applying StyleGAN On Top of Synthetically Generated Data. Neural generators like StyleGAN can generate photorealistic images in many domains after learning their distribution “bottom-up” from large image datasets. Even though it’s possible to manipulate the generated images in various ways, controlling the generated content is a hard task, as it requires reverse-engineering the latent space of the StyleGAN.

To bridge this gap in diversity and photorealism, Jonathan proposed generating an initial version of the desired image using the top-down synthetic pipeline, and then inverting this image into the latent space of a StyleGAN trained on real images. He showed that the inversion maintains the same person identity, but adds photorealism and provides access to new modes of diversity. This makes it possible to generate synthetic, photorealistic image datasets that can be used to train computer vision models, such as those for face recognition, while retaining full control over the distribution of the data.

The Fun

New Orleans is definitely a fun city! There was plenty to do and see in the city, at the Expo, and at the conference itself. Here are some of the highlights:

Autonomous Vehicles

Project Aria Glasses from Meta

Synthetic Mice

New Orleans

Karine Regev is Datagen’s VP of Marketing. She has over 17 years of experience in marketing, especially in scaling security and AI tech startups. Karine has a track record of growing brand and market share, and specializes in driving lead generation and developing marketing teams for B2B growth. She works to bring Datagen’s innovation to the global market and to share the possibilities of synthetic data.