The Next AI Blog

Stay informed with the latest updates on synthetic data.


Task2Sim: Towards Effective Pre-Training and Transfer from Synthetic Data

Review: Pre-training neural network (NN) models on large datasets, such as ImageNet, has become a standard procedure in computer vision in recent years. Model pre-training is especially effective when only a small amount of data is available for training. In this case, training highly expressive models, for example large-scale neural networks, may lead to overfitting and...

The Pace of Progress in Academia and Industry

In a recent Unboxing AI podcast episode, Gil Elbaz, Datagen co-founder and CTO, sat down to speak with Lihi Zelnik-Manor, an associate professor in the faculty of electrical engineering at the Technion and the former general manager of Alibaba DAMO Israel Lab. Professor Zelnik-Manor holds a PhD and an MSc with honors in computer science,...

Edge Cases in Autonomous Vehicle Production

“Because [the autonomous vehicle] is a product in the hands of customers, you are forced to go through the long tail. You cannot do just 95% and call it a day. The long tail brings all kinds of interesting challenges,” says Andrej Karpathy, the director of artificial intelligence and Autopilot Vision at Tesla, at the...

Using Synthetic Images To Uncover Biases

In January 2020, Robert Williams was arrested for shoplifting after investigators examined the security footage. They followed the lead of a facial recognition system that flagged Williams’ photo as a match against the grainy footage (Figure 1). The problem? Williams was nowhere near the crime scene when the incident happened. Figure 1. A photo of the...

Real and Synthetic Data for Facial Landmark Detection

In part 1 of this series, we discussed domain gaps and laid the groundwork for our experiment. The experiment tests the hypothesis that “a landmark detection model trained on a combined real and synthetic data set will outperform one trained solely on real data”. To test it, we adopted a truly data-centric approach. Using fixed...

VOS: Learning What You Don’t Know

Motivation: Safely deploying deep learning models in real-life scenarios requires accurate detection of out-of-distribution (OOD) data. Deep neural networks (DNNs) are usually trained under the assumption that the training and real-world data distributions coincide. Real-world tasks, however, often violate this assumption, leading to erroneous and highly confident predictions on OOD data. Simply put, the absence...

We Just Raised $50M in Series B!

I’m excited and proud to announce today that Datagen has closed $50M in Series B financing led by our new investor Andy Vitus from Scale Venture Partners, with participation from our existing investors TLV Partners, Viola Ventures and Spider Capital. Additional investors taking part in the round include financial funds Vintage IP, Viola Growth and...

Q&A with Anthony Goldbloom, Founder and CEO, Kaggle

Gil Elbaz, Datagen’s co-founder and CTO, sat down with Anthony Goldbloom, the co-founder and CEO of Kaggle. Kaggle hosts machine learning competitions, where data scientists download data and upload solutions to difficult problems. Before Kaggle, Anthony worked as an econometrician at the Reserve Bank of Australia, and before that at the Australian Treasury. He holds a...

Privacy Requirements Must Keep Up as Data Builds

In today’s world, upholding the right to privacy is challenging, to say the least. Data protection and privacy legislation exists in 70% of countries around the world. At the same time, the voracious appetite of machine learning algorithms for data has made that data an indispensable part of doing business in the modern world. There...