The Metaverse and AI Edge Cases Will Drive Synthetic Data Boom: Top Predictions for 2022 by Synthetic Data Innovator Datagen
Synthetic data is in for a banner year, as businesses look to leverage AI for a growing number of increasingly sophisticated applications, including tackling the world’s supply-chain disruptions, reinventing automotive safety, and creating a whole new class of intelligent consumer goods with the metaverse at the fore.
Tel Aviv, Israel — November 30, 2021 — Datagen, the pioneer of domain-specific synthetic data for humans and object perception, today released its new year’s predictions for the fields of Artificial Intelligence, Machine Learning, and Computer Vision. As AI makes its way into ubiquitous adoption by a growing number of industries and applications, the demand for robust training data will expand accordingly. However, with manual data collection already at the limits of its own utility, the race for AI supremacy will only serve to widen the existing gulf between supply and demand. At the same time, companies like Datagen are making it easier and more affordable to generate high-quality synthetic datasets to train computer vision (CV) AI models. The ability to generate tens of thousands of synthetic images — customized to suit the unique parameters of each distinct application — makes synthetic data the obvious solution to the limitations of traditional, manually-collected data.
“We’re approaching a major inflection point for the synthetic data field,” said Ofir Chakon, co-founder and CEO of Datagen. “This year, AI underwent a major paradigm shift, in which traditional, model-centric approaches to AI development were reconsidered in favor of data-centrism, which means data scientists are now placing more significance on the quality of their training data as a determinant of performance, rather than the quality of their model. This shift in the zeitgeist — combined with the ability to rapidly iterate one’s dataset in a targeted, fine-tuned way — will make 2022 the year in which synthetic data becomes the most widely used training and testing solution in AI.”
After a year of building great momentum to power the next big leap in computer vision systems, including key appointments to its executive leadership and advisory board, Datagen’s executive team has predicted that the following trends will take center stage in 2022, helping organizations accelerate their AI adoption and prepare for what comes next:
The Synthetic Data Revolution Will Create a New ‘Synthetic Data Engineer’ Vocation to Become One of the Most In-Demand Jobs
In 2022, a new position will surface: the ‘synthetic data engineer’, a data scientist who handles the creation, processing, and analysis of large synthetic datasets in an effort to support the automation of prescriptive decision-making through visuals. This new vocation, a natural evolution of the computer vision engineer, is already emerging in larger companies, where synthetic data teams have sprouted. The synthetic data engineer will become one of the most sought-after professionals in the AI market as enterprises and startups alike need the skills to support their simulated data initiatives. Expect to see such job postings soar and more training courses become available to fill the 22% rise in computer and information research scientist jobs projected over the next 10 years (US Bureau of Labor Statistics), of which CV (and synthetic data) engineers are a subset. In addition, we will see other data-related professionals reposition themselves as synthetic data engineers to take advantage of expanding opportunities.
Data-Centric AI Development Will Fuel Widespread Adoption of Synthetic Data
After nearly a decade of being dominated by model-centric approaches to development, the field of AI is experiencing a paradigm shift away from modeling and toward a data-centric approach to AI development. In short, rather than focusing on making incremental improvements to their AI algorithm or model, researchers have found that they can optimize AI performance much more effectively by improving the quality of their training data. Over the course of 2021, data-centrism has been rapidly gaining acceptance throughout AI’s R&D and enterprise communities. This trend will undoubtedly continue well into 2022, and the increased focus on data quality will act as yet another catalyst for the adoption of synthetic data.
Technology Needed to Make the Metaverse a Reality Will Experience a Major Expansion
Facebook’s recent announcement about its foray into the metaverse is driving the current metaverse mania. Recent metaverse developments include Microsoft’s announcement of its own metaverse, plus a key metaverse patent filing from Apple. Meanwhile, another early metaverse entrant, NVIDIA, has seen a 12% increase in its stock price since the Facebook announcement.
These recent metaverse announcements are merely the opening salvos in what will surely be a heated competition to define the future of human interaction with the environment and how we manage social connections with remote people. In the frenzy to develop the first practical, real-world applications, vendors will need to invest heavily in tools and technologies that can help them get to market first and gain first-mover advantage. These include a variety of hardware, software and data solutions. Look for a bump in these investments over the next 12-18 months.
Edge Cases Will Continue to Boost Industry Demand for Synthetic Data
Edge cases are improbable situations that a given AI may still conceivably encounter over the course of its operational lifetime. Although such situations are improbable, engineers need to take them into consideration when developing and training their AI applications, especially when applications carry significant risks, such as autonomous vehicles. However, the very same risks that make edge case training so important in these applications also make it exceedingly difficult, if not impossible, to gather the data said training requires. Faced with this conundrum, more and more businesses will turn to synthetic data for their training needs. More and more car manufacturers will use synthetic data to train and develop their in-cabin driver monitoring systems (DMS). These AI-enabled systems use computer vision to monitor drivers and issue alerts whenever drivers show signs of distraction or fatigue. We will surely see many other carmakers follow suit over the coming years, as new EU regulations mandating DMS technologies go into effect and American manufacturers inevitably do the same to keep up with the competition. This, along with work on driverless technologies, will vastly expand and deepen the industry’s investment in the human-centered synthetic data needed to train those systems.
The Supply Chain Crisis Will Worsen but Digital Twins Will Save the Day
Federal Reserve chair Jerome Powell and other experts predict that the global supply chain crisis will only get worse in 2022 before it gets better. In fact, a recent Wall Street Journal poll of leading economists finds almost half of the respondents cite supply chain bottlenecks as the biggest threat to growth in the next 12 to 18 months. Unpredictable weather patterns and labor shortages will intensify the disruptions caused by the global pandemic. As a result, private businesses and government agencies will turn to solutions that could help alleviate the pressures. One such solution will be digital twins: machine-learning-driven simulations of real-world objects that predict disruptions and provide recommendations on how to avoid them. Organizations whose operations are heavily supply chain dependent should consider investing in digital twin technology to stay competitive.
Across all these predictions, the common thread is clear: the world’s need for good data is growing, and manual data collection and annotation won’t be able to satisfy the impending explosion of demand. Synthetic data, on the other hand, offers a fast, customizable, and cost-effective alternative that, in many cases, performs even better than its real-world counterpart. The world’s increasing demand for data also coincides with an increased demand for data professionals, both data scientists and computer vision engineers, and a shortage of those professionals may well prove to be the true bottleneck impeding AI’s rise to universal adoption.
Datagen is powering the AI revolution by providing high-performance synthetic data, with a focus on data for human-centric computer vision applications. We developed the first self-serve synthetic data platform that generates visual data which is both photorealistic and high-variance. Our platform allows CV engineers to create high-fidelity synthetic data in a seamless and scalable manner. Fortune 500 companies rely on Datagen to enable their technological innovation in the worlds of AR/VR/Metaverse, In-cabin Vehicle Safety, Robotics, IoT Security and more. Founded in 2018, Datagen is led and backed by world-renowned AI experts.
Scratch Marketing + Media for Datagen