Introducing Our 50K Synthetic Identities!

Ofir Zuk (Chakon)

5/12/2022

We are pleased to announce that 50K new face identities have been integrated into the Datagen product and are accessible through our SDK and API!

Training and testing with a large number of unique identities can be a key to good results. That’s why we’ve focused on continuously broadening and expanding the diversity and scale of the identities we offer so that you can get the most powerful, effective human-oriented synthetic data.

Want to know more? This blog will explain:

How these identities were generated
The features of the identities
The roadmap for the identities

Each identity is checked for uniqueness with a face recognition model, so every identity in the set is verified to be distinct from the others. Identities in the platform are also now defined more richly: in addition to mesh and texture, they include hair, eyes, and other descriptive features.

Generated Identities

We define an identity as a unique combination of a 3D mesh, a 4k texture, and a normal map, as well as hair, eyebrows, and iris.
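To make the definition concrete, here is a minimal sketch of how such an identity record could be modeled. This is an illustration only, not the Datagen SDK: the class name, field names, and file paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    """Hypothetical record for one identity; not the Datagen SDK's type."""
    identity_id: str
    mesh_path: str        # 3D mesh
    texture_path: str     # 4K texture
    normal_map_path: str  # normal map
    hair: str
    eyebrows: str
    iris: str

    def geometry_assets(self):
        """Return the three assets that describe the face surface."""
        return (self.mesh_path, self.texture_path, self.normal_map_path)


# Made-up example values, for illustration only.
example = Identity(
    identity_id="id-00001",
    mesh_path="id-00001/mesh.obj",
    texture_path="id-00001/texture_4k.png",
    normal_map_path="id-00001/normal.png",
    hair="short_brown",
    eyebrows="thick_dark",
    iris="green",
)
print(example.geometry_assets())
```

The record is frozen here because an identity is meant to be one fixed combination of these components; changing any of them would describe a different identity.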

Samples of our 50K identities generated by the Datagen platform.

The identities were produced by an algorithmic pipeline built on a set of approximately 1K diverse, high-quality 3D models that were retopologized from scans of real people (the base identities). All generated identities are distinct from these base identities, eliminating identifiability and privacy concerns.

Each generated identity is assigned an age, ethnicity, and gender using CLIP, a vision-language model from OpenAI, followed by post-processing and validation with our internal models.
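The post does not spell out how CLIP is applied, so the following is only a sketch of one common approach, zero-shot classification: compare an image embedding with the embeddings of a few text prompts and pick the closest prompt. The toy 2-D vectors stand in for real CLIP embeddings, and the labels, function names, and temperature value are assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def zero_shot_label(image_embedding, prompt_embeddings, temperature=0.01):
    """Return (best_label, probabilities) via softmax over cosine similarities.

    prompt_embeddings maps a label to the embedding of its text prompt.
    The temperature is an assumed value, not one taken from the post.
    """
    labels = list(prompt_embeddings)
    logits = [
        cosine(image_embedding, prompt_embeddings[label]) / temperature
        for label in labels
    ]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    probs = {label: e / total for label, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs


# Toy vectors standing in for CLIP text and image embeddings.
age_prompts = {
    "child": [1.0, 0.0],
    "adult": [0.0, 1.0],
}
image_embedding = [0.2, 0.9]

label, probs = zero_shot_label(image_embedding, age_prompts)
print(label, probs)
```

In a real pipeline the winning label would only be a first guess, which is consistent with the post-processing and validation step described above.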

Identity Evaluation and Features

Our 50K identities were verified to be distinct using the face recognition network FaceNet, applied to each identity's "enrollment image" (a frontal-facing image with a neutral expression, fully open eyes, and no background; see above). In the FaceNet latent space, the average distance between two generated identities is approximately the same as the average distance between two base identities, which suggests that no variance has been lost relative to the base identities.
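The two checks described here can be sketched in a few lines: flag any pair of embeddings that sit too close together, and compare the mean pairwise distance of the generated set against that of the base set. This is a minimal illustration, not our production pipeline: the toy 3-D vectors stand in for real face embeddings, and the distance threshold and function names are assumptions.

```python
import itertools
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_pairwise_distance(embeddings):
    """Average distance over all unordered pairs of embeddings."""
    pairs = list(itertools.combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(euclidean(a, b) for a, b in pairs) / len(pairs)


def near_duplicates(embeddings, threshold):
    """Index pairs whose distance falls below the (assumed) threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in itertools.combinations(enumerate(embeddings), 2)
        if euclidean(a, b) < threshold
    ]


# Toy vectors standing in for embeddings of enrollment images.
base = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
generated = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.05],  # deliberately almost identical to the previous one
]

print(near_duplicates(generated, threshold=0.5))
print(mean_pairwise_distance(base), mean_pairwise_distance(generated))
```

Comparing all pairs is quadratic in the number of identities, so at 50K identities a real check would likely rely on batched matrix operations or approximate nearest-neighbor search rather than a plain loop.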

We evaluated, both visually and quantitatively, the level of realism of our identities compared to the original base identities, and observed that it was approximately preserved.

Download the images to see the details in high resolution. Images generated from the Datagen Platform.

Future Identity Roadmap

Since variance is one of the most critical aspects of synthetic data, we're not stopping at 50K. We will continue to grow the number of identities available for generation and to improve diversity in order to address edge cases and bias. Our goal is for every conceivable real-world identity to have a synthetic look-alike in our platform. In addition, we have an active project to further improve the level of realism, aiming for full photorealistic quality and a zero-domain-gap future.