FFHQ Dataset: Usage in GAN Research and Alternatives

What is Flickr-Faces-HQ (FFHQ)?

Flickr-Faces-HQ (FFHQ) is an image dataset containing high-quality images of human faces. It is provided by NVIDIA under the Creative Commons BY-NC-SA 4.0 license. It offers 70,000 PNG images at 1024×1024 resolution that display diverse ages, ethnicities, image backgrounds, and accessories like hats and eyeglasses. 

Here is how the images in FFHQ were obtained:

  • All images were crawled from Flickr, so the dataset inherits the biases of that website. Only images under permissive licenses were collected.
  • The images were automatically aligned and cropped using the dlib toolkit. 
  • The process applied automatic filters to prune the set. 
  • The process employed Amazon Mechanical Turk to remove paintings, statues, and photos of photos.
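The alignment-and-crop step can be pictured with a small geometric sketch. This is a simplified illustration, not the authors' exact procedure. It assumes the eye and mouth centers have already been located by a landmark detector such as dlib, and the function name `crop_square` and the `scale` factor are hypothetical.

```python
import math

def crop_square(left_eye, right_eye, mouth, scale=2.0):
    """Return (center, half_size, angle_degrees) for a rotated square crop.

    Inputs are (x, y) pixel coordinates of already-detected landmarks.
    The `scale` factor is an illustrative assumption, not a published value.
    """
    eye_cx = (left_eye[0] + right_eye[0]) / 2
    eye_cy = (left_eye[1] + right_eye[1]) / 2
    # Roll angle of the line through the eyes; 0 means the eyes are level.
    angle = math.degrees(math.atan2(right_eye[1] - left_eye[1],
                                    right_eye[0] - left_eye[0]))
    eye_dist = math.hypot(right_eye[0] - left_eye[0],
                          right_eye[1] - left_eye[1])
    eye_to_mouth = math.hypot(mouth[0] - eye_cx, mouth[1] - eye_cy)
    # Size the crop from the larger of the two facial distances.
    half_size = scale * max(eye_dist, eye_to_mouth)
    # Center the crop midway between the eyes and the mouth.
    center = ((eye_cx + mouth[0]) / 2, (eye_cy + mouth[1]) / 2)
    return center, half_size, angle
```

Rotating the image by the negative of `angle` around `center` and cropping a square of side `2 * half_size` would then give a roughly upright, centered face.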

FFHQ was initially created as a benchmark for generative adversarial networks (GANs).

Downloading and Installing the FFHQ Dataset

The FFHQ dataset includes JSON metadata, a download script, and documentation. There are two ways to access the dataset:

  • Download it directly from Google Drive
  • Use the download script provided by the authors of the FFHQ paper. The script automatically downloads all images, verifies checksums, retries on error, and uses multiple concurrent connections.
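To make the checksum verification and retry behavior concrete, here is a minimal sketch of the general pattern. It is not the script's actual code: the choice of MD5, the function names, and the `fetch` callable are all assumptions made for illustration.

```python
import hashlib
import time

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def fetch_with_retry(fetch, path, expected_md5, num_attempts=3, delay=0.0):
    """Call fetch(path) until the checksum matches or attempts run out.

    `fetch` is a hypothetical callable that writes the file to `path`.
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, num_attempts + 1):
        try:
            fetch(path)
            if md5_of_file(path) == expected_md5:
                return attempt
        except OSError:
            pass  # treat I/O and network errors as a failed attempt
        time.sleep(delay)
    raise RuntimeError(f"could not fetch a valid copy of {path}")
```

Reading the file in chunks keeps memory use flat even for large archives, and comparing digests catches truncated or corrupted downloads that would otherwise go unnoticed.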

The download script accepts several arguments that let you customize the download. Here are the most important ones:

  • --json: download the metadata as a JSON file
  • --stats: display dataset statistics
  • --images: download the 1024×1024 images as PNG (total size 89.1 GB)
  • --thumbs: download 128×128 thumbnails as PNG (total size 1.95 GB)
  • --wilds: download the original in-the-wild images as PNG (total size 955 GB)
  • --tfrecords: download multi-resolution TFRecords (total size 273 GB)
  • --align: recreate the 1024×1024 images from the original in-the-wild images
  • --num_threads: number of concurrent threads for the download
  • --num_attempts: number of times the script should attempt to download each file
  • --no-rotation: keep the original orientation of images (no rotation during alignment)
  • --no-padding: do not apply blur-padding around and near image borders
  • --source-dir: local directory with existing FFHQ source data
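The sketch below shows one way to assemble these arguments from Python. The script filename `download_ffhq.py`, the double-hyphen flag spellings, and the helper name `build_ffhq_command` are assumptions here, so check them against the script's own help output before relying on them.

```python
import sys

def build_ffhq_command(script="download_ffhq.py", metadata=False, images=False,
                       thumbs=False, num_threads=None, num_attempts=None):
    """Assemble an argument list for the download script (names assumed)."""
    cmd = [sys.executable, script]
    if metadata:
        cmd.append("--json")
    if images:
        cmd.append("--images")
    if thumbs:
        cmd.append("--thumbs")
    if num_threads is not None:
        cmd += ["--num_threads", str(num_threads)]
    if num_attempts is not None:
        cmd += ["--num_attempts", str(num_attempts)]
    return cmd

# Example: metadata plus thumbnails, 8 threads, up to 5 attempts per file.
cmd = build_ffhq_command(metadata=True, thumbs=True,
                         num_threads=8, num_attempts=5)
print(" ".join(cmd[1:]))
# To actually start the download: subprocess.run(cmd, check=True)
```

Starting with the metadata and thumbnails is a cheap way to confirm the setup works before committing to the 89.1 GB full-resolution download.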

Usage of FFHQ in Generative Model Research

FFHQ is commonly used to benchmark GAN models. Below are three research efforts that used FFHQ to demonstrate the effectiveness of their generative models.

StyleGAN

The FFHQ dataset was first used in the paper that introduced it, to demonstrate a novel GAN framework known as StyleGAN. The authors proposed a new generator architecture inspired by style transfer research. It learns, without supervision, to separate high-level attributes of face images (such as pose and identity) from stochastic variation (such as hair or freckles), making it possible to generate highly realistic variations on base images.

Their generator improved on previous generators, which operated as black boxes and offered little control over high-level features. The StyleGAN generator starts from a learned constant input and adjusts the style of the image at each convolutional step. This directly controls the strength of different image features. The generator also injects a controlled amount of noise into the network.
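The per-layer style adjustment can be illustrated with adaptive instance normalization (AdaIN), the operation StyleGAN uses to apply a style to feature maps. The sketch below is a toy, pure-Python version acting on a single feature map stored as a flat list. The function names are hypothetical, and in the real network the scale, bias, and noise strength are learned rather than passed in by hand.

```python
import random
import statistics

def adain(channel, style_scale, style_bias, eps=1e-8):
    """Normalize one feature map to zero mean and unit variance,
    then rescale and shift it with style-dependent values."""
    mean = statistics.fmean(channel)
    std = statistics.pstdev(channel)
    return [style_scale * (v - mean) / (std + eps) + style_bias
            for v in channel]

def add_noise(channel, strength, rng):
    """Add per-pixel Gaussian noise scaled by a strength factor."""
    return [v + strength * rng.gauss(0.0, 1.0) for v in channel]

def styled_layer(channel, style_scale, style_bias, noise_strength, rng):
    """One simplified synthesis step: inject noise, then apply the style."""
    noisy = add_noise(channel, noise_strength, rng)
    return adain(noisy, style_scale, style_bias)

rng = random.Random(0)
features = [1.0, 2.0, 3.0, 4.0]
out = styled_layer(features, style_scale=2.0, style_bias=5.0,
                   noise_strength=0.1, rng=rng)
```

Because the normalization discards each map's original mean and variance, the style values alone set those statistics at every layer, which is what gives the style input its direct control over feature strength.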

The authors created the FFHQ dataset as part of this research and used it to show the effectiveness of their approach. From this point onwards, FFHQ became a popular way to benchmark and compare GAN architectures.

Manifold Matching via Metric Learning (MvM)

Researchers from Microsoft proposed an alternative generative modeling technique, which can be used to create super-resolution versions of existing images or realistic synthetic images. This approach, called MvM, differs from traditional GANs in that it matches geometric descriptors such as centroids and p-diameters, instead of statistical properties such as means and moments.
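To give a feel for these geometric descriptors, the sketch below computes a centroid and a p-diameter for a small point set. It assumes plain Euclidean distance and one common definition of the p-diameter (the p-th root of the mean p-th power of all pairwise distances). MvM itself learns the metric, so this is only an illustration of the quantities involved, not the paper's implementation.

```python
import math

def centroid(points):
    """Euclidean centroid: the coordinate-wise mean of the points."""
    n = len(points)
    dims = len(points[0])
    return [sum(p[i] for p in points) / n for i in range(dims)]

def p_diameter(points, p=2):
    """(mean over all ordered pairs of d(x, y)**p) ** (1/p).

    Uses Euclidean distance d; this definition is an assumption
    made for illustration.
    """
    n = len(points)
    total = sum(math.dist(a, b) ** p for a in points for b in points)
    return (total / (n * n)) ** (1 / p)

real = [(0.0, 0.0), (2.0, 0.0)]
fake = [(0.5, 0.0), (1.5, 0.0)]
# Matching two sets means bringing both descriptors close together.
centroid_gap = math.dist(centroid(real), centroid(fake))
diameter_gap = abs(p_diameter(real) - p_diameter(fake))
```

In this toy case the two sets share a centroid but differ in spread, so only the diameter gap is non-zero. That is why both descriptors are needed to compare the sets.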

MvM has two neural networks: a metric generator network, which learns to define better distribution metrics, and a distribution generator network, which learns to produce hard negative samples. Through adversarial training, the distribution generator network learns to generate fake data distributions that are very close to real distributions. A major advantage of MvM is that it is more interpretable than a GAN, because it uses a single min-max function to indicate how well the network is trained. This makes it easier for human testers to know when the network has finished training and is ready for evaluation.

To test the effectiveness of this new framework, researchers trained the StyleGAN2 architecture using MvM on the FFHQ dataset. They generated images of 512×512 pixels, demonstrating that MvM is effective in creating images very similar to the real images in the FFHQ dataset.

Using FFHQ to Compare Different Approaches to Face Aging

In a recent paper, Sharma et al. addressed the task of generative face aging. This is an image-to-image task that has many useful applications in areas such as biometric systems, law enforcement, and entertainment. They compared two approaches to face aging:

  • CycleGAN (Cycle-Consistent Adversarial Network)—an architecture with two generators and two discriminators, which converts an image from one domain to another without requiring a dataset of paired images. 
  • AttentionGAN—uses attention masks and content masks applied to the generated output in one domain to create a highly realistic image in another domain. 
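CycleGAN can learn from unpaired images because of its cycle-consistency loss: translating an image to the other domain and back should reproduce the original. The sketch below shows that loss with toy callables standing in for the two trained generators. The function names and the flat-list image representation are assumptions for illustration.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally sized pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cycle_consistency_loss(x, y, g_xy, g_yx):
    """L1 reconstruction error after a round trip through both generators.

    g_xy maps domain X to Y and g_yx maps Y back to X. Both are plain
    callables standing in for trained networks.
    """
    forward = l1_distance(g_yx(g_xy(x)), x)   # X -> Y -> X
    backward = l1_distance(g_xy(g_yx(y)), y)  # Y -> X -> Y
    return forward + backward

# Toy stand-ins: "aging" brightens pixels, "de-aging" darkens them.
def age(img):
    return [v + 0.1 for v in img]

def deage(img):
    return [v - 0.1 for v in img]

young = [0.2, 0.4, 0.6]
old = [0.3, 0.5, 0.7]
loss = cycle_consistency_loss(young, old, age, deage)
```

With generators that invert each other exactly, as in this toy case, the loss is zero up to floating-point error. During training it is added to the adversarial losses of the two discriminators.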

The researchers trained both models on the CelebA and FFHQ datasets. They evaluated the two models using identity preservation (how likely each model is to retain the identity of the original image), quantitative image quality metrics, and subjective evaluations by human testers. They concluded that CycleGAN performs better than AttentionGAN.
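The article does not describe how identity preservation was computed in the paper. One common way to measure it, shown here purely as an assumed example, is to compare face embeddings of the original and generated images and count how many pairs stay above a similarity threshold. The embedding vectors, the threshold, and the function names are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identity_preservation_rate(original_embs, generated_embs, threshold=0.5):
    """Fraction of image pairs whose embeddings are at least
    `threshold` similar (the threshold is an illustrative assumption)."""
    matches = sum(
        cosine_similarity(o, g) >= threshold
        for o, g in zip(original_embs, generated_embs)
    )
    return matches / len(original_embs)

# Toy embeddings: the first aged face stays close, the second drifts away.
originals = [[1.0, 0.0], [0.0, 1.0]]
generated = [[1.0, 0.1], [1.0, 0.0]]
rate = identity_preservation_rate(originals, generated)
```

In practice the embeddings would come from a pretrained face recognition model, and the threshold would be chosen on a validation set.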

FFHQ Alternatives

Large-scale CelebFaces Attributes (CelebA) Dataset

CelebFaces Attributes (CelebA) is a large-scale face dataset. It consists of celebrity images that cover large pose variations and background clutter, and includes:

  • 202,599 face images of celebrities
  • 10,177 identities
  • Five landmark locations per image
  • 40 binary attribute annotations per image

CelebA can serve as a training and test set for various computer vision tasks, including face attribute recognition, landmark localization, face editing and synthesis, and face detection.

Tufts Face Database

Tufts Face Database is a large-scale face dataset. It contains seven image modalities: visible, thermal, near-infrared, computerized sketch, recorded video, 3D images, and LYTRO. The dataset consists of over 10,000 images of participants from diverse backgrounds, including:

  • 74 females and 38 males 
  • 15 countries 
  • An age range of 4-70 

Tufts Face Database is available to researchers worldwide to help benchmark facial recognition algorithms for thermal, sketch, NIR, heterogeneous, and 3D face recognition.

Google Facial Expression Comparison Dataset

Google offers a large-scale facial expression dataset to assist researchers working on facial expression analysis. The dataset can help with expression-based image retrieval, emotion classification, expression-based photo album summarization, and expression synthesis.

This dataset consists of face image triplets with human-made annotations. These annotations specify which two faces in each triplet are most similar in facial expression. The dataset includes 500,000 triplets and 156,000 face images, and is about 200 MB in size.

Labeled Faces in the Wild (LFW) Dataset

Labeled Faces in the Wild (LFW) is a public benchmark for face verification and other face recognition techniques. It is designed to help study the problem of unconstrained face recognition. The database includes over 13,000 face images collected from across the web and is about 173 MB in size.