34 Amazing Image Datasets for Your Next Computer Vision Project

What Are Image Datasets?

A dataset is a collection of data curated for a machine learning project. An image dataset includes digital images curated for testing, training, and evaluating the performance of machine learning and artificial intelligence (AI) algorithms, commonly computer vision algorithms.

Image datasets help algorithms learn to identify and recognize information in images and perform related tasks. For example, AI algorithms can be trained to tag photographs, read license plates, and identify tumors in medical images. Computer vision algorithms can also transform images or generate completely new images, with a variety of practical applications.

Below we provide a concise listing of important image datasets, categorized by their usefulness for computer vision tasks such as face detection, image segmentation, and image classification.

Face Datasets

Face datasets provide a diverse set of images containing human faces. A face dataset may include various lighting conditions, poses, emotions, and other factors like the ethnicity, age, and gender of the people in the image.

These datasets are commonly used for facial recognition, a branch of computer vision applicable to many areas, including augmented reality, criminal justice, and personal device security.

CelebFaces Attributes Dataset (CelebA)

Description: Large-scale face attributes dataset with celebrity images, covering pose variations and background clutter.
Publisher and Release Date: ICCV, 2015
# Images: 202,599
# Identities: 10,177
Annotations: 5 face landmark locations, 40 binary attributes such as Bushy Eyebrows, Mustache, Gray Hair, Wearing Necklace.
License: Non-commercial research purposes only
Link: http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html

VGG Face2

Description: Large dataset with 3.31 million face images, covering many variations of pose, age, illumination, ethnicity, and profession.
Publisher and Release Date: University of Oxford, 2018
# Images: 3,310,000
# Identities: 9,131
Annotations: Human-verified bounding boxes around faces, five face landmarks. Pose (yaw, pitch and roll) and age are estimated by pre-trained classifiers.
License: Non-commercial research purposes only
Link: https://github.com/ox-vgg/vgg_face2

IJB (IARPA Janus Benchmark)

Description: A project including three datasets: IJB-A—face dataset with wide variations in pose, illumination, expression, resolution and occlusion. IJB-B—template-based face dataset with still images and videos. A template consists of still images and video frames of the same individual from different sources. IJB-C—a video-based face dataset which extends the IJB-A/B datasets with more face images, face videos, and non-face images.
Publisher and Release Date: NIST, 2015 – 2019
# Images: IJB-A: 5,712 images + 2,085 videos, 500 identities; IJB-B: 11,754 images + 7,011 videos, 1,845 identities; IJB-C: 138,000 face images + 11,000 videos + 10,000 non-face images
Annotations: The three datasets provide different levels of annotation (see the challenge website for details).
License: Creative Commons License with strict limitation on distributing the data
Link: https://www.nist.gov/programs-projects/face-challenges

CASIA-WebFace

Description: Dataset intended for scientific research of unconstrained face recognition and face identification tasks. Contains face images crawled from the Internet.
Publisher and Release Date: Institute of Automation, Chinese Academy of Sciences (CASIA)
# Images: 494,414
# Identities: 10,575
Annotations: Annotated face images
License: Non-commercial research purposes only.
Link: https://paperswithcode.com/dataset/casia-webface

Yale Face Database

Description: Database A—small-scale dataset with grayscale images in GIF format. Covers only 15 people, showing them with different facial expressions or configurations. Database B—large-scale database with a large variety of images of 28 human subjects showing different poses and lighting conditions.
# Images: Database A: 165 images, 15 identities; Database B: 16,128 images, 28 identities
Annotations: Annotations for identity, facial expressions (happy, sleepy, surprised, etc.), features such as glasses/no glasses, and lighting conditions
License: Non-commercial research purposes only.
Link: http://vision.ucsd.edu/content/yale-face-database

UMDFaces

Description: Large face dataset collected and labeled by human annotators and deep networks. Includes both images and still images from video footage.
Publisher and Release Date: IEEE IJCB, 2017
# Images: 367,888 still images + 22,000 videos
# Identities: 8,277 in images + 3,100 in video
Annotations: Human-curated face bounding boxes, estimated pose (yaw, pitch, and roll)
License: Non-commercial research purposes
Link: http://umdfaces.io/

MS-Celeb

Description: Dataset (also known as MS-Celeb-1M) covering the top 100K celebrities in terms of web appearance frequency. For each celebrity, approximately 100 images were retrieved from popular search engines.
Publisher and Release Date: ECCV, 2016
# Images: 10,000,000
# Identities: 100,000
Annotations: Not annotated.
License: Open Data Commons PDD License
Link: https://www.microsoft.com/en-us/research/app/uploads/2016/08/MSCeleb-1M-a.pdf

YouTube Faces

Description: Similar to MS-Celeb-1M, except that the face images were extracted from YouTube videos.
Publisher and Release Date: CVPR, 2011
# Images: 3,425 videos
# Identities: 1,595
Annotations: Face bounding boxes and the following descriptors: Local Binary Patterns (LBP), Center-Symmetric LBP (CSLBP), and Four-Patch LBP.
License: MIT License
Link: https://www.cs.tau.ac.il/~wolf/ytfaces/

PaSC (Point and Shoot Challenge)

Description: Dataset including both still images and videos of people. Images are balanced with respect to distance to the camera, frontal vs. non-frontal views, and locations. Suitable for comparing still images to still images, videos to videos, and still images to videos.
Publisher and Release Date: IEEE 6th Biometrics Conf, 2013
# Images: 2,802 videos
# Identities: 293
Annotations: Not annotated, because it is intended as a benchmark for video face recognition.
License: Check with publishers
Link: https://www.nist.gov/publications/challenge-face-recognition-digital-point-and-shoot-cameras

iQIYI-VID

Description: The largest video dataset for multi-modal person identification. Video clips are extracted from 400K hours of online videos, including movies, TV shows, and news broadcasts, with careful human annotation.
Publisher and Release Date: iQIYI, Inc., 2018
# Images: 600,000
# Identities: 5,000
Annotations: Stage 1: Localizing faces and identities using algorithms. Stage 2: Manual face recognition and identification by two different annotators per image, ensuring an error rate below 0.2%.
License: Permissive open source license
Link: https://arxiv.org/pdf/1811.07548.pdf

Wider Face

Description: Dataset with rich annotations including occlusions, poses, event categories, and face bounding boxes. Intended to be challenging for face detection algorithms due to variations in scale, pose, and occlusion.
Publisher and Release Date: Chinese University of Hong Kong, 2016
# Images: 32,203
# Faces: 393,703 labeled faces
Annotations: Face bounding boxes, occlusion, pose, and event categories. Dataset also labels faces that are occluded or need to be ignored due to low quality or resolution. Each annotation is labeled by one annotator and cross-checked by two people.
License: Non-commercial research purposes only
Link: http://shuoyang1213.me/WIDERFACE/

Flickr-Faces-HQ Dataset (FFHQ)

Description: High-quality dataset of human faces, originally created as a benchmark for generative adversarial networks (GAN).
Publisher and Release Date: NVIDIA, 2019
# Images: 70,000
Annotations: Images automatically aligned and cropped, non-human figures identified by Amazon Mechanical Turk. Annotations of age, ethnicity, image background, accessories.
License: Creative Commons BY-NC-SA 4.0
Link: https://github.com/NVlabs/ffhq-dataset

Labeled Faces in the Wild (LFW)

Description: A public benchmark for face verification, also known as pair matching. This dataset is now considered outdated because it has been saturated by current face recognition algorithms.
Publisher and Release Date: University of Massachusetts, 2007
# Images: 13,233
# Identities: 5,749
Annotations: Faces automatically detected using the Viola-Jones algorithm, and manually annotated with names.
License: Non-commercial research purposes only
Link: http://vis-www.cs.umass.edu/lfw/index.html
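
Face verification (pair matching) can be illustrated with a minimal sketch. It assumes each face has already been converted into an embedding vector by some model; the three-dimensional vectors, the 0.8 threshold, and the function names below are hypothetical values chosen for illustration and are not part of LFW or its official evaluation protocol.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def same_person(emb_a, emb_b, threshold=0.8):
    """Declare a match when the embeddings are similar enough (threshold is an assumption)."""
    return cosine_similarity(emb_a, emb_b) >= threshold

# Hypothetical labeled pairs: (embedding A, embedding B, ground truth "same identity?").
pairs = [
    ([1.0, 0.0, 0.0], [0.9, 0.1, 0.0], True),
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], False),
]

# Verification accuracy: fraction of pairs where the decision agrees with the label.
correct = sum(same_person(a, b) == label for a, b, label in pairs)
accuracy = correct / len(pairs)
print(f"pair-matching accuracy: {accuracy:.2f}")
```

In practice the threshold would be tuned on held-out pairs rather than fixed in advance.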

Image Segmentation and Object Detection Datasets

Image segmentation involves breaking down an image into several segments and associating each pixel with an object type. There are two main types of segmentation—semantic segmentation, which marks all objects of the same type with one class label, and instance segmentation, which marks similar objects (for example, individual people or cars in an image) with separate labels.

Object detection involves locating instances of objects in images or videos, typically by drawing a bounding box around each one and assigning it a class. It is often performed alongside image segmentation: after an image is segmented into meaningful objects, object detection algorithms attempt to identify the class of each object.

Image datasets suitable for image segmentation and object detection projects provide images of scenes, broken down into individual objects, with ground truth annotations performed or verified by humans.
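
The difference between semantic and instance segmentation can be sketched with a toy mask. The grid, the class ID, and the `label_instances` helper below are hypothetical. Splitting blobs by connectivity is only a stand-in for a real instance segmentation model, since touching objects of the same class would be merged into one instance.

```python
# Toy 4x6 semantic mask: 0 = background, 1 = "car" (class IDs are made up).
semantic_mask = [
    [0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

def label_instances(mask, target_class):
    """Assign a separate ID (1, 2, ...) to each 4-connected blob of target_class."""
    rows, cols = len(mask), len(mask[0])
    instances = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != target_class or instances[r][c] != 0:
                continue
            # Unvisited pixel of the target class: start a new instance and flood-fill it.
            next_id += 1
            instances[r][c] = next_id
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == target_class and instances[ny][nx] == 0:
                        instances[ny][nx] = next_id
                        stack.append((ny, nx))
    return instances, next_id

instance_mask, num_cars = label_instances(semantic_mask, target_class=1)
print(num_cars)
for row in instance_mask:
    print(row)
```

Both cars share the label 1 in the semantic mask, while the instance mask gives each car its own ID.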

LabelMe

Description: Dataset of digital images with annotations, designed for recognition of object classes.
Publisher and Release Date: MIT
# Images: Dynamically Updated
Annotations and Classes: Multiple objects per image, each annotated by drawing a polygon around the object.
License: Free to use and open to public contribution
Link: http://labelme.csail.mit.edu/Release3.0/
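
As a rough illustration of working with polygon annotations, the sketch below derives an axis-aligned bounding box and an area from a polygon outline. The coordinates and function names are hypothetical and do not reflect the actual LabelMe file format.

```python
def polygon_bbox(points):
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) enclosing the polygon."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)

def polygon_area(points):
    """Area of a simple (non-self-intersecting) polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# Hypothetical outline of one annotated object, as (x, y) pixel coordinates.
outline = [(10, 10), (50, 10), (50, 30), (10, 30)]

print(polygon_bbox(outline))
print(polygon_area(outline))
```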

MS COCO

Description: An image dataset with challenging, high-quality images used to train object detection, segmentation, and captioning algorithms.
Publisher and Release Date: Microsoft, Facebook, and other organizations, 2015
# Images: 330,000 images, 200,000+ annotated
Annotations and Classes: 80 object categories, 91 “stuff” categories (materials and background regions such as sky, street, and grass), 5 captions per image.
License: Creative Commons 4.0
Link: https://cocodataset.org/#home
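
COCO-style object annotations are distributed as JSON, with bounding boxes stored as `[x, y, width, height]`. The sketch below parses a tiny hand-written record in that layout and converts the box to corner coordinates. The file name, IDs, and box values are invented for illustration, and only a few of the fields are shown.

```python
import json

# Hand-written snippet in the COCO annotation layout (all values are made up).
coco_snippet = json.loads("""
{
  "images": [{"id": 1, "file_name": "example.jpg", "width": 640, "height": 480}],
  "categories": [{"id": 1, "name": "person"}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1, "bbox": [100, 50, 200, 300], "iscrowd": 0}
  ]
}
""")

def xywh_to_corners(bbox):
    """Convert [x, y, width, height] to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]

# Map category IDs to names, then pair each annotation with its converted box.
names = {c["id"]: c["name"] for c in coco_snippet["categories"]}
boxes = [
    (names[a["category_id"]], xywh_to_corners(a["bbox"]))
    for a in coco_snippet["annotations"]
]
print(boxes)
```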

GuessWhat?!

Description: Large-scale dataset generated via 150K human-played games with 800K visual question-answer pairs.
Publisher and Release Date: CHISTERA—IGLU, 2017
# Images: 66,000
Annotations and Classes: Object labels determined by iterative questioning.
License: Apache 2.0
Link: https://github.com/GuessWhatGame/guesswhat

CAMO (Camouflaged Object)

Description: Dataset intended for the task of segmenting naturally or artificially camouflaged objects that are difficult to distinguish from their background.
Publisher and Release Date: CVIU, 2019
# Images: 1,250 camouflaged images + 1,250 non-camouflaged images
Annotations and Classes: Object mask for ground truth images.
License: Non-commercial research purposes only
Link: https://sites.google.com/view/ltnghia/research/camo

CAR (Cityscapes Attributes Recognition)

Description: Dataset with visual attributes of objects in city scenes. Intended to support development of self-driving vehicles.
Publisher and Release Date: Daimler AG, Max Planck Institute, TU Darmstadt
# Images: 32,000
Annotations and Classes: Each object in an image has a list of attributes that depend on the object's category; for example, vehicles have visibility attributes and pedestrians have activity attributes.
License: MIT
Link: https://github.com/kareem-metwaly/CAR-API

SECOND (SEmantic Change detectiON Dataset)

Description: A change detection dataset that shows changes in aerial images including both natural and man-made geographical changes.
Publisher and Release Date: Wuhan University, 2020
# Images: 4,662 pairs of aerial images
Annotations and Classes: Annotations focus on six land-cover classes: non-vegetated ground surface, trees, low vegetation, water, buildings and playgrounds, with 30 change categories.
License: Non-commercial research purposes only
Link: https://captain-whu.github.io/SCD/

Image Classification Datasets

Image classification involves identifying what an image represents. Training an image classification model enables it to recognize diverse classes of images. Image classification datasets provide large sets of images with ground truth labels, providing the structured information needed to train a classification model.

CIFAR-100

Description: Dataset with 100 classes grouped into 20 superclasses, and 600 images per class.
Publisher and Release Date: Canadian Institute for Advanced Research (CIFAR), 2009
# Images: 60,000
Annotations and Classes: Each image has a label for its class and superclass.
License: MIT
Link: https://www.cs.toronto.edu/~kriz/cifar.html
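
The figures in this entry are consistent with each other, as the short check below shows. The constant names and the sample record are illustrative only and are not taken from the dataset files.

```python
NUM_CLASSES = 100
NUM_SUPERCLASSES = 20
IMAGES_PER_CLASS = 600

# 100 classes x 600 images each gives the dataset total.
total_images = NUM_CLASSES * IMAGES_PER_CLASS

# 100 classes spread evenly over 20 superclasses.
classes_per_superclass = NUM_CLASSES // NUM_SUPERCLASSES
images_per_superclass = classes_per_superclass * IMAGES_PER_CLASS

# Each image carries two labels: a class and the superclass it belongs to.
# This record is a hypothetical example of such a pair.
sample = {"class": "maple_tree", "superclass": "trees"}

print(total_images, classes_per_superclass, images_per_superclass)
```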

ImageNet

Description: Widely used benchmark for image classification with millions of images annotated according to the WordNet hierarchy.
Publisher and Release Date: IEEE, 2009
# Images: 14,197,122
Annotations and Classes: Image-level annotations provide a binary label for the presence or absence of an object class (e.g. a car). Object-level annotations provide a tight bounding box and class label for a specific object in the image.
License: Non-commercial research purposes only
Link: https://image-net.org/index.php

MNIST

Description: Dataset of handwritten digits normalized to fit in a 20×20 pixel box, centered in a 28×28 image.
Publisher and Release Date: NIST
# Images: 70,000 (60,000 training + 10,000 test)
Annotations and Classes: Each image is labeled with the digit shown in it.
License: Non-commercial research purposes only
Link: http://yann.lecun.com/exdb/mnist/
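
The 20×20-in-28×28 layout can be sketched as follows. This is a simplified geometric centering with equal padding on every side; it assumes the digit has already been scaled to 20×20 and it is not the original preprocessing code. The function name and the solid test glyph are hypothetical.

```python
def center_in_canvas(glyph, canvas_size=28):
    """Place a small grayscale glyph in the middle of a blank square canvas."""
    h, w = len(glyph), len(glyph[0])
    top = (canvas_size - h) // 2    # 4 rows of padding for a 20x20 glyph
    left = (canvas_size - w) // 2   # 4 columns of padding for a 20x20 glyph
    canvas = [[0] * canvas_size for _ in range(canvas_size)]
    for r in range(h):
        for c in range(w):
            canvas[top + r][left + c] = glyph[r][c]
    return canvas

# Hypothetical 20x20 glyph: a solid white square standing in for a digit.
glyph = [[255] * 20 for _ in range(20)]
image = center_in_canvas(glyph)

print(len(image), len(image[0]))
```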

Visual Storytelling

Description: A dataset of unique photos collected into 50,000 stories or albums. It is intended to train algorithms in natural language storytelling: human-like understanding of grounded event structure and subjective expression.
Publisher and Release Date: NAACL, 2016
# Images: 81,743 photos in 20,211 sequences
Annotations and Classes: The stories were created via Amazon Mechanical Turk and have corresponding images and text written by human annotators, for example “the boy is playing soccer”, “up the soccer ball goes”.
License: Non-commercial research purposes only
Link: https://visionandlanguage.net/VIST/dataset.html

Other Notable Image Datasets

Below are several other image datasets that are notable, either for the unique data they provide or for the sheer number of annotated images they contain.

Open Images Dataset

Description: Dataset of images with labels and bounding boxes.
# Images: 9 million
Link: https://storage.googleapis.com/openimages/web/index.html

SVHN (Street View House Numbers)

Description: Digit classification benchmark with images of printed digits on house number plates.
# Images: 600,000
Link: http://ufldl.stanford.edu/housenumbers/

AID (Aerial Image Dataset)

Description: Large-scale aerial dataset using Google Earth imagery.
# Images: 10,000
Link: https://captain-whu.github.io/AID/

IQUAD (Interactive Question Answering Dataset)

Description: Images based on a simulated photo-realistic environment with indoor scenes and interactive objects.
# Images: 75,000 unique scene configurations
Link: https://github.com/danielgordon10/thor-iqa-cvpr-2018

CUHK-QA

Description: Dataset for natural language-based person search.
# Images: 400 images of 360 people
Link: https://github.com/vikshree/QA_PersonSearchLanguageData

FACTIFY

Description: Dataset on multi-modal fact verification, containing images, textual claims, reference textual documents and images.
# Images: 50,000 claims supported by 100,000 images
Link: https://competitions.codalab.org/competitions/35153

FixMyPose

Description: Dataset for automated pose correction. Shows characters performing a variety of movements in interior environments.
# Images: Synthetic images
Link: https://competitions.codalab.org/competitions/35153

IAPR TC-12

Description: Still natural images taken around the world, showing sports, actions, people, animals, cities, and landscapes.
# Images: 20,000
Link: https://www.imageclef.org/photodata

InstaCities1M

Description: Dataset of social media images with associated text. Each image is associated with one of the 10 most populated English-speaking cities.
# Images: 1 million images (100K for each city)
Link: https://gombru.github.io/2018/08/01/InstaCities1M/

OSLD (Open Set Logo Detection Dataset)

Description: Dataset of eCommerce product images with associated brand logo images for logo detection tasks.
# Images: 20,000
Link: https://github.com/mubastan/osld

RodoSol-ALPR

Description: Images of vehicles captured during the day and at night by cameras at Brazilian toll booths.
# Images: 20,000
Link: https://github.com/raysonlaroca/rodosol-alpr-dataset/