34 Amazing Image Datasets for Your Next Computer Vision Project

What Are Image Datasets?

A dataset is a collection of data curated for a machine learning project. An image dataset includes digital images curated for testing, training, and evaluating the performance of machine learning and artificial intelligence (AI) algorithms, commonly computer vision algorithms. 

Image datasets help algorithms learn to identify and recognize information in images and perform related cognitive activities. For example, AI algorithms can be trained to tag photographs, read car plates, and identify tumors in medical images. Computer vision algorithms can also transform images or generate completely new images, with a variety of practical applications.

Below we provide a concise listing of important image datasets, categorized by their usefulness for computer vision tasks such as face detection, image segmentation, and image classification.
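In code, an image dataset of this kind typically boils down to a list of image files paired with labels, divided into training and test subsets. The sketch below illustrates that structure; the file names and labels are hypothetical and not tied to any dataset in this list.

```python
import random

# A minimal sketch of a labeled image dataset: (image_path, label) pairs.
# The file names here are hypothetical placeholders.
samples = [
    ("images/cat_001.jpg", "cat"),
    ("images/cat_002.jpg", "cat"),
    ("images/dog_001.jpg", "dog"),
    ("images/dog_002.jpg", "dog"),
    ("images/dog_003.jpg", "dog"),
]

def train_test_split(samples, test_fraction=0.4, seed=0):
    """Shuffle deterministically, then split into training and test sets."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train_set, test_set = train_test_split(samples)
```

Keeping the split deterministic (a fixed seed) is what makes reported evaluation results on these datasets reproducible.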

Face Datasets

Face datasets provide a diverse set of images containing human faces. A face dataset may include various lighting conditions, poses, emotions, and other factors like the ethnicity, age, and gender of the people in the image. 

These datasets are commonly used for facial recognition, a branch of computer vision applicable to many areas, including augmented reality, criminal justice, and personal device security.
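To give a feel for how face datasets are used, the sketch below shows the core of a face verification step: comparing two face embeddings (fixed-length vectors that a trained recognition model would produce per image) by cosine similarity. The embedding values and the threshold are illustrative only, not taken from any real model.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def same_person(emb1, emb2, threshold=0.8):
    """Verification decision: are these two faces likely the same identity?"""
    return cosine_similarity(emb1, emb2) >= threshold

# Toy embeddings standing in for a real model's output.
anchor = [0.9, 0.1, 0.3]
candidate_match = [0.88, 0.12, 0.31]
candidate_other = [0.1, 0.9, 0.2]
```

Benchmarks such as LFW (listed below) evaluate exactly this kind of pair-matching decision over many labeled image pairs.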

See Face Datasets table below.

Image Segmentation and Object Detection Datasets

Image segmentation involves breaking down an image into several segments and associating each pixel with an object type. There are two main types of segmentation—semantic segmentation, which marks all objects of the same type with one class label, and instance segmentation, which marks similar objects (for example, individual people or cars in an image) with separate labels.

Object detection involves locating instances of objects in videos or images. This technique helps machines learn how humans recognize and locate objects in images. Object detection is often performed on top of image segmentation; after an image is segmented into meaningful objects, object detection algorithms attempt to identify the class of each object.

Image datasets suitable for image segmentation and object detection projects provide images of scenes, broken down into individual objects, with ground truth annotations performed or verified by humans.
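The distinction between semantic and instance segmentation, and the way detection can build on it, can be shown with small label maps. The grid, class IDs, and helper below are a toy illustration, not any dataset's actual annotation format.

```python
import numpy as np

# A 4x6 "image" containing two people and one car, encoded as label maps.
# Semantic segmentation: one class ID per pixel (0=background, 1=person, 2=car).
semantic = np.array([
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
    [1, 1, 0, 0, 0, 0],
])

# Instance segmentation: each object gets its own ID (0=background).
# Both people share semantic class "person" but receive distinct instance IDs.
instance = np.array([
    [0, 1, 1, 0, 3, 3],
    [0, 1, 1, 0, 3, 3],
    [0, 0, 0, 0, 3, 3],
    [2, 2, 0, 0, 0, 0],
])

def bounding_box(mask):
    """Tight (row_min, row_max, col_min, col_max) box around a binary mask."""
    rows, cols = np.nonzero(mask)
    return rows.min(), rows.max(), cols.min(), cols.max()

# Detection on top of the segmentation: one bounding box per instance.
boxes = {i: bounding_box(instance == i) for i in np.unique(instance) if i != 0}
```

Ground truth in real datasets follows the same idea at scale: per-pixel class or instance labels, from which boxes and masks are derived.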

See Image Segmentation and Object Detection Datasets table below.

Image Classification Datasets

Image classification involves identifying what an image represents. Training an image classification model enables it to recognize diverse classes of images. Image classification datasets provide large sets of images with ground truth labels, providing the structured information needed to train a classification model.
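As a toy illustration of training against ground truth labels, the sketch below fits a nearest-centroid classifier on hand-made feature vectors standing in for images. Real pipelines train far richer models on the datasets listed here, but the train-then-score structure is the same.

```python
import numpy as np

# Toy "images": 3-element feature vectors with ground truth labels.
train_x = np.array([[0.9, 0.1, 0.0], [0.8, 0.2, 0.1],   # class "cat"
                    [0.1, 0.9, 0.1], [0.2, 0.8, 0.0]])  # class "dog"
train_y = np.array(["cat", "cat", "dog", "dog"])

def centroids(x, y):
    """Mean feature vector per class label."""
    return {label: x[y == label].mean(axis=0) for label in np.unique(y)}

def classify(sample, cents):
    """Assign the label of the nearest class centroid."""
    return min(cents, key=lambda label: np.linalg.norm(sample - cents[label]))

cents = centroids(train_x, train_y)
test_x = np.array([[0.85, 0.15, 0.05], [0.15, 0.85, 0.05]])
test_y = np.array(["cat", "dog"])
predictions = [classify(s, cents) for s in test_x]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
```

Accuracy against held-out labeled images is the standard way classification datasets such as CIFAR and ImageNet are used as benchmarks.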

See Image Classification Datasets table below.

Other Notable Image Datasets

Below are several other image datasets that are notable, either for the unique data they provide or for the sheer number of annotated images they contain.

See Notable Image Datasets table below.

 

Face Datasets
Description
Publisher and Release Date
# Images
# Identities
Annotations
License
Link
CelebFaces Attributes Dataset (CelebA)
Large-scale face attributes dataset with celebrity images, covering pose variations and background clutter. 
ICCV, 2015
202,599
10,177
5 face landmark locations, 40 binary attributes such as Bushy Eyebrows, Mustache, Gray Hair, Wearing Necklace.
Non-commercial research purposes only
http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
VGG Face2
Large dataset with 3.31 million face images, covering many variations of pose, age, illumination, ethnicity, and professions.
University of Oxford, 2018
3,310,000
9,131
Human-verified bounding boxes around faces, five face landmarks. Pose (yaw, pitch and roll) and age are estimated by pre-trained classifiers. 
Non-commercial research purposes only
https://github.com/ox-vgg/vgg_face2
IJB (IARPA Janus Benchmark)
A project including three datasets: IJB-A—face dataset with wide variations in pose, illumination, expression, resolution and occlusion. IJB-B—template-based face dataset with still images and videos. A template consists of still images and video frames of the same individual from different sources. IJB-C—a video-based face dataset which extends the IJB-A/B datasets with more face images, face videos, and non-face images.
NIST, 2015-2019
IJB-A: 5,712 images + 2,085 videos (500 identities); IJB-B: 11,754 images + 7,011 videos (1,845 identities); IJB-C: 138,000 face images + 11,000 videos + 10,000 non-face images
The three datasets provide different levels of annotation (see the challenge website for details).
Creative Commons License with strict limitation on distributing the data
https://www.nist.gov/programs-projects/face-challenges
CASIA-WebFace
Dataset intended for scientific research of unconstrained face recognition and face identification tasks. Contains face images crawled from the Internet.
Institute of Automation, Chinese Academy of Sciences (CASIA), 2014
494,414
10,575
Annotated face images.
Non-commercial research purposes only.
https://paperswithcode.com/dataset/casia-webface
Yale Face Database
Database A—small-scale dataset with grayscale images in GIF format. Covers only 15 people, showing them with different facial expressions or configurations. Database B—large-scale database with a large variety of images of 28 human subjects showing different poses and lighting conditions.
Database A: 165 images, 15 identities; Database B: 16,128 images, 28 identities
Annotations for identity, facial expressions (happy, sleepy, surprised, etc.), features such as glasses/no glasses, and lighting conditions.
Non-commercial research purposes only.
http://vision.ucsd.edu/content/yale-face-database
UMDFaces
Large face dataset collected and labeled by human annotators and deep networks. Includes both still images and frames extracted from video footage.
IEEE IJCB, 2017
22,000 videos + 367,888 still images
8,277 in images + 3,100 in video
Human-curated face bounding boxes, estimated pose (roll, pitch, yaw)
Non-commercial research purposes
http://umdfaces.io/
MS-Celeb-1M
Dataset of the top 100K celebrities in terms of web appearance frequency. For each celebrity, approx. 100 images were retrieved from popular search engines.
ECCV, 2016
10,000,000
100,000
Not annotated.
Open Data Commons PDD License
https://www.microsoft.com/en-us/research/app/uploads/2016/08/MSCeleb-1M-a.pdf
YouTube Faces (YTF)
Similar to MS-Celeb-1M, except that the face images were obtained from YouTube videos.
CVPR, 2011
3,425 videos
1,595
Face bounding boxes and the following descriptors: Local Binary Patterns (LBP), CenterSymmetric LBP (CSLBP) and Four-Patch LBP.
MIT License
https://www.cs.tau.ac.il/~wolf/ytfaces/
PaSC (Point-and-Shoot Face Recognition Challenge)
Dataset including both still images and videos of people. Images are balanced with respect to distance to the camera, frontal vs. non-frontal views, and locations. Suitable for comparing still images to still images, videos to videos, and still images to videos.
IEEE 6th Biometrics Conf, 2013
2,802 videos
293
Not annotated, because it is intended as a benchmark for video face recognition.
Check with publishers
https://www.nist.gov/publications/challenge-face-recognition-digital-point-and-shoot-cameras
iQIYI-VID
The largest video dataset for multi-modal person identification. Video clips are extracted from 400K hours of online videos, including movies, TV shows, and news broadcasts, with careful human annotation.
iQIYI, Inc., 2018
600,000
5,000
Stage 1: faces and identities localized automatically by algorithms. Stage 2: manual face recognition and identification by two different annotators per image, ensuring an error rate below 0.2%.
Permissive open source license
https://arxiv.org/pdf/1811.07548.pdf
Wider Face
Dataset with rich annotations including occlusions, poses, event categories, and face bounding boxes. Intended to be challenging for face recognition algorithms due to variations in scale, pose and occlusion.
Chinese University of Hong Kong, 2018
32,203
393,703 (labeled faces)
Face bounding boxes, occlusion, pose, and event categories. Dataset also labels faces that are occluded or need to be ignored due to low quality or resolution. Each annotation is labeled by one annotator and cross-checked by two people.
Non-commercial research purposes only
http://shuoyang1213.me/WIDERFACE/
Flickr-Faces-HQ Dataset (FFHQ)
High-quality dataset of human faces, originally created as a benchmark for generative adversarial networks (GANs).
NVIDIA, 2019
70,000
N/A
Images automatically aligned and cropped, non-human figures identified by Amazon Mechanical Turk. Annotations of age, ethnicity, image background, accessories.
Creative Commons BY-NC-SA 4.0
https://github.com/NVlabs/ffhq-dataset
Labeled Faces in the Wild (LFW)
A public benchmark for face verification, also known as pair matching. This dataset is now considered outdated because it has been saturated by current face recognition algorithms.
University of Massachusetts, 2007
13,233
5,749
Faces automatically detected using the Viola-Jones algorithm, and manually annotated with names.
Non-commercial research purposes only
http://vis-www.cs.umass.edu/lfw/index.html
Image Segmentation and Object Detection Datasets
Description
Publisher and Release Date
# Images
Annotations and Classes
License
Link
LabelMe
Dataset of digital images with annotations, designed for recognition of object classes.
MIT
Dynamically updated
Annotation of multiple objects within an image by drawing a polygon around each object.
Free to use and open to public contribution
http://labelme.csail.mit.edu/Release3.0/
MS COCO
An image dataset with challenging, high-quality images used to train object detection, segmentation, and captioning algorithms.
Microsoft, Facebook, and other organizations, 2015
330,000 images, 200,000+ annotated
80 object categories; 91 "stuff" categories (materials and backgrounds such as sky, street, and grass); 5 captions per image.
Creative Commons 4.0
https://cocodataset.org/#home
GuessWhat?!
Large-scale dataset generated via 150K human-played games, yielding 800K visual question-answer pairs.
CHIST-ERA IGLU project, 2017
66,000
Object labels determined by iterative questioning.
Apache 2.0
https://github.com/GuessWhatGame/guesswhat
CAMO (Camouflaged Object)
Dataset intended for the task of segmenting naturally or artificially camouflaged objects, which are difficult to distinguish from their background.
CVIU, 2019
1,250 camouflaged images + 1,250 non-camouflaged images
Object mask for ground truth images.
Non-commercial research purposes only
https://sites.google.com/view/ltnghia/research/camo
CAR (Cityscapes Attributes Recognition)
Dataset with visual attributes of objects in city scenes. Intended to support development of self-driving vehicles.
Daimler AG, Max Planck Institute, TU Darmstadt
32,000
Each object in an image has a list of attributes that depend on its category: for example, vehicles have visibility attributes and pedestrians have activity attributes.
MIT
https://github.com/kareem-metwaly/CAR-API
SECOND (SEmantic Change detectiON Dataset)
A change detection dataset that shows changes in aerial images including both natural and man-made geographical changes.
Wuhan University, 2020
4,662 pairs of aerial images.
Annotations focus on six land-cover classes: non-vegetated ground surface, trees, low vegetation, water, buildings and playgrounds, with 30 change categories.
Non-commercial research purposes only
https://captain-whu.github.io/SCD/
Image Classification Datasets
Description
Publisher and Release Date
# Images
Annotations and Classes
License
Link
CIFAR-100
Dataset with 100 classes grouped into 20 superclasses, and 600 images per class.
Canadian Institute for Advanced Research (CIFAR), 2009
60,000
Each image has a label for its class and superclass.
MIT
https://www.cs.toronto.edu/~kriz/cifar.html
ImageNet
Widely used benchmark for image classification with millions of images annotated according to the WordNet hierarchy.
IEEE, 2009
14,197,122
Image-level annotations provide a binary label for the presence or absence of an object class (e.g. a car). Object-level annotations provide a tight bounding box and class label for a specific object in the image.
Non-commercial research purposes only
https://image-net.org/index.php
MNIST
Dataset of handwritten digits normalized to fit in a 20x20 pixel box, centered in a 28x28 image.
Yann LeCun et al., 1998 (derived from NIST data)
70,000 (60,000 training + 10,000 test)
Each image is labeled with the digit shown in it.
Non-commercial research purposes only
http://yann.lecun.com/exdb/mnist/
Visual Storytelling
A dataset of unique photos collected into 50,000 stories (albums). It is intended to train algorithms in natural language storytelling: human-like understanding of grounded event structure and subjective expression.
NAACL, 2016
81,743 photos in 20,211 sequences
The stories were created via Amazon Mechanical Turk and have corresponding images and text written by human annotators, for example “the boy is playing soccer”, “up the soccer ball goes”.
Non-commercial research purposes only
https://visionandlanguage.net/VIST/dataset.html
Notable Image Datasets
Description
# Images
Link
Open Images Dataset
Dataset of images with labels and bounding boxes.
9 million
https://storage.googleapis.com/openimages/web/index.html
SVHN (Street View House Numbers)
Digit classification benchmark with images of printed digits on house number plates.
600,000
http://ufldl.stanford.edu/housenumbers/
AID (Aerial Image Dataset)
Large-scale aerial dataset using Google Earth imagery.
10,000
https://captain-whu.github.io/AID/
IQUAD (Interactive Question Answering Dataset)
Images based on a simulated photo-realistic environment with indoor scenes and interactive objects.
75,000 unique scene configurations
https://github.com/danielgordon10/thor-iqa-cvpr-2018
CUHK-QA
Dataset for natural language-based person search.
400 images of 360 people
https://github.com/vikshree/QA_PersonSearchLanguageData
FACTIFY
Dataset on multi-modal fact verification, containing images, textual claims, reference textual documents and images.
50,000 claims supported by 100,000 images
https://competitions.codalab.org/competitions/35153
FixMyPose
Dataset for automated pose correction. Shows characters performing a variety of movements in interior environments.
Synthetic images
https://competitions.codalab.org/competitions/35153
IAPR TC-12
Still natural images taken around the world, showing sports, actions, people, animals, cities, and landscapes.
20,000
https://www.imageclef.org/photodata
InstaCities1M
Dataset of social media images with associated text. Each image is associated with one of the 10 most populated English-speaking cities.
1 million images (100K for each city)
https://gombru.github.io/2018/08/01/InstaCities1M/
OSLD (Open Set Logo Detection Dataset)
Dataset of eCommerce product images with associated brand logo images for logo detection tasks.
20,000
https://github.com/mubastan/osld
RodoSol-ALPR
Images of vehicles captured during day and night by toll-booth cameras in Brazil.
20,000
https://github.com/raysonlaroca/rodosol-alpr-dataset/