A Friendly Guide to Public Indoor Environment Datasets

Understanding images of indoor home environments is a fundamental task for many computer vision applications. One of the challenges in advancing computer vision is the availability of suitable datasets on which models can be trained. Particularly useful are public indoor datasets: "indoor" meaning interior spaces such as homes, offices, and other buildings, and "public" meaning the datasets are openly released and can be used freely for research purposes.

Generally, training datasets can be split into two broad categories: manually captured datasets and synthetically generated datasets. Manually captured datasets consist of data collected from the real world and then annotated; synthetically generated datasets consist of data created by algorithms and computer graphics to mimic the real world. Depending on their application or research needs, computer vision researchers might be interested in both, and, especially at the outset of a project, they may spend a lot of time searching for high-quality public indoor datasets that can jumpstart their efforts.

In this post, we provide a quick guide to some of the most popular high-quality public datasets for training computer vision systems to understand indoor environments. We'll touch on the key characteristics, strengths, and weaknesses of each. Over time, Datagen plans to contribute our own publicly available Simulated Datasets to the ecosystem. These are some of the datasets we look to as examples:

Manual Datasets 

Replica 

  • Affiliation – Facebook, Georgia Tech, Simon Fraser University
  • Released – June 2019 
  • Description – Dataset of 18 highly photo-realistic 3D indoor scene reconstructions at room and building scale.
  • Scenes – 18 
  • Rooms – 35 
  • Frames – The dataset consists of 3D reconstructions rather than fixed images; frames can be rendered from the reconstructions as needed (see the rendering sketch after this list).
  • Platform – custom-built RGB-D capture rig with an IR projector
  • Available Labels –  RGB, depth, semantic instance, and semantic class segmentation 
  • Bottom Line – The Replica dataset is very high quality but small in scale. It is free for non-commercial uses such as research and education.
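
A natural question is how to turn Replica's reconstructions into training frames. Below is a minimal sketch using the habitat-sim renderer (the engine Replica is designed to work with); the scene path is illustrative, and the exact API varies somewhat between habitat-sim versions, so treat the details as assumptions.

```python
# Minimal sketch: rendering one RGB frame from a Replica scene via habitat-sim.
# Assumes habitat-sim is installed; SCENE is a placeholder path, adjust it to
# wherever you extracted the dataset.
import habitat_sim

SCENE = "Replica-Dataset/room_0/habitat/mesh_semantic.ply"  # illustrative path

backend_cfg = habitat_sim.SimulatorConfiguration()
backend_cfg.scene_id = SCENE

# One color camera attached to a default agent.
rgb_spec = habitat_sim.CameraSensorSpec()
rgb_spec.uuid = "rgb"
rgb_spec.sensor_type = habitat_sim.SensorType.COLOR
rgb_spec.resolution = [480, 640]  # height, width

agent_cfg = habitat_sim.agent.AgentConfiguration(sensor_specifications=[rgb_spec])
sim = habitat_sim.Simulator(habitat_sim.Configuration(backend_cfg, [agent_cfg]))

# Render an observation from the agent's current pose (an H x W x 4 RGBA array).
frame = sim.get_sensor_observations()["rgb"]
print(frame.shape)
sim.close()
```

Moving the agent and re-rendering in a loop turns the static reconstruction into as many posed frames as you need.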

ScanNet

  • Affiliation – Stanford University, Princeton University, Technical University of Munich
  • Released –  February 2017
  • Description – ScanNet is an RGB-D video dataset containing 2.5 million views in more than 1,500 scans.
  • Scenes – 1,513
  • Rooms – 707
  • Frames – 2.5 Million
  • Platform – Structure Sensor + iPad
  • Available Labels – 3D camera poses, surface reconstructions, and instance-level semantic segmentations.
  • Image Resolution – Depth at 640×480 and color at 1296×968 pixels (see the alignment sketch after this list)
  • Bottom Line – The semantic segmentation method used is limited in accuracy. Researchers can use the dataset freely for non-commercial research and educational purposes only.
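
Because the color and depth streams are captured at different resolutions, RGB-D pipelines built on ScanNet usually resample one stream to match the other. Here is a minimal sketch, assuming the frames have already been exported to ordinary image files (the paths below are placeholders):

```python
# Minimal sketch: pairing a ScanNet color frame with its depth frame.
# Assumes frames were already exported from the raw .sens stream to image
# files; the paths below are placeholders, not the dataset's actual layout.
import numpy as np
from PIL import Image

color = Image.open("scene0000_00/color/000000.jpg")  # 1296 x 968 RGB
depth = Image.open("scene0000_00/depth/000000.png")  # 640 x 480, 16-bit

# Downsample color to the depth resolution so pixels correspond 1:1.
color_small = color.resize(depth.size, Image.BILINEAR)

# ScanNet stores depth as uint16 millimeters; convert to meters.
depth_m = np.asarray(depth, dtype=np.float32) / 1000.0

# Stack into a 4-channel RGB-D array.
rgbd = np.dstack([np.asarray(color_small, dtype=np.float32) / 255.0, depth_m])
print(rgbd.shape)  # (480, 640, 4)
```

Strictly speaking, the color and depth cameras have different intrinsics, so for metric work you would reproject using the calibration data ScanNet provides rather than rely on a plain resize.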


Matterport3D

  • Affiliation – Stanford University, Princeton University, Technical University of Munich
  • Released – September 2017
  • Description – Matterport3D is a large-scale RGB-D dataset containing 10,800 panoramic views constructed from 194,400 RGB-D images of 90 building-scale scenes.
  • Scenes – 90 
  • Rooms – 2,056
  • Frames – 194,400
  • Platform – Matterport camera
  • Available Labels – surface reconstructions, camera poses, and 2D and 3D semantic segmentation.
  • Image Resolution – 1280×1024
  • Bottom Line – The quality of the data is relatively high, but there are some issues with the geometry and lighting. Additionally, the segmentation is done similarly to the method used by ScanNet and has the same limits in terms of accuracy and resolution. Matterport3D is free for non-commercial academic use.

SceneNN

  • Affiliation – University of Tokyo, Singapore University of Technology and Design, Deakin University, George Mason University, The Hong Kong University of Science and Technology
  • Released – 2016
  • Description – An RGB-D scene dataset consisting of more than 100 indoor scenes captured in a variety of places, e.g., offices, dormitories, classrooms, and pantries.
  • Scenes – 100
  • Rooms – 95
  • Platform – Asus Xtion
  • Bottom Line – The quality and photorealism of the dataset are very high, but the scale is limited by the amount of manual effort required. The dataset is free for educational and research purposes.


Synthetic Datasets

Structured3D 

  • Affiliation – ShanghaiTech University
  • Released – August 2019
  • Description – Structured3D is a large-scale photo-realistic dataset containing 3.5K house designs created by professional designers, with a variety of ground-truth 3D structure annotations and photo-realistic 2D renderings.
  • Scenes – 3500
  • Rooms – 21,835
  • Frames – 196,000
  • Scene Design Type – Professional
  • Available Labels – Rich ground-truth 3D structure annotations, i.e., primitives such as junctions, lines, and planes plus their relationships (see the parsing sketch after this list)
  • Image Resolution – 720×1280
  • Bottom Line – A large and very high-quality dataset. Additionally, like the other synthetic datasets, it has perfect ground truth. The dataset can be used freely only for non-commercial research and educational purposes.
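
To make "structure annotations" concrete, here is a small sketch that reads one scene's annotation file and counts its primitives. The file name and JSON keys reflect our understanding of the dataset's published format (junctions, lines, and planes); verify them against the official Structured3D repository before relying on them.

```python
# Minimal sketch: inspecting Structured3D's per-scene structure annotations.
# File name and JSON keys are assumptions based on the dataset's documentation;
# check the official repository for the authoritative schema.
import json

with open("scene_00000/annotation_3d.json") as f:
    ann = json.load(f)

junctions = ann["junctions"]  # 3D corner points
lines = ann["lines"]          # edges connecting pairs of junctions
planes = ann["planes"]        # wall/floor/ceiling plane parameters

print(f"{len(junctions)} junctions, {len(lines)} lines, {len(planes)} planes")
```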

InteriorNet 

  • Affiliation – Imperial College London, Kujiale.
  • Released – 2018
  • Description – An end-to-end pipeline for rendering an RGB-D-inertial benchmark for large-scale interior scene understanding and mapping.
  • Scenes – 10,000
  • Rooms – 1.7 Million
  • Frames – 5 Million 
  • Scene Design Type – Professional
  • Available Labels – RGB, depth, semantic instance, and semantic class segmentation.
  • Image Resolution – 640×480 
  • Bottom Line – Very realistic, but it struggles to capture real spaces with all their imperfections due to use, clutter, and semantic variety. The dataset can be used freely only for non-commercial research and educational purposes.

SceneNet 

  • Affiliation – University of Cambridge
  • Released – November 2015
  • Description – A dataset of annotated 3D scenes from which virtually unlimited ground-truth training data can be generated.
  • Scenes – 57
  • Rooms – 1000
  • Frames – The dataset consists of 3D scene models rather than fixed images; as with Replica above, frames can be generated from them as needed.
  • Scene Design Type – Random/Manual
  • Available Labels – Per-pixel semantic labeling
  • Bottom Line – SceneNet features very detailed segmentation and perfect ground truth. The dataset was released under a Creative Commons license restricted to research purposes.

SceneNet RGB-D 

  • Affiliation – Imperial College London
  • Released – 2016 
  • Description – Expanding upon the previous work of SceneNet to enable large scale photorealistic rendering of indoor scene trajectories. 
  • Scenes – 16,895
  • Rooms – 57
  • Frames – 5 Million
  • Scene Design Type – Random
  • Available Labels – Pixel-perfect ground truth for scene understanding problems such as semantic segmentation, instance segmentation, object detection, optical flow, depth estimation, camera pose estimation, and 3D reconstruction.
  • Image Resolution – 320×240
  • Bottom Line – An expanded and more advanced version of SceneNet. Each layout also has randomized lighting, camera trajectories, and textures, which enables a continuous stream of unseen training examples (see the randomization sketch after this list). Like SceneNet, this dataset was released under a Creative Commons license restricted to research purposes.
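
The randomization idea is easy to sketch: each training example is drawn from a distribution over layouts, lighting, textures, and camera poses, so a training loop effectively never sees the same render twice. The sketch below is purely illustrative; every name in it is ours, and render_frame stands in for whatever renderer you attach.

```python
# Illustrative sketch of the scene randomization that makes synthetic datasets
# like SceneNet RGB-D an effectively unlimited source of training examples.
# All names are hypothetical; replace render_frame with a real renderer.
import random

LAYOUTS = ["bedroom_01", "kitchen_03", "office_07"]  # placeholder layout ids

def sample_render_config(rng: random.Random) -> dict:
    """Draw one randomized scene configuration."""
    return {
        "layout": rng.choice(LAYOUTS),
        "light_intensity": rng.uniform(0.3, 1.5),
        "light_color_temp_k": rng.uniform(2700, 6500),
        "texture_seed": rng.randrange(2**31),
        "camera_height_m": rng.uniform(1.0, 1.8),
    }

def render_frame(config: dict) -> dict:
    # Placeholder: a real pipeline would hand config to a renderer and get
    # back an RGB frame plus pixel-perfect ground truth.
    return config

rng = random.Random(0)
for _ in range(3):
    print(render_frame(sample_render_config(rng)))
```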

SUNCG

  • Affiliation – Princeton University
  • Released – November 2016
  • Description – This dataset is not photorealistic but focuses on occluded surfaces and achieving accurate geometric representation of the objects in the image.
  • Scenes – 45,622
  • Rooms – 404,058
  • Frames – The data is a 3D house simulation. There are no frames per se, rather frames can be generated from the simulation.
  • Scene Design Type – Manual
  • Available Labels – The dataset introduces the semantic scene completion network (SSCNet), an end-to-end 3D convolutional network that takes a single depth image as input and simultaneously outputs occupancy and semantic labels for all voxels in the camera view frustum (see the sketch after this list).
  • Bottom Line – Large scale and great at dealing with occluded surfaces. However, the scenes lack photorealistic appearance and are often semantically overly simplistic. This dataset is free for educational and research purposes. 
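
To make the input/output contract of semantic scene completion concrete: the network consumes a voxelized view of a single depth image and predicts, for every voxel in the view frustum, whether it is occupied and which class it belongs to. The sketch below is a toy PyTorch stand-in with made-up sizes, not the authors' SSCNet architecture.

```python
# Toy sketch of the semantic-scene-completion input/output contract. This is
# NOT the SSCNet architecture from the paper; grid size and channels are made up.
import torch
import torch.nn as nn

NUM_CLASSES = 12  # e.g., 11 semantic classes plus an "empty" class

class ToySSC(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv3d(16, NUM_CLASSES, kernel_size=3, padding=1),
        )

    def forward(self, voxels: torch.Tensor) -> torch.Tensor:
        # voxels: (batch, 1, D, H, W) occupancy grid derived from a depth image.
        # Returns per-voxel class logits; predicting the "empty" class folds
        # occupancy and semantics into a single output.
        return self.net(voxels)

model = ToySSC()
dummy = torch.zeros(1, 1, 32, 32, 32)  # voxelized view frustum, made-up size
print(model(dummy).shape)              # torch.Size([1, 12, 32, 32, 32])
```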

As you can tell, there are numerous options when it comes to datasets of indoor environments, with significant nuances and differences among them. One thing to note is the trade-off between scale and quality, which is worth keeping top of mind when exploring datasets. Synthetic data has a huge advantage when it comes to scale, since manually collecting and annotating data is labor-intensive and very costly to scale. But, historically, synthetic data has struggled to retain the photorealism of manual datasets without sacrificing quality for quantity (a problem that Datagen is committed to solving).

Another key differentiator between the datasets is the type and quality of their annotations: certain annotation types are irrelevant for key applications, and low-quality annotations can significantly impact application performance.

Choosing a dataset can be tough, and you may spend a lot of time researching your options. We hope this resource expedites your search. In future posts, we'll provide similar summaries of public datasets for other applications, such as hands, faces, and bodies.

Read our survey on Synthetic Data: The Key to Production AI in 2022