Facial Landmarks: Use Cases, Datasets, and a Quick Tutorial

What Are Facial Landmarks?

Facial landmark detection is a computer vision task in which a model predicts key points, called landmarks, that mark regions of a human face: the eyes, nose, lips, and so on. It is a foundational task that supports other computer vision tasks, including head pose estimation, gaze direction estimation, facial gesture detection, and face swapping.

This is part of our series of articles about face recognition.

Common Use Cases for Facial Landmark Detection

Driver Monitoring

Driver fatigue causes many car accidents, and the emergency stopping capabilities of smart cars are not always effective. Monitoring drivers for signs of fatigue helps prevent accidents. For example, a computer vision model could process the video feed from an in-vehicle camera to detect signs of tiredness or distraction in the driver’s face, and trigger an alert if the driver is not paying sufficient attention to the road.

Built-in movement tracking systems and wearable ECG tracking devices can perform a similar function, but a computer vision approach is simpler and less intrusive. Neural networks can learn to identify sleepiness in drivers’ faces using facial landmark inputs. 

Alternatively, a MobileNetV2 architecture can detect driver fatigue in video streams without landmark detection, although training takes longer. CNN-based landmark detection can also simplify dataset labeling for training purposes. However, driver monitoring systems are constrained by inference speed and the capabilities of the mobile hardware they run on, which makes lightweight landmark detection algorithms the more practical choice.
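Many landmark-based fatigue monitors rely on the eye aspect ratio (EAR) introduced by Soukupová and Čech: the ratio of vertical to horizontal eye-landmark distances drops sharply when the eye closes, and a sustained low value signals drowsiness. Here is a minimal sketch, assuming the six-points-per-eye layout of the 68-point scheme (the sample coordinates below are illustrative, not real landmark data):

```python
import numpy as np

def eye_aspect_ratio(eye):
    """Eye aspect ratio from six (x, y) eye landmarks.

    The ratio of vertical to horizontal eye distances drops
    toward zero as the eye closes.
    """
    eye = np.asarray(eye, dtype=float)
    a = np.linalg.norm(eye[1] - eye[5])  # first vertical distance
    b = np.linalg.norm(eye[2] - eye[4])  # second vertical distance
    c = np.linalg.norm(eye[0] - eye[3])  # horizontal distance
    return (a + b) / (2.0 * c)

# open eye: tall relative to its width -> high EAR
open_eye = [(0, 0), (2, -2), (4, -2), (6, 0), (4, 2), (2, 2)]
# nearly closed eye: almost flat -> EAR near zero
closed_eye = [(0, 0), (2, -0.2), (4, -0.2), (6, 0), (4, 0.2), (2, 0.2)]
print(eye_aspect_ratio(open_eye))    # ~0.667
print(eye_aspect_ratio(closed_eye))  # ~0.067
```

In a real monitor, the EAR would be averaged over both eyes and compared against a threshold across several consecutive frames, so that a blink is not mistaken for drowsiness.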

Animation and Reenactments

Face animation algorithms use facial landmark detection to generate animated characters based on a limited set of images with annotated facial landmarks. These algorithms can produce follow-up frames for film and gaming applications. 

Another application is editing facial movements in dubbed films to match the new audio track. A facial landmark detection algorithm can replace the mouth region with an adapted 3D model. 

For example, the Pix2PixHD neural network is useful for lip-syncing tasks, while the DeFA algorithm can construct an entire 3D face mesh. The Dlib library can be used to synthesize face representations using boundaries and facial landmarks. FReeNet can produce reenactments between different people using a Unified Landmark Converter module during training. The PFLD algorithm can extract facial landmarks from source and target images, and a GAN can use these to generate new images with more facial details. 

Facial Recognition

This use case involves algorithms performing face verification, recognition, and clustering tasks (grouping similar faces). The best algorithms use face preprocessing coupled with face alignment to improve facial recognition. These algorithms often use multi-task cascaded convolutional networks (MTCNN) to detect faces and localize landmarks.
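Face alignment typically means rotating the face so the eyes lie on a horizontal line before recognition. A minimal sketch of computing the required rotation angle from eye landmarks (the function name and the use of eye centroids are illustrative choices, not a specific library’s API):

```python
import numpy as np

def alignment_angle(left_eye_pts, right_eye_pts):
    """Rotation angle (degrees) that would level the eyes.

    Each argument is a sequence of (x, y) landmark coordinates for
    one eye; the centroid of each set is used as the eye position.
    """
    left = np.mean(np.asarray(left_eye_pts, dtype=float), axis=0)
    right = np.mean(np.asarray(right_eye_pts, dtype=float), axis=0)
    dx, dy = right[0] - left[0], right[1] - left[1]
    return float(np.degrees(np.arctan2(dy, dx)))

# eyes already level -> no rotation needed
print(alignment_angle([(90, 100), (110, 100)], [(190, 100), (210, 100)]))  # 0.0
```

The image can then be rotated by the negative of this angle (for example with cv2.getRotationMatrix2D and cv2.warpAffine) so that downstream recognition sees an upright face.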

Facial landmark detection can also help identify emotions, since emotional expressions manifest as lip, eye, and eyebrow movements.

Facial Landmark Keypoint Density

Standard facial datasets provide annotations of 68 (x, y) coordinate pairs that mark the most important points on a person’s face. Dlib is a commonly used open source library that can detect these landmarks in an image.


Source: ResearchGate
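The 68 points follow a fixed ordering, so each facial region can be recovered by index alone. A sketch of the conventional grouping (these ranges match the widely used 68-point annotation scheme; the helper function itself is illustrative):

```python
import numpy as np

# conventional index ranges (0-based, end-exclusive) for the
# 68-point annotation scheme used by 300-W and dlib's predictor
LANDMARK_68_REGIONS = {
    "jaw": (0, 17),
    "right_eyebrow": (17, 22),
    "left_eyebrow": (22, 27),
    "nose": (27, 36),
    "right_eye": (36, 42),
    "left_eye": (42, 48),
    "mouth": (48, 68),
}

def region_points(shape, region):
    """Slice a (68, 2) landmark array down to one facial region."""
    start, end = LANDMARK_68_REGIONS[region]
    return shape[start:end]

# the regions partition all 68 landmarks with no gaps or overlaps
total = sum(end - start for start, end in LANDMARK_68_REGIONS.values())
print(total)  # 68
```

Given the (68, 2) array produced by a detector, region_points(shape, "left_eye") returns the six points needed for eye-level measurements such as blink detection.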

However, this system of 68 facial landmarks is too sparse for many purposes. Newer datasets and algorithms leverage a dense “face mesh” of 468 3D face landmarks. This approach uses machine learning to infer the 3D facial surface from a single camera input, without a dedicated depth sensor. Google’s MediaPipe Face Mesh solution uses this dense landmark scheme.

Source: Google
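Dense mesh models such as MediaPipe Face Mesh report each landmark as (x, y, z) coordinates normalized to [0, 1] by the image width and height, so converting them to pixel coordinates is a simple scaling step. A sketch (the function name is our own, not part of the MediaPipe API):

```python
def denormalize_landmarks(landmarks, img_w, img_h):
    """Convert normalized (x, y, z) landmarks to pixel coordinates.

    x and y are assumed to be scaled to [0, 1] by image width and
    height; z (relative depth) is passed through unchanged.
    """
    return [(int(x * img_w), int(y * img_h), z) for x, y, z in landmarks]

pts = denormalize_landmarks([(0.5, 0.25, -0.01)], img_w=640, img_h=480)
print(pts)  # [(320, 120, -0.01)]
```

The same conversion works for all 468 mesh points at once, after which they can be drawn with the usual OpenCV primitives.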

Facial Landmark Datasets

Facial landmark algorithms require datasets of pre-annotated facial landmarks they can learn from. Here are a few commonly used facial landmark datasets.


300-W is a face dataset consisting of 300 indoor and 300 outdoor “in-the-wild” images. The images cover a variety of identities, facial expressions, lighting conditions, poses, occlusions, and face sizes.

Compared to other datasets, 300-W has a relatively high proportion of partially occluded images and covers additional facial expressions. Images were annotated with 68 point markers using a semi-automatic method, and were carefully selected to represent challenging samples of natural face instances under unrestricted conditions.

Get the dataset


300 Videos in the Wild (300-VW) is a dataset for evaluating landmark tracking algorithms. It contains 114 videos of faces, each approximately one minute long at 25-30 fps. All frames are annotated with the same 68 markers used in the 300-W dataset.

Get the dataset


Annotated Facial Landmarks in the Wild (AFLW) is a large collection of annotated facial images sourced from Flickr. The images exhibit a wide variety of poses, facial expressions, ethnicities, ages, and genders, as well as diverse imaging conditions. In total, approximately 25,000 faces are annotated, with up to 21 landmarks per image.

Get the dataset

Detecting Facial Landmarks with dlib, OpenCV, and Python

This section is abbreviated from the full tutorial by Adrian Rosebrock. We’ll show how to use the landmark detector that comes with the dlib library to identify the coordinates of 68 landmarks on human faces. Identifying a dense mesh of facial landmarks, like in the Google MediaPipe example above, is beyond the scope of this section.


Step 1: Detect Faces in Your Images

Create a new Python file with the following code:

# import required libraries
from imutils import face_utils
import numpy as np
import argparse
import imutils
import dlib
import cv2

# parse command-line arguments:
# --shape-predictor is the path to dlib's trained landmark model,
# --image is the path to the input image
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--shape-predictor", required=True,
    help="path to facial landmark predictor")
ap.add_argument("-i", "--image", required=True,
    help="path to input image")
args = vars(ap.parse_args())

# initialize dlib's built-in face detector
detector = dlib.get_frontal_face_detector()

# initialize the facial landmark predictor
predictor = dlib.shape_predictor(args["shape_predictor"])

# load the input image, resize it, and convert it to grayscale
image = cv2.imread(args["image"])
image = imutils.resize(image, width=500)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# detect faces in the grayscale image
rects = detector(gray, 1)

In this code we resize the image to a width of 500 pixels (downsampling it to allow faster processing), convert it to grayscale, and run it through the built-in face detector in dlib to identify the bounding box of each face.

Step 2: Detect Facial Landmarks

The following code loops over each face detection and performs facial landmark detection:

# loop over the face detections
for (i, rect) in enumerate(rects):
    # predict facial landmarks in the image and convert to a NumPy array
    shape = predictor(gray, rect)
    shape = face_utils.shape_to_np(shape)

    # convert dlib's rectangle to an OpenCV-style bounding box
    # and draw it on the image
    (x, y, w, h) = face_utils.rect_to_bb(rect)
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

    # show the face number
    cv2.putText(image, "Face #{}".format(i + 1), (x - 10, y - 10),
        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

    # draw each facial landmark as a small filled circle
    for (x, y) in shape:
        cv2.circle(image, (x, y), 1, (0, 0, 255), -1)

# show the resulting output image
cv2.imshow("Output", image)
cv2.waitKey(0)


The dlib face landmark detector returns a shape object containing the 68 (x, y) coordinates of the facial landmarks. We then use the face_utils.shape_to_np utility function to convert the output into a NumPy array, which makes it easy to draw the detected landmarks on the image for visualization.
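Under the hood, face_utils.shape_to_np does little more than read each of the 68 points off the dlib shape object. A rough equivalent is sketched below (the fake classes exist only to demonstrate the conversion without a real detector):

```python
import numpy as np

def shape_to_np(shape, num_points=68):
    """Convert a dlib full_object_detection into a (num_points, 2) array.

    Each landmark is read with shape.part(i), which exposes
    integer .x and .y pixel coordinates.
    """
    coords = np.zeros((num_points, 2), dtype=int)
    for i in range(num_points):
        coords[i] = (shape.part(i).x, shape.part(i).y)
    return coords

# illustrative stand-ins for dlib's point and shape objects
class FakePoint:
    def __init__(self, x, y):
        self.x, self.y = x, y

class FakeShape:
    def part(self, i):
        return FakePoint(i, i * 2)

arr = shape_to_np(FakeShape())
print(arr.shape)  # (68, 2)
```

Once the landmarks are in a plain NumPy array, they can be sliced, averaged, or measured like any other coordinate data, which is what makes the conversion so convenient.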