Review

A Review of Deep Learning-Based Contactless Heart Rate Measurement Methods

Department of Electrical and Computer Engineering, The University of Texas at Dallas, Richardson, TX 75080, USA
* Author to whom correspondence should be addressed.
Sensors 2021, 21(11), 3719; https://doi.org/10.3390/s21113719
Submission received: 29 April 2021 / Revised: 18 May 2021 / Accepted: 24 May 2021 / Published: 27 May 2021
(This article belongs to the Special Issue Wearable and Unobtrusive Technologies for Healthcare Monitoring)

Abstract

The interest in contactless or remote heart rate measurement has been growing steadily in healthcare and sports applications. Contactless methods rely on a video camera and image processing algorithms. Recently, deep learning methods have been used to improve the performance of conventional contactless heart rate measurement methods. After a review of the related literature, this paper compares the deep learning methods whose codes are publicly available. The public domain UBFC dataset is used to compare the performance of these deep learning methods for heart rate measurement. The results obtained show that the deep learning method PhysNet generates the best heart rate measurement outcome among these methods, with a mean absolute error value of 2.57 beats per minute and a mean square error value of 7.56 beats per minute.

1. Introduction

Physiological measurements are widely used to determine a person’s health condition [1,2,3,4,5,6]. Photoplethysmography (PPG) is a physiological measurement method used to detect volumetric changes in the blood in vessels beneath the skin [1]. Medical devices based on PPG have been introduced to obtain various physiological measurements, including heart rate (HR), respiratory rate, heart rate variability (HRV), oxyhemoglobin saturation, and blood pressure [2,3,4,5,6]. Due to its low cost and non-invasive nature, PPG is utilized in many devices such as finger pulse oximeters, sports bands, and wearable sensors.
PPG-based physiological measurements can be categorized into two types: contact-based and contactless. Several survey articles have appeared in the literature on contact-based PPG methods as well as on contactless PPG methods. Contact-based methods deploy a light source and a photodetector. On the other hand, contactless methods deploy a video camera to measure the PPG signal. The previous survey articles mostly addressed conventional signal processing approaches. The recently developed deep learning-based methods have shown more promising results compared to the conventional methods. The focus of this review paper is thus placed on deep learning-based contactless methods for heart rate measurement.
A common practice in the medical field to measure the heart rate is electrocardiography (ECG) [7,8], in which voltage changes caused by the heart’s electrical activity are detected using electrodes placed on the skin. In general, ECG provides a more reliable heart rate measurement than PPG [9,10]; hence, ECG is often used as the reference when evaluating PPG methods [7,8,9,10]. Typically, the 10 electrodes of an ECG machine are attached to different parts of the body, including the wrists and ankles. In contrast, PPG-based medical devices come in different sensor shapes placed on different parts of the body, such as rings, earpieces, and bands [7,11,12,13,14,15,16]; they all use a light source and a photodetector, together with signal processing of the optical signal reflected from the skin, to obtain the PPG signal [1], see Figure 1.
Early research in this field concentrated on obtaining the PPG signal and ways to perform pulse wave analysis [17]. A comparison between ECG and PPG is discussed in [18,19]. There are survey papers covering different PPG applications that involve the use of wearable devices [20,21], atrial fibrillation detection [22], and blood pressure monitoring [23]. Papers have also been published which used deep learning for contact-based PPG, e.g., [24,25,26,27]. The previous survey papers on contact-based PPG methods are listed in Table 1.
Although contact-based PPG methods are non-invasive, they can be restrictive due to the requirement of their contact with the skin. Contact-based methods can be irritating or distracting in some situations, for example, for newborn infants [28,29,30,31]. When a less restrictive approach is desired, contactless PPG methods are considered. The use of contactless PPG methods or remote PPG (rPPG) methods has been growing in recent years [32,33,34,35,36].
Contactless PPG methods usually utilize a video camera to capture images, which are then processed by image processing algorithms [32,33,34,35,36]. The physics of rPPG is similar to that of contact-based PPG: the light-emitting diode of contact-based PPG is replaced with ambient illuminance, and the photodetector is replaced with a video camera, see Figure 2. The light reaching the camera sensor can be separated into static (DC) and dynamic (AC) components. The DC component corresponds to static elements including tissue, bone, and static blood, while the AC component corresponds to the variations in light absorption due to arterial blood volume changes. Figure 3 illustrates the common image processing steps of the rPPG framework. In the signal extraction part of the framework, a region of interest (ROI), normally on the face, is extracted.
In earlier studies, video images from motionless faces were considered [37,38,39]. Several papers relate to exercising situations [40,41,42,43,44]. ROI detection and ROI tracking constitute two major image processing parts of the framework. The Viola and Jones (VJ) algorithm [45] is often used to detect face areas [46,47,48,49]. As an example of prior work on skin detection, a neural network classifier was used to detect skin-like pixels in [50]. In the signal estimation part, a bandpass filter is applied to eliminate undesired frequency components. A common choice for the frequency band is [0.7 Hz, 4 Hz], which corresponds to an HR between 42 and 240 beats per minute (bpm) [50,51,52,53]. To separate a signal into uncorrelated components and to reduce dimensionality, independent component analysis (ICA) was utilized in [54,55,56,57] and principal component analysis (PCA) was utilized in [38,39,40,58,59]. In the heart rate estimation module, the dimensionality-reduced data are mapped to a heart rate estimate using frequency analysis or peak detection methods (a minimal sketch of this step is given below). The survey papers on rPPG methods that have already appeared in the literature are listed in Table 2. These survey papers also provide comparisons with contact-based PPG methods.
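To make the signal estimation and heart rate estimation steps concrete, the following is a minimal sketch, not taken from any of the surveyed methods, of how a raw rPPG trace (for example, the mean green-channel value of the ROI per frame) can be bandpass filtered to the [0.7 Hz, 4 Hz] band and converted to an HR estimate via FFT peak picking. The function name, the assumed 30 fps frame rate, and the filter order are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def estimate_hr_bpm(raw_trace, fs=30.0, band=(0.7, 4.0)):
    """Bandpass-filter a raw rPPG trace and estimate HR from the dominant FFT peak.

    raw_trace : 1D array, e.g., mean green-channel intensity of the ROI per frame.
    fs        : camera frame rate in Hz (assumed here to be 30 fps).
    """
    # 3rd-order Butterworth bandpass covering 42-240 bpm
    b, a = butter(3, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    pulse = filtfilt(b, a, raw_trace - np.mean(raw_trace))

    # Frequency analysis: pick the dominant spectral peak inside the band
    spectrum = np.abs(np.fft.rfft(pulse))
    freqs = np.fft.rfftfreq(len(pulse), d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    peak_freq = freqs[in_band][np.argmax(spectrum[in_band])]
    return 60.0 * peak_freq  # beats per minute
```

Peak detection on the filtered pulse signal (counting inter-beat intervals) is a common alternative to the spectral peak used here.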
There are challenges in rPPG which include subject motion and ambient lighting variations [60,61,62]. Due to the success of deep learning in many computer vision and speech processing applications [63,64,65], deep learning methods have been considered for rPPG to deal with its challenges, for example, [44,49]. In deep learning methods, feature extraction and classification are carried out together within one network structure. The required datasets for deep learning models are collected using RGB cameras. As noted earlier, the focus of this review is on deep learning-based contactless heart rate measurement methods.
Table 2. Survey papers on conventional contactless methods previously reported in the literature.

Emphasis | Ref | Year | Task
Contactless | [66] | 2018 | Provides typical components of rPPG and notes the main challenges; groups published studies by their choice of algorithm.
Contactless | [67] | 2012 | Covers three main stages of monitoring physiological measurements based on photoplethysmographic imaging: image acquisition, data collection, and parameter extraction.
Contactless and contact | [68] | 2016 | Reviews contact-based PPG and its limitations; introduces research activities on wearable and non-contact PPG.
Contactless and contact | [69] | 2009 | Reviews photoplethysmographic measurement techniques from contact sensing placement to non-contact sensing placement, and from point measurement to imaging measurement.
Contactless, newborn infants | [28] | 2013 | Investigates the feasibility of camera-based PPG for contactless HR monitoring in newborn infants with ambient light.
Contactless, newborn infants | [30] | 2016 | Comparative analysis to benchmark state-of-the-art video- and image-guided noninvasive pulse rate (PR) detection.
Contactless and contact | [70] | 2017 | Heart rate measurement using facial videos based on photoplethysmography and ballistocardiography.
Contactless and contact | [71] | 2014 | Covers methods of non-contact HR measurement with capacitively coupled ECG, Doppler radar, optical vibrocardiography, thermal imaging, RGB camera, and HR from speech.
Contactless (RR) and contact | [72] | 2011 | Discusses respiration monitoring approaches (both contact and non-contact).
Contactless, newborn infants | [31] | 2019 | Addresses HR measurement in babies.
Contactless | [73] | 2019 | Examines challenges associated with illumination variations and motion artifacts.
Contactless | [74] | 2017 | Covers HR measurement techniques including camera-based photoplethysmography, reflectance pulse oximetry, laser Doppler technology, capacitive sensors, piezoelectric sensors, electromyography, and a digital stethoscope.
Contactless, main challenges | [75] | 2015 | Covers issues in motion and ambient lighting tolerance, image optimization (including multi-spectral imaging), and region of interest optimization.
In essence, this paper provides a review of combinations of conventional and deep learning rPPG methods as well as end-to-end deep learning-based rPPG methods for heart rate measurement. More specifically, the deep learning-based methods for heart rate measurement are grouped into two main categories, and the ones whose codes are publicly available are compared by examining the same public domain dataset.

2. Contactless PPG Methods Based on Deep Learning

Previous works on deep learning-based contactless HR methods can be divided into two groups: combinations of conventional and deep learning methods, and end-to-end deep learning methods. In what follows, a review of these papers is provided. Later, in Section 3, the end-to-end deep learning methods whose codes are publicly available are compared by applying them to the same public domain dataset.

2.1. Combination of Conventional and Deep Learning Methods

Li et al. 2021 [76] presented multi-modal machine learning techniques related to heart diseases. As Figure 3 suggests, one or more components of the contactless HR framework can be implemented using deep learning. These components include ROI detection and tracking, signal estimation, and HR estimation.

2.1.1. Deep Learning Methods for Signal Estimation

Qiu et al. 2018 [77] developed a method called EVM-CNN. The pipeline of this method consists of three modules: face detection and tracking, feature extraction, and HR estimation. In the face detection and tracking module, 68 facial landmarks inside a bounding box are detected using a regression approach based on local binary features [78]. Then, an ROI defined by eight points around the central part of the face is automatically extracted and passed to the next module. In the feature extraction module, spatial decomposition and temporal filtering are applied to obtain so-called feature images. The sequence of ROIs is down-sampled into several bands, and the lowest bands are reshaped and concatenated into a new image. The three channels of this new image are transformed into the frequency domain using the fast Fourier transform (FFT), where the unwanted frequency bands are removed; the retained bands are then transformed back to the time domain by an inverse FFT and merged into a feature image. In the HR estimation module, a convolutional neural network (CNN) is used to estimate HR from the feature image. The CNN used in this method has a simple structure with several convolution layers and uses depth-wise convolution and point-wise convolution to reduce the computational burden and model size.
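As a hedged illustration of the depth-wise plus point-wise (depthwise-separable) convolutions mentioned above, the following PyTorch sketch shows one such block. The channel sizes, kernel size, and class name are placeholders, not the actual EVM-CNN configuration.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """One depthwise-separable convolution block: a per-channel (depth-wise)
    convolution followed by a 1x1 (point-wise) convolution, which reduces the
    parameter count relative to a standard convolution."""

    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        # groups=in_ch makes the convolution operate on each channel separately
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.pointwise(self.depthwise(x)))

# Example: a batch of 4 feature images of size 3 x 64 x 64 mapped to 32 channels
block = DepthwiseSeparableConv(3, 32)
out = block(torch.randn(4, 3, 64, 64))  # -> shape (4, 32, 64, 64)
```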
As shown in Figure 4, in this method, the first two modules which are face detection/tracking and feature extraction are conventional rPPG approaches, whereas the HR estimation module uses deep learning to improve performance for HR estimation.

2.1.2. Deep Learning Methods for Signal Extraction

Luguev et al. 2020 [79] proposed a framework that uses deep spatiotemporal networks for contactless HRV measurement from raw facial videos. In this method, a 3D convolutional neural network is used for pulse signal extraction, while conventional signal processing methods, including frequency domain analysis and peak detection, are used to compute the HRV features. More specifically, raw video sequences are inputted into the 3D-CNN without any skin segmentation. Several convolution operations, with rectified linear units (ReLU) as activation functions, are used together with pooling operations to produce spatiotemporal features. In the end, a pulse signal is generated by a channel-wise convolution operation. The mean absolute error is used as the loss function of the model.
Paracchini et al. 2020 [80] implemented rPPG based on a single-photon avalanche diode (SPAD) camera. This method combines deep learning and conventional signal processing to extract and examine the pulse signal. The main advantage of using a SPAD camera is its superior performance in dark environments compared with CCD or CMOS cameras. The framework is shown in Figure 5. The signal extraction part has two components: facial skin detection and signal creation. A U-shaped network is used to perform skin detection over all visible facial skin rather than a specific skin area, and its output is a binary skin mask. A raw pulse signal is then obtained by averaging the intensity values of all the pixels inside the binary mask. Signal estimation is achieved by filtering, FFT, and peak detection. The experimental results include HR, respiration rate, and tachogram measurements.
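The step of turning a binary skin mask into a raw pulse sample, averaging the intensity of all pixels inside the mask frame by frame, can be sketched as follows. This is a generic illustration rather than the exact implementation in [80]; the frame and mask arrays and the function name are assumptions.

```python
import numpy as np

def raw_pulse_from_masks(frames, masks):
    """Average the pixel intensities inside a binary skin mask, frame by frame.

    frames : array of shape (T, H, W) with single-channel frames.
    masks  : array of shape (T, H, W) with 1 for skin pixels and 0 elsewhere.
    Returns a raw pulse signal of length T.
    """
    pulse = np.empty(len(frames))
    for t, (frame, mask) in enumerate(zip(frames, masks)):
        skin_pixels = frame[mask.astype(bool)]
        # Guard against empty masks (e.g., the face left the field of view)
        pulse[t] = skin_pixels.mean() if skin_pixels.size else np.nan
    return pulse
```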
In another work, Zhan et al. 2020 [81] focused on understanding CNN-based PPG signal extraction. Four questions were addressed: (1) Does the CNN learn PPG, BCG, or a combination of both? (2) Can a finger oximeter be directly used as a reference for CNN training? (3) Does the CNN learn the spatial context information of the measured skin? (4) Is the CNN robust to motion, and how is this motion robustness achieved? To answer these four questions, a CNN-PPG framework and four experiments were designed. The results of these experiments indicate that the availability of multiple convolutional kernels is necessary for a CNN to arrive at a flexible channel combination through the spatial operation, but this may not provide the same motion robustness as a multi-site measurement. Another conclusion reached is that PPG-related prior knowledge may still be helpful for CNN-based PPG extraction.

2.2. End-to-End Deep Learning Methods

In this section, end-to-end deep learning systems are described which take video as the input and use different network architectures to generate a physiological signal as the output.

2.2.1. VGG-Style CNN

Chen and McDuff 2018 [82] developed an end-to-end method, named DeepPhys, for video-based measurement of heart and breathing rates using a deep convolutional network. To address the issue caused by subject motion, the proposed method uses a motion representation algorithm based on a skin reflection model; as a result, motions are captured more effectively. To guide the motion estimation, an attention mechanism using appearance information was designed. It was shown that the motion representation model and the attention mechanism enable robust measurements under heterogeneous lighting and motions.
The model is based on a VGG-style CNN for estimating the physiological signal under motion [83]. VGG is an object recognition model that supports up to 19 layers; built as a deep CNN, it has been shown to outperform baselines in many image processing tasks. Figure 6 illustrates the architecture of this end-to-end convolutional attention network. A current video frame at time t and a normalized difference between the frames at t and t + 1 constitute the inputs to the appearance and motion models, respectively. The network learns spatial masks, which are shared between the models, and extracts features for recovering the blood volume pulse (BVP) and respiration signals.
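The motion-branch input described above, a normalized difference between consecutive frames, can be sketched as follows. This follows the general idea of the skin reflection model in [82]; the exact normalization, the standardization step, and the function name are assumptions.

```python
import numpy as np

def normalized_frame_difference(frame_t, frame_t1, eps=1e-6):
    """Compute a normalized frame difference for the motion branch:
    d(t) = (c(t+1) - c(t)) / (c(t+1) + c(t)), applied per pixel and channel.

    frame_t, frame_t1 : float arrays of shape (H, W, 3) with values in [0, 1].
    """
    diff = (frame_t1 - frame_t) / (frame_t1 + frame_t + eps)
    # Standardize so the motion input has roughly unit scale before the CNN
    return diff / (diff.std() + eps)
```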
Deep PPG, proposed by Reiss et al. 2019 [84], addresses three shortcomings of the existing datasets. First is the dataset size: while the number of subjects can be considered sufficient (8–24 participants in each dataset), the length of each session’s recording can be rather short. Second is the small number of activities: the publicly available datasets include data from only two to three different activities. Third, data are recorded in laboratory settings rather than in real-world environments.
A new dataset, called PPG-DaLiA [85], was thus introduced in this paper: a PPG dataset for motion compensation and heart rate estimation in daily living activities. Figure 7 illustrates the architecture of the VGG-like CNN used, where the time–frequency spectra of PPG signals are used as the input to estimate the heart rate.

2.2.2. CNN-LSTM Network

Long short-term memory (LSTM) is a recurrent neural network (RNN) architecture that can process not only a single data point (such as an image) but also an entire sequence of data points (such as speech or video). It has previously been used for various tasks such as connected handwriting recognition, speech recognition, and anomaly detection in network traffic [86,87,88].
rPPG signals are usually collected using a video camera and are sensitive to multiple contributing factors, including variations in skin tone, lighting conditions, and facial structure. Meta-rPPG [89] is an end-to-end supervised learning approach that performs well when training data are abundant and their distribution does not deviate too much from the testing data distribution. To cope with unforeseeable changes during testing, a transductive meta-learner that takes unlabeled samples during testing for a self-supervised weight adjustment is used to provide fast adaptation to the changes. The network proposed in this paper is split into two parts: a feature extractor and an rPPG estimator, modeled by a CNN and an LSTM network, respectively.

2.2.3. 3D-CNN Network

A 3D convolutional neural network is a type of network whose kernels slide in three dimensions. A 3D-CNN has been shown to learn spatiotemporal information better than a 2D-CNN [90].
Špetlík et al. 2018 [46] proposed a two-step convolutional neural network to estimate the heart rate from a sequence of facial images, see Figure 8. The proposed architecture has two components: an extractor and an HR estimator. The extractor component is run over a temporal image sequence of faces. The signal is then fed to the HR estimator to predict the heart rate.
In the work from Yu et al. 2019 [91], a two-stage end-to-end method was proposed. This work deals with video compression loss and recovers the rPPG signal from highly compressed videos. It consists of two parts: (1) a spatiotemporal video enhancement network (STVEN) for video enhancement, and (2) an rPPG network (rPPGNet) for rPPG signal recovery. rPPGNet can work on its own for obtaining rPPG measurements. The STVEN network can be added and jointly trained to further boost the performance, particularly on highly compressed videos.
Another method from Yu et al. 2019 [92] proposes the use of deep spatiotemporal networks for reconstructing precise rPPG signals from raw facial videos. With a constraint of trend consistency with the ground truth pulse curves, this method is able to recover rPPG signals with accurate pulse peaks, such that the heartbeat peaks of the measured rPPG signal are located at the corresponding R peaks of the ground truth ECG signal.
To address the issue of a lack of training data, a convolutional neural network called HeartTrack was developed by Perepelkina et al. 2020 [93] for remote video-based heart rate tracking. This learning-based method is trained on synthetic data to accurately estimate the heart rate in different conditions. The synthetic data include only PPG curves, not video. To select the most suitable parts of the face for pulse tracking at each particular moment, an attention mechanism is used.
Similar to the previous methods, the method proposed by Bousefsaf et al. 2019 [94] also uses synthetic data. Figure 9 illustrates the process of how synthetic data are generated. A 3D-CNN classifier structure was developed for both extraction and classification of unprocessed video streams. The CNN acts as a feature extractor. Its final activations are fed into two dense layers (multilayer perceptron) that are used to classify the pulse rate. The network ensures concurrent mapping by producing a prediction for each local group of pixels.
Liu et al. 2020 [95] developed a lightweight rPPG estimation network, named DeeprPPG, based on spatiotemporal convolutions that can handle different types of input skin regions. To further boost robustness, a spatiotemporal rPPG aggregation strategy was designed to adaptively aggregate rPPG signals from multiple skin regions into a final one. Extensive experiments were conducted to show its robustness when facing unseen skin regions in unseen scenarios. Table 3 lists the contactless HR methods that use deep learning.

3. Selected Deep Learning Models for Comparison

Among the deep learning-based rPPG methods, the codes for four methods are publicly available. In this section, a comparison of these methods is carried out. First, the architectures of these methods are described in some detail.

3.1. STVEN-rPPGNet

This deep learning-based method measures the heart rate from highly compressed input video clips. Its training occurs in two stages. The first stage involves a video enhancement network (called STVEN) whose output is an enhanced video. The second stage involves a measurement network (called rPPGNet) whose output provides the heart rate. The measurement network rPPGNet is formed using a spatiotemporal convolutional network, a skin-based attention module, and a partition constraint module. The skin-based attention module selects skin regions, while the partition constraint module enables an improved representation of the rPPG signal. An illustration of the two-stage architecture of STVEN-rPPGNet is shown in Figure 10.

3.2. iPPG-3D-CNN

In this method, the training phase is performed on synthetic data. That is, the pseudo-PPG video streams are formed by repeating waveforms, which are constructed by Fourier series approximation. In the testing phase, no pre-processing step, such as automatic face detection, is carried out. To synthesize video streams, the following steps are taken: (1) via Fourier series, a waveform model fitted to the rPPG waveform is generated, (2) based on the waveform in (1), a two-second signal is generated, (3) the signal is repeated to form a video stream, and (4) random noise at a specified noise level is added to each image of a video stream.
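A minimal sketch of this synthesis procedure is given below; the Fourier coefficients, patch size, noise level, and function name are illustrative assumptions, not the values used in [94].

```python
import numpy as np

def synthetic_ppg_patch(hr_bpm, fs=30.0, seconds=2.0, patch=(25, 25),
                        coeffs=(1.0, 0.4, 0.2), noise_std=0.05, rng=None):
    """Build a pseudo-PPG video patch for one heart rate level.

    1. A waveform is modeled as a short Fourier series at the pulse frequency.
    2. A two-second signal is generated over the patch duration.
    3. Every pixel of the patch carries the signal plus random noise.
    """
    rng = rng or np.random.default_rng()
    t = np.arange(int(fs * seconds)) / fs
    f0 = hr_bpm / 60.0
    # Fourier series approximation of a pulse waveform (fundamental + harmonics)
    wave = sum(a * np.cos(2 * np.pi * (k + 1) * f0 * t)
               for k, a in enumerate(coeffs))
    # Broadcast the temporal signal to every pixel, then add per-pixel noise
    video = np.tile(wave, (patch[0], patch[1], 1))   # shape (H, W, T)
    video += rng.normal(0.0, noise_std, size=video.shape)
    return video.transpose(2, 0, 1)                  # shape (T, H, W)
```

Longer streams can be formed by repeating such a segment, as described in step (3) above.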
Then, video patches are fed into the network and mapped to the targeted heart rate. Each video is centered around zero by subtracting its average value. Training is conducted with batches of 15,200 video patches (200 patches for each of the 76 heart rate levels); thus, each batch updates the network parameters with an input tensor of size 15,200 × 25 × 25 × 60. An illustration of the architecture of this deep learning-based method is shown in Figure 11.

3.3. PhysNet

In this method, the RGB frames of the face are mapped into the rPPG domain directly, without any pre- or post-processing steps; the solution is an end-to-end one. Two different structures are considered for this deep neural network: (1) the first maps the facial RGB frames into the rPPG signal via several 3D convolution and pooling layers, taking T frames as input at the same time, and (2) the second uses recurrent (RNN) processing units, inputting one frame at a time. An illustration of the architecture of this deep learning-based method is depicted in Figure 12.
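A heavily simplified sketch of the first (spatiotemporal) structure is shown below: a stack of 3D convolutions that preserves the temporal dimension while pooling away the spatial dimensions, so that T input frames produce a length-T rPPG signal. The layer widths, kernel sizes, and class name are placeholders and do not reproduce the actual PhysNet configuration [92].

```python
import torch
import torch.nn as nn

class TinySpatioTemporalNet(nn.Module):
    """Toy PhysNet-style encoder: (N, 3, T, H, W) video -> (N, T) rPPG signal."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),   # pool space, keep time
            nn.Conv3d(16, 32, kernel_size=(3, 3, 3), padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d((None, 1, 1)),    # collapse space entirely
        )
        self.head = nn.Conv3d(32, 1, kernel_size=1)  # channel-wise projection

    def forward(self, video):                      # video: (N, 3, T, H, W)
        feats = self.features(video)               # (N, 32, T, 1, 1)
        signal = self.head(feats)                  # (N, 1, T, 1, 1)
        return signal.flatten(start_dim=1)         # (N, T)

out = TinySpatioTemporalNet()(torch.randn(2, 3, 64, 32, 32))  # -> shape (2, 64)
```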

3.4. Meta-rPPG

The idea of using meta-learning for heart rate measurement from the rPPG signal is to fine-tune the parameters of a network for situations that are not covered in the training set. The architecture of this network consists of two parts: one part enables a fast adaptation process and the other provides the heart rate measurement. Its learning process involves the following: (1) facial frames are extracted from the video, facial landmarks are obtained, and the face area is cropped with the region outside the face set to zero, and (2) for each facial frame, a modified PPG signal, obtained by applying a small temporal offset, is used as the network target.
The network consists of three modules: a convolutional encoder, an rPPG estimator (with LSTM), and a synthetic gradient generator. During inference, only the convolutional encoder and the rPPG estimator are used; the synthetic gradient generator is utilized in the transductive mode. This network is designed to extract spatiotemporal features by modeling visual information with a deep convolutional encoder and then modeling the PPG signal with a Bi-LSTM. An illustration of the architecture of this deep learning-based method is provided in Figure 13.
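To illustrate the division of labor between the two inference-time modules, the following sketch pairs a small per-frame convolutional encoder with a bidirectional LSTM estimator. The feature and hidden sizes, layer counts, and class names are assumptions and do not match the actual Meta-rPPG configuration [89]; the synthetic gradient generator is omitted.

```python
import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    """Encodes each cropped face frame into a small feature vector."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim),
        )

    def forward(self, frames):          # (N*T, 3, H, W) -> (N*T, feat_dim)
        return self.net(frames)

class RPPGEstimator(nn.Module):
    """Bi-LSTM that turns a sequence of frame features into one rPPG sample per frame."""
    def __init__(self, feat_dim=64, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, 1)

    def forward(self, feats):           # (N, T, feat_dim) -> (N, T)
        seq, _ = self.lstm(feats)
        return self.out(seq).squeeze(-1)

# Usage: a batch of 2 clips, each with 60 frames of 64x64 cropped faces
frames = torch.randn(2 * 60, 3, 64, 64)
feats = ConvEncoder()(frames).view(2, 60, -1)
rppg = RPPGEstimator()(feats)           # -> shape (2, 60)
```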

4. Comparison Results and Discussion

This section presents the comparison results of the above four algorithms whose codes are publicly available for measuring the heart rate. The performance of these four algorithms is reported in terms of bpm.

4.1. Dataset

The UBFC database [96] is used here to train and test the above four methods. This database consists of 37 uncompressed videos with a resolution of 640 × 480 in 8-bit RGB format, each corresponding to a specific subject. The ground truth for each video is the PPG waveform (magnitude over time) along with the heart rates recorded with a pulse oximeter. There is no need to perform any pre-processing on this database. Ten randomly selected subjects were used for the test set, and the rest were used for the training set.
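Because the split is performed at the subject level (whole videos, not individual frames), a sketch of such a split is shown below; the subject naming and the random seed are assumptions, not the split actually used here.

```python
import random

def split_ubfc_subjects(subject_ids, n_test=10, seed=0):
    """Randomly hold out n_test subjects for testing; the rest form the training set."""
    rng = random.Random(seed)
    ids = list(subject_ids)
    rng.shuffle(ids)
    return ids[n_test:], ids[:n_test]   # (train_subjects, test_subjects)

# Hypothetical subject identifiers for the 37 UBFC videos
train_subjects, test_subjects = split_ubfc_subjects(
    [f"subject{i}" for i in range(1, 38)])
```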

4.2. Experimental Setup

In the studies conducted in [78,91,97,98], it was shown that the deep learning methods performed better than the conventional methods. Hence, the focus of the experimentation conducted here is placed on the above selected deep learning models. An overview of the architecture of the selected deep learning models is provided in Table 4.
The experiments for this study were conducted in one phase, where the above-mentioned dataset was divided into a training set and a test set with no overlap. The image frames were extracted from the video clips using the MATLAB toolbox in [99]. A region of interest (ROI) was then selected and cropped from the original image using the Viola–Jones algorithm [45]. One of the deep learning models required the skin map of the frames; the skin map of each image was extracted using the Bob package [100]. Finally, the extracted images and skin labels were used to train and test the CNN-based pulse rate measurement algorithms. The outcome of each of the four algorithms was assessed in terms of the mean square error (MSE) [101], mean absolute error (MAE) [102], and standard deviation (SD) [103]. To be fair in terms of the objective metrics, the ratio of training and test sets was kept the same for all four selected deep models.
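For the ROI selection step, a sketch using OpenCV's Viola–Jones (Haar cascade) face detector is shown below. This mirrors the general step described above rather than the exact preprocessing scripts used here; the crop size and function name are assumptions.

```python
import cv2

# Haar-cascade (Viola-Jones) frontal face detector bundled with OpenCV
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_face_roi(frame_bgr, size=(128, 128)):
    """Detect the largest face in a frame and return a resized face crop, or None."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep the largest detection
    return cv2.resize(frame_bgr[y:y + h, x:x + w], size)
```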
The metrics used for evaluation are stated next. As mentioned above, to quantify the performance of each deep learning method, the MSE and the MAE between the predicted heart rate and the ground truth were considered. The SDs of the reference heart rate and the predicted heart rate are also reported. The MSE and MAE were computed using the following equations:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(R_i - P_i\right)^2$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|R_i - P_i\right|$$
where $R_i$ and $P_i$ denote the ground truth and predicted heart rates, respectively, and $N$ is the total number of heartbeats.
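A direct sketch of these two metrics, computed over arrays of reference and predicted heart rates (the function name is an assumption):

```python
import numpy as np

def hr_errors(reference_bpm, predicted_bpm):
    """Return (MAE, MSE) between reference and predicted heart rates in bpm."""
    r = np.asarray(reference_bpm, dtype=float)
    p = np.asarray(predicted_bpm, dtype=float)
    mae = np.mean(np.abs(r - p))
    mse = np.mean((r - p) ** 2)
    return mae, mse
```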

4.3. Results and Discussion

The results obtained are reported in Table 5 for the test set. The reference value for each metric is given in the last row of the table. In most cases, the PhysNet method performed better than the other deep learning methods in terms of the objective metrics. For instance, the MAE and MSE of subject 10 in PhysNet were found to be lower than those of the other methods; the same result was obtained for subject 5 as well. More specifically, the MAE of rPPGNet, 3D-CNN, PhysNet, and Meta-rPPG for subject 10 was found to be 3.14, 3.36, 2.60, and 3.67, respectively, whereas the MSE was found to be 10.74, 12.34, 7.63, and 14.60. The better performance of PhysNet is attributed to its architecture enabling the extraction of effective features from the input frames.
The latency or computation time associated with each of the methods is also reported in Table 6 for a batch size of 64. As seen from this table, 3D-CNN takes only 0.74 s to predict the heart rate from 64 images; in other words, 3D-CNN runs the fastest among the four methods.
To obtain an overall assessment of the four methods, the results were averaged over all the subjects; Figure 14 shows this outcome. In this figure, the vertical axis corresponds to the heart rate in bpm, and the reference heart rate is denoted by the first bar from the left. From this figure, one can see that the average of the PhysNet method is the closest to the reference. The results for the individual subjects in the test set are shown in Figure 15, where the first bar from the left represents the reference and the legend associated with each bar is shown on the right side of the bar charts. By comparing the bar charts shown in this figure, one can see that PhysNet performs better than the other methods in terms of the mean and standard deviation; in other words, it provides the highest accuracy on average.

5. Conclusions

This paper has provided a comprehensive review of deep learning-based contactless heart rate measurement methods. First, an overview of contact-based PPG and contactless PPG methods was covered. Then, the review focus was placed on deep learning-based methods that have been introduced in the literature for heart rate measurement using rPPG. Among the deep learning-based contactless methods, four methods whose codes are publicly available were identified, and a comparison among these methods was conducted to see which one generates the highest accuracy for heart rate measurement by considering the same dataset across all four methods. Among these four methods, PhysNet was identified to provide the highest accuracy on average.

Author Contributions

Conceptualization, A.N., A.A., N.K.; methodology, A.N., A.A., N.K.; software, A.N., A.A., N.K.; validation, A.N., A.A., N.K.; formal analysis, A.N., A.A. and N.K.; investigation, A.N., A.A., N.K.; resources, A.N., A.A., N.K.; data curation, A.N., A.A., N.K.; writing—A.N., A.A., N.K.; visualization, A.N., A.A. and N.K.; supervision, N.K.; project administration, N.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Challoner, A.V.; Ramsay, C.A. A photoelectric plethysmograph for the measurement of cutaneous blood flow. Phys. Med. Biol. 1974, 19, 317–328. [Google Scholar] [CrossRef]
  2. Alian, A.A.; Shelley, K.H. Photoplethysmography. Best Pract. Res. Clin. Anaesthesiol. 2014, 28, 395–406. [Google Scholar] [CrossRef]
  3. Shelley, K.H. Photoplethysmography: Beyond the calculation of arterial oxygen saturation and heart rate. Anesth. Analg. 2007, 105, S31–S36. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  4. Madhav, K.V.; Ram, M.R.; Krishna, E.H.; Reddy, K.N.; Reddy, K.A. Estimation of respiratory rate from principal components of photoplethysmographic signals. In Proceedings of the IEEE EMBS Conference on Biomedical Engineering and Sciences (IECBES), Kuala Lumpur, Malaysia, 30 November 2010; pp. 311–314. [Google Scholar] [CrossRef]
  5. Karlen, W.; Raman, S.; Ansermino, J.M.; Dumont, G.A. Multiparameter respiratory rate estimation from the photoplethysmogram. IEEE Transact. Biomed. Eng. 2013, 60, 1946–1953. [Google Scholar] [CrossRef]
  6. Yousefi, R.; Nourani, M. Separating arterial and venous-related components of photoplethysmographic signals for accurate extraction of oxygen saturation and respiratory rate. IEEE J. Biomed. Health Inf. 2014, 19, 848–857. [Google Scholar]
  7. Kinnunen, H.; Rantanen, A.; Kenttä, T.; Koskimäki, H. Feasible assessment of recovery and cardiovascular health: Accuracy of nocturnal HR and HRV assessed via ring PPG in comparison to medical grade ECG. Physiol. Meas. 2020, 41, 04NT01. [Google Scholar] [CrossRef]
  8. Orphanidou, C. Derivation of respiration rate from ambulatory ECG and PPG using ensemble empirical mode decomposition: Comparison and fusion. Comput. Biol. Med. 2017, 81, 45–54. [Google Scholar] [CrossRef]
  9. Clifton, D.A.; Meredith, D.; Villarroel, M.; Tarassenko, L. Home monitoring: Breathing rate from PPG and ECG. Inst. Biomed. Eng. 2012. Available online: http://www.robots.ox.ac.uk/~davidc/pubs/WT2012.pdf (accessed on 26 May 2021).
  10. Madhav, K.V.; Raghuram, M.; Krishna, E.H.; Komalla, N.R.; Reddy, K.A. Extraction of respiratory activity from ECG and PPG signals using vector autoregressive model. In Proceedings of the 2012 IEEE International Symposium on Medical Measurements and Applications Proceedings, Budapest, Hungary, 18–19 May 2012; pp. 1–4. [Google Scholar]
  11. Gu, W.B.; Poon, C.C.; Leung, H.K.; Sy, M.Y.; Wong, M.Y.; Zhang, Y.T. A novel method for the contactless and continuous measurement of arterial blood pressure on a sleeping bed. In Proceedings of the 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Minneapolis, MN, USA, 3–6 September 2009; pp. 6084–6086. [Google Scholar]
  12. Wang, L.; Lo BP, L.; Yang, G.Z. Multichannel reflective PPG earpiece sensor with passive motion cancellation. IEEE Transact. Biomed. Circ. Syst. 2007, 1, 235–241. [Google Scholar] [CrossRef] [PubMed]
  13. Kabiri Ameri, S.; Ho, R.; Jang, H.; Tao, L.; Wang, Y.; Wang, L.; Schnyer, D.M.; Akinwande, D.; Lu, N. Graphene electronic tattoo sensors. ACS Nano 2017, 11, 7634–7641. [Google Scholar] [CrossRef] [PubMed]
  14. Nardelli, M.; Vanello, N.; Galperti, G.; Greco, A.; Scilingo, E.P. Assessing the Quality of Heart Rate Variability Estimated from Wrist and Finger PPG: A Novel Approach Based on Cross-Mapping Method. Sensors 2020, 20, 3156. [Google Scholar] [CrossRef] [PubMed]
  15. Phan, D.; Siong, L.Y.; Pathirana, P.N. Smartwatch: Performance evaluation for long-term heart rate monitoring. In Proceedings of the 2015 International Symposium on Bioelectronics and Bioinformatics (ISBB), Beijing, China, 14–17 October 2015; pp. 144–147. [Google Scholar]
  16. Wong, M.Y.; Leung, H.K.; Pickwell-MacPherson, E.; Gu, W.B.; Zhang, Y.T. Contactless recording of photoplethysmogram on a sleeping bed. In Proceedings of the 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Beijing, China, 14–17 October 2009; pp. 907–910. [Google Scholar]
  17. Allen, J. Photoplethysmography and its application in clinical physiological measurement. Physiol. Meas. 2007, 28, R1. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  18. Charlton, P.H.; Birrenkott, D.A.; Bonnici, T.; Pimentel, M.A.; Johnson, A.E.; Alastruey, J.; Tarassenko, L.; Watkinson, P.J.; Beale, R.; Clifton, D.A. Breathing rate estimation from the electrocardiogram and photoplethysmogram: A review. IEEE Rev. Biomed. Eng. 2017, 11, 2–20. [Google Scholar] [CrossRef] [Green Version]
  19. Schäfer, A.; Vagedes, J. How accurate is pulse rate variability as an estimate of heart rate variability?: A review on studies comparing photoplethysmographic technology with an electrocardiogram. Int. J. Cardiol. 2013, 166, 15–29. [Google Scholar] [CrossRef]
  20. Biswas, D.; Simoes-Capela, N.; Van Hoof, C.; Van Helleputte, N. Heart Rate Estimation From Wrist-Worn Photoplethysmography: A Review. IEEE Sens. J. 2019, 19, 6560–6570. [Google Scholar] [CrossRef]
  21. Castaneda, D.; Esparza, A.; Ghamari, M.; Soltanpur, C. A review on wearable photoplethysmography sensors and their potential future applications in health care. Int. J. Biosensors Bioelectron. 2018, 4, 195. [Google Scholar]
  22. Pereira, T.; Tran, N.; Gadhoumi, K.; Pelter, M.M.; Do, D.H.; Lee, R.J.; Colorado, R.; Meisel, K.; Hu, X. Photoplethysmography based atrial fibrillation detection: A review. npj Digit. Med. 2020, 3, 1–12. [Google Scholar] [CrossRef] [Green Version]
  23. Nye, R.; Zhang, Z.; Fang, Q. Continuous non-invasive blood pressure monitoring using photoplethysmography: A review. In Proceedings of the 2015 International Symposium on Bioelectronics and Bioinformatics (ISBB), Beijing, China, 14–17 October 2015; pp. 176–179. [Google Scholar]
  24. Johansson, A. Neural network for photoplethysmographic respiratory rate monitoring. Med. Biol. Eng. Comput. 2003, 41, 242–248. [Google Scholar] [CrossRef]
  25. Panwar, M.; Gautam, A.; Biswas, D.; Acharyya, A. PP-Net: A Deep Learning Framework for PPG-Based Blood Pressure and Heart Rate Estimation. IEEE Sens. J. 2020, 20, 10000–10011. [Google Scholar] [CrossRef]
  26. Biswas, D.; Everson, L.; Liu, M.; Panwar, M.; Verhoef, B.-E.; Patki, S.; Kim, C.H.; Acharyya, A.; Van Hoof, C.; Konijnenburg, M.; et al. CorNET: Deep Learning Framework for PPG-Based Heart Rate Estimation and Biometric Identification in Ambulant Environment. IEEE Trans. Biomed. Circ. Syst. 2019, 13, 282–291. [Google Scholar] [CrossRef]
  27. Chang, X.; Li, G.; Xing, G.; Zhu, K.; Tu, L. DeepHeart. ACM Trans. Sens. Netw. 2021, 17, 1–18. [Google Scholar] [CrossRef]
  28. Aarts, L.A.; Jeanne, V.; Cleary, J.P.; Lieber, C.; Nelson, J.S.; Oetomo, S.B.; Verkruysse, W. Non-contact heart rate monitoring utilizing camera photoplethysmography in the neonatal intensive care unit—A pilot study. Early Hum. Dev. 2013, 89, 943–948. [Google Scholar] [CrossRef]
  29. Villarroel, M.; Chaichulee, S.; Jorge, J.; Davis, S.; Green, G.; Arteta, C.; Zisserman, A.; McCormick, K.; Watkinson, P.; Tarassenko, L. Non-contact physiological monitoring of preterm infants in the Neonatal Intensive Care Unit. NPJ Digit. Med. 2019, 2, 128. [Google Scholar] [CrossRef] [Green Version]
  30. Sikdar, A.; Behera, S.K.; Dogra, D.P. Computer-vision-guided human pulse rate estimation: A review. IEEE Rev. Bio Med. Eng. 2016, 9, 91–105. [Google Scholar] [CrossRef]
  31. Anton, O.; Fernandez, R.; Rendon-Morales, E.; Aviles-Espinosa, R.; Jordan, H.; Rabe, H. Heart Rate Monitoring in Newborn Babies: A Systematic Review. Neonatology 2019, 116, 199–210. [Google Scholar] [CrossRef] [PubMed]
  32. Fernández, A.; Carús, J.L.; Usamentiaga, R.; Alvarez, E.; Casado, R. Unobtrusive health monitoring system using video-based physiological information and activity measurements. IEEE 2013, 89, 943–948. [Google Scholar]
  33. Haque, M.A.; Irani, R.; Nasrollahi, K.; Moeslund, T.B. Heartbeat rate measurement from facial video. IEEE Intell. Syst. 2016, 31, 40–48. [Google Scholar] [CrossRef] [Green Version]
  34. Kumar, M.; Veeraraghavan, A.; Sabharwal, A. DistancePPG: Robust non-contact vital signs monitoring using a camera. Biomed. Opt. Express 2015, 6, 1565–1588. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  35. Liu, S.Q.; Lan, X.; Yuen, P.C. Remote photoplethysmography correspondence feature for 3D mask face presentation attack detection. In Proceedings of the European Conference on Computer Vision, ECCV Papers, Munich, Germany, 8–14 September 2018; pp. 558–573. [Google Scholar] [CrossRef]
  36. Gudi, A.; Bittner, M.; Lochmans, R.; van Gemert, J. Efficient real-time camera based estimation of heart rate and its variability. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Seoul, Korea, 27–28 October 2019. [Google Scholar]
  37. Verkruysse, W.; Svaasand, L.O.; Nelson, J.S. Remote plethysmographic imaging using ambient light. Opt. Express 2008, 16, 21434–21445. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  38. Balakrishnan, G.; Durand, F.; Guttag, J. Detecting Pulse from Head Motions in Video. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 25–27 June 2013; pp. 3430–3437. [Google Scholar]
  39. Lewandowska, M.; Rumiński, J.; Kocejko, T.; Nowak, J. Measuring pulse rate with a webcam—A non-contact method for evaluating cardiac activity. In Proceedings of the 2011 Federated Conference on Computer Science and Information Systems (FedCSIS), Szczecin, Poland, 18–21 September 2011; pp. 405–410. [Google Scholar]
  40. De Haan, G.; Jeanne, V. Robust pulse rate from chrominance-based rPPG. IEEE Transact. Biomed. Eng. 2013, 60, 2878–2886. [Google Scholar] [CrossRef]
  41. Tasli, H.E.; Gudi, A.; den Uyl, M. Remote PPG based vital sign measurement using adaptive facial regions. In Proceedings of the 2014 IEEE International Conference on Image Processing (ICIP), Paris, France, 27–30 October 2014; pp. 1410–1414. [Google Scholar]
  42. Yu, Y.P.; Kwan, B.H.; Lim, C.L.; Wong, S.L. Video-based heart rate measurement using short-time Fourier transform. In Proceedings of the 2013 International Symposium on Intelligent Signal Processing and Communication Systems, Naha, Japan, 12–15 November 2013; pp. 704–707. [Google Scholar]
  43. Monkaresi, H.; Hussain, M.S.; Calvo, R.A. Using Remote Heart Rate Measurement for Affect Detection. In Proceedings of the FLAIRS Conference, Sydney, Australia, 3 May 2014. [Google Scholar]
  44. Monkaresi, H.; Calvo, R.A.; Yan, H. A Machine Learning Approach to Improve Contactless Heart Rate Monitoring Using a Webcam. IEEE J. Biomed. Health Inf. 2013, 18, 1153–1160. [Google Scholar] [CrossRef]
  45. Viola, P.; Jones, M.J.C. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), Kauai, HI, USA, 8–14 December 2001; Volume 1, p. 3. [Google Scholar]
  46. Špetlík, R.; Franc, V.; Matas, J. Visual heart rate estimation with convolutional neural network. In Proceedings of the British Machine Vision Conference, Newcastle, UK, 3–6 September 2018; pp. 3–6. [Google Scholar]
  47. Wei, L.; Tian, Y.; Wang, Y.; Ebrahimi, T.; Huang, T. Automatic webcam-based human heart rate measurements using laplacian eigenmap. In Proceedings of the Asian Conference on Computer Vision, Daejeon, Korea, 5–9 November 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 281–292. [Google Scholar]
  48. Xu, S.; Sun, L.; Rohde, G.K. Robust efficient estimation of heart rate pulse from video. Biomed. Opt. Express 2014, 5, 1124–1135. [Google Scholar] [CrossRef] [Green Version]
  49. Hsu, Y.C.; Lin, Y.L.; Hsu, W. Learning-based heart rate detection from remote photoplethysmography features. In Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, 4–9 May 2014; pp. 4433–4437. [Google Scholar]
  50. Lee, K.Z.; Hung, P.C.; Tsai, L.W. Contact-free heart rate measurement using a camera. In Proceedings of the 2012 Ninth Conference on Computer and Robot Vision, Toronto, ON, Canada, 28–30 May 2012; pp. 147–152. [Google Scholar]
  51. Poh, M.Z.; McDuff, D.J.; Picard, R.W. Non-contact, automated cardiac pulse measurements using video imaging and blind source separation. Opt. Express 2010, 18, 10762–10774. [Google Scholar] [CrossRef]
  52. Poh, M.Z.; McDuff, D.J.; Picard, R.W. Advancements in noncontact, multiparameter physiological measurements using a webcam. IEEE Transact. Biomed. Eng. 2010, 58, 7–11. [Google Scholar] [CrossRef] [Green Version]
  53. Li, X.; Chen, J.; Zhao, G.; Pietikainen, M. Remote heart rate measurement from face videos under realistic situations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 4264–4271. [Google Scholar]
  54. Lee, K.; Lee, J.; Ha, C.; Han, M.; Ko, H. Video-Based Contactless Heart-Rate Detection and Counting via Joint Blind Source Separation with Adaptive Noise Canceller. Appl. Sci. 2019, 9, 4349. [Google Scholar] [CrossRef] [Green Version]
  55. Kwon, S.; Kim, H.; Park, K.S. Validation of heart rate extraction using video imaging on a built-in camera system of a smartphone. In Proceedings of the 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, San Diego, CA, USA, 28 August–1 September 2012; pp. 2174–2177. [Google Scholar]
  56. Datcu, D.; Cidota, M.; Lukosch, S.; Rothkrantz, L. Noncontact automatic heart rate analysis in visible spectrum by specific face regions. In Proceedings of the 14th International Conference on Computer Systems and Technologies, Ruse, Bulgaria, 28–29 June 2013; pp. 120–127. [Google Scholar]
  57. Holton, B.D.; Mannapperuma, K.; Lesniewski, P.J.; Thomas, J.C. Signal recovery in imaging photoplethysmography. Physiol. Meas. 2013, 34, 1499. [Google Scholar] [CrossRef] [PubMed]
  58. Irani, R.; Nasrollahi, K.; Moeslund, T.B. Improved pulse detection from head motions using DCT. In Proceedings of the 2014 International Conference on Computer Vision Theory and Applications (VISAPP), Lisbon, Portugal, 5–8 January 2014; Volume 3, pp. 118–124. [Google Scholar]
  59. Wang, W.W.; Stuijk, S.S.; De Haan, G.G. Exploiting Spatial Redundancy of Image Sensor for Motion Robust rPPG. IEEE Trans. Biomed. Eng. 2015, 62, 415–425. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  60. Feng, L.; Po, L.M.; Xu, X.; Li, Y. Motion artifacts suppression for remote imaging photoplethysmography. In Proceedings of the 2014 19th International Conference on Digital Signal Processing, Hong Kong, China, 20–23 August 2014; pp. 18–23. [Google Scholar]
  61. Tran, D.N.; Lee, H.; Kim, C. A robust real time system for remote heart rate measurement via camera. In Proceedings of the 2015 IEEE International Conference on Multimedia and Expo (ICME), Turin, Italy, 29 June–3 July 2015; pp. 1–6. [Google Scholar]
  62. McDuff, D. Deep super resolution for recovering physiological information from videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1367–1374. [Google Scholar]
  63. Hu, J.; Niu, H.; Carrasco, J.; Lennox, B.; Arvin, F. Voronoi-Based Multi-Robot Autonomous Exploration in Unknown Environments via Deep Reinforcement Learning. IEEE Trans. Veh. Technol. 2020, 69, 14413–14423. [Google Scholar] [CrossRef]
  64. Ciregan, D.; Meier, U.; Schmidhuber, J. Multi-column deep neural networks for image classification. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3642–3649. [Google Scholar]
  65. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. ACM 2012, 60, 1097–1105. [Google Scholar] [CrossRef]
  66. Rouast, P.V.; Adam MT, P.; Chiong, R.; Cornforth, D.; Lux, E. Remote heart rate measurement using low-cost RGB face video: A technical literature review. Front. Comput. Sci. 2018, 12, 858–872. [Google Scholar] [CrossRef]
  67. Liu, H.; Wang, Y.; Wang, L. A review of non-contact, low-cost physiological information measurement based on photoplethysmographic imaging. In Proceedings of the 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, San Diego, CA, USA, 28 August–1 September 2012; pp. 2088–2091. [Google Scholar]
  68. Sun, Y.; Thakor, N. Photoplethysmography Revisited: From Contact to Noncontact, From Point to Imaging. IEEE Trans. Biomed. Eng. 2016, 63, 463–477. [Google Scholar] [CrossRef] [Green Version]
  69. Hu, S.; Peris, V.A.; Echiadis, A.; Zheng, J.; Shi, P. Development of effective photoplethysmographic measurement techniques: From contact to non-contact and from point to imaging. In Proceedings of the 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Minneapolis, MN, USA, 3–6 September 2009; pp. 6550–6553. [Google Scholar]
  70. Hassan, M.A.; Malik, A.S.; Fofi, D.; Saad, N.; Karasfi, B.; Ali, Y.S.; Meriaudeau, F. Heart rate estimation using facial video: A review. Biomed. Signal Process. Control 2017, 38, 346–360. [Google Scholar] [CrossRef]
  71. Kranjec, J.; Beguš, S.; Geršak, G.; Drnovšek, J. Non-contact heart rate and heart rate variability measurements: A review. Biomed. Signal Process. Control 2014, 13, 102–112. [Google Scholar] [CrossRef]
  72. AL-Khalidi, F.Q.; Saatchi, R.; Burke, D.; Elphick, H.; Tan, S. Respiration rate monitoring methods: A review. Pediatr. Pulmonol. 2011, 46, 523–529. [Google Scholar] [CrossRef] [Green Version]
  73. Chen, X.; Cheng, J.; Song, R.; Liu, Y.; Ward, R.; Wang, Z.J. Video-Based Heart Rate Measurement: Recent Advances and Future Prospects. IEEE Trans. Instrum. Meas. 2019, 68, 3600–3615. [Google Scholar] [CrossRef]
  74. Kevat, A.C.; Bullen DV, R.; Davis, P.G.; Kamlin, C.O.F. A systematic review of novel technology for monitoring infant and newborn heart rate. Acta Paediatr. 2017, 106, 710–720. [Google Scholar] [CrossRef] [PubMed]
  75. McDuff, D.J.; Estepp, J.R.; Piasecki, A.M.; Blackford, E.B. A survey of remote optical photoplethysmographic imaging methods. In Proceedings of the 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Milan, Italy, 25–29 August 2015; pp. 6398–6404. [Google Scholar]
  76. Li, P.; Hu, Y.; Liu, Z.-P. Prediction of cardiovascular diseases by integrating multi-modal features with machine learning methods. Biomed. Signal Process. Control 2021, 66, 102474. [Google Scholar] [CrossRef]
  77. Qiu, Y.; Liu, Y.; Arteaga-Falconi, J.; Dong, H.; El Saddik, A. EVM-CNN: Real-Time Contactless Heart Rate Estimation From Facial Video. IEEE Trans. Multimedia 2018, 21, 1778–1787. [Google Scholar] [CrossRef]
  78. Ren, S.; Cao, X.; Wei, Y.; Sun, J. Face alignment at 3000 fps via regressing local binary features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 24–27 June 2014; pp. 1685–1692. [Google Scholar]
  79. Luguev, T.; Seuß, D.; Garbas, J.U. Deep Learning based Affective Sensing with Remote Photoplethysmography. In Proceedings of the 2020 54th Annual Conference on Information Sciences and Systems (CISS), Princeton, NJ, USA, 18–20 March 2020; pp. 1–4. [Google Scholar]
  80. Paracchini, M.; Marcon, M.; Villa, F.; Zappa, F.; Tubaro, S. Biometric Signals Estimation Using Single Photon Camera and Deep Learning. Sensors 2020, 20, 6102. [Google Scholar] [CrossRef]
  81. Zhan, Q.; Wang, W.; De Haan, G. Analysis of CNN-based remote-PPG to understand limitations and sensitivities. Biomed. Opt. Express 2020, 11, 1268–1283. [Google Scholar] [CrossRef]
  82. Chen, W.; McDuff, D. Deepphys: Video-based physiological measurement using convolutional attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September2018; pp. 349–365. [Google Scholar]
  83. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  84. Reiss, A.; Indlekofer, I.; Schmidt, P.; Van Laerhoven, K. Deep PPG: Large-Scale Heart Rate Estimation with Convolutional Neural Networks. Sensors 2019, 19, 3079. [Google Scholar] [CrossRef] [Green Version]
  85. Available online: https://archive.ics.uci.edu/ml/datasets/PPG-DaLiA (accessed on 26 May 2021).
  86. Fernandez, A.; Bunke, H.; Schmidhuber, J. A novel connectionist system for improved unconstrained handwriting recognition. IEEE Transact. Pattern Anal. Mach. Intell. 2009, 31. [Google Scholar] [CrossRef] [Green Version]
  87. Sak, H.; Senior, A.W.; Beaufays, F. Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling. Available online: https://storage.googleapis.com/pub-tools-public-publication-data/pdf/43905.pdf (accessed on 26 May 2021).
  88. Li, X.; Wu, X. Constructing long short-term memory based deep recurrent neural networks for large vocabulary speech recognition. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, Australia, 19–24 April 2015; pp. 4520–4524. [Google Scholar]
  89. Lee, E.; Chen, E.; Lee, C.Y. Meta-rppg: Remote heart rate estimation using a transductive meta-learner. arXiv 2020, arXiv:2007.06786. [Google Scholar]
  90. Tran, D.; Bourdev, L.; Fergus, R.; Torresani, L.; Paluri, M. Learning spatiotemporal features with 3d convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 4489–4497. [Google Scholar]
  91. Yu, Z.; Peng, W.; Li, X.; Hong, X.; Zhao, G. Remote Heart Rate Measurement from Highly Compressed Facial Videos: An End-to-End Deep Learning Solution with Video Enhancement. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 151–160. [Google Scholar]
  92. Yu, Z.; Li, X.; Zhao, G. Remote photoplethysmograph signal measurement from facial videos using spatio-temporal networks. arXiv 2019, arXiv:1905.02419. [Google Scholar]
  93. Perepelkina, O.; Artemyev, M.; Churikova, M.; Grinenko, M. HeartTrack: Convolutional Neural Network for Remote Video-Based Heart Rate Monitoring. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 288–289. [Google Scholar]
  94. Bousefsaf, F.; Pruski, A.; Maaoui, C. 3D Convolutional Neural Networks for Remote Pulse Rate Measurement and Mapping from Facial Video. Appl. Sci. 2019, 9, 4364. [Google Scholar] [CrossRef] [Green Version]
  95. Liu, S.-Q.; Yuen, P.C. A General Remote Photoplethysmography Estimator with Spatiotemporal Convolutional Network. In Proceedings of the 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), Buenos Aires, Argentina, 16–20 November 2020; pp. 481–488. [Google Scholar]
  96. Bobbia, S.; Macwan, R.; Benezeth, Y.; Mansouri, A.; Dubois, J. Unsupervised skin tissue segmentation for remote photoplethysmography. Pattern Recognit. Lett. 2019, 124, 82–90. [Google Scholar] [CrossRef]
  97. Song, R.; Zhang, S.; Li, C.; Zhang, Y.; Cheng, J.; Chen, X. Heart rate estimation from facial videos using a spatiotemporal representation with convolutional neural networks. IEEE Trans. Instrum. Meas. 2020, 69, 7411–7421. [Google Scholar] [CrossRef]
  98. Huang, B.; Lin, C.-L.; Chen, W.; Juang, C.-F.; Wu, X. A novel one-stage framework for visual pulse rate estimation using deep neural networks. Biomed. Signal Process. Control 2021, 66, 102387. [Google Scholar] [CrossRef]
  99. McDuff, D.; Blackford, E. iPhys: An Open Non-Contact Imaging-Based Physiological Measurement Toolbox. In Proceedings of the 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Berlin, Germany, 23–27 July 2019; pp. 6521–6524. [Google Scholar]
  100. Available online: https://www.idiap.ch/software/bob/docs/bob/docs/stable/index.html# (accessed on 26 May 2021).
  101. Tsou, Y.Y.; Lee, Y.A.; Hsu, C.T.; Chang, S.H. Siamese-rPPG network: Remote photoplethysmography signal estimation from face videos. In Proceedings of the 35th Annual ACM Symposium on Applied Computing, Brno, Czech Republic, 30 March–3 April 2020; pp. 2066–2073. [Google Scholar]
  102. Wang, Z.-K.; Kao, Y.; Hsu, C.-T. Vision-Based Heart Rate Estimation via a Two-Stream CNN. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 3327–3331. [Google Scholar]
  103. Jaiswal, K.B.; Meenpal, T. Continuous Pulse Rate Monitoring from Facial Video Using rPPG. In Proceedings of the 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India, 1–3 July 2020; pp. 1–5. [Google Scholar]
Figure 1. PPG signal processing framework.
Figure 2. Illustration of rPPG generation: diffused and specular reflections of ambient illuminance are captured by a camera, with the diffused reflection indicating volumetric changes in blood vessels.
Figure 3. rPPG or contactless PPG image processing framework: signal extraction step (ROI detection and tracking), signal estimation step (filtering and dimensionality reduction), and heart rate estimation step (frequency analysis and peak detection).
Figure 4. EVM-CNN modules.
Figure 5. rPPG using SPAD camera.
Figure 6. DeepPhys architecture.
Figure 7. Deep PPG architecture.
Figure 8. HR-CNN modules.
Figure 9. Process of generating synthetic data.
Figure 10. Architecture of STVEN-rPPGNet.
Figure 11. Architecture of iPPG-3D-CNN.
Figure 12. Architecture of PhysNet.
Figure 13. Architecture of Meta-rPPG.
Figure 14. Averaged heart rate measurement of all the subjects in the test set. The vertical axis indicates the heart rate for each method in bpm. Each bar shows the mean and the standard deviation of a method. The first bar from the left indicates the reference.
Figure 15. Heart rate bar charts of all the subjects in the test set for the four compared deep learning methods. Each subfigure (a–j) corresponds to one subject in the test set. The vertical axis indicates the heart rate in bpm. The mean and the standard deviation of each subject are specified in separate bar charts. In each chart, the first bar from the left indicates the reference for that subject.
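As a rough illustration of the heart rate estimation step summarized in Figure 3, the minimal sketch below (not drawn from any of the reviewed implementations) band-limits an extracted rPPG trace to the typical 0.7–4 Hz heart rate range and converts the dominant spectral peak to beats per minute; the function name, sampling rate, and filter settings are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def estimate_hr_bpm(rppg: np.ndarray, fs: float = 30.0) -> float:
    """Estimate heart rate (bpm) from a 1-D rPPG trace sampled at fs Hz."""
    # Band-pass filter to the plausible heart rate range (~42-240 bpm).
    b, a = butter(3, [0.7, 4.0], btype="bandpass", fs=fs)
    filtered = filtfilt(b, a, rppg - np.mean(rppg))
    # Zero-padded FFT, then pick the dominant frequency within the band.
    n_fft = 8 * len(filtered)
    spectrum = np.abs(np.fft.rfft(filtered, n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    band = (freqs >= 0.7) & (freqs <= 4.0)
    peak_freq = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak_freq
```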
Table 1. Previous survey papers on contact-based PPG methods.
Emphasis | Ref | Year | Task
Contact | [17] | 2007 | Basic principle of PPG operation, pulse wave analysis, clinical applications
Contact; ECG and PPG | [18] | 2018 | Breathing rate (BR) estimation from ECG and PPG, BR algorithms and their assessment
Contact | [22] | 2020 | Approaches for PPG-based atrial fibrillation detection
Contact; wearable device | [20] | 2019 | PPG acquisition, HR estimation algorithms, developments on wrist PPG applications, biometric identification
Contact; ECG and PPG | [19] | 2012 | Accuracy of pulse rate variability (PRV) as an estimate of HRV
Contact; wearable device | [21] | 2018 | Current developments and challenges of wearable PPG-based monitoring technologies
Contact; blood pressure | [23] | 2015 | Approaches involving PPG for continuous and non-invasive monitoring of blood pressure
Table 3. Deep learning-based contactless PPG methods.
Focus | Ref | Year | Feature | Dataset
End-to-end system; robust to illumination changes and subject’s motion | [46] | 2018 | A two-step convolutional neural network composed of an extractor and an HR estimator | COHFACE, PURE, MAHNOB-HCI
Signal estimation enhancement | [77] | 2019 | Eulerian video magnification (EVM) to extract face color changes and a CNN to estimate heart rate | MMSE-HR
3D-CNN for signal extraction | [79] | 2020 | Deep spatiotemporal networks for contactless HRV measurement from raw facial videos; employing data augmentation | MAHNOB-HCI
Single-photon camera | [80] | 2020 | Neural network for skin detection | N/A
Understanding of CNN-based PPG methods | [81] | 2020 | Analysis of CNN-based remote PPG to understand limitations and sensitivities | HNU, PURE
End-to-end system; attention mechanism | [82] | 2018 | Robust measurement under heterogeneous lighting and motions | MAHNOB-HCI
End-to-end system; real-life conditions dataset | [84] | 2019 | Addresses major shortcomings of existing datasets: dataset size, small number of activities, data recording in laboratory settings | PPG-DaLiA
Synthetic training data; attention mechanism | [93] | 2020 | CNN training with synthetic data to accurately estimate HR in different conditions | UBFC-RPPG, MoLi-ppg-1, MoLi-ppg-2
Synthetic training data | [94] | 2019 | Automatic 3D-CNN training process with synthetic data and no image processing | UBFC-RPPG
End-to-end supervised learning approach; meta-learning | [89] | 2020 | Meta-rPPG for abundant training data with a distribution not deviating too much from the distribution of testing data | MAHNOB-HCI, UBFC-RPPG
Counter video compression loss | [91] | 2019 | STVEN for video quality enhancement; rPPGNet for signal recovery | MAHNOB-HCI
Spatiotemporal network | [92] | 2019 | Measuring the rPPG signal from raw facial video; taking temporal context into account | MAHNOB-HCI
Spatiotemporal network | [95] | 2020 | Spatiotemporal convolutional network; different types of skin input | MAHNOB-HCI, PURE
Table 4. Overview of the selected network parameters.
Method | Module | Layer | Kernel
STVEN-rPPGNet | STVEN | Convolution 1 | 3 × 3 × 7
 | | Convolution 2 | 3 × 4 × 4
 | | Convolution 3 | 4 × 4 × 4
 | | Spatiotemporal block | [3 × 3 × 3] × 6
 | | Deconvolution 1 | 4 × 4 × 4
 | | Deconvolution 2 | 1 × 4 × 4
 | | Deconvolution 3 | 1 × 7 × 7
 | rPPGNet | Convolution 1 | 1 × 5 × 5
 | | Spatiotemporal block | [3 × 3 × 3] × 4
 | | Spatial global average pooling | 1 × 16 × 16
 | | Deconvolution 1 | 1 × 1 × 1
iPPG-3D-CNN | | Convolution 1 | 58 × 20 × 20
 | | Max pooling | 2 × 2 × 2
 | | Dense | 512
 | | Dense | 76
PhysNet | | Convolution 1 | 1 × 5 × 5
 | | Max pooling | 1 × 2 × 2
 | | Convolution 2 | 3 × 3 × 3
 | | Convolution 3 | 3 × 3 × 3
 | | Spatial global average pooling |
 | | Convolution 4 | 1 × 1 × 1
Meta-rPPG | Convolutional encoder | Convolution 1 | 3 × 3
 | | Convolution 2 | 3 × 3
 | | Convolution 3 | 3 × 3
 | | Convolution 4 | 3 × 3
 | | Convolution 5 | 3 × 3
 | | Average pooling | 2 × 2
 | rPPG estimator | Bidirectional LSTM | ---
 | | Linear | ---
 | | Ordinal | ---
 | Synthetic gradient generator | Convolution 1 | 3 × 3
 | | Convolution 2 | 3 × 3
 | | Convolution 3 | 3 × 3
 | | Convolution 4 | 3 × 3
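To make the kernel notation in Table 4 concrete, the following PyTorch sketch stacks layers in the spirit of the PhysNet row (a 1 × 5 × 5 spatial convolution, spatial max pooling, two 3 × 3 × 3 spatiotemporal convolutions, spatial global average pooling, and a 1 × 1 × 1 projection). The channel widths, activation functions, and depth are illustrative assumptions rather than the published configuration.

```python
import torch
import torch.nn as nn

class TinySpatioTemporalRPPG(nn.Module):
    """Illustrative PhysNet-style backbone: facial video clip in, rPPG trace out."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, channels, kernel_size=(1, 5, 5), padding=(0, 2, 2)),  # 1x5x5 spatial conv
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),                               # 1x2x2 spatial pooling
            nn.Conv3d(channels, channels, kernel_size=(3, 3, 3), padding=1),   # 3x3x3 spatiotemporal conv
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=(3, 3, 3), padding=1),   # 3x3x3 spatiotemporal conv
            nn.ReLU(inplace=True),
        )
        self.head = nn.Conv3d(channels, 1, kernel_size=(1, 1, 1))              # 1x1x1 projection

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, 3, frames, height, width)
        x = self.features(clip)
        x = x.mean(dim=(3, 4), keepdim=True)         # spatial global average pooling
        x = self.head(x)                              # (batch, 1, frames, 1, 1)
        return x.squeeze(1).squeeze(-1).squeeze(-1)   # (batch, frames) rPPG trace
```

A clip of shape (batch, 3, frames, 128, 128) thus yields one rPPG sample per input frame, which can then be converted to a heart rate with a spectral peak search such as the earlier sketch.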
Table 5. Objective metrics for the four compared deep learning methods.
Subject # | Method | MAE (bpm) | MSE (bpm) | SD (bpm)
Subject 1 | rPPGNet | 3.22 | 11.41 | 3.93
 | 3D-CNN | 3.75 | 14.92 | 3.86
 | PhysNet | 2.53 | 7.31 | 3.96
 | Meta-rPPG | 4.09 | 17.67 | 3.95
Subject 2 | rPPGNet | 2.72 | 7.82 | 3.82
 | 3D-CNN | 2.87 | 8.81 | 3.93
 | PhysNet | 2.25 | 5.47 | 3.79
 | Meta-rPPG | 3.18 | 10.71 | 4.01
Subject 3 | rPPGNet | 3.12 | 11.14 | 2.32
 | 3D-CNN | 3.43 | 13.28 | 2.33
 | PhysNet | 2.74 | 8.74 | 2.42
 | Meta-rPPG | 3.63 | 14.78 | 2.36
Subject 4 | rPPGNet | 2.63 | 7.79 | 1.79
 | 3D-CNN | 2.74 | 8.42 | 1.74
 | PhysNet | 2.14 | 5.48 | 1.75
 | Meta-rPPG | 2.83 | 8.96 | 1.77
Subject 5 | rPPGNet | 2.82 | 8.90 | 5.48
 | 3D-CNN | 2.96 | 9.72 | 5.50
 | PhysNet | 2.38 | 6.66 | 5.54
 | Meta-rPPG | 3.22 | 11.37 | 5.48
Subject 6 | rPPGNet | 3.76 | 15.09 | 5.71
 | 3D-CNN | 4.21 | 18.91 | 5.66
 | PhysNet | 2.93 | 9.26 | 5.95
 | Meta-rPPG | 4.56 | 22.34 | 5.63
Subject 7 | rPPGNet | 3.42 | 12.40 | 8.79
 | 3D-CNN | 3.85 | 15.78 | 8.66
 | PhysNet | 2.91 | 9.04 | 8.94
 | Meta-rPPG | 4.01 | 17.02 | 8.72
Subject 8 | rPPGNet | 3.66 | 14.51 | 4.87
 | 3D-CNN | 3.93 | 16.82 | 4.92
 | PhysNet | 3.18 | 11.21 | 4.92
 | Meta-rPPG | 4.20 | 19.07 | 4.96
Subject 9 | rPPGNet | 2.24 | 5.49 | 3.47
 | 3D-CNN | 2.52 | 6.76 | 3.47
 | PhysNet | 2.04 | 4.76 | 3.55
 | Meta-rPPG | 2.78 | 8.13 | 3.58
Subject 10 | rPPGNet | 3.14 | 10.74 | 5.65
 | 3D-CNN | 3.36 | 12.34 | 5.63
 | PhysNet | 2.60 | 7.63 | 5.77
 | Meta-rPPG | 3.67 | 14.60 | 5.62
Averaged across all subjects | rPPGNet | 3.07 | 10.53 | 4.58
 | 3D-CNN | 2.98 | 12.58 | 4.57
 | PhysNet | 2.57 | 7.56 | 4.66
 | Meta-rPPG | 3.62 | 14.47 | 4.61
Reference value | | 0 | 0 | 0
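For reference, the MAE and MSE columns of Table 5 follow their standard definitions over the windowed heart rate estimates of each subject, while the SD column is the standard deviation reported alongside the means in Figures 14 and 15. A minimal sketch with hypothetical array names is given below.

```python
import numpy as np

def hr_error_metrics(hr_est, hr_ref):
    """MAE and MSE (bpm) between windowed HR estimates and the reference HR."""
    error = np.asarray(hr_est, dtype=float) - np.asarray(hr_ref, dtype=float)
    mae = np.mean(np.abs(error))   # mean absolute error
    mse = np.mean(error ** 2)      # mean square error
    return mae, mse
```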
Table 6. Latency or computation time for the four compared deep learning methods.
Method | rPPGNet | 3D-CNN | PhysNet | Meta-rPPG
Time (s) | 1.12 | 0.74 | 1.19 | 1.70
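The times in Table 6 are per-inference wall-clock latencies. A simple way to obtain comparable numbers, sketched below under the assumption of a PyTorch model and a fixed-size input clip, is to average the forward-pass time over repeated runs after a warm-up pass, synchronizing CUDA if a GPU is used; the function and variable names are illustrative.

```python
import time
import torch

@torch.no_grad()
def average_latency_seconds(model, clip, runs=20):
    """Average forward-pass latency (s) of `model` on `clip` over `runs` repetitions."""
    model.eval()
    model(clip)                      # warm-up pass, excluded from timing
    if clip.is_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        model(clip)
    if clip.is_cuda:
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs
```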