Skin Segmentation for Imaging Photoplethysmography Using a Specialized Deep Learning Approach

Matthieu Scherpf1, Hannes Ernst2, Leo Misera2, Hagen Malberg2, Martin Schmidt2
1TU Dresden, Institute of Biomedical Engineering, Dresden, Germany, 2TU Dresden


Objective: Imaging photoplethysmography (iPPG) enables the extraction of vital signs from standard RGB video recordings. An important preprocessing step is the selection of a suitable region of interest. For iPPG, the relevant information is contained in the pixels where skin is present, and therefore pixels containing, for example, hair, textiles, or background must be excluded. For this skin segmentation preprocessing step, we present a new deep learning-based approach. Methodology: We used an open-source network architecture, Deeplab, and implemented a specialized training procedure to train it for the pixelwise classification into skin and non-skin. Four databases containing a wide diversity of images with pixelwise annotations for the two classes were included in the training. For evaluation, we used two open-source databases containing video recordings from over 180 different subjects. For benchmark purposes, we compared our approach with two state-of-the-art approaches called Cheref and Levelset. The framewise skin masks were computed for each recording using the three approaches. Subsequently, the masked framewise mean of the green color channel was calculated for each approach, and the interval-based heart rate was extracted. We then compared the three approaches regarding the mean absolute error (MAE), measured in beats per minute (bpm), the Pearson correlation coefficient (PEARSONR), and the signal-to-noise ratio (SNR) to quantify the heart rate extraction quality. Results: Our approach (MAE: 6.27 bpm, PEARSONR: 0.23, SNR: -1.72 dB) significantly outperformed the state-of-the-art approaches Cheref (30.43 bpm, 0.07, -5.70 dB) and Levelset (21.65 bpm, 0.06, -3.2 dB). Conclusion: The results demonstrate the superiority of the Deeplab approach. Because of its architecture, it is capable of learning complex relationships regarding skin color and morphology.
Since the Deeplab approach outputs a pixelwise probability of being skin, further investigation regarding threshold tuning is necessary.
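The signal extraction and MAE evaluation described in the methodology can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, array shapes, and the per-frame skin masks are assumptions for the sketch.

```python
import numpy as np

def masked_green_mean(frames, masks):
    """Framewise mean of the green channel over skin pixels.

    frames: (T, H, W, 3) RGB video array; masks: (T, H, W) boolean skin
    masks, e.g. produced by thresholding a pixelwise skin probability map.
    Returns a length-T iPPG signal; frames without any skin pixel yield NaN.
    """
    signal = np.empty(len(frames))
    for t, (frame, mask) in enumerate(zip(frames, masks)):
        green = frame[..., 1].astype(float)
        signal[t] = green[mask].mean() if mask.any() else np.nan
    return signal

def heart_rate_mae(hr_est, hr_ref):
    """Mean absolute error in bpm between estimated and reference heart rates."""
    return float(np.mean(np.abs(np.asarray(hr_est) - np.asarray(hr_ref))))
```

In practice, the interval-based heart rate would be extracted from the masked green-channel signal (e.g. via peak detection or spectral analysis) before computing the MAE against a reference.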