Adaptive Skin Color Detection Based on Human Face under Complex Background


INTRODUCTION
Skin color detection is an important topic in image processing. It contributes to various fields of computer vision, such as face detection and recognition, gesture recognition, human-computer interaction and content-based screening for objectionable images. With the skin color detection technique, we can effectively eliminate the interference from background, reduce the scope of processing, and increase working efficiency and the accuracy of results.
Among the studies on skin color detection, Duan et al. [1] proposed a parametric method combining the YUV and YIQ color spaces to fix the color component thresholds and detect the skin color pixels. Peer et al. [2] discovered that skin color pixels cluster better in other color spaces. After the skin color pixels are transformed to the YCbCr color space, the Cb and Cr components cluster on a 2D plane in a shape similar to an ellipse. Accordingly, Peer et al. fitted the skin color pixels on the Cb-Cr plane to construct an elliptic boundary model, thus differentiating skin color from non-skin color. Although a fixed boundary is convenient and effective for detecting skin color pixels, the differentiation is not satisfactory when the background has a color similar to skin.
To solve the problem of interference caused by complex background, adaptive skin color detection methods [3,4,5] have been proposed in recent years. In these methods, facial skin color pixels were used as the learning sample, and the Gaussian model or the weighted average Mahalanobis distance was adopted to judge whether other pixels are skin or not. However, the methods for extracting skin color pixels were not satisfactory. Literature [4] adopted a fixed threshold to extract skin color pixels, while literature [5] applied Sobel edge detection. With these two approaches, the extraction of skin pixels was not robust to shadow, and the extracted information was not correct. In the process of adaptive learning, training a Gaussian model is time-consuming and cannot meet real-time demands. Besides, the extracted skin color pixels form a small training sample. The Gaussian model, which is suited to large samples, does not fit this situation.
To address the above issues, we tried to improve the adaptive skin color detection methods in this study. First, an improved adaptive binarization method is adopted to extract the skin color pixels of the human face. The influence of shadow is fully considered in the improved binarization, so that the extracted information is complete. Then, the histogram backprojection algorithm on the Cb-Cr components in the YCbCr color space is combined with a range constraint on the luminance component (Y) to classify the extracted region. In this way, the luminance information is taken into consideration, which improves robustness to illumination and shadow and decreases the missing rate. Moreover, the skin section in an image can be detected accurately, which resolves the interference caused by background similar to skin.
Traditional skin color detection techniques have weak resistance to interference from pixels with colors similar to skin under a complex background, and cannot reduce the influence of illumination on the characteristics of skin color. In this study, an adaptive skin color detection method is proposed to tackle these issues. The skin section containing illumination information is extracted by combining face detection based on Haar features and Adaboost with the improved binarization algorithm. Then, combining the best threshold of the luminance component (Y) of skin color samples obtained after training in the YCbCr space, the improved histogram backprojection method is adopted to detect the skin color of the whole image. Experiments show that the method is robust under complex background and the influence of illumination. Moreover, the method has a higher accuracy and recall rate than traditional skin color detection methods.
[...] algorithms and the related thresholds using the confirmable information of the image to be detected. The updated result is adopted for the re-detection of the original image to obtain a more accurate outcome. The framework of the proposed algorithm is shown in Figure 1.
The objective of face detection is to partition the confirmed skin section and eliminate the interference from the background at the edge and the hair. First, the face detection method of Adaboost and Haar [6] is employed to detect the size and position of the human face. Then, in accordance with the standard face features shown in Figure 2, a square area covering the face is drawn, with the center of the circumcircle of the face, O, as its center and the radius of the circumcircle, r, as its side length. In this way, the desired objective can be achieved.
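The geometry of the square area can be illustrated with a minimal sketch. It assumes that the face detector has already supplied the circumcircle center O = (cx, cy) and radius r in pixel coordinates; the function name `face_square` and the clipping to the image bounds are illustrative assumptions, not part of the original method description.

```python
def face_square(cx, cy, r, width, height):
    """Return (x0, y0, x1, y1) of a square centered at (cx, cy).

    The side length equals the circumcircle radius r, as described in
    the text. Clipping to the image size is an added assumption.
    """
    half = r // 2
    x0 = max(0, cx - half)
    y0 = max(0, cy - half)
    x1 = min(width, cx - half + r)
    y1 = min(height, cy - half + r)
    return x0, y0, x1, y1


# Hypothetical example: face circumcircle centered at (100, 100), radius 40.
print(face_square(100, 100, 40, 200, 200))
```

Because the side is only r (half the circle diameter), the square stays inside the face and largely avoids hair and background at the edge.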

Elimination of Non-Skin Section
The non-skin pixels in the area, such as the eyebrows, the eyes, and the shadow of the nose, are eliminated. To guarantee the correctness of the extracted skin pixel samples, the face section is divided into three categories: the white of the eyes and the teeth; the eyebrows, eyeballs and the shadow of the nose; and the skin color section. Experience shows that the colors of these three categories lie in three intervals, which makes them suitable for comparison and exclusion.
First, the pixels of the white of the eyes and the teeth are transformed. The three component values of white in RGB space are close to one another and approach 255, while for skin color R (red) > G (green) > B (blue) and the values are close. Therefore, the pre-attribute of pixel P can be judged by setting a limiting condition, where Value(R), Value(G) and Value(B) denote the values of the three components of the pixel in RGB space, respectively.
By annotating the skin sections of 26 face images, 229,736 skin color pixels were obtained. The teeth and the white of the eyes were clear in 9 of the images, from which 546 non-skin pixels were extracted. By dividing the pixels according to the statistical rules above, it was found that the two types can be distinguished perfectly with threshold t = 5.
The next step is to eliminate the eyeballs, the eyebrows and the shadow of the nose. The V component in the HSV space, which denotes the brightness of a color and ranges from 0 (black) to 255 (white), is processed. An improved adaptive thresholding method for binarization is then used to eliminate the interference of the non-skin section. The Otsu algorithm (maximization of interclass variance) is the traditional adaptive thresholding method for binarization. It performs very well on images with a unimodal interclass variance. Nevertheless, under the influence of shadow, the interclass variance becomes bimodal or multimodal, and the division performance is unsatisfactory. As shown in Figure 3, the weakly lit right part is directly binarized to non-skin color, losing a large amount of useful information. To solve the problem, the local binarization principle is utilized. The face image is divided into 15 rectangular sections by taking the size of an eye as the template window, whose length and height are 1/3 and 1/5 of the side length of the square area mentioned above, respectively. Then the best threshold, t, is calculated for each rectangular section with the Otsu algorithm. The process is as follows.
Step 1. The image is divided into 15 sections by 1/3 of the width and 1/5 of the height.
Step 2. The gray-scale histogram of the divided image, $S_i$, is calculated, and the histogram is normalized:
$$P_j = \frac{n_j}{M}$$
where M is the number of all pixels, $n_j$ is the number of pixels with gray scale j, and $P_j$ is the probability that the gray scale of a pixel is j.
Step 3. With the threshold t, the gray scale can be divided into two categories, $C_1$ and $C_2$. The probabilities for the two categories [...]
The maximal value of the interclass variance V indicates that the difference between the two classes is the greatest. The threshold t corresponding to this situation is the best division threshold.
In the result of the improved Otsu algorithm (Figure 4), the right dark part is clearly improved. However, sections 1, 2, 7, 10 and 11 in the figure, which were previously identified as smooth skin color, were divided incorrectly after the binarization, causing the loss of some information. Hence, to guarantee the completeness of information, the lost information should be retrieved by performing a geometric operation on the locally binarized result and the globally binarized result, as shown in Figure 5.
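The local binarization procedure can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation. It makes several assumptions: the V channel is given as a list of rows of integers in 0..255, the standard Otsu criterion is used for the interclass variance, and the "geometric operation" that merges the local and global results is taken to be a pixel-wise logical OR, which the source does not specify. All function names are hypothetical.

```python
def otsu_threshold(values):
    """Return the gray level t (0..255) maximizing the interclass variance."""
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = len(values)
    total_sum = sum(j * n for j, n in enumerate(hist))
    best_t, best_var = 0, -1.0
    n1 = 0  # number of pixels in class C1 (gray level <= t)
    s1 = 0  # sum of gray levels in class C1
    for t in range(256):
        n1 += hist[t]
        s1 += t * hist[t]
        n2 = total - n1
        if n1 == 0 or n2 == 0:
            continue  # one class is empty, so no valid split at this t
        mu1 = s1 / n1
        mu2 = (total_sum - s1) / n2
        var = (n1 / total) * (n2 / total) * (mu1 - mu2) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize_global(gray):
    """Binarize the whole image with a single Otsu threshold."""
    t = otsu_threshold([v for row in gray for v in row])
    return [[1 if v > t else 0 for v in row] for row in gray]


def binarize_local(gray, cols=3, rows=5):
    """Binarize each of the cols x rows sections with its own threshold."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for r in range(rows):
        y0, y1 = r * h // rows, (r + 1) * h // rows
        for c in range(cols):
            x0, x1 = c * w // cols, (c + 1) * w // cols
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            t = otsu_threshold(block)
            for y in range(y0, y1):
                for x in range(x0, x1):
                    out[y][x] = 1 if gray[y][x] > t else 0
    return out


def combine(local_mask, global_mask):
    """Assumed merge rule: a pixel is kept if either mask marks it as skin."""
    return [[a | b for a, b in zip(lr, gr)]
            for lr, gr in zip(local_mask, global_mask)]
```

With the OR rule, a smooth skin section that the local threshold splits incorrectly is recovered wherever the global result still labels it as skin, while shadowed skin found only by the local pass is also kept.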

Improved Histogram BackProjection Algorithm
With the blue-difference chroma (Cb) and red-difference chroma (Cr) components in the YCbCr color space as reference, the histogram backprojection [...]
Hence, a threshold on the Y component is added to the back projection so that the search is restricted to a certain range. Thus, the missing rate will not increase, while the false positive rate will decrease. The specific processes are as follows:
Step 3. Meanwhile, the histogram of the Cb-Cr components is established for all pixels in the test image.
Step 4. For the bin of the histogram where the pixel $P_{w,h}$ of the test image is located, the value of the corresponding bin in the sample histogram model is looked up. In addition, the Y component of the pixel to be checked is compared with the corresponding bin mean $E^{y}_{bin}$: if the absolute difference between them does not exceed the threshold E, the value of the bin does not change; otherwise, it is changed to 0.
Step 5. The value of the pixel $P_{w,h}$ in the original image is replaced by the looked-up bin value, which is the process of back projection.
Step 6. The above process of back projection is applied to each pixel in the test image. The result is the skin color probability obtained by projecting the test image according to the sample skin color.
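The steps above can be sketched in plain Python. This is a hedged illustration under stated assumptions rather than the authors' code: pixels are (Y, Cb, Cr) tuples with integer values in 0..255, the bin width `bin_size` is an arbitrary choice, the histogram is normalized by its peak value, and the Y test is taken to be an absolute difference from the bin mean bounded by E. The function names are hypothetical.

```python
def build_skin_model(samples, bin_size=8):
    """Build the Cb-Cr sample histogram and the mean Y of each bin.

    samples: iterable of (Y, Cb, Cr) skin pixels taken from the face.
    Returns (hist, y_mean), both dictionaries keyed by (Cb bin, Cr bin).
    """
    counts, y_sums = {}, {}
    for y, cb, cr in samples:
        key = (cb // bin_size, cr // bin_size)
        counts[key] = counts.get(key, 0) + 1
        y_sums[key] = y_sums.get(key, 0) + y
    if not counts:
        return {}, {}
    peak = max(counts.values())
    hist = {k: n / peak for k, n in counts.items()}  # assumed normalization
    y_mean = {k: y_sums[k] / counts[k] for k in counts}
    return hist, y_mean


def back_project(image, hist, y_mean, e, bin_size=8):
    """Replace each pixel by its bin value, zeroed when Y is out of range.

    image: list of rows of (Y, Cb, Cr) tuples; e: the Y threshold E.
    """
    out = []
    for row in image:
        out_row = []
        for y, cb, cr in row:
            key = (cb // bin_size, cr // bin_size)
            value = hist.get(key, 0.0)
            # Assumed Y test: keep the bin value only if |Y - mean| <= E.
            if value > 0.0 and abs(y - y_mean[key]) > e:
                value = 0.0
            out_row.append(value)
        out.append(out_row)
    return out
```

Setting `e` to 255 disables the luminance test, which reduces the sketch to ordinary Cb-Cr backprojection; setting it to 0 keeps only pixels whose Y equals the bin mean exactly.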

Result of Improved Algorithm
Taking Figure 7(a) as an example, the relationship between the detection accuracy T (formula 6) and the threshold E, together with the effect of skin color detection, can be obtained using the improved algorithm, as shown in Figure 8. In the figure, the Y axis denotes the detection accuracy, i.e., the ratio of the number of correct judgments of skin color and non-skin color to the total number of pixels in the test image. It can be seen that as E changes, the accuracy first increases and then decreases. This conforms to the regularity above: when E = 0, the Y component must be the same as the bin mean $E^{y}_{bin}$, and the missing rate is higher; when E = 255, the Y component is not considered, and the false positive rate is higher. For this image, the detection effect is the best when the threshold E is 55, as shown in Figure 9. The threshold of the Y component for each test image is different. Hence, the best threshold should be found by training with a number of images.
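The accuracy curve described above can be reproduced with a small sweep over candidate thresholds. The sketch below is illustrative only: `detect` is a hypothetical callable that returns a binary skin mask for a given E, masks are lists of rows of 0/1 values, and the step of the sweep is an arbitrary assumption.

```python
def accuracy(pred, truth):
    """Share of pixels whose skin / non-skin label matches the annotation."""
    total = 0
    correct = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            total += 1
            if bool(p) == bool(t):
                correct += 1
    return correct / total if total else 0.0


def sweep_thresholds(detect, truth, thresholds=range(0, 256, 5)):
    """Evaluate detect(e) for each e and return (best_e, best_acc, curve)."""
    curve = [(e, accuracy(detect(e), truth)) for e in thresholds]
    best_e, best_acc = max(curve, key=lambda item: item[1])
    return best_e, best_acc, curve


# Toy example: four pixels with one shared bin mean of 100 (made-up values).
ys = [100, 130, 160, 240]
truth = [[1, 1, 1, 0]]
toy_detect = lambda e: [[1 if abs(y - 100) <= e else 0 for y in ys]]
print(sweep_thresholds(toy_detect, truth, [0, 30, 60, 140, 255]))
```

In the toy data the accuracy rises as E admits more true skin pixels and falls once E is large enough to admit the non-skin pixel, which mirrors the rise and fall described for Figure 8.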

Data set for Experiment
The Pratheepan [11] data set of human skin color was used in this study. First, 12 human images were chosen from the 32 images in the data set to train and obtain the optimal threshold E of the Y component described in Section 1.2. The other 20 images were used as the test set to validate the result of training. The results were compared with those obtained with the traditional parametric method [1], the self-learning skin color detection method (integration method) [11] and the luminance-based adaptive method [5].

Training of the Threshold of Y component and the Validation
The 12 sample images for training contained 2,459,261 pixels, and the skin pixels and non-skin pixels were annotated manually. First, the method described in Section 2.1 is adopted to detect the face in each image and extract the sample skin pixels. Then, the skin pixels extracted from each image are used for the projection of all pixels in the image with the method described in Section 2.2, and the projected pixels are compared with the annotated results. As a result, the relationship between the detection accuracy T and the threshold E is obtained, as shown in Figure 10. It can be seen that the detection effect is optimal when E lies in [55, 105]. To validate the training results, the test set is used, and the method is the same as that of the training process [...]

Contrast Experiment
The parametric method, the integration method and the luminance-based adaptive method were applied to the 20 test images, and the results were compared with those of the proposed method. Since the training result for the threshold E of the Y component was [55, 105], E = 80 was set during the test process. The detection results are shown in Figure 12, and the parameter results are shown in Table 1.
The comparison of detection results shows that although the parametric method has the worst performance on background judgment, its recall rate in Table 1 is the highest, indicating the largest number of skin pixels detected. This means that the method is suitable for large-scale detection, while its effect is not satisfactory in some environments. The method proposed in this study handles both the background and the influence of luminance well, with a remarkable improvement in both accuracy and recall rate compared with the integration method. Although the recall rate of the online luminance-based adaptive method is superior to that of the proposed method, the accuracy of our method is higher, and its F value, the comprehensive judgment standard, is the highest.
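The relationship between recall and the F value can be made concrete with a short sketch. The source does not give the formula for the F value, so the code below assumes the usual harmonic mean of precision and recall, and it treats the "accuracy" column of the comparison as precision; both are assumptions. Masks are lists of rows of 0/1 values and the function name is hypothetical.

```python
def precision_recall_f(pred, truth):
    """Return (precision, recall, F) for binary skin masks.

    F is assumed to be the harmonic mean of precision and recall.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # skin detected as skin
            elif p and not t:
                fp += 1      # background detected as skin
            elif t and not p:
                fn += 1      # skin that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value
```

Under this definition a method that labels most of the image as skin obtains a high recall but a low precision, so its F value stays low, which is consistent with the behavior reported for the parametric method.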

CONCLUSION
In this study, an adaptive skin color detection method is proposed by combining the color components and the luminance component. After the human face is detected, the sample pixels are not extracted based on skin color features. Instead, by combining the facial distribution features with the improved local adaptive binarization method, the completeness of sample extraction is guaranteed. Then, with the histogram backprojection algorithm, the interference of background pixels with a color similar to skin can be eliminated effectively. The accuracy of detection is ensured, while the false positive rate is also low. Meanwhile, the influence of luminance on skin color is taken into consideration, which further increases the accuracy.

Keywords:
Skin detection; Local binarization; Adaptive threshold; Histogram back projection.
DOI: 10.1051/ © Owned by the authors, published by EDP Sciences, 2015

Figure 1. Framework of the adaptive skin detection

Figure 2. Face detection and segmentation

Figure 3. The result of the traditional Otsu algorithm
[...] image and $S_i$ denotes the image after division.

Step 5. The accumulated gray value $P_T$ for the whole gray scale in the range 0-T (255) is calculated as follows:

Figure 4. The result of the improved Otsu algorithm
[...] to acquire the skin section and avoid the interference of background. The relevant study result of literature [8] is shown in Figure 6. The clustering performance of skin color pixels is outstanding in the YCbCr color space, especially in the projection on the Cb-Cr plane. But the Y component has little fluctuation along the Y axis. Therefore, most approaches proposed in previous skin color detection studies were based on the Cb and Cr components. However, the skin pixels distributed in the Cb-Cr plane shown in Figure 6(a) are still influenced by the Y component (the range of the Cb-Cr components is small where the luminance is relatively high or low). If only the Cb and Cr components are considered, the false positive rate will increase, as shown in Figure 6(b). If all the color information in Y-Cb-Cr is used as the parameter of back projection, it is equivalent to searching the image for pixels having the same value as the sample pixels. As a result, a large number of pixels will be undetected, as shown in Figure 7(c).

Step 1. A sample histogram model of the Cb-Cr components is established with the N extracted skin color pixels (indexed 1, 2, ..., n).
Step 2. The Y component is introduced. For the m pixels (indexed 1, 2, ..., m) in each bin of the histogram, the sum of their Y components and the mean $E^{y}_{bin}$ are calculated.

Figure 8. The relationship between the accuracy and the threshold

Figure 10. The relationship between the accuracy and the threshold in the sample data

Figure 11. The test result under different thresholds of the Y weight

Table 1. The results of different approaches