Support vector machine filtering data aid on fatigue driving detection

This paper tests the assumption that filtering confusing “awake” samples out of the training data of a fatigue driving detection model improves the accuracy of detecting the “drowsy” status in real driving situations. Instead of focusing on both the “drowsy” and “awake” driving statuses, we give first priority to raising an alarm for “drowsy” and temporarily set aside the accuracy of “awake” status recognition. A Support Vector Machine (SVM), a well-established classifier, is employed for data filtering: it yields more efficient training data and removes the samples that may confuse the detection model. The results support our assumption, with 72% accuracy on “drowsy” recognition, compared with 38% for detection without SVM filtering. In addition, the training set used to build the detection model is considerably smaller after filtering than without filtering.


Introduction
Fatigue driving, also known as drowsy driving, creates a huge risk on roads all over the world. Fatigue makes drivers lose attention to the road, slows their reaction time when they need to brake or steer suddenly, and impairs their ability to make correct decisions. It is reported that 4% of adult drivers have fallen asleep while driving in the previous 30 days [1]. Nobody knows exactly when sleep will come, and it is clearly dangerous when a driver falls asleep at the wheel. Untreated sleep disorders, medications, alcohol, and shift work can all cause fatigue driving. An efficient alarm system for fatigue driving could therefore be expected to reduce the number of accidents.
Current research on fatigue driving detection can be grouped into four technologies: steering pattern monitoring, vehicle position in lane monitoring, driver eye/face monitoring, and physiological measurement [2].
Steering pattern monitoring analyzes steering wheel movement. Steering wheel movements are not influenced by changes in the driver's or vehicle's accessories, the weather, and so on. However, to achieve significant results the experiment should be conducted under real traffic conditions, because actions carried out on simulators capture only a limited number of degrees of driver fatigue [3]. Vehicle position in lane monitoring analyzes the vehicle's deviation from its lane position and changes in its speed. If the change in lane position crosses a specified threshold, this indicates a significantly increased probability that the driver is drowsy. But people have different driving behaviors, so the threshold is very difficult to determine [4]. Driver eye/face monitoring also relies heavily on the driver's behavior, although yawning, eye closure, eye blinking, and head pose can be monitored easily with a camera [4]. Physiological measurement uses signals such as the electrocardiogram (ECG) and the electroencephalogram (EEG), so that drowsiness can be detected from pulse rate, heartbeat, and brain activity recorded by sensors. ECG and EEG monitoring allows the signals to be adjusted for optimal continuous recording, but sensors can monitor only a limited number of channels [5].
Machine learning is seen as an efficient way to enhance the performance of detection methods. Feature extraction methods such as Gabor wavelets, the fast Fourier transform, and linear discriminant analysis [6], and classifiers such as artificial neural networks and support vector machines, have been widely employed in this field [7].
Because of the limitations of invasive detection, this paper uses Steering Wheel Angle (SWA) data. SWA is reliable, non-invasive steering pattern data that can be collected by sensors embedded in various places inside a vehicle, and it captures the driver's operating information accurately and in real time. In Li's work [8][9][10], SWA data successfully improved the performance of fatigue driving detection methods.
In this paper, the time series data are first cut into windows of different sizes (10, 20, 50 and 100 milliseconds), each labeled as fatigue or non-fatigue. The energy of each observation point is then calculated to amplify the characteristics of the sliding windows. To keep the accuracy of fatigue pattern recognition high, an SVM is used to filter out the data that degrade the detection result. An SVM is also employed as the classifier in this experiment.
The paper is structured as follows. Section II gives a brief review of popular methods of fatigue driving detection. Section III introduces the methodology of this paper. The experiment is presented in Section IV. Section V summarizes this work.

Method
The proposed method for determining the driver's status is based on processing the SWA time series and consists of five steps: (i) selecting an appropriate continuously sliding window and fitting range for the continuous sequence of SWA data, (ii) forming the training data by first isolating all data labeled as "drowsy" as the first part of the training samples and then randomly selecting a fixed amount of data labeled as "awake" as the second part, (iii) enlarging the data features by computing the energy at each time point, (iv) filtering noise samples out of the data labeled as "awake" with an SVM, and (v) training the classifier (an SVM training model) on the filtered training data.
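Step (iv) can be illustrated with a minimal sketch. It rests on one plausible reading of the filtering rule, which the text does not spell out: an "awake" sample is treated as noise when a trained classifier predicts "drowsy" for it. The function name `filter_awake_samples`, the `predict` callable, and the toy rule `toy_predict` are hypothetical; in practice `predict` would wrap a trained SVM.

```python
from typing import Callable, List, Tuple

Sample = Tuple[List[float], str]  # (feature vector, label)


def filter_awake_samples(
    samples: List[Sample], predict: Callable[[List[float]], str]
) -> List[Sample]:
    """Keep every "drowsy" sample; keep an "awake" sample only if `predict` agrees.

    Assumption: an "awake" sample that the classifier takes for "drowsy"
    is a confusing (noise) sample and is dropped.
    """
    return [
        (features, label)
        for features, label in samples
        if label == "drowsy" or predict(features) == "awake"
    ]


def toy_predict(features: List[float]) -> str:
    """Hypothetical stand-in for a trained SVM: thresholds the mean feature value."""
    return "drowsy" if sum(features) / len(features) > 1.0 else "awake"


# Illustrative samples only, not data from the experiment.
training = [
    ([2.0, 2.5], "drowsy"),
    ([0.1, 0.2], "awake"),
    ([1.8, 2.2], "awake"),  # looks "drowsy" to the classifier, so it is removed
]
filtered = filter_awake_samples(training, toy_predict)
```

Passing the classifier in as a callable keeps the filtering rule separate from the choice of model, so the same function works with any classifier that maps a feature vector to a label.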

Time series analysis
Given a time series $X = \{x(1), x(2), \ldots, x(t), \ldots, x(N)\}$, $X$ denotes a data sample consisting of $N$ time points (milliseconds) and $T$ represents the elapsed time. A single point carries no meaning for fatigue driving detection, but a time series can be very long, sometimes containing millions of observations. Figure 1 shows the procedure of sliding-window subsequence extraction with any of the real-valued representations. As a result, we store all the extracted subsequences, where $x_{t,\,t+w}$ represents the subsequence of window width $w$ starting at time point $t$. Note that the corresponding label $y_i$ for each subsequence is included in the calibration data according to the status of the driver.
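The sliding-window extraction can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiment: the function name `sliding_windows`, the `step` parameter, and the rule that a window takes the label of its last time point are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def sliding_windows(
    series: Sequence[float],
    labels: Sequence[str],
    width: int,
    step: int = 1,
) -> List[Tuple[List[float], str]]:
    """Return (subsequence, label) pairs for every window of `width` points.

    Assumption: each window takes the label of its last time point.
    """
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    if len(series) != len(labels):
        raise ValueError("series and labels must have the same length")
    windows = []
    for start in range(0, len(series) - width + 1, step):
        end = start + width
        windows.append((list(series[start:end]), labels[end - 1]))
    return windows


# Toy SWA readings (illustrative values only, not data from the experiment).
swa = [0.0, 0.5, -0.2, 1.1, 0.9, -0.4]
status = ["awake", "awake", "awake", "drowsy", "drowsy", "drowsy"]
pairs = sliding_windows(swa, status, width=3)
```

With six points and a width of three, the sketch stores four overlapping subsequences, each paired with one label.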

New training data
The data are therefore formed as in (1). To maintain high accuracy in detecting the fatigue data, the data labeled as fatigue are isolated from the full training data $C$. Because the accuracy on "awake" data is not a high priority in this paper, we randomly select a fixed share (10%-20%) of the remaining data labeled as "awake" and add it to the new training data $D$. The rest of the data, $E$, all labeled as "awake", is kept separate. Note that $C = D \cup E$.
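The construction of $D$ and $E$ from $C$ can be sketched as below. The function name `split_training_data`, the `awake_fraction` and `seed` parameters, and the toy data are assumptions introduced for illustration; the sketch also assumes that only the two labels "drowsy" and "awake" occur.

```python
import random
from typing import List, Tuple

Sample = Tuple[List[float], str]  # (feature vector, label)


def split_training_data(
    samples: List[Sample], awake_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Sample], List[Sample]]:
    """Split C into D (all drowsy + a random share of awake) and E (remaining awake)."""
    if not 0.0 <= awake_fraction <= 1.0:
        raise ValueError("awake_fraction must lie in [0, 1]")
    drowsy = [s for s in samples if s[1] == "drowsy"]
    awake = [s for s in samples if s[1] == "awake"]
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    rng.shuffle(awake)
    n_keep = round(awake_fraction * len(awake))
    d_set = drowsy + awake[:n_keep]
    e_set = awake[n_keep:]
    return d_set, e_set


# Toy data set C: 3 drowsy and 10 awake samples (illustrative only).
c_set = [([float(i)], "drowsy") for i in range(3)]
c_set += [([float(i)], "awake") for i in range(10)]
d_set, e_set = split_training_data(c_set, awake_fraction=0.2)
```

Every "drowsy" sample ends up in `d_set`, and `d_set` and `e_set` together contain exactly the samples of `c_set`.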

Enlargement of feature
In this research, enlarging the differences in character between the observed points $c_n$ gives more useful information than filtering them with the FFT or DFT. In (2), we enlarge the feature of each $c_n$.
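As a rough illustration of the enlargement step, the sketch below assumes that the energy of an observation is its squared amplitude, which is the usual convention in signal processing. The definition used in (2) may differ, and the function name `pointwise_energy` is hypothetical.

```python
from typing import List, Sequence


def pointwise_energy(window: Sequence[float]) -> List[float]:
    """Replace each observation c_n by its energy, assumed here to be c_n squared."""
    return [c * c for c in window]


# Illustrative values only, not data from the experiment.
window = [0.5, -2.0, 3.0]
energy = pointwise_energy(window)
```

Under this assumption the ratio between the largest and smallest magnitudes in the toy window grows from 6 to 36, which is the sense in which the differences between observations are enlarged.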
After normalization, the new training set $D$ is re-formed accordingly. A kernel function is applied in the SVM to transform the input space into a higher-dimensional feature space, so that the classes may become linearly separable before the separating hyperplane is calculated. In this work, the data are treated as a linearly inseparable case, and only the Gaussian RBF kernel function was tried, because of its good generalization and because no prior experience was available to guide the choice.
The standard form of the SVM classifier is defined as follows: where "·" denotes a dot product and
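For reference, the textbook form of a kernel SVM classifier with a Gaussian RBF kernel is given below. The notation is an assumption and may differ from the one used in this paper: $x_i$ are the $m$ training samples with labels $y_i \in \{-1, +1\}$, $\alpha_i$ are the learned multipliers, $b$ is the bias, $\phi$ is the feature map induced by the kernel, $w = \sum_i \alpha_i y_i \phi(x_i)$, and $\gamma$ is the kernel width parameter.

```latex
\begin{align}
  f(x) &= \operatorname{sign}\bigl(w \cdot \phi(x) + b\bigr)
        = \operatorname{sign}\Bigl(\sum_{i=1}^{m} \alpha_i\, y_i\, K(x_i, x) + b\Bigr), \\
  K(x_i, x) &= \phi(x_i) \cdot \phi(x)
        = \exp\bigl(-\gamma \lVert x_i - x \rVert^{2}\bigr), \qquad \gamma > 0 .
\end{align}
```

The kernel replaces the dot product in the feature space, so the hyperplane is never computed explicitly in the higher-dimensional space.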

Quality of labeling criteria
To ensure the quality of data labeling, three experts were involved in our experiment. The experts provided an accurate and reliable method for evaluating the driver's real fatigue level from video surveillance. They first clipped the facial videos into 1-min segments and cut the operating feature data to match the start and end times of the video clips. They then evaluated all facial video samples against the criteria in Table 1 of Li's work [8], according to the time windows. Finally, they negotiated the driver's state for each sliding window. If they could not agree on the driver's state, the sample was discarded.

Results and discussion
The training and testing data are a random 1% sample of the given data set, because the purpose of this paper is to verify the assumption. We run the experiment 10 times and report the average result. We use 70% of the data for training and 30% for testing.
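The evaluation protocol (a 70/30 split repeated 10 times and averaged) can be sketched as follows. The function name `repeated_holdout`, its parameters, and the placeholder evaluator `held_out_share` are hypothetical; a real evaluator would train the detection model on the first part and return its accuracy on the second.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence


def repeated_holdout(
    samples: Sequence,
    evaluate: Callable[[list, list], float],
    runs: int = 10,
    train_share: float = 0.7,
    seed: int = 0,
) -> float:
    """Average `evaluate(train, test)` over `runs` random train/test splits."""
    rng = random.Random(seed)
    scores: List[float] = []
    for _ in range(runs):
        shuffled = list(samples)
        rng.shuffle(shuffled)  # a fresh random split on every run
        cut = round(train_share * len(shuffled))
        scores.append(evaluate(shuffled[:cut], shuffled[cut:]))
    return mean(scores)


def held_out_share(train: list, test: list) -> float:
    """Placeholder evaluator: reports the share of samples held out for testing."""
    return len(test) / (len(train) + len(test))


average = repeated_holdout(list(range(100)), held_out_share)
```

Averaging over several random splits reduces the dependence of the reported figure on any single split, which matters when only a small random sample of the data is used.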
The average accuracy of detection without SVM filtering is 82.73%, well above the 67.99% detection rate with SVM filtering. However, the confusion matrices in Table 1 and Table 2 tell a different story. Without SVM filtering, the "drowsy" status detection accuracy is only 38% with 5827 training samples, whereas the experiment conducted with SVM data filtering reaches 72% with 3429 training samples. Overall accuracy is thus much higher without SVM filtering than with it, but the purpose of "drowsy" driving detection is to raise an alarm when the driver is tired, and high "awake" status detection does not solve this problem. The SVM-filtered training data keep as much information as the training model requires, and the detection accuracy for the "drowsy" status therefore outperforms detection without SVM filtering.
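The gap between overall accuracy and per-class accuracy can be made concrete with a small sketch. The labels below are illustrative only and are not the results of this paper; the function names `per_class_accuracy` and `overall_accuracy` are hypothetical.

```python
from typing import Dict, Sequence


def per_class_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Fraction of the samples of each true class that were predicted correctly."""
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for truth, guess in zip(y_true, y_pred):
        totals[truth] = totals.get(truth, 0) + 1
        hits[truth] = hits.get(truth, 0) + (truth == guess)
    return {label: hits[label] / totals[label] for label in totals}


def overall_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of all samples that were predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative labels only: 8 awake and 2 drowsy samples, one drowsy sample missed.
y_true = ["awake"] * 8 + ["drowsy"] * 2
y_pred = ["awake"] * 8 + ["awake", "drowsy"]
by_class = per_class_accuracy(y_true, y_pred)
overall = overall_accuracy(y_true, y_pred)
```

In this toy case the overall accuracy is 0.9 although only half of the "drowsy" samples are detected, which shows how a majority "awake" class can mask poor "drowsy" detection in the overall figure.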

Conclusion and future work
This paper proposes the assumption that keeping enough information about the "drowsy" status outperforms building the fatigue driving detection model on all the data. It is worth noting that detecting the "drowsy" status is more important than detecting "awake", because an alarm raised in a "drowsy" situation can save the driver's life.
Our experiments show that the SVM filtering training model does not reach as high an overall detection accuracy as the model without filtering, but it outperforms that model on "drowsy" status detection. A training model with less confusion about the "drowsy" status helps to alert the driver in a potentially harmful situation.
This work is the beginning of our research. We have made the first assumption and tested it in this paper. However, much improvement remains to be made. We randomly selected 20% of the data labelled as "awake"; whether there is a boundary between the initially selected "awake" data and the "drowsy" training data still needs to be explored. The energy of each time point enlarges the feature of each sliding window, and Gabor wavelets could also be employed in future experiments.
This work is supported by NSF China under grant 61873043, in part by the Natural Science Foundation of Chongqing under grant cstc2018jcyjAX0048, in part by the Science Technology Research Project of CQJW (KJQN201901530), and in part by the Campus Research Foundation of CQUST (CK2017zkyb024).