Classification of Motor Imagery EEG Based on Sparsification and Non-negative Matrix Factorization

The analysis of EEG is a hot topic in the area of biomedical signal processing. In this paper, the EEG signals with Mu (μ) rhythm and Beta (β) rhythm are used to solve the motor imagery problem, i


Introduction
Biomedical signals are weak, low-frequency signals embedded in strong background noise. They are naturally non-stationary signals emitted by complex living organisms. Because of these characteristics, their detection methods and processing techniques differ from those used for general signals. By their electrical nature, they can be divided into electrical signals, such as ECG, EMG and EEG, and non-electrical signals, such as body temperature, blood pressure, respiration, blood flow, pulse and heart sounds [1]. EEG is a rhythmic, spontaneous bioelectric activity that is closely linked with life: once the organism dies, the electrical phenomenon disappears [2].
In practical applications of brain-computer interfaces (BCI), there are three kinds of experimental paradigms: motor imagery, steady-state visual evoked potentials (SSVEP) [3] and P300 [4], of which SSVEP and P300 are based on evoked EEG patterns. Motor imagery EEG is essentially spontaneous EEG, and no muscle movement is felt during imagery. Some studies have shown that the area of the sensorimotor cortex activated by motor imagery is the same as the area activated by actual physical movement [5]. When a person is not actively engaged in motor activity, in sensory processing, or in imagining such movements or processing [6], significant oscillations appear over the sensorimotor area in the 8-12 Hz and 18-25 Hz frequency bands of the EEG. These oscillations, commonly referred to as Mu (μ) rhythms (8-12 Hz) and Beta (β) rhythms (18-25 Hz), are generated by thalamocortical circuits [7]. During motor imagery, the μ and β rhythms of the contralateral hemisphere are suppressed, while those of the ipsilateral hemisphere are enhanced. These phenomena are known as event-related desynchronization (ERD) and event-related synchronization (ERS), respectively [8,9].
In recent years, feature extraction methods for EEG signals have developed rapidly. These studies mainly evaluate different classifiers using different feature extraction techniques. Zhao et al. calculated the energy difference between the μ rhythm and the β rhythm through power spectrum analysis to extract signal features [10]. This method has a clear objective and is simple to compute, which makes it well suited to online BCI systems. Sun et al. presented an online classification method for BCI that uses common spatial patterns (CSP) for feature extraction and a support vector machine as the classifier [11]. Zhang et al. proposed a semi-supervised feature learning method based on a deep stack network [12]. This method combines unsupervised learning of a restricted Boltzmann machine with a batch-mode gradient descent algorithm and requires two stages, pre-training and fine-tuning. However, these algorithms all have limitations, such as poor classification accuracy or the need for labels as supervisory information.
Recently, non-negative matrix factorization (NMF) has been commonly used to classify EEG signals. Lee et al. exploited NMF to select discriminative features in the time-frequency representation of EEG [13]. NMF has also been used to extract attention-related EEG features [14]. In this paper, we propose an unsupervised feature extraction method based on sparsification and non-negative matrix factorization. The central idea of the method is that, since the characteristics of motor imagery EEG are concentrated in the μ and β rhythms, the signal is first filtered by an 8-25 Hz band-pass filter; the sparsity of the matrix is then increased by maximizing the feature difference; distinguishable features are extracted by matrix factorization; and finally these features are clustered. The main advantage of this method is that NMF is a widely used unsupervised learning process that does not require labeled signals as supervision. After the feature difference is maximized, the sparsity of the matrix is increased, the computational complexity of NMF is reduced, and the efficiency of the algorithm and the accuracy of the classification are improved. Experiments show that our method, which uses non-negative matrix factorization to learn discriminative basis vectors, is well suited to offline analysis of motor imagery EEG.
MATEC Web of Conferences 160, 07007 (2018) https://doi.org/10.1051/matecconf/201816007007 EECR 2018
The structure of this paper is as follows. Section 2 introduces how to increase the matrix sparsity, gives the background of non-negative matrix factorization, and explains the main idea of the algorithm. Experimental results on real data are shown in Section 3, and conclusions are presented in Section 4.

Sparsification
In the experiment, we distinguish left- and right-hand motor imagery by ERD and ERS. To improve the efficiency and classification accuracy of the algorithm, we first increase the signal sparsity by maximizing the feature difference.
Suppose the $k$-th experimentally measured EEG signal is $E_k \in \mathbb{R}^{n \times t}$. It consists of signals from $n$ channels, and the length of time is $t$. The sparsification process is as follows:
1. Find the covariance matrices of left- and right-hand motor imagery, $C_l$ and $C_r$, respectively. If the $k$-th example is a left-hand imagery trial, its normalized spatial covariance

$$C_k = \frac{E_k E_k^{T}}{\mathrm{trace}(E_k E_k^{T})}$$

is averaged into $C_l$; otherwise it is averaged into $C_r$.
2. Form the composite covariance matrix $C_c = C_l + C_r$. The eigenvalue decomposition of the covariance matrix is represented as

$$C_c = O_c \Lambda O_c^{T},$$

where $\Lambda$ is the eigenvalue diagonal matrix and $O_c$ is the corresponding eigenvector matrix.
3. Construct a whitening matrix

$$P = \Lambda^{-1/2} O_c^{T},$$

and convert the covariance matrices $C_l$ and $C_r$ into $S_l = P C_l P^{T}$ and $S_r = P C_r P^{T}$. The corresponding eigenvalue decompositions are described by

$$S_l = O \Lambda_l O^{T}, \qquad S_r = O \Lambda_r O^{T}.$$

According to the principle of simultaneous diagonalization, $S_l$ and $S_r$ have the same eigenvector matrix $O$, and the sum of their corresponding eigenvalues is 1, that is,

$$\Lambda_l + \Lambda_r = I,$$

where $I$ is the identity matrix. Thus

$$M = O^{T} P$$

is the spatial filter that we want. Since the eigenvalues sum to 1, the eigenvector with the largest eigenvalue for $S_l$ has the smallest eigenvalue for $S_r$, and vice versa. The two classes of data can therefore be distinguished with the projection matrix $M$. Projecting the EEG onto the first and last rows of $M$ yields the features that best distinguish the two classes of EEG in the least-squares sense. With the projection matrix, a single-trial EEG can be decomposed into

$$V = M E_k.$$

In this way, the features used for classification can be obtained from the variance of the first and last rows of $V$.
It can be seen that, after the maximization of feature difference, the sparsity of the matrix is clearly increased.
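The sparsification steps above follow the pattern of common spatial patterns (CSP). As a rough NumPy sketch, assuming that per-trial covariances are trace-normalized and averaged per class, the projection matrix $M$ and the variance features could be computed as follows. The function names `normalized_cov`, `csp_filter` and `variance_features` are illustrative and do not come from the paper.

```python
import numpy as np

def normalized_cov(trial):
    """Spatial covariance E E^T / trace(E E^T) of one (channels x samples) trial."""
    c = trial @ trial.T
    return c / np.trace(c)

def csp_filter(left_trials, right_trials):
    """Return the projection matrix M = O^T P (assumed trace-normalized, class-averaged covariances)."""
    c_l = np.mean([normalized_cov(e) for e in left_trials], axis=0)
    c_r = np.mean([normalized_cov(e) for e in right_trials], axis=0)
    # Eigendecomposition of the composite covariance C_c = C_l + C_r.
    lam, o_c = np.linalg.eigh(c_l + c_r)
    # Whitening matrix P = Lambda^{-1/2} O_c^T.
    p = np.diag(lam ** -0.5) @ o_c.T
    # S_l and S_r share the eigenvectors O; decompose S_l only.
    lam_l, o = np.linalg.eigh(p @ c_l @ p.T)
    # Sort so that the first row of M has the largest left-hand eigenvalue.
    order = np.argsort(lam_l)[::-1]
    return o[:, order].T @ p

def variance_features(m, trial):
    """Variance of the first and last rows of V = M E_k."""
    v = m @ trial
    return np.var(v[[0, -1]], axis=1)
```

With this ordering, a left-hand trial should give a larger variance in the first row of $V$ than in the last row, and a right-hand trial the opposite.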

After sparsification of the motor imagery EEG signals, the signals of different channels in different trials were grouped into separate matrices, and features were extracted by NMF. Finally, we performed a clustering analysis of the feature matrices.

NMF
Non-negative matrix factorization (NMF) has been introduced as a matrix factorization tool that produces a useful decomposition for the dimensionality reduction of data sets. NMF approximates a non-negative data matrix by the product of two non-negative factor matrices, $W$ and $H$. The cost function defined by Eq. (9) is a convex function of $W$ and of $H$ separately. Therefore, multiplicative, alternating update rules for $H$ and $W$ are derived by gradient descent optimization [16]. Lee and Seung [16] proved that the cost function is non-increasing under these update rules. The update rule of matrix $H$ is given by Eq. (11), and the update rule of matrix $W$ is given by Eq. (12).
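For illustration, the sketch below assumes that the cost function is the squared Euclidean distance between the data matrix and $WH$. This is one of the two costs treated by Lee and Seung; the text here does not say which one is used. Under that assumption the multiplicative updates are $H \leftarrow H \odot (W^{T}X) / (W^{T}WH)$ and $W \leftarrow W \odot (XH^{T}) / (WHH^{T})$, with element-wise product and division. The names `nmf`, `cost`, `rank`, `n_iter` and `eps` are hypothetical, and `x` is assumed to be non-negative already.

```python
import numpy as np

def cost(x, w, h):
    """Squared Euclidean reconstruction error (assumed form of the cost)."""
    return 0.5 * np.linalg.norm(x - w @ h) ** 2

def nmf(x, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix x into w @ h with multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = x.shape
    # Strictly positive random initialization keeps the updates well defined.
    w = rng.random((n, rank)) + eps
    h = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Update H with W fixed, then W with the new H (alternating scheme).
        h *= (w.T @ x) / (w.T @ w @ h + eps)
        w *= (x @ h.T) / (w @ h @ h.T + eps)
    return w, h
```

Because each update multiplies by a non-negative ratio, $W$ and $H$ stay non-negative without any explicit projection step.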
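Our signal classification process is divided into the following steps:
Step 1: Filter the original signal $X$ through an 8-25 Hz band-pass filter to obtain the signal in the frequency band related to the features, denoted as $E$.
Step 2: Sparsify $E$ by maximization of feature difference and divide the result into two matrices, $V_{C3}$ and $V_{C4}$.
Step 3: Extract features from $V_{C3}$ and $V_{C4}$ by NMF to obtain $H_{C3}$ and $H_{C4}$.
Step 4: Cluster $H_{C3}$ and $H_{C4}$.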
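The filter design used in Step 1 is not specified here. As a minimal stand-in, the sketch below applies an ideal FFT-domain band-pass that zeroes every frequency bin outside 8-25 Hz. The function name `fft_bandpass` and its parameters are hypothetical, and a practical system would more likely use a designed FIR or IIR filter.

```python
import numpy as np

def fft_bandpass(signal, fs, low=8.0, high=25.0):
    """Keep only FFT bins in [low, high] Hz along the last axis (illustrative filter)."""
    n = signal.shape[-1]
    spectrum = np.fft.rfft(signal, axis=-1)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Boolean mask over frequency bins; broadcasts across channels.
    mask = (freqs >= low) & (freqs <= high)
    return np.fft.irfft(spectrum * mask, n=n, axis=-1)
```

For a (channels x samples) trial sampled at 128 Hz, `fft_bandpass(trial, 128)` returns an array of the same shape containing only the μ and β band content.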

Experimental results
In this section, we investigate the use of our proposed NMF algorithm for data clustering. Several experiments are carried out to show the effectiveness of our algorithm for motor imagery EEG classification. For our empirical study, we used one of the BCI Competition 2003 data sets, which was provided by the Department of Medical Informatics, Institute for Biomedical Engineering, Graz University of Technology, Austria [17]. The data set involves left- and right-hand motor imagery and consists of 140 labeled trials for training and 140 unlabeled trials for testing. Each trial lasts 9 seconds: a visual cue (arrow) pointing to the left or the right is presented after a 3-second preparation period, and the imagery task is carried out for 6 seconds. The data set contains EEG acquired from three channels, C3, Cz and C4, at a sampling frequency of 128 Hz. In our study we use only two channels, C3 and C4, because ERD has contralateral dominance and the Cz channel contains little information for discriminant analysis.
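The trial timing above fixes the array sizes: 9 s at 128 Hz gives 1152 samples per channel, and the 6 s imagery period covers samples 384 to 1151. The following sketch cuts that segment and keeps C3 and C4. It assumes that each trial is stored as a (3 x 1152) array with rows ordered C3, Cz, C4; that storage layout and the name `imagery_segment` are assumptions, not details given here.

```python
import numpy as np

FS = 128            # sampling frequency in Hz
TRIAL_SECONDS = 9   # total trial duration
CUE_SECONDS = 3     # preparation period before the cue
C3_ROW, C4_ROW = 0, 2  # assumed row order: C3, Cz, C4

def imagery_segment(trial):
    """Return the C3 and C4 rows restricted to the 3-9 s imagery period."""
    start = CUE_SECONDS * FS
    stop = TRIAL_SECONDS * FS
    return trial[[C3_ROW, C4_ROW], start:stop]
```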

Performance evaluation and comparisons
In this paper, we use a sparsification measure based on the relationship between the C3 and C4 channels. We compare the motor imagery EEG with and without the maximization of feature difference, as shown in Fig. 1.
The data obtained after maximization of feature difference are divided into two matrices: $V_{C3}$ contains the C3-channel signals of the left- and right-hand motor imagery trials, and $V_{C4}$ contains the C4-channel signals. Features are extracted from $V_{C3}$ and $V_{C4}$ separately by NMF and then classified, and the classification accuracy of each is computed.
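The clustering algorithm and the way accuracy is scored are not detailed here. One plausible reading, sketched below, is a two-cluster k-means on per-trial feature vectors, with accuracy taken as the better of the two possible cluster-to-class assignments because cluster indices are arbitrary. The functions `two_means` and `clustering_accuracy`, and the choice of k-means itself, are assumptions made for illustration.

```python
import numpy as np

def two_means(features, n_iter=50):
    """Plain k-means with two clusters on the rows of `features`."""
    # Deterministic start: the first sample and the sample farthest from it.
    c0 = features[0]
    c1 = features[np.argmax(np.linalg.norm(features - c0, axis=1))]
    centers = np.stack([c0, c1]).astype(float)
    for _ in range(n_iter):
        dist = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)
        for j in range(2):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels

def clustering_accuracy(labels, truth):
    """Accuracy for two clusters, invariant to swapping the cluster indices."""
    agree = np.mean(labels == truth)
    return max(agree, 1.0 - agree)
```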
To show the classification performance, we compare our algorithm with related methods on the BCI Competition 2003 data set. Features of the test set were extracted and classified separately by CSP and by NMF and compared with the proposed method; the detailed classification accuracies are shown in Table 1. When only CSP or only NMF was used, the average classification accuracy of NMF was 2.13% higher than that of CSP, and the accuracy of NMF classification improved by 5.01% after sparsification of the data.

Conclusion
In this paper, we propose a non-negative matrix factorization method to classify motor imagery EEG. The sparsity of the matrix and the choice of dimension affect the efficiency of the algorithm and the accuracy of the classification. First, maximization of feature difference is used to increase the sparsity of the matrix. Second, the training set and the test set of the BCI Competition 2003 data are each classified by our method. Experiments show that the average classification accuracy on the test set was 91.43%, and the highest accuracy for a single class of motor imagery reached 94.28%.