Speech Denoising in White Noise Based on Signal Subspace Low-rank Plus Sparse Decomposition

In this paper, a new subspace speech enhancement method using low-rank and sparse decomposition is presented. In the proposed method, we first arrange the corrupted data as a Toeplitz matrix and estimate the effective rank of the underlying human speech signal. The low-rank and sparse decomposition is then performed, guided by the estimated speech rank, to remove the noise. Extensive experiments have been carried out under white Gaussian noise conditions, and the results show that the proposed method performs better than conventional speech enhancement methods, yielding less residual noise and lower speech distortion.

Keywords: speech enhancement; subspace method; low-rank plus sparse decomposition.


Introduction
Speech enhancement refers to improving the quality and intelligibility of noise-corrupted speech signals using supervised or unsupervised methods. It is widely used as a pre-processing block in many applications, such as automatic speech recognizers and other communication systems.
Over the last several decades, many algorithms have been proposed for speech enhancement. Typical algorithms include spectral subtraction [1], minimum mean square error (MMSE) estimation [2][3][4], Wiener filtering [5][6][7][8], and subspace methods [9][10][11][12][13]. Spectral subtraction and Wiener filtering have been widely used because of their simplicity and ease of implementation in single-channel systems, but one of their major drawbacks is that they produce musical noise after enhancement. Signal subspace approaches [9][10][11][12][13] have been shown to give a better compromise between residual noise and signal distortion in the output signal than the other existing techniques.
The signal subspace approach was first proposed by Ephraim et al. The principle of this method is to separate the noisy speech observation space into a signal subspace and a noise subspace, and to construct the enhanced speech using only the components within the signal subspace. In subspace-based algorithms, subspace decomposition is a critical step, which is often performed via the Karhunen-Loeve transform (KLT) [10] or singular value decomposition (SVD) [9]. The main issue in developing a subspace-based model is how to split and refine the signal and noise subspaces optimally. In [14], a variance-of-the-reconstruction-error criterion was introduced to optimize subspace selection for speech enhancement. In [15], human auditory psychoacoustic properties were incorporated into the subspace filter to optimize the subspace decomposition model when reconstructing the enhanced signal. Although many efforts have been made to improve subspace methods, existing subspace-based speech enhancement methods still suffer from low decomposition accuracy in the presence of large noise, which leaves a high level of residual noise in the enhanced speech in strong-noise cases.
In this paper, we propose a new subspace-based method for speech enhancement based on the principle of low-rank and sparse decomposition (LSD). The main idea behind our method is motivated by the recent development of low-rank and sparse theory [16]. According to this theory, if a given data matrix Y has an underlying low-rank structure that is corrupted by sparse additive noise, the underlying low-rank component L can be effectively recovered by solving a convex optimization problem, even if the noise is arbitrary in magnitude. In the time domain, owing to the short-time stationarity of human speech, speech signals can be assumed to have a low-rank structure. On the other hand, due to its randomness, background noise is more variable and can thus be viewed as sparse and high-rank. LSD theory can therefore be exploited to recover the underlying speech from corrupted speech signals.
The rest of the paper is organized as follows. We first briefly review previous work in Section 2. In Section 3, we describe the LSD based signal subspace speech enhancement method. Section 4 presents the experiments and results. Finally, we give conclusions and future work in Section 5.

Related work
The goal of the principal component analysis (PCA) technique is to determine the most significant basis with which to re-express a noisy speech set [17]. This new basis filters out the noise and reduces multidimensional speech data to lower dimensions by discarding redundant data.
Let us consider the problem of enhancing a speech signal contaminated by independent additive noise. Let x(t) and d(t) denote the sampled clean speech and noise signal, respectively. The observed noisy speech signal is

$$y(t) = x(t) + d(t). \quad (1)$$

Suppose y(t) is framed with length N. Arranging each N-dimensional frame vector into an (N-l+1)×l Toeplitz-structured matrix, we get

$$Y = X + D. \quad (2)$$

Assuming that the effective rank of the underlying speech in Y is r, the optimal enhanced speech matrix $\hat{X}$ can be estimated according to the following least-squares criterion:

$$\hat{X} = \arg\min_{\hat{X}} \|Y - \hat{X}\|_F^2 \quad \text{s.t.} \quad \operatorname{rank}(\hat{X}) = r,$$

where $\|\cdot\|_F$ denotes the Frobenius norm of a matrix. If d(t) is white Gaussian noise with variance $\sigma_d^2$, the optimal solution of this problem can be obtained by applying the singular value decomposition (SVD) to Y:

$$Y = U \Sigma V^T.$$

Here, U and V are two orthogonal matrices holding the left and right singular vectors of the given matrix, and $\Sigma$ is a diagonal matrix holding the singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge 0$. Keeping only the r largest singular values and the corresponding singular vectors gives

$$\hat{X} = U_r \Sigma_r V_r^T.$$

The above low-rank matrix $\hat{X}$ represents the original speech matrix X in the sense of least-squares minimization. It is the optimal estimate when the noise is small, independent, and identically distributed Gaussian.
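The Toeplitz arrangement and the rank-r least-squares estimate described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `frame_to_toeplitz` and `rank_r_estimate` are hypothetical, and the row ordering of the Toeplitz matrix (newest sample first in each row) is one common convention that the text does not specify.

```python
import numpy as np


def frame_to_toeplitz(frame, l):
    """Arrange an N-sample frame into an (N - l + 1) x l Toeplitz matrix."""
    frame = np.asarray(frame, dtype=float)
    n_rows = frame.size - l + 1
    # Entry (i, j) is frame[i - j + l - 1]; it depends only on i - j,
    # so every diagonal of the result is constant (Toeplitz structure).
    # This row ordering is an assumption made for the sketch.
    idx = np.arange(n_rows)[:, None] + np.arange(l - 1, -1, -1)[None, :]
    return frame[idx]


def rank_r_estimate(Y, r):
    """Best rank-r approximation of Y in the Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    # Keep the r largest singular values and their singular vectors.
    return (U[:, :r] * s[:r]) @ Vt[:r, :]
```

For a frame containing a single sinusoid the Toeplitz matrix has rank 2, so `rank_r_estimate(Y, 2)` reproduces it up to rounding error. For a general matrix, the Frobenius error of the estimate equals the root sum of squares of the discarded singular values.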
However, PCA is highly sensitive to the presence of large corruptions. Even a single outlier in the data matrix can render the estimate of the low-rank component arbitrarily far from the true model. In [16], a new theory called Robust PCA (RPCA) was developed to address this shortcoming. The basic idea of RPCA is to decompose the data matrix M as M = L + S, where L is a low-rank matrix and S is a sparse matrix with a small number of non-zero coefficients of arbitrarily large magnitude. RPCA can be solved by minimizing the following convex program:

$$\min_{L,S} \|L\|_* + \lambda \|S\|_1 \quad \text{s.t.} \quad M = L + S,$$

where $\|\cdot\|_*$ denotes the matrix nuclear norm, which is defined as the sum of all singular values and is suggested as a convex surrogate for the rank function [18], $\|\cdot\|_1$ denotes the $l_1$-norm of a matrix, which is defined as the sum of the absolute values of the matrix elements, and $\lambda > 0$ is a weighting parameter. This problem is known to have a stable solution provided L and S are sufficiently incoherent [19], i.e., the low-rank matrix is not sparse and the sparse matrix is not low-rank. More recently, RPCA theory was introduced into the speech enhancement task in [20], where a constrained low-rank and sparse matrix decomposition (CLSMD) algorithm was designed for noise reduction.
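To make the two norms in the RPCA program concrete, the sketch below evaluates the objective and implements the two proximal operators on which iterative RPCA solvers are commonly built: entry-wise soft thresholding for the $l_1$ term and singular value thresholding for the nuclear norm. It is illustrative only. All function names are hypothetical, the weight `lam` between the two terms is an assumed parameter, and the block does not implement a full solver.

```python
import numpy as np


def nuclear_norm(L):
    """Sum of the singular values of L."""
    return np.linalg.norm(L, ord="nuc")


def l1_norm(S):
    """Sum of the absolute values of the entries of S."""
    return np.abs(S).sum()


def pcp_objective(L, S, lam):
    """Objective ||L||_* + lam * ||S||_1 (lam is an assumed weight)."""
    return nuclear_norm(L) + lam * l1_norm(S)


def soft_threshold(X, tau):
    """Proximal operator of tau * ||.||_1: shrink every entry toward zero."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def singular_value_threshold(X, tau):
    """Proximal operator of tau * ||.||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt
```

Singular value thresholding lowers the rank whenever a singular value falls below `tau`, which is why the nuclear norm acts as a convex stand-in for the rank function.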

LSD based speech denoising method
In this work, we propose a new subspace decomposition algorithm based on LSD, which is less sensitive to large noise interference.
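First, we formulate the speech enhancement problem as the decomposition of the noisy speech matrix Y into a rank-r matrix L and a sparse matrix S that minimizes the residual $\|Y - L - S\|_F^2$. This problem can be solved by alternately solving the following two subproblems until convergence:

$$L_i = \arg\min_{\operatorname{rank}(L) = r} \|Y - L - S_{i-1}\|_F^2, \quad (7\text{-a})$$

$$S_i = \arg\min_{S} \|Y - L_i - S\|_F^2 \quad \text{with } S \text{ sparse}. \quad (7\text{-b})$$

Given an estimate of the sparse matrix $S_{i-1}$, the minimization in (7-a) over L learns a rank-r matrix from partial observations. This is a fixed-rank approximation problem, which we solve using fast low-rank matrix approximation based on bilateral random projections (BRP):

$$L_i = Y_1 (A_2^T Y_1)^{-1} Y_2^T, \quad Y_1 = (Y - S_{i-1}) A_1, \quad Y_2 = (Y - S_{i-1})^T A_2,$$

where $A_1$ and $A_2$ are Gaussian random matrices. The minimization in (7-b) over S learns a sparse matrix from partial observations. This can be computed via the entry-wise hard thresholding function [21],

$$H_\tau(x) = x \cdot \mathbf{1}(|x| > \tau),$$

which keeps the input if its magnitude is larger than the threshold $\tau$; otherwise, it is set to zero. In summary, we have the following optimization algorithm for LSD.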
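The alternation between the low-rank step (7-a) and the sparse step (7-b) can be sketched as follows. This is a rough illustration under several assumptions rather than the authors' algorithm: the names `brp_low_rank`, `hard_threshold`, and `lsd`, the threshold argument `tau`, the relative-residual stopping rule, and the defaults `tol` and `max_iter` are all hypothetical. The closed-form bilateral random projection used here is exact only when its input has rank at most r; practical variants usually refine the projections further.

```python
import numpy as np


def hard_threshold(X, tau):
    """Entry-wise hard thresholding: keep entries with |x| > tau, zero the rest."""
    return X * (np.abs(X) > tau)


def brp_low_rank(X, r, rng):
    """Rank-r approximation of X from bilateral random projections."""
    m, n = X.shape
    A1 = rng.standard_normal((n, r))  # right Gaussian projection
    A2 = rng.standard_normal((m, r))  # left Gaussian projection
    Y1 = X @ A1                       # m x r sketch of the column space
    Y2 = X.T @ A2                     # n x r sketch of the row space
    # L = Y1 (A2^T Y1)^{-1} Y2^T, computed with a linear solve instead of inv().
    return Y1 @ np.linalg.solve(A2.T @ Y1, Y2.T)


def lsd(Y, r, tau, tol=1e-6, max_iter=50, seed=0):
    """Alternate a rank-r update and a sparse update (assumed stopping rule)."""
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, dtype=float)
    S = np.zeros_like(Y)
    for _ in range(max_iter):
        L = brp_low_rank(Y - S, r, rng)   # low-rank update, as in (7-a)
        S = hard_threshold(Y - L, tau)    # sparse update, as in (7-b)
        rel_residual = np.linalg.norm(Y - L - S) ** 2 / np.linalg.norm(Y) ** 2
        if rel_residual < tol:
            break
    return L, S
```

When Y is exactly rank r, the first low-rank update already reproduces Y and the sparse part stays zero. With dense noise present, the residual generally does not reach `tol`, so the loop runs for `max_iter` iterations.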

Algorithm 1. Optimization algorithm for LSD
Given r, T, ε,

Figure 1 shows the scheme of the LSD based speech enhancement method. First, the noisy speech signal is divided into frames in the time domain. Then we arrange each frame of the noisy speech into a Toeplitz matrix. After the effective rank r is estimated with the analysis-by-synthesis approach [22], the noisy speech matrix Y is decomposed using the LSD algorithm to obtain the low-rank matrix L of rank r. Since L is not a Toeplitz matrix, we average the elements along each diagonal of L to restore the Toeplitz form. Finally, the enhanced speech is constructed by inverting the Toeplitz matrix arrangement, followed by least-squares overlap-add synthesis [23].
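The diagonal-averaging step can be illustrated with a short sketch. It assumes the same Toeplitz layout as a hypothetical helper `frame_to_toeplitz`, in which entry (i, j) holds sample i - j + l - 1 of the frame; the paper does not state the layout, so this choice and all function names are assumptions. Overlap-add synthesis is not shown.

```python
import numpy as np


def frame_to_toeplitz(frame, l):
    """Assumed layout: entry (i, j) of the matrix is frame[i - j + l - 1]."""
    frame = np.asarray(frame, dtype=float)
    idx = np.arange(frame.size - l + 1)[:, None] + np.arange(l - 1, -1, -1)[None, :]
    return frame[idx]


def toeplitz_to_frame(L):
    """Average each diagonal of L to recover one sample per diagonal."""
    m, l = L.shape
    frame = np.empty(m + l - 1)
    for k in range(-(m - 1), l):
        # The diagonal with offset k holds entries (i, i + k),
        # which all map to sample l - 1 - k under the assumed layout.
        frame[l - 1 - k] = np.diagonal(L, offset=k).mean()
    return frame


def toeplitz_average(L):
    """Replace every diagonal of L by its mean, giving a Toeplitz matrix."""
    return frame_to_toeplitz(toeplitz_to_frame(L), L.shape[1])
```

Applied to a matrix that is already Toeplitz, `toeplitz_to_frame` returns the original frame unchanged, so the averaging only alters the parts of L that violate the Toeplitz structure.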

Experimental results
To evaluate the proposed LSD method, we chose a total of 30 sentences (sp01~sp30) from the NOIZEUS database. Both speech and noise were sampled at 8 kHz with 16-bit resolution. The frame length is 264 samples with 50% frame overlap. White Gaussian noise was added to the clean speech at various SNR levels. We use segmental SNR (segSNR) and Perceptual Evaluation of Speech Quality (PESQ) scores as performance measures. The proposed method is compared with the following speech enhancement methods: spectral subtraction (SSboll [1]), the SVD based subspace decomposition algorithm (SSVD [9]), the Wiener filter based method (Wiener [8]), the minimum mean-square error algorithm (MMSE [24]), KLT [12], and CLSMD [20].
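As an illustration of this setup, the sketch below adds white Gaussian noise at a target SNR and computes a segmental SNR over 264-sample frames with 50% overlap. The function names are hypothetical, and clamping each frame's SNR to the range [-10, 35] dB is a common convention assumed here, not something the paper specifies. PESQ is a standardized measure and is not reimplemented.

```python
import numpy as np


def add_white_noise(clean, snr_db, rng):
    """Add white Gaussian noise so that the overall SNR equals snr_db."""
    clean = np.asarray(clean, dtype=float)
    noise = rng.standard_normal(clean.shape)
    # Scale the noise so that 10*log10(P_clean / P_noise) equals snr_db.
    gain = np.sqrt(np.mean(clean ** 2) / (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
    return clean + gain * noise


def seg_snr(clean, enhanced, frame_len=264, overlap=0.5, lo=-10.0, hi=35.0):
    """Mean of per-frame SNRs in dB, each clamped to [lo, hi] (assumed limits)."""
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    hop = int(frame_len * (1.0 - overlap))
    eps = np.finfo(float).eps  # guards against division by zero and log(0)
    scores = []
    for start in range(0, clean.size - frame_len + 1, hop):
        c = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        snr = 10.0 * np.log10((np.sum(c ** 2) + eps) / (np.sum((c - e) ** 2) + eps))
        scores.append(np.clip(snr, lo, hi))
    return float(np.mean(scores))
```

A typical use would be `noisy = add_white_noise(clean, 5.0, np.random.default_rng(0))` followed by `seg_snr(clean, enhanced)` for each method's output, where `clean` and `enhanced` are time-aligned arrays of equal length.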
Tables 1 and 2 compare the performance in terms of PESQ and segSNR scores. Larger PESQ-MOS and segSNR scores indicate better performance. The proposed LSD method achieves the highest PESQ-MOS and segSNR scores among all the compared methods, except at 0 dB, where CLSMD has the highest segSNR score. Fig. 2 presents spectrogram comparisons for the various speech enhancement methods at 10 dB SNR. The enhanced speech spectrograms show that, along with the high level of noise reduction, the proposed LSD based method is still able to preserve most of the low-energy speech components compared with the other speech enhancement methods.

Conclusions
In this paper, we presented an LSD based signal subspace speech enhancement method. The proposed method is less sensitive to large interference than traditional algorithms and can significantly reduce noise. Experiments demonstrate that the proposed method improves the overall quality of the enhanced speech, especially at low SNRs. It should be pointed out that the LSD method improves on the original SVD based subspace method and removes more residual noise. In future work, we will devote more effort to improving the noise reduction performance in colored noise.

Figure 1. The scheme of the LSD based speech enhancement method

Figure 2. Comparison of the spectrograms for speech enhanced by different methods at 10 dB SNR

Table 1. PESQ scores in the white noise case at different SNRs

Table 2. PESQ scores in the white noise case at different