Anomaly Detection for Hyperspectral Images Based on Anisotropic Spatial-Spectral Total Variation and Sparse Constraint

A novel anomaly detection method for hyperspectral images (HSIs) is proposed based on anisotropic spatial-spectral total variation and sparse constraint. HSIs are assumed to be not only smooth in spectral dimension but also piecewise smooth in spatial dimension. The proposed method adopts the anisotropic spatial-spectral total variation model which combines 2D spatial total variation and 1D spectral variation to explore the spatial-spectral smooth property of HSIs. Meanwhile, the sparse property of anomalies is exploited for its low probability in the image. To utilize both the spatial and spectral information of HSIs, we preserve the original cubic form of HSIs and divide the HSIs into three 3D arrays, each representing the background, the anomaly, and the noise respectively. By using anisotropic spatialspectral total variation regularization on the background component and sparse constraint on the anomaly component, this anomaly detection problem has therefore been formulated as a constraint optimization problem whose solution has been derived by alternately using Split Bregman Method and Go Decomposition (GoDec) Method. Experimental results on hyperspectral datasets illustrate that our proposed method has a better detection performance than state-of-the-art hyperspectral anomaly detection methods.


INTRODUCTION
Hyperspectral sensors can collect a spectral vector with hundreds or thousands of elements from each pixel in a given scene(Bioucas-Dias et al, 2013), and the data product, which is generally termed hyperspectral image (HSI), is a three-dimensional array or "cube" with the width and length of the array corresponding to spatial dimension and the spectrum of each point as the third dimension (Stein et al, 2002).With abundant spatial and spectral information, HSIs possess a strong diagnostic capability to discriminate different materials and a unique advantage in target detection.With available targets spectra, target detection methods for HSIs are generally called spectral matching algorithms (Matteoli et al, 2010).Before applying spectral matching algorithms, atmosphere calibration is needed to convert the raw radiance data collected from the hyperspectral sensors into reflectance or convert the targets spectra into radiance at the sensor.However, these involve many complicated factors in practical applications.Besides, different species of objects may have the same spectrum, and the same object may have different spectra (Matteoli et al. 2014).These restrictions of spectral matching methods have therefore led to the development of anomaly detection (AD).Anomaly detection means detecting objects without any reference to target information (Matteoli et al, 2010).Usually, "anomalies" refer to two aspects: (1) its spectral signatures are different from background and (2) they have a low probability in the whole image (Stein et al, 2002).
In the literature, many AD algorithms have been put forward.Among them, Reed-Xiaoli (RX) (Reed and Yu, 1990) algorithm is considered as the benchmark for HSIs AD.RX assumes that the spectral signatures of the background pixels follow a multivariate Gaussian distribution.However, the background is usually so complex that it does not follow a multivariate Gaussian distribution.And the existence of anomalies and noise often makes the background estimation inaccurate.Therefore, some methods focus on making background closer to a multivariate Gaussian distribution and the background covariance matrix estimation more accurate.BACON (Billor et al, 2000), RSAD (Du and Zhang, 2011), and WRXD (Guo et al, 2014) refine the background estimation by removing anomalies or reducing the weight of anomalies in the background samples.Cluster based anomaly detection (CBAD) (Carlotto, 2005) algorithm holds that a global Gaussian mixture model (GMM) is more appropriate to characterize the complex background.Moreover, some methods try to detect anomalies by suppressing the background.Based on the linear mixture model (LMM) that each pixel is a linear combination of endmembers, an unsupervised orthogonal subspace projection (OSP) (Chang, 2005) method detects the anomalies by projecting the data onto the subspace orthogonal to the background subspace to suppress background components.Usually the background subspace is unknown, and it has to be adaptively estimated from the image itself, which is quite difficult.
In addition to aforementioned methods, recently, lowrank and sparse matrix decomposition-based methods for HSIs AD have gained much attention.This idea comes from that the background has the low-rank property due to the high correlation of spectral information (Zhang et al, 2014) and the anomalies are sparse for its low probability (Cui et al ， 2014).By resizing the hyperspectral cube into 2D matrix, the image can be decomposed into a low-rank matrix plus a sparse matrix, representing the background and the anomaly respectively (Chen and Chang, 2013).Candes presents the robust principal component analysis (RPCA) (Candes et al, 2009) to recover the low-rank matrix and the sparse matrix.Considering the universal noise in the image, Zhou and Tao (Zhou and Tao, 2011) develop the model of RPCA by adding a Gaussian noise term and propose Go Decomposition (GoDec) to recover the low-rank matrix and the sparse matrix from a high-dimensional data matrix corrupted by noise.Applying GoDec model to detect anomalies is generally called low-rank and sparse matrix decomposition-based anomaly detection (LRaSMD).More recently, considering the ubiquitous mixed pixels in HSIs, Liu (Liu et al, 2010) assumes that the low-rank matrix representing the background should lie in multiple subspaces instead of a single subspace proposed in RPCA model.This low-rank representation (LRR) model is consistent with the LMM.Based on this idea, AD methods based on low-rank and sparse representation (LRASR) (Xu et al, 2016) have been proposed.Besides, taking the spatial correlation into consideration, a collaborative representation-based anomaly detection (CRD) (Li and Du, 2015) algorithm has been proposed recently.It is based on the concept that each pixel in the background can be approximately represented by its spatial neighborhoods, whereas anomalies cannot.However, this method fails to work well when anomalies are placed in close proximity.
By rearranging the 3D hyperspectral data into 2D matrix, the above low-rank-based anomaly detection methods destroy the spatial correlation of the pixels in neighboring locations and only exploit spectral correlation of HSIs.Although CRD has taking the spatial correlation into consideration, it ignores the spectral correlation of HSIs.In this paper, we intend to integrate the spectral correlation and spatial correlation to explore the piecewise spatial and spectral smooth property of HSIs (Bioucas-Dias et al, 2013).Total variation (TV) (Rudin et al, 1992) model is an effective tool to impose the spatial constraint (Sun et al, 2017) Wang et al, 2017), which is defined as a combination of 2D spatial TV and 1D spectral variation of the HSIs, has recently been proposed to keep the spatial-spectral smoothness of HSIs.Meanwhile, considering the low probability of anomalies in the image, the anomalies are still assumed to have the sparse property.To utilize both the spatial and spectral information of HSIs, we preserve the original cubic form of HSIs and divide HSIs into three 3D arrays, each representing the background component, the anomaly component and the noise component respectively.Combing with the above analysis, a novel anomaly detection method based on anisotropic spatial-spectral total variation and sparse constraint (ASSTVSC) is proposed with the assumption that the background has the spatial-spectral smooth property, the anomaly is sparse and the noise is independent and identically distributed Gaussian noise.

HSIs Decomposition Model
represent the background component and the anomaly component respectively, where , mn represent the rows and columns of the image and b is the number of spectral bands.Then, the observed HSI cube , can be formulated as:

ASSTV Model
In most natural hyperspectral images, spatial neighboring pixels are usually correlated and neighboring bands also exhibit high spectral correlation.This prior information can be exploited using spatial-spectral TV model in an anisotropic way.The ASSTV can be formulated as:

ASSTVSC-Based Anomaly Detection Model for HSIs
In this paper, we integrate the sparse constraint and ASSTV regularization into a unit framework and propose a novel anomaly detection method, which can be written as: where  is the scalar parameter trading off between the decomposition error and regularization term, kN is the upper bound of the cardinality of A .The cardinality k of A reflects the sparse energy in the image scene, which is usually defined as the 0 -norm l of A (Zhang et al,   2016), and = N m n b  .|| .|| F generally represents the Frobenius norm of matrix and is extended to 3D arrays, which can be formulated as:

PROBLEM OPTIMIZATION
To recover the components of ASSTVSC model, we first convert the 3D array optimization problem into vector optimization problem and then use Split Bregman Method (Goldstein, 2009) and GoDec Method (Zhou and Tao, 2011) to solve the vector optimization problem.Finally, the approach of detecting anomalies with the sparse anomaly component is presented.

Suppose a hyperspectral cube
, and ,, i j l X denotes a pixel at the spatial location ( , )  ij in the th l band.Define () = x V X as the vector representation of any cube and () = X C x as the cube representation of any vector.
() = x V X means using vector . We use small bold letters for vectors, capital bold letters for matrices and capital bold letters with symbol ~ on top for arrays.
() = x V X is represented as follows: This defines a linear mapping V And, the inverse linear mapping C i j l i j m l mn Xx i m j n l b

V and C
are respectively called "vectorization mapping" and "cubic mapping" in this paper, and have:  .According to (5), the ASSTV can be converted from (8) to ( 9):    And the solutions to problem ( 18) and ( 19) are respectively: where  means the nonzero subset of the first kN largest entries of − yx .By alternately updating the background subproblem and the anomaly subproblem, problem ( 14) can be optimized.And the background component x and the anomaly component a are then recovered.

Detection Of Anomalies With The Anomaly Component
By mapping anomaly component a and the background component x into the cube, the anomaly component A and the background component X are recovered: () A is a hyperspectral cube with structured sparsity, and the nonzero entries in each band indicate the position of anomalies.According to the sparsity structure of A , the final detection result S is defined as follows:

HYPERSPECTRAL DATASETS DESCRIPTION
In this section, three real hyperspectral images data are used to validate the effectiveness of our proposed algorithm.The first hyperspectral data, named PaviaC, were collected by the Reflective Optics System Imaging Spectrometer sensor.In this experiment, a small subset is segmented from the original image, with 108 120  pixels and 102 bands after removing low signal-to-noise ratio (SNR) bands, as shown in figure 1(a).Some vehicles on the bridge and bare soil are regarded as anomalies, occupying 43 pixels and accounting for 0.33% of the image scene, as shown in figure 1(b).
The second and the third hyperspectral data come from the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) over San Diego, CA, USA.The whole image has a 3.5m resolution with 224 spectral channels in wavelengths ranging from 370 to 2510 nm.After removing the bands that correspond to the water absorption regions, low SNR, and bad bands, 189 available bands of the data were retained in our experiments.A region with size of 120 120  pixels in the upper-left corner is selected as the second hyperspectral data and is called upper-left San Diego data, thereafter, as shown in figure 2(a).Three planes in the image are selected as anomalies, which consist of 90 pixels and account for 0.625% of the image, as shown in figure .2(b).And the third hyperspectral data lie in the central part of the whole image and is named central San Diego data, thereafter, with a size of 100 100  pixels, as shown in figure 3(a).Three planes which consist of 114 pixels and account for 1.15% of the image are regarded as anomalies, as shown in figure 3(b).

Parameter Analysis and Determination
For ASSTVSC, there are four parameters need to be set: the stopping criteria 12  ， , the sparsity parameter k and the regularization parameter  .In all experiments, -2 -2 12 10 10 . What we need to discuss is the sparsity parameter k and the regularization parameter  .
One thing needs to be clarified in advance, that is, all the raw hyperspectral data are normalized before experiments to make the selected parameters more universal.
Referring to the most recent related publications (Zhang et al, 2016; Cui et al, 2014; Sun et al, 2014), the parameter k is set according to the ratio of the anomalies in the image scene (RAI).To test whether this prior is suitable for our algorithm, the performance of ASSTVSC with respect to the sparsity parameter k are investigated on the aforementioned three hyperspectral data.Interestingly to find that, when k is two and a half times RAI, the detection performance reaches the best for all the data.Specifically, for PaviaC whose backgrounds are relatively simple, the detection performance achieves the best when k is smaller than two and a half times RAI and stays unchanged when k increases.For two San Diego data whose background are much complex, the detection performance reaches the best only when k reaches two and a half times RAI and drops a little when k increases.A closer analysis may account for this phenomenon.For relatively complex images, there may exist more than one kind of anomalies and a larger k will assure the detection of interested anomalies.For relatively simple data, such as PaviaC, only one kind of anomalies exist, so a smaller k can guarantee a good performance.It reveals that the parameter k may probably not only be related with RAI but also the complexity of image.
The regularization parameter  can be set in an empirical method.Experiments show parameter  is relatively robust.For simplicity, in our experiments, we empirically set 0.5  = , achieving satisfactory results in all cases.

Detection Performance
The detection performance of ASSTVSC-based anomaly detection method is evaluated and compared with three state-of-the-art detectors: Global RX (GRX) (Reed and Yu, 1990), LRASR (Xu et al, 2016), and LSMAD (Zhang et al, 2016).Based on the aforementioned experiments, in the following comparative experiments, we set 2.5RAI, 0.5 k  == for our proposed algorithm.For LRASR, there are four parameters need to be set.We adopt the fixed parameters mentioned in (Xu et al, 2016) for the following experiments.For LSMAD, two parameters, respectively the rank r of the low-rank matrix and the cardinality k of the sparse matrix, have to be set in an empirically way.Through abundant experiments, we determine the best parameters for each hyperspectral data, as follows:  For PaviaC data, as shown in figure 4 and table 1, ASSTVSC and LSMAD demonstrate a much better performance than the other two detection algorithms, achieving a high detection rate with a low false alarm rate.From figure 5, we find the differences between ASSTVSC and LSMAD are subtle and both can suppress most of the background effectively.By a further observation on figure 4, we find that the GRX outperforms LRASR because LRASR fails to suppress the background, resulting in many false alarms.For the upper-left San Diego data, the ROC curves and AUC values are shown in figure 6 and table 1, respectively.It reveals that ASSTVSC performs much better than the other three algorithms.The 2D plots of the obtained detection results are shown in figure 7. The planes are pretty obvious using ASSTVSC accompany with a few kinds of uninterested anomalies, whereas the planes detected using LSMAD and GRX are not obvious with naked eyes.Although the planes detected are evident using LRASR, its inability to suppress the background incurs too many false alarms.As we can see from ROC curves and AUC values in figure 8 and table 1, ASSTVSC still performs the best among the four algorithms for the central San Diego data.From figure 9, we can see that although both ASSTVSC and LSMAD can suppress the background, ASSTVSC can distinguish all the anomalies from the background whereas LSMAD has difficulty in detecting the wings of the planes, thus affecting the performance.LRASR and RX are much better than LSMAD since they can detect all the anomalies.
Based on the aforementioned experimental results, we find that ASSTVSC performs much better than GRX and LRASR for all the data, since ASSTVSC can suppress the background effectively and remove the noise from the anomalies.This can be explained as follows: the proposed ASSTVSC method has divided the HSI data into three parts: the background, the anomaly and the noise.This guarantees that the detected anomalies do not be bothered by the background and the noise.Both LSMAD and ASSTVSC can achieve satisfactory detection performance on PaviaC data and have nearly no distinct differences as they can suppress the background and separate the noise from the anomalies.But for relatively complex scenes, LSMAD sometimes has difficulty in detecting all the anomalies.However, our proposed method does not be bothered by this problem and it can achieve good performance regardless of the complexity of images.The reasons may be as follows: the ASSTV model in our algorithm not only explores the spectral correlation but also the spatial correlation whereas the low-rank property of background used in LSMAD only considers the spectral correlation.So some marginal pixels of the anomalies, such as the wing of planes in central San Diego data, which cannot be detected using LSMAD, can easily be detected using ASSTVSC.

CONCLUSIONS
In this paper, ASSTVSC has been proposed from a novel perspective to employ total variation model for hyperspectral anomaly detection.Different from the low-rank-based models which only explore the spectral correlation of HSIs, we use ASSTV model to exploit the spatial correlation and spectral correlation of HSIs.To the best of our knowledge, it is the first time that the ASSTV model is adopted for HSIs anomaly detection.Besides, the sparse property of the anomaly is utilized and experiments reveal that the sparsity parameter k is not only related with the ratio of the anomalies in image but also the complexity of image.Experiments on hyperspectral datasets indicate that ASSTVSC has a superior performance in suppressing background and detecting anomalies than state-of-the-art anomaly detection methods.
But, there are some aspects that need a further research.In our proposed algorithm, parameter k needs to be determined as a prior.Although we have pointed that k is related with RAI and 2.5RAI k = can achieve satisfactory results for all data, it is actually hard to acquire RAI for different data as we usually don't know the size of anomalies.So the selection of parameter k becomes a limitation in reality application.Besides, the runtime of ASSTVSC needs to be optimized, which will be the focus of our future work.

D
denote the horizontal and vertical difference operators along the spatial dimension, b D is a linear difference operator along the spectral dimension, and ,, and    are the weights that control the proportion to horizontal, vertical and spectral correlation.And in this paper 1 ,

I
is an identity mapping in mnb R and m n b R  I is an identity mapping in m n b R  to represent(9) in a matrix linear transform form becomes the key to vectorize problem(3).Suppose a first-order difference operator  L in L indicates the order of the matrix.With the first-order difference operator  optimization of problem (16) can be finished by alternately solving the following two subproblems: for its high dimension.Actually, due to T + E D D is a sparse matrix, we use Conjugate Gradients Squared (CGS)(Sonneveld, 1989) algorithm to solve the following sparse linear equation to obtain 1 w+ x and d , the subproblem for updating the background component x can be solved.The subproblem for updating the anomaly component a is formulated as: of GoDec Method to update anomaly component a , which can be solved via entry-wise hard thresholding of − yx(Zhou and Tao, 2011):
for central San Diego data.The experimental results for all the detectors are provided through receiver operating characteristic (ROC) curves, AUC values and 2D plots of the detection statistic results.

Figure 4 :Figure 5 :
Figure 4: ROC curves obtained by different methods on PaviaC data.

Figure 6 :Figure 7 :
Figure 6: ROC curves obtained by different methods on upperleft San Diego data.

Figure 8 :Figure 9 :
Figure 8: ROC curves obtained by different methods on central San Diego data.
the equivalent formulation of (3) becomes as follows:

Table 1 :
AUC values by different methods for three hyperspectral data.