Non-local mean filtering algorithm based on deep learning

Aiming at the problem that traditional image denoising algorithms achieve only limited noise reduction, a new image denoising method is proposed. The method combines deep learning with the non-local means (NLM) filtering algorithm to denoise noisy images and obtain a better noise reduction effect. Comparisons with the Gaussian filter, the median filter, the bilateral filter and the original non-local means filter show that the new algorithm reduces noise more effectively than these traditional methods and achieves a higher peak signal-to-noise ratio than the original non-local means algorithm.


Introduction
The basic idea of the non-local means (NLM) filtering algorithm [1] is that an ultrasound image contains a great deal of redundant information: many similar image blocks exist, but they are distributed over the whole image, so their positions may differ greatly while their grayscale information is similar. The NLM filter fully exploits this self-similarity. First, the similarity between these blocks and the block containing the current noisy pixel is computed; then a weighted average over the similar blocks recovers the value of the pixel to be restored. Non-local means filtering has been one of the better noise reduction algorithms in recent years, and many researchers have improved it. Although the improved filters outperform the original, the underlying idea has changed little: many papers merely tune the decay parameter h that controls the degree of filtering. This paper steps outside the framework of improving the weighted average, the exponential kernel or the parameter h. Instead, the pixel values and similarity values of all neighboring points of the noisy image are used as the input of a deep learning network, the original image is used as the training target, and the trained model can then be applied directly for filtering [2].

Traditional NLM filtering algorithm
The principle of the NLM algorithm is as follows. As shown in Figure 1, the figure contains three pixel points P, Y1 and Y2 together with their respective neighborhoods. It can be seen that P and Y1 have similar neighborhood structures, while the neighborhoods of P and Y2 are not similar, so the weight \(\omega(P, Y_2)\) is very small when filtering the pixel at P. The final restored value of the pixel P is obtained by searching the entire image for pixels whose neighborhoods are similar to that of P and then averaging them, weighted by the computed similarities [3]:

\[ NL[v](i) = \sum_{j \in I} \omega(i, j)\, v(j) \]
Where I is the entire image space and the weight coefficient \(\omega(i, j)\) measures the degree of influence of pixel j on pixel i, as shown below:

\[ \omega(i, j) = \frac{1}{Z(i)} \exp\!\left( -\frac{\lVert v(N_i) - v(N_j) \rVert_{2,a}^{2}}{h^{2}} \right), \qquad Z(i) = \sum_{j \in I} \exp\!\left( -\frac{\lVert v(N_i) - v(N_j) \rVert_{2,a}^{2}}{h^{2}} \right) \]

where \(v(N_i)\) is the vector of gray values in the square neighborhood \(N_i\) centered at pixel i, the patch distance is a Gaussian-weighted Euclidean distance with standard deviation a, and h is the decay parameter that controls the degree of filtering.
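As a concrete illustration, the weighted average above can be sketched in NumPy. This is a minimal sketch: the function name and parameter defaults are our choices, and the Gaussian kernel a on the patch distance is replaced by a plain mean squared difference for simplicity; it is not the paper's implementation.

```python
import numpy as np

def nlm_denoise(img, patch=3, search=7, h=10.0):
    """Minimal non-local means filter on a 2-D grayscale array.

    patch  : side length of the (odd) similarity window
    search : side length of the (odd) search window
    h      : decay parameter controlling the degree of filtering
    """
    pr, sr = patch // 2, search // 2
    padded = np.pad(img.astype(float), pr + sr, mode="reflect")
    out = np.empty_like(img, dtype=float)
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            ci, cj = i + pr + sr, j + pr + sr          # center in padded coords
            ref = padded[ci - pr:ci + pr + 1, cj - pr:cj + pr + 1]
            weights, values = [], []
            for di in range(-sr, sr + 1):
                for dj in range(-sr, sr + 1):
                    ni, nj = ci + di, cj + dj
                    cand = padded[ni - pr:ni + pr + 1, nj - pr:nj + pr + 1]
                    d2 = np.mean((ref - cand) ** 2)     # patch distance
                    weights.append(np.exp(-d2 / h ** 2))
                    values.append(padded[ni, nj])
            w = np.array(weights)
            out[i, j] = np.dot(w / w.sum(), np.array(values))
    return out
```

The division by `w.sum()` plays the role of the normalizing constant Z(i) above.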

Denoising framework
The idea of the NLM filtering algorithm is to fully exploit the information of similar image blocks in the image: an image block expresses the characteristics of a pixel, the similarity between pixels is estimated by the similarity between their blocks, and the value of the pixel to be recovered is then restored by a weighted average. Because neural networks and deep learning models have strong expressive power, they are well suited to learning the mapping from picture block to picture block. First, a clean picture block y is randomly selected from the picture data set and a corresponding noisy picture block x is generated by artificially adding Gaussian white noise. The vectorized noisy block x is then used as the input of the neural network and the corresponding vectorized clean block as its target output; the parameters of the network are updated by back propagation, iteratively learning the required model. Let x denote the observed noisy picture and y the original clean, noise-free picture. The process of noise pollution can then be described as

\[ x = \eta(y), \qquad \eta : \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n} \]

where \(\eta\) is a process of randomly adding noise and m and n are the length and width of the digital image, respectively. The task of noise reduction is then to seek a function f satisfying

\[ f : \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}, \qquad f(x) \approx y. \]

Since a picture contains tens of thousands of pixels, it is difficult for a machine to learn f directly. In practice we therefore split a picture into several smaller picture blocks and learn the mapping g from picture block to picture block; every picture block is denoised by this one mapping g.
All of these denoised picture blocks are then re-aggregated in some way to form the denoised picture, as shown below:

\[ \hat{y} = A\left( g(x_1), g(x_2), \ldots, g(x_k) \right) \]

where \(x_1, \ldots, x_k\) are the blocks split from the noisy picture x and A denotes the aggregation operation. In this way we reduce the size of the problem and indirectly learn the image-to-image mapping f through the block-to-block mapping g, making f much easier for a machine to learn. The framework for denoising based on deep learning is as follows.

Image block splitting and aggregation
The most conservative splitting method, adopted in this paper, is to take out all the image blocks and then proceed to the subsequent steps. Usually a moving window is used: the window moves across the picture at a fixed step size and the image block inside the window is extracted each time. The step size is chosen according to the size of the selected noise image block. Based on the side length S of the noise image block in Figure 2 and the side length L of the model's denoised block, the number of estimates obtained for each target pixel can be computed. In this paper a 512 × 512 image is used, and the side length of the image block before and after denoising is taken as 8 with a step size of 2 [4].
In the re-aggregation process, a pixel of the noisy picture appears in multiple noise image blocks and is therefore denoised several times, so the question is how to obtain a final estimate from these multiple evaluations. This paper uses averaging to solve the problem. The averaging schemes usually chosen for re-aggregation are the direct arithmetic average, the weighted average and the Gaussian average. This paper computes the final estimate with a weighted average. The reason is that, although one pixel of the noisy picture is selected into many image blocks, its position differs from block to block, and according to Figure 2 the expected mean square error of pixels appearing in the middle portion of a noise image block is smaller. Therefore this paper adopts weighted averaging: first the mean square error at each position of the image block is computed, and then each estimate is weighted inversely proportional to the mean square error of its position in the block.
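The splitting and re-aggregation steps above can be sketched as follows. This is a minimal sketch: the function names and the optional per-position weight array are our assumptions; the paper derives its weights from the per-position mean square error, which is not reproduced here.

```python
import numpy as np

def split_patches(img, size=8, stride=2):
    """Slide a size-by-size window over img with the given stride."""
    patches, coords = [], []
    H, W = img.shape
    for i in range(0, H - size + 1, stride):
        for j in range(0, W - size + 1, stride):
            patches.append(img[i:i + size, j:j + size])
            coords.append((i, j))
    return patches, coords

def aggregate_patches(patches, coords, shape, weight=None):
    """Re-aggregate overlapping patches by a (weighted) average.

    weight is a size-by-size array; positions with smaller expected error
    (the middle of a patch) can be given larger weight, as in the paper.
    """
    acc = np.zeros(shape)
    norm = np.zeros(shape)
    size = patches[0].shape[0]
    if weight is None:
        weight = np.ones((size, size))       # plain arithmetic average
    for p, (i, j) in zip(patches, coords):
        acc[i:i + size, j:j + size] += weight * p
        norm[i:i + size, j:j + size] += weight
    return acc / np.maximum(norm, 1e-12)
```

With an identity denoiser (each patch left unchanged), aggregation exactly reconstructs the covered region, which is a convenient sanity check for the pipeline.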

Introduction to neural networks
Neurons are the basic building blocks of neural networks. An example of a neuron with four input parameters is shown in Figure 4. This neuron takes the four inputs x1, x2, x3, x4 and the intercept +1 as input values, and its output is

\[ h_{W,b}(x) = f\left( \sum_{i=1}^{4} w_i x_i + b \right) \]

where w and b are the parameters of the neuron, corresponding to the weight of each input, and \(f : \mathbb{R} \to \mathbb{R}\) is called the activation function. In this paper the Rectified Linear Unit (ReLU) is used as the activation function. A neural network combines multiple neurons: the inputs of some neurons are the outputs of others, forming a network as shown in Figure 5. A circle indicates an input of the neural network, and a circle labeled "+1" is called a bias node. The leftmost layer of the network is called the input layer, the rightmost layer the output layer and the layers in between the hidden layers. In Figure 5 there are three input units (not counting the bias units), two hidden layers and two output units. The back propagation algorithm is used to update the weight parameters of the neural network.
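The computation of a single neuron with ReLU activation can be sketched as below; the values of w and b are illustrative choices, not taken from the paper.

```python
import numpy as np

def relu(z):
    """Rectified linear unit: negatives become zero, positives pass through."""
    return np.maximum(0.0, z)

def neuron(x, w, b):
    """Single neuron: weighted sum of the inputs plus the bias, then ReLU."""
    return relu(np.dot(w, x) + b)

# Four inputs x1..x4, as in Figure 4; w and b are illustrative values.
x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([0.5, -0.25, 0.1, 0.2])
b = 1.0                                # the "+1" intercept times its weight
y = neuron(x, w, b)                    # relu(0.5 - 0.5 + 0.3 + 0.8 + 1.0)
```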

Activation function
The formula of the ReLU [5] function is

\[ f(x) = \max(0, x). \]

The ReLU function is piecewise linear: all negative values become zero and positive values are unchanged. This operation is called unilateral suppression. Unilateral suppression results in sparse activation of the neurons in the neural network; in particular, in a deep neural network model, when the model adds N layers, the activation rate of the ReLU neurons is reduced by a factor of about 2^N.

Parameter setting and training method

a) Loss function
The experiments in this paper are based on the Keras framework, whose backend relies on TensorFlow. The loss function is one of the two parameters required to compile a Keras model. In this paper the mean squared logarithmic error is used as the loss function [6].
\[ L = \frac{1}{n} \sum_{i=1}^{n} \left( \log(p_i + 1) - \log(a_i + 1) \right)^2 \]

where n is the number of observations in the data set, \(p_i\) is the predicted value and \(a_i\) is the true value.

b) Optimizer

The main advantage of Adam is that, after bias correction, the learning rate of each iteration stays within a certain range, which keeps the parameters relatively stable [7]. The formulas are as follows:

\[ m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2 \]

\[ \hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}, \qquad \theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon} \hat{m}_t \]

where \(m_t\) and \(v_t\) are the first-order and second-order moment estimates of the gradient \(g_t\), respectively, which can be considered approximations of its mean and uncentered variance. Following the advice of the authors of Adam, the default value of \(\beta_1\) is 0.9, the default value of \(\beta_2\) is 0.999 and the default value of \(\epsilon\) is \(10^{-8}\).

c) Learning rate

A large learning rate may make the neural network model learn faster, but may also cause the network to diverge. After experimentation, the learning rate is set to 0.1.

d) Training method

This paper trains the model with mini-batch gradient descent. Mini-batch processing lies between the batch algorithm and stochastic gradient descent: the algorithm selects a fixed number n of sample points from the training data set each time, computes the derivative of the total error of this mini-batch with respect to the weight matrix, and then performs a weight update; each pass over all sample points is called an epoch. Stochastic gradient descent can be regarded as mini-batch processing with n = 1, and the batch algorithm as mini-batch processing with n = N (N being the number of training samples). Compared with the batch algorithm, mini-batch processing has the advantages of fast weight updates, online learning, lower resource requirements and enough randomness to escape undesired local extrema. Compared with stochastic gradient descent, the error curve of mini-batch processing is more stable during training and less affected by noise in the data.
Considering that current hardware is heavily optimized for matrix operations, updating the weights once per mini-batch may take about as long as updating them once per sample point, so the mini-batch training method has also been widely used in recent years. Based on the above considerations, this paper adopts mini-batch processing with a batch size of 128.
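The loss function and optimizer update above can be sketched in plain NumPy to make the formulas concrete. This is a minimal sketch of the mathematics only, with our own function names; in the actual experiments these roles are played by the built-in Keras loss and optimizer.

```python
import numpy as np

def msle(pred, true):
    """Mean squared logarithmic error, the loss used to compile the model."""
    return np.mean((np.log1p(pred) - np.log1p(true)) ** 2)

def adam_step(theta, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with the default moment decay rates from the text."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Iterating `adam_step` on the gradient of a simple convex objective drives the parameter toward its minimizer, illustrating the bounded, bias-corrected step size described above.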

Evaluation index
The effect of image noise reduction can be evaluated with commonly used objective indicators. In this paper, the Peak Signal-to-Noise Ratio (PSNR) is chosen as the performance index for evaluating the algorithm [8].

\[ MSE = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( f(i, j) - g(i, j) \right)^2, \qquad PSNR = 10 \log_{10} \frac{255^2}{MSE} \]

where f(i, j) is the original image, g(i, j) is the denoised image, and M and N are the numbers of rows and columns, respectively. The larger the PSNR value, the better the filtering effect, that is, the smaller the distortion of the filtered image.
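The PSNR formula above can be computed directly; this is a minimal sketch with an assumed function name, for 8-bit images with peak value 255.

```python
import numpy as np

def psnr(f, g, peak=255.0):
    """Peak signal-to-noise ratio between original f and denoised g, in dB."""
    mse = np.mean((f.astype(float) - g.astype(float)) ** 2)
    if mse == 0:
        return float("inf")          # identical images: zero distortion
    return 10.0 * np.log10(peak ** 2 / mse)
```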

Experiment analysis
The sizes of the search area and the similarity window in the NLM filtering algorithm have a great influence on the denoising effect. Based on a review of the relevant literature and our own tests, the search area is set to 8×8 and the similarity window to 4×4. This paper uses 200 flower pictures as the data set, with image resolutions ranging from 1024 × 768 to 1600 × 1200. A flower picture set is chosen because neural-network-based image denoising learns the statistical characteristics of images: selecting a specific type of image set reduces the number of patterns the network model needs to learn, so a smaller network can achieve a better denoising effect for that type of image. To obtain a more versatile model, one would increase the type and size of the training data set and enlarge the network. We used 180 of the pictures as training data and 20 for testing.
For each picture in the training set we extract 2000 image blocks, so a total of 360000 image blocks are used as training samples. For the pictures in the test set we likewise extract 2000 image blocks each, for a total of 40000 image blocks in the verification set. The number of training epochs is set to 500; experience shows that this number of iterations allows a network of this simple structure to converge to a good local minimum. We choose sigma = 20, sigma = 40 and sigma = 60 and compare our model with the Gaussian filter, the median filter, the bilateral filter and the traditional non-local means filter. The PSNR values of the test images under the three noise levels are listed in Table 1, Table 2 and Table 3. Note: the data in bold font are the best value for each indicator.
The NLM filtering algorithm exhibits a good denoising effect at sigma = 20: on the test pictures, the proposed algorithm beats the NLM filter on only 4 of them. At sigma = 40 the proposed algorithm is superior to the NLM filter on 13 of the images, and at sigma = 60 only one image processed by the proposed algorithm is inferior to the NLM filter. It can be seen that the denoising advantage of the proposed algorithm grows with the noise intensity, which makes it a good choice for denoising in high-noise environments. For low-noise situations, a larger training set is needed to train the network model for better results; for example, the larger multi-layer perceptron network trained in [9] also performs well at low noise.

Conclusion
This paper analyzes the image denoising problem from the perspective of learning a function and introduces, at the macro level, the idea of denoising based on image blocks. It describes the framework for denoising with deep learning and discusses the parameter settings to be aware of when using this algorithm. A remaining weakness is insufficient image smoothing: although the peak signal-to-noise ratio of the denoised image is very high, there is still room for improvement in the processing of the background. This paper only verifies that the non-local means algorithm combined with deep learning is effective and denoises better than the traditional non-local means algorithm; however, the training data set used in this experiment is limited, and a better denoising effect would require more training data to train the network model.