Research on Convolutional Neural Network Model for Sonar Image Segmentation

The speckle noise of sonar images seriously affects both human interpretation and automatic recognition. Precise segmentation of sonar images with speckle noise is an important and difficult problem in image processing. The fully convolutional network (FCN) has the advantage of accepting images of arbitrary size and preserving the spatial information of the original input image. In this paper, image features are learned autonomously by a convolutional neural network, and the original learning rule based on the mean square error loss function is improved. Taking the pixel as the processing unit, a segmentation method based on an FCN model with a relative loss function (FCN-RLF) is proposed for submarine small target sonar images, and pixel-level segmentation of sonar images is achieved. Experimental results show that the improved algorithm increases segmentation accuracy and better preserves the edges and details of sonar images. The proposed model is also better at rejecting sonar image speckle noise.


Introduction
Due to seabed reverberation and various environmental noises, sonar images contain severe speckle noise, which reduces segmentation accuracy. More seriously, it also affects human interpretation and automatic recognition of small target sonar images. In this paper, a segmentation algorithm for submarine small target sonar images is proposed to extract image features rapidly and segment targets precisely.
Because sonar images have low resolution and severe speckle noise, it is difficult to achieve accurate segmentation with ordinary image segmentation methods [1]. Many sonar image segmentation methods exist, such as clustering methods [2], edge detection methods [3], Markov random field methods [4] and methods that combine specific mathematical theories [5,6]. However, these methods have poor resistance to speckle noise and cannot achieve automatic segmentation. With the advent of convolutional neural networks (CNN), many researchers have applied CNNs to image recognition and segmentation. CNNs are not only improving whole-image classification [7,8], but also making progress on object detection [9,10] and on part and keypoint prediction [11,12]. In 1998, LeCun et al. [13] proposed the LeNet-5 network structure, the first application of a convolutional neural network to handwritten character recognition. In 2012, Krizhevsky et al. [14] designed AlexNet, which further deepened the network structure and improved the accuracy of image recognition. Building on these classical models, new convolutional neural network models have been proposed successively, such as VGG [15] from Oxford University, GoogLeNet from Google [16] and ResNet from Microsoft [17]. In [18], a CNN was applied to the automatic segmentation of medical ultrasound images. However, when a CNN is used for image segmentation, the original image must be divided into a large number of image blocks as network input, which results in low computational efficiency and prevents the network from fully exploiting contextual relationships in the image. To cope with this problem, the fully convolutional network was proposed for semantic image segmentation; it accepts input images of arbitrary size while preserving the spatial information of the original input image [19].
The studies above applied CNN and FCN to image segmentation and achieved satisfactory segmentation results, providing a useful reference for sonar image segmentation.
To address the low accuracy of sonar image segmentation, this paper applies FCN to the sonar image segmentation process. By improving the loss function used in FCN backpropagation, the improved network balances target segmentation and detail preservation better while improving segmentation accuracy.
A submarine small target sonar image contains three areas: the bright target area, the dark shadow area and the seabed reverberation area, see Fig. 1. The shape of the shadow region also carries target features and needs to be segmented separately. Sonar images often have severe speckle noise, poor contrast and low resolution. In particular, the target area of a small target sonar image contains few pixels and its edges are seriously degraded. Thus, an effective segmentation algorithm is necessary to improve segmentation accuracy.

Submarine small target sonar image data set
It is difficult to obtain a large amount of comprehensive sonar image data. Therefore, the sonar image data set for training and testing has to be built manually from a limited number of sonar images. Analysis of a large number of sonar images shows that the noise statistics follow the Rayleigh distribution. Thus, speckle noise with different intensities and spot sizes is added to the limited set of sonar images to expand the training data set. Rayleigh noise can be generated from average noise, see equation (1): The values of a and b represent noise intensity and spot size respectively. Based on the original 200 sonar images, three kinds of speckle noise with different intensities were added to obtain 1800 images. The establishment process is shown in Fig. 2 and Fig. 3.
Selected images from the resulting data set are shown in Fig. 4 and Fig. 5.
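The augmentation idea described in this section can be illustrated with a minimal sketch. It is not the paper's equation (1), which is not reproduced in this text. It assumes the standard inverse-transform method for drawing Rayleigh samples from uniform noise, and it assumes the noise is applied multiplicatively to pixel values in [0, 1]. The names `rayleigh_sample`, `add_speckle` and the scale parameter `sigma` are hypothetical, and the spot-size parameter is not modeled.

```python
import math
import random


def rayleigh_sample(sigma, rng):
    """Draw one Rayleigh-distributed value by inverse-transform sampling.

    If U is uniform on [0, 1), then sigma * sqrt(-2 * ln(1 - U)) follows a
    Rayleigh distribution with scale sigma.
    """
    u = rng.random()
    return sigma * math.sqrt(-2.0 * math.log(1.0 - u))


def add_speckle(image, sigma, rng):
    """Return a noisy copy of `image` (rows of floats in [0, 1]).

    Assumption: speckle is modeled as multiplicative Rayleigh noise and the
    result is clipped back to [0, 1].
    """
    return [
        [min(1.0, max(0.0, pixel * rayleigh_sample(sigma, rng))) for pixel in row]
        for row in image
    ]


# Usage: several noise intensities applied to one (tiny, made-up) image.
rng = random.Random(0)
clean = [[0.5, 0.5], [0.0, 1.0]]
augmented = [add_speckle(clean, sigma, rng) for sigma in (0.6, 0.8, 1.0)]
```

Each clean image yields one noisy copy per intensity, which is how a small set of originals can be expanded into a larger training set.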

Model building
The FCN structure is shown in Fig. 6. The convolution process uses a 16-layer VGGNet network. The improved network can accept input images of arbitrary size. The FCN deconvolution process mirrors the 16-layer VGGNet and restores the extracted feature maps to the size and position of the original image. Both learning and inference are performed on the whole image at a time by dense feedforward computation and backpropagation.

FCN parameter setting and feature extraction
The FCN used in this paper has 8 convolution sections. Each of the first 5 sections is followed by a max pooling layer that reduces the image size; the remaining 3 sections continue to use convolution for feature extraction. The convolution kernel size, stride and number of kernels for each section are listed in Table 1.
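How kernel size and stride determine feature-map size can be sketched with the usual output-size formula. The settings below (3 × 3 kernels, stride 1, no padding, 2 × 2 pooling with stride 2) are assumptions, since Table 1 is not reproduced in this text; they were chosen because they are consistent with the 224, 222, 220 and 110 sizes quoted for the 224 * 224 example in this section. The function names are hypothetical.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Spatial size after a convolution layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2, stride=2):
    """Spatial size after a max pooling layer (floor division)."""
    return (size - window) // stride + 1


input_size = 224
after_conv1 = conv_out(input_size)   # first convolution layer
after_conv2 = conv_out(after_conv1)  # second convolution layer
after_pool1 = pool_out(after_conv2)  # first pooling layer

print(after_conv1, after_conv2, after_pool1)
```

Sizes deeper in the network also depend on how odd sizes are rounded during pooling, which this sketch does not attempt to settle.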
Take the 224 * 224 submarine small target sonar image shown in Fig. 1 as an example. After the first convolution layer, 64 feature maps of size 222 * 222 are output; these serve as the input to the second convolution layer, see Fig. 7. After the second convolution layer, 128 feature maps of size 220 * 220 are output. After the first pooling layer, 128 feature maps of size 110 * 110 are generated and the image dimension is reduced, see Fig. 8. The 10th convolution layer outputs 512 feature maps of size 18 * 18, see Fig. 9. After the 16th convolution layer, the deconvolution sampling operation is performed to obtain 1000 original size images. Through the pixel-by-pixel operation, the maximum number of features in the deconvolution feature map is obtained. The class of each pixel is then predicted, and the image segmentation result is output.

Sonar image segmentation of FCN model

Loss function and its improvement
The convolutional neural network is trained by backpropagation, which applies the chain rule of derivation [20]. The weights are updated by minimizing the loss function to obtain the optimal weight parameters of the network. In the original learning rule, the optimal weight parameters are obtained from the mean square error loss function shown in equation (2).
where E(i) is the training error of a single sample, d(i) is the expected output, and y(i) is the predicted output of the network.
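For reference, one common per-sample form of the mean square error that matches the symbols defined above can be written as follows. This is a sketch, not the paper's equation (2), which is not reproduced in this text: the factor 1/2 and the sum over output pixels k are assumptions.

```latex
E(i) = \frac{1}{2} \sum_{k} \bigl( d_k(i) - y_k(i) \bigr)^2 ,
\qquad
\frac{\partial E(i)}{\partial y_k(i)} = -\bigl( d_k(i) - y_k(i) \bigr)
```

Under this form, the gradient passed back from each output pixel is proportional to its residual, so it shrinks as the prediction approaches the expected output.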
In convolutional neural network learning, the backpropagation algorithm based on the mean square error (MSE) loss function suffers from small gradient changes and slow weight updates, and it does not consider the pixels of the non-target area, which results in larger pixel segmentation errors. To solve this problem, this paper presents a relative loss function algorithm in which both target region and non-target region pixels are taken into account. Because there is no obvious difference between the reverberation area and the target area, the algorithm calculates the probability that each pixel belongs to the target area, which speeds up the convergence of the network and improves the segmentation accuracy. The relative loss function is as follows: where Y_i represents the set of target area pixels of sample i;

Relative loss function algorithm flow
The convolution kernel weights W and biases b are the main parameters trained in the FCN learning process. They are updated iteratively according to the following steps:

1) Forward propagation process
Hidden layer input:

2) Establishment of the loss function for a single sample i

3) Reverse derivation process:

Figure 10. Image segmentation process.

5) Cycle Iteration:
Repeat steps 1) to 4) until the loss function E(W,b) falls within the error tolerance; the optimal weight parameters are then obtained. The FCN segmentation process is shown in Fig. 10.
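The iteration scheme above (forward pass, loss, backward derivation, repeat until the loss is within tolerance) can be sketched on a toy problem. This is not the FCN or the relative loss function: it fits a single weight w and bias b with a squared-error loss, and it assumes that step 4) is a plain gradient-descent update, which this text does not state. The names `train`, `lr`, `tol` and `max_iters` are hypothetical.

```python
def train(samples, lr=0.5, tol=1e-6, max_iters=10000):
    """Fit y = w * x + b by gradient descent, stopping when loss <= tol."""
    w, b = 0.0, 0.0
    n = len(samples)
    loss = float("inf")
    iters = 0
    for iters in range(max_iters):
        # 1) Forward propagation: predicted output for every sample.
        preds = [w * x + b for x, _ in samples]
        # 2) Loss: mean of the per-sample squared errors (with factor 1/2).
        loss = sum(0.5 * (d - y) ** 2 for (_, d), y in zip(samples, preds)) / n
        # 5) Stop once the loss is within the error tolerance.
        if loss <= tol:
            break
        # 3) Reverse derivation: gradients of the loss w.r.t. w and b.
        grad_w = sum(-(d - y) * x for (x, d), y in zip(samples, preds)) / n
        grad_b = sum(-(d - y) for (_, d), y in zip(samples, preds)) / n
        # 4) Parameter update (assumed here to be plain gradient descent).
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, loss, iters


# Usage: made-up samples (x, expected output d) generated from d = 2x + 1.
data = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0), (1.5, 4.0)]
w, b, final_loss, iterations = train(data)
```

In a real network the same loop runs over convolution kernels and biases, with the gradients supplied by backpropagation through all layers.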

Results and analysis
The submarine small target sonar image data set used in the experiments contains 1800 training images and 25 test images. Two groups of comparative experiments are carried out: the first compares the FCN-MSE and FCN-RLF methods; the second adds two classic image segmentation methods, Otsu and FCM, for comparison.

FCN network training results
During FCN training, the image misclassification rate decreases as the number of training iterations increases and eventually stabilizes.
The training results of the two methods are shown in Fig. 11 and Fig. 12. In Fig. 11, the lowest training and testing error rates are 0.140 and 0.160 respectively. With the improved algorithm, the lowest training and testing error rates are 0.086 and 0.122 respectively, which are lower than those of the FCN-MSE method.

The first group experiment
Five of the 25 test images are chosen to analyze the segmentation results. The sonar image segmentation results for the first group experiment are shown in Fig. 13. The segmentation results show that the proposed method segments the target outline and preserves the target details much better. Although the proposed method also misclassifies parts of the reverberation area because of speckle noise, the improvement is obvious. Tables 2-5 give the evaluation index values of the twenty-five images, and Table 3 gives the mean values of the experimental results.

The second group experiment
In the second experiment, the Otsu and FCM methods are added for comparison. The segmentation results are shown in Fig. 17. The precision and recall values are shown in Tables 6 and 7. The recall of all the segmentation methods is generally high, which indicates that the methods used are all suitable for sonar image segmentation. However, the Otsu and FCM segmentation methods are seriously affected by speckle noise. The Otsu segmentation precision reaches up to 83.65%, but this is still lower than that of the FCN-MSE and FCN-RLF methods, which use a convolutional neural network; the precision of these two methods reaches 99.11% and 99.7% respectively. The improved method is strongly robust to sonar image speckle noise and effectively improves the accuracy of sonar image segmentation.
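The precision and recall figures discussed above are pixel-level measures. A minimal sketch of how they can be computed for one binary segmentation mask is given below; it assumes the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN)) with the target as the positive class, which may differ in detail from the evaluation indices used in the tables. The function name and the example masks are hypothetical.

```python
def precision_recall(predicted, truth):
    """Pixel-level precision and recall for binary masks (lists of rows).

    A pixel value of 1 marks the target class; 0 marks everything else.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(predicted, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # target pixel correctly labeled as target
            elif p and not t:
                fp += 1      # background pixel wrongly labeled as target
            elif t and not p:
                fn += 1      # target pixel that was missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Usage with two made-up 2 x 3 masks.
predicted_mask = [[1, 1, 0], [0, 1, 0]]
truth_mask = [[1, 0, 0], [0, 1, 1]]
precision, recall = precision_recall(predicted_mask, truth_mask)
```

Speckle pixels misclassified as target raise the false positive count, which lowers precision while leaving recall largely unchanged; this matches the pattern reported for the threshold-based and clustering methods.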

Conclusions
A submarine sonar image segmentation method based on an FCN model with a relative loss function is proposed in this paper. The experimental results show that the improved method is more robust to speckle noise than traditional sonar image segmentation methods. The network training error is reduced by 31.14% compared with the FCN-MSE method. The edges and details of the target image are better preserved. The average segmentation accuracy and precision reach 91.33% and 96.4% respectively, which meets the accuracy requirements of sonar image processing. The proposed method plays an important role in the special task of marine target recognition.
Since the network used in this paper is a deep network, the introduction of deconvolution makes the network structure more complicated. Therefore, improving the segmentation accuracy while simplifying the network structure will be the major research direction in the future.