An efficient fuzzy optimization algorithm based on convolutional neural network

This paper proposes a fuzzy optimization method based on the dense-sparse-dense (DSD) algorithm. It uses sparsity to tune network weights: by adding a fuzzy membership function, the optimization strategy enhances the feature information carried by larger weights and weakens the feature information carried by smaller weights. Through accurate pruning of network weights, the number of network parameters is effectively reduced. Experimental results show that the performance of this method is better than that of the existing method.


Introduction
Deep learning is a research field of machine learning that forms more abstract high-level feature representations by combining low-level features to discover the characteristics of data [1]. It evolved from multi-layer unsupervised pre-training initialization to convolutional neural networks (CNNs). LeNet [2] is the first popular CNN model and a major breakthrough in deep learning; it consists of two convolutional layers, two pooling layers, and one fully connected layer. In 2012, AlexNet [3], which successfully used ReLU as the activation function and used dropout to reduce overfitting, won the ImageNet challenge. The subsequent networks GoogLeNet [4] and VGG [5], both built on AlexNet, achieved better performance. They tried various methods, such as different convolution kernel sizes, deeper convolutional networks, and different processing of the image input. VGG is composed of 3×3 convolution kernels; it proves that increasing CNN depth can improve image classification accuracy. GoogLeNet, structured with Inception modules, does not greatly increase the amount of computation, yet its image classification accuracy on the ImageNet database is higher than that of AlexNet. ResNet [6], launched in 2015, solves the problem of vanishing gradients by mapping feature maps directly from low-level to high-level layers through shortcut connections. Saining Xie et al. [7] presented a highly modularized network architecture constructed by repeating a building block that aggregates a set of transformations with the same topology. Springenberg et al. [8] questioned the down-sampling layer in CNNs and designed a fully convolutional network. Current research explores two major directions: the depth of CNNs and the optimization of their structure. Spatial pyramid pooling [9] shows very good results.
Its core is to extract features in the convolutional layers after pooling the feature map at different sizes, then aggregate them into a feature vector. XNOR-Net [10] and BinaryNet have also made very interesting advances on smart devices using binarization methods. Other networks have also been proposed [11][12][13][14]. However, as the number of network layers increases, the number of parameters grows significantly, making training increasingly difficult. In 2017, Song Han et al. [15] proposed the dense-sparse-dense (DSD) algorithm, focusing on how to improve the accuracy of traditional models by selectively pruning network weights. Later, Ma et al. [16] proposed a sparse-to-dense algorithm that attains higher robustness and accuracy by introducing additional sparse depth samples. Naveed Akhtar et al. [17] proposed a dense collaborative representation with a sparse representation for classification.
Based on the above methods, we propose a Fuzzy-DSD algorithm. This method uses the degree of membership to measure the importance of network weights. Training is divided into three phases: dense, fuzzy-sparse, and dense. By adding the fuzzy phase, the method prunes network weights more accurately. Experiments show that the proposed method performs better than the existing method.

The DSD algorithm
The DSD algorithm consists of three steps (Figure 1). Dense: the network is trained to obtain the connection weights; the dots represent neurons and the connecting lines represent the connections between them. Sparse: the network is regularized by pruning the unimportant connections with small weights and retraining the network under the sparsity constraint; in this phase, the unimportant connections and weights are pruned, yielding a smaller network. Dense: the sparsity constraint is removed, the pruned parameters are re-initialized from zero, and the whole dense network is retrained; this phase increases the model capacity. Mask denotes the sparseness (Equation (1)). When the absolute value of a weight is less than the threshold λ, Mask is set to zero and the weight is removed; when Mask is 1, the weight is retained. This is expressed by Equation (2).
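The sparse phase described above can be sketched as follows. This is an illustrative NumPy implementation, not the authors' code; here the threshold λ is assumed to be chosen per layer from a target sparsity percentile, which is one common way to set it.

```python
import numpy as np

def dsd_sparse_phase(weights, sparsity=0.5):
    """Prune the smallest-magnitude weights, as in the DSD sparse phase.

    Mask is 1 where |w| exceeds the threshold lambda (weight kept),
    and 0 otherwise (weight pruned), following Equations (1)-(2).
    """
    # lambda chosen so that a `sparsity` fraction of weights fall below it
    lam = np.percentile(np.abs(weights), sparsity * 100)
    mask = (np.abs(weights) > lam).astype(weights.dtype)  # Eq. (1)
    return weights * mask, mask                           # Eq. (2)

w = np.array([0.05, -0.8, 0.3, -0.02, 1.2, 0.4])
w_sparse, mask = dsd_sparse_phase(w, sparsity=0.5)
# half of the weights (the smallest in magnitude) are zeroed out
```

In the final dense phase, the zeroed entries are re-initialized and the whole network is retrained without the mask.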

The proposed method
To address the inaccuracy of removing unimportant network weights, we add membership to DSD. Figure 2 shows our proposed method, which consists of three phases: dense, fuzzy-sparse, and dense. In the first phase, the network is trained to obtain the initial weights. In the second phase, the membership is used to judge the importance of the network weights. In the third phase, the weights in the network are retrained. The structure changes the capacity of the network by pruning and refilling network weights. After the weights are updated by Equation (3), they are multiplied by the membership η (Equation (4)). Using the membership to quantify the absolute value of the weights makes the pruning more accurate.
α is the learning rate and x is the input. The workflow of Fuzzy-DSD training is as follows.

While not converged do
    go to the Sparse Phase for iterative Fuzzy-DSD;
Final Dense Phase;
End
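One fuzzy-sparse iteration can be sketched as below. Since Equations (3)–(4) are not reproduced here, the membership function η is a hypothetical choice (weight magnitude normalized to [0, 1]); the paper's equations define the exact form. The gradient update and percentile threshold are likewise illustrative assumptions.

```python
import numpy as np

def fuzzy_sparse_step(weights, grads, lr=0.01, prune_frac=0.5):
    """One fuzzy-sparse update: gradient step, then scale by membership.

    The membership eta used here (normalized |w|) is a hypothetical
    stand-in for the paper's Equation (4); large weights keep a
    membership near 1, small weights are attenuated toward 0.
    """
    w = weights - lr * grads                # gradient update (cf. Eq. (3))
    eta = np.abs(w) / np.abs(w).max()       # fuzzy membership, assumed form
    w = w * eta                             # emphasize large weights (cf. Eq. (4))
    lam = np.percentile(np.abs(w), prune_frac * 100)
    return w * (np.abs(w) > lam)            # prune low-membership weights

w = np.array([1.0, 0.1, -0.5, 0.01])
g = np.zeros_like(w)                        # zero gradients for illustration
w_new = fuzzy_sparse_step(w, g, lr=0.01)
# small-membership weights (0.1 and 0.01) are pruned to zero
```

Compared with plain DSD, the weights here are first re-weighted by η before thresholding, so the cut separates important from unimportant weights more sharply.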

Experiments
In this section, experiments on MNIST and CIFAR-10 are conducted to demonstrate the effectiveness of the proposed method. We also compare the proposed method with DSD on different CNNs.

MNIST data set
MNIST is a classic handwritten digit data set with 60,000 training images and 10,000 test images, each a 28×28-pixel grayscale image. Table 1 shows that the accuracy of the network models improves as the number of iterations increases. For LeNet-5 and AlexNet, we add DSD and Fuzzy-DSD. At 40 iterations, the accuracy of LeNet-5 is 98.43%, LeNet-5 with DSD is 98.7%, and LeNet-5 with Fuzzy-DSD is 99.8%. The accuracy of AlexNet is 92.96%, AlexNet with DSD is 93.48%, and AlexNet with Fuzzy-DSD is 94.36%. The accuracy of Fuzzy-DSD is higher than that of the others.

CIFAR-10 data set
The CIFAR-10 data set has 60,000 color images of 32×32 pixels divided into 10 categories, with 50,000 for training and 10,000 for testing. For LeNet-5 and AlexNet, we also add DSD and Fuzzy-DSD. Table 2 shows that the networks with Fuzzy-DSD are better than the others. At 40 iterations, the accuracy of LeNet-5 is 49.7%, LeNet-5 with DSD is 51.29%, and LeNet-5 with Fuzzy-DSD is 54.03%. The accuracy of AlexNet is 76.27%, AlexNet with DSD is 76.89%, and AlexNet with Fuzzy-DSD is 82.59%. For AlexNet, the lower absolute accuracy may be related to its excessive number of training parameters: when the training samples and iterations are few, the network cannot be fully trained.

Conclusion
Based on the DSD algorithm, an optimization method is proposed that uses the membership of the weights. The Fuzzy-DSD algorithm not only preserves image features better, but also suppresses unimportant network weights. The experimental results show that the performance of the Fuzzy-DSD algorithm is better than that of the others.