Research on the tea bud recognition based on improved k-means algorithm

The identification and extraction of tea buds is a key technology for the development of automated tea-picking robots, and machine vision is an effective tool for tea bud recognition. In this paper, tea leaves in the tea garden during the picking period are taken as the research object, and experiments are carried out on tea image acquisition, image enhancement, image segmentation, edge detection, binarization and foreground extraction. After comparison, the HSI color model was selected. The S component was used to convert the tea image to grayscale, and the improved K-means algorithm was then used to identify and separate the tea buds. The experimental results show that the improved K-means algorithm segments the young leaves in tea images well. This study can provide a reference for tea bud recognition algorithms.


Introduction
China is a major tea-producing country, and the labor used for tea picking accounts for more than half of the total labor in the whole tea production process. Although a large amount of tea-picking equipment is available on the market, most of it is based on traditional mechanical mechanisms. Such equipment picks leaves without selectivity, cannot guarantee the integrity of the tea buds, and cannot meet the picking standards of famous teas. The picking of famous tea is also highly time-sensitive: it must be done within a specific period. As the share of the industrial economy in the gross national product grows, labor in the tea industry is in short supply, so improving the efficiency of picking famous tea during the picking period can bring large economic benefits. Research on intelligent automatic tea-picking robots is therefore particularly important. Applying such robots to tea picking can reduce the cost of tea production and the damage rate of picking, improve picking efficiency, and relieve the shortage of labor for picking famous and high-quality teas.
Tea bud recognition is the key technology for the development of intelligent picking robots. Unlike the targets of other fruit and vegetable picking robots, tea buds show little contrast with old leaves, and there is no clear dividing line between them. [1,2,3] In addition, tea buds are small and their appearance is strongly affected by natural light conditions, which makes their identification more difficult. In this paper, the original tea image is first processed by image enhancement (here, histogram equalization) to highlight image features such as edge information, contour information and contrast. The enhanced image is then converted from the RGB color model to a suitable color model in order to select a color component that is advantageous for tea bud recognition. Finally, the improved K-means clustering algorithm is used to segment the tea image. This work provides technical support for the research and development of intelligent picking robots.

Overall design of the experimental program
Robot picking principle
The working efficiency of a tea-picking robot depends mainly on the speed of the manipulator and the end picking actuator, which in turn is largely limited by the speed of tea bud recognition and positioning. The robot identifies and locates the tea buds with a video camera. The tea-picking robot is shown in Figure 1.

Experimental materials and equipment
The tea sample images were captured under natural light. The software environment was Microsoft Visual Studio Professional 2015 with OpenCV 3.3.0, and the image algorithms were programmed in C++. The computer had an Intel(R) Core(TM) i5-4200U CPU @ 1.60 GHz, 8 GB of memory and a 500 GB hard disk.
In order to distinguish and identify tea buds under complex lighting conditions and against a complex tea background, a suitable image recognition model must be established. The tea bud recognition model based on the improved K-means algorithm is shown in Figure 2.

Image enhancement technology
Because the intensity of natural light is unstable and its direction is uncertain, noise cannot be avoided when the original tea image is acquired. The difficulty of tea bud recognition is that the bud and the old leaf differ little in color and shape, so the original image must be enhanced to improve its quality and recognizability and to make it better suited to observation and further processing. Image enhancement strengthens image features such as edge information, contour information and contrast, making the useful information in the image more prominent. [4]
The original image of tea is shown in Figure 2. It can be seen that the original image is strongly affected by natural light. Because the illumination is uneven, the brightness of the tea buds varies. In addition, the background is complex and the intrinsic colors of tea buds and old leaves are similar. Together, these factors make the contrast in the tea image low, so it is difficult to identify and separate the tea buds.
For color (RGB) images, it is not advisable to equalize the three channels separately and then recombine them: histogram equalization is not a linear operation, so doing this distorts the colors. Instead, the image is usually converted from RGB to HSV, HSI, YUV or YCbCr, the brightness channel (V, I or Y, respectively) is equalized so that the hue is not affected, and the result is converted back to RGB space. In this paper, the YCbCr model, which was designed for digital images, is used, and the histogram of the Y channel is equalized. Figure 4 shows the tea image before (a) and after (b) histogram equalization: the brightness of the tea image is improved, and the contrast between the buds and the old leaves is also enhanced.
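The luminance-only equalization described above can be sketched as follows. This is a minimal illustration in pure Python, not the C++/OpenCV code used in the experiments; the function names (`rgb_to_ycbcr`, `ycbcr_to_rgb`, `equalize`, `equalize_luminance`) are hypothetical, and the full-range BT.601 conversion coefficients are an assumption, since the paper does not state which YCbCr variant was used.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 RGB -> YCbCr for 8-bit values (assumed variant)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 + (b - y) / 1.772
    cr = 128 + (r - y) / 1.402
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse of rgb_to_ycbcr, rounded and clamped to [0, 255]."""
    r = y + 1.402 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return tuple(min(255, max(0, round(v))) for v in (r, g, b))


def equalize(values, levels=256):
    """Histogram-equalize a list of integer gray levels in [0, levels - 1]."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    # Cumulative distribution of gray levels.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(values)
    if n == cdf_min:  # all pixels share one level: nothing to stretch
        return list(values)
    # Map each level so that the cumulative distribution becomes roughly linear.
    lut = [max(0, round((c - cdf_min) / (n - cdf_min) * (levels - 1))) for c in cdf]
    return [lut[v] for v in values]


def equalize_luminance(pixels):
    """Equalize only the Y channel of a list of (R, G, B) pixels."""
    ycc = [rgb_to_ycbcr(*p) for p in pixels]
    y_eq = equalize([min(255, max(0, round(y))) for y, _, _ in ycc])
    # Cb and Cr are left untouched, so the chroma is preserved.
    return [ycbcr_to_rgb(y, cb, cr) for y, (_, cb, cr) in zip(y_eq, ycc)]
```

Because only Y is remapped while Cb and Cr are carried through unchanged, the brightness range is stretched without shifting the chroma, which is the reason given above for not equalizing the R, G and B channels separately.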

HSI color model
The actual working environment of the tea-picking robot is complex and changeable. In order to capture images under varied lighting conditions and tea growing environments and ultimately identify the tea buds, a suitable color model must be selected. Choosing the right color model can reduce the complexity of the tea identification algorithm and improve its computational efficiency. With the continuous development of computer vision technology, researchers have proposed many color models, including the RGB, CMY, HSI and HSV models. In digital image processing, the RGB and HSI color models are the most widely used. [5] The HSI model is based on the color system proposed by A. H. Munsell in 1915. It reflects the way the human visual system perceives color, namely through three basic characteristics: hue, saturation and intensity. [6] The HSI color model describes a color with three parameters, H, S and I: H (hue) is determined by the dominant wavelength of the color, S (saturation) represents the depth, or purity, of the color, and I represents intensity (lightness). Because intensity is separated from the color information, choosing the HSI color model reduces the effect of changes in light intensity on color judgment. [7] Using the HSI color model therefore helps to make the tea bud recognition algorithm more robust.
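As an illustration of how the S component can be extracted for grayscaling, the sketch below uses the common textbook RGB-to-HSI conversion. The paper does not give its conversion formulas, so the formulas, the function names (`rgb_to_hsi`, `saturation_gray`) and the scaling of S to 8-bit gray levels are assumptions.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert 8-bit RGB to HSI: H in degrees [0, 360), S and I in [0, 1]."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    i = (r + g + b) / 3.0
    # Saturation: 1 minus the ratio of the smallest component to the intensity.
    s = 0.0 if i == 0 else 1.0 - min(r, g, b) / i
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        h = 0.0  # hue is undefined for grays; 0 is used as a placeholder
    else:
        theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
        h = theta if b <= g else 360.0 - theta
    return h, s, i


def saturation_gray(pixels):
    """Map each (R, G, B) pixel to an 8-bit gray level given by its S value."""
    return [round(255 * rgb_to_hsi(*p)[1]) for p in pixels]
```

In this formulation, multiplying R, G and B by a common factor changes I but leaves S unchanged, which matches the motivation stated above for working in HSI under varying light intensity.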

K-means clustering algorithm
The K-means algorithm is an unsupervised cluster analysis algorithm in which the value of K is chosen according to the specific situation. Without any other prior knowledge, K-means partitions the samples into K classes by iteratively applying the nearest-neighbor rule.
K-means is one of the most commonly used clustering techniques. As the iterations proceed and the centroids move, the clustering criterion function converges, at which point the K-means algorithm terminates and the classification is complete. K-means++ is an improvement on the K-means algorithm. The original K-means algorithm randomly selects K points in the data set as the initial cluster centers, whereas K-means++ selects the K cluster centers according to the following idea: suppose n initial cluster centers have already been selected (0 < n < K); when the (n+1)-th cluster center is selected, points farther from the current n cluster centers have a higher probability of being chosen. The first cluster center is selected at random. Intuitively, the farther the cluster centers are from one another, the better. Although this improvement is intuitive and simple, it is very effective. The implementation steps of the K-means++ algorithm are as follows [8]:
1) Randomly select a sample from the data set as the initial cluster center $C_1$.
2) For each sample $x$, calculate the shortest distance to the cluster centers selected so far (that is, the distance to the nearest cluster center), denoted by $D(x)$. Then calculate the probability of each sample being selected as the next cluster center, which in the standard K-means++ formulation is $D(x)^2 / \sum_{x'} D(x')^2$. Finally, select the next cluster center by roulette-wheel selection.
3) Repeat step 2 until a total of K cluster centers have been selected.
4) The remaining procedure is the same as in the classical K-means algorithm. Calculate the similarity distance $D(X_i, C_j(r))$, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, K$, between each data object $X_i$ in the sample space and each cluster center $C_j(r)$ at iteration $r$. $X_i$ is assigned to the cluster $W_j$ if the following formula is satisfied:

$$D(X_i, C_j(r)) = \min_{1 \le m \le K} D(X_i, C_m(r)) \qquad (1)$$

5) For each cluster $W_j$, calculate the new cluster center, giving K new cluster centers in total. The formula is as follows, where $n_j$ is the number of samples in $W_j$:

$$C_j(r+1) = \frac{1}{n_j} \sum_{X_i \in W_j} X_i \qquad (2)$$

The value of the clustering criterion function is calculated as follows:

$$E(r+1) = \sum_{j=1}^{K} \sum_{X_i \in W_j} \lVert X_i - C_j(r+1) \rVert^2 \qquad (3)$$

6) Determine whether the clustering is reasonable according to the following discriminant formula, where $\varepsilon$ is a small positive threshold:

$$\lvert E(r+1) - E(r) \rvert < \varepsilon \qquad (4)$$

If it is satisfied, the iteration terminates; otherwise, return to steps 4) and 5) and continue to iterate.
After many experiments, it was found that the tea buds could be identified and separated well when K = 3 was selected. As can be seen from Fig. 5, in contrast to the original image of the tea leaves, the black area in Fig. 5(b) is the tea bud. The outline of the tea buds is segmented fairly completely, which provides a basis for the subsequent positioning of the tea buds.
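Steps 1) to 6) can be sketched as follows for one-dimensional data such as S-component gray levels. This is a minimal illustration rather than the implementation used in the experiments: the function names (`kmeanspp_init`, `kmeans`), the squared-difference distance, the tolerance `tol`, the iteration cap `max_iter` and the fixed `seed` are all assumptions made for the example.

```python
import random


def kmeanspp_init(data, k, rng):
    """Steps 1-3: choose k initial centers with probability proportional to D(x)^2."""
    centers = [rng.choice(data)]  # step 1: first center chosen uniformly at random
    while len(centers) < k:
        # Step 2: squared distance from each sample to its nearest chosen center.
        d2 = [min((x - c) ** 2 for c in centers) for x in data]
        if sum(d2) == 0:  # fewer distinct values than k: fall back to uniform
            centers.append(rng.choice(data))
        else:
            # Roulette-wheel selection weighted by D(x)^2.
            centers.append(rng.choices(data, weights=d2)[0])
    return centers


def kmeans(data, k, tol=1e-6, max_iter=100, seed=0):
    """Cluster 1-D samples; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = kmeanspp_init(data, k, rng)
    labels = [0] * len(data)
    prev_error = float("inf")
    for _ in range(max_iter):
        # Step 4: assign each sample to the nearest center.
        labels = [min(range(k), key=lambda j: (x - centers[j]) ** 2) for x in data]
        # Step 5: move each center to the mean of its cluster.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
        # Criterion function: sum of squared distances to the assigned centers.
        error = sum((x - centers[lab]) ** 2 for x, lab in zip(data, labels))
        # Step 6: stop once the criterion changes by less than tol.
        if abs(prev_error - error) < tol:
            break
        prev_error = error
    return centers, labels
```

With K = 3, as chosen in the text, a call such as `kmeans(gray_levels, 3)` returns three cluster centers and one label per pixel; which label corresponds to the tea buds has to be decided separately, for example by comparing the cluster centers.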

Conclusion
1) Taking images of tea in the picking period as the research object, and according to the characteristics of tea buds, a tea bud identification method based on an improved K-means clustering algorithm was proposed.
2) In the HSI color model, the S component is used to bring out the contrast between the tea buds and the background. The squared Euclidean distance is used as the similarity distance between pixels, and the mean square error is used as the clustering criterion function to classify the colors. The clustering results are corrected by mathematical morphology operations to improve the segmentation accuracy.
3) With the method proposed in this paper, the buds can be identified and extracted well, which helps to ensure the integrity of the tea buds.