Identification of Blueberry Beverage Using Vis/NIR Spectroscopy

A total of 140 samples of four varieties of blueberry beverage were collected and analyzed by visible/near-infrared (Vis/NIR) spectroscopy. The sample spectra were pretreated with Savitzky-Golay smoothing and multiplicative scatter correction (MSC), and the four varieties were then clustered by principal component analysis (PCA). A three-dimensional score plot of the first three principal components (PC1, PC2 and PC3) of all samples showed a clear separation of the blueberry beverages. Analysis of the loading plots of the first three principal components indicated that the characteristic bands related to blueberry beverage variety were 420-430 nm, 490-500 nm, 570-580 nm and 1350-1365 nm. Because the cumulative contribution rate of the first six principal components reached 99.20%, these six components were chosen as the inputs of a multilayer perceptron (MLP) neural network. Of the 140 samples, 100 were selected as the training set and the remaining 40 were used as the prediction set. The MLP neural network was trained on the training set and then applied to the prediction set, where the prediction accuracy was 100%. The results show that combining principal component analysis with a multilayer perceptron neural network is a feasible way to identify varieties of blueberry beverage.


Introduction
Blueberry, also known as bilberry, is called the King of Berries because of its special texture, sweet-tart taste and high vitamin C content [1][2]. Blueberries were first cultivated in North America, less than a hundred years ago [3]. In China, blueberries are grown mainly in the Greater Khingan Mountains and Lesser Khingan Mountains in the northeast and on the Shandong Peninsula, with a small amount of cultivation in Zhejiang, Hubei, the Sichuan Basin and other areas. Rich in vitamins, superoxide dismutase (SOD), arbutin and flavonoid compounds, blueberries are an antioxidant food. They not only contribute to erythropsin synthesis, vision improvement, immune enhancement and cardiac strengthening [4], but are also helpful in medical efforts against aging, ulceration, inflammation and cardiovascular diseases [5]. Wild blueberries contain beneficial substances such as folic acid and ursolic acid, which are good for treating hypertension, and anthocyanin, which can prevent and treat inflammation [6][7]. Blueberries are also a natural anti-cancer fruit because they contain many active constituents that inhibit the growth of cancer cells and even accelerate their apoptosis [8]. For healthy adults, drinking a moderate amount of blueberry beverage every day could not only slow memory decline but also improve the body's ability to resist oxidative stress while reducing damage to lymphocyte DNA [9].
Blueberry beverage, made mainly from blueberries, is very popular. However, many kinds of blueberry beverage are on the market, and the methods used to identify their varieties are complex and diverse [10]. To better develop blueberry beverages, it is necessary to find a fast and effective method among these complex and diverse identification approaches. Visible/near-infrared spectroscopy is a spectroscopic method based on the differing absorption of electromagnetic waves by different substances; it is typically applied in chemistry, medicine, agriculture and the inspection of agricultural products [11][12]. Applied to the identification of blueberry beverages, NIR spectroscopy offers advantages such as speed, high efficiency, non-destructive measurement, high stability and accuracy [13][14].
Principal component analysis (PCA) can effectively find the most important elements and structure in a large body of data. It can also remove noise and redundancy and reduce the dimensionality of the original complex data, so that the simple structure inside the complex data can be extracted. That is, many original variables can be replaced with a few variables without affecting the main spectral information. As one type of neural network [15][16], the multilayer perceptron (MLP), which is widely used in pattern recognition and has good distribution ability, can solve complex classification problems of pattern distribution [17][18].
In this study, visible/near-infrared spectroscopy (NIRS) and principal component analysis (PCA) combined with a multilayer perceptron (MLP) are used, as a new approach to blueberry beverage identification, to analyze different kinds of blueberry beverage.

Sample Source and Spectral Scan
A total of 35 blueberry beverages of four varieties (Genhe blueberry juice, Ye Laoda, Ye Shanpo and Wild blueberry) were collected at random. Samples were collected before the blueberry beverages were shaken well, and 140 samples were prepared. Spectra were scanned in transmission mode with an optical path of 2 mm. Each sample was scanned 30 times, three spectral curves were retained, and the average of these three curves was used as the final transmission spectrum.

Preprocessing of spectral data
In order to reduce or eliminate the effects of noise, baseline drift and sample nonuniformity during spectral scanning, the original spectrum needs to be mathematically transformed to improve the accuracy and stability of the estimation model.

Smoothing method
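Savitzky-Golay convolution smoothing is an improvement on moving-average smoothing. Instead of averaging the points in a window with equal weights, it multiplies each measured value by a smoothing factor obtained from a least-squares polynomial fit, which reduces the effect of smoothing on useful information. The concrete formula is:

$$x_{k,\mathrm{smooth}} = \frac{1}{H} \sum_{i=-w}^{+w} x_{k+i}\, h_i \tag{1}$$

In formula (1), $x_{k+i}$ is the measured value at point $k+i$ of a window of $2w+1$ points centred on point $k$, $h_i$ represents the smoothing factor, and $H$ represents the normalization factor.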
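The smoothing step can be sketched in a few lines of NumPy. This is a minimal illustration, not the routine used by the analysis software: the function names, the quadratic polynomial order, the edge padding and the synthetic spectrum are all assumptions made for the example. Only the 9-point window comes from the text. The returned weights correspond to the ratios of the smoothing factors to the normalization factor (h_i / H).

```python
import numpy as np


def savgol_coefficients(window_length: int, polyorder: int) -> np.ndarray:
    """Least-squares weights h_i / H for the centre point of the window."""
    if window_length % 2 == 0 or polyorder >= window_length:
        raise ValueError("window_length must be odd and larger than polyorder")
    half = window_length // 2
    offsets = np.arange(-half, half + 1)
    # Design matrix of the polynomial fit: column j holds offsets**j.
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row 0 of the pseudo-inverse maps the window values to the fitted
    # polynomial evaluated at offset 0, i.e. the smoothed centre value.
    return np.linalg.pinv(design)[0]


def savgol_smooth(spectrum, window_length: int = 9, polyorder: int = 2) -> np.ndarray:
    """Smooth one spectrum; polyorder=2 and edge padding are assumptions."""
    spectrum = np.asarray(spectrum, dtype=float)
    weights = savgol_coefficients(window_length, polyorder)
    half = window_length // 2
    # Repeat the first and last values so the output keeps the input length.
    padded = np.pad(spectrum, half, mode="edge")
    # Reversing the weights turns np.convolve into a sliding weighted sum.
    return np.convolve(padded, weights[::-1], mode="valid")


# Hypothetical noisy absorbance curve on a 2 nm grid (not data from the study).
wavelengths = np.arange(400, 1801, 2)
rng = np.random.default_rng(0)
noisy = np.exp(-((wavelengths - 550) / 60.0) ** 2) + rng.normal(0, 0.02, wavelengths.size)
smoothed = savgol_smooth(noisy)
print(smoothed.shape)
```

For a 9-point window and a quadratic fit, the weights reproduce the classical tabulated values h = (-21, 14, 39, 54, 59, 54, 39, 14, -21) with H = 231.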

Multiplicative scatter correction
In multiplicative scatter correction (MSC), the average spectrum of all samples is used in place of the unknown "ideal" spectrum. Each spectrum is fitted to the average spectrum by the least squares method, on the assumption that the two are approximately linearly related, and the fitted intercept and slope are then used to correct that spectrum.
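A minimal NumPy sketch of this correction follows. The function name `msc`, the optional `reference` argument and the toy spectra are assumptions for illustration. Each spectrum is regressed on the average spectrum, then the fitted intercept is subtracted and the result is divided by the fitted slope.

```python
import numpy as np


def msc(spectra, reference=None) -> np.ndarray:
    """Multiplicative scatter correction of a (samples x wavelengths) array."""
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        # The average spectrum stands in for the unknown "ideal" spectrum.
        reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra)
    for i, spectrum in enumerate(spectra):
        # Least-squares fit: spectrum ~ slope * reference + intercept.
        slope, intercept = np.polyfit(reference, spectrum, deg=1)
        corrected[i] = (spectrum - intercept) / slope
    return corrected


# Toy example: one underlying curve with different scatter offsets and gains.
base = np.linspace(0.2, 1.0, 50) ** 2
spectra = np.vstack([1.0 * base, 1.5 * base + 0.10, 0.7 * base - 0.05])
corrected = msc(spectra)
print(np.ptp(corrected, axis=0).max())  # spread between corrected spectra
```

In this toy case the three curves differ only by an offset and a gain, so the correction collapses them onto the average spectrum. Real spectra also carry chemical differences, which remain after correction.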

Preprocessed spectra
The sample spectra were preprocessed in The Unscrambler 9.7 software with Savitzky-Golay smoothing using 9 smoothing points, followed by multiplicative scatter correction (MSC); this combination gave the best processing effect. To remove the noise at the beginning and end of the spectral curves, only the 400 ~ 1800 nm band was used [19][20][21]. The spectral curves of the four blueberry beverages obtained after pretreatment are shown in Fig. 1, in which the spectrum of one arbitrarily selected sample is plotted for each blueberry beverage, with the wavelength on the abscissa and the absorbance of each sample on the ordinate.

Principal component analysis
Principal component analysis was used to cluster all 140 samples of the four blueberry beverages (Genhe blueberry juice, Ye Laoda, Ye Shanpo and Wild blueberry). The result is shown in Fig. 2, in which the X, Y and Z axes represent the scores of the first principal component (PC1), the second principal component (PC2) and the third principal component (PC3), respectively. The four kinds of blueberry beverage in Fig. 2 fall into four categories, indicating that PC1, PC2 and PC3 cluster the four beverages very well and can qualitatively characterize them. However, the samples at the edges of the Genhe blueberry juice, Ye Laoda and Ye Shanpo clusters are not clearly separated. To improve the prediction accuracy, principal component analysis was therefore combined with a multilayer perceptron neural network to establish a detection and analysis model for the four kinds of blueberry beverage.
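The score computation behind such a plot can be sketched with a singular value decomposition. This is a generic NumPy illustration under stated assumptions, not the procedure of the software used in the study: the function name `pca`, the mean-centering choice and the random stand-in data are all hypothetical.

```python
import numpy as np


def pca(spectra, n_components: int):
    """Return scores, loadings and explained-variance ratios of the leading PCs."""
    X = np.asarray(spectra, dtype=float)
    centered = X - X.mean(axis=0)          # centre each wavelength variable
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # sample coordinates
    loadings = Vt[:n_components]                       # one row per component
    ratio = S ** 2 / np.sum(S ** 2)                    # contribution rates
    return scores, loadings, ratio[:n_components]


# Random stand-in for 140 preprocessed spectra with 50 wavelength variables.
rng = np.random.default_rng(0)
spectra = rng.normal(size=(140, 50))
scores, loadings, ratio = pca(spectra, n_components=3)

# Columns 0, 1 and 2 of `scores` would be the X, Y and Z axes of a 3-D score plot.
print(scores.shape, np.cumsum(ratio))
```

The cumulative sum of `ratio` is the cumulative contribution rate, which is one way to decide how many components to keep.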

Characteristic bands selected
Fig. 3 shows the loading values of principal components 1, 2 and 3 over the entire wavelength range, with the wavelength (400 ~ 1800 nm) on the abscissa and the loading of each principal component at each wavelength variable on the ordinate. The loadings reflect the correlation between each principal component and the absorbance of the blueberry beverage at each wavelength. Fig. 3 shows that the ranges 420 ~ 430 nm, 490 ~ 500 nm, 570 ~ 580 nm and 1350 ~ 1365 nm have the maximum correlation with the principal components. The main component 4-20 has maximum correlation with the absorbance of the blueberry beverages in these four bands. Therefore, these four wavelength bands were selected as the characteristic bands of the near-infrared spectrum of the blueberry beverage over the entire wavelength range.
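One simple way to read bands off a loading curve is to keep the wavelengths whose absolute loading is close to the maximum and merge neighbouring ones into ranges. The sketch below illustrates that idea only. The text does not state the selection rule used, so the function name `high_loading_bands`, the 0.8 threshold, the 5 nm grid and the toy loading vector are all assumptions.

```python
import numpy as np


def high_loading_bands(wavelengths, loading, fraction: float = 0.8):
    """Group wavelengths with |loading| >= fraction * max into (start, end) bands."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    magnitude = np.abs(np.asarray(loading, dtype=float))
    mask = magnitude >= fraction * magnitude.max()
    bands, start = [], None
    for i, selected in enumerate(mask):
        if selected and start is None:
            start = i                                   # a band opens here
        elif not selected and start is not None:
            bands.append((float(wavelengths[start]), float(wavelengths[i - 1])))
            start = None                                # the band has closed
    if start is not None:                               # band runs to the last point
        bands.append((float(wavelengths[start]), float(wavelengths[-1])))
    return bands


# Toy loading vector on a hypothetical 5 nm grid, with two strong regions.
wavelengths = np.arange(400, 1801, 5)
loading = np.zeros(wavelengths.size)
loading[(wavelengths >= 420) & (wavelengths <= 430)] = 1.0
loading[(wavelengths >= 1350) & (wavelengths <= 1365)] = -0.9
bands = high_loading_bands(wavelengths, loading)
print(bands)
```

The absolute value is used because a strongly negative loading indicates as close a relationship with the component as a strongly positive one.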

Principal component analysis results
Principal component analysis showed that the cumulative contribution rate of the first six principal components reached 99.204%. As shown in Table 1, the spectral data of the 140 samples were therefore replaced by the first six principal components, and these were used as the inputs of the multilayer perceptron (MLP) neural network. A three-layer network was chosen as the best training structure: an input layer of 6 nodes, a hidden layer of 5 nodes and an output layer of 4 nodes, with the hyperbolic tangent as the activation function of the hidden layer and softmax as the activation function of the output layer. A total of 100 blueberry beverage samples were used as the training set, and the remaining 40 samples were used as the test set. As shown in Table 2 ("1" is Genhe blueberry juice, "2" is Ye Laoda, "3" is Ye Shanpo, "4" is Wild blueberry), the classification accuracy for the four kinds of blueberry beverage reached 100%.
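The 6-5-4 network described above can be sketched directly in NumPy. The text does not name the training algorithm or software, so full-batch gradient descent on the cross-entropy loss, the learning rate, the epoch count and the weight initialization are assumptions. The inputs are synthetic stand-ins for six principal component scores, with 35 samples per class assumed, and classes are labeled 0-3 here rather than "1"-"4".

```python
import numpy as np


def init_mlp(n_in=6, n_hidden=5, n_out=4, seed=0):
    """Small random weights for a 6-5-4 network (initialization is an assumption)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)), "b2": np.zeros(n_out),
    }


def forward(params, X):
    """Hyperbolic-tangent hidden layer followed by a softmax output layer."""
    hidden = np.tanh(X @ params["W1"] + params["b1"])
    logits = hidden @ params["W2"] + params["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    exp = np.exp(logits)
    return hidden, exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(probs, labels):
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))


def train(params, X, labels, lr=0.1, epochs=500):
    """Full-batch gradient descent (assumed optimizer, not taken from the text)."""
    onehot = np.eye(params["b2"].size)[labels]
    for _ in range(epochs):
        hidden, probs = forward(params, X)
        d_logits = (probs - onehot) / len(X)               # softmax + cross-entropy
        d_hidden = (d_logits @ params["W2"].T) * (1.0 - hidden ** 2)  # tanh'
        params["W2"] -= lr * hidden.T @ d_logits
        params["b2"] -= lr * d_logits.sum(axis=0)
        params["W1"] -= lr * X.T @ d_hidden
        params["b1"] -= lr * d_hidden.sum(axis=0)
    return params


# Synthetic stand-in for six PC scores of 140 samples in four classes.
rng = np.random.default_rng(1)
labels = np.repeat(np.arange(4), 35)
X = 3.0 * np.eye(4, 6)[labels] + rng.normal(0.0, 0.3, (140, 6))
order = rng.permutation(140)
train_idx, test_idx = order[:100], order[100:]            # 100 training, 40 test

params = init_mlp()
loss_before = cross_entropy(forward(params, X[train_idx])[1], labels[train_idx])
params = train(params, X[train_idx], labels[train_idx])
loss_after = cross_entropy(forward(params, X[train_idx])[1], labels[train_idx])

probs = forward(params, X[test_idx])[1]
accuracy = np.mean(probs.argmax(axis=1) == labels[test_idx])
print(loss_before, loss_after, accuracy)
```

Feeding six scores instead of the full spectrum keeps the network to 59 trainable parameters (6×5 + 5 + 5×4 + 4), which is consistent with the reduced computation noted in the conclusion.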

Conclusion
A model for blueberry beverage brand identification was established by combining principal component analysis with a multilayer perceptron (MLP) neural network, and the accuracy of its classification prediction was 100%. This shows that near-infrared spectroscopy can identify blueberry beverage varieties quickly and accurately. In this method, the principal components are used as the inputs of the multilayer perceptron (MLP) neural network, which reduces the computation of the neural network, accelerates training and eliminates spectral signal interference, greatly improving the prediction accuracy. In addition, the characteristic bands of the blueberry beverage brands are 420 ~ 430 nm, 490 ~ 500 nm, 570 ~ 580 nm and 1350 ~ 1365 nm, which are closely related to the brand of blueberry beverage. Therefore, it is feasible to use principal component analysis and a multilayer perceptron neural network to identify the varieties of blueberry beverages. The method also offers good prospects for the future development of blueberry beverage testing equipment.