Performance of an automatic inspection system for classification of Fusarium Moniliforme damaged corn seeds by image analysis

This paper presents algorithms for pre-processing, feature selection and classifier design, which are used for parameter refining in the development of an automatic system for recognition and grading of corn seeds with external signs of Fusarium Moniliforme disease. The abilities of several feature selection methods (FDR, scatter matrices and Stepwise Discriminant Analysis) and two classification methods (Support Vector Machine (SVM) and K-Nearest Neighbours (K-NN)) are investigated. The design and implementation of the system are also presented. The system continuously presents corn kernels, positioned one by one, to a CCD camera, classifies the captured images and discharges the seeds into the assigned containers. The software was developed in the LabVIEW environment, with the image analysis and classification procedures performed using MATLAB Script. Preliminary classification of 8480 seeds from 16 Bulgarian varieties gave a total error rate of 7.2%-8.4%, and experiments with the system during control measurements of a seed sample gave a total error rate of 6.95%-20.4%.


Introduction
In the field of agricultural and food technology, quality assessment of production is a complex task. The difficulties are linked to the diversity of products and their non-homogeneous nature. Each product subject to quality control has multiple specific features that must be considered when drawing up the quality assessment criteria and when performing the actual grading [6,8,20].
Cereals account for the main share of agricultural production; maize ranks third globally in sown area and first in grain production among all cereals. Corn is a major crop in the economy of the Republic of Bulgaria, ranking second after wheat in planted area and production level. Even nowadays, seed preparation of hybrid maize in Bulgaria is carried out by drying and calibration. In practice, seed grading is done manually in the so-called seed control inspection stations, in accordance with the regulations of law. Corn diseases are an important factor with a negative influence on yield, and the incidence rate is one of the main quality indicators. The most damaging diseases are Ustilago zea Unger, Sorosporium zeilianum, Fusarium sp., Helminthosporium turcicum, Puccinia sorghi and Nigrospora oryzae. The causal agents are numerous different fungi of the Fusarium species. Fungal contamination of maize not only renders grains unfit for human consumption through discoloration and reduced nutritional value, but can also lead to mycotoxin production. Mycotoxins [13,18,19] are poisonous secondary metabolites produced by some fungi in staple foods and foodstuffs. The Bulgarian commission estimated the distribution levels of disease isolates by species in corn seeds; the main and most common are F. moniliforme and F. graminearum.
Analyses of the state of the art in the subject area show that classical methods for microbiological analysis and quality control of corn seeds are precise, reliable and comparable, but they require special laboratory conditions, chemical reagents, qualified experts and the like [19,25,26,27]. Their duration, their labour intensity and the destruction of the analysed samples make them inefficient for on-line control. Organoleptic methods are subjective and depend on the qualifications and experience of the assessing expert. There is a need to develop a model of an automated system that integrates a comprehensive, non-invasive and objective approach to recognizing the disease, based on a study of Bulgarian maize varieties. The influence of varietal identity, year of harvest and the side of the seed captured in the images also needs to be studied. This will improve and simplify the classification process and facilitate the future development of an automated system.
A brief review of some of the main techniques proposed for detecting disease on kernels [16,23] is presented next. Machine vision-based systems are well known [1] and preferred for identification and automation of grain handling, but the variety of diseases and damage on kernels calls for more research. Several authors have applied digital image analysis and pattern recognition for objective identification of cereal grains [3,5,7,10,11,12]. Various aspects of applying machine vision techniques to disease detection have been reported in the field of grain inspection. Detection of diseased kernels by image analysis has been discussed relatively often in recent years, but most studies concern wheat kernels. For example, Singh et al. [14] used short-wave near-infrared hyperspectral and digital colour imaging to detect healthy and midge-damaged wheat kernels, using NIR hyperspectral image features combined with the top 10 colour image features. Yang et al. [22] reported several works on developing more efficient methods for separating Fusarium-damaged kernels from sound wheat kernels using a high-powered pulsed LED system. Two parameters (slope and r²) from a regression analysis of the green response onto the red response were used as inputs to a linear discriminant analysis (LDA). They demonstrated that Fusarium-damaged and sound individual wheat kernels can be correctly categorized with up to 91% average accuracy.
Shahin and Symons [15] investigated the use of hyperspectral imaging in the visible-NIR spectral range (400-1000 nm) for the detection of varying degrees of fusarium damage in wheat. The study reported that hyperspectral imaging over 450-950nm can be used to detect varying degrees of fusarium damage in CWRS wheat. Using Linear Discriminant Analysis, sound and damaged kernels can be classified with an overall accuracy of 92% and the extent of fusarium damage can be predicted with an accuracy approaching 86%. A set of six wavebands can be used to achieve accuracies similar to those with the entire range of 450-950 nm.
These methods provide a good basis, but they are not directly suitable for rapid on-line inspection of Fusarium Moniliforme damaged corn seeds. Currently, studies [21] on the identification of F. moniliforme on corn seeds by color features are scarce.
The objectives of this study were to establish valid inspection parameters from the color characteristics of corn seeds, to develop an automatic inspection system for recognition and grading of sound and Fusarium Moniliforme damaged corn seeds, and to study the classification accuracy and repeatability of the computer vision system compared with the results of human inspection.

Data collection of corn samples
The corn samples used in this study were provided and certified by the 'Bulgarian Institute of Corn' in the city of Kneja and were collected in two crop years. A total of 8480 corn samples from individual cobs of 16 varieties, listed in Fig. 1 and Fig. 2, including sound kernels and kernels diseased with Fusarium Moniliforme, were inspected in random order by a human inspector.

Image Acquisition and Preprocessing
Fluorescent lighting was chosen for several reasons, including its lower generation of infrared wavelengths, which tend to bias video camera sensors, as noted by Steenhoek et al., 2001. For the purposes of the preliminary analysis, maize kernels were captured from two distinct, opposite sides, coded in the survey as the "pericarp" side and the "germ" side (Fig. 3). The images are 352 x 289 pixels, with 8-bit encoding of the red (R), green (G) and blue (B) components of the RGB color model, and are stored in BMP format. The color features [24] are first-order statistics derived from the values of the color components of all pixels in the ROI of the seed image.
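To illustrate what a first-order color statistic over a seed ROI looks like, the sketch below computes mean values over the pixels selected by a boolean ROI mask. It covers only a few of the 17 features (mean R, G, B, mean HSV saturation S and mean BT.601 luma Y); the choice of features, the scaling to [0, 1] and the mask representation are assumptions for illustration, not the authors' MATLAB code.

```python
import numpy as np


def hsv_saturation(rgb):
    """HSV saturation S = (max - min) / max for each pixel; 0 where max == 0."""
    mx = rgb.max(axis=-1)
    mn = rgb.min(axis=-1)
    safe_mx = np.where(mx > 0, mx, 1.0)  # avoid division by zero for black pixels
    return np.where(mx > 0, (mx - mn) / safe_mx, 0.0)


def first_order_color_features(image, mask):
    """Mean colour values over the seed ROI (illustrative subset of features).

    image: (H, W, 3) uint8 RGB array; mask: (H, W) boolean ROI mask.
    Returns mean R, G, B scaled to [0, 1], mean HSV saturation S and
    mean BT.601 luma Y (the Y component of YCrCb).
    """
    rgb = image[mask].astype(float) / 255.0  # (N, 3) array of ROI pixels
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    luma = 0.299 * r + 0.587 * g + 0.114 * b
    return {
        "R": r.mean(),
        "G": g.mean(),
        "B": b.mean(),
        "S": hsv_saturation(rgb).mean(),
        "Y": luma.mean(),
    }
```

Other color models (Lab, xyY, YCrCb chroma) would be handled the same way: convert each ROI pixel, then average the converted component.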

Feature subset selection and classification
A major issue in classifier synthesis is the selection of informative features [9]. Here, feature reduction is important because it shortens the working time of the algorithm. The first goal, namely to select the most relevant color features, i.e. those that are independent of one another within each class and rich in discriminatory information with respect to the classification problem at hand, is achieved as follows. The feature selection module in the presented work is based on the filter approach, which selects relevant features before the classification algorithm is applied.
https://doi.org/10.1051/matecconf/201821002014 CSCC 2018

Stepwise Discriminant Analysis (General Discriminant Analysis, GDA), Sequential Forward Floating Selection (SFFS)
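Discriminant analysis [2] was used to identify the color features (variables) that contribute most to the between-group differences. The null hypothesis H0 was tested using Wilks' Λ, defined as

$$\Lambda = \frac{|W|}{|T|} = \frac{|W|}{|W + B|},$$

$$W = \sum_{i=1}^{g}\sum_{j=1}^{n_i}(\mathbf{x}_{ij} - \bar{\mathbf{x}}_i)(\mathbf{x}_{ij} - \bar{\mathbf{x}}_i)^{T}, \qquad B = \sum_{i=1}^{g} n_i(\bar{\mathbf{x}}_i - \bar{\mathbf{x}})(\bar{\mathbf{x}}_i - \bar{\mathbf{x}})^{T},$$

where |W|, |B| and |T| denote the determinants of the within-group, between-groups and total variation matrices, x_ij is the p-dimensional vector of the jth object in the ith population, x̄_i is the p-dimensional vector of means in the ith population, x̄ is the vector of the overall mean, n_i is the number of objects in the ith population and g is the number of populations.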
Stepwise selection begins with no variables in the classification model. At each step, the variables inside and outside the model are evaluated. The variable inside the model that, at that step, contributes least to the model as determined by Wilks' Λ is removed from the model. Likewise, the variable outside the model that contributes most to the model and passes the admission test is added. When no more steps can be taken, the model has reached its final set of variables. Based on these analyses, the features were ranked by their level of contribution to the classifier, determined by the Correlation Coefficient (CC) and the Average Squared Canonical Correlation (ASCC). In this method, Wilks' Lambda is the likelihood ratio statistic for testing the hypothesis that the class means of the selected variables are equal in the population. Lambda is close to zero if any two groups are well separated.
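The stepwise procedure can be sketched in Python with NumPy. The sketch computes Wilks' Λ = |W|/|T| for a feature subset and alternates a forward step (admit the outside feature that lowers Λ most) with a backward step (drop the inside feature that contributes least). As an assumption, a fixed relative-change threshold `min_gain` replaces the F-to-enter and F-to-remove tests applied by Statistica; the function names are hypothetical.

```python
import numpy as np


def wilks_lambda(X, y, features):
    """Wilks' lambda |W| / |T| for the columns listed in `features`."""
    Z = X[:, features]
    d_total = Z - Z.mean(axis=0)
    T = d_total.T @ d_total                # total variation matrix
    W = np.zeros_like(T)                   # within-group variation matrix
    for c in np.unique(y):
        Zc = Z[y == c]
        d = Zc - Zc.mean(axis=0)
        W += d.T @ d
    return np.linalg.det(W) / np.linalg.det(T)


def stepwise_selection(X, y, min_gain=0.01, max_steps=50):
    """Forward/backward stepwise selection driven by Wilks' lambda.

    Assumption: a feature enters if it lowers lambda by at least `min_gain`
    (relative) and is removed if it would no longer pass that entry test.
    """
    selected, current = [], 1.0  # lambda of the empty model is 1
    for _ in range(max_steps):
        changed = False
        # Forward step: admit the outside feature that lowers lambda most.
        outside = [f for f in range(X.shape[1]) if f not in selected]
        if outside:
            lam = {f: wilks_lambda(X, y, selected + [f]) for f in outside}
            best = min(lam, key=lam.get)
            if lam[best] < current * (1.0 - min_gain):
                selected.append(best)
                current = lam[best]
                changed = True
        # Backward step: remove the inside feature whose removal raises
        # lambda least, if it no longer passes the entry test.
        if len(selected) > 1:
            lam = {f: wilks_lambda(X, y, [g for g in selected if g != f])
                   for f in selected}
            weakest = min(lam, key=lam.get)
            if current >= lam[weakest] * (1.0 - min_gain):
                selected.remove(weakest)
                current = lam[weakest]
                changed = True
        if not changed:
            break
    return selected, current
```

Because adding a variable can never increase Λ, a feature that has just been removed cannot immediately re-enter under the same threshold, which keeps the loop from cycling on a single feature.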
where p is the overall number of features, g is the number of classes and λp is Wilks' Λ for the current feature subset.
The Wilks' Lambda (15) statistic for the overall discrimination is computed as the ratio of the determinant (det) of the within-groups variance/covariance matrix over the determinant of the total variance covariance matrix.
With 17 features in the baseline feature space, a total of 131,069 feature sets were generated for each variety and tested by the GDA method. The computations were carried out with the Discriminant Analysis module of the Statistica software.

Scalar Feature Ranking (SFR)
The feature subset selection technique proposed by Theodoridis and Koutroumbas [17] was used. It ranks the features using the FDR (Fisher's Discriminant Ratio) criterion together with a cross-correlation measure between pairs of features and scatter matrices. Scatter matrices are among the most popular measures for quantifying the way feature vectors "scatter" in the feature space. Because of their rich physical meaning, a number of class-separability measures are built around them [17]; we chose one of them, the J3 criterion. The procedure computes a class-separability criterion for each feature. First, the FDR criterion

$$FDR = \frac{(\mu_1 - \mu_2)^2}{\sigma_1^2 + \sigma_2^2}$$

is used, where μ1, μ2 are the mean values and σ1², σ2² the respective variances associated with the values of a feature in the two classes. As a result, the features are ranked in descending order of FDR. The highest-ranked features are identified, and exhaustive search is employed to select the combination that maximizes the J3 criterion, defined as

$$J_3 = \operatorname{trace}\{S_w^{-1} S_m\}, \qquad S_m = S_w + S_b,$$

$$S_w = \sum_{i=1}^{c} P_i S_i, \qquad S_b = \sum_{i=1}^{c} P_i(\boldsymbol{\mu}_i - \boldsymbol{\mu}_0)(\boldsymbol{\mu}_i - \boldsymbol{\mu}_0)^{T}, \qquad \boldsymbol{\mu}_0 = \sum_{i=1}^{c} P_i \boldsymbol{\mu}_i,$$

where S_w is the within-class scatter matrix, S_b the between-class scatter matrix, S_m the mixture scatter matrix, P_i the a priori probability of class ω_i, i = 1, 2, ..., c, and S_i and μ_i the covariance matrix and mean vector of class ω_i.
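A minimal NumPy sketch of the two criteria is given below. It ranks features by FDR alone (the cross-correlation weighting between feature pairs is omitted, an assumption made to keep the example short) and computes J3 = trace(S_w⁻¹ S_m), estimating the priors P_i from class frequencies and using the biased (1/N) class covariance. The function names are hypothetical.

```python
import numpy as np


def fdr(x1, x2):
    """Fisher's discriminant ratio of a single feature for two classes."""
    return (x1.mean() - x2.mean()) ** 2 / (x1.var() + x2.var())


def rank_by_fdr(X, y):
    """Feature indices sorted by descending FDR (two-class problem)."""
    c0, c1 = np.unique(y)
    scores = np.array([fdr(X[y == c0, j], X[y == c1, j])
                       for j in range(X.shape[1])])
    return np.argsort(scores)[::-1], scores


def scatter_j3(X, y):
    """J3 = trace(Sw^-1 Sm) with Sm = Sw + Sb; priors from class counts."""
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / counts.sum()
    mu0 = X.mean(axis=0)  # equals sum_i P_i mu_i for count-based priors
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c, p in zip(classes, priors):
        Xc = X[y == c]
        Sw += p * np.cov(Xc, rowvar=False, bias=True).reshape(d, d)
        diff = (Xc.mean(axis=0) - mu0).reshape(-1, 1)
        Sb += p * (diff @ diff.T)
    return np.trace(np.linalg.solve(Sw, Sw + Sb))
```

Since S_m = S_w + S_b, J3 equals d + trace(S_w⁻¹ S_b), so it is never smaller than the subset size d and grows as the class means move apart relative to the within-class spread.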

Best Feature Combination (BFC)
We assume that m features have been chosen by the Scalar Feature Ranking technique, so the goal is to find the "best" combination of features. The Best Feature Combination technique [17] is an exhaustive search method with the following steps: (1) read the training data and normalize the feature values; (2) rank the features on the normalized training data using the technique described above; (3) to reduce the dimensionality of the feature space, keep the seven highest-ranked features; (4) choose the best combination of three features (out of the seven selected) by exhaustive search; (5) classify the normalized feature vectors of the test data with a k-NN classifier (k = 3) and compute the classification error with the leave-one-out (LOO) method. The three-dimensional feature space that gives the best performance of the k-NN (k = 3) classifier in terms of LOO cross-validation error is chosen.
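The exhaustive search over 3-of-7 combinations with a leave-one-out k-NN error can be sketched as follows. This assumes z-score normalization and majority voting among the k nearest neighbours under Euclidean distance; `ranked` is the feature order produced by the ranking step, and all names are hypothetical.

```python
import numpy as np
from itertools import combinations


def loo_knn_error(X, y, k=3):
    """Leave-one-out error of a k-NN classifier with Euclidean distance."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)  # a sample may not vote for itself
    errors = 0
    for i in range(len(X)):
        nn = np.argsort(D[i])[:k]
        labels, votes = np.unique(y[nn], return_counts=True)
        errors += int(labels[np.argmax(votes)] != y[i])
    return errors / len(X)


def best_feature_combination(X, y, ranked, n_top=7, n_comb=3, k=3):
    """Exhaustive search over n_comb-subsets of the n_top highest-ranked features."""
    Xn = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score normalization (assumed)
    best, best_err = None, np.inf
    for combo in combinations(ranked[:n_top], n_comb):
        err = loo_knn_error(Xn[:, list(combo)], y, k)
        if err < best_err:  # keep the first combination reaching the minimum
            best, best_err = combo, err
    return best, best_err
```

With n_top = 7 and n_comb = 3 only 35 subsets are evaluated, which is why an exhaustive search is affordable at this stage.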

Classifier performance evaluation
In this stage, two types of classifiers are applied in the selected feature space. The choice of the most appropriate classifier for a specific automated grading system is determined by a number of requirements, the most important of which are: sufficient classification accuracy, the simplest possible classification algorithm, computational efficiency and minimum execution time, easy implementation with the possibility of additional tuning, versatility and flexibility. Pattern recognition methods that require a long time for analysis and classification, or rely on large amounts of computation, are practically unsuitable for the task under consideration. This work therefore investigates image classification with K-Nearest Neighbours (k-NN) and Support Vector Machine (SVM). Random sampling is performed, with 75% of the observations in the training set and 25% in the test set.
To evaluate the performance of the designed system, the total error rate e0 was used as the criterion for classifier performance:

$$e_0 = \frac{FP + FN}{TP + TN + FP + FN} \cdot 100\%, \tag{9}$$

where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively.
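The total error rate is simply the share of misclassified seeds. A small Python helper makes the arithmetic explicit; the counts in the usage line are hypothetical and do not come from the paper's tables.

```python
def total_error_rate(tp, tn, fp, fn):
    """Total error rate e0 in percent: misclassified samples / all samples."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no samples were classified")
    return 100.0 * (fp + fn) / total


# Hypothetical counts: 10 misclassified seeds out of 100 give e0 = 10 %.
print(total_error_rate(tp=50, tn=40, fp=6, fn=4))
```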

Design and Implementation of the System
The global structure of the system is shown in Fig. 4. Three general levels are specified: the hardware level, the software level and the user interface level. At the software design level of the automated system, LabVIEW 8.6 (National Instruments, Inc.) is adopted as the software development and integration platform. The software is designed to be a reliable, simple and easy-to-use program for managing the automated recognition system that classifies corn by color. The NI IMAQ, NI Vision Assistant and NI DAQ packages were used to develop the subroutines for image capturing, image processing and motion control, respectively.
The NI Vision Acquisition software module is the basic element in developing this type of application; it includes the necessary drivers, NI-IMAQ and NI-IMAQdx, through which the camera is easily controlled. The IMAQdx driver supports different types of cameras: IEEE 1394 (FireWire), GigE Vision (Ethernet) and USB. Fig. 5 shows the diagram of synchronization of the system actions for a healthy or a diseased seed. The structure and working algorithm of the virtual instrument for Fusarium damage detection are shown in Fig. 6. The first step of the block diagram is image registration. The original image captured by the camera includes the grain and the background. For the classification procedure to be performed correctly, an object must actually be located in the inspection area (the image coordinates corresponding to the camera location). The color template search function of the LabVIEW Vision Development Module searches the ROI (region of interest) for the template color image of a seed to decide whether a properly positioned object is present. The identified ROI "A" and the cropped ROI area "B" for the current image can be seen on the front panel of the VI (Fig. 8). The "Cropped ROI area" block removes the initial background of the image in order to reduce the number of pixels for further processing. The classification phase is performed only if the object is found to be correctly positioned. For the current image, three integer arrays are formed with the color data of each pixel, corresponding to the R, G and B components. These arrays are fed directly as input to the MATLAB Script, which contains the algorithm for classifying the objects by analysis of their color components.
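The per-frame logic (check for a correctly positioned seed, crop the ROI, split it into three integer arrays, classify) can be mirrored in Python for offline testing. This is a simplified stand-in, not LabVIEW's color template matching: presence is assumed to mean that the mean ROI color lies within a Euclidean tolerance `tol` of a template color, the ROI is given as hypothetical (row0, row1, col0, col1) coordinates, and `classify` stands in for the MATLAB Script.

```python
import numpy as np


def object_in_roi(image, roi, template_rgb, tol=30.0):
    """Simplified presence test: mean ROI colour close to a template colour."""
    r0, r1, c0, c1 = roi
    patch = image[r0:r1, c0:c1].reshape(-1, 3).astype(float)
    distance = np.linalg.norm(patch.mean(axis=0) - np.asarray(template_rgb, float))
    return bool(distance < tol)


def crop_and_split(image, roi):
    """Crop the ROI and return separate integer R, G and B arrays."""
    r0, r1, c0, c1 = roi
    crop = image[r0:r1, c0:c1]
    return tuple(crop[..., ch].astype(np.int32) for ch in range(3))


def process_frame(image, roi, template_rgb, classify):
    """Classify only when a correctly positioned seed is found in the ROI."""
    if not object_in_roi(image, roi, template_rgb):
        return None  # no seed in position: skip classification
    R, G, B = crop_and_split(image, roi)
    return classify(R, G, B)
```

Returning None for an empty inspection area mirrors the rule that the classification phase runs only after correct positioning has been confirmed.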
The camera view, the visualization of results and the light indicators are also part of the LabVIEW instrument. MATLAB, as part of the overall virtual instrument, is used for calculating the values of the 17 color features and for implementing the classification algorithm. The left side of the VI shows the real-time field of view of the camera (Fig. 8-G); the 'Error Code' field (Fig. 8-H), which indicates whether an error occurred while executing the program code; the indicators 'Working Conveyor' (Fig. 8-I) and 'Conveyor in position for processing' (Fig. 8-J); and the Stop button (Fig. 8-K). On the right is the main menu for corn variety selection (Fig. 8-L); the default variety is Kneja 308. The available submenus allow the following actions: selection of the folder for storing images from the folders available on the computer (Fig. 8-A); visualization of the image of the last classified corn seed (Fig. 8-B); visualization of the results of the search for a correctly positioned object in front of the camera (ROI, region of interest) (Fig. 8-C); LED indication of the presence of a properly positioned object in the ROI (Fig. 8-D); LED indication of the classification result, green for a sound seed and red for a damaged seed (Fig. 8-E); and fields that automatically count the classified sound and damaged seeds, with an option to zero them before starting work (Fig. 8-F).

Results from Feature Subset Selection
The computations were performed in MATLAB (MathWorks, USA) and STATISTICA (Six Sigma Version 8, StatSoft). First, the color features were compared by one-way ANOVA (analysis of variance) followed by Tukey's post-hoc test in STATISTICA 8, at a 0.05 significance level. The ANOVA [24] showed that the factor variety has a statistically significant impact on all 17 color features, for both healthy and infected seeds. This also indicates that the coating formed on corn seeds as a result of Fusarium Moniliforme disease has different color characteristics for each variety.
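For readers reproducing the variety comparison outside STATISTICA, the one-way ANOVA F statistic for one color feature can be computed from between-group and within-group sums of squares, as sketched below with NumPy. Obtaining the p-value needs the F distribution: `scipy.stats.f_oneway` returns the same F together with a p-value, and recent SciPy versions also provide `scipy.stats.tukey_hsd` for the post-hoc step. The example data are made up.

```python
import numpy as np


def one_way_anova_f(*groups):
    """F statistic of a one-way ANOVA: MS_between / MS_within."""
    all_x = np.concatenate(groups)
    grand = all_x.mean()
    k, n = len(groups), all_x.size
    ss_between = sum(g.size * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up values of one colour feature for two varieties.
print(one_way_anova_f(np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])))
```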
Table 1 shows that the feature sets are not repeated across varieties, except for Kneja 433 and Kneja 449 and for Kneja 501 and Ruse 424, both pairs obtained by the BFC method. The features ym (xyY), S (HSV) and b (Lab model), which contribute to distinguishing between the two classes, appear in the feature sets of 6 varieties. The weakest features are R (RGB), x (xyY), and Y and Cr (YCrCb); none of them is included in more than one set.
The relevant feature sets for all varieties were determined by setting the maximum number of features to 7 for the Scalar Feature Ranking method and to 3 for the Best Feature Combination method. The results for all data sets and varieties, with the germ-side and pericarp-side samples combined into a single sample, are given in Table 1.

Results from classification
SVM makes use of a so-called kernel; the most commonly used kernels are linear, polynomial and radial basis functions. First, the average test-set classification accuracy of a single SVM was estimated without any involvement of the proposed design procedure. The method was implemented with a radial basis function (RBF) kernel of width σ, formulated as

$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2}{2\sigma^2}\right) = \exp\left(-\gamma \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2\right), \qquad \gamma = \frac{1}{2\sigma^2}.$$

The optimal values of the regularization constant C and the kernel width were selected experimentally. The classification procedure assigns the samples to two classes (healthy and infected seeds); the software package STATISTICA 8 (StatSoft) is used to implement the classifiers. The training sample comprises 75% of the overall sample, and V-fold cross-validation (V = 10) is applied. The classifier parameters gamma and C are chosen experimentally in fixed ranges during training: C from 1 to 10 in unit increments, and gamma in the range 0.091 to 0.333. The objective is to combine each feature set with the most appropriate classifier in order to improve the overall classification accuracy. Table 2 presents the average test-set classification accuracy obtained for all variety sets from a single SVM with different feature combinations.
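The RBF kernel and the parameter ranges can be made concrete in a short NumPy sketch. It computes the kernel matrix, converts γ to the width σ via γ = 1/(2σ²), and builds a candidate grid with C from 1 to 10; the number of γ points (5) is an assumption, since only the range is stated. A matrix like this could be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`; that pairing is an illustration, not the STATISTICA setup used in the study.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))


def sigma_from_gamma(gamma):
    """Invert gamma = 1 / (2 sigma^2)."""
    return np.sqrt(1.0 / (2.0 * gamma))


# Candidate grid from the quoted ranges (5 gamma points is an assumption).
C_values = np.arange(1, 11)                     # 1, 2, ..., 10
gamma_values = np.linspace(0.091, 0.333, 5)
grid = [(C, g) for C in C_values for g in gamma_values]
```

Each (C, γ) pair in `grid` would be scored by 10-fold cross-validation on the training part, and the best pair retained for the test set.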
The results listed in Table 2 are the total error rates averaged over the 16 varieties: 7.7% with the feature set from the GDA method, 7.8% with the feature set from the SFR method and 8.2% with the feature set from the BFC method. The k-NN classifier results were obtained under the following conditions: for each variety, the optimal number of nearest neighbours k was selected experimentally as the value giving the highest cross-validation accuracy among five variants, k = 1, 2, 3, 4 and 5; the similarity measure, which is the second parameter of the k-NN classifier and depends on the characteristics of the input data, was chosen to be the Euclidean distance (6), $d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{l=1}^{p}(x_l - y_l)^2}$; and V-fold cross-validation (V = 10) was applied as well.
The k-NN classification results listed in Table 3 include the total error rates averaged over the 16 varieties: 7.6% with the feature set from the GDA method, 7.2% with the feature set from the SFR method and 8.4% with the feature set from the BFC method.
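The selection of k described above (10-fold cross-validation over k = 1, ..., 5 with Euclidean distance) can be sketched with NumPy. Random fold assignment with a fixed seed is an assumption, and with an even k a tied vote goes to the smaller class label because of how `np.unique` and `np.argmax` order the results. Function names are hypothetical.

```python
import numpy as np


def knn_predict(Xtr, ytr, Xte, k):
    """Majority vote among the k nearest training samples (Euclidean)."""
    D = np.sqrt(((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(axis=-1))
    nn = np.argsort(D, axis=1)[:, :k]
    preds = []
    for row in ytr[nn]:
        labels, votes = np.unique(row, return_counts=True)
        preds.append(labels[np.argmax(votes)])  # ties -> smaller label
    return np.array(preds)


def select_k(X, y, ks=(1, 2, 3, 4, 5), v=10, seed=0):
    """Pick the k with the highest V-fold cross-validation accuracy."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), v)
    acc = {}
    for k in ks:
        correct = 0
        for fold in folds:
            train = np.setdiff1d(np.arange(len(X)), fold)
            correct += (knn_predict(X[train], y[train], X[fold], k) == y[fold]).sum()
        acc[k] = correct / len(X)
    best = max(acc, key=acc.get)  # first (smallest) k wins on equal accuracy
    return best, acc
```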
In practice, in recognition and sorting of foodstuffs, the performance and applicability of classification algorithms are mostly evaluated by various classifier error measures. The classification error, computed as the ratio of wrongly classified samples to the total number of samples, is often used as a standard criterion for evaluating the quality of recognition systems. In automatic grading and sorting of food products there are often more than two classes, but evaluations of recognition reliability generally involve two main types of errors: highway and marginal [4].
Accordingly, the proposed classification approaches integrated into the system are tested with a control seed sample and evaluated by the computed total error е0 (9), highway errors ei (12) and marginal errors gi (13). The highway (major, basic) error shows how many objects have mistakenly gone to other classes, i.e. how many objects are not assigned to the class to which they actually belong, while the marginal error shows how many objects have mistakenly come from other classes. In the two-class (bialternative) case, these errors are equivalent to the type I and type II errors of mathematical statistics.
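Both error types can be read off a confusion matrix whose rows are the true classes and whose columns are the assigned classes. Because expressions (12) and (13) are not reproduced in this text, the normalization below is an assumption: the highway error e_i divides the class-i objects sent elsewhere by the row total, and the marginal error g_i divides the objects wrongly received by class i by the column total. The matrix in the usage line is hypothetical.

```python
import numpy as np


def highway_and_marginal_errors(cm):
    """Highway e_i, marginal g_i and total e0 errors (percent).

    cm[i, j] = number of objects of true class i assigned to class j.
    Assumed normalisation: e_i by row total, g_i by column total.
    """
    cm = np.asarray(cm, dtype=float)
    off_row = cm.sum(axis=1) - np.diag(cm)  # class-i objects sent elsewhere
    off_col = cm.sum(axis=0) - np.diag(cm)  # other-class objects received by i
    e = 100.0 * off_row / cm.sum(axis=1)
    g = 100.0 * off_col / cm.sum(axis=0)
    e0 = 100.0 * off_row.sum() / cm.sum()
    return e, g, e0


# Hypothetical two-class matrix (sound, damaged).
e, g, e0 = highway_and_marginal_errors([[90, 10], [5, 95]])
```

In the two-class case the off-diagonal counts of the rows and columns are the same FN and FP values used for e0, only distributed per class.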
A series of tests was conducted to validate the performance of the automated grading system. A control sample of 230 corn seeds (120 healthy and 110 infected) of the variety Kneja 451 was tested in automatic mode. The seeds are not oriented in advance when fed into the inspection area. Fig. 9 shows images of sound and damaged corn seeds captured during the experimental study with the automated system.
Fig. 9. Images of inspected seeds in the grading system
Tables 4 and 5 are error tables summarizing the accuracy obtained by the system with two of the previously proposed approaches: (1) classification with an SVM classifier and three color features from the BFC method, and (2) classification with a k-NN classifier using three color features, which is a ready-to-use instrument in the LabVIEW environment.
The image-processing time for one seed determines the overall performance of the system. To measure the speed of image processing, a counter block was placed in the LabVIEW instrument. At the beginning of the image analysis, the procedure triggered the time counter; at the end of the image processing, a new signal stopped the clock, allowing the elapsed time to be measured.
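The counter-block measurement has a direct equivalent in Python: start a monotonic high-resolution clock before processing each frame, stop it afterwards and average over the frames. In this sketch `process` is a hypothetical placeholder for the image analysis and classification of one seed.

```python
import time


def mean_processing_time(process, frames):
    """Average wall-clock time (seconds) that `process` takes per frame."""
    if not frames:
        raise ValueError("no frames to time")
    times = []
    for frame in frames:
        start = time.perf_counter()  # counter triggered at the start of analysis
        process(frame)
        times.append(time.perf_counter() - start)  # stopped at the end
    return sum(times) / len(times)
```

`time.perf_counter` is used rather than `time.time` because it is monotonic and has the highest available resolution for short intervals.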
The k-nearest neighbour (k-NN) classifier produced the best time, but also had the highest total error (20.4%) and the highest marginal error for class 'sound' (15.2%) (Table 5). The analysis of the results shows that with the SVM classifier the total error rate is lower (6.9%) than with k-NN using the same number of features.

Conclusion
The variety of corn seeds significantly influences the color features of sound kernels as well as of the diseased areas (Fusarium Moniliforme) of kernels. The two-factor analysis of variance showed that the factors variety and side of seed have a statistically significant impact on the 17 color features studied, for both healthy and infected seeds.
Different approaches are proposed for the classification task, including feature subset selection by General Discriminant Analysis and by Fisher's discriminant ratio together with the scatter-matrix J3 criterion, and classification with k-NN and SVM. Tests carried out with sixteen different varieties of corn seeds (Kneja 308, Kneja 436, Kneja 613, Kneja 620, 26A, Ruse 424, XM 87/136, ЕСО 85, Kneja 433, Kneja 449, Kneja 451, Kneja 452, Kneja 501, Kneja 515, Kneja 549 and PR35 P12) demonstrated that varietal identity significantly influenced the selection of informative subsets of color features for classification of corn kernels. Tests of the impact of the captured side of the seed on the accuracy of the classification procedure demonstrated that this influence can be compensated by using the proper combination of feature set and classifier for the seed variety.
A comparative study was made to assess the impact of the feature selection and classification methods on recognition accuracy. In the preliminary research, without taking the captured side into account, k-NN and SVM classification reached a total error rate of 7.2-8.4% depending on seed variety. The classifier performance was estimated, and the results showed that the lowest total error rate, 2-20% depending on the variety, is achieved by the k-NN classifier with features selected by the Scalar Feature Ranking method.
The computer-based system reported in this paper allows real-time recognition and grading of corn seeds with external signs of Fusarium Moniliforme disease in laboratory practice. The automatic system successfully incorporates visible-light imaging into a modular mechatronic system with precise functions of motion control, signal processing and machine vision, based on the LabVIEW development platform and the MATLAB workbench. The proposed virtual instrument combines an easily realized graphical user interface, a camera control unit and complex mathematical analysis. It allowed classification of the 16 varieties of corn seeds. In terms of processing time, the k-NN classifier is the fastest, because it uses the k-NN module built into the LabVIEW environment. The classifier performance was also estimated experimentally with the system during control measurements of a seed sample (variety Kneja 451), and the results showed a total error rate of 6.95% for the SVM classifier and 20.4% for the k-NN classifier.