New approach for gas identification using supervised learning methods (SVM and LVQ)

This article proposes a new approach to gas identification that applies supervised learning methods to identify a single gas as well as a mixture of two gases. The gas is trapped in a gas discharge tube and ionized at a relatively low pressure using an HV transformer. The images captured after the ionization of each single gas are processed and assembled into a database for classification. The results were very satisfactory for both SVM and LVQ: for the identification of a single gas, the learning rate and the validation rate (the percentages of correctly classified training and validation samples) were 100% for both methods. For a mixture of two gases, a Multi-Layer Perceptron (MLP) neural network was used to identify the gases; the learning rate and the validation rate were 98.59% and 98.77% respectively. The program, developed in MATLAB, takes the captured image as input and outputs the identified gases to the user. The gases used in the experiments are argon (Ar), oxygen (O2), helium (He) and carbon dioxide (CO2).


Introduction
Gas sensors and detectors are essential devices in many industrial applications. The aim of this article is to present a new approach that uses supervised learning methods to detect a single gas and to recognize the presence of two gases in a mixture. This approach can be seen as an important step toward identifying several gases (more than two) in a mixture.

State of the art:
Machine learning is an application of artificial intelligence that gives systems the ability to learn from given data (experience) without being explicitly programmed. In this part of the article, we briefly explain the theory of the supervised learning methods (SVM and LVQ) that were used for gas identification.

SVM (Support Vector Machines):
A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labelled training data (supervised learning), the algorithm outputs an optimal hyperplane that categorizes new examples. In two-dimensional space this hyperplane is a line dividing the plane into two parts, with each class lying on one side [1].
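As a minimal sketch of how a separating hyperplane categorizes a new example, the snippet below evaluates the sign of w·x + b. The function names and the 2-D line x1 + x2 = 1 are illustrative assumptions, not values from the experiments.

```python
def decision_value(w, b, x):
    """Signed score w.x + b of sample x relative to the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if decision_value(w, b, x) >= 0 else -1


# Hypothetical 2-D example: the line x1 + x2 = 1 splits the plane in two.
w, b = (1.0, 1.0), -1.0
print(classify(w, b, (2.0, 2.0)))  # a point on the positive side
print(classify(w, b, (0.0, 0.0)))  # a point on the negative side
```

Training an SVM amounts to choosing w and b so that the margin between the two classes is as large as possible; the sketch only shows the classification step once they are known.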

LVQ (Learning Vector Quantization):
LVQ algorithms are related to other competitive learning algorithms such as self-organizing maps (SOMs) [2] and c-means. Competitive learning algorithms are based on the winner-take-all learning rule and on variants in which only certain elements or neighbourhoods are updated during learning. The original LVQ algorithms and most modern extensions use supervised learning to obtain class-labelled prototypes (classifiers). However, LVQ can also be trained without labels, by unsupervised learning, for clustering purposes [3].

Results and Discussion:
The database obtained from the response of each single gas (images captured after ionization of each gas) is of dimension (M×4). The first three columns correspond to the three RGB variables characterising each sample x_i (a pixel), and the fourth column holds the class y_i of the sample; M is the number of samples. Because the data are three-dimensional, plotting the points x_i in 3D space gives an idea of whether the data are linearly or non-linearly separable. The distribution of the data in 3D space shows that they are completely linearly separable, so artificial intelligence or SVM approaches are well suited to this problem.
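The (M×4) layout described above can be sketched as follows. This is only an illustration of the data structure: the helper name `build_database`, the integer class labels and the pixel values are assumptions, not the authors' data.

```python
def build_database(pixels_by_class):
    """Flatten {class_label: [(R, G, B), ...]} into rows [R, G, B, label]."""
    rows = []
    for label, pixels in sorted(pixels_by_class.items()):
        for r, g, b in pixels:
            rows.append([r, g, b, label])
    return rows


# Hypothetical pixel values, for illustration only (not measured data).
samples = {
    1: [(200, 40, 180), (198, 42, 177)],
    2: [(30, 60, 220)],
}
database = build_database(samples)
M = len(database)  # number of samples (rows); each row has 4 columns
print(M, database[0])
```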

Identification by SVM:
Since the problem is a multi-class problem (4 classes), we used multi-class SVM tools and chose the one-against-one algorithm. For K = 4 classes this requires K(K-1)/2 = 6 hyperplanes, each trained on the samples of the two classes to separate. To classify a new spectrum (that is, to identify the gas corresponding to it), a majority vote over the six classifiers is used. As seen when viewing the database, the data are perfectly linearly separable. Therefore, hard-margin SVMs are sufficient to find the optimal hyperplanes that divide the data into four classes. The matrix of quadratic coefficients Q is then generated, and the criterion to minimise is written as the quadratic program:

$$\min_{\alpha} \; \frac{1}{2}\,\alpha^{T} Q\,\alpha + f^{T}\alpha \qquad (1)$$

subject to the constraints:

$$y^{T}\alpha = 0 \qquad (2)$$

$$0 \le \alpha \le C \qquad (3)$$

with $f = -\mathbf{1}_{N}$ (the negative of the all-ones vector of dimension $N \times 1$) and $C = \infty$.

The "quadprog" command in Matlab solves this constrained minimisation problem. As mentioned earlier, we use the one-against-one approach, so for every pair of different data classes we follow these steps to determine the hyperplane that separates their data, which gives six hyperplanes. To classify our data with the SVM approach, we first tested the hard-margin SVM method. The result obtained is very satisfactory (learning and validation rates of 100%).
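The one-against-one voting scheme can be sketched as below. This is not the authors' Matlab code: the class centres are made-up RGB values, and `bisector_hyperplane` is a hypothetical stand-in that produces a separating hyperplane for each pair, in place of the hyperplanes that quadprog would return from the dual problem.

```python
from collections import Counter
from itertools import combinations

CLASSES = ["Ar", "O2", "He", "CO2"]


def pairwise_problems(classes):
    """All unordered class pairs: K(K-1)/2 binary problems."""
    return list(combinations(classes, 2))


def bisector_hyperplane(center_a, center_b):
    """Hypothetical stand-in for a trained SVM: the perpendicular bisector
    of two class centres, with positive scores on the side of center_a."""
    w = [a - b for a, b in zip(center_a, center_b)]
    offset = (sum(b * b for b in center_b) - sum(a * a for a in center_a)) / 2.0
    return w, offset


def predict_one_vs_one(x, classifiers):
    """Majority vote: each pairwise classifier votes for one of its classes."""
    votes = Counter()
    for (class_a, class_b), (w, b) in classifiers.items():
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        votes[class_a if score >= 0 else class_b] += 1
    return votes.most_common(1)[0][0]


# Made-up RGB centres, for illustration only (not measured data).
CENTERS = {
    "Ar": (200, 50, 200),
    "O2": (50, 200, 50),
    "He": (250, 150, 50),
    "CO2": (50, 50, 250),
}
classifiers = {
    (a, b): bisector_hyperplane(CENTERS[a], CENTERS[b])
    for a, b in pairwise_problems(CLASSES)
}
print(len(classifiers))
print(predict_one_vs_one((195, 55, 205), classifiers))
```

With four classes, a sample that truly belongs to one class wins all three of the contests involving that class, so it collects three of the six votes and no other class can exceed two.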

Identification by LVQ:
The architecture of the network can vary according to the number of neurons in the competitive layer and the number of neurons of this layer assigned to each class. The purpose of learning is to estimate the coefficients of the matrix W (connection weights) so that the network best fulfils the task for which it is intended. The learning phase is therefore the phase in which the network coefficients are modified according to learning rules until the network stabilizes, that is to say, until the desired output is almost achieved. The parameters to be optimized are generally four in number: the number of neurons in the competitive layer together with the number of neurons assigned to each class, the number of iterations, the adjustment coefficient, and the weight-initialisation coefficient. The resulting configuration of this network gives a learning rate and a validation rate equal to 100%.
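A minimal sketch of such a learning rule is the basic LVQ1 update, shown below with one prototype per class. It is an assumption that this variant matches the network used here; the parameter `step` plays the role of the adjustment coefficient, and the sample values and initial prototypes are invented for illustration.

```python
def nearest_prototype(prototypes, x):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    def sq_dist(p):
        return sum((pi - xi) ** 2 for pi, xi in zip(p, x))
    return min(range(len(prototypes)), key=lambda i: sq_dist(prototypes[i][0]))


def train_lvq1(samples, prototypes, step=0.1, epochs=20):
    """LVQ1: move the winning prototype toward x if its label matches,
    away from x otherwise.

    samples: list of (vector, label); prototypes: list of [vector, label].
    """
    protos = [[list(w), label] for w, label in prototypes]
    for _ in range(epochs):
        for x, y in samples:
            i = nearest_prototype(protos, x)
            w, label = protos[i]
            sign = 1.0 if label == y else -1.0
            protos[i][0] = [wi + sign * step * (xi - wi) for wi, xi in zip(w, x)]
    return protos


def predict_lvq(protos, x):
    """Label of the nearest prototype (winner-take-all)."""
    return protos[nearest_prototype(protos, x)][1]


# Invented normalised RGB samples for two hypothetical classes "A" and "B".
samples = [
    ((0.9, 0.1, 0.1), "A"), ((0.8, 0.2, 0.1), "A"),
    ((0.1, 0.1, 0.9), "B"), ((0.2, 0.1, 0.8), "B"),
]
initial = [[(0.6, 0.3, 0.3), "A"], [(0.3, 0.3, 0.6), "B"]]
trained = train_lvq1(samples, initial)
print(predict_lvq(trained, (0.85, 0.15, 0.1)))
print(predict_lvq(trained, (0.15, 0.1, 0.85)))
```

The rows of W in the text correspond to the prototype vectors here; assigning several neurons to a class would simply mean several prototypes sharing the same label.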

Identification of Two Gases using an MLP Neural Network:
In our case (four gases), there are six possible mixtures of two gases: (CO2,O2), (CO2,Ar), (CO2,He), (O2,Ar), (O2,He), and (Ar,He). For each of these, the total RGB of all possible mixtures ranging from (99% Gas1, 1% Gas2) to (1% Gas1, 99% Gas2) was provided as an input database to the Matlab neural network toolbox (nftool). The target outputs corresponding to the total RGB for mixtures of (CO2,O2), (CO2,Ar), (CO2,He), (O2,Ar), (O2,He), and (Ar,He) are 1100, 1010, 1001, 0110, 0101, and 0011 respectively; each bit flags the presence of CO2, O2, Ar, and He, in that order. The neural network was trained on 70% of the input samples. Validation and testing were done on 15% of the input samples. The learning rate and the validation rate were 98.59% and 98.77% respectively. The figure below shows a Matlab GUI that takes the measured RGB values as input and uses the neural network to identify the gases in the mixture.
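The four-bit target codes can be reproduced with a short sketch. The bit order (CO2, O2, Ar, He) is inferred from the codes listed in the text; the 0.5 threshold used to decode network outputs and the example activations are assumptions for illustration.

```python
from itertools import combinations

GASES = ["CO2", "O2", "Ar", "He"]  # bit order inferred from the listed codes


def encode_mixture(gas_pair):
    """Four-bit target string with a 1 for each gas present in the pair."""
    return "".join("1" if gas in gas_pair else "0" for gas in GASES)


def decode_output(activations, threshold=0.5):
    """Threshold four network outputs and return the gases flagged present.

    The threshold value is an assumption, not taken from the paper.
    """
    return [gas for gas, a in zip(GASES, activations) if a >= threshold]


targets = {pair: encode_mixture(pair) for pair in combinations(GASES, 2)}
for pair, code in targets.items():
    print(pair, code)

# Hypothetical network output for one sample.
print(decode_output([0.93, 0.04, 0.88, 0.10]))
```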

Conclusion and Perspectives
In conclusion, a new approach for gas identification using SVM and LVQ was studied and tested on four different gases. On their own, SVM and LVQ identify a single gas only; both reached a 100% learning rate, which is promising. In addition, an MLP neural network was used to identify the gases in a mixture of two gases; the learning rate and the validation rate were 98.59% and 98.77% respectively. In future work, it would be interesting to develop a more advanced algorithm capable of identifying several gases (more than two) in a mixture. This requires a much larger database that gives the machine more information for identifying the gases in a mixture. The proposed approach is a necessary step toward identifying gases in such mixtures.