Multi-robot system learning based on evolutionary classification

This paper presents a novel machine learning method for agents of a multi-robot system. The learning process is based on knowledge discovery through continual analysis of robot sensory information. We demonstrate that classification trees and evolutionary forests may serve as a basis for creating autonomous robots capable of both learning and knowledge exchange with other agents in a multi-robot system. The results of experimental studies confirm the effectiveness of the proposed approach.


Introduction
The main factor that determines the functionality and adaptive capabilities of modern autonomous robots is the use of intellectual control technologies and knowledge processing methods. The knowledge incorporated a priori into an autonomous robot control system can, in the general case, be supplemented by the results of robot self-learning based on the analysis of its accumulated experience [1].
Recent research in this area provides compelling evidence that implementing learning mechanisms significantly increases the adaptive properties of intellectual autonomous robots operating under uncertainty. The variety of application areas for autonomous robots allows for different approaches to organizing the self-learning process. In this regard, multi-agent learning, in which new knowledge can be acquired both individually and through mutual data exchange between robots, is a problem of particular interest.
In this paper we show how classification trees can be employed for individual robot learning and how this process can be scaled to a group of robots by means of evolutionary forest technology, which enables accumulation and exchange of knowledge in the task of multi-robot terrain exploration.

Robot learning based on classification trees
The intellectual and adaptive properties of a robot designed to work in an uncertain or dynamically changing environment are characterized not only by its ability to make the necessary control decisions based on existing knowledge, but also by the possibility of augmenting that knowledge with new observations in self-learning mode.
Research in the field of intellectual information processing shows that classification trees are an effective basis for machine learning algorithms in intellectual robot control [2,3]. This method allows a substantial increase in robot autonomy on the basis of knowledge discovery. A typical example of knowledge discovery is the determination of terrain passability in mobile robot path planning. Through a series of trials and errors the robot can figure out the dependencies between the visual properties of a terrain region and the time necessary to pass it.
The construction of a classification tree is associated with the analysis of a large number of sample observations. Input data for this process can be represented as pairs $\{X_t, y_t\}$, where $X_t$ are the input parameters on which the classification is based and $y_t$ is the required classification label. For example, in a mobile robot control task $X_t$ might be a set of features extracted from an on-board camera image and $y_t$ a terrain passability estimate determined as a function of robot velocity.
It should be noted that in the case of active learning the sample set is generated continually during robot operation from collected data about the outcomes of various situations. In other words, the set of sample labels for input vectors can be generated through retrospective analysis of the consequences of the actions the robot has taken in different states of the operating environment.
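The retrospective labelling described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the names `Sample`, `passability_label` and `make_sample`, the three-class discretisation and the normalisation by a maximum speed `v_max` are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Sample:
    features: Tuple[float, ...]  # input parameters, e.g. colour statistics of a terrain patch
    label: int                   # classification label: discretised passability class


def passability_label(distance_m: float, elapsed_s: float,
                      v_max: float, n_classes: int = 3) -> int:
    """Map the measured speed, relative to v_max, to a class index 0..n_classes-1."""
    if elapsed_s <= 0 or v_max <= 0:
        raise ValueError("elapsed_s and v_max must be positive")
    ratio = min(max(distance_m / elapsed_s / v_max, 0.0), 1.0)
    return min(int(ratio * n_classes), n_classes - 1)


def make_sample(features: Sequence[float], distance_m: float,
                elapsed_s: float, v_max: float = 1.0) -> Sample:
    """Build a training sample after the robot has crossed a terrain region."""
    return Sample(tuple(features), passability_label(distance_m, elapsed_s, v_max))
```

A sample is created only after the region has been traversed, so the label reflects the observed consequence of the action rather than a prior estimate.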
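The classification tree composition algorithm is based on the dichotomization procedure [4]. For a given set of examples $S$, the entropy of the classified parameter $y$ is computed as

$$H(S) = -\sum_{y \in Y} P(y, S) \log P(y, S),$$

where $Y$ is a set of discrete values of the parameter $y$ and $P(y, S)$ is the statistical probability of observing a value $y$ on the set of examples $S$. A logical predicate $d$ divides the set of examples into two subsets: $S_d$, on which $d$ is true, and $S_{\neg d}$, on which $d$ is false. The entropy values $H(S_d)$ and $H(S_{\neg d})$ are computed for these subsets using the same formula. In accordance with the property of dichotomization, the obtained subsets satisfy the following criterion:

$$\frac{|S_d|}{|S|} H(S_d) + \frac{|S_{\neg d}|}{|S|} H(S_{\neg d}) \le H(S).$$

In other words, after dichotomy of the set $S$ the verity or the falsity of the classified parameter $y$ will be more evident than in the original set.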
The described dichotomization process may be iteratively applied to every subset derived from it, ending up with the subsets of minimal entropy.
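As a hedged illustration of the dichotomization step, the sketch below computes the entropy of a labelled sample set and searches threshold predicates of the form `x[feature] <= threshold` for the one with the largest entropy reduction. The threshold form of the predicate, the base-2 logarithm and all function names are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_entropy(samples, feature, threshold):
    """Weighted entropy of the two subsets produced by x[feature] <= threshold."""
    true_side = [y for x, y in samples if x[feature] <= threshold]
    false_side = [y for x, y in samples if x[feature] > threshold]
    n = len(samples)
    return (len(true_side) / n) * entropy(true_side) \
        + (len(false_side) / n) * entropy(false_side)


def best_predicate(samples):
    """Return (feature, threshold, gain) of the best split, or None if no split exists."""
    if not samples:
        return None
    base = entropy([y for _, y in samples])
    best = None
    for f in range(len(samples[0][0])):
        # The largest value is skipped: it would leave the false side empty.
        for t in sorted({x[f] for x, _ in samples})[:-1]:
            gain = base - split_entropy(samples, f, t)
            if best is None or gain > best[2]:
                best = (f, t, gain)
    return best
```

Applying `best_predicate` recursively to each resulting subset yields the greedy tree construction discussed in the text.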
Existing algorithms that implement the classification tree formation method have two main disadvantages:
- overfitting the data during the learning process, which increases tree complexity and reduces the ability to generalize classification rules;
- a greedy tree construction strategy, which does not guarantee that the tree has an optimal number of nodes.

Multi-robot system learning based on evolutionary forests
Another approach to solving knowledge discovery problems is the method of random forests, in which trees are formed independently from random training subsamples, each with a different subset of the available parameters [5]. The decision-making model of a random forest is based on voting over multiple trees and aggregating their predictions by taking the averaged or the most popular response. The main strength of this model is that the trees of the forest are built on random subsets of data and therefore overfit in different ways; these peculiarities are largely eliminated by averaging the individual responses.
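The two aggregation rules mentioned above, the most popular response and the averaged response, can be sketched as follows. Trees are represented here simply as callables that map an input vector to a prediction; this representation and the function names are assumptions for illustration only.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent prediction; ties go to the earliest one in the list."""
    counts = Counter(predictions)
    top = max(counts.values())
    for p in predictions:
        if counts[p] == top:
            return p


def average_vote(predictions):
    """Return the mean of numeric predictions."""
    return sum(predictions) / len(predictions)


def forest_predict(trees, x, aggregate=majority_vote):
    """Evaluate every tree on x and aggregate the individual responses."""
    return aggregate([tree(x) for tree in trees])
```

Both rules require a non-empty list of trees, and every tree must be evaluated for each decision.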
The deficiencies of this approach are the bulkiness of the model and a more complex decision-making procedure that involves evaluating all the trees in the forest. It should be noted that the random forest algorithm in its classical form is not suitable for robot online learning: constructing the forest from scratch several times during robot operation is not feasible due to the high computational complexity of this task.
One possible solution to the above problems is to employ a genetic algorithm [1] to search for the trees of the forest. In this sense, the trees act as chromosomes in a selection procedure that runs during robot operation. The control decision is then formed from the trees with the highest fitness values. The self-learning scheme for a single agent of a multi-robot system is illustrated in Figure 1. The learning algorithm for the proposed model can be described as follows:
1. Initialization of the classification tree population, where each tree $\{S_i, X_i, Y_i, N_i, F_i\}$ is determined by a random subset of learning samples $S_i$, a set of input parameters $X_i$, an output set of classified parameters $Y_i$, a tree structure $N_i$ and a tree fitness value $F_i$. The trees in the population are either initialized empty or trained on a subset of samples available from the beginning of the algorithm.
2. Augmentation of the learning dataset through robot operation and analysis of incoming sensory information about the environment state and task execution progress. At this stage a training sample consisting of input parameters and a classification value is stored in the datasets $S_1, S_2, \ldots, S_k$ of $k$ randomly chosen trees. The corresponding leaves of the selected trees are updated with the output values from the new samples.
3. Calculation of fitness values for each tree. Each classification tree has access only to a random portion of the learning samples. Thus, fitness values are defined as the probability of correct classification on a set of examples that were not available to the tree during the learning process. At every iteration a new fitness value is defined as an exponential moving average

$$F_t = (1 - P)\,F_{t-1} + P\,F',$$

where $F_{t-1}$ is the previous fitness, $F'$ represents the fitness measured on the new samples and $P$ is the fitness update rate.
4. Trees recombination. $N$ trees with the highest fitness are selected. Based on them a new tree is created and added to the population, and trees that have low fitness values are excluded from the population.
5. Directed mutation of trees through adding or removing nodes [6]. Each tree in the population is assigned a maximum number of nodes proportional to its fitness value. The procedure of adding a node may be described as a search for the split of a leaf that maximizes the information gain in the task of correct classification of output parameters. Removing a node is likewise based on a search for the node that has minimal impact on the information gain over the sample dataset.
6. If the learning process is interrupted, the algorithm terminates; otherwise go to step 2.
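Parts of the loop above can be sketched in code. The example covers only sample distribution (step 2), the moving-average fitness update (step 3) and fitness-based selection and exclusion (part of step 4); recombination and mutation are omitted because their exact form is not recoverable from the text. The class `Individual`, the function names, the parameter `rate` (standing for the fitness update rate) and the use of a plain callable in place of a tree structure are all assumptions.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Sample = Tuple[tuple, int]  # (input parameters, classification label)


@dataclass
class Individual:
    predict: Callable[[tuple], int]                      # stands in for the tree structure
    samples: List[Sample] = field(default_factory=list)  # samples available to this tree
    fitness: float = 0.0                                 # current fitness value


def update_fitness(prev: float, measured: float, rate: float) -> float:
    """Exponential moving average of the fitness."""
    return (1.0 - rate) * prev + rate * measured


def learning_step(population, sample, k, rate, rng):
    """Store the sample in k random trees; score the others on it as unseen data."""
    x, y = sample
    chosen = set(rng.sample(range(len(population)), k))
    for i, ind in enumerate(population):
        if i in chosen:
            ind.samples.append(sample)
        else:
            measured = 1.0 if ind.predict(x) == y else 0.0
            ind.fitness = update_fitness(ind.fitness, measured, rate)


def select_best(population, n):
    """Return the n individuals with the highest fitness."""
    return sorted(population, key=lambda ind: ind.fitness, reverse=True)[:n]


def drop_worst(population, m):
    """Return the population without its m least fit individuals."""
    ranked = sorted(population, key=lambda ind: ind.fitness, reverse=True)
    return ranked[:max(len(ranked) - m, 0)]
```

Scoring a tree only on samples it did not store keeps the fitness an estimate of generalization rather than of memorization, which is the intent of step 3.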
The proposed approach to learning based on evolutionary forests is not only more efficient due to the use of a genetic algorithm but also fits naturally with the idea of self-learning in a group of robots sharing a common operating environment. Robots in a group collect different measurements and can therefore construct non-overfitted trees containing complementary knowledge.
Thus, the evolutionary forest method gives robots the ability to learn not only by generalizing individually acquired experience, but also through mutual exchange of accumulated knowledge.
The knowledge exchange is simplified by the fact that classification trees can be easily serialized and transferred via a wireless communication channel.
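A minimal sketch of such an exchange is given below, assuming that a tree is stored as a nested dictionary and that JSON is used as the wire format. The paper does not specify either choice, so the node layout (`feature`, `threshold`, `true`, `false`, `label`) and the function names are hypothetical.

```python
import json


def tree_to_json(tree, fitness):
    """Serialize a tree and its fitness into a JSON string for transmission."""
    return json.dumps({"fitness": fitness, "root": tree}, sort_keys=True)


def tree_from_json(payload):
    """Restore a (tree, fitness) pair from a received JSON string."""
    data = json.loads(payload)
    return data["root"], data["fitness"]


def classify(node, x):
    """Walk from the root to a leaf and return the leaf label."""
    while "label" not in node:
        branch = "true" if x[node["feature"]] <= node["threshold"] else "false"
        node = node[branch]
    return node["label"]


def merge_received(local, received, capacity):
    """Keep the `capacity` fittest (tree, fitness) pairs among local and received trees."""
    return sorted(local + received, key=lambda pair: pair[1], reverse=True)[:capacity]
```

Because only the tree structure and a single fitness value are transmitted, the message size depends on the number of nodes and not on the number of samples the sender has collected.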

Experimental results
Modelling software was developed in the C# programming language to test the proposed machine learning approach. We chose a simple and illustrative task of multi-robot terrain exploration for our experiments (Figure 2). Each robot in the group is assigned a goal point where the environment should be investigated. The time required to fulfil the task is determined by the ability of the robot to learn environment properties. For example, terrain regions of different colors have a different impact on robot velocity depending on their passability value $K_p$. Thus, in Figure 2 white color corresponds to easily passable regions ($K_p = 1$).
The advantage of using evolutionary forest technology in this problem is that it allows organizing an inexpensive exchange of knowledge between agents in a multi-agent system through the mutual transfer of the classification trees with the highest fitness.
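To illustrate how passability affects task time, the sketch below estimates the traversal time of a path from the passability values of its cells. It assumes that the velocity in a cell equals a maximum velocity multiplied by the cell's passability value; the paper does not state the exact relation, so this model, the function name and the parameters `cell_size` and `v_max` are hypothetical.

```python
import math


def travel_time(path_kp, cell_size=1.0, v_max=1.0):
    """Sum per-cell traversal times, assuming velocity = v_max * passability."""
    total = 0.0
    for kp in path_kp:
        if kp <= 0.0:
            return math.inf  # a cell with zero passability cannot be crossed
        total += cell_size / (v_max * kp)
    return total
```

Under this assumption a robot that has learned to predict passability from terrain appearance can compare candidate paths by estimated time instead of by length alone.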
We used multi-agent systems of 2 and 3 robots for the experiments. In both cases we first tested the system with individual robot learning based on random forest technology without knowledge exchange, and then carried out the same experiment with knowledge exchange enabled. Three different maps were used and 25 trials were conducted for each map. The experimental results are summarized in Figure 3.
For each map in our experiments, knowledge exchange based on the evolutionary forest method reduces the time necessary for terrain exploration. Robots learn faster to distinguish regions with low and high passability, which leads to better path planning [7].

Summary
The materials of this paper demonstrate the practicability of self-learning for increasing the effectiveness of autonomous robots under uncertainty about operating environment properties. A group of mobile robots can reduce its task execution time by learning to distinguish terrain regions of different passability based on evolutionary forest technology. Hardware implementation of the proposed approach to the design of self-learning multi-robot systems requires additional fundamental research in a number of directions, specifically in the following domains:
- efficient organization of databases for storage of accumulated sensory information;
- mechanisms of associative memory construction for consolidation of knowledge about perceived objects and situations;
- development of generalized criteria of intellectual systems effectiveness.
These problems are of considerable interest for developing intellectual information processing and control systems with an extended ability to adapt to complex environments.

DOI: 10.1051/
© Owned by the authors, published by EDP Sciences, 201
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits distribution and reproduction in any medium, provided the original work is properly cited.

In the entropy formula, $Y$ is a set of discrete values of the parameter $y$ and $P(y, S)$ is the statistical probability of observing a value $y$ on the set of examples $S$. Given a logical predicate $d$, the set of examples can be divided into two subsets: a subset on which $d$ is true and a subset on which $d$ is false. For these two subsets, the entropy values are computed using the same formula as for the original set.

Figure 1. Robot self-learning based on evolutionary forests.

In the fitness update, $F_{t-1}$ is the previous fitness, $F'$ represents the fitness measured on the new samples and $P$ is the fitness update rate.
4. Trees recombination. $N$ trees with the highest fitness are selected.

Figure 2. Multi-robot system in a virtual environment with regions of high and low passability.

Figure 3. Comparison of multi-robot system efficiency with and without knowledge exchange.