Ensemble probability distribution for novelty detection

This paper explores a new ensemble approach to novelty detection called Ensemble Probability Distribution Novelty Detection (EPDND). The proposed ensemble provides a metric for characterizing different classes. Experimental results on four real-world datasets show that, in many cases, EPDND performs competitively with two common novelty detection approaches, Support Vector Domain Description (SVDD) and Gaussian Mixture Models (GMM), in terms of accuracy, recall and F1 score.


Introduction
One of the basic assumptions in most supervised machine learning algorithms is that the class label set is predefined and shared by the training and testing sets, so that the classification model can generalize well. However, many open-domain applications violate this assumption. For example, in online webpage classification we can easily list some common classes, such as entertainment, politics, and sports, but it is extremely difficult to provide a complete list of all classes beforehand, because webpages on new topics, and hence new classes, can appear as the data arrive. Classification performance degrades sharply when classes that were never defined in the training phase emerge in the testing phase. Similar examples can be found in many domains, such as fraud detection and ecosystem disturbance. Novelty detection, defined as the task of recognising that test data differ in some respect from the data used in the training stage [1] [2], is therefore a challenging problem that deserves further research.
In this paper, we propose an efficient ensemble framework for novelty detection and present a specialization of the framework involving five individual classifiers. The rest of the paper is organized as follows. Section 2 briefly reviews related work. Section 3 describes ensemble learning and the confidence distribution. Section 4 describes the proposed novelty detection method in detail, followed by experiments and related analysis in Section 5. Finally, Section 6 concludes the paper.

Related work
In most applications, acquiring examples of novelties is a serious obstacle. Many approaches have been proposed for novelty detection. According to recent work [1], the widely used techniques fall mainly into the following types: probabilistic, nearest neighbour-based, domain-based, and clustering-based approaches. Probabilistic approaches are based on estimating the generative probability density function (PDF) of the data. The resulting distribution may then be thresholded to define the boundaries of normality in the data space and to test whether a sample comes from the same distribution. The Gaussian mixture model (GMM), which assumes that the objects follow a mixture of Gaussian distributions, has proven popular [3]. Samples whose probability is lower than a specific threshold are regarded as novelties. Unfortunately, in many real-life scenarios no a priori knowledge of the data distribution is available, so assuming a distribution for the training data may be problematic and lead to poor novelty detection results.
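The probabilistic test described above can be written compactly. The following is the standard textbook form of a GMM density with a rejection threshold; the symbols $M$, $\pi_m$, $\mu_m$, $\Sigma_m$ and $\tau$ are assumed here for illustration and do not come from the text.

```latex
% Assumed notation: M mixture components with weights \pi_m,
% means \mu_m and covariances \Sigma_m; \tau is a density threshold.
\begin{align}
p(x) &= \sum_{m=1}^{M} \pi_m \, \mathcal{N}(x \mid \mu_m, \Sigma_m),
\qquad \sum_{m=1}^{M} \pi_m = 1, \quad \pi_m \ge 0, \\
x &\text{ is flagged as a novelty if } p(x) < \tau .
\end{align}
```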
The nearest neighbour-based approach assumes that normal data points lie close to their neighbours, while potential novelties lie far away from them. It is a simple and effective method, but its drawback is that all training data points must be stored, because they are needed to compute the distance between a new unseen data point and the given data points. Angiulli [4] introduced a nearest neighbour-based novelty detector that accepts data points on the basis of their nearest neighbour distances in a training dataset. Tziakos et al. [5] employed the Mahalanobis distance to train a novelty detector and defined a metric to score each vector in a video sequence. Frames in the sequence that scored above the threshold were then labelled as abnormal.
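The nearest-neighbour idea described above can be sketched in a few lines. This is a generic illustration, not the detector of [4] or [5]: the function names, the Euclidean distance and the threshold value are all assumptions made for the example.

```python
import math

def nn_distance(x, train_points):
    """Euclidean distance from x to its nearest training point."""
    return min(math.dist(x, p) for p in train_points)

def is_novel(x, train_points, threshold):
    """Flag x as a novelty when its nearest neighbour is farther than threshold."""
    return nn_distance(x, train_points) > threshold

# Toy training set (hypothetical): the four corners of the unit square.
train_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
near = is_novel((0.5, 0.5), train_points, threshold=1.0)
far = is_novel((4.0, 4.0), train_points, threshold=1.0)
```

The sketch also makes the storage drawback visible: every query scans the full list of training points.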
Another approach, the domain-based approach to novelty detection, finds a bounded region that contains (almost) all known normal data. A sample is regarded as a novelty when it falls outside this region. Support vector data description (SVDD), proposed by Tax and Duin [6] and inspired by the support vector classifier, gives a flexible and tight data description among the boundary approaches: it uses a hypersphere of minimal volume to enclose all objects of one target class, obtained by minimizing the structural risk. Novelty is assessed by determining whether a test point lies within the hypersphere. A drawback of these methods is the complexity associated with computing the kernel function.
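For reference, the hypersphere description above is usually stated as the following optimization problem. The symbols ($a$ for the centre, $R$ for the radius, $\xi_i$ for slack variables, $C$ for the trade-off constant) follow the common SVDD formulation and are not taken from this text.

```latex
% Soft-margin SVDD in input space; with a kernel, x is replaced by \phi(x).
\begin{align}
\min_{R,\,a,\,\xi}\quad & R^2 + C \sum_{i} \xi_i \\
\text{s.t.}\quad & \lVert x_i - a \rVert^2 \le R^2 + \xi_i, \qquad \xi_i \ge 0 \quad \text{for all } i, \\
\text{test:}\quad & z \text{ is accepted as normal if } \lVert z - a \rVert^2 \le R^2 .
\end{align}
```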
It is noticed that an ensemble of classifiers can provide a kind of metric for the proximity between a sample and a specific class. As far as we know, there are few works on novelty detection using an ensemble approach. One of the most notable, which uses a random forest for novelty detection, was proposed by Zhou et al. [7]. Zhou et al. (2015) made full use of the vote distribution from the trees to find a metric that measures the proximity of different samples. Our work is related to this recent work. We build an ensemble system from n different types of learners and use the class probability vectors from these learners to obtain the mean confidence distribution of each class for novelty detection. First, our method does not rely on the properties of the distribution of the data in the training set. Furthermore, our method does not suffer from the computational complexity of domain-based approaches. Finally, our approach is appropriate for high-dimensional data. It should be pointed out that Zhou et al. [7] use a value generated from the vote information of the trees to characterize a class, whereas we use a vector generated from the probability information of the individual learners.

Confidence distribution
For a classification task, the classifier aims to predict a label from the class label set $\{c_1, c_2, \ldots, c_k\}$ for a sample $x$. In most cases, an ensemble method constructs a set of base classifiers from the training data and performs classification by taking a vote on the predictions made by the base classifiers [7].
We suppose that the ensemble includes $T$ base classifiers $\{h_1, h_2, \ldots, h_T\}$. The vector $(h_i^1(x), h_i^2(x), \ldots, h_i^k(x))$ represents the predicted outputs of $h_i$ on sample $x$, where $h_i^j(x)$ is the output of $h_i$ on class $c_j$. Formally, two combination rules are considered: (i) majority voting, Eq. (1), and (ii) weighted voting, Eq. (2), where $w_i$ is the weight of $h_i$. A test sample $x$ is classified by taking a majority vote on the individual predictions or by weighting each prediction with the accuracy of the base classifier.
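As a sketch under the notation of this section, the two voting rules are commonly written as follows. These are the standard textbook forms, assumed here rather than taken from the text. Strict majority voting additionally requires the winning class to collect more than half of the votes.

```latex
% Assumed notation: T base classifiers h_1..h_T, classes c_1..c_k,
% h_i^j(x) the output of h_i for class c_j, w_i >= 0 the weight of h_i,
% H(x) the label predicted by the ensemble.
\begin{align}
H(x) &= c_{j^*}, \quad j^* = \arg\max_{j} \sum_{i=1}^{T} h_i^j(x)
  && \text{(voting with equal weights)} \\
H(x) &= c_{j^*}, \quad j^* = \arg\max_{j} \sum_{i=1}^{T} w_i \, h_i^j(x)
  && \text{(weighted voting)}
\end{align}
```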
Eq. (1) and Eq. (2) impose no restriction on the output type of $h_i$. In realistic tasks, different types of individual learners produce different types of values for $h_i^j(x)$. Two are common:
Class label: $h_i^j(x) \in \{0, 1\}$, which takes the value 1 if $h_i$ predicts $x$ as class $c_j$ and 0 otherwise. Voting with class labels is called "hard voting".
Class probability: $h_i^j(x) \in [0, 1]$, which corresponds to an estimate of the posterior probability $p(c_j \mid x)$. Voting with class probabilities is called "soft voting".
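The difference between hard and soft voting can be illustrated with a small sketch. The function names and the toy probability values below are hypothetical, and ties are broken in favour of the lowest class index, which is an arbitrary choice made for the example.

```python
def hard_vote(prob_vectors):
    """Each classifier casts one vote for its most probable class."""
    k = len(prob_vectors[0])
    votes = [0] * k
    for p in prob_vectors:
        votes[max(range(k), key=p.__getitem__)] += 1
    return max(range(k), key=votes.__getitem__)

def soft_vote(prob_vectors, weights=None):
    """Sum the (optionally weighted) class probabilities and pick the largest."""
    k = len(prob_vectors[0])
    if weights is None:
        weights = [1.0] * len(prob_vectors)
    totals = [sum(w * p[j] for w, p in zip(weights, prob_vectors))
              for j in range(k)]
    return max(range(k), key=totals.__getitem__)

# Three classifiers, two classes: two weak votes for class 0
# against one confident vote for class 1.
outputs = [[0.55, 0.45], [0.60, 0.40], [0.10, 0.90]]
hard_label = hard_vote(outputs)
soft_label = soft_vote(outputs)
```

On this input the two rules disagree: hard voting counts two votes for class 0, while soft voting lets the single confident prediction for class 1 dominate the summed probabilities.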
An ensemble-based system usually assigns a confidence to the decision it makes. The confidence can be obtained by integrating the outcomes of the individual classifiers, and is defined as follows:
$$\mathrm{conf}_j(x) = \frac{1}{T} \sum_{i=1}^{T} h_i^j(x) \qquad (3)$$
where $\mathrm{conf}_j(x)$ is the confidence of predicting $x$ as class $c_j$ and $T$ is the total number of base classifiers.
Confidence estimates the reliability of predicting a class label for an observation: the greater the confidence, the greater the probability that the sample belongs to that class. The confidence vector for an instance $x$ can be represented as Eq. (4), which gives the confidence distribution generated by the ensemble system:
$$\mathrm{conf}(x) = [\mathrm{conf}_1(x), \mathrm{conf}_2(x), \ldots, \mathrm{conf}_k(x)] \qquad (4)$$

Ensemble probability distribution novelty detection approach
Based on the discussion above, an ensemble of classifiers can provide a metric for the proximity between a new sample and the known classes. Samples from the same class have similar confidence distributions, so a class can be characterized by the mean confidence distribution of the instances that belong to it. The proximity between a new sample and a known class is then obtained from the distance between the sample's confidence distribution and the mean confidence distribution of that class. A distance threshold needs to be preset: when the distance exceeds the threshold, the sample is rejected by the class, and a sample rejected by all known classes is regarded as a novelty.
We give a concrete approach to novelty detection based on the class probability vectors from the component classifiers, denoted Ensemble Probability Distribution Novelty Detection (EPDND). The EPDND algorithm is described in Algorithm 1. For the ith class, the probability values for each class are summed separately and the sums are averaged to obtain mprod_i, which reflects the mean confidence distribution of the ith class; mprod is the confidence distribution matrix of all classes. For a test sample, the distance between its confidence distribution vector and mprod_i, which measures the proximity between the new sample and a known class, is compared with a threshold t to determine whether the sample belongs to the ith class. The distance can be the Euclidean distance, cosine similarity or another measure; in this paper we choose the Euclidean distance. If the new instance is rejected by all known classes, it is predicted as a novelty.
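The procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the base classifiers are assumed to be already trained and are represented only by their class probability vectors, the function names are hypothetical, and returning the nearest accepting class for non-novel samples is an assumption, since the text only specifies the rejection rule.

```python
import math

def confidence_vector(prob_vectors):
    """Mean of the T class-probability vectors produced for one sample."""
    T = len(prob_vectors)
    k = len(prob_vectors[0])
    return [sum(p[j] for p in prob_vectors) / T for j in range(k)]

def fit_mprod(train_probs, train_labels, num_classes):
    """Row i is the mean confidence vector of the training samples of class i."""
    sums = [[0.0] * num_classes for _ in range(num_classes)]
    counts = [0] * num_classes
    for probs, label in zip(train_probs, train_labels):
        conf = confidence_vector(probs)
        counts[label] += 1
        for j in range(num_classes):
            sums[label][j] += conf[j]
    if min(counts) == 0:
        raise ValueError("every known class needs at least one training sample")
    return [[s / counts[i] for s in sums[i]] for i in range(num_classes)]

def predict(probs, mprod, t):
    """Return the nearest known class, or -1 if every class rejects the sample."""
    conf = confidence_vector(probs)
    dists = [math.dist(conf, row) for row in mprod]  # Euclidean distance
    best = min(range(len(mprod)), key=dists.__getitem__)
    return best if dists[best] <= t else -1

# Toy data (hypothetical): 2 known classes, T = 2 base classifiers.
# Each sample is a list of T probability vectors, one per classifier.
train_probs = [
    [[0.9, 0.1], [0.8, 0.2]],
    [[0.7, 0.3], [0.8, 0.2]],
    [[0.2, 0.8], [0.1, 0.9]],
    [[0.3, 0.7], [0.2, 0.8]],
]
train_labels = [0, 0, 1, 1]
mprod = fit_mprod(train_probs, train_labels, num_classes=2)

known = predict([[0.85, 0.15], [0.75, 0.25]], mprod, t=0.2)
novel = predict([[0.5, 0.5], [0.5, 0.5]], mprod, t=0.2)
```

In the toy run, the first test sample sits on the mean confidence vector of class 0 and is accepted, while the second has a flat confidence distribution that is far from both class means and is therefore flagged as a novelty. The threshold t = 0.2 is chosen for the example only.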

Preliminaries
In this section we present our experimental evaluation of the EPDND framework. In our experiments, we consider five algorithms: Neural Networks (NN), Random Forest (RF), Decision Tree (DT), Support Vector Machine (SVM) and Discriminant Analysis Classifier (DAC). We denote this instance of the framework by EPDND-5.
To validate the effectiveness of the proposed method, we compare EPDND-5 with two traditional methods, SVDD and GMM. For SVDD, we use the LIBSVM package (http://www.csie.ntu.edu.tw/~cjlin/libsvm) developed by Chih-Jen Lin and select RBF as the kernel function. For GMM, we set the number of Gaussian components to the number of known classes in the training dataset and, for initialization, cluster with k-means to determine the GMM components. SVDD, GMM and EPDND-5 are run in MATLAB R2015b. We conduct a series of experiments on four real-world datasets from UCI to demonstrate the effectiveness of the framework on a range of real-world datasets. The overall detection performance is measured by recall and accuracy, defined in Eq. (5) and Eq. (6):
$$\mathrm{recall} = \frac{N_{ia}}{N_{new}} \qquad (5)$$
$$\mathrm{accuracy} = \frac{N_{ia}}{N_{i}} \qquad (6)$$
where $N_{new}$ is the total number of novelties, $N_i$ is the number of samples recognized as novelties and $N_{ia}$ is the number of true novelties among the $N_i$ recognized samples. To reflect the overall performance more clearly, we introduce F1, defined in Eq. (7):
$$F1 = \frac{2 \cdot \mathrm{recall} \cdot \mathrm{accuracy}}{\mathrm{recall} + \mathrm{accuracy}} \qquad (7)$$
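The three measures can be computed from the counts defined above. The sketch below assumes that recall is Nia / Nnew, that accuracy is Nia / Ni, and that F1 is their harmonic mean, which is what the symbol definitions suggest; the function name and the use of -1 as the novelty label are hypothetical.

```python
def novelty_metrics(y_true, y_pred, novelty=-1):
    """Recall, accuracy and F1 for the novelty class.

    Assumed definitions: recall = Nia / Nnew, accuracy = Nia / Ni,
    F1 = harmonic mean of the two. Ratios with a zero denominator
    are reported as 0.0.
    """
    n_new = sum(1 for y in y_true if y == novelty)   # true novelties
    n_i = sum(1 for y in y_pred if y == novelty)     # flagged as novelties
    n_ia = sum(1 for yt, yp in zip(y_true, y_pred)
               if yt == novelty and yp == novelty)   # correctly flagged
    recall = n_ia / n_new if n_new else 0.0
    accuracy = n_ia / n_i if n_i else 0.0
    total = recall + accuracy
    f1 = 2 * recall * accuracy / total if total else 0.0
    return recall, accuracy, f1

# Toy labels (hypothetical): 4 true novelties, 4 flagged, 3 of them correct.
y_true = [-1, -1, -1, -1, 0, 0, 1, 1]
y_pred = [-1, -1, -1, 0, -1, 0, 1, 1]
recall, accuracy, f1 = novelty_metrics(y_true, y_pred)
```

Note that "accuracy" in this sense is the fraction of flagged samples that are true novelties, which other texts usually call precision.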

Datasets
UCI datasets: four datasets from the UCI Data Repository [9] are selected. Table 1 lists general information about these datasets.

The result analysis
Experimental results are shown in Tables 2 to 5, which reveal that no single approach performs best on all datasets. Even on a single dataset, no approach outperforms the others for every choice of novelty class. In summary, the best results are often achieved by EPDND, although in some cases EPDND is on equal terms with GMM; optimal results (in bold in the tables) are rarely achieved by SVDD.
The experimental results on the real-world datasets are given in Table 2 to Table 5. For the Wine dataset, EPDND performs best when Class1 is regarded as the novelty. For the Balance dataset, GMM shows a prominent advantage when Class1 is regarded as the novelty. However, EPDND has a visible advantage over GMM and SVDD in terms of F1 score when Class1 is used as the new class, though it is not always the best detector. For Zoo, EPDND stands out on all three performance indexes; in particular, when Class3 of Zoo is used as the novelty class, all three indexes equal 1. For Segment, EPDND achieves the best F1 scores when Class1 and Class2 are regarded as novelties.
According to the analysis above, our approach outperforms the others in most cases.

Conclusions
In this paper, we proposed an efficient framework (EPDND) based on ensemble learning for novelty detection. In particular, we presented a specialization of EPDND involving five different individual classifiers, called EPDND-5. The probability information from these classifiers is used to obtain the mean confidence distribution of every class, which is then used to judge whether a new sample is a novelty.
Experiments on four real-world datasets show that EPDND performs well on the novelty detection task. Moreover, in many cases EPDND outperforms two commonly used novelty detection approaches, Support Vector Domain Description and Gaussian Mixture Models, in terms of accuracy, recall and F1 score.
There are several avenues for future research. First, boosting, Random Forest and bagging can also be considered in our method. Second, other proximity measures will be investigated. Third, since no single approach beats its counterparts on every dataset, the selection of datasets is significant, and more datasets should be considered to find out which kind of method suits which kind of data.

Algorithm 1. EPDND (output: result vector result).
According to Algorithm 1 above, an ensemble framework is constructed by building n different individual learners on the training dataset. All training samples are then fed into the n classifiers to obtain the class probability vectors.

Table 1.
Description of the UCI real-world datasets and the MNIST dataset.
For the Wine dataset with 3 classes, as shown in Table 2, we select Class1 as the novelty. For the Balance dataset, we randomly select two of its 3 classes and regard each of them as the novelty in turn. For the Zoo and Segment datasets, we randomly select 3 of the 7 classes and regard each of them as the novelty in turn. The training set consists of 70% of the samples of the known classes; the remaining 30% of the known-class samples, together with part of the samples of the novelty class, form the testing set. The experimental results on the four UCI datasets are given in Table 2 to Table 5. The best results are denoted in bold.
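The train/test protocol described above can be sketched as follows. The 70/30 split of the known classes comes from the text; the fraction of novelty samples placed in the test set is not stated, so `novelty_frac` is a hypothetical parameter, as are the function name and the fixed random seed.

```python
import random

def split_with_novelty(samples, labels, novelty_class,
                       train_frac=0.7, novelty_frac=0.5, seed=0):
    """Hold out one class as the novelty and split the rest 70/30.

    novelty_frac (share of novelty samples added to the test set) is an
    assumption; the text only says "part of" the novelty class is used.
    """
    rng = random.Random(seed)
    known = [(s, y) for s, y in zip(samples, labels) if y != novelty_class]
    novel = [(s, y) for s, y in zip(samples, labels) if y == novelty_class]
    rng.shuffle(known)
    rng.shuffle(novel)
    cut = int(round(train_frac * len(known)))
    n_novel = int(round(novelty_frac * len(novel)))
    train = known[:cut]                       # known classes only
    test = known[cut:] + novel[:n_novel]      # held-out known + novelties
    return train, test

# Toy data (hypothetical): 20 samples in 3 classes, class 2 held out.
samples = list(range(20))
labels = [0] * 10 + [1] * 6 + [2] * 4
train, test = split_with_novelty(samples, labels, novelty_class=2)
```

The training set never contains the held-out class, so any classifier trained on it sees the novelty class for the first time at test time.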

Table 2.
Results of the three approaches on the Wine dataset.

Table 3.
Results of the three approaches on the Balance dataset.

Table 4.
Results of the three approaches on the Zoo dataset.

Table 5.
Results of the three approaches on the Segment dataset.