Using Machine Learning Methods Jointly to Find Better Set of Rules in Data Mining

Rough set-based data mining algorithms are among the widely accepted machine learning technologies because of their strong mathematical background and their capability of finding optimal rules based on the given data sets only, without room for prejudiced views to be inserted into the data. However, because the algorithms find rules very precisely, we may be confronted with the overfitting problem. On the other hand, association rule algorithms find rules of association, where the association resides between sets of items in a database. The algorithms find itemsets that occur at least as often as a given minimum support, so they can find the itemsets in reasonable time even for very large databases when the minimum support is supplied appropriately. In order to overcome the overfitting problem in rough set-based algorithms, we first find large itemsets and then select the attributes that cover the large itemsets. By using the selected attributes only, we may find a better set of rules based on rough set theory. Results from experiments support our suggested method.


Introduction
Rough sets are among the widely accepted machine learning technologies. A good property of rough set theory is that it can describe uncertain facts solely based on data. As a result, there is no room for prejudiced views to be inserted into the discovered knowledge [1,2,3]. For example, assume that we have a decision table T like Table 1. A decision table T is a table having conditional attributes and decision attributes in a fixed setting. Each attribute has corresponding values. In our example, T contains the universe U of objects x1, x2, x3, x4, and x5. The set of attributes A has two attributes, a1 and a2. The decision class d has two values, 1 and 2. Vi is the set of values that attribute ai can take, where i = 1, 2, with V1 = {1, 2, 3} and V2 = {1, 2}. As we can see from the example, rules based on rough set theory reflect the available data quite honestly.
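The idea of deriving rules from a decision table can be sketched in a few lines of Python. The cell values below are hypothetical, since the contents of Table 1 are not reproduced in this text; they only respect the stated shape of T (objects x1 to x5, attributes a1 and a2 with the value sets V1 and V2, and decision values 1 and 2). The function names are likewise illustrative and do not come from any rough set library.

```python
from collections import defaultdict

# Hypothetical contents for decision table T (assumption, not the paper's Table 1).
T = {
    "x1": {"a1": 1, "a2": 1, "d": 1},
    "x2": {"a1": 1, "a2": 2, "d": 1},
    "x3": {"a1": 2, "a2": 1, "d": 2},
    "x4": {"a1": 2, "a2": 2, "d": 2},
    "x5": {"a1": 3, "a2": 1, "d": 1},
}


def indiscernibility_classes(table, attrs):
    """Group objects that have identical values on the chosen attributes."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return dict(blocks)


def lower_approximation(table, attrs, target):
    """Union of the indiscernibility classes that lie entirely inside target."""
    result = set()
    for block in indiscernibility_classes(table, attrs).values():
        if block <= target:
            result |= block
    return result


def certain_rules(table, attrs, decision="d"):
    """Map each attribute-value combination to its decision when it is unique."""
    rules = {}
    for key, block in indiscernibility_classes(table, attrs).items():
        decisions = {table[obj][decision] for obj in block}
        if len(decisions) == 1:
            rules[key] = decisions.pop()
    return rules


class_1 = {obj for obj, row in T.items() if row["d"] == 1}
print(lower_approximation(T, ["a1"], class_1))
print(certain_rules(T, ["a1"]))
```

With these assumed values, a1 alone separates the decision classes, while a2 alone yields no certain rule, which is the sense in which the rules follow from the data only.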
There are also other data mining or machine learning techniques that are solely based on data. Association rules [4,5,6] are a representative technique of this kind. Association rule algorithms find rules of association, where the association resides between sets of items in a database. For example, assume that we have a simple database of transaction records like Table 2, which shows the records of items purchased together in a supermarket. If the minimum support, that is, the minimum number of transaction records in which an itemset must occur, is two, the large itemsets of length one can be found as in Table 3. An itemset is a set of items that occur in the same record, and a large itemset is an itemset that occurs at least as often as the given minimum support.
There is no large itemset of length three, so we can stop the iteration. From the large itemsets found, we may generate association rules such as the following: if item 2 has been purchased, then item 3 will also be purchased, with a confidence of 67%; and if item 3 has been purchased, then item 2 will also be purchased, with a confidence of 67%. As we see from the example, association rule algorithms find itemsets that occur at least as often as the given minimum support. This allows the algorithms to find the itemsets in practical time even for very large databases when the minimum support is supplied appropriately.
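The level-wise search for large itemsets and the confidence computation can be sketched as follows. The transactions are hypothetical, because the contents of Table 2 are not reproduced in this text; they were chosen only so that, with a minimum support of two, no large itemset of length three exists and both rules between items 2 and 3 have a confidence of about 67%. The function names are illustrative.

```python
from itertools import combinations

# Hypothetical transaction records (assumption, not the paper's Table 2).
transactions = [{1, 2, 3}, {2, 3}, {2, 4}, {3, 4}]
MIN_SUPPORT = 2


def support(itemset, records):
    """Number of records that contain every item of the itemset."""
    return sum(1 for record in records if itemset <= record)


def large_itemsets(records, min_support):
    """Level-wise (Apriori-style) search for itemsets meeting min_support."""
    items = sorted(set().union(*records))
    candidates = [frozenset([item]) for item in items]
    found = {}
    k = 1
    while candidates:
        level = {c: support(c, records) for c in candidates}
        level = {c: s for c, s in level.items() if s >= min_support}
        if not level:
            break
        found.update(level)
        k += 1
        # Build length-k candidates whose (k-1)-subsets are all large.
        level_items = sorted(set().union(*level))
        candidates = [
            frozenset(combo)
            for combo in combinations(level_items, k)
            if all(frozenset(sub) in level for sub in combinations(combo, k - 1))
        ]
    return found


def confidence(antecedent, consequent, records):
    """support(antecedent and consequent together) / support(antecedent)."""
    return support(antecedent | consequent, records) / support(antecedent, records)


large = large_itemsets(transactions, MIN_SUPPORT)
conf_2_to_3 = confidence(frozenset({2}), frozenset({3}), transactions)
conf_3_to_2 = confidence(frozenset({3}), frozenset({2}), transactions)
print(sorted(sorted(s) for s in large))
print(round(conf_2_to_3 * 100), round(conf_3_to_2 * 100))
```

Raising the minimum support prunes candidates earlier, which is why the search stays practical on large databases.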

Method and experiment
Because rough set theory-based algorithms find rules very thoroughly, they may be confronted with the overfitting problem. Overfitting the training data set is a very well-known problem [7]. Overfitting in machine learning algorithms occurs because a training data set usually does not cover the data space fully. For example, assume that we have a data set that has 10 attributes and that each attribute can take 10 discrete values. Then the possible data space has $10^{10}$ = 10,000,000,000 points. If we have one instance for each data point in the 10-dimensional data space, we have 10 billion objects. Assuming each object occupies four bytes, the size becomes 40 GB. As we see from even this simple example, training data from the real world usually occupy a very small portion of their data space.
Therefore, if we apply the rough set-based rule discovery method directly to real-world data sets, we may not get results as good as we expected, especially when the size of the data set is small compared to the domain of the data set and the data set values are very specific for some attribute values. In other words, the data values are subdivided very finely.
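The arithmetic of this example can be checked with a short sketch. The attribute count, value count, and four bytes per object come from the text; the training set of one million instances is a hypothetical figure added only to illustrate coverage, and the function names are illustrative.

```python
def data_space_size(num_attributes, values_per_attribute):
    """Number of distinct points when every attribute is discrete."""
    return values_per_attribute ** num_attributes


def coverage(num_instances, space_size):
    """Upper bound on the fraction of the data space a training set can cover."""
    return num_instances / space_size


space = data_space_size(num_attributes=10, values_per_attribute=10)
storage_gb = space * 4 / 10**9  # four bytes per object, decimal gigabytes

# Hypothetical training set size, used only to illustrate the coverage ratio.
fraction = coverage(1_000_000, space)

print(space, storage_gb, fraction)
```

Even a million distinct instances would reach at most one ten-thousandth of this data space.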
In order to prove our assertion, we perform an experiment with a real-world data set. For our experiment, a data set called 'zoo' from the UCI machine learning repository [8] is used. The zoo data has 17 conditional attributes and one decision attribute. The decision attribute has 7 different class values, which classify animals in a zoo. The total number of instances is 101. Table 5 shows the meaning of each attribute. The percentage in parentheses shows the confidence of the rule, and the fraction represents the number of classified instances over the number of instances having the same condition part. As we see in the rules, the rough set-based method found rules precisely and the rules are very accurate. However, the test result is not good.
Even though MODLEM can handle continuous values, association rule algorithms cannot, so the discretization method by Fayyad et al. [11] was applied for the later application of the association rule algorithms. MODLEM generated the same 7 rules, with the same accuracy, for the discretized data as for the original data set.
In order to find association rules, a class association rule algorithm [11] was used, because the data set has several conditional attributes and a decision attribute. Table 6 shows the large itemsets when the minimum support is 10. According to the large itemsets in Table 6, the collection of items in the large itemsets found is {hair, feather, eggs, milk, airborne, aquatic, toothed, backbone, breathes, venomous, fins, legs, tail, catsize}. The MODLEM algorithm was applied using these attributes only, to see the effect of attribute selection. Eight rules with an accuracy of 96.0% were generated as follows:

When we removed only the attribute 'predator', which is the other attribute that had not been selected, MODLEM generated the same result as with the original data, both discretized and undiscretized. The above experiments prove that our assertion is true and show the property of overfitting in the rough set theory-based rule generation method.
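The attribute selection step can be sketched as follows. This is a minimal illustration under stated assumptions: the records are hypothetical and only borrow a few attribute names from the zoo data, the itemsets are enumerated by brute force rather than by the class association rule algorithm used in the experiment, and the function names are illustrative.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records (assumption); 'type' plays the role of the decision attribute.
rows = [
    {"hair": 1, "milk": 1, "predator": 1, "type": "mammal"},
    {"hair": 1, "milk": 1, "predator": 0, "type": "mammal"},
    {"hair": 0, "milk": 0, "predator": 1, "type": "bird"},
    {"hair": 0, "milk": 0, "predator": 0, "type": "bird"},
]


def large_class_itemsets(records, class_attr, min_support, max_len=2):
    """Brute-force count of (condition itemset, class) pairs meeting min_support."""
    counts = Counter()
    for row in records:
        label = row[class_attr]
        items = sorted((a, v) for a, v in row.items() if a != class_attr)
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                counts[(combo, label)] += 1
    return {key: n for key, n in counts.items() if n >= min_support}


def covering_attributes(itemsets):
    """Attributes that appear in at least one large itemset."""
    return {attr for (combo, _label) in itemsets for attr, _value in combo}


def project(records, attrs, class_attr):
    """Keep only the selected attributes and the decision attribute."""
    keep = set(attrs) | {class_attr}
    return [{a: v for a, v in row.items() if a in keep} for row in records]


itemsets = large_class_itemsets(rows, "type", min_support=2)
selected = covering_attributes(itemsets)
reduced = project(rows, selected, "type")
print(sorted(selected))
```

The reduced records would then be passed to the rule learner in place of the full data set; in this toy case 'predator' is dropped because no itemset containing it reaches the minimum support.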
One more experiment was performed after eliminating the two attributes 'name' and 'predator' from the original data set without discretization. We found a similar result, with an accuracy of 96.0% by MODLEM, and the 8 rules found have a slightly different shape from the ones after discretization, as follows:

Our method is also effective for a representative rule learner like RIPPER [12], as we can see in Table 7. The experiment was also based on 10-fold cross validation.

Conclusions
Rough sets are among the widely accepted machine learning technologies because of their strong mathematical background and their capability of finding optimal rules based on the given data sets. A good property of rough set theory is that it can describe uncertain facts solely based on data. As a result, there is no room for prejudiced views to be inserted into the knowledge found from the data. However, because rough set theory-based algorithms find rules very thoroughly, they may be confronted with the overfitting problem. Overfitting may occur because the theory finds rules very precisely, while it is not easy to find a training data set that covers its data space fully, even for big data. Association rule algorithms, on the other hand, find rules of association, where the association resides between sets of items in a database. They find itemsets that occur at least as often as a given minimum support, and they can find the itemsets in reasonable time even for very large databases when the minimum support is supplied appropriately.
In order to overcome the overfitting problem in rough set theory-based algorithms, we find large itemsets using a class association rule algorithm and then select the attributes that cover the large itemsets. By using only the selected attributes, we may find a more accurate set of rules. Various experiments have been performed using a public data set called 'zoo' in the UCI machine learning repository to support our suggested method.

This work was supported by Dongseo University, "Dongseo Frontier Project" Research Fund of 2016.

A possible set of minimal rules for T is as follows:
Rule 1. If (a1 = 1) Then (d = 1)
Rule 2. If (a1 = 2) Then (d = 2)
Rule 3. If (a1 = 3) Then (d = 1)

Table 1. A decision table T.

Table 2. A simple transaction database.

Table 3. Large itemset of length one.

Table 4 shows the large itemsets of length two.

Table 4. Large itemset of length two.

Table 5. The attributes of data set 'zoo'.

Table 7. The result of RIPPER.

Table 8. The summary of the experiments with rough set algorithm.