MATEC Web Conf.
Volume 76, 2016
20th International Conference on Circuits, Systems, Communications and Computers (CSCC 2016)
Number of page(s): 7
Published online: 21 October 2016
Attribute Selection via a Novel Interval Based Evaluation Algorithm: Applied on Real life data sets
The British University in Egypt, Cairo, Egypt
Real-life problems handled by machine learning deal with attribute values of various forms, both continuous and discrete. Discretization is an important pre-processing step, because most attribute selection techniques assume that their input values are discrete. This step can change the internal structure of the attribute values with respect to the classification problem, so its quality directly affects the quality of the selected features. This work discusses the problems in current discretization techniques and proposes an attribute evaluation and selection technique that avoids them. Attributes are evaluated directly in their continuous form, without biasing their internal structure, and the computational cost is reduced by eliminating the discretization step. The basic insight of the proposed approach is the inverse relationship between the overlap of the class label distributions and the relative information content of a given attribute. To test the validity of this assumption, a series of data sets was examined using several standard approaches and our own implementation, and the approaches were ranked by overall classification accuracy. The results, at least for the test data sets used in this study, indicate that the proposed approach outperformed the other methods selected for evaluation. These results will be examined over a wider range of continuous-attribute data sets from nonmedical domains in order to investigate their robustness.
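The inverse relationship between class overlap and information content can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes a simple stand-in measure: each class is summarized by the [min, max] interval of its values for an attribute, and the attribute is scored by the fraction of samples that fall inside exactly one class interval. The names `overlap_score` and `rank_attributes` are hypothetical.

```python
def overlap_score(values, labels):
    """Score one continuous attribute in [0, 1]; higher means less class overlap.

    Assumption (illustrative only): each class is represented by the
    [min, max] interval of its observed values, and a sample is counted as
    ambiguous when its value lies inside the interval of more than one class.
    """
    if len(values) == 0 or len(values) != len(labels):
        raise ValueError("values and labels must be non-empty and equal in length")

    # Build the [min, max] interval of the attribute for every class label.
    ranges = {}
    for v, y in zip(values, labels):
        lo, hi = ranges.get(y, (v, v))
        ranges[y] = (min(lo, v), max(hi, v))

    # Count samples covered by more than one class interval.
    ambiguous = 0
    for v in values:
        covering = sum(1 for lo, hi in ranges.values() if lo <= v <= hi)
        if covering > 1:
            ambiguous += 1

    return 1.0 - ambiguous / len(values)


def rank_attributes(columns, labels):
    """Return attribute names sorted from least to most class overlap."""
    scores = {name: overlap_score(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]
    columns = {
        "a": [1.0, 2.0, 3.0, 10.0, 11.0, 12.0],  # class intervals are disjoint
        "b": [1.0, 5.0, 9.0, 2.0, 6.0, 10.0],    # class intervals overlap heavily
    }
    print(rank_attributes(columns, labels))
```

No discretization is performed here: the score is computed on the raw continuous values, which is the property the abstract emphasizes. A min/max interval is sensitive to outliers, so a practical variant would likely need a more robust interval definition.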
© The Authors, published by EDP Sciences, 2016
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.