A novel approach to outliers removal in a noisy numeric dataset for efficient Mining.
Date
2016-02-16
Publisher
Published by Faculty of Computing and Information Systems, University of Ilorin.
Abstract
Data pre-processing is a key task in the data mining process. It generally consumes
the largest portion of the total data engineering effort involved in uncovering useful
patterns from datasets. Data mining is essentially about fitting descriptive or predictive
models to data. However, the presence of outliers can reduce the reliability of the
models created. It is therefore essential to pre-process raw data properly before
mining it. In this paper, an algorithm that detects and removes outliers in a numeric
dataset is proposed. To establish the effectiveness of the proposed algorithm, the clean
data obtained with it is used to create a prediction model. Similarly, the clean data
obtained with an existing technique, a clustering method, is used to create a second
prediction model. Each model is simulated on a set of data not used in training, and
the error associated with each model is measured. The prediction model built on the
output of the proposed algorithm has an error of 0.38, while the model built on the
data cleaned by the clustering method has an error of 0.61. Comparison of these errors
shows that the proposed algorithm is suitable for cleaning numeric datasets. The
results of the experiment also show that the proposed approach is efficient and can
be used as an alternative to existing cleaning methods.
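The evaluation procedure described in the abstract (clean the training data, fit a prediction model, measure its error on data not used in training, compare across cleaning methods) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the proposed algorithm, the clustering baseline, the model type, or the error measure. Here a Tukey IQR filter and a no-op cleaner stand in for the two cleaning methods, a least-squares line stands in for the prediction model, RMSE stands in for the error, and the data are synthetic.

```python
import math
import statistics

def no_cleaning(rows):
    """Hypothetical baseline: return the rows unchanged."""
    return list(rows)

def iqr_filter(rows, k=1.5):
    """Stand-in cleaner (not the paper's algorithm): drop (x, y) rows whose
    y falls outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    ys = [y for _, y in rows]
    q1, _, q3 = statistics.quantiles(ys, n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [(x, y) for x, y in rows if lo <= y <= hi]

def fit_line(rows):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    slope = sxy / sxx
    return slope, my - slope * mx

def rmse(model, rows):
    """Root-mean-square error of the fitted line on the given rows."""
    slope, intercept = model
    squared = [(y - (slope * x + intercept)) ** 2 for x, y in rows]
    return math.sqrt(statistics.fmean(squared))

# Synthetic data (assumed): y = 2x + 1, with two injected outliers in training.
train = [(float(x), 2.0 * x + 1.0) for x in range(20)]
train[3] = (3.0, train[3][1] + 100.0)
train[16] = (16.0, train[16][1] + 100.0)
test = [(float(x), 2.0 * x + 1.0) for x in range(20, 25)]  # held-out rows

# Same model and same held-out data for every cleaner; only the cleaning differs.
errors = {}
for cleaner in (no_cleaning, iqr_filter):
    model = fit_line(cleaner(train))
    errors[cleaner.__name__] = rmse(model, test)

for name, err in errors.items():
    print(f"{name}: RMSE on held-out data = {err:.3f}")
```

Keeping the model, the held-out set, and the error measure fixed while swapping only the cleaner is what makes the resulting errors comparable, which is the same logic behind the 0.38 versus 0.61 comparison reported in the abstract.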
Citation
Ajiboye, A. R., Adewole, K. S., Babatunde, R. S. and Oladipo, I. D. (2016):