Applications of Rough Sets in the Field of Data Mining

3 Author(s)
Ayesha Butalia ; Coll. of Eng., Maharashtra Inst. of Technol., Pune ; Manikrao Dhore ; Geetika Tewani

Real-world data pose several difficulties: very large data sets, mixed types of data (continuous-valued and symbolic), uncertainty (noisy data), incompleteness (missing or incomplete data), data change, and the use of background knowledge. The main goal of rough set analysis is the induction of approximations of concepts [4]. Rough sets constitute a sound basis for KDD: they offer mathematical tools to discover patterns hidden in data [4] and are therefore used in the field of data mining. Unlike fuzzy sets, which require membership values, or statistics, which requires probabilities, rough sets do not require any preliminary information; this is their distinguishing feature. Two novel algorithms for finding optimal reducts of condition attributes, based on relative attribute dependency, are implemented in Java 1.5; the first gives a simple reduct, whereas the second gives the reduct with the minimum number of attributes. The presented implementation serves as a prototype system. Its first module extracts decision rules. The second module gives positive regions for dependencies. The third module computes reducts, that is, the minimum attributes needed to determine the decision, using two techniques: brute-force backward elimination, which simply takes the attributes in the given order and checks whether each should be eliminated, and an information entropy-based algorithm, which calculates the information entropy conveyed by each attribute and selects the one with the maximum information gain for elimination. The fourth module describes the equivalence classes for classification, including the lower and upper approximations, for implementing hard computing and soft computing respectively. The last module is the discernibility matrix and functions, which stores the differences between attribute values for each pair of data tuples; rather than searching the entire training set, the matrix is searched to detect redundant attributes. Together these constitute the modules of the system. The implemented system is first tested on a small application to verify the mathematical calculations involved, which is not practically feasible with a large database. It is also tested on a medium-sized application example to illustrate the usefulness of the system and the incorporated language.
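
The rough set notions named in the abstract (equivalence classes, lower and upper approximations, positive region, dependency, backward-elimination reducts, and the discernibility matrix) can be illustrated with a minimal sketch. This is not the authors' Java 1.5 implementation, which is not reproduced here; it is a Python illustration under stated assumptions. The decision table `TABLE`, its attribute names, and every function name are hypothetical, and the entropy-based variant is not shown.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy decision table (not from the paper):
# three condition attributes and one decision attribute, "flu".
COND = ["headache", "muscle_pain", "temp"]
TABLE = [
    {"headache": "no",  "muscle_pain": "yes", "temp": "high",      "flu": "yes"},
    {"headache": "yes", "muscle_pain": "no",  "temp": "high",      "flu": "yes"},
    {"headache": "yes", "muscle_pain": "yes", "temp": "very_high", "flu": "yes"},
    {"headache": "no",  "muscle_pain": "yes", "temp": "normal",    "flu": "no"},
    {"headache": "yes", "muscle_pain": "no",  "temp": "high",      "flu": "no"},
    {"headache": "no",  "muscle_pain": "yes", "temp": "very_high", "flu": "yes"},
]

def equivalence_classes(table, attrs):
    """Partition row indices into blocks that agree on every attribute in attrs."""
    blocks = defaultdict(set)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def approximations(table, attrs, target):
    """Return (lower, upper) approximations of the row set `target`."""
    lower, upper = set(), set()
    for block in equivalence_classes(table, attrs):
        if block <= target:   # block lies entirely inside the concept
            lower |= block
        if block & target:    # block overlaps the concept
            upper |= block
    return lower, upper

def positive_region(table, cond, dec):
    """Union of the lower approximations of every decision class."""
    pos = set()
    for concept in equivalence_classes(table, [dec]):
        pos |= approximations(table, cond, concept)[0]
    return pos

def dependency(table, cond, dec):
    """Fraction of rows that cond classifies unambiguously."""
    return len(positive_region(table, cond, dec)) / len(table)

def backward_elimination(table, cond, dec):
    """Drop attributes, in the given order, whenever dependency is unchanged."""
    full = dependency(table, cond, dec)
    kept = list(cond)
    for a in cond:
        trial = [b for b in kept if b != a]
        if trial and dependency(table, trial, dec) == full:
            kept = trial
    return kept

def discernibility_matrix(table, cond, dec):
    """For each pair of rows with different decisions, the attributes that differ."""
    matrix = {}
    for i, j in combinations(range(len(table)), 2):
        if table[i][dec] != table[j][dec]:
            matrix[(i, j)] = {a for a in cond if table[i][a] != table[j][a]}
    return matrix
```

On this toy table, rows 1 and 4 agree on every condition attribute but differ in the decision, so they fall outside the positive region and their discernibility entry is the empty set. Calling `backward_elimination(TABLE, COND, "flu")` drops `headache` and keeps `muscle_pain` and `temp`; a different attribute order could yield a different reduct, which is why the order-dependent method gives a simple reduct rather than a guaranteed minimum one.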

Published in:

2008 First International Conference on Emerging Trends in Engineering and Technology

Date of Conference:

16-18 July 2008