Data mining environments produce large volumes of data. The knowledge contained in this data can be used to improve an organization's decision-making processes. However, when all of the available data is used for decision tree construction, the resulting trees are so large that they are incomprehensible to human experts, and the learning process becomes very slow because it must run serially over the entire dataset. Our goal is to build smaller trees of comparable accuracy from randomly sampled data. We experimented with techniques that combine incremental random sampling with genetic algorithms, whose global search evolves decision trees into a compact representation of a large dataset. Experiments on several datasets showed that the proposed combination of random sampling and genetic algorithms produces smaller trees than other methods while remaining equally accurate. The method thus combines optimization with comprehensibility and scalability, and it avoids problems such as slow execution and the overloading of memory and processor that arise when learning from very large databases.
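The core idea described above — evaluating candidate trees on random samples of the data rather than the full dataset while a genetic algorithm searches globally — can be illustrated with a minimal sketch. This is not the authors' implementation: for brevity it evolves decision stumps (single-split trees) instead of full trees, uses simple truncation selection with mutation only, and omits the size penalty a real fitness function would include. All function names, parameters, and the toy dataset are hypothetical.

```python
import random

def sample_subset(data, frac, rng):
    """Draw a random sample of the data (the incremental-sampling idea:
    fitness is estimated on a sample, not the whole large dataset)."""
    k = max(1, int(len(data) * frac))
    return rng.sample(data, k)

# An individual is a decision stump (feature index, threshold) -- a
# stand-in for a full tree genome, kept minimal for illustration.
def accuracy(stump, data):
    f, t = stump
    correct = sum(1 for x, y in data if (x[f] > t) == y)
    return correct / len(data)

def mutate(stump, rng):
    """Randomly perturb either the split feature or the threshold."""
    f, t = stump
    if rng.random() < 0.5:
        f = rng.randrange(2)
    else:
        t += rng.gauss(0, 0.3)
    return (f, t)

def evolve(data, pop_size=20, gens=30, seed=0):
    rng = random.Random(seed)
    # Random initial population of stumps.
    pop = [(rng.randrange(2), rng.uniform(-1, 1)) for _ in range(pop_size)]
    for _ in range(gens):
        # Key point: fitness is computed on a 30% random sample each
        # generation, so no pass over the full data is needed inside the loop.
        sample = sample_subset(data, 0.3, rng)
        pop.sort(key=lambda s: -accuracy(s, sample))
        survivors = pop[:pop_size // 2]          # truncation selection
        pop = survivors + [mutate(rng.choice(survivors), rng)
                           for _ in range(pop_size - len(survivors))]
    # Final winner is picked by accuracy on the full dataset.
    return max(pop, key=lambda s: accuracy(s, data))

# Toy dataset: the label is True exactly when feature 0 exceeds 0.2,
# so a single small stump can represent it accurately.
rng = random.Random(1)
data = []
for _ in range(500):
    x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    data.append((x, x[0] > 0.2))

best = evolve(data)
print(best, accuracy(best, data))
```

The sketch shows why the approach scales: each generation touches only a sample, yet the global search still converges on a compact, accurate tree. A full implementation would evolve variable-depth trees and add a size term to the fitness so that smaller trees are explicitly preferred.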