AUTO PREPROCESSING ON REGRESSION DATASETS.
Abstract:
Data preprocessing is a crucial stage of the Machine Learning pipeline because the quality of the data, and of the information extracted from it at this stage, directly affects a model's ability to learn. However, each transformation task admits many alternative techniques, which can overwhelm an inexperienced user. This work develops a simple Python-based auto-preprocessing architecture for Automated Machine Learning that offers automated, interactive, and data-driven support to help users perform data preprocessing tasks efficiently. The proposed method provides valuable insights into a dataset and handles standard data preprocessing tasks. It first detects data problems and presents them to the end user through visualizations. It then evaluates state-of-the-art candidate techniques and recommends the most effective data cleaning and preparation method. For evaluation, the architecture is applied to ten diverse datasets to preprocess the data automatically before passing it to an ML algorithm. The results are compared with those of the same ML algorithm trained on manually preprocessed data. The results show that the approach not only simplifies the whole process but also significantly improves model performance.
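The detect-evaluate-recommend loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`detect_missing`, `impute`, `holdout_mse`, `recommend`), the two candidate imputation strategies, and the single-feature least-squares model used for scoring are all assumptions chosen to keep the example small and dependency-free.

```python
import statistics

# Hypothetical candidate strategies; the paper's actual candidate set is not given in the abstract.
CANDIDATES = {"mean": statistics.fmean, "median": statistics.median}


def detect_missing(columns):
    """Detection step: fraction of missing (None) entries per column."""
    return {
        name: sum(v is None for v in values) / len(values)
        for name, values in columns.items()
    }


def impute(values, strategy):
    """Fill missing entries with the statistic named by `strategy`."""
    observed = [v for v in values if v is not None]
    fill = CANDIDATES[strategy](observed)
    return [fill if v is None else v for v in values]


def holdout_mse(x, y, test_every=4):
    """Fit y = slope * x + intercept on the training rows and return the
    mean squared error on the held-out rows (every `test_every`-th row)."""
    rows = list(zip(x, y))
    train = [r for i, r in enumerate(rows) if i % test_every != 0]
    test = [r for i, r in enumerate(rows) if i % test_every == 0]
    mean_x = statistics.fmean([a for a, _ in train])
    mean_y = statistics.fmean([b for _, b in train])
    sxx = sum((a - mean_x) ** 2 for a, _ in train)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in train)
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return statistics.fmean([(b - (slope * a + intercept)) ** 2 for a, b in test])


def recommend(columns, feature, target):
    """Evaluation and recommendation step: score every candidate on the
    downstream regression task and return the one with the lowest error."""
    scores = {
        name: holdout_mse(impute(columns[feature], name), columns[target])
        for name in CANDIDATES
    }
    return min(scores, key=scores.get), scores


# Illustrative data only (not from the paper): one feature with gaps and an outlier.
data = {
    "x": [1.0, 2.0, None, 4.0, 5.0, None, 7.0, 40.0],
    "y": [2.1, 3.9, 6.2, 8.1, 9.8, 12.1, 14.0, 16.2],
}
print(detect_missing(data))
print(recommend(data, "x", "y"))
```

Scoring each candidate by downstream model error, rather than by a fixed rule, is what makes the recommendation data-driven: the same code may prefer different strategies on different datasets.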
Published in: IEEE Access (Volume: 10)