AI-Powered Data Governance: A Cutting-Edge Method for Ensuring Data Quality for Machine Learning Applications | IEEE Conference Publication | IEEE Xplore

Abstract:

In the past few decades, the banking sector has increasingly recognized the importance of automated systems for managing data quality, leading to a growing focus on data quality evaluation. Data governance for such automated systems is necessary to ensure the performance of machine learning (ML) models. Data cleansing is a fundamental, quality-focused component of data governance and an essential step before building data analytics services. This paper introduces an automated framework for ensuring data quality in banking using statistical and ML methods, highlighting its objectives, functionality, and methodological advancements. The proposed approach focuses on demonstrating the necessity of data quality assessment before training ML models. In the evaluation of data quality, outliers are detected using three approaches: Tukey's IQR, a statistical method; Isolation Forest (IF), an unsupervised tree-based method; and DBSCAN, an unsupervised density-based clustering method. Among the three, the IQR method detected the largest number of outliers. The outliers are removed, and the three detection methods are then compared by training three ML models on the cleaned data: logistic regression, K-nearest neighbors, and Naive Bayes.
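
As an illustration of the statistical detector named in the abstract, the following is a minimal sketch of Tukey's IQR rule using only the Python standard library. It is not the paper's implementation: the function name `tukey_outliers`, the conventional fence multiplier `k=1.5`, the "inclusive" quartile convention, and the sample amounts are all assumptions chosen for the example. Isolation Forest and DBSCAN are not shown here, since they would typically come from a third-party library.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional multiplier; the abstract does not state
    which value the paper uses, so treat it as an assumption.
    """
    # Quartiles with the "inclusive" convention (an assumed choice).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical transaction amounts, not data from the paper.
amounts = [120, 135, 128, 140, 132, 125, 138, 130, 5000]

outliers = tukey_outliers(amounts)
# Drop the flagged values before any model training step.
cleaned = [v for v in amounts if v not in outliers]
```

With these sample values the quartiles are 128 and 138, so the fences sit at 113 and 153 and only the 5000 entry is flagged. The cleaned list is what a downstream classifier would then be trained on.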
Date of Conference: 22-23 February 2024
Date Added to IEEE Xplore: 18 April 2024
Conference Location: Vellore, India