Production Level Data Pipeline Environment for Machine Learning Models | IEEE Conference Publication | IEEE Xplore

Production Level Data Pipeline Environment for Machine Learning Models


Abstract:

The machine learning field offers a plethora of options to help diagnose various medical ailments. These models and algorithms seldom become production-level tools because their designs are compromised at the implementation level. The compromise takes the form of hardcoded file paths and variables, and of development confined to a local environment. To offer scalable, deployable, and platform-independent code, machine learning models should be implemented using best software practices from the initial design phase. Whenever a large amount of data must be analyzed, loading the complete dataset adds latency. This latency in analysis is commonly seen with log files and health records [1]. This paper discusses best practices for writing production-level code, using epileptic seizure prediction as an example. The design, analysis, and visualization are done in Python. The packages 'kedro' and 'kedro-viz' are discussed in detail for the electroencephalogram (EEG) readings dataset available on the UCI (University of California, Irvine) Machine Learning Repository [2]. The packages are used to create data pipelines for developing production-level code. This paper is a preliminary effort to demonstrate the basics of designing production-level models, including pipelines, taking epileptic seizure prediction as an example.
Date of Conference: 19-20 March 2021
Date Added to IEEE Xplore: 03 June 2021
Conference Location: Coimbatore, India
