Abstract:
Many data mining workloads are analyzed in large-scale distributed cloud computing environments, which provide nearly unlimited resources with diverse hardware configurations. To maintain cost-efficiency in such environments, it is essential to understand the characteristics, and estimate the overheads, of distributed matrix multiplication tasks, a core computational kernel in many machine learning algorithms. This article proposes MPEC (Matrix Multiplication Performance Estimator on Cloud), an algorithm that predicts the latency of distributed matrix multiplication tasks of various input sizes and shapes, across diverse instance types and numbers of worker nodes, in cloud computing environments. To achieve this goal, we first analyze the characteristics of distributed matrix multiplication tasks. Using features derived from this qualitative analysis, we apply an ensemble of non-linear regression models to predict the execution time of arbitrary matrix multiplication tasks. Thorough experimental results reveal that the proposed algorithm achieves higher accuracy than Ernest, a state-of-the-art machine learning task performance estimation engine, cutting the Mean Absolute Percentage Error (MAPE) in half.
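The abstract's approach can be sketched at a high level: encode each matrix multiplication task as a feature vector (matrix dimensions, worker count) and train a gradient-boosted regression ensemble to predict latency. The sketch below is illustrative only, not the authors' MPEC implementation; the feature set and the synthetic cost model standing in for measured latencies are assumptions.

```python
# Illustrative sketch (NOT the paper's MPEC code): predict distributed
# matrix multiplication latency from task/cluster features with a
# gradient-boosted regression ensemble, as the abstract describes.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Assumed features per task: dimensions of A (m x k), B (k x p),
# and the number of worker nodes in the cluster.
n = 500
m = rng.integers(100, 5000, n)       # rows of the left matrix A
k = rng.integers(100, 5000, n)       # cols of A / rows of B
p = rng.integers(100, 5000, n)       # cols of the right matrix B
workers = rng.integers(1, 17, n)     # worker nodes

# Synthetic latency (toy stand-in for real measurements): a base overhead,
# compute cost ~ m*k*p spread across workers, a communication term, noise.
latency = (1.0 + (m * k * p) / (1e7 * workers)
           + 0.01 * (m * p) / 1e4
           + rng.normal(0, 0.1, n))

X = np.column_stack([m, k, p, workers])
model = GradientBoostingRegressor(n_estimators=200, max_depth=3,
                                  random_state=0)
model.fit(X[:400], latency[:400])          # train on 400 tasks

pred = model.predict(X[400:])              # held-out 100 tasks
mape = np.mean(np.abs((pred - latency[400:]) / latency[400:])) * 100
print(f"MAPE on held-out tasks: {mape:.1f}%")
```

A non-linear ensemble is a natural fit here because latency scales multiplicatively with the three matrix dimensions and inversely with worker count, a relationship a single linear model (as in Ernest-style fitting) can only approximate with hand-crafted interaction terms.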
Published in: IEEE Transactions on Cloud Computing (Volume 10, Issue 1, Jan.-March 2022)
IEEE Keywords / Index Terms:
- Data Mining
- Matrix Multiplication
- Distributed Matrix Multiplication
- Machine Learning
- Learning Algorithms
- Task Performance
- Cloud Computing
- Mean Absolute Percentage Error
- Machine Learning Tasks
- Hardware Configuration
- Instance Types
- Task Execution Time
- Worker Nodes
- Training Dataset
- Column Vector
- Block Size
- Sparse Matrix
- Left Column
- Input Matrix
- Gradient Boosting
- Left Matrix
- Task Scenarios
- Latin Hypercube Sampling
- Non-negative Matrix Factorization
- Apache Spark
- Non-negative Least Squares
- Experimental Scenarios
- Number Of Training Datasets
- Output Matrix
- RMSE Values