
A Performance Study Depending on Execution Times of Various Frameworks in Machine Learning Inference


Abstract:

This work compares the latency of various frameworks in machine learning inference using an average power calculation model. The model is implemented as a 2-layer neural network in PyTorch, in Python. It is then converted both to a traced TorchScript module and to the ONNX file format. Afterwards, the C++ front-end is used for the inference process. The traced model is run with LibTorch on CPU and GPU, while the ONNX file is run with ONNX Runtime on both CPU and GPU and with TensorRT on GPU. Inference execution times over 100 trials are averaged for each case, and TensorRT with the ONNX file format significantly outperforms its counterparts, as expected. Hence, this work highlights the performance of TensorRT in machine learning inference and points toward future work by proposing several extensions.
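
As a minimal sketch of the conversion pipeline described above, the following Python snippet traces a 2-layer PyTorch model and exports it to ONNX. The layer sizes, input shape, and file names are illustrative assumptions and are not taken from the paper.

    import torch
    import torch.nn as nn

    # Hypothetical 2-layer network standing in for the average power
    # calculation model; the actual layer sizes are not specified here.
    class PowerNet(nn.Module):
        def __init__(self, in_features=2, hidden=16, out_features=1):
            super().__init__()
            self.fc1 = nn.Linear(in_features, hidden)
            self.fc2 = nn.Linear(hidden, out_features)

        def forward(self, x):
            return self.fc2(torch.relu(self.fc1(x)))

    model = PowerNet().eval()
    example = torch.randn(1, 2)  # dummy input with an assumed shape

    # Traced TorchScript module, loadable from the C++ front-end with LibTorch
    traced = torch.jit.trace(model, example)
    traced.save("power_model.pt")

    # ONNX export, consumable by ONNX Runtime and TensorRT
    torch.onnx.export(model, example, "power_model.onnx",
                      input_names=["input"], output_names=["output"])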
Date of Conference: 17-19 November 2021
Date Added to IEEE Xplore: 05 January 2022
Conference Location: Izmir, Turkey

I. Introduction

In machine learning (ML) inference, performance is usually measured by two metrics: latency and throughput. Latency is the amount of time spent in the inference process, and throughput is the number of inferences that can be completed per unit time [1]. Hence, it is desirable to have low latency and high throughput. In this work, we focused on latency, as speeding up ML inference has been at the forefront of research in recent years. Latency is especially important to examine when it is critical for the system at hand. One such system is average instantaneous power calculation. We deployed a simple power calculation neural network model on various frameworks and computed the execution times of the inference processes. We considered only the inference itself and excluded data pre- and post-processing from our measurements. We expected TensorRT to outperform the other options because it was developed for high-performance inference and is considered the leading framework [2]. Demonstrating its superiority was our main motivation. We believe our work contributes to the state-of-the-art literature as a quantitative study comparing various frameworks in terms of their inference performance.
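
To make the measurement procedure concrete, the sketch below times only the inference call and averages over 100 trials, in the spirit of the setup described above. It uses the ONNX Runtime Python API for brevity, whereas the paper performs inference through the C++ front-end; the file name, input tensor name, and shape follow the export sketch earlier and are assumptions.

    import time
    import numpy as np
    import onnxruntime as ort

    # Load the exported model on CPU (a GPU run would use CUDAExecutionProvider)
    session = ort.InferenceSession("power_model.onnx",
                                   providers=["CPUExecutionProvider"])

    # Input prepared outside the timed region: pre-processing is excluded
    x = np.random.randn(1, 2).astype(np.float32)

    # Warm-up run so one-time initialization does not skew the average
    session.run(None, {"input": x})

    trials = 100
    start = time.perf_counter()
    for _ in range(trials):
        session.run(None, {"input": x})  # only the inference call is timed
    elapsed = time.perf_counter() - start
    print(f"average latency: {elapsed / trials * 1e3:.3f} ms")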

