I. Introduction
In machine learning (ML) inference, performance is typically measured by two metrics: latency and throughput. Latency is the time required to complete a single inference, while throughput is the number of inferences that can be completed per unit time [1]. Low latency and high throughput are therefore desirable. In this work, we focused on latency, since accelerating ML inference has been at the forefront of research in recent years. Latency is especially important to study when it is critical to the system at hand. One such system is the calculation of average instantaneous power; one standard definition is given below. We deployed a simple power-calculation neural network model on several frameworks and measured the execution times of the inference processes. Only the inference step itself was timed; pre- and post-processing of the data were excluded from our measurements (a minimal sketch of this timing protocol is also given below). We expected TensorRT to outperform the other options, as it was developed specifically for high-performance inference and is considered the leading framework [2]. Demonstrating its superiority was our main motivation. We believe our work contributes to the state-of-the-art literature as a quantitative study comparing several frameworks in terms of their inference performance.
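For concreteness, the average of the instantaneous power over a window of N samples of voltage v[n] and current i[n] is commonly taken as

$$ \bar{P} = \frac{1}{N} \sum_{n=1}^{N} v[n]\, i[n] $$

This is the textbook definition rather than necessarily the exact formulation targeted by our model, which is described later.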
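To illustrate the measurement protocol, the following sketch times only the inference call and derives throughput from the mean latency. The names measure_latency and infer_fn, as well as the warm-up and run counts, are hypothetical placeholders rather than the actual harness used in this work.

import time
import statistics

def measure_latency(infer_fn, inputs, warmup=10, runs=100):
    """Time only the inference call (hypothetical helper).

    infer_fn : callable wrapping a framework's inference API (placeholder).
    inputs   : a pre-processed batch, prepared before timing starts, so that
               pre- and post-processing stay outside the timed region.
    """
    # Warm-up runs let the framework finish lazy initialization or JIT
    # compilation so that it does not contaminate the measured latencies.
    for _ in range(warmup):
        infer_fn(inputs)

    latencies = []
    for _ in range(runs):
        start = time.perf_counter()   # high-resolution monotonic clock
        infer_fn(inputs)              # the only work inside the timed region
        latencies.append(time.perf_counter() - start)

    mean_latency = statistics.mean(latencies)   # seconds per inference
    throughput = 1.0 / mean_latency             # inferences per second
    return mean_latency, throughput

Note that for asynchronous runtimes (e.g., CUDA-backed engines such as TensorRT), infer_fn must block or synchronize before returning, otherwise the recorded times would understate the true latency. With sequential single-batch runs as above, throughput is simply the reciprocal of the mean latency.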