Abstract:
Deep learning models have revolutionized the field of artificial intelligence by exhibiting exceptional capabilities in understanding complex patterns and making accurate predictions. However, the growing complexity of these models presents a challenge in achieving real-time inference, which is crucial for practical applications. As deep learning models become more popular, there is an increasing demand for faster inference times to enable real-time decision-making in critical scenarios, such as fire detection and classification. In recent years, advances in parallel computing hardware, such as graphics processing units (GPUs), specialized tensor processing units (TPUs), and application-specific integrated circuits (ASICs), have provided a significant boost to deep learning inference. One of the challenges in accelerating deep learning inference on diverse hardware platforms is the lack of optimized implementations of mathematical operations tailored to each platform, which can significantly hinder the performance of deep learning models. In this paper, we focus on Advanced Micro Devices (AMD) GPUs and explore the use of the Apache Tensor Virtual Machine (TVM) compiler framework to accelerate deep learning inference for the Generative Pre-trained Transformer 2 (GPT-2) large language model (LLM) and the ResNet-18 image classification model, represented in PyTorch and Open Neural Network Exchange (ONNX) formats, respectively.
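To illustrate the kind of workflow the abstract describes, the following is a minimal sketch of compiling an ONNX ResNet-18 model with Apache TVM's Relay frontend and running it on an AMD GPU through the ROCm backend. The model file name, input tensor name, and input shape are assumptions for illustration; this is not the authors' exact pipeline.

```python
# Minimal sketch (assumed details): import an ONNX ResNet-18 into TVM Relay,
# compile it for the ROCm (AMD GPU) target, and run a single inference.
import numpy as np
import onnx
import tvm
from tvm import relay
from tvm.contrib import graph_executor

onnx_model = onnx.load("resnet18.onnx")           # assumed model file
shape_dict = {"input": (1, 3, 224, 224)}          # assumed input name and shape

# Convert the ONNX graph into TVM's Relay intermediate representation.
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

# Compile the Relay module for AMD GPUs via the ROCm backend.
target = tvm.target.Target("rocm")
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

# Execute the compiled module on the first AMD GPU device.
dev = tvm.rocm(0)
module = graph_executor.GraphModule(lib["default"](dev))
module.set_input("input", np.random.rand(1, 3, 224, 224).astype("float32"))
module.run()
output = module.get_output(0).numpy()
print(output.shape)
```

A PyTorch model such as GPT-2 could follow an analogous path (e.g., tracing the model and importing it with TVM's PyTorch frontend), again treating the specific operator schedules and tuning steps as details left to the paper.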
Published in: 2024 11th International Conference on Electrical, Electronic and Computing Engineering (IcETRAN)
Date of Conference: 03-06 June 2024
Date Added to IEEE Xplore: 03 September 2024