Abstract:
Recently, deep learning (DL) compilers have been widely developed to optimize the deployment of DL models. These DL compilers transform DL models into a high-level intermediate representation (IR) and then into a low-level IR, ultimately generating optimized code for different hardware targets. However, DL compilers are not immune to generating incorrect code, which can lead to severe consequences. Testing techniques for low-level IR are limited, and efficient approaches for detecting certain categories of non-crashing bugs are lacking. In this paper, we address the limitations of existing low-level IR testing techniques for DL compilers and introduce DeepDiffer, a priority-guided differential testing framework designed to detect bugs caused by low-level optimizations in a DL compiler, specifically TVM. We propose a novel DL compiler coverage metric and establish an optimization goal that maximizes the detection of valuable differences between DL compilers. Our experiments demonstrate that DeepDiffer outperforms existing low-level IR fuzzers, detecting a wider range of bug types. In fact, DeepDiffer has successfully identified 13 bugs in TVM, which can be attributed to 9 distinct root causes, and 9 of which were previously unknown. We have reported these bugs to the TVM community, and they have been confirmed.
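The core idea of differential testing described above, running the same input through two compilation paths and comparing their outputs, can be sketched in a few lines. The following is a minimal, self-contained illustration, not DeepDiffer's actual implementation: the two pipeline functions are hypothetical stand-ins for TVM builds at different optimization levels, and the deliberate bug in the "optimized" path shows how the comparison surfaces non-crashing miscompilations that would never trigger a crash-based oracle.

```python
import math

# Hypothetical stand-ins for two compilation/optimization pipelines.
# In DeepDiffer's setting these would be the same model compiled by
# TVM at different optimization levels; plain Python functions are
# used here so the sketch is self-contained.
def reference_pipeline(xs):
    # Unoptimized path: straightforward sum of squares.
    return sum(x * x for x in xs)

def optimized_pipeline(xs):
    # "Optimized" path with a deliberate off-by-one bug, standing in
    # for a miscompilation introduced by a low-level optimization.
    return sum(x * x for x in xs[:-1])

def differential_test(inputs, rtol=1e-5):
    """Run both pipelines on each input and collect mismatches."""
    mismatches = []
    for xs in inputs:
        ref = reference_pipeline(xs)
        opt = optimized_pipeline(xs)
        # A numeric tolerance is needed because legitimate
        # optimizations may reorder floating-point arithmetic.
        if not math.isclose(ref, opt, rel_tol=rtol):
            mismatches.append((xs, ref, opt))
    return mismatches

inputs = [[1.0, 2.0, 3.0], [0.5, 0.5]]
for xs, ref, opt in differential_test(inputs):
    print(f"mismatch on {xs}: reference={ref}, optimized={opt}")
```

A priority-guided fuzzer like DeepDiffer would additionally rank candidate inputs by a coverage metric and mutate the ones most likely to expose new differences, rather than iterating over a fixed list as above.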
Published in: 2023 IEEE 23rd International Conference on Software Quality, Reliability, and Security (QRS)
Date of Conference: 22-26 October 2023
Date Added to IEEE Xplore: 25 December 2023