ACC_TEST: Hybrid Testing Techniques for MPI-Based Programs



I. INTRODUCTION
Message-Passing Interface (MPI) is one of the most widely used programming models for parallelizing scientific applications. It supports parallelism in sequential programming languages by adding MPI calls that control data movement between processes. MPI also integrates with other programming models and has several implementations from different vendors, both open-source and commercial. However, testing parallel programs is a difficult task, especially because the types and behaviours of errors differ from one programming model to another. In addition, the increased use of these programming models by non-computer science specialists can introduce errors due to a lack of programming experience, which any testing tool needs to take into account.
The associate editor coordinating the review of this manuscript and approving it for publication was Fan Zhang.
As part of our previous work [1]-[3], we proposed and created a parallel hybrid testing tool named ACC_TEST that targets programs built for a heterogeneous architecture and covers different errors. By the end of our project, we aim to develop hybrid testing techniques for detecting errors in the dual-programming model MPI + OpenACC. In this paper, we enhance ACC_TEST so that it can test MPI-based programs and detect runtime errors occurring with different types of MPI communications. We also focus on the interaction between MPI and other programming models, especially high-level programming models such as OpenACC.
The rest of this paper is structured as follows. Section 2 briefly gives an overview of MPI, and Section 3 discusses the related work. In Section 4, we explain our techniques for testing MPI-based programs covering different types of MPI communications, including point-to-point and collective communications. In Sections 5 and 6, we explain and discuss implementing, testing, and evaluating our techniques for testing MPI. Finally, conclusions and future work are discussed in Section 7.
FIGURE 1. General MPI program structure [4].

II. MPI PROGRAMMING MODEL
Message-Passing Interface (MPI) [4] is a message-passing programming model that supports parallelism in traditional programming languages, including Fortran, C, and C++. The first official version of MPI was released in May 1994. Data is moved from the address space of one process to that of another through cooperative operations in each process. The aim of MPI is to establish a standard for writing message-passing programs that are portable, efficient, and flexible. MPI has several implementations, including open-source implementations such as Open MPI [5] and MPICH [6], and commercial implementations such as IBM Spectrum MPI [7] and Intel MPI [8]. MPI is considered a standard, portable programming model that can be implemented on several platforms, hardware, systems, and programming languages. Additionally, MPI can work with several other programming models and with heterogeneous networks.
An MPI program has a specific structure. It starts by including the MPI header file and then initializes the MPI environment, which marks the beginning of the parallel code. After that, the main message-passing calls take place, and at the end the MPI environment is terminated. Figure 1 demonstrates the general MPI program structure with respective examples of MPI codes.
There are two types of MPI communications: point-to-point and collective communications. MPI point-to-point communication typically involves message passing between two and only two different MPI tasks. The first task performs a send operation, and the other task performs a matching receive operation. There are different types of point-to-point communication, including blocking and non-blocking. In terms of collective communication, there are three types: synchronization, data movement, and collective computation. Also, MPI collective communications can be blocking or non-blocking, just like MPI point-to-point communications. Figure 2 demonstrates the collective computation and data movements for MPI collective communications.
The newest version of the MPI standard, version 4.0 [4], is currently available. It aims to add new techniques, approaches, and concepts to the MPI standard to help MPI address the needs of current and next-generation applications and architectures. The new version extends to better support hybrid-programming models, including hybrid MPI+X concerns and support for fault tolerance in MPI applications.

III. RELATED WORK
There are many testing tools that target MPI using different testing techniques and covering different types of error. In our survey, we cover more than 20 testing tools that target MPI programming models. Using static testing techniques, the testing tool MPI-Checker [9] has been used for detecting mismatching errors in MPI-related programs.
For the dynamic testing techniques, we studied more than 15 testing tools that use dynamic testing for MPI-related programs to detect various errors. For detecting runtime errors including mismatching, data races, and deadlocks, many testing tools have been used, such as MUST [10], [11], MEMCHECKER [5], Intel Message Checker [12], STAT [13], Nasty-MPI [14], and MARMOT [15]. In addition, the testing tools that have been used for detecting mismatching and deadlocks include Umpire [16], GEM [17], and MPI-CHECK [18]. For detecting race conditions and deadlocks in MPI, MAD [19], [20] and PDT [21] have been used. MPIRace-Check [22] has been used to detect race conditions in MPI. Finally, MOPPER [23] and ISP [24] are used for deadlock detection.
In the case of hybrid testing techniques, we did not find enough research tools that used hybrid testing techniques to build a testing tool for MPI-related programs covering a wide range of errors. However, MPI collective communication has been validated by using a two-phase analysis for detecting collective patterns in an MPI program that can cause deadlock [25]. Also, for detecting errors in the MPI/OpenMP dual programming model, two hybrid testing tools have been used, including PARCOACH [25] and [26], for detecting deadlocks and other runtime errors.
In our study, we noticed that some tools are considered debugging tools rather than testing tools, including AutomaDeD [27], ALLINEA DDT [28], TotalView [29], PDT [21], and MPVisualizer [30]. These five debuggers do not help to test or detect errors; instead, they are used for finding the causes of errors.
In our literature review [31], we note that dynamic testing techniques have been used for testing the majority of MPI programs. Dynamic testing techniques detect errors by analyzing the program during runtime, which causes overheads that affect the program's performance, especially when targeting massively parallel applications generating thousands or millions of threads. Also, dynamic testing needs some insertion mechanisms to perform the testing and get better results, which comes with its own cost. On the other hand, the only testing tool that used static testing has lower execution and size overheads but does not detect all errors. Finally, in this version of our testing tool ACC_TEST, we decided to use hybrid testing techniques that combine static and dynamic testing to gain the benefit of each and reduce the cost. Also, we decided to cover errors from each type of MPI communication because the previously mentioned testing tools did not cover some errors or focused only on race conditions and deadlocks.

IV. OUR TECHNIQUES FOR TESTING MPI-BASED PROGRAMS
In this section, we will explain our techniques for detecting some errors related to MPI applications. As we discussed earlier, there are many testing tools related to MPI; therefore, we only focus on detecting some errors that occur in MPI programs, which include GPU-related programming models. On the MPI side, we tried to minimize the overhead and the slowdown that can occur during dynamic testing, as we will explain later. We will cover some errors from each type of MPI communication, including point-to-point and collective communications.
In terms of MPI point-to-point, we cover two different cases: blocking and non-blocking point-to-point communications. In the blocking type, our hybrid testing technique examines the targeted source code by analyzing it and collecting the information related to MPI_Send, MPI_Recv, and MPI_Sendrecv to detect any actual or potential errors. In the non-blocking type, ACC_TEST deals with MPI_Isend and MPI_Irecv by collecting their related information and analyzing their behavior to detect any errors. In the following, we explain how ACC_TEST detects errors in MPI-related programs, classified into three sections:

A. POINT-TO-POINT BLOCKING COMMUNICATION DETECTION
ACC_TEST is responsible for detecting any point-to-point blocking communication, including MPI_Send, MPI_Recv, and MPI_Sendrecv. We chose these three MPI calls because they are the most-used MPI blocking calls in several related programs. In ACC_TEST, we focus on the effects that OpenACC and MPI directives have on each other and on the types of error that this interaction can cause.
Our static analysis will analyze the source code to find any MPI-related calls and determine their place in the source code, what type of MPI calls they are, and what arguments they have, including source, destination, data type, communicator, and rank, as well as their relationship to the OpenACC regions. Our static analysis will also lexically analyze and parse the MPI calls to ensure they follow the MPI call rules. It starts by determining each MPI send and receive and their locations in the source code, storing their information in a data structure, and ascertaining to which MPI rank they belong in order to determine the message direction. It then creates several tables for storing the related send and receive calls based on their type, along with their related information, including the rank and type, communicator, tag, and line number. Then we compare this information in the static phase, searching for any missing calls, any potential for race condition or deadlock, and any mismatching. In addition, our static analysis will check for any illegal MPI calls before MPI_Init and after MPI_Finalize. Race conditions caused by reading and writing to the same MPI buffer can be detected by our static tool. Also, in the same rank, if there is an MPI_Send and an MPI_Recv, the MPI_Recv should precede the MPI_Send; if not, our static testing will detect that and issue a warning message to the programmer.
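The table-building step and the check for calls outside the MPI_Init/MPI_Finalize region can be illustrated with a minimal Python sketch. This is not the ACC_TEST implementation: the `MpiCall` record, the regular expression, and the line-order rule are assumptions made for illustration only, and a real analyzer would parse the code rather than scan it line by line.

```python
import re
from dataclasses import dataclass

# Hypothetical record for one MPI call found by the scan (not ACC_TEST's format).
@dataclass
class MpiCall:
    name: str
    line: int
    args: list

# Assumption: each MPI call sits on one line and has no nested parentheses.
CALL_RE = re.compile(r"\b(MPI_\w+)\s*\(([^()]*)\)")

def collect_calls(source):
    """Build a table of MPI calls with their line numbers and arguments."""
    calls = []
    for lineno, text in enumerate(source.splitlines(), start=1):
        for m in CALL_RE.finditer(text):
            args = [a.strip() for a in m.group(2).split(",") if a.strip()]
            calls.append(MpiCall(m.group(1), lineno, args))
    return calls

def illegal_calls(calls):
    """Return calls located before MPI_Init or after MPI_Finalize (by line order)."""
    init = next((c.line for c in calls if c.name == "MPI_Init"), None)
    fin = next((c.line for c in calls if c.name == "MPI_Finalize"), None)
    bad = []
    for c in calls:
        if c.name in ("MPI_Init", "MPI_Finalize"):
            continue
        if init is None or c.line < init or (fin is not None and c.line > fin):
            bad.append(c)
    return bad

SAMPLE = """\
MPI_Send(buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
MPI_Init(&argc, &argv);
MPI_Recv(buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &st);
MPI_Finalize();
"""

for call in illegal_calls(collect_calls(SAMPLE)):
    print(f"line {call.line}: {call.name} is outside the MPI_Init/MPI_Finalize region")
```

The same table can then feed the pairing and mismatch checks described in the following paragraphs.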
Additionally, ACC_TEST will determine the MPI_Send/MPI_Recv pair, which will be used to detect any differences between the numbers of send-and-receives. This pairing will also examine the message tag to detect any unmatched message pairing. Our static analysis will check any message leaks (messages that were sent but never received) or inconsistent types on the sender and receiver for the same message.
Having more senders than receivers is considered a lack of resources and can also lead to a potential race condition; this case is passed to our dynamic tester to determine the exact error. On the other hand, when the number of receivers is greater than the number of senders, this leads to potential deadlock because some receivers will wait for a message that never arrives; this case is also annotated for further testing in our dynamic phase.
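The pairing and counting rules above can be sketched as follows. The tuple layout `(kind, rank, peer, tag)` and the two labels are assumptions for illustration, not the data structures used by ACC_TEST.

```python
from collections import Counter

def pair_calls(calls):
    """Pair sends and receives by (source rank, destination rank, tag).

    calls: list of (kind, rank, peer, tag). For a "send", peer is the
    destination rank; for a "recv", peer is the source rank (assumed layout).
    """
    sends = Counter((rank, peer, tag) for kind, rank, peer, tag in calls if kind == "send")
    recvs = Counter((peer, rank, tag) for kind, rank, peer, tag in calls if kind == "recv")
    report = []
    for key in sorted(set(sends) | set(recvs)):
        if sends[key] > recvs[key]:
            # Message sent but never received: annotate as a potential race condition.
            report.append((key, "potential_race"))
        elif recvs[key] > sends[key]:
            # Receiver waits for a message that is never sent: potential deadlock.
            report.append((key, "potential_deadlock"))
    return report

example = [
    ("send", 0, 1, 7),
    ("send", 0, 1, 7),
    ("recv", 1, 0, 7),
    ("recv", 1, 0, 9),
]
for (src, dst, tag), label in pair_calls(example):
    print(f"rank {src} -> rank {dst}, tag {tag}: {label}")
```

Only the connections reported here would need to be annotated for the dynamic phase; balanced pairs are left untouched.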
In addition, our static analysis can detect any mismatching in data types and message sizes. Our static analyzer will also analyze the relationship between OpenACC and MPI directives to determine any mismatch in the data movement between them. This detects data type mismatching not only between the MPI_Send and MPI_Recv calls, but also within the code; for example, if the programmer declares a variable as an int in the code and passes this variable in an MPI call as MPI_CHAR, ACC_TEST will detect this error.
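A minimal sketch of the two mismatch checks is given below. The `C_TO_MPI` mapping covers only a few types, and the function names are hypothetical. The size rule assumes the standard MPI behavior that a message longer than the receive buffer is an error, while a shorter message is allowed.

```python
# Assumed (partial) mapping from C declaration types to MPI datatypes.
C_TO_MPI = {"int": "MPI_INT", "char": "MPI_CHAR", "float": "MPI_FLOAT", "double": "MPI_DOUBLE"}

def check_buffer_type(declared, buffer, mpi_type):
    """Compare the declared C type of a buffer with the MPI datatype in the call."""
    expected = C_TO_MPI.get(declared.get(buffer))
    if expected is None or expected == mpi_type:
        return None  # unknown type or consistent usage: nothing to report
    return f"{buffer} is declared as {declared[buffer]} but passed as {mpi_type}"

def check_pair(send_type, send_count, recv_type, recv_count):
    """Compare the datatype and element count of a matched send/receive pair."""
    problems = []
    if send_type != recv_type:
        problems.append("datatype mismatch")
    if send_count > recv_count:
        problems.append("message larger than receive buffer")
    return problems

declared = {"x": "int", "name": "char"}
print(check_buffer_type(declared, "x", "MPI_CHAR"))
print(check_pair("MPI_INT", 10, "MPI_FLOAT", 4))
```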
In terms of race condition detection, our static analysis will detect any case of several messages being sent to the same destination with the same tag number, which can cause a race condition.
In terms of deadlock detection, our static phase has the ability to detect actual and potential deadlock based on our static analysis of the targeted source code. One of the potential error situations is using the wildcard receive. Additionally, our static analysis will detect any wildcard receive with any source or any tag and examine them to avoid any potential deadlock or race condition and annotate them to be detected in our dynamic phase.
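The two static checks just described (several sends to the same destination with the same tag, and wildcard receives) might be sketched as below. The tuple layouts are assumptions for illustration; `MPI_ANY_SOURCE` and `MPI_ANY_TAG` are the standard MPI wildcard constants, compared here as source-text strings.

```python
from collections import defaultdict

ANY_SOURCE, ANY_TAG = "MPI_ANY_SOURCE", "MPI_ANY_TAG"

def same_dest_same_tag(sends):
    """sends: list of (line, src_rank, dest_rank, tag).

    Return {(dest, tag): [lines]} for every destination/tag pair that is
    targeted by more than one send, i.e. a potential race condition.
    """
    groups = defaultdict(list)
    for line, _src, dest, tag in sends:
        groups[(dest, tag)].append(line)
    return {key: lines for key, lines in groups.items() if len(lines) > 1}

def wildcard_receives(recvs):
    """recvs: list of (line, source, tag). Return lines to annotate for dynamic testing."""
    return [line for line, source, tag in recvs if source == ANY_SOURCE or tag == ANY_TAG]

sends = [(12, 1, 0, 5), (18, 2, 0, 5), (25, 3, 0, 6)]
recvs = [(30, ANY_SOURCE, 5), (31, 3, 6), (32, 1, ANY_TAG)]
print(same_dest_same_tag(sends))
print(wildcard_receives(recvs))
```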
Another case of point-to-point blocking communication is the MPI_Sendrecv calls, which will be examined and analyzed like the previous MPI_Send and MPI_Recv calls. Additionally, the error detection will be as described in previous MPI calls because they show the same behavior but with a different structure.
In our dynamic phase, deadlock and race condition will be detected using the annotation of our static phase and insertion of the appropriate statements for detecting the actual error during runtime. ACC_TEST tests only the connections that have potential errors as determined by our static testing analysis, which saves time and enhances testing performance by testing only the part that needs to be tested and minimizing overhead and slowdown from dynamic testing.
For detecting deadlock in point-to-point blocking communication, ACC_TEST will reference the marked MPI_Recv that has potential errors as determined by our static analysis. Then, our insertion mechanism will replace the MPI_Recv with MPI_Irecv and define a new MPI_Request and MPI_Status for testing purposes. A timer will be set for a specific time, determined by calculating the average of the required times. Finally, we test the MPI_Irecv by using the inserted test code (Figure 3).
In the case of race condition detection, when all calls arrive, ACC_TEST will compare the actual message exchange with the information from our static analysis for detecting any potential race condition, as shown in the inserted test code in Figure 4. ACC_TEST will insert the values from our static testing and compare them to the actual runtime values by using inserted comparison statements.
Similarly, MPI_Sendrecv will be tested by dividing each MPI_Sendrecv into MPI_Send and MPI_Recv, as we explained earlier. Figure 5 displays the insertion mechanism of the MPI_Sendrecv calls.
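The control flow of the timer-based test and the comparison against the static information can be simulated in Python as a rough sketch. In the instrumented C code the `test` callable would be an MPI_Test on the request returned by MPI_Irecv; here it is an ordinary function. All names and the `(source, tag)` layout are hypothetical.

```python
import time

def poll_until(test, timeout_s, interval_s=0.001):
    """Call test() repeatedly until it returns True or the timer expires.

    Returns True if the operation completed in time, and False if the timer
    expired, which the dynamic tester would report as a detected deadlock.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if test():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)

def matches_static(expected, actual):
    """Compare the (source, tag) expected by static analysis with the runtime values."""
    return expected == actual

never_arrives = lambda: False
print(poll_until(never_arrives, timeout_s=0.01))   # timer expires
print(matches_static((0, 7), (2, 7)))              # unexpected source
```

Because the receive is non-blocking, the instrumented program can report the error and continue instead of freezing inside a blocking MPI_Recv.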
Additionally, for testing race conditions in point-to-point blocking communication with MPI_Sendrecv, ACC_TEST will use the same insertion mechanism as the previous test, shown in Figure 4, comparing the actual message exchange information with the information from our static analysis.
In addition, to distinguish between actual and potential deadlock in our dynamic tester, we test our inserted code multiple times, where:
• If all tested cases detect the same error, this indicates actual deadlock.
• If some cases detect errors and some do not, this indicates potential deadlock, which can be affected by the execution environment and order.
• If all cases have no error, this indicates that the code is deadlock-free.
VOLUME 8, 2020
For example, if we test the same connection 5 times and the number of detected errors is 5, this indicates actual deadlock. If the number of detected errors is 1, 2, 3, or 4 out of 5, this indicates potential deadlock. Finally, if the number of detected errors is 0, the code is deadlock-free. Our dynamic testing will inform the programmer about the error type, the line number, the rank, and the MPI call that caused the error.
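The classification rule above is simple enough to state directly in code. The function name and the returned labels in this sketch are illustrative choices.

```python
def classify_deadlock(errors_detected, runs):
    """Classify a connection from repeated runs of the inserted test code."""
    if runs <= 0 or not 0 <= errors_detected <= runs:
        raise ValueError("errors_detected must be between 0 and runs, with runs > 0")
    if errors_detected == runs:
        return "actual deadlock"
    if errors_detected == 0:
        return "deadlock-free"
    return "potential deadlock"

for detected in range(6):
    print(detected, "of 5:", classify_deadlock(detected, 5))
```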
Finally, by using this approach, ACC_TEST minimizes the overhead of the dynamic analysis and improves both testing performance and testing accuracy, because it avoids unnecessary testing operations and inserted code that would add overhead without producing accurate results.

B. POINT-TO-POINT NON-BLOCKING COMMUNICATION DETECTION
In this section, we will discuss how our testing techniques examine and detect runtime errors related to point-to-point non-blocking communication, including MPI_Isend and MPI_Irecv. Similar to the previous point-to-point communication detection, our static approach will collect information related to MPI_Isend and MPI_Irecv and store it for detecting errors similar to those of the previous class, including mismatching and different numbers of senders and receivers, as well as for analyzing the MPI_Isend/MPI_Irecv pairs.
Unlike blocking communication, non-blocking communication has an object called a request, which is used to identify a communication operation and its properties. This feature needs to be examined by our static phase to avoid any potential error. Our static analysis will detect a lost request; that is, if the same request variable is used in different MPI_Isend or MPI_Irecv calls in the same rank, this can cause a request overwrite and should be detected before it causes further errors in related operations.
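A minimal sketch of the request-overwrite check follows. It assumes the static phase has already produced an ordered list of `(line, rank, call_name, request_var)` entries, which is a hypothetical layout. As a refinement beyond the text, this sketch treats a request as safe to reuse once an MPI_Wait on it has been seen.

```python
def request_overwrites(calls):
    """Flag request variables reused by MPI_Isend/MPI_Irecv in the same rank.

    calls: list of (line, rank, call_name, request_var) in source order.
    Returns (first_line, reuse_line, request_var) for each overwrite.
    """
    pending = {}   # (rank, request_var) -> line of the call that owns the request
    findings = []
    for line, rank, name, req in calls:
        key = (rank, req)
        if name in ("MPI_Isend", "MPI_Irecv"):
            if key in pending:
                findings.append((pending[key], line, req))
            pending[key] = line
        elif name == "MPI_Wait":
            # Assumption: a completed request can be reused safely.
            pending.pop(key, None)
    return findings

calls = [
    (10, 0, "MPI_Isend", "req"),
    (11, 0, "MPI_Irecv", "req"),   # overwrites the request from line 10
    (12, 0, "MPI_Wait", "req"),
    (13, 0, "MPI_Isend", "req"),   # fine: previous request was completed
    (20, 1, "MPI_Irecv", "req"),   # fine: different rank
]
print(request_overwrites(calls))
```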
In addition, non-blocking communication can cause a potential race condition, especially when operations need to be completed before sending or receiving. Therefore, MPI_Wait calls need to be used for completing the non-blocking communication. As a result, our static tool will investigate the targeted source code and detect any missing MPI_Wait calls. However, if MPI_Wait is used while the source code has a deadlock, the program will freeze; our static analysis therefore annotates this situation to be tested by our dynamic tester. Also, if there is a deadlock and MPI_Wait is not used, the program will run to completion with wrong results, which cannot be detected by our static phase and needs further dynamic testing.
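Detecting a missing completion call could look like the following sketch, again over a hypothetical per-rank list of `(line, call_name, request_var)` entries. Both MPI_Wait and MPI_Test are accepted as completion calls here, which is a simplification because MPI_Test only completes a request if the operation has already finished.

```python
def missing_waits(calls):
    """Return (request_var, line) for non-blocking calls never followed by a completion call.

    calls: list of (line, call_name, request_var) for one rank, in source order.
    """
    open_requests = {}
    for line, name, req in calls:
        if name in ("MPI_Isend", "MPI_Irecv"):
            open_requests[req] = line
        elif name in ("MPI_Wait", "MPI_Test"):
            open_requests.pop(req, None)
    return sorted(open_requests.items(), key=lambda item: item[1])

rank0 = [
    (5, "MPI_Isend", "send_req"),
    (6, "MPI_Irecv", "recv_req"),
    (9, "MPI_Wait", "send_req"),
]
for req, line in missing_waits(rank0):
    print(f"line {line}: request '{req}' has no MPI_Wait, potential race condition")
```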
In terms of detecting errors in our dynamic phase, our testing detects deadlock in the non-blocking point-to-point connection (MPI_Isend/MPI_Irecv) by adding MPI_Test before any MPI_Wait to avoid a program freeze, because in this case the deadlock occurs in the MPI_Wait call. Also, we can detect a race condition if we find MPI_Isend and MPI_Irecv without MPI_Wait or MPI_Test, because we cannot ensure the arrival order of the threads; therefore, a potential race condition message will be issued to the programmer. The insertion mechanism for detecting deadlock is similar to that used in the point-to-point blocking communication shown in Figure 3.
Similar to our approach of detecting race condition in the blocking communication, our dynamic tester of the non-blocking communication will also compare the actual message-receiving information with the information from our static analysis for detecting any potential race condition, as shown in Figure 4.

C. COLLECTIVE COMMUNICATION DETECTION
The third class of detection that our testing tool can target is the MPI collective communication, including blocking and non-blocking, which in our case will be MPI_Bcast and MPI_Ibcast. Our static phase will be responsible for collecting the related information needed to test and detect any runtime errors related to MPI collective communication codes. Additionally, ACC_TEST will lexically analyze and parse the targeted source code to ensure the correctness of the MPI calls, as well as detecting some errors that can be resolved during our static analysis.
The type-matching conditions for the collective operations are stricter than the corresponding conditions between sender and receiver in point-to-point [4]. Therefore, our static phase will be responsible for detecting any data type and size mismatching errors, as discussed earlier. The collective operation order will also be examined in our static phase to avoid any potential errors resulting from incorrectly ordered collective operations in the same MPI communicator, such as the example shown in Figure 6. Finally, our static phase will detect any potential deadlock that occurs as a result of not calling the MPI collective operation by all processes in the MPI communicator.
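The ordering and participation checks for collective operations can be sketched by comparing, per communicator, the sequence of collective calls made by each rank. The input layout (a dictionary from rank to an ordered list of call names) is an assumption. MPI_Barrier appears only as a second standard collective to make the example meaningful.

```python
def collective_order_issues(per_rank):
    """Compare each rank's sequence of collective calls with the lowest rank's sequence.

    per_rank: dict mapping rank -> ordered list of collective call names
    issued on one communicator (assumed output of the static phase).
    """
    if not per_rank:
        return []
    ranks = sorted(per_rank)
    ref_rank, reference = ranks[0], per_rank[ranks[0]]
    issues = []
    for rank in ranks[1:]:
        seq = per_rank[rank]
        if len(seq) != len(reference):
            # Some process skips (or adds) a collective call: potential deadlock.
            issues.append((rank, "different number of collective calls"))
        for i, (expected, found) in enumerate(zip(reference, seq)):
            if expected != found:
                issues.append((rank, f"call {i}: {found} does not match {expected} on rank {ref_rank}"))
                break
    return issues

calls_by_rank = {
    0: ["MPI_Bcast", "MPI_Barrier"],
    1: ["MPI_Barrier", "MPI_Bcast"],   # incorrect order
    2: ["MPI_Bcast"],                  # one collective call is missing
}
for rank, message in collective_order_issues(calls_by_rank):
    print(f"rank {rank}: {message}")
```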
In our dynamic phase, ACC_TEST will test the MPI_Bcast during runtime to avoid any deadlock. Because of the behavior of MPI_Bcast in the case of deadlock, ACC_TEST will use inserted statements, as shown in Figure 7, to test the data exchange between the broadcasts. This works even when a deadlock exists, without suffering its effect, which is that the program freezes without revealing the actual reason behind it.
Our dynamic phase will use the annotation from our static analysis to replace each blocking broadcast (MPI_Bcast) with a non-blocking broadcast (MPI_Ibcast) to avoid any blocking behavior, and our dynamic testing will set a timer for waiting for the broadcast calls before testing their situations. Our dynamic testing will then use MPI_Test for each MPI broadcast call and extract the actual information, including the broadcast's source.

V. IMPLEMENTATION AND TESTING
Many experiments have been conducted to test and simulate runtime errors that can occur in MPI, and their behaviour has been studied to understand them better and discover their causes and effects on the applications. Also, several experiments have been conducted to test our proposed solution and ensure ACC_TEST's ability to detect different types of errors in MPI, as well as covering errors from different types of MPI connections. To perform our experiments, we used an Intel(R) Core(TM) i7-7700HQ CPU (2.80GHz) with 16 GB main memory, with an NVIDIA GeForce GTX 1050 Mobile GPU, which has 768 NVIDIA CUDA cores, 4 GB GDDR5 RAM, and memory speed of 7 Gbps.
More than 40 MPI benchmarks from four different benchmark suites have been used to evaluate ACC_TEST, including NAS Parallel Benchmarks [32], OSU Micro-Benchmarks [33], EPCC [34], and mpiBench [35]. Table 1 shows some statistics from the chosen MPI benchmarks, including the number of MPI calls in each class of communication: blocking point-to-point, non-blocking point-to-point, and collective communications.
In this section, we explain our implementation and testing of ACC_TEST for MPI-based programs to show our tool's ability to detect some errors related to MPI applications. We will show examples of some errors from each type of MPI communication.

A. POINT-TO-POINT BLOCKING COMMUNICATION DETECTION
Our static analysis will start by determining each MPI send and receive and their locations in the source code. We store their information in a data structure, as shown in Figure 8, and determine to which MPI rank they belong in order to determine the message direction. Our static analysis will create several tables for storing the related send and receive calls based on their type, along with their related information such as rank and type, communicator, tag, and line number. Then we compare this information in the static phase, searching for any missing calls, any potential for race condition or deadlock, and any mismatching.
ACC_TEST will determine the MPI_Send/MPI_Recv pair, which will be used to detect any differences between the numbers of send-and-receives, as shown in Figure 9. This pairing will also examine the message tag to detect any unmatched message pairing. Our static analysis will check any message leaks (sending message without receive) or inconsistent types on the sender and receiver for the same message.
Having more senders than receivers, which is considered a lack of resources as shown in Figure 10, can also lead to a potential race condition that will be examined by our dynamic tester to determine the exact error. On the other hand, when the number of receivers is greater than the number of senders, this leads to potential deadlock because some processes will wait for a message that never arrives; this will be annotated for further testing in our dynamic phase.
In addition, our static analysis will detect any mismatching in data types and message sizes, as shown in Figure 11. Additionally, our static analyzer will examine the relationship between OpenACC and MPI directives to determine any mismatch in the data movement between them, not only between the MPI_Send and MPI_Recv calls, but also within the code; for example, if the programmer declares a variable as an int in the code and passes this variable in an MPI call as MPI_CHAR, ACC_TEST will detect this error.
In terms of race condition detection, our static analysis will detect any case of several messages sent to the same destination with the same tag number, which can cause a race condition. Figure 12 shows an error message indicating potential race condition, and further detection by our dynamic phase is needed.
In terms of deadlock detection, our static phase detects actual and potential deadlock based on our static analysis of the targeted source code. One potential error situation is using the wildcard receive. Our static analysis will also detect any wildcard receive with any source or any tag, examine it to avoid any potential deadlock or race condition, and annotate it to be tested by our dynamic phase. Figure 13 shows error messages related to deadlock detection in our static phase. Figure 14 also shows a potential deadlock due to wildcard receives that need further investigation by our dynamic tester to determine the exact error type, including deadlock or race condition, based on the execution environment and the source code analysis. Another case of potential deadlock can be caused by data exchange between two processes, which can be detected by our static analysis as shown in Figure 15; this error will be annotated to be tested by our dynamic tester.
Another case of point-to-point blocking communication is the MPI_Sendrecv calls, which will be examined and analyzed just like the previous MPI_Send and MPI_Recv calls. Figure 16 displays the information collection for the MPI_Sendrecv calls. The error detection will be conducted as described in the previous MPI calls because they display the same behavior but with different structures. An example of detected error in MPI_Sendrecv is shown in Figure 17.
Deadlock in point-to-point blocking communication is detected as we explained previously. Figure 18 shows the instrumented inserted code used for testing deadlock in the point-to-point blocking communications for each MPI_Recv, and Figure 19 displays the related error message.
In the case of race condition detection, when all calls arrive, ACC_TEST will compare the actual message exchange with the information from our static analysis to detect any potential race condition. Figure 20 shows the actual information from the dynamic tester and the process of comparing it with the static analyzer information, which will be shown in our historical log file. In case of an error, the error message will be displayed in the dynamic error file, as shown in Figure 21.
Similarly, MPI_Sendrecv will be tested by dividing each MPI_Sendrecv into MPI_Send and MPI_Recv and testing them as we explained earlier. Figure 22 shows the instrumented inserted test code to be used by our dynamic tester.

B. POINT-TO-POINT NON-BLOCKING COMMUNICATION DETECTION
In this section, we will discuss how our testing tool examines and detects runtime errors related to point-to-point non-blocking communication, including MPI_Isend and MPI_Irecv. Figure 23 shows the collected data from our static analysis for non-blocking communication, and Figure 24 shows the mismatching error message from our static tester for non-blocking communication. Figure 25 displays a lack of resources caused by having a sender without a receiver. Finally, the wildcard receive will be tested using the same detection technique as in the blocking communication.
Unlike blocking communication, non-blocking communication has an object called request, which is used to identify a communication operation and its properties. This feature must be detected by our static phase to avoid any potential error. Our static analysis will detect request lost, that is, if the same request variable is used in different MPI_Isend, MPI_Irecv in the same rank, this can cause request overwrite and should be detected before it can cause further error when it needs to be used for related operations, as displayed in Figure 26.
Also, non-blocking communication can cause a potential race condition, especially when operations need to be completed before sending or receiving. Therefore, MPI_Wait calls need to be used for completing the non-blocking communication. Figure 27 shows deadlock detection by our static testing and indicates a missing MPI_Wait call.
In terms of detecting errors in our dynamic phase, our dynamic testing detects deadlock in the non-blocking point-to-point connection (MPI_Isend/MPI_Irecv) by adding MPI_Test before any MPI_Wait to avoid a program freeze, because in this case deadlock occurs in the MPI_Wait call. We can also detect a race condition if we find MPI_Isend and MPI_Irecv without MPI_Wait or MPI_Test, because we cannot ensure the arrival order of the threads; therefore, a potential race condition message will be issued to the programmer. The insertion mechanism for detecting deadlock is similar to that used in the point-to-point blocking communication shown in Figure 3. Figure 28 shows the instrumented inserted code in the case of having two MPI_Irecv calls.
Similar to our approach to detecting a race condition in the blocking communication, our dynamic tester for non-blocking communication will also compare the actual message-receiving information with that from our static analysis to detect any potential race condition, as shown in Figure 4.

C. COLLECTIVE COMMUNICATION DETECTION
Our static phase will be responsible for collecting the information needed to test and detect any runtime errors related to MPI collective communication codes, as shown in Figure 29. ACC_TEST will also lexically analyze and parse the targeted source code to ensure the correctness of the MPI calls, as well as detecting errors that can be resolved during our static analysis. Finally, our static phase will detect any potential deadlock that occurs as a result of not calling the MPI collective operation by all processes in the MPI communicator; an example of the error message for this error is shown in Figure 30.
As we explained in Section 4, our dynamic phase will use the annotation from our static analysis to replace each blocking broadcast (MPI_Bcast) with a non-blocking broadcast (MPI_Ibcast). Figure 31 demonstrates an example of the instrumented inserted test code for detecting errors in the MPI collective communications, and Figure 32 shows an MPI collective communication deadlock error detected in our dynamic phase.
Table 2 demonstrates our hybrid testing tool's ability to detect MPI errors based on their types, similar to the approach used to evaluate ACC_TEST's ability to detect OpenACC errors. Table 2 shows that ACC_TEST detects as many errors as possible with the static approach in order to decrease the overhead of the dynamic testing approach and therefore enhance testing performance. However, race conditions and deadlocks are only partially detected by our static testing approach and need further investigation by our dynamic testing approach because their behavior is affected by the execution environment and sequence. As a result, these errors are detected using our hybrid testing approach.
Figure 33 shows the size overhead resulting from the insertion mechanism that executes the dynamic testing to detect runtime errors that cannot be detected by the static approach, measured using Equation 1. Also, Figure 35 shows the testing time needed to conduct the static testing approach on the MPI-related programs with our hybrid testing tool. The average overhead in the number of lines added is 22%, the average size overhead in bytes is 29%, and the average testing time for an MPI-related program is 17 milliseconds. As shown in Figures 33 and 34, the benchmark PingPong has the largest overhead at 48% because it has the largest number of MPI point-to-point blocking calls to be tested by our dynamic approach.
Based on our results, the size overhead varies with the behavior of the insertion statements.
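As a small worked example of the size-overhead measure in Equation 1, the sketch below computes the relative growth of a program after test code is inserted. The sizes used are hypothetical values chosen for illustration, not measurements from our benchmarks.

```python
def size_overhead(size_with, size_without):
    """Relative size increase caused by the inserted test code (Equation 1)."""
    if size_without <= 0:
        raise ValueError("size_without must be positive")
    return (size_with - size_without) / size_without

# Hypothetical sizes: a 100-line program that grows to 148 lines after insertion.
overhead = size_overhead(148, 100)
print(f"size overhead: {overhead:.0%}")
```

The same function applies whether size is counted in lines or in bytes, as long as both arguments use the same unit.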

VI. DISCUSSION AND EVALUATION
ACC_TEST minimizes the size overhead when testing MPI-related programs because we add the insertion statements only when needed and only on the MPI receiver side. We avoid adding unnecessary messaging (communications) to test the connection between senders and receivers to detect deadlock, unlike the research that suggested adding MPI_Isend before any send and MPI_Irecv before any receive [36]. Even if the connection seems to be deadlock-free after such testing, the connection can, for any reason (a non-programmatic fault), cause the message not to arrive, which means the detected message itself has not been tested. When used in the insertion mechanism, MPI_Isend and MPI_Irecv can also cause deadlock or a race condition themselves if there is any error or if MPI_Wait has not been used. Therefore, we choose to test the arrival of the detected message without adding overhead or sending unnecessary messages, which would affect system performance and testing time.

$$\text{Size Overhead} = \frac{\text{Size with inserted test code} - \text{Size without inserted test code}}{\text{Size without inserted test code}} \tag{1}$$

VII. CONCLUSIONS AND FUTURE WORK
Although many testing tools target MPI, there is still much work to be done, primarily in covering more errors and in reducing the execution and size overheads resulting from dynamic testing techniques. Our testing tool ACC_TEST uses hybrid testing techniques that combine static and dynamic techniques to detect errors at lower cost and overhead. ACC_TEST covers errors from each type of MPI communication, whereas the testing tools mentioned in our related work did not cover some errors or focused only on race conditions and deadlocks. Finally, ACC_TEST can be integrated for testing the dual-programming model MPI + X.
In our future work, we will create a hybrid testing tool for the dual-programming model MPI + OpenACC. Our new version of ACC_TEST will have the ability to detect run-time errors when using the hybrid programming model in a heterogeneous architecture.