A Survey on Hardware-Based Malware Detection Approaches

This paper delves into the dynamic landscape of computer security, where malware poses a paramount threat. Our focus is an exploration of recent and promising hardware-based malware detection approaches. Leveraging hardware performance counters and machine learning, hardware-based malware detection approaches bring compelling advantages such as real-time detection, resilience to code variations, minimal performance overhead, resistance to protection disablement, and cost-effectiveness. Navigating through a generic hardware-based detection framework, we analyze the approach and the most common methods, algorithms, tools, and datasets that shape it. This survey is not only a resource for seasoned experts but also an inviting starting point for those venturing into the field of malware detection. However, challenges emerge in detecting malware based on hardware events. We discuss the need for accuracy improvements and strategies to address the remaining classification errors. The discussion extends to crafting mixed hardware and software approaches for collaborative efficacy, essential enhancements in hardware monitoring units, and a better understanding of the correlation between hardware events and malware applications.


I. INTRODUCTION
MALWARE, short for malicious software, poses a significant threat to computer security. It includes any code modification within a software system aimed at causing harm or disrupting the system's intended function [1], [2]. Malware attacks cover spying, intrusive ads, email abuse, system damage, ransom demands, data release, slowdown, browser manipulation, and unauthorized access to sensitive information. Successful attacks lead to consequences that can be categorized into four groups: (i) unauthorized disclosure, where an unauthorized entity gains access to data; (ii) deception, where an authorized entity receives false data; (iii) disruption, causing interruptions in system services; and (iv) usurpation, resulting in unauthorized control of system services [3]. Computing systems, including personal computers, mobile phones, Internet of Things (IoT), 5G devices, Cyber-Physical Systems (CPSs), and enterprise-wide systems, are vulnerable to malware. The complexity and size of modern systems, often indicated by a rising number of lines of code, amplify the threat. Factors such as numerous bugs, unsafe programming languages, improper configuration, and the ease of concealing malicious code create potential vulnerabilities. Additionally, increased network connectivity expands the security risks, making all devices potential targets for attackers. For example, cybercrime statistics report a 70% increase in online fraud accomplished through mobile platforms and a 30% rise in IoT malware in 2020 [4].
Globally, cybersecurity is paramount, with malware being a primary vehicle for cybercrimes. The World Economic Forum Global Risk Report 2023 ranks cyber insecurity eighth among top global risks, alongside threats like climate change and involuntary migration [5]. Cybersecurity Ventures predicts a 15 percent annual growth in international cybercrime costs, reaching USD 8 trillion in 2023 and USD 10.5 trillion annually by 2025 [5]. Global spending on cybersecurity products and services is expected to exceed USD 1.75 trillion from 2021 to 2025, growing 15 percent year-over-year [6]. Ransomware, a prevalent malware threat, was predicted to cost USD 20 billion globally in 2021, with damage costs projected to exceed USD 265 billion annually by 2031 [5].
Researchers have developed various malware detection methods in response to these alarming statistics, leveraging Machine Learning (ML) and Deep Learning (DL) techniques. Surveys have evaluated and categorized research in this domain, focusing on specific Operating Systems (OSs), such as Windows, or mobile platforms like Android. Ye et al. [7] conducted a comprehensive survey on intelligent malware detection using data mining techniques, emphasizing the importance of Feature Extraction (FE) and algorithm selection. Subsequently, Ucci et al. [8] provided an overview of ML-based malware analysis, focusing on analysis objectives, FE, and ML algorithms, albeit limited to Portable Executable (PE) files. Gibert et al. [9] systematically reviewed ML and DL techniques for Windows malware detection, comparing input features, classification algorithms, and dataset characteristics. Similarly, Qiu et al. [10] and Liu et al. [11] addressed DL-based Android malware detection, emphasizing supervised classification using Multilayer Perceptron (MLP) and Convolutional Neural Network (CNN) architectures. Catal et al. [12] conducted an extensive literature review on DL techniques for mobile malware detection, highlighting the prevalence of MLP and CNN architectures, with a focus on supervised learning and static features. Furthermore, Deldar et al. [13] proposed a survey on DL techniques for zero-day malware detection, targeting features extracted at the software level to address emerging threats.
In the early 2010s, researchers first proposed the idea of Hardware-Supported Malware Detection (HMD) [14], [15]. HMD involves dynamically analyzing microarchitecture events in a processor using ML algorithms to differentiate between benign applications and malware. The shift towards HMD is justified by the potential for enhanced security offered by robust hardware monitoring infrastructures. This provides a stronger defense against sophisticated attacks that may exploit vulnerabilities in software-based approaches. Specifically, hardware features reflect phase behavior in the underlying hardware, as observed in prior studies [16], [17]. These phases often correspond to time-behavioral patterns in micro-architectural events, which vary significantly between programs, enabling the distinction between malicious and benign applications. Additionally, these hardware-based approaches address the zero-day issue, as demonstrated in [18]. To the best of our knowledge, a comprehensive overview of HMD methods is still missing. This paper aims to fill this gap.
The structure is as follows: Section II covers the basics of malware, serving as a foundation for understanding the field. Section III presents a comprehensive overview of software and hardware-based malware detection solutions, with a detailed discussion of their strengths and weaknesses. Section IV delves into crucial aspects of hardware-based detection. Section V assesses the performance and efficiency of the state-of-the-art in hardware-based detection. Lastly, Section VI provides conclusions and outlines research challenges.

II. MALWARE FUNDAMENTALS
Categorizing malware is difficult because of its growing complexity and diverse properties. Yet, creating a malware taxonomy provides valuable insights into understanding it better. Before exploring the fundamentals of malware operation, let us define a set of keywords commonly used to describe different malware categories [1], [19]:
• Virus: malicious code with the capability of inserting itself into other programs;
• Worm: malicious code that propagates similarly to viruses but does not require a target software to replicate, often exploiting connectivity such as emails;
• Trojan horse: malicious code that masquerades as a useful program;
• Spyware: malicious code secretly installed into an information system to transmit private user data to an external entity;
• Adware: malicious code that displays computer advertisements, primarily aiming for financial benefits;
• Ransomware: malicious code that denies access to a user's data, usually by encrypting it until a ransom is paid;
• Backdoor: malicious code that opens systems to external entities by subverting local security policies to allow remote access and control over a network;
• Keylogger: malicious code designed to record keystrokes, used to obtain passwords or encryption keys to bypass security measures;
• Botnet: a network of infected computers controlled by a remote criminal;
• Rootkit: malicious application attackers use to conceal their activities and maintain control over a host.
Organizations like NIST [20] and ENISA [21] recognize these malware types. In literature, three common properties describe malware: (i) propagation method, categorizing based on spread and purpose; (ii) concealment strategy, focusing on hiding tactics against users and detection; and (iii) data structure manipulation, dealing with software vulnerability exploitation. Table 1 organizes malware based on these categories.
Regarding concealment strategy, malware can be categorized into two main groups: (i) no concealment and (ii) stealthy malware [22]- [24]. Non-concealed malicious code lacks techniques to hide itself, making it easy to detect. However, as shown in Table 1, only a small subset of malware does not employ concealment. File infectors like traditional viruses or worms may not heavily focus on concealment, spreading by attaching to executable files. Adware may not invest heavily in hiding and may rely on user interactions. Similarly, simple trojans may prioritize their primary goal over concealment when that goal can be achieved without sophisticated evasion.
Conversely, stealthy malware is a general term for all kinds of malicious code capable of hiding from users and detection mechanisms [25], [26]. Its primary purpose is to remain undetected for an extended period in the computing system, allowing it to compromise computers and steal information before a suitable detection mechanism can be deployed to protect against it. In general, the concealment actions aim to hide the malware's trails or code. Stealthy malware may employ several techniques:
• Encryption/obfuscation: the oldest and simplest technique consists of a decryptor and an encrypted main body. When the infected file runs, the decryptor recovers the main body. The malware may use a different key for each infection to hide its signature, making the encrypted part unique. The decryptor's small size compared to the main body reduces detection probability. Encryption complexity ranges from basic operations to strong encryption methods [22], [23], [27];
• Oligomorphism and polymorphism: the limitation of the encryption technique lies in the constant decryptor across exploitations, enabling detection based on code patterns. Oligomorphism employs a small set of decryptors, using a different one for each infection. Polymorphism, similar but with theoretically infinite decryptor variations, relies on obfuscation methods like dead-code insertion and register reassignment for distinct decryptor creation [22], [23], [28], [29];
• Metamorphism: the binary sequence is altered by making a new malware version for each new infection through a mutation engine. The mutation engine uses code transformation and obfuscation to change the malicious code [22], [23], [30].
Several classes of software vulnerabilities can be exploited to perform security attacks. This paper focuses on the prevalent memory errors enabling memory corruption for security attacks [31], which lead to two main exploit categories: control-flow attacks and data-only attacks.
Control-flow attacks are common, easy to construct, and demand minimal application-specific knowledge. They exploit vulnerabilities like buffer overflows or injection attacks to redirect the program's execution flow, enabling arbitrary code execution [15], [32]- [35]. Techniques such as code injection [36], Return-Oriented Programming (ROP) [37], or Jump-Oriented Programming (JOP) [38] divert execution to specific memory locations housing malicious code, bypassing standard security measures.
In contrast, data-only attacks are rarer, subtler, and require advanced knowledge of program semantics. They manipulate critical data while maintaining a valid control flow, compromising target programs without injecting additional code. These attacks alter essential data elements, such as identification or configuration data, influencing target application behaviors during runtime [39].

III. OVERVIEW OF MALWARE DETECTION
Malware detection involves determining whether a given program exhibits malicious intent. Figure 1 offers an overview of contemporary solutions for malware detection, categorized into two main groups: software-based and hardware-based approaches. This division is rooted in differing observation points within the system stack and different detection methodologies. Recent advancements, as underscored by [13] and [18], increasingly rely on ML or Artificial Intelligence (AI) techniques to facilitate detection.
This section presents an overview of software-based and hardware-based malware detection (sub-sections III-B and III-C), starting by reviewing the metrics used for evaluating the performance and efficiency of the detectors (sub-section III-A).

A. EVALUATION METRICS
Before delving into specific malware detection techniques, readers need to consider the evaluation metrics used to assess their effectiveness. These metrics serve as quality indicators, pivotal in determining the adoption of a technique on a commercial scale. Since malware detection is a classification problem, the quality evaluation of the detectors is based on the standard classification metrics. They can be grouped as performance metrics and efficiency metrics. Performance is the degree to which a system or component accomplishes its designated functions within given constraints, i.e., correctly detects the malware. Efficiency is the degree to which a system or component performs its specified functions with minimum consumption of resources [41].
The primary evaluation tool for performance is the confusion matrix. This matrix is fundamental in ML and classification tasks, summarizing results in a tabular form. It comprises four elements (see Table 2): True Negatives (TNs), False Positives (FPs), False Negatives (FNs), and True Positives (TPs).

Table 2. Confusion matrix.
                   Predicted Negative   Predicted Positive
Actual Negative    TNs                  FPs
Actual Positive    FNs                  TPs

Such a matrix allows for the definition of more descriptive metrics, and Table 3 summarizes the most common ones [7]. The accuracy summarizes the overall correctness of the classification model by expressing the fraction of correct predictions, making it one of the most widely used metrics. In scenarios where it is crucial to avoid incorrect malware predictions, precision measures the fraction of TPs among all positive predictions. Shifting the evaluation focus to ensure no malware passes unnoticed, the True Positive Rate (TPR) (also known as Recall or Sensitivity) weighs TPs against all positive samples. It has two counterparts: (i) the False Positive Rate (FPR), representing the probability of an actual negative being incorrectly classified as positive, and (ii) the specificity, also known as True Negative Rate (TNR), indicating the probability of an actual negative being correctly classified. Balancing Precision and Recall is often essential, and the evaluation can be accomplished using the F1-score, which represents their harmonic mean.
Finally, the Receiver Operating Characteristic (ROC) curve offers a visual perspective on performance evaluation. It plots the TPR against the FPR on a 2D graph, enabling a visual comparison of different models and capturing multiple classification aspects by inspecting the Area Under the Curve (AUC). In simple terms, the larger the AUC, the better the model. AUC is closely related to the robustness of the classifier, indicating how effectively the classifier distinguishes between malware and benign applications.
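The performance metrics discussed above can be computed directly from the four confusion-matrix counts. The following is a minimal sketch, not taken from any surveyed work: the function names `classification_metrics` and `roc_auc` are illustrative, the expressions are the standard textbook definitions, and the AUC is approximated with the trapezoidal rule over a threshold sweep.

```python
from typing import Dict, List, Sequence, Tuple


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Standard metrics from confusion-matrix counts (assumes no empty class)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # True Positive Rate (TPR)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "tpr": recall,
        "fpr": fp / (fp + tn),  # benign samples flagged as malware
        "tnr": tn / (tn + fp),  # specificity
        "f1": 2 * precision * recall / (precision + recall),
    }


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve; label 1 = malware, 0 = benign."""
    pos = sum(labels)
    neg = len(labels) - pos
    points: List[Tuple[float, float]] = [(0.0, 0.0)]
    # Lower the decision threshold step by step; each step adds one ROC point.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    # Trapezoidal rule over consecutive (FPR, TPR) points.
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

For instance, `classification_metrics(tp=40, tn=45, fp=5, fn=10)` describes a detector that is correct on 85 of 100 samples while still missing 10 of the 50 malware samples, which is why accuracy alone is rarely reported in isolation.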

[Table 3: performance metrics and their expressions, beginning with Accuracy (A)]

According to [41], efficiency is related to the resources used for malware detection. Many metrics can be used to evaluate the efficiency [42], but in the malware detection field, latency, power consumption, and hardware cost are of main interest:
• Latency is the time between collecting all features analyzed by the malware detector and concluding its detection. A low latency is vital for run-time detection of malware that acts in a short interval of time;
• Power consumption indicates the energy the detector consumes per unit of time. Two factors primarily impact the power consumption of the detector: the hardware that implements the classifier or on which it runs, and the detection algorithm (algorithms with higher computational demands tend to consume more);
• Hardware cost indicates the monetary cost of building the detection system. This is important from both an industry and a research perspective to dictate whether a system is financially viable. The main parameter to evaluate the hardware cost is the chip area (usually reported in square millimeters) in conjunction with the process technology (for example, 45 nm). Sometimes, the amount of memory is also used to evaluate the hardware cost.

B. SOFTWARE-BASED MALWARE DETECTION
Software-based protection relies on specific software running in the system and analyzing the potential malware presence using different approaches. The authors of [40] and [13] provide a comprehensive selection of them:
• Signature-based: the signature is a unique malware feature extracted from structural properties (e.g., code sequences) or run-time properties [43]. The detection works as follows: features extracted from the executable generate a signature stored in a signature database. When the system is required to classify a potential threat, the detector extracts the related features and computes the signature, comparing it with the signatures in the database. The potential threat is marked as malware if a hit occurs during the comparison. This approach is widely used within commercial antivirus products but does not allow zero-day detection [13];
• Software behavior analysis: this approach is based on dynamic characteristics from run-time executions of programs [40]. Dynamic characteristics might include processor and memory information, kernel usage (system calls), file system activities, and network communications. They are extracted with monitoring tools, a dataset is created, and an ML detector distinguishes malicious and harmless applications. Software behavior analysis can detect malware variants often missed by the signature-based approach;
• Heuristic-based detection: this method relies on experiences and techniques, including rules and ML. The process involves two phases: first, the detector system is trained with normal and abnormal data to identify relevant characteristics. In the second phase, known as monitoring or detection, the trained detector intelligently assesses new samples to make decisions [44];
• Deep Learning: this falls under the umbrella of ML algorithms, enabling computational models with multiple layers to extract more advanced features from raw input [13]. The FE aspect combines elements from previous approaches, making it a novel method. Additionally, it proves highly effective for zero-day detection, as the FE, employing multiple techniques, facilitates context adaptation and model updates, as highlighted in [13].
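The signature-based workflow described in the list above (extract features, compute a signature, look it up in a database) can be sketched as follows. This is a deliberately simplified illustration: it assumes the signature is a SHA-256 digest of the whole file, whereas real products rely on richer structural or run-time features, and the `SignatureDatabase` class is hypothetical.

```python
import hashlib


def compute_signature(data: bytes) -> str:
    """Assumed signature: SHA-256 digest of the raw bytes (illustrative only)."""
    return hashlib.sha256(data).hexdigest()


class SignatureDatabase:
    """Hypothetical in-memory store of known-malware signatures."""

    def __init__(self) -> None:
        self._known: set = set()

    def add_malware(self, data: bytes) -> None:
        self._known.add(compute_signature(data))

    def is_malware(self, data: bytes) -> bool:
        # A hit during the comparison marks the sample as malware.
        return compute_signature(data) in self._known


db = SignatureDatabase()
known_sample = b"\x90\x90payload-v1"   # placeholder bytes, not real malware
db.add_malware(known_sample)

variant = b"\x90\x90payload-v2"        # one byte differs from the known sample
hit_known = db.is_malware(known_sample)
hit_variant = db.is_malware(variant)
```

Under this assumption, the known sample produces a hit while the one-byte variant does not, which illustrates why exact signatures miss code variants and previously unseen (zero-day) malware.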
Regarding software-based detection, it is also crucial to distinguish among the types of analysis carried out to extract the required information. According to [43], three ways are possible: (i) via static analysis, using syntax or structural properties of the program/process (e.g., code sequences), (ii) via dynamic analysis, extracting the necessary data during or after program execution, leveraging run-time information, and (iii) via hybrid analysis, combining the previous two. Selecting one of those also affects the expected latency of the detection. While a static analysis aims to detect the threat even before executing the malicious program, the other two might require an entire execution before detection.

C. HARDWARE-BASED MALWARE DETECTION
Hardware-based detection, or HMD, addresses the performance and computational overhead challenges of traditional malware detection techniques by utilizing low-level microarchitectural features of running applications on the target system [18]. The concept that malware can be identified through micro-architecture hardware events stems from the observation that programs exhibit phase behaviors [16], [17]. Program phases, which vary significantly between programs, manifest as patterns in architectural and micro-architectural events. This variation enables the discrimination of programs based on their time-behavioral hardware event patterns, facilitating the differentiation between malicious and benign applications. In 2011, Malone et al. [14] demonstrated the feasibility of detecting program code modifications based on the deviation of hardware events. In 2013, Demme et al. [15] showed the feasibility of detecting Android malware and Linux rootkits using hardware event values analyzed by an ML classifier.
The idea of HMD is to perform dynamic analysis leveraging micro-architecture hardware events monitored by most modern microprocessors using Hardware Performance Counters (HPCs) [45]. Various ML techniques can be applied to the data collected from the HPCs [18]. One of the primary advantages of HMD is that the analysis relies on data collected from the hardware in real time, enabling fast ML classification; a few milliseconds suffice to identify threats. This translates to low latency, enabling runtime detection [46]- [48]. Unlike the static analysis techniques employed by most software-based antivirus solutions, which can be easily subverted by stealthy malware using concealment techniques, dynamic analysis via hardware-based approaches facilitates the detection of code variants and unknown malware [15]. Moreover, while software-based detection tools are themselves software and thus susceptible to bugs or oversights in the underlying system software, hardware-based detection with secure hardware significantly reduces the possibility of malware subverting protection mechanisms [15], [34].
On the performance front, the dynamic analysis conducted by software-based detection necessitates sophisticated computation, often at the expense of significant performance overhead. The increasing software size further complicates dynamic software analysis [15]. Conversely, in the hardware-based approach, the view of software behavior provided by micro-architectural events simplifies the analysis, reducing the computational effort and the cost of hardware-based detection [15], [49].
However, while the HPCs demonstrate their ability to track behavioral deviations [50]- [52], their effectiveness remains open to discussion. On the positive side, [15], [34] demonstrated detector performance using this approach, reporting accuracy consistently exceeding 80%, deeming it effective. Conversely, [53] and [54] conducted experiments challenging the effectiveness of hardware-based detection. They argued that reported detection capabilities often stem from tiny sample sizes and experimental setups favoring the detection mechanism unrealistically. Even if accurate, an 80% accuracy is insufficient in scenarios with thousands of executables, risking many benign applications being misclassified as malware. They also questioned the causal link between low-level micro-architectural events and high-level software behavior. Lastly, they illustrated the hardware-based detector's inability to distinguish ransomware embedded in a benign application like Notepad++. In a recent contribution, [55] acknowledged the absence of a perfect malware detector and argued that hardware-based detection is only effective for specific malware types. In particular, [55] proposes its effectiveness in identifying attacks exploiting architectural side-effects, citing examples such as RowHammer [56], [57] (detectable through excessive cache flushes [58]), ROP attacks [37] (identified by an abundance of instruction misses [59]), and DirtyCoW [60] (detectable through heightened paging activity). The authors also emphasized the necessity for a maliciousness theory to enhance the understanding of malware threats and assess proposed defenses.
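The concern about 80% accuracy can be made concrete with a short base-rate calculation. The numbers below are hypothetical and not taken from [53] or [54]: the sketch assumes 10,000 executables, a 1% malware prevalence, and that the 80% figure applies equally to the malware class (TPR) and the benign class (TNR).

```python
from typing import Tuple


def alarm_breakdown(n_total: int, prevalence: float,
                    tpr: float, tnr: float) -> Tuple[float, float, float]:
    """Expected true alarms, false alarms, and resulting precision."""
    n_malware = n_total * prevalence
    n_benign = n_total - n_malware
    true_alarms = n_malware * tpr          # malware correctly flagged
    false_alarms = n_benign * (1.0 - tnr)  # benign applications flagged
    precision = true_alarms / (true_alarms + false_alarms)
    return true_alarms, false_alarms, precision


# Hypothetical scenario: all four parameter values are assumptions.
true_alarms, false_alarms, precision = alarm_breakdown(
    n_total=10_000, prevalence=0.01, tpr=0.8, tnr=0.8)
```

Under these assumptions, roughly 1,980 benign applications are flagged against about 80 true detections, so fewer than 4% of the alarms correspond to actual malware.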
While HPCs have been used in the past for safety and security, performance analysis, and optimization [61]- [63], it is well-known that they may suffer from inconsistency in implementation, leading to non-determinism and overcounting [64]. Das et al. highlighted some of these HPC challenges in security [65]. Recent studies address HPC discrepancies, propose methodologies, analyze resilience, and compare HPCs in various machines [66]- [69]. Given that HPCs are hardware-based protections, detectors may be designed for specific devices with characteristics defined by the architecture and manufacturer. For instance, processors may track different numbers of events simultaneously, and discrepancies in instruction counting methods are possible [61]. These factors underscore the need for malware detection applications to abstract software from the hardware level.
To mitigate the inconsistencies and limitations of HPCs, some countermeasures can be deployed to stabilize the generated data [61], [65]. They include per-process filtering of events (applied by saving and restoring the counter values at context switches), proper interrupt handling, and minimizing the impact of non-deterministic events. In general, all works acknowledge that the evolution and improvement of the processors' hardware monitoring units also tend to reduce this issue. Finally, the classification task built on top of the HPC data is commonly an ML one. This frequently leads to techniques that increase the complexity of such algorithms, like ensemble learning and time series or even Deep Neural Networks (DNNs) [18].

IV. HARDWARE-BASED MALWARE DETECTION BASICS
This section focuses on HMD techniques, outlining their key components.

A. HARDWARE EVENTS AND PERFORMANCE COUNTERS
Modern processors have units to monitor hardware events. In 2002, Sprunt [70] published a seminal paper on the basics of Performance Monitoring Units (PMUs). These units were developed to collect data about the performance of applications, operating systems, and processors and to help programmers tune algorithms and codes. Software that dynamically adjusts to resource utilization would also benefit from the information collected. The proven advantages of utilizing the PMUs, the continuous improvements of these units, and their constant spread among different devices have led to their use for safety and security purposes [50], [52], [62], [63].
Nowadays, PMUs can monitor several hardware events (see Figure 2). Complex devices like high-end processors have hundreds of events to monitor. These events include retired instructions (branches, load, store, etc.), branch predictions, cache hits and misses, floating-point operations, hardware interrupts, elapsed core clock ticks, core frequency, and temperature. However, to minimize hardware complexity, only a few HPCs (e.g., 2 to 8 in high-end processors) are generally available, thus limiting the number of parallel events that can be monitored. Each HPC has an event detector and an associated counter [71].
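Because the number of events of interest usually exceeds the number of available HPCs, events must be collected in batches, one batch per application run. The sketch below illustrates this bookkeeping only; the event names and the `schedule_events` helper are hypothetical, and real PMUs may impose additional constraints on which events can share counters.

```python
from typing import List


def schedule_events(events: List[str], n_counters: int) -> List[List[str]]:
    """Split events into batches that fit the available counters.

    Each batch corresponds to one run of the monitored application.
    """
    if n_counters < 1:
        raise ValueError("at least one hardware counter is required")
    return [events[i:i + n_counters] for i in range(0, len(events), n_counters)]


# Illustrative event names; actual names depend on the processor and tool.
wanted = ["instructions", "branches", "branch-misses", "cache-references",
          "cache-misses", "loads", "stores", "page-faults", "cycles", "interrupts"]
batches = schedule_events(wanted, n_counters=4)
runs_needed = len(batches)
```

With ten events and four counters, three runs are needed, which hints at why monitoring many events conflicts with run-time detection.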

B. HARDWARE-BASED DETECTION FRAMEWORK
A generic framework can be a guiding structure to facilitate the implementation of HMD, as illustrated in Figure 3. The framework leverages the existing PMU within the processor and consists of two primary components: (i) data collection and preprocessing and (ii) malware detection. This section provides a detailed overview of the implementation process.
Data collection involves FE and Feature Selection (FS) [72], [73]. FE captures and stores HPCs in a vector space, enabling the FS to select a subset that efficiently describes the input data while minimizing noise and irrelevant variables, ensuring optimal prediction results. FE can occur in the time or event domain [70]. In the time-based domain, the application execution is periodically interrupted to record HPC values. Conversely, the event domain triggers interruptions based on specific events or a set number of executed instructions rather than regular intervals. In terms of strategies to perform FE, we envision four alternatives: (i) instrumenting the source code by employing a library, like PAPI [74]; (ii) developing a proprietary kernel module or driver, as performed in [34]; (iii) using an available utility that performs tasks mainly in the OS kernel, like PERF [75]; and (iv) using a microarchitectural simulator to model the processor as it executes the application, like gem5 [76] and GVSoC [77].
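As an illustration of time-based FE with an existing utility, the sketch below turns interval samples into per-interval feature vectors. It assumes the samples were produced by something like `perf stat -I 10 -x, -e instructions,branch-misses -- ./app` and that each CSV line starts with timestamp, counter value, unit, and event name; this field order and these flags should be checked against the local perf-stat manual. The sample text is made up for the example.

```python
import csv
from collections import defaultdict
from typing import Dict, List

# Made-up sample in the assumed CSV layout: time, value, unit, event, ...
SAMPLE = """\
0.010,1200,,instructions,10000000,100.00,,
0.010,35,,branch-misses,10000000,100.00,,
0.020,900,,instructions,10000000,100.00,,
0.020,<not counted>,,branch-misses,0,0.00,,
"""


def parse_intervals(text: str) -> Dict[float, Dict[str, int]]:
    """Group counter values by sampling timestamp."""
    samples: Dict[float, Dict[str, int]] = defaultdict(dict)
    for row in csv.reader(text.splitlines()):
        if len(row) < 4 or row[0].lstrip().startswith("#"):
            continue  # skip comments and malformed lines
        try:
            count = int(row[1])
        except ValueError:
            continue  # skip entries that carry no numeric count
        samples[float(row[0])][row[3]] = count
    return dict(samples)


def to_feature_vectors(samples: Dict[float, Dict[str, int]],
                       events: List[str]) -> List[List[int]]:
    """One vector per interval, in time order; missing events become 0."""
    return [[samples[t].get(e, 0) for e in events] for t in sorted(samples)]


vectors = to_feature_vectors(parse_intervals(SAMPLE),
                             ["instructions", "branch-misses"])
```

Filling missing counts with zero is a simplification chosen here for brevity; discarding incomplete intervals would be an equally reasonable choice.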
During FE, the sampling strategy is crucial. In the time-based domain, parameters such as period, frequency, or number of cycles determine when HPCs are sampled. In the event-based domain, sampling depends on the number of event or instruction occurrences. The chosen FE strategy influences these definitions. A proprietary kernel module or driver allows programmers to choose between time-based or event-based domains, set parameters for sampling triggering, and specify values. However, configurations are limited when tools like PAPI and PERF are used. Regarding sampling values, in time-based sampling, there is no fixed ideal period or frequency; it varies based on the experiment and goal. Hardware-based detection experiments typically use periods in the order of milliseconds or seconds. Striking a balance between low and high sampling frequencies is essential, considering the trade-off between computational processing, data quantity, and system effects.
FS offers multiple advantages, including addressing the Curse of Dimensionality in ML [78], enhancing data understanding, reducing computation requirements, and improving predictor performance. Filter-based algorithms dominate the FS in the HMD field, ranking features based on a scoring criterion and using a threshold for variable selection. They are valued for simplicity and practical application success, focusing on the relevancy of features. Prominent methods include Principal Component Analysis (PCA) (used by [47], [53], [79], [80]), Fisher Score [81] (used by [34], [51]), Pearson Correlation Coefficient [82] (used by [46]- [48], [79], [80]) and Information Gain (Mutual Information) [83] (used by [84], [85]). The Scikit-learn [86] library for Python and Weka [87] are tools frequently used in the HMD field for FS.
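A filter-based FS step of the kind described above can be sketched with the Pearson Correlation Coefficient as the scoring criterion. This is an illustrative implementation, not the procedure of any cited work: the function name `pearson_filter`, the threshold value, and the synthetic data are assumptions.

```python
from typing import List, Tuple

import numpy as np


def pearson_filter(X: np.ndarray, y: np.ndarray,
                   threshold: float) -> Tuple[List[int], np.ndarray]:
    """Rank features by |Pearson correlation| with the label.

    Returns the indices of features scoring at least `threshold`,
    best first, together with all scores.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom = np.where(denom == 0, np.inf, denom)  # constant features score 0
    scores = np.abs(Xc.T @ yc / denom)
    ranked = np.argsort(-scores)
    return [int(i) for i in ranked if scores[i] >= threshold], scores


# Synthetic data: rows are HPC samples, columns are hardware events.
idx = np.arange(100)
labels = (idx % 2).astype(float)                  # 0 = benign, 1 = malware
features = np.column_stack([
    3.0 * labels + np.sin(idx),                   # informative event
    (idx % 5).astype(float),                      # unrelated event
    np.full(100, 7.0),                            # constant event
])
selected, scores = pearson_filter(features, labels, threshold=0.5)
```

Only the first column survives the threshold in this synthetic case; with real HPC data the threshold is typically tuned, or the top-k features matching the number of available counters are kept.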
Since the number of events that can be potentially monitored exceeds the available HPCs, some studies (for example, [14], [32], [48]) also perform a preliminary manual FS before data collection, thus reducing the number of software executions required to collect data. The selection is based on architectural and micro-architectural knowledge and other studies.
In HMD, ML algorithms play a crucial role. Supervised and unsupervised learning techniques are employed in hardware-based malware detection. While supervised detection requires both benign and malicious samples, adequately annotated, in unsupervised malware detection the classifier is trained only with benign applications to perform anomaly detection [88]. Unsupervised detection has two notable advantages: (i) it does not require a malware dataset for training, and (ii) the classifier can detect zero-day malware [18]. On the other hand, unsupervised algorithms are complex, requiring more sophisticated analysis and resulting in complex hardware implementations.
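The idea of training only on benign applications and flagging deviations can be illustrated with a very simple anomaly detector. The sketch below uses a per-feature z-score as a stand-in for the unsupervised algorithms used in the literature; the `BenignProfile` class, the threshold, and the sample values are all hypothetical.

```python
import numpy as np


class BenignProfile:
    """Z-score anomaly detector fitted on benign HPC vectors only."""

    def fit(self, benign: np.ndarray) -> "BenignProfile":
        self.mean = benign.mean(axis=0)
        self.std = benign.std(axis=0) + 1e-9  # avoid division by zero
        return self

    def score(self, sample: np.ndarray) -> float:
        # Largest deviation from the benign mean, in standard deviations.
        return float(np.max(np.abs((sample - self.mean) / self.std)))

    def is_anomalous(self, sample: np.ndarray, threshold: float = 4.0) -> bool:
        return self.score(sample) > threshold


# Hypothetical benign samples: [branch-misses, cache-misses] per interval.
benign = np.array([[100, 50], [102, 49], [98, 51], [101, 50], [99, 50]],
                  dtype=float)
profile = BenignProfile().fit(benign)

typical = np.array([101.0, 50.5])
unusual = np.array([130.0, 50.0])
flag_typical = profile.is_anomalous(typical)
flag_unusual = profile.is_anomalous(unusual)
```

No malware sample is needed to fit the profile, which mirrors the two advantages listed above; the price is that any unusual but benign behavior is also flagged.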
Finally, a crucial consideration is the trade-off between monitoring more events for better application characterization and detector performance and the impact on runtime applicability. Some studies used many events, exceeding available HPCs, necessitating multiple application runs [15], [84], [89]. This trade-off is further addressed in ML solutions discussed in Section V-C.

V. HARDWARE-BASED DETECTION ASSESSMENT
The following sections analyze the performance and efficiency of the state-of-the-art in HMD and explore ML techniques to enhance detector performance.

A. PERFORMANCE
Tables 4 and 5 provide a comprehensive overview of the literature contributions in the field, aiming to facilitate fair comparisons by presenting the best-case results in Table 4. Metrics were directly sourced from the papers' text whenever feasible, with manual extraction from reported ROC curves employed only when necessary. The "Classification" column denotes the classification algorithm associated with the best result, with the Weka implementation serving as a reference. Conversely, Table 5 outlines, for each contribution in Table 4, the range of considered scenarios in terms of malware, classifiers, and system characteristics. The values in Table 4 underscore the efficacy of HMD in supporting malware detection and highlight the overall high quality of the findings.
Among all contributions reported in Table 4, the authors of [93] showcase the effectiveness of HMD in real scenarios: DARPA Rapid Attack Detection, Isolation and Characterization Systems (RADICS), Intel Threat Detection Technology (TDT), and Microsoft Defender. This is a tangible deployment of HMD in actual products. Still, using a single type of classifier (i.e., SVM) leaves room for research and improvements.
As most current works on HMD rely on ML classifiers, the analysis conducted by Patel et al. [48], summarized in Table 6, is particularly interesting. The authors thoroughly analyze eleven ML classification algorithms (based on Weka [87] implementations). The goal was to understand the trade-offs between the design parameters offered by the algorithms. The chosen metric to evaluate performance was accuracy. The dataset used for training and testing the algorithms was collected with the PERF tool at intervals of 10 ms on an Intel Haswell Core i5-4590 processor running Ubuntu 14.04 with Linux kernel 4.4. The set of benign applications comprises the MiBench benchmark suite [94], Linux system programs, browsers, text editors, and word processors. The malware came from the VirusTotal dataset. Since the number of HPCs available in an Intel architecture is considerable, the accuracy of the ML algorithms was evaluated for different numbers (i.e., 32, 8, 4, 2, and 1) of hardware events. Table 6 reports the accuracy for 4 hardware events, a reasonable quantity for concurrent monitoring in most modern processors, even in embedded scenarios [50]. JRIP (rule-based) presented the top accuracy, followed by four classifiers tied for the second-highest accuracy: J48 (decision tree), OneR and PART (rule-based), and SGD. In this case, most classifiers have accuracy above 80%. Another interesting observation is that reducing the number of hardware events below four significantly impacts the performance of most classifiers.
Similar findings are reported by Torres and Liu [51]. While the authors concentrated on a particular malware subclass (data-only exploits from [92]), they ran two experiments on different classifiers, one using the complete set of 50 features and one using a smaller set of 6 features. The findings report a very high accuracy on the complete set of features (shown in the first of the two rows dedicated to the paper in Table 4) and a degradation when only the subset is used.

B. EFFICIENCY
Alongside detection quality, HMD aims to reduce the detector's cost in terms of resources. As the data required for the classification come from the hardware layer of the system stack, most studies evaluate FPGA-based implementations of ML classifiers and report power consumption and area, as the goal is to understand the trade-offs between the design parameters offered by the algorithms. When the classifier is software-based, the evaluation usually includes only the latency, without monitoring other resources. Unfortunately, as seen in Table 4, not all works report the detection latency or, more generally, its costs. Generally, whenever the detection is performed at the software level, the latency is less than 1 ms. In contrast, more optimized hardware implementations can scale down to tens or hundreds of ns.
As reported in the previous section, the work of Patel et al. [48] provides a thorough analysis and is therefore a good candidate to illustrate the efficiency of the methodology. For the hardware implementation, the authors used the Xilinx Virtex 7 FPGA, implemented the Weka models in C code, and used the Xilinx High-Level Synthesis (HLS) compiler to generate the final bitstream. The latency was evaluated for both software and hardware implementations. For the software case, the authors implemented the classification algorithms at the OS kernel level, and the measured latency includes the time to read the HPCs and execute the classifiers. In addition, the Intel Turbo Boost technology was disabled, as it might introduce errors in the time measurement, and the CPU governor was set to a constant frequency of 800 MHz. The IP cores with the algorithms were synthesized in Vivado to estimate the power consumption, considering a 100 MHz clock. The power estimate includes both the static and the dynamic power consumption of the digital logic.
The values in Table 6 show the considerable difference between the latencies of software and hardware implementations. Software implementations have latencies close to the order of milliseconds (ranging from 0.624 ms to 0.870 ms in the best and worst cases). In contrast, hardware implementations are in the order of nanoseconds (ranging, in this case, from 10 ns to 3020 ns). The authors underlined that the latencies displayed by classifiers in kernel space are three orders of magnitude larger than the execution times of several malware samples (which are in the range of microseconds). Other findings related to latency are worth highlighting. In software implementations, the latency for reading the HPCs is negligible when monitoring a single core but may increase significantly when monitoring multiple cores. Moreover, the more HPCs there are to read, the longer it takes. Concerning the classification algorithms, BayesNet (Bayesian network), PART (rule-based), and SimpleLogistic (logistic regression) showed the lowest latency values when implemented in software. Conversely, none of these three is among the three lowest latencies in hardware, where NaiveBayes (Bayesian network), MLP (ANN), and J48 (decision tree) are the three best implementations. This apparent paradox shows the lack of correlation between an algorithm's latency in kernel space and its latency in hardware.
TABLE 4. Summary of best-case performance from main studies in the hardware-based malware detection approach. The # HPCs column refers to the number of hardware events the classifiers consider. Classification algorithm labels are based on the Weka implementations used in the referenced studies. Evaluation metrics are as defined in Section III-A: A is Accuracy, P is Precision, S is Specificity, and F1 is the F1-Score.

C. MACHINE LEARNING TECHNIQUES CONSIDERATIONS
Recent studies, especially in the last five years, have explored various ML methods to enhance the performance of HMD approaches. These techniques aim to overcome the limited application characterization caused by the small number of hardware events that PMUs can monitor concurrently. While these methods show performance improvements, they often increase the complexity of the classifiers, resulting in reduced efficiency, i.e., higher power consumption and increased area requirements. This section discusses ensemble learning, specialization, adaptive detection, and time series ML techniques in HMD.
In ensemble learning, multiple ML algorithms are trained separately and their results are combined into a single classifier to improve decision accuracy [95]. In HMD, ensemble classifiers leverage the characteristics of individual algorithms to detect various types of malware while minimizing the number of hardware events needed for runtime detection [32], [46], [47]. However, the performance gains come with increased complexity and efficiency overhead [46], [79]. Sayadi et al. [46] assessed the efficiency impact of ensemble learning in a malware detector on a Xilinx Virtex 7 FPGA. Significant latency increases were observed when comparing a general classifier with 8 HPCs to a Boosted classifier [96] with 4 HPCs. When Boosted, the general MLP algorithm went from a latency of 3020 ns to 5910 ns, OneR increased from 10 ns to 700 ns, and J48 increased from 90 ns to 670 ns. In terms of hardware cost, the largest area increases were observed in OneR (from 2.1% to 5.1%), JRIP (from 2.5% to 5.3%), and BayesNet (from 11.5% to 13.6%). Conversely, J48, REPTree, and MLP showed smaller area increases. The findings highlight substantial overhead in both latency and hardware costs.
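The combination step of an ensemble can be sketched with a simple majority vote over independently built base detectors. Note that this is plain voting, a deliberately simpler scheme than the Boosted classifiers evaluated in [46]; the base detectors, event names, and thresholds are hypothetical and serve only to show how several weak decisions are merged into one.

```python
def majority_vote(detectors, sample):
    """Return True when more than half of the base detectors flag the sample."""
    votes = sum(1 for detect in detectors if detect(sample))
    return votes * 2 > len(detectors)

# Hypothetical base detectors, each a threshold rule on hardware event counts.
base_detectors = [
    lambda s: s["branch_misses"] > 150,
    lambda s: s["cache_misses"] > 60,
    lambda s: s["branch_misses"] + s["cache_misses"] > 200,
]

suspicious = {"branch_misses": 160, "cache_misses": 50}  # flagged by 2 of 3 rules
ordinary = {"branch_misses": 100, "cache_misses": 70}    # flagged by 1 of 3 rules

print(majority_vote(base_detectors, suspicious))
print(majority_vote(base_detectors, ordinary))
```

Every base detector must be evaluated before the vote, which gives an intuition for the latency and area overheads reported for ensemble detectors.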
Another interesting ML technique is specialization. Instead of training a single multi-class classifier able to recognize several malware categories, different classifiers are trained, each specialized in detecting a specific malware class. The authors of [32] discuss and explore specialized detectors in HMD. They used a logistic regression-based classifier for each malware class. As a result, the proposed detectors reduced the false positive rate by more than half compared to a single detector while increasing the detection rate. In the same paper, the authors proposed a two-level detector, combining a first level based on hardware detection with a second level based on software detection. The hardware detector was based on specialized ensemble techniques. The latency of this scheme was compared with that of malware detection based purely on software methods. As a result, they reported an average latency reduced to 1/6.6 when the fraction of malware is low and to 1/3.1 when 20% of the programs are malware.
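The reason the latency benefit of a two-level scheme shrinks as more programs are flagged can be illustrated with a simple expected-latency model. This is an assumed back-of-the-envelope model, not the evaluation method of [32]: it supposes that every program pays the hardware check and that only the forwarded fraction also pays the software check. The latency values are hypothetical and do not reproduce the reported 1/6.6 and 1/3.1 figures.

```python
def two_level_latency(t_hw, t_sw, forwarded_fraction):
    """Expected latency when only a fraction of programs reaches the software level."""
    return t_hw + forwarded_fraction * t_sw

def speedup_vs_software_only(t_hw, t_sw, forwarded_fraction):
    """How many times faster the two-level scheme is than software-only detection."""
    return t_sw / two_level_latency(t_hw, t_sw, forwarded_fraction)

# Hypothetical latencies in arbitrary time units.
T_HW, T_SW = 1.0, 100.0

low_load = speedup_vs_software_only(T_HW, T_SW, forwarded_fraction=0.1)
high_load = speedup_vs_software_only(T_HW, T_SW, forwarded_fraction=0.3)
print(low_load, high_load)
```

In this model the speedup drops as the forwarded fraction grows, since more programs incur the expensive second level.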
In 2019, Sayadi et al. [47] introduced a specialized two-
Adaptive detection was proposed by Gao et al. [79] to optimize the trade-off between performance and cost. It targets performance similar to or higher than that of ensemble learning at a reduced cost. The technique leverages the observation that the ML algorithm employed in the detector is strongly correlated with both the nature of the scrutinized malware and the overall performance metric. Adaptive detection involves a dynamic framework that assesses all underlying ML algorithms in real time and selects the optimal classifier to identify malicious patterns effectively. The implementation encompasses two primary online stages: (i) algorithm selection and (ii) malware detection. Consequently, only the most efficient ML-based detector is employed to differentiate malware from the benign class, eliminating the need to acquire results from the individual base detectors and enhancing overall efficiency.
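The two online stages can be sketched as a selector followed by a single detector call. This is an illustrative sketch only: the selection rule, the model names, and the thresholds are hypothetical stand-ins and do not reproduce the framework of [79].

```python
# Hypothetical pool of base detectors, keyed by name.
DETECTORS = {
    "branch_model": lambda s: s["branch_misses"] > 150,
    "cache_model": lambda s: s["cache_misses"] > 60,
}

def select_detector(sample):
    """Stage (i): a lightweight, hypothetical rule picks which model to run."""
    if sample["branch_misses"] >= sample["cache_misses"]:
        return "branch_model"
    return "cache_model"

def adaptive_detect(sample):
    """Stage (ii): run only the selected detector and return its decision."""
    name = select_detector(sample)
    return name, DETECTORS[name](sample)

print(adaptive_detect({"branch_misses": 200, "cache_misses": 30}))
print(adaptive_detect({"branch_misses": 20, "cache_misses": 30}))
```

Unlike an ensemble, only one base detector is evaluated per sample, which is where the cost saving comes from.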
In the adaptive detector proposed by Gao et al. [79], the algorithm selection step is performed by a lightweight tree-based decision-making algorithm that accurately selects the most efficient model for inference. As a result, the scheme showed up to a 94% detection rate while improving the cost-efficiency by more than 5X compared to existing ensemble-based malware detection methods.
Finally, time series classification is fundamental to understanding the key concept behind hardware-based malware detection. The intuition driving this technique stems from the program's phase behavior, which transforms malware detection into a time series classification problem. In addressing this challenge, Sayadi et al. [97], [80] introduced a time series ML technique designed to identify stealthy malware in real time. In scenarios where attackers embed malicious files within benign programs on target hosts, executing both applications as a single thread, traditional signature-based antivirus tools falter. Embedded malware remains elusive even when the exact malware signature is in the detector database. The authors proposed a classifier based on a Fully Convolutional Neural Network (FC-NN) and exclusively used branch instructions as a low-level feature in their solution. The results demonstrated the efficacy of their technique, achieving an average detection performance of 94% with only one HPC feature and surpassing state-of-the-art detection methods. This enhanced performance, however, comes at the higher computational cost associated with a deep-learning-based solution.
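Treating detection as a time series problem means that the classifier consumes a sequence of counter readings rather than a single sample. The sketch below shows only the input preparation, namely cutting one event trace into overlapping fixed-length windows; the classifier itself is not implemented here, and the trace values, window length, and stride are hypothetical.

```python
def sliding_windows(trace, window, stride):
    """Cut a single-event counter trace into overlapping fixed-length windows."""
    last_start = len(trace) - window
    return [trace[i:i + window] for i in range(0, last_start + 1, stride)]

# Hypothetical branch-instruction counts, one value per sampling interval.
branch_trace = [120, 118, 125, 300, 310, 305, 121, 119, 122, 117]

windows = sliding_windows(branch_trace, window=4, stride=2)
for w in windows:
    print(w)
```

Each window would then be labeled and passed to a time series classifier, so that a short malicious phase embedded in a benign run still appears in at least one window.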
Although it does not explicitly implement a time series technique, [93] also reports similar results on the Intel TDT use case. Although no specific numbers are provided, the paper compares the Fast Fourier Transform (FFT) of the counting traces of the branch instruction and branch misprediction events for the WannaCry ransomware, underlining the significant difference between executions with and without the ransomware.
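A frequency-domain comparison of two counter traces can be sketched as follows. The code uses a naive discrete Fourier transform, which yields the same spectrum an FFT would compute, only more slowly; the two traces are synthetic and hypothetical, not data from [93] or from WannaCry.

```python
import cmath

def dft_magnitudes(trace):
    """Magnitude spectrum of a counter trace via a naive O(n^2) DFT."""
    n = len(trace)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(trace)))
        for k in range(n)
    ]

# Synthetic traces: a steady workload and one with a strong periodic pattern.
steady = [10] * 8
periodic = [10, 20] * 4

steady_spectrum = dft_magnitudes(steady)
periodic_spectrum = dft_magnitudes(periodic)

# The periodic trace shows energy at a non-zero frequency bin that the steady one lacks.
print(steady_spectrum[4], periodic_spectrum[4])
```

A repetitive activity pattern shows up as energy at non-zero frequencies, which is the kind of difference a spectral comparison makes visible.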

VI. CONCLUSIONS AND RESEARCH CHALLENGES
In summary, this paper provided a comprehensive overview of the HMD field, with a detailed analysis of hardware-based detection that harnesses the power of HPCs and ML. The advantages of HMD include resilience to malware subverting the protection mechanism, adaptability to code variants and unknown malware, low complexity and overhead, potential for run-time detection, and cost reduction.
However, challenges persist in HMD. Detection accuracy is the most significant challenge, as classifiers have a statistical nature. Thus, their results are not deterministic, and ongoing research aims to minimize errors by exploring more complex classifiers. In cases where high accuracy is unattainable, a potential solution combines software and hardware-based detectors concurrently, with hardware as the primary defense. Moreover, ensuring consistency, accuracy, and standardization of hardware monitoring units (including HPCs) is crucial for trustworthiness. Chip manufacturers can contribute by designing appropriate modules and providing comprehensive documentation. The limited number of HPCs in mobile and IoT devices poses a feasibility challenge for this approach in these domains. Addressing these challenges will contribute to the continued advancement and effectiveness of HMD.

FIGURE 1. Overview of the contemporary solutions for malware detection. Elaborated by the authors based on [40].
True Positives (TPs) represent instances where the model correctly predicts malware presence, and True Negatives (TNs) indicate correct predictions of malware absence. In contrast, False Positives (FPs) and False Negatives (FNs) denote incorrect predictions of malware presence or absence, respectively.
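The four counts above are the inputs to the evaluation metrics used throughout the survey. The sketch below uses the standard textbook definitions of accuracy, precision, recall, specificity, and F1-score, which are assumed here to match the paper's own definitions; the function name and the example counts are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard metrics derived from confusion-matrix counts.

    Assumes the denominators are non-zero (at least one predicted positive,
    one actual positive, and one actual negative).
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }

# Hypothetical counts: 100 malware samples and 100 benign samples.
result = classification_metrics(tp=90, tn=80, fp=20, fn=10)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```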

FIGURE 2. Hardware events and performance counters in a processor. Elaborated by the author.

FIGURE 3. A generic hardware-based detection framework. Elaborated by the author.

TABLE 1. Malware categories based on propagation method, concealment strategy, and data structure manipulation.

TABLE 2. Confusion matrix for malware detection.

TABLE 3. Most common metrics for performance evaluation of classification.

TABLE 5. Reference studies, including details on the full list of targets and classification approaches tested and on the reference systems.