Hardware Trojan Detection Using Controlled Circuit Aging

This paper reports a novel approach that uses transistor aging in an integrated circuit (IC) to detect hardware Trojans. Aging a transistor increases the delay along several paths of the IC. This increase in delay causes timing violations that manifest as timing errors at the output of the IC during its operation. We present experiments using aging-aware standard cell libraries to illustrate the usefulness of the technique in detecting hardware Trojans. Combining IC aging with over-clocking induces timing violations that produce a pattern of bit errors at the IC output. We use machine learning to learn the bit error distribution at the output of a clean IC, and we detect a Trojan by the divergence of its bit error pattern from this baseline distribution. We simulate the golden IC and show robustness to IC-to-IC manufacturing variations. The approach is effective and can detect a Trojan even if it is placed far off the critical paths. Results on benchmarks from Trust-Hub show a detection accuracy of $\geq$99%.


I. INTRODUCTION
Manufacturing of ICs is expensive and requires special fabrication equipment that becomes outdated in a short time.
To reduce the cost of IC manufacturing, this task is typically outsourced to offshore IC foundries. In a related trend, embedded systems source specialized intellectual property (IP) cores from different vendors. Design and assembly of the IP cores is done using third-party computer-aided design, integration, and test tools. As an IC design travels through the complicated supply chain, the IC could be corrupted at one of the stages. Example threats due to the corruption include passing off low quality ICs as good, infiltrating the supply chain with imitation ICs, and insertion of Hardware Trojans in ICs. Hardware Trojans, which are typically triggered by rare events, may alter function, deny service, or leak information. Such infected ICs affect the critical information systems in finance, military, and health care. Functional and structural tests to weed out manufacture-time defects are ineffective against Trojans for the following reasons: 1) Structural tests produced by automatic test pattern generation (ATPG) may not detect a Trojan, since the behaviour of the Trojan may not be in the fault list [1]. 2) Functional tests do not uncover Trojans since they trigger on rare events. 3) A brute force application of all inputs does not scale. For example, a 64-bit circuit will need $2^{64}$ inputs. Reverse engineering can authenticate the IC but it does not guarantee that the unchecked ICs do not have Trojans [2].
This work was supported in part by the Office of Naval Research under Grant N00014-18-1-2672. Contact: kanad.basu@utdallas.edu
There are different types of Trojans, depending on their functionality: some leak information, some change function, and some drain power. Therefore, a robust approach is needed to detect Trojans. Prior work [3]-[7] has considered side channel and ATPG test pattern analysis. Power side channel fingerprinting is not a reliable method for detecting Trojans within a large circuit. In classic VLSI test, even if the test patterns cover all corner cases, they may not trigger a Trojan. For example, the trigger could be inputs applied in a particular sequence, which is extremely challenging to replicate. We explore aging for Trojan detection. With transistor aging, we do not need the Trojan to be triggered because the underlying aging mechanism occurs naturally during the circuit's operation. Our results (see Section VII) show that aging can detect small Trojans (occupying 0.22% of the circuit), even when the Trojan is about 4000 paths away from the critical path.
In this work, transistor aging along with over-clocking is used to expose the Trojan effects on various circuit properties. When aging and over-clocking are jointly applied to an IC, they produce a pattern of bit errors at the output. The output bit error patterns of a clean IC are used to train a Support Vector Machine that learns the clean IC output pattern distribution and determines the presence of a Trojan at test time. The efficacy of our approach is shown using gate-level simulations, with aging-aware standard cell libraries, of Advanced Encryption Standard (AES) and Rivest-Shamir-Adleman (RSA) crypto circuits containing different Trojans that pose different challenges. The method applies to any circuit, but crypto circuits are used for secure data transmission and hence are subject to attacks. The Trojans appear at different locations corresponding to the rank of the critical path. The experimental results show an accuracy of over 99% on all the circuits when considered with varying number of inputs.
The paper is organized as follows. Section II discusses prior work on Trojans. Section III describes transistor aging and its effects. Section IV overviews the creation of aging-aware standard cell libraries. Section V introduces our approach and Sections VI and VII outline the experimental setup and results. Section VIII draws the conclusions.

II. RELATED WORK
A. Hardware Trojans
Globalization has led to a distributed IC manufacturing environment. The globally distributed IC design cycle has introduced many vulnerabilities, including Hardware Trojans. A Hardware Trojan is a malicious modification to the circuit, which is unknown to the designer and can have consequences like incorrect functionality, loss of secret information, etc. Hardware Trojans are critical threats to military, finance, transportation and corporate or consumer electronics [8]. A Hardware Trojan has two parts: trigger and payload. The trigger is a rarely activated signal in the circuit that activates the Trojan. The payload is the effect of the Trojan. Because the trigger is rarely activated, the payload is dormant during normal function of the circuit. Hence, the Trojan is difficult to detect. Trojans can be classified based on five attributes: insertion phase, abstraction level, location, trigger and payload [9], [10].
Trojans can be inserted in various stages of a design flow. Semiconductor companies use third party EDA tools, third party IPs (3PIPs), and untrusted foundries. Insertion of Trojans at various stages of EDA design flow has been demonstrated by [11]. Insertion of Trojans during High-level Synthesis was proposed by [12]. [13] designed a malicious processor by modifying the open source Leon processor. The Trojans allow a user to violate Operating System exceptions and execute a malicious firmware. Don't care states in a design were utilized to trigger Hardware Trojans by [14]. [15] triggers Trojans by exploiting silicon wear-out.

B. Trojan Detection
Trojan detection methods are usually applied either at the design stage (pre-silicon) or at the post-manufacturing stage (post-silicon) [16]. Pre-silicon detection approaches are used to validate 3PIP cores before integrating them into a design. Pre-silicon verification is performed using functional validation, structural analysis or formal verification. Functional validation methods use functional tests to activate a Trojan and validate the response against a "golden" Trojan-free circuit response. Since Trojan triggers are rarely turned on, researchers have developed test generation techniques that can activate those rare triggers [3], [4]. However, functional tests fail to detect a non-functional Trojan, which does not alter the function of the circuit and transmits secret data. Structural analysis involves identifying redundant statements and circuits in the HDL code [15], [17].
Post-silicon Trojan detection involves either destructive or non-destructive testing. Destructive testing implies reverse-engineering and de-layering to detect the presence of malicious circuitry [16]. Although this approach is costly, time consuming and renders the IC useless, it guarantees Trojan detection in that single IC. Non-destructive methods use functional testing (similar to pre-silicon Trojan detection) and side-channel analysis. The value of a side-channel parameter will differ between a Trojan-activated circuit and a "golden" circuit. In [18], path delay information of the IC at each output is considered to generate a path delay fingerprint. The path delay fingerprints help distinguish between a clean IC and a Trojaned IC. Since modern ICs contain millions of paths, it is not practical to measure the delay of all the paths. Also, the method does not work well for Trojans that leak information through side channels. Temperature tracking using on-chip thermal sensors during run-time is an option [6], [19], [20]. However, if the Trojan activity lasts for a short duration, the slow rate of thermal variation cannot reveal the Trojan. Furthermore, an adversary can place the Trojan in an active area of the chip, which makes Trojan detection difficult. In [5], leakage current measurement detects Trojans since additional gates consume extra leakage power. In [7], Picosecond Imaging Circuit Analysis (PICA) is used to measure optical emissions of the ICs and compare them with a trusted emission image of a "golden" IC. Both methods (i.e., power and radiation) require access to a "golden" IC, and as the feature size of ICs shrinks, the deviation from the "golden" IC due to process variations becomes pronounced, masking the deviations introduced by the Trojans. Our method does not need a "golden" IC. We simulate the circuit for the golden IC and show robustness of the technique to IC-to-IC manufacturing variations.
The approach is aging-based, non-destructive and can detect a Trojan even if it is dormant, removing the above limitations.

III. TRANSISTOR AGING
Semiconductor technology has advanced to the nanometer regime, wherein electric fields become stronger with every new generation to allow the transistor to switch faster. When an IC is turned ON, the Bias Temperature Instability (BTI) effect comes into play and increases the threshold voltage of transistors by $\Delta V_{th}$. The magnitude of $\Delta V_{th}$ depends on the supply voltage ($V_{dd}$). A high $V_{dd}$ causes a large increase in $V_{th}$, thus increasing the delay of circuit paths and leading to a noticeable performance degradation. Modern ICs use fast voltage regulators (switching between voltage levels in less than a microsecond) to implement effective power management schemes in which the overhead of voltage scaling is minimized. The high frequency of voltage switching does not allow the degradation accumulated at high $V_{dd}$ to settle down (i.e., to recover) at low $V_{dd}$. This causes transient timing errors that persist until the defects generated by BTI aging at the high $V_{dd}$ partially or fully recover. This is known as short-term aging [21]. Switching voltage from low $V_{dd}$ to high $V_{dd}$ does not cause aging effects, as the degradation at low $V_{dd}$ is smaller and the circuits are more robust at high $V_{dd}$.

A. Effects of Transistor Aging
As technology scaling approaches its limits, displacing even a few atoms in a transistor during operation is akin to aging and can endanger its key electrical properties. The key aging phenomena are Negative and Positive Bias Temperature Instabilities (NBTI and PBTI), with a potential to degrade the switching speed of pMOS and nMOS transistors, respectively. BTI occurs when a vertical electric field is applied to the transistor: some of the minority carriers that are attracted to form the transistor's channel may combine with the available Si-H bonds at the Si-SiO$_2$ interface layer, resulting in interface traps. Some of these carriers may move into the transistor's dielectric due to quantum tunneling and be captured by the oxide vacancies, resulting in oxide traps. These defects interfere with the applied electric field and weaken it. As a result, the transistor can switch from the OFF to the ON state only at a higher gate voltage than the fresh device (i.e., in the absence of aging). Hence, the threshold voltage ($V_{th}$) of the transistor increases. In addition, the generated interface traps reduce the mobility of carriers ($\mu$) as they move from source to drain due to Coulomb scattering.
Aging-Induced Timing Errors: The delay of a transistor is inversely proportional to its current in the ON state ($I_{ON}$). $I_{ON}$ is a function of the threshold voltage and the carrier mobility as in Eq. 1 [22]. An increase in $V_{th}$ plus a decrease in $\mu$ due to aging reduces $I_{ON}$ and increases transistor delay.
(1) where $C_{ox}$, $W$, and $L$ are the oxide capacitance, width, and length of the transistor, and $V_{dd}$, $V_{th}$, and $\mu$ are the operating voltage, threshold voltage, and carrier mobility, respectively.
Aged transistors slow down, increasing the likelihood of timing violations and errors in circuits, as shown in Eq. 2, if no (or insufficient) timing guard band is included. This is because the switching frequency becomes unsustainable, causing timing violations in critical paths that propagate to the outputs and manifest as errors.
The Hidden Impact of Voltage Scaling: From Eq. 1, the impact of aging-induced $\Delta V_{th}$ largely depends on the operating voltage ($V_{dd}$). Therefore, the same increase in $V_{th}$ due to aging (e.g., 25 mV) can result in a much larger degradation in $I_{ON}$, and thus in the transistor speed, when $V_{dd}$ is scaled down. The timing errors that the circuit will exhibit under aging effects depend on both the aging-induced degradation ($V_{th}$, $\mu$) and the operating voltage ($V_{dd}$). The combination magnifies the impact of aging and shifts the aging problem from a purely long-term degradation (i.e., a degradation that may take months to cause timing errors in circuits) to a short-term degradation (i.e., a degradation that might need merely hours to cause timing errors). Such a magnification of the impact of aging suggests using it as a knob to detect Trojans. In this study, we consider the relatively longer-term aging (at multiple controllable levels of aging) that can be induced by the various aging effects discussed above.
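The dependence on the operating voltage can be illustrated numerically. The following is a minimal sketch, assuming that the ON current follows the standard long-channel square-law form, proportional to $(V_{dd} - V_{th})^2$; this form, the fresh threshold voltage of 0.40 V, and the scaled supply of 0.8 V are assumptions made for illustration only. The 25 mV shift is the example value mentioned in the text, and mobility degradation is ignored.

```python
# Sketch: relative loss in ON current for the same threshold-voltage shift
# at two supply voltages.
# ASSUMPTION: I_on is proportional to (Vdd - Vth)^2 (square-law model);
# the prefactor (mobility, oxide capacitance, W/L) cancels in the ratio.

def relative_ion_loss(vdd: float, vth_fresh: float, delta_vth: float) -> float:
    """Fraction of ON current lost when Vth rises by delta_vth at supply vdd."""
    fresh = (vdd - vth_fresh) ** 2
    aged = (vdd - vth_fresh - delta_vth) ** 2
    return 1.0 - aged / fresh

VTH_FRESH = 0.40   # volts, HYPOTHETICAL fresh threshold voltage
DELTA_VTH = 0.025  # volts, the 25 mV example shift from the text

if __name__ == "__main__":
    for vdd in (1.1, 0.8):  # 0.8 V is a HYPOTHETICAL scaled-down supply
        loss = relative_ion_loss(vdd, VTH_FRESH, DELTA_VTH)
        print(f"Vdd = {vdd:.1f} V: I_on loss = {100 * loss:.1f}%")
```

Under these assumptions, the same 25 mV shift removes a noticeably larger fraction of the ON current at the lower supply voltage, which is the magnification effect described above.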

B. Combining Circuit Aging With Over-Clocking
Aging alone does not create sufficient delay in a circuit path for the timing errors to propagate to the output. To check this hypothesis, we generated the outputs for the clean and Trojan-inserted ICs using a nominal clock (i.e., without over-clocking) with different states of aging (i.e., with different aging-induced $\Delta V_{th}$). The difference between the outputs in the two cases is insufficient to detect a Trojan.
Similarly, over-clocking alone does not create significant bit-error patterns at the output. When the Trojan is placed far from the critical path, over-clocking does not generate a distinguishable signature of errors. We apply an ML classifier to over-clocking-only data to see if it can detect a Trojan on the critical path. The results show a false negative rate ≥50%. Windowing over multiple input vectors and majority voting during testing do not help: the false negative rate increases when multiple input vectors are used for testing.
Combining transistor aging with over-clocking generates a distinguishable signature of timing errors at the IC output and this can be used to detect Trojans with a high accuracy. Implementing and demonstrating this idea is the key novel contribution of this paper.
In a nutshell, the delay caused due to aging alone is insufficient to produce bit errors at the output of an IC. Also, ICs include a timing guard band to ensure reliability. By overclocking, we either narrow or remove the timing guard band. Over-clocking in combination with aging creates a robust signature of bit-error patterns at the output.
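The interplay between guard band, aging, and over-clocking can be seen in a toy timing check. This is a minimal sketch with entirely hypothetical numbers: the path delays, the 8% aging slowdown, and both clock periods are illustrative, and aging is modeled as a uniform multiplicative slowdown, which is a simplification of the per-cell delays used in this work.

```python
# Toy check of which paths violate timing under aging and/or over-clocking.
# All numeric values below are HYPOTHETICAL and chosen for illustration.

def violating_paths(path_delays_ns, aging_slowdown, clock_period_ns):
    """Indices of paths whose aged delay exceeds the clock period.

    ASSUMPTION: aging scales every path delay by (1 + aging_slowdown).
    """
    return [i for i, delay in enumerate(path_delays_ns)
            if delay * (1.0 + aging_slowdown) > clock_period_ns]

FRESH_DELAYS_NS = [0.95, 0.90, 0.80]  # fresh delays of three paths
NOMINAL_PERIOD_NS = 1.10              # nominal period, includes a guard band
OVERCLOCKED_PERIOD_NS = 0.96          # guard band removed by over-clocking
AGING_SLOWDOWN = 0.08                 # 8% delay increase due to aging

if __name__ == "__main__":
    print("aging only:       ",
          violating_paths(FRESH_DELAYS_NS, AGING_SLOWDOWN, NOMINAL_PERIOD_NS))
    print("over-clocking only:",
          violating_paths(FRESH_DELAYS_NS, 0.0, OVERCLOCKED_PERIOD_NS))
    print("aging + over-clock:",
          violating_paths(FRESH_DELAYS_NS, AGING_SLOWDOWN, OVERCLOCKED_PERIOD_NS))
```

In this toy setting, neither knob alone pushes any path past the clock edge, while the combination makes the two slowest paths fail.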

IV. AGING-AWARE CELL LIBRARIES: BRIDGING THE GAP BETWEEN PHYSICS AND SYSTEM LEVEL
Aging is "driven" by different underlying mechanisms of defect generation that occur at the atomic level. In order to investigate and quantify how such defects may propagate all the way up to the system level, where they manifest as timing errors, the intervening abstraction layers must be carefully traversed. In addition, real digital circuits typically consist of numerous paths, which are all similar to each other with respect to overall delay. Therefore, when aging-induced degradation takes place, it is challenging to accurately quantify how the timing paths will be violated and how such violations will translate into errors at the circuit outputs. This necessitates an accurate modeling of how standard cells will behave in the presence of aging degradations. Any investigation in this direction requires that we use commercial tool flows for static timing analysis in order to rely on their underlying mature algorithms evolved over decades. Otherwise, the impact of aging-induced degradation on the delay of paths cannot be accurately captured and, more importantly, any proposed technique would not be compatible with the existing standard design flow of circuits.
To address these challenges, we create "aging-aware cell libraries" in which the delay of standard cells is characterized by considering the effects that aging-induced defects have on the electrical properties of pMOS and nMOS transistors, similar to [22], [23]. We start from the lowest level of abstraction, where we employ state-of-the-art physics-based BTI aging models to estimate the defects in pMOS and nMOS transistors and how they result in shifts in the transistor's parameters (i.e., $V_{th}$ and $\mu$) [24]. Then, we employ SPICE simulation to estimate the delay and power of every standard cell considering the effects that $\Delta V_{th}$ and $\Delta\mu$ have on the delay of the nMOS and pMOS transistors. We analyze every standard cell with 7×7 input signal slews and output load capacitances¹. All the generated data is stored using the "liberty" format, which is the standard format for existing commercial EDA tool flows (e.g., Synopsys and Cadence). To cover a wide range of aging effects, we create the aging-aware cell libraries for various aging stress conditions (i.e., various duty cycles²). We start from 0%, representing no aging, and go all the way to 100%, representing the maximum aging that considers a continuous aging stress without any recovery, in steps of 10%. Because the libraries use the liberty format, designers can plug them directly into commercial static timing analysis tools for accurate timing analysis.
Implementation Details: We target the 45 nm technology node. The methodology and implementation are not limited to this node and apply to advanced technology nodes. We employ a state-of-the-art physics-based aging model [24], which was validated against semiconductor measurements. It captures the defect generation of BTI aging under any stress condition. The physics-based model supports all technology nodes and different transistor structures such as FinFETs.
To create the standard cell library, we used the open-source 45 nm NanGate library [25]. The library provides SPICE netlists for sequential and combinational standard cells. For SPICE simulations, we use the open-source Predictive Technology Model (PTM) [26].

V. PROPOSED METHODOLOGY
Consider an IC that performs a function f at a nominal operating clock. When an input x is applied to the IC, it generates an expected output f (x) as long as the IC is operating normally (i.e., operating without timing errors). If the IC is run on a faster clock (higher frequency), its functionality is altered, causing mismatches (e.g., bit errors, longer settling times) in the observed outputs due to the induced timing violations. If the logic gates of the IC age and slow down, the results take more time to propagate and reach the output bus, and the observed output behavior will differ from the expected behavior because of timing errors. Aging induces a delay increase in the IC, which then results in transient timing errors. If the clock period is small enough for the errors to propagate to the output, the output bits will differ from the expected outputs. This change in the output bit patterns helps detect extra circuits such as a Hardware Trojan. A timing guard band is not used in this study. It is used during the operation of a chip and is not a consideration for the detection mechanism. The maximum clock frequency is purposefully violated by over-clocking the circuit.
To validate this hypothesis using simulations, we create standard cell libraries as described in Section IV that consider aging effects by using the degradation of threshold voltage ($V_{th}$) and carrier mobility ($\mu$) in nMOS and pMOS transistors. We created standard cell libraries for different duty cycle stresses from 0% to 100% in steps of 10%. A duty cycle of 100% refers to 100% aging stress and 0% refers to no aging.
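The set of operating conditions implied by these libraries can be sketched as a simple grid. The duty-cycle values (0% to 100% in steps of 10%) come from the text; the library file naming scheme and the clock-period values are hypothetical placeholders chosen only for illustration.

```python
from itertools import product

# Aging stress (duty cycle) values from the text: 0% to 100% in steps of 10%.
AGING_STATES = list(range(0, 101, 10))

# HYPOTHETICAL clock periods in ns; real values would be chosen below the
# nominal period obtained from static timing analysis.
CLOCK_PERIODS_NS = [1.00, 0.95, 0.90, 0.85]

def library_name(duty_cycle: int) -> str:
    """HYPOTHETICAL naming scheme for an aging-aware liberty file."""
    return f"aged_{duty_cycle:03d}.lib"

def operating_conditions():
    """All (clock period, aging state) pairs to simulate."""
    return list(product(CLOCK_PERIODS_NS, AGING_STATES))

if __name__ == "__main__":
    conditions = operating_conditions()
    print(len(AGING_STATES), "aging states,", len(conditions), "operating conditions")
    print("library for maximum stress:", library_name(AGING_STATES[-1]))
```

Each pair in the grid corresponds to one annotated gate-level simulation run for a given input.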
We consider attacks pre-synthesis (RTL) and post-synthesis (gate-level netlist). In the case of post-synthesis Trojan insertion, the attacker can be either the designer or the fabrication company. For the RTL attack, we assume that we have access to the genuine RTL, which is corrupted before proceeding to synthesis. For Trojan insertion post-synthesis, the RTL without the Trojan is synthesized to produce a gate-level netlist and the Trojan is inserted within the netlist to get a corrupted, Trojaned netlist. We use the aging-aware standard cell libraries and the (clean and corrupted) gate-level netlists as inputs to a static timing analysis tool to generate the standard delay format (SDF) files for the different aging states. The SDF file reports the exact delay of every logic gate in the netlist. We generate a set of inputs and their outputs for the baseline IC using gate-level simulations. This set of inputs includes randomly generated inputs as well as input test patterns generated by the Synopsys Tetramax Automatic Test Pattern Generator (ATPG) [27] tool. We generate these input-output pairs for combinations of aging states and clock frequencies using gate-level simulations (with SDF annotations). We then develop a machine learning approach (discussed in Sections V-A and V-B) to compare the observed bit error patterns at the circuit output with the expected bit error patterns learned from a known-good device/simulation. Figure 1 shows the stages where a hardware Trojan can be inserted. Our ML classifier detects all the Trojans. We obtain the maximum clock period for an IC using static timing analysis and generate input/output dataset for smaller clock periods that create pronounced output bit-

A. Feature Extraction for Machine Learning
The Hardware Trojan detection methodology is based on applying inputs to the IC, observing the outputs for several clock periods and aging states, and comparing the observed output variations with "baseline" characteristics to detect anomalies that indicate presence of Trojans. A one-class classifier based on an auto-encoder and a one-class Support Vector Machine (SVM) is proposed as shown in Figure 2.
To train a model using the baseline characteristics, we consider a set of inputs $x_T$. For each input $x \in x_T$, the output is recorded³ for a range of clock periods and aging states. Denoting clock periods by $t_1, \ldots, t_{n_C}$ and aging states by $a_1, \ldots, a_{n_A}$, the output for the $i$th clock period and $j$th aging state is denoted as $h_{ij}(x)$. The clock periods $t_1, \ldots, t_{n_C}$ extend significantly below the clock period of the highest sustainable clock under normal conditions⁴. While more bit errors are expected at a higher clock frequency, some bit errors are possible at higher-than-nominal clock frequencies in high-aging states. Given input $x$, the expected output is denoted by $y_0 = f(x)$.
Given an input $x$, the expected output $f(x)$ is deterministic, while the measured outputs $h_{ij}(x)$ are stochastic when the IC is over-clocked and aged. IC-to-IC variations add to the variability of $h_{ij}(x)$. To account for variability and achieve robust anomaly detection, we do not simply learn mappings from inputs to expected outputs. Rather, we learn a deeper model of how the observed outputs (i.e., observed bit error patterns) vary with over-clocking and aging. The trained classifier does not compare measured with expected outputs. It uses the variation in patterns of bit errors as a signature for the IC and learns a model of the characteristics of these variations (independent of the applied inputs). Hence, inputs used during testing of an IC are independent of inputs used in training (see Section VII).
³ After a number of cycles sufficient to read output in normal operation.
⁴ Nominal clock derived from slack analysis by Synopsys Primetime.
Given an input $x$, the expected output $y_0 = f(x)$, and a measured output $y = h_{ij}(x)$ for the $i$th clock period and the $j$th aging state, the mismatch between $y$ and $y_0$ is measured by a set of four features: the number of 0→1 bit flips, the number of 1→0 bit flips, and weighted combinations of the 0→1 and 1→0 bit flips considering their bit locations. Denote the bit length of the output (i.e., $y$ or $y_0$) as $m$. Given any binary number $a$ of bit length $m$, the "bit indicator functions" $1(a)$ and $0(a)$ are defined, using the bit-wise AND operator &, as the subsets of $M = \{0, \ldots, m-1\}$ containing the bit locations that correspond to 1 or 0 bits in $a$, respectively. Given $y$ and $y_0$, the feature vector is defined as
$F(y, y_0) = [f_1(y, y_0), f_2(y, y_0), f_3(y, y_0), f_4(y, y_0)]^T \in \mathbb{R}^{n_F}; \quad n_F = 4 \qquad (8)$
Using the feature vector $F(\cdot, \cdot)$, a three-dimensional feature tensor is defined to characterize variations of bit errors over the set of clock periods and aging states. Given input $x$, expected output $y_0 = f(x)$, and measured outputs $\{h_{ij}(x),\ i = 1, \ldots, n_C;\ j = 1, \ldots, n_A\}$, the three-dimensional tensor $F(x)$ of dimension $n_C \times n_A \times n_F$ is defined such that its $(i, j, k)$th element is the $k$th element of $F(h_{ij}(x), y_0)$.
Figure 3 visualizes the feature tensor computation across clock periods and aging states for the RSA circuit (Trojan inserted in the netlist). The figure shows the features extracted at the different clock periods and aging states for an input as a heat map. Although the feature values for clean and Trojaned ICs look similar, the classifier can distinguish a clean IC from a Trojaned one (see Section VII). From left to right in each of the plots, the aging stress increases from 0% to 100%; the Y-axis represents different clock periods, with more over-clocking towards the bottom. The four plots in each column show the four types of features discussed above.
They correspond to (from the top) the number of 0→1 bit flips, number of 1→0 bit flips, weighted combinations of 0→1 and 1→0 bit flips considering their bit locations, respectively. The heat map shows how the features are varying with aging stress and clock periods as well as the difference between a clean IC and Trojaned IC. Lighter colour indicates that the particular kind of feature at the given clock period and aging stress is more pronounced.
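The feature computation described in this subsection can be sketched in a few lines. This is a minimal illustration rather than the implementation used in this work: outputs are represented as Python integers, the tensor is a nested list of shape n_C × n_A × 4, and the positional weight (i + 1)/m used for the two location-aware features is an assumption, since the exact weighting is not given in the text.

```python
def bit_positions(a: int, m: int, bit: int) -> set:
    """Bit locations in {0, ..., m-1} where `a` holds the value `bit` (0 or 1)."""
    return {i for i in range(m) if (1 if a & (1 << i) else 0) == bit}

def feature_vector(y: int, y0: int, m: int) -> list:
    """Four mismatch features between measured output y and expected output y0."""
    rising = bit_positions(y0, m, 0) & bit_positions(y, m, 1)   # 0 -> 1 flips
    falling = bit_positions(y0, m, 1) & bit_positions(y, m, 0)  # 1 -> 0 flips

    # ASSUMPTION: bit location i carries weight (i + 1) / m; the weighting
    # used for the location-aware features is hypothetical.
    def weighted(positions):
        return sum((i + 1) / m for i in positions)

    return [len(rising), len(falling), weighted(rising), weighted(falling)]

def feature_tensor(measured, y0: int, m: int) -> list:
    """measured[i][j] is the output at clock period i and aging state j.

    Returns a nested list of shape n_C x n_A x 4.
    """
    return [[feature_vector(y, y0, m) for y in row] for row in measured]

if __name__ == "__main__":
    expected = 0b1100
    # HYPOTHETICAL measured outputs for 2 clock periods x 2 aging states.
    measured = [[0b1100, 0b1101],
                [0b1010, 0b0011]]
    for row in feature_tensor(measured, expected, m=4):
        print(row)
```

Because the features depend only on which bits flipped and where, and not on the input value itself, the same extraction applies unchanged to inputs that were never seen during training.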

B. Machine Learning for Trojan Detection
To train the anomaly detector, the feature tensors F (x) are generated for each input x in the training set x T . We combine features from multiple inputs by taking a window of size 1 during training for better accuracy in detecting Trojans. From the set {F (x), x ∈ x T }, we train a one-class classifier for outlier detection. Given a feature tensor computed from outputs measured from an IC under test, the trained classifier determines if the feature tensor is "different (outlier)" from that for a Trojan-free IC. A one-class classifier based on an autoencoder and a one-class SVM is used (Figure 2). From the feature tensor of dimension n C × n A × n F , a compact feature representation is computed using a four-layer autoencoder having a two-layer encoder and a two-layer decoder with ReLU activations. The feature vector from the hidden layer of autoencoder is input to a one-class SVM.
When testing an IC, inputs $\tilde{x}$ in a test set $\tilde{x}_T$ are applied to the IC and the outputs $y_{ij} = h_{ij}(\tilde{x})$ are measured for the $i$th clock period and $j$th aging state. Feature vectors $F(y_{ij}, f(\tilde{x}))$ and the feature tensor $F(\tilde{x})$ are extracted. The feature tensor is passed through the trained autoencoder to extract a low-dimensional feature vector, which is then passed through a one-class SVM. The test set $\tilde{x}_T$ can be disjoint from or overlap with the training set $x_T$. In either case, the anomaly detector is input-independent and does not rely on matching actual values of outputs, but on feature patterns of bit error variations across clock periods and aging states. Since the detector operates on a feature tensor extracted from outputs measured for a single input, applying one input $\tilde{x}$ is sufficient for inlier/outlier determination. However, to improve accuracy, multiple inputs $\tilde{x}$ plus majority voting are used to generate the inlier/outlier estimate. Section VII shows that an accuracy of ≥95% (for inlier/outlier determination of clean vs. Trojan-inserted ICs) is obtained with a single input. Majority voting with multiple inputs improves it to 100%.
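The benefit of majority voting can be illustrated with a short sketch. The per-input decisions below stand in for the one-class SVM outputs; treating a tie as "inlier" and assuming that per-input decisions are statistically independent are both assumptions made for illustration, not properties established in this work.

```python
from math import comb

def majority_vote(outlier_flags) -> bool:
    """True (Trojan suspected) if more than half of the per-input decisions
    are outliers. ASSUMPTION: a tie is resolved as inlier (clean)."""
    flags = list(outlier_flags)
    return sum(flags) * 2 > len(flags)

def majority_accuracy(p: float, n: int) -> float:
    """Probability that a majority of n votes is correct when each vote is
    correct with probability p (n odd).
    ASSUMPTION: votes are statistically independent."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need, n + 1))

if __name__ == "__main__":
    # HYPOTHETICAL per-input outlier decisions for one IC under test.
    print(majority_vote([True, True, False, True, False]))
    # Idealized effect of voting over 5 inputs at 95% per-input accuracy.
    print(f"{majority_accuracy(0.95, 5):.4f}")
```

Under the independence assumption, voting over even a handful of inputs pushes the idealized accuracy well above the single-input figure, which is consistent with the improvement reported above.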
In summary, the approach implemented is as follows:
• Training: A simulated model of the baseline (Trojan-free) IC is used for training. A set of inputs, which is a combination of random and ATPG-generated inputs, and a predefined set of operating condition tuples (e.g., clock frequency, aging state) are applied to the IC. For each of the conditions, inputs are applied and outputs are measured. The feature vectors are generated from each of the measured outputs to train the SVM. The feature vectors for challenging Trojans are enhanced by collecting the features over a set of inputs (called a bin).
• Testing: The testing procedure is similar to the training. We collect input-output measurements at different clock frequencies and aging states. Testing is done on the baseline and Trojan ICs to measure the accuracy of the classifier. To improve accuracy for the challenging Trojans, the number of bins (called a batch) used for testing at a time is increased. We use a batch of 5 bins during training.
• Implementation: At the deployment site, the same procedure as during testing is used. Different clock frequencies and aging states (i.e., operating conditions) can be applied to the IC under test, and input-output measurements with a batch of inputs at each operating condition can be collected. The extracted features are run through the classifier to determine if the IC is Trojaned or not.

VI. EXPERIMENTAL SETUP
A. Trojan Benchmarks Used in the Study
We use the following 32-bit RSA and the 128-bit AES circuits from the Trust-Hub [28].
• The Basic RSA-T100 benchmark implements a Shift-and-Add algorithm for modular multiplication. The trigger checks for an input and activates the Trojan when this is found on the input bus. The Trojan leaks the secret key (private exponent) through output bus. Figure 4 shows the gate-level netlist of the synthesized RSA circuit with a Trojan (the Trojan is inserted at the RTL). The Trojan occupies 0.75% of the IC area.
• The AES-T100 uses a 128-bit key. In the baseline AES, the plaintext goes through 10 rounds of substitution, mix column, and shift rows. The AES-T100 Trojan is always on. The Trojan leaks bits from the secret key. The eight least significant bits are leaked through power side channel before which they are XORed with the bits generated from Linear-Feedback Shift Register (LFSR). This modification to the key obfuscates the power readings which allows only the adversary to recover the key.
• The AES-T1000 benchmark uses a 128-bit key and has a trigger similar to Basic RSA-T100. The Trojan leaks the key using the technique similar to AES-T100. The difference is that the AES-T1000 has a Trojan trigger.

B. Synthesis and Simulation
We use Synopsys Design Compiler [29] and a 45 nm technology library operating at 1.1 V, 25 °C (see Section IV) without aging to synthesize the baseline and the Trojan circuit (RTL from Trust-Hub) to produce gate-level netlists. The netlists of the baseline circuit and the Trojan are combined to produce a netlist with a Trojan so as to keep the original circuit unchanged. Synopsys Primetime [30] is used with the 45 nm technology libraries at different percentages of aging stress to create SDF files, one for each case of aging stress, using the netlists created for the Trojan-free and Trojaned circuits. The gate-level simulations at different clock periods are performed using Synopsys VCS [31]. Figure 5 shows the tool flow.

C. SVM-based Machine Learning
A one-class SVM is trained to learn the bit error patterns produced at the outputs (details in Section 2), thereby detecting Trojans. Features are extracted for one input at a time to train the classifier in Experiment 1. The number of input vectors used for windowing over features is denoted by k (the bin size); we show empirically that k = 5 improves accuracy in detecting the more challenging Trojans. IC-to-IC delay variations due to temperature and pressure differences during manufacturing are also considered and addressed. These variations are modeled by altering the gate delay parameters in the SDF files produced for each of the aging states. There are two types of variations in ICs: 1) IC-to-IC variation and 2) on-chip variation (die-to-die). The variations in delay from IC to IC are typically 5% or more [32]. The changes that we applied to the SDF files are as follows: 1) a 5% change in each parameter to model IC-to-IC variation, and 2) Gaussian random variations with zero mean and 4% standard deviation (σ) to model on-chip variations. The combined effect corresponds to variations of up to 17% (3σ = 12% plus the 5% of the first case). At advanced technology nodes, the percentage variation in delay due to the manufacturing process increases and aging effects become more pronounced, as has been demonstrated in Intel measurements for the 14 nm FinFET technology [33].
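The variation model described above can be illustrated with a minimal sketch. It assumes that the two components add as fractions of the nominal delay (which is how the 17% bound is obtained) and that the 5% IC-to-IC change is applied as an increase; the function names and the example delays are hypothetical and are not taken from the SDF files used in the study.

```python
import random

IC_TO_IC_SHIFT = 0.05   # 5% change applied to every delay parameter
ON_CHIP_SIGMA = 0.04    # std. dev. of the zero-mean Gaussian on-chip variation


def perturb_delays(delays_ns, rng, ic_shift=IC_TO_IC_SHIFT, sigma=ON_CHIP_SIGMA):
    """Return a perturbed copy of a list of gate delays (in ns).

    Assumption: both variation components are additive fractions of the
    nominal delay, and the IC-to-IC shift is the same for every gate.
    """
    perturbed = []
    for d in delays_ns:
        on_chip = rng.gauss(0.0, sigma)          # per-gate random component
        perturbed.append(d * (1.0 + ic_shift + on_chip))
    return perturbed


def worst_case_fraction(ic_shift=IC_TO_IC_SHIFT, sigma=ON_CHIP_SIGMA, n_sigma=3):
    """Bound quoted in the text: n_sigma * sigma on-chip plus the IC-to-IC shift."""
    return ic_shift + n_sigma * sigma


rng = random.Random(0)                 # fixed seed so the sketch is repeatable
nominal = [0.10, 0.25, 0.40]           # hypothetical gate delays, not from the paper
varied = perturb_delays(nominal, rng)
print(varied)
print(f"worst case: {worst_case_fraction():.0%}")
```

Setting `sigma=0.0` isolates the IC-to-IC component, which is a convenient way to check that every delay is scaled by exactly 1.05.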

VII. EXPERIMENTAL RESULTS
We show the efficacy of our approach on three Trojaned circuits from the Trust-Hub [28]: 1) BasicRSA-T100 (Experiments 1 and 2), 2) AES-T100 (Experiment 3), and 3) AES-T1000 (Experiment 4). The Trojan sits on the critical path for BasicRSA-T100, about 4209 paths (rank of the path) off the critical path for AES-T100, and 2753 paths (rank of the path) off the critical path for AES-T1000. Therefore, detecting the Trojan in the AES circuits is much more challenging. To train and test the classifiers, we build a corpus of datasets with overclocking and aging. To show that overclocking alone is insufficient even on the simplest circuit with the Trojan on the critical path, we also collect data for BasicRSA-T100 using overclocking only. We show results for BasicRSA-T100 with the Trojan inserted both at the RTL and into the netlist, because the synthesized circuit changes considerably when the Trojan is inserted at the RTL, which makes detection easier. In AES-T100 and AES-T1000, however, only a side-channel Trojan (with no feedback to the original circuit) is included, so inserting the Trojan at the RTL does not change the original circuit once synthesized.
In all these cases, we use the data from Trojan-free ICs to train and the data from the Trojaned ICs to test. The inputs used during training are not reused for testing, to make sure that the classifier indeed learns useful patterns. Trojans in the training and testing datasets are kept dormant to make the problem challenging (except in Experiment 3, where the Trojan is always on but does not affect the output bits). The offline training and testing phases require a larger dataset than deployment (i.e., inference). The number of input-output pairs collected for each circuit is between 4000 and 4300, which takes a few kilobytes to store. The features extracted from the data take no more than a few hundred kilobytes.
The training is offline and needs to be done only once per circuit. It takes roughly five minutes to train the classifier for each circuit. For the testing and inference phases, the computations are not complex and do not take more than a few seconds.

A. Experiment 1: Basic RSA-T100 Trojan Inserted into RTL
The baseline RSA circuit, when synthesized, has a clock period of 2.17 ns. The Trojan occupies 0.75% of the circuit. For the Trojaned and baseline ICs, we collect two sets of data: 1) Overclocking with aging: We over-clock the RSA circuit and collect data by sweeping the clock period over the range 1.125 ns to 1.4 ns in steps of 0.005 ns for the Trojan and Trojan-free cases and for all aging states. 2) Overclocking without aging: We over-clock the RSA circuit and collect input/output data in the range 0.9 ns to 1.4 ns in steps of 0.005 ns, without considering the aging effects, for clean and Trojaned ICs. The data set is generated for 4226 inputs. Of these, 4096 are random patterns and 130 are ATPG patterns for the clean RSA circuit. We extract four types of features (details in Section V-A). Figures 6(a), 6(b) show the histograms of bit flips at the outputs for the no-aging case of the RSA circuit with the Trojan inserted at the RTL. Figures 7(a), 7(b) show the histograms for the maximum aging case. The bit flips largely depend on the inputs and the key used in the RSA algorithm. Depending on the positions of the bits in the inputs and the key used by the algorithm, each bit takes a different path from the input port to the output port. Additionally, when aging is applied to the circuit, it induces more delay in several paths, which increases the number of bit flips. This is evident from the figures, in which the histograms shift to the right when aging is applied. Figures 8(a), 8(b) and Figures 9(a), 9(b) show the bar charts of the weighted location of bit flips for no aging and maximum aging, respectively. The locations of the bit flips change from the no-aging case to the worst aging case and from the Trojan-free case to the Trojan-inserted case, as discussed. These figures show that the chosen features provide a discernible difference when a Trojan is inserted.
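As an illustration of the data collection described above, the following minimal sketch enumerates the clock periods of the aging sweep and counts bit flips by comparing an observed output word with the expected one. It assumes that outputs are available as integers and that a bit flip is any bit position where the two words differ; the helper names and the example output words are hypothetical. The weighted-location feature of Section V-A is not reproduced here; only the raw flip positions are listed.

```python
from collections import Counter


def clock_periods(start_ns, stop_ns, step_ns):
    """Inclusive list of clock periods; integer stepping avoids float drift."""
    n_steps = round((stop_ns - start_ns) / step_ns)
    return [round(start_ns + i * step_ns, 3) for i in range(n_steps + 1)]


def bit_flips(golden, observed):
    """Number of output bits that differ between two integer-encoded words."""
    return bin(golden ^ observed).count("1")


def flip_positions(golden, observed, width):
    """Indices (0 = least significant bit) of the flipped output bits."""
    diff = golden ^ observed
    return [i for i in range(width) if (diff >> i) & 1]


def flip_histogram(pairs):
    """Histogram mapping a bit-flip count to how many outputs showed it."""
    return Counter(bit_flips(g, o) for g, o in pairs)


# Sweep used for overclocking with aging: 1.125 ns to 1.4 ns, 0.005 ns steps.
periods = clock_periods(1.125, 1.4, 0.005)
print(len(periods), periods[0], periods[-1])

# Hypothetical 32-bit (expected, observed) output pairs, not real simulation data.
pairs = [(0xDEADBEEF, 0xDEADBEEF), (0xDEADBEEF, 0xDEADBEEE), (0x0000FFFF, 0x0000FF00)]
print(flip_histogram(pairs))
print(flip_positions(0xDEADBEEF, 0xDEADBEEE, 32))
```

With the stated range and step, the sweep contains 56 clock periods, so each input contributes 56 output words per aging state.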
In the case of overclocking with aging, we use half of the 4226 inputs for training and the other half for testing. When a single input is used for training, the accuracy is 99.3% on a clean IC and 100% for a Trojaned IC, as shown in Figures 10(a), 10(b). The false positive rate is 1.4% and the false negative rate is 0%. When a 3-input batch is used, the accuracy improves to 99.95% for the clean IC and 100% for the IC with a Trojan, as seen in Figures 11(a), 11(b). The false positive rate is 0.09% and the false negative rate is 0%. A larger batch size is required for challenging Trojans. During deployment, 1 input or 3 inputs are sufficient. When IC-to-IC variations are considered and one input is used to test, the false negative rate for data from the clean IC is 2.73% and for the Trojaned IC is 0%. A 3-input batch yields 100% accuracy. Figures 12(a), 12(b) show the time series of anomaly detection for random variations in ICs for a single input. The accuracy decreases slightly when process variations are considered. The perturbations introduced to simulate these variations are random and may fall more within the baseline distribution, which increases the accuracy in some cases. We over-clocked the circuit by up to 4× the frequency of the Trojan-free circuit without aging and tested the classifier.

B. Experiment 2: Basic RSA-T100 Trojan Inserted into Netlist
We make Trojan detection more challenging by inserting the Trojan into the synthesized netlist. This makes minimal alterations to the original design, and the Trojan is hence difficult to detect. The Trojan in this experiment occupies 0.22% of the circuit area. We collect data in the clock range 0.71 ns to 0.84 ns in steps of 0.005 ns. This range is lower than that of the RTL-inserted case since the synthesized circuit is similar to the original Trojan-free circuit. Aggressive over-clocking is needed to obtain discernible patterns of bit errors at the outputs. Figures 14(a), 14(b) show the histograms of bit flips at the output for no aging and Figures 15(a), 15(b) show the histograms for maximum aging. Figures 16(a), 16(b) show the bar charts of the weighted location of bit flips for no aging and Figures 17(a), 17(b) show the same for maximum aging. These figures show that the features provide a discernible difference when the Trojan is inserted in the circuit. As the Trojan is inserted into the netlist, the overall netlist differs from that of the previous case (Experiment 1). The number of gates involved and the structure of the netlist change. Thus, the bit flip distribution changes from the previous case.
The features are collected using a bin of 5 inputs at a time for training the classifier. When a single bin is used for testing, the accuracy of correctly classifying the Trojan-free case is 89.47% and that for the Trojan case is 100%, as shown in Figure 18(a) and Figure 18(b), respectively. When 16 bins are used for testing, the accuracy for the clean IC increases to 99.47% and that for the Trojaned IC remains 100%. When IC-to-IC variations are considered and a single-bin input is used to test, the false negative rate for data from the clean IC is 13.45% and for the Trojan-inserted IC it is 0%. A 16-bin input yields 99.57% accuracy. For random variations in ICs, the number of inputs in the test data set is 4226. Figures 20(a), 20(b) show the time series of anomaly detection for random variations in ICs for a single input. Table I summarizes the results. Using a smaller number of bins yields more false positives.
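The binning of inputs and the use of several bins per decision can be sketched as follows. The text here does not state how the per-bin outcomes are combined into a single decision, so the majority vote below is an assumption made purely for illustration, and the function names and per-bin flags are hypothetical.

```python
def make_bins(samples, k=5):
    """Split per-input samples into consecutive bins of k; drop a short tail."""
    n_full = len(samples) // k
    return [samples[i * k:(i + 1) * k] for i in range(n_full)]


def batch_decision(bin_flags, batch_size=16):
    """Majority vote over consecutive groups of per-bin anomaly flags.

    bin_flags: list of booleans, True = bin flagged as anomalous.
    Returns one boolean per complete batch (True = Trojan suspected).
    Assumption: majority voting stands in for the unspecified combination rule.
    """
    decisions = []
    for start in range(0, len(bin_flags) - batch_size + 1, batch_size):
        batch = bin_flags[start:start + batch_size]
        decisions.append(sum(batch) * 2 > len(batch))
    return decisions


inputs = list(range(4226))          # stand-ins for the 4226 test inputs
bins = make_bins(inputs, k=5)
print(len(bins))                    # number of complete bins of 5 inputs

# Hypothetical per-bin flags for a clean IC: isolated false alarms only.
flags = [i % 9 == 0 for i in range(len(bins))]
decisions = batch_decision(flags, batch_size=16)
print(len(decisions), any(decisions))
```

Under this rule, isolated false alarms on individual bins are outvoted within a batch, which is consistent with the observation that using more bins reduces false positives.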

C. Experiment 3: AES-T100 Trojan Inserted into RTL and Netlist
The Trojan in this experiment occupies 0.23% of the circuit area. The netlist generated by inserting the Trojan at the RTL is the same as the one obtained by inserting the Trojan into the netlist, so we show the results for one case. Since this Trojan is harder to detect (4209 paths away from the critical path) and the circuit is complex, the performance decreases when a single input is used for testing. Therefore, we enhance the features and use several bins containing different numbers of inputs at a time (during training and inference). For the AES circuit, the fastest clock at which the circuit can be synthesized has a period of 0.57 ns. We collect data in the clock range of 0.45 ns to 0.55 ns in steps of 0.005 ns. The data is generated for 4340 inputs. The bar charts for maximum aging show that the bit flips are concentrated on the least significant bits of the output. In each round of the AES algorithm, the "Shift Rows" operation rotates the bits by an amount that depends on their position. This induces more operations on the most significant bits than on the least significant bits. Thus, the bit flips are more concentrated towards the end. Additionally, the bit-flip distribution and the locations of the bit flips change from the no-aging case to the worst aging case and from the Trojan-free circuit to the Trojaned circuit. These figures show that these features provide a discernible difference when a Trojan is inserted.
With a single-bin input (bin size = 5), the accuracy of correctly classifying a Trojan-free IC is 75.77% and that for a Trojaned IC is 98.98% (Figures 25(a) and 25(b), respectively). To reduce the false positive rate, we use a multiple-bin input. Using a batch of 32 bins increases the accuracy to 99.71% for classifying a Trojan-free IC and to 100% for detecting a Trojaned IC (Figures 26(a) and 26(b)). When IC-to-IC variations are considered and a single-bin input is used, the false negative rate for data from the clean IC is 24.91% and for the Trojan-infected IC is 0.87% (Figures 27(a), 27(b)). A 32-bin input yields 99.29% and 100% accuracy for the clean and Trojan-infected ICs, respectively. The resulting precision of the classifier is presented in Table II.

D. Experiment 4: AES-T1000 Trojan Inserted into RTL and Netlist
The Trojan in this experiment occupies roughly 0.3% of the circuit area. Figures 28(a), 28(b) show the histograms of bit flips for the no-aging case and Figures 29(a), 29(b) show the histograms for maximum aging. The number of bit flips increases from the no-aging case to the worst aging case, as can be seen from the shift of the histogram to the right. Figures 30(a), 30(b) show the bar charts of the weighted location of bit flips at the outputs for the no-aging case, whereas Figures 30(c), 30(d) show the bar charts for maximum aging. As explained in the previous section (Experiment 3), the locations of the bit flips are concentrated more towards the end. The bit flips and their locations vary from the no-aging case to the worst aging case as well as from the Trojan-free case to the Trojan-inserted case. These figures show that the chosen features provide a discernible difference when the Trojan is inserted. With a single bin as input (bin size of 5), the accuracy of correctly classifying a Trojan-free IC is 75.72% and that for a Trojaned IC is 99.16%, as shown in Figures 31(a), 31(b), respectively. When a batch of 32 bins is used, the accuracy increases to 99.85% for classifying a Trojan-free IC and to 100% for detecting a Trojaned IC, as shown in Figure 32(a) and Figure 32(b), respectively. Table II summarizes the precision of the model on AES-T100 and AES-T1000 when using single or multiple inputs. Figures 33(a), 33(b) show the ROC curves (true vs. false positives at different thresholds). Considering IC-to-IC variations and a single bin as input, the false negative rate for data from the clean IC is 20.33% and that for a Trojaned IC is 0.87%, as shown in Figures 34(a), 34(b). A batch of 32 bins yields 99.15% accuracy for the clean IC and 100% accuracy for the Trojaned IC.
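For readers unfamiliar with ROC curves, the following minimal sketch shows how the points of such a curve are obtained by sweeping a decision threshold over anomaly scores. It assumes that a higher score means "more anomalous" and that a Trojaned sample is the positive class; the scores and labels are made-up toy values, not results from this study.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    scores: anomaly scores, higher = more anomalous (assumed convention).
    labels: True for Trojaned samples, False for Trojan-free ones.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]                       # threshold above every score
    for threshold in sorted(set(scores), reverse=True):
        flagged = [s >= threshold for s in scores]
        tp = sum(f and l for f, l in zip(flagged, labels))
        fp = sum(f and not l for f, l in zip(flagged, labels))
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy scores and labels for illustration only.
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
labels = [True, True, False, True, False, False]
pts = roc_points(scores, labels)
print(pts)
print(auc(pts))
```

Lowering the threshold moves along the curve from (0, 0) towards (1, 1); a classifier that separates the two classes perfectly reaches the point (0, 1).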

VIII. CONCLUSION
This study shows the effectiveness of controlled aging in detecting Trojans. A machine learning classifier with feature selection distinguishes genuine ICs from ICs in which Trojans are far off the critical path. Over-clocking alone does not distinguish genuine ICs from Trojan-inserted ones. Over-clocking plus aging provides sufficient patterns of output bit errors to detect Trojans. A high detection accuracy is achieved with 32 bins as input and a bin size of 5 in the case of the AES circuits. In future work, we will study detection of the smallest Trojan on and off the critical path. We will also consider cell libraries at different voltages so that aging is better approximated using fast voltage switching. While this is a detailed simulation study, it would be interesting to demonstrate the method on complex circuits and Trojans and on real ICs.