Leveraging Ferroelectric Stochasticity and In-Memory Computing for DNN IP Obfuscation

With the emergence of the Internet of Things (IoT), deep neural networks (DNNs) are widely used in different domains, such as computer vision, healthcare, social media, and defense. The hardware-level architecture of a DNN can be built using an in-memory computing-based design, which is loaded with the weights of a well-trained DNN model. However, such hardware-based DNN systems are vulnerable to model stealing attacks, where an attacker reverse-engineers (REs) the design and extracts the weights of the DNN model. In this work, we propose an energy-efficient defense technique that combines a ferroelectric field effect transistor (FeFET)-based reconfigurable physically unclonable function (PUF) with an in-memory FeFET XNOR array to thwart model stealing attacks. We leverage the inherent stochasticity in the FE domains to build a PUF that helps corrupt the neural network's (NN's) weights when an adversarial attack is detected. We showcase the efficacy of the proposed defense scheme by performing experiments on graph NNs (GNNs), a particular type of DNN. To the best of our knowledge, the proposed defense scheme is the first to address the security of GNNs. We investigate the effect of corrupting the weights of different layers of the GNN on the accuracy degradation of the graph classification application for two specific error models of the FeFET-based PUF and five different bioinformatics datasets. We demonstrate that our approach successfully degrades the inference accuracy of graph classification by corrupting any layer of the GNN after a small rewrite pulse.


I. INTRODUCTION
The demand for artificial intelligence (AI) and machine learning (ML) hardware for the edge computing paradigm has burgeoned in recent times with the growth of the Internet of Things (IoT). Deep neural networks (DNNs) are at the forefront of this revolution, with applications in various domains, including computer vision, big data, natural language processing [1], [2], and so on. However, constructing and setting up a DNN incurs significant hardware costs and requires large-scale training data, demanding considerable monetary and logistical resources. Owing to this, cloud-based DNN applications and ML-as-a-service (MLaaS) have become popular commercial models, catering to a wide range of businesses [3], [4]. Though performing complex DNN operations is computationally expensive, in certain scenarios, such as remotely deployed IoT devices or security-critical military applications, it is preferable to have an onboard hardware DNN processing system. Hence, from both commercial and military standpoints, the security of the deployed DNN hardware is of paramount importance, the piracy of which can cause monetary loss or result in the leaking of sensitive information.
In particular, graph neural networks (GNNs) are a class of DNNs specifically designed to process data relationships that can be expressed as graphs, for example, datasets pertaining to molecular chemistry and biology, social networks, and data mining, among others [5]. GNNs are typically utilized in applications involving non-Euclidean graph structures of various types, including cyclic, acyclic, directed, and undirected graphs [6]. They have recently gained traction because many relationships in the natural world occur as graph data, and neural networks (NNs) such as convolutional NNs (CNNs) cannot process such graph data accurately. CNNs process input data, such as images represented as tensors, and treat them as ordered data: a change in the order of elements in a tensor changes the output of the CNN. Graphs, in contrast, do not require a fixed representation order, so a tensor-based representation is unsuitable for them. GNNs can process graph data irrespective of the order and are capable of learning the structural features of the overall graph.

A. HARDWARE SECURITY OF NEURAL NETWORKS
Various attacks have been proposed against the confidentiality of NN systems. These attacks aim to reverse-engineer (RE) the hardware of NNs by stealing the underlying model's vital information, that is, its weight mapping. In such attacks, an attacker queries the NN with various inputs and collects the corresponding output responses. Using these input-output responses, the attacker can RE the weights of the target network. Such attacks have been proposed against different types of DNNs. In [7], [8], and [9], researchers have proposed attacks on black-box DNNs wherein they craft the inputs, that is, the images to be queried, in such a way that the output predictions reveal the internal attributes of the underlying network. There are different types of adversarial attacks that aim to affect the confidentiality of GNNs. Such attacks aim either to retrieve important information about the training dataset or to steal the GNN model itself. The attacks proposed in [10] and [11] target membership inference, which aims to find valid data samples that were used for training, thus affecting the confidentiality of the training dataset. In [12], a link stealing attack has been proposed that aims to predict the existence of links between two nodes in the training graph, thus leaking the training dataset. In [13] and [14], property inference attacks have been proposed against GNNs, which aim to infer properties of the training datasets, such as subgraphs, graph density, and so on. In [15], [16], and [17], model extraction attacks have been proposed against GNNs that aim to build a surrogate model with an accuracy similar to that of the original GNN model under attack. In [16], researchers have proposed different model extraction attacks considering different scenarios, such as complete, partial, or no knowledge of the training dataset. This attack also extracts the GNN model using input and output (I/O) queries.
In [15], researchers have proposed a model extraction attack that targets inductive GNNs in an adversarial setting where the attacker does not tamper with the training process. The attack queries the GNN considering two scenarios: with and without the structural information of the query graphs. In this work, we focus on such model stealing or extraction attacks on GNNs. All these attacks target the software implementation of the NN and aim to generate adversarial examples using the knowledge of the extracted model.
Several approaches have been proposed to defend against model extraction/stealing attacks, especially for DNNs. These techniques defend the NN architecture at the hardware level. In [19], researchers demonstrated a technique that defends memristor-based NN architectures by leveraging the memristor's obsolescence effect: the continuous application of voltage increases the memristance, which manifests as obsolescence. This solution prevents the attacker from querying the NN architecture often enough to obtain the input-output pairs needed to replicate the target network model. However, this defense can be circumvented by controlling the obsolescence effect through input voltage amplitude scaling. Later, in [20], researchers proposed a superparamagnetic magnetic tunnel junction (s-MTJ)-based defense mechanism that leverages the thermally induced telegraphic switching of s-MTJs to corrupt the weights. Unlike [19], in this defense the attacker cannot control the corruption of the weights. However, the small retention time of s-MTJs warrants frequent refresh operations, leading to higher energy costs.

B. KEY CONTRIBUTIONS OF THIS WORK
In this work, we leverage two particular properties of emerging ferroelectric field effect transistor (FeFET) devices to secure NN systems, namely: 1) the inherent stochasticity in the spatial distribution of the ferroelectric (FE) domains to corrupt the NN weights [21] and 2) the in-memory computation capability of FeFETs to perform efficient and compact XNOR-based logic-in-memory [22]. We design a weight encryption scheme for protecting the confidentiality of hardware NNs by combining these two effects. Specifically, we choose GNNs as the model network to be protected, although the proposed scheme can be applied to any DNN structure without loss of generality. Next, we describe the threat model assumed for the target GNN.
Threat model: Here, we outline the resources and capabilities of the attacker considered in this work. An attacker has (only) black-box access to the hardware GNN intellectual property (IP) that is either a part of a cloud-based infrastructure or a part of an on-chip core. They do not have physical access to the individual internal weights, to probe and find the programmed weights at any given instant.
The key contributions of this work are as follows.
(1) We exploit the randomness in FeFET devices to augment the security of NN systems by amalgamating it with the in-memory computation capabilities of FeFET XNOR gates.
(2) We present a comprehensive analysis and modeling of the randomness in FE domains and highlight the construction of a reconfigurable physically unclonable function (PUF) using this inherent randomness. This FeFET-based reconfigurable PUF is pivotal to the weight corruption mechanism.
(3) To the best of our knowledge, this is the first work to demonstrate a defense against model piracy attacks specifically targeting GNNs.
(4) We explore the system-level implications of the GNN weight corruption on the accuracy of classification tasks and show how model piracy attacks can be foiled.

II. IMPLEMENTATION
The background on GNNs and FeFET device construction and characteristics is described in the Supplementary information.

A. MODELING THE INHERENT RANDOMNESS IN FeFET
It is noteworthy that not all the domains within the FE layer switch at the same time [23]. Therefore, positive and negative domains coexist in the FE layer when a ''weak'' write voltage (WV) pulse (i.e., a WV pulse with a smaller amplitude and/or smaller width than what is needed to completely switch all FE domains) is applied. Thus, depending on the percentage of domains polarized up or down (%P FE+ ), the FeFET can be set into intermediate V TH states by controlling the WV amplitude or pulsewidth. The high V TH state corresponds to 0%, where all the domains are polarized upward and vice versa for the low V TH state (100%), where all the domains are polarized downward. For a sufficiently long-channel FeFET, where the domain size is much smaller than the channel dimensions, a gradual switching of the FeFET is observed and there can be many intermediate states of polarization [21].
For intermediate V TH states, the polarized domains can exist in any spatial orientation throughout the channel [25]. Also, owing to the stochastic switching dynamics of the FE domains, the exact set of domains that switch cannot be predicted, even for the same pulse. This provides an additional source of variation in the distribution of the polarized domains along the channel. Thus, even for a fixed %P FE+ , we can have a different spatial distribution of the FE domains and thus variability in the underlying channel electron density. This can cause variability in the electrical characteristics of the FeFET at a given intermediate state. Also, conventional sources of variability in the underlying transistor, such as random dopant fluctuations, metal gate work function variation, and line edge roughness, can cause additional variation in the electrical properties of the intermediate state.
To model the variability and randomness (inherent stochasticity) inside the FeFET, we employ our in-house TCAD-based framework as in [21] and [25]. It enables us to directly evaluate the impact of the random spatial fluctuation of the polarization by emulating the polarization charges (P FE ) with fixed charges (Q FIX ) at the HfO 2 -SiO 2 interface. In practice, each domain is assigned a particular Q FIX depending on its polarization. The value of Q FIX can be calculated by measuring the residual P FE in the FE layer as

Q FIX = P FE /q

where q is the elementary charge. Here, Q FIX represents the interface charge concentration (measured in cm −2 ) at the FE layer-interfacial layer interface. The sign of Q FIX determines the type of charge and thus the direction of the polarization of the domain. For a given %P FE+ , the total number of domains with Q FIX+ (or Q FIX− ) is fixed, and these domains are randomly distributed in space to generate a random distribution of the channel electron density [21]. Next, Monte-Carlo simulations are performed to determine the effect of the random distribution of the domains on the electrical characteristics of the FeFET. Additionally, variations due to the conventional sources of variability are simulated and combined with the inherent variations from the multidomain FeFET. The corresponding V TH distributions of the FeFET for 0%-100% P FE+ are discussed in the Supplementary information. The maximum variation is observed at 50%, where there are equal numbers of up- and down-polarized domains and thus maximum spatial variability. This variation in V TH also causes a variation in I DS at a particular %P FE+ . In order to set the FeFET at a particular %P FE+ , we need to know the relationship between WV and %P FE+ . This relationship can be established by measuring the residual polarization after a write pulse. Once this value is known, it can be normalized between the minimum and maximum P FE and converted to %P FE+ .
Our fixed-charge-based modeling framework also measures the same maximum and minimum P FE , converts them into fixed charges, and distributes them among the domains according to a given %P FE+ . This allows us to link our fixed-charge-based TCAD model with the well-known Preisach model and determine the write pulse magnitude and duration required to set the device to a particular %P FE+ . Fig. 2(b) shows the relationship between %P FE+ and WV for a fixed pulsewidth.
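As a rough illustration of the domain-assignment step that feeds these Monte-Carlo runs, the following Python sketch distributes a fixed number of up/down fixed charges randomly in space for a given %P FE+ (the function name, domain count, and unit charge magnitude are illustrative; the actual electrostatics are solved in TCAD):

```python
import numpy as np

def assign_domain_charges(n_domains, pct_p_fe_plus, q_fix_mag, rng):
    """Assign fixed charges to FE domains for a given %P_FE+.

    The up/down counts are fixed by %P_FE+, but the spatial placement
    is random -- this randomness is what produces the channel-potential
    (and hence V_TH) variability in the Monte-Carlo runs.
    """
    n_plus = round(n_domains * pct_p_fe_plus / 100.0)
    charges = np.full(n_domains, -q_fix_mag)   # down-polarized domains
    charges[:n_plus] = q_fix_mag               # up-polarized domains
    rng.shuffle(charges)                       # random spatial distribution
    return charges

rng = np.random.default_rng(0)
trial_1 = assign_domain_charges(100, 50, 1.0, rng)
trial_2 = assign_domain_charges(100, 50, 1.0, rng)
# Same %P_FE+ (net charge zero), different spatial layouts.
assert trial_1.sum() == trial_2.sum() == 0
assert not np.array_equal(trial_1, trial_2)
```

Each such random layout is then handed to the device simulator, and repeating this many times yields the V TH and I DS distributions discussed above.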

B. FeFET-BASED RECONFIGURABLE PUF
As discussed in Section II-A, the FeFET shows variation in V TH and, correspondingly, in the current flowing through it, even for a fixed polarization strength. This forms the basis for using the FeFET as a PUF. The structure of our designed PUF is similar to the recently proposed one-FeFET-per-cell reconfigurable PUF [26]. The PUF is programmed in three steps. First, all the transistors are set to an initial high-V TH or low-V TH state by setting all the domains in the upward or downward direction. This is done by applying a high positive or negative pulse. In the next step, we apply a voltage pulse of lower magnitude to set each FeFET in an intermediate V TH state. The third and final step is to generate the output bits from the PUF. Because of the inherent stochasticity and randomness that arise from the multidomain FeFET, there exists variability across the FeFETs. Owing to this variability, when an FeFET is read using a particular V READ , its drain current falls either below or above a reference current (I REF ) and is sensed as a ''1'' or a ''0,'' respectively. In order to simulate the PUF, we set the polarization in the FE layer of each FeFET to 50% because the maximum variation is observed there. This can be done by applying a WV of a suitable magnitude of about 2.2 V, determined from Fig. 2(b). We use our variability modeling framework to run Monte-Carlo simulations at 50% for the FeFET to generate the current distribution. Finally, we can read the drain-source current (I DS ) from the bitline by applying a V READ at the gate terminal for a very short duration (0.5 ns) so as not to disturb the polarization state. When an m-bit challenge in the form of the address of the individual cells is input to the FeFET array, an n-bit output is generated depending on each addressed FeFET returning either ''1'' or ''0.'' If an attack is detected, the PUF can be reconfigured (reprogrammed) by applying a reconfiguration pulse at the word line of each FeFET in parallel. This sets the FeFETs in the PUF array to a different state of polarization.
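The read-out step can be sketched behaviorally as follows (a toy model: the comparison convention of sensing I DS below I REF as ''1'' is inferred from the reconfiguration analysis, and the current values are illustrative, not simulated):

```python
def puf_response(i_ds_cells, i_ref):
    """Sense each addressed cell's drain current against I_REF.

    Convention (inferred from the reconfiguration analysis): I_DS below
    I_REF is read as '1', above as '0'.
    """
    return [1 if i_ds < i_ref else 0 for i_ds in i_ds_cells]

# Toy I_DS samples (in uA) for a 50% P_FE+ array, with I_REF placed at
# the mean so that 0's and 1's are roughly equiprobable.
i_ds = [9.1, 10.7, 10.2, 8.5, 11.3, 9.9]
assert puf_response(i_ds, i_ref=10.0) == [1, 0, 0, 1, 0, 1]
```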
Table 1 shows the magnitude and duration of the reconfigure pulse required to change into nearby states of polarization from 50% P FE+ . These values are calculated in the same way as described in Section II-A. The corresponding change in the distribution curves can be generated using our fixed-charge-based variability modeling framework (see Fig. 3). I REF does not change with the change in %P FE+ , and thus we no longer have equiprobable 0's and 1's. If %P FE+ increases, the distribution shifts right, and the probability of getting a ''0'' (P(0)) increases. Conversely, for a decrease in %P FE+ , P(1) increases. Therefore, on applying the reconfiguration pulse, the output bit probability of the PUF changes. Fig. 4(b) shows the overlapping distributions of I DS for two other polarization strengths compared to the golden standard case of 50%. As I REF does not change, there is a probability for the output bits to flip from ''0'' → ''1'' (P(1 n |0 p )) or ''1'' → ''0'' (P(0 n |1 p )). The suffixes ''n'' and ''p'' refer to the new state after reconfiguration and the previous state before reconfiguration, respectively. The total P(error) is the sum of P(1 n |0 p ) and P(0 n |1 p ).
Assuming that the nature of the distribution curve remains the same (i.e., points to the left and right of the mean remain so even on changing %P FE+ ), we can easily calculate the error probability. If %P FE+ increases, P(1 n |0 p ) remains zero because all of the FeFETs that originally produced an output ''0'' continue to do so even after reconfiguration; P(error) = P(0 n |1 p ) in this case. Alternatively, if %P FE+ decreases, P(error) = P(1 n |0 p ). To determine these probabilities, we use Bayes' theorem as follows:

P(0 n |1 p ) = P(1 p |0 n ) P(0 n ) / P(1 p ),   P(1 n |0 p ) = P(0 p |1 n ) P(1 n ) / P(0 p ).

From our previous discussion, we know P(0 p ) = P(1 p ) = 0.5. P(1 n ) and P(0 n ) can be simply calculated as the probability of the new distribution curve lying to the left or right of I REF , respectively.
Here, P(0 p |1 n ) (P(1 p |0 n )) denotes the probability that a bit reading ''1'' (''0'') after reconfiguration was previously ''0'' (''1''). Table 2 reports the P(error) for changing %P FE+ for various V READ . Note that our model captures only device-to-device variations and does not take into account cycle-to-cycle variations, which are present in real devices. However, cycle-to-cycle variations would only add to the stochasticity of the FeFET device. Furthermore, we have considered an FeFET device with 100 domains and a wide channel. Thus, cycle-to-cycle variations due to switching stochasticity will not play a large role, since they are most prominent in highly scaled devices with very few domains [27].
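Under the stated assumptions (I REF placed at the 50% mean so that P(0 p ) = P(1 p ) = 0.5, and rank order preserved across reconfiguration), the flip probabilities can be sketched numerically. The normal-distribution shape, the ''I DS below I REF reads 1'' convention, and the example means and sigma below are our illustrative assumptions; the paper's Table 2 values come from the simulated distributions:

```python
import math

def flip_probabilities(mu_p, mu_n, sigma, i_ref):
    """Bit-flip probabilities after reconfiguration.

    Assumes normal I_DS distributions before (mean mu_p) and after
    (mean mu_n) reconfiguration with a common sigma, I_REF at mu_p
    (so P(0_p) = P(1_p) = 0.5), I_DS < I_REF sensed as '1', and rank
    order preserved (points left/right of the mean stay there).
    """
    phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    p1_n = phi((i_ref - mu_n) / sigma)   # P(1_n): new curve left of I_REF
    if mu_n >= mu_p:                     # %P_FE+ increased: only 1 -> 0
        return {"P(0n|1p)": 1.0 - 2.0 * p1_n, "P(1n|0p)": 0.0}
    else:                                # %P_FE+ decreased: only 0 -> 1
        return {"P(0n|1p)": 0.0, "P(1n|0p)": 2.0 * p1_n - 1.0}

# Shifting the mean right by 0.5*sigma flips ~38% of the previously-'1' bits.
res = flip_probabilities(mu_p=10.0, mu_n=10.5, sigma=1.0, i_ref=10.0)
assert res["P(1n|0p)"] == 0.0 and 0.38 < res["P(0n|1p)"] < 0.39
```

The one-sided structure of the result (only 1 → 0 flips when %P FE+ increases, only 0 → 1 flips when it decreases) mirrors the rank-preservation assumption above.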

C. IN-MEMORY COMPUTATION WITH FeFET XNOR
The logic-in-memory realization of an FeFET-based XNOR Boolean function can be achieved by coupling two FeFETs together [28], in which a logic value is always stored in a complementary manner. The structure of a single FeFET XNOR cell is shown in Fig. 5. For instance, when logic ''0'' is stored, FeFET 1 is in the low-V TH state and FeFET 2 is in the high-V TH state. Correspondingly, the FeFETs are in the opposite configuration for storing ''1.'' Depending on whether the value input to the FeFET-based XNOR matches the stored value, the XNOR output will be either ''1'' or ''0.'' In practice, a matchline (Mline) is first charged to V dd . Then, when A = B, both FeFETs are OFF. Hence, no conducting path is formed and the gate output remains at a high voltage; the XNOR output is logic ''1'' in this case. Only when A ̸ = B is a conducting path formed through the FeFET that is in the low-V TH state. Hence, the voltage rapidly drops and the output provides logic ''0.'' Concisely, if and only if A ̸ = B, the output is logic ''0''; otherwise, it is logic ''1,'' which is a realization of the XNOR Boolean function. Further details on the FeFET-based in-memory XNOR are given in [22]. Fig. 6 delineates the {FeFET PUF + in-memory XNOR}-based weight corruption architecture considered in this article (note that the weight corruption scheme is applicable to any standard FeFET-based crossbar without loss of generality). Initially, the PUF array is programmed to a fixed random state, and the internal states of the cells of the XNOR array are written accordingly to obtain the desired final weight array [ω final ] that is required for the GNN task. In the case of no attack, the PUF is set once and device-to-device variations do not affect the functionality, that is, the GNN inference.
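At the bit level, programming the XNOR cells against the PUF response can be sketched as follows (a behavioral toy model of the cell array, not a circuit simulation; the helper names are ours):

```python
def xnor(a, b):
    """Behavioral FeFET XNOR cell: output '1' iff input matches stored bit."""
    return 1 if a == b else 0

def program_internal_bits(w_final, w_puf):
    """Choose each cell's internal (stored) bit so that, when the PUF
    response is applied as the input, the cell outputs the final weight bit."""
    return [xnor(wf, wp) for wf, wp in zip(w_final, w_puf)]

w_final = [1, 0, 1, 1, 0]   # desired (golden) weight bits
w_puf   = [0, 1, 1, 0, 0]   # PUF response bits
w_int = program_internal_bits(w_final, w_puf)
# With the correct PUF response, the XNOR array reproduces the weights
# (XNOR is its own inverse: (a XNOR b) XNOR b == a).
assert [xnor(wi, wp) for wi, wp in zip(w_int, w_puf)] == w_final
```

The self-inverse property of XNOR is what makes the scheme work: the stored internal bits alone reveal nothing useful without the matching PUF response.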

D. WEIGHT CORRUPTION SCHEME
Once an attack is detected, the PUF array is reconfigured (rerolled) which changes the original n-bit response from PUF and hence, the final weights from the XNOR operation will be different from the original golden weights. This weight corruption ensures that the attacker is unable to steal the GNN IP (weight mapping).
We reprogram the FeFET XNOR cell array and the FeFET PUF after an attack as follows. We first retrieve the golden weights [ω final ] that are stored in a tamper-proof memory [29] and XNOR them with the rerolled PUF response array, that is, ω PUF_new . This gives us the ω int_new values, which are then written into the FeFET XNOR memory cells by setting them to the high- or low-V TH state. Now, by performing ω int_new ⊙ ω PUF_new , we can obtain the golden weights back.
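The reroll-and-rekey sequence can be sketched behaviorally (bit values and helper names are illustrative):

```python
def xnor_bits(a, b):
    # Elementwise XNOR: '1' where bits match, '0' where they differ.
    return [1 if x == y else 0 for x, y in zip(a, b)]

w_final   = [1, 0, 1, 1, 0, 1, 0, 0]   # golden weights (tamper-proof copy)
w_puf_old = [0, 1, 1, 0, 0, 1, 1, 0]   # PUF response before the attack
w_int     = xnor_bits(w_final, w_puf_old)

# Attack detected: the PUF is rerolled, so the stale internal bits now
# decode to corrupted weights -- this is what the attacker observes.
w_puf_new = [1, 1, 0, 0, 1, 1, 0, 0]
corrupted = xnor_bits(w_int, w_puf_new)
assert corrupted != w_final

# Legitimate re-keying: recompute the internal bits against the new PUF
# state; omega_int_new XNOR omega_PUF_new recovers the golden weights.
w_int_new = xnor_bits(w_final, w_puf_new)
assert xnor_bits(w_int_new, w_puf_new) == w_final
```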

III. EXPERIMENTAL EVALUATION
In this section, we first describe the experimental setup and then evaluate the proposed work by conducting experiments on GNNs.

A. EXPERIMENTAL SETUP
The experiments have been performed on a single compute node with AMD EPYC CPU comprising 64 cores operating at 2.25 GHz, with 480 GB memory. We mimic the hardware-level corruption of weights at a software level by implementing the error distribution model of the proposed FeFET-based reconfigurable PUF. We perform experiments on five bioinformatics datasets, that is, PROTEINS, MUTAG, ENZYMES, NCI1, and D&D. These datasets are represented as graphs, and the classification of these graphs is useful for various bioinformatics applications. We have obtained datasets for our experiments from [30]. Next, we describe the parameters of the GNN topology and error models of the FeFET-based PUF.

1) GNN TOPOLOGY
We perform the experiments on Deep Graph Convolution Neural Network (DGCNN) [18] as discussed in Supplementary information. We use default parameters of the DGCNN architecture [18]. The GNN consists of four GCN layers with output channel dimensions of 32, 32, 32, and 1, respectively. The m value of the SortPooling layer is set to 0.6. Furthermore, the 1-D convolutional layers have 16 and 32 output channels, respectively. Finally, the dense layer consists of 128 hidden units followed by a softmax layer as the output layer. Also, the GNN is trained to minimize the cross-entropy loss using an Adam optimizer.
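For reference, the topology above can be summarized as a configuration sketch (the dictionary keys are our own naming, not from [18]):

```python
# Summary of the DGCNN hyperparameters used in our experiments
# (defaults from [18]); key names are illustrative.
dgcnn_config = {
    "gcn_out_channels": [32, 32, 32, 1],  # four GCN layers
    "sortpool_m": 0.6,                    # SortPooling m value
    "conv1d_out_channels": [16, 32],      # two 1-D convolutional layers
    "dense_hidden_units": 128,            # dense layer before softmax output
    "loss": "cross_entropy",
    "optimizer": "adam",
}
assert len(dgcnn_config["gcn_out_channels"]) == 4
```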

2) ERROR MODEL
The probability for a particular bit to flip is described in detail in Section II-B. From there, we choose two error models for our experiments: 1) Error ModelA: This model corresponds to changing the state of polarization from 50% to 49% P FE+ for each FeFET in the PUF. V READ is chosen to be low, at 0.1 V. As %P FE+ decreases in this case, the distribution shifts left, and there exists a probability for the PUF bits that were originally ''0'' to change to ''1.'' The corresponding P(error) for the output bits of the PUF to flip is obtained from Table 2. Thus, for every bit in [ω final ], if b == ''0,'' the bit is flipped with a probability of 3.58%.
2) Error ModelB: This model corresponds to changing the state of polarization from 50% to 51%. As with the other error model, the duration and magnitude of the reconfigure pulse can be obtained from Table 1, and V READ = 0.1 V. As %P FE+ increases in this case, there exists a probability for the output bits of the PUF to flip from ''1'' to ''0.'' The value for this can again be obtained from Table 2. Thus, for every bit in [ω final ], if b == ''1,'' the bit is flipped with a probability of 22.11%.
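Applied at the software level, as in our experiments, the two error models amount to a biased bit-flip channel. A minimal sketch (array size and helper name are illustrative):

```python
import numpy as np

def apply_error_model(weight_bits, flip_value, p_flip, rng):
    """Flip bits equal to `flip_value` with probability `p_flip`.

    Error ModelA: flip_value=0, p_flip=0.0358 (50% -> 49% P_FE+).
    Error ModelB: flip_value=1, p_flip=0.2211 (50% -> 51% P_FE+).
    """
    bits = np.asarray(weight_bits).copy()
    flips = (bits == flip_value) & (rng.random(bits.shape) < p_flip)
    bits[flips] ^= 1
    return bits

rng = np.random.default_rng(0)
w = rng.integers(0, 2, size=10_000)
w_a = apply_error_model(w, flip_value=0, p_flip=0.0358, rng=rng)
w_b = apply_error_model(w, flip_value=1, p_flip=0.2211, rng=rng)
assert np.all(w_a[w == 1] == 1)   # Model A never touches '1' bits
assert np.all(w_b[w == 0] == 0)   # Model B never touches '0' bits
```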

B. EXPERIMENTAL RESULTS
Considering the probabilistic nature of the error model, we report the average reduction in the accuracy of the GNNs over ten trials for all the results. Fig. 7 demonstrates the reduction in accuracy of the GNN for all five datasets considered above over ten trials when all the weights of the GNN are corrupted. The reduction in accuracy is the difference between the accuracy obtained upon weight corruption and the accuracy of the GNN with the golden/original weights. It can be observed that the reduction in accuracy varies from trial to trial. This occurs because of the difference in the number of bit flips and corrupted weights across trials. This observation is further corroborated by the effect of the GNN layer on accuracy degradation, which is discussed next.

1) EFFECT OF GNN LAYER
To observe the impact of the corruption of weights in each layer of the GNN, we corrupt the weights of each layer separately and obtain the corresponding accuracies. Fig. 8 demonstrates the accuracy reduction in the GNN output when the weights of individual layers are corrupted one by one, across the eight layers of the GNN (four GCN layers, two 1-D convolutional layers, and two dense layers) for the PROTEINS dataset. It can be observed that the accuracy degradation varies with the GNN layer; that is, the accuracy degradation due to corruption in the second, third, and fourth GCN layers is lower than that due to corruption in the other GNN layers. Thus, a defender need not corrupt all the weights of the GNN and can instead choose a particular layer to corrupt, which lowers the power consumption.

2) EFFECT OF ERROR MODEL
As described above, we consider two error models for the bit flipping or corruption of weights. To observe the effect of the error model, we compare the accuracy degradation between the two error models for all the GNN layers on the ENZYMES dataset, as shown in Fig. 9. There is a difference in the accuracy reduction between the two error models for a given layer, but no consistent trend between them: in some layers, Error ModelA causes a higher accuracy reduction than Error ModelB, whereas the opposite holds for the remaining layers. Because this trend varies across the GNN layers, the accuracy degradation evidently depends on the values of the weights themselves.

3) EFFECT OF DATASET
Along with the difference in accuracy degradation across GNN layers and error models, we also observe variation in accuracy degradation with the dataset. Fig. 10 demonstrates the accuracy degradation results for the ENZYMES, PROTEINS, MUTAG, D&D, and NCI1 datasets for the first and second GCN layers, the first 1-D convolutional layer, and the second dense layer. This variation is observed because the weights of the GNN model change with the dataset. Thus, a defender can choose an error model or the layer to be corrupted based on the dataset for which the GNN model is designed.
The defender should consider the time taken to corrupt the weights along with the accuracy degradation. The corruption runtime is important because the defender has to ensure the weights are corrupted before an adversary collects a sufficient number of input-output pairs from the GNN. The time taken for the corruption of weights depends on the magnitude of the reconfigure pulse of the proposed FeFET-based PUF. Thus, the magnitude of the reconfigure pulse should be chosen such that it results in accuracy degradation and thwarts the attacker from collecting the inference results of a sufficient number of queries. Furthermore, the magnitude of the reconfigure pulse also determines the GNN layer to be corrupted. Next, we discuss the estimated runtime of the GNN model considered in this work. As mentioned above, we consider an in-memory compute architecture for GNNs, which is built using an array of multiply-and-accumulate (MAC) instances. Table 3 reports the runtime required for the operations of each GNN layer. We consider a fixed MAC array size of 128 × 128 and a runtime of 1 ns for the operations of a single cycle of the MAC array. A defender can set the magnitude of the reconfigure pulse of the PUF based on each layer's runtime.
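The per-layer runtime estimate can be sketched with a simple tiling model. This is a simplification of how a table like Table 3 could be computed; the example matrix size is illustrative, and the actual values depend on each layer's true operation counts:

```python
import math

def layer_runtime_ns(rows, cols, array_dim=128, cycle_ns=1.0):
    """Rough per-layer MAC runtime on a fixed-size in-memory array.

    Simplified tiling model: one 1-ns array cycle per 128x128 weight
    tile of the layer's weight matrix.
    """
    tiles = math.ceil(rows / array_dim) * math.ceil(cols / array_dim)
    return tiles * cycle_ns

# A hypothetical 256x384 weight matrix maps to 2 x 3 = 6 tile passes.
assert layer_runtime_ns(256, 384) == 6.0
```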

C. DETECTING AND THWARTING PHYSICAL ATTACKS
Physical attacks can be detected in the proposed solution using resistance/capacitance sensor arrays [31] or cryptographically secure mesh structures [32]. To deter an attacker from frequently querying the GNN, a mesh shield can be placed over the input-output terminals of the FeFET array. Any attempt to apply inputs through external leads will alter the data bit sequence through the mesh wires, thus detecting the incursion.
We note that physical incursions like cold-boot attacks are dependent on the delay between the logical turn-off of the memory cell and the time it takes to physically erase its remnant state [33]. Attackers can further increase this latency using cryogenic cooling to reduce the data entropy. In this scenario, the defender could try to erase all the IP information (stored weights) upon attack detection. However, data erasure incurs a write time penalty (O(µs)), which is much larger than the minuscule time taken to corrupt the FeFET PUF array in the proposed scheme (5 ns). Hence, attempting data erasure could still leave the attacker with ample time to obtain enough input-output data, whereas the FeFET PUF-based weight corruption will thwart such attacks.

D. EVALUATION AGAINST MODEL EXTRACTION ATTACK
Here, we first discuss the methodology of the considered attack in [17] and evaluate the proposed obfuscation scheme against it. The steps of the attack in [17] are as follows.
1) The attacker chooses a random network and default weights/connections as the starting point.
2) The attacker then repeatedly queries the golden GNN model to build an I/O dataset.
3) After a sufficient number of I/O pairs are obtained, the random network is trained on them until it behaves almost identically to the original network. However, the individual weights/connections inside this newly trained network will be vastly different from those of the original network, even though their I/O behavior is very similar.
Here, we discuss the attack's results on the GNN model with weight corruption. We launch the attack in [17] on the PubMed dataset. The original GNN model has three hidden layers with a dimension of 256. We consider four scenarios to compare the results, that is: 1) no corruption in the original GNN model; 2) corruption in the weights of layer 1 of the original GNN model; 3) corruption in the weights of layer 2 of the original GNN model; and 4) corruption in the weights of layer 3 of the original GNN model. Fig. 11 demonstrates the accuracy of the recovered (surrogate) model for Error ModelA and ModelB with respect to the number of I/O queries. We observe that, without corruption, the accuracy increases with the number of I/O queries, whereas for the corrupted models, the accuracy of the recovered model remains roughly constant at ∼40%.

IV. CONCLUSION
In this work, we propose a design-for-trust technique to protect the IP of NNs against model stealing or replication attacks that RE the weights of the NN model. In the proposed solution, an FeFET-based reconfigurable PUF is integrated with an in-memory FeFET XNOR array to corrupt the weights of the NN when an attack is detected. The corrupted weights result in accuracy degradation, and thus the attacker fails to obtain a sufficient number of useful input-output pairs for model-stealing attacks. We perform experiments on GNNs for the application of graph classification on different bioinformatics datasets. We are able to successfully corrupt the weights of the GNN model and degrade the accuracy of graph classification. Furthermore, we present an extensive analysis of the effect of layer-by-layer corruption of the GNN weights on its output accuracy. We also discuss various physical attack scenarios against the proposed defense scheme and explain how they are countered.