LDPUF: Exploiting DRAM Latency Variations to Generate Robust Device Signatures

Physically Unclonable Functions (PUFs) are promising security blocks for generating unique and more secure keys in low-cost cryptographic applications. Memories have been popular candidates for PUFs because of their prevalence in modern electronic systems. However, the existing techniques for generating device signatures from DRAM are very slow, destructive (they destroy the current data), and disruptive to system operation. In this paper, we propose a latency-based (precharge) PUF which exploits DRAM precharge latency to generate signatures. Our proposed methodology for key generation is fast, robust, minimally disruptive, and non-destructive. The silicon results from DDR3 chips show that the proposed key generation technique is at least ~4,300X faster than the existing approaches, while reliably reproducing the key in extreme operating conditions.


INTRODUCTION
In recent years, the hardware security community has helped to shift industry's attention towards the design of hardware-based security primitives to replace the more expensive and vulnerable software-based primitives. Hardware-based security primitives play important roles in protecting and securing the assets of an electronic system. Identification, authentication, secure communication, IC obfuscation to prevent IC piracy in the semiconductor supply chain, and detection of counterfeit ICs are common applications of hardware-based security primitives. Physical unclonable functions (PUFs), true random number generators (TRNGs), and anti-counterfeiting primitives are three of the most important hardware-based security primitives.
A PUF is a hardware-based security primitive capable of generating a unique identifying key [1]. Uniqueness is derived from physical variations inherent in the electronics manufacturing process such as resistances, capacitances, and transistor dimensions. These variations cannot be controlled during the manufacturing process and as such are considered unclonable [1,2]. The security of a [...] robustness against voltage and temperature variation, evaluation time and system level integrity.
The rest of the paper is organized as follows. In Section 2, we present the background of DRAM architecture, read/write operation, existing DRAM-based PUFs and major challenges. We propose the latency-based DRAM PUF in Section 3. The experimental results and discussions are presented in Section 4. We conclude the paper in Section 5.

BACKGROUND AND MOTIVATION
In this section, we provide a brief background of the modern memory subsystem and its operation. We also present existing DRAM-based PUFs and the major obstacles to their use in practical low-cost applications.

DRAM Organization
In most modern computer systems, DRAM serves as the main memory. Fig. 1 illustrates the organization of a modern DRAM system. A modern DRAM system maintains a hierarchy of channels, ranks, banks, DRAM chips, DRAM cells, and a memory controller. Memory commands, addresses, and data are driven between the memory controller and the DRAM modules over a memory channel. The memory requirement varies from system to system and determines the total number of DRAM modules. A DRAM module is divided into one or more ranks, and one rank is accessed in each read or write attempt. A rank consists of several DRAM chips that together provide a wide data bus. The same data bus is shared among the ranks, and a chip select pin is used to choose a particular rank. The data bus is usually 64 bits wide and is distributed equally among the chips of a rank. Each DRAM chip consists of multiple banks to support parallelism. In a memory bank, the DRAM cells are arranged in a two-dimensional array. The rows and columns of a DRAM are known as wordlines and bitlines, respectively. A row of a DRAM is also known as a page. The bitlines are connected to the row-buffer (a row of sense-amplifiers). A sense-amplifier acts like a latch: when the DRAM is read, it senses the stored charge of a memory cell and latches the corresponding value. A DRAM cell is the smallest unit and stores a single bit ('1' or '0'). It consists of two components: a capacitor to hold the charge and an access transistor to access the capacitor. The charge state of the capacitor determines the stored value: a fully charged capacitor represents logic '1' and an empty capacitor represents logic '0'. The access transistor connects the capacitor to a bitline and is controlled by the wordline. The DRAM content (i.e., the state of charge in the capacitor) is read or overwritten by activating a wordline.
Applying V_dd to the wordline creates a path between the capacitor and the bitline in order to perform a read or write operation. In most modern DRAMs, a specific combination of row address and column address accesses 64 bits (the most common interface width) of data simultaneously by accessing multiple chips at a time. Fig. 2i demonstrates the simplified DRAM read operation. In the precharge state, the memory controller generates a PRECHARGE command (PRE). The PRE command precharges all bitlines to V_dd/2 (green line) and deactivates the previously activated wordline. In the next state (activation state), the ACTIVATE command (ACT) from the memory controller activates the target wordline by raising the wordline voltage to V_dd (violet line). Once the pass-transistor (connected to the wordline) is ON, charge flows from the capacitor (red line) to the attached bitline if the stored value is '1' and from the bitline to the capacitor if the stored value is '0'. In the final stage, the differential sense-amplifier senses the voltage perturbation on the bitline and amplifies the bitline voltage to a strong logic '1' (or '0'). Then, the sense-amplifier latches the logic value from the bitline. Finally, the column address decides which sense-amplifier data appears on the data bus. The read operation in a DRAM is destructive: after a successful read, the initial charge state of the capacitor must be restored to preserve its value.
[Figure panel (i): DRAM system.]

WRITE Operation:
In the DRAM write operation, all bitlines are first precharged with the PRE command, and then an ACT command is applied to activate the target wordline. Next, the target column's sense-amplifier is driven to the desired logic value (high or low). This sense-amplifier then drives the corresponding bitline to charge or discharge the connected storage capacitor. During the WRITE operation, the activated wordline turns ON all access transistors to overwrite the contents of each associated cell. After each successful READ/WRITE operation, the bitlines must again be precharged to V_dd/2 before a new set of memory cells on a different wordline can be accessed.

DRAM Timing
Timing is critical for reliable DRAM operation. All major timing parameters of a DRAM module are presented in Fig. 2ii. Initially, all bitlines are precharged to V_dd/2. To access the data from a specific wordline, an ACTIVATE (ACT) command is applied to the corresponding wordline. After that, a READ/WRITE command is sent from the memory controller to sense the voltage perturbation on the bitlines or to write data to the memory cells. The minimum required time interval between the ACT command and the READ/WRITE command is defined as the activation time, t_RCD. The Column Access Strobe (CAS) latency, t_CL, is the minimum waiting time to get the first data bit on the data bus after sending the READ/WRITE command. After a successful READ/WRITE operation, a precharge command (PRE) is applied to deactivate the previously activated wordline (if any) and return the bitlines to their initial precharge state (i.e., V_dd/2). If a WRITE command was applied, the PRE command must be further delayed by t_WR (write recovery time) after the end of the write data burst. The PRE command must be given at least t_RP (precharge time) to complete before the next ACT command is sent. The time from the activation state to the start of the precharge state is called the row active time or restoration latency (t_RAS). The sum t_RAS + t_RP is the total time required to access a single row of a bank and is known as the row cycle time (t_RC). Usually, t_RC is on the order of 50ns for most modern DDR3 DRAMs.
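As a small worked example of the relation t_RC = t_RAS + t_RP, the sketch below adds two illustrative DDR3 timing values. The specific numbers (35 ns and 13.75 ns) and the function name are assumptions chosen for illustration, not values measured in this work.

```python
# Illustrative DDR3 timing values in nanoseconds (assumed for this example).
T_RAS_NS = 35.0    # row active time (restoration latency)
T_RP_NS = 13.75    # precharge time


def row_cycle_time(t_ras_ns: float, t_rp_ns: float) -> float:
    """Row cycle time: t_RC = t_RAS + t_RP."""
    return t_ras_ns + t_rp_ns


print(row_cycle_time(T_RAS_NS, T_RP_NS))  # 48.75
```

With these assumed values the result is close to the 50ns figure quoted above.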
Although two rows cannot be activated at the same time without changing the DRAM architecture, it is possible to read/write multiple columns in a single row cycle (i.e., activating one row and then reading/writing multiple columns of that row). A system can perform this kind of data access in Burst mode. In Burst mode, instead of accessing data only from the specified memory address, multiple consecutive bitlines from the same wordline are accessed (usually 4 or 8 consecutive locations starting from the address requested by the memory controller).
The DRAM manufacturer specifies the minimum timing latencies required to perform reliable read/write operations. Erroneous reads or writes can be expected if these minimum latencies are not maintained. It has been observed that, during the read operation, failing to ensure the minimum t_RCD, t_RAS, and t_RP leads to the following [20]:
• Observation 1: A reduced t_RCD only affects the first accessed column/cache line.
• Observation 2: A reduced t_RP might affect almost all cells of a row.
• Observation 3: Almost no bit error is introduced at the reduced t_RAS.
[Figure panel (i): Signal waveform at reading cycle.]

Existing DRAM-based PUF
2.4.1 Retention-based DPUFs: DRAM cells need to be refreshed at a periodic interval to ensure the integrity of the memory contents. The maximum allowed retention time is directly linked to the charge leakage across the memory cells of the DRAM module. The probability distribution of the charge leakage rate depends on several factors of the DRAM module, such as:
• The manufacturing process variations in the charge storage capacity among the memory cells [5,21].
• Operating voltage, temperature, and device wear-out [5].
The DRAM contents need to be refreshed periodically before the cells lose their original values. According to the JEDEC standard, the refresh interval has to be 64ms/32ms [22] to ensure data integrity even in a hostile environment. Failing to refresh within this time interval can alter the memory contents. Increasing the refresh interval by a sufficient margin causes random data failures across the DRAM chips. This error pattern is unique from chip to chip and is used to generate device signatures.
The retention-based device signature is promising but suffers from several drawbacks. First, for most DRAM modules, the periodic refresh operation is handled internally by the memory controller. There is no efficient way to control the refresh time for an arbitrarily small region of a DRAM module because the granularity of the refresh operation is predefined by the vendors. An authentication key of sufficient length can be generated from retention failures in a small portion of a DRAM module, but the whole operation may cause unwanted data corruption in other memory cells under the same granularity [22]. Second, a key of sufficient length requires an adequate number of errors, which might need a long waiting time (on the order of minutes) [18]. Third, the retention time is heavily temperature dependent; therefore, the key is sensitive to temperature variations [15]. The bit error rate (BER) decreases exponentially as the temperature drops, i.e., at a lower temperature, the key generation scheme requires a longer time interval between two refresh operations. The required time to generate the key is also a function of the size of the memory segment: a smaller segment requires a longer evaluation time than a bigger one [18]. Therefore, the designer must trade off area against time overhead. Several techniques can be used to address the above challenges, but with limited scope [4,15,21,23-25].

2.4.2 Latency-based DPUFs: We know from Section 2.3 that reducing different DRAM timing parameters introduces erroneous read/write operations. These latency-based failures create an opportunity for faster device signature generation. The latency-based failures are random across the whole DRAM module because of random process variations. As with other PUFs, the latency-based error pattern is unique from module to module. Recently, Kim et al. [18] proposed a DRAM PUF that manipulates the activation time (t_RCD). As in the retention-based DPUF, the reduction of t_RCD introduces random errors across the whole chip, which can be used to generate a device signature. The evaluation is much faster than for retention-based DPUFs: their reported results show a mean evaluation time of ∼88.2ms, which outperforms all previously proposed retention-based DRAM PUFs [15,26,27]. However, the throughput is still low because multiple row cycles are needed to evaluate the PUF response. Furthermore, this type of latency-based DPUF still needs a filtering mechanism in each access, which adds both hardware and computational overhead.

Other DPUFs:
In the start-up based DPUF [28], the device signature is generated from the start-up states of the DRAM cells. Initially, the bitlines are charged to V_dd/2. However, process variations in the storage capacitors slightly shift the bitline voltage to V_dd/2 + δ or V_dd/2 − δ, where δ represents a small amount. The sense amplifier resolves this voltage difference to '1' or '0' accordingly, which can be effectively used to generate a device footprint. Recently, Hashemian et al. [29] performed Monte Carlo simulations and showed that a shorter duty cycle of the write enable signal might cause random write failures in DRAM memory cells, which can be used to generate a device signature.

Motivations
The major limitations of existing techniques for generating robust keys from DRAM chips are summarized below.
• Waste of DRAM Power Cycle: Start-up based key generation requires a DRAM power cycle to obtain device signatures [28]. Hence, the whole system must be turned off and on again to evaluate the PUF. Therefore, this type of PUF cannot be evaluated while the system is in operation.
• Large Evaluation Time: Retention-based key generation requires a large amount of time to generate a key. A time on the order of minutes is required to generate enough retention failures [15,26,27]. A latency-based DPUF can be a superior solution, but the existing one still needs multiple row cycles (reading one data burst in each cycle) to evaluate the PUF key [18], as the reduction in activation time only affects the first few bits in the cache line (see Section 2.3).
• Destructive: Retention-based key generation is destructive. The DRAM granularity causes random bit failures throughout the smallest granular region (usually a rank). Note that the DRAM refresh can be disabled only at the granularity of channels [22]. A dedicated memory might be used to overcome this problem, but it forfeits the original advantage of memory PUFs, namely that no additional hardware is needed. Like the retention-based DPUF, the start-up based DPUF is also destructive.
• Disruptive: DRAM granularity keeps the entire DRAM rank busy during each access to that rank. As the evaluation time of a retention-based DRAM PUF is on the order of a minute, this kind of PUF evaluation blocks access to the target DRAM region by other applications for a long time. Though the existing latency-based DRAM PUF [18] solves the problems of long evaluation time and unwanted data failure due to granularity, it still needs a filtering mechanism to evaluate the PUF in each access, which introduces additional computational and hardware cost.

Generating Device Signature
In our proposed technique, we characterize the DRAM cells at the reduced t_RP to find the most suitable cells for generating a quality signature. The latency is defined as the time required to move charge during a read/write operation. In the modern DRAM architecture, multiple DRAM cells are connected to the same bitline through access transistors. To access a memory cell properly, all bitlines should be precharged to V_dd/2. As with the transistors and capacitors of a DRAM, the variations in the RC path delay and the capacitance of the bitlines follow a Gaussian distribution [30-32]. Due to this distribution, each bitline requires a different t_RP to be precharged properly. At a reduced t_RP, a partially precharged bitline may cause a wrong logic interpretation in the sense amplifier (as explained in Section 2.2). Furthermore, a partially precharged bitline may interact differently with the individual memory cells connected to it. This phenomenon can be explained using Fig. 1ii. In the figure, the contents of the first and last memory cells connected to the bitline B_1 travel through paths of different lengths while being sensed by the same sense amplifier. Hence, an insufficiently precharged bitline might affect the memory cells connected to it differently. The resulting bit errors in the read data, which stem from the random characteristics of the bitlines, can be used to generate the device signature. In our proposed method, failure bits (i.e., bits that do not consistently reflect the original cell content) are generated by reducing t_RP during the read operation. We reduce t_RP to the smallest possible value to achieve (i) the maximum number of failed memory cells from each memory module and (ii) the smallest possible time for PUF evaluation. However, during PUF evaluation, the reduced t_RP should be kept long enough to deactivate the previously activated wordline, so that its impact on the evaluated PUF is suppressed.

Characterization
To obtain a robust and unique signature, we characterize the DRAM errors due to the reduced t_RP (i.e., partial precharging). The characterization provides valuable insight into the DRAM cells and their eligibility for key generation. The characterization phase is conducted by observing the outputs for different types of input patterns (e.g., all 1's, all 0's, or a checkerboard pattern). A particular input pattern is applied several times to study the temporal variation (i.e., measurement variation). Then, based on the input pattern dependency (i.e., initially written data) and the temporal variation, we categorize the cells into two major types:
• Noisy Cells: The error pattern varies from measurement to measurement for these cells. Internal/external noise can influence the outputs of these cells. Some of these cells can be useful for generating random numbers, and the rest can be used to create a PUF but require a large ECC [33].
• Robust Cells: These cells do not show any temporal variation, i.e., the cell outputs are independent of the measurement. These cells are tolerant to internal and external noise and are ideal for a PUF.
The outputs at the reduced t_RP might depend on the memory cell contents (i.e., written values) due to the coupling effect of neighboring cells [34]. Based on the data dependency, we categorize the DRAM cells into two major types as well:
• Pattern Independent Cells: These cells exhibit the same output (at the reduced t_RP) regardless of the patterns written into the memory. The experimental results show (details in Section 4) that most of the DRAM cells from the major manufacturers are pattern independent. In this paper, we focus only on pattern independent cells for PUF implementation.
• Pattern Dependent Cells: Pattern dependent cells respond differently to different input patterns. These cells can be ideal candidates for creating a strong PUF [35].
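The categories above can be illustrated with a minimal sketch that labels one cell from repeated reads at the reduced t_RP. The function name, the data layout, and the extra "correct" label are assumptions made for this example. The rule is also stricter than the definition used later in the paper, since any temporal variation is treated here as noise.

```python
def classify_cell(observations):
    """Label one DRAM cell from repeated reads at reduced t_RP.

    observations: list of (written_bit, reads) pairs, one per input pattern,
    where reads is the list of bits returned over repeated measurements.
    Simplifying assumption: any temporal variation marks the cell as noisy.
    """
    # Noisy: the output changes between measurements for some pattern.
    if any(len(set(reads)) > 1 for _, reads in observations):
        return "noisy"
    outputs = {reads[0] for _, reads in observations}
    # Robust and always the same value, whatever was written.
    if len(outputs) == 1:
        return "pattern_independent"
    # Robust and always equal to the written value (no failure at all).
    if all(reads[0] == written for written, reads in observations):
        return "correct"
    return "pattern_dependent"


# A cell that reads '0' five times for both a written '1' and a written '0'.
print(classify_cell([(1, [0] * 5), (0, [0] * 5)]))  # pattern_independent
```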

Cell Selection Algorithm
In this paper, we focus only on the pattern independent cells. Pattern dependent cells are suitable for a strong PUF, which we plan to address in future work. The experimental results show that some of the pattern independent cells are strong '1' and some of them are strong '0'. Besides reproducibility, it is important that the generated key is random and unique as well. Entropy is used to measure the randomness of a bitstream [33,36]; it reflects the balance between zeros and ones in the bitstream. We scan each page to find suitable cells for generating robust and random keys. We observe that the outputs generated using all pattern independent bits of every word (a word is 64 bits wide) suffer from poor entropy. Therefore, not all bits of every word are suitable for key generation. It is observed that some specific bits of every word of a page give a predictable outcome. For example, for a particular memory bank, the first bit of every word of a specific page is always read as '0' at reduced t_RP. Therefore, the binary string (V_1) formed from the first bits of the words cannot be used to generate keys: the Hamming weight of the binary string V_1 is 0%. A Hamming weight of 50%, which is ideal for a key, means that the binary string has an equal number of 1's and 0's. Similar to V_1, we create a binary string V_2 from the second bit of each word in a page. In general, the binary string generated from the i-th bit of each word is V_i. The i-th bit of the word is considered an eligible bit if it produces a random binary string V_i with a ∼50% Hamming weight.
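The construction of the strings V_i and their Hamming weights can be sketched as follows. The tiny 4-bit words and the function names are hypothetical and chosen only to keep the example readable; real words are 64 bits wide.

```python
def bit_position_strings(page_words):
    """Return [V_1, V_2, ...]: V_i collects the i-th bit of every word in a page.

    page_words is a list of equal-length bit lists (one list per word).
    """
    return [list(column) for column in zip(*page_words)]


def hamming_weight(bits):
    """Fraction of ones in a binary string (0.5 is the ideal value for a key)."""
    return sum(bits) / len(bits)


# Hypothetical page with four 4-bit words.
page = [[0, 1, 0, 1],
        [0, 0, 1, 1],
        [0, 1, 1, 1],
        [0, 0, 0, 1]]
weights = [hamming_weight(v) for v in bit_position_strings(page)]
print(weights)  # [0.0, 0.5, 0.5, 1.0]
```

In this toy page, V_1 (all zeros) and V_4 (all ones) are predictable and would be rejected, while V_2 and V_3 are balanced.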
To improve the entropy of our proposed LDPUF, we propose Algorithm 1 for selecting the qualified memory cells and their locations. In practice, not all binary strings in V = {V_1, V_2, ..., V_64} have a Hamming weight of 50%. Therefore, we choose only those binary strings whose Hamming weight falls within an allowable range (H_min to H_max). All eligible bits (of words) from a page R_x, denoted β_Rx, can be defined as in expression 1. If the page R_x consists of n words, then we can create a binary string from each word by taking only the qualified bits. For example, if we consider the i-th word W_i from page R_x, then W_i^(β_Rx) is the binary string formed by taking the bits of W_i whose positions are elements of β_Rx. All allowable data bits from R_x can then be presented as in expression 2. Here, M_Rx is a one-dimensional binary string containing all eligible data bits from R_x.
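Under the definitions given in this paragraph, the two expressions can plausibly be written as shown next. This is a sketch consistent with the surrounding text: the symbols HW(·) for the Hamming weight and ∥ for concatenation are assumed notation.

```latex
% Expression 1: eligible bit positions of page R_x
\beta_{R_x} = \{\, i \in \{1, \dots, 64\} \;:\; H_{min} \le \mathrm{HW}(V_i) \le H_{max} \,\}

% Expression 2: all eligible data bits of page R_x (n words per page)
M_{R_x} = W_1^{\beta_{R_x}} \,\|\, W_2^{\beta_{R_x}} \,\|\, \cdots \,\|\, W_n^{\beta_{R_x}}
```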
However, the length of the key can be larger than the number of qualified memory cells in a binary string M_Rx. In this case, we have to use binary strings from multiple pages. Algorithm 1 is designed to select the qualified bits from each page. For the rest of our discussion, the b-th bit of the 64-bit data word accessed from the location (r,c) is denoted as (r,c,b), where r is the row number (or page number) and c is the column number (the c-th word of page r). In Algorithm 1, R_n, C_n, and B_n are the total number of rows, the total number of columns, and the word width, respectively (constant for a specific memory module). Note that in our experiment, we have used 1GB memory modules, where R_n = 16384, C_n = 1024, and B_n = 64.
In the proposed Algorithm 1, a one-dimensional array R and a two-dimensional array β together hold the memory locations of the qualified DRAM cells. R holds all eligible row (or page) addresses and β holds the corresponding qualified bit numbers of each page. For example, R = {1, 3, 4, 7} represents that the 1st, 3rd, 4th, and 7th rows (or pages) are marked as qualified rows (see Fig. 3). The 2D array β (on the right side of Fig. 3) represents the corresponding locations of the eligible bits. For example, for row 1, β holds '2', '5', and '8', i.e., the 2nd, 5th, and 8th bits of all words from page 1 can be used to generate the key.
Algorithm 1 (excerpt):
    if is_pattern_independent(current_mem_data) == true then
    ...
    if row_flag == true then
        R(row_count) = r;
        row_count++;
        bit_count = 0;
        row_flag = false;
    end if
    end for
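Since only an excerpt of Algorithm 1 is shown, the following is a minimal sketch of a selection procedure matching the description above, not the authors' exact algorithm. It assumes zero-based indices, a nested-list `mem_data` with `None` marking cells that are not pattern independent, and the extra requirement that a bit position be pattern independent in every word of the page.

```python
def select_cells(mem_data, h_min=0.25, h_max=0.75):
    """Return (R, beta): qualified rows and, per row, qualified bit positions.

    mem_data[r][c][b] is 0, 1, or None (None = not pattern independent).
    Assumption: a bit position qualifies only if it is pattern independent
    in every word of the page and its Hamming weight lies in [h_min, h_max].
    """
    R, beta = [], []
    for r, page in enumerate(mem_data):
        qualified = []
        for b in range(len(page[0])):
            column = [word[b] for word in page]      # the string V_b of this page
            if any(v is None for v in column):
                continue
            weight = sum(column) / len(column)
            if h_min <= weight <= h_max:
                qualified.append(b)
        if qualified:                                 # keep rows with eligible bits
            R.append(r)
            beta.append(qualified)
    return R, beta


# Hypothetical memory with two pages of four 4-bit words each.
mem = [
    [[0, 1, 0, None], [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1]],
    [[0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 1]],
]
print(select_cells(mem))  # ([0], [[1, 2]])
```

The second page is rejected entirely because every bit position there is constant across the words.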

Registration
In the registration phase, we generate a golden data set (i.e., a challenge-response data set). The golden data set can be used to generate the key or to identify whether the DRAM chip is authentic. The golden data set is created in a secure environment and stored in a database. Using all qualified memory cells produced by Algorithm 1, the golden data set is generated by Algorithm 2. In this algorithm, goldenDataLoc holds the logical locations of the eligible memory cells and goldenData saves the outputs that are read from the corresponding locations with reduced t_RP. The goldenDataLoc, the goldenData, and the reduced value of t_RP together form the golden data set for future authentication.
Input: mem_data: an R_n × C_n × B_n matrix containing the pattern independent data. An element of mem_data can be empty (if the corresponding memory cell is not pattern independent), '0', or '1'.
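The registration step can be sketched as below. The names goldenDataLoc and goldenData follow the text (written here as `golden_loc` and `golden_data`). The function name, the `key_bits` parameter, and the row-then-column traversal order are assumptions, because the body of Algorithm 2 is not reproduced here.

```python
def register(mem_data, R, beta, key_bits):
    """Build (golden_loc, golden_data) from the qualified cells.

    mem_data[r][c][b] holds the bit read at reduced t_RP; R and beta are the
    qualified rows and bit positions. Collection stops after key_bits bits.
    """
    golden_loc, golden_data = [], []
    for r, bits in zip(R, beta):
        for c, word in enumerate(mem_data[r]):
            for b in bits:
                golden_loc.append((r, c, b))      # logical location (r, c, b)
                golden_data.append(word[b])       # response at that location
                if len(golden_data) == key_bits:
                    return golden_loc, golden_data
    return golden_loc, golden_data


# Hypothetical single page with bits 1 and 2 qualified (zero-based).
mem = [[[0, 1, 0, None], [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1]]]
loc, data = register(mem, R=[0], beta=[[1, 2]], key_bits=4)
print(loc)   # [(0, 0, 1), (0, 0, 2), (0, 1, 1), (0, 1, 2)]
print(data)  # [1, 0, 0, 1]
```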

RESULT AND ANALYSIS
Our results are based on experiments conducted with commercial DDR3 memory modules from two major memory manufacturers (referred to as A and B). We used SoftMC (Soft Memory Controller [37]) along with the Xilinx ML605 Evaluation Kit, which is embedded with a Virtex-6 FPGA. SoftMC uses the Riffa [38] framework to establish communication between a host PC and the evaluation board through an x8 PCIe bus. To check the design reliability against voltage variation, we used USB Interface [...]. The experiment was performed in two steps. First, an 8-bit pattern was written with the nominal timing parameters and then read back with a reduced timing parameter. The read operation was done in a single row cycle, i.e., we activated one wordline at a time and then read all bitlines with consecutive bursts, where each data burst captured the data from 8 successive bitlines. This whole process was done at the nominal operating voltage and room temperature. The pattern length was chosen according to the burst length of the memory module. To evaluate the error pattern, we first checked the Hamming distance between the written pattern (input pattern) and the pattern that was read out (output pattern) with the reduced timing parameter. Then, the flipped bits were analyzed for additional information (e.g., spatial distribution, pattern dependency, etc.). Four 8-bit input patterns (0xFF, 0xAA, 0x55, 0x00) were used to characterize the DRAM cells. For each input pattern, we repeated our experiment five times to study the temporal variation. Independent analysis was done on randomly chosen memory banks (three from manufacturer A and two from B; each consists of 128MB of memory cells).
We conducted our experiments on the DRAM memory modules by changing the activation time (t_RCD), restoration time (t_RAS), and precharge time (t_RP). However, we did not observe any data error due to reduced t_RAS, which is consistent with the observation made by [20]. We characterized the flipped bits that result from the reduced t_RP.
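The Hamming-distance check between the written and read patterns can be illustrated with a short sketch. The function name and the sample read-back bytes are hypothetical.

```python
def flipped_fraction(written_byte, read_bytes):
    """Fraction of bits that differ between an 8-bit written pattern and the
    bytes read back at a reduced timing parameter."""
    # XOR marks the differing bit positions; counting ones gives the distance.
    flipped = sum(bin(written_byte ^ rb).count("1") for rb in read_bytes)
    return flipped / (8 * len(read_bytes))


# Hypothetical read-back of two bytes after writing 0xFF everywhere.
print(flipped_fraction(0xFF, [0xFF, 0x0F]))  # 0.25
```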

Reduced Latency: Activation Time vs. Precharge Time
We read a whole memory page in a single row cycle to evaluate the error patterns generated at the reduced t_RCD. Two 32-byte (double-data rate) memory chunks were read with each burst (with a burst length of 8, i.e., eight words can be accessed at a time, where each word corresponds to 64 bits of data). For the rest of our discussion, we use the notation t_A,x to represent the reduced timing parameter t_A, where t_A,x = x × t_A. At reduced activation time (e.g., at t_RCD,0.38), flipped bits were only observed in the first accessed cache line (i.e., in the first 64 bytes of data). As DRAM-based PUFs rely on flipped bits under non-standard DRAM operation, evaluating a DRAM-based PUF at reduced t_RCD needs multiple read cycles (accessing 64 bytes of data in each cycle). All memory banks from both manufacturers exhibited similar characteristics. Such behavior is observed because the target wordline gets enough time to become fully activated before the second part of the cache line is accessed (see Appendix A). Note that [20] and [18] also presented similar observations. In our experiment, errors caused by the reduced activation latency were first observed at t_RCD,0.57.
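As a small numerical illustration of the t_A,x notation, the sketch below scales a nominal timing value by the factors used in this section. The nominal value of 13.125 ns is an assumed example, not a figure reported in this work, and the function name is hypothetical.

```python
def reduced_timing(t_nominal_ns, x):
    """Reduced timing parameter: t_A,x = x * t_A."""
    return x * t_nominal_ns


T_NOMINAL_NS = 13.125  # assumed nominal timing value for illustration only

for x in (0.57, 0.38, 0.19):
    print(f"x = {x}: {reduced_timing(T_NOMINAL_NS, x):.2f} ns")
```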
On the other hand, the experimental results show that an excessively reduced t_RP flips memory cells uniformly across the page when we read the whole page in a single row cycle. Fig. 4 shows the percentage of flipped bits in two random banks from the two manufacturers at reduced t_RP and with different input patterns. Bit flipping was first observed when t_RP was reduced to t_RP,0.57. With a sufficient reduction in t_RP, the bitlines do not get enough time to settle to V_dd/2 from their previous states and, therefore, float at an intermediate value [4,20]. We reduced t_RP to t_RP,0.57, t_RP,0.38, and t_RP,0.19 to observe the behavior of the erroneous outputs. The results show that, for manufacturer A, the total number of flipped cells is ≪ 1% at t_RP,0.38. The number of flipped cells keeps increasing as we keep decreasing t_RP, and the total number of errors increases by a huge margin at t_RP,0.19. For vendor B, the total number of flipped cells is ≪ 1% at t_RP,0.57 but increases significantly at t_RP,0.38. The pattern dependency of the flipped bit count is also noticeable in Fig. 4. At t_RP,0.19, for manufacturer A, the number of flipped cells (flipped to '0') for the input pattern 0xFF (all 1's) is 75.17%. On the other hand, only 25.01% of the memory cells are flipped (flipped to '1') for the input pattern 0x00 (all 0's). With the checkerboard patterns (i.e., 0xAA and 0x55), the number of flipped cells is ∼50%. This is because, for vendor A, at t_RP,0.19, most of the memory cells produce '0' regardless of the input pattern. We made a similar observation for all memory banks from vendor B, for which most of the memory cells produce '1' with partially precharged bitlines. Fig. 4 also shows that, for vendor B, the statistics of the flipped cells are almost the same at t_RP,0.19 and t_RP,0.38. We can conclude from the results that reducing the precharge time is superior to reducing the activation time for generating quality keys in a single row cycle. The results also show that we can obtain enough errors at t_RP,0.19 to generate PUF keys.
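The flip percentages reported for manufacturer A are roughly what a simple back-of-the-envelope model predicts. The sketch below assumes that each cell independently reads '0' with a fixed probability regardless of the written value, and it ignores noisy and pattern dependent cells. The model, the function name, and the value 0.75 are illustrative assumptions, not results from this work.

```python
def expected_flipped(pattern_byte, p_zero):
    """Expected fraction of flipped bits for an 8-bit input pattern when each
    cell reads '0' with probability p_zero (else '1'), whatever was written."""
    ones = bin(pattern_byte).count("1") / 8
    # Written '1' flips when the cell reads '0'; written '0' flips when it reads '1'.
    return ones * p_zero + (1 - ones) * (1 - p_zero)


for pattern in (0xFF, 0xAA, 0x55, 0x00):
    print(hex(pattern), expected_flipped(pattern, p_zero=0.75))
```

With p_zero = 0.75 the model gives 0.75 for 0xFF, 0.25 for 0x00, and 0.5 for both checkerboard patterns.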

Cell Characterization
We characterize the DRAM cells to improve the quality of the generated device signatures. Because of partial precharging, a read returns either the original content of a cell or its flipped value. We characterized the DRAM cells based on their response in the partially precharged state. We studied whether the content read from a memory cell in this state depends on its own stored value and on the contents of neighboring cells, and we investigated the spatial correlation in the error pattern. We found that some cells are noisier than others at the reduced $t_{RP}$. To characterize memory cells based on their output patterns, we collected data at $t_{RP,0.57}$, $t_{RP,0.38}$, and $t_{RP,0.19}$. However, we characterize DRAM cells only at $t_{RP,0.19}$, so that we can generate the maximum number of flipped bits for the PUF operation. Note that cell characterization was done at nominal voltage and room temperature.
(1) Pattern Independent: Memory cells from this category always get flipped to a fixed value (either to '0' or to '1') regardless of the input pattern (i.e., the value originally written to the DRAM cells). Fig. 5 shows the spatial locations of output '0' (left) and '1' (right) across a random DRAM bank from manufacturer A. The results show that pattern independent 1's and 0's are uniformly distributed. All memory banks from manufacturer B showed a similar uniform spatial distribution (not shown in the figure). Therefore, the reduction of $t_{RP}$ is very useful to obtain responses for a PUF.
(2) Pattern Dependent: The outputs of this type of cell depend on the input patterns written to the DRAM cells. The outputs are affected by the cumulative voltage of the partially precharged bitline, the stored values, and the coupling effect of neighboring cells. We consider a memory cell as pattern dependent if it provides different outputs for different inputs and shows measurement invariance for at least one input pattern. Fig. 6 shows the DRAM cells that are dependent on the input pattern 0xAA. Pattern dependent cells can be used for a strong PUF with a large challenge-response pair (CRP) space. Furthermore, spatial locality along both rows and columns is visible in Fig. 6. The darker lines in Fig. 6 (both horizontal and vertical) represent rows and columns with a higher number of pattern dependent cells. This spatial locality might be used to recover the physical to logical address mapping [25]. Fig. 6 is captured from a random bank of manufacturer B; a similar type of spatial locality was found in all memory banks from all manufacturers. The third column (from right) in Table 1 shows the percentage of pattern dependent cells from each bank.
(3) Noisy Cells: With partially precharged bitlines, the outputs of these cells vary from measurement to measurement. Therefore, these noisy cells are not suitable to be used as a PUF. The second column (from right) in Table 1 represents the percentage of noisy cells from each bank. Fig. 7 shows the distribution of noisy memory cells for a random bank from manufacturer B. This figure shows that the noisy cells are not entirely random (in this case, most of the cells are biased to '1'). Similar characteristics are found in other memory banks from both manufacturers (i.e., most of the noisy cells are biased to either '0' or '1'). A large ECC might be required to use these cells as a PUF [11], [12]. The locations of noisy cells vary randomly from one bank to another. However, a proper subset of such cells can also be used to generate random numbers, which is beyond the scope of this paper.
The complete distribution of these three types of DRAM cells is presented in Fig. 8 for a given bank of vendor A. In this figure, we only present two consecutive data words (64 × 2 = 128 bitlines) from each page. The figure shows that all memory cells from the $4n$-th and $(4n+1)$-th (where $n = 1, 2, 3, \ldots$) bits of the word generate '0' regardless of the input patterns. One possible reason is that these bitlines deviate toward '0' (from $V_{dd}/2$) by a huge margin due to the insufficient $t_{RP}$. Therefore, generating the key from such memory cells reduces the overall entropy of the key. Our proposed Algorithm 1 is designed to eliminate such memory cells.
Table 1 summarizes the distribution of the cells for two different manufacturers (manufacturer A and manufacturer B) at $t_{RP,0.19}$. The results show that more than 90% of the cells from each bank of manufacturer A are pattern independent, while the figure is < 75% for manufacturer B. For both manufacturers, the number of pattern dependent cells is less than 1%. The rest are noisy cells. Note that a very small number of cells from manufacturer B, presented in the first column (from right, marked as 'Correct bits') of the table, were able to retain their actual stored data even at $t_{RP,0.19}$ with all input patterns.
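As an illustration, the cell categories described above can be expressed as a small classification routine. This is a minimal sketch under stated assumptions, not the procedure used in the experiments: the names `CellType` and `classify_cell`, the data layout (one list of repeated reads per input pattern), and the rule that a cell stable for only some patterns counts as pattern dependent are all hypothetical simplifications.

```python
from enum import Enum


class CellType(Enum):
    PATTERN_INDEPENDENT = "pattern independent"
    PATTERN_DEPENDENT = "pattern dependent"
    NOISY = "noisy"
    CORRECT = "correct"


def classify_cell(observations):
    """Classify one cell from repeated reads at reduced tRP.

    observations maps a pattern name to (written_bit, [read_bit, ...]).
    Assumes the patterns write both 0 and 1 to the cell (e.g. 0xFF and 0x00).
    """
    stable = {}  # pattern -> the single value read every time, if any
    for pattern, (_, reads) in observations.items():
        if len(set(reads)) == 1:
            stable[pattern] = reads[0]

    if not stable:
        return CellType.NOISY  # no pattern gives a repeatable output

    all_stable = len(stable) == len(observations)
    if all_stable and all(stable[p] == w for p, (w, _) in observations.items()):
        return CellType.CORRECT  # stored data survives for every pattern
    if all_stable and len(set(stable.values())) == 1:
        return CellType.PATTERN_INDEPENDENT  # same fixed output for all inputs
    # Repeatable for at least one pattern, but the output depends on the input.
    return CellType.PATTERN_DEPENDENT
```

Under this rule, only cells returning `CellType.PATTERN_INDEPENDENT` would be candidates for key generation.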

LDPUF Evaluation:
Diffuseness, Uniqueness, and Reliability are three major PUF performance metrics.
• Diffuseness: A PUF device should be able to generate distinguishable responses to different challenges. For LDPUF, we consider the address as the challenge and the corresponding cell content at reduced $t_{RP}$ as the response.
• Uniqueness: Responses from different devices should be unique.
• Reliability: The same response (i.e., PUF output) should be generated over the entire lifetime of the device under any operating condition.
Table 1 shows that the pattern-independent cells are dominant across all memory banks for both manufacturers. In this paper, we have focused only on the pattern independent cells. We used Algorithm 1, presented in Section 3, to obtain the logical locations of the qualified memory cells. In this algorithm, we used $H_{min} = 0.25$ and $H_{max} = 0.75$ as the input parameters. Ideally, the Hamming distance should be 0.5. A Hamming distance of 0 represents that the PUF is not unique. We completed the registration (i.e., creating the golden data set) using our proposed Algorithm 2. We generated at least one 1024-bit key from each qualified page (or row). However, it is possible to generate multiple keys from each page, since the number of qualified memory cells in each page was more than 1024. For simplicity, we obtained only one key from each page to test the PUF performance. The key generated from the golden data set is used as the reference key. We refer to the address used to generate a reference key as the key address. To evaluate the performance of our proposed LDPUF, we created four sets of test data under four different operating conditions (discussed in 4.3.3). Each test set contains the output data for four different input patterns: 0xFF, 0xAA, 0x55, and 0x00. In the rest of the discussion, all results are shown combined for all four input patterns unless otherwise specified. The outputs at different operating conditions are compared with the reference key to ensure the robustness of our proposed key generation methodology. We present the major performance metrics below.

Diffuseness:
To check the diffuseness, we measured the inter Hamming Distance (inter HD) among the reference keys from each bank (i.e., intra-bank but inter-reference key). An inter HD of 50% signifies that a single device can generate unique keys from each row (i.e., address). An average Hamming weight of 50% likewise indicates that the keys are random. Table 2 shows the average Hamming weight of each key and the average Hamming distance among the different keys generated from each bank. The average Hamming weight and Hamming distance for the banks from manufacturer A are closer to the ideal value (i.e., 50%). Although the average HD and Hamming weight for manufacturer B deviate from 50%, we still did not find any repetition among keys generated from distant pages of the same memory bank.
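The two diffuseness quantities can be computed as follows. This is an illustrative sketch only: keys are assumed to be lists of 0/1 bits of equal length, and the function names are hypothetical rather than taken from the paper.

```python
from itertools import combinations


def hamming_weight(key):
    """Fraction of '1' bits in a key (ideal value: 0.5)."""
    return sum(key) / len(key)


def hamming_distance(a, b):
    """Fractional Hamming distance between two equal-length keys."""
    if len(a) != len(b):
        raise ValueError("keys must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_pairwise_hd(keys):
    """Average fractional HD over all distinct pairs of keys."""
    pairs = list(combinations(keys, 2))
    if not pairs:
        raise ValueError("need at least two keys")
    return sum(hamming_distance(a, b) for a, b in pairs) / len(pairs)
```

Applying `mean_pairwise_hd` to the reference keys of one bank gives the intra-bank inter HD; the same function applied to keys taken from different banks at the same key address gives the uniqueness measure.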

Uniqueness:
To quantify the uniqueness, we measured the inter Hamming Distance (inter HD) of keys from different memory banks. That is, we measured the HD between the two keys generated at each key address in two different banks. We checked the inter HD for all possible combinations, taking all five banks into account. Fig. 9 shows the inter HD for each manufacturer. This figure only represents the worst case (largest deviation from 50% inter HD) scenario for both manufacturers. For manufacturer A, the average, minimum, and maximum inter HD are 45.78%, 37.05%, and 52.5%, respectively. For manufacturer B, the mean, minimum, and maximum inter HD are 51.91%, 40.92%, and 72.23%, respectively. These results indicate that the key generated from the proposed LDPUF is unique.

Reliability:
The reproducibility at different operating conditions is presented in Fig. 10. This figure only presents the worst memory bank from each manufacturer (i.e., the memory bank with the largest deviation from 0% intra HD). We collected results at four different operating conditions: (i) Nominal Voltage and Room Temperature (NVRT), (ii) Low Voltage and Room Temperature (LVRT), (iii) High Voltage and Room Temperature (HVRT), and (iv) Nominal Voltage and High Temperature (NVHT). These operating conditions were chosen to examine the impact of voltage variation and high temperature. Throughout the experiment, we only measured the external temperature (environment temperature). The results show that the memory module from manufacturer A is less robust than that from manufacturer B at reduced operating voltage. For manufacturer A, we can only change the operating voltage by −20 mV without causing excessive errors. On the other hand, the DRAM module from manufacturer B can tolerate a −75 mV change in operating voltage. Table 3 presents the intra HD under different operating conditions. Column 4 of Table 3 represents the change in operating voltage from the nominal operating voltage (1.5 V), and column 5 represents the change in temperature from room temperature (25°C). The results show that all memory banks from both manufacturers are robust against temperature variation.
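A reliability check of this kind reduces to comparing each re-measured key with the reference key. The sketch below is a hypothetical illustration: keys are assumed to be packed into non-negative Python integers, and the function names and the choice of reporting the worst case per condition are assumptions, not the paper's tooling.

```python
def intra_hd(reference, measured, n_bits=1024):
    """Fractional HD between a reference key and a re-measured key.

    Both keys are non-negative integers holding n_bits key bits.
    """
    # XOR leaves a '1' wherever the two keys disagree.
    return bin(reference ^ measured).count("1") / n_bits


def worst_case_intra_hd(reference, measurements, n_bits=1024):
    """Largest intra HD per operating condition.

    measurements maps a condition label (e.g. "NVRT") to a list of keys.
    """
    return {
        condition: max(intra_hd(reference, key, n_bits) for key in keys)
        for condition, keys in measurements.items()
    }
```

An intra HD close to 0 for every condition corresponds to a key that is reproduced reliably.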

Evaluation Time:
The evaluation time is the time between sending the READ command to the system and receiving the amount of data required to generate a 1024-bit key. Table 4 shows the required number of data bursts (mean) and the time (mean and standard deviation) to generate a 1024-bit key. The results show that, with the proposed technique, we need < 2 ms on average to obtain a 1024-bit key, which is much faster than the existing approaches. We observe that the longest time to evaluate LDPUF is ∼3.97 ms. However, the evaluation time shown in Table 4 also includes the time required to transfer the data between the host PC and the evaluation board. With our proposed technique, the experimental results show that the average system level evaluation time is ∼6.7 µs for manufacturer A and ∼20.5 µs for manufacturer B. Note that, on average, it required ∼91.2 µs to access the whole 8 KByte page in a single row cycle.
Table 3. Intra HD at different operating conditions.
Note that in our proposed algorithm, we used only the pattern independent cells at reduced $t_{RP}$. Hence, we do not need to write any explicit data for PUF evaluation. The average system level evaluation time of the reduced $t_{RCD}$ based DPUF is 88.2 ms [18], which is ∼4,300X longer (considering memory banks from manufacturer B) than that of our proposed method. On the other hand, it may take almost a minute to generate a device signature with a retention based PUF [15].
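As a quick check, the speed-up factor follows directly from the system level evaluation times reported in this section. The numbers are those quoted in the text; only the arithmetic is added here.

```latex
\[
\frac{88.2\ \text{ms}}{20.5\ \mu\text{s}}
  = \frac{88.2 \times 10^{-3}}{20.5 \times 10^{-6}}
  \approx 4.3 \times 10^{3} \quad \text{(manufacturer B)},
\qquad
\frac{88.2\ \text{ms}}{6.7\ \mu\text{s}}
  \approx 1.3 \times 10^{4} \quad \text{(manufacturer A)}.
\]
```

The manufacturer B figure is the smaller of the two, which is consistent with quoting ∼4,300X as the lower bound on the speed-up.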

System Level Disruption:
For most DRAM chips, the refresh granularity is the rank. Therefore, the refresh interval must be increased for the entire memory rank during retention-based PUF evaluation, which causes random data corruption over the whole rank. Also, due to the long evaluation time of the retention based PUF, the particular DRAM rank becomes unavailable to other applications for a long time. For our proposed LDPUF, the reduced $t_{RP}$ only affects the cells that are being accessed. Furthermore, we also checked for interference with the pages neighboring the target page that is accessed for key generation. To do so, we arbitrarily selected 1000 consecutive rows from each memory bank at nominal voltage and room temperature. We first read the data from all odd-numbered rows with the reduced $t_{RP}$ and then checked the impact on the memory cells of the even-numbered rows with nominal $t_{RP}$. Our results show that there is no data corruption in the adjacent rows.
Although the latency based DPUF [18] with reduced $t_{RCD}$ shortens the evaluation time by a significant margin, this type of DPUF evaluation needs a filtering mechanism upon each access, which causes both computational and hardware overhead. In our proposed mechanism, the determination of eligible PUF cells by cell characterization needs to be done only once over the entire DRAM lifetime (see Section 3.3). Once the suitable cells for PUF operation are determined, the evaluation of our proposed PUF is straightforward (i.e., request the response by sending an address and then compare only the eligible cells' content with the golden data). Furthermore, our proposed PUF evaluation has the shortest evaluation time, which ensures the smallest stall to the system. Therefore, our proposed LDPUF can be used at run-time, which is not possible with many existing memory-based PUFs [2,15,26].
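The evaluation step described above (read a row, keep only the eligible cells, compare with the golden data) might look like the following. This is a sketch under assumptions, not the paper's Algorithm 2: the function names, the representation of eligible cells as a list of column indices, and the acceptance threshold `max_hd` are all hypothetical.

```python
def extract_key(row_bits, eligible_columns, key_bits=1024):
    """Build a key from the eligible cells of one row read at reduced tRP."""
    if len(eligible_columns) < key_bits:
        raise ValueError("not enough eligible cells in this row")
    return [row_bits[c] for c in eligible_columns[:key_bits]]


def matches_golden(row_bits, eligible_columns, golden_key, max_hd=0.1):
    """Compare the extracted key with the golden key.

    max_hd is an assumed tolerance on the fractional Hamming distance;
    the source does not specify an acceptance threshold.
    """
    key = extract_key(row_bits, eligible_columns, len(golden_key))
    mismatches = sum(a != b for a, b in zip(key, golden_key))
    return mismatches / len(golden_key) <= max_hd
```

Because the eligible columns are fixed at registration, no per-access filtering is needed in this sketch; the only run-time work is one row read and one comparison.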

Robustness:
We compared the robustness of our proposed LDPUF and the retention-based DPUF under different operating conditions. To accumulate retention based failures, we chose a memory segment containing 1000 rows from each bank. We first stored logic '1' in all memory cells of the segment and then prolonged the refresh interval until we obtained at least ∼2% failures at NVRT. For a specific bank, the same refresh interval was kept for all operating conditions. For the proposed LDPUF, we measured bit failures with four input patterns (0xFF, 0xAA, 0x55, and 0x00) at $t_{RP,0.19}$ for the same 1000 rows. The Jaccard Index is used to compare the robustness of our proposed LDPUF with the retention-based PUF. For the retention-based DPUF, the PUF characteristics are evaluated from the locations of the flipped bits. For example, in our case, retention-based failure bits are always flipped from logic '1' to logic '0', but the locations of the flipped cells differ from one device to another. For two sets of measurements ($M_1$, $M_2$), the Jaccard Index is computed as $J(M_1, M_2) = \frac{|M_1 \cap M_2|}{|M_1 \cup M_2|}$, where $|M_1 \cap M_2|$ is the total number of matched failed bits and $|M_1 \cup M_2|$ is the total number of failed bits from the two measurements $M_1$ and $M_2$ [18,40]. For better reproducibility, the intra Jaccard Index should be ∼1. Table 5 shows the comparison between LDPUF and the retention-based PUF. The results show that the proposed LDPUF is more robust than the retention-based DPUF. The retention-based PUF is more vulnerable to temperature variation than the LDPUF. This is because retention-based bit failure is mostly governed by the charge leakage rate of DRAM cells, which has a strong exponential dependence on temperature [15,26,27,41,42]. On the other hand, the change in $t_{RP}$ with temperature is negligible: $t_{RP}$ changes by only ∼3% as the temperature changes from 27°C to 85°C [43].
Though we did not evaluate the LDPUF with reduced $t_{RCD}$, the result shown in [18] implies that it can tolerate only a small change in temperature (e.g., 5°C). On the other hand, for the LDPUF based on our proposed method, we observed only a negligible difference in robustness after increasing the temperature by 20°C. The result presented in [43] also suggests that the temperature dependency of $t_{RCD}$ is stronger than the temperature dependency of $t_{RP}$.
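The Jaccard Index used in this comparison can be computed from two sets of failed-bit locations. The sketch below is illustrative: representing each failed bit as a (row, column) tuple and returning 1.0 when both sets are empty are assumptions made here, not details given in the paper.

```python
def jaccard_index(m1, m2):
    """Jaccard Index of two collections of failed-bit locations.

    Each location is any hashable identifier, e.g. a (row, column) tuple.
    """
    m1, m2 = set(m1), set(m2)
    union = m1 | m2
    if not union:
        # Assumed convention: two measurements with no failures are identical.
        return 1.0
    return len(m1 & m2) / len(union)
```

For example, two measurements that share two failed cells and each have one failure the other lacks give an index of 2/4 = 0.5; identical failure sets give 1.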

CONCLUSION
In this paper, we characterized DRAM cells based on data failure due to partially precharged bitlines in the read cycle. Throughout the experimental evaluation, we used a total of five commercial off-the-shelf 128MB DDR3 memory banks. We showed that most of the data latched at the sense amplifiers are independent of the original content of the memory cells when the bitlines are partially precharged. Based on this pattern independent data, we proposed LDPUF, which can be evaluated in a much shorter time (at least ∼4,300X shorter) than the fastest DPUF available to date. We also provided experimental evidence that our proposed method has less system interference and higher robustness against different operating conditions than other DRAM based PUFs. We conclude that our proposed partial precharge based DRAM PUF will provide much faster authentication to run-time applications with smaller system overhead. We also believe that the characterization information we provided can be used to generate other security primitives, such as random numbers and strong PUFs, and to expose the internal architecture of the DRAM itself.
Table 5. Jaccard Index at different operating conditions for LDPUF and retention based DPUF.