Fully-Digital Randomization Based Side-Channel Security — Towards Ultra-Low Cost-per-Security

In this paper we formulate and re-evaluate a recently proposed randomization-based side-channel protection mechanism. The strength of the construction lies in its compliance with standard digital design flows and in its security parameter, which links directly to side-channel security metrics. A detailed leakage model is provided and investigated for the first time, and it is linked to the electronic parameters of the randomization mechanism. We develop guidelines and optimizations for concrete ASIC constructions, and shed light on this ultra-low-cost leakage-randomization mechanism. The proposed circuit can naturally be utilized without, or on top of, the popular masking countermeasures. It is demonstrated to be considerably more efficient in terms of attack data-complexity than low-order masking (i.e., number of shares $d = 2$). In addition, it appears to be a necessary fit for increasing the noise when a too-low-noise environment is expected, a condition which impedes masking's theoretical security. Finally, we discuss how the proposed mechanism can naturally be embedded with masked designs for higher security levels ($d > 2$) while significantly lowering their asymptotically quadratic area price-tag as $d$ increases. Robustness results are provided along with post place & route cost estimations for both AES encryption and a more recently proposed permutation, such as the one used by ISAP. Our design efficiently provides an unprecedented three orders-of-magnitude signal-to-noise reduction with a total area overhead of 21% and 46% for AES and Ascon-ρ, respectively. These factors are more cost-efficient than low-order masked designs, and such mechanisms are sometimes necessary when the inherent noise is not sufficient. Moreover, the joint embedding of the proposed mechanism with masked designs can potentially improve the security level they provide exponentially, all whilst remaining an electronic-design-friendly security mechanism.


I. INTRODUCTION
Side-channel analysis (SCA) attacks enable distinguishing internal secret values manipulated by the hardware, by exploiting secret-dependent internal computations which affect some physically measurable quantities, denoted as leakage. Such attacks have repeatedly underscored the sensitivity of implemented cryptographic schemes. With this motivation, the National Institute of Standards and Technology (NIST) competitions for future symmetric-key schemes, e.g., Authenticated-Encryption [1], and public-key Post-Quantum schemes [2], consider SCA security an important factor. To date, various successful single leakage-trace SCA attacks have been shown possible for public-key encryption and digital signature schemes, on both hardware and software implementations (see e.g., [3]-[5]).
Side-channel protection by masking countermeasures bears asymptotically quadratic cost factors in the desired security level, or number of shares ($d$), dominated by vector multiplications [6]-[11]. Masking implementations are also quite expensive and complicated due to randomness handling (refreshes) and the amount of randomness required (generation) [11], [12]. However, provided that all inherent masking assumptions hold, the masking approach theoretically provides exponential security at "only" polynomial (quadratic) cost as $d$ increases. An important added value of embedding a circuit-level randomization mechanism, such as the one promoted in this manuscript, is that in many cases the inherent noise level is insufficient to comply with masking's statistical security bounds. For example, the evaluation in [13] shows software implementations whose insufficient noise levels impede masking from fulfilling its full potential. This implies that some underlying noise-embedding mechanism is a must even for standard edge/IoT devices, which manifest a rather low noise level. This is one of the motivations behind our goal: designing an ultra-low-cost leakage-randomization mechanism which is electronic design automation friendly.
Side-channel protection by noise addition is traditionally considered insufficient to provide efficient side-channel security. Moreover, such mechanisms are hard to link with concrete security parameters and metrics. It is naively assumed that physical noise is linearly expensive with the security level, i.e., that noise is proportional to area utilization in conventional micro-electronics.¹ The last challenge of noise-addition mechanisms is that non-conventional noise-addition solutions are hard to embed within standard design flows, or require special IPs even if efficient [15]-[18].
In [19], [20] it has been demonstrated, first, that it is possible to embed noise generation on power lines with ultra-low electronic cost, utilizing standard electronic design tools, and second, that such mechanisms can be added independently to very localized blocks, i.e., independently randomizing the leakage stemming from the manipulation of internal variables with a small number of bits. Our previously proposed technique is based on localized embedding of randomizers which impose a uniform distribution on the side-channel leakage. These tiny randomizers were implemented utilizing standard power-gates (PGs) with a unique sizing methodology which is reminiscent of a Binary Weighted Resistor DAC (Digital to Analog Converter). The relative cost of the countermeasure is very low, especially for small values of the security parameter (number of PGs). Therefore, it was discussed as a perfect match for emulating noise which is then amplified by countermeasures whose security is exponential in the noise level, i.e., masking. By doing so the overall cost can be significantly reduced owing to smaller masking orders.
In this paper we contribute in the following aspects: (1) we discuss how smart PG sizing tactics, relative to the inherent loads of the circuit, make it possible to reduce the cost of such countermeasures and make them more secure, and thus even more attractive, with or without masking on top; (2) we elaborate on how to optimize and set parameters for the countermeasure given a concrete security target; and (3) we show how to formally argue the achieved security level and perform security analysis. Finally, we provide a leakage model and support it with simulated ASIC measurements, while connecting cryptographic SCA-security metrics and projecting to advanced permutations such as Ascon-ρ (also utilized by ISAP).
¹ A similar argument holds for algorithmic noise [14].
The highlight achievement of the mechanism is that it enables concrete security levels: e.g., a three orders-of-magnitude signal-to-noise reduction with a total area overhead of 21% and 46% for AES and Ascon-ρ, respectively. More importantly, this security level is parametric and at the security architect's disposal, as discussed below. Theoretically, these factors are more cost-efficient than any low-order masked design, which cannot achieve such security levels in cases where, for example, the inherent noise is not sufficient. We underline that this is achieved with an electronic-design-friendly security mechanism, with neither IP-based design nor steps unsupported by the digital flow. Finally, the joint embedding of the proposed mechanism with masked designs can potentially improve the security level they provide exponentially, as the randomizers can operate at the level of the masked "shares".
Notably, our results follow physical evidence from a complex 65nm ASIC chip in [20]. In this manuscript, however, we analyze the correct utilization of the technique, supported by a model and an in-depth analysis of the security it provides.
Paper organization. The manuscript starts with a short background discussion, including a general perspective and some necessary reminders, in Section II. In Subsection II-A we provide a model for the randomizer's influence on the leakage, supported by general cost estimation and security optimization, while discussing security tradeoffs. In Section III we follow with a more detailed evaluation of the security tradeoffs, utilizing the SNR and the leakage distribution as the main evaluation tools. The modeling and optimization effort is followed in Section IV by extracted ASIC transient/noise simulation data, which enables evaluating the concrete security of the mechanism and, more concretely, the design parameters needed to achieve maximum security. Finally, in Section VII the main conclusions are listed along with directions for future work.

II. BACKGROUND
Side-channel protection poses a significant challenge for hardware designers. It is a topic of vast research interest within the Circuits and Systems (CAS) society, reflecting basic limitations of the available design techniques and circuits with respect to hardware security. It also reflects an inherent challenge for cryptographic designs and primitives which are seeded by parameters from the electronics. Typically, the needed parameters (noise, composability, leakage distribution and leakage independence) are (1) very slow to estimate by device-level simulations or noise simulations, and hard to argue otherwise, and (2) tied to security metrics which are not automated or supported by Electronic Design Automation (EDA) flows. Therefore, a vast interest is observed in security communities in general, and needs and requirements are already set by standardization organizations and schemes such as Common Criteria [21], NIST/FIPS [22], ISO/IEC 15408, ISO/IEC 17825 and BSI. However, regulations change more slowly than attacks appear, and typically some of the standards allow a rather low level of assurance and security in the context of (e.g.) side-channel attacks [23].
Side-channel leakages encompass information related to internal computations within the hardware. With respect to leakage-randomization mechanisms, it is understood that the desired modulated leakage should be uniformly distributed to provide maximum entropy. Otherwise stated, randomization mechanisms "stretch" the inherent noise, which in turn reduces the effective Signal-to-Noise ratio (SNR) observed by an adversary. The main trade-offs are clearly area and energy cost. The SCA literature is packed with (1) randomization mechanisms which are not natively standard for EDA flows (e.g., [16]-[18]), (2) randomization mechanisms which are not provided with a parametric security level (e.g., [15]-[18]), and (3) naive logical randomization by duplication of logic or PRNGs.
In [20] a randomization mechanism supported by a security parameter from the physical implementation was proposed. Similarly to a security parameter such as the number of key bits in cryptographic theory, the level of leakage randomization (i.e., the number of randomizer states, which affects the uniformity of the leakage distribution and its variance) is parametrized with a very area- and energy-efficient methodology. It enables designers to set this parameter to match an SCA-security need as a function of circuit parameters (e.g., the inherent resistances and loads of a technology). In [20] standard metrics were utilized to evaluate the SCA security of a leaky cryptographic primitive, namely the cryptographic SNR [24] and the Mutual Information (MI) [25]-[29]. To keep the discussion in this manuscript simple and the analysis comprehensive, without loss of generality, we stick with one metric, namely the SNR, which is faster to compute and easily linked to and compared with prior art.

A. MODEL -SECURITY AND COST
The low-cost local randomizer demonstrated in [20] is reminiscent of, and adapted from, the conventional device sizing in a Binary Weighted Resistor DAC (BWR-DAC), as schematically illustrated in Fig. 1a. Within this topology each of the $k + 1$ random input bits of the randomizer, $D$, selects a {$V_{dd}$, $G_{nd}$} connection for one resistor of a parallel resistor bank. The size of each resistor is proportional to a base-2 power series. Such weights distribute the output voltage uniformly across the full rail-to-rail voltage span. As illustrated in Fig. 1b, the main similarity of the proposed randomizer is the base-2 power-series sizing of the $k + 1$ power-gates in the bank. To keep the discussion simple we set the transistors' channel length to $L_{min}$, while their width increases as $W_i = 2^i \cdot W_{min}$. The global and local power lines, $V_{ddg}$ and $V_{ddl}$, are connected at the two ends of the bank. The transistors, controlled by the random input bits $r$, modulate the effective resistance of the network. Perhaps the main difference in our construction is the existence of another, parallel-connected Bias device. This difference is significant, as discussed in what follows. To guarantee a safeguard maximal resistance and to prevent power starvation of the supplied local logic blocks, the always-on Bias ($B$) is connected in parallel to the bank, illustrated with a blue background shading. The minimal and maximal effective resistances of this construction and their normalized equivalents (denoted by $N$) are:

$$R_{min} = \frac{\rho \cdot L_{min}}{W_{min} \cdot (B + S_k)}, \qquad R_{max} = \frac{\rho \cdot L_{min}}{W_{min} \cdot B}, \qquad R^{N}_{min} = \frac{1}{B + S_k}, \qquad R^{N}_{max} = \frac{1}{B},$$

with $S_k = \sum_{i=0}^{k} 2^i$ the total width of the bank and $B$ the Bias width, both in units of $W_{min}$, and $\rho$ being the device sheet resistance. The total area utilization of the construction is proportional to $B + S_k$, i.e., exponential in the number of levels or the number of random input bits of the construction.
We begin with a mathematical model of the mechanism's resistance. Considering Fig. 2b, the normalized resistance values of the construction are shown for all states of the bits $r[k:0]$ with $k = 4$, for different Bias values $\in \{2^5, \ldots, 2^9\}$. Clearly, when the Bias is small, the resistances, drawn from $1/(\mathrm{Bias} + \sum_{i=0}^{k} r[i] \cdot 2^i)$, take values which are not uniformly spread, as can be seen from the $\mathrm{Bias} = 2^5$ curve. The larger the Bias, the more the values are taken from the more linear section of a $1/(\alpha + x)$ curve, as shown by the curves for larger Biases (and illustrated in Fig. 2a). Setting the Bias value correctly enables a designer to control the distribution and set it to approximate a discrete uniform one. Considering Fig. 2a, a side effect is that as the Bias size increases, the span of the effective resistance values ($R$ on the Y axis) decreases, and so does the randomization variance. Fig. 2c shows the histograms of the modeled effective normalized resistance for a $k = 5$ scenario and several different Biases. It is possible to see that as the Bias increases in dimensions, the distribution becomes more uniform under this exemplary mathematical model. The zoomed-in subplot shows that $\mathrm{Bias} = 2^7 = 2 \cdot S_k$ or $\mathrm{Bias} = 4 \cdot S_k$ is quite sufficient to achieve a quasi-uniform distribution.
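The resistance model above can be explored numerically. The following is a minimal sketch, assuming the normalized form $1/(\mathrm{Bias} + x)$ with $x$ running over all $2^{k+1}$ randomizer states; the function names and the "step ratio" uniformity measure are illustrative choices and are not taken from the original evaluation.

```python
def normalized_resistances(k, bias):
    """Normalized effective resistance 1/(bias + x) for every randomizer state.

    x = sum_i r[i] * 2**i takes every integer value in 0 .. 2**(k+1) - 1
    when the k+1 random bits r[k:0] go through all their states.
    """
    return [1.0 / (bias + x) for x in range(2 ** (k + 1))]


def step_ratio(k, bias):
    """Largest gap between adjacent resistance levels divided by the smallest.

    Hypothetical uniformity measure: 1.0 would mean perfectly even spacing.
    """
    levels = normalized_resistances(k, bias)
    gaps = [levels[i] - levels[i + 1] for i in range(len(levels) - 1)]
    return max(gaps) / min(gaps)


if __name__ == "__main__":
    k = 4
    for exponent in range(5, 10):  # Bias in {2^5, ..., 2^9}
        bias = 2 ** exponent
        levels = normalized_resistances(k, bias)
        span = levels[0] - levels[-1]
        print(f"Bias=2^{exponent}: span={span:.3e}, step ratio={step_ratio(k, bias):.3f}")
```

Running the sweep illustrates both trends described in the text: the step ratio approaches 1 (more uniform spacing) as the Bias grows, while the span of resistance values shrinks.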
Assuming a designer is correctly utilizing the mechanism, i.e., samples are drawn from the quasi-uniform region, we can evaluate the (ideal) leakage-model distribution parameters. Next, we relate to the distribution of the leakage in the general case, regardless of the manipulated data, i.e., to the leakage variance due to the randomizer alone. The modeled distribution is that of a finite Gaussian mixture,

$$f(l) = \sum_i w_i \cdot \mathcal{N}(l; \mu_i, \sigma_i^2).$$

For simplicity, let us assume that $\forall i, \sigma_i = \sigma$; let us further assume $w_i = 1/(2^{k+1} + 1)$ (uniform random input assumption) and that $\mu_i = a + \frac{b - a}{2^{k+1} + 1} \cdot i$ (i.e., the BWR structure generates a uniform quantization of the span). Then naturally

$$\mu_{tot} = \sum_i w_i \mu_i \approx \frac{a + b}{2},$$

where $b$ and $a$ represent $R_{min}$ and $R_{max}$ from above. For the variance we can write

$$\sigma_{tot}^2 = \sigma^2 + \sum_i w_i (\mu_i - \mu_{tot})^2 \approx \sigma^2 + \frac{(b - a)^2}{12},$$

an exponential relation between $\sigma_{tot}$ and $k$. Or, alternatively put, $\sigma_{tot}$ is roughly proportional to $(R_{max} - R_{min})$. However, as discussed next, such an ideal distribution is hard to obtain in practice due to physical limitations, and for the relevant parameter span we roughly achieve a linear to low-order polynomial relation between $\sigma_{tot}$ and $k$. E.g., in the device-level simulation section below (Section IV), an approximately second-degree polynomial relation is observed.
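The idealized mixture model lends itself to a quick numerical check. The sketch below assumes equal weights, a common $\sigma$, and component indices $i = 0, \ldots, 2^{k+1}$ (the index range is an assumption, as are the numeric values of $a$, $b$ and $\sigma$); it compares the exact mixture variance with the continuous-uniform approximation $\sigma^2 + (b-a)^2/12$.

```python
def uniform_levels(a, b, k):
    """Mixture means mu_i = a + (b - a) / n * i for n = 2**(k+1) + 1 components.

    The index range i = 0 .. n-1 is an assumption made for this sketch.
    """
    n = 2 ** (k + 1) + 1
    return [a + (b - a) / n * i for i in range(n)]


def mixture_moments(means, sigma):
    """Mean and variance of an equal-weight Gaussian mixture with common sigma.

    The variance is the within-component variance (sigma**2) plus the
    variance of the component means.
    """
    n = len(means)
    mu_tot = sum(means) / n
    var_of_means = sum((m - mu_tot) ** 2 for m in means) / n
    return mu_tot, sigma ** 2 + var_of_means


if __name__ == "__main__":
    a, b, sigma = 1.0, 0.5, 0.01  # hypothetical R_max, R_min and noise level
    for k in range(2, 7):
        mu, var = mixture_moments(uniform_levels(a, b, k), sigma)
        approx = sigma ** 2 + (b - a) ** 2 / 12
        print(f"k={k}: mean={mu:.4f}, variance={var:.6f}, approximation={approx:.6f}")
```

For a fixed span the approximation tightens as $k$ grows, since the discrete levels fill the interval more densely.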
More generally, the developed model also corresponds to a leakage distribution with some manipulated data. The effective resistance from the power supply, $V_{ddg}$, to ground through a logic block is the series sum of the randomizer resistance and the logic-block resistance, $R + R_{logic}$. Assuming the logic-block resistance is proportional to the Hamming Weight (HW) of the data, and that additive noise exists, we can write $R + \alpha \cdot \mathrm{HW}(data) + N$. The leakage (i.e., the current) is then set by this total resistance, owing to the constant and stable global voltage $V_{ddg}$.
For simplicity we can assume that data manipulation changes only the mean of the leakage distribution. That is, $\Pr[l \mid data, r] = f(l; \mu_r + \alpha \cdot \mathrm{HW}(data), \sigma_i)$, where $r$ denotes the randomizer's state $\in \{0, \ldots, 2^k - 1\}$ and $\sigma_i$ reflects noise factors independent of the randomizer.
In order to evaluate the security level provided by the mechanism, it is possible to compute and evaluate the standard deviation ($\sigma_i$) of the samples. Fig. 3a shows the effective (total) standard deviation, $\sigma_{tot}$, versus the Bias dimensions for different $k$ values, simulated in Matlab. Constants were set to standard values derived from circuit simulations, i.e., the $\alpha$ and $R$ values, with Gaussian noise at $\mathrm{SNR} = 10^{-2}$. As discussed above in our more tentative explanation, for a fixed $k$ value, increasing the Bias reduces the computed $\sigma_{tot}$. However, the more interesting observation relates to Fig. 3b, where for each curve we set a different Bias and plot $\sigma_{tot}$ versus $k$. As expected, the security level of the mechanism increases as $k$ increases. However, in order for our uniform-distribution assumption to hold approximately we demand that $\mathrm{Bias} \geq 2 S_k$ with this exemplary set of model parameters. Nevertheless, we do note that for practical scenarios, parameter values and sizes, and for cases where this protection mechanism is combined with other approaches (such as masking), $k$ is a very effective security parameter, as only small values are required.
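The two trends discussed for Fig. 3 can be reproduced with a small closed-form sketch of the leakage model $l = R_r + \alpha \cdot \mathrm{HW}(data) + N$. The parameter values ($\alpha$, the noise level) are placeholders rather than the constants used in the original Matlab simulation, and the randomizer is assumed to take all $2^{k+1}$ states of its $k+1$ bits uniformly.

```python
def sigma_tot(k, bias, alpha=1e-5, sigma_noise=1e-5):
    """Total leakage standard deviation for l = R_r + alpha * HW(data) + N.

    R_r = 1/(bias + r) is the normalized randomizer resistance for state r,
    data is a uniform 8-bit value and N is zero-mean Gaussian noise. The
    three terms are independent, so their variances add.
    """
    levels = [1.0 / (bias + r) for r in range(2 ** (k + 1))]
    mean_r = sum(levels) / len(levels)
    var_r = sum((x - mean_r) ** 2 for x in levels) / len(levels)
    var_hw = 8 * 0.25  # variance of the Hamming weight of a uniform byte
    return (var_r + alpha ** 2 * var_hw + sigma_noise ** 2) ** 0.5


if __name__ == "__main__":
    print("sigma_tot versus Bias (k = 4):")
    for exponent in range(6, 10):
        print(f"  Bias=2^{exponent}: {sigma_tot(4, 2 ** exponent):.3e}")
    print("sigma_tot versus k (Bias = 2^8):")
    for k in range(2, 7):
        print(f"  k={k}: {sigma_tot(k, 2 ** 8):.3e}")
```

With these placeholder values, $\sigma_{tot}$ falls as the Bias grows for a fixed $k$ and rises with $k$ for a fixed Bias, matching the qualitative behaviour described above.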

III. LEAKAGE MODELING AND SNR EVALUATION
We continue in this section with leakage modeling and with computing a more 'standard' cryptographic SCA metric, the SNR [20], [24]. The SNR evaluates the univariate security level, and it is a good and sound estimator in the statistical sense, especially when the noise in the leakage is Gaussian. In our case the total noise is a modulation of Gaussian noise over a discrete uniform distribution (i.e., a mixture). The SNR, a metric commonly used for security evaluation, is still an indicative estimator of the security in our case. Moreover, a nice property of the SNR is that it directly indicates the level of informativeness in the leakage and is closely connected to the attack Success Rate and to (e.g.) correlation-based attacks, CPA [30], [31]. This is as opposed to detection-based approaches, such as T-test based ones, which only distinguish whether some information exists in a specific or a random scenario, regardless of its exploitation (various discussions appear in [32]-[34]). Therefore, this was the metric of choice here, without loss of generality regarding the results.
In this section we model the Hamming Weight leakages of an 8-bit secret variable, and then modulate these leakages by the effective resistance produced by the BWR-sizing based randomizer (Fig. 1b). All simulations are performed with a sample set of $10^7$ leakages, and in each leakage cycle the $k$ bits of the randomizer, with $k \in \{2, \ldots, 6\}$, are drawn uniformly at random. The level of the inherent physical noise, i.e., $\sigma_i\ \forall i$, takes a reasonable range of $\{10^{-3}, \ldots, 10^{1}\}$. For all scenarios the Bias was set to $2^{k+2}$ so as to maintain an approximately uniform resistance distribution, as discussed above. Fig. 4 shows the resulting SNR. All x- and y-axes in the figures are in log scale. Fig. 4a illustrates the achievable SNR versus $k$ for different noise levels, and Fig. 4b the SNR versus the noise level for different $k$ values. As shown, for a given noise level, the security increases linearly on a log-log scale with the security parameter. Clearly, this behaviour is easier to observe with small $k$ values. However, for large inherent noise levels it saturates, as the sample set of $10^7$ traces is not enough and, as explained above, the span of the distribution saturates. In addition, considering Fig. 4b, especially for a low block-inherent noise level, $\sigma_n$, the randomizer's effect is very significant, reducing the SNR from $10^{-2}$ with $k = 2$ (a low level of added security) to $0.5 \cdot 10^{-5}$ with only $k = 6$. In terms of implementation cost, these parameters ($k = 2$ to $6$) are quite negligible for practical scenarios, as discussed and demonstrated on a secured full AES test-chip in 65nm [20]; they occupy less than 25% of the total area.
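A scaled-down version of this experiment can be sketched as follows. The SNR is estimated as the variance of the per-value mean leakages divided by the mean of the per-value leakage variances. To keep the sketch short, the randomizer is idealized as a discrete uniform offset with a fixed, hypothetical step size (so the span grows with $k$), rather than the resistance model with $\mathrm{Bias} = 2^{k+2}$; `alpha`, `step` and the trace count are likewise placeholders.

```python
import random
from statistics import fmean, pvariance


def hamming_weight(x):
    """Number of set bits in x."""
    return bin(x).count("1")


def simulate_leakages(k, sigma, n_traces, alpha=1.0, step=1.0, seed=0):
    """Draw (data, leakage) pairs for an 8-bit variable behind a randomizer.

    Assumption: the randomizer adds step * state with state uniform in
    0 .. 2**k - 1, i.e., an idealized uniform quantization of the span.
    """
    rng = random.Random(seed)
    traces = []
    for _ in range(n_traces):
        data = rng.randrange(256)
        state = rng.randrange(2 ** k)
        leak = alpha * hamming_weight(data) + step * state + rng.gauss(0.0, sigma)
        traces.append((data, leak))
    return traces


def snr(traces):
    """Estimate Var_data(E[l | data]) / E_data(Var[l | data])."""
    by_value = {}
    for data, leak in traces:
        by_value.setdefault(data, []).append(leak)
    class_means = [fmean(v) for v in by_value.values()]
    class_vars = [pvariance(v) for v in by_value.values()]
    return pvariance(class_means) / fmean(class_vars)


if __name__ == "__main__":
    for k in range(2, 7):
        estimate = snr(simulate_leakages(k, sigma=0.1, n_traces=50_000))
        print(f"k={k}: estimated SNR = {estimate:.4f}")
```

With a limited number of traces the estimate for large $k$ is biased upwards, because the per-value means are themselves noisy; this is the same saturation effect the text attributes to an insufficient sample set.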

IV. SIMULATED ASIC-MODEL CORRESPONDENCE
In this section we link the mathematical model and the modeled security parameter to a model derived from an industrial Process Design-Kit (PDK) simulation environment. Our evaluation environment is seeded by physically extracted parameters; we next relate to Fig. 5a. We investigate here 65nm PDK devices with a Power-Management Kit (PMK) supporting the required PGs. We start with a $k = 4$ scenario (i.e., 5 parallel PGs) with the power-grid resistance and capacitance ($R_{ext}$, $C_{ext}$) and the internal power-delivery network capacitance of $V_{ddl}$ ($C_{int}$) evaluated post-extraction from the physical block, as illustrated in the figure. The underlying protected logic (Logic) is a simple synthesized 4-bit Sbox of the Present algorithm (results can easily be generalized to other Sboxes). The entire Logic block also incorporates input and output registers, another Sbox at the output reflecting a physical load, and a key addition at the inputs.
VOLUME 4, 2016
Considering Fig. 5b, in this evaluation environment the Logic load (both resistive and capacitive) is a technological parameter which is generally set by the technology provider. We denote by $R_{logic}$ the on-resistance of the logic block in a given state. We further relate to the mean resistance of the randomization mechanism ($R_{mean}$). This latter parameter is dominated by the Bias device.
Before we follow with experimental results, we list several conflicting effects which are induced by a reduction in $R_{mean}$, i.e., by an increase in the Bias size:
• Negative: It increases the average signal, or alternatively increases the voltage drop ($\Delta V$) over $R_{logic}$. This in turn increases the exploitable signal.
• Negative: It reduces the total randomized leakage variance, as discussed in the previous section.
• Positive: It increases the leakage uniformity, as discussed in the previous section.
• Positive: A physical/technological effect is that it increases both $C_{int}$ and $C_{ext}$, and therefore increases filtering effects, which generally lowers the exploitable signal.
Referring to Fig. 6a, we have performed a transient/transient-noise simulation of the aforementioned circuitry. The clock frequency was set to 500MHz (following Cadence Genus synthesis). The figure shows the local voltage ($V_{ddl}$) span, where the nominal value is 1.2V, on the left (blue) y-axis. The global current, driven through the mechanism to the main power supply ($V_{ddg}$), is also shown on each of the plots, corresponding to the right y-axis (orange). It is noteworthy that the maximal voltage drop is about 50mV, which is expected given that power-gating library devices are designed to drive large currents. In fact, these characteristics are, and should be, verified by design utilizing standard UPF/CPF design flows. In Fig. 6b the leakage distribution of the minimum current (in each clock cycle) is shown for each of these settings. The important and interesting aspect which we can observe is that as more and more bits are utilized by the randomizer, the leakage spreads further and becomes more and more uniform, as discussed above. However, a designer clearly needs to set $k$ such that the different leakage lobes overlap. Therefore, the baseline noise and the Bias size are important parameters.
A complementary view to the transient-noise simulation, from which the noise level can be computed, is the noiseless simulation. Transient-noise simulations of large blocks are compute-intensive; therefore, after evaluating the noise level, it is possible to compute the SNR from such noiseless leakages. In this case, the inherent noise level ($\sigma$) is set within the analysis tool (e.g., Python). Fig. 7a shows the maximum SNR achieved over time, $\max_t(\mathrm{SNR})$, with $5 \cdot 10^6$ traces (denoted by leakages). As the distributions indicate, the more randomization is consumed by the mechanism, the lower the SNR. Compared to a baseline curve (no protection), the traces from the randomization-disabled (all-open) design already provide a considerable SNR reduction, by a factor of about 8. At the other extreme, a more than two orders-of-magnitude SNR reduction is achieved with as little as $k = 5$. In this case the Bias size was set to $\times 2^0$, i.e., the same as the minimum device of the network. As discussed, following the transient-noise simulation of the circuitry it is easy to estimate the actual noise level, as indicated in the figure with an ellipse.
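The noiseless-trace workflow can be sketched as below: since the Gaussian noise is independent of the simulated leakage, its variance $\sigma^2$ can simply be added to the per-class variance at every time sample instead of being drawn at random. The function name and the toy traces are illustrative assumptions; real inputs would be the exported simulator waveforms.

```python
from statistics import fmean, pvariance


def max_snr_over_time(labels, traces, sigma):
    """Maximum over time samples of the SNR computed from noiseless traces.

    labels[i] is the processed data value of traces[i]; each trace is a list
    of leakage samples over time. The inherent noise level sigma is set by
    the analyst and added analytically to the per-class variance.
    """
    groups = {}
    for label, trace in zip(labels, traces):
        groups.setdefault(label, []).append(trace)
    best = 0.0
    for t in range(len(traces[0])):
        class_means = [fmean(tr[t] for tr in g) for g in groups.values()]
        class_vars = [pvariance([tr[t] for tr in g]) for g in groups.values()]
        signal = pvariance(class_means)
        noise = fmean(class_vars) + sigma ** 2
        best = max(best, signal / noise)
    return best


if __name__ == "__main__":
    # Toy data: sample 0 carries no information, sample 1 depends on the label.
    labels = [0, 0, 1, 1]
    traces = [[1.0, 0.0], [1.0, 2.0], [1.0, 4.0], [1.0, 6.0]]
    print(max_snr_over_time(labels, traces, sigma=1.0))
```

In the toy data the spread within each label stands in for the randomizer, so enlarging that spread (or `sigma`) lowers the reported maximum SNR.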
Another interesting point relates to the randomness throughput (RT), as illustrated by Fig. 7b: PGs, which are mainly built to power cores on and off, present a large input capacitance and are therefore slow to react to input changes. In this sense, if the RT is increased, the randomizer might not have enough time to settle on the new state, and in turn the effective randomness bandwidth is "cut". An alternative view is that the span of the distribution shrinks if the RT is larger than what the switching time of the PG allows. Therefore, per PDK, a designer will need to find the minimum refresh interval which enables correct operation. These values in fact exist within a standard PMK in the form of the on-to-off timing characteristics of the PGs. As shown in the figure, with an RT of 5 bits/12ns (i.e., fresh randomness in every sixth cycle), the mechanism operates as required and provides maximum security (minimum SNR). It is important to stress that reducing the RT further will, at some stage, reduce security, as an adversary will capture more consecutive traces at the same randomized state.
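The relation between the randomness throughput and the clock can be made explicit with a short calculation. The sketch below uses the 500MHz clock and the refresh intervals quoted in the text; the power-gate switching time passed to `rt_is_safe` is a hypothetical figure that would in practice be read from the PMK timing characterization.

```python
def refresh_interval_cycles(interval_ns, clock_mhz):
    """Number of clock cycles between two randomness refreshes."""
    return interval_ns * clock_mhz / 1000.0


def rt_is_safe(interval_ns, pg_switch_time_ns):
    """True when the power-gates can settle before the next refresh arrives."""
    return interval_ns >= pg_switch_time_ns


if __name__ == "__main__":
    clock_mhz = 500  # 2 ns clock period
    for interval_ns in (2, 12):
        cycles = refresh_interval_cycles(interval_ns, clock_mhz)
        # 10 ns is a placeholder switching time, not a PDK value.
        ok = rt_is_safe(interval_ns, pg_switch_time_ns=10)
        print(f"refresh every {interval_ns} ns = {cycles:g} cycle(s), settles: {ok}")
```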
Finally, we evaluate the effect of Bias up-sizing. Fig. 8a shows the designs with the different Bias sizes (from the top to the bottom plot), whereas Fig. 8b shows the corresponding SNR levels of the different designs. In this example the RT was set to 5 bits/2ns. As discussed above, though theoretically we would like to increase the Bias to maintain a perfectly uniform distribution, for concrete technological parameters its negative effects outweigh the positive ones: the total leakage variance is reduced and, generally, the leakage signal increases. The significance of these effects is clearly observed in Fig. 8b, where the blue circle-denoted curve with the smallest Bias size provides the minimum SNR.

V. COST VS. SECURITY PARAMETER
Generally, the area-utilization cost factor of the masking countermeasure is in the range of $d$ to $d^2$ [6]-[12]; it depends on the implementation, on the level of serialization/parallelism, and on the ratio between the linear and non-linear Boolean gates used to represent the algorithm. It also depends on how efficiently refreshes and masked multiplication gadgets are implemented. While the complexity of multiplication gates is of order $d^2$, the best known randomness complexity ranges between $\lceil d^2/4 \rceil$ and $\lfloor d(d-1)/2 \rfloor$. Simply put, even the best masking design of the lowest security order, $d = 2$, will not cost less than 200% in area utilization, and the cost increases steeply with $d$.
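The ranges quoted above can be tabulated for small masking orders with a few lines of code. This is only an arithmetic illustration of the stated bounds; the function name is hypothetical and no implementation-specific constants are modeled.

```python
def masking_cost_ranges(d):
    """Return ((area_low, area_high), (rand_low, rand_high)) for d shares.

    Area factor ranges from d to d**2; the randomness complexity of a
    masked multiplication ranges from ceil(d**2 / 4) to floor(d*(d-1)/2).
    """
    area = (d, d * d)
    rand_low = (d * d + 3) // 4   # integer form of ceil(d**2 / 4)
    rand_high = d * (d - 1) // 2  # d*(d-1) is always even
    return area, (rand_low, rand_high)


if __name__ == "__main__":
    for d in range(2, 7):
        area, randomness = masking_cost_ranges(d)
        print(f"d={d}: area factor {area[0]}..{area[1]}, "
              f"randomness {randomness[0]}..{randomness[1]}")
```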
On the other hand, the proposed randomization technique provides exceptionally low area-utilization figures with a significant (and parametric) security level. That is, with only K=3 it provides a three orders-of-magnitude SNR reduction at a cost of ∼20% increment in area utilization, or, for a higher security level with K=5, only 36% (further lowering the SNR), for a fully parallel AES as illustrated in Table 1. The table lists the different cost factors associated with two exemplary symmetric ciphers, the AES and Ascon-ρ⁴, both of 128 bits. The AES was partitioned into power domains (PDs) corresponding naturally to its 8-bit internal variables, taking up 30 PDs for a fully parallel round-based implementation. The efficient Sbox representation used was Canright's composite tower-field based one, $GF(((2^2)^2)^2)$ [36]. Taking the same tactic for the ISAP algorithm, we have synthesized a fully parallel round of the Ascon-ρ permutation while grouping two Keccak Sboxes together per PD [37]. Area-utilization values are listed in the table for: (1) the vanilla (no protection) synthesized and placed-and-routed designs; (2) the partitioned designs, with the per-PD added overheads (e.g., spacing for the power rings encircling the PDs) and the cost of the randomization mechanism, randomness storage and power-gating; and (3) the total overheads for each of the blocks, accounting for the additions of the PDs and of the randomization mechanism for all PDs in a block. I.e., the AES required about 30 PDs and the Ascon permutation required about 40 PDs. As shown, the overheads are computed in the last rows of the table for different K values. In fact, the AES design with K=3 was manufactured and tested (as demonstrated in [20]) and is denoted by a * in the table. This design provided SNR levels even lower than the modeled/simulated values, which we refer to here as upper bounds.
⁴ Instantiated also by ISAP-A [35].
This is natural, as the analysis performed here was quite conservative; i.e., it only accounted for a single block, without algorithmic noise or other noise factors which are present in a complete system, and did not consider measurement-equipment limitations (noise and resolution). For K=5, which provides (pessimistic) SNR values down to $10^{-5}$, we list an area overhead of just 36%. For ISAP and K=5, the area overhead increases to 79%, owing to the very small Keccak Sboxes as compared to the AES ones. As an example, Fig. 9 illustrates a Cadence Innovus pre-placement power-grid layout for 8 PDs, illustrating the area overheads and area utilization required for a k=4 randomization mechanism.

FIGURE 9: Illustration: exemplary power grid layout configuration (partial); fully automated and CPF/UPF flows supported.

TABLE 1: Area utilization cost factors.
Cost factor                                  Vanilla  Partitioned  K=1    K=2    K=3    K=4    K=5    K=6
Keccak Sbox                                  2·17     2·17         2·17   2·17   2·17   2·17   2·17   2·17
8bit AES Sbox                                120      120          120    120    120    120    120    120
Per power-domain (PD) overhead               0        7            7      7      7      7      7      7
Randomization mechanism                      0        0            5      8      14     21     29     41
Total randomization and PD overhead          0        210          360    450    630    840    1080   1440
Total area AES (fully parallel rounded)      3000     3210         3360   3450   3630   3840   4080   4440
Total area ISAP-A 128b (Ascon-ρ) fully       …

Throughout the analysis performed we have evaluated the robustness of our design in simulation at various supply voltages (ranging from 0.9 V to the nominal 1.2 V) and at temperatures in the range of -20 °C to 70 °C. The trends obtained were fully anticipated by the power-gates' .lib characterization: owing to the dimensions of the PG cells, they are designed to provide minimal IR drops, which can be simulated and accounted for by design, e.g., by adapting the timing requirements according to the randomizer's worst case, which induces the largest propagation delay. As one example, Fig. 10 shows the local voltage and the current flowing to a PD while operating with different randomizer states.
We have computed the worst-case voltage drops ($\Delta V_{DD}$) at different process design corners (i.e., slow-slow SS, typical-typical TT, fast-fast FF, and all combinations) in a Monte-Carlo run representing the $6\sigma$ distribution point. Generally, the Monte-Carlo {SS, TT, FF} corresponding values are lower than the ones illustrated in the figure. The maximum voltage drop of 65mV was captured at the SS corner, with 60mV for the FF corner and about 40mV for the TT one. These values pinpoint the robustness of the mechanism in maintaining a relatively low IR drop. Such drops only force us to select different .lib files for the timing analysis, with a maximum change of 100mV in characterization, which is supported by design for the standard cells and implies small to moderate timing changes. All these results support the verifiability and EDA applicability of such a methodology.
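The AES overhead percentages quoted in this section follow directly from the per-PD figures of Table 1. The sketch below redoes that arithmetic; the association of the table columns with K=1..6 is an inference from the percentages stated in the text (21% for K=3, 36% for K=5, 48% for K=6), and the constant names are illustrative.

```python
AES_VANILLA_AREA = 3000   # total area of the unprotected fully parallel AES round
AES_NUM_PDS = 30          # number of power domains in the AES partition
PD_OVERHEAD = 7           # per-PD overhead (e.g., power-ring spacing)
RANDOMIZER_AREA = {1: 5, 2: 8, 3: 14, 4: 21, 5: 29, 6: 41}  # per PD, by K (inferred)


def aes_overhead_percent(k):
    """Relative area overhead of the partitioned and randomized AES, in percent."""
    added = AES_NUM_PDS * (PD_OVERHEAD + RANDOMIZER_AREA[k])
    return 100 * added / AES_VANILLA_AREA


if __name__ == "__main__":
    for k in sorted(RANDOMIZER_AREA):
        total = AES_VANILLA_AREA + AES_NUM_PDS * (PD_OVERHEAD + RANDOMIZER_AREA[k])
        print(f"K={k}: total area {total}, overhead {aes_overhead_percent(k):.0f}%")
```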

VII. CONCLUSIONS AND FUTURE-WORK
Side-channel analysis (SCA) attacks have repeatedly underscored the sensitivity of implemented cryptographic schemes. This has launched massive efforts by the National Institute of Standards and Technology (NIST) towards both secured Lightweight Authenticated-Encryption and public-key Post-Quantum schemes, as well as efforts towards evaluation metrics and criteria and the availability of security-embedding design tools. Security solutions which are seeded by security parameters derived directly from the hardware should naturally be far more cost-effective than mathematical-only solutions. In this research we exemplify such a scenario, which can significantly reduce the price-tag of SCA-secure designs. I.e., generally, a three orders-of-magnitude SNR reduction increases the adversary's data complexity by the same factor. If the proposed mechanism is utilized alone, its hardware overheads are small: for k values of up to 6 we have witnessed an area cost of at most 48% of the entire area, which is much lower than that of any masking-based countermeasure with the minimal security order (d=2). In addition, a d=2 masking can theoretically provide a data complexity which is the d'th power of $\mathrm{SNR}^{-1}$ (or of the noise), implying that the proposed mechanism can be more efficient stand-alone than masking at low orders. If higher security levels are required, and e.g. if masking is used simultaneously, such a factor can quadratically reduce the area/energy cost of the entire system, as the masking order (d) can be linearly reduced with the SNR decrease. Another important aspect is that in low-noise scenarios some underlying noise-embedding mechanism is in any case a must for masked designs, as demonstrated in [13], pinpointing the importance of the proposed mechanism.
Therefore, the proposed design supports our objective of demonstrating a fully-digital randomization-based SCA security mechanism which provides a state-of-the-art cost-per-security in the class of EDA-supported and security-modelled solutions.
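The data-complexity argument of the conclusions can be illustrated with a rough order-of-magnitude calculation. The sketch assumes the idealized scaling in which the attack data complexity grows as $(\mathrm{SNR}^{-1})^d$ for $d$ shares, ignoring all constants; the baseline SNR of $10^{-2}$ is a placeholder.

```python
def data_complexity(snr, d=1):
    """Order-of-magnitude attack data complexity, proportional to (1/snr)**d.

    d = 1 models an unmasked design; constants are ignored (assumption).
    """
    return (1.0 / snr) ** d


if __name__ == "__main__":
    baseline_snr = 1e-2                  # hypothetical unprotected SNR
    randomized_snr = baseline_snr / 1e3  # three orders-of-magnitude reduction
    print(f"unprotected:          {data_complexity(baseline_snr):.1e}")
    print(f"masking only (d=2):   {data_complexity(baseline_snr, d=2):.1e}")
    print(f"randomizer only:      {data_complexity(randomized_snr):.1e}")
    print(f"randomizer and d=2:   {data_complexity(randomized_snr, d=2):.1e}")
```

Under these assumptions the stand-alone randomizer outperforms $d = 2$ masking whenever the baseline $\mathrm{SNR}^{-1}$ is below the randomizer's SNR-reduction factor, and the combination compounds both gains.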
A natural direction for future work would be to evaluate the proposed approach embedded on top of, or alongside, masked circuitry, and to tailor different combining apparatus to reduce the cost of SCA protection more efficiently. Moreover, an important direction would be to provide improved tools on top of commercial PDKs to enable faster integration as described in this paper. As such, even though the approach presented here can easily be embedded by any experienced engineer in the field, our goal would be to open-source improved parsing and embedding flows which are easily automated at the RTL level and for place & route tools.