Introduction
Applications such as speech analysis, image capture and compression, real-world data sensing, and multimedia processing admit relative and approximate notions of accuracy. That is, the level of output error that can be tolerated without any tangible effect on perceived quality depends mainly on human perception. This mode of operation opens a wide and innovative design space for the hardware implementations of the underlying applications [1]–[5], trading relaxed accuracy for savings in alternative metrics such as energy. Energy-quality scalable designs thereby surface and capture the essence of approximate computing approaches. In contrast to relaxed hardware approaches [6], [7], where layers of software and hardware support for fault recovery are added to ensure correct operation, approximate computing techniques tolerate the generated errors and forgo the overhead of error-correcting schemes.
Different design levels can be investigated in the realm of approximate computing. Application- and algorithm-level techniques include skipping computations and relaxing global synchronization and communication [8]. At the architectural level, dedicated accelerators and approximate programmable processors are among the techniques used [9]. In terms of storage elements, particularly with emerging technologies, approximate memories are tackled from the perspective of associative memories and ternary content addressable memories; such architectures, built on resistive memories, have been used to accelerate GPUs and online learning frameworks [10], [11]. An alternative approach to approximate memory design builds on dynamic management of bit-cell allocation for the energy-quality trade-off. At the circuit level, intriguing insights are gained with respect to hardware implementations and the attainable savings [12], [13]. In this regard, two broad concepts are mainly applied. The first scales the driving voltage, thereby reducing the overall energy consumption [14], [15]. The second reduces the number of transistor elements and redesigns the circuit blocks accordingly [16].
These circuit design techniques are confined within certain limits on transistor sizing, element count, and voltage levels to sustain deterministic operation of the underlying components. Nonetheless, at 32nm and below, the attributes of Moore’s law, which have long steered the needs and requirements of chip and hardware design [17], [18], no longer apply, owing to the physical limits on transistor scaling [19]. The nanoscale dimensions of the transistors amplify sub-threshold leakage currents and atomic-scale interactions, thereby degrading the reliability and usability of these components [20]. As a result, aside from the static variation that originates in the manufacturing process, dynamic variability, with operational variations over time, is becoming an increasing concern and a crucial issue for circuit operation. The internal mechanisms are affected by stochastic ionic effects involving state variables, leading to output variations [21], [22].
The variability of transistor devices has traditionally been regarded as a source of concern for hardware designs, where correcting schemes had to be applied to ensure accuracy [19], [23]. However, building on this intrinsic variability, and treating it as a means of performance shaping rather than an impediment to correct operation, is the basis of the analysis presented in this paper. Furthermore, this paper explores the resulting design space and the improved energy efficiency attainable in the presence of transistor variability.
The variable device characteristics are modeled within a SPICE environment, offering a simple means of injecting temporal variability. The stochastic transistor is then used as the core building block for approximate arithmetic applications. This operational mode is highly useful for error-tolerant applications, which constitute a major part of Internet of Things (IoT) operations; in particular, it targets devices that handle real physical signals, such as wireless sensor nodes. Moreover, integrating the transistor variability into the circuit-level simulator allows emulation of circuits subject to aggressive scaling. All in all, this study provides the design framework and the voltage scaling limits for the energy-quality scalability of resilient processing circuits, offering an alternative approximation method for approximate computing circuits built on unreliable elements. Accordingly, four major contributions are presented in this paper:
A dynamic, time-dependent transistor model that uses thermal noise to characterize the stochastic behavior within the transistor; this scheme can easily be adopted into larger circuit simulations and extended to various statistical distributions;
An approximate computing circuit design based on stochastic components, demonstrated through the analysis of N-bit adders;
An efficiency analysis of the proposed solution with respect to transistor size, operating frequency, energy and delay, process corner, and temperature; and
An image compression application with approximate arithmetic blocks for energy-efficient operation, demonstrated in full SPICE.
The rest of the paper is organized as follows. Section II discusses the physical origins of the inherent time-dependent variability of the transistor, along with the modeling principles used for the circuit simulations. Next, Section III elaborates on the implications for the approximate computing realm and explores the effect on logic operators. Section IV further investigates arithmetic blocks with varying numbers of bits using simulations and explores possible trade-offs involving accuracy, energy saving, technology node, delay, process corner, and temperature. Using full SPICE verification, Section V demonstrates image compression with approximate adders and discusses the effects on performance. Finally, the conclusion summarizes and remarks on the presented principle.
Stochastic Transistor Model
The miniaturization of transistor sizes will force the underlying physical characteristics to have a more prominent effect on the output behavior. Static and dynamic forms of variability will surface, affecting the reliability and hastening the degradation of the corresponding devices [19].
With random dopant fluctuation (RDF) playing a crucial role in the static variations, dynamic variability is dominated mainly by quantum-level effects imposed by temperature and voltage operating conditions [24]. The direct impact of this temporal variability is reflected in the stability of the transistor’s threshold voltage ($V_{th}$).
A. Threshold Variability
The cumulative and singular effects of Bias Temperature Instability (BTI) [25], [26], Hot Carrier Injection (HCI) [27], and Random Telegraph Signal (RTS) [28] are forms of dynamic stochasticity that translate into temporal variations in the threshold voltage of the transistor [29]. For a given gate voltage, the switching event of the transistor is therefore probabilistic. The preset bounds of the transistor’s regions of operation are no longer tight or deterministic; instead, they vary temporally with the threshold voltage value at each instant in time [30].
To assess the effect of these variations on the performance and energy efficiency of circuits, a transistor model that includes this stochastic behavior needs to be established. Ideally, the physical equations governing transistor operation would be altered to include the effects of noise and the non-idealities present in the device. Such a technique closely captures the experimental behavior but adds to the complexity of the model and of the simulation process. In [31], Monte Carlo, 3D scaling, and transient noise simulations are used within the SPICE environment. However, these techniques require a large number of simulations and substantial computational resources to capture the dynamic effects of variability. This paper introduces a model that incorporates the physical variability in a statistical manner, mainly by adding a noise source at the gate input so that the device behaves probabilistically [32], [33]. The oversampling scheme allows the simulator to capture most of the variations during a single period and provides enough data points for the analysis. Hence, the paper presents three major concepts for injecting the variation into the transistor models and integrating them easily within circuit simulation platforms:
The physical variation effects can be summed up into the temporal threshold voltage variability;
The modeling of the variation is implemented by adding a thermal noise to the gate voltage; and
The added temporal variability ensures enough data points within a single transient simulation run.
Figure 1a shows the proposed stochastic transistor model with the added noise source at the gate input. The thermal noise is characterized by its standard deviation, with the mean-square noise voltage calculated as follows:\begin{equation*} \overline v_{n}^{2} =4k_{B}TRB\tag{1}\end{equation*} where $k_{B}$ is Boltzmann’s constant, $T$ the temperature, $R$ the resistance, and $B$ the bandwidth.
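For reference, a short Python sketch of Eq. (1) follows; the resistance and bandwidth values are illustrative placeholders, not parameters taken from the paper's simulations.

# Thermal (Johnson-Nyquist) noise magnitude from Eq. (1): v_n^2 = 4*kB*T*R*B.
# Illustrative sketch; R and B below are placeholder values.
import math

K_B = 1.380649e-23  # Boltzmann constant [J/K]

def thermal_noise_sigma(temperature_k: float, resistance_ohm: float,
                        bandwidth_hz: float) -> float:
    """RMS thermal noise voltage, i.e., the standard deviation of the
    zero-mean Gaussian noise added to the gate voltage."""
    return math.sqrt(4 * K_B * temperature_k * resistance_ohm * bandwidth_hz)

# Example: a 100 kOhm noisy resistor over a 1 GHz bandwidth at room temperature.
sigma = thermal_noise_sigma(300.0, 100e3, 1e9)
print(f"sigma = {sigma * 1e3:.2f} mV")  # ~1.29 mV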
(a) Stochastic transistor model with added variability to the gate voltage. (b) The PDF of the threshold voltage variation for a Gaussian distribution.
B. Stochastic Distributions
The natural phenomena affecting the behavior of the transistor elements can be fitted to different distributions. Several studies have addressed this dynamic variability and provided models for the underlying noise and instability effects. A separate analysis was conducted for each contributing component, building on extracted experimental measurements. The corresponding atomic/ionic-scale stochastic processes are mainly fitted to the Lognormal distribution, as is the case for Negative Bias Temperature Instability (NBTI) [26]. Alternatively, for the abrupt variations imposed by random telegraph signals, Uniform, Exponential, and Poisson distributions [35] are the more commonly used approximations. When all of the factors affecting the variation of the threshold voltage are taken into consideration, a general multivariate model was proposed in [34]; moreover, an approximation was provided for the threshold voltage to follow a Normal distribution with a given standard deviation ($\sigma$).
Approximate Computing
Conventionally, having transistors behave in a non-deterministic manner is considered a drawback to circuit performance. Costly error-correcting schemes are applied to counter the induced deviations from the original operation. Increasing the supply voltage is also considered, to a certain extent, to overcome the noise effects and ensure correct output. However, with aggressive scaling, variability is now an inevitable feature that needs to be addressed through unconventional concepts [36]. Approximate computing benefits from the probabilistic behavior of the underlying circuitry to shape the performance of error-tolerant applications. Complete accuracy of the desired output is not considered a priority but rather a complementary feature, depending on the available resources [14], [37]. This section proposes an alternative approach to approximate computing based on the stochastic operation of the underlying transistors. The conventional design structure of the logical operators is kept intact, while the transistors act in a non-deterministic manner. To verify the operation principles and the achievable gains, a 20nm predictive technology model (PTM) is used for circuit-level simulations [38] with Cadence Spectre at an operating frequency of 500MHz and a nominal voltage of 0.9V. The impact on Boolean operators is discussed with emphasis on the inverter and the different logic gates, where each circuit output drives a fan-out of 4 to emulate its behavior within full circuits.
Stochasticity Transformation to Alternative Probability Distributions
.Param Ka = 1.38e-23, T = 300, Rg = <resistance value>, Bg = <bandwidth>, pi = 3.14, lambda = 2
.Param var = '4*Ka*T*Rg*Bg'
**** Noisy resistor: generates Gaussian thermal noise on node n
Rn n 0 Rg noiseon=yes
**** Normalizing the Gaussian distribution values (node 1 carries a standard normal)
Enorm 1 0 value = 'V(n)/sqrt(var)'
**** Transforming to Lognormal distribution
Elognormal 2 0 value = 'exp(V(1))'
**** Transforming to Uniform distribution (Polya approximation of the normal CDF)
Euniform 3 0 value = '1-sqrt(1-exp(-2*V(1)^2/pi))'
**** Transforming to Exponential distribution (inverse CDF of the uniform sample)
Eexponential 4 0 value = '-(1/lambda)*ln(1-V(3))'
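For readers reproducing these transformations outside a SPICE environment, the following numpy sketch mirrors the same chain under the same CDF approximation. The variable names are ours, and lam plays the role of the netlist parameter lambda.

# Mirror of the netlist above: starting from standard-normal samples,
# derive lognormal, uniform, and exponential variates.
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0

normal = rng.standard_normal(100_000)       # normalized Gaussian, node 1
lognormal = np.exp(normal)                  # node 2
# Polya approximation: |2*Phi(x)-1| ~ sqrt(1-exp(-2x^2/pi)), so the
# expression below is approximately Uniform(0, 1).
uniform = 1 - np.sqrt(1 - np.exp(-2 * normal**2 / np.pi))   # node 3
exponential = -(1 / lam) * np.log(1 - uniform)              # node 4

print(uniform.mean(), exponential.mean())   # ~0.5 and ~1/lam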
A. Stochastic Inverter
The inverter is composed of two vertically stacked transistors, a PMOS and an NMOS. When the nominal voltage is applied, the inverter switches reliably; at scaled input voltage levels, the stochastic threshold variation makes the output increasingly probabilistic.
The input and output signals of the stochastic inverter at different input voltage levels.
The accuracy of the logic operation is specified as the number of correct samples for a digital bit divided by the total number of samples. The digital value of the output voltage is determined as follows:\begin{align*} Digital~Value=\begin{cases} 0, & if~V_{out}< 0.1 \times V_{DD} \\ 1, & if~V_{out}> 0.9 \times V_{DD} \\ x, & if~0.1 \times V_{DD}< V_{out}< 0.9 \times V_{DD} \\ \end{cases} \\ {}\tag{2}\end{align*}
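As an illustration, the digitization rule of (2) and the resulting accuracy measure can be expressed as the following hypothetical helper functions; they are not part of the paper's simulation flow.

# Digitization rule of Eq. (2): classify an analog output sample as a
# logic '0', '1', or indeterminate 'x'.
def digital_value(v_out: float, v_dd: float) -> str:
    if v_out < 0.1 * v_dd:
        return "0"
    if v_out > 0.9 * v_dd:
        return "1"
    return "x"

def accuracy(samples, expected_bit: str, v_dd: float) -> float:
    """Fraction of samples that resolve to the expected digital bit."""
    return sum(digital_value(v, v_dd) == expected_bit for v in samples) / len(samples)

print(accuracy([0.02, 0.85, 0.88, 0.05], "1", v_dd=0.9))  # 0.5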
The output accuracy of the inverter in terms of producing an accurate ‘0’ and ‘1’, along with the average performance.
B. Logic Gates
Extending the stochastic operation into the logic domain, the stochastic transistor devices are used to build logic operators. In this mode, the effect of added cascades is studied with respect to the expected output accuracy in a similar fashion to the inverter. Table I depicts the circuit structures for the main Boolean operators. Moreover, the accuracy of the output behavior is analyzed in terms of the percentage of having a correct ‘0’, correct ‘1’, and overall expected correct output for all the different combinations of the truth table.
In general, the output accuracy of the gates is similar to that of the inverter. However, a slight degradation is encountered, with the minimum average accuracy reaching around 80%. Hence, the added number of cascaded blocks has only a minor effect on the output performance. The logic operation itself, however, does have an impact on the output accuracy. For instance, the AND gate shows the lowest accuracy for the bit ‘1’, arguably because the gate is biased toward producing zeros, as is clear from the truth-table entries for this operation. Similarly, the OR operator shows the lowest accuracy for the bit ‘0’, which appears only once in its truth table. This interesting feature needs to be further investigated with larger arithmetic blocks, to determine whether the degradation is propagated or suppressed by the particular structure.
C. Storage Elements
The transistor is the constituent component of storage elements such as the SR latch [39], a level-sensitive positive latch. Hence, the stochasticity inherent in the transistor operation and the scaling of the input voltage both have a large impact on these non-static structures. The primary impact of the variation is reflected in the failure rate and in the delay until correct data settles within the storage cells. Figure 4 presents the internal structure of the latch and the corresponding interconnections; Figure 5 depicts the simulation of the latch under different operating frequencies. The accuracy of the output and the propagation delay are the parameters affected by the scaling of the voltage and frequency. Nevertheless, the latch could achieve almost 100% accuracy at a voltage as low as 0.4V, corresponding to almost half the nominal value, with a propagation delay of 10 ps at an operating frequency of 1 GHz.
The structure of the latch circuit [39] composed of stochastic transistor elements.
The impact of the transistor variability on the operation of the latch. Measures of delay and accuracy under different operating frequencies are highlighted.
Hence, the transistor variability allows for more efficient operation of the storage element, in particular where complete accuracy is not a paramount requirement, as in the case of real-world signal processing applications.
Approximate Adder
Adders are the principal building blocks of arithmetic operations, and their reliability and accuracy profoundly affect both simple computations and more complex processing [40]. Several approaches to the implementation of approximate adders, along with other arithmetic blocks, are evaluated and classified in [16] and [41]–[43]. In particular, comparisons of error behavior, circuit-level characterizations, and discussions of image processing applications are reviewed across various designs in [43]. In this work, however, our primary purpose is to investigate the impact of device variations, so we have selected the standard, optimized ripple-carry structure for the multi-bit adder. This section studies and quantifies the performance of approximate adders. Starting with a detailed analysis of a single-bit adder, it presents the accuracy of the sum and carry generation blocks. More reflective error-quantifying metrics are then used to assess the behavior of N-bit adders up to 16 bits: the error distance, mean error distance, relative error distance, and mean relative error distance are calculated for the different adders [44], [45].
A. Full Adder
The mirror adder circuitry is adopted for improved carry generation [39]. Most importantly, the dimensions of the carry-block transistors are set to ensure more optimized generation of the output carry bit. Moreover, far fewer transistors are used for generating the carry bit than for the sum, as depicted in Figure 6. These features make the carry bit more stable and less susceptible to errors.
The structure of the mirror adder circuit [39] composed of stochastic transistor elements. Transistor sizes are configured to ensure optimized carry generation.
The expected sum value for the 1-bit adder reaches up to 3. Hence, any error within the generated sum or carry bit is reflected in a substantial change in the overall output value. The performance of the 1-bit adder can, therefore, be portrayed through the accuracy of these bits. Figure 7 shows the corresponding accuracy values for all input combinations for the high and low bits, respectively. As expected, the added stochasticity has a more significant impact on the sum bit than on the carry bit, owing to the nature of the structure used. Nevertheless, considerably reliable operation is attained even at low voltage levels: the accuracy reaches a minimum of 70% at voltages as low as 200mV.
The output accuracy for the sum and carry bits of the 1-bit full adder. The better output characteristics of the carry bit are due to the structure used to optimize the carry generation.
B. N-Bit Adders
N-bit operation is needed to perform computations and processing operations. Hence, investigating a higher number of bits for the adder offers more insight into the applicability of the approximate computing approach, particularly in the logic domain. A ripple-carry adder (RCA), composed of cascaded 1-bit full adders with the carry propagating between consecutive blocks, is used for the analysis. This structure is chosen because it exposes the effect of the probabilistic carry-in bit on the subsequent full adder blocks, and consequently on the overall output value; a behavioral sketch of this error propagation follows the block diagram below. Figure 8 shows the block diagram of the N-bit ripple carry adder.
The block diagram of the N-bit adder with the propagation of the carry among the individual 1-bit adders.
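To build intuition for how carry errors propagate through the cascade, the following behavioral sketch abstracts the transistor-level stochasticity into an independent per-bit flip probability p. This is a simplification for illustration only, not the paper's SPICE-level simulation flow, and p is a hypothetical stand-in for the voltage-dependent error rate.

# Behavioral N-bit ripple-carry adder with stochastic sum/carry bits:
# each generated bit flips independently with probability p.
import random

def noisy_rca(a: int, b: int, n_bits: int, p: float, rng: random.Random) -> int:
    carry = 0
    result = 0
    for i in range(n_bits):
        x = (a >> i) & 1
        y = (b >> i) & 1
        s = x ^ y ^ carry                       # exact sum bit
        carry = (x & y) | (carry & (x ^ y))     # exact carry-out bit
        if rng.random() < p:                    # stochastic sum bit
            s ^= 1
        if rng.random() < p:                    # stochastic carry bit
            carry ^= 1
        result |= s << i
    return result | (carry << n_bits)

rng = random.Random(42)
# Exact result is 17; with this seed one carry bit flips and ripples
# through the remaining stages, giving 19.
print(noisy_rca(11, 6, 4, p=0.05, rng=rng))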
To quantify the effect of the transistor variability on the adder’s output behavior, the 2-bit, 4-bit, 8-bit, and 16-bit adders are simulated using Cadence Spectre. The accuracy metric used thus far can efficiently reflect the correctness of operations involving a single-bit output, as it assesses whether the bit is correct or not. With a larger number of bits in the addition, however, accuracy alone does not provide enough information on the effect of the probabilistic behavior. An error in any of the bits within the output sum reduces the accuracy, whereas in the binary domain different weights apply depending on the location of the bits: the least significant bit has a lower impact on the sum than the most significant bit. This divergence grows as more bits are used in the calculation. Therefore, a metric such as the error distance (ED) is considered more informative [42]; it is calculated as the absolute difference between the obtained output value $R'$ and the expected sum $R$, i.e., $ED = |R' - R|$.
3D plot of the Error Distance of a 4-bit adder with respect to the expected addition value and the operating voltage.
Further quantifying metrics are calculated to provide a more elaborate view of the performance of the adders at different operating voltages. The mean error distance (MED) is calculated as the average of the error distances over all input combinations, $MED = \frac{1}{N}\sum_{i=1}^{N} ED_{i}$.
(a) The mean error distance for the adders at different operating voltages. (b) Log-scale of the MED at an operating voltage of 0.25V showing very low error for the lower bit adders.
Although the results show a high peak for the 16-bit adder, this is an artifact of what the MED portrays: the absolute value of the error, with no relation to the actual sum value. Therefore, the mean relative error distance (MRED) is calculated as well [42]. The relative error distance (RED) takes the actual sum value $R$ into account by dividing each ED value by the corresponding expected sum, $RED = ED/R$; the MRED is then the average of the RED values over all input combinations.
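These metrics can be summarized in the short sketch below, which follows the standard definitions of [44], [45]; as one hedge, inputs with an expected sum of zero are skipped in the relative measures to avoid division by zero.

# Error metrics computed from arrays of expected and observed sums.
import numpy as np

def error_metrics(expected: np.ndarray, observed: np.ndarray):
    ed = np.abs(expected - observed)        # error distance per input
    med = ed.mean()                         # mean error distance
    nz = expected != 0
    red = ed[nz] / expected[nz]             # relative error distance
    mred = red.mean()                       # mean relative error distance
    return med, mred

expected = np.array([3, 7, 12, 20])
observed = np.array([3, 6, 12, 24])
print(error_metrics(expected, observed))    # (1.25, ~0.0857)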
(a) The Mean Relative Error Distance for the adders at different operating voltages. (b) The log of the mean relative error distance for different N-bit adders at an operating voltage of 0.25V.
C. Discussion
The efficiency of the proposed scheme under diverse process variations is addressed for the different approximate adders. The comparison is based solely on the current scheme since, to the best of the authors’ knowledge, this technique is the first proposition of circuit approximation based on unreliable components across technology nodes. The evaluation is conducted with respect to the operating frequency, the technology node or transistor size, the energy and delay, the process corner, and the temperature.
1) Operating Frequency
The accuracy of the adder output is directly affected by the operating frequency: the higher the frequency, the greater the chance of obtaining erroneous results. Hence, the adders are simulated at different frequencies, and the MED and MRED are measured. Figure 12 shows the performance of a 4-bit adder, based on a 20nm PTM model, at different frequencies of operation. As depicted, to achieve full accuracy, the operating frequency should not exceed 1 GHz when the operating voltage is scaled down to 0.4V, which corresponds to less than half the nominal value. Higher operating frequencies are still feasible at smaller scaling levels; with around 30% scaling of the nominal voltage, frequencies up to 2 GHz are attainable with completely accurate results. Hence, as IoT operation dictates, several operating points are available to attain the performance required under the resource constraints.
(a) The mean error distance for the 4-bit adder at different frequencies. (b) The mean relative error distance of the 4-bit adder at different frequencies.
2) Technology Node
The impact of the technology node is taken into consideration by simulating the N-bit adders with predictive technology models of 10nm and 20nm, and with the actual device model of TSMC 65nm. Table II provides an overall comparison between the different N-bit adders in terms of the achieved MED and MRED as the nominal voltage of each technology node is scaled. The voltage levels are shown to reflect the same percentage of scaling for each technology node, considering the differences in the absolute nominal voltages. Moreover, as the log scale is used to represent the values, the dashes represent an achieved error of 0. As depicted, smaller transistor sizes allow higher levels of scaling of the nominal voltage, reaching down to around 0.4V while maintaining fully accurate output results. As the technology size increases, however, the variability starts to degrade the performance and limit the scaling possibility: the minimum operating voltage at which full accuracy is achieved is 0.8V for the 65nm technology node. Meanwhile, configurable accuracy and energy savings are possible by choosing the operating voltage level. A compromise arises, with lower voltage levels achieving energy savings of up to 95% but error levels reaching around 30%. On the other hand, larger operating voltages offer almost fully accurate outputs, with savings exceeding 60% for the 16-bit output at the 20nm technology node.
3) Energy and Delay
Energy and delay are important but conflicting design metrics: the lower the operating voltage, the lower the energy consumption, but the larger the propagation delay. With the underlying stochasticity of the transistor, however, several operating points are feasible depending on the technology node and the available resources. For instance, scaling the voltage down by 30% achieves energy savings of around 50% with delays of less than 50 ps for the 10nm and 20nm technology nodes, and less than 150 ps for 65nm. Figure 13 shows the simulation results for a 4-bit approximate adder operating at 500MHz. The intersection points between the energy consumption and delay plots represent the optimal operating points for the different technology nodes. As depicted, the 10nm and 20nm technology nodes have their optimum operating point at 45% of the nominal value, which provides an accurate output result, as shown in the performance table. Similarly, the optimum operating point for 65nm resides at 75% of the nominal value, which also provides accurate output values.
The analysis of a 4-bit adder with respect to the delay and energy consumption. The percentage scaling of the input voltage results in the scaling of the delay and the energy consumption as well.
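To illustrate how such a crossover arises, the sketch below combines the quadratic dynamic-energy law with the alpha-power delay model. All parameters are hypothetical placeholders, not fitted to the 10nm, 20nm, or 65nm results above, so the printed crossover is illustrative only.

# Energy-delay trade-off under voltage scaling: dynamic energy E ~ C*Vdd^2,
# alpha-power delay t ~ Vdd/(Vdd - Vth)^alpha. Assumed parameters.
import numpy as np

VTH, ALPHA, VDD_NOM = 0.3, 1.3, 0.9     # assumed threshold, alpha, nominal Vdd

scale = np.linspace(0.45, 1.0, 500)     # fraction of nominal Vdd
vdd = scale * VDD_NOM
energy = vdd**2                          # per-operation dynamic energy (a.u.)
delay = vdd / (vdd - VTH)**ALPHA         # propagation delay (a.u.)

# Normalize each curve to its maximum and locate the crossover, i.e. the
# balanced energy-delay operating point analogous to Figure 13.
e_n, d_n = energy / energy.max(), delay / delay.max()
cross = scale[np.argmin(np.abs(e_n - d_n))]
print(f"curves cross near {cross:.0%} of nominal Vdd")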
4) Process Corner
The simulation of the process corners serves as a measure of how the process and environmental stimuli affect the circuit in extreme situations [39]. The impact on the performance of the adder is measured in terms of the MED. In that regard, Figure 14 shows the behavior of N-bit approximate adders operating at 0.8V, which corresponds to 70% of the nominal voltage of the 65nm technology node. An interesting feature apparent in these results is the impact of the NMOS: the Fast-Slow (FS) and Slow-Fast (SF) corners do not show similar behaviors, as the slow operation of the NMOS degrades the performance and leads to more errors in the operations. Moreover, as depicted in the figure, completely accurate performance is achieved for the Fast-Fast (FF) case up to 4 bits, with less than a 1% error rate. Another interesting observation is that the Slow-Slow (SS) corner shows a relatively low error rate that is barely noticeable when dealing with error-tolerant applications.
The performance, in terms of the mean error distance, of the approximate adder with different number of bits at an operating voltage of 0.8V, corresponding to 70% of the nominal voltage for 65nm technology.
5) Temperature Variations
Temperature variations have a strong influence on the transistor parameters, which correspondingly affects the circuit behavior. Figure 15 shows the impact of temperature on the 8-bit approximate adder for different process corners; the operating voltage is kept at 0.8V, and the MED is measured and plotted. Since the threshold voltage decreases with temperature [46], the overdrive voltage and the on-state current increase, leading to a decreasing MED and higher accuracy with temperature.
The mean error distance with temperature variations of the 8-bit approximate adder at 0.8V operating voltage and different process corners.
Image Compression
With an elaborate investigation of the approximate adder with stochastic components, the performance characterization can be visually assessed through digital signal processing applications. In this context, this paper presents image compression using the 2-point Discrete Fourier Transform (DFT). The calculation requires addition and subtraction of pixel values of the image to be compressed. With $x[i]$ and $x[i+1]$ denoting two consecutive pixel values, the transform is given by:\begin{equation*} \begin{cases} y\left [{ i }\right]=x\left [{ i }\right]+x[i+1] \\ y\left [{ i+1 }\right]=x\left [{ i }\right]-x[i+1] \\ \end{cases}\tag{3}\end{equation*}
The output values $y[i]$ and $y[i+1]$ thus carry the sum and the difference of the consecutive pixel values, which are then used for the compression and reconstruction of the image.
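A behavioral sketch of this butterfly, applied across an image row, is shown below. In the paper, the additions and subtractions run on the SPICE-level approximate adders; here exact integer arithmetic is used for clarity, and the function names are ours.

# 2-point DFT butterfly of Eq. (3) applied pairwise across a row of pixels,
# together with its exact inverse for reconstruction. Assumes an even length.
import numpy as np

def dft2_forward(x: np.ndarray) -> np.ndarray:
    y = np.empty_like(x, dtype=np.int32)
    y[0::2] = x[0::2] + x[1::2]      # y[i]   = x[i] + x[i+1]
    y[1::2] = x[0::2] - x[1::2]      # y[i+1] = x[i] - x[i+1]
    return y

def dft2_inverse(y: np.ndarray) -> np.ndarray:
    x = np.empty_like(y)
    x[0::2] = (y[0::2] + y[1::2]) // 2   # x[i]   = (y[i] + y[i+1]) / 2
    x[1::2] = (y[0::2] - y[1::2]) // 2   # x[i+1] = (y[i] - y[i+1]) / 2
    return x

row = np.array([52, 55, 61, 59], dtype=np.int32)
print(dft2_inverse(dft2_forward(row)))   # recovers [52, 55, 61, 59]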
A. Simulation Setup
A full SPICE simulation setup is used to apply the DFT additions and subtractions to the image pixel values through the approximate adders at different operating voltages.
B. Quality Metrics
The quality of the compression is characterized using the peak-signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) [47].
1) Peak-Signal-to-Noise-Ratio
PV stands for the peak value of the pixel; with 8 bits used to represent the pixel, the peak value is 255. MSE is the mean square error between the reference value of the pixel in the original image and the value of the pixel after reconstruction. Hence, the PSNR is calculated as follows \begin{equation*} PSNR=10\log _{10}\frac {PV^{2}}{MSE}\tag{4}\end{equation*}
The output image with different levels of arithmetic error in the DFT operation. The quality of the output is relative to the peak signal-to-noise ratio and the perception of the image. Compromises arise between the energy savings and the level of quality sought.
2) Structural Similarity Index
SSIM is an assessment of the perceived image quality based on the quantification of the visibility of errors [47]. This measure builds on the adaptability of the human visual system in extracting structural information. Considering two image signals x and y from the original and the reconstructed image respectively, the SSIM (S(x,y)) comprises three components \begin{equation*} \begin{cases} S\left ({x,y }\right)=f(l\left ({x,y }\right), c\left ({x,y }\right), s(x,y)) \\ S\left ({x,y }\right)= \dfrac {(2\mu _{x}\mu _{y}+C_{1})(2\sigma _{xy}+C_{2})}{(\mu _{x}^{2}+\mu _{y}^{2}+C_{1})(\sigma _{x}^{2}+\sigma _{y}^{2}+C_{2})} \\ \end{cases}\tag{5}\end{equation*}
The structural similarity index for the reconstructed images at different operating voltages.
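For completeness, both quality metrics can be computed as in the sketch below. Note that the SSIM here is the single-window global form of Eq. (5), whereas [47] applies it over local windows and averages; the stabilizer constants follow the common convention and are assumptions on our part.

# PSNR of Eq. (4) and a global single-window SSIM per Eq. (5).
import numpy as np

def psnr(ref: np.ndarray, rec: np.ndarray, peak: float = 255.0) -> float:
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    return 10 * np.log10(peak**2 / mse)

def ssim_global(x: np.ndarray, y: np.ndarray, peak: float = 255.0) -> float:
    x = x.astype(np.float64); y = y.astype(np.float64)
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2   # usual stabilizers
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()
    return ((2*mx*my + c1) * (2*cxy + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

ref = np.tile(np.arange(0, 256, 32, dtype=np.uint8), (8, 1))
rec = np.clip(ref.astype(np.int16) + 4, 0, 255).astype(np.uint8)
print(psnr(ref, rec), ssim_global(ref, rec))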
C. Discussion
When compared with the alternative deterministic devices operating at the nominal voltage, more than 90% energy saving was achieved while maintaining high PSNR values for the compressed image. Voltage over-scaling schemes investigating timing paths in sequential circuits, bit error rates under different operating conditions, and errors in various mathematical functions are discussed in [48]–[50], respectively. In this work, we used voltage over-scaling to examine the accuracy and output characteristics of the logic and arithmetic blocks, incorporating the inherent variability of the transistors for performance shaping, in particular for energy saving in the image compression application. Multimedia and digital signal processing applications that build on such configurable arithmetic units have shown improvements in the output characteristics along with better utilization of resources [51]. The energy-quality scalability is considered a control knob for the level of operation required by error-resilient applications, such as wireless sensor nodes that need to capture images, compress them, and send or even stream them in the most efficient manner [52], [53]. A trade-off is apparent among the different design metrics, but it can, in fact, be sufficient and satisfactory with respect to the current system requirements.
Conclusion
Error-resilient applications relax the mapping of design specifications onto the corresponding hardware implementation. In this study, the variability of nanoscale transistor devices was embraced and modeled in a statistical manner: thermal noise was used to induce variations in the transistor elements, which allowed a stochastic setting of the threshold voltage. Adopting this inherent stochasticity within the approximate computing concept showed the attainable savings in the performance metrics. Analysis and simulations of simple and large arithmetic computing blocks maintained a high level of accuracy while offering savings in energy. A case study of image compression reflected the benefit of adopting approximate adders at the application level. All in all, this approach to transistor stochasticity provides the appropriate design space and improved energy efficiency in the presence of transistor variability. It allows for the development of configurable schemes that are adaptively controlled, based on the communication channel and environment, to increase or decrease the corresponding accuracy.
ACKNOWLEDGMENT
Ren Li and Rawan Naous contributed equally to this work.