A 7.6-ns Delay Subthreshold Level-Shifter Leveraging a Composite Transistor and a Voltage-Controlled Current Source

A novel level shifter (LS) circuit that uses a new low-power approach based on a parasitic capacitance voltage controlled current source is presented to minimize the propagation delay (PD) and maximize the voltage conversion range. This new scheme uses a simplified circuit including a dependent current source, a composite transistor made of three interconnected n-channel MOSFETs (TnM), one CMOS input inverter, and one CMOS output buffer to provide a fast response time. The circuit utilizes the combined action of the equivalent parasitic capacitance of the TnM, the value of which changes dynamically according to the transient value of the input voltage, and the dependent current source to shift the input signal level up from subthreshold voltage levels to +3.0 V, with minimal delay and power consumption. The LS circuit fabricated in 0.35 µm CMOS technology occupies a silicon area of only 25 µm × 25 µm. The LS shows measured rising and falling PDs of, respectively, 4 and 11.2 ns. The measured results show that the presented circuit outperforms other solutions over a wide frequency range of 1 to 130 MHz. The fabricated circuit consumes a static power of 31.5 pW and a dynamic power of 3.4 pJ per transition at 1 kHz, V DDL = 0.8 V, and a capacitive load of C L = 0.1 pF.


I. INTRODUCTION
Level shifters (LSs) are key circuits in numerous microelectronic systems including systems-on-chip (SoCs) and complex system-in-package (SiP) devices [1], [2], [3], [4]. These circuits are used to translate one signal level to another, allowing signals to be properly transferred across different supply voltage domains [5], [6], [7]. High-performance telecommunication systems require high-speed LS circuits consuming as little power as possible. Low-power applications, such as peripheral modules for the Internet of Things or implantable medical devices, working at medium or low frequencies, often use subthreshold circuits to decrease static and dynamic power consumption [8], [9], [10]. Energy-efficient circuit designs using subthreshold operation are well suited for these applications that do not require fast circuit operation [11], [12]. Hence, high-performance subthreshold LSs are necessary in these low-power applications [13], [14].
The associate editor coordinating the review of this manuscript and approving it for publication was Dusan Grujić.
In [36], a logic error detection circuit for near-threshold operation is used to enhance the performance of the CBLS, as shown in Fig. 2(a). In this schematic, the level shifting part includes M P1 -M P4, M N1, and M N2, while the logic error detection part consists of M P5, M P6, and M N3 -M N5. Although this LS was implemented in a CMOS 14-nm process using minimum transistor sizes, its two-stage topology resulted in a rather long PD and a narrow conversion range.
The CBLS reported in [38] (Fig. 2(b)) uses the n-type MOSFET M N4 to minimize the static current. However, utilizing 15 transistors to implement the LS can result in a larger circuit area. This CBLS also uses a substantial amount of power, and the feedback loop increases the PD.
The CBLS presented in [39] uses a digital circuit based on error correction, as shown in Fig. 2(c), to overcome the limitations of conventional LSs. However, the presented LS suffers from significant contention at nodes OUT and Y. The contention is due to the superimposition of M P6 with M N8 at the output node on the right-hand side, and of M P1 with M N3 on the left-hand side. The circuit in [39] also suffers from a problem similar to that of the LS circuit in [38]. The drawbacks associated with this topology are a long PD due to the feedback loop and a bulky area. Since the circuits shown in Figs. 2(b) and 2(c) operate as dual-stage circuits, their total speed is limited by the resulting long PD.
In this paper, a novel subthreshold LS circuit based on a voltage controlled current source (VCCS) combined with the equivalent parasitic capacitance (C p) of three interconnected n-channel MOSFETs (TnM) is presented to shift the signal level up with a shorter PD, a wider conversion range, and a lower power consumption compared to other solutions. In our approach, the parasitic capacitance C p varies according to the voltage drop across the TnM and the operating regions of the MOSFETs (i.e. triode or saturation mode), providing the LS circuit with minimal delay and power consumption for proper low-power operation in the subthreshold region. It will be shown in the next sections that using the dynamic parasitic capacitance provided by the TnM at node V A (Fig. 3(c)), instead of a single transistor, allows this LS design to operate in the subthreshold region while providing the same delay and a wide conversion range compared to an LS working in strong inversion and consuming more power.
A low-power latch-based LS is presented in [40]. In this topology, the MOS transistors M P6 and M P7 are added to the pull-up network in diode configuration to mitigate any strong contention. However, this connection affects the swing of the circuit because the drain of M N2 can only reach V DDH − |V thp|. In addition, the rise time of node D is slowed down by the diode-connected transistors. Consequently, the propagation delay is increased. A latch-based structure associated with a current limiter is presented in [41]. It is used to keep the current flowing during transitions to a minimum. The output is easily pulled down during transitions due to the limited current. In this LS circuit, diodes or gate isolation are not required. Furthermore, the current sources are switched off outside transition times. Consequently, the short-circuit static current is almost zero. The drawback of this circuit is that the buffer circuit input exposes the LS to noise by making the output signal float high when the input pulse is low. A latch-based structure in which the delay elements and five series diode-connected MOSFETs create an intermediate supply voltage (V IDD) is presented in [42]. In this circuit, V IDD is lower than V DDH and is connected to the pull-up network to reduce the drive strength. The circuit operation is highly challenging at low V DDL because a trade-off must be made between speed, area, and static power due to the delay element and the diode-connected transistors.
In this design, the current source supplies the needed current to the TnM within a given operating period, while the TnM converts the obtained current from the VCCS to a voltage level greater than the input voltage level. A feedback mechanism is designed to provide the right amount of current to the TnM while keeping the LS in sleep mode after the transition times to minimize the total power consumption (TPC). The TnM provides fast charge/discharge due to its equivalent parasitic capacitance to minimize the PD. The combined action of the VCCS and the TnM also maximizes the conversion range (CR) compared to other LS circuit solutions.
Fig. 2. (a) CBLS with logic error detection [36], (b) Two stage CBLS [38], (c) CBLS with correction circuit [39].
The remaining sections of this article are organized as follows. The configuration and the operating principle of the proposed LS circuit are covered in Section II. The implementation of the circuit, mathematical discussion, and the post-layout simulation results are presented in Section III. The measured performance of the LS fabricated prototype is provided in Section IV. The results are discussed and compared to other solutions in Section V, followed by conclusions in Section VI.

II. PROPOSED LS ARCHITECTURE AND OPERATION
The proposed LS is constructed based on a current mirror structure, thus it benefits from the high impedance of its output node and good matching capability. A feedback network circuit is employed to control the current supplied to a modified current mirror-based LS (MCLS) (Fig. 3(a)). This area-efficient circuit block consists of a composite transistor including three interconnected MOSFETs providing level shifting and an equivalent parasitic capacitance C p at node V A . In the simplified schematic of Fig. 3 (c), when IN = 1, the VCCS charges C p , which pushes V A upward. The charging of C p by VCCS provides a voltage ramp at node A. Then, the voltage V A is converted to V DDH through the output buffers ( Fig. 3 (a)).
In Fig. 3(a), the variable V A corresponds to the voltage of an important circuit node that is later used to produce the current source control voltage (V C), as illustrated in Fig. 3(c). The voltage variation (ΔV A) is measured by the feedback network and converted into a current variation (ΔI A). Then, ΔI A is subtracted from the main mirrored current ''I''. Finally, the rest of the current (I 1) is delivered by the MCLS. Therefore, when ΔI A = I, the feedback network completely shuts the circuit down. Consequently, the average power consumption corresponding to the area shown in green in Fig. 3(b) is saved compared to an LS circuit without feedback.
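The current bookkeeping of the feedback loop can be illustrated numerically. The following is a minimal Python sketch, not the authors' circuit model: it assumes a hypothetical linear feedback gain `g_fb` that converts the variation of V A into the feedback current, whereas the real feedback network is nonlinear. The function name and all numeric values are illustrative assumptions.

```python
def delivered_current(i_mirror, delta_v_a, g_fb):
    """Return I1 = I - dI_A, clamped at zero (circuit shut down).

    i_mirror  : main mirrored current I, in amperes
    delta_v_a : variation of V_A sensed by the feedback network, in volts
    g_fb      : hypothetical linear gain dI_A/dV_A in A/V (assumption)
    """
    delta_i_a = g_fb * delta_v_a            # feedback converts dV_A into dI_A
    return max(i_mirror - delta_i_a, 0.0)   # I1 reaches zero once dI_A >= I


if __name__ == "__main__":
    i_main = 8.5e-6   # illustrative mirrored current, A
    gain = 8.5e-6     # illustrative gain, chosen so that I1 = 0 at dV_A = 1 V
    for dv in (0.0, 0.5, 1.0, 1.3):
        i1 = delivered_current(i_main, dv, gain)
        print(f"dV_A = {dv:.1f} V -> I1 = {i1:.2e} A")
```

Under these assumptions, the delivered current falls as V A rises and stays at zero once the feedback current equals the mirrored current, which mirrors the shutdown condition described above.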
Converting V A to V DDH using a buffer can consume as much power as the energy saved by the feedback mechanism, as shown in Fig. 3(b). A large equivalent parasitic capacitance is seen at node V A due to the TnM (Fig. 3 (c)). The feedback network, therefore, must control the voltage V A at this critical node. The controlled charge/discharge of the large parasitic capacitance at node V A gradually shifts the level of the applied signal. Then, the output buffer (i.e. a chain of inverters) in the lower-strength nodes transforms the enhanced signal at node V A into V DDH with minimal contention. As explained in detail in Section III, the TnM provides a simplified topology designed to efficiently exploit the parasitic capacitance of the MOSFETs. Accordingly, this approach for converting the input signal into V DDH , based on the TnM and the feedback network, significantly improves PD, TPC, and CR as compared to other implementations. Furthermore, the closed-loop negative feedback regulates the current delivered to the TnM to minimize any unnecessary power consumption.
The circuit operation is illustrated in Fig. 3(c). The voltage-controlled current source (VCCS) supplies a variable current to the TnM and produces a voltage V A, which is used to generate the control voltage (V C) of the feedback circuit. Finally, V A is amplified through two inverters to buffer the signal and to generate V DDH. The design of the level shifter was summarized in a conference paper [7]. Compared to [7], this paper presents a detailed description of the design and provides the measurement results obtained with the fabricated chip, which were not available in [7]. Detailed design explanations and a mathematical analysis of the circuit are also provided to fully cover the operation of the presented circuit. The measurement results obtained with the fabricated chip are discussed, and the performance of our new level shifter design is compared with other solutions.
The advantages of the presented wide conversion range LS are summarized as follows:
1) A VCCS and a TnM are used to progressively increase the input signal level. Therefore, the opposition between the strong pull up and the weak pull down is completely removed, resulting in significant improvements in PD, CR, and TPC.
2) In contrast with the conventional current mirror-based LS, wherein the quiescent current limits the LS performance, the feedback used in this LS design permits the circuit to work only during a given time interval. When no input transition occurs, the LS is placed in stand-by mode, thereby significantly decreasing the leakage current and the static power.
3) Our topology uses only one effective stage to raise the signal intensity, rather than two or more, resulting in reduced propagation latency and complexity.
4) The presented circuit utilizes a large transistor on the low side of the current mirror to address the weak pull-down issue associated with the low V DDL. This allows converting the low amplitude signal into a pulse of larger amplitude V DDH. Consequently, this circuit is suitable for use in the subthreshold region.
5) The presented circuit uses fewer transistors compared with conventional CBLS circuit implementations, resulting in area saving.

III. CIRCUIT IMPLEMENTATION AND SIMULATION
A detailed description of the proposed LS circuit implementation is presented in Fig. 4(a). It consists of several subblocks, including a TnM, a feedback network, a VCCS, an input inverter, and an output buffer.

A. CIRCUIT DESCRIPTION
In the presented LS circuit ( Fig. 4(a)), the input signal (IN) is applied to the n-channel MOSFET MN1, whereas the voltage V A , which is in-phase with IN, is applied to the gates of MP1 and MP2 (p-type MOSFETs, superimposed with MN1 to cut off the static current). Accordingly, this circuit is allowed to operate only within the duration of the PD of the input signal to the output of the circuit. Otherwise, V A turns off MP1 and MP2 to avoid any current (I 1 ) flowing into the TnM and remains in idle mode. The role of the TnM composite MOSFET made of MN2, MN3, and MN4 is to shift the level of the input voltage upward by providing a dynamic equivalent capacitor, the value of which depends on the value of IN, charged by VCCS at node V A . When IN = ''1'', the VCCS charges C p . When IN = ''0'', the TnM discharges C p without the need for extra switches. In contrast to using a single transistor to charge V A , this scheme provides a dynamic capacitor value at node V A , the value of which depends on the operating region of MP2, MP4, MN2, MN3, and MN4. When IN = ''1'', the voltage V A increases and puts MP2 and MP4 in the triode region while transistors MN2, MN3, and MN4 are in cut-off region. The value of C p is smaller in triode than in saturation, hence decreasing the dynamic power consumption and the delay. When IN = ''0'', transistors MN1, MP2, and MP4 are in cut-off region and the voltage V A decreases. Consequently, MN2, MN3, and MN4 are put in the triode region, which also decreases the value of C p as compared to saturation mode, hence decreasing the dynamic power consumption and the delay [37]. The total gate capacitance of the MOSFET is given by C O = C OX WL where W and L are, respectively, the width and the length of the MOSFET channel; and C OX is the parallel plate capacitance between the gate and the bulk (i.e. C OX = ε ox /t OX ). Parameters ε ox and t OX are, respectively, the silicon oxide permittivity and the oxide thickness. 
In the linear region (i.e. in triode), the parasitic capacitances seen between the gate and the source (gate-source) and between the gate and the drain (gate-drain) of the MOSFET are dominant. Due to the existing conduction channel in this region, they are given by C gs = C gd = C O /2 [40].
In the saturation region, the value of the gate-source parasitic capacitance is dominant and is given by C gs = 2·C O /3 due to the channel pinch-off where the values of C gd and C ds are negligible [40]. In the presented LS, the TnM dynamically controls the value of the distributed parasitic capacitance C p according to the transient value of the input pulse (IN), when it is charged by VCCS and discharged through MN2, MN3, and MN4.
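The capacitance expressions quoted above can be evaluated with a few lines of code. The sketch below is only an illustration: the oxide thickness and the transistor dimensions are assumed values, not those of Table 1, and the model covers only the intrinsic gate capacitances of a single MOSFET, not the node-level C p.

```python
EPS_0 = 8.854e-12     # vacuum permittivity, F/m
EPS_R_SIO2 = 3.9      # relative permittivity of silicon dioxide


def gate_capacitance(width, length, t_ox):
    """Total gate capacitance C_O = C_OX * W * L, with C_OX = eps_ox / t_OX."""
    c_ox = EPS_0 * EPS_R_SIO2 / t_ox    # capacitance per unit area, F/m^2
    return c_ox * width * length


def dominant_caps(c_o, region):
    """Return (C_gs, C_gd) for the triode or saturation region."""
    if region == "triode":
        return c_o / 2.0, c_o / 2.0     # channel shared by source and drain
    if region == "saturation":
        return 2.0 * c_o / 3.0, 0.0     # C_gd neglected after pinch-off
    raise ValueError("region must be 'triode' or 'saturation'")


if __name__ == "__main__":
    # Assumed values for illustration only (not taken from Table 1).
    c_o = gate_capacitance(width=10e-6, length=0.35e-6, t_ox=7.6e-9)
    for region in ("triode", "saturation"):
        c_gs, c_gd = dominant_caps(c_o, region)
        print(f"{region}: C_gs = {c_gs * 1e15:.2f} fF, C_gd = {c_gd * 1e15:.2f} fF")
```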
When IN is high, MN2, MN3, and MN4 turn off and the current flows through the p-channel transistors (MP2 and MP4) to charge the equivalent parasitic capacitance at node A. The voltage at node V A increases until MP1 and MP2 enter the triode and then the cutoff region. Consequently, the current ''I'' decreases. Eventually, the voltage at node V A is conveyed to V DDH by the output buffer. The circuit operation when IN = ''1'' is illustrated in Fig. 4. The current produced by MN1 must be mirrored to the TnM in the proposed topology to shift up the output voltage. However, the current flowing into MP2 and MP4 depends on the V SG of MP2. The gate voltage of MP2 increases when V A increases gradually, whereas the drain voltage of MP2 decreases. Accordingly, V SG and V SD of, respectively, MP2 and MP4 decrease. The dependent current of the VCCS flows into the TnM, leading to the increase of V A at node A. This way, V SG of MP2 and V SD of MP4, which corresponds to V C (Fig. 3(c)), create a well-controlled voltage that places MP2 and MP4 in the triode region and then turns them off, preventing any extra current from being delivered to the TnM. In addition, the value of C p decreases when MP4 goes into the triode region and then into cutoff. Consequently, the dynamic power consumption decreases, and a minimum PD can be achieved.
In contrast, MN2, MN3, and MN4 turn on when IN = ''0''. Consequently, the equivalent parasitic capacitance C p seen at node A is discharged by the path provided by MN2 and MN3, and through MN4. Thus, the voltage V A rapidly discharges to nearly zero. The circuit returns to sleep mode when MN1 is turned off. The circuit operation for IN = ''0'' is illustrated in Fig. 4(c).
Indeed, when IN = ''0'', MN2, MN3, and MN4 are designed to discharge C p rapidly, in two different directions through the various R on resistances of the MOSFETs with specific care to minimize the PD and to reduce the dynamic power consumption.
When IN = ''0'', as shown in Fig. 4(c), the input of MN2 and MN3 is equal to V DDL while the voltage at node A, which was charged at the previous state (when IN = ''1''), [...] while MN3 is in triode due to the low voltage drop across its drain and source (V DSMN3 = V DDL − V GSMN2).
At the beginning, when IN = ''0'', MN4 is turned off and all the discharge current (Fig. 4(c)) passes through MN2 and MN3 (i discharge1). Since the discharge current is high in the beginning, the voltage drop V DSMN3 equals R onMN3 × i discharge1, which is greater than V thMN4. In addition, the large voltage V A across the drain and source of MN4 provides the condition V DSMN4 > V odMN4 needed to turn MN4 on with the help of MN3. When MN4 is turned on, a low R on is added in parallel with the R on of MN2 and MN3. Thus, the total R on provided by the two parallel discharge paths decreases, speeding up the discharge of C p by i discharge2 in parallel with i discharge1 (Fig. 4(c)). Consequently, the PD and the dynamic power consumption are decreased. Table 1 shows the dimensions of all transistors used in the presented LS circuit shown in Fig. 4(a).
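The effect of the second discharge path on the total on-resistance can be sketched as follows. This is a simplified illustration that treats each conducting MOSFET as a fixed resistor; the resistance values are hypothetical and are not derived from the transistor dimensions of Table 1.

```python
def parallel(r_a, r_b):
    """Equivalent resistance of two resistors in parallel."""
    return r_a * r_b / (r_a + r_b)


def discharge_resistance(r_on2, r_on3, r_on4, mn4_on):
    """Total on-resistance between node A and ground.

    Before MN4 turns on, only the series path MN2-MN3 conducts.
    Once MN4 is on, its R_on appears in parallel with that series path.
    """
    series_path = r_on2 + r_on3
    return parallel(series_path, r_on4) if mn4_on else series_path


if __name__ == "__main__":
    # Hypothetical on-resistances in ohms, for illustration only.
    r2, r3, r4 = 10e3, 10e3, 5e3
    print("MN4 off:", discharge_resistance(r2, r3, r4, mn4_on=False), "ohm")
    print("MN4 on: ", discharge_resistance(r2, r3, r4, mn4_on=True), "ohm")
```

With any positive values, the resistance with MN4 on is lower than the series resistance alone, which is why the second path shortens the discharge.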

B. TnM CIRCUIT ANALYSIS AND EQUIVALENT CAPACITOR
The circuit diagram of the TnM with all its parasitic capacitances is illustrated in Fig. 5(a). The effective equivalent parasitic capacitance C p at node ''A'' is the sum of the capacitance seen from the drain of the TnM (C eq−TnM in Fig. 5(a)), the drain capacitance of MP4 (C dMP4), the gate capacitance of MP2 (C gMP2), and the input capacitance of the output buffer (C eq−Buffer). Thus, the total equivalent parasitic capacitance seen at node ''A'' is nonlinear and depends on the region of operation of the MOSFETs. Fig. 5(b) presents the simplified circuit of the output buffer and the schematic of an inverter with its parasitics. C dMP4, C eq−Buffer, C gMP2, and C eq−TnM are detailed in (1). In C eq−TnM, C sMN2 is the capacitance seen from the source of MN2. C p is the sum of these capacitances.
In general, post-layout simulations are necessary to obtain an accurate estimation of the total parasitic capacitance seen at a specific node. The circuit operation can be modeled based on the expression of a capacitance charged by a current source, as shown in Fig. 5(c). In this model, the voltage across the capacitance C p is given by (2), where V is the voltage developed across capacitor C p during the time interval T , as shown in Fig. 5(d), and i Cp is the current flowing through C p .
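The model of a capacitance charged by a current source can be checked numerically. The sketch below assumes the standard relation V = (1/C p) ∫ i Cp (t) dt over the interval T and a constant C p; in the actual circuit C p varies with the operating region, so this is only a first-order illustration with assumed values.

```python
def capacitor_voltage(current, c_p, t_end, steps=1000):
    """Voltage developed across C_p after t_end seconds.

    current : function of time returning the charging current in amperes
    c_p     : capacitance in farads (assumed constant in this sketch)
    Uses the trapezoidal rule to integrate the current over [0, t_end].
    """
    dt = t_end / steps
    charge = 0.0
    for k in range(steps):
        t_a, t_b = k * dt, (k + 1) * dt
        charge += 0.5 * (current(t_a) + current(t_b)) * dt
    return charge / c_p


if __name__ == "__main__":
    # Illustrative values: 1 uA constant current into 30 fF for 1 ns.
    v = capacitor_voltage(lambda t: 1e-6, c_p=30e-15, t_end=1e-9)
    print(f"V = {v * 1e3:.2f} mV")
```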
Simulation can be performed to precisely determine the value of the equivalent parasitic capacitance at node ''A'' in the circuit shown in Fig. 4(a).
In the proposed circuit, the capacitance value seen at node ''A'' can vary because the amount of current delivered to node ''A'' depends on the operating region of MP2 over different time intervals. When V A is zero, MP1 and MP2 are in their saturation regions. Therefore, the value of C p can be obtained from (3). Expression (3) is valid only at startup. Afterward, when IN changes to ''1'', the gate voltage of MP2 and the drain voltage of MP4 increase gradually, forcing MP2 to enter the triode region. Therefore, ''i SDMP2'' in (3) is substituted by its triode-region expression (4). The current thus first follows (3), then a first-order equation as in (4), and finally I = 0, i.e. cut-off. Consequently, the current flowing through the TnM is a Gaussian waveform [43]:
$$I_{TnM}(t) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(t-m)^{2}}{2\sigma^{2}}} \tag{5}$$
where σ is the standard deviation, m is the mean value, and t represents the time. The amplitude of the current equals 1/(σ√(2π)), and the operating time of the circuit equals approximately m/2. The LS circuit operation during the transition times and the transfer function of the output buffer are analyzed to derive the PD formula. V O as a function of V A (the input voltage of the buffer) is plotted in Fig. 5(e). In this voltage transfer characteristic, V M is the switching threshold. Thus, the output of the presented LS changes when V A reaches V M. Therefore, when IN = ''1'', to extract the rising PD (t pr) from (2), we replace i Cp (t) by I TnM (t), which is described by (5). In this way, (2) can be rearranged as follows, where V = V A:
$$V_A = \frac{1}{C_p}\int I_{TnM}(t)\,dt \tag{6}$$
The proposed LS is allowed to work only during the PD; therefore, the integral bounds of (6) are from t 0 to t 0 + t pr, where t 0 indicates the exact start time of the LS input signal (IN) going up and t 0 + t pr indicates the moment that V A reaches V M. Thus, replacing V A by V M in (6) and evaluating this integral over the rising PD t pr yields
$$V_M = \frac{1}{C_p}\int_{t_0}^{t_0+t_{pr}} I_{TnM}(t)\,dt \tag{7}$$
which can be approximated using Simpson's rule as
$$V_M \approx \frac{t_{pr}}{6C_p}\left[I_{TnM}(t_0) + 4\,I_{TnM}\!\left(\frac{2t_0+t_{pr}}{2}\right) + I_{TnM}(t_0+t_{pr})\right] \tag{8}$$
Replacing dt by a time interval Δt in (3) yields (9), where ΔV A is the variation of V A over the time interval Δt.
The value of C p is determined by
$$C_p = \frac{i_{SDMP2}\,\Delta t}{\Delta V_A} \tag{10}$$
In (10), the value of i SDMP2 depends on the region of operation of MP2. i SDMP2 can be calculated using (3) when MP2 is in saturation and using (4) when it is in the triode region. The rising PD (t pr) is then derived as follows
$$t_{pr} = \frac{6\,C_p V_M}{I_{TnM}(t_0) + 4\,I_{TnM}\!\left(\frac{2t_0+t_{pr}}{2}\right) + I_{TnM}(t_0+t_{pr})} \tag{11}$$
When IN goes down, MN1 is turned off and C p, which was charged at the previous step when IN = ''1'', starts to discharge through MN2, MN3, and MN4. Fig. 6 shows the equivalent circuit of the presented LS when IN = ''0''. In this case, to extract the falling PD (t pf), the equation of the discharging capacitor must be used. Based on Fig. 6, the voltage across C p is given by (12), where R onT equals (R on2 + R on3)||R on4 and C p is described by (10), while i SDMP2 is a discharging current, as illustrated in Fig. 6. In (12), the falling PD t pf equals the time interval between the moment the input signal (IN) goes down and the moment that V Cp (t) decreases to V M. Therefore, by replacing V Cp (t) with V M in (12), t pf is estimated by (13). Finally, the PD is estimated by the average of t pr and t pf as follows
$$PD = \frac{t_{pr} + t_{pf}}{2} \tag{14}$$
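The delay estimate can be illustrated with a short numerical sketch. Because t pr appears on both sides of (11), the code below solves the Simpson's-rule charge balance for t pr by bisection. Several points are assumptions made only for illustration: the Gaussian is scaled by a peak current instead of being normalized, C p is taken as constant, the falling edge follows a plain exponential RC discharge toward V M, and all numeric values are hypothetical.

```python
import math


def gaussian_current(t, peak, mean, sigma):
    """Gaussian current pulse scaled to a given peak value (assumption)."""
    return peak * math.exp(-((t - mean) ** 2) / (2.0 * sigma ** 2))


def simpson_charge(i_of_t, t0, t_pr):
    """Simpson's-rule estimate of the charge delivered over [t0, t0 + t_pr]."""
    mid = (2.0 * t0 + t_pr) / 2.0
    return (t_pr / 6.0) * (i_of_t(t0) + 4.0 * i_of_t(mid) + i_of_t(t0 + t_pr))


def rising_delay(i_of_t, c_p, v_m, t0, t_max, iters=100):
    """Find t_pr in [0, t_max] such that the Simpson charge equals C_p * V_M."""
    target = c_p * v_m
    lo, hi = 0.0, t_max
    if simpson_charge(i_of_t, t0, hi) < target:
        raise ValueError("t_max too small: V_A does not reach V_M")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if simpson_charge(i_of_t, t0, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def falling_delay(r_on_t, c_p, v_start, v_m):
    """Time for an RC discharge from v_start down to V_M (assumed law)."""
    return r_on_t * c_p * math.log(v_start / v_m)


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the formulas.
    current = lambda t: gaussian_current(t, peak=8.5e-6, mean=2e-9, sigma=1e-9)
    t_pr = rising_delay(current, c_p=10e-15, v_m=0.6, t0=0.0, t_max=2e-9)
    t_pf = falling_delay(r_on_t=4e3, c_p=10e-15, v_start=1.3, v_m=0.6)
    print(f"t_pr = {t_pr * 1e9:.3f} ns, t_pf = {t_pf * 1e12:.1f} ps")
    print(f"PD   = {(t_pr + t_pf) / 2 * 1e9:.3f} ns")
```

The bisection is used instead of iterating (11) directly because a fixed-point iteration on (11) is not guaranteed to converge for an arbitrary current waveform.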

C. SIMULATED PERFORMANCE
The simulation results of the proposed LS are shown in Fig. 7. It shows the input voltage of 0.6 V, the current ''I'' flowing through the TnM, the voltage V A, which increases gradually, and the output voltage of 3.0 V at an operating frequency of 1 MHz. In the inset of Fig. 7, the PD of the rising edge is 1.99 ns. The current flows in the TnM only during a limited time interval within the PD, which gives the current I a Gaussian waveform (Fig. 7), an energy-optimal waveform [44], [45]. The figure also shows V A, which is controlled by the feedback network, increasing smoothly. The peak current flowing inside the TnM during the rising edge of the input/output pulses is limited to 8.5 µA. The current in the rising edge is reduced by half due to the utilization of the feedback network, which minimizes any contention and speeds up the operation of the circuit, as shown by the limited duration of this current waveform. Figs. 8(a) and (b) show the simulated C p and voltage V A during, respectively, the rising and falling transitions in which the LS circuit is permitted to work. In Fig. 8(a), V A is close to zero in the beginning, placing MP2 and MP4 in the saturation region. At the same time, C p exhibits its highest value. When V A gradually increases, MP2 and MP4 go into the triode region, which decreases the parasitic capacitance at node A. It should be noted that the gate-source / gate-drain parasitic capacitances of a MOSFET are dominant (other parasitic capacitances are negligible) and are given by C gs = C gd = C O /2 in the triode region, and by C gs = 2·C O /3 in saturation, where C O is the total gate capacitance of the MOSFET [40]. Thus, C p follows a descending curve as V A increases when IN = ''1''. As shown in Fig. 8(a), C p can vary from 30 fF to 7 fF when V A increases from 0 V to 1.3 V. Fig. 8(b) shows the simulated C p and V A when IN goes down.
In this case, the voltage V A is around 1.35 V in the beginning, assuming it was previously charged, and the gate voltages of MN2 and MN3 are equal to V DDL, which turns them on. As shown in Fig. 6, the TnM creates two different branches between V A and ground for discharging C p when IN = ''0'': 1) through MN4, and 2) through MN2 and MN3 in series. In the beginning, V A is large, placing MN4 and MN2 in the saturation region and thus significantly increasing the parasitic capacitance at node A. At this time, C p can be as high as 32 fF according to the simulation (Fig. 8(b)). When V A decreases, the voltage across the TnM decreases, and consequently, MN4 and MN2 go into the triode region, decreasing the parasitic capacitance at node A. Thus, when IN = ''0'' and V A starts decreasing, C p decreases as shown in Fig. 8(b). The value of C p varies from 32 fF to 5 fF when V A decreases from 1.4 V to 0.2 V for IN = ''0''.
We compare the LS operation with a single transistor (Fig. 5(f)) instead of the TnM. Fig. 9 presents the input voltage of 0.6 V, the current ''I'' flowing through the single transistor (replacing the TnM), the gradually increasing voltage V A, and the output voltage of 3.0 V at an operating frequency of 1 MHz. In the inset of Fig. 9, the PD of the rising edge is 3.18 ns. In this setting, the simulated PD is 1.6 times longer than with the proposed LS circuit, which uses the TnM. The amplitude of the current is I = 11 µA, compared to I = 8.5 µA when the TnM is used. To elaborate, when a single transistor is used instead of the TnM, the low side of the presented LS design (Fig. 4(a)) performs like a conventional LS circuit (Fig. 1) that suffers from contention. Consequently, the power consumption and the PD of this LS increase compared to our solution, as simulated and shown in Fig. 9. Figs. 10(a) and (b) show the simulated C p and voltage V A during, respectively, the rising and falling transitions in which the LS circuit (using a single transistor instead of the TnM) is permitted to work. In Fig. 10(a), the input of the single transistor (used instead of the TnM) is ''0'' and the single transistor is turned off. So, the variation of C p at node A depends on the parasitic capacitances of MP2 and MP4 (Fig. 4(a)), where C p varies from 35 fF to 7 fF during the transition of the LS output from ''0'' to ''1''. During this transition, V A varies from 0.2 V to 1.5 V. When IN = ''1'', the operation of the LS is independent of the parasitic capacitance of the low side of the LS. Therefore, the variations of C p and V A in Fig. 10(a) and Fig. 8(a) are the same. Fig. 10(b) illustrates the simulated C p and V A when IN goes low. In this case, the input of the single transistor used in place of the TnM is ''1'' and the single transistor is turned on.
Since V A has reached its maximum voltage after the previous step (IN = ''1''), it places MP2 and MP4 in their cutoff region. Therefore, C p depends on the parasitic of a single transistor and is independent of the parasitic of MP2 and MP4. When V A becomes large, the single transistor (i.e. MN2 in Fig. 1) is put in the saturation region, significantly increasing the parasitic capacitance at node A. At this time, C p can be as high as 150 fF according to the simulation (Fig. 10 (b)). When V A decreases, the voltage across the single transistor decreases, and consequently, the single transistor goes into the triode region, which decreases the parasitic capacitance at node A.
Then, when IN = ''0'' and V A starts decreasing, C p decreases as shown in Fig. 10(b). The value of C p varies from 150 fF to 5 fF when V A decreases from 2.4 V to 1 V for IN = ''0''. Comparing Fig. 10(b) with Fig. 8(b), the equivalent parasitic capacitance at node A of the LS with the single transistor is greater than with the TnM, due to the contention generated at this node, which consequently increases the PD. A longer PD means that V A takes longer to reach V M, showing that the variation of V A (dV A /dt) is slow.
The equivalent capacitance at node A is, therefore, large (C p = i MP4 /(dV A /dt)), as shown in Fig. 10 (b).
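The relation C p = i/(dV A /dt) can be applied to sampled waveforms to estimate the equivalent capacitance interval by interval. The sketch below is a hypothetical post-processing helper, not part of the authors' simulation flow; it uses current magnitudes, finite differences, and synthetic data generated for a constant 20 fF capacitance.

```python
def estimate_cp(times, v_a, currents):
    """Estimate C_p on each sampling interval from C_p = i / (dV_A/dt).

    times    : sample instants in seconds
    v_a      : node voltage V_A at each instant, in volts
    currents : magnitude of the charging/discharging current, in amperes
    Intervals where V_A does not change are skipped (returned as None).
    """
    estimates = []
    for k in range(len(times) - 1):
        dv = abs(v_a[k + 1] - v_a[k])
        dt = times[k + 1] - times[k]
        i_avg = 0.5 * (abs(currents[k]) + abs(currents[k + 1]))
        estimates.append(i_avg * dt / dv if dv > 0.0 else None)
    return estimates


if __name__ == "__main__":
    # Synthetic data: 4 uA into a constant 20 fF gives 0.2 V per ns.
    t = [0.0, 1e-9, 2e-9]
    v = [0.0, 0.2, 0.4]
    i = [4e-6, 4e-6, 4e-6]
    for c in estimate_cp(t, v, i):
        print(f"C_p = {c * 1e15:.1f} fF")
```

For a given current, a slower change of V A yields a larger estimate, which is the argument made above for the single-transistor case.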
Validating the operation of the proposed design against mismatch and process corners is critical because the current delivered to the TnM is closely related to V SG and V SD of MP2 and MP4. MP2 and MP4 shown in Fig. 4(a) are responsible for adjusting the current delivered to the TnM; hence, they must be designed and implemented to be resilient against process variations. The design is simulated for the typical case corner using typical NMOS and PMOS transistor parameters, V DDH = 3.0 V, V DDL = 1.6 V, capacitive load (C L) = 100 fF, and an operating frequency of 1 MHz. The simulation yields a value of the PD (average of rising and falling PDs) [...] presented in Fig. 11(a). The power consumption of the LS is also assessed to validate the operation of the proposed circuit against mismatch and process corners. The design is simulated for the typical case corner using typical NMOS and PMOS transistor parameters, V DDH = 3 V, V DDL = 0.8 V, capacitive load (C L) = 100 fF, and an operating frequency of 1 kHz. The simulation yields a total power consumption of 1.89 nW for the proposed LS. A worst-case simulation is performed to assess the effects of process variations on power consumption. The simulated values of the power consumption for the WP and WS corners are, respectively, 12.16 and 1.56 nW. A Monte-Carlo simulation is performed for V DDH = 3.0 V, V DDL = 0.8 V, capacitive load (C L) = 100 fF, and an operating frequency of 1 kHz to further examine the influence of process variations on the operation of the proposed LS. The resulting power consumption and its log-normal distribution, with a variance of 0.6, a mean power consumption (mu) of 2.72 nW, and a standard deviation (sd) of 1.63 nW, are presented in Fig. 11(b).
Fig. 12 shows the test setup used to perform the experimental validation of the fabricated LS circuit. As shown in the block diagram of Fig. 12(a), a capacitive load is used to perform the measurement. A source measurement unit (SMU) with a resolution of 10 fA (KEYSIGHT-B2911A) is utilized to accurately measure the power consumption. Fig. 12(b) shows the test equipment and the fabricated chip under test mounted on a custom printed circuit board (PCB). Fig. 12(c) presents the enlarged view of one 10-pF capacitive load for the test. Fig. 12(d) shows a screenshot of the oscilloscope with the measured input and output waveforms of the fabricated LS displayed together, with amplitudes of 0.8 V and 3.0 V, respectively. Fig. 12(e) shows the enlarged view of the wire-bonded fabricated chip.

IV. EXPERIMENTAL RESULTS
The presented LS circuit was fabricated in an AMS 0.35-µm CMOS process. A photograph of the fabricated chip prototype is shown in Fig. 13 (a). The fabricated LS occupies an area of 25 µm × 25 µm. The performance of the fabricated LS is measured over a wide range of frequencies. The maximum operating frequency of the circuit is shown in Fig. 13 (b) for different input signal amplitudes, with V DDL swept from 0.4 V to 2 V and V DDH = 3 V. The proposed LS operates at up to 130 MHz (at V DDL = 1.6 V), whereas similar LS circuits, such as the one in [37], are limited to 100 MHz.
The total power consumption is another important parameter that must be considered. The measured power consumption of the fabricated LS circuit is presented in Fig. 14 (a) as a function of the frequency of the input pulse. Power consumption is measured for input signal amplitudes and V DDL of 0.6, 0.7, 0.8, 1.0, 1.4, and 1.8 V, a fixed V DDH = 3.0 V, and a capacitive load C L = 20 pF. The consumed power increases with the frequency of the input pulse. Only the input inverter is supplied by V DDL , and it consumes a negligible amount of power; all other parts of the proposed LS circuit are supplied by V DDH , as shown in Fig. 4(a), and they consume the bulk of the power. Accordingly, the measured power consumption shown in Fig. 14 (a) is the product of V DDH and the current drawn from V DDH , as measured by the SMU. Among the six measured curves, the highest power consumption variation occurs when the circuit converts the input signal with the lowest amplitude into 3 V (from 0.6 V to 3 V), whereas the lowest variation is obtained when the circuit converts the input signal with the highest amplitude into 3 V (from 1.8 V to 3 V). The circuit needs more power to provide a higher conversion gain. Because V DDH is fixed in all six measured curves of Fig. 14 (a), the curves obtained for the different values of V DDL lie close to each other.
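The power computation described above, the product of V DDH and the current drawn from V DDH, can be sketched as follows. This is a minimal illustration and not the authors' measurement script; the function name and the example current are hypothetical and do not correspond to any value reported in Fig. 14.

```python
def supply_power_w(v_ddh_v: float, i_ddh_a: float) -> float:
    """Return the power drawn from the high supply, P = V_DDH * I_DDH, in watts."""
    if v_ddh_v < 0 or i_ddh_a < 0:
        raise ValueError("supply voltage and current must be non-negative")
    return v_ddh_v * i_ddh_a


# Hypothetical SMU current reading, for illustration only (not a measured value).
example_current_a = 1.0e-9
power_w = supply_power_w(3.0, example_current_a)
print(f"P = {power_w * 1e9:.2f} nW at V_DDH = 3.0 V")
```

The contribution of V DDL is left out of this sketch because, as stated above, the input inverter consumes a negligible amount of power.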
The variation in power as a function of the input frequency over several values of V DDH (1.8, 2.0, 2.2, 2.4, 2.6, 2.8, and 3 V) is shown in Fig. 14 (b). The results are obtained for a fixed V DDL and a capacitive load C L = 20 pF. In this set of measured curves, the power consumption also increases with the frequency. Here, the highest measured power occurs at V DDH = 3 V while the lowest one is observed at V DDH = 1.8 V.
The measured power consumption of the LS circuit is presented in Fig. 14 (c) for different values of V DDL at V DDH = 3.0 V, a capacitive load C L = 20 pF, and an operating frequency of 1 kHz. The curve descends: as V DDL increases (i.e., as the amplitude of the input signal increases), the implemented LS requires less energy to convert the input signal into a signal with an amplitude of 3 V, because V DDH is fixed at 3 V. The total power consumption of the LS circuit versus V DDH for a fixed V DDL = 0.8 V, a capacitive load C L = 20 pF, and an operating frequency of 1 kHz is shown in Fig. 14 (d). The measured power consumption, which is within a range of a few nW, increases with V DDH .
The static power consumption (P S ) of the proposed LS circuit is measured for different DC input values, without applying any input pulse, and lies in the range of tens to hundreds of pW. When the input of the LS is connected to GND ("0") or V DDL ("1"), the measured P S is, respectively, 31.5 and 260 pW. The power consumption versus capacitive load (C L ) for a 1-kHz pulse-shaped input, V DDL = 0.8 V, and V DDH = 3 V is measured and presented in Fig. 15. In this measurement, C L is changed from 0.1 pF to 100 nF and the power is measured at each step. As C L increases, the power consumption increases.
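The increase of power with C L is consistent with the textbook first-order estimate of switching power, P ≈ C L × V DDH ^2 × f, for a load charged to V DDH once per cycle. The sketch below evaluates this estimate for a few load values within the measured range. It is an idealized approximation that ignores short-circuit, internal-node, and static contributions; it is not the model used by the authors and is not expected to reproduce the measured values of Fig. 15. The function name is hypothetical.

```python
def switching_power_w(c_load_f: float, v_ddh_v: float, freq_hz: float) -> float:
    """First-order switching power of a capacitive load: P = C_L * V_DDH^2 * f."""
    return c_load_f * v_ddh_v ** 2 * freq_hz


# Operating point taken from the text: V_DDH = 3 V, 1-kHz input.
# The three load values are sample points inside the 0.1 pF to 100 nF sweep.
for c_load_f in (0.1e-12, 10e-12, 100e-9):
    p_w = switching_power_w(c_load_f, 3.0, 1.0e3)
    print(f"C_L = {c_load_f:.1e} F -> P = {p_w:.2e} W")
```

Under this estimate the power grows linearly with C L , which matches the qualitative trend reported for the measurement.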
The PD of the LS circuit is measured for an input pulse signal with a frequency of 1 MHz and a duty cycle of 50% while an inverter, integrated on the chip, is used as a load. As shown in Fig. 16 (a), the LS shows a rising PD of 4 ns and a falling PD of 11.2 ns (see Fig. 16 (b)). Thus, the average PD of the proposed LS after a transition is 7.6 ns.
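The average PD quoted above is the arithmetic mean of the measured rising and falling delays. A minimal sketch of that arithmetic, with a hypothetical function name, is:

```python
def average_pd_ns(rising_pd_ns: float, falling_pd_ns: float) -> float:
    """Arithmetic mean of the rising and falling propagation delays, in ns."""
    return (rising_pd_ns + falling_pd_ns) / 2.0


# Measured values reported in the text: 4 ns rising, 11.2 ns falling.
avg_ns = average_pd_ns(4.0, 11.2)
print(f"average PD = {avg_ns:.1f} ns")  # prints "average PD = 7.6 ns"
```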
Given that the PD is an essential parameter of any LS circuit, Figs. 16 (c) and (d) show the variation of the PD with V DDL and V DDH . The rising and falling PDs versus V DDL at a fixed V DDH = 3 V, with an integrated inverter used as the load, are presented in Fig. 16 (c). The PD, especially the rising-edge delay for V DDL < 0.8 V, is long because of the larger conversion gain required when the circuit converts a low-amplitude signal into a high-amplitude signal of 3 V. The measured delay curve, especially the falling PD, is almost flat for V DDL > 0.8 V. The measured variations of the rising and falling PDs versus V DDH at a fixed V DDL = 1.6 V, with an integrated inverter used as the load, are shown in Fig. 16 (d). In this figure, the PD is measured while the LS converts a pulse-shaped signal with a 1.6 V amplitude. The PD of the circuit is almost flat in Fig. 16 (d), especially the rising PD, which shows that the circuit performs well at lower conversion gain (e.g., converting 1.6 V to 3 V).
Designing circuits to work in subthreshold is essential for low-power budget applications, such as in biomedical implants [40], [46], [47], [48]. The results in Fig. 17 show that the operation of the circuit in subthreshold meets the requirements of high-performance operation due to the TnM and feedback network of the proposed LS (Fig. 4(a)). Fig. 17 presents the measured waveforms of the implemented LS for an 80 mV and 50 kHz input pulse converted into a 3.0 V waveform. As can be seen, a clean 3 V pulse is extracted at the output of the presented LS circuit from an ultra-low amplitude input pulse. Therefore, the proposed LS provides a convenient interface between a weak signal and a digital processor.

V. DISCUSSION AND PERFORMANCE COMPARISON
The experimental validation revealed that the utilization of the TnM remarkably improves the performance of the proposed LS. The TnM relies on the equivalent parasitic capacitances of the MOSFETs utilized in the LS circuit. The feedback network prevents the LS circuit from consuming a large amount of power during the transitions. This minimizes the PD, enables operation in subthreshold, and minimizes static and dynamic power consumption. Table 2 summarizes and compares the performance of the presented LS circuit to other solutions. The experimental results show an improvement of approximately 8.5% in the PD as compared to [36], which uses a current mirror-based LS. In addition, the proposed circuit can successfully convert a pulse signal with an amplitude as small as 80 mV into 3 V (a conversion range of 0.08 V to 3 V) while working in subthreshold and consuming only 28 nW. Moreover, the measured maximum operating frequency of the fabricated circuit is improved by 30% as compared to the other solutions. The static and dynamic power consumption is comparable with that of recently published papers in this field.
A reasonable criterion in the performance evaluation of an LS design is to measure the PD while the circuit is working over a given conversion range. Indeed, a larger conversion range results in a longer PD. Therefore, a first figure-of-merit (FoM1) is recommended to assess the PD relative to the conversion range. In our proposed LS, the PD of 7.6 ns is measured while the LS is converting a pulse with an amplitude of 1.6 V into a pulse with an amplitude of 3 V. Thus, FoM1 = (3.0 V - 1.6 V)/(7.6 ns) = 0.184 × 10^9 V/s is achieved. For FoM1, a larger value is better.
A second figure-of-merit (FoM2) is defined to compare the different LS designs according to the trade-off between the power consumption (P C ) and the PD (Delay). It is beneficial that an LS achieves the lowest P C for the shortest delay, while converting a V DDL that is as low as possible. Therefore, in the defined FoM2, V DDL , P C , and Delay appear in the numerator, and an LS with a smaller FoM2 achieves better performance. To facilitate comparison, FoM2 is normalized according to the V DDH employed in each circuit: the product of the parameters is divided by the square of V DDH . The values of P C and Delay published in the literature for other systems are reported in Table 2 for different values of V DDL . To draw a fair comparison using this FoM, we normalized the values of P C and Delay according to V DDL . Then, FoM2 can be calculated as follows:
$$\mathrm{FoM2} = \frac{V_{\mathrm{DDL}} \times P_{\mathrm{C}} \times \mathrm{Delay}}{V_{\mathrm{DDH}}^{2}}$$
As can be seen in Table 2, our LS design has the lowest FoM2 of 3.67, suggesting that our design outperforms other solutions in terms of the trade-off between power consumption and PD for a given V DDL .
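The two figures of merit can be sketched in a few lines. FoM1 follows the worked value given in the text. The FoM2 expression is assembled here from the verbal description (V DDL , P C , and Delay in the numerator, divided by the square of V DDH ) and should be read as an assumption. The units used for Table 2 are not stated in this section, so the sketch does not attempt to reproduce the reported FoM2 of 3.67. Function names are hypothetical.

```python
def fom1_v_per_s(v_ddh_v: float, v_ddl_v: float, delay_s: float) -> float:
    """FoM1: conversion range divided by propagation delay (larger is better)."""
    return (v_ddh_v - v_ddl_v) / delay_s


def fom2(v_ddl: float, p_c: float, delay: float, v_ddh: float) -> float:
    """FoM2 as described in the text (smaller is better).

    Assumed form: (V_DDL * P_C * Delay) / V_DDH^2. The caller must use the
    same units for every design being compared.
    """
    return (v_ddl * p_c * delay) / v_ddh ** 2


# Values from the text: 1.6 V converted to 3.0 V with an average PD of 7.6 ns.
fom1 = fom1_v_per_s(3.0, 1.6, 7.6e-9)
print(f"FoM1 = {fom1 / 1e9:.3f} x 10^9 V/s")  # prints "FoM1 = 0.184 x 10^9 V/s"
```

Because FoM1 rewards a wide conversion range at a short delay while FoM2 penalizes power, delay, and V DDL together, the two metrics are read in opposite directions when comparing designs.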

VI. CONCLUSION
A new high-performance LS circuit topology is presented in this work. The presented current mirror-based structure offers wide-range and fast level conversion. The design converts subthreshold input signal levels to above-threshold levels with minimal PD by leveraging the dynamic equivalent parasitic capacitance of the TnM, which depends on the transient value of the input signal IN. The circuit operates efficiently, especially in the case of contention on high-impedance nodes, owing to the feedback network and the voltage-controlled current source (VCCS). The equivalent parasitic capacitance of the three interconnected n-type MOSFETs (TnM) performs the level-shifting part of the LS and also forms two discharge paths for C p when IN = "0". The TnM thereby reduces the PD and the dynamic power consumption. In addition, this approach extends the conversion range compared with other mechanisms. The measured performance of the test chip, fabricated in an AMS 0.35-µm CMOS process, surpasses that of the compared solutions. The discussion and the measured performance of the presented LS illustrate the advantages of this solution, namely a wide conversion range, low power consumption, and short PD.
VOLUME 10, 2022