Reconfigurable Signal Processing and DSP Hardware Generator for 5G and Beyond Transmitters

The digital front-end of the communication transceivers envisioned for fifth-generation (5G) and beyond requires highly configurable high-performance digital signal processing (DSP) hardware operating at very high sampling rates to accommodate increasing signal bandwidths and support a range of modulation schemes and transmitter architectures. In this article, we present an efficient implementation of a highly configurable DSP hardware generator that can generate high-performance DSP hardware for multiple transmitter architectures including Cartesian, polar, outphasing, and multilevel outphasing modulators. The generated hardware unit, which consists of multistage multirate filters and other required DSP operations, runs at sample rates up to 4 GHz. The hardware supports an adjacent channel leakage ratio (ACLR) down to −48 dB and an error vector magnitude (EVM) of 0.78% with a 7-bit phase signal at a sampling rate of 4 GHz for multilevel outphasing modulation. Digital synthesis of the circuit in a 5-nm complementary metal-oxide semiconductor (CMOS) process yields a core area of 0.01 mm² and an estimated power consumption of 37.2 mW for a 200-MHz bandwidth 5G new radio (NR) baseband (BB) signal.

Fig. 1. Reconfigurable DSP processor between BB signal generator and RF components of a transmitter.
[...] and spectrum requirements of 5G and beyond transmitters, programmable digital-intensive signal processing units operating at high sampling rates and power efficiency are needed because complex signal processing operations implemented in the digital domain rather than the analog domain offer better scalability, lower cost, and better performance [2], [3], [4], [5], [6]. Due to increasing demands for versatility in modern communication systems, there is a need to implement a reconfigurable hardware generator for digital signal processing (DSP) units to reduce the high cost of redesign, while handling the multitude of signal processing tasks in 5G and beyond radio transmitters [7]. The generator is also expected to support other hardware specifications, including high reusability, high flexibility, high clock frequency, low power consumption, and support for various modulation schemes such as the Cartesian, polar, and outphasing systems. The goal is to develop a universal design methodology for DSP for next-generation transceivers while providing all the benefits of reconfigurable hardware generators by creating an application-specific, high-performance custom DSP processor.
In this article, we present a reconfigurable signal processing hardware generator developed using a highly automated development and simulation environment. The hardware generator is capable of producing high-performance DSP hardware that is widely configurable at both compile time and run time. The generated hardware includes a programmable interpolation chain and the signal processing unit required for the Cartesian, polar, outphasing, and multilevel outphasing transmitter architectures. Fig. 1 shows the reconfigurable DSP processor described in this article within a transmitter chain. Several programmable options for the DSP in terms of signal scaling and time interleaving for multiple modulation techniques were designed into the signal processing architecture to support operation over the 5G new radio (NR) frequency ranges (FR1 to FR3). An example of the generated DSP hardware supports an adjacent channel leakage ratio (ACLR) down to −48 dB and an error vector magnitude (EVM) of 0.78% with a 7-bit phase signal at a sampling rate of 4 GHz for multilevel outphasing modulation. Digital synthesis of the circuit in a 5-nm CMOS process yields a core area of 0.01 mm² and an estimated power consumption of 37.2 mW for a 200-MHz bandwidth 5G NR baseband (BB) signal. Although this work focuses on 5G FR2 specifications, similar implementation principles will likely apply in the sixth-generation (6G) era. As most of the existing 5G bands are reformed, 6G networks are expected to obtain new spectrum in the 7 to 24 GHz frequency range, commonly referred to as FR3 [8].
This article is organized as follows: Section II presents the signal processing required for different modulation techniques. Section III describes the architecture and hardware implementation of the multimodulation DSP hardware and related optimization techniques for improved area and power efficiency in the DSP unit. Section IV describes the environment and the hardware generator used to develop and design the reconfigurable DSP hardware. This section also includes ACLR and EVM results for the behavioral hardware model of the DSP. Performance metrics in terms of ACLR and EVM, as well as area and power consumption of the synthesized DSP hardware, are presented and discussed in Section V, and Section VI concludes the article.

II. SIGNAL PROCESSING FOR MULTIMODULATION TRANSMITTERS
In modern transmitters, three types of signal composition are generally applied, namely, Cartesian, polar, and outphasing. The relevant fundamental principles regarding these schemes and the related DSP hardware implementation are briefly discussed in Sections II-A to II-D.

A. Cartesian Transmitters
Cartesian transmitters are based on direct modulation of complex BB signals in terms of the in-phase component I(t) and the quadrature component Q(t), which represent the real and imaginary parts of the complex BB signal. Such an I/Q modulation principle can be expressed as
$$V(t) = I(t)\cos(\omega_c t) - Q(t)\sin(\omega_c t) \tag{1}$$
where V(t) is the output signal of the transmitter, and ω_c is the angular frequency of the carrier signal. Cartesian transmitters typically use high-resolution radio frequency (RF) D/A conversion for upconverted BB signals, which limits the feasibility of developing a power-efficient implementation due to AM-AM, AM-PM, PM-AM, and PM-PM nonlinearity in high-frequency operation [9], [10], [11], [12].

B. Polar Transmitters
In polar transmitters, the complex BB signal is expressed in terms of polar coordinates, that is, the amplitude A(t) and phase φ(t), thus losing the bandlimited property of the Cartesian I/Q signal. The corresponding high-frequency modulated signal can be expressed as
$$V(t) = A(t)\cos\big(\omega_c t + \varphi(t)\big) \tag{2}$$
where the relationship between the two transmitter architectures is given by
$$A(t) = \sqrt{I^2(t) + Q^2(t)} \tag{3}$$
$$\varphi(t) = \tan^{-1}\!\left(\frac{Q(t)}{I(t)}\right) \tag{4}$$
The polar components, i.e., the amplitude A(t) and the phase φ(t) for this transmitter architecture, are often realized by the coordinate rotation digital computer (CORDIC) algorithm [13], which converts the I/Q data into AM-PM data and can be implemented in the digital domain using a digital signal processor [14], [15], [16], [17], [18].
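As a quick numerical illustration of the relationship between the Cartesian and polar forms, the following minimal sketch uses library functions in place of a CORDIC block. The function names and sample values are hypothetical and are not taken from the described hardware.

```python
import math

def iq_to_polar(i, q):
    """Return (A, phi) for one I/Q sample; phi lies in [-pi, pi]."""
    return math.hypot(i, q), math.atan2(q, i)

def cartesian_rf(i, q, wc_t):
    """I/Q modulator output for carrier phase wc_t = omega_c * t."""
    return i * math.cos(wc_t) - q * math.sin(wc_t)

def polar_rf(a, phi, wc_t):
    """Polar modulator output for the same carrier phase."""
    return a * math.cos(wc_t + phi)

# Arbitrary example values (assumed for illustration only).
i, q, wc_t = 0.3, -0.4, 1.234
a, phi = iq_to_polar(i, q)
print(a, phi, cartesian_rf(i, q, wc_t), polar_rf(a, phi, wc_t))
```

Both modulator forms yield the same RF sample, which is the identity the SCS relies on when it replaces I/Q data with amplitude and phase data.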

C. Outphasing Modulation
Modern BB modulations such as orthogonal frequency-division multiplexing (OFDM) provide signals with high peak-to-average power ratio (PAPR). To transmit these signals at high power, the PA is required to operate inefficiently in the linear region with large power backoff. Outphasing, in contrast, is a technique that applies phase modulation to achieve linear amplification efficiently [19], [20]. The outphasing modulation technique uses the summation of two complex vectors with constant amplitude and different phase modulation, as opposed to the traditional polar transmitter, as shown in Fig. 2(a). These vectors are represented as follows [19], [21]:
$$V(t) = S_1(t) + S_2(t) = \tfrac{1}{2}\cos\big(\omega_c t + \varphi(t) + \theta(t)\big) + \tfrac{1}{2}\cos\big(\omega_c t + \varphi(t) - \theta(t)\big) \tag{5}$$
where V(t) is the modulated RF signal, S_1(t) and S_2(t) are two constant-envelope signals, and θ(t) is the outphasing angle defined by the normalized amplitude output A(t)
$$\theta(t) = \cos^{-1}\big(A(t)\big) \tag{8}$$
Due to the linear operation of the PA in the RF front-end, high-throughput, low-power, high-accuracy digital outphasing transmitters have been designed for millimeter-wave applications in recent years [22], [23].
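The outphasing decomposition can be checked numerically with a short sketch. It assumes a normalized amplitude in [0, 1] and half-amplitude constant-envelope branches; the function names are hypothetical.

```python
import math

def outphasing_phases(a, phi):
    """Split normalized amplitude a in [0, 1] and phase phi into the two
    constant-envelope phases phi1 = phi + theta and phi2 = phi - theta."""
    theta = math.acos(a)
    return phi + theta, phi - theta

def recombine(phi1, phi2, wc_t):
    """Sum of the two half-amplitude constant-envelope carriers."""
    return 0.5 * math.cos(wc_t + phi1) + 0.5 * math.cos(wc_t + phi2)

# Arbitrary example values (assumed for illustration only).
a, phi, wc_t = 0.6, 0.8, 2.0
phi1, phi2 = outphasing_phases(a, phi)
v = recombine(phi1, phi2, wc_t)
print(v, a * math.cos(wc_t + phi))   # both values agree
```

The recombined signal equals the polar form A(t)cos(ω_c t + φ(t)), so the amplitude information is carried entirely by the phase difference between the two branches.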

D. Multilevel Outphasing Modulation
To decrease the signal dynamics in phase and increase the power efficiency in the outphasing architecture, which is related to the fundamental constraints of the RF components of a transmitter, a multilevel outphasing architecture has been proposed [24], [25], [26]. In this case, the power efficiency is increased using discrete amplitude levels A_MOP(t), and the output signal is expressed as
$$V(t) = \frac{A_{MOP}(t)}{A_{max}}\big[S_1(t) + S_2(t)\big] \tag{9}$$
where A_MOP(t) represents equally spaced discrete amplitude levels and is defined by
$$A_{MOP}(t) = \lceil A(t)\,A_{max} \rceil \tag{11}$$
where A_max is the maximum of the discrete amplitude levels, and the outphasing angle θ(t) in this case is described as
$$\theta(t) = \cos^{-1}\!\left(\frac{A(t)\,A_{max}}{A_{MOP}(t)}\right)$$
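A minimal sketch of the multilevel split follows. It assumes a normalized amplitude in (0, 1] and leaves the zero-amplitude case unhandled, because the ceiling would then select level 0. The function name and the value of A_max are hypothetical.

```python
import math

def multilevel_outphasing(a, a_max):
    """Return (a_mop, theta) for a normalized amplitude a in (0, 1].

    a_mop is the discrete amplitude level ceil(a * a_max) and theta is the
    remaining outphasing angle. a == 0 is not handled in this sketch.
    """
    a_mop = math.ceil(a * a_max)
    theta = math.acos(a * a_max / a_mop)
    return a_mop, theta

# Arbitrary example values (assumed for illustration only).
a, a_max = 0.6, 4
a_mop, theta = multilevel_outphasing(a, a_max)
print(a_mop, theta, math.acos(a))   # theta is smaller than the plain outphasing angle
```

For this example the angle drops from about 0.93 rad with plain outphasing to about 0.64 rad, which illustrates the reduced phase dynamics that motivate the multilevel scheme.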

E. Emerging Transmitter Architectures
In addition to the four classic transmitter architectures described above, which are supported by the reconfigurable DSP hardware generator proposed in this work, there are other emerging transmitter architectures of interest, which are briefly described below. The multiphase architecture aims to increase the power efficiency of the PA by reducing the spacing between adjacent LO phases to less than π/2 [27], [28]. The hybrid polar I/Q architecture, on the other hand, benefits from the superior RF-DAC drain efficiency of the digital polar architecture and the relaxed tuning range of the phase modulator due to the matched I/Q components of the constrained phase [29]. The RF-PWM architectures benefit from amplitude-based pulsewidth modulation. However, generating a narrow pulse at the RF front-end becomes difficult [30]. Finally, RF-QAM uses power amplification of the quadrature phase shift keying (QPSK) signal directly in the RF domain before combining it to facilitate linearity of the PA [31]. The hardware for these emerging architectures can be supported by Chisel-based generators like the one proposed, but they are not covered in this work.

III. RECONFIGURABLE SIGNAL PROCESSOR
This section describes the hardware implementation of the reconfigurable signal processing unit that supports the Cartesian, polar, outphasing, and multilevel outphasing modulation schemes, as well as their respective architectural features. Fig. 3 shows the data path of the input BB signals (I(t) and Q(t)) through the two main components of the implemented transmitter DSP hardware, namely, the interpolation filters that support sample rate conversion (SRC) ratios of up to 128 and the signal component separator (SCS) that supports signal conversion for multiple modulation schemes. The proposed DSP hardware includes a unified control bus to control various operating parameters of the internal signals within the modules and a configurable clock divider to provide divided clocks from an f_Clk = 4 GHz clock input. Table I shows the configurable clock divider output for n = 2 and k = 4 for different modules of the DSP hardware. The programmable interpolator chain, together with the reconfigurable hardware generator for SCS, acts as an enabler for the next-generation linear amplification using nonlinear component (LINC) transmitters [21].

A. Interpolation Filters
For systems with nonlinear signal transformations, i.e., polar and outphasing systems, the bandwidth of the BB signal increases to five to ten times the bandwidth of the Cartesian signal, which is often referred to as bandwidth expansion. Therefore, the BB signal must be interpolated as shown in Fig. 1 to account for the increased bandwidth and avoid aliasing of the images in the RF components of the transmitter [32]. In this work, we determine the interpolation factor required for a given carrier modulation based on the EVM specified in the 5G NR technical data sheet [1] for the respective modulation techniques, while allowing enough margin to cover the performance degradation expected due to impairments and nonidealities in the analog/RF circuits. The calculations are performed by means of numerical simulations using the developed system model.
TABLE I. OPERATIONAL FREQUENCY OF DSP HARDWARE MODULES FOR n = 2 AND k = 4
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
Interpolation of a signal can be performed in either the frequency domain or the time domain. Interpolation in the frequency domain causes additional latency compared with interpolation in the time domain [33]. In the time-domain SRC, the input data are upsampled and passed to a digital low-pass filter. Interpolation of a signal by an integer factor L in time is achieved by inserting L − 1 zeros between successive values of the signal. To eliminate the images of the BB signal at π/L intervals resulting from the insertion of (L − 1) zeros between successive samples, a low-pass filter with a frequency response of H_L(ω_y) is required, as follows:
$$H_L(\omega_y) = \begin{cases} C, & 0 \le |\omega_y| \le \pi/L \\ 0, & \text{otherwise} \end{cases}$$
where C is a scaling factor to normalize the output signal and is equal to L, ω_y = ω_bb/L is the interpolated angular frequency, and ω_bb is the angular frequency of the input BB signal.
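The zero-insertion and filtering steps can be sketched in a few lines. This is an illustrative example only: the triangular (linear-interpolation) kernel is a deliberately crude stand-in for the low-pass filter, chosen because its DC gain equals L, and all function names are hypothetical.

```python
def zero_stuff(x, L):
    """Insert L - 1 zeros after every input sample."""
    out = []
    for s in x:
        out.append(s)
        out.extend([0.0] * (L - 1))
    return out

def triangular_lowpass(L):
    """Linear-interpolation kernel of length 2L - 1 with DC gain C = L."""
    return [1.0 - abs(n - (L - 1)) / L for n in range(2 * L - 1)]

def fir(x, h):
    """Full convolution of signal x with FIR coefficients h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y

def interpolate(x, L):
    """Upsample x by L, then remove the images with the low-pass kernel."""
    return fir(zero_stuff(x, L), triangular_lowpass(L))

print(interpolate([0, 2, 4], 2))   # the ramp is filled in, delayed by one sample
```

A practical filter would need far sharper image rejection than this kernel provides, which is the role of the HBF and CIC stages described next.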
To meet the ACLR and EVM requirements for different BB modulations in the 5G NR standard [1], configurable SRC ratios are required to achieve multiple interpolation factors. The minimum integer oversampling ratio that can be achieved in digital circuits is 2. To efficiently enable multiple interpolation ratios, a configurable interpolator chain consisting of three half-band filters (HBFs) followed by a cascaded integrator-comb (CIC) filter is used in this work. The three cascaded HBFs provide the necessary oversampling ratio for the CIC filter to avoid the passband droop within the CIC while minimizing the computational cost in terms of area and power consumption, since the HBFs require significantly fewer multipliers than a regular FIR filter with similar specifications. In this work, a configurable deserializer is designed within the interpolator to obtain the interpolated output from different cascaded stages of the interpolation chain and thus different interpolation factors, namely, 2, 4, 8, 16, 24, ..., 128. Fig. 4 shows the frequency responses of the three cascaded HBFs and the CIC filter used in this work. The first HBF has a cutoff frequency at f_s/8, while the second and third HBFs have cutoff frequencies at f_s/4 and f_s/2, respectively. Here, f_s is the maximum interpolated sampling frequency of the signal, which in this work is 4 GHz. The cutoff frequency of the first HBF at f_s/8 eliminates the need for a CIC droop compensation filter, allowing us to implement an efficient configurable interpolation chain without an additional FIR droop compensation filter running at high frequency.
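The list of interpolation factors can be reproduced with a small sketch under one stated assumption: factors up to 8 are taken from the HBF stages alone, and larger factors are the full HBF factor of 8 multiplied by an integer CIC ratio from 2 to 16. This split is inferred from the sequence 2, 4, 8, 16, 24, ..., 128 and may differ from the actual configuration logic; the function name and its parameters are hypothetical.

```python
def supported_factors(n_hbf=3, cic_max=16):
    """Enumerate interpolation factors of an HBF chain followed by a CIC.

    Assumption: bypassing the CIC gives 2, 4, ..., 2**n_hbf, and enabling it
    multiplies the full HBF factor by an integer CIC ratio of 2..cic_max.
    """
    hbf_only = [2 ** s for s in range(1, n_hbf + 1)]
    with_cic = [2 ** n_hbf * r for r in range(2, cic_max + 1)]
    return hbf_only + with_cic

factors = supported_factors()
print(factors)
# Input sample rate needed to reach a 4-GHz output for each factor.
print([4e9 / f for f in factors])
```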
1) Half-Band FIR Filter Design: Linear-phase HBFs are used for interpolation in multirate filter applications [34], [35], [36]. To achieve the oversampling ratio required by the CIC filter, this work uses three cascaded HBFs, each with an interpolation factor of 2 and increasing attenuation stop bands, for efficient implementation. In this case, the polyphase implementation of the odd symmetric HBF of type-II is realized for high area and power efficiency [37]. Fig. 3 shows the functional block diagram of the interpolator, where the output is designed to bypass the different stages of the HBFs depending on the required interpolation factor.
To increase power efficiency, the polyphase realization of FIR filters is implemented, as shown in the circuit diagram in Fig. 5, which allows us to operate the FIR filters at a frequency that matches the input sampling rate. In this work, 60-dB suppression is achieved for images recurring at π/16 in the frequency domain. Because of the symmetry of HBFs around their central coefficient, the number of multipliers can be halved using a direct transpose FIR filter structure. Similarly, the other HBFs (H_hbf1(z) and H_hbf2(z)) were designed using the principle of maximum efficiency for power consumption and area. The total number of interpolation coefficients required and the number of multipliers needed in these HBFs are listed in Table II.
2) Cascaded Integrator-Comb Filter Design: CIC filters are computationally efficient linear-phase low-pass filters and are often used in interpolation structures [38], [39]. To achieve attenuation of the images below 60 dB [40] and a balanced implementation between power consumption, speed, and area, a CIC filter of order N = 3 is used in this work. The transfer function of the implemented filter is expressed as
$$H(z) = \left(\frac{1 - z^{-L}}{1 - z^{-1}}\right)^{N}$$
As a third-order filter, the CIC has three stages of cascaded comb filters clocked at f_s/L, where L is the interpolation factor, followed by three stages of integrators running at a frequency of f_s = 4 GHz. Since the integrators run near the operational frequency of the digital design, the integrators can be unrolled by an integer factor k, allowing the integrators to run at a frequency of f_s/k while meeting the timing requirements of the circuit design. However, increasing the unrolling factor k increases the number of parallel paths and the critical path of the circuit operating at lower frequencies.
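The comb, upsample, and integrate structure of an interpolating CIC filter can be sketched in integer arithmetic as follows. The sketch assumes a differential delay of one sample per comb stage and ignores word-length growth and clocking; the function names are hypothetical.

```python
def comb(x):
    """First-order comb at the low rate: y[n] = x[n] - x[n-1]."""
    prev, out = 0, []
    for s in x:
        out.append(s - prev)
        prev = s
    return out

def integrate(x):
    """First-order integrator at the high rate: y[n] = y[n-1] + x[n]."""
    acc, out = 0, []
    for s in x:
        acc += s
        out.append(acc)
    return out

def cic_interpolate(x, L, N=3):
    """N comb stages, zero-stuffing by L, then N integrator stages."""
    for _ in range(N):
        x = comb(x)
    up = []
    for s in x:
        up.extend([s] + [0] * (L - 1))
    for _ in range(N):
        up = integrate(up)
    return up

# Impulse response for L = 2, N = 3: three cascaded length-2 boxcars.
print(cic_interpolate([1, 0, 0, 0], 2))
```

The impulse response matches the coefficients of (1 + z⁻¹)³, which is the transfer function ((1 − z⁻ᴸ)/(1 − z⁻¹))ᴺ evaluated for L = 2 and N = 3.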
The number of parallel adders and sequential elements also increases as the value of k increases to compensate for the larger critical path, which may result in slightly higher area and power consumption. In this work, an unrolling factor of k = 4 is used to account for the tradeoff between area and power, as shown in the schematic in Fig. 6. For an integrator y[n] = y[n − 1] + x[n], unrolling by an integer factor of k can be done as follows:
$$y[kn + i] = y[kn - 1] + \sum_{j=0}^{i} x[kn + j], \qquad i = 0, 1, \ldots, k - 1$$
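The equivalence between a sample-by-sample integrator and a block-wise unrolled one can be illustrated with a short sketch. It models only the arithmetic: in hardware the k partial sums of a block would be formed by parallel adders within one slow clock cycle, whereas this sketch computes them in a loop. The function names are hypothetical.

```python
def integrate_serial(x):
    """Reference integrator processing one sample per step."""
    acc, out = 0, []
    for s in x:
        acc += s
        out.append(acc)
    return out

def integrate_unrolled(x, k=4):
    """Integrator processing k samples per step.

    Each output of a block is the state carried over from the previous
    block plus a partial sum of the current block's inputs.
    """
    assert len(x) % k == 0, "sketch assumes whole blocks of k samples"
    state, out = 0, []
    for base in range(0, len(x), k):
        partial = 0
        for s in x[base:base + k]:
            partial += s
            out.append(state + partial)
        state += partial          # single state update per block
    return out

x = list(range(1, 9))
print(integrate_unrolled(x, 4))
```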

B. Signal Component Separator
An SCS block is required to perform complex mathematical operations to compute phase and amplitude signals from the interpolated I(t) and Q(t) signals for multiple modulation schemes. The SCS is implemented efficiently with a CORDIC module [41], which is used to calculate the phase φ(t) and the amplitude A(t) for polar transmitters [see (4) and (3)], and a CORDIC-based module, which is used to compute the outphasing angle θ(t) for outphasing transmitters [see (8)].
To alleviate the scaling problem of the rotation vector when computing the inverse cosine for outphasing transmitters, the double iteration algorithm is implemented in this work due to its improved accuracy [42]. The algorithm is given by
$$\begin{bmatrix} x_{n+1} \\ y_{n+1} \end{bmatrix} = \begin{bmatrix} 1 & -d_n 2^{-n} \\ d_n 2^{-n} & 1 \end{bmatrix}^{2} \begin{bmatrix} x_n \\ y_n \end{bmatrix}$$
$$\theta_{n+1} = \theta_n + 2 d_n \tan^{-1}\big(2^{-n}\big)$$
$$t_{n+1} = t_n + t_n 2^{-2n}$$
As n → ∞, θ_n → cos⁻¹ t, and θ_0 = 0, x_0 = 1, y_0 = 0, t_0 = t are the initial values for the angle, the coordinates of the rotating vector, and the input to the inverse cosine function, respectively. Also, t ∈ [−1, 1] and d_n is defined as
$$d_n = \begin{cases} \operatorname{sign}(y_n), & x_n \ge t_n \\ -\operatorname{sign}(y_n), & \text{otherwise} \end{cases}$$
System-level simulations showed that n = 15 iterations are sufficient to minimize the error in the computation of the inverse tangent and cosine functions. The output phase signals φ, φ_1 = φ + θ, and φ_2 = φ − θ, as shown in (4) and (5), must be rotated from [−π, π] to [0, 2π] to convert them into 7-bit phase signals. For this purpose, the sign of the phase signals is continuously monitored. If it is negative, 2π is added to the resulting phase, since cos(2π + θ) = cos θ. In the case of multilevel outphasing modulation, the SCS requires additional hardware to calculate the different amplitude levels A_MOP(t), as shown in (9). In the case of nonlinear modulation schemes, namely, polar, outphasing, and multilevel outphasing, the pipeline delay is balanced between the amplitude paths A(t), A_MOP(t) and the phase paths φ, φ_1, φ_2 to avoid any delay mismatch between the amplitude and phase signals.
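A floating-point sketch of a double-iteration CORDIC inverse cosine is given below for orientation. It follows the commonly published form of the algorithm and is not the authors' fixed-point pipeline; the function name is hypothetical, sign(0) is taken as +1, and for inputs close to ±1 this simple form can settle on a mirrored angle, which is not handled here.

```python
import math

def arccos_double_iteration(t, n_iter=15):
    """Approximate acos(t) with double-iteration CORDIC micro-rotations."""
    theta, x, y = 0.0, 1.0, 0.0
    for n in range(n_iter):
        sign_y = 1.0 if y >= 0.0 else -1.0
        d = sign_y if x >= t else -sign_y
        p = d * 2.0 ** -n
        for _ in range(2):                  # apply the same micro-rotation twice
            x, y = x - p * y, y + p * x
        theta += 2.0 * d * math.atan(2.0 ** -n)
        t += t * 2.0 ** (-2 * n)            # scale the target instead of (x, y)
    return theta

for value in (0.5, 0.0, -0.5):
    print(value, arccos_double_iteration(value), math.acos(value))
```

Scaling the comparison target t by the same gain that the double rotation applies to the vector is what removes the need for a separate gain correction of x and y.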
To increase the throughput of the SCS, this work implements pipelined hardware with a series of shift and add operations for the CORDIC algorithms instead of feedback structures, at the cost of increased hardware resources. However, like the integrators in the CIC filter, the SCS needs to be time-interleaved to operate at a high clock frequency. In this work, we time-interleave the SCS by the same factor k by which the integrators in the CIC filter are unrolled. Fig. 7 shows the configurable architecture of the time-interleaved SCS, where each SCS module runs at a frequency of f_s/k, with the output data sampled by a multiplexer at a frequency of f_s.

IV. HARDWARE GENERATOR AND VERIFICATION
The DSP hardware generator for multiple modulations was developed using Chisel [43] in the TheSyDeKick (TheSDK) environment.

A. TheSDK Environment
TheSDK is a Python-based simulation, design, and test framework for integrated circuits [44]. The Python-based environment benefits from the high-level object-oriented programming language for system modeling. In this work, TheSDK is used to develop subentity classes for the transmitter that houses the reconfigurable DSP, enabling modular and closed-loop design exploration and optimization with real hardware. TheSDK aims to model hardware according to the "test first" design principle [44]. The design phase begins with a high-level behavioral model and system testbench for the design under test. Since the environment aims at modular hardware development, each module can be used as an entity class for both the Python system model/circuit system model (py) and the hardware models (rtl), as shown in Fig. 8. This enables connection between different components with pointers and also between the signal source and the signal analyzer. In our work, TheSDK is used to develop the hardware model of a reconfigurable DSP in a transmitter chain.

B. Reconfigurable Hardware Generator
Fig. 9 shows the block diagram of our transmitter chip model in the TheSDK environment. Different parameters (operation and generation) for both the hardware model and the circuit system model in each module of the transmitter can be used to evaluate the performance of the reconfigurable DSP. In addition to system evaluation, each subsystem can be verified individually, and the performance degradation of each submodule can be measured along the data path and compared with the circuit system model. The reconfigurable DSP hardware was developed using Chisel [43], which uses Scala to support highly parameterized circuit generators, a capability that traditional HDLs lack. In our implementation of the reconfigurable DSP hardware generator, there are two sets of configurable parameters, as shown in Fig. 9. One set includes the hardware operation parameters, which can be reconfigured at run time on the synthesized hardware during the operation of the transmitter, whereas the other set includes the hardware generation parameters, which are used to reconfigure the DSP hardware generator to generate hardware of a particular architecture. As an example, we illustrate the generation capability of one of the highly parameterized subsystems, the SCS. In Chisel, the basic configurable generator parameters for the SCS subsystem are the modulation type (SCSType), the output resolution (SCSOutRes), and the number of parallelizations of the SCS module (SCSPar). The main class for the SCS generator component is defined with these configurable parameters, along with the input resolution (SCSInRes) and the number of amplitude levels (SCSAmpLvl). These parameters are then passed to various classes within the SCS class for hardware generation. The variable scs_single is generated in the clock domain of f_s/k, as can be seen in Fig. 7, with an entity core for different modulation techniques, namely, Cartesian, polar, outphasing, and multilevel outphasing, depending on the value of scs_type. For each modulation scheme, the entity core (scs_single) is pipelined to balance any delay mismatch between the amplitude and phase paths. However, outphasing and multilevel outphasing modulation share the same entity core in this work to enable operation in LINC transmitters.
The second configurable generator parameter, scs_outRes, sets the quantization step used when the output phase signals are mapped from [−π, π] to [0, 2π] for the polar, outphasing, and multilevel outphasing techniques.
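The phase mapping and quantization step can be sketched as follows. This is a behavioral illustration only: rounding to the nearest code and wrapping the top code back to zero are assumptions, and the function name and the `bits` parameter (standing in for the output resolution) are hypothetical.

```python
import math

def quantize_phase(phi, bits=7):
    """Map phi in [-pi, pi] to an unsigned code covering [0, 2*pi)."""
    if phi < 0.0:
        phi += 2.0 * math.pi              # cos(2*pi + phi) == cos(phi)
    step = 2.0 * math.pi / (1 << bits)    # quantization step for `bits` bits
    return round(phi / step) % (1 << bits)

for angle in (0.0, math.pi / 2, math.pi, -math.pi / 2):
    print(angle, quantize_phase(angle))
```

With 7 bits the step is 2π/128, so a negative quarter turn maps to code 96 and a half turn maps to code 64.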
Finally, the configurable parameter scs_par sets the number of registers within the deserializer in the interpolator module and creates multiple instances of the SCS module to be parallelized. The parallelization of the SCS is performed in this work as follows.
    val scs = (0 until scs_par).map(x => scs_single.io).toList
The variable scs is generated as a list of the module scs_single, and the number of instances in the list is defined by the parameter scs_par.

C. Simulation Results
To explore the configurability and modularity of TheSDK, the hardware model (rtl) for the reconfigurable DSP is simulated as part of the transmitter chip model for different configuration and generation parameters. The performance of the system for multiple modulation schemes is evaluated using the EVM and the ACLR with 7-bit output amplitude and phase signals. Fig. 10 shows the performance of the hardware DSP for multiple carrier modulations generated using the hardware generation parameters of the reconfigurable DSP HW Gen. and for multiple BB subcarrier modulations, namely, BPSK, 4-QAM, 16-QAM, 64-QAM, and 256-QAM, configured using the hardware operation parameters of the BB signal generator. In Fig. 11, the performance of the hardware DSP is shown for multiple carrier modulations generated using the hardware generation parameters of the reconfigurable DSP HW Gen. The performance of the DSP hardware is also evaluated for multiple carrier bandwidths, 10 to 100 MHz in 5G FR1 and 50 to 400 MHz in the FR2 frequency range, while the other configuration parameters remain unchanged across different cases. In Fig. 12, on the other hand, only the interpolator chain of the reconfigurable DSP is evaluated as a hardware model (rtl) for different BB modulation schemes of the BB signal generator, while the other subsystems of the DSP are taken from the circuit system model (py). It can also be seen from Fig. 12 that the degradation in terms of EVM and ACLR of the interpolator hardware unit is the same for different carrier modulations compared with the circuit system model. The difference in the values of EVM and ACLR is due to the quantization error between the finite-precision interpolator hardware (rtl) and the Python-implemented ideal behavior model of the transmitter. The interpolator hardware uses different fixed-point resolutions at different stages of the filter for maximum area efficiency, while the Python model uses floating-point resolution.
Fig. 10. EVM and ACLR performance of reconfigurable DSP HW Gen. with BB modulation sweep.

V. SYNTHESIZED PERFORMANCE
The fixed-point implementation of the reconfigurable DSP hardware was synthesized in a 5-nm technology node and simulated using TheSDK. The master clock frequency of the entire digital signal processor is 4 GHz and is divided internally using a programmable clock divider for multiple clock domains. The performance of the synthesized system for multiple modulation schemes is also evaluated using the EVM and ACLR. The hardware resource utilization and power consumption of different hardware architectures for multiple modulations generated by the proposed hardware generator are also discussed in this section.

A. Number Representation and Reduction in Switching Activity
The normalized BB signal corresponding to the 5G NR communication standards was generated using the BB signal generator. To feed the normalized in-phase (I) and quadrature (Q) signals into the hardware as 16-bit input signals, we scale the input data to fit a signed fractional Q format [45], as mentioned in our earlier work [46]. Fig. 13(a) shows the Q(1.14) format used in this work, where 14 bits are reserved for the fractional part of the input data, along with 1 bit for the integer part and 1 sign bit. In our scaling method of Q(1.14) representation, an additional 10% reduction in the switching activity of the sequential elements of the digital design is observed compared with a signed integer Q(15.0) representation. This choice trades a slight reduction in precision for a reduction in power consumption by reducing switching activity.
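The Q(1.14) scaling can be illustrated with a minimal conversion sketch. Rounding to nearest and saturating to the 16-bit range are assumptions of this example rather than documented behavior of the hardware, and the function names are hypothetical.

```python
FRACTION_BITS = 14   # Q(1.14): 1 sign bit, 1 integer bit, 14 fractional bits

def to_q1_14(value):
    """Quantize a real value to a 16-bit signed Q(1.14) integer code."""
    code = round(value * (1 << FRACTION_BITS))
    return max(-(1 << 15), min((1 << 15) - 1, code))   # saturate to 16 bits

def from_q1_14(code):
    """Convert a Q(1.14) integer code back to a real value."""
    return code / (1 << FRACTION_BITS)

for sample in (1.0, -0.5, 0.25, 3.0):
    print(sample, to_q1_14(sample), from_q1_14(to_q1_14(sample)))
```

A normalized sample of 1.0 maps to code 16384, so signals confined to [−1, 1] occupy only part of the representable range of [−2, 2).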
The module within the DSP hardware that contributes most to the reduction in switching activity is the SCS, as can be seen in Fig. 13(b). The SCS module accounts for more than 80% of the switching activity in the hardware due to the presence of multiple time-interleaved instances. Fig. 13(c) shows the switching efficiency of the different submodules for different modulations in the reconfigurable DSP hardware. It can be seen that most of the switching efficiency comes from the SCS module in the outphasing architecture. The switching efficiency in the outphasing architecture comes mainly from the CORDIC-based module, which calculates the outphasing angles from a single amplitude level instead of multiple amplitude levels; the latter requires more switching activity in multilevel outphasing (multi_op) compared with the other submodules.

B. Summary of Post-Synthesis Performance
EVM and ACLR are evaluated for the synthesized reconfigurable digital signal processor using a 200-MHz 64-QAM 5G NR signal as the BB input. Table III shows the comparison between the ideal Python system model (S/W) and the synthesized hardware model (H/W) of the DSP for 7-bit phase and amplitude signals, for different modulation schemes and different interpolation factors, namely, 16, 8, and 4. The difference between the S/W and the H/W values of the EVM is due to the quantization error of the finite-precision hardware. However, increasing the interpolation factor for different BB modulations results in a lower EVM value because of a better reconstruction of the BB signal, achieved mainly through higher sample rates at the same signal precision.
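For reference, an RMS EVM figure of the kind compared in such tables can be computed as sketched below. This uses the common definition of error power normalized to reference power; the exact measurement procedure used for the reported values (equalization, averaging windows) is not reproduced, and the function name and sample data are hypothetical.

```python
import math

def evm_percent(reference, measured):
    """RMS error vector magnitude in percent, normalized to reference power."""
    err = sum(abs(m - r) ** 2 for r, m in zip(reference, measured))
    ref = sum(abs(r) ** 2 for r in reference)
    return 100.0 * math.sqrt(err / ref)

# Two ideal constellation points and a slightly distorted measurement.
ideal = [1 + 1j, -1 - 1j]
received = [1.1 + 1j, -1 - 1j]
print(evm_percent(ideal, received))   # about 5 percent
```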
Similar to the EVM measurement, the lower ACLR (ACLR_1) and the higher ACLR (ACLR_2) relative to the center frequency are also compared for both the Python system model (S/W) and the synthesized hardware model (H/W) of the proposed reconfigurable DSP hardware for multiple modulation architectures in Table IV. As with the EVM measurement, the difference between the software-based and hardware-based values of the ACLR is mainly due to the quantization error from finite-precision hardware. However, the difference is more pronounced for Cartesian modulation because, in the hardware implementation, each stage of the interpolation filters uses a different fixed-point resolution for maximum area efficiency, while the software model uses floating-point precision.

C. Area and Power
Area and power are among the most important design parameters in digital design. To limit power consumption at a high operating frequency, we optimized the interpolation filters and SCS to accommodate fewer sequential units. However, to meet the timing constraints in high-frequency operation, part of the DSP processor, i.e., the SCS, had to be replicated multiple times to take advantage of time interleaving, resulting in increased hardware resource requirements.
The implementation of the proposed modulation architectures generated by the reconfigurable DSP hardware generator was synthesized in a 5-nm technology node. The results are summarized in Table V. In this work, the design of the interpolator is the same for different transmitter architectures.

TABLE VI DSP HARDWARE PERFORMANCE WITH PRIOR ARTS
The difference in the interpolator area values in Table V comes from variations in the optimization process of the circuit design tool for different modulation schemes. For the SCS module, on the other hand, the difference comes from the signal processing requirements for different carrier modulations. Table V also shows that the area for outphasing and multimodulation is four times that for polar, which is due to the time-interleaved architecture needed to meet the circuit design timing requirements. Since power consumption is directly proportional to area at constant clock frequency, the Cartesian architecture consumes the least power due to the absence of coordinate conversion hardware. In contrast, the multimodulation architecture, which supports outphasing, multilevel outphasing, and Cartesian modulation techniques, consumes the most power.

D. Hardware Performance With Prior Art
A direct comparison of the proposed configurable hardware, which supports multiple modulation schemes and architectures and is synthesized in a 5-nm CMOS technology, with other works from the literature is challenging because those works vary in the architectures supported and in technology. The comparison is further complicated by the limited information in other works on the power-to-performance parameters specific to the digital front-end of the different transmitter architectures. Nevertheless, a comparison of the simulated performance of the proposed hardware with other relevant state-of-the-art digital front-end systems (BB only) is provided in Table VI.
The area of the multimodulation DSP processor synthesized in the 5-nm technology node in this work is about 90.96% smaller than in our previous work [46], where comparable hardware operating at similar speeds was synthesized in a 22-nm CMOS process. The hardware in 5 nm also consumes about 73% less power than the implementation in 22 nm. These improvements can mostly be attributed to the benefits of technology scaling, which further emphasizes the need for a highly reconfigurable hardware generator to enable low-cost porting of hardware across different technologies. The proposed digital front-end hardware for different transmitter architectures, generated by a reconfigurable DSP hardware generator and synthesized in a state-of-the-art CMOS process, demonstrates the efficiency and flexibility that can be achieved when designing a high-performance, low-power digital signal processor for next-generation communication systems.

VI. CONCLUSION
This article proposes a universal design method for a reconfigurable signal processing hardware generator for multiple transmitter architectures. The DSP hardware generator produces a low-power, high-speed reconfigurable digital signal processor and flexibly supports multiple modulation schemes. The generated hardware for the reconfigurable DSP includes an interpolation chain for upsampling the BB signal and an SCS for nonlinear conversions of the BB signal for polar, outphasing, and multilevel outphasing modulation schemes. Multiple architectures for the reconfigurable DSP are synthesized in a 5-nm technology node. The generated hardware for multimodulation DSP achieves an EVM of 0.78% and an ACLR of −48 dB for multilevel outphasing modulation with a 200-MHz 5G NR BB signal. The proposed hardware for multimodulation consumes about 73% less power than our previous implementation [46] synthesized at the 22-nm technology node at an output sampling rate of 4 GHz. The proposed hardware generator not only targets multiple transmitter architectures but can also be configured for multimodulation output at an output sampling rate of up to 128 times the BB frequency. The reconfigurable DSP hardware generator presented in this work enables efficient development of next-generation millimeter-wave transmitters for 5G and beyond communications systems.

Fig. 3. Functional block diagram of the proposed DSP hardware.
is the transfer function of the third HBF and $b_0, b_1, \ldots, b_6$ are the coefficients of the FIR filter, with $b_1 = b_5 = 0$, $b_0 = b_6$, $b_2 = b_4$, and $b_3 = 1$. The odd-symmetric nature of the half-band FIR filters can reduce complexity, as shown in (14) and as illustrated by the circuit diagram in Fig. 5.
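To illustrate how these coefficient constraints reduce the multiplier count, the following Python sketch implements a 7-tap half-band interpolator by 2 in polyphase form and checks it against direct convolution of the zero-stuffed input. The function names and the coefficient values $b_0 = -1/16$ and $b_2 = 9/16$ are assumptions chosen for illustration. They are not the coefficients used in the paper, and the sketch is not a model of the circuit in Fig. 5.

```python
def halfband_interp2(x, b0, b2):
    """Interpolate x by 2 with the 7-tap half-band FIR [b0, 0, b2, 1, b2, 0, b0].

    Symmetry (b0 = b6, b2 = b4) lets paired inputs be added before multiplying,
    and the zero taps (b1 = b5 = 0) with b3 = 1 reduce the odd output phase to a
    pure delay, so only two multiplications are needed per input sample.
    """
    xp = [0.0, 0.0, 0.0] + list(x)  # zero initial filter state
    y = []
    for n in range(len(x)):
        i = n + 3
        even = b0 * (xp[i] + xp[i - 3]) + b2 * (xp[i - 1] + xp[i - 2])
        odd = xp[i - 1]  # b3 = 1: the delayed input passes straight through
        y.extend([even, odd])
    return y

def direct_interp2(x, taps):
    """Reference: zero-stuff by 2, then convolve with all taps (truncated)."""
    u = []
    for s in x:
        u.extend([s, 0.0])
    return [sum(taps[k] * u[m - k] for k in range(len(taps)) if m - k >= 0)
            for m in range(len(u))]

# Assumed example coefficients and input samples.
b0, b2 = -1.0 / 16, 9.0 / 16
x = [1.0, 2.0, -1.0, 0.5, 3.0]
fast = halfband_interp2(x, b0, b2)
ref = direct_interp2(x, [b0, 0.0, b2, 1.0, b2, 0.0, b0])
```

The direct form evaluates seven products per output sample, while the polyphase form needs two per input sample. This is the kind of saving the symmetric, zero-valued taps make possible.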
    object SCSParams {
      val SCSType = 3
      val SCSOutRes = 7
      val SCSPar = 4
    }

where SCSType = 3 represents multilevel outphasing modulation, SCSOutRes = 7 denotes the output resolution used in this work, and SCSPar = 4 defines the number of time-interleaved instances for the SCS module.
    class SCS(
        scs_type: Int = SCSParams.SCSType,
        scs_inRes: Int = SCSInRes,
        scs_outRes: Int = SCSParams.SCSOutRes,
        scs_ampRes: Int = SCSAmpLvl,
        scs_par: Int = SCSParams.SCSPar
    ) extends Module {
      ...
    }

The first programmable parameter, scs_type, is enumerated for different modulation techniques, and the pseudostate machine for generating different architectures for different modulation techniques is shown in Algorithm 1.

Algorithm 1 Pseudostate Machine for the Enumerated Values of Modulation Type (SCSType)
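As a rough illustration of the kind of nonlinear conversion an SCS performs, the Python sketch below applies the textbook outphasing decomposition to one Cartesian sample and quantizes the two phases to `out_res` bits (7 bits, matching SCSOutRes above). It covers plain outphasing only, not the multilevel variant, and it is not the generated hardware or the authors' algorithm. The function names, the `a_max` parameter, and the rounding scheme are assumptions.

```python
import cmath
import math

def outphasing_phases(iq, a_max=1.0, out_res=7):
    """Split a complex sample into two quantized constant-envelope phase codes.

    Textbook decomposition: iq = (a_max / 2) * (exp(j*(phi + theta)) + exp(j*(phi - theta)))
    with theta = acos(|iq| / a_max). Returns two integer codes in [0, 2**out_res).
    """
    amp = min(abs(iq), a_max)       # clip to the maximum representable amplitude
    phi = cmath.phase(iq)           # polar angle of the input sample
    theta = math.acos(amp / a_max)  # outphasing angle
    levels = 1 << out_res
    step = 2 * math.pi / levels
    quantize = lambda p: round(p / step) % levels  # wrap onto the phase circle
    return quantize(phi + theta), quantize(phi - theta)

def reconstruct(c1, c2, a_max=1.0, out_res=7):
    """Recombine the two quantized phase codes into a complex sample."""
    step = 2 * math.pi / (1 << out_res)
    return 0.5 * a_max * (cmath.exp(1j * c1 * step) + cmath.exp(1j * c2 * step))

# Assumed example sample with amplitude 0.6 and phase 0.7 rad.
iq = 0.6 * cmath.exp(1j * 0.7)
c1, c2 = outphasing_phases(iq)
approx = reconstruct(c1, c2)
```

With 7-bit phases each code is off by at most half a phase step, so `approx` differs from `iq` by a small residual. That residual is the kind of quantization error that appears in the hardware EVM and ACLR figures.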

Fig. 12. EVM and ACLR performance comparison between the RTL model and the Python model of the interpolator submodule for different carrier modulations with a BB modulation sweep.

Fig. 13. (a) Representation of the fixed-point data type with separate integer and fractional parts of the input data, (b) switching activity for the reconfigurable hardware generator, and (c) switching reduction due to Q(1.14) of different components compared with Q(15.0) for different modulation techniques.
Agnimesh Ghosh (Student Member, IEEE) received the M.Sc. (Eng.) degree from Lund University, Lund, Sweden, in 2020. He is currently working toward the doctoral degree at the Department of Electronics and Nanoengineering, Aalto University, Aalto, Finland. His main research interests are linked to architectural development and programmatic hardware generation methodologies for high-speed digital signal processing applications in next-generation digital-intensive wideband transmitters.

Andrei Spelman received the B.Sc. (Tech.) and M.Sc. (Tech.) degrees from Aalto University, Aalto, Finland, in 2019 and 2021, respectively, where he is currently working toward the D.Sc. (Tech.) degree at the Department of Electronics and Nanoengineering. His main research interests include digital-intensive transmitters for next-generation wireless communication techniques and, more specifically, digital signal processing for wideband signals.

TABLE II PARAMETERS FOR HALF-BAND INTERPOLATORS