Kevin Chuang, Hossein Yektaii, Noureddine Outaleb, Ahmed Raslan, Shipra Bhal, Peadar Forbes
The ever-increasing demand for global mobile data traffic over time has had an avalanche effect on the deployment of cellular infrastructure as driven by both the rising number of smartphones and subscriptions and an increasing average data volume per subscription. The total mobile data traffic logged 90 exabytes monthly in 2022 and is forecasted to reach 324 exabytes per month in 2028, which equates to a percentage increase of 260% [1]. The data traffic per smartphone is projected to increase more than threefold by 2028, implying that the network infrastructure must evolve proportionally to support it, while at the same time reducing the network operating costs.
In the article published a year ago, radio challenges and design considerations at the architectural level were discussed, comparing and contrasting all of the generations of cellular technology to date as well as the fundamental enabling technology, i.e., software-defined radio (SDR) [2]. Today, however, engineers are also being challenged to address climate change and environmental sustainability, making it vital for the next generation of wireless communication networks to adopt the most energy-efficient technologies. The telecommunications industry currently contributes about 1.6% of total global CO2 emissions, and its total energy consumption is projected to grow by 160% between 2020 and 2030 [3]. Most of the network's power is consumed in the radio access network (RAN), and much of that is consumed in the radio unit (RU) by the power amplifier (PA). On average, the RAN infrastructure consumes more than 73% of the energy used by a communications service provider, as shown in Figure 1; therefore, the cell site represents the low-hanging fruit to target for forward-looking sustainable networks [4]. It is worth noting that, although most of the energy consumption is due to active RAN communications, the RAN still consumes energy in the absence of any data transmission. A study report published by the Next Generation Mobile Networks (NGMN) Alliance shows that 80% of RAN sites carry only approximately 20% of all traffic, which implies that low-traffic and idle sites account for a disproportionately, and potentially significantly, higher share of energy consumption [5]. Future networks need to dynamically minimize energy consumption at different load levels, especially at low-traffic sites.
Figure 1. Energy consumption breakdown of RAN.
The words “energy” and “power” are often used interchangeably in the context of resource consumption. For integrated circuit (IC) designers, power consumption, measured in watts, is often preferred because it describes the physical behavior and the rate at which energy is consumed and can be calculated directly from the voltage and current measurements. On the other hand, wireless engineers tend to use the energy metric because real-time mobile traffic can be sporadic and vary significantly between day and night. Therefore, the notion of time duration is required to determine the amount of energy consumed over different time intervals and load conditions, measured in kilowatt hours. For this reason, cellular network operators typically try to reduce the energy consumption across the nodes by jointly optimizing power and time metrics. Another important metric often referenced in wireless communications is the energy efficiency of the network or energy efficiency per bit delivered. While energy consumption describes the capacity of doing the work, energy efficiency measures the ratio of the total number of packets received at the destination node to the total energy spent by the system to deliver those packets. The lower the energy efficiency of a wireless communication link, the more power is wasted to transmit the information over the air. Therefore, it is imperative to optimize the energy efficiency of RAN by improving the spectral efficiency of the air interface and reducing the power consumption of the individual nodes, including device access to the air interface as well as the fronthaul, midhaul, and backhaul transport networks.
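The distinction between power, energy, and energy efficiency can be made concrete with a short calculation. The sketch below is illustrative only: the hourly power profile and the function names are assumptions introduced here, not measurements or tools from the article.

```python
def energy_kwh(power_w_by_hour):
    """Integrate an hourly power profile (watts) into energy (kilowatt hours)."""
    # Each sample is assumed to hold for exactly one hour.
    return sum(power_w_by_hour) / 1000.0


def bits_per_joule(bits_delivered, energy_kwh_used):
    """Energy efficiency expressed as delivered bits per joule consumed."""
    joules = energy_kwh_used * 3.6e6  # 1 kWh = 3.6 MJ
    return bits_delivered / joules


# Hypothetical site: 8 quiet night hours at 400 W, 16 busy hours at 900 W.
profile_w = [400.0] * 8 + [900.0] * 16
daily_kwh = energy_kwh(profile_w)
```

The same average power can therefore correspond to very different energy bills depending on how long each load condition lasts, which is why operators optimize power and time jointly.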
For the rest of the article, we will discuss the future trends in cellular infrastructure and, more specifically, the emerging frequency bands that give rise to new challenges, the promising PA architectures for high bandwidth and high efficiency operation, and the need for continued innovation in digital radio transceiver systems on chips to overcome these challenges.
As with previous cellular generations, today’s 5G spectrum is divided into multiple frequency bands with the time-division duplexing (TDD) radio becoming the de facto communication method between transmitter (Tx) and receiver (Rx) nodes for the emerging frequency bands. In general, the spectrum is categorized into sub-6-GHz frequency range 1 (FR1) and millimeter-wave (mmWave) frequency range 2 (FR2) for 5G cellular technology. FR2 5G promises impressive data speeds well in excess of several gigabits per second, but it is susceptible to losses because of the line-of-sight obstruction, causing this technology to have difficulty coping with a user’s mobility. On the other hand, sub-6-GHz 5G provides a reliable option for blanket coverage and capacity, forming the essential backbone for improving network speeds and reliability for most consumers.
While the mmWave 5G employs TDD mode, sub-6-GHz base stations historically have adopted the frequency-division duplexing (FDD) mode for RFs between 600 MHz and 2.7 GHz. These FDD radios, also known as macrocell remote radio heads (RRHs), are mostly configured as four transmit and four receive (4T4R) systems and are required to deliver about 2–4 W of RF power per megahertz of transmit bandwidth. Typically, each chain of a 4T4R single-band radio must deliver 40 W to the antenna load for a total radiated power (TRP) of 160 W. The development trend is to integrate multiple FDD band radios into one dual-band or triple-band radio in the 1.8–2.2-GHz range and sub-1-GHz range, respectively. Combining two or three different bands into one radio lowers the cost of deployment by simplifying the radio site, provided that these multiband radios are at least as efficient as the existing single-band radios. The most challenging aspect in the development of multiband radios is the wideband RF PA and the necessary linearization system that will satisfy the requirements of the transmit error vector magnitude (EVM), adjacent channel leakage ratio (ACLR), operating band unwanted emissions (OBUE), spectrum emission mask, and power consumption. In addition, because multiple bands are combined into one common radio, Tx leakage and self-inflicted interference must be addressed to avoid desensitizing the Rx. For instance, a 4T4R dual-band radio covering the Third Generation Partnership Project (3GPP) Band 1 and Band 3 with 365 MHz of instantaneous bandwidth (iBW) must deliver a combined average power of 80 W to each antenna load for a TRP of 320 W, while satisfying the EVM requirement of 3% and OBUE of –22 dBm/MHz at the 1.5-MHz offset from the edge of each component carrier. Assuming a 1.5-dB post-PA loss, the combined RF output power of the four PAs is about 450 W.
At a high efficiency of 45%, the array of PAs would consume 1 kW of power and turn 550 W of that into heat, which must be dissipated by a large heatsink, increasing the size, weight, and cost of the radio head.
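The power budget quoted for the dual-band 4T4R example can be reproduced with a few lines of arithmetic. This is a minimal sketch using only the figures stated in the text (320 W TRP, 1.5-dB post-PA loss, 45% efficiency); the function names are introduced here for illustration.

```python
def db_to_ratio(db):
    """Convert a power ratio in decibels to a linear ratio."""
    return 10.0 ** (db / 10.0)


def pa_power_budget(trp_w, post_pa_loss_db, efficiency):
    """Return (PA RF output, dc input, heat dissipated in the PAs) in watts."""
    rf_out_w = trp_w * db_to_ratio(post_pa_loss_db)  # PAs must overcome the loss
    dc_in_w = rf_out_w / efficiency                  # efficiency = RF out / dc in
    heat_w = dc_in_w - rf_out_w                      # remainder is dissipated
    return rf_out_w, dc_in_w, heat_w


rf_out, dc_in, heat = pa_power_budget(320.0, 1.5, 0.45)
```

The results round to the 450 W, 1 kW, and 550 W quoted in the text. The power lost after the PAs (the difference between the PA output and the 320 W TRP) is additional heat that this sketch does not count.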
The information capacity of a wireless channel is linearly proportional to the available channel bandwidth and number of multiplexed layers and logarithmically proportional to the signal-to-noise ratio. The sub-3-GHz licensed spectrum does not provide sufficient capacity to satisfy the insatiable demand for mobile data traffic. For frequency bands above 3 GHz, TDD has been unanimously adopted for lower cost and better suitability for multiple-input, multiple-output (MIMO) beamforming technologies. That said, the largest sub-6-GHz licensed spectrum allocated by the Federal Communications Commission in the United States to date is known as the C-band, sitting on the midband spectrum of cellular broadband network frequencies between 3.7 and 3.98 GHz. Beamforming technology will become even more instrumental in the higher end of the FR1 spectrum, thanks to massive MIMO radios. To that end, the 3GPP Technical Specification 38.104 released in the second quarter of 2022 has introduced a new frequency band designation, n104, between 6,425 and 7,125 MHz [6], which is expected to be adopted soon in countries and regions across Europe and China. This is part of the larger n96 band between 5,925 and 7,125 MHz that is allocated for shared spectrum channel access in the United States. Furthermore, for base stations operating in band n104, the ACLR requirement has been relaxed to –38 dBc as opposed to –45 dBc in the 4G standards. This gives vendors an opportunity to be creative in bridging the physical and digital worlds using extremely massive MIMO RUs. Presently, 5G networks deploy RUs between 3 and 4 GHz in the 32T32R or 64T64R form factor for a total TRP of 160 W or 320 W, depending on the occupied signal bandwidth. This translates to between 2.5 and 10 W of conducted power at each antenna port. From the sustainability perspective, it is believed that the massive MIMO RUs in the n104 band will be initially deployed at the same sites as the 3–4-GHz radios. 
Given the higher propagation losses in the n104 band, the antenna aperture and TRP must be maintained to achieve coverage similar to that of the 3–4 GHz band. Since the operating frequency doubles and the wavelength is halved, the New Radio (NR) architecture will require four times as many antenna elements as what we have today, resulting in a much lower output power requirement for each PA. Figure 2 summarizes the trend in spectrum as well as the number of antenna elements expected for the emerging spectrum opportunities. This requires continued innovation in fully integrated RU application-specific ICs (ASICs) for a higher level of performance and power efficiency relative to using more power-hungry field-programmable gate array devices in the RU.
Figure 2. Trends in spectrum. DL: downlink.
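The aperture argument above reduces to simple scaling: if the physical aperture is held fixed and the element spacing shrinks in proportion to the wavelength, the element count grows with the square of the frequency ratio, and the conducted power per port falls accordingly. The sketch below assumes exactly that idealized scaling; the 3.5-GHz and 7.0-GHz values are round numbers chosen here to represent a "doubling" and are not taken from the article.

```python
def scaled_element_count(n_elements, f_old_ghz, f_new_ghz):
    """Elements needed to fill the same aperture at a new frequency.

    Assumes element spacing is proportional to wavelength in both dimensions.
    """
    return n_elements * (f_new_ghz / f_old_ghz) ** 2


def power_per_port_w(trp_w, n_ports):
    """Conducted power per port when the total power is shared equally."""
    return trp_w / n_ports


elements_at_7ghz = scaled_element_count(64, 3.5, 7.0)
low_port_power = power_per_port_w(160.0, 64)   # lower bound quoted in the text
high_port_power = power_per_port_w(320.0, 32)  # upper bound quoted in the text
```

With four times as many chains sharing the same total power, each PA needs to deliver roughly a quarter of today's output, which is the motivation for highly integrated RU ASICs.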
Another interesting development in regulatory circles is the discussion on allocating new frequency bands between 10 and 15 GHz for 5G Advanced and 6G. These bands are mainly used for satellite communications and happen to fall between FR1 and FR2 frequencies. The signal propagation characteristics in these so-called “golden” bands are not as good as those of FR1, but more desirable than FR2’s, with the promise of providing a larger coverage area and lower overall cost relative to today’s FR2 mmWave base stations. These bands also offer the potential of wider available bandwidths in the range of 400 MHz, allowing for an increase in data rates and capacity. The radio architecture for these bands has its own challenges. Because of the shorter wavelength, the traditional massive MIMO implementations become more difficult because the RF front end (RFFE), consisting of antenna filter, PA, circulator, and low-noise amplifier, for 128 or more transmit and receive chains has to fit within the dimensions dictated by the smaller antenna array grid. The lower PA efficiency in these bands also implies a higher power dissipation density, which makes cooling the RU more challenging. However, if successfully implemented and deployed, the higher capacity achieved with such radios has the potential to lower the energy per bit of transferred data. Obviously, the idle mode power-saving features under low traffic become more important in guaranteeing a power-efficient network implementation.
Another way to appreciate the importance of sustainability in the telecommunications industry is to consider that carrying today's traffic on the 4G network technology of a decade ago could easily consume more than 80% of the world's total electricity supply. In practice, we use much less than 10% of the world's total electricity supply because the 5G network is an order of magnitude more energy efficient. Figure 3 illustrates the evolution of various cellular technologies, including 3G, 4G, and 5G, in terms of radio architecture and performance metrics. For the sake of contrasting the trend in power consumption, the energy consumption per bit in the graph is normalized to that of the 32T32R massive MIMO RU with a single 100-MHz carrier and 160 W of TRP. Hence, the smaller the energy-per-bit number, the more energy efficient the RU. 64T64R massive MIMO RUs are twice as efficient as 32T32R radios when transmitting in peak capacity mode, assuming the same TRP requirement. It is also expected that future cellular networks in the n104 band will deliver considerably higher throughput at a much lower number of joules per bit. Relative to today's 5G 32T32R MIMO systems, the peak capacity of larger massive MIMO systems is expected to improve by a factor of 32, while the RU power consumption increases by only about 7×, which results in a roughly 4.5× improvement in power efficiency. However, in real-world scenarios, RUs rarely operate at peak capability, making it essential to include power-saving features that scale with the traffic load. To reach carbon neutrality, we need to focus on increasing cellular network efficiency by decarbonizing the core technologies. This will be discussed in the subsequent sections.
Figure 3. Trends in power consumption and energy efficiency per bit.
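The efficiency figure quoted above follows directly from the ratio of the capacity gain to the power increase. The sketch below simply restates that arithmetic; the function names are introduced here for illustration.

```python
def efficiency_gain(capacity_gain, power_gain):
    """Bits-per-joule improvement when capacity and power both scale."""
    return capacity_gain / power_gain


def normalized_energy_per_bit(capacity_gain, power_gain):
    """Energy per bit relative to the baseline RU (baseline = 1.0)."""
    return power_gain / capacity_gain


gain = efficiency_gain(32.0, 7.0)
energy_per_bit = normalized_energy_per_bit(32.0, 7.0)
```

A 32× capacity gain at 7× the power gives a gain of about 4.57, which the text rounds to 4.5×, or equivalently about 0.22 of the baseline energy per bit at peak load.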
The basic function of a PA appears to be very simple in concept, but designing the output power match that extracts the maximum power from the device is an art in itself. An ideal PA should convert the dc power to RF output power, amplify the input signal linearly, and exhibit no memory effect, meaning its output does not depend on past or future information. In practice, the closest approximation of an ideal PA used in cellular applications is the class-AB amplifier, which is efficient only near the saturated power level. To maintain high efficiency at a large power backoff (PBO), other PA architectures are required, such as linear amplification using nonlinear components and load-modulated PAs [7], [8]. Among the load modulation techniques, the Doherty PA (DPA) is commonly employed in 5G base stations because of its lower overhead and relatively lower cost. The DPA, as depicted in Figure 4, operates efficiently at PBO as the main stage remains saturated, thanks to the load modulation performed by the auxiliary stage through the output quarter-wave transformer. The other role of the auxiliary stage is to provide the additional power needed during the peak envelope swing to enhance the overall PA linearity. Because 5G NR radios use orthogonal frequency-division multiplexing, the resulting peak-to-average power ratio (PAPR) of the waveform typically exceeds 8 dB even after the crest factor reduction algorithm. The conventional Doherty architecture, whose theoretical efficiency peaks at 6 dB PBO, does not deliver the required performance at higher PBO. The asymmetrical inverted Doherty amplifier is commonly used today as it maintains a high efficiency of 50% at 8 dB PBO over a wide bandwidth of 200 MHz.
In fact, the auxiliary device in the asymmetrical Doherty architecture, as depicted in Figure 5, is sized larger than the main device (ideally a 1.8:1 ratio) so that the peak efficiency occurs at a larger PBO of 8 dB. In the same figure, one can note that the output combiner (i.e., the impedance inverter) is moved to the output of the auxiliary stage to further enhance the DPA bandwidth; hence the name asymmetrical inverted DPA. Unfortunately, for bandwidths larger than 200 MHz at power levels greater than 40 W, the efficiency of the DPA architecture is compromised by the inherent bandwidth limitation arising from the phase linearity characteristics of the transmission lines used in the matching networks and the Doherty combiner.
Figure 4. Conventional DPA.
Figure 5. Asymmetrical inverted DPA.
Recently, the load-modulated balanced amplifier (LMBA) architecture, as shown in Figure 6, has been proposed to enhance the bandwidth while maintaining high efficiency at a much larger PBO than the DPA [9]. The conventional LMBA utilizes a class-AB PA as an auxiliary stage to actively modulate the load seen by the two main balanced amplifiers in class-AB mode labeled as Main1 and Main2 in the figure. The load modulation is achieved by varying the amplitude and the phase ratio of the auxiliary amplifier output power relative to the output power of each branch of the balanced PAs. Hence, the power ratio of the auxiliary stage to one of the balanced PAs determines the PBO at which the efficiency peaks, while the phase ratio sets the attainable maximum efficiency. As a result, a high-efficiency PA operation is maintained over a broadband frequency range. In fact, unlike the inverted Doherty architecture, the only bandwidth limitation of the LMBA PA is the bandwidth of the hybrid combiners used in the circuit. The drawback of the conventional LMBA is its low efficiency of around 40% at PBO as the dc power consumption of the auxiliary stage significantly impacts the overall PA efficiency.
Figure 6. Conventional LMBA.
To overcome the shortcoming of efficiency seen in the conventional LMBA, several variants of LMBA have been reported [10], [11]. The pseudo-Doherty LMBA (PD-LMBA), as shown in Figure 7, reverses the role of the stages in the conventional LMBA, where the auxiliary stage becomes the main stage, while the balanced amplifiers become the auxiliary stage. The total PA efficiency is further improved by biasing the auxiliary stage in class-C operation. The resulting architecture behaves like a Doherty amplifier with the advantage of offering much higher bandwidths, hence the name “pseudo-Doherty.” In fact, this architecture has two regimes (low power and high power) of operation similar to the Doherty design to achieve maximum efficiency at a wider range of power levels. In the low-power regime, the auxiliary stage biased in class-C operation is completely in the off state, whereas the main stage delivers the power to the load until it reaches the design PBO at which the device saturates. The high-power regime starts at the design PBO, where the auxiliary stage contributes to the power transfer to the load until the entire LMBA design saturates for the maximum possible power efficiency. One drawback of the PD-LMBA configuration is that the main stage remains saturated from PBO until the PA reaches saturation, implying a harder nonlinearity.
Figure 7. PD-LMBA.
Figure 8 compares the efficiency characteristic of four PA architectures, namely the class-AB PA, DPA, asymmetrical DPA, and PD-LMBA. While the class-AB amplifier loses power efficiency rapidly as the output power level decreases, the conventional DPA, even based on symmetrical device sizing between the main and auxiliary stages, resists compromising its power efficiency up to about 6 dB PBO. The asymmetrical DPA maintains its efficiency high up to a PBO of 8 dB. By contrast, the PD-LMBA architecture offers higher efficiency PA operation at a significantly wider range of PBO (i.e., up to 10–11 dB).
Figure 8. Comparison of power efficiency versus output PBO for different PA architectures.
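The qualitative shape of these curves can be illustrated with the standard textbook closed forms for an ideal class-B amplifier and an ideal symmetric two-way Doherty amplifier. This is a minimal sketch under those idealized assumptions (class-B devices, lossless combining, no knee voltage); class B is used here as a stand-in for class AB, and the sketch does not reproduce the simulated curves of Figure 8 or the asymmetrical and PD-LMBA cases.

```python
import math


def backoff_to_amplitude(pbo_db):
    """Normalized output voltage amplitude (1.0 at saturation) for a given PBO."""
    return 10.0 ** (-pbo_db / 20.0)


def class_b_efficiency(pbo_db):
    """Ideal class-B drain efficiency: proportional to output amplitude."""
    return (math.pi / 4.0) * backoff_to_amplitude(pbo_db)


def doherty_efficiency(pbo_db):
    """Ideal symmetric two-way Doherty drain efficiency."""
    v = backoff_to_amplitude(pbo_db)
    if v <= 0.5:
        # Only the main stage conducts, into twice its full-power load.
        return (math.pi / 2.0) * v
    # Both stages conduct; the main stage stays voltage saturated.
    return (math.pi / 2.0) * v * v / (3.0 * v - 1.0)


six_db = 20.0 * math.log10(2.0)  # backoff at which the amplitude is halved
```

At about 6 dB PBO the ideal Doherty returns to its peak efficiency of π/4, whereas the ideal class-B amplifier has already dropped to half of that; beyond 6 dB both fall in proportion to the amplitude, which is why signals with a PAPR above 8 dB call for architectures that peak at deeper backoff.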
In addition to power efficiency characteristics, the iBW and the fractional bandwidth, defined as the ratio of the absolute iBW to the RF center frequency, are also important metrics to validate. To preserve the bandwidth under nonlinear operation, the power devices must be broadband while presenting low-Q matching load impedances, i.e., almost real impedances. In our prototypes, the Wolfspeed CGH40010F 10-W GaN transistor has been selected as it demonstrates power load impedances close to 25 Ω over the desired bandwidth, which avoids the need for an external output matching network.
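As a quick illustration of the fractional bandwidth definition, the sketch below evaluates it for two frequency ranges that appear in this article. The function name is introduced here, and the center frequency is taken as the arithmetic midpoint of the band edges, which is an assumption.

```python
def fractional_bandwidth(f_low_mhz, f_high_mhz):
    """Ratio of the absolute bandwidth to the (arithmetic) center frequency."""
    ibw = f_high_mhz - f_low_mhz
    f_center = (f_low_mhz + f_high_mhz) / 2.0
    return ibw / f_center


narrow = fractional_bandwidth(1900.0, 2100.0)  # 200 MHz around 2 GHz
wide = fractional_bandwidth(1500.0, 2500.0)    # 1 GHz around 2 GHz
```

The 1.9–2.1-GHz design band corresponds to a 10% fractional bandwidth, whereas a 1.5–2.5-GHz range corresponds to 50%.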
Figures 9 and 10 compare the performance of the asymmetrical inverted DPA and the PD-LMBA, designed to operate in the frequency band between 1.9 and 2.1 GHz using the same power devices. When comparing the power-added efficiency (PAE) variation over frequencies, one can notice that, in the case of the inverted DPA, the variation is much larger. On the other hand, the PD-LMBA presents better and more consistent wideband characteristics over the 200 MHz of operation and can operate up to 400 MHz of iBW. The PAE and gain profile for the inverted DPA at 2.1 GHz are notably worse than those at lower RF frequencies, reflecting the inherent bandwidth limitation problem.
Figure 9. Simulation and analysis of two-way asymmetrical inverted DPA.
Figure 10. Simulation and analysis of PD-LMBA.
Both the DPA and PD-LMBA are cost-effective designs, requiring only a single input connection in the actual system. However, a single input implies that the delay between the paths is optimum only for a limited range of RF frequencies, typically around the center of the iBW. In simulation, at a center frequency of 2,000 MHz, the optimum offset line length was found to be approximately 234° of electrical length. Using this offset line length, the PD-LMBA design has been simulated over the desired band and compared to designs with different electrical lengths, as shown in Figure 11. Increasing the offset line length extended the operating bandwidth of the PA up to 32%. Finally, optimizing the phase at each frequency point extends the PD-LMBA bandwidth up to 50%. This compelling performance advantage motivates splitting the main and auxiliary paths to compensate for the phase variation.
Figure 11. Simulation and analysis of MISO PD-LMBA over variable phase offset between main and balanced auxiliary stages. MISO: multiple-input, single-output.
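One reason a fixed offset line is optimum only near the band center is that its electrical length scales with frequency. The sketch below assumes an ideal, non-dispersive line, for which the phase is directly proportional to frequency; it shows only what the fixed line delivers at other frequencies, not what the optimum phase would be, since that is not given in the text.

```python
def electrical_length_deg(theta0_deg, f0_mhz, f_mhz):
    """Electrical length of a fixed physical line at frequency f.

    Assumes an ideal non-dispersive line whose phase is proportional to
    frequency; theta0_deg is the electrical length at f0_mhz.
    """
    return theta0_deg * f_mhz / f0_mhz


theta_low = electrical_length_deg(234.0, 2000.0, 1500.0)
theta_high = electrical_length_deg(234.0, 2000.0, 2500.0)
```

A line that is 234° long at 2,000 MHz spans from 175.5° to 292.5° across 1,500–2,500 MHz, a spread of more than 100° that a single analog input cannot correct but a digitally controlled second input can.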
For larger bandwidths and higher power operations, it is necessary to split the input into two paths to digitally phase-align the individual paths over power. Figure 12 presents the multiple-input, single-output (MISO) PD-LMBA architecture, where the offset line length at the input of the main stage can be optimized for each RF frequency point over the frequencies between 1,500 and 2,500 MHz.
Figure 12. MISO PD-LMBA.
With the MISO PD-LMBA as shown in Figure 12 and using the optimum phase found between the main and auxiliary balanced PAs, the RF performance can be extended up to 1-GHz iBW, as depicted in Figure 13. The PAE for the design varies between 55% and 60%. Both the PAE and gain profiles from 1.5 to 2.5 GHz are still within the acceptable range of performance. However, when looking at the gain response, an abrupt transition at PBO is observed as the auxiliary balanced PAs turn on. This sharp transition at around 8 dB PBO will present more challenges and require a more advanced linearization solution to cope with the highly efficient and nonlinear PD-LMBA PA.
Figure 13. Simulation and analysis of MISO PD-LMBA over 1-GHz bandwidth.
One of the main drivers behind the digital proliferation in wireless SDRs is the adoption of digital front-end (DFE) technology including linearization algorithms. The DFE augments the RF mixed-signal converters for bridging the physical to digital worlds. Figure 14 shows the block diagram of a complete signal chain for FDD radios, where a high level of performance is typically required at high transmit power. The DFE provides the digital signal processing (DSP) ability to softly reconfigure the channels in the base station RF in real time, thus allowing for implementation of various signal processing, conditioning, compensation, and mitigation of channel nonlinear responses in both Tx and Rx. The figure also highlights two sources of nonidealities in this type of radio: passive intermodulation (PIM) distortion and the PA’s nonlinearity.
Figure 14. Block diagram of a complete signal chain for FDD radios. DAC: digital-to-analog converter; ADC: analog-to-digital converter; PIM: passive intermodulation.
While the ACLR linearity requirement of the Tx for the emerging frequency band n104 has been relaxed, the macrocell RRHs operating in the FDD bands below 2.7 GHz still present many opportunities for innovation and for integrating more advanced DSP capabilities. For instance, PIM distortion in FDD radios can originate in various RF data paths, especially during high TRP transmission (e.g., > 2 W/MHz). Sources of PIM include, but are not limited to, aging antenna systems, rusty bolts, bad connectors, low-quality duplexers, and external components that reflect the transmit signal back to the antenna. Figure 15 describes a system based on 3GPP Band 3, where Tx and Rx are processed concurrently. As is usually the case with FDD systems, the Rx band is located near the Tx band, such that low-order PIM distortion products can fall onto the receive band, degrading the intended Rx signal. From the operator perspective, this artifact not only translates to lower uplink (UL) and downlink (DL) data rates, a higher volume of dropped calls, and degraded voice quality, but its root cause is also very difficult to troubleshoot [12]. In general, PIM is becoming a growing problem for future multiband radio systems because of the advances in broadband PA and SDR technologies. In the past, PIM impacted only around 5% of high-traffic areas; however, as more bands are deployed on a common antenna today, we expect PIM to affect up to 70% of high-traffic FDD systems. A study has also shown that a drop of 1 dB in UL sensitivity due to PIM can reduce wireless coverage by up to 11% in a macro network [13].
Figure 15. Example of low-order PIM distortions for 3GPP Band 3 between DL and UL channels.
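The way a low-order product lands in the receive band can be checked with simple frequency arithmetic. The sketch below computes the two third-order intermodulation products of a pair of tones and tests them against the Band 3 uplink range. The band edges are the standard 3GPP Band 3 allocations (UL 1,710–1,785 MHz, DL 1,805–1,880 MHz), which the article does not list explicitly, and the two tone frequencies are hypothetical values placed at the DL band edges.

```python
def im3_products(f1_mhz, f2_mhz):
    """Third-order intermodulation products of two tones: 2*f1 - f2 and 2*f2 - f1."""
    return (2 * f1_mhz - f2_mhz, 2 * f2_mhz - f1_mhz)


def falls_in_band(f_mhz, band):
    """True if the frequency lies inside the (low, high) band edges."""
    low, high = band
    return low <= f_mhz <= high


BAND3_UL = (1710.0, 1785.0)  # uplink (base station Rx)
BAND3_DL = (1805.0, 1880.0)  # downlink (base station Tx)

# Hypothetical pair of DL tones at the band edges.
lower_im3, upper_im3 = im3_products(1805.0, 1880.0)
hits_rx = falls_in_band(lower_im3, BAND3_UL)
```

For this tone pair the lower third-order product falls at 1,730 MHz, inside the uplink band, so any passive nonlinearity excited by the transmit signal can directly desensitize the receiver.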
PIM solutions today are mostly addressed by using collision-mitigation techniques. For example, PIM problems can be addressed by frequency planning to avoid intermodulation distortion (IMD) of the Tx signal falling on the desired Rx band or by reducing the transmit and receive bandwidth such that the impact of the PIM is negligible. For more complex multiband scenarios, where frequency planning is not plausible, separation of antennas for different bands or the use of higher quality microwave components is required. In any case, either solution will reduce the revenue for the operators and increase the overall system cost. A better solution to this challenge is an integrated PIM compensator (PIMC) in the SDR, which would reduce the system bill of materials cost without compromising the service quality of the operators. Real-time cancellation in the receive signal path can be employed to compensate for the PIM source originating from the microwave duplexer in a no-fault antenna system. Since the PIM observed in the receive path is the result of Tx IMD products that leak into the Rx band, recreation of the PIM using the original Tx signal is possible, which consequently can be subtracted from the Rx baseband data, once it is time and frequency aligned with it. In terms of architectural considerations, PIM cancellation can be applied at several locations in the data path, including the individual component carrier level, 3GPP band level, and full sampling rate level; each comes with different tradeoffs among performance, hardware complexity, and power consumption. Another advantage of the integrated PIMC is an automatic detection loop. Since cross-correlation of the captured Tx and Rx data will be performed, theoretically it is possible to detect any undesired PIM sources and inform the mobile operators immediately for mitigation in the field.
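The cancellation idea described above, recreating the PIM from the known Tx signal and subtracting it from the Rx data, can be sketched in a heavily simplified form. The example below assumes a memoryless third-order PIM model with a single complex coupling coefficient, and it assumes the Tx-derived basis is already time and frequency aligned with the Rx baseband data. Both are assumptions made here for illustration; a practical PIMC would need alignment, filtering into the Rx band, memory terms, and adaptive tracking. All names and numerical values are hypothetical.

```python
import random


def pim_basis(tx):
    """Third-order, phase-preserving basis x * |x|^2 built from Tx samples."""
    return [x * abs(x) ** 2 for x in tx]


def estimate_coefficient(basis, rx):
    """Single-tap least squares: c = <basis, rx> / <basis, basis>."""
    num = sum(b.conjugate() * r for b, r in zip(basis, rx))
    den = sum(abs(b) ** 2 for b in basis)
    return num / den


def cancel_pim(tx, rx):
    """Estimate the PIM coupling and subtract the recreated PIM from Rx."""
    basis = pim_basis(tx)
    c = estimate_coefficient(basis, rx)
    cleaned = [r - c * b for r, b in zip(rx, basis)]
    return c, cleaned


# Synthetic demonstration with a seeded generator (all values hypothetical).
rng = random.Random(0)
tx = [complex(rng.gauss(0, 0.7), rng.gauss(0, 0.7)) for _ in range(2000)]
desired = [complex(rng.gauss(0, 0.01), rng.gauss(0, 0.01)) for _ in range(2000)]
c_true = 0.05 + 0.02j  # hypothetical PIM coupling coefficient
rx = [d + c_true * b for d, b in zip(desired, pim_basis(tx))]
c_est, cleaned = cancel_pim(tx, rx)
```

The same correlation between the Tx-derived basis and the Rx data that drives the estimate could, in principle, serve as the detection metric mentioned in the text: a coefficient magnitude well above the noise floor indicates that a PIM source is present.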
One of the most critical stages in the DFE is arguably the digital predistortion (DPD) system. A DPD system generally models the nonlinear PA output as a finite sum of multidimensional convolutional integrals, known as the complex equivalent baseband Volterra-based model [14]. The complex baseband model aims to collect the terms that fall in the first output zone around the fundamental carrier frequency, and several variants of the behavioral models have been comparatively analyzed [15], [16]. Shown in Figure 16 is the most primitive, yet effective, form of implementation, where each basis function is parameterized by a complex parameter, $\alpha$, and derived by calculating the scalar product of the delayed complex input sample, $u$, and its phase-invariant nonlinear transformation, denoted by $NL: \mathbb{C} \rightarrow \mathbb{R}$. Typically, the NL operator implements an absolute value function that maps the complex input sample to a real-valued output sample, known as the instantaneous magnitude. The input–output relationship of Figure 16 can be expressed as \[ v[t] = \sum_{k=1}^{K} \sum_{m=0}^{M} \alpha_{k,m} \cdot u[t + \tau_{m}] \cdot \left| u[t + \tau_{m}] \right|^{k-1} \]
Figure 16. Direct-form memory polynomial implementation.
where $v$ is the complex output sample, and $K$ is the degree of nonlinearity. In the literature, this is widely known as the direct-form memory polynomial model or the Hammerstein model [17], which describes a combination of Taylor series nonlinear functions and linear filters, targeting the diagonal terms. The direct-form polynomial implementation greatly reduces the complexity of the Volterra series, leaving only the dominant distortion kernels, and often the primary degrees $K \in \{1, 2, 3, 4, 5\}$ are sufficient to achieve good performance for low-power wireless applications, such as small-cell Txs and user equipment Txs. As illustrated in Figure 16, there are two aspects to consider within the design space when leveraging machine learning techniques to compensate for nonlinear RF analog circuit impairments: model signature extraction and model parameter extraction. Signature extraction aims to identify the feature of a model that is robust under various operating conditions and dynamic traffic, whereas parameter extraction assumes no a priori knowledge about the form of the ground truth function to be estimated. These two steps are critical to completing the design of an adaptive closed-loop DPD system.
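A direct evaluation of the memory polynomial expression can be sketched as follows. This is an illustrative implementation rather than the hardware structure of Figure 16: the coefficient layout (`alpha[k-1][m]`), the integer sample delays in `taus`, and the zero padding outside the record are assumptions introduced here.

```python
def memory_polynomial(u, alpha, taus):
    """Evaluate v[t] = sum_k sum_m alpha[k-1][m] * u[t+tau_m] * |u[t+tau_m]|**(k-1).

    u     : list of complex input samples
    alpha : alpha[k-1][m] is the complex weight for degree k and delay index m
    taus  : integer sample offsets tau_m (negative values address past samples)
    """
    n = len(u)
    v = []
    for t in range(n):
        acc = 0j
        for k_idx, row in enumerate(alpha):      # degree k = k_idx + 1
            for a, tau in zip(row, taus):
                idx = t + tau
                if 0 <= idx < n:                 # samples outside the record count as zero
                    x = u[idx]
                    acc += a * x * abs(x) ** k_idx
        v.append(acc)
    return v


# Hypothetical usage: a memoryless model with a linear and a third-degree term.
u_demo = [1 + 1j, 2 + 0j, 0.5j]
v_linear = memory_polynomial(u_demo, [[1.0]], [0])
v_cubic = memory_polynomial(u_demo, [[1.0], [0.0], [0.1]], [0])
```

With only the linear, zero-delay coefficient set to one, the model passes the input through unchanged; adding higher degree rows introduces envelope-dependent gain, and adding entries to `taus` introduces memory.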
As higher power and broader transmission bandwidth are desired, a more capable model is required. Figure 17 further generalizes the Volterra-based model and augments the modeling capability in two ways: it extends the memory depth to span both past and future samples, and it incorporates gain functions, $H$, that transform input samples of either the complex or the real domain into higher order nonlinear dynamics, subject to the choice of the preceding NL operation, by performing the output scalar product of two samples. The signals partitioning block routes the complex samples from the input delay line to the subsequent NL operators. In its simplest form, $NL: \mathbb{C} \rightarrow \mathbb{R}$ derives the real-valued instantaneous envelope of a complex input sample, and $\varphi: \mathbb{R} \rightarrow \mathbb{C}$ implements the delay operation and applies a nonlinear transformation using a piecewise linear (PWL) lookup table (LUT) [18], [19]. This can also be interpreted as the LUT-based generalized memory polynomial (GMP) model, as given by \begin{align*} v[t] & = \sum_{m=0}^{M} u[t + \tau_{m}] \cdot \varphi\left( \left| u[t + \tau_{m}] \right|, K, \alpha_{m,k} \right) \\ & \quad + \sum_{m=0}^{M} \sum_{q=0}^{Q} u[t + \tau_{m}] \cdot \varphi\left( \left| u[t + \tau_{q}] \right|, K, \alpha_{m,q,k} \right) \end{align*}
Figure 17. Direct synthesis of Volterra-based models.
where the nonlinear gain function ${H}{:}{\mathbb{C}}\rightarrow{\mathbb{C}}$ is now the scalar product of the two samples from $u$ and operator $\varphi,$ and $K$ specifies the total number of basis functions used to optimize each LUT as parameterized by complex parameters ${\alpha}$ (i.e., decision variables). Since the model incorporates both leading and lagging envelope terms, it can also describe a nonlinear dynamical behavior whose output signal at a certain time instant may depend on both the future and history of the input signal. Given the flexible LUT implementation, there are several options to derive the nonlinear basis functions, including global bases, such as orthogonal polynomials, local bases, such as cubic splines, and any arbitrary PWL functions that can be optimized efficiently by the on-chip microprocessor [15], [16]. Figure 18 shows an ensemble of five optimized $\varphi$ gain functions comparing direct-form polynomials and spline local bases. A 9-dB lower model fitting error is observed when using the latter basis function style. As a result, directly optimizing the LUT enhances the modeling capability of the already highly parameterized baseband model, which makes this implementation effective for medium-power wireless applications, such as massive MIMO Txs.
Figure 18. Optimized nonlinear gain function $\mathbf{\varphi}{.}$
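The LUT-based GMP expression above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `shift`, `pwl_lut`, and `lut_gmp`, the zero padding at the waveform edges, the dictionary layout keyed by the index pair $(m, q)$, and the example LUT contents are all hypothetical. The LUT entries stand in for an already optimized $\varphi$, so the basis-function parameters ${\alpha}$ are not modeled explicitly.

```python
import numpy as np

def shift(u, tau):
    """Return u[t + tau], zero-padded at the edges (tau < 0 lags, tau > 0 leads)."""
    out = np.zeros_like(u)
    n = len(u)
    if tau >= 0:
        out[: n - tau] = u[tau:]
    else:
        out[-tau:] = u[: n + tau]
    return out

def pwl_lut(envelope, knots, values):
    """Piecewise-linear interpolation of complex LUT entries over real envelope knots."""
    real = np.interp(envelope, knots, values.real)
    imag = np.interp(envelope, knots, values.imag)
    return real + 1j * imag

def lut_gmp(u, taus, knots, luts):
    """Sum of u[t + tau_m] * phi(|u[t + tau_q]|) over the (m, q) pairs present in luts.

    Pairs with m == q correspond to the first (aligned) sum of the GMP expression;
    pairs with m != q correspond to the cross terms of the second sum.
    """
    u = np.asarray(u, dtype=complex)
    v = np.zeros_like(u)
    for (m, q), values in luts.items():
        envelope = np.abs(shift(u, taus[q]))
        gain = pwl_lut(envelope, knots, np.asarray(values, dtype=complex))
        v += shift(u, taus[m]) * gain
    return v

# Hypothetical example: one aligned term and two cross terms.
rng = np.random.default_rng(0)
u = 0.5 * (rng.standard_normal(8) + 1j * rng.standard_normal(8))
knots = np.linspace(0.0, 2.0, 9)          # envelope grid of the LUT
taus = [0, -1, 1]                          # current, lagging, and leading sample
luts = {
    (0, 0): np.ones(9),                    # aligned term with unit gain
    (1, 0): 0.05 * knots,                  # lagging sample, current envelope
    (0, 2): -0.02j * knots**2,             # current sample, leading envelope
}
v = lut_gmp(u, taus, knots, luts)
```

Because each LUT is interpolated directly, swapping polynomial entries for spline or arbitrary PWL entries changes only the table contents and not the evaluation loop.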
Since the block diagram in Figure 17 is intended to generalize most of the prevalent Volterra-based models, in some instances, ${NL}{:}{\mathbb{C}}\rightarrow{\mathbb{C}}$ can also describe models comprising dynamic deviation reduction (DDR) terms [20], [21], such as ${u}^{2}{D}_{n}\bar{u}$ and $\bar{u}{D}_{n}{u}^{2},$ where ${D}_{n}$ denotes the time-shift operation by $n$ samples, as shown by \begin{align*}{v}\left[{t}\right] & = \mathop{\sum}\limits_{{m} = {0}}^{M}\mathop{\sum}\limits_{{q} = {0}}^{Q}{u}^{2}\left[{t}\right]\cdot\bar{u}\left[{{t} + {\tau}_{m}}\right]\cdot\varphi\left({\left|{{u}\left[{{t} + {\tau}_{q}}\right]}\right|,{K},{\alpha}_{m,q,k}}\right) \\ & \quad + \mathop{\sum}\limits_{{m} = {0}}^{M}\mathop{\sum}\limits_{{q} = {0}}^{Q}\bar{u}\left[{t}\right]\cdot{u}^{2}\left[{{t} + {\tau}_{m}}\right]\cdot\varphi\left({\left|{{u}\left[{{t} + {\tau}_{q}}\right]}\right|,{K},{\alpha}_{m,q,k}}\right){.}\end{align*}
The additional DDR terms transform the complex input u to higher odd-order nonlinear dynamics that cannot be attained by the GMP expression and further augment the highly parameterizable model, making it quite useful to target medium-to-high-power wireless applications, including macrocell Txs.
The system of identification method used to extract the parameters of the model architecture, as described in Figures 16 and 17, typically follows linear regression techniques, where the linearity of the regression model will not be affected by the form of independent variables, as given by \[{F}{(}{u}{)} = \mathop{\sum}\limits_{j}^{J}\mathop{\sum}\limits_{i}^{I}{\alpha}_{j,i}{\beta}_{j,i}{(}{u}{),}\] where the model $F$ is decomposed into $J$ basis functions, each consisting of $I$ localized functions for a total of ${K} = {J}\cdot{I}$ features, and ${\beta}$ denotes the nonlinear transformation of u, such as the square operation. Since the parameters appear in linear form, the least squares (LS) optimization can be done by minimizing the sum of the squared errors with respect to the parameters plus some form of regularization, written as \begin{align*}\mathop{\sum}\limits_{{n} = {1}}^{N}&{\left|{{v}_{n}{-}\mathop{\sum}\limits_{{k} = {1}}^{K}{\alpha}_{k}{\beta}_{k}{(}{u}_{n}{)}}\right|}^{2} + {\lambda}{L}\left({\mathbf{\alpha}}\right){\text{or}} \\ & {\text{argmin}}_{\mathbf{\alpha}}\left({{\Vert}{{\mathbf{v}}{-}\mathbf{\beta}\mathbf{\alpha}}{\Vert}_{2}^{2} + {\lambda}{L}\left({\mathbf{\alpha}}\right)}\right)\end{align*} where $v$ and $u$ are the training samples, ${\lambda}$ is the nonnegative regularization parameter, L is a function that generally penalizes larger parameter values, $\mathbf{\beta}$ contains the N × K basis matrix, $\mathbf{\alpha}$ is the K × 1 parameter vector, and v is the N × 1 output data. ${L}\left({\mathbf{\alpha}}\right) = {\Sigma}_{{k} = {1}}^{K}{\left|{{\alpha}_{k}}\right|}^{2}$ is known as the Ridge regression, which exploits the Tikhonov ${L}_{2}$ regularization [22]. 
The previous quadratic cost function has an analytical closed-form solution, $\tilde{\alpha} = {(}{\mathbf{\beta}}^{H}\mathbf{\beta} + {\lambda}{I}{)}^{{-}{1}}{\mathbf{\beta}}^{H}{v}$ in block matrix form, where ${R} = {\mathbf{\beta}}^{H}\mathbf{\beta}$ denotes the autocorrelation matrix (also known as the basis matrix), and ${\mathbf{p}} = {\mathbf{\beta}}^{H}{v}$ denotes the cross-correlation vector. This method achieves a faster rate of convergence at the expense of an asymptotic computational cost of ${\mathcal{O}}\left({{nk}^{2}}\right){.}$ However, because the autocorrelation matrix and cross-correlation vector cannot be calculated exactly beforehand owing to changing traffic and nonstationary continuous-time behavior, the extraction process needs to be repeated as frequently as needed. In practice, determining the model parameters recursively without directly inverting the basis matrix is a more attractive method and is hardware friendly for embedded systems that have limited memory space and computational resources. In the derivation of the recursive LS (RLS) algorithm, the input signals are first considered deterministic as opposed to stochastic, and the correlation quantities are accumulated as \[{R}_{t} = {R}_{{t}{-}{1}} + {\beta}_{t}^{H}{\beta}_{t}\,{\text{and}}\,{p}_{t} = {p}_{{t}{-}{1}} + {\beta}_{t}^{H}{v}_{t}\] where $t$ denotes the current time of observation using the newly acquired data, and ${t}{-}{1}$ denotes the old value from the previous calculation. The RLS technique avoids the ${\mathbf{R}}_{t}$ matrix inversion by exploiting the Sherman–Morrison–Woodbury formula [23], as expressed by \[{A}^{{-}{1}} = {B}{-}{BC}\left({D} + {C}^{H}{BC}\right)^{{-}{1}}{C}^{H}{B}\] where $A$, $B$, and $D$ are positive definite matrices, $C$ is a conformable matrix, ${A} = {B}^{{-}{1}} + {CD}^{{-}{1}}{C}^{H}$ is substituted by the basis matrix ${R}_{t},\,{B}^{{-}{1}} = {R}_{{t}{-}{1}},\,{C} = {\mathbf{\beta}}_{t}^{H}$, and $D$ is the appropriately sized identity matrix.
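As a rough check on the relationship between the closed-form solution and the recursion, the following sketch compares the two on synthetic data. It assumes one basis row per time step, so that $D$ reduces to the scalar 1, and it initializes the recursion with ${R}_{0} = {\lambda}{I}$ so that both routes solve the same regularized problem. The function names `ridge_ls` and `rls` and the variable `P` (the running inverse of ${R}_{t}$) are illustrative and not taken from the article.

```python
import numpy as np

def ridge_ls(basis, v, lam):
    """Closed-form ridge solution (basis^H basis + lam I)^-1 basis^H v."""
    k = basis.shape[1]
    R = basis.conj().T @ basis + lam * np.eye(k)   # regularized autocorrelation matrix
    p = basis.conj().T @ v                         # cross-correlation vector
    return np.linalg.solve(R, p)

def rls(basis, v, lam):
    """Row-by-row recursion that never inverts R_t explicitly.

    P tracks the inverse of R_t = lam I + sum of beta_t^H beta_t and is refreshed
    with the rank-one (Sherman-Morrison) case of the Woodbury identity.
    """
    k = basis.shape[1]
    P = np.eye(k, dtype=complex) / lam             # inverse of R_0 = lam I
    alpha = np.zeros(k, dtype=complex)
    for beta_row, v_now in zip(basis, v):
        c = beta_row.conj()                        # C = beta_t^H as a length-k vector
        Pc = P @ c
        gain = Pc / (1.0 + c.conj() @ Pc)          # B C (D + C^H B C)^-1 with D = 1
        err = v_now - beta_row @ alpha             # error before the update
        alpha = alpha + gain * err
        P = P - np.outer(gain, c.conj() @ P)       # B - B C (...)^-1 C^H B
    return alpha

# Synthetic data standing in for the basis matrix and PA output samples.
rng = np.random.default_rng(1)
basis = rng.standard_normal((200, 6)) + 1j * rng.standard_normal((200, 6))
v = rng.standard_normal(200) + 1j * rng.standard_normal(200)
max_gap = np.max(np.abs(ridge_ls(basis, v, 0.1) - rls(basis, v, 0.1)))
```

Up to floating-point rounding, the two estimates coincide; the recursion trades the one-off matrix solve for a fixed amount of work per new sample.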
When dealing with estimation of time-varying parameters, the ordinary RLS can be further modified to allow for exponential data weighting by introducing a forgetting factor in the error function ${\Sigma}_{{i} = {1}}^{t}{\lambda}^{{t}{-}{i}}{\left|{{e}_{i}}\right|}^{2}$ as opposed to just calculating the sum of squared errors, where $e$ is the error sample, $t$ is the time instant, and ${\lambda}\in{(}{0},{1}{]}$ is the forgetting factor. In essence, the exponential weighting factor is used to forget data samples in the distant past and should be applied accordingly. For example, ${\lambda}$ can be chosen large for slowly changing processes and small for rapidly changing processes to help the DPD actuator cope with different traffic patterns and dynamics. A small forgetting factor reduces the influence of old data, and, as a result, a better tracking capability is expected at the cost of a higher variance of actuator parameters.
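As a sketch of how the forgetting factor enters the recursion, the accumulations of ${R}_{t}$ and ${p}_{t}$ can be rewritten with exponential weighting. The symbols ${P}_{t}$ (the inverse of ${R}_{t}$) and ${k}_{t}$ (the gain vector) are introduced here for illustration only, ${\beta}_{t}$ is assumed to be a single basis row, and the form shown is the textbook exponentially weighted RLS, which is not necessarily the authors' implementation: \begin{align*}{R}_{t} & = {\lambda}{R}_{{t}{-}{1}} + {\beta}_{t}^{H}{\beta}_{t},\quad{p}_{t} = {\lambda}{p}_{{t}{-}{1}} + {\beta}_{t}^{H}{v}_{t} \\ {k}_{t} & = \frac{{P}_{{t}{-}{1}}{\beta}_{t}^{H}}{{\lambda} + {\beta}_{t}{P}_{{t}{-}{1}}{\beta}_{t}^{H}} \\ {\alpha}_{t} & = {\alpha}_{{t}{-}{1}} + {k}_{t}\left({{v}_{t}{-}{\beta}_{t}{\alpha}_{{t}{-}{1}}}\right) \\ {P}_{t} & = {\lambda}^{{-}{1}}\left({{P}_{{t}{-}{1}}{-}{k}_{t}{\beta}_{t}{P}_{{t}{-}{1}}}\right){.}\end{align*} Setting ${\lambda} = {1}$ recovers the ordinary RLS. For ${\lambda} < {1}$, the weights ${\lambda}^{{t}{-}{i}}$ sum to approximately ${1}/{(}{1}{-}{\lambda}{)}$, which is a common rule of thumb for the effective memory in samples; for example, ${\lambda} = {0.99}$ corresponds to roughly 100 samples.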
Another lower cost alternative is to use an iterative learning method, such as the least mean-square (LMS) algorithm with ${\mathcal{O}}\left({nk}\right)$ complexity. Using the method of gradient descent, the recursive relation for the update rule is simply ${\mathbf{\alpha}}_{{i} + {1}} = {\mathbf{\alpha}}_{i} + {\mu}{\mathbf{\beta}}^{H}{e},$ where ${e} = {v}{-}\mathbf{\beta}\mathbf{\alpha}$ is the error sample to minimize, $i$ denotes the iteration number, and ${\mu}$ is the step size, which controls the stability and rate of convergence. Depending on the system, the update rule is implementation friendly and can be applied on a sample-by-sample basis or in block form. To demonstrate the transient behavior of the adaptive algorithms, a LUT-based GMP model is initialized. In each iteration, the training is done by taking a random interval of 1,000 samples to emulate the rapidly changing nature of the real-time mobile data traffic. Once trained, the parameters are transferred to the DPD actuator for RF performance validation on the entire waveform. The convergence of the LS algorithm is typically attained in about two iterations, while it could take the LMS algorithm many more iterations to converge with the same number of training samples and update frequency if the step size ${\mu}$ is not properly chosen even under nominal testing conditions, as illustrated in Figure 19. Depending on the block size, we may not want to scan through the entire training set before taking a single update step. In fact, the convergence speed of the stochastic gradient descent algorithm benefits the most from updating the parameters on the fly, which means that applying the update rule on a sample-by-sample basis as opposed to in block form may result in higher performance.
Figure 19. Trends in systems of identification.
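The sample-by-sample form of the LMS update rule can be sketched as follows. The function name `lms`, the synthetic noiseless data, and the chosen step size are assumptions made for illustration; they do not reproduce the experiment summarized in Figure 19.

```python
import numpy as np

def lms(basis, v, mu, n_passes=1):
    """Sample-by-sample LMS: alpha <- alpha + mu * beta_t^H * e_t."""
    alpha = np.zeros(basis.shape[1], dtype=complex)
    for _ in range(n_passes):
        for beta_row, v_now in zip(basis, v):
            e = v_now - beta_row @ alpha            # error sample for this observation
            alpha = alpha + mu * beta_row.conj() * e
    return alpha

# Synthetic, noiseless example with unit-variance complex basis entries.
rng = np.random.default_rng(2)
basis = (rng.standard_normal((2000, 3)) + 1j * rng.standard_normal((2000, 3))) / np.sqrt(2)
alpha_ref = np.array([1.0 + 0.5j, -0.2j, 0.05])    # hypothetical parameters to recover
v = basis @ alpha_ref
alpha_est = lms(basis, v, mu=0.05)
lms_error = np.max(np.abs(alpha_est - alpha_ref))
```

Each update costs on the order of $K$ operations and needs no matrix storage, which is the source of the lower complexity; the price is sensitivity to ${\mu}$, since a step that is too large diverges and one that is too small converges slowly.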
The linearization models and techniques discussed thus far may not provide a sufficient margin to overcome the performance variations due to part-to-part manufacturing, environmental changes, and the need to operate the PA more efficiently. To achieve a higher level of performance, Figure 20 demonstrates yet another efficient DPD implementation inspired by the recent advancement in deep learning neural networks. Models as illustrated in Figures 16 and 17 can be configured in a network of cascaded models to create extra nonlinear dynamics that cannot be attained easily with either stage alone. When trained properly, our study has consistently shown 3–6-dB improvements of linearity for different test cases under similar hardware constraints, which are promising signs to compensate for highly nonlinear PA systems. Because the equations are no longer linear in parameters, the system of identification techniques discussed previously is not suitable for multistage networks, as shown in Figure 20. Therefore, the compositional optimization becomes ${\text{argmin}}_{{\boldsymbol{\alpha}}_{k}}\left({F}_{k}\left({\boldsymbol{\alpha}}_{k},\ldots{F}_{1}\left({\boldsymbol{\alpha}}_{1},{\boldsymbol{\beta}}_{1}\left({\boldsymbol{u}}\right)\right)\right) + {\lambda}{L}\left({\boldsymbol{\alpha}}_{k}\right)\right)$, where ${\mathbf{\alpha}}_{k}$ denotes the parameters in the kth stage of a cascaded structure. Such a nonconvex optimization problem is better solved using backpropagation and gradient descent algorithms, as illustrated in Figure 21. As with any optimization procedure, we first formulate the problem by defining an objective function. This objective function is also known as the error function and is used to predict the output sample in the feedforward direction, as indicated by the green arrow. 
The error function is then minimized by computing its partial derivatives with respect to the parameters starting backward from the $K$th output stage to the $({K}{-}{1})$th stage and eventually backpropagating to the first stage using the chain rule [23]. Once the derivatives are obtained, we apply the gradient descent update rule to adjust the parameters by taking steps proportionally to the negative of the gradients, as highlighted by the blue arrow, and this completes one epoch of the inner for-loop iteration. Depending on the batch size chosen to work through the backpropagation step and the hardware constraints, we typically use a batch size larger than one. The outer for-loop of the artificial intelligence (AI)-infused workflow is used to update the DPD actuator parameters. We have observed the actuator reach its optimality in hundreds of epochs. Although this may seem straightforward computationally, the development trend is that SDR should have the appropriate ASIC architecture to allow for hardware acceleration to implement the computations in the most energy-efficient manner. Embracing the adoption of machine learning techniques to solve a nonconvex DPD optimization problem would be one key enabling technology.
Figure 20. Multistage network.
Figure 21. AI-infused system level design workflow and methodology. AI: artificial intelligence.
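The backpropagation step described above can be illustrated with a deliberately simplified toy. It assumes two cascaded memoryless cubic stages with real-valued signals and parameters, a mean squared error objective, and full-batch updates; the actual multistage actuator uses complex-valued GMP stages with memory, so this is only a sketch of the chain-rule mechanics. The names `forward`, `loss_and_grad`, and `train`, the target nonlinearity, and the step size are hypothetical.

```python
import numpy as np

def forward(params, u):
    """Two cascaded stages: z = a1*u + a3*u^3, then y = b1*z + b3*z^3."""
    a1, a3, b1, b3 = params
    z = a1 * u + a3 * u**3
    y = b1 * z + b3 * z**3
    return z, y

def loss_and_grad(params, u, v):
    """Mean squared error and its gradient, computed from the output stage backward."""
    a1, a3, b1, b3 = params
    z, y = forward(params, u)                 # feedforward pass
    err = y - v
    loss = np.mean(err**2)
    dy = 2.0 * err / len(u)                   # d(loss)/dy
    grad_b1 = np.sum(dy * z)                  # output-stage parameters
    grad_b3 = np.sum(dy * z**3)
    dz = dy * (b1 + 3.0 * b3 * z**2)          # chain rule through the output stage
    grad_a1 = np.sum(dz * u)                  # first-stage parameters
    grad_a3 = np.sum(dz * u**3)
    return loss, np.array([grad_a1, grad_a3, grad_b1, grad_b3])

def train(u, v, params, step=0.05, epochs=500):
    """Plain gradient descent: step against the gradient once per epoch."""
    params = np.asarray(params, dtype=float)
    for _ in range(epochs):
        _, grad = loss_and_grad(params, u, v)
        params = params - step * grad
    return params

# Hypothetical target: a mildly compressive nonlinearity.
rng = np.random.default_rng(3)
u = rng.uniform(-1.0, 1.0, 500)
v = u - 0.2 * u**3
start = np.array([1.0, 0.0, 1.0, 0.0])        # both stages start as pass-through
trained = train(u, v, start)
loss_before = loss_and_grad(start, u, v)[0]
loss_after = loss_and_grad(trained, u, v)[0]
```

Because the output depends on products of first-stage and second-stage parameters, the problem is no longer linear in the parameters, which is why the closed-form LS solution does not apply and an iterative gradient method is used instead.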
Finally, we evaluated and examined the capability of a Volterra-based linear-regression model and a multistage actuator using a commercially available GaN DPA as the device under test (DUT), operating at an average output power level of 42 dBm at 3.5 GHz for outdoor massive MIMO use cases. The input stimulus consists of two 100-MHz 5G NR component carriers, and its PAPR has been preprocessed to 8 dB. Since the DUT has a peak power level of 50 dBm, the chosen operating point places the signal peaks right at the saturation level of the device. The Volterra-based model consisting of GMP/DDR terms has 800 parameters, whereas the multistage actuator only contains GMP terms for a total of 600 parameters. Figure 22 demonstrates the highest level of linearization performance to the best of the authors’ knowledge using the multistage actuator, featuring a linearized ACLR of –53 dBc and OBUE of –31.3 dBm/MHz for a 200-MHz signal bandwidth and a drain efficiency of 57%. The entire RF lineup, consisting of a driver amplifier and a final power stage providing more than 30 dB of amplification, measures a power efficiency of 51.7% at 8 dB PBO. This allows for flexible input interfaces to the commercial silicon-based radios. For the same test condition, the Volterra-based counterpart achieves an ACLR and OBUE of –47.5 dBc and –27.8 dBm/MHz, respectively. While the Volterra-based model delivered sufficient performance to meet the minimum 3GPP linearity requirement, it is not enough for production and RU deployment at scale because of the low margin to specifications. Comparing the achieved 51.7% PA power efficiency of the multistage model to a 45% PA power efficiency, which is typically delivered today, this performance delta translates to 237 W of power savings for each unit of the massive MIMO RU. With 6.5 million sites in the world today, and assuming three-sector base stations at each site, the power savings from the RUs alone exceed 4.62 GW.
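The peak-power and fleet-level figures quoted above can be checked with a short worked calculation. It assumes one massive MIMO RU per sector, which is an interpretation of the text and is not stated explicitly, and it takes the 237-W saving per RU as given: \begin{align*}{P}_{\text{peak}} & = {42}\,{\text{dBm}} + {8}\,{\text{dB}} = {50}\,{\text{dBm}} \\ {\Delta}{P}_{\text{total}} & = {6.5}\times{10}^{6}\,{\text{sites}}\times{3}\,{\text{RUs/site}}\times{237}\,{\text{W/RU}} = {4.6215}\times{10}^{9}\,{\text{W}}\approx{4.62}\,{\text{GW}}{.}\end{align*}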
Furthermore, the reported performance in this article is benchmarked against several recent publications from academia and industry, as summarized in Table 1. Compared to prior methods, the multistage modeling approach in conjunction with the proper system of identification method based on machine learning techniques delivers the highest linearized power efficiency and linearity performance to date.
Figure 22. The linearized output spectrum of two 100-MHz 5G NR carriers using the multistage network, as described in Figure 20, and the design methodology, as described in Figure 21, respectively.
Table 1. Comparison of linearized PA systems to date.
The telecommunications business sector is a notable source of greenhouse gas emissions, and with the demand for mobile data continuing to rise at an exponential rate, the energy efficiency of next-generation cellular networks must improve significantly to meet global sustainability targets. This article presents the trends in spectrum, energy efficiency, and opportunities for technological improvements that will improve network energy efficiency. The PA dominates the energy consumption in the RU, and novel PA architectures are explored that can improve energy efficiency and the ability to service multiple bands. Linearization of the PA is equally important, and the use of advanced techniques is shown in this article to set a new high watermark in energy efficiency and linearity. Finally, RUs rarely operate at peak capacity, making it essential to include power-saving features that modulate energy consumption in concert with traffic demand. Innovation in dynamic power scaling, PA architecture, and linearization solutions will be instrumental in underpinning the sustainable cellular network of our future.
[1] “Ericsson mobility report,” Ericsson, Stockholm, Sweden, 2022. Accessed: May 1, 2023. [Online]. Available: https://www.ericsson.com/4ae28d/assets/local/reports-papers/mobility-report/documents/2022/ericsson-mobility-report-november-2022.pdf
[2] K. Chuang et al., “Radio challenges, architectures, and design considerations for wireless infrastructure: Creating the core technologies that connect people around the world,” IEEE Microw. Mag., vol. 23, no. 12, pp. 42–59, Dec. 2022, doi: 10.1109/MMM.2022.3203925.
[3] ABI Research. “Environmentally sustainable 5G deployment: Energy consumption analysis and best practices.” Interdigital. Accessed: May 1, 2023. [Online]. Available: https://www.interdigital.com/white_papers/environmentally-sustainable-5g-deployment
[4] “A blueprint for green networks,” GSMA Assoc., London, U.K., 2022. Accessed: May 1, 2023. [Online]. Available: https://data.gsmaintelligence.com/research/research/research-2022/a-blueprint-for-green-networks
[5] “A deliverable by the NGMN Alliance further study on critical C-RAN technologies,” NGMN Alliance, Frankfurt, Germany, 2015. Accessed: May 1, 2023. [Online]. Available: https://ngmn.org/wp-content/uploads/NGMN_RANEV_D2_Further_Study_on_Critical_C-RAN_Technologes_v1.0.pdf
[6] “NR; Base station (BS) radio transmission and reception,” 3rd Generation Partnership Project (3GPP), Sophia Antipolis, France, Tech. Specification Group Radio Access Netw., Release 17, 3GPP TS 38.104 v17.6.0, Jun. 2022.
[7] K.-Y. Jheng et al., “Multilevel LINC system designs for power efficiency enhancement of transmitters,” IEEE J. Sel. Topics Signal Process., vol. 3, no. 3, pp. 523–532, Jun. 2009, doi: 10.1109/JSTSP.2009.2020949.
[8] G. Ahn et al., “Design of a high-efficiency and high-power inverted Doherty amplifier,” IEEE Trans. Microw. Theory Techn., vol. 55, no. 6, pp. 1105–1111, Jun. 2007, doi: 10.1109/TMTT.2007.896807.
[9] D. J. Shepphard et al., “An efficient broadband reconfigurable power amplifier using active load modulation,” IEEE Microw. Wireless Compon. Lett., vol. 26, no. 6, pp. 443–445, Jun. 2016, doi: 10.1109/LMWC.2016.2559503.
[10] Y. Cao et al., “Pseudo-Doherty load-modulated balanced amplifier with wide bandwidth and extended power back-off range,” IEEE Trans. Microw. Theory Techn., vol. 68, no. 7, pp. 3172–3183, Jul. 2020, doi: 10.1109/TMTT.2020.2983925.
[11] J. Sun et al., “Broadband three-stage pseudoload modulated balanced amplifier with power back-off efficiency enhancement,” IEEE Trans. Microw. Theory Techn., vol. 70, no. 5, pp. 2710–2722, May 2022, doi: 10.1109/TMTT.2022.3162867.
[12] F. Kearney and S. Chen. “Passive intermodulation (PIM) effects in base stations: Understanding the challenges and solutions.” Analog Dialogue. Accessed: May 1, 2023. [Online]. Available: https://www.analog.com/en/analog-dialogue/articles/passive-intermodulation-effects-in-base-stations-understanding-the-challenges-and-solutions.html
[13] R. Forum, “Reducing PIM on cell site towers in 5G era,” RCR Wireless News, Jul. 2020. Accessed: May 1, 2023. [Online]. Available: https://www.rcrwireless.com/20200724/wireless/
[14] V. Mathews and G. Sicuranza, Polynomial Signal Processing. Hoboken, NJ, USA: Wiley, 2000.
[15] K. Chuang, “A perspective on linearization and digital pre-distortion for wireless radio systems,” in Proc. IEEE Topical Conf. RF/Microw. Power Amplifier Radio Wireless Appl. (PAWR), San Antonio, TX, USA, Jan. 2020, pp. 38–41, doi: 10.1109/PAWR46754.2020.9036000.
[16] K. Chuang, “Comparative analysis of behavioral modeling for wireless radio systems,” in Proc. IEEE Radio Wireless Symp. (RWS), Las Vegas, NV, USA, Jan. 2022, pp. 167–170, doi: 10.1109/RWS53089.2022.9719921.
[17] D. Morgan et al., “A generalized memory polynomial model for digital predistortion of RF power amplifiers,” IEEE Trans. Signal Process., vol. 54, no. 10, pp. 3852–3860, Oct. 2006, doi: 10.1109/TSP.2006.879264.
[18] A. Molina et al., “Digital predistortion using lookup tables with linear interpolation and extrapolation: Direct least squares coefficient adaptation,” IEEE Trans. Microw. Theory Techn., vol. 65, no. 3, pp. 980–987, Mar. 2017, doi: 10.1109/TMTT.2016.2627562.
[19] P. Gilabert et al., “Multi-lookup table FPGA implementation of an adaptive digital predistorter for linearizing RF power amplifiers with memory effects,” IEEE Trans. Microw. Theory Techn., vol. 56, no. 2, pp. 372–384, Feb. 2008, doi: 10.1109/TMTT.2007.913369.
[20] A. Zhu et al., “Dynamic deviation reduction-based Volterra behavioral modeling of RF power amplifiers,” IEEE Trans. Microw. Theory Techn., vol. 54, no. 12, pp. 4323–4332, Dec. 2006, doi: 10.1109/TMTT.2006.883243.
[21] A. Zhu et al., “Open-loop digital predistorter for RF power amplifiers using dynamic deviation reduction-based Volterra series,” IEEE Trans. Microw. Theory Techn., vol. 56, no. 7, pp. 1524–1534, Jul. 2008, doi: 10.1109/TMTT.2008.925211.
[22] S. Haykin, Adaptive Filter Theory. Upper Saddle River, NJ, USA: Pearson Education, 2013.
[23] G. Strang, Linear Algebra and Learning From Data. Wellesley, MA, USA: Wellesley-Cambridge, 2019.
[24] “MaxLinear teams with Qorvo to enable high-efficiency power amplifiers for massive MIMO radio solution.” MaxLinear. Accessed: May 1, 2023. [Online]. Available: https://investors.maxlinear.com/press-releases/detail/469/
[25] H. Zhou et al., “A generic theory for design of efficient three-stage Doherty power amplifiers,” IEEE Trans. Microw. Theory Techn., vol. 70, no. 2, pp. 1242–1253, Feb. 2022, doi: 10.1109/TMTT.2021.3126885.
[26] C. Chu et al., “High-efficiency class-iF−1 power amplifier with enhanced linearity,” IEEE Trans. Microw. Theory Techn., vol. 71, no. 5, pp. 1977–1989, May 2023, doi: 10.1109/TMTT.2022.3224132.
[27] M. Li et al., “Bandwidth enhancement of Doherty power amplifier using modified load modulation network,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 67, no. 6, pp. 1824–1834, Jun. 2020, doi: 10.1109/TCSI.2020.2972163.
[28] J. Pang et al., “Analysis and design of highly efficient wideband RF-input sequential load modulated balanced power amplifier,” IEEE Trans. Microw. Theory Techn., vol. 68, no. 5, pp. 1741–1753, May 2020, doi: 10.1109/TMTT.2019.2963868.
Digital Object Identifier 10.1109/MMM.2023.3314319