David Ernesto Troncoso Romero
Cascaded integrator-comb (CIC) decimators provide natural aliasing rejection in folding bands. However, improving that rejection requires increasing the number of integrator-comb pairs, N, and the attractive simplicity of the CIC system is lost as N grows. For a given N, the worst-case attenuation can still be improved by applying zero rotations. However, state-of-the-art zero-rotation methods introduce coefficients in the CIC filter’s transfer function and therefore complicate the resulting decimation architecture. Here, we show how to rotate zeros in a simple way without these coefficients, such that we only need to add two registers and one adder to the original CIC decimation architecture. Worst-case attenuation improvements can exceed 20 dB, depending on the value of N and the width of the folding bands. Other efficient architectures to implement these simplified zero-rotated CIC systems are also presented, and a pruning analysis for these architectures is developed.
The CIC filter is a linear-phase finite impulse response (FIR) filter that consists of N integrator-comb pairs. It is frequently employed in the first stages of a multistage decimation chain, specifically, in any decimation stage with an integer downsampling factor that precedes other decimation stage(s) [1]. In the following, we will denote as R the downsampling factor of the CIC decimation stage and as P the total downsampling factor that remains after the CIC stage in the chain ($P = 2$ or $P = 4$ are frequently used values).
The CIC filter is popular because of two main reasons.
Aliasing attenuation can be improved by increasing N. However, the CIC architecture becomes more complex as N grows, and this puts an upper limit on N in practice. For a given N, the worst-case attenuation can still be enhanced by modifying the transfer function of the CIC filter in such a way that zero rotations are introduced in the magnitude response. Nevertheless, this leads to the necessity of altering the CIC architecture. Making this worst-case attenuation improvement with as simple as possible alterations to the original CIC decimation architecture has been a subject of intense research over the years; see, for example, [2], [3], [4], [5], [6], [7].
In general, in the state-of-the-art methods (for instance, [2], [3], [4], [5], [6], [7]), zero rotations are introduced by different coefficients in the transfer function of the CIC filter for different pairs (R, P). This means that constant products, which can be implemented with additions and shifts, are efficient only for a specific pair (R, P). Multipliers or extra adders and barrel shifters, as well as extra registers, must be included in the resulting CIC-based decimation data paths to allow flexibility for changing the coefficients’ values adequate for different pairs (R, P).
Thus, common to state-of-the-art methods is the fact of introducing coefficients in the CIC data path to improve the worst-case attenuation, a practice that augments the complexity of the resulting system. However, we can improve the worst-case attenuation in a simpler manner without the need for these coefficients, such that little extra hardware has to be added. That is the focus of this contribution. We will show how to achieve zero rotations without introducing filter coefficients that depend on the pair (R, P).
Before leaving this section, the following two aspects are worth mentioning:
The transfer function of the CIC filter is \[{H}{(}{z}{)} = {\left({\frac{{1}{-}{z}^{{-}{R}}}{{1}{-}{z}^{{-}{1}}}}\right)}^{N} \tag{1} \] where R is the integer downsampling factor, and N is the number of integrator-comb pairs. The magnitude response of the CIC filter, obtained from (1) upon replacing ${z} = {e}^{{\text{j}}{2}{\pi}{f}}$ (where f is the frequency in Hz), is \[\left|{H}\left({e}^{{\text{j}}{2}{\pi}{f}}\right)\right| = \left|\left(\frac{\sin\left[{{R}{\pi}{f}}\right]}{\sin\left[{\pi}{f}\right]}\right)^{N}\right|{.} \tag{2} \]
Since the CIC decimator is followed by a downsampling factor P due to the next decimation stage(s), the passband of the CIC filter goes from $f = 0$ to $f = f_{s}/(2RP).$ Its stopbands, the folding bands generated by the decimation process, are $f_{s}/(RP)$ wide and centered on multiples of $f_{s}/R,$ where $f_{s}$ is the sampling rate at the input of the CIC decimator. From (2), it is known that the CIC filter has a monotonic droop in the passband and offers natural aliasing rejection in the folding bands, with the worst-case attenuation occurring at $f = f_{s}\times[1/R - 1/(2RP)].$ This is exemplified in Figure 1(a) for $R = 8$ and $P = 2,$ with $N = 1$ and $N = 3,$ where we observe that the worst-case attenuation values are 10 and 31 dB, respectively (passband and folding bands are gray shaded, and magnitude responses are scaled by $1/R^{N}$ to normalize 0 dB at $f = 0$ Hz).
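The two attenuation figures quoted for Figure 1(a) can be reproduced directly from (2). The following is a minimal sketch, assuming NumPy is available and expressing f relative to the input rate (so that $f_{s} = 1$); the function name `cic_attenuation_db` is chosen here for illustration only.

```python
import numpy as np

def cic_attenuation_db(f, R, N):
    """Attenuation (in dB, positive) of the CIC response (2), scaled by 1/R^N.

    f is the frequency relative to the input sampling rate (f_s = 1).
    """
    gain = np.abs(np.sin(R * np.pi * f) / (R * np.sin(np.pi * f))) ** N
    return -20.0 * np.log10(gain)

R, P = 8, 2
# Lower edge of the first folding band, where the worst case occurs.
f_worst = 1.0 / R - 1.0 / (2 * R * P)

att_n1 = cic_attenuation_db(f_worst, R, N=1)
att_n3 = cic_attenuation_db(f_worst, R, N=3)
print(f"N = 1: {att_n1:.2f} dB, N = 3: {att_n3:.2f} dB")
```

Because the response in (2) is the single-pair response raised to the power N, the attenuation in decibels simply scales linearly with N.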
Figure 1. (a) The magnitude response $\left|H\left(e^{\text{j}2\pi f}\right)\right|$ for $N = 1$ and $N = 3.$ (b) The architecture of the CIC decimator.
The CIC decimator is derived after applying multirate identities to the comb sections ${(}{1}{-}{z}^{{-}{R}}{)}^{N}$ in the transfer function (1). As shown in the register transfer level diagram of Figure 1(b), the CIC architecture consists of N integrators working at the input rate ${f}_{s}$ and N comb filters working at the lower rate, where registers are enabled every Rth clock cycle. The CIC data path, as detailed in [1], operates in two’s complement arithmetic. The input word length W must be extended by ${K} = \left\lceil{\log}_{2}{(}{R}^{N}{)}\right\rceil$ extra bits to prevent internal overflow (in the figure, this extension is represented with a triangle), and some least significant bits can be pruned at the output, as well as at the input of each integrator and comb, to reduce complexity at the expense of introducing truncation error. Every pruning position is shown with a small square in Figure 1(b), and the number of bits that can be pruned at the kth pruning position is denoted by ${B}_{k}{.}$
Let us consider two simple ways of modifying H(z), which result in the transfer functions ${H}_{1}(z)$ and ${H}_{2}{(}{z}{)}{.}$ We refer to these filters as modified CIC 1 (MCIC-1) and modified CIC 2 (MCIC-2), respectively. In both transfer functions, we employ an integer parameter r, which satisfies ${0}{<}{r}{<}{R}{.}$
To obtain ${H}_{1}(z),$ we modify one of the comb sections of the original CIC filter to become ${(}{1}{-}{z}^{{-}{2}{R} + {r}}{)}$ instead of ${(}{1}{-}{z}^{{-}{R}}{)}{.}$ The resulting transfer function is \[{H}_{1}{(}{z}{)} = {\left({\frac{{1}{-}{z}^{{-}{R}}}{{1}{-}{z}^{{-}{1}}}}\right)}^{{N}{-}{1}}\times\left({\frac{{1}{-}{z}^{{-}{2}{R} + {r}}}{{1}{-}{z}^{{-}{1}}}}\right){.} \tag{3} \]
For ${H}_{2}(z),$ we introduce a simple extra comb filter ${(}{1} + {z}^{{-}{R} + {r}}{)}$ in the original CIC transfer function, and we get \[{H}_{2}{(}{z}{)} = {\left({\frac{{1}{-}{z}^{{-}{R}}}{{1}{-}{z}^{{-}{1}}}}\right)}^{N}\times{(}{1} + {z}^{{-}{R} + {r}}{)}{.} \tag{4} \]
The first zero of the term $[(1 - z^{-2R + r})/(1 - z^{-1})]$ in (3) appears at $f = f_{s}/(2R - r),$ and the first zero of the term $(1 + z^{-R + r})$ in (4) appears at $f = f_{s}/[2(R - r)].$ Thus, by adjusting r, we can place the aforementioned zeros close to the frequency where the worst-case attenuation occurs, namely, $f = f_{s}\times[1/R - 1/(2RP)].$ In general, the value for r in $H_{1}(z)$ is different from the value for r in $H_{2}(z).$
The term zero rotation comes from expressing the position of the zeros in angular frequencies ${\omega}$ relative to the sampling rate, i.e., ${\omega} = {2}{\pi}{f}{/}{f}_{s}{.}$ As shown in Figure 2(a), the first zero of the term ${[(}{1}{-}{z}^{{-}{2}{R} + {r}}{)/(}{1}{-}{z}^{{-}{1}}{)]}$ in (3) [shown in Figure 2(a) in black color] gets rotated by an angle ${\alpha}_{1} = {2}{\pi}{[(}{R}{-}{r}{)}{/}{R}{(}{2}{R}{-}{r}{)]}$ with respect to the first zero of a classic CIC filter [shown in Figure 2(a) in gray color]. Similarly, Figure 2(b) shows the rotation angle ${\alpha}_{2} = {\pi}{[(}{R}{-}{2}{r}{)/}{R}{(}{R}{-}{r}{)]}$ for the first zero of the term ${(}{1} + {z}^{{-}{R} + {r}}{)}$ in (4). In this case, ${r}{<}{R}{/}{2}$ must hold to have a clockwise rotation ${\alpha}_{2}{.}$
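As a quick numerical check of these two expressions, consider $R = 8$ with $r = 5$ for the MCIC-1 filter and $r = 3$ for the MCIC-2 filter (values picked here only for illustration). The first zero of the classic CIC filter sits at $\omega = 2\pi/8,$ i.e., at $45^{\circ},$ and the rotation angles evaluate to \[\alpha_{1} = 2\pi\,\frac{8 - 5}{8\,(16 - 5)} = \frac{3\pi}{44} \approx 0.214\ \text{rad} \approx 12.3^{\circ}, \qquad \alpha_{2} = \pi\,\frac{8 - 6}{8\,(8 - 3)} = \frac{\pi}{20} \approx 0.157\ \text{rad} = 9^{\circ}.\] These values agree with the zero positions themselves: $2\pi/(2R - r) = 2\pi/11 \approx 32.7^{\circ} = 45^{\circ} - 12.3^{\circ}$ for MCIC-1, and $\pi/(R - r) = \pi/5 = 36^{\circ} = 45^{\circ} - 9^{\circ}$ for MCIC-2.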
Figure 2. (a) Zero rotations for the MCIC-1 filter. (b) Zero rotations for the MCIC-2 filter. (c) The magnitude responses of $\left|H\left(e^{\text{j}2\pi f}\right)\right|$ and $\left|H_{1}\left(e^{\text{j}2\pi f}\right)\right|$ with $N = 3$ and $r = 5.$ (d) The magnitude responses of $\left|H\left(e^{\text{j}2\pi f}\right)\right|$ and $\left|H_{2}\left(e^{\text{j}2\pi f}\right)\right|$ with $N = 3$ and $r = 3.$ (e) The passband droop increase of the filter MCIC-1. (f) The passband droop increase of the filter MCIC-2. (g) The worst-case attenuation improvement of the filter MCIC-1. (h) The worst-case attenuation improvement of the filter MCIC-2.
Figure 2(c) and (d) shows, respectively, the magnitude responses $\left|H_{1}\left(e^{\text{j}2\pi f}\right)\right|$ and $\left|H_{2}\left(e^{\text{j}2\pi f}\right)\right|$ for $N = 3,$ $R = 8,$ and $P = 2,$ which are contrasted with the traditional CIC’s magnitude response $\left|H\left(e^{\text{j}2\pi f}\right)\right|.$ The respective values for r in $\left|H_{1}\left(e^{\text{j}2\pi f}\right)\right|$ and in $\left|H_{2}\left(e^{\text{j}2\pi f}\right)\right|$ in these figures are $r = 5$ and $r = 3.$ Notice that, in both filters, the worst-case attenuation is improved by about 14 dB, whereas the passband droop, which is similar to the one of the traditional CIC filter, is affected by only approximately 1 dB.
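The comparison above can be approximated numerically by evaluating (1), (3), and (4) on the unit circle. The sketch below is only an illustration under stated assumptions: responses are normalized to 0 dB at dc, frequencies are relative to $f_{s},$ the folding bands are searched on a dense grid, and all function names are hypothetical.

```python
import numpy as np

def boxcar(f, length):
    """Normalized response sin(length*pi*f) / (length*sin(pi*f))."""
    return np.sin(length * np.pi * f) / (length * np.sin(np.pi * f))

def h_cic(f, R, N):
    return boxcar(f, R) ** N

def h_mcic1(f, R, N, r):
    # Equation (3): one comb section becomes (1 - z^-(2R - r)).
    return boxcar(f, R) ** (N - 1) * boxcar(f, 2 * R - r)

def h_mcic2(f, R, N, r):
    # Equation (4): extra factor (1 + z^-(R - r)); |.|/2 equals |cos((R - r)*pi*f)|.
    return boxcar(f, R) ** N * np.cos((R - r) * np.pi * f)

def to_db(h):
    return 20.0 * np.log10(np.maximum(np.abs(h), 1e-15))

def worst_case_attenuation_db(h, f, R, P):
    """Smallest attenuation found inside the folding bands (width 1/(RP))."""
    k = np.round(f * R)                      # index of the nearest multiple of 1/R
    in_band = (k >= 1) & (np.abs(f - k / R) <= 1.0 / (2 * R * P))
    return -np.max(to_db(h[in_band]))

R, P, N = 8, 2, 3
f = np.linspace(1e-6, 0.5, 400001)           # dense grid, dc excluded
f_pass = 1.0 / (2 * R * P)                   # passband edge

att_cic = worst_case_attenuation_db(h_cic(f, R, N), f, R, P)
improvement_1 = worst_case_attenuation_db(h_mcic1(f, R, N, 5), f, R, P) - att_cic
improvement_2 = worst_case_attenuation_db(h_mcic2(f, R, N, 3), f, R, P) - att_cic

droop_cic = -to_db(h_cic(f_pass, R, N))
droop_increase_1 = -to_db(h_mcic1(f_pass, R, N, 5)) - droop_cic
droop_increase_2 = -to_db(h_mcic2(f_pass, R, N, 3)) - droop_cic

print(f"MCIC-1: +{improvement_1:.1f} dB attenuation, +{droop_increase_1:.2f} dB droop")
print(f"MCIC-2: +{improvement_2:.1f} dB attenuation, +{droop_increase_2:.2f} dB droop")
```

Sweeping r from 1 to R − 1 with the same functions is one simple way to look for the value that maximizes the worst-case attenuation for a given pair (R, P).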
For a more general perspective on diverse values of N, P, and R, the box plots in Figure 2(e) and (f) present, respectively for the filters MCIC-1 and MCIC-2, the maximum increase in passband droop relative to the worst passband droop of the traditional CIC filter, for ${N} = {3}$ to ${N} = {6},$ when R ranges from 2 to 1,024, considering ${P} = {2}$ and ${P} = {4}{.}$ Similarly, the box plots in Figure 2(g) and (h) show, respectively for the filters MCIC-1 and MCIC-2, the worst-case attenuation improvements relative to the worst-case attenuation of the traditional CIC filter, using the same values for N, R, and P as before.
If we contrast Figure 2(e) and (f), we see that the passband droop of the MCIC-2 filter is slightly worse than that of the MCIC-1 filter. However, in both filters, that droop gets worse by less than 1.6 dB for the case of a wide passband $(P = 2)$ and by less than 0.4 dB for the case of a narrow passband $(P = 4).$ From Figure 2(g) and (h), we observe clear improvements in the worst-case attenuation, which increase as N grows. For the MCIC-1 filter, the worst-case attenuation improvement ranges from approximately 14 to 24 dB for wide folding bands $(P = 2)$ and from about 2 to 8 dB for narrow folding bands $(P = 4).$ For the MCIC-2 filter, the ranges are approximately 14–28 dB for wide folding bands and about 13–17 dB for narrow folding bands.
The MCIC-2 filter employs an extra filter besides the N integrator-comb pairs, which introduces its own zeros and also its own passband droop. Because of that, the MCIC-2 filter can provide better worst-case attenuation improvements, mainly in the narrowband case, but also has a slightly more noticeable passband droop.
To build the decimation architectures derived from ${H}_{1}(z)$ and ${H}_{2}(z),$ the respective filtering blocks ${(}{1}{-}{z}^{{-}{2}{R} + {r}}{)} = {(}{1}{-}{z}^{{-}{[}{R} + {(}{R}{-}{r}{)]}}{)}$ and ${(}{1} + {z}^{{-}{R} + {r}}{)} = {(}{1} + {z}^{{-}{(}{R}{-}{r}{)}}{)}$ must be implemented in polyphase form, preceding the other regular comb sections. In both cases, instead of using ${(}{R}{-}{r}{)}$ registers in one of the two polyphase branches as in Figure 3(a) and (c), we employ only one register that catches the desired samples, as in Figure 3(b) and (d). The equivalence between polyphase arrays of Figure 3(a) and (b) or between polyphase arrays of Figure 3(c) and (d) can be explained as follows. In the polyphase arrays of Figure 3(a) and (c), for any sample x[n] at the input, the sample ${x}{[}{n}{-}{(}{R}{-}{r}{)]}$ appears at the output of the ${(}{R}{-}{r}{)}$ registers. If x[n] is passed to the low-rate section at a downsampling instant, then ${x}{[}{n}{-}{(}{R}{-}{r}{)]}$ is also passed to the low-rate section. In the arrays of Figure 3(b) and (d), instead of using (R − r) registers to obtain ${x}{[}{n}{-}{(}{R}{-}{r}{)],}$ we let the control unit specifically enable a single register at the proper time (using the signal “en-s”) to store ${x}{[}{n}{-}{(}{R}{-}{r}{)]}$ when it appears in the input. This proper time is r clock edges after the downsampling instant, which is the same as ${(}{R}{-}{r}{)}$ clock edges before the downsampling instant because the sample ${x}{[}{n}{-}{(}{R}{-}{r}{)]}$ was available at the input at that instant.
Figure 3. (a) A filter ${(}{1}{-}{z}^{{-}{2}{R} + {r}}{)}$ with the polyphase delay implemented using several registers. (b) A filter ${(}{1}{-}{z}^{{-}{2}{R} + {r}}{)}$ with the polyphase delay implemented using a single register. (c) A filter ${(}{1} + {z}^{{-}{R} + {r}}{)}$ with the polyphase delay implemented using several registers. (d) A filter ${(}{1} + {z}^{{-}{R} + {r}}{)}$ with the polyphase delay implemented using a single register. (e) A timing example for ${R} = {5}$ and ${r} = {3}{.}$ CLK: clock.
Figure 3(e) shows a timing diagram to exemplify this operation considering ${R} = {5}$ and ${r} = {3}{.}$ We have the input signal x[n] and the delayed input ${x}{[}{n}{-}{(}{5}{-}{3}{)]}{.}$ At the clock edges ${n} = {0},$ ${n} = {5},$ and ${n} = {10},$ the downsampling signal “en” samples from the delayed input the values zero, x[3], and x[8]. These same values can be stored directly from the input x[n] in the single register enabled by “en-s.” If the signal “en-s” is asserted three clock edges after “en,” the single register catches from x[n] the values zero, x[3], and x[8], which remain ready to be sampled by the signal “en” at the corresponding clock edges ${n} = {0},$ ${n} = {5},$ and ${n} = {10}{.}$
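The equivalence described in this timing example can be checked with a small behavioral model. The sketch below is a software illustration only, not register transfer level code: it assumes zero initial register contents, models “en” as asserted when n is a multiple of R and “en-s” as asserted r clock edges later, and uses hypothetical function names.

```python
def delayed_branch(x, R, r):
    """Figure 3(a)/(c) style: (R - r) registers, then sample at every 'en'."""
    d = R - r
    return [x[n - d] if n >= d else 0 for n in range(0, len(x), R)]

def single_register_branch(x, R, r):
    """Figure 3(b)/(d) style: one register loaded by 'en-s' and read by 'en'."""
    held = 0                      # register content before the first 'en-s'
    out = []
    for n, sample in enumerate(x):
        if n % R == 0:            # 'en': downsampling instant, read held value
            out.append(held)
        if n % R == r:            # 'en-s': r clock edges after 'en', store x[n]
            held = sample
    return out

R, r = 5, 3
x = [100 + n for n in range(16)]  # arbitrary test input, x[n] = 100 + n

via_delay = delayed_branch(x, R, r)
via_register = single_register_branch(x, R, r)
print(via_delay, via_register)
```

Both branches deliver zero, x[3], and x[8] at the clock edges n = 0, n = 5, and n = 10, as in the example of Figure 3(e); since $0 < r < R,$ the two enable signals never coincide, so the order of the two checks inside the loop does not matter.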
The aforementioned implementation strategy, illustrated in Figure 3, consists of applying the commutator model for polyphase decimation, as explained in texts on multirate systems, such as [9], [10]. In general, “en-s” is asserted every Rth clock cycle in the same way as “en” but at a different clock edge. Thus, the control unit in the modified decimators must have an extra bit for the special signal “en-s.” This does not represent any practical limitation as it only needs a little extra logic added to the original control. Besides, the value r can be reconfigured easily, in the same way as the value R can be tuned in the classic CIC system.
Figure 4(a) and (b) presents, respectively, the decimation architectures derived from ${H}_{1}(z)$ and from ${H}_{2}(z),$ showing in gray color the hardware employed for the original CIC decimator and in black color the additional hardware. Notice that only two extra registers are required in the MCIC-1 system, and only two extra registers plus an extra adder are required in the MCIC-2 system. Pruning positions are shown with small squares, and the number of pruned bits at the kth pruning position is denoted by ${B}_{k}{.}$
Figure 4. (a) The architecture of the MCIC-1 decimator. (b) The architecture of the MCIC-2 decimator. (c) The architecture of the MCIC-1 decimator with time-multiplexed comb unit. (d) The architecture of the MCIC-2 decimator with time-multiplexed comb unit. (e) A timing example for ${R} = {8},$ ${N} = {3},$ and ${r} = {5}{.}$
Let us call L the number of pruning positions, with ${L} = {2}{N} + {1}$ for the MCIC-1 architecture and ${L} = {2}{N} + {2}$ for the MCIC-2 architecture. Thus, we have \[{B}_{k} = \left\lfloor{{B}_{L} + \tfrac{1}{2}{\log}_{2}\tfrac{1}{{L}{-}{1}}{-}{\log}_{2}{F}_{k}}\right\rfloor \tag{5} \] for ${k} = {1},{2},\ldots,{L}{-}{1},$ where ${F}_{k}^{2}$ denotes the “variance error gain” of the kth error source, computed as the sum of squared impulse response coefficients of the filter seen from the kth pruning position to the output (for sources at the high-rate section, the downsampler must be placed at the output of that filter via multirate identities). The derivation of (5) is detailed in “Pruning Analysis.” Notice that ${B}_{L},$ the number of bits pruned at the output, can be set beforehand depending on the application needs.
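To make (5) concrete, the sketch below evaluates it for the integrator positions of the MCIC-1 architecture. It rests on an assumption that should be checked against Figure 4(a): the error source at the input of the kth integrator is taken to see the remaining $N - k + 1$ integrators followed by the modified comb and the $N - 1$ regular combs, all referred to the high rate. The function names and the example values of R, N, r, and $B_{L}$ are hypothetical.

```python
import math
import numpy as np

def comb(R):
    """Coefficients of (1 - z^-R) at the high rate."""
    c = np.zeros(R + 1)
    c[0], c[-1] = 1.0, -1.0
    return c

def mcic1_integrator_gains(R, N, r):
    """F_k^2 for k = 1..N (assumed filter seen by each integrator source).

    Each remaining integrator is paired with a comb to form a moving average,
    so the impulse response is finite and its squared coefficients can be summed.
    """
    gains = []
    for k in range(1, N + 1):
        h = np.ones(2 * R - r)               # (1 - z^-(2R-r)) / (1 - z^-1)
        for _ in range(N - k):
            h = np.convolve(h, np.ones(R))   # (1 - z^-R) / (1 - z^-1)
        for _ in range(k - 1):
            h = np.convolve(h, comb(R))      # combs left without an integrator
        gains.append(float(np.sum(h ** 2)))
    return gains

def pruned_bits(B_L, gains_sq, L):
    """Equation (5), using log2(F_k) = 0.5 * log2(F_k^2)."""
    return [math.floor(B_L + 0.5 * math.log2(1.0 / (L - 1)) - 0.5 * math.log2(g))
            for g in gains_sq]

# Illustrative values only: R = 8, N = 3, r = 5, and B_L = 10 bits pruned at the output.
R, N, r, B_L = 8, 3, 5, 10
L = 2 * N + 1                                # MCIC-1
print(pruned_bits(B_L, mcic1_integrator_gains(R, N, r), L))
```

The comb positions at the low rate would need their own gains $F_{k}^{2},$ obtained from the filter seen after the downsampler; they are left out of this sketch.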
Pruning Analysis
The input word length $W$ must be extended by $K$ extra bits in the modified CIC 1 (MCIC-1) and modified CIC 2 (MCIC-2) decimation architectures to prevent internal overflow. The value for $K$ is the number of bits necessary to express the gain $G$ of the corresponding filter. Thus, we have $K = \left\lceil\log_{2}(G)\right\rceil,$ with $G = R^{N-1}\times(2R - r)$ for MCIC-1 or $G = 2R^{N}$ for MCIC-2. For the $k$th pruning position, the value $B_{k}$ is computed in such a way that the variance of the error due to pruning inside the data path remains bounded by the variance due to output pruning. As in [1], we consider that the error at the $k$th noise source due to pruning has a uniform probability distribution with a width of $E_{k} = 2^{B_{k}},$ and its variance is $\sigma_{k}^{2} = E_{k}^{2}/12 = 2^{2B_{k}}/12.$ The variance contributed by the $k$th error source is $\sigma_{T,k}^{2} = \sigma_{k}^{2}F_{k}^{2},$ i.e., \[\sigma_{T,k}^{2} = \left(\frac{1}{12}2^{2B_{k}}\right)F_{k}^{2} \tag{A1} \] where $F_{k}^{2}$ is the variance error gain due to the filter’s coefficients. The total variance is the sum of the individual variances of every source. Thus, if we let the variance of every error source from $k = 1$ to $k = L - 1$ contribute at most $1/(L - 1)$ of the variance at the output source, we have \[\sigma_{T,k}^{2}\leq\tfrac{1}{L - 1}\sigma_{T,L}^{2} \tag{A2} \] for $k = 1, 2, \ldots, L - 1.$ Using (A1) in (A2) with the consideration that the output error source sees a unitary gain in front of it, i.e., $F_{L}^{2} = 1,$ we obtain \[\left(\frac{1}{12}2^{2B_{k}}\right)F_{k}^{2}\leq\frac{1}{L - 1}\left(\frac{1}{12}2^{2B_{L}}\right) \tag{A3} \] for $k = 1, 2, \ldots, L - 1,$ which leads to (5).
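For completeness, the step from (A3) to (5) can be written out as follows. Canceling the common factor $1/12$ and isolating the power of two gives \[2^{2B_{k}}\leq\frac{2^{2B_{L}}}{(L - 1)\,F_{k}^{2}}.\] Taking the base-2 logarithm of both sides and dividing by two yields \[B_{k}\leq B_{L} + \tfrac{1}{2}\log_{2}\tfrac{1}{L - 1} - \log_{2}F_{k},\] and, since $B_{k}$ must be an integer number of bits, the largest admissible value is the floor of the right-hand side.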
Let us denote the collective variance of the comb section as $\sigma_{c}^{2} = \sigma_{T,N+1}^{2} + \sigma_{T,N+2}^{2} + \cdots + \sigma_{T,L-1}^{2},$ or, using (A1), as $\sigma_{c}^{2} = (2^{2B_{N+1}}/12)F_{N+1}^{2} + (2^{2B_{N+2}}/12)F_{N+2}^{2} + \cdots + (2^{2B_{L-1}}/12)F_{L-1}^{2}.$ For the case of the time-multiplexed comb section, we have $B_{N+1} = B_{N+2} = \cdots = B_{L-1};$ thus, we can express $\sigma_{c}^{2}$ as \[\sigma_{c}^{2} = \left(\frac{1}{12}2^{2B_{N+1}}\right)F_{c}^{2} \tag{A4} \] with \[F_{c}^{2} = \sum_{k = N+1}^{L-1}F_{k}^{2}. \tag{A5} \]
Applying (A2) to $\sigma_{c}^{2},$ we can write $\sigma_{c}^{2}\leq[L - (N + 1)]\,[\sigma_{T,L}^{2}/(L - 1)],$ where $[L - (N + 1)]$ is the number of comb filters. Using (A1) and (A4), with the consideration $F_{L}^{2} = 1,$ we can write \[\left(\frac{1}{12}2^{2B_{N+1}}\right)F_{c}^{2}\leq\frac{L - (N + 1)}{L - 1}\left(\frac{1}{12}2^{2B_{L}}\right) \tag{A6} \] which leads to (6).
If R is large, we can consider that there is enough time for a single arithmetic unit to perform the operations of the whole cascaded comb filtering in a multiplexed way, following the principles from [11]. In that case, we can reduce the number of implemented hardware elements. Figure 4(c) and (d) shows the resulting decimation architectures where the time-multiplexed comb filtering approach is employed. These architectures are called time-multiplexed MCIC, or for short, TMUX-MCIC-1 and TMUX-MCIC-2. For the TMUX-MCIC-1 decimator, ${N}{-}{1}$ arithmetic units are traded for a pair of two-to-one multiplexers, whereas the TMUX-MCIC-2 decimator trades N arithmetic units for a pair of two-to-one multiplexers.
Figure 4(c) and (d) illustrates in gray color the employed arithmetic units and the registers that would be used in a classic CIC decimator and in black color the extra registers needed as well as the traded multiplexers. Notice that the multiplexed comb block in the TMUX-MCIC-2 system uses an adder/subtractor unit because its first comb in the cascade employs an adder, but the rest of the combs use subtractors.
In both architectures, we observe two additional single-bit control signals: “en-f,” which must be asserted either N (for TMUX-MCIC-1) or $N + 1$ (for TMUX-MCIC-2) times per assertion of “en” to enable the register transfers of the time-multiplexed unit, and “en-m,” which is asserted to route new samples that come from the high-rate section (and to enable the addition mode in the case of TMUX-MCIC-2). It is worth noticing that, even though the control unit has to include two extra bits for the signals “en-f” and “en-m,” this does not add significant utilization of hardware resources because the complexity is dominated by the data path of the decimation chain.
Figure 4(e) shows an example of the timing for the multiplexed comb block of the TMUX-MCIC-1 system, considering ${R} = {8},$ ${r} = {5},$ and ${N} = {3}$ (the case for TMUX-MCIC-2 is similar and can be straightforwardly deduced from this example). Right at the clock edge ${n} = {1},$ the downsampling signal “en” activates the low-rate registers, whereas the signal “en-s” occurs out of phase by r clock edges with reference to “en,” as explained earlier. Since at that same clock edge ${n} = {1}$ the signal “en-m” is asserted, the multiplexed comb block gets (due to “en”) a new sample from the high-rate section, whereas the signal “en-f” activates the registers placed inside the loops of the time-multiplexed block. After that clock edge, the signal “en-m” is deasserted, allowing one multiplexer to route the feedback path and the other multiplexer to form the subsequent combs in the cascade, whereas the signal “en-f” makes two more register transfers inside the multiplexed system. Right after the clock edge ${n} = {5},$ “en-m” is asserted, and the multiplexers route new data that come from the high-rate section, which will be sampled at the clock edge ${n} = {9}$ (due to “en”), and the process starts again.
We have to take into account that the cascaded comb section in the time-multiplexed version needs to process data at a rate M times higher than the output rate, with ${M} = {N}$ for the TMUX-MCIC-1 system and ${M} = {N} + {1}$ for the TMUX-MCIC-2 system. The processing rate of the time-multiplexed unit remains below the input rate if ${R}{>}{M}$ holds. Besides, to have the pulses “en-f” separated from each other by at least one clock cycle, as in Figure 4(e), the relation ${R}\geq{2}{M}$ must hold.
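These two conditions are easy to check before committing to a time-multiplexed comb unit. The helper below is a minimal sketch with hypothetical names; it only encodes the relations $R > M$ and $R\geq 2M$ stated above, with $M = N$ for TMUX-MCIC-1 and $M = N + 1$ for TMUX-MCIC-2.

```python
def tmux_rate_check(R, N, variant):
    """Check the timing relations for the time-multiplexed comb unit.

    variant is 1 for TMUX-MCIC-1 (M = N) or 2 for TMUX-MCIC-2 (M = N + 1).
    """
    if variant not in (1, 2):
        raise ValueError("variant must be 1 or 2")
    M = N if variant == 1 else N + 1
    return {
        "M": M,
        "below_input_rate": R > M,       # comb unit runs slower than the input rate
        "spaced_en_f_pulses": R >= 2 * M,  # 'en-f' pulses at least one cycle apart
    }

# Example of Figure 4(e): R = 8 and N = 3 for the TMUX-MCIC-1 system.
check = tmux_rate_check(R=8, N=3, variant=1)
print(check)
```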
Observe that, since a single time-multiplexed comb unit is employed, the number of pruning bits is the same in all comb sections, i.e., we have $B_{N+1} = B_{N+2} = \cdots = B_{L-1}.$ Therefore, we can use (5) to compute $B_{k}$ for $k = 1, 2, \ldots, N,$ and we only need to compute $B_{N+1}$ additionally, using \[B_{N+1} = \left\lfloor B_{L} + \tfrac{1}{2}\log_{2}\tfrac{L - (N + 1)}{L - 1} - \log_{2}F_{c}\right\rfloor. \tag{6} \]
The derivation of (6) is detailed in “Pruning Analysis.”
Implementations on different platforms vary depending on the type and granularity of the technology target. Therefore, instead of presenting an example mapped to a particular platform (for example, field-programmable gate array), we prefer to provide a more general and intuitive estimation of the hardware cost C of the CIC and proposed architectures by using the bus sizes of these systems, weighted by technology-dependent cost values related to the involved components (adders, registers, and multiplexers). More specifically, the hardware cost C can be expressed by the cost of a single-bit register $(C_{\text{reg}})$ weighting the sum of the sizes of all registers (the size of the ith register is $S_{\text{reg},i}$) plus the cost of a single-bit adder $(C_{\text{add}})$ weighting the sum of the sizes of all adders (the size of the ith adder is $S_{\text{add},i}$) plus the cost of a single-bit multiplexer $(C_{\text{mux}})$ weighting the sum of the sizes of all multiplexers (the size of the ith multiplexer is $S_{\text{mux},i}$), i.e., \[C = \underbrace{C_{\text{reg}}}_{\text{single-bit register cost}}\times\underbrace{\sum_{i}S_{\text{reg},i}}_{\text{sum of register sizes}} + \underbrace{C_{\text{add}}}_{\text{single-bit adder cost}}\times\underbrace{\sum_{i}S_{\text{add},i}}_{\text{sum of adder sizes}} + \underbrace{C_{\text{mux}}}_{\text{single-bit multiplexer cost}}\times\underbrace{\sum_{i}S_{\text{mux},i}}_{\text{sum of multiplexer sizes}}. \tag{7} \]
Regardless of the architecture, W denotes the input word length, and K denotes the number of extra bits required to prevent internal overflow. Thus, the total bus width can be denoted by $U = W + K.$ The size of every register, adder, or multiplexer is computed as $U - B_{k},$ where $B_{k}$ is the number of bits pruned in the kth pruning position, i.e., the pruning position where that register, adder, or multiplexer is placed in the data path. Adders, subtractors, and adder/subtractors are assumed to have the same cost in this formulation, which is a usual simplification. Besides, the complexity of the control units is neglected because these units do not contribute significantly to hardware utilization in a decimation chain where the data paths consume most of the resources.
Consider the following example with $R = 8.$ For the CIC architecture, the different bus widths from input to output can be sorted into a vector in the form $\mathbf{u} = [16\ 31\ 28\ 26\ 24\ 23\ 22\ 21\ 20\ 19\ 19\ 16].$ Similarly, we can sort the number of adders, registers, and multiplexers for every bus width in the respective vectors $\mathbf{a} = [0\ 1\ 1\ 1\ 1\ 1\ 1\ 1\ 1\ 1\ 1\ 0]^{T},$ $\mathbf{r} = [1\ 1\ 1\ 1\ 1\ 1\ 2\ 1\ 1\ 1\ 1\ 1]^{T},$ and $\mathbf{m} = [0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0]^{T}.$ Notice that the first entry of $\mathbf{a}$ is zero because there is no adder at the input before the sign extension. Yet the first entry of $\mathbf{r}$ is one because there is an input register before the sign extension, and there is a “two” in the seventh entry of $\mathbf{r}$ because there is an extra register in the middle of the architecture (the downsampler). The CIC architecture does not use multiplexers in the data path, so $\mathbf{m}$ is an all-zero vector. With these vectors, we find the cost by using $C = C_{\text{add}}\cdot\mathbf{u}\cdot\mathbf{a} + C_{\text{reg}}\cdot\mathbf{u}\cdot\mathbf{r} + C_{\text{mux}}\cdot\mathbf{u}\cdot\mathbf{m}.$ We use the same approach for the other architectures.
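The cost expression of this example can be evaluated in a few lines. The sketch below assumes NumPy and uses the per-bit area costs that the article quotes from [12] (0.0086, 0.0037, and 0.0012 mm² for adder, register, and multiplexer); the variable and function names are illustrative. It also checks, as an assumption about the example, that the full bus width of 31 bits corresponds to $W = 16$ and $N = 5.$

```python
import numpy as np

# Per-bit area costs in mm^2 quoted in the article from [12] (1-um CMOS).
C_ADD, C_REG, C_MUX = 0.0086, 0.0037, 0.0012

# Vectors of the CIC example with R = 8.
u = np.array([16, 31, 28, 26, 24, 23, 22, 21, 20, 19, 19, 16])  # bus widths
a = np.array([0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0])              # adders per width
r = np.array([1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1])              # registers per width
m = np.zeros(12, dtype=int)                                      # no multiplexers

def hardware_cost(u, a, r, m):
    """C = C_add * (u . a) + C_reg * (u . r) + C_mux * (u . m)."""
    return C_ADD * int(u @ a) + C_REG * int(u @ r) + C_MUX * int(u @ m)

cost_cic = hardware_cost(u, a, r, m)
print(f"Estimated CIC cost: {cost_cic:.4f} mm^2")   # 0.0086*233 + 0.0037*287

# Assumed parameters behind u[1] = 31: W = 16 input bits and N = 5 stages.
W, R, N = 16, 8, 5
K = (R ** N - 1).bit_length()   # integer form of ceil(log2(R^N)) for R^N >= 1
full_width = W + K
```

The integer `bit_length` form is used instead of a floating-point logarithm so that exact powers of two, which are common values for R, are not affected by rounding.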
Table 1 summarizes the estimated hardware costs for CIC, MCIC-1, MCIC-2, TMUX-MCIC-1, and TMUX-MCIC-2 architectures, with $N = 5$ stages and values for R that are powers of two from eight to 256 to cover a wide enough extent. For all the cases, we consider 16-bit inputs and 16-bit outputs. The values $C_{\text{mux}},$ $C_{\text{add}},$ and $C_{\text{reg}}$ are the respective very large-scale integration silicon compiler area costs for a single-bit 2:1 multiplexer, a full adder, and a flip-flop, which, according to Meyer-Baese [12], are $C_{\text{mux}} = 0.0012\ \text{mm}^{2},$ $C_{\text{add}} = 0.0086\ \text{mm}^{2},$ and $C_{\text{reg}} = 0.0037\ \text{mm}^{2}$ for a 1-$\mu$m CMOS process. Compared with the CIC architecture, the cost of the MCIC-1 is higher by 4.82% on average, and the cost of the MCIC-2 is higher by 12.46% on average. On the other hand, the costs of the TMUX-MCIC-1 and the TMUX-MCIC-2 architectures are, on average, lower by 11.4% and 8.92%, respectively. These percentages are summarized in Tables 2 and 3.
Table 1. The hardware cost C (area in mm$^{2}$) based on the models from [12] for architectures with N = 5 original integrator-comb pairs, with 16-bit input and 16-bit output.
Table 2. The average percentage of estimated increase of hardware cost C (area in mm$^{2}$) of the proposed architectures MCIC-1 and MCIC-2 in comparison to the original CIC architecture, for architectures with N = 5, with 16-bit input and 16-bit output (based on the models from [12]).
Table 3. The average percentage of estimated savings of hardware cost C (area in mm$^{2}$) of the proposed architectures TMUX-MCIC-1 and TMUX-MCIC-2 with respect to the original CIC architecture, for architectures with N = 5, with 16-bit input and 16-bit output (based on the models from [12]).
Finally, it is worth mentioning the following. From the percentages mentioned previously, it is clear that TMUX architectures could be more convenient in terms of hardware cost. The zero-rotation improvements are available with the additional benefit that the resulting architectures have even lower costs than the original CIC architecture. However, we must take into account that the TMUX architectures might be slower than the ones without multiplexing because, on the one hand, there are additional multiplexers embedded in their data paths, which increase the critical path, and on the other hand, these multiplexers must operate M times faster than the decimated rate. Besides, the power consumption of a single comb unit multiplexed M times may not be exactly equal to the power consumption of M comb units running in parallel at a slower pace. These aspects depend on the target platform where the system will be implemented. Therefore, a good decision should be supported by some additional information about the silicon technology to be used. Yet emphasis is made here on the fact that the proposed filters introduce a good improvement of worst-case aliasing attenuation, and the corresponding architectures preserve the desirable characteristics of CIC decimators: simplicity and the ability for easy reconfiguration.
We have employed a simple trick to design CIC decimators with improved aliasing rejection; instead of introducing complicated branches and bifurcations in the data path of the original CIC system, just a couple of registers enabled at the appropriate sampling times implement the polyphase decomposition of a simple filter that has only a couple of unitary coefficients. With this, zero rotations can be performed conveniently because the data path of the original CIC architecture is modified just slightly. We also have observed that in the proposed CIC-based decimators, the comb units do not have to operate concurrently. Since they operate at the low-rate section, time multiplexing does not come at the cost of having to increase the maximum rate of operation. By using a simpler time-multiplexed comb unit, the proposed data path for the CIC decimator can achieve, in comparison with the traditional CIC data path, lower hardware utilization. Therefore, the proposed CIC-based decimators are a useful option over the classic CIC architecture.
Additionally, to save hardware resources when implementing the decimators, closed-form expressions have been derived to compute the number of bits to be pruned in the proposed architectures, following the approach by Hogenauer [1]. In this approach, a constant fraction of the variance of the error due to output pruning serves as an upper bound on the variance contributed by every error source, and this uniform bound yields the maximum number of bits that may be pruned at each source. More sophisticated approaches could be used, with different criteria for selecting the number of bits to prune at every source. For example, the selection can be posed as an optimization problem whose cost function is a model of area utilization or power consumption, subject to a given total output variance.
In this regard, the approach can be a joint optimization that takes into account the postfiltering stages that usually accompany the CIC-based architecture, forming an overall sharp decimator with optimal word lengths. Moreover, the variance model employed in these settings can be one better suited to cases where word lengths differ by a relatively small number of bits, such as the one in [13]. That model outperforms the classic simplification, which assumes a large difference in word lengths between noise sources, an assumption that may not be realistic in the aforementioned joint optimization scenario.
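As a point of reference for the pruning discussion, the following Python sketch applies the classic rule from [1] to the original N-stage CIC decimator. It does not cover the proposed zero-rotated architectures, whose closed-form expressions are not reproduced here. The function name `hogenauer_pruning`, the differential delay `D` (assumed to be 1), and the word lengths `b_in` and `b_out` are illustrative assumptions rather than notation from this article. Each of the 2N internal error sources is allowed at most 1/(2N) of the variance of the output truncation error.

```python
import math
import numpy as np

def hogenauer_pruning(N, R, b_in, b_out, D=1):
    """Classic CIC pruning sketch (assumed names; D is the differential delay).

    Returns the full-precision word length and the number of LSBs that may
    be discarded at each of the 2N stages (N integrators, then N combs).
    """
    # Full-precision register width: ceil(N*log2(R*D)) + b_in, via integers.
    b_max = ((R * D) ** N - 1).bit_length() + b_in
    b_last = max(b_max - b_out, 0)        # LSBs discarded at the output

    boxcar = np.ones(R * D)               # 1 + z^-1 + ... + z^-(RD-1)
    comb = np.zeros(R * D + 1)            # 1 - z^-(RD), seen at the high rate
    comb[0], comb[-1] = 1.0, -1.0

    pruned = []
    for j in range(1, 2 * N + 1):
        # Impulse response from error source j to the filter output.
        h = np.array([1.0])
        if j <= N:                        # source at the input of integrator j
            for _ in range(N - j + 1):
                h = np.convolve(h, boxcar)
            for _ in range(j - 1):
                h = np.convolve(h, comb)
        else:                             # source at the input of a comb
            for _ in range(2 * N + 1 - j):
                h = np.convolve(h, comb)
        gain2 = float(np.sum(h * h))      # variance gain F_j^2
        # Require 2^(2*B_j) * F_j^2 <= 2^(2*b_last) / (2N).
        ratio = 2.0 ** (2 * b_last) / (2 * N * gain2)
        pruned.append(max(math.floor(0.5 * math.log2(ratio)), 0))
    return b_max, pruned

b_max, pruned = hogenauer_pruning(N=4, R=25, b_in=16, b_out=16)
print(b_max, pruned)  # 35 [1, 6, 9, 13, 14, 15, 16, 17]
```

The bound is written as a ratio of variances before the logarithm is taken, so that cases landing exactly on an integer are not rounded down by floating-point error. An optimization-based selection, as discussed above, would replace the equal 1/(2N) split with source-dependent budgets.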
David Ernesto Troncoso Romero (david.troncoso@uqroo.edu.mx) is with Universidad Autónoma del Estado de Quintana Roo, Cancún 77519, Quintana Roo, México.
[1] E. B. Hogenauer, “An economical class of digital filters for decimation and interpolation,” IEEE Trans. Acoust., Speech, Signal Process., vol. 29, no. 2, pp. 155–162, Apr. 1981, doi: 10.1109/TASSP.1981.1163535.
[2] T. Saramaki and T. Ritoniemi, “A modified comb filter structure for decimation,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), Hong Kong, China, 1997, pp. 2353–2356, doi: 10.1109/ISCAS.1997.612795.
[3] J. O. Coleman, “Chebyshev stopbands for CIC decimation and CIC-implemented array tapers in 1D and 2D,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 59, no. 12, pp. 2956–2968, Dec. 2012, doi: 10.1109/TCSI.2012.2206435.
[4] M. G. C. Jimenez, U. Meyer-Baese, and G. Dolecek, “Computationally-efficient CIC-based filter with embedded Chebyshev sharpening for the improvement of aliasing rejection,” Electron. Lett., vol. 53, no. 4, pp. 281–283, Feb. 2017, doi: 10.1049/el.2016.3309.
[5] L. Zhibin, G. Bo, Y. Ruotong, and G. Min, “Efficient sharpening CIC filter embedding fifth-order filter with coefficient optimization algorithm,” Electron. Lett., vol. 56, no. 23, pp. 1241–1243, Nov. 2020, doi: 10.1049/el.2020.1433.
[6] S. Aggarwal and P. K. Meher, “Enhanced sharpening of CIC decimation filters, implementation and applications,” Circuits Syst. Signal Process., vol. 41, pp. 4581–4603, Mar. 2022, doi: 10.1007/s00034-022-01993-w.
[7] A. Dudarin, G. Molnar, and M. Vucic, “Optimum multiplierless sharpened cascaded-integrator-comb filters,” Digit. Signal Process., vol. 127, Jul. 2022, Art. no. 103564, doi: 10.1016/j.dsp.2022.103564.
[8] D. E. T. Romero and M. G. C. Jimenez, “Efficient wide-band droop compensation for CIC filters: Ad-hoc and reconfigurable FIR architectures,” Electron. Lett., vol. 53, no. 4, pp. 228–229, Jan. 2017, doi: 10.1049/el.2016.3782.
[9] F. J. Harris, Multirate Signal Processing for Communication Systems. Upper Saddle River, NJ, USA: Prentice-Hall, 2004.
[10] P. P. Vaidyanathan, Multirate Systems and Filter Banks. Englewood Cliffs, NJ, USA: Prentice-Hall, 1993.
[11] Z. Jiang and A. N. Willson Jr., “Efficient digital filtering architectures using pipelining/interleaving,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 44, no. 2, pp. 110–119, Feb. 1997, doi: 10.1109/82.554438.
[12] U. Meyer-Baese, Fast Digital Signal Processing (in German). Heidelberg, Germany: Springer-Verlag, 1999.
[13] G. A. Constantinides, P. Y. K. Cheung, and W. Luk, “Truncation noise in fixed-point SFGs,” Electron. Lett., vol. 35, no. 23, pp. 2012–2014, Nov. 1999, doi: 10.1049/el:19991375.
Digital Object Identifier 10.1109/MSP.2023.3236772