Shlomo Engelberg
The method of averaged periodograms and related methods such as Welch’s method are generally justified on probabilistic grounds.
In this “Lecture Notes” column, we show that it is possible to use deterministic arguments to gain some intuition into why using periodograms without averaging does not work well and why they “fail” in the way they do. We then explain how the probabilistic case can be seen as an extension of the deterministic case. Next, we give a brief description of the method of averaged periodograms and explain how the deterministic perspective points to additional cases where the method of averaged periodograms should prove effective. Finally, we provide several numerical examples to demonstrate the theoretical material, and “A Probabilistic Argument” provides a fairly detailed probabilistic justification that is an extension of the deterministic one.
Given a (wide-sense stationary) random process, x(t), the power spectral density (PSD) associated with the random process, ${S}_{xx}{(}{f}{)}$, is the function whose integral over a range of frequencies gives the expected value of the amount of power the signal has in that range of frequencies. Thus, the expected value of the power of x(t) in the range of frequencies denoted by ${\scr{F}}$ is \[{E}{(}{\text{power in }}{\scr{F}}{)} = \mathop{\int}\nolimits_{{f}\,{\in}\,{\scr{F}}}{S}_{xx}{(}{f}{)}{\text{d}}{f}{.}\]
It is often necessary to estimate the PSD of a random process from samples of the random process. A naive approach to this problem leads one to consider using (a suitably normalized version of) the square of the absolute value of the discrete Fourier transform (DFT) of the process being measured, which is a reasonable way to perform this measurement in the deterministic case. (See the section “The connection between the DFT and the power density of a signal.”)
Let ${x}_{k} = {x}{(}{kT}_{s}{),}\,{k} = {0},{\ldots},{N}{-}{1}$ be N measurements of a signal that is sampled every ${T}_{s}$ seconds, and let ${X}_{m},\,{m} = {0},{\ldots},{N}{-}{1}$ be the values of the DFT of this sequence of measurements. Within the context of spectral estimation, the normalized values of ${\left\vert{X}_{m}\right\vert}^{2}$ are said to be samples of the periodogram of the sequence \[{\text{sample of the periodogram }} = \frac{{T}_{s}{\left\vert{X}_{m}\right\vert}^{2}}{N}\] where, as usual, the term with index m is associated with the frequency ${f} = {mF}_{s} / {N}$, and ${F}_{s} = {1} / {T}_{s}$. Unfortunately, using the periodogram to estimate the PSD of a random process is a bad idea, and it is our goal to explain why in an intuitive fashion and to explain the logic behind a standard method of “fixing” the problems: the method of averaged periodograms.
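A minimal MATLAB sketch of this computation may help fix the definitions (this is not the author’s supplementary script; the sampling rate, length, and the stand-in data below are placeholders):

Fs = 100; Ts = 1/Fs; N = 1000;        % assumed sampling rate and record length
x = randn(N, 1);                      % stand-in for the measurements x(k*Ts)
X = fft(x);                           % X_m, m = 0,...,N-1
P = Ts*abs(X).^2/N;                   % samples of the periodogram
f = (0:N-1)'*Fs/N;                    % the mth sample corresponds to f = m*Fs/N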
By the middle of the previous century, the problems associated with using the periodogram to estimate the PSD were addressed by M.S. Bartlett [1] (and P. Welch [5] and others). The essence of the methods proposed by Bartlett and Welch is to take estimates whose expected value is close to the value you would like to find and to reduce their variance by taking the sequence of samples, dividing it into subsequences, performing the estimate on each subsequence, and averaging the estimates. When the estimates are independent, averaging N such estimates reduces the variance by a factor of N and the standard deviation by a factor of ${\sqrt{N}}$. When the estimates are correlated, the results may not be as good, but often, the techniques work there, too [5].
We present a simple deterministic argument, one that complements the standard probabilistic argument, to explain why the naive estimate does not do well and why it behaves poorly in the particular way it behaves poorly. That is, we present a different way of understanding why the PSD of a signal is not well approximated by a single periodogram. This “Lecture Notes” column’s principal contribution is in the “Problem statement” section, where the nature of the problem is considered in some depth.
This “Lecture Notes” column makes use of ideas from signals and systems, probability and stochastic processes, and digital signal processing. It should be useful in courses given to advanced undergraduate and beginning graduate students.
In this “Lecture Notes” column, we present an intuitive way of seeing why the DFT, and, hence, the periodogram associated with a time series, behaves as it does. We start by providing a somewhat detailed picture of the connection between the DFT and the power density of a signal. We then give a brief explanation of why the periodogram is an asymptotically unbiased estimator of the PSD. (That is, as the number of samples used in calculating the periodogram tends to infinity, the expected value of the periodogram tends to the PSD.) Having dealt with this issue, we move on to the main point of this note. We describe the spectral structure of a random signal (a random process), and we proceed to consider the spectrum of linear combinations of shifted copies of a deterministic base signal, first by considering two examples and then by analyzing the spectrum of arbitrary combinations of shifted copies of a signal. We explain how the spectrum of a linear combination of shifted copies of a base signal is and is not similar to the spectrum of the base signal. We proceed to make use of the characterization of random signals and of the relation between the power in a base signal and the power in a linear combination of shifted versions of the base signal to explain why calculating values of the periodogram of a time series should not be expected to produce a low-variance estimate of the associated PSD. We then note that the standard deviation of the measured power density of (certain types of) random signals should be of the order of its expected value, the PSD (and we provide a proof of a closely related result in “A Probabilistic Argument”). We then make use of both our intuitive and our more analytical arguments to argue for dividing the time series into many smaller time series and averaging the estimates made on each of the pieces. That is, we provide a new “semideterministic” and a probabilistic argument for the method of averaged periodograms and related methods.
Consider a band-limited signal, x(t). Assume that the sampling frequency, ${F}_{s}$, is chosen to be greater than twice the signal’s highest frequency. Let ${x}_{k} = {x}{(}{kT}_{s}{)}$, ${k} = {0},{\ldots},{N}{-}{1}$ be the samples, and let ${X}_{m}$ be their DFT.
The power of the continuous-time signal is the integral of its square divided by the length of the time interval over which the square of the function was integrated. When estimating this value using our samples, a reasonable estimate of the (integral that defines the) power is \[\mathop{\underbrace{\frac{1}{{NT}_{s}}}}\limits_{\frac{1}{T}}\mathop{\sum}\limits_{{k} = {0}}\limits^{{N}{-}{1}}{\mathop{\underbrace{{x}_{k}^{2}}}\limits_{{x}^{2}(t)}}\mathop{\underbrace{{T}_{s}}}\limits_{\approx{\text{d}}{t}} = \mathop{\sum}\limits_{{k} = {0}}\limits^{{N}{-}{1}}{\frac{{x}_{k}^{2}}{N}}{.}\]
From Parseval’s equation, we know that this estimate is equal to \[\mathop{\sum}\limits_{{m} = {0}}\limits^{{N}{-}{1}} \frac{{\left\vert{X}_{m}\right\vert}^{2}}{{N}^{2}}.\]
The element ${X}_{m}$, ${m}\,{<}\,{N} / {2}$, is associated with the frequency ${mF}_{s} / {N}$, and the spacing between the bins of the DFT is ${F}_{s} / {N}$. To make the above sum into an approximation of the integral of the power density over the frequencies from 0 to ${F}_{s}$, rewrite it as \[\mathop{\sum}\limits_{{m} = {0}}\limits^{{N}{-}{1}}{\frac{{\left\vert{X}_{m}\right\vert}^{2}}{{NF}_{s}}}\frac{{F}_{s}}{N} = \mathop{\sum}\limits_{{m} = {0}}\limits^{{N}{-}{1}}{\mathop{\underbrace{\left({\frac{{T}_{s}{\left\vert{X}_{m}\right\vert}^{2}}{N}}\right)}}\limits_{\approx{S}_{xx}{(}{f}{)}}}\mathop{\underbrace{\frac{{F}_{s}}{N}}}\limits_{\approx{\text{d}}{f}}{.} \tag{1} \]
This holds for any signal, and we find that to approximate the power density near the frequency ${f} = {mF}_{s} / {N}$, one should calculate the value of the periodogram: ${T}_{s}{\left\vert{X}_{m}\right\vert}^{2} / {N}$.
When considering the power density of a specific signal, we refer to the density of power in the frequency domain as the deterministic power density (DPD). We find that the DPD is a scaled version of ${\left\vert{X}_{m}\right\vert}^{2}$, as we hoped and expected.
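The identity in (1) is easy to check numerically. A short sketch (the parameters and the stand-in data are the same assumptions as in the earlier sketch):

Fs = 100; Ts = 1/Fs; N = 1000; x = randn(N, 1); X = fft(x);  % as above
power_time = sum(x.^2)/N;                   % (1/(N*Ts)) * sum(x_k^2) * Ts
power_freq = sum((Ts*abs(X).^2/N)*(Fs/N));  % sum of the DPD samples times df
% power_time and power_freq agree to within floating-point error (Parseval).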
When sampling a band-limited, random signal, x(t), if we are careful to sample at more than twice the highest frequency (from ${t} = {-}{\infty}$ to ${t} = {\infty}$), we lose no information about the signal. When using the periodogram to estimate the PSD, not only do we sample the signal, which should not be a problem, we sample the signal for a finite amount of time. As we know that the PSD, ${S}_{xx}{(}{f}{)}$, is the Fourier transform of the autocorrelation function, ${R}_{xx}{(}{\tau}{)}$, that ${S}_{xx}{(}{f}{)} = \mathop{\int}\nolimits_{{-}\infty}^{\infty}{\exp}{(}{-}{2}{\pi}{jf}{\tau}{)}{R}_{xx}{(}{\tau}{)}{\text{d}}{\tau}$, that ${R}_{xx}{(}{\tau}{)}\,{\equiv}\,{E}{(}{x}{(}{t}{)}{x}{(}{t} + {\tau}{)}{)}$, and that ${R}_{xx}{(}{\tau}{)}$ tells us how correlated measurements taken ${\tau}$ seconds apart are, it is clear that taking measurements over a finite period is problematic. We do not have any (or at least any obvious) way of estimating ${R}_{xx}{(}{\tau}{)}$ for a ${\tau}$ which is larger than the time interval over which we sampled our signal. Thus, we do not expect the periodogram’s expected value to equal the PSD of the signal; we are missing some information about the signal.
On the other hand, as the amount of time over which we measure increases, we expect the estimate to improve, and it does. In fact, as the time over which one samples tends to infinity, the expected value of the periodogram tends to the PSD [3]. We do not consider this issue further.
We now develop methods to explain why the periodogram is “noisy,” i.e., has large variance. We explain this both using a “semideterministic” approach and a related probabilistic approach.
When a random signal is produced by passing white noise (whose PSD is identically equal to one) through a filter whose frequency response is H(f), the PSD of the signal is equal to ${\left\vert{H}{(}{f}{)}\right\vert}^{2}$ [3]. Thus, one expects the power of the output noise to lie in the passband of the filter.
When dealing with such noise, one expects to see that the noise’s constituent frequencies lie in the passband of the filter, and one expects their amplitudes to be related to the amplification of the filter at the relevant frequencies, although one expects to see variations in the amplitude.
Stable filters are not affected much by the parts of their input that are from the distant past or the distant future [2]. Because of this, when the input to a stable filter is stationary, uncorrelated (i.e., white) noise, the output of the filter at widely separated times is (almost) uncorrelated. This lack of correlation should cause the phases of the components at widely separated times to be unrelated. When passing white noise through a bandpass filter with a sharp peak of amplitude 2 at a frequency of 1 Hz, for example, at time ${t} = {0}$, one might find that the signal looks like ${2}\,{\sin}{(}{2}{\pi}{t}{)}$; at time ${t} = {10}$, it might look like ${2.3}\,{\sin}{(}{2}{\pi}{t} + {\pi} / {8}{)}$; and at time ${t} = {20}$, it might look like ${1.6}\,{\sin}{(}{2}{\pi}{t} + {7}{\pi} / {8}{)}$. (See Figure 2 for an example of this phenomenon.)
Because the waveforms output by the filter at widely separated times are (almost) uncorrelated with one another, we do not expect the phases of the waveforms to be related. This, as we shall see in the section “Destructive interference,” is critically important.
When considering random signals, it often seems that stationarity is enough for almost all purposes. It should be noted that here stationarity is certainly not enough: there are stationary signals for which the periodogram does not provide a reasonable way to estimate the PSD. A simple example of such a (wide-sense) stationary random process is ${x}{(}{t}{)} = {\sin}{(}{2}{\pi}{Ft} + {\phi}{)}$, where F is uniformly distributed over ${[}{F}_{l},{F}_{u}{]}$ and ${\phi}$ is uniformly distributed over ${[}{0},{2}{\pi}{)}$. This signal’s autocorrelation function is \[{R}_{xx}{(}{\tau}{)} = \begin{cases}\begin{array}{cc} \frac{1}{{4}{(}{F}_{u}{-}{F}_{l}{)}}{\left(\frac{{\sin}{(}{2}{\pi}{F}_{u}{\tau}{)}}{{\pi}{\tau}}{-}\frac{{\sin}{(}{2}{\pi}{F}_{l}{\tau}{)}}{{\pi}{\tau}}\right)} & {\tau}\,{\neq}\,{0} \\ {1} / {2} & {\tau} = {0} \end{array}\end{cases}\] and its PSD is \[{S}_{xx}{(}{f}{)} = \begin{cases}\begin{array}{ll} \frac{1}{{4}{(}{F}_{u}{-}{F}_{l}{)}} & {F}_{l}\,{\leq}\,{\left\vert{f}\right\vert}\,{\leq}\,{F}_{u} \\ {0} & {\text{otherwise}} \end{array}\end{cases}{.}\]
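For completeness, here is a sketch of the standard calculation behind this autocorrelation. Conditioning on F and averaging over the uniformly distributed phase, and then averaging over F, gives \begin{align*} {E}{(}{x}{(}{t}{)}{x}{(}{t} + {\tau}{)}\,{\vert}\,{F}{)} & = \tfrac{1}{2}{\cos}{(}{2}{\pi}{F}{\tau}{)}, \\ {R}_{xx}{(}{\tau}{)} & = \frac{1}{{F}_{u}{-}{F}_{l}}\mathop{\int}\nolimits_{{F}_{l}}^{{F}_{u}}\tfrac{1}{2}{\cos}{(}{2}{\pi}{f}{\tau}{)}{\text{d}}{f} = \frac{1}{{4}{(}{F}_{u}{-}{F}_{l}{)}}{\left(\frac{{\sin}{(}{2}{\pi}{F}_{u}{\tau}{)}}{{\pi}{\tau}}{-}\frac{{\sin}{(}{2}{\pi}{F}_{l}{\tau}{)}}{{\pi}{\tau}}\right)}, \quad {\tau}\,{\neq}\,{0}, \end{align*} and taking the Fourier transform of this difference of sinc-type functions yields the rectangular PSD given above.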
For any given instance of this signal, no matter how long one measures the signal, one samples a sinewave of fixed phase and frequency, and the DFT corresponding to the measured signal will be the DFT of a sinewave of that frequency. The DFT does not provide any indication that, were the signal measured again, the frequency would generally be different. Thus, one cannot use a periodogram to estimate the PSD of this signal. On the other hand, following the logic of the section “The Spectral Structure of a Class of Random Signals” and as seen in the section “Numerical Examples” (in “The Method of Averaged Periodograms”), filtered white noise (where the filtering is performed by a stable filter) is an example of a class of signals for which one should be able to use periodogram-based techniques to estimate the spectrum of a signal.
Consider the base signal \[{x}_{1}{(}{t}{)} = \begin{cases}\begin{array}{ll}{\cos}{(}{2}{\pi}{5}{t}{)} & {0}\,{≤}\,{t}\,{<}\,{1} \\ {0} & {\text{otherwise}}\end{array} \end{cases}{.} \]
Thinking of this signal as the product of a cosine and a shifted rectangular window, it is not hard to see that its Fourier transform is \begin{align*}{X}_{1}{(}{f}{)} = & {0.5}{(}{e}^{{-}{2}{\pi}{j}{(}{f}{-}{5}{)}{(}{1} / {2}{)}}{\text{sinc}}{(}{f}{-}{5}{)} \\ & \,\,\, + {e}^{{-}{2}{\pi}{j}{(}{f} + {5}{)}{(}{1} / {2}{)}}{\text{sinc}}{(}{f} + {5}{)}{)} \end{align*} where \[{\text{sinc}}{(}{f}{)}\,{\equiv}\, \begin{cases}\begin{array}{cc} \frac{{\sin}{(}{\pi}{f}{)}}{{\pi}{f}} & {f}\,{≠}\,{0} \\ {1} & {f} = {0} \end{array} \end{cases}{.} \]
Now consider the Fourier transform of the signal \begin{align*}{x}_{2}{(}{t}{)} & = {x}_{1}{(}{t}{)} + {x}_{1}{(}{t}{-}{1}{)} \\ & = \begin{cases}\begin{array}{ll}{\cos}{(}{2}{\pi}{5}{t}{)} & {0}\,{≤}\,{t}\,{<}\,{1} \\ \begin{array}{l}{\cos}{(}{2}{\pi}{5}{(}{t}{-}{1}{)}{)} \\ \,\,\, = {\cos}{(}{2}{\pi}{5}{t}{)} \end{array} & {1}\,{\leq}\,{t}\,{<}\,{2} \\ {0} & {\text{otherwise}} \end{array} \end{cases} \\ & = \begin{cases}\begin{array}{ll}{\cos}{(}{2}{\pi}{5}{t}{)} & {0}\,{\leq}\,{t}\,{<}\,{2} \\ {0} & {\text{otherwise.}} \end{array} \end{cases} \end{align*} (Because the 1-s delay is an integer number of periods of the 5-Hz cosine, the delayed copy is identical to the original.)
Considering the first form given for the function, it is clear that \begin{align*}{X}_{2}{(}{f}{)} = & {0.5}{(}{e}^{{-}{2}{\pi}{j}{(}{f}{-}{5}{)}{(}{1} / {2}{)}}{\text{sinc}}{(}{f}{-}{5}{)} \\ & + {e}^{{-}{2}{\pi}{j}{(}{f} + {5}{)}{(}{1} / {2}{)}}{\text{sinc}}{(}{f} + {5}{)}{)} \\ & {\times}\,{(}{1} + {e}^{{-}{2}{\pi}{jf}}{).} \end{align*}
When ${f} = {5}$, the last term is equal to 2 (there is constructive interference between the two copies of the base signal), and ${X}_{2}{(}{5}{)} = {1}$.
Let \[{x}_{1}{(}{t}{)} = \begin{cases}\begin{array}{ll}{\cos}{(}{2}{\pi}{5.5}{t}{)} & {0}\,{\leq}\,{t}\,{<}\,{1} \\ {0} & {\text{otherwise}}\end{array} \end{cases} \] be our base signal. Its Fourier transform is \begin{align*}{X}_{1}{(}{f}{)} = & {0.5}{(}{e}^{{-}{2}{\pi}{j}{(}{1} / {2}{)}{(}{f}{-}{5.5}{)}}{\text{sinc}}{(}{f}{-}{5.5}{)} \\ & \quad + {e}^{{-}{2}{\pi}{j}{(}{1} / {2}{)}{(}{f} + {5.5}{)}}{\text{sinc}}{(}{f} + {5.5}{)}{)}{.} \tag{2} \end{align*}
Once again, let ${x}_{2}{(t)}$ be composed of two copies of the base signal, where one copy is delayed by 1 s. That is, let \[{x}_{2}{(}{t}{)} = \begin{cases}\begin{array}{ll}{\cos}{(}{2}{\pi}{5.5}{t}{)} & {0}\,{\leq}\,{t}\,{<}\,{1} \\ \begin{array}{l}{\cos}{(}{2}{\pi}{5.5}{(}{t}{-}{1}{)}{)} \\ \,\,\, = {-}{\cos}{(}{2}{\pi}{5.5}{t}{)} \end{array} & {1}\,{\leq}\,{t}\,{<}\,{2} \\ {0} & {\text{otherwise}}{.} \end{array} \end{cases} \]
The Fourier transform of ${x}_{2}{(t)}$ is \begin{align*}{X}_{2}{(}{f}{)} = & {(}{0.5e}^{{-}{2}{\pi}{j}{(}{1} / {2}{)}{(}{f}{-}{5.5}{)}}{\text{sinc}}{(}{f}{-}{5.5}{)} \\ & + {0.5e}^{{-}{2}{\pi}{j}{(}{1} / {2}{)}{(}{f} + {5.5}{)}}{\text{sinc}}{(}{f} + {5.5}{)}{)} \\ & {\times}\,{(}{1} + {e}^{{-}{2}{\pi}{jf}}{).} \tag{3} \end{align*}
Looking at ${X}_{1}{(}{f}{)}$ and ${X}_{2}{(}{f}{)}$ as given in (2) and (3), respectively, we find that the absolute value of the Fourier transform of the first signal has its maximum at ${f} = {5.5}{\text{ Hz}}$, where our intuition tells us it should be. In the second signal, the last term in (3) equals zero when ${f} = {5.5}{\text{ Hz}}$; because of this destructive interference between the two copies, the Fourier transform is 0 there! (See Figure 1 for a closely related phenomenon.) In the second case, the energy “moves” to other frequencies (as it must, according to Parseval’s theorem). Note, however, that the constructive interference caused by the shift can at most double the magnitude of the “tails,” so the energy must move to frequencies where the tails are already reasonably thick, that is, where some reasonable fraction of the energy is already located.
Figure 1. The zero-padded base signal (in blue) and a signal composed of two copies of the base signal, one of which has been delayed by 1 s (in red). The amplitudes of the signals are given in volts, and the amplitudes of the deterministic spectral densities are given in units of V2/Hz. We see that the second signal has no power at 5.5 Hz. All the power has “moved” to other frequencies at which the base signal has a reasonably large power density.
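The two examples above are easy to reproduce numerically. A sketch (the 100-Hz sampling rate is an assumption; this is not the author’s supplementary script):

Fs = 100; Ts = 1/Fs; t = (0:Fs-1)'*Ts;        % one second of samples
for f0 = [5 5.5]
    x1 = cos(2*pi*f0*t);                      % the 1-s base signal
    x2 = [x1; x1];                            % two copies, the second delayed by 1 s
    X2 = fft(x2); N2 = length(x2);
    m = f0*N2/Fs;                             % f0 falls exactly on bin m (spacing 0.5 Hz)
    fprintf('f0 = %.1f Hz: DPD of the two-copy signal at f0 is %.4f\n', ...
            f0, Ts*abs(X2(m+1))^2/N2);
end
% At 5 Hz the copies interfere constructively and the DPD at 5 Hz is large;
% at 5.5 Hz they interfere destructively and the DPD at 5.5 Hz is essentially zero.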
Consider signals, y(t), that are made up of a linear combination of N shifted copies of a base signal, x(t). That is, consider \[{y}{(}{t}{)} = \mathop{\sum}\limits_{{k} = {0}}\limits^{{N}{-}{1}}{a}_{k}{x}{(}{t}{-}{\tau}_{k}{).}\]
The precise method of choosing the ${\tau}_{k}$ is not particularly important.
Upon calculating the Fourier transform of y(t), Y(f), we find that \[{Y}{(}{f}{)} = {\left(\mathop{\sum}\limits_{{k} = {0}}\limits^{{N}{-}{1}}{a}_{k}{e}^{{-}{2}{\pi}{jf}{\tau}_{k}}\right)}{X}{(}{f}{).} \tag{4} \]
Considering ${\left\vert{Y}{(}{f}{)}\right\vert}$, we find that \begin{align*}{\left\vert{Y}{(}{f}{)}\right\vert} & = {\left\vert{\mathop{\sum}\limits_{{k} = {0}}\limits^{{N}{-}{1}}{a}_{k}{e}^{{-}{2}{\pi}{jf}{\tau}_{k}}}\right\vert}\,{\left\vert{X}{(}{f}{)}\right\vert} \\ & {\leq}\,{\left(\mathop{\sum}\limits_{{k} = {0}}\limits^{{N}{-}{1}}{\left\vert{a}_{k}\right\vert}\right)}\,{\left\vert{X}{(}{f}{)}\right\vert}{.} \tag{5} \end{align*}
That is, we find that the spectrum of the base signal is multiplied by a function whose absolute value may vary between 0 and ${\sum}_{{k} = {0}}^{{N}{-}{1}}{\left\vert{a}_{k}\right\vert}$. Thus, the value of ${\left\vert{Y}{(}{f}{)}\right\vert}$ may range between 0 and several times the value of ${\left\vert{X}{(}{f}{)}\right\vert}$, and this provides a more general explanation of the phenomenon observed in the section “The effect of delays on the Fourier transform: Two instructive examples.”
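A sketch of how the multiplier in (4) behaves, using hypothetical coefficients and delays (the values below are arbitrary and chosen only for illustration):

N = 20;                                  % number of shifted copies
a = ones(N, 1);                          % hypothetical coefficients a_k
tau = (0:N-1)';                          % hypothetical delays tau_k (seconds)
f = linspace(0, 2, 2000)';
M = abs(exp(-2j*pi*f*tau')*a);           % |sum_k a_k exp(-2*pi*j*f*tau_k)|
plot(f, M); xlabel('f (Hz)'); ylabel('|multiplier|');
% The magnitude comes close to zero at some frequencies and approaches
% sum(abs(a)) = 20 at others, so |Y(f)| can lie anywhere between 0 and
% 20 times |X(f)|.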
As discussed in the section “The spectral structure of a class of random signals,” random signals can often be thought of as being composed of sinusoids whose phases and amplitudes change with time (see, for example, the section “Random noise”). As we have seen, when you calculate the Fourier transform of a linear combination of shifted versions of a signal, the power at some frequencies can be larger than it “should” be, because the complex exponentials in (4) all have nearly the same phase, while at other frequencies, the power may be near zero because of “destructive interference” among the complex exponentials. This is why you cannot look at the DFT of an entire time series and hope to get a good estimate of the PSD. You should expect the values of such an estimate to range from near zero to several times the true value of the PSD, and that is what happens. (See Figure 3 for an example of this phenomenon, and see “A Probabilistic Argument” for a probabilistic explanation.)
The analysis of the periodogram and the method of averaged periodograms presented here focuses more on the nature of the random signals under consideration than on the nature of the periodogram. We make use of the properties of the random signal to explain the weaknesses of a single periodogram as an estimator. Others focus on the periodogram as a measurement technique. See [4, Sec. 2.4.2], for example. Hopefully, the perspective taken in the current article casts a little more light on why a single periodogram is not a good estimator of the PSD of a random signal.
Suppose one samples random noise and chooses to calculate the DPD by using (1). We do not expect the results to actually approximate the PSD as we are doing exactly what we should not: we are evaluating the power at each frequency once for the entire measurement.
If the values we calculate do not provide a good basis for estimating the PSD, what do they tell us?
The answer is clear: (1) is an estimate of the actual density of power for the specific instance of the signal under consideration. The PSD of a signal is the statistical mean, the expected value, of the density of the power. It is not the power density that you measure when making use of the DFT of all of the samples of a signal, no matter how many samples are used.
As we have seen, intuitively speaking, the reason one should not estimate the PSD of a random signal by calculating the DFT of a long sequence is that the phase (and amplitude) of the relevant frequency components can be thought of as “drifting” over the course of the measurement, and that drift can cause the power in the signal to move to other frequencies. To overcome this problem, it seems reasonable to divide the signal up and estimate the PSD from each subsequence. No single estimate will be particularly good, because the subsequences, too, are made up of signals whose phases and amplitudes are changing, but if one averages the estimates, one hopes that if one estimate is too high, the next will be too low, and in this way, the final average will be near the correct value.
Another way of arriving at the same conclusion is to think carefully about the definition of the PSD. The PSD is the statistical average, the expected value, of the power. It is not the power estimated from a single sequence once.
In “A Probabilistic Argument,” we show that (under certain assumptions) the standard deviation of the actual power density of a random signal of known PSD is of the same order of magnitude as the expected value of the power density, and that this is (largely) independent of the length of time over which the power density is measured. That is, the power density is actually “noisy”; the problem is a fundamental one and is not specific to periodograms or DFTs.
If, however, one breaks the long sequence of measurements into smaller subsequences, estimates the power density associated with each subsequence, and then averages the estimates, as long as the correlation between the estimates drops off fast enough as the subsequences get farther apart in the original sequence, the means of the estimates tend to their expected values, and (for sufficiently large numbers of samples) this is what we want to measure.
Let ${x}_{k} = {x}{(}{kT}_{s}{),}\,{k} = {0},{\ldots},{N}{-}{1},\,{N} = {LI}$ be a large set of measurements of the random process x(t). Let ${z}_{i,k} = {x}_{{k} + {i}\,{\cdot}\,{L}}$, ${k} = {0},{\ldots},{L}{-}{1}$, ${i} = {0},{\ldots},{I}{-}{1}$ define I subsequences of length L. Let ${Z}_{i,m}$, ${m} = {0},{\ldots},{L}{-}{1}$ be the DFT of ${z}_{i,k}$, ${k} = {0},{\ldots},{L}{-}{1}$. Then ${T}_{s}{\left\vert{Z}_{i,m}\right\vert}^{2} / {L}$ is the periodogram associated with the ith subsequence and the frequency ${f} = {mF}_{s} / {L}$. The solution to our problem is to average periodograms. When one wants information about the PSD near ${f} = {mF}_{s} / {L}$, one calculates \[\frac{1}{I} \mathop{\sum}\limits_{{i} = {0}}\limits^{{I}{-}{1}} \frac{{T}_{s}{\left\vert{Z}_{i,m}\right\vert}^{2}}{L} = \frac{{T}_{s}}{N} \mathop{\sum}\limits_{{i} = {0}}\limits^{{I}{-}{1}}{\left\vert{Z}_{i,m}\right\vert}^{2}{.} \tag{6} \]
Note that the phase is “removed” before the averaging begins, so destructive interference is no longer possible. (To see code which implements these calculations, please see the MATLAB script in the supplementary material available at https://doi.org/10.1109/MSP.2023.3285044. This script contains all the code used to generate the figures in this article and much more.)
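For concreteness, a minimal sketch of the computation in (6) follows (this is not the supplementary script; the sampling rate and the stand-in data are assumptions):

Fs = 100; Ts = 1/Fs;                  % assumed sampling rate
L = 1000; I = 100;                    % subsequence length and number of subsequences
x = randn(L*I, 1);                    % stand-in for the N = L*I measurements
Z = fft(reshape(x, L, I));            % column i holds the DFT of the ith subsequence
Pavg = mean(Ts*abs(Z).^2/L, 2);       % (6): average the I periodograms, bin by bin
f = (0:L-1)'*Fs/L;                    % bin m corresponds to f = m*Fs/L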
This solution is known as the method of averaged periodograms. As mentioned previously, there are other related solutions, such as Welch’s method, and there are a host of unrelated solutions. See, for example, [4].
As seen in (6), the method of averaged periodograms removes the phase information from ${Z}_{i,m}$. We now consider a slightly different application of the method of averaged periodograms that makes use of this fact.
Given a signal built of a fixed base signal that is repeated over the course of time, but not periodically, the method of averaged periodograms can be used to “see” the spectral signature of the base signal: removing the phase information from the ${Z}_{i,m}$ prevents destructive interference in the frequency domain. (The MATLAB script in the supplementary material available at https://doi.org/10.1109/MSP.2023.3285044 has an example of this use.)
A Probabilistic Argument
We now give a fairly simple probabilistic argument that demonstrates that making a single, simple estimate of the power density is not likely to lead to good results. We do this by constructing a random signal and showing that its power density is a random variable whose standard deviation is of the same order as its expected value.
One way of arriving at noise whose power spectral density (PSD) is ${\left\vert{H}{(}{f}{)}\right\vert}^{2}$ is to pass white noise through a (stable) filter whose frequency response is ${H}{(}{f}{)}$. Let us consider one way of doing this in order to see how such filtered noise behaves.
Consider a digital filter whose (nonstandard) frequency response is ${H}{(}{f}{)}$, where ${H}{(}{f}{)}$ is defined as the (ordinary, continuous-time) Fourier transform of the filter’s continuous-time response to a pulse of width ${T}_{s}$ input at time 0, and suppose that you excite the filter by inputting pulses of width ${T}_{s}$ and sizes ${x}_{k}$, ${k} = {0},{\ldots},{N}{-}{1}$ at the times ${t} = {kT}_{s}$, ${k} = {0},{\ldots},{N}{-}{1}$. Assume that the values of ${x}_{k}$ are zero-mean independent random variables (which implies that the noise is white). Also, assume that the values of ${x}_{k}$ are identically distributed and have a standard deviation of 1. The (continuous-time) spectrum of the (continuous-time) output of the filter, ${y}{(t)}$, is \[{Y}{(}{f}{)} = {\left(\mathop{\sum}\limits_{{k} = {0}}\limits^{{N}{-}{1}}{x}_{k}{e}^{{-}{2}{\pi}{jfkT}_{s}}\right)}{H}{(}{f}{)}{.}\]
As the values of ${x}_{k}$ are random variables, so is the value of ${Y}{(}{f}{)}$. We now show that ${\sigma}_{{\left\vert{Y}{(}{f}{)}\right\vert}^{2}}\,{\approx}\,{c}{(}{f}{)}{E}{(}{\left\vert{Y}{(}{f}{)}\right\vert}^{2}{)}$, where ${c}{(}{f}{)}$ is ${\sqrt{2}}$ when ${f} = {kF}_{s} / {2},{k}\,{\in}\,{\Bbb{Z}}$, and ${c}{(}{f}{)}$ is 1 (relatively) far from these points.
As the values of ${x}_{k}$ are zero-mean, ${E}{(}{Y}{(}{f}{))} = {0}$. Calculating ${E}{(}{\left\vert{Y}{(}{f}{)}\right\vert}^{2}{)}$ is not much harder. We find that \begin{align*}{E}{(}{\left\vert{Y}{(}{f}{)}\right\vert}^{2}{)} & = {E}{\left(\mathop{\sum}\limits_{{k},{\ell}}{x}_{k}{x}_{\ell}{e}^{{-}{2}{\pi}{jf}{(}{k}{-}{\ell}{)}{T}_{s}}\right)}{\left\vert{H}{(}{f}{)}\right\vert}^{2} \\ & = \mathop{\sum}\limits_{{k},{\ell}}{E}{(}{x}_{k}{x}_{\ell}{)}{e}^{{-}{2}{\pi}{jf}{(}{k}{-}{\ell}{)}{T}_{s}}{\left\vert{H}{(}{f}{)}\right\vert}^{2}{.} \end{align*}
As the values of ${x}_{k}$ are zero-mean and independent, the only terms that do not produce zeros are those for which ${k} = {\ell}$. There are N such terms, and we find that \[{E}{(}{\left\vert{Y}{(}{f}{)}\right\vert}^{2}{)} = {NE}{(}{\left\vert{x}_{k}\right\vert}^{2}{)}{\left\vert{H}{(}{f}{)}\right\vert}^{2} = {N}{\left\vert{H}{(}{f}{)}\right\vert}^{2}{.}\]
Now let us consider ${E}{(}{\left\vert{Y}{(}{f}{)}\right\vert}^{4}{)}$ so that we will be able to calculate the standard deviation of ${\left\vert{Y}{(}{f}{)}\right\vert}^{2}$. \[{\left\vert{Y}{(}{f}{)}\right\vert}^{4} = {\left(\mathop{\sum}\limits_{{m},{n},{k},{\ell}}{{x}_{m}}{x}_{n}{x}_{k}{x}_{\ell}{e}^{{-}{2}{\pi}{j}{(}{m}{-}{n} + {k}{-}{\ell}{)}{f}{T}_{s}}\right)}{\left\vert{H}{(}{f}{)}\right\vert}^{4}{.}\]
As the values of ${x}_{k}$ are zero-mean and independent, the only terms that will be able to contribute to the expected values are those with four identical ${x}_{k}$ values or with two pairs of identical ${x}_{k}$ values. There are N possible ways to have all the indices be the same. There are ${\left(\begin{array}{c}{N} \\ {2} \end{array}\right)} = {N}{(}{N} - {1}{)} / {2}$ ways of picking two index values from among N, and there are ${\left(\begin{array}{c}{4} \\ {2} \end{array}\right)} = {6}$ ways of choosing which two of the four indices ${m}$, ${n}$, ${k}$, ${\ell}$ take the first of the two values (the remaining two take the second).
We now consider the case where we have two pairs of identically valued indices in greater detail. Let the two values of the indices be A and B. Then the coefficients of ${E}{(}{x}_{A}^{2}{x}_{B}^{2}{)} = {E}^{2}{(}{x}_{k}^{2}{)}$ in the six terms in which ${x}_{A}^{2}{x}_{B}^{2}$ appears are given in Table S1.
Table S1. The values the complex exponential takes.
${(}{m},{n},{k},{\ell}{)} = {(}{A},{A},{B},{B}{)},\,{(}{A},{B},{B},{A}{)},\,{(}{B},{A},{A},{B}{)},\,{(}{B},{B},{A},{A}{)}$: coefficient 1.
${(}{m},{n},{k},{\ell}{)} = {(}{A},{B},{A},{B}{)}$: coefficient ${e}^{{-}{2}{\pi}{j}{(}{2}{A}{-}{2}{B}{)}{fT}_{s}}$.
${(}{m},{n},{k},{\ell}{)} = {(}{B},{A},{B},{A}{)}$: coefficient ${e}^{{2}{\pi}{j}{(}{2}{A}{-}{2}{B}{)}{fT}_{s}}$.
The sum of these coefficients is ${4} + {2}\,{\cos}{(}{2}{\pi}{(}{2}{A}{-}{2}{B}{)}{f}{T}_{s}{)}$. This sum is always greater than or equal to 2, and when ${f} = {kF}_{s} / {2},{k}\,{\in}\,{\Bbb{Z}}$, it is equal to 6.
We find that \[{(}{NE}{(}{x}_{k}^{4}{)} + {N}{(}{N}{-}{1}{)}{E}^{2}{(}{x}_{k}^{2}{)}{)}{\left\vert{H}{(}{f}{)}\right\vert}^{4}\,{≤}\,{E}{(}{\left\vert{Y}{(}{f}{)}\right\vert}^{4}{)}\] and that \[{E}{(}{\left\vert{Y}{(}{f}{)}\right\vert}^{4}{)}\,{≤}\,{(}{NE}{(}{x}_{k}^{4}{)} + {3}{N}{(}{N}{-}{1}{)}{E}^{2}{(}{x}_{k}^{2}{)}{)}{\left\vert{H}{(}{f}{)}\right\vert}^{4}{.}\]
For ${f} = {kF}_{s} / {2},{k}\,{\in}\,{\Bbb{Z}}$, we find that ${E}{(}{\left\vert{Y}{(}{kF}_{s} / {2}{)}\right\vert}^{4}{)} = {NE}{(}{x}_{k}^{4}{)} + {3}{N}{(}{N}{-}{1}{)}{E}^{2}{(}{x}_{k}^{2}{)}$. When ${f} = {kF}_{s} / {2},{k}\,{\in}\,{\Bbb{Z}}$, we find that the variance satisfies \begin{align*} \frac{{\sigma}_{{\left\vert{Y}{(}{kF}_{s} / {2}{)}\right\vert}^{2}}^{2}}{{\left\vert{H}{(}{f}{)}\right\vert}^{4}} & = {NE}{(}{x}_{k}^{4}{)} + {3}{N}{(}{N}{-}{1}{)}{E}^{2}{(}{x}_{k}^{2}{)}{-}{N}^{2}{E}^{2}{(}{x}_{k}^{2}{)} \\ & = {NE}{(}{x}_{k}^{4}{)} + {2}{N}^{2}{-}{3}{N} \\ & \mathop{\approx}\limits^{{N}\,{\gg}\,{1}}{2}{N}^{2} \\ & = \frac{{2}{E}^{2}{(}{\left\vert{Y}{(}{kF}_{s} / {2}{)}\right\vert}^{2}{)}}{{\left\vert{H}{(}{f}{)}\right\vert}^{4}}{.} \end{align*}
Thus, for large N, the standard deviation of ${\left\vert{Y}{(}{kF}_{s} / {2}{)}\right\vert}^{2}$ tends to ${\sqrt{2}}{E}{(}{\left\vert{Y}{(}{kF}_{s} / {2}{)}\right\vert}^{2}{)}$.
Considering ${E}{(}{\left\vert{Y}{(}{f}{)}\right\vert}^{4}{)}$ again, we find that in general, it is equal to \[{\left({NE}{(}{x}_{k}^{4}{)} + \mathop{\sum}\limits_{{k} = {1}}\limits^{{N}{-}{1}}{(}{N}{-}{k}{)}{(}{4} + {2}\,{\cos}{(}{2}{\pi}{2}{kf}{T}_{s}{)}{)}\right)}{\left\vert{H}{(}{f}{)}\right\vert}^{4}{.}\]
This sum can be calculated by hand (see the section “Calculating the sums”) or found online (using WolframAlpha, for example). We find that \begin{align*} & \mathop{\sum}\limits_{{k} = {1}}\limits^{{N}{-}{1}}{(}{N}{-}{k}{)}{(}{4} + {2}\,{\cos}{(}{2}{\pi}{2}{k}{f}{T}_{s}{)}{)} \\ & \quad = \frac{1}{2}{(}{4}{N}^{2}{-}{\csc}^{2}{(}{2}{\pi}{f}{T}_{s}{)}{\cos}{(}{N}{2}{\pi}{2}{f}{T}_{s}{)} \\ & \qquad {-}{6}{N} + {\cot}^{2}{(}{2}{\pi}{f}{T}_{s}{)} + {1}{)}{.} \tag{S1} \end{align*}
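The closed form in (S1) is also easy to check numerically for particular values of N, f, and ${T}_{s}$ (the values below are arbitrary; f is chosen not to be a multiple of ${F}_{s}/{2}$):

N = 50; Fs = 100; Ts = 1/Fs; f = 13.7;        % arbitrary assumed values
k = (1:N-1)';
direct = sum((N-k).*(4 + 2*cos(2*pi*2*k*f*Ts)));
closed = (4*N^2 - csc(2*pi*f*Ts)^2*cos(N*2*pi*2*f*Ts) ...
          - 6*N + cot(2*pi*f*Ts)^2 + 1)/2;
% direct and closed agree to within floating-point error.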
The nontrivial sums in (S1) that need to be calculated are \[\mathop{\sum}\limits_{{k} = {1}}\limits^{{N}{-}{1}}{\cos}{(}{\alpha}{k}{)}{\text{ and }} \mathop{\sum}\limits_{{k} = {1}}\limits^{{N}{-}{1}}{k}\,{\cos}{(}{\alpha}{k}{)}\] and in the course of calculating the sums, we make use of the trigonometric identities \begin{align*}{\sin}\,{(}{a}{)}{\cos}{(}{b}{)} & = \frac{{(}{\sin}\,{(}{a} + {b}{)} + {\sin}\,{(}{a}{-}{b}{)}{)}}{2} \\ {\sin}\,{(}{a}{)}{\sin}\,{(}{b}{)} & = \frac{{(}{\cos}\,{(}{a}{-}{b}{)}{-}{\cos}{(}{a} + {b}{)}{)}}{2}{.} \end{align*}
It is convenient to work with complex exponentials as they contain both sines and cosines. Summing complex exponentials by making use of the summation formula for finite geometric series, we find that \begin{align*} \mathop{\sum}\limits_{{k} = {0}}\limits^{{N}{-}{1}}{\exp}{(}{jk}{\alpha}{)} & = \mathop{\sum}\limits_{{k} = {0}}\limits^{{N}{-}{1}}{\left({\exp}{(}{j}{\alpha}{)}\right)}^{k} \\ & = \frac{{1}{-}{\exp}{(}{jN}{\alpha}{)}}{{1}{-}{\exp}{(}{j}{\alpha}{)}} \\ & = {\exp}{\left({j}{\left(\frac{{N}{-}{1}}{2}\right)}{\alpha}\right)} \frac{{\sin}{\left(\frac{{N}{\alpha}}{2}\right)}}{{\sin}{\left(\frac{\alpha}{2}\right)}}{.} \tag{S2} \end{align*}
Thus, \begin{align*} \mathop{\sum}\limits_{{k} = {1}}\limits^{{N}{-}{1}}{\cos}{(}{\alpha}{k}{)} & = {\text{Re}}{\left(\mathop{\sum}\limits_{{k} = {0}}\limits^{{N}{-}{1}}{\exp}{(}{jk}{\alpha}{)}\right)}{-}{1} \\ & = {\cos}{\left({\left(\frac{{N}{-}{1}}{2}\right)}{\alpha}\right)} \frac{{\sin}{\left(\frac{{N}{\alpha}}{2}\right)}}{{\sin}{\left(\frac{\alpha}{2}\right)}}{-}{1} \\ & = \frac{{\sin}{\left({\left({N}{-} \frac{1}{2}\right)}{\alpha}\right)} + {\sin}{\left(\frac{\alpha}{2}\right)}}{{2}{\sin}{\left(\frac{\alpha}{2}\right)}}{-}{1} \\ & = \frac{{\sin}{\left({\left({N}{-}\frac{1}{2}\right)}{\alpha}\right)}}{{2}{\sin}{\left(\frac{\alpha}{2}\right)}}{-}\frac{1}{2}{.} \tag{S3} \end{align*}
The second sum of interest is easily seen to be \[\mathop{\sum}\limits_{{k} = {1}}\limits^{{N}{-}{1}}{k}\,{\cos}{(}{\alpha}{k}{)} = \frac{\text{d}}{{\text{d}}{\alpha}} \mathop{\sum}\limits_{{k} = {1}}\limits^{{N}{-}{1}}{\sin}{(}{\alpha}{k}{)}\] and the right-hand sum can be evaluated by considering the imaginary part of (S2). The sum is equal to \begin{align*} \mathop{\sum}\limits_{{k} = {1}}\limits^{{N}{-}{1}}{\sin}{(}{\alpha}{k}{)} & = {\text{Im}}{\left({\exp}{\left({j}{\left(\frac{{N}{-}{1}}{2}\right)}{\alpha}\right)} \frac{{\sin}{\left(\frac{{N}{\alpha}}{2}\right)}}{{\sin}{\left(\frac{\alpha}{2}\right)}}\right)} \\ & = {\sin}{\left({\left(\frac{{N}{-}{1}}{2}\right)}{\alpha}\right)} \frac{{\sin}{\left(\frac{{N}{\alpha}}{2}\right)}}{{\sin}{\left(\frac{\alpha}{2}\right)}} \\ & = \frac{{\cos}{\left(\frac{\alpha}{2}\right)}{-}{\cos}{\left({\left({N}{-}\frac{1}{2}\right)}{\alpha}\right)}}{{2}\,{\sin}{\left(\frac{\alpha}{2}\right)}}{.} \tag{S4} \end{align*}
All that is left is to differentiate the result in (S4) and to make use of the result of the differentiation and the result in (S3). This is left as an exercise to the reader.
Reasonably far from locations at which ${f} = {kF}_{s} / {2}$, (S1) looks like ${2N}^{2}$ rather than ${3N}^{2}$, and for this reason, ${\sigma}_{{\left\vert{Y}{(}{f}{)}\right\vert}^{2}}\,{\approx}\,{E}{\left({\left\vert{Y}{(}{f}{)}\right\vert}^{2}\right)}$ when f is not too near ${kF}_{s} / {2}$.
As we have only used a finite number of inputs, the signal cannot be truly stationary. However, as long as the number of samples, N, is much larger than the effective length of the filter’s impulse response, then away from the edges, the output will be very close to what it would have been had we input one sample every ${T}_{s}$ seconds from ${-}{\infty}$ until ${\infty}$, and ${\left\vert{Y}{(}{f}{)}\right\vert}^{2}$ is the energy spectral density of this signal. Dividing it by the time over which the signal was being produced, approximately ${N}{T}_{s}$ (assuming this is much longer than the transient response of the filter), gives the (approximate) PSD. Its standard deviation is of the same size as its expected value, and this is independent of the value of N chosen.
Thus, even given an absolutely accurate measurement of the spectral density for some measured signal, the measurement will not be a good estimate of the PSD. The measurement’s standard deviation will be comparable in size to its expected value. To reduce the standard deviation, it makes sense to take many (independent or at least minimally correlated) measurements of the power density and average them.
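A small Monte Carlo check of this claim is straightforward. For white noise, ${H}{(}{f}{)} = {1}$, and at the DFT frequencies ${Y}{(}{f}{)}$ reduces to the DFT of the samples, so (with the assumed parameters below) one can compare the standard deviation and mean of ${\left\vert{Y}{(}{f}{)}\right\vert}^{2}$ directly:

N = 1024; trials = 5000;                       % assumed parameters
P = abs(fft(randn(N, trials))).^2;             % |Y(f)|^2 at the DFT frequencies, H(f) = 1
m = round(N/8) + 1;                            % a bin away from f = 0 and f = Fs/2
fprintf('away from k*Fs/2: std/mean = %.2f (about 1 expected)\n', ...
        std(P(m,:))/mean(P(m,:)));
fprintf('at f = 0:         std/mean = %.2f (about sqrt(2) expected)\n', ...
        std(P(1,:))/mean(P(1,:)));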
We now use the DFT to reexamine the case described in the section “Destructive interference.” We sample both signals, the 1-s snippet of ${\cos}{(}{2}{\pi}{5.5}{t}{)}$ and the two copies of the snippet, at the rate of ${F}_{s} = {100}{\text{ Hz}}$. We then calculate the DFTs and use them to calculate and plot the DPD. See Figure 1. As expected, after the phase shift, there is absolutely no power left at 5.5 Hz. The power has moved to nearby frequencies.
When considering the snippet with the phase shift, the power moves from ${\pm}\,{5.5}{\text{ Hz}}$ to surrounding frequencies at which a 1-s snippet of the cosine has substantial amounts of power.
In many cases, random noise is similar to a signal in which one has a linear combination of shifted signals of a fixed type. To demonstrate this, we considered filtered noise. We used the MATLAB script in the supplementary material available at https://doi.org/10.1109/MSP.2023.3285044 to filter the output of the MATLAB randn command. The script passes the noise through a bandpass filter whose passband is from 24.5 Hz to 25.5 Hz. The results of this operation are given in Figure 2. Looking at the two subfigures, one finds that the signals seem to be in the correct frequency range, but their phase and amplitude are not constant over long periods of time. That is, over short periods, one sees the signals one would “expect,” but in the long term, the phase and amplitude drift.
Figure 2. An example of random noise. The noise has been filtered using a digital, fourth-order, Butterworth filter whose passband is from 24.5 Hz to 25.5 Hz. The sampling frequency is 100 Hz.
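Noise like that in Figure 2 can be produced along the following lines (a sketch, not the author’s script; butter and filter require the Signal Processing Toolbox, and the parameters are taken from the figure caption):

Fs = 100; Ts = 1/Fs;
% butter doubles the order for bandpass designs, so this is a fourth-order filter.
[b, a] = butter(2, [24.5 25.5]/(Fs/2));
y = filter(b, a, randn(20*Fs, 1));             % 20 s of filtered white noise
t = (0:length(y)-1)'*Ts;
subplot(2,1,1); plot(t(201:300), y(201:300)); xlabel('t (s)');     % an early 1-s window
subplot(2,1,2); plot(t(1201:1300), y(1201:1300)); xlabel('t (s)'); % a later 1-s window
% Both windows look like roughly 25-Hz oscillations, but their amplitudes and
% phases are unrelated.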
When considering the spectrum of such signals, we would, therefore, expect to see the same types of constructive and destructive interference we see in (4), and the way to avoid this problem should be to break the signal into small pieces, transform the small pieces, square and normalize the absolute value of the elements of the DFT, and then average the results frequency by frequency.
Figure 3 shows nine plots of the DPD of low-pass filtered white noise. (The samples of white noise input to the low-pass filter are generated by the MATLAB command randn.) In Figure 3, we start by considering the first 100 samples of the filtered noise, then we consider 200 samples, until finally, we consider ${2}^{8}\,{\times}\,{100} = {25,600}$ samples. In each plot, we show the DPD out to ${f} = {3}{\text{ Hz}}$, which is past the cutoff frequency of the filter. Each dataset includes all the samples in the previous datasets. Note that it is always clear that the signal is low-pass in nature. The power density where there should be next to no power is always small. In places where there should be “some” power, we sometimes see relatively large amounts of power where we do not expect it because of constructive interference, and sometimes the power density is smaller than anticipated because of destructive interference.
Figure 3. The DPD for the first N elements of an array that holds samples of filtered white noise. The filter used was a digital, second-order Butterworth filter with a corner frequency of 1 Hz. Note how the power seems to “move” from frequency to frequency as the number of samples is increased. Increasing the number of samples does not cause the variance of the estimates to decrease.
Finally, we consider an example of the solution that requires us to break up the sequence of measurements into subsequences. Using the MATLAB randn command, we produced 100,000 samples of low-pass noise. The first 25,600 samples were used in preparing Figure 3. Now we break up all 100,000 samples into 100 groups of 1,000 samples apiece, calculate the DFT of each subsequence, calculate the square of the absolute value of each element of the DFT, average the values of the mth element over the 100 DFTs, and normalize the values to be approximations to the PSD. The result of this process is shown in Figure 4 and is compared with the square of the magnitude of the frequency response, ${\left\vert{H}{(}{f}{)}\right\vert}^{2}$. As expected, the curves are quite similar.
Figure 4. A comparison between the PSD as estimated using the method of averaged periodograms (in blue) and the theoretical value, ${\left\vert{H}{(}{f}{)}\right\vert}^{2}$ (in red).
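An end-to-end sketch along the lines of this example follows (this is not the author’s supplementary script; the 100-Hz sampling rate is an assumption, and with the normalization used here the averaged periodogram estimates ${T}_{s}{\left\vert{H}{(}{f}{)}\right\vert}^{2}$, where H(f) is the discrete-time frequency response returned by freqz, so the theoretical curve may differ from the one in Figure 4 by a constant factor):

Fs = 100; Ts = 1/Fs; L = 1000; I = 100;         % assumed sampling rate; L and I as above
[b, a] = butter(2, 1/(Fs/2));                   % second-order low-pass, 1-Hz corner
x = filter(b, a, randn(L*I, 1));                % 100,000 samples of low-pass noise
Z = fft(reshape(x, L, I));                      % DFT of each length-1,000 subsequence
Pavg = mean(Ts*abs(Z).^2/L, 2);                 % averaged periodogram, as in (6)
f = (0:L-1)'*Fs/L;
H = freqz(b, a, f(1:L/2), Fs);                  % frequency response at the bin frequencies
plot(f(1:L/2), Pavg(1:L/2), f(1:L/2), Ts*abs(H).^2);
xlabel('f (Hz)'); legend('averaged periodogram', 'T_s|H(f)|^2');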
In this note, we have shown that when it is reasonable to look at a random signal as a linear combination of shifted copies of base signals, one should not expect estimates based on calculating samples of the periodogram of the whole measurement to provide a good estimate of the PSD. We have shown that the difficulty is not a “technical” problem with the DFT or the periodogram but a fundamental problem with the way the estimation of the PSD is being approached. To solve this problem and arrive at a clear picture of the spectral properties of the base signal, one should break the signal up into several pieces and average the “information” provided by the pieces.
We have presented a “semideterministic” and a probabilistic way of arriving at the method of averaged periodograms (and related methods like Welch’s method). The semideterministic explanation provides some nonprobabilistic intuition into why one should not try to measure the PSD of a signal by calculating the DFT over the entire measurement, and it leads one to consider breaking a large sequence into many smaller subsequences.
This article has supplementary downloadable material available at https://doi.org/10.1109/MSP.2023.3285044, provided by the author. The material includes the MATLAB .m file that generated all of the figures. Contact shlomoe@g.jct.ac.il with questions about this work.
Shlomo Engelberg (shlomoe@g.jct.ac.il) is a professor in the Department of Electrical and Electronics Engineering at the Jerusalem College of Technology, Jerusalem 9116001, Israel, where he has worked since 1997. From December 2008 until June 2012, he was the editor-in-chief of IEEE Instrumentation and Measurement Magazine, and he is currently an associate editor of IEEE Control Systems. His research interests include control theory, signal processing, coding theory, applied mathematics, and biomedical engineering. He is a Senior Member of IEEE.
[1] M. S. Bartlett, “Periodogram analysis and continuous spectra,” Biometrika, vol. 37, nos. 1–2, pp. 1–16, Jun. 1950, doi: 10.1093/biomet/37.1-2.1.
[2] S. Engelberg, “The advantages of ‘Forgetery’ [Lecture Notes] ,” IEEE Control Syst. Mag., vol. 35, no. 3, pp. 52–75, Jun. 2015, doi: 10.1109/MCS.2015.2406656.
[3] A. V. Oppenheim, R. W. Schafer, and J. R. Buck, Discrete-Time Signal Processing, 2nd ed. Upper Saddle River, NJ, USA: Pearson, 1998.
[4] P. Stoica and R. Moses, Spectral Analysis of Signals, Upper Saddle River, NJ, USA: Prentice-Hall, 2005.
[5] P. Welch, “The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms,” IEEE Trans. Audio Electroacoust., vol. 15, no. 2, pp. 70–73, Jun. 1967, doi: 10.1109/TAU.1967.1161901.