Vincent W. Neo, Soydan Redif, John G. McWhirter, Jennifer Pestana, Ian K. Proudler, Stephan Weiss, Patrick A. Naylor
This article is devoted to the polynomial eigenvalue decomposition (PEVD) and its applications in broadband multichannel signal processing, motivated by the optimum solutions provided by the EVD for the narrowband case [1], [2]. In general, we would like to extend the utility of the EVD to also address broadband problems. Multichannel broadband signals arise at the core of many essential commercial applications, such as telecommunications, speech processing, health-care monitoring, astronomy and seismic surveillance, and military technologies, including radar, sonar, and communications [3]. The success of these applications often depends on the performance of signal processing tasks, including data compression [4], source localization [5], channel coding [6], signal enhancement [7], beamforming [8], and source separation [9]. In most cases and for narrowband signals, performing an EVD is the key to the signal processing algorithm. Therefore, this article aims to introduce the PEVD as a novel mathematical technique suitable for many broadband signal processing applications.
In many narrowband signal processing applications, such as beamforming [8], signal enhancement [7], subband coding [6], and source separation [9], the processing is performed based on the covariance matrix. The instantaneous spatial covariance matrix, computed using the outer product of the multichannel data vector, can capture the phase shifts among narrowband signals arriving at different sensors. In the narrowband case, diagonalization of the spatial covariance matrix often leads to optimum solutions. For example, the multiple signal classification algorithm uses an EVD of the instantaneous spatial covariance matrix to perform super-resolution direction finding [5], [10].
The defining feature of a narrowband problem is the fact that a time-delayed version of a signal can be approximated by the undelayed signal multiplied by a phase shift. The success of narrowband processing therefore depends on the accuracy of this approximation, which varies from problem to problem. It is well known that as this approximation degrades, various issues start to occur when using narrowband algorithms. In array processing problems, this is often because some quantity in the algorithm that is related to direction of arrival (DOA) starts to depend on the frequency of the signal. For example, in DOA algorithms, a wideband source can appear to be spatially distributed. Another issue is that of multipath. Reflections can cause problems, as different multipath signals are derived from a single source but arrive at the sensors at different times. This leads to various issues, which can be advantageous or disadvantageous depending on one’s point of view. In beamforming and DOA estimation, this causes a problem, as the bearing to the source is clearly not well defined. However, in signal recovery problems, such as speech enhancement and communication systems, multipath is advantageous, as the signals can be combined to improve the signal-to-noise ratio (SNR). This is, however, possible only if the multipath signals are coherent. With narrowband processing, multipath signals appear to decorrelate as the delay increases. Multipath signals can also cause frequency-dependent fading, whereas narrowband processing can deal only with flat fading. Hence, for some problems, it is desirable to depart from narrowband processing and introduce some form of frequency-dependent processing.
For the broadband case, one common approach is to divide each broadband signal into multiple narrowband signals. While these narrowband signals are often processed independently by well-established and optimal narrowband techniques that are typically based on the EVD, splitting the broadband signal into independent frequency bins neglects spectral coherence and thus ignores correlations among different discrete Fourier transform (DFT) bins [11], [12]. As a result, optimal narrowband solutions applied in independent DFT bins give rise to suboptimal approaches to the overall broadband problem [13]. Broadband optimal solutions in the DFT domain need to consider the cross-coupling among DFT bins via cross terms, but the number of terms depends on the SNR and cannot be determined in advance [14], [15]. Another approach uses tapped delay line (TDL) processing [16], [17], [18], but the performance depends on the filter length, which is challenging to determine in practice. These approaches highlight the lack of generic tools to solve broadband problems directly.
Polynomial matrices are widely used in control theory and signal processing. In the control domain, these matrices are used to describe multivariable transfer functions for multiple-input multiple-output (MIMO) systems [19]. Control systems are usually designed for continuous-time systems and are analyzed in the Laplace domain. There, factorizations, such as the Smith and Smith–McMillan decompositions, of matrices in the Laplace variable s target unimodularity, which is critical in the control context for invertibility, and spectral factorizations with minimum phase components to minimize time delays [20]. More recently, within digital signal processing (DSP), multirate DSP exploits polynomial matrices to describe lossless filter bank systems using polyphase notation [6], [20]. In multichannel broadband arrays and convolutively mixed signals, the array signals are generally correlated in time across different sensors. Therefore, the time delays for broadband signals cannot be represented only by phase shifts but need to be explicitly modeled. The relative time shifts are captured using the space-time covariance polynomial matrix, where decorrelation over a range of time shifts can be achieved using a PEVD [21].
While the initial work on the PEVD was a numerical algorithm [21], the existence of the decomposition of an analytic positive semidefinite para-Hermitian matrix, such as the space-time covariance matrix, has only recently been proven [22], [23], [24]. In most cases, unique para-Hermitian eigenvalues and paraunitary eigenvectors for a para-Hermitian matrix EVD exist but are of infinite length. However, being analytic, they permit good approximations by finite length factors, which are still helpful in many practical applications, such as beamformers [25], MIMO communications [26], source coding [27], signal enhancement [28], [29], source separation [30], source identification [31], and DOA estimation [32].
This article is organized as follows. The “Mathematical Background” section provides a primer on the relevant mathematical concepts. The “Preliminaries: Representing Broadband Signals” section introduces the notations and gives a background on multichannel array processing, including the use of spatial and space-time covariance matrices and the inadequacies of two common approaches. The “Polynomial Matrix EVD” section first introduces the PEVD, whose analytic eigenvalues and eigenvectors are described before their approximations by numerical algorithms are presented. The “Example Applications Using PEVD” section demonstrates the use of the PEVD for some multichannel broadband applications, namely, adaptive beamforming, subband coding, and speech enhancement. Concluding remarks and future perspectives are provided in the “Conclusions and Future Perspectives” section.
In the time domain, the key to describing the propagation of a broadband signal through a linear time-invariant (LTI) system is the difference equation, where the system output $y[n]$ depends on a weighted sum of the input $x[n]$ and past values of both $y[n]$ and $x[n].$ This difference equation \[{y}{[}{n}{]} = \mathop{\sum}\limits_{{\nu}\geq{0}}{b}{[}{\nu}{]}{x}{[}{n}{-}{\nu}{]} + \mathop{\sum}\limits_{{\mu}{>}{0}}{a}{[}{\mu}{]}{y}{[}{n}{-}{\mu}{]} \tag{1} \] is straightforward to implement but does not lend itself to simple algebraic manipulations. For example, the difference equation for the concatenation of two LTI systems is not easily expressed in terms of the difference equations for the two component systems. For this reason, the z transform ${x}{(}{z}{)} = {\Sigma}_{n}{x}{[}{n}{]}{z}^{{-}{n}},$ with ${z}\in{\mathbb{C}},$ or, for short, ${x}{(}{z}{)}\,{⊷}\,{x}{[}{n}{]},$ can be used to turn the time-domain convolution into the multiplicative expression ${y}{(}{z}{)} = {h}{(}{z}{)}\cdot{x}{(}{z}{),}$ which is easy to manipulate [33], [34].
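As a minimal illustration (a sketch in NumPy with arbitrary example coefficients), cascading two finite-impulse response systems by time-domain convolution yields exactly the coefficient sequence of the product of their z transforms:

```python
import numpy as np

# Two causal FIR systems, represented by their coefficients in increasing
# powers of z^{-1} (arbitrary example values).
h1 = np.array([1.0, 0.5])          # h1(z) = 1 + 0.5 z^{-1}
h2 = np.array([1.0, -0.25, 0.1])   # h2(z) = 1 - 0.25 z^{-1} + 0.1 z^{-2}

# Concatenating the two systems in the time domain is a convolution ...
h_cascade = np.convolve(h1, h2)

# ... whose result equals the coefficient sequence of the product h1(z)h2(z).
h_product = np.polynomial.polynomial.polymul(h1, h2)
print(np.allclose(h_cascade, h_product))   # True
```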
The z transform exists as long as the time-domain quantities are absolutely summable; i.e., for $\text{x}(z)$ we require ${\Sigma}_{n}\left|{x[n]}\right|{<}\infty{.}$ Values of z for which the z transform is finite define the region of convergence (ROC), which, therefore, must include at least the unit circle since $\left|{{\Sigma}_{n}x[n]{e}^{{-}{j}\Omega{n}}}\right|\leq{\Sigma}_{n}\left|{x[n]}\right|{<}\infty,$ where ${j} = \sqrt{{-}{1}}$ is the imaginary unit. For values of z within this ROC, the function $\text{x}(z)$ is complex analytic, which has profound consequences. Analytic functions mathematically belong to a ring such that any addition, subtraction, and multiplication will produce an analytic result. These operations potentially reduce the ROC. Dividing by an analytic function also results in an analytic function as long as the divisor does not have spectral zeros; again, this operation may shrink the ROC. For example, with $\text{b}(z)$ and $\text{a}(z)$ analytic and the latter without any zeros on the unit circle, ${h}{(}{z}{)} = {b}{(}{z}{)/}{a}{(}{z}{)}$ is guaranteed to be analytic. Note that the same cannot be said for nonanalytic functions. This is important since nonanalytic functions can be difficult to approximate optimally in practice (see the following; for more on the algebra of analytic functions, see “Algebra of Functions”).
Algebra of Functions
We are interested in matrices whose entries are more general than real or complex numbers: specifically, matrices whose entries are analytic functions. The algebraic manipulation of such matrices may, at first, seem a little exotic, but many operations for real and complex numbers carry over to this setting.
There are several different classes of functions, depending on what properties they have. For example, there are discontinuous functions, continuous but nondifferentiable functions, functions that are continuous and differentiable up to a certain order, and functions that are continuous and differentiable for all orders. The class of analytic functions, by definition, has locally convergent power series. Consequently, the functions are infinitely differentiable and easier to work with than other types of functions. These series might have a finite number of terms, but in general, there are infinitely many. The truncation of these series results in polynomial approximations of the underlying analytic function.
Analytic functions can be algebraic or transcendental. An algebraic function $f(x)$ is a function that is a root of a polynomial equation. More specifically, $f$ is algebraic if it satisfies ${p}{(}{x},{f}{(}{x}{))} = {0}$ for some irreducible polynomial $p(x,y)$ with coefficients in some field. Examples of algebraic functions include rational functions and nth roots of polynomials. Note that the inverse function of an algebraic function (if it exists) is also algebraic. An analytic function that is not algebraic is called a transcendental function. Examples include ${e}^{x},\sin{(}{x}{),}$ and $\cos(x).$ Such functions have power series representations with an infinite number of terms.
Let us first consider analytic functions on their own. A function $\text{f}(z)$ is (complex) analytic in a domain (an open set) ${\mathcal{D}}\subset{\mathbb{C}}$ if at each point it can be written as a locally convergent Taylor series. (Note that this means that such a function is infinitely differentiable.) The set $\mathcal{D}$ is known as the domain of analyticity of ${f}{(}{z}{)}{.}$ We note that two different analytic functions $\text{f}(z)$ and $\text{g}(z)$ may have different domains of analyticity, say, ${\mathcal{D}}_{f}$ and ${\mathcal{D}}_{g}{.}$ When we operate on these functions, we assume that ${\mathcal{D}}_{f}$ and ${\mathcal{D}}_{g}$ overlap, i.e., that they have a nontrivial intersection, and restrict $\text{f}(z)$ and $\text{g}(z)$ to this common domain ${\mathcal{D}} = {\mathcal{D}}_{f}\cap{\mathcal{D}}_{g}{.}$
Then, we can perform certain fundamental operations on analytic functions, and the result will also be an analytic function with the same domain of analyticity $\mathcal{D}.$ In particular, if $\text{f}(z)$ and $\text{g}(z)$ are analytic on a domain ${\mathcal{D}}\subset{\mathbb{C}},$ then ${f}{(}{z}{)} + {g}{(}{z}{)}$ is analytic; that is, it can be expressed as a locally convergent power series for any ${z}\in{\mathcal{D}}{.}$ Similarly, ${f}{(}{z}{)}{-}{g}{(}{z}{)}$ and ${f}{(}{z}{)}\cdot{g}{(}{z}{)}$ are analytic. Things become a little more complicated when we consider quotients of the form $\text{f}(z)/\text{g}(z),$ but the result is analytic everywhere except at zeros of $\text{g}(z),$ as might be expected. This “closure” is important since it means that as we manipulate analytic functions, we do not need to worry if the result is also analytic. Note, as well, that if the product ${f}{(}{z}{)}\cdot{g}{(}{z}{)}\equiv{0}$ on $\mathcal{D},$ then ${f}{(}{z}{)}\equiv{0}$ on $\mathcal{D}$ or ${g}{(}{z}{)}\equiv{0}$ on $\mathcal{D}.$
If we now restrict our attention to polynomials in $z,$ which are analytic everywhere, and Laurent polynomials, which are analytic everywhere except ${z} = {0},$ then we can say something more. Indeed, if $\text{f}(z)$ and $\text{g}(z)$ are (Laurent) polynomials, then ${f}{(}{z}{)} + {g}{(}{z}{),}\,{f}{(}{z}{)}{-}{g}{(}{z}{),}$ and ${f}{(}{z}{)}\cdot{g}{(}{z}{)}$ are not just analytic but are also (Laurent) polynomials. Now, however, we must exercise some care when considering quotients $\text{f}(z)/\text{g}(z)$ since the result will be analytic in $\mathcal{D}$ [except at the zeros of $\text{g}(z)$] but will not, in general, be a (Laurent) polynomial. However, $\text{f}(z)/\text{g}(z),$ and, indeed, any analytic function, can be arbitrarily well approximated by polynomials, as discussed in the preceding.
Let us now consider matrices $\mathcal{R}(z)$ whose entries are analytic functions in ${\mathcal{D}}\subset{\mathbb{C}}{.}$ We start by noting that for any fixed ${z}_{0}\in{\mathbb{C}},$ the matrix $\mathcal{R}({z}_{0})$ is simply a matrix of complex numbers that can be manipulated in the usual ways. For example, we can multiply $\mathcal{R}({z}_{0})$ by another (conformable) matrix or vector, and we can compute the eigenvalue decomposition (EVD) of $\mathcal{R}({z}_{0}).$ When we instead allow $z$ to vary, it is still possible to form, say, matrix–matrix and matrix–vector products with $\mathcal{R}(z).$ Indeed, using the arguments in the previous paragraphs, if $\mathcal{R}(z)$ has analytic (polynomial) entries, then the resulting matrix or vector will also have analytic (polynomial) entries. However, it is not immediately obvious that we can write down a single $z$-dependent EVD of $\mathcal{R}(z)$ that holds for all values of ${z}\in{\mathcal{D}}{.}$ That this is true in certain circumstances is proved in a remarkable result from Rellich [S1].
Reference
[S1] F. Rellich, “Störungstheorie der Spektralzerlegung. I. Mitteilung. Analytische Störung der isolierten Punkteigenwerte eines beschränkten Operators,” Mathematische Annalen, vol. 113, pp. 600–619, 1937, doi: 10.1007/BF01571652. [Online]. Available: https://eudml.org/doc/159886
Throughout this article, we often represent z transforms by series, i.e., by expressions of the form \[{h}{(}{z}{)} = \sum_{{n} = {N}_{1}}^{{N}_{2}}{h}{[}{n}{]}{z}^{{-}{n}}{.} \tag{2} \]
This is motivated by the fact that analytic functions can be represented by a Taylor (or, equivalently, power) series within the ROC. More generally, we are interested in Laurent series, power series, Laurent polynomials, and polynomials, which we distinguish in the following.
For finite ${N}_{1}$ and ${N}_{2}$ in (2), $\text{h}(z)$ is a Laurent polynomial if ${N}_{1}$ and ${N}_{2}$ have opposing signs. If ${N}_{1}$ and ${N}_{2}$ share the same sign, i.e., if $\text{h}(z)$ is purely an expression in powers of either ${z}^{{-}{1}}$ or z, it is a polynomial. Typically, by a polynomial, we refer to an expression that contains powers in ${z}^{{-}{1}}{.}$ If interpreted as a transfer function, a polynomial $\text{h}(z)$ in ${z}^{{-}{1}}$ refers to a causal finite-impulse response filter. If it possesses finite coefficients, then a polynomial or Laurent polynomial $\text{h}(z)$ will always be absolutely summable and hence be analytic.
A Laurent series is characterized by ${N}_{1}\rightarrow{-}\infty$ and ${N}_{2}\rightarrow\infty,$ while for a power series, $\text{h}(z)$ strictly contains only powers in either ${z}^{{-}{1}}$ (for ${N}_{1}\geq{0}$ and ${N}_{2}\rightarrow\infty{)}$ or z (for ${N}_{1}\rightarrow{-}\infty$ and ${N}_{2}\leq{0}{).}$ Both Laurent and power series possess a generally infinite coefficient sequence ${\{}{h}{[}{n}{]\}}{.}$ Such sequences can be used to represent rational functions, where ${h}{(}{z}{)} = {b}{(}{z}{)/}{a}{(}{z}{)}$ is a ratio of two polynomials; with respect to (1), such a power series can describe an infinite-impulse response filter. Further and more generally, Laurent and power series can also represent transcendental functions, which are absolutely convergent but may not be representable by a finite number of algebraic operations, such as a ratio.
In signal processing, polynomials and convergent power series can represent quantities, such as finite- and infinite-impulse responses, of either causal or anticausal stable systems. In contrast, Laurent series and Laurent polynomials appear as a result of correlation operations. First, assume that a zero-mean unit variance uncorrelated random signal $x[n]$ excites a system with impulse response ${h}{[}{n}{]}{.}$ With the input autocorrelation sequence ${r}_{x}{[}{\tau}{]} = {\mathbb{E}}\left\{{{x}{[}{n}{]}{x}^{\ast}{[}{n}{-}{\tau}{]}}\right\} = {\delta}{[}{\tau}{],}$ the output autocorrelation is ${r}_{y}{[}{\tau}{]} = {\Sigma}_{n}{h}^{\ast}{[}{-}{n}{]}{h}{[}{\tau}{-}{n}{]}$ [35], where ${\mathbb{E}}\left\{{\cdot}\right\}$ and ${[}\cdot{]}^{\ast}$ are the expectation and complex conjugate operators, respectively. Then, its z transform, the power spectral density ${r}_{y}{(}{z}{)}\,{⊷}\,{r}_{y}{[}{\tau}{],}$ and ${r}_{y}{(}{z}{)} = {h}{(}{z}{)}{h}^{\ast}{(}{1}{/}{z}^{\ast}{)}$ will be a Laurent series if $\text{h}(z)$ is a power series and a Laurent polynomial if $\text{h}(z)$ is a polynomial.
By the Weierstrass theorem, any continuous function can be arbitrarily well approximated by a polynomial of sufficient degree, but, in general, it can be nontrivial to construct the approximating polynomials. However, for analytic functions, such as Laurent and power series $\text{h}(z),$ this approximation can be easily obtained by truncating ${h}{[}{n}{]}\,{⊶}\,{\text{h}}{(}{z}{)}$ to the required order. If the result is a Laurent polynomial as in (2), describing, e.g., the impulse response of a noncausal system, then a polynomial (or causal system) can be obtained by a delay of $\left|{N}_{1}\right|$ sampling periods. Thus, by delay and truncation, all the preceding expressions describing analytic functions—Laurent series, power series, and Laurent polynomials—can be arbitrarily closely approximated by polynomials.
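Both the correlation and approximation ideas above can be illustrated in a few lines of NumPy (a sketch with arbitrary example coefficients): the output autocorrelation of a white-noise-driven FIR filter is a Laurent polynomial, and a causal polynomial approximation follows by a delay, with truncation of small outer coefficients handling the infinite-order case:

```python
import numpy as np

h = np.array([1.0, 0.8, 0.3])   # h(z) = 1 + 0.8 z^{-1} + 0.3 z^{-2} (example)

# Output autocorrelation under unit-variance white-noise excitation:
# r_y[tau] = sum_n h*[-n] h[tau - n], i.e., h convolved with its conjugate
# time reverse; its z transform r_y(z) = h(z) h*(1/z*) is a Laurent
# polynomial over the lags tau = -2 ... 2, with r_y[-tau] = r_y*[tau].
r_y = np.convolve(h, np.conj(h[::-1]))
lags = np.arange(-(len(h) - 1), len(h))
print(dict(zip(lags, np.round(r_y, 3))))

# r_y(z) is noncausal; delaying by |N_1| = 2 samples, i.e., reading the same
# coefficients as lags 0 ... 4, yields a causal polynomial. For Laurent or
# power series, negligibly small outer coefficients would also be truncated.
```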
Key Statement
While operations on analytic functions tend to yield analytic functions, the same is not true for polynomials; e.g., the ratio of polynomials generally yields a rational function but not a polynomial. Nonetheless, since the resulting function is analytic, it can be approximated arbitrarily closely by a polynomial via appropriate delay and truncation operations.
In this article, we consider matrices whose entries are analytic functions in general and their close approximation by polynomial matrices in particular. The mathematical theory of polynomial matrices that depend on a real parameter has been studied in, e.g., [36]. This has found application, for example, in the control domain [37]. Within signal processing, polynomial matrices have been used in filter bank theory. Specifically, polyphase notation [38] has been utilized to allow efficient implementation. Here, polynomial matrices in the form of polyphase analysis and synthesis matrices describe networks of filters operating on demultiplexed single-channel data. More generally, polynomial matrices have been used to define space-time covariance matrices on demultiplexed data streams [20] and directly for multichannel data [21].
A number of polynomial matrix factorizations have been introduced in the past. Since we are particularly interested in diagonalizations of matrices, these prominently include the Smith and Smith–McMillan forms for matrices of polynomials and rational functions, respectively [20]. Popular in the control domain, these allow a decomposition into a diagonal term and two outer factors that are invertible but generally nonorthogonal polynomial matrices. Further, spectral factorizations [37], [39] involve the decomposition of a matrix into a product of a causal stable matrix and its time-reversed, complex-conjugated, and transposed version. These are matrix-valued extensions of Wiener’s factorization of a power spectral density into minimum and maximum phase components, and they are supported by numerical tools, such as PolyX [40].
In control theory, minimizing the delay of a system is a critical design issue, and hence, many of the existing matrix decompositions, such as spectral factorization, emphasize the minimum phase equivalent of the resulting matrix factors. In signal processing, the delay is often a secondary issue, while, e.g., energy preservation (or unitarity) of a transform is crucial. Therefore, in the following, we explore the diagonalization of an analytic matrix by means of energy-preserving transformations.
The received signal at the mth of a total of M sensors for the discrete-time index n is \[{x}_{m}{[}{n}{]} = \sum_{{\ell} = {1}}^{L}\sum_{{\tau} = {0}}^{T}{a}_{{m},{\ell}}{[}{\tau}{]}{s}_{\ell}{[}{n}{-}{\tau}{]} + {v}_{m}{[}{n}{],}\quad{m} = {1},\ldots,{M}, \tag{3} \] where ${a}_{{m},{\ell}}[n]$ models the channel from the $\ell$th source signal ${s}_{\ell}[n]$ to the mth sensor and is an element of ${\bf{A}}{[}{n}{]}\in{\mathbb{C}}^{{M}\times{L}\times{(}{T} + {1}{)}};$ ${v}_{m}{[}{n}{]}$ is the additive noise at the mth sensor, assumed to be uncorrelated with the L source signals; and T is the maximum order of any of the channel impulse responses. The received data vector for M sensors is ${\bf{x}}{[}{n}{]} = {[}{x}_{1}{[}{n}{],}\ldots,{x}_{M}{[}{n}{]]}^{T}\in{\mathbb{C}}^{M},$ and each element has a mean of ${\mathbb{E}}{\{}{x}_{m}{[}{n}{]\}} = {0}\ \forall{m},$ where ${[}\cdot{]}^{T}$ represents the transpose operator. Similarly, the source and noise data vectors are ${\bf{s}}{[}{n}{]}\in{\mathbb{C}}^{L}$ and ${\bf{v}}{[}{n}{]}\in{\mathbb{C}}^{M},$ respectively.
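For illustration, (3) can be simulated directly; the following NumPy sketch uses random channels, sources, and noise purely as placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
M, L, T, N = 4, 2, 3, 10_000   # sensors, sources, channel order, samples

A = rng.standard_normal((M, L, T + 1))   # a_{m,l}[tau]: placeholder channels
s = rng.standard_normal((L, N))          # source signals s_l[n]
v = 0.1 * rng.standard_normal((M, N))    # additive sensor noise v_m[n]

# x_m[n] = sum_l sum_tau a_{m,l}[tau] s_l[n - tau] + v_m[n], cf. (3)
x = v.copy()
for m in range(M):
    for l in range(L):
        x[m] += np.convolve(A[m, l], s[l])[:N]
```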
Broadband source signals, which naturally arise in, for example, audio, speech, communications, sonar, and radar, are directly reflected by (3). This is also applicable for narrowband systems. Here, the source signal is often described by a complex exponential, ${e}^{{j}\Omega{n}},$ where $\Omega$ is the normalized angular frequency. This means that (3) can be simplified by setting ${T} = {0}{.}$ As an alternative to (3), as shown in Figure 1, the L source signals, ${s}_{\ell}{[}{n}{]},\ {\ell} = {1},\ldots,{L},$ could be generated using spectral-shaped noise obtained by filtering uncorrelated zero-mean unit variance complex Gaussian random variables, ${u}_{\ell}{[}{n}{]}\sim{\mathcal{N}}{(}{0},{1}{),}$ through some innovation filters, ${f}_{\ell}[n]$ [35].
Figure 1. The multichannel system model for $L$ spectral-shaped source signals and $M$ sensors. Uncorrelated noise signals $\text{v}[n],$ not drawn in the figure, are optionally added to each sensor based on (3).
The channel model in (3) can describe systems in diverse scenarios, for example, instantaneous and convolutive mixtures, near-field and far-field sources, and anechoic and reverberant environments. The signal model in (3) is often simplified by taking the z transform. However, care is needed, as the z transform of a random signal does not exist. Nonetheless, in the case of deterministic absolutely summable signals, the z transform of (3) may be written in matrix-vector form as \[{x}{(}{z}{)} = {\mathcal{A}}^{P}{(}{z}{)}{s}{(}{z}{)} + {v}{(}{z}{)}, \tag{4} \] where $\text{A}[n]\,{⊶}\,{\mathcal{A}}{(}{z}{)}\in{\mathbb{C}}^{{L}\times{M}},\,\text{s}[n]\,{⊶}\,{s}{(}{z}{)}\in{\mathbb{C}}^{L},$ and $\text{v}[n]\,{⊶}\,{v}{(}{z}{)}\in{\mathbb{C}}^{M}$ are z-transform pairs of the channel matrix, source, and noise vectors, respectively. The well-known equivalence of convolution in the time domain and multiplication in the z domain [33] is expressed in (3) and (4).
The covariance matrix used in many narrowband subspace-based approaches [5], [8], [10] is described by \[{\bf{R}} = {\mathbb{E}}{\{}{\bf{x}}{[}{n}{]}{\bf{x}}^{\text{H}}{[}{n}{]\}} \tag{5} \] using the data vector obtained from (3). The ${(}{m},{\ell}{)}{\text{th}}$ element of ${\bf{R}}$ is ${r}_{{m},\ell} = {\mathbb{E}}{\{}{x}_{m}{[}{n}{]}{x}_{\ell}^{\ast}{[}{n}{]\},}$ and the expectation operation is performed over n. In practice, the expectation is approximated using the sample mean, where the inner product between the received signals at the mth and $\ell$th sensors is computed before normalizing by the total number of samples N. Because the inner product is calculated sample-wise, the covariance matrix instantaneously captures the spatial relationship among different sensors. This article calls it the instantaneous (or spatial) covariance matrix.
When the system involves convolutive mixing and broadband signals, time delays among signals at different sensors need to be modeled. This spatiotemporal relationship is explicitly captured by the space-time covariance matrix parameterized by the discrete-time lag parameter ${\tau}\in{\mathbb{Z}},$ defined as [21] \[{\bf{R}}{[}{\tau}{]} = {\mathbb{E}}{\{}{\bf{x}}{[}{n}{]}{\bf{x}}^{\text{H}}{[}{n}{-}{\tau}{]\}}{.} \tag{6} \]
The ${(}{m},{\ell}{)}{\text{th}}$ element of ${\bf{R}}{[}{\tau}{],}$ arising from sensors with a fixed geometry, is ${r}_{{m},{\ell}}{[}{\tau}{]} = {\mathbb{E}}{\{}{x}_{m}{[}{n}{]}{x}_{\ell}^{\ast}{[}{n}{-}{\tau}{]\},}$ and, again, the expectation operation is performed over n, where wide-sense temporal stationarity is assumed. The autocorrelation and cross-correlation sequences are obtained when ${m} = {\ell}$ and ${m}\ne{\ell},$ respectively. Furthermore, (5) can be seen as a special case of (6) when only the instantaneous lag is considered; i.e., ${\bf{R}}[0]$ is the coefficient of ${z}^{0}$ when ${\tau} = {0},$ as demonstrated in Figure 2.
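In practice, (6) is replaced by a sample average; a minimal sketch of such an estimator is given below, where the symmetry $\mathbf{R}[-\tau] = \mathbf{R}^{\text{H}}[\tau]$ exploited in the final line is precisely what renders the z transform introduced below para-Hermitian:

```python
import numpy as np

def space_time_covariance(x, max_lag):
    """Sample estimate of R[tau] = E{x[n] x^H[n - tau]} for |tau| <= max_lag.

    x: M x N data matrix. Returns slices of shape (2*max_lag + 1, M, M),
    indexed by tau = -max_lag ... max_lag.
    """
    M, N = x.shape
    R = np.zeros((2 * max_lag + 1, M, M), dtype=complex)
    for tau in range(max_lag + 1):
        # average the outer products x[n] x^H[n - tau] over all valid n
        R[max_lag + tau] = x[:, tau:] @ x[:, :N - tau].conj().T / (N - tau)
        R[max_lag - tau] = R[max_lag + tau].conj().T   # R[-tau] = R^H[tau]
    return R
```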
Figure 2. (a) A typical spatial covariance matrix for zero lag, i.e., ${\tau} = {0},$ is the matrix coefficient of ${z}^{0}{.}$ (b) In general, each matrix slice corresponds to a coefficient of the polynomial. (c) The same polynomial matrix is also a matrix consisting of polynomial elements represented by tubes in the same cube.
The z transform of the space-time covariance matrix in (6), \[{\mathcal{R}}{(}{z}{)} = \mathop{\sum}\limits_{{\tau} = {-}\infty}^{\infty}{\bf{R}}{[}{\tau}{]}{z}^{{-}{\tau}}, \tag{7} \] known as the cross spectral density (CSD) matrix, is a para-Hermitian polynomial matrix satisfying the property ${\mathcal{R}}^{\text{P}}{(}{z}{)} = {\mathcal{R}}{(}{z}{)}{.}$ (The symbol ${[}\cdot{]}^{\text{P}}$ denotes the para-Hermitian operator, ${\mathcal{R}}^{\text{P}}{(}{z}{)} = {\mathcal{R}}^{\text{H}}{(}{1}{/}{z}^{\ast}{),}$ which involves a Hermitian transpose followed by a time reversal operation [20], where ${[}\cdot{]}^{\text{H}}$ denotes the Hermitian transpose operator.) The polynomial matrix can be interpreted as a matrix of polynomials (functions) as well as a polynomial with matrix coefficients; i.e., ${\bf{R}}{[}{\tau}{]}$ is the matrix coefficient of ${z}^{{-}{\tau}}.$ This is visualized in Figure 2(b), which describes the temporal evolution of the spatial relationship across the entire array. Equivalently, the same polynomial matrix can also be interpreted as a matrix with polynomial elements, representing the temporal correlation in the z domain between sensor pairs, for example, element ${r}_{3,1}(z)$ for sensors 3 and 1 in Figure 2(c).
The space-time covariance matrix completely captures the second-order statistics of multichannel broadband signals via auto- and cross-correlation functions. Its z transform has the useful property of being para-Hermitian.
The multichannel signal model introduced in the “Signal Model” section is compared against two signal representations commonly encountered in array processing. They are the TDL and short-time Fourier transform (STFT) approaches.
The relative delays with which broadband signals arrive at different sensors cannot be sufficiently modeled by phase shifts because they can be accurate only at a single frequency. Therefore, these delays need to be implemented by filters that possess, at the very least, frequency-dependent phase shifts. Such filters must rely on processing a temporal window of the signals; the access to this window can, in the finite-impulse response case, be provided by TDLs that are attached to each of the M array signals. The length T of these TDLs will determine the accuracy with which such delays—often of a fractional nature [18]—are realized.
Based on the array signal vector $\text{x}[n],$ a T-element TDL provides data that can be represented by a concatenated vector $\mathbf{\chi}{[}{n}{]} = {[}{x}^{T}{[}{n}{],}\ldots,{x}^{T}{[}{n}{-}{T} + {1}{]]}^{T}\in{\mathbb{C}}^{MT},$ which holds both spatial and temporal samples. For the covariance matrix of $\mathbf{\chi}{[}{n}{],}{R}_{\chi} = {\mathbb{E}}{\{}\mathbf{\chi}{[}{n}{]}{\mathbf{\chi}}^{\text{H}}{[}{n}{]\},}$ we have \begin{align*}{\bf{R}}_{\chi} = \left[{\begin{array}{ccc}{\bf{R}}[0]&{\ldots}&{{\bf{R}}{[}{T}{-}{1}{]}}\\{\vdots}&{\ddots}&{\vdots}\\{{\bf{R}}{[}{-}{T} + {1}{]}}&{\ldots}&{\bf{R}}[0]\end{array}}\right]{.} \tag{8} \end{align*}
Although of different dimensions, the covariance ${\bf{R}}_{\chi}$ contains, as submatrices, the same terms that also make up the space-time covariance matrix ${\bf{R}}{[}{\tau}{]}{.}$ However, it is not necessarily clear prior to processing how large or small T should be selected. Apart from its impact on the accuracy of a delay implementation, if T is selected smaller than the coherence time of the signal, then some temporal correlations for lags $\left|{\tau}\right|\geq{T}$ in the signals are missed, leading to a potentially insufficient characterization of the signals’ second-order statistics. If T is set too large, then no extra correlation information is included, but additional noise may be added.
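Assuming the slices $\mathbf{R}[\tau]$ are available (e.g., from the sample estimator sketched earlier), the block-Toeplitz matrix in (8) can be assembled directly:

```python
import numpy as np

def tdl_covariance(R, T):
    """Assemble the MT x MT block-Toeplitz matrix R_chi of (8) from the
    space-time covariance slices R (lags -max_lag ... max_lag, as returned
    by the estimator above), for a TDL length T <= max_lag + 1."""
    K, M = (R.shape[0] - 1) // 2, R.shape[1]
    R_chi = np.zeros((M * T, M * T), dtype=complex)
    for i in range(T):
        for k in range(T):
            # block (i, k) = E{x[n - i] x^H[n - k]} = R[k - i]
            R_chi[i * M:(i + 1) * M, k * M:(k + 1) * M] = R[K + k - i]
    return R_chi
```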
The EVD of the covariance matrix ${\bf{R}}_{\chi} = {\bf{Q}}_{\chi}{\mathbf{\Lambda}}_{\chi}{\bf{Q}}_{\chi}^{\text{H}}$ gives access to MT eigenvalues in ${\mathbf{\Lambda}}_{\chi}{.}$ When inspecting these eigenvalues, there is no longer any separation between space and time, and, for example, a single broadband source that is captured by the array in its data vector ${\bf{x}}[n]$ can generate, depending on its bandwidth, anything between one and ${T} + \Delta$ nonzero eigenvalues, where $\Delta$ is the maximum propagation delay across the array that any source can experience. Hence, tasks such as source enumeration can become challenging.
Furthermore, in narrowband processing, a common procedure is to project the received signals x onto the so-called signal subspace, as this suppresses some of the noise [7]. The signal subspace is defined by partitioning the eigenvalues by magnitude and selecting the subset of eigenvectors corresponding to the larger eigenvalues. Mimicking this in the broadband case would mean partitioning ${\bf{Q}}_{\chi} = \left[{\begin{array}{cc}{{\bf{Q}}_{s}}&{{\bf{Q}}_{n}}\end{array}}\right],$ where ${\bf{Q}}_{s}$ corresponds to the larger eigenvalues. In the narrowband case, it is well known that, in general, the projected signals ${y}{[}{n}{]} = {Q}_{s}^{\text{H}}\mathbf{\chi}{[}{n}{]}$ are not the source signals but merely span the same space. However, if only one source signal is present, then the projected signal is the source signal. In the broadband case, not even this is true since, as noted in the preceding, we might have more than one eigenvalue per signal.
If we take a T-point DFT ${\bf{W}}$ of each of the TDLs in $\mathbf{\chi}{[}{n}{],}$ we evaluate $\mathbf{\xi}{[}{n}{]} = {(}{\bf{W}}\otimes{\bf{I}}_{M}{)}\mathbf{\chi}{[}{n}{],}$ with $\otimes$ being the Kronecker product. The DFT domain covariance matrix ${\bf{R}}_{\xi} = {\mathbb{E}}{\{}\mathbf{\xi}{[}{n}{]}{\mathbf{\xi}}^{\text{H}}{[}{n}{]\}} = {(}{\bf{W}}\otimes{\bf{I}}_{M}{)}{\bf{R}}_{\chi}{(}{\bf{W}}\otimes{\bf{I}}_{M}{)}^{\text{H}}$ is generally nonsparse due to cross-coupling between DFT bins. This cross-coupling does not subside even as T is increased. For bin-wise processing, i.e., processing each of the frequency bins across the array independently of other frequency bins, many of the terms in ${\bf{R}}_{\xi}$ are neglected, leading to processing that can be very low cost but generally is suboptimal. To achieve optimality, time-domain criteria must be embedded in the processing, which generally leads to cross terms between bins [14], [41]. The generally dense nature of ${\bf{R}}_{\xi}$ can be relaxed when employing more frequency-selective subband methods over DFT processing, but cross terms, at least between adjacent subbands, still remain [42]. Together with the increased computational cost of such filters over the DFT, this negates the low-complexity aspiration of this approach.
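The cross-bin coupling is easy to expose numerically. In the following sketch (with an arbitrary moving-average coloring standing in for real broadband data), a substantial fraction of the energy of $\mathbf{R}_{\xi}$ lies outside the bin-wise $M\times M$ diagonal blocks that independent per-bin processing would retain:

```python
import numpy as np

rng = np.random.default_rng(1)
M, T, N = 3, 8, 20_000

# Temporally correlated (broadband) array data: white noise colored by a
# short moving average so that DFT bins remain mutually coherent.
x = rng.standard_normal((M, N + T))
x = np.apply_along_axis(lambda r: np.convolve(r, np.ones(4) / 4, "same"), 1, x)

# Stacked TDL vectors chi[n] and their sample covariance R_chi.
chi = np.concatenate([x[:, T - i:T - i + N] for i in range(T)], axis=0)
R_chi = chi @ chi.conj().T / N

# DFT across each TDL: xi[n] = (W kron I_M) chi[n].
W = np.fft.fft(np.eye(T)) / np.sqrt(T)
F = np.kron(W, np.eye(M))
R_xi = F @ R_chi @ F.conj().T

# Fraction of energy outside the bin-wise diagonal blocks: these are the
# cross terms that independent bin-wise processing neglects.
diag_energy = sum(np.linalg.norm(R_xi[b*M:(b+1)*M, b*M:(b+1)*M])**2
                  for b in range(T))
print(1 - diag_energy / np.linalg.norm(R_xi)**2)
```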
Broadband processing requires accurately representing fractional time delays. Previous approaches do not lead to proper generalizations of the narrowband algorithms and are often suboptimal.
As discussed in the “Comparison With Other Broadband Signal Representations” section, conventional approaches to processing broadband signals have some shortcomings. Arguably, this is because the incorrect signal representation is used. Specifically, the use of a TDL, with either time-domain or frequency-domain processing, mixes up the spatial and temporal dimensions. This section builds on the signal model in the “Preliminaries: Representing Broadband Signals” section, representing the broadband system using z transforms and “polynomials.” Guided by the successful use of linear algebra, i.e., EVD, in narrowband systems, this section focuses on the decomposition of para-Hermitian polynomial matrices, such as the space-time covariance matrix. That is, given a para-Hermitian polynomial matrix ${\cal{R}}{(}{z}{)} = {\cal{R}}^{\text{P}}{(}{z}{),}$ does a decomposition ${\cal{R}}{(}{z}{)} = {\cal{Q}}{(}{z}{)}\mathbf{\Lambda}{(}{z}{)}{\cal{Q}}^{\text{P}}{(}{z}{)}$ exist where $\mathbf{\Lambda}{(}{z}{)}$ is diagonal and ${\cal{Q}}(z)$ is paraunitary?
Note that the EVD can diagonalize a para-Hermitian matrix ${\bf{R}}{[}{\tau}{]}$ for only one specific lag ${\tau} = {\tau}_{0}$ and, alternatively, ${\boldsymbol{R}}{(}{z}{)}\,{⊷}\,{\bf{R}}{[}{\tau}{]}$ for one specific value ${z} = {z}_{0}$. The unitary matrices that accomplish the diagonalization at that value are unlikely to diagonalize the matrix at other values ${\tau}\ne{\tau}_{0}$ and ${z}\ne{z}_{0}{.}$ We therefore require a decomposition that diagonalizes ${R}{[}{\tau}{]}$ for all values of ${\tau}$ and $\text{R}(z)$ for all values of z within the ROC. We address the existence of such a decomposition via the analytic EVD in the “Analytic EVD” section and provide some comments on numerical algorithms in the “PEVD Algorithms” section.
The key to a more general EVD is the work by Rellich [43], who, in the context of quantum mechanics, investigated a matrix-valued function ${\bf{A}}(t)$ that is self-adjoint, i.e., ${\bf{A}}{(}{t}{)} = {\bf{A}}^{\text{H}}{(}{t}{),}$ and analytic in t on some real interval. Matrix-valued functions of this type admit a decomposition ${\bf{A}}{(}{t}{)} = {\bf{U}}{(}{t}{)}\mathbf{\Gamma}{(}{t}{)}{\bf{U}}^{\text{H}}{(}{t}{)}$ with matrix-valued functions ${\bf{U}}(t)$ and $\mathbf{\Gamma}{(}{t}{)}$ that are also analytic in t and where $\mathbf{\Gamma}{(}{t}{)}$ is diagonal and $\text{U}({t}_{0})$ is unitary for any specific value ${t} = {t}_{0}{.}$ These results were obtained through perturbation analysis [44], where for the EVD of a matrix ${\bf{A}}{(}{t}_{0}{)} = {\bf{U}}{(}{t}_{0}{)}\mathbf{\Gamma}{(}{t}_{0}{)}{\bf{U}}^{\text{H}}{(}{t}_{0}{),}$ a change of ${\bf{A}}({t}_{0})$ by some small Hermitian matrix results in only a limited perturbation of both the eigenvalues and eigenvectors. There is no such guarantee if ${\bf{A}}(t)$ is not analytic in t; even infinite differentiability is not sufficient [44].
To decompose a matrix ${\mathcal{R}}(z)$ that is analytic in the complex-valued parameter z, it suffices to investigate $\mathcal{R}(z)$ on the unit circle for ${z} = {e}^{{j}\Omega}{.}$ This is due to the uniqueness theorem for analytic functions, which guarantees that if two functions are identical on some part of their ROC—here, the unit circle, which must always be included—they must be identical across the entire ROC. Although $\Omega\in{\mathbb{R}},$ Rellich’s results do not directly apply, as they do not imply a ${2}{\pi}$ periodicity. Without such periodicity, it is not possible to reparameterize the EVD factors by replacing ${e}^{{j}\Omega}$ with z and hence produce an EVD that is analytic in z. However, it has recently been shown that Rellich’s result admits $2\pi N$-periodic eigenvalue functions, and, furthermore, ${N} = {1}$ unless the data generating $\mathcal{R}(z)$ emerge from N-fold multiplexing or block filtering [23], [24]. Analytic eigenvector functions, then, exist with the same periodicity as the eigenvalues [45]. Therefore, an analytic EVD for this N-fold multiplexed system \[{\mathcal{R}}{(}{z}^{N}{)} = {\mathcal{Q}}{(}{z}{)}\mathbf{\Lambda}{(}{z}{)}{\mathcal{Q}}^{\text{P}}{(}{z}{)} \tag{9} \] exists with analytic factors such that $\mathbf{\Lambda}{(}{z}{)}$ is diagonal. The matrix $\mathcal{Q}(z)$ contains the eigenvector functions and, for ${z} = {e}^{{j}{\Omega}_{0}},$ is unitary. For a general z, $\mathcal{Q}(z)$ is paraunitary such that ${\mathcal{Q}}{(}{z}{)}{\mathcal{Q}}^{\text{P}}{(}{z}{)} = {\mathcal{Q}}^{\text{P}}{(}{z}{)}{\mathcal{Q}}{(}{z}{)} = {\bf{I}}{.}$ Paraunitarity is an extension of the orthonormal and unitary properties of matrices from the real- and complex-valued cases to matrices that are functions in a complex variable [20]. For ease of exposition in the following, we talk of “analytic matrices $\mathcal{X}(z),$” with the understanding that $\mathcal{X}(z)$ is a matrix-valued analytic function.
In the analytic EVD of (9), the eigenvalue function is $\mathbf{\Lambda}{(}{z}{)} = {\text{diag}}\left\{{\lambda}_{1}{(}{z}{),}\ldots,{\lambda}_{M}{(}{z}{)}\right\},$ where ${\text{diag}}{(}\cdot{)}$ forms a diagonal matrix from its argument. When evaluated on the unit circle, the eigenvalues ${\lambda}_{m}{(}{e}^{{j}\Omega}{),}\ {m} = {1},\ldots,{M},$ are real valued and unique up to a permutation. If there are M distinct eigenvalues, i.e., ${\lambda}_{m}{(}{e}^{{j}\Omega}{)} = {\lambda}_{\mu}{(}{e}^{{j}\Omega}{)}$ only if ${m} = {\mu},$ then the corresponding eigenvectors ${q}_{m}(z)$ in ${\mathcal{Q}}{(}{z}{)} = {[}{q}_{1}{(}{z}{),}\ldots,{q}_{M}{(}{z}{)]}$ are unique up to an arbitrary all-pass function; i.e., ${q}_{m}^{'}{(}{z}{)} = {\psi}_{m}{(}{z}{)}{q}_{m}{(}{z}{)}$ is also a valid analytic eigenvector of ${\mathcal{R}}{(}{z}^{N}{),}$ where ${\psi}_{m}(z)$ is all-pass.
As an example, consider a system from [23], \begin{align*}{\mathcal{R}}{(}{z}{)} = \left[{\begin{array}{cc}{\frac{{1}{-}{j}}{2}{z} + {3} + \frac{{1} + {j}}{2}{z}^{{-}{1}}}&{\frac{{1} + {j}}{2}{z}^{2} + \frac{{1}{-}{j}}{2}}\\{\frac{{1} + {j}}{2} + \frac{{1}{-}{j}}{2}{z}^{{-}{2}}}&{\frac{{1}{-}{j}}{2}{z} + {3} + \frac{{1} + {j}}{2}{z}^{{-}{1}}}\end{array}}\right], \tag{10} \end{align*} which is constructed from eigenvalues $\mathbf{\Lambda}{(}{z}{)} = {\text{diag}}{\{}{z} + {3} + {z}^{{-}{1}},\,{jz} + {3}{-}{j}{z}^{{-}{1}}{\}}$ and their corresponding eigenvectors ${q}_{1,2}{(}{z}{)} = {[}{1},\pm{z}^{{-}{1}}{]}^{T}{/}\sqrt{2}{.}$ The evaluation of the eigenvalues on the unit circle, ${\lambda}_{1,2}({e}^{{j}\Omega}),$ is presented in Figure 3(a), and the Hermitian angle of the eigenvectors, ${\varphi}_{m}{(}{e}^{{j}\Omega}{)} = \arccos{(}\mid{q}_{1}^{\text{H}}{(}{e}^{{j}{0}}{)}{q}_{m}{(}{e}^{{j}\Omega}{)}\mid{),}$ is drawn in Figure 3(b). Note that due to the analyticity of the EVD factors, all these quantities evolve smoothly with the normalized angular frequency $\Omega$. An all-pass modification of the eigenvectors might be as simple as imposing a delay; while this will not affect ${\varphi}_{m}({e}^{{j}\Omega}),$ it can increase the support of ${\text{Q}}[n]\,{⊶}\,{\mathcal{Q}}{(}{z}{)}{.}$
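The example can also be checked numerically. The following sketch reconstructs $\mathcal{R}(e^{j\Omega})$ from the eigenvalues and eigenvectors quoted above and confirms that a pointwise numerical EVD returns the same real values, although only in sorted order, i.e., up to a frequency-dependent permutation (a point we return to when discussing spectral majorization):

```python
import numpy as np

omega = np.linspace(0, 2 * np.pi, 256, endpoint=False)
z = np.exp(1j * omega)

# Analytic eigenvalues on the unit circle: lambda_1(z) = z + 3 + z^{-1} and
# lambda_2(z) = jz + 3 - jz^{-1} evaluate to real, smooth functions of Omega.
lam1 = (z + 3 + 1 / z).real
lam2 = (1j * z + 3 - 1j / z).real

for k, zk in enumerate(z):
    # Eigenvectors q_{1,2}(z) = [1, +/- z^{-1}]^T / sqrt(2); on the unit
    # circle, the paraunitary Q(z) evaluates to a unitary matrix.
    Q = np.array([[1, 1], [1 / zk, -1 / zk]]) / np.sqrt(2)
    Rk = Q @ np.diag([lam1[k], lam2[k]]) @ Q.conj().T   # R = Q Lambda Q^H

    # A pointwise EVD recovers the same two values, but sorted, i.e., only
    # up to a frequency-dependent permutation of the analytic eigenvalues.
    assert np.allclose(np.linalg.eigvalsh(Rk), np.sort([lam1[k], lam2[k]]))
```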
Figure 3. (a) Analytic eigenvalues on the unit circle and (b) Hermitian angles of the corresponding analytic eigenvectors, measured against a reference vector [23].
While in the preceding example, the factorization yields polynomial factors, this does not have to be the case: they could be Laurent and power series. For example, modifying the previous eigenvectors by arbitrary all-pass functions does not invalidate the decomposition, but it may change the order of ${q}_{m}(z)$ to $\infty,$ i.e., a power series. More generally, Laurent polynomial matrices $\mathcal{R}(z)$ are likely to lead to algebraic and even transcendental functions as EVD factors [23], [24]. Nonetheless, recall from the “Mathematical Background” section that analyticity implies absolute convergence in the time domain. Therefore, the best least-squares approximation is achieved by truncation. Further, as the approximation order increases, the approximation error can be made arbitrarily small.
The components of the analytic PEVD have some useful properties. The matrix of eigenvectors $\text{Q}[n]$ can be viewed as a lossless filter bank. Clearly, it transforms the input time series into another set of time series. However, being paraunitary, the energy in the output signals is the same as that of the input signals. Furthermore, the output signals are strongly decorrelated; that is, any two signals have zero cross-correlation coefficients at all lags. Significantly, the signals are not temporally whitened; i.e., they do not have an impulse as their autocorrelation function. Note that the order of a z transform is connected to the time-domain support of the corresponding time series. Thus, the computational cost of implementing such a filter bank is related to the order of ${\mathcal{Q}}{(}{z}{)}{.}$ In general, the eigenvalues of a narrowband covariance matrix have differing magnitudes, with the presence of small values indicating approximate linear dependency among the input signals. Similarly, the eigenvalues on the diagonal of $\mathbf{\Lambda}{(}{z}{),}$ evaluated on the unit circle, can indicate linear dependence, but in a frequency-dependent manner.
The pioneering work of Rellich showed that an analytic EVD exists for a matrix function. Applying this to a space-time covariance matrix on the unit circle introduces some additional constraints but results in the existence of an analytic PEVD.
The first attempt at producing a PEVD algorithm was the second-order sequential best rotation (SBR2) algorithm [21], which was motivated by Jacobi’s method for numerically computing the EVD [2]. The PEVD of $\mathcal{R}(z),$ i.e., (7), as given by (9) for ${N} = {1}$ and established in the “Analytic EVD” section, can be approximated using an iterative algorithm and is expressed as [21], [46] \[{\mathcal{R}}{(}{z}{)}\approx{\mathcal{U}}{(}{z}{)}\mathbf{\Lambda}{(}{z}{)}{\mathcal{U}}^{\text{P}}{(}{z}{),} \tag{11} \] where the columns of the polynomial matrix ${\mathcal{U}}{(}{z}{)}\in{\mathbb{C}}^{{M}\times{M}}$ correspond to the eigenvectors, with their associated eigenvalues on the diagonal polynomial matrix $\mathbf{\Lambda}{(}{z}{)}\in{\mathbb{C}}^{{M}\times{M}}{.}$ The Laurent polynomial matrix factors $\mathcal{U}(z)$ and $\mathbf{\Lambda}{(}{z}{)}$ are necessarily analytic, being of finite order. However, under certain circumstances, the theoretical factors in the PEVD might not be analytic [23], [24], in which case the Laurent polynomial matrix factors $\mathcal{U}(z)$ and $\mathbf{\Lambda}{(}{z}{)}$ are only approximations of the true factors, hence the approximation in (11).
Rewriting (11) as \[\mathbf{\Lambda}{(}{z}{)}\approx{\mathcal{U}}^{\text{P}}{(}{z}{)}{\mathcal{R}}{(}{z}{)}{\mathcal{U}}{(}{z}{),} \tag{12} \] the diagonalization of $\mathcal{R}(z)$ can be achieved by generalized similarity transformations, with $\mathcal{U}(z)$ satisfying the paraunitary, or lossless, condition [20] \[{\mathcal{U}}^{\text{P}}{(}{z}{)}{\mathcal{U}}{(}{z}{)} = {\mathcal{U}}{(}{z}{)}{\mathcal{U}}^{\text{P}}{(}{z}{)} = {\bf{I}}, \tag{13} \] where I is the identity matrix. The similarity transform $\mathcal{U}(z)$ may be calculated via an iterative algorithm, such as SBR2 or sequential matrix diagonalization (SMD) [46]. Here, a sequence of elementary paraunitary transformations ${\mathcal{G}}_{i}{(}{z}{),}$ ${i} = {1},\ldots,$ is applied to $\mathcal{R}(z)$ until the polynomial matrix becomes approximately diagonal; i.e., starting from ${\tilde{\mathcal{R}}}_{0}{(}{z}{)} = {\mathcal{R}}{(}{z}{),}$ the expression \[{\tilde{\mathcal{R}}}_{i}{(}{z}{)} = {\mathcal{G}}_{i}^{\text{P}}{(}{z}{)}{\tilde{\mathcal{R}}}_{{i}{-}{1}}{(}{z}{)}{\mathcal{G}}_{i}{(}{z}{)} \tag{14} \] is iterated until ${\tilde{\mathcal{R}}}_{{N}_{I}}(z)$ is approximately diagonal for some ${N}_{I}{.}$ An elementary paraunitary transformation takes the form of the product of a unitary transformation and a polynomial delay matrix, ${\text{diag}}{\{}{1},\ldots,{1},{z}^{n},{1},\ldots,{1}{\}}{.}$
Figure 4 gives the steps involved during every iteration of the SBR2 algorithm. At each iteration, the algorithm searches for the off-diagonal element with the largest magnitude across all lag planes, as marked in red in Figure 4. If the magnitude exceeds a predefined threshold, a delay polynomial matrix is applied to bring the element to the principal ${z}^{0}$-plane, as shown in Figure 4. A unitary matrix, designed to zero out two elements on the zero-lag plane, is applied to the entire polynomial matrix. Note that applying one elementary paraunitary transformation may make some previously small off-diagonal elements larger, but overall, the algorithm converges to a diagonal matrix. As observed in Figure 4, the delay step can increase the polynomial order and make it unnecessarily large. Therefore, a trimming procedure [21] is used to control the growth of the polynomial order by discarding negligibly small coefficients in the outer planes, e.g., ${z}^{{-}{4}}$ and ${z}^{4}$ in Figure 4. Furthermore, the similarity transformations in (12) affect a pair of dominant elements so that the search space can be halved due to the preservation of symmetry. The algorithm terminates when the magnitudes of all off-diagonal elements fall below the preset threshold or when a user-defined maximum number of iterations is reached.
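A minimal sketch of this search, delay, zero, and trim cycle is given below; it operates on the array of covariance slices, obtains the rotation from a 2 × 2 EVD of the zero-lag subproblem (equivalent to the Jacobi step), and omits the accumulation of the paraunitary eigenvector factor for brevity:

```python
import numpy as np

def sbr2(R, max_iter=100, tol=1e-6, trim=1e-8):
    """SBR2-style diagonalization sketch. R: (2K+1, M, M) array of
    space-time covariance slices for lags -K ... K; returns the
    approximately diagonalized slices."""
    M = R.shape[1]
    off_mask = ~np.eye(M, dtype=bool)
    for _ in range(max_iter):
        K = (R.shape[0] - 1) // 2
        # 1) search: largest off-diagonal element over all lag planes
        mags = np.abs(R) * off_mask
        t, i, j = np.unravel_index(np.argmax(mags), mags.shape)
        tau = t - K
        if mags[t, i, j] < tol:
            break
        # 2) delay: an elementary paraunitary delay matrix applied to
        #    column j (and its para-Hermitian to row j) moves the dominant
        #    element onto the zero-lag z^0-plane
        if tau != 0:
            R = np.pad(R, ((abs(tau), abs(tau)), (0, 0), (0, 0)))
            K += abs(tau)
            R[:, :, j] = np.roll(R[:, :, j], -tau, axis=0)
            R[:, j, :] = np.roll(R[:, j, :], tau, axis=0)
        # 3) zero: a unitary rotation diagonalizing the Hermitian 2x2
        #    zero-lag subproblem is applied across all lags
        _, V = np.linalg.eigh(R[K][np.ix_([i, j], [i, j])])
        Q = np.eye(M, dtype=complex)
        Q[np.ix_([i, j], [i, j])] = V
        R = np.einsum('ji,tjk,kl->til', Q.conj(), R, Q)
        # 4) trim: discard outer lag planes carrying negligible energy
        while R.shape[0] > 2 and max(np.abs(R[0]).max(),
                                     np.abs(R[-1]).max()) < trim:
            R = R[1:-1]
    return R
```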
Figure 4. Each PEVD iteration involves the following four steps. (a) The polynomial matrix is first searched for the maximum off-diagonal element across all lags. (b) The second delay step brings the largest element to the principal ${z}^{0}$-plane. (c) The third is the zeroing step, which transfers energy from the off-diagonal elements to the diagonal. (d) The final trimming step discards negligibly small coefficients in the outer matrix slices.
This has led to a family of time-domain algorithms based on the SBR2 [21] and SMD [46]. The computational complexity of these numerical algorithms is at least $\mathcal{O}({M}^{3}{T})$ due to matrix multiplication applied to every lag [47]. The additional complexity incurred over the EVD approach is essential for the temporal decoupling of broadband signals. Furthermore, some promising efforts using parallelizable hardware [48] and numerical tricks [49] have been proposed, and the decomposition can be computed in a fraction of a second. These algorithms are also guaranteed to produce polynomial paraunitary eigenvectors but tend to generate spectrally majorized eigenvalues, which may not be analytic. Two functions ${f}_{1}(z)$ and ${f}_{2}(z)$ are said to be spectrally majorized if, at every point on the unit circle, one function’s value is greater than or equal to the other’s. Figure 5 presents the results of using the SMD algorithm to process the matrix used to generate Figure 3. In Figure 5, the eigenvalue in blue is always greater than the one in red. In contrast, the (analytic) eigenvalues in Figure 3 intersect and are not spectrally majorized. As shown in Figure 5(b), forcing a spectrally majorized solution for the eigenvalues leads to the eigenvectors having discontinuities that are difficult to approximate with polynomials. To get an accurate result, high-order polynomials are required. This, in turn, has consequences for the implementation cost of any signal processing based on the output of these algorithms. Note, however, that spectral majorization can be advantageous in some situations; see the “PEVD-Based Subband Coding” section.
Figure 5. The results of using SMD to decompose the matrix in Figure 3: (a) eigenvalues on the unit circle and (b) Hermitian angles of the corresponding eigenvectors, measured against a reference vector [23].
Unlike in the case of the SBR2 algorithm [21], there is no proof that the SMD algorithm will always produce spectrally majorized eigenvalues, although evidence from the use of this algorithm strongly supports this conjecture. Given the issues that spectral majorization produces in terms of exploiting $\mathcal{Q}(z)$ as a filter bank and for identifying subspaces, recent work has been directed at designing an algorithm that can produce a PEVD whose components are guaranteed to be analytic. One such approach [50] involves working in the frequency domain and taking steps to ensure that the spectral coherence is not lost.
A number of algorithms have been designed for decompositions of fixed order and without proven convergence. This includes the approximate EVD (AEVD) algorithm [51], which applies a fixed number of elementary paraunitary operations in the time domain in an attempt to diagonalize $\mathcal{R}(z).$ In the DFT domain, [52] aims to extract maximally smooth eigenvalues and eigenvectors, which can target the extraction of the analytic solution.
Approximating analytic functions by polynomials allows the development of PEVD algorithms based on an elementary paraunitary operator. The resulting algorithms are guaranteed to produce polynomial paraunitary eigenvectors but tend to generate spectrally majorized eigenvalues. This property has benefits as well as drawbacks.
This section highlights three application cases that demonstrate key examples where PEVD-based approaches can offer advantages over state-of-the-art processing. In the “PEVD-Based Adaptive Beamforming” section, we demonstrate how, for adaptive beamforming, the computational complexity is decoupled from the TDL length that otherwise determines the cost of a broadband adaptive beamformer. The “PEVD-Based Subband Coding” section shows how, in subband coding, the PEVD can generate a system with optimized coding gain and helps to formulate optimum compaction filter banks that previously could be stated only for the two-channel case. Finally, the “Polynomial Subspace Speech Enhancement” section addresses how the preservation of spectral coherence can provide perceptually superior results over DFT-based speech enhancement algorithms.
To explore PEVD-based beamforming, we first recall some aspects of narrowband beamforming before defining a linearly constrained minimum variance (LCMV) beamformer using both TDL- and PEVD-based formulations. We work with an arbitrary geometry of M sensors but, for simplicity, assume free-space propagation and sources sufficiently far from the array (in the far field) that any loss in amplitude across its sensors can be neglected.
Spatial filtering uses the fact that wavefronts arriving from different sources have a different delay profile when arriving at the sensors. If there are L spatially separated sources, then for the $\ell$th source, ${\ell} = {1},\ldots,{L},$ let this delay profile be ${\{}{\tau}_{{\ell},{1}},\ldots,{\tau}_{{\ell},{M}}{\},}$ where ${\tau}_{{\ell},{m}}$ is the delay at the mth sensor with respect to some common reference point. We further define a vector of transfer functions ${a}_{\ell}{(}{z}{)} = {[}{d}_{{\tau}_{{\ell},{1}}}{(}{z}{),}\ldots,{d}_{{\tau}_{{\ell},{M}}}{(}{z}{)]}^{T}$ containing fractional delay filters, where ${d}_{\tau}{[}{n}{]}\,{⊶}\,{d}_{\tau}{(}{z}{)}$ implements a delay by ${\tau}\in{\mathbb{R}}$ samples [18]. We refer to ${\mathbf{a}}_{\ell}(z)$ as a broadband steering vector since, when evaluated at a fixed frequency ${\Omega}_{\ell},$ the $\ell$th source can be regarded as a narrowband signal with center frequency ${\Omega}_{\ell},$ in which case this vector of functions reduces to the well-known steering vector ${a}_{\ell}{(}{z}{)}\mid{}_{{z} = {e}^{\text{j}{\Omega}_{\ell}}} = {a}_{\ell}{(}{e}^{\text{j}{\Omega}_{\ell}}{)}\in{\mathbb{C}}^{M}{.}$ The latter contains the phase shifts that each sensor experiences with respect to the $\ell$th source. If at least two sensors satisfy the spatial sampling theorem, then, for a particular frequency ${\Omega}_{\ell} = {\Omega}_{0},$ this steering vector is unique with respect to the DOA of the $\ell$th source.
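A broadband steering vector can be realized, for instance, with windowed-sinc fractional-delay filters [18]. The sketch below is illustrative (the helper names and the bulk delay of order/2 samples, which makes the noncausal sinc causal, are design choices, not prescribed by the original):

```python
import numpy as np

def frac_delay(tau, order=32):
    """Windowed-sinc FIR approximation of d_tau(z); the realized delay is
    order/2 + tau samples, the bulk term rendering the filter causal."""
    n = np.arange(order + 1)
    return np.sinc(n - order / 2 - tau) * np.hamming(order + 1)

def broadband_steering(delays, order=32):
    """Stack one fractional-delay impulse response per sensor: the
    coefficients of a_l(z) as an M x (order + 1) array."""
    return np.stack([frac_delay(tau, order) for tau in delays])

# Example: relative delays (in samples) of one source at M = 4 sensors.
a_l = broadband_steering([0.0, 0.37, 0.74, 1.11])
```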
We want to process the array data ${\bf{x}}{[}{n}{]}\in{\mathbb{C}}^{M}$ by a vector of filters ${\bf{w}}{[}{n}{]}\,{⊶}\,{\boldsymbol{w}}{(}{z}{),}$ with ${w}^{\text{P}}{(}{z}{)} = {[}{w}_{1}{(}{z}{),}\ldots,{w}_{M}{(}{z}{)],}$ where ${w}_{m}{(}{z}{)}\,{⊷}\,{w}_{m}{[}{n}{]}$ is the filter processing the mth sensor signal, such that the array output $y[n]$ is the sum of the filtering operations, ${y}{[}{n}{]} = {\Sigma}_{\nu}{\bf{w}}^{\text{H}}{[}{-}{\nu}{]}{\bf{x}}{[}{n}{-}{\nu}{]} = {\Sigma}_{{m},{\nu}}{w}_{m}{[}{\nu}{]}{x}_{m}{[}{n}{-}{\nu}{].}$ The definition of the filter vector ${\bf{w}}[n],$ with its time-reversed and conjugated weights, may seem cumbersome, but it follows similar conventions for complex-valued data [53] and will later simplify the z-transform notation.
In the narrowband case, the delay filters can be replaced by complex coefficients in a vector ${\bf{w}}^{\text{H}} = {[}{w}_{1},\ldots,{w}_{M}{]}\in{\mathbb{C}}^{M}$ that implement phase shifts. To generate a different gain ${f}_{\ell},{\ell} = {1},\ldots,{L}$ with respect to each of the L sources, the beamformer defined by w must satisfy the constraint equation \begin{align*}\mathop{\underbrace{\left[{\begin{array}{c}{{a}_{1}^{\text{H}}({e}^{\text{j}{\Omega}_{1}})}\\{\vdots}\\{{a}_{L}^{\text{H}}({e}^{\text{j}{\Omega}_{L}})}\end{array}}\right]}}\limits_{C}{\bf{w}} = \mathop{\underbrace{\left[{\begin{array}{c}{{f}_{1}}\\{\vdots}\\{{f}_{L}}\end{array}}\right]}}\limits_{\bf{f}}, \tag{15} \end{align*} where ${\bf{C}}\in{\mathbb{C}}^{{L}\times{M}}$ and ${\bf{f}}\in{\mathbb{C}}^{L}$ are the constraint matrix and associated gain vector for the L constraints. In the presence of spatially white noise, the minimum mean-square error (MMSE) solution is the quiescent beamformer ${\bf{w}}_{\text{q}} = {\bf{C}}^{\dagger}{\bf{f}}$ [53], where ${\bf{C}}^{\dagger}$ is the pseudo-inverse of $\text{C}$. If the noise is spatially correlated, then the LCMV formulation \[\mathop{\min}\limits_{\bf{w}}{\mathbb{E}} \left\{{\mid{y}{[}{n}{]}\mid^{2}}\right\}\quad\quad{\text{s.t.}}\quad{\bf{Cw}} = {\bf{f}} \tag{16} \] provides the MMSE solution, now constrained by (15). Solutions to (16) include, for example, the Capon beamformer, ${\bf{w}}_{\text{opt}} = {(}{\bf{R}}{[}{0}{])}^{{-}{1}}{\bf{C}}^{\text{H}}{[}{\bf{C}}{(}{\bf{R}}{[}{0}{])}^{{-}{1}}{\bf{C}}^{\text{H}}{]}^{{-}{1}}{\bf{f}},$ and the generalized sidelobe canceller (GSC). For the GSC, a “quiescent beamformer” ${\bf{w}}_{\text{q}}$ implements the constraints in ${\bf{C}},$ and a “blocking matrix” B is constructed such that ${\bf{CB}} = {\mathbf{0}}{.}$ When operating on the array data ${\bf{x}}[n],$ the blocking matrix output is free of any desired signal components protected by the constraints. All that remains now is to suppress any undesired signal components in the quiescent beamformer output that correlate with the blocking matrix output. This unconstrained optimization problem for the vector ${\bf{w}}_{\text{a}}$ in Figure 6(a) can be addressed by adaptive filtering algorithms via a noise cancellation architecture [53]. The overall response of the adapted GSC is ${\bf{w}} = {\bf{w}}_{\text{q}}{-}{\bf{Bw}}_{\text{a}}{.}$
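In the narrowband case, these quantities amount to a few lines of linear algebra. The sketch below (with a random constraint matrix and noise covariance standing in for actual array quantities) computes the quiescent and Capon/LCMV solutions and a blocking matrix via the SVD:

```python
import numpy as np

rng = np.random.default_rng(2)
M, L = 6, 2

# Placeholder constraint matrix C (rows would be steering vectors a_l^H)
# and gains f: protect the first source with unit gain, null the second.
C = rng.standard_normal((L, M)) + 1j * rng.standard_normal((L, M))
f = np.array([1.0, 0.0])

# Quiescent beamformer (MMSE under spatially white noise): w_q = C^dagger f.
w_q = np.linalg.pinv(C) @ f

# Capon/LCMV beamformer for spatially correlated noise with covariance R0.
A = rng.standard_normal((M, M))
R0 = np.eye(M) + 0.5 * A @ A.T        # an arbitrary positive definite R[0]
Ri = np.linalg.inv(R0)
w_opt = Ri @ C.conj().T @ np.linalg.inv(C @ Ri @ C.conj().T) @ f

# GSC ingredient: a blocking matrix B whose columns span the null space of C.
_, _, Vh = np.linalg.svd(C)
B = Vh[L:].conj().T                   # M x (M - L), satisfies C B = 0
print(np.allclose(C @ B, 0), np.allclose(C @ w_opt, f))
```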
Figure 6. The GSC for (a) the narrowband (black quantities in boxes) and PEVD-based cases (blue quantities in boxes) and (b) the TDL-based case.
In the broadband case, each sensor is followed by a TDL of length $T$ to implement a finite-impulse response filter that can resolve explicit time delays [54]. This leads to the concatenated data vector $\boldsymbol{\chi}[n] = [\mathbf{x}^{\text{H}}[n],\ldots,\mathbf{x}^{\text{H}}[n-T+1]]^{\text{H}}\in\mathbb{C}^{MT}$ presented in the “TDL Processing” section. The weight vector $\mathbf{v}\in\mathbb{C}^{MT}$ now performs a linear combination across this spatiotemporal window such that the beamformer output becomes $y[n] = \mathbf{v}^{\text{H}}\boldsymbol{\chi}[n]$. Analogous to (15), a constraint equation $\mathbf{C}_{\text{b}}\mathbf{v} = \mathbf{f}_{\text{b}}$ defines the frequency responses in a number of directions. The constraint formulation for a linear array with a look direction toward broadside is as straightforward as in the narrowband case [55]. For linear arrays with off-broadside constraints and for arbitrary arrays, the formulation of constraints becomes trickier and can be based on stacked narrowband constraints across a number of DFT bins, akin to the single-frequency formulation that leads to (15). Since it may not be clear how many such constraints should be stacked, robust approaches start with a large number, which is then trimmed to a reduced set of linearly independent constraints by using, e.g., a QR decomposition [56]. Overall, with respect to the narrowband case, the dimensions of the constraint matrix and constraining vector increase approximately $T$-fold such that $\mathbf{C}_{\text{b}}\in\mathbb{C}^{TL\times TM}$ and $\mathbf{f}_{\text{b}}\in\mathbb{C}^{TL}$.
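The concatenated TDL vector is a pure restacking of delayed array snapshots. A minimal sketch, assuming the sensor signals are held row-wise in an array X:

```python
import numpy as np

def tdl_vector(X, n, T):
    """chi[n] = [x^H[n], ..., x^H[n-T+1]]^H stacked into a length-MT vector;
    assumes n >= T - 1 so that all required snapshots exist."""
    return np.concatenate([X[:, n - t] for t in range(T)])
```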
For the broadband GSC [57], a quiescent beamformer $\mathbf{v}_{\text{q}} = \mathbf{C}_{\text{b}}^{\dagger}\mathbf{f}_{\text{b}}\in\mathbb{C}^{TM}$ will generate an output that still contains any structured interference that is not addressed by the constraint equation. A signal vector correlated with this remaining interference is produced by the blocking matrix $\mathbf{B}_{\text{b}}\in\mathbb{C}^{TM\times T(M-L)}$, whose columns, akin to the narrowband case, must span the null-space of $\mathbf{C}_{\text{b}}$ such that $\mathbf{C}_{\text{b}}\mathbf{B}_{\text{b}} = \mathbf{0}$. Its output is then linearly combined by an adaptive filter $\mathbf{v}_{\text{a}}\in\mathbb{C}^{T(M-L)}$ such that the overall beamformer output in Figure 6(b) is minimized in the MSE sense. Note that the TDL length determines the dimensions of all GSC components, with the overall adapted response of the beamformer, with respect to the input $\mathbf{x}[n]$ extended to the TDL representation in $\boldsymbol{\chi}[n]$, being $\mathbf{v} = \mathbf{v}_{\text{q}} - \mathbf{B}_{\text{b}}\mathbf{v}_{\text{a}}$.
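A blocking matrix with the required property can be obtained numerically from the right singular vectors of the constraint matrix that span its null space. A minimal sketch:

```python
import numpy as np

def blocking_matrix(Cb, tol=1e-10):
    """Return B_b whose columns span the null space of Cb, so Cb @ B_b = 0;
    for a full-row-rank Cb of size (TL, TM), this yields T(M-L) columns."""
    _, s, Vh = np.linalg.svd(Cb)
    rank = int(np.sum(s > tol * s[0]))
    return Vh[rank:].conj().T
```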
In the PEVD-based approach, we replace narrowband quantities in the narrowband formulation by their polynomial equivalents to address the broadband case. This includes substituting the Hermitian transpose $\{\cdot\}^{\text{H}}$ by the para-Hermitian transposition $\{\cdot\}^{\text{P}}$. Thus, the constraint equation becomes \begin{align*}\underbrace{\left[\begin{array}{c}\mathbf{a}_{1}^{\text{P}}(z)\\ \vdots\\ \mathbf{a}_{L}^{\text{P}}(z)\end{array}\right]}_{\mathcal{C}(z)}\mathbf{w}(z) = \underbrace{\left[\begin{array}{c}f_{1}(z)\\ \vdots\\ f_{L}(z)\end{array}\right]}_{\mathbf{f}(z)}. \tag{17}\end{align*}
The constraint matrix $\mathcal{C}(z)$ is therefore made up of broadband steering vectors, and the gain vector $\mathbf{f}(z)$ contains the transfer functions $f_{\ell}(z)$, $\ell = 1,\ldots,L$, that should be imposed on the $L$ sources at the beamformer output. Both quantities are of the same dimensions as in the narrowband case but are now functions of the complex variable $z$. Writing the beamformer output as $y[n] = \sum_{\nu}\mathbf{w}^{\text{H}}[-\nu]\mathbf{x}[n-\nu]$ allows the broadband LCMV problem to be formulated as [25] \[\min_{\mathbf{w}(z)}\oint_{\vert z\vert = 1}\mathbf{w}^{\text{P}}(z)\mathcal{R}(z)\mathbf{w}(z)\frac{\text{d}z}{z}\quad\quad\text{s.t.}\;\;\mathcal{C}(z)\mathbf{w}(z) = \mathbf{f}(z), \tag{18}\] where $\mathcal{R}(z)$ is the CSD matrix of $\mathbf{x}[n]$. The evaluation of (18) at a single frequency $\Omega_{0}$ leads back to the narrowband formulation via the substitution $z = e^{\text{j}\Omega_{0}}$.
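Since every entry of $\mathcal{C}(z)$, $\mathbf{w}(z)$, and $\mathbf{f}(z)$ is a polynomial in $z^{-1}$, a constraint such as $\mathcal{C}(z)\mathbf{w}(z) = \mathbf{f}(z)$ can be checked in the coefficient domain, where polynomial matrix products reduce to convolutions along the lag axis. A small helper, under the assumption that polynomial matrices are stored as three-dimensional coefficient arrays:

```python
import numpy as np

def polymat_mult(A, B):
    """Product of polynomial matrices stored as (rows, cols, order+1)
    coefficient arrays; entry (i, j) is a sum of 1-D convolutions."""
    p, r, la = A.shape
    r2, q, lb = B.shape
    assert r == r2, "inner dimensions must match"
    C = np.zeros((p, q, la + lb - 1), dtype=complex)
    for i in range(p):
        for j in range(q):
            for k in range(r):
                C[i, j] += np.convolve(A[i, k], B[k, j])
    return C
```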
The solution to the broadband LCMV problem can be found as the equivalent of the Capon beamformer, $\mathbf{w}_{\text{opt}}(z) = \mathcal{R}^{-1}(z)\mathcal{C}^{\text{P}}(z)\{\mathcal{C}(z)\mathcal{R}^{-1}(z)\mathcal{C}^{\text{P}}(z)\}^{-1}\mathbf{f}(z)$, which is a direct extension of the narrowband formulation. To access this solution, the inversion of the para-Hermitian matrices $\mathcal{R}(z)$ and, subsequently, $\mathcal{C}(z)\mathcal{R}^{-1}(z)\mathcal{C}^{\text{P}}(z)$ can be accomplished via PEVDs [58]. Once factorized, the resulting paraunitary matrices are straightforward to invert, and it remains to invert the individual eigenvalues; for this, recall the comment on analytic functions as divisors in the “Mathematical Background” section. Alternatively, to avoid the nested matrix inversions of the Capon beamformer and to exploit iterative schemes for their general numerical robustness, an iterative unconstrained optimization can be performed via a broadband PEVD-based GSC, whereby, with respect to Figure 6(a), the quiescent beamformer is $\mathbf{w}_{\text{q}}(z) = \mathcal{C}^{\text{P}}(z)\{\mathcal{C}(z)\mathcal{C}^{\text{P}}(z)\}^{-1}\mathbf{f}(z)$. The pseudo-inverse of a polynomial matrix for the quiescent solution can again be obtained via a PEVD of the para-Hermitian term $\mathcal{C}(z)\mathcal{C}^{\text{P}}(z)$ [58]. Furthermore, its subspace decomposition also reveals the null-space of $\mathcal{C}(z)$ that can be used to define the columns of the blocking matrix $\mathcal{B}(z)$ such that $\mathcal{C}(z)\mathcal{B}(z) = \mathbf{0}$. It remains only to operate a vector $\mathbf{w}_{\text{a}}(z)$ of $(M-L)$ adaptive filters on the output of the blocking matrix to complete the optimization of this PEVD-based GSC. Note that the overall response of the beamformer is $\mathbf{w}(z) = \mathbf{w}_{\text{q}}(z) - \mathcal{B}(z)\mathbf{w}_{\text{a}}(z)$.
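With such coefficient-domain helpers, the overall PEVD-based GSC response can be assembled directly. The sketch below reuses polymat_mult from above and assumes wq, B, and wa are coefficient arrays of the quiescent beamformer, blocking matrix, and adaptive filters, respectively; it is illustrative only:

```python
import numpy as np

def polymat_sub(A, B):
    """Difference of polynomial matrices of possibly different orders,
    zero-padding the shorter one along the lag axis."""
    L = max(A.shape[-1], B.shape[-1])
    C = np.zeros(A.shape[:-1] + (L,), dtype=complex)
    C[..., :A.shape[-1]] += A
    C[..., :B.shape[-1]] -= B
    return C

# overall beamformer response w(z) = w_q(z) - B(z) w_a(z), e.g.,
# w = polymat_sub(wq, polymat_mult(B, wa))
```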
Using polynomial matrix notations and the PEVD, narrowband approaches, such as the Capon beamformer and the GSC, can be directly extended to the broadband case.
Compared to the narrowband GSC in Figure 6(a), all quantities have retained their dimensions but are now functions of z. It now remains to set the polynomial orders of the different GSC components for an implementation. The quiescent vector ${w}_{\text{q}}(z)$ depends on the constraint formulation, and its order ${J}_{1}$ determines the accuracy of the fractional delay filters. The order ${J}_{2}$ of the blocking matrix $\mathcal{B}(z)$ needs to be sufficiently high such that no source signal components covered by the constraint equation leak into the adaptive part ${w}_{\text{a}}(z).$ The order ${J}_{3}$ of the latter has to be sufficient to minimize the power of the output, $y[n].$ Thus, unlike in the TDL-based broadband beamformer case, the orders of the components are somewhat decoupled.
If the optimization of the adaptive part is addressed by inexpensive least-mean-squares type algorithms, the computational cost in both the TDL- and PEVD-based approaches is governed by the blocking matrix. In the TDL-based case, it requires ${T}^{2}{(}{M}^{2}{-}{ML}{)}$ multiplications and additions, while the PEVD-based blocking matrix expends only ${J}_{2}{(}{M}^{2}{-}{ML}{)}$ such operations. With typically ${J}_{2}\approx{T},$ the PEVD-based realization is less expensive by a factor of approximately the length of the TDL, T.
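Under these cost models, the dominant blocking-matrix costs and their ratio can be tallied directly. The following back-of-envelope sketch uses the parameters of the design example that follows; exact MAC counts for the full GSCs depend on implementation details not spelled out here:

```python
M, L, T, J2 = 8, 1, 175, 168                  # parameters of the design example below
tdl_cost = T**2 * (M**2 - M * L)              # TDL blocking matrix, operations per sample
pevd_cost = J2 * (M**2 - M * L)               # PEVD blocking matrix, operations per sample
print(tdl_cost, pevd_cost, tdl_cost / pevd_cost)   # ratio ~182, i.e., roughly T
```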
A linear array with $M = 8$ elements spaced by half the wavelength of the highest-frequency component has a look direction toward $\vartheta = 30^{\circ}$, which is protected by a constraint. Three “unknown” interferers with directions $\vartheta = \{-40^{\circ}, -10^{\circ}, 80^{\circ}\}$ are active over a frequency range of $0.2\leq\Omega/\pi\leq 0.9$ at a −20-dB signal-to-interference-plus-noise ratio and need to be adaptively suppressed. The data are further corrupted by spatially and temporally white additive noise 50 dB below the signal levels of the interferers. A TDL-based GSC operates with a TDL length of $T = 175$. For a PEVD-based GSC, the adaptive filter uses the same temporal dimension, $J_{3} = T$, but to match the MSE performance of the TDL-based version, a length of $J_{1} = 51$ for the fractional delay filters in the quiescent beamformer and a temporal dimension of $J_{2} = 168$ for the blocking matrix suffice. The adaptive filter is adjusted by a normalized least-mean-squares algorithm [53]. Note that $J_{2} < T$. Overall, per iteration, the PEVD-based GSC takes 12.3 kMACs, while the TDL-based GSC requires 3.46 MMACs, which is indeed more than a factor of $T$ higher.
To evaluate the beamformer performance, we determine the gain response, or directivity pattern, of the beamformer by probing the adapted overall response with a broadband steering vector $\mathbf{a}_{i}(z)$, swept across a set of angles $\{\varphi_{i}\}$ with corresponding delay profiles. For the directivity pattern, the angle-dependent transfer function $\mathcal{G}(z,\varphi_{i}) = \mathbf{w}^{\text{P}}(z)\mathbf{a}_{i}(z)$ can be evaluated on the unit circle. For the PEVD-based GSC, this directivity pattern is displayed in Figure 7(a); the response (not displayed) for the TDL-based GSC is very similar. A difference can, however, be noted in the look direction, which, in the case of the TDL-based GSC, is protected by a number of point constraints along the frequency axis, as highlighted in Figure 7(b). The gain response satisfies these point constraints but shows significant deviations from the ideal flat response between the constrained frequencies. In contrast, the PEVD-based beamformer is based on a single broadband constraint equation and shows a significantly lower deviation from the desired look direction gain. This is due to the formulation in the time domain, which preserves spectral coherence. One downside is that the gain response breaks down close to $\Omega = \pi$, due to the imperfections inherent in fractional delay filters operating near half the sampling rate [18].
Figure 7. The (a) directivity pattern for the adapted PEVD-based GSC and (b) gain response in the look direction $(\varphi = 30^{\circ})$ for the PEVD- and TDL-based GSC.
The PEVD-based GSC can implement the constraint equation more easily and precisely than the TDL-based version and possesses a significantly lower complexity when addressing nontrivial constraints.
Although this article addresses techniques for array signals, in many circumstances, multichannel signal representations are derived from a single-channel signal by demultiplexing [20], [59], [60], [61]. Let $x[\nu]$ be such a single-channel signal. Demultiplexing by $M$ with an implicit decimation by the same factor, or serial-to-parallel conversion, is performed to obtain a data vector $\mathbf{x}[n] = [x[nM], x[nM-1],\ldots,x[nM-M+1]]^{\text{T}}$. This demultiplexed vector $\mathbf{x}[n]$ possesses the same form as the data vectors considered in the “Signal Model” section. While the number and type of samples held in $\mathbf{x}[n]$ remain unaltered from those in $x[\nu]$, the representation in $\mathbf{x}[n]$ allows clever data reduction and coding schemes through filter bank-based processing, for which we ultimately exploit the PEVD.
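Demultiplexing is a pure reindexing of the samples. A minimal sketch matching the definition above, with zeros assumed before the start of the signal:

```python
import numpy as np

def demultiplex(x, M):
    """Serial-to-parallel conversion: column n holds
    [x[nM], x[nM-1], ..., x[nM-M+1]]^T, zero-padded at the signal start."""
    x = np.asarray(x)
    N = len(x) // M                       # number of complete output vectors
    X = np.zeros((M, N), dtype=x.dtype)
    for n in range(N):
        for m in range(M):
            if n * M - m >= 0:
                X[m, n] = x[n * M - m]
    return X
```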
Generally, we want to process the data $\mathbf{x}[n]$ through a transformation such that $\mathbf{y}[n] = \sum_{\nu}\mathbf{Q}^{\text{H}}[-\nu]\mathbf{x}[n-\nu]$. Specifically, we wish this transformation to be lossless, i.e., for $\mathcal{Q}(z) \,⊷\, \mathbf{Q}[n]$ to be paraunitary, such that a perfect reconstruction via $\mathbf{x}[n] = \sum_{\nu}\mathbf{Q}[\nu]\mathbf{y}[n-\nu]$ is possible. With respect to the original unmultiplexed single-channel signal, the transformation $\mathbf{Q}^{\text{H}}$ represents the analysis filter bank, whereas the transformation $\mathbf{Q}$ implements the synthesis (reconstruction) filter bank [20]. The matrices $\mathcal{Q}^{\text{P}}(z)$ and $\mathcal{Q}(z)$ are known as the analysis and synthesis polyphase matrices, respectively, and the paraunitarity of $\mathcal{Q}(z)$ guarantees perfect reconstruction of the overall filter bank system when operating back-to-back.
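Paraunitarity, and hence losslessness, is easy to verify numerically for a toy polyphase matrix built from a unitary matrix and a single-channel delay, one of the elementary paraunitary factors used throughout this article. The construction below is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
Q0, _ = np.linalg.qr(A)                      # random unitary factor
Q = np.zeros((2, 2, 2), dtype=complex)       # Q(z) = diag(1, z^{-1}) Q0
Q[:, :, 0] = np.diag([1, 0]) @ Q0            # coefficient matrix Q[0]
Q[:, :, 1] = np.diag([0, 1]) @ Q0            # coefficient matrix Q[1]
# lag-zero paraunitarity check: sum_k Q^H[k] Q[k] = I
G0 = sum(Q[:, :, k].conj().T @ Q[:, :, k] for k in range(2))
print(np.allclose(G0, np.eye(2)))            # True: lossless filter bank
```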
The polyphase matrix $\mathcal{Q}(z)$ can be designed to implement a series of low-pass, bandpass, and high-pass filters to split the signal ${x}{[}{\nu}{]}$ into signal components with different spectral content. However, the filter bank $\mathcal{Q}(z)$ can also be signal dependent. Chief among such systems are principal component, or optimum compaction, filter banks (PCFBs), which aim to assign as much power of ${x}{[}{\nu}{]}$ into as few successive components of ${\bf{y}}[n]$ as possible. The purpose of this is to discard some components of ${\bf{y}}[n],$ thus producing a lower-dimensional representation of the data. A closely related task is subband coding, where a quantization is performed on ${\bf{y}}[n]$ rather than on ${x}{[}{\nu}{].}$ Thus, a higher bit resolution is dedicated to those subbands of ${\bf{y}}[n]$ that possess higher power. By not increasing the overall number of bits with respect to ${x}{[}{\nu}{],}$ judicious distribution of the coding effort results in an increase in the coding gain measure: the ratio between the arithmetic and geometric means of the variances of the subband signals in ${\bf{y}}[n]$ [59]. A coding gain greater than one can be exploited as an increased signal-to-quantization-noise ratio under constant word length or in terms of a reduction in the number of bits required for quantization while retaining the same quality for the quantized signals.
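The coding gain measure itself is a one-liner once the subband variances are available. A minimal sketch, computing the geometric mean in the log domain for numerical safety:

```python
import numpy as np

def coding_gain(subband_vars):
    """Ratio of the arithmetic to the geometric mean of the subband variances [59]."""
    v = np.asarray(subband_vars, dtype=float)
    return v.mean() / np.exp(np.log(v).mean())
```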
To maximize the coding gain under the constraint of the paraunitarity of $\mathcal{Q}(z)$, two necessary and sufficient conditions on $\mathbf{y}[n]$ have been identified [59]: 1) the subband signals in $\mathbf{y}[n]$ must be strongly decorrelated such that $\mathcal{R}_{y}(z)$ is diagonal, and 2) they must be spectrally majorized such that, for the elements $S_{m}(z)$ along the diagonal, on the unit circle, we have $S_{m}(e^{\text{j}\Omega})\geq S_{m+1}(e^{\text{j}\Omega})\,\forall\Omega$ and $m = 1,\ldots,(M-1)$. Due to Parseval’s theorem, this implies that the powers of the subband signals are also ordered in a descending fashion. While paraunitarity leaves the arithmetic mean of the subband variances unchanged, satisfying these conditions minimizes their geometric mean and thus maximizes the coding gain. Optimum subband coders $\mathcal{Q}(z)$ have been derived for the case of a zeroth-order filter bank, where they reduce to the Karhunen-Loève transform (KLT), and for the infinite-order filter bank case [59], [61]. Executing the PEVD (described in the “PEVD Algorithms” section) on $\mathbf{R}[\tau]$ leads to $\mathbf{R}_{y}[\tau] = \mathbf{\Lambda}[\tau]$, which directly satisfies the preceding conditions and thus provides a solution to a subband coder of finite order [27] whose theoretical evaluation otherwise eluded the research community except for the case of $M = 2$ [60].
As discussed in the “PEVD Algorithms” section, at each SBR2 algorithm iteration, the parameters of the elementary paraunitary operator are selected such that the most dominant cross-covariance term of the input space-time covariance matrix is zeroed. There are two problems with this “greedy” optimization approach: 1) cross-correlation energies spread among the subbands of weakest power can end up being ignored, which limits the extent to which spectral majorization is achieved, and 2) there is a stronger tendency to annihilate cross-correlations due to noise in powerful subbands rather than true cross terms related to weak subbands, which degrades the strong decorrelation performance. The coding gain variant of SBR2, namely, SBR2C [27], alleviates these problems because it uses a cost function based on the coding gain measure, which is equally receptive, in relative terms, to cross-correlations among any of the subbands.
Consider a signal $x[\nu]$ described by a fourth-order autoregressive model [27], [35]; its PSD $S(e^{\text{j}\Omega})$ appears in Figure 8(a). Demultiplexing by $M = 4$ produces a pseudocirculant matrix $\mathcal{R}(z)$ whose analytic eigenvalues $S_{m}(\Omega) = S(e^{\text{j}(\Omega-2\pi(m-1))/M})$, $m = 1,\ldots,M$, are $8\pi$-periodic modulated versions of this PSD [20]. For $0\leq\Omega\leq 2\pi$, they are depicted as gray-highlighted curves in Figure 8(b). Although the $8\pi$ periodicity of these functions means that $\mathcal{R}(z)$ has no analytic eigenvalues, an estimated CSD matrix $\hat{\mathcal{R}}(z)$, here based on $10^{4}$ samples of $x[\nu]$, does possess an analytic EVD due to the perturbation by the estimation error [24]. Applying the SMD [27] to $\hat{\mathcal{R}}(z)$ will generate a strongly decorrelated signal vector $\mathbf{y}[n]$ via a paraunitary operation $\mathcal{Q}(z)$. The eigenvalues $\hat{\lambda}_{m}(e^{\text{j}\Omega})$ extracted by the SMD algorithm are also shown in Figure 8(b). These closely match the folded PSD of $x[\nu]$ highlighted in gray but are spectrally majorized.
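The folded eigenvalues can be reproduced for any rational PSD by evaluating the modulation formula above on a frequency grid. The AR(4) coefficients below are our own illustrative choice (two stable pole pairs), not the model used in [27] and [35]:

```python
import numpy as np
from scipy.signal import freqz

# assumed AR(4) denominator: pole pairs at radii 0.9 and 0.8 (illustrative only)
p1 = [1, -2 * 0.9 * np.cos(np.pi / 4), 0.81]
p2 = [1, -2 * 0.8 * np.cos(3 * np.pi / 4), 0.64]
a = np.convolve(p1, p2)

M = 4
Omega = np.linspace(0, 2 * np.pi, 512)
S = []
for m in range(1, M + 1):
    w = (Omega - 2 * np.pi * (m - 1)) / M    # argument of the folded PSD
    _, H = freqz([1], a, worN=w)             # evaluate 1/A(e^{jw}) on the grid
    S.append(np.abs(H) ** 2)                 # S_m(Omega), m = 1, ..., M
```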
Figure 8. The (a) PSD of input ${x}{[}{\nu}{]}$ and (b) eigenvalues extracted by the SMD algorithm for the subband coding problem; the ${M} = {4}$-times-folded PSD of the input signal is shown in gray.
Interpreting $\mathcal{Q}(z)$ as a polyphase analysis matrix, the associated four-channel filter bank is characterized in Figure 9. The theoretically optimum infinite-order PCFB [59] is also shown. These are obtained by assigning every demultiplexed frequency component of ${x}{[}{\nu}{]}$ to one of four filters, in descending magnitude. This yields a binary mask in the Fourier domain, which would require the implementation of infinite sinc functions in the time domain. In contrast, the finite-order filters computed by the SMD algorithm, each derived from an eigenvector in $\mathcal{Q}(z)$ corresponding to the eigenvalues in Figure 8, very closely approximate the PCFB except where the input PSD is small and arguably unimportant.
Figure 9. The magnitude responses $\vert H_{m}(e^{\text{j}\Omega})\vert$, $m = 1,\ldots,4$, of the $M = 4$-channel filter bank equivalent to the polyphase analysis matrix $\mathcal{Q}(z)$, with the theoretical PCFB of infinite order shown in gray.
To demonstrate the wider benefit of the proposed subband coder design, a randomly generated ensemble of 100 moving average processes of order 14 [MA(14)] produces signals $x[\nu]$ that are demultiplexed by $M = 4$. For each ensemble probe, the space-time covariance matrix is estimated from $2^{11}$ samples of $x[\nu]$ as a basis for the subband coder design. To average the subband coding results across this ensemble, we normalize the coding gain obtained for each instance in the ensemble by the maximum coding gain of the infinite-order PCFB; the latter can be derived from each of the MA(14) processes [27]. This ensemble-averaged normalized coding gain versus the order of the polynomial matrix $\mathcal{Q}(z)$ is shown in Figure 10. The figure shows results for the KLT, the AEVD algorithm in [51], and the SBR2C and SMD. The KLT is the optimum zeroth-order subband coder. The AEVD algorithm is a fixed-order technique that aims to generate a PEVD but without proved convergence. Note that, like the AEVD algorithm, the SMD algorithm for zeroth-order systems (i.e., length-one polynomials) reduces to an ordinary EVD that is equivalent to the KLT and optimum for narrowband source signals, as shown in the figure. Both the SBR2C and SMD converge toward the optimum performance of an infinite-order PCFB as the polynomial order of $\mathcal{Q}(z)$ increases. This is indeed what would be expected since the PEVD is effectively the broadband generalization of the KLT. Due to its specific targeting of the coding gain and the resulting enhanced spectral majorization, the SBR2C outperforms the SMD here and thus provides a highly useful trade-off between polynomial order and coding gain.
Figure 10. The averaged normalized coding gain in its dependence on the length (order plus one) of $\mathcal{Q}(z)$ for an ensemble of random MA(14) processes and the case of demultiplexing with $M = 4$.
Hitherto, algorithms for $M > 2$-channel paraunitary filter banks for subband coding were suboptimal. PEVD-designed $M$-channel filter banks now closely approximate the ideal system.
Speech enhancement is important for applications involving human-to-human communications, such as hearing aids and telecommunications, and human-to-machine interactions, including robot audition, voice-controlled systems, and automatic speech recognition. These speech signals are often captured by multiple microphones, commonly found in many devices today, and provide opportunities for spatial processing. Moreover, speech signals captured by different microphones naturally exhibit temporal correlations, especially in reverberant acoustic environments. This section shows that it is advantageous to use PEVD algorithms to capture and process these spatiotemporal correlations, thus preserving spectral coherence. A more comprehensive treatment with listening examples and code is available in [28].
Consider a scenario where there is a single speaker $s[n],$ an array of microphones, and uncorrelated background noise ${v}{[}{n}{]}{.}$ The speech propagates from the source to each microphone m through the channels with acoustic impulse responses (AIRs), ${a}_{{\ell},{m}}[n],$ that are assumed to be time invariant. The AIR models the direct path propagation of the speech signal from the speaker to the microphone as well as reverberation due to multipath reflections from objects and walls in enclosed rooms. Background noise is then added to each microphone. The signal model in the “Signal Model” section, with ${\ell} = {1},$ can describe this situation. Across M microphones, the signal vector ${x}{[}{n}{]}\in{\mathbb{C}}^{M}$ is used to compute the space-time covariance matrix ${\bf{R}}_{\bf{x}}{[}{\tau}{]}$ in (6) and its z transform ${\mathcal{R}}_{x}(z)$ in (7).
Exploiting the reverberation model in [62], the early reflections in the AIR represent closely spaced distinct echoes that perceptually reinforce the direct path component and may improve speech intelligibility in certain conditions. On the other hand, the late reflections in the AIR consist of randomly distributed small-amplitude components, and the associated late reverberant signal components are commonly assumed to be mutually uncorrelated with the direct path and early signal components [28], [62]. Thus, (7) can be written as \begin{align*}\mathcal{R}_{\mathbf{x}}(z) & = \tilde{\mathbf{a}}(z)\tilde{\mathbf{a}}^{\text{P}}(z)r_{s}(z) + \mathcal{R}_{l}(z) + \mathcal{R}_{\mathbf{v}}(z) \tag{19}\\ & = \mathcal{R}_{\tilde{s}}(z) + \mathcal{R}_{\tilde{v}}(z), \tag{20}\end{align*} where $r_{s}(z) \,⊷\, r_{s}[\tau]$ and $r_{s}[\tau]$ is the autocorrelation sequence of the source. The space-time covariance matrix of the late reverberation, $\mathcal{R}_{l}(z)$, is modeled as a spatially diffuse field. This is combined with the space-time covariance of the ambient noise, $\mathcal{R}_{\mathbf{v}}(z)$, to form $\mathcal{R}_{\tilde{v}}(z)$. The channel polynomial vector is $\tilde{\mathbf{a}}(z) = [\tilde{a}_{1}(z),\ldots,\tilde{a}_{M}(z)]^{\text{T}}\in\mathbb{C}^{M}$, where $\tilde{a}_{m}(z)$ is a polynomial obtained by taking the z transform of the direct path and early reflections in the AIR from the source to the $m$th microphone, i.e., $\tilde{a}_{m}(z) = \sum_{i = 0}^{I}\tilde{a}_{m}[i]z^{-i}$, dropping $\ell$ for brevity.
The PEVD of (20) decomposes the polynomial matrix into \[\mathcal{R}_{\mathbf{x}}(z) = \left[\mathcal{U}_{\tilde{s}}(z)\,\vert\,\mathcal{U}_{\tilde{v}}(z)\right]\left[\begin{array}{cc}\mathbf{\Lambda}_{\tilde{s}}(z) & \mathbf{0}\\ \mathbf{0} & \mathbf{\Lambda}_{\tilde{v}}(z)\end{array}\right]\left[\begin{array}{c}\mathcal{U}_{\tilde{s}}^{\text{P}}(z)\\ \mathcal{U}_{\tilde{v}}^{\text{P}}(z)\end{array}\right], \tag{21}\] where $\{\cdot\}_{\tilde{s}}$ and $\{\cdot\}_{\tilde{v}}$ are associated with the signal-plus-noise (or, simply, signal) and noise-only (or, simply, noise) subspaces, respectively. Unlike some speech enhancement approaches, the proposed method does not use any noise or relative transfer function (RTF) estimation algorithms since the strong decorrelation property of the PEVD implicitly orthogonalizes the subspaces across all time lags in the range of $\tau$. Consequently, speech enhancement can be achieved by combining components in the signal subspace while nulling components residing in the noise subspace.
The paraunitary $\mathcal{U}(z)$ is a lossless filter bank. This implies that $\mathcal{U}(z)$ can only distribute spectral power among channels and cannot change the total signal and noise power over all subspaces. The eigenvector filter bank is used to process the microphone signals, using \[\mathbf{y}[n] = \sum\limits_{\nu}\mathbf{U}^{\text{H}}[-\nu]\mathbf{x}[n-\nu], \tag{22}\] where $\mathbf{U}[n] \,⊶\, \mathcal{U}(z)\in\mathbb{C}^{M\times M}$. Since the polynomial eigenvector matrix $\mathcal{U}(z)$ is constructed from a series of delay and unitary matrices, each vector $\mathbf{u}_{m}[n]$ has a filter-and-sum structure.
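In coefficient form, (22) is a correlation of the data with the time-reversed, conjugated eigenvector filters. A minimal sketch, assuming U stores the coefficient matrices U[n] as an (M, M, K) array and truncating edge effects:

```python
import numpy as np

def apply_paraunitary(U, X):
    """y[n] = sum_nu U^H[-nu] x[n - nu] = sum_k U^H[k] x[n + k], cf. (22);
    U: (M, M, K) coefficient array, X: (M, N) microphone data."""
    M, _, K = U.shape
    N = X.shape[1]
    Y = np.zeros((M, N), dtype=complex)
    for k in range(K):
        Y[:, :N - k] += U[:, :, k].conj().T @ X[:, k:]   # shifted accumulation
    return Y
```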
For a single source, the signal subspace has a dimension of one. Therefore, the enhanced signal can be extracted from the first channel of the processed outputs ${\bf{y}}{[}{n}{]}{.}$ The enhanced output ${y}_{1}[n],$ associated with the signal subspace, includes mainly speech components originally distributed over all microphones but now summed coherently. In contrast, the noise subspace is dominated by ambient noise and the late reverberation in the acoustic channels. The orthogonality between subspaces is a result of strong decorrelation, expressed as ${\cal{R}}_{y}{(}{z}{)} = \mathbf{\Lambda}{(}{z}{),}$ where ${\cal{R}}_{y}{(}{z}{)}\,{⊷}\,{\bf{R}}_{y}{[}{\tau}{]}$ is computed from ${\bf{R}}_{y}{[}{\tau}{]} = {\mathbb{E}}{\{}{\bf{y}}{[}{n}{]}{\bf{y}}^{\text{H}}{[}{n}{-}{\tau}{]\}}{.}$
In practice, assuming quasi stationarity, the speech signals are processed frame by frame such that ${\bf{R}}_{x}{[}{\tau}{]}$ in (6) can be recursively estimated. Additionally, the two-sided z transform of ${\bf{R}}_{x}{[}{\tau}{]}$ in (7) can be approximated by some truncation window W, which determines the extent of the supported temporal correlation of the speech signal. The time-domain PEVD algorithms, such as the SBR2 and SMD, are used to compute (21) because they preserve spectral coherence of the speech signals and do not introduce audible artifacts. The proposed algorithm can also cope in noise-only and reverberation-only scenarios, as explored in [28]. Experimental results are next presented to show these general principles applied for a specific case.
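The required space-time covariance estimate over a truncation window W is a set of lag-wise outer products. A minimal sketch for one frame, exploiting the symmetry $\mathbf{R}[-\tau] = \mathbf{R}^{\text{H}}[\tau]$:

```python
import numpy as np

def est_space_time_cov(X, W):
    """Estimate R[tau] = E{x[n] x^H[n - tau]} for |tau| <= W from a frame
    X of shape (M, N); returns (M, M, 2W + 1) with lag tau at index W + tau."""
    M, N = X.shape
    R = np.zeros((M, M, 2 * W + 1), dtype=complex)
    for tau in range(W + 1):                 # nonnegative lags first
        R[:, :, W + tau] = X[:, tau:] @ X[:, :N - tau].conj().T / (N - tau)
    for tau in range(1, W + 1):              # negative lags by symmetry
        R[:, :, W - tau] = R[:, :, W + tau].conj().T
    return R
```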
Anechoic speech signals, which were sampled at 16 kHz, were taken from the TIMIT corpus [63]. AIR measurements and babble noise recordings for the $M = 3$-channel “mobile” array were taken from the ACE corpus [64]. ACE lecture room 2 has a reverberation time $T_{60}$ of 1.22 s. For each Monte Carlo simulation, 50 trials were conducted. In each trial, sentences from a randomly selected speaker were concatenated to have an 8- to 10-s duration. The anechoic speech signals were then convolved with the AIRs for each microphone before being corrupted by additive noise. The SNRs ranged from −10 to 20 dB.
PEVD-based enhancement can be compared against other algorithms, such as the oracle multichannel Wiener filter (OMWF), weighted power minimum distortionless response (WPD), and two subspace approaches, multichannel subspace (MCSUB) [65] and colored subspace (COLSUB) [66], which use an EVD and a generalized EVD (GEVD), respectively. Furthermore, unlike the PEVD approach, noise estimation is required for the GEVD. The OMWF is based on the concatenation of a minimum variance distortionless response beamformer followed by a single-channel Wiener filter. The OMWF provides an ideal performance upper bound since it uses complete prior knowledge of the clean speech signal, following [67], with a filter length of 80. Practical multichannel Wiener filters, which rely on RTF and noise estimation algorithms, do not perform as well as the OMWF; comparative results can be found in [28]. The WPD is an integrated method for noise reduction and dereverberation [68]. The ground truth DOA is provided to compute the steering vector for the WPD to avoid signal direction mismatch errors. The PEVD does not use any knowledge of the speech, DOA, or array geometry.
Experiments presented here to illustrate comparative performance use PEVD parameters chosen following [28], including the threshold on the dominant off-diagonal column norm, $\delta = \sqrt{N_{1}/3}\times 10^{-2}$, where $N_{1}$ is the square of the trace norm of $\mathbf{R}_{\mathbf{x}}[0]$; the trim factor $\mu = 10^{-3}$; and $L = 500$ iterations. In all experiments, the frame size $T$ and window $W$ are set to 1,600. With this parameter selection, correlations within 100 ms, which are assumed to include the direct path and early reflection components, are captured and used by the algorithm. The source code corresponding to these experiments is available in [28].
The frequency-weighted segmental SNR (FwSegSNR) can be used to evaluate noise reduction, while the normalized signal-to-reverberant ratio (NSRR) and Bark spectral distortion (BSD) can be used to evaluate dereverberation [28]. To measure speech intelligibility and to account for processing artifacts, short-time objective intelligibility (STOI) can be used. These measures are computed for the signals before and after enhancement by using the proposed and benchmark algorithms, and the improvement $\Delta$ is reported. Positive $\Delta$ values show improvements in all measures except $\Delta\text{BSD}$, for which a negative value indicates a reduction in spectral distortions.
An illustrative example based on clean speech $s[n]$ corrupted by 5-dB babble noise in the reverberant ACE lecture room 2 is presented in Figure 11. The spectrogram of the first microphone signal ${x}_{1}[n]$ shows temporal smearing due to reverberation and the addition of babble noise. Comparing the plots for ${x}_{1}[n]$ with the processed signals ${y}_{1}[n],$ the dotted cyan boxes in Figure 11 qualitatively show the attenuation and some suppression of the babble noise and reverberation for the PEVD and COLSUB. This is supported by Table 1, which shows that the PEVD significantly improves the STOI and NSRR while coming second in the FwSegSNR and BSD, after the COLSUB. Although the COLSUB makes the most significant improvement in the FwSegSNR, the solid white boxes highlight the speech structures in $s[n],$ which are lost after the processing of ${x}_{1}[n]$ to generate ${y}_{1}[n],$ as evident between 3 and 3.3 s and 4.2 and 4.7 s in Figure 11. This has resulted in artifacts in the listening examples and the lowest improvement in the STOI. The OMWF, which uses complete knowledge of the clean speech signal, is the second best in the STOI and slightly improves other metrics, similar to the WPD, which uses the ground truth steering vector. The MCSUB offers limited improvement. Listening examples also highlight that the PEVD does not introduce audible processing artifacts into the enhanced signal [28].
Figure 11. Spectrograms, with corresponding time-domain signals, for the processing of a noisy reverberant speech example in ACE lecture room 2 and 5-dB babble noise. Dotted cyan boxes highlight noise- and reverberation-suppressed regions as a result of processing. Solid white boxes highlight regions where speech structures are lost using the COLSUB but not PEVD processing. Listening examples are available in [28]. The (a) clean speech signal $s[n],$ (b) noisy speech signal ${x}_{1}[n],$ (c) COLSUB-enhanced signal ${y}_{1}[n],$ and (d) PEVD-enhanced signal ${y}_{1}{[}{n}{]}{.}$
Table 1. The enhancement of a single reverberant speech sample in lecture room 2 and 5-dB ACE babble noise.
Results for the Monte Carlo simulation involving 50 speakers in lecture room 2, corrupted by −10- to 20-dB babble noise, are shown in Figure 12. For $\text{SNR}\leq 10\,\text{dB}$, the COLSUB outperforms the other algorithms in $\Delta\text{FwSegSNR}$ but gives the worst $\Delta\text{STOI}$. On the other hand, the OMWF, designed to minimize speech distortion by using knowledge of the clean speech, performs best in $\Delta\text{STOI}$ but not in $\Delta\text{FwSegSNR}$. This reflects the fact that, up to some limit, speech intelligibility is not necessarily degraded by the noise level. Despite not being given any information on the target speech, the PEVD performs comparably to the OMWF and ranks first in $\Delta\text{NSRR}$ and second in $\Delta\text{FwSegSNR}$ and $\Delta\text{STOI}$.
Figure 12. A comparison of speech enhancement performance for recorded AIR and babble noise in ACE lecture room 2, with a 1.22-s reverberation time: the (a) $\Delta{\text{FwSegSNR}}$ (higher is better), (b) $\Delta{\text{STOI}}$ (higher is better), and (c) $\Delta{\text{NSRR}}$ (higher is better).
At a 20-dB SNR, algorithms targeting reverberation, such as the WPD, perform better than noise reduction approaches. Similar to generalized weighted prediction error in the reverberation-only case in [28], the WPD processes the reverberant signals aggressively by removing most early reflections but not the direct path and late reflections, as observed in the listening examples. Furthermore, the WPD uses the ground truth DOA to compute the ideal steering vector, leading to the best improvement in $\Delta{\text{BSD}}$ and $\Delta{\text{STOI}}{.}$ Listening examples for the PEVD indicate that the direct path and early reflections are retained in the enhanced signal in the first channel. The late reverberations, absent in the enhanced signal, are observed in the second and third channels because of orthogonality [28]. Even without additional information, the PEVD performs comparably to the WPD and ranks second in $\Delta{\text{NSRR}}$ and $\Delta{\text{STOI}}{.}$
Despite not being given knowledge of the DOA, target speech, and array geometry, the PEVD consistently ranks first in $\Delta\text{NSRR}$ and second in $\Delta\text{STOI}$ and $\Delta\text{FwSegSNR}$ over the range of scenarios. Comprehensive results with listening examples and code for the noise-only, reverberation-only, and noisier reverberant scenarios are available in [28].
PEVD-based speech enhancement consistently improves noise reduction metrics, speech intelligibility scores, and dereverberation measures over a wide range of acoustic scenarios. This blind and unsupervised algorithm requires no knowledge of the array geometry and does not use any channel and noise estimation algorithms but performs comparably to an oracle algorithm. More notably, due to the preservation of the spectral coherence using time-domain PEVD algorithms, the proposed algorithm does not introduce noticeable processing artifacts into the enhanced signal. Code and listening examples are provided in [28].
This article has demonstrated the use of polynomial matrices to model broadband multichannel signals and the use of the PEVD to process them. Previous approaches using TDLs and STFTs do not lead to proper generalization of narrowband algorithms and are suboptimal. Instead of considering only the instantaneous covariance matrix, the space-time covariance matrix has been proposed to completely capture the second-order statistics of multichannel broadband signals. Motivated by the optimum processing of narrowband signals using the EVD, i.e., for a single lag, the PEVD has been proposed to process broadband signals across a range of time lags. In most cases, an analytic PEVD exists and can be approximated by polynomials using numerical algorithms, which tend to generate spectrally majorized eigenvalues and paraunitary eigenvectors.
PEVD-based processing for three example applications has been presented and is advantageous over state-of-the-art processing. The PEVD approach can implement the constraints more easily and precisely for adaptive broadband beamforming while achieving a lower complexity than the TDL-based approach. For multichannel subband coding, the PEVD design approximates the ideal optimal data encoding system and overcomes the previous issues with the more-than-two-channel case. The PEVD-based algorithm, which uses only microphone signals, can consistently enhance speech signals, without introducing any audible artifacts, and performs comparably to an oracle algorithm, as observed in the listening examples. In addition to the applications presented in this article, the PEVD is also successfully used for blind source separation [30], MIMO system design [26], source identification [31], and broadband DOA estimation [32].
Similar extensions from the EVD to an analytic or polynomial decomposition can be undertaken for other linear algebraic operations, e.g., the non-para-Hermitian EVD, singular value decomposition (SVD), QR decomposition, and generalized SVD. Algorithms for the SVD and QR decomposition have appeared but still lack a theoretical foundation with respect to the existence of the underlying analytic decompositions. Powerful narrowband techniques, such as independent component analysis, may find their polynomial equivalents. While a number of low-cost implementations have already emerged, algorithmic scalability is an area of active investigation. We hope that these theoretical and algorithmic developments will motivate the signal processing community to experiment with polynomial techniques and take these beyond the successful application areas showcased in this article. Resources, including code and demonstration pages, are available in [28] and [69].
The work of Stephan Weiss was supported by the U.K. Engineering and Physical Sciences Research Council (EPSRC), under grant EP/S000631/1, and the MoD University Defense Research Collaboration in Signal Processing. The work of Patrick A. Naylor was funded through EPSRC grant EP/S035842/1 and the European Union’s Horizon 2020 research and innovation program, under Marie Skłodowska-Curie grant 956369.
Vincent W. Neo (vincent.neo09@imperial.ac.uk) received his Ph.D. degree in electrical and electronic engineering in 2022 from Imperial College London. He is currently a principal engineer in the Singapore Defence Science and Technology Agency, working on speech technology, and a visiting postdoctoral researcher with the Department of Electrical and Electronic Engineering, Imperial College London, SW7 2AZ London, U.K. His research interests include multichannel signal processing and polynomial matrix decomposition, with applications to speech, audio, and acoustics. He is a Member of IEEE.
Soydan Redif (soydan.redif@aum.edu.kw) received his Ph.D. degree in electronics and electrical engineering from the University of Southampton, U.K. He is currently an associate professor at the College of Engineering and Technology, American University of the Middle East, Dasman 15453, Kuwait. His research interests include adaptive and array signal processing applied to source separation, communications, power, biomedical, and wearable systems. He is a Senior Member of IEEE.
John G. McWhirter (mcwhirterjg@cardiff.ac.uk) received his Ph.D. degree in theoretical physics from Queen’s University of Belfast, U.K., in 1973. He is currently an emeritus professor at the University of Cardiff, CF24 3AA Cardiff, U.K. His research interests include independent component analysis for blind signal separation and polynomial matrix algorithms for broadband sensor array signal processing. He was elected as a fellow of the Royal Academy of Engineering in 1996 and the Royal Society in 1999. His work has attracted various awards, including the European Association for Signal Processing Group Technical Achievement Award in 2003.
Jennifer Pestana (jennifer.pestana@strath.ac.uk) received her D.Phil. degree in numerical analysis from the University of Oxford, U.K., in 2012. She is currently a lecturer in the Department of Mathematics and Statistics, University of Strathclyde, G1 1XH Glasgow, U.K. Her research interests include numerical linear algebra and matrix analysis and their application to problems in science and engineering.
Ian K. Proudler (ian.proudler@strath.ac.uk) received his Ph.D. degree in digital signal processing from the University of Cambridge, U.K., in 1984. He is currently a visiting professor at the University of Strathclyde, G1 1XW Glasgow, U.K. He was an honorary editor for IEE Proceedings: Radar, Sonar, and Navigation and the 2002 recipient of the Institution of Electrical Engineers J.J. Thomson Medal. His research interests include adaptive filtering, adaptive beamforming, multichannel signal processing, and blind signal separation.
Stephan Weiss (stephan.weiss@strath.ac.uk) received his Ph.D. degree in electronic and electrical engineering from the University of Strathclyde, G1 1XW Glasgow, U.K., in 1998, where he is currently a professor of signal processing. His research interests include adaptive, multirate, and array signal processing, with applications in acoustics, communications, audio, and biomedical signal processing. He is a Senior Member of IEEE.
Patrick A. Naylor (p.naylor@imperial.ac.uk) received his Ph.D. degree from Imperial College London, SW7 2AZ London, U.K., where he is currently a professor of speech and acoustic signal processing. He is a member of the Board of Governors of the IEEE Signal Processing Society and a past president of the European Association for Signal Processing. His research interests include microphone array signal processing, speaker diarization, and multichannel speech enhancement for applications, including binaural hearing aids and augmented reality. He is a Fellow of IEEE.
[1] G. Strang, Linear Algebra and Its Application, 2nd ed. New York, NY, USA: Academic, 1980.
[2] G. H. Golub and C. F. van Loan, Matrix Computations, 3rd ed. Baltimore, MD, USA: The Johns Hopkins Univ. Press, 1996.
[3] S. Haykin and K. J. R. Liu, Eds., Handbook on Array Processing and Sensor Networks. Hoboken, NJ, USA: Wiley, 2010.
[4] N. S. Jayant and P. Noll, Digital Coding of Waveforms Principles and Applications to Speech and Video. Englewood Cliffs, NJ, USA: Prentice-Hall, 1984.
[5] R. O. Schmidt, “Multiple emitter location and signal parameter estimation,” IEEE Trans. Antennas Propag., vol. 34, no. 3, pp. 276–280, Mar. 1986, doi: 10.1109/TAP.1986.1143830.
[6] M. Vetterli and J. Kovačević, Wavelets and Subband Coding. Upper Saddle River, NJ, USA: Prentice-Hall, 1995.
[7] M. Moonen and B. De Moor, SVD and Signal Processing, III: Algorithms, Architectures and Applications. New York, NY, USA: Elsevier, 1995.
[8] H. L. Van Trees, Optimal Array Processing. Part IV of Detection, Estimation, and Modulation Theory. New York, NY, USA: Wiley, 2002.
[9] P. Comon and C. Jutten, Handbook of Blind Source Separation: Independent Component Analysis and Applications, 1st ed. New York, NY, USA: Academic, 2010.
[10] T. K. Moon and W. C. Stirling, Mathematical Methods and Algorithms for Signal Processing, 1st ed. Englewood Cliffs, NJ, USA: Prentice-Hall, 2000.
[11] R. Klemm, Space-Time Adaptive Processing: Principles and Applications. London, U.K.: Inst. Elect. Eng., 1998.
[12] A. Rao and R. Kumaresan, “On decomposing speech into modulated components,” IEEE Trans. Speech Audio Process., vol. 8, no. 3, pp. 240–254, May 2000, doi: 10.1109/89.841207.
[13] S. Weiss and I. K. Proudler, “Comparing efficient broadband beamforming architectures and their performance trade-offs,” in Proc. IEEE Int. Conf. Digit. Signal Process. (DSP), Jul. 2002, pp. 417–424, doi: 10.1109/ICDSP.2002.1027910.
[14] W. Kellermann and H. Buchner, “Wideband algorithms versus narrowband algorithms for adaptive filtering in the DFT domain,” in Proc. Asilomar Conf. Signals, Syst. Comput., 2003, pp. 1–5, doi: 10.1109/ACSSC.2003.1292194.
[15] Y. Avargel and I. Cohen, “System identification in the short-time Fourier transform domain with crossband filtering,” IEEE Trans. Audio, Speech, Language Process., vol. 15, no. 4, pp. 1305–1319, May 2007, doi: 10.1109/TASL.2006.889720.
[16] B. Widrow, P. Mantey, L. Griffiths, and B. Goode, “Adaptive antenna systems,” Proc. IEEE, vol. 55, no. 12, pp. 2143–2159, Dec. 1967, doi: 10.1109/PROC.1967.6092.
[17] R. T. Compton Jr., “The relationship between tapped delay-line and FFT processing in adaptive arrays,” IEEE Trans. Antennas Propag., vol. 36, no. 1, pp. 15–26, Jan. 1988, doi: 10.1109/8.1070.
[18] T. I. Laakso, V. Valimaki, M. Karjalainen, and U. K. Laine, “Splitting the unit delay [FIR/all pass filters design],” IEEE Signal Process. Mag., vol. 13, no. 1, pp. 30–60, Jan. 1996, doi: 10.1109/79.482137.
[19] T. Kailath, Linear Systems, 1st ed. Englewood Cliffs, NJ, USA: Prentice-Hall, 1980.
[20] P. P. Vaidyanathan, Multirate Systems and Filterbanks, 1st ed. Englewood Cliffs, NJ, USA: Prentice-Hall, 1993.
[21] J. G. McWhirter, P. D. Baxter, T. Cooper, S. Redif, and J. Foster, “An EVD algorithm for para-Hermitian polynomial matrices,” IEEE Trans. Signal Process., vol. 55, no. 5, pp. 2158–2169, May 2007, doi: 10.1109/TSP.2007.893222.
[22] S. Icart and P. Comon, “Some properties of Laurent polynomial matrices,” in Proc. IMA Int. Conf. Math. Signal Process., Dec. 2012, pp. 1–4.
[23] S. Weiss, J. Pestana, and I. K. Proudler, “On the existence and uniqueness of the eigenvalue decomposition of a para-Hermitian matrix,” IEEE Trans. Signal Process., vol. 66, no. 10, pp. 2659–2672, May 2018, doi: 10.1109/TSP.2018.2812747.
[24] S. Weiss, J. Pestana, I. K. Proudler, and F. K. Coutts, “Corrections to ‘On the existence and uniqueness of the eigenvalue decomposition of a para-Hermitian matrix’,” IEEE Trans. Signal Process., vol. 66, no. 23, pp. 6325–6327, Dec. 2018, doi: 10.1109/TSP.2018.2877142.
[25] S. Weiss, S. Bendoukha, A. Alzin, F. K. Coutts, I. K. Proudler, and J. Chambers, “MVDR broadband beamforming using polynomial matrix techniques,” in Proc. Eur. Signal Process. Conf. (EUSIPCO), 2015, pp. 839–843, doi: 10.1109/EUSIPCO.2015.7362501.
[26] R. Brandt and M. Bengtsson, “Wideband MIMO channel diagonalization in the time domain,” in Proc. Int. Symp. Pers., Indoor Mobile Radio Commun., 2011, pp. 1958–1962, doi: 10.1109/PIMRC.2011.6139853.
[27] S. Redif, J. G. McWhirter, and S. Weiss, “Design of FIR paraunitary filter banks for subband coding using a polynomial eigenvalue decomposition,” IEEE Trans. Signal Process., vol. 59, no. 11, pp. 5253–5264, Nov. 2011, doi: 10.1109/TSP.2011.2163065.
[28] V. W. Neo, C. Evers, and P. A. Naylor, “Enhancement of noisy reverberant speech using polynomial matrix eigenvalue decomposition,” IEEE/ACM Trans. Audio, Speech, Language Process., vol. 29, pp. 3255–3266, Oct. 2021, doi: 10.1109/TASLP.2021.3120630.
[29] J. Corr, J. Pestana, S. Weiss, I. K. Proudler, S. Redif, and M. Moonen, “Investigation of a polynomial matrix generalised EVD for multi-channel Wiener filtering,” in Proc. Asilomar Conf. Signals, Syst. Comput., 2016, pp. 1354–1358, doi: 10.1109/ACSSC.2016.7869596.
[30] S. Redif, S. Weiss, and J. G. McWhirter, “Relevance of polynomial matrix decompositions to broadband blind signal separation,” Signal Process., vol. 134, pp. 76–86, May 2017, doi: 10.1016/j.sigpro.2016.11.019.
[31] S. Weiss, N. J. Goddard, S. Somasundaram, I. K. Proudler, and P. A. Naylor, “Identification of broadband source-array responses from sensor second order statistics,” in Proc. Sens. Signal Process. Defence Conf. (SSPD), 2017, pp. 1–5, doi: 10.1109/SSPD.2017.8233237.
[32] W. Coventry, C. Clemente, and J. Soraghan, “Enhancing polynomial MUSIC algorithm for coherent broadband sources through spatial smoothing,” in Proc. Eur. Signal Process. Conf. (EUSIPCO), 2017, pp. 2448–2452.
[33] A. Oppenheim and R. W. Schafer, Digital Signal Processing, 2nd ed. Englewood Cliffs, NJ, USA: Prentice-Hall, 1993.
[34] B. Girod, R. Rabenstein, and A. Stenger, Signals and Systems. New York, NY, USA: Wiley, 2001.
[35] A. Papoulis, Probability, Random Variables, and Stochastic Processes, 3rd ed. New York, NY, USA: McGraw-Hill, 1991.
[36] I. Gohberg, P. Lancaster, and L. Rodman, Matrix Polynomials, 2nd ed. Philadelphia, PA, USA: SIAM, 2009.
[37] V. Kučera, Analysis and Design of Discrete Linear Control Systems. Englewood Cliffs, NJ, USA: Prentice-Hall, 1991.
[38] R. E. Crochiere and L. R. Rabiner, Multirate Digital Signal Processing. Englewood Cliffs, NJ, USA: Prentice-Hall, 1983.
[39] M. Davis, “Factoring the spectral matrix,” IEEE Trans. Autom. Control, vol. 8, no. 4, pp. 296–305, Oct. 1963, doi: 10.1109/TAC.1963.1105614.
[40] “The polynomial toolbox.” Polyx. Accessed: Apr. 30, 2023. [Online]. Available: http://www.polyx.com
[41] J. J. Shynk, “Frequency-domain and multirate adaptive filtering,” IEEE Signal Process. Mag., vol. 9, no. 1, pp. 14–37, Jan. 1992, doi: 10.1109/79.109205.
[42] A. Gilloire and M. Vetterli, “Adaptive filtering in subbands with critical sampling: Analysis, experiments, and application to acoustic echo cancellation,” IEEE Trans. Signal Process., vol. 40, no. 8, pp. 1862–1875, Aug. 1992, doi: 10.1109/78.149989.
[43] F. Rellich, Perturbation Theory of Eigenvalue Problems. New York, NY, USA: Gordon & Breach, 1969.
[44] T. Kato, Perturbation Theory for Linear Operators. Singapore: Springer, 1980.
[45] G. Barbarino and V. Noferini, “On the Rellich eigendecomposition of para-Hermitian matrices and the sign characteristics of *-palindromic matrix polynomials,” Linear Algebra Appl., vol. 672, pp. 1–27, Sep. 2023, doi: 10.1016/j.laa.2023.04.022.
[46] S. Redif, S. Weiss, and J. G. McWhirter, “Sequential matrix diagonalisation algorithms for polynomial EVD of para-Hermitian matrices,” IEEE Trans. Signal Process., vol. 63, no. 1, pp. 81–89, Jan. 2015, doi: 10.1109/TSP.2014.2367460.
[47] F. K. Coutts, J. Corr, K. Thompson, S. Weiss, I. K. Proudler, and J. G. McWhirter, “Memory and complexity reduction in parahermitian matrix manipulations of PEVD algorithms,” in Proc. Eur. Signal Process. Conf. (EUSIPCO), 2016, pp. 1633–1637, doi: 10.1109/EUSIPCO.2016.7760525.
[48] S. Kasap and S. Redif, “Novel field-programmable gate array architecture for computing the eigenvalue decomposition of para-Hermitian polynomial matrices,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 3, pp. 522–536, Mar. 2014, doi: 10.1109/TVLSI.2013.2248069.
[49] F. K. Coutts, I. K. Proudler, and S. Weiss, “Efficient implementation of iterative polynomial matrix EVD algorithms exploiting structural redundancy and parallelisation,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 66, no. 12, pp. 4753–4766, Dec. 2019, doi: 10.1109/TCSI.2019.2937006.
[50] S. Weiss, I. K. Proudler, and F. K. Coutts, “Eigenvalue decomposition of a parahermitian matrix: Extraction of analytic eigenvalues,” IEEE Trans. Signal Process., vol. 69, pp. 722–737, Jan. 2021, doi: 10.1109/TSP.2021.3049962.
[51] A. Tkacenko, “Approximate eigenvalue decomposition of para-Hermitian systems through successive FIR paraunitary transformations,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), 2010, pp. 4074–4077, doi: 10.1109/ICASSP.2010.5495751.
[52] M. Tohidian, H. Amindavar, and A. M. Reza, “A DFT-based approximate eigenvalue and singular value decomposition of polynomial matrices,” EURASIP J. Appl. Signal Process., vol. 1, no. 93, pp. 1–16, Dec. 2013, doi: 10.1186/1687-6180-2013-93.
[53] S. Haykin, Adaptive Filter Theory, 2nd ed. Englewood Cliffs, NJ, USA: Prentice-Hall, 1991.
[54] K. M. Buckley, “Spatial/spectral filtering with linear constrained minimum variance beamformers,” IEEE Trans. Acoust., Speech, Signal Process., vol. 35, no. 3, pp. 249–266, Mar. 1987, doi: 10.1109/TASSP.1987.1165142.
[55] W. Liu and S. Weiss, Wideband Beamforming: Concepts and Techniques. Hoboken, NJ, USA: Wiley, 2010.
[56] R. G. Lorenz and S. P. Boyd, “Robust minimum variance beamforming,” IEEE Trans. Signal Process., vol. 53, no. 5, pp. 1684–1696, May 2005, doi: 10.1109/TSP.2005.845436.
[57] K. M. Buckley, “Broad-band beamforming and the generalized sidelobe canceller,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-34, pp. 1322–1323, Oct. 1986, doi: 10.1109/TASSP.1986.1164927.
[58] S. Weiss, A. P. Millar, and R. W. Stewart, “Inversion of parahermitian matrices,” in Proc. Eur. Signal Process. Conf. (EUSIPCO), Aug. 2010, pp. 447–451.
[59] P. P. Vaidyanathan, “Theory of optimal orthonormal subband coders,” IEEE Trans. Signal Process., vol. 46, no. 6, pp. 1528–1543, Jun. 1998, doi: 10.1109/78.678466.
[60] B. Xuan and R. Bamberger, “FIR principal component filter banks,” IEEE Trans. Signal Process., vol. 46, no. 4, pp. 930–940, Apr. 1998, doi: 10.1109/78.668547.
[61] A. Kirac and P. P. Vaidyanathan, “Theory and design of optimum FIR compaction filters,” IEEE Trans. Signal Process., vol. 46, no. 4, pp. 903–919, Apr. 1998, doi: 10.1109/78.668545.
[62] P. A. Naylor and N. D. Gaubitch, Eds., Speech Dereverberation. London, U.K.: Springer-Verlag, 2010.
[63] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, N. L. Dahlgren, and V. Zue, “TIMIT acoustic-phonetic continuous speech corpus,” Linguistic Data Consortium, Philadelphia, PA, USA, Corpus LDC93S1, 1993.
[64] J. Eaton, N. D. Gaubitch, A. H. Moore, and P. A. Naylor, “Estimation of room acoustic parameters: The ACE challenge,” IEEE/ACM Trans. Audio, Speech, Language Process., vol. 24, no. 10, pp. 1681–1693, Oct. 2016, doi: 10.1109/TASLP.2016.2577502.
[65] F. Jabloun and B. Champagne, “A multi-microphone signal subspace approach for speech enhancement,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), 2001, pp. 205–208, doi: 10.1109/ICASSP.2001.940803.
[66] Y. Hu and P. C. Loizou, “A subspace approach for enhancing speech corrupted by colored noise,” IEEE Signal Process. Lett., vol. 9, no. 7, pp. 204–206, Jul. 2002, doi: 10.1109/LSP.2002.801721.
[67] S. Doclo and M. Moonen, “GSVD-based optimal filtering for single and multimicrophone speech enhancement,” IEEE Trans. Signal Process., vol. 50, no. 9, pp. 2230–2244, Sep. 2002, doi: 10.1109/TSP.2002.801937.
[68] T. Nakatani and K. Kinoshita, “A unified convolutional beamformer for simultaneous denoising and dereverberation,” IEEE Signal Process. Lett., vol. 26, no. 6, pp. 903–907, Jun. 2019, doi: 10.1109/LSP.2019.2911179.
[69] S. Weiss, J. Corr, K. Thompson, J. G. McWhirter, and I. K. Proudler. “Polynomial EVD toolbox.” Polynomial Eigenvalue Decomposition. Accessed: Apr. 30, 2023. [Online]. Available: http://pevd-toolbox.eee.strath.ac.uk