Jie Chen, Min Zhao, Xiuheng Wang, Cédric Richard, Susanto Rahardja
Spectral unmixing is central to the analysis of hyperspectral data. Physics-based methods have become popular for this task because, with their explicit mixing models, they provide a clear interpretation. Nevertheless, owing to their limited modeling capabilities, especially when analyzing real scenes with unknown, complex physical properties, these methods may not be accurate. On the other hand, data-driven methods, particularly those based on deep learning, have developed rapidly in recent years, thanks to their superior capability in modeling complex nonlinear systems. However, simply transferring these methods as black boxes to perform unmixing may lead to low interpretability and poor generalization ability. To bring together the best of both worlds, recent research efforts have focused on combining the advantages of both physics-based models and data-driven methods. In this article, we present an overview of recent advances on this topic from various perspectives, including deep neural network (DNN) design, prior capturing, and loss selection. We summarize these methods within a common optimization framework and discuss ways of enhancing our understanding of these methods. The related source codes are made publicly available at http://github.com/xiuheng-wang/awesome-hyperspectral-image-unmixing.
Conventional machine vision applications widely use red-green-blue cameras, which are suitable for identifying objects with shapes and colors consistent with the human visual system. As a significant advancement in imaging techniques, hyperspectral cameras measure objects or scenes by recording hundreds of narrow bands across a wide spectral range that can extend beyond visible light. In this way, hyperspectral images enable the identification of materials based on their unique spectral signatures, beyond their visible characteristics, coupled with spatial information [1]. Early hyperspectral imaging applications centered on the remote sensing of Earth’s surface using satellite or airborne cameras, e.g., for mineral exploration, vegetation monitoring, and land analysis. More recently, technical advancements and the ongoing cost reduction of hyperspectral cameras have made their deployment practical in other emerging applications, such as food safety inspection, medical diagnostics, industrial sorting, drug analysis, biometric forensics, and archaeology.
The spectral content of individual pixels in hyperspectral images is typically a mixture of the spectra of multiple materials [2], [3]. This phenomenon, called spectral mixture, is attributed to multiple factors, including mainly the low spatial resolution of hyperspectral imaging devices, the diversity and intimate interactions of materials in the imaged scenes, and multiple photon reflections from layered objects. Separating the spectra of individual pixels into a set of spectral signatures (called endmembers), and determining the abundance fraction of each endmember, is an essential task in quantitative hyperspectral subpixel analysis. This process, denoted as spectral unmixing or mixed pixel decomposition, is currently used in many applications, such as conventional mineral distribution analysis in remote sensing, biodistribution analysis in fluorescence images, food quality and composition control, and pharmaceutical inspection.
Diverse solutions have been proposed for the spectral unmixing problem. They can be divided primarily into two categories: physics-based and data-driven methods. Physics-based methods exploit physical models to estimate light scattering and interaction mechanisms among multiple materials in the imaged scene. The observed spectra are explicitly related to the endmembers and abundances, albeit with strong assumptions regarding photon-interaction processes. In practice, it is difficult to accurately model real scenes, and sophisticated models with complex mathematics are usually difficult to estimate. By contrast, data-driven unmixing methods can overcome some limitations of the physics-based methods by directly deriving mixture models from the observed hyperspectral data. These techniques are effective in uncovering the underlying relationships in complicated scenes when abundant data are available. Nevertheless, explicit and useful knowledge of the physical mixing process is often overlooked in naive data-driven approaches. To reap the benefits of both physics-based and data-driven models, it is essential to properly integrate these techniques to achieve a superior unmixing performance with clear interpretation.
The main purpose of this article is to provide an overview of hyperspectral unmixing models and techniques with special attention focused on the integration of physics-based models and data-driven methods. These methods are analyzed from multiple perspectives, including model structure design, prior capturing, and loss function design. In addition to reviewing the materials dispersed in the literature, we establish a mathematical framework that can characterize and relate these methods. This article provides a set of methods that can serve as good examples for understanding the topics and a selection of references that is sufficient to cover the main contributions. The symbols used in this article are provided in Table 1.
Table 1. The main symbols used in this article.
This section establishes a general modeling and optimization framework for spectral unmixing. The classical and recent physics-based and data-driven unmixing methods described subsequently are compatible with this framework.
Generally, hyperspectral unmixing consists of three essential steps: the estimation of the number of endmembers present in the imaged scene, their extraction from available data, and the estimation of their respective abundances. Estimating the number of endmembers can be performed with signal dimension estimation techniques or circumvented with sparse regression methods. Estimating the endmembers and their abundances can be performed sequentially or simultaneously. During the unmixing process, endmembers and abundances are often constrained to preserve physical properties and interpretability. The endmembers represent the pure spectra existing in the imaged scene, implying that the so-called endmember nonnegativity constraint (ENC) is usually imposed. The abundances represent the contribution in percentage of each pure component, meaning that the abundance nonnegativity constraint (ANC) and the abundance sum-to-one constraint (ASC) are often considered. The feasible regions of endmembers and abundances are denoted by ${\Omega}_{M}$ and ${\Omega}_{a},$ with \[{\Omega}_{M} = {\{}{\boldsymbol{M}}\,{:}\,{\boldsymbol{M}}\geq{\boldsymbol{0}}{\},} \tag{1} \] \[{\Omega}_{a} = {\{}{\boldsymbol{a}}\,{:}\,{\boldsymbol{a}}\geq{\boldsymbol{0}}{;}\,{a}_{1} + {a}_{2} + \cdots + {a}_{R} = {1}{\},} \tag{2} \] where ${\cdot}\geq{\boldsymbol{0}}$ denotes elementwise nonnegativity of its matrix or vector argument. The feasible region ${\Omega}_{M}$ is for the ENC, and the feasible region ${\Omega}_{a}$ is for the ANC and ASC. The parameter R denotes the number of endmembers available in the imaged scene.
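For illustration, these constraints translate directly into code. The following minimal NumPy sketch, which is an illustration rather than part of any specific method surveyed here, enforces the ANC and ASC jointly by Euclidean projection onto the probability simplex using the classical sort-based routine; the ENC amounts to elementwise clipping.

```python
import numpy as np

def project_simplex(a):
    """Euclidean projection onto {a : a >= 0, 1^T a = 1}, i.e., the
    feasible region Omega_a enforcing the ANC and ASC jointly
    (classical sort-based algorithm)."""
    u = np.sort(a)[::-1]                     # sort in descending order
    css = np.cumsum(u)
    k = np.arange(1, a.size + 1)
    rho = np.nonzero(u + (1.0 - css) / k > 0)[0][-1]
    tau = (css[rho] - 1.0) / (rho + 1.0)     # shift that restores the ASC
    return np.maximum(a - tau, 0.0)          # clipping restores the ANC

# The ENC on the endmember matrix M reduces to elementwise clipping:
# M = np.maximum(M, 0.0)
```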
An observed pixel spectrum y can be described by the following general mixture mechanism: \[{\boldsymbol{y}} = {\cal{F}}^{\ast}{(}{\boldsymbol{M}},{\boldsymbol{a}}{)} + \boldsymbol{\epsilon} \tag{3} \] where ${\cal{F}}^{\ast}$ defines the inherent photon interaction mechanism parameterized by the endmembers and their abundances. In a nutshell, spectral unmixing aims at determining the endmembers and their respective abundances by considering the inverse process of ${(}{\cal{F}}^{\ast}{)}^{{-}{1}}{:}{\boldsymbol{y}}\rightarrow{\{}\hat{\boldsymbol{M}},\hat{\boldsymbol{a}}{\}}$ under the constraints defined by ${\Omega}_{M}$ and ${\Omega}_{a},$ provided that this inverse is known or can be estimated. From the perspective of mathematical optimization, the general form of the unmixing problem can be formulated in different terms depending on known information and priors.
Consider a scene with N pixels $\{{\boldsymbol{y}}_{i}{\}}_{{i} = {1}}^{N}.$ Under the assumption that the mixture mechanism is defined by an explicit transformation ${\cal{F}},$ the optimization formulation of the unmixing problem is given by \begin{align*}\{{\hat{\boldsymbol{M}}},\{{\hat{\boldsymbol{a}}}_{i}\}_{{i} = {1}}^{N}\} & = \mathop{\text{argmin}}\limits_{{\boldsymbol{M}},\{{\boldsymbol{a}}_{i}\}_{{i} = {1}}^{N}}\mathop{\sum}\limits_{{i} = {1}}^{N}{\cal{L}}{(}{\boldsymbol{y}}_{i},\,{\hat{\boldsymbol{y}}}_{i}{)} + {\cal{R}}{(}{\boldsymbol{M}},\{{\boldsymbol{a}}_{i}\}_{{i} = {1}}^{N}{)} \\ & \quad {\text{with}} \quad {\hat{\boldsymbol{y}}}_{i} = {\cal{F}}{(}{\boldsymbol{M}},{\boldsymbol{a}}_{i}{)} \\ & \quad {\text{s.t.}}\quad{\boldsymbol{M}}\in{\Omega}_{M}\,{\text{and}}\,{\boldsymbol{a}}_{i}\in{\Omega}_{a}\,\forall{i}{,} \tag{4} \end{align*} where ${\cal{L}}$ is a loss function that measures the discrepancy between each observed pixel ${\boldsymbol{y}}_{i}$ and its reconstruction ${\hat{\boldsymbol{y}}}_{i},$ and ${\cal{R}}$ is a regularizer that encodes prior knowledge on the endmembers and abundances.
Extensive research has been conducted to derive explicit mathematical formulations for describing mixture mechanisms. These physics-based models are constructed with different assumptions regarding how photons interact with scene materials. The works in [2] and [4] provide thorough reviews on physical mixture models and associated unmixing methods. Some typical models are presented here as examples.
In the linear mixture model (LMM), the macroscopically pure components are assumed to be homogeneously distributed in separate patches within the view field. Each photon interacts with only one pure material before reaching the observer, without being affected by any other material. The LMM is expressed as follows: \begin{align*}{\boldsymbol{y}} & = {a}_{1}{\boldsymbol{m}}_{1} + {a}_{2}{\boldsymbol{m}}_{2} + \cdots + {a}_{R}{\boldsymbol{m}}_{R} + \boldsymbol{\epsilon} \\ & = {\boldsymbol{Ma}} + {\boldsymbol{\epsilon}}{;} \tag{5} \end{align*} that is, the observed spectrum y is a linear combination of ${a}_{j}{\boldsymbol{m}}_{j},$ where ${\boldsymbol{m}}_{j}$ is the spectrum of the jth pure material weighted by its abundance ${a}_{j}{.}$ An additive noise is denoted by $\boldsymbol{\epsilon}{.}$ A general form of the LMM considers different endmembers at each pixel due to, e.g., spectral variability. It is formulated as \[{\boldsymbol{y}}_{i} = {\boldsymbol{M}}_{i}{\boldsymbol{a}}_{i} + {\boldsymbol{\epsilon}}_{i}{,} \tag{6} \] where ${\boldsymbol{M}}_{i}$, having the same structure as M defined in Table 1, is the endmember matrix at the ith pixel.
Bilinear models [5] generalize the LMM by introducing second-order reflection signatures to capture, e.g., photon scatterings caused by multiple vegetation layers in agricultural scenarios. The mixture model is described by \[{\boldsymbol{y}} = {\boldsymbol{Ma}} + \mathop{\sum}\limits_{{i} = {1}}^{R}\mathop{\sum}\limits_{{j} = {1}}^{R}{{\gamma}_{i,j}}{\boldsymbol{m}}_{i}\odot{\boldsymbol{m}}_{j} + \boldsymbol{\epsilon}{,} \tag{7} \] where $\odot$ denotes the Hadamard product, and ${\gamma}_{i,j}$ stands for the nonlinear contribution of the ith and jth endmembers. The expression of ${\gamma}_{i,j}$ allows us to define different bilinear models. For instance, ${\gamma}_{i,j} = {\beta}_{i,j}{a}_{i}{a}_{j}$ yields the generalized bilinear model in [5], where ${\beta}_{i,j}$ controls the interaction strength between the ith and jth endmembers.
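To make the bilinear mechanism concrete, the following sketch simulates one pixel under the generalized bilinear model of [5], i.e., (7) with ${\gamma}_{i,j} = {\beta}_{i,j}{a}_{i}{a}_{j}$ and interactions restricted to pairs ${i}<{j}$ as in that model; the noise level is an illustrative assumption.

```python
import numpy as np

def gbm_pixel(M, a, beta, noise_std=0.01):
    """Simulate a pixel with the generalized bilinear model:
    linear part Ma, cf. (5), plus second-order endmember
    interactions weighted by beta[i, j] * a[i] * a[j], cf. (7).
    M: (L, R) endmembers; a: (R,) abundances; beta: (R, R)."""
    y = M @ a                                    # linear contribution
    n_bands, R = M.shape
    for i in range(R):
        for j in range(i + 1, R):                # pairwise interactions
            y += beta[i, j] * a[i] * a[j] * M[:, i] * M[:, j]
    return y + noise_std * np.random.randn(n_bands)   # additive noise
```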
The Hapke model has been developed to characterize intimate mixtures with complex photon interactions [6]. The full Hapke model is complex and requires a number of parameters that are generally unavailable in real-world scenes. Under certain reasonable assumptions and simplifications, such as spherical particles and isotropic scatterers, the relationship between the bidirectional reflectance y and the single-scattering albedo (SSA) w becomes \[{\boldsymbol{y}} = {\cal{S}}{(}{\boldsymbol{w}}{)} = \frac{\boldsymbol{w}}{{(}{1} + {2}{\mu}\sqrt{{1}{-}{\boldsymbol{w}}}{)}\left({{1} + {2}{\mu}_{0}\sqrt{{1}{-}{\boldsymbol{w}}}}\right)}{,} \tag{8} \] where the division is elementwise, and ${\mu}_{0}$ and ${\mu}$ are the cosines of the incoming and outgoing radiation angles, respectively. In spite of being nonlinear in terms of reflectance, the intimate mixture model is linear in the SSA domain and can be expressed as follows: \[{\boldsymbol{w}} = \mathop{\sum}\limits_{{i} = {1}}^{R}{{a}_{i}}{\boldsymbol{w}}_{i}{.} \tag{9} \] Unmixing problems with physics-based models frequently fall under Problem formulation 1 when ${\cal{F}}$ is explicitly defined. We now give two examples of linear unmixing problems with known and unknown endmembers, respectively.
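Before turning to these examples, note that the simplified Hapke relation (8) and the SSA-domain linearity (9) translate directly into code. The hedged NumPy sketch below assumes the endmember SSA matrix and the two geometry cosines are available.

```python
import numpy as np

def ssa_to_reflectance(w, mu0, mu):
    """Map single-scattering albedo w (values in [0, 1)) to
    bidirectional reflectance with the simplified relation (8);
    all operations are elementwise."""
    g = np.sqrt(1.0 - w)
    return w / ((1.0 + 2.0 * mu * g) * (1.0 + 2.0 * mu0 * g))

def intimate_mixture_pixel(W, a, mu0, mu):
    """Intimate mixture: linear in the SSA domain, cf. (9), then
    mapped back to reflectance. W: (L, R) endmember SSAs; a: (R,)."""
    return ssa_to_reflectance(W @ a, mu0, mu)
```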
Under the linear model assumption ${\cal{F}}{(}{\boldsymbol{M}},{\boldsymbol{a}}{)} = {\boldsymbol{Ma}}$ with known ${\boldsymbol{M}}$, considering independent pixels and using the squared ${\ell}_{2}$-norm error ${\cal{L}}{(}{\boldsymbol{y}},\hat{\boldsymbol{y}}{)} = {\Vert}{\boldsymbol{y}}{-}\hat{\boldsymbol{y}}{\Vert}^{2},$ problem (4) reduces to the following fully constrained least-squares (FCLS) problem [7]: \begin{align*}{\hat{\boldsymbol{a}}} & = \mathop{\text{argmin}}\limits_{\boldsymbol{a}}{\Vert}{\boldsymbol{y}}{-}{\boldsymbol{Ma}}{\Vert}^{2} \\ & \quad\quad {\text{s.t.}}\quad{\boldsymbol{a}}\in{\Omega}_{a}{.} \tag{10} \end{align*}
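As a minimal sketch of (10), the snippet below solves the FCLS problem with projected gradient descent, reusing the `project_simplex` routine sketched earlier; this is an illustrative solver, whereas the reference algorithm in [7] relies on an active-set least-squares procedure.

```python
import numpy as np

def fcls(y, M, n_iter=500):
    """Fully constrained least squares, cf. (10), via projected
    gradient descent: a gradient step on ||y - Ma||^2, followed by
    projection onto the simplex Omega_a (ANC + ASC)."""
    R = M.shape[1]
    a = np.full(R, 1.0 / R)                     # feasible starting point
    step = 1.0 / np.linalg.norm(M.T @ M, 2)     # inverse Lipschitz constant
    for _ in range(n_iter):
        grad = M.T @ (M @ a - y)                # (scaled) gradient
        a = project_simplex(a - step * grad)    # defined in earlier sketch
    return a
```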
Under the linear model assumption ${\cal{F}}{(}{\boldsymbol{M}},{\boldsymbol{a}}_{i}{)} = {\boldsymbol{Ma}}_{i}$ for ${i} = {1},\cdots,{N}$, and using ${\cal{L}}{(}{\boldsymbol{y}}_{i},{\hat{\boldsymbol{y}}}_{i}{)} = {\Vert}{\boldsymbol{y}}_{i}{-}{\hat{\boldsymbol{y}}}_{i}{\Vert}^{2}$ with a regularization term, problem (4) becomes \begin{align*}\{{\hat{\boldsymbol{M}}},\{{\hat{\boldsymbol{a}}}_{i}\}_{{i} = {1}}^{N}\} & = \mathop{\text{argmin}}\limits_{{\boldsymbol{M}},\{{\boldsymbol{a}}_{i}\}_{{i} = {1}}^{N}}{\Vert}{\boldsymbol{Y}}{-}{\boldsymbol{MA}}{\Vert}_{F}^{2} + {\cal{R}}\left({\boldsymbol{M}},\{{\boldsymbol{a}}_{i}\}_{{i} = {1}}^{N}\right) \\ & {\text{s.t.}}\quad{\boldsymbol{M}}\in{\Omega}_{M},\,{\text{and}}\,{\boldsymbol{a}}_{i}\in{\Omega}_{a}{,} \tag{11} \end{align*} where we notice that ${\Vert}{\boldsymbol{Y}}{-}{\boldsymbol{MA}}{\Vert}_{F}^{2} = {\Sigma}_{{i} = {1}}^{N}{\Vert}{\boldsymbol{y}}_{i}{-}{\boldsymbol{Ma}}_{i}{\Vert}^{2}{.}$ The design of regularizers ${\cal{R}}$ for enhancing the unmixing performance has been extensively studied in the literature. For example, a minimum volume-based term is usually introduced as a geometrical constraint on the endmember estimation [8]. Total variation, sparsity, and low rankness are usually used to regularize the abundance estimation [9].
Though physics-based models and unmixing via Problem formulation 1 have been widely used, the real mixture mechanism may not be explicitly known or accurately assumed, and actual physics-based models have a limited capacity for modeling nonlinearity. Simultaneously learning the function ${\cal{F}}$ from data while determining the endmembers and abundances appears to be a remedy to this problem.
Under the assumption that the mixture mechanism ${\cal{F}}$ is unknown or partially unknown, the optimization formulation that learns ${\cal{F}}$ from the data can be formulated as \begin{align*}\{{\hat{\boldsymbol{M}}},\{{\hat{\boldsymbol{a}}}_{i}\}_{{i} = {1}}^{N},{\hat{\cal{F}}}\} & = \mathop{\text{argmin}}\limits_{{\boldsymbol{M}}{,\{}{\boldsymbol{a}}_{i}{\}}_{{i} = {1}}^{N},{\cal{F}}\in{\cal{H}}}\mathop{\sum}\limits_{{i} = {1}}^{N}{\cal{L}}{(}{\boldsymbol{y}}_{i},{\hat{\boldsymbol{y}}}_{i}{)} + {\cal{R}}{(}{\boldsymbol{M}},\{{\boldsymbol{a}}_{i}\}_{{i} = {1}}^{N}{)} \\ & \quad\quad {\text{with}}\quad{\hat{\boldsymbol{y}}}_{i} = {\cal{F}}{(}{\boldsymbol{M}},{\boldsymbol{a}}_{i}{)} \\ & \quad\quad {\text{s.t.}}\quad{\boldsymbol{M}}\in{\Omega}_{M},\,{\text{and}}\,{\boldsymbol{a}}_{i}\in{\Omega}_{a}, \tag{12} \end{align*} where ${\cal{H}}$ denotes a functional space in which ${\cal{F}}$ lies. Typical nonlinear unmixing problems with conventional machine learning techniques often fall into this category [10], [11].
Learning ${\cal{F}}$ from data opens the possibility of determining mixture mechanisms that are difficult to formulate using explicit descriptions and expressions. However, this strategy may fail if ${\cal{F}}$ is not sufficiently parameterized and jointly constrained with physics-based models. Some researchers have specified several forms for ${\cal{F}}$ to make it more interpretable. In addition to the trivial linear form, the additive nonlinear model and postnonlinear model are two major categories that have been extensively investigated.
Models in this category consider that the mixture consists of a linear mixture with an additive nonlinear component involving the endmembers and their abundances; that is, \[{\boldsymbol{y}} = {\boldsymbol{Ma}} + {\cal{F}}_{\text{add}}{(}{\boldsymbol{M}},{\boldsymbol{a}}{)} + \boldsymbol{\epsilon}{.} \tag{13} \]
Several subcategories, depending on specific forms for ${\cal{F}}_{\text{add}},$ have been investigated to make the problem tractable, including the linear-mixture/nonlinear-fluctuation model, where the nonlinear term depends only on the endmembers, \[{\boldsymbol{y}} = {\boldsymbol{Ma}} + {\cal{F}}_{\text{add}}{(}{\boldsymbol{M}}{)} + \boldsymbol{\epsilon}{,} \tag{14} \] and models where the nonlinear term involves the endmembers scaled by their abundances, \[{\boldsymbol{y}} = {\boldsymbol{Ma}} + {\cal{F}}_{\text{add}}{(}{a}_{1}{\boldsymbol{m}}_{1},\cdots,{a}_{R}{\boldsymbol{m}}_{R}{)} + \boldsymbol{\epsilon}{.} \tag{15} \]
Models in this category consider that a spectrum is produced by distorting the LMM with a nonlinear function, that is, \[{\boldsymbol{y}} = {\cal{F}}_{\text{post}}{(}{\boldsymbol{Ma}}{)} + \boldsymbol{\epsilon}{.} \tag{16} \]
A specific form can be constructed by setting ${\cal{F}}_{\text{post}} = {I} + {\cal{F}}_{\text{add}},$ leading to the following additive postnonlinear model: \[{\boldsymbol{y}} = {\boldsymbol{Ma}} + {\cal{F}}_{\text{add}}{(}{\boldsymbol{Ma}}{)} + \boldsymbol{\epsilon}, \tag{17} \] which can also be viewed as a special case of (15) by setting ${\cal{F}}_{\text{add}}{(}{a}_{1}{\boldsymbol{m}}_{1},\cdots,{a}_{R}{\boldsymbol{m}}_{R}{)} = {\cal{F}}_{\text{add}}{(}{a}_{1}{\boldsymbol{m}}_{1} + \cdots + {a}_{R}{\boldsymbol{m}}_{R}{).}$
Figure 1 shows the hierarchical structure of the preceding categories and how they cover typical physics-based mixture models. The following examples illustrate the relationship between Problem formulation 2 and two typical nonlinear unmixing techniques.
Figure 1. The hierarchical structure of general mixture models and physics-based models. The arrows go from general models to more specific ones. Physics-based models are marked with $\dagger$.
The derivation of the K-Hype algorithm is based on a linear-mixture/nonlinear-fluctuation model of the form (14). The nonlinear fluctuation function ${\cal{F}}_{\text{add}}$ is constrained to be a member of a reproducing kernel Hilbert space (RKHS) ${\cal{H}}_{\text{RKHS}}$ and expressed with kernels. Considering pixels independently and assuming that ${\boldsymbol{M}}$ is known, the optimization problem is given by \begin{align*}\{{\hat{\boldsymbol{a}}},{\hat{\cal{F}}}_{\text{add}}\} = & \mathop{\text{argmin}}\limits_{{\boldsymbol{a}},{\cal{F}}_{\text{add}}\in{\cal{H}}_{\text{RKHS}}}\frac{1}{{2}{\mu}}\mathop{\sum}\limits_{{\ell} = {1}}^{L}{e}_{\ell}^{2} + \frac{1}{2}\left({\Vert}{\cal{F}}_{\text{add}}{\Vert}^{2} + {\Vert}{\boldsymbol{a}}{\Vert}^{2}\right) \\ & \quad {\text{with}}\quad{e}_{\ell} = {y}_{\ell}{-}{\boldsymbol{m}}_{{\lambda}_{\ell}}^{\top}{\boldsymbol{a}}{-}{\cal{F}}_{\text{add}}{(}{\boldsymbol{m}}_{{\lambda}_{\ell}}{)} \\ & \quad {\text{s.t.}}\quad{\boldsymbol{a}}\in{\Omega}_{a}{,} \tag{18} \end{align*} where ${\boldsymbol{m}}_{{\lambda}_{\ell}}$ denotes the ${\ell}$th row of ${\boldsymbol{M}}$, i.e., all reflectances at band ${\lambda}_{\ell}.$ This problem can be solved using the Lagrangian method, which yields an explicit expression relating $\hat{\boldsymbol{a}}$ and the dual variables and expands ${\hat{\cal{F}}}_{\text{add}}$ as an expansion of kernel functions parameterized by $\{{\boldsymbol{m}}_{{\lambda}_{\ell}}{\}}_{{\ell} = {1}}^{L}.$ The K-Hype formulation, along with its sparse variant [10] and its spatially regularized variant in [11], clearly falls into Problem formulation 2.
The geodesic distance can be calculated to conduct data unmixing on a manifold rather than in a linear space. This process considers a nonlinear but continuous bijective mapping ${\cal{C}}$ between the linear space of abundance coefficients and a manifold in the spectral space, \[{\boldsymbol{y}} = {\cal{C}}{(}{\boldsymbol{Ma}}{)} + \boldsymbol{\epsilon}, \tag{19} \] which is of the form (16), as shown in [12] and [13]. More specifically, defining a K-nearest neighbor graph over the dataset, the geodesic distance can be approximated by the length of the shortest path between two points. The endmembers $\{{\boldsymbol{m}}_{i}{\}}_{{i} = {1}}^{R}$ span the simplex with the largest volume on the data manifold, and the abundance coefficients of each pixel to be unmixed can be expressed as volume ratios in this manifold. The manifold formulation also falls into Problem formulation 2, as ${\cal{C}}$ is learned from the data.
Problem formulations 1 and 2 cover physics-based methods or conventional machine learning techniques with handcrafted regularizers. See “Implications of Problem Formulations 1 and 2”. This article focuses more on the integration of physics-based methods with recent machine learning techniques powered by DNNs and other data-driven techniques. Problem formulation 3 encompasses these new methods by introducing additional hierarchical structures on endmembers and abundances.
Implications of Problem Formulations 1 and 2
Problem formulations 1 and 2 imply that the following three steps are essential for achieving a good unmixing performance: 1) defining or learning a mixture model ${\cal{F}}$ that faithfully describes the actual mixing mechanism, 2) designing a regularizer ${\cal{R}}$ that captures prior knowledge on the endmembers and abundances, and 3) selecting a loss function ${\cal{L}}$ suited to the data.
Under the assumption that the mixture mechanism ${\cal{F}}$ is unknown or partially unknown, the optimization formulation that 1) jointly learns ${\cal{F}}$ from data and 2) considers hierarchical reparameterization of the endmembers and abundances can be formulated as \begin{align*}&\{{\hat{\boldsymbol{M}}},{\boldsymbol{\Theta}}_{\boldsymbol{M}}^{\ast},\{{\hat{\boldsymbol{a}}}_{i}\}_{{i} = {1}}^{N},{\boldsymbol{\Theta}}_{\boldsymbol{a}}^{\ast},{\hat{\cal{F}}}\} = \\ & \quad \mathop{\text{argmin}}\limits_{{\boldsymbol{M}},{\boldsymbol{\Theta}}_{\boldsymbol{M}},\{{\boldsymbol{a}}_{i}\}_{{i} = {1}}^{N},{\boldsymbol{\Theta}}_{\boldsymbol{a}},{\cal{F}}\in{\cal{H}}}\mathop{\sum}\limits_{{i} = {1}}^{N}{\cal{L}}{(}{\boldsymbol{y}}_{i},{\hat{\boldsymbol{y}}}_{i}{)} + {\cal{R}}{(}{\boldsymbol{M}},\{{\boldsymbol{a}}_{i}\}_{{i} = {1}}^{N}{)}, \\ & \qquad\qquad\qquad {\text{with}}\quad{\hat{\boldsymbol{y}}}_{i} = {\cal{F}}{(}{\boldsymbol{M}}{(}{\boldsymbol{\Theta}}_{\boldsymbol{M}}{),}{\boldsymbol{a}}_{i}{(}{\boldsymbol{\Theta}}_{\boldsymbol{a}}{)}{)} \\ & \qquad\qquad\qquad {\text{s.t.}}\quad{\boldsymbol{M}}\in{\Omega}_{M},\,{\text{and}}\,{\boldsymbol{a}}_{i}\in{\Omega}_{a} \tag{20} \end{align*} where ${\boldsymbol{\Theta}}_{\boldsymbol{M}}$ and ${\boldsymbol{\Theta}}_{\boldsymbol{a}}$ are parameters that allow us to determine the endmember matrix ${\boldsymbol{M}}$ and the abundances $\{{\boldsymbol{a}}_{i}{\}}_{{i} = {1}}^{N},$ respectively, and ${\cal{H}}$ denotes a given function space. For example, in recent DNN structures, ${\boldsymbol{\Theta}}_{\boldsymbol{a}}$ can represent the encoder in an autoencoder unmixing framework, and ${\boldsymbol{\Theta}}_{\boldsymbol{M}}$ can represent the decoder part that generates the endmembers. Recent physically inspired data-driven methods will be reviewed in the following three sections. Their connections to Problem formulation 3 will also be established. In contrast to conventional machine learning-based unmixing methods, which mostly focus on learning mixture models from data, recent data-driven methods have shown their superiority in improving a variety of aspects, including model design, prior capturing, and loss function design.
In practice, it appears natural to model ${\cal{F}}^{\ast}$ in (3) using DNNs since they offer an enhanced modeling capacity compared to conventional nonlinear modeling techniques. Instead of using DNNs as black boxes, elegant designs of network structures for modeling (15)–(17) are indispensable for preserving physical interpretation and thus facilitating the extraction of abundances and endmembers. Deep autoencoder networks are considered the most appropriate architectures and have been intensively studied for this purpose. Essentially, an autoencoder consists of an encoder network and a decoder network. The encoder maps the input to a low-dimensional representation (e.g., the abundances). Given an input ${\boldsymbol{y}}$, \[{\boldsymbol{\varrho}} = {f}_{{\boldsymbol{\Theta}}_{\text{enc}}}{(}{\boldsymbol{y}}{),} \tag{21} \] where ${f}_{{\boldsymbol{\Theta}}_{\text{enc}}}$ is the function representing the encoder, with ${\boldsymbol{\Theta}}_{\text{enc}}$ denoting all network parameters. The decoder reverses the functionality of the encoder by decompressing the hidden representation vector to reconstruct the original input data, namely, \[\hat{\boldsymbol{y}} = {f}_{{\boldsymbol{\Theta}}_{\text{dec}}}{(}{\boldsymbol{\varrho}}{),} \tag{22} \] where ${f}_{{\boldsymbol{\Theta}}_{\text{dec}}}$ is the function representing the decoder, with ${\boldsymbol{\Theta}}_{\text{dec}}$ denoting all network parameters. For instance, an autoencoder can be designed to provide an estimate ${\boldsymbol{\varrho}}$ of the abundances ${\boldsymbol{a}}$ and to estimate the endmembers and the mixture mechanism with the decoder network. Then, the identifications ${\boldsymbol{\Theta}}_{\boldsymbol{a}}\leftarrow{\boldsymbol{\Theta}}_{\text{enc}}$ and ${\cal{F}}\leftarrow{f}_{{\boldsymbol{\Theta}}_{\text{dec}}}$ can be considered to fit these structures into Problem formulation 3.
In the following, rather than simply presenting the original autoencoders introduced in existing works as a whole, we explore the designs of encoders and decoders separately to demonstrate their extra flexibility and the potential of combining different designs. The basic structure of an autoencoder and typical encoder and decoder designs for unmixing are summarized in Figure 2. We start with decoders that are, in fact, associated with mixture models. Then, we proceed with the encoders that implicitly benefit from data priors.
Figure 2. The structure of an autoencoder for hyperspectral unmixing and a variety of selections of encoder and decoder designs. FCN: fully connected network; CNN: convolutional neural network.
Starting with the decoder of a single linear layer, we progressively demonstrate that the mixture models learned by an autoencoder are actually dependent on the decoder.
The relationship between the input and output of a fully connected linear layer decoder is as follows: \[{\hat{\boldsymbol{y}}} = {f}_{{\boldsymbol{\Theta}}_{\text{dec}}}{(}{\boldsymbol{\varrho}}{)} = {\boldsymbol{W}}{\boldsymbol{\varrho}}{.} \tag{23} \]
As shown in D1 of Figure 2, this form is consistent with the LMM defined in (5). Consequently, the learning process of the autoencoder, which minimizes the gap between $\hat{\boldsymbol{y}}$ and ${\boldsymbol{y}}$, leads to ${\boldsymbol{\varrho}}$ as an estimate of ${\boldsymbol{a}}$ and ${\boldsymbol{W}}$ as an estimate of the endmember matrix ${\boldsymbol{M}}$. Although the linear unmixing problem has been extensively studied with various classes of methods [14], [15], [16], [17], [18], [19], this simple structure motivates the more sophisticated designs that learn structured models from the data.
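As a concrete illustration of these identifications and of the D1 design, the following PyTorch sketch pairs a fully connected encoder, cf. (21), with a single bias-free linear decoder layer, cf. (23); the layer widths and activations are illustrative assumptions, not a design prescribed by the surveyed works.

```python
import torch
from torch import nn

class LinearDecoderAE(nn.Module):
    """Unmixing autoencoder with the D1 linear decoder: the encoder
    plays the role of Theta_a in (20), and the decoder weight is
    read off as the endmember matrix M (i.e., Theta_M)."""
    def __init__(self, n_bands, n_endmembers):
        super().__init__()
        self.encoder = nn.Sequential(            # cf. (21)
            nn.Linear(n_bands, 128), nn.ReLU(),
            nn.Linear(128, n_endmembers),
            nn.Softmax(dim=-1),                  # enforces ANC + ASC
        )
        self.decoder = nn.Linear(n_endmembers, n_bands, bias=False)  # (23)

    def forward(self, y):
        a = self.encoder(y)                      # abundance estimate
        return self.decoder(a), a                # reconstruction, cf. (5)

# After training, the endmember estimate is the decoder weight:
# M_hat = model.decoder.weight.detach().clamp(min=0)   # ENC by clipping
```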
With the linear decoder at hand, one can connect it to additional nonlinear layers to create the following postnonlinear structure: \[{\hat{\boldsymbol{y}}} = {f}_{{\boldsymbol{\Theta}}_{\text{dec}}}{(}{\boldsymbol{\varrho}}{)} = \Psi{(}{\boldsymbol{W}}{\boldsymbol{\varrho}}{),} \tag{24} \] where $\Psi$ denotes an implicit mapping function represented by the subsequent network layers with nonlinear activation functions. The parameters ${\boldsymbol{\Theta}}_{\text{dec}}$ can be determined by training the autoencoder so that the nonlinear function is learned from the data. Compared with the postnonlinear mixture model defined in (16), $\Psi$ is learned to mimic the generative model ${\cal{F}}_{\text{post}},$ and the endmembers are obtained from the weights of the first layer of the decoder [20]. This structure is shown by D2 in Figure 2.
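A possible realization of this D2 design is sketched below; the depth and widths of the layers implementing $\Psi$ are illustrative assumptions.

```python
import torch
from torch import nn

class PostNonlinearDecoder(nn.Module):
    """D2 decoder, cf. (24): a bias-free linear layer whose weight is
    read as the endmember matrix, followed by layers that learn the
    postnonlinear distortion Psi mimicking F_post in (16)."""
    def __init__(self, n_bands, n_endmembers):
        super().__init__()
        self.linear = nn.Linear(n_endmembers, n_bands, bias=False)
        self.psi = nn.Sequential(                # learned distortion
            nn.Linear(n_bands, n_bands), nn.Tanh(),
            nn.Linear(n_bands, n_bands),
        )

    def forward(self, rho):
        return self.psi(self.linear(rho))        # Psi(W rho), cf. (24)
```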
Instead of adding subsequent nonlinear layers, a network branch can be added in parallel to the linear layer to construct additive nonlinear structures. However, the parameters coupled in the linear and nonlinear components must be handled carefully.
Instead of being extracted from the weights of a decoder, the endmembers can also be estimated from a subnetwork; that is, \[\hat{\boldsymbol{y}} = {f}_{{\boldsymbol{\Theta}}_{\text{end}}}{(}{\boldsymbol{z}}{)}{\boldsymbol{\varrho}} = \Psi{(}{\boldsymbol{z}}{)}{\boldsymbol{\varrho}}, \tag{30} \] where ${f}_{{\boldsymbol{\Theta}}_{\text{end}}}$ represents the endmember generative model, with ${\boldsymbol{\Theta}}_{\text{end}}$ denoting the network parameters. This subnetwork can be a regular or variational autoencoder (VAE) that maps pixels to endmembers [25], [26] or a pretrained decoder to generate the endmembers from a latent vector [27]. The variable z denotes the input of this subnetwork, which can be pixels, candidate endmembers, or latent representations. This decoder is also a powerful tool to cope with spectral variability [26], [27]. Compared with Problem formulation 3, we identify ${\boldsymbol{\Theta}}_{\boldsymbol{M}}\leftarrow{\boldsymbol{\Theta}}_{\text{end}},$ leading to a hierarchical structure $\boldsymbol{M}({\boldsymbol{\Theta}}_{\boldsymbol{M}})$.
The encoder part is designed to map the input spectra into a latent representation space, which is often considered as the abundance space. The encoder implicitly acts as a regularizer relating abundances and spectra and imposes a hierarchical structure on a with parameters ${\boldsymbol{\Theta}}_{a},$ i.e., of the form ${\boldsymbol{a}}{(}{\boldsymbol{\Theta}}_{\boldsymbol{a}}{)}{.}$ The encoder part has been designed in the following ways in the literature.
Early works [14], [15], [16], [17] employ fully connected layers to construct the encoder. The latent representation of the kth layer is given by \begin{align*}&{\boldsymbol{h}}_{k} = {\sigma}_{k}{(}{\boldsymbol{U}}_{k}{\boldsymbol{h}}_{{k}{-}{1}} + {\boldsymbol{b}}_{k}{)} \\ & {\text{with}}\quad{\boldsymbol{h}}_{0} = {\boldsymbol{y}}{,} \tag{31} \end{align*} where ${\sigma}_{k}$ represents the nonlinear activation function, and ${\boldsymbol{U}}_{k}$ and ${\boldsymbol{b}}_{k}$ stand for the weights and bias of the kth layer, respectively. The output of the final layer is denoted by ${\boldsymbol{\varrho}}$ and is connected to the decoder. Some variants have also been designed based on this structure. For example, in [14] and [15], a sparse encoder is used to enforce abundance sparsity. The work [15] applies a denoising encoder to cope with noise and outliers in hyperspectral data.
Convolutional neural networks (CNNs) have also been used for designing the encoder [18], [19], [22]. Encoder inputs can be the entire image or overlapping patches. 2D or 3D convolution operators act on 3D image cubes ${\mathbb{Y}}_{n}$ as follows: \begin{align*}&{\mathbb{H}}_{k} = {\sigma}_{k}{(}{\mathbb{H}}_{{k}{-}{1}}\,{\circledast}\,{\mathbb{U}}_{k} + {\mathbb{B}}_{k}{)} \\ & {\text{with}}\quad{\mathbb{H}}_{0} = {\mathbb{Y}}_{n}, \tag{32} \end{align*} where ${\mathbb{U}}_{k}$ denotes a convolution kernel, and ${\mathbb{B}}_{k}$ represents the bias parameters. As 2D convolutional kernels may cause local spectral distortion, 3D-CNN encoders are used in [19] and [22] to jointly learn the spatial structures and local spectral features of 3D input images.
Recurrent neural networks, such as long short-term memory (LSTM) networks, benefit from memory cells and are better suited for processing sequences. With their recurrent hidden states depending on previous steps, they can take full advantage of the sequential characteristics in hyperspectral data. The abundances are estimated with an LSTM to encode spectral correlation and band-to-band variability information [21].
Another class of approaches uses kernel layers to process hyperspectral data. Such layers can map the input pixels into a manifold and estimate the abundances according to a Riemannian metric. For example, in [24], the kernel layer is defined as \[{a}_{i} = \exp\left({-}{\beta}\,{\cdot}\,{\Vert}{\boldsymbol{y}}{-}{\boldsymbol{c}}_{i}{\Vert}^{2}\right), \tag{33} \] where ${\beta}$ is the kernel parameter, ${\boldsymbol{y}}$ is the input pixel, and ${\boldsymbol{c}}_{i}$ is the centroid related to the ith endmember. The smaller the distance between the mixed pixel and the endmember centroid, the larger the corresponding abundance.
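A sketch of such a kernel layer is given below; the softmax-style normalization appended at the output is our assumption, added so that the responses satisfy the ANC and ASC, and is not necessarily how [24] post-processes them.

```python
import torch
from torch import nn

class KernelLayer(nn.Module):
    """Gaussian kernel layer in the spirit of (33): unit i responds
    strongly when the input pixel is close to the learnable
    centroid c_i associated with the ith endmember."""
    def __init__(self, n_bands, n_endmembers, beta=1.0):
        super().__init__()
        self.centroids = nn.Parameter(torch.randn(n_endmembers, n_bands))
        self.beta = beta

    def forward(self, y):                        # y: (batch, n_bands)
        d2 = torch.cdist(y, self.centroids) ** 2 # squared distances to c_i
        a = torch.exp(-self.beta * d2)           # kernel responses, cf. (33)
        return a / a.sum(dim=-1, keepdim=True)   # ANC + ASC (our assumption)
```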
A variational encoder encodes each input pixel with a distribution in the latent space as opposed to a single vector. In practice, it is common to consider Gaussian distributions parameterized by their means and covariance matrices. The work in [16] uses a variational encoder to infer abundances. In [26], a probabilistic generative model is used to address endmember variability and provide more accurate abundance and endmember estimates.
Some works use multibranch encoders to capture different features by sharing weights or concatenating features. In [28], an endmember-guided encoder is used to transfer the endmember information into a parallel abundance estimate encoder. The work [29] uses a dual-branch encoder, where one branch consists of fully connected layers to extract spectral information, and the other branch consists of 2D convolution layers to capture spatial information.
The ANC and ASC are usually addressed at the last layer of the encoder, either by using specific activation functions [e.g., a rectified linear unit (ReLU), a sigmoid, or an absolute operator] to enforce the ANC, followed by a normalization layer to enforce the ASC, or by using SoftMax to enforce the ANC and ASC simultaneously. They can also be addressed approximately via a penalty term in the loss function.
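These two constraint-handling options translate into a few lines of code, sketched below for illustration.

```python
import torch.nn.functional as F

def abundances_softmax(logits):
    """SoftMax enforces the ANC and ASC simultaneously."""
    return F.softmax(logits, dim=-1)

def abundances_relu_normalized(logits, eps=1e-8):
    """ReLU enforces the ANC; dividing by the sum enforces the ASC."""
    a = F.relu(logits)
    return a / (a.sum(dim=-1, keepdim=True) + eps)
```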
Considering the 3D-CNN-based encoder (E2 in Figure 2) and the generalized additive nonlinear decoder defined in (28) (D5 in Figure 2) leads to the algorithm proposed in [22].
Considering the denoising and sparse encoder and the linear decoder defined in (23) (E1 and D1 in Figure 2) leads to the method proposed in [15].
Considering the kernel-layer-based encoder (E4 in Figure 2), the bilinear nonlinear decoder defined in (29) (D4 in Figure 2), and the postnonlinear decoder defined in (25) (D3 in Figure 2) yields the method proposed in [24].
A two-stream autoencoder framework with a probabilistic generative model (E3 and D6 in Figure 2) that copes with spectral variability leads to the method proposed in [26].
The combination of two deep autoencoders with a multitask learning framework leads to the method in [23]. One of the autoencoders models the linear components and estimates the endmembers as well as their abundances, while the other one models the bilinear components defined in (29) and estimates interaction abundances.
In addition to the regular autoencoders described so far, variants of the autoencoder framework have also been proposed for spectral unmixing. For instance, in [16], a group of autoencoders is first applied to address issues regarding outliers and to generate a good initialization. Then, an autoencoder with a single-layer linear decoder is used to perform the unmixing. To improve the unmixing performance, two cascaded autoencoders are utilized in an end-to-end fashion to enforce cycle-consistency constraints in [30]. In [31], the endmembers are extracted using a geometric endmember extraction method, and the abundances are estimated using a deep image prior based on a CNN (depicted by E5 in Figure 2). In [32], a two-stream network, with one stream for spectral information and the other for spatial information, is collaboratively learned to make full use of both. The work [33] constructs an attention network to fuse the information of lidar data into the unmixing process. See “Unmixing With Deep Autoencoder Networks” for summarized insights into the roles of the encoder and decoder in unmixing.
Unmixing With Deep Autoencoder Networks
Remarks on unmixing with deep autoencoder networks are as follows: the decoder embodies the mixture model, so its structure determines the physical interpretation of the learned representation; the encoder maps spectra to abundances and implicitly acts as a regularizer that benefits from data priors; and, because the two parts are largely independent, encoder and decoder designs can be combined flexibly to suit the scene at hand.
In addition to modeling the spectral mixture mechanism, DNNs are superior to conventional machine learning techniques in extracting information from data priors. It is intriguing to use DNNs to explicitly learn prior knowledge from data and use it with physical model-based inversion algorithms. Instead of designing sophisticated regularizers, the plug-and-play paradigm uses a denoiser with deep architectures to learn an image prior and then incorporates the denoiser into an iterative optimization of the problem in (4). On the other hand, deep unrolling algorithms aim to unfold an iterative optimization of the problem in (4) with a specific regularizer into a trainable end-to-end deep network.
Benefiting from the variable splitting technique, the plug-and-play framework makes it possible to exploit the priors learned from training data to solve various image restoration problems. State-of-the-art denoising algorithms, especially fast and powerful CNN-based denoisers, are usually plugged into this framework as the proximity operator that captures the intrinsic spatial and spectral structures of hyperspectral images. By leveraging the flexibility of model-based optimization and the powerful learning ability of CNNs, this framework has shown great potential in performing unmixing tasks. Among the three problem formulations, the form based on Problem formulation 1 is considered here for illustration because of its simplicity. In the plug-and-play framework, an auxiliary variable Z is introduced in (4) to replace A in the regularization term, with an extra constraint ${\boldsymbol{Z}} = {\boldsymbol{A}}$ to ensure the equivalence of the optimization problems. Using the alternating direction method of multipliers (ADMM) results in the following iterative updates: \begin{align*}\{{\boldsymbol{M}}^{{(}{k} + {1}{)}},\,{\boldsymbol{A}}^{{(}{k} + {1}{)}}\} = & \mathop{\text{argmin}}\limits_{{\boldsymbol{M}},\{{\boldsymbol{a}}_{i}\}_{{i} = {1}}^{N}}\mathop{\sum}\limits_{{i} = {1}}^{N}{\cal{L}}{(}{\boldsymbol{y}}_{i},{\hat{\boldsymbol{y}}}_{i}{)} + \frac{\rho}{2}{\Vert}{\boldsymbol{A}}{-}{\boldsymbol{Z}}^{(k)} + {\boldsymbol{V}}^{{(}{k}{)}}{\Vert}_{F}^{2} \\ {\text{with}}\quad{\hat{\boldsymbol{y}}}_{i} = & {\cal{F}}{(}{\boldsymbol{M}},{\boldsymbol{a}}_{i}{)} \\ {\text{s.t.}}\quad{\boldsymbol{M}}\in&{\Omega}_{M}\,{\text{and}}\,{\boldsymbol{a}}_{i}\in{\Omega}_{a}, \tag{34a} \\ {\boldsymbol{Z}}^{{(}{k} + {1}{)}} = & \mathop{\text{argmin}}\limits_{\boldsymbol{Z}}\frac{\rho}{2}{\Vert}{\boldsymbol{Z}}{-}{(}{\boldsymbol{A}}^{{(}{k} + {1}{)}} + {\boldsymbol{V}}^{(k)}{)}{\Vert}_{F}^{2} + {\cal{R}}{(}{\boldsymbol{Z}}{),} \tag{34b} \\ {\boldsymbol{V}}^{{(}{k} + {1}{)}} = & {\boldsymbol{V}}^{(k)} + {\boldsymbol{A}}^{{(}{k} + {1}{)}}{-}{\boldsymbol{Z}}^{{(}{k} + {1}{)}}, \tag{34c} \end{align*} where the superscript ${}^{{(}{k}{)}}$ refers to the iteration index, ${\rho}$ is the penalty parameter, and ${\boldsymbol{V}}^{(k)}$ is the dual variable to be updated at iteration k. In this formulation, the regularizer ${\cal{R}}{(}\boldsymbol{A}{)}$ can be implicitly defined by a plugged denoising operator Denoiser, and (34b) can be seen as the denoising procedure of ${(}{\boldsymbol{A}}^{{(}{k} + {1}{)}} + {\boldsymbol{V}}^{(k)}{)}$ and replaced by \[{\boldsymbol{Z}}^{{(}{k} + {1}{)}} = {\text{Denoiser}}\,{(}{\boldsymbol{A}}^{{(}{k} + {1}{)}} + {\boldsymbol{V}}^{(k)}{)}{.} \tag{35} \]
Generally, Denoiser can be any off-the-shelf denoising operator. This offers the opportunity of incorporating CNN-based denoisers with powerful prior learning ability into the physical model-based iterative optimization. In particular, the work [34] considers the LMM assumption in (5) with known M and incorporates conventional or deep learning-based denoisers with two strategies, namely, a direct regularizer on A and a regularizer on reconstructed spectra, i.e., MA . Considering the case of an unknown M , the work in [35] jointly estimates M and exploits a handcrafted sparsity prior with the regularizer ${\Vert}{\boldsymbol{A}}{\Vert}_{2,1}$ and a denoiser prior with implicit regularizer ${\cal{R}}(\boldsymbol{A})$ in the plug-and-play framework to produce more stable unmixing results. The work in [36] proposes a nonlinear plug-and-play unmixing method based on the general bilinear model in (7), and both traditional and CNN-based denoisers are plugged into the iterative optimization.
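A minimal sketch of this plug-and-play loop under the LMM with known ${\boldsymbol{M}}$ (the setting of [34]) is given below; `denoiser` stands for any off-the-shelf operator acting on the abundance maps, and the projection of ${\boldsymbol{A}}$ onto ${\Omega}_{a}$ in (34a) is omitted for brevity.

```python
import numpy as np

def pnp_admm_unmix(Y, M, denoiser, rho=1.0, n_iter=50):
    """Plug-and-play ADMM for linear unmixing, cf. (34a)-(34c) and
    (35). Y: (L, N) observed pixels; M: (L, R) known endmembers;
    denoiser: callable acting on an (R, N) abundance array."""
    R, N = M.shape[1], Y.shape[1]
    A = np.full((R, N), 1.0 / R)
    Z, V = A.copy(), np.zeros_like(A)
    MtM = M.T @ M
    for _ in range(n_iter):
        # (34a): least-squares data fit plus proximity to Z - V
        A = np.linalg.solve(MtM + (rho / 2.0) * np.eye(R),
                            M.T @ Y + (rho / 2.0) * (Z - V))
        # (34b) replaced by the plugged denoiser, cf. (35)
        Z = denoiser(A + V)
        # (34c): dual update
        V = V + A - Z
    return A
```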
Unlike plug-and-play algorithms, the deep unrolling paradigm unfolds and truncates iterative optimization algorithms into trainable deep architectures. Considering the LMM in (5) with known ${\boldsymbol{M}}$, deep unrolling methods solve the unmixing problem in (10) with the ${\Vert}{\boldsymbol{A}}{\Vert}_{1}$ sparsity regularizer using iterative algorithms, such as the iterative soft thresholding algorithm (ISTA) and the ADMM. In [37], by introducing an auxiliary variable ${\boldsymbol{z}}$ such that ${\boldsymbol{z}} = {\boldsymbol{a}}$ along with a dual variable ${\boldsymbol{v}}$, the optimization problem is iteratively solved by the ADMM with the following steps: \begin{align*}{\boldsymbol{a}}^{{(}{k} + {1}{)}} = & {\boldsymbol{Wy}} + {\boldsymbol{B}}{(}{\boldsymbol{z}}^{(k)} + {\boldsymbol{v}}^{(k)}{),} \tag{36a} \\ {\boldsymbol{z}}^{{(}{k} + {1}{)}} = & \max{(}{\text{soft}}{(}{\boldsymbol{a}}^{{(}{k} + {1}{)}}{-}{\boldsymbol{v}}^{(k)},{\lambda}{/}{\rho}{),}{0}{),} \tag{36b} \\ {\boldsymbol{v}}^{{(}{k} + {1}{)}} = & {\boldsymbol{v}}^{(k)}{-}{\eta}{(}{\boldsymbol{a}}^{{(}{k} + {1}{)}}{-}{\boldsymbol{z}}^{{(}{k} + {1}{)}}{),} \tag{36c} \end{align*} where ${\lambda}$ is the regularization parameter, ${\rho}$ denotes the penalty parameter, ${\eta}$ is a parameter offering additional flexibility, and ${\boldsymbol{W}} = {(}{\boldsymbol{M}}^{\top}{\boldsymbol{M}} + {\mu}{\boldsymbol{I}}{)}^{{-}{1}}{\boldsymbol{M}}^{\top}$ and ${\boldsymbol{B}} = {(}{\boldsymbol{M}}^{\top}{\boldsymbol{M}} + {\mu}{\boldsymbol{I}}{)}^{{-}{1}}\,{\rho}{.}$ The operator ${\text{soft}}{(}{\cdot}{)}$ in (36b) is the soft-thresholding operator given by ${\text{soft}}{(}{\boldsymbol{x}},{\lambda}{/}{\rho}{)} = {\text{sign}}{(}{\boldsymbol{x}}{)(}{\vert}{\boldsymbol{x}}{\vert}{-}{\lambda}{/}{\rho}{)}_{+}{.}$ The key feature of deep unrolling unmixing algorithms is to leverage a reparameterization of these iterative update steps through the consecutive DNN layers (denoted as ${f}_{\boldsymbol{a}},{f}_{\boldsymbol{z}},\,{\text{and}}\,{f}_{\boldsymbol{v}}{)}$ with learnable parameters ${\{}{\boldsymbol{W}}^{{(}{k} + {1}{)}},{\boldsymbol{B}}^{{(}{k} + {1}{)}},{\theta}^{{(}{k} + {1}{)}},{\eta}^{{(}{k} + {1}{)}}{\},}$ respectively: \begin{align*}{\boldsymbol{a}}^{{(}{k} + {1}{)}} = & {f}_{\boldsymbol{a}}{(}{\boldsymbol{z}}^{(k)},{\boldsymbol{v}}^{(k)},{\boldsymbol{y}}{;}{\boldsymbol{W}}^{{(}{k} + {1}{)}},{\boldsymbol{B}}^{{(}{k} + {1}{)}}{)} \\ = & {\boldsymbol{W}}^{{(}{k} + {1}{)}}{\boldsymbol{y}} + {\boldsymbol{B}}^{{(}{k} + {1}{)}}{(}{\boldsymbol{z}}^{(k)} + {\boldsymbol{v}}^{(k)}{),} \\ {\boldsymbol{z}}^{{(}{k} + {1}{)}} = & {f}_{\boldsymbol{z}}{(}{\boldsymbol{a}}^{{(}{k} + {1}{)}},{\boldsymbol{v}}^{(k)}{;}{\theta}^{{(}{k} + {1}{)}}{)} \\ = & {\text{ReLU}}{(}{\boldsymbol{a}}^{{(}{k} + {1}{)}}{-}{\boldsymbol{v}}^{(k)}{-}{\theta}^{{(}{k} + {1}{)}}{\boldsymbol{I}}{),} \\ {\boldsymbol{v}}^{{(}{k} + {1}{)}} = & {f}_{\boldsymbol{v}}{(}{\boldsymbol{a}}^{{(}{k} + {1}{)}},{\boldsymbol{z}}^{{(}{k} + {1}{)}},{\boldsymbol{v}}^{(k)}{;}{\eta}^{{(}{k} + {1}{)}}{)} \\ = & {\boldsymbol{v}}^{(k)}{-}{\eta}^{{(}{k} + {1}{)}}{(}{\boldsymbol{a}}^{{(}{k} + {1}{)}}{-}{\boldsymbol{z}}^{{(}{k} + {1}{)}}{),} \tag{37} \end{align*} where ${\theta}^{{(}{k} + {1}{)}}$ is a learnable parameter that plays the role of ${\lambda}{/}{\rho},$ and ${\text{ReLU}}{(}{\cdot}{)}$ is a componentwise ReLU operation. These specifically designed layers are then cascaded into an entire DNN architecture for unmixing, which can be trained with supervised loss functions in an end-to-end manner.
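For illustration, one unrolled iteration of (37) can be written as a PyTorch module, as sketched below; stacking K such layers and training them end to end yields the unrolled network, and the initial values of the learnable parameters are assumptions made here.

```python
import torch
from torch import nn

class UnrolledADMMLayer(nn.Module):
    """One unrolled ADMM iteration, cf. (37): W, B, theta, and eta
    are learned per layer instead of being fixed by the model."""
    def __init__(self, n_bands, n_endmembers):
        super().__init__()
        self.W = nn.Linear(n_bands, n_endmembers, bias=False)
        self.B = nn.Linear(n_endmembers, n_endmembers, bias=False)
        self.theta = nn.Parameter(torch.tensor(0.01))  # plays lambda/rho
        self.eta = nn.Parameter(torch.tensor(1.0))

    def forward(self, y, z, v):
        a = self.W(y) + self.B(z + v)            # cf. (36a)
        z = torch.relu(a - v - self.theta)       # cf. (36b)
        v = v - self.eta * (a - z)               # cf. (36c)
        return a, z, v
```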
For the blind unmixing problem in (11), the work in [38] builds a nonconvex sparse NMF model using ${\Vert}{\boldsymbol{A}}{\Vert}_{p}$ with ${0}\,{<}\,{p}\,{<}\,{1}$ and designs an interpretable sparsity-constrained NMF network based on the ISTA to jointly estimate ${\boldsymbol{M}}$ and ${\boldsymbol{A}}$.
The loss functions ${\cal{L}}$ in (4), (12), and (20) also play a crucial role in determining the unmixing performance. In addition to data-driven models, data-learned loss functions have recently been proposed to improve the unmixing performance, particularly for Problem formulation 3. The subsequent review focuses on autoencoder-based works, as they mostly refer to Problem formulation 3.
Geometric distances, which are usually used as the essential loss in autoencoder-based unmixing methods, enforce the network to reconstruct the input image. The works [14], [15], and [16] use the mean-squared error (MSE) to measure the similarity between the input and reconstructed pixels; that is, \[{\cal{L}}_{\text{MSE}} = \frac{1}{N}\mathop{\sum}\limits_{{i} = {1}}^{N}{\Vert}{\boldsymbol{y}}_{i}{-}{\hat{\boldsymbol{y}}}_{i}{\Vert}^{2}{.} \tag{38} \]
In contrast to the MSE, which is sensitive to the scale of spectra, the spectral angle distance (SAD) and spectral information divergence (SID) are scale invariant. The SAD considers the angle between two spectra as a spectral similarity measure: \[{\cal{L}}_{\text{SAD}} = \frac{1}{N}\mathop{\sum}\limits_{{i} = {1}}^{N}{\text{arccos}}\left(\frac{\left\langle{\boldsymbol{y}}_{i},{\hat{\boldsymbol{y}}}_{i}\right\rangle}{{\Vert}{\boldsymbol{y}}_{i}{\Vert}{\Vert}{\hat{\boldsymbol{y}}}_{i}{\Vert}}\right){.} \tag{39} \]
The SID models the spectra as probability distributions and treats band-to-band spectral variability as uncertainty in random variables to reduce the gap between input and reconstructed pixels, \[{\cal{L}}_{\text{SID}} = \frac{1}{N}\mathop{\sum}\limits_{{i} = {1}}^{N}{\boldsymbol{p}}_{i}^{\top}\log\left({\frac{{\boldsymbol{p}}_{i}}{{\hat{\boldsymbol{p}}}_{i}}}\right), \tag{40} \] where ${\boldsymbol{p}} = \left({\boldsymbol{y}/{\boldsymbol{1}}^{\top}\boldsymbol{y}}\right)$ and $\hat{\boldsymbol{p}} = \left({\hat{\boldsymbol{y}}/{\boldsymbol{1}}^{\top}\hat{\boldsymbol{y}}}\right)$ represent the probability distribution vectors of the input and estimated pixels, respectively, with the division and logarithm applied elementwise.
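For concreteness, the scale-invariant losses (39) and (40) amount to a few lines each; the PyTorch sketches below are illustrative, with small constants added for numerical stability as an implementation assumption.

```python
import torch

def sad_loss(y, y_hat, eps=1e-8):
    """Spectral angle distance, cf. (39), averaged over pixels."""
    cos = (y * y_hat).sum(-1) / (y.norm(dim=-1) * y_hat.norm(dim=-1) + eps)
    return torch.acos(torch.clamp(cos, -1.0, 1.0)).mean()

def sid_loss(y, y_hat, eps=1e-8):
    """Spectral information divergence, cf. (40): spectra are
    normalized to probability vectors and compared with the
    one-sided KL divergence of (40)."""
    p = y / (y.sum(-1, keepdim=True) + eps)
    q = y_hat / (y_hat.sum(-1, keepdim=True) + eps)
    return (p * torch.log((p + eps) / (q + eps))).sum(-1).mean()
```

Nevertheless, these conventional losses are defined at the pixel level based on basic statistical assumptions that disregard inherent image structures.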
Generative adversarial networks (GANs) have recently been used to address the limitations of geometric distance loss functions. A GAN consists of two components, a generator and a discriminator, and it models the distribution of data using DNNs. The generator ${\cal{G}}$ is trained to generate samples that are similar to real data to confuse the discriminator. The discriminator ${\cal{D}}$ receives samples from the real data distribution (positive data) and samples from the generator (negative data) and attempts to distinguish between the real and generated samples. These two networks engage in a min–max adversarial game during the training process that can be expressed as follows: \[\mathop{\min}\limits_{\cal{G}}\mathop{\max}\limits_{\cal{D}}\,{\mathbb{E}}_{{\boldsymbol{z}}\sim{q}{(}{\boldsymbol{z}}\mid{\boldsymbol{y}}{)}}{[}\log{\cal{D}}{(}{\boldsymbol{z}}{)]} + {\mathbb{E}}_{{\boldsymbol{z}}\sim{p}{(}{\boldsymbol{z}}{)}}{[}\log{(}{1}{-}{\cal{D}}{(}{\cal{G}}{(}{\boldsymbol{z}}{)))].} \tag{41} \]
The architecture of unmixing methods using an adversarial loss usually contains two components: an autoencoder and a discriminator. The former, considered to be the generator, aims to estimate the abundances and extract the endmembers. The latter is a deep binary classifier used to distinguish between generated and real samples. Thus, the discriminator may be able to transfer intrinsic properties of real data that are useful for unmixing. The loss function utilized for generator training is usually composed of two terms, \[{\cal{L}}_{\cal{G}} = {\cal{L}}_{\text{AE}} + {\cal{L}}_{\text{adv}}, \tag{42} \] where ${\cal{L}}_{\text{adv}} = {(}{1}{/}{N}{)}{\Sigma}_{{i} = {1}}^{N}\log{\cal{D}}\left({{\boldsymbol{z}}_{i}}\right) + {(}{1}{/}{N}{)}{\Sigma}_{{i} = {1}}^{N}\log\left({{1}{-}{\cal{D}}\left({{\hat{\boldsymbol{z}}}_{i}}\right)}\right)$ is the loss function that drives the generation of samples $\hat{\boldsymbol{z}}$ that confuse the discriminator, and ${\cal{L}}_{\text{AE}}$ is the reconstruction loss, e.g., a geometric cost function. The loss function for training the discriminator is defined as ${\cal{L}}_{\cal{D}} = {-}{\cal{L}}_{\text{adv}}{.}$ The work [39] introduces the Wasserstein distance calculated from the discriminator as a regularization term to characterize the distribution similarity between the input and reconstructed spectra. In [40], an adversarial network is utilized to model the abundance distribution.
Recent works have introduced the perception mechanism to discover the discrepancy between reconstructed pixels and their corresponding ground truth (input pixels). This strategy allows us to relax pixel-level reconstruction and enhance the unmixing performance using an end-to-end process. The perceptual similarity can be viewed as a feature-matching operator that compares the perceptual features of the reconstructed output pixels to those of the input pixels. The expression for the perceptual loss defined in a feature domain is \[{\cal{L}}_{\text{Perceptual}} = \frac{1}{N}\mathop{\sum}\limits_{{i} = {1}}^{N}{\Vert}{\cal{P}}{(}{\boldsymbol{y}}_{i}{)}{-}{\cal{P}}{(}{\hat{\boldsymbol{y}}}_{i}{)}{\Vert}^{2}, \tag{43} \] where ${\cal{P}}$ stands for a feature extractor. For example, in [30], a cycle-consistency strategy is used to further refine the detailed information in the unmixing process. In [39], the features extracted from the hidden layers of a discriminator are used to ensure the consistency of high-level representations.
This article reviewed hyperspectral unmixing methods with special attention focused on the integration of physics-based and data-driven methods. A mathematical framework was established to introduce a variety of existing works with a unified formalism. As summarized in this article, notable advances have been recently achieved in this topic. Table 2 lists some features of the surveyed methods and summarizes some of their pros and cons. However, there are still several important aspects that merit further discussion and investigation.
Table 2. Features of the surveyed methods.
Evaluating the performance of hyperspectral unmixing algorithms has always been nontrivial since few research works have been devoted to producing ground-truth data [41]. Performance evaluation becomes even more difficult when considering data-driven methods. Preliminary comparison results of several autoencoder-based unmixing methods with synthetic and real datasets have been provided recently in [42]. This study demonstrates the significance of using appropriate loss functions, nonlinear methods, spatial information, and proper initialization. However, thorough and objective evaluation studies are still missing.
Real data usually involve multiple mixing models in complex scenes. It is anticipated that data-driven models, such as the generalized linear-mixture/nonlinear-fluctuation model, will exhibit superior scene-specific adaptability. Nevertheless, it is challenging, if not impossible, to preselect an optimal model in advance. Using a mixing model with parallel multiple branches and an attention module to balance the contribution of each mixing model is a possible strategy to address this issue. Utilizing superpixel-based strategies and adopting distinct models within each superpixel and at its borders is an alternative solution.
This article reviewed the use of autoencoders to learn mixture models from data, the plug-and-play paradigm to learn priors from data, and perceptual dissimilarity characterized by features learned from data. Existing works include only one or two of these techniques. An elegant strategy that combines physics-based and data-driven methods and benefits from all of these factors is still missing. Developing one requires a deeper understanding and original methods in both optimization and neural network design.
Compared to classical optimization methods, data-driven approaches using DNNs require significantly higher computational resources, especially when multiple data-driven modules collaborate to perform unmixing. Fast implementations must be considered for the practical use of unmixing algorithms. The formulation of the optimization problems, the design of the network structures, and the way of integrating them are crucial for enhancing the convergence speed of the learning process. A trained encoder can be used for supervised or semisupervised unmixing when computational resources are limited. Further, a CUDA (Compute Unified Device Architecture) implementation can also be useful for executing parallel optimization-solving modules.
The work of J. Chen was supported in part by the National Natural Science Foundation of China under Grants 62192713 and 62171380, Guangdong International Cooperation Project 2022A0505050020, Shenzhen Research Grant JCYJ20220530161606014, Shaanxi Key Industrial Innovation Chain Project 2022ZDLGY01-02, and Xi’an Technology Industrialization Plan XA2020-RGZNTJ-0076. The work of C. Richard was supported by the French government through the 3IA Côte d’Azur Investments in the Future project with reference number ANR-19-P3IA-0002. Jie Chen is the corresponding author.
Jie Chen (dr.jie.chen@ieee.org) received his Ph.D. degree from Troyes University of Technology (UTT), France, in 2013. He received his B.S. degree in 2006 from Xi’an Jiaotong University (XJTU), China, and his M.S. and Dipl.-Ing. degrees in 2009 from XJTU and UTT, respectively, all in information and telecommunications engineering. From 2013 to 2014, he was with the University of Nice Sophia Antipolis, France, and from 2014 to 2015, he was with the University of Michigan, United States. He is currently a professor at the School of Marine Science and Technology at Northwestern Polytechnical University, Xi’an 710072, China. His current research interests include adaptive signal processing and distributed optimization with applications to hyperspectral image processing and acoustic signal processing. He is a Senior Member of IEEE.
Min Zhao (minzhao@mail.nwpu.edu.cn) received her B.S. degree in electronic and information engineering and her M.S. degree in signal and information processing from Northwestern Polytechnical University, Xi’an 710072, China, in 2017 and 2020, where she is currently pursuing her Ph.D. degree. Her research interests include hyperspectral image analysis and object detection. She received the People’s Choice Award for the Three Minute Thesis Competition at the IEEE International Geoscience and Remote Sensing Symposium 2020 and the (with her team) Champion of Grand Challenges on Near-Infrared Image Colorization at the IEEE International Conference on Visual Communications and Image Processing 2020. She is a Student Member of IEEE.
Xiuheng Wang (xiuheng.wang@oca.eu) received his B.S. degree in electronic and information engineering and his M.S. degree in signal and information processing from Northwestern Polytechnical University, Xi’an, China, in 2018 and 2021, respectively. He is currently pursuing his Ph.D. degree at the Université Côte d’Azur, Nice 06000, France. His research interests include inverse problems in image processing, machine learning, and change point detection. He was a corecipient of the Champion of Grand Challenges on Near-Infrared Image Colorization at the IEEE International Conference on Visual Communications and Image Processing 2020. He is a Student Member of IEEE.
Cédric Richard (cedric.richard@unice.fr) received his Dipl.-Ing. and M.S. degrees in 1994 and his Ph.D. degree in 1998, all in electrical and computer engineering from the Compiègne University of Technology, France. He is a full professor at Université Côte d’Azur, Nice 06000, France. He is a senior chair at the Interdisciplinary Institute for Artificial Intelligence Côte d’Azur, in the chapter Artificial Intelligence for Smart and Secure Territories. His current research interests include statistical signal processing and machine learning. He is the author of more than 300 journal and conference articles. He was distinguished as a junior member of the Institut Universitaire de France in 2010–2015. He is a Senior Member of IEEE.
Susanto Rahardja (susantorahardja@ieee.org) received his Ph.D. and M.Eng. degrees from Nanyang Technological University, Singapore, and his B.Eng. degree from the National University of Singapore, all in the field of electrical and electronic engineering. He is a Ph.D. advisor at Northwestern Polytechnical University, Xi’an 710072, China, and a professor at the Singapore Institute of Technology. His research interests are in multimedia, signal processing, wireless communications, discrete transforms, machine learning and signal processing algorithms, implementation, and optimization. He is a Fellow of IEEE and a fellow of the Academy of Engineering, Singapore.
[1] P. Ghamisi et al., “Advances in hyperspectral image and signal processing: A comprehensive overview of the state of the art,” IEEE Geosci. Remote Sens. Mag., vol. 5, no. 4, pp. 37–78, Dec. 2017, doi: 10.1109/MGRS.2017.2762087.
[2] N. Dobigeon, J.-Y. Tourneret, C. Richard, J. C. M. Bermudez, S. McLaughlin, and A. O. Hero, “Nonlinear unmixing of hyperspectral images: Models and algorithms,” IEEE Signal Process. Mag., vol. 31, no. 1, pp. 82–94, Jan. 2014, doi: 10.1109/MSP.2013.2279274.
[3] R. Borsoi et al., “Spectral variability in hyperspectral data unmixing: A comprehensive review,” IEEE Geosci. Remote Sens. Mag., vol. 9, no. 4, pp. 223–270, Dec. 2021, doi: 10.1109/MGRS.2021.3071158.
[4] R. Heylen, M. Parente, and P. Gader, “A review of nonlinear hyperspectral unmixing methods,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 1844–1868, Jun. 2014, doi: 10.1109/JSTARS.2014.2320576.
[5] A. Halimi, Y. Altmann, N. Dobigeon, and J.-Y. Tourneret, “Nonlinear unmixing of hyperspectral images using a generalized bilinear model,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 11, pp. 4153–4162, Nov. 2011, doi: 10.1109/TGRS.2010.2098414.
[6] B. Hapke, Theory of Reflectance and Emittance Spectroscopy. Cambridge, U.K.: Cambridge Univ. Press, 2012.
[7] D. Heinz and C. Chang, “Fully constrained least squares linear spectral mixture analysis method for material quantification in hyperspectral imagery,” IEEE Trans. Geosci. Remote Sens., vol. 39, no. 3, pp. 529–545, Mar. 2001, doi: 10.1109/36.911111.
[8] L. Miao and H. Qi, “Endmember extraction from highly mixed data using minimum volume constrained nonnegative matrix factorization,” IEEE Trans. Geosci. Remote Sens., vol. 45, no. 3, pp. 765–777, Mar. 2007, doi: 10.1109/TGRS.2006.888466.
[9] J. Peng et al., “Low-rank and sparse representation for hyperspectral image processing: A review,” IEEE Geosci. Remote Sens. Mag., vol. 10, no. 1, pp. 10–43, Mar. 2022, doi: 10.1109/MGRS.2021.3075491.
[10] Z. Li, J. Chen, and S. Rahardja, “Kernel-based nonlinear spectral unmixing with dictionary pruning,” Remote Sens., vol. 11, no. 5, 2019, Art. no. 529, doi: 10.3390/rs11050529.
[11] J. Chen, C. Richard, and P. Honeine, “Nonlinear estimation of material abundances in hyperspectral images with ℓ1-norm spatial regularization,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 5, pp. 2654–2665, May 2014, doi: 10.1109/TGRS.2013.2264392.
[12] R. Heylen, D. Burazerovic, and P. Scheunders, “Non-linear spectral unmixing by geodesic simplex volume maximization,” IEEE J. Sel. Topics Signal Process., vol. 5, no. 3, pp. 534–542, Jun. 2011, doi: 10.1109/JSTSP.2010.2088377.
[13] R. Heylen and P. Scheunders, “A distance geometric framework for nonlinear hyperspectral unmixing,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 1879–1888, Jun. 2014, doi: 10.1109/JSTARS.2014.2319894.
[14] S. Ozkan, B. Kaya, and G. B. Akar, “EndNet: Sparse autoencoder network for endmember extraction and hyperspectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 1, pp. 482–496, Jan. 2019, doi: 10.1109/TGRS.2018.2856929.
[15] Y. Qu and H. Qi, “uDAS: An untied denoising autoencoder with sparsity for spectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 3, pp. 1698–1712, Mar. 2019, doi: 10.1109/TGRS.2018.2868690.
[16] Y. Su, J. Li, A. Plaza, A. Marinoni, P. Gamba, and S. Chakravortty, “DAEN: Deep autoencoder networks for hyperspectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 7, pp. 4309–4321, Jul. 2019, doi: 10.1109/TGRS.2018.2890633.
[17] B. Palsson, J. Sigurdsson, J. R. Sveinsson, and M. O. Ulfarsson, “Hyperspectral unmixing using a neural network autoencoder,” IEEE Access, vol. 6, pp. 25,646–25,656, Mar. 2018, doi: 10.1109/ACCESS.2018.2818280.
[18] B. Palsson, M. O. Ulfarsson, and J. R. Sveinsson, “Convolutional autoencoder for spectral–spatial hyperspectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 59, no. 1, pp. 535–549, Jan. 2021, doi: 10.1109/TGRS.2020.2992743.
[19] F. Khajehrayeni and H. Ghassemian, “Hyperspectral unmixing using deep convolutional autoencoders in a supervised scenario,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 13, pp. 567–576, Feb. 2020, doi: 10.1109/JSTARS.2020.2966512.
[20] M. Wang, M. Zhao, J. Chen, and S. Rahardja, “Nonlinear unmixing of hyperspectral data via deep autoencoder networks,” IEEE Geosci. Remote Sens. Lett., vol. 16, no. 9, pp. 1467–1471, Sep. 2019, doi: 10.1109/LGRS.2019.2900733.
[21] M. Zhao, L. Yan, and J. Chen, “LSTM-DNN based autoencoder network for nonlinear hyperspectral image unmixing,” IEEE J. Sel. Topics Signal Process., vol. 15, no. 2, pp. 295–309, Feb. 2021, doi: 10.1109/JSTSP.2021.3052361.
[22] M. Zhao, M. Wang, J. Chen, and S. Rahardja, “Hyperspectral unmixing for additive nonlinear models with a 3-D-CNN autoencoder network,” IEEE Trans. Geosci. Remote Sens., vol. 60, 2022, Art. no. 5509415, doi: 10.1109/TGRS.2021.3098745.
[23] Y. Su, X. Xu, J. Li, H. Qi, P. Gamba, and A. Plaza, “Deep autoencoders with multitask learning for bilinear hyperspectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 59, no. 10, pp. 8615–8629, Oct. 2021, doi: 10.1109/TGRS.2020.3041157.
[24] K. T. Shahid and I. D. Schizas, “Unsupervised hyperspectral unmixing via nonlinear autoencoders,” IEEE Trans. Geosci. Remote Sens., vol. 60, 2022, Art. no. 5506513, doi: 10.1109/TGRS.2021.3077833.
[25] Q. Jin, Y. Ma, X. Mei, and J. Ma, “TANet: An unsupervised two-stream autoencoder network for hyperspectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 60, 2022, Art. no. 5506215, doi: 10.1109/TGRS.2021.3094884.
[26] S. Shi, M. Zhao, L. Zhang, Y. Altmann, and J. Chen, “Probabilistic generative model for hyperspectral unmixing accounting for endmember variability,” IEEE Trans. Geosci. Remote Sens., vol. 60, 2022, Art. no. 5516915, doi: 10.1109/TGRS.2021.3121799.
[27] R. Borsoi, T. Imbiriba, and J. Bermudez, “Deep generative endmember modeling: An application to unsupervised spectral unmixing,” IEEE Trans. Comput. Imag., vol. 6, pp. 374–384, 2020, doi: 10.1109/TCI.2019.2948726.
[28] Z. Han, D. Hong, L. Gao, B. Zhang, and J. Chanussot, “Deep half-siamese networks for hyperspectral unmixing,” IEEE Geosci. Remote Sens. Lett., vol. 18, no. 11, pp. 1996–2000, Nov. 2021, doi: 10.1109/LGRS.2020.3011941.
[29] Z. Hua, X. Li, Y. Feng, and L. Zhao, “Dual branch autoencoder network for spectral-spatial hyperspectral unmixing,” IEEE Geosci. Remote Sens. Lett., vol. 19, 2022, Art. no. 5507305, doi: 10.1109/LGRS.2021.3091858.
[30] L. Gao, Z. Han, D. Hong, B. Zhang, and J. Chanussot, “CyCU-Net: Cycle-consistency unmixing network by learning cascaded autoencoders,” IEEE Trans. Geosci. Remote Sens., vol. 60, 2022, Art. no. 5503914, doi: 10.1109/TGRS.2021.3064958.
[31] B. Rasti, B. Koirala, P. Scheunders, and P. Ghamisi, “UnDIP: Hyperspectral unmixing using deep image prior,” IEEE Trans. Geosci. Remote Sens., vol. 60, 2022, Art. no. 5504615, doi: 10.1109/TGRS.2021.3067802.
[32] L. Qi, F. Gao, J. Dong, X. Gao, and Q. Du, “SSCU-Net: Spatial–spectral collaborative unmixing network for hyperspectral images,” IEEE Trans. Geosci. Remote Sens., vol. 60, 2022, Art. no. 5407515, doi: 10.1109/TGRS.2022.3150970.
[33] Z. Han, D. Hong, L. Gao, J. Yao, B. Zhang, and J. Chanussot, “Multimodal hyperspectral unmixing: Insights from attention networks,” IEEE Trans. Geosci. Remote Sens., vol. 60, 2022, Art. no. 5524913, doi: 10.1109/TGRS.2022.3155794.
[34] M. Zhao, X. Wang, J. Chen, and W. Chen, “A plug-and-play priors framework for hyperspectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 60, 2022, Art. no. 5501213, doi: 10.1109/TGRS.2020.3047479.
[35] M. Zhao, T. Gao, J. Chen, and W. Chen, “Hyperspectral unmixing via nonnegative matrix factorization with handcrafted and learned priors,” IEEE Geosci. Remote Sens. Lett., vol. 19, 2022, Art. no. 5501905, doi: 10.1109/LGRS.2020.3047481.
[36] Z. Wang, L. Zhuang, L. Gao, A. Marinoni, B. Zhang, and M. K. Ng, “Hyperspectral nonlinear unmixing by using plug-and-play prior for abundance maps,” Remote Sens., vol. 12, no. 24, 2020, Art. no. 4117, doi: 10.3390/rs12244117.
[37] C. Zhou and M. R. Rodrigues, “ADMM-based hyperspectral unmixing networks for abundance and endmember estimation,” IEEE Trans. Geosci. Remote Sens., vol. 60, 2022, Art. no. 5520018, doi: 10.1109/TGRS.2021.3136336.
[38] F. Xiong, J. Zhou, S. Tao, J. Lu, and Y. Qian, “SNMF-Net: Learning a deep alternating neural network for hyperspectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 60, 2022, Art. no. 5510816, doi: 10.1109/TGRS.2021.3081177.
[39] A. Min, Z. Guo, H. Li, and J. Peng, “JMnet: Joint metric neural network for hyperspectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 60, 2022, Art. no. 5505412, doi: 10.1109/TGRS.2021.3069476.
[40] Q. Jin, Y. Ma, F. Fan, J. Huang, X. Mei, and J. Ma, “Adversarial autoencoder network for hyperspectral unmixing,” IEEE Trans. Neural Netw. Learn. Syst., early access, 2022, doi: 10.1109/TNNLS.2021.3114203.
[41] M. Zhao, J. Chen, and Z. He, “A laboratory-created dataset with ground truth for hyperspectral unmixing evaluation,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 12, no. 7, pp. 2170–2183, Jul. 2019, doi: 10.1109/JSTARS.2019.2905099.
[42] B. Palsson, J. R. Sveinsson, and M. O. Ulfarsson, “Blind hyperspectral unmixing using autoencoders: A critical comparison,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 15, pp. 1340–1372, Jan. 2022, doi: 10.1109/JSTARS.2021.3140154.