Samuel Pinilla, Kumar Vijay Mishra, Igor Shevkunov, Mojtaba Soltanalian, Vladimir Katkovnik, Karen Egiazarian
Phase retrieval in optical imaging refers to the recovery of a complex signal from phaseless data acquired in the form of its diffraction patterns. These patterns are acquired through a system with a coherent light source that employs a diffractive optical element (DOE) to modulate the scene, resulting in coded diffraction patterns (CDPs) at the sensor. Recently, the hybrid approach of a model-driven network, or deep unfolding, has emerged as an effective alternative to conventional model- and learning-based phase-retrieval techniques because it allows for bounding the complexity of algorithms while retaining their efficacy. Additionally, such hybrid approaches have shown promise in improving the design of DOEs that satisfy theoretical uniqueness conditions. There are opportunities to exploit novel experimental setups and resolve even more complex DOE phase-retrieval applications. This article presents an overview of algorithms and applications of deep unfolding for bootstrapped phase retrieval, i.e., phase retrieval that applies regardless of whether the data are acquired in the near, middle, or far diffraction zone.
Phase retrieval in diffractive optical imaging (DOI) deals with the estimation of a complex-domain signal from magnitude-only data in the form of its diffraction patterns [1]. This problem has been at the center of exciting developments in DOI [2], [3], [4] during the last decade, improving achievable spatial resolution [5], [6], [7], enabling lighter optical setups [6], and yielding better detection [2]. When the diffraction patterns (phaseless data) are acquired through a setup that employs coherent light and a DOE (also known as a coded aperture) to modulate the scene, CDPs are obtained. Typically, this setup allows for acquiring several snapshots of the scene by changing the spatial configuration of the DOE. The inclusion of the DOE in the phase-retrieval setting paves the way for uniqueness guarantees for all 2D signals up to a unimodular constant [3], [8], [9], overcoming the classical result in [10] that holds only for nonreducible signals.
The CDP is experimentally acquired in three diffraction zones: near, middle, and far [3], [11]. Mathematically, assuming a diagonal matrix ${\boldsymbol{D}}_{\ell}\,{\in}\,{\Bbb{C}}^{{n}\,{\times}\,{n}}$ modeling the DOE for the ${\ell}$-th snapshot, ${\ell} = {1},{\ldots},{L}$, the CDP consists of quadratic equations of the form ${y}_{{k},{i},{\ell}} = {\left\vert{\boldsymbol{a}}_{k,i}^{H}{\boldsymbol{D}}_{\ell}{\boldsymbol{x}}\right\vert}$, where ${\boldsymbol{a}}_{k,i}\,{\in}\,{\Bbb{C}}^{n}$ are the known wavefront propagation vectors associated with the kth diffraction zone for ${i} = {1},{\ldots},{n}$; ${\boldsymbol{x}}\,{\in}\,{\Bbb{C}}^{n}$ is the unknown scene of interest; and k = 1, 2, 3 indexes the near, middle, and far zones, respectively. By harnessing specific properties of each diffraction zone, several advances in imaging applications have been made. The near zone is exploited in scanning near-field optical microscopy [12], Raman imaging [13], and spectroscopy [14]. Applications such as Fresnel holography [11] and lensless imaging [15] take advantage of the middle diffraction zone to develop new acquisition imaging devices [15] and optical elements such as Fresnel lenses [16]. The far or Fraunhofer diffraction zone has been instrumental in the development of applications such as crystallography, astronomical imaging, and microscopy [17].
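To make the measurement model concrete, the following NumPy sketch simulates far-zone CDPs, where propagation reduces to a discrete Fourier transform; the sizes, phase-only DOEs, and random scene are illustrative assumptions, and the near and middle zones would additionally involve the propagation matrices introduced later.

```python
import numpy as np

rng = np.random.default_rng(0)
n, L = 64, 4                      # vectorized scene size and number of snapshots

# Unknown complex-valued scene x.
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# One phase-only DOE per snapshot: the diagonal of D_ell, with |d| <= 1.
does = [np.exp(1j * 2 * np.pi * rng.random(n)) for _ in range(L)]

# Far-zone propagation is a Fourier transform of the modulated scene, so each
# snapshot of phaseless data is y_ell = |F D_ell x|.
y = np.stack([np.abs(np.fft.fft(d * x)) for d in does])   # shape (L, n)
```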
Motivated by the aforementioned applications and guided by the analytical models of the physical setups for a given diffraction zone, [3] showed that the DOE and the phase-retrieval algorithm need to be jointly designed to guarantee unique recovery; these theoretical results were not possible a decade ago. The entries of each matrix ${\boldsymbol{D}}_{\ell}$ (a DOE for the ${\ell}$-th snapshot) were assumed to be independent and identically distributed following a discrete random variable d obeying ${\left\vert{d}\right\vert}\,{≤}\,{1}$. Interestingly, [3] also shows that the design criteria for DOEs are independent of the diffraction zone. This result enables bootstrapped phase retrieval in optical imaging, i.e., phase retrieval regardless of the diffraction zone. Table 1 summarizes common DOE phase-retrieval applications.
Table 1. Common DOE-based optical imaging applications.
The role of DOEs in recovering the phase across all diffraction zones has been studied in [2], [7], [18], [19], [20], and [21], where either the DOE was modeled by specific measurement setups or machine learning (ML) was deployed to obtain data-driven, model-free recovery. Although very flexible and powerful, model-based methods yield significant reconstruction artefacts arising from imperfect system models, mismatch, calibration errors, hand-tuned parameters, and hand-picked regularizers. Furthermore, these methods take hundreds to thousands of iterations to converge, which is unacceptable for real-time imaging. These shortcomings are overcome by employing deep neural networks, whose expressive power allows training such that the resulting network acts as an estimator of the true signal. For example, [22] proposed a deep neural network for object classification from CDPs, achieving high stability in classification performance even when not all CDPs are employed. However, neural networks are treated as black boxes, leading to limited interpretability and, therefore, inadequate control. Further, the training times could be prohibitively long.
In this context, a hybrid model-based and data-driven technique has the potential to further improve data acquisition by effectively mitigating the shortcomings of both techniques [23], [24]. This so-called deep unfolding technique facilitates the design of model-aware deep architectures based on well-established iterative signal processing techniques [25], [26], [27]. Unfolding exploits the data to enhance accuracy and performance, so that the resulting interpretability leads to trusted outcomes while requiring less training data and converging faster. Related work in [26] viewed the model-based iterative shrinkage thresholding algorithm as a recurrent neural network whose activation function is the shrinkage operator. In the context of inverse problems in imaging, [27] provides further insights into the theory and implementations of unrolling. Related terms include unrolling [28] and unrectifying [29]. Whereas the former is used for the algorithms employed by unfolding networks (or simply synonymously with unfolding), the latter refers to the process of transforming nonlinear activations in neural networks into data-dependent linear equations and constraints. Very recently, unfolded networks were proposed for both conventional and sparse phase-retrieval problems [30]. In this article, we present a bootstrapped overview of deep unfolding methods for imaging tasks in a CDP phase-retrieval setting to improve the design of DOEs that can satisfy theoretical uniqueness conditions. We also summarize the use of designed DOEs together with their unfolded phase-retrieval (UPR) algorithms in terms of robustness against noise [31], with/without priors on the scene [30], and smoothness [32]. Finally, we review progress in application areas that enable experimental setups to implement these techniques to address even more ill-posed problems.
The design principles [3] of DOEs require that the set of matrices ${\boldsymbol{D}}_{\ell}$ satisfies ${\Bbb{E}}{[}{\Sigma}_{{\ell} = {1}}^{L}{\bar{\boldsymbol{D}}}_{\ell}{\boldsymbol{D}}_{\ell}{]} = {r}{\boldsymbol{I}}$ to guarantee uniqueness in recovery for all 2D scenes with high probability (regardless of the zone), with ${0}\,{<}\,{r}\,{≤}\,{L}$, where ${L}\,{≥}\,{c}_{0}{n}$ for some large constant ${c}_{0}\,{>}\,{0}$, I is the identity matrix, and ${\Bbb{E}}{[}{\cdot}{]}$ is the statistical expectation. The implication is that the variance of the entries of the matrices ${\boldsymbol{D}}_{\ell}$ needs to be designed across all snapshots to guarantee unique recovery. More information is collected from the scene when this variance is maximized along the snapshots [3], [34]. In the following, we summarize the design principles for two common scenarios: without any prior assumption on the scene and when the scene is known to be sparse in some basis. As explained later, unfolding is applicable to both scenarios.
This is accomplished either by targeting the design of the DOEs in the image-registration process or by obtaining the sensing matrix ${\boldsymbol{A}}_{k}$, which acquires CDPs at the kth diffraction zone, co-designed jointly with the phase-retrieval algorithm. It is the latter where unfolding is commonly employed (see “A Comparison Between Diffractive Optical Element Design Methodologies”).
For ease of exposition, consider ${\boldsymbol{C}}^{\ell}\,{\in}\,{\Bbb{C}}^{{N}\,{\times}\,{N}}$ to be the 2D version of the diagonal matrix ${\boldsymbol{D}}_{\ell}$ whose entries are ${(}{\boldsymbol{D}}_{\ell}{)}_{q,q} = {(}{\boldsymbol{C}}^{\ell}{)}_{{q}{-}{vN},{v} + {1}}$ for ${v} = \left\lfloor{({q}{-}{1}) / {N}}\right\rfloor$, ${q} = {1},{\ldots},{n}$, and ${n} = {N}^{2}$. The design principles are based on considering an admissible random variable ${d}\in{\{}{e}_{1},{e}_{2},{e}_{3},{e}_{4}{\}}$ with probabilities ${\{}{1} / {4},{1} / {4},{1} / {4},{1} / {4}{\}}$, respectively, assuming that ${L} = {C}_{d}{b}$ for some integer ${b}\,{>}\,{0}$, where ${C}_{d} = {4}$ is the number of coding elements.
In particular, the DOE design takes temporal correlation and spatial separation into account. For temporal correlation, the condition on the 2D version ${\boldsymbol{C}}^{\ell}$ of ${\boldsymbol{D}}_{\ell}$ can be satisfied if matrices ${\boldsymbol{C}}^{\ell}$ are constrained to have all the coding elements of d along the L projections at each particular spatial position of the ensemble [34] to directly maximize the variance of the entries along snapshots. It means that ${\{(}{\boldsymbol{C}}^{\ell}{)}_{s,u}\mid{\ell} = {1},{\ldots},{L}{\}}$ contains b times each possible value of d for any ${s},{u}\,{\in}\,{\{}{1},{\ldots},{N}{\}}$. For spatial separation, it has been shown that a set of DOEs with an equispaced distribution of the coding elements improves the ability of data ${y}_{{k},{i},{\ell}}$ to uniquely identify the scene x and increases image reconstruction quality [34], [35].
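The temporal-correlation constraint can be sketched in NumPy: at each spatial position, the L snapshots contain every coding element of d exactly b times, here enforced by a random permutation per pixel. The four phase values chosen for the coding elements are an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
N, C_d, b = 8, 4, 2
L = C_d * b                          # L = C_d * b snapshots

# Coding elements of the admissible random variable d (four unimodular phase
# values here, purely for illustration).
elements = np.exp(1j * 2 * np.pi * np.arange(C_d) / C_d)

# Temporal correlation: at each position (s, u), the L snapshots must contain
# each coding element exactly b times; shuffling per pixel satisfies this
# while keeping the ensemble random.
codes = np.empty((L, N, N), dtype=complex)
for s in range(N):
    for u in range(N):
        seq = np.repeat(elements, b)
        rng.shuffle(seq)
        codes[:, s, u] = seq
```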
The overall design strategy comprises the steps illustrated in Figure S1.
Figure 1. (a) The physical implementation of a compact microscope using a DOE manufactured in Tampere’s Photonics Laboratory with an electron beam lithography system on fused silica glass, and its fragments. The CDPs are acquired in the near zone [${\boldsymbol{A}}_{1}$ in (1)]. When compared with the theoretically expected design, note that the fabricated DOE has imperfections arising from physical artefacts such as material imperfections and printing inaccuracies. The top and side views of scanning electron microscope images of the DOE are shown in the right half of the image and magnified further. (b) Frames from a complex-valued dynamic object video with amplitudes (in arbitrary units (a.u.); top row) and phases (in radians; bottom row) of a moving single-celled eukaryote reconstructed using a sparsity-based filtering method. The video footage consisted of 287 frames over 10 s (see visualization), of which frames 50, 100, 150, and 200 are reproduced (left to right) here.
Figure S1. A diffractive optical element design strategy using the admissible random variable ${d} \in \left\{{{e}_{1},{e}_{2},{e}_{3},{e}_{4}}\right\}$ with probabilities $\left\{{{1} / {4},{1} / {4},{1} / {4},{1} / {4}}\right\}$, respectively, and ${C}_{d} = {4}$.
A Comparison Between Diffractive Optical Element Design Methodologies
Greedy methods design the diffractive optical element itself (instead of the sensing matrix). The experimental performance of this strategy for phase retrieval typically falls short of expectations because of imperfect system modeling; reconstruction artefacts arising from mismatch, calibration errors, and hand-tuned parameters; and even poorer performance for a single snapshot.
In unfolding-based techniques, the sensing matrix is designed across all diffraction zones. The unfolding design method outperforms the greedy methodology because the sensing matrix and the phase-retrieval algorithm are jointly designed, and the reconstruction algorithm is adjusted for the single-snapshot extreme case.
The most important advantage of unfolding co-design is the ability to simultaneously design the entire, or end-to-end, sensing process and the phase-retrieval algorithm. Define the global noiseless measurement vector for the kth diffraction zone as ${\boldsymbol{y}}_{k} = {[}{y}_{k,1,1},{\ldots},{y}_{k,n,L}{]}^{T}\,{\in}\,{\Bbb{R}}^{{m} = {nL}}$ and consider the set of sensing matrices ${\{}{\boldsymbol{A}}_{k}{\}}_{{k} = {1}}^{3}$ with \begin{align*}&{\boldsymbol{A}}_{1} = {\left[{{\overline{\boldsymbol{D}}}_{1}{\boldsymbol{F}}{\overline{\boldsymbol{T}}}{\boldsymbol{F}}^{H},{\ldots},{\overline{\boldsymbol{D}}}_{L}{\boldsymbol{F}}{\overline{\boldsymbol{T}}}{\boldsymbol{F}}^{H}}\right]}^{H} \qquad {(}{\text{near zone}}{)} \\ & {\boldsymbol{A}}_{2} = {\left[{{\overline{\boldsymbol{D}}}_{1}{\overline{\boldsymbol{Q}}}{\boldsymbol{F}},{\ldots},{\overline{\boldsymbol{D}}}_{L}{\overline{\boldsymbol{Q}}}{\boldsymbol{F}}}\right]}^{H} \qquad \qquad \,\,{(}{\text{middle zone}}{)} \\ & {\boldsymbol{A}}_{3} = {\left[{{\overline{\boldsymbol{D}}}_{1}{\boldsymbol{F}}^{H},{\ldots},{\overline{\boldsymbol{D}}}_{L}{\boldsymbol{F}}^{H}}\right]}^{H} \qquad \qquad \quad {(}{\text{far zone}}{)} \tag{1} \end{align*} where ${\boldsymbol{F}}\,{\in}\,{\Bbb{C}}^{{n}\,{\times}\,{n}}$ is the discrete Fourier transform matrix, and ${\boldsymbol{T}}\,{\in}\,{\Bbb{C}}^{{n}\,{\times}\,{n}}$ ${(}{\boldsymbol{Q}}\,{\in}\,{\Bbb{C}}^{{n}\,{\times}\,{n}}{)}$ is the auxiliary orthogonal diagonal matrix, depending on the propagation distance (wavelength of the coherent source), that models the near (middle) zone (see [11, Ch. 4] for further details). The sensing process for the kth diffraction zone is ${\boldsymbol{y}}_{k} = \mid{\boldsymbol{A}}_{k}{\boldsymbol{x}}\mid$, which corresponds to the first layer ${f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)} = \mid{\boldsymbol{A}}_{k}{\boldsymbol{x}}\mid$ of the deep network (Figure S2) modeling the CDP for the kth diffraction zone.
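As a small sanity check on the structure of (1), the far-zone sensing matrix ${\boldsymbol{A}}_{3}$ can be assembled explicitly: since ${(}{\overline{\boldsymbol{D}}}_{\ell}{\boldsymbol{F}}^{H}{)}^{H} = {\boldsymbol{F}}{\boldsymbol{D}}_{\ell}$, stacking the L blocks gives an (nL) × n matrix whose first layer output is $\mid{\boldsymbol{A}}_{3}{\boldsymbol{x}}\mid$. The sizes and random DOEs below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, L = 64, 3
does = [np.exp(1j * 2 * np.pi * rng.random(n)) for _ in range(L)]

# Unitary DFT matrix F (the FFT of the identity, normalized).
F = np.fft.fft(np.eye(n)) / np.sqrt(n)

# Far-zone sensing matrix from (1): A_3 = [conj(D_1) F^H, ..., conj(D_L) F^H]^H;
# its ell-th n x n block acts on x as F D_ell, so stacking gives an (nL) x n matrix.
A3 = np.vstack([F @ np.diag(d) for d in does])

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y3 = np.abs(A3 @ x)                   # first layer f^1(x; A_3) = |A_3 x|
```

In practice each block would be applied matrix-free with an FFT rather than by forming the dense matrix.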
Figure S2. A high-level overview of the unfolding method, resulting in the network layers ${f}^{1},{\ldots},{f}^{T}$. The first layer ${f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)} = \mid{\boldsymbol{A}}_{k}{\boldsymbol{x}}\mid$ models the coded diffraction pattern for the kth diffraction zone. Each iteration of phase retrieval is transferred into the network parameters ${\rho}^{2},{\ldots},{\rho}^{T}$. These parameters are learned from training data sets through end-to-end training.
We seek to recover the signal of interest x from the embedded phaseless data ${\boldsymbol{y}}_{k}$, given the knowledge of ${\boldsymbol{A}}_{k}$, by solving the following optimization problem: \[{\cal{P}}_{0}{:} \qquad {\text{Find }}{\boldsymbol{x}}\,\,{\text{ subject to }}{\boldsymbol{y}}_{k} = {\mid{\boldsymbol{A}}_{k}{\boldsymbol{x}}\mid} \tag{2} \] where the nonconvex constraint models the measurement process. Additionally, if ${\boldsymbol{x}}^{\dagger}$ is a feasible point of the problem ${\cal{P}}_{0}$, then ${e}^{{-}{j}\varphi}{\boldsymbol{x}}^{\dagger}$ is also a solution for an arbitrary phase constant ${\varphi}$. Hence, it is only possible to recover the underlying signal of interest up to a constant global phase ${\varphi}\in{\Bbb{R}}$. This ambiguity naturally leads to the following meaningful metric quantifying the closeness of the recovered signal ${\boldsymbol{x}}^{\dagger}$ to the true signal x: \[{\text{dist}}{(}{\boldsymbol{x}},{\boldsymbol{x}}^{\dagger}{)} = \mathop{\min}\limits_{\varphi\in{[}{0},{2}{\pi}{)}}{\left\Vert{{\boldsymbol{x}}{-}{e}^{{-}{j}\varphi}{\boldsymbol{x}}^{\dagger}}\right\Vert}_{2}{.} \tag{3} \]
If ${\text{dist}}{(}{\boldsymbol{x}},{\boldsymbol{x}}^{\dagger}{)} = {0}$, then x and ${\boldsymbol{x}}^{\dagger}$ are equal up to some global phase.
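The minimization over the global phase in (3) has a closed form: the optimal phase aligns ${\boldsymbol{x}}^{\dagger}$ with x via the inner product ${(}{\boldsymbol{x}}^{\dagger}{)}^{H}{\boldsymbol{x}}$. A minimal NumPy implementation:

```python
import numpy as np

def dist(x, x_hat):
    """Global-phase-invariant distance (3): min over phi of ||x - e^{-j phi} x_hat||_2.

    The minimizer is phi = -angle(x_hat^H x), so e^{-j phi} = z/|z| with
    z = x_hat^H x (closed form; no search over phi needed)."""
    z = np.vdot(x_hat, x)                         # np.vdot conjugates its first argument
    phase = z / np.abs(z) if np.abs(z) > 0 else 1.0
    return np.linalg.norm(x - phase * x_hat)
```

For example, a copy of x rotated by any global phase is at distance (numerically) zero from x.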
The end-to-end unfolding co-design seeks to jointly design the sensing matrix (the encoder module, or first layer in Figure S2) and the reconstruction algorithm (the decoder module, layers 2 to T), in contrast to existing methodologies that consider either the development of the reconstruction algorithm or the design of only the DOEs. As a result, it is a unification of both approaches. Denote by ${S}_{\Phi}{:}{\Bbb{R}}^{m}\rightarrow{\Bbb{R}}^{n}$ the class of decoder functions parameterized by a set of parameters $\Phi$. Assume that, for a given measurement matrix ${\boldsymbol{A}}_{k}$ and the corresponding measurement vector ${\boldsymbol{y}}_{k}$, an estimate $\hat{\boldsymbol{x}}$ of the signal of interest is provided by a certain characterization of the decoder function, i.e., \[\hat{\boldsymbol{x}} = {S}_{\theta}{(}{\boldsymbol{y}}_{k}{;}{\boldsymbol{A}}_{k}{)}\equiv{S}_{\theta}{(}{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{);}{\boldsymbol{A}}_{k}{)} \tag{4} \] where ${\theta}\in\Phi$ denotes a characterization of the decoder module from the original class. In addition, as ${\boldsymbol{A}}_{k}$ models a physical image formation process, it is desirable to define the set of feasible sensing matrices for the optical setup as ${T}\subset{\Bbb{C}}^{{m}\,{\times}\,{n}}$. This set contains the physical constraints related to the implementation of the DOE encapsulated in the matrices ${\boldsymbol{D}}_{\ell}$ as in (1) (see the “Physical Constraints” section for examples of these constraints). Consequently, a realistic ${\boldsymbol{A}}_{k}$ depends on the feasibility of the DOE (see, for instance, Figure 1 for a practical implementation).
Then, for a fixed characterization of the decoder function ${S}_{{\theta}\in\Phi}$, the sensing matrix ${\boldsymbol{A}}_{k}\,{\in}\,{T}$ for the kth diffraction zone is obtained by solving the optimization problem \[\mathop{\text{minimize}}\limits_{{\boldsymbol{A}}_{k}\,{\in}\,{T}}{\Bbb{E}}\left[{{\text{dist}}{(}{\boldsymbol{x}}{;}{S}_{\theta}{(}{\boldsymbol{y}}_{k}\equiv{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{);}{\boldsymbol{A}}_{k}{))}}\right] \tag{5} \] where the expectation is over the distribution of x through the chosen image data set for end-to-end training. Note that this approach considers signal reconstruction accuracy as a criterion with respect to a particular realization of the decoder functions. Thus, the ${\boldsymbol{A}}_{k}$ obtained by this technique is a task-specific encoding matrix that accounts not only for the underlying distribution of the signal but also for the class of decoder functions, resulting in superior performance. After solving the aforementioned optimization problem, the sensing matrix is utilized for data acquisition while the fixed decoder module ${S}_{\theta}$ is used for the reconstruction of x.
In optical applications such as radio astronomy [36] and microscopy [6], the scene x is naturally sparse in some orthogonal domain $\boldsymbol{\Psi}$, i.e., ${\boldsymbol{\Psi}}^{H}\boldsymbol{\Psi} = {\boldsymbol{I}}$; common examples include the wavelet and discrete cosine transforms. Then, there exists an s-sparse ${(}{s}\ll{n}{)}$ representation ${\boldsymbol{z}}\,{\in}\,{\Bbb{C}}^{n}$ such that ${\left\Vert{\boldsymbol{z}}\right\Vert}_{0} = {s}$, where ${\boldsymbol{x}} = {\boldsymbol{\Psi}}^{H}{\boldsymbol{z}}$ and ${\left\Vert{\cdot}\right\Vert}_{0}$ is the ${\ell}_{0}$ pseudonorm. It was shown in [3] that very low estimation errors in the sparse representation z of x were obtained when the admissible random variable d that models the DOEs satisfies ${\Bbb{E}}{[}{d}{]}\,{≠}\,{0}$. This result suggests the type of coding elements of d needed for the highest performance without any restriction on the design strategy. Consequently, the unfolding co-design method is also valid for sparsity-prior scenarios; only the first layer needs to be changed to incorporate the aforementioned result by choosing an appropriate d. Note that d plays a crucial role in uniquely identifying x from the CDP, and illustrates the flexibility of the unfolding framework in combining model-based theoretical results and neural networks. Thus, it is desirable to replicate the analysis of [3] for a given application to achieve optimal imaging quality.
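The sparsity model and the associated projection (keeping the s largest-magnitude coefficients) can be sketched as follows; the random orthogonal basis stands in for a wavelet or DCT matrix, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n, s = 64, 5

# Stand-in orthogonal basis Psi (a wavelet or DCT matrix in practice); here a
# random orthogonal matrix from a QR decomposition.
Psi, _ = np.linalg.qr(rng.standard_normal((n, n)))

# An s-sparse code z and the scene x = Psi^H z (Psi is real here, so ^H = ^T).
z = np.zeros(n)
z[rng.choice(n, size=s, replace=False)] = 1.0 + rng.random(s)
x = Psi.T @ z

def hard_threshold(v, s):
    """Project onto the set of s-sparse vectors: keep the s largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-s:]
    out[idx] = v[idx]
    return out
```

Analyzing x in the basis and hard-thresholding recovers the sparse code exactly, since ${\boldsymbol{\Psi}}{\boldsymbol{x}} = {\boldsymbol{z}}$.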
The optimization problem in (5) is intended to design the measurement matrix ${\boldsymbol{A}}_{k}$ for a given diffraction zone k. Additional physical constraints may be incorporated into the problem. For example, given the block-matrix structure in (1), the DOE may be modeled as a piecewise continuous function [37] wherein the constant intervals of the function are characterized by the random variable d. Similarly, the Fresnel order, or DOE thickness, may be described using a smoothing function [33]. In each of these cases, the physical parameters are learnable.
The thickness of the DOE is defined as ${Q} = {2}{\pi}{m}_{Q}$ (in radians), where ${m}_{Q}$ is the “Fresnel order” of the mask, which is not necessarily an integer. Denote the wavelength of the source by ${\lambda}_{0}$ and the resulting phase profile of the DOE by ${\varphi}_{{\lambda}_{0}}$. Then, the phase profile of the DOE considering the thickness is given by \[{(}{\hat{\varphi}}_{{\lambda}_{0}}{)}_{p,q} = \mod\left({{(}{\varphi}_{{\lambda}_{0}}{)}_{p,q} + \frac{Q}{2},{Q}}\right){-}\frac{Q}{2} \tag{6} \] resulting in ${\hat{\varphi}}_{{\lambda}_{0}}$ taking values in the interval ${[}{-}{Q} / {2},{Q} / {2}{)}$.
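The wrapping in (6) is a one-liner in NumPy:

```python
import numpy as np

def wrap_phase(phi, Q):
    """Wrap an absolute phase profile into [-Q/2, Q/2), implementing (6)."""
    return np.mod(phi + Q / 2.0, Q) - Q / 2.0
```

For instance, with Fresnel order ${m}_{Q} = {1}$ (so ${Q} = {2}{\pi}$), an absolute phase of $3\pi/2$ wraps to $-\pi/2$.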
The DOE is defined on a discretized 2D grid. We obtain a piecewise-invariant surface for the DOE after a nonlinear transformation of its absolute phase. The uniform grid discretization of the wrapped phase profile ${\hat{\varphi}}_{{\lambda}_{0}}$ up to N levels is \[{(}{\boldsymbol{\beta}}_{{\lambda}_{0}}{)}_{p,q} = \left\lfloor{({\hat{\varphi}}_{{\lambda}_{0}}{)}_{p,q}/N}\right\rfloor\cdot{N} \tag{7} \] where ${\left\lfloor{w}\right\rfloor}$ denotes a floor operation that yields the largest integer less than or equal to w. The values of ${\boldsymbol{\beta}}_{{\lambda}_{0}}$ are restricted to ${[}{-}{Q} / {2},{Q} / {2}{)}$ where each step of the piecewise function is a coding element of d. Here Q is an upper bound of the thickness phase of ${\boldsymbol{\beta}}_{{\lambda}_{0}}$.
The floor and modulo functions are not differentiable. Therefore, a smoothing approximation is used for optimizing the DOE’s thickness and number of levels (see [33] for details).
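A minimal sketch of the quantization step and a differentiable floor surrogate follows. Two assumptions are hedged here: the quantization step is taken as Q divided by the number of levels (one reading of (7), so that the levels span the thickness range), and the sigmoid-sum floor is an illustrative smoothing, not necessarily the exact construction of [33].

```python
import numpy as np

def quantize_phase(phi_wrapped, Q, num_levels):
    """Uniform quantization of a wrapped phase profile in the spirit of (7).

    Assumption: step delta = Q / num_levels, so that num_levels quantization
    levels span [-Q/2, Q/2)."""
    delta = Q / num_levels
    return np.floor(phi_wrapped / delta) * delta

def smooth_floor(w, kmax, kappa=50.0):
    """Differentiable surrogate for floor(w) on [0, kmax): a sum of steep
    sigmoids, one per integer threshold (an illustrative smoothing)."""
    ks = np.arange(1, kmax)
    sig = 0.5 * (1.0 + np.tanh(0.5 * kappa * (np.asarray(w, dtype=float)[..., None] - ks)))
    return np.sum(sig, axis=-1)
```

As kappa grows, `smooth_floor` approaches the exact floor while remaining differentiable in w, which is what gradient-based DOE optimization requires.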
These parameters are also considered while designing the measurement matrix, because in practice, all sizes and resolutions may not be implementable [38]. Further details on the impact of these physical DOE constraints in a practical imaging application are available in [39].
Compared to some popular neural network architectures (Figure 2), the unfolded network is a directed acyclic graph that is trained like any standard neural network. However, the input to this unfolded network is not a single vector from the sequence but rather the entire sequence used all at once. The output used to compute the gradients is likewise the entire sequence of output values that a typical neural network would have produced for each step in the input sequence. As such, the unfolded network has multiple inputs and outputs with identical weight matrices for each step.
Figure 2. The conventional network architectures popular in signal/image processing and computer vision applications. (a) In a multilayer perceptron, all the neurons are fully connected. (b) In a convolutional neural network, the neurons are sparsely connected and the weights are shared among different neurons and convolutional layers. (c) An unfolded neural network is a recurrent structure where connections between nodes form a directed or undirected graph along a temporal sequence.
A preliminary step in unfolding is to mathematically formulate a proper parameterized class of decoder and encoder modules that facilitates incorporating the domain knowledge. To this end, the decoder is obtained by first formulating the phase-retrieval problem as a minimization program and then resorting to first-order optimization techniques that lay the groundwork for creating a rich class of parameterized decoder functions. Once such a class is formalized, a connection between deep neural networks and first-order optimization techniques is established.
Assume ${\cal{S}}_{{\theta}_{1}\in\Phi}$ and ${\cal{S}}_{{\theta}_{2}\in\Phi}$ represent two realizations of the decoder module from the same class. Denote the fixed sensing matrix that will be the solution to (5) for ${\theta} = {\theta}_{2}$ by ${\boldsymbol{A}}_{k}$ for the kth diffraction zone. Then, for a signal of interest x and corresponding phaseless measurements ${\boldsymbol{y}}_{k}$, it is possible that \[{\Bbb{E}}\left[{{\text{dist}}{(}{\boldsymbol{x}}{;}{\cal{S}}_{{\theta}_{1}}{(}{\boldsymbol{y}}_{k}{;}{\boldsymbol{A}}_{k}{))}}\right]\leq{\Bbb{E}}\left[{{\text{dist}}{(}{\boldsymbol{x}}{;}{\cal{S}}_{{\theta}_{2}}{(}{\boldsymbol{y}}_{k}{;}{\boldsymbol{A}}_{k}{))}}\right]{.} \tag{8} \]
As a result, it is more appropriate to consider the design of the sensing matrix ${\boldsymbol{A}}_{k}$ with respect to ${\cal{S}}_{{\theta}_{1}}$ and find an optimal ${\theta}^{\ast}\in\Phi$ such that \[{\Bbb{E}}\left[{{\text{dist}}{(}{\boldsymbol{x}}{;}{\cal{S}}_{{\theta}^{\ast}}{(}{\boldsymbol{y}}_{k}{;}{\boldsymbol{A}}_{k}{))}}\right]\leq{\Bbb{E}}\left[{{\text{dist}}{(}{\boldsymbol{x}}{;}{\cal{S}}_{\theta}{(}{\boldsymbol{y}}_{k}{;}{\boldsymbol{A}}_{k}{))}}\right],\forall{\theta}\in\Phi{.} \tag{9} \]
If such a characterization ${\theta}^{\ast}$ is known, the sensing matrix is easily designed. The aforementioned observation is further evidence of the paramount importance of developing a unified framework that allows for a joint optimization of the reconstruction algorithm (i.e., finding an optimal or suboptimal ${\theta}^{\ast})$ and the sensing matrix. Then, the problem of jointly designing ${\boldsymbol{A}}_{k}$ and the reconstruction algorithm is equivalent to the optimization program \[\mathop{\text{minimize}}\limits_{{A}_{k}\,{\in}\,{\cal{T}},{\theta}\in\Phi}{\Bbb{E}}\left[{{\text{dist}}{(}{\boldsymbol{x}}{;}{\cal{S}}_{\theta}{(}{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{);}{\boldsymbol{A}}_{k}{))}}\right]{.} \tag{10} \]
Next, we describe tackling this optimization problem using the available data.
Define the hypothesis class ${\cal{H}}_{e}$ of the possible encoder functions (modules) whose computation dynamics mimic the behavior of the data-acquisition system in a phase-retrieval model (e.g., ${\Bbb{E}}{[}{d}{]}\,{≠}\,{0}$, a temporal or spatial correlation of ${\boldsymbol{A}}_{k}$). Then, \[{\cal{H}}_{e} = \left\{{{f}^{1}{(}\,{\cdot}\,{;}{\boldsymbol{A}}_{k}{)}{:}{\boldsymbol{x}}\rightarrow\mid{\boldsymbol{A}}_{k}{\boldsymbol{x}}\mid{:}{\boldsymbol{A}}_{k}\,{\in}\,{\Bbb{C}}^{{m}\,{\times}\,{n}}}\right\}{.} \tag{11} \]
The superscript in $f^{1}$ denotes that this hypothesis class corresponds to the first layer of the unfolded network. It can be interpreted as a one-layer neural network with n input neurons and m output neurons, where the activation function is $\mid\cdot\mid$ with the trainable weight matrix ${\boldsymbol{A}}_{k}$. The encoder (matrix ${\boldsymbol{A}}_{k})$ for a given realization of a decoder function ${\cal{S}}_{\theta}$ with respect to the hypothesis class ${\cal{H}}_{e}$ is designed by solving the following learning problem \[\mathop{\text{minimize}}\limits_{{f}^{1}\,{\in}\,{\cal{H}}_{e}}{\Bbb{E}}\left[{{\text{dist}}{(}{\boldsymbol{x}}{;}{\cal{S}}_{\theta}\circ{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{))}}\right] \tag{12} \]
where $\circ$ denotes the composition of functions. This problem seeks an encoder function ${f}^{1}{(}\,{\cdot}\,{;}{\boldsymbol{A}}_{k}{)}$ such that the resulting composition maximizes the reconstruction accuracy achieved by the decoder function ${\cal{S}}_{\theta}$.
The goal here is to obtain an estimate of the underlying signal of interest from the phaseless measurements of the form ${f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)} = \mid{\boldsymbol{A}}_{k}{\boldsymbol{x}}\mid$ for a given sensing matrix ${\boldsymbol{A}}_{k}$ of the kth diffraction zone. Equivalently, a decoder function seeks to tackle the following problem: \[\mathop{\text{minimize}}\limits_{{\boldsymbol{z}}\,{\in}\,{\Bbb{C}}^{n}}\frac{1}{2m}{\left\Vert{{f}^{1}{(}{\boldsymbol{z}}{;}{\boldsymbol{A}}_{k}{)}{-}{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)}}\right\Vert}_{2}^{2}{\text{ subject to }}{\boldsymbol{z}}\,{\in}\,{\Gamma} \tag{13} \] where ${\Gamma}$ represents the search space of the underlying signal of interest. For instance, if x is s sparse, this information is encoded in $\Gamma = \left\{{{\boldsymbol{z}}\,{\in}\,{\Bbb{C}}^{n}{:}{\parallel{\boldsymbol{z}}\parallel}_{0} = {s}}\right\}$.
Deep unfolded networks have repeatedly displayed great promise in various signal processing applications [22], [31] by exploiting copious amounts of data along with the domain knowledge gleaned from the underlying problem. This is suitable for deploying phase retrieval in optical imaging, particularly in nonconvex settings where bounding the complexity of signal processing algorithms is often traded off for performance. In general, iterative-optimization techniques are a popular choice for nonconvex phase-retrieval problems. Specifically, first-order methods are widely used iterative-optimization techniques because of their low per-iteration complexity and efficiency in complex scenarios. One such prominent approach that is also suitable for training the decoder is the projected gradient descent (PGD) algorithm. Here, given an initial point ${\boldsymbol{z}}^{(0)}$, the tth iteration of PGD is \[{\boldsymbol{z}}^{{(}{t} + {1}{)}} = {\cal{P}}_{\Gamma}\left({{\boldsymbol{z}}^{(t)}{-}\frac{{\alpha}^{(t)}}{m}{\nabla}_{\boldsymbol{z}}{\ell}{(}{\boldsymbol{z}}^{(t)}{)}}\right) \tag{14} \] where ${\cal{P}}_{\Gamma}{:}{\Bbb{C}}^{n}\rightarrow\Gamma\subseteq{\Bbb{C}}^{n}$ denotes a mapping function of the vector argument to the feasible set ${\Gamma}$, ${\nabla}_{\boldsymbol{z}}{\ell}{(}{\boldsymbol{z}}^{(t)}{)}$ is the gradient of the objective function obtained at the point ${\boldsymbol{z}}^{(t)}$, and ${\alpha}^{(t)}$ represents the step size of the PGD algorithm at the tth iteration. To bound the computational cost of first-order methods, a sensible approach is to fix the total number of iterations of such algorithms (e.g., T), and to follow up with a proper choice of the parameters for each iteration, which results in the best improvement in the objective function while allowing only T iterations.
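A minimal sketch of the PGD iteration (14) for the amplitude loss follows; the random Gaussian sensing matrix stands in for an ${\boldsymbol{A}}_{k}$ from (1), the Wirtinger-style gradient is one standard choice for the amplitude objective, and the projection ${\cal{P}}_{\Gamma}$ is taken as the identity (no structural prior).

```python
import numpy as np

rng = np.random.default_rng(5)
n, L = 32, 8
m = n * L

# Random complex sensing matrix (an illustrative stand-in for A_k) and data.
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = np.abs(A @ x)                                 # phaseless measurements

def loss(z):
    return np.sum((np.abs(A @ z) - y) ** 2) / (2 * m)

def grad(z):
    """Wirtinger gradient of (1/2)|| |Az| - y ||_2^2; the 1/m factor is
    applied inside the update, matching (14)."""
    Az = A @ z
    phase = Az / np.maximum(np.abs(Az), 1e-12)    # sign(Az), guarded at zero
    return A.conj().T @ ((np.abs(Az) - y) * phase)

# PGD iteration (14) with P_Gamma = identity and a fixed step size alpha.
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)
alpha = 0.2
l0 = loss(z)
for _ in range(100):
    z = z - (alpha / m) * grad(z)
```

With a prior, the last line would wrap the update in the corresponding projection, e.g., hard thresholding for an s-sparse $\Gamma$.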
Define a parameterized mapping operator ${f}^{t}{(}\,{\cdot}\,{;}{{\rho}}^{t}{)}{:}{\Bbb{C}}^{n}\rightarrow{\Bbb{C}}^{n}$ for ${t}\,{≥}\,{2}$ (see Figure S2) as \[{f}^{t}{(}{\boldsymbol{z}}{;}{{\rho}}^{t}{)} = {\cal{P}}_{\Gamma}\left({{\boldsymbol{z}}{-}\frac{1}{m}{\boldsymbol{G}}^{(t)}{\nabla}_{\boldsymbol{z}}{\ell}{(}{\boldsymbol{z}}{)}}\right) \tag{15} \] where ${{\rho}}^{t} = \left\{{{\boldsymbol{G}}^{(t)}}\right\}$ is the set of parameters of the mapping function ${f}^{t}{(}\,{\cdot}\,{;}{{\rho}}^{t}{)}$, and ${\boldsymbol{G}}^{(t)}$ is a positive-semidefinite matrix. This mapping is then utilized to model various first-order optimization techniques for a given problem. In particular, for the PGD algorithm, performing ${T}{-}{1}$ iterations of the aforementioned form (Figure S2) is modeled as \[{\cal{N}}_{\Gamma}^{{T}{-}{1}}{(}{\boldsymbol{x}}{)} = {f}^{T}{(}\,{\cdot}\,{;}{{\rho}}^{T}{)}\circ{f}^{{T}{-}{1}}{(}\,{\cdot}\,{;}{{\rho}}^{{T}{-}{1}}{)}\,{\circ}\,{\cdots}\,{\circ}\,{f}^{2}{(}\,{\cdot}\,{;}{{\rho}}^{2}{)} \tag{16} \] where the aforementioned function corresponds to the PGD in all scenarios, including for a fixed step size ${\boldsymbol{G}}^{(t)} = {\alpha}{\boldsymbol{I}}$, time-varying step sizes ${\boldsymbol{G}}^{(t)} = {\alpha}^{(t)}{\boldsymbol{I}}$, or the general preconditioned PGD algorithm with an arbitrary choice of ${\boldsymbol{G}}^{(t)}\succeq{0}$.
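The layer composition (15)–(16) can be sketched by building one function per iteration and chaining them; here ${\boldsymbol{G}}^{(t)} = {\alpha}^{(t)}{\boldsymbol{I}}$ (per-layer step sizes standing in for the learnable parameters ${\rho}^{t}$, fixed by hand rather than trained end-to-end), ${\cal{P}}_{\Gamma}$ is the identity, and the Gaussian sensing matrix is an illustrative stand-in.

```python
import numpy as np

rng = np.random.default_rng(6)
n, L, T = 16, 8, 6
m = n * L
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = np.abs(A @ x)

def grad(z):                      # gradient of (1/2)|| |Az| - y ||_2^2
    Az = A @ z
    return A.conj().T @ ((np.abs(Az) - y) * Az / np.maximum(np.abs(Az), 1e-12))

def make_layer(alpha):
    """One layer f^t(.; rho^t) from (15) with G^(t) = alpha^(t) I and
    P_Gamma = identity; alpha is the layer's (here hand-fixed) parameter."""
    return lambda z: z - (alpha / m) * grad(z)

# N^{T-1} in (16): the composition f^T o ... o f^2, i.e., T-1 layers.
layers = [make_layer(a) for a in [0.2] * (T - 1)]

z0 = rng.standard_normal(n) + 1j * rng.standard_normal(n)
z = z0
for f in layers:
    z = f(z)
```

In end-to-end training, the per-layer parameters (and, more generally, the matrices ${\boldsymbol{G}}^{(t)}$) would be learned via (17) instead of fixed.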
The preconditioning-based update rule in (15) is known to be advantageous in model-based algorithms, both to speed up the convergence rate [41, Ch. 5] and to yield robustness when the sensing matrix is ill conditioned (i.e., its condition number is far greater than one) [42], [43], [44]. The latter advantage is critical in practical imaging applications (listed in Table 1) to overcome imperfect system modeling, reconstruction artifacts such as model mismatch and calibration errors, and the poorer performance obtained from a single snapshot of an experimental sensing matrix [45], [46]. In the context of unfolding [47], the benefits of preconditioning appear indirectly because the learned matrix encapsulates both the sensing matrix (via its transpose) and the preconditioning matrix.
Finally, for a fixed ${\boldsymbol{A}}_{k}$, learning the deep decoder module with respect to a realization of the encoder function corresponds to the problem \[\mathop{\text{minimize}}\limits_{\Gamma = \left\{{{\boldsymbol{G}}^{(t)}\succeq{0}}\right\}}{\Bbb{E}}{[}{\text{dist}}{(}{\boldsymbol{x}}{;}{\cal{N}}_{\Gamma}^{{T}{-}{1}}\circ{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)}{)}{]}. \tag{17} \]
This formulation is akin to training a T-layer deep neural network ${\cal{N}}_{\Gamma}^{{T}{-}{1}}\circ{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)}$ with a task-specific architecture whose computation dynamics at each layer mimic the behavior of one iteration of a first-order optimization algorithm, and the input to such a model-based deep scheme is given by a realization of the encoder module. As domain knowledge is included in the design of such a deep unfolded network, it is more interpretable and has fewer training parameters than its black-box counterpart.
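Although the article trains the decoder by backpropagation, the structure of problem (17) can be illustrated without a deep learning framework: the sketch below "trains" only scalar per-layer step sizes by a crude random search over a small synthetic batch, using a phase-invariant relative error as a stand-in for the distance in (17). All sizes and the search procedure are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, T, batch = 8, 64, 6, 16
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)

def forward(x, alphas):
    """T-1 unfolded layers with scalar steps (G^(t) = alpha^(t) I), no projection."""
    b = np.abs(A @ x)
    z = A.conj().T @ b / m
    for a in alphas:
        Az = A @ z
        z = z - (a / m) * (A.conj().T @ (Az - b * np.exp(1j * np.angle(Az))))
    return z

def risk(alphas, X):
    """Empirical surrogate of (17): mean phase-invariant relative error."""
    total = 0.0
    for x in X:
        z = forward(x, alphas)
        c = np.vdot(z, x) / max(np.vdot(z, z).real, 1e-12)
        total += np.linalg.norm(c * z - x) / np.linalg.norm(x)
    return total / len(X)

X = [rng.standard_normal(n) + 1j * rng.standard_normal(n) for _ in range(batch)]
alphas = np.full(T - 1, 0.3)             # initial per-layer parameters
best = risk(alphas, X)
for _ in range(50):                      # crude random search instead of backprop
    cand = np.clip(alphas + 0.05 * rng.standard_normal(T - 1), 0.01, 1.0)
    r = risk(cand, X)
    if r < best:
        alphas, best = cand, r
```

A real implementation would replace the random search with stochastic gradient descent through the unfolded layers, as done in the referenced works.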
The prevailing usage of unfolding algorithms in phase retrieval centers on the presence, or absence, of a sparsity assumption on the target scene in a nonconvex setting. More recently, optical applications such as 3D object detection and 3D phase imaging have also exploited unfolded networks.
The computational dynamics and network structure of a UPR architecture (see “Unfolded Phase Retrieval Without Priors”) for this scenario are depicted in Figure S3. From the empirical success rate, it follows that the unfolded network significantly outperforms the model-based algorithm in all scenarios.
Figure S3. (a) A single iteration of the unfolded phase-retrieval (UPR) method executes both linear and nonlinear operations and is thus recast into (b) a network layer. Stacking the layers together, (c) a deep unfolded network is formed. The network is subsequently trained using paired inputs and outputs by backpropagation to optimize the parameters ${\boldsymbol{G}}^{(t)}$ and ${\boldsymbol{A}}_{k}$. The learning rate was set to $10^{-4}$ for 100 epochs using an Adam stochastic optimizer. (d) The empirical success rate, obtained by averaging over 100 Monte Carlo trials, is shown with respect to the measurement-to-signal-length ratio ${(m/n)}$ when the signal length is fixed at ${n} = {100}$. The size of the training data set is 2,048, where each data point is an n-dimensional vector following a standard Gaussian distribution. In the model-based case, the sensing matrix ${\boldsymbol{A}}_{k}$ follows a standard Gaussian distribution with unit variance.
When the measurement matrix is fixed (by the physics of the system) and known a priori, learning the preconditioning matrices ${\boldsymbol{G}}^{(t)}$ significantly improves the performance of the underlying signal-recovery algorithm. This supports the proposition that a judicious design of the preconditioning matrices, when the number of layers (iterations) is fixed, indeed results in an accelerated convergence. The unfolding approach is better, even when only the measurement matrix is learned while employing a fixed-scalar step size. With both a learned ${\boldsymbol{A}}_{k}$ and ${\boldsymbol{G}}^{(t)}$, the unfolded network is the most successful.
Figure 3 demonstrates the relative error [as in (3)] as a function of iterations (layers). It follows that the UPR methodology benefits from accelerated convergence, and an optimal point of the objective function is reached with as few as ${T} = {30}$ layers. This reveals that the proposed architecture may be truncated to even fewer iterations, thereby saving computations in the overall algorithm.
Figure 3. Convergence behavior of the UPR [30] and model-based [40] algorithms for ${\boldsymbol{x}}\,{\in}\,{\Bbb{R}}^{n}$, ${n} = {100}$, and ${m} = {6}{n}$. This experiment follows the same setup as Figure S3 in terms of step size, learning rate, data set, and sensing matrices.
Unfolded Phase Retrieval Without Priors
Without a sparsity prior, the optimization problem for phase retrieval follows directly as \[\mathop{\text{minimize}}\limits_{{\boldsymbol{z}}\,{\in}\,{\Bbb{C}}^{n}}{{\cal{L}}}\left({{\boldsymbol{z}}{;}{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)}}\right)\equiv\frac{1}{2m}{\parallel{f}^{1}\left({{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}}\right){-}{f}^{1}\left({{\boldsymbol{z}}{;}{\boldsymbol{A}}_{k}}\right)\parallel}_{2}^{2} \tag{S1} \] where ${m} = {nL}$, L is the number of snapshots of the scene, and n is the signal length.
The iterations for finding the critical points of this nonconvex problem are derived as follows. Starting from a proper initial point ${\boldsymbol{z}}^{(0)}$, a sequence of points $\left\{{{\boldsymbol{z}}^{(0)},{\boldsymbol{z}}^{(1)},{\boldsymbol{z}}^{(2)},{\ldots},{\boldsymbol{z}}^{{(}{T}{)}}}\right\}$ is generated according to the following update rule: \[{\boldsymbol{z}}^{{(}{t} + {1}{)}} = {\boldsymbol{z}}^{(t)}{-}\frac{1}{m}{\boldsymbol{G}}^{{(}{t}{)}}{\nabla}_{\boldsymbol{z}}{{\cal{L}}}\left({{\boldsymbol{z}}^{(t)}{;}{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)}}\right){.} \tag{S2} \]
This update rule was shown to be highly effective in solving the phase-retrieval problem [40], [S1]. Therefore, it is instructive to further explore its potential in the unfolding context. The gradient of the objective function is \[{\nabla}_{\boldsymbol{z}}{{\cal{L}}}\left({{\boldsymbol{z}}^{(t)}{;}{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)}}\right) = {\boldsymbol{A}}_{k}^{H}\left({{\boldsymbol{A}}_{k}{\boldsymbol{z}}^{(t)}{-}{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)}\odot{Ph}{(}{\boldsymbol{A}}_{k}{\boldsymbol{z}}^{{(}{t}{)}}{)}}\right) \tag{S3} \] where ${\odot}$ denotes the elementwise (Hadamard) product, and the function ${Ph}{(}{\boldsymbol{z}}{)}$, applied elementwise, captures the phase of the vector argument; e.g., for real-valued signals, ${Ph}{(}{\boldsymbol{z}}{)} = {sign}{(}{\boldsymbol{z}}{)}$.
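As a sanity check of (S3), note that the gradient vanishes at the true signal, since ${f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)}\odot{Ph}{(}{\boldsymbol{A}}_{k}{\boldsymbol{x}}{)} = {\boldsymbol{A}}_{k}{\boldsymbol{x}}$. A minimal NumPy version with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 8, 48
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
b = np.abs(A @ x)                       # f^1(x; A_k)

def Ph(v):
    """Elementwise phase v / |v|; reduces to sign(v) for real-valued v."""
    return np.exp(1j * np.angle(v))

def grad_L(z):
    """(S3): A^H (A z - f^1(x; A_k) * Ph(A z)), with * elementwise."""
    Az = A @ z
    return A.conj().T @ (Az - b * Ph(Az))

# the gradient vanishes at the true signal, because b * Ph(Ax) = Ax
g_at_x = grad_L(x)
```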
Reference
[S1] G. Wang, L. Zhang, G. B. Giannakis, M. Akçakaya, and J. Chen, “Sparse phase retrieval via truncated amplitude flow,” IEEE Trans. Signal Process., vol. 66, no. 2, pp. 479–491, 2017, doi: 10.1109/TSP.2017.2771733.
In sparse phase retrieval (see “UPR With Sparsity Prior”), decoding the signal x from the measurement vector ${f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)}$ is formulated as a nonconvex optimization problem (see the “Underlying Principles” section). Both the objective function ${{\cal{L}}}{(}{\boldsymbol{z}}{;}{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{))}$ and the constraint for this sparse problem are nonconvex, making it NP-hard in its general form. Figure S4 shows the relevant UPR architecture, which significantly outperforms the state-of-the-art sparse model-based algorithm under various learned parameters. When the measurement matrix is imposed by the physical system, learning the preconditioning matrices significantly improves the recovery performance. Learning only the matrix ${\boldsymbol{A}}_{k}$ while employing fixed step sizes further indicates the effectiveness of task-specific measurement matrices tailored to the application at hand.
Figure S4. (a) The unfolded network is trained using paired inputs and outputs by backpropagation to optimize the parameters ${\boldsymbol{G}}^{(t)}$ and ${\boldsymbol{A}}_{k}$. Here, the step size of each iteration is a constant (equal to one). The learning rate was set to $10^{-4}$ for 100 epochs, using an Adam stochastic optimizer. (b) The empirical success rate, obtained by averaging over 100 Monte Carlo trials, versus the measurement-to-signal-length ratio ${(m/n)}$ when the signal length is set to ${n} = {100}$ and ${s} = {5}$. The size of the training data set is 2,048, where each data point is an n-dimensional vector following a standard Gaussian distribution, of which ${(}{n}{-}{s}{)}$ entries, chosen uniformly at random, are set to zero. In the model-based case, the sensing matrix ${\boldsymbol{A}}_{k}$ follows a standard Gaussian distribution with unit variance.
The objective of learning the preconditioning matrices is to accelerate convergence. This may be validated through a per-layer analysis of the sparse UPR framework in terms of achieved relative error [as in (3)]. The model-based nature of the proposed approach facilitates analysis of the resulting interpretable deep architectures as opposed to the conventional black-box data-driven approaches. Figure 4(a) demonstrates the relative error versus the number of iterations/layers for the case of recovering a five-sparse signal ${\boldsymbol{x}}\,{\in}\,{\Bbb{R}}^{n}$, ${n} = {m} = {300}$. It follows that the UPR yields both a lower relative error and faster convergence than the standard model-based algorithm [40]. The success rate versus the sparsity plot in Figure 4(b) further highlights the superior performance of UPR.
Figure 4. (a) The convergence behavior of the UPR [40] and model-based [S1] algorithms for ${\boldsymbol{x}}\,{\in}\,{\Bbb{R}}^{n}$ and ${n} = {m} = {300}$. The step size, learning rate, data set, and sensing matrices are as in Figure S4. (b) The empirical success rate versus the sparsity level s for ${\boldsymbol{x}}\,{\in}\,{\Bbb{R}}^{n}$ and ${n} = {m} = {150}$.
For 3D object detection [2] based on phase-only information from the CDP, a rapid estimation of the scene from the CDP is carried out before detecting the target using its optical phase through a cross-correlation analysis (template matching). For the estimation, filtered spectral initialization approximates the phase of the 3D object from the acquired CDP. This requires low-pass filtering of the leading eigenvector of matrix ${\Lambda} = {(}{1} / \mid{\cal{I}}_{0}\mid{)}{\Sigma}_{{i}\,{\in}\,{\cal{I}}_{0}}{(}{\boldsymbol{a}}_{k,i}{\boldsymbol{a}}_{k,i}^{H}{)} / {\parallel{\boldsymbol{a}}_{k,i}\parallel}_{2}^{2}$. Here, unfolding (Figure 5) is employed to learn the low-pass filter [31], where ${\cal{I}}_{0}$ is as in the “UPR With Sparsity Prior” section. The low-pass filter, which facilitates accurate estimation of the phase of the object, is the key aspect of initialization.
Figure 5. (a) An algorithm for 3D phaseless object detection follows a power-iteration methodology and reduces computational complexity to estimate the leading eigenvector of ${\Lambda}$ using a low-pass filter ${\boldsymbol{W}}$. (b) The unfolded network is subsequently trained, following the setup in [31], via backpropagation using paired inputs and outputs to optimize the filter ${\boldsymbol{W}}$. (c) Validation of the detection technique using an experimental CDP acquired in the near zone [${\boldsymbol{A}}_{1}$ in (1)] [2]. CHF: circular harmonic filter.
The filtered spectral initialization follows a power-iteration method (see Figure 5), wherein ${\boldsymbol{z}}^{{(}{t} + {1}{)}} = {{\boldsymbol{W}}}\,{\ast}\,{(}{\Lambda}{\boldsymbol{z}}^{(t)}{)}$, where ${\boldsymbol{z}}^{(0)}$ is chosen at random, ${\boldsymbol{W}}$ is the low-pass filter, and ${\ast}$ is the convolution operation. The product is then normalized at each iteration. Iteratively applying ${\boldsymbol{W}}$ to the image estimate selects the low frequencies that sparsely represent the image in the Fourier domain. As the filtering process is a convolution, this selection is performed rapidly.
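A toy 1D version of this filtered power iteration follows. The moving-average filter W, the truncation rule for ${\cal{I}}_{0}$, and the smooth test object are illustrative stand-ins for the learned filter and experimental data of [31].

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 64, 512
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)

# a smooth (low-frequency) object -- the case a low-pass filter is meant to exploit
x = np.convolve(rng.standard_normal(n) + 1j * rng.standard_normal(n),
                np.ones(9) / 9, mode="same")
b = np.abs(A @ x)

I0 = np.argsort(b)[-m // 4:]             # illustrative truncation: largest observations
An = A[I0] / np.linalg.norm(A[I0], axis=1, keepdims=True)

W = np.ones(5) / 5                       # fixed moving average standing in for the learned W

z = rng.standard_normal(n) + 1j * rng.standard_normal(n)
for _ in range(200):
    z = An.conj().T @ (An @ z) / len(I0)   # Lambda z, with Lambda as in the text
    z = np.convolve(z, W, mode="same")     # W * (Lambda z)
    z /= np.linalg.norm(z)                 # normalization step of the power method
cosine = abs(np.vdot(z, x)) / np.linalg.norm(x)   # alignment with the true object
```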
In the next step, a circular harmonic filter (CHF) (in Fourier domain polar coordinates) is built based on a reference pattern of the scene [2]. This system of coordinates is preferred because it makes the CHF invariant to rotations, so that the filter detects the object regardless of its orientation. The correlation matrix ${\hat{\boldsymbol{C}}}$ between the CHF and the Fourier transform of the approximated optical field ${\hat{\boldsymbol{Z}}}$ is then computed. The target is detected using a thresholding approach by building the decision matrix \[{(}{\boldsymbol{R}}{)}_{u,v} = \begin{cases}\begin{array}{ll}{1,}&{\text{if }}{\left\vert{(}{\hat{\boldsymbol{C}}}{)}_{u,v}\right\vert}\,{≥}\,{\varepsilon}\,{\cdot}\,{\max}{\left\vert{\hat{\boldsymbol{C}}}\right\vert} \\ {0,}&{\text{otherwise}}\end{array}\end{cases} \tag{18} \] where ${\varepsilon}\,{\in}\,{(}{0},{1}{]}$ is the tolerance parameter. Figure 5 shows the unfolded detection performance using an experimental CDP in the near zone for a single snapshot. Here, the experimental data were acquired following the optical setup in [7].
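The decision rule (18) itself is a one-liner; the correlation plane below is a hypothetical example, not experimental data.

```python
import numpy as np

def decision_matrix(C_hat, eps):
    """(18): mark the pixels (u, v) where |C_hat| reaches eps times its peak."""
    peak = np.abs(C_hat).max()
    return (np.abs(C_hat) >= eps * peak).astype(int)

# toy correlation plane with one dominant peak (hypothetical values)
C = np.array([[0.1 + 0.2j, 0.9 + 0.1j],
              [0.05 + 0.0j, 0.2 - 0.1j]])
R = decision_matrix(C, eps=0.8)
```

Only the strongest correlation peak survives the threshold, which is what the template-matching detector looks for.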
Unfolded Phase Retrieval With Sparsity Prior
Starting from an initial guess ${\boldsymbol{z}}^{(0)}$, the algorithm estimates the signal in an iterative manner via update equations as \[{\boldsymbol{z}}^{{(}{t} + {1}{)}} = {\cal{P}}_{s}\left({{\boldsymbol{z}}^{(t)}{-}\frac{\alpha}{m}{\nabla}_{\boldsymbol{z}}{{\cal{L}}}_{\text{tr}}\left({{\boldsymbol{z}}^{(t)}{;}{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)}}\right)}\right){.} \tag{S4} \]
These iterations are viewed as performing projected gradient descent with a per-iteration step size ${\alpha}$ on the truncated loss function objective ${{\cal{L}}}_{\text{tr}}$. The gradient is \[{\nabla}_{\boldsymbol{z}}{{\cal{L}}}_{\text{tr}}{(}{\boldsymbol{z}}^{{(}{t}{)}}{;}{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{))} = \mathop{\sum}\limits_{{i}\,{\in}\,{\cal{I}}_{t}}{\left({{\boldsymbol{a}}_{k,i}^{H}{\boldsymbol{z}}^{(t)}{-}\left[{{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)}}\right]_{i}\frac{{\boldsymbol{a}}_{k,i}^{H}{\boldsymbol{z}}^{(t)}}{\mid{\boldsymbol{a}}_{k,i}^{H}{\boldsymbol{z}}^{(t)}\mid}}\right)}{\boldsymbol{a}}_{k,i} \tag{S5} \] where ${\cal{I}}_{t} = {\{}{1}\leq{i}{\leq}{m}{:}\mid{\boldsymbol{a}}_{k,i}^{H}{\boldsymbol{z}}^{{(}{t}{)}}\mid{\geq} {\left[{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)}\right]}_{i} / {(}{1} + {\tau}{)}{\}}$. The constant ${\tau}$ represents the truncation threshold, and ${\cal{P}}_{s}{(}{{\boldsymbol{u}}}{)}{:}{\Bbb{C}}^{n}\rightarrow{\Bbb{C}}^{n}$ sets all entries of ${\boldsymbol{u}}$ to zero except for the ${s}$ entries with the largest magnitudes. This truncation addresses the discontinuity of the gradient by eliminating the terms that could lead to an unbounded descent direction [S2]. This provides strong motivation to investigate this procedure for unfolding.
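The two ingredients of (S4) and (S5), the s-sparse projection ${\cal{P}}_{s}$ and the truncated index set ${\cal{I}}_{t}$, can be sketched as follows; the toy values are real valued for readability.

```python
import numpy as np

def P_s(u, s):
    """Hard-threshold projection: keep the s largest-magnitude entries, zero the rest."""
    out = np.zeros_like(u)
    keep = np.argsort(np.abs(u))[-s:]
    out[keep] = u[keep]
    return out

def truncated_index_set(A, z, b, tau):
    """I_t of (S5): rows i where |a_i^H z| >= b_i / (1 + tau)."""
    return np.abs(A @ z) >= b / (1.0 + tau)

u = np.array([0.1, -3.0, 0.5, 2.0, -0.2])
u_proj = P_s(u, 2)                      # keeps only the entries -3.0 and 2.0

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z = np.array([1.0, 0.5])
b = np.array([1.8, 0.4, 1.4])           # hypothetical amplitude data
mask = truncated_index_set(A, z, b, tau=0.5)
```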
The vector ${\boldsymbol{z}}^{(0)}$ is initialized as ${\sqrt{{\Sigma}_{{i} = {1}}^{m}\left[{{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)}}\right]_{i}^{2} / {m}}}\,{\hat{\boldsymbol{z}}}^{(0)}$, where ${\hat{\boldsymbol{z}}}^{(0)}\,{\in}\,{\Bbb{C}}^{n}$ is constructed by placing the entries of ${\hat{\boldsymbol{z}}}_{\hat{\cal{S}}}^{(0)}$ at the indices in $\hat{\cal{S}}$ (the estimated support, given by the indices of the s largest values) and zeros at the remaining indices. The principal eigenvector ${\hat{\boldsymbol{z}}}_{\hat{\cal{S}}}^{(0)}\,{\in}\,{\Bbb{C}}^{s}$ is determined by performing power-method iterations on the matrix \[{\Lambda} = \frac{1}{\mid{\cal{I}}_{0}\mid}\mathop{\sum}\limits_{{i}\,{\in}\,{\cal{I}}_{0}}{\frac{{\boldsymbol{a}}_{k,i,\hat{\cal{S}}}{\boldsymbol{a}}_{k,i,\hat{\cal{S}}}^{H}}{{\parallel{\boldsymbol{a}}_{k,i,\hat{\cal{S}}}\parallel}_{2}^{2}}}{.} \tag{S6} \]
The condition ${\Bbb{E}}\left[{{\Sigma}_{{\ell} = {1}}^{L}{\bar{\boldsymbol{D}}}_{\ell}{\boldsymbol{D}}_{\ell}}\right] = {r}{\boldsymbol{I}}$, imposed on the set of diffractive optical elements to guarantee uniqueness, is needed for this initialization to return an accurate estimate of ${\boldsymbol{x}}$ [3]. This result connects the initialization procedure with the recovery conditions of coded diffraction pattern phase retrieval.
[S2] Y. Chen and E. Candes, “Solving random quadratic systems of equations is nearly as easy as solving linear systems,” in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 739–747.
Phase imaging is of high interest in autonomous vehicles, the detection of moving objects, and precision agriculture. Here, the 3D shape of an object is recovered by solving the nonconvex phase-retrieval problem in a setup that records CDPs. Usually, the acquisition of several snapshots of the scene is required. Recently, a single-shot technique was proposed in [17]; it accurately estimates the optical phase ${\varphi}{(}{x},{y}{)}$ across the spatial domain (x, y) of the object following the pipeline of Figure 5. The estimated phase is then used to infer the 3D shape by unwrapping ${\varphi}{(}{x},{y}{)}$ to obtain ${\varphi}_{\text{unwrapp}}{(}{x},{y}{)} = {\varphi}{(}{x},{y}{)} + {2}{k}{\pi}$, where k is an integer that represents the projection period [17]. The depth coordinate is based on the difference between the measured phase ${\varphi}_{\text{unwrapp}}$ and the phase value from a reference plane, which is obtained during calibration of the acquisition system [17]. Thus, the depth coordinate of the 3D object is \begin{align*}{\text{depth}}{(}{x},{y}{)} = & {c}_{0} + {c}_{1}{(}{\varphi}_{\text{unwrapp}}{(}{x},{y}{)} \\ & {-}{\varphi}_{0}{(}{x},{y}{)}{)} \tag{19} \end{align*} where ${\varphi}_{0}{(}{x},{y}{)}$ is the reference phase, and ${\left\{{{c}_{i}}\right\}}_{{i} = {0}}^{1}$ are tunable constants.
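For a 1D scan line, the unwrapping and depth mapping of (19) reduce to a few NumPy calls. The wrapped phase, the reference plane, and the constants ${c}_{0}$ and ${c}_{1}$ below are synthetic examples, and np.unwrap plays the role of the unwrapping step of [17].

```python
import numpy as np

# hypothetical wrapped phase along one scan line of the object
x_axis = np.linspace(0, 1, 200)
phi_true = 6 * np.pi * x_axis**2              # smooth phase exceeding 2*pi
phi_wrapped = np.angle(np.exp(1j * phi_true)) # values folded into (-pi, pi]

phi_unwrapped = np.unwrap(phi_wrapped)        # restores the 2*k*pi offsets

phi0 = np.zeros_like(phi_unwrapped)           # reference-plane phase from calibration
c0, c1 = 0.0, 0.5                             # tunable constants (example values)
depth = c0 + c1 * (phi_unwrapped - phi0)      # depth map of (19)
```

Unwrapping succeeds here because consecutive phase increments stay below ${\pi}$; steeper objects require denser sampling or 2D unwrapping.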
The advantage of the DOE lies in its imaging capability to successfully recover the phase without additional optical elements (such as lenses), leading to even more compact imaging devices [6]. The eschewing of the lens, which is the case in phase retrieval from CDPs, makes the system not only light and cost-effective but also lens-aberration free and with a larger field of view. Here we summarize a few recent developments in experimental setups that employ DOEs and are of interest to enable the use of unfolding-aided phase-retrieval techniques.
In the case of lensless superresolution microscopy for middle-zone CDP observations (Figure 1 and Table 1), the propagation operator ${\cal{P}}_{z}{(}{\boldsymbol{w}}{)}$ at a distance z is precisely modeled through convolution of a wavefront w with the propagation kernel ${\boldsymbol{K}}_{z}$ as ${\cal{P}}_{z}{(}{\boldsymbol{w}}{)} = {\boldsymbol{F}}^{{-}{1}}\left\{{{\boldsymbol{K}}_{z}\,{\cdot}\,{\boldsymbol{F}}\left\{{\boldsymbol{w}}\right\}}\right\}$, where \[{\boldsymbol{K}}_{z} = \begin{cases}\begin{array}{ll}{{e}^{i\frac{{2}{\pi}}{\lambda}z\sqrt{{1}{-}{\lambda}^{2}\left({{f}_{x}^{2} + {f}_{y}^{2}}\right)}},}&{{f}_{x}^{2} + {f}_{y}^{2}\leq\frac{1}{{\lambda}^{2}}}\\{0,}&{\text{otherwise.}}\end{array}\end{cases} \tag{20} \] Here, ${f}_{x}$ and ${f}_{y}$ denote spatial frequencies, ${\lambda}$ is the wavelength, and ${\boldsymbol{F}}$ is the 2D Fourier transform.
Considering the spatial distribution of the DOE, the observation model [6] for the scene x and the DOE D is ${\boldsymbol{y}} = \left|{{\cal{P}}_{{z}_{2}}\left({{\boldsymbol{D}}\,{\cdot}\,{\cal{P}}_{{z}_{1}}\left({\boldsymbol{x}}\right)}\right)}\right|^{2}$, where z1 and z2 are the object-mask and mask-sensor distances, respectively (see Table 1). In phase retrieval with a DOE, the object, mask, and sensor planes may have different sampling intervals, ${\Delta}_{o}$, ${\Delta}_{m}$, and ${\Delta}_{s}$, respectively. If the resulting pattern y is computed with a pixel size ${\Delta}_{c}$ smaller than the physical sensor pixel $\left({{\Delta}_{c}\,{<}\,{\Delta}_{s}}\right)$ and the reconstruction of the object is produced with ${\Delta}_{c}$, a subpixel-resolution, or superresolution, scenario results. The ratio ${r}_{s} = {\Delta}_{s} / {\Delta}_{c}\,{≥}\,{1}$ is the superresolution factor.
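A compact sketch of the angular-spectrum kernel (20) and the observation model ${\boldsymbol{y}} = \vert{\cal{P}}_{{z}_{2}}({\boldsymbol{D}}\,{\cdot}\,{\cal{P}}_{{z}_{1}}({\boldsymbol{x}}))\vert^{2}$ follows. The grid size, wavelength, pitch, and distances are illustrative values (the binary DOE phase values 0 and 2.62 rad follow the design of [6]).

```python
import numpy as np

n, wl, dx = 128, 532e-9, 1.73e-6          # grid, wavelength, pixel pitch (examples)
fx = np.fft.fftfreq(n, d=dx)              # spatial frequencies f_x (and f_y)
FX, FY = np.meshgrid(fx, fx, indexing="ij")

def kernel(z):
    """Angular-spectrum kernel K_z of (20); evanescent frequencies are zeroed."""
    arg = 1.0 - wl**2 * (FX**2 + FY**2)
    K = np.zeros((n, n), dtype=complex)
    prop = arg >= 0                        # propagating modes: f_x^2 + f_y^2 <= 1/wl^2
    K[prop] = np.exp(1j * 2 * np.pi / wl * z * np.sqrt(arg[prop]))
    return K

def P(z, w):
    """P_z(w) = F^{-1}{ K_z . F{w} }."""
    return np.fft.ifft2(kernel(z) * np.fft.fft2(w))

rng = np.random.default_rng(5)
x = np.exp(1j * rng.uniform(0, 2 * np.pi, (n, n)))     # phase object
D = np.exp(1j * rng.choice([0.0, 2.62], size=(n, n)))  # binary phase DOE, as in [6]
y = np.abs(P(1e-3, D * P(2e-3, x)))**2                 # CDP observation model
```

Because $\vert{\boldsymbol{K}}_{z}\vert = 1$ on propagating modes, the operator conserves energy, which is a convenient consistency check for any implementation.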
With the objective of manufacturing the DOE, the matrix D is modeled as ${D}_{s,u} = \exp{(}{j}{\phi}_{s,u}{)}$ for ${u},{s} = {1},{\ldots},{n}$, where ${\phi}_{s,u}$ takes its value from a uniform random variable d with two equiprobable coding elements, zero and 2.62 rad; i.e., ${d}\,{\in}\,\left\{{\exp{(}{j}{0}{)},\exp{(}{j}{2}{.}{62}{)}}\right\}$. These values of d are chosen because they lead to the best performance in terms of reconstruction quality [6] (recall that ${\Bbb{E}}{[}{d}{]}\,{\ne}\,{0}$). Figure 1 shows the resulting DOE (manufactured in Tampere’s Photonics Laboratory) with a pixel size of ${\Delta}_{m} = {1}{.}{73}\,{\mu}\text{m}$. This size is half that of the sensor pixel and is hence a better fit for superresolution factors ${r}_{s}$ in powers of two. The compact microscope in Figure 1 enables full wavefront reconstruction of dynamic scenes. Contrary to existing phase-retrieval setups, this unique system is capable of computational, superresolved video reconstruction via phase retrieval, with a high frame rate limited only by the camera.
To reconstruct the signal of interest, as reported at the bottom of Figure 1(b), [6] studied the impact of a special multiplicative calibration term, which allows for building an accurate propagation model that compensates for the errors of theoretical models. This corrective term was shown to improve the imaging quality and resolution. However, it is constant across iterations. Therefore, it may benefit from a preconditioned unfolded method, wherein the correction varies on a per-iteration basis.
Hyperspectral complex-domain imaging is a comparatively new development that deals with the phase delay of coherent light in transparent or reflective objects [49]. The hyperspectral broadband-phase imaging is more informative than the monochromatic technique (see the near-zone setup and results in Figure 6). Conventionally, for the processing of hyperspectral images, 2D spectral narrow-band images are stacked together and represented as 3D cubes with two spatial coordinates, (x, y), and a third longitudinal spectral coordinate. In hyperspectral phase imaging, for a given wavelength ${\lambda}$, the data ${\boldsymbol{x}}_{\lambda}\,{\in}\,{\Bbb{C}}^{n}$ in these 3D cubes are complex valued with spatially and spectrally varying amplitudes and phases. Denote the set of wavelengths under study by ${\cal{L}}$. From (1), the measurement matrix for diffraction zone k and wavelength ${\lambda}$ is ${\boldsymbol{A}}_{{k},{\lambda}}$. Then, the noisy phaseless measurements are [49] \[{\boldsymbol{y}}_{k} = \mathop{\sum}\limits_{{\lambda}\,{\in}\,{{\cal{L}}}}{\mid{\boldsymbol{A}}_{{k},{\lambda}}{\boldsymbol{x}}_{\lambda}\mid^{2}} + {\epsilon}{.} \tag{21} \]
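Model (21) is straightforward to simulate; the wavelength set, sizes, and noise level here are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 16, 96
wavelengths = [450e-9, 550e-9, 640e-9]   # example set of wavelengths L

# one measurement matrix A_{k,lambda} and one complex cube slice x_lambda per wavelength
A = {wl: (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
     for wl in wavelengths}
x = {wl: rng.standard_normal(n) + 1j * rng.standard_normal(n) for wl in wavelengths}

noise = 0.01 * rng.standard_normal(m)    # additive noise term epsilon
# (21): per-wavelength intensities are summed on the sensor, mixing the slices
y = sum(np.abs(A[wl] @ x[wl])**2 for wl in wavelengths) + noise
```

The summation is precisely what makes the inverse problem harder than single-wavelength phase retrieval: the per-${\lambda}$ intensities cannot be separated from a single snapshot.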
Figure 6. (a) A hyperspectral optical setup. “Laser” is a broadband coherent light source, “DOE” is modeled as a spatial light modulator, and “Sensor” is a registration camera. (b) The reconstructed phase and amplitude from CDPs in the near zone [${\boldsymbol{A}}_{1}$ in (1)] for the ${\lambda} = {640}$-nm wavelength in the presence of Gaussian (top right) and Poisson (bottom right) noise with a signal-to-noise ratio (SNR) of 34 dB in each case. The images are shown in square frames to emphasize that the size and location of the object’s support are assumed unknown and reconstructed automatically. The support of the true image is used only for computation of the observations produced for the zero-padded object. For the amplitudes, these frames have nearly zero values, while the phase estimates take random values in the frame area, as the phase is undefined where the amplitude vanishes.
The reconstruction of ${\boldsymbol{x}}_{\lambda}$ here is more ill conditioned than the classical phase-retrieval problem because the acquired data are phaseless measurements summed across wavelengths. This sum does not appear in classical phase retrieval, which employs a single wavelength. It also makes phase image processing more complex than hyperspectral intensity imaging, where the corresponding 3D cubes are real valued.
Model-based algorithms require a significant number of measurements to efficiently recover the complex-valued ${\boldsymbol{x}}_{\lambda}$ [49]. As suggested by the aforementioned numerical results, preconditioned unfolding is useful in the hyperspectral, complex-domain phase-retrieval problem because it facilitates reduction of the condition number, admitting fewer data samples and faster convergence.
In certain CDP applications, the combination of blind deconvolution, superresolution, and phase retrieval naturally manifests. Although this is a severely ill-posed problem, it has been shown [52] that the image of interest can be estimated in polynomial time. The approach relies on previous results, which established the DOE design needed to achieve high-quality images [5] and partially analyzed the combined problem by solving the superresolution phase-retrieval problem [7] from a CDP (Figure 7). These designs are obtained by exploiting the model of the physical setup using ML methods, where the DOE is modeled as a layer of a neural network (a data-driven, or unrolled, model) that is trained to act as an estimator of the true image [33]. This data-driven design has shown outstanding imaging quality using a single snapshot, as well as robustness against noise.
Figure 7. (a) A blind deconvolution superresolution phase-retrieval optical setup comprising a ${\lambda} / {4}$ retardation plate, polarizer $(P)$, lenses $({L}_{1}$ and ${L}_{2})$, beamsplitter (BS), a DOE physically implemented using a spatial light modulator (SLM), sensor, and registration camera. (b) The recovered amplitude and phase of the object “TUT” (the initials of Tampere University of Technology) from CDPs at the near zone [${\boldsymbol{A}}_{1}$ in (1)] with a pixelwise resolution factor of four [7]. The SLM was configured to work as a phase-only modulator because the device is not very accurate in modulating both magnitudes and phases [48].
The plethora of phase-retrieval algorithms and their relative advantages make it difficult to conclude which of them will always be best. The only foregone conclusion is that the computational imaging community cannot continue basing these algorithms on models that ignore several practical measurement errors. In this context, deep unfolding methods are gaining salience in optical phase-retrieval problems because they offer interpretability, state-of-the-art performance, and high computational efficiency. By learning priors from the training data, these empirically designed architectures have been exploited to build globally convergent learning networks for a specific phase-retrieval setting. A particularly desired outcome of this development is improvement in the design of DOEs that satisfy theoretical conditions for the uniqueness of signal recovery.
We provided examples of several recent novel setups, which indicate that unfolding has the potential to enable a new generation of optical devices as a consequence of co-designing optics and recovery algorithms. Several applications of lensless imaging, synthetic apertures, and highly ill-posed variants of phase retrieval appear within reach in the near future using this technique, including in areas such as optical metrology, digital holography, and medicine.
Samuel Pinilla (samuel.pinilla@stfc.ac.uk) received his Ph.D. degree in electrical engineering. He is with the University of Manchester at Harwell Science and Innovation campus, Oxfordshire, Oxon OX11 0FA, U.K. He is a Member of IEEE.
Kumar Vijay Mishra (kvm@ieee.org) received his Ph.D. degree in electrical and computer engineering. He is with the United States DEVCOM Army Research Laboratory, Adelphi, MD 20783 USA. He is a Senior Member of IEEE.
Igor Shevkunov (igor.shevkunov@tuni.fi) received his Ph.D. degree in optics. He is with the Faculty of Information Technology and Communication Sciences, Tampere University, 33720, Finland.
Mojtaba Soltanalian (msol@uic.edu) received his Ph.D. degree in electrical engineering. He is with the University of Illinois Chicago, Chicago, IL 60607 USA. He is a Senior Member of IEEE.
Vladimir Katkovnik (vladimir.katkovnik@tuni.fi) received his Ph.D. degree in technical cybernetics. He is with the Faculty of Information Technology and Communication Sciences, Tampere University, 33720, Finland and Noiseless Imaging Ltd., Tampere, 33720, Finland.
Karen Egiazarian (karen.eguiazarian@tuni.fi) received his Ph.D. degree in physics and mathematics. He is with the Faculty of Information Technology and Communication Sciences, Tampere University, 33720, Finland, and Noiseless Imaging Ltd., Tampere, 33720, Finland. He is a Fellow of IEEE.
[1] S. Pinilla, J. Bacca, and H. Arguello, “Phase retrieval algorithm via nonconvex minimization using a smoothing function,” IEEE Trans. Signal Process., vol. 66, no. 17, pp. 4574–4584, Sep. 2018, doi: 10.1109/TSP.2018.2855667.
[2] A. Jerez, S. Pinilla, and H. Arguello, “Fast target detection via template matching in compressive phase retrieval,” IEEE Trans. Comput. Imag., vol. 6, pp. 934–944, May 2020, doi: 10.1109/TCI.2020.2995999.
[3] A. Guerrero, S. Pinilla, and H. Arguello, “Phase recovery guarantees from designed coded diffraction patterns in optical imaging,” IEEE Trans. Image Process., vol. 29, pp. 5687–5697, Apr. 2020, doi: 10.1109/TIP.2020.2985208.
[4] S. Pinilla, H. García, L. Díaz, J. Poveda, and H. Arguello, “Coded aperture design for solving the phase retrieval problem in X-ray crystallography,” J. Comput. Appl. Math., vol. 338, pp. 111–128, Aug. 2018, doi: 10.1016/j.cam.2018.02.002.
[5] J. Bacca, S. Pinilla, and H. Arguello, “Coded aperture design for super-resolution phase retrieval,” in Proc. Eur. Signal Process. Conf., 2019, pp. 1–5, doi: 10.23919/EUSIPCO.2019.8903020.
[6] P. Kocsis, I. Shevkunov, V. Katkovnik, H. Rekola, and K. Egiazarian, “Single-shot pixel super-resolution phase imaging by wavefront separation approach,” Opt. Exp., vol. 29, no. 26, pp. 43,662–43,678, 2021, doi: 10.1364/OE.445218.
[7] V. Katkovnik, I. Shevkunov, N. V. Petrov, and K. Egiazarian, “Computational super-resolution phase retrieval from multiple phase-coded diffraction patterns: Simulation study and experiments,” Optica, vol. 4, no. 7, pp. 786–794, 2017, doi: 10.1364/OPTICA.4.000786.
[8] E. J. Candés, X. Li, and M. Soltanolkotabi, “Phase retrieval from coded diffraction patterns,” Appl. Comput. Harmon. Anal., vol. 39, no. 2, pp. 277–299, 2015, doi: 10.1016/j.acha.2014.09.004.
[9] D. Gross, F. Krahmer, and R. Kueng, “Improved recovery guarantees for phase retrieval from coded diffraction patterns,” Appl. Comput. Harmon. Anal., vol. 42, no. 1, pp. 37–64, 2017, doi: 10.1016/j.acha.2015.05.004.
[10] M. Hayes, “The reconstruction of a multidimensional sequence from the phase or magnitude of its Fourier transform,” IEEE Trans. Acoust., Speech, Signal Process., vol. 30, no. 2, pp. 140–154, Apr. 1982, doi: 10.1109/TASSP.1982.1163863.
[11] T.-C. Poon and J.-P. Liu, Introduction to Modern Digital Holography: With MATLAB. Cambridge, U.K.: Cambridge Univ. Press, 2014.
[12] U. Dürig, D. W. Pohl, and F. Rohner, “Near-field optical-scanning microscopy,” J. Appl. Phys., vol. 59, no. 10, pp. 3318–3327, 1986, doi: 10.1063/1.336848.
[13] C. Jahncke, M. Paesler, and H. Hallen, “Raman imaging with near-field scanning optical microscopy,” Appl. Phys. Lett., vol. 67, no. 17, pp. 2483–2485, 1995, doi: 10.1063/1.114615.
[14] H. Hess, E. Betzig, T. Harris, L. Pfeiffer, and K. West, “Near-field spectroscopy of the quantum constituents of a luminescent system,” Science, vol. 264, no. 5166, pp. 1740–1745, 1994, doi: 10.1126/science.264.5166.1740.
[15] T. Shimano, Y. Nakamura, K. Tajima, M. Sao, and T. Hoshizawa, “Lensless light-field imaging with Fresnel zone aperture: Quasi-coherent coding,” Appl. Opt., vol. 57, no. 11, pp. 2841–2850, 2018, doi: 10.1364/AO.57.002841.
[16] M. Sao, Y. Nakamura, K. Tajima, and T. Shimano, “Lensless close-up imaging with Fresnel zone aperture,” Japanese J. Appl. Phys., vol. 57, no. 9S1, p. 09SB05, 2018, doi: 10.7567/JJAP.57.09SB05.
[17] S. Pinilla, L. Galvis, K. Egiazarian, and H. Arguello, “Single-shot coded diffraction system for 3D object shape estimation,” Electron. Imag., vol. 2020, no. 14, pp. 59–1, 2020, doi: 10.2352/ISSN.2470-1173.2020.14.COIMG-059.
[18] H. Arguello, S. Pinilla, Y. Peng, H. Ikoma, J. Bacca, and G. Wetzstein, “Shift-variant color-coded diffractive spectral imaging system,” Optica, vol. 8, no. 11, pp. 1424–1434, 2021, doi: 10.1364/OPTICA.439142.
[19] K. Monakhova, J. Yurtsever, G. Kuo, N. Antipa, K. Yanny, and L. Waller, “Learned reconstructions for practical mask-based lensless imaging,” Opt. Exp., vol. 27, no. 20, pp. 28,075–28,090, 2019, doi: 10.1364/OE.27.028075.
[20] V. Karitans, E. Nitiss, A. Tokmakovs, M. Ozolinsh, and S. Logina, “Optical phase retrieval using four rotated versions of a single binary amplitude modulating mask,” J. Astron. Telescopes, Instrum., Syst., vol. 5, no. 3, p. 039004, 2019, doi: 10.1117/1.JATIS.5.3.039004.
[21] Z. Tong, X. Ren, Q. Ye, D. Xiao, J. Huang, and G. Meng, “Quantitative reconstruction of the complex-valued object based on complementary phase modulations,” Ultramicroscopy, vol. 228, Sep. 2021, Art. no. 113343, doi: 10.1016/j.ultramic.2021.113343.
[22] D. Morales, A. Jerez, and H. Arguello, “Object classification using deep neural networks from coded diffraction patterns,” in Proc. Symp. Image, Signal Process. Artif. Vis., 2021, pp. 1–5, doi: 10.1109/STSIVA53688.2021.9591996.
[23] C.-J. Wang, C.-K. Wen, S.-H. L. Tsai, and S. Jin, “Phase retrieval with learning unfolded expectation consistent signal recovery algorithm,” IEEE Signal Process. Lett., vol. 27, pp. 780–784, Apr. 2020, doi: 10.1109/LSP.2020.2990767.
[24] C.-J. Wang, C.-K. Wen, S.-H. Tsai, S. Jin, and G. Y. Li, “Phase retrieval using expectation consistent signal recovery algorithm based on hypernetwork,” IEEE Trans. Signal Process., vol. 69, pp. 5770–5783, Oct. 2021, doi: 10.1109/TSP.2021.3118494.
[25] N. Naimipour, S. Khobahi, and M. Soltanalian, “UPR: A model-driven architecture for deep phase retrieval,” in Proc. Asilomar Conf. Signals, Syst., Comput., 2020, pp. 205–209, doi: 10.1109/IEEECONF51394.2020.9443438.
[26] K. Gregor and Y. LeCun, “Learning fast approximations of sparse coding,” in Proc. 27th Int. Conf. Mach. Learn., 2010, pp. 399–406, doi: 10.5555/3104322.3104374.
[27] G. Ongie, A. Jalal, C. A. Metzler, R. G. Baraniuk, A. G. Dimakis, and R. Willett, “Deep learning techniques for inverse problems in imaging,” IEEE J. Sel. Areas Inf. Theory, vol. 1, no. 1, pp. 39–56, May 2020, doi: 10.1109/JSAIT.2020.2991563.
[28] R. Liu, S. Cheng, L. Ma, X. Fan, and Z. Luo, “Deep proximal unrolling: Algorithmic framework, convergence analysis and applications,” IEEE Trans. Image Process., vol. 28, no. 10, pp. 5013–5026, Oct. 2019, doi: 10.1109/TIP.2019.2913536.
[29] W.-L. Hwang and A. Heinecke, “Un-rectifying non-linear networks for signal representation,” IEEE Trans. Signal Process., vol. 68, pp. 196–210, 2020, doi: 10.1109/TSP.2019.2957607.
[30] N. Naimipour, S. Khobahi, and M. Soltanalian, “Unfolded algorithms for deep phase retrieval,” 2020, arXiv:2012.11102.
[31] D. Morales, A. Jerez, and H. Arguello, “Learning spectral initialization for phase retrieval via deep neural networks,” Appl. Opt., vol. 61, no. 9, pp. F25–F33, 2022, doi: 10.1364/AO.445085.
[32] J. Estupiñán, A. Jerez, J. Bacca, and H. Arguello, “Deep unrolled phase retrieval approach from coded diffraction patterns,” in Proc. Symp. Image, Signal Process. Artif. Vis., 2021, pp. 1–4, doi: 10.1109/STSIVA53688.2021.9591671.
[33] S. R. M. Rostami, S. Pinilla, I. Shevkunov, V. Katkovnik, and K. Egiazarian, “Power-balanced hybrid optics boosted design for achromatic extended depth-of-field imaging via optimized mixed OTF,” Appl. Opt., vol. 60, no. 30, pp. 9365–9378, 2021, doi: 10.1364/AO.434852.
[34] S. Pinilla, J. Poveda, and H. Arguello, “Coded diffraction system in X-ray crystallography using a Boolean phase coded aperture approximation,” Opt. Commun., vol. 410, pp. 707–716, Mar. 2018, doi: 10.1016/j.optcom.2017.11.035.
[35] C. V. Correa, H. Arguello, and G. R. Arce, “Spatiotemporal blue noise coded aperture design for multi-shot compressive spectral imaging,” J. Opt. Soc. Amer. A, vol. 33, no. 12, pp. 2312–2322, 2016, doi: 10.1364/JOSAA.33.002312.
[36] C. Fienup and J. Dainty, “Phase retrieval and image reconstruction for astronomy,” in Image Recovery: Theory Application, H. Stark, Ed. New York, NY, USA: Elsevier, 1987, pp. 231–275.
[37] S. Pinilla, S. R. M. Rostami, I. Shevkunov, V. Katkovnik, and K. Egiazarian, “Hybrid diffractive optics design via hardware-in-the-loop methodology for achromatic extended-depth-of-field imaging,” 2022, arXiv:2203.16407.
[38] X. Dun, H. Ikoma, G. Wetzstein, Z. Wang, X. Cheng, and Y. Peng, “Learned rotationally symmetric diffractive achromat for full-spectrum computational imaging,” Optica, vol. 7, no. 8, pp. 913–922, 2020, doi: 10.1364/OPTICA.394413.
[39] S. R. M. Rostami, S. Pinilla, I. Shevkunov, V. Katkovnik, and K. Egiazarian, “On design of hybrid diffractive optics for achromatic extended depth-of-field (EDoF) RGB imaging,” 2022, arXiv:2203.16985.
[40] H. Zhang and Y. Liang, “Reshaped Wirtinger flow for solving quadratic system of equations,” in Proc. Adv. Neural Inf. Process. Syst., 2016, pp. 2622–2630.
[41] J. Nocedal and S. J. Wright, Numerical Optimization. New York, NY, USA: Springer-Verlag, 1999.
[42] J. Jia and K. Rohe, “Preconditioning the lasso for sign consistency,” Electron. J. Statist., vol. 9, no. 1, pp. 1150–1172, 2015, doi: 10.1214/15-EJS1029.
[43] F. L. Wauthier, N. Jojic, and M. I. Jordan, “A comparative framework for preconditioned lasso algorithms,” in Proc. Adv. Neural Inf. Process. Syst., 2013, vol. 26.
[44] J. A. Kelner, F. Koehler, R. Meka, and D. Rohatgi, “On the power of preconditioning in sparse linear regression,” in Proc. IEEE 62nd Annu. Symp. Found. Comput. Sci. (FOCS), 2021, pp. 550–561, doi: 10.1109/FOCS52979.2021.00061.
[45] E. Mojica, C. V. Correa, and H. Arguello, “High-resolution coded aperture optimization for super-resolved compressive x-ray cone-beam computed tomography,” Appl. Opt., vol. 60, no. 4, pp. 959–970, 2021, doi: 10.1364/AO.413695.
[46] Y. Mejia and H. Arguello, “Binary codification design for compressive imaging by uniform sensing,” IEEE Trans. Image Process., vol. 27, no. 12, pp. 5775–5786, Dec. 2018, doi: 10.1109/TIP.2018.2857445.
[47] X. Chen, J. Liu, Z. Wang, and W. Yin, “Theoretical linear convergence of unfolded ISTA and its practical weights and thresholds,” in Proc. Adv. Neural Inf. Process. Syst., 2018, vol. 31.
[48] “PLUTO-2.1 LCOS spatial light modulator phase only (Reflective).” HOLOEYE Pioneers in Photonic Technology. Accessed: Jun. 4, 2022. [Online]. Available: https://holoeye.com/slm-pluto-phase-only/
[49] V. Katkovnik, I. Shevkunov, and K. Egiazarian, “ADMM and spectral proximity operators in hyperspectral broadband phase retrieval for quantitative phase imaging,” 2021, arXiv:2105.07891.
[50] S. Pinilla, K. V. Mishra, and B. M. Sadler, “Non-convex recovery from phaseless low-resolution blind deconvolution measurements using noisy masked patterns,” in Proc. 55th Asilomar Conf. Signals, Syst., Comput., 2021, pp. 876–880, doi: 10.1109/IEEECONF53345.2021.9723219.
Digital Object Identifier 10.1109/MSP.2022.3214325