Samuel Pinilla, Kumar Vijay Mishra, Igor Shevkunov, Mojtaba Soltanalian, Vladimir Katkovnik, Karen Egiazarian
Phase retrieval in optical imaging refers to the recovery of a complex signal from phaseless data acquired in the form of its diffraction patterns. These patterns are acquired through a system with a coherent light source that employs a diffractive optical element (DOE) to modulate the scene, resulting in coded diffraction patterns (CDPs) at the sensor. Recently, the hybrid approach of a model-driven network, or deep unfolding, has emerged as an effective alternative to conventional model- and learning-based phase-retrieval techniques because it allows for bounding the complexity of algorithms while retaining their efficacy. Additionally, such hybrid approaches have shown promise in improving the design of DOEs that satisfy theoretical uniqueness conditions. There are opportunities to exploit novel experimental setups and resolve even more complex DOE phase-retrieval applications. This article presents an overview of algorithms and applications of deep unfolding for bootstrapped phase retrieval, i.e., phase retrieval that applies regardless of whether the data are acquired in the near, middle, or far diffraction zone.
Phase retrieval in diffractive optical imaging (DOI) deals with the estimation of a complex-domain signal from magnitude-only data in the form of its diffraction patterns [1]. This problem has been at the center of exciting developments in DOI [2], [3], [4] during the last decade, improving achievable spatial resolution [5], [6], [7], enabling lighter optical setups [6], and yielding better detection [2]. When the diffraction patterns (phaseless data) are acquired through a setup that employs coherent light and a DOE (also known as a coded aperture) to modulate the scene, CDPs are obtained. Typically, this setup allows for acquiring several snapshots of the scene by changing the spatial configuration of the DOE. The inclusion of the DOE in the phase-retrieval setting paves the way for uniqueness guarantees for all 2D signals up to a unimodular constant [3], [8], [9], overcoming the classical result in [10] that holds only for nonreducible signals.
The CDP is experimentally acquired in three diffraction zones: near, middle, and far [3], [11]. Mathematically, assuming a diagonal matrix ${\boldsymbol{D}}_{\ell}\,{\in}\,{\Bbb{C}}^{{n}\,{\times}\,{n}}$ modeling the DOE for the ${\ell}$-th snapshot, ${\ell} = {1},{\ldots},{L}$, the CDP consists of quadratic equations of the form ${y}_{{k},{i},{\ell}} = {\left\vert{\boldsymbol{a}}_{k,i}^{H}{\boldsymbol{D}}_{\ell}{\boldsymbol{x}}\right\vert}$, where ${\boldsymbol{a}}_{k,i}\,{\in}\,{\Bbb{C}}^{n}$ are the known wavefront propagation vectors associated with the kth diffraction zone for ${i} = {1},{\ldots},{n}$; ${\boldsymbol{x}}\,{\in}\,{\Bbb{C}}^{n}$ is the unknown scene of interest; and k = 1, 2, 3 indexes the near, middle, and far zones, respectively. By harnessing specific properties of each diffraction zone, several advances in imaging applications have been made. The near zone is exploited in scanning near-field optical microscopy [12], Raman imaging [13], and spectroscopy [14]. Applications such as Fresnel holography [11] and lensless imaging [15] take advantage of the middle diffraction zone to develop new acquisition imaging devices [15] and optical elements such as Fresnel lenses [16]. The far or Fraunhofer diffraction zone has been instrumental in the development of applications such as crystallography, astronomical imaging, and microscopy [17].
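To make the measurement model concrete, the following NumPy sketch simulates far-zone CDPs, where propagation reduces to a discrete Fourier transform; the sizes, phase-only DOEs, and random scene are illustrative assumptions, and the near and middle zones would additionally involve the propagation matrices introduced later.

```python
import numpy as np

rng = np.random.default_rng(0)
n, L = 64, 4                      # vectorized scene size and number of snapshots

# Unknown complex-valued scene x.
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# One phase-only DOE per snapshot: the diagonal of D_ell, with |d| <= 1.
does = [np.exp(1j * 2 * np.pi * rng.random(n)) for _ in range(L)]

# Far-zone propagation is a Fourier transform of the modulated scene, so each
# snapshot of phaseless data is y_ell = |F D_ell x|.
y = np.stack([np.abs(np.fft.fft(d * x)) for d in does])   # shape (L, n)
```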
Motivated by the aforementioned applications and guided by the analytical models of the physical setups for a given diffraction zone, [3] showed that the DOE and the phase-retrieval algorithm need to be jointly designed to guarantee unique recovery; these theoretical results were not possible a decade ago. The entries of each matrix ${\boldsymbol{D}}_{\ell}$ (a DOE for the ${\ell}$-th snapshot) were assumed to be independent and identically distributed following a discrete random variable d obeying ${\left\vert{d}\right\vert}\,{≤}\,{1}$. Interestingly, [3] also shows that the design criteria for DOEs are independent of the diffraction zone. This result enables bootstrapped phase retrieval in optical imaging, i.e., phase retrieval regardless of the diffraction zone. Table 1 summarizes common DOE phase-retrieval applications.
Table 1. Common DOE-based optical imaging applications.
The role of DOEs in recovering the phase across all diffraction zones has been studied in [2], [7], [18], [19], [20], and [21], where either the DOE was modeled by specific measurement setups or machine learning (ML) was deployed to obtain data-driven, model-free recovery. Although very flexible and powerful, model-based methods yield significant reconstruction artefacts arising from imperfect system models, mismatch, calibration errors, hand-tuned parameters, and hand-picked regularizers. Furthermore, these methods take hundreds to thousands of iterations to converge, which is unacceptable for real-time imaging. These shortcomings are overcome by employing deep neural networks, whose expressive power allows training such that the resulting network acts as an estimator of the true signal. For example, [22] proposed a deep neural network for object classification from CDPs, achieving high stability in classification performance even when not all CDPs are employed. However, neural networks are treated as black boxes, leading to limited interpretability and, therefore, inadequate control. Further, the training times could be prohibitively long.
In this context, a hybrid model-based and data-driven technique has the potential to further improve data acquisition by effectively mitigating the shortcomings of both techniques [23], [24]. This so-called deep unfolding technique facilitates the design of model-aware deep architectures based on well-established iterative signal processing techniques [25], [26], [27]. Unfolding exploits the data to enhance accuracy and performance, so that the resulting interpretability leads to trusted outcomes while requiring less training data and converging faster. Related work in [26] viewed the model-based iterative shrinkage thresholding algorithm as a recurrent neural network whose activation function is the shrinkage operator. In the context of inverse problems in imaging, [27] provides further insights into the theory and implementations of unrolling. Related terms include unrolling [28] and unrectifying [29]. Whereas the former is used for the algorithms employed by unfolding networks (or simply synonymously with unfolding), the latter refers to the process of transforming nonlinear activations in neural networks into data-dependent linear equations and constraints. Very recently, unfolded networks were proposed for both conventional and sparse phase-retrieval problems [30]. In this article, we present a bootstrapped overview of deep unfolding methods for imaging tasks in a CDP phase-retrieval setting to improve the design of DOEs that can satisfy theoretical uniqueness conditions. We also summarize the use of designed DOEs together with their unfolded phase-retrieval (UPR) algorithms in terms of robustness against noise [31], with/without priors on the scene [30], and smoothness [32]. Finally, we review progress in application areas that enable experimental setups to implement these techniques to address even more ill-posed problems.
The design principles [3] of DOEs require that the set of matrices ${\boldsymbol{D}}_{\ell}$ satisfies ${\Bbb{E}}{[}{\Sigma}_{{\ell} = {1}}^{L}{\bar{\boldsymbol{D}}}_{\ell}{\boldsymbol{D}}_{\ell}{]} = {r}{\boldsymbol{I}}$ to guarantee uniqueness in recovery for all 2D scenes with high probability (regardless of the zone), with ${0}\,{<}\,{r}\,{≤}\,{L}$, where ${L}\,{≥}\,{c}_{0}{n}$ for some large constant ${c}_{0}\,{>}\,{0}$, I is the identity matrix, and ${\Bbb{E}}{[}{\cdot}{]}$ is the statistical expectation. The implication is that the variance of the entries of the matrices ${\boldsymbol{D}}_{\ell}$ needs to be designed across all snapshots to guarantee unique recovery. More information is collected from the scene when this variance is maximized along the snapshots [3], [34]. In the following, we summarize the design principles for two common scenarios: without any prior assumption on the scene and when the scene is known to be sparse in some basis. As explained later, unfolding is applicable to both scenarios.
This is accomplished either by targeting the design of the DOEs in the image-registration process or by obtaining the sensing matrix ${\boldsymbol{A}}_{k}$, which acquires CDPs at the kth diffraction zone, co-designed jointly with the phase-retrieval algorithm. It is the latter where unfolding is commonly employed (see “A Comparison Between Diffractive Optical Element Design Methodologies”).
For ease of exposition, consider ${\boldsymbol{C}}^{\ell}\,{\in}\,{\Bbb{C}}^{{N}\,{\times}\,{N}}$ to be the 2D version of the diagonal matrix ${\boldsymbol{D}}_{\ell}$ whose entries are ${(}{\boldsymbol{D}}_{\ell}{)}_{q,q} = {(}{\boldsymbol{C}}^{\ell}{)}_{{q}{-}{vN},{v} + {1}}$ for ${v} = \left\lfloor{({q}{-}{1}) / {N}}\right\rfloor$, ${q} = {1},{\ldots},{n}$, and ${n} = {N}^{2}$. The design principles are based on considering an admissible random variable ${d}\in{\{}{e}_{1},{e}_{2},{e}_{3},{e}_{4}{\}}$ with probabilities ${\{}{1} / {4},{1} / {4},{1} / {4},{1} / {4}{\}}$, respectively, assuming that ${L} = {C}_{d}{b}$ for some integer ${b}\,{>}\,{0}$, where ${C}_{d} = {4}$ is the number of coding elements.
In particular, the DOE design takes temporal correlation and spatial separation into account. For temporal correlation, the condition on the 2D version ${\boldsymbol{C}}^{\ell}$ of ${\boldsymbol{D}}_{\ell}$ can be satisfied if matrices ${\boldsymbol{C}}^{\ell}$ are constrained to have all the coding elements of d along the L projections at each particular spatial position of the ensemble [34] to directly maximize the variance of the entries along snapshots. It means that ${\{(}{\boldsymbol{C}}^{\ell}{)}_{s,u}\mid{\ell} = {1},{\ldots},{L}{\}}$ contains b times each possible value of d for any ${s},{u}\,{\in}\,{\{}{1},{\ldots},{N}{\}}$. For spatial separation, it has been shown that a set of DOEs with an equispaced distribution of the coding elements improves the ability of data ${y}_{{k},{i},{\ell}}$ to uniquely identify the scene x and increases image reconstruction quality [34], [35].
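The temporal-correlation constraint can be sketched in NumPy: at each spatial position, the L snapshots contain every coding element of d exactly b times, here enforced by a random permutation per pixel. The four phase values chosen for the coding elements are an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
N, C_d, b = 8, 4, 2
L = C_d * b                          # L = C_d * b snapshots

# Coding elements of the admissible random variable d (four unimodular phase
# values here, purely for illustration).
elements = np.exp(1j * 2 * np.pi * np.arange(C_d) / C_d)

# Temporal correlation: at each position (s, u), the L snapshots must contain
# each coding element exactly b times; shuffling per pixel satisfies this
# while keeping the ensemble random.
codes = np.empty((L, N, N), dtype=complex)
for s in range(N):
    for u in range(N):
        seq = np.repeat(elements, b)
        rng.shuffle(seq)
        codes[:, s, u] = seq
```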
The overall design strategy comprises the steps illustrated in Figure S1.
Figure 1. (a) The physical implementation of a compact microscope using a DOE manufactured in Tampere’s Photonics Laboratory with an electron beam lithography system on fused silica glass, and its fragments. The CDPs are acquired in the near zone [${\boldsymbol{A}}_{1}$ in (1)]. When compared with the theoretically expected design, note that the fabricated DOE has imperfections arising from physical artefacts such as material imperfections and printing inaccuracies. The top and side views of scanning electron microscope images of the DOE are shown in the right half of the image and magnified further. (b) Frames from a complex-valued dynamic object video with amplitudes (in arbitrary units (a.u.); top row) and phases (in radians; bottom row) of a moving single-celled eukaryote reconstructed using a sparsity-based filtering method. The video footage consisted of 287 frames over 10 s (see visualization), of which frames 50, 100, 150, and 200 are reproduced (left to right) here.
Figure S1. A diffractive optical element design strategy using the admissible random variable ${d} \in \left\{{{e}_{1},{e}_{2},{e}_{3},{e}_{4}}\right\}$ with probabilities $\left\{{{1} / {4},{1} / {4},{1} / {4},{1} / {4}}\right\}$, respectively, and ${C}_{d} = {4}$.
A Comparison Between Diffractive Optical Element Design Methodologies
Greedy methods design the diffractive optical element itself (instead of the sensing matrix). The experimental performance of this strategy for phase retrieval typically falls short of expectations because of imperfect system modeling; reconstruction artefacts arising from mismatch, calibration errors, and hand-tuned parameters; and even poorer performance for a single snapshot.
In unfolding-based techniques, the sensing matrix is designed across all diffraction zones. The unfolding design method outperforms the greedy methodology because the sensing matrix and the phase-retrieval algorithm are jointly designed, and the reconstruction algorithm is adjusted for the single-snapshot extreme case.
The most important advantage of unfolding co-design is the ability to simultaneously design the entire, or end-to-end, sensing process and the phase-retrieval algorithm. Define the global noiseless measurement vector for the kth diffraction zone as ${\boldsymbol{y}}_{k} = {[}{y}_{k,1,1},{\ldots},{y}_{k,n,L}{]}^{T}\,{\in}\,{\Bbb{R}}^{{m} = {nL}}$ and consider the set of sensing matrices ${\{}{\boldsymbol{A}}_{k}{\}}_{{k} = {1}}^{3}$ with \begin{align*}&{\boldsymbol{A}}_{1} = {\left[{{\overline{\boldsymbol{D}}}_{1}{\boldsymbol{F}}{\overline{\boldsymbol{T}}}{\boldsymbol{F}}^{H},{\ldots},{\overline{\boldsymbol{D}}}_{L}{\boldsymbol{F}}{\overline{\boldsymbol{T}}}{\boldsymbol{F}}^{H}}\right]}^{H} \qquad {(}{\text{near zone}}{)} \\ & {\boldsymbol{A}}_{2} = {\left[{{\overline{\boldsymbol{D}}}_{1}{\overline{\boldsymbol{Q}}}{\boldsymbol{F}},{\ldots},{\overline{\boldsymbol{D}}}_{L}{\overline{\boldsymbol{Q}}}{\boldsymbol{F}}}\right]}^{H} \qquad \qquad \,\,{(}{\text{middle zone}}{)} \\ & {\boldsymbol{A}}_{3} = {\left[{{\overline{\boldsymbol{D}}}_{1}{\boldsymbol{F}}^{H},{\ldots},{\overline{\boldsymbol{D}}}_{L}{\boldsymbol{F}}^{H}}\right]}^{H} \qquad \qquad \quad {(}{\text{far zone}}{)} \tag{1} \end{align*} where ${\boldsymbol{F}}\,{\in}\,{\Bbb{C}}^{{n}\,{\times}\,{n}}$ is the discrete Fourier transform matrix, and ${\boldsymbol{T}}\,{\in}\,{\Bbb{C}}^{{n}\,{\times}\,{n}}$ ${(}{\boldsymbol{Q}}\,{\in}\,{\Bbb{C}}^{{n}\,{\times}\,{n}}{)}$ is the auxiliary orthogonal diagonal matrix, depending on the propagation distance (wavelength of the coherent source), that models the near (middle) zone (see [11, Ch. 4] for further details). The sensing process for the kth diffraction zone is ${\boldsymbol{y}}_{k} = \mid{\boldsymbol{A}}_{k}{\boldsymbol{x}}\mid$, which corresponds to the first layer ${f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)} = \mid{\boldsymbol{A}}_{k}{\boldsymbol{x}}\mid$ of the deep network (Figure S2) modeling the CDP for the kth diffraction zone.
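As a small sanity check on the structure of (1), the far-zone sensing matrix ${\boldsymbol{A}}_{3}$ can be assembled explicitly: since ${(}{\overline{\boldsymbol{D}}}_{\ell}{\boldsymbol{F}}^{H}{)}^{H} = {\boldsymbol{F}}{\boldsymbol{D}}_{\ell}$, stacking the L blocks gives an (nL) × n matrix whose first layer output is $\mid{\boldsymbol{A}}_{3}{\boldsymbol{x}}\mid$. The sizes and random DOEs below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, L = 64, 3
does = [np.exp(1j * 2 * np.pi * rng.random(n)) for _ in range(L)]

# Unitary DFT matrix F (the FFT of the identity, normalized).
F = np.fft.fft(np.eye(n)) / np.sqrt(n)

# Far-zone sensing matrix from (1): A_3 = [conj(D_1) F^H, ..., conj(D_L) F^H]^H;
# its ell-th n x n block acts on x as F D_ell, so stacking gives an (nL) x n matrix.
A3 = np.vstack([F @ np.diag(d) for d in does])

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y3 = np.abs(A3 @ x)                   # first layer f^1(x; A_3) = |A_3 x|
```

In practice each block would be applied matrix-free with an FFT rather than by forming the dense matrix.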
Figure S2. A high-level overview of the unfolding method, resulting in the network layers ${f}^{1},{\ldots},{f}^{T}$. The first layer ${f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)} = \mid{\boldsymbol{A}}_{k}{\boldsymbol{x}}\mid$ models the coded diffraction pattern for the kth diffraction zone. Each iteration of phase retrieval is transferred into the network parameters ${\rho}^{2},{\ldots},{\rho}^{T}$. These parameters are learned from training data sets through end-to-end training.
We seek to recover the signal of interest x from the embedded phaseless data ${\boldsymbol{y}}_{k}$, given the knowledge of ${\boldsymbol{A}}_{k}$, by solving the following optimization problem: \[{\cal{P}}_{0}{:} \qquad {\text{Find }}{\boldsymbol{x}}\,\,{\text{ subject to }}{\boldsymbol{y}}_{k} = {\mid{\boldsymbol{A}}_{k}{\boldsymbol{x}}\mid} \tag{2} \] where the nonconvex constraint models the measurement process. Additionally, if ${\boldsymbol{x}}^{\dagger}$ is a feasible point of the problem ${\cal{P}}_{0}$, then ${e}^{{-}{j}\varphi}{\boldsymbol{x}}^{\dagger}$ is also a solution for an arbitrary phase constant ${\varphi}$. Hence, it is only possible to recover the underlying signal of interest up to a constant global phase ${\varphi}\in{\Bbb{R}}$. This ambiguity naturally leads to the following meaningful metric quantifying the closeness of the recovered signal ${\boldsymbol{x}}^{\dagger}$ to the true signal x: \[{\text{dist}}{(}{\boldsymbol{x}},{\boldsymbol{x}}^{\dagger}{)} = \mathop{\min}\limits_{\varphi\in{[}{0},{2}{\pi}{)}}{\left\Vert{{\boldsymbol{x}}{-}{e}^{{-}{j}\varphi}{\boldsymbol{x}}^{\dagger}}\right\Vert}_{2}{.} \tag{3} \]
If ${\text{dist}}{(}{\boldsymbol{x}},{\boldsymbol{x}}^{\dagger}{)} = {0}$, then x and ${\boldsymbol{x}}^{\dagger}$ are equal up to some global phase.
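The minimization over the global phase in (3) has a closed form: the optimal phase aligns ${\boldsymbol{x}}^{\dagger}$ with x via the inner product ${(}{\boldsymbol{x}}^{\dagger}{)}^{H}{\boldsymbol{x}}$. A minimal NumPy implementation:

```python
import numpy as np

def dist(x, x_hat):
    """Global-phase-invariant distance (3): min over phi of ||x - e^{-j phi} x_hat||_2.

    The minimizer is phi = -angle(x_hat^H x), so e^{-j phi} = z/|z| with
    z = x_hat^H x (closed form; no search over phi needed)."""
    z = np.vdot(x_hat, x)                         # np.vdot conjugates its first argument
    phase = z / np.abs(z) if np.abs(z) > 0 else 1.0
    return np.linalg.norm(x - phase * x_hat)
```

For example, a copy of x rotated by any global phase is at distance (numerically) zero from x.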
The end-to-end unfolding co-design seeks to jointly design the sensing matrix (the encoder module, or first layer in Figure S2) and the reconstruction algorithm (the decoder module, layers 2 to T), in contrast to existing methodologies that consider either the development of the reconstruction algorithm or the design of only the DOEs. As a result, it is a unification of both approaches. Denote by ${S}_{\Phi}{:}{\Bbb{R}}^{m}\rightarrow{\Bbb{R}}^{n}$ the class of decoder functions parameterized by a set of parameters $\Phi$. Assume that, for a given measurement matrix ${\boldsymbol{A}}_{k}$ and the corresponding measurement vector ${\boldsymbol{y}}_{k}$, an estimate $\hat{\boldsymbol{x}}$ of the signal of interest is provided by a certain characterization of the decoder function, i.e., \[\hat{\boldsymbol{x}} = {S}_{\theta}{(}{\boldsymbol{y}}_{k}{;}{\boldsymbol{A}}_{k}{)}\equiv{S}_{\theta}{(}{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{);}{\boldsymbol{A}}_{k}{)} \tag{4} \] where ${\theta}\in\Phi$ denotes a characterization of the decoder module from the original class. In addition, as ${\boldsymbol{A}}_{k}$ models a physical image formation process, it is desirable to define the set of feasible sensing matrices for the optical setup as ${T}\subset{\Bbb{C}}^{{m}\,{\times}\,{n}}$. This set contains the physical constraints related to the implementation of the DOE encapsulated in the matrices ${\boldsymbol{D}}_{\ell}$ as in (1) (see the “Physical Constraints” section for examples of these constraints). Consequently, a realistic ${\boldsymbol{A}}_{k}$ depends on the feasibility of the DOE (see, for instance, Figure 1 for a practical implementation).
Then, for a fixed characterization of the decoder function ${S}_{{\theta}\in\Phi}$, the sensing matrix ${\boldsymbol{A}}_{k}\,{\in}\,{T}$ for the kth diffraction zone is obtained by solving the optimization problem \[\mathop{\text{minimize}}\limits_{{\boldsymbol{A}}_{k}\,{\in}\,{T}}{\Bbb{E}}\left[{{\text{dist}}{(}{\boldsymbol{x}}{;}{S}_{\theta}{(}{\boldsymbol{y}}_{k}\equiv{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{);}{\boldsymbol{A}}_{k}{))}}\right] \tag{5} \] where the expectation is over the distribution of x through the chosen image data set for end-to-end training. Note that this approach considers signal reconstruction accuracy as a criterion with respect to a particular realization of the decoder functions. Thus, the ${\boldsymbol{A}}_{k}$ obtained by this technique is a task-specific encoding matrix that accounts not only for the underlying distribution of the signal but also for the class of decoder functions, resulting in superior performance. After solving the aforementioned optimization problem, the sensing matrix is utilized for data acquisition while the fixed decoder module ${S}_{\theta}$ is used for the reconstruction of x.
In optical applications such as radio astronomy [36] and microscopy [6], the scene x is naturally sparse in some orthogonal domain $\boldsymbol{\Psi}$, i.e., ${\boldsymbol{\Psi}}^{H}\boldsymbol{\Psi} = {\boldsymbol{I}}$; common examples include the wavelet and discrete cosine transforms. Then, there exists an s-sparse ${(}{s}\ll{n}{)}$ representation ${\boldsymbol{z}}\,{\in}\,{\Bbb{C}}^{n}$ such that ${\left\Vert{\boldsymbol{z}}\right\Vert}_{0} = {s}$, where ${\boldsymbol{x}} = {\boldsymbol{\Psi}}^{H}{\boldsymbol{z}}$ and ${\left\Vert{\cdot}\right\Vert}_{0}$ is the ${\ell}_{0}$ pseudonorm. It was shown in [3] that very low estimation errors in the sparse representation z of x were obtained when the admissible random variable d that models the DOEs satisfies ${\Bbb{E}}{[}{d}{]}\,{≠}\,{0}$. This result suggests the type of coding elements of d needed for the highest performance without any restriction on the design strategy. Consequently, the unfolding co-design method is also valid for sparsity-prior scenarios; only the first layer needs to be changed to incorporate the aforementioned result by choosing an appropriate d. Note that d plays a crucial role in uniquely identifying x from the CDP, and illustrates the flexibility of the unfolding framework in combining model-based theoretical results and neural networks. Thus, it is desirable to replicate the analysis of [3] for a given application to achieve optimal imaging quality.
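The sparsity model and the associated projection (keeping the s largest-magnitude coefficients) can be sketched as follows; the random orthogonal basis stands in for a wavelet or DCT matrix, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n, s = 64, 5

# Stand-in orthogonal basis Psi (a wavelet or DCT matrix in practice); here a
# random orthogonal matrix from a QR decomposition.
Psi, _ = np.linalg.qr(rng.standard_normal((n, n)))

# An s-sparse code z and the scene x = Psi^H z (Psi is real here, so ^H = ^T).
z = np.zeros(n)
z[rng.choice(n, size=s, replace=False)] = 1.0 + rng.random(s)
x = Psi.T @ z

def hard_threshold(v, s):
    """Project onto the set of s-sparse vectors: keep the s largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-s:]
    out[idx] = v[idx]
    return out
```

Analyzing x in the basis and hard-thresholding recovers the sparse code exactly, since ${\boldsymbol{\Psi}}{\boldsymbol{x}} = {\boldsymbol{z}}$.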
The optimization problem in (5) is intended to design the measurement matrix ${\boldsymbol{A}}_{k}$ for a given diffraction zone k. Additional physical constraints may be incorporated into the problem. For example, given the block-matrix structure in (1), the DOE may be modeled as a piecewise continuous function [37] wherein the constant intervals of the function are characterized by the random variable d. Similarly, the Fresnel order, or DOE thickness, may be described using a smoothing function [33]. In each of these cases, the physical parameters are learnable.
The thickness of the DOE is defined as ${Q} = {2}{\pi}{m}_{Q}$ (in radians), where ${m}_{Q}$ is the “Fresnel order” of the mask, which is not necessarily an integer. Denote the wavelength of the source by ${\lambda}_{0}$ and the resulting phase profile of the DOE by ${\varphi}_{{\lambda}_{0}}$. Then, the phase profile of the DOE considering the thickness is given by \[{(}{\hat{\varphi}}_{{\lambda}_{0}}{)}_{p,q} = \mod\left({{(}{\varphi}_{{\lambda}_{0}}{)}_{p,q} + \frac{Q}{2},{Q}}\right){-}\frac{Q}{2} \tag{6} \] resulting in ${\hat{\varphi}}_{{\lambda}_{0}}$ taking values in the interval ${[}{-}{Q} / {2},{Q} / {2}{)}$.
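The wrapping in (6) is a one-liner in NumPy:

```python
import numpy as np

def wrap_phase(phi, Q):
    """Wrap an absolute phase profile into [-Q/2, Q/2), implementing (6)."""
    return np.mod(phi + Q / 2.0, Q) - Q / 2.0
```

For instance, with Fresnel order ${m}_{Q} = {1}$ (so ${Q} = {2}{\pi}$), an absolute phase of $3\pi/2$ wraps to $-\pi/2$.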
The DOE is defined on a discretized 2D grid. We obtain a piecewise-invariant surface for the DOE after a nonlinear transformation of its absolute phase. The uniform grid discretization of the wrapped phase profile ${\hat{\varphi}}_{{\lambda}_{0}}$ up to N levels is \[{(}{\boldsymbol{\beta}}_{{\lambda}_{0}}{)}_{p,q} = \left\lfloor{({\hat{\varphi}}_{{\lambda}_{0}}{)}_{p,q}/N}\right\rfloor\cdot{N} \tag{7} \] where ${\left\lfloor{w}\right\rfloor}$ denotes a floor operation that yields the largest integer less than or equal to w. The values of ${\boldsymbol{\beta}}_{{\lambda}_{0}}$ are restricted to ${[}{-}{Q} / {2},{Q} / {2}{)}$ where each step of the piecewise function is a coding element of d. Here Q is an upper bound of the thickness phase of ${\boldsymbol{\beta}}_{{\lambda}_{0}}$.
The floor and modulo functions are not differentiable. Therefore, a smoothing approximation is used for optimizing the DOE’s thickness and number of levels (see [33] for details).
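A minimal sketch of the quantization step and a differentiable floor surrogate follows. Two assumptions are hedged here: the quantization step is taken as Q divided by the number of levels (one reading of (7), so that the levels span the thickness range), and the sigmoid-sum floor is an illustrative smoothing, not necessarily the exact construction of [33].

```python
import numpy as np

def quantize_phase(phi_wrapped, Q, num_levels):
    """Uniform quantization of a wrapped phase profile in the spirit of (7).

    Assumption: step delta = Q / num_levels, so that num_levels quantization
    levels span [-Q/2, Q/2)."""
    delta = Q / num_levels
    return np.floor(phi_wrapped / delta) * delta

def smooth_floor(w, kmax, kappa=50.0):
    """Differentiable surrogate for floor(w) on [0, kmax): a sum of steep
    sigmoids, one per integer threshold (an illustrative smoothing)."""
    ks = np.arange(1, kmax)
    sig = 0.5 * (1.0 + np.tanh(0.5 * kappa * (np.asarray(w, dtype=float)[..., None] - ks)))
    return np.sum(sig, axis=-1)
```

As kappa grows, `smooth_floor` approaches the exact floor while remaining differentiable in w, which is what gradient-based DOE optimization requires.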
These parameters are also considered while designing the measurement matrix, because in practice, all sizes and resolutions may not be implementable [38]. Further details on the impact of these physical DOE constraints in a practical imaging application are available in [39].
Compared to some popular neural network architectures (Figure 2), the unfolded network is a directed acyclic graph that is trained like any standard neural network. However, the input to this unfolded network is not a single vector from the sequence but rather the entire sequence used all at once. The output used to compute the gradients is likewise the entire sequence of output values that a typical neural network would have produced for each step in the input sequence. As such, the unfolded network has multiple inputs and outputs with identical weight matrices for each step.
Figure 2. The conventional network architectures popular in signal/image processing and computer vision applications. (a) In a multilayer perceptron, all the neurons are fully connected. (b) In a convolutional neural network, the neurons are sparsely connected and the weights are shared among different neurons and convolutional layers. (c) An unfolded neural network is a recurrent structure where connections between nodes form a directed or undirected graph along a temporal sequence.
A preliminary step in unfolding is to mathematically formulate a proper parameterized class of decoder and encoder modules that facilitates incorporating the domain knowledge. To this end, the decoder is obtained by first formulating the phase-retrieval problem as a minimization program and then resorting to first-order optimization techniques that lay the groundwork for creating a rich class of parameterized decoder functions. Once such a class is formalized, a connection between deep neural networks and first-order optimization techniques is established.
Assume ${\cal{S}}_{{\theta}_{1}\in\Phi}$ and ${\cal{S}}_{{\theta}_{2}\in\Phi}$ represent two realizations of the decoder module from the same class. Denote the fixed sensing matrix that will be the solution to (5) for ${\theta} = {\theta}_{2}$ by ${\boldsymbol{A}}_{k}$ for the kth diffraction zone. Then, for a signal of interest x and corresponding phaseless measurements ${\boldsymbol{y}}_{k}$, it is possible that \[{\Bbb{E}}\left[{{\text{dist}}{(}{\boldsymbol{x}}{;}{\cal{S}}_{{\theta}_{1}}{(}{\boldsymbol{y}}_{k}{;}{\boldsymbol{A}}_{k}{))}}\right]\leq{\Bbb{E}}\left[{{\text{dist}}{(}{\boldsymbol{x}}{;}{\cal{S}}_{{\theta}_{2}}{(}{\boldsymbol{y}}_{k}{;}{\boldsymbol{A}}_{k}{))}}\right]{.} \tag{8} \]
As a result, it is more appropriate to consider the design of the sensing matrix ${\boldsymbol{A}}_{k}$ with respect to ${\cal{S}}_{{\theta}_{1}}$ and find an optimal ${\theta}^{\ast}\in\Phi$ such that \[{\Bbb{E}}\left[{{\text{dist}}{(}{\boldsymbol{x}}{;}{\cal{S}}_{{\theta}^{\ast}}{(}{\boldsymbol{y}}_{k}{;}{\boldsymbol{A}}_{k}{))}}\right]\leq{\Bbb{E}}\left[{{\text{dist}}{(}{\boldsymbol{x}}{;}{\cal{S}}_{\theta}{(}{\boldsymbol{y}}_{k}{;}{\boldsymbol{A}}_{k}{))}}\right],\forall{\theta}\in\Phi{.} \tag{9} \]
If such a characterization ${\theta}^{\ast}$ is known, the sensing matrix is easily designed. The aforementioned observation is further evidence of the paramount importance of developing a unified framework that allows for a joint optimization of the reconstruction algorithm (i.e., finding an optimal or suboptimal ${\theta}^{\ast})$ and the sensing matrix. Then, the problem of jointly designing ${\boldsymbol{A}}_{k}$ and the reconstruction algorithm is equivalent to the optimization program \[\mathop{\text{minimize}}\limits_{{A}_{k}\,{\in}\,{\cal{T}},{\theta}\in\Phi}{\Bbb{E}}\left[{{\text{dist}}{(}{\boldsymbol{x}}{;}{\cal{S}}_{\theta}{(}{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{);}{\boldsymbol{A}}_{k}{))}}\right]{.} \tag{10} \]
Next, we describe tackling this optimization problem using the available data.
Define the hypothesis class ${\cal{H}}_{e}$ of the possible encoder functions (modules) whose computation dynamics mimic the behavior of the data-acquisition system in a phase-retrieval model (e.g., ${\Bbb{E}}{[}{d}{]}\,{≠}\,{0}$, a temporal or spatial correlation of ${\boldsymbol{A}}_{k}$). Then, \[{\cal{H}}_{e} = \left\{{{f}^{1}{(}\,{\cdot}\,{;}{\boldsymbol{A}}_{k}{)}{:}{\boldsymbol{x}}\rightarrow\mid{\boldsymbol{A}}_{k}{\boldsymbol{x}}\mid{:}{\boldsymbol{A}}_{k}\,{\in}\,{\Bbb{C}}^{{m}\,{\times}\,{n}}}\right\}{.} \tag{11} \]
The superscript in $f^{1}$ denotes that this hypothesis class corresponds to the first layer of the unfolded network. It can be interpreted as a one-layer neural network with n input neurons and m output neurons, where the activation function is $\mid\cdot\mid$ with the trainable weight matrix ${\boldsymbol{A}}_{k}$. The encoder (matrix ${\boldsymbol{A}}_{k})$ for a given realization of a decoder function ${\cal{S}}_{\theta}$ with respect to the hypothesis class ${\cal{H}}_{e}$ is designed by solving the following learning problem \[\mathop{\text{minimize}}\limits_{{f}^{1}\,{\in}\,{\cal{H}}_{e}}{\Bbb{E}}\left[{{\text{dist}}{(}{\boldsymbol{x}}{;}{\cal{S}}_{\theta}\circ{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{))}}\right] \tag{12} \]
where $\circ$ denotes the composition of functions. This problem seeks an encoder function ${f}^{1}{(}\,{\cdot}\,{;}{\boldsymbol{A}}_{k}{)}$ such that the resulting composition maximizes the reconstruction accuracy achieved by the decoder function ${\cal{S}}_{\theta}$.
The goal here is to obtain an estimate of the underlying signal of interest from the phaseless measurements of the form ${f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)} = \mid{\boldsymbol{A}}_{k}{\boldsymbol{x}}\mid$ for a given sensing matrix ${\boldsymbol{A}}_{k}$ of the kth diffraction zone. Equivalently, a decoder function seeks to tackle the following problem: \[\mathop{\text{minimize}}\limits_{{\boldsymbol{z}}\,{\in}\,{\Bbb{C}}^{n}}\frac{1}{2m}{\left\Vert{{f}^{1}{(}{\boldsymbol{z}}{;}{\boldsymbol{A}}_{k}{)}{-}{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)}}\right\Vert}_{2}^{2}{\text{ subject to }}{\boldsymbol{z}}\,{\in}\,{\Gamma} \tag{13} \] where ${\Gamma}$ represents the search space of the underlying signal of interest. For instance, if x is s sparse, this information is encoded in $\Gamma = \left\{{{\boldsymbol{z}}\,{\in}\,{\Bbb{C}}^{n}{:}{\parallel{\boldsymbol{z}}\parallel}_{0} = {s}}\right\}$.
Deep unfolded networks have repeatedly displayed great promise in various signal processing applications [22], [31] by exploiting copious amounts of data along with the domain knowledge gleaned from the underlying problem. This is suitable for deploying phase retrieval in optical imaging, particularly in nonconvex settings where bounding the complexity of signal processing algorithms is often traded off for performance. In general, iterative-optimization techniques are a popular choice for nonconvex phase-retrieval problems. Specifically, first-order methods are widely used iterative-optimization techniques because of their low per-iteration complexity and efficiency in complex scenarios. One such prominent approach that is also suitable for training the decoder is the projected gradient descent (PGD) algorithm. Here, given an initial point ${\boldsymbol{z}}^{(0)}$, the tth iteration of PGD is \[{\boldsymbol{z}}^{{(}{t} + {1}{)}} = {\cal{P}}_{\Gamma}\left({{\boldsymbol{z}}^{(t)}{-}\frac{{\alpha}^{(t)}}{m}{\nabla}_{\boldsymbol{z}}{\ell}{(}{\boldsymbol{z}}^{(t)}{)}}\right) \tag{14} \] where ${\cal{P}}_{\Gamma}{:}{\Bbb{C}}^{n}\rightarrow\Gamma\subseteq{\Bbb{C}}^{n}$ denotes a mapping function of the vector argument to the feasible set ${\Gamma}$, ${\nabla}_{\boldsymbol{z}}{\ell}{(}{\boldsymbol{z}}^{(t)}{)}$ is the gradient of the objective function obtained at the point ${\boldsymbol{z}}^{(t)}$, and ${\alpha}^{(t)}$ represents the step size of the PGD algorithm at the tth iteration. To bound the computational cost of first-order methods, a sensible approach is to fix the total number of iterations of such algorithms (e.g., T), and to follow up with a proper choice of the parameters for each iteration, which results in the best improvement in the objective function while allowing only T iterations.
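A minimal sketch of the PGD iteration (14) for the amplitude loss follows; the random Gaussian sensing matrix stands in for an ${\boldsymbol{A}}_{k}$ from (1), the Wirtinger-style gradient is one standard choice for the amplitude objective, and the projection ${\cal{P}}_{\Gamma}$ is taken as the identity (no structural prior).

```python
import numpy as np

rng = np.random.default_rng(5)
n, L = 32, 8
m = n * L

# Random complex sensing matrix (an illustrative stand-in for A_k) and data.
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = np.abs(A @ x)                                 # phaseless measurements

def loss(z):
    return np.sum((np.abs(A @ z) - y) ** 2) / (2 * m)

def grad(z):
    """Wirtinger gradient of (1/2)|| |Az| - y ||_2^2; the 1/m factor is
    applied inside the update, matching (14)."""
    Az = A @ z
    phase = Az / np.maximum(np.abs(Az), 1e-12)    # sign(Az), guarded at zero
    return A.conj().T @ ((np.abs(Az) - y) * phase)

# PGD iteration (14) with P_Gamma = identity and a fixed step size alpha.
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)
alpha = 0.2
l0 = loss(z)
for _ in range(100):
    z = z - (alpha / m) * grad(z)
```

With a prior, the last line would wrap the update in the corresponding projection, e.g., hard thresholding for an s-sparse $\Gamma$.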
Define a parameterized mapping operator ${f}^{t}{(}\,{\cdot}\,{;}{{\rho}}^{t}{)}{:}{\Bbb{C}}^{n}\rightarrow{\Bbb{C}}^{n}$ for ${t}\,{≥}\,{2}$ (see Figure S2) as \[{f}^{t}{(}{\boldsymbol{z}}{;}{{\rho}}^{t}{)} = {\cal{P}}_{\Gamma}\left({{\boldsymbol{z}}{-}\frac{1}{m}{\boldsymbol{G}}^{(t)}{\nabla}_{\boldsymbol{z}}{\ell}{(}{\boldsymbol{z}}{)}}\right) \tag{15} \] where ${{\rho}}^{t} = \left\{{{\boldsymbol{G}}^{(t)}}\right\}$ is the set of parameters of the mapping function ${f}^{t}{(}\,{\cdot}\,{;}{{\rho}}^{t}{)}$, and ${\boldsymbol{G}}^{(t)}$ is a positive-semidefinite matrix. This mapping is then utilized to model various first-order optimization techniques for a given problem. In particular, for the PGD algorithm, performing ${T}{-}{1}$ iterations of the aforementioned form (Figure S2) is modeled as \[{\cal{N}}_{\Gamma}^{{T}{-}{1}}{(}{\boldsymbol{x}}{)} = {f}^{T}{(}\,{\cdot}\,{;}{{\rho}}^{T}{)}\circ{f}^{{T}{-}{1}}{(}\,{\cdot}\,{;}{{\rho}}^{{T}{-}{1}}{)}\,{\circ}\,{\cdots}\,{\circ}\,{f}^{2}{(}\,{\cdot}\,{;}{{\rho}}^{2}{)} \tag{16} \] where the aforementioned function corresponds to the PGD in all scenarios, including for a fixed step size ${\boldsymbol{G}}^{(t)} = {\alpha}{\boldsymbol{I}}$, time-varying step sizes ${\boldsymbol{G}}^{(t)} = {\alpha}^{(t)}{\boldsymbol{I}}$, or the general preconditioned PGD algorithm with an arbitrary choice of ${\boldsymbol{G}}^{(t)}\succeq{0}$.
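The layer composition (15)–(16) can be sketched by building one function per iteration and chaining them; here ${\boldsymbol{G}}^{(t)} = {\alpha}^{(t)}{\boldsymbol{I}}$ (per-layer step sizes standing in for the learnable parameters ${\rho}^{t}$, fixed by hand rather than trained end-to-end), ${\cal{P}}_{\Gamma}$ is the identity, and the Gaussian sensing matrix is an illustrative stand-in.

```python
import numpy as np

rng = np.random.default_rng(6)
n, L, T = 16, 8, 6
m = n * L
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = np.abs(A @ x)

def grad(z):                      # gradient of (1/2)|| |Az| - y ||_2^2
    Az = A @ z
    return A.conj().T @ ((np.abs(Az) - y) * Az / np.maximum(np.abs(Az), 1e-12))

def make_layer(alpha):
    """One layer f^t(.; rho^t) from (15) with G^(t) = alpha^(t) I and
    P_Gamma = identity; alpha is the layer's (here hand-fixed) parameter."""
    return lambda z: z - (alpha / m) * grad(z)

# N^{T-1} in (16): the composition f^T o ... o f^2, i.e., T-1 layers.
layers = [make_layer(a) for a in [0.2] * (T - 1)]

z0 = rng.standard_normal(n) + 1j * rng.standard_normal(n)
z = z0
for f in layers:
    z = f(z)
```

In end-to-end training, the per-layer parameters (and, more generally, the matrices ${\boldsymbol{G}}^{(t)}$) would be learned via (17) instead of fixed.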
The preconditioning-based update rule in (15) is known to be advantageous in model-based algorithms, both to speed up the convergence rate [41, Ch. 5] and to yield robustness when the sensing matrix is ill conditioned (i.e., its condition number is far greater than one) [42], [43], [44]. The latter advantage is critical in practical imaging applications (listed in Table 1) to overcome imperfect system modeling, reconstruction artifacts such as model mismatch and calibration errors, and the poorer performance obtained from a single snapshot of an experimental sensing matrix [45], [46]. In the context of unfolding [47], the benefits of preconditioning appear indirectly because the learned matrix encapsulates both the sensing matrix (via its transpose) and the preconditioning matrix.
Finally, for a fixed ${\boldsymbol{A}}_{k}$, learning the deep decoder module with respect to a realization of the encoder function corresponds to the problem \[\mathop{\text{minimize}}\limits_{\Gamma = \left\{{{\boldsymbol{G}}^{(t)}\succeq{0}}\right\}}{\Bbb{E}}{[}{\text{dist}}{(}{\boldsymbol{x}}{;}{\cal{N}}_{\Gamma}^{{T}{-}{1}}\circ{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)}{)}{]}. \tag{17} \]
This formulation is akin to training a T-layer deep neural network ${\cal{N}}_{\Gamma}^{{T}{-}{1}}\circ{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)}$ with a task-specific architecture whose computation dynamics at each layer mimic the behavior of one iteration of a first-order optimization algorithm, and the input to such a model-based deep scheme is given by a realization of the encoder module. As domain knowledge is included in the design of such a deep unfolded network, it is more interpretable and has fewer training parameters than its black-box counterpart.
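Although the article trains the decoder by backpropagation, the structure of problem (17) can be illustrated without a deep learning framework: the sketch below "trains" only scalar per-layer step sizes by a crude random search over a small synthetic batch, using a phase-invariant relative error as a stand-in for the distance in (17). All sizes and the search procedure are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, T, batch = 8, 64, 6, 16
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)

def forward(x, alphas):
    """T-1 unfolded layers with scalar steps (G^(t) = alpha^(t) I), no projection."""
    b = np.abs(A @ x)
    z = A.conj().T @ b / m
    for a in alphas:
        Az = A @ z
        z = z - (a / m) * (A.conj().T @ (Az - b * np.exp(1j * np.angle(Az))))
    return z

def risk(alphas, X):
    """Empirical surrogate of (17): mean phase-invariant relative error."""
    total = 0.0
    for x in X:
        z = forward(x, alphas)
        c = np.vdot(z, x) / max(np.vdot(z, z).real, 1e-12)
        total += np.linalg.norm(c * z - x) / np.linalg.norm(x)
    return total / len(X)

X = [rng.standard_normal(n) + 1j * rng.standard_normal(n) for _ in range(batch)]
alphas = np.full(T - 1, 0.3)             # initial per-layer parameters
best = risk(alphas, X)
for _ in range(50):                      # crude random search instead of backprop
    cand = np.clip(alphas + 0.05 * rng.standard_normal(T - 1), 0.01, 1.0)
    r = risk(cand, X)
    if r < best:
        alphas, best = cand, r
```

A real implementation would replace the random search with stochastic gradient descent through the unfolded layers, as done in the referenced works.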
The prevailing usage of unfolding algorithms in phase retrieval centers on the presence, or absence, of a sparsity assumption on the target scene in a nonconvex setting. More recently, optical applications such as 3D object detection and 3D phase imaging have also exploited unfolded networks.
The computational dynamics and network structure of a UPR architecture (see “Unfolded Phase Retrieval Without Priors”) for this scenario are depicted in Figure S3. From the empirical success rate, it follows that the unfolded network significantly outperforms the model-based algorithm in all scenarios.
Figure S3. (a) A single iteration of the unfolded phase-retrieval (UPR) method executes both linear and nonlinear operations and is thus recast into (b) a network layer. Stacking the layers together, (c) a deep unfolded network is formed. The network is subsequently trained using paired inputs and outputs by backpropagation to optimize the parameters ${\boldsymbol{G}}^{(t)}$ and ${\boldsymbol{A}}_{k}$. The learning rate was set to $10^{-4}$ for 100 epochs using an Adam stochastic optimizer. (d) The empirical success rate, obtained by averaging over 100 Monte Carlo trials, is shown with respect to the measurement-to-signal-length ratio ${(m/n)}$ when the signal length is fixed at ${n} = {100}$. The size of the training data set is 2,048, where each data point is an n-dimensional vector following a standard Gaussian distribution. In the model-based case, the sensing matrix ${\boldsymbol{A}}_{k}$ follows a standard Gaussian distribution with unit variance.
When the measurement matrix is fixed (by the physics of the system) and known a priori, learning the preconditioning matrices ${\boldsymbol{G}}^{(t)}$ significantly improves the performance of the underlying signal-recovery algorithm. This supports the proposition that a judicious design of the preconditioning matrices, when the number of layers (iterations) is fixed, indeed results in an accelerated convergence. The unfolding approach is better, even when only the measurement matrix is learned while employing a fixed-scalar step size. With both a learned ${\boldsymbol{A}}_{k}$ and ${\boldsymbol{G}}^{(t)}$, the unfolded network is the most successful.
Figure 3 demonstrates the relative error [as in (3)] as a function of iterations (layers). It follows that the UPR methodology benefits from accelerated convergence, and an optimal point of the objective function is reached with as few as ${T} = {30}$ layers. This reveals that the proposed architecture may be truncated to even fewer iterations, thereby saving computations in the overall algorithm.
Figure 3. Convergence behavior of the UPR [30] and model-based [40] algorithms for ${\boldsymbol{x}}\,{\in}\,{\Bbb{R}}^{n}$, ${n} = {100}$, and ${m} = {6}{n}$. This experiment follows the same setup as Figure S3 in terms of step size, learning rate, data set, and sensing matrices.
Unfolded Phase Retrieval Without Priors
Without a sparsity prior, the optimization problem for phase retrieval follows directly as \[\mathop{\text{minimize}}\limits_{{\boldsymbol{z}}\,{\in}\,{\Bbb{C}}^{n}}{{\cal{L}}}\left({{\boldsymbol{z}}{;}{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)}}\right)\equiv\frac{1}{2m}{\parallel{f}^{1}\left({{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}}\right){-}{f}^{1}\left({{\boldsymbol{z}}{;}{\boldsymbol{A}}_{k}}\right)\parallel}_{2}^{2} \tag{S1} \] where ${m} = {nL}$, L is the number of snapshots of the scene, and n is the signal length.
The iterations for finding the critical points of this nonconvex problem are derived as follows. Starting from a proper initial point ${\boldsymbol{z}}^{(0)}$, a sequence of points $\left\{{{\boldsymbol{z}}^{(0)},{\boldsymbol{z}}^{(1)},{\boldsymbol{z}}^{(2)},{\ldots},{\boldsymbol{z}}^{{(}{T}{)}}}\right\}$ is generated according to the following update rule: \[{\boldsymbol{z}}^{{(}{t} + {1}{)}} = {\boldsymbol{z}}^{(t)}{-}\frac{1}{m}{\boldsymbol{G}}^{{(}{t}{)}}{\nabla}_{\boldsymbol{z}}{{\cal{L}}}\left({{\boldsymbol{z}}^{(t)}{;}{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)}}\right){.} \tag{S2} \]
This update rule was shown to be highly effective in solving the phase-retrieval problem [40], [S1]. Therefore, it is instructive to further explore its potential in the unfolding context. The gradient of the objective function is \[{\nabla}_{\boldsymbol{z}}{{\cal{L}}}\left({{\boldsymbol{z}}^{(t)}{;}{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)}}\right) = {\boldsymbol{A}}_{k}^{H}\left({{\boldsymbol{A}}_{k}{\boldsymbol{z}}^{(t)}{-}{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)}\odot{Ph}{(}{\boldsymbol{A}}_{k}{\boldsymbol{z}}^{{(}{t}{)}}{)}}\right) \tag{S3} \] where ${\odot}$ denotes the elementwise (Hadamard) product, and the function ${Ph}{(}{\boldsymbol{z}}{)}$, applied elementwise, captures the phase of the vector argument; e.g., for real-valued signals, ${Ph}{(}{\boldsymbol{z}}{)} = {sign}{(}{\boldsymbol{z}}{)}$.
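As a sanity check of (S3), note that the gradient vanishes at the true signal, since ${f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)}\odot{Ph}{(}{\boldsymbol{A}}_{k}{\boldsymbol{x}}{)} = {\boldsymbol{A}}_{k}{\boldsymbol{x}}$. A minimal NumPy version with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 8, 48
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
b = np.abs(A @ x)                       # f^1(x; A_k)

def Ph(v):
    """Elementwise phase v / |v|; reduces to sign(v) for real-valued v."""
    return np.exp(1j * np.angle(v))

def grad_L(z):
    """(S3): A^H (A z - f^1(x; A_k) * Ph(A z)), with * elementwise."""
    Az = A @ z
    return A.conj().T @ (Az - b * Ph(Az))

# the gradient vanishes at the true signal, because b * Ph(Ax) = Ax
g_at_x = grad_L(x)
```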
Reference
[S1] G. Wang, L. Zhang, G. B. Giannakis, M. Akçakaya, and J. Chen, “Sparse phase retrieval via truncated amplitude flow,” IEEE Trans. Signal Process., vol. 66, no. 2, pp. 479–491, 2017, doi: 10.1109/TSP.2017.2771733.
In sparse phase retrieval (see “UPR With Sparsity Prior”), decoding the signal x from the measurement vector ${f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)}$ is formulated as a nonconvex optimization problem (see the “Underlying Principles” section). Both the objective function ${{\cal{L}}}{(}{\boldsymbol{z}}{;}{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{))}$ and the constraint for this sparse problem are nonconvex, making it NP-hard in its general form. Figure S4 shows the relevant UPR architecture, which significantly outperforms the state-of-the-art sparse model-based algorithm under various learned parameters. When the measurement matrix is imposed by the physical system, learning the preconditioning matrices significantly improves the recovery performance. Learning only the matrix ${\boldsymbol{A}}_{k}$ while employing fixed step sizes further indicates the effectiveness of task-specific measurement matrices tailored to the application at hand.
Figure S4. (a) The unfolded network is trained using paired inputs and outputs by backpropagation to optimize the parameters ${\boldsymbol{G}}^{(t)}$ and ${\boldsymbol{A}}_{k}$. Here, the step size of each iteration is a constant (equal to one). The learning rate was set to $10^{-4}$ for 100 epochs, using an Adam stochastic optimizer. (b) The empirical success rate, obtained by averaging over 100 Monte Carlo trials, versus the measurement-to-signal-length ratio ${(m/n)}$ when the signal length is set to ${n} = {100}$ and ${s} = {5}$. The size of the training data set is 2,048, where each data point is an n-dimensional vector following a standard Gaussian distribution, of which ${(}{n}{-}{s}{)}$ entries, chosen uniformly at random, are set to zero. In the model-based case, the sensing matrix ${\boldsymbol{A}}_{k}$ follows a standard Gaussian distribution with unit variance.
The objective of learning the preconditioning matrices is to accelerate convergence. This may be validated through a per-layer analysis of the sparse UPR framework in terms of achieved relative error [as in (3)]. The model-based nature of the proposed approach facilitates analysis of the resulting interpretable deep architectures as opposed to the conventional black-box data-driven approaches. Figure 4(a) demonstrates the relative error versus the number of iterations/layers for the case of recovering a five-sparse signal ${\boldsymbol{x}}\,{\in}\,{\Bbb{R}}^{n}$, ${n} = {m} = {300}$. It follows that the UPR yields both a lower relative error and faster convergence than the standard model-based algorithm [40]. The success rate versus the sparsity plot in Figure 4(b) further highlights the superior performance of UPR.
Figure 4. (a) The convergence behavior of the UPR [40] and model-based [S1] algorithms for ${\boldsymbol{x}}\,{\in}\,{\Bbb{R}}^{n}$ and ${n} = {m} = {300}$. The step size, learning rate, data set, and sensing matrices are as in Figure S4. (b) The empirical success rate versus the sparsity level s for ${\boldsymbol{x}}\,{\in}\,{\Bbb{R}}^{n}$ and ${n} = {m} = {150}$.
For 3D object detection [2] based on phase-only information from the CDP, a rapid estimation of the scene from the CDP is carried out before detecting the target using its optical phase through a cross-correlation analysis (template matching). For the estimation, filtered spectral initialization approximates the phase of the 3D object from the acquired CDP. This requires low-pass filtering of the leading eigenvector of matrix ${\Lambda} = {(}{1} / \mid{\cal{I}}_{0}\mid{)}{\Sigma}_{{i}\,{\in}\,{\cal{I}}_{0}}{(}{\boldsymbol{a}}_{k,i}{\boldsymbol{a}}_{k,i}^{H}{)} / {\parallel{\boldsymbol{a}}_{k,i}\parallel}_{2}^{2}$. Here, unfolding (Figure 5) is employed to learn the low-pass filter [31], where ${\cal{I}}_{0}$ is as in the “UPR With Sparsity Prior” section. The low-pass filter, which facilitates accurate estimation of the phase of the object, is the key aspect of initialization.
Figure 5. (a) An algorithm for 3D phaseless object detection follows a power-iteration methodology and reduces computational complexity to estimate the leading eigenvector of ${\Lambda}$ using a low-pass filter ${\boldsymbol{W}}$. (b) The unfolded network is subsequently trained, following the setup in [31], via backpropagation using paired inputs and outputs to optimize the filter ${\boldsymbol{W}}$. (c) Validation of the detection technique using an experimental CDP acquired in the near zone [${\boldsymbol{A}}_{1}$ in (1)] [2]. CHF: circular harmonic filter.
The filtered spectral initialization follows a power-iteration method (see Figure 5), wherein ${\boldsymbol{z}}^{{(}{t} + {1}{)}} = {{\boldsymbol{W}}}\,{\ast}\,{(}{\Lambda}{\boldsymbol{z}}^{(t)}{)}$, where ${\boldsymbol{z}}^{(0)}$ is chosen at random, ${\boldsymbol{W}}$ is the low-pass filter, and ${\ast}$ is the convolution operation. The product is then normalized at each iteration. Iteratively applying ${\boldsymbol{W}}$ to the image estimate selects the low frequencies that sparsely represent the image in the Fourier domain. As the filtering process is a convolution, this selection is performed rapidly.
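A toy 1D version of this filtered power iteration follows. The moving-average filter W, the truncation rule for ${\cal{I}}_{0}$, and the smooth test object are illustrative stand-ins for the learned filter and experimental data of [31].

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 64, 512
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)

# a smooth (low-frequency) object -- the case a low-pass filter is meant to exploit
x = np.convolve(rng.standard_normal(n) + 1j * rng.standard_normal(n),
                np.ones(9) / 9, mode="same")
b = np.abs(A @ x)

I0 = np.argsort(b)[-m // 4:]             # illustrative truncation: largest observations
An = A[I0] / np.linalg.norm(A[I0], axis=1, keepdims=True)

W = np.ones(5) / 5                       # fixed moving average standing in for the learned W

z = rng.standard_normal(n) + 1j * rng.standard_normal(n)
for _ in range(200):
    z = An.conj().T @ (An @ z) / len(I0)   # Lambda z, with Lambda as in the text
    z = np.convolve(z, W, mode="same")     # W * (Lambda z)
    z /= np.linalg.norm(z)                 # normalization step of the power method
cosine = abs(np.vdot(z, x)) / np.linalg.norm(x)   # alignment with the true object
```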
In the next step, a circular harmonic filter (CHF) (in Fourier domain polar coordinates) is built based on a reference pattern of the scene [2]. This system of coordinates is preferred because it makes the CHF invariant to rotations, so that the filter detects the object regardless of its orientation. The correlation matrix ${\hat{\boldsymbol{C}}}$ between the CHF and the Fourier transform of the approximated optical field ${\hat{\boldsymbol{Z}}}$ is then computed. The target is detected using a thresholding approach by building the decision matrix \[{(}{\boldsymbol{R}}{)}_{u,v} = \begin{cases}\begin{array}{ll}{1,}&{\text{if }}{\left\vert{(}{\hat{\boldsymbol{C}}}{)}_{u,v}\right\vert}\,{≥}\,{\varepsilon}\,{\cdot}\,{\max}{\left\vert{\hat{\boldsymbol{C}}}\right\vert} \\ {0,}&{\text{otherwise}}\end{array}\end{cases} \tag{18} \] where ${\varepsilon}\,{\in}\,{(}{0},{1}{]}$ is the tolerance parameter. Figure 5 shows the unfolded detection performance using an experimental CDP in the near zone for a single snapshot. Here, the experimental data were acquired following the optical setup in [7].
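The decision rule (18) itself is a one-liner; the correlation plane below is a hypothetical example, not experimental data.

```python
import numpy as np

def decision_matrix(C_hat, eps):
    """(18): mark the pixels (u, v) where |C_hat| reaches eps times its peak."""
    peak = np.abs(C_hat).max()
    return (np.abs(C_hat) >= eps * peak).astype(int)

# toy correlation plane with one dominant peak (hypothetical values)
C = np.array([[0.1 + 0.2j, 0.9 + 0.1j],
              [0.05 + 0.0j, 0.2 - 0.1j]])
R = decision_matrix(C, eps=0.8)
```

Only the strongest correlation peak survives the threshold, which is what the template-matching detector looks for.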
Unfolded Phase Retrieval With Sparsity Prior
Starting from an initial guess ${\boldsymbol{z}}^{(0)}$, the algorithm estimates the signal in an iterative manner via update equations as \[{\boldsymbol{z}}^{{(}{t} + {1}{)}} = {\cal{P}}_{s}\left({{\boldsymbol{z}}^{(t)}{-}\frac{\alpha}{m}{\nabla}_{\boldsymbol{z}}{{\cal{L}}}_{\text{tr}}\left({{\boldsymbol{z}}^{(t)}{;}{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)}}\right)}\right){.} \tag{S4} \]
These iterations are viewed as performing projected gradient descent with a per-iteration step size ${\alpha}$ on the truncated loss function objective ${{\cal{L}}}_{\text{tr}}$. The gradient is \[{\nabla}_{\boldsymbol{z}}{{\cal{L}}}_{\text{tr}}{(}{\boldsymbol{z}}^{{(}{t}{)}}{;}{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{))} = \mathop{\sum}\limits_{{i}\,{\in}\,{\cal{I}}_{t}}{\left({{\boldsymbol{a}}_{k,i}^{H}{\boldsymbol{z}}^{(t)}{-}\left[{{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)}}\right]_{i}\frac{{\boldsymbol{a}}_{k,i}^{H}{\boldsymbol{z}}^{(t)}}{\mid{\boldsymbol{a}}_{k,i}^{H}{\boldsymbol{z}}^{(t)}\mid}}\right)}{\boldsymbol{a}}_{k,i} \tag{S5} \] where ${\cal{I}}_{t} = {\{}{1}\leq{i}{\leq}{m}{:}\mid{\boldsymbol{a}}_{k,i}^{H}{\boldsymbol{z}}^{{(}{t}{)}}\mid{\geq} {\left[{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)}\right]}_{i} / {(}{1} + {\tau}{)}{\}}$. The constant ${\tau}$ represents the truncation threshold, and ${\cal{P}}_{s}{(}{{\boldsymbol{u}}}{)}{:}{\Bbb{C}}^{n}\rightarrow{\Bbb{C}}^{n}$ sets all entries of ${\boldsymbol{u}}$ to zero except for the ${s}$ entries with the largest magnitudes. This truncation addresses the discontinuity of the gradient by eliminating the terms that could lead to an unbounded descent direction [S2]. This provides strong motivation to investigate this procedure for unfolding.
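The two ingredients of (S4) and (S5), the s-sparse projection ${\cal{P}}_{s}$ and the truncated index set ${\cal{I}}_{t}$, can be sketched as follows; the toy values are real valued for readability.

```python
import numpy as np

def P_s(u, s):
    """Hard-threshold projection: keep the s largest-magnitude entries, zero the rest."""
    out = np.zeros_like(u)
    keep = np.argsort(np.abs(u))[-s:]
    out[keep] = u[keep]
    return out

def truncated_index_set(A, z, b, tau):
    """I_t of (S5): rows i where |a_i^H z| >= b_i / (1 + tau)."""
    return np.abs(A @ z) >= b / (1.0 + tau)

u = np.array([0.1, -3.0, 0.5, 2.0, -0.2])
u_proj = P_s(u, 2)                      # keeps only the entries -3.0 and 2.0

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z = np.array([1.0, 0.5])
b = np.array([1.8, 0.4, 1.4])           # hypothetical amplitude data
mask = truncated_index_set(A, z, b, tau=0.5)
```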
The vector ${\boldsymbol{z}}^{(0)}$ is initialized as ${\sqrt{{\Sigma}_{{i} = {1}}^{m}\left[{{f}^{1}{(}{\boldsymbol{x}}{;}{\boldsymbol{A}}_{k}{)}}\right]_{i}^{2} / {m}}}\,{\hat{\boldsymbol{z}}}^{(0)}$, where ${\hat{\boldsymbol{z}}}^{(0)}\,{\in}\,{\Bbb{C}}^{n}$ is constructed by placing the entries of ${\hat{\boldsymbol{z}}}_{\hat{\cal{S}}}^{(0)}$ at the indices in $\hat{\cal{S}}$ (the estimated support, given by the indices of the s largest values) and zeros at the remaining indices. The principal eigenvector ${\hat{\boldsymbol{z}}}_{\hat{\cal{S}}}^{(0)}\,{\in}\,{\Bbb{C}}^{s}$ is determined by performing power-method iterations on the matrix \[{\Lambda} = \frac{1}{\mid{\cal{I}}_{0}\mid}\mathop{\sum}\limits_{{i}\,{\in}\,{\cal{I}}_{0}}{\frac{{\boldsymbol{a}}_{k,i,\hat{\cal{S}}}{\boldsymbol{a}}_{k,i,\hat{\cal{S}}}^{H}}{{\parallel{\boldsymbol{a}}_{k,i,\hat{\cal{S}}}\parallel}_{2}^{2}}}{.} \tag{S6} \]
The condition ${\Bbb{E}}\left[{{\Sigma}_{{\ell} = {1}}^{L}{\bar{\boldsymbol{D}}}_{\ell}{\boldsymbol{D}}_{\ell}}\right] = {r}{\boldsymbol{I}}$, imposed on the set of diffractive optical elements to guarantee uniqueness, is needed for this initialization to return an accurate estimate of ${\boldsymbol{x}}$ [3]. This result connects the initialization procedure with the recovery conditions of coded diffraction pattern phase retrieval.
[S2] Y. Chen and E. Candes, “Solving random quadratic systems of equations is nearly as easy as solving linear systems,” in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 739–747.
Phase imaging is of high interest in autonomous vehicles, the detection of moving objects, and precision agriculture. Here, the 3D shape of an object is recovered by solving the nonconvex phase-retrieval problem in a setup that records CDPs. Usually, the acquisition of several snapshots of the scene is required. Recently, a single-shot technique was proposed in [17]; it accurately estimates the optical phase ${\varphi}{(}{x},{y}{)}$ across the spatial domain (x, y) of the object following the pipeline of Figure 5. The estimated phase is then used to infer the 3D shape by unwrapping ${\varphi}{(}{x},{y}{)}$ to obtain ${\varphi}_{\text{unwrapp}}{(}{x},{y}{)} = {\varphi}{(}{x},{y}{)} + {2}{k}{\pi}$, where k is an integer that represents the projection period [17]. The depth coordinate is based on the difference between the measured phase ${\varphi}_{\text{unwrapp}}$ and the phase value from a reference plane, which is obtained during calibration of the acquisition system [17]. Thus, the depth coordinate of the 3D object is \begin{align*}{\text{depth}}{(}{x},{y}{)} = & {c}_{0} + {c}_{1}{(}{\varphi}_{\text{unwrapp}}{(}{x},{y}{)} \\ & {-}{\varphi}_{0}{(}{x},{y}{)}{)} \tag{19} \end{align*} where ${\varphi}_{0}{(}{x},{y}{)}$ is the reference phase, and ${\left\{{{c}_{i}}\right\}}_{{i} = {0}}^{1}$ are tunable constants.
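For a 1D scan line, the unwrapping and depth mapping of (19) reduce to a few NumPy calls. The wrapped phase, the reference plane, and the constants ${c}_{0}$ and ${c}_{1}$ below are synthetic examples, and np.unwrap plays the role of the unwrapping step of [17].

```python
import numpy as np

# hypothetical wrapped phase along one scan line of the object
x_axis = np.linspace(0, 1, 200)
phi_true = 6 * np.pi * x_axis**2              # smooth phase exceeding 2*pi
phi_wrapped = np.angle(np.exp(1j * phi_true)) # values folded into (-pi, pi]

phi_unwrapped = np.unwrap(phi_wrapped)        # restores the 2*k*pi offsets

phi0 = np.zeros_like(phi_unwrapped)           # reference-plane phase from calibration
c0, c1 = 0.0, 0.5                             # tunable constants (example values)
depth = c0 + c1 * (phi_unwrapped - phi0)      # depth map of (19)
```

Unwrapping succeeds here because consecutive phase increments stay below ${\pi}$; steeper objects require denser sampling or 2D unwrapping.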
The advantage of the DOE lies in its imaging capability to successfully recover the phase without additional optical elements (such as lenses), leading to even more compact imaging devices [6]. The eschewing of the lens, which is the case in phase retrieval from CDPs, makes the system not only light and cost-effective but also lens-aberration free and with a larger field of view. Here we summarize a few recent developments in experimental setups that employ DOEs and are of interest to enable the use of unfolding-aided phase-retrieval techniques.
In the case of lensless superresolution microscopy for middle-zone CDP observations (Figure 1 and Table 1), the propagation operator ${\cal{P}}_{z}{(}{\boldsymbol{w}}{)}$ at a distance z is precisely modeled through convolution of a wavefront w with the propagation kernel ${\boldsymbol{K}}_{z}$ as ${\cal{P}}_{z}{(}{\boldsymbol{w}}{)} = {\boldsymbol{F}}^{{-}{1}}\left\{{{\boldsymbol{K}}_{z}\,{\cdot}\,{\boldsymbol{F}}\left\{{\boldsymbol{w}}\right\}}\right\}$, where \[{\boldsymbol{K}}_{z} = \begin{cases}\begin{array}{ll}{{e}^{i\frac{{2}{\pi}}{\lambda}z\sqrt{{1}{-}{\lambda}^{2}\left({{f}_{x}^{2} + {f}_{y}^{2}}\right)}},}&{{f}_{x}^{2} + {f}_{y}^{2}\leq\frac{1}{{\lambda}^{2}}}\\{0,}&{\text{otherwise.}}\end{array}\end{cases} \tag{20} \] Here, ${f}_{x}$ and ${f}_{y}$ denote spatial frequencies, ${\lambda}$ is the wavelength, and ${\boldsymbol{F}}$ is the 2D Fourier transform.
Considering the spatial distribution of the DOE, the observation model [6] for the scene x and the DOE D is ${\boldsymbol{y}} = \left|{{\cal{P}}_{{z}_{2}}\left({{\boldsymbol{D}}\,{\cdot}\,{\cal{P}}_{{z}_{1}}\left({\boldsymbol{x}}\right)}\right)}\right|^{2}$, where z1 and z2 are the object-mask and mask-sensor distances, respectively (see Table 1). In phase retrieval with a DOE, the object, mask, and sensor planes may have different sampling intervals, ${\Delta}_{o}$, ${\Delta}_{m}$, and ${\Delta}_{s}$, respectively. If the resulting pattern y is computed with a pixel size ${\Delta}_{c}$ smaller than the physical sensor pixel $\left({{\Delta}_{c}\,{<}\,{\Delta}_{s}}\right)$ and the reconstruction of the object is produced with ${\Delta}_{c}$, a subpixel-resolution, or superresolution, scenario results. The ratio ${r}_{s} = {\Delta}_{s} / {\Delta}_{c}\,{≥}\,{1}$ is the superresolution factor.
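A compact sketch of the angular-spectrum kernel (20) and the observation model ${\boldsymbol{y}} = \vert{\cal{P}}_{{z}_{2}}({\boldsymbol{D}}\,{\cdot}\,{\cal{P}}_{{z}_{1}}({\boldsymbol{x}}))\vert^{2}$ follows. The grid size, wavelength, pitch, and distances are illustrative values (the binary DOE phase values 0 and 2.62 rad follow the design of [6]).

```python
import numpy as np

n, wl, dx = 128, 532e-9, 1.73e-6          # grid, wavelength, pixel pitch (examples)
fx = np.fft.fftfreq(n, d=dx)              # spatial frequencies f_x (and f_y)
FX, FY = np.meshgrid(fx, fx, indexing="ij")

def kernel(z):
    """Angular-spectrum kernel K_z of (20); evanescent frequencies are zeroed."""
    arg = 1.0 - wl**2 * (FX**2 + FY**2)
    K = np.zeros((n, n), dtype=complex)
    prop = arg >= 0                        # propagating modes: f_x^2 + f_y^2 <= 1/wl^2
    K[prop] = np.exp(1j * 2 * np.pi / wl * z * np.sqrt(arg[prop]))
    return K

def P(z, w):
    """P_z(w) = F^{-1}{ K_z . F{w} }."""
    return np.fft.ifft2(kernel(z) * np.fft.fft2(w))

rng = np.random.default_rng(5)
x = np.exp(1j * rng.uniform(0, 2 * np.pi, (n, n)))     # phase object
D = np.exp(1j * rng.choice([0.0, 2.62], size=(n, n)))  # binary phase DOE, as in [6]
y = np.abs(P(1e-3, D * P(2e-3, x)))**2                 # CDP observation model
```

Because $\vert{\boldsymbol{K}}_{z}\vert = 1$ on propagating modes, the operator conserves energy, which is a convenient consistency check for any implementation.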
With the objective of manufacturing the DOE, the matrix D is modeled as ${D}_{s,u} = \exp{(}{j}{\phi}_{s,u}{)}$ for ${u},{s} = {1},{\ldots},{n}$, where ${\phi}_{s,u}$ takes its value from a uniform random variable d with two equiprobable coding elements, zero and 2.62 rad; i.e., ${d}\,{\in}\,\left\{{\exp{(}{j}{0}{)},\exp{(}{j}{2}{.}{62}{)}}\right\}$. These values of d are chosen because they lead to the best performance in terms of reconstruction quality [6] (recall that ${\Bbb{E}}{[}{d}{]}\,{\ne}\,{0}$). Figure 1 shows the resulting DOE (manufactured in Tampere’s Photonics Laboratory) with a pixel size of ${\Delta}_{m} = {1}{.}{73}\,{\mu}\text{m}$. This size is half that of the sensor pixel and is hence a better fit for superresolution factors ${r}_{s}$ in powers of two. The compact microscope in Figure 1 enables full wavefront reconstruction of dynamic scenes. Contrary to existing phase-retrieval setups, this unique system is capable of computational, superresolved video reconstruction via phase retrieval, with a high frame rate limited only by the camera.
To reconstruct the signal of interest, as reported at the bottom of Figure 1(b), [6] studied the impact of a special multiplicative calibration term, which allows for building an accurate propagation model that compensates for the errors of theoretical models. This corrective term was shown to improve the imaging quality and resolution. However, it is constant across iterations. Therefore, it may benefit from a preconditioned unfolded method, wherein the correction varies on a per-iteration basis.
Hyperspectral complex-domain imaging is a comparatively new development that deals with the phase delay of coherent light in transparent or reflective objects [49]. The hyperspectral broadband-phase imaging is more informative than the monochromatic technique (see the near-zone setup and results in Figure 6). Conventionally, for the processing of hyperspectral images, 2D spectral narrow-band images are stacked together and represented as 3D cubes with two spatial coordinates, (x, y), and a third longitudinal spectral coordinate. In hyperspectral phase imaging, for a given wavelength ${\lambda}$, the data ${\boldsymbol{x}}_{\lambda}\,{\in}\,{\Bbb{C}}^{n}$ in these 3D cubes are complex valued with spatially and spectrally varying amplitudes and phases. Denote the set of wavelengths under study by ${\cal{L}}$. From (1), the measurement matrix for diffraction zone k and wavelength ${\lambda}$ is ${\boldsymbol{A}}_{{k},{\lambda}}$. Then, the noisy phaseless measurements are [49] \[{\boldsymbol{y}}_{k} = \mathop{\sum}\limits_{{\lambda}\,{\in}\,{{\cal{L}}}}{\mid{\boldsymbol{A}}_{{k},{\lambda}}{\boldsymbol{x}}_{\lambda}\mid^{2}} + {\epsilon}{.} \tag{21} \]
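Model (21) is straightforward to simulate; the wavelength set, sizes, and noise level here are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 16, 96
wavelengths = [450e-9, 550e-9, 640e-9]   # example set of wavelengths L

# one measurement matrix A_{k,lambda} and one complex cube slice x_lambda per wavelength
A = {wl: (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
     for wl in wavelengths}
x = {wl: rng.standard_normal(n) + 1j * rng.standard_normal(n) for wl in wavelengths}

noise = 0.01 * rng.standard_normal(m)    # additive noise term epsilon
# (21): per-wavelength intensities are summed on the sensor, mixing the slices
y = sum(np.abs(A[wl] @ x[wl])**2 for wl in wavelengths) + noise
```

The summation is precisely what makes the inverse problem harder than single-wavelength phase retrieval: the per-${\lambda}$ intensities cannot be separated from a single snapshot.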
Figure 6. (a) A hyperspectral optical setup. “Laser” is a broadband coherent light source, “DOE” is modeled as a spatial light modulator, and “Sensor” is a registration camera. (b) The reconstructed phase and amplitude from CDPs in the near zone [${\boldsymbol{A}}_{1}$ in (1)] for the ${\lambda} = {640}$-nm wavelength in the presence of Gaussian (top right) and Poisson (bottom right) noise with a signal-to-noise ratio (SNR) of 34 dB in each case. The images are shown in square frames to emphasize that the size and location of the object’s support are assumed unknown and reconstructed automatically. The support of the true image is used only for computation of the observations produced for the zero-padded object. For the amplitudes, these frames have nearly zero values, while the phase estimates take random values in the frame area, as the phase is undefined where the amplitude vanishes.
The reconstruction of ${\boldsymbol{x}}_{\lambda}$ here is more ill conditioned than the classical phase-retrieval problem because the acquired data are phaseless measurements summed across wavelengths. This sum does not appear in classical phase retrieval, which employs a single wavelength. It also makes phase image processing more complex than hyperspectral intensity imaging, where the corresponding 3D cubes are real valued.
Model-based algorithms require a significant number of measurements to efficiently recover the complex-valued ${\boldsymbol{x}}_{\lambda}$ [49]. As suggested by the aforementioned numerical results, preconditioned unfolding is useful in the hyperspectral, complex-domain phase-retrieval problem because it facilitates reduction of the condition number, admitting fewer data samples and faster convergence.
In certain CDP applications, the combination of blind deconvolution, superresolution, and phase retrieval naturally manifests. Although this is a severely ill-posed problem, it has been shown [52] that the image of interest can be estimated in polynomial time. The approach relies on previous results, which established the DOE design needed to achieve high-quality images [5] and partially analyzed the combined problem by solving the superresolution phase-retrieval problem [7] from a CDP (Figure 7). These designs are obtained by exploiting the model of the physical setup using ML methods, where the DOE is modeled as a layer of a neural network (a data-driven, or unrolled, model) that is trained to act as an estimator of the true image [33]. This data-driven design has shown outstanding imaging quality using a single snapshot, as well as robustness against noise.
Figure 7. (a) A blind deconvolution superresolution phase-retrieval optical setup comprising a ${\lambda} / {4}$ retardation plate, polarizer $(P)$, lenses $({L}_{1}$ and ${L}_{2})$, beamsplitter (BS), a DOE physically implemented using a spatial light modulator (SLM), sensor, and registration camera. (b) The recovered amplitude and phase of the object “TUT” (the initials of Tampere University of Technology) from CDPs at the near zone [${\boldsymbol{A}}_{1}$ in (1)] with a pixelwise resolution factor of four [7]. The SLM was configured to work as a phase-only modulator because the device is not very accurate in modulating both magnitudes and phases [48].
The plethora of phase-retrieval algorithms and their relative advantages make it difficult to conclude which of them will always be best. The only foregone conclusion is that the computational imaging community cannot continue basing these algorithms on models that ignore several practical measurement errors. In this context, deep unfolding methods are gaining salience in optical phase-retrieval problems because they offer interpretability, state-of-the-art performance, and high computational efficiency. By learning priors from the training data, these empirically designed architectures have been exploited to build globally convergent learning networks for a specific phase-retrieval setting. A particularly desired outcome of this development is improvement in the design of DOEs that satisfy theoretical conditions for the uniqueness of signal recovery.
We provided examples of several recent novel setups, which indicate that unfolding has the potential to enable a new generation of optical devices as a consequence of co-designing optics and recovery algorithms. Several applications of lensless imaging, synthetic apertures, and highly ill-posed variants of phase retrieval appear within reach in the near future using this technique, including in areas such as optical metrology, digital holography, and medicine.
Samuel Pinilla (samuel.pinilla@stfc.ac.uk) received his Ph.D. degree in electrical engineering. He is with the University of Manchester at Harwell Science and Innovation campus, Oxfordshire, Oxon OX11 0FA, U.K. He is a Member of IEEE.
Kumar Vijay Mishra (kvm@ieee.org) received his Ph.D. degree in electrical and computer engineering. He is with the United States DEVCOM Army Research Laboratory, Adelphi, MD 20783 USA. He is a Senior Member of IEEE.
Igor Shevkunov (igor.shevkunov@tuni.fi) received his Ph.D. degree in optics. He is with the Faculty of Information Technology and Communication Sciences, Tampere University, 33720, Finland.
Mojtaba Soltanalian (msol@uic.edu) received his Ph.D. degree in electrical engineering. He is with the University of Illinois Chicago, Chicago, IL 60607 USA. He is a Senior Member of IEEE.
Vladimir Katkovnik (vladimir.katkovnik@tuni.fi) received his Ph.D. degree in technical cybernetics. He is with the Faculty of Information Technology and Communication Sciences, Tampere University, 33720, Finland and Noiseless Imaging Ltd., Tampere, 33720, Finland.
Karen Egiazarian (karen.eguiazarian@tuni.fi) received his Ph.D. degree in physics and mathematics. He is with the Faculty of Information Technology and Communication Sciences, Tampere University, 33720, Finland, and Noiseless Imaging Ltd., Tampere, 33720, Finland. He is a Fellow of IEEE.
[1] S. Pinilla, J. Bacca, and H. Arguello, “Phase retrieval algorithm via nonconvex minimization using a smoothing function,” IEEE Trans. Signal Process., vol. 66, no. 17, pp. 4574–4584, Sep. 2018, doi: 10.1109/TSP.2018.2855667.
[2] A. Jerez, S. Pinilla, and H. Arguello, “Fast target detection via template matching in compressive phase retrieval,” IEEE Trans. Comput. Imag., vol. 6, pp. 934–944, May 2020, doi: 10.1109/TCI.2020.2995999.
[3] A. Guerrero, S. Pinilla, and H. Arguello, “Phase recovery guarantees from designed coded diffraction patterns in optical imaging,” IEEE Trans. Image Process., vol. 29, pp. 5687–5697, Apr. 2020, doi: 10.1109/TIP.2020.2985208.
[4] S. Pinilla, H. García, L. Díaz, J. Poveda, and H. Arguello, “Coded aperture design for solving the phase retrieval problem in X-ray crystallography,” J. Comput. Appl. Math., vol. 338, pp. 111–128, Aug. 2018, doi: 10.1016/j.cam.2018.02.002.
[5] J. Bacca, S. Pinilla, and H. Arguello, “Coded aperture design for super-resolution phase retrieval,” in Proc. Eur. Signal Process. Conf., 2019, pp. 1–5, doi: 10.23919/EUSIPCO.2019.8903020.
[6] P. Kocsis, I. Shevkunov, V. Katkovnik, H. Rekola, and K. Egiazarian, “Single-shot pixel super-resolution phase imaging by wavefront separation approach,” Opt. Exp., vol. 29, no. 26, pp. 43,662–43,678, 2021, doi: 10.1364/OE.445218.
[7] V. Katkovnik, I. Shevkunov, N. V. Petrov, and K. Egiazarian, “Computational super-resolution phase retrieval from multiple phase-coded diffraction patterns: Simulation study and experiments,” Optica, vol. 4, no. 7, pp. 786–794, 2017, doi: 10.1364/OPTICA.4.000786.
[8] E. J. Candés, X. Li, and M. Soltanolkotabi, “Phase retrieval from coded diffraction patterns,” Appl. Comput. Harmon. Anal., vol. 39, no. 2, pp. 277–299, 2015, doi: 10.1016/j.acha.2014.09.004.
[9] D. Gross, F. Krahmer, and R. Kueng, “Improved recovery guarantees for phase retrieval from coded diffraction patterns,” Appl. Comput. Harmon. Anal., vol. 42, no. 1, pp. 37–64, 2017, doi: 10.1016/j.acha.2015.05.004.
[10] M. Hayes, “The reconstruction of a multidimensional sequence from the phase or magnitude of its Fourier transform,” IEEE Trans. Acoust., Speech, Signal Process., vol. 30, no. 2, pp. 140–154, Apr. 1982, doi: 10.1109/TASSP.1982.1163863.
[11] T.-C. Poon and J.-P. Liu, Introduction to Modern Digital Holography: With MATLAB. Cambridge, U.K.: Cambridge Univ. Press, 2014.
[12] U. Dürig, D. W. Pohl, and F. Rohner, “Near-field optical-scanning microscopy,” J. Appl. Phys., vol. 59, no. 10, pp. 3318–3327, 1986, doi: 10.1063/1.336848.
[13] C. Jahncke, M. Paesler, and H. Hallen, “Raman imaging with near-field scanning optical microscopy,” Appl. Phys. Lett., vol. 67, no. 17, pp. 2483–2485, 1995, doi: 10.1063/1.114615.
[14] H. Hess, E. Betzig, T. Harris, L. Pfeiffer, and K. West, “Near-field spectroscopy of the quantum constituents of a luminescent system,” Science, vol. 264, no. 5166, pp. 1740–1745, 1994, doi: 10.1126/science.264.5166.1740.
[15] T. Shimano, Y. Nakamura, K. Tajima, M. Sao, and T. Hoshizawa, “Lensless light-field imaging with Fresnel zone aperture: Quasi-coherent coding,” Appl. Opt., vol. 57, no. 11, pp. 2841–2850, 2018, doi: 10.1364/AO.57.002841.
[16] M. Sao, Y. Nakamura, K. Tajima, and T. Shimano, “Lensless close-up imaging with Fresnel zone aperture,” Japanese J. Appl. Phys., vol. 57, no. 9S1, p. 09SB05, 2018, doi: 10.7567/JJAP.57.09SB05.
[17] S. Pinilla, L. Galvis, K. Egiazarian, and H. Arguello, “Single-shot coded diffraction system for 3D object shape estimation,” Electron. Imag., vol. 2020, no. 14, pp. 59–1, 2020, doi: 10.2352/ISSN.2470-1173.2020.14.COIMG-059.
[18] H. Arguello, S. Pinilla, Y. Peng, H. Ikoma, J. Bacca, and G. Wetzstein, “Shift-variant color-coded diffractive spectral imaging system,” Optica, vol. 8, no. 11, pp. 1424–1434, 2021, doi: 10.1364/OPTICA.439142.
[19] K. Monakhova, J. Yurtsever, G. Kuo, N. Antipa, K. Yanny, and L. Waller, “Learned reconstructions for practical mask-based lensless imaging,” Opt. Exp., vol. 27, no. 20, pp. 28,075–28,090, 2019, doi: 10.1364/OE.27.028075.
[20] V. Karitans, E. Nitiss, A. Tokmakovs, M. Ozolinsh, and S. Logina, “Optical phase retrieval using four rotated versions of a single binary amplitude modulating mask,” J. Astron. Telescopes, Instrum., Syst., vol. 5, no. 3, p. 039004, 2019, doi: 10.1117/1.JATIS.5.3.039004.
[21] Z. Tong, X. Ren, Q. Ye, D. Xiao, J. Huang, and G. Meng, “Quantitative reconstruction of the complex-valued object based on complementary phase modulations,” Ultramicroscopy, vol. 228, Sep. 2021, Art. no. 113343, doi: 10.1016/j.ultramic.2021.113343.
[22] D. Morales, A. Jerez, and H. Arguello, “Object classification using deep neural networks from coded diffraction patterns,” in Proc. Symp. Image, Signal Process. Artif. Vis., 2021, pp. 1–5, doi: 10.1109/STSIVA53688.2021.9591996.
[23] C.-J. Wang, C.-K. Wen, S.-H. L. Tsai, and S. Jin, “Phase retrieval with learning unfolded expectation consistent signal recovery algorithm,” IEEE Signal Process. Lett., vol. 27, pp. 780–784, Apr. 2020, doi: 10.1109/LSP.2020.2990767.
[24] C.-J. Wang, C.-K. Wen, S.-H. Tsai, S. Jin, and G. Y. Li, “Phase retrieval using expectation consistent signal recovery algorithm based on hypernetwork,” IEEE Trans. Signal Process., vol. 69, pp. 5770–5783, Oct. 2021, doi: 10.1109/TSP.2021.3118494.
[25] N. Naimipour, S. Khobahi, and M. Soltanalian, “UPR: A model-driven architecture for deep phase retrieval,” in Proc. Asilomar Conf. Signals, Syst., Comput., 2020, pp. 205–209, doi: 10.1109/IEEECONF51394.2020.9443438.
[26] K. Gregor and Y. LeCun, “Learning fast approximations of sparse coding,” in Proc. 27th Int. Conf. Mach. Learn., 2010, pp. 399–406, doi: 10.5555/3104322.3104374.
[27] G. Ongie, A. Jalal, C. A. Metzler, R. G. Baraniuk, A. G. Dimakis, and R. Willett, “Deep learning techniques for inverse problems in imaging,” IEEE J. Sel. Areas Inf. Theory, vol. 1, no. 1, pp. 39–56, May 2020, doi: 10.1109/JSAIT.2020.2991563.
[28] R. Liu, S. Cheng, L. Ma, X. Fan, and Z. Luo, “Deep proximal unrolling: Algorithmic framework, convergence analysis and applications,” IEEE Trans. Image Process., vol. 28, no. 10, pp. 5013–5026, Oct. 2019, doi: 10.1109/TIP.2019.2913536.
[29] W.-L. Hwang and A. Heinecke, “Un-rectifying non-linear networks for signal representation,” IEEE Trans. Signal Process., vol. 68, pp. 196–210, 2020, doi: 10.1109/TSP.2019.2957607.
[30] N. Naimipour, S. Khobahi, and M. Soltanalian, “Unfolded algorithms for deep phase retrieval,” 2020, arXiv:2012.11102.
[31] D. Morales, A. Jerez, and H. Arguello, “Learning spectral initialization for phase retrieval via deep neural networks,” Appl. Opt., vol. 61, no. 9, pp. F25–F33, 2022, doi: 10.1364/AO.445085.
[32] J. Estupiñán, A. Jerez, J. Bacca, and H. Arguello, “Deep unrolled phase retrieval approach from coded diffraction patterns,” in Proc. Symp. Image, Signal Process. Artif. Vis., 2021, pp. 1–4, doi: 10.1109/STSIVA53688.2021.9591671.
[33] S. R. M. Rostami, S. Pinilla, I. Shevkunov, V. Katkovnik, and K. Egiazarian, “Power-balanced hybrid optics boosted design for achromatic extended depth-of-field imaging via optimized mixed OTF,” Appl. Opt., vol. 60, no. 30, pp. 9365–9378, 2021, doi: 10.1364/AO.434852.
[34] S. Pinilla, J. Poveda, and H. Arguello, “Coded diffraction system in X-ray crystallography using a Boolean phase coded aperture approximation,” Opt. Commun., vol. 410, pp. 707–716, Mar. 2018, doi: 10.1016/j.optcom.2017.11.035.
[35] C. V. Correa, H. Arguello, and G. R. Arce, “Spatiotemporal blue noise coded aperture design for multi-shot compressive spectral imaging,” J. Opt. Soc. Amer. A, vol. 33, no. 12, pp. 2312–2322, 2016, doi: 10.1364/JOSAA.33.002312.
[36] C. Fienup and J. Dainty, “Phase retrieval and image reconstruction for astronomy,” in Image Recovery: Theory Application, H. Stark, Ed. New York, NY, USA: Elsevier, 1987, pp. 231–275.
[37] S. Pinilla, S. R. M. Rostami, I. Shevkunov, V. Katkovnik, and K. Egiazarian, “Hybrid diffractive optics design via hardware-in-the-loop methodology for achromatic extended-depth-of-field imaging,” 2022, arXiv:2203.16407.
[38] X. Dun, H. Ikoma, G. Wetzstein, Z. Wang, X. Cheng, and Y. Peng, “Learned rotationally symmetric diffractive achromat for full-spectrum computational imaging,” Optica, vol. 7, no. 8, pp. 913–922, 2020, doi: 10.1364/OPTICA.394413.
[39] S. R. M. Rostami, S. Pinilla, I. Shevkunov, V. Katkovnik, and K. Egiazarian, “On design of hybrid diffractive optics for achromatic extended depth-of-field (EDoF) RGB imaging,” 2022, arXiv:2203.16985.
[40] H. Zhang and Y. Liang, “Reshaped Wirtinger flow for solving quadratic system of equations,” in Proc. Adv. Neural Inf. Process. Syst., 2016, pp. 2622–2630.
[41] J. Nocedal and S. J. Wright, Numerical Optimization. New York, NY, USA: Springer-Verlag, 1999.
[42] J. Jia and K. Rohe, “Preconditioning the lasso for sign consistency,” Electron. J. Statist., vol. 9, no. 1, pp. 1150–1172, 2015, doi: 10.1214/15-EJS1029.
[43] F. L. Wauthier, N. Jojic, and M. I. Jordan, “A comparative framework for preconditioned lasso algorithms,” in Proc. Adv. Neural Inf. Process. Syst., 2013, vol. 26.
[44] J. A. Kelner, F. Koehler, R. Meka, and D. Rohatgi, “On the power of preconditioning in sparse linear regression,” in Proc. IEEE 62nd Annu. Symp. Found. Comput. Sci. (FOCS), 2021, pp. 550–561, doi: 10.1109/FOCS52979.2021.00061.
[45] E. Mojica, C. V. Correa, and H. Arguello, “High-resolution coded aperture optimization for super-resolved compressive x-ray cone-beam computed tomography,” Appl. Opt., vol. 60, no. 4, pp. 959–970, 2021, doi: 10.1364/AO.413695.
[46] Y. Mejia and H. Arguello, “Binary codification design for compressive imaging by uniform sensing,” IEEE Trans. Image Process., vol. 27, no. 12, pp. 5775–5786, Dec. 2018, doi: 10.1109/TIP.2018.2857445.
[47] X. Chen, J. Liu, Z. Wang, and W. Yin, “Theoretical linear convergence of unfolded ISTA and its practical weights and thresholds,” in Proc. Adv. Neural Inf. Process. Syst., 2018, vol. 31.
[48] “PLUTO-2.1 LCOS spatial light modulator phase only (Reflective).” HOLOEYE Pioneers in Photonic Technology. Accessed: Jun. 4, 2022. [Online]. Available: https://holoeye.com/slm-pluto-phase-only/
[49] V. Katkovnik, I. Shevkunov, and K. Egiazarian, “ADMM and spectral proximity operators in hyperspectral broadband phase retrieval for quantitative phase imaging,” 2021, arXiv:2105.07891.
[50] S. Pinilla, K. V. Mishra, and B. M. Sadler, “Non-convex recovery from phaseless low-resolution blind deconvolution measurements using noisy masked patterns,” in Proc. 55th Asilomar Conf. Signals, Syst., Comput., 2021, pp. 876–880, doi: 10.1109/IEEECONF53345.2021.9723219.
Digital Object Identifier 10.1109/MSP.2022.3214325