Yanjie Zhu, Jing Cheng, Zhuo-Xu Cui, Qingyong Zhu, Leslie Ying, Dong Liang
Quantitative magnetic resonance imaging (qMRI) aims to obtain quantitative biophysical parameters based on physical models derived from MR spin magnetization evolution. This requires the acquisition of multiple MR images, resulting in very long scan times. Recently, deep learning (DL) has shown significant potential for reconstructing undersampled MR data. When applying DL to fast qMRI, physical priors can be integrated into the DL framework to improve the performance of qMRI further. This article introduces the physical models of qMRI and four ways to integrate these models into a deep neural network (NN), namely, training sample generation, contrast image prediction, loss function design, and network architecture design. The core concepts and methods for each category are presented. The associated signal processing issues regarding the generalization and reliability of physics-driven DL-based fast qMRI methods are also discussed.
qMRI is a promising technique that aims to provide quantitative measurements of tissue in physical units rather than only visualizing anatomy, as in conventional contrast-weighted MRI. These biophysical parameters are influenced by the interactions of nuclear spins within tissues and reflect biological tissue microstructures and components. Therefore, qMRI has the potential to detect subtle microscopic damage, thereby facilitating the early diagnosis of various diseases. Ideally, these biophysical parameters are directly comparable across different acquisition time points, imaging methods, and sites, providing a promising method for monitoring clinical prognoses.
Typically, qMRI first acquires multiple contrast-weighted images and then obtains quantitative parameter maps by using an explicit or implicit model [1] derived from the physics of MRI signal dynamics to characterize the intrinsic correlations among multicontrast images. However, the long scan times associated with multiple acquisitions are a major drawback of almost all qMRI methods. Because the total scan time is equal to the scan time for a single contrast-weighted image multiplied by the number of images, there are two main strategies for accelerating image acquisition. One is to estimate parameter maps from a reduced number of images and even from a single image [2], [3]. The other is to undersample the measured k-space data, similar to fast MRI, and then restore images based on signal processing theory, such as compressed sensing [4], [5], [6], where physical models can be utilized as priors to better recover the information lost through undersampling [7], [8].
Recently, DL has shown significant potential for removing aliasing artifacts caused by undersampling in MRI [9], [10], [11], [12]. In the early stages of DL-based fast qMRI, prior information from huge numbers of historical images was exploited using deep NNs to improve the accuracy of image/parameter map recovery, but prior information from physical models was seldom considered. More recently, this type of prior has attracted significant attention for further improving the performance of fast qMRI. Generally, when applying DL to fast qMRI, physical models can be integrated in four ways: training sample generation, contrast-weighted image prediction, loss function design, and network architecture design. Specifically, the first approach [13], [14] is to generate a training dataset through numerical simulations based on a physical model, which avoids the challenging training data collection step of MRI. Additionally, DL can accelerate the time-consuming parameter fitting step for MR fingerprinting (MRF). The second approach [15], [16], [17] uses a physical model to guide a network to predict missing and optimized contrast-weighted images from fewer acquired images. The completed and optimized images are then used to obtain quantitative maps with higher accuracy. The third approach [18], [19] integrates physical models into the loss function as additional terms on top of the loss between the estimated parameter maps and labels. This added loss measures the errors between the labeled contrast images in the k-space and synthetic data generated based on physical models. The last approach [3], [20], [21] incorporates a physical model as a network layer to enforce model consistency in the image and parameter domains. Similar to unrolling-based DL reconstruction [22], [23], a deep NN is designed based on updating rules, where the physical model is incorporated into the forward model and can thus be naturally integrated into the network architecture.
This article aims to provide an overview of physics-driven DL-based fast qMRI methods, where “physics-driven” specifically refers to physics related to biophysical parameters. In contrast to MR imaging physics, which has been widely employed in DL-based fast anatomical imaging, physical models of biophysical parameters are related to tissue characterization and are typically used to quantify these parameters. In the “Physics and Reconstruction of qMRI” section, we briefly introduce the physical models and general reconstruction formulas for MRI and qMRI. The four categories summarized are then presented in detail in the “Training Sample Generation Using Physical Models,” “Physical Model-Based Synthetic Images via DL,” “Physical Model-Integrated Loss Function Design,” and “Physical Model Consistency Network Design” sections. Finally, related signal processing issues are discussed in the “Related Signal Processing Issues and Conclusions” section.
Commonly used MR signals originate from the hydrogen nuclei containing a single proton. There are many protons in a single-volume element, or voxel, in MRI. The vector sum of these signals is the bulk magnetization; i.e., ${\bf{M}} = {M}_{x}{\vec{\bf{i}}} + {M}_{y}{\vec{\bf{j}}} + {M}_{z}{\vec{\bf{k}}}$, where ${\vec{\bf{i}}}$, ${\vec{\bf{j}}}$, and ${\vec{\bf{k}}}$ are unit vectors in the three spatial directions. When put in a static main magnetic field ${B}_{0}$, the dynamics of magnetization with respect to time ${\tau}$ can be described using the following well-known Bloch equation: \begin{align*} \frac{{{\bf{d}}}{\bf{M}}{{\left({\tau}\right)}}}{{{\bf{d}}}{\tau}} = & {\bf{M}}{{\left({\tau}\right)}}\times{\gamma}{\bf{B}}{{\left({\tau}\right)}}{-}\frac{{M}_{x}{{\left({\tau}\right)}}{\vec{\bf{i}}} + {M}_{y}{\left({\tau}\right)}{\vec{\bf{j}}}}{{T}_{2}} \\ &{-}\frac{\left({{M}_{z}{\left({\tau}\right)}{-}{M}_{0}}\right){\vec{\bf{k}}}}{{T}_{1}} \tag{1} \end{align*} where ${\gamma}$ is the gyromagnetic ratio; ${M}_{0}$ is the equilibrium magnetization; ${\bf{B}}{{\left({\tau}\right)}}$ is the spatially and time-varying total magnetic field generated by ${B}_{0}$ inhomogeneity, radio frequency, gradients, and so on; and ${T}_{1}$ and ${T}_{2}$ are biophysical parameters representing the longitudinal and transverse relaxation times, respectively. The basic Bloch equation can be extended by adding additional terms to characterize the interactions between nuclei further. Such extensions include the Bloch–Torrey equation for diffusion as well as the Bloch–McConnell equation for chemical exchange and magnetization transfer. These terms involve additional biophysical parameters, such as diffusivity.
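To make the dynamics in (1) concrete, the relaxation terms can be integrated numerically. Below is a minimal sketch with ${\bf{B}}{\left({\tau}\right)} = {0}$ in the rotating frame, starting right after an ideal 90° pulse; the tissue values (${T}_{1} = {1{,}000}$ ms, ${T}_{2} = {80}$ ms) are illustrative placeholders, not values from the article.

```python
import math

# Forward-Euler integration of the relaxation terms of the Bloch equation (1)
# in the rotating frame with B(tau) = 0. T1/T2 values are illustrative only.
M0, T1, T2 = 1.0, 1000.0, 80.0      # equilibrium magnetization; times in ms
dt, n_steps = 0.1, 20000            # 0.1-ms step, 2 s of evolution
Mx, My, Mz = 1.0, 0.0, 0.0          # magnetization tipped into the x-y plane

for _ in range(n_steps):
    Mx += dt * (-Mx / T2)           # transverse decay with T2
    My += dt * (-My / T2)
    Mz += dt * (-(Mz - M0) / T1)    # longitudinal recovery with T1

t_total = dt * n_steps              # 2,000 ms of evolution
Mz_analytic = M0 * (1.0 - math.exp(-t_total / T1))   # closed-form solution
```

After 2 s, the transverse component has fully decayed while the longitudinal component has recovered toward ${M}_{0}$, matching the closed-form solutions of the decoupled relaxation equations.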
The relationships between magnetization and biophysical parameters can be derived through simulations using the Bloch equation and its extensions, with the acquisition methods (i.e., MR sequences) collectively called Bloch simulations. Throughout this article, for simplicity, we use the following unified formula to describe these relationships: \[{\rho}_{{t}_{i}} = {\bf{A}}{\left({x},{t}_{i}\right)} \tag{2} \] where ${\bf{A}}$ represents the physical model of the biophysical parameter ${x}$ and ${\rho}_{{t}_{i}}$ is the magnetization signal measured with the sequence parameter ${t}_{i}$. A huge family of physical models have been derived through Bloch simulations based on different sequences, physical principles, and so on. An MR sequence can be manipulated to “weight” the magnetization by changing a certain sequence parameter ${t}_{i}$ to estimate the parameter ${x}$. By acquiring a group of magnetization signals with different weightings, biophysical parameters can be estimated using corresponding models and fitting methods.
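As a concrete instance of (2), the mono-exponential ${T}_{2}$ relaxation model can be evaluated at a few echo times; here ${x} = {(}{M}_{0},{T}_{2}{)}$ and the sequence parameter ${t}_{i}$ is the echo time. All numbers are illustrative only.

```python
import math

# Sketch of the unified model rho_{t_i} = A(x, t_i) in (2), instantiated for
# mono-exponential T2 relaxation: x = (M0, T2), t_i = TE_i. Toy values only.
def A(x, TE):
    M0, T2 = x
    return M0 * math.exp(-TE / T2)

TEs = [10.0, 30.0, 50.0, 70.0]                  # echo times in ms
signals = [A((1.0, 50.0), TE) for TE in TEs]    # one "weighting" per TE
```

Acquiring the group of signals at different TEs and inverting ${\bf{A}}$ then yields the ${T}_{2}$ estimate, which is the fitting step discussed below.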
MR physical models can be roughly divided into two categories. The first consists of analytical models, which are analytical solutions of the Bloch equation and its extensions under simplified conditions. Additionally, the susceptibility model derived from Maxwell’s equations is often used to estimate tissue susceptibility; we include this model in this category. To provide a better understanding, we list some examples of commonly used analytical models as well as the related sequences and varying parameters in Table 1. Note that ${\rho}_{{t}_{i}}$ in Table 1 denotes the phase variance in the susceptibility model but represents the magnitude signal in the other models. More complex models will be introduced for later applications when needed.
Table 1. The commonly used physical models and their dependencies on MRI sequences.
The second category of MR physical models describes signal evolution during data acquisition. By randomly changing any sequence parameter, the transient signal evolutions of hundreds or even thousands of time points can be recorded and serve as “markers” of multiple biophysical parameters. These parameters are then determined by matching a marker to a dictionary calculated using Bloch simulation with predefined ranges of biophysical parameters and sequence parameters. This approach includes MRF [24], quantitative imaging using configuration states [25], MR spin tomography in the time domain [26], and so on. It is becoming a generic framework that can be used to quantify multiple biophysical parameters simultaneously. MRF, for example, was originally proposed to quantify ${T}_{1}$ and ${T}_{2}$ relaxation as well as proton density and was then gradually expanded to involve more parameters, e.g., diffusion and chemical exchange saturation transfer (CEST). We use parameter MRF in this article to indicate the desired parameters of a specific MRF method. Although this approach provides more degrees of freedom for characterizing biophysical parameters, the dictionary size increases tremendously if all possible factors are involved, such as ${B}_{0}$ and ${B}_{1}$ field inhomogeneities. Therefore, dictionary matching is time-consuming and requires extensive computational resources.
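The matching step itself can be sketched in a few lines. In the toy example below, the two-segment analytic “simulator” (inversion recovery plus echo decay) is a crude stand-in for a full Bloch simulation, and all grids and timings are arbitrary.

```python
import numpy as np

# Sketch of dictionary matching in MRF: each atom is a simulated signal
# evolution for one (T1, T2) pair, and a measured fingerprint is matched by
# maximum normalized inner product. The analytic "simulator" is a crude
# stand-in for a Bloch simulation; grids and timings are arbitrary.
rng = np.random.default_rng(0)
tir = np.linspace(50.0, 2000.0, 100)      # inversion times, ms
tes = np.linspace(10.0, 300.0, 100)       # echo times, ms

def simulate(t1, t2):
    return np.concatenate([1.0 - 2.0 * np.exp(-tir / t1),   # T1-sensitive
                           np.exp(-tes / t2)])               # T2-sensitive

grid = [(t1, t2) for t1 in np.arange(200.0, 2001.0, 100.0)
                 for t2 in np.arange(20.0, 201.0, 10.0) if t2 < t1]
D = np.stack([simulate(t1, t2) for t1, t2 in grid])
D /= np.linalg.norm(D, axis=1, keepdims=True)               # normalize atoms

fingerprint = simulate(1000.0, 80.0) + 0.01 * rng.standard_normal(200)
best = int(np.argmax(D @ (fingerprint / np.linalg.norm(fingerprint))))
t1_hat, t2_hat = grid[best]
```

Even this toy grid has hundreds of atoms; with realistic parameter ranges and additional factors such as ${B}_{0}$ and ${B}_{1}$, the cost of building and searching the dictionary grows rapidly, which is what motivates the DL-based fitting discussed later.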
qMRI aims to obtain the distribution of biophysical parameters in an anatomical region. As introduced previously, it is necessary to acquire a set of images with different weightings (i.e., contrast-weighted images) by using sequence parameters designed according to physical models. A biophysical parameter map can then be estimated through pixel-wise model fitting based on the contrast-weighted images.
To form an MR image, the magnetization of the imaging region should be spatially encoded during data acquisition. The most commonly used method is Fourier encoding. For an image with a sequence parameter ${t}_{i}$, the forward generic model can be derived as \[{\bf{b}}_{{t}_{i}}^{\ell}{\left({\bf{k}}\right)} = {\int}{{c}^{\ell}{\left({\bf{r}}\right)}\,{\cdot}\,{\rho}_{{t}_{i}}{\left({\bf{r}}\right)}\,{\cdot}\,{e}^{{-}{j}{2}{\pi}{\bf{k}}^{\text{T}}{\bf{r}}}{d}{\bf{r}} + {\varepsilon}_{{t}_{i}}{\left({\bf{k}}\right)},{\ell} = {1},{\ldots},{\cal{L}}} \tag{3} \] where ${\bf{b}}_{{t}_{i}}^{\ell}{\left({\bf{k}}\right)}$ is the measured k-space data from the ${\ell}$th receiver coil; ${\varepsilon}_{{t}_{i}}{\left({\bf{k}}\right)}$ denotes the associated measurement noise; ${\rho}_{{t}_{i}}{\left({\bf{r}}\right)}$ is the image intensity that reflects the magnetization distribution in the field of view; ${\bf{k}}\,{\in}\,{[}{-}{0.5}, {0.5}{)}^{d}$ and ${\bf{r}}\,{\in}\,{\Bbb{R}}^{d}$, ${d} = {2},{3}$ denote the k-space and image domain coordinates, respectively; and ${c}^{\ell}{\left({\bf{r}}\right)}$ is the ${\ell}$th coil sensitivity of the ${\cal{L}}$ receiver coil. Commonly, the coil sensitivity is estimated using a calibration procedure prior to reconstruction. Equation (3) can be rewritten in matrix form as \[{\bf{b}}_{{t}_{i}}^{\ell} = {\bf{FC}}^{\ell}{\rho}_{{t}_{i}} + {\varepsilon}_{{t}_{i}} \tag{4} \] where ${\bf{b}}_{{t}_{i}}^{\ell}$ and ${\rho}_{{t}_{i}}$ are the vector representations of the k-space data and contrast-weighted image, respectively, and ${\varepsilon}_{{t}_{i}}$ is the noise. Here, ${\bf{F}}$ denotes the Fourier operator, ${\bf{C}}^{\ell}\,{\in}\,{\Bbb{C}}^{{N}\,{\times}\,{N}}$ is a diagonal matrix containing the sensitivity of the ${\ell}$th coil on its diagonal, and N is the pixel number. 
By stacking ${\left\{{\bf{b}}_{{t}_{i}}^{\ell}\right\}}$ into a vector ${\bf{b}}_{{t}_{i}}$ and packing ${\left\{{\bf{FC}}^{\ell}\right\}}$ into a block-diagonal matrix ${\bf{E}}$, we obtain a forward model using the following operator notation: \[{\bf{b}}_{{t}_{i}} = {\bf{E}}{\rho}_{{t}_{i}} + {\varepsilon}_{{t}_{i}}{.} \tag{5} \]
For single-channel measurements, ${\bf{C}}^{\ell}$ becomes an identity matrix, and ${\bf{E}}$ is equal to the Fourier operator ${\bf{F}}$. In fast MRI, where the k-space is undersampled, the Fourier transform with the undersampling operator ${\bf{F}}_{u}{(}{\bf{F}}_{u}{:}{\Bbb{C}}^{{N}\,{\times}\,{1}}\,{\rightarrow}\,{\Bbb{C}}^{{V}\,{\times}\,{1}},{V}\,{\ll}\,{N}{)}$ is used in place of ${\bf{F}}$. Equation (5) is hereafter referred to as the imaging model.
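A toy implementation of the imaging operator ${\bf{E}}$ in (5) and its adjoint is sketched below, with four synthetic Gaussian sensitivity maps and a 2× Cartesian undersampling mask; all sizes and shapes are placeholders.

```python
import numpy as np

# Sketch of the imaging operator E in (5) for L = 4 coils: each coil image
# C^l * rho is Fourier encoded and undersampled by a mask (the stacked
# F_u F C^l blocks). Sensitivities and mask are synthetic placeholders.
N = 32
rng = np.random.default_rng(1)
rho = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))

yy, xx = np.mgrid[0:N, 0:N] / N
coils = np.stack([np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2))  # smooth |C^l|
                  for cx, cy in [(0, 0), (0, 1), (1, 0), (1, 1)]])

mask = np.zeros((N, N))
mask[:, ::2] = 1.0                          # every other k-space column
mask[:, N // 2 - 4:N // 2 + 4] = 1.0        # fully sampled center

def E(img):
    return mask * np.fft.fft2(coils * img, axes=(-2, -1), norm="ortho")

def E_H(ksp):                               # adjoint: sum_l (C^l)* F^H F_u^H
    return np.sum(np.conj(coils) *
                  np.fft.ifft2(mask * ksp, axes=(-2, -1), norm="ortho"),
                  axis=0)

b = E(rho)                                  # measured multicoil k-space
```

With the unitary (`norm="ortho"`) FFT, `E_H` is the exact adjoint of `E`, which is the property iterative reconstruction algorithms rely on.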
Traditional approaches of qMRI consist of two steps: 1) MR image reconstruction and 2) parameter fitting. Regarding the second step, qMRI belongs to a class of inverse problems, where biophysical parameters (unknowns in a system) are estimated from a series of measured contrast-weighted images (observations) [27]. In most cases, this process can be formulated as a nonlinear inverse problem, and the least-squares method is commonly used to solve this problem, based on the following equation: \[{\hat{\bf{x}}} = \mathop{\arg\min}\limits_{{\bf{x}}\,{\in}\,{\Bbb{C}}^{{N}\,{\times}\,{N}_{p}}} \frac{1}{2} \mathop{\sum}\limits_{{i} = {1}}\limits^{L}{\left\Vert{\bf{A}}{\left({\bf{x}},{t}_{i}\right)}{-}{\rho}_{{t}_{i}}\right\Vert}_{2}^{2}\,{≜}\,\frac{1}{2}{\left\Vert{\bf{A}} {\left({\bf{x}},{t}\right)}{-}{\rho}_{t}\right\Vert}_{F}^{2} \tag{6} \] where ${\bf{x}}$ is the vectorized biophysical parameter map and ${N}_{p}$ denotes the number of biophysical parameters to be estimated; ${\bf{A}}{\left({\bf{x}},{t}\right)} = {\left[{\bf{A}}{\left({\bf{x}},{t}_{1}\right)},{\bf{A}}{\left({\bf{x}},{t}_{2}\right)},{\ldots},{\bf{A}}{\left({\bf{x}},{t}_{L}\right)}\right]},{\rho}_{t} = {\left[{\rho}_{{t}_{1}},{\rho}_{{t}_{2}},{\ldots},{\rho}_{{t}_{L}}\right]}$, and ${L}$ is the total number of images. It should be noted that the signal-to-noise ratios (SNRs) of the contrast-weighted images may vary according to the sequence parameter. This typically happens when changing the flip angle (FA) in ${T}_{1}$ relaxation and the echo time in ${T}_{2}$ and ${T}_{2}^{\ast}$ relaxation. Considering the heteroscedasticity of this noise process, the least-squares criterion in (6) can be transformed into a weighted-least-squares objective function, where the weights are inversely proportional to the noise variances associated with multiple datasets. 
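For the mono-exponential ${T}_{2}$ model, the least-squares fit in (6) admits a common log-linear shortcut that reduces to ordinary linear least squares. The single-voxel sketch below uses noiseless data; with realistic noise, the nonlinear (possibly weighted) fit in (6) is preferred. All values are illustrative.

```python
import math

# Single-voxel sketch of the fitting step in (6) for A(x, TE) = M0*exp(-TE/T2):
# taking logarithms gives ln(rho) = ln(M0) - TE/T2, a linear least-squares fit.
TEs = [10.0, 30.0, 50.0, 70.0, 90.0]       # echo times, ms (toy values)
M0_true, T2_true = 1.0, 60.0
rho = [M0_true * math.exp(-te / T2_true) for te in TEs]   # noiseless "data"

n = len(TEs)
y = [math.log(r) for r in rho]
tbar, ybar = sum(TEs) / n, sum(y) / n
slope = (sum((t - tbar) * (yi - ybar) for t, yi in zip(TEs, y))
         / sum((t - tbar) ** 2 for t in TEs))
T2_hat = -1.0 / slope                       # slope is -1/T2
M0_hat = math.exp(ybar - slope * tbar)      # intercept is ln(M0)
```

Repeating this fit pixel by pixel over the contrast-weighted images yields the parameter map ${\hat{\bf{x}}}$.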
For some special applications with ill-posed models, such as quantitative susceptibility mapping (QSM), a regularization term ${\cal{R}}{\left({\bf{x}}\right)}$ is added to (6) as follows: \[{\hat{\bf{x}}} = \mathop{\arg\min}\limits_{{\bf{x}}\,{\in}\,{\Bbb{C}}^{{N}\,{\times}\,{N}_{p}}}\frac{1}{2}{\left\Vert{{\bf{A}}{\left({\bf{x}},{t}\right)}{-}{\rho}_{t}}\right\Vert}_{F}^{2} + {\cal{R}}{\left({\bf{x}}\right)}{.} \tag{7} \]
In fast qMRI, which undersamples the k-space, the biophysical parameter map can be directly estimated from the undersampled k-space data by incorporating the imaging model into (7) as \[{\hat{\bf{x}}} = \mathop{\arg\min}\limits_{{\bf{x}}\,{\in}\,{\Bbb{C}}^{{N}\,{\times}\,{N}_{p}}}\frac{1}{2}{\left\Vert{\bf{EA}}{\left({\bf{x}},{t}\right)}{-}{\bf{B}}_{t}\right\Vert}_{F}^{2} + {\cal{R}}{\left({\bf{x}}{;}{\rho}_{t}\right)} \tag{8} \] where ${\bf{B}}_{t} = {\left[{\bf{b}}_{{t}_{1}},{\bf{b}}_{{t}_{2}},{\ldots},{\bf{b}}_{{t}_{L}}\right]}$ represents the acquired k-space data of all contrast-weighted images; ${\bf{E}}$ denotes the forward imaging operator that transfers images into the k-space, including the coil sensitivity, Fourier transform, and sampling trajectory, as shown in (6); and ${\cal{R}}{\left({\bf{x}}{;}{\rho}_{t}\right)}$ represents the regularization of the parameter maps and/or images. A typical regularization in fast qMRI is structured sparsity, such as the group least absolute shrinkage and selection operator (LASSO), sparse group LASSO, and octagonal selection and clustering algorithm for regression regularization [28].
From a signal processing perspective, the two steps of qMRI can be treated as signal recovery from the k-space and model parameter fitting. Fast qMRI is often achieved by undersampling in the k-space and parameter domains. The former is an ill-posed inversion problem; therefore, priors are typically introduced as regularization to recover better images. The latter is a perturbation-sensitive fitting problem, where a higher SNR or a greater number of images leads to more accurate fitting results. Commonly used priors, such as sparsity, and fitting methods are well interpretable, with solid theoretical support for convergence, complexity, and so on, but they may not be sophisticated and flexible enough to fully describe priors and the complex relationships in model fitting. DL is a powerful tool for learning implicit priors from large amounts of historical data and establishing robust network-based fitting relationships, so it has the potential to improve the performance of fast qMRI.
In this article, DL-based fast qMRI methods are classified into the four categories listed in Table 2 according to their strategies for utilizing physics in their networks. We summarize the advantages and limitations of each category as well as the applications introduced in this review. It should be noted that this article does not cover a complete list of DL-based qMRI applications but introduces typical examples that are helpful for understanding the field and its trends.
Table 2. A summary of the strategies of DL-based fast qMRI methods.
The following sections detail each category separately. In each section, the main concepts and a flowchart of each target strategy are first introduced. State-of-the-art methods belonging to each category are then introduced to demonstrate how physics is integrated in each application. We also attempt to illustrate the evolution of physics utilization in specific applications when possible.
This section describes the type 1 category in Table 2, and the corresponding flowchart is presented in Figure 1. In the early stages of applying DL to qMRI, a straightforward approach was to use a deep network in place of the nonlinear parameter fitting step to reduce computational complexity and time. One of the main challenges in employing an end-to-end deep network is the lack of high-quality training sample pairs. Generating training samples using Bloch simulation with a predefined parameter range provides an easy and low-cost method for training sample collection. Additionally, such simulated datasets have true labels for biophysical parameters, which may help avoid bias from real scans during network training. Such bias may be introduced by noise contamination as well as perturbations from hardware imperfections. Because the Bloch simulation calculates only spatially independent weightings with respect to specified physical and sequence parameters, synthetic images are often used to mimic the spatial distributions of biophysical parameters. Based on the strong ability of deep networks to handle complex models, researchers have adopted sophisticated sequences for qMRI to accelerate acquisition further and quantify multiple parameters simultaneously. Additionally, to overcome the gap between synthetic data and real scans, imperfections in real scans can be simulated to improve the robustness of networks.
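The sample-generation step above can be sketched as follows, with mono-exponential ${T}_{2}$ decay standing in for a full Bloch simulation and random rectangles standing in for the synthesized geometric shapes; all ranges, sizes, and noise levels are arbitrary.

```python
import numpy as np

# Type 1 sketch: training pairs are made by drawing random parameter maps
# built from simple shapes and pushing them through a physical model (here
# mono-exponential T2 decay stands in for a Bloch simulation).
rng = np.random.default_rng(0)
N, TEs = 64, np.array([10.0, 30.0, 50.0, 70.0])   # image size, echo times

def random_t2_map():
    t2 = np.full((N, N), 40.0)                    # background T2, ms
    for _ in range(5):                            # overlay random rectangles
        x0, y0 = rng.integers(0, N - 8, size=2)
        w, h = rng.integers(4, 16, size=2)
        t2[y0:y0 + h, x0:x0 + w] = rng.uniform(20.0, 200.0)
    return t2

def make_pair():
    t2 = random_t2_map()                          # label: parameter map
    imgs = np.exp(-TEs[:, None, None] / t2)       # input: weighted image stack
    imgs += 0.01 * rng.standard_normal(imgs.shape)   # mimic scan noise
    return imgs, t2

dataset = [make_pair() for _ in range(8)]
```

A network trained on such pairs learns the inverse mapping from contrast-weighted stacks to parameter maps; adding simulated imperfections (noise, motion, field inhomogeneity) at this stage is how the gap to real scans is narrowed.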
Figure 1. The type 1 category in Table 2. A deep network is trained for the parameter fitting of qMRI with training samples generated using a physical model.
This strategy was employed in a single-shot 2D ${T}_{2}$ mapping method of the brain, called overlapping-echo detachment (OLED) planar imaging. Unlike conventional multiecho spin echo-based ${T}_{2}$ mapping sequences, OLED accelerates ${T}_{2}$ mapping through the synchronized acquisition of multiple signals that overlap in the k-space [29]. These signals are then separated using an optimization method to quantify ${T}_{2}$. This process is rather slow owing to the highly nonlinear mapping in OLED imaging. To solve this issue, a deep residual network [13] was employed to estimate ${T}_{2}$ maps from images with overlapping signals. A training dataset was generated using Bloch simulation with a group of images synthesized by randomly sorting geometrical shapes. The results demonstrated that using the network improved both ${T}_{2}$ map quality and the reconstruction speed of OLED.
In subsequent studies, more complex sequences, such as increased overlapping signals [30] and simultaneous multislice acquisition [31], were employed in OLED imaging to improve ${T}_{2}$ accuracy and accelerate acquisition further. Bloch simulation provides a feasible way to update the training dataset for each method at almost no cost. Even with increased complexity in the acquired signals, a deep network can still achieve accurate ${T}_{2}$ estimation. Additionally, real anatomical textures extracted from public datasets, instead of synthetic images and imperfections, such as subject motion, were added to the Bloch simulation to bridge the gap between simulated and real data. In this manner, the simulated data were brought closer to real data, resulting in reduced motion artifacts in ${T}_{2}$ maps in the presence of unpredictable subject movements [32].
As mentioned in the “Physics and Reconstruction of qMRI” section, MRF can be considered a general framework for simultaneously acquiring multiple parameter maps. However, its parameter fitting step, which involves obtaining the desired parameters from acquired signals through dictionary matching, is time-consuming. It is natural to train an NN on pairs of dictionary entries and replace the dictionary matching step in MRF [14]. The MRF deep reconstruction network (DRONE) can achieve a 300–5,000× speedup compared to dictionary matching for simultaneous ${T}_{1}$, ${T}_{2}$, and proton density mapping in 2D MRF. In addition to conventional ${T}_{1}$ and ${T}_{2}$, magnetization transfer contrast (MTC) can also be quantified in a 3D MRF framework using a network for parameter fitting [33]. Because more biophysical parameters are involved in MTC-MRF, simulated training datasets grow tremendously to include all possible tissue combinations. In cardiac applications, imperfections in real scans, such as cardiac rhythms, are added to Bloch simulations to enrich the training dataset and improve the robustness of the network [34], similar to the OLED ${T}_{2}$ mapping discussed previously.
In qMRI, one major issue is the disturbances introduced by imperfections in real scans, such as ${B}_{0}$ inhomogeneities, particularly with non-Cartesian sampling trajectories [35], which significantly affect quantitative accuracy. Therefore, the number of acquired images typically needs to be much larger than the number of unknowns in the physical model, i.e., overdetermined, to overcome these disturbances. However, multiple acquisitions make qMRI very time-consuming. Therefore, methods that accurately estimate parameter maps from fewer acquired images are of great interest.
End-to-end networks have been applied to map fewer acquired images directly to parameter maps, but such networks require ample training data and may be unstable [2], [15]. To address these issues, two main types of methods have been proposed. The first uses a network to predict the missing images and replenish the image set. Because missing and acquired images lie on the same manifold represented by a physical model, the physics-based relationships among images can be learned by a network for missing image generation. The second uses a network to remove disturbances, correcting the acquired images and improving overall image quality. Both methods implicitly use physics to learn the relationships between generated and acquired images, as demonstrated in Figure 2. Finally, parameter maps can be estimated from the network output images by using either conventional parameter fitting or network fitting. This section introduces three examples of this strategy: quantitative magnetization transfer (qMT) imaging, glutamate-weighted CEST (GluCEST) imaging, and diffusion tensor imaging (DTI).
Figure 2. The type 2 category in Table 2. Physical models are implicitly involved when using the network to generate missing and improved images.
qMT imaging aims to provide a quantitative measurement of the macromolecular content of tissue, based on the magnetization transfer effect. Conventional qMT involves acquiring images with multiple off-resonance frequencies for parameter fitting (typically 12 off-resonance images are acquired), which increases the scan time. Recently, the qMTNet (see Figure 3) was proposed to produce 2D qMT parameter maps from four off-resonance images by generating missing images [15]. Two subnetworks are involved in qMTNet. First, qMTNet-acq produces eight missing off-resonance images from the four acquired images. Then, a total of 12 images are fed into the qMTNet-fit subnetwork to obtain qMT parameters. A direct fitting network based on four off-resonance images was also evaluated but was less stable than qMTNet with its generation module.
Figure 3. The qMTNet. Two subnetworks are involved. qMTNet-acq produces the missing eight off-resonance images from the four acquired images. qMTNet-fit obtains qMT parameters from a total of 12 images.
GluCEST uses the chemical exchange effect to measure the relative concentration of glutamate, which plays a central role in brain function. Theoretically, because the frequency of exchangeable amine protons on glutamate is 3 parts per million (ppm), the GluCEST metric can be calculated from the two images obtained by saturating at frequencies of ${\pm}{3}{\text{ ppm}}$, as follows: \[{\text{GluCEST ratio}} = \frac{{\rho}_{{\text{sat}}{\left({-}{3}{\text{ ppm}}\right)}}{-}{\rho}_{{\text{sat}}{\left({3}{\text{ ppm}}\right)}}}{{\rho}_{{\text{sat}}{\left({-}{3}{\text{ ppm}}\right)}}}\,{\times}\,{10} \tag{9} \] where ${\rho}_{{\text{sat}}\left({\pm{3}{\text{ ppm}}}\right)}$ is the signal obtained with saturation at a ${\Delta}{\omega} = {\pm}{3}{\text{ ppm}}$ offset from the water resonance. However, as a result of ${B}_{0}$ inhomogeneities, more than six pairs of offset frequency CEST images are typically acquired to cover the expected ${B}_{0}$ inhomogeneity range and synthesize ${B}_{0}$-corrected images in a voxel-wise manner. The requirement for images at multiple offset frequencies makes GluCEST very time-consuming.
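Equation (9) is a simple pixel-wise computation once the two saturation images are available; a toy numpy sketch with synthetic saturation images follows (values are placeholders, and the scale factor is taken from (9)).

```python
import numpy as np

# Toy sketch of the GluCEST ratio in (9): two synthetic saturation images at
# -3 ppm and +3 ppm give a pixel-wise ratio map. Values are placeholders.
rng = np.random.default_rng(0)
sat_neg = 0.9 + 0.05 * rng.random((8, 8))     # rho_sat(-3 ppm)
sat_pos = sat_neg * (1.0 - 0.08)              # 8% glutamate-related dip

glucest = (sat_neg - sat_pos) / sat_neg * 10  # scale factor as in (9)
```

In practice the two images must first be ${B}_{0}$-corrected voxel by voxel, which is what drives the multi-offset acquisition burden described above.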
A DL-based method was proposed to estimate ${B}_{0}$-corrected ${\pm}{3}{\text{ ppm}}$ images for 2D GluCEST imaging based on a significantly reduced number of offset images [16]. The positive and negative sides were predicted separately. For each side, a deep residual network was trained to learn a nonlinear mapping from a few acquired images to the image at 3 ppm on the same side. To simplify the network, an initial interpolation module was used to shift images at a particular frequency to the desired frequency, and the interpolated images were then used as inputs for the network. Once the ${\pm}{3}{\text{ ppm}}$ CEST images were produced by the two networks, the GluCEST ratio was directly calculated using (9). In experiments on the human brain, 2D GluCEST ratio maps could be obtained from a minimum of three pairs of GluCEST images, resulting in a 50% reduction in scan time.
DTI is an important tool for the noninvasive characterization of tissue microstructure and structural connectivity in the human brain. The DTI model is listed in Table 1, where the diffusion tensor ${\bf{D}}$ is a ${3}\,{\times}\,{3}$ symmetric matrix with six unknowns. Therefore, DTI requires only six diffusion-weighted images (DWIs) and one nondiffusion image (the ${b}_{0}$ image) to estimate a diffusion tensor. In practice, a large number of DWIs are often acquired owing to the inherently low image SNR, which significantly increases scan time. DL-based methods have been proposed to obtain diffusion parameters using only six DWIs and one ${b}_{0}$ image for 2D DTI [2], [17]. The DeepDTI [17] method first selects six DWIs with optimized diffusion encoding directions, which are defined by minimizing the condition number of the diffusion tensor transformation matrix and are as uniform as possible. A convolutional NN (CNN) is then used to improve the image quality of the DWIs, using both the DWIs and an anatomical ${T}_{1}$-weighted image as inputs. Finally, the diffusion tensor is estimated from the output DWIs by using the conventional fitting method.
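The conventional tensor fit can be sketched via the standard log-linear formulation: with six DWIs, ${\ln}{\left({S}_{i}/{S}_{0}\right)} = {-}{b}\,{\bf{g}}_{i}^{\text{T}}{\bf{D}}{\bf{g}}_{i}$ is linear in the six unknown tensor elements. The directions and tensor values below are illustrative and are not the optimized DeepDTI directions.

```python
import numpy as np

# Sketch of log-linear diffusion tensor estimation from six DWIs and one b0
# image. Encoding directions and tensor values are illustrative only.
b = 1000.0                                   # b-value, s/mm^2
g = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
              [1, 1, 0], [1, 0, 1], [0, 1, 1]], dtype=float)
g /= np.linalg.norm(g, axis=1, keepdims=True)

D_true = np.diag([1.7e-3, 0.4e-3, 0.4e-3])   # prolate tensor, mm^2/s
S0 = 1.0
S = S0 * np.exp(-b * np.einsum("ij,jk,ik->i", g, D_true, g))   # six DWIs

# design matrix rows: [gx^2, gy^2, gz^2, 2 gx gy, 2 gx gz, 2 gy gz]
G = np.column_stack([g[:, 0]**2, g[:, 1]**2, g[:, 2]**2,
                     2 * g[:, 0] * g[:, 1], 2 * g[:, 0] * g[:, 2],
                     2 * g[:, 1] * g[:, 2]])
d = np.linalg.solve(G, -np.log(S / S0) / b)  # six equations, six unknowns
D_hat = np.array([[d[0], d[3], d[4]],
                  [d[3], d[1], d[5]],
                  [d[4], d[5], d[2]]])
```

With exactly six directions the system is square, so the conditioning of `G` (equivalently, of the diffusion tensor transformation matrix) governs how noise propagates into the tensor, which is why DeepDTI optimizes the direction set.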
This section describes the type 3 category listed in Table 2. In DL-based qMRI with end-to-end mapping, the most commonly used loss function is the difference between the estimated parameter maps and parameter labels. By using the physical model in (2) and imaging model in (5), synthetic contrast images and synthetic k-spaces can be generated from estimated maps [see Figure 4(a) and (b), respectively]. The errors between these synthetic data and the input data can be used as an additional loss term in the network to enforce data consistency (DC) and model fidelity. The formula for this loss term is \[\begin{cases}\begin{array}{c}{\cal{L}} = {\lambda}_{1}{\cal{L}}_{1} + {\lambda}_{2}{\cal{L}}_{2} \\ {\cal{L}}_{1} = {\left\Vert{\hat{\bf{x}}}{-}{\bf{x}}\right\Vert}_{F}^{2} \\ {\cal{L}}_{2} = {\left\Vert{\hat{\rho}}_{t}{-}{\rho}_{t}\right\Vert}_{F}^{2}{\text{ or }}{\left\Vert{\hat{\bf{B}}}_{t}{-}{\bf{B}}_{t}\right\Vert}_{F}^{2}\end{array}\end{cases} \tag{10} \]
Figure 4. The physical model-integrated loss function design with synthetic data in the (a) image domain and (b) ${k}$-space domain. The network is used to generate parameter maps ${\hat{\bf{x}}}$ from input contrast images ${\rho}_{t}$. The conventional loss is the error between ${\hat{\bf{x}}}$ and the labels. The synthetic images ${\hat{\rho}}_{t}$ and ${k}$-space ${\hat{\bf{B}}}_{t}$ can be generated from ${\hat{\bf{x}}}$ according to the physical model and image operator ${\bf{E}}$. The error between the synthetic data and labeled data is used as an additional loss term.
where ${\lambda}_{1}$ and ${\lambda}_{2}$ are regularization parameters that balance the two terms and ${\hat{\bf{x}}}$ is the parameter map estimated by the network; ${\hat{\rho}}_{t} = {\bf{A}}{\left({\hat{\bf{x}}},{t}\right)}$ and ${\hat{\bf{B}}}_{t} = {\bf{E}}{\hat{\rho}}_{t}$ denote the synthetic contrast images and k-space data, respectively.
This strategy also offers the opportunity to train a network for qMRI without labeled parameter maps. In other words, a network can be trained using only ${\cal{L}}_{2}$, eliminating the need for ${\cal{L}}_{1}$, which relies on the labeled parameter maps in (10). The errors between the synthetic data and input data are used to enforce the “self-supervision” of the network. In this scenario, such a method is referred to as self-supervised learning or unsupervised learning. Additional regularizations that do not rely on labels can be added to ${\cal{L}}_{2}$ to improve reconstruction performance further. The workflow of this category is illustrated in Figure 4.
This strategy has been applied to 2D ${T}_{2}$ mapping of the knee in a method called a model-augmented NN with incoherent k-space sampling (MANTIS) [18]. MANTIS directly estimates ${T}_{2}$ maps from undersampled k-space data, using a CNN with the loss function \[{\cal{L}} = {\lambda}_{1}{\left\Vert{\left({\hat{\bf{I}}}_{0},{\hat{\bf{T}}}_{2}\right)}{-}{\left({\bf{I}}_{0},{\bf{T}}_{2}\right)}\right\Vert}_{F}^{2} + {\lambda}_{2}{\left\Vert{\hat{\bf{B}}}_{t}{-}{\bf{B}}_{t}\right\Vert}_{F}^{2} \tag{11} \] where ${\bf{I}}_{0}$ and ${\bf{T}}_{2}$ are the proton density and ${T}_{2}$ maps to be reconstructed, respectively; ${\hat{\bf{I}}}_{0}$ and ${\hat{\bf{T}}}_{2}$ are the outputs of the network; ${\hat{\bf{B}}}_{t} = {\bf{EA}}\left({\left({{\hat{\bf{I}}}_{0},{\hat{\bf{T}}}_{2}}\right),{\bf{TE}}}\right)$ denotes the synthetic k-space data from the forward physical model; ${\bf{E}}$ denotes the imaging operator; ${\bf{A}}$ is the ${T}_{2}$ relaxation model in Table 1; and TE denotes the echo times of ${T}_{2}$ mapping. The first loss term represents the biophysical parameter fidelity, whereas the second term represents the k-space DC.
The unsupervised version of this strategy has also been applied to 2D ${T}_{2}$ mapping and is referred to as reference-free latent map extraction (RELAX) [19]. The loss function of the network is \[{\cal{L}} = {\left\Vert{{\hat{\bf{B}}}_{t}{-}{\bf{B}}_{t}}\right\Vert}_{F}^{2} + {\lambda}{\cal{R}}\left({{\hat{\bf{I}}}_{0},{\hat{\bf{T}}}_{2}}\right), \tag{12} \] where ${\cal{R}}$ is a regularization function of the estimated parameter maps, e.g., the spatial total variation (TV). Similar to (8) in the constrained reconstruction, more than one regularization term (e.g., wavelet and TV) can be implemented jointly in ${\cal{R}}$; ${\lambda}\,{>}\,{0}$ is the regularization parameter.
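A minimal sketch of a RELAX-style self-supervised loss as in (12), assuming a mono-exponential ${T}_{2}$ model, a single-coil Fourier operator, and an anisotropic spatial TV regularizer (all names and operator choices are illustrative, not the authors' implementation):

```python
import numpy as np

def t2_signal(i0, t2, tes):
    # illustrative mono-exponential T2 model standing in for A
    return i0[None] * np.exp(-tes[:, None, None] / t2[None])

def tv(m):
    # anisotropic spatial total variation of a 2D map
    return np.sum(np.abs(np.diff(m, axis=0))) + np.sum(np.abs(np.diff(m, axis=1)))

def relax_loss(i0_hat, t2_hat, b_t, tes, mask, lam=1e-3):
    # L = ||E A(x_hat) - B_t||_F^2 + lam * R(x_hat), as in (12)
    b_hat = mask[None] * np.fft.fft2(t2_signal(i0_hat, t2_hat, tes),
                                     axes=(-2, -1))
    return np.sum(np.abs(b_hat - b_t) ** 2) + lam * (tv(i0_hat) + tv(t2_hat))
```

Note that no labeled parameter maps appear anywhere in this loss; the acquired k-space data ${\bf{B}}_{t}$ alone supervise the network.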
As mentioned in the “Training Sample Generation Using Physical Models” section, MTC-MRF employs a network to estimate MT-related parameters, but this supervised learning method requires a large amount of labeled training data and a long training time. Therefore, an unsupervised learning architecture based on synthetic images [36] was designed to solve the complex nonlinear inverse mapping problem in 3D MTC-MRF. Specifically, MTC-MRF images were fed into a CNN to generate initial quantitative MTC and ${T}_{1}$ maps. Then, these network outputs, additionally acquired ${T}_{2}$ maps, and MRF scan parameters were used to generate synthetic MTC-MRF images through the two-pool Bloch–McConnell simulation. The loss function was defined as the squared error between the acquired and synthetic MTC-MRF images, as follows: \[{\cal{L}} = {\left\Vert{{\hat{\rho}}_{t}{-}{\rho}_{t}}\right\Vert}_{F}^{2} \tag{13} \] where ${\hat{\rho}}_{t}$ denotes the synthetic MTC-MRF images and ${\rho}_{t}$ denotes the acquired images. This method requires only a small amount of unlabeled data for training, and the computation time is reduced by 1,000 times compared to the conventional non-DL approach.
This section introduces the last category in Table 2 for unrolling-based network design. As shown in (7) and (8), the fast qMRI problem can be formulated as an optimization problem in which the parameter map is unknown. Traditionally, this problem has been solved using an iterative algorithm. With DL, iterations can be unrolled into a deep network to learn regularization and data fidelity terms. As detailed in Figure 5, each iteration is unrolled as a network module. The physical model is incorporated into a network module as part of the measurement operator that transfers the parameter maps into the raw data space to enforce DC. This operator is known as the DC layer. A network is typically trained in a supervised manner. Unlike the end-to-end networks used in the previous three categories, the network architectures of unrolling-based qMRI methods rely on optimization algorithms and physical models. In each of the following examples, additional details are introduced based on a specific algorithm and physical model.
Figure 5. The unrolling-based network design for qMRI. Each iteration in traditional iterative reconstruction is unrolled as a network module. The physical model is incorporated as a part of the measurement operator that transfers parameter maps into the raw data space to enforce DC.
Generally, when biological tissue with a susceptibility distribution ${\chi}$ is placed in the main magnetic field, it generates a magnetic field perturbation ${\delta}{\bf{B}}$, which can be observed in MR phase images. Model-based DL (MoDL) for QSM [20] has been proposed to obtain a 2D susceptibility parameter map from a single-orientation MR phase image, based on the susceptibility tensor imaging (STI) model. In the STI model, ${\chi}$ is orientation dependent and can be represented by a second-order symmetric tensor. After a complex derivation, the relationship between ${\delta}{\bf{B}}$ and the main components of ${\chi}$ in single-orientation QSM can be simply modeled as \[{\delta}{\bf{B}} = {\bf{AX}} \tag{14} \] where ${\bf{A}} = {\bf{F}}^{{-}{1}}\left[{{1} / {3}{-}{k}_{z}^{2} / {\left\Vert{\bf{k}}\right\Vert}_{2}^{2},{1}}\right]{\bf{F}}$ is the forward operator matrix, ${\bf{F}}^{{-}{1}}$ is the inverse Fourier transform, ${\bf{k}} = {\left[{{k}_{x},{k}_{y},{k}_{z}}\right]}^{T}$ is the spatial frequency vector, and ${\bf{X}}$ contains one diagonal term of ${\chi}$ ${(}{\chi}_{33}{)}$ that represents the true susceptibility and the field perturbation ${\delta}{\bf{B}}{'}$ derived from the off-diagonal tensor terms of ${\chi}$. To solve for ${\bf{X}}$ from ${\delta}{\bf{B}}$, (7) is reformulated as follows in MoDL-QSM: \[{\hat{\bf{X}}} = \mathop{\arg\min}\limits_{{\bf{X}}\,{\in}\,{\Bbb{C}}^{{N}\,{\times}\,{2}}}\frac{1}{2}{\left\Vert{{\bf{AX}}{-}{\delta}{\bf{B}}}\right\Vert}_{F}^{2} + {\cal{R}}{\left({\bf{X}}\right)}{.} \tag{15} \]
By using the proximal gradient descent algorithm, (15) can be solved iteratively with respect to ${\bf{X}}$ in the kth iteration, which is expressed as \begin{align*}{\hat{\bf{X}}}_{{k} + {1}} & = {\text{Prox}}_{{\alpha}_{k},{\cal{R}}}\left({{{\hat{\bf{X}}}}_{k}{-}{\alpha}_{k}{\bf{A}}^{\text{H}}\left({{\bf{A}}{{\hat{\bf{X}}}}_{k}{-}{\delta}{\bf{B}}}\right)}\right) \\ & = {\text{Prox}}_{{\alpha}_{k},{\cal{R}}}\left({\left({{\bf{I}}{-}{\alpha}_{k}{\bf{A}}^{\text{H}}{\bf{A}}}\right){{\hat{\bf{X}}}}_{k} + {\alpha}_{k}{\bf{A}}^{\text{H}}{\delta}{\bf{B}}}\right) \tag{16} \end{align*} where ${\alpha}_{k}$ is the step size in the kth iteration, ${\bf{A}}^{\text{H}}$ is the conjugate transpose of ${\bf{A}}$, and ${\text{Prox}}_{{\alpha}_{k},{\cal{R}}}{\left({\cdot}\right)}$ is the proximal operator of the regularization term ${\cal{R}}{\left({\bf{X}}\right)}$.
Equation (16) is unrolled for three iterations. In each iteration, a CNN with learnable parameters replaces the proximal operator ${\text{Prox}}_{{\alpha}_{k},{\cal{R}}}{\left({\cdot}\right)}$. The CNNs for all three iterations share their weights, and the output of each CNN is passed through the operation defined by the physical model. The final output of MoDL-QSM consists of two channels, namely, the susceptibility distribution and ${\delta}{\bf{B}}{'}$. Figure 6 presents the architecture of MoDL-QSM.
Figure 6. The architecture of MoDL-QSM. Here, ${\cal{C}}$ represents a CNN. The output of each CNN performs the operation defined by the physical model, which is labeled ${\cal{P}}$.
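One iteration of (16) can be sketched numerically as follows, with a simplified 2D version of the operator in (14) (the actual MoDL-QSM operator is 3D) and a placeholder for the shared-weight CNN proximal operator; all names are illustrative:

```python
import numpy as np

def dipole_term(n):
    # k-space factor 1/3 - kz^2/||k||^2 (2D sketch; kz along axis 0,
    # with the value at k = 0 set to zero by convention)
    kz = np.fft.fftfreq(n)[:, None]
    ky = np.fft.fftfreq(n)[None, :]
    k2 = kz ** 2 + ky ** 2
    safe = np.where(k2 > 0, k2, 1.0)
    return np.where(k2 > 0, 1.0 / 3.0 - kz ** 2 / safe, 0.0)

def A(x, d):
    # forward model (14): x = (chi33, dB'); dB = F^{-1}(d * F chi33 + F dB')
    chi, db_prime = x
    return np.fft.ifft2(d * np.fft.fft2(chi) + np.fft.fft2(db_prime))

def AH(y, d):
    # adjoint of A, mapping a field map back to the two channels of X
    fy = np.fft.fft2(y)
    return np.stack([np.fft.ifft2(np.conj(d) * fy), np.fft.ifft2(fy)])

def unrolled_step(x, db, d, alpha, prox):
    # one iteration of (16); prox stands in for the shared-weight CNN
    return prox(x - alpha * AH(A(x, d) - db, d))
```

Unrolling then amounts to chaining three such steps with a learned `prox`, which is exactly the structure depicted in Figure 6.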
In contrast to the commonly used linear-susceptibility-to-field relationship in QSM, VaNDI-QSM [3] considers a nonlinear forward model based on the fact that the phase noise distribution deviates from the Gaussian distribution, particularly in low-SNR regions.
With a nonlinear fidelity term, (7) is rewritten in VaNDI-QSM as \[\mathop{\min}\limits_{\chi}\left\{{\frac{\lambda}{2}{\left\Vert{{\bf{W}}\left({{e}^{{j}{\Bbb{D}}{\chi}}{-}{e}^{{j}{\phi}}}\right)}\right\Vert}_{F}^{2} + {\cal{R}}{\left({\chi}\right)}}\right\} \tag{17} \] where ${\bf{W}}$ is the noise-weighting factor, ${\phi}$ is the tissue phase, ${\Bbb{D}} = {\bf{F}}^{{-}{1}}{\bf{dF}}$, ${\bf{d}}$ is the dipole kernel, and ${\bf{F}}$ denotes the Fourier transform.
VaNDI-QSM uses a variational network (VN) to combine DL with the nonlinear QSM model. In the VN framework, the regularization term ${\cal{R}}{\left({\chi}\right)} = {\Sigma}_{{i} = {1}}^{{N}_{k}}{\left\langle{\Psi}^{i}\left({{\bf{K}}^{i}{\chi}}\right),{1}\right\rangle}$ is a generalization of the fields of experts model, where ${\bf{K}}^{i}$ denotes a linear operator and ${\Psi}^{i}$ is a nonlinear potential function. By unrolling the gradient descent algorithm, (17) can be solved using learned variational regularizers, as follows: \[{\chi}_{{k} + {1}} = {\chi}_{k}{-} \mathop{\sum}\limits_{{i} = {1}}\limits^{{N}_{k}}{\left({\bf{K}}_{k}^{i}\right)}^{\text{H}}{\Psi}_{k}^{i'}{\left({\bf{K}}_{k}^{i}{\chi}_{k}\right)}{-}{\lambda}_{k}{\Bbb{D}}^{\text{H}}{\bf{W}}^{\text{H}}{\bf{W}}{\sin}{\left({\Bbb{D}}{\chi}_{k}{-}{\phi}\right)} \tag{18} \] where the nonlinear activation function ${\Psi}_{k}^{i'}$ denotes the first derivative of the potential function ${\Psi}_{k}^{i}$. The parameters ${\bf{K}}_{k}^{i}$, ${\Psi}_{k}^{i'}$, and ${\lambda}_{k}$ are iteration dependent, and ${\lambda}_{k}{\Bbb{D}}^{\text{H}}{\bf{W}}^{\text{H}}{\bf{W}}{\sin}{\left({\Bbb{D}}{\chi}_{k}{-}{\phi}\right)}$ is the analytical gradient of the data fidelity term (for additional details, please refer to [3]). The VaNDI-QSM architecture is provided in Figure 7.
Figure 7. The architecture of VaNDI-QSM. This method unrolls the gradient descent algorithm into deep networks. The nonlinear dipole inversion updating rule with the learnable regularization parameter ${\lambda}_{k}$ is integrated into the network to enforce QSM model fidelity. In the regularization term, 3D convolutions (filter ${\bf{K}}_{k}^{i}$) and nonlinear activations ${\Psi}_{k}^{i'}$ are learned for each gradient descent step ${\text{GD}}_{k}$.
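One gradient step of (18) can be sketched as follows, using a single fixed k-space filter in place of the learned ${\bf{K}}_{k}^{i}$ and ${\tanh}$ as the potential derivative ${\Psi}_{k}^{i'}$; these are illustrative assumptions, not the trained VN:

```python
import numpy as np

def vn_step(chi, phi, d, w, k_filt, lam):
    # one gradient-descent step of (18): regularizer gradient plus the
    # nonlinear dipole-inversion data fidelity gradient
    F, iF = np.fft.fft2, np.fft.ifft2
    kx = np.real(iF(k_filt * F(chi)))                     # K chi
    reg = np.real(iF(np.conj(k_filt) * F(np.tanh(kx))))   # K^H psi'(K chi)
    field = np.real(iF(d * F(chi))) - phi                 # D chi - phi
    fidelity = np.real(iF(np.conj(d) * F(w * w * np.sin(field))))
    return chi - reg - lam * fidelity                     # chi_{k+1}
```

In the actual VN, `k_filt` (as 3D convolutions), the activation, and `lam` are all learned separately for each gradient descent step ${\text{GD}}_{k}$.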
A deep model-based MR parameter mapping network (DOPAMINE) [21] was developed for fast 3D ${T}_{1}$ mapping of the brain, using a variable FA (VFA) gradient echo sequence. In DOPAMINE, a parameter mapping network first estimates initial parameter maps from undersampled k-space data, and an artifact-removal network then removes the aliasing artifacts in the parameter maps. An interleaved DC layer is embedded in the artifact-removal network to guarantee data fidelity.
Specifically, DOPAMINE represents the regularization term in (8) as \[{\cal{R}}{\left({\bf{X}}\right)} = {\left\Vert{{\bf{X}}{-}{\cal{D}}_{\text{R}}{\left({\bf{X}}\right)}}\right\Vert}_{2}^{2} \tag{19} \] where ${\cal{D}}_{\text{R}}{\left({\bf{X}}\right)}{:}{\Bbb{C}}^{{N}\,{\times}\,{1}}\,{\mapsto}\,{\Bbb{C}}^{{N}\,{\times}\,{1}}$ denotes a CNN denoiser. Then, (8) can be rewritten as follows: \[{\hat{\bf{X}}} = \mathop{\arg\min}\limits_{\bf{X}}\frac{1}{2}{\left\Vert{{\bf{A}}{\left({\bf{X}}\right)}{-}{\bf{B}}_{t}}\right\Vert}_{F}^{2} + {\lambda}{\left\Vert{{\bf{X}}{-}{\cal{D}}_{\text{R}}{\left({\bf{X}}\right)}}\right\Vert}_{2}^{2} \tag{20} \] where ${\bf{A}}$ is the VFA ${T}_{1}$ model, as shown in the first row in Table 1. The least-squares problem in (20) is solved using an iterative algorithm based on the gradient descent method. The gradient of ${\cal{D}}_{\text{R}}{\left({\bf{X}}\right)}$ can be approximated as zero for a small perturbation of ${\bf{X}}$. Therefore, the following equation can be obtained: \begin{align*}{\bf{X}}_{{k} + {1}} & = {\bf{X}}_{k}{-}{2}{\mu}_{k}{\left[{\bf{J}}_{\bf{A}}^{\text{H}}{\left({\bf{X}}_{k}\right)}{\left({\bf{A}}{\left({\bf{X}}_{k}\right)}{-}{\bf{B}}_{t}\right)} + {\lambda}_{k}{\left({\bf{X}}_{k}{-}{\cal{D}}_{\text{R}}{\left({\bf{X}}_{k}\right)}\right)}\right]} \\ & = {\left({1}{-}{2}{\lambda}_{k}{\mu}_{k}\right)}{\bf{X}}_{k} + {2}{\lambda}_{k}{\mu}_{k}{\cal{D}}_{\text{R}}{\left({\bf{X}}_{k}\right)} \\ & \quad {-}{2}{\mu}_{k}{\bf{J}}_{\bf{A}}^{\text{H}}{\left({\bf{X}}_{k}\right)}{\left({\bf{A}}{\left({\bf{X}}_{k}\right)}{-}{\bf{B}}_{t}\right)} \tag{21} \end{align*} where ${\bf{J}}_{\bf{A}}$ is the Jacobian matrix of ${\bf{A}}$, ${\bf{J}}_{\bf{A}}^{\text{H}}$ is the conjugate transpose of ${\bf{J}}_{\bf{A}}$, and ${\lambda}_{k}$ and ${\mu}_{k}$ are the trainable parameters for the regularization and step size in iteration ${k}$, respectively.
Because the forward models for both the imaging process and the parameter mapping are known, the Jacobian matrix can be calculated analytically.
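The gradient-descent update in (21) can be sketched as follows, with a simplified exponential signal model standing in for the VFA ${T}_{1}$ model in Table 1 (the imaging operator ${\bf{E}}$ is omitted, the denoiser is a placeholder for the CNN ${\cal{D}}_{\text{R}}$, and all names are illustrative):

```python
import numpy as np

def forward(i0, t1, ts):
    # simplified exponential signal model A(X) used here in place of the
    # VFA T1 signal equation (real-valued, voxel-wise)
    return i0[None] * np.exp(-ts[:, None] / t1[None])

def jacobian_H(i0, t1, ts, r):
    # analytic J_A^H applied to the residual r (derivatives of the
    # exponential model w.r.t. I0 and T1, summed over measurements)
    e = np.exp(-ts[:, None] / t1[None])
    g_i0 = np.sum(e * r, axis=0)
    g_t1 = np.sum(i0[None] * (ts[:, None] / t1[None] ** 2) * e * r, axis=0)
    return np.stack([g_i0, g_t1])

def dopamine_step(x, b_t, ts, denoiser, lam, mu):
    # one iteration of (21); x = (I0, T1), denoiser stands in for D_R
    i0, t1 = x
    grad_dc = jacobian_H(i0, t1, ts, forward(i0, t1, ts) - b_t)
    return (1 - 2 * lam * mu) * x + 2 * lam * mu * denoiser(x) - 2 * mu * grad_dc
```

Because the model derivatives are available in closed form, no numerical differentiation is needed inside the DC layer, which keeps the unrolled network efficient to train.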
The overall architecture of DOPAMINE is illustrated in Figure 8. The initial ${\bf{X}}_{1} = {\left({\bf{I}}_{0},{\bf{T}}_{1}\right)}$ is generated by the mapping network from zero-filling ${T}_{1}{-}{\text{weighted}}$ images. The network ${\cal{D}}_{\text{R}}$ serves as a CNN-based denoiser in the DOPAMINE framework, and the physical model is incorporated into the operation of ${\bf{J}}_{\bf{A}}^{\text{H}}$ in the model-based DC layer, as represented in (21).
Figure 8. The overall architecture of DOPAMINE. The initial ${\bf{X}}_{1}$ is generated by the mapping network from zero-filling ${T}_{1}$-weighted images. The network ${\cal{D}}_{\text{R}}$ serves as a CNN-based denoiser, and the physical model is incorporated into the operation of ${\bf{J}}_{\bf{A}}^{\text{H}}$ in the DC layer.
The lack of theoretical analysis is one of the major drawbacks of DL-based computational imaging methods. It is assumed that black-box DL networks can learn the nonlinear relationships between low-quality MR images and reference parameter maps. Many studies [37] have employed such black-box networks for qMRI, due to their straightforward implementation. However, the performance of a network strongly depends on the training data and network architecture. Additionally, there is no guarantee that the mapping established by a deep network is consistent with the true physical model. Using physical models as prior information helps maintain consistency with the acquired data and provides partial interpretability for the reconstruction. Compared to a black-box network without a physical model, a physics-driven network exhibits better performance in terms of both quantitative accuracy and artifact suppression [21].
In DL-based qMRI applications, NNs are trained to map inputs to predictions. However, there is limited insight into the accuracy and reliability of these predictions. Quantifying the uncertainty of predictions is critical in medical applications because undetected failure cases may alter diagnoses and result in distrust from potential users. In previous studies on uncertainty estimation in MRI reconstruction, uncertainty was measured in a pixel-wise manner based on the standard deviation or variance of the posterior distribution of the desired reconstructed images. The key step in this process is to introduce randomness to characterize the posterior distribution. Randomness is typically introduced in three different ways: 1) in network nodes, 2) in network parameters, and 3) via the Markov chain Monte Carlo sampling algorithm.
In MRI reconstruction, variational autoencoders (VAEs) [38], Bayesian networks [39], and score matching-based methods [40] have been applied to explore uncertainty in DL-based MR image recovery. VAEs use the first of the aforementioned ways to introduce randomness: they treat a network node as a random variable and then approximate the posterior distribution of the desired image through variational inference. Bayesian networks belong to the second category, where a multivariate Gaussian distribution is used to approximate the true posterior distribution of the network parameters through variational inference. The variance is then calculated by sampling from the approximated Gaussian distribution to estimate the uncertainty. The score matching-based method estimates the prior of the image to be reconstructed and then uses the Markov chain Monte Carlo sampling algorithm to generate samples randomly according to the posterior distribution of the desired image.
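The common pixel-wise recipe shared by all three approaches — draw samples from the (approximate) posterior and compute per-pixel statistics — can be sketched generically; the sampling function below stands in for whichever randomness mechanism is used:

```python
import numpy as np

def pixelwise_uncertainty(sample_fn, n_samples=200):
    # Monte Carlo estimate of the pixel-wise posterior mean and standard
    # deviation; sample_fn() returns one posterior sample of the image
    samples = np.stack([sample_fn() for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)
```

The returned standard-deviation map is the pixel-wise uncertainty estimate; regions of high standard deviation flag pixels where the reconstruction should not be trusted.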
Unfortunately, very few studies have investigated the uncertainty of DL-based parameter fitting in qMRI. Glang et al. added a probabilistic layer as the output of the network to quantify the uncertainty in estimated CEST parameters (DeepCEST) by implicitly learning the output variance through maximum likelihood estimation [41]. However, DeepCEST does not consider the conditional probability of the CEST physical model and prior representation of CEST parameters in probabilistic modeling. Therefore, it is worth investigating a more accurate deep probability model conditioned on qMRI physics and accurately learned priors.
DL-based reconstruction leverages historical data to train NNs and then uses the trained networks to make predictions on new, unseen data. However, the training and testing datasets must exhibit similar statistical features; otherwise, a trained network may fail in the testing phase when encountering unfamiliar data. For DL-based qMRI, condition variations occur at several different levels, including sequence parameter changes, physical model changes, and vendor and site changes.
The inputs for a qMRI network are typically designed for a specific series of sequence parameters, such as the diffusion gradient scheme in DTI and offset frequencies in MT and CEST imaging. If the sequence parameters change, even if only the number of acquired images changes, then a network may not perform well. Park et al. [45] used a preprocessing method to generalize a diffusion parameter mapping network with different diffusion gradient schemes and b values. In particular, diffusion signals were normalized in the q-space and then projected and quantized, producing a matrix as an input for the network to eliminate the differences introduced by different imaging parameters. Similar strategies may be extended to other qMRI applications using preprocessing methods designed according to specific physics.
Various physical models have been derived under different assumptions for certain biophysical parameters. For example, relaxation models assume that a signal originates from a single component, whereas in vivo tissue typically consists of multiple components. Therefore, the monoexponential model of relaxation can be extended to biexponential and multiexponential models. Complex physical models typically involve additional unknown biophysical parameters, so a network trained on a simpler model cannot capture the new features. In a scenario with a changing physical model, the strategy described in the “Training Sample Generation Using Physical Models” section for generating training samples based on a physical model may be a promising solution. Other condition changes, such as vendor and site changes, arise in scenarios similar to those in DL-based MR image recovery. Transfer learning can be adopted to adapt a trained network to a different problem.
Image acquisition plays a vital role in scanning efficiency and the quantitative accuracy of qMRI. This review focuses on the reconstruction aspect, but DL-based methods make it possible to optimize image acquisition and reconstruction jointly. Perlman et al. [46] developed a DL-based framework to generate optimized acquisition protocols and corresponding quantitative maps for CEST imaging (AutoCEST). Specifically, AutoCEST is composed of a CEST saturation block, spin dynamics module, and DRONE, all of which are differentiable and jointly connected. The first two blocks are represented as computational graphs based on the analytical physical model of CEST imaging, allowing the updating of sequence parameters using autodifferentiation. The simulated magnetization from the first two blocks is passed to the reconstruction network to obtain quantitative CEST parameter maps. Given an expected range of CEST parameter values, AutoCEST outputs a refined set of acquisition parameters (e.g., FA, saturation frequency, and power) as well as the optimized weights of the reconstruction network.
Another method for jointly optimizing these two aspects is to consider the effects of the k-space trajectory on reconstruction performance when recovering MR images from undersampled k-space data. Cartesian sampling is the most commonly used trajectory in MR imaging, whereas in qMRI, fast acquisitions using non-Cartesian trajectories are often employed. For example, MRF uses spiral and radial trajectories for ultrafast imaging at each time point to record signal evolution. DL approaches have also been developed to optimize 2D radial trajectories and related image reconstruction jointly [47], [48]. When using a non-Cartesian trajectory, a density compensation step should be incorporated to compensate for variable density sampling [49]. qMRI applications with non-Cartesian trajectories may benefit from the results of these studies.
DL also provides an elegant means of translating certain qMRI methods. Chen et al. [50] presented the first case of phosphocreatine (PCr) mapping of human skeletal muscle, using a clinical 3-T MRI system with artificial NN-based CEST imaging. PCr plays a vital role in cellular energy transport, particularly in skeletal and cardiac muscles, which have high energy demands. However, the translation of PCr mapping from ultrahigh-field animal studies to clinical practice with lower field strengths (1.5 and 3 T) is hindered by reduced contrast-to-noise ratios and frequency shifts at lower field strengths. With help from a deep network, CEST exhibits potential as a method for measuring PCr and diagnosing related diseases. Another example is the DL-aided CEST application [51], which aims to quantify apoptosis following oncolytic virotherapy. Apoptosis is considered an early predictor of cancer therapy outcomes prior to a visible reduction in tumor volume. The authors developed CEST-MRF under the DRONE framework for quantitative and rapid molecular imaging of treatment responses to oncolytic virotherapy and evaluated their approach in mice. This method was also translated into a clinical scanner to image a human subject. Although they are still far from clinical use, these pilot studies demonstrate the excellent potential of DL-based qMRI.
In conclusion, physical models are playing an increasingly important role in DL-based fast qMRI reconstructions. Despite the success of existing methods, there are still many open questions related to modeling and theoretical analysis. The generalization, uncertainty, and optimization of physics-driven DL-based qMRI require further investigation by the signal processing community.
This work was supported, in part, by the National Key R&D Program of China, under grants 2020YFA0712200 and 2021YFF0501503; National Natural Science Foundation of China, under grants 12226008, 81971611, 62106252, 62125111, 12026603, 62206273, and U21A6005; and Shenzhen Science and Technology Program, under grant RCYX20210609104444089. Dong Liang is the corresponding author.
Yanjie Zhu (yj.zhu@siat.ac.cn) received her Ph.D. degree in circuits and systems from the Shanghai Institute of Technical Physics, Chinese Academy of Sciences, in 2011. She is currently a full professor with the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China. Her research interests include magnetic resonance sequence design, image reconstruction, and machine learning. She is a Member of IEEE.
Jing Cheng (jing.cheng@siat.ac.cn) received her Ph.D. degree in pattern recognition and intelligent systems from the University of Chinese Academy of Sciences in 2020. From 2020 to 2022, she was a postdoctoral researcher with the Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China, where she is currently a research associate. Her research interests include compressed sensing, image reconstruction, and machine learning.
Zhuo-Xu Cui (zx.cui@siat.ac.cn) received his Ph.D. degree from the School of Mathematics and Statistics, Wuhan University, in 2020. He is currently a research associate with the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China. His research interests include image reconstruction, inverse problems, and statistical machine learning.
Qingyong Zhu (qy.zhu@siat.ac.cn) received his Ph.D. degree in computational mathematics from Xi’an Jiaotong University in 2020. From 2020 to 2022, he was a postdoctoral researcher with the Research Center for Medical AI, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China, where he now serves as an assistant professor. His research interests include computational imaging and inverse problems.
Leslie Ying (leiying@buffalo.edu) received her Ph.D. degree in electrical engineering from the University of Illinois at Urbana-Champaign in 2003. She is currently a Clifford C. Furnas Chair Professor of Biomedical Engineering and Electrical Engineering at the State University of New York at Buffalo, Buffalo, NY 14222 USA. Her research interests include magnetic resonance imaging, image reconstruction, compressed sensing, and machine learning. She is a Senior Member of IEEE.
Dong Liang (dong.liang@siat.ac.cn) received his Ph.D. degree in pattern recognition and intelligent systems from Shanghai Jiaotong University in 2006. He is currently a full professor at the Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China. His research interests include compressed sensing, image reconstruction, biomedical imaging, and machine learning. He is a Senior Member of IEEE.
[1] H. L. Margaret Cheng et al., “Practical medical applications of quantitative MR relaxometry,” J. Magn. Reson. Imag., vol. 36, no. 4, pp. 805–824, Oct. 2012, doi: 10.1002/jmri.23718.
[2] H. Li et al., “SuperDTI: Ultrafast DTI and fiber tractography with deep learning,” Magn. Reson. Med., vol. 86, no. 6, pp. 3334–3347, Dec. 2021, doi: 10.1002/mrm.28937.
[3] D. Polak et al., “Nonlinear dipole inversion (NDI) enables robust quantitative susceptibility mapping (QSM),” NMR Biomed., vol. 33, no. 12, Dec. 2020, Art. no. e4271, doi: 10.1002/nbm.4271.
[4] D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006, doi: 10.1109/TIT.2006.871582.
[5] D. Liang, B. Liu, J. Wang, and L. Ying, “Accelerating SENSE using compressed sensing,” Magn. Reson. Med., vol. 62, no. 6, pp. 1574–1584, Dec. 2009, doi: 10.1002/mrm.22161.
[6] M. Lustig, D. Donoho, and J. M. Pauly, “Sparse MRI: The application of compressed sensing for rapid MR imaging,” Magn. Reson. Med., vol. 58, no. 6, pp. 1182–1195, Dec. 2007, doi: 10.1002/mrm.21391.
[7] D. Lee et al., “Acceleration of MR parameter mapping using annihilating filter-based low rank Hankel matrix (ALOHA),” Magn. Reson. Med., vol. 76, no. 6, pp. 1848–1864, Dec. 2016, doi: 10.1002/mrm.26081.
[8] B. Zhao, F. Lam, and Z.-P. Liang, “Model-based MR parameter mapping with sparsity constraints: Parameter estimation and performance bounds,” IEEE Trans. Med. Imag., vol. 33, no. 9, pp. 1832–1844, Sep. 2014, doi: 10.1109/TMI.2014.2322815.
[9] B. Zhu et al., “Image reconstruction by domain-transform manifold learning,” Nature, vol. 555, no. 7697, pp. 487–492, Mar. 2018, doi: 10.1038/nature25988.
[10] M. Mardani et al., “Deep generative adversarial neural networks for compressive sensing MRI,” IEEE Trans. Med. Imag., vol. 38, no. 1, pp. 167–179, Jan. 2019, doi: 10.1109/TMI.2018.2858752.
[11] B. Wen et al., “Transform learning for magnetic resonance image reconstruction: From model-based learning to building neural networks,” IEEE Signal Process. Mag., vol. 37, no. 1, pp. 41–53, Jan. 2020, doi: 10.1109/MSP.2019.2951469.
[12] D. Lee et al., “Deep residual learning for accelerated MRI using magnitude and phase networks,” IEEE Trans. Biomed. Eng., vol. 65, no. 9, pp. 1985–1995, Sep. 2018, doi: 10.1109/TBME.2018.2821699.
[13] C. Cai et al., “Single-shot T2 mapping using overlapping-echo detachment planar imaging and a deep convolutional neural network,” Magn. Reson. Med., vol. 80, no. 5, pp. 2202–2214, Nov. 2018, doi: 10.1002/mrm.27205.
[14] O. Cohen, B. Zhu, and M. S. Rosen, “MR fingerprinting deep reconstruction network (DRONE),” Magn. Reson. Med., vol. 80, no. 3, pp. 885–894, Sep. 2018, doi: 10.1002/mrm.27198.
[15] H. M. Luu et al., “qMTNet: Accelerated quantitative magnetization transfer imaging with artificial neural networks,” Magn. Reson. Med., vol. 85, no. 1, pp. 298–308, Jan. 2021, doi: 10.1002/mrm.28411.
[16] Y. Li et al., “Accelerating GluCEST imaging using deep learning for B0 correction,” Magn. Reson. Med., vol. 84, no. 4, pp. 1724–1733, Oct. 2020, doi: 10.1002/mrm.28289.
[17] Q. Tian et al., “DeepDTI: High-fidelity six-direction diffusion tensor imaging using deep learning,” NeuroImage, vol. 219, Nov. 2020, Art. no. 117017, doi: 10.1016/j.neuroimage.2020.117017.
[18] F. Liu, L. Feng, and R. Kijowski, “MANTIS: Model-augmented neural neTwork with Incoherent k-space sampling for efficient MR parameter mapping,” Magn. Reson. Med., vol. 82, no. 1, pp. 174–188, Mar. 2019, doi: 10.1002/mrm.27707.
[19] F. Liu et al., “Magnetic resonance parameter mapping using model-guided self-supervised deep learning,” Magn. Reson. Med., vol. 85, no. 6, pp. 3211–3226, Jun. 2021, doi: 10.1002/mrm.28659.
[20] R. Feng et al., “MoDL-QSM: Model-based deep learning for quantitative susceptibility mapping,” NeuroImage, vol. 240, Oct. 2021, Art. no. 118376, doi: 10.1016/j.neuroimage.2021.118376.
[21] Y. Jun et al., “Deep model-based magnetic resonance parameter mapping network (DOPAMINE) for fast T1 mapping using variable flip angle method,” Med. Image Anal., vol. 70, May 2021, Art. no. 102017, doi: 10.1016/j.media.2021.102017.
[22] Z. Ke et al., “Learned low-rank priors in dynamic MR imaging,” IEEE Trans. Med. Imag., vol. 40, no. 12, pp. 3698–3710, Dec. 2021, doi: 10.1109/TMI.2021.3096218.
[23] Y. Yang et al., “Deep ADMM-net for compressive sensing MRI,” in Proc. Adv. Neural Inf. Process. Syst., 2016, vol. 29, pp. 10–18.
[24] D. Ma et al., “Magnetic resonance fingerprinting,” Nature, vol. 495, no. 7440, pp. 187–192, Mar. 2013, doi: 10.1038/nature11971.
[25] L. Leroi et al., “Simultaneous multi-parametric mapping of total sodium concentration, T1, T2 and ADC at 7 T using a multi-contrast unbalanced SSFP,” Magn. Reson. Imag., vol. 53, pp. 156–163, Nov. 2018, doi: 10.1016/j.mri.2018.07.012.
[26] A. Sbrizzi et al., “Fast quantitative MRI as a nonlinear tomography problem,” Magn. Reson. Imag., vol. 46, pp. 56–63, Feb. 2018, doi: 10.1016/j.mri.2017.10.015.
[27] R. G. Spencer and C. Bi, “A tutorial introduction to inverse problems in magnetic resonance,” NMR Biomed., vol. 33, no. 12, Dec. 2020, Art. no. e4315, doi: 10.1002/nbm.4315.
[28] L. El Gueddari et al., “Calibration-less multi-coil compressed sensing magnetic resonance image reconstruction based on OSCAR regularization,” J. Imag., vol. 7, no. 3, Mar. 2021, Art. no. 58, doi: 10.3390/jimaging7030058.
[29] C. Cai et al., “Single-shot T2 mapping through overlapping-echo detachment (OLED) planar imaging,” IEEE Trans. Biomed. Eng., vol. 64, no. 10, pp. 2450–2461, Oct. 2017, doi: 10.1109/TBME.2017.2661840.
[30] J. Zhang et al., “Robust single-shot T2 mapping via multiple overlapping-echo acquisition and deep neural network,” IEEE Trans. Med. Imag., vol. 38, no. 8, pp. 1801–1811, Aug. 2019, doi: 10.1109/TMI.2019.2896085.
[31] S. Li et al., “A simultaneous multi-slice T2 mapping framework based on overlapping-echo detachment planar imaging and deep learning reconstruction,” Magn. Reson. Med., vol. 87, no. 5, pp. 2239–2253, May 2022, doi: 10.1002/mrm.29128.
[32] Q. Yang et al., “Model-based syntheTic data-driven learning (MOST-DL): Application in single-shot T2 mapping with severe head motion using overlapping-echo acquisition,” IEEE Trans. Med. Imag., vol. 41, no. 11, pp. 3167–3181, Nov. 2022, doi: 10.1109/TMI.2022.3179981.
[33] B. Kim et al., “A deep learning approach for magnetization transfer contrast MR fingerprinting and chemical exchange saturation transfer imaging,” NeuroImage, vol. 221, Nov. 2020, Art. no. 117165, doi: 10.1016/j.neuroimage.2020.117165.
[34] J. I. Hamilton et al., “Deep learning reconstruction for cardiac magnetic resonance fingerprinting T1 and T2 mapping,” Magn. Reson. Med., vol. 85, no. 4, pp. 2127–2135, Apr. 2021, doi: 10.1002/mrm.28568.
[35] G. Daval-Frérot et al., “Iterative static field map estimation for off-resonance correction in non-Cartesian susceptibility weighted imaging,” Magn. Reson. Med., vol. 88, no. 4, pp. 1592–1607, Oct. 2022, doi: 10.1002/mrm.29297.
[36] B. Kang et al., “Unsupervised learning for magnetization transfer contrast MR fingerprinting: Application to CEST and nuclear Overhauser enhancement imaging,” Magn. Reson. Med., vol. 85, no. 4, pp. 2040–2054, Apr. 2021, doi: 10.1002/mrm.28573.
[37] Y. Chen et al., “High-resolution 3D MR fingerprinting using parallel imaging and deep learning,” NeuroImage, vol. 206, Feb. 2020, Art. no. 116329, doi: 10.1016/j.neuroimage.2019.116329.
[38] V. Edupuganti et al., “Uncertainty quantification in deep MRI reconstruction,” IEEE Trans. Med. Imag., vol. 40, no. 1, pp. 239–250, Jan. 2021, doi: 10.1109/TMI.2020.3025065.
[39] D. Narnhofer et al., “Bayesian uncertainty estimation of learned variational MRI reconstruction,” IEEE Trans. Med. Imag., vol. 41, no. 2, pp. 279–291, Feb. 2022, doi: 10.1109/TMI.2021.3112040.
[40] Z. Ramzi et al., “Denoising score-matching for uncertainty quantification in inverse problems,” in Proc. Workshop Deep Learn. Inverse Probl. (NeurIPS), 2020.
[41] F. Glang et al., “DeepCEST 3T: Robust MRI parameter determination and uncertainty quantification with neural networks-application to CEST imaging of the human brain at 3T,” Magn. Reson. Med., vol. 84, no. 1, pp. 450–466, Jul. 2020, doi: 10.1002/mrm.28117.
[42] Y. Zhu et al., “Integrated motion correction and dictionary learning for free-breathing myocardial T1 mapping,” Magn. Reson. Med., vol. 81, no. 4, pp. 2644–2654, Apr. 2019, doi: 10.1002/mrm.27579.
[43] F. Knoll et al., “A model-based reconstruction for undersampled radial spin-echo DTI with variational penalties on the diffusion tensor,” NMR Biomed., vol. 28, no. 3, pp. 353–366, Mar. 2015, doi: 10.1002/nbm.3258.
[44] X. Peng et al., “Accelerated exponential parameterization of T2 relaxation with model-driven low rank and sparsity priors (MORASA),” Magn. Reson. Med., vol. 76, no. 6, pp. 1865–1878, Jan. 2016, doi: 10.1002/mrm.26083.
[45] J. Park et al., “DIFFnet: Diffusion parameter mapping network generalized for input diffusion gradient schemes and b-value,” IEEE Trans. Med. Imag., vol. 41, no. 2, pp. 491–499, Feb. 2022, doi: 10.1109/TMI.2021.3116298.
[46] O. Perlman et al., “An end-to-end AI-based framework for automated discovery of rapid CEST/MT MRI acquisition protocols and molecular parameter quantification (AutoCEST),” Magn. Reson. Med., vol. 87, no. 6, pp. 2792–2810, Jun. 2022, doi: 10.1002/mrm.29173.
[47] G. Wang et al., “B-spline parameterized joint optimization of reconstruction and K-space trajectories (BJORK) for accelerated 2D MRI,” IEEE Trans. Med. Imag., vol. 41, no. 9, pp. 2318–2330, Sep. 2022, doi: 10.1109/TMI.2022.3161875.
[48] T. Weiss et al., “PILOT: Physics-informed learned optimized trajectories for accelerated MRI,” J. Mach. Learn. Biomed. Imag., vol. 1, pp. 1–23, Apr. 2021.
[49] Z. Ramzi et al., “NC-PDNet: A density-compensated unrolled network for 2D and 3D non-Cartesian MRI reconstruction,” IEEE Trans. Med. Imag., vol. 41, no. 7, pp. 1625–1638, Jul. 2022, doi: 10.1109/TMI.2022.3144619.
[50] L. Chen et al., “In vivo imaging of phosphocreatine with artificial neural networks,” Nature Commun., vol. 11, no. 1, Feb. 2020, Art. no. 1072, doi: 10.1038/s41467-020-14874-0.
[51] O. Perlman et al., “Quantitative imaging of apoptosis following oncolytic virotherapy by magnetic resonance fingerprinting aided by deep learning,” Nature Biomed. Eng., vol. 6, no. 5, pp. 648–657, May 2022, doi: 10.1038/s41551-021-00809-7.
Digital Object Identifier 10.1109/MSP.2023.3236483