Wenjun Xia, Hongming Shan, Ge Wang, Yi Zhang
Since 2016, deep learning (DL) has advanced tomographic imaging with remarkable successes, especially in low-dose computed tomography (LDCT) imaging. Despite being driven by big data, LDCT denoising and pure end-to-end reconstruction networks often suffer from their black-box nature and issues such as instability, which are major barriers to applying DL methods in LDCT applications. An emerging trend is to integrate imaging physics and models into deep networks, enabling a hybridization of physics-/model-based and data-driven elements. In this article, we systematically review the physics-/model-based data-driven methods for LDCT, summarize the loss functions and training strategies, evaluate the performance of different methods, and discuss relevant issues and future directions.
Since its invention in the 1970s, CT has become an indispensable imaging modality for screening, diagnosis, and therapeutic planning. Because X-ray radiation can damage healthy tissue, minimizing the radiation dose of CT has been widely studied over the past two decades. In some major clinical tasks, the radiation dose of a single CT scan can reach 43 mSv [1], which is an order of magnitude higher than the natural background radiation one receives annually. The radiation dose can be reduced by physically lowering the X-ray flux, which is called LDCT. However, LDCT degrades the signal-to-noise ratio (SNR) and compromises the resulting image quality.
Conventional tomographic reconstruction algorithms can hardly achieve satisfactory LDCT image quality. To meet clinical requirements, advanced algorithms are required to suppress the noise and artifacts associated with LDCT. Up to now, promising results have been obtained, improving LDCT quality and diagnostic performance in various clinical scenarios. Generally speaking, LDCT algorithms can be divided into four categories: sinogram domain filtering, image domain postprocessing, model-based iterative reconstruction (MBIR), and DL methods.
Sinogram domain filtering directly performs denoising in the space of projection data. Then, the denoised raw data can be reconstructed into high-quality CT images using analytic algorithms. Depending on the noise distribution, appropriate filters can be designed. Structural adaptive filtering [2] is a representative algorithm in this category that effectively refines the clarity of LDCT images. The main advantage of sinogram domain filtering is that it can suppress noise based on the known distribution. However, any model mismatch or inappropriate operations in the projection domain will introduce global interference, compromising the accuracy and robustness of sinogram domain filtering results.
Image domain postprocessing is more flexible and stable than sinogram domain filtering. Based on appropriate prior assumptions about CT images, such as sparsity, several popular methods were developed [3]. These methods can effectively denoise LDCT images, but their prototypes were often developed for natural image processing. In many aspects, the properties of LDCT are quite different from those of natural images. For example, LDCT image noise does not follow any known distribution, depends on underlying structures, and is difficult to model analytically. The image noise distribution is complex, and so is the image content prior. These are responsible for the limited performance of image domain postprocessing.
MBIR combines the advantages of the two kinds of methods mentioned above and works to minimize an energy-based objective function. The energy model usually consists of two parts: a fidelity term with the noise model in the projection domain and a regularization term with the prior model in the image domain. Since the noise model for LDCT in the projection domain is well established, research efforts in developing MBIR have focused more on the prior model. Utilizing the well-known sparsity of CT images, a number of methods have been proposed [4], [5], [6]. MBIR algorithms usually deliver robust performance and achieve clinically satisfactory results once the regularization terms are properly designed and the balancing parameters are well tuned. However, these requirements may restrict the applicability of an MBIR algorithm: customizing one takes extensive experience and skill, and MBIR algorithms also suffer from a high computational cost.
Recently, DL was introduced for tomographic imaging. Driven by big data, DL promises to overcome the main shortcomings of conventional algorithms, which demand the explicit design of regularizers and cannot guarantee optimality or generalizability. DL methods extract information from a large amount of data, helping ensure that the extracted information is objective and comprehensive. By learning the mapping from LDCT scans to normal-dose CT (NDCT) images, a series of studies were performed [7], [8], [9], [10], [11].
These methods can be seen as a combination of image domain postprocessing and data-driven methods. They inherit the advantages of the postprocessing algorithms and DL methods, and have high processing efficiency, excellent performance, and great clinical potential. However, they also have drawbacks. These methods usually use an approximation or pseudo-inversion of the raw data as the input of the network. The initially reconstructed images may miss some structures that cannot be easily restored by the network if the raw data are unavailable. On the other hand, noise and artifacts in filtered back-projection (FBP) reconstructions could be perceived as meaningful structures by a denoising network. Both circumstances will compromise the diagnostic performance, resulting in either false positives or false negatives.
Naturally, synergizing physics-/model-based methods and data-driven methods will lead to the best of both worlds. While deep image denoising only handles reconstructed images, MBIR methods are more robust and safer. In each iteration of MBIR, raw data will be used to rectify intermediate results and improve the data consistency. By introducing the CT physics or MBIR model, researchers can embed the raw data constraint into the network, which avoids information loss in the process of image reconstruction. Over the past years, a number of physics-/model-based data-driven methods for LDCT have been proposed [12], [13], [14]. As shown in Figure 1, these methods address the shortcomings of the physics-/model-based methods and data-driven networks, and they achieve an excellent balance between the improved accuracy with learned parameters and the robustness aided by data fidelity.
Figure 1. The advantages of synergizing physics-/model-based methods and data-driven methods.
In this article, we review these methods. In the “Physics-/Model-Based LDCT Methods” section, the problem of LDCT is described, and the conventional modeling and optimization methods are introduced. In the “Physics-/Model-Based Data-Driven Methods” section, different kinds of methods that incorporate physics/models into a DL framework are summarized. In the “Experimental Comparison” section, several experiments are conducted to compare different hybrid methods for LDCT. In the “Discussion” section, we discuss relevant issues. Finally, we conclude the article in the last section.
Assuming that an X-ray tube has an incident flux $I_{0}$ that can be measured in an air scan, the number of photons received by a detector, $I$, can be formulated as $I = I_{0}\exp\left(-\int_{l}\mu\,\mathrm{d}l\right)$, where $\mu$ is the linear attenuation coefficient, and $l$ represents the X-ray path. After a logarithmic transformation, the line integral can be obtained as \[ -\log\frac{I}{I_{0}} = \int_{l}\mu\,\mathrm{d}l. \tag{1} \]
Such line integrals are typically organized into projections and stored as a sinogram. The line integrals in the form of (1) can be discretized into a linear system ${\boldsymbol{y}} = {A}{\boldsymbol{x}}$, where ${\boldsymbol{x}}\,{\in}\,{\Bbb{R}}^{N}$ denotes the attenuation coefficient distribution to be solved, ${\boldsymbol{y}}\,{\in}\,{\Bbb{R}}^{M}$ represents the projection data, and ${A}\,{\in}\,{\Bbb{R}}^{{M}\,{\times}\,{N}}$ is the system matrix for a prespecified scanning geometry.
The noise in CT data mainly consists of the following two components [15]: quantum noise, which arises from the statistical fluctuation of the number of detected X-ray photons and approximately follows a Poisson distribution, and electronic noise, which originates from the data acquisition electronics and is usually modeled as additive Gaussian noise.
In clinical practice, it is difficult to obtain paired LDCT and NDCT datasets from two separate scans due to uncontrollable organ movement and radiation dose limitations. As a result, numerical simulation is important to produce LDCT data from an NDCT scan.
In [16], a noise simulation method was proposed for LDCT research and applied to generate a public dataset for the 2016 National Institutes of Health (NIH)/American Association of Physicists in Medicine (AAPM)/Mayo Clinic LDCT Grand Challenge. Under the approximation that the number of detected X-ray photons is normally distributed, the low-dose sinogram can be simulated as \[\tilde{\boldsymbol{y}} = \boldsymbol{y} + \sqrt{\frac{1-a}{a}\cdot\frac{\exp(\boldsymbol{y})}{I_{0}}\cdot\left(1 + \frac{1+a}{a}\cdot\frac{\sigma_{e}^{2}\exp(\boldsymbol{y})}{I_{0}}\right)}\cdot\xi\] where $\xi\sim\mathcal{N}(0,1)$, $\sigma_{e}^{2}$ represents the variance of the electronic noise, and $a$ denotes the dose factor.
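For illustration, this simulation can be sketched in a few lines of NumPy; the function name simulate_low_dose and the toy sinogram below are illustrative only and not part of our released code.

```python
import numpy as np

def simulate_low_dose(y, I0, a, sigma_e2, rng=None):
    """Inject simulated low-dose noise into a noiseless line-integral sinogram y,
    following the Gaussian approximation above.

    I0       : incident photon number of the normal-dose scan
    a        : dose factor (e.g., 0.2 for 20% dose)
    sigma_e2 : variance of the electronic noise
    """
    rng = np.random.default_rng() if rng is None else rng
    var = ((1 - a) / a) * (np.exp(y) / I0) * (
        1 + ((1 + a) / a) * sigma_e2 * np.exp(y) / I0
    )
    xi = rng.standard_normal(y.shape)   # xi ~ N(0, 1)
    return y + np.sqrt(var) * xi        # noisy low-dose sinogram

# usage on a toy sinogram of 360 views x 512 detector bins
y = 3.0 * np.random.rand(360, 512)      # fake line integrals
y_ld = simulate_low_dose(y, I0=1e5, a=0.2, sigma_e2=8.2)
```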
Conventional CT image reconstruction takes both measurement data and prior knowledge into account and is performed by minimizing an energy model iteratively. The general energy model for LDCT reconstruction can be formulated as \[\mathop{\min}\limits_{\boldsymbol{x}}{\Phi}{(}{\boldsymbol{x}}{)} + {\lambda}{R}{(}{\boldsymbol{x}}{)} \tag{2} \] which has two parts: a fidelity term ${\Phi}{(}{\boldsymbol{x}}{)}$ and a regularization term ${R}{(}{\boldsymbol{x}}{)}$, with ${\lambda}$ being the penalty parameter.
The fidelity term measures the consistency of the reconstruction with the measurement data. The weighted least-squares (WLS) function is usually adopted as the fidelity term: ${\Phi}(\boldsymbol{x}) = \frac{1}{2}\left\Vert\tilde{\boldsymbol{y}}-A\boldsymbol{x}\right\Vert_{\Sigma^{-1}}^{2} = \frac{1}{2}(\tilde{\boldsymbol{y}}-A\boldsymbol{x})^{T}\Sigma^{-1}(\tilde{\boldsymbol{y}}-A\boldsymbol{x})$, where $\Sigma$ is a diagonal matrix whose diagonal entries $\Sigma_{ii} = \sigma_{i}^{2}$ are the estimated variances of the data. Since the ideal flux is unknown, the variance is usually estimated as $\sigma_{i}^{2} = (\tilde{I}_{i} + \sigma_{e}^{2})/\tilde{I}_{i}^{2}$, where $\tilde{I}_{i} = aI_{0}\exp(-\tilde{\boldsymbol{y}}_{i})$ [6].
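As a minimal sketch, the data weights and the WLS fidelity can be evaluated as follows, assuming a small dense system matrix A for illustration (names such as wls_weights are ours):

```python
import numpy as np

def wls_weights(y_noisy, I0, a, sigma_e2):
    """Per-ray inverse variances 1/sigma_i^2 (the diagonal of Sigma^{-1}),
    with the flux estimated from the measured sinogram itself."""
    I_hat = a * I0 * np.exp(-y_noisy)
    sigma2 = (I_hat + sigma_e2) / I_hat**2
    return 1.0 / sigma2

def wls_fidelity(x, A, y_noisy, w):
    """Phi(x) = 1/2 (y - A x)^T Sigma^{-1} (y - A x) with diagonal Sigma."""
    r = y_noisy - A @ x
    return 0.5 * np.sum(w * r * r)
```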
Dedicated regularization terms were designed for different types of images, depending on the nature of the images and researchers’ expertise. Over the past years, various regularization terms have been proposed along with the improved understanding of CT image properties. Importantly, the well-known sparsity can be expressed as ${R}{(}{\boldsymbol{x}}{)} = {\left\Vert{W\boldsymbol{x}}\right\Vert}_{1}$, where ${\parallel{\cdot}\parallel}_{1}$ is the ${\ell}_{1}$ norm, and W is a sparsifying transform matrix. Commonly used sparsifying transforms include the gradient transform [total variation (TV)] [4], learned sparsifying transform [5], [6], [17], etc. Subsequently, further leveraging the 2D structure of an image, low rank became popular for LDCT reconstruction [18], [19], [20], [21]. The low-rank constraint can be relaxed to the minimization of the nuclear norm ${R}{(}{\boldsymbol{x}}{)} = {\left\Vert{\boldsymbol{x}}\right\Vert}_{\ast} = {\Sigma}_{i}{\sigma}_{i}{(}{\boldsymbol{x}}{)}$, where ${\sigma}_{i}{(}{\boldsymbol{x}}{)}$ is the i-th-largest singular value of x.
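For concreteness, the following sketch evaluates an anisotropic TV term (the $\ell_1$ norm of finite differences, one common discretization of the gradient transform) and the nuclear norm of a 2D image; it is illustrative only.

```python
import numpy as np

def tv_l1(img):
    """Anisotropic total variation: l1 norm of horizontal and vertical differences."""
    return np.abs(np.diff(img, axis=1)).sum() + np.abs(np.diff(img, axis=0)).sum()

def nuclear_norm(img):
    """Sum of singular values, the convex surrogate for the rank of a 2D image."""
    return np.linalg.svd(img, compute_uv=False).sum()
```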
Generally, the models in (2) do not have closed-form solutions and need to be optimized iteratively. Sometimes, auxiliary and dual variables are introduced to simplify the calculation and facilitate convergence. The idea of introducing auxiliary variables is in the same spirit as plug and play (PnP) [22], [23], which conveniently decouples the data fidelity and the prior subproblems. For example, based on the PnP scheme with the WLS fidelity, (2) can be rewritten as \[\mathop{\min}\limits_{\boldsymbol{x},\boldsymbol{v}}\frac{1}{2}(\tilde{\boldsymbol{y}}-A\boldsymbol{x})^{T}\Sigma^{-1}(\tilde{\boldsymbol{y}}-A\boldsymbol{x}) + \lambda R(\boldsymbol{v}),\quad \text{s.t.}\ \boldsymbol{x} = \boldsymbol{v}. \tag{3} \]
Then, the primal variable $\boldsymbol{x}$ and auxiliary variable $\boldsymbol{v}$ can be alternately optimized. Two representative alternating optimization algorithms are the alternating direction method of multipliers (ADMM) and the split Bregman method, which divide the model into subproblems and solve them accordingly.
The popular approach for LDCT denoising with DL employs convolution layers and activation functions to build a neural network whose input and output are both images. These methods are simple to implement and deliver impressive denoising performance, but they can hardly recover details lost in the input image. On the other hand, the MBIR algorithm is safer: in each iteration, it uses the measurement to correct the intermediate result. Constrained by the measurement, the MBIR result respects data consistency and restores missing structures well in the reconstructed image. Extending the idea of DL-based postprocessing, it is natural to synergize the physics-/model-based and data-driven methods. Such hybrid methods not only retain the data-driven benefits but also gain robustness and interpretability from the physics-/model-based formulation. Table 1 summarizes these methods, and the rest of this section introduces them.
Table 1. Representative physics-/model-based data-driven methods.
As shown in Figure 2, the physics-based data-driven methods include, within the network, a differentiable domain transform based on the CT physics between the projection and image domains. The input and output of the network are usually projection data and image data, respectively. Early networks used a conventional domain transform from the projection domain to the image domain. Later, inspired by work on learned domain transformation, researchers built networks with fully connected (FC) layers that replace the conventional domain transform and learn the inverse Radon transform directly [24].
Figure 2. The general workflow for the physics-based data-driven LDCT denoising.
The architecture of this kind of network is often featured by two subnetworks: one in the projection domain and the other in the image domain, both of which are usually implemented with convolutional neural networks. The projection data are fed into the first subnetwork, in which the measurement data can be denoised. The denoised projection is then converted into an image using a differentiable conventional transformation. Finally, the image is processed by the second subnetwork to improve the reconstruction quality. Many network architectures widely used for image processing in the literature can be adapted into the subnetworks in both domains. Since the statistical distributions of CT noise in the projection and image domains are quite different, the combination of the denoising processes in the two domains can be complementary, making the denoising process more effective and more stable. The differentiable domain transform allows the information exchange between the two subnetworks. The simplest domain transform is back-projection, which is a differentiable linear transform [25].
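A conceptual PyTorch sketch of such a dual-domain architecture is given below; to keep it self-contained, the differentiable domain transform is a fixed toy back-projection matrix, whereas in practice a differentiable FBP operator from a CT toolbox would take its place, and the subnetwork architectures are placeholders.

```python
import torch
import torch.nn as nn

class DualDomainNet(nn.Module):
    """Projection-domain CNN -> fixed differentiable domain transform -> image-domain CNN.
    Here the transform is a toy dense back-projection matrix B (sinogram pixels -> image
    pixels); in practice, a differentiable FBP operator would be used instead."""
    def __init__(self, B, img_shape):
        super().__init__()
        self.register_buffer("B", B)      # fixed operator; gradients still flow through it
        self.img_shape = img_shape
        def cnn():
            return nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 1, 3, padding=1),
            )
        self.proj_net, self.img_net = cnn(), cnn()

    def forward(self, sino):                   # sino: (batch, 1, views, bins)
        p = sino + self.proj_net(sino)         # residual denoising in the projection domain
        x0 = (p.flatten(1) @ self.B.T).view(-1, 1, *self.img_shape)  # domain transform
        return x0 + self.img_net(x0)           # residual refinement in the image domain

# toy usage: a 60-view x 64-bin sinogram mapped to a 32 x 32 image
M, N = 60 * 64, 32 * 32
net = DualDomainNet(torch.randn(N, M) / M, img_shape=(32, 32))
out = net(torch.randn(2, 1, 60, 64))           # out: (2, 1, 32, 32)
```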
A more refined yet still efficient transform is FBP [26], [28]. A main advantage of FBP is that the projection data are directly transformed into a suitable numerical range, which facilitates the subsequent image domain processing. Another interesting domain transform is the view-by-view FBP [27]. This transform back-projects the projection data into multiple channels in the image space, each channel being the back-projection of a single projection view. Decoupling the data from the multiple views in this way preserves more information. These domain transforms are limited by our understanding and modeling of CT physics. With DL, it is feasible to learn the involved kernels and perform the domain transform.
The learned transform can use FC layers to learn the physics-based transform from the projection domain to the image space. AUTOMAP is a representative network that maps tomographic data to a reconstructed image through FC layers [29]. However, such an architecture would be unaffordable in most cases of medical images because of the expensive computational and memory costs. As a result, major efforts were made to improve the learned transforms by reducing the computational overhead [24], [30], [31]. Since each pixel traces a sinusoidal curve in the projection domain, [24] proposed summing linearly along the trajectory so that the weights of the FC layers are sparse. In an improved version of this work, the geometry and volume were down-sampled to further reduce the computational cost [30]. Another effective way to reduce the cost is to use shared parameters. In [31], the measurements of different views are processed with shared parameters for the domain transform. In [32], the authors proposed a hierarchical architecture where the shared parameters are gradually localized to the pixel level. Compared with the conventional transform, the learned transform has the potential to achieve better performance.
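The following minimal sketch of an AUTOMAP-style learned transform illustrates why the FC parameterization is memory hungry; the sizes and architecture are illustrative only.

```python
import torch
import torch.nn as nn

class LearnedTransform(nn.Module):
    """AUTOMAP-style fully connected mapping from sinogram to image. The first layer
    alone holds M x N weights, which is what makes clinical-scale training costly and
    motivates sparse, shared, or hierarchical parameterizations."""
    def __init__(self, n_views, n_bins, img_size):
        super().__init__()
        M, N = n_views * n_bins, img_size * img_size
        self.fc = nn.Sequential(nn.Linear(M, N), nn.Tanh(), nn.Linear(N, N))
        self.img_size = img_size

    def forward(self, sino):                   # sino: (batch, views, bins)
        x = self.fc(sino.flatten(1))
        return x.view(-1, 1, self.img_size, self.img_size)

# at a clinical scale (e.g., 576 views x 736 bins -> 512 x 512 image), the first FC layer
# alone would need ~1.1e11 weights; the toy setting below is what fits in memory.
net = LearnedTransform(n_views=60, n_bins=64, img_size=32)
img = net(torch.randn(2, 60, 64))
```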
By incorporating the CT physics into the denoising process and working in both the projection and image domains, image denoising can be effectively performed. However, the issue of generalizability is important for clinical applications. The learned transform has limited generalizability because it can only be applied for a fixed imaging geometry. When the geometry and volume differ from what is assumed in the training setup, the trained network will be inapplicable. In contrast, the traditional transform is more stable and only needs to adjust the corresponding parameters for different geometries and volumes. Therefore, the further development of learned transforms needs to make them more flexible and more generalizable.
Given the generalizability, stability, and interpretability of the MBIR algorithm, it is desirable to combine MBIR and DL for LDCT denoising. DL is effective in solving complicated problems with big data, whereas MBIR has a fixed fidelity term and requires effort to find a good regularizer. For model-based data-driven reconstruction, researchers replaced the handcrafted regularization terms with neural networks and produced results often superior to the traditional MBIR counterparts. Depending on how a neural network is embedded into the MBIR scheme, the model-based data-driven methods can be divided into two categories: denoiser based and unrolling based.
The denoiser-based approach follows the conventional iterative optimization scheme and introduces a neural network into the iterative process in the spirit of PnP [23]. Drawing on the PnP framework, regularization by denoising was proposed [40]. With this regularization, the CT optimization model can be formulated as \[\mathop{\min}\limits_{\boldsymbol{x}}\frac{1}{2}(\tilde{\boldsymbol{y}}-A\boldsymbol{x})^{T}\Sigma^{-1}(\tilde{\boldsymbol{y}}-A\boldsymbol{x}) + \frac{\lambda}{2}\boldsymbol{x}^{T}(\boldsymbol{x}-D(\boldsymbol{x})), \tag{4} \] where $D(\cdot)$ is a denoiser implemented by the neural network. The optimization process can be expressed as \begin{align*}\boldsymbol{x}^{t+\frac{1}{2}} & = \mathop{\arg\min}\limits_{\boldsymbol{x}}\frac{1}{2}(\tilde{\boldsymbol{y}}-A\boldsymbol{x})^{T}\Sigma^{-1}(\tilde{\boldsymbol{y}}-A\boldsymbol{x}) + \frac{\alpha}{2}\|\boldsymbol{x}-\boldsymbol{x}^{t}\|_{2}^{2}, \\ \boldsymbol{x}^{t+1} & = \mathop{\arg\min}\limits_{\boldsymbol{x}}\frac{\beta}{2}\|\boldsymbol{x}-\boldsymbol{x}^{t+\frac{1}{2}}\|_{2}^{2} + \frac{\lambda}{2}\boldsymbol{x}^{T}(\boldsymbol{x}-D(\boldsymbol{x}^{t+\frac{1}{2}})). \tag{5} \end{align*}
The optimization of $\boldsymbol{x}^{t+\frac{1}{2}}$ can be done using any solver for the quadratic subproblem. The solution for $\boldsymbol{x}^{t+1}$ can be obtained either directly from the denoiser: \[\boldsymbol{x}^{t+1} = D(\boldsymbol{x}^{t+\frac{1}{2}}) \tag{6} \] or as a semidenoised result: \[\boldsymbol{x}^{t+1} = \left(1-\frac{\lambda}{\beta+\lambda}\right)\boldsymbol{x}^{t+\frac{1}{2}} + \frac{\lambda}{\beta+\lambda}D(\boldsymbol{x}^{t+\frac{1}{2}}). \tag{7} \]
Of course, there are other solutions and combinations [34], [36].
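As a concrete example, a minimal NumPy sketch of the denoiser-based iteration (5) with the semidenoised update (7) is given below; the toy system matrix, the inner gradient solver, and the Gaussian-smoothing stand-in for a trained denoiser are assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def red_reconstruct(A, w, y, D, x0, lam=0.5, beta=1.0, alpha=1.0,
                    n_outer=20, n_inner=10, step=1e-3):
    """Denoiser-based iteration following (5) and (7).
    A  : system matrix (M x N); w : per-ray weights (diagonal of Sigma^{-1})
    D  : denoiser callable, e.g., a trained network wrapped as x -> D(x)
    x0 : initial image (e.g., an FBP result), flattened to length N"""
    x = x0.copy()
    for _ in range(n_outer):
        # x^{t+1/2}: a few gradient steps on the quadratic subproblem in (5)
        z = x.copy()
        for _ in range(n_inner):
            grad = A.T @ (w * (A @ z - y)) + alpha * (z - x)
            z = z - step * grad
        # x^{t+1}: convex combination of z and its denoised version, per (7)
        x = (1 - lam / (beta + lam)) * z + (lam / (beta + lam)) * D(z)
    return x

# toy usage with a random system matrix and a Gaussian-smoothing "denoiser"
M, N = 200, 100
A = np.random.rand(M, N) / N
x_true = np.random.rand(N)
y, w = A @ x_true, np.ones(M)
D = lambda v: gaussian_filter(v, sigma=1)
x_rec = red_reconstruct(A, w, y, D, x0=A.T @ y)
```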
There are two ways to train the denoiser. The first is to train a general denoiser for all iterations [12], [33], [34]. In this option, the denoiser can be obtained with noisy images and the corresponding labels as training pairs. However, it is difficult for the denoiser to achieve an optimal denoising performance in each iteration. The second option can partially solve this problem by training an iteration-dependent denoiser dynamically [35], [36]. In each iteration, the denoiser will denoise an intermediate image with different parameters to optimize denoising performance. Of course, training model parameters for each iteration will demand a much higher computational cost. Algorithm 1 supports either iteration-independent or iteration-dependent denoisers, where ${\tilde{\boldsymbol{y}}}$ denotes projection data, $\hat{\boldsymbol{x}}$ represents a noise-free image, and the mean square error (MSE) is assumed as the loss function.
Input: Training set ${\left\{{{{\tilde{\boldsymbol{y}}}}_{i},{\hat{\boldsymbol{x}}}_{i}}\right\}}_{{i} = {1}}^{{N}_{s}}$, Denoiser ${D}_{\theta}$
Initialize: ${\boldsymbol{x}}_{i}^{0} = {FBP}{(}{{\tilde{\boldsymbol{y}}}}_{i}{),}{i} = {1},{2},\ldots,{N}_{s}$
Iteration-independent denoiser:
Train ${D}_{\theta}$: ${\theta} = {argmin}_{\theta}\tfrac{1}{{N}_{s}}{\Sigma}_{{i} = {1}}^{{N}_{s}}\parallel{D}_{\theta}{(}{\boldsymbol{x}}_{i}^{0}{)}{-}{\hat{\boldsymbol{x}}}_{i}\parallel{}_{2}^{2}$
for ${t} = {0},{1},\ldots,{N}_{t}{-}{1}$ do
Obtain ${\boldsymbol{x}}^{{t} + {1}}$ from ${D}_{\theta}{(}{\boldsymbol{x}}^{t}{)}$
end for
return ${\boldsymbol{x}}^{{N}_{t}}$
Iteration-dependent denoiser:
for ${t} = {0},{1},\ldots,{N}_{t}{-}{1}$ do
Train ${D}_{\theta}{:}$ ${\theta}^{t} = {\arg\min}_{\theta}\tfrac{1}{{N}_{s}}{\Sigma}_{{i} = {1}}^{{N}_{s}}{\parallel{D}_{\theta}{(}{\boldsymbol{x}}_{i}^{t}{)}{-}{\hat{\boldsymbol{x}}}_{i}\parallel}_{2}^{2}$
Obtain ${\boldsymbol{x}}^{{t} + {1}}$ from ${D}_{{\theta}^{t}}{(}{\boldsymbol{x}}^{t}{)}$
end for
return ${\boldsymbol{x}}^{{N}_{t}}$
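A compact PyTorch sketch of the iteration-dependent branch of Algorithm 1 is given below; for brevity, the data-fidelity step inside "obtain x^{t+1}" is omitted, full-batch training is used, and make_denoiser is a placeholder for any denoising network.

```python
import copy
import torch
import torch.nn as nn

def train_iteration_dependent(x0, x_clean, make_denoiser, n_iters=5, n_epochs=50, lr=1e-4):
    """Iteration-dependent branch of Algorithm 1: at each outer iteration t, a fresh
    denoiser D_{theta^t} is fitted (MSE loss) to the current intermediate images x^t.
    x0, x_clean   : tensors of shape (N_s, 1, H, W)
    make_denoiser : factory returning an nn.Module image-to-image denoiser"""
    x_t, denoisers = x0.clone(), []
    for t in range(n_iters):
        D = make_denoiser()
        opt = torch.optim.AdamW(D.parameters(), lr=lr)
        for _ in range(n_epochs):                        # full-batch MSE training
            opt.zero_grad()
            loss = nn.functional.mse_loss(D(x_t), x_clean)
            loss.backward()
            opt.step()
        denoisers.append(copy.deepcopy(D).eval())
        with torch.no_grad():                            # x^{t+1} from D_{theta^t}(x^t)
            x_t = denoisers[-1](x_t)
    return denoisers
```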
Unrolling expands the iterative optimization process into a finite number of stages and maps them to a neural network [13], [14], [37], [38], [39], [41]. As a general objective function, (3) can be extended using the augmented Lagrangian method as follows: \begin{align*} \mathop{\min}\limits_{\boldsymbol{x},\boldsymbol{v},\boldsymbol{u}}{L}{(}{\boldsymbol{x}},{\boldsymbol{v}},{\boldsymbol{u}}{)} = & \frac{1}{2}{(}{\tilde{\boldsymbol{y}}}{-}{A}{\boldsymbol{x}}{)}^{T}{\Sigma}^{{-}{1}}{(}{\tilde{\boldsymbol{y}}}{-}{A}{\boldsymbol{x}}{)} + {\lambda}{R}{(}{\boldsymbol{v}}{)} \\ & + {\boldsymbol{u}}^{T}{(}{\boldsymbol{x}}{-}{\boldsymbol{v}}{)} + \frac{\rho}{2}{\parallel{\boldsymbol{x}}{-}{\boldsymbol{v}}\parallel}_{2}^{2}{.} \tag{8} \end{align*}
The ADMM, a representative iterative optimization method, can be formulated as \begin{align*}\boldsymbol{x}^{t+1} & = \mathop{\arg\min}\limits_{\boldsymbol{x}}L(\boldsymbol{x},\boldsymbol{v}^{t},\boldsymbol{u}^{t}), \\ \boldsymbol{v}^{t+1} & = \mathop{\arg\min}\limits_{\boldsymbol{v}}L(\boldsymbol{x}^{t+1},\boldsymbol{v},\boldsymbol{u}^{t}), \\ \boldsymbol{u}^{t+1} & = \boldsymbol{u}^{t} + \rho(\boldsymbol{x}^{t+1}-\boldsymbol{v}^{t+1}). \tag{9} \end{align*}
Each of these variables can be optimized using a corresponding algorithm. In an unrolling-based data-driven method, each optimization subproblem can be solved using a subnetwork. When the total number of iterations is fixed, (9) can be realized as a neural network: \begin{align*}{\boldsymbol{x}}^{{t} + {1}} & = {\cal{F}}{(}{\boldsymbol{x}}^{t},{\boldsymbol{v}}^{t},{\boldsymbol{u}}^{t}{;}{\theta}^{t}{),} \\ {\boldsymbol{v}}^{{t} + {1}} & = {\cal{G}}{(}{\boldsymbol{x}}^{{t} + {1}},{\boldsymbol{v}}^{t},{\boldsymbol{u}}^{t}{;}{\theta}^{t}{),} \\ {\boldsymbol{u}}^{{t} + {1}} & = {\cal{H}}{(}{\boldsymbol{x}}^{{t} + {1}},{\boldsymbol{v}}^{{t} + {1}},{\boldsymbol{u}}^{t}{;}{\theta}^{t}{)} \tag{10} \end{align*} where ${\cal{F}}$, ${\cal{G}}$, and ${\cal{H}}$ denote the three subnetworks. Figure 3 shows a top-level view of the workflow.
Figure 3. The workflow of an unrolled data-driven reconstruction process.
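A minimal PyTorch sketch of such an unrolled network in the spirit of (10) follows; the toy dense system matrix, the stage-wise convolutional blocks, and the residual update form are assumptions for illustration rather than any specific published architecture.

```python
import torch
import torch.nn as nn

class UnrolledADMM(nn.Module):
    """Unrolled network in the spirit of (10): T stages, each with learned updates for
    x, v, and u. A toy dense system matrix stands in for the CT projector here."""
    def __init__(self, A, n_stages=5, channels=32):
        super().__init__()
        self.register_buffer("A", A)
        def block(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, channels, 3, padding=1), nn.ReLU(),
                nn.Conv2d(channels, 1, 3, padding=1),
            )
        self.F = nn.ModuleList([block(4) for _ in range(n_stages)])  # x-update
        self.G = nn.ModuleList([block(3) for _ in range(n_stages)])  # v-update
        self.H = nn.ModuleList([block(3) for _ in range(n_stages)])  # u-update

    def forward(self, y, x0):                    # y: (batch, M), x0: (batch, 1, H, W)
        b, _, h, w = x0.shape
        x, v, u = x0, x0.clone(), torch.zeros_like(x0)
        for F, G, H in zip(self.F, self.G, self.H):
            # gradient of the data term, mapped back to the image grid
            grad = ((x.flatten(1) @ self.A.T - y) @ self.A).view(b, 1, h, w)
            x = x + F(torch.cat([x, v, u, grad], dim=1))
            v = v + G(torch.cat([x, v, u], dim=1))
            u = u + H(torch.cat([x, v, u], dim=1))
        return x

# toy usage: A maps 32 x 32 images to 60-view x 64-bin sinograms (flattened)
A = torch.randn(60 * 64, 32 * 32) / (32 * 32)
net = UnrolledADMM(A)
out = net(torch.randn(2, 60 * 64), torch.randn(2, 1, 32, 32))
```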
With different energy models and optimization algorithms, various network architectures were developed for unrolled data-driven image reconstruction. In [13], the simple gradient descent algorithm was unrolled into a neural network. In [14], the primal–dual hybrid gradient algorithm was unrolled into a generalized reconstruction network. In [39], the momentum method commonly used in traditional optimization was adapted into a reconstruction network for better performance with a limited number of iterations. However, it is difficult to judge the performance of a network directly from the performance of the underlying optimization scheme, and how to unroll an optimization scheme and train it optimally remains an important open topic.
The denoiser-based method is based on the traditional iterative algorithm, where training and optimization are separated. In contrast, the unrolling-based method is an end-to-end procedure, where the optimization is incorporated into the training. Figure 4 shows the forward and backward processes for the denoiser-based and unrolling-based methods, where the green and red arrows represent the forward and backward directions, respectively. As shown in Figure 4, the forward data streams of the two methods are similar, but the backward data stream of the unrolling-based method is end to end; i.e., the complete backward data stream is a backpropagation of error signals from the output to the input. While the denoiser-based method adopts a separate training strategy, the unrolling-based method can be trained in a unified fashion, where all parameters, including the regularization parameters, can be obtained from training.
Figure 4. The forward and backward processes for (a) denoiser-based and (b) unrolling-based methods, respectively.
However, the unrolling-based model inevitably requires more memory, which limits the number of unrolled iterations, and in many cases the model performance is closely related to the number of iterations. In contrast, the denoiser-based method allows more iterations, and the trained denoiser can be embedded in different optimization schemes, making the denoiser-based method more flexible. Nevertheless, the denoiser-based method still requires manual parameter setting, which has a significant impact on performance. Hence, it is important for the denoiser-based method to set parameters appropriately, ideally coupled with a method for adaptive parameter adjustment.
The commonly used loss functions for LDCT imaging are the MSE and the mean absolute error (MAE). To better remove noise and artifacts, TV regularization, which performs well in compressed sensing methods for image denoising, is often used as an auxiliary loss [42]. In [8], an adversarial discriminator was used to encourage the denoised images to follow the same distribution as clinical images. Additionally, a model pretrained for a classification task was used to extract features, and the perceptual loss was computed in the feature space. The adversarial and perceptual losses can improve the visual quality and suppress oversmoothing. However, the adversarial loss of generative adversarial networks may introduce erroneous structures [43]. Similarly, the perceptual loss can generate checkerboard artifacts [44] when the constraint is imposed on a feature space down-sampled with max pooling.
In [42], the structural similarity index metric (SSIM) was introduced to promote structures closer to the ground truth. Similarly, to preserve edges in denoised images, the Sobel operator was applied to extract edges and enforce edge coherence [10]. The identity loss is also relevant for image denoising tasks: if a noise-free image is fed to the network, the network should be dormant; i.e., its output should be close to the clean input [9]. To maintain the measurement consistency, the network output can be transformed into the projection domain to compute an MSE or MAE loss there [42]. Table 2 summarizes these commonly used loss functions.
Table 2. Representative loss functions.
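As an illustration of how such terms can be combined, the following PyTorch sketch composes the MSE with TV and Sobel edge-coherence losses; the weights are placeholders, and the adversarial, perceptual, and SSIM terms are omitted because they require additional networks.

```python
import torch
import torch.nn.functional as F

def tv_loss(x):
    """Anisotropic TV of the network output, used as an auxiliary smoothness term."""
    return (x[..., :, 1:] - x[..., :, :-1]).abs().mean() + \
           (x[..., 1:, :] - x[..., :-1, :]).abs().mean()

def sobel_edges(x):
    """Horizontal and vertical edge maps via Sobel filtering (single-channel images)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=x.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    return F.conv2d(x, kx, padding=1), F.conv2d(x, ky, padding=1)

def compound_loss(pred, target, w_tv=1e-4, w_edge=0.1):
    """MSE + TV + Sobel edge coherence; the weights here are illustrative only."""
    mse = F.mse_loss(pred, target)
    ex_p, ey_p = sobel_edges(pred)
    ex_t, ey_t = sobel_edges(target)
    edge = F.l1_loss(ex_p, ex_t) + F.l1_loss(ey_p, ey_t)
    return mse + w_tv * tv_loss(pred) + w_edge * edge

# usage: pred and target are (batch, 1, H, W) image tensors
loss = compound_loss(torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64))
```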
In this section, we report our comparative study on the performance of some popular physics-/model-based data-driven methods and different loss functions. This evaluation was performed with a unified code framework to ensure fairness as much as possible. All codes have been succinctly documented to help readers understand the models. (Codes are released at https://github.com/deep-imaging-group/physics-model-data-driven-review; related datasets and checkpoints can also be found on that page.)
For simplicity and fairness, the MSE loss function and the AdamW optimizer were employed for all of the methods when evaluating the models, and learned primal–dual (LPD) was adopted as the backbone for evaluating the different loss functions. Training was performed in a naive way, without any special tricks. For a fair comparison, all of the models were trained within 200 epochs, which is sufficient for convergence of all of the methods. The penalty/regularization parameters of the models were carefully tuned in our experiments to guarantee the optimal performance of each model on the relevant dataset. After training, the model that performed best on the validation set was taken as the final model and used for testing. Of course, there are many factors that affect the performance of a neural network; therefore, the results in this article are for reference only and may not perfectly reflect the performance of these methods.
The dataset used for our experiments is the public LDCT data from the 2016 NIH/AAPM/Mayo Clinic LDCT Grand Challenge. The dataset contains 2,378 slices of 3-mm full-dose CT images from 10 patients. In this study, 600 images from seven patients were randomly chosen as the training set, 100 images from one patient were used as the validation set, and 200 images from the remaining two patients were the testing set. The projection data were simulated with the distance-driven method. The geometry and volume were set according to the scanning parameters associated with the dataset. The noise simulation was done using the algorithm in [16]. The incident photon number for NDCT is the same as that provided in the dataset. The incident photon number of LDCT was set to 20% of that for NDCT. The variance of the electronic noise was assumed to be 8.2 according to the recommendation in [16].
The training process for model evaluation was to minimize the MSE loss function with the different methods. The commonly used peak SNR (PSNR) and SSIM metrics were adopted to quantify the performance of the different denoising methods. To evaluate the visual quality of the results, we introduced the Fréchet inception distance (FID) score [45]; a smaller FID score means a visual impression closer to the ground truth. Figure 5(a) shows the means of the PSNR, SSIM, and FID scores on the whole testing set. In Figure 5(b), the 2D positions of the different methods are specified by the horizontal and vertical coordinates, representing the PSNR and SSIM of the results, respectively, and the radii of the circles indicate the FID values of the different methods. It can be seen that the unrolling-based methods have a more robust performance; FistaNet and LPD occupy favorable spots.
Figure 5. The quantitative results obtained using different methods on the whole testing set.
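For reference, PSNR and SSIM can be computed with scikit-image as sketched below; FID additionally requires an Inception feature extractor and is omitted, and the exact evaluation protocol (e.g., windowing and data range) may differ from ours.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(recon, reference, data_range=None):
    """PSNR and SSIM between a reconstruction and its normal-dose reference.
    data_range defaults to the dynamic range of the reference image."""
    if data_range is None:
        data_range = float(reference.max() - reference.min())
    psnr = peak_signal_noise_ratio(reference, recon, data_range=data_range)
    ssim = structural_similarity(reference, recon, data_range=data_range)
    return psnr, ssim

# usage on a pair of 2D images (e.g., in HU)
psnr, ssim = evaluate(np.random.rand(512, 512), np.random.rand(512, 512))
```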
The denoiser-based methods also have outstanding performance, especially MomentumNet, which is based on an iteration-dependent denoiser. The comparison between MomentumNet and CNNRPGD shows that an iteration-dependent denoiser has clearly better performance. However, the training of an iteration-dependent denoiser is more complicated and time-consuming: training MomentumNet for 200 epochs takes more than five days, which is much longer than what CNNRPGD needs. Additionally, the denoiser-based methods need manual setting of the regularization parameters, which often has a greater impact on the performance than the network architecture and requires a major fine-tuning effort.
HDNet delivers the best performance among the physics-based methods, which shows that the simple FBP transform is effective for dual-domain reconstruction. For the learned transform-based method, since the FC layers are large, the training process is relatively difficult, compromising the stability of the reconstruction results.
Table 3 shows the computational time of the compared methods. It can be seen that most methods can complete the reconstruction in a short time, which is beneficial for clinical applications. Owing to their large number of iterations, the computational times of the denoiser-based iterative methods are much greater than those of the other methods.
Table 3. The computational costs of the compared methods.
Unlike the unrolling-based methods, which are end-to-end networks, the denoiser-based methods are implemented in an iterative framework. Therefore, it is important to study their convergence properties. In [12] and [35], the authors proved that the denoiser-based iterative methods can converge. Furthermore, when the trained denoiser is plugged into other iterative optimization schemes with good convergence properties, similar convergence is expected, although this expectation demands a more rigorous justification in the future.
To evaluate the effects of the loss functions, we combined the loss functions in various ways and applied each representative combination to a unified LPD model. Table 4 shows these combinations of the loss functions. The corresponding reconstructions of an abdominal slice are shown in Figure 6. Note that the weights of the combinations were fine-tuned experimentally for the best visual quality. In Figure 6, the LPD models trained with different loss functions all keep the key information on the metastases indicated by the red arrows. The area indicated by the blue arrows is enlarged for better visualization. Based on the same network architecture, while the restored information in the different results is basically the same, the main differences among them can still be visually appreciated. The MSE and MAE show an evident oversmoothing effect. The adversarial and perceptual losses can effectively improve the visual impression, giving reconstructed textures similar to the ground truth. With the help of the adversarial and TV losses, the network can achieve satisfactory results via unsupervised learning.
Figure 6. The results obtained using LPD with different combinations of loss functions: (a) a normal dose, (b) a low dose, and (c)–(l) reconstructions with the combinations of loss functions shown in Table 4. The display window is [–160, 240] HU.
Table 4. The loss functions used for experimental comparison.
Physics-/model-based data-driven methods have received increasing attention in the tomographic imaging field because they incorporate the CT physics or models into the neural networks synergistically, resulting in superior imaging performance. With rapid development over the past years, researchers have proposed a number of models based on physics/models from different angles. Although they are promising, these models still need further improvements. We believe that the following issues are worth further investigation.

The first issue is the generalizability of learned transform-based data-driven methods. Training the networks separately for each imaging geometry is an unaffordable cost in clinical applications. Therefore, a major problem with these methods is making a trained model applicable to multiple geometries and volumes. Interpolation can help match the sizes of the input data required by a reconstruction network. Furthermore, a DL method can be a good solution for converting projection data from a source geometry to a target geometry.

The second topic is the parametric setting for the denoiser-based data-driven methods. Currently, this kind of method requires a handcrafted setting, which limits its generalization to different datasets. The introduction of adaptive parameters or learned parameters is worthy of attention. Reinforcement learning could be another option to automatically select hyperparameters.

These are our specific interests for physics-/model-based data-driven methods for LDCT. From a larger perspective, the tomographic imaging field has other open topics and challenges that are also closely related to LDCT.
The transformer is an emerging DL architecture that has shown great potential in various areas [46]. In the denoising task, a transformer directs attention to important features, enabling adaptive denoising based on image content and features. Coupled with the transformer, physics-/model-based data-driven methods will have more design routes. It is expected that transformers will further improve the performance of physics-/model-based data-driven methods.
Paired training data have always been a conundrum plaguing data-driven tomography. The mainstream methods are now unsupervised [9] and self-supervised learning [47], which do not require paired/labeled data. Self-supervised training treats the input as the target in appropriate ways to calculate losses and performs denoising according to the statistical characteristics of the underlying data. Clearly, a combination of self-supervised training and physics-/model-based data-driven methods can help us meet the challenge of LDCT in clinical applications.
Tomographic imaging is always a service for diagnosis and intervention. Thus, reconstructed images are often processed or analyzed before being clinically useful. To optimize the whole workflow, we can take the downstream image analysis tasks into account to improve the performance of the reconstruction network in a task-specific fashion. The physics-/model-based task-/data-driven method can be designed with shared feature layers linked to task loss functions. A deep tomographic imaging network incorporated with a task-driven technique can reconstruct results that are more suitable for the intended task in terms of the diagnostic performance.
DL-based tomographic imaging may suffer from domain heterogeneity caused by different distributions of training data, which originate from different scanners, populations, tasks, settings, and so on [48]. Existing tomographic imaging methods can generalize poorly on datasets from shifted domains, especially unseen ones. Domain generalization, which aims to learn a model from one or several different but related domains that performs well on unseen domains, has attracted increasing attention [49]. This is a promising direction to address the data domain heterogeneity and advance the clinical translation of deep tomographic imaging methods.
At present, reconstructed image quality is still mostly evaluated with popular quantitative metrics. However, in many cases, the classic quantitative evaluation is not consistent with the visual effects and clinical utilities. In particular, the way medical images should be evaluated is very different from that for natural images. Therefore, establishing a set of metrics suitable for evaluating the diagnostic performance of tomographic imaging is currently an open problem. For natural image processing, neural networks have been reported for image quality assessment (IQA) [50], suggesting new solutions for medical image quality evaluation. Ideally, DL-based IQA should not only judge the reconstruction quality and diagnostic performance but also help tomographic imaging in the form of loss functions. It is expected that more DL-based IQA methods will be developed for medical imaging and will eventually support advanced numerical observer studies as well as human reader studies.
In this article, we have systematically reviewed the physics-/model-based data-driven methods for LDCT. In important clinical applications of LDCT imaging, DL-based methods bring major gains in image quality and diagnostic performance and are undoubtedly becoming the mainstream of LDCT imaging research and translation. In the next few years, our efforts would cover dataset enrichment, network adaption, and clinical evaluation as well as methodological innovation and theoretical investigation. From a larger perspective, DL-based tomographic imaging is only in its infancy. It offers many problems to solve for numerous health-care benefits and opens a new era of artificial intelligence-empowered medicine.
This work was supported in part by the Sichuan Science and Technology Program under Grant 2021JDJQ0024, in part by the Sichuan University “From 0 to 1” Innovative Research Program under Grant 2022SCUH0016, in part by the National Natural Science Foundation of China under Grant 62101136, in part by the Shanghai Sailing Program under Grant 21YF1402800, in part by the Shanghai Municipal of Science and Technology Project under Grant 20JC1419500, in part by the Shanghai Center for Brain Science and Brain-Inspired Technology, in part by the National Institute of Biomedical Imaging and Bioengineering of the NIH under Grant R01EB026646, and in part by the National Institute of General Medical Sciences of the NIH under Grant R42GM142394.
This work involved human subjects or animals in its research. The authors confirm that all human/animal subject research procedures and protocols are exempt from review board approval. The corresponding author is Yi Zhang.
Wenjun Xia (xwj90620@gmail.com) received his B.S. and Ph.D. degrees from Sichuan University in 2012 and 2022, respectively. He is currently a postdoctoral research associate with the Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, New York, USA. His research interests include medical imaging, compressive sensing, and deep learning.
Hongming Shan (hmshan@fudan.edu.cn) received his Ph.D. degree in machine learning from Fudan University in 2017. From 2017 to 2020, he was a postdoctoral research associate and research scientist at Rensselaer Polytechnic Institute, USA. He is currently an associate professor with the Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, and also a “Qiusuo” Research Leader with the Shanghai Center for Brain Science and Brain-Inspired Technology, Shanghai 200433, China. He was recognized with a Youth Outstanding Paper Award at the World Artificial Intelligence Conference 2021. His research interests include developing machine learning algorithms for biomedical imaging and health care. He is a Senior Member of IEEE.
Ge Wang (wangg6@rpi.edu) is the Clark & Crossan Chair Professor and director of the Biomedical Imaging Center, Rensselaer Polytechnic Institute, Troy, New York 12180, USA. He published the first spiral cone-beam computed tomography algorithm in the early 1990s; there are ∼200 million CT scans yearly, with a majority in the spiral cone-beam mode. He also published the first perspective on deep imaging in 2016 and many follow-up papers. His recent honors include the IEEE Region 1 Outstanding Teaching Award, the Engineering in Medicine and Biology Society Career Achievement Award, the International Society for Optical Engineers (SPIE) Meinel Technology Award, and the Sigma Xi Chubb Award for Innovation. He is a Fellow of IEEE, SPIE, the American Association of Physicists in Medicine, the Optical Society of America, the American Institute for Medical and Biological Engineering, the American Association for the Advancement of Science, and NAI. His research interests include medical imaging and artificial intelligence.
Yi Zhang (yzhang@scu.edu.cn) received his Ph.D. degree in computer science and technology from the College of Computer Science, Sichuan University, Chengdu, China, in 2012. From 2014 to 2015, he was with the Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, New York, USA, as a postdoctoral researcher. He is currently a full professor with the School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China, and is the director of the deep imaging group. He has authored more than 80 papers in the field of medical imaging. He is an associate editor of IEEE Transactions on Medical Imaging and IEEE Access. His research interests include deep learning-based medical imaging. He is a Senior Member of IEEE.
[1] R. Smith-Bindman et al., “Radiation dose associated with common computed tomography examinations and the associated lifetime attributable risk of cancer,” Arch. Internal Med., vol. 169, no. 22, pp. 2078–2086, Dec. 2009, doi: 10.1001/archinternmed.2009.427.
[2] M. Balda, J. Hornegger, and B. Heismann, “Ray contribution masks for structure adaptive sinogram filtering,” IEEE Trans. Med. Imag., vol. 31, no. 6, pp. 1228–1239, Jun. 2012, doi: 10.1109/TMI.2012.2187213.
[3] Y. Chen et al., “Improving abdomen tumor low-dose CT images using a fast dictionary learning based processing,” Phys. Med. Biol., vol. 58, no. 16, pp. 5803–5820, Aug. 2013, doi: 10.1088/0031-9155/58/16/5803.
[4] G. Yu et al., “Total variation based iterative image reconstruction,” in Proc. Int. Workshop Comput. Vis. Biomed. Image Appl., Springer-Verlag, 2005, pp. 526–534, doi: 10.1007/11569541_53.
[5] B. Wen, S. Ravishankar, and Y. Bresler, “Structured overcomplete sparsifying transform learning with convergence guarantees and applications,” Int. J. Comput. Vis., vol. 114, nos. 2–3, pp. 137–167, Sep. 2015, doi: 10.1007/s11263-014-0761-1.
[6] I. Y. Chun, X. Zheng, Y. Long, and J. A. Fessler, “Sparse-view X-ray CT reconstruction using ℓ1 regularization with learned sparsifying transform,” in Proc. 14th Int. Meeting Fully 3-D Image Reconstruction Radiol. Nucl. Med., SPIE, 2017, pp. 115–119, doi: 10.12059/Fully3D.2017-11-310900.
[7] H. Chen et al., “Low-dose CT with a residual encoder-decoder convolutional neural network,” IEEE Trans. Med. Imag., vol. 36, no. 12, pp. 2524–2535, Dec. 2017, doi: 10.1109/TMI.2017.2715284.
[8] Q. Yang et al., “Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1348–1357, Jun. 2018, doi: 10.1109/TMI.2018.2827462.
[9] E. Kang et al., “Cycle-consistent adversarial denoising network for multiphase coronary CT angiography,” Med. Phys., vol. 46, no. 2, pp. 550–562, Feb. 2019, doi: 10.1002/mp.13284.
[10] H. Shan et al., “Competitive performance of a modularized deep neural network compared to commercial algorithms for low-dose CT image reconstruction,” Nature Mach. Intell., vol. 1, no. 6, pp. 269–276, Jun. 2019, doi: 10.1038/s42256-019-0057-9.
[11] T. Nishii et al., “Deep learning-based post hoc CT denoising for myocardial delayed enhancement,” Radiology, p. 220,189, Jun. 2022, doi: 10.1148/radiol.220189.
[12] H. Gupta et al., “CNN-based projected gradient descent for consistent CT image reconstruction,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1440–1453, Jun. 2018, doi: 10.1109/TMI.2018.2832656.
[13] H. Chen et al., “LEARN: Learned experts’ assessment-based reconstruction network for sparse-data CT,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1333–1347, Jun. 2018, doi: 10.1109/TMI.2018.2805692.
[14] J. Adler and O. Öktem, “Learned primal-dual reconstruction,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1322–1332, Jun. 2018, doi: 10.1109/TMI.2018.2799231.
[15] M. Diwakar and M. Kumar, “A review on CT image noise and its denoising,” Biomed. Signal Process. Control, vol. 42, pp. 73–88, Apr. 2018, doi: 10.1016/j.bspc.2018.01.010.
[16] L. Yu et al., “Development and validation of a practical lower-dose-simulation tool for optimizing computed tomography scan protocols,” J. Comput. Assisted Tomography, vol. 36, no. 4, pp. 477–487, Jul./Aug. 2012, doi: 10.1097/RCT.0b013e318258e891.
[17] X. Zheng, S. Ravishankar, Y. Long, and J. A. Fessler, “PWLS-ULTRA: An efficient clustering and learning-based approach for low-dose 3D CT image reconstruction,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1498–1510, Jun. 2018, doi: 10.1109/TMI.2018.2832007.
[18] J.-F. Cai et al., “Cine cone beam CT reconstruction using low-rank matrix factorization: Algorithm and a proof-of-principle study,” IEEE Trans. Med. Imag., vol. 33, no. 8, pp. 1581–1591, Aug. 2014, doi: 10.1109/TMI.2014.2319055.
[19] K. Kim et al., “Sparse-view spectral CT reconstruction using spectral patch-based low-rank penalty,” IEEE Trans. Med. Imag., vol. 34, no. 3, pp. 748–760, Mar. 2014, doi: 10.1109/TMI.2014.2380993.
[20] W. Xia et al., “Spectral CT reconstruction—ASSIST: Aided by self-similarity in image-spectral tensors,” IEEE Trans. Comput. Imag., vol. 5, no. 3, pp. 420–436, Sep. 2019, doi: 10.1109/TCI.2019.2904207.
[21] X. Chen et al., “FONT-SIR: Fourth-order nonlocal tensor decomposition model for spectral CT image reconstruction,” IEEE Trans. Med. Imag., vol. 41, no. 8, pp. 2144–2156, Aug. 2022, doi: 10.1109/TMI.2022.3156270.
[22] S. V. Venkatakrishnan, C. A. Bouman, and B. Wohlberg, “Plug-and-play priors for model based reconstruction,” in Proc. IEEE Global Conf. Signal Inf. Process., 2013, pp. 945–948, doi: 10.1109/GlobalSIP.2013.6737048.
[23] A. Rond, R. Giryes, and M. Elad, “Poisson inverse problems by the plug-and-play scheme,” J. Vis. Commun. Image Representation, vol. 41, pp. 96–108, Nov. 2016, doi: 10.1016/j.jvcir.2016.09.009.
[24] J. He, Y. Wang, and J. Ma, “Radon inversion via deep learning,” IEEE Trans. Med. Imag., vol. 39, no. 6, pp. 2076–2087, Jun. 2020, doi: 10.1109/TMI.2020.2964266.
[25] T. Würfl et al., “Deep learning computed tomography: Learning projection-domain weights from image domain in limited angle problems,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1454–1463, Jun. 2018, doi: 10.1109/TMI.2018.2833499.
[26] D. Hu et al., “Hybrid-domain neural network processing for sparse-view CT reconstruction,” IEEE Trans. Radiat. Plasma Med. Sci., vol. 5, no. 1, pp. 88–98, Jan. 2020, doi: 10.1109/TRPMS.2020.3011413.
[27] X. Tao et al., “Learning to reconstruct CT images from the VVBP-tensor,” IEEE Trans. Med. Imag., vol. 40, no. 11, pp. 3030–3041, Nov. 2021, doi: 10.1109/TMI.2021.3090257.
[28] Y. Zhang et al., “CLEAR: Comprehensive learning enabled adversarial reconstruction for subtle structure enhanced low-dose CT imaging,” IEEE Trans. Med. Imag., vol. 40, no. 11, pp. 3089–3101, Nov. 2021, doi: 10.1109/TMI.2021.3097808.
[29] B. Zhu et al., “Image reconstruction by domain-transform manifold learning,” Nature, vol. 555, no. 7697, pp. 487–492, Mar. 2018, doi: 10.1038/nature25988.
[30] J. He et al., “Downsampled imaging geometric modeling for accurate CT reconstruction via deep learning,” IEEE Trans. Med. Imag., vol. 40, no. 11, pp. 2976–2985, Nov. 2021, doi: 10.1109/TMI.2021.3074783.
[31] Y. Li et al., “Learning to reconstruct computed tomography images directly from sinogram data under a variety of data acquisition conditions,” IEEE Trans. Med. Imag., vol. 38, no. 10, pp. 2469–2481, Oct. 2019, doi: 10.1109/TMI.2019.2910760.
[32] L. Fu and B. De Man, “A hierarchical approach to deep learning and its application to tomographic reconstruction,” in Proc. 15th Int. Meeting Fully 3-D Image Reconstruction Radiol. Nucl. Med., SPIE, 2019, vol. 11072, p. 1,107,202, doi: 10.1117/12.2534615.
[33] D. Wu, K. Kim, G. El Fakhri, and Q. Li, “Iterative low-dose CT reconstruction with priors trained by artificial neural network,” IEEE Trans. Med. Imag., vol. 36, no. 12, pp. 2479–2486, Dec. 2017, doi: 10.1109/TMI.2017.2753138.
[34] F. Zhang et al., “REDAEP: Robust and enhanced denoising autoencoding prior for sparse-view CT reconstruction,” IEEE Trans. Radiat. Plasma Med. Sci., vol. 5, no. 1, pp. 108–119, Jan. 2020, doi: 10.1109/TRPMS.2020.2989634.
[35] I. Y. Chun et al., “Momentum-Net: Fast and convergent iterative neural network for inverse problems,” IEEE Trans. Pattern Anal. Mach. Intell., early access, 2020, doi: 10.1109/TPAMI.2020.3012955.
[36] S. Ye et al., “Unified supervised-unsupervised (super) learning for X-ray CT image reconstruction,” IEEE Trans. Med. Imag., vol. 40, no. 11, pp. 2986–3001, Nov. 2021, doi: 10.1109/TMI.2021.3095310.
[37] Q. Ding, Y. Nan, H. Gao, and H. Ji, “Deep learning with adaptive hyper-parameters for low-dose CT image reconstruction,” IEEE Trans. Comput. Imag., vol. 7, no. 11, pp. 648–660, Nov. 2021, doi: 10.1109/TCI.2021.3093003.
[38] H. Zhang et al., “MetaInv-Net: Meta inversion network for sparse view CT image reconstruction,” IEEE Trans. Med. Imag., vol. 40, no. 2, pp. 621–634, Feb. 2020, doi: 10.1109/TMI.2020.3033541.
[39] J. Xiang, Y. Dong, and Y. Yang, “Fista-net: Learning a fast iterative shrinkage thresholding network for inverse problems in imaging,” IEEE Trans. Med. Imag., vol. 40, no. 5, pp. 1329–1339, May 2021, doi: 10.1109/TMI.2021.3054167.
[40] Y. Romano, M. Elad, and P. Milanfar, “The little engine that could: Regularization by denoising (RED),” SIAM J. Imag. Sci., vol. 10, no. 4, pp. 1804–1844, Oct. 2017, doi: 10.1137/16M1102884.
[41] D. Gilton, G. Ongie, and R. Willett, “Neumann networks for linear inverse problems in imaging,” IEEE Trans. Comput. Imag., vol. 6, pp. 328–343, 2020, doi: 10.1109/TCI.2019.2948732.
[42] M. O. Unal, M. Ertas, and I. Yildirim, “An unsupervised reconstruction method for low-dose CT using deep generative regularization prior,” Biomed. Signal Process. Control, vol. 75, p. 103,598, May 2022, doi: 10.1016/j.bspc.2022.103598.
[43] C. Ledig et al., “Photo-realistic single image super-resolution using a generative adversarial network,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 105–114, doi: 10.1109/CVPR.2017.19.
[44] Y. Sugawara, S. Shiota, and H. Kiya, “Super-resolution using convolutional neural networks without any checkerboard artifacts,” in Proc. 25th IEEE Int. Conf. Image Process. (ICIP), 2018, pp. 66–70, doi: 10.1109/ICIP.2018.8451141.
[45] M. Heusel et al., “GANs trained by a two time-scale update rule converge to a local Nash equilibrium,” in Proc. Adv. Neur. Inf. Process. Syst., 2017, vol. 30, pp. 6626–6637.
[46] Z. Liu et al., “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2021, pp. 10,012–10,022.
[47] K. Kim and J. C. Ye, “Noise2Score: Tweedie’s approach to self-supervised image denoising without clean images,” in Proc. Adv. Neur. Inf. Process. Syst., 2021, vol. 34, pp. 1–11.
[48] W. Xia et al., “CT reconstruction with PDF: Parameter-dependent framework for data from multiple geometries and dose levels,” IEEE Trans. Med. Imag., vol. 40, no. 11, pp. 3065–3076, Nov. 2021, doi: 10.1109/TMI.2021.3085839.
[49] J. Wang et al., “Generalizing to unseen domains: A survey on domain generalization,” IEEE Trans. Knowl. Data Eng., early access, 2022, doi: 10.1109/TKDE.2022.3178128.
[50] Z. Wang et al., “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004, doi: 10.1109/TIP.2003.819861.
Digital Object Identifier 10.1109/MSP.2022.3204407