RED-PSM: Regularization by Denoising of Factorized Low Rank Models for Dynamic Imaging

Dynamic imaging addresses the recovery of a time-varying 2D or 3D object at each time instant using its undersampled measurements. In particular, in the case of dynamic tomography, only a single projection at a single view angle may be available at a time, making the problem severely ill-posed. We propose an approach, RED-PSM, which combines for the first time two powerful techniques to address this challenging imaging problem. The first is non-parametric factorized low-rank models, also known as partially separable models (PSMs), which have been used to efficiently introduce a low-rank prior for the spatio-temporal object. The second is the recent Regularization by Denoising (RED), which provides a flexible framework to exploit the impressive performance of state-of-the-art image denoising algorithms for various inverse problems. We propose a partially separable objective with RED and a computationally efficient and scalable optimization scheme with variable splitting and ADMM. Theoretical analysis proves the convergence of our objective to a value corresponding to a stationary point satisfying the first-order optimality conditions. Convergence is accelerated by a particular projection-domain-based initialization. We demonstrate the performance and computational improvements of our proposed RED-PSM with a learned image denoiser by comparing it to a recent deep-prior-based method known as TD-DIP. Although the main focus is on dynamic tomography, we also show performance advantages of RED-PSM in a cardiac dynamic MRI setting.


I. INTRODUCTION
TIME-VARYING or dynamic tomography involves the reconstruction of a dynamic object using its projections acquired sequentially in time. The problem arises in micro tomography [1], myocardial perfusion imaging [2], thoracic CT [3], imaging of fluid flow processes [4], [5], and dynamic imaging of material samples undergoing compression [6], [7]. It is also closely related to the dynamic MRI (dMRI) problem, which typically arises in cardiac imaging [8].
Dynamic tomography is a challenging ill-posed inverse problem: since the measurements are inconsistent due to the evolving object, traditional reconstruction algorithms lead to significant artifacts. As discussed further below, although numerous methods have been proposed to address this problem, they suffer from various limitations, or provide less than satisfactory reconstruction quality, especially in the scenario of main interest in this paper, when at each time instant only one projection is available.

This research was supported in part by Los Alamos National Labs under Subcontract No. 599416/CW13995.

A. Proposed Approach
We propose an approach, RED-PSM, which combines for the first time two powerful techniques to address this challenging imaging problem. The first technique is non-parametric factorized low-rank object models, popularized under the name partially separable models (PSMs) [9]. The PSM was combined with low-rank matrix recovery in [10]-[12], and has been used since then to represent or motivate a low-rank object prior in tens of works on dynamic imaging. The second technique is the recent Regularization by Denoising (RED) [13], which provides a flexible framework to exploit the impressive performance of state-of-the-art image denoising algorithms for various inverse problems.
We propose a non-convex partially separable objective with RED for learning-regularized low-rank spatio-temporal object recovery and a computationally efficient and scalable optimization scheme with bi-convex ADMM.
Theoretical analysis proves the convergence of our objective to a value corresponding to a stationary point satisfying the first-order optimality conditions. Convergence is accelerated by a particular projection-domain-based initialization.
We demonstrate the performance and computational improvements of our proposed RED-PSM with a learned image denoiser by comparing it to a recent deep-prior-based method known as TD-DIP [14], and to spatial and spatiotemporal total variation (TV) regularized versions of PSM with a bi-convex objective.
Although the main focus is on dynamic tomography, we also show the performance advantages of RED-PSM in a cardiac dynamic MRI setting.
RED was originally inspired by the Plug-and-Play (PnP) approach [15], which was the first to exploit powerful denoisers to represent the prior on the object in inverse problems, but without explicitly defining the regularizer. Instead, in PnP the denoiser is incorporated by replacing a proximal mapping. Both PnP and RED have shown strong empirical performance in various applications. A recent survey [16] provides a detailed discussion of the main results and different applications of PnP and the relation to RED. For the algorithmic variations PnP-ADMM and PnP-ISTA there are convergence results to unique fixed points. Moreover, for PnP-ISTA, similar to our result, convergence is shown to a stationary point of a possibly non-convex objective when the denoiser is an MMSE estimator [17]. This result requires the prior to be non-degenerate, i.e., not lie on a lower-dimensional manifold, which, unfortunately, would be violated by the low-rank bilinear representation in our PSM. We believe that a PnP-based method similar to RED-PSM can be formulated for our problem, but a proof of convergence to a stationary point may not directly follow, and a different approach and analysis may be needed.
Our motivation for using the original RED formulation in RED-PSM, rather than more recent formulations and analyses (e.g., as in [18] and the references therein), is its simple gradient expression and explicit regularizer expression, which facilitate implementation and theoretical analysis. The objective is to demonstrate the large improvement in performance that this simple learned regularizer can bring to the PSM scheme, and the improvement over much more complex and slower state-of-the-art methods. Recent variations on RED such as in [18], and PnP methods (as covered in [16]), have improved over the original methods in reconstruction quality in various static imaging problems and/or in theoretical guarantees. However, theoretical guarantees for the case of a non-convex data fidelity term have been provided recently for only one such non-convex scenario: phase retrieval [19]. As this is the first work to extend RED to spatio-temporal imaging with a PSM, and to provide a convergence guarantee for the method in the non-convex bilinear scenario, we chose to limit our attention in this paper to the original RED version [13], [20].

B. Previous Work
We limit our discussion to dynamic CT and dMRI, although similar or analogous methods have been or can be applied in other dynamic imaging modalities.The same can be said of the method proposed in this paper.
1) Dynamic MRI (dMRI): In spite of major advances in hardware and signal acquisition methods, MRI has remained a relatively slow modality, posing significant challenges for dynamic imaging. This, coupled with the high flexibility in the acquisition strategy, has motivated extensive work on dMRI, with hundreds of papers on the subject. We highlight some key developments in the algorithm-based dMRI techniques.
Fundamentally, algorithmic approaches to dMRI are based on identifying and exploiting redundancy in the dynamic object, which enables its recovery from an incomplete, and usually sparse, set of samples acquired in the joint k-space and time domain, the k-t space. Many of the proposed methods have extensions to parallel imaging, which introduces additional such redundancy through additional hardware.
A large set of approaches relies on modeling the support of the object in the domain dual to the k-t domain: the (x-f) domain (the space-temporal-frequency domain). These approaches may be classified into three groups. The first involves generic modeling of the (x-f) support, as in reduced field-of-view (FOV) techniques [21]-[23]. The second involves sparse and unknown support modeling in a transform domain, and sampling and recovery using the methods of compressed sensing [24]-[27]. The third involves adaptive sparse support modeling [28]-[31], where the support information is adapted to the dynamics of the imaged slice, and the theory of time-sequential sampling [32], [33] is used to design an optimal k-t sampling and reconstruction scheme. Other related approaches model prior information such as motion periodicity and related harmonic structure [34], [35], or the object's spatial and temporal correlation functions [36].
Since its introduction, the partially separable model (PSM) [9], with a non-parametric factorized low-rank form separating the spatial and temporal structure, has been used extensively in many works on dMRI to represent the underlying spatio-temporal object. A group of methods recovers the PSM in two steps: (a) estimate the temporal subspace using data (e.g., a navigator signal) sampled at low spatial, but high temporal resolution; and (b) estimate the spatial subspace using the entire acquired data set. Such methods include, e.g., [9], [37]-[40], with follow-up versions promoting sparsity of the object in a transform domain (e.g., [41]). Other PSM approaches use a low-rank matrix recovery formulation to estimate spatial and temporal components jointly (free of conditions or additional data acquisition enabling a separate reconstruction of the temporal subspace), e.g., [10], [11], also with versions promoting sparsity [12], [42].
Manifold models have also been considered for dMRI. Recent such work [43] surveys previous related works, and proposes a new method that is related to PSM. The method partitions the dynamic object into temporal subsets, each lying on a low-dimensional manifold approximated by its tangent subspace. As a result, the object is represented by a temporally partitioned PSM. A simple sparsity penalty is used for the spatial representation. The spatial basis functions and the linear transformations aligning the temporal subsets are determined by optimizing a non-convex objective using an ADMM algorithm. The subset temporal partitioning and temporal bases are estimated using an initial reconstruction obtained from training data. In some applications, such data may not be available.
Another set of methods, including [44]-[47], promotes low-rankness of the entire spatio-temporal object matrix, or of patches thereof [48], [49], implicitly, by using the nuclear norm or a Schatten-p functional with p < 1, often promoting sparsity in a transform domain as well. Methods combining motion estimation and compensation with other priors are also used widely for dMRI, e.g., [50]-[53].
A different approach in dMRI decomposes the object into the sum of low-rank and sparse components, with the low-rankness encouraged implicitly using a nuclear or Schatten-p functional [54]-[57]. A recent related fast method [58] decomposes the object representation into the sum of three components: the mean signal; a low-rank PSM; and a residual sparse in the Fourier domain. Although benchmark competing methods are either faster or more accurate on some of the diverse data sets used for the comparison in [58], the method provides the lowest reconstruction error and runtimes when these quantities are averaged over these data sets.
Recent approaches to dMRI reconstruction have used deep neural networks (DNNs). A majority of these methods, such as [59]-[72], are supervised, requiring a large set of high-resolution artifact-free images for network training. However, acquiring such training data can be difficult for some dMRI applications, and may be impossible for many other dynamic imaging applications, where clinical dMRI strategies for freezing motion, such as ECG synchronization and breath-hold by a cooperative patient, are not available.
The need for a training data set is overcome by recent object-domain deep image prior (DIP)-based DNN algorithms for dMRI [14], [73], which are unsupervised. Although they provide impressive results, DIP-based algorithms such as [14] suffer from overfitting and usually require handcrafted early stopping criteria during the optimization of the generator network parameters. To overcome these drawbacks, [73] includes regularization constraining the geodesic distances between generated objects. However, this requires the computation of the Jacobian of the generator network at each iteration of the update of the weights, significantly increasing the computation and run time.
A different unsupervised approach [74] combines factorized low-rank (i.e., partially separable) models with generative models. Although it combines the PSM with the recent DIP framework, this method has the following limitations: (i) the spatial generator is an artifact removal network taking the full-sized spatial basis functions as input; since this prevents a patch-based implementation, it may be computationally problematic for high-resolution 3D+temporal settings; (ii) the CNN structural prior for natural images that is used in the spatial generator may not be useful as a prior for the individual spatial basis functions, since the least-squares-optimal spatial basis functions are the left singular vectors of the complete object, and as such may not have the structure of natural images; (iii) as with the other DIP-based methods, if the additional penalties on the generator parameters are insufficient, this method can be prone to overfitting. These various limitations are overcome by the proposed approach.
2) Dynamic Tomography (dCT): CT is faster than MRI, thanks to acquiring an entire projection at one view angle at once. However, because of the angular scanning required in almost all systems, and in spite of significant advances in hardware, dCT is still a relatively slow modality. It is also the problem that motivated much of our work reported in this paper.
In perhaps the earliest algorithmic approaches to dCT, the time-sequential sampling problem in dCT was formulated and studied in [75], and [76], [77] provided a solution using time-sequential sampling of bandlimited spatio-temporal signals, including an optimal view angle order for the scan and theoretical guarantees for unique and stable reconstruction. However, the approach is limited by its bandlimitedness assumptions.
A class of algorithms for dCT is based on modeling the time variation by a motion field, and motion-compensated reconstruction. Many of these algorithms, e.g., [78]-[81], alternate between estimating the motion field and the time-varying object. Some algorithms, e.g., [82], use the optical flow (mass continuity) PDE. A different method [83] estimates the motion field separately by evolving a linear elastic deformation by the Navier-Cauchy PDE, assuming known boundary evolution of the object and initial density distribution. However, these methods assume the total mass or density to be preserved over time, which may be limiting assumptions. For instance, flow of a contrast agent into the imaged volume, or imaging a fixed slice of a 3D time-varying object under compression with cross-slice motion, may violate either of these assumptions. As a hardware-based, problem-specific approach, optical tracking of fiducial marks has been used for motion correction [84].
Various methods approach dCT reconstruction as Bayesian estimation. These include methods [85]-[88] that employ a state-space formulation and use Kalman filter techniques to approximate an MMSE estimate. A different Bayesian method [89] incorporates a spatio-temporal Markov random field object prior, and models measurement imperfections. An iterative algorithm is used to compute a MAP estimate. An interlaced projection acquisition scheme is also proposed.
Several dCT methods motivated by compressed sensing use a total-variation penalty along both spatial and temporal coordinates, as in [90], while other methods, e.g., [91], propose efficient sparse representations for the dynamic object. Incorporating low-rank modeling, [92] decomposes the object into low-rank + sparse components, promoting the low-rankness implicitly using the nuclear norm. The method relies on the object being approximately static for groups of projections, enabling an approximate reconstruction from each such group.
The PSM has been introduced into dynamic tomography in [93], [94] by carrying the PSM to the projection domain using the harmonic representation of projections, and estimating the spatial and temporal basis functions jointly. The work provides a theoretical analysis of uniqueness and stability, and of the choice of time-sequential angular scan scheme for the problem. Despite the advantages of this approach, its performance is still limited by the null space of the measurement operator.
To help address the underdetermined problem, the method of [95] improves over [93], [94] by combining a PSM with basic spatial regularization by total variation. Like [93], [94], this method too estimates the temporal and spatial PSM components jointly. This makes it applicable to the time-sequential acquisition scenario considered in this paper, where only a single projection is available per time frame. It imposes low-rankness as a soft constraint, using a hybrid bi-convex objective with a data fidelity term, a TV regularizer expressed in terms of the unfactorized object, and a penalty for object-PSM mismatch. It uses [93], [94] for fast initialization. The method [95] can be considered a precursor to the method of this paper, which combines a hard rank-constrained PSM with a sophisticated learned spatial regularizer, and provides a theoretical convergence analysis.
Recent methods [96]-[99] introduce deep learning into dCT. These methods require an initial reconstruction from subsets of the data from which the motion can be estimated accurately. However, without a periodicity assumption, this is not applicable to the scenarios in this paper, where only a single projection is available at a time. Another method [100] for cone-beam dCT proposes a multi-agent consensus equilibrium [101]-based technique using separate deep denoisers for axial, sagittal, and coronal slices of the object.
A deep-learning-based method [102] for dynamic photoacoustic tomography (PACT) introduced a neural field (NF) to represent the dynamic object with fewer parameters, combining it with a simple total variation (TV)-based regularizer. Despite providing an efficient representation for the dynamic object, the method outperforms its comparison benchmarks only in some of the projection-per-frame settings.

C. Contributions
1. To the best of our knowledge, RED-PSM is the first PSM-based approach to dynamic imaging that incorporates a particular (RED-based [13]) spatial prior that is learned and pre-trained.
2. We are not aware of any prior work that uses RED with an explicit low-rankness constraint. In RED-PSM we achieve a parsimonious representation for the dynamic object using the PSM, and reduce the data requirements further by incorporating the RED prior.
3. Unlike supervised learning-based methods for spatio-temporal imaging [59]-[72], [103]-[105] that learn a spatio-temporal model from training data, RED-PSM does not require ground-truth spatio-temporal training data, which is often not available or expensive to acquire. Thus RED-PSM offers the best of both worlds: a learned spatial model using readily available static training data, and unsupervised single-instance adaptation to the spatio-temporal measurement.
4. A novel and effective ADMM algorithm for the resulting new minimization problem enforces PSM as a hard constraint and incorporates the learned RED spatial regularizer.
5. The method is supported by theoretical analysis: we show convergence of the proposed PSM-based objective to a stationary point, and do so for a highly non-trivial learned regularizer. To date, the convergence of RED has been analyzed in the nonlinear measurement setting only for the phase retrieval problem [19]. Our analysis of RED-PSM is the first convergence result for RED in the bilinear (bi-convex), non-convex scenario. This convergence guarantee is of practical importance. First, it provides a mathematical justification to terminate the algorithm after sufficiently many iterations. Second, it eliminates concerns about the solution degrading after too many iterations, as in the case of DIP-based methods.
6. Compared to a recent DIP-based [14] algorithm, RED-PSM achieves better reconstruction accuracy with orders of magnitude faster run times.
7. To improve and speed up the empirical convergence of RED-PSM, we use a particular fast projection-domain PSM initialization scheme [93], [94] for the spatial and temporal basis functions. The accelerated and reliable convergence with this initialization scheme is important for applications that require a fast turn-around between acquisition and reconstruction.
8. A version of the approach with a patch-based regularizer is shown to provide almost equivalent reconstruction accuracy. This makes the proposed method conveniently scalable to high-resolution 3D or 4D settings.
An earlier and partial version of this work, missing, among other things, the detailed discussion of previous work, convergence analysis, and some of the experimental results, was presented at a conference [106].

II. PROBLEM STATEMENT
In a 2D setting, illustrated in Figure 1, the goal of the dynamic tomography problem is to reconstruct a time-varying object $f(x, t)$, $x \in \mathbb{R}^2$, vanishing outside a disc of diameter $D$, from its projections $g(\cdot, \theta, t) = \mathcal{R}_\theta \{ f(x, t) \}$ obtained using the Radon transform operator $\mathcal{R}_\theta$ at angle $\theta$. Considering time-sequential sampling, in which only one projection is acquired at each time instant, and sampling uniform in time, the acquired measurements are

$$\{ g(s, \theta_p, t_p) \}_{p=0}^{P-1}, \quad \forall s, \qquad (1)$$

where $s$ is the offset of the line of integration from the origin (i.e., the detector position), and $P$ is the total number of projections (and temporal samples) acquired. The sampling of the $s$ variable is assumed fine enough and is suppressed in the notation. The angular sampling scheme, i.e., the sequence $\{\theta_p\}_{p=0}^{P-1}$ with $\theta_p \in [0, 2\pi]$, is a free design parameter.

Our objective in dynamic tomography is to reconstruct the underlying dynamic object $\{ f(x, t_p) \}_{p=0}^{P-1}$ from the time-sequential projections in (1). The challenge is that, because each projection belongs to a different object, the projections in (1) are inconsistent. Therefore, a conventional reconstruction as for a static object, e.g., filtered backprojection (FBP), results in significant reconstruction artifacts. This is to be expected, as the problem is severely ill-posed: an image with a $D$-pixel diameter requires more than $D$ projections for artifact-free reconstruction, whereas only one projection is available per time frame in the time-sequential acquisition (1).

Several dynamic tomography methods [79], [89], [100] group temporally neighboring projections and assume the object is static during their acquisition. However, this reduces the temporal resolution, and any violation of this assumption (as in the case of time-sequential sampling) leads to mismodeling in the data fidelity terms.
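To make the time-sequential acquisition model concrete, the sketch below simulates one parallel-beam projection per time instant of a changing toy object. The nearest-neighbour Radon projector, the toy object, and the bit-reversed angular schedule are illustrative stand-ins, not the operators or schedule used in the paper:

```python
import numpy as np

def project(frame, theta):
    """Parallel-beam projection of a square image at angle theta (radians):
    rotate the sampling grid by theta (nearest neighbour) and sum along rows."""
    n = frame.shape[0]
    c = (n - 1) / 2.0
    ys, xs = np.mgrid[0:n, 0:n]
    xr = np.cos(theta) * (xs - c) - np.sin(theta) * (ys - c) + c
    yr = np.sin(theta) * (xs - c) + np.cos(theta) * (ys - c) + c
    xi, yi = np.rint(xr).astype(int), np.rint(yr).astype(int)
    inside = (xi >= 0) & (xi < n) & (yi >= 0) & (yi < n)
    rotated = np.where(inside, frame[np.clip(yi, 0, n - 1), np.clip(xi, 0, n - 1)], 0.0)
    return rotated.sum(axis=0)  # integrate along lines -> N detector bins

# Time-sequential acquisition: exactly one projection per time instant, here
# with a hypothetical bit-reversed angular schedule over [0, pi).
P, n = 16, 32
angles = np.pi * np.array([int(f"{p:04b}"[::-1], 2) for p in range(P)]) / P
# Toy "dynamic object": a rectangle whose height changes over time.
frames = [np.pad(np.ones((8 + p % 4, 8)), 12)[:n, :n] for p in range(P)]
g = np.stack([project(frames[p], angles[p]) for p in range(P)])  # shape (P, n)
print(g.shape)  # (16, 32)
```

Each row of `g` comes from a *different* object snapshot, which is exactly the inconsistency that makes a static FBP reconstruction fail.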

III. PARTIALLY SEPARABLE MODELS (PSM)
For spatio-temporal inverse problems such as dynamic MRI and tomography, the underlying object can be accurately represented using a partially-separable model (PSM), which effectively introduces a factorized low-rank prior to the problem.For dynamic tomography, a PSM can represent the imaged object f , or its full set of projections g.In this paper we use an object-domain PSM.
The representation of a dynamic object $f(x, t)$ by a $K$-th order partially separable model (PSM) is the series expansion

$$f(x, t) = \sum_{k=1}^{K} \Lambda_k(x)\, \psi_k(t). \qquad (2)$$

This model facilitates interpretability by separating the spatial structure from the temporal dynamics. Empirically, modest values of $K$ provide high accuracy in applications to MR cardiac imaging [10], [107], [108]. Theoretical analysis [94] shows that for a spatially bandlimited object undergoing a time-varying affine transformation (i.e., a combination of time-varying translation, scaling, and rotation) of bounded magnitude, a low-order PSM provides a good approximation. As detailed later, as a standard choice in modeling the temporal functions, we use a $d$-dimensional representation with $d \ll P$. A similar idea was used in [11] to represent the low temporal bandwidth of cardiac motion in dMRI. Together with the PSM for the object, this leads to a significant reduction in the number of representation parameters for a spatio-temporal object with a $D$-pixel diameter and $P$ temporal samples: from $\approx P D^2$ to $\approx K D^2 + K d$, with $K \ll P$. Because $d \ll D$, this corresponds to a factor of $P/K \gg 1$ compression of the representation, providing an effective parsimonious model for the spatio-temporal object $f$. Also, by propagating the PSM object model to the projection domain, it enables quantitative analysis [94] of the choice of the optimal sequential projection angular schedule.
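Since the least-squares-optimal order-$K$ PSM of a discretized object matrix is its truncated SVD, the factorization and the parameter compression can be sketched on toy data (the names `Lam`/`Psi` mirror the $\Lambda$, $\Psi$ notation used later; all sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, K = 16, 64, 3
t = np.linspace(0, 1, P)
# Toy object: three spatial patterns modulated by smooth temporal signals,
# so the N^2 x P object matrix has rank exactly 3.
spatial = rng.standard_normal((N * N, 3))
temporal = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t], axis=1)
f = spatial @ temporal.T                      # (N^2, P)

# Best order-K PSM in the least-squares sense: truncated SVD.
U, s, Vt = np.linalg.svd(f, full_matrices=False)
Lam = U[:, :K] * s[:K]                        # spatial basis, (N^2, K)
Psi = Vt[:K].T                                # temporal basis, (P, K)
rel_err = np.linalg.norm(f - Lam @ Psi.T) / np.linalg.norm(f)

full_params = P * N * N                       # ~ P D^2 in the text's notation
psm_params = K * N * N + K * P                # ~ K D^2 + K P
print(rel_err, full_params / psm_params)      # near-zero error, large compression
```

For this rank-3 object the order-3 PSM is exact up to floating-point error, while storing roughly $P/K$ times fewer parameters.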

IV. PROPOSED METHOD: RED-PSM
The overall RED-PSM framework for recovering dynamic objects explained in this section is illustrated in Figure 2.

A. Variational Formulation
We use a discretized version of the PSM (2) for the dynamic object, with the object $f(\cdot, t)$ at each time instant $t = 0, 1, \ldots, P-1$ represented by an $N \times N$-pixel image (a "time frame", or "snapshot"). Vectorizing these images to vectors $f_t \in \mathbb{R}^{N^2}$, the entire dynamic object is the matrix $f = [f_0, \ldots, f_{P-1}] \in \mathbb{R}^{N^2 \times P}$. It will also be useful to extract individual time frames from $f$. Denoting the $t$-th column of the $P \times P$ identity matrix by $e_t$, we have $f e_t = f_t$, i.e., $e_t$ extracts the $t$-th column of $f$. Applying the PSM model, we assume $f = \Lambda \Psi^T \in \mathbb{R}^{N^2 \times P}$, where the columns $\Lambda_k$ and $\Psi_k$ of $\Lambda \in \mathbb{R}^{N^2 \times K}$ and $\Psi \in \mathbb{R}^{P \times K}$ are the discretized spatial and temporal basis functions of the PSM representation, respectively.
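The column-extraction identity $f e_t = f_t$ can be verified directly on a toy discretized PSM object (all sizes below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
N, P, K = 8, 16, 2

# Discretized PSM: the columns of f are the vectorized N x N frames.
Lam = rng.standard_normal((N * N, K))    # spatial basis
Psi = rng.standard_normal((P, K))        # temporal basis
f = Lam @ Psi.T                          # (N^2, P) object matrix

# e_t, the t-th column of the P x P identity, extracts frame t: f @ e_t = f_t.
t = 5
e_t = np.eye(P)[:, t]
f_t = f @ e_t
assert np.array_equal(f_t, f[:, t])
frame_image = f_t.reshape(N, N)          # un-vectorize back to an N x N image
print(frame_image.shape)
```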
Assuming that the x-ray detector has $N$ bins, the projection of the object at time $t$ is $g_t = R_{\theta_t} f_t \in \mathbb{R}^N$, where the matrix $R_{\theta_t} \in \mathbb{R}^{N \times N^2}$ computes the projection at view angle $\theta_t$.
We formulate the recovery of $f$ as the solution $\hat{f} = \hat{\Lambda}\hat{\Psi}^T$ to the following variational problem:

$$\min_{\Lambda, \Psi} \; \sum_{t=0}^{P-1} \bigl\| g_t - R_{\theta_t} \Lambda \Psi^T e_t \bigr\|_2^2 \; + \; \lambda \sum_{t=0}^{P-1} \rho\bigl(\Lambda \Psi^T e_t\bigr) \; + \; \xi \bigl( \|\Lambda\|_F^2 + \|\Psi\|_F^2 \bigr) \quad \text{s.t.} \quad \Psi = U Z. \qquad (3)$$

The first term is the data fidelity term, measuring the fit between the available undersampled measurements $g_t$ of the true object and the measurements obtained from the estimated object $f = \Lambda \Psi^T \in \mathbb{R}^{N^2 \times P}$. The second term, with weight $\lambda > 0$, is a spatial regularizer injecting a relevant spatial prior into the problem. It is applied to the PSM $\Lambda \Psi^T$ column by column, that is, to individual temporal image frames. The last two terms, with weight $\xi > 0$, prevent the bilinear factors from growing without limit.¹ Finally, the identity $\Psi = U Z$ is an implicit temporal regularizer that restricts the temporal basis functions $\Psi$ to a $d$-dimensional subspace of $\mathbb{R}^P$ spanned by a fixed basis $U \in \mathbb{R}^{P \times d}$, where $d \geq K$. The action of $U$ on the columns of $Z$ may be interpreted as interpolation from $d$ samples to the $P$-sample temporal basis functions. In practice, we incorporate this identity by explicit substitution (reparametrization of $\Psi$ in terms of the free variable $Z \in \mathbb{R}^{d \times K}$) into the objective, and the minimization in (3) over $\Psi$ is thus replaced by minimization over $Z$. This reduces the number of degrees of freedom in $\Psi$ to a fixed number $dK$, independent of the number $P$ of temporal sampling instants. For notational conciseness, we do not display this constraint/reparametrization in the sequel, but it is used throughout.
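A minimal sketch of evaluating an objective of this form with the $\Psi = UZ$ reparametrization follows. The random row-subsampling operator (standing in for $R_{\theta_t}$), the cosine interpolation basis $U$, and the quadratic placeholder for $\rho$ are illustrative assumptions, not the operators used in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
Npix, P, K, d = 64, 32, 2, 8          # N^2 pixels, frames, PSM order, temporal dim
lam, xi = 0.1, 1e-3

# Stand-in for R_{theta_t}: each frame observed through a random row subset.
R = [np.eye(Npix)[rng.choice(Npix, 8, replace=False)] for _ in range(P)]
g = [rng.standard_normal(8) for _ in range(P)]

# Fixed temporal interpolation basis U (here: low-frequency cosines), d >= K.
U = np.cos(np.pi * np.outer(np.arange(P) + 0.5, np.arange(d)) / P)

def rho(x):                            # placeholder spatial regularizer
    return 0.5 * np.sum(x**2)

def objective(Lam, Z):
    Psi = U @ Z                        # reparametrization Psi = U Z, (P, K)
    f = Lam @ Psi.T                    # (Npix, P)
    fit = sum(np.sum((g[t] - R[t] @ f[:, t])**2) for t in range(P))
    reg = lam * sum(rho(f[:, t]) for t in range(P))
    return fit + reg + xi * (np.sum(Lam**2) + np.sum(Psi**2))

val = objective(rng.standard_normal((Npix, K)), rng.standard_normal((d, K)))
print(val)
```

Note that the optimization variable `Z` has only $dK$ entries, independent of $P$, exactly as described above.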

B. Incorporating Regularization by Denoising
For the spatial regularizer $\rho(\cdot)$ we consider "Regularization by Denoising (RED)" [13]. RED proposes a recovery method using a denoising operator $D : \mathbb{R}^{N^2} \to \mathbb{R}^{N^2}$ in an explicit regularizer of the form

$$\rho(f_t) = \tfrac{1}{2}\, f_t^T \bigl( f_t - D(f_t) \bigr). \qquad (4)$$

Recent studies using this regularizer provide impressive performance for various static image reconstruction tasks, including high-dimensional cases [109] and phase retrieval [110]. This regularizer was also combined with a DIP-based fidelity term in [111]. However, we avoid this combination due to disadvantages related to speed, overfitting, and convergence guarantees.

¹ Inclusion of these Frobenius norm terms also happens to lead to connections with the nuclear norm, but, as discussed in detail in Section V, our problem formulation (even if the RED regularizer is disregarded) is not equivalent to a formulation encouraging a low-rank solution by a nuclear norm or other Schatten norm penalty such as used in [44], [46], [47]. Instead, the factorized form $f = \Lambda \Psi^T$ (as in other PSM-based matrix recovery works such as [10], [11]) introduces the rank constraint explicitly.
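The RED regularizer $\rho(x) = \tfrac{1}{2} x^T (x - D(x))$ and its gradient rule $\nabla\rho(x) = x - D(x)$ can be checked on a toy denoiser; the scalar shrinkage denoiser below is a hypothetical stand-in, chosen because it trivially satisfies the differentiability, homogeneity, and passivity conditions discussed next:

```python
import numpy as np

def denoise(x):
    """Toy linear shrinkage denoiser D(x) = a*x with 0 < a < 1.
    It is differentiable, homogeneous, and passive: ||D(x)|| <= ||x||."""
    return 0.8 * x

def red_rho(x):
    # RED regularizer rho(x) = 0.5 * x^T (x - D(x))
    return 0.5 * x @ (x - denoise(x))

def red_grad(x):
    # For denoisers satisfying the RED conditions, the gradient
    # simplifies to x - D(x).
    return x - denoise(x)

x = np.array([1.0, -2.0, 0.5])
# Numerically check the gradient rule in the first coordinate.
eps, e0 = 1e-6, np.eye(3)[0]
num = (red_rho(x + eps * e0) - red_rho(x - eps * e0)) / (2 * eps)
print(red_grad(x)[0], num)   # the two values should agree
```

For this denoiser, $\rho(x) = 0.1\|x\|^2$ and the analytic and numerical gradients match; real learned denoisers are of course far richer, but the same gradient identity is what makes the efficient updates below possible.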
While RED provides significant flexibility in the type of denoisers that can be used, it still requires $D$ to be differentiable and locally homogeneous, and to satisfy the passivity condition $\|D(f_t)\| \leq \|f_t\|$, for the theoretical analysis of RED to apply.²

Next, we consider the optimization in (3). For the conventional variational formulation, an efficient choice is iterative algorithms [13] that use the standard "early termination" approach [112] and only require a single use of the denoiser per iteration. However, the regularized PSM objective in (3) does not allow the RED updates on $f$ to be propagated efficiently to the respective basis functions.
To overcome this difficulty, we perform a bilinear variable splitting $f = \Lambda \Psi^T$ and obtain our final formulation:

$$\min_{\Lambda, \Psi, f} \; \sum_{t=0}^{P-1} \Bigl[ \bigl\| g_t - R_{\theta_t} f e_t \bigr\|_2^2 + \lambda \rho(f e_t) \Bigr] \; + \; \xi \bigl( \|\Lambda\|_F^2 + \|\Psi\|_F^2 \bigr) \quad \text{s.t.} \quad f = \Lambda \Psi^T. \qquad (5)$$

Since the PSM is enforced as a hard constraint, the estimated object $\hat{f}$ is constrained to have $\mathrm{rank}(\hat{f}) \leq K$. Problem (5) is non-convex even if $\rho$ is convex, because of the product between the unknowns $\Lambda$ and $\Psi$.
We propose an algorithm based on ADMM to solve (5). To this end, we form the augmented Lagrangian in the scaled form [113], [114]:

$$L_\beta(\Lambda, \Psi, f, \gamma) = \sum_{t=0}^{P-1} \Bigl[ \bigl\| g_t - R_{\theta_t} f e_t \bigr\|_2^2 + \lambda \rho(f e_t) \Bigr] + \xi \bigl( \|\Lambda\|_F^2 + \|\Psi\|_F^2 \bigr) + \frac{\beta}{2} \Bigl( \bigl\| f - \Lambda \Psi^T + \gamma \bigr\|_F^2 - \|\gamma\|_F^2 \Bigr), \qquad (6)$$

where $\gamma \in \mathbb{R}^{N^2 \times P}$ represents the scaled dual variable associated with the constraint $f = \Lambda \Psi^T$, and $\beta > 0$ is the penalty parameter.
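A sketch of the scaled augmented Lagrangian for the constraint $f = \Lambda\Psi^T$: the data-fidelity and regularizer terms are replaced by a placeholder, and the check below only verifies that the quadratic penalty vanishes at a feasible point with zero dual variable, as the scaled form requires:

```python
import numpy as np

rng = np.random.default_rng(2)
Npix, P, K, beta = 16, 8, 2, 1.0

def data_terms(f):
    # Placeholder for the data-fidelity + regularizer part; the true terms
    # would involve the per-frame projection operators R_{theta_t}.
    return np.sum(f**2)

def aug_lagrangian(Lam, Psi, f, gamma):
    # Scaled-form augmented Lagrangian for the constraint f = Lam @ Psi.T
    penalty = 0.5 * beta * (np.sum((f - Lam @ Psi.T + gamma)**2)
                            - np.sum(gamma**2))
    return data_terms(f) + penalty

Lam = rng.standard_normal((Npix, K))
Psi = rng.standard_normal((P, K))
f = Lam @ Psi.T                       # constraint satisfied (feasible point)
gamma = np.zeros((Npix, P))           # zero dual variable
# With a feasible point and zero dual, the penalty vanishes.
print(aug_lagrangian(Lam, Psi, f, gamma) - data_terms(f))   # -> 0.0
```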
Then, ADMM can be used to minimize (6) as in Algorithm 1. Line 4 in Algorithm 1 corresponds to the variational denoising, for all $t$, of the "pseudo image frame" $(\Lambda^{(i)} \Psi^{(i)T} + \gamma^{(i-1)}) e_t$ with regularization $\lambda \rho(f_t)$. Instead of solving this denoising problem by running an iterative algorithm to convergence, we follow the RED approach [13], [20], and for each $t$ replace the $f_t$ update in Step 4 by a single fixed-point iteration step using the approach of early stopping [112] (Sec. 4.3.2), taking advantage of the gradient rule

$$\nabla \rho(f_t) = f_t - D_\phi(f_t),$$

where $D_\phi$ is the denoiser. This results in Algorithm 2, which requires only a single use of the denoiser per iteration of ADMM. Furthermore, as in Algorithm 1, the modified Line 4 in Algorithm 2 can be performed in parallel for all $t$.

² While many powerful denoisers have been demonstrated to satisfy these conditions in [13], recent work [20] provides another framework to explain the good performance of RED with denoisers not satisfying them.
Algorithm 2: RED-PSM with efficient $f$ step. Notes: the inputs and Lines 1-3 and 5-6 are the same as in Algorithm 1. The $f$ step is applied $\forall t$.
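The single-denoiser-call $f_t$ update can be sketched as follows: freezing the denoiser at the current iterate turns the first-order condition of the $f_t$ subproblem into a linear system. The projection matrix, denoiser, and parameter values below are illustrative; also note that the paper performs only one such step per outer ADMM iteration, whereas this sketch iterates the step to exhibit its fixed point:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, lam, beta = 16, 4, 0.5, 1.0
R = rng.standard_normal((m, n))       # stand-in for the projection matrix R_{theta_t}
g = rng.standard_normal(m)            # measurement for this frame
z = rng.standard_normal(n)            # pseudo frame (Lam Psi^T + gamma) e_t

def denoise(x):                       # toy passive denoiser (stand-in for D_phi)
    return 0.8 * x

def f_step(f_cur):
    """One fixed-point iteration for the f_t update: freeze the denoiser at
    f_cur (a single denoiser call), then solve the resulting linear system
    (2 R^T R + (lam + beta) I) f = 2 R^T g + lam * D(f_cur) + beta * z."""
    A = 2 * R.T @ R + (lam + beta) * np.eye(n)
    b = 2 * R.T @ g + lam * denoise(f_cur) + beta * z
    return np.linalg.solve(A, b)

f = np.zeros(n)
for _ in range(50):                   # iterate only to exhibit the fixed point
    f = f_step(f)
# At the fixed point, the first-order condition of the f_t subproblem holds:
resid = 2 * R.T @ (R @ f - g) + lam * (f - denoise(f)) + beta * (f - z)
print(np.linalg.norm(resid))
```

The residual is (numerically) zero at the fixed point, which is what justifies taking only a single inexpensive step per outer iteration in practice.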

C. Regularization Denoiser
The regularization denoiser $D_\phi$ has a denoising convolutional neural network (DnCNN) [115] architecture and is trained in a supervised manner on a training set of 2D slices $f_i \in \mathbb{R}^{N^2}$, $i = 1, \ldots, N$, of one or more static objects similar to the object of interest, assuming that such data will be available in the settings of interest. Thus, the RED steps are agnostic to the specific motion type. The training objective for the denoiser is

$$\min_\phi \; \sum_i \bigl\| D_\phi(f_i + \eta_i) - f_i \bigr\|_2^2,$$

where the injected noise $\eta_i \sim \mathcal{N}(0, \sigma_i^2 I)$ has noise level $\sigma_i \sim \mathcal{U}[0, \sigma_{\max}]$ spanning a range of values, so that the denoiser learns to denoise data at various noise levels.
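A minimal sketch of this supervised objective with randomized noise levels: to stay self-contained it trains a one-parameter linear shrinkage "denoiser" in closed form on toy data, rather than a DnCNN by stochastic gradient descent as in the paper:

```python
import numpy as np

rng = np.random.default_rng(4)
n_train, npix, sigma_max = 200, 64, 0.5

# Toy clean training slices (standing in for static object slices).
clean = rng.standard_normal((n_train, npix))
# Noise level drawn per-sample from U[0, sigma_max], as in the objective above.
sigmas = rng.uniform(0.0, sigma_max, size=n_train)
noisy = clean + sigmas[:, None] * rng.standard_normal((n_train, npix))

# Minimal stand-in for D_phi: a single scalar phi, D_phi(x) = phi * x.
# The MSE objective sum_i ||D_phi(noisy_i) - clean_i||^2 then has the
# closed-form minimizer below.
phi = np.sum(noisy * clean) / np.sum(noisy * noisy)

mse_id = np.mean((noisy - clean)**2)           # loss of the identity map
mse_d = np.mean((phi * noisy - clean)**2)      # loss of the trained "denoiser"
print(phi, mse_id, mse_d)                      # 0 < phi < 1, mse_d < mse_id
```

Even this trivial model learns a shrinkage factor below one and beats the identity map, illustrating how training over a range of noise levels yields a denoiser usable at the varying effective noise levels encountered during the ADMM iterations.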

D. Computational Cost
Space (memory requirements) and time (operation count) complexities for two variants of the RED-PSM bilinear ADMM algorithm are shown in Table I.The operation counts are for a single outer iteration.As the number of outer iterations typically has a weak dependence on the size of the problem, the scaling shown tends to determine the run time.Furthermore, thanks to its structure, the algorithm also offers many opportunities for easy parallelization, so that actual runtime can be proportionally reduced by allocating greater computational resources.See the Supplementary Material Section VII-B for the detailed analysis of computational requirements.
The complexities of the proposed Algorithm 2 are given in the first column of Table I. Space complexity is dominated by the storage of Λ, and scales proportionally to the image cross-section size in pixels N² and the order K of the PSM. The time complexity is dominated by the computation of the gradient with respect to Λ, and scales proportionally to the size N²P of the spatio-temporal object, the PSM order K, and the number M_i of inner iterations used to solve the optimization subproblems (Lines 2 and 3 in Algorithm 2).
Finally, in the second column of Table I, we consider the patch-based version of the proposed algorithm described in Section VI-D6. Given a patch size N_B ≪ N and stride s ≤ N_B, this alternative increases the operation count by a factor of (N_B/s)², since it operates on overlapping patches. For example, for the settings in our experiments, s = N_B/2, increasing the operation count by a factor of 4. This is a modest price to pay, because this alternative also reduces the space complexity by a factor of (N/N_B)². For N_B = 8 and N = 128 as used in our experiments, this corresponds to a 256-fold reduction in space complexity. Thus, this variant of RED-PSM enables scaling to high-dimensional and high-resolution settings.

TABLE I: Time and space complexities for two variants of the RED-PSM algorithm for a single outer iteration of the bilinear ADMM. M_i is the number of inner iterations for each iteratively solved subproblem, and N_B ≪ N, s ≤ N_B.

Complexity   PSM-fidelity       Patch-based
Time         O(KN²PM_i)         O(KN²PM_i (N_B/s)²)
Space        O(KN²)             O(KN_B²)

V. CONVERGENCE ANALYSIS
In this section, we follow an approach similar to recent work on ADMM for a bilinear model [116] to analyze convergence. We show that under mild technical conditions, the objective in Algorithm 1 is guaranteed to converge (as the number I of iterations increases) to a value corresponding to a stationary point of the Lagrangian, that is, one satisfying the necessary conditions for first-order optimality.
In practice, Algorithm 2, the efficient-f-step version that we implemented and used in the experiments reported in Section VI, has better run times and rapid empirical convergence. However, its analysis requires additional steps that are not particularly illuminating. Therefore, we focus on the analysis of the nominal Algorithm 1.
Despite the high-level similarity to the problem and analysis in [116], our problem formulation and algorithm differ in several aspects, which require modifications to the analysis and proof. In particular, unlike in [116], where the bilinear form appears only in the constraint, in our problem it appears in both the objective function and the constraint. To guarantee strong convexity of the respective subproblems, our objective uses Frobenius-norm terms for the PSM factors instead of the interim proximal terms in [116]. Moreover, to allow the efficient f-step in Algorithm 2, we use a different order of updates than [116]. Importantly, we account for the explicit RED regularization and its required properties in the proof of convergence. Our analysis also includes a proof (similar to [117]) of the boundedness of the iterates, which justifies the existence of accumulation points of the iterate sequence and appears to have been inadvertently left out in [116]. However, unlike [117], whose objectives have linear constraints, our objective (3) has a bilinear constraint.
To simplify the notation, we replace the separate computation of the projections of the time-t frames in the data fidelity term by the operator R : R^{N²×P} → R^{N×P}, which computes the entire set of P projections at view angles θ_t of the image frames at times t = 1, ..., P of the dynamic image f, i.e., of each of its columns indexed by t, producing g = Rf ∈ R^{N×P}. When applied to the PSM, R computes R_{θt}ΛΨ^T e_t for each t. We also aggregate the contributions of the RED regularizer into ρ(f) ≜ Σ_{t=0}^{P−1} ρ(f_t) : R^{N²×P} → R, and of the denoiser into D : R^{N²×P} → R^{N²×P}, which applies D_φ to each column of f indexed by t. Then, the Lagrangian function with dual variable γ can be rewritten as

L(Λ, Ψ, f, γ) = (1/2)∥R(ΛΨ^T) − g∥²_F + λρ(f) + (ξ/2)(∥Λ∥²_F + ∥Ψ∥²_F) + ⟨γ, ΛΨ^T − f⟩,   (10)

where the inner product is defined as ⟨A, B⟩ = Tr(A^T B).
The corresponding augmented Lagrangian is obtained by adding the quadratic penalty (μ/2)∥ΛΨ^T − f∥²_F to (10). The subproblems in Algorithm 1 are then the minimizations of the augmented Lagrangian with respect to each primal variable in turn, with the other variables held fixed. Algorithm 1 will be shown in Theorem 1 to converge to a stationary point of Problem (5), which is defined below.
Definition 1 (Stationary solution for Problem (5)). The point W* = (Λ*, Ψ*, f*, γ*) is a stationary solution of Problem (5) if it satisfies the stationarity and primal feasibility conditions for the variables of the Lagrangian in (10). By its definition, a stationary solution of (5) satisfies the necessary first-order conditions for optimality.
To state Theorem 1, we need to introduce some additional definitions pertaining to the denoiser D used in RED.
Moreover, we assume the gradient rule of RED (8). This has been shown to hold when the denoiser D is locally homogeneous (LH) [13] and Jacobian-symmetric (JS) [20]. It was shown in [13] that several popular denoisers are practically LH, and in [20] that JS is necessary for (8) to hold for the explicit regularizer ρ in (4).
In our analysis, we only require the gradient rule (8) to hold, rather than the explicit form (4) of the regularizer. Even for denoisers without LH and JS, the gradient rule can be used to define the fixed-point iteration in the algorithm [20], and more recent work on using denoisers for regularization [118]-[122] (in particular [120]) shows how to learn a denoiser jointly with the gradient so that the gradient rule is satisfied.
Finally, we will also assume Lipschitz continuity of the denoiser with some Lipschitz constant L_D, i.e., ∥D_φ(f) − D_φ(f′)∥_F ≤ L_D ∥f − f′∥_F for all f, f′. This means that the denoiser has finite gain L_D, which is satisfied by any reasonable denoiser.
Our main convergence result is the following.
Theorem 1. Suppose that the denoiser D_φ satisfies the gradient rule in (8), is Lipschitz continuous with some Lipschitz constant L_D, and is strongly passive. Then the objective in Algorithm 1 converges to a value corresponding to a stationary solution of Problem (5) in the sense of Definition 1.
The proof of Theorem 1 is provided in the supplementary material Section D.
Regarding convergence to the globally optimal solution, more can be said. Problem (3) is non-convex even if ρ is convex, because of the product between the unknowns Λ and Ψ. However, similar to Theorem 7.1 of [123] (which generalizes arguments in [124], [125]), thanks to the inclusion of the Frobenius norms of the factors Λ and Ψ in the bilinear form, when ρ is convex (i.e., when the gradient rule (8) is assumed to hold), the global minimum f̂ = Λ̂Ψ̂^T of (3) can be shown to coincide with the global minimum of

min_{f : rank(f) ≤ K} (1/2)∥Rf − g∥²_F + λρ(f) + ξ∥f∥_*,   (16)

where the penalty weight ξ > 0 is identical to the Frobenius-norm penalty weight in our RED-PSM objective (5).
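The connection between the balanced Frobenius-norm penalty on the factors and the nuclear norm can be checked numerically: for any matrix, the balanced SVD factorization attains (1/2)(∥Λ∥²_F + ∥Ψ∥²_F) = ∥f∥_*. This self-contained check is our illustration, not part of the paper's proofs.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))            # stand-in for f = Lam @ Psi^T

# Balanced factorization from the SVD A = U diag(s) V^T:
# Lam = U sqrt(S), Psi^T = sqrt(S) V^T, so that Lam @ PsiT = A.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Lam = U * np.sqrt(s)                       # scale columns of U
PsiT = np.sqrt(s)[:, None] * Vt            # scale rows of V^T

nuc = s.sum()                              # nuclear norm ||A||_*
frob_bound = 0.5 * (np.linalg.norm(Lam) ** 2 + np.linalg.norm(PsiT) ** 2)
```

Since ∥U√S∥²_F = ∥√S V^T∥²_F = Σ_i s_i, the balanced split makes the Frobenius penalty equal the nuclear norm, which is the mechanism behind the equivalence with Problem (16).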
Importantly, Problem (16) is non-convex too, because of the rank constraint. In other words, it is not equivalent to a conventional nuclear-norm-penalized problem. Hence, although the Frobenius-norm penalties on the PSM factors in our problem formulation (3) lead to a connection to the nuclear norm, Problem (3) is not equivalent to a formulation encouraging a low-rank solution via a nuclear norm penalty.
However, returning to the question of determining the global optimality of a candidate solution to (16), consider the convex version of Problem (16), without the rank constraint, and denote its solution by f̂_Convex. If it so happens that rank(f̂_Convex) ≤ K, then the low-rank constraint in Problem (16) is not active, and f̂_Convex = f̂. It then follows [123] that the optimality conditions of the convex problem (without the rank constraint) can be used to assess the global optimality of a candidate solution f̂ = Λ̂Ψ̂^T produced by the algorithm solving (3).

A. Datasets
Three categories of data sets are used in this work. Walnut Dataset: We use the CT reconstructions of two different (static) walnut objects from the publicly available 3D walnut CT dataset [126]. We create a dynamic test object by synthetically warping the central axial slice of one of the walnut objects using a sinusoidal piecewise-affine time-varying warp [127]. To be precise, the image is divided into an N × N uniformly spaced rectangular grid, and the following vertical displacement is applied to each row separately to drive the temporally varying warp: Δ_{n,t} = −C(t) sin(3πn/N), n ∈ {0, ..., N − 1}, where C(t) is a linearly increasing function of t with C(0) = 0. Static axial, coronal, and sagittal slices of the other walnut object are used to train the denoiser D_φ.
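The displacement field driving the warp can be generated as follows. This is a sketch: the text specifies only that C(t) increases linearly with C(0) = 0, so the endpoint `C_max` and all names are our illustrative choices.

```python
import numpy as np

def vertical_displacement(N, P, C_max):
    """Row-wise vertical displacement Delta_{n,t} = -C(t) sin(3*pi*n/N),
    with C(t) increasing linearly from C(0) = 0 to C(P-1) = C_max
    (the linear endpoint C_max is an assumed free parameter).
    """
    n = np.arange(N)
    C = np.linspace(0.0, C_max, P)                  # C(t), linear in t
    return -np.outer(np.sin(3 * np.pi * n / N), C)  # shape (N, P)
```

Each column of the returned array gives the per-row vertical shift to apply to the static slice at time t, e.g., via a piecewise-affine image warp.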
Compressed Object Dataset: The compressed object data set is obtained from a materials science experiment [128] with a sequence of nine increasing compression (loading) steps applied to an object, with a full set of radiographic projections collected (using a Carl Zeiss Xradia 520 Versa) and reconstructed by the instrument's software at each step. Using this quasi-static data set, a fixed axial slice is extracted from each of the nine reconstructions. These nine extracted slices, corresponding to nine time points, are interpolated to P time frames using a recent deep-learning-based video interpolation algorithm [129]. The denoiser D_φ for our experiments on this data set was trained using the axial slices of the static pre-compression and post-compression versions of the object, which would be available in actual dynamic loading experiments of this type.
In a conference submission [130] citing this work, the method has been applied to additional tomographic scenarios of materials compressed under load, further confirming the advantage of the proposed method and algorithm over the compared methods.
We note that all algorithms compared in this paper are agnostic to both the synthetic warp applied on the static walnut slice and to the data-driven interpolation method used for the compressed object.
Spatio-temporal projection data for each dataset is simulated by parallel-beam projection of the dynamic phantoms with N = 128 detector bins, with a single projection at each of the P time instants. The sequence of projection angles {θ_t}_{t=0}^{P−1} (a free experimental design parameter) was chosen to follow the bit-reversed view angle sampling scheme, which has been shown [93] to provide especially favorable conditioning of the recovery problem. The simulated measurements are corrupted by AWGN with standard deviation σ = 5·10⁻³. This noise level leads to the FBP (with Ram-Lak filter) of the full set of P = 512 projections at each time instant having a PSNR of approximately 46 dB. When, as in the actual experiments with sequentially sampled data, only 1/P of this data is used, the PSNR of the reconstruction may be expected to be lower.
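One common way to generate a bit-reversed view-angle schedule is sketched below, assuming P is a power of two; the exact scheme of [93] may differ in details, and the function name is ours.

```python
import numpy as np

def bit_reversed_angles(P, angular_range=np.pi):
    """View angles theta_t in bit-reversed order over [0, angular_range).

    Orders t = 0, ..., P-1 (P a power of two) by reversing the binary
    representation of each index, so that successive time-sequential
    projections are spread nearly uniformly over the angular range.
    """
    bits = P.bit_length() - 1
    perm = [int(format(t, f"0{bits}b")[::-1], 2) for t in range(P)]
    return np.array(perm) * angular_range / P
```

For P = 8 this yields the index order 0, 4, 2, 6, 1, 5, 3, 7: every short run of consecutive time instants already covers the angular range coarsely, which is the source of the favorable conditioning.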
Ground-truth frames for P = 4 are shown in Figure 3.
Cardiac dMRI Dataset: For a more direct comparison with the setting and data used in dMRI works, we also test RED-PSM on the "retrospective" cardiac dMRI data of [14]. The data includes 23 distinct time frames for one cardiac cycle. Details of the data and experiments are in Section VI-D4.

B. Comparison Benchmarks
PSM-TV: Similar to the proposed approach, this algorithm also uses a PSM to represent the object, but instead of the RED regularizer, the regularization penalizes the discrete 2D total variation (TV) of the temporal frames of f at each time instant. To this end, the constraint f = ΛΨ^T is implemented by substitution into the objective in (5), ρ is redefined as ρ(·) = TV(·), and the rest of the objective is kept the same. We consider spatial (PSM-TV-S) and spatio-temporal (PSM-TV-ST) variants of TV. The former computes TV only spatially within a single frame at each t, whereas the latter also computes differences between temporally adjacent pixels at t − 1 and t + 1. The unconstrained problem is then solved for {Λ̂, Ψ̂} (using the Adam optimizer in PyTorch). Finally, the estimated object is obtained as f̂ = Λ̂Ψ̂^T.
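For illustration, the spatial TV of a single frame can be computed as below. The paper does not specify which TV discretization is used; this sketch uses the anisotropic variant, and the function name is ours.

```python
import numpy as np

def tv_spatial(f):
    """Anisotropic discrete 2D TV of one image frame: sum of absolute
    first differences along the vertical and horizontal directions."""
    return (np.abs(np.diff(f, axis=0)).sum() +
            np.abs(np.diff(f, axis=1)).sum())
```

PSM-TV-ST would additionally accumulate absolute differences between the frame at t and its temporal neighbors at t − 1 and t + 1.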
TD-DIP [14]: TD-DIP is a recent method based on the Deep Image Prior (DIP) approach.³ It uses a mapping network M_α and a generative model N_β in cascade to obtain the estimated object f̂_t at each time instant from fixed, hand-crafted latent representations χ_t. Because TD-DIP was originally proposed for dynamic MRI, we modified its objective minimally for dynamic tomography as

min_{α,β} Σ_t ∥R_{θt} N_β(M_α(χ_t)) − g_t∥²_2.

For the comparisons in this work, the mapping network and generator architectures, latent representation dimensionality, optimizer, learning rate, and decay schemes are identical to those in the available implementation [131]. The original work focuses on the beating-heart problem and thus proposes a helix-shaped manifold for χ_t, with the number of cycles equal to the number of heartbeats during measurement acquisition. Since we do not assume repetitive motion for the walnut and compressed object, we use a linear manifold, as explained in the original paper [14]. Thus, for clarity, in Section VI-D the method is sometimes denoted "TD-DIP (L)".

C. Experimental Settings
All methods are run on a workstation with an Intel(R) Xeon(R) Gold 5320 CPU and an NVIDIA RTX A6000 GPU. In practice, we used a minor variation of Algorithm 2, in which we combined the subproblems for Λ and Ψ and minimized with respect to both basis functions simultaneously using gradient descent with the Adam optimizer [132].
Denoiser and training. Each convolutional layer in the denoiser network is followed by a ReLU nonlinearity, except for the final single-channel output layer. We tested both direct and residual DnCNN denoisers, where the former predicts the denoised image and the latter estimates the noise from the input. We use the denoiser type that performs better for each object, but the differences are minor. Further architectural details for the denoisers in our experiments are given in Table VII in the Supplementary Material. We use three pre-trained denoisers, one for each of the three object types. In each case, the same pre-trained denoiser was used for all values of P.
The upper limit for the noise level used in training the denoiser was set to σ_max = 5·10⁻². For the dynamic walnut object, the denoiser D_φ is trained on the central 200 axial, 200 sagittal, and 200 coronal slices of another static walnut CT reconstruction, downsampled to size 128×128. For the compressed object, axial slices of the pre-compression and post-compression static versions of the object, 462 slices in total, are used to train D_φ. For the cardiac MRI setting, the denoiser was trained on the static MRI training slices of the ACDC dataset [133]. For all datasets, D_φ is trained for 500 epochs using the Adam optimizer with a learning rate of 5·10⁻³. As mentioned above, we evaluate both direct and residual DnCNN denoisers.
Temporal Basis. In the compressed material and cardiac dMRI experiments, we use the parametrization Ψ = UZ with a fixed basis U that corresponds to a cubic spline interpolator, and for the warped walnut we use the DCT-II, to interpolate the low-dimensional temporal representation Z to Ψ.
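A sketch of the orthonormal DCT-II interpolating basis U used for the walnut case follows; the dimensions P (number of frames) and d (latent temporal dimension) are placeholders, and the parametrization Ψ = U @ Z with a learnable Z is as described above.

```python
import numpy as np

def dct2_basis(P, d):
    """First d columns of an orthonormal DCT-II basis U in R^{P x d}.

    The temporal factor is parametrized as Psi = U @ Z with a learnable
    low-dimensional representation Z in R^{d x K}, so that Psi varies
    smoothly in t even though Z is unconstrained.
    """
    t = np.arange(P)
    U = np.cos(np.pi * (t[:, None] + 0.5) * np.arange(d)[None, :] / P)
    U[:, 0] *= 1.0 / np.sqrt(2.0)          # DC column normalization
    return U * np.sqrt(2.0 / P)            # orthonormal columns
```

Restricting Ψ to the span of a smooth fixed basis (DCT-II here, cubic splines for the other datasets) regularizes the temporal dynamics in addition to the low-rank constraint.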
Initialization. Unless stated otherwise, the spatial and temporal basis functions are initialized using the SVD, truncated to rank K, of the dynamic object estimate produced by a recent projection-domain PSM-based method, "ProSep" [94]. If the ProSep estimate has rank smaller than K, the remaining basis functions are initialized as 0. Otherwise, all spatial basis functions are initialized as 0 and the temporal latent representations z_k are initialized randomly as z_k ∼ N(0, I).
Tomographic Acquisition Scheme. All methods in this paper use the bit-reversed angular sampling scheme over the range [0, π]. For time-sequential acquisition, the bit-reversed scheme was shown [93], [94] to provide favorable results via better conditioning of the forward model than alternatives. In a standard CT scanner, the speed of rotation might have to be increased significantly to implement this scheme, possibly leading to greater motion blur. However, in several scenarios this would not be much of an issue. These include radial acquisition in MRI; a CT scanner with electronic beam deflection [134]; and settings where the acquisition time is dominated by the time to acquire each view rather than by the rotation time, e.g., micro-CT, or imaging of dense materials.
To help address the challenge of implementing the bit-reversed scheme in other, more physically constrained settings, the number P of distinct view angles can be reduced and these views repeated periodically without a performance drop, as also shown in Figure 11. With a reduced number of distinct views, the bit-reversed scheme can be implemented more conveniently, e.g., by multiple source-detector pairs, or by carbon nanotube sources [135], [136].
Run Times. For P = 256 and using the specified computational resources and parameter settings, reaching the peak PSNR during optimization takes RED-PSM with ProSep initialization between 50 and 150 iterations, about 2 to 6 minutes, whereas TD-DIP with batch size P typically requires more than 30k steps, taking about 3.5 hours to complete. Hence, RED-PSM provides a speedup over TD-DIP by a factor of 35 to 105. The speedup factor may vary with the parameter configuration; however, the proposed method provides a significant run time reduction in all cases.
Evaluation Metrics. Four quantitative metrics were used to compare the methods: (i) the peak signal-to-noise ratio (PSNR) in dB; (ii) the structural similarity index (SSIM) [137]; (iii) the mean absolute error (MAE); and (iv) the high-frequency error norm (HFEN) [138]. The latter is defined as HFEN(f, f_r) = ∥LoG(f) − LoG(f_r)∥_2, where LoG is a rotationally symmetric Laplacian-of-Gaussian filter with a standard deviation of 1.5 pixels.
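A direct implementation of HFEN using SciPy's Laplacian-of-Gaussian filter is a one-liner; the function name is ours.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def hfen(f, f_ref, sigma=1.5):
    """High-frequency error norm [138]: l2 distance between the two
    images after filtering each with a Laplacian-of-Gaussian (LoG)
    of standard deviation 1.5 pixels."""
    return float(np.linalg.norm(gaussian_laplace(f, sigma) -
                                gaussian_laplace(f_ref, sigma)))
```

Because the LoG filter suppresses low frequencies, HFEN isolates errors in edges and fine texture that PSNR and MAE tend to average away.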

D. Results
1) Reconstruction accuracies for different P: RED-PSM is compared in Figure 4 and Table II with PSM-TV-S, PSM-TV-ST, and TD-DIP (using, for both a-periodic objects, a linear latent representation). Parameter configurations for the experiments are provided in Table V in Supplementary Material A. While Figure 4 facilitates the assessment and comparison of trends in the metrics across different P for the various methods, Table II provides the detail for a more precise quantitative comparison. In fact, since the plotted range of metrics such as PSNR in Figure 4 is very large due to the varying P, Table II emphasizes important differences between methods for the same P.

Remarks:
1) The scale differences in the MAE and HFEN between the two objects are due to working with un-normalized densities.
2) In all TD-DIP experiments, optimization is stopped early to achieve the best-PSNR reconstructions, assuming such a "stopping oracle" is available. In a more realistic setting, in the absence of this oracle, TD-DIP optimization with continued iterations suffers from overfitting and produces degraded results in these experiments. For instance, for the warped walnut slice with 40k iterations, PSNR, SSIM, and MAE degrade significantly to 19.5 dB, 0.822, and 3.1·10⁻³ for P = 32, and 24.1 dB, 0.824, and 2.1·10⁻³ for P = 64. Thanks to the accurate spatial prior and the convergence properties, RED-PSM does not encounter this problem.
3) Because the performance of TD-DIP varies with initialization, in each experiment we ran it three times with different random initializations, each time using a "stopping oracle" to obtain the best PSNR for the given initialization. Table II reports the average of the best-PSNR reconstruction accuracies for TD-DIP over these three runs. Figure 4 complements this information by showing, in addition to the average results, the best and worst of these runs (still using the stopping oracle to get the best PSNR per run).
As expected, for all methods the estimates improve with increasing P. In terms of PSNR, the proposed algorithm performs on par with or, usually, better than TD-DIP (for all but the lowest P on the compressed material object), and consistently better than PSM-TV-S and PSM-TV-ST. The PSNR improvement of RED-PSM over TD-DIP grows with increasing P, reaching 2.4 dB for the time-varying walnut with P = 256. Moreover, for the other three metrics, SSIM, MAE, and HFEN, the improvement of RED-PSM over the other algorithms is more significant. Specifically, the reduction in MAE reaches and even exceeds 50% for both objects for P = 256. These observations hold for both objects; however, RED-PSM provides slightly greater improvement over TD-DIP in the walnut case.
To put the significance of these reconstruction quality improvements in perspective, we compare them to some representative examples. TD-DIP reports up to 3 dB PSNR and 0.005 SSIM improvement over an older dMRI method in a single scenario. While providing comparable PSNR improvements (2.4 dB), we provide SSIM improvements (0.02) four times larger, relative to TD-DIP itself. Other recent dMRI [73] and dynamic photo-acoustic tomography [102] methods improve by 1.5 dB and by at most 1 dB (often with no improvement), respectively, over their benchmarks, again in a single imaging scenario for each method.
Figure 5 compares the reconstructions for both objects at two different values of t for P = 256. As expected, PSM-TV-S performs the worst among the compared methods and provides, for both objects, blurry reconstructions lacking finer details. The TD-DIP reconstructions improve somewhat over PSM-TV-S and PSM-TV-ST, but contain visible noise-like artifacts in the piecewise-constant regions of the walnut object, which are alleviated by RED-PSM. This is manifested also in the absolute difference figures, with the error for TD-DIP distributed throughout the interior regions of the walnut. Also, around the shell of the walnut, RED-PSM is further able to preserve sharper details. For the compressed material, in comparison to TD-DIP, RED-PSM shows reduced absolute error almost uniformly over the object. This difference is more prominent around the highly dynamic regions of the object, emphasizing the advantage of the proposed method. For a better understanding of the RED-PSM results, we also display the reconstructed spatial and temporal bases Λ̂ and Ψ̂ for the time-varying walnut scenario in Supplementary Material C. Reconstructed x-t "slices" through the dynamic walnut are compared in Figure 6. The location of the x-t slice is highlighted on the t = 0 static x-y frame by a yellow line. Consistent with the comparison in Figure 5, RED-PSM provides reduced absolute error values throughout the respective x-t slice. Also, as is more apparent in the error figures, TD-DIP leads to higher background errors.
Finally, a zoomed-in comparison of the time-varying walnut object at another time instant for P = 256 is provided in Figure 7. The comparison clearly shows the better performance of RED-PSM at recovering the finer details.
2) PSNR vs. t Comparisons: To complement the cumulative metrics in Figure 4 and Table II and the "snapshot" qualitative comparisons in Figure 5, we study how the reconstructed image frame PSNRs vary over the reconstructed time interval. The per-frame PSNRs (in dB) of the walnut and compressed object reconstructed with P = 256 by the different methods are shown in Figure 8 as a function of t. For TD-DIP, the best PSNR obtained using a "stopping oracle" is reported, with the red shading indicating, for each t, the interval between the highest and lowest PSNR in three runs with different random initial conditions. For the warped walnut object, RED-PSM provides consistently better PSNR than the best-case TD-DIP for all t. For the compressed object, the same is true at about 70% of the t points. Figure 8 also shows transient effects at the beginning and the end for both objects and all methods. In scenarios such as the object compression experiment, in which the initial and final states are static and could be measured using multiple projections, such transients could be eliminated. Similarly, in quasi-periodic scenarios such as cardiac imaging, the effect of such transients would be minimal.
3) Effect of Initialization: The initialization of Λ, Ψ, and f plays an important role in the performance and convergence speed of RED-PSM. We observe a significant speed-up when, rather than a random initialization, we initialize the algorithm with the ProSep [94] reconstruction. Figure 9 shows a PSNR vs. iterations comparison for different initialization techniques for the dynamic walnut object with P = 256. The remaining parameters were identical to those indicated in Supplementary Material Table V. This experiment highlights the advantages of initializing with the ProSep-estimated basis functions: eliminating the need for multiple runs for a best-case result, and speeding up convergence considerably.
Combined with the convergent algorithm, which eliminates the need for an unrealistic stopping oracle, and with the theoretical analysis, RED-PSM provides improved reliability, which is of practical significance.

4) Cardiac dMRI results: We simulate a radial acquisition with 4 radial lines ("spokes") per frame at the bit-reversed angles. We used 1.4 and 2.8 cardiac cycles, with 23 × 4 = 92 spokes/cycle, for a total of P = 128 and P = 256 spokes, respectively. The problem is still severely undersampled compared to the experiment in [14], where 13 spokes are used per frame for 13 cycles, for a total of 13 × 23 × 13 = 3887 spokes. Since the data is periodic, we also tested the helix latent scheme (H) for TD-DIP. The metrics in Table III and the qualitative comparison in Figure 10, with zoomed-in reconstructions and absolute error maps, show that RED-PSM performs better than both versions of TD-DIP for the retrospective dMRI data [14].
In [73], the authors compare their method to an older version of TD-DIP without the improved latent representation prior scheme for cardiac dMRI and report a 1.5 dB PSNR improvement. With RED-PSM, we exceed this improvement over the latest version of TD-DIP in multiple scenarios.

5) Acquisition with a smaller number of distinct view angles: Since obtaining time-sequential projections from different angles in a sufficiently short time period can be physically challenging, we also test the performance of RED-PSM with an acquisition scheme that may be easier to realize physically: keeping the same total number P of projections, but taking them at a smaller number (called the "period") of distinct view angles, also obtained using the bit-reversed angular sampling scheme. The comparison in Figure 11 shows that RED-PSM can use as few as 1/8 of the distinct view angles without performance loss.

6) Patch-based RED denoiser:
To improve the scalability of the method to higher-resolution and/or 3D dynamic objects, the objective in (5) can conveniently be manipulated to operate on patches of the temporal image frames of the time-varying object. This circumvents the need to store the complete image frame at a given time, and also enables the denoiser D_φ both to be trained and to operate on patches of image frames. To showcase the potential of the suggested scheme, we replace the full-size D_φ with a patch-based counterpart in the RED step and compare the performance with the originally proposed method for 2D dynamic objects.
The patch-based denoiser for the RED updates is trained using

min_φ Σ_i Σ_{l=0}^{L−1} ∥D_φ(B_l(f_i + η_i)) − B_l f_i∥²_2,

where η_i, σ_i are set as in (9), and B_l, l ∈ {0, ..., L − 1}, is the operator extracting the l-th patch of the image. The denoiser D_φ operates separately on each patch.
To train the patch-based D_φ, uniformly random rotations (multiples of π/2) and random horizontal/vertical flips, each with probability 1/2, were used for data augmentation. The patch size was chosen as 8×8, with a stride of 2.
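The overlapping patch-extraction operators B_l can be sketched as below; this is illustrative only (the actual implementation may extract patches differently, e.g., with padding at the borders), and the names are ours.

```python
import numpy as np

def extract_patches(img, nb=8, stride=2):
    """All B_l f: overlapping nb x nb patches of a square image,
    taken on a regular grid with the given stride."""
    N = img.shape[0]
    return np.stack([img[i:i + nb, j:j + nb]
                     for i in range(0, N - nb + 1, stride)
                     for j in range(0, N - nb + 1, stride)])
```

With nb = 8 and stride = 2 on a 16×16 image this yields a 5×5 grid of patch positions, i.e., 25 overlapping 8×8 patches, each of which is denoised independently.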
Table IV compares the results for the two denoiser types for both objects, using the same denoiser training policy as in Section VI-C and the experimental configurations of Table VI in the Supplementary Material Section A. The results show little difference between the patch-based and full-frame-based denoiser variants of RED-PSM, verifying the effectiveness of the patch-based version. The analysis of the computational requirements of the patch-based RED-PSM variant in Section IV-D shows its potential for a highly scalable implementation.
We note that in a divergent-beam scenario, the contribution of a patch to a projection would be position-dependent, which would require accurate bookkeeping. Details for doing so can be found in tile-based methods for fan-beam [139] and cone-beam [140] tomography. However, we leave such analysis for future studies.

VII. CONCLUSIONS
We proposed RED-PSM, the first PSM-based approach to dynamic imaging that uses a pre-trained, learned (RED-based) spatial prior. The objective in the proposed variational formulation is optimized using a novel and effective bilinear ADMM algorithm, which enforces the PSM as a hard constraint. Unlike existing PSM-based techniques, RED-PSM is supported by theoretical analysis, with a convergence guarantee to a stationary point of the objective. The results of the numerical experiments show better reconstruction accuracy and considerably faster run times compared to a recent DIP-based algorithm. A patch-based regularizer version of RED-PSM provides almost equivalent performance with a massive reduction in storage requirements, indicating the potential of our framework for dynamic high-resolution 2D or 3D settings.
Possible directions for future work include the application of RED-PSM to imaging scenarios other than tomography and MRI, and robust denoiser training for the RED framework, since the deep denoisers encounter varying artifact distributions during optimization. This could also improve the generalizability of the framework to different input types.

A. Experimental configurations
The PSM-TV and PSM-RED parameter selections for the experiments listed in Table II are provided in Table V.

B. Time and space complexity analyses
In this section, we analyze the operation count and storage requirements for the proposed algorithm and its patch-based version.
In this analysis, M_i represents the number of inner iterations for the iteratively solved subproblems, assumed common to the different subproblems, and we assume K ≪ N. Also, R_{θ(t)} and R^T_{θ(t)} are implemented as operators, each requiring O(N²) operations when applied to an N × N image or a projection of size N.
1) Storage requirements for the input quantities: When calculations are performed sequentially in t, as in the following sections, the storage requirements for the primal and dual variables are those of f_t, γe_t, Ψ^T e_t, and Λ. The storage of the spatial basis functions Λ dominates the overall storage cost at O(KN²). The space complexity analyses of the subproblems in the following sections consider terms other than these input quantities.
2) RED-PSM: Expanding the Λ and Ψ subproblems in Algorithm 2 in t yields the per-frame expressions (18) and (19), respectively. We first analyze the operation counts for the gradients with respect to Λ and Ψ^T e_t. We note that in these gradient computations, the term R^T_{θt} g_t is pre-computed and stored before the start of the algorithm, and (γ − f)e_t and (R^T_{θ(t)} g_t + (γ − f)e_t) are computed and stored at each bilinear ADMM outer iteration, so that they do not contribute to the respective costs in each inner iteration, i.e., their cost does not scale with M_i.
Starting with the gradient of (19) with respect to Ψ^T e_t, the overall operation count is O(KN²P), due to the computation of R_{θ(t)}Λ and Λ^T(γ − f)e_t at each t. These terms do not need to be recomputed at each inner iteration. When Ψ = UZ is used, the gradient for Z requires multiplying (20) by U^T e_t, adding a minor operation count of O(dK).
The most space-efficient implementation computes the gradients sequentially for each t, without increasing the operation count. It is also possible to parallelize the operations along t, with a trade-off between run time and storage requirements.
Considering sequential computation for each t, the additional storage requirement is O(NK), due to R_{θ(t)}Λ.
Next, we consider the gradient of (18) with respect to Λ. The operation count for this gradient term is O(KN^2 P M_i), due to R^T_{θ(t)} R_{θ(t)} Λ (Ψ^T e_t)(e_t^T Ψ). Assuming sequential computation in t, the additional storage requirement is determined by the storage of the corresponding per-t terms. The operation count for the efficient implementation of the f-step in Line 4 of Algorithm 2 is dominated by a single forward computation of the denoiser for each t, leading to O(P C_D), where C_D is the operation count of the denoiser D_ϕ for a single frame. In addition to the input variables, the only storage requirement is due to the term ΛΨ^T e_t, which is O(N^2).
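As a concrete illustration of the sequential-in-t accumulation described above, the following minimal NumPy sketch computes the data-fidelity gradient with respect to Λ by looping over t, so that only one projected frame is held in memory at a time. All names and shapes here are hypothetical, and dense matrices stand in for the operators R_{θ(t)}; the splitting and regularization terms of the actual subproblem are omitted.

```python
import numpy as np

# Minimal sketch (hypothetical shapes; dense matrices stand in for the
# projection operators R_{theta(t)}) of the sequential-in-t accumulation of
# the data-fidelity gradient with respect to the spatial basis Lambda.
def lambda_gradient(Lam, Psi, R, g):
    """Gradient of 0.5 * sum_t ||R[t] @ Lam @ Psi[t] - g[t]||^2 w.r.t. Lam.

    Lam: (N*N, K) spatial basis; Psi: (P, K) temporal basis;
    R:   list of P matrices of shape (N, N*N); g: (P, N) projections.
    """
    grad = np.zeros_like(Lam)
    for t in range(len(R)):
        psi_t = Psi[t]                        # temporal coefficients e_t^T Psi
        resid = R[t] @ (Lam @ psi_t) - g[t]   # projection-domain residual
        # Back-project the residual and form the rank-1 contribution of frame t.
        grad += np.outer(R[t].T @ resid, psi_t)
    return grad
```

Only the K-column backprojection and one rank-1 update are needed per t, consistent with the O(KN^2) per-frame cost stated above.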
Fig. 13: Energy of each spatiotemporal basis pair for the setting described in Section VII-C.

3) Patch-based RED-PSM:
The patch-based variant of RED-PSM uses a reduced spatial input size N_B, where N_B ≪ N. Compared to the original RED-PSM formulation in (5), the operation count also depends on the stride s of the patch extraction operators and on the depth of the deep denoiser architecture. Assuming equal-depth or shallower deep convolutional denoisers for the patch-based inputs, the total operation count of the efficient denoiser step stays the same or decreases. Thus, the operation count is again determined by the gradient of the data fidelity term with respect to the patch-based spatial basis functions, and the result approximately scales with the increase in the total number of stored pixels, as O(KN^2 P M_i (N_B/s)^2), where the stride s ≤ N_B. The gradients of the data fidelity term require the computation of projections of multiple spatial patches for a given view angle at time t. However, these projections can be computed separately and accumulated. Thus, the space complexity reduces to O(KN_B^2), since we only need to store a single spatial patch of Λ at a given time.
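The (N_B/s)^2 scaling above can be sanity-checked with a small helper. The function names and the dense-tiling assumption are ours, not from the paper; the ratio matches (N_B/s)^2 only approximately, because of boundary effects.

```python
def patch_count(N, N_B, s):
    """Number of N_B x N_B patches covering an N x N image with stride s <= N_B."""
    per_axis = (N - N_B) // s + 1
    return per_axis ** 2

def stored_pixels_ratio(N, N_B, s):
    """Total stored patch pixels relative to N^2.

    Roughly (N_B / s)^2 when patches tile the image densely, up to
    boundary effects at the image edges.
    """
    return patch_count(N, N_B, s) * N_B ** 2 / N ** 2
```

For example, with non-overlapping patches (s = N_B) the stored-pixel ratio is exactly 1, and halving the stride roughly quadruples it.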
A summary comparison of the two RED-PSM variants analyzed in this section is provided in Section IV-D of the manuscript.

C. Sample reconstructed spatial and temporal basis functions
To help interpret the operation of the proposed RED-PSM algorithm, we display in Figure 12 the reconstructed spatial and temporal basis functions Λ and Ψ for the time-varying walnut scenario with P = 256 and the DCT-II latent temporal basis U. The energy corresponding to each basis pair is shown in Figure 13.

D. Proof of Theorem 1
In this section, we provide the detailed steps for the proof of Theorem 1.
We begin by establishing some simple consequences of the assumed properties of the denoiser. First, the objective H(f, Λ, Ψ) in (6) is lower bounded by some H ∈ R, because the RED regularizer ρ(f) is non-negative, as follows from the strong passivity assumption on D_ϕ.
Second, we have the following result (Lemma 2): λρ(f) is Lipschitz-continuously differentiable, with gradient Lipschitz constant L = λ(1 + L_D). Third, we establish strong convexity of the objectives in the different subproblems in Algorithm 2. Clearly, for any choice of positive constant ξ, the objectives S_Λ in (12) and S_Ψ in (13) in the subproblems for Λ and Ψ are strongly convex, with moduli α_Λ ≥ ξ and α_Ψ ≥ ξ, respectively.
As we show next, with the assumed β > L, the objective S_f in (14) is strongly convex too, with modulus α_f ≥ β − L > 0.
To prove this, we rewrite S_f = ω(f) + (1/2)(β − L)∥f∥_F^2, where ω(f) collects the remaining terms of S_f. All that remains is to show that ω(f) is a convex function. This follows immediately by recalling Lemma 2, that λρ(f) is gradient Lipschitz continuous with gradient Lipschitz constant L = λ(1 + L_D), and applying Lemma 3 below to the first two terms of ω(f).

Lemma 3. (Thm. 2.1 of [141]) If ρ(f) : R^n → R is Lipschitz continuously differentiable on a convex set C with Lipschitz constant L, then ϕ(f) = ρ(f) + (1/2)β∥f∥^2 is a convex function on C for every β ≥ L.
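For intuition, a one-line sketch of why Lemma 3 holds in the twice-differentiable case (the cited theorem does not require twice differentiability; this is only an illustrative special case):

```latex
% If \rho is twice differentiable with L-Lipschitz gradient, then
% -L I \preceq \nabla^2 \rho(f) \preceq L I. Hence, for \beta \ge L,
\nabla^2 \phi(f) \;=\; \nabla^2 \rho(f) + \beta I \;\succeq\; (\beta - L)\, I \;\succeq\; 0,
% so \phi(f) = \rho(f) + \tfrac{\beta}{2}\|f\|^2 is convex.
```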
With these preliminaries established, our proof of Theorem 1 follows steps similar to [116, Sec. 4.2], adapted to our Algorithm 1:
1. Bounding the size of successive differences of the dual variables by those of the primal ones.
2. Showing that the augmented Lagrangian L_β[f^i, Λ^i, Ψ^i; γ^i] is a lower-bounded decreasing function.
3. Combining the first two steps and showing convergence to a stationary solution.
The following result upper bounds the successive differences of the dual variable by those of the primal variables.

Lemma 4. Using the update rules in Algorithm 1, the successive differences of the dual variable γ are upper bounded by those of the primal variables, with a constant determined by L = λ(1 + L_D).
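Although the inequality of Lemma 4 is not reproduced here, the standard argument in analyses of this type (e.g., nonconvex ADMM convergence proofs) proceeds roughly as follows; this is a sketch in our notation, to be checked against the paper's exact update rules:

```latex
% The optimality condition of the f-subproblem ties the dual variable to
% \nabla(\lambda\rho) at the new iterate; Lipschitz continuity of this
% gradient (Lemma 2) then gives
\|\gamma^{i+1} - \gamma^{i}\|_F \;\le\; L\, \|f^{i+1} - f^{i}\|_F,
\qquad L = \lambda(1 + L_D).
```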
That is, the augmented Lagrangian is lower bounded.
Combining these results, we obtain the inequality in (25), which concludes Part 1.

Fig. 3: Ground-truth frames uniformly sampled in time for P = 4, for the time-varying walnut (top) and compressed object (bottom).

Figure 7. The comparison clearly shows the better performance of RED-PSM at recovering the finer details.

2) PSNR vs. t Comparisons: To complement the cumulative metrics in Figure 4 and Table II and the "snapshot" qualitative comparisons in Figure 5, we study how the reconstructed image frame PSNRs vary over the reconstructed time interval. The per-frame PSNRs (in dB) of the walnut and compressed object reconstructed with P = 256 by the different methods are shown in Figure 8 as a function of t. For TD-DIP, the best PSNR obtained using a "stopping oracle" is reported, with the red shading indicating, for each t, the interval between the highest and lowest PSNR in three runs with different random initial conditions. For the warped walnut object, RED-PSM provides consistently better PSNR than the best-case TD-DIP for all t. For the compressed object, the same is true at about 70% of the t points. Figure 8 also shows transient effects at the beginning and the end of the interval for both objects and all methods. In scenarios such as the object compression experiment, in which the initial and final states are static and could be measured using multiple projections, such transients could be eliminated. Similarly, in quasi-periodic scenarios such as cardiac imaging, the effect of such transients would be minimal.

3) Effect of Initialization: The initialization of Λ, Ψ, and f plays an important role in the performance and convergence speed of RED-PSM. We observe a significant speed-up when, rather than a random initialization, we initialize the algorithm with the ProSep [93] estimated reconstruction. Figure 9 shows the PSNR vs. iterations comparison for different initialization techniques for the dynamic walnut object with P = 256. The rest of the parameters were selected identical to those indicated in Supplementary Material Table V. This experiment highlights the advantages of initializing with ProSep-estimated basis functions: eliminating the need for multiple runs for a best-case result, and speeding up convergence considerably. Combined with the convergent algorithm, which eliminates the need for an unrealistic stopping oracle, and with the theoretical analysis, RED-PSM provides improved reliability of practical significance.

4) Cardiac dMRI data experiments: In this setting, unlike the previous experiments, we used 4 k-space radial lines

Fig. 9: Advantage of ProSep-based vs. random initialization of RED-PSM (see Sec. VI-C). For random initialization, the blue shaded area between the best and the worst PSNR at each iteration highlights the varying performance of five runs with different random initializations.

Fig. 11: Reconstruction PSNR and SSIM for the time-varying walnut vs. the number of distinct view angles, using RED-PSM with different total numbers of views P.

We rewrite S_f = ω(f) + (1/2)(β − L)∥f∥_F^2 and show that ω(f) is a convex function. Since (1/2)(β − L)∥f∥_F^2 is convex quadratic for β > L, it then follows (by one of the alternative definitions of strong convexity) that S_f is a strongly convex function with modulus α_f ≥ β − L.

TABLE II :
Reconstruction accuracies for different P.

TABLE III :
Reconstruction accuracies for RED-PSM and TD-DIP

TABLE IV :
Performance comparison for different denoiser types for RED-PSM. Patch-based denoisers also use DnCNN and have the same configuration and training policy as the full-image denoiser.
are provided in Table V. Likewise, Table VI shows the parameter configurations for the denoiser-type comparison experiments for RED in Table IV. Finally, the architectural information for the DnCNN denoisers used throughout this work is in Table VII.

TABLE V :
The parameter selections for the reconstructions in Table II. The latent penalty weight was selected as ξ = 10^{-1} for the walnut, and ξ = 10^{-3} for the compressed object experiments.

TABLE VI :
Parameter configurations for the denoiser-type study experiments in Table IV.