GPU Accelerated 3D Tomographic Reconstruction and Visualization From Noisy Electron Microscopy Tilt-Series

We present a novel framework for 3D tomographic reconstruction and visualization of tomograms from noisy electron microscopy tilt-series. Our technique takes as an input aligned tilt-series from cryogenic electron microscopy and creates denoised 3D tomograms using a proximal jointly-optimized approach that iteratively performs reconstruction and denoising, relieving the users of the need to select appropriate denoising algorithms in the pre-reconstruction or post-reconstruction steps. The whole process is accelerated by exploiting parallelism on modern GPUs, and the results can be visualized immediately after the reconstruction using volume rendering tools incorporated in the framework. We show that our technique can be used with multiple combinations of reconstruction algorithms and regularizers, thanks to the flexibility provided by proximal algorithms. Additionally, the reconstruction framework is open-source and can be easily extended with additional reconstruction and denoising methods. Furthermore, our approach enables visualization of reconstruction error throughout the iterative process within the reconstructed tomogram and on projection planes of the input tilt-series. We evaluate our approach in comparison with state-of-the-art approaches and additionally show how our error visualization can be used for reconstruction evaluation.


INTRODUCTION
T OMOGRAPHIC reconstruction is a process of transforming a series of tilted projection images (tilt-series) into a 3D volumetric representation (tomogram), where the information is stored in voxels organized in a regular grid.The problem was first addressed in the mathematical field by Johann Radon over a century ago.It was later put into practice in the 1930s as focal plane tomography using X-ray imaging in medicine, and many other modalities such as magnetic resonance imaging (MRI) [1], positron emission tomography [2], [3], functional MRI [4] and others within the medical domain and broader.The modalities mentioned above usually produce a series of uniformly distributed projections with small amounts of noise thanks to the characteristics of the imaging techniques and the objects under reconstruction.However, this is not true for specific domains where limitations are imposed in the imaging technique to avoid destroying the sample, as it is the case for high-energy electron microscopy (EM) techniques applied to fragile biological samples.For instance, cryo-electron tomography (cryo-ET) is capable of revealing the viral and cellular structures at molecular, or even atomic details [5], [6].To avoid damaging the sample during the image acquisition in these techniques, the energy of the electron beam and the number of projections is limited.Due to these limitations, projection images created in electron microscopy have a low signal-to-noise ratio, a limited number of projections, and a limited maximum tilt angle.
The tomographic reconstruction problem is addressed using a variety of algorithms, such as Fourier-domain reconstruction [7], back-projection reconstruction [8], fanbeam reconstruction [9], algebraic reconstruction techniques [10], [11], [12], and recently deep learning [13], [14].While the first three approaches usually work well when we have a large number of projections with relatively low noise levels, they produce poor results when there is a limited number of projections or under high levels of noise, as is the case for electron microscopy.Algebraic reconstruction techniques provide better results under these conditions, but they require several iterations making them computationally more expensive.
There have been distinct approaches to improve the reconstruction quality for limited angle projection data, including deep-learning-based reconstruction methods [13] or traditional reconstruction techniques (e.g.back-projection reconstruction) coupled with a deep learning denoising approach.Here, denoising is applied either on the tilt-series before the reconstruction [15], [16], [17], or after the reconstruction on tomograms created with traditional techniques [18], [19].One difficulty with such complex image processing pipelines is that the final reconstructions cannot easily be traced back to the individual raw input data to obtain an estimate of the reconstruction error or uncertainty.This makes it difficult to assess how reliable and trustworthy the reconstructions are, given that cryo-ET operates close to the physical limits and there are usually no alternative imaging methods to validate the results.
In this article, we present a GPU accelerated iterative tomographic reconstruction approach that integrates denoising in the reconstruction process by alternating between minimizing the re-projection error with algebraic reconstruction techniques and decreasing the noise using various regularization and denoising methods.The reconstructed tomograms are much cleaner than one could obtain using the existing approaches, removing the need to denoise the tilt-series prior to the reconstruction or denoise the reconstructed volumes.More importantly, the achieved reconstructions are denoised and, at the same time, directly tied to the original projection data in a way that enables a direct consistency check between the reconstruction and the raw input projection images.Once reconstructed, the tomograms can be visualized in 3D with volume rendering tools included in the framework.Moreover, we can visualize the error distribution and magnitude during the reconstruction process on individual projection planes or within the reconstructed tomogram.
To summarize, the main contributions of this paper are: a novel GPU accelerated technique for cryogenic electron tomography which iteratively reconstructs and denoises the tomograms, achieving excellent denoising results and sharp details in a matter of minutes; our approach is based on a flexible and robust optimization framework that integrates reconstruction and denoising, removing the necessity of pre-reconstruction or post-reconstruction denoising and maintaining the relationship between the original tiltseries and the reconstructed tomogram; a method for visualizing the reconstruction error throughout the reconstruction, either on individual projections or within the reconstructed volume.In Section 2 we present how our approach relates to the existing research.In Section 3 we present our method and implementation details followed by the results and evaluation in Section 4. In Section 5 we present the conclusions and possible future extensions and adaptations of the presented method.

RELATED WORK
There are two well-known families of techniques for tomographic reconstruction.The first, direct methods, encompasses methods that exploit the Fourier slice theorem [20].
An overview of these methods can be found in a book by Hsieh [21].The second, iterative methods, is a family of iterative algorithms that aim to solve tomographic reconstruction as a system of linear equations.
One of the first techniques from the direct methods is Weighted Back-projection (WBP) [22], with later adaptations [23], [24], where the reconstruction includes two main steps.First, the projections are filtered (typically in the Fourier domain) to avoid over-smoothed reconstructions due to the imbalance in the available low and high-frequency information.Second, the filtered projections are back-projected into the reconstruction using the adjoint operator of the Radon transform [21].
The iterative reconstruction methods begin with the Algebraic Reconstruction Technique (ART) presented by Gordon et al. [10] and later adaptations the Simultaneous Iterative Reconstruction Technique (SIRT) [11] and Simultaneous Algebraic Reconstruction Technique (SART) [12].In ART methods, the object is reconstructed into a discrete grid by iteratively projecting the current volume estimate, creating corrections based on the difference between the original projections and re-projections of the current volume estimate, and then back-projecting the corrections into the volume.
It has been shown that iterative methods outperform the FBP methods in the case of limited projections [25].Moreover, the noise is handled better by algebraic methods [8].These are also the reasons we implement our method as an iterative method which improves the reconstruction results by minimizing the error term.For an overview of more 3D reconstruction methods for multiple EM modalities, see work by Sorzano et al. [26].
As Pyle and Zanetti [27] already show, there are many software packages and toolboxes available for tomographic reconstruction such as IMOD [28], Air II -a Matlab toolbox [29], ASTRA Toolbox [30], [31] and its Python wrapper TomoPy [32], and AuTom [33] which are integrated in many microscopy pipelines.IMOD implements the FBP and SIRT approaches, Air II, and AuTom implement ART approaches, and ASTRA Toolbox offers FBP, SIRT, SART, and other reconstruction approaches.Most of these packages offer CPU as well as GPU implementations.Our approach solves tomographic reconstruction as an optimization problem, using SART or SIRT together with various regularizers to integrate denoising in the reconstruction process, and it is implemented for execution on GPUs.
In the specific case of cryo-ET on biological structures, the measured projections are very noisy.The reason for this is the destructive nature of the image acquisition process, where the specimen gets destroyed if exposed to the electron beam with higher energy.To avoid the sample damage due to the electron radiation, the image acquisition is performed using lower electron doses, resulting in a lower signal-to-noise ratio in the output tilt-series.A good overview of the current cryo-ET reconstruction approaches is presented by Donati [34] and the current strategies on data processing for cryo-ET as well as for subtomogram averaging are presented by Pyle and Zanetti [27].The authors give a partial but extensive list of software tools used in individual steps of the cryo-ET pipeline before and after the tomographic reconstruction and identify their functionalities.
Additionally, due to the tilt limitation in cryo-ET, there is the missing wedge problem due to the lack of projections at high tilt-angle views [35].While the effect of the missing information can be minimized already during the reconstruction step, this problem is usually addressed separately, since adding the missing information into the data is not part of reconstruction when not considering a single particle approach where the same particle is present in many parts of the sample.This is true for works presented by Trampert et al. [36], Yan et al. [37], and Ding et al. [38].
In order to handle angle-limited projection data, W€ urfl et al. [13] proposed a deep-learning-based reconstruction method for limited angle tomography.Another common attempt is to couple a reconstruction technique, e.g., backprojection reconstruction, with a deep learning denoising approach.Here, denoising is applied either on the tilt-series before reconstruction [15], [16], [17], or after the reconstruction on tomograms created with traditional techniques [18], [19].Alternatively, reconstruction methods including regularization terms have been proposed to handle this ill-posed problem, but only for 2D synthetic limited angle data [39], [40].An example of the current reconstruction pipeline for obtaining state-of-the-art results in cryo-electron tomography (cryo-ET) is presented by Yao et al. [41].It consists of downsampling the reconstructed tomograms using a binning approach and then applying subtomogram averaging to manually selected subtomograms.The tomograms are reconstructed using IMOD's weighted back-projection [28], and then binned 4Â for further processing.Hundreds of tomograms are reconstructed and binned, and thousands of subtomograms are manually selected to apply subtomogram averaging.
Due to the low signal-to-noise ratio in cryo-ET tilt-series and the reconstructed tomograms, there is a need to denoise the data either before or after the tomographic reconstruction or even at both times.A recent overview on using the denoising in the cryo-ET reconstruction by Frangakis [42] presents what kind of methods are typically used prior or after the tomographic reconstruction ranging from simple linear, non-linear, bilateral, and trilateral filters in the spatial domain, filters in Fourier and Wavelet domains and denoising approaches using neural networks [16], [18].The denoising using neural networks requires further studies to be well understood.There are still some concerns on whether such methods used in the pre-reconstruction step can introduce additional artifacts and/or biases.In our approach, we use a well-understood approach to denoising during the reconstruction step, which to the best of our knowledge does not introduce any biases into the data.
Multiple model-based iterative reconstructions approaches have been proposed for cryo-ET.Goris et al. [43] present a regularized algorithm that combines SIRT with Total Variation (TV), [44] for cryo-ET of nano-structured materials.These samples are not as susceptible as biological samples which allows using higher electron doses, obtaining projections in a higher angular range (À70 to 78 Þ and a higher number of projections (each 2 , 75 projections) with significantly less noise.B€ ohning et al. [45] present a cryo-ET reconstruction method based on compressed sensing.They propose using 3D secondorder TV for tomographic reconstruction, casting the reconstruction problem as a regularized optimization problem, solved using a primal-dual hybrid gradient method implemented in python using ASTRA Toolbox [30], [31] for the data term.The virus particles in the tilt-series are separated and centered on the tilt axis for individual reconstruction.The reconstructed individual particles are then cropped and used for sub-tomogram averaging.The pipeline requires manual intervention between the steps, and relies on several software tools for the different steps.Yan et al. [37] applied the Model-Based Iterative Reconstruction (MBIR) [46] algorithm for cryo-ET reconstruction.The MBIR framework uses a probabilistic model for the measurements (data term) and a probabilistic model for the object (regularization).Our framework is based on proximal algorithms [47], which allows for seamless combination of different data term solvers and denoisers.We experimented with SART and SIRT for the data term, and TV, Huber loss [48] and Non-Local means (NLM) [49] for regularization.Parting from aligned tilt-series, our framework is a self-contained software that allows 3D visualization of tomograms created through integrated reconstruction and denoising.
It is expected that the reconstruction processes are not perfect, which is especially true when done with insufficient input data in the first place.Uncertainty visualization is a central topic in visualization both for 2D as well as in 3D data as is reflected by multiple surveys [50], [51], [52], [53], [54].There are only a few methods that address uncertainty in volume visualization, and it is not common for the current cryo-ET reconstruction pipelines to provide this uncertainty to the user, which is an option in our case.We provide an error tilt-series, which includes a re-projection error image for each original projection image.The error tilt-series is reconstructed into an error volume that shows the uncertainty in the reconstructed volume.The work by Kniss et al. [55] allows users to explore the uncertainty of surface boundaries in volumetric data interactively.Lundstr€ om et al. [56] presents how probabilistic animation can be used for uncertainty visualization in medical volume rendering, which was later also applied to concurrent visualization of real-time fMRI data presented by Nguyen et al. [57].Lindholm et al. [58] propose the use of spatially conditioned transfer functions using local material distributions, which can also be applied to uncertainty visualization.This results in using separate transfer functions for individual modality, e.g., one being the volume of reconstructed values and the other volume of uncertainty values, and blending them in the rendering process.We experimented blending the error volume and the reconstructed volume in the rendering process, but found that visualizing the error and the reconstruction individually allowed for an easier interpretation of the error.Dinesha et al. [59] explore an option of using HDR volumetric rendering where regular transfer function is used for mapping the intensity values into HDR color space, and the luminance component of the color is exploited for capturing uncertainty.

METHOD
We propose a reconstruction framework that creates denoised 3D tomograms from previously aligned noisy 2D tilt-series and allows direct visualization of the reconstructed tomograms.We present tomographic reconstruction as a regularized optimization problem, which is solved using a proximal joint-optimization technique that integrates denoising and reconstruction, and we describe the proximal operators that are required in the framework.A diagram of our method's pipeline is depicted in Fig. 1.

Tomography as an Optimization Problem
The projection images coming from cryogenic electron tomography represent the transmission images of parallel rays passing through a sample.The image formation model is derived from the Beer-Lambert Law [60].The measured intensity for a particular pixel p i is where r p i ðtÞ is a ray associated with the pixel p i , s is the attenuation cross-section, commonly known as density and Î0 ðp i Þ is the intensity of the electron beam.The goal is to reconstruct a discrete volume s s s s s s sðx; y; zÞ from the available ray integrals, usually in log space as In log space, the relationship between the projections and the 3D volume becomes linear and is simply the line integral of the volume densities sðx; y; zÞ along the ray r p i ðtÞ.
With this, the discretized image formation model can be represented by a weight matrix W W W W W W W that contains the contribution of each volume element for each ray integral as in algebraic reconstruction techniques, where tomographic reconstruction is presented as a linear algebra problem is the unknown volume that we want to reconstruct, p p p p p p p 2 R M is the set of known ray integrals (tilt-series), and the weight matrix W W W W W W W 2 R MÂN represents the contribution of each voxel to each ray integral.We formulate tomographic reconstruction as the following optimization problem The first term in Eq. ( 3) is a data fitting term that measures how good of a fit the current volume estimate is to the original projection data.The second term in Eq. ( 3) is a nonlinear regularization term that constrains the space of acceptable solutions, which is required to deal with the noisy and angle-limited tilt-series from cryogenic electron tomography.To apply known non-linear solvers, we rewrite the data term as where G is a non-linear function and K K K K K K K is a matrix.Then, we rewrite Eq. ( 3) as the following constrained optimization problem 3.2 Proximal Framework for Cryo-ET Eq. ( 4) is a standard problem in numerical optimization, and there are many non-linear solvers with guaranteed convergence when F and G are convex.There is a family of such solvers called proximal algorithms, which are powerful and flexible optimization algorithms that can be used to solve non-smooth, constrained, distributed or large-scale optimization problems.Proximal algorithms have a modular nature, since the optimization problems are split into smaller sub-problems called proximal operators, and each sub-problem can be solved independently.In our case, we require proximal operators for F and G. Let HðÁÞ be a function that we want to minimize (i.e., either F or G).The proximal operator of HðÁÞ can be interpreted as getting closer to the minimum of HðÁÞ while staying close to a reference point u u u u u u u, and is defined as [47] prox H ðu u u u u u uÞ ¼ arg min where The proximal operators of many functions are easy to compute and have a closed form solution.Having the proximal operator for a given set of functions creates the possibility of solving complex optimization problems using proximal algorithms that combine these proximal operators, in our case, the proximal operators of F and G.

Linearized Alternating Direction Method of Multipliers
The Linearized Alternating Direction Method of Multipliers (LADMM) [47] (Algorithm 1) is a proximal algorithm that can be used to solve optimization problems of the form in Eq. ( 4).
Algorithm 1. Linearized ADMM þ y y y y y y y ðtÞ Þ y y y y y y y ðtþ1Þ ¼ y y y y y y y In Algorithm 1, K K K K K K K is a matrix related to the regularization term, r; m arise when manipulating ADMM to solve problems of the form arg min [47] Section 4.4.2),z z z z z z z and y y y y y y y are additional variables.T 1 is the number of iterations of the proximal algorithm.

Proximal Operators
To apply Algorithm 1, we need the proximal operators for our data fidelity term and regularization terms.

Data Term Proximal Operator
We want a proximal operator to solve the data fidelity term in Eq. (3).By definition That is, given a current volume . This is simply a least-squares problem, whose analytical solution is [47] Any linear solver can be used to solve Eq. ( 7).In our implementation, we use SART [12], which is a proven algorithm for tomographic reconstruction with a known proximal operator [61].With SART, we can represent the projection matrix W W W W W W W and its transpose in a procedural way, avoiding the explicit generation of the system matrix, which would otherwise incur an infeasible memory cost.Note that even if W W W W W W W is not full rank, the inverse in the proximal operator is well-conditioned thanks to the addition of I I I I I I I.The proximal operator for SART corresponds to Algorithm 2, where T 2 is the total number of SART iterations when calling the proximal operator (independent from T 1 ), a is the relaxation coefficient, m is the regularization parameter of the proximal operator, p p p p p p p is the original projection data and u u u u u u u is an initial estimate of the reconstruction.

Regularization Operators
Anisotropic Total Variation Proximal Operator.This regularizer was proposed by Rudin, Osher, and Fatemi and is known as the ROF model [44].It was conceived under the idea that images with excessive (and probably noisy) details have high total variation (the integral of the absolute image gradient is high).In other words, images are assumed to be sparse in the gradient domain.Anisotropic Total Variation (ATV) is defined as the ' 1 norm of the image gradient.In our framework, we define G ATV ðÁÞ ¼ k Á k 1 and K K K K K K K 2 R 3NÂN as the forward difference matrix that produces the discrete volume gradient.Since the proximal operator of the ' 1 norm is the soft thresholding function [47], the proximal operator of G ATV is the soft thresholding function applied to each element of the volume gradient where at voxel n and r 1 is a regularization parameter that controls how fast the volume gradient shrinks.
Huber Penalty Proximal Operator: This regularizer was proposed by Peter J. Huber [48].The Huber penalty can be applied on the volume gradient to reduce the total variation, avoiding the sparse gradients produced by the ' 1 norm.It is defined as a piecewise function composed of ' 2 and an ' 1 segments x j j d; dð x j j À 1 2 dÞ; x j j > d: ( The proximal operator is again a soft-thresholding operation applied to each element of the volume gradient where r 2 controls how fast the gradient shrinks and d controls the transition point between the linear and quadratic segments in G Huber . Non-Local Means as a Proximal Operator.Any Gaussian denoiser (e.g., Non-Local Means [49]) can be interpreted as a proximal operator with a fixed regularization parameter as the assumed variance of the noise, as shown in the works by Venkatakrishnan et al. [62] and Heide et al. [63].We will use the Non-Local Means algorithm as a proximal operator in our framework since its operation principle resembles subtomogram averaging, and it is a common denoiser in cryo-ET pipelines (e.g., Thermo Fisher's Amira Software [64]).For each pixel in an image, Non-Local Means performs denoising by searching pixels with neighborhoods (square-shaped patches centered at the pixel) similar to that of the pixel under denoising in a squareshaped search zone centered at the pixel.The denoised value is computed as a weighted average of the pixel values in the search zone.The weight of each pixel in the search zone is proportional to the distance to the center pixel and the similarity of the neighborhoods of each pixel and the center pixel.
Assuming some prior distribution Gðv v v v v v vÞ, the Non-Local Means algorithm [49] can be used directly in the framework as a proximal operator (with where The regularization parameter is the assumed variance of the noise s 2 nlm .For details, see the supplementary material, available online. Using the Regularization Proximal Operators.In our implementation, we use TV and Huber penalty interchangeably in combination with NLM.First, we use Algorithm 1 with TV or Huber by performing the z update with Eq. ( 8) or Eq. ( 10) for several iterations (typically 50 À 80).Then, we incorporate NLM to reduce the noise in the reconstructed volume v v v v v v v, using TV+NLM or Huber+NLM for a few iterations.We found that two final iterations with NLM improve the results by further reducing the noise levels without blurring the details in the reconstructed structures.

Implementation Details
Fig. 2 includes a 2D diagram of the reconstruction scene.We can think of this diagram as the reconstruction of a single horizontal slice.The volume (green rectangle) is centered at the origin.The rays travel from the source (red line) to the detector (not in the diagram) which is parallel to the ray source, located on the other side of the volume.In the domain of cryo-ET, the rays present parallel geometry and calculations can be easily parallelized.Our framework includes two implementations for the SART and SIRT proximal operators.The first is based on ray casting [65], and the second is based on voxel splatting [66].In the voxel splatting implementation, the weights are read from pre-integrated lookup tables that are computed by line integrals of rays traversing radially symmetric filter kernels [67] at different distances.A 2D example of the ray-kernel interaction is shown in Fig. 3.

Ray Casting Implementation of SART/SIRT
Forward Projection and Correction Term.From each pixel p i in the ray source, a ray is cast across the volume towards the corresponding pixel in the detector.Samples are taken at equidistant points along the ray.At each sample point, a volume value is retrieved by tri-linear interpolation.The correction term is computed immediately after calculating the line integral as where Iðp i Þ is the measured line integral value of the current ray in the log space, S corresponds to the number of samples, sample rate is the distance between samples (in voxels), v k s is the current volume estimate at sample point s, and ray length is the length in voxels of the ray-volume intersection.
Back-Projection.For each voxel, we cast a ray parallel to the current view pointing to the detector.We compute the point at which the ray intersects the detector, and retrieve a correction value with bi-linear interpolation of the correction image.The voxel is updated with the interpolated correction value, times the relaxation factor.In this way, the forward projection and backward projection are not exact transposes of each other, but we avoid data races in both operations, and the result is practically the same as with the splat-based implementation, where the operators are transpose of each other.

Voxel Splatting Implementation of SART/SIRT
Forward Projection.Each voxel is splatted into the detector and updates all the pixels of a ray-sum image that are inside the splatted voxel's kernel support.A weights image is created accordingly.Data races can occur if two voxels try to update the same pixels at once, either in the ray-sum image or the weights image.
Correction Image.For each pixel in the correction image, the correction term is computed as where a is the relaxation parameter, Iðp i Þ is the measured line integral in the log space, ray sum image½i is the value of the pixel i of the computed projection, and weights image½i is the sum of the weights for the pixel i. Back-Projection.For each voxel, we splat the voxel in the detector and retrieve a correction value by interpolation using the pre-computed weights.The forward projection and back-projection operators are transposes of each other.If we use linear interpolation, the result is practically the same as with the ray-based implementation.There are no data races in the back-projection operation.

CUDA Implementation
Our framework is implemented in CUDA.The most important kernels are described below.Ray-casting Forward Projection: Launches a thread for each pixel in the projected image and each thread computes a line integral and a correction term.It produces a correction image that is used in the backprojection.Ray-casting Back-projection: Launches a thread for each voxel.The corresponding voxel value is updated based on a correction term which is retrieved from the correction image.Splatting Forward Projection: Uses one thread per voxel.A synthetic projection and corresponding weights image are computed as described in Section 3.4.2.Splatting Correction Image: Using one thread per pixel, computes a correction image based on the previously computed synthetic projection, weight image and original projection.TV Proximal Operator: Executes one thread per voxel.Each thread updates the corresponding three components of the variable z z z z z z z.NLM Proximal Operator: Executes one thread per voxel.Each thread updates the corresponding voxel value based on similar patches in the surrounding neighborhood.

Horizontal Padding for Limited Angle Reconstructions
If the reconstructed volume has the same width as the projection images, the rays in the borders of the projection image are going to have a shorter intersection with the reconstruction volume as the tilt angle is increased.As seen in Fig. 2, rays at tilt angle zero intersect the volume exactly at the boundaries, and all the rays intersect the volume for the same distance.Rays from views with maximum tilt angle f will traverse the volume for different distances.
Horizontal padding is required so that the distance that the rays travel inside the volume is the same for each view.For this, the required volume width vw including padding is Where pw is the width of the projection images, u ¼ 90 À f, vd is the volume depth.The obtained value should be rounded up to the next even value.For example, for projection width = 1024, the padded volume width is 2568.Further details are included in the supplementary material, available online.

3D Visualization of the Reconstructions
We use a simple adaptation of the regular Direct volume rendering ray marching approach [68] for visualizing the reconstructed tomograms.The rendering pipeline also includes soft shadows [69] and local ambient occlusion [70].The visualizations include intermediate results (SART+TV) and final results (SART+TV+NLM).

Visualization of the Error
The re-projection error of the reconstruction can be optionally generated.An 'error tilt-series' is created as follows: For each viewing angle included in the input tilt-series, we compute a synthetic projection with the estimated volume, and then a 're-projection error image' is created as the absolute value of the difference between the original and estimated projections.The error contribution in the individual error projections from the tilt-series is spread sparsely throughout the whole projections resulting in a very noisy error image.Using the error tilt-series, an 'error volume' is created by applying SART to the error tilt-series.Finally, to enhance the biggest error contributions for visualization purposes, we apply the following steps: 1) we first apply 3 Â 3 Gaussian blur; 2) next we threshold the intensity values below 1  8 -th of the max value; 3) finally we adjust the gamma value of the image to 2.0 to shift the mid-tones to brighter values.This approach pronounces the regions in the error projection images with the highest error values and also pronounces structural features.

Parameter Settings
We denote the number of iterations as T 2 Â T 1 , where T 1 is the number of iterations of the proximal algorithm, and T 2 is the number of SART iterations per iteration of the proximal algorithm.The last two iterations use TV+NLM, the rest use only TV.Unless otherwise specified, the parameters in our method are: . A final iteration of the data term proximal operator is performed to enforce consistency between the original projections and the reconstructed volume.
There is no set of optimal parameters for every dataset.A parameter search is necessary for every data source (different specimens, different acquisition devices, etc.), but we found that the set of optimal parameters is mostly unchanged for different datasets from the same source (e.g., datasets 1 and 2) and only individual parameters need to be adjusted.For example, the total number of iterations T 2 Â T 1 provides control over the ratio of data fidelity (T 2 ) and regularization (T 1 ), a controls the rate of convergence during SART iterations, r 1 controls the amount of TV regularization on a proximal iteration, the NLM parameters nlm s , nlm w , nlm skip can be adjusted according to the size of the virions present in each dataset and s 2 nlm to control the regularization strength.

Computation Hardware and Time
A Volta V100 (32 GB) GPU was used for the reconstructions, which took about 30 minutes for the SARS-CoV-2 datasets and 1 hour for the HIV-1 dataset.

Experiments With Non-Local Means
We experimented with 3D and 2D NLM.The search region and the neighborhoods have a finite extent, are squareshaped in 2D and cube-shaped in 3D.The side of the search zone is controlled with nlm s parameter, and is 2 Â nlm s þ 1 (always odd).The side of the neighborhoods is controlled similarly, with parameter nlm w .The size of the neighborhoods should be set accordingly to the size of sub-structures in the tomograms.We noticed that the reconstructions obtained with 3D and 2D NLM were very similar, and since the computational cost of the 2D version is much lower, we selected it for the rest of the experiments.However, we still found the 2D version computationally expensive, and we experimented with skipping some of the pixels in the search zone.The parameter nlm skip controls the pixel skipping.There are nlm skip ignored pixels between every two considered pixels on each axis of the search region.For the 2D versions, we found that the quality of the result was not affected significantly for nlm skip 3, and the computational cost was greatly reduced.Care should be taken to include the pixel in the center of the patch when skipping pixels since it is the one with the higher weight/contribution.We analyzed two different versions of 2D NLM: XY NLM: Applying 2D NLM on each XY slice of the volume XZ NLM: Applying 2D NLM on each XZ slice comparing neighborhoods on XY slices, with the idea of balancing the resolution between XY and XZ (tiltaxis) slices.However, the XY slices produced by XZ NLM presented horizontal line-shaped artifacts, and there was not a noticeable improvement in the image quality on the XZ slices (Fig. 4).Therefore, we selected 2D XY NLM as the default NLM operator for the framework.We experimented with applying NLM in the last, in the last two, and in the last three iterations only.We found that two iterations are enough to achieve good denoising, with nlm s ¼ 21, nlm w ¼ 7, nlm skip ¼ 3, and more iterations tend to blur the details.

TV Versus Huber Penalty
A comparison between TV+NLM and Huber+NLM is presented in Fig. 5.The sparse gradients enforced by TV achieve a cleaner, more homogeneous background.Huber achieves a slightly noisier reconstruction, but sharper details in the reconstructed virus particles.We refer the reader to the supplementary material for additional comparisons between TV+NLM and Huber+TV, available online.The results in the following sections were obtained using TV+NLM.

Masking the Fiducials
In order to minimize the artifacts coming from the fiducial markers, we created masks corresponding to the fiducial markers for each projection imaged for the two SARS-CoV-2 datasets.The masks were used to avoid back-projecting the pixels corresponding to the fiducials during the execution of Algorithm 2. An example of the artifacts removed by masking the fiducials on dataset 1 can be seen in Fig. 6, which is an XZ slice.Additional images can be found in the supplementary material, available online.

Ablation Study
To show the contribution of the different regularizers in our framework, we created reconstructions of dataset 1 with different configurations.The results are shown in Fig. 7. From left to right, first we have 2Â80 iterations of proximal iteration (Section 4.1 of [47]) using the SART proximal operator (Algorithm 2).The second and third images correspond to LADMM (Algorithm 1) using SART+TV (Algorithm 2, Eq. ( 8)), after 2Â20 and 2Â80 iterations respectively.The rightmost image corresponds to 2Â80 iterations of LADMM with Total Variation, using Non-Local Means instead of Total Variation in the last two iterations, plus a single final execution of the data term operator in the end.Without regularization, the reconstructions fit the noise in the projection data, leading to noisy reconstructions where it is hard to distinguish between virus particles and background.Regularization allows creating cleaner reconstructions by alternating between reconstruction and denoising, greatly reducing the noise and reconstructing rich details with the pass of iterations.Finally, Non-Local Means further reduces the noise in the empty regions while preserving the features of the virus particles.

Progress Over Time
We can see how the reconstruction progresses over time in Fig. 8, which was created from a tilt-series of another SARS-CoV-2 specimen (dataset 2) with 3Â55 iterations of LADMM.With the advance of the iterations, the features of the virus particles are gradually reconstructed and at the same time, the noise is diminished steadily.If we use NLM in the last two iterations, the noise in the empty regions is greatly reduced without significantly distorting the features of the virus particles gained through the reconstruction process.After the 55 iterations, a last execution of the data term operator is performed to ensure that the reconstructed volume fits the original projection data.The 3D visualizations at different stages show that the noise is gradually removed, and the final result allows visualization of the virus particles.Reconstructions with up to 3 Â 100 iterations are available at the supplementary material, available online.

Comparison With Another Model-Based Reconstruction Method
We compare our results against those obtained with MBIR [37] on dataset 1.The reconstruction parameters for MBIR were: diffuseness 0.2, data offset 32768, smoothness 0.08, sigma X 1:52677 Â 10 À5 , 600 outer sub-iterations, 100 inner sub-iterations.The reconstruction time was 4 hours and 30 minutes (roughly 9 times slower than our method) using a processor Intel Xeon E5-2687W v4 (2 Â 3 GHz processors).The results are very alike (Fig. 9), which is expected given the similarity between the two techniques.Additional results in orthogonal slices are included in the supplementary material, available online.

Comparison With a Pre-Reconstruction Denoising Method
We processed dataset 1 using Topaz Denoise [16], developed for denoising cryo-ET tilt-series.The denoised tiltseries was reconstructed with 20 iterations of SART without regularization since the tilt-series was denoised and using additional regularization (e.g.LADMM with TV) produced over-regularized (blurry) reconstructions.We compare the result with the one from our framework using LADMM 2Â80 iterations with TV, NLM instead of TV in the last 2 iterations plus a final data term iteration (Fig. 10).Regardless of the pre-filtering applied to the tilt-series, the result using Topaz Denoise still contains noise in some sections of the reconstruction, and some details in the virus particles (e.g. the particle in the top-left corner) are blurred.Our result presents homogeneous denoising and sharper details.We created reconstructions from three tilt-series including the influenza virus particles (dataset 3) using IMOD, IsoNet, and our framework.First, we used IMOD's SIRT algorithm to create reconstructions with 10 and 20 iterations.We selected the result after 10 iterations since it was cleaner and presented better contrast.This is in accordance with what we found when running more than a hundred iterations with only SART in our ablation studies.We used IMOD's tomograms (bin-4 and bin-8) for training IsoNet (3 tomograms were used for training) for 20 iterations with the recommended configuration [19].Finally, we created reconstructions with our framework, using LADMM for 2Â80 iterations with and without NLM in the final two iterations.The results are displayed in Fig. 11.IMOD's SIRT allows to Fig. 7. Reconstructions from a SARS-CoV-2 tilt-series (dataset 1) using our framework with different settings.Without using regularization (first from the left), the reconstruction fits the noise in the projection data.By alternating between reconstruction (SART) and denoising (TV), the features are reconstructed and the noise is removed with the pass of iterations (second 2x20 iterations, third 2x80 iterations).Including NLM in the last two iterations further reduces the noise, preserving the reconstructed features.
reconstruct features in the virus particles, but there is a strong presence of noise in the reconstruction, which would only worsen with more iterations.IsoNet bin-4 results present slightly better contrast than IMOD's, but the noise level is very similar.IsoNet bin-8 achieves better contrast than bin-4 and a moderate level of denoising, but at the expense of image resolution.Our method manages to reconstruct the features of the virus particles and achieves excellent denoising results thanks to the iterative joint reconstruction and denoising nature of our technique.

HIV-1 (EMPIAR-10643)
We created reconstructions from tilt-series including 'HIV-1 GagdeltaMASP1T8I assemblies' (dataset 4) available at the Electron Microscopy Public Image Archive (EMPIAR-10643) [71], [72].This dataset presents repetitive structures that are expected to work well for Non-Local Means but also any other denoising method based on self-similarity, including classical subtomogram averaging and IsoNet.We   ), and ours using TV+NLM (NLM in the last 2 iterations).Additionally, the deconvolution step in IsoNet's framework was used for all the methods.2D vertical slices and 3D visualizations of the results are depicted in Fig. 12. IsoNet bin-3 achieves results very similar to IMOD's reconstruction (see supplementary material for IMOD versus IsoNet bin-3 versus IsoNet bin-8, available online).IsoNet bin-8 achieves good denoising results, but the final result presents lower resolution than the other methods (1366Â1366Â380 versus 512Â512Â150).Our technique achieves cleaner and sharper reconstructed features.To compare the sharpness and contrast of the results, we used a method suggested by a field expert.The idea is to plot the intensities along a line to create an intensity profile, and we chose this dataset since it presents repetitive 'spikes' in the outer part of the virus particles, which is perfect for this method.The volume intensities were normalized from 0 to 255.We perform this analysis for IMOD's, IsoNet's bin-3, and our results on a line across the lower spikes of the leftmost particle in Fig. 12.A close-up image included the line is shown in Fig. 13, and the respective intensity profiles correspond to Fig. 14.In the intensity profiles, the maximum peaks correspond to empty space between the spikes, and the minimum values correspond to the spikes.For the majority of the spikes, our method presents smaller minimum values (darker spikes).In IMOD's and IsoNet's results there are several saturated blobs in the empty regions resulting from the deconvolution step due to the higher presence of noise, Fig. 12. 2D slices and 3D visualizations of reconstructions from HIV-1 tilt-series (dataset 4, EMPIAR-10643) using different methods.IMOD reconstructed the virus particles, but the 3D visualization is noisy.IsoNet reduces the noise from IMOD's reconstruction and allows for 3D visualization (at half the spatial resolution).Our method achieves cleaner results that produce cleaner 3D visualizations.producing higher maximum values in the profile.Our method presents a more homogeneous background with significantly less blobs thanks to the achieved level of denoising.There is a clear visual difference in the reconstructed quality of the spikes between our method and the others, which can be appreciated in the regularity of Ours' intensity profile and at the spike at around 95th pixel (the sixth minimum in our profile), since the spike is counted as two in the other methods.

Additional Comparison With IsoNet
We performed additional comparisons between our method (bin-4) and IsoNet (bin-8, half the spatial resolution) using dataset 2. The results are illustrated in Fig. 15.IsoNet achieves better denoising results, which allows for a more clear view on the internal structure of the virus particles.However, the spike's stem used to connect the membranes should have some density, and the spike's head should not have the encircled density.Model based regularized approaches can provide more reliable results than learning-based reconstruction or refining methods.

Example of Error Visualization
Assuming that the reconstruction process was successful, the error should be uniformly distributed and not correlated with the reconstructed volume.We created two reconstructions of a tilt-series part of the EMPIAR-10643 dataset, one with good alignment, and one with a misaligned version of the same tilt-series.In Fig. 16, we compare slices of the error volumes.The left image corresponds to the error volume of the reconstruction created with the aligned tilt-series, and the right image to the error volume of the reconstruction created with the misaligned tilt-series.The images were processed as described in Section 3.6 to enhance the structures in the error.The error slice from the misaligned tiltseries contains visible structure in comparison with the aligned version, and there are white diagonal smudges corresponding to fiducials (both encircled in yellow).
We can observe the positive effect of regularization in Fig. 17, which includes slices of the error volumes from a nonregularized reconstruction (SART only, left) and a regularized reconstruction (SART+TV+NLM, right).The error corresponding to the non-regularized reconstruction includes structures that clearly resembles the virus particles (encircled in yellow, see Fig. 7 for reference).In contrast, the error from the regularized reconstruction presents practically no structure correlated to the volume, which indicates that the reconstructed virion's features obtained with regularization match the features available in the original projection data.

CONCLUSION
We have presented a new GPU accelerated framework for tomographic reconstruction of tilt-series from cryogenic electron microscopy.The framework is based on a flexible and robust optimization framework that integrates the reconstruction and denoising processes and achieves sharp denoised reconstructions, while preserving the relationship between final reconstruction and original tilt-series.The framework transforms noisy tilt-series into denoised visualizations in a short time and using a single piece of software.We demonstrate improvements in denoising and reconstruction quality compared to state-of-the-art methods.The framework's source code will become available upon publication.Given the flexibility of the proximal framework, a possible next step could be the use of self-supervised deeplearning-based denoising approaches as additional regularizers in the framework.

Fig. 1 .
Fig. 1.Pipeline of our framework.The input is aligned tilt-series, which are used to create denoised tomograms by alternating between reconstruction and denoising instead of denoising the tilt-series or the resulting tomogram.The re-projection error and a corresponding error volume can be generated optionally.

¼ u u u u u u u y y y
y y y y ð0Þ ¼ 0 for t ¼ 1 . . .T 2 do for all projection images P u do y

Fig. 2 .
Fig. 2. Top view of the reconstruction scene.Horizontal padding is required for the extreme views.

4. 11
Qualitative Comparison With Other Reconstruction and Denoising Methods 4.11.1 Influenza Tilt-Series

Fig. 8 .
Fig.8.Reconstruction progress from SARS-CoV-2 tilt-series (dataset 2) using our framework.The noise is reduced with the pass of iterations.The final iterations using NLM+TV reveal the virus particles in the reconstructed volume 3D visualization.

Fig. 14 .
Fig. 14.Plot of the intensity profile along the line.The minimums (corresponding to spikes) are more pronounced in the profile from our reconstruction (blue).

Fig. 15 .Fig. 16 .
Fig. 15.Comparison between Ours (SART+TV+NLM, bin-4) and IsoNet (bin-8) on dataset 2. The virus inner structure is clearer after IsoNet processing.However, the spike's stem used to connect the membranes should have some density, and the spike's head should not have the encircled density.

Fig. 17 .
Fig. 17.Slice of the error volume, dataset 1. Left: SART+TV+NLM.Right: SART only.The error from the reconstruction without regularization presents structures correlated to the virus particles.