Synthetic PET via Domain Translation of 3-D MRI

Historically, patient datasets have been used to develop and validate various reconstruction algorithms for PET/MRI and PET/CT. To enable such algorithm development, without the need for acquiring hundreds of patient exams, in this article we demonstrate a deep learning technique to generate synthetic but realistic whole-body PET sinograms from abundantly available whole-body MRI. Specifically, we use a dataset of 56 18F-FDG-PET/MRI exams to train a 3-D residual UNet to predict physiologic PET uptake from whole-body T1-weighted MRI. In training, we implemented a balanced loss function to generate realistic uptake across a large dynamic range and computed losses along tomographic lines of response to mimic the PET acquisition. The predicted PET images are forward projected to produce synthetic PET (sPET) time-of-flight (ToF) sinograms that can be used with vendor-provided PET reconstruction algorithms, including using CT-based attenuation correction (CTAC) and MR-based attenuation correction (MRAC). The resulting synthetic data recapitulates physiologic 18F-FDG uptake, e.g., high uptake localized to the brain and bladder, as well as uptake in liver, kidneys, heart, and muscle. To simulate abnormalities with high uptake, we also insert synthetic lesions. We demonstrate that this sPET data can be used interchangeably with real PET data for the PET quantification task of comparing CTAC and MRAC methods, achieving ≤ 7.6% error in mean-SUV compared to using real data. These results together show that the proposed sPET data pipeline can be reasonably used for development, evaluation, and validation of PET/MRI reconstruction methods.


I. INTRODUCTION
There is currently an unrealized potential for PET/MRI systems in synergistic and quantitative reconstructions that account for and leverage simultaneous data acquisition of PET, which provides functional tissue information, and MRI, which provides excellent anatomic information, to correct for artifacts and motion and to improve localization [1]. One example of the challenges is quantitative PET reconstruction, which requires accurate attenuation maps that are not directly measured by MRI. This affects the quantification of PET from reconstructed imagery, since the photon attenuation map is embedded in the forward system model. As a result, the development of novel attenuation correction methods and other advanced PET/MRI reconstructions requires real or realistic data, which can be difficult and/or expensive to obtain. With increased PET/MRI adoption, it is necessary to establish standards for the quality of reconstruction, which can vary based on subtleties of PET data collection, including scanner geometry and detector nonidealities, but also the choice of reconstruction algorithm, attenuation correction method, and patient anatomy (e.g., scattering and hyper-attenuation). Simulating the whole range of patient variability in terms of anatomy and patient-specific radiotracer uptake is infeasible, e.g., using purely digital phantoms and Monte Carlo simulation [2], [3], necessitating the acquisition of real patient PET data. For PET/CT systems, qualification methods are established by the American College of Radiology (ACR) using qualitative evaluation of whole-body clinical scans and quantitative evaluation using an ACR PET phantom, a cylinder based on the Jaszczak Deluxe Flangeless ECT phantom with the spheres removed, a PET faceplate composed of several fillable cylinders, and acrylic rods of various diameters [4].
PET reconstruction performance can also be measured using the NEMA phantom, which is composed of multiple fillable spheres and cylindrical inserts that aim to mimic attenuation and activity found in different parts of the human body [5].
Unfortunately, phantoms used for PET/CT are insufficient for evaluating PET/MRI reconstruction performance because they cannot evaluate modern MR-based attenuation correction (MRAC) methods that rely on detecting typical human anatomy from MRI data. These methods include vision-based atlas techniques [6], [7], [8], joint reconstruction of attenuation and activity [9], [10], and direct prediction of pseudo-CT via deep learning-based domain translation [11], [12], [13], [14]. A physical PET/MRI phantom to evaluate reconstruction performance would require an anthropomorphic distribution of materials with properties that match both 511-keV photon attenuation and the MRI properties of proton density, T1, and T2, which is extremely challenging, especially for bone, due to its high attenuation but rapid T2 decay rate [15].
Consequently, the standard approach to evaluating PET/MRI performance is to utilize human subject datasets acquired on PET, CT, and MR [11], [16]. This allows for a relative performance measure, by comparing the standardized uptake value (SUV) of MRAC-based PET reconstructions relative to PET reconstructions utilizing CT-based attenuation correction (CTAC) [16]. However, for sites to conduct such evaluations, numerous PET/CT/MR scans are required to characterize scanner and algorithm performance at operating points exhibiting natural imperfections that impact the physics of PET collection, such as those arising from detector characteristics, scattering, or unexpected attenuation [17]. This patient-specific data is expensive to collect, hindering new PET/MRI algorithm development that normally requires recollecting PET data.
In this article, we present a method for generating synthetic PET data using routinely collected and abundantly available MRI that naturally captures important scanner and detector imperfections, adapts to varied tracer distributions and anatomy, and allows for insertion of synthetic lesions. We believe this will allow for the creation of large and diverse synthetic data for development, evaluation, and validation of PET reconstruction algorithms. Our approach leverages recent work in deep learning-based domain translation using fully convolutional networks (FCNs), and in Section II we describe how to train a 3-D residual UNet to predict SUV-normalized synthetic PET (sPET) imagery from whole-body postcontrast T1-weighted MRI (Fig. 1). This requires only paired input and output image examples and, crucially, no additional annotation or scanner geometry details. For this problem, we assume a supervised setting, where the absolute and relative error between the measured (reconstructed) and FCN-generated volumes provide a quantitative measure of performance, albeit at different scales that must be balanced. Note that an approach based purely on generative adversarial networks (GANs) is not desirable here, since we require the sPET volumes to correspond anatomically to the MRI volumes to support PET/MRI reconstruction research. To this end, in Section III, we show that the predicted sPET imagery can be forward projected to generate sPET time-of-flight (ToF) sinogram data that can be used interchangeably with real PET sinogram data in vendor-provided reconstruction algorithms. We further evaluate this capability for qualification research by performing a classical PET-SUV quantification experiment, comparing reconstructions with CT- and 2-point MR-Dixon-based AC maps, using both synthetic FDG-PET and measured FDG-PET sinograms.
Our results show that evaluation using sPET can achieve < 8% quantification error in mean-SUV in synthetically inserted lesions compared to real PET data (averaged over several synthetic lesions in a cohort of patients), suggesting the wide applicability of domain-translated sPET for PET reconstruction algorithm development and qualification research. The role of synthetic lesions as proposed and demonstrated in this study is to provide methods for evaluation and optimization of image reconstruction algorithms. These algorithms continue to change and, with the introduction of deep learning/AI methods for image reconstruction and denoising, many new parameters are being introduced, and more robust methods for evaluation and optimization are needed to demonstrate the clinical impact of the image processing algorithms.

A. Prior Work
Prior work in deep learning-based domain translation has demonstrated that FCNs based on UNet-like encoder-decoder architectures are widely applicable to a range of 2-D and 3-D cross-modality medical image translation tasks, including MRI-to-CT [11], PET-to-CT [18], and MRI-to-MRI [19]. These architectures are typically trained independently for each anatomical region (e.g., head, chest, and pelvis) of interest. For PET/MRI specifically, a major focus has been in MRI-to-CT domain translation for enhanced attenuation correction maps, which are ultimately combined with measured PET sinogram data for enhanced image reconstruction [11], [13], [18]. Recently, such architectures have been applied to the reconstruction and denoising of low-dose PET imagery, including using supervised [20], [21] and unsupervised [22] methods, and extensions to dynamic PET reconstruction [23]. In some cases, these image-enhancement techniques have been shown to successfully improve diagnostic interpretability [24].
In contrast to these works, the focus of this article is domain translation of whole-body MRI-to-PET without any initial PET data, i.e., to produce a novel image series we refer to as synthetic PET (sPET). While previously in-silico PET image generation has been explored using physics-based simulation tools such as GATE [25] with Monte Carlo techniques, such as PENELOPE [26] and SimPET [3] to reproduce realistic image quality, a predominant issue here is knowing realistic spatial distribution of physiologic PET uptake to seed the simulation. Our work addresses this issue by using a neural network to learn from real PET scans, such that realistic physiologic uptake can be inferred from abundantly available MRI. This is an important point since we do not believe sPET can accurately predict patient-specific functional information for diagnosis.

B. Contributions
Thus, our contributions are as follows.

1. We introduce a deep learning method for generating whole-body 3-D sPET volumes from one or more routinely collected MRI series, including a balanced loss function that improves reconstruction of both low- and high-SUV regions.

2. We evaluate the utility of sPET in a downstream development task involving the quantification of PET SUV in images reconstructed using MRAC and CTAC, demonstrating that sPET sinograms can be used seamlessly in place of real PET data for PET/MRI qualification with minimal impact on the observed quantification error in synthetically inserted lesions.

II. SYNTHETIC PET VIA DOMAIN TRANSLATION
Although the physics and acquisition are fundamentally different, MRI and PET imagery share a great deal of structural similarity due to contouring of patient anatomy by physiologic uptake. This similarity can be exploited by FCNs to efficiently and implicitly implement the codebook $C: \mathbb{R}^{n}_{\mathrm{MR}} \rightarrow \mathbb{R}^{n}_{\mathrm{PET}}$ mapping MRI to PET-SUV imagery using a cascade of nonlinear filters, avoiding explicit storage of input-output pairs (x, y) in a database. Note that this map C describes a statistical relationship between MRI and PET, and not a causal or functional one. Besides being differentiable and amenable to backpropagation-based optimization using historical PET/MRI datasets, FCNs have strong spatial regularization properties that reduce degeneracy across image patches to create seamless and realistic anatomy-conforming 3-D PET imagery from MRI.
Here, degeneracy refers to the typical inconsistencies in the codebook arising from the fact that PET and MRI contain different (orthogonal) information about a patient. The inverse image $C^{-1}(y)$ of a 3-D PET patch $y \in \mathbb{R}^{n}_{\mathrm{PET}}$ may not be unique, since different anatomical regions can experience the same uptake. Conversely, a given 3-D MR patch $x \in \mathbb{R}^{n}_{\mathrm{MR}}$ may have multiple images $C(x) \in \mathbb{R}^{n}_{\mathrm{PET}}$, corresponding to various patterns of PET uptake across individuals. Thus, the map C is general, which frustrates conventional atlas and dictionary-based implementations that must keep track of this in $\mathbb{R}^{n}_{\mathrm{MR}}$ [27]. In comparison, due to the supervised training process, FCNs naturally choose $\hat{y} \approx \mathbb{E}[C(x)]$ for sPET given input MRI x. In this respect, the task of predicting PET from MRI is distinct from approaches predicting full-dose PET imagery from low-dose PET imagery, since those models are responsible for enhancing the signal-to-noise ratio (SNR) of existing activity images [20], [21], rather than directly learning anatomy-conforming physiologic biodistributions of PET uptake.

A. Assumptions
In this article, we assume the availability of historical PET/MRI datasets of patients receiving a calibrated (full) dose of the same PET radiotracer. Although the proposed method is applicable to varying dose levels, low-dose PET imagery exhibits lower SNR, and is therefore not ideal for training. In this work, we register scanner-reconstructed whole-body 18F-FDG-PET and postcontrast T1-weighted MRI volumes, collected on a 3.0 T ToF PET/MRI scanner (Signa, GE Healthcare, Waukesha, WI), to the MRI image space and resample to 1-mm isotropic resolution using the ANTs toolbox interface provided via Nipype [28]. To increase the regularity and identifiability of MR structures, we apply contrast-limited adaptive histogram equalization to the resampled MRI volumes, using a kernel size of 100 mm and clipping limit of 0.05 [29]. For consistency, we convert the raw PET intensity values (counts) to SUV using known radiotracer dose, half-life, positron fraction, elapsed time, and patient weight [30]. Finally, we split our dataset into 40 whole-body PET/MRI training exams, 16 whole-body PET/MRI testing exams, and 20 independent pelvic PET/MRI testing exams where corresponding CT was available (discussed in Section III). We make no explicit assumption of age, race, gender, or ailment, other than through the image characteristics of the acquired dataset.
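The SUV conversion mentioned above can be illustrated with a minimal sketch. It assumes the PET volume has already been calibrated to activity concentration in Bq/mL (so detector calibration and positron fraction are taken as already applied), assumes a tissue density of 1 g/mL, and uses the body-weight SUV definition. The function name `suv_body_weight` and its parameters are hypothetical and are not taken from the pipeline used in this article.

```python
import math

# Physical half-life of 18F in seconds (109.77 min).
F18_HALF_LIFE_S = 109.77 * 60.0


def suv_body_weight(activity_bq_per_ml, injected_dose_bq, elapsed_s,
                    weight_kg, half_life_s=F18_HALF_LIFE_S):
    """Body-weight SUV for one voxel value (illustrative sketch).

    Assumes the voxel value is an activity concentration in Bq/mL and that
    tissue density is 1 g/mL, so that SUV is dimensionless.
    """
    # Decay-correct the injected dose to the time of the scan.
    decayed_dose_bq = injected_dose_bq * math.pow(2.0, -elapsed_s / half_life_s)
    # SUV = concentration / (decayed dose / body mass in grams).
    return activity_bq_per_ml * (weight_kg * 1000.0) / decayed_dose_bq
```

For example, a voxel measured one half-life after injection is normalized by half of the injected dose, which doubles its SUV relative to a computation that ignores decay.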

B. Model Architecture
By fiat, we choose a 3-D residual UNet architecture that combines the well-studied 2-D/3-D UNet [31] with residual (skip) connections [32] and convolutional upsampling (Fig. 1). In our implementation, we take a one-channel 3-D MRI volume as input, and employ 3 × 3 × 3 convolutional kernels followed by 2 × 2 × 2 max-pooling in each layer of the encoder (channel dimensions: [32, 64, 128, 256, 512]), and 3 × 3 × 3 convolutional upsampling kernels in each layer of the decoder (channel dimensions: [256, 128, 64, 32]), ultimately resulting in a one-channel 3-D output. This architecture can be adapted to multichannel inputs (multicontrast MRI) and outputs (multiple PET radiotracers and/or dose levels) by modifying the first and last layers of the network, respectively.
Inference is performed by breaking large whole-body MRI volumes into smaller overlapping volumetric patches with dimensions divisible by 32 (e.g., [128 × 128 × 128] mm, with 50% overlap) prior to applying the 3-D UNet, and taking the sample mean of the resulting outputs at each 3-D grid position to assemble the full whole-body volume. While the aforementioned resampling ensures MRI is processed at nearly native resolution to allow recognition of fine structural details, the PET groundtruth is considerably upsampled, especially in the z dimension. This can be remedied by resampling the predicted volumes to the native PET image space and resolution, e.g., prior to performing PET/MRI reconstruction (Section III).
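The overlapping-patch inference described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: `predict_fn` is a hypothetical stand-in for the trained 3-D UNet, patches are cubic, and the final patch along each axis is placed flush with the volume edge. None of these details are confirmed by the article beyond the patch size, the 50% overlap, and the per-voxel sample mean.

```python
import itertools

import numpy as np


def patch_starts(size, patch, stride):
    """Start indices that cover [0, size), with the last patch flush to the edge."""
    if size <= patch:
        return [0]
    starts = list(range(0, size - patch, stride))
    starts.append(size - patch)
    return starts


def sliding_window_predict(volume, predict_fn, patch=128, overlap=0.5):
    """Apply predict_fn to overlapping cubic patches and average the overlaps."""
    stride = max(1, int(patch * (1.0 - overlap)))
    acc = np.zeros(volume.shape, dtype=np.float64)   # running sum of predictions
    hits = np.zeros(volume.shape, dtype=np.float64)  # number of patches per voxel
    axes = [patch_starts(s, patch, stride) for s in volume.shape]
    for z, y, x in itertools.product(*axes):
        sl = (slice(z, z + patch), slice(y, y + patch), slice(x, x + patch))
        acc[sl] += predict_fn(volume[sl])
        hits[sl] += 1.0
    # Every voxel is covered at least once, so the division is well defined.
    return acc / hits
```

With an identity `predict_fn`, the assembled output equals the input volume, which is a convenient sanity check before substituting a real network.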

C. Learning
One of the primary challenges with domain translation of MRI to PET is maintaining high accuracy across the full dynamic range of PET. Although SUV scaling does provide a more consistent and intuitive numerical range, we find that explicit control in the objective function is required to prevent smoothing over suitable minima. The histogram distribution of a whole-body 18F-FDG-PET exam (Fig. 2) reveals that different tissues differ in the amount of physiologic uptake. For example, in the lungs, heart, and liver there is often increased activity between [1, 4] SUV, and in regions such as the bladder and brain the recorded SUV can be greater than 20. In particular, since we are interested in using the predictions of our model for PET quantification studies, we require high accuracy across all relevant scales. This precludes the use of simple p-norm objective functions, such as the mean absolute error (MAE), that may be dominated by the high absolute or relative error in one or more histogram bins.
To address this, we minimize the balanced objective

$$J_{\mathrm{total}} = J + \lambda J_{\mathrm{LOR}} \tag{1}$$

where $J_{\mathrm{LOR}}$ represents a regularization function with parameter $\lambda$, and $J$ is a linear combination of absolute and relative errors across $B$ different histogram bins, defined in (2), where $E = |F(x) - y|$ is the conventional voxel-wise absolute error, $x$ is the MRI input volume, $y$ is the groundtruth PET volume, and $F(x)$ represents the predicted synthetic sPET. In (2), $h_j$ represents an indicator variable selecting the voxels belonging to bin $j$ of the $B$-bin histogram of $y$, and $\epsilon$ is chosen as 1e-3 to prevent overflow. The histogram bins (Fig. 2) and corresponding weights (α = [1, 1, 1, 1, 0], β = [0, 0, 1, 1, 1]) were chosen based on empirical observation to prevent domination of $J$ by high absolute errors in high-SUV regions or by high relative errors in low-SUV regions. The intention of this flexible formulation with α and β is to define a family of functionals that can be tailored to different patient datasets, PET tracers, and anatomic regions.
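To make the structure of the balanced term concrete, the following NumPy sketch computes a per-bin weighted sum of mean absolute and mean relative errors. It is one plausible reading of the description above, not a reproduction of (2): the per-bin mean normalization and the placement of ϵ in the denominator are assumptions, and the bin edges in the usage note are hypothetical because the actual bins are only given graphically in Fig. 2.

```python
import numpy as np


def balanced_loss(pred, target, edges, alpha, beta, eps=1e-3):
    """Weighted sum over SUV histogram bins of absolute and relative errors.

    edges: B + 1 increasing bin edges on the groundtruth SUV (hypothetical).
    alpha, beta: length-B weights for absolute and relative error terms.
    """
    abs_err = np.abs(pred - target)       # E = |F(x) - y|
    rel_err = abs_err / (target + eps)    # relative error, guarded by eps
    total = 0.0
    for j in range(len(edges) - 1):
        # Indicator h_j: voxels whose groundtruth SUV falls in bin j.
        in_bin = (target >= edges[j]) & (target < edges[j + 1])
        if not in_bin.any():
            continue
        total += alpha[j] * abs_err[in_bin].mean()
        total += beta[j] * rel_err[in_bin].mean()
    return float(total)
```

With α = [1, 1, 1, 1, 0] and β = [0, 0, 1, 1, 1] as in the text, and illustrative edges such as [0, 0.1, 1, 4, 20, inf], the highest-SUV bin contributes only through relative error and the two lowest bins only through absolute error, which matches the stated intent of the weighting.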
To further improve both the perceptual image quality and convergence, during training we integrate and compare the groundtruth PET $y$ and the predicted sPET $\hat{y} = F(x)$ along random angles using a projection operator $R_{\theta,\phi}$, mimicking tomographic data collection in a uniform, isotropic attenuating medium along hypothetical PET lines of response (LOR), as defined in (3). In addition to tying together the performance of different tomographically related voxels, $J_{\mathrm{LOR}}$ measures the error in the coarse scale of predictions on a line-by-line basis. For example, if a 3-D image patch shows little to no activity, $\lVert R_{\theta,\phi}\, y \rVert_2$ will be nearly zero, whereas a patch from a region with high uptake may yield either high- or low-valued $\lVert R_{\theta,\phi}\, y \rVert_2$. This improves convergence and combats overfitting by supervising the spatial distribution of sPET without explicit assumptions of patient anatomy.
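The idea of comparing line integrals at random angles can be sketched in NumPy as below. This is a simplified illustration rather than the loss used in this article: it rotates only about one axis (a single angle θ instead of the pair θ, ϕ), uses nearest-neighbour resampling, ignores attenuation, and takes the root-mean-square projection difference as the penalty. All of these choices, and the names `project` and `lor_loss`, are assumptions.

```python
import numpy as np


def project(volume, theta):
    """Parallel line integrals after an in-plane rotation by theta (radians).

    The volume is resampled (nearest neighbour) about its in-plane centre and
    then summed along axis 1, so each output sample is one line of response.
    """
    nz, ny, nx = volume.shape
    cy, cx = (ny - 1) / 2.0, (nx - 1) / 2.0
    yy, xx = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    c, s = np.cos(theta), np.sin(theta)
    src_y = np.rint(c * (yy - cy) - s * (xx - cx) + cy).astype(int)
    src_x = np.rint(s * (yy - cy) + c * (xx - cx) + cx).astype(int)
    valid = (src_y >= 0) & (src_y < ny) & (src_x >= 0) & (src_x < nx)
    rotated = np.zeros_like(volume)
    rotated[:, valid] = volume[:, src_y[valid], src_x[valid]]
    return rotated.sum(axis=1)


def lor_loss(pred, target, n_angles=8, seed=0):
    """Mean RMS difference between projections at random angles (sketch)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for theta in rng.uniform(0.0, np.pi, size=n_angles):
        diff = project(pred, theta) - project(target, theta)
        total += np.sqrt(np.mean(diff ** 2))
    return float(total / n_angles)
```

Because each projection sums many voxels, a uniform offset that is small per voxel still produces a large projection error, which is the coarse-scale supervision the paragraph above describes.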
For all results shown in this article, we used the Adam optimizer with an initial learning rate of 1e-4, weight decay of 1e-3, and an effective batch size of 16 volumetric patches of [128 × 128 × 128] mm, generated systematically (in a random order) from the aforementioned whole-body 18F-FDG-PET/MRI dataset.
To improve convergence during training, we defined a custom 3-D image patch sampler that performs round-robin sampling of different PET/MRI phenotypes present in the training dataset. These phenotypes were determined by first cataloging all the volumetric patches in the training dataset and computing their intensity histograms. Using k-means clustering (K = 10), we computed a semantic grouping of these histograms that defined the different PET/MRI phenotypes that were sampled cyclically during model training.
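The phenotype-based sampler can be sketched as two small pieces: a clustering of per-patch intensity histograms and a generator that cycles through the clusters. The sketch below uses a minimal Lloyd's k-means written in NumPy for self-containment; the actual clustering implementation, the histogram features, and the function names here are assumptions, with only the use of k-means (K = 10) and cyclic sampling taken from the text.

```python
import numpy as np


def kmeans_labels(features, k, n_iter=20, seed=0):
    """Minimal Lloyd's k-means; returns one cluster label per feature row."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(features), size=k, replace=False)
    centers = features[picks].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        dist = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels


def round_robin_sampler(labels, n_samples, seed=0):
    """Yield patch indices, visiting the phenotype clusters cyclically."""
    rng = np.random.default_rng(seed)
    groups = [np.flatnonzero(labels == j) for j in np.unique(labels)]
    for i in range(n_samples):
        group = groups[i % len(groups)]
        yield int(rng.choice(group))
```

In use, each row of `features` would be the intensity histogram of one cataloged training patch, and consecutive draws from the sampler would come from different phenotypes, so that rare phenotypes are seen as often as common ones.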

D. Image Quality Metrics
We measure the quality of predicted sPET using quantitative error metrics, including the MAE, mean relative absolute error (MRAE), and the 3-D structural similarity index measure (SSIM). For each exam we compute MAE over all $N$ voxels, as follows:

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| F(x)_i - y_i \right| \tag{4}$$

while we compute MRAE only over the $K$ voxels of at least 0.1 SUV, as follows:

$$\mathrm{MRAE} = \frac{1}{K} \sum_{k=1}^{K} \frac{\left| F(x)_k - y_k \right|}{y_k} \tag{5}$$

The 3-D-SSIM captures this information in a different way, accounting for differing scales and magnitudes through a measure of correlation within a 3-D window, as follows:

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)} \tag{6}$$

where $\mu_x$ and $\sigma_x^2$ represent the mean and variance of volume $x$, $\mu_y$ and $\sigma_y^2$ represent the mean and variance of volume $y$, $\sigma_{xy}$ represents the covariance of $x$ and $y$, and $c_1$ and $c_2$ are chosen proportional to the dynamic range of pixel values [33].
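The three metrics can be illustrated with a short NumPy sketch. Two assumptions are made: the 0.1-SUV threshold for MRAE is applied to the groundtruth volume, and the SSIM expression is evaluated over a single window (the full measure averages it over sliding 3-D windows). The constants k1 = 0.01 and k2 = 0.03 follow a common convention and are not stated in this article.

```python
import numpy as np


def mae(pred, target):
    """Mean absolute error over all voxels."""
    return float(np.mean(np.abs(pred - target)))


def mrae(pred, target, min_suv=0.1):
    """Mean relative absolute error over voxels with target >= min_suv."""
    keep = target >= min_suv
    return float(np.mean(np.abs(pred[keep] - target[keep]) / target[keep]))


def ssim_window(x, y, dynamic_range, k1=0.01, k2=0.03):
    """SSIM expression evaluated over one window (here, the arrays given)."""
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2.0 * mu_x * mu_y + c1) * (2.0 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)
```

The threshold in `mrae` keeps near-zero background voxels from dominating the relative error, which is the reason given above for restricting that metric to voxels of at least 0.1 SUV.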

E. Results on Whole-Body 18F-FDG PET-MR Datasets
We find that prediction of synthetic FDG-PET, domain translated from T1-weighted postcontrast MRI, works well despite the lack of salient tracer-specific or functional information in MRI (Fig. 3). Numerical results comparing the effect of different training objectives on test-set performance are shown in Table I. Qualitative analysis reveals that physiologic uptake is predicted accurately and reconstructed seamlessly throughout the body without obvious spatial artifacts, except in regions where we expect variable uptake (e.g., heart and bladder).
In the myocardium, for example, FDG-PET uptake depends on patient metabolism, which can vary across exams for even a single patient. Similarly, in the bladder PET uptake is often dependent on a patient's water consumption and timing of voiding [34].
The MAE and MRAE results show that incorporation of both balanced histogram losses and tomographic projection-based losses can significantly reduce the quantitative error in the prediction of sPET from MRI. The SSIM results show that this reduction in error boosts the image quality of the sPET image relative to the real PET image. The inclusion of SSIM is important to assess the realness of sPET, in lieu of reporting MAE and MRAE within different organs and anatomical structures.

III. PET QUANTIFICATION USING SYNTHETIC PET
PET/MRI quantification is important for establishing the accuracy and reproducibility of PET reconstructions when the photon attenuation maps are inferred entirely from MRI. As the error in PET/MRI reconstruction is composed of errors involving prediction of the attenuation map and errors involving the reconstruction (e.g., choice of the objective function), a standard approach is to measure the compound effect caused by the AC map by directly comparing PET volumes reconstructed with MRAC and CTAC voxel-wise and regionally [16], [35].
Specifically, we evaluate the applicability of our MR-derived sPET imagery for algorithm development by replicating an MRAC versus CTAC PET SUV quantification task using sPET data in place of real list-mode PET data. To achieve this we forward project sPET data into sinogram space using vendor-provided software that incorporates scanner geometry, detector response, and normalization.

A. Reconstruction Model and Parameters
For time-of-flight PET (ToF-PET), the measured sinogram data is modeled within the forward model as follows [36]:

$$y_{pt} = \sum_{m} A_{ptm}\, x_m + b_{pt} \tag{7}$$

where $y_{pt}$ represents the ToF projection data measured by the scanner, $x$ is the PET image to be found, and the system matrix $A$ models the probability of an event emitted in voxel $m$ to be detected by detector pair $p$ within the signed timing bin number $t$, summarizing the attenuation of the media along PET LOR, patient-scanner geometry, and detector efficiencies. $b_{pt}$ corresponds to the background counts of the timing bin $t$ and detector pair $p$.
For this model, a basic reconstruction approach is to solve the optimization problem in (8), where R is a regularization function (e.g., total variation). In practice, vendor-provided ordered-subset expectation-maximization (OSEM) or ToF-OSEM with point-spread-function (PSF) modeling is used for clinical imaging [36], [37]. In our experiments, we utilize clinical image reconstruction parameters for the GE Signa PET/MRI (Table II).
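As a toy illustration of the forward model and of expectation-maximization reconstruction, the sketch below uses a small dense system matrix whose rows index flattened (detector pair, timing bin) combinations. It implements unregularized MLEM, which is the single-subset special case of OSEM, and it is not the vendor reconstruction used in this article. The matrix, sizes, and function names are hypothetical.

```python
import numpy as np


def forward(A, x, b):
    """Expected sinogram for image x: A @ x + b, rows = (pair, timing bin)."""
    return A @ x + b


def mlem_step(A, x, y, b):
    """One MLEM update: x * A^T(y / (A x + b)) / (A^T 1)."""
    sensitivity = A.sum(axis=0)          # A^T 1
    ratio = y / forward(A, x, b)         # measured over expected counts
    return x / sensitivity * (A.T @ ratio)


def mlem(A, y, b, n_iter=50):
    """Run MLEM from a uniform positive initial image."""
    x = np.ones(A.shape[1])
    for _ in range(n_iter):
        x = mlem_step(A, x, y, b)
    return x


def poisson_loglik(A, x, y, b):
    """Poisson log-likelihood up to a constant that does not depend on x."""
    expected = forward(A, x, b)
    return float(np.sum(y * np.log(expected) - expected))
```

The multiplicative update keeps the image non-negative, and an image that reproduces the data exactly is a fixed point of `mlem_step`; this explains why a synthetic sinogram produced by the forward model can be passed back through the same reconstruction as if it were measured data.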

B. Synthetic Sinogram Generation and Lesion Insertion
For a given system matrix A, a reconstructed PET image x can be projected into the sinogram domain by applying the forward model (7) to yield y simulated. The forward projection tool provided with the Duetto toolbox (v02.03, GE Healthcare) performs this operation on a synthetic volume of dimension equal to the reconstructed volume, to generate a synthetic lesion sinogram that is added to the sinogram corresponding to x. Image reconstruction can then be performed on this "lesion-inserted" sinogram, as if it were the real sinogram, using a variety of methods (e.g., ToF, PSF, and regularization).
We exploit this mechanism to generate sPET sinogram data from domain-translated sPET imagery. However, as Duetto currently does not incorporate scatter simulation, we perform reconstructions with scatter estimation and correction turned off. As this introduces an additional discrepancy between real PET and sPET reconstructions, in both cases we start by forward projecting a 3-D "source" volume x source to yield a simulated ToF-sinogram that is subsequently inserted with synthetic lesions (Fig. 4).

C. Quantification Experiment Summary
The pelvic CTAC versus MRAC FDG-PET reconstruction and SUV quantification experiment can be summarized as follows.

1. Forward project x source using a registered CT-based attenuation map to yield sinogram y simulated. To evaluate the applicability of different sPET sources for this pipeline, we choose x source as follows.

a. Real PET x real: The true patient activity distribution, corresponding to measured patient sinogram y real.

b. Reconstructed Patient Phantom x live: A PET/CT image volume, reconstructed from measured PET sinogram data with a CT-based attenuation map.

c. Uniform SUV∼1 x uniform: We threshold a T1-weighted postcontrast MRI volume to define a body-mask filled with activity corresponding to SUV 1.

d. Synthetic sPET x synthetic: An sPET volume generated from a T1-weighted postcontrast MRI using the aforementioned 3-D UNet.

2. Forward project synthetic lesions specified by a 3-D volume x lesion to yield y lesion-simulated. In our experiments, a board-certified radiologist annotated four sites for lesion insertion in each pelvic MR exam: a) in the acetabulum; b) sacrum; c) rectum; and d) lymph nodes. These locations were specifically identified to challenge the ability of MR-based reconstruction to reproduce activity surrounded by soft tissue and bone. For each location, a spherical lesion with diameter 12 mm and activity corresponding to SUV 8 was added to a zero-filled x lesion volume.

3. Reconstruct lesion-inserted sinograms using vendor-provided CTAC and MRAC methods (with parameters specified in Section III-A), resulting in PET images x CT and x MR, respectively, for each x source.

4. Evaluate voxel-wise and regional absolute and relative error between x CT and x MR in each lesion volume of interest (VOI) for each x source for each exam. Evaluation within each VOI can also provide a quantitative measure of accuracy, since the activity was synthetically inserted.
In particular, we evaluate the ability of different synthetic sinograms (corresponding to a choice of x source) to reproduce the CTAC versus MRAC "quantification error" Δ quant, normally estimated using real measured PET sinogram data. We quantify this by computing and comparing the deviation of error in mean-, max-, and peak-SUV between x CT and x MR for each VOI, as defined in (9)-(11), where V represents an indicator function for voxels in a VOI, quant represents the mean-, max-, or peak-SUV computation in a VOI, we take Δ true as the corresponding mean-, max-, or peak-SUV quantification error computed using the reconstructed patient phantom x live as the source, and δ and γ represent absolute and relative quantification error, respectively. To benchmark systematic error arising from the reconstruction and reprojection in the experimental procedure, we also compare to the quantification error arising from using measured sinogram data y real corresponding to the true patient distribution x real (i.e., following the standard approach in [16]).
For each patient exam, we select five different VOIs: lesion voxels corresponding to the four annotated regions (acetabulum, sacrum, rectum, and lymph), and "background," representing all nonzero voxels outside of the synthetic lesions. Quantification error is computed for each VOI by comparing mean-, max-, and peak-SUV between CTAC-based and MRAC-based reconstructions. Subsequently, we compare the quantification error predicted by each PET data source to that predicted by the aforementioned reconstructed PET/CTAC live phantom. The absolute error is quantified for the background voxels, but the relative error is not, since many voxels are devoid of any activity, positively skewing (overestimating) the mean relative error computation. In lieu of individual regions within the pelvis, the relative error in background voxels is better evaluated qualitatively by comparing slices in the transverse plane (Fig. 5).
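The lesion definition and the VOI-based comparison can be sketched together in NumPy. The spherical mask follows the 12-mm diameter and SUV 8 stated in the experiment summary; the voxel size is a free parameter here. The forms used for the quantification error and its absolute and relative deviation are assumptions made for illustration, since they are one plausible reading of the quantities Δ quant, Δ true, δ, and γ described above; peak-SUV is omitted for brevity.

```python
import numpy as np


def sphere_mask(shape, center, diameter_mm, voxel_mm=1.0):
    """Boolean mask of a sphere centred at `center` (voxel indices)."""
    grid = np.indices(shape).astype(float)
    dist2 = sum(((grid[a] - center[a]) * voxel_mm) ** 2 for a in range(3))
    return dist2 <= (diameter_mm / 2.0) ** 2


def quant_error(x_ct, x_mr, voi, stat=np.mean):
    """CTAC-versus-MRAC difference of a VOI statistic (mean or max SUV)."""
    return float(stat(x_ct[voi]) - stat(x_mr[voi]))


def deviation(delta_quant, delta_true):
    """Absolute and relative deviation from the benchmark error (assumed form)."""
    abs_dev = abs(delta_quant - delta_true)
    rel_dev = abs_dev / abs(delta_true)
    return abs_dev, rel_dev


# Zero-filled lesion volume with one 12-mm sphere of SUV 8, as in step 2.
voi = sphere_mask((21, 21, 21), (10, 10, 10), diameter_mm=12.0)
x_lesion = np.zeros((21, 21, 21))
x_lesion[voi] = 8.0
```

In the experiment, `x_ct` and `x_mr` would be the CTAC and MRAC reconstructions of the same lesion-inserted sinogram, and `delta_true` would come from the reconstructed patient phantom source.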

D. Results on Pelvic 18F-FDG PET/MR/CT Datasets
Numerical results presented in Tables III and IV indicate that domain-translated MR-based sPET can achieve low absolute and relative deviation in quantification error compared to the quantification error predicted by the live PET/CTAC phantom source for synthetically inserted pelvic lesions. Table III shows that sPET-based evaluation to compare CTAC- and MRAC-based reconstruction achieves SUV errors that are very similar to the measured PET-based evaluation across inserted lesions and in the background. The percent quantification errors in Table IV show that sPET-based evaluation was more similar to measured PET-based evaluation than uniform SUV∼1-based evaluation, outperforming it for mean-SUV evaluations across lesion types. This suggests the applicability of synthetic sPET as a suitable replacement for real measured PET in PET-SUV quantification tasks. In the supplementary material (Figs. 6 and 7), we provide the Bland-Altman plots that compare the CTAC-versus-MRAC error computed by the various types of phantoms and the live phantom. Each column represents a different sPET phantom. Each row represents a different error metric (absolute error or relative error in mean-SUV, peak-SUV, or max-SUV). This analysis shows no significant differences between sPET and measured PET using the aforementioned figures of merit.

IV. DISCUSSION
Overall, we have shown that MR-derived synthetic FDG-PET accurately captures the background physiologic distribution of PET imagery, creating images with realistic spatial distributions, and that it can be combined with synthetic lesion insertion to provide data for the evaluation of PET quantification methods. However, the main limitation we observed is that it is smoother than corresponding full-dose imagery, perhaps due to the implicit regularization properties of convolutional networks (e.g., exploited by DIP techniques [22]). While this is desirable for enhancing low-dose or noisy PET/MRI, it is not entirely beneficial for our application due to mismatches in the intensity distribution used in the quantification experiment.
This is a good opportunity for future works that make use of GANs, which may seek to better match the statistical distribution of sPET and PET to increase its realism, rather than simply regressing by value. Note that a pure GAN approach based on noise vectors is not valid here because it may not provide anatomic conformity between the MRI and sPET image, which is important to maintain for PET/MRI reconstruction algorithm research. Instead, adversarial losses may be added to the existing approach to increase realism and to help reduce artifacts in the regions where variable uptake is expected, or where patch-based inference lacks sufficient context to prevent gridding or stitching artifacts (although the effect of these artifacts is often reduced after forward projection to sinogram space). In this respect, the physics-based tomographic LOR loss utilized in this article not only works to increase the realism of sPET but also improves its quantitative accuracy.
We believe that such physics-based approaches are crucial for the development of quantitative imaging and dataset generation techniques based on neural networks. While the tomographic LOR loss used in this work improves the quantitative error rates and qualitative realism (partially captured by SSIM) associated with sPET, advanced physics-based modeling could further improve both the realism and the applicability of the developed approach to more PET/MRI systems, e.g., by utilizing their system matrix to optimize directly in the sinogram domain, or to measure congruence after image reconstruction.
Results from the downstream PET SUV quantification experiment indicate that sPET can serve as an adequate surrogate for real data in MRAC versus CTAC quantification experiments. This experiment also indicates that the PET background distribution does not significantly impact quantification performance when using synthetically inserted lesions and without any scatter and randoms simulation. Thus, further investigation of a more complete reconstruction is required to determine whether the PET background distribution affects quantification for real lesions. Based on the realistic appearance of sPET, we believe it will be an important tool in evaluations when accurate background distribution is required.
Although in some cases the estimate based on sPET underestimates the benchmark (reconstructed patient phantom) error, the strong agreement over a number of exams (N = 20) and lesions indicates that sPET may be used as a qualification method when a large number of exams is required. This is precisely the domain for which sPET was designed, as large MRI/CT databases can be retrospectively utilized to establish a large sPET/MRAC/CTAC dataset for scanner and algorithm qualification. Interestingly, in many cases the quantification error predicted by the synthetic sPET phantom more closely matches the quantification error predicted using real PET sinogram data than even that of the live PET/CT phantom. However, for most VOIs and phantom types, the deviation in quantification error is minimal. Here, synthetic sPET has an advantage over the uniform SUV∼1 phantom because it represents a realistic, anthropomorphic PET uptake pattern.
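The comparison described above can be illustrated with a short sketch. It assumes that the quantification error is the percent difference in mean SUV between the MRAC and CTAC reconstructions within a VOI, and that agreement between data sources is measured as the absolute difference between their errors. The function names and these exact definitions are assumptions for illustration, not the evaluation code used in this article.

```python
import numpy as np

def mean_suv(recon, voi_mask):
    """Mean SUV of a reconstructed volume inside a boolean VOI mask."""
    return float(recon[voi_mask].mean())

def ac_quantification_error(recon_mrac, recon_ctac, voi_mask):
    """Percent error in mean SUV of the MRAC reconstruction relative to CTAC."""
    reference = mean_suv(recon_ctac, voi_mask)
    return 100.0 * (mean_suv(recon_mrac, voi_mask) - reference) / reference

def surrogate_deviation(error_synthetic, error_real):
    """Absolute difference, in percentage points, between the error estimated
    from a synthetic source (e.g., sPET) and that estimated from real PET."""
    return abs(error_synthetic - error_real)
```

Under these assumptions, a source is a good surrogate for a given VOI when surrogate_deviation is small, regardless of how large the MRAC versus CTAC error itself is.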
One limitation of our methodology is that it does not directly model the noise or count statistics associated with PET data collection, which have been shown to impact the performance of PET reconstruction algorithms [38], [39], [40]. To address this, we note that the MR-based sPET images proposed in this article can be treated simply as an ideal source volume and, thus, readily combined with Monte Carlo PET simulators, such as GATE [25], SIMSPET [41], or SimPET [3]. An alternative data-driven approach to address this issue may be to utilize adversarial training, which can increase the realism of sPET, thereby indirectly capturing the statistical noise properties of PET acquisition in the image domain.
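As a much simpler stand-in for a full Monte Carlo simulation, count statistics can be approximated by Poisson sampling of a noise-free sinogram. The sketch below assumes that the forward-projected sPET sinogram is available as a nonnegative array and that a target total count level is chosen by the user. The function name and parameters are hypothetical, and scatter, randoms, attenuation, and detector effects are ignored.

```python
import numpy as np

def add_poisson_noise(sinogram, total_counts, seed=0):
    """Scale a noise-free sinogram to a target count level and draw Poisson counts.

    Assumptions: `sinogram` holds nonnegative expected values up to a scale
    factor, and `total_counts` is the desired expected number of events.
    """
    rng = np.random.default_rng(seed)
    expected = np.clip(np.asarray(sinogram, dtype=np.float64), 0.0, None)
    total = expected.sum()
    if total <= 0:
        raise ValueError("sinogram must contain positive values")
    expected *= total_counts / total          # expected counts per bin
    return rng.poisson(expected).astype(np.float64)
```

Lowering total_counts emulates a lower-dose or shorter acquisition, which offers one simple way to probe how a reconstruction algorithm responds to noise when starting from sPET.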

V. FUTURE WORK
Detailing the generation of sPET from 3-D MRI and, importantly, demonstrating its utility in downstream qualification research opens new research directions for studying PET image reconstruction algorithms that address important clinical questions. For example, virtual PET clinics have been previously proposed as a technique to evaluate PET detector systems and patient studies in a virtual simulation environment [2]. This could also be extended to address 4-D PET/CT and PET/MRI modalities [42], enabling new approaches to diagnose cancers, such as the identification of recurrent gliomas using FET PET [43], [44]. In another vein, sPET can also be used to directly improve image reconstruction algorithms themselves, e.g., by generation of a deep learning prior image that can help regularize PET image reconstruction [45]. These applications provide a strong motivation for future work in curating large databases of PET/MRI with multiple MRI contrasts and PET radiotracer images, which could mirror and complement the impact of other synthetic MRI [46], [47]. In this respect, the methods developed in this article provide the framework and context necessary for such development.

VI. CONCLUSION
In conclusion, we have demonstrated a deep learning method to generate realistic, synthetic whole-body PET data from MRI, and have shown that these data are a suitable substitute for real PET data in a reconstruction evaluation task. The sPET data, which mimic physiologic tracer distribution, can be combined with synthetic lesion insertion to mimic abnormal regions of high uptake. We demonstrated performance equivalent to that of real PET data for comparing CTAC and MRAC for PET reconstruction, and believe that this result, combined with the apparent realism of the synthetic images, will make this method broadly applicable for evaluating the robustness of PET/MRI reconstructions and component techniques, including attenuation correction, scatter correction, and MR-guided reconstruction algorithms, using large and diverse patient datasets.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.
3-D residual UNet architecture for generating sPET from MRI, requiring only paired (registered) PET/MRI data without annotation.
Histogram distribution of a whole-body PET exam reveals disparate levels of physiologic activity across different anatomy.
(a) and (b) Test-set evaluation of whole-body MR-based synthetic FDG-PET (sPET) in comparison to real 18F-FDG-PET/MRI. sPET mimics the typical physiologic uptake of FDG, showing high uptake in the brain and bladder as well as moderate uptake in liver, kidneys, heart, and muscle. High relative error with the real PET data is expected in many regions where there is typically high physiologic variability between subjects (e.g., tumors, heart, and bladder). While (a) and (b) represent patient exams from the intentionally withheld test set, (c) represents an exam from the additional validation set (with corresponding Pelvic PET/CT) exhibiting significant stitching artifacts (blue arrows) in the T1w-MRI between bed positions as well as loss in resolution in the head (green arrows). Various transverse slices in the abdomen are shown for comparison on the right of (c). Evaluation and inclusion of this exam in the validation cohort demonstrates that the proposed 3-D UNet is able to recover reasonable FDG-uptake even in the presence of significant domain shift, a common issue when applying deep learning algorithms to clinical data acquired on a different scanner, or with different imaging protocols and image quality checks.

Fig. 4.
Measured and simulated sinograms representing different PET sources with corresponding synthetically inserted lesion sinograms. The annotation (yellow arrow) highlights a region affected by lesion insertion.
Example evaluation of synthetically inserted lesions into 3-D reconstructions using various PET data sources (anterior is superior in our presentations). For each PET data source (columns), we compute a reconstruction using CTAC and MRAC, and compute the absolute and relative errors for each slice. Shown here is a single slice from a single patient with contributions from three synthetically inserted lesions. The error obtained with sPET is considerably lower than that obtained with the SUV∼1 phantom and has a distribution similar to that obtained with real PET data.