Forecasting the Pharmacokinetics With Limited Early Frames in Dynamic Brain PET Imaging Using Neural Ordinary Differential Equation

In dynamic brain positron emission tomography (PET) studies, acquiring a time series of images, typically lasting more than an hour, is necessary to derive pharmacokinetic parameters. Analytically, these parameters are estimated by establishing kinetic models such as compartment models that consist of sets of ordinary differential equations (ODEs), and by fitting the sparse time-activity curve (TAC) of the tracer. Yet, these models are simplified approximations of highly complex underlying processes, and sufficient samples of TAC are required throughout the entire acquisition, which is not only impractical but also hindered by patient involuntary motion and intrinsic noise. Therefore, recovering samples in missing timeframes are often required, which, in practice, is achieved by interpolation or extrapolation. Here, we introduce a novel deep-learning-based method that utilizes neural ODE (N-ODE) to predict TAC in the extended timeframes by mimicking the analytical method in a data-driven manner. By training N-ODE to solve and fit sets of ODE such that the solution replicates the observed TAC, the N-ODE converges to the functional shapes that best describe the underlying pharmacokinetic processes. We customized N-ODE to predict the full-dynamic images (12 frames, 60 min), hence pharmacokinetic parameters, given limited early-frame images (7–9 frames, 20–30 min). For proof of concept, the proposed N-ODE was applied to simulated and clinical $^{\mathrm{ 18}}\text{F}$ -PI-2620 brain PET. We demonstrated that the proposed N-ODE delivered promising performance, indicated by bias, variance, and mean absolute error as well as pharmacokinetic parameters, such as rate constants, standardized uptake value ratio (SUVr), and binding potential ( $\mathrm {BP}_{\mathrm {ND}}$ ).


I. INTRODUCTION
P OSITRON emission tomography (PET) is a molecular imaging technology that uses a radiotracer to visualize the metabolic and physiological processes in the body in vivo. For brain studies, the recent advances in PET tracers made it possible to assess several histopathological targets of neurodegenerative disorders, such as glucose metabolism, amyloid, tau, and neuroinflammation [1], [2], [3], [4], [5], [6]. The common application of PET in the clinics yields a simple snapshot of regional tracer concentration using a single timeframe post-injection, which is referred to as static PET. On the other hand, dynamic PET provides comprehensive information about the pharmacokinetics of the tracer, such as tracer delivery (K 1 ), volume of distribution (V T ), or binding potential (BP ND ) by delineating both the spatial and temporal profiles of the tracer uptake [7], [8], [9]. Analytically, such pharmacokinetic parameters are estimated through kinetic models that translate hypotheses regarding the physiological system of dynamic processes in the tissue of interest, which is decomposed into several compartments, in a mathematical formulation [10], [11], [12], [13]. These models inevitably accompany a set of ordinary differential equations (ODEs), and the pharmacokinetic parameters are determined by solving the ODEs and fitting the time-activity curve (TAC) of each voxel.
Yet, analytical models are simplified approximations of highly complex underlying physiological and biological This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ processes and are not perfectly consistent with the measured data due to intrinsic PET noise, especially at the voxel level [14]. Additionally, sufficient numbers of samples across the entire protocol are required to perform the modeling, which is hindered by patient movement [15]. As such, in routine practice, the reduced acquisition time PET has been proposed, where kinetic modeling is performed with missing TAC samples either being ignored or being recovered by various interpolation or extrapolation methods [16], [17], [18], [19], [20], [21]. Scott et al. [16] combined arterial spinning labeling (ASL)-derived cerebral blood flow (CBF) data with 18 F-florbetapir amyloid PET kinetic modeling and reduced the acquisition time from 60 to 30 min. In their work, the missing samples in reference region TAC were extrapolated. Heeman et al. [17] and Kolinger et al. [18] proposed the dualtime window protocols for 18 F-flutemetamol, 18 F-florbetaben, and 18 F-MK-6240 which consist of two separate PET scans with a resting period in between. The omitted data points in the resting phase were interpolated either linearly or using compartment models. Viswanath et al. [19] shortened 65-min dynamic FDG protocol in long axial field-of-view (LAFOV) PET scans to 15-20-min dual-time window protocol where 10-15 min early frame and late 5 min were used. Wu et al. [20] demonstrated that a much-reduced scanning time can achieve the net influx rate, Ki, imaging using the nonlinear estimation method. Liu et al. [21] validated that an acquisition time reduction to 45 min was possible in a single-time window protocol to draw kinetic parameters of 18 F-FDG as done by full dynamic protocol. In works from Varsha, Wu, and Liu, neither interpolation nor extrapolation was performed to construct absent samples in TAC.
Here, we introduce a novel deep-learning method based on neural ODE (N-ODE) [22], which recovers images in the missing timeframe by mimicking the analytical modeling in a data-driven manner. That is, by training N-ODE to solve sets of ODE such that the solution replicates the observed TAC, the N-ODE parameters converge to the functional shapes that best describe the underlying pharmacokinetic processes. To the best of our knowledge, N-ODE has been explored in various domains to leverage the existing well-established physical or chemical models that are explained by the ODE system in a machine-learning manner [23], [24], [25], yet, not in PET studies. In this article, we built an N-ODE-based model that learns the spatiotemporal features of the dynamic PET images and the underlying pharmacokinetic processes by embedding convolutional long short-term memory (LSTM), namely, seq2seq [26], [27] and variational autoencoder (VAE) [28] architect into N-ODE network. We customized the model to predict the total frames (12 frames, 60 min), hence the pharmacokinetic parameters, given a few early frames as input (7-9 frames, 20-30 min). For proof of concept, we applied this model to the simulated as well as the clinical dynamic 18 F-PI-2620 brain PET. With limited early frames alone, we demonstrated that the proposed method is capable of recovering the images across full dynamic protocol, indicated by bias, variance, and mean absolute error (MAE), as well as the pharmacokinetic parameters, such as rate constants (K 1 , k 2 , k 3 , k 4 ), BP ND , and the standardized uptake value ratio (SUVr).
The proposed method has the potential to solve the challenges largely in dynamic PET research, such as reducing the protocol acquisition studies or rapid dual-tracer studies [29], [30].

II. MATERIALS AND METHODS
The mathematical concept of N-ODE in relation to the compartment model is explained in Section II-A, and the study flow is revisited in Section II-B. The simulation and the clinical data for the preparation of this article are described in Sections II-C and II-D. The indices to assess the model performance are summarized in Section II-E.

A. Compartment Model and N-ODE
For simplicity, we consider N-ODE with regard to a simple compartment model. For each compartment, the change of tracer concentration is described in terms of a linear, first-order, constant-coefficient ODE as where C i (t) is the concentration in compartment i in time t, and K 1 , k 2 , k 3 , and k 4 are rate constants. For instance, the two-tissue compartment model using four rate constants (2T4CM) is written as The typical interpretation is that C 0 (t) is the tracer concentration of the input function derived from arterial plasma. C 1 (t) represents free and nonspecifically bound tracer concentration, and C 2 (t) is specifically bound tracer concentration in tissue.
Analytic solutions for 2T4CM are derived [31] by using Laplace transformation, as and the total concentration of tracer in the region of interest (ROI) can be expressed as where N-ODE is a deep-learning method different from the conventional compartment models, as it uses trainable parameters to fit the data. That is, the key idea of N-ODE is built on parameterizing a continuous dynamical system that is in the format of ODE with an initial value, using a neural network (NN) Here, f (·) is parameterized by , which are the weights of the NN, instead of the rate constants described in (1)- (3). h(t) can be defined as the function governing state evolution over time t, such as C T (t) expressed in (6). The output state h(t i ) in time t i can be mapped using (7) and the ODE solver. N-ODE is trained by means of the loss function, evaluating the distance between the output h(t i ) and the desired measurement where ODEsolve is given by the established ODE solvers, such as Runge-Kutta methods. For backward propagation, the adjoint sensitivity method is applied to the ODE solver to compute the gradients of loss with respect to parameters for memory efficiency [22].

B. Study Design
Fig . 1 illustrates the study design. In this work, we set the timeframe for both simulation and the clinical brain PET data as [5 × 120(s); 4 × 300 (s); 3 × 600 (s)]. Our baseline was to predict the full dynamic 4-D images (12 frames, 60 min), given the initial input (9 frames, 30 min) using the proposed N-ODE model. We then analyzed the model with fewer early input frames (8 frames, 25 min; 7 frames, 20 min). Fig. 2(a) outlines the proposed N-ODE-based method. The multilayered LSTM [26], namely, seq2seq architect [27], was built as a recognition network for VAE to map the earlyframe images, x t 0 , x t 1 , . . . , x t ef , into a local initial state, z t 0 , through the hidden state, h t 0 , h t 1 , . . . , h t ef . Seq2seq consumed the early-frame images sequentially backwards in time, and the VAE encoder outputted q ø (z t 0 |x t 0 , . . . , x t ef ). A global latent dynamics are shared across all time frames, and N-ODE produces a set of latent features for each frame, z t 1 , z t 2 , . . . , z t tf , by solving ODE using the Runge-Kutta of order 5 method, given the initial value of z t 0 The latent feature derived by the ODE solver is then translated back to image space by the VAE generator, The classic VAE loss is expressed as where the probabilistic generator can be expressed as p θ (x|z), given the data sample, x, projected in the latent space, z. The posterior distribution p θ (z|x) is obtained by using the prior distribution p(z) and the probabilistic generator p θ (x|z), such that p θ (z|x) ∼ p(z)p θ (x|z). The encoder learns an approximation q ∅ (z|x) to the posterior distribution p θ (z|x), where ∅ denotes the parameters of the encoder, and θ stands for those of the generator. The first term in (10) defines the reconstruction loss, while the second term acts as a regularization term, in the form of the Kullback-Leibler (KL) divergence between the latent distribution learned and the prior distribution. In practice, the generator input is resampled by the encoded latent features z where ε represents a random noise sample.
In this work, instead of q ∅ (z|x), the model seeks The model was trained end-to-end and negative Gaussian log-likelihood loss was used as the reconstruction loss term together with KL divergence as described in (10).
The recognition seq2seq network was built with 2 convolution layers with 32 channels of hidden states. We parameterized the dynamics function f (·) with six convolutional layers for each encoder and generator. The original image size was (91, 109, 91, 12), each of which accounts for the dimension across x, y, z, and the time, t, respectively. However, for computational efficiency, the patches are extracted randomly that are half of the original dimensions in the x-, y-, and z-axis for training as depicted in Fig. 2(b). For the test set, the regular grid patch was obtained, and the outputs of each patch are returned to the original image size, averaging the values in overlapping areas. All images are normalized to maximum value before feeding into the model. We trained on NVIDIA GeForce RTX 2080 Ti graphic card.

C. Simulated Dynamic Brain PET
The input function was first simulated, and the dynamic brain PET was created by using 2T4CM described in (4)-(6). Feng's model was used to generate the input function [32] where A 1 , A 2 , and A 3 are amplitude coefficients, and λ 1 , λ 2 , and λ 3 are eigenvalues of the system. For simulation, we did not consider the metabolite correction [8]. The dynamic brain PET image was generated using Hammers atlas to draw ROIs [33]. For each ROI, the concentration of the tracer, C S (t), is simulated as  where V B is the vascular volume fraction inside the ROIs, which is fixed as 0.05 for this work. The left and right lobes of each ROI are fixed to have the same concentration. The parameters required for the input function and the rate constants for 2TCM were randomly sampled from the uniform distribution with the predefined minimum and maximum values as shown in Table I.
The noise was assumed to be proportional to the number of counts per time frame. The simulated 4-D brain images were reconstructed using an in-house ordered-subsets expectationmaximization (OSEM) algorithm [30]. 600 and 100 samples were created for training and test set, respectively. 18 F-PI-2620 PET scans in combination with computed tomography (CT) was performed in a full dynamic setting (0-60 min post-injection). The injected dose was 189 66 ±13. 19 MBq and applied as a bolus injection. All data were acquired in Munich on a Siemens Biograph True point 64 PET/CT (Siemens, Erlangen, Germany) or a Siemens mCT (Siemens, Erlangen, Germany). The dynamic brain PET data were acquired in 3-D list mode over 60 min, and reconstructed into a 336 × 336 × 109 matrix (voxel size: 1.02×1.02×2.03 mm 3 ) using the built-in OSEM algorithm with 4 iterations, 21 subsets, and a 5-mm Gaussian on the Siemens Biograph and with 5 iterations and 24 subsets on the Siemens mCT. A low-dose CT served for attenuation correction. All images were spatially normalized to the Montreal Neurological Institute (MNI) space using an in-house tracerspecific PET image template (PMOD V4.1). Fifty eight sets were collected in total, where 56 sets and 17 sets are used as training and test, respectively. Amongst 56 training dataset, 22 were from mCT and 34 were from Biograph. Out of 17 test data set, 6 were from mCT and 11 were from Biograph.

E. Evaluation of the Performance
We assessed the performance of the proposed method by calculating the following properties for each ROI and timeframe: where N is the total number of voxels in one ROI, x i and x i are the ground truth and the prediction of the ith voxel, respectively, and x N represents the average predicted value of voxels in the ROI. While MAE describes the average magnitude of error produced by the proposed model, bias characterizes how far the model's predictions are off from the ground truth. As bias is normalized to the prediction itself, it represents the error taking the magnitude of TAC into consideration. In addition, the pharmacokinetic parameters from both the ground truth and the predicted images are estimated and compared. For simulation datasets, the rate constants K 1 , k 2 , k 3 and k 4 were evaluated for each ROIs. On the other hand, voxelwise indirect parameters, such as SUVr and binding potential (BP ND ), are assessed for 18 F-PI-2620 datasets, as an arterial input function was not available. The average image of late 30 min-3 frames was normalized by the mean uptake in cerebellum gray matter to derive SUVr [34]. BP ND images were estimated using a noninvasive Logan graphical plot (t * = 20 min, k 2 = 0.22 min −1 ) and cerebellar gray matter as the reference region [34]. Fig. 3(a) illustrates the average bias, variance, and MAE of the outputs resulting from the proposed N-ODE-based model, in each of the 12 frames. Lower bias and variance were observed in the later frames than in the early frames regardless of the number of early input frames. MAE was higher in the middle frames when the model was given more input frames (9 frames, 30 min). However, with fewer input frames (7 frames, 20 min; 8 frames 25 min), MAE was similar across all frames except for the first 1 or 2 frames. The average bias, variance, and MAE indicated the better performance with fewer input frames (7 frames, 20 min) than with more input frames (9 frames, 30 min). Yet, the profiles of bias, variance, and MAE were almost identical when the model was given with 7 input frames (20 min) and 8 input frames (25 min). Fig. 3(b) depicts the linear fitting result between the rate constants estimated from the ground truth and the predicted dynamic image. The model performed worse in predicting k 4 , compared to predicting K 1 , k 2 , and k 3 . In general, eight input frames resulted in the best prediction of the rate constants, except for k 2 where seven input frames achieved best. Fig. 4 illustrates the examples of the test data sets that showed the best and the worst MAE. For all three settings with different numbers of input frames, the same data sets were identified as the best and worst results. Fig. 4 depicts the examples when seven frames (20 min) were given as an input. The worst prediction that was made in this case presented high bias near edges, however, preserved the overall contrast between the ROIs across all timeframes. For both best and worst examples, the horizontal and vertical strikes were observed, which resulted from patch-based learning.  Table II shows the average bias, variance, and MAE resulting from the proposed model on the 18 F-PI-2620 data for each of the 12 frames. The average bias, variance, and MAE improved with more input frames, although the differences were small. Fig. 6 is the heat map representing the average bias, variance, and MAE in ROIs drawn from the Hammers template across all 12 frames. On average, the bias, variance, and MAE decreased with more input frames, yet, their general aspects were unchanged. For instance, the central structures, such as caudate nucleus (CaudateNucl) and corpus callosum (Corp_callosum) as well as ventricle such as frontal horn (FrontalHorn) presented higher bias than other ROIs, regardless of the number of input frames. Overall, the central structures, including putamen, thalamus, and substantia  nigra (S_nigra), showed relatively high MAE. However, low MAE was observed in the pallidum, corpus callosum, and frontal horn. The frontal lobe, including the middle frontal gyrus (FL_mid_fr_G), inferior frontal gyrus (FL_inf_fr_G), and superior frontal gyrus (FL_sup_fr_G), reported relatively high variance. The variance was more dependent on ROI than on the timeframes. In other words, the ROIs which showed comparably higher variance presented higher variance throughout the entire timeframes. However, bias and MAE were dependent not only on the ROIs but also on the timeframes. Relatively high MAE was observed in early 1-5 frames, compared to later frames. Conversely, higher bias was observed in the later frames, in comparison to the early frames.

IV. DISCUSSION
In this article, we introduced N-ODE and proposed the N-ODE-based method that recovers the missing images in dynamic brain PET, allowing the pharmacokinetic parameters such as rate constants or BP ND to be estimated with a few numbers of early frames alone. The proposed N-ODE-based method, which is established to resemble the analytical methods such as compartment modeling, exploited the intrinsic pharmacokinetics of the tracer by solving and fitting sets of ODEs, however, in a deep-learning manner. We demonstrated the result of the proposed method with the simulated and the clinical 18 F-PI-2620 dynamic brain PET images. We then assessed the performance with various numbers of early input frames that are given to the model as an initial value decreasing from 9 to 7, which is equivalent to 30-20 min. We believe that this work will contribute to the dynamic PET studies in general, not only in a direction to reducing the scanning time but also to rapid multitracer studies [29], [30].

A. Time Frames Used in the Proposed N-ODE-Based Model
In this proposed study, the duration of the early frames was extended in comparison to typical common settings. We observed that the initial value holds significant importance in solving ODEs using the N-ODE-based model. Therefore, to mitigate the impact of noisy initial values, the duration of the early frames was increased. Additionally, implementing a postprocessing or reconstruction algorithm that reduces noise may further enhance the performance of the proposed method.

B. N-ODE-Based Model on Simulation and 18 F-PI-2620 Data
The results of the simulation data indicate that the extended duration of early frames did not harm the prediction of K 1 and k 2 . However, the prediction of k 3 and k 4 was suboptimal (Fig. 3). As the proposed model predicts the entire 12 frames given 7, 8, and 9 frames, it may be simpler for the model to forecast the initial 7-9 frames than last 5-3 frames as they are seen by the model. Due to the fact that K1 and k 2 are closely related to the perfusion, or early frames, these parameters may be easier for models to predict than k 3 or k 4 Furthermore, k 4 was estimated more poorly than k 3 , possibly because of its relatively small value and error propagation.
The proposed method using N-ODE is demonstrated to be sensitive to the initial value as evidenced by the heatmaps of bias and variance from 18 F-PI-2620 data presented in Fig. 6. Poor prediction of the initial value in certain ROIs, such as central structures, resulted in a lack of improvement in predictions made for subsequent time frames. However, the MAE was observed to be relatively lower in some of the central structures. This can be attributed to the fact that MAE is calculated as the difference between the ground truth and prediction value, and these ROIs have lower values. On the other hand, these ROIs resulted in a high bias and variance, as both indices are normalized by the ground-truth value itself [(14) and (15)]. The ROIs which are small in size and value suffered more in prediction when the noise was presented. On the other hand, distinctive horizontal lines were found in bias and MAE heatmaps, indicating that both indices were affected also by the time frame. Specifically, 18 F-PI-2620 peaks before the fifth frame, which is equivalent to 10 min [34], resulting in higher MAE up until the fifth frame. However, the bias was low until the last few frames. As a result, we observed that when MAE loss was selected as the reconstruction loss, predictions were poorly made in later frames as the model sought to find an optimal solution by primarily focusing on the early frames.

C. Characteristics of Kinetics and the N-ODE-Based Model
The number of early frames used as input to achieve the best result was different between the simulation and the clinical 18 F-PI-2620 data. The bias, variance, and MAE indicated that the proposed model performed better with fewer numbers of input frames (7 frames, 20 min; 8 frames, 30 min) for the simulated data as shown in Fig. 3(a), however, the reversed trend was found for the clinical data as summarized in Table II. We reflect that this is coming from the intrinsic dynamic profile of the tracer and the model architect. For 18 F-PI-2620, the TAC peaks at around 5 min [34] and drops rapidly, while, for the simulation data, TAC washes out slowly. As the convolutional LSTM model, seq2seq, is utilized as a recognition network for embedding the early frames backwards in time, the proposed model seemed to have a tradeoff between the early-frame seq2seq encoder and the complexity of the ODE system, which requires optimizing the numbers of early input frame images for the best result. The embedding of the earlyframe images relies heavily on the pharmacokinetics defined by the hidden state in the final output of the seq2seq architect [27]. Hence, it is challenging for the seq2seq network to deal with the limited number of input frames. On the other hand, as the seq2seq runs backwards in time in the proposed model, the excessive number of early frames may result in a suboptimal initial point as described in (9) through hidden states. There was no consistent trend found in terms of the number of input frames when it comes to predicting pharmacokinetic parameters such as the rate constants, SUVr, and BP ND . Particularly, 7 frames (20 min) alone achieved robust performance in deriving a voxel-level image of SUVr and BP ND as shown in Fig. 5, and we believe this method has the potential to lead a much-reduced scanning time protocols in dynamic PET studies.
One common aspect of the proposed model on both data was that it learns the pharmacokinetics more efficiently when the average spatial variance of the initial frame is large, which is the case for tau or amyloid PET scans in the Alzheimer's disease (AD) [2], [3], [4]. As shown in Fig. 4 best example, the proposed method performed the best with the image that appeared to have a relatively larger contrast between different ROIs in early frames. For 18 F-PI-2620 datasets, it indicated that the proposed method worked better with the test set 0 that has regional high uptakes in the occipital and frontal lobe as well as the limbic area. However, we observed that the proposed model resulted in better prediction results when the temporal variance is small throughout the entire timeframes. As illustrated in Fig. 4, the overall groundtruth TAC of the best example test data increased consistently until the 5th frame (10 min) and then decrease with time, whereas the overall ground-truth TAC of the worst example peaked at the 8th frame (25 min) and washed out more slowly. Since the model imprints the early-frame information through the seq2seq architect, the dramatic changes in TAC within early frames seemed to be lost in the hidden state of the seq2seq model, which resulted in poor estimation for the data that contain the extreme gradient of TAC in the early frames.

D. N-ODE-Based Model and the Auto-Regression-Only Model
As a tool for regression, it has been reported that the N-ODE-based method has a few advantages over the autoregression-only model, such as LSTM-based models in learning dynamic data [22], [23], [26], [27]. One of the benefits is that the N-ODE-based method leverages less computational burden than LSTM models [22], [23]. Furthermore, the N-ODE-based models maintain the satisfying performance with irregular time points [23], which is often the case in dynamic PET cases. To our best knowledge, the investigation has not been made to evaluate both models on dynamic PET images, let alone each of the models. We evaluated the regression-only models using the same architect described in Fig. 2. Instead of utilizing the VAE-N-ODE architect, we exploited the seq2seqonly model to directly predict the dynamic 18 F-PI-2620 brain PET. The resulting voxel-wise SUVr of the predicted dynamic PET is illustrated in Fig. 7. The seq2seq-only model poorly estimated the pharmacokinetics given the limited number of input frames (7 frames, 20 min) in both test datasets. To that end, we believe that the N-ODE-based model is preferable, particularly when only very early perfusion images are available.
Additionally, we observed the different textures of the output images between the two models. The seq2seq-only model tends to smooth the output more than the N-ODE-based model. As a result, the prediction made from the seq2seq-only model exhibited a clearer outline of the ROIs such as fissures, in output. We speculate this occurred from the architect of models. In the seq2seq-only model, the predictions are made from the very first frames to the respective frames by regressing over convolutional layers. However, for the N-ODE-based model, the latent features that represent the predictions are first estimated using ODE solvers, and the final predictions are made using convolutional layers. In other words, the lump of convolution layers that is used to regress the final predicted outputs is massive in size in the seq2seq-only model, compared to the N-ODE-based model. This contributes to the overall smoothness of the output images, which was an unexpected benefit for the seq2seq-only model. However, due to this aspect, the images with the input presenting less contrast in the first place, such as test set 1 showed a highly biased output, especially with the limited number of input frames (7 frames, 20 min).

E. Limitations
Although we have conducted our study both on simulated and clinical dynamic data, one limitation of our work is that it is trained by the restricted amount of clinical brain PET data. We initially adopted patch-based learning to meet the computing resources, however, this approach served as data augmentation, resulting in agreeable results with confined training data. However, the model repetitively presented high bias, variance, and MAE in central structures that are relatively small in size, regardless of the numbers of the input frames. In this regard, we believe that the model has room for improvement when the patches that contain the central structures are visited more often during training. Another limitation which originated from patch-based learning is that the outputs inevitably display the horizontal and vertical strikes.

F. Clinical Implication and Future Direction
In clinical routine, the bottleneck for dynamic PET imaging is the long acquisition time, which is not only impractical but also hampered by patient movement. Furthermore, the intrinsic noise of PET images hinders the performance of the kinetic modeling, especially at the voxel level. The proposed method can address these two main limitations in clinical dynamic PET studies.

1) Reduced Time Protocol:
The recent efforts were made to reduce the scanning time while allowing the parametric images to be drawn [17], [19], [20], [35], [36], [37], [38]. As the proposed model predicts the entire frames of the dynamic PET given a few early frames alone, it provides the solution to reduce the scanning time while deriving the pharmacokinetics of interest in brain studies such as perfusion or binding perfusion. Compared to the previous studies, the proposed method utilized not only the temporal information of the dynamic PET images but also spatial information through a mass of convolutional layers and the ODE solving mechanism, which contributed to the promising performances. We demonstrated reducing from 60 to 20 min is possible in 18 F-PI-2620 dynamic PET scans. With the recent development of the new scanner [39], the proposed method potentially can achieve a very-short dynamic scanning time [21]. Yet, all tracers display different spatiotemporal patterns and extensive studies on finetuning the proposed method for other tracers will be part of future studies.
The advantage of the proposed method in reducing the time protocol is that it does not require the AIF, which is a challenge for the analytical method. One straightforward way to predict the later frames in dynamic PET analytically is to use compartment modeling to fit the images only with early frames and use the derived rate constants to predict the late frames [21]. However, this requires the determining AIF and the accurate kinetic model, both of which add the uncertainty of the prediction. Furthermore, with only a few early frames, the model suffers from the sparse sample size for the fitting. In particular, predicting k 3 or k 4 is challenging, as both rate constants are associated with the later frames. Instead, the proposed method avoids this issue by learning from the data themselves.
Yet, the AIF is still required in reduced scanning time, when the aim is on deriving micro parameters such as rate constants. One way is to infer from predicted images using an imagederived input function (IDIF), which will be incorporated in the future study.
2) Dual-Tracer Studies: Contrary to the dual time window protocol, or coffee break protocol [17], [18], [19], the proposed method does not require the late frame images to derive the full dynamic or parametric images, which leaves the margin for other tracers to be administered after the cut-off frame set for seq2seq architect as shown in Fig. 2. Thus, another critical application of the proposed method is dual-tracer studies.
Dual-tracer brain PET is a promising technique that measures the distribution of two tracers in the brain by a single scan with tracer administrations staggered in time, characterizing dual aspects of function without repeated sessions of scanning separated by hours or days [29]. Integrating complementary information of both tracers may enhance the sensitivity and specificity for presymptomatic staging as well as the differential diagnosis of heterogeneous pathogenesis with similar clinical phenotypes [6]. However, recovering the separate scans of each tracer from the dualtracer scan is challenging, as all tracer signals for the PET scanner are indistinguishable 511-KeV annihilation photon pairs, and no explicit information is available that indicates which tracer gave rise to the PET coincidence-pair measurement.
Similar to the single-tracer dynamic PET studies, the analytical explanation for deriving the pharmacokinetic parameter from dual-tracer studies is led by establishing a compartmental model. However, for dual-tracer studies, the parallel configuration is often used [29]. On the other hand, studies were held recently that use deep-learning models, which typically solve the blind source separation problem. Ruan and Liu [40] and Xu and Liu [41] proposed the deep-learning model which seeks to single out the contribution of each tracer from the mixture of signals using a deep NN. Particularly, Ruan et al. utilized stacked autoencoder (SAE) and Xu et al. employed a deep belief network (DBN).
Instead of focusing on the separation of the signal, the proposed method offers the novel approach to recover the signal from each individual tracer by aiming to predict the full dynamic images of the first administered tracer, given earlyframe images. In this way, the dynamic images of the second tracer are readily estimated by subtracting the signal of the first administered tracer from the total measurement where both overlap. The future direction of the current work includes the feasibility study of recovering single tracers in dual-tracer signal using the proposed method.

V. CONCLUSION
In this work, we established the deep-learning model using N-ODE that predicts the full dynamic frame images, hence the pharmacokinetic parameters, such as rate constants and BP ND , given the limited number of input frame images. We demonstrated that the proposed method showed promising results with both simulated and clinical brain PET data, indicated by bias, variance, MAE, and pharmacokinetic parameters. In particular, we showed that SUVr and BP ND parametric images can still be derived with 20 min 18 F-PI-2620, instead of 60 min. Additionally, we believe that this approach sheds new light on rapid dual-tracer studies.
ACKNOWLEDGMENT Matthias Brendel received speaker honoraria from GE healthcare, Roche, and Life Molecular Imaging (LMI) and is an advisor of LMI; Henryk Barthel received reader honoraria from LMI; and Osama Sabri received research support from LMI. The remaining authors declare that they have no known conflicts of interest in terms of competing financial interests or personal relationships that could have an influence or are relevant to the work reported in this article.