Impact of Diffusion–Perfusion Mismatch on Predicting Final Infarction Lesion Using Deep Learning

We report a study that validates the impact of diffusion-perfusion mismatch in a deep learning (DL) model predicting the final infarction lesion from baseline magnetic resonance imaging (MRI). From 472 consecutive patients with acute ischemic stroke, we gathered baseline and follow-up MRI acquired 3–7 days apart, and segmented the initial and final infarction lesions. Four U-Net-based DL models, each using a different combination of diffusion-weighted imaging (DWI), perfusion-weighted imaging (PWI) maps, and an initial diffusion-restricted lesion prediction map ($\text {Pred}_{\text {init}}$ map) from baseline MRI, were trained to predict the final infarction lesion. Five-fold cross-validation was used for training and testing. As an external test set, 55 patients from another institution were analyzed. The Dice similarity coefficient (DSC) was compared between the models and between subgroups defined by the presence of lesion growth and/or diffusion-perfusion mismatch. The model using the PWI maps and $\text {Pred}_{\text {init}}$ map showed the best mean DSC (0.422 and 0.486 for the internal and external test sets, respectively). This model showed better performance in predicting rapid lesion growth compared with the baseline model (mean DSC difference, 0.040; 95% confidence interval: 0.018–0.062). Using the PWI map with initial diffusion-restricted lesion prediction improved the performance of the DL model in predicting the final infarction lesion from baseline MRI.

[...] estimating the lesion age, since it reflects the net increase of water during the transition of acute ischemic stroke to the subacute stage [9]. Thus, FLAIR, DWI, and PWI are essential [...] acute ischemic stroke lesions on DWI [10], [11].

In this study, we aimed to develop a DL model to predict [...]

[...] in patients aged ≥ 19 years; 2) time from symptom onset to initial MRI of < 24 h; 3) DWI, FLAIR, and PWI included in the initial MRI; and 4) follow-up MRI including DWI and FLAIR 3-7 days after the initial MRI. We excluded 668 patients due to the following conditions: 1) performance of mechanical thrombectomy prior to the follow-up MRI, 2) missing raw data of DWI or PWI required for image processing, and 3) failed automatic co-registration of initial and follow-up studies or inadequate image quality for interpretation. We excluded patients who underwent mechanical thrombectomy because the procedure directly affects the perfusion status, which would subsequently affect the prediction of the final infarction lesion by the model (leading it to underestimate lesion growth when such data are used for training). Finally, 472 patients were enrolled in this study as training and internal test datasets. Of the 668 excluded cases, 208 patients had adequate DWI and FLAIR in the initial study, and these images were used for training the DWI lesion segmentation model ($\text {Pred}_{\text {init}}$). As an external test dataset, 55 patients who underwent initial MRI studies from January 2016 to December 2016 at Asan Medical Center (AMC) were enrolled using the same inclusion and exclusion criteria (Fig. 5). Although SNUBH followed a CT-based triaging system, AMC adopted an MRI-based triaging system.

The demographic and clinical data of the patients were collected, including age, sex, risk factors of stroke such as hypertension, hyperlipidemia, or heart disease, stroke etiology, National Institutes of Health Stroke Scale rating, modified Rankin Scale (mRS), intravenous thrombolysis (IVT), and the time elapsed from symptom onset to hospital visit (Table 5). [...]

The ground truth masks were carefully drawn using [...]

To construct a possible prediction model on 3D images, we developed a DL model that is a streamlined version of the 3D multiscale residual U-Net [13], which won the first prize [...] [14]. Our DL model, whose architecture is described in the appendix ([...] predicting lesion outcome with multi-spectral MRI despite the difficulty of the task [14].

For the input images, we considered employing a lesion prediction map at the baseline ($\text {Pred}_{\text {init}}$ map) in addition to diffusion- and perfusion-weighted maps. We separately trained a segmentation model with the same architecture (Table 6) using DWI, ADC, FLAIR, and the ground truth maps at the same time point (i.e., the baseline lesion mask was used for the baseline image), and the probability maps obtained from this trained segmentation model were used as an input of the prediction model. The inputs for the resulting models (Models 1-4) are summarized in Table 1.
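As an illustration of how the four input configurations differ, the following minimal Python sketch assembles per-model channel lists. Table 1 is not reproduced in this text, so only the channel sets for Model 2 and Model 4 (both spelled out in the Discussion) come from the paper; the sets for Model 1 and Model 3, the dictionary name `MODEL_INPUTS`, and the helper `stack_channels` are assumptions made for this example.

```python
# Channel lists per model. Models 2 and 4 follow the Discussion text;
# Models 1 and 3 are ASSUMED here because Table 1 is not available.
MODEL_INPUTS = {
    "Model 1": ["DWI", "ADC", "FLAIR"],                               # assumed baseline
    "Model 2": ["DWI", "ADC", "FLAIR", "TTP", "Tmax"],                # from the text
    "Model 3": ["DWI", "ADC", "FLAIR", "Pred_init"],                  # assumed
    "Model 4": ["DWI", "ADC", "FLAIR", "Tmax", "TTP", "Pred_init"],   # from the text
}


def stack_channels(volumes, model_name):
    """Return the co-registered volumes for one model, in channel order.

    `volumes` maps a channel name to its 3D array (any array-like object).
    """
    names = MODEL_INPUTS[model_name]
    missing = [name for name in names if name not in volumes]
    if missing:
        raise KeyError(f"missing input maps for {model_name}: {missing}")
    return [volumes[name] for name in names]
```

In practice the returned list would be stacked along a channel axis before being fed to the network; the point of the sketch is only that the models share a backbone and differ in which maps they receive.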
[...] where $y_i$ is the ground truth value (either 0 or 1) and $p_i$ is the predicted probability of the $i$-th voxel, we can compute and optimize the parameters of the DL model with respect to this loss function. As we are trying to predict a lesion that occupies only small regions of an image, DSC is more suitable than other common losses such as the binary cross-entropy or mean absolute error loss, because DSC tends to favor true positive predictions over true negatives.

We applied patch sampling for constructing the training batch. The training batch comprised 3D patches with a size of 20 × 64 × 64 voxels, sampled from preprocessed images. To accelerate the training process (or to save the memory capacity demanded for the sampling), we randomly selected at most 100 patients for every 100 iterations, and all the patches for the training batch were sampled within those selected patients for those 100 iterations. We optimized the training loss using ADAM [17] for at most 20,000 iterations. The early stopping method was applied by computing the mean DSC of the validation set every 100 iterations. Then, the model parameters with the highest validation DSC were selected.
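The loss and sampling scheme described above can be sketched as follows. The loss equation itself is missing from this text, so the sketch assumes one common soft-Dice form, $L = 1 - 2\sum_i y_i p_i / (\sum_i y_i + \sum_i p_i)$, with a small smoothing constant `eps`; the authors' exact formulation may differ. The function names, the uniform choice of patch position, and the `eps` value are likewise assumptions for illustration.

```python
import random

PATCH_SHAPE = (20, 64, 64)  # patch size in voxels, as stated in the text


def soft_dice_loss(y_true, p_pred, eps=1e-7):
    """Assumed soft-Dice loss over flattened voxels: 1 - 2*sum(y*p) / (sum(y) + sum(p))."""
    intersection = sum(y * p for y, p in zip(y_true, p_pred))
    denominator = sum(y_true) + sum(p_pred)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def select_patients(patient_ids, max_patients=100, rng=random):
    """Pick at most `max_patients` patients; refreshed every 100 iterations in the text."""
    return rng.sample(patient_ids, min(max_patients, len(patient_ids)))


def sample_patch_origin(volume_shape, patch_shape=PATCH_SHAPE, rng=random):
    """Pick a corner index so that the patch lies fully inside the volume.

    Uniform sampling is an assumption; the text does not state the sampling rule.
    """
    if any(v < p for v, p in zip(volume_shape, patch_shape)):
        raise ValueError("volume is smaller than the patch")
    return tuple(rng.randint(0, v - p) for v, p in zip(volume_shape, patch_shape))
```

Because the numerator counts only overlap with the true lesion, voxels that are correctly predicted as background do not lower this loss, which is the property the text contrasts with binary cross-entropy.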
[...] patients into five sets randomly, where three sets were used in training, one set was used for validation, and one set was used for testing for each fold. All models were trained with an identical split of the folds in a five-fold CV.
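A minimal sketch of such a split is given below. The text states only that three sets train, one validates, and one tests in each fold; the rotation rule used here (validation set is the one following the test set), the fixed seed, and the function name `five_fold_splits` are assumptions. Fixing the seed is one way to give every model an identical split.

```python
import random


def five_fold_splits(patient_ids, seed=0):
    """Return five dicts with 'train', 'val', and 'test' patient lists (3/1/1 sets)."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)          # same seed -> same split for every model
    sets = [ids[i::5] for i in range(5)]      # five near-equal random sets
    splits = []
    for k in range(5):
        val_index = (k + 1) % 5               # assumed rotation rule
        train = [pid for j in range(5) if j not in (k, val_index) for pid in sets[j]]
        splits.append({"train": train, "val": sets[val_index], "test": sets[k]})
    return splits
```

With this rotation, each patient appears in exactly one test set across the five folds, so test metrics can be pooled over the whole internal dataset.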
For the evaluation of the models, prediction maps of the final infarction lesion on test data were generated using [...]

Welch's t-test was conducted to test the difference in the means of the numerical variables, and the chi-squared test was performed for categorical variables to test their homogeneity. We also applied correlation tests to analyze the effect of [...]

The loss curve of the trained model is presented in Fig. 6. The models with $\text {Pred}_{\text {init}}$ (Model 3 and Model 4) reached their optimal weights more quickly than the other models.
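For readers unfamiliar with the Welch's t-test named in the statistical analysis above, the following sketch computes its test statistic and the Welch-Satterthwaite degrees of freedom using only the Python standard library. The function name `welch_t` is hypothetical, and the sketch does not compute a p-value, which requires the CDF of the t distribution (for example via `scipy.stats.ttest_ind` with `equal_var=False`). The software the authors used is not stated in this text.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for Welch's unequal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df
```

Unlike Student's t-test, this form does not assume equal variances in the two groups, which suits comparisons between groups of different size such as the internal and external cohorts.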

The test measures of each model are presented in Table 2. [...] Compared with the baseline model (Model 1), Model 4 showed a significant difference in mean DSC in total data under the 5% significance level. In terms of subgroup analysis, there was a statistically significant difference between the two models in subgroups such as internal and external test sets, data with presence of lesion growth, data with DWI-PWI mismatch, and DWI-PWI mismatch with lesion growth. Particularly, the mean DSC increment of the subgroup with lesion growth was 0.040 (95% confidence interval: 0.018-0.062) (Fig. 1). In addition, we could report that the volume of the lesion significantly affects the accuracy of the prediction map; for bigger lesions, the prediction maps from the models were more likely to achieve higher DSC, which increases the mean and decreases the variance of the test DSC (Fig. 2). For both models, mean test DSCs were significantly higher in subjects with lesion volume ≥ 10 mL than in those with lesion volume < 10 mL (p < 0.001 for both) on the result of the internal test set. However, the external test set did not show such a significant difference for both models due to the small sample size and high variance in the test DSC (Table 3).

The results of the correlation tests between mean DSC and DWI-PWI mismatch or the rate of lesion growth in Model 1, Model 2, and Model 4 are shown in Table 4. For Model 4, DWI-PWI mismatch and lesion growth (growth more than 20%) features were negatively correlated with the test DSC (ρ = −0.134 and −0.225, respectively; p = 0.002 and 0.011, respectively). For Model 1, DWI-PWI mismatch exhibited a significant negative correlation with the DSC (ρ = −0.086, p = 0.049), but the effect of lesion growth was uncertain (p > 0.05). For Model 2, neither [...] (Fig. 4).
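The volume-stratified comparison above can be illustrated with a short sketch that computes the DSC of a binary prediction against a binary ground truth and averages it within the two lesion-volume subgroups. The function names, the flattened-mask representation, and the convention of returning 1.0 when both masks are empty are assumptions of this example; the 10 mL threshold comes from the text.

```python
from statistics import mean


def dice(pred_mask, truth_mask):
    """DSC = 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    overlap = sum(1 for p, t in zip(pred_mask, truth_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in truth_mask if t)
    return 1.0 if total == 0 else 2.0 * overlap / total  # empty-mask convention assumed


def mean_dsc_by_volume(cases, threshold_ml=10.0):
    """Split (lesion_volume_ml, dsc) pairs at the threshold and average each group.

    Returns (mean DSC for volume >= threshold, mean DSC for volume < threshold);
    a group with no cases yields None.
    """
    large = [dsc for volume, dsc in cases if volume >= threshold_ml]
    small = [dsc for volume, dsc in cases if volume < threshold_ml]
    return (mean(large) if large else None, mean(small) if small else None)
```

For a small lesion, a disagreement of only a few voxels changes the overlap ratio substantially, which is consistent with the wider DSC spread reported for lesions under 10 mL.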

The main focus of our study was to generate an algorithm to predict the final infarction lesion from the initial MR imaging using the real-world dataset. To that end, we trained a DL model with a U-Net architecture using the consecutive dataset consisting of 472 patients with acute stroke from a tertiary stroke center. In addition, we performed external validation using the data from another large institution. The results of our study show that the addition of perfusion maps and the [...]

VOLUME 10, 2022

Many studies have applied machine learning or DL approaches for stroke lesion segmentation [10], [11] [...] Considering that our data has relatively small lesion volume, this result is in line with that of Kim et al. [11], indicating that smaller lesions tend to show a wide distribution of the DSC.

On the other hand, in the subgroup having final lesion volume larger than 10 mL, the mean DSC of initial lesion segmentation reached up to 0.649 (Table 3), which is comparable with previous studies. [...]

[...] test sets, the performance of models using PWI maps was significantly higher than that of models using only DWI and FLAIR maps. In addition, the setting of this study may reflect the real-world situation in which different triaging methods are currently used across hospitals.

It is clinically known that DWI-PWI and DWI-FLAIR mismatches are useful in determining the time point of infarction and the possibility of lesion growth [3], [6], but such predictions from analysis of the mismatches are still subjective, and the presence of DWI-PWI mismatch does not always identify lesion growth [21]. In this regard, our proposed method can aid in deciding appropriate treatment options by enabling accurate and timely lesion prediction. To highlight the effects of DWI-PWI mismatch, we adopted a probability map for the initial lesion from a segmentation model ($\text {Pred}_{\text {init}}$ map). As a result, although a mere addition of PWI (Model 2) failed to show improvement of DSC, Model 4 (which is composed of DWI, ADC, FLAIR, Tmax, TTP, and $\text {Pred}_{\text {init}}$ map) showed a significantly higher DSC compared to other models. Thus, it seems that the combination of $\text {Pred}_{\text {init}}$ map and PWI maps led the network to pay more attention to DWI-PWI mismatch.

In a recent study by Lin et al. [22], the velocity of infarction lesion growth was found to be associated with the therapeutic benefit of mechanical thrombectomy. More specifically, they found that mechanical thrombectomy increased the odds of good clinical outcome for patients with rapid lesion growth of 25 mL/h. Since our proposed models are designed to predict the final infarction lesion based on initial MRI, we believe that our models, after some modulation, might aid in assessing the clinical outcome by predicting the lesion growth velocity.

Interestingly, we found that the proposed models were able to predict the occurrence of the final infarction lesion for approximately 30% of the cases where the lesion only appeared at the follow-up study and not at the initial study (Fig. 4). Hence, a future study to confirm the utility of the proposed method in distinguishing transient ischemic attack and actual infarction might be helpful.

Of note, the mean DSC of Model 2 (composed of DWI, ADC, FLAIR, TTP, and Tmax maps) on the external test set was lower than on the internal test set, whereas the mean DSCs of all the other models on the external test set were higher than on the internal test set (Table 5). We hypothesize that this discrepancy may stem from the different magnetic field strengths of the MRI scanners in the two institutions ([...]

This study has several limitations. First, the datasets used in this study were retrospectively collected and may not be sufficiently large to address the variability in scanning protocols and hardware implementations across institutions. Nevertheless, we enrolled consecutive patients with acute ischemic stroke, a cohort that outnumbers the training dataset of ISLES 2017 by more than 10-fold, even though the [...]

Table 5 summarizes overall statistics of the study population. Table 6 summarizes the network architecture.