Towards an AI-Based Objective Prognostic Model for Quantifying Wound Healing

Chronic wounds affect millions of people worldwide every year. An adequate assessment of a wound's prognosis is critical to wound care, guiding clinical decision making by helping clinicians understand wound healing status, severity, triaging and determining the efficacy of a treatment regimen. The current standard of care involves using wound assessment tools, such as Pressure Ulcer Scale for Healing (PUSH) and Bates-Jensen Wound Assessment Tool (BWAT), to determine wound prognosis. However, these tools involve manual assessment of a multitude of wound characteristics and skilled consideration of a variety of factors, thus, making wound prognosis a slow process which is prone to misinterpretation and high degree of variability. Therefore, in this work we have explored the viability of replacing subjective clinical information with deep learning-based objective features derived from wound images, pertaining to wound area and tissue amounts. These objective features were used to train prognostic models that quantified the risk of delayed wound healing, using a dataset consisting of 2.1 million wound evaluations derived from more than 200,000 wounds. The objective model, which was trained exclusively using image-based objective features, achieved at minimum a 5% and 9% improvement over PUSH and BWAT, respectively. Our best performing model, that used both subjective and objective features, achieved at minimum an 8% and 13% improvement over PUSH and BWAT, respectively. Moreover, the reported models consistently outperformed the standard tools across various clinical settings, wound etiologies, sexes, age groups and wound ages, thus establishing the generalizability of the models.

to heal in an orderly and timely manner [2], imposing a considerable financial burden on the healthcare systems of these countries and negatively impacting the quality of life of patients suffering from chronic wounds [3]. Within the USA, an estimated $25 billion is spent every year on wound management and treatment costs [1]. Despite this massive burden, efficient wound care treatment remains a challenge even for experienced clinicians. Clinicians rely on standard prognostic tools based on subjective wound assessments to identify and triage worsening wounds, as well as to provide reliable and quantifiable wound outcomes (e.g., probability of delayed healing or amputation) to measure the efficacy of different clinical interventions [4].
A typical wound assessment entails examining wound extent, burden and severity [2]. The wound extent is determined by measuring the wound dimensions, e.g., area and depth, and amounts of different tissues, e.g., epithelial and necrotic, present within the wound bed. The amount of necrotic and slough tissue within the wound bed has been found to be directly related to the worsening of wounds [5]. The wound burden is a function of wound extent and other attributes, such as infection, inflammation, and wound edges. The amount and type of exudate, along with markers of inflammation such as induration (firmness of tissues with margin) and edema (shiny and taut skin), help care providers flag local wound infection [6]. On the other hand, the condition of wound edges and surrounding tissue is an effective indicator of wound healing. For example, the presence of an attached wound edge along with an advancing border of epithelium can indicate that a wound is healing, whereas the presence of rolled edges can indicate that a wound's healing progress has stalled [7]. The wound severity is guided by wound burden, patient, and environmental factors, such as age, sex, ethnicity, socio-economic status, co-morbidities (e.g., diabetes mellitus, renal disease) and systemic agents (e.g., dialysis, vasoactive drugs). Together, these attributes can influence the access to as well as the quality of care needed for ideal wound healing [2].
Standard wound assessment tools used in clinical practice focus on subjectively quantifying wound extent and burden. The majority of these tools have been developed specifically for pressure injuries. For instance, the Pressure Ulcer Scale for Healing (PUSH) [8] tracks the healing progress of pressure injuries. PUSH uses only three wound characteristics: wound area, exudate amount and presence of a particular tissue within the wound bed. The PUSH features were found to account for only 39-57% of the variation in wound progress over time [9], [10]. Therefore, other tools, such as DESIGN [11], the Wound Healing Scale [12], the Sessing Scale [13] and the Sussman Wound Healing Tool [14], were developed to improve the monitoring of pressure injury healing. These tools additionally rely upon characterizing wound depth, infection, maceration, hemorrhage, location and wound edges. Similarly, prognostic tools, such as PEDIS [15], SINBAD [16] and the diabetic foot ulcer assessment scale [17], have been developed specifically for monitoring diabetic wounds. The validity of these tools across different wound types is limited [18]. On the other hand, the Bates-Jensen Wound Assessment Tool (BWAT) is a prognostic tool that has been used in various clinical trials and clinical settings across chronic wounds from a variety of etiologies [19]. Using BWAT, clinicians monitor wound healing by subjectively assessing eleven wound attributes, including wound area, depth, edges, tissue types, exudate amounts and types, skin colour, induration and edema.
Fig. 1. Individual modules of the end-to-end prognostic model trained using both subjective and objective features. Wound severity is assessed using wound, patient and environment related factors [2]. In this system, we have used pre-existing deep learning modules to quantify wound size and tissue amounts [21].
The subjective assessment of wound characteristics is time consuming, susceptible to misinterpretation and subject to high inter-rater variability [20], [21]. Moreover, using integer numbers for subjectively scoring wound characteristics leads to lower sensitivity in tracking the progress of smaller wounds or wounds in the final stages of healing in comparison to larger wounds [9]. Furthermore, the clinicians' level of expertise also contributes to the accuracy of the assessment [22]. The camera-based monitoring of wounds offers a non-contact way of assessing wounds, while improving the sensitivity, accuracy and speed of wound assessments [23], [24]. To this end, this study explores the feasibility of replacing subjective factors related to wound extent with their objective representations for wound prognosis, derived from wound images obtained using Swift Medical's wound imaging platform. Swift Medical provides a wound imaging platform that has been demonstrated to be quicker [24], reproducible and accurate [25], and to have robust AI-integration for assessing wound areas, margins and tissue composition [21], and is therefore ideal for the development of objective metrics. Additionally, the wide adoption of the solution has enabled the collection of a large dataset; for example, a recent study of Swift's real-world dataset was ten times larger than any previously published study [26]. This rich dataset is ideal for the development of machine learning models for wound care prognostics.
Deep learning based models have been found to perform accurately on wound and tissue segmentation tasks, as reported in [21], [27], [28], [29], [30]. Therefore, these models are ideal candidates for objectively determining wound characteristics, such as wound size and tissue amounts, that quantify wound extent. However, the prognostic capability of such techniques is not very well understood. There have been some attempts to develop an end-to-end system for wound prognosis using wound size determined by wound segmentation models [31], [32]. However, these models were evaluated on a limited set of wound etiologies and time-series. Moreover, these did not evaluate the prognostic capability of deep learning based tissue segmentation techniques.
Therefore, in the present study we have attempted to establish the prognostic capability of a system that replaces subjective representation of all aspects of wound extent, as shown in Fig. 1. To replace traditional clinician measurements of wound extent, we used wound segmentations and tissue amounts from two pre-existing models called AutoTrace and AutoTissue, as described in [21]. These models were considered ideal for the task as they were trained using a significantly larger dataset and validated across a larger variety of wound etiologies compared to other deep learning models. The estimated wound area and tissue amounts, along with other wound characteristics, were then used to train prognostic models using the Cox Proportional Hazards (Cox-PH) model [33]. These models are easily interpretable and can be used for providing quick and actionable insights to the clinicians related to wound prognosis. Furthermore, we have provided a comparison of our techniques against standard clinical prognostic tools, such as PUSH and BWAT, using the dataset acquired from different clinical settings across various patient demographics and times-to-heal. The study also provides a guide to the development of completely objective prognostic models in the future by identifying the features correlated with wound healing. To the best of our knowledge, such a comprehensive characterization of wound prognostic techniques has not been reported in previous studies.

A. Dataset Description
The dataset was extracted from Swift Medical Inc's anonymized wound care database [21]. The study was deemed exempt from informed consent requirements after review from Research Review Board Inc., an independent review board located in Ontario, Canada.¹ This proprietary dataset was derived from 2,361 skilled nursing facilities (SNFs) and 141 home healthcare facilities (HHFs), spread across North America. The dataset focused on four frequently encountered wound types: pressure injuries, venous ulcers, diabetic wounds and arterial ulcers. We included wounds that had at least four evaluations and were tracked for longer than three weeks, which filtered out recently acquired wounds and reduced artificially elevated levels of censoring in the dataset.
In total, the dataset consisted of 2,151,185 wound evaluations and images derived from 201,463 wounds and 98,407 patients distributed across all of the aforementioned facilities. The distribution of wound etiologies is shown in

B. Feature Engineering
Swift's Skin and Wound App is used across different clinical settings that include SNFs and home healthcare facilities. It was observed from our dataset that different clinical facilities leverage different wound assessment tools to monitor wound healing. This necessitates engineering features compatible with the wound assessments used across facilities. In the following subsections, the subjective and objective feature engineering modules of our system are described. The subjective feature engineering module covers the transformations that were used to create compatible features derived from the subjective assessment of the wound. The objective feature engineering module covers the machine learning models and computational scaling techniques used to extract objective information from the wound images collected as part of every wound evaluation.
1) Subjective Feature Engineering Module: Various encoding techniques, such as integer, boolean, float and one-hot encoding, were applied to wound and patient characteristics for use in our new wound prognostic models (Table III). Moreover, missing values for each feature were encoded as boolean within a 'Feature Unknown' category,² and wound locations were mapped to six different body locations as shown in Fig. 3. The generated features were used to train new wound prognostic models. For comparison, PUSH and BWAT scores were computed based on their respective subjective assessment guidelines. Missing PUSH or BWAT scores within evaluations were imputed using a forward filling methodology, meaning that missing values within a series of evaluations were replaced with the most recent preceding value.
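The forward filling step described above can be illustrated with a minimal sketch. The function name and the example series are hypothetical and are not taken from the study's codebase; a leading missing value is left missing because there is no preceding score to carry forward.

```python
from typing import List, Optional


def forward_fill(scores: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing score (None) with the most recent preceding value."""
    filled = []
    last_seen = None  # most recent non-missing score in the series so far
    for score in scores:
        if score is not None:
            last_seen = score
        filled.append(last_seen)
    return filled


# Hypothetical PUSH scores for one wound, ordered by evaluation date.
push_series = [12, None, None, 9, None, 7]
print(forward_fill(push_series))  # [12, 12, 12, 9, 9, 7]
```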
2) Objective Feature Engineering Module: Objective features were extracted directly from the images taken during wound evaluations. Deep learning-based models were employed for objective determination of the wound region and quantification of different tissue types within the wound region. The segmented wound regions and tissue regions were subsequently used to objectively determine the wound area and relative amounts of different tissue types within the wound bed. The modules adopted for wound region segmentation and wound tissue segmentation are summarized below, and the methods and performance are described in depth in a separate article [21].
The wound segmentation module, called the AutoTrace model, is based on a deep convolutional encoder-decoder neural network architecture with attention gates in the skip connections and several other customizations which make it suitable to run on mobile devices. The encoder block allows the model to extract features, whereas the decoder block produces a wound segmentation mask from the learned features. In total, the AutoTrace model consists of approximately 3.5 million parameters. Specific customizations within the AutoTrace model architecture included replacing normal convolution blocks with depth-wise separable convolutional layers to reduce the computations, and implementing strided depth-wise convolutions that can learn to downsample activations.
AutoTrace was trained using more than 400,000 image-label pairs with wound region labels determined by clinicians. The model was tested on 2,000 image-label pairs of various wound types, including pressure injury, venous, diabetic and arterial wounds. The model performance was characterized using the mean intersection over union (mIOU) between the user traced wound region mask (target mask) and the AutoTrace predicted wound region mask (predicted mask), which represents the ratio of the number of pixels common to the target and predicted masks to the total number of pixels present across both masks. The trained AutoTrace model achieved an mIOU of 0.86 for wound region segmentation. Segmented wound regions were then used to compute wound areas by scaling wound regions with respect to a calibrant sticker of known size, called HealX, which is typically placed on the same plane close to the wound. The computed wound area was then appended to the original wound series data, and normalized relative to the initial wound area followed by log-transformation for statistical normality [34].
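As a rough illustration of the quantities in this paragraph, the sketch below computes the intersection over union of two binary masks, scales a pixel count to an area using a calibrant of known size, and log-transforms the area relative to the initial area. All function names are hypothetical. The sticker area argument and the small `offset` added before the logarithm are assumptions made for the example; the source does not state the sticker dimensions or the exact form of the log-transformation.

```python
import math


def intersection_over_union(target_mask, predicted_mask):
    """IoU of two equally sized binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for target_row, predicted_row in zip(target_mask, predicted_mask):
        for t, p in zip(target_row, predicted_row):
            if t and p:
                intersection += 1
            if t or p:
                union += 1
    # Two empty masks agree perfectly, so return 1.0 in that case.
    return intersection / union if union else 1.0


def mean_iou(mask_pairs):
    """Mean IoU over an iterable of (target_mask, predicted_mask) pairs."""
    values = [intersection_over_union(t, p) for t, p in mask_pairs]
    return sum(values) / len(values)


def wound_area_cm2(wound_pixel_count, sticker_pixel_count, sticker_area_cm2):
    """Scale a wound pixel count to cm^2 using a calibrant of known area."""
    cm2_per_pixel = sticker_area_cm2 / sticker_pixel_count
    return wound_pixel_count * cm2_per_pixel


def log_normalized_area(area_cm2, initial_area_cm2, offset=1e-3):
    """Log of the area relative to the initial area.

    The offset (an assumption) keeps the value finite when the area is zero.
    """
    return math.log(area_cm2 / initial_area_cm2 + offset)


target = [[1, 1], [0, 0]]
predicted = [[1, 0], [1, 0]]
print(intersection_over_union(target, predicted))  # 1 shared pixel out of 3
```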
The wound tissue segmentation module, called the AutoTissue model, is also based on an encoder-decoder neural network architecture. However, it uses an EfficientNetB0 architecture [35] as the encoder, whereas the decoder is made up of 4 blocks, each consisting of a single 2-D bilinear upsampling layer followed by 2 depth-wise convolutional layers. In total, the AutoTissue model consists of approximately 3.8 million parameters.
AutoTissue was trained using 17,000 anonymized wound images labeled by trained labelers and curated by wound clinicians. The model segments the detected wound region into 5 separate tissue categories: epithelial, granulation, slough, eschar and other, the latter of which included mostly healthy tissue and the HealX calibrant sticker. The model was tested on a set of 383 images corresponding to stage 2 pressure injuries, arterial, venous and diabetic wounds. The mIOUs for other, epithelial, granulation, slough and eschar tissue were 1, 0.42, 0.69, 0.69 and 0.85, respectively. Wound tissue amounts were extracted for each image by first applying the AutoTrace model to focus on the wound bed, followed by the application of the AutoTissue model to determine various tissue regions within the wound bed. Next, the number of pixels within each segment was counted and the percent area of each segment was computed with respect to the total number of pixels that covered the wound bed. Finally, the percentage area of each segment was log-transformed.
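The pixel-counting step can be sketched as follows, assuming the tissue model output is available as a per-pixel label mask and the wound segmentation as a binary mask of the same size. The function names and the use of `log1p` (which keeps a 0% tissue amount finite) are assumptions for illustration; the source only states that the percentages were log-transformed.

```python
import math
from collections import Counter

TISSUE_LABELS = ("epithelial", "granulation", "slough", "eschar", "other")


def tissue_percentages(label_mask, wound_mask, labels=TISSUE_LABELS):
    """Percent of wound-bed pixels assigned to each tissue label.

    label_mask: nested lists of tissue label strings, one per pixel.
    wound_mask: nested lists of 0/1 marking pixels inside the wound bed.
    """
    counts = Counter()
    total = 0
    for label_row, wound_row in zip(label_mask, wound_mask):
        for label, inside in zip(label_row, wound_row):
            if inside:  # only pixels inside the wound bed are counted
                counts[label] += 1
                total += 1
    return {
        name: (100.0 * counts[name] / total if total else 0.0)
        for name in labels
    }


def log_transform(percent):
    """Assumed transform: log(1 + percent), finite at 0%."""
    return math.log1p(percent)


labels = [["granulation", "slough"], ["granulation", "other"]]
wound = [[1, 1], [1, 0]]
percentages = tissue_percentages(labels, wound)
features = {name: log_transform(value) for name, value in percentages.items()}
print(percentages["granulation"], percentages["slough"])
```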
3) Feature Computation: As the image-based objective features were not readily available for the majority of the retrospective data collected using the Swift Skin and Wound App, these models were retrospectively applied to approximately 2.1 million images collected over time. In order to reduce the processing time, the computation was executed on an Amazon Web Services (AWS) r5a.2xlarge instance-based auto-scaling cluster. Parallel batch jobs were orchestrated using Argo Workflows, an open source container-native workflow engine. Furthermore, computing objective features on the cloud mitigated the risk of Protected Health Information leakage from wound images, as the images were securely stored on AWS and processed inside Docker containers. The computation of objective features for 2.1 million images was completed within 40 hours using 20 parallel running jobs, effectively reducing computation time by 20x compared to running the objective feature extractor sequentially on the same set of images using an instance based on a single central processing unit.

C. Dataset Preparation
The dataset from SNFs was stratified into training and testing sets using an 80%-20% split, resulting in 151,680 training and 37,920 testing wounds. The train/test split was performed such that the percentage of each wound type was consistent between training and testing sets. The dataset from home healthcare facilities was stratified such that the data from facilities that employ the BWAT was set aside for testing in order to have sufficient wound series for reliable testing performance. This resulted in 10,118 training and 1,745 testing wounds. Finally, the training datasets from SNFs and home healthcare facilities were combined to form a single training set. The testing sets were split into ten and three non-overlapping sub-samples for SNFs and home healthcare facilities, respectively.
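A stratified 80%-20% split of the kind described for the SNF data could be sketched as below. This is an illustrative stand-in, not the procedure used in the study: the function name, the seed and the wound-level grouping are assumptions, and the split is made per wound type so that each type keeps roughly the same share in both sets.

```python
import random
from collections import defaultdict


def stratified_split(wound_types, test_fraction=0.2, seed=0):
    """Split wound ids into train/test sets, stratified by wound type.

    wound_types: dict mapping wound id -> wound type label.
    Returns (train_ids, test_ids).
    """
    by_type = defaultdict(list)
    for wound_id, wound_type in wound_types.items():
        by_type[wound_type].append(wound_id)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train_ids, test_ids = [], []
    for wound_type in sorted(by_type):
        ids = sorted(by_type[wound_type])
        rng.shuffle(ids)
        n_test = round(len(ids) * test_fraction)
        test_ids.extend(ids[:n_test])
        train_ids.extend(ids[n_test:])
    return train_ids, test_ids


# Hypothetical example: 50 pressure injuries and 50 venous ulcers.
wounds = {i: ("pressure" if i < 50 else "venous") for i in range(100)}
train, test = stratified_split(wounds)
print(len(train), len(test))  # 80 20
```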

D. Prognostic Model Development
Survival analysis techniques allow researchers to investigate the relationship between one or more predictor variables and the time at which an event (e.g., failure of a mechanical system, death in biological organisms) occurs. Cox Proportional Hazards models estimate the effect of predictor variables on the hazard function for the event of interest [33], and are among the most commonly used regression techniques in survival analysis. The hazard function describes the probability (or risk) of occurrence of an event based on covariate levels. The hazard function for the Cox PH model has the form

$$h(t) = h_0(t)\,\exp(b_1 X_1 + b_2 X_2 + \cdots + b_n X_n)$$

where $h(t)$ is the expected hazard at time $t$, and $h_0(t)$ is the baseline hazard, which represents the hazard when all the covariates $(X_1, X_2, \ldots, X_n)$ are equal to zero. Commonly, the term $\exp(b_n)$ is referred to as the Hazard Ratio (HR) and is used to measure the magnitude of the treatment difference [36], [37]. For example, when comparing the effect of an investigational treatment, $X$ represents the treatment indicator. It is assigned a value of 1 for patients who receive the treatment, whereas for patients in the control group, it is assigned a value of 0. Then,

$$\frac{h(t \mid X = 1)}{h(t \mid X = 0)} = \exp(b)$$

represents the HR between the two groups. An HR = 1 ($b = 0$) indicates no effect on event probability. However, if $b > 0$, then HR > 1, which indicates an increase in event hazard and thus a decrease in the length of survival. If $b < 0$, then HR < 1, which indicates a decrease in event hazard and an increase in the length of survival. Due to the straightforward interpretation of model weights that is possible with Cox PH models, these models can be trained to characterize the effects of covariates on various wound outcomes (or endpoints), including a healed or infected status of a wound, or an amputation status of a limb. The resulting predictions from the trained model can be used as a prognostic index to alert clinicians of the risk of occurrence of adverse wound outcomes. In this research, a prognostic model was trained to characterize the effect of different covariates on the 'healed' or 'closed' status of the wound. Specifically, we trained our model using a variation of the Cox model called Cox time-varying proportional hazards [38], which allowed us to account for changes in covariates over time, such as increases or decreases in wound area and tissue amounts.
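To make the interpretation of the coefficients concrete, the sketch below evaluates the multiplicative term of the Cox PH hazard, exp(b_1 X_1 + ... + b_n X_n), for a set of covariates and derives per-feature hazard ratios. The coefficient values and feature names are invented for illustration and are not the fitted values reported in this study.

```python
import math


def relative_hazard(coefficients, covariates):
    """exp(sum_k b_k * X_k): the hazard relative to the baseline hazard h_0(t).

    Covariates missing from the dict are treated as zero.
    """
    linear_predictor = sum(
        b * covariates.get(name, 0.0) for name, b in coefficients.items()
    )
    return math.exp(linear_predictor)


def hazard_ratio(b):
    """Hazard ratio for a one-unit increase in a covariate with coefficient b."""
    return math.exp(b)


# Hypothetical coefficients: b < 0 gives HR < 1, b > 0 gives HR > 1.
coefficients = {"log_normalized_area": -0.5, "attached_edges": 0.3}
for name, b in coefficients.items():
    print(name, round(hazard_ratio(b), 3))

wound = {"log_normalized_area": 1.0, "attached_edges": 1.0}
print(round(relative_hazard(coefficients, wound), 3))  # exp(-0.2)
```

Because the event here is wound closure, a relative hazard above 1 corresponds to a wound that is expected to heal sooner than the baseline.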
1) Feature Selection: We selected the features used in our models using a stepwise forward-backward feature selection process on our training dataset. In the forward step, we trained multiple univariate models and kept only the features that were significant according to likelihood ratio tests. Next, in the backward selection step, various multivariate models were created by removing one feature at a time and tested against a multivariate model with all the forward selected features. The selection criterion was a p-value threshold of 0.05 in both steps. The lifelines package v.0.21 [39] for Python was used to train the Cox models and compute likelihood ratio tests for feature selection.
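The likelihood ratio test behind each selection step can be sketched as follows for the simplest case, where the two nested models differ by a single parameter (one degree of freedom). The function names and the log-likelihood values are hypothetical. For one degree of freedom the chi-square tail probability has the closed form erfc(sqrt(x / 2)), which avoids any dependency beyond the standard library; features encoded with several one-hot columns would need a test with more degrees of freedom.

```python
import math


def lr_test_p_value(log_lik_reduced, log_lik_full):
    """p-value of a likelihood ratio test between nested models (1 df)."""
    statistic = 2.0 * (log_lik_full - log_lik_reduced)
    statistic = max(statistic, 0.0)  # guard against tiny negative round-off
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(statistic / 2.0))


def is_significant(log_lik_reduced, log_lik_full, alpha=0.05):
    """True if the extra feature improves the fit at the given threshold."""
    return lr_test_p_value(log_lik_reduced, log_lik_full) < alpha


# Hypothetical partial log-likelihoods without and with a candidate feature.
print(is_significant(log_lik_reduced=-1250.0, log_lik_full=-1240.0))  # True
print(is_significant(log_lik_reduced=-1250.0, log_lik_full=-1249.5))  # False
```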
The predictive information for a subset of features was quantified using the adequacy index [40]:

$$A = \frac{LR_s}{LR}$$

where $LR_s$ is the log likelihood explained by the subset of features and $LR$ is the log likelihood explained by the entire set of features. The feature importance was computed as $1 - A$, which represents the fraction of new information contributed by the feature to the model. Finally, the features with feature importance greater than a pre-defined threshold were selected.
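A small worked example of the adequacy index is given below. The helper names and the numeric values are hypothetical; the sketch assumes that the "log likelihood explained" by a model is its likelihood ratio statistic against the null model, and that the subset for a given feature is the full feature set with that feature removed.

```python
def lr_statistic(log_lik_model, log_lik_null):
    """Likelihood ratio statistic of a model against the null model."""
    return 2.0 * (log_lik_model - log_lik_null)


def adequacy_index(lr_subset, lr_full):
    """A = LR_s / LR: share of the full model's information kept by the subset."""
    return lr_subset / lr_full


def feature_importance(lr_without_feature, lr_full):
    """1 - A: fraction of new information contributed by the left-out feature."""
    return 1.0 - adequacy_index(lr_without_feature, lr_full)


# Hypothetical log-likelihoods: null model, full model, model without one feature.
lr_full = lr_statistic(-900.0, -1000.0)      # 200.0
lr_reduced = lr_statistic(-910.0, -1000.0)   # 180.0
print(round(feature_importance(lr_reduced, lr_full), 3))  # 0.1
```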
2) Model Characterization: The selected features were used to train six different models using different combinations of features, as listed in Table IV. Models 1 and 2 were based entirely on subjectively assessed features. Specifically, Model 1 included features that encapsulate wound extent along with other wound severity related factors, whereas Model 2 only included features encapsulating extent. In Model 3, we computed tissue amounts using the AutoTissue segmentation model described before, thus making Model 3 a hybrid model that uses both subjective and objective information. On the other hand, Model 4 completely replaced subjectively assessed wound area and tissue amounts with objectively computed features using the AutoTrace and AutoTissue models. Model 5 leveraged hybrid information related to wound extent and severity, where tissue amounts were computed using the AutoTissue model. Finally, in Model 6 we used hybrid information that included objectively determined wound extent based features and subjectively determined wound severity related factors.

Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

TABLE IV PROGNOSTIC MODELS AND THE FEATURES THAT WERE USED TO TRAIN THEM
The models were trained using the data from four different wound etiologies that included pressure injuries, diabetic, venous, and arterial wounds, thus resulting in wound-agnostic models. The trained models were compared against each other and against standard wound assessment tools, such as PUSH and BWAT. Moreover, the wound-agnostic model performances were characterized for different clinical settings (i.e., SNF vs. Home-Health), sexes (i.e., Male vs. Female) and age groups (e.g., 30-39 vs. 50-59). Finally, wound-specific models were trained using the data from individual wound etiologies and their performance metrics were compared against the wound-agnostic models.
3) Performance Evaluation: Typically, the goodness-of-fit of a risk-score producing model, such as the Cox model, is quantified using Harrell's Concordance Index (or C-Index) [41]. Given a prognostic model that produces risk scores $\eta_i$ and $\eta_j$ for every pair of wounds $i$ and $j$ (with $i \neq j$) with time-to-heal $T_i$ and $T_j$, respectively, a pair $(i, j)$ is considered 'concordant' if $\eta_i > \eta_j$ and $T_i < T_j$, or $\eta_i < \eta_j$ and $T_i > T_j$. However, a pair $(i, j)$ is considered 'discordant' if $\eta_i > \eta_j$ and $T_i > T_j$, or $\eta_i < \eta_j$ and $T_i < T_j$. In other words, a wound pair is concordant if the wound whose event is more imminent is given a higher risk score than the wound whose event is more distant in time. The C-Index for such a model is computed as follows:

$$\text{C-Index} = \frac{\text{No. of Concordant Pairs}}{\text{No. of Concordant and Discordant Pairs}} \qquad (4)$$
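Equation (4) can be sketched in a few lines. The implementation below follows the reading that a pair is concordant when the wound whose event is more imminent receives the higher score. It also applies the usual handling of censoring, which is an addition not spelled out in the text: a pair is only comparable when the wound with the shorter observed time actually experienced the event. Pairs with tied scores or tied times are ignored, and the function name is hypothetical.

```python
def concordance_index(scores, times, events):
    """Harrell's C-Index: concordant / (concordant + discordant) pairs.

    scores: risk scores, higher meaning the event is expected sooner.
    times:  observed time-to-event or time-to-censoring.
    events: 1 if the event (healing) was observed, 0 if censored.
    """
    concordant = 0
    discordant = 0
    n = len(scores)
    for i in range(n):
        for j in range(n):
            # Comparable ordered pair: i has the earlier, observed event.
            if times[i] < times[j] and events[i]:
                if scores[i] > scores[j]:
                    concordant += 1
                elif scores[i] < scores[j]:
                    discordant += 1
    if concordant + discordant == 0:
        raise ValueError("no comparable pairs")
    return concordant / (concordant + discordant)


# Three wounds healing at days 10, 20 and 30; the middle pair is mis-ordered.
print(concordance_index([3.0, 1.0, 2.0], [10, 20, 30], [1, 1, 1]))  # 2 of 3 pairs
```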
However, Harrell's C-Index is not appropriate for dynamic prediction models that incorporate longitudinal covariate data. As an alternative, the incident/dynamic time-dependent area under the Receiver Operating Characteristic curve ($AUC^{I/D}$) is suitable for dynamic prognostic models [42], [43]. The $AUC^{I/D}$ compares the predictions of incident cases, defined as the wounds that heal at the time point at which the discriminative ability is assessed, with dynamic controls, defined as the wounds that have not yet healed. The $AUC^{I/D}$ assesses the concordance of the predictions at time points $t_k$ among incident cases, i.e., wounds with time-to-heal $T = t_k$, and dynamic controls, i.e., wounds with $T > t_k$, and is defined as:

$$AUC^{I/D}(t_k) = P(X_i > X_j \mid T_i = t_k,\ T_j > t_k)$$

where $X_i$ and $X_j$ are the risk scores generated for two different wounds $i$ and $j$. The $AUC^{I/D}$ directly incorporates the effects of longitudinal covariates like wound area and tissue amounts within the wound bed, and can be interpreted as the probability that a random wound that healed on a given day is given a higher score than another random wound that has not yet healed on that given day. Furthermore, a concordance summary ($C^t$) can be estimated from $AUC^{I/D}$ as the weighted average of the area under the time-specific ROC curves using:

$$C^t = \int AUC^{I/D}(t)\, w(t)\, dt$$

where $w(t) = 2\, f(t)\, S(t)$, $f(t)$ represents the probability density of the failure times $T$ and $S(t)$ represents the survival function [43]. We used $AUC^{I/D}$ and $C^t$ to quantify and compare the performance of different models.
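The two quantities above can be approximated empirically as in the following sketch. It is a simplified illustration under stated assumptions: each wound has a single static score (in the dynamic setting the score at time t_k would be used), there is no censoring, f(t) and S(t) are estimated as simple sample fractions, tied scores count as half, and the weights are normalized to sum to one. The function names are hypothetical.

```python
def auc_incident_dynamic(scores, times, t_k):
    """Empirical AUC^{I/D}(t_k): P(case score > control score).

    Cases are wounds with T == t_k, controls are wounds with T > t_k.
    Returns None when there are no cases or no controls at t_k.
    """
    cases = [s for s, t in zip(scores, times) if t == t_k]
    controls = [s for s, t in zip(scores, times) if t > t_k]
    if not cases or not controls:
        return None
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5  # ties count as half (an assumption)
    return wins / (len(cases) * len(controls))


def concordance_summary(scores, times):
    """Weighted average of AUC^{I/D}(t) with weights w(t) = 2 f(t) S(t)."""
    n = len(times)
    numerator = 0.0
    weight_sum = 0.0
    for t_k in sorted(set(times)):
        auc = auc_incident_dynamic(scores, times, t_k)
        if auc is None:
            continue
        f_t = sum(1 for t in times if t == t_k) / n  # empirical density at t_k
        s_t = sum(1 for t in times if t > t_k) / n   # empirical survival at t_k
        weight = 2.0 * f_t * s_t
        numerator += auc * weight
        weight_sum += weight
    if weight_sum == 0.0:
        raise ValueError("no time point with both cases and controls")
    return numerator / weight_sum


scores = [3.0, 1.0, 2.0]
times = [10, 20, 30]
print(auc_incident_dynamic(scores, times, 10))  # 1.0
print(auc_incident_dynamic(scores, times, 20))  # 0.0
print(round(concordance_summary(scores, times), 3))
```

With distinct healing times and no censoring, this weighted summary reduces to the pairwise concordance of equation (4), which provides a quick sanity check on the weights.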

A. Feature Selection
The stepwise feature selection process helped identify the significance of individual features in developing the prognostic model for wounds.The feature representing Location-2 (torso and back, as shown in Fig. 3) was found to be insignificant (p > 0.05) and was removed from further analysis.The importance of each remaining feature was quantified using the Adequacy Index, and the resulting relative feature importances are shown in Fig. 4. The features with Adequacy Index greater than the empirically determined threshold of 0.5 were finally selected.

B. Wound-Agnostic Model
The selected features were used to train multiple prognostic models as outlined in previous sections. Table V lists the model coefficients and hazard ratios per feature per prognostic model. It can be noted that the HRs per feature were consistently greater than or less than 1 across different models, except the HRs for SNF and Location-5. Generally, other tissue amount, epithelialized and attached edges, missing features and location features had HR > 1, whereas the remaining features, such as wound area, slough and eschar amounts, had HR < 1.

C. Wound-Agnostic Model Performance
The performance of each prognostic model was determined using the concordance index ($C^t$) of wound pairs in the testing set. Table VI shows the $C^t$ for each model at skilled nursing facilities and home health facilities, respectively. Moreover, it compares the performance of each individual model across different wound types.³ Using an Analysis of Variance (ANOVA) model, it was determined that there was a statistically significant effect of model type on $C^t$ ($F_{9,50} = 673.52$, $p < 0.001$). Tukey's post-hoc tests revealed significant differences between all combinations except: Model
An additional ANOVA model determined that there was a statistically significant effect of wound type on model performance, with $F_{4,15} > 15$, $p < 0.001$ for Models 3-6. Subsequent Tukey's post-hoc tests revealed that Models 3-6 performed statistically significantly better for pressure injuries compared to venous and diabetic wounds within SNFs. Furthermore, using Bonferroni corrected independent t-tests, it was found that all prognostic models, except Model 2, performed significantly better ($t > 5.74$, $p < 0.0084$) under the home health setting.
Figure 5 compares the performance of prognostic models against PUSH scores at skilled nursing facilities, specifically for pressure injuries (as PUSH is not valid for other wound types). The best performing prognostic models (Models 5 and 6) performed at least 8-9% better than PUSH and 7% better than PUSH when missing PUSH values were imputed using a forward filling technique. ANOVA showed that there was a significant effect of model type on $C^t$ for pressure injuries ($F_{9,70} = 673.52$, $p < 0.001$). Tukey's post-hoc tests revealed significant differences between all prognostic models and PUSH. The forward filled PUSH scores had significantly higher $C^t$ compared to normal PUSH scores.

TABLE VI PROGNOSTIC MODEL COMPARISON ACROSS CLINICAL SETTINGS USING $C^t$
Figure 6 compares the performance of prognostic models against BWAT scores at home healthcare facilities for pressure injuries, venous, arterial and diabetic wounds. The best performing prognostic models (Models 5 and 6) performed at least 13-14% better than BWAT and 11-12% better than forward filled BWAT. ANOVA showed that there was a significant effect of model type on $C^t$ with $F_{2,21} = 34.93$, $p < 0.001$. Tukey's post-hoc tests revealed significant differences between BWAT and all prognostic models, except Model 2. The forward filled BWAT score had slightly higher $C^t$ compared to the normal BWAT score, but this observation was not statistically significant.
1) Dynamic Performance: Figure 7 shows the dynamic $AUC^{I/D}$ for each prognostic model along with standard wound assessment tools used across different clinical settings for pressure injuries. In order to completely characterize the behaviour of the prognostic models across time, the dynamic $AUC^{I/D}$ was computed for an augmented testing dataset. The augmented testing dataset consisted of the original testing data along with wound series that were shorter than three weeks in age. Models 5 and 6 consistently performed better than PUSH, forward filled PUSH, BWAT and forward filled BWAT scores across different time points. Generally, each prognostic model performed better than standard wound assessment tools with the exception of Model 2, which was trained using subjectively identified tissue presence and user traced wound area. The t-tests with Bonferroni-corrected p-values were performed on the SNFs dataset to reveal significant differences between all models across each time point. We found that Models 5 and 6 performed better than PUSH and forward filled PUSH scores at least until 150 days since the first evaluation of the wound. Moreover, there was no significant difference between Models 5 and 6 across time. Similarly, significant performance differences between Models 5 and 6 and BWAT were observed in home healthcare facilities, at least until day 80.
2) Sex-Wise: Figure 8 shows the performance of wound-agnostic models across the female and male demographics of the population, highlighting that all models performed equally well across sexes. Mood's median tests performed per model confirmed that there was no significant difference in performance between female and male demographics ($\chi^2 \leq 1.8$, $p > 0.1$).
3) Age-Wise: Figure 9 shows the performance of wound-agnostic models across different age groups of the population. Models performed well across different age groups.

D. Wound-Specific Model Performance
The performances of the wound-agnostic models were compared against wound-specific models developed using data from specific wound types. Fig. 10 illustrates the performance comparisons between the wound-agnostic and wound-specific models for pressure injuries, diabetic, arterial and venous wounds at SNFs. Mood's median tests performed across model types did not reveal any significant difference between wound-agnostic and wound-specific models for pressure injuries. However, Models 3, 4 and 6 showed statistically significant ($\chi^2 = 4.05$, $p < 0.05$) but minute differences (up to 1%) between wound-specific and wound-agnostic models for arterial, venous and diabetic wounds.

IV. DISCUSSION
The feature importance analysis identified the most predictive features for wound prognosis, and the most important feature in this analysis was determined to be the amount of exudate. Exudate is a part of the normal wound healing process; however, in chronic and non-healing wounds with persistent inflammation or infection, it leads to delayed healing due to the presence of excess bacteria and abnormal levels of inflammatory mediators and protein-digesting enzymes [6]. Furthermore, the remaining features identified as significantly predictive are also directly associated with wound healing. For instance, a firmly attached wound edge has been found to promote wound healing, dark red or beefy-looking granulation tissue within the wound bed is indicative of infection, and wound area is directly correlated with wound closure and healing [7]. Induration and edema were selected during the stepwise selection process, but individually, these features did not account for a significant amount of variation within the data, suggesting that other features encoded a portion of the underlying information represented by induration and edema.
Feature importance information also serves as a guide to build an objective prognostic tool for wound care. Individual machine learning models can be used to objectively quantify and characterize the relevant features. The resulting objective features can then be used downstream to build a prognostic tool by replacing their subjective counterparts. To this end, the results of this research have shown the feasibility of developing prognostic models using objective representations of wound extent. The results of the feature importance analysis indicate that future research can benefit from focusing on the development of objective representations of factors affecting wound burden by modelling and quantifying exudate amount and wound edges.
The developed prognostic models are directly interpretable through the hazard ratios (HR) corresponding to the features constituting a particular model. A feature with HR > 1 leads to early wound healing (or a decrease in wound survival time), whereas a feature with HR < 1 leads to delayed wound healing (or an increase in wound survival time). The numerically encoded features with HR < 1, such as tissue type presence, normalized wound area, objectively determined granulation, slough, or eschar amounts, exudate amount, and exudate type, are features for which an increasing value corresponds to delayed wound healing. Therefore, if necrotic tissue is identified in the wound bed, the wound area increases, the amount of slough tissue increases, or a heavy exudate amount is observed at the following wound evaluation, then the risk of delayed wound healing increases. In contrast, numerically encoded features with HR > 1, such as other tissue amounts, are features for which an increasing value corresponds to faster wound healing, and thus a reduced risk of delayed wound healing. This observation is corroborated by the findings in [21], where the 'other' tissue feature was found to predominantly quantify the amount of healthy tissue. One-hot encoded and boolean features can be interpreted similarly. The presence of a feature with HR < 1, such as non-attached or rolled wound edges, or a wound location on the foot (Location-5), corresponds to delayed wound healing. Conversely, the presence of a feature with HR > 1, such as attached or epithelializing wound edges, or a wound location on the face (Location-1), corresponds to faster wound healing.
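The interpretation rule above can be sketched in a few lines, assuming a proportional hazards model in which the hazard ratio of a feature is the exponential of its model weight b. The feature names and weights below are hypothetical and chosen only for illustration; they are not the fitted values of any model reported here.

```python
import math


def interpret_hazard_ratios(weights):
    """Map each feature weight b to (HR, direction), where HR = exp(b).

    The modelled event is wound healing, so HR > 1 means the event
    (healing) tends to occur earlier as the feature value increases.
    """
    interpretation = {}
    for feature, b in weights.items():
        hr = math.exp(b)
        if hr > 1:
            direction = "faster healing"
        elif hr < 1:
            direction = "delayed healing"
        else:
            direction = "no association"
        interpretation[feature] = (hr, direction)
    return interpretation


# Hypothetical weights, for illustration only.
example_weights = {
    "slough_amount": -0.25,        # HR < 1
    "other_tissue_amount": 0.10,   # HR > 1
}
example = interpret_hazard_ratios(example_weights)
```

For a boolean or one-hot encoded feature, the same HR compares wounds with the feature present to otherwise identical wounds without it.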
In this work, a new prognostic model based on subjective features, Model 1, was introduced and demonstrated to outperform PUSH and BWAT by 4% and 7%, respectively. Comparing Model 1 to Model 5,6, and Model 2 to Model 3,4, shows that replacing subjective information with objective information leads to more accurate prognostic models. The worst performing model, Model 2, indicates that using very few subjectively assessed features does not capture the complete information needed to track the progress of wound healing. The purely objective prognostic model, Model 4, consistently performed 6% and 9% better than PUSH and BWAT, respectively, thus providing further validity to the information extracted from the AutoTrace and AutoTissue wound segmentation models. AutoTrace and AutoTissue simply require an image of the wound to compute accurate wound area and tissue composition; thus, Model 4 can serve as a faster and more reliable prognostic tool for wound care assessment compared to standard wound assessment tools. Using a hybrid approach such as Model 5,6 results in an even more accurate prognostic index. A further benefit of the developed models is that they are wound-agnostic in nature and have been validated for four different wound etiologies: pressure injuries, venous, arterial and diabetic wounds. Although both Model 5 and Model 6 perform equally well, Model 6 would be preferable in practice, as it uses more objective features, which are expected to reduce subjectivity and allow quicker assessment of wound severity. The performance of the prognostic models proposed here is expected to be directly correlated with the accuracy of the features derived from the segmentation models. Therefore, the AutoTrace and AutoTissue modules can readily be replaced with more accurate segmentation models. However, for the most accurate results, it would be appropriate to retrain the prognostic models using the new information.
The developed prognostic models performed better than the standard wound assessment tools across two clinical settings, four different wound types, sexes and age groups, establishing their validity across different segments of the dataset. The C t for the best performing models, Model 5,6, was at least 0.70 and 0.76 across SNFs and home healthcare settings, respectively. This indicates that the probability of the model-derived risk score being lower for a faster-healing wound than for a slower-healing wound is 70% for SNFs and 76% for home healthcare facilities, assuming that the healed event occurred within the first 180 days of the wound's existence. For home healthcare, this represents a large increase in concordance relative to BWAT, a widely used etiology-agnostic wound assessment tool, which is only about 65% concordant. The differences in concordance across settings also suggest that the models performed better under home healthcare compared to SNFs, though the reason for this observation is unclear. Across SNFs, the models performed better for pressure injuries compared to venous and diabetic wounds. This bias could be due to pressure injuries comprising approximately 70% of the total dataset used in training the models. Therefore, wound-specific models were developed by training prognostic models on datasets consisting of specific wound types. Comparing against wound-agnostic models, it was observed that wound-specific models offered statistically significant improvements, though the magnitude of improvement was minuscule (1-1.5%). However, the generalizability of wound-agnostic models across different wound etiologies far outweighs the improvement in performance offered by wound-specific models.
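The pairwise reading of concordance given above can be made concrete with a simplified sketch. It counts, over all comparable wound pairs, how often the wound that healed first (within the 180-day horizon) received the lower risk score for delayed healing. This is a plain Harrell-style estimate written for illustration under that assumption; it is not necessarily the estimator used to compute C t in this study, and the data are synthetic.

```python
def truncated_concordance(time_days, healed, risk_delayed, horizon=180):
    """Fraction of comparable pairs ranked correctly by the risk score.

    time_days:    days to healing, or to last follow-up if not healed.
    healed:       True if healing was observed, False if censored.
    risk_delayed: model-derived risk of delayed healing (higher = slower).
    """
    concordant = 0.0
    comparable = 0
    n = len(time_days)
    for i in range(n):
        # Wound i must have an observed healing event within the horizon.
        if not healed[i] or time_days[i] > horizon:
            continue
        for j in range(n):
            # Wound j is comparable if it was still unhealed when i healed.
            if time_days[j] > time_days[i]:
                comparable += 1
                if risk_delayed[i] < risk_delayed[j]:
                    concordant += 1.0
                elif risk_delayed[i] == risk_delayed[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Synthetic example: four wounds, the last one censored at day 200.
c_index = truncated_concordance(
    time_days=[10, 50, 120, 200],
    healed=[True, True, True, False],
    risk_delayed=[0.1, 0.3, 0.2, 0.9],
)
```

In this synthetic example, five of the six comparable pairs are ordered correctly. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.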
The model performance did not show any significant difference between sexes, thus establishing model generalizability across both males and females. Unfortunately, the model performance could not be characterized across a younger (< 30 years old) population due to insufficient data, as chronic wounds are rare in younger demographics. However, C t was computed for older populations per age group and the developed models were found to be adequate, with the best performing model resulting in C t ≥ 0.70. This establishes the fairness of our prognostic models with regard to demographic segments of sex and age. The generalizability of our models across different ethnicities remains to be tested in future work.
The model performance was characterized across time using the dynamic AUC^{I/D}, which was used to compare model performance against the standard wound assessment tools used across different clinical settings. The AUC^{I/D} computed for SNFs showed an initial increase in performance up to the first 21-28 days since the first evaluation of the wound, reaching a stable performance state up to 90 days and then slowly decreasing up to 180 days. Specifically, for Model 5,6 the AUC^{I/D} ranged between 0.71 and 0.72 within the first 20-120 days of evaluation, indicating that on any day t between 20 and 120 days of evaluation, the probability of a lower risk wound healing faster than a higher risk wound, as predicted by the prognostic models, is at least 0.71. Moreover, we observed that throughout the evaluated time range, our models performed significantly better than the PUSH scores. Similarly, under home healthcare settings, the prognostic models performed better than BWAT scores across time, though the results were not significant after 80 days of evaluation. The lower performance in the initial few weeks of evaluation at SNFs indicates that it is likely difficult to determine wound prognosis for relatively younger wounds using the features used in this study. However, this observation could not be corroborated at home healthcare facilities.
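To illustrate what a time-specific AUC of this kind measures, the sketch below compares wounds that heal around day t (incident cases) with wounds that are still unhealed afterwards (dynamic controls). Approximating the incident cases by a short time window, the window length of 7 days, and the function name are all assumptions made for this example; this is not the estimator used in this study, and the data are synthetic.

```python
def windowed_incident_dynamic_auc(time_days, healed, risk_delayed, t, window=7):
    """Approximate a time-specific AUC at day t.

    Cases:    wounds observed to heal in [t, t + window).
    Controls: wounds still unhealed at t + window.
    A pair is ranked correctly if the case has the lower risk of delay.
    """
    cases = [r for d, h, r in zip(time_days, healed, risk_delayed)
             if h and t <= d < t + window]
    controls = [r for d, r in zip(time_days, risk_delayed)
                if d >= t + window]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    score = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk < control_risk:
                score += 1.0
            elif case_risk == control_risk:
                score += 0.5  # ties in risk count as half
    return score / (len(cases) * len(controls))


# Synthetic example: five wounds, the last one censored at day 200.
auc_day_10 = windowed_incident_dynamic_auc(
    time_days=[10, 12, 50, 120, 200],
    healed=[True, True, True, True, False],
    risk_delayed=[0.1, 0.4, 0.3, 0.2, 0.9],
    t=10,
)
```

Evaluating such a quantity on a grid of days yields a performance curve over time, comparable in spirit to the curves discussed above.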

V. FUTURE WORK
Future analysis will explore the feasibility of objectively determining features related to wound burden, such as wound edges, exudate amount and exudate types, and their prognostic abilities. In addition, future models would leverage machine learning techniques that model non-linear interactions between features, such as random survival forests [44] and DeepSurv [45]. This will lead to the development of a completely objective, more accurate and faster prognostic tool. We will also explore the generalizability of the reported models across different ethnicities, which is absent from the current analysis due to the unavailability of the information.
Moreover, the current analysis does not consider the impact of prognostic indices on clinical interventions. As future work, we aim to develop a decision support system, as shown in Fig. 11, that can assist clinicians in determining the optimal interventions needed to heal wounds more quickly. Given that wound care outcomes depend on clinician skill and experience, being able to suggest clinical interventions that could alter the risk trajectory may provide significant clinical value and make wound care more equitable.

VI. CONCLUSION
In conclusion, this study demonstrates the feasibility of developing prognostic tools based on objective information derived from wound imaging. The feature importance investigation equips researchers with a guide to develop a more accurate, objective prognostic tool. Furthermore, this study presents prognostic models that are superior to the standard wound assessment tools used in the field today across different clinical settings, wound etiologies, sexes, age groups and time. This indicates that the inclusion of the newly developed prognostic tools into standard wound care practice can aid in accurate and faster detection of high-risk wounds, assisting clinicians in better decision making and improving outcomes.

Fig. 3. Wound locations are mapped across six color coded body locations.
X_n) are set to zero and b_1, b_2, ..., b_n are the model weights that quantify the association between covariates and the hazard function. The model weights are computed by maximizing the partial likelihood, which enables the modelling of the effects of the covariates without the need to model the hazards over time.

Fig. 4. Feature importance as determined using the Adequacy Index. The selected features are shown with an asterisk (*).
1 vs. Model 3, Model 1 vs. Model 4, Model 3 vs. Model 4 and Model 5 vs. Model 6. It can be seen that Model 5 and Model 6 were the best performing models across different clinical settings and wound types, and Model 2 performed the worst.

Fig. 5. C t across different models and PUSH score for Pressure Injuries at SNFs.

Fig. 6. C t across different models and BWAT scores for Pressure Injuries, Venous, Arterial and Diabetic wounds at home healthcare facilities.

Fig. 7. Dynamic AUC^{I/D} across different models and different clinical settings.

Fig. 8. C t for male and female demographic across different models at SNFs.

Fig. 9. C t per age group across different models at SNFs. Note that C t was not computed for age < 30 years due to the absence of sufficient data.

Fig. 10. Comparison of wound-agnostic and wound-specific models across wound types.
Table I, and the distribution of patient sex and age is shown in Table II and Fig. 2, respectively.

TABLE I. TOTAL WOUND SERIES PER WOUND TYPE ACROSS SKILLED NURSING FACILITIES (SNF) AND HOME HEALTHCARE FACILITIES (HHF)

TABLE III. SUBJECTIVE FEATURE LIST AND FEATURE ENGINEERING TRANSFORMATIONS