Impact of Sleep and Training on Game Performance and Injury in Division-1 Women’s Basketball amidst the Pandemic

We investigated the impact of sleep and training load of Division-1 women’s basketball players on their game performance and injury prediction using machine learning algorithms. The data were collected during a pandemic-condensed season with unpredictable interruptions to the games and athletic training schedules. We collected data from sleep monitoring devices, training data from coaches, injury reports from medical staff, and weekly survey data from athletes for 22 weeks. With proper data imputation, an interpretable feature set, data balancing, and classifiers, we showed that we could predict game performance and injuries with more than 90% accuracy. More importantly, our F1 score of 0.94 for game performance and F2 score of 0.83 for injuries show that the predictions can support informative analysis that helps coaches make insightful decisions in the future. Our data analysis also showed that collegiate athletes sleep less than the recommended hours (6-7 instead of 8 hours). This, coupled with a long hiatus in games and training, increases the risk of injury. Varied training and higher heart rate variability (due to better quality sleep) indicated better performance, while athletes with poor sleep patterns were more prone to injuries.


I. INTRODUCTION
SPORTS data analytics has been gaining significant attention through collegiate and professional leagues and Esports. Any insights regarding athletic performance can make a difference in games in terms of athletic performance or preventing injuries, ultimately determining the overall success of a team. This transdisciplinary research has brought athletes, athletic trainers, exercise scientists, engineers, and data scientists together to investigate a Division-1 basketball team's season where COVID-19 caused unprecedented disruptions. Using daily sleep patterns of the athletes combined with subjective training and survey data, this project positioned itself in a way that game performance and injuries could be predicted with machine learning (ML) methods.

A. ATHLETIC PERFORMANCE
Sleep and recovery strongly influence physical qualities such as strength, anaerobic power, flexibility, and physical performance [1]. Additionally, sleep is known to affect cognitive, emotional, and mental performance [2]. Athletes, in general, need more than eight hours of sleep to feel rested; however, their average hours of sleep fall below seven hours a night due to academic, social, and athletic commitments [3]. On the other hand, sleep extension beyond 8 hours per day for several weeks was shown to improve shooting accuracy for a collegiate men's varsity basketball team and reaction time for college varsity tennis players [4,5].

B. ATHLETIC INJURIES
Collegiate athletes are reported to spend up to 41 hours per week on athletics, resulting in inadequate time for restorative sleep [6]. It has been reported that sleeping less than 8 hours per night resulted in higher injury rates in adolescent athletes [7,8]. Furthermore, a simultaneous increase in training load, training intensity, and decreased hours of sleep resulted in a significantly higher risk of injury in this population [9]. Limited research exists addressing the association between sleep and injury rate, and data is limited to the use of subjective questionnaires to characterize sleep metrics [10].

C. MACHINE LEARNING METHODS IN BASKETBALL
Modern basketball analytics focus on action-oriented and proactive strategies [11]. ML in basketball can help roster composition, gameplay strategies, and player flow during a game [12]. Commonly, classification, regression, and clustering techniques are used in basketball analytics [13]. For example, the impact of a player's game performance on win probability was carried out by using the Bayesian linear regression model [14]. In addition, Wang et al. [15] used neural networks on basketball videos to classify offensive strategies such as player movement and location, in-bound plays, etc. Similarly, with the goal of improving the team performance, Piette et al. [16] used network analysis to find the path during the gameplay that generated maximum points and predicted the outcome.
On the other hand, injury prediction is challenging due to the multidimensional parameters coming from psychological, social, nutritional, and training workload [17]. Data scientists have used artificial neural networks (ANN), Bayesian networks, decision trees, clustering, and Support Vector Machines (SVM) for injury prediction of a player [18]. Talukder et al. [19] utilized game data, workload, and team schedules with random forest and an autoregressive model with logistic regression to predict player injury. Knee injury and heart defect detections have also been explored using ML [20,21].

D. COVID IMPACT IN ATHLETICS
COVID-19 has caused the cessation of daily activities by teams and athletes in sports. Athletes missed technical, tactical, and psychological preparation, which disrupted injury prevention, nutrition, and recovery programs [22], potentially raising the risk of injury [23]. Grazioli et al. [24] found that after 63 days of quarantine, professional soccer players lost lean body mass and fat mass, resulting in degraded sprinting ability and lower body power. It has been suggested that athletes must be gradually exposed to physical training after detraining periods [25].

E. PROPOSED WORK
This holistic study incorporated training data from strength coaches, sleep and recovery information from a wrist-worn device, and perception questionnaires from the Women's Division-I basketball team to create a platform for predicting game performance and injury risk assessment. Our research questions (RQs) were:
RQ1: Which training, sleep, and survey data features are more important for the following? (a) Game performance (b) Injuries
RQ2: Can we predict injury and game performance based on training, sleep, and athletes' perception for data collected during a pandemic?
Considering the above RQs, the paper focuses and contributes in the following directions:
• Selection of an interpretable feature set: We have performed intensive feature selection to reduce the dataset complexity and created an interpretable feature set containing 12 features for game performance and 14 for injury risk prediction.
• Game performance and injury prediction: We applied ensemble machine learning algorithms on a reduced data set based on feature importance to predict game performance and injury with an F1 score of 0.94 and F2 score of 0.83, respectively.
• Impact of pandemic on injuries: We collected data during the pandemic, where team schedules were very different from previous seasons due to game date changes, cancellations, and quarantine protocols.

A. DATASET
The Division-1 Women's Basketball team at Sacred Heart University were the subjects of this study (Institutional Review Board approval number 170720A on 9/14/2020). The methodology of the data collection is outlined in Fig. 1. The following data from 16 players were collected for 25 weeks:
(1) Volume load, monotony, and strain from resistance and metabolic conditioning training, 3 times a week (3/week).
(2) Daily (7/week) sleep schedule and heart rates of the athletes via WHOOP strap [26].
(3) The athletes' perception of recovery and stress levels, measured with subjective surveys twice a week (2/week, Mondays and Thursdays).
The collected data were evaluated against game statistics and injury reports for developing ML models to predict game performance and injuries. Fig. 2 shows a snapshot of the dataset. It contains a few features from sleep, recovery, training, perception (survey), game performance, and injury [26].

B. QUANTIFICATION OF TRAINING LOAD
Training load was calculated as a weekly composite score by summing the total work completed in sports training, metabolic conditioning, resistance training, and gameplay.
Following each session, a session rating of perceived exertion (sRPE) [27] was calculated by multiplying the time taken to practice by the athletes' subjective rating of perceived effort (1-10 scale) [28,29]. Total week load and weekly standard deviation (SD) were calculated by summing sRPE for the week and calculating the SD. Weekly resistance training load was calculated by summing the total weight lifted during the week. Monotony was calculated by taking the mean daily load and normalizing it by the weekly SD of training load. Finally, training strain was calculated by taking the total weekly load and multiplying it by the monotony score. Overall, 6 data features were collected weekly [30].
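The weekly load quantities described above can be sketched in a few lines. This is a minimal illustration rather than the study's actual pipeline: the function names, the example session values, the use of the sample standard deviation, and the `monotony_if_no_variation` fallback are all assumptions introduced here.

```python
from statistics import mean, stdev


def session_rpe(duration_min, rpe):
    """Session load (sRPE): practice duration times perceived effort (1-10)."""
    return duration_min * rpe


def weekly_load_metrics(daily_loads, monotony_if_no_variation=None):
    """Weekly total load, SD, monotony, and strain from daily sRPE loads.

    daily_loads: one load value per day of the week (0 on non-training days).
    monotony_if_no_variation: hypothetical fallback used when the weekly SD
    is 0, because mean / SD is undefined in that case.
    """
    total = sum(daily_loads)
    sd = stdev(daily_loads)  # assumption: sample SD; the text does not specify
    if sd == 0:
        monotony = monotony_if_no_variation
    else:
        monotony = mean(daily_loads) / sd  # mean daily load normalized by SD
    strain = None if monotony is None else total * monotony
    return {"total": total, "sd": sd, "monotony": monotony, "strain": strain}


# Hypothetical week: three sessions (minutes, RPE) and four rest days.
week = [session_rpe(90, 6), 0, session_rpe(60, 7), 0, session_rpe(75, 5), 0, 0]
metrics = weekly_load_metrics(week)
```

A week with identical daily loads has an SD of zero, so the caller has to decide explicitly how monotony should be valued in that case.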

E. GAME PERFORMANCE
To quantify a player's performance in a game, we used the game score, a statistic developed by John Hollinger [36]. The game score roughly measures a player's productivity in a game [37]. As shown in Eq. 1, points scored (+), steals (+), and turnovers (-) contribute directly, whereas other game statistics are weighted.
\[
\text{Game Score} = \text{PTS} + 0.4\,\text{FG} - 0.7\,\text{FGA} - 0.4\,(\text{FTA} - \text{FT}) + 0.7\,\text{ORB} + 0.3\,\text{DRB} + \text{STL} + 0.7\,\text{AST} + 0.7\,\text{BLK} - 0.4\,\text{PF} - \text{TOV} \tag{1}
\]
Here PTS denotes points, FG field goals made, FGA field goal attempts, FT free throws made, FTA free throw attempts, ORB and DRB offensive and defensive rebounds, STL steals, AST assists, BLK blocks, PF personal fouls, and TOV turnovers. The game score holistically looks at all parameters of a box score, such as offensive play, defensive rebounding, steals, blocks, and turnovers, to measure the player's impact in that game. The parameters benefiting the team contribute positively to it, while committing fouls or turnovers has a negative impact. A game score of 10 is average and 40 is considered exceptional. The game score is simple to compute from easily available parameters, can be explained well, and statistically quantifies the performance of a player. Therefore, we use it in the present work to measure a player's performance.
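As a worked illustration, the game score can be computed directly from a box-score line. The sketch below uses the weights of Hollinger's game score as it is commonly published; the function name and the example stat line are hypothetical and are not taken from the study's data.

```python
def game_score(pts, fg, fga, ft, fta, orb, drb, stl, ast, blk, pf, tov):
    """Hollinger game score from one player's box-score line.

    Points, steals, and turnovers enter with unit weight; the remaining
    statistics are weighted (commonly published weights, assumed here).
    """
    return (
        pts
        + 0.4 * fg          # field goals made
        - 0.7 * fga         # field goal attempts
        - 0.4 * (fta - ft)  # missed free throws
        + 0.7 * orb         # offensive rebounds
        + 0.3 * drb         # defensive rebounds
        + stl
        + 0.7 * ast
        + 0.7 * blk
        - 0.4 * pf          # personal fouls
        - tov
    )


# Hypothetical stat line for a single game.
example_score = game_score(
    pts=20, fg=8, fga=15, ft=3, fta=4,
    orb=2, drb=5, stl=2, ast=4, blk=1, pf=3, tov=2,
)
```

For this hypothetical line the result is 17.5, which sits above the average value of 10 mentioned in the text.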

F. INJURY DATA
Injury data were extracted from medical injury reports generated as injuries occurred. After the season, all original injury reports were shared with the research team, subject numbers replaced player names to ensure confidentiality, and data of interest were extracted for data analysis. This data included the date of injuries, occurrence of injuries (contact or noncontact), types of injuries (e.g., leg, shoulders, strain, tear, etc.), and how long the athlete was not available for practice or games.

III. DATA IMPUTATION
Sleep data and questionnaire responses were not completely available for the following reasons:
1) Incomplete WHOOP strap data due to improper attachment to the wrist (hence no data collection), neglecting to charge the strap, loss of the charger, and not keeping the WHOOP app running in the background on the phone to allow data to upload to the research server.
2) Incomplete surveys, as the athletes did not always complete them, especially during COVID lockdown and in-season (when games were played).
Even though the game performance and injury data were only available when applicable, missing data from sleep and surveys constituted a source of bias that could negatively impact the ML algorithms. The choice of an appropriate imputation technique could subsequently ease feature selection and classification analyses. In this work, we utilized data imputation techniques to fill in the missing data. Single imputation techniques such as global mean, cluster mean, conditional local mean, and K-Nearest Neighbor (KNN) fail to reflect the uncertainty about imputed values [38]. Therefore, a multiple imputation approach called Multivariate Imputation by Chained Equations (MICE) was utilized [39]. This method imputes each feature in the dataset sequentially, allowing previously imputed values to be used in the model that predicts the following features. MICE is effective for datasets with a high missingness rate (>50%). It has also been observed that datasets on breast cancer, heart disease, the 26 capital letters of the English alphabet, and spam emails, when imputed with MICE, yielded the best feature selection compared with a single imputation technique or no imputation [40]. Each of these datasets had a large number of missing values, included categorical and numerical features, and came from a different field. Imputation power can be improved by increasing the number of imputations for a small dataset [41].
A key finding suggests that the proportion of missing data should not be the criterion for deciding whether to use multiple imputation. Correctly specified MICE can reduce bias and improve analyses for any proportion of missingness in data that are Missing at Random (MAR) [42]. In other words, using MICE is beneficial for all proportions of missing data, provided the imputation model is correctly specified and includes all variables related to missingness. While the MICE imputer has proven effective, we cannot conclude that it is effective in all situations. Graham et al. [41] suggested that the number of imputations depends on the dataset size, the models used to impute, the missingness rate, and computing resources.
Applying MICE reduces the bias during the subsequent feature selection [40]. The effectiveness of the MICE imputer is demonstrated in Table 1, which shows statistical measures of the observed and imputed values. The percentage bias (PB, ideally <5%) is 1.2% for Resting Heart Rate (RHR), 0.6% for Wake Periods, and 0.0004% for Physical game Performance Capability (PPC), indicating effective imputation. Table 2 shows the imputation percentages after MICE implementation. A total of 2800 data entries (rows) is possible from 16 athletes over 7 days a week for 25 weeks. It is important to mention that injury and game performance data were not imputed in order to avoid introducing bias. There were a total of 165 games and 11 injury incidents during the 25-week period. Moreover, training load data were not imputed either: non-training days were entered as 0, except for monotony, which is a weekly measure and was entered as its maximum value to reflect a worst-case scenario. Monotony characterizes the variation in daily training activities and is calculated per week, so weeks with no training (which happened during the pandemic-related shutdowns) were treated as the worst-case scenario. In these weeks there was no training and therefore zero variety, resulting in a monotony of 1, the maximum value. This valuation keeps the data imputation consistent and does not introduce bias.
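The percentage bias check can be illustrated with a short sketch. The text does not give the PB formula, so the definition below (absolute difference between the mean after imputation and the observed mean, relative to the observed mean) is an assumption, and the resting heart rate values are hypothetical.

```python
from statistics import mean


def percent_bias(observed, completed):
    """Percentage bias of the completed-data mean relative to the observed mean.

    observed: values that were actually recorded.
    completed: the same values plus the imputed ones.
    Assumed definition: 100 * |mean(completed) - mean(observed)| / |mean(observed)|.
    """
    obs_mean = mean(observed)
    return 100.0 * abs(mean(completed) - obs_mean) / abs(obs_mean)


# Hypothetical resting heart rates (bpm): four observed, two imputed.
observed_rhr = [58.0, 60.0, 57.0, 59.0]
completed_rhr = observed_rhr + [58.9, 59.3]
pb = percent_bias(observed_rhr, completed_rhr)

# Maximum number of daily rows: 16 athletes x 7 days x 25 weeks.
max_rows = 16 * 7 * 25
```

With these hypothetical values the PB is about 0.34%, well under the 5% threshold quoted in the text.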

IV. FEATURE IMPORTANCE, INTERACTION EFFECT ANALYSIS
A. FEATURE IMPORTANCE
Feature importance analysis was conducted to address our RQ1. With this analysis, we intended not only to reduce the dimensionality of the dataset but also to understand how significantly the features contributed to game performance and injury, respectively. Feature importance techniques explain the contribution of an individual feature of the dataset to a decision made by an ML algorithm, ultimately yielding insights from the model [43,44]. Numeric datasets such as ours primarily utilize regression models for analysis, where feature saliency or importance is used [45]. Not only does feature importance help explain the significance of each feature to the model, but it also mitigates the curse of dimensionality [46] by suggesting features that the model may drop. The curse of dimensionality means that, in machine learning, the number of training pairs needed to estimate a function grows exponentially with the dimensionality of the input features [46]. It becomes a problem when a dataset has too many input features and very few observations, and it can lead to overfitting when features outnumber observations. The problem can be addressed by reducing the dimensionality of the input, retaining only the most important features through mechanisms such as feature importance. Game performance and injury were treated as two separate target variables for feature importance analysis. For each target variable, we applied correlation-driven, Extreme Gradient Boosting (XGBoost), and Random Forest (RF) [47] techniques one by one to the data and obtained a ranked importance of features. The suitability of these techniques is discussed as follows:
(1) Correlation (CORR)-driven feature importance: High correlation depicts linear dependency between attributes, which helps with feature importance and with removing redundancy, and hence bias, in the dataset [48].
(2) XGBoost: Boosted decision trees were constructed within the model.
The importance of features was given a score calculated by how much an attribute contributed to making critical decisions with the decision trees [49]. For one decision tree, importance was calculated based on how much an attribute split improves the Gini index, weighted by the number of records at the node. In the end, the feature importance was averaged over all the trees [49].
(3) RF: This method is a bagging technique that uses multiple decision trees as base models and arrives at a consensus across trees rather than relying on a single tree for the decision. RF provides interpretability, as it is easy to compute the contribution of features to the decision. RF measures a feature's importance by the change in accuracy or Gini index when the feature is excluded from the decision. While RF trees are built in parallel and their results aggregated, XGBoost trees are built sequentially.
Fig. 3 is the correlation heat map of all sleep and recovery, training load, perception, game performance, and injury-related features. The first 8 labels on the map refer to the survey results, which were explained in Section II.D. The following 6 parameters are associated with training (Section II.B), while the next group contains the recovery parameters from the WHOOP data. The last two parameters are the output parameters: game performance and injury. Areas demarcated by black rectangles from left to right show the dataset groups of survey, training, and sleep, respectively. Dark red or blue colors represent high correlation (positive and negative, respectively), and it can be seen that several correlations exist between data groups, such as training load and the survey data on stress and recovery.
VOLUME 4, 2016
For each target variable (game performance and injury), the top features from the sorted set were selected by taking a weighted rank-sum (hard voting) of the CORR, XGB, and RF rankings for each attribute. We further dropped features with high correlation based on the correlation matrix to lower the bias. For instance, sleep disturbances were dropped while wake periods were retained, as these two features are highly correlated (Fig. 3). Game statistics used to calculate the game score were not included in the analysis to avoid introducing bias. To reduce the feature set, we predicted game performance and injury using an RF regressor and an RF classifier, respectively, with feature sets of different sizes. Using forward addition and backward elimination of features based on their significance in the prediction, we tracked the Mean Squared Error (MSE) and R-squared (R²) for game performance and the F2 score for injuries. We found the lowest MSE (0.17) and highest R² (0.89) with 25 features for game prediction, and the highest F2 score was achieved with 29 features for injuries; hence these features were selected for prediction analysis. Table 3 shows the top 10 contributors to game performance and injury. Strain and Rapid Eye Movement (REM) sleep are the most significant contributors to game performance and injury, respectively. Both strain and REM sleep were assessed using daily collected values.
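The rank-sum voting across the three importance methods can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code: the equal default weights, the function name, and the importance scores are assumptions, and every method is assumed to score the same set of features.

```python
def rank_sum_vote(importances, weights=None):
    """Combine per-method feature importances by a weighted sum of ranks.

    importances: {method: {feature: score}}, higher score = more important.
    weights: optional {method: weight}; equal weights are assumed by default.
    Returns the features ordered from most to least important overall.
    """
    methods = list(importances)
    if weights is None:
        weights = {m: 1.0 for m in methods}
    totals = {}
    for m in methods:
        scores = importances[m]
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, feature in enumerate(ordered, start=1):  # rank 1 = best
            totals[feature] = totals.get(feature, 0.0) + weights[m] * rank
    # A smaller rank sum means the feature ranked highly across methods.
    return sorted(totals, key=totals.get)


# Hypothetical importance scores from the three methods.
scores = {
    "CORR": {"strain": 0.60, "hrv": 0.40, "rem": 0.20},
    "XGB": {"strain": 0.50, "rem": 0.30, "hrv": 0.20},
    "RF": {"hrv": 0.45, "strain": 0.35, "rem": 0.20},
}
ranking = rank_sum_vote(scores)
```

Voting on ranks rather than raw scores keeps the three methods comparable even though their importance values are on different scales.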

B. INTERACTION EFFECT
When multiple features significantly contribute to the prediction of a dependent feature, their joint effect on the prediction is sometimes significantly greater than the sum of the parts. This is called the interaction effect [50]. Interaction modeling helps explain how two features work together to impact a dependent feature. It also better represents the relationship between dependent and independent features [50]. In a complex study such as the present one, even the independent variables might interact with each other. It is important to find how a third variable influences the relationship between an independent and a dependent variable. If the real world behaves this way, it is critical to incorporate interactions in the modeling. For example, in the present work the relationship between game performance and RHR probably depends on the respiratory rate, and this knowledge helps during dimensionality reduction. Interaction modeling was performed over the dataset for game performance prediction and injury prediction individually. After removing multicollinearity from the dataset, pairwise interaction effects were checked for all the features. Forward addition and backward elimination of features were used while checking the significance level of the polynomial interaction terms on the dependent feature. As before, we fitted an RF regressor and an RF classifier, respectively, over datasets containing different combinations of significant features and interaction terms and monitored the MSE and R² for game performance and the F2 score for injuries. We found the lowest MSE (0.09) and highest R² (0.95) with 12 features for game prediction, and the highest F2 score was achieved with 14 features for injuries. We thereby concluded that the best feature sets for game performance and injuries consisted of 12 and 14 features, respectively. Table 4 shows the features and interaction terms in the optimal dataset for game performance and injury.
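Pairwise interaction terms of the kind discussed above are usually formed as products of two features. The sketch below shows only that construction step, under the assumption that the interaction terms are simple products; the significance testing and feature elimination are not shown, and the feature names and values are hypothetical.

```python
from itertools import combinations


def add_pairwise_interactions(rows, features):
    """Return copies of rows extended with product terms for each feature pair.

    rows: list of dicts mapping feature name to numeric value.
    features: names of the features to combine pairwise.
    """
    extended = []
    for row in rows:
        new_row = dict(row)
        for a, b in combinations(features, 2):
            new_row[f"{a}*{b}"] = row[a] * row[b]  # interaction as a product
        extended.append(new_row)
    return extended


# Hypothetical daily record for one athlete.
records = [{"rhr": 58.0, "resp_rate": 16.0, "strain": 10.0}]
with_interactions = add_pairwise_interactions(
    records, ["rhr", "resp_rate", "strain"]
)
```

With n input features this produces n(n-1)/2 candidate terms, which is why a selection step afterwards matters for a dataset of this size.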
There were 38 features considered in all for this analysis: 22 features from sleep monitoring using the WHOOP strap, 8 features from survey data, 6 features from quantification of training load, 1 feature from game performance, and 1 feature from injury data. After the feature importance and interaction analysis, we selected 12 features for game performance prediction and 14 features for injury risk prediction:
• Game performance (12): 5 features from sleep monitoring, 1 from survey data, 2 from quantification of training load, 3 interaction features from sleep monitoring, and 1 interaction feature combining sleep monitoring and training load.
• Injury risk (14): 4 features from sleep monitoring, 1 from survey data, 4 from quantification of training load, 1 from game performance, 1 interaction feature from sleep monitoring, 2 interaction features combining sleep monitoring and game performance, and 1 interaction feature combining sleep monitoring and training load.

C. FEATURE INTERPRETABILITY
It is vital to interpret the impact of individual features on the target variables for game performance and injury. We used Partial Dependence Plots (PDP) to show the marginal effect on the target variables and analyzed each target's two most dominant features based on hard voting. As shown in Fig. 4, training strain and daily training load average are positively correlated with game performance. The PDP of strain indicates that if an athlete trains more, she is likely to perform better. Daily average is also a training feature, and a high value (>200) improves game performance. Therefore, the weekly training strain and daily training load average are associated with improvements in performance. Similarly, we analyzed the impact of the two most dominant features for injuries (based on hard voting): REM sleep percentage (REM%) and Respiratory Rate (RR). REM sleep is a restorative sleep stage and one of the four stages of the human sleep cycle. In this stage, the brain converts short-term memories into long-term ones. It is significant for a player because REM sleep helps her retain the technical skills learned that day. Spending 22%-28% of sleep time in the REM stage, i.e., about 90 minutes in approximately 7 to 8 hours of sleep, can help an athlete benefit from the day's practice [32]. The PDP shown in Fig. 5 indicates that low (<20%) and high (>30%) REM% increase the chances of injury. The RR is a reasonably stable metric, and 12-18 respirations per minute (RPM) at rest is considered normal, while any deviation indicates an unusual pattern [32]. The PDP shows that outside the typical range, sleep disturbances increase, which increases the chances of injury.
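A partial dependence curve is obtained by fixing one feature at each grid value for every record and averaging the model's predictions. The sketch below shows that computation with a hypothetical stand-in predictor; `toy_predict`, its coefficients, and the records are illustrative assumptions and not the fitted model from this study.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each value in `grid`.

    predict: callable taking one record (dict) and returning a number.
    rows: list of records (dicts) from the dataset.
    """
    curve = []
    for value in grid:
        # Overwrite the feature in every record, keep the others unchanged.
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve


def toy_predict(row):
    """Hypothetical predictor standing in for a trained model."""
    return 0.01 * row["strain"] + 0.5 * row["hrv"]


rows = [{"strain": 100.0, "hrv": 2.0}, {"strain": 300.0, "hrv": 4.0}]
pdp_curve = partial_dependence(toy_predict, rows, "strain", [0.0, 200.0])
```

Plotting the grid against the returned averages gives the marginal-effect curve; here it rises with strain only because the toy predictor was written that way.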

V. PREDICTION OF GAME PERFORMANCE & INJURIES
RQ2 aims at predicting the players' game performance via the game score and at assessing injury risk through injury prediction. These two features in the optimal datasets (12 features for game performance and 14 for injury) were set as the target variables, and ML models were applied to them individually. Data balancing techniques were used due to the small number of game performance and injury observations. The proportion of missing data is directly related to the quality of the inference and statistical analysis. However, there is no established norm on an acceptable missingness rate [51]. In this work, the MICE imputer was able to neutralize the impact of high data missingness during training of the ML model. Due to the limited number of games and injuries compared to the entire dataset, the data are scarce and the prediction can have a significant bias, as is the case in health sciences [52]. A problem with imbalanced classification is that there are too few examples of the minority class for a model to effectively learn the decision boundary. Therefore, there is a need to increase their numbers. An effective way to overcome this problem is to synthesize new examples from the minority class using balancing techniques. These balancing techniques perform undersampling and oversampling of the data, which improves the performance of the decision-tree-based algorithms applied for prediction [53]. Preprocessing using Synthetic Minority Over-Sampling with Gaussian Noise (SMOGN) was performed for data balancing on the game performance data [54]. SMOGN was not applied to injury, as injury labels are binary. Pre-processed game performance data and original injury data were then evaluated with the Synthetic Minority Oversampling Technique (SMOTE) [55].
Since SMOTE may introduce noise by increasing the probability of class overlap, game performance data were evaluated with a combination of SMOTE and Edited Nearest Neighbor (ENN), which removes any newly generated sample whose class differs from that of at least two of its three nearest neighbors [56]. Borderline SMOTE creates synthetic samples along the boundary between the majority and minority classes in the dataset [57], which is more suitable for injury prediction models.
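The core idea shared by the SMOTE variants is interpolation between a minority sample and one of its nearest minority neighbors. The sketch below implements plain SMOTE only, as an illustration; it is not Borderline SMOTE, SMOTE with ENN, or SMOGN, and the function name, parameters, and sample points are assumptions.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by neighbor interpolation.

    minority: list of feature tuples from the minority class (at least 2).
    k: number of nearest minority neighbors to choose from.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Nearest minority neighbors of x by squared Euclidean distance.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbor = rng.choice(others[:k])
        u = rng.random()  # position along the segment from x to neighbor
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, neighbor)))
    return synthetic


# Hypothetical two-feature minority samples.
minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.5), (3.0, 3.0)]
new_points = smote(minority, n_new=5)
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region already occupied by the minority class.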
The k-means clustering algorithm was applied to the game score to create clusters with bad, average, and good game performance labels. Similarly, the injury records were clustered into two labels, injury and no-injury. We used the XGBoost classifier for both predictions due to its gradient boosting framework and generalization capabilities [58].
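Turning a continuous game score into three performance labels with k-means can be sketched in one dimension. This is a simplified illustration with a deterministic initialization chosen here for reproducibility; the function name, the initialization rule, and the scores are assumptions rather than the configuration used in the study.

```python
def kmeans_1d(values, k=3, iters=100):
    """Cluster scalar values with k-means; return each value's cluster rank.

    Rank 0 is the cluster with the lowest center, rank k-1 the highest.
    """
    pts = sorted(values)
    # Assumed deterministic init: evenly spaced order statistics.
    centers = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]

    def nearest(v):
        return min(range(k), key=lambda c: abs(v - centers[c]))

    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            clusters[nearest(v)].append(v)
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # assignments have stabilized
            break
        centers = new_centers

    order = sorted(range(k), key=lambda c: centers[c])
    rank_of = {c: r for r, c in enumerate(order)}
    return [rank_of[nearest(v)] for v in values]


# Hypothetical game scores for eight player-games.
scores = [2.0, 3.5, 4.0, 10.0, 11.0, 12.5, 24.0, 26.5]
names = ["bad", "average", "good"]
labels = [names[r] for r in kmeans_1d(scores)]
```

Ordering the clusters by their centers is what lets the ranks be mapped onto the bad, average, and good labels.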
Due to the low number of available target records in game performance and injury, accuracy alone cannot evaluate the model performance. Therefore, we used F1 and F2 scores to evaluate the classifier's effectiveness (Fig. 6). The choice of the F1 score for predicting game score labels stems from the fact that False Positives (FP) and False Negatives (FN) are equally costly for game performance, and F1 balances them during evaluation. On the other hand, F2 focuses more on minimizing FN. It is crucial to predict as many injuries as possible before the game, which is captured well by F2. The F measure, i.e., F(β), is given as follows:
\[
F(\beta) = \frac{(1 + \beta^2)\, P\, R}{\beta^2 P + R}
\]
where β = 1 or 2 represents F1 or F2, P stands for precision, and R stands for recall. The effectiveness of a model increases as the F1 or F2 score approaches 1. Stratified 10-fold cross validation was used during model training. It ensured that each fold had an approximately equal number of observations for each class. This helped the model generalize well, as the estimates would have a low bias. The final dataset, which was balanced using SMOGN followed by SMOTE with ENN, contained 72 records for game performance prediction after the oversampling and undersampling. Using the traditional 70:30 training-to-testing ratio, 51 records were used for training and 21 for testing. While there were 72 records for game performance, there were only 11 injury records, which were balanced only with Borderline SMOTE, without oversampling. The entire dataset of 352 records was divided into 289 and 63 records, containing 9 and 2 injuries, respectively. The XGB classifier predictions resulted in an F1 score of 0.94 for game performance and an F2 score of 0.83 for injury. Values on the diagonal of each matrix in Fig. 6 represent correct predictions, while the cells labeled in red (1 for game performance and 2 for injury) are false negatives (errors in prediction).
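The F measure with precision P and recall R can be computed directly from confusion-matrix counts. The sketch below is a minimal illustration of the standard F-beta definition; the function name and the counts are hypothetical and do not correspond to the confusion matrices in Fig. 6.

```python
def f_beta(tp, fp, fn, beta=1.0):
    """F-beta score from true positives, false positives, false negatives.

    beta = 1 weights precision and recall equally (F1);
    beta = 2 weights recall more heavily (F2).
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 8 true positives, 1 false positive, 3 false negatives.
f1 = f_beta(8, 1, 3, beta=1)
f2 = f_beta(8, 1, 3, beta=2)
```

In this example recall is lower than precision, so F2 falls below F1, which shows how F2 penalizes missed positives (such as missed injuries) more strongly.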
While stratified 10-fold cross validation partitions the dataset into training and testing sets randomly, to take the subject-to-subject variations and the tendency of autocorrelations in the time series data belonging to a single subject into account, a leave-one-subject-out analysis was also performed [59]. In this approach, for each athlete, the data belonging to the rest of the athletes was used for training and that belonging to her was used for testing. Repeating the exercise for all athletes, an average F1 score of 0.94 was achieved using the XGB classifier for prediction of game performance and average F2 score of 0.80 for injury.
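The leave-one-subject-out scheme amounts to holding out all records of one athlete per iteration. The sketch below shows only the splitting logic; the record layout and the `athlete_id` key are hypothetical names introduced for illustration.

```python
def leave_one_subject_out(records, subject_key="athlete_id"):
    """Yield (subject, train, test) with one subject held out at a time.

    records: list of dicts, each carrying the subject identifier.
    """
    subjects = sorted({r[subject_key] for r in records})
    for subject in subjects:
        train = [r for r in records if r[subject_key] != subject]
        test = [r for r in records if r[subject_key] == subject]
        yield subject, train, test


# Hypothetical records: three athletes with a few daily rows each.
records = [
    {"athlete_id": 1, "rem_pct": 24.0},
    {"athlete_id": 1, "rem_pct": 21.5},
    {"athlete_id": 2, "rem_pct": 27.0},
    {"athlete_id": 3, "rem_pct": 19.0},
    {"athlete_id": 3, "rem_pct": 23.0},
]
splits = list(leave_one_subject_out(records))
```

Since no athlete appears in both the training and the test portion of a split, autocorrelation within one athlete's time series cannot leak into the evaluation.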

A. GAME PERFORMANCE
A multifaceted approach was taken to the monitoring of game performance to quantify various pressures placed on the athletes. Several features were highlighted for importance (Table 3), including training strain, daily training load average, and HRV. In our study, the athletes averaged fewer sleep hours (6.8 hours per day) than recommended, which likely increased the sleep need and sleep debt during the data collection period. Sleep, both in amount and quality, is important for recovery and subsequent adaptations from training. This is corroborated by sleep need and sleep debt hours appearing in the top 10 features for game performance. Team average Resting Heart Rates (RHR) (58.70 bpm) and HRV fluctuations during this period may be reflective of the overall stress placed on the athletes from training, competition, and academic stress.
The second game performance monitoring approach was the measurement of subjective stress as reported by the athletes. While overall higher scores for recovery and lower scores for stress are desired, our analysis revealed that emotional balance (EB, a recovery metric) and lack of activation (a stress metric) reflected game performance, as they are ranked 4th and 6th in feature importance, respectively. As the student athletes in this study were training and competing during times that overlapped with academic pressures, emotional stress may have reflected the overall pressure placed on them by sports-related and academic domains. The observations are valid, as EB and overall stress features also appear in the top 10 features impacting game performance.
The final approach was the monitoring of the entire training and competition workload experienced by the athletes. The primary stresses placed on the athletes are both physiological and biomechanical. From the feature selection, the most predictive features were training strain and daily average training load. These measures reflect total work completed through the week and the variability of that work. Athletes should attempt to avoid large spikes in training volume and periods of little or no training. During this study, two periods of reduced or absent training were present due to COVID-19-related protocols. The lack of training for 7 to 14 days (a potential reduction in physical capabilities) followed by training or competition resulted in a decrease in lack of activation (LA), increased monotony, and decreased weekly SD, all of which showed up in the feature importance analysis. Based on these results, it can be suggested that, through consistent monitoring of workload, return-to-play protocols should be developed to reintegrate athletes back into training and competition slowly.

B. INJURY
Consistent, varied, and structured training is essential in the physical development of athletes. In this study, a total of 11 injuries occurred during or immediately following four separate shutdown periods, ranging from 10 to 16 days in length, in which no games were played. These frequent, unexpected shutdowns resulted in the athletes' acute (1-week) training load far exceeding what they had done over a several-week (chronic) period, similar to [60]. The majority of injuries in the present study were chronic/overuse injuries such as tendonitis, which aligns with the findings of a study on youth basketball players [61].
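The acute-versus-chronic comparison described above can be sketched as a simple ratio. The example below is illustrative only: the 4-week chronic window, the rolling-average form in which the chronic window includes the acute week, the function name `acute_chronic_ratio`, and the weekly loads are all assumptions rather than the study's actual computation.

```python
def acute_chronic_ratio(weekly_loads, chronic_weeks=4):
    """Ratio of the latest week's load to the mean of the last `chronic_weeks`.

    Assumes a rolling-average ("coupled") form where the chronic window
    includes the acute week. Returns None if the chronic load is zero.
    """
    if len(weekly_loads) < chronic_weeks:
        raise ValueError("not enough weeks for the chronic window")
    acute = weekly_loads[-1]
    chronic = sum(weekly_loads[-chronic_weeks:]) / chronic_weeks
    if chronic == 0:
        return None
    return acute / chronic


# Hypothetical weekly loads (arbitrary units).
steady = [2000, 2100, 1900, 2000]   # consistent training
shutdown = [2000, 0, 0, 2400]       # two-week shutdown, then full return

steady_ratio = acute_chronic_ratio(steady)
shutdown_ratio = acute_chronic_ratio(shutdown)
print(steady_ratio, shutdown_ratio)
```

In this hypothetical case, the return week after the shutdown is more than double the chronic load, whereas the steady schedule gives a ratio of 1.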
In addition to erratic and unstructured training, another factor that has been shown to increase injury prevalence is sleep dysfunction, which was also correlated with respiratory rate through recovery. Consistent with previous research, we found that our athletes were sleeping too little (mean: 6.8 hours; standard deviation: 1.5 hours). Sleep does appear to affect injury, as REM, awake hours, and sleep need all appeared among the top 10 features.

C. PREDICTION
The accuracies of the XGBoost classifier for game performance and injury prediction were 95.3% and 96.8%, respectively. However, these could be misleading due to the very low number of records for game score (165) and injuries (11) compared to the size of the sleep and training load data. Therefore, F measures were used to evaluate the performance of the prediction models. With an F1 of 0.94 and an F2 of 0.83, we believe our models can predict the game score and injury with reasonable confidence. The proportion of imputed data may affect both the predictions and the bias introduced by imputation. In the present case, we assumed that the data were Missing at Random (MAR), under which imputation produces unbiased data; this was supported by the low PB values. However, due to the large proportions of missing data (Table 2), the predictions must be considered insights or hypothesis-generating results rather than final decisions for coaches.
The 3 x 3 confusion matrix on the test set for game performance has only one FN in a three-cluster classification. The Hollinger game score [36] used to predict game performance is only a rough measure of a player's productivity. Thus, the predicted game performance provides only partial information to the coaches, and other parameters such as game contribution and practice effort can be considered for a more complete assessment of the true impact of a player during the season.
As seen in the 2 x 2 confusion matrix, the binary injury model predicted four injuries (2 True Negative (TN) and 2 FN) and 59 no-injury (all True Positive (TP), 0 FP). Here no-injury is the positive class, so an FP is an injured player predicted as not injured, and an FN is a non-injured player predicted as injured. In the injury case, an FN is preferable to an FP. The coaches prefer over-prediction of injuries because unknowingly playing an injured player (FP) can prove far more costly to the team than having a non-injured player predicted as injured (FN).
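As a worked check of how accuracy and the F measure relate to these counts, the sketch below recomputes both from the confusion matrix. It relabels the counts with injury as the positive class, which is an assumption (the text above treats no-injury as positive); the helper `fbeta` is illustrative and not the study's code.

```python
def fbeta(tp, fp, fn, beta):
    """F-beta score from raw counts; beta > 1 weights recall over precision."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Counts from the text, relabeled with "injury" as the positive class
# (an assumption made for this illustration).
tp = 2   # predicted injury, actually injured
fp = 2   # predicted injury, actually not injured
fn = 0   # predicted no-injury, actually injured
tn = 59  # predicted no-injury, actually not injured

accuracy = (tp + tn) / (tp + fp + fn + tn)  # 61 / 63, about 0.968
f2 = fbeta(tp, fp, fn, beta=2)              # 10 / 12, about 0.833

print(f"accuracy={accuracy:.3f}, F2={f2:.3f}")
```

Under this relabeling, the counts give an accuracy of about 96.8% and an F2 of about 0.83, consistent with the reported values, and they show why accuracy alone looks more favorable than the F measure when injuries are rare.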

VII. CONCLUSION
Due to the inherently chaotic nature of sports, coaches and scientists are constantly searching for insights related to an athlete's game performance and injury potential. This study attempted to quantify the game performance aspects associated with basketball while simultaneously understanding potential injury risks. Through a multi-disciplinary approach, the authors tried to quantify inputs (training, stressors, and perceptions) and outputs (game performance and injury) in a collegiate basketball setting. The game performance analysis revealed (RQ1 a): 1) questionnaire data, training load, HRV, sleep need, and sleep debt are associated with game performance; 2) collectively, a combination of questionnaire data, training load metrics, and WHOOP band metrics can provide insights into game performance.
Similarly, the analysis revealed that significant outcomes associated with injury are (RQ1 b): 1) Restorative sleep, respiratory rate, training monotony, training strain, and total minutes played; 2) A compressed competition schedule and training irregularities likely increased injury risk; 3) The athletes measured in this study slept less than the recommended amount for athletes.
Given the high injury rates previously reported in basketball, specifically collegiate basketball, it is imperative to further explore this relationship within the context of collegiate basketball to quantify training loads and allow clinicians to quantify the risk of subsequent injury. Collectively, these data highlight the need for a gradual reintroduction to activities for athletes following layoffs and for a consistent return-to-play protocol.
Our prediction models were reliable, with high F1/F2 scores that encourage their use during a basketball season, working with the coaches to close the loop on insightful decisions (RQ2). Regardless of whether we face another pandemic in the near future, our results pave the way for understanding the impact of lengthy breaks in a season or unforeseen player absences. As collegiate athletes struggle with a lack of sleep while meeting academic, athletic, and social responsibilities, we believe our data collection methodology, combined with a validated machine learning framework, will help improve the performance of athletes while reducing the risk of injuries.

ACKNOWLEDGMENT
The research team wants to thank the players and coaching staff for their cooperation. We also are grateful for the support