Correlation-Aware Sport Training Evaluation for Players With Trust Based on Mahalanobis Distance

With the widely adopted ideal of health and longevity, sports have become one of the most popular forms of public entertainment. In most sports, players need to know their concrete physical condition in real time in order to pursue a good score or ranking in a competition or race. Generally, this goal can be achieved by analyzing and evaluating the daily training scores of each player. However, players often undertake multiple kinds of physical training, and various correlations exist among them. These correlations significantly decrease the fairness and trust of player training score evaluation and ranking, since traditional multi-dimensional data integration solutions are often based on a strong hypothesis, i.e., that the involved dimensions are independent of each other. In view of this shortcoming, we introduce Mahalanobis Distance into multi-dimensional player training score evaluation and propose a correlation-aware player training score evaluation method with trust based on Mahalanobis Distance (abbreviated as CPE_MD). As Mahalanobis Distance can eliminate the hidden linear correlations among the involved dimensions, we can guarantee the fairness and trust of the Mahalanobis Distance-based player training score evaluation and ranking results. Finally, we use a case study to show the feasibility of CPE_MD.


I. INTRODUCTION
With the continuous progress of social reform and economic development, people's living conditions have improved considerably [1]-[3]. In this situation, more people are paying attention to improving their health and quality of life. With the widely adopted ideal of health and longevity, sports have become one of the most popular forms of public entertainment and a way to maintain good physical condition in daily life [5]-[7]. As a result, more and more people are engaging in various sport activities and have gained another identity or role, i.e., that of so-called players.
In most sports, players need to know their concrete physical condition in real time in order to pursue a good score or ranking in a competition, a race or an exercise. Generally, this goal can be achieved by analyzing and evaluating the daily training scores of each player. However, players often undertake multiple kinds of physical training, such as football, running, swimming, volleyball, tennis and so on. To get a comprehensive sport score for a player, we often need to calculate the sum of the scores corresponding to the multiple sport items. Then, through the derived comprehensive score of each player, we can evaluate the practical physical conditions of players and rank them accordingly if necessary.
The associate editor coordinating the review of this manuscript and approving it for publication was Chi-Hua Chen.
However, the above comprehensive score calculation solutions face a major challenge: they often assume that the involved sport training scores are independent of each other. In practice, the training scores of different sport items are often correlated [8]-[11]. For example, a player who swims well often has a high vital lung capacity and is therefore likely to get a higher training score in running. Likewise, a taller player often plays better than a shorter player in volleyball and basketball because of the height advantage. Therefore, it is inappropriate to evaluate the comprehensive score of a player by directly summing the scores of his or her sport items, since the hidden correlations among different sport training scores may decrease fairness and trust.
Considering this challenging issue, we introduce Mahalanobis Distance into the multi-dimensional player training score evaluation problem and propose a correlation-aware player training score evaluation method with trust (abbreviated as CPE_MD) based on Mahalanobis Distance. As Mahalanobis Distance can eliminate the hidden linear correlations among the multiple dimensions involved in the player training score evaluation process, we can guarantee the fairness and trust of the player training score evaluation and ranking results output by the CPE_MD method.
In summary, the contributions of this paper are threefold.
(1) We study the hidden correlations among the multiple training scores of players and regard them as a key factor that influences the objectiveness, fairness and trust of player sport physique evaluation.
(2) We introduce Mahalanobis Distance into the multi-dimensional player training score evaluation problem and propose a correlation-aware player training score evaluation method with trust, i.e., CPE_MD.
(3) A case study and a set of experiments are provided to demonstrate the concrete process of the CPE_MD method, through which we show the feasibility of our proposal in eliminating the linear correlations among the multiple sport training scores of players.
The rest of the paper is organized as follows. Related work is presented in Section II to introduce the current research status of the field. In Section III, a motivating example is shown to better describe the significance of our work. The concrete procedure of the proposed CPE_MD method is described in detail in Section IV. A case study constructed from real-world applications is presented in Section V, and an experimental evaluation is made in Section VI. Finally, in Section VII, we conclude the paper and point out possible directions for future research.

II. RELATED WORK
Sport training score evaluation is essential for objectively observing the health conditions or sport physique of players. Many researchers have paid attention to topics and solutions associated with player evaluation.
In [12], the authors analyze the multiple factors that could be used to objectively evaluate a football player, whether a grassroots player or a high-level player. The factors considered do not include the traditional club-level factors, such as player transfer, squad formation and strategic planning. Therefore, it is more probable that excellent grassroots players can be evaluated and discovered among massive numbers of candidates. This evaluation strategy is beneficial to both candidate grassroots players and team managers. In [13], the authors introduce psychophysiology into the evaluation of player quality or level. Concretely, the authors assess the sport level of each player by collecting the answers to various physiological questions designed beforehand. Through the obtained ''question-answer'' pairs, the category of a player can be approximately evaluated, especially when he or she is confronted with different challenges or difficulties. Loyalty is an important aspect for evaluating a player in various sport items. Motivated by this hypothesis, the authors in [14] investigate the loyalty status of golf players in present-day Vietnam. In their study and statistical analysis, a player's perceived value and satisfaction play key roles in keeping the loyalty of the player. In contrast, the service quality of the golf course is not as important as the other influencing factors, which is an interesting phenomenon discovered in that work.
In [15], different guidance and strategies are provided for the different roles involved in a basketball team or club. On one hand, the authors offer quantified models and strategies for guaranteeing the healthy running of clubs or teams. On the other hand, they design a set of effective metrics and evaluation tools tailored to the characteristics of basketball, through which player conditions or types can be depicted and profiled conveniently. Likewise, in [16], the authors develop an algorithm to evaluate and pick out the most appropriate players for a match or a competition. Concretely, they propose a sport level evaluation framework for players, through which the real-time competitive condition of a player can be monitored and evaluated in a quantified way. The framework includes a variety of evaluation algorithms that are tailored to the characteristics of cricket. In addition, from the perspective of clubs or teams, a high level of trust among players, clubs and spectators is crucial to maintaining healthy collaboration among them. Inspired by this observation, the authors in [17] study the trust relationships among different stakeholders in a sport team or club. To achieve this goal, blockchain technology is employed to guarantee the dependability of the player transfer information released by the club, because fake player transfer news probably decreases the trust or satisfaction of spectators towards the involved clubs. In this way, sport clubs can publish player transfer fee information to their spectators without revealing sensitive private club information.
Recently, various machine learning techniques have been put forward and widely applied in various business domains. A typical application domain is the sport business or industry. In [18], machine learning techniques are employed to predict a considerable number of issues associated with sport clubs, such as player sport conditions, player injury conditions, club economic conditions, club competition status, and so on. In addition, the paper also studies the long-term prediction of players or clubs. For example: Is the club's competition status this year better than last year? What is the concrete ranking of a club in the next season? Who will win the next season of a football league? Such predictions provide a good basis for the healthy running of a football club because the club can enact an appropriate strategy according to the predicted results. Similar work is done in [19], where the authors also use machine learning techniques to predict the various scores or performances associated with football clubs. A distinctive point of [19] is that it offers a set of evaluation metrics that are especially suitable for football players and football clubs as well as the matches they are involved in. In addition, the authors also investigate the features or patterns that discriminate top players from other players. Traditional player evaluation mechanisms are often based on statistical techniques, which are prone to produce delayed statistical results and overlook other valuable information hidden in non-statistical data records. In view of this shortcoming, a novel player evaluation method mainly based on deep learning is proposed in [20]. Concretely, the authors use the statistical data from matches to quantify player performance, and use the news articles published after matches to evaluate player performance qualitatively. By combining these quantitative and qualitative evaluation results, the overall performance of a player can be evaluated objectively and comprehensively. Similar work is done in [21], where the Convolutional Neural Network (CNN) technique is used to achieve the same player evaluation goal as in [20].
From the above analyses, we find that although many researchers have paid attention to player performance or score evaluation, they seldom consider the possible inner correlations among the multiple dimensions involved in evaluation. As a consequence, the finally derived player evaluation results are probably inaccurate and not trusted enough, which harms the evaluation of players' health conditions or performance. Considering this drawback, we introduce Mahalanobis Distance into the multi-dimensional player training score evaluation problem and propose a correlation-aware player training score evaluation method with trust, i.e., CPE_MD, based on Mahalanobis Distance. As Mahalanobis Distance can eliminate the hidden linear correlations among the multiple dimensions involved in the player training score evaluation process, we can guarantee the fairness and trust of the player training score evaluation and ranking results output by the CPE_MD method. Details of the CPE_MD method are described in the following sections.

III. MOTIVATION
A motivating example is presented in Fig.1 to further describe the significance of our work. As the example shows, a player Tom engages in four sports: running, biking, weight lifting and boating. As running and biking both require high leg strength and endurance, Tom's scores on running and biking are often correlated with each other. Likewise, as weight lifting and boating both require high arm strength and endurance, Tom's scores on weight lifting and boating are also correlated with each other. In summary, different correlations exist among the four sports that Tom engages in. In this situation, simply adding the four sport scores will lead to an inaccurate and untrusted evaluation result for Tom, which challenges objective and fair player evaluation.
In view of this challenge, we introduce Mahalanobis Distance into the multi-dimensional player training score evaluation problem and propose a correlation-aware player training score evaluation method with trust, i.e., CPE_MD, based on Mahalanobis Distance. As Mahalanobis Distance can eliminate the hidden linear correlations among the multiple dimensions involved in the player training score evaluation process, we can guarantee the fairness and trust of the player training score evaluation results output by the CPE_MD method. The concrete steps of CPE_MD are described in the following sections.

IV. SOLUTION: CPE_MD
In this section, the player training score evaluation method CPE_MD is introduced in detail. Its aim is to reduce the linear correlations that may exist among the multiple training scores of players and thereby enhance the trust of the final player evaluation results. The proposed idea is mainly based on Mahalanobis Distance, which brings the covariance information among different training scores into the evaluation process; covariance depicts the correlations among different dimensions well. In general, CPE_MD consists of the five steps in Fig.2.
Step 1: Calculate average value vector µ of sport items. We assume that there are n players, i.e., p_1, ..., p_n, and m sport items, i.e., sp_1, ..., sp_m. From the n players' training scores over the m sport items, we obtain a player-sport training score matrix M, which is an n × m matrix as presented in (1). In matrix M, s_{i,j} denotes the training score of player p_i on sport item sp_j.
Next, we calculate the average value vector µ corresponding to the m sport items. Concretely, for each column sp_j (1 ≤ j ≤ m), we calculate its average value av_j based on the n training scores of sp_j, i.e., s_{1,j}, ..., s_{n,j}. In other words, equation (2) holds. Then with av_j (1 ≤ j ≤ m), we can get the average value vector µ as in (3).
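Equations (1)-(3) can be written out from the definitions in Step 1. The block below is a reconstruction from the surrounding prose, not the original typesetting; the only assumption is that av_j is the plain arithmetic mean of column j.

```latex
\begin{align}
M &= \begin{pmatrix}
  s_{1,1} & s_{1,2} & \cdots & s_{1,m} \\
  s_{2,1} & s_{2,2} & \cdots & s_{2,m} \\
  \vdots  & \vdots  & \ddots & \vdots  \\
  s_{n,1} & s_{n,2} & \cdots & s_{n,m}
\end{pmatrix} \tag{1} \\
av_j &= \frac{1}{n} \sum_{i=1}^{n} s_{i,j}, \qquad 1 \le j \le m \tag{2} \\
\mu &= (av_1, av_2, \ldots, av_m) \tag{3}
\end{align}
```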
Step 2: Calculate covariance matrix Cov of sport items. In this step, we calculate the covariance matrix Cov of the m sport items sp_1, ..., sp_m. Concretely, from the player-sport training score matrix M and the average value vector µ, we obtain the covariance matrix Cov as follows.
First, for matrix M, we use the average value vector µ = (av_1, av_2, ..., av_m) to regularize the data. Concretely, for each entry s_{i,j} of matrix M, we execute the operation in equation (4) and get a new entry value s^#_{i,j}. Afterwards, we get a new matrix M^# as in (5).
Then we can calculate the covariance matrix Cov of the m sport items as in (6), where (M^#)^T is the transpose of M^#. Since M^# is an n × m matrix, Cov is an m × m matrix. According to Mahalanobis Distance theory, matrix Cov reflects the linear correlation information among the m sport training scores. For example, in Fig.1, as running and biking both require high leg strength and endurance, Tom's scores on running and biking are often correlated with each other. Likewise, as weight lifting and boating both require high arm strength and endurance, Tom's scores on weight lifting and boating are also correlated with each other. Thus, we can use the covariance matrix Cov to reduce the linear correlation information among different dimensions. Here, if no correlations were present among different dimensions, the off-diagonal entries of the covariance matrix Cov would be equal to zero.
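Equations (4)-(6) can be sketched from the description of Step 2. Two points are assumptions rather than facts taken from the text: the "regularization" in (4) is read as subtracting the column average, and (6) is given the sample-covariance factor 1/(n-1). Using 1/n instead would multiply every Mahalanobis Distance by the same constant and leave the ranking unchanged.

```latex
\begin{align}
s^{\#}_{i,j} &= s_{i,j} - av_j, \qquad 1 \le i \le n,\; 1 \le j \le m \tag{4} \\
M^{\#} &= \left( s^{\#}_{i,j} \right)_{n \times m} \tag{5} \\
Cov &= \frac{1}{n-1} \, (M^{\#})^{T} M^{\#} \tag{6}
\end{align}
```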

Step 3: Calculate Mahalanobis Distance of each player. According to matrix M in (1), we can get a player training score vector \vec{p}_i for each player p_i (1 ≤ i ≤ n). For example, \vec{p}_1 = (s_{1,1}, s_{1,2}, ..., s_{1,m}). Then, through the covariance matrix Cov, we can calculate the Mahalanobis Distance of each player p_i (1 ≤ i ≤ n) based on the equation in (7). Here, D_M(p_i) denotes the Mahalanobis Distance of player p_i, \vec{p}_i^T is the transpose of \vec{p}_i, the covariance matrix Cov is available from (6) and Cov^{-1} represents the inverse of matrix Cov.
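Equation (7) can be sketched in the standard Mahalanobis form, measured from the origin as Step 3 describes. Treating \vec{p}_i as a 1 × m row vector (so that the transpose appears on the right) is an assumption about the original notation.

```latex
\begin{equation}
D_M(p_i) = \sqrt{ \vec{p}_i \; Cov^{-1} \; \vec{p}_i^{\,T} }, \qquad 1 \le i \le n \tag{7}
\end{equation}
```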
Thus, through the calculation in equation (7), we obtain the Mahalanobis Distance D_M(p_i) of player p_i (1 ≤ i ≤ n). D_M(p_i) measures the Mahalanobis Distance between the player's training score vector \vec{p}_i and the origin 0. As the covariance matrix Cov is taken into account when calculating D_M(p_i), the linear correlations among the training scores of the m sport items (i.e., sp_1, ..., sp_m) of p_i are alleviated as much as possible. As a consequence, D_M(p_i) can be regarded as an objective measurement of p_i's comprehensive training score even though multiple training scores are available for p_i.
Step 4: Player evaluation and ranking. In Step 3, we obtained the Mahalanobis Distance D_M(p_i) for each player p_i (1 ≤ i ≤ n). According to Mahalanobis Distance theory, the larger D_M(p_i) is, the farther \vec{p}_i is from the origin 0. As the origin 0 is an ideal reference point for evaluating all the players, a larger D_M(p_i) is better. We can therefore evaluate and rank the training scores of all n players p_1, ..., p_n by D_M(p_i) in descending order (''the larger, the better'').
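The statement in Step 3 that the covariance matrix alleviates linear correlations can be checked numerically. The sketch below uses synthetic, hypothetical scores and a Cholesky factor of Cov (a standard decorrelating transform, not something the method itself prescribes) to show that the Mahalanobis Distance equals the ordinary Euclidean norm of the decorrelated scores, whose covariance is the identity matrix.

```python
import numpy as np

# Synthetic, hypothetical data: 200 players, 2 strongly correlated sport scores.
rng = np.random.default_rng(0)
shared = rng.normal(size=(200, 1))            # common factor, e.g. leg strength
noise = 0.3 * rng.normal(size=(200, 2))       # sport-specific variation
scores = 5.0 + shared + noise                 # shape (200, 2)

# Sample covariance of the two sport items (1/(n-1) factor, an assumption).
cov = np.cov(scores, rowvar=False)

# Direct Mahalanobis Distance from the origin: sqrt(p Cov^{-1} p^T).
cov_inv = np.linalg.inv(cov)
d_direct = np.sqrt(np.einsum("ij,jk,ik->i", scores, cov_inv, scores))

# Decorrelate with the Cholesky factor L, where cov = L @ L.T, via z = L^{-1} p.
L = np.linalg.cholesky(cov)
decorrelated = np.linalg.solve(L, scores.T).T
d_decorrelated = np.linalg.norm(decorrelated, axis=1)

# Covariance of the decorrelated scores; the off-diagonal entries vanish.
decorrelated_cov = np.cov(decorrelated, rowvar=False)

print("raw correlation:", np.corrcoef(scores, rowvar=False)[0, 1])
print("distances agree:", np.allclose(d_direct, d_decorrelated))
print("decorrelated covariance:\n", np.round(decorrelated_cov, 6))
```

In other words, ranking by Mahalanobis Distance amounts to ranking by Euclidean length after the linear correlations between sport items have been removed.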

Formally, the pseudocode of the proposed CPE_MD method is described in Algorithm 1.

Algorithm 1 CPE_MD
Inputs: (1) p_1, ..., p_n: players
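Steps 1-4 of Section IV can be sketched as a short function. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `cpe_md_rank`, the 1/(n-1) covariance factor and the example scores are all hypothetical.

```python
import numpy as np


def cpe_md_rank(scores):
    """Rank players by Mahalanobis Distance from the origin (larger is better).

    scores: array-like of shape (n_players, m_sports).
    Returns (distances, order); order lists player indices best-first.
    """
    M = np.asarray(scores, dtype=float)

    # Step 1: average value vector mu over the m sport items.
    mu = M.mean(axis=0)

    # Step 2: centre the scores and build the m x m covariance matrix.
    # Assumption: sample covariance with a 1/(n-1) factor.
    centred = M - mu
    cov = centred.T @ centred / (M.shape[0] - 1)

    # Step 3: D_M(p_i) = sqrt(p_i Cov^{-1} p_i^T) for every player.
    cov_inv = np.linalg.inv(cov)  # raises LinAlgError if Cov is singular
    distances = np.sqrt(np.einsum("ij,jk,ik->i", M, cov_inv, M))

    # Step 4: sort players in descending order of distance.
    order = np.argsort(-distances)
    return distances, order


# Hypothetical training scores: 5 players, 3 sport items.
example_scores = [
    [8, 7, 5],
    [6, 9, 4],
    [9, 6, 7],
    [5, 5, 6],
    [7, 8, 9],
]
distances, order = cpe_md_rank(example_scores)
for rank, idx in enumerate(order, start=1):
    print(f"rank {rank}: player p_{idx + 1}, D_M = {distances[idx]:.3f}")
```

Inverting Cov requires the sport items to be linearly independent and n to exceed m; if that may fail in practice, `np.linalg.pinv` is one possible substitute, at the cost of departing from equation (7) as written.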

V. CASE STUDY
To further illustrate the main idea of the proposed CPE_MD method, a case study is presented in this section to demonstrate the concrete steps introduced in Section IV. In the case study, there are four players (i.e., p_1, p_2, p_3, p_4) and two sport items (i.e., sp_1, sp_2), as well as the resulting 4 × 2 matrix M as follows.
Step 1: Calculate average value vector µ of sport items. With the player-sport matrix M, for the two sport items (i.e., sp_1, sp_2), we calculate the average values of sp_1 and sp_2, respectively. The results are: av_1 = 4.5, av_2 = 4. Therefore, according to equation (3), the average value vector is µ = (4.5, 4).
Step 2: Calculate covariance matrix Cov of sport items. According to the player-sport matrix M and the average value vector µ (derived in Step 1, i.e., µ = (4.5, 4)), we can calculate the new matrix M^# based on equations (4)-(5), as follows.
Next, according to the derived M^# and equation (6), we can calculate the covariance matrix Cov as follows. As there are only two sport items sp_1 and sp_2, the covariance matrix Cov is a 2 × 2 matrix. Furthermore, we can calculate the inverse matrix of Cov, i.e., Cov^{-1}, as below.
Step 3: Calculate Mahalanobis Distance of each player. In this step, according to the covariance matrix Cov, its inverse matrix Cov^{-1} (derived in Step 2) and the equation in (7), we can calculate the Mahalanobis Distance of each of the four players, i.e., p_1, p_2, p_3 and p_4. Concretely, the four players' Mahalanobis Distances from the origin 0 are listed below.
Step 4: Player evaluation and ranking. According to the Mahalanobis Distances of the four players derived in Step 3, i.e., D_M(p_1), D_M(p_2), D_M(p_3) and D_M(p_4), we can rank the four players (i.e., p_1, p_2, p_3, p_4) in descending order as follows. In other words, the training score of player p_4 is the best while the training score of player p_1 is the worst.
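The four steps of the case study can be traced on a small example. The 4 × 2 matrix below is hypothetical: it was chosen only so that its column averages match the reported µ = (4.5, 4), and it is not the data used in the case study. The 1/(n-1) covariance factor is likewise an assumption.

```python
import numpy as np

# Hypothetical 4 x 2 player-sport matrix (rows p_1..p_4, columns sp_1, sp_2).
# Chosen to match av_1 = 4.5 and av_2 = 4; NOT the original case-study data.
M = np.array([[3.0, 2.0],
              [4.0, 5.0],
              [5.0, 3.0],
              [6.0, 6.0]])

# Step 1: average value vector mu.
mu = M.mean(axis=0)

# Step 2: centred matrix M#, covariance matrix Cov (2 x 2) and its inverse.
M_sharp = M - mu
cov = M_sharp.T @ M_sharp / (len(M) - 1)
cov_inv = np.linalg.inv(cov)

# Step 3: Mahalanobis Distance of each player from the origin.
d = np.sqrt(np.sum((M @ cov_inv) * M, axis=1))

# Step 4: rank players in descending order of distance (1-based labels).
ranking = [int(i) + 1 for i in np.argsort(-d)]

print("mu      =", mu)
print("Cov     =\n", cov)
print("Cov^-1  =\n", cov_inv)
print("D_M     =", np.round(d, 3))
print("ranking =", ["p_%d" % k for k in ranking])
```

With these made-up numbers the order is p_4, p_3, p_2, p_1. Note that p_3 is placed above p_2 even though its plain score sum is lower (8 versus 9), which illustrates how a correlation-aware ranking can differ from simple addition.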

VI. EVALUATIONS
To further show the innovation of the proposed CPE_MD method, we conduct a set of experiments as follows. The dataset used for evaluation is WS-DREAM. Two methods are used for comparison: Euc-D (Euclidean Distance based evaluation) and Cos-D (Cosine Distance based evaluation). All experiments are run on a desktop computer with 8 GB of memory and a 2.4 GHz processor; the software configuration includes Windows 7 and Python 3.6. Each experiment is executed 100 times and we report the average values. Concrete comparison results are shown as follows.
Test 1: Covariance comparison of three methods.
The advantage of the CPE_MD method is that it considers the correlation among dimensions. Therefore, the covariance of CPE_MD should be small. To validate this hypothesis, we compare the covariances of the three methods with respect to the number of records used for each user in WS-DREAM. The results are reported in Fig.3, where Fig.3(a) shows the covariances of response time in WS-DREAM and Fig.3(b) shows the covariances of throughput in WS-DREAM. It can be seen from Fig.3(a) that the covariances of all three methods increase with the number of records, because more records used for evaluation bring a larger accumulated covariance value. However, CPE_MD outperforms the Euc-D and Cos-D methods because CPE_MD considers the correlation among different dimensions. Similar results can be observed in Fig.3(b), so they are not analyzed again.
Test 2: Time cost of three methods.
In the CPE_MD method, the covariance matrix among different dimensions is used for objective evaluation, which brings additional time cost compared to the other methods. To validate this hypothesis, we compare the time costs of the three methods with respect to the number of records used for each user in WS-DREAM and the number of users for each dimension in WS-DREAM. The results are reported in Fig.4, where Fig.4(a) shows the time costs of the three methods with respect to the number of records and Fig.4(b) shows the time costs of the three methods with respect to the number of users. Here, we only consider the response time dimension of the WS-DREAM dataset.
It can be seen from Fig.4(a) that the time costs of all three methods increase with the number of records, because more records used for evaluation require additional processing time. Moreover, CPE_MD does not perform better than the Euc-D and Cos-D methods, because CPE_MD additionally computes the covariance matrix to account for the correlation among different dimensions. However, the time cost of CPE_MD is close to those of Euc-D and Cos-D. Similar results can be observed in Fig.4(b).

VII. CONCLUSION
Training score evaluation is a promising way to learn about the physical condition of a player. However, players often undertake multiple kinds of physical training and various correlations exist among them, which significantly decreases the fairness and trust of player training score evaluation and ranking, because traditional multi-dimensional data integration solutions are often based on a strong hypothesis, i.e., that the involved dimensions are independent of each other. In view of this shortcoming, we introduce Mahalanobis Distance into multi-dimensional player training score evaluation and propose a correlation-aware player training score evaluation method with trust based on Mahalanobis Distance, i.e., CPE_MD. Through Mahalanobis Distance, we can minimize the hidden linear correlations among the multiple training scores of players, so as to guarantee the fairness and trust of player evaluation and ranking. Finally, a case study is presented to show the concrete procedure of CPE_MD.
However, player evaluation is often a multi-dimensional decision-making problem, and multi-dimensional decision-making problems often involve multiple-type data fusion, weight assignment and optimization issues [22]-[29]. Therefore, we will further study the multi-source data integration problem with weighting. In addition, Mahalanobis Distance requires additional time to compute the covariance matrix of the different dimensions; therefore, its time complexity is not very low. Time cost is critical for real-world applications, especially in big data scenarios [23], [30]-[35]. Therefore, we will continue to refine our algorithm to further reduce its time cost so as to meet the quick response requirements of users. Finally, user privacy is not considered in our CPE_MD method, since we focus more on correlation elimination. In most big data applications, however, privacy is a critical concern, especially when users are sensitive about their personal data [36]-[41]. Considering this drawback, we will further investigate current research on privacy protection and refine CPE_MD by taking user privacy into account.
TENGFEI FAN received the bachelor's degree in sport education from Beijing Sport University, China, in 2011. He is currently a Lecturer with the Department of Physical Education, Qufu Normal University. He has published several research papers in international journals and conferences. His research interest includes sport education.
SHENGLI TIAN received the bachelor's and master's degrees from Qufu Normal University, China, in 2011 and 2014, respectively. He is currently a Teacher with the 1st Middle School of Juye County, China. His research interests include school sports and humane sociology of sports.
ZHICHEN HU received the bachelor's degree from Nanjing University of Information Science and Technology, China, in 2020, where he is currently pursuing the master's degree in computer science and technology. His research interests include knowledge graph, cloud computing, and big data.
XINTONG FAN received the bachelor's degree from Qufu Normal University, China, in 2020, where she is currently pursuing the master's degree in computer science and technology. Her research interests include big data and recommender systems.
SIFENG WANG received the bachelor's degree from Qufu Normal University, China, in 1999. He is currently an Associate Professor with the School of Computer Science, Qufu Normal University. His research interests include big data and recommender systems. VOLUME 10, 2022