A Feature-Weighting Approach Using Metaheuristic Algorithms to Evaluate the Performance of Handball Goalkeepers

Handball experts agree that the most crucial position in a handball match is that of the goalkeeper. Their performance can be a good predictor of a team’s ranking in tournaments. Despite this, few studies have been conducted on the relevance of every elite goalkeeper’s action to their performance in the match. This paper provides the features or criteria for objectively evaluating a handball goalkeeper based on their actions during a match. For this purpose, the feature-weighting problem is formulated as an optimization problem. The problem is solved using eight metaheuristic algorithms to adjust the weights of the features. Computer experiments using real data from the 2020 Women’s and Men’s European Handball Championships are carried out with these algorithms. The algorithms optimize the weights based on three metrics. The first metric is to identify the best goalkeeper; the second metric is to identify the top five goalkeepers, regardless of order; and the third metric is to identify and order the top five goalkeepers. A case study is carried out with real data from the 2021 Women’s and Men’s World Handball Championships, where the best goalkeeper found in both cases with the optimized weights coincide with the best goalkeeper chosen by the International Handball Federation (IHF). Finally, the paper shows the particularities and specific difficulties involved in evaluating handball goalkeepers.


I. INTRODUCTION
Handball is an important and popular sport in Europe. The German Premier League (Handball-Bundesliga) has an annual budget of approximately 80 million euros, and approximately 750, 000 people play this sport in multiple leagues [1]. For example, the top teams in France have an annual budget of 1 million Euros [2]. These are good examples that show the importance of handball in Europe, as well as its impact on people and the economy.
In handball, although the scientific and professional attention given to quantitative data analysis has grown very rapidly over the past few years, little work has been done on evaluating a player in a game-by-game scenario [3]. Thus, handball is a long way behind the analysis existing The associate editor coordinating the review of this manuscript and approving it for publication was Kok-Lim Alvin Yau .
in other sports, such as soccer [4] or ice hockey [5]. The existing approaches [6] used to represent the effectiveness of a handball team or a single handball player is now insufficient for evaluating the performance of players in a match.
Nevertheless, there is now growing interest in evaluating the performance of handball teams, which involves establishing an impartial method of assessment. However, there are very few references in the literature compared to other sports [7], including basketball, ice hockey or football [8]. These assessments are often based on expert opinions, and they do not always agree on the importance of the chosen criteria. For this reason, the challenge is to conduct an unbiased assessment of the players based on their main actions. In this context, some recent studies have been proposed. A fuzzy framework to evaluate player performance based on expert consensus is described in [9], [10]. In addition, the application of machine-learning methods for predicting player performance using specific athletic characteristics rather than handball actions is addressed in [11].
Goalkeepers play a crucial role in handball. In coaching communities, it is well-known that a goalkeeper's performance can predict the team's ranking in major tournaments [12]. Despite this, few studies have been conducted on elite goalkeepers. For example, the methodology presented in [6] allowed the effectiveness of a handball team or a single handball player to be evaluated. However, it was insufficient for evaluating the performance of a handball goalkeeper in a match. In this context, the majority of the proposals in the literature take a statistical approach. In [13], the authors conducted a full study of the shot effectiveness of each handball position with respect to the goalkeeper. Furthermore, [14] and [15] proposed methodologies based on statistical studies to evaluate the performance of handball goalkeepers.
This paper aims to obtain a transparent and comparable evaluation of the performance of handball goalkeepers. For this purpose, a feature weighting process is formulated as an optimization problem. This problem is solved by applying six metaheuristics and two memetic algorithms. As a result, a set of weights is established for each goalkeeper's action, which can be applied to evaluate goalkeepers and choose the best in a tournament.
For these purposes, the weights of the 19 indicators provided by the European Handball Federation (EHF) for each match for handball goalkeepers are obtained through an optimization process. The proposed optimization procedure is solved by metaheuristic algorithms and guided by the following criteria: (1) the identification of the best goalkeeper chosen as MVP by the tournament organizers; (2) the identification and ranking of the five best goalkeepers chosen by the organization and supporters; and (3) a combination of criteria (1) and (2). The solution obtained is then compared to a fuzzy-based solution for feature weighting based on the expert's opinion [10] and on a weighted scheme based on statistics extracted from the literature and domain knowledge [16]. Finally, the results obtained were validated at the 2021 Women's and Men's World Handball Championships.
To summarize, the main contributions of the approach presented in this paper (see Figure 1) are as follows: • The problem of the performance assessment of a handball goalkeeper is formulated as a feature weighting optimization problem. The aim is to quantify the capability of several performance indicators to identify the best goalkeepers in a tournament. • The results obtained enable the best goalkeeper in a tournament to be identified, as demonstrated in the presented case study. Additionally, it provides valuable information about what factors are influential in distinguishing the best goalkeepers from the average. The rest of this paper is organized as follows: The background of the problem is introduced in Section II. Section III formulates and states the problem addressed in this work. Then, Section IV describes the algorithms used to solve the problem. In Section V, the experimental settings are stated, while the corresponding experimental results are described in Section VI. Finally, the conclusions are presented in Section VII.

II. BACKGROUND
This section explains the grounds for feature selection and feature-weighting problems, and gives a short explanation of sports analytics.

A. FEATURE SELECTION AND FEATURE WEIGHTING
Feature selection is a common and important preprocessing step in data mining and machine learning. The aim of this step is to reduce the number of features or variables used to describe the instances of a dataset. An efficient feature selection process saves processing time when a machine learning algorithm is trained and there is a reduction in the use of memory space. Moreover, reducing the number of features provides better data visualizations and avoids issues derived from the curse of dimensionality problem [17]. The aim of feature weighting is to measure the importance of features and to assign an appropriate weight to each feature. This task is critical in some machine-learning problems. If the weights are not properly set, the model's accuracy may be even worse. The feature-weighting problem has been addressed in the literature through two main approaches: i) using iterative algorithms that search the optimal weights using a performance metric for optimization [18], [19], and ii) using models that employ information priori, such as conditional probabilities and class projection [20].
The complexity of feature selection and feature-weighting problems lies in the function or performance metric to be optimized. It is usually a performance metric of a classifier. According to the model employed to solve the classification problem, the associated computational burden is higher or lower. Evolutionary algorithms and metaheuristics have been widely applied to address the feature-weighting problem. Thus, [18] proposed a binary PSO algorithm for feature selection and weighting to improve the performance of a K-Nearest Neighbor (KNN) classifier. Then, [21] designed a hybrid GA where Support Vector Machine (SVM) were embedded to solve seven classification problems where Receiver Operating Characteristics (ROC) and Area under the curve (AUC) were used as performance metrics. Recently, [22] proposed a feature-weighting approach based on a PSO algorithm to train a learning vector quantization classifier. Finally, [19] addressed the problem through a multi-objective approach considering relevance and redundancy metrics. In this paper, a wide variety of metaheuristic algorithms from the state-of-the-art have been chosen to solve the feature-weighting problem. Thus, the PSO, GA, BA and ABC algorithms are applied in this paper. Moreover, GSA, CGSA and two memetic variants of GSA, the MGSA and MCGSA are also used to solve the problem and to evaluate handball goalkeepers. GSA is chosen because it is a recent and efficient metaheuristic algorithm that has obtained good results in the feature-selection problem [23], [24], but it has not been used much in the feature-weighting problem. Additionally, GSA has been successfully tested in many real-world domains, such as electrical [25] or chemical [26] engineering, as well as in machine-learning applications, such as training neural networks [27]. A deeper review of GSA can be found at [28]. Other alternatives that might be useful for this purpose in the future are the gravitational search algorithm with chaotic neural oscillators [29] and the grasshopper optimization algorithm (GOA) [30].

B. GOALKEEPER PERFORMANCE EVALUATION
Today, there is a growing interest in evaluating the performance of handball teams, which involves establishing an impartial method of assessing players. These assessments are often based on the opinions of various experts who do not always agree with the importance of the indicators chosen. In the worst case, assessments are merely subjective. The aim is to be able to make an unbiased assessment of the players based on their statistics in the main actions in each match [31].
Recent papers, such as [11], have assessed different Machine Learning (ML) models (linear regression, decision trees, support vector regression, radial-basis function neural networks, backpropagation neural networks and long short-term memory neural networks) for predicting particular types of athletic performance in female handball. Prediction in this field is a complicated task when using conventional methods due to the diversity and complexity of specific types of athletic performance with nonlinear relationships between them. This is one of the first studies using machine learning in sports science for handball players, and the results are encouraging for future studies. The main difference between the work and this paper is that their authors applied a methodology to evaluate athletic performance in female handball players, while this study aims to assess the performance of handball goalkeepers.
Another example of a recent study is [9]. It presents a novel approach based on fuzzy multicriteria group decision-making tools to select those criteria that best represent the handball player's performance in a match and set their relevant weights. The authors aggregate expert judgments by a fuzzy model, overcoming some drawbacks of classical systems, including the definition of the relevance of each criterion using linguistic labels. The methodology is adapted to the peculiarities of handball and the aspects that distinguish it from other sports.
There were few studies assessing goalkeepers, despite their importance in handball. In [12], the authors analyzed goalkeepers' save performances during matches at the 2015 Men's World Championships. A tracking cam-era system and bespoke software were used to analyze time-motion performance parameters and evaluate the save rates for each goalkeeper. [13] conducted a complete study of the shot effectiveness of each handball position concerning the goalkeeper. The authors concluded that the overall performance of goalkeepers was affected by anthropometric characteristics, skilled perception, reaction time, experience and quality of defense. However, [14] applied three methods to the goalkeepers of the 2018 European Men's Handball Championship based on: (i) the goalkeeper's save percentage; (ii) the goalkeeper's save percentage and time played; (iii) further considering the distance from which the shots were taken. Furthermore, [15] carried out a statistical study to classify the goalkeepers in the 2018 Women's European Handball Championship into three categories based on their saving efficiency, time played and number of matches played. Finally, an empirical analysis of three representative strategies for weighting evaluation criteria (a fuzzy approach, a metaheuristic optimization strategy and a statistical method) to evaluate handball goalkeepers in a tournament was presented in [10].
In this study, the methods defined enable formalizing the handball goalkeeper performance assessment process as an optimization problem. Thus, by learning from the results of previous competitions, a set of weights was obtained to quantify the experts' subjectivity. Using these weights, the handball goalkeeper's performance in a tournament or a match could be objectively established.

III. FORMULATION OF THE PROBLEM
The main contribution of this paper is the formulation of the goalkeepers' evaluation problem as a feature-weighting problem and its resolution by means of metaheuristic and memetic algorithms. In this section, the feature-selection problem is formulated first. Then, feature weighting is formulated, and the problem addressed in this paper is stated.

A. THE FEATURE-SELECTION PROBLEM
The feature-selection problem discussed in section II is then formulated. We let D be the dataset comprising n observations or instances, which is shown in (1) as follows: Each instance of D is described over a set of m features or variables, as follows: (2).
Therefore, according to equations (1) and (2), the D dataset has dimensions of n × m, where n specifies the length of the dataset in terms of the number of instances, while m specifies the width of the dataset depending on the number of features. The aim of the feature selection problem is to select the optimal subset of features that optimizes a performance metric. This metric is usually a loss function (L), which must be minimized, or an accuracy metric (A), which must be maximized. To do that, a feature selection algorithm ( ALG) obtains the set of optimal discrete weights, denoted by W * , which determines the features to be selected. Therefore, the weight of feature i is computed, which is shown in (3) as follows: According to the notation described, the feature-selection problem is defined in (4) as follows: Problem (4) sets out the feature-selection problem. The optimal set of weights W * for the space of features F can be obtained in two ways: first, given a dataset D and an algorithm ALG, an accuracy metric is maximized. Next, given D and ALG, a loss function is minimized. In both cases, the optimal weights of the features are natural values on the interval [0, 1].

B. THE FEATURE-WEIGHTING PROBLEM
The feature-weighting problem can be considered a generalized feature-selection problem. Analogous to the featureselection problem, the aim of feature weighting is to obtain the optimal set of weights (W * ), which optimizes a performance metric. However, the weights of the features are real or continuous values (usually on the interval [0, 1]) instead of binary values. Thus, the features in F are selected or not selected, but the weights associated with each feature determine the importance or relevance of each feature for solving the problem at hand. Given a dataset D, an algorithm ALG and a performance metric defined by an accuracy metric (A) to be maximized or a loss function (L) to be minimized, the feature weighting problem can be formulated similarly to the problem in (5) as follows: To address problem (5), the major concern or first step is to determine the codification of the solutions for the optimization problem. There must be a simple codification that does not increase the computational cost in solving the problem, and is versatile in such a way that the coding does not change regardless of the algorithm or the resolution method employed. For these reasons, in this study, each solution (player) is codified as a vector of m dimensions, representing the set of features whose weights must be optimized. Thus, each solution contains, at the i position, the weight associated with feature i denoted by w i . Figure 2 shows the codification of a solution. This is a simple and versatile codification, since it is independent of the algorithm used. At the beginning of each algorithm, the population is generated in the form of a matrix with as many rows as the VOLUME 10, 2022 population size chosen in the algorithm (representing players) and as many columns as the number of features whose weights are to be optimized. The initial values of the weights are set randomly within the limits of the weights (which are detailed in Section V) following a uniform distribution.
Then, these solutions evolve over the course of the iterations until a stop criterion is met. During this procedure, weights are updated according to the specific rules of the algorithm used to solve the problem. Finally, the best solution stores the set of optimal weights.
A second major concern is the choice of the objective or fitness function to be optimized. This is a key point because, although the choice of the optimization algorithm and the characteristics of the dataset are two important factors, the computational complexity of the problem is mainly given by the objective function. In this paper, the following objective functions are maximized: • Mean Reciprocal Rank (MRR): This function tries to measure, ''Which is the first relevant item?'' In this case, ''Which is the Most Valuable Goalkeeper?'' is measured. The use of this metric is justified since its optimization allows us to obtain an adjusted weighting scheme to select the best goalkeeper of a championship. Thus, given a dataset D where best is the player ranked first, the aim is to obtain a set of weights for each feature from the features set to build a ranking where best ranks first. Formally, it is computed as the inverse of the position of the best player in the ranking obtained in (6) [32] as follows: 1 rank best (6) where N is the number of times the ranking is built or the number of repetitions of the experiment and rank best is the position occupied in the ranking by the best player. The MRR function is bounded between 0 and 1. The main drawback of MRR is overfitting, since adjusting the weights to only consider the best goalkeeper may blind the algorithm, preventing it from generalizing well enough, and returning a weighting scheme that sometimes may not have a useful meaning for drawing meaningful sporting conclusions.
• Mean Average Precision (MAP): This function evaluates the complete list of ranked items up to a specific cut-off c. Thus, the MAP computes the elements of the top c that are found, heavily weighting the error made at the top of the list and gradually decreasing the importance of the error as it descends to the lower items on the list [33] and [34]. This is a useful metric since it does not struggle with the overfitting problem as much as MRR, as it considers not only the best goalkeeper but also the best goalkeepers of a given top c to obtain the optimal set of weights. In this way, the weights obtained enable more meaningful conclusions to be drawn at a sporting level. This paper uses a simplified version of the MAP, which is computed using only the elements of the identified top c, which is shown in (7) as follows: Thus, the MAP is a bounded function between 0 and 1, where the values of the MAP are continuous values that compute the number of top c goalkeepers detected out of the total of c. In the experiments carried out, c = 5 is set.
• Mean Weighted Metric (MWM): This function is a weighted metric based on the MRR and the MAP. The application of this metric is justified by the fact that many applications require the use of more sophisticated metrics to weight or combine simpler metrics. It is computed according to (8) as follows: where N is the number of repetitions. In this case, the MWM is used, giving more importance to the MAP metric (α = 0.75) than to the MRR (β = 0.25), as the former is more complete. This parametrization has been used in this paper since the MAP metric considers not only the best goalkeeper but also the top-c, which is considered even more important than MRR, without ignoring the latter. However, the MWM can be defined at the discretion of the researcher, paying more attention to the preferred metric and considering that α + β = 1. Therefore, according to available data and the type of evaluation to be made, it is possible to optimize this metric to obtain more precise conclusions in terms of sport. As with the other metrics, the MWM is bounded between 0 and 1.

IV. METAHEURISTIC ALGORITHMS
This section describes the metaheuristic and memetic algorithms used to solve the feature weighting problem, paying special attention to GSA, CGSA and the memetic variants MGSA and MCGSA, since they are the most recent and novel of all the algorithms used.

A. PARTICLE SWARM OPTIMIZATION
Proposed by Kennedy and Eberhart [35], the PSO algorithm is inspired by bird flocking behavior. Thus, a population of solutions is first initialized according to the procedure given in the last section. Each solution (or particle in the PSO metaphor) has two main components: location, which represents the location of the solution in the search space, and velocity, which defines the direction and the distance to where the particle goes. The choice of PSO for the computational experiments carried out in this paper is due to its good results in other sports areas, such as player detection and tracking in soccer [36], [37]. To achieve convergence, PSO particles update their velocities, and therefore, their positions in the search space converge to optimal regions. Velocities are updated by C 1 and C 2 , which are the acceleration coefficients, and α 1 and α 2 , which are random numbers that control the update in the search space. Convergence is also ensured due to particle having memory. This means that each particle remembers its best position, which is called l b (local best solution), as well as the best solution reached by the swarm, which is called g b (global best solution). PSO is a very popular metaheuristic algorithm from the state-of-the-art. This leads to many modifications of the algorithm to improve its efficiency. The pseudocode of a canonical PSO is shown in Algorithm 1. Finally, survey studies on PSO operators and variants can be found in [38], [39].
Algorithm 1 Pseudocode of a Canonical PSO Algorithm 1: (Initialization). Let t = 1 and generate a (random) set of feasible solutions and speeds (particles) and save them in (Update position). Of each particle i in X t 6: x t+1 (Fitness Evaluation). Evaluate the objective function f (x t+1 i ) for each solution (particle) i in population X t+1 . Initially developed by Holland [40], GA is probably the best-known evolutionary algorithm. It is inspired by the natural selection process that takes place in species. In GA, a population of solutions is initialized following the procedure described in the current paper. Each solution in the population is an individual, and its codification is the chromosome. Once the population is initialized, it is updated by evolution rules, ensuring convergence. Thus, a selection operator chooses two solutions (parents) according to their fitness to obtain a new solution using a crossover operator (run with a probability p cross ). Then, a mutation operator with p mut probability is applied to the child solution. In a generation, a new population of child solutions is generated. Finally, child solutions replace old solutions if their fitness is better. The pseudocode of a canonical GA algorithm is shown in Algorithm 2.
There is a wide variety of selection operators (roulettewheel, tournament, ranking selection. . . ), crossover operators (uniform crossover, n-point. . . ) and mutation operators. The reader can find an in-depth description of genetic algorithms in [41], [42]. Recent works about GA applied to sports motivate the use of this algorithm in the experiments carried out in this paper. For example, [43] presented a hybrid ACO-GA algorithm to schedule sports competitions, [44] applied GA to tagging text from baseball videos and [45] used a GA-BP neural network to predict a sport's performance.
Algorithm 2 Pseudocode of a Canonical GA 1: (Initialization). Let t = 1 and generate a (random) set of feasible solutions (individuals) whose representation (chromosome) is saved in while the stopping criterion is not satisfied do 3: Select n individuals (parents) to create a mating pool 4: while n new children solutions have not been created do 5: (Selection Operator). Select randomly two parents from the mating pool 6: (Crossover Operator). Apply crossover operator to parent solutions with p cross probability 7: (Mutation Operator). Apply mutation operator to child solutions with p mut probability 8: end while 9: Replace old population by the new, according to the fitness 10: end while C. BAT ALGORITHM Inspired by the foraging process of bats, BA is a metaheuristic proposed by Yang [46]. This algorithm, which is similar to PSO, models the echolocation phenomenon that occurs in bats. It allows bats to compose an image of their surroundings through the acoustic energy transmitted by the bats of the swarm. Thus, the algorithm starts by initializing a set of solutions (bats) in the manner described in section III. Each bat (solution) flies randomly with a velocity at a position, and both are initialized at the initialization step. Additionally, bats fly with a fixed frequency f min , updated by means of β random numbers, varying wavelength λ and loudness A 0 to search for prey. In this algorithm, frequency, pulse rate and loudness control the trade-off between exploration and exploitation. The pseudocode of canonical BA is shown in Algorithm 3. The recent application of BA to plan training sessions based on the mobility and heart rate data obtained from the players [47], and [48] has motivated the choice of BA for the computational experience of this paper. A deeper discussion about BA is provided in [42].

D. ARTIFICIAL BEE COLONY ALGORITHM
In 2005, Karaboga developed the ABC algorithm [49]. It is a similar algorithm to GA, as it borrows some of the ideas and operators from this algorithm. However, the ABC algorithm VOLUME 10, 2022 Algorithm 3 Pseudocode of a Canonical BA Algorithm 1: (Initialization). Let t = 1 and generate a (random) set of feasible solutions (bats): positions (X), speeds (V), frequencies (F), pulse rate (r) and loudness (A).
(Generate a solution). Around x i with probability r i 8: (Generate). Random new solution. 9: if (rand < A i and f (x i ) < f (x best )) then 10: (Accept). New solutions 11: (Update). Increase r i and reduce A i 12: end if 13: end while is inspired by the foraging process of a bee colony. First, a population of initial solutions is generated in the same way as the other algorithms described in this section, following the procedure shown in section III. In contrast to the rest of the algorithms, each solution in the search space is not only a bee but also a food resource for bees. Each food resource has more or less quality depending on its fitness value.
There are two kinds of bees in a bee colony: employed and unemployed bees. The first are those that are exploiting a food resource. The second can be classified as scout bees, which find new promising food resources, and onlooker bees, which wait in the nest until they choose a food resource to exploit. Employed bees, thus, search in their neighborhood new promising food resources by means of a perturbation (which simulates a mutation operator by an α random number and the difference vector between i solution and another random solution in population k). Onlooker bees are recruited to the discovered food sources according to a set of probabilities (following a roulette wheel typical in GA). When better solutions around a food source are not found, scout bees search randomly for a new food resource to exploit. This optimization process is shown in Algorithm 4, where a canonical ABC algorithm is shown. Despite being one of the most applied metaheuristic algorithms, there are few applications of ABC to the sports domain as far as we know. One of them is [50], where the authors employed a modified ABC to manage repositioning a bike-share system. The reason for this has motivated the use of ABC in the experiments performed in this paper. More details about the ABC algorithm can be found in [51].

E. GRAVITATIONAL SEARCH ALGORITHM
GSA is a metaheuristic inspired by the theory of Newtonian gravity in physics. It was first proposed by Rashedi Algorithm 4 Pseudocode of a Canonical ABC Algorithm 1: (Initialization). Let t = 1 and generate a (random) set of feasible solutions (food resources) and save them in X 1 = {x 1 1 , · · · , x 1 N } 2: (Assign employed bees). To initial food resources 3: while the stopping criterion is not satisfied do 4: for bee i in employed bees do 5: (Find solution). Around the current food resource (Change food resource). If the solution obtained is better than the current solution (food resource) 7: end for 8: (Assign onlooker bees). j to food resources by for bee i in onlooker bees do 10: (Find solution). Around the current food resource (Evaluate new solutions). By means of (Update food source). If the one obtained is better than the old one 13: end for 14: (Abandon food resource). If it does not change for more than l iterations 15: end while in [52]. In this algorithm, each solution in the search space corresponds to a mass in a metaphorical universe. The exploration and exploitation stages of the algorithm are driven by the interactions produced between particles in accordance with the universal law of gravity. Thus, heavy masses attract smaller masses, guaranteeing the convergence of the algorithm.
Initially, a population of N solutions is randomly initialized, i.e., the position and speed values are initialized for each solution. Then, the objective or fitness function is evaluated for all the solutions. The fitness value is related to the mass of each solution in the GSA metaphor. The mass of solution i in iteration t can be computed, which is shown in (9) as follows: where m i t is computed through (10) as follows: where worst t is the solution with the worst fitness value in iteration t and best t is the solution that has the best fitness value in iteration t. Then, the gravitational force acting from each solution or mass to the rest is computed through the gravitational law of Newton, which is shown in (11) as follows: (11) where F ij is the gravitational force on mass i from mass j. M pi is the passive gravitational mass of i, while M aj is the active gravitational mass of j. Note that M ai = M pi = M i . R ij is the Euclidean distance between masses i and j. In the original algorithm, R is used instead of R 2 because it provides better results [52]. The parameter is a small positive constant used to avoid dividing by zero. G t is the gravitational constant, which is initialized at the beginning of the algorithm and reduces over time, according to the following formula: . where G 0 is the initial value of the constant, α is a coefficient known as the learning rate, t is the current iteration and T is the maximum number of iterations. This function controls the exploration and exploitation trade-off in the algorithm. Thus, at the beginning of the algorithm, high G values encourage exploration, while in the end, low G values encourage exploitation. The total force that acts on an object i is then computed as a randomly weighted sum of the forces exerted by the other objects, with the purpose of adding a stochastic behavior to the algorithm. It is calculated by means of (12) as follows: where rand j is a single, uniformly distributed random number on the interval (0, 1). However, to improve the balance between exploration and exploitation, only the Kbest masses or solutions apply their force to the rest of the masses at iteration t. The Kbest function depends on the time t, and its value Kbest t = Kbest(t) is linearly reduced in such a way that, at first, all the masses exert their force on the rest, guaranteeing exploration, while in the end, only the Kbest masses exert their force on others, encouraging exploitation. In this case, the summation in (12) is taken as j ∈ Kbest t −{i}. Finally, at the end of the iteration, Newton's second law is applied to compute the current acceleration of each mass, which is computed by a t i = i . According to these formulae, the GSA ensures the movement of the particles towards particles with heavier masses, which permits exploitation when the heavier particles move slowly. The exploration and exploitation trade-off is controlled by both the G t and Kbest t functions. An overview of the pseudocode of GSA is shown in Algorithm (5). The justification for choosing GSA as a reference optimization algorithm for the experiments made in this paper is based on its good performance in a multitude of domains and on the recent application in the selection and estimation of parameters for the selection of players in the call-up and in the line-up in a match [53], [54]. Ensuring a good equilibrium between the exploration and exploitation stages is a key point in the design of a metaheuristic algorithm. In the original formulation of the GSA, the G t and Kbest functions are the two components that control the balance between the two stages. Later, in [55], the authors proposed the CGSA. In this algorithm, it was shown that the use of chaotic maps applied to the G t function enhanced the balance between exploration and exploitation, improving the results of the original GSA. Specifically, the sinusoidal chaotic map provided the best results [55], [56]. A chaotic map is an expression that, when added to a function, introduces systematic changes in it, producing chaotic behavior. In the original GSA, G t is constantly decreased over the iterations, and thus, the algorithm either explores or exploits. Adding a chaotic map to the gravitational constant chaotically changes the value of G t while decreasing it during iterations, providing exploration and exploitation at the beginning and at the end of the algorithm. Specifically, in the case of CGSA with a sinusoidal map, the gravitational constant follows the expression G t chaotic = G t +Chaotic sin (t), where Chaotic sin (t) is the sinusoidal chaotic map, which corresponds to the equation x t+1 = ax 2 t sin(πx t ), where a = 2.3, according to the parametrization given by [55].
As a result, CGSA can be codified according to Algorithm 5 in the same way as GSA. The only difference between them is the definition of the gravitational constant G t .
As far as we know, there are no applications of the CGSA algorithm in the sports domain. This reason, along with the good results obtained by CGSA with respect to the base GSA algorithm in many optimization problems of different fields, has led to its use in the experiments carried out in this article.

G. MEMETIC (CHAOTIC) GRAVITATIONAL SEARCH ALGORITHM
In this subsection, the memetic approach used to improve the local search skills of the GSA, and the CGSA is described.
One of the most important problems of metaheuristic algorithms is that although they encourage exploration, they are not as good as gradient-based methods at exploitation VOLUME 10, 2022 tasks. For this reason, Memetic Algorithms (MAs) appeared as an alternative for hybridizing both kinds of algorithms, introducing local search procedures into metaheuristic algorithms to improve their local search skills (see [57], [58]). In this paper, the approach proposed in [56] is used to design the proposed MAs.
When an MA is designed, there are a few main aspects that it is necessary to address: i) when the local search will be applied, ii) which local search method will be used, iii) where it will be executed in the algorithm flow and iv) how much the local search will be run. Then, the approach employed in this paper and proposed in [56] to address these matters is discussed.
First, regarding when the local search is applied, the approach is based on the characterization of the local minimum. If the algorithm is capable of distinguishing whether a local minimum is found, then the local search is applied. To do this, it is necessary to characterize the concept of local minima. The algorithm finds a local minimum when a new best solution is found in a different search environment from the best solution found. Then, in this situation, the local search is applied. The criterion employed to detect a promising search environment is the fulfillment of the following two rules based on the parameters ε ≥ 0 and γ > 0 as follows: Rule 1: Rule 1 implies that a better point has been achieved in the current iteration t and that x t i * is the best point explored thus far. Moreover, Rule 2 constrains point x t i * not to belong to an environment around the (best) point x t−1 * , and thus, it is appropriate to apply an exploitation stage to this new search area. Second, regarding which local search method to apply, the descent methods are a good alternative for exploring the neighborhood of point x t i * . These approaches start from a point y k and build a descent direction d k based on the gradient vector of f in y k at that point. At a later stage, the line search is carried out as follows: Once the approximated solution α * of the one-dimensional problem (13) is found, the step size, a new point y k+1 = y k + α * d k is obtained. Then, the previous procedure is repeated from that point. Some examples of these methods are the method of the steepest descent, which takes d k = −∇f (y k ), or Newton's method, which takes where ∇ 2 f (y k ) is the Hessian matrix of f at y k . The first method has a linear local convergence rate, while in Newton's method, it is quadratic. Nevertheless, numerically computing ∇ 2 f (y k ) involves a large number of operations. The quasi-Newton (qN) methods avoid this disadvantage by using the observed behavior of f (y) and ∇f (y) to approximate the Hessian matrix. Even though the qN method considers this approximation, it continues to exhibit a super-linear convergence rate. This is the reason that the qN method is used in this study. One of the most popular and commonly used Hessian updating methods is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) formula. It approximates the Hessian matrix ∇ 2 f (y k+1 ) as follows: where T indicates the transpose as follows: The starting matrix H 0 can be set to any symmetric positive definite matrix, for example, the identity matrix. Following this, regarding where the local search is run, the criterion used in this work is to evaluate the previous rules at the end of each iteration of the main algorithm. Thus, at the end of each iteration of GSA or CGSA, the rules are evaluated, and the qN method is introduced in the algorithm if the previous rules are satisfied.
Finally, the amount that the local search will be run is based on the concept of tolerance. Tolerance is a lower bound on the size of a step. If the algorithm attempts to take a step that is smaller than the tolerance value, the iterations of qN end. Thus, the stopping criterion used in the qN method is as follows: (17) The tol parameter controls the intensity of the local search. First, it is convenient to explore the neighborhood sparsely, increasing its intensity as the algorithm progresses. For this reason, the parameter tol = 0.01 is initialized and is reduced to tol = tol/10 each time the qN method is applied. Thus, using the framework described previously, two memetic algorithms based on GSA and CGSA are built: MGSA and MCGSA. In the first, the qN method is introduced into GSA, while in MCGSA, the qN method is added to CGSA, in accordance with the framework explained above. Algorithm 6 shows a canonical pseudocode of the MGSA and MCGSA algorithms. The good results obtained by these memetic algorithms based on GSA in synthetic and real-world problems compared to state-of-theart algorithms [56] justify the choice of including MGSA and MCGSA in the computational experiments carried out in this paper. Indeed, the MCGSA algorithm is taken as a reference, since it has provided the best results in the literature in a large set of different optimization problems [56].

V. EXPERIMENT SETUP
Data from the goalkeepers in both the Men's and Women's European Handball Championships held in 2020 are used for the experiments in this study. The data was obtained by Algorithm 6 Pseudocode of MGSA and MCGSA 1: (Initialization). Let t = 1 and build a (random) set of feasible solutions X 0 = {x 0 1 , · · · , x 0 N } and speeds V 0 = {v 0 1 , · · · , v 0 N } for the system to be optimized (initial population). Set X 0 = X 0 . Let tol> 0 be a tolerance parameter. 2: while a stopping criterion is not met do 3: (Exploration Stage). Perform one iteration using the GSA or CGSA method (Rules for detecting the change of promising neighborhood). Compute best t , f t−1 * , x t i * and x t * If Rule 1 and Rule 2 are satisfied, perform the Exploitation Stage. Otherwise, set t = t + 1 and repeat the Exploration Stage.

5:
(Exploitation Stage). Perform several iterations of the quasi-Newton method until a stopping criterion based on tol is met, and let y t * = A qN (x t i * ). Decrease the value of the parameter tol.  processing all the statistics collected for each match by the European Handball Federation (EHF). The methods shown in Figure 1 can be observed in the more detailed illustrative example in Figure 3.
Therefore, in the first place using the data from both the Men's and Women's European Handball Championships, the optimization problem has been solved using all the optimization algorithms. Later, the results obtained by each algorithm were validated with the data gathered from the 2021 Men's and Women's Handball World Championships.

A. DATA DESCRIPTION
All the data from the matches in both European Championships were gathered and analyzed. All the games represent the official statistics of the EHF, and the data was collected by qualified observers at each championship. The EHF statistics provided a set of 19 variables to represent the performance of each goalkeeper in a match. Thus, the features for evaluating the specific position of the goalkeeper can be summarized as shown in Table 1: Feature values are standardized. However, values in the interval [0, 1] are not suitable because positive and negative actions are valued. Therefore, the problem must model actions that add and subtract scores. In addition, the expert trainers and the results obtained indicate that the standardized range [−2, 2] is better than [−1, 1] for a better scalability of the problem. Consequently, the domain of the positive variables are [(0 + µ), 2] and that of the negative ones are [−2, −(0 + µ)], with µ as a threshold. This is because variables that represent positive actions must have values higher than 0 in all scenarios and variables that represent negative actions always have values lower than 0.
The interval [−2, 2] has been chosen to restrict the weights of each goalkeeper's action in a match within understandable and easily manageable boundaries. Therefore, every action equally contributes similarly to the performance evaluation of the player. For this reason, the minimum value for the weights of every action has been established as |µ| = |0.2|; thus, any action during the match should contribute to the player's score. However, the maximum absolute value has been set to 2 with the objective of bounding the player score. µ is a value 10% of the maximum value.
To evaluate the feasibility of the proposal, the five goalkeepers who were selected as the best goalkeepers in both tournaments (see Table 2) have been used as references. Each of the goalkeepers was nominated on the basis of their performances during the tournament (40% weighting on the decision regarding the most valuable goalkeeper was given to the fans, who expressed their choices through votes, while the remaining 60% went to a panel of EHF experts). Our purpose is to compare the ranking obtained by applying the optimized feature-weighting schema with both top 5 rankings using three performance measures.

B. EXPERIMENTAL SETTINGS
In this section, the parametrization of the experiments is given. A total of 100 repetitions to optimize the MRR, the MAP and the MWM metrics have been computed for each of the eight algorithms tested: PSO, GA, BA, ABC, GSA, CGSA and the memetic algorithms, MGSA and MCGSA. When metaheuristic algorithms are run, many repetitions must be executed to obtain a realistic measure of the performance of the algorithm. According to [56], 100 repetitions of the execution of each algorithm have been run. The experiments were designed using the MATLAB programming language and ran on a computer with an Intel Core i5 processor at 2.9 Gigahertz and 16 GB of RAM. This computational experiment was carried out for the datasets from both Championships.
The population of all algorithms has a size of 30 individuals. This decision is justified by the trend in metaheuristics literature to use low population sizes (between 20 and 50 individuals. When one hundred iterations are run using this population size, 3000 function evaluations are executed. 100 iterations have been used in these experiments to provide a computational experience that is sufficiently comprehensive, but also at a reasonable cost. Then, this execution is repeated 100 times, leading to a computational burden of 300000 function evaluations per algorithm. Additionally, we check when MGSA and MCGSA are used to provide a fair comparison. Regarding the implementation and parametrization of PSO and GA, particleswarm and ga MATLAB functions have been used for reproducibility using the default parameters. Concerning BA, a MATLAB implementation provided by the author in [59] is chosen to use open code available to any researcher. This implementation establishes an initial loudness A = 1 and a pulse rate r = 0.1 for all bats. Furthermore, the frequency is bounded on the interval [0, 2], while two parameters α = 0.97 and γ = 0.1 are used to control the increase in r and the decrease in A. The specific implementation of ABC is taken from [60]. It is also an open code version of ABC, which enables the reproduction of the results obtained in this study. In this implementation, all bees are initially assigned to onlooker bees, while l = (0.6 · 19 · 30), with 19 being the number of variables in the optimization problem and 30 being the population size. Finally, regarding the GSA algorithm and its variant CGSA, as well as the memetic algorithms based on the MGSA and MCGSA, Table 3 shows their parametrizations, which are taken from [56].

VI. RESULTS AND DISCUSSION
In this section, the results obtained by the algorithms explained in Section IV are shown using the data from both the Men's and Women's European Championships. Next, with the weights obtained, a case study is carried out with the data from the last Men's and Women's World Championships, where the metrics used in this study and in two other recent works in the literature are compared. Finally, a complete performance analysis between the different algorithms is carried out.

A. ALGORITHM RESULTS
First, the results of the weights obtained by the optimization algorithms are interpreted to provide meaningful handball To obtain these weights, the results obtained using the GSA, the CGSA, the MGSA and the MCGSA have been averaged over the total number of executions, considering all the statistics collected from both tournaments. These algorithms have been chosen to obtain the figures since they are from the same family of algorithms, as well as the most recent and novel algorithms employed in this paper. In addition, computing the mean of all algorithms applied could distort the results shown. It is important to highlight that the values obtained have not been optimized based on the reference values indicated by experts, such as [16]. Instead, they are obtained from the optimization process guided by the previously mentioned metrics and algorithms. Each of the metrics defines a different objective: the MRR identifies the most valuable goalkeeper of the tournament; the MAP selects the top 5 best goalkeepers; and finally, the MWM considers both aspects of the evaluation. Figure 4 shows that the best male goalkeepers make a difference in the feature 9mSaves, as it is where the top 5 goalkeepers stand out. The second most valuable feature is WingSave, where the best goalkeepers of the tournament stand out and present the highest values. The most substantial difference between the results occurs in feature 6mCSaves. This is due to Gonzalo Pérez de Vargas (the most value goalkeeper) being far below (5 saves) than the rest of the goalkeepers in the top 5 (the rest are in the interval [12,15] saves). In contrast, 6mCGoals shows a very high negative value in the MRR metric because Gonzalo Pérez de Vargas is the goalkeeper who receives the least number of 6−meter goals. Figure 5 shows that the best women goalkeepers make the difference in the feature WingSaves, as it is where the top 5 goalkeepers stand out. The average percentage of all women goalkeepers is 32%, and the top 5 belongs to interval [36,47]%. The second most valuable feature is 6mCSaves, where the best goalkeepers accumulate many saves. The largest difference between metrics lies in this feature because S. Toft (best goalkeeper) reaches the highest value with 10% 6mCSaves. In contrast, 7mPGoals reaches a very high negative value in the MRR metric results because S. Toft receives few goals of this kind. Finally, the WingSaves feature is very important in both tournaments and is even more valuable in the female category. The most substantial difference between the weights of the features is that in the male category, 9mSaves are very important, while in the female category, they are the least important feature. However, 6mCSaves is a very important feature in the best woman goalkeepers, but it is the least relevant in the male category. Therefore, a male handball match is much more physical, there are more shots from more than 9 meters, and good goalkeepers make the differences there, while the female handball goalkeeper tries to reach 6 meters and shots from less distance.

B. CASE OF STUDY
In this section, the results obtained after applying the previously computed weighted schemes in the analysis of goalkeepers of the 2021 Men's and Woman's World Handball Championships are shown in (Figure 6). The purpose is to identify those named the tournament's best goalkeepers by the International Handball Federation experts. The organization did not report information on the top five goalkeepers of the tournaments.
For this purpose, we compare our proposal with the following approaches: (1) a fuzzy method to weight criteria based on the subjectivity of experts [9] and (2) a weighted scheme based on statistics extracted from the literature and domain knowledge [16].
The design of the proposed approaches is different in terms of the sources of the weighting scheme. For the fuzzy approach, we use the opinion of a set of experts. Meanwhile, the statistical approach is based on reference levels and VOLUME 10, 2022 the statistics of the championship. Finally, as previously explained, the proposed approach is aimed at optimizing the weighting scheme according to several performance measures. Each approach has been proven to establish weighting schemes in a decision-making process [61].
These weighting schemes have been applied at both the 2021 Men's and Woman's World Handball Championships to verify the approach's usefulness. As a result, both the most valuable Goalkeepers, Andreas Palicka (SWE), the best Goalkeeper in the 2021 Men's World Championship (see Table 4) and Sandra TOFT (DEN), the best goalkeeper at the 2021 Women's World Championship (see Table 5), have been correctly identified using the optimized weighting schemes obtained through several combinations of measures and algorithms.
The result is considerably better than that obtained with the non-optimization approaches. We can conclude that the proposal can be used as a data-driven mechanism to quantitatively determine the evaluation criteria used by the panel of experts who evaluate goalkeepers during official championships, which are generally opaque and unknown to the audience. The proposal is also suitable for selecting the best goalkeepers without subjectivity.

C. PERFORMANCE ANALYSIS
Once the results of the feature-weighting problem are described and analyzed, this subsection analyzes the performance of the different algorithms employed. Thus, Table 6 and Figure 7 show the mean results over the 100 executions of each algorithm in the different performance metrics.
As seen in the male competition, all the algorithms obtain the optimal value when the MRR metric is optimized. For the MAP metric, PSO, GA and GSA obtain the best mean value in comparison with the other algorithms, closely followed by CGSA and the memetic variants, MGSA and MCGSA. Finally, regarding the MWM metric, PSO obtains the best value. Then, the MCGSA algorithm returns the secondbest value. Since the MWM is the most complex metric, it can be seen that the MRR and MAP values are close to the optimum value, although this is not the case for the MWM metric. Despite this fact, the weights schema obtained by all algorithms optimizing MAP achieve identification of the best goalkeeper, while the weighting schemas for the MWM metric given by CGSA, MCGSA, GA and ABC also achieve identification of the best goalkeeper, unlike the weight schema obtained by PSO, which provides the best fitness value, but it does not manage to identify the best goalkeeper. Regarding MRR, the weighting schemas produced by all algorithms do not succeed in identifying the best goalkeeper. This could be due to the overfitting produced by adjusting the weights to obtain the best weight.
For the female data, Table 6 shows the mean results over the 100 executions of each algorithm for the three performance metrics. As with the male data, all algorithms reach the 30568 VOLUME 10, 2022   optimal value, where the MRR metric is optimized. However, for the MAP metric, the CGSA, GA and MCGSA return the best results, which is higher than for the MAP for the male category and is very close to the optimum value. Finally, regarding the MWM metric, once again, PSO gives the best result, closely followed by CGSA and MCGSA. This is also higher than for the MWM values for the male data. With regard to the weighting schemas obtained by the algorithms, for MRR, the weights obtained by GSA, CGSA, MGSA and MCGSA are able to identify Sandra Toft as the best goalkeeper. Checking the weights obtained when the MAP metric is optimized, all algorithms except BA identify the best goalkeeper. Finally, regarding the MWM metric, all algorithms, except BA and ABC, identify Sandra Toft as the best goalkeeper.
To compare the performances of the algorithms, the Wilcoxon rank-sum test is performed to determine whether a difference between algorithms is statistically significant. To do this, the MCGSA is taken as a reference and compared to the rest, since it is the most sophisticated algorithm that is being used. It should be noted that a p−value of less than 0.05 means that there are substantial differences between the two algorithms. Furthermore, h = 1 means that there are substantial differences in favor of MCGSA, h = −1 means that the differences are in favor of the algorithm that it has been compared with, and h = 0 means that there are no substantial differences between the two algorithms. Tables 7 and 8 show the results of the tests for the male and female categories. According to these results, it is possible to confirm that there is no substantial difference between the algorithms in the optimization of the MRR, for which all the algorithms obtain the optimal solution in both categories. Regarding the MAP metric in the male category, there is only a substantial difference in favor of MCGSA with respect to BA and ABC. However, in the female category, there are substantial differences in favor of MCGSA with respect to BA, ABC, GSA and MGSA. Finally, concerning the MWM in the male category, there are substantial differences in favor of the MCGSA in comparison to the BA, ABC, GSA and the MGSA. However, the only substantial difference is for MCGSA, which occurs against the PSO algorithm. Concerning the MWM metric in the female category, there are only substantial differences in favor of MCGSA with respect to the BA and ABC algorithms.

VII. CONCLUSION AND FUTURE WORK
Selecting the best goalkeeper is a key question in handball, since the goalkeeper is one of the most critical positions. Furthermore, the procedure for determining the best goalkeeper is very complicated, since it focuses on the examination of several different criteria. The main contribution of this paper is the formulation of this complex problem; the application of six metaheuristic algorithms (PSO, GA, BA, ABC, GSA and CGSA) and two memetic algorithms (MGSA and MCGSA) using three performance metrics (MRR, MAP, MWM) to calculate the weights of the features is useful in evaluating the goalkeepers' performances.
Two experiments are performed using the data collected by the EHF from all the matches of the last two Men's and Women's European Championships. Then, the feature weightings of 19 variables that determine goalkeeper performances are optimized using the eight previously mentioned algorithms and three metrics. The weights obtained for all the variables are exhaustively analyzed. The results obtained enable the most important features that have a greater negative and positive weight to distinguish the good goalkeepers from the best to be chosen. The higher negative and positive weights obtained with optimization embody the experts' criteria to decide the best goalkeepers. Subsequently, the results obtained in the female and male categories are compared. The results express the physical difference in the ability to shoot from far distances, which is superior in the male category. Two case studies are carried out for both Men's and Women's World Championships to validate the results obtained. This confirms that it is possible to identify the best goalkeeper and the top five goalkeepers in a tournament.
Therefore, a summary of our research achievements is as follows: (1) We formulate the problem to evaluate the performance of handball goalkeepers based on all their actions during a handball match. (2) We utilize optimization to calculate a feature-weighting of each goalkeeper's action with 8 metaheuristic algorithms based on 3 metrics. The weights obtained are a characterization of the best goalkeepers in this sport. (3) The results are validated with real data from two handball world championships identifying the best goalkeepers selected by the IHF with the weights obtained in this work, which corroborates that the optimization process coincides with the criteria of the experts. (4) Finally, a featureweighting approach enables us to objectively evaluate the performance of handball goalkeepers based on all their actions during the match. This will have many applications in high-performance teams.
Finally, a proposal for upcoming research involves expanding our research to address the performance of players in all handball positions (wings, backs, center backs and line players), not just goalkeepers. Additionally, our purpose is to complete our evaluation approach based on statistics with other features related to movement and pose extracted from video tracking [62]. In addition, the application of the obtained results from the goalkeeper evaluation model can help those who make decisions about team line-ups, signing new players, choosing national team goalkeepers or choosing better goalkeepers for matches, among many other applications.