A Balanced Squad for Indian Premier League Using Modified NSGA-II

Selecting team players is a crucial and challenging task demanding a considerable amount of thinking and hard work by the selectors. The present study formulated the selection of an IPL squad as a multi-objective optimization problem with the objectives of maximizing the batting and bowling performance of the squad, in which a player’s performance is estimated using an efficient Batting Performance Factor and Combined Bowling Rate. Also, the proposed model tries to formulate a balanced squad by constraining the number of pure batters, pure bowlers, and all-rounders. Bounds are also considered on star players to enhance the performance of the squad and also from the income prospects of IPL. The problem in itself is treated as a 0/1 knapsack problem for which two combinatorial optimization algorithms, namely, BNSGA-II and INSGA-II, are developed. These algorithms were compared with existing modified NSGA-II for IPL team selection and three other popular multi-objective optimization algorithms, NSGA-II, NSDE, and MOPSO-CD, on the basis of standard performance metrics: hypervolume, inverted generational distance, and number of Pareto optimal solutions. Both algorithms performed well, with BNSGA-II performing better than all the other algorithms considered in this study. The IPL 2020 players’ data validated the applicability of the proposed model and algorithms. The trade-off squads contained players of each expertise in appropriate proportions. Further analysis of the trade-off squads demonstrated that many theoretically selected players performed well in IPL 2020 matches.


I. INTRODUCTION
The Indian Premier League (IPL) is a franchise-based competition started in 2008 by the Board of Control for Cricket in India (BCCI) to promote cricket in India [1]. IPL is the most attended cricket league and contributes significantly to the social and economic sectors. In 2015, IPL contributed 11.5 B to the Indian economy.
The associate editor coordinating the review of this manuscript and approving it for publication was Sun-Yuan Hsieh.
In this league, the players are bought by high-profile owners through an auction process. Each franchise owner spends a significant amount of money on purchasing players who have the ability to win and who also fit the owner's business objectives. Considering the huge amount of money invested in the purchase of players, the franchise owners try to explore the best possible combinations of players for their squad.
In view of this, the researchers in [2] have formulated IPL team selection as a bi-objective optimization problem with the aim to maximize batting and bowling performances and satisfy the IPL regulations: budget constraint, bound on the [...]
[...] selecting an optimal national cricket team [7] and integer programming for selecting playing XI team [8], T20 world cup team [9], and IPL team [3]. The multi-objective IPL team selection model has been solved using integer linear programming [3] and non-dominated sorting genetic algorithm-II (NSGA-II) [2], [10]. NSGA-II [11] is a population-based multi-objective optimization algorithm widely used to solve multi-objective combinatorial optimization problems (COPs) [12], including multi-objective knapsack problems [13], [14].
The present study developed two NSGA-II variants, BNSGA-II and INSGA-II, for solving the proposed multi-objective IPL squad selection problem. BNSGA-II has a binary chromosome representation, and INSGA-II has an integer chromosome representation. The problem constraints are handled using the constraint dominance principle. In BNSGA-II, many infeasible solutions arise from the violation of a single constraint, so a repair mechanism is proposed to repair them and produce a sufficient number of feasible solutions.
The efficiency of the proposed algorithms is validated by comparing them with the modified NSGA-II for the bi-objective IPL team selection problem [2] and three other popular multi-objective optimization algorithms: NSGA-II [11], non-dominated sorting differential evolution (NSDE) [15], and multi-objective particle swarm optimization with crowding distance (MOPSO-CD) [16], using hypervolume, inverted generational distance (IGD), number of Pareto optimal solutions (NPS), and computational time. Next, the trade-off squads obtained through the best-performing algorithm are analyzed based on their cost and fielding performance. The performance of trade-off squad players in IPL 2020 validates the efficiency of the proposed model. A practical situation is also simulated where a franchise owner favours certain players irrespective of their cost.
In summary, the main contributions of this study are:
• Using the concept of the knapsack problem for selecting an IPL team.

• Designing a new model for selecting a balanced squad for IPL.

• Developing two new algorithms, BNSGA-II and INSGA-II, for solving the proposed model.

The rest of this paper is organized as follows: Section 2 describes the related work. Section 3 defines the multi-objective knapsack problem. Section 4 formulates the proposed IPL squad selection problem. Section 5 describes the proposed methodology. Section 6 discusses the experimental results and comparison. Section 7 presents the analysis and discussion. Finally, the conclusion is presented in Section 8.

II. RELATED WORK
This section discusses the optimization methods available in the literature to select IPL team/squad players. In addition to the optimization methods, researchers also used methods based on other techniques, such as data envelopment analysis [17], [18] and machine learning algorithms [19], [20]. [...] methods for single and multi-objective optimization models. The comparative study between the related work on optimization methods for cricket team selection and the proposed work is given in Table 1.
Sathya and Jamal [7] proposed a genetic algorithm for selecting eleven out of fifty national players for a one-day international (ODI). Factors such as the number of pacers and spinners, the composition of left-handers and right-handers, and partnership records were considered in order to select a more flexible, balanced, and diverse team.
Ahmed et al. [2] proposed NSGA-II with a novel gene representation and decision-making techniques for IPL T20 team selection. The resulting teams were compared with IPL 4th edition teams and found to be better theoretically. The authors also demonstrated a dynamic auction-based player selection to make the procedure more realistic.
Bhattacharjee and Saikia [9] selected an optimal squad of 15 players using binary integer programming and compared [...]
Chand et al. [3] proposed an integer programming method for IPL team selection that guaranteed optimality and demonstrated its scalability using two-objective, three-objective, and five-objective formulations. The objectives of the problem were based on batting, bowling, fielding, cost, and star power. Partial teams were constructed around the preferred players. In addition, players were ranked by their performance to aid the decision-making.

III. MULTI-OBJECTIVE KNAPSACK PROBLEM
Given a knapsack with weight capacity $W$ and a set of $n$ items in which the $i$th item has weight $w_i \le W$ and profit per objective $v_i^k > 0$, $i = 1, 2, \ldots, n$, the multi-objective 0/1 knapsack problem aims to fill the knapsack with the given items within its capacity such that the total profit is maximized.

Mathematically, a $K$-objective knapsack problem is defined as follows:
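The formulation itself did not survive in this copy of the text. A standard statement consistent with the definitions above is sketched here; the decision variable $x_i$ and the objective name $f_k$ are notational assumptions, not necessarily the symbols used in the original equation.

```latex
\begin{aligned}
\text{maximize}\quad   & f_k(x) = \sum_{i=1}^{n} v_i^k \, x_i, \qquad k = 1, 2, \ldots, K, \\
\text{subject to}\quad & \sum_{i=1}^{n} w_i \, x_i \le W, \\
                       & x_i \in \{0, 1\}, \qquad i = 1, 2, \ldots, n,
\end{aligned}
```

Here $x_i = 1$ means that the $i$th item is placed in the knapsack and $x_i = 0$ means that it is left out.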

The IPL squad selection problem is similar to the above-[...]
A balanced squad contains players of each expertise in suitable proportions. The proposed squad selection problem aims to form a balanced IPL squad with maximum net batting and bowling performance. The objectives and constraints of the problem are detailed in the following subsections. The variables and parameters used in the paper are defined in Table 3.

The batting average indicates the run-scoring capability of a player and estimates the player's batting performance. However, in limited-overs cricket, slow batting will lead to defeat rather than victory [4]; therefore, players with a high batting strike rate are also needed, since it indicates the rate at which a player scores runs.

Players with high batting averages do not necessarily have high strike rates; for example, the IPL 2020 auctioned players with high strike rates and low batting averages are shown in Table 4. The relation between these batting statistics can further be visualized using Fig. 1. Here, the values of these statistics for pure batters and batting all-rounders are normalized (because their scales differ considerably) and sorted in decreasing order of the batting averages. The figure shows that the batting averages of the players decrease continuously, but their strike rates fluctuate. The two statistics therefore show no relationship that would let one stand in for the other, and both should be considered when analyzing a player's batting performance.
Researchers in [4] have proposed an efficient Batting Performance Factor (BPF) for calculating the player's batting performance. BPF is the weighted product of the normalized values of batting strike rates and batting averages, as given in (1). The higher value of α indicates a higher weightage to [...]
The first objective of this study is to maximize the net bat-[...] players.
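Equation (1) is not reproduced in this copy, so the following is only a minimal sketch of a weighted product of normalized statistics. It assumes that α is the exponent on the normalized strike rate and 1 − α the exponent on the normalized batting average, and that normalization divides by the maximum; both choices, as well as the function names and sample values, are illustrative assumptions rather than the definition from [4].

```python
def normalize(values):
    """Scale values to (0, 1] by dividing by the maximum (assumed normalization)."""
    top = max(values)
    return [v / top for v in values]


def batting_performance_factor(strike_rates, averages, alpha=0.6):
    """Assumed form: BPF = SR_norm**alpha * BA_norm**(1 - alpha) per player."""
    sr = normalize(strike_rates)
    ba = normalize(averages)
    return [s ** alpha * b ** (1 - alpha) for s, b in zip(sr, ba)]


# Hypothetical players: a hitter, an anchor, and a player in between.
strike_rates = [180.0, 120.0, 150.0]
averages = [20.0, 40.0, 30.0]

bpf_hitters = batting_performance_factor(strike_rates, averages, alpha=1.0)
bpf_anchors = batting_performance_factor(strike_rates, averages, alpha=0.0)
bpf_mixed = batting_performance_factor(strike_rates, averages, alpha=0.6)
```

Under this assumed form, α = 1 ranks players purely by strike rate and α = 0 purely by batting average, so intermediate values trade the two statistics off against each other.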
The second objective of this study is to maximize the net bowling performance (or minimize the -net bowling performance) of the squad, as given in (4). Here, the objective function $f_2$ denotes the sum of the CBRs of the selected players.
Here, $n$ is the total number of players, and CBR($i$) is the combined bowling rate of the $i$th player, $i = 1, 2, \ldots, n$.
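As an illustration of how the two objectives can be evaluated for one candidate squad, the sketch below sums BPF and CBR over a 0/1 selection vector. Treating the first objective as a plain sum of BPF values is an assumption, since equation (2) is not reproduced here; the penalty CBR of 100 for batters follows the convention this section adopts, and all names and sample values are hypothetical.

```python
CBR_PENALTY = 100.0  # CBR assigned to pure batters as a penalty


def squad_objectives(selection, bpf, cbr, is_pure_batter):
    """Return (f1, f2) for a 0/1 selection vector.

    f1: sum of BPF over selected players (higher is better, assumed form).
    f2: sum of CBR over selected players (lower is better).
    """
    f1 = 0.0
    f2 = 0.0
    for chosen, b, c, batter in zip(selection, bpf, cbr, is_pure_batter):
        if chosen:
            f1 += b
            # A pure batter contributes the penalty instead of a bowling rate.
            f2 += CBR_PENALTY if batter else c
    return f1, f2


# Hypothetical three-player pool: one pure batter and two bowlers.
selection = [1, 0, 1]
bpf = [0.9, 0.5, 0.2]
cbr = [5.0, 20.0, 18.0]
is_pure_batter = [True, False, False]

f1, f2 = squad_objectives(selection, bpf, cbr, is_pure_batter)
```

In the example, the recorded CBR of 5.0 for the pure batter is ignored in favour of the penalty, which is what keeps such a player from being picked for bowling strength.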

Note: Since a lower CBR is favourable, batters could be selected as bowlers because of their low CBR rather than for their batting statistics. Therefore, the CBR of a batter is set to 100 as a penalty. [...]
The trade-off squads created by applying the above IPL regulations may contain mostly all-rounders, as they improve both the batting and bowling performance of a squad. Also, due to the budget limitation, bowlers, bowling all-rounders, and new players have a higher chance of being selected, as these players have lower auction prices compared to pure batters and experienced players. Therefore, two additional constraints are considered, one bounding the sum of the number of pure batters and batting all-rounders (10) [...]ing data on all cricket matches).

The details of the players are given in [...]
The complexity of the proposed COP can be defined as the possible number of player combinations for a squad, [...]
A player with high-performance records in IPL and T20I is considered a 'star player'. [...]
In BNSGA-II, each squad initially has $m$ players, but after the first generation, constraint (6), which fixes the number of players in each squad, starts to be violated. The resulting infeasibility leaves few feasible population members even when a large population size is used. Therefore, a repair mechanism is proposed to repair the offspring population in each generation and produce sufficient feasible population members for the next generation. In other words, the repair mechanism removes the large infeasibility caused by a single constraint. The pseudocode for the repair mechanism is given in Algorithm 1.
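Algorithm 1 is not reproduced in this copy. The following is a minimal sketch of one way such a repair could work on a binary chromosome, assuming that surplus or missing players are corrected by flipping randomly chosen bits; the function name and the random choice are assumptions, and the authors' procedure may select the bits differently.

```python
import random


def repair_squad_size(chromosome, m, rng=random):
    """Return a copy of a 0/1 chromosome that contains exactly m ones."""
    repaired = list(chromosome)
    ones = [i for i, bit in enumerate(repaired) if bit == 1]
    zeros = [i for i, bit in enumerate(repaired) if bit == 0]
    if len(ones) > m:
        # Too many players: drop randomly chosen selected players.
        for i in rng.sample(ones, len(ones) - m):
            repaired[i] = 0
    elif len(ones) < m:
        # Too few players: add randomly chosen unselected players.
        for i in rng.sample(zeros, m - len(ones)):
            repaired[i] = 1
    return repaired


# Hypothetical offspring over a pool of six players, squad size m = 3.
too_many = repair_squad_size([1, 1, 1, 1, 0, 0], m=3)
too_few = repair_squad_size([1, 0, 0, 0, 0, 0], m=3)
```

The repair only restores the squad-size constraint; the remaining constraints would still be handled by the constraint dominance principle.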

This section designs experiments considering two cases for IPL squad selection. Case-I is an optimization problem with objectives (2) and (4) subject to constraints (5-11), and case-II is the optimization problem with objectives (2) and (4) subject to constraints (5-9) and (12-17). The first experiment [...]
The parameters for the proposed model and algorithms are given in Table 7. The population size and the number of generations were chosen after fine-tuning the algorithm. Several combinations were tried to optimize the performance of the algorithm. It was observed that a population size of 200 and a stopping criterion of 500 generations gave the best results. Likewise, the other parameters were fine-tuned so that the Pareto optimal solutions cover the whole Pareto front with good convergence. For brevity, only the best parameters are shown in the study. [...]
The model parameters $a_1, a_2, m_1, m_2, m_3, n_1, n_2, n_3$ are fixed for the squad size $m = 23$, as shown in Table 7, and can be changed with the value of $m$. The value of parameter α is set using sensitivity analysis as follows: [...]
The relation between BA rank and SR rank for α = 0.1, α = 0.5, α = 0.6, and α = 1 is shown in Fig. 8. For lower α, there is not much difference between the two ranks of players, but this difference increases as the value of α increases from 0.5 to 1.

As T20 is a shorter format than ODI, more hitters, i.e., batters with superb strike rates, are needed to achieve a good score. Therefore, the value of α is set to 0.6 for the proposed model. [...]
Hypervolume is the volume of the region dominated by the solution set $S$ in the objective space. Mathematically, for every $x_i \in S$, a hypercube $v_i$ is constructed with a reference point $W$ and the solution $x_i$ as its diagonal corners. The reference point $W$ can be found by constructing a vector of worst objective values. After that, the union of all hypercubes is calculated as:
$$HV(S) = \text{volume}\left(\bigcup_{i=1}^{|S|} v_i\right)$$
IGD evaluates the proximity of the solution set $S$ to the solution set $S^*$ and is defined as the average distance of each reference solution $y \in S^*$ from its nearest solution in $S$ as:
$$IGD(S, S^*) = \frac{1}{|S^*|} \sum_{y \in S^*} \min_{x \in S} d_{xy}$$
where $d_{xy}$ is the Euclidean distance between solution $x \in S$ and $y \in S^*$.
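Both metrics can be sketched in a few lines for the bi-objective case. The example assumes that both objectives are expressed in minimization form and that the reference point is worse than every solution in both objectives; the function names and sample fronts are hypothetical, and the hypervolume routine is a simple two-dimensional sweep rather than a general-purpose implementation.

```python
import math


def hypervolume_2d(points, reference):
    """Area dominated by `points` and bounded by `reference` (two minimized objectives)."""
    # Ignore points that do not improve on the reference point in both objectives.
    pts = sorted(p for p in points if p[0] < reference[0] and p[1] < reference[1])
    area = 0.0
    best_f2 = reference[1]
    for f1, f2 in pts:
        if f2 < best_f2:
            # Add the horizontal strip that this point contributes beyond earlier points.
            area += (reference[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


def igd(solutions, reference_set):
    """Average distance from each reference solution to its nearest obtained solution."""
    total = sum(min(math.dist(x, y) for x in solutions) for y in reference_set)
    return total / len(reference_set)


# Hypothetical fronts in a two-objective minimization space.
front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
reference_point = (4.0, 4.0)
hv = hypervolume_2d(front, reference_point)

obtained = [(1.0, 3.0), (3.0, 1.0)]
igd_value = igd(obtained, front)
```

A larger hypervolume and a smaller IGD both indicate a better approximation of the Pareto front, which is why the two metrics are usually reported together.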

This section compares the solutions obtained using various algorithms for the proposed problem. The simulation is also performed by fixing the preferred players and building partial squads around them.
The performance metrics for each algorithm are calcu[...]
This result is further investigated statistically by performing the Friedman test [23], which ranks the algorithms and analyses whether the results obtained by the different algorithms differ. The null hypothesis assumes that the performance of all the algorithms is equivalent, i.e., that none of the algorithms performs significantly differently. Therefore, the rejection of the null hypothesis shows a significant difference in the performance of the algorithms. IBM SPSS is used to perform this test at a significance level of α = 0.05. Here, the Friedman test is performed for ranking the six algorithms (for each case separately) on the basis of their inverse hypervolumes (hypervolume$^{-1}$) and IGDs over 40 runs. The Friedman mean ranks and the corresponding test statistics are shown in Table 9. As all p-values are less than 0.05, we reject the null hypothesis and accept that a significant difference exists in the performance of the algorithms. BNSGA-II has the highest mean rank in terms of hypervolume and IGD. NSDE has the second-highest mean rank in both metrics for case-I, and INSGA-II has the second-highest mean rank for case-II. However, INSGA-II is more computationally efficient than NSDE, as can be seen in Tables 7-8.
[...] For instance, suppose 'Rohit Sharma' (the captain of the team with the most IPL titles) and 'Jasprit Bumrah' (one of the best death-overs bowlers) are the preference of a franchise owner for building a squad. Then, it is possible to obtain balanced trade-off squads along with these preferred players using BNSGA-II. For this purpose, we use a repair mechanism that modifies Algorithm 1 such that the indices of these two players hold '1' in the chromosome representation (Fig. 3), which keeps both players in the squad, as shown in Table 10. In this way, BNSGA-II helps incorporate the preference of the franchise owner in squad selection, which is helpful in a dynamic environment, such as an auction.
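The preference-aware repair can be sketched as follows. This is a minimal sketch under stated assumptions: the modified Algorithm 1 is not reproduced in this copy, so the random bit flips, the function name, and the player indices are hypothetical; only the idea of forcing the preferred players' genes to '1' comes from the text.

```python
import random


def repair_with_preferences(chromosome, m, preferred, rng=random):
    """Return a copy with exactly m ones in which every preferred index is 1.

    Assumes len(set(preferred)) <= m, so the squad size can always be met.
    """
    repaired = list(chromosome)
    locked = set(preferred)
    for i in locked:
        repaired[i] = 1  # preferred players are always in the squad
    removable = [i for i, bit in enumerate(repaired) if bit == 1 and i not in locked]
    addable = [i for i, bit in enumerate(repaired) if bit == 0]
    surplus = sum(repaired) - m
    if surplus > 0:
        # Drop only players that the owner did not ask for.
        for i in rng.sample(removable, surplus):
            repaired[i] = 0
    elif surplus < 0:
        for i in rng.sample(addable, -surplus):
            repaired[i] = 1
    return repaired


# Hypothetical pool of six players; indices 0 and 5 are the owner's picks.
squad = repair_with_preferences([0, 1, 1, 1, 1, 0], m=3, preferred=[0, 5])
```

Because the locked indices are never candidates for removal, the owner's picks survive every generation while the rest of the squad is still shaped by the optimization.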

This section analyses case-I and case-II on the basis of the trade-off squads obtained using BNSGA-II. It also analyses the trade-off squads based on cost, fielding performance, and players' performance in IPL 2020.

Before analyzing the trade-offs for case-I and case-II, we first obtain the trade-off squads that maximize the net batting and bowling performance and satisfy the constraints (5-9), which are based only on IPL regulations. The majority of players in almost all squads are bowlers or all-rounders, as shown in Fig. 12. Another concern is that some squads have no bowlers, which is impossible in practice, as an IPL squad should have the right proportion of pure batters, pure bowlers, and all-rounders.
Further, case-I bounds the sum of pure batters (bowlers) and batting all-rounders (bowling all-rounders); however, as shown in Fig. 13, in most squads, more than 60% of players are all-rounders. The reason for this is that all-rounders contribute to both the batting and bowling performance of the squad. So, the result suggests that the squads obtained using case-I are also not balanced with respect to players' expertise. [...]
The '-net batting performance versus price' and '-net bowling performance versus price' plots are shown in Fig. 15. The figure illustrates that a squad with higher batting (bowling) performance is not necessarily costlier than a squad with comparatively lower batting (bowling) performance. In other words, two squads with similar costs may differ greatly in their batting and bowling performances. Another finding is that a squad with high net batting performance is mostly costlier than a squad with high net bowling performance, as auction prices of pure batters are higher than those of pure bowlers. [...] Table 12.
[...] R. Sai Kishore, Rishabh Pant, and Virat Kohli. 14 of them did not play in IPL 2020. The 30 players who played are marked in bold, and their performance records are obtained from iplt20.com. The players with one of the top ten records in IPL 2020 (in terms of most runs, most runs (over), best batting average, best batting strike rate, fastest fifties, most fifties, fastest centuries, most centuries, most fours, most sixes, highest score (innings), best bowling average, best bowling economy, best bowling strike rate, most wickets, most dot balls, most maiden overs, and player points) are shown in Table 13. The remaining players are mainly in the top twenty of these records. The above analysis supports the claim that players selected based on performance could perform well in upcoming matches.
VOLUME 10, 2022