A Video Game-Crowdsourcing Approach to Discovering a Player's Strategy for Solving the Facility Location Problem

In recent times, there has been a growing interest in the domain of computationally challenging problem solving within both scientific and organizational contexts. This study is primarily concerned with the extraction and comprehension of the methodologies and strategies employed by individuals when confronted with intricate problems, specifically those falling under the purview of NP-hard problems. The Facility Location Problem (FLP) serves as a prominent exemplar within this study’s framework. Traditionally, the handling of such complex problems has leaned upon intuitive reasoning and visual perception as the primary tools. However, these conventional approaches tend to provide only limited insight into the underlying processes employed in solving such problems. The present research seeks to bridge this knowledge gap through the utilization of advanced machine learning techniques for the purpose of categorizing and scrutinizing the strategies deployed by individuals in their attempts to tackle computationally challenging problems. The analysis conducted as part of this study unveils discernible and well-defined patterns and strategies that are employed by participants, some of whom have achieved notable levels of success. Remarkably, in certain instances, the outcomes achieved by these individuals have demonstrated a competitive edge when compared to the results produced by sophisticated computational methods, such as genetic algorithms. A fundamental component of our research methodology involves the application of heatmaps and clustering techniques. Through the normalization of results, our findings distinctly delineate two primary categories of games: those characterized by uniform player strategies and those characterized by a multitude of diverse and individualized tactics. Furthermore, our research employs a systematic approach to represent games by clustering them based on inherent similarities, utilizing cosine similarity as a metric for this purpose. 
By computing the averages of vectors within each cluster, we derive centroids that encapsulate the central tendencies exhibited by games belonging to that cluster. These centroids are then visually presented in a three-dimensional format, complemented by proportional spheres. These visual representations serve to vividly illustrate the dispersion and influence associated with each cluster. Our research significantly contributes to the understanding of human problem-solving strategies when confronted with computationally challenging problems. It unearths valuable insights regarding the potential for harnessing human intuition and expertise in addressing complex computational challenges. Through the integration of machine learning methodologies and intuitive visualizations, this work advances our comprehension of the approaches individuals employ to excel in solving computationally intricate problems.


I. INTRODUCTION
In contemporary society, technology plays an essential role in daily life, and the consequent generation of vast amounts of data has given rise to the field of Big Data. Through internet-connected sensors in various locations, individuals generate copious amounts of data, which are collated to form Big Data collections [1]. These collections can be analyzed with specialized tools designed for Big Data to facilitate the decision-making process. The concept of crowdsourcing has emerged from this idea. Crowdsourcing entails utilizing the skills of a large pool of volunteers to find solutions to problems; the crowdsourcing approach is employed to disseminate specific problems to individuals with particular knowledge or expertise [2], [3], [4], [5], [6]. Crowdsourcing is widely applied in many domains, ranging from identifying solutions to previously unsolved problems to creating virtual prototypes or 3-D designs. The objective of crowdsourcing is to identify innovative and high-quality solutions, and it has broad applications. Recent applications include smart cities, genetic variability, and Natural Language Processing research, which leverage tools such as Amazon Mechanical Turk [5], [7], [8]. However, the use of crowdsourcing to explore the decision-making processes and strategies that users follow during problem-solving is infrequent. Extracting this knowledge and identifying strategies is of paramount importance if one aims to improve general problem-solving approaches. Such solutions have broad applicability across various contexts.
In a crowdsourcing setting aimed at solving tasks, active participation is crucial, necessitating the need for motivational strategies. Studies have identified various reasons why individuals participate in such platforms, including financial incentives, entertainment, networking, and skill acquisition. Crowdsourcing strategies offer several options to attract participants. In recent times, video games, a popular source of entertainment, have been utilized in crowdsourcing research to draw more people to the platforms. This approach, known as Video game-Crowdsourcing, seeks to provide participants with a sense of fulfillment while engaging in an experiment. Unlike typical video games, Video game-Crowdsourcing collects data from a game to achieve a specific objective; entertaining the participant is not the primary goal. Research has shown that such approaches can yield solutions that are competitive with those of experts, emphasizing the power of crowdsourcing [9].
Video game-crowdsourcing has been applied in various fields, including health, exercise, education, commerce, and environmental behavior, among others [10], [11], [12]. The specific type of video game utilized depends on the application and the intended audience. When dealing with tasks involving predictions or a large volume of homogeneous tasks, scoring or leaderboard games are commonly employed. In contrast, more complex mechanisms are utilized for problem-solving or content generation to engage participants. Previous studies on the use of video game-crowdsourcing have predominantly focused on player modeling. Player behavior is a multifaceted concept that encompasses various aspects. One of these is player navigation within a game, which can be analyzed using visual models such as heat maps, the identification of regions where most actions occur, one-hot encoding, or traces [13], [14], [15], [16], [17]. Other applications of player modeling seek to create human-like agents that can act as either associates or opponents [18], [19], [20], [21].
The focus of our paper is on the development of effective strategies for solving the Facility Location Problem, a complex optimization problem with multiple objectives. In this problem, we are given a set of demand centers and must determine the optimal number of facilities to open to meet the overall demand while minimizing the total costs associated with facility opening and transportation [2], [4]. Specifically, we seek to improve existing algorithms for solving this problem by prioritizing the development of effective strategies. To achieve this goal, we have designed a video game that crowdsources strategies for Facility Location. In our game, players are presented with a piece of land and a variety of different facility types, and must strategically place facilities in such a way that the overall demand is met while minimizing costs. We reward players with entertainment, and their decisions are stored in real time, allowing us to reconstruct their matches and analyze their strategies. By focusing on the development of effective strategies for this problem, we hope to improve existing algorithms and ultimately contribute to the development of more efficient and effective solutions for the Facility Location Problem (FLP).
In our quest to derive player strategies for addressing the Facility Location Problem (FLP), we have constructed an abstract model that emulates the dynamics of a typical match scenario. This model assigns the player the task of strategically positioning a facility within a two-dimensional space, denoted as S ⊂ R², and bounded by a finite set of demand centers represented in a mesh format. The game is conceived as a sequence of plays, involving facility placement, positional adjustments for subsequent plays, or game termination. Our research methodology has been pivotal in this pursuit, encompassing the utilization of heatmaps and clustering techniques as essential analytical tools. Through data normalization, we effectively categorized games into two primary classifications: the first category comprises games characterized by consistent and uniform player strategies, while the second category encompasses games that exhibit a diverse spectrum of individualized tactical approaches.
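A minimal sketch of how such a match abstraction might be represented in code. The class and field names below are our own assumptions for illustration, not the game's actual schema; a placement is recorded as a node id, a position in S ⊂ R², and a timestamp, with a sentinel play marking game termination:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Play:
    """One action in a match: place a facility at a node, or end the game."""
    node_id: int                    # demand-center node receiving a facility (-1 = terminate)
    position: Tuple[float, float]   # coordinates within the 2-D space S
    timestamp: float                # seconds since the match started

@dataclass
class Match:
    """A match is a finite sequence of plays by one player."""
    player_id: int
    plays: List[Play] = field(default_factory=list)

    def placements(self) -> List[int]:
        """The ordered node ids where facilities were placed."""
        return [p.node_id for p in self.plays if p.node_id >= 0]

# Example: a three-play match ending with a termination action
m = Match(player_id=7, plays=[
    Play(3, (0.2, 0.5), 1.4),
    Play(8, (0.7, 0.1), 3.9),
    Play(-1, (0.0, 0.0), 5.0),
])
print(m.placements())  # → [3, 8]
```

Storing matches in this replayable form is what later allows heatmaps and clustering to be computed offline from the server logs.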
Furthermore, our research adopted a systematic approach for representing these games, leveraging cosine similarity as the metric of choice for clustering based on inherent similarities. By computing vector averages within each cluster, we derived centroids encapsulating central tendencies observed among games within each respective cluster. These centroids were subsequently presented in a three-dimensional visual format, complemented by proportional spheres, providing insightful representations of the data's structural nuances.
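The clustering step described above can be sketched as follows. This is a minimal greedy illustration using cosine similarity, per-cluster mean centroids, and a mean-distance "sphere radius" for the proportional-sphere visualization; the feature vectors and the 0.9 threshold are invented for illustration and are not our actual pipeline:

```python
import numpy as np

def cosine_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cluster_games(vectors, threshold=0.9):
    """Greedy clustering: a game joins the first cluster whose centroid it
    matches with cosine similarity >= threshold, else it starts a new cluster."""
    clusters = []  # each cluster is a list of game vectors
    for v in vectors:
        for c in clusters:
            if cosine_sim(v, np.mean(c, axis=0)) >= threshold:
                c.append(v)
                break
        else:
            clusters.append([v])
    # centroid plus "influence" (mean distance to centroid -> sphere radius)
    summary = []
    for c in clusters:
        centroid = np.mean(c, axis=0)
        radius = float(np.mean([np.linalg.norm(v - centroid) for v in c]))
        summary.append((centroid, radius, len(c)))
    return summary

games = np.array([[1.0, 0.1, 0.0],
                  [0.9, 0.2, 0.1],
                  [0.0, 1.0, 0.9]])
for centroid, radius, size in cluster_games(games):
    print(np.round(centroid, 2), round(radius, 2), size)
```

With these toy vectors the first two games are grouped together (their cosine similarity to the running centroid exceeds the threshold) while the third forms its own cluster.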
Our approach has led to the identification of strategies utilized by high-scoring players in the FLP. We have observed that some of these strategies resemble those previously proposed in the literature, while others appear to be variations thereof that have not yet been reported. Notably, we have found that matches from players who performed poorly in our Facility Location game exhibit little to no discernible grammar structure when analyzed using our extraction of player strategies. Consequently, these matches contain no clear pattern that could be interpreted as conveying a strategy. Our focus is on the process of identifying strategies from players and improving existing algorithms for solving the problem.
This paper makes the following contributions: 1) We introduce a video game representation for the FLP. 2) We present an abstract representation of a video game match that allows players to experience a sequence of plays while learning about a multi-objective optimization problem, specifically the FLP. Our approach provides the general public with an opportunity to develop an understanding of multi-objective optimization while playing a video game. Additionally, we extract insights into how players approach and solve complex computational problems using machine learning techniques. 3) Our research identifies the strategies employed by video game players during matches, including a strategy not previously reported in the literature. By using crowd computing as a means to collect solutions to complex problems, our approach complements popular methods such as genetic algorithms.
The rest of this work is organized as follows: in Section II we present the background needed to address the problem we are focusing on, as well as the data sets used for our experiments. Section III reviews related work. Section IV details the implementation of our video game and provides the match representation for the gameplays provided by diverse video game players. Section V contains the results we have obtained for the extracted player strategies and their discussion. Finally, our conclusions and suggestions for future work are included in the closing section.

II. BACKGROUND
The FLP is a combinatorial optimization problem that arises in operations, distribution, and logistics. An instance of the problem consists of a set of demand centers, which are locations that need to be covered by a service with a varying degree of severity, or demand. Demand centers are serviced by facilities, at a cost that depends on both the distance from the demand center to its nearest facility and the fixed cost of operating the facilities. A solution to the problem is a set of locations where facilities will be installed, or opened, so that the total cost is minimized. The locations where facilities can be opened coincide with the locations of demand centers, which means that each demand center may have at most one facility opened within it.
The problem is encountered, for example, by a city's Fire Department for the placement of their facilities, considering the costs of maintaining equipment and firefighters on standby, as well as the risk factors of different areas and the response time given the street layout. Demand centers could be the city's neighborhoods, each with a different demand value (their population). Another example of the FLP is finding an optimal arrangement of networked sensors in a space, aiming to minimize the number of sensors needed while maximizing the coverage area [22], [23].
Formally, the FLP can be defined as three distinct multi-objective optimization problems [24], where:
1) First subproblem: the optimization involves simultaneously minimizing the number of open facilities and minimizing total distance. This allows players to understand how increasing the number of facilities can impact the overall distance traveled. The optimization problem seeks to identify the optimal solution that minimizes both objective functions while taking into account the potential trade-offs between them. 2) Second subproblem: the optimization problem presented involves two objective functions to be optimized simultaneously.
The first objective function aims to minimize the number of facilities that fail, while the second objective function seeks to maximize the distance after failures have occurred. To ensure that only open facilities are allowed to fail, constraint i is introduced. This constraint restricts the failure of facilities to only those that are currently open. To ensure that the decision variable v is a binary variable, constraint ii is introduced. This ensures that the value of v is either 0 or 1. To reassign demand centers to open facilities, constraint iii is introduced. This constraint ensures that all demand centers are assigned to an open facility, thereby meeting the demand requirements. Finally, constraint iv is introduced to ensure that all demand centers are reassigned, and the total demand is satisfied.
The optimization problem aims to find the optimal solution that simultaneously minimizes the number of facilities that fail and maximizes the distance after failures while satisfying all the introduced constraints. 3) Third subproblem: the optimization problem at hand involves two objective functions that need to be minimized simultaneously. The first objective function seeks to minimize the distance traveled before failure occurs, while the second objective function aims to minimize the distance traveled after failure.
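The verbal descriptions above can be condensed into a hedged mathematical sketch. The notation below is our own assumption rather than the exact formulation of [24]: F is the set of candidate sites, D the set of demand centers, d(i, j) the distance from demand center i to site j, x_j ∈ {0, 1} indicates an open facility, and v_j ∈ {0, 1} a failed one.

```latex
% First subproblem: number of open facilities vs. total service distance
\min_{x}\ \Bigl(\ \sum_{j \in F} x_j\ ,\ \sum_{i \in D}\ \min_{j:\, x_j = 1} d(i,j)\ \Bigr)

% Second subproblem: minimize failures while the adversarial objective
% maximizes post-failure distance, subject to (i) v_j \le x_j (only open
% facilities fail), (ii) v_j \in \{0,1\}, and (iii)-(iv) every demand
% center being reassigned to a surviving open facility
\min_{v} \sum_{j \in F} v_j
\qquad
\max_{v} \sum_{i \in D}\ \min_{j:\, x_j = 1,\ v_j = 0} d(i,j)

% Third subproblem: pre-failure vs. post-failure distance, both minimized
\min\ \Bigl(\ \sum_{i \in D}\ \min_{j:\, x_j = 1} d(i,j)\ ,\
\sum_{i \in D}\ \min_{j:\, x_j = 1,\ v_j = 0} d(i,j)\ \Bigr)
```

Each pair of objectives trades off against the other, which is why solutions are reported as Pareto fronts rather than single optima.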

III. RELATED WORK
1) CROWDSOURCING VIDEO GAMES FOR DIFFERENT COMPLEX PROBLEMS
Video games are widely acknowledged for their ludic characteristics, and annually, individuals allocate a significant amount of time engaging in video gameplay for diverse purposes, including but not limited to activities such as procrastination and the management of stress [25]. In the context of crowdsourcing endeavors, video games serve as a medium through which individuals can make contributions to problem-solving tasks without necessitating conscious awareness or deliberate effort. The utilization of video games for crowdsourcing purposes has found application across a spectrum of domains, encompassing areas such as cybersecurity and computer vision. An illustrative instance of this phenomenon is exemplified by the web-based video game known as ''Peekaboom,'' which plays a pivotal role in facilitating the advancement of computer vision algorithms through the aggregation of substantial volumes of data gathered from a considerable cohort of participants [26]. Players have attested to dedicating an average of approximately 12 hours per day to engaging in the gameplay of Peekaboom.
Notably, Video game-Crowdsourcing has demonstrated remarkable efficacy in amassing extensive datasets, while concurrently serving as a means to procure solutions of commensurate quality when juxtaposed with those generated by domain experts. To illustrate this phenomenon, the Bio-Games platform was employed for the annotation of red blood cell images, whereby participants achieved an error rate of 2% in contrast to expert-derived results, as substantiated in [27] and [28].
Beyond its application in image labeling, Bio-Games has been employed as an instructional tool for medical students, particularly in the context of disease classification training. Furthermore, scientific investigations have harnessed collective human expertise to enhance the efficacy of machine learning models. An exemplar in this regard is the utilization of EyeWire, a video game specifically designed for the purpose of translating three-dimensional reconstructions of retinal neurons into two-dimensional images [11]. The primary goal of the game is to enable players to delineate the trajectory of a neuron by applying coloration to pixels in proximity to the neuron or in novel spatial locations. Subsequently, the game contrasts the various solutions provided by players and generates a scoring metric contingent upon the level of consensus among their responses. This video game is strategically devised to facilitate the training of a convolutional neural network, specifically geared toward the task of neural boundary detection. It also serves as a mechanism to guide players' attention towards regions where the algorithm exhibits uncertainty.
An additional utilization of Video game-Crowdsourcing pertains to its role in augmenting algorithmic solutions to complex problems. One noteworthy instance of this approach is its application in addressing the robust Facility Location Problem. In this context, the principal aim is to identify optimal facility locations that minimize costs while concurrently ensuring resilience and robustness in the face of uncertainties [2], [4]. To address the challenge of participant motivation in crowdsourcing applications, various methodologies have been introduced. These include approaches such as Majority Voting, Bag of Lemons, and

FIGURE 1. User interface for the model of the FLP video game as presented in [4], [29].
Diverse Bag of Lemons (DBLemons), which aim to discern and distinguish high-quality ideas from those of lesser quality within the crowd.
One illustrative study implementing a cooperative crowdsourcing model, leveraging blockchain technology, pertains to the video game ''Cell Evolution.'' In this game, participants are afforded the opportunity to cultivate and nurture a single cell within a dynamic ecosystem comprising millions of individual cells. The findings from this study concluded that assessing the motivational impact on participants presents a formidable challenge, warranting further investigation and refinement of measurement methodologies.

IV. GAME IMPLEMENTATION DETAILS
In our prior scholarly contributions, we have explicated the design and utilization of a video game that serves as a computational model for the Facility Location Problem (FLP) [4], [29]. These articles have employed an abstract representation framework for depicting essential elements, including demand centers, facilities, associated weights, and assignments. Importantly, this abstract representation refrains from any direct correlation with real-world entities. For a visual depiction of the video game model, please refer to Figure 1.
The graphical interface for the FLP model is rendered on a two-dimensional white canvas. In this interface, demand centers are depicted as circular nodes, and facilities are represented as black squares inscribed within these circular nodes. The gameplay dynamics revolve around the player's decisions on facility placement within demand centers and the allocation of demand centers to specific facilities. The game structure comprises a rapid sequence of levels organized into three distinct stages, including instructional tutorials and the incorporation of actual data sets.
It is pivotal to underscore that the design philosophy underlying these articles centers on ensuring an enjoyable and rewarding user experience. This design philosophy eschews the frustration associated with being ensnared in levels that may appear insurmountable. To validate and refine the gaming experience, extensive playtesting sessions were conducted, and the resulting gameplay data was meticulously stored on the server. In summation, the game's design and mechanics are meticulously crafted to tackle the FLP problem in an engaging and enjoyable manner, amenable to both playtesting and further development.

A. MATCH REPRESENTATION
The task of deducing a player's strategic decision-making process within a video game presents a formidable challenge due to the intricate interplay of factors, including the high-dimensional nature of the action space and the complex dynamics inherent in the game. The player's strategy is essentially an expression of the sequence of actions undertaken during the game, constituting a functional mapping from the game's state to a probability distribution across potential actions. However, the sheer size of the action space renders the exploration of the strategy space infeasible through exhaustive means.
To surmount this challenge, a common approach involves abstracting the game's representation, thereby diminishing its inherent complexity and facilitating the exploration of the strategy space. Abstraction, in this context, entails the selective removal of superfluous or redundant game details, resulting in a higher-level representation that enables a more efficient exploration of the strategy space. Moreover, this abstraction process facilitates the discovery of strategies that are both more broadly applicable and readily interpretable.
Abstraction techniques involve the reduction of dimensionality within both the state and action spaces. This is achieved by grouping similar states and actions, thereby consolidating them into single abstract entities. Consequently, this results in a more compact and manageable data representation, amenable to the application of reinforcement learning or other machine learning methodologies for the acquisition of optimal strategies.
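As a toy illustration of this grouping idea (the unit-square coordinates and the 4×4 grid size are our assumptions, not the game's actual discretization), nearby facility placements can be collapsed into a single abstract action by coarse gridding:

```python
def abstract_action(x, y, grid=4):
    """Map a continuous placement in the unit square to one of grid*grid
    abstract actions, collapsing nearby placements into a single entity."""
    col = min(int(x * grid), grid - 1)
    row = min(int(y * grid), grid - 1)
    return row * grid + col

# Two nearby placements collapse to the same abstract action...
print(abstract_action(0.10, 0.90), abstract_action(0.15, 0.95))  # same cell id twice
# ...while a distant one maps to a different abstract action.
print(abstract_action(0.80, 0.10))
```

This reduces a continuous action space to a small finite one, at the cost of treating all placements within a cell as interchangeable.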
We posit that the core of the player's strategy can be encoded within the sequence of facility placements and, possibly, the orientations chosen within the game board. Consequently, further abstraction from this sequence is contemplated. This sequence encompasses actions related to the addition, removal, and connection of facilities and demand centers, contingent upon their respective attributes such as weight or population.
In the context of gameplay analysis, each individual play within the game is subjected to a comprehensive examination, where the following parameters are scrutinized:
• Node IDs: unique numerical identifiers assigned to each distinct node within the game environment.
• Node Weights: numerical values serving as potential indicators of the significance or importance associated with a given node within the game.
• Revisits: a metric quantifying the frequency with which a player revisits nodes that have previously been traversed during the course of gameplay.
• Time Intervals: a temporal metric recording the duration of time that elapses between successive actions or plays.
Data preprocessing procedures are meticulously executed to prepare the collected gameplay data for subsequent analysis. These procedures encompass the following steps:
• Determination of Revisitation Counts: the frequency of revisits to individual nodes during each play is tabulated, yielding insights into the revisitation patterns exhibited by players.
• Calculation of Average Node Weight: for each play, the average weight of the active nodes is computed. This measure provides a summary of the collective weight associated with nodes that are actively engaged with during gameplay.
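The preprocessing steps above can be sketched as follows; the log format (node id, node weight, timestamp) and the sample values are assumptions made for illustration:

```python
from collections import Counter

def play_features(play_log):
    """Compute per-play features from a log of (node_id, node_weight, t)
    actions: revisit counts, average node weight, and inter-action intervals."""
    seen = Counter()
    revisits = []
    for node_id, _, _ in play_log:
        revisits.append(seen[node_id])  # how often this node was seen before
        seen[node_id] += 1
    weights = [w for _, w, _ in play_log]
    avg_weight = sum(weights) / len(weights)
    times = [t for _, _, t in play_log]
    intervals = [b - a for a, b in zip(times, times[1:])]
    return {"revisits": revisits, "avg_weight": avg_weight, "intervals": intervals}

# A toy log: node 3 is visited, then node 8, then node 3 again
log = [(3, 10.0, 0.0), (8, 4.0, 2.5), (3, 10.0, 4.0)]
print(play_features(log))
```

The resulting feature dictionary (one per play) is what downstream clustering and classification operate on.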

V. EXPERIMENTAL RESULTS
The experimental methodology used to extract the successful strategies employed by skilled players when solving complex problems, such as the Facility Location Problem (FLP), involves a multi-step process. First, players are expected to have a high level of expertise in problem-solving and are chosen to ensure the quality of the data collected. Next, the participants are asked to solve a set of FLP instances of varying difficulty levels, with the aim of extracting or inferring the algorithmic form of their strategy.
During the problem-solving process, the participants' actions and decisions are recorded using various methods, such as our own crowd computing platform.
After the problem-solving process is completed, the collected data is analyzed to identify successful strategies employed by the skilled players. This analysis involves identifying the most common actions and decisions made by the players, as well as the reasoning behind these actions. Additionally, machine learning techniques such as clustering and classification algorithms may be employed to identify patterns in the data and extract successful strategies. The performance of the skilled players is evaluated using various metrics, such as the number of facilities opened and the distance to serve a demand center, and compared against other Pareto fronts to determine the effectiveness of the identified strategies.
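Since player solutions are compared against Pareto fronts on the two metrics just mentioned, the basic building block of this evaluation is non-domination. As a minimal sketch (the solution tuples below are invented for illustration, not taken from our data), a Pareto front over (facilities opened, service distance), both minimized, can be extracted as follows:

```python
def pareto_front(points):
    """Return the non-dominated points for two objectives, both minimized:
    (number of open facilities, total service distance)."""
    front = []
    for p in points:
        # p is dominated if some other point is no worse in both objectives
        if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points):
            front.append(p)
    return sorted(front)

solutions = [(2, 30.0), (3, 18.0), (3, 25.0), (4, 12.0), (5, 12.5)]
print(pareto_front(solutions))  # → [(2, 30.0), (3, 18.0), (4, 12.0)]
```

The quadratic scan is fine for the few dozen solutions a single match produces; larger fronts would warrant a sort-and-sweep variant.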
Overall, this experimental methodology is designed to identify successful strategies employed by skilled players when solving complex problems, such as the FLP. The combination of data collection, analysis, and validation techniques allows for a comprehensive understanding of the problem-solving process and enables the identification of effective strategies that can be used to improve problem-solving performance.

A. EXPERIMENTAL SETUP
Figure 2 provides an illustrative depiction of the experimental framework adopted for the purpose of elucidating effective strategies employed by expert players in resolving the Facility Location Problem (FLP). The approach entailed the creation of a video game, which was subsequently disseminated among a diverse player base employing a crowd-sourcing paradigm. Subsequently, an exhaustive analysis was conducted on the gameplay data, employing both conventional manual methodologies and machine learning techniques for classification, as expounded upon in the subsequent sections.
The experimental methodology, as delineated in Figure 2, comprises three fundamental stages. In the initial phase, we initiated data acquisition by establishing a tailored crowd computing platform. This platform served as the conduit through which participation was sought from a predetermined and quantifiable cohort, comprising 51 players, who engaged in the FLP-solving task via the medium of crowdsourcing. It is noteworthy that prior research endeavors have demonstrated the complementary nature of crowdsourcing when employed in conjunction with advanced techniques, such as genetic algorithms [2], [4].
The second stage of the experimentation endeavors to classify the optimal moves derived by players within our video game, specifically identifying those moves that yield the highest scores. The classification task is complex when performed by comparing Pareto fronts. To address this challenge, we employed the hypervolume as a manual classification method.
The third and final stage of the process involves the extraction of strategies deployed by players who actively engaged in the crowdsourcing aspect of the video game. Initially, we applied the K-means machine learning technique to cluster demand centers effectively. This utilization of the K-means technique enabled us to establish coherent clusters
delineating regions of interest within the overarching FLP problem. Given that Pareto fronts often encompass a substantial number of solutions distributed throughout a high-dimensional space, their direct visualization and comparison present formidable challenges. Comparing two or more Pareto fronts holistically demands meticulous consideration and appropriate analytical tools to facilitate a meaningful and insightful comparative analysis; the intricate interplay of objectives and their multifaceted relationships necessitates more sophisticated approaches than visual assessment alone.
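The K-means step used to group demand centers can be sketched as a minimal reimplementation; in practice a library implementation (e.g., scikit-learn's) would be used, and the four demand-center coordinates below are invented for illustration:

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Minimal K-means to group demand centers into k regions of interest."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # assign each demand center to its nearest centroid
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute centroids as cluster means (keep old centroid if empty)
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Two well-separated groups of demand centers
pts = np.array([[0.0, 0.0], [0.1, 0.1], [0.9, 1.0], [1.0, 0.9]])
labels, cents = kmeans(pts, k=2)
print(labels)
```

For these points the algorithm recovers the two spatial groups regardless of which points are drawn as initial centroids.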

B. MANUAL CLASSIFICATION OF GAMEPLAYS
To address these inherent challenges, several methodologies have been introduced for the purpose of Pareto front comparison. These approaches include the utilization of performance metrics designed to quantitatively assess the disparities between fronts, as well as the application of clustering techniques to categorize similar solutions. Such methods offer more objective and systematic means of comparing Pareto fronts, thereby enabling the identification of the most promising solutions.
In light of these considerations, we have chosen to employ the hypervolume as a robust performance metric. This metric allows us to systematically evaluate and compare player-based solutions, thereby facilitating the identification of the most favorable outcomes in the context of our analysis.

C. PARETO SET COMPARISON
As previously discussed, the comparison of distinct Pareto sets poses inherent challenges attributable to variations in size, shape, and dispersion across these sets. Nevertheless, the application of the hypervolume metric as a performance indicator offers a valuable means of addressing these complexities. The hypervolume metric quantifies the volume of the multidimensional space enclosed by the Pareto set generated through an algorithmic process, taking into consideration both the overall spread and configuration of the set. Furthermore, the incorporation of a reference point in the hypervolume calculation introduces a standardized reference frame for comparisons among different sets. This reference point serves as a common benchmark against which algorithmic performance can be systematically evaluated.
The utilization of the hypervolume metric, therefore, serves to alleviate the challenges inherent in comparing disparate Pareto sets, offering a more objective and standardized approach to assessing algorithmic performance.
The hypervolume metric serves as a widely utilized measure for the assessment of algorithm performance in the context of generating Pareto-compliant solutions. It is crucial to underscore in such evaluations that the cumulative distance should invariably diminish as the number of open facilities increases. This phenomenon stems from the fact that the optimal solution for a problem involving n open facilities, upon the inclusion of any additional open facility, will inherently yield a reduced total distance for the n + 1 scenario. Consequently, the hypervolume is rigorously defined as the volume encompassed by the space covered by an approximation to the Pareto frontier, as generated by a specific algorithm. To facilitate this calculation, a reference point is employed to demarcate the region for each objective function. A larger hypervolume value signifies superior convergence and a more comprehensive coverage of the optimal Pareto frontier by the algorithm under evaluation. This metric provides an invaluable quantitative assessment of an algorithm's effectiveness in generating Pareto-compliant solutions, particularly regarding its convergence and solution space coverage capabilities. Fig. 4 presents the hypervolume obtained for the Pareto front generated by the crowdsourcing technique after n video gamers solved the FLP. A higher hypervolume value implies better performance achieved by a video gamer, as demonstrated in Fig. 4, where several players obtained high hypervolume values. Our goal is to identify Pareto fronts with the highest hypervolume value, as players are implicitly employing some efficient algorithmic strategies. This article aims to extract such strategies used for solving computationally complex problems.
Fig. 4 depicts the performance indicator of hypervolume for one of the manually selected best plays. Our interest lies in identifying the highest scores achieved by players, which we define as plays with a hypervolume value of at least 19 units. Based on this criterion, the best plays, as observed in Fig. 4, were achieved by players 2, 8, 15, 17, 21, 24, 29, 30, and 31 (with player 21 having the highest performance among all players). These scores were obtained from a pool of 50 players, but some players did not complete the solution space, resulting in incomplete data that were subsequently discarded. From the remaining complete data, comprising 31 plays (which may not necessarily be the best), we utilized the hypervolume performance indicator to filter out plays with lower performance levels. This approach allowed us to identify the best plays based on their hypervolume values. Additionally, the hypervolume performance indicator is a widely accepted metric for evaluating the quality of multi-objective optimization results, making it a suitable choice for our analysis.
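For the bi-objective FLP considered here (minimizing both the number of open facilities and the cumulative distance), the hypervolume of a play's Pareto front reduces to a sum of rectangle areas measured against the reference point. The sketch below is our own illustration of that computation; the function name, the sample points, and the reference point are illustrative, not taken from the study:

```python
def hypervolume_2d(front, ref):
    """Hypervolume of a 2-D Pareto front under minimization of both
    objectives, measured against a reference point `ref`."""
    # Keep only points that strictly dominate the reference point.
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:  # f1 ascending => f2 descending on a Pareto front
        hv += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return hv

# Two mutually non-dominated solutions against reference point (3, 3):
print(hypervolume_2d([(1, 2), (2, 1)], (3, 3)))  # -> 3.0
```

With this convention, a front that pushes further toward the origin (fewer facilities, shorter cumulative distance) encloses a larger area, matching the "larger hypervolume is better" reading used above.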
Overall, our selection of the best plays is justified based on the combination of manual selection and the rigorous evaluation using the hypervolume performance indicator. The findings provide valuable insights into the performance of the players and their ability to achieve high-quality solutions in the context of the study. Further analysis and validation could be conducted to confirm the robustness and generalizability of these results. Our approach ensures a rigorous and systematic evaluation of player performance and facilitates the identification of the top-performing plays for further analysis and interpretation.

B. MACHINE LEARNING FOR PLAYERS' STRATEGIES EXTRACTION AFTER SOLVING THE FLP
To elucidate the problem-solving strategies employed by video game players when confronted with computationally challenging tasks, it became imperative to establish a method for representing player actions in a vectorial format. This necessitated the development of a systematic approach for encoding the direction of player movement or the spatial allocation of their responses within the game environment. In the context of addressing the Facility Location Problem (FLP), players are tasked with the placement of facilities within designated demand centers.
Upon thorough visual examination, a notable pattern emerged wherein players did not engage in random facility placement; instead, they exhibited a deliberate strategy aimed at maximizing demand center coverage while minimizing the number of facilities deployed. This implied that players leveraged visual cues to select optimal facility locations, strategically positioning them to reduce travel distances. This strategic behavior bore a resemblance to the operation of genetic algorithms, which routinely explore the entire solution space by clearing previous configurations to identify optimal combinations.
Conversely, players adopted a more adaptive approach, often retaining certain parameters from prior configurations in subsequent moves and allocating one or more facilities across distinct demand centers. Given these discerned behavioral traits, the necessity arose to abstract the expansive search space. One viable avenue for achieving this was the application of machine learning techniques to cluster demand centers and subsequently derive player movement patterns based on their interactions with these identified clusters. As shown in Fig. 5, we found 10 clusters by running a K-means model.
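As a sketch of this clustering step (the study runs K-means with n = 10 on the Swain dataset; the minimal pure-Python Lloyd's algorithm below and its toy coordinates are our own, with k = 2 for readability):

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def kmeans(points, k, iters=20, seed=42):
    """Minimal Lloyd's algorithm: returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, labels

# Two visually obvious groups of demand centers (illustrative coordinates):
demand_centers = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
_, labels = kmeans(demand_centers, k=2)
print(labels)
```

In practice a library implementation (e.g. scikit-learn's `KMeans`) would be used on the real demand-center coordinates; the point of the sketch is only the assign/update loop that produces the clusters discussed here.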
The advantages identified in the context of strategies extraction render it a pertinent problem of interest to both scientific researchers and organizations.
1) Increased efficiency: Successful strategies can help organizations streamline their operations, reducing costs and increasing productivity. Moreover, these strategies can be leveraged in algorithm design or as the basis for machine learning models, leading to improved accuracy and efficiency in problem-solving.
2) Improved decision-making: By extracting successful strategies employed by skilled players, organizations can gain insight into the problem-solving process, leading to improved decision-making. This can help to optimize resource allocation, minimize risk, and improve overall performance.
3) Generalizability: Strategies extraction can help to identify successful problem-solving approaches that can be applied to a wide range of problem instances, increasing generalizability. This can help organizations respond more effectively to new and changing environments, reducing uncertainty and improving performance.
4) Incorporation of human expertise: A human/computer algorithm can leverage the strengths of both humans and machines. Humans can provide domain-specific knowledge and expertise, while machines can perform computations at a scale and speed that is impossible for humans. Together, this can lead to improved accuracy and efficiency in problem-solving.
5) Flexibility and adaptability: A human/computer algorithm can be more flexible and adaptable than a purely mechanical approach. Humans can adjust their approach based on new information or unexpected circumstances, while machines are limited by their programmed algorithms. This can lead to more effective and efficient problem-solving in dynamic environments.
6) Improved performance: By identifying successful strategies, machine learning can help to improve the performance of both human and machine problem solvers. These strategies can be incorporated into algorithm design or used to train machine learning models, leading to more accurate and efficient problem-solving.
In general, the strategies that were found after carefully extracting and analyzing the moves made by users, based on their gameplay, were as follows:
1) Video game players attack the multi-objective optimization problem (FLP) by first placing facilities in a central zone and measuring the benefit obtained. If the benefit exceeds a threshold (after making several central placements), the player moves towards the edges of the game board, which becomes a new center at the far upper-right or lower-right (or the opposite upper-left or lower-left) end of the plane. Finally, upon exceeding the threshold within the new centers, players jump to the opposite extreme. Ultimately, this type of strategy is considered good, as it yields a Pareto front with a high-value performance indicator, i.e., a large hypervolume.
a) The strategy described can be seen as a heuristic approach to multi-objective optimization, specifically a variant of the so-called ''hill-climbing'' algorithm. The players start by making placements in a central zone, which can be seen as an initial solution. They then evaluate the performance of the solution and use it as a reference to improve their placement strategy. The threshold value represents a trade-off between the proximity of the facilities to the center and the overall benefit obtained, which is a typical characteristic of multi-objective optimization problems.
As the players move towards the edges of the game, they explore new regions of the search space, which can potentially yield better solutions. This can be seen as a form of local search, which is a common technique used in many optimization algorithms. Finally, by jumping to the opposite extreme, the players can further explore the search space and potentially find new and better solutions. When using K-means with n = 10 clusters, we have observed that the patterns followed by the top-performing players, expressed as moves, are very similar to those of players with lower scores. In other words, there is not a significant difference in how the best players transition between clusters compared to players with slightly lower scores. Additionally, it is worth noting that the highest scores do not differ significantly in terms of the calculated hypervolume value. Therefore, it is evident that the clusterings are similar. It is important to emphasize that clustering into larger clusters would likely fall into the NP-hard solution of the original problem. Hence, a more robust approach is needed to uncover the strategies employed by the players.
Using these clusters, we could identify how the best players solved the multi-objective optimization problem. This approach enabled us to establish patterns, such as C4C4C1C2C1C3C6C2C4C1, which indicates that within the first ten moves made by player X, they follow a winning strategy. Alternatively, we can discard a move if it does not follow a particular pattern. This enabled us to track how demand centers were being filled with facilities, as well as identify patterns and strategies employed by the top-performing players in solving computationally challenging problems such as the FLP. By following this approach, we were able to successfully extract the strategies employed by skilled players in solving complex problems, providing insights into their decision-making processes and potentially informing the development of more effective algorithms.
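Pattern strings of this kind can be produced by mapping each facility placement to the label of its nearest cluster centroid. The helper below, with illustrative centroids and moves of our own (not actual game data), sketches the encoding:

```python
def encode_moves(moves, centroids):
    """Map each facility placement to its nearest cluster, producing a
    strategy string such as 'C4C4C1C2...'."""
    def nearest(p):
        return min(range(len(centroids)),
                   key=lambda c: (p[0] - centroids[c][0]) ** 2
                               + (p[1] - centroids[c][1]) ** 2)
    return "".join(f"C{nearest(p) + 1}" for p in moves)

# Illustrative centroids and a four-move play:
centroids = [(0, 0), (5, 5), (10, 0)]
moves = [(1, 1), (1, 0), (6, 5), (9, 1)]
print(encode_moves(moves, centroids))  # -> C1C1C2C3
```

Once every play is reduced to such a string, comparing strategies becomes a matter of comparing symbol sequences, which is what enables discarding moves that break a pattern.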
Notwithstanding the advantages observed during the application of K-means clustering for the purpose of grouping potential areas of interest, with the overarching goal of elucidating how players approach complex problem-solving tasks, our initial endeavor did not yield unequivocal patterns. This outcome primarily stemmed from the notable similarity in player moves along the Pareto front, a phenomenon substantiated by the hypervolume values acquired. Consequently, it was reasonable to anticipate analogous transitions within the clusters, thus impeding the identification of a discernible overarching strategy guiding players' actions.
In consideration of these findings, we opted to introduce a novel dimension into our analysis, namely, the incorporation of demand center weights. These weights encapsulate the population density at specific geographical locations, with higher weight values indicative of heightened population concentration in the corresponding locales.
In the subsequent section, we expound upon the revised approach that we undertook to uncover the patterns employed by players. While retaining the utilization of K-means clustering, we judiciously reduced the number of clusters and concurrently integrated the weight attributes associated with demand centers into our analytical framework.

C. USING K-MEANS WITH ELBOW METHOD AND COSINE SIMILARITY FOR PLAYER'S STRATEGIES DIFFERENTIATION
Our method harnesses machine learning techniques to extract meaningful patterns from expansive datasets. A pivotal aspect of our approach is the clustering of game plays based on several distinct features. Notably, we examine the frequency of node revisits in each play, the calculated average node weight, and the intervals recorded between consecutive game plays.
Transitioning to the clustering phase, the K-means algorithm is central to our approach. It partitions game plays into well-defined clusters, where each cluster represents game plays with shared attributes. For our current analysis, we used three clusters, a decision informed by the Elbow Method.
The ''Elbow Method'' constitutes a structured procedure employed to ascertain the most appropriate number of clusters within a given dataset. This method involves an iterative process wherein K-means clustering is applied with varying numbers of clusters, and the associated within-cluster sum of squares (WCSS) is computed at each iteration. The WCSS serves as a metric for quantifying the dispersion or variability observed within each cluster. As the number of clusters increases, the WCSS generally exhibits a decreasing trend, reflecting the heightened concentration of data points within each cluster. However, there exists a pivotal point at which further increments in the number of clusters cease to yield a substantial reduction in WCSS. This inflection point, often resembling an ''elbow'' in the plot of WCSS against the number of clusters, signifies the optimal number of clusters. It is regarded as the point of equilibrium between minimizing WCSS and averting excessive fragmentation of data points, ultimately resulting in a coherent and interpretable clustering of the dataset. While this method helps determine the optimal number of clusters by identifying a point where adding more clusters does not provide a significant improvement in the within-cluster sum of squares, it is worth noting that future analyses might consider different estimation methods or vary the number of centroids. As our understanding of the data deepens, and as the dataset evolves, experimentation with different numbers of nodes or centroids could provide further insights.
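As a concrete illustration of the elbow decision rule (the WCSS values and the function below are our own sketch, not the study's pipeline), a common way to locate the elbow programmatically is to pick the point on the WCSS curve farthest from the chord joining its endpoints:

```python
def elbow_k(ks, wcss):
    """Pick the k whose (k, WCSS) point lies farthest from the straight
    line joining the first and last points of the curve."""
    (x1, y1), (x2, y2) = (ks[0], wcss[0]), (ks[-1], wcss[-1])
    def chord_dist(x, y):  # distance from (x, y) to the chord, up to a constant
        return abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1)
    return max(zip(ks, wcss), key=lambda p: chord_dist(*p))[0]

ks = [1, 2, 3, 4, 5, 6]
wcss = [1000, 400, 150, 120, 100, 90]  # sharp drop that flattens after k = 3
print(elbow_k(ks, wcss))  # -> 3
```

Visual inspection of the WCSS plot, as described above, amounts to the same judgment; the automated rule simply makes the choice reproducible.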
To conduct a comprehensive analysis of our dataset, we implemented a series of methodological steps. Firstly, we undertook a vectorial preparation process that incorporated the weight associated with demand centers into our dataset. Subsequently, we conducted a tally of instances in which a node was revisited during each play. Furthermore, we computed the average weight of nodes that remained active during each play. Lastly, we recorded and assessed the elapsed time between each play, recognizing its relevance as a temporal factor in our research context.
For the purpose of extracting and analyzing inherent patterns within the plays, we made the methodological choice of utilizing the cosine similarity metric. This decision was grounded in its capacity to quantify relationships and similarities among the data within a vectorial space. Consequently, we were able to uncover patterns and relationships among the plays, facilitating a deeper understanding of the underlying strategies in players' behavior. This analytical approach aided us in identifying meaningful trends and similarities within the collected dataset. Following the clustering, we leverage the cosine similarity metric to gauge the similarity between different game plays. Cosine similarity operates in a multidimensional space and offers a nuanced measure of how alike two game plays are, based on their respective features. The strength of cosine similarity lies in its ability to handle high-dimensional data, offering a relative measure of orientation rather than magnitude. This makes it particularly apt for our use case. As our clustering evolves, and if future K-means analyses result in more centroids, cosine similarity can easily adapt to the higher dimensionality, ensuring consistent and reliable comparisons.
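A minimal sketch of the similarity computation, assuming each play is summarized by the three features named above (revisit count, average active-node weight, and elapsed time between moves); the feature values are illustrative:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Illustrative per-play features: (revisits, mean node weight, mean seconds/move)
play_a = [4, 2.5, 1.8]
play_b = [8, 5.0, 3.6]  # same "direction" as play_a, twice the magnitude
print(round(cosine_similarity(play_a, play_b), 3))  # -> 1.0
```

The second play scores 1.0 against the first despite larger raw values, which is exactly the orientation-over-magnitude property the text appeals to.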
In pursuit of enhancing the robustness and reliability of our analytical procedures, we initiated a rigorous process for identifying and subsequently eliminating outliers, i.e., extreme values that could introduce bias or distortion into our results. In the context of our specific investigation, we decided to exclude games with numerical identifiers 1 and 7, as their inclusion could potentially impact the integrity of our analysis.
Following the outlier removal step, we further refined our dataset through a procedure commonly referred to as ''data normalization.'' The rationale behind data normalization is rooted in the need to standardize the range of similarity measurements within our dataset, rendering them more amenable to meaningful comparison and analysis. By normalizing the data, we ensure that each similarity measurement is scaled proportionally to a consistent range, thus mitigating the influence of differing magnitudes and enabling more reliable and interpretable comparisons. This normalization process is instrumental in achieving a more equitable and valid evaluation of our dataset, ultimately contributing to the quality and credibility of our analytical outcomes. The insights derived from the normalized cosine similarity are then visualized through a heatmap, offering a concise representation of the relationships and patterns within the dataset.
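One simple way to realize this normalization step is min-max scaling over the whole similarity matrix; the text does not specify the exact scheme, so the function below is one plausible choice with illustrative values:

```python
def min_max_normalize(matrix):
    """Rescale every entry of a similarity matrix into [0, 1]."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # degenerate case: all similarities identical
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lo) / (hi - lo) for v in row] for row in matrix]

sims = [[1.00, 0.80],
        [0.80, 0.60]]
print([[round(v, 3) for v in row] for row in min_max_normalize(sims)])
```

After scaling, the smallest similarity maps to 0 and the largest to 1, which is what lets the two game categories stand out in the normalized heatmap.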
Our method transforms raw data into actionable insights, using a combination of preprocessing, clustering, and similarity measurement.The flexibility of our approach allows for adaptations and refinements, accommodating future changes in data or analytical techniques.As with any analytical method, it's essential to ensure data alignment with expected formats and to remain open to iterative refinements based on emerging insights or changing dataset characteristics.
Fig. 6 displays a distinction between two primary types of games. When the heatmap is viewed with normalized results, these two categories become clearly defined, providing a stark contrast in the strategies employed by players.
The first category, evident in the normalized heatmap, suggests games where strategies are consistent among the player base. This uniformity implies a dominant or popular strategy that many players gravitate towards. Such games may have well-defined paths to success or may be more intuitive for a majority of players.
Conversely, the second category appears more varied and encompasses games where players adopt a diverse range of tactics. The strategies here seem to be more individualized, suggesting that these games offer multiple avenues of success, each equally viable. Players in these games might be drawing from personal experiences or experimenting with unconventional methods.
However, it is worth noting that when viewing the heatmap with raw results, distinguishing between these two categories becomes more challenging. The nuances between the dominant and diverse strategy games are subtler, emphasizing the importance of normalization in bringing out these patterns.
Fig. 7 presents a methodical representation of games clustered based on their inherent similarities. Leveraging cosine similarity, a metric that measures the cosine of the angle between two non-zero vectors, provides an effective way to gauge the similarity between high-dimensional data points. In this context, each game is represented as a vector in a multidimensional space, and the cosine similarity captures the cosine of the angle between these vectors. The closer the value is to 1, the more similar the games are in terms of their features.
By averaging the vectors within each cluster, we obtain centroids that encapsulate the central tendencies of the games in that cluster, providing a reduced-dimensional representation. The 3D representation further assists in visualizing the relative positions of these centroids. The addition of spheres, sized to 1/3 of the distance from each centroid to the origin, mathematically delineates the spread and influence of each cluster.
The choice of setting the sphere's radius to 1/3 of the distance from the centroid to the origin is mathematically significant. Given the centroid coordinates (x, y, z), the Euclidean distance from the origin to the centroid is d = √(x² + y² + z²). By selecting the sphere's radius as r = d/3, we ensure that the sphere's diameter spans two thirds of the segment between the origin and the centroid, providing a proportional representation of the cluster's influence in the 3D space.
Furthermore, positioning the sphere's center at 2/3 of the centroid coordinates ensures that the sphere's boundary touches the centroid itself on one side and reaches the point one third of the way from the origin on the other. Mathematically, the sphere thus covers the outer two thirds of the line connecting the origin to the centroid, visually emphasizing the influence of the centroid in its vicinity.
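The geometry of this construction (radius r = d/3, sphere center at 2/3 of the centroid coordinates) can be checked numerically; the centroid below is illustrative:

```python
import math

centroid = (3.0, 0.0, 0.0)                    # illustrative centroid
d = math.sqrt(sum(x * x for x in centroid))   # distance from origin to centroid
r = d / 3                                     # sphere radius
center = tuple(2 * x / 3 for x in centroid)   # sphere center at 2/3 of centroid

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# The boundary passes through the centroid itself ...
print(math.isclose(dist(center, centroid), r))  # -> True
# ... and through the point one third of the way from the origin.
inner = tuple(x / 3 for x in centroid)
print(math.isclose(dist(center, inner), r))     # -> True
```

Along the origin-to-centroid segment, the sphere therefore occupies the interval from d/3 to d, i.e., the outer two thirds of that segment.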

VII. CONCLUSION
Our analysis, through advanced techniques such as k-means clustering and cosine similarity, unveils distinct groupings among games. The judicious choice of visualization parameters offers an accurate representation of each cluster's influence. This research provides valuable insights for game developers and analysts.
In the dynamic realm of video games, understanding player behavior is paramount. How players approach challenges, make decisions, and develop strategies provides a wealth of information that can be harnessed for game design, player engagement, and even broader applications such as algorithm development. This section delves deep into the strategies employed by players in the context of the Facility Location Problem (FLP).
Through a meticulous analysis combining Pareto set comparisons and machine learning techniques, we aim to uncover the underlying patterns, strategies, and behaviors exhibited by players. These insights not only shed light on player psychology and decision-making processes but also offer avenues for enhancing game mechanics and optimizing player engagement in future iterations.
Our application of k-means clustering, in conjunction with cosine similarity, has yielded significant insights into the underlying structures present within the game dataset. The identified clusters suggest inherent patterns and relationships among games that might not be immediately apparent through traditional analysis methods. The hypervolume metric, which quantifies the volume of the multidimensional space enclosed by a Pareto set, serves as an effective tool to navigate the inherent complexities of comparing distinct Pareto sets. By incorporating a standardized reference point, it provides an objective measure for algorithmic performance evaluation.

A. CONCLUSIONS FROM PARETO SET COMPARISON
The evaluation reveals that the cumulative distance invariably diminishes as the number of open facilities increases, and the hypervolume, computed against a fixed reference point, captures this trade-off, underscoring its utility in assessing convergence and solution space coverage capabilities. Specifically, in the context of the Facility Location Problem (FLP) solved through a crowdsourcing approach by video gamers, high hypervolume values were indicative of superior player performance.
We observed that the combination of manual selection and hypervolume-based evaluation provided a systematic approach to discern top-performing plays, underscoring the players' ability to derive high-quality solutions. This methodological approach promises further avenues for in-depth analysis, interpretation, and validation of player strategies in problem-solving.

B. CONCLUSIONS FROM MACHINE LEARNING ANALYSIS FOR PLAYER STRATEGIES
Fig. 5 provides a visualization of demand centers clustered by K-means with n = 10. In our quest to decode the strategies video game players deploy to solve the FLP, we observed deliberate placement patterns, suggestive of strategic positioning rather than random allocation. These patterns are reminiscent of genetic algorithms, where configurations are iteratively improved, but with players showing adaptability by retaining beneficial past configurations.
The strategies, as discerned, indicate players often begin by placing facilities centrally, and upon surpassing a performance threshold, they expand towards the game's periphery. This heuristic approach mirrors the ''hill-climbing'' optimization algorithm, suggesting players utilize a localized search strategy combined with broader explorations for better solutions.
However, despite the promise of the K-means clustering, Fig. 5 revealed a convergence of strategies among both top-performing and moderately performing players. This similarity in approach, corroborated by analogous hypervolume values, signals the need for a more refined analysis method. To this end, we incorporated demand center weights, representing population densities, offering a richer dimension for understanding player strategies.
It is observed that, while K-means clustering offered initial insights, the nuanced strategies employed by players necessitated a deeper, multi-faceted approach for a comprehensive understanding. The findings provide a foundation for future algorithmic enhancements and game strategy optimizations.

C. IMPLICATIONS FROM K-MEANS AND COSINE SIMILARITY ANALYSIS
The distinct groupings derived from k-means clustering, refined by cosine similarity measurements, indicate that games can be grouped based on nuanced features. The clusters, representing games with shared characteristics, can be interpreted as genres or sub-genres within the gaming landscape. The centroids of these clusters, when visualized, provided a spatial representation of these groupings, emphasizing their relative distances and relationships.
The strength and cohesion within clusters and the separation between them highlight the efficacy of our approach. This suggests that game developers and analysts can leverage such methodologies to understand player preferences, streamline game development strategies, and potentially predict future gaming trends.

VIII. FUTURE WORK
The analyses undertaken in this study have illuminated various facets of player behavior and strategy formation, particularly in the context of the Facility Location Problem (FLP). However, several avenues remain to be explored, and the findings present opportunities for further investigation:
• Expanded Dataset: Our current analysis is based on a limited dataset. Future work can incorporate data from a broader player base, spanning different skill levels and demographics, to provide a more comprehensive understanding of player strategies.
• Advanced Machine Learning Models: While K-means clustering provided initial insights, the employment of more sophisticated machine learning models, such as neural networks or ensemble methods, might yield deeper patterns or even predictive capabilities about player moves.
• Real-time Strategy Adaptation: Leveraging the findings, we can develop real-time adaptive game mechanics that adjust based on player behavior, offering challenges tailored to individual player strategies.
• Incorporation of External Factors: Player strategies could be influenced by external factors, such as game aesthetics, music, or even time constraints. A holistic analysis considering these parameters could offer richer insights.
• Behavioral Studies: Complementing the quantitative analysis with qualitative behavioral studies can provide context to the strategies employed, potentially uncovering the cognitive processes behind player decisions.
• Application to Other Games: The methodologies employed here can be adapted and applied to other video games or computational problems, broadening the scope of understanding player behavior across different gaming environments.
Conclusively, while our study has laid foundational groundwork, the domain of understanding player behavior and strategies in video games is vast and ripe for further exploration. Future endeavors in this direction hold the promise of not only enhancing gaming experiences but also contributing to fields like optimization, algorithm design, and behavioral psychology.

Figure 3 illustrates the challenge of visually comparing multiple Pareto fronts simultaneously. The complexity and multidimensionality inherent in Pareto optimal solutions render this comparison task particularly demanding. Pareto frontiers serve as representations of the trade-off between conflicting objectives, comprising a collection of non-dominated solutions that cannot be enhanced in one objective without compromising performance in another. Given that Pareto fronts often encompass a substantial number of solutions distributed throughout a high-dimensional space, the direct visualization and comparison of these fronts present formidable challenges. The endeavor to compare two or more Pareto fronts holistically demands meticulous consideration and the utilization of appropriate analytical tools to facilitate a meaningful and insightful comparative analysis. As such, the visual assessment of multiple Pareto fronts presents a formidable challenge due to the intricate interplay of objectives and their multifaceted relationships, which necessitate more sophisticated analytical approaches. To address these inherent challenges, several methodologies have been introduced for the purpose of Pareto front comparison. These approaches include the utilization of performance metrics designed to quantitatively assess the disparities between fronts, as well as the application of clustering techniques to categorize similar solutions. Such methods offer more objective and systematic means of comparing Pareto fronts, thereby enabling the identification of the most promising solutions. In light of these considerations, we have chosen to employ the hypervolume as a robust performance metric. This metric allows us to systematically evaluate and compare player-based solutions, thereby facilitating the identification of the most favorable outcomes in the context of our analysis.

FIGURE 2. Experimental setup for extracting players' strategies for solving the FLP.

FIGURE 3. Comparison between different Pareto fronts obtained through the crowdsourcing-based approach for the best scores obtained by video game players.

VOLUME 11, 2023 123201

FIGURE 4. Computed hypervolume after solving the Facility Location Problem (FLP) by using the crowdsourcing approach for Swain dataset.

FIGURE 5. Clusters for the FLP, using Swain dataset, after running the K-means using n = 10.

FIGURE 6. Heatmap of the cosine similarity for the comparison of game strategies, with K-means using n = 3.

FIGURE 7. Mathematical visualization of game clusters based on averaged features in a 3D space.

Fig. 3 and Fig. 4 present the evaluation of different Pareto fronts using the hypervolume metric.

Set of all demand centers
F: Set of all candidate facilities
G: Set of open facilities
W: Set of open facilities that did not fail
P(W): Power set of W
f_i: Cost of opening a facility at location i
c_ij: Cost of assigning demand center j to facility i
y_i: Binary decision variable that indicates whether facility i is open
x_ij: Binary decision variable that indicates whether location j is assigned to facility i
u_ij: Binary decision variable that indicates whether location j is assigned to facility i after failures have occurred