A Modified Evolutionary Algorithm for Generating the Cycling Training Routes

We present a novel method for generating cycling training routes from geographical property graphs based on an Evolutionary Algorithm. The algorithm operators of crossover and mutation are adjusted for use in the Property Graph domain. Data fusion of geographical data from the OpenStreetMap, EU-DEM digital surface model, and existing training records is performed as a basis of the intersections-paths property graph. The proposed approach allows route generation based on their starting and ending points in the property graph and their distance and ascent. A property graph of all intersections and cycling roads is shown and generated for the regions of Podravje and Pomurje. The property graph used in the proposed algorithm’s feasibility demonstration is shown. This is done by presenting four different cases of routes generated with our algorithm. The algorithm allows for generating classic cycling routes of A to B nature, routes with more than two fixed points, and cyclical training routes. The research is concluded by offering further directions on route generation research.


I. INTRODUCTION
Our lifestyles are changing and constantly evolving in modern society. The annual working hours have decreased gradually in the OECD countries and have dropped by ≈11 % since the 1970s till now [1]. This additional free time has enabled us to pursue additional time for leisure. This time has been spent not only on entertainment, using Information and Communications Technology, but also on sports. According to the European Commission [2] 42.8 % of Europeans Union citizens perform sports activities at least once weekly.
Sports remain an integral part of daily human lives. People have engaged n them on both amateur and professional levels. Nowadays, each professional sport has organizations that define rules and organize competitions. Athletes and teams are listed in leagues, scoreboards, divisions, and tournaments to promote competitiveness [3]. The goal of each team or athlete is to win and be the best. To do this, the athlete must train and improve his performance.
The associate editor coordinating the review of this manuscript and approving it for publication was Gang Mei .
Sports training can be organized into four distinct interconnected phases, namely, planning, realization, control, and evaluation, during which the athlete is monitored closely by his coach, and adjustments are made at each iteration of sports training to reach his highest potential [4]. Because competitiveness in sports is high, training is an essential part of an athlete's routine. The training has evolved with the individual sports, where each improvement to the training routine can change the outcome.
One of the advances in sports training has been due to the inception of digital technologies and approaches in every step of sports training. Such approaches, broadly characterized as Smart Sports Training, have been found in at least 32 different sports [5]. Sports training can be a highly organized routine between an athlete and a coach. One of the primary roles is facilitation, supervision, and monitoring of the training session to achieve the set objectives [6]. Since coaches need to be paid, they provide additional costs to the athlete, who may find this a barrier to progressing within his discipline. This is particularly true for amateur athletes, who do not receive any monetary compensation for their endeavors, making it an VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ even more significant issue to pay for coaches. Such issues will be easily solvable in the future by incorporating Artificial Sports Trainers into the training process. Such a solution can provide a trainer to athletes who do not have access to one and also aid in the decision-making process of real trainers [7]. While sports such as athletics, football, volleyball, and soccer are performed and trained mainly in gyms and stadiums, which can be heavily monitored and predictable environments, sports such as cycling and running are trained primarily in the real-world outdoor environments of public roads, tracks, and paths. On the one hand, this can be a problem since it is much harder to plan individual training in detail if the environment is constantly changing, is not monitored as closely, and is much more unknown to the trainee and the trainer. On the other hand, this is a vast opportunity for development and research on training generation methods that can provide constantly changing and challenging training routines that will not be seen as repetitive to the trainee.
Sports training is a constantly evolving research field, where long-term success is measured in the competition results of athletes training under different training regimes/approaches. The existing approaches to training planning, as identified in a literature review [5], are related mainly to using historical training data of athletes (e.g. [8], [9], [10]) to improve their results. Another thing familiar with such approaches is that the algorithms and methods present the athlete with a spreadsheet of parameters he / she should achieve to ensure better results and progress. In contrast, this is a straightforward and quickly followed plan in sports based on repetitive routines (e.g., running, athletics, weight lifting). This approach has proved severely lacking in sports such as cycling. Even if the cyclist is presented with a spreadsheet training plan, he / she must know how and where to achieve it. The physical parameters of heart rate, cadence, and speed that the cyclist must follow in a training session must also be combined with the environmental parameters of distance, terrain, and altitude. Moreover, while approaches for generating the raw numbers of what the training should look like already exist and are used widely, the environment requirements are still an unknown domain.
To generate cycling training of high quality, the existing approaches for training generation, notably those proposing the environment and physical parameters, must be used as inputs for a computational intelligence method incorporating the real-world environment's geographical data to generate actual training routes.
An approach that allows an extensive and diverse set of data is needed to solve the identified problem and generate specialized and discrete training routes. Evolutionary Algorithms are the most appropriate for the task. They allow generating solutions to complex real-world problems, where it is impossible to use heuristic solutions [11]. Another constraint in the generation of routes is that the problem can be multiobjective. The input parameters of the route generation ascent and distance often conflict with each other, and improving one parameter may deteriorate the other; this is another field where Evolutionary Algorithms are highly suitable [12].
This previously unaddressed need for generating actual training routes is the key focus of our paper. We solve this problem by proposing a novel Evolutionary Algorithm based method for training track generation in the sport of cycling by combining different sources of real-world geographical data.
The contributions of this paper are: • proposed data fusion method of geographical data-based property graph and previous training knowledge • comparison and analysis of existing approaches for cycling training generation • an evolutionary algorithm-based method for generating sports training tracks in cycling • practical evaluation of the proposed algorithm. The paper is structured as follows, Section II presents existing solutions for the generation of cycling training routes, Section III presents the data collection and fusion for the generation of a property graph and the novel Evolutionary Algorithm for the generation of cycling training routes. Section IV presents the results and basic optimization of the input parameters for the proposed algorithm. Section V discusses the domain of route generation and places our approach in the domain. The final Section VI summarizes our findings and suggests future research in the field of Cycling Training Generation.

II. GENERATING CYCLING TRAINING SESSIONS AND ROUTES
Computational intelligence, meaning Artificial Neural Networks, Evolutionary Algorithms, Fuzzy Systems, and Bayesian Networks, has been used heavily in generating cycling training and routes. It has been adapted heavily to solving and finding approximate solutions to problems that are difficult to solve in other ways [13].
Automatizing and enriching cycling training by the use of Computational Intelligence can be divided roughly into two parts, as presented in this section. The first part of this section is concerned with the generation of training. In this approach, previous training may be evaluated to propose improvements in the future, and optimum training strategies may be proposed. This can concern the physiological parameters of the athlete, such as data collected by power meters, heart-rate sensors, personal feelings, and environment recordings such as distance traveled, speed, ascent, descent, and total duration, as seen in Fig. 1. This is a lot of data that can typically be recorded from a training session and can be predetermined precisely by the training plan. By selecting a suitable route the cyclist must travel on, we can accurately predict and adjust the distance, ascent, and descent the cyclist will cover in a training session.
Sports training may include the generation of actual routes that the cyclist travels and training parameters such as target heart rate, cadence, and speed. The first part of this section presents approaches to sports training generation that include mainly only the raw target parameters of the athletes. In the second part of this section, we present existing solutions that address generating cycling routes solely and are not connected with the generation of actual sports training plans. An overview of the Computational Intelligence approaches used in planning bike training was presented in [14], where the use of approaches using GPS devices, power and cadence meters, and heart-rate wearables are identified. Existing approaches of Computational Intelligence used in cycling training are presented in [5].

A. SPORT TRAINING GENERATION
The process of sports training generation is tailored to generate the optimum training plans for athletes to reach the highest levels of performance. Sports training must be done carefully to avoid the pitfalls of inadequate training plans, such as overtraining, plateau effects, and monotony, which negatively impact athletes' performance [15]. Some approaches generate generalized sports training plans for athletes [15], [16], [17], [18] where target power-meter, heart rate, and duration are proposed for individualized sports training sessions. How and where the trainee will execute his goals are left to his own will; some approaches do not generate actual sports training but aid in the training process (e.g. [19]), and some of them generate actual sports training routes that the cyclist must complete (e.g. [20]). Predicting a cyclist's heart rate at a given workload, activity duration, fatigue, exhaustion, and recovery were investigated in [16]. The study was done on nine welltrained cyclists. The proposed model could predict the heart rate to about 3bpm.
Three models for predicting average power on an individual cycling sports training were constructed in [21]. The three models were of a for-all approach, athlete-based, and clusterbased type. The study found that athlete-based models were the most suitable for power prediction. Their results show that regression models can predict the average power of new cycling activities within 20W of the actual effort. This is very useful because an average cycling enthusiast may not have access to expensive equipment such as power meters.
The relation between the heart rate and cycling cadence and the heart rate in the future second were researched in [19]; the result of the study was a Feedforward Neural Network model that predicted heart rate to an average error of 2.43 beats per minute for the next second.
The relationship between sports training load and performance in cycling was investigated in [17], and an Evolutionary Algorithm-based model was proposed for generating training plans. The main goal of the generated training plan was to avoid over-training and plateau effects.
Zahran et al. [22] proposes a conceptual framework for generating adaptive weekly sports training plans based on internal training load, physiological constraints, and behavioral-change features. The drawback is that the approach has not been used in practice outside of the preexisting training data used to construct it.
Study [18] proposed planning individual interval sport training sessions using stochastic population-based natureinspired metaheuristics. The Bat algorithm was used and tested on data from previously recorded interval training sessions.
Cycling training route generation was attempted in [20], where a method was presented for generating training sessions with routes in cycling based on data from power meters and previously recorded training routes. The altitude data was extracted from the training records. The routes generated following this approach can only be compiled from segments of previously completed and recorded routes. The route generation was completed using an Evolutionary Algorithm, which used Dijkstras' algorithm [23] and Tabu search [24] for route initialization. The weights for the route initialization were the distances between the nodes.
Kumyaito1 et al. [15] was concerned with generating eightweek sport training plans for athletes with known resting and maximum heart rates by generating a plan which determined the average heart rate and duration of each training session by using a Particle Swarm Optimization algorithm. The Particle Swarm Optimization algorithm-based sports training plan outperformed the standard plan based on a training plan from British Cycling, while satisfying all the physiological constraints of monotony, chronic training load ramp rate, and daily training impulse.
Similar research by Kumyaitol et al. [25] used a Genetic Algorithm to develop an eight-week aerobic cycling training plan, and found a Genetic Algorithm capable of creating training plans over-performing the standardized British cycling training plan.

B. ROUTE GENERATION
The problem of generating cycling routes can be solved from different perspectives, as presented in Table 1. The  Table columns and their meanings are the following: • Approach -Citation to the approach. • Description -A summary of the presented approach. • Supporting algorithms -Which algorithms were used for generating the routes?
• Generated from -From which data were routes generated?
• Start -Does the solution allow the selection of the route starting point?
• Via -Does the solution allow the selection of mid-points that the route must cover?
• End -Does the solution allow the selection of the route ending point?
• Optimization parameters -What parameters and in which way are selected in route generation?
• Type -Is the solution presented in a research paper, or is it a commercial solution?
Each solution is presented further following the Table or in the previous section. It can be approached as a simple routing problem, where the goal is to find the shortest route between points A and B. This is a suitable approach for travel purposes where the person needs to spend the least time and effort traveling between two destinations. Such an approach was studied in [26], where analysis was performed of recorded GPS data of cycling routes. The approach splits the routes into zones of 1000 × 1000, 350 × 350, and 100 × 100 meters, and tries to find the optimized shortest route between the starting and ending points. The algorithm for finding the shortest routes was the Dijkstra algorithm [23]. While the work presents research on route planning in cycling, it was unsuitable for our approach since it did not consider the altitude, and was proposed regarding the shortest route.
Commercial solutions also exist for generating cycling routes. Naviki [27] allows choosing the distance and bicycle type and a starting point, and generates a route where the user can see the total distance, ascent, and descent meters. Routeshuffle [28] allows the generation of circular cycling routes where the target distance can be configured, but there is no way to determine the number of elevation meters.
The Bicycle route generation was presented in [31], where OpenStreetMap [32] and Shuttle Radar Topography Mission [33] elevation data were combined. The presented solution allowed the generation of routes between two points tailored to the cyclist's parameters: quietness, flatness, comfort, and travel time. The approach allows the selection of only one criterion (e.g., only flatness).
Other solutions that aid in creating bike routes, where the user inputs the points to visit and receives a bike route with calculated distance, ascent, and descent, include Ride With GPS [30] and Bikemap [29]. These tools only help create the routes and calculate the ascents/descents, but cannot create routes where the user can also choose the ascent/descent parameters.
As can be seen, the only solutions that allow selecting the route's start, mid, and ending points are [27], [29], [30], all of them being closed-source commercial solutions. Another limitation of these three mentioned solutions is that they either find the shortest route ( [29], [30]) or only allow choosing the distance while not allowing the selection of the ascent parameter ( [27]).
The consequence is that the distance of the route must be approximated by the user, who must choose the appropriate starting and ending points of the routes using trial and error compared to our approach.

III. PROPOSED METHOD
The goal of the proposed method was to generate viable cycling routes based on elevation and distance constraints between two or more fixed predetermined points, generated from a pre-processed geographical data property graph.
The method consists of the following steps: • Data collection • Property graph generation • Data fusion • Generation of training supported by an evolutionary algorithm model All the steps are outlined in detail in the following subsections. All the original algorithms were implemented in the Python 3.9 [34] programming language. The execution environment of all the non-original algorithms is described where they are used.

A. DATA COLLECTION
The proposed method depends on OpenStreetMap (OSM) [32] geographical data, EU-DEM [35] elevation data and previously recorded training activities which were collected from [36] and [37]. From all the mentioned knowledge, the property graph was constructed and utilized as illustrated in Fig 2.

1) OpenStreetMap
OpenStreetMap [32] is a worldwide open-source project for creating a geographic database of the whole planet. The data are stored in XML format. The geographical data are represented by nodes, ways, relations, and tags. The nodes represent longitude and latitude pairs, and are, together with tags, the smallest building blocks of the map. Ways are ordered lists of nodes representing map features, for example, roads, buildings, and administrative boundaries. Multiple ways are connected with relations when a feature needs more than 2000 nodes to be represented. Because nodes, ways, and relations are just individual nodes or (connected) lists of them, Tags are used to give them context. They consist  of key-value pairs describing nodes' metadata, ways, and relations. To retrieve data from OSM, Overpass API [38] was used, which is a read-only API for querying selected parts of OSM geographical data.

2) EU-DEM
EU-DEM [35] is a digital surface model (elevation model) of 33 EEA (European Economic Area) members and six cooperating countries. The model allows retrieval of elevation data from latitude and longitude, has a vertical accuracy of 2.9 RMSE (Root Mean Square Error), and is, on average, 0.56 meters accurate [39]. The elevation data are sampled at 25 x 25 meters squares. The surface model was used by the self-hosted service Open Elevation API [40], which allows importing and querying digital surface model maps and returning elevation data for given latitudes and longitudes.

3) PROPERTY GRAPH
The Property graph was generated from the presented Open-StreetMap [32] and EU-DEM [35] data. So, from a directed graph perspective, as seen in (1), a property graph (G) is defined by a set of vertices (V), which represent actual intersections, and edges (E), which represent routes between them. The edges are directed, which means that their order matters. The edges exist only between the vertices connected in the real-world environment.
The procedure for generating the used graph is presented in [41]. The only difference that we implemented in relation to our previously proposed approach is that we used the Neo4J [42] for saving graphs instead of RedisGraph [43]. The change was done because the RedisGraph offers limited functionalities related to exporting and importing the generated property graphs.

4) SPORTS TRAINING ACTIVITY DATASETS
The collected sports training activity datasets were in the TCX (Training Center XML) and GPX (GPS Exchange Format), which are file formats used by training recording software [44]. All training activity files were composed of recorded sports training, which consisted of succeeding points. Each point had a recorded geographical longitude and latitude, timestamp, and heart rate. Two relevant datasets were identified and used [36], [37]. Only road and normal cyclists (not riding on mountain trails) were selected, and a total of 2651 training records were selected, as seen in Table 2.
The inclusion of only regular and road cycling was done because of the specifics of mountain biking, where training happens on sometimes unrecorded pathways, and high elevation differences in training. Only the road cycling and general cycling athletes were selected. Other sports (e.g., triathlon) were naturally eliminated since they were outside the scope of our research.

B. PROPERTY GRAPH
The generated Property graph is a graph featuring nodes / vertices, which are abstractions of actual intersections on the OpenStreetMap and relationships / edges which are abstractions of roads / ways between them. Each node has the following attributes: latitude, longitude, node id, and way ids, as seen in Table 3.   Each node is connected to other nodes via relationships named paths in our case, as seen in table 4. Each path contains the distance between the two connected nodes, ascent and descent.
The path can contain a positive value of both ascent and descent concurrently. This can happen because the ascent and descent are indexed at each node of the way in the OpenStreetMap. Hence, a path that traverses up and downhill will contain both, even if the ending altitude is the same as the starting point.
An example of such calculation is shown in 3 where two connected intersections (big black dots) are shown. They are connected by five nodes which are shown by the smaller dots. For each of the nodes, their altitude is determined based on their latitude and longitude, with upwards arrows showing the positive gain of altitude and downward arrows showing the negative gain of altitude. The positive altitude gains are summed to determine the ascent gain of a relationship. The negative gains are summed to calculate the descent gain.
Each pair of nodes connected in an intersection is done so in a two-way fashion, as seen in Fig. 4. That is done because traveling from A-B is due to ascent and descent properties different from traveling from B-A. Therefore, the ascent and descent values are switched when traveling in the opposite direction. The method of generation of the aforementioned property graph has been described thoroughly in [41].

1) LIMITATIONS OF THE PROPERTY GRAPH
Our process is limited in that we only queried connections from the OpenStreetMap, and some roads may be one-way roads in the real-world environment, but we did not check the directionality, and simplified the process of generating a two-way connection from each road. However, this is rarely a problem because roads, where cyclists ride are, in the majority, two-way roads.

C. DATA FUSION
The sports training data from [36], [37] and the intersection data in the property graph were in different formats. This problem was solved using data fusion, which is the process of fusing multiple records representing the same real-world object into a single, consistent, and clean representation [45].
Data fusion was applied to the property graph to fully utilize the combined knowledge to improve quality. This was done by reading every training record, and identifying which recorded intersections it passed. These data were then added to the property graph as a verified count, where each intersection recorded the number of times it was driven over.
The verification procedure (Algorithm 1) receives a list of training records and a list of all intersections as parameters (line 1). The training records were parsed using The minimalistic toolbox for extracting features from sport activity files [44]. The outer for loop ensures that every training activity was processed (line 3-17), and the inner for loop (line 4-16) processed every individual recorded point of the individual training activity (GPX/TCX point). The next innermost for loop (line 6-11) calculates the distance between the recorded training activity point and each of the saved intersections, and saves it to the list of potential verified intersections (line 8-9). Suppose at least one intersection closer than 20 meters was identified (line 12). In that case, the potentially verified intersections are first sorted by their distance in an ascended list (line 13), and the closest intersection is added to the list of verified intersections (line 14).
In the end, the verified intersections are returned (line 18), and for each occurrence of a verified intersection, its verified property in the property graph is increased by one. if intersections.length>0 then 13: intersections.sort() 14 This process was applied for every of the 2651 training records. This ensured that a verify count could be determined for each intersection.

D. EVOLUTIONARY ALGORITHM FOR PLANNING CYCLING ROUTES
The purpose of the proposed algorithm was to generate a suitable route in terms of starting intersection, ending intersection, ascent, and distance parameters. The novel approach for generating cycling routes was based on the Genetic Algorithms [46], which are a group of Evolutionary Algorithms. The variation operators in Genetic Algorithms are selection, crossover, and mutation [47].
The proposed algorithm followed a similar procedure as shown in Fig. 5, which demonstrates the steps followed in the algorithm. The algorithm needs the combined knowledge represented in the form of a Property graph and the algorithm as well as route parameters to start with the optimum route generation. In the first step, initial routes are initialized, and then a subset of them is used for crossover and mutations. Following these two steps, the selection of the best routes is performed, and then the algorithm continues until termination VOLUME 10, 2022  conditions are reached. The best route in the end set of routes is determined using the selection evaluation function. All the concrete steps are further described in the following subsubsections.

1) PARAMETERS
The input parameters for this algorithm were divided into five categories, algorithm parameters, evaluation weights, route parameters, route generation parameters, and penalty parameters.
The route parameters refer to the conditions we want the route to contain ( Table 5). The required route has a defined location of the start (start_id), the end (end_id) and midpoints* (mid_points) Midpoints are explained in the subsection III-D6. If we want to create a circular route, we just select the same node of the end intersection as the start intersection, and add at least one node as a midpoint. The route parameters also included their minimum (min_km) and maximum distance (max_km) in kilometers, and their minimum min_ascent and maximum ascent (max_ascent) in meters.
The algorithm parameters in Table 6 define the chance that each route will be selected for crossing over (crossover_r) in each iteration, and the chance that the route crossed over will be mutated (mutation_r). It also determines the number of iterations to run, which signifies the number of generations  to create (i) in genetic-like algorithms, population size (n), which was the number of starting routes, and the number of routes to select in each selection step.

2) INITIALIZATION
In the first initialization step, the individual routes are generated from the previously presented property graph (1) based on the start and end node id from the route parameters (Table 5).
We investigated and tried using the Yen's K-Shortest Paths algorithm [48], which generates the K number of shortest paths between nodes, but decided against it since the time complexity increased exponentially with longer routes. We based our path generation on Dijkstra's algorithm [23], which is implemented inside the Neo4J [42] database. Additional steps were added to prevent the generation of the same shortest paths over and over again. We have introduced three new parameters, as seen in Table 7 called Route generation parameters.
Another vital thing to note and define is what we want the shortest route to be in Dijkstra's algorithm. We do not want the shortest routes to be the routes of the smallest distance, but want the newly generated routes to be different from previous ones, so that the diversity of routes helps to reach the optimized route. To enable this behavior in Dijkstra's algorithm [23], the algorithm's cost calculation is modified, as shown in (2). In a typical case (real (unmodified) cost → r), the cost would either be the distance or ascent between the two by path-connected nodes. A cost adjustment modifier between A-B is introduced to ensure that paths in each generation will be different from the routes of the previous generation (Cost(path A−B )). It is equal to the sum of a chosen cost parameter, either distance or ascent (r) between A-B, and the chosen cost parameter (r) multiplied by the usage count (usage_c) of the path, that is, how many times did the selected path appear in the previous generation of routes, and multiplied by a constant used_w. The constant can be selected arbitrarily and is subject to further optimization. This cost is calculated for each path between the nodes, and done for each generation of routes, so new paths are explored. The purpose is to penalize the heavily used routes and increase the relative cost of traveling them for other costlier but new routes to become viable. r R, r distances ∨ r ascents Cost(path A−B ) = r + (r · usage_c · used_w) (2) The second part of the Route generation parameters relates to the adjusting costs in generating paths, as seen in the Algorithm 2.
In the route initialization step a n number of paths are required between the start start_id and end end_id nodes. The algorithm additionally needs the generated property graph (G), costs dictionary, where the identity key of each path between two intersections has a defined travel cost and the Path generation weight adjustment p_gen_w and chance (p_gen_c) explained in Table 7. Routes array (routes) in line 2 collects all the generated paths. Routes are generated inside the while loop (line 3-10), which continues until n number of routes are generated. In the first step, the shortest path is generated (line 4) using Dijkstra's shortest path algorithm [23], and then added to the list of generated routes (line 5).
After that, a determined percentage p_gen_c of paths (edges) of the generated route are selected (line 6). For the selected / sampled paths the internal cost (for the Dijkstra's algorithm) is increased by a ratio of Path generation weight adjustment (p_gen_w) (line 7-9). The cost increases generated and appended in each route generated are non-persistent and affect only the generation of routes in the current execution of the procedure. After n routes are generated, the procedure terminates and returns them (line 10-12). The routes were generated fully only in the first step, but a selection evaluation was performed in each of the succeeding steps. The evaluation function depended on the route and evaluation parameters. The route parameters were the starting (start_id) and ending points (end_id), which were already used in the initialization step. The minimum ascent (min_ascent) and distance (min_distance) provided the lower boundaries of the generated routes, and maximum ascent (max_ascent) and distance (max_distance) provided  the upper boundaries of the generated routes. All routes not between these boundaries were penalized based on the evaluation function. The algorithm's goal was to find the route with the lowest possible penalty result, which was evaluated for each route in the evaluation function. A penalty scoring type was determined for each of the deviations from the proposed values, as seen in Table 8.
Two types of penalty scoring exist: absolute and relative. In the case that the absolute scoring was chosen in the algorithm initialization, and the investigated route parameter (x) was between the proposed minimum (min) and maximum (max), there was no penalty (value 0). If it was above the maximum, the penalty was equal to the absolute difference (x-max) times penalty weight (weight), and, if it was below the minimum, it was equal to the difference (min-x) between the actual value (x) and the maximum value (max) times the penalty weight (3).
In relative scoring, the absolute value of the difference between the proposed value and actual value was first divided by the proposed value and then multiplied by the weight ((min − x)/min · weight or (x − max)/min · weight) as seen in (4).
Another bonus was inspected and added since data can be combined with the existing training data verification bonus to reduce the penalty. Each time an existing recorded training passes through an intersection, its verified value increases by one (Table 9).
We can then sum the number of verifications for each route (the sum of all verification values in all intersections) to get the amount of verification (v) of a route. In the algorithm we determine the threshold (v_threshold) for the maximum verification bonus and bonus percentage (b) a route can receive. The actual bonus (B) is equal to the number of verifications (v) divided by the threshold value of verifications (v threshold ) VOLUME 10, 2022 times the maximum verification bonus (b), as seen in in (5).
If v is larger than v threshold the bonus is capped at the maximum verification bonus (b), as shown in (6).
The actual evaluated value (E val ) is then the sum of all four penalties: Minimum ascent (P min_ascent ), maximum ascent (P max_ascent ), minimum distance (P min_distance ), and maximum distance (P max_distance ) times 1 minus the actual bonus (1−B). It is also evident that, if the route is penalized for lower than the minimum boundary of a specific property (ascent or distance), the penalty for the maximum boundary will be zero, and vice versa, as seen in (7).

4) CROSSOVER
After the evaluation or selection phase, the crossover phase follows. This step selects a percentage (determined by the crossover rate) of the total population for crossover. We cannot simply sample individuals randomly due to the nature of the paths.
We can only crossover ways that share at least three points. Each route has a common start (V start ) and endpoint (V end ), as seen on Fig. 6. This is in contrast to classic non-domain specific Genetic Algorithms, where any part of the genome can be switched and replaced between the two parents.
However, we must only crossover routes where there is at least another common point, as seen in (8). This means that a non-starting and non-ending vertice of route A (v a ) is equal to a non-ending and non-starting vertice of route B (v b ), meaning the routes travel through a common vertice (intersection) as seen in (8 The algorithm for crossover selection (Algorithm 3) was provided with the number of needed routes (needed) and an array of existing routes (R).
The routes are shuffled randomly in an array in the first step (line 3). This is followed by the while loop (line 4-20) where previously shuffled succeeding routes (i = 1, 2, 3, . . . ) are checked for crossover with the initial route (0). The loop continues while there are fewer routes selected for crossover (len(pairs)) than the required number of routes (needed), and two or more routes are remaining. The inner while loop (line 7-16) checks every remaining route in succession to check the viability by checking (line 8) if any intersections suit the criteria listed in (8). When such a route is found (line 8), both the initial route 0 and succeeding route i are removed from the route candidates (line 10-11) and added to the crossover route pairs (line 9). If no routes are found to have a common point with the initial route 0, the route at the start of the array is removed from the candidates, and the next route (the one with index 1) takes its place (line 17-19). The main loop continues until no more candidates are identified, or a needed number of pairs are reached. The algorithm returns an array of route pairs (R y , R x ) saved in the lines' array (instantiated in line 2). Once the route pairs are selected it is time to perform the actual crossover. Suppose that we have two routes, A and B, and they have a set (C p ) of crossover points (v p1 , v p2 , . . . v pn ) where (8) which is true. Because the crossover can also happen at the beginning, v start is added to the possible crossover points, forming a set of all possible crossover points C p , as seen in (9).
Only the starting node is added to the possible crossover vertices since crossover from v start to v x will generate the same outcome as v x to v end .  To crossover the two routes, two random non-identical vertices (v x , v y )) are chosen from the set C p . Then, all the vertices and relationships are switched between the two nodes (v x , v y ). It should be noted that the switched routes do not need to be of the same number of nodes (Fig. 7).

5) MUTATION
After the crossover is performed the routes are selected for mutation based on the mutation rate parameter (mutation_r). The mutation is performed so that any two points of a route are chosen randomly (v m1 , v m2 ). After the points are chosen, all the vertices and paths between them are removed, and new routes are generated between them (Fig. 8). The routes are generated using the Algorithm 2, which is based on Dijkstra's algorithm [23]. The k-value chosen is the number of intersections between the chosen nodes (v m1 , v m2 ) for route mutation.
Once the k-number of routes is generated, one of them is chosen randomly to replace the previous sub-route. One of the generated sub-routes is chosen randomly (sub − route x ) and inserted where the previously deleted sub-route (sub−route i ) was.
After the mutations of randomly selected routes are performed, each pair of original parent routes (R 1 , R 2 ) has a pair of crossover offspring (R AB1 , R AB2 ), where one or both of them can be mutated (if chosen randomly for mutation). From them, we create sets of four routes, two original and two new, and evaluate them according to the evaluation function previously mentioned. Based on the evaluation function, we select the two routes with the lowest possible values and replace the original two routes in them. Once we perform this evaluation step for all the parent-offspring quartets, we move to the crossover step and repeat it until the required number of generations has been reached.

6) GENERATING A ROUTE BETWEEN MORE TWO POINTS AND CIRCULAR ROUTES
The previously explained algorithm has been extended to enable the generation of complex routes. These are routes that, in addition to the start and end nodes, also contain an ordered list of mid nodes. In our case, this means that the route can have a list of nodes (A 1 , . These are the nodes that the route must cover in the following order, as seen in Fig.9 An array of partial routes, therefore, represents a complex route (e.g., A 1 → A 2 ) as shown on Fig. 9. To generate a complex circular route, the ending point (A N ) must be equal to the route's starting point (A 1 ).
The initial route generation is performed by generating a population of complex routes. Each partial route is composed of the aforementioned partial paths. This is shown in the Algorithm 4. The procedure generateNComplexRoutes receives all the inputs that the Algorithm 2 receives, with the addition of mid_ids, which is an array of mid nodes that the route must contain. At the first step (lines 2-3), an empty routes array (cpl_routes) is initialized, and an array where the start node, mid nodes, and end node are concatenated. Sub-routes are generated in the while loop (lines 5-9). First, the start (a) and end (b) nodes are selected (lines 6-7) from the all_nodes array. This serve as the start_id and end_id input parameters of the generateNRoutes procedure (Algorithm 2). After the while loop is concluded the complex routes (cpl_routes) array is returned (line 10).  The first index of the cpl_routes array is the sub route index of an individual complex route, and the second one is the complex route index.

Algorithm 4 Generate Complex Routes Between
This changes the crossover phase by allowing the operation only to be done on the partial route of the same start and end nodes. Which partial route is chosen to crossover with another partial route is selected randomly, the same as selecting the two original parent routes.
The mutation phase is adjusted so that mutation can only be done on individual partial routes and not the whole route, to maintain the fixed points that must remain in the complex route.
The evaluation of routes is performed so that each partial route's total ascent and distance are calculated, and then summed up to receive the complex-route ascent and descent, which can then be graded per the route penalty criteria.

IV. EXPERIMENTS AND RESULTS
We demonstrated the use of our proposed method by designing the requirements for several typical cycling route test cases. The routes were generated based on their parameters. The area of routes was selected based on the training data of the selected athletes, described in Table 2 from the datasets [36], [37]. The heatmap projection of training and the selected area on which the routes were generated is shown in Fig. 10. This was done to enable the use of verified weights of nodes (intersections). As seen in Fig. 10, the highest activity in the viable merged dataset was recorded in northeast Slovenia in the geographical regions of Podravje and Pomurje. To reduce the processing and track generation time, a limited area with the following boundaries: minimum latitude: 46.20, maximum latitude: 46.86, minimum longitude: 15.16, and maximum longitude: 16.37, was selected, as shown by the red rectangle on the heatmap. The resulting property graph had a size of 79,6 megabytes, and the additional data needed for visualizations was 394 megabytes (latitudes and longitude data for each point of a vertice between two edges).
The mentioned area encompasses the places of Celje, Lenart, Ljutomer, Maribor, Murska Sobota, Ormož, Radgona,  Rogaska Slatina, and Slovenska Bistrica. In the given area, 40369 intersections were discovered, together with 109298 paths between them. The area is mostly flat in the region of Pomurje (near Murska Sobota and Ljutomer), and offers hilly terrain around the Pohorje mountain (near Maribor, Slovenska Bistrica.
Four routes, one for each scenario, were generated to test the proposed method: • Case A -Route of length 25 kilometers and an ascent of 400 meters.
• Case B -Route 50 kilometers long and 450 meters of ascent.
• Case C -Route 100 kilometers long and 900 meters of ascent.
• Case D -Circular complex route 100 kilometers long and 1000 meters of ascent. The initialization parameters for the algorithm were the following, as presented in Tab. 10. We tried to generate three typical training routes, short (25 kilometers), medium (50 kilometers), and long (100 kilometers). The crossover rate and mutation rate were selected at 50 %. However, it should be noted that, in our algorithm, only the crossover routes were mutated, so that means that 50 % of crossover routes were mutated in each iteration (25% of the total routes). There were 50 routes in each generation, and the algorithm generated 100 population generations.
The penalties (Table 11) were determined at 1 point deduction for each kilometer under or over the determined value, 0.1 points for each meter over or under the determined ascents. A bonus of up to 5 % was awarded to routes with more verified paths than an average path.
Each of the following subsections presents a case of a generated route (A, B, C, D). A brief description and the generated route image are provided for each case. The algorithm's progress in each case is also shown on charts which show the grade, ascent, and distance of the best route in each generation, as well as the average grade of all routes for each iteration.

A. CASE A
Case A demanded a short route (25 kilometers) and was limited to 400 altitude meters in a hilly environment (starting point: Ptuj, ending point: Ivanjkovci). We also chose Ivanjkovci because conventional pathfinding using Google Maps [49] provided us with a route of 27.7 kilometers for walking and of 32 kilometers long for driving. We were interested in how the algorithm would respond to such a problem. The best route of the first generation had the best grade of 1,06 penalty points. It had a distance of 26023 meters and an ascent value of 399 meters. The route generation process is shown in Fig. 11, where the distance, ascent, and best grade are shown in each iteration, and the average grade of the routes generated.
Interestingly, the best route was found in the 20th iteration, and no better routes were found afterward, as seen in Fig. 11. It can be seen that the main problem was finding a shorter route between the two determined destinations while still conforming to the ascent criteria. The route had a length of 26.02 kilometers and covered 399 meters of ascent, which is 1.02 kilometers over the distance limit and 1 meter under the ascent limit.
The best route of the first generation is shown with a blue line, and the changes in comparison to the final version of the route are shown with a pink line in Fig. 12. It can be seen that modification happened only on the starting part of the route, and no further optimizations were found.

B. CASE B
Case B presented regular training in a mixed environment, partially flat terrain (from Slovenska Bistrica to Ptuj) and partially hilly (from Ptuj to Lenart). The route had a required length of 50 kilometers and 450 ascent meters. The best route is shown in Fig. 13 where the blue line shows the original best route and the pink line is the final best route (modifications in comparison to the original). For brevity, the parts where changes occurred are marked with a slightly darkened circle marked by a green border. The route optimizations were reached in two parts when comparing the original best route and the ending best route.
This best solution was discovered in the 94th generation, as seen in Fig. 14.
It should be noted that because the route can only start and end in an intersection, the termination parameters should be set to a higher than zero penalty value in the future due to the high improbability of reaching the exact number of meters.
It can be noted that the initial generation best grade route had a grade of 7.10 points (average grade: 95.39), an ascent of 523 meters and a distance of 49.83 kilometers, and the final route had a distance of 50.79 kilometers and an ascent C. CASE C Case C was a case of a long-range route with a higher number of altitude meters (100 kilometers and 900 meters of ascent) between Slovenska Bistrica and Kungota. In this case, the optimum solution was discovered in the 87th generation of routes, as seen in Fig. 15. The best solution had a penalty value of 0.0975 and covered 99.99 kilometers with 899 ascent meters. It is better to place the two points a bit closer together, so that more different routes can be found between them, within the given parameters.
We can also see from Fig. 15 that the algorithm successfully tried to find a trade-off between distance and ascent to discover the most optimum route. The final route generated was 99.9 kilometers long and had an ascent gain of 899 meters.
The first generation best route is shown, together with the final best route in Fig. 16. The blue line shows the original best route, while the yellow line shows the final version of changes to the original route to reach the optimum solution. The red-bordered circles denote four notable diversions from the original best route.   complex circular route was generated between the following four intersections in the given order and places: 1) Videm pri Ptuju (start) 2) Pragersko 3) Zgornja Korena 4) Videm pri Ptuju (end) From the four intersection points, three partial routes were generated, namely: 1) Videm pri Ptuju → Pragersko, 2) Pragersko → Zgornja Korena, 3) Zgornja Korena → Videm pri Ptuju. The goal of the case was to prove the feasibility of the algorithm for generating complex routes that were also circular.
The initial best route had a grade of 17.54, and the best route overall was discovered in the 86th iteration. It had a grade of 0.0767. The route was 80 meters shorter than 100 kilometers and covered precisely 1000 meters of elevation gain. A route with a penalty smaller than one was discovered in the 41st generation, as seen in Fig. 17. The last chart, which represents the average grade of a route, is plotted on a logarithmic scale (base 10).
The best solution can be seen plotted on a map in Fig. 18. The red dot represents the starting and ending point of the route (Videm pri Ptuju), and the leftmost yellow dot (Pragersko) and topmost yellow dot (Zgornja Korena) represent the two additional fixed points that the route must cover. The red arrows represent the direction of the route.

E. LIMITATIONS
The algorithm initialization parameters were chosen on a trial-and-error basis and were found to be suitable for generating the required routes. The values (namely population size and number of generations) were limited due to the constraints of running the algorithm on a personal computer (CPU: 7700HQ i7, 16 GB ram). The routes were only generated in the bounding map (latitude and longitude limitations) since the most viable training data was available from that area.

V. DISCUSSION
The domain of Route Generation has been studied thoroughly. While widely known algorithms exist for finding the shortest routes, such as Dijkstra [23] and Yen's K-shortest routes [48], there are much fewer possibilities for finding the exact routes, especially if there is more than one parameter that we want to control.
Several geographical domain-specific challenges were also identified, (1) Crossover can only be done between routes with common intersections, (2) Any operation of the algorithm on individual or multiple routes must maintain that each route remains completely connected, and (3) At least the starting and ending point is fixed permanently, which somewhat limits the possibilities for route generation. Each of the identified challenges was addressed in the algorithm.
We have also presented the challenge of generating the routes that need to be controlled by more than one parameter. In our case, we controlled the routes not only for their length but also for their total ascent. We have also presented the extension of the original method to allow the generation of circular routes.
We have demonstrated the feasibility of our algorithm by presenting four different cases: (A) a Short route, (B) a Medium length, (C) a Long route, and (D) A long circular route. In all cases, the generated routes were near the required values, so we can say that the algorithm solved the given problems successfully. When comparing the route ascents and lengths, no final instances of best-generated routes had any of their parameters deviate by more than 4.08 % (distance in Case A), as seen in Table 12.
A thing that has not yet been optimized is the selection of the starting and ending points, which currently need to be selected manually. The only condition to generate valid routes is that the points are not too far apart.

VI. CONCLUSION
We have found a feasible solution to the generation of cycling training routes. Our novel approach offers a way to route generation inspired by the Evolutionary Algorithm approach. We have introduced the implementation of the (1) Initialization, (2) Selection, (3) Crossover, (4) Mutation, and (5) Evaluation stages in the domain of Geographical Maps.
The proposed approach can discover new routes based on route parameters: distance, ascent, start, and end points. The approach could be extended further for additional parameters if the need for them existed. The approach has been tested and demonstrated successfully on the four presented cases of routes (25 kilometers, 50 kilometers, 100 kilometers, and 100 kilometers circular) of varying lengths. Potential further research directions include: • extending the original property graph to include data for maximum and average slope inclines (degrees) between intersections; • optimization and sensitivity analysis of the algorithm parameters so that optimum solutions could be found faster and with fewer resources; • adapting the route generating algorithm to include route generation for running. He is currently an Assistant Professor at the University of Maribor. He has published more than 120 research papers in refereed journals, conferences, and book chapters. His research interests include data mining, pervasive computing, optimization, and sports science. He has acted as a program committee member of more than 30 international conferences. Furthermore, he is a member of the editorial boards of three different international journals. VOLUME 10, 2022