Flow-Based Reinforcement Learning

This paper presents a novel Flow-based reinforcement learning strategy for modelling agent systems that can adapt to complex and dynamic problem environments by incrementally mastering their skills. It is inspired by the psychological notion of Flow, which describes the optimal mental state experienced by an individual when they are fully immersed in a task and find it intrinsically rewarding to engage with. The proposed model presents an algorithm that operationalises the Flow experience, training agents through progressively finer adjustments to the challenge level across training time so as to keep them in the Flow zone. In contrast to traditional and incremental learning approaches, which suffer from limitations associated with overfitting, the Flow-based model drives agent behaviours not simply through external goals but also through an intrinsic curiosity to improve skills and, in turn, performance levels. Experimental evaluations are conducted across two simulation environments, a maze navigation task and a reward collection task, with comparisons against a generic reinforcement learning model and an incremental reinforcement learning model. The results reveal that these two models are prone to overfit under different design decisions and lose, to varying degrees, the ability to perform in dynamic variations of the tasks. Conversely, the proposed Flow-based model is capable of achieving near-optimal solutions under random environmental variations, appropriately utilising previously learned knowledge to identify robust solutions to complex problems.

Reinforcement learning (RL) has been applied to agent systems in areas such as coordinated exploration [10], in both virtual and physical applications. However, a known limitation of the existing RL-based agent models is the difficulty in adapting to dynamic and uncertain conditions. This is primarily caused by the increased complexity of operation associated with changes in the environment [11], [12], [13].

This paper investigates a novel Flow-based RL strategy which allows agent systems to adapt to complex environments by incrementally mastering their skills. In psychology, Flow refers to the mental state experienced by an individual when they are fully immersed in a task and find it intrinsically rewarding to engage with. While there has always been an awareness among people of the feeling of immersion, loss of self-consciousness, and happiness experienced while being fully engaged in a task they like, the concept was first coined by the psychologist Mihaly Csikszentmihalyi [14]. The key dimensions of any experience of a task are the challenges the task brings and the skills required to achieve them. One deviates from a Flow state of mind when they feel anxious, due to a challenge being beyond their reach, or bored, due to a challenge being easily achievable compared to their current skill level. If the challenges and the skill levels increase proportionally within the Flow zone, it can facilitate a sense of discovery, driving one with an intrinsic motivation towards higher performance levels.

We adapt this concept of Flow to training artificial agents within a reinforcement learning model by making finer distinctions to the challenges across training time to maintain agents in a Flow zone. It can overcome intrinsic challenges such as overfitting and catastrophic forgetting associated with incremental learning, which has been used for agent modelling. In a similar vein, transfer learning [26] is an approach that uses knowledge gained in a previous task to subsequently address a related but different task. It has been adapted within evolutionary transfer RL frameworks [27]; with policy intersection to allow an external policy to influence the RL agent [28]; and with fine-tuning where tasks are parameterised by their reward functions [29], among other applications. Self-learning adaptive dynamic programming [30] has also been experimented with in this regard, as a means of eliminating the explicit external reward scheme by encouraging agents to learn internal rewards dynamically based on the problem presented. The use of abstractions, or modular RL, is another approach to solving complex problems, where tasks are subdivided into multiple simpler modules to be learned independently and combined [31], [32], [33], [34].

The key limitation of these existing approaches is that they are primarily goal oriented. The agent behaviour is directed towards achieving a dynamic goal through refinement of action, and little thought is given to the learning process in terms of balancing skills and challenges.
As a result, they are not capable of building a general awareness of the environment that can later be utilised under changed conditions; rather, they tend to overfit to or forget the accumulated knowledge [15], [35], which leads to deterioration of performance as the model is presented with more complex challenges. Despite being good at achieving dynamic goals, such a model remains prone to failure when the environment changes.

Flow is a notion that is not focused on external goals. An agent in Flow enjoys an optimal experience where it is intrinsically motivated towards exploring the environment and building an awareness of the task, which extends beyond a simple goal-oriented mindset. The concept has often been adopted in human development and education as a way to understand the conditions that make the process of learning more enjoyable and efficient from a psychological point of view [36]. It has been identified that Flow can facilitate creativity and self-actualisation in the domain of learning and problem solving for humans [37]. In the technological domains, Flow has primarily been investigated with games and gamification. The interactions between a player and the game, and the operative description of game-play, have been characterised in the literature through the aspects of Flow on learning and enjoyment [38], [39]. However, Flow has not received attention in the domains of agent systems and AI models, despite having characteristics worth exploring as a potential alternative to overcome the learning issues in dynamic environments. Being in the Flow zone indicates that an agent will not be completely goal oriented but will enjoy the experience until it can no longer attain an optimal experience through novel solutions [14]. It can therefore lead to artificial agents that identify more robust and generalisable solutions to problems rather than overly narrow and specific ones. The work proposed here thus explores how the psychological theory can be adapted in the field of artificial agents to enhance their learning ability.

FIGURE 1. Complexity of consciousness increasing as a result of the Flow experience [14]. An experience falls out of the Flow zone if the skill levels improve without the challenge getting complex (2); or if the challenge gets increasingly complex without an opportunity to improve skills (3). In red is shown how we utilise the notion in agent systems to improve the skills of agents across increasing challenges. The agent is given the opportunity to improve skills at a certain challenge level until it reaches a level of boredom (a), when the challenge is then made complex (b), bringing the agent back into the Flow zone.

FIGURE 2. Proposed Flow-based RL model. The task commences with an initial challenge level ς and a skill level ϑ. The agent keeps improving its skill level by an increment of Δϑ (step 1) until the boredom value β at the challenge level ς exceeds the boredom threshold ϕ (step 2). If the boredom threshold has been exceeded and the experience level is the expected ultimate level of the system G (step 3), then the system completes the learning process. If not, the system increases the complexity of the challenge by an increment of Δς (step 4) and moves back to the learning step.

With this understanding of the concept of Flow in psychological experiences, the notion was adapted in our work to improve the learning ability of artificial agents in complex simulated environments. The goal is to maintain the agent(s) in the Flow zone continuously, such that both the challenges and their skills improve simultaneously over time until the expected level of performance for the expected level of challenge is reached. The experiences highlighted in red in Figure 1 illustrate this process.
When the agent starts improving its skills for a given challenge (ς) and passes the threshold for boredom at (a), the challenge level is incremented such that its experience will be at (b). The agent then starts improving its skills again for the new challenge to attain a higher performance until it cannot improve further and gets bored after some time (c), and the challenge is incremented again to bring the agent back into the Flow zone (d). This process is repeated until the ultimate challenge level is reached.

The proposed Flow-based RL model designed based on this approach is illustrated in Figure 2. The task commences with an initial challenge level ς and a skill level ϑ. As the first step, the agent improves its skill level by an increment of Δϑ through the reinforcement learner. At the next step, the algorithm calculates a boredom value β at the challenge level ς and checks if it has exceeded the boredom threshold ϕ. If it has not, the agent can still improve its performance and therefore moves back to the learning step (step 1). If the boredom threshold is reached, the agent has attained its maximum performance level for the particular challenge, has moved out of the Flow zone, and is no longer enjoying an optimal learning experience. As the next step (step 3), the algorithm checks whether the experience being enjoyed by the agent at this level is the expected ultimate experience level of the system (G).

If it is, the system completes the learning process. However, if the agent has not reached the ultimate challenge level, the model increases the complexity of the challenge by an increment of Δς (step 4; lines 22-23 of Algorithm 1: if β_ς > ϕ then ς ← ς + Δς) and moves back to the learning step.
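To make the control loop of Figure 2 concrete, the sketch below expresses steps 1-4 in Python. The function and variable names (improve_skill, boredom, flow_training) are our own illustrative placeholders rather than the paper's Algorithm 1, and the two stubbed helpers stand in for the reinforcement learner and the boredom measure of Algorithm 2 respectively.

```python
# Minimal sketch of the Flow control loop in Figure 2; helper names are assumed.

def improve_skill(challenge: int) -> list:
    """Stub for step 1: one Q-learning pass at the given challenge level,
    returning the (state, action) pairs of the identified solution."""
    return [(0, 0)]

def boredom(solution: list) -> float:
    """Stub for step 2: the novelty-based boredom measure of Algorithm 2."""
    return 1.0

def flow_training(challenge: int = 1, ultimate: int = 51,
                  d_challenge: int = 1, phi: float = 0.9) -> int:
    while True:
        solution = improve_skill(challenge)   # step 1: improve skill by an increment
        if boredom(solution) <= phi:          # step 2: still in the Flow zone,
            continue                          #         so keep learning at this level
        if challenge >= ultimate:             # step 3: ultimate level G reached
            return challenge                  #         learning process complete
        challenge += d_challenge              # step 4: make the challenge harder

flow_training()
```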

As discussed above, the model requires a method to quantify boredom and to incorporate it into the reinforcement learner. Algorithm 1 details the proposed algorithm, with the boredom calculation method illustrated in Algorithm 2.

The modified Flow-based RL algorithm (Algorithm 1) starts by initialising the Q-table (line 2) and the state-action pairs of the solution for the current challenge level ς as null (line 4). A decaying epsilon-greedy Q-learning approach [40] is used to balance the exploration-versus-exploitation tradeoff in action selection. For the initial rounds of learning, a relatively high probability is assigned to selecting a random action, and as the learning improves, this exploration probability is reduced, giving more chance for exploitation of the most suitable actions (lines 8-12). Every state-action pair (s, a) of the solution for each challenge level is recorded (line 14). At the end of identifying a solution during every episode of the challenge level, a boredom value is calculated based on all state-action pairs visited by the solution (lines 19-21). The value is then compared against a set threshold to determine if the agent has moved out of the Flow zone; if it has, the task environment is updated with a higher challenge level and the learning process is started from the beginning. If not, the agent still has the capacity to improve its performance, and the learning process moves to the next episode at the same challenge level (lines 22-25). The process terminates once the boredom threshold is exceeded at the ultimate challenge level.
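Since this excerpt does not reproduce Algorithms 1 and 2 in full, the following self-contained sketch shows one plausible reading of the inner loop: decaying ε-greedy Q-learning (lines 8-12) with every (s, a) pair of the solution recorded (line 14), and boredom assumed to be the fraction of the current solution's state-action pairs already visited by earlier solutions at the same challenge level. The toy corridor environment and all names are illustrative assumptions, not the paper's implementation.

```python
import random
from collections import defaultdict

class ToyEnv:
    """Illustrative stand-in environment: walk right from cell 0 to cell 4."""
    def reset(self):
        self.pos = 0
        return self.pos
    def actions(self, state):
        return [-1, +1]
    def step(self, action):
        self.pos = min(max(self.pos + action, 0), 4)
        return self.pos, (1.0 if self.pos == 4 else -0.01), self.pos == 4

def epsilon_greedy(q, state, actions, epsilon):
    """Lines 8-12: random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])

def boredom_value(solution, seen_pairs):
    """Assumed form of Algorithm 2: share of the solution's (s, a) pairs seen before."""
    if not solution:
        return 1.0
    return sum(pair in seen_pairs for pair in solution) / len(solution)

def run_challenge(env, q=None, episodes=500, eps=0.9, eps_min=0.05,
                  decay=0.995, alpha=0.1, gamma=0.9, phi=0.95):
    q = q if q is not None else defaultdict(float)        # line 2: initialise Q-table
    seen_pairs = set()
    for _ in range(episodes):
        state, solution, done = env.reset(), [], False    # line 4: solution is null
        while not done:
            action = epsilon_greedy(q, state, env.actions(state), eps)
            nxt, reward, done = env.step(action)
            best_next = max(q[(nxt, a)] for a in env.actions(nxt))
            q[(state, action)] += alpha * (reward + gamma * best_next
                                           - q[(state, action)])
            solution.append((state, action))              # line 14: record (s, a)
            state = nxt
        eps = max(eps_min, eps * decay)                   # decaying exploration
        if boredom_value(solution, seen_pairs) > phi:     # lines 19-25: Flow check
            return q, True        # bored: caller raises the challenge level
        seen_pairs.update(solution)
    return q, False

q_table, bored = run_challenge(ToyEnv())
```

A wrapper corresponding to the outer loop of Algorithm 1 would call run_challenge repeatedly, carrying the returned Q-table forward and incrementing the environment's challenge level whenever the boredom flag is raised.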

This section elaborates the designs of the simulation environments and the experimental evaluations conducted to test the proposed Flow-based RL model.

The experiments utilise two tasks designed to investigate two different objectives:

• A maze navigation task: where the agent is forced to use the new knowledge presented to the system.

• A reward collection task: where the agent is given the option to use the new knowledge presented to the system but is not forced to do so.

Each task consists of 51 challenge levels, each presenting new knowledge to the system, to evaluate agent performance under traditional RL, incremental RL, and Flow-based RL. The first task is associated with a maze navigation environment as depicted in Figure 3. The agent is expected to navigate through the available cells by finding a path from the start position (red) to the end position (green) while avoiding the obstacles (in black). The goal is to find the shortest path that avoids the obstacles. The first challenge involves no obstacles, and the agent has the freedom to explore all cells and find a suitable path to reach the end position (Figure 3a). At each challenge level increment, new obstacles are added by blocking free cells to make the task more complex (Figure 3b). The agent can only travel to its adjacent free cells at each step.

FIGURE 4. Cell reward collection task. The agent is expected to collect rewards by moving onto 100 cells (the figure indicates only 20 cells for clarity of representation) across multiple channels. Switching between channels incurs a cost associated with the distance between the two channels. The goal of the agent is to collect the maximum total reward possible (rewards from cells minus costs of channel switching) by the end of 100 time steps. The first task starts with only 2 channels, and each challenge level introduces a new channel, with each channel having at least one cell with a higher reward than all previous channels at the same cell position, up to 52 channels. The cell with the highest reward for each column is highlighted in yellow. However, this may not be the optimal path since channel switching also incurs a cost associated with the distance.
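The objective of the reward collection task can be stated compactly. Writing c_t for the channel occupied at time step t and r(c_t, t) for the reward of the cell at position t in that channel, the agent maximises total reward net of switching costs. The linear, distance-proportional cost term with coefficient κ is our reading of the description above; the exact cost function is not given in this excerpt.

```latex
R_{\text{total}} \;=\; \sum_{t=1}^{100} r(c_t,\, t) \;-\; \kappa \sum_{t=2}^{100} \left| c_t - c_{t-1} \right|
```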

A few design strategies were used to ensure a consistent increase in complexity with the challenge increments. Each channel being added will have at least one cell with a higher reward than the rewards of all the previous channels at the same cell position. This condition ensures that a difficulty increase is guaranteed with every new channel being added, as there is an advantage in moving to the newly introduced channel for a higher reward. The total reward of all cells in a single channel should be within a given range [1000, 1500], and the rewards are incremented along the channel in a sinusoidal stepwise format. Figure 5 depicts the nature of the reward assignment in the cells within each channel.
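A generator satisfying the two stated design conditions might look as follows. The paper does not give the construction, so this sketch simply resamples stepwise-sinusoidal profiles, with a small upward drift in the base reward per channel (our assumption, to keep the dominance condition attainable), until both the total-reward range and the dominance condition hold.

```python
import math
import random

N_CELLS = 100  # cells per channel

def candidate_channel(base: float, amp: float, period: float) -> list:
    """Stepwise-sinusoidal rewards along one channel (Figure 5 style)."""
    return [base + amp * math.sin(2 * math.pi * i / period) for i in range(N_CELLS)]

def add_channel(channels: list) -> list:
    """Resample until the channel's total is within [1000, 1500] and at least
    one cell beats every previous channel at the same position."""
    drift = 0.05 * len(channels)  # assumption: later channels pay slightly more
    while True:
        ch = candidate_channel(base=random.uniform(10.5, 12.0) + drift,
                               amp=random.uniform(2.0, 5.0),
                               period=random.uniform(10.0, 40.0))
        total_ok = 1000 <= sum(ch) <= 1500
        dominates = (not channels) or any(
            ch[i] > max(prev[i] for prev in channels) for i in range(N_CELLS))
        if total_ok and dominates:
            return ch

channels = []
for _ in range(52):  # the task grows up to 52 channels
    channels.append(add_channel(channels))
```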

This task differs from the maze navigation task, where the incremental and Flow-based agents are forced by the design itself to utilise the new knowledge presented at each difficulty level. In this case, the agent is provided with the choice either to explore the new knowledge or to remain with the solution identified at the previous difficulty level. The new channel added is worth exploring since it has at least one cell with a higher reward than all cells of the previous channels in that particular column, but the agent is free to decide whether to visit that channel or not.

The traditional learner has to learn to avoid all obstacles at every challenge level as it does not carry forward any prior knowledge of the environment. Therefore, the time required consistently increases across challenge levels. The incremental learner, however, possesses prior knowledge of the environment from the previous challenge levels, and from the results it can be deduced that it overfits and finds it difficult to adapt to a new path, taking more time to converge to a solution until around the 30th level. However, as the complexity of the challenge level further increases, the number of obstacles increases, gradually reducing the number of path options to reach the exit (solutions). Therefore, even if the incremental learner is prone to overfit, it becomes relatively easier for it to identify a new solution as the complexity increases, for two reasons: the learner is forced to look for a new path through the design itself (as the new obstacle always intercepts the previously identified solution); and the available options for a solution gradually decrease with the increasing number of obstacles. In contrast, this phenomenon is not observed with the Flow-based learner. The tendency of the incremental learner to overfit is further investigated with the reward collection task.

The length of the shortest path identified for each challenge level with an increasing number of obstacles is shown in Figure 6(b). The lengths of the paths increase with all 3 models as the challenge level increases, due to the increasing number of obstacles that must be avoided to reach the end position. At a glance, the Flow-based model and the incremental learning model seem to identify shorter paths for all challenges compared to the traditional model; however, there is not enough statistical evidence to suggest a significant difference between the solutions derived by the models (p = 0.0615 > 0.05).

The complexity of the paths was determined based on the Manhattan distance, which is the sum of the absolute differences between the coordinates of the start and end positions. The difference between the actual path distance and the Manhattan distance was considered the complexity of the path. According to Figure 6(c), the complexities of the paths identified by all 3 models increase with the increasing number of obstacles and the longer paths that must be followed as a result. This observation further supports the design of the challenge levels, as the increasing complexity of the paths corresponds to an increasing complexity of the challenges. However, similar to the path length results, there is no statistically significant difference between the complexities of the paths identified by the models (p = 0.1817 > 0.05).
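In symbols (our notation), with start cell (x_s, y_s), end cell (x_e, y_e), and |P| the number of steps in the identified path, the complexity measure described above is:

```latex
C(P) \;=\; |P| \;-\; \bigl(\,|x_s - x_e| + |y_s - y_e|\,\bigr)
```

so an unobstructed shortest path has complexity zero, and each detour forced by an obstacle increases C(P).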

Therefore, the results suggest that the Flow-based model is significantly more efficient at identifying solutions for increasingly complex challenges; however, there is not enough evidence to suggest an improved quality of results compared to the traditional and incremental models with the current observations. In order to further explore the applicability of the Flow-based RL model, the evaluations were then extended to more dynamic scenarios. The next set of experiments was conducted to analyse whether the Flow-based model can effectively utilise the skills learned through performing in the incrementing challenge levels when the task is subsequently simplified. In the first experiment, the obstacles were removed in the order they were added, where the skill level at the 50th challenge was then fed into the agent as the commencing skill level for each decrementing challenge level. For the second experiment, the obstacles were not removed in order; rather, a decreasing number of obstacles were placed at random positions in the environment. That is, after the agent has completed 50 challenges, the next challenge is to overcome 49 obstacles placed in a different random order to what the agent has experienced so far. The subsequent challenges include a decreasing number of obstacles, down to no obstacles, placed at random positions. The skill level achieved by the end of the 50 challenges is fed to the agent for each challenge after the 50th level, as before.

Figure 7 illustrates the results for the total of 100 challenge levels for the Flow-based, incremental, and traditional RL models. According to Figure 7a, all 3 models take a statistically insignificant time to learn each challenge after the 50th challenge with their already improved skill level, in both experiments (obstacles removed in order, and placed at random positions). However, the Flow-based model is still capable of completing the task faster than the other two models.

Further, the lengths of the shortest paths shown in Figure 7b demonstrate that the Flow-based and incremental learning models can significantly enhance the agent's skills towards achieving high performance levels. When the obstacles are removed in order, the lengths of the identified paths gradually start decreasing, implying that the Flow-based and incremental models can use previously learned knowledge to fall back on a simpler challenge. More importantly, when the obstacles are placed at random positions, they behave significantly better and show the capacity to find relatively similar short paths at all challenge levels despite the complexity of the challenge. From this observation it can be deduced that the models did not merely master the skills for a specific challenge but achieved higher performance levels that enable the agent to tackle any dynamic goal in the given problem space. On the other hand, the traditional RL model shows that when obstacles are removed in order, it is no longer capable of finding shorter paths even though the challenge is being simplified. The lengths of the paths found when the agent is presented with simpler challenges after the 50th challenge are in the same range as the solution derived for the 50th challenge. This demonstrates that the traditional model has overfitted and is incapable of readjusting to different challenge requirements. When the obstacles are placed at random positions, a sudden significant drop in the path length can be observed at the 49th level, but the model starts performing poorly as the challenge level reduces. This interesting observation is due to the same overfitting issue observed before. Once 49 obstacles are placed in random order, a majority of the grid is covered with obstacles, which significantly obstructs the path identified at the 50th level, forcing the agent to learn a new path disregarding some of the knowledge gathered earlier. This leads to discovering a shorter path with the higher number of randomly placed obstacles. However, as the number of randomly placed obstacles decreases, the previously identified solution is obstructed less often and the model falls back on its overfitted behaviour.

The results are similar for the complexity of the paths observed in Figure 7c, as the complexity is correlated with the length of the path. The paths derived by the models for the 0-50th challenges versus the 50-0th challenges, after commencing the learning process from the skill set of the 50th challenge, were also analysed further. The primary goal of investigating the next environment is to understand the behaviour of these models when the agent is not forced, but is only given the choice, to utilise new knowledge through new difficulty levels of the task.

In contrast to the observations with the maze navigation task, the time analysis presented in Figure 9a shows that the cumulative time taken by the Flow model increases exponentially with the challenge level and is significantly higher than that of the traditional model (p = 0.0 < 0.05). The incremental learning model takes even more time than the Flow model (p = 0.0 < 0.05) and is the most inefficient of the 3 models compared. On a similar note, unlike the maze navigation task where a statistically significant improvement in performance was not observed, Figure 9b shows that the total rewards (cell rewards minus channel switching costs) collected by the Flow-based RL model are significantly higher than those of both the traditional model (p = 0.0 < 0.05) and the incremental learning model (p = 0.0 < 0.05). This shows that as the number of channels increases, the traditional model finds it increasingly difficult to maintain high total rewards.

To better understand the causes of the models' performance as observed, Figure 12 looks at the agent's movement across channels during all 100 time steps at each challenge level for both approaches. This gives a clear understanding of the differences observed in the total rewards collected. The traditional model starts with small channel switches during the less complex challenges (as expected, due to the unavailability of a large number of channels). But as more channels appear in the task, the model jumps to these channels for the higher rewards, disregarding the costs associated with transferring between channels. When the channels are removed at random, it can be seen that the model switches back and forth between distant channels to collect more rewards; however, this is also associated with higher costs, which results in a low total reward. Conversely, the Flow model and the incremental learning model behave more intelligently, taking the cost into consideration. These models identify that the most efficient strategy is to move within a smaller set of channels and collect the best rewards from them rather than switching to distant channels that increase the overall cost. This pattern is more pronounced in the Flow model, with even fewer distant switches observed compared to the incremental learning model. Despite each higher channel having some cells with better rewards, the model has the ability to compare the relative benefits of exploration versus exploitation to identify the best solution to improve rewards.

This is further analysed in Figures 13, 14, and 15, which evaluate the individual cost and reward values. According to the figures, the performance difference observed with the traditional model is caused by the cost associated with switching channels. The traditional model consistently incurs higher costs at each cell movement due to constantly switching to channels that are further apart. As the challenge complexity increases, the cost values increase proportionally due to the availability of more channels. Conversely, the Flow-based model and the incremental model are capable of keeping the costs at a minimum by only switching to adjacent channels and focusing on enhancing the reward pool while retaining a minimal cost penalty. Despite the same strategy being used by both the Flow model and the incremental model, the Flow model is still capable of collecting significantly more rewards than the incremental model, as discussed before. The average rewards collected for each challenge level, depicted in Figure 13b, illustrate this difference in rewards. It can therefore be deduced that the incremental learner is more reluctant to adapt to new knowledge and tends to stick with the knowledge gathered previously. As a result, it misses the opportunity to increase the rewards collected. Conversely, the Flow model is more flexible and is open to exploring the new channel presented at every difficulty level while retaining the previously acquired knowledge to minimise costs and collect more rewards.

The evidence further supports the observations made with the maze navigation task, suggesting that the Flow-based learning model can push the boundaries of traditional and incremental RL models. It has the potential to utilise the skills learned in simpler challenges to achieve more complex goals in future iterations, and the robustness and flexibility to adapt to dynamic situations.

FIGURE 12. Channels moved to by traditional, incremental, and Flow-based RL models at every time step for all 100 challenge levels for both approaches: channels removed in order, and at random. The results are averaged across 50 runs each, and the colour bar depicts the channel number of the cell the agent was in at every time step.

This paper investigates a novel Flow-based RL model as a potential alternative to overcome the challenges associated with modelling artificial agent systems that can adapt to complex and dynamic environments. Existing AI techniques such as incremental and transfer learning suffer from issues related to adapting to dynamic environments due to the inherent property of these approaches being primarily goal oriented [15]. As a result, these systems lack the capacity to build an awareness of the environment, making them less robust under changing environmental conditions. The model proposed here focuses on maintaining agents in a Flow zone, thus enabling them to enjoy an optimal experience of the task that is fuelled not only by external goals but also by an intrinsic curiosity to improve skills in a given task environment. Therefore, agents learn to achieve goals through incremental complexity levels while adjusting their skill set to face any random variation of the task at every complexity level. A measure for identifying the Flow zone is also introduced, based on the novelty of the solutions identified by the agents.

The Flow-based model is tested in two simulation environments, a maze navigation task and a reward collection task, with comparisons against a traditional RL model and an incremental RL model to investigate the impact of the proposed modifications to the algorithm. The two environments were designed such that the maze navigation task deliberately forces the incremental and Flow-based models to utilise the new knowledge presented at incrementing difficulty levels, while the reward collection task only provides the option to do so. When the agent is only provided with the choice but is not forced, it is more prone to overfit to the previously learned knowledge and not explore novel solutions which could have led to better performance. With an agent trained using a traditional Q-learning RL model, the model is driven only by the external goal and is satisfied once it achieves that goal. Therefore, it does not attain enough knowledge to derive flexible solutions in a dynamic environment. The behaviour of the Flow-based model, in contrast, suggests that the agent has merged its actions and awareness to facilitate the sense of involvement and control, leading to robust performance levels that can achieve dynamic and complex goals.

The evaluations provide promising evidence for exploring Flow as a tool to model artificial agents that can perform in complex real-world problem domains with dynamic and constrained environments. This paper investigated Flow in the field of RL, but there exist opportunities to apply the concept with other AI techniques such as artificial neural networks and evolutionary computing. As Flow is a universal concept associated with the characteristics of the optimal experience enjoyed by a person/agent, it can be adapted in any AI domain to understand the implications of learning and knowledge transfer across multiple complexity levels of a task. Both simulation environments tested within this context are discrete environments where incrementing challenges is relatively intuitive. However, Flow can also be explored in continuous environments where complexity cannot be increased in fine improvements.

Further, the current results are focussed on single-agent systems, as the primary concern of this paper is to investigate the applicability of the concept of Flow in AI domains.