Optimal Structure Design of Ferromagnetic Cores in Wireless Power Transfer by Reinforcement Learning

In this paper, a reinforcement learning algorithm is applied for the first time to find a ferromagnetic core structure with an optimal coupling coefficient between the transmitting (Tx) and receiving (Rx) coils of a wireless power transfer (WPT) system. Since formula-based theoretical design is not available because of the non-linear magnetic field distortion that stems from the presence of the ferromagnetic core in a WPT system, the proposed design has been achieved through learning from finite element analysis (FEA) simulation data. The proposed design methods are general enough to be applied to any conventional WPT coil type. We first applied the proposed algorithm to the ferromagnetic core structure design of a simple dipole coil. By training on only 2.3 % of all possible cases, the proposed method obtains a core structure whose coupling coefficient is experimentally verified to be 7 % higher than that of the example design at a distance of 98 cm between the Tx and Rx coils.


I. INTRODUCTION
In 2016, people of the world were astonished at the result of AlphaGo versus Lee Sedol, known as the Google DeepMind Challenge Match [1]. After the Go match, Lee said, ''I questioned human creativity.'' The CEO of Google DeepMind remarked, ''AI is a powerful tool to help people do their jobs.'' This suggests that reinforcement learning can be applied wherever creativity is needed [2], [3].
In the era of the 4th industrial revolution, which requires a large amount of computation based on big data, the battery problem of electronic devices is becoming a major issue, and wireless power transfer (WPT) technology has been studied as one of the promising solutions. WPT technology has been researched and commercialized by many research groups and industries [4]- [21].
The system efficiency and power ratings of a WPT system are considerably affected by the magnetic coupling between the transmitting (Tx) and receiving (Rx) coils [4], [18], [22]-[30]. Since ferromagnetic cores have high relative permeability (∼2000), they have the advantages of increasing the magnetic coupling between the Tx and Rx coils [22]-[25] and shielding the magnetic flux in undesired directions [18], [25]. Due to these advantages, ferromagnetic cores have been widely used in WPT coils in spite of their heavy weight and high price.
The associate editor coordinating the review of this manuscript and approving it for publication was Malik Jahan Khan.
The magnetic field distribution for different ferromagnetic core structures of the Tx coil is shown in Fig. 1. A straight ferromagnetic core is applied in Fig. 1(a), and a C-shaped ferromagnetic core is applied in Fig. 1(b). The magnetic field distribution near the Tx and Rx coils clearly varies with the shape of the ferromagnetic core. However, since the magnetic field distribution is non-linearly distorted by the presence of the ferromagnetic cores, key variables affecting the performance of a WPT system, such as the coupling coefficient, mutual inductance, and the magnetic flux density at a specific location, still cannot be theoretically analyzed by using electromagnetic theory. For this reason, past studies analyzing the effects of ferromagnetic materials on magnetic field distribution have relied entirely on simulation-based analysis or empirical equations [31]-[33]. Therefore, in the conventional method for developing the ferromagnetic core layout of a WPT system, the initial structure for obtaining a high coupling coefficient has mostly been produced by the experience, creativity, and intuition of skilled WPT designers. After this initial step, many rounds of simulation and experiment are conducted for the detailed design until specific criteria, such as maximum efficiency or load power, are satisfied. The problem is that it remains unclear whether the initial structure produced by the designer's intuition and experience gives the best performance under the given application constraints; even worse, producing the initial structure may be difficult or take a considerable amount of time if the constraints are very complicated.
VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Meanwhile, machine learning is known as one of the tools for analyzing non-linear systems. In 2015, machine learning outperformed humans in the field of image recognition, which has highly non-linear input-output relationships [34]. In the same way, machine learning can also be used to learn the characteristics of the non-linear magnetic field distortion caused by the ferromagnetic core in a WPT system, and to determine a high-performance layout of the ferromagnetic core.
Since WPT technology is fundamentally based on hardware design, there has been no research to date that uses software algorithms such as machine learning. Against this background, a ferromagnetic core structure design of a WPT system using a machine learning algorithm is proposed for the first time in this paper. Applying the machine learning algorithm to the core structure design of a WPT system makes it possible to find an innovative core structure of the Tx and Rx coils that goes beyond existing knowledge, and provides an opportunity to generalize it; it is therefore expected to open a new chapter in future studies of WPT technology.

II. THE NECESSITY OF REINFORCEMENT LEARNING FOR WPT COIL DESIGN
There are a number of variables that affect system efficiency and power ratings of a WPT system, but one of the major influences is the coupling coefficient between Tx and Rx coils [4], [24], [26]- [30]. Therefore, by optimally designing the structure of the ferromagnetic core having a high coupling coefficient between Tx and Rx coils in a given condition, the performance of a WPT system can be improved.
In WPT, the coupling coefficient between the Tx and Rx coils, k, depends largely on the shape of the surrounding ferromagnetic core, and a value close to the actual one can be obtained through a finite element analysis (FEA) simulation. It is worth noting that the coupling coefficient between the Tx and Rx coils can be predicted by FEA simulation with relatively low error compared to the other variables affecting power efficiency and power rating [35], [36], e.g., the AC and DC loss of the coil resistance, the hysteresis and eddy current loss of the ferromagnetic core, core characteristics, the equivalent series resistance (ESR) of capacitors, and the efficiency of the inverter and rectifier. This characteristic enhances the reliability of the proposed learning algorithm, which is based on simulation data, as will be demonstrated through the algorithm results in Section III.
WPT coils have been extensively studied to date with various geometries such as the traditional circular (loop) coil, bipolar (dipole) coil, DD coil, and DD quadrature coil [37]-[41]. Recently, since a patent by Samsung Electronics Corp. on a wirelessly powered TV using a dipole-coil-based WPT system has been disclosed to the public [42], the bipolar coil shape is expected to be widely used in future electronic products. Therefore, in this paper, a dipole coil shape is selected as a design example. It is worth noting that the proposed methods are general enough to be applied to any conventional WPT coil geometry.
Fig. 2(a) shows the basic setup used in this paper for the FEA simulation. The default size of a cube is 1 cm × 1 cm × 1 cm, and the dipole-shaped Tx and Rx coils consist of two cubes with the properties of the ferromagnetic core and one turn of winding with the properties of copper. The distance between the Tx and Rx coils is fixed at 10 cm. The 16 cubes in a 4 × 4 array on the right side of the Tx coil all have the properties of a vacuum. Ferromagnetic cubes are shown in gray and vacuum cubes in green. Fig. 2(b) shows an example in which eight of the 16 vacuum cubes are changed to ferromagnetic material. In this paper, for intuitive understanding and simplicity, the ferromagnetic core structure is symmetrical with respect to the y-axis, facing each other, and the height of the ferromagnetic cores is fixed at 1 cm. A ferromagnetic cube is represented by '1' at the corresponding position, and a vacuum cube by '0'. The final target of this paper is to select the eight cubes among the 16 vacuum cubes that yield a high coupling coefficient between the Tx and Rx coils when changed to ferromagnetic material. It is worth noting that the design constraints used in this study are only one example; they can be freely changed depending on the specific application or the intention of the designer.
Because the permeability of the ferromagnetic material is several thousand times higher than that of a vacuum, the coupling coefficient between the Tx and Rx coils in WPT generally varies greatly depending on the location of the surrounding ferromagnetic cores [31]. Based on this effect, it was assumed in this paper that each cube in the 4 × 4 array influences the coupling coefficient when it is changed to ferromagnetic material. This influence represents the non-linear magnetic field distortion caused by the ferromagnetic core, and if it can be successfully analyzed, a core structure with a high coupling coefficient can be found. Based on this rationale, a machine learning algorithm that analyzes the non-linear effect of the ferromagnetic cores on the coupling coefficient and, as learning progresses, proposes a ferromagnetic core structure with a high coupling coefficient is introduced for the first time in this paper to obtain a high-performance ferromagnetic core shape for a WPT system.
The initial approach of this study was to learn and predict, by supervised machine learning, the influence of the ferromagnetic cores, which cause non-linear magnetic field distortion, on the coupling coefficient between the Tx and Rx coils. However, unlike the conventional case in which a machine learning algorithm is trained on thousands to tens of thousands of data, this WPT application has two special constraints. One is that the FEA simulation takes around 4 minutes to obtain one input-output data set. The other is that the number of possible ways to select ferromagnetic cubes increases exponentially as the number of selectable vacuum cubes increases. Therefore, it is difficult to obtain enough training data sets for the neural network in a limited time. For these reasons, to achieve high performance with limited time and data, this study applies a reinforcement learning algorithm, which can also learn non-linear systems by using a neural network and which selects the optimal action through a reward system by evaluating every possible action in each state.
In the case of Q-learning, which is one of the well-known types of reinforcement learning, when an agent takes an action, the environment gives a reward for the action. As the reward data accumulate, the agent selects the next action in the direction of maximizing the reward [2], [3]. Similarly, if the coupling coefficient between the Tx and Rx coils is given as the reward of Q-learning, the Q-learning algorithm will analyze the influence of each location of the ferromagnetic core on the coupling coefficient as the reward data accumulate, and then propose a core structure having a high coupling coefficient. This is similar to the situation in which an agent learned through Q-Learning solves complex mazes, or, in a brick-breakout game, an agent breaks the edge first to clear the stage as quickly as possible [43]. Therefore, if a reinforcement learning algorithm is utilized, there is a high possibility that more innovative features or principles, hitherto unimagined by human designers, can be found in future WPT research.

III. FERROMAGNETIC CORE STRUCTURE DESIGN BY Q-LEARNING ALGORITHM
In this section, the Q-learning algorithm is utilized to find optimal core structures with a high coupling coefficient between the Tx and Rx coils in a WPT system. The basic framework of the Q-learning algorithm was implemented in Python with reference to open-source code [2], [3], [44], and the coupling coefficient between the Tx and Rx coils obtained from the ANSYS Maxwell FEA simulation is used as the reward of the Q-learning algorithm.

A. APPLICATION CONSTRAINTS
The main constraint on reinforcement learning in this application is that the amount of data is small. On a computer with an Intel Core i7 CPU, 64 GB of RAM, and a GTX 1060 6 GB GPU, one simulation for obtaining a coupling coefficient takes about 4 minutes. Note that the total number of ways of selecting eight cubes out of 16 is $\binom{16}{8} = 12870$, so fully analyzing all cases takes about 35 days. Because it becomes practically impossible to investigate all cases when the number of cubes increases or the structure becomes three-dimensional, achieving high performance with limited time and data is of considerable importance.
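The size of the search space and the cost of an exhaustive sweep can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the text (16 cubes, 8 selected, about 4 minutes per simulation, 300 episodes); the function names are illustrative and not taken from the authors' code.

```python
import math

CUBES = 16            # selectable cube positions in the 4 x 4 array
SELECTED = 8          # cubes switched to ferromagnetic material
MINUTES_PER_SIM = 4   # approximate FEA time per structure, as quoted in the text


def search_space_size(n=CUBES, k=SELECTED):
    """Number of distinct ways to choose k ferromagnetic cubes out of n."""
    return math.comb(n, k)


def exhaustive_days(cases, minutes_per_sim=MINUTES_PER_SIM):
    """Wall-clock days needed to simulate every case sequentially."""
    return cases * minutes_per_sim / 60 / 24


total = search_space_size()
print(total)                             # 12870
print(round(exhaustive_days(total), 2))  # 35.75
print(round(100 * 300 / total, 1))       # 2.3 (percent of cases visited in 300 episodes)
```

The same functions make it easy to see how quickly the budget grows for larger arrays, for example by calling `search_space_size(25, 12)` for a hypothetical 5 × 5 array.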
In this paper, to observe the creativity of the proposed method, the algorithm was implemented without the practical restriction that the ferromagnetic cubes be attached to one another; i.e., the selected cubes do not have to be contiguous. For practical usage, however, a condition that all ferromagnetic cubes must be attached can be added in a future study.
Fig. 3 shows the operation mechanism of the proposed Q-learning algorithm for the core structure design of WPT. When the Q-learning algorithm receives the initial state, it takes an action according to the decaying epsilon-greedy (E-greedy) policy. This policy selects cubes at random in the early episodes of learning (exploration) and, as the end of learning approaches, selects the cubes that are predicted to produce high coupling coefficients between the Tx and Rx coils by using the accumulated reward data stored in the neural network (exploitation). Note that an action in this paper is defined as selecting eight ferromagnetic cubes out of 16 vacuum cubes by the decaying E-greedy policy.
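The decaying E-greedy action selection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the linear decay schedule, the start and end values of epsilon, and all function names are assumptions, since the actual settings are listed in Table 1 and are not reproduced here.

```python
import random

N_CUBES = 16   # cube positions in the 4 x 4 array
N_SELECT = 8   # cubes turned into ferromagnetic material per action


def epsilon(episode, total_episodes=300, eps_start=1.0, eps_end=0.05):
    """Exploration rate for a given episode (linear decay is an assumption)."""
    frac = min(episode / max(total_episodes - 1, 1), 1.0)
    return eps_start + (eps_end - eps_start) * frac


def select_action(q_values, episode, rng=random):
    """Return the sorted 0-based indices of the eight selected cubes."""
    if rng.random() < epsilon(episode):
        # Exploration: pick eight distinct cubes uniformly at random.
        return sorted(rng.sample(range(N_CUBES), N_SELECT))
    # Exploitation: pick the eight cubes with the highest predicted Q-values.
    ranked = sorted(range(N_CUBES), key=lambda i: q_values[i], reverse=True)
    return sorted(ranked[:N_SELECT])


def to_state(action):
    """Encode an action as the [1 x 16] terminal state of 0s and 1s."""
    state = [0] * N_CUBES
    for i in action:
        state[i] = 1
    return state
```

Early episodes therefore sample the design space broadly, while late episodes mostly reuse what the network has learned about each cube position.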

B. Q-LEARNING ALGORITHM FOR FERROMAGNETIC CORE DESIGN
After the action selection, the selected ferrite information (terminal state) is input to the environment, which is ANSYS Maxwell. After running the simulation, the environment returns a reward, which is the coupling coefficient between the Tx and Rx coils. The reward becomes the action-value function (Q-value), as described in the following equation [2]:
$$Q(s, a; \theta^{-}) = k \tag{1}$$
where $\theta^{-}$ is the main neural network and s, a, and k in (1) are the state, action, and coupling coefficient between the Tx and Rx coils, respectively. Note that the application in this paper does not consider future rewards because the terminal state is reached with a single action. After the neural network is trained, the state is reset to the initial state and one episode ends. The other operations of the Q-learning algorithm are implemented based on the open source [44], and Table 1 summarizes the design parameters of the Q-learning algorithm used in this paper.
Fig. 4 shows the input and output information for training the neural network. The input layer represents the information of the eight selected ferrite cubes out of the total 16 vacuum cubes, which is the terminal state. The output layer is also designed as a [1 × 16] array with the information of the coupling coefficient at the selected ferrite locations. Because the nodes of the input layer, hidden layers, and output layer are fully connected, the output of one node is an input to all nodes in the next layer. For each node, the simplest linear model, in which each node has one weight and one bias, is used in this study.
In the neural network training, because the coupling coefficient is the only variable to be trained, the output layer could in principle be expressed as a [1 × 1] array. However, to implement the exploitation in this study, it is necessary to find the terminal state representing the specific eight ferromagnetic cubes, among the 16 cubes, that are predicted to have a high coupling coefficient.

C. NEURAL NETWORK TRAINING
If the output layer were set as a [1 × 1] array containing only the coupling coefficient, the neural network would have to be used in reverse to find the terminal state representing the eight ferromagnetic cubes that are predicted to have a high coupling coefficient. This is not suitable because a neural network is designed to operate only in the forward direction. Therefore, in this paper, the output layer used for training the neural network is designed as a [1 × 16] array with the information of the coupling coefficient at the selected ferromagnetic locations, as shown in Fig. 4. As learning progresses, the contribution of each ferromagnetic cube to the coupling coefficient between the Tx and Rx coils can thus be analyzed separately. After training on several data sets, when the input layer is assigned a [1 × 16] array with all components set to '1' (i.e., assuming all cubes are selected as ferromagnetic material), the Q-values in the output layer predicted by the trained neural network indicate the degree of influence of the corresponding locations on the coupling coefficient. Accordingly, the locations of the top eight of the 16 predicted Q-values are selected as the next exploitation action. As learning progresses, in the late episodes, the cubes that contribute to a high coupling coefficient are selected with high probability.
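The [1 × 16] output design and the all-ones query can be illustrated with a short sketch. It is written under stated assumptions: the training target places the measured coupling coefficient k at the selected positions and zero elsewhere (the text only specifies the selected positions), and `predict` is a hypothetical stand-in for the trained network's forward pass.

```python
N_CUBES = 16
N_SELECT = 8


def build_target(state, k):
    """[1 x 16] training target: k at selected cubes, 0.0 elsewhere (assumed)."""
    return [k if bit == 1 else 0.0 for bit in state]


def exploit(predict):
    """Query the network with an all-ones input and keep the top-eight outputs.

    `predict` is any callable mapping a [1 x 16] input list to 16 Q-values.
    Returns the next terminal state as a list of 0s and 1s.
    """
    q_values = predict([1] * N_CUBES)
    ranked = sorted(range(N_CUBES), key=lambda i: q_values[i], reverse=True)
    chosen = set(ranked[:N_SELECT])
    return [1 if i in chosen else 0 for i in range(N_CUBES)]


# Hypothetical stand-in for a trained network: one weight and one bias per node.
weights = [0.01 * (i + 1) for i in range(N_CUBES)]
biases = [0.0] * N_CUBES


def toy_predict(x):
    return [w * xi + b for w, xi, b in zip(weights, x, biases)]


next_state = exploit(toy_predict)
```

Because the forward pass is used in both training and selection, no inversion of the network is needed, which is the point the paragraph above makes about avoiding a [1 × 1] output.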
The neural network is trained by applying gradient descent to the loss function of [2], [3], [44], where $\theta^{-}$ is the main neural network and $\theta$ is the target neural network. Batch learning is applied for the stability of the networks, and the target network is synchronized with the main network every five episodes in the proposed algorithm.

D. ANSYS MAXWELL PROGRAM SETTING
With the ANSYS scripting function provided by Maxwell, the program can be run fully automatically from Python. A script is implemented that, when the terminal state is input to Maxwell, conducts the FEA simulation and returns the coupling coefficient between the Tx and Rx coils as its output. Therefore, the learning mechanism in Fig. 3, including action selection, simulation for electromagnetic design, reward acquisition, neural network training, and the reset process, is implemented entirely in Python in this research. Throughout this paper, all simulations were performed by using ANSYS Maxwell version 17.0. The solution frequency was set to 100 kHz, which meets the Qi standard (a design standard for WPT systems). The materials for the ferromagnetic cores and windings were selected as ferrite and copper, respectively. Boundary conditions were set to insulate the surfaces of the Tx and Rx windings. The detailed settings used in the FEA simulation are summarized in Table 2.

E. PERFORMANCE OF THE PROPOSED LEARNING ALGORITHM
Under the given conditions, three example designs are shown in Fig. 5(a) to Fig. 5(c). Since there are no general rules for designing the ferrite core structure in conventional WPT research, only example designs are introduced in this study. The structure with the maximum coupling coefficient between the Tx and Rx coils under the given conditions, found by investigating all possible cases, is shown in Fig. 5(d). The coupling coefficients normalized to the maximum value are given in parentheses. Fig. 6 shows the magnetic flux distribution of example design A and of the core structure with the maximum coupling coefficient between the Tx and Rx coils under the given conditions. It is worth noting that the 10th, 11th, and 15th ferromagnetic cubes, marked by the box in Fig. 6(b), contribute to increasing the coupling coefficient between the Tx and Rx coils.
Fig. 7 shows six examples of core structures obtained by the proposed learning algorithm. The algorithm learned a few general rules by itself to find a core structure having a high coupling coefficient under the given conditions. It is worth noting that, by independently analyzing the ferrite cubes, the reinforcement learning algorithm even chooses with high probability the 10th, 11th, and 15th cubes, which provide high magnetic coupling between the Tx and Rx coils, as shown in Fig. 7(b).
To obtain high performance in a short time, and to avoid excessive time-consuming iteration of the learning algorithm, the number of episodes of the proposed learning algorithm was set to simulate 2.3 % (300 episodes) out of the total possible cases, 12870 episodes, which will be described in detail in the following sub-sections.
The proposed algorithm cannot always find the core structure with the maximum coupling coefficient because it has to search for a high-performance structure through learning with limited data (300 episodes) and without knowing the final target core structure in advance.

F. COMPARISON WITH GENETIC ALGORITHM
The goal of this study is to find out the ferromagnetic core structure having a high coupling coefficient between the Tx and Rx coils in a short time by analyzing the non-linearity of the magnetic field distortion in the WPT system.
In general, a genetic algorithm is considered useful when the search space is large, complex, or poorly understood; domain knowledge is scarce; expert knowledge is difficult to encode; and a mathematical analysis of the problem is difficult to carry out [45], [46]. In this respect, the genetic algorithm can also be applied to the WPT application in this study; therefore, a comparison between the reinforcement learning algorithm and the genetic algorithm is conducted.
For the comparison, the key point is whether the algorithm converges quickly, for two reasons. First, exploration in this study is costly, and it is not feasible to evaluate all cases. Second, to obtain the action-value function data, which correspond to the fitness function in the genetic algorithm, the FEA simulation, which takes around 4 minutes to analyze the non-linearity, must be used. Therefore, the two algorithms were compared in terms of the convergence of the coupling coefficient with respect to the number of training sets.
Fig. 8 shows the flowchart of the genetic algorithm used to design the ferromagnetic core structure in this study. It starts by generating a random population. When the fitness function receives the first generation, it evaluates the coupling coefficient of all the ferromagnetic core structures of that generation by using FEA simulation. After the fitness evaluation, the best samples and some lucky samples are selected as survivors, and the crossover step is carried out with these survivors. In the crossover phase, the child always inherits the positions of the ferromagnetic cubes that both parents share, and for the remaining cubes, the child randomly inherits positions that are unique to each parent. In the mutation stage, there is a low probability that one randomly selected ferromagnetic cube will turn into a vacuum cube and one randomly selected vacuum cube will turn into a ferromagnetic cube. The design parameters of the genetic algorithm used in this study are summarized in Table 3. Considering the long computation time of the FEA simulation in the WPT application, the population and number of iterations of the genetic algorithm are set so that the required FEA simulation time equals that of the reinforcement learning algorithm. The comparison results of 10 runs of each algorithm are shown in Fig. 9.
Unlike the reinforcement learning algorithm, which converges rapidly to a high coupling coefficient of 99.3 % on average at 300 episodes, the genetic algorithm converges relatively slowly and finds a coupling coefficient of 97.7 % on average at 300 episodes. This is because the influence of each ferromagnetic cube on the coupling coefficient is considerably non-linear. The genetic algorithm, which discards the explicit information concerning less desirable actions as generations evolve, is slow to converge. In contrast, in the reinforcement learning algorithm, the contributions of all ferromagnetic cubes to the coupling coefficient are trained through the neural network, so that the algorithm converges quickly toward the optimum coupling coefficient.
Fig. 10 shows the number of times each cube was selected over 100 runs of the proposed reinforcement learning algorithm. With the proposed algorithm, the 1st, 2nd, 3rd, 4th, and 5th cubes are selected with a very high probability, and the 10th, 11th, and 15th cubes are selected with a high probability. Fig. 11 shows histograms of the results of the reinforcement learning algorithm and the genetic algorithm over 100 runs each, with the number of episodes set to 300. Note that each algorithm searched 300 structures out of the total 12870 possible structures in each run. Over the 100 runs, the reinforcement learning algorithm finds a core structure with an average coupling coefficient of 99.3 % of the maximum value, whereas the genetic algorithm obtains 97.7 %.
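The crossover and mutation rules of the genetic algorithm described in this subsection can be sketched as below. This is an illustrative reading of the prose rather than the authors' code: individuals are represented as lists of eight 0-based cube indices, the fill step draws from the positions held by exactly one parent, and the mutation rate of 0.1 is a placeholder because the actual value is given in Table 3.

```python
import random

N_CUBES = 16
N_SELECT = 8


def crossover(parent_a, parent_b, rng=random):
    """Child keeps positions shared by both parents, then fills the remaining
    slots at random from positions held by only one of the parents."""
    a, b = set(parent_a), set(parent_b)
    common = a & b
    unique = sorted(a ^ b)  # positions that exactly one parent has
    fill = rng.sample(unique, N_SELECT - len(common))
    return sorted(common | set(fill))


def mutate(individual, rate=0.1, rng=random):
    """With probability `rate` (placeholder value), turn one ferromagnetic
    cube into vacuum and one vacuum cube into ferromagnetic material."""
    ferro = sorted(individual)
    if rng.random() >= rate:
        return ferro
    vacuum = [i for i in range(N_CUBES) if i not in individual]
    ferro.remove(rng.choice(ferro))
    ferro.append(rng.choice(vacuum))
    return sorted(ferro)


rng = random.Random(0)
child = crossover([0, 1, 2, 3, 4, 5, 6, 7], [4, 5, 6, 7, 8, 9, 10, 11], rng)
mutant = mutate(child, rate=1.0, rng=rng)
```

Both operators preserve the constraint that exactly eight cubes are ferromagnetic, so every child is a valid structure that can be passed directly to the fitness evaluation.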
Therefore, by using software algorithms, it is expected that an innovative core structure that a WPT designer could not even imagine can be found. For the WPT application, the comparison results show that the reinforcement learning algorithm performs better than the genetic algorithm. However, both algorithms are effective, considering that the ferromagnetic core structure with the highest coupling coefficient could not easily be imagined by a WPT designer. The two algorithms are expected to be even more useful when the given constraints become more complicated or the number of selectable cubes increases. Furthermore, it is possible to find the optimal core structure of the Tx coil while fixing the structure of the Rx coil in a specific shape, or to discover a core structure of the Tx coil that can provide a similar coupling coefficient to multiple Rx coils; this is left as further work.

G. STABILITY AND CONVERGENCE OF THE ALGORITHMS
It is known that the stability and convergence of algorithms are difficult to analyze when the mathematical or theoretical relationships between the input and output data are not identified [47]-[51]. To address the stability and convergence issues of the reinforcement learning and genetic algorithms used in this study, the authors are conducting research to formulate the non-linear magnetic field distortion caused by the ferromagnetic core in a WPT system. However, since the non-linear magnetic field distortion in the WPT system is still beyond the reach of available physical theory, a theoretical formulation of the input-output relationship may take considerable time. Because of this limitation, the authors used trial and error to determine the number of episodes in this study. As shown in Fig. 9(a), setting the number of episodes to 300 allows the reinforcement learning algorithm to converge quickly. The authors hope that the methods and results of this study will support future research that physically characterizes the mathematical models relating ferromagnetic cores and magnetic fields.

IV. EXPERIMENTAL VERIFICATIONS
Experiments were conducted to measure the coupling coefficient between the Tx and Rx coils for example design A and for the core structures obtained by the reinforcement learning algorithm. Since the ferrite cubes assumed in the algorithm are difficult to machine precisely, a customized prototype was used for the experimental verification. PM12 from TODAISU was adopted for the ferromagnetic cores; it has a relative permeability of 3200 at room temperature and a measuring frequency of 100 kHz. The size of one customized PM12 piece is l × w × t (= 98 mm × 48 mm × 4 mm). The number of turns of the Tx and Rx coils was set to 6 to avoid small inductance values, and the distance between the Tx and Rx coils was set to 10l (= 98 cm), as shown in Fig. 12. In the experiment, inductance values were measured with a KEYSIGHT E4990A impedance analyzer. To minimize the effect of the concrete floor structure, the measurement was carried out on a 100 cm high wooden table. Example design A (Fig. 5(a)) and two results obtained by the proposed algorithm (Fig. 7(b) and Fig. 7(f)) were fabricated, as shown in Fig. 13. In each case, the coupling coefficient between the Tx and Rx coils was measured by the following equations [52], [53].
where $L_1$ and $L_2$ are the inductances of the Tx and Rx coils, respectively, k is the coupling coefficient between the Tx and Rx coils, and M is the mutual inductance between the Tx and Rx coils. Note that $L_{Total}$ is the sum of the individual inductances connected together in series. Table 4 summarizes the measurement results of $L_1$, $L_2$, k, and M for the fabricated structures in Fig. 13. The measurement results show that the structures obtained by the proposed algorithm outperform the example design with respect to the coupling coefficient between the Tx and Rx coils at an air gap of 10l.
Fig. 14 shows the simulation and experimental results of the coupling coefficient for two of the fabricated structures, example design A (Fig. 13(a)) and result 2 of the proposed algorithm (Fig. 13(b)), with respect to the distance between the Tx and Rx coils. Owing to manufacturing errors, small discrepancies exist between the simulation and experimental results, but the two show the same tendency. The coupling coefficient of the proposed structure becomes higher than that of example design A from a distance of 7l between the Tx and Rx coils. Therefore, the results are in good agreement with the measurement results in Table 4.
Physically, this is the result of lowering the magnetic reluctance between the Tx and Rx coils by virtue of the 10th, 11th, and 15th ferromagnetic cubes of the proposed structure, as shown in the magnetic flux distribution of Fig. 6.
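For readers who want to reproduce the coupling coefficient calculation from inductance measurements, the sketch below uses the standard series-connection relations $L_{Total} = L_1 + L_2 + 2M$ and $k = M / \sqrt{L_1 L_2}$. The series-aiding sign convention is an assumption here, since the exact form of the equations from [52], [53] is not reproduced in the text, and the numerical values in the example are illustrative only and are not taken from Table 4.

```python
import math


def mutual_inductance(l1, l2, l_total):
    """Mutual inductance M from a series measurement.

    Assumes a series-aiding connection, i.e. L_total = L1 + L2 + 2M.
    All inductances are in henries.
    """
    return (l_total - l1 - l2) / 2.0


def coupling_coefficient(l1, l2, l_total):
    """Coupling coefficient k = M / sqrt(L1 * L2)."""
    return mutual_inductance(l1, l2, l_total) / math.sqrt(l1 * l2)


# Illustrative values only (not measured data): two 100 uH coils whose
# series connection measures 202 uH.
k_example = coupling_coefficient(100e-6, 100e-6, 202e-6)
print(f"k = {k_example:.4f}")
```

At large air gaps such as 10l, M is much smaller than $L_1$ and $L_2$, so the result is sensitive to small errors in the three inductance readings; this is one reason careful measurement conditions such as the wooden table matter.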
In this paper, only the coupling coefficient was used as a reward of the software algorithm. However, it is possible to freely design the optimal structure by applying other parameters at the same time such as mutual inductance value for maximizing the output power in the Rx coil, or magnetic flux density value for uniform magnetic flux distribution. Therefore, the method proposed in this paper can be used to optimize previously commercialized WPT products, and it is expected that this will further upgrade the WPT technology.

V. CONCLUSION
An optimal structure design of ferromagnetic cores in WPT by a reinforcement learning algorithm has been proposed in this paper for the first time. Unlike conventional design methods that rely on the intuition or experience of the WPT designer, the proposed method has two advantages:
1) It is applicable under complex constraints where it is difficult to ensure that the design by the WPT designer is optimal.
2) It can discover an innovative structure without investigating all cases.
Because of these advantages, even though most WPT designs are based on hardware characteristics, the learning algorithm is applicable and can find an innovative structure that performs above the level of structures proposed by skilled WPT designers. In addition, it can be easily applied to various types of WPT optimal design, such as coil shape design or flux distribution design. Through combination with the convolutional neural network (CNN), which is optimized for image data learning, it is anticipated that optimization will be possible for a wider range of WPT designs, such as high-dimensional matrix or 3-D optimal structure design, to find more innovative structures; this is left as further work. Finally, software power is expected to bring more creativity to future WPT studies.