Improving Characteristics of FSMs With Mixed Codes of Outputs

Practically any digital system includes sequential blocks. This paper considers the case in which LUT-based sequential blocks are represented by Mealy finite state machines (FSMs). The LUT count is one of the most important characteristics of an FSM circuit. We propose a method that aims at decreasing the LUT counts of FPGA-based Mealy FSMs with mixed encoding of the collections of outputs. The method is based on encoding the fields of compatible states. The proposed approach leads to LUT-based Mealy FSM circuits having three levels of logic blocks, where each function of any logic level is implemented by a single LUT. An example of FSM synthesis with the proposed method is given. The experiments are conducted using standard benchmark FSMs. Their results show that the proposed approach produces LUT-based circuits with fewer LUTs than the circuits produced by the other investigated methods (Auto and One-hot of Vivado, JEDI, and the mixed encoding of the collections of outputs). The LUT count is decreased by an average of 6.97 to 62.85 percent. These improvements are accompanied by a slight decrease in the maximum operating frequency, of up to 8.09%. The advantages of the proposed method grow as the numbers of FSM inputs and states increase.


I. INTRODUCTION
Any digital system includes various combinational and sequential blocks [1]. As a rule, standard library cells of computer-aided design (CAD) tools can be used to implement circuits of combinational blocks (such as adders, shifters, decoders, and so on) [2]. Unfortunately, this approach cannot be used to implement circuits of sequential blocks (such as control units) [3]. The circuit of a sequential block is determined by the block's behaviour. To specify this behaviour, some formal model is necessary. In many cases, the models of Mealy finite state machines (FSMs) are used for this purpose [4], [5].
Circuit designers strive to minimize such characteristics of FSM circuits as the hardware amount, the propagation time, and the power consumption. These characteristics are strongly interconnected [1], [6]. As a rule, the hardware amount (for example, the occupied chip area) has a strong influence on the rest of the FSM circuit characteristics [6], [7]. To optimize the hardware amount, various methods of decomposition can be used [8]. But structural decomposition leads to an increase in the number of logic levels in the resulting FSM circuits. In turn, an increase in the number of logic levels leads to a decrease in FSM performance. It is quite possible that a multi-level FSM circuit does not provide the required performance. In this case, it is necessary to reduce the number of logic levels. It is highly desirable that reducing the number of levels does not lead to a sharp increase in the hardware amount of the resulting FSM circuit. One such approach, the method of mixed encoding of FSM outputs, is proposed in [9]. This method is advisable if FSM circuits are implemented with field-programmable gate arrays (FPGAs) [10]-[12].
The associate editor coordinating the review of this manuscript and approving it for publication was Engang Tian.
Nowadays, FPGAs are very popular platforms for implementing various digital systems [13]. This explains why we chose FPGA-based Mealy FSMs as the research object of this article. We discuss Mealy FSM circuits implemented using look-up table (LUT) elements, programmable flip-flops, and interconnections of FPGAs. We use the LUT count as a characteristic of the hardware amount. At present, Xilinx is the largest manufacturer of FPGA chips [14]. For this reason, we orient this research towards Xilinx solutions. So, we consider ways of reducing the LUT count in FPGA-based Mealy FSMs.
A LUT is a logic block having $NI_{LUT}$ inputs and a single output [10], [14], [15]. If an arbitrary Boolean function has up to $NI_{LUT}$ arguments, then the corresponding circuit is implemented as a single LUT. Unfortunately, the value of $NI_{LUT}$ does not exceed 6 [11], [16]. If a Boolean function depends on more than $NI_{LUT}$ variables, then it should be decomposed using various methods of functional decomposition [1], [17], [18]. The functional decomposition (FD) results in multi-level FSM circuits with a complex system of interconnections [19], [20].
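To illustrate why a function with a small number of arguments fits a single LUT, the following minimal sketch tabulates a Boolean function into the truth table that a LUT would store. The helper names `lut_contents` and `lut_eval` and the sample function are illustrative assumptions, not part of the discussed method.

```python
from itertools import product

NI_LUT = 6  # number of LUT inputs; the text states that it does not exceed 6


def lut_contents(func, n_args):
    """Tabulate func over all 2**n_args input combinations (the LUT memory)."""
    if n_args > NI_LUT:
        raise ValueError("too many arguments: the function must be decomposed")
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=n_args)]


def lut_eval(contents, bits):
    """Read the stored bit addressed by the input combination (first bit is MSB)."""
    address = 0
    for b in bits:
        address = (address << 1) | b
    return contents[address]


# Hypothetical 3-argument function: f = a AND (b OR NOT c)
table = lut_contents(lambda a, b, c: a and (b or not c), 3)
```

A function with seven arguments is rejected by this sketch, which corresponds to the case where decomposition becomes necessary.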
As a rule, Mealy FSM circuits are represented by systems of Boolean functions (SBFs). The step of technology mapping [19], [21] is executed to transform these SBFs into a LUT-based FSM circuit. The outcome of this step strongly affects the LUT count, maximum operating frequency, and power consumption of the resulting FSM circuit [1], [22]. As shown in [23], time delays of the interconnection system are starting to play a major role in comparison with LUT delays. Also, more than 70% of the power dissipation is due to the interconnections [16]. So, the optimization of interconnections for LUT-based FSM circuits leads to increasing the operating frequency and reducing the power consumption. The system of interconnections could be optimized, for example, by the twofold state assignment [24], [25].
The characteristics of LUT-based circuits can be improved by increasing the number of LUT inputs. But the results of research [13] indicate that an increase in the value of $NI_{LUT}$ can hardly be expected. As a rule, modern LUTs have no more than 6 inputs [14], [15]. An increase in the number of inputs leads to an imbalance of the area-time-power characteristics of a LUT circuit. So, there is an imbalance between the numbers of arguments in SBFs representing FSM circuits and the rather small value of $NI_{LUT}$. To reduce the impact of this imbalance on the quality of FSM circuits, it is necessary to improve synthesis methods of FPGA-based FSMs.
Our current paper considers the synthesis of LUT-based Mealy FSMs obtained using the method of mixed encoding of outputs [9], [26]. The mixed encoding of outputs allows reducing the LUT counts of FPGA-based Mealy FSMs compared with equivalent FSM circuits based on the methods of functional decomposition [8]. In the best case, this leads to two-level FSM circuits. But it is quite possible that there will be more than one level of LUTs generating input memory functions and additional variables encoding collections of FSM outputs. If performance is the dominant quality factor, then the number of levels in the FSM circuit should be reduced. To reduce the number of levels, we propose a new synthesis method based on using codes of fields of compatible states.
The main contribution of this paper is a novel design method aimed at reducing the number of LUTs in circuits of FPGA-based Mealy FSMs with mixed encoding of outputs.
The main idea of the proposed approach is to use the codes of fields of compatible states for the state assignment. This approach is similar to the twofold state assignment [24], [25]. But our new method allows excluding the block of transformation of state codes which is necessary in the case of twofold state assignment. The experimental results show that this method allows decreasing the LUT counts of LUT-based FSMs compared with equivalent FSMs obtained using some known methods of FSM design.
The rest of the article includes five sections. The background of single-level LUT-based Mealy FSMs is given in the second section. The state of the art in the synthesis of LUT-based FSMs is discussed in the third section. The main idea of the proposed approach is considered in the fourth section. The fifth section gives an example of FSM synthesis with the encoding of the fields of compatible states. The sixth section includes the results of experiments. A short conclusion ends the paper.

II. BACKGROUND OF LUT-BASED MEALY FSMs
There are many configurable logic blocks (CLBs) [11], [14] in FPGAs manufactured by Xilinx. Besides CLBs, there are such blocks as embedded memory blocks, digital signal processors, and microprocessors. To connect the blocks, a programmable routing matrix [10] is used. In this paper, we consider FSM design with CLBs consisting of LUTs, multiplexers, and programmable flip-flops. A LUT has $NI_{LUT}$ inputs and a single output. Networks of LUTs implement systems of Boolean functions representing an FSM circuit.
If a Boolean function depends on up to $NI_{LUT}$ arguments, then it can be implemented by a single-LUT circuit. To implement sequential circuits, it is necessary to connect combinational outputs of some LUTs with inputs of flip-flops. As a rule, an FSM state register (SRG) is based on D flip-flops [2], [12]. To load state codes into the SRG, the synchronization pulse Clk is used. As a rule, the initial state has a code with all zeros. To load zeros into all flip-flops of the SRG, the pulse Reset is used. A programmable multiplexer selects the type of a CLB output (combinational or registered).
As mentioned in [3], [4], SBFs representing FSM circuits may depend on up to 50-70 Boolean variables. At the same time, modern LUTs have no more than 6 inputs. This imbalance leads to applying various methods of functional decomposition (FD) in LUT-based FSM design [1], [17]. FD-based circuits have one serious drawback: they have a lot of logic levels connected by ''spaghetti-type'' interconnections [11].
A Mealy FSM is represented by a vector $A = \langle I, O, S, \delta, \lambda, s_1 \rangle$ [3], [4]. The components of $A$ have the following meaning: $I$ is a set of inputs, $O$ is a set of outputs, $S$ is a set of internal states, $\delta$ is a function of transitions, $\lambda$ is a function of outputs, and $s_1$ is an initial state. Various tools can be used to represent the vector $A$, such as state transition graphs [4], binary decision diagrams [27], [28], and-inverter graphs [29], [30], and graph-schemes of algorithms [3]. In this paper, we represent Mealy FSMs by state transition tables (STTs) [4].
An STT includes five columns [4]: a current state $s_C$; a state of transition (a next state) $s_T$; an input signal (a conjunction of FSM inputs) $I_h$ causing a transition from $s_C$ to $s_T$; a collection of FSM outputs $O_h \subseteq O$ generated during the transition $\langle s_C, s_T \rangle$; the number of the interstate transition $h$ ($h \in \{1, \ldots, H\}$). For example, Table 1 is the STT of a Mealy FSM $A_0$.
The following characteristics of FSM $A_0$ can be found from Table 1: $L = 5$ inputs, $N = 9$ outputs, $M = 9$ states, and $H = 23$ interstate transitions. Table 1 defines the functions of transitions and outputs of FSM $A_0$. For example, the first row of Table 1 determines $\delta(s_1, i_1) = s_2$ together with the value of $\lambda(s_1, i_1)$, which is the collection of outputs written in that row.

To design an FSM circuit, it is necessary to encode states $s_m \in S$ by binary codes $K(s_m)$ having $R_S$ bits. In this article, we use the maximum binary state assignment with

$$R_S = \lceil \log_2 M \rceil. \quad (1)$$

The state variables from the set $T = \{T_1, \ldots, T_{R_S}\}$ are used to create the state codes. The state variables are kept in the SRG. The input memory functions (IMFs) can change the contents of the SRG. The IMFs create a set $D = \{D_1, \ldots, D_{R_S}\}$. Using state codes and IMFs leads to transforming the initial STT into a direct structural table (DST). A DST has three additional columns: a code of a current state $K(s_C)$, a code of a state of transition $K(s_T)$, and a column $D_h$ with a collection of IMFs equal to 1 to load the code $K(s_T)$ into the SRG [2].
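The width of the state codes can be checked with a short sketch. It assumes that the maximum binary state assignment uses the smallest number of bits able to distinguish $M$ states, that is, the ceiling of the base-2 logarithm of $M$; the function names and the consecutive numbering of codes are illustrative assumptions rather than the assignment used by the authors.

```python
def binary_code_width(num_states):
    """Smallest number of bits able to distinguish num_states states."""
    if num_states < 1:
        raise ValueError("an FSM needs at least one state")
    # (M - 1).bit_length() equals ceil(log2(M)) for M >= 2; keep one bit for M = 1.
    return max(1, (num_states - 1).bit_length())


def binary_state_codes(states):
    """Assign consecutive binary codes; the first (initial) state gets all zeros."""
    width = binary_code_width(len(states))
    return {state: format(index, f"0{width}b") for index, state in enumerate(states)}


states = [f"s{m}" for m in range(1, 10)]  # M = 9 states, as for FSM A0
codes = binary_state_codes(states)
```

For $M = 9$ the sketch yields four state variables, so the SRG would contain four flip-flops under this assumption.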
A DST defines two SBFs representing an FSM circuit. These SBFs are the following:

$$D = D(T, I); \quad (2)$$

$$O = O(T, I). \quad (3)$$

The SBFs (2)-(3) represent a structural diagram of P Mealy FSM (Fig. 1). In P Mealy FSMs, the block of IMFs is specified by SBF (2), and the block of outputs is determined by SBF (3). The inputs of the register SRG are connected with the outputs of the block of IMFs. The outputs of the SRG form the feedback necessary to create a sequential circuit [1]. In each cycle of FSM operation, the SRG contains a code $K(s_C)$. The synchronization pulse Clk allows loading a code $K(s_T)$ into the SRG. If Reset = 1, then the code $K(s_1)$ is loaded into the SRG. In this paper, we consider the case when both blocks are implemented using LUTs, flip-flops, and interconnections. In this case, the flip-flops of the SRG are distributed among the LUTs of the block of IMFs.
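The behaviour that a P Mealy FSM circuit must reproduce can be sketched as a table-driven simulation. The STT below is a small hypothetical machine (it is not FSM $A_0$, whose Table 1 is not reproduced here), and the row layout, the `step` and `run` helpers, and the convention that unlisted inputs equal 0 are all assumptions made for illustration.

```python
# Hypothetical STT rows: (current state, next state, input conjunction, outputs).
# An input conjunction maps an input name to the value it must have.
STT = [
    ("s1", "s2", {"i1": 1}, {"o1", "o2"}),
    ("s1", "s1", {"i1": 0}, set()),
    ("s2", "s3", {}, {"o3"}),               # unconditional transition
    ("s3", "s1", {"i2": 1}, {"o1"}),
    ("s3", "s2", {"i2": 0}, {"o2", "o3"}),
]


def step(state, inputs):
    """Apply the functions of transitions and outputs for one clock cycle."""
    for s_c, s_t, condition, outputs in STT:
        if s_c == state and all(inputs.get(k, 0) == v for k, v in condition.items()):
            return s_t, outputs
    raise ValueError(f"no transition from {state} for inputs {inputs}")


def run(input_sequence, state="s1"):
    """Return the list of (next state, sorted outputs) for each cycle."""
    trace = []
    for inputs in input_sequence:
        state, outputs = step(state, inputs)
        trace.append((state, sorted(outputs)))
    return trace


trace = run([{"i1": 1}, {}, {"i2": 0}])
```

Because the outputs are taken from the row selected by both the current state and the inputs, the sketch reflects the Mealy model, in which outputs are functions of state variables and inputs.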

III. OPTIMIZING CIRCUITS OF FPGA-BASED MEALY FSMs
The first methods of synthesis of FPGA-based FSMs appeared in the mid-1980s. To date, a huge number of works on this topic have been published. The problems connected with the synthesis and optimization of LUT-based FSMs are discussed, for example, in [2], [3], [6], [8], [12], [17], [23], [25], [27], [28], [31]. The analysis of the literature shows that characteristics of FSM circuits obtained by different synthesis methods may differ significantly. Three main characteristics are considered to estimate the quality of an FSM circuit: 1) the chip resources used for implementing the circuit; 2) the maximum operating frequency (or maximum cycle time); and 3) the power consumption [2].
In the case of LUT-based FSMs, the following chip resources are used: 1) LUTs; 2) programmable flip-flops; 3) programmable interconnections; 4) the circuit of synchronization and 5) the programmable input-outputs. Obviously, the best circuit requires the minimum possible chip area; it has the highest possible operating frequency and the lowest value of power consumption. However, such a combination of characteristics is almost impossible. For example, a decrease in the LUT count is usually accompanied by a decrease in the performance [2].
In this article, we propose a method for improving the LUT count of FPGA-based Mealy FSMs. The method is based on encoding of fields of compatible states.
Each row of a DST determines the following conjunction $F_h$:

$$F_h = S_C I_h \quad (h \in \{1, \ldots, H\}). \quad (4)$$

In (4), the symbol $S_C$ stands for a conjunction of state variables corresponding to the code of the current state $s_C \in S$ from the $h$-th row of the DST. The functions (2)-(3) are represented as sum-of-products (SOPs) [4] depending on the terms (4). If condition (5) takes place for a function, that is, if the function has no more than $NI_{LUT}$ arguments, then the corresponding circuit consists of a single LUT. This is the best possible solution. If the condition holds for every function, then there are exactly $R_S + N$ LUTs in the FSM circuit. This is the best possible LUT count for an FSM circuit. If condition (5) is violated, then the corresponding LUT-based circuit is multi-level.
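A rough check of the single-LUT condition can be sketched as follows. The sketch assumes that every term of a function contains all $R_S$ state variables (no minimization of state codes), so it gives only an upper bound on the number of arguments; the row format, the function names, and the sample rows are hypothetical.

```python
def argument_upper_bounds(rows, r_s):
    """Upper bound on the number of arguments of each function.

    rows: iterable of (inputs in the row's conjunction, functions equal to 1).
    Assumption: every term uses all r_s state variables.
    """
    support = {}
    for inputs, functions in rows:
        for name in functions:
            support.setdefault(name, set()).update(inputs)
    return {name: r_s + len(inputs) for name, inputs in support.items()}


def fits_single_lut(rows, r_s, ni_lut):
    """Map each function to True when its bound does not exceed ni_lut."""
    bounds = argument_upper_bounds(rows, r_s)
    return {name: bound <= ni_lut for name, bound in bounds.items()}


# Hypothetical DST fragment with two state variables.
rows = [
    ({"i1"}, {"D1", "o1"}),
    ({"i1", "i2"}, {"D2", "o1"}),
    (set(), {"o2"}),
]
bounds = argument_upper_bounds(rows, r_s=2)
```

With `ni_lut=4` every function of this fragment passes the check, while with `ni_lut=3` the functions `o1` and `D2` would require a multi-level circuit.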
To implement multi-level LUT-based FSM circuits, various methods of functional decomposition (FD) are used [17]. FD is a process during which some additional functions appear. It means that an initial SOP is broken down into partial SOPs corresponding to these additional functions. This process is terminated when each partial SOP includes no more than $NI_{LUT}$ literals.
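The growth of the number of levels can be illustrated for the simplest case of a single wide conjunction. The sketch below groups literals into LUT-sized chunks level by level; it is a simplified grouping scheme assumed for illustration, not one of the FD algorithms from the cited works, and real FD of SOPs is considerably more involved.

```python
def and_tree_cost(num_literals, ni_lut):
    """Return (lut_count, levels) for one conjunction built from ni_lut-input LUTs."""
    if num_literals < 1 or ni_lut < 2:
        raise ValueError("need at least one literal and LUTs with two or more inputs")
    signals, luts, levels = num_literals, 0, 0
    while signals > ni_lut:
        # Each full group is replaced by one LUT output (an additional function);
        # leftover signals are passed to the next level unchanged.
        full, rest = divmod(signals, ni_lut)
        luts += full
        signals = full + rest
        levels += 1
    # The remaining signals fit into a single final LUT.
    return luts + 1, levels + 1
```

For example, a conjunction of 50 literals mapped to 6-input LUTs needs three levels in this scheme, which shows how a wide function turns into a multi-level circuit.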
One serious drawback of FD follows from the analysis of work [17]. It is possible that different partial SOPs of the same function include the same inputs $i_l \in I$ and/or the same state variables $T_r \in T$. This leads to a duplication of literals in different partial SOPs of the original SOP. In turn, this complicates the interconnection system of the FSM circuit and, consequently, the placement and routing process. The result is rather slow FSM circuits with high power consumption [24]-[26]. So, if condition (5) is violated, then the LUT count is determined by expression (6), which depends on the number $N(F)$ of additional functions. To optimize the LUT count of multi-level FSM circuits, it is necessary to diminish the value of $N(F)$. To solve this problem, a large number of FD-based methods have been developed [1], [17], [32], [33]. The analysis of some FD-based methods can be found in [1]. Various algorithms of FD are included in CAD tools aimed at the implementation of FPGA-based digital systems.
The number of literals in SOPs representing an FSM circuit may be reduced due to the one-hot state assignment [4]. In this case, the following relation takes place: $R_S = M$ [4]. So, there are $M$ flip-flops in the SRG if the one-hot state assignment is used. In the case of FPGA-based design, this is not a problem due to the large total number of programmable flip-flops in CLBs. So, this approach is very often used in LUT-based FSM design. For example, the one-hot state assignment is used in the academic CAD system ABC by Berkeley [30]. Also, this approach is used in industrial CAD packages such as Vivado of Xilinx [34] and Quartus of Intel (Altera) [35].
The main drawback of the one-hot state assignment is an increase in the number of IMFs compared with their minimum possible number determined by (1). But these IMFs are much simpler than in the case of the maximum binary state assignment [2]. There is a comparison of these state assignment methods in [36]. The research [36] shows that using one-hot codes improves FSM characteristics if $M > 16$. However, there is one more factor influencing the characteristics of LUT-based FSM circuits. The rather small number of LUT inputs increases the influence of the value of $NI_{LUT}$ on the characteristics of LUT-based FSM circuits [2]. As shown in [37], the maximum binary state assignment produces better LUT-based FSM circuits if $L \ge 10$.
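The difference in the number of flip-flops between the two assignments can be sketched in a few lines. The helper names are illustrative, and the binary width is assumed to be the smallest number of bits able to distinguish all states.

```python
def one_hot_codes(states):
    """Give each state a code with exactly one bit equal to 1."""
    m = len(states)
    return {
        state: "".join("1" if j == i else "0" for j in range(m))
        for i, state in enumerate(states)
    }


def flip_flop_counts(num_states):
    """Number of SRG flip-flops for one-hot and for minimum-width binary codes."""
    return {
        "one_hot": num_states,
        "binary": max(1, (num_states - 1).bit_length()),
    }


example_codes = one_hot_codes(["s1", "s2", "s3"])
counts_for_nine_states = flip_flop_counts(9)
```

For nine states the sketch reports nine flip-flops (and nine IMFs) for one-hot codes against four for binary codes, which is the increase in the number of IMFs mentioned above.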
So, sometimes better LUT-based circuits are based on the maximum binary state assignment. But sometimes using one-hot state codes gives better results. Thus, it is necessary to check which method gives the best results for a particular FSM. For this reason, we have compared the FSM circuits produced by our proposed approach with LUT-based circuits of P Mealy FSMs produced by using: 1) the algorithm JEDI [38]; 2) the method of binary state assignment Auto; and 3) the one-hot state assignment of Vivado [34] by Xilinx [14]. We chose Vivado because we wanted to compare FSM circuits implemented with Xilinx FPGA chips of the Virtex-7 family.
Reducing the power consumption is one of the very important problems connected with FPGA-based design [39]-[41]. This problem can be solved using a special state assignment [42]. The goal of the vast majority of these methods is to reduce the switching activity of an FSM circuit [43]. The following rule is used: the more often a pair $\langle s_i, s_j \rangle$ occurs in an STT, the smaller the Hamming distance between the state codes $K(s_i)$ and $K(s_j)$ should be [40]. But this is not the only way of reducing the power consumption. As shown in [24]-[26], the power consumption can be reduced by minimizing the number of interconnections inside an FSM circuit. To reduce the number of interconnections, it is necessary to minimize the numbers of arguments in SBFs (2)-(3) [4]. This can be done using various methods of structural decomposition [8].
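The rule relating transition frequency to Hamming distance can be illustrated with a simple cost estimate. The cost function below (the sum of Hamming distances over all listed transitions) and the sample data are assumptions for illustration; the cited low-power methods may use different cost models.

```python
def hamming(code_a, code_b):
    """Number of bit positions in which two equal-length codes differ."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(code_a, code_b))


def switching_cost(transitions, codes):
    """Total number of state-register bit flips over the listed transitions."""
    return sum(hamming(codes[s], codes[t]) for s, t in transitions)


# Hypothetical STT fragment: the pair <s1, s2> occurs three times.
transitions = [("s1", "s2")] * 3 + [("s2", "s3"), ("s3", "s1")]
codes_close = {"s1": "00", "s2": "01", "s3": "11"}  # frequent pair at distance 1
codes_far = {"s1": "00", "s2": "11", "s3": "01"}    # frequent pair at distance 2
```

Placing the frequent pair at Hamming distance 1 gives the lower estimate of the two assignments, in line with the rule quoted from [40].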
The structural decomposition (SD) is an efficient way of reducing LUT counts in logic circuits of Mealy FSMs [8]. The main idea of these methods is the elimination of a direct connection between FSM inputs $i_l \in I$ and state variables $T_r \in T$, on the one hand, and outputs $o_n \in O$ and IMFs $D_r \in D$, on the other hand. To do it, some additional functions are introduced. These functions form a set $F_{add}$ having $N(F_{add})$ elements. The functions $f_j \in F_{add}$ depend on the FSM inputs and state variables. In turn, the FSM outputs and IMFs use the functions $f_j \in F_{add}$ as arguments of the corresponding SOPs. The structural decomposition leads to reducing the number of LUTs if the conditions given in [8] are true. It is possible that FD- and SD-based methods should be used together. It should be done if condition (5) is violated for some functions $f_j \in F_{add}$ [8].
The encoding of collections of outputs (COs) is an SD-based method having its roots in the microprogram control units [44]. In the 1950s, this method was used for reducing the number of bits in control memory words [45]. Next, it was used to optimize hardware of FSM circuits implemented with various programmable logic devices [2]. It is also used in FPGA-based FSMs [8].
In P FSM, the LUTerD implements SBF (2), and the LUTerO generates functions (3). The state register is distributed between the LUTs of LUTerD. If condition (5) is violated, then there is more than a single level of logic in this circuit.
If an STT includes $Q$ different COs, then

$$R_Q = \lceil \log_2 Q \rceil \quad (8)$$

variables are enough to encode them. The encoding of COs presumes representing each CO $Y_q \subseteq O$ by a binary code $K(Y_q)$. The variables $z_r \in Z$ are used for the encoding, where $|Z| = R_Q$. Now, we can turn a P FSM into a PY Mealy FSM (Fig. 3).
Analysis of Table 2 gives the COs of FSM $A_0$; there are $Q = 18$ of them. Using (8) gives $R_Q = 5$ and $Z = \{z_1, \ldots, z_5\}$. Let us encode these COs in the trivial way. In a DST of PY FSM, the column $O_h$ is replaced by the column $Z_h$. The column $Z_h$ contains the variables $z_r \in Z$ equal to 1 in the code $K(Y_q)$ of the CO from the $h$-th row of the DST of P Mealy FSM. In the discussed case, the DST of PY FSM $A_0$ is represented by Table 3.
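The encoding of COs can be sketched as follows: distinct collections are gathered from the output column, the number of encoding variables is taken as the smallest width able to distinguish them, and codes are assigned in order of first appearance. The sample rows are hypothetical (Table 2 is not reproduced here), counting the empty collection as a CO is an assumption, and "trivial" is interpreted here as consecutive numbering.

```python
def encode_collections(output_column):
    """Return (R_Q, codes) for the distinct collections of outputs in an STT column."""
    distinct = []
    for outputs in output_column:
        key = frozenset(outputs)
        if key not in distinct:
            distinct.append(key)
    # (Q - 1).bit_length() equals ceil(log2(Q)) for Q >= 2.
    r_q = max(1, (len(distinct) - 1).bit_length())
    codes = {co: format(q, f"0{r_q}b") for q, co in enumerate(distinct)}
    return r_q, codes


# Hypothetical output column of an STT.
column = [{"o1", "o2"}, {"o3"}, {"o1", "o2"}, set(), {"o2"}]
r_q, co_codes = encode_collections(column)
```

With 18 distinct COs the same width computation gives five variables, and with 12 it gives four, which matches the values quoted in the text.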
A table of LUTerO includes the columns $Y_q$, $K(Y_q)$, $O_q$, $q$. This table is constructed in the trivial way [3]. There are $Q = 18$ rows in the table of LUTerO in the discussed case. Five rows of this table are shown in Table 4.
If the condition

$$R_Q > NI_{LUT} \quad (11)$$

takes place, then the circuit of LUTerO is multi-level.
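To reduce the number of levels to one, the method of mixed encoding of COs (MECO) is proposed in [9]. The main idea of MECO is the following [9]: some outputs are eliminated from the COs, and the process of elimination is continued until condition (12) is true. The set $O$ is thereby divided into the set $O_{oh}$ of outputs generated directly and the set $O_{mb}$ of outputs represented by encoded COs. This turns a PY FSM into a $PY_M$ Mealy FSM (Fig. 4).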

The LUTerO implements the SBF

$$O_{mb} = O_{mb}(Z). \quad (14)$$
As shown in [9], the replacement of a PY FSM by an equivalent $PY_M$ FSM allows reducing the LUT count if condition (11) takes place. If condition (5) takes place for the functions $f_i \in D \cup Z \cup O_{oh}$, then the maximum operating frequency of the $PY_M$ FSM is higher than that of an equivalent PY FSM. This is because the LUTerO of the PY FSM is represented by a multi-level circuit.
If condition (5) is violated for the functions $f_i \in D \cup Z \cup O_{oh}$, then both LUTerD and LUTerZ are represented by multi-level circuits. To implement these circuits, the methods of FD should be used. In this article, we propose an approach that allows reducing the LUT counts in $PY_M$ FSMs. The proposed method is based on the idea of encoding the fields of compatible states (FCSs).

IV. MAIN IDEA OF PROPOSED METHOD
The proposed method is based on the existence of state compatibility. A state $s_m \in S$ determines three sets. A set $I(s_m)$ includes the inputs $i_l \in I$ determining transitions from the state $s_m \in S$. A set $O(s_m)$ includes the outputs generated during transitions from the state $s_m \in S$. A set $D(s_m)$ includes the IMFs $D_r \in D$ equal to 1 to load into the SRG the codes of states of transitions from the state $s_m \in S$. For example, such sets can be found from the DST of P Mealy FSM $A_0$ (Table 2).
In this paper, we consider the compatibility of states with respect to the value of $NI_{LUT}$. If a set $S^k \subseteq S$ includes $M_k$ states, then

$$R_k = \lceil \log_2 (M_k + 1) \rceil \quad (15)$$

variables are enough to encode the states $s_m \in S^k$. In (15), the one is added to account for the relation $s_m \notin S^k$. The set $S^k \subseteq S$ determines a set $I(S^k)$ with inputs causing transitions from the states $s_m \in S^k$. This set includes $L_k$ elements. The states $s_m \in S^k$ are compatible if the following condition takes place:

$$R_k + L_k \le NI_{LUT}. \quad (16)$$

The following property characterizes the set $S^k$: any function generated during transitions from the states $s_m \in S^k$ is represented by a single LUT with $NI_{LUT}$ inputs. Of course, this is true if the state codes have $R_k$ bits determined by (15).
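A compatibility check for a candidate class of states can be sketched under two assumptions inferred from the surrounding text: the field width is the ceiling of the base-2 logarithm of $M_k + 1$ (one extra code marks states outside the class), and a class is compatible when the field width plus the number of inputs of the class does not exceed $NI_{LUT}$. The input sets and helper names below are hypothetical.

```python
def field_width(num_states_in_class):
    """Bits needed for M_k states plus one code meaning 'not in this class'."""
    if num_states_in_class < 1:
        raise ValueError("a class must contain at least one state")
    # For m >= 1, m.bit_length() equals ceil(log2(m + 1)).
    return num_states_in_class.bit_length()


def is_compatible(states, inputs_of_state, ni_lut):
    """Assumed test: R_k + L_k <= NI_LUT for the given set of states."""
    class_inputs = set()
    for state in states:
        class_inputs |= inputs_of_state[state]
    return field_width(len(states)) + len(class_inputs) <= ni_lut


# Hypothetical sets I(s_m) of inputs determining transitions from each state.
inputs_of_state = {
    "s1": {"i1"},
    "s2": {"i1", "i2"},
    "s3": set(),
    "s4": {"i3"},
}
```

With $NI_{LUT} = 4$, the class {s1, s2, s3} passes this check (two code bits plus two inputs), while adding s4 makes it fail (three code bits plus three inputs).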
Let us find a partition $\{S^1, \ldots, S^K\}$ of the set $S$ by classes of compatible states. Each class $S^k$ of the partition determines a field $FS_k$ in the state codes $FC(s_m)$. The structure of the code $FC(s_m)$ is shown in Fig. 5.
There are $K$ fields in the code $FC(s_m)$. There are $R_k$ bits in the field $FS_k$. To encode the states $s_m \in S^k$, the variables $T_r \in T^k$ are used, where $|T^k| = R_k$. Obviously, there are

$$R_{FC} = \sum_{k=1}^{K} R_k \quad (17)$$

bits in the code $FC(s_m)$.
To construct the partition with a minimum number of classes, the methods [24], [25] can be used. These methods were used in the twofold state assignment. In the twofold state assignment, each state has two codes. A code $K(s_m)$ determines a state $s_m \in S$ as an element of the set $S$. A code $C(s_m)$ determines the state $s_m \in S$ as an element of its class $S^k$. Having two types of state codes requires a special block transforming $K(s_m)$ into $C(s_m)$. This block consumes some resources of the FPGA chip.
In this paper, we propose to use only codes of FCSs. This leads to the $P_{FC}Y_M$ Mealy FSM shown in Fig. 6.
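In the $P_{FC}Y_M$ Mealy FSM, a block LUTerk corresponds to the class $S^k$ of the partition. The variables $T_r \in T^k$ encode the states $s_m \in S^k$. The LUTerk implements the following systems of partial functions:

$$O_{oh}^k = O_{oh}^k(T^k, I^k); \quad (18)$$

$$Z^k = Z^k(T^k, I^k); \quad (19)$$

$$D^k = D^k(T^k, I^k). \quad (20)$$

The LUTerOR generates the functions $o_n \in O_{oh}$, $z_r \in Z$, and $D_r \in D$ as disjunctions of partial functions:

$$o_n = \bigvee_{k=1}^{K} o_n^k \quad (o_n \in O_{oh}); \quad (21)$$

$$z_r = \bigvee_{k=1}^{K} z_r^k \quad (z_r \in Z); \quad (22)$$

$$D_r = \bigvee_{k=1}^{K} D_r^k \quad (D_r \in D). \quad (23)$$

The functions $D_r \in D$ enter the flip-flops of the SRG. The state variables $T_r \in T$ are the outputs of the SRG.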
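The role of LUTerOR can be sketched as a bitwise OR over eponymous partial functions. The dictionary-based representation, the helper names, and the sample values are assumptions for illustration; the `fan_in` helper counts how many blocks produce each function, which is the number of inputs its LUT in LUTerOR would need.

```python
def luter_or(partial_values):
    """OR together eponymous partial functions produced by the LUTerk blocks.

    partial_values: one dict per LUTerk, mapping a function name
    (for example "D1" or "z2") to the 0/1 value of its partial function.
    """
    result = {}
    for block in partial_values:
        for name, bit in block.items():
            result[name] = result.get(name, 0) | bit
    return result


def fan_in(partial_values):
    """Number of blocks generating a partial function for each name."""
    counts = {}
    for block in partial_values:
        for name in block:
            counts[name] = counts.get(name, 0) + 1
    return counts


# Hypothetical values of partial functions from three blocks in one cycle.
blocks = [{"D1": 1, "z1": 0}, {"D1": 0, "o2": 1}, {"z1": 0}]
```

A function produced by a single block needs no disjunction at all, which is why some functions may not require a LUT in LUTerOR.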
The LUTerO implements the SBF (14). If condition (12) takes place, then there are exactly $|O_{mb}|$ LUTs in the circuit of LUTerO.
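To optimize hardware for the first circuit level, it is necessary to minimize the numbers of shared elements in the eponymous sets $I^k$, $O_{oh}^k$, $Z^k$, and $D^k$:

$$|I^q \cap I^g| \to \min; \quad (24)$$

$$|O_{oh}^q \cap O_{oh}^g| \to \min; \quad (25)$$

$$|Z^q \cap Z^g| \to \min; \quad (26)$$

$$|D^q \cap D^g| \to \min. \quad (27)$$

In (24)-(27), there is $q \ne g$ and $q, g \in \{1, \ldots, K\}$.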
Meeting condition (24) allows reducing the requirements for the electrical power of the input sources. In addition, the placement and routing process is simplified for the first level of the $P_{FC}Y_M$ FSM. If conditions (25)-(27) hold, then the number of LUTs is reduced in LUTer1-LUTerK. In addition, the number of interconnections between LUTer1-LUTerK and LUTerOR is reduced, too.
If there is

$$K \le NI_{LUT}, \quad (28)$$

then there are exactly $N_{oh} + R_Q + R_{FC}$ LUTs in the circuit of LUTerOR, where $N_{oh} = |O_{oh}|$. Also, there is only a single level of LUTs in LUTerOR. This is the best case.
Let $A(f_i)$ be the number of occurrences of the function $f_i$ among the partial functions generated by LUTer1-LUTerK. If

$$A(f_i) \le NI_{LUT}, \quad (29)$$

then a circuit implementing this function includes only a single LUT. This is why it is so important to find a partition of the set $S$ that satisfies conditions (25)-(27).
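In this article, we propose a design method for $P_{FC}Y_M$ Mealy FSMs. The STT is used to represent an FSM. The method includes the following steps:
1) Finding the COs $Y_q \subseteq O$ and checking condition (11).
2) Dividing the set $O$ into the sets $O_{oh}$ and $O_{mb}$ using the method [9].
3) Executing encoding of COs $Y_q \subseteq O_{mb}$ and creating SBF (14).
4) Creating the partition of the set $S$ satisfying (24)-(27).
5) Encoding the states by the codes $FC(s_m)$.
6) Creating the DST of the $P_{FC}Y_M$ Mealy FSM.
7) Creating the tables of LUTer1-LUTerK.
8) Deriving the SBFs (18)-(20).
9) Creating the table of LUTerOR.
10) Deriving the functions (21)-(23).
11) Executing the technology mapping.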

V. EXAMPLE OF SYNTHESIS
Consider an example of synthesis of the $P_{FC}Y_M$ Mealy FSM $A_0$. The circuit is implemented using LUTs with $NI_{LUT} = 4$.
Step 1. As we found before, there are $Q = 18$ COs in the case of $A_0$. So, there is $R_Q = 5$. This means that condition (11) takes place. So, it is necessary to use the MECO approach.
Step 2. Using the method [9] gives the result shown in Table 5. Now, there is $Q_0 = 12$. Using (8) gives $R_Q = 4$. So, condition (12) is true. The division of the set $O$ is terminated. Also, there is the set $Z = \{z_1, \ldots, z_4\}$.

Step 3. Let us encode the COs $Y_q \subseteq O_{mb}$ in a way minimizing the number of literals in SOPs (10). This minimizes the number of interconnections between the blocks LUTerOR and LUTerO. To do it, the method [46] can be used. One of the possible solutions is shown in Fig. 7.
Using Table 5 and Fig. 7, the SBF (30) can be obtained. As follows from (30), there are 20 literals in the SOPs of functions $o_n \in O_{mb}$. The maximum possible number of literals is determined as $N_{mb} \cdot R_Q = 36$. So, using the codes from Fig. 7 allows reducing the number of literals by a factor of 1.8. Due to (12), the maximum number of LUTs in LUTerO is equal to $N_{mb}$. In the discussed case, functions $o_2$ and $o_4$ are generated by LUTerOR. So, in the discussed case, there are 7 LUTs in LUTerO.
Step 4. This step is very important [8]. Its outcome significantly influences the number of LUTs in the FSM circuit [24]. Using the approach [24], [25] gives a partition of the set $S$ with three classes. So, there are three FCSs in the discussed case. As follows from Table 1, this partition determines the sets associated with each class, and the resulting value is very close to the minimum.
Step 5. In the discussed case, each class includes 3 elements. Using (15) gives $R_1 = R_2 = R_3 = 2$. Using (17) gives $R_{FC} = 6$. This determines the following sets: $T^1 = \{T_1, T_2\}$, $T^2 = \{T_3, T_4\}$, $T^3 = \{T_5, T_6\}$, and $T = \{T_1, \ldots, T_6\}$. Due to (16), the outcome of the encoding of states does not affect the number of LUTs in the FSM circuit [24], [25]. We use the code 00 in a field to show that a state does not belong to the corresponding class. One of the possible outcomes of state assignment is shown in Table 6.
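The construction of codes of FCSs can be sketched as follows. The partition used here is hypothetical (the actual classes and Table 6 are not reproduced in this text), the numbering of states inside a class is arbitrary, and the all-zero field is reserved for states outside the class, as stated in Step 5.

```python
def fcs_codes(partition):
    """Return (R_FC, codes) for a partition given as a list of lists of states."""
    # For m >= 1, m.bit_length() equals ceil(log2(m + 1)): room for the zero code.
    widths = [len(cls).bit_length() for cls in partition]
    codes = {}
    for k, cls in enumerate(partition):
        for index, state in enumerate(cls, start=1):  # 0 means "not in this class"
            fields = ["0" * w for w in widths]
            fields[k] = format(index, f"0{widths[k]}b")
            codes[state] = "".join(fields)
    return sum(widths), codes


# Hypothetical partition of nine states into three classes of three states.
partition = [["s1", "s2", "s3"], ["s4", "s5", "s6"], ["s7", "s8", "s9"]]
r_fc, fc = fcs_codes(partition)
```

Each code has exactly one non-zero field, so the block LUTerk of a class only needs the two state variables of its own field.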
Step 6. The meaning of the columns of the DST of the $P_{FC}Y_M$ Mealy FSM $A_0$ is clear from Table 7.
In Table 7, the state codes are taken from Table 6, and the COs are taken from Table 5. The codes $K(Y_q)$ are taken from Fig. 7.
Step 7. The tables of LUTer1-LUTer3 are created as parts of the DST (Table 7). The tables of LUTer1-LUTer3 have the same columns as the DST (Table 7). Because only $R_k$ bits of $FC(s_C)$ carry the state codes for the states $s_m \in S^k$, only these $R_k$ bits are written in the column $FC(s_C)$ of the table of LUTerk. For example, the LUTer1 is represented by Table 8.
Tables of LUTer2 and LUTer3 are constructed in the same manner. We do not show them in this example.
Step 8. The SBFs (18)-(20) are derived from the tables of LUTerk ($k \in \{1, \ldots, K\}$). The terms of SOPs (18)-(20) are determined by (4); for example, the product terms of LUTer1 can be derived from Table 8. The functions (18)-(20) are constructed in the trivial way; for example, the SOPs (31) can be derived from Table 8. The superscript ''1'' in (31) means that the corresponding functions are generated by LUTer1. There are 11 different elements in the columns $O_{oh}$, $Z_h$, and $D_h$ of Table 8. So, there are 11 LUTs in the circuit of LUTer1.
Analysis of Table 7 shows that the following variables are generated during the transitions from $s_m \in S^2$: $z_1$, $z_2$, $z_3$, $D_1$, $D_2$, $D_3$, $D_4$, $D_5$, $D_6$. So, there are 8 LUTs in the circuit of LUTer2. Next, we can find that the variables $o_6$, $z_1$, $z_2$, $z_3$, $D_2$, $D_3$, $D_5$, $D_6$ are generated during transitions from the states $s_m \in S^3$. So, there are 9 LUTs in the circuit of LUTer3. In total, there are 28 LUTs in the circuits of LUTer1-LUTer3.
Step 9. The LUTs of LUTerOR execute disjunctions of eponymous partial functions generated by different blocks LUTerk. In the discussed case, LUTerOR is represented by Table 9.
As follows from Table 9, there is no need for a LUT for $z_4$. So, there are 10 LUTs in LUTerOR.
Step 10. The functions (21)-(23) are constructed using Table 9.

Step 11. This step is connected with the solution of different problems of technology mapping [1]. We do not discuss this step for our example.
So, there are 28 LUTs in LUTer1-LUTer3, 10 LUTs in LUTerOR, and 6 LUTs in LUTerO. This gives 44 LUTs in the circuit of the $P_{FC}Y_M$ Mealy FSM $A_0$. There are exactly 3 levels of LUTs in this circuit.

VI. EXPERIMENTAL RESULTS
In this section, the results of experiments are shown. The experiments use benchmark FSMs from the library [47]. The library includes 48 benchmarks represented in the KISS2 format. These benchmarks have a wide range of basic characteristics (the numbers of states, inputs, and outputs). The characteristics of these benchmarks can be found in many articles and books, for example, in [1], [7], [24], [28], [43]. Different researchers use these benchmarks to compare area and time characteristics of FSMs obtained using new and known design methods.
To conduct the experiments, we used a personal computer having the following characteristics: CPU: Intel Core i7 6700K 4.24.4GHz, Memory: 16GB RAM 2400MHz CL15. As a platform for implementing FSM circuits, we used the Virtex-7 VC709 Evaluation Platform (xc7vx690tffg1761-2) [48]. The FPGA chip of this platform includes LUTs with 6 inputs. To execute the technology mapping, the CAD tool Vivado v2019.1 (64-bit) [34] was used. The results of experiments are taken from reports produced by Vivado. As the source information for the CAD tool, we used VHDL-based FSM models obtained by transforming the files in KISS2 format into VHDL code. The transformation is executed by the CAD tool K2F [8].
As a rule, the LUT count (the number of LUTs) is used to estimate the chip area occupied by an FSM circuit [48]-[50]. Using the results of experiments, we compared area (the LUT count) and time (the maximum operating frequency) characteristics of FSMs based on five different approaches. Three of them are P Mealy FSMs based on: 1) Auto of Vivado (it uses binary state codes); 2) One-hot of Vivado; 3) JEDI. The fourth objects for comparison are the $PY_M$-based FSMs [8], [9] shown in Fig. 4. In the case of $PY_M$-based FSMs, we use JEDI to encode the states. We compared the characteristics of these four FSMs with those of $P_{FC}Y_M$-based FSM circuits.
It is known [8] that area and time characteristics of LUT-based FSM circuits depend strongly on the relation between numbers of inputs (L) and state variables (R S ), on the one hand, and the number of LUT inputs, on the other hand. As in our previous research [24], we have divided the benchmarks into five classes.
To divide the benchmarks into classes, we used numbers that are multiples of 6 (because our experiments used LUTs with NI LUT = 6). A benchmark belongs to the class of trivial FSMs (class 0) if R S + L ≤ 6, to the class of simple FSMs (class 1) if 6 < R S + L ≤ 12, to the class of average FSMs (class 2) if 12 < R S + L ≤ 18, to the class of big FSMs (class 3) if 18 < R S + L ≤ 24, and to the class of very big FSMs (class 4) if R S + L > 24. As research [24] shows, the greater the result of dividing R S + L by NI LUT, the bigger the gain from using methods of structural decomposition compared to FD-based FSM circuits.
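The classification rule can be expressed as a short Python sketch. The function name is hypothetical, and the generalisation of the thresholds to multiples of an arbitrary number of LUT inputs is an assumption; the text defines the classes only for LUTs with 6 inputs.

```python
def fsm_class(r_s, num_inputs, lut_inputs=6):
    """Return the benchmark class (0..4) from the sum R_S + L.

    Thresholds are multiples of the number of LUT inputs:
    class 0 up to 6, class 1 up to 12, class 2 up to 18,
    class 3 up to 24, class 4 above 24 (for 6-input LUTs).
    """
    total = r_s + num_inputs
    for cls in range(4):
        if total <= (cls + 1) * lut_inputs:
            return cls
    return 4


# Boundary cases for 6-input LUTs
for r_s, l in [(2, 4), (3, 4), (5, 7), (5, 8), (6, 18), (6, 19)]:
    print(r_s + l, fsm_class(r_s, l))
```

Because each threshold is checked in increasing order, a benchmark is assigned to the lowest class whose bound it satisfies, which matches the nested definitions in the text.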
The results of experiments are shown in Tables 10-13. These tables are organized in the same manner. The table columns are marked by the names of the investigated methods. The table rows are marked by the names of benchmarks. The row ''Total'' shows the sums of the values in each column. The row ''Percentage'' shows the summarized characteristics of FSM circuits produced by the other methods as a percentage of those of P FC Y M-based FSMs. To indicate that the model of P FSM is used for the methods Auto, One-hot, and JEDI, we name these methods P-Auto, P-One-hot, and P-JEDI. Let us analyse the experimental results taken from reports produced by Vivado.
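As an illustration of how the rows ''Total'' and ''Percentage'' are formed, consider the following Python sketch. The function name, the method labels, and the numbers are hypothetical and are not taken from the tables of this paper; the sketch only assumes that each percentage is a column total divided by the total of the baseline method.

```python
def summary_rows(values_by_method, baseline):
    """Compute the ''Total'' and ''Percentage'' rows of a results table.

    values_by_method maps a method name to its per-benchmark values.
    Percentages are column totals relative to the baseline column.
    """
    totals = {method: sum(vals) for method, vals in values_by_method.items()}
    base = totals[baseline]
    percentage = {
        method: round(100.0 * total / base, 2)
        for method, total in totals.items()
    }
    return totals, percentage


# Hypothetical LUT counts for two benchmarks (not real experimental data)
example = {"P-Auto": [10, 30], "Proposed": [8, 17]}
totals, percentage = summary_rows(example, baseline="Proposed")
print(totals)
print(percentage)
```

With this convention, the baseline column always reads 100, and a value above 100 in another column means that the corresponding method has a larger total than the baseline.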
If we take the total number from the row ''Total'' of Table 10, then the following conclusion can be made: the P FC Y M-based FSMs require the minimum number of LUTs compared with the other investigated approaches. Our approach consumes fewer LUTs than P-Auto (39.94% of gain), P-One-hot (62.85% of gain), P-JEDI-based FSMs (15.25% of gain), and PY M-based FSMs (6.97% of gain). However, the gain (or loss) depends on which class the benchmark FSM belongs to.
For classes 0 and 1, the FSM circuits obtained using the JEDI-based state assignment have better LUT counts than their counterparts obtained using the other investigated methods. The proposed method loses out, since it is based on the encoding of collections of outputs. This means that the block LUTerO should be implemented even where it is not necessary. As follows from Table 10, for FSMs of classes 0 and 1, it is preferable to use unitary (one-hot) encoding of outputs. Starting from class 2, however, our approach gives better results than the other investigated methods. At the same time, equivalent PY M and P FC Y M FSMs have almost the same LUT counts. The benefits of applying P FC Y M FSMs instead of other models are evident for classes 2-4. To show it, we have created Table 11 with the experimental results for classes 2-4.
As follows from Table 11, the gain in LUTs for P FC Y M-based FSMs is significantly larger for classes 2-4 than in the general case represented by Table 10. The P FC Y M-based FSMs consume fewer LUTs than P-Auto (58.77% of gain), P-One-hot (76.9% of gain), P-JEDI-based FSMs (29.27% of gain), and PY M-based FSMs (11.37% of gain). So, to reduce the LUT count, it makes sense to use the encoding of the fields of compatible states starting from the average FSMs (class 2).
As follows from Table 12, the JEDI-based FSMs have higher values of the maximum operating frequency than the other investigated FSMs. Our analysis shows that the difference in frequency depends largely on the FSM class. For classes 0-1, P FC Y M-based FSMs have the same (rather bad) results as equivalent PY M-based FSMs. These results can be explained by the presence of an unnecessary block LUTerO in both PY M- and P FC Y M-based FSMs. That is, the LUTerO not only consumes the resources of the FPGA chip, but also increases the clock cycle time. So, for FSMs of classes 0 and 1, it is preferable to use JEDI for state assignment. However, for more complex benchmarks, the joint use of the encoding of fields of compatible states and the mixed encoding of collections of outputs reduces the number of levels in the circuits of P FC Y M-based FSMs compared with equivalent FSMs based on Auto, One-hot, and the mixed encoding of COs. To confirm this statement, we have formed Table 13.
As follows from Table 13, P FC Y M -based FSMs have practically the same operating frequency as equivalent JEDI-based FSMs. The loss relative to the JEDI-based FSMs is 0.14%. For classes 2-4, our approach gives the gain compared to P-Auto-, P-One-hot-, and PY M -based FSMs. This gain is equal to 11.54%, 11.77%, and 2.29%, respectively.
We have proposed a method based on the encoding of classes of compatible states to improve the LUT count compared to FSMs based on the mixed encoding of collections of outputs. Our research shows that there is a gain in LUTs starting from average FSMs (class 2). The gain in LUT counts is accompanied by a loss in the maximum operating frequency. The JEDI-based FSMs have the best frequency characteristics. However, as the FSM class grows, the difference in frequency relative to the equivalent JEDI-based FSMs decreases.

VII. CONCLUSION
There are up to 7 billion transistors in modern FPGA chips [10]. As a result, a very complex digital system may be implemented using a single FPGA chip. The complexity of the implemented systems constantly increases, but the number of LUT inputs remains rather small. As research [11], [16] states, there is no sense in having LUTs with more than 6 inputs. If an FSM circuit is represented by SOPs for which the condition (5) is violated, various methods of functional decomposition should be applied during the LUT-based technology mapping. The functional decomposition leads to multi-level FSM circuits with complex interconnection systems.
Both the LUT counts and the maximum operating frequency of FPGA-based FSM circuits may be improved using various methods of structural decomposition [8]. Very often, FSM circuits based on the structural decomposition have much better characteristics than their counterparts based on the functional decomposition [1], [17]. Our research [26], [27] shows that LUT-based FSM circuits with the mixed encoding of collections of outputs have better LUT counts than their counterparts based on the functional decomposition. But it is quite possible that there is more than a single level of LUTs in the circuit generating the variables that encode the COs. In this case, it is necessary to use the methods of functional decomposition to implement this circuit (with all the negative consequences).
In our current paper, we propose to use the codes of fields of compatible states to avoid using the functional decomposition in FSM design. As a result, we propose the structural diagram and the design method of LUT-based P FC Y M Mealy FSMs. This approach is similar to a twofold state assignment [24], [25]. But using the codes of fields of compatible states allows eliminating a special block of code transformer inherent in the case of twofold state assignment. As a result, we achieved a decrease in LUT counts (up to 6.97%) accompanied by a small decrease (up to 0.81%) in the maximum operating frequency compared to PY M FSMs with JEDI state assignment.
The results of experiments show that the gain in LUT count increases as the complexity of an FSM (the total number of FSM inputs and state variables) increases. At the same time, the increase in the FSM complexity leads to a decrease in the loss in the maximum operating frequency.
Based on the research results, we think that the proposed method of encoding of the fields of compatible states has a good potential for use in the FSM design. In further research, we hope to use this method to improve the characteristics of LUT-based FSMs with twofold state assignment [24], [25].
LARYSA TITARENKO received the M.Sc., Ph.D., and Doctor of Technical Sciences degrees in telecommunications from the Kharkov National University of Radioelectronics, Ukraine, in 1993, 1996, and 2005, respectively. Since 2007, she has been a Professor of telecommunications at the Institute of Informatics and Electronics, University of Zielona Góra, Poland. She has taken part in a number of research projects sponsored by the Ministry of Science and Higher Education of Ukraine (1993-2005). Her current research interests include the theory of telecommunication systems, theory of antennas, and theory of digital automata and its applications.
SŁAWOMIR CHMIELEWSKI received the M.Sc. degree in computer engineering from the Technical University of Zielona Góra, Poland, in 2001, and the Ph.D. degree in computer science from the University of Zielona Góra, Poland, in 2016. His Ph.D. thesis was devoted to synthesis methods targeting CPLD-based FSMs. Since 2017, he has been a Lecturer at the State University of Applied Sciences in Głogów. He is currently a specialist in the field of logistics of Lumel company. His research interests include logic synthesis, design of VLSI-based FSMs, and embedded systems.
KAMIL MIELCAREK received the M.Sc. degree in computer engineering from the Technical University of Zielona Góra, Poland, in 1995, and the Ph.D. degree in computer science from the University of Zielona Góra, Poland, in 2010. Since 2001, he has been a Lecturer at the University of Zielona Góra. His current interests include methods of logic synthesis and optimization of control units in FPGA logic devices, VLSI-based FSMs, hardware description languages, perfect graphs and Petri nets, algorithmic theory and safety of UNIX, and network systems. VOLUME 10, 2022