Article

Improving Characteristics of LUT-Based Mealy FSMs with Twofold State Assignment

by Alexander Barkalov 1,2, Larysa Titarenko 1,3, Kazimierz Krzywicki 4,* and Svetlana Saburova 3

1 Institute of Metrology, Electronics and Computer Science, University of Zielona Góra, ul. Licealna 9, 65-417 Zielona Góra, Poland
2 Department of Mathematics and Information Technology, Vasyl’ Stus Donetsk National University, 600-Richya Str. 21, 21021 Vinnytsia, Ukraine
3 Department of Infocommunication Engineering, Faculty of Infocommunications, Kharkiv National University of Radio Electronics, Nauky Avenue 14, 61166 Kharkiv, Ukraine
4 Department of Technology, The Jacob of Paradies University, ul. Teatralna 25, 66-400 Gorzów Wielkopolski, Poland
* Author to whom correspondence should be addressed.
Electronics 2021, 10(8), 901; https://doi.org/10.3390/electronics10080901
Submission received: 20 March 2021 / Revised: 7 April 2021 / Accepted: 8 April 2021 / Published: 10 April 2021
(This article belongs to the Section Computer Science & Engineering)

Abstract
Practically, any digital system includes sequential blocks. This article is devoted to the case when sequential blocks are represented by models of Mealy finite state machines (FSMs). The performance (maximum operating frequency) is one of the most important characteristics of an FSM circuit. In this article, a method is proposed which aims at increasing the operating frequency of LUT-based Mealy FSMs with twofold state assignment. This is done using only extended state codes. Such an approach allows excluding the block that transforms binary state codes into extended state codes. The proposed approach leads to LUT-based Mealy FSM circuits having two levels of logic blocks. Each function for any logic level is represented by a circuit including a single LUT. The proposed method is illustrated by an example of synthesis. The results of experiments conducted with standard benchmarks show that the proposed approach produces LUT-based circuits with a significantly higher operating frequency than circuits produced by the other investigated methods (Auto and One-hot of Vivado, JEDI, twofold state assignment). The performance is increased by an average of 15.9 to 25.49 percent. These improvements are accompanied by a small growth in the number of LUTs compared with circuits based on twofold state assignment. Our approach provides the best area-time products compared with the other investigated methods. The advantages of the proposed approach increase as the number of FSM inputs and states increases.

1. Introduction

Practically, any digital system includes various sequential blocks [1,2,3]. To specify the behaviour of a sequential block, it is necessary to use some formal model. In many cases, the behaviour of sequential blocks is specified by Mealy finite state machines (FSMs) [4,5,6,7]. Three main characteristics determine the quality of an FSM circuit: the chip area occupied by the circuit, performance (either minimum propagation time or maximum operating frequency), and the consumption of power. These characteristics are strongly interconnected [8]. As a rule, the occupied chip area significantly affects other characteristics of an FSM circuit [9,10]. Various methods of structural decomposition can be used for optimizing the size of the occupied chip area [11]. These methods have one serious drawback: they lead to multi-level FSM circuits with a significant decrease in performance compared with equivalent single-level circuits.
If a multi-level FSM circuit does not provide the required operating frequency, then it is necessary to reduce the number of levels. This reduction must increase the occupied chip area as little as possible. One such method, the method of twofold state encoding, is proposed in [12]. The method is aimed at Mealy FSMs implemented with field programmable gate arrays (FPGAs) [13,14,15].
Now, FPGAs are widely used for implementing various digital systems [16,17,18]. Due to this, we chose the FPGA-based Mealy FSMs as a research object. Our current article considers Mealy FSM circuits implemented using three main components of FPGA chips. These components are look-up table (LUT) elements, programmable flip-flops and interconnections of FPGAs. We focus our current research on solutions of Xilinx which is the largest manufacturer of FPGA chips [19,20]. So, we consider ways of increasing the performance of LUT-based Mealy FSMs.
A LUT is a logic block having $NI_{LUT}$ inputs and a single output [14,21]. A single LUT may implement an arbitrary Boolean function having up to $NI_{LUT}$ arguments. Unfortunately, the value of $NI_{LUT}$ is very small [20]. If a Boolean function depends on more than $NI_{LUT}$ variables, then it is necessary to apply methods of functional decomposition to this function [19,22]. It is known that applying functional decomposition leads to multi-level FSM circuits with “spaghetti-type” interconnections [23,24].
FSM circuits are represented by systems of Boolean functions (SBFs). To implement an LUT-based FSM circuit, it is necessary to transform an initial SBF into a network of LUTs of a particular FPGA chip. This is a step of technology mapping [25,26]. The outcome of technology mapping tremendously affects all main characteristics of FSM circuits [20,27,28]. To implement an LUT-based FSM circuit, it is necessary to use LUTs, programmable flip-flops, programmable interconnections, circuits of synchronization, and input-output blocks.
The article [26] notes that time delays of the interconnection system are starting to play a major role in comparison with LUT delays. Furthermore, more than 70% of the power dissipation is due to the interconnections [29]. So, the optimization of interconnections leads to increasing the performance and reducing the power consumption of LUT-based FSM circuits. This can be done, for example, using the twofold state assignment [12].
Obviously, increasing the number of LUT inputs leads to a decrease in both the number of LUTs and the number of their levels in FSM circuits. However, as shown in [30,31], an increase in the number of LUT inputs can hardly be expected in the foreseeable future. Basically, modern LUTs have no more than 6 inputs [14,21]. An increase in the number of inputs leads to an imbalance among the main characteristics of a LUT circuit. At the same time, the increasing complexity of modern digital systems is accompanied by an increase in the number of arguments in functions representing FSM circuits. The resulting imbalance between the growing number of arguments in SBFs representing FSM circuits and the fairly small number of LUT inputs is the source of the need to improve synthesis methods of LUT-based FSMs.
Our current article is devoted to synthesis of multi-level LUT-based circuits of Mealy FSMs obtained using the method of twofold state assignment [12]. The twofold state assignment allows improving characteristics of LUT-based Mealy FSMs compared with their counterparts based on functional decomposition [32]. However, applying twofold state assignment leads to FSM circuits with three levels of LUTs. These circuits are significantly slower than their counterparts with fewer logic levels. If performance is the dominant quality factor, then the number of levels in FSM circuit should be reduced. It is extremely important that the increase in operating frequency is accompanied by as small an increase in the number of LUTs as possible (compared with the original three-level FSM circuit).
Recently, we have published works [11,12,32] where a method of twofold state assignment has been proposed. In these works, each FSM state has two codes. The first of them represents a state as an element of the set of FSM states. The second code is an extended state code (ESC). The ESC represents a state as an element of some class of a partition of the set of states. This approach requires using an additional block generating ESCs. This block introduces an additional level of logic into the FSM circuit. In our current article, we propose a method which allows eliminating this block. The elimination of this additional block is the main feature that distinguishes our current work from works [11,12,32]. This approach leads to an increase in the FSM operating frequency compared with FSMs based on methods [11,12,32].
The main contribution of this paper is a novel design method aimed at increasing the operating frequency of LUT-based Mealy FSMs with twofold state assignment. The main idea of the proposed approach is to use only extended state codes. This approach leads to a completely new structural diagram of the LUT-based Mealy FSM. This diagram does not include a block transforming binary state codes into extended state codes. Our current research shows that this approach leads to FSM circuits having practically the same amount of LUTs as FSM circuits based on the twofold state assignment. The experimental results show that this method allows increasing the maximum operating frequency of LUT-based FSMs compared with equivalent FSMs obtained using some known methods of FSM design.
The further text of the article includes six sections. The second section shows the background of single-level LUT-based Mealy FSMs. The third section discusses the state-of-the-art in the synthesis of LUT-based FSMs. The fourth section represents the main idea of the proposed method. The fifth section includes an example of FSM synthesis using the extended state codes. The results of experiments are shown and discussed in the sixth section of the article. A short conclusion ends the article.

2. Single-Level LUT-Based Mealy FSMs

FPGAs manufactured by Xilinx include a lot of configurable logic blocks (CLBs) [1,15], as well as embedded memory blocks, digital signal processors, and even microprocessors. The CLBs are connected using a programmable routing matrix [33]. In this paper, we consider CLBs including LUTs, multiplexers and programmable flip-flops. A LUT has $NI_{LUT}$ inputs and a single output. Networks of LUTs implement systems of Boolean functions representing an FSM circuit.
A LUT can implement an arbitrary Boolean function which depends on up to $NI_{LUT}$ arguments. A combinational output of a LUT may be connected with a flip-flop. Usually, D flip-flops are used to organize an FSM state register (SRG) [34,35]. The value of a function may be loaded into a flip-flop using the pulse of synchronization Clk. It is possible to set the output of a flip-flop to zero using the pulse Reset. To select the type of a CLB output (combinational or registered), a programmable multiplexer is used.
In practical digital systems, an SBF representing an FSM circuit may depend on up to 50–70 literals [4,33]. At the same time, modern LUTs have no more than 6 inputs. So, there is a contradiction between a large number of arguments and a very small number of LUT inputs. This leads to the need for functional decomposition (FD) of SBFs representing an FSM circuit [36,37]. However, FD-based circuits have a lot of logic levels and complex systems of “spaghetti-type” interconnections [38,39].
A Mealy FSM is represented by a six-component vector $\langle I, O, S, \delta, \lambda, s_1 \rangle$ [4,33]. These components are the following: $I = \{i_1, \ldots, i_L\}$ is a set of inputs, $O = \{o_1, \ldots, o_N\}$ is a set of outputs, $S = \{s_1, \ldots, s_M\}$ is a set of internal states, $\delta$ is a function of transitions, $\lambda$ is a function of outputs, and $s_1 \in S$ is an initial state. An FSM can be represented using various tools, such as state transition graphs (STGs) [4], binary decision diagrams [40,41], and-inverter graphs [25,26], graph-schemes of algorithms [33], and state transition tables (STTs) [4].
Very often, STGs represent FSMs. An STG is a directed graph whose vertices correspond to FSM states and whose edges correspond to interstate transitions [4]. Each edge is directed from the current state $s_m \in S$ to the state of transition $s_T \in S$. There are $H$ edges in an STG. The $h$-th edge is marked by the pair $\langle I_h, O_h \rangle$, where $I_h$ is a conjunction of inputs causing a transition $\langle s_m, s_T \rangle$, and $O_h$ is a collection of outputs generated during the transition $\langle s_m, s_T \rangle$. Consider a Mealy FSM $A_0$ represented by the STG shown in Figure 1.
The following characteristics of FSM $A_0$ can be found from Figure 1: $L = 5$ inputs, $N = 7$ outputs, $M = 6$ states, and $H = 15$ interstate transitions. Furthermore, the STG (Figure 1) defines the functions of transitions and outputs of FSM $A_0$.
An STG can be transformed into an STT. Each edge of an STG corresponds to a single row of an STT. For example, FSM $A_0$ can be represented by its STT (Table 1) obtained from the STG (Figure 1).
There are five columns in an STT [4]. The first four of them are the following: a current state $s_m$; a state of transition $s_T$; a conjunction of inputs $I_h$; a collection of outputs $O_h$. The column $h$ includes the numbers of interstate transitions ($h \in \{1, \ldots, H\}$). For example, the following functions are determined by the fourth row of Table 1: $\delta(s_2, i_4) = s_3$ and $\lambda(s_2, i_4) = o_3$. For a row with an unconditional transition, $I_h = 1$. In Table 1, this is row 12.
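To make the STT reading concrete, the following Python sketch stores a few rows and evaluates $\delta$ and $\lambda$ for one step. The table in the sketch is a hypothetical toy, not Table 1; only the row $\delta(s_2, i_4) = s_3$, $\lambda(s_2, i_4) = o_3$ echoes the text. The names `Row`, `TOY_STT` and `step` are illustrative assumptions.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Tuple


class Row(NamedTuple):
    current: str                 # current state s_m
    target: str                  # state of transition s_T
    condition: Dict[str, bool]   # I_h as required input values; {} means I_h = 1
    outputs: FrozenSet[str]      # collection of outputs O_h


# Hypothetical toy STT (NOT Table 1 of the article).
TOY_STT: List[Row] = [
    Row("s1", "s2", {}, frozenset({"o1"})),            # unconditional transition
    Row("s2", "s3", {"i4": True}, frozenset({"o3"})),  # the row quoted in the text
    Row("s2", "s1", {"i4": False}, frozenset()),
    Row("s3", "s1", {}, frozenset({"o2"})),
]


def step(state: str, inputs: Dict[str, bool]) -> Tuple[str, FrozenSet[str]]:
    """Return (delta(state, inputs), lambda(state, inputs)) from the toy STT."""
    for row in TOY_STT:
        if row.current != state:
            continue
        # The conjunction I_h holds when every listed input has the required value.
        if all(inputs.get(name, False) == value for name, value in row.condition.items()):
            return row.target, row.outputs
    raise ValueError(f"no transition from {state} for inputs {inputs}")


next_state, outputs = step("s2", {"i4": True})
print(next_state, sorted(outputs))  # s3 ['o3']
```

Because outputs are attached to rows (transitions) rather than to states, the sketch reflects the Mealy model: the same state can produce different outputs for different input values.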
Let us encode FSM states $s_m \in S$ by binary codes $K(s_m)$ having $R_S$ bits, where
$$R_S = \lceil \log_2 M \rceil. \tag{1}$$
To encode states, we use state variables from the set $T = \{T_1, \ldots, T_{R_S}\}$. These variables are kept in the SRG. To change the contents of the SRG, special input memory functions (IMFs) are used. They form a set $D = \{D_1, \ldots, D_{R_S}\}$. Now, we can transform an STT into a direct structure table (DST). A DST is an extension of the STT with the three following columns: $K(s_m)$, $K(s_T)$, and $D_h$ [1]. The first of them contains the code of a current state, the second includes the code of a state of transition, and the third contains a collection of IMFs. This collection includes the IMFs equal to 1 to load the code $K(s_T)$ into the SRG.
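As a small illustration of Equation (1), the sketch below computes $R_S$ with integer arithmetic and assigns codes $K(s_m)$ in plain counting order. The counting order is only one arbitrary assignment chosen for the illustration (it gives the initial state $s_1$ the zero code); it is not the assignment used in the article, and the function names are assumptions.

```python
from typing import Dict, List


def binary_code_width(num_states: int) -> int:
    """R_S = ceil(log2(M)), computed without floating point."""
    if num_states < 1:
        raise ValueError("an FSM needs at least one state")
    return (num_states - 1).bit_length()


def sequential_codes(states: List[str]) -> Dict[str, str]:
    """Assign K(s_m) in counting order; the first state gets the zero code."""
    width = binary_code_width(len(states))
    return {state: format(index, f"0{width}b") for index, state in enumerate(states)}


states = ["s1", "s2", "s3", "s4", "s5", "s6"]   # M = 6, as for FSM A_0
print(binary_code_width(len(states)))           # 3
print(sequential_codes(states)["s6"])           # 101
```

For $M = 6$ this gives $R_S = 3$, so the sets $T$ and $D$ each contain three elements.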
Using a DST, we can derive the two following SBFs representing an FSM circuit:
$$D = D(T, I); \tag{2}$$
$$O = O(T, I). \tag{3}$$
The SBF (2) represents the function of transitions, and the SBF (3) represents the function of outputs. Using the terminology from [42], we can state that SBFs (2) and (3) represent a structural diagram of $P$ Mealy FSM (Figure 2).
In $P$ Mealy FSMs, the block of transition logic is determined by SBF (2), and the block of output logic is specified by SBF (3). The inputs of the register SRG are connected with the outputs of the block of transition logic. In each cycle of FSM operation, the SRG contains a current state code. The pulse of synchronization Clk allows changing the contents of the SRG. To load the zero code of the initial state $s_1 \in S$ into the SRG, it is necessary to generate the pulse Reset. We discuss the case when both blocks are implemented using LUT-based CLBs. In this case, the flip-flops of the SRG are distributed among the LUTs of the block of transition logic.
The analysis of systems (2) and (3) shows that both input memory functions and FSM outputs depend on state variables and FSM inputs. This peculiarity of Mealy FSMs is used for optimizing LUT-based FSM circuits [11].

3. Optimizing Circuits of FPGA-Based Mealy FSMs

There are a significant number of methods for synthesis of LUT-based FSMs [11,12,23,24,26,32,36,37,38,41,43,44]. It is quite possible that the quality of FSM circuits obtained by different synthesis methods will differ significantly. As a rule, the quality of an FSM circuit is determined by a combination of three main characteristics. These characteristics are the following: (1) chip resources used for implementing the circuit; (2) the maximum performance and (3) the power consumption [45,46]. In the case of LUT-based FSMs, the following chip resources are necessary: (1) LUTs; (2) programmable flip-flops; (3) programmable interconnections; (4) the circuit of synchronization and (5) the programmable input-outputs of a chip. Obviously, the best FSM circuit consumes the minimum amount of chip resources, has the maximum possible operating frequency, and consumes the minimum power.
To improve the quality of an FSM circuit, it is necessary to solve some optimization problems [47,48]. In this article, we propose a method for improving the performance (the maximum operating frequency) of LUT-based Mealy FSMs.
The functions (2) and (3) are represented as sums-of-products (SOPs) [4]. These SOPs include product terms $F_h$ corresponding to rows of a DST. A term $F_h$ is determined as the following conjunction:
$$F_h = S_m I_h \quad (h \in \{1, \ldots, H\}). \tag{4}$$
In (4), the symbol $S_m$ stands for a conjunction of state variables corresponding to the code of a current state $s_m \in S$ from the $h$-th row of the DST.
Each function $f_j \in D \cup O$ depends on $NL(f_j)$ literals, where a literal is either a direct or a complemented value of a Boolean variable [4]. Consider the following condition:
$$NL(f_j) \le NI_{LUT}. \tag{5}$$
If condition (5) takes place, then there is exactly a single LUT in the circuit implementing the function $f_j \in D \cup O$. If condition (5) is violated, then the corresponding LUT-based circuit is a multi-level one.
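Condition (5) can be checked mechanically once an SOP is available. The sketch below is a minimal illustration under two assumptions: an SOP is stored as a list of terms, each mapping a variable name to its required polarity, and $NL(f_j)$ is taken as the number of distinct variables, since that is what occupies LUT inputs. The example function is hypothetical.

```python
from typing import Dict, List, Set

Term = Dict[str, bool]   # variable name -> True (direct) or False (complemented)


def support(sop: List[Term]) -> Set[str]:
    """Distinct variables (state variables and inputs) on which the SOP depends."""
    return {name for term in sop for name in term}


def fits_single_lut(sop: List[Term], ni_lut: int) -> bool:
    """Condition (5): the function needs no more arguments than a LUT has inputs."""
    return len(support(sop)) <= ni_lut


# Hypothetical function: T1 * not(T2) * i1  OR  T1 * T2 * not(i2) * i3
sop = [
    {"T1": True, "T2": False, "i1": True},
    {"T1": True, "T2": True, "i2": False, "i3": True},
]
print(len(support(sop)))         # 5
print(fits_single_lut(sop, 5))   # True
print(fits_single_lut(sop, 4))   # False
```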
To design multi-level LUT-based FSM circuits, various methods of functional decomposition (FD) may be used [22]. During the process of FD, an initial SOP is broken down into partial SOPs corresponding to some additional functions. This process is terminated when each partial SOP includes no more than $NI_{LUT}$ literals. Different partial SOPs of a function $f_j \in D \cup O$ may include the same inputs $i_l \in I$ and/or the same state variables $T_r \in T$. Thus, there is a duplication of literals in different partial SOPs of the original SOP. This phenomenon leads to a significant complication of the interconnection system. In turn, this not only complicates the placement and routing process, but also reduces performance and increases power consumption compared with an equivalent single-level circuit [11,29].
If condition (5) is true for all functions representing an FSM circuit, then the number of LUTs in the circuit is equal to $N + R_S$. This is the best possible LUT count. If condition (5) is violated, then the LUT count is equal to $R_S + N(F)$, where $N(F)$ is the number of additional functions different from $f_j \in D \cup O$.
To optimize FSM circuits, it is necessary to reduce the value of $N(F)$. The importance of this problem has led to the development of a significant number of methods of FD [22]. Various algorithms of FD are included into CAD tools aimed at the implementation of FPGA-based digital systems.
The values of $NL(f_j)$ could be reduced due to a proper state assignment [4]. For example, it is possible to represent a state using only a single state variable. This is achieved by the one-hot state assignment, when $R_S = M$ [49]. The one-hot state assignment requires using an SRG with $M$ bits. However, this is not a problem because modern FPGAs include a lot of programmable flip-flops. Due to this, this approach is very popular in FPGA-based design. For example, this method is used in the academic CAD system ABC by Berkeley [50,51]. It is also used in industrial CAD packages such as, for example, Vivado of Xilinx [52] and Quartus of Intel (Altera) [53].
Equation (1) determines the so-called maximum binary state codes [4]. The negative effect of one-hot state assignment is an increase in the number of IMFs compared with their minimum possible number (1). However, these IMFs are much simpler than in the case of maximum binary state assignment [1]. These approaches have been compared, for example, in [54]. The research [54] shows that using one-hot codes improves FSM characteristics if $M > 16$. However, the number of state variables is not the only factor influencing the circuit characteristics. The limited number of LUT inputs increases the effect of the number of FSM inputs on the characteristics of LUT-based FSM circuits [1]. It is shown in [35] that the one-hot state assignment produces worse FSM circuits if the number of FSM inputs exceeds 10.
So, in one case the best results are produced by the method of maximum binary state assignment, and in the other case it is better to use one-hot state codes. Thus, it is necessary to check which method gives the best results for a particular FSM. Due to this, we have compared the FSM circuits produced by our proposed approach with FSM circuits produced by three other methods of state assignment. As a base for comparison, we chose the algorithm JEDI [9,55], the method of binary state assignment Auto, and the One-hot state assignment of Vivado [52] by Xilinx [15]. We chose Vivado because it targets Xilinx FPGA chips. We chose JEDI because it is one of the best maximum binary state assignment methods [5].
It is possible to encode states in a way that reduces the power consumption [56]. The majority of such methods are based on reducing the switching activity of an FSM circuit [48,57]. To do this, it is necessary to minimize the Hamming distance between the codes of states with the maximum number of transitions [57]. However, our research [12,38,39] shows that the power consumption can also be reduced by reducing the number of interconnections inside an FSM circuit. To reduce the number of interconnections, it is necessary to minimize the numbers of arguments in SBFs (2) and (3) [4]. This can be done using various methods of state assignment.
Structural decomposition is an efficient way of reducing LUT counts in logic circuits of Mealy FSMs [11]. The main idea of these methods is the elimination of the direct connection between FSM inputs $i_l \in I$ and state variables $T_r \in T$, on the one hand, and outputs $o_n \in O$ and IMFs $D_r \in D$, on the other hand. This is connected with introducing additional functions forming a set $F$ having $N(F)$ elements. The functions $f_j \in F$ depend on FSM inputs and state variables. In turn, FSM outputs and IMFs use these additional functions as arguments creating the literals of the corresponding SOPs. To optimize the LUT count by applying the methods of structural decomposition (SD), the following conditions should take place [11]:
$$N(F) \ll N + R_S; \tag{6}$$
$$N(F) \ll L + R_S. \tag{7}$$
All known methods of SD are based on conditions (6) and (7). These methods are analysed, for example, in [11]. If condition (5) is violated for some functions $f_j \in F$, then the joint application of FD- and SD-based decomposition methods is necessary [12].
One of the SD-based methods is the method of twofold state assignment [12,32]. Let us analyse this method. The method is based on constructing a partition $\pi_S = \{S^1, \ldots, S^K\}$ of the set of Mealy FSM states. Each class $S^k \in \pi_S$ includes $M_k$ states. The maximum binary codes $C(s_m)$ are used for encoding states as elements of some class $S^k \in \pi_S$. There are $R(S^k)$ bits in the codes $C(s_m)$ of states $s_m \in S^k$, where
$$R(S^k) = \lceil \log_2 (M_k + 1) \rceil. \tag{8}$$
To encode states $s_m \in S^k$, the variables $\tau_r \in \tau^k$ are used. The sets $\tau^1, \ldots, \tau^K$ form a set $\tau$ having $R_0$ elements, where $R_k = R(S^k)$:
$$R_0 = R_1 + \ldots + R_K. \tag{9}$$
Each class $S^k \in \pi_S$ determines three sets. The set $I^k \subseteq I$ includes the inputs causing transitions from states $s_m \in S^k$. There are $L_k$ elements in the set $I^k$. The set $O^k \subseteq O$ includes the outputs generated during the transitions from states $s_m \in S^k$. The set $D^k \subseteq D$ includes the IMFs equal to 1 during the transitions from states $s_m \in S^k$.
This method can be used if the following condition takes place:
$$R(S^k) + L_k \le NI_{LUT} \quad (k \in \{1, \ldots, K\}). \tag{10}$$
In this case, it is possible to use the model of $P_T$ Mealy FSM (Figure 3).
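Equations (8) and (9) and condition (10) are simple enough to evaluate directly. The sketch below does so for the two classes and input sets that appear in the synthesis example of Section 5, with $NI_{LUT} = 5$. The function names are assumptions made for the illustration, and the remark that the extra code in (8) is the all-zero code is an interpretation based on the statement that variables outside $\tau^k$ are zero.

```python
from typing import List, Set, Tuple


def class_code_width(class_size: int) -> int:
    """R(S^k) = ceil(log2(M_k + 1)); the +1 leaves one spare code (assumed: all zeros)."""
    if class_size < 1:
        raise ValueError("a class must contain at least one state")
    return class_size.bit_length()   # equals ceil(log2(class_size + 1))


def check_partition(classes: List[Set[str]],
                    class_inputs: List[Set[str]],
                    ni_lut: int) -> Tuple[List[int], int, bool]:
    """Return the widths R(S^k), their sum R_0, and whether condition (10) holds."""
    widths = [class_code_width(len(states)) for states in classes]
    r0 = sum(widths)
    condition_10 = all(width + len(inputs) <= ni_lut
                       for width, inputs in zip(widths, class_inputs))
    return widths, r0, condition_10


# Classes and input sets as listed in the example of Section 5.
classes = [{"s1", "s3", "s4"}, {"s2", "s5", "s6"}]
class_inputs = [{"i1", "i2", "i3"}, {"i4", "i5"}]

print(check_partition(classes, class_inputs, ni_lut=5))  # ([2, 2], 4, True)
```

With LUTs having only four inputs, the first class would violate condition (10), because $2 + 3 > 4$.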
A class $S^k \in \pi_S$ corresponds to a $BlockS^k$ implementing the SBFs
$$D^k = D^k(\tau^k, I^k); \tag{11}$$
$$O^k = O^k(\tau^k, I^k). \tag{12}$$
The circuit of each $BlockS^k$ is implemented with LUTs having $NI_{LUT}$ inputs. The functions (11) and (12) represent the partial SOPs of FSM input memory functions and outputs. The BlockTO creates the final values of functions from the set $D \cup O$. To do this, each LUT of this block implements a disjunction having up to $K$ inputs:
$$D_r = \bigvee_{k=1}^{K} D_r^k \quad (r \in \{1, \ldots, R_S\}); \tag{13}$$
$$o_n = \bigvee_{k=1}^{K} o_n^k \quad (n \in \{1, \ldots, N\}). \tag{14}$$
The outputs of the LUTs producing functions (13) are connected with the inputs of D flip-flops. This explains the existence of the pulses Reset and Clk as inputs of BlockTO. So, this block produces outputs $o_n \in O$ and state variables $T_r \in T$.
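The disjunctions (13) and (14) amount to OR-ing, per function, the partial values delivered by the first-level blocks. The following sketch shows that combination for one clock cycle. The partial values are hypothetical; the dictionary-based representation and the name `combine_partials` are assumptions made for the illustration.

```python
from typing import Dict, List


def combine_partials(partials: List[Dict[str, int]]) -> Dict[str, int]:
    """OR together the partial functions D_r^k and o_n^k produced by the K blocks."""
    final: Dict[str, int] = {}
    for block_values in partials:
        for name, value in block_values.items():
            final[name] = final.get(name, 0) | value
    return final


# Hypothetical cycle: the current state belongs to the first class, so only the
# first block produces non-zero partial values.
block_1 = {"D1": 1, "D2": 0, "o1": 1}
block_2 = {"D1": 0, "D2": 0, "o1": 0, "o3": 0}

print(combine_partials([block_1, block_2]))
# {'D1': 1, 'D2': 0, 'o1': 1, 'o3': 0}
```

A function that appears in only one block (such as `o3` above) needs no real disjunction, which is why the number of shared functions matters for the LUT count of the second level.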
To create SBFs (11) and (12), it is necessary to have state variables $\tau_r \in \tau$. These variables are generated by the Blockτ. This block transforms state variables $T_r \in T$ and generates the following SBF:
$$\tau = \tau(T). \tag{15}$$
As follows from [12], the circuits of $P_T$ FSMs require fewer LUTs than the circuits of equivalent $P$ Mealy FSMs. If condition (10) takes place, the circuits of $P_T$ FSMs also have fewer levels of logic. Due to this, they are faster than the circuits of equivalent $P$ Mealy FSMs.
As follows from Figure 3, both inputs $i_l \in I$ and state variables $\tau_r \in \tau$ enter only the LUTs of the first level of the FSM circuit. The partial functions (11) and (12) enter only the LUTs of the second level of the FSM circuit. Finally, the state variables $T_r \in T$ enter only the LUTs of the Blockτ creating the third level of logic. Due to this, the circuits of $P_T$ FSMs have regular systems of interconnections. This distinguishes them from the circuits of FD-based $P$ FSMs having complex systems of “spaghetti-type” interconnections. Our research [11,32] shows that the circuits of $P_T$ FSMs consume less power than the circuits of equivalent $P$ Mealy FSMs.
The analysis of Figure 3 shows that $P_T$ FSMs have a drawback. Namely, they include the Blockτ in the path leading from inputs $i_l \in I$ to state variables $\tau_r \in \tau$. The conversion (15) takes some time, which is added to the FSM cycle time. In this article, we propose a way of eliminating the Blockτ from the FSM circuit. It allows reducing the cycle time and, therefore, increasing the maximum operating frequency of the resulting FSM circuit.

4. Main Idea of the Proposed Method

Assume that we have constructed a partition $\pi_S$ for some Mealy FSM. To eliminate the converter of state codes $K(s_m)$ into state codes $C(s_m)$, we propose to use extended state codes $EC(s_m)$. For a state $s_m$ ($m \in \{1, \ldots, M\}$), the extended state code determines the state as an element of both sets $S$ and $S^k \in \pi_S$. The number of bits in $EC(s_m)$ is equal to $R_0$ determined by (9). To encode states by ESCs, we use state variables $\tau_r \in \tau$, where $|\tau| = R_0$. If $s_m \in S^k$, then only the variables $\tau_r \in \tau^k$ can differ from zero.
Using only ESCs leads to $P_E$ Mealy FSMs. The structural diagram of $P_E$ Mealy FSM is shown in Figure 4.
In $P_E$ Mealy FSMs, the blocks of the first level implement the partial functions (11) and (12). These functions enter the inputs of LUTs from BlockτO. As in the case of $P_T$ FSMs, the functions (12) are transformed into FSM outputs. However, the functions (11) are transformed directly into state variables $\tau_r \in \tau$. Due to this, there is no need for the Blockτ used in the case of $P_T$ Mealy FSM (Figure 3).
By removing the Blockτ, the three-level FSM circuit turns into a two-level one. So, we can expect that $P_E$ Mealy FSMs have better performance than equivalent $P_T$ Mealy FSMs. The results of our experimental studies, shown in Section 6, have confirmed this assumption.
In this article, we propose a synthesis method for $P_E$ Mealy FSMs. The method targets LUT-based FSMs where a LUT has $NI_{LUT}$ inputs. We assume that an FSM is represented by its state transition table. Furthermore, we assume that there exists a partition $\pi_S$ satisfying condition (10). There are the following steps in the proposed synthesis method:
  1. Constructing the partition $\pi_S$ with the minimum cardinality number $K$.
  2. Encoding of FSM states by extended state codes $EC(s_m)$.
  3. Creating the DST of $P_E$ Mealy FSM.
  4. Creating tables representing $BlockS^1, \ldots, BlockS^K$.
  5. Creating the table representing BlockτO.
  6. Constructing SBFs representing blocks of the FSM circuit.
  7. Implementing the $P_E$ Mealy FSM circuit with particular LUTs and other resources of the FPGA chip.
To create the partition $\pi_S$, we can use the methods from [11,32]. These methods try to minimize the number of LUTs in the resulting FSM circuit. Firstly, the number of outputs shared by different classes $S^k \in \pi_S$ should be as small as possible. This reduces the number of LUTs implementing partial outputs in the circuits of $BlockS^k$. Furthermore, this allows reducing the number of interconnections between $BlockS^k$ and the LUTs of BlockτO. The circuit of BlockτO is guaranteed to have a single level of logic if the following condition is true:
$$K \le NI_{LUT}. \tag{16}$$
If condition (16) is violated, then the circuit of BlockτO is still a single-level one if each partial output is generated by no more than $NI_{LUT}$ blocks of the first level of logic. Due to this, it makes sense to minimize the appearance of shared FSM outputs in different classes $S^k \in \pi_S$.
Each class $S^k \in \pi_S$ is characterised by a set $\delta(S^k)$ including the states of transitions from the states $s_m \in S^k$. The methods [11,32] minimize the appearance of shared states of transition in different classes of the partition $\pi_S$. This allows minimizing the number of partial input memory functions generated by a particular block of logic. In turn, this minimizes the numbers of LUTs in the circuits of all blocks of a $P_T$ FSM circuit. In the case of $P_E$ FSM, we still use this method.
The difference in the organization of $P_T$ and $P_E$ FSMs leads to a change in the method of state assignment. In [32], states are encoded in a way minimizing the number of LUTs in the circuit of LUTerτ. There is no LUTerτ in $P_E$ FSMs. We propose to encode states in a way minimizing the number of LUTs generating functions (11). To do this, we encode the states $s_m \in \delta(S^k)$ in a way maximizing the number of zeros in the same bits of codes for different states of transition within each set $\delta(S^k)$.
Consider the following example. There is a set $\delta(S^1) = \{s_3, s_5\}$. If these states have the codes $EC(s_3) = 01101$ and $EC(s_5) = 10011$, then all five partial input memory functions are generated by $BlockS^1$. To generate them, it is necessary to use five LUTs. If there are codes $EC(s_3) = 00001$ and $EC(s_5) = 00011$, then only two partial input memory functions should be generated. To do this, only two LUTs are necessary. We use this approach to encode the states of $P_E$ Mealy FSMs.
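The counting in this example can be reproduced with a few lines of Python. A block must generate a partial input memory function for every bit position in which at least one code of its states of transition contains 1; positions that are zero in all codes need no LUT. The function name below is an assumption made for the illustration.

```python
from typing import List


def partial_imf_count(transition_codes: List[str]) -> int:
    """Number of bit positions holding a 1 in at least one of the given codes."""
    if not transition_codes:
        return 0
    width = len(transition_codes[0])
    if any(len(code) != width for code in transition_codes):
        raise ValueError("all extended state codes must have the same length")
    return sum(
        any(code[bit] == "1" for code in transition_codes)
        for bit in range(width)
    )


print(partial_imf_count(["01101", "10011"]))  # 5
print(partial_imf_count(["00001", "00011"]))  # 2
```

Maximizing the number of all-zero columns within each set $\delta(S^k)$ therefore directly minimizes the value returned by this function for the corresponding block.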

5. Example of Synthesis

If a model of $P_E$ FSM is used to implement the circuit of some FSM $A_b$, then we denote this as $P_E(A_b)$. In this section, we discuss an example of synthesis of a $P_E$ Mealy FSM starting from the STT (Table 1). To implement the circuit of Mealy FSM $P_E(A_0)$, we use LUTs having $NI_{LUT} = 5$.
Step 1. Using the approach in [32] gives the partition $\pi_S = \{S^1, S^2\}$ of the set of states. It includes the classes $S^1 = \{s_1, s_3, s_4\}$ and $S^2 = \{s_2, s_5, s_6\}$. There is $M_1 = M_2 = 3$ in the discussed case. Using (8) gives $R(S^1) = R(S^2) = 2$. As follows from the analysis of Table 1, the following sets exist for each class $S^k \in \pi_S$: $I^1 = \{i_1, i_2, i_3\}$, $O^1 = \{o_1, o_2, o_4, o_6, o_7\}$, $\delta(S^1) = \{s_1, s_2, s_3, s_5, s_6\}$ for $S^1$; $I^2 = \{i_4, i_5\}$, $O^2 = \{o_1, o_3, o_5\}$, $\delta(S^2) = \{s_1, s_3, s_4, s_6\}$ for $S^2$. So, the following relations take place: $L_1 = 3$ and $L_2 = 2$. Because $NI_{LUT} = 5$, condition (10) is true for each class of the partition $\pi_S = \{S^1, S^2\}$.
As we can see, there are no identical elements in the sets $I^1$ and $I^2$ ($I^1 \cap I^2 = \emptyset$). The fewer identical elements in different sets $I^k \subseteq I$, the fewer connections between the sources of FSM inputs and the LUTs of the first level of logic. In our particular case, the absolute minimum of zero is reached.
There is only a single common output $o_1$ in the sets $O^1$ and $O^2$. It means that only a single LUT of BlockτO is necessary to generate FSM outputs. All other outputs are generated by LUTs from the corresponding blocks of the first level of the logic circuit. In general, the fewer identical elements in different sets $O^k \subseteq O$, the smaller the number of LUTs in the second level of logic.
Step 2. There is $R(S^1) = R(S^2) = 2$. From (9), we get $R_0 = 4$. This gives the set $\tau = \{\tau_1, \ldots, \tau_4\}$. We can create the following sets for the state encoding: $\tau^1 = \{\tau_1, \tau_2\}$ and $\tau^2 = \{\tau_3, \tau_4\}$. If $s_m \in S^1$, then $\tau_3 = \tau_4 = 0$ in the code $EC(s_m)$. There is $\tau_1 = \tau_2 = 0$ in the code $EC(s_m)$ for the states $s_m \in S^2$. Furthermore, we should maximize the number of zeros in the same bits of codes for different states of transition within each set $\delta(S^k)$.
In our example, it is possible to encode only states $s_m \in \delta(S^2)$ in such a way. The outcome of state assignment is shown in Figure 5.
As follows from the Karnaugh map (Figure 5), there is $\tau_3 = 0$ in the extended codes of states $s_m \in \delta(S^2)$. It means that only three LUTs are used for generating input memory functions in $BlockS^2$. Furthermore, we can see that the variables $\tau_3$ and $\tau_4$ are insignificant for conjunctions $S_m$ for the codes of states $s_m \in S^1$. The same is true for the variables $\tau_1$ and $\tau_2$ and the codes of states $s_m \in S^2$. This allows reducing the number of literals in terms (4) down to $R(S^k)$. For example, there is $S_3 = \tau_1 \bar{\tau}_2$. These minimized conjunctions are used in SOPs of functions (11) and (12).
Step 3. The transition from an STT to a DST is executed in a trivial way. The columns $s_m$, $s_T$, $I_h$, $O_h$, $h$ contain the same symbols in both tables. The state codes are taken from a corresponding Karnaugh map. In the discussed example, the codes are taken from Figure 5. There is a symbol $D_r$ in the row $h$ of the column $D_h$ if there is 1 in the $r$-th bit of the code of the state of transition $s_T \in S$ in this row. In the discussed case, the DST of FSM $P_E(A_0)$ is represented by Table 2. This table includes $H = 15$ rows.
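The rule for filling the column of input memory functions can be illustrated as follows. This is a minimal sketch with a hypothetical function name. The code 1000 for $s_3$ follows from $S_3 = \tau_1 \bar{\tau}_2$ together with $\tau_3 = \tau_4 = 0$ for states of the first class; the code 0011 is used only as an illustrative value, since the complete assignment is given in Figure 5.

```python
def excited_functions(transition_code):
    """List the functions D_r that equal 1 for the code of a state of transition."""
    return [f"D{r}" for r, bit in enumerate(transition_code, start=1) if bit == "1"]


print(excited_functions("1000"))  # ['D1'], a transition into s3
print(excited_functions("0011"))  # ['D3', 'D4'], illustrative code
```

Every row of the DST receives exactly the symbols returned for the code of its state of transition.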
Step 4. To construct a table of $BlockS^k$, it is necessary to select the rows of a DST with transitions from the states $s_m \in S^k$. Applying this approach to Table 2 gives two tables. Table 3 represents $BlockS^1$; Table 4 represents $BlockS^2$. There are 9 rows in Table 3 and 6 rows in Table 4. So, together, these tables include all 15 rows of the original DST (Table 2).
Step 5. The table of $Block\tau O$ includes two columns. The column Function includes all functions from the sets of input memory functions and FSM outputs. The column Block is divided into $K$ sub-columns ($S^1, \ldots, S^K$). The sub-column $S^k$ corresponds to $BlockS^k$. If a function $f_i \in D \cup O$ is generated by $BlockS^k$, then there is 1 at the intersection of the row $f_i$ and the column $S^k$.
In the discussed case, there are $R_0 + N = 11$ rows in the table of $Block\tau O$. This block is represented by Table 5.
There is a transparent connection between Table 3 and Table 4, on the one hand, and Table 5, on the other hand. For example, there is $D_3 = 0$ in Table 4. So, there is 0 at the intersection of the row $D_3$ and the column $S^2$ of Table 5. Next, there is $o_4 = 1$ in Table 3. So, there is 1 at the intersection of the row $o_4$ and the column $S^1$ of Table 5. All other rows of Table 5 are filled on the basis of a similar analysis.
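Step 6. Systems (11) and (12) are extracted from the tables of $BlockS^1, \ldots, BlockS^K$. The following SBFs can be derived from Table 3 (after minimization):
$$
\begin{aligned}
D_1^1 &= F_3 \vee F_8 = \bar{\tau}_1 \tau_2 \bar{i}_1 \bar{i}_2 \vee \tau_1 \tau_2 i_2 \bar{i}_3; \\
D_2^1 &= F_4 \vee F_8 = \tau_1 \bar{\tau}_2 i_2 i_3 \vee \tau_1 \tau_2 i_2 \bar{i}_3; \\
D_3^1 &= F_1 \vee F_2 \vee F_5 \vee F_6 \vee F_7; \\
D_4^1 &= F_5 \vee [F_6 \vee F_9] = \tau_1 \bar{\tau}_2 i_2 \bar{i}_3 \vee \tau_1 \bar{i}_2.
\end{aligned} \tag{17}
$$
$$
\begin{aligned}
o_1^1 &= F_1 \vee F_5 \vee F_9 = \bar{\tau}_1 \tau_2 i_1 \vee \tau_1 \bar{\tau}_2 i_2 \bar{i}_3 \vee \tau_1 \tau_2 \bar{i}_2; \\
o_2^1 &= F_1 \vee F_3 \vee F_5 \vee F_7; \\
o_4^1 &= F_3 \vee F_4 \vee F_8; \\
o_6^1 &= F_2 \vee [F_4 \vee F_7] = \bar{\tau}_1 \tau_2 \bar{i}_1 i_2 \vee \tau_1 i_2 i_3; \\
o_7^1 &= F_6 = \tau_1 \bar{\tau}_2 \bar{i}_2.
\end{aligned} \tag{18}
$$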
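The bracketed pairs in the expressions for $D_4^1$ and $o_6^1$ are merged terms: $F_6 \vee F_9 = \tau_1 \bar{\tau}_2 \bar{i}_2 \vee \tau_1 \tau_2 \bar{i}_2 = \tau_1 \bar{i}_2$ and $F_4 \vee F_7 = \tau_1 \bar{\tau}_2 i_2 i_3 \vee \tau_1 \tau_2 i_2 i_3 = \tau_1 i_2 i_3$. The following Python sketch checks both identities by exhaustive enumeration. The helper names are hypothetical, and the term $F_7 = \tau_1 \tau_2 i_2 i_3$ is inferred from the merged form rather than quoted from Table 3.

```python
from itertools import product


def equivalent(f, g, num_vars):
    """Compare two Boolean functions on every combination of input values."""
    return all(bool(f(*v)) == bool(g(*v)) for v in product((0, 1), repeat=num_vars))


def f6_or_f9(t1, t2, i2):
    # F6 = t1 * not(t2) * not(i2);  F9 = t1 * t2 * not(i2)
    return (t1 and not t2 and not i2) or (t1 and t2 and not i2)


def merged_f6_f9(t1, t2, i2):
    return t1 and not i2


def f4_or_f7(t1, t2, i2, i3):
    # F4 = t1 * not(t2) * i2 * i3;  F7 = t1 * t2 * i2 * i3 (inferred)
    return (t1 and not t2 and i2 and i3) or (t1 and t2 and i2 and i3)


def merged_f4_f7(t1, t2, i2, i3):
    return t1 and i2 and i3


print(equivalent(f6_or_f9, merged_f6_f9, 3))  # True
print(equivalent(f4_or_f7, merged_f4_f7, 4))  # True
```

In both cases the variable $\tau_2$ disappears because the two merged terms differ only in its polarity.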
The following systems are derived from Table 4:
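$$
\begin{aligned}
D_1^2 &= [F_1 \vee F_2] \vee F_5 = \tau_3 \bar{\tau}_4 \vee \bar{\tau}_3 \tau_4 \bar{i}_4 i_5; \\
D_2^2 &= F_2 \vee F_4 \vee F_5; \\
D_4^2 &= F_3 \vee F_6 = \tau_3 \tau_4 \vee \bar{\tau}_3 \tau_4 \bar{i}_4 \bar{i}_5.
\end{aligned} \tag{19}
$$
$$
\begin{aligned}
o_1^2 &= F_5 = \bar{\tau}_3 \tau_4 \bar{i}_4 i_5; \\
o_3^2 &= F_1 \vee F_3 \vee F_5; \\
o_5^2 &= F_2 \vee F_6 = \tau_3 \bar{\tau}_4 \bar{i}_4 \vee \bar{\tau}_3 \tau_4 \bar{i}_4 \bar{i}_5.
\end{aligned} \tag{20}
$$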
The following SBF is derived from Table 5:
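$$
\begin{aligned}
D_1 &= D_1^1 \vee D_1^2; \quad D_2 = D_2^1 \vee D_2^2; \quad D_3 = D_3^1; \quad D_4 = D_4^1 \vee D_4^2; \\
o_1 &= o_1^1 \vee o_1^2; \quad o_2 = o_2^1; \quad o_3 = o_3^2; \quad o_4 = o_4^1; \quad o_5 = o_5^2; \quad o_6 = o_6^1; \quad o_7 = o_7^1.
\end{aligned} \tag{21}
$$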
Systems (17)–(21) represent logic circuit of FSM P E ( A 0 ) . Let us analyse the LUT counts for each level of logic. To do it, we should analyse systems (17)–(21).
As follows from (17) and (18), there are 9 LUTs in the circuit of $BlockS^1$. The LUT count is determined by the number of equations in (17) and (18). It follows from SBFs (19) and (20) that there are 6 LUTs in the circuit of $BlockS^2$.
If some equation of SBF (21) has more than a single term, then the corresponding LUT is included into $Block\tau O$. Otherwise, the corresponding function is generated by one of the blocks of the first level of logic. So, there are 4 LUTs in the circuit of $Block\tau O$.
So, there are 9 + 6 + 4 = 19 LUTs in the circuit of FSM $P_E(A_0)$. The pulses $Clk$ and $Reset$ enter CLBs generating functions $D_r \in D$. The circuit is shown in Figure 6.
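The LUT count of the example can be recomputed from the lists of functions produced by the two first-level blocks. The sketch below is illustrative only: the dictionary layout and the function name `count_luts` are hypothetical, and it assumes that every partial function and every second-level disjunction fits into a single LUT.

```python
# Functions produced by each first-level block, as listed in systems (17)-(20).
block_functions = {
    "BlockS1": ["D1", "D2", "D3", "D4", "o1", "o2", "o4", "o6", "o7"],
    "BlockS2": ["D1", "D2", "D4", "o1", "o3", "o5"],
}


def count_luts(block_functions):
    """Return per-block LUT counts, the shared functions, and the total LUT count."""
    first_level = {name: len(funcs) for name, funcs in block_functions.items()}
    producers = {}
    for name, funcs in block_functions.items():
        for func in funcs:
            producers.setdefault(func, []).append(name)
    # A second-level LUT is needed only where several partial functions are ORed.
    shared = sorted(f for f, blocks in producers.items() if len(blocks) > 1)
    return first_level, shared, sum(first_level.values()) + len(shared)


first_level, shared, total = count_luts(block_functions)
print(first_level)  # {'BlockS1': 9, 'BlockS2': 6}
print(shared)       # ['D1', 'D2', 'D4', 'o1']
print(total)        # 19
```

The four shared functions are exactly those whose equations in (21) contain two terms, which is why the second level of logic needs four LUTs.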
Step 7. To implement the circuit of FSM P E ( A 0 ) , it is necessary to use very complicated methods of technology mapping [26]. This can be done using, for example, the CAD tool Vivado by Xilinx [52]. This package solves the problems of mapping, placement, routing, testing, finding characteristics of an FSM circuit (such as the LUT count, number of CLBs, number of flip-flops, maximum operating frequency, power consumption). We do not show the results of implementation for this particular example. In the next section, we use Vivado to investigate the efficiency of our method compared with some other design methods.

6. Experimental Results

The results of experiments are shown in this section. To conduct experiments, we use benchmark FSMs from the library [58]. The library includes 48 benchmarks represented in the format KISS2. These benchmarks have a wide range of basic characteristics (numbers of states, inputs, and outputs). The characteristics of these benchmarks can be found in many articles and books, for example, in [11,27,37]. These benchmarks are used very often by different researchers to compare area and time characteristics of FSMs obtained using different design methods. The characteristics of the benchmarks are shown in Table 6.
To conduct the experiments, we used a personal computer with the following characteristics: CPU: Intel Core i7 6700 K 4.2@4.4 GHz, Memory: 16 GB RAM 2400 MHz CL15. As a platform for implementing FSM circuits we used the Virtex-7 VC709 Evaluation Platform (xc7vx690tffg1761-2) [59]. The FPGA chip of this platform includes LUTs with 6 inputs. We used CAD tool Vivado v2019.1 (64-bit) [52] to execute the technology mapping. The results of experiments are taken from reports produced by Vivado. As the source information for the CAD tool, we used VHDL-based FSM models obtained by the transformation of files in KISS2 format into VHDL codes. The transformation is executed by the CAD tool K2F [11].
We compared area (the LUT count) and time (the maximum operating frequency) characteristics of FSMs based on five different approaches. Three of them are P Mealy FSMs based on: (1) Auto of Vivado (it uses binary state codes); (2) One-hot of Vivado; (3) JEDI. The fourth is the $P_T$-based FSM [12,32]. We compared these four FSMs with our approach.
It is known [11] that area and time characteristics of LUT-based FSM circuits depend strongly on the relation between the numbers of inputs ($L$) and state variables ($R_S$), on the one hand, and the number of LUT inputs $NI_{LUT}$, on the other hand. Due to this, we have divided the benchmarks into the following five classes. The benchmarks belong to the class of trivial FSMs (class 0) if $R_S + L \leq 6$. The benchmarks belong to the class of simple FSMs (class 1) if $R_S + L \leq 12$. The benchmarks belong to the class of average FSMs (class 2) if $R_S + L \leq 18$. The benchmarks belong to the class of big FSMs (class 3) if $R_S + L \leq 24$. The benchmarks belong to the class of very big FSMs (class 4) if $R_S + L > 24$. A benchmark is assigned to the lowest class whose condition it satisfies. As research [39] shows, the larger the class number, the bigger the gain from using methods of structural decomposition.
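The classification rule can be written as a small function. This is a sketch with a hypothetical name `fsm_class`; the argument values in the usage lines are made up for illustration and do not describe particular benchmarks.

```python
def fsm_class(num_inputs, num_state_vars):
    """Map the sum L + R_S to the class number 0..4 used in the experiments."""
    total = num_inputs + num_state_vars
    for class_number, upper_bound in enumerate((6, 12, 18, 24)):
        if total <= upper_bound:
            return class_number
    return 4  # very big FSMs: L + R_S > 24


print(fsm_class(2, 3))   # 0
print(fsm_class(7, 4))   # 1
print(fsm_class(11, 5))  # 2
print(fsm_class(20, 5))  # 4
```

The upper bounds are multiples of six, the number of inputs of the LUTs in the FPGA chip used in the experiments.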
Class 0 includes the benchmarks bbtas, dk17, dk27, dk512, ex3, ex5, lion, lion9, mc, modulo12, and shiftreg. Class 1 contains the benchmarks bbara, bbsse, beecount, cse, dk14, dk15, dk16, donfile, ex2, ex4, ex6, ex7, keyb, mark1, opus, s27, s386, s8, and sse. Class 2 consists of the benchmarks ex1, kirkman, planet, planet1, pma, s1, s1488, s1494, s1a, s208, styr, and tma. Class 3 includes the benchmark sand. Finally, the benchmarks s420, s510, s820, and s832 form class 4.
The results of experiments are shown in Table 7, Table 8, Table 9, Table 10, Table 11, Table 12, Table 13, Table 14 and Table 15. There is the same organization of Table 7, Table 9, Table 11, Table 13, Table 14 and Table 15. We marked the table columns by the names of investigated methods. The table rows contain the names of benchmarks. Within each table, the results for the same class are shown in adjacent rows. There are results of summation of values from columns in the row “Total”. The row “Percentage” includes the percentage of summarized characteristics of FSM circuits produced by other methods, respectively, to P E -based FSMs. To point that the model of P FSM is used for methods Auto, One-hot, and JEDI, we name these methods as P-Auto, P-One-hot, and P-JEDI. Table 8, Table 10 and Table 12 show summarized experimental results for different classes of benchmark FSMs. Let us analyse the experimental results taken from reports produced by Vivado.
As follows from Table 7, the $P_T$-based FSMs consume the minimum number of LUTs compared with the other investigated approaches. The $P_E$-based FSMs require 7.7% more LUTs than equivalent $P_T$-based FSMs. However, all other FSMs require more LUTs than our approach. Our approach consumes fewer LUTs than P-Auto (25.47% of gain), P-One-hot (46.1% of gain) and P-JEDI-based FSMs (6.67% of gain).
To show a dependence of the gain in LUTs on the class of benchmarks, we have created Table 8. It shows the gain for classes 0, 1 and 2–4.
As follows from Table 8, for class 0 the $P_E$-based FSMs have worse results than equivalent P-Auto FSMs (12.68% of loss on the number of LUTs), P-JEDI FSMs (14.08% of loss on the number of LUTs) and $P_T$-based FSMs (12.68% of loss on the number of LUTs). So, for class 0, the P-JEDI-based FSMs have the minimum number of LUTs. For class 1, our approach loses out relative to two other approaches (11.01% to $P_T$-based FSMs and 1.46% to P-JEDI-based FSMs). However, our approach produces more economical circuits than P-Auto (18.71% of gain) and P-One-hot (53.51% of gain). For classes 2–4, the $P_E$-based FSMs only lose to the $P_T$-based FSMs (6.23%), and our approach defeats P-Auto FSMs (30.35% of gain), P-One-hot FSMs (45.23% of gain), and P-JEDI FSMs (6.13% of gain). It means that our approach allows obtaining FSM circuits with a number of LUTs comparable to that of equivalent P-JEDI- and $P_T$-based FSMs. At the same time, the loss of our approach decreases as the complexity of FSMs increases: the larger the class number, the smaller our loss relative to the best solutions (P-JEDI- and $P_T$-based FSMs).
We thought that our approach would allow us to obtain FSM circuits with higher performance than it is for circuits based on models of either P or P T FSMs. To test this assumption, we have created Table 9. It includes values of maximum operating frequency measured in megahertz.
As follows from Table 9, the $P_E$-based FSMs have the highest values of maximum operating frequency compared with the other investigated FSMs. Our approach provides the following gain: (1) 25.49% compared with P-Auto-based FSMs; (2) 26.09% compared with P-One-hot-based FSMs; (3) 10.06% compared with P-JEDI-based FSMs; (4) 15.9% compared with $P_T$-based FSMs. Our research has shown that the frequency gain depends on the class to which an FSM belongs. This conclusion is supported by data from Table 10.
As follows from Table 10, $P_E$-based FSMs have the same operating frequency as equivalent $P_T$ FSMs from class 0. We can explain this phenomenon by the fact that there is no code transformer in $P_T$ FSMs of this class. Furthermore, P-JEDI-based FSMs have slightly higher frequency (0.84%). However, our approach gives a slight advantage over P-Auto-based FSMs (0.07%) and P-One-hot-based FSMs (2.51%). It follows from Table 10 that for trivial automata, the method of organizing the FSM circuit is practically irrelevant. The difference in frequency depends mainly on the state encoding method.
Starting from simple FSMs (the class 1), our approach allows producing the fastest circuits. There is the following gain in maximum operating frequency: (1) 24.68% compared with P-Auto-based FSMs; (2) 24.77% compared with P-One-hot-based FSMs; (3) 19.33% compared with P-JEDI-based FSMs; (4) 15.68% compared with P T -based FSMs. The gain is even greater for FSMs of classes 2–4. This is the following: (1) 39.94% compared with P-Auto-based FSMs; (2) 40.1% compared with P-One-hot-based FSMs; (3) 32.02% compared with P-JEDI-based FSMs; (4) 24.64% compared with P T -based FSMs. As we can see, starting from simple FSMs, the difference in frequency depends mainly on the architecture of FSM.
When comparing different variants of the FSM circuit implementation, an integral estimate is often used, which is equal to the product of the chip area occupied by a circuit and its performance [45]. In the case of LUT-based FSMs, the circuit quality is estimated by the product of the LUT count and the minimum cycle time [45]. Obviously, the cycle time is inversely proportional to the operating frequency. The lower the value of this product, the better the quality of the corresponding FSM circuit.
The results of the comparison relative to this estimate are shown in Table 11. The time of cycle is represented in nanoseconds. So, the data inside Table 11 are represented as “number of LUTs × nsecs”.
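The integral estimate can be computed from the two values reported by the CAD tool, the LUT count and the maximum operating frequency. The following sketch uses a hypothetical function name and hypothetical input values; it is not taken from the experimental tables.

```python
def area_time_product(lut_count, fmax_mhz):
    """Return the LUT count multiplied by the minimum cycle time in nanoseconds."""
    # A frequency of f MHz corresponds to a cycle time of 1000 / f nanoseconds.
    cycle_time_ns = 1000.0 / fmax_mhz
    return lut_count * cycle_time_ns


# Hypothetical circuit: 20 LUTs at 250 MHz, so a 4 ns cycle.
print(area_time_product(20, 250.0))  # 80.0
```

A circuit that uses slightly more LUTs can still have the lower product if its operating frequency is sufficiently higher.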
As follows from Table 11, our approach leads to FSM circuits with better integral estimates than it is for other investigated methods. There is the following gain in the integral estimates: (1) 101.63% compared with P-Auto-based FSMs; (2) 133.62% compared with P-One-hot-based FSMs; (3) 45.54% compared with P-JEDI-based FSMs; (4) 16.89% compared with P T -based FSMs. This is connected with the fact that our approach produces two-level FSM circuits. At the same time, a circuit for each function for any level of logic requires only a single LUT.
To check the dependence of the integral estimate on the value of R S + L , we split Table 11 and built Table 12, which shows integral estimates for FSMs from classes 0, 1 and 2–4.
In the case of class 0 (Table 12), the circuits of P E –based FSMs have the worst values of integral estimates compared with the circuits produced by P-Auto, P-JEDI, and P T FSMs. There is the following loss: (1) 10.78% compared with P-Auto; (2) 12.78% compared with P-JEDI and (3) 12.41% compared with P T FSMs. So, the loss is approximately the same relative to these three models. However, P E –based FSMs have a gain of 27.07% compared with LUT-based P-One-hot FSMs.
Starting from simple FSMs (class 1), our approach allows producing circuits with significantly better integral estimates than their counterparts. There is the following gain in values of area-time products: (1) 63.38% compared with P-Auto-based FSMs; (2) 110.11% compared with P-One-hot-based FSMs; (3) 24.27% compared with P-JEDI-based FSMs; (4) 5.01% compared with $P_T$-based FSMs. The gain is even greater for FSMs of classes 2–4: (1) 124.33% compared with P-Auto-based FSMs; (2) 150.66% compared with P-One-hot-based FSMs; (3) 58.24% compared with P-JEDI-based FSMs; (4) 23.48% compared with $P_T$-based FSMs. As we can see, starting from simple FSMs, the difference in the values of area-time products depends mainly on the architecture of the FSM. Due to this, the circuits based on $P_T$ FSMs also benefit in comparison to P-Auto, P-One-hot, and P-JEDI FSMs.
As follows from the results of experiments, using the model of $P_E$ FSMs allows obtaining FSM circuits with higher operating frequency and lower values of integral estimates than the other investigated models. The gain starts already from simple FSMs, for which the following relation holds: $(L + R_S) - NI_{LUT} > 0$. The gain from using our method increases as the difference between the number of FSM inputs and state variables, on one side, and the number of inputs of a LUT, on the other side, increases.
In our previous papers [12,32,38,39], we proposed various methods for improving the characteristics of LUT-based Mealy FSMs. All these methods lead to three-level FSM circuits. The synthesis method in [12] is based on the twofold state assignment and one-hot encoding of outputs. The method in [32] is based on the twofold state assignment and encoding of collections of outputs. These methods are also shown in our book [11]. The method in [38] is based on the replacement of FSM inputs and encoding of collections of outputs. The method in [39] is based on the transformation of codes of collections of outputs into state codes.
To compare our new results with results [12,32,38,39], we have created three additional tables. Table 13 includes the experimental results for the number of LUTs. Table 14 shows the results for the maximum operating frequency. Table 15 contains the products of LUT counts by propagation times.
As follows from Table 13, the circuits of [38]-based FSMs use the minimum number of LUTs compared with the other investigated methods. The $P_E$-based FSMs require 7.7% more LUTs than equivalent [12]-based FSMs, 1.32% more LUTs than [32]-based FSMs, and 8.54% more LUTs than [38]-based FSMs. Our approach consumes fewer LUTs than [39]-based FSMs (0.49% of gain).
As follows from Table 14, our approach allows obtaining FSM circuits with the highest operating frequency. There is the following gain in operating frequency: (1) 15.9% compared with [12]–based FSMs; (2) 26.43% compared with [32]-based FSMs; (3) 27.81% compared with [38]-based FSMs; (4) 17.44% compared with [39]-based FSMs.
As follows from Table 15, the circuits of P E -based FSMs have the best values of area-time products. There is the following gain: (1) 16.89% compared with [12]–based FSMs; (2) 47.89% compared with [32]-based FSMs; (3) 40.87% compared with [38]-based FSMs; (4) 29.88% compared with [39]-based FSMs.
We have proposed the method based on extended state codes to improve the time characteristics of P T FSMs. Note that the gain in frequency is accompanied by a slight increase in the number of LUTs compared with equivalent P T FSMs. We think that our approach can be used instead of P T FSMs if the performance is the main criterion for the optimality of LUT-based FSM circuits.

7. Conclusions

Modern FPGAs have up to 7 billion transistors [13]. It means that a very complex digital system may be implemented using a single FPGA chip. The complexity of the implemented systems is constantly increasing, but the number of LUT inputs remains very small. As research [30,31] states, there is no sense in having LUTs with more than 6 inputs. If an FSM circuit is represented by functions for which the condition (5) is violated, then the technology mapping is based on applying various methods of functional decomposition. In turn, this leads to multi-level LUT-based FSM circuits having complicated systems of interconnections.
The characteristics of LUT-based FSM circuits may be improved using various methods of structural decomposition [11]. Very often, FSM circuits based on the structural decomposition have much better characteristics compared with their counterparts based on the functional decomposition [11,12,38,39]. Our research [12] shows that LUT-based Mealy FSM circuits with the twofold state assignment have better characteristics (fewer LUTs and lower power consumption) than their counterparts based on functional decomposition. However, to apply this approach, it is necessary to create the extended state codes. It leads to using a block of state code transformer adding some delay in the cycle time.
In our current article, we propose to use only extended state codes for the state assignment. As a result, we propose a structural diagram and the design method of $P_E$ Mealy FSMs. The elimination of the code transformer allows increasing the maximum operating frequency in comparison with $P_T$-based FSMs. In $P_E$ Mealy FSMs, outputs $o_n \in O$ are produced simultaneously with functions $D_r \in D$. As a result, we achieved an increase in operating frequency (up to 23.48%) accompanied by a small increase (up to 12.68%) in the FPGA resources used.
The results of experiments show that the performance gain increases as the complexity of an FSM (the number of FSM inputs and state variables) increases. At the same time, the increase in the FSM complexity leads to a decrease in the loss in the number of LUTs. Furthermore, our approach provides better area-time products starting from FSMs for which the total number of inputs and state variables exceeds twice the number of inputs N I L U T .

Author Contributions

Conceptualization, A.B., L.T. and K.K.; methodology, A.B., L.T., K.K. and S.S.; software, A.B., L.T. and K.K.; validation, A.B., L.T. and K.K.; formal analysis, A.B., L.T., K.K. and S.S.; investigation, A.B., L.T. and K.K.; writing—original draft preparation, A.B., L.T., K.K. and S.S.; supervision, A.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available in the article.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CLB: configurable logic block
DST: direct structure table
ESC: extended state code
FD: functional decomposition
FSM: finite state machine
FPGA: field-programmable gate array
IMF: input memory function
LUT: look-up table
SBF: systems of Boolean functions
SD: structural decomposition
SOP: sum-of-products
SRG: state register
STG: state transition graph
STT: state transition table
$NI_{LUT}$: number of LUT inputs
$I = \{i_1, \ldots, i_L\}$: set of FSM inputs
$O = \{o_1, \ldots, o_N\}$: set of FSM outputs
$S = \{s_1, \ldots, s_M\}$: set of FSM states
$L$: number of inputs
$N$: number of outputs
$M$: number of states
$H$: number of interstate transitions
$K(s_m)$: binary code of state $s_m \in S$
$R_S$: number of state variables in $K(s_m)$
$T = \{T_1, \ldots, T_{R_S}\}$: set of state variables
$D = \{D_1, \ldots, D_{R_S}\}$: set of input memory functions
$F_h$: product term corresponding to the $h$-th row of the DST
$S_m$: conjunction of state variables corresponding to the code of state $s_m \in S$
$NL(f_j)$: number of literals in the SOP of function $f_j$
$N(F)$: number of additional functions different from $f_j \in D \cup O$
$F$: set of additional functions
$\pi_S = \{S^1, \ldots, S^K\}$: partition of the set of FSM states
$S^k \in \pi_S$: class number $k$ of the partition $\pi_S = \{S^1, \ldots, S^K\}$
$C(s_m)$: code of a state as an element of a class $S^k$
$R(S^k)$: number of bits necessary to encode states from a class $S^k$
$L_k$: number of inputs determining transitions from states of class $S^k$
$\tau_r \in \tau^k$: state variables encoding states $s_m \in S^k$
$R_0$: total number of state variables encoding states as elements of the partition $\pi_S = \{S^1, \ldots, S^K\}$
$I^k \subseteq I$: set of inputs causing transitions from states $s_m \in S^k$
$O^k \subseteq O$: set of outputs generated during transitions from states $s_m \in S^k$
$D^k \subseteq D$: set of input memory functions generated during transitions from states $s_m \in S^k$
$EC(s_m)$: extended state code of state $s_m$
$K$: number of classes of the partition $\pi_S = \{S^1, \ldots, S^K\}$
$\delta(S^k)$: set of states of transitions from the states $s_m \in S^k$

References

  1. Sklyarov, V.; Skliarova, I.; Barkalov, A.; Titarenko, L. Synthesis and Optimization of FPGA-Based Systems; Springer: Berlin, Germany, 2014.
  2. Branco, S.; Ferreira, A.G.; Cabral, J. Machine Learning in Resource-Scarce Embedded Systems, FPGAs, and End-Devices: A Survey. Electronics 2019, 8, 1289.
  3. Zajac, W.; Andrzejewski, G.; Krzywicki, K.; Królikowski, T. Finite State Machine Based Modelling of Discrete Control Algorithm in LAD Diagram Language with Use of New Generation Engineering Software. Proc. Comput. Sci. 2019, 159, 2560–2569.
  4. Micheli, G.D. Synthesis and Optimization of Digital Circuits; McGraw–Hill: Cambridge, MA, USA, 1994.
  5. Krzywicki, K.; Barkalov, A.; Andrzejewski, G.; Titarenko, L.; Kolopienczyk, M. SoC research and development platform for distributed embedded systems. Przegląd Elektrotechniczny 2016, 92, 262–265.
  6. Czerwinski, R.; Kania, D. Finite State Machine Logic Synthesis for Complex Programmable Logic Devices; Springer: Berlin/Heidelberg, Germany, 2013.
  7. Andrzejewski, G.; Zajac, W.; Krzywicki, K.; Królikowski, T. On some aspects of Concurrent Control Processes Modelling and Implementation in LAD Diagram Language With Use of New Generation Engineering Software. Proc. Comput. Sci. 2020, 176, 2173–2183.
  8. El-Maleh, A.H. A Probabilistic Tabu Search State Assignment Algorithm for Area and Power Optimization of Sequential Circuits. Arab. J. Sci. Eng. 2020, 45, 6273–6285.
  9. Skorupski, M. Analysis of Influence of the State Assignment on Area of Microprogram Control Units. Master’s Thesis, University of Zielona Gora, Zielona Gora, Poland, 2020.
  10. Gajski, D.D.; Abdi, S.; Gerstlauer, A.; Schirner, G. Embedded System Design: Modeling, Synthesis and Verification; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2009.
  11. Barkalov, A.; Titarenko, L.; Mielcarek, K.; Chmielewski, S. Logic Synthesis for FPGA-Based Control Units—Structural Decomposition in Logic Design; Springer: Berlin/Heidelberg, Germany, 2020.
  12. Barkalov, A.; Titarenko, L.; Mielcarek, K. Improving characteristics of LUT-based Mealy FSMs. Int. J. Appl. Math. Comput. Sci. 2020, 30, 745–759.
  13. Trimberger, S.M. Field-Programmable Gate Array Technology; Springer Science & Business Media: Berlin, Germany, 2012.
  14. Altera. Cyclone IV Device Handbook. Available online: http://www.altera.com/literature/hb/cyclone-iv/cyclone4-handbook.pdf (accessed on 15 February 2021).
  15. Xilinx FPGAs. Available online: https://www.xilinx.com/products/silicon-devices/fpga.html (accessed on 15 February 2021).
  16. Wang, Z.; Tang, Q.; Guo, B.; Wei, J.-B.; Wang, L. Resource Partitioning and Application Scheduling with Module Merging on Dynamically and Partially Reconfigurable FPGAs. Electronics 2020, 9, 1461.
  17. Zhang, F.; Guo, C.; Zhang, S.; Chen, L.; Li, X.; Sun, H.; Meng, Y.; Chen, Q. Research on Hex Programmable Interconnect Points Test in Island-Style FPGA. Electronics 2020, 9, 2177.
  18. Ruiz-Rosero, J.; Ramirez-Gonzalez, G.; Khanna, R. Field Programmable Gate Array Applications—A Scientometric Review. Computation 2019, 7, 63.
  19. Minns, P.; Elliot, I. FSM-Based Digital Design Using Verilog HDL; John Wiley and Sons: Hoboken, NJ, USA, 2008.
  20. Grout, I. Digital Systems Design with FPGAs and CPLDs; Elsevier Science: Amsterdam, The Netherlands, 2011.
  21. Intel FPGAs and Programmable Devices. Available online: https://www.intel.pl/content/www/pl/pl/products/programmable.html (accessed on 15 February 2021).
  22. Kuon, I.; Tessier, R.; Rose, J. FPGA architecture: Survey and challenges. Found. Trends Electron. Des. Autom. 2008, 2, 135–253.
  23. Scholl, C. Functional Decomposition with Application to FPGA Synthesis; Kluwer Academic Publishers: Boston, MA, USA, 2001.
  24. Kubica, M.; Kania, D. Technology mapping oriented to adaptive logic modules. Bull. Pol. Acad. Sci. 2019, 67, 947–956.
  25. Machado, L.; Cortadella, J. Support-reducing decomposition for FPGA mapping. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2018, 39, 213–224.
  26. Kubica, M.; Kania, D. Decomposition of multi-level functions oriented to configurability of logic blocks. Bull. Pol. Acad. Sci. 2017, 67, 317–331.
  27. Feng, W.; Greene, J.; Mishchenko, A. Improving FPGA Performance with a S44 LUT structure. In Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA ’18), Monterey, CA, USA, 25–27 February 2018; p. 6.
  28. Rawski, M.; Łuba, T.; Jachna, Z.; Tomaszewicz, P. The Influence of Functional Decomposition on Modern Digital Design Process. In Design of Embedded Control Systems; Springer: Boston, MA, USA, 2005; pp. 193–203.
  29. Mishchenko, A.; Brayton, R.; Jiang, J.H.R.; Jang, S. Scalable don’t-care-based logic optimization and resynthesis. ACM Trans. Reconfig. Technol. Syst. TRETS 2011, 4, 1–23.
  30. Salauyou, V.; Ostapczuk, M. State Assignment of Finite-State Machines by Using the Values of Output Variables. In Theory and Applications of Dependable Computer Systems. DepCoS-RELCOMEX 2020. Advances in Intelligent Systems and Computing; Zamojski, W., Mazurkiewicz, J., Sugier, J., Walkowiak, T., Kacprzyk, J., Eds.; Springer: Cham, Switzerland, 2020; Volume 1173, pp. 543–553.
  31. Kilts, S. Advanced FPGA Design: Architecture, Implementation, and Optimization; Wiley-IEEE Press: Hoboken, NJ, USA, 2007.
  32. Barkalov, O.; Titarenko, L.; Mielcarek, K. Hardware reduction for LUT-based Mealy FSMs. Int. J. Appl. Math. Comput. Sci. 2018, 28, 595–607.
  33. Baranov, S.I. Logic and System Design of Digital Systems; TUT Press: Tallinn, Estonia, 2008.
  34. Sklarova, D.; Sklarov, V.A.; Sudnitson, A. Design of FPGA-Based Circuits Using Hierarchical Finite State Machines; TUT Press: Tallinn, Estonia, 2012.
  35. Sklyarov, V. Synthesis and implementation of RAM-based finite state machines in FPGAs. In International Workshop on Field Programmable Logic and Applications; Springer: Berlin/Heidelberg, Germany, 2000; pp. 718–727.
  36. Mishchenko, A.; Chatterjee, S.; Brayton, R. Improvements to technology mapping for LUT-based FPGAs. IEEE Trans. CAD 2006, 27, 240–253.
  37. Kubica, M.; Kania, D.; Kulisz, J. A technology mapping of FSMs based on a graph of excitations and outputs. IEEE Access 2019, 7, 16123–16131.
  38. Barkalov, A.; Titarenko, L.; Krzywicki, K. Reducing LUT Count for FPGA-Based Mealy FSMs. Appl. Sci. 2020, 10, 5115.
  39. Barkalov, A.; Titarenko, L.; Krzywicki, K.; Saburova, S. Improving the Characteristics of Multi-Level LUT-Based Mealy FSMs. Electronics 2020, 9, 1859.
  40. Baranov, S. Logic Synthesis of Control Automata; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1994.
  41. Opara, A.; Kubica, M.; Kania, D. Strategy of Logic Synthesis using MTBDD dedicated to FPGA. Integr. VLSI J. 2018, 62, 142–158.
  42. Baranov, S. Synthesis of Control Automaton. In Logic Synthesis for Control Automata; Springer: Boston, MA, USA, 1994; pp. 96–140.
  43. Klimovich, A.S.; Solovev, V.V. Minimization of Mealy finite-state machines by internal states gluing. J. Comput. Syst. Sci. Int. 2012, 51, 244–255.
  44. El-Maleh, A.H. A probabilistic pairwise swap search state assignment algorithm for sequential circuit optimization. Integr. VLSI J. 2017, 56, 32–43.
  45. Islam, M.M.; Hossain, M.S.; Shahjalal, M.D.; Hasan, M.K.; Jang, Y.M. Area-Time Efficient Hardware Implementation of Modular Multiplication for Elliptic Curve Cryptography. IEEE Access 2020, 8, 73898–73906.
  46. Benini, L.; De Micheli, G. State assignment for low power dissipation. IEEE J. Solid State Circuits 1995, 30, 258–268.
  47. Villa, T.; Kam, T.; Brayton, R.K.; Sangiovanni-Vincentelli, A. Synthesis of Finite State Machines: Logic Optimization; Springer Science & Business Media: Berlin, Germany, 2012.
  48. De Micheli, G.; Brayton, R.K.; Sangiovanni-Vincentelli, A. Optimal state assignment for finite state machines. IEEE Trans. Comp. Aided Des. Integr. Circuits Syst. 1985, 4, 269–285.
  49. Rawski, M.; Selvaraj, H.; Łuba, T. An application of functional decomposition in ROM-based FSM implementation in FPGA devices. J. Syst. Archit. 2005, 51, 423–434.
  50. ABC System. Available online: https://people.eecs.berkeley.edu/~alanmi/abc/ (accessed on 15 February 2021).
  51. Brayton, R.; Mishchenko, A. ABC: An Academic Industrial-Strength Verification Tool. In Computer Aided Verification; Touili, T., Cook, B., Jackson, P., Eds.; Springer: Berlin/Heidelberg, Germany, 2010; pp. 24–40.
  52. Vivado Design Suite User Guide: Synthesis. UG901 (v2019.1). Available online: https://www.xilinx.com/support/documentation/sw_manuals/xilinx2019_1/ug901-vivado-synthesis.pdf (accessed on 15 February 2021).
  53. Quartus Prime. Available online: https://www.intel.pl/content/www/pl/pl/software/programmable/quartus-prime/overview.html (accessed on 15 February 2021).
  54. Khatri, S.P.; Gulati, K. Advanced Techniques in Logic Synthesis, Optimizations and Applications; Springer: New York, NY, USA, 2011. [Google Scholar]
  55. Sentowich, E.M.; Singh, K.J.; Lavango, L.; Moon, C.; Murgai, R.; Saldanha, A.; Savoj, H.; Stephan, P.R.; Bryton, R.K.; Sangiovanni-Vincentelli, A. SIS: A System for Sequential Circuit Synthesis; University of California: Berkely, CA, USA, 1992. [Google Scholar]
  56. Sutter, G.; Todorovich, E.; López-Buedo, S.; Boemo, E. Low-power FSMs in FPGA: Encoding alternatives. In Integrated Circuit Design, Power and Timing Modeling, Optimization and Simulation; Springer: Berlin/Heidelberg, Germany, 2002; pp. 363–370. [Google Scholar]
  57. Solov’ev, V.V. Changes in the length of internal state codes with the aim at minimizing the power consumption of finite-state machines. J. Commun. Technol. Electron. 2012, 57, 642–648. [Google Scholar] [CrossRef]
  58. McElvain, K. LGSynth93 Benchmark; Mentor Graphics: Wilsonville, OR, USA, 1993. [Google Scholar]
  59. Xilinx Inc. VC709 Evaluation Board for the Virtex-7 FPGA User Guide; UG887 (v1.6); Xilinx, Inc.: San Jose, CA, USA, 2019; Available online: https://www.xilinx.com/support/documentation/boards_and_kits/vc709/ug887-vc709-eval-board-v7-fpga.pdf (accessed on 11 March 2019).
Figure 1. State transition graph of Mealy FSM $A_0$.
Figure 2. Structural diagram of P Mealy FSM.
Figure 3. Structural diagram of $P_T$ Mealy FSM.
Figure 4. Structural diagram of $P_E$ Mealy FSM.
Figure 5. Extended state codes of $P_E$ FSM $A_0$.
Figure 6. Logic circuit of FSM $P_E(A_0)$.
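Table 1. STT of Mealy FSM $A_0$.

| $s_m$ | $s_T$ | $I_h$ | $O_h$ | $h$ |
|---|---|---|---|---|
| $s_1$ | $s_2$ | $i_1$ | $o_1 o_2$ | 1 |
| $s_1$ | $s_2$ | $\bar{i}_1 i_2$ | $o_6$ | 2 |
| $s_1$ | $s_3$ | $\bar{i}_1 \bar{i}_2$ | $o_2 o_4$ | 3 |
| $s_2$ | $s_3$ | $i_4$ | $o_3$ | 4 |
| $s_2$ | $s_4$ | $\bar{i}_4$ | $o_5$ | 5 |
| $s_3$ | $s_1$ | $i_2 i_3$ | $o_4 o_6$ | 6 |
| $s_3$ | $s_5$ | $i_2 \bar{i}_3$ | $o_1 o_2$ | 7 |
| $s_3$ | $s_5$ | $\bar{i}_2$ | $o_7$ | 8 |
| $s_4$ | $s_2$ | $i_2 i_3$ | $o_2 o_6$ | 9 |
| $s_4$ | $s_4$ | $i_2 \bar{i}_3$ | $o_4 o_7$ | 10 |
| $s_4$ | $s_6$ | $\bar{i}_2$ | $o_1$ | 11 |
| $s_5$ | $s_6$ | 1 | $o_3$ | 12 |
| $s_6$ | $s_1$ | $i_4$ | - | 13 |
| $s_6$ | $s_4$ | $\bar{i}_4 i_5$ | $o_1 o_3$ | 14 |
| $s_6$ | $s_6$ | $\bar{i}_4 \bar{i}_5$ | $o_5$ | 15 |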
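Table 2. DST of $P_E$ FSM $A_0$.

| $s_m$ | $EC(s_m)$ | $s_T$ | $EC(s_T)$ | $I_h$ | $O_h$ | $D_h$ | $h$ |
|---|---|---|---|---|---|---|---|
| $s_1$ | 0100 | $s_2$ | 0010 | $i_1$ | $o_1 o_2$ | $D_3$ | 1 |
| $s_1$ | 0100 | $s_2$ | 0010 | $\bar{i}_1 i_2$ | $o_6$ | $D_3$ | 2 |
| $s_1$ | 0100 | $s_3$ | 1000 | $\bar{i}_1 \bar{i}_2$ | $o_2 o_4$ | $D_1$ | 3 |
| $s_2$ | 0010 | $s_3$ | 1000 | $i_4$ | $o_3$ | $D_1$ | 4 |
| $s_2$ | 0010 | $s_4$ | 1100 | $\bar{i}_4$ | $o_5$ | $D_1 D_2$ | 5 |
| $s_3$ | 1000 | $s_1$ | 0100 | $i_2 i_3$ | $o_4 o_6$ | $D_2$ | 6 |
| $s_3$ | 1000 | $s_5$ | 0011 | $i_2 \bar{i}_3$ | $o_1 o_2$ | $D_3 D_4$ | 7 |
| $s_3$ | 1000 | $s_5$ | 0011 | $\bar{i}_2$ | $o_7$ | $D_3 D_4$ | 8 |
| $s_4$ | 1100 | $s_2$ | 0010 | $i_2 i_3$ | $o_2 o_6$ | $D_3$ | 9 |
| $s_4$ | 1100 | $s_4$ | 1100 | $i_2 \bar{i}_3$ | $o_4 o_7$ | $D_1 D_2$ | 10 |
| $s_4$ | 1100 | $s_6$ | 0001 | $\bar{i}_2$ | $o_1$ | $D_4$ | 11 |
| $s_5$ | 0011 | $s_6$ | 0001 | 1 | $o_3$ | $D_4$ | 12 |
| $s_6$ | 0001 | $s_1$ | 0100 | $i_4$ | - | $D_2$ | 13 |
| $s_6$ | 0001 | $s_4$ | 1100 | $\bar{i}_4 i_5$ | $o_1 o_3$ | $D_1 D_2$ | 14 |
| $s_6$ | 0001 | $s_6$ | 0001 | $\bar{i}_4 \bar{i}_5$ | $o_5$ | $D_4$ | 15 |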
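The $D_h$ column of Table 2 can be read off $EC(s_T)$ directly: assuming the state register is built from D flip-flops, input $D_r$ must be 1 exactly when bit $r$ of the next-state code is 1. The following is a minimal Python sketch of that reading; the names `EC` and `excitation_functions` are illustrative and do not come from the article.

```python
# Extended state codes EC(s_m), copied from Table 2.
EC = {"s1": "0100", "s2": "0010", "s3": "1000",
      "s4": "1100", "s5": "0011", "s6": "0001"}

def excitation_functions(next_state: str) -> list[str]:
    """Return the D inputs that must be 1 to load EC(next_state).

    Assumption: D flip-flops, bits numbered 1..4 from the left so that
    bit r drives D_r, which matches every row of Table 2.
    """
    code = EC[next_state]
    return [f"D{r}" for r, bit in enumerate(code, start=1) if bit == "1"]

if __name__ == "__main__":
    for state, code in EC.items():
        print(state, code, " ".join(excitation_functions(state)))
```

For example, a transition into $s_4$ (code 1100) yields $D_1 D_2$, as in rows 5, 10 and 14 of the table.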
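Table 3. Table of BlockS1.

| $s_m$ | $EC(s_m)$ | $s_T$ | $EC(s_T)$ | $I_h$ | $O_h$ | $D_h$ | $h$ |
|---|---|---|---|---|---|---|---|
| $s_1$ | 0100 | $s_2$ | 0010 | $i_1$ | $o_1 o_2$ | $D_3$ | 1 |
| $s_1$ | 0100 | $s_2$ | 0010 | $\bar{i}_1 i_2$ | $o_6$ | $D_3$ | 2 |
| $s_1$ | 0100 | $s_3$ | 1000 | $\bar{i}_1 \bar{i}_2$ | $o_2 o_4$ | $D_1$ | 3 |
| $s_3$ | 1000 | $s_1$ | 0100 | $i_2 i_3$ | $o_4 o_6$ | $D_2$ | 4 |
| $s_3$ | 1000 | $s_5$ | 0011 | $i_2 \bar{i}_3$ | $o_1 o_2$ | $D_3 D_4$ | 5 |
| $s_3$ | 1000 | $s_5$ | 0011 | $\bar{i}_2$ | $o_7$ | $D_3 D_4$ | 6 |
| $s_4$ | 1100 | $s_2$ | 0010 | $i_2 i_3$ | $o_2 o_6$ | $D_3$ | 7 |
| $s_4$ | 1100 | $s_4$ | 1100 | $i_2 \bar{i}_3$ | $o_4 o_7$ | $D_1 D_2$ | 8 |
| $s_4$ | 1100 | $s_6$ | 0001 | $\bar{i}_2$ | $o_1$ | $D_4$ | 9 |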
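Table 4. Table of BlockS2.

| $s_m$ | $EC(s_m)$ | $s_T$ | $EC(s_T)$ | $I_h$ | $O_h$ | $D_h$ | $h$ |
|---|---|---|---|---|---|---|---|
| $s_2$ | 0010 | $s_3$ | 1000 | $i_4$ | $o_3$ | $D_1$ | 1 |
| $s_2$ | 0010 | $s_4$ | 1100 | $\bar{i}_4$ | $o_5$ | $D_1 D_2$ | 2 |
| $s_5$ | 0011 | $s_6$ | 0001 | 1 | $o_3$ | $D_4$ | 3 |
| $s_6$ | 0001 | $s_1$ | 0100 | $i_4$ | - | $D_2$ | 4 |
| $s_6$ | 0001 | $s_4$ | 1100 | $\bar{i}_4 i_5$ | $o_1 o_3$ | $D_1 D_2$ | 5 |
| $s_6$ | 0001 | $s_6$ | 0001 | $\bar{i}_4 \bar{i}_5$ | $o_5$ | $D_4$ | 6 |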
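Table 5. Table of BlockτO.

| Function | Block S1 | Block S2 | Function | Block S1 | Block S2 |
|---|---|---|---|---|---|
| $D_1$ | 1 | 1 | $o_3$ | 0 | 1 |
| $D_2$ | 1 | 1 | $o_4$ | 1 | 0 |
| $D_3$ | 1 | 0 | $o_5$ | 0 | 1 |
| $D_4$ | 1 | 1 | $o_6$ | 1 | 0 |
| $o_1$ | 1 | 1 | $o_7$ | 1 | 0 |
| $o_2$ | 1 | 0 | - | - | - |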
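The entries of Table 5 appear to record, for each function, whether it occurs in the $O_h$ or $D_h$ column of the table of BlockS1 (Table 3) and of BlockS2 (Table 4). The sketch below, under that assumption, rebuilds the two-bit marks from the rows of those tables; the row strings are transcribed from Tables 3 and 4, and the helper names are hypothetical.

```python
# O_h and D_h entries of each row, transcribed from Table 3 (BlockS1)
# and Table 4 (BlockS2).
BLOCK_S1 = ["o1 o2 D3", "o6 D3", "o2 o4 D1", "o4 o6 D2", "o1 o2 D3 D4",
            "o7 D3 D4", "o2 o6 D3", "o4 o7 D1 D2", "o1 D4"]
BLOCK_S2 = ["o3 D1", "o5 D1 D2", "o3 D4", "D2", "o1 o3 D1 D2", "o5 D4"]

def functions_of(rows: list[str]) -> set[str]:
    """Collect every function name that occurs in at least one row."""
    return {name for row in rows for name in row.split()}

def membership(function: str) -> str:
    """Two-bit mark: first bit for BlockS1, second bit for BlockS2."""
    in_s1 = function in functions_of(BLOCK_S1)
    in_s2 = function in functions_of(BLOCK_S2)
    return f"{int(in_s1)}{int(in_s2)}"

if __name__ == "__main__":
    names = [f"D{r}" for r in range(1, 5)] + [f"o{n}" for n in range(1, 8)]
    for name in names:
        print(name, membership(name))
```

A mark of 11 then means that partial functions from both blocks have to be combined, while 10 or 01 means that a single block produces the function on its own.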
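Table 6. Characteristics of Mealy FSM benchmarks.

| Benchmark | $L$ | $N$ | $R_S + L$ | $M/R_S$ | $H$ | Class |
|---|---|---|---|---|---|---|
| bbara | 4 | 2 | 8 | 12/4 | 60 | 1 |
| bbsse | 7 | 7 | 12 | 26/5 | 56 | 1 |
| bbtas | 2 | 2 | 6 | 9/4 | 24 | 0 |
| beecount | 3 | 4 | 7 | 10/4 | 28 | 1 |
| cse | 7 | 7 | 12 | 32/5 | 91 | 1 |
| dk14 | 3 | 5 | 8 | 26/5 | 56 | 1 |
| dk15 | 3 | 5 | 8 | 17/5 | 32 | 1 |
| dk16 | 2 | 3 | 9 | 75/7 | 108 | 1 |
| dk17 | 2 | 3 | 6 | 16/4 | 32 | 0 |
| dk27 | 1 | 2 | 5 | 10/4 | 14 | 0 |
| dk512 | 1 | 3 | 6 | 24/5 | 15 | 0 |
| donfile | 2 | 1 | 7 | 24/5 | 96 | 1 |
| ex1 | 9 | 19 | 16 | 80/7 | 138 | 2 |
| ex2 | 2 | 2 | 7 | 25/5 | 72 | 1 |
| ex3 | 2 | 2 | 6 | 14/4 | 36 | 0 |
| ex4 | 6 | 9 | 11 | 18/5 | 21 | 1 |
| ex5 | 2 | 2 | 6 | 16/4 | 32 | 0 |
| ex6 | 5 | 8 | 9 | 14/4 | 34 | 1 |
| ex7 | 2 | 2 | 12 | 17/5 | 36 | 1 |
| keyb | 7 | 7 | 12 | 22/5 | 170 | 1 |
| kirkman | 12 | 6 | 18 | 48/6 | 370 | 2 |
| lion | 2 | 1 | 5 | 5/3 | 11 | 0 |
| lion9 | 2 | 1 | 6 | 11/4 | 25 | 0 |
| mark1 | 5 | 16 | 10 | 22/5 | 22 | 1 |
| mc | 3 | 5 | 6 | 8/3 | 10 | 0 |
| modulo12 | 1 | 1 | 5 | 12/4 | 24 | 0 |
| opus | 5 | 6 | 10 | 18/5 | 22 | 1 |
| planet | 7 | 19 | 14 | 86/7 | 115 | 2 |
| planet1 | 7 | 19 | 14 | 86/7 | 115 | 2 |
| pma | 8 | 8 | 14 | 49/6 | 73 | 2 |
| s1 | 8 | 7 | 14 | 54/6 | 106 | 2 |
| s1488 | 8 | 19 | 15 | 112/7 | 251 | 2 |
| s1494 | 8 | 19 | 15 | 118/7 | 250 | 2 |
| s1a | 8 | 6 | 15 | 86/7 | 107 | 2 |
| s208 | 11 | 2 | 17 | 37/6 | 153 | 2 |
| s27 | 4 | 1 | 8 | 11/4 | 34 | 1 |
| s386 | 7 | 7 | 12 | 23/5 | 64 | 1 |
| s420 | 19 | 2 | 27 | 137/8 | 137 | 4 |
| s510 | 19 | 7 | 27 | 172/8 | 77 | 4 |
| s8 | 4 | 1 | 8 | 15/4 | 20 | 1 |
| s820 | 18 | 19 | 25 | 78/7 | 232 | 4 |
| s832 | 18 | 19 | 25 | 76/7 | 245 | 4 |
| sand | 11 | 9 | 18 | 88/7 | 184 | 3 |
| shiftreg | 1 | 1 | 5 | 16/4 | 16 | 0 |
| sse | 7 | 7 | 12 | 26/5 | 56 | 1 |
| styr | 9 | 10 | 16 | 67/7 | 166 | 2 |
| tma | 7 | 9 | 13 | 63/6 | 44 | 2 |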
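Table 7. Experimental results (the number of LUTs).

| Benchmark | P-Auto | P-One-Hot | P-JEDI | $P_T$ FSM | $P_E$ FSM |
|---|---|---|---|---|---|
| Class 0 | | | | | |
| bbtas | 5 | 5 | 5 | 5 | 5 |
| dk17 | 5 | 12 | 5 | 5 | 5 |
| dk27 | 3 | 5 | 4 | 6 | 4 |
| dk512 | 10 | 10 | 9 | 8 | 9 |
| ex3 | 9 | 9 | 9 | 8 | 9 |
| ex5 | 9 | 9 | 9 | 8 | 10 |
| lion | 2 | 5 | 2 | 2 | 4 |
| lion9 | 6 | 11 | 5 | 5 | 6 |
| mc | 4 | 7 | 4 | 4 | 6 |
| modulo12 | 7 | 7 | 7 | 7 | 7 |
| shiftreg | 2 | 6 | 2 | 4 | 6 |
| Class 1 | | | | | |
| bbara | 17 | 17 | 10 | 11 | 13 |
| bbsse | 33 | 37 | 24 | 22 | 26 |
| beecount | 19 | 19 | 14 | 12 | 14 |
| cse | 40 | 66 | 36 | 32 | 34 |
| dk14 | 16 | 27 | 10 | 12 | 12 |
| dk15 | 15 | 16 | 12 | 6 | 9 |
| dk16 | 15 | 34 | 12 | 10 | 11 |
| donfile | 31 | 31 | 24 | 19 | 21 |
| ex2 | 9 | 9 | 8 | 8 | 9 |
| ex4 | 15 | 13 | 12 | 10 | 11 |
| ex6 | 24 | 36 | 22 | 20 | 22 |
| ex7 | 4 | 5 | 4 | 4 | 6 |
| keyb | 43 | 61 | 40 | 37 | 38 |
| mark1 | 23 | 23 | 20 | 18 | 20 |
| opus | 28 | 28 | 22 | 23 | 25 |
| s27 | 6 | 18 | 6 | 6 | 8 |
| s386 | 26 | 39 | 22 | 18 | 22 |
| s8 | 9 | 9 | 9 | 10 | 12 |
| sse | 33 | 37 | 30 | 26 | 29 |
| Classes 2–4 | | | | | |
| ex1 | 70 | 74 | 53 | 42 | 44 |
| kirkman | 42 | 58 | 39 | 35 | 37 |
| planet | 131 | 131 | 88 | 80 | 87 |
| planet1 | 131 | 131 | 88 | 80 | 87 |
| pma | 94 | 94 | 86 | 78 | 80 |
| s1 | 65 | 99 | 61 | 57 | 61 |
| s1488 | 124 | 131 | 108 | 92 | 96 |
| s1494 | 126 | 132 | 110 | 94 | 94 |
| s1a | 49 | 81 | 43 | 41 | 47 |
| s208 | 12 | 31 | 10 | 9 | 11 |
| styr | 93 | 120 | 81 | 73 | 79 |
| tma | 45 | 39 | 39 | 33 | 36 |
| sand | 132 | 132 | 114 | 101 | 108 |
| s420 | 10 | 31 | 9 | 8 | 10 |
| s510 | 48 | 48 | 32 | 29 | 31 |
| s820 | 88 | 82 | 68 | 58 | 59 |
| s832 | 80 | 79 | 62 | 54 | 61 |
| Total | 1808 | 2104 | 1489 | 1330 | 1441 |
| Percentage, % | 125.47 | 146.01 | 103.33 | 92.30 | 100.00 |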
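The percentage row of Table 7 is consistent with dividing each column total by the total for the $P_E$ FSM and multiplying by 100. A minimal Python sketch of that normalization follows; the dictionary keys and the function name `relative_percent` are illustrative choices, not notation from the article.

```python
# Column totals copied from the "Total" row of Table 7 (number of LUTs).
TOTAL_LUTS = {"P-Auto": 1808, "P-One-Hot": 2104, "P-JEDI": 1489,
              "P_T": 1330, "P_E": 1441}

def relative_percent(totals: dict[str, float],
                     baseline: str = "P_E") -> dict[str, float]:
    """Express every total as a percentage of the baseline method's total."""
    base = totals[baseline]
    return {method: round(100.0 * value / base, 2)
            for method, value in totals.items()}

if __name__ == "__main__":
    for method, percent in relative_percent(TOTAL_LUTS).items():
        print(f"{method}: {percent:.2f}%")
```

The same function applies unchanged to the totals of the frequency and area-time tables, since those are normalized to the $P_E$ FSM column as well.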
Table 8. Summarized results for FSM classes (the number of LUTs).

| Class | P-Auto | P-One-Hot | P-JEDI | $P_T$ FSM | $P_E$ FSM | Total / Percentage |
|---|---|---|---|---|---|---|
| 0 | 62 | 86 | 61 | 62 | 71 | Total |
| | 87.32 | 121.13 | 85.92 | 87.32 | 100.00 | % |
| 1 | 406 | 525 | 337 | 304 | 342 | Total |
| | 118.71 | 153.51 | 98.54 | 88.89 | 100.00 | % |
| 2–4 | 1340 | 1493 | 1091 | 964 | 1028 | Total |
| | 130.35 | 145.23 | 106.13 | 93.77 | 100.00 | % |
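Table 9. Experimental results (maximum operating frequency, MHz).

| Benchmark | P-Auto | P-One-Hot | P-JEDI | $P_T$ FSM | $P_E$ FSM |
|---|---|---|---|---|---|
| Class 0 | | | | | |
| bbtas | 204.16 | 204.16 | 206.12 | 200.38 | 200.38 |
| dk17 | 199.28 | 167.00 | 199.39 | 199.87 | 199.87 |
| dk27 | 206.02 | 201.90 | 204.18 | 196.65 | 196.65 |
| dk512 | 196.27 | 196.27 | 199.75 | 208.17 | 208.17 |
| ex3 | 194.86 | 194.86 | 195.76 | 201.12 | 201.12 |
| ex5 | 180.25 | 180.25 | 181.16 | 182.01 | 182.01 |
| lion | 202.43 | 204.00 | 202.35 | 200.18 | 200.18 |
| lion9 | 205.30 | 185.22 | 206.38 | 207.13 | 207.13 |
| mc | 196.66 | 195.47 | 196.87 | 196.12 | 196.12 |
| modulo12 | 207.00 | 207.00 | 207.13 | 208.12 | 208.12 |
| shiftreg | 262.67 | 263.57 | 276.26 | 256.69 | 256.69 |
| Class 1 | | | | | |
| bbara | 193.39 | 193.39 | 212.21 | 210.37 | 252.44 |
| bbsse | 157.06 | 169.12 | 182.34 | 198.65 | 238.38 |
| beecount | 166.61 | 166.61 | 187.32 | 201.43 | 241.72 |
| cse | 146.43 | 163.64 | 178.12 | 206.55 | 247.86 |
| dk14 | 191.64 | 172.65 | 193.85 | 186.53 | 223.84 |
| dk15 | 192.53 | 185.36 | 194.87 | 189.14 | 226.97 |
| dk16 | 169.72 | 174.79 | 197.13 | 211.52 | 253.82 |
| donfile | 184.03 | 184.00 | 203.65 | 231.63 | 248.19 |
| ex2 | 198.57 | 198.57 | 200.14 | 201.34 | 241.61 |
| ex4 | 180.96 | 177.71 | 192.83 | 197.76 | 237.31 |
| ex6 | 169.57 | 163.80 | 176.59 | 198.65 | 238.35 |
| ex7 | 200.04 | 200.84 | 200.60 | 200.69 | 240.83 |
| keyb | 156.45 | 143.47 | 168.43 | 187.48 | 224.98 |
| mark1 | 162.39 | 162.39 | 176.18 | 189.58 | 227.47 |
| opus | 166.20 | 166.20 | 178.32 | 177.84 | 213.40 |
| s27 | 198.73 | 191.50 | 199.13 | 198.76 | 238.53 |
| s386 | 168.15 | 173.46 | 179.15 | 182.63 | 218.87 |
| s8 | 180.02 | 178.95 | 181.23 | 178.32 | 213.65 |
| sse | 157.06 | 169.12 | 174.63 | 189.64 | 205.41 |
| Classes 2–4 | | | | | |
| ex1 | 150.94 | 139.76 | 176.87 | 212.94 | 276.82 |
| kirkman | 141.38 | 154.00 | 156.68 | 174.73 | 227.15 |
| planet | 132.71 | 132.71 | 187.14 | 193.49 | 251.54 |
| planet1 | 132.71 | 132.71 | 187.14 | 193.49 | 251.54 |
| pma | 146.18 | 146.18 | 169.83 | 184.45 | 239.83 |
| s1 | 146.41 | 135.85 | 157.16 | 170.19 | 221.47 |
| s1488 | 138.50 | 131.94 | 157.18 | 187.95 | 244.31 |
| s1494 | 149.39 | 145.75 | 164.34 | 186.22 | 242.05 |
| s1a | 153.37 | 176.40 | 169.17 | 178.84 | 214.53 |
| s208 | 174.34 | 176.46 | 178.76 | 196.37 | 255.28 |
| styr | 137.61 | 129.92 | 145.64 | 178.65 | 232.24 |
| tma | 163.88 | 147.80 | 164.14 | 181.22 | 235.59 |
| sand | 115.97 | 115.97 | 126.82 | 163.18 | 221.14 |
| s420 | 173.88 | 176.46 | 177.25 | 181.62 | 263.32 |
| s510 | 177.65 | 177.65 | 198.32 | 209.36 | 297.76 |
| s820 | 152.00 | 153.16 | 176.58 | 192.14 | 268.10 |
| s832 | 145.71 | 153.23 | 173.78 | 192.87 | 274.22 |
| Total | 8127.08 | 8061.22 | 8718.87 | 9172.66 | 10,906.96 |
| Percentage, % | 74.51 | 73.91 | 79.94 | 84.10 | 100.00 |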
Table 10. Summarized results for FSM classes (maximum operating frequency).

| Class | P-Auto | P-One-Hot | P-JEDI | $P_T$ FSM | $P_E$ FSM | Total / Percentage |
|---|---|---|---|---|---|---|
| 0 | 2254.90 | 2199.70 | 2275.35 | 2256.44 | 2256.44 | Total |
| | 99.93 | 97.49 | 100.84 | 100.00 | 100.00 | % |
| 1 | 3339.55 | 3335.57 | 3576.72 | 3738.51 | 4433.63 | Total |
| | 75.32 | 75.23 | 80.67 | 84.32 | 100.00 | % |
| 2–4 | 2532.63 | 2525.95 | 2866.80 | 3177.71 | 4216.89 | Total |
| | 60.06 | 59.90 | 67.98 | 75.36 | 100.00 | % |
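Table 11. Experimental results (products of LUT counts by propagation times).

| Benchmark | P-Auto | P-One-Hot | P-JEDI | $P_T$ FSM | $P_E$ FSM |
|---|---|---|---|---|---|
| Class 0 | | | | | |
| bbtas | 24.49 | 24.49 | 24.26 | 24.95 | 24.95 |
| dk17 | 25.09 | 71.86 | 25.08 | 25.02 | 25.02 |
| dk27 | 14.56 | 24.76 | 19.59 | 30.51 | 20.34 |
| dk512 | 50.95 | 50.95 | 45.06 | 38.43 | 43.23 |
| ex3 | 46.19 | 46.19 | 45.97 | 39.78 | 44.75 |
| ex5 | 49.93 | 49.93 | 49.68 | 43.95 | 54.94 |
| lion | 9.88 | 24.51 | 9.88 | 9.99 | 19.98 |
| lion9 | 29.23 | 59.39 | 24.23 | 24.14 | 28.97 |
| mc | 20.34 | 35.81 | 20.32 | 20.40 | 30.59 |
| modulo12 | 33.82 | 33.82 | 33.80 | 33.63 | 33.63 |
| shiftreg | 7.61 | 22.76 | 7.24 | 15.58 | 23.37 |
| Class 1 | | | | | |
| bbara | 87.91 | 87.91 | 47.12 | 52.29 | 51.50 |
| bbsse | 210.11 | 218.78 | 131.62 | 110.75 | 109.07 |
| beecount | 114.04 | 114.04 | 74.74 | 59.57 | 57.92 |
| cse | 273.17 | 403.32 | 202.11 | 154.93 | 137.17 |
| dk14 | 83.49 | 156.39 | 51.59 | 64.33 | 53.61 |
| dk15 | 77.91 | 86.32 | 61.58 | 31.72 | 39.65 |
| dk16 | 88.38 | 194.52 | 60.87 | 47.28 | 43.34 |
| donfile | 168.45 | 168.48 | 117.85 | 82.03 | 84.61 |
| ex2 | 45.32 | 45.32 | 39.97 | 39.73 | 37.25 |
| ex4 | 82.89 | 73.15 | 62.23 | 50.57 | 46.35 |
| ex6 | 141.53 | 219.78 | 124.58 | 100.68 | 92.30 |
| ex7 | 20.00 | 24.90 | 19.94 | 19.93 | 24.91 |
| keyb | 274.85 | 425.18 | 237.49 | 197.35 | 168.90 |
| mark1 | 141.63 | 141.63 | 113.52 | 94.95 | 87.92 |
| opus | 168.47 | 168.47 | 123.37 | 129.33 | 117.15 |
| s27 | 30.19 | 93.99 | 30.13 | 30.19 | 33.54 |
| s386 | 154.62 | 224.84 | 122.80 | 98.56 | 100.52 |
| s8 | 49.99 | 50.29 | 49.66 | 56.08 | 56.17 |
| sse | 210.11 | 218.78 | 171.79 | 137.10 | 141.18 |
| Classes 2–4 | | | | | |
| ex1 | 463.76 | 529.48 | 299.66 | 197.24 | 158.95 |
| kirkman | 297.07 | 376.62 | 248.91 | 200.31 | 162.89 |
| planet | 987.11 | 987.11 | 470.24 | 413.46 | 345.87 |
| planet1 | 987.11 | 987.11 | 470.24 | 413.46 | 345.87 |
| pma | 643.04 | 643.04 | 506.39 | 422.88 | 333.57 |
| s1 | 443.96 | 728.74 | 388.14 | 334.92 | 275.43 |
| s1488 | 895.31 | 992.88 | 687.11 | 489.49 | 392.94 |
| s1494 | 843.43 | 905.66 | 669.34 | 504.78 | 388.35 |
| s1a | 319.49 | 459.18 | 254.18 | 229.26 | 219.08 |
| s208 | 68.83 | 175.68 | 55.94 | 45.83 | 43.09 |
| styr | 675.82 | 923.65 | 556.17 | 408.62 | 340.17 |
| tma | 274.59 | 263.87 | 237.60 | 182.10 | 152.81 |
| sand | 1138.23 | 1138.23 | 898.91 | 618.95 | 488.38 |
| s420 | 57.51 | 175.68 | 50.78 | 44.05 | 37.98 |
| s510 | 270.19 | 270.19 | 161.36 | 138.52 | 104.11 |
| s820 | 578.95 | 535.39 | 385.09 | 301.86 | 220.07 |
| s832 | 549.04 | 515.56 | 356.77 | 279.98 | 222.45 |
| Total | 12,228.61 | 14,168.64 | 8844.90 | 7089.45 | 6064.86 |
| Percentage, % | 201.63 | 233.62 | 145.84 | 116.89 | 100.00 |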
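The entries of Table 11 can be reproduced from Tables 7 and 9 if the propagation time is taken as the clock period $1000/f_{max}$ in nanoseconds (with $f_{max}$ in MHz) and multiplied by the LUT count. This unit convention is an inference from the tabulated numbers rather than a definition stated here, and the function name below is illustrative.

```python
def area_time_product(luts: int, fmax_mhz: float) -> float:
    """LUT count multiplied by the clock period in nanoseconds.

    Assumption: propagation time = 1000 / f_max, with f_max in MHz,
    which reproduces the values listed in Table 11.
    """
    period_ns = 1000.0 / fmax_mhz
    return luts * period_ns

if __name__ == "__main__":
    # (benchmark, method, LUTs from Table 7, f_max in MHz from Table 9)
    samples = [("dk27", "P-Auto", 3, 206.02),
               ("lion", "P_E", 4, 200.18),
               ("ex1", "P_E", 44, 276.82)]
    for name, method, luts, fmax in samples:
        print(f"{name} {method}: {area_time_product(luts, fmax):.2f}")
```

Because both factors enter the product, a method can win on this metric with slightly more LUTs as long as the frequency gain is larger, which is the situation of the $P_E$ FSM relative to the $P_T$ FSM for the benchmarks of classes 2–4.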
Table 12. Summarized results for FSM classes (products of LUT counts by propagation times).

| Class | P-Auto | P-One-Hot | P-JEDI | $P_T$ FSM | $P_E$ FSM | Total / Percentage |
|---|---|---|---|---|---|---|
| 0 | 312.09 | 444.47 | 305.10 | 306.38 | 349.79 | Total |
| | 89.22 | 127.07 | 87.22 | 87.59 | 100.00 | % |
| 1 | 2423.08 | 3116.09 | 1842.98 | 1557.37 | 1483.07 | Total |
| | 163.38 | 210.11 | 124.27 | 105.01 | 100.00 | % |
| 2–4 | 9493.45 | 10,608.08 | 6696.83 | 5225.70 | 4232.00 | Total |
| | 224.33 | 250.66 | 158.24 | 123.48 | 100.00 | % |
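Table 13. Comparison with our previous works (the number of LUTs).

| Benchmark | [12] | [32] | [38] | [39] | $P_E$ FSM |
|---|---|---|---|---|---|
| Class 0 | | | | | |
| bbtas | 5 | 8 | 8 | 9 | 5 |
| dk17 | 5 | 9 | 8 | 10 | 5 |
| dk27 | 6 | 8 | 7 | 9 | 4 |
| dk512 | 8 | 12 | 12 | 14 | 9 |
| ex3 | 8 | 12 | 11 | 14 | 9 |
| ex5 | 8 | 13 | 10 | 12 | 10 |
| lion | 2 | 6 | 6 | 8 | 4 |
| lion9 | 5 | 7 | 8 | 10 | 6 |
| mc | 4 | 8 | 6 | 8 | 6 |
| modulo12 | 7 | 10 | 9 | 11 | 7 |
| shiftreg | 4 | 6 | 4 | 6 | 6 |
| Class 1 | | | | | |
| bbara | 11 | 11 | 10 | 14 | 13 |
| bbsse | 22 | 23 | 26 | 29 | 26 |
| beecount | 12 | 13 | 14 | 16 | 14 |
| cse | 32 | 35 | 33 | 35 | 34 |
| dk14 | 12 | 12 | 12 | 14 | 12 |
| dk15 | 6 | 11 | 6 | 11 | 9 |
| dk16 | 10 | 12 | 11 | 13 | 11 |
| donfile | 19 | 19 | 21 | 24 | 21 |
| ex2 | 8 | 10 | 8 | 10 | 9 |
| ex4 | 10 | 13 | 11 | 13 | 11 |
| ex6 | 20 | 24 | 21 | 23 | 22 |
| ex7 | 4 | 8 | 6 | 8 | 6 |
| keyb | 37 | 42 | 37 | 40 | 38 |
| mark1 | 18 | 23 | 19 | 21 | 20 |
| opus | 23 | 22 | 21 | 23 | 25 |
| s27 | 6 | 6 | 6 | 8 | 8 |
| s386 | 18 | 24 | 20 | 22 | 22 |
| s8 | 10 | 8 | 9 | 11 | 12 |
| sse | 26 | 26 | 26 | 29 | 29 |
| Classes 2–4 | | | | | |
| ex1 | 42 | 42 | 40 | 44 | 44 |
| kirkman | 35 | 41 | 33 | 35 | 37 |
| planet | 80 | 80 | 78 | 82 | 87 |
| planet1 | 80 | 80 | 78 | 82 | 87 |
| pma | 78 | 74 | 72 | 76 | 80 |
| s1 | 57 | 54 | 54 | 58 | 61 |
| s1488 | 92 | 92 | 89 | 93 | 96 |
| s1494 | 94 | 93 | 90 | 94 | 94 |
| s1a | 41 | 42 | 38 | 42 | 47 |
| s208 | 9 | 11 | 9 | 11 | 11 |
| styr | 73 | 81 | 70 | 78 | 79 |
| tma | 33 | 31 | 30 | 34 | 36 |
| sand | 101 | 100 | 99 | 103 | 108 |
| s420 | 8 | 10 | 8 | 10 | 10 |
| s510 | 29 | 31 | 22 | 23 | 31 |
| s820 | 58 | 59 | 52 | 56 | 59 |
| s832 | 54 | 60 | 50 | 52 | 61 |
| Total | 1330 | 1422 | 1318 | 1448 | 1441 |
| Percentage, % | 92.30 | 98.68 | 91.46 | 100.49 | 100.00 |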
Table 14. Comparison with our previous works (maximum operating frequency, MHz).

| Benchmark | [12] | [32] | [38] | [39] | $P_E$ FSM |
|---|---|---|---|---|---|
| Class 0 | | | | | |
| bbtas | 200.38 | 198.18 | 194.43 | 201.47 | 200.38 |
| dk17 | 199.87 | 147.21 | 147.22 | 172.99 | 199.87 |
| dk27 | 196.65 | 184.61 | 181.73 | 190.32 | 196.65 |
| dk512 | 208.17 | 175.02 | 175.63 | 187.45 | 208.17 |
| ex3 | 201.12 | 176.95 | 174.44 | 187.26 | 201.12 |
| ex5 | 182.01 | 169.39 | 162.56 | 162.56 | 182.01 |
| lion | 200.18 | 188.13 | 185.74 | 195.73 | 200.18 |
| lion9 | 207.13 | 172.57 | 167.28 | 183.45 | 207.13 |
| mc | 196.12 | 177.62 | 178.02 | 182.95 | 196.12 |
| modulo12 | 208.12 | 190.99 | 189.70 | 201.74 | 208.12 |
| shiftreg | 256.69 | 251.75 | 248.79 | 253.72 | 256.69 |
| Class 1 | | | | | |
| bbara | 210.37 | 184.15 | 183.32 | 210.21 | 252.44 |
| bbsse | 198.65 | 162.62 | 159.24 | 193.43 | 238.38 |
| beecount | 201.43 | 156.44 | 156.72 | 194.47 | 241.72 |
| cse | 206.56 | 157.87 | 153.24 | 182.62 | 247.86 |
| dk14 | 186.53 | 161.11 | 162.78 | 201.39 | 223.84 |
| dk15 | 189.14 | 177.38 | 175.42 | 206.74 | 226.97 |
| dk16 | 211.52 | 165.78 | 164.16 | 199.14 | 253.82 |
| donfile | 231.63 | 179.63 | 174.28 | 206.83 | 248.19 |
| ex2 | 201.34 | 192.45 | 188.95 | 196.58 | 241.61 |
| ex4 | 197.76 | 169.77 | 168.39 | 196.18 | 237.31 |
| ex6 | 198.65 | 170.55 | 156.42 | 187.53 | 238.35 |
| ex7 | 200.69 | 199.19 | 191.43 | 204.16 | 240.83 |
| keyb | 187.48 | 140.41 | 136.49 | 178.59 | 224.98 |
| mark1 | 189.58 | 157.61 | 153.48 | 182.37 | 227.47 |
| opus | 177.84 | 158.49 | 157.42 | 186.34 | 213.40 |
| s27 | 198.76 | 187.47 | 185.15 | 201.26 | 238.53 |
| s386 | 182.63 | 167.92 | 164.65 | 192.34 | 218.87 |
| s8 | 178.32 | 171.46 | 168.32 | 191.32 | 213.65 |
| sse | 189.64 | 165.31 | 158.14 | 171.18 | 205.41 |
| Classes 2–4 | | | | | |
| ex1 | 212.93 | 167.20 | 164.32 | 180.72 | 276.82 |
| kirkman | 174.73 | 157.37 | 155.36 | 184.62 | 227.15 |
| planet | 193.49 | 176.91 | 174.68 | 212.45 | 251.54 |
| planet1 | 193.49 | 175.83 | 173.29 | 212.45 | 251.54 |
| pma | 184.45 | 162.11 | 156.12 | 192.43 | 239.83 |
| s1 | 170.19 | 157.64 | 145.32 | 145.32 | 221.47 |
| s1488 | 187.95 | 142.77 | 141.27 | 182.14 | 244.31 |
| s1494 | 186.22 | 156.43 | 155.63 | 186.49 | 242.05 |
| s1a | 178.84 | 168.76 | 166.36 | 188.92 | 214.53 |
| s208 | 196.37 | 168.33 | 166.42 | 192.15 | 255.28 |
| styr | 178.65 | 120.14 | 118.02 | 164.52 | 232.24 |
| tma | 181.22 | 139.34 | 137.48 | 182.72 | 235.59 |
| sand | 163.18 | 127.72 | 120.07 | 143.14 | 221.14 |
| s420 | 181.62 | 187.21 | 186.35 | 218.62 | 263.32 |
| s510 | 209.36 | 201.54 | 199.05 | 221.19 | 297.76 |
| s820 | 192.14 | 176.99 | 175.69 | 195.73 | 268.10 |
| s832 | 192.87 | 179.83 | 174.39 | 199.18 | 274.22 |
| Total | 9172.65 | 8024.15 | 7917.10 | 9005.11 | 10,906.96 |
| Percentage, % | 84.10 | 73.57 | 72.19 | 82.56 | 100.00 |
Table 15. Comparison with our previous works (products of LUT counts by propagation times).

| Benchmark | [12] | [32] | [38] | [39] | $P_E$ FSM |
|---|---|---|---|---|---|
| Class 0 | | | | | |
| bbtas | 24.95 | 40.37 | 41.15 | 44.67 | 24.95 |
| dk17 | 25.02 | 61.14 | 54.34 | 57.81 | 25.02 |
| dk27 | 30.51 | 43.33 | 38.52 | 47.29 | 20.34 |
| dk512 | 38.43 | 68.56 | 68.33 | 74.69 | 43.23 |
| ex3 | 39.78 | 67.82 | 63.06 | 74.76 | 44.75 |
| ex5 | 43.95 | 76.75 | 61.52 | 73.82 | 54.94 |
| lion | 9.99 | 31.89 | 32.30 | 40.87 | 19.98 |
| lion9 | 24.14 | 40.56 | 47.82 | 54.51 | 28.97 |
| mc | 20.40 | 45.04 | 33.70 | 43.73 | 30.59 |
| modulo12 | 33.63 | 52.36 | 47.44 | 54.53 | 33.63 |
| shiftreg | 15.58 | 23.83 | 16.08 | 23.65 | 23.37 |
| Class 1 | | | | | |
| bbara | 52.29 | 59.73 | 54.55 | 66.60 | 51.50 |
| bbsse | 110.75 | 141.43 | 163.28 | 149.93 | 109.07 |
| beecount | 59.57 | 83.10 | 89.33 | 82.27 | 57.92 |
| cse | 154.92 | 221.70 | 215.35 | 191.65 | 137.17 |
| dk14 | 64.33 | 74.48 | 73.72 | 69.52 | 53.61 |
| dk15 | 31.72 | 62.01 | 34.20 | 53.21 | 39.65 |
| dk16 | 47.28 | 72.39 | 67.01 | 65.28 | 43.34 |
| donfile | 82.03 | 105.77 | 120.50 | 116.04 | 84.61 |
| ex2 | 39.73 | 51.96 | 42.34 | 50.87 | 37.25 |
| ex4 | 50.57 | 76.57 | 65.32 | 66.27 | 46.35 |
| ex6 | 100.68 | 140.72 | 134.25 | 122.65 | 92.30 |
| ex7 | 19.93 | 40.16 | 31.34 | 39.18 | 24.91 |
| keyb | 197.35 | 299.12 | 271.08 | 223.98 | 168.90 |
| mark1 | 94.95 | 145.93 | 123.79 | 115.15 | 87.92 |
| opus | 129.33 | 138.81 | 133.40 | 123.43 | 117.15 |
| s27 | 30.19 | 32.01 | 32.41 | 39.75 | 33.54 |
| s386 | 98.56 | 142.93 | 121.47 | 114.38 | 100.52 |
| s8 | 56.08 | 46.66 | 53.47 | 57.50 | 56.17 |
| sse | 137.10 | 157.28 | 164.41 | 169.41 | 141.18 |
| Classes 2–4 | | | | | |
| ex1 | 197.25 | 251.20 | 243.43 | 243.47 | 158.95 |
| kirkman | 200.31 | 260.53 | 212.41 | 189.58 | 162.89 |
| planet | 413.46 | 452.21 | 446.53 | 385.97 | 345.87 |
| planet1 | 413.46 | 454.98 | 450.11 | 385.97 | 345.87 |
| pma | 422.88 | 456.48 | 461.18 | 394.95 | 333.57 |
| s1 | 334.92 | 342.55 | 371.59 | 399.12 | 275.43 |
| s1488 | 489.49 | 644.39 | 630.00 | 510.60 | 392.94 |
| s1494 | 504.78 | 594.52 | 578.29 | 504.05 | 388.35 |
| s1a | 229.26 | 248.87 | 228.42 | 222.32 | 219.08 |
| s208 | 45.83 | 65.35 | 54.08 | 57.25 | 43.09 |
| styr | 408.62 | 674.21 | 593.12 | 474.11 | 340.17 |
| tma | 182.10 | 222.48 | 218.21 | 186.08 | 152.81 |
| sand | 618.95 | 782.96 | 824.52 | 719.58 | 488.38 |
| s420 | 44.05 | 53.42 | 42.93 | 45.74 | 37.98 |
| s510 | 138.52 | 153.82 | 110.52 | 103.98 | 104.11 |
| s820 | 301.86 | 333.35 | 295.98 | 286.11 | 220.07 |
| s832 | 279.98 | 333.65 | 286.71 | 261.07 | 222.45 |
| Total | 7089.45 | 8969.40 | 8543.53 | 7877.31 | 6064.86 |
| Percentage, % | 116.89 | 147.89 | 140.87 | 129.88 | 100.00 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Barkalov, A.; Titarenko, L.; Krzywicki, K.; Saburova, S. Improving Characteristics of LUT-Based Mealy FSMs with Twofold State Assignment. Electronics 2021, 10, 901. https://doi.org/10.3390/electronics10080901
