Article

Improving Characteristics of LUT-Based Three-Block Mealy FSMs’ Circuits

by Alexander Barkalov 1,2, Larysa Titarenko 1,3, Kazimierz Krzywicki 4,* and Svetlana Saburova 3
1 Institute of Metrology, Electronics and Computer Science, University of Zielona Góra, ul. Licealna 9, 65-417 Zielona Góra, Poland
2 Department of Computer Science and Information Technology, Vasyl Stus’ Donetsk National University, 600-richya str. 21, 21021 Vinnytsia, Ukraine
3 Department of Infocommunication Engineering, Faculty of Infocommunications, Kharkiv National University of Radio Electronics, Nauky Avenue 14, 61166 Kharkiv, Ukraine
4 Department of Technology, The Jacob of Paradies University, ul. Teatralna 25, 66-400 Gorzów Wielkopolski, Poland
* Author to whom correspondence should be addressed.
Electronics 2022, 11(6), 950; https://doi.org/10.3390/electronics11060950
Submission received: 17 February 2022 / Revised: 15 March 2022 / Accepted: 16 March 2022 / Published: 18 March 2022
(This article belongs to the Special Issue Feature Papers in Circuit and Signal Processing)

Abstract

One of the most important problems in FPGA-based design is reducing the amount of hardware in implemented circuits. In this paper, we discuss the implementation of Mealy finite state machines (FSMs) by circuits consisting of look-up tables (LUTs). A method is proposed to reduce the LUT count of three-block circuits of Mealy FSMs. The method is based on finding a partition of the set of internal states by classes of compatible states. To reduce the LUT count, we propose a special kind of state code, named complex state codes. A complex code includes two parts. The first part is the binary code of a state as an element of some partition class. The second part is the code of the corresponding partition class. Using complex state codes allows obtaining FPGA-based FSM circuits with exactly four logic blocks. If some conditions hold, then any FSM function from the first and second blocks is implemented by a single LUT. The third level is represented as a network of multiplexers. These multiplexers generate either additional variables encoding collections of outputs or input memory functions. The fourth level generates FSM outputs. An example of synthesis and experimental results are shown and discussed. The experiments show that the proposed approach reduces hardware compared to such methods as Auto and one-hot of Vivado, and JEDI. Further, the proposed approach produces circuits with fewer LUTs than three-level Mealy FSMs based on the joint use of several methods of structural decomposition. The experiments show that our approach reduces the LUT counts by 11 to 77 percent on average. As the complexity of an FSM increases, the gain from the application of the proposed method grows; the same is true for both the FSM performance and power consumption.

1. Introduction

The behavior of a sequential device can be represented by the model of a Mealy finite state machine (FSM) [1,2]. This stimulates constant development of various methods of designing Mealy FSM logic circuits [2,3]. As a rule, these methods are aimed at optimizing one or more basic characteristics of resulting FSM circuits [4]. There are three basic characteristics, namely: (1) the chip area occupied by an FSM circuit, (2) the operating frequency, and (3) the power consumption; however, as a rule, it is impossible to optimize these three characteristics at the same time. For example, a decrease in the required internal resources (the required chip area) is often associated with a decrease in the maximum operating frequency [2]. As is known, the occupied chip area significantly affects other characteristics of an FSM circuit [5]. At the same time, it is important that reducing the area increases the delay time of the circuit as little as possible. As is known [6], the major challenge in LUT-based FSM design is developing a low-area circuit without compromising FSM performance. In this paper, we propose a method to create Mealy FSMs whose three-level circuits are implemented using internal resources of field-programmable gate arrays (FPGAs) [7,8]. The proposed approach belongs to methods of structural decomposition [2].
Recently, more and more digital systems are implemented using FPGA chips [9]. The analysis of the VLSI market shows that Xilinx [10] is the largest manufacturer of FPGA chips. This fact explains the orientation of our article to FPGAs of Xilinx. We discuss a case when an FSM circuit is implemented using internal resources of FPGAs such as look-up table (LUT) elements, programmable flip-flops, inter-slice multiplexers, programmable interconnects, synchronization tree, and programmable input–outputs.
Our current article is devoted to improving the LUT count of three-block LUT-based Mealy FSM circuits obtained with the simultaneous usage of the replacement of FSM inputs and the encoding of collections of FSM outputs [11]. The resulting FSM circuits have three blocks of LUTs; each block has a unique system of inputs and outputs. When certain conditions are met, the circuit of some (or even all) logic blocks is synthesized using the methods of functional decomposition [12,13]. Such blocks are represented by circuits having several levels of LUTs. This leads to a significant decrease in the FSM operating frequency. Moreover, the interconnection system of a multi-level block becomes dramatically more complex, which leads to a further decrease in the FSM performance. This is why it is so important to reduce the number of levels in each logic block of FSM circuits.
The main contribution of this paper is a novel design method aimed at reducing the number of LUTs and their levels in circuits of three-block LUT-based Mealy FSMs. The reduction diminishes the total number of LUTs in an FSM circuit compared to this number for equivalent FSMs based on the functional decomposition. To apply our method, it is necessary to construct classes of compatible states. This in turn leads to an increase in the number of state variables compared to their minimum number. To reduce the number of state variables, we propose a new type of state codes. We name them complex state codes (CSC). A CSC of any state includes two parts. The first part is a code of a class of compatible states including the particular state. The second part is a code of this state as an element of a particular class. Our method produces four-block FSM circuits. In the best case, each block is represented by a single-level LUT-based circuit. As experimental results show, the proposed approach also provides the performance at the level of three-block FSMs and reduces the power consumption. These phenomena are additional positive qualities of the proposed method.
The further text of the paper includes five sections. Section 2 shows the background of LUT-based Mealy FSMs. Section 3 analyses the related works. The main idea of the proposed method is shown in Section 4. An example of a CSC-based FSM synthesis is shown in Section 5. Section 6 analyses the results of experiments. The paper ends with a short conclusion.

2. Basic Information

A Mealy FSM logic circuit can be represented by two systems of Boolean functions (SBFs) [14]. One of these SBFs represents FSM outputs connected with operational units of a particular digital system. The second SBF represents input memory functions (IMFs). The arguments of these SBFs are external FSM inputs and internal state variables. The inputs form a set $X = \{x_1, \ldots, x_L\}$; the IMFs create a set $\Phi = \{D_1, \ldots, D_R\}$. An FSM circuit is represented by the following SBFs:
$$Y = Y(T, X); \tag{1}$$
$$\Phi = \Phi(T, X). \tag{2}$$
The state variables $T_r \in T$ encode internal states from a set $A = \{a_1, \ldots, a_M\}$. To encode $M$ states, the minimum number of state variables is determined as [1]
$$R = \lceil \log_2 M \rceil. \tag{3}$$
Each state $a_m \in A$ is represented by a binary code $K(a_m)$ having $R$ bits. These codes are kept in the state code register (RG). In this article, we discuss a case when the RG has informational inputs of D type. This is the most common case [15]. The systems (1) and (2) determine the so-called P Mealy FSM [2] shown in Figure 1.
In Figure 1, the block of IMFs is implemented using SBF (2); the block of outputs is based on SBF (1). The state register has $R$ D flip-flops. The $r$-th flip-flop keeps the state variable $T_r \in T$. The pulse $Start$ clears the content of RG, that is, it loads the code of the initial state $a_1 \in A$ into RG. As a rule, the code $K(a_1)$ consists of zeros. The pulse $Clock$ marks the instant when the RG content can be changed by the current IMFs.
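As a small illustration of formula (3), the following sketch computes the minimum number of state variables and assigns consecutive binary codes so that the initial state receives the all-zero code. The function names `min_state_bits` and `binary_codes` are hypothetical, and the consecutive numbering is only one of many possible state assignments, not the encoding used later in the article.

```python
def min_state_bits(num_states: int) -> int:
    """Formula (3): R = ceil(log2(M)), computed with integer arithmetic."""
    if num_states < 1:
        raise ValueError("an FSM needs at least one state")
    return (num_states - 1).bit_length()


def binary_codes(num_states: int) -> dict[str, str]:
    """Assign consecutive R-bit codes; the initial state a1 gets all zeros.

    The ordering a1, a2, ... is an assumption made for illustration only.
    """
    width = max(1, min_state_bits(num_states))
    return {f"a{m}": format(m - 1, f"0{width}b") for m in range(1, num_states + 1)}


if __name__ == "__main__":
    print(min_state_bits(8))        # number of flip-flops in RG for M = 8
    print(binary_codes(8)["a1"])    # code of the initial state
```

A one-hot assignment would instead use $R = M$ state variables, one flip-flop per state.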
As a rule, an FSM is represented by either a state transition table (STT) [1] or a state transition graph (STG) [5]. To obtain the systems (1) and (2), it is necessary to form an FSM direct structure table (DST) [14]. In this article, we start from the STG. Next, this graph is transformed into the equivalent STT. Using the STT, we construct the DST.
An STG is a directed graph whose nodes correspond to FSM states. Interstate transitions are represented by edges of the STG. Each edge is marked by a combination of inputs causing a particular transition and by the collection of outputs (CO) generated during this transition. An STT is a representation of the STG as a list of interstate transitions. An STT includes five columns with: a current state $a_m$; a state of transition $a_T$; an input signal $X_h$, which is a conjunction of some inputs (or their complements) determining this particular transition; a CO $Y_h$ generated during this transition; and the number of the transition $h$ ($h \in \{1, \ldots, H\}$) [1].
A DST includes the columns with state codes and IMFs [14]. These columns are: the code of the current state $K(a_m)$, the code of the next state $K(a_T)$, and a collection of IMFs $D_h \subseteq \Phi$ equal to 1 to load the code of the next state into the state register RG.
In this paper, we consider a case when internal resources of FPGA chips are used for implementing SBFs (1) and (2). An FSM circuit is implemented using configurable logic blocks (CLBs) of FPGAs produced by Xilinx [10]. A circuit is represented as a network of CLBs connected with the help of a programmable routing matrix [16]. In this paper, we discuss a case when CLBs include LUTs, multiplexers, and programmable flip-flops. Using the notation of [17], we denote as $I_L$-LUT a single-output LUT with $I_L$ inputs. If a Boolean function depends on up to $I_L$ arguments, then it is represented by a single-LUT logic circuit. If the number of LUT inputs is less than the number of arguments, then a circuit has more than a single level of LUTs. To implement multi-level circuits, the methods of functional decomposition (FD) are used [18,19]. As a rule, the FD-based circuits have complicated systems of “spaghetti-type” interconnections [2].
We discuss a case when each CLB is a part of a slice [10]. The slice includes internal multiplexers. They can be used for changing the number of LUT inputs within one slice. The internal multiplexers are connected with LUTs by a system of fast intra-slice interconnections. Due to this, the delay time for 6-, 7-, and 8-input LUTs is practically the same for the SLICEL of Virtex-7 [20,21]. This approach makes it possible to flexibly adapt the LUT parameters to the characteristics of the function being implemented. For example, the SLICEL of Virtex-7 includes four 6-LUTs, 8 flip-flops, and 27 multiplexers [20]. Each 6-LUT can be used as two 5-LUTs with shared inputs. This explains the presence of eight flip-flops in each SLICEL. Using internal multiplexers allows combining two 6-LUTs into a single 7-LUT. Next, four 6-LUTs can be combined into a single 8-LUT. The control inputs of multiplexers can be used as inputs of 7- and 8-LUTs. Each SLICEL possesses special carry chains used for the organization of fast multi-bit adders. It is worth noting that these carry chains can also be used to implement arbitrary logic circuits [22,23].
In this paper, we use multiplexers to generate functions (1) and (2). We denote a multiplexer having $K$ data inputs as $K$MX. Using a single 6-LUT, we can implement a circuit of a 4MX. It has two control inputs and four data inputs. Further, we can organize an 8MX with the help of two 6-LUTs. Its circuit has only a slightly bigger delay than a circuit of a 4MX [20]. This is possible due to using fast interconnections inside a slice. If a 16MX has the control inputs $T_1, \ldots, T_4$, then its circuit includes four 6-LUTs controlled by $T_3$ and $T_4$. To implement a 32MX, two slices and inter-slice interconnections are used. As a result, a 32MX is much slower than a 16MX.
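To make the claim about the 4MX concrete, the sketch below models a LUT as a truth table and shows that a multiplexer with two control inputs and four data inputs depends on exactly six arguments, so it fits the 64 configuration bits of one 6-LUT. The helper names (`build_lut`, `lut_eval`, `mux4`) and the ordering of the LUT inputs are assumptions made for illustration; they do not describe how any vendor tool maps the function.

```python
from itertools import product


def build_lut(func, num_inputs):
    """Return the truth table of func, indexed by the input bits (MSB first)."""
    return tuple(func(*bits) for bits in product((0, 1), repeat=num_inputs))


def lut_eval(table, bits):
    """Read one configuration bit of the LUT for the given input vector."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return table[index]


def mux4(s1, s0, d0, d1, d2, d3):
    """4MX: two control inputs (s1, s0) select one of four data inputs."""
    return (d0, d1, d2, d3)[2 * s1 + s0]


# A 4MX has 2 + 4 = 6 arguments, so its truth table has 2**6 = 64 entries.
MUX4_LUT = build_lut(mux4, 6)

if __name__ == "__main__":
    print(len(MUX4_LUT))
    print(lut_eval(MUX4_LUT, (1, 0, 0, 0, 1, 0)))  # control = 10 selects d2
```

In general, a $K$MX needs $K + \lceil \log_2 K \rceil$ arguments, which is why an 8MX (11 arguments) no longer fits a single 6-LUT.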
In LUT-based FSMs, the flip-flops of RG are distributed among LUTs generating functions (2). Due to this, the RG is “hidden” inside the slices where the IMFs are generated. There are two blocks in an LUT-based P Mealy FSM (Figure 2).
In this paper, we denote as LUTer a logic block consisting of LUT-based CLBs. In the P Mealy FSM (Figure 2), LUTerY implements SBF (1) and LUTerT implements SBF (2). To control the RG, the pulses $Start$ and $Clock$ are used.
If each function from systems (1) and (2) depends on not more than $I_L$ arguments, then both blocks of the LUT-based P Mealy FSM (Figure 2) are represented by single-level circuits. For Xilinx-based solutions, an LUT has 6 inputs [10]. There is no point in increasing this value, because $I_L = 6$ provides the best balance for such LUT characteristics as the occupied chip area, performance and consumed power [16]; however, even for FSMs with average complexity [14], there could be up to 40 arguments in functions (1) and (2). Obviously, there is a distinct imbalance between such a big number of arguments in SBFs representing FSM circuits and a fairly small number of LUT inputs. This imbalance requires improving synthesis methods of LUT-based FSMs.
Denote as $N_A(f_i)$ the number of literals in a sum-of-products form of a function $f_i \in \Phi \cup Y$. If the condition
$$N_A(f_i) > I_L \tag{4}$$
holds, then it is impossible to represent the function $f_i \in \Phi \cup Y$ by a single-level circuit. In this case, it is very important to optimize the system of connections between different slices of an FSM circuit. This follows from the fact that more than 70% of the power consumption is due to the interconnections [2]. Moreover, time delays of the interconnection system are starting to play a major role in comparison with CLB delays [18]. The results of research [2] show that the optimization of interconnections leads to increasing the maximum operating frequency and reducing the power consumption of LUT-based FSM circuits. This can be performed, for example, using various methods of structural decomposition [2].

3. Related Work

There is a huge number of methods for improving characteristics of circuits targeting LUT-based FSMs. A survey of these methods can be found, for example, in [2,12]. These methods should be applied if the condition (4) holds [2]. They can improve the LUT count, the maximum operating frequency, or the power consumption [24]. Sometimes, these methods look for a solution that jointly improves more than one characteristic of an FSM circuit. In this paper, we propose a method for decreasing the number of LUTs (the LUT count) of FPGA-based Mealy FSMs.
This task can be solved using various methods of state assignment [25,26,27]. In these methods, the number of bits in the state codes ranges from the minimum determined by formula (3) to the maximum determined by the total number of states, $M$. If $R = M$, then this is a one-hot state assignment. These approaches are used in both academic and industrial CAD tools. The examples of academic systems are SIS [28] and ABC by Berkeley [29,30]. The examples of industrial systems are Vivado [31] of Xilinx and Quartus of Intel (Altera) [32].
Now, there is no single universal method of state encoding that provides the best possible characteristics of FSM circuits. The applicability of a particular method can be judged both by the required number of state variables ($R$) and by the number of FSM inputs ($L$). As follows from [33], the one-hot codes improve FSM characteristics if $R > 4$. The rather small value of $I_L$ increases the influence of the value of $L$ on the characteristics of LUT-based FSM circuits [2]. It is shown in [34] that it is better to use the codes with the minimum number of bits ($R = \lceil \log_2 M \rceil$) if $L > 10$.
So, in one case, it is better to use the one-hot state codes, and, in the other case, it is better to use the maximum binary state assignment with $R = \lceil \log_2 M \rceil$; therefore, it makes sense to compare several state assignment methods for the same FSM and find a method with the best characteristics. Due to this, we have compared the FSM circuits produced by our proposed approach with FSM circuits produced by four other methods. As a base for comparison, we use: the method of maximum state assignment Auto used in the CAD tool Vivado [31] by Xilinx; the one-hot state assignment used in Vivado; and the algorithm JEDI [28], which is one of the best methods of binary state assignment [19]. Our choice of Vivado is dictated by the fact that it operates with FPGAs of Xilinx. Further, we compared FSM circuits produced by our approach with three-block FSM circuits [11].
In this paper, we propose a method leading to four-block FSM circuits. It belongs to the methods of structural decomposition (SD) [2]. The main idea of these methods is the elimination of the direct connection between FSM outputs $y_n \in Y$ and IMFs $D_r \in \Phi$, on the one hand, and FSM inputs $x_l \in X$ and state variables $T_r \in T$, on the other hand. The SD leads to an increase in the total number of implemented functions, each of which has significantly fewer arguments than functions (1) and (2). These methods are analyzed, for example, in [2].
In [11], we have proposed an optimization method based on the combined use of two structural decomposition methods. These methods are the replacement of FSM inputs and the encoding of collections of outputs. This approach leads to so-called $MPY$ Mealy FSMs. Let us discuss these two methods.
The first method is based on the replacement of inputs $x_l \in X$ by additional variables $p_g \in P = \{p_1, \ldots, p_G\}$, where $G \ll L$. The method uses the fact that transitions from any state $a_m \in A$ depend on $L_m$ inputs, where $L_m \ll L$ [14]. In the case of LUT-based FSMs, the variables $p_g \in P$ are generated by an additional block LUTerP. This block implements the system
$$P = P(T, X). \tag{5}$$
The second method is based on the fact that only a limited set of outputs is formed during transitions from any FSM state. Each transition is accompanied by generating some CO $Y_q \subseteq Y$, where $q \in \{1, \ldots, Q\}$. Each CO $Y_q \subseteq Y$ is encoded by a binary code $K(Y_q)$ having $R_Y$ bits, where
$$R_Y = \lceil \log_2 Q \rceil. \tag{6}$$
To encode these COs, additional variables $z_r \in Z = \{z_1, \ldots, z_{R_Y}\}$ are used.
In $MPY$ FSMs, a special block LUTerY generates FSM outputs as functions
$$Y = Y(Z). \tag{7}$$
The variables $z_r \in Z$ and IMFs are generated by a block LUTerTZ:
$$Z = Z(T, P); \tag{8}$$
$$\Phi = \Phi(T, P). \tag{9}$$
So, the structural diagram of an LUT-based $MPY$ Mealy FSM includes three blocks connected in series (Figure 3).
There is a hidden register RG inside the block LUTerTZ. This explains why the pulses $Clock$ and $Start$ enter the block LUTerTZ. Obviously, the informational inputs of D flip-flops are connected with IMFs $D_r \in \Phi$.
As shown in [11], such joint usage of two methods of SD leads to a significant decrease in the LUT count compared with other investigated methods; however, the gain in LUTs is significantly reduced if the condition (4) is met for the functions $f_i \in \Phi \cup Z$, where these systems are represented as (8) and (9). In this case, the circuit of LUTerTZ is designed using the methods of functional decomposition [13]. As a result, there are several levels of LUTs in the circuit of LUTerTZ, with all the negative consequences.
The proposed method is an evolution of methods of twofold state assignment [17]. These methods are based on constructing a partition $\Pi_A$ of the set of states by the classes of compatible states: $\Pi_A = \{A_1, \ldots, A_J\}$. Each state $a_m \in A$ determines a set $X(a_m)$ including FSM inputs causing transitions from this state. Inside each class, the states are encoded using maximum binary codes. If a set $A_j \in \Pi_A$ includes $M_j$ elements, then
$$R_j = \lceil \log_2 (M_j + 1) \rceil \tag{10}$$
variables are enough to encode these states by maximum binary codes. One additional code corresponds to the relation $a_m \notin A_j$.
We use the same definition of compatible states as the one proposed in [17]. The states $a_m \in A_j$ are compatible if
$$R_j + L_j \le I_L. \tag{11}$$
In (11), the symbol $L_j$ stands for the number of inputs determining transitions from states $a_m \in A_j$. These inputs form a set of inputs $X^j \subseteq X$.
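The compatibility test of (10) and (11) is easy to check mechanically. The sketch below is a minimal illustration, assuming that a candidate class is given as a list of the input sets $X(a_m)$ of its states; the function names and the small example class are hypothetical and are not taken from [17].

```python
def class_code_bits(num_states_in_class: int) -> int:
    """Formula (10): R_j = ceil(log2(M_j + 1)).

    The extra code marks the situation 'current state is not in this class'.
    For a positive integer M, ceil(log2(M + 1)) equals M.bit_length().
    """
    return num_states_in_class.bit_length()


def is_compatible_class(input_sets, lut_inputs: int) -> bool:
    """Condition (11): R_j + L_j <= I_L for a candidate class.

    input_sets holds one set X(a_m) per state of the candidate class.
    """
    r_j = class_code_bits(len(input_sets))
    l_j = len(set().union(*input_sets))   # inputs used by any state of the class
    return r_j + l_j <= lut_inputs


if __name__ == "__main__":
    candidate = [{"x1", "x2"}, {"x3"}]    # hypothetical two-state class
    print(is_compatible_class(candidate, lut_inputs=6))
    print(is_compatible_class(candidate, lut_inputs=4))
```

When the condition holds, every partial function of the class depends on at most $I_L$ arguments and therefore fits a single LUT.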
To create the partition $\Pi_A = \{A_1, \ldots, A_J\}$ with the minimum number of classes, $J$, we use the method from [17]. Each class $A_j \in \Pi_A$ consists of compatible states. Each class $A_j \in \Pi_A$ determines local sets of inputs $X^j \subseteq X$ and outputs $Y^j \subseteq Y$. If outputs $y_n \in Y$ are generated during transitions from states $a_m \in A_j$, then they are included into the set $Y^j \subseteq Y$.
In the case of twofold state assignment, each state $a_m \in A$ has two codes [17]. The code $K(a_m)$ determines the state $a_m \in A$ as an element of the set $A$. The extended state code (ESC) [35] $C(a_m)$ determines this very state as an element of a compatibility class $A_j \in \Pi_A$. Each class $A_j \in \Pi_A$ determines a collection of partial functions generated by a block LUTer$j$. These partial functions are partial outputs $y_n \in Y^j$ and partial IMFs $D_r \in \Phi^j$. The set $\Phi^j \subseteq \Phi$ includes IMFs generated during the transitions from the states $a_m \in A_j$. These partial functions are denoted as $y_n^j$ and $D_r^j$. Because of (11), each partial function is represented by a single LUT.
An ESC consists of $J$ fields. The $j$-th field has $R_j$ bits and corresponds to the class $A_j \in \Pi_A$. So, there are $R_E = R_1 + R_2 + \ldots + R_J$ bits in ESCs. If the relation $a_m \in A_j$ holds, then only bits from the $j$-th field differ from zeros.
In [17], it is proposed to produce ESCs by transforming the state codes $K(a_m)$ kept in RG. Unfortunately, the transformation requires an additional block of CLBs, which consumes some internal resources of a chip and decreases the performance of the resulting FSM circuit. An improvement of this approach is proposed in [35]. In this case, the codes $C(a_m)$ are generated in parallel with FSM outputs. This approach allows eliminating the block of code transformation; however, it also has some drawbacks. Firstly, there are $R_E$ flip-flops in the RG. Secondly, the total number of LUTs generating IMFs is increased by $R_E - R$ compared to the previous approach.
The experiments [35] show that using only ESCs allows increasing performance by up to 15.9% compared with equivalent FSMs based on the twofold state assignment. The growth of operating frequency is accompanied by a slight growth in the LUT count (up to 7.7%). In this paper, we propose a method of reducing the LUT count in $MPY$ Mealy FSMs. The method is based on replacing ESCs by the complex state codes proposed in this paper.

4. Main Idea of the Proposed Method

The proposed method is based on finding a partition $\Pi_C = \{A_1, \ldots, A_{J_C}\}$ of the set $A$ by $J_C$ classes of compatible states. The same state variables are used for encoding states from different compatibility classes. The states are encoded by codes $C(a_m)$ using $R_A$ state variables:
$$R_A = \max(\lceil \log_2 M_1 \rceil, \ldots, \lceil \log_2 M_{J_C} \rceil). \tag{12}$$
A code $C(a_m)$ determines the state $a_m \in A$ as an element of a particular class of $\Pi_C$. The classes $A_j \in \Pi_C$ are encoded by class codes $K(A_j)$. These codes include $R_C$ bits:
$$R_C = \lceil \log_2 J_C \rceil. \tag{13}$$
We propose to represent FSM states $a_m \in A$ by the complex state codes denoted as $CSC(a_m)$. For any state $a_m \in A_j$, a CSC is a concatenation of the class code $K(A_j)$ and the state code $C(a_m)$:
$$CSC(a_m) = K(A_j) * C(a_m). \tag{14}$$
In (14), the sign “*” denotes the concatenation of the codes. There are $R_B$ state variables in CSCs. The value of $R_B$ is determined as
$$R_B = R_C + R_A. \tag{15}$$
To encode the classes, we use the variables from the set $T_C = \{T_1, \ldots, T_{R_C}\}$. To encode states as elements of classes $A_j \in \Pi_C$, we use $R_A$ variables from the set $T_A = \{T_{R_C+1}, \ldots, T_{R_B}\}$. Together, these sets form a set $T = T_C \cup T_A$ having $R_B$ elements.
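Formulas (12) to (15) can be illustrated with a short sketch that builds complex state codes for a given partition. The function name `complex_state_codes` is hypothetical, and numbering classes and states in the order they are listed is an assumption made for illustration; the article allows an arbitrary assignment inside these constraints.

```python
def ceil_log2(n: int) -> int:
    """ceil(log2(n)) for a positive integer n, using integer arithmetic."""
    return (n - 1).bit_length()


def complex_state_codes(classes):
    """Build CSC(a_m) = K(A_j) * C(a_m) for a partition given as lists of states.

    Returns (codes, R_C, R_A). Classes and states are numbered in list order,
    which is only one possible encoding.
    """
    r_c = ceil_log2(len(classes))                   # formula (13)
    r_a = max(ceil_log2(len(c)) for c in classes)   # formula (12)
    codes = {}
    for j, states in enumerate(classes):
        class_code = format(j, f"0{r_c}b") if r_c else ""
        for i, state in enumerate(states):
            state_code = format(i, f"0{r_a}b") if r_a else ""
            codes[state] = class_code + state_code  # concatenation (14)
    return codes, r_c, r_a


if __name__ == "__main__":
    partition = [["a1", "a2", "a3", "a4"], ["a5", "a6", "a7", "a8"]]
    codes, r_c, r_a = complex_state_codes(partition)
    print(r_c + r_a)                 # R_B from formula (15)
    print(codes["a1"], codes["a8"])
```

For a partition of eight states into two classes of four states each, the sketch yields $R_C = 1$, $R_A = 2$ and $R_B = 3$, so no extra flip-flops are needed compared with the minimum $R = 3$.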
The proposed method of state assignment is aimed at reducing the LUT count for LUT-based circuits of $MPY$ FSMs. The method is based on the joint application of: (1) the replacement of FSM inputs; (2) the encoding of collections of outputs; (3) the encoding of states by complex state codes. As a result, we propose to replace $MPY$ FSMs by $MP_CY$ FSMs. The subscript “C” means that the complex state codes are used in the $MPY$ FSM. The structural diagram of an $MP_CY$ FSM is shown in Figure 4.
There are four levels of logic blocks in $MP_CY$ FSMs. The first level is represented by LUTerP. This block implements the SBF (5).
The second level includes $J_C$ blocks LUTer$j$, where $j \in \{1, \ldots, J_C\}$. A class $A_j \in \Pi_C$ determines three sets of variables. The set $P^j \subseteq P$ includes additional variables $p_g \in P$ determining transitions from the states $a_m \in A_j$. The set $\Phi^j$ contains IMFs generated during the transitions from the states $a_m \in A_j$. The set $Z^j$ consists of the variables $z_r \in Z$ equal to 1 in codes of COs produced during the transitions from the states $a_m \in A_j$. Each block LUTer$j$ produces the following partial functions:
$$\Phi^j = \Phi^j(T_A, P^j); \tag{16}$$
$$Z^j = Z^j(T_A, P^j). \tag{17}$$
The block LUTerTZ represents the third logic level. It consists of $R_Y + R_B$ multiplexers generating IMFs $D_r \in \Phi$ and additional variables $z_r \in Z$. The data inputs of these multiplexers are the partial functions (16) and (17). To select a particular partial function, we use the class variables $T_r \in T_C$. So, the multiplexers generate the following SBFs:
$$D_r = D_r(T_C, D_r^1, \ldots, D_r^{J_C}) \quad (r \in \{1, \ldots, R_B\}); \tag{18}$$
$$z_r = z_r(T_C, z_r^1, \ldots, z_r^{J_C}) \quad (r \in \{1, \ldots, R_Y\}). \tag{19}$$
The functions (18) enter the inputs of the flip-flops that make up the hidden register RG. Due to this, the control signals $Clock$ and $Start$ enter this block.
The fourth logic level is represented by the block LUTerY. It implements the SBF (7).
So, there are four levels of logic blocks in the circuits of $MP_CY$ Mealy FSMs. In the best case, each block is represented by a single-level LUT-based circuit.
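The role of the third level can be sketched as follows: for each function, a multiplexer controlled by the class variables passes on the partial function produced by the block of the currently active class. The function name `select_partial` and the convention that class $A_j$ has the binary code of $j - 1$ are assumptions made for illustration only.

```python
def select_partial(class_bits, partials):
    """Model of one third-level multiplexer from (18) or (19).

    class_bits: values of the class variables T_C, most significant bit first.
    partials:   values of the partial functions, ordered by class code
                (assumed here: class A_j has the binary code of j - 1).
    """
    index = 0
    for bit in class_bits:
        index = (index << 1) | bit
    return partials[index]


if __name__ == "__main__":
    # Two classes (R_C = 1): the class variable T1 chooses between the
    # partial functions D_r^1 and D_r^2 of some IMF D_r.
    d_r_partials = [0, 1]
    print(select_partial((1,), d_r_partials))
```

Only the block of the class that contains the current state contributes to the result; the outputs of the other second-level blocks are ignored by the multiplexer.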
In this paper, we propose a synthesis method for LUT-based $MP_CY$ Mealy FSMs. We start the synthesis process from an FSM state transition graph. The proposed method includes the following steps:
(1) Creating the state transition table of the Mealy FSM.
(2) Constructing the partition $\Pi_C$ of the set of states by classes of compatible states.
(3) Encoding of FSM states by complex state codes $CSC(a_m)$.
(4) Executing the replacement of FSM inputs by additional variables $p_g \in P$.
(5) Creating SBF (5) representing LUTerP.
(6) Encoding of collections of outputs by codes $K(Y_q)$.
(7) Creating SBF (7) representing LUTerY.
(8) Creating the direct structure table of the $MP_CY$ Mealy FSM.
(9) Creating tables of blocks of partial functions LUTer1, …, LUTer$J_C$.
(10) Creating SBFs (16) and (17) representing the second level of the $MP_CY$ Mealy FSM logic circuit.
(11) Creating the table of LUTerTZ.
(12) Creating SBFs (18) and (19) representing the third level of the logic circuit.
(13) Implementing the LUT-based circuit of the $MP_CY$ Mealy FSM using internal resources of a particular FPGA chip.
The partition $\Pi_C$ is created using the method of [17]. This approach allows minimizing LUT counts in the resulting Mealy FSM circuits. If it is possible, each class of compatible states should include the maximum possible number of states. This helps to minimize the number of classes (and the number of blocks of the second level of logic). In turn, this optimizes the number of LUTs in the circuit of LUTerTZ. Any multiplexer from this block is implemented as a single LUT if the following condition holds:
$$R_C + J_C \le I_L. \tag{20}$$
Even if condition (20) is violated, the multiplexers could still be implemented as single-level circuits. This is possible if the number of partial functions for a given function $f_i \in \Phi \cup Z$ does not exceed the value $I_L - R_C$. Otherwise, the internal multiplexers of CLBs are used for generating functions (18) and (19).
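Condition (20) and its relaxed per-function variant can be written as a single check. The sketch below is only an illustration; the function name `mux_fits_single_lut` and its optional `num_partials` parameter are hypothetical.

```python
def mux_fits_single_lut(num_classes: int, lut_inputs: int, num_partials=None) -> bool:
    """Check whether a third-level multiplexer fits one LUT.

    With num_partials=None this is condition (20): R_C + J_C <= I_L.
    Otherwise it is the relaxed test for one function: the number of its
    partial functions must not exceed I_L - R_C.
    """
    r_c = (num_classes - 1).bit_length()          # R_C = ceil(log2(J_C))
    data_inputs = num_classes if num_partials is None else num_partials
    return r_c + data_inputs <= lut_inputs


if __name__ == "__main__":
    print(mux_fits_single_lut(num_classes=2, lut_inputs=5))
    print(mux_fits_single_lut(num_classes=5, lut_inputs=6))
    print(mux_fits_single_lut(num_classes=5, lut_inputs=6, num_partials=3))
```

The last call models a function that is generated only in three of five classes: condition (20) fails for the block as a whole, but this particular multiplexer still needs only $3 + 3 = 6$ LUT inputs.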

5. Example of Synthesis

We use the symbol $MP_CY(S_a)$ to show that the model of the $MP_CY$ Mealy FSM (Figure 4) is used to implement the circuit of an FSM $S_a$. Consider an FSM $S_0$ represented by its STG (Figure 5). Let us synthesize the circuit of the Mealy FSM $MP_CY(S_0)$ using 5-LUTs.
Step 1. The $h$-th edge of an STG is transformed into a row of an STT [14]. There are 19 edges in the STG (Figure 5). So, there should be $H = 19$ rows in the corresponding STT. The transformation is executed in a trivial way [1]. Table 1 is the resulting STT of FSM $S_0$. The following sets can be derived from Table 1: the set of states $A = \{a_1, \ldots, a_8\}$, the set of inputs $X = \{x_1, \ldots, x_{10}\}$, and the set of outputs $Y = \{y_1, \ldots, y_7\}$. This gives the following parameters: $M = 8$, $L = 10$, and $N = 7$.
Step 2. Using the methods of [17], we can obtain the partition $\Pi_C = \{A_1, A_2\}$ with $J_C = 2$. The classes of this partition are $A_1 = \{a_1, \ldots, a_4\}$ and $A_2 = \{a_5, \ldots, a_8\}$. So, $M_1 = M_2 = 4$. Using (12), we obtain the value $R_A = \max(\lceil \log_2 M_1 \rceil, \lceil \log_2 M_2 \rceil) = 2$. Using (13), we obtain the value $R_C = 1$. Now, we have the sets $T = \{T_1, T_2, T_3\}$, $T_C = \{T_1\}$, and $T_A = \{T_2, T_3\}$.
Step 3. As is known [2], the state codes do not affect the number of LUTs in circuits of FSMs based on twofold or extended state codes [35]. So, the states can be encoded in an arbitrary way. For our example, one of the possible outcomes of the state assignment is shown in Figure 6.
The following class and state codes can be found from Figure 6: $K(A_1) = 0$ and $K(A_2) = 1$, $C(a_1) = C(a_5) = 00$, $\ldots$, $C(a_4) = C(a_8) = 11$. Using the codes of classes of compatible states $K(A_j)$ and state codes $C(a_m)$ gives the following complex state codes: $CSC(a_1) = 000$, $CSC(a_2) = 001$, $\ldots$, $CSC(a_4) = 011$, and $CSC(a_8) = 111$.
Step 4. To execute the replacement, we should find the minimum number of additional variables, $G$. To do it, we use the methods from [14]. It is necessary to analyze the sets $X(a_m) \subseteq X$ including FSM inputs which determine the transitions from states $a_m \in A$ [2]. These sets can be found using either the STG (Figure 5) or the STT (Table 1). In the discussed case, there are the following sets: $X(a_1) = \{x_1, x_2\}$, $X(a_2) = \{x_3, x_4\}$, $X(a_3) = \{x_6\}$, $X(a_4) = \{x_5\}$, $X(a_5) = \{x_5, x_7\}$, $X(a_6) = \emptyset$, $X(a_7) = \{x_8, x_9\}$, and $X(a_8) = \{x_{10}\}$. If $L(a_m)$ is the number of elements in the set $X(a_m) \subseteq X$, then $L(a_1) = L(a_2) = L(a_5) = L(a_7) = 2$, $L(a_3) = L(a_4) = L(a_8) = 1$, $L(a_6) = 0$.
The value of $G$ is equal to the maximum value of $L(a_m)$. Obviously, $G = 2$. So, $G = 2$ additional variables are enough to replace $L = 10$ inputs: $P = \{p_1, p_2\}$.
The columns of the table of inputs’ replacement are marked by FSM states $a_m \in A$; the rows are marked by additional variables $p_g \in P$. If an input $x_l \in X$ is replaced by a variable $p_g \in P$ in a state $a_m \in A$, then this input is written at the intersection of the corresponding column and row. Using the methods from [14] gives the table of replacement (Table 2).
Step 5. Table 2 is the basis for finding SBF (5). The following SBF can be derived from Table 2:
$$p_1 = A_1 x_1 \vee A_2 x_3 \vee A_3 x_6 \vee A_5 x_7 \vee A_7 x_8 \vee A_8 x_{10}; \qquad p_2 = A_1 x_2 \vee A_2 x_4 \vee A_4 x_5 \vee A_5 x_5 \vee A_7 x_9. \tag{21}$$
In (21), the symbol $A_m$ stands for the conjunction of state variables corresponding to the state $a_m \in A$. Obviously, each equation in (21) can be implemented as 8MX.
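System (21) behaves like a pair of multiplexers selected by the current state: in each state, $p_1$ and $p_2$ simply pass through the inputs assigned to them. The sketch below models this behavior; the dictionary mirrors the terms of (21), while the function name and the convention that an unassigned variable evaluates to 0 are assumptions made for illustration.

```python
# State -> input routed to each additional variable, read off the terms of (21).
REPLACEMENT = {
    "p1": {"a1": "x1", "a2": "x3", "a3": "x6", "a5": "x7", "a7": "x8", "a8": "x10"},
    "p2": {"a1": "x2", "a2": "x4", "a4": "x5", "a5": "x5", "a7": "x9"},
}


def additional_variables(state, inputs):
    """Evaluate p1 and p2 in the given state.

    `inputs` maps input names ("x1".."x10") to 0 or 1. A variable with no
    input assigned in this state evaluates to 0 (no term of (21) is active).
    """
    values = {}
    for variable, column in REPLACEMENT.items():
        if state in column:
            values[variable] = inputs.get(column[state], 0)
        else:
            values[variable] = 0
    return values
```

For instance, in state `a4` only $p_2$ carries information (the value of $x_5$), and in state `a6` neither variable is used because the transitions from $a_6$ are unconditional.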
Step 6. There are $Q = 9$ different collections of outputs (COs) in the STT (Table 1). They are the following: $Y_1 = \emptyset$, $Y_2 = \{y_1, y_2\}$, $Y_3 = \{y_5\}$, $Y_4 = \{y_4\}$, $Y_5 = \{y_3, y_6\}$, $Y_6 = \{y_2, y_5\}$, $Y_7 = \{y_4, y_7\}$, $Y_8 = \{y_3\}$, $Y_9 = \{y_3, y_7\}$.
To optimize the circuit of LUTerY, it is necessary to encode the COs in a way that minimizes the total number of literals in SBF (7) [2]. Each literal determines an interconnection between LUTerTZ and LUTerY. Using the approach from [2], we can encode the COs as shown in Figure 7.
Step 7. Using contents of COs and their codes (Figure 7) gives the following SBF:
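$$\begin{aligned}
y_1 &= Y_2 = z_2 \bar{z}_3; & y_5 &= Y_3 \vee Y_6 = \bar{z}_1 \bar{z}_2 z_4; \\
y_2 &= Y_2 \vee Y_6 = \bar{z}_3 z_4; & y_6 &= Y_5 = \bar{z}_1 z_2 z_3; \\
y_3 &= Y_5 \vee Y_8 \vee Y_9 = z_3 \bar{z}_4; & y_7 &= Y_7 \vee Y_9 = z_1 \bar{z}_2. \\
y_4 &= Y_4 \vee Y_7 = z_1 z_4; & &
\end{aligned} \tag{22}$$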
There are 16 literals in (22). The maximum number of literals is equal to $N \cdot R_Y = 7 \cdot 4 = 28$. So, due to the encoding shown in Figure 7, the number of literals (and interconnections) has almost halved.
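The literal count and the decoding performed by LUTerY can be checked with a short sketch. Each output in (22) is represented as a list of literals, where a literal is a pair (index of $z_r$, required value). The representation and the function name are illustrative assumptions; only the code $K(Y_2) = 0101$ is taken from the text, since Figure 7 is not reproduced here.

```python
# Each output of (22) as a conjunction of literals: (index r of z_r, required value).
SBF_22 = {
    "y1": [(2, 1), (3, 0)],
    "y2": [(3, 0), (4, 1)],
    "y3": [(3, 1), (4, 0)],
    "y4": [(1, 1), (4, 1)],
    "y5": [(1, 0), (2, 0), (4, 1)],
    "y6": [(1, 0), (2, 1), (3, 1)],
    "y7": [(1, 1), (2, 0)],
}


def decode_outputs(code):
    """Return the outputs equal to 1 for a CO code given as the string z1z2z3z4."""
    z = {r: int(bit) for r, bit in enumerate(code, start=1)}
    return {y for y, term in SBF_22.items()
            if all(z[r] == value for r, value in term)}


LITERALS = sum(len(term) for term in SBF_22.values())  # literals actually used
MAX_LITERALS = len(SBF_22) * 4                          # N outputs times R_Y variables
```

Decoding the code `"0101"` returns the outputs of the collection $Y_2$, and the two constants reproduce the counts of 16 and 28 literals.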
Step 8. The DST of the MPY Mealy FSM is constructed using the initial STT, the codes of states and COs, and the table of replacement of inputs. A DST includes the following columns: $a_m$, $K(a_m)$, $a_T$, $K(a_T)$, $P_h$, $\Phi_h$, $Z_h$, $h$. The columns of state codes include the codes from Figure 6. The column $P_h$ is constructed using the initial STT and the table of replacement of inputs (Table 2). The column $\Phi_h$ includes the IMFs equal to 1 for loading the code $K(a_T)$ into the state register. The column $Z_h$ includes the variables $z_r \in Z$ equal to 1 in the code $K(Y_q)$ of the CO written in the $h$-th row of the STT. This column is constructed using the initial STT and the codes of COs (Figure 7).
In the discussed case, the DST is represented by Table 3. Let us analyze the first row of Table 3. There is the input x 1 in this row of Table 1. As follows from Table 2, the input x 1 is replaced by the additional variable p 1 in the state a 1 . For this row, there is the following relation: a T = a 4 . As follows from Figure 6, there is K ( a 4 ) = 011 . Due to this, column Φ h of Table 3 contains D 2 = D 3 = 1 in row h = 1 . In row 1 of Table 1, there is the CO Y 2 in column Y h . As follows from Figure 7, there is K ( Y 2 ) = 0101 . Due to this, column Z h of Table 3 contains z 2 = z 4 = 1 in row h = 1 . All other rows of Table 3 are constructed in the same way.
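Filling the columns $\Phi_h$ and $Z_h$ amounts to listing the positions of ones in a binary code. A minimal sketch of that rule, using a hypothetical helper name and the two codes quoted for the first row of Table 3:

```python
def ones_in_code(code, prefix):
    """List the variables (prefix + 1-based position) that equal 1 in a code."""
    return [f"{prefix}{position}"
            for position, bit in enumerate(code, start=1)
            if bit == "1"]


# First row of the DST: transition into a4 with the collection of outputs Y2.
phi_row_1 = ones_in_code("011", "D")   # code K(a4)
z_row_1 = ones_in_code("0101", "z")    # code K(Y2)
```

The first call gives the IMFs $D_2$ and $D_3$, and the second gives the variables $z_2$ and $z_4$, matching the row described above.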
Step 9. The tables of the blocks generating partial functions are constructed using the classes $A_j \in \Pi_C$, the DST of the MPY Mealy FSM, and the codes $C(a_m)$ and $CSC(a_m)$. For the discussed example, $J_C = 2$. So, there are two blocks (LUTer1 and LUTer2) generating the partial functions (16) and (17). The transitions from the states of the class $A_1 \in \Pi_C$ are represented by Table 4; the transitions from the states of the class $A_2 \in \Pi_C$ are represented by Table 5.
There is a transparent correspondence between Table 3, on the one hand, and tables of L U T e r 1 (Table 4) and L U T e r 2 (Table 5), on the other hand. There are H 1 = 10 rows in Table 4 and H 2 = 9 rows in Table 5. Obviously, the following equality takes place: H 1 + H 2 = H = 19 .
Step 10. The following sets can be found from Table 4 and Table 5: P 1 = P 2 = P , Φ 1 = Φ 2 = Φ , and Z 1 = Z 2 = Z . It means that each L U T e r j contains R B + R Y = 7 5-LUTs. Together, this gives 14 5-LUTs in the mutual circuit of L U T e r 1 and L U T e r 2 .
The functions (16) and (17) are constructed in a trivial way. For example, the following SBFs of the partial functions $D_1^1$, $D_1^2$, $z_1^1$, and $z_1^2$ can be derived from Table 4 and Table 5:
$$D_1^1 = \bar{T}_2 T_3 p_1 \vee \bar{T}_2 T_3 \bar{p}_1 p_2 \vee T_2 \bar{T}_3 p_1; \qquad D_1^2 = \bar{T}_2 \bar{T}_3 \bar{p}_2 \vee \bar{T}_2 T_3 \vee T_2 \bar{T}_3 \vee T_2 T_3 p_1.$$
$$z_1^1 = \bar{T}_2 \bar{T}_3 \bar{p}_1 \bar{p}_2 \vee T_2 \bar{T}_3 p_1; \qquad z_1^2 = \bar{T}_2 \bar{T}_3 \bar{p}_2 \vee T_2 T_3 p_2.$$
All other partial functions are created in the same manner.
Step 11. The table of LUTerTZ is constructed using the sets $\Phi^j$ and $Z^j$, where $j \in \{1, \ldots, J_C\}$. The table contains the columns "Function" and "j". For our example, this block is represented by Table 6.
For example, the IMF $D_1$ appears in both tables. Due to this, there are ones in the columns with $j = 1$ and $j = 2$. All other rows are filled in using a similar analysis.
Step 12. The SBFs (18) and (19) representing the third level of the logic circuit are constructed in a trivial way. They include two components: (1) conjunctions of variables $T_r \in T_C$ corresponding to class codes and (2) the corresponding partial functions. For example, the functions $D_1$ and $z_1$ are represented by the following SBF:
$$D_1 = \bar{T}_1 D_1^1 \vee T_1 D_1^2; \qquad z_1 = \bar{T}_1 z_1^1 \vee T_1 z_1^2.$$
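The composition in Step 12 is a two-way multiplexer: the class variable $T_1$ selects which partial function drives $D_1$. The sketch below illustrates this for $D_1$ only, with the partial functions transcribed from the example above. The function names are hypothetical, and the selection rule assumes the class codes $K(A_1) = 0$ and $K(A_2) = 1$ stated in Step 3.

```python
def d1_class1(t2, t3, p1, p2):
    """Partial function of D1 for the class A1 (transcribed from the example)."""
    return int((not t2 and t3 and p1)
               or (not t2 and t3 and not p1 and p2)
               or (t2 and not t3 and p1))


def d1_class2(t2, t3, p1, p2):
    """Partial function of D1 for the class A2 (transcribed from the example)."""
    return int((not t2 and not t3 and not p2)
               or (not t2 and t3)
               or (t2 and not t3)
               or (t2 and t3 and p1))


def d1(t1, t2, t3, p1, p2):
    """D1 as a multiplexer: T1 = 0 selects class A1, T1 = 1 selects class A2."""
    if t1:
        return d1_class2(t2, t3, p1, p2)
    return d1_class1(t2, t3, p1, p2)
```

Because each partial function depends on only four variables ($T_2$, $T_3$, $p_1$, $p_2$), it fits a single 5-LUT, and the selecting function needs just three inputs.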
All other functions (18) and (19) are constructed in the same manner.
Step 13. To implement the LUT-based circuit of the Mealy FSM MPCY($S_0$), it is necessary to use some CAD tools. For FPGAs of the Virtex-7 family, the Vivado system [31] should be used; for our simple example, we can design this circuit manually.
As follows from (21), there are nine literals in the sum-of-products of p 1 and eight literals in the sum-of-products of p 2 . The circuit should be implemented using LUTs with I L = 5 inputs. So, the condition (4) holds. To implement the circuit of L U T e r P , it is necessary to apply the methods of FD [12,13]. As a result, we obtain a two-level circuit of L U T e r P including six LUTs.
In the discussed case, each function $f_i \in Z \cup \Phi$ is represented by $J_C = 2$ partial functions. Further, the condition (11) holds for the blocks of $\Pi_C$. Due to this, $2(R_B + R_Y) = 14$ 5-LUTs are enough to implement the circuits of LUTer1 and LUTer2. Since $R_B + R_Y = 7$, there are seven LUTs in the circuit of LUTerTZ. As follows from (22), there are seven LUTs in the circuit of LUTerY.
So, there are 34 5-LUTs in the circuit of Mealy FSM M P C Y ( S 0 ) . This circuit has five levels of LUTs (Figure 8).
In this circuit, LUTerP is represented by LUT1–LUT6 and has two levels of LUTs, as shown in Figure 8. The BusXT delivers the inputs $x_l \in X$ and the state variables $T_r \in T_A$ for generating the additional variables $p_g \in P$. These variables enter BusPT to be transformed into the partial functions (16) and (17). The transformation is executed by LUTer1 and LUTer2. These blocks include the elements LUT7–LUT20. The fourth level of the FSM circuit is represented by LUT21–LUT27. The IMFs are generated by LUT21–LUT23. The outputs of these LUTs are connected to the flip-flops implementing the register RG. The flip-flops are controlled by the pulses Start and Clock. The variables $z_r \in Z$ are generated by LUT24–LUT27. The outputs of LUTerTZ form the BusTZ. Finally, level five consists of seven LUTs (LUT28–LUT34) creating the circuit of LUTerY.
We compared the characteristics of the 5-LUT-based circuits of M P C Y ( S 0 ) and M P Y ( S 0 ) FSMs. In both cases, there is the same number of flip-flops in the state register ( R = R B = 3 ). In the case of M P Y ( S 0 ) , there are six LUTs in the L U T e r P and seven LUTs in L U T e r Y . There are two levels of LUTs in the circuit of L U T e r P . So, these subcircuits are the same for M P C Y ( S 0 ) and M P Y ( S 0 ) . There are two levels of LUTs in the circuit of L U T e r T Z of M P Y ( S 0 ) . This block’s circuit includes 24 LUTs. So, there are 6 + 24 + 7 = 37 LUTs in the circuit of M P Y ( S 0 ) .
Thus, for the FSM $S_0$, the transition from the model MPY($S_0$) to the model MPCY($S_0$) reduces the LUT count by a factor of 1.088 (from 37 to 34 LUTs). Note that both circuits have the same number of logical levels; therefore, the model proposed in this article allows reducing the number of LUTs without reducing the operating frequency compared to the circuit of the equivalent MPY($S_0$) FSM. In the next Section, we compare some FSM models with the one proposed in this article.
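The LUT counts of the example can be recomputed from the block sizes quoted in Step 13. The sketch below is plain arithmetic with hypothetical function and parameter names; it assumes, as in the example, that every partial function and every function of LUTerTZ and LUTerY fits a single LUT.

```python
def lut_count_mpcy(luter_p, classes, r_b, r_y, outputs):
    """LUT count of the MPCY circuit, assuming one LUT per function."""
    functions = r_b + r_y                  # IMFs D_r plus variables z_r
    partial_blocks = classes * functions   # circuits of LUTer1 .. LUTerJ
    luter_tz = functions                   # multiplexing level
    luter_y = outputs                      # one LUT per FSM output
    return luter_p + partial_blocks + luter_tz + luter_y


def lut_count_mpy(luter_p, luter_tz, luter_y):
    """LUT count of the MPY circuit from its three block sizes."""
    return luter_p + luter_tz + luter_y


MPCY_LUTS = lut_count_mpcy(luter_p=6, classes=2, r_b=3, r_y=4, outputs=7)
MPY_LUTS = lut_count_mpy(luter_p=6, luter_tz=24, luter_y=7)
RATIO = MPY_LUTS / MPCY_LUTS
```

The ratio of the two totals is the reduction factor reported for the FSM $S_0$.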

6. Experimental Results

In this Section, we show the results of experiments conducted to compare the characteristics of MPCY Mealy FSMs with the characteristics of FSM circuits based on some other models. To conduct the experiments, we use: (1) the internal resources of Virtex-7; (2) the benchmark FSMs from the library [36]; (3) the industrial package Vivado [31]. The library [36] includes 48 benchmarks represented in the KISS2 format. The benchmarks have a wide range of basic characteristics (numbers of states, inputs, and outputs). They are used very often by different researchers to compare the area and time characteristics of FSMs obtained using various synthesis methods. The characteristics of the benchmarks are shown in Table 7.
We executed the experiments using a personal computer with the following characteristics: CPU: Intel Core i7 6700K 4.2@4.4 GHz; memory: 32 GB RAM 2400 MHz CL15. Further, we used the Virtex-7 VC709 Evaluation Platform (xc7vx690tffg1761-2) [37] and the CAD tool Vivado v2019.1 (64-bit) [31]. For FPGAs of Virtex-7, $I_L = 6$. The results of the experiments are taken from the reports produced by Vivado. To enter Vivado, we use the CAD tool K2F [2].
We compared three basic characteristics of resulting FSM circuits. These parameters are: (1) the LUT count; (2) the time of cycle; (3) the power consumption. In addition, two integral characteristics were investigated, namely: (1) the area-time products and (2) the area-time-power products. To conduct the experiments, five FSM models were used. They are: (1) Auto of Vivado (it uses binary state codes); (2) one-hot of Vivado; (3) JEDI; (4) M P Y -based FSMs; (5) M P C Y -based FSMs proposed in this article. Obviously, the first three methods are based on the model of P FSM shown in Figure 2.
Based on the methodology of [35], we divide the benchmark FSMs [36] into five categories. To divide the benchmarks, we use the relation between the values of $R + L$ and $I_L$, where $I_L = 6$ for the LUTs of Virtex-7.
The benchmarks belong to the category of trivial FSMs (category 0) if the following condition holds: $R + L \le 6$. This category includes the following 11 benchmarks: bbtas, dk17, dk27, dk512, ex3, ex5, lion, lion9, mc, modulo12, and shiftreg. The benchmarks belong to the category of simple FSMs (category 1) if $R + L \le 12$. Category 1 consists of the benchmarks bbara, bbsse, beecount, cse, dk14, dk15, dk16, donfile, ex2, ex4, ex6, ex7, keyb, mark1, opus, s27, s386, s840, and sse. The benchmarks belong to the category of average FSMs (category 2) if $R + L \le 18$. Category 2 contains the benchmarks ex1, kirkman, planet, planet1, pma, s1, s1488, s1494, s1a, s208, styr, and tma. The benchmarks belong to the category of big FSMs (category 3) if the following condition takes place: $R + L \le 24$. Category 3 includes only the benchmark sand. The category of very big FSMs (category 4) includes the benchmarks satisfying the relation $R + L > 24$. The benchmarks s420, s510, s820, and s832 belong to this category.
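The categorization rule can be written as a small function. This is a sketch with a hypothetical name; it assumes that the thresholds 6, 12, 18, and 24 are successive multiples of $I_L = 6$ and that each category takes the smallest threshold not exceeded by $R + L$.

```python
def fsm_category(r, l, i_l=6):
    """Category 0..4 of an FSM with R state variables and L inputs.

    Assumption: the thresholds are i_l, 2*i_l, 3*i_l, and 4*i_l, which gives
    6, 12, 18, and 24 for the 6-input LUTs of Virtex-7.
    """
    total = r + l
    for category in range(4):
        if total <= (category + 1) * i_l:
            return category
    return 4
```

Under this rule, the example FSM $S_0$ with $R = 3$ and $L = 10$ would fall into category 2 for 6-input LUTs, since $12 < R + L = 13 \le 18$.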
The results of experiments are shown in Table 8, Table 9, Table 10, Table 11 and Table 12. These tables share the same organization. The investigated methods are listed in the table columns. The table rows contain the names of benchmarks. Inside each table, the benchmarks are sorted by ascending category number and listed in alphabetical order within each category. The rows “Total” contain the results of summation of values for each column. The row “Percentage” includes the percentage of the summarized characteristics of FSM circuits produced by other methods relative to MPCY-based FSMs. We use the model of P Mealy FSM as a starting point for the methods Auto, one-hot, and JEDI. The basic data (the LUT count, time, and power consumption) are taken from the reports of Vivado. Next, these data were used to obtain the integral characteristics.
Let us analyze the experimental data taken from reports of Vivado. These tables contain the following data: (1) the LUT counts (Table 8); (2) the minimum time of cycle (Table 9); (3) the total On-Chip Power (Table 10); (4) the area-time products (Table 11); (5) the area-time-power products (Table 12). In addition, we compared each of the characteristics for each category; however, in order to avoid a significant increase in the size of the article, we did not show the corresponding tables. We just showed the results of these comparisons.
As follows from Table 8, the MPCY-based FSMs require fewer LUTs than the other investigated counterparts. Using the proposed approach, we can obtain circuits having 52.19% fewer 6-LUTs than equivalent Auto-based FSMs, 77.1% fewer 6-LUTs than equivalent one-hot-based FSMs, and 25.34% fewer 6-LUTs than equivalent JEDI-based FSMs. Our approach produces circuits having on average 11.36% fewer 6-LUTs than the circuits of MPY FSMs.
Using Table 8, we can compare LUT counts for different categories of benchmark FSMs. Comparing the results for category 0 shows that both multi-level approaches ( M P Y and M P C Y ) lose out to the other methods. This loss is 30.4% compared to auto-based FSMs, 3.4% compared to one-hot-based FSMs, and 31.5% compared to JEDI-based FSMs. We explain this by the fact that condition (4) is not satisfied for benchmark FSMs of the category 0. This means that only a single LUT is needed to implement any function for systems (1) and (2). Obviously, for category 0, the replacement of inputs should not be performed for both M P Y and M P C Y FSMs; however, the encoding of output collections is always performed for these multi-level FSMs. Due to this, for the category 0, the multi-level FSMs have higher LUT counts than they are for other investigated design methods. Let us point out, that equivalent M P Y - and M P C Y -FSMs have the same LUT counts for this category.
Starting from category 1, the condition (4) is met. At the same time, it makes sense to use structural reduction methods instead of methods of functional decomposition. For this category, using the complex state codes in M P Y FSMs allows obtaining FSM circuits with fewer LUTs than it is for other methods used in our experiments. This gain is 40.0% compared to auto-based FSMs, 81.1% compared to one-hot-based FSMs, 16.2% compared to JEDI-based FSMs, and 11.0% compared to M P Y FSMs.
As follows from this part of the research, the gain grows as the category number increases. Compared to auto-based FSMs, the gain in LUTs increases up to 65.64% (for categories 2–4). Compared to one-hot-based FSMs, the gain increases up to 65.64% (for categories 2–4). Comparison with JEDI-based FSMs shows that the gain increases up to 34.86% (for categories 2–4). Finally, compared to MPY-based FSMs, the gain increases from 8.44% (for categories 0–1) to 12.73% (for categories 2–4).
As follows from Table 9, the MPCY-based FSMs are faster than their investigated counterparts. They require a cycle time 9.39% less than the equivalent auto-based FSMs, 10.24% less than one-hot-based FSMs, and 1.08% less than the equivalent JEDI-based FSMs. They also marginally benefit (0.31%) in relation to MPY FSMs. It follows that our approach allows reducing the number of LUTs without losing performance. As we have already noted, this is the greatest challenge associated with the optimization of the chip area occupied by an FSM circuit. So, our approach allows overcoming this obstacle.
Using Table 9, we have compared time characteristics for different categories of benchmark FSMs. Comparing the results for category 0 shows that M P C Y -based FSMs lose out to the other methods. This loss is 3.23% compared to auto-based FSMs, 0.2% compared to one-hot-based FSMs, 3.68% compared to JEDI-based FSMs, and 1.13% compared to M P Y -based FSMs. As it is for LUT counts, we explain this by the fact that condition (4) is not satisfied for benchmarks of this category. Starting from category 1, the condition (4) is met. This allows obtaining some gain compared to FSMs based on both Auto (3.48%) and one-hot (3.37%); however, other models provide better performance than our approach (3.84% compared to JEDI and 1.93% compared to M P Y ).
Starting from the category 2, our approach gives better results compared to all other investigated methods. This gain for the category 2 is the following: (1) 24.17% compared to Auto; (2) 25.96% compared to one-hot; (3) 8.88% compared to JEDI; (4) 3.78% compared to M P Y FSMs. For the category 3, the gain increases. It is the following: (1) 31.49% compared to Auto; (2) 31.49% compared to one-hot; (3) 20.24% compared to JEDI; (4) 4.76% compared to M P Y FSMs. Further, there is a gain for category 4; however, the gain is less than for category 3. It is the following: (1) 18.73% compared to Auto; (2) 16.48% compared to one-hot; (3) 7.96% compared to JEDI; (4) 3.07% compared to M P Y FSMs. We explain this decrease in winnings by an increase in the number of levels in the circuits of M P C Y -based FSMs compared to their number for category 3; however, the following conclusion can be drawn: the proposed approach allows obtaining faster LUT-based FSM circuits starting from category 2.
Vivado reports the total on-chip power; we combine these reports in Table 10. As follows from Table 10, the MPCY-based FSMs consume less power than their investigated counterparts. On average, they provide the following gain in power consumption: (1) 47.02% compared to auto-based FSMs; (2) 59.17% compared to one-hot-based FSMs; (3) 23.96% compared to JEDI-based FSMs; (4) 5.44% compared to MPY Mealy FSMs.
Using Table 10, we have compared total on-chip power for each category. Comparing the results for category 0 shows that M P C Y -based FSMs lose out to the other methods. This loss is 19.37% compared to auto-based FSMs, 17.6% compared to one-hot-based FSMs and 21.08% compared to JEDI-based FSMs. The same data are correct for M P Y FSMs; however, starting from the category 1, our approach allows designing circuits consuming less power. The winnings grow as the category number grows. With respect to auto-based FSMs, our method provides the following gain: (1) 33.95% for the category 1; (2) 85.68% for the category 2; (3) 106.28% for the category 3; (4) 124.46% for the category 4. With respect to one-hot-based FSMs, our method provides the following gain: (1) 47.22% for the category 1; (2) 98.26% for the category 2; (3) 106.28% for the category 3; (4) 163.44% for the category 4. With respect to JEDI-based FSMs, the proposed method provides the following gain: (1) 19.69% for the category 1; (2) 43.98% for the category 2; (3) 77.38% for the category 3; (4) 80.97% for the category 4. Further, there is the following gain compared to M P Y -based FSMs: (1) 5.20% for the category 1; (2) 7.58% for the category 2; (3) 10.77% for the category 3; (4) 12.36% for the category 4. So, the proposed organization of the FSM circuit allows reducing the power consumption, starting with simple FSMs (category 1).
Using data from Table 8, Table 9 and Table 10, we can calculate the values for two integral characteristics. One of them is an area-time product [6,38], the second is an area-time-power product. The smaller the values of these products, the better the quality of the resulting FSM circuit [6]. As it is the case in many articles [6,38], we estimate the area of an FSM circuit by its LUT count.
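The two integral characteristics are plain products of the basic ones. The sketch below shows the calculation together with one possible way of expressing a gain in percent. The function names are hypothetical, the definition of the gain (the excess of the other method relative to the reference circuit) is an assumption, and the numeric values are invented for illustration rather than taken from the tables.

```python
def area_time(lut_count, cycle_time_ns):
    """Area-time product; the area is estimated by the LUT count."""
    return lut_count * cycle_time_ns


def area_time_power(lut_count, cycle_time_ns, power_w):
    """Area-time-power product."""
    return lut_count * cycle_time_ns * power_w


def gain_percent(reference, other):
    """Excess of `other` over `reference`, in percent of `reference`.

    Assumption: this is one plausible reading of the reported gains.
    """
    return 100.0 * (other - reference) / reference


# Hypothetical circuits: (LUT count, cycle time in ns, power in W).
proposed = (34, 5.0, 0.2)
baseline = (37, 5.0, 0.25)

at_gain = gain_percent(area_time(*proposed[:2]), area_time(*baseline[:2]))
atp_gain = gain_percent(area_time_power(*proposed), area_time_power(*baseline))
```

With equal cycle times, the area-time gain reduces to the gain in LUT count, while the area-time-power gain also reflects the difference in power.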
As follows from Table 11, the M P C Y -based FSMs have better area-time characteristics than their investigated counterparts. On average, they provide the following gain: (1) 84.13% compared to auto-based FSMs; (2) 113.34% compared to one-hot-based FSMs; (3) 33.41% compared to JEDI-based FSMs; (4) 13.53% compared to M P Y Mealy FSMs. Using Table 11, we have compared area-time characteristics for each category of benchmark FSMs. As in the previous cases, for category 0 our approach gives the worst results; however, starting from category 1, the benefits of our approach are steadily increasing.
Comparing the results for category 0 shows that M P C Y -based FSMs lose out to the other methods. This loss is 31.8% compared to auto-based FSMs, 2.88% compared to one-hot-based FSMs, 33.33% compared to JEDI-based FSMs, and 1.1% compared to M P Y FSMs; however, starting from category 1, our approach allows designing circuits having smaller values of area-time products than they are for all other approaches. With respect to auto-based FSMs, our method provides the following gain: (1) 46.49% for the category 1; (2) 107.13% for the category 2; (3) 90.73% for the category 3; (4) 141.9% for the category 4. With respect to one-hot-based FSMs, our method provides the following gain: (1) 88.39% for the category 1; (2) 139.36% for the category 2; (3) 90.73% for the category 3; (4) 148.74% for the category 4. With respect to JEDI-based FSMs, the proposed method provides the following gain: (1) 11.42% for the category 1; (2) 45.42% for the category 2; (3) 50.63% for the category 3; (4) 61.03% for the category 4. Further, there is the following gain compared to M P Y -based FSMs: (1) 8.74% for the category 1; (2) 16.92% for the category 2; (3) 13.88% for the category 3; (4) 18.75% for the category 4. So, the proposed organization of the FSM circuit allows reducing the values of area-time products, starting with simple FSMs (category 1).
As follows from Table 12, the M P C Y -based FSMs have much smaller values of area-time-power products than they are for their investigated counterparts. On average, they provide the following gain: (1) 254.69% compared to auto-based FSMs; (2) 325.06% compared to one-hot-based FSMs; (3) 96.36% compared to JEDI-based FSMs; (4) 22.75% compared to M P Y Mealy FSMs. Using Table 12, we have compared area-time-power products for each category of benchmark FSMs. As in the previous cases, for category 0 our approach gives the worst results; however, starting from category 1, the benefits of our approach are steadily increasing.
Comparing the results for category 0 shows that M P C Y -based FSMs lose out to the other methods. This loss is 45.51% compared to auto-based FSMs, 10.12% compared to one-hot-based FSMs, 48.69% compared to JEDI-based FSMs, and 1.13% compared to MPY FSMs; however, starting from category 1, our approach allows designing circuits having smaller values of area-time-power products than they are for all other approaches. With respect to auto-based FSMs, our method provides the following gain: (1) 104.39% for the category 1; (2) 301.21% for the category 2; (3) 293.45% for the category 3; (4) 502.86% for the category 4. With respect to one-hot-based FSMs, our method provides the following gain: (1) 194.34% for the category 1; (2) 376.56% for the category 2; (3) 293.45% for the category 3; (4) 528.13% for the category 4. With respect to JEDI-based FSMs, the proposed method provides the following gain: (1) 36.51% for the category 1; (2) 112.16% for the category 2; (3) 167.19% for the category 3; (4) 213.43% for the category 4. Further, there is the following gain compared to M P Y -based FSMs: (1) 15.65% for the category 1; (2) 25.71% for the category 2; (3) 26.14% for the category 3; (4) 33.91% for the category 4. So, the proposed organization of the FSM circuit allows reducing the values of area-time-power products, starting with simple FSMs (category 1).
The main goal of the proposed approach is reducing the LUT counts in FPGA-based circuits of Mealy FSMs. The results of experiments (Table 8) show that this goal has been achieved. This gain is achieved by using complex state codes in MPY FSMs. Using these codes introduces an additional level of LUTs forming the partial functions. It was natural to expect that this additional level would lead to a decrease in performance; however, as follows from Table 9, our approach leads to slower FSM circuits only for FSMs from categories 0–1. As the complexity of FSMs increases, our approach begins to give a gain in terms of the minimum cycle time. Moreover, the proposed approach allows reducing the power consumption of the resulting FSM circuits (starting from category 1). The same is true for the integral characteristics of FSM circuits (the area-time and area-time-power products). These phenomena are positive side effects of our approach.
So, the results of our experiments show that the proposed approach can be used instead of other models starting from the simple FSMs (category 1). Our approach allows improving LUT counts starting from the simple FSMs. The same is true for the power consumption. Further, starting from category 2, the proposed method allows improving the minimum cycle time compared with the other investigated methods. In our research, we use the chip xc7vx690tffg1761-2 of the Virtex-7 family (Xilinx); however, the CLB architecture of this chip is not unique: the same architecture is used in all Xilinx chips of the 7th generation. Due to this, the results of our experiments show that the proposed approach can be used for improving LUT counts for designs based on any FPGA chip of the 7th generation. Moreover, all Xilinx FPGA families have one fundamental property in common: an extremely limited number of LUT inputs. This leads to the need to develop FSM synthesis methods aimed at reducing the influence of this factor on the characteristics of LUT-based FSM circuits. The results of our research show that the proposed method solves this problem better than some well-known methods combining various approaches of state assignment (auto, one-hot, and JEDI) and functional decomposition, as well as our previous method based on structural decomposition (MPY FSMs).

7. Conclusions

Modern FPGA chips include more than 7.5 billion transistors [8]. They have proved to be a very effective means of implementing a variety of digital systems. FPGAs have a serious inherent drawback, namely, a rather small number of LUT inputs. This leads to the need to use various methods of functional decomposition in the design of LUT-based FSM circuits. As a result, the LUT-based circuits of rather complex FSMs take the form of multi-level networks with a complex system of spaghetti-type interconnections [2]. This disadvantage can be overcome by applying the methods of structural decomposition [2,35]. This leads to FSM circuits with a predictable number of levels and regular systems of interconnections [2].
Our research [2,35] shows that SD-based Mealy FSM circuits have better characteristics than their FD-based counterparts. As a rule, the combined use of methods of SD allows obtaining a greater gain in the number of LUTs than from the use of each of these methods separately [2]. In [11], we proposed to use two methods of SD, namely, the replacement of inputs and encoding of collections of outputs; however, even in this case, some parts of the resulting FSM circuits can have more than a single level of LUTs. In this article, we discuss just such a case.
To diminish the number of LUTs and their levels, we use the ideas from [17]. These methods are based on finding a partition of the states into classes of compatible states; however, in contrast to the known methods [17], we have replaced the known extended state codes by complex state codes, which have not been known before. The complex state codes are concatenations of class codes and the codes of FSM states as elements of these classes. This approach leads to four-level FSM circuits, which require fewer LUTs than their counterparts based on the methods of [11]. There is a gain in the LUT count of around 11.36% relative to three-level MPY FSM circuits [11]. Moreover, our approach provides a very small increase in the FSM performance (on average, only 0.31%) and a decrease in the power consumption (on average by 5.79%) for the benchmarks from the library [36]. In our opinion, the proposed method can be applied instead of LUT-based MPY Mealy FSMs.

Author Contributions

Conceptualization, A.B., L.T., and K.K.; methodology, A.B., L.T., K.K., and S.S.; software, A.B., L.T., and K.K.; validation, A.B., L.T., and K.K.; formal analysis, A.B., L.T., K.K., and S.S.; investigation, A.B., L.T., and K.K.; writing—original draft preparation, A.B., L.T., K.K., and S.S.; supervision, A.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in the article.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CLB: configurable logic block
CO: collection of outputs
CSC: complex state code
DST: direct structure table
ESC: extended state code
FD: functional decomposition
FPGA: field-programmable gate array
FSM: finite state machine
IMF: input memory function
LUT: look-up table
RG: state code register
SBF: system of Boolean functions
SD: structural decomposition
STG: state transition graph
STT: state transition table

References

  1. Micheli, G.D. Synthesis and Optimization of Digital Circuits; McGraw-Hill: Cambridge, MA, USA, 1994.
  2. Barkalov, A.; Titarenko, L.; Krzywicki, K. Structural Decomposition in FSM Design: Roots, Evolution, Current State—A Review. Electronics 2021, 10, 1174.
  3. Kubica, M.; Kania, D. Technology Mapping of FSM Oriented to LUT-Based FPGA. Appl. Sci. 2020, 10, 3926.
  4. Czerwinski, R.; Kania, D. Finite State Machine Logic Synthesis for Complex Programmable Logic Devices. In Lecture Notes in Electrical Engineering; Springer: Berlin/Heidelberg, Germany, 2013; Volume 231.
  5. Kubica, M.; Kania, D.; Kulisz, J. A technology mapping of fsms based on a graph of excitations and outputs. IEEE Access 2019, 7, 16123–16131.
  6. Islam, M.M.; Hossain, M.S.; Shahjalal, M.D.; Hasan, M.K.; Jang, Y.M. Area-time efficient hardware implementation of modular multiplication for elliptic curve cryptography. IEEE Access 2020, 8, 73898–73906.
  7. Grout, I. Digital Systems Design with FPGAs and CPLDs; Elsevier Science: Amsterdam, The Netherlands, 2011.
  8. Trimberger, S.M. Field-Programmable Gate Array Technology; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012.
  9. Ruiz-Rosero, J.; Ramirez-Gonzalez, G.; Khanna, R. Field Programmable Gate Array Applications—A Scientometric Review. Computation 2019, 7, 63.
  10. Xilinx FPGAs. Available online: https://www.xilinx.com/products/silicon-devices/fpga.html (accessed on 18 December 2021).
  11. Barkalov, A.; Titarenko, L.; Krzywicki, K. Reducing LUT Count for FPGA-Based Mealy FSMs. Appl. Sci. 2020, 10, 5115.
  12. Kubica, M.; Opara, A.; Kania, D. Technology mapping for LUT-based FSMs. In Lecture Notes in Electrical Engineering; Springer: Berlin/Heidelberg, Germany, 2021; Volume 713, p. 216.
  13. Scholl, C. Functional Decomposition with Application to FPGA Synthesis; Kluwer Academic Publishers: Boston, MA, USA, 2001.
  14. Baranov, S. Logic Synthesis of Control Automata; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1994.
  15. Sklarova, D.; Sklarov, V.A.; Sudnitson, A. Design of FPGA-Based Circuits Using Hierarchical Finite State Machines; TUT Press: Tallinn, Estonia, 2012.
  16. Kuon, I.; Tessier, R.; Rose, J. FPGA architecture: Survey and challenges. Found. Trends Electron. Des. Autom. 2008, 2, 135–253.
  17. Barkalov, A.; Titarenko, L.; Mielcarek, K. Improving characteristics of LUT-based Mealy FSMs. Int. J. Appl. Math. Comput. Sci. 2020, 30, 745–759. [Google Scholar]
  18. Kubica, M.; Kania, D. Decomposition of multi-level functions oriented to configurability of logic blocks. Bull. Pol. Acad. Sci. 2017, 67, 317–331. [Google Scholar]
  19. Mishchenko, A.; Chattarejee, S.; Brayton, R. Improvements to technology mapping for LUT-based FPGAs. IEEE Trans. CAD 2006, 27, 240–253. [Google Scholar]
  20. Chapman, K. Multiplexer Design Techniques for Datapath Performance with Minimized Routing Resources. Application Note. 2012. Available online: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.259.5300&rep=rep1&type=pdf (accessed on 17 March 2022).
  21. Sasao, T.; Mishchenko, A. LUTMIN: FPGA Logic Synthesis with MUX-Based and Cascade Realizations. Proc. IWLS. 2009, pp. 310–316. Available online: http://www.lsi-cad.com/sasao/Papers/files/IWLS2009_sasao_mis.pdf (accessed on 17 March 2022).
  22. Senhadji-Navarro, R.; Garcia-Vargas, I. Mapping Arbitrary Logic Functions onto Carry Chains in FPGAs. Electronics 2022, 11, 27. [Google Scholar] [CrossRef]
  23. Kim, J.H.; Anderson, J. Post-LUT-Mapping Implementation of General Logic on Carry Chains Via a MIG-Based Circuit Representation. In Proceedings of the 2021 31st International Conference on Field-Programmable Logic and Applications (FPL), Dresden, Germany, 30 August–3 September 2021; pp. 334–340. [Google Scholar]
  24. Kubica, M.; Kania, D. Technology mapping oriented to adaptive logic modules. Bull. Pol. Acad. Sci. 2019, 67, 947–956. [Google Scholar]
  25. Das, N. Reset: A Reconfigurable state encoding technique for FSM to achieve security and hardware optimality. Microprocess. Microsyst. 2020, 77, 103196. [Google Scholar] [CrossRef]
  26. Tao, Y.; Zhang, Y.; Wang, Q.; Cao, J. MPGA: An evolutionary state assignment for dynamic and leakage power reduction in FSM synthesis. IET Comput. Digit. Tech. 2018, 12, 111–120. [Google Scholar] [CrossRef]
  27. El-Maleh, A.H. A Probabilistic Tabu Search State Assignment Algorithm for Area and Power Optimization of Sequential Circuits. Arab. J. Sci. Eng. 2020, 45, 6273–6285. [Google Scholar] [CrossRef]
  28. Sentovich, E.M.; Singh, K.J.; Lavagno, L.; Moon, C.; Murgai, R.; Saldanha, A.; Savoj, H.; Stephan, P.R.; Brayton, R.K.; Sangiovanni-Vincentelli, A. SIS: A System for Sequential Circuit Synthesis; University of California: Berkely, CA, USA, 1992. [Google Scholar]
  29. ABC System. Available online: https://people.eecs.berkeley.edu/~alanmi/abc/ (accessed on 18 December 2021).
  30. Brayton, R.; Mishchenko, A. ABC: An Academic Industrial-Strength Verification Tool. In Computer Aided Verification (Berlin, Heidelberg, 2010); Touili, T., Cook, B., Jackson, P., Eds.; Springer: Berlin/Heidelberg, Germany, 2010; pp. 24–40. [Google Scholar]
  31. Vivado Design Suite User Guide: Synthesis. UG901 (v2019.1). Available online: https://www.xilinx.com/support/documentation/sw_manuals/xilinx2019_1/ug901-vivado-synthesis.pdf (accessed on 18 December 2021).
  32. Quartus Prime. Available online: https://www.intel.pl/content/www/pl/pl/software/programmable/quartus-prime/overview.html (accessed on 18 December 2021).
  33. Khatri, S.P.; Gulati, K. Advanced Techniques in Logic Synthesis, Optimizations and Applications; Springer: New York, NY, USA, 2011. [Google Scholar]
  34. Sklyarov, V. Synthesis and implementation of RAM-based finite state machines in FPGAs. In International Workshop on Field Programmable Logic and Applications; Springer: Berlin/Heidelberg, Germany, 2000; pp. 718–727. [Google Scholar]
  35. Barkalov, A.; Titarenko, L.; Krzywicki, K.; Saburova, S. Improving the Characteristics of Multi-Level LUT-Based Mealy FSMs. Electronics 2020, 9, 1859. [Google Scholar] [CrossRef]
  36. McElvain, K. LGSynth93 Benchmark; Mentor Graphics: Wilsonville, OR, USA, 1993. [Google Scholar]
  37. VC709 Evaluation Board for the Virtex-7 FPGA User Guide; UG887 (v1.6); Xilinx, Inc.: San Jose, CA, USA, 2019.
  38. Islam, M.M.; Hossain, M.S.; Hasan, M.K.; Shahjalal, M.; Jang, Y.M. FPGA implementation of high-speed area-efficient processor for elliptic curve point multiplication over prime field. IEEE Access 2019, 7, 178811–178826. [Google Scholar] [CrossRef]
Figure 1. Structural diagram of P Mealy FSM.
Figure 2. Structural diagram of LUT-based P Mealy FSM.
Figure 3. Structural diagram of LUT-based MPY Mealy FSM.
Figure 4. Structural diagram of LUT-based MPCY Mealy FSM.
Figure 5. The state transition graph of Mealy FSM $S_0$.
Figure 6. Complex state codes of Mealy FSM MPCY($S_0$).
Figure 7. Outcome of encoding of COs for Mealy FSM MPCY($S_0$).
Figure 8. Logic circuit of Mealy FSM MPCY($S_0$).
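Table 1. State transition table of Mealy FSM $S_0$.

| $a_m$ | $a_T$ | $X_h$ | $Y_h$ | $h$ |
|---|---|---|---|---|
| $a_1$ | $a_4$ | $x_1$ | $y_1 y_2$ | 1 |
| $a_1$ | $a_3$ | $\bar{x}_1 x_2$ | $y_5$ | 2 |
| $a_1$ | $a_2$ | $\bar{x}_1 \bar{x}_2$ | $y_4$ | 3 |
| $a_2$ | $a_5$ | $x_3$ | $y_3 y_6$ | 4 |
| $a_2$ | $a_6$ | $\bar{x}_3 x_4$ | $y_2 y_5$ | 5 |
| $a_2$ | $a_3$ | $\bar{x}_3 \bar{x}_4$ | $y_1 y_2$ | 6 |
| $a_3$ | $a_6$ | $x_6$ | $y_4$ | 7 |
| $a_3$ | $a_4$ | $\bar{x}_6$ | $y_5$ | 8 |
| $a_4$ | $a_3$ | $\bar{x}_5$ | $y_5$ | 9 |
| $a_4$ | $a_1$ | $x_5$ | – | 10 |
| $a_5$ | $a_2$ | $x_5$ | $y_3 y_6$ | 11 |
| $a_5$ | $a_6$ | $\bar{x}_5 x_7$ | $y_4$ | 12 |
| $a_5$ | $a_7$ | $\bar{x}_5 \bar{x}_7$ | $y_4 y_7$ | 13 |
| $a_6$ | $a_7$ | 1 | $y_3$ | 14 |
| $a_7$ | $a_5$ | $x_8$ | – | 15 |
| $a_7$ | $a_6$ | $\bar{x}_8 x_9$ | $y_2 y_5$ | 16 |
| $a_7$ | $a_8$ | $\bar{x}_8 \bar{x}_9$ | $y_1 y_2$ | 17 |
| $a_8$ | $a_8$ | $x_{10}$ | $y_5$ | 18 |
| $a_8$ | $a_4$ | $\bar{x}_{10}$ | $y_3 y_7$ | 19 |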
Table 2. Replacement of inputs for Mealy FSM MPCY($S_0$).

| $P$ \ $A$ | $a_1$ | $a_2$ | $a_3$ | $a_4$ | $a_5$ | $a_6$ | $a_7$ | $a_8$ |
|---|---|---|---|---|---|---|---|---|
| $p_1$ | $x_1$ | $x_3$ | $x_6$ | – | $x_7$ | – | $x_8$ | $x_{10}$ |
| $p_2$ | $x_2$ | $x_4$ | – | $x_5$ | $x_5$ | – | $x_9$ | – |
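Table 3. Direct structure table of Mealy FSM $U_4(S_1)$.

| $a_m$ | $K(a_m)$ | $a_T$ | $K(a_T)$ | $P_h$ | $\Phi_h$ | $Z_h$ | $h$ |
|---|---|---|---|---|---|---|---|
| $a_1$ | 000 | $a_4$ | 011 | $p_1$ | $D_2 D_3$ | $z_2 z_4$ | 1 |
| $a_1$ | 000 | $a_3$ | 010 | $\bar{p}_1 p_2$ | $D_2$ | $z_3 z_4$ | 2 |
| $a_1$ | 000 | $a_2$ | 001 | $\bar{p}_1 \bar{p}_2$ | $D_3$ | $z_1 z_2 z_3 z_4$ | 3 |
| $a_2$ | 001 | $a_5$ | 100 | $p_1$ | $D_1$ | $z_2 z_3$ | 4 |
| $a_2$ | 001 | $a_6$ | 101 | $\bar{p}_1 p_2$ | $D_1 D_3$ | $z_4$ | 5 |
| $a_2$ | 001 | $a_3$ | 010 | $\bar{p}_1 \bar{p}_2$ | $D_2$ | $z_2 z_4$ | 6 |
| $a_3$ | 010 | $a_6$ | 101 | $p_1$ | $D_1 D_3$ | $z_1 z_2 z_3 z_4$ | 7 |
| $a_3$ | 010 | $a_4$ | 011 | $\bar{p}_1$ | $D_2 D_3$ | $z_3 z_4$ | 8 |
| $a_4$ | 011 | $a_3$ | 010 | $p_2$ | $D_2$ | $z_3 z_4$ | 9 |
| $a_4$ | 011 | $a_1$ | 000 | $\bar{p}_2$ | – | – | 10 |
| $a_5$ | 100 | $a_2$ | 001 | $p_2$ | $D_3$ | $z_2 z_3$ | 11 |
| $a_5$ | 100 | $a_6$ | 101 | $\bar{p}_2 p_1$ | $D_1 D_3$ | $z_1 z_2 z_3 z_4$ | 12 |
| $a_5$ | 100 | $a_7$ | 110 | $\bar{p}_2 \bar{p}_1$ | $D_1 D_2$ | $z_1 z_3 z_4$ | 13 |
| $a_6$ | 101 | $a_7$ | 110 | 1 | $D_1 D_2$ | $z_3$ | 14 |
| $a_7$ | 110 | $a_5$ | 100 | $p_1$ | $D_1$ | – | 15 |
| $a_7$ | 110 | $a_6$ | 101 | $\bar{p}_1 p_2$ | $D_1 D_3$ | $z_4$ | 16 |
| $a_7$ | 110 | $a_8$ | 111 | $\bar{p}_1 \bar{p}_2$ | $D_1 D_2 D_3$ | $z_2 z_4$ | 17 |
| $a_8$ | 111 | $a_8$ | 111 | $p_1$ | $D_1 D_2 D_3$ | $z_3 z_4$ | 18 |
| $a_8$ | 111 | $a_4$ | 011 | $\bar{p}_1$ | $D_2 D_3$ | $z_1 z_3$ | 19 |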
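Table 4. Table of LUTer1 of Mealy FSM MPCY($S_0$).

| $a_m$ | $C(a_m)$ | $a_T$ | $CSC(a_T)$ | $P_h^1$ | $\Phi_h^1$ | $Z_h^1$ | $h$ |
|---|---|---|---|---|---|---|---|
| $a_1$ | 00 | $a_4$ | 011 | $p_1$ | $D_2 D_3$ | $z_2 z_4$ | 1 |
| $a_1$ | 00 | $a_3$ | 010 | $\bar{p}_1 p_2$ | $D_2$ | $z_3 z_4$ | 2 |
| $a_1$ | 00 | $a_2$ | 001 | $\bar{p}_1 \bar{p}_2$ | $D_3$ | $z_1 z_2 z_3 z_4$ | 3 |
| $a_2$ | 01 | $a_5$ | 100 | $p_1$ | $D_1$ | $z_2 z_3$ | 4 |
| $a_2$ | 01 | $a_6$ | 101 | $\bar{p}_1 p_2$ | $D_1 D_3$ | $z_4$ | 5 |
| $a_2$ | 01 | $a_3$ | 010 | $\bar{p}_1 \bar{p}_2$ | $D_2$ | $z_2 z_4$ | 6 |
| $a_3$ | 10 | $a_6$ | 101 | $p_1$ | $D_1 D_3$ | $z_1 z_2 z_3 z_4$ | 7 |
| $a_3$ | 10 | $a_4$ | 011 | $\bar{p}_1$ | $D_2 D_3$ | $z_3 z_4$ | 8 |
| $a_4$ | 11 | $a_3$ | 010 | $p_2$ | $D_2$ | $z_3 z_4$ | 9 |
| $a_4$ | 11 | $a_1$ | 000 | $\bar{p}_2$ | – | – | 10 |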
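Table 5. Table of LUTer2 of Mealy FSM MPCY($S_0$).

| $a_m$ | $C(a_m)$ | $a_T$ | $CSC(a_T)$ | $P_h^2$ | $\Phi_h^2$ | $Z_h^2$ | $h$ |
|---|---|---|---|---|---|---|---|
| $a_5$ | 00 | $a_2$ | 001 | $p_2$ | $D_3$ | $z_2 z_3$ | 1 |
| $a_5$ | 00 | $a_6$ | 101 | $\bar{p}_2 p_1$ | $D_1 D_3$ | $z_1 z_2 z_3 z_4$ | 2 |
| $a_5$ | 00 | $a_7$ | 110 | $\bar{p}_2 \bar{p}_1$ | $D_1 D_2$ | $z_1 z_3 z_4$ | 3 |
| $a_6$ | 01 | $a_7$ | 110 | 1 | $D_1 D_2$ | $z_3$ | 4 |
| $a_7$ | 10 | $a_5$ | 100 | $p_1$ | $D_1$ | – | 5 |
| $a_7$ | 10 | $a_6$ | 101 | $\bar{p}_1 p_2$ | $D_1 D_3$ | $z_4$ | 6 |
| $a_7$ | 10 | $a_8$ | 111 | $\bar{p}_1 \bar{p}_2$ | $D_1 D_2 D_3$ | $z_2 z_4$ | 7 |
| $a_8$ | 11 | $a_8$ | 111 | $p_1$ | $D_1 D_2 D_3$ | $z_3 z_4$ | 8 |
| $a_8$ | 11 | $a_4$ | 011 | $\bar{p}_1$ | $D_2 D_3$ | $z_1 z_3$ | 9 |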
Table 6. Table of LUTerTZ of Mealy FSM MPCY($S_0$).

| Function | $j = 1$ | $j = 2$ |
|---|---|---|
| $D_1$ | 1 | 1 |
| $D_2$ | 1 | 1 |
| $D_3$ | 1 | 1 |
| $z_1$ | 1 | 1 |
| $z_2$ | 1 | 1 |
| $z_3$ | 1 | 1 |
| $z_4$ | 1 | 1 |
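Table 7. Characteristics of benchmark Mealy FSMs [36].

| Benchmark | $L$ | $N$ | $R+L$ | $M/R$ | $H$ | Category |
|---|---|---|---|---|---|---|
| bbara | 4 | 2 | 8 | 12/4 | 60 | 1 |
| bbsse | 7 | 7 | 12 | 26/5 | 56 | 1 |
| bbtas | 2 | 2 | 6 | 9/4 | 24 | 0 |
| beecount | 3 | 4 | 7 | 10/4 | 28 | 1 |
| cse | 7 | 7 | 12 | 32/5 | 91 | 1 |
| dk14 | 3 | 5 | 8 | 26/5 | 56 | 1 |
| dk15 | 3 | 5 | 8 | 17/5 | 32 | 1 |
| dk16 | 2 | 3 | 9 | 75/7 | 108 | 1 |
| dk17 | 2 | 3 | 6 | 16/4 | 32 | 0 |
| dk27 | 1 | 2 | 5 | 10/4 | 14 | 0 |
| dk512 | 1 | 3 | 6 | 24/5 | 15 | 0 |
| donfile | 2 | 1 | 7 | 24/5 | 96 | 1 |
| ex1 | 9 | 19 | 16 | 80/7 | 138 | 2 |
| ex2 | 2 | 2 | 7 | 25/5 | 72 | 1 |
| ex3 | 2 | 2 | 6 | 14/4 | 36 | 0 |
| ex4 | 6 | 9 | 11 | 18/5 | 21 | 1 |
| ex5 | 2 | 2 | 6 | 16/4 | 32 | 0 |
| ex6 | 5 | 8 | 9 | 14/4 | 34 | 1 |
| ex7 | 2 | 2 | 12 | 17/5 | 36 | 1 |
| keyb | 7 | 7 | 12 | 22/5 | 170 | 1 |
| kirkman | 12 | 6 | 18 | 48/6 | 370 | 2 |
| lion | 2 | 1 | 5 | 5/3 | 11 | 0 |
| lion9 | 2 | 1 | 6 | 11/4 | 25 | 0 |
| mark1 | 5 | 16 | 10 | 22/5 | 22 | 1 |
| mc | 3 | 5 | 6 | 8/3 | 10 | 0 |
| modulo12 | 1 | 1 | 5 | 12/4 | 24 | 0 |
| opus | 5 | 6 | 10 | 18/5 | 22 | 1 |
| planet | 7 | 19 | 14 | 86/7 | 115 | 2 |
| planet1 | 7 | 19 | 14 | 86/7 | 115 | 2 |
| pma | 8 | 8 | 14 | 49/6 | 73 | 2 |
| s1 | 8 | 7 | 14 | 54/6 | 106 | 2 |
| s1488 | 8 | 19 | 15 | 112/7 | 251 | 2 |
| s1494 | 8 | 19 | 15 | 118/7 | 250 | 2 |
| s1a | 8 | 6 | 15 | 86/7 | 107 | 2 |
| s208 | 11 | 2 | 17 | 37/6 | 153 | 2 |
| s27 | 4 | 1 | 8 | 11/4 | 34 | 1 |
| s386 | 7 | 7 | 12 | 23/5 | 64 | 1 |
| s420 | 19 | 2 | 27 | 137/8 | 137 | 4 |
| s510 | 19 | 7 | 27 | 172/8 | 77 | 4 |
| s8 | 4 | 1 | 8 | 15/4 | 20 | 1 |
| s820 | 18 | 19 | 25 | 78/7 | 232 | 4 |
| s832 | 18 | 19 | 25 | 76/7 | 245 | 4 |
| sand | 11 | 9 | 18 | 88/7 | 184 | 3 |
| shiftreg | 1 | 1 | 5 | 16/4 | 16 | 0 |
| sse | 7 | 7 | 12 | 26/5 | 56 | 1 |
| styr | 9 | 10 | 16 | 67/7 | 166 | 2 |
| tma | 7 | 9 | 13 | 63/6 | 44 | 2 |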
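Table 8. Experimental results (LUT counts).

| Benchmark | Auto | One-Hot | JEDI | MPY | Our Approach | Category |
|---|---|---|---|---|---|---|
| bbtas | 5 | 5 | 5 | 8 | 8 | 0 |
| dk17 | 5 | 12 | 5 | 8 | 8 | 0 |
| dk27 | 3 | 5 | 4 | 7 | 7 | 0 |
| dk512 | 10 | 10 | 9 | 12 | 12 | 0 |
| ex3 | 9 | 9 | 9 | 11 | 11 | 0 |
| ex5 | 9 | 9 | 9 | 10 | 10 | 0 |
| lion | 2 | 5 | 2 | 6 | 6 | 0 |
| lion9 | 6 | 11 | 5 | 8 | 8 | 0 |
| mc | 4 | 7 | 4 | 6 | 6 | 0 |
| modulo12 | 7 | 7 | 7 | 9 | 9 | 0 |
| shiftreg | 2 | 6 | 2 | 4 | 4 | 0 |
| bbara | 17 | 17 | 10 | 10 | 9 | 1 |
| bbsse | 33 | 37 | 24 | 26 | 21 | 1 |
| beecount | 19 | 19 | 14 | 14 | 12 | 1 |
| cse | 40 | 66 | 36 | 33 | 30 | 1 |
| dk14 | 16 | 27 | 10 | 12 | 10 | 1 |
| dk15 | 15 | 16 | 12 | 6 | 7 | 1 |
| dk16 | 15 | 34 | 12 | 11 | 10 | 1 |
| donfile | 31 | 31 | 24 | 21 | 18 | 1 |
| ex2 | 9 | 9 | 8 | 8 | 8 | 1 |
| ex4 | 15 | 13 | 12 | 11 | 10 | 1 |
| ex6 | 24 | 36 | 22 | 21 | 19 | 1 |
| ex7 | 4 | 5 | 4 | 6 | 6 | 1 |
| keyb | 43 | 61 | 40 | 37 | 34 | 1 |
| mark1 | 23 | 23 | 20 | 19 | 16 | 1 |
| opus | 28 | 28 | 22 | 21 | 19 | 1 |
| s27 | 6 | 18 | 6 | 6 | 7 | 1 |
| s386 | 26 | 39 | 22 | 25 | 23 | 1 |
| s8 | 9 | 9 | 9 | 9 | 9 | 1 |
| sse | 33 | 37 | 30 | 26 | 22 | 1 |
| ex1 | 70 | 74 | 53 | 40 | 34 | 2 |
| kirkman | 42 | 58 | 39 | 33 | 29 | 2 |
| planet | 131 | 131 | 88 | 78 | 68 | 2 |
| planet1 | 131 | 131 | 88 | 78 | 68 | 2 |
| pma | 94 | 94 | 86 | 72 | 64 | 2 |
| s1 | 65 | 99 | 61 | 54 | 51 | 2 |
| s1488 | 124 | 131 | 108 | 89 | 81 | 2 |
| s1494 | 126 | 132 | 110 | 90 | 79 | 2 |
| s1a | 49 | 81 | 43 | 38 | 32 | 2 |
| s208 | 12 | 31 | 10 | 9 | 9 | 2 |
| styr | 93 | 120 | 81 | 70 | 61 | 2 |
| tma | 45 | 39 | 39 | 30 | 27 | 2 |
| sand | 132 | 132 | 114 | 99 | 91 | 3 |
| s420 | 10 | 31 | 9 | 8 | 8 | 4 |
| s510 | 48 | 48 | 32 | 22 | 19 | 4 |
| s820 | 88 | 82 | 68 | 52 | 46 | 4 |
| s832 | 80 | 79 | 62 | 50 | 42 | 4 |
| Total | 1808 | 2104 | 1489 | 1323 | 1188 | |
| Percentage, % | 152.19 | 177.10 | 125.34 | 111.36 | 100.00 | |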
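
The Percentage row of Table 8 can be reproduced from the Total row by normalizing each method's total LUT count to that of the proposed approach. The following is a minimal Python sketch under the assumption that this normalization is what the row reports; the names `TOTAL_LUTS` and `relative_percent` are illustrative and do not come from the paper.

```python
# Total LUT counts copied from the "Total" row of Table 8.
TOTAL_LUTS = {
    "Auto": 1808,
    "One-Hot": 2104,
    "JEDI": 1489,
    "MPY": 1323,
    "Our Approach": 1188,
}


def relative_percent(totals, baseline="Our Approach"):
    """Express every total as a percentage of the baseline total (assumed definition)."""
    base = totals[baseline]
    return {name: round(100.0 * value / base, 2) for name, value in totals.items()}


percentages = relative_percent(TOTAL_LUTS)
for name, pct in percentages.items():
    print(f"{name}: {pct:.2f}%")
```

Read this way, MPY needs about 11% more LUTs than the proposed approach and One-Hot about 77% more, which matches the 11 to 77 percent range quoted in the abstract.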
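Table 9. Experimental results (the latency time for all benchmarks, ns).

| Benchmark | Auto | One-Hot | JEDI | MPY | Our Approach | Category |
|---|---|---|---|---|---|---|
| bbtas | 4.898 | 4.898 | 4.852 | 4.991 | 5.013 | 0 |
| dk17 | 5.018 | 5.988 | 5.015 | 5.003 | 5.075 | 0 |
| dk27 | 4.854 | 4.953 | 4.898 | 5.085 | 5.150 | 0 |
| dk512 | 5.095 | 5.095 | 5.006 | 5.150 | 5.208 | 0 |
| ex3 | 5.132 | 5.132 | 5.108 | 5.230 | 5.372 | 0 |
| ex5 | 5.548 | 5.548 | 5.520 | 5.616 | 5.690 | 0 |
| lion | 4.940 | 4.902 | 4.942 | 4.996 | 5.025 | 0 |
| lion9 | 4.871 | 5.399 | 4.845 | 5.022 | 5.155 | 0 |
| mc | 5.085 | 5.116 | 5.079 | 5.177 | 5.278 | 0 |
| modulo12 | 4.831 | 4.831 | 4.828 | 4.972 | 4.824 | 0 |
| shiftreg | 3.807 | 3.794 | 3.620 | 3.896 | 3.978 | 0 |
| bbara | 5.171 | 5.171 | 4.712 | 4.945 | 4.997 | 1 |
| bbsse | 6.367 | 5.913 | 5.484 | 5.518 | 5.809 | 1 |
| beecount | 6.002 | 6.002 | 5.338 | 5.401 | 5.461 | 1 |
| cse | 6.829 | 6.111 | 5.614 | 5.708 | 5.875 | 1 |
| dk14 | 5.218 | 5.792 | 5.159 | 5.258 | 5.298 | 1 |
| dk15 | 5.194 | 5.395 | 5.132 | 5.202 | 5.258 | 1 |
| dk16 | 5.892 | 5.721 | 5.073 | 5.146 | 5.244 | 1 |
| donfile | 5.434 | 5.435 | 4.910 | 4.977 | 5.045 | 1 |
| ex2 | 5.036 | 5.036 | 4.997 | 5.042 | 5.072 | 1 |
| ex4 | 5.526 | 5.627 | 5.186 | 5.259 | 5.370 | 1 |
| ex6 | 5.897 | 6.105 | 5.663 | 5.839 | 5.912 | 1 |
| ex7 | 4.999 | 4.979 | 4.985 | 5.047 | 5.096 | 1 |
| keyb | 6.392 | 6.970 | 5.937 | 6.172 | 6.240 | 1 |
| mark1 | 6.158 | 6.158 | 5.676 | 5.876 | 5.920 | 1 |
| opus | 6.017 | 6.017 | 5.608 | 5.705 | 5.830 | 1 |
| s27 | 5.032 | 5.222 | 5.022 | 5.099 | 5.195 | 1 |
| s386 | 5.947 | 5.765 | 5.582 | 5.655 | 5.874 | 1 |
| s8 | 5.555 | 5.588 | 5.518 | 5.611 | 5.788 | 1 |
| sse | 6.367 | 5.913 | 5.726 | 5.878 | 6.086 | 1 |
| ex1 | 6.625 | 7.155 | 5.654 | 5.484 | 5.235 | 2 |
| kirkman | 7.073 | 6.494 | 6.382 | 5.983 | 5.521 | 2 |
| planet | 7.535 | 7.535 | 5.344 | 5.288 | 5.234 | 2 |
| planet1 | 7.535 | 7.535 | 5.344 | 5.288 | 5.234 | 2 |
| pma | 6.841 | 6.841 | 5.888 | 5.612 | 5.370 | 2 |
| s1 | 6.830 | 7.361 | 6.363 | 6.164 | 5.743 | 2 |
| s1488 | 7.220 | 7.579 | 6.362 | 5.941 | 5.809 | 2 |
| s1494 | 6.694 | 6.861 | 6.085 | 5.805 | 5.557 | 2 |
| s1a | 6.520 | 5.669 | 5.911 | 5.611 | 5.517 | 2 |
| s208 | 5.736 | 5.667 | 5.594 | 5.503 | 5.339 | 2 |
| styr | 7.267 | 7.697 | 6.866 | 6.178 | 5.908 | 2 |
| tma | 6.102 | 6.766 | 6.092 | 5.659 | 5.554 | 2 |
| sand | 8.623 | 8.623 | 7.885 | 6.864 | 6.558 | 3 |
| s420 | 5.751 | 5.667 | 5.642 | 5.341 | 5.193 | 4 |
| s510 | 5.629 | 5.629 | 5.512 | 5.338 | 5.257 | 4 |
| s820 | 6.579 | 6.529 | 5.663 | 5.496 | 5.288 | 4 |
| s832 | 6.863 | 6.526 | 5.754 | 5.373 | 5.169 | 4 |
| Total | 278.536 | 280.710 | 257.378 | 255.403 | 254.625 | |
| Percentage, % | 109.39 | 110.24 | 101.08 | 100.31 | 100.00 | |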
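Table 10. Experimental results (Total On-Chip Power, Watts).

| Benchmark | Auto | One-Hot | JEDI | MPY | Our Approach | Category |
|---|---|---|---|---|---|---|
| bbtas | 0.533 | 0.533 | 0.533 | 0.661 | 0.661 | 0 |
| dk17 | 1.901 | 1.935 | 1.891 | 2.363 | 2.363 | 0 |
| dk27 | 1.168 | 0.854 | 1.158 | 1.459 | 1.459 | 0 |
| dk512 | 1.496 | 1.496 | 1.345 | 1.708 | 1.708 | 0 |
| ex3 | 0.391 | 0.391 | 0.391 | 0.501 | 0.501 | 0 |
| ex5 | 0.387 | 0.387 | 0.385 | 0.496 | 0.496 | 0 |
| lion | 0.542 | 0.629 | 0.547 | 0.711 | 0.711 | 0 |
| lion9 | 0.733 | 0.97 | 0.728 | 0.939 | 0.939 | 0 |
| mc | 0.447 | 0.561 | 0.443 | 0.567 | 0.567 | 0 |
| modulo12 | 0.559 | 0.559 | 0.563 | 0.715 | 0.715 | 0 |
| shiftreg | 0.523 | 0.603 | 0.512 | 0.645 | 0.645 | 0 |
| bbara | 0.569 | 0.569 | 0.488 | 0.399 | 0.379 | 1 |
| bbsse | 2.22 | 1.206 | 1.713 | 1.522 | 1.474 | 1 |
| beecount | 1.631 | 1.631 | 1.021 | 0.835 | 0.793 | 1 |
| cse | 0.958 | 1.019 | 0.891 | 0.683 | 0.649 | 1 |
| dk14 | 2.959 | 3.33 | 2.952 | 2.892 | 2.747 | 1 |
| dk15 | 1.403 | 1.905 | 1.399 | 1.312 | 1.246 | 1 |
| dk16 | 2.967 | 2.742 | 2.512 | 2.335 | 2.218 | 1 |
| donfile | 0.709 | 0.709 | 0.603 | 0.478 | 0.454 | 1 |
| ex2 | 0.368 | 0.386 | 0.342 | 0.267 | 0.254 | 1 |
| ex4 | 1.562 | 1.241 | 1.187 | 0.923 | 0.877 | 1 |
| ex6 | 2.269 | 3.85 | 2.242 | 1.975 | 1.879 | 1 |
| ex7 | 0.992 | 1.181 | 0.994 | 0.998 | 0.968 | 1 |
| keyb | 1.093 | 1.071 | 1.075 | 0.796 | 0.748 | 1 |
| mark1 | 1.445 | 1.445 | 1.227 | 1.087 | 1.011 | 1 |
| opus | 1.344 | 1.344 | 1.283 | 1.121 | 1.064 | 1 |
| s27 | 0.756 | 1.95 | 0.765 | 0.564 | 0.525 | 1 |
| s386 | 1.251 | 1.393 | 1.121 | 0.998 | 0.938 | 1 |
| s8 | 0.736 | 0.805 | 0.732 | 0.682 | 0.662 | 1 |
| sse | 1.22 | 1.296 | 1.089 | 0.907 | 0.862 | 1 |
| ex1 | 4.102 | 2.968 | 2.342 | 1.728 | 1.589 | 2 |
| kirkman | 1.693 | 1.844 | 1.439 | 1.127 | 1.048 | 2 |
| planet | 4.122 | 4.122 | 2.456 | 2.028 | 1.906 | 2 |
| planet1 | 4.122 | 4.122 | 2.456 | 2.028 | 1.906 | 2 |
| pma | 1.37 | 1.37 | 1.253 | 0.803 | 0.739 | 2 |
| s1 | 2.685 | 3.13 | 2.518 | 2.048 | 1.925 | 2 |
| s1488 | 3.982 | 4.096 | 3.548 | 1.883 | 1.751 | 2 |
| s1494 | 3.079 | 3.178 | 2.982 | 2.358 | 2.169 | 2 |
| s1a | 1.322 | 2.01 | 1.208 | 0.885 | 0.832 | 2 |
| s208 | 1.367 | 2.82 | 1.249 | 0.957 | 0.871 | 2 |
| styr | 4.044 | 4.771 | 3.187 | 2.632 | 2.448 | 2 |
| tma | 1.589 | 1.314 | 1.321 | 0.918 | 0.845 | 2 |
| sand | 1.149 | 1.149 | 0.988 | 0.617 | 0.557 | 3 |
| s420 | 1.337 | 2.82 | 1.286 | 0.892 | 0.794 | 4 |
| s510 | 1.543 | 1.543 | 1.091 | 0.852 | 0.767 | 4 |
| s820 | 2.054 | 1.801 | 1.463 | 0.843 | 0.742 | 4 |
| s832 | 2.096 | 2.087 | 1.828 | 0.932 | 0.829 | 4 |
| Total | 76.788 | 83.136 | 64.747 | 55.070 | 52.231 | |
| Percentage, % | 147.02 | 159.17 | 123.96 | 105.44 | 100 | |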
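Table 11. Experimental results (area-time products).

| Benchmark | Auto | One-Hot | JEDI | MPY | Our Approach | Category |
|---|---|---|---|---|---|---|
| bbtas | 24.49 | 24.49 | 24.26 | 39.92 | 40.10 | 0 |
| dk17 | 25.09 | 71.86 | 25.08 | 40.03 | 40.60 | 0 |
| dk27 | 14.56 | 24.76 | 19.59 | 35.60 | 36.05 | 0 |
| dk512 | 50.95 | 50.95 | 45.06 | 61.80 | 62.49 | 0 |
| ex3 | 46.19 | 46.19 | 45.97 | 57.53 | 59.10 | 0 |
| ex5 | 49.93 | 49.93 | 49.68 | 56.16 | 56.90 | 0 |
| lion | 9.88 | 24.51 | 9.88 | 29.97 | 30.15 | 0 |
| lion9 | 29.23 | 59.39 | 24.23 | 40.18 | 41.24 | 0 |
| mc | 20.34 | 35.81 | 20.32 | 31.06 | 31.67 | 0 |
| modulo12 | 33.82 | 33.82 | 33.80 | 44.75 | 43.41 | 0 |
| shiftreg | 7.61 | 22.76 | 7.24 | 15.58 | 15.91 | 0 |
| bbara | 87.91 | 87.91 | 47.12 | 49.45 | 44.97 | 1 |
| bbsse | 210.11 | 218.78 | 131.62 | 143.46 | 121.98 | 1 |
| beecount | 114.04 | 114.04 | 74.74 | 75.62 | 65.53 | 1 |
| cse | 273.17 | 403.32 | 202.11 | 188.38 | 176.25 | 1 |
| dk14 | 83.49 | 156.39 | 51.59 | 63.10 | 52.98 | 1 |
| dk15 | 77.91 | 86.32 | 61.58 | 31.21 | 36.81 | 1 |
| dk16 | 88.38 | 194.52 | 60.87 | 56.60 | 52.44 | 1 |
| donfile | 168.45 | 168.48 | 117.85 | 104.52 | 90.80 | 1 |
| ex2 | 45.32 | 45.32 | 39.97 | 40.34 | 40.58 | 1 |
| ex4 | 82.89 | 73.15 | 62.23 | 57.85 | 53.70 | 1 |
| ex6 | 141.53 | 219.78 | 124.58 | 122.61 | 112.33 | 1 |
| ex7 | 20.00 | 24.90 | 19.94 | 30.28 | 30.58 | 1 |
| keyb | 274.85 | 425.18 | 237.49 | 228.38 | 212.17 | 1 |
| mark1 | 141.63 | 141.63 | 113.52 | 111.65 | 94.72 | 1 |
| opus | 168.47 | 168.47 | 123.37 | 119.80 | 110.77 | 1 |
| s27 | 30.19 | 93.99 | 30.13 | 30.59 | 36.37 | 1 |
| s386 | 154.62 | 224.84 | 122.80 | 141.36 | 135.10 | 1 |
| s8 | 49.99 | 50.29 | 49.66 | 50.50 | 52.10 | 1 |
| sse | 210.11 | 218.78 | 171.79 | 152.83 | 133.89 | 1 |
| ex1 | 463.76 | 529.48 | 299.66 | 219.37 | 178.00 | 2 |
| kirkman | 297.07 | 376.62 | 248.91 | 197.43 | 160.11 | 2 |
| planet | 987.11 | 987.11 | 470.24 | 412.44 | 355.89 | 2 |
| planet1 | 987.11 | 987.11 | 470.24 | 412.44 | 355.89 | 2 |
| pma | 643.04 | 643.04 | 506.39 | 404.06 | 343.70 | 2 |
| s1 | 443.96 | 728.74 | 388.14 | 332.86 | 292.90 | 2 |
| s1488 | 895.31 | 992.88 | 687.11 | 528.75 | 470.55 | 2 |
| s1494 | 843.43 | 905.66 | 669.34 | 522.44 | 439.01 | 2 |
| s1a | 319.49 | 459.18 | 254.18 | 213.23 | 176.55 | 2 |
| s208 | 68.83 | 175.68 | 55.94 | 49.53 | 48.05 | 2 |
| styr | 675.82 | 923.65 | 556.17 | 432.45 | 360.41 | 2 |
| tma | 274.59 | 263.87 | 237.60 | 169.76 | 149.95 | 2 |
| sand | 1138.23 | 1138.23 | 898.91 | 679.57 | 596.76 | 3 |
| s420 | 57.51 | 175.68 | 50.78 | 42.73 | 41.55 | 4 |
| s510 | 270.19 | 270.19 | 176.39 | 117.45 | 99.87 | 4 |
| s820 | 578.95 | 535.39 | 385.09 | 285.78 | 243.23 | 4 |
| s832 | 549.04 | 515.56 | 356.77 | 268.64 | 217.11 | 4 |
| Total | 12,228.61 | 14,168.64 | 8859.93 | 7540.03 | 6641.23 | |
| Percentage, % | 184.13 | 213.34 | 133.41 | 113.53 | 100.00 | |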
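Table 12. Experimental results (area-time-power products).

| Benchmark | Auto | One-Hot | JEDI | MPY | Our Approach | Category |
|---|---|---|---|---|---|---|
| bbtas | 13.05 | 13.05 | 12.93 | 26.39 | 26.51 | 0 |
| dk17 | 47.70 | 139.04 | 47.42 | 94.58 | 95.94 | 0 |
| dk27 | 17.01 | 21.15 | 22.69 | 51.93 | 52.60 | 0 |
| dk512 | 76.22 | 76.22 | 60.60 | 105.56 | 106.73 | 0 |
| ex3 | 18.06 | 18.06 | 17.98 | 28.82 | 29.61 | 0 |
| ex5 | 19.32 | 19.32 | 19.13 | 27.86 | 28.22 | 0 |
| lion | 5.35 | 15.42 | 5.41 | 21.31 | 21.44 | 0 |
| lion9 | 21.42 | 57.61 | 17.64 | 37.73 | 38.73 | 0 |
| mc | 9.09 | 20.09 | 9.00 | 17.61 | 17.96 | 0 |
| modulo12 | 18.90 | 18.90 | 19.03 | 32.00 | 31.04 | 0 |
| shiftreg | 3.98 | 13.73 | 3.71 | 10.05 | 10.26 | 0 |
| bbara | 50.02 | 50.02 | 23.00 | 19.73 | 17.04 | 1 |
| bbsse | 466.45 | 263.85 | 225.47 | 218.35 | 179.80 | 1 |
| beecount | 186.00 | 186.00 | 76.31 | 63.14 | 51.97 | 1 |
| cse | 261.70 | 410.99 | 180.08 | 128.66 | 114.39 | 1 |
| dk14 | 247.05 | 520.76 | 152.28 | 182.48 | 145.53 | 1 |
| dk15 | 109.31 | 164.44 | 86.15 | 40.95 | 45.86 | 1 |
| dk16 | 262.23 | 533.37 | 152.91 | 132.17 | 116.32 | 1 |
| donfile | 119.43 | 119.45 | 71.06 | 49.96 | 41.22 | 1 |
| ex2 | 16.68 | 17.50 | 13.67 | 10.77 | 10.31 | 1 |
| ex4 | 129.48 | 90.78 | 73.87 | 53.40 | 47.10 | 1 |
| ex6 | 321.14 | 846.15 | 279.31 | 242.16 | 211.06 | 1 |
| ex7 | 19.84 | 29.40 | 19.82 | 30.22 | 29.60 | 1 |
| keyb | 300.41 | 455.36 | 255.30 | 181.79 | 158.70 | 1 |
| mark1 | 204.66 | 204.66 | 139.29 | 121.36 | 95.76 | 1 |
| opus | 226.43 | 226.43 | 158.29 | 134.30 | 117.86 | 1 |
| s27 | 22.82 | 183.29 | 23.05 | 17.25 | 19.09 | 1 |
| s386 | 193.43 | 313.20 | 137.66 | 141.08 | 126.73 | 1 |
| s8 | 36.80 | 40.49 | 36.35 | 34.44 | 34.49 | 1 |
| sse | 256.34 | 283.54 | 187.08 | 138.62 | 115.41 | 1 |
| ex1 | 1902.35 | 1571.49 | 701.79 | 379.07 | 282.84 | 2 |
| kirkman | 502.94 | 694.49 | 358.19 | 222.50 | 167.80 | 2 |
| planet | 4068.89 | 4068.89 | 1154.90 | 836.42 | 678.33 | 2 |
| planet1 | 4068.89 | 4068.89 | 1154.90 | 836.42 | 678.33 | 2 |
| pma | 880.97 | 880.97 | 634.51 | 324.46 | 253.99 | 2 |
| s1 | 1192.03 | 2280.97 | 977.34 | 681.70 | 563.84 | 2 |
| s1488 | 3565.11 | 4066.82 | 2437.87 | 995.65 | 823.93 | 2 |
| s1494 | 2596.92 | 2878.19 | 1995.98 | 1231.90 | 952.21 | 2 |
| s1a | 422.36 | 922.96 | 307.05 | 188.71 | 146.89 | 2 |
| s208 | 94.09 | 495.41 | 69.87 | 47.40 | 41.85 | 2 |
| styr | 2733.03 | 4406.71 | 1772.50 | 1138.20 | 882.29 | 2 |
| tma | 436.33 | 346.73 | 313.87 | 155.84 | 126.71 | 2 |
| sand | 1307.82 | 1307.82 | 888.12 | 419.30 | 332.40 | 3 |
| s420 | 76.89 | 495.41 | 65.30 | 38.11 | 32.99 | 4 |
| s510 | 416.91 | 416.91 | 192.44 | 100.06 | 76.60 | 4 |
| s820 | 1189.16 | 964.23 | 563.39 | 240.91 | 180.48 | 4 |
| s832 | 1150.78 | 1075.98 | 652.18 | 250.38 | 179.98 | 4 |
| Total | 30,285.77 | 36,295.13 | 16,766.68 | 10,481.70 | 8538.73 | |
| Percentage, % | 354.69 | 425.06 | 196.36 | 122.75 | 100.00 | |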
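
The entries of Tables 11 and 12 are numerically consistent with simple products of the measured quantities: the area-time product matches the LUT count (Table 8) multiplied by the latency (Table 9), and the area-time-power product matches that value multiplied by the total on-chip power (Table 10). The Python sketch below checks this on three table entries. It assumes these product definitions, which are inferred from the numbers rather than quoted from the text, and all function and variable names are illustrative.

```python
# Each sample: LUT count (Table 8), latency in ns (Table 9), power in W (Table 10),
# reported area-time product (Table 11), reported area-time-power product (Table 12).
SAMPLES = {
    ("bbtas", "Auto"): (5, 4.898, 0.533, 24.49, 13.05),
    ("dk17", "One-Hot"): (12, 5.988, 1.935, 71.86, 139.04),
    ("ex1", "Our Approach"): (34, 5.235, 1.589, 178.00, 282.84),
}


def area_time(luts, latency_ns):
    """Area-time product, assumed to be LUT count times latency."""
    return luts * latency_ns


def area_time_power(luts, latency_ns, power_w):
    """Area-time-power product, assumed to be the area-time product times power."""
    return area_time(luts, latency_ns) * power_w


def is_close(computed, reported, rel_tol=1e-3):
    """Allow for the rounding applied to the published values."""
    return abs(computed - reported) <= rel_tol * abs(reported)


def consistent(sample):
    luts, latency_ns, power_w, at_reported, atp_reported = sample
    return (is_close(area_time(luts, latency_ns), at_reported)
            and is_close(area_time_power(luts, latency_ns, power_w), atp_reported))


results = {key: consistent(sample) for key, sample in SAMPLES.items()}
print(results)
```

A relative tolerance is used instead of exact equality because the published latencies and products are rounded, so recomputed values can differ in the last printed digit.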
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Barkalov, A.; Titarenko, L.; Krzywicki, K.; Saburova, S. Improving Characteristics of LUT-Based Three-Block Mealy FSMs’ Circuits. Electronics 2022, 11, 950. https://doi.org/10.3390/electronics11060950