# Programmable Energy-Efficient Analog Multilayer Perceptron Architecture Suitable for Future Expansion to Hardware Accelerators


## Abstract

This research details a programmable analog hardware implementation of a multilayer perceptron (MLP) that occupies a compact on-chip area. The system architecture is analyzed in several different configurations, each achieving a power efficiency greater than 1 tera-operations per watt (TOps/W), with a peak of 5.23 TOps/W, while consuming considerably less power than comparable digital and analog designs. This work offers an energy-efficient and scalable alternative to digital configurable neural networks that can be built upon to create larger networks capable of standard machine learning applications, such as image and text classification. The circuit elements described here can readily be scaled up at the system level to create a larger neural network architecture with further improved energy efficiency.

## 1. Introduction

The output of neuron $i$ in an MLP can be written as ${y}_{i}=\varphi \left({\mathbf{w}}_{i}^{\prime}\mathbf{x}+{b}_{i}\right)$, where **x** is the vector of input values, $\varphi()$ is the nonlinear activation function, ${\mathbf{w}}_{i}^{\prime}$ is the vector containing the weight values, and ${b}_{i}$ is the bias for a given neuron. Figure 1 shows a general MLP structure with two hidden layers. A distinct characteristic of an MLP ANN is that every neuron outputs to all of the neurons of the subsequent layer; MLPs are typically used for the classification of data that are not linearly separable [2].
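The neuron equation above can be sketched in software. The following minimal NumPy model (layer sizes and weights are illustrative only, not taken from the paper) computes a forward pass through a fully connected MLP with two hidden layers, mirroring the structure of Figure 1:

```python
import numpy as np

def sigmoid(z):
    # Nonlinear activation function phi(), applied elementwise
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, layers):
    """Forward pass: each neuron computes y_i = phi(w_i' . x + b_i)."""
    a = x
    for W, b in layers:
        # Every neuron receives every output of the previous layer
        a = sigmoid(W @ a + b)
    return a

# Illustrative 3-4-4-2 network with seeded random weights for reproducibility
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((4, 3)), np.zeros(4)),
          (rng.standard_normal((4, 4)), np.zeros(4)),
          (rng.standard_normal((2, 4)), np.zeros(2))]
y = mlp_forward(np.array([0.5, -1.0, 0.25]), layers)
```

Because the sigmoid bounds every activation to (0, 1), the outputs stay in a fixed range regardless of weight magnitude, which is the property the analog sigmoid circuit in Section 3.2 exploits.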

## 2. Results

Each design is compared in terms of density (Ops/s/μm²), power in microwatts (μW), synapses (number of multipliers), density per synapse (Ops/s/μm²), power per synapse (μW), and power efficiency (TOps/W). Table 4 shows each design's implemented technology node (or whether it is software only), the nominal supply voltage, and the nominal frequency as compared with the proposed architecture.

## 3. Materials and Methods

#### 3.1. Multiplier Design

#### 3.2. Sigmoid Design

#### 3.3. Winner-Take-All Design

#### 3.4. MLP Hardware System Architecture

## 4. Discussion

## 5. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

- MacKay, D. Information Theory, Inference, and Learning Algorithms; Cambridge University Press: Cambridge, UK, 2003.
- Gales, M. Module 4F10: Statistical Pattern Processing, Handout 8: Multi-Layer Perceptrons. 2015. Available online: http://mi.eng.cam.ac.uk/~mjfg/local/4F10/ (accessed on 4 April 2018).
- Hasler, P. Low-power programmable signal processing. In Proceedings of the Fifth International Workshop on System-on-Chip for Real-Time Applications (IWSOC’05), Banff, AB, Canada, 20–24 July 2005; pp. 413–418.
- Gravati, M.; Valle, M.; Ferri, G.; Guerrini, N.; Reyes, N. A novel current-mode very low power analog CMOS four quadrant multiplier. In Proceedings of the 31st European Solid-State Circuits Conference, ESSCIRC 2005, Grenoble, France, 12–16 September 2005; pp. 495–498.
- Al-Absi, M.A.; Hussein, A.; Abuelma’atti, M.T. A novel current-mode ultra low power analog CMOS four quadrant multiplier. In Proceedings of the 2012 International Conference on Computer and Communication Engineering (ICCCE), Kuala Lumpur, Malaysia, 3–5 July 2012; pp. 13–17.
- Talaska, T.; Kolasa, M.; Długosz, R.; Pedrycz, W. Analog Programmable Distance Calculation Circuit for Winner Takes All Neural Network Realized in the CMOS Technology. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 661–673.
- Lont, J.B.; Guggenbuhl, W. Analog CMOS implementation of a multilayer perceptron with nonlinear synapses. IEEE Trans. Neural Netw. 1992, 3, 457–465.
- Riedmiller, M. Machine Learning: Multi Layer Perceptrons. 2009. Available online: https://ml.informatik.uni-freiburg.de/ (accessed on 21 April 2018).
- Park, S.W.; Park, J.; Bong, K.; Shin, D.; Lee, J.; Choi, S.; Yoo, H.J. An Energy-Efficient and Scalable Deep Learning/Inference Processor with Tetra-Parallel MIMD Architecture for Big Data Applications. IEEE Trans. Biomed. Circuits Syst. 2015, 9, 838–848.
- Tsai, C.H.; Yu, W.J.; Wong, W.H.; Lee, C.Y. A 41.3/26.7 pJ per Neuron Weight RBM Processor Supporting On-Chip Learning/Inference for IoT Applications. IEEE J. Solid-State Circuits 2017, 52, 2601–2612.
- Yüzügüler, A.C.; Celik, F.; Drumond, M.; Falsafi, B.; Frossard, P. Analog Neural Networks with Deep-Submicrometer Nonlinear Synapses. IEEE Micro 2019, 39, 55–63.
- Binas, J.; Neil, D.; Indiveri, G.; Liu, S.C.; Pfeiffer, M. Precise deep neural network computation on imprecise low-power analog hardware. arXiv 2016, arXiv:1606.07786.
- Nguyen-Hoang, D.T.; Ma, K.M.; Le, D.L.; Thai, H.H.; Cao, T.B.T.; Le, D.H. Implementation of a 32-Bit RISC-V Processor with Cryptography Accelerators on FPGA and ASIC. In Proceedings of the 2022 IEEE Ninth International Conference on Communications and Electronics (ICCE), Nha Trang, Vietnam, 27–29 July 2022; pp. 219–224.
- Venkataramanaiah, S.K.; Yin, S.; Cao, Y.; Seo, J.S. Deep Neural Network Training Accelerator Designs in ASIC and FPGA. In Proceedings of the 2020 International SoC Design Conference (ISOCC), Yeosu, Republic of Korea, 21–24 October 2020; pp. 21–22.
- Zhang, H.; Li, Z.; Yang, H.; Cheng, X.; Zeng, X. A High-Efficient and Configurable Hardware Accelerator for Convolutional Neural Network. In Proceedings of the 2021 IEEE 14th International Conference on ASIC (ASICON), Kunming, China, 26–29 October 2021; pp. 1–4.
- Harrison, R. MOSFET Operation in Weak and Moderate Inversion; University of Utah: Salt Lake City, UT, USA, 2010.
- Gilbert, B. Translinear circuits: A proposed classification. Electron. Lett. 1975, 11, 14–16.
- Lopez-Martin, A.J.; Carlosena, A. Design of MOS-translinear multiplier/dividers in analog VLSI. VLSI Des. 2000, 11, 321–329.
- Gilbert, B. Translinear circuits: An historical overview. Analog Integr. Circuits Signal Process. 1996, 9, 95–118.
- Sedra, A.S.; Smith, K.C. Microelectronic Circuits; Oxford University Press: New York, NY, USA, 1998; Volume 1.
- Minch, B.A. A low-voltage MOS cascode current mirror for all current levels. In Proceedings of the 2002 45th Midwest Symposium on Circuits and Systems, MWSCAS-2002, Tulsa, OK, USA, 4–7 August 2002; Volume 2, pp. II-53–II-56.
- Mead, C. Analog VLSI and Neural Systems; Addison-Wesley VLSI System Series; Addison-Wesley: Boston, MA, USA, 1989.
- Wunderlich, R.B.; Adil, F.; Hasler, P. Floating Gate-Based Field Programmable Mixed-Signal Array. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2013, 21, 1496–1505.

**Figure 16.**Simulation of the multiplier circuit with an input current of 100 nA and weighting signals of 100 nA.

**Figure 17.** Simulation of the multiplier circuit with a DC sweep of the input current and weighting signals of 100 nA.

**Figure 20.**Sigmoid circuit schematic simulation with current and reference biasing to maintain a similar output current signal as the input signal.

**Figure 21.**Sigmoid circuit schematic simulation with the input current swept from 0 to 500 nA with a constant reference and bias current of 200 and 25 nA, respectively.

**Figure 24.**Winner-take-all circuit simulation with a 100 nA input current signal and 100 nA weight, saturation, and biasing signals to produce an output voltage that corresponds to the high signal levels of the input signal.
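The winner-take-all behavior shown in Figure 24 can be modeled behaviorally. The sketch below is my own illustration, not the paper's circuit: each output is driven high only when its input current is the largest, and the supply rail value of 1.8 V is an assumption for illustration.

```python
def winner_take_all(currents, v_high=1.8, v_low=0.0):
    """Behavioral WTA model: the channel with the largest input
    current 'wins' and drives its output high; all others stay low."""
    winner = max(range(len(currents)), key=lambda i: currents[i])
    return [v_high if i == winner else v_low for i in range(len(currents))]

# Input currents in amperes: the 100 nA channel wins
outs = winner_take_all([40e-9, 100e-9, 75e-9])
```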

**Figure 32.**Final layout view of the MLP architecture that fits within a 1 mm by 1 mm physical space.

| Config Number | Inputs | Neurons | Freq. (MHz) | On/Off Current (μA) | Power (μW) | Power Efficiency (TOps/W) |
|---|---|---|---|---|---|---|
| 1 | 1 | 3 | 4.06 | 14/7 | 12.6 | 1.93 |
| 2 | 1 | 4 | 4.06 | 14.1/7.3 | 12.84 | 2.52 |
| 3 | 1 | 5 | 4.06 | 19.4/8.6 | 16.8 | 2.90 |
| 4 | 1 | 7 | 4.06 | 18.9/6.3 | 15.12 | 5.91 |
| 5 | 1 | 9 | 2.75 | 22.8/7.4 | 18.12 | 4.56 |
| 6 | 1 | 12 | 2.75 | 26.6/15.5 | 25.26 | 5.23 |
| 7 | 4 | 12 | 1.51 | 35.8/17 | 31.68 | 2.86 |
| 8 | 1 | 6 | 2.04 | 13.5/4 | 10.5 | 2.09 |
| 9 | 2 | 6 | 2.57 | 19/8.1 | 16.26 | 2.85 |
| 10 | 1 | 7 | 4.06 | 21.6/8.2 | 17.88 | 4.54 |
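Power efficiency in the configuration table is throughput divided by power consumption. The helper below is an illustration; the throughput value in Ops/s is back-calculated from the reported figures rather than stated in the paper. For configuration 6, roughly 132 MOps/s at 25.26 μW recovers the reported 5.23 TOps/W:

```python
def tops_per_watt(ops_per_second, power_watts):
    # Power efficiency = throughput / power, in tera-operations per watt
    return ops_per_second / power_watts / 1e12

# Configuration 6: ~132.1 MOps/s at 25.26 uW (throughput back-calculated)
eff = tops_per_watt(132.1e6, 25.26e-6)
```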

| Config Number | Inputs | Neurons | Freq. (MHz) | Delay (Rising, ns) | Delay (Falling, ns) |
|---|---|---|---|---|---|
| 1 | 1 | 3 | 4.06 | 338 | 489 |
| 2 | 1 | 4 | 4.06 | 345 | 457 |
| 3 | 1 | 5 | 4.06 | 343 | 479 |
| 4 | 1 | 7 | 4.06 | 436 | 492 |
| 5 | 1 | 9 | 2.75 | 515 | 585 |
| 6 | 1 | 12 | 2.75 | 541 | 621 |
| 7 | 4 | 12 | 1.51 | 555 | 889 |
| 8 | 1 | 6 | 2.04 | 829 | 750 |
| 9 | 2 | 6 | 2.57 | 610 | 687 |
| 10 | 1 | 7 | 4.06 | 395 | 564 |

| Prior Art | Analog or Digital | Power (μW) | Density per Synapse (Ops/s/μm²) | Power per Synapse (μW) | Power Efficiency (TOps/W) | Fabricated |
|---|---|---|---|---|---|---|
| Park [9] | Digital | 213,100 | 20.00 | 103.65 | 1.930 | Yes |
| Tsai [10] | Digital | 310,000 | 23.86 | 75.68 | 1.450 | Yes |
| Yuzuguler [11] | Analog | - | 40.64 | - | 3.846 | No |
| Binas [12] | Analog | 200 | - | 0.008 | 7.97 | Yes |
| Hoang [13] | Digital | 42,100 | - | - | 0.0012 | No |
| Shreyas [14] | Digital | 10,450 | - | - | 2.60 | No |
| Zhang [15] | Digital | 7,250,000 | - | - | 0.0524 | No |
| This work | Analog | 15.12 | 8.12 | 2.16 | 5.91 | Yes |


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Dix, J.; Holleman, J.; Blalock, B.J.
Programmable Energy-Efficient Analog Multilayer Perceptron Architecture Suitable for Future Expansion to Hardware Accelerators. *J. Low Power Electron. Appl.* **2023**, *13*, 47.
https://doi.org/10.3390/jlpea13030047
