

# **Design of High-Speed, Low-Power Sensing Circuits for Nano-Scale Embedded Memory**

Sangheon Lee <sup>1</sup>, Gwanwoo Park <sup>1</sup> and Hanwool Jeong <sup>1,2,\*</sup>

- <sup>1</sup> Department of Electronic Engineering, Kwangwoon University, Seoul 01897, Republic of Korea; leesang@kw.ac.kr (S.L.); gwanwoo0620@kw.ac.kr (G.P.)
- <sup>2</sup> Articron Inc., Ansan-si 15588, Republic of Korea
- \* Correspondence: hwjeong@kw.ac.kr

Abstract: This paper comparatively reviews sensing circuit designs for the most widely used embedded memory, static random-access memory (SRAM). Many sensing circuits for SRAM have been proposed to improve power efficiency and speed, because sensing operations in SRAM largely determine the overall speed and power consumption of the system-on-chip. This effect is more pronounced in the nanoscale era, where SRAM bit-cells implemented with near minimum-sized transistors are highly influenced by variation effects. Under this condition, for stable sensing, the control signal for accessing the selected bit-cell (word-line, WL) should be asserted for a long time, which increases both power dissipation and delay. Sensing circuits that can reduce the WL pulse width therefore improve sensing power and speed simultaneously. Throughout this paper, the strengths and weaknesses of many SRAM sensing circuits are introduced in terms of speed, area, power, and other aspects.

Keywords: static random-access memory; sensing circuit; offset voltage



Citation: Lee, S.; Park, G.; Jeong, H. Design of High-Speed, Low-Power Sensing Circuits for Nano-Scale Embedded Memory. *Sensors* 2024, 24, 16. https://doi.org/10.3390/s24010016

Academic Editors: Giuseppe Ferri, Alfiero Leoni and Gianluca Barile

Received: 8 November 2023 Revised: 14 December 2023 Accepted: 17 December 2023 Published: 19 December 2023



**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## 1. Introduction

System-on-chip design encounters considerable challenges related to power consumption and latency, with much of the impact stemming from static random-access memory (SRAM) [1–4]. Thus, the efficient management of SRAM power consumption and the enhancement of SRAM access speed become highly important. Although reducing the supply voltage (V<sub>DD</sub>) is effective in reducing power consumption, it introduces potential performance and stability trade-offs. In particular, the SRAM bit-cell, a circuit component for binary data storage, is typically constructed with near minimum-sized transistors to achieve high-density integration, resulting in significant performance variability due to process deviations [5–8]. Furthermore, to address read stability issues, read assist circuits are employed to suppress the word-line voltage, which can exacerbate performance degradation. Consequently, the optimization of SRAM circuits to minimize both power consumption and delay becomes crucial.

By analyzing the read operation, we can identify a method to simultaneously reduce power consumption and delay in SRAM. During the read operation, the bit-cell generates a voltage difference across the bit-line pair. Then, a sensing circuit measures this voltage difference and subsequently delivers the results to the external system. Importantly, the bit-line pair, which plays a fundamental role, has a significant capacitance, enough to make it the dominant contributor to both delay and power consumption during the read operation. Consequently, when a substantial voltage swing in the bit-line is necessitated for the read operation, it inevitably results in increased delays and power consumption. Thus, reducing the bit-line swing during the read operation can effectively decrease the power consumption and delay at the same time [9–11].
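The link between bit-line swing, delay, and energy can be illustrated with a first-order model. The sketch below assumes a constant bit-cell read current and a lumped bit-line capacitance; the function name and all numeric values are illustrative assumptions, not figures taken from the reviewed designs.

```python
# First-order model of bit-line read cost.
# All numeric values below are illustrative assumptions.

def bitline_read_cost(c_bl, v_dd, delta_v_bl, i_cell):
    """Return (delay_s, energy_j) for developing and restoring a bit-line swing.

    delay  ~ C_BL * dV_BL / I_cell  (time for the bit-cell to discharge the line)
    energy ~ C_BL * V_DD * dV_BL    (charge drawn from V_DD to pre-charge it back)
    """
    delay = c_bl * delta_v_bl / i_cell
    energy = c_bl * v_dd * delta_v_bl
    return delay, energy


if __name__ == "__main__":
    C_BL = 50e-15    # assumed bit-line capacitance: 50 fF
    V_DD = 0.8       # assumed supply voltage: 0.8 V
    I_CELL = 20e-6   # assumed bit-cell read current: 20 uA
    for dv in (0.20, 0.10, 0.05):
        t, e = bitline_read_cost(C_BL, V_DD, dv, I_CELL)
        print(f"dV_BL = {dv * 1e3:5.1f} mV  "
              f"delay = {t * 1e12:6.1f} ps  energy = {e * 1e15:5.2f} fJ")
```

In this simplified model both quantities scale linearly with the swing, which is why halving the required $\Delta V_{BL}$ halves both the bit-line delay and the pre-charge energy.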

However, it is highly challenging to reduce bit-line voltage swing. This is because sensing circuits, especially the sense amplifier (SA) responsible for detecting bit-line swing, necessitate a sufficiently large bit-line voltage difference ( $\Delta V_{BL}$ ) for precise operation. This need arises due to transistor mismatch within the SA, causing asymmetry in its characteristics. The minimum input voltage difference (in this case,  $\Delta V_{BL}$ ) required for stable SA operation is known as the SA offset voltage ( $V_{OS}$ ). To reduce the  $\Delta V_{BL}$ , it becomes essential to lower the  $V_{OS}$ .

Additionally, the SA is crucially utilized not only in SRAM but also in novel components, improving the efficiency of data processing [12–21]. SAs are used as row ADCs in [12–14], binary activation functions in [15–17], multilevel sense amplifiers in [18], four-bit flash ADCs in [19], and sensing circuits in [20,21]. Therefore, research on SAs that combine a low $V_{OS}$ (for high accuracy) with low power consumption, high speed, and high integration is crucial.

Consequently, numerous prior research efforts have aimed to reduce the $V_{OS}$, the most important performance metric of SAs. The simplest method is to use larger-width transistors for SAs, which can reduce the mismatch between paired transistors. However, this approach incurs area and power overhead. To reduce the $V_{OS}$ while minimizing the area and power overhead, various offset-reducing circuit techniques have been proposed [22–47]. This paper aims to conduct a comparative analysis of these circuits, explaining their effectiveness in reducing the $V_{OS}$ and achieving power and performance benefits.

The rest of this paper is organized as follows: Section 2 provides essential background information on SRAM read operations and conventional SRAM sensing circuits, including an examination of their limitations. This foundation is crucial for understanding the subsequent content. Section 3 delves into comprehensive introductions of various previously researched SRAM sensing circuits designed to reduce the  $V_{OS}$ , ultimately enhancing speed and power efficiency. Section 4 details a comparative analysis and discussion of the SRAM sensing circuits introduced in Section 3 from various perspectives.

## 2. Background on SRAM Read Operation and Conventional Sensing Circuits

Figure 1 presents the simplified circuits in the conventional SRAM for the read operation. In the following, we provide brief explanations for the structure and operation of each circuit shown in Figure 1.



Figure 1. Simplified schematic of the conventional SRAM for the read operation.

At the top of Figure 1, the bit-cell is composed of six transistors. In this 6T bit-cell, two cross-coupled inverters are formed of $M_1$, $M_2$, $M_3$, and $M_4$ for storing and latching the binary data at two storage nodes, $Q_T$ and $Q_C$. The two access transistors, $M_5$ and $M_6$, serve as control elements that regulate connections between the bit-line pair ($BL_T$ and $BL_C$) and storage nodes ($Q_T$ and $Q_C$). When the WL activates (i.e., WL = 1), access transistors are turned on to connect bit-lines to storage nodes.

Next, the bit-line pre-charge circuit is shown, which is formed of  $M_{PCT}$ ,  $M_{PCC}$ , and  $M_{EQ}$ . These transistors are controlled by the low-enable pre-charge trigger signal, *PCB*, with their gates connected. When *PCB* = 0,  $M_{PCT}$  and  $M_{PCC}$  are turned on to pre-charge *BL*<sub>T</sub> and *BL*<sub>C</sub> up to V<sub>DD</sub>, while  $M_{EQ}$  ensures that *BL*<sub>T</sub> and *BL*<sub>C</sub> are pre-charged to equal voltages.

The column multiplexer (MUX) implemented with  $M_{C1}$ ,  $M_{C2}$ , ...,  $M_{C8}$  selects one bit-line pair from multiple pairs (four pairs in Figure 1) and connects it to the SA input pair  $SL_T$  and  $SL_C$ . The specific bit-line pair to be connected is determined by the column address signal, COLB[0:3], with only one of these signals set to low.

The SA plays a key role in the SRAM read operation. It amplifies the voltage difference between $SL_T$ and $SL_C$, converting it into a full-logic swing voltage. This amplified signal is then made available at the SA's differential outputs, $SO_T$ and $SO_C$. Two commonly used conventional SA structures are the voltage-type latch SA (VLSA) and the current-type latch SA (CLSA), which are shown in Figure 2a,b, respectively [48]. Unlike VLSAs, CLSAs receive the SA input voltages, $SL_C$ and $SL_T$, through the gates of access transistors, $M_{S1}$ and $M_{S2}$. Therefore, the SA inputs see a high impedance, and the CLSA is less sensitive to timing mismatch. However, CLSAs require additional transistors for the sensing operation and therefore have lower speed, higher energy consumption, and a larger area than VLSAs. The SA enable signal (SAE), connected to $M_{S5}$–$M_{S7}$ of the VLSA and $M_{S7}$–$M_{S9}$ of the CLSA, triggers the amplifying operation of the SA.



**Figure 2.** Schematic of two commonly used SAs in SRAM: (**a**) voltage-type latch SA (VLSA) and (**b**) current-type latch SA.

Figure 3 provides operational waveforms of relevant signals during the conventional SRAM read operation, divided into three phases: the pre-charge phase, the access phase, and the evaluation phase. In the pre-charge phase, the *PCB* becomes low, which pre-charges the bit-lines ( $BL_T$  and  $BL_C$ ) and SA inputs ( $SL_T$  and  $SL_C$ ) to  $V_{DD}$  through the bit-line pre-charge circuit and the SA input pre-charge circuit. Then, the access phase starts by making PCB = 1 to turn off the pre-charge circuits, while the WL for the selected bit-cell is asserted to reflect the data at  $Q_T$  and  $Q_C$  onto the bit-line pair of  $BL_T$  and  $BL_C$ . Figure 3 shows an example of bit-cell storing datum "1" ( $Q_T = 1$  and  $Q_C = 0$ ). In this example,  $BL_T$  remains high while  $BL_C$  falls due to the bit-cell current through  $M_6$ , creating a voltage difference between  $BL_T$  and  $BL_C$ . By lowering the COLB[i] in the selected column, the column MUX transistors transfer only the selected bit-line pair voltage to the SA inputs,  $SL_T$  and  $SL_C$ .



Figure 3. Operational waveforms for the read operation relevant signals in the conventional SRAM.

During the subsequent evaluation phase, the SA enable signal (*SAE*) becomes high to trigger the positive feedback configuration in the SA. In this manner, a small voltage difference between SL<sub>T</sub> and SL<sub>C</sub>, $\Delta V_{IN,SA}$ (see Figure 3), is amplified into a full digital voltage difference at the SA output nodes $SO_T$ and $SO_C$. For example, the sensing operation of a VLSA in Figure 2a is shown in Figure 4.



Figure 4. Description of VLSA operation for sensing datum "1".

When the sensing datum is "1", the  $SL_T$  remains at  $V_{DD}$  while the  $SL_C$  decreases due to the bit-cell, reaching  $V_{DD} - \Delta V_{IN,SA}$ , as shown on the left side of Figure 4. The voltages at the SA outputs,  $SO_T$  and  $SO_C$ , are equal to those at  $SL_T$  and  $SL_C$ , respectively, through the pass transistors  $M_{S5}$  and  $M_{S6}$ . During the subsequent evaluation phase, the *SAE* rises, and current flows through paired nFETs.

The currents through the paired nFETs in the SA, $M_{S1}$ and $M_{S2}$, are depicted as $I_{S1}$ and $I_{S2}$ in the middle of Figure 4. At the beginning of the evaluation phase, the $V_{GS}$ of $M_{S2}$ ($SO_T = V_{DD}$) is greater than that of $M_{S1}$ ($SO_C = V_{DD} - \Delta V_{IN,SA}$). Consequently, $I_{S2} > I_{S1}$ makes $SO_C$ fall faster than $SO_T$. This leads to positive feedback, formed by $M_{S1}-M_{S2}-M_{S3}-M_{S4}$. As a result, $SO_T$ and $SO_C$ eventually reach $V_{DD}$ and 0 V, respectively, as shown on the right side of Figure 4, indicating a successful "1" datum sensing process.

However, it is not always guaranteed that the SA operation is stably performed. In Figure 5, there is a scenario where sensing failure occurs. The access phase is the same as the previous normal sensing operation (the left side of Figure 5). However, when the evaluation starts by triggering the SA, as shown in the middle of Figure 5, problems can arise. It should be noted that, although the V<sub>GS</sub> of M<sub>S2</sub> ( $SO_T = V_{DD}$ ) is greater than the V<sub>GS</sub> of M<sub>S1</sub> ( $SO_C = V_{DD} - \Delta V_{IN,SA}$ ),  $I_{S2} < I_{S1}$ . This can occur because there is a mismatch between the M<sub>S1</sub>–M<sub>S2</sub> pair, specifically since the V<sub>th</sub> of M<sub>S1</sub> is lower than the V<sub>th</sub> of M<sub>S2</sub> [22]. Consequently, the  $SO_T$  (initially V<sub>DD</sub>) falls more quickly than the  $SO_C$  (initially V<sub>DD</sub> –  $\Delta V_{IN,SA}$ ). Therefore,  $SO_T$  and  $SO_C$  end up with 0 V and  $V_{DD}$ , respectively, meaning that sensing fails in attempting to sense datum "1".



Figure 5. Description of sensing failure in VLSA for sensing datum "1".

Here, the key point is that the mismatch between the paired transistors is responsible for the sensing failure. To prevent this sensing failure, $\Delta V_{IN,SA}$ should be large enough to compensate for the effects of the transistor mismatch. This minimum required $\Delta V_{IN,SA}$ for stable sensing is the offset voltage in the SA, referred to as $V_{OS}$, and necessitates that $\Delta V_{IN,SA} > V_{OS}$. This $V_{OS}$ problem becomes more severe in low-$V_{DD}$ regions and is significantly affected by temperature [49,50]. To meet this condition, the WL pulse width is extended to achieve a sufficiently large $\Delta V_{BL}$, which, in turn, results in a large $\Delta V_{IN,SA}$. However, this increased $\Delta V_{BL}$ requirement not only causes delays but also raises power consumption, since more power is needed to pre-charge the significant capacitance of the BL pair, which stems from the combined effects of the long BL wire capacitance and the parasitic capacitance of the bit-cells.
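The condition $\Delta V_{IN,SA} > V_{OS}$ can be illustrated with a simple statistical sketch. It assumes that only the $V_{th}$ mismatch of the $M_{S1}$–$M_{S2}$ pair matters, that the two threshold voltages are independent Gaussian variables with equal standard deviation, and that datum "1" sensing fails whenever $V_{th,MS2} - V_{th,MS1} > \Delta V_{IN,SA}$ (the overdrive comparison implied by Figure 5). The function names and the 20 mV sigma are hypothetical.

```python
from statistics import NormalDist


def sensing_failure_probability(delta_v_in, sigma_vth):
    """Probability that V_th,MS2 - V_th,MS1 exceeds delta_v_in.

    Assumes independent Gaussian threshold voltages with equal standard
    deviation sigma_vth, so their difference has sigma_vth * sqrt(2).
    """
    sigma_diff = sigma_vth * 2 ** 0.5
    return 1.0 - NormalDist(mu=0.0, sigma=sigma_diff).cdf(delta_v_in)


def required_input_difference(sigma_vth, n_sigma):
    """Input difference that covers n_sigma of the V_th-difference spread."""
    return n_sigma * sigma_vth * 2 ** 0.5


if __name__ == "__main__":
    SIGMA_VTH = 0.02  # assumed per-transistor V_th sigma: 20 mV
    for dv in (0.0, 0.05, 0.10):
        p = sensing_failure_probability(dv, SIGMA_VTH)
        print(f"dV_IN,SA = {dv * 1e3:5.1f} mV  P(fail) = {p:.3e}")
    req = required_input_difference(SIGMA_VTH, 6)
    print(f"6-sigma requirement: {req * 1e3:.1f} mV")
```

Under these assumptions, the required input difference grows in proportion to the $V_{th}$ spread, which is consistent with larger transistors (smaller spread) lowering the $V_{OS}$ at the cost of area and power.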

Although employing large-sized transistors for sensing schemes can mitigate the mismatch problem, it incurs power, speed, and area overhead in the sensing stage [18]. In addition, various replica bit-line delay and self-timed SAE generation techniques have been proposed to minimize *WL* pulses [51–58], but their effects are limited because local variations cannot be considered. The speed and power issue due to the $\Delta V_{BL}$ requirement in SRAM becomes more severe in today's advanced sub-nanometer technology nodes, because *WL*-suppressed assist circuits are widely used, which necessitates longer *WL* pulses to meet the $\Delta V_{BL}$ requirement [59–62].

Therefore, it would be highly beneficial to reduce the  $V_{OS}$ , as it would alleviate the demand for a large  $\Delta V_{BL}$ . In the following section, we describe SRAM sensing circuits designed to reduce the  $V_{OS}$  for the purpose of improving speed and power efficiency. We will explore these circuits in terms of their structure, operation, and key performance characteristics.

## 3. SRAM Sensing Circuits for Offset Reduction

### 3.1. Schmitt Trigger Sense Amplifiers

Schmitt triggers are often used to improve the robustness of a standard inverter by modifying the switching threshold. Utilizing this feature, the authors in [24–26] proposed the Schmitt trigger-based SA (STSA) to reduce  $V_{OS}$ , where one example structure is shown in Figure 6a. This structure intends to weaken the pull-down network of the inverter holding high voltages relative to that of the low-voltage inverter.



**Figure 6.** Schematics of (**a**) Schmitt trigger-based SA (STSA) and (**b**) the voltage-boosted STSA (VBSTSA).

For example, when $SL_T$ is $V_{DD}$ while $SL_C$ is $V_{DD} - \Delta V_{IN,SA}$ for datum "1" sensing, $SO_T$ and $SO_C$ become $V_{DD}$ and $V_{DD} - \Delta V_{IN,SA}$, respectively, at the end of the access phase. When the evaluation phase starts with SAE rising, $M_{S5}$ is more strongly turned on than $M_{S6}$ because $SO_T > SO_C$. Thus, the $Z_T$ node (the source of $M_{S3}$) is more strongly pulled up than $Z_C$ (the source of $M_{S4}$). Because this scheme sets not only the gate voltages but also the source voltages of $M_{S3}$ and $M_{S4}$ according to $SO_T$ and $SO_C$, the $V_{GS}$ of $M_{S3}$ is greatly suppressed. That is, the $V_{GS}$ difference between the two paired nFETs ($M_{S3}$–$M_{S4}$) in the STSA is larger than that between $M_{S1}$–$M_{S2}$ in the VLSA, which makes the STSA more tolerant to mismatch effects. In this manner, the STSA attempts to provide a reduced $V_{OS}$ compared to the VLSA.

However, the STSA has a limited ability to reduce the $V_{OS}$, because it contains additional transistor pairs and the mismatch effect can therefore be larger. In particular, the mismatch between $M_{S5}$ and $M_{S6}$ and the mismatch between $M_{S1}$ and $M_{S2}$, which are not present in the VLSA, increase the asymmetry in the SA and increase the $V_{OS}$. Nevertheless, the circuit technique implemented in the STSA, performed by $M_{S1}$, $M_{S2}$, $M_{S5}$, and $M_{S6}$, mitigates mismatch effects enough to compensate for the increase caused by the additional transistor pairs. As a result, the final $V_{OS}$ is reduced compared to the VLSA. On the other hand, the sensing delay is increased compared to the VLSA due to the use of a stacked nFET structure [26].

To mitigate the speed problem of STSAs, voltage-boosted STSAs (VBSTSAs) were proposed [27], as shown in Figure 6b. In VBSTSAs, the negative voltage generator (NVG) used for the negative bit-line write-assist circuit is reused to accelerate the operation of STSAs. When the NVG operation starts, BSTEN rises and BSTENb falls. The falling BSTENb turns off $M_{S13}$, which was holding OUT at $V_{SS}$, leaving OUT floating. After $M_{S13}$ is completely turned off, BSTENd, delayed through inverters, falls, and OUT is lowered to a negative voltage through the coupling capacitor, C. Because BSTENd must fall only after $M_{S13}$ is fully turned off, the inverters in the NVG should provide sufficient delay. The ground voltage for the SA is pulled down to this negative voltage at the rising edge of the SAE and is 0 V otherwise. This is realized by a switch, turned on only when the SAE is high, that delivers the negative voltage generated by the NVG. Although sensing speed can be enhanced in this manner, it incurs a significant amount of power overhead. In addition, NVGs are not always used for write-assist circuits; other types of write-assist circuit, such as cell voltage collapse write assist, do not use NVGs.

### 3.2. Hybrid Latch-Type Sense Amplifiers

Some previously proposed SAs combine the features of VLSAs and CLSAs to reduce the  $V_{OS}$ , which can be referred to as hybrid latch-type SAs (HYSA) [28–33]. Figure 7a shows one example of an HYSA proposed in [32], the variation-tolerant SA (VTSA). For consistency in explanation with other structures, the polarity in this VTSA example is reversed from the original structure. The VTSA is primarily based on the CLSA structure but also incorporates features of a VLSA. Specifically, the SA outputs,  $SO_T$  and  $SO_C$ , are pre-charged to the SA inputs,  $SL_T$  and  $SL_C$ , using pass transistors  $M_{S7}$  and  $M_{S8}$ .



**Figure 7.** Schematics of two representative hybrid latch-type SAs: (**a**) variation-tolerant SA (VTSA) in [32] and (**b**) hybrid latch-type SA-QZ (HYSA-QZ) in [33].

When comparing VTSAs with VLSAs, a notable difference is observed in the pull-down networks of the positive feedback configurations in the SA. In the VTSA, these networks, consisting of $M_{S3}$ and $M_{S4}$, are not directly connected to the *CM* node as in the VLSA. Instead, they are connected to $Z_T$ and $Z_C$ nodes, as shown in Figure 7a. These nodes are pulled down by $M_{S1}$ and $M_{S2}$, respectively, with their gates controlled by $SL_C$ and $SL_T$. This configuration effectively adjusts the $V_{GS}$ of $M_{S3}$ and $M_{S4}$ for proper sensing.

The detailed operation of the VTSA is as follows: During the access phase, when SAE = 0 and datum "1" is being sensed, $SL_T$ is at $V_{DD}$ and $SL_C$ is at $V_{DD} - \Delta V_{IN,SA}$, so $SO_T$ and $SO_C$ are pre-charged to $V_{DD}$ and $V_{DD} - \Delta V_{IN,SA}$, respectively, through $M_{S7}$ and $M_{S8}$, similar to the VLSA. Additionally, the gate voltages of $M_{S1}$ and $M_{S2}$, $V_{G,MS1}$ and $V_{G,MS2}$, become $V_{DD} - \Delta V_{IN,SA}$ and $V_{DD}$, respectively. When the evaluation phase begins with SAE = 1, $Z_T$ and $Z_C$ are pulled down by $M_{S1}$ and $M_{S2}$, respectively. In this configuration, since $SL_T > SL_C$, $M_{S2}$ can drive more current than $M_{S1}$, resulting in $Z_C$ being pulled down more strongly than $Z_T$ (i.e., $Z_T > Z_C$). As a result, compared to the VLSA, the difference between $V_{GS,MS3}$ and $V_{GS,MS4}$ is larger in the VTSA, indicating that the amplification is more stable and, thus, that the $V_{OS}$ can be reduced. This is due to adjustments made not only in the gate voltage conditions of $M_{S3}$ and $M_{S4}$ ($V_{G,MS3} < V_{G,MS4}$), but also in their source voltage conditions ($V_{S,MS3} > V_{S,MS4}$).
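The comparison with the VLSA can be written out explicitly. The expression below is only a restatement of the inequalities above, assuming that the gates of $M_{S3}$ and $M_{S4}$ sit at $SO_C$ and $SO_T$ and that their sources are $Z_T$ and $Z_C$; it is not a formula taken from [32].

$$\begin{aligned}
V_{GS,MS4} - V_{GS,MS3} &= (V_{G,MS4} - V_{S,MS4}) - (V_{G,MS3} - V_{S,MS3}) \\
&= (V_{G,MS4} - V_{G,MS3}) + (V_{S,MS3} - V_{S,MS4}) \\
&= \Delta V_{IN,SA} + (Z_T - Z_C)
\end{aligned}$$

In the VLSA, the paired nFETs share the common source node *CM*, so the second term vanishes and the $V_{GS}$ difference is only $\Delta V_{IN,SA}$. In the VTSA, $Z_T > Z_C$ adds a positive term on top of it.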

However, the VTSA has an additional pair of nFETs compared to the VLSA, $M_{S1}$ and $M_{S2}$, which are involved in the initial amplification of signals. This additional pair not only incurs area overhead but also potentially increases the mismatch effects. That is, the mismatch between $M_{S1}$ and $M_{S2}$, which does not need to be considered in VLSAs, can result in unintentional changes in $Z_T$ and $Z_C$ and degrade the sensing stability. In addition, stacked nFETs degrade the sensing delay and power consumption, like STSAs.

Figure 7b shows another example of an HYSA, the HYSA-QZ, which is proposed in [33]. This structure more aggressively pre-charges the internal nodes of the SA than the VTSA. The notation of QZ here means that not only output nodes (Q), but the internal nodes between the  $M_{S1}-M_{S2}$  pair and  $M_{S3}-M_{S4}$  pair (Z) are also pre-charged to SA inputs in a direction for precise sensing. As shown in Figure 7b, not only  $SO_T$  and  $SO_C$  are pre-charged to  $SL_T$  and  $SL_C$ , but also  $Z_T$  and  $Z_C$  are pre-charged to  $SL_T$  and  $SL_C$ , respectively. In this manner, the bias condition of the SA becomes more favorable for accurate sensing than the VTSA.

### 3.3. Capacitor-Based Offset-Compensated SAs

Several previously proposed SAs have addressed transistor mismatches by employing capacitors [34–40]. These capacitors capture the mismatches between paired transistors, and the stored mismatch information is subsequently utilized to bias the internal nodes of the SA for compensation. Figure 8a illustrates the configuration of a capacitor-based threshold-matching SA (TMSA), as presented in [38].



**Figure 8.** (a) Schematic of capacitor-based threshold-matching SA (TMSA), (b) VLSA part in TMSA, and (c) capacitor-based threshold-matching circuit part.

As demonstrated in Figure 8b,c, the TMSA comprises two main components: a VLSA part and the capacitor-based threshold-matching part. The primary goal of the TMSA is to compensate for the mismatch between the $M_{S1}$–$M_{S2}$ pair, which is the most critical pair in a VLSA. This correction is accomplished by first sampling the threshold voltages of $M_{S1}$ and $M_{S2}$ ($V_{th,MS1}$ and $V_{th,MS2}$) during the pre-charge phase. The sampled $V_{th,MS1}$ and $V_{th,MS2}$ are then stored at the source nodes of $M_{S1}$ and $M_{S2}$. This ensures that the currents through $M_{S1}$ and $M_{S2}$ during the amplification operation ($I_{S1}$ and $I_{S2}$) are independent of their $V_{th}$ mismatch.

The detailed operation that achieves this objective is illustrated in Figure 9a–d, in the example of sensing datum "1", with a comprehensive explanation provided as follows.



**Figure 9.** Four-step operation of TMSA: (**a**) pre-charge phase, (**b**) access phase, (**c**) evaluation phase, and (**d**) latching phase.

- (1) Pre-charge phase (Figure 9a): During this phase, the input and output nodes of the SA ($SL_T$, $SL_C$, $SO_T$, and $SO_C$) are pre-charged to $V_{DD}$. Then, the top-plate nodes of C<sub>0</sub> and C<sub>1</sub> ($CT_T$ and $CT_C$) are pre-charged to $V_{DD} - V_{th,MS1}$ and $V_{DD} - V_{th,MS2}$, respectively, at which point M<sub>S1</sub> and M<sub>S2</sub> turn off. This pre-charge assumes that $CT_T$ and $CT_C$ are at 0 V before pre-charging (the rationale is given in the latching phase). In addition, the common bottom-plate node for C<sub>0</sub> and C<sub>1</sub>, *NRSC*, is pre-charged to $V_{DD}$ by M<sub>S8</sub>, which is turned on by *PCB* = 0.
- (2) Access phase (Figure 9b): In this phase, $SL_C$ is lowered to $V_{DD} - \Delta V_{IN,SA}$ by the bit-cell, causing $SO_C$ to also become $V_{DD} - \Delta V_{IN,SA}$. In addition, the *PCB* becomes high, so the common bottom-plate node of C<sub>0</sub> and C<sub>1</sub>, *NRSC*, floats at a high level.
- (3) Evaluation phase (Figure 9c): This phase starts with the *SAE* rising, turning on M<sub>S7</sub>, so the *NRSC* is pulled down. This results in negative capacitive voltage couplings from *NRSC* to $CT_T$ and $CT_C$, through C<sub>0</sub> and C<sub>1</sub>, respectively. Thus, $CT_T$ and $CT_C$ are decreased by $\Delta V$, meaning that $CT_T$ and $CT_C$ become $V_{DD} - V_{th,MS1} - \Delta V$ and $V_{DD} - V_{th,MS2} - \Delta V$, respectively. These turn on M<sub>S1</sub> and M<sub>S2</sub>, where the overdrive voltages ($V_{OV} = V_{GS} - V_{th}$) of M<sub>S1</sub> and M<sub>S2</sub>, $V_{OV,MS1}$ and $V_{OV,MS2}$, become as follows:

$$\begin{aligned}
V_{\text{OV,MS1}} &= V_{\text{GS,MS1}} - V_{\text{th,MS1}} = V(SO_{\text{C}}) - V(CT_{\text{T}}) - V_{\text{th,MS1}} \\
&= (V_{\text{DD}} - \Delta V_{\text{IN,SA}}) - (V_{\text{DD}} - V_{\text{th,MS1}} - \Delta V) - V_{\text{th,MS1}} = \Delta V - \Delta V_{\text{IN,SA}} \\
V_{\text{OV,MS2}} &= V_{\text{GS,MS2}} - V_{\text{th,MS2}} = V(SO_{\text{T}}) - V(CT_{\text{C}}) - V_{\text{th,MS2}} \\
&= V_{\text{DD}} - (V_{\text{DD}} - V_{\text{th,MS2}} - \Delta V) - V_{\text{th,MS2}} = \Delta V
\end{aligned}$$

The key point is that $V_{OV,MS1}$ and $V_{OV,MS2}$, which determine $I_{S1}$ and $I_{S2}$, are independent of $V_{th,MS1}$ and $V_{th,MS2}$, respectively. Thus, even in the presence of a mismatch between $V_{th,MS1}$ and $V_{th,MS2}$, $I_{S1}$ and $I_{S2}$ can be stably generated (e.g., $I_{S1} < I_{S2}$ for datum "1" sensing as in Figure 9c) at the beginning of the evaluation phase. This makes the TMSA notably more robust than the conventional VLSA, leading to a reduced $V_{OS}$.

- (4) Latching phase (Figure 9d): After the *NRSC* becomes low in the evaluation phase, this change in *NRSC* propagates through a delay buffer to make $LAT = V_{DD}$, which starts the latching phase. In this phase, $CT_T$ and $CT_C$ become 0 V, so $SO_T$ and $SO_C$ can latch the sensing results at the full digital level. This state is kept until the next pre-charge phase. This is why $CT_T$ and $CT_C$ start from 0 V in the next pre-charge phase, where they are charged up to $V_{DD} - V_{th,MS1}$ and $V_{DD} - V_{th,MS2}$, respectively.
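The overdrive expressions from the evaluation phase can be checked numerically. The sketch below evaluates them for datum "1" sensing under the ideal assumptions of the text (perfect $V_{th}$ sampling and an identical coupling step $\Delta V$ on both capacitors); the function name and the voltage values are illustrative assumptions.

```python
def tmsa_overdrives(v_dd, delta_v_in, delta_v, vth_ms1, vth_ms2):
    """Return (V_OV,MS1, V_OV,MS2) at the start of the TMSA evaluation phase."""
    # Source nodes after the NRSC coupling step lowers CT_T and CT_C by delta_v.
    ct_t = v_dd - vth_ms1 - delta_v
    ct_c = v_dd - vth_ms2 - delta_v
    # Gate voltages for datum "1": SO_C drives M_S1, SO_T drives M_S2.
    so_c = v_dd - delta_v_in
    so_t = v_dd
    vov_ms1 = so_c - ct_t - vth_ms1
    vov_ms2 = so_t - ct_c - vth_ms2
    return vov_ms1, vov_ms2


if __name__ == "__main__":
    # Assumed operating point: V_DD = 0.8 V, dV_IN,SA = 50 mV, dV = 200 mV.
    for vth1, vth2 in ((0.30, 0.30), (0.25, 0.35), (0.35, 0.25)):
        ov1, ov2 = tmsa_overdrives(0.8, 0.05, 0.2, vth1, vth2)
        print(f"Vth1 = {vth1:.2f} V, Vth2 = {vth2:.2f} V -> "
              f"V_OV,MS1 = {ov1:.3f} V, V_OV,MS2 = {ov2:.3f} V")
```

For every threshold pair, the two overdrives evaluate to $\Delta V - \Delta V_{IN,SA}$ and $\Delta V$, so their ordering depends only on the input difference and not on the mismatch.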

Although the TMSA effectively reduces the $V_{OS}$ by compensating for the mismatch between $M_{S1}$ and $M_{S2}$, there are several shortcomings in this structure. First, the structure is still affected by the mismatch between the capacitors $C_0$ and $C_1$. This mismatch, however, is typically much smaller than the transistor $V_{th}$ mismatch. Second, the implementation of capacitors and delay buffers in the TMSA results in a significant increase in power consumption and area requirements. In particular, because a sufficiently large $\Delta V$ is necessary to turn on $M_{S1}$ and $M_{S2}$ early in the amplification, large capacitors are inevitably required for $C_0$ and $C_1$. Consequently, a significant amount of power is required to charge the *NRSC* from 0 V to $V_{DD}$ in the pre-charge phase. The area overhead, however, can be avoided by placing metal–oxide–metal (MOM) capacitors on top of the circuit layout [39].

As an alternative approach, the variation-tolerant small-signal SA (VTS-SA) is proposed in [39], specifically addressing mismatches between the two inverters in the SA. This is achieved through the utilization of capacitors at the input acceptance part. The structure of the VTS-SA is shown in Figure 10 below.



Figure 10. Structure of VTS-SA.

The VTS-SA is based on a VLSA composed of $M_{S1}-M_{S2}-M_{S3}-M_{S4}$, while the SA input nodes, $SL_T$ and $SL_C$, are accepted through coupling capacitors $C_{C1}$ and $C_{C2}$, respectively. By utilizing capacitors, the VTS-SA can capture and store the trip points of the two inverters in the SA, INV<sub>1</sub> ($M_{S1}$ and $M_{S3}$) and INV<sub>2</sub> ($M_{S2}$ and $M_{S4}$), shown in Figure 10. By biasing the two inverters at their respective trip points, the two inverters become highly sensitive to small input voltage variations. That is, even small input voltage changes can push the inverters to switch their output states. This enhanced voltage gain of the inverters contributes to the improved speed of the SA. Furthermore, trip-point biasing in the VTS-SA serves another crucial purpose: it allows the SA to adapt to and account for process variations within the inverters. By individually setting the trip points, the VTS-SA makes each inverter operate primarily in response to input changes, minimizing its dependence on process variations as much as possible.

The detailed operations of the VTS-SA are illustrated in Figure 11a–c, where there are three main operation phases: (1) the trip-point bias phase, (2) the access phase, and (3) the evaluation phase.



**Figure 11.** Three operation phases of VTS-SA: (**a**) trip-point bias, (**b**) access phase, and (**c**) evaluation phase.

- (1) Trip-point bias phase (Figure 11a): In this phase, the input and output are shorted in INV<sub>1</sub> and INV<sub>2</sub> of the SA. As a result, the input and output of INV<sub>1</sub> and INV<sub>2</sub> are set to their respective trip points— $V_{\text{bias,INV1}}$  and  $V_{\text{bias,INV2}}$ . This is accomplished by turning on the  $M_{\text{S7}}$  and  $M_{\text{S8}}$  transistors through *PRE* = 1, while also turning on the header and footer switches  $M_{\text{S11}}$  and  $M_{\text{S12}}$  with EN = 1. In addition, *SAE* = 0 in this phase, to make the bottom plate of the coupling capacitors, *SLI<sub>T</sub>* and *SLI<sub>C</sub>*, also be equal to the trip points of the inverters.
- (2) Access phase (Figure 11b): In this phase, the input–output connections are disconnected, and the two trip-point-biased inverters are ready to accept changes in $SL_T$ and $SL_C$ through capacitive couplings. Specifically, when sensing datum "1", as demonstrated in Figure 11b, $SL_C$ is decreased by $\Delta V_{IN,SA}$. Then, $SLI_C$ is decreased by $\Delta V_{coup}$ through capacitive coupling via $C_{C1}$. Due to the trip-point bias, this input change of INV<sub>2</sub> leads to a significant change in the output of INV<sub>2</sub>, $SO_T$. As a result, an amplified voltage difference is observed between $SO_T$ and $SO_C$, which is $K \times \Delta V_{IN,SA}$, where K > 1. It is important to note that, as previously mentioned, because the inverters are biased at their respective trip points, the output change is determined almost entirely by the input change and is largely independent of process variations.
- (3) Evaluation phase (Figure 11c): In this phase, the SAE becomes high; thus, the two inverters are connected in a cross-coupled fashion, by turning on M<sub>S10</sub> and M<sub>S9</sub>. At the same time, the two cross-coupled inverters are isolated from the input by turning off M<sub>S5</sub> and M<sub>S6</sub>. Through the positive feedback of the cross-coupled inverters, the final data are latched onto SO<sub>T</sub> and SO<sub>C</sub> at the full digital level, similar to the operation of other SAs.

Although the VTS-SA tries to reduce the $V_{OS}$ by capturing the mismatch between INV<sub>1</sub> and INV<sub>2</sub> through trip-point biasing, this structure has several limitations. First, mismatches in the $M_{S5}-M_{S6}$, $M_{S7}-M_{S8}$, and $M_{S9}-M_{S10}$ pairs are newly introduced, which limits the $V_{OS}$ reduction. Second, similar to the TMSA, the VTS-SA is still affected by the mismatch between $C_{C1}$ and $C_{C2}$, although this is less influential than the transistor mismatch. Third, because the input voltage is transferred through capacitive coupling, not all of the $\Delta V_{IN,SA}$ is delivered to the SA. This inefficiency contributes to an increase in the effective $V_{OS}$. Fourth, the trip-point biasing must be completed before $\Delta V_{IN,SA}$ appears between $SL_T$ and $SL_C$. This requirement potentially increases the circuit complexity. In addition, a short-circuit current from $V_{DD}$ to $V_{SS}$ is inevitable during the trip-point biasing, resulting in high power consumption.
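The loss through capacitive coupling can be illustrated with a simple charge-sharing estimate. As a sketch, assume that each SA input node ($SLI_T$ or $SLI_C$) carries a lumped parasitic capacitance $C_P$ to ground and is coupled through a capacitor of value $C_C$; $C_P$ and the numerical ratio below are hypothetical and are not taken from the reviewed design:

$$\Delta V_{\text{coup}} \approx \frac{C_C}{C_C + C_P}\,\Delta V_{IN,SA}$$

For instance, with $C_C = 4C_P$, only about 80% of $\Delta V_{IN,SA}$ would reach the inverter input, and a mismatch between $C_{C1}$ and $C_{C2}$ would make this fraction differ between the two sides.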

The current-mode SA with a capacitive offset correction (CSA<sub>COC</sub>) structure proposed in [40] utilizes a single capacitor for storing the trip points of inverters, so it is free from capacitor mismatch effects. The schematic of the CSA<sub>COC</sub> is shown in Figure 12a, and the operation waveforms of its three main control clock signals—the trip-point storage enable,  $\Phi_{\text{Trs}}$ ; the trip-point bias enable,  $\Phi_{\text{Trb}}$ ; and the sense enable, *SAE*—are illustrated in Figure 12b.



Figure 12. (a) Schematic of CSA<sub>COC</sub> and (b) operation waveforms of three control clock signals.

The key concept of the CSA<sub>COC</sub> is to store the difference in the trip point voltages of the two inverters, INV<sub>1</sub> and INV<sub>2</sub>, in Figure 12a. The difference in the trip point voltages of the two inverters,  $\Delta V_{\text{Tr}} = V_{\text{Tr1}} - V_{\text{Tr2}}$ , is stored across the single capacitor, C<sub>0</sub>. Then, the two inverters are biased to compensate the trip-point difference, effectively correcting for the mismatch. The operation of CSA<sub>COC</sub> unfolds in three phases, as illustrated in Figure 13a–c, with explanations for each provided as follows.



**Figure 13.** Three operation phases of CSA<sub>COC</sub>: (a) trip-point storage phase ( $\Phi_{\text{Trs}} = 1$ ), (b) trip-point bias phase ( $\Phi_{\text{Trb}} = 1$ ), and (c) evaluation phase (*SAE* = 1).

- (1) Trip-point storage phase ( $\Phi_{Trs} = 1$ , Figure 13a): In this phase,  $SL_T$  and  $SL_C$  are precharged to  $V_{DD}$ , and the input and output of each inverter,  $INV_1$  and  $INV_2$ , are shorted. In this manner, the trip points of  $INV_1$  and  $INV_2$ ,  $V_{Tr1}$  and  $V_{Tr2}$ , are captured and stored at the input and output nodes of the respective inverters, as shown in Figure 13a. It is accomplished by turning on  $M_{S7}$ ,  $M_{S8}$ ,  $M_{S9}$ , and  $M_{S10}$  while turning off  $T_1$ ,  $T_2$ ,  $M_{S5}$ , and  $M_{S6}$ . The difference between two inverter trip points,  $\Delta V_{Tr}$ , is stored across the capacitor,  $C_0$ .
- (2) Trip-point bias phase ($\Phi_{Trb} = 1$, Figure 13b): During this phase, the input and output of INV<sub>1</sub> and INV<sub>2</sub> are disconnected by turning off M<sub>S7</sub> and M<sub>S10</sub>. Subsequently, by utilizing the $\Delta V_{Tr}$ stored in C<sub>0</sub> in the previous phase, the input of each inverter is held at its respective trip point, while INV<sub>1</sub> and INV<sub>2</sub> are configured in the cross-coupled connection. For example, the input of INV<sub>1</sub> is kept at $V_{Tr1}$, while it is connected to the output of INV<sub>2</sub> (=*SO*<sub>C</sub>), and vice versa. This is achieved by turning on M<sub>S5</sub> and M<sub>S7</sub> while turning off M<sub>S11</sub>. Then, the voltage difference between *SL*<sub>T</sub> and *SL*<sub>C</sub> is developed by the bit-cell, which in turn develops the differential current through M<sub>S3</sub> and M<sub>S4</sub>.
- (3) Evaluation phase (*SAE* = 1, Figure 13c): In this phase, the two cross-coupled inverters are disconnected from C<sub>0</sub> by turning off $M_{S8}$ and $M_{S9}$. Simultaneously, the positive feedback of the cross-coupled inverters is initiated by turning on $M_{S11}$, $T_1$, and $T_2$. As a result, the full digital voltage level appears at the two differential outputs of the SA, $SO_T$ and $SO_C$.

The CSA<sub>COC</sub> is immune to capacitor mismatch because it uses a single capacitor, unlike the TMSA and VTS-SA. However, in the previous SAs, the voltage between $SL_T$ and $SL_C$ is transferred to $SO_T$ and $SO_C$ through fully turned-on pFETs during the access phase, whereas in the CSA<sub>COC</sub>, the voltage difference between $SO_T$ and $SO_C$ follows that of $SL_T$ and $SL_C$ through partially turned-on pFETs (current-based). This leads to voltage loss, effectively increasing the $V_{OS}$. In addition, numerous switches and control signal generation logic are required, which increases the circuit design complexity and incurs power and area overhead.

#### 3.4. Offset-Compensated Pre-Amplifiers

Another approach in offset compensation is the use of pre-amplifiers that amplify the bit-line signal preceding the SA stage, as seen in [41–44]. Instead of directly modifying the SA structure, these additional offset-compensating pre-amplifiers are employed in front of the SA. This allows for the required offset compensation while maintaining the original SA structure. One such example is the bit-line pre-charge and pre-amplifying switching pFET circuit (BP<sup>2</sup>SP), with its structure and key operational waveforms depicted in Figure 14a,b.



Figure 14. (a) Schematic of BP<sup>2</sup>SP and (b) its operational waveforms.

As shown in Figure 14b, BP<sup>2</sup>SP is operated in three phases, as explained below.

- (1) Pre-charge phase (*PCB* = 0): In this phase, $M_{S13}$ and $M_{S14}$ in BP<sup>2</sup>SP are turned on to pre-charge $BL_C$ and $BL_T$, respectively. $BL_C$ and $BL_T$ are pre-charged to $V_{DD} - V_{th,MS15}$ and $V_{DD} - V_{th,MS16}$, respectively, through a diode connection. This ensures that $M_{S15}$ and $M_{S16}$ have $V_{GS} = V_{th}$, allowing them to turn on immediately, regardless of $V_{th}$ variations, when $BL_C$ or $BL_T$ is discharged in the subsequent phase. This compensates the $V_{th}$ mismatch between $M_{S15}$ and $M_{S16}$. On the SA side, $SL_T$ and $SL_C$ are pre-discharged to 0 V through $M_{S8}$ and $M_{S9}$.
- (2) Access phase (*PCB* = 1, *WL* = 1): During this phase, the data stored in the selected bit-cell are reflected on $BL_T$ and $BL_C$. In the example shown in Figure 14b, datum "1" is sensed, so $BL_T$ remains close to its pre-charge level, $V_{DD} - V_{th,MS16}$, while $BL_C$ decreases from $V_{DD} - V_{th,MS15}$. Because $BL_C$ is pre-charged to $V_{DD} - V_{th,MS15}$, $M_{S15}$ turns on as soon as $BL_C$ decreases. This causes $BLX_T$ to increase rapidly. Simultaneously, *COLB* is lowered to enable the column MUX, resulting in $SL_T$ increasing and $SL_C$ remaining at 0 V. As shown in Figure 14b, this phase effectively pre-amplifies the voltage difference between $BL_T$ and $BL_C$ into the voltage difference between $SL_T$ and $SL_C$.
- (3) Evaluation phase (SAE = 1): In this phase, the *SAE* is raised, meaning */SAE* is lowered. Consequently, the VLSA is enabled to store the final sensing data in the form of a full digital voltage at the  $SO_T$  and  $SO_C$  nodes. In addition, during this phase, the bit-line equalization circuit—transmission gate  $T_1$ —is activated to equalize  $BL_T$  and  $BL_C$ . This ensures that the subsequent pre-charge operation of  $BL_T$  and  $BL_C$  can start with both bit-lines having the same low voltage level as the initial condition. This equalization step is important for maintaining consistency in the subsequent memory operation.

The operation principle of BP<sup>2</sup>SP is to use the same pFETs both to pre-charge the bit-lines and to pre-amplify the bit-line voltages. Specifically, by pre-charging the bit-lines so that they capture the $V_{\text{th}}$ variation of the pre-amplifying pFETs, these pFETs can turn on instantly in response to the bit-line pair voltage development. This allows the amplified voltage to be observed at SL<sub>T</sub> and SL<sub>C</sub>, reducing the $\Delta V_{\text{BL}}$ required for stable sensing and thereby improving speed and power efficiency. However, to pre-charge the bit-line pair to $V_{\text{DD}} - V_{\text{th}}$, it is necessary to ensure that the bit-line voltages are sufficiently lower than $V_{\rm DD} - V_{\rm th}$ before pre-charge. This requirement increases the circuit complexity, especially when the memory is awakened from power-down mode or standby mode. In addition, after the bit-line pair is pre-charged to $V_{\rm DD} - V_{\rm th}$, the bit-lines become floating, making them susceptible to noise. Moreover, the initial $V_{\rm GS}$ condition of the pre-amplifier pFETs can vary significantly with the pre-charge period, which means that the overall speed of the read operation is highly affected by the pre-charge time.
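The threshold-capturing principle can be written out explicitly. As a sketch, assume that the pre-amplifying pFET $M_{S15}$ has its source at $V_{DD}$ and its gate on $BL_C$, and treat $V_{th,MS15}$ as a magnitude; these connection details are assumptions made for illustration, since they are inferred from the description rather than stated:

$$V_{SG,MS15}\big|_{\text{pre-charge}} = V_{DD} - (V_{DD} - V_{th,MS15}) = V_{th,MS15}$$

$$V_{SG,MS15} - V_{th,MS15} = \Delta V_{BL} \quad \text{after } BL_C \text{ falls by } \Delta V_{BL}$$

Under these assumptions, the overdrive of the pre-amplifying pFET depends only on the bit-line voltage development and not on its own threshold voltage, which is consistent with the compensation of the mismatch between $M_{S15}$ and $M_{S16}$.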

In [43], another pre-amplifier circuit for SRAM, the cross-coupled nFET pre-amplifier and pre-charge circuit (CCN-PP), is presented. The structure and operational waveforms of the CCN-PP are shown in Figure 15a,b. As depicted in Figure 15b, the CCN-PP operates in four phases.



Figure 15. (a) Schematic of CCN-PP and (b) its operational waveforms.

- (1) Pre-charge phase (*PBE* = 0, *PCB* = 0): During this phase, the pre-charging boost enable signal (*PBE*) and *PCB* are low, so the SA input pre-charge circuit ($M_{S3}-M_{S4}-M_{S5}$) and $M_{S6}$ are turned on. This maintains *VDDSA* at $V_{DD}$, while SLX<sub>T</sub> and SLX<sub>C</sub> are pre-charged to $V_{DD}$. It should be noted that, unlike the conventional pre-charge operation, all the column MUX transistors and bit-line equalization circuits (T<sub>1</sub>) are turned on. As a result, SL<sub>T</sub>, SL<sub>C</sub>, BL<sub>T</sub>, and BL<sub>C</sub> are pre-charged through the CCN-PP. Because the CCN-PP is composed of nFETs, there is a threshold voltage drop in the pre-charge voltages. That is, *BL*<sub>T</sub> and *BL*<sub>C</sub> are pre-charged to $V_{DD} - \min(V_{th,MS1}, V_{th,MS2})$.
- (2) Access phase 1 (*PBE* = 1): During this phase, the unselected column MUX transistors are turned off and the *PBE* is raised. As a result,  $M_{S6}$  is turned off and then the PBEd rises, boosting the VDDSA into  $V_{DD} + \Delta V_C$  through  $C_0$  coupling. Thus, the SA inputs, *SLX*<sub>T</sub> and *SLX*<sub>C</sub>, are also pre-charged to  $V_{DD} + \Delta V_C$ . Accordingly, BL<sub>T</sub> and BL<sub>C</sub> can be slightly raised. In this phase, the WL is activated, so BL<sub>T</sub> and BL<sub>C</sub> start to be developed according to bit-cell data.
- (3) Access phase 2 (*PBE* = 0, *PCB* = 1): With *PCB* rising, $SLX_T$ and $SLX_C$ are affected by the change in $BL_T$ and $BL_C$ through the CCN-PP. For example, when accessing the datum "1", as shown in Figure 15b, $BL_C$ and $SL_C$ decrease, leading $M_{S2}$ to be turned on while $M_{S1}$ is kept turned off. The turned-on $M_{S2}$ makes $SLX_C$ fall while $SLX_T$ is kept high, close to $V_{DD} + \Delta V_C$. Due to the positive feedback nature of cross-coupled nFETs, the voltage difference between SLX<sub>T</sub> and SLX<sub>C</sub> is larger than that of BL<sub>T</sub> and BL<sub>C</sub>, meaning that the bit-line voltage is pre-amplified.
- (4) Evaluation phase (*SAE* = 1): A high *SAE* activates the SA to latch the data at the SA outputs, $SO_{\rm T}$ and $SO_{\rm C}$. In addition, similar to BP<sup>2</sup>SP, the bit-line equalization circuit is activated to provide proper bit-line initial conditions for the subsequent pre-charge phase.

Unlike BP<sup>2</sup>SP, the initial V<sub>GS</sub> of the pre-amplifier transistors in the CCN-PP is determined in access phase 1. Thus, the performance is less dependent on the pre-charge period, so the CCN-PP can provide a stable speed. However, as in BP<sup>2</sup>SP, the CCN-PP still suffers from floating BL<sub>T</sub> and BL<sub>C</sub> during the pre-charge phase. In addition, the CCN-PP cannot compensate the mismatch between M<sub>S1</sub> and M<sub>S2</sub>, which is a disadvantage compared to BP<sup>2</sup>SP. Moreover, the VDDSA boosting circuit can incur a significant power and area overhead.

In [44], the offset-cancelled current SA (OCCSA) is proposed. As shown in Figure 16, the OCCSA uses nFET MUX transistors instead of pFET MUX transistors. Here, the nFET MUX (PSA) operates as a common-gate amplifier, so it effectively pre-amplifies the BL voltage. To bias these PSAs properly with offset-compensating features, $BL_T$ and $BL_C$ should be pre-charged lower than $V_{DD} - V_{th,MS1}$ and $V_{DD} - V_{th,MS2}$, respectively. To realize this, a separate supply voltage, $V_{prebl}$, is required. However, the incorporation of this new voltage source is highly costly due to its substantial power and area overheads, making the circuit impractical for actual implementation.



Figure 16. Structure of OCCSA.

#### 3.5. Other Structures

In [45], an SA with inherent offset cancellation (SAOC) is proposed, with its structure shown in Figure 17a. The SAOC utilizes pFETs ($M_{S10}$ and $M_{S11}$ in Figure 17a) for input reception, connecting SL<sub>T</sub> and SL<sub>C</sub> to the gate nodes of these pFETs. Before sensing, by driving SL<sub>T</sub> and SL<sub>C</sub> low and toggling the PRE from low to high, the $|V_{thp}|$ values of $M_{S10}$ and $M_{S11}$ are captured at the output nodes of the SA, SO<sub>T</sub> and SO<sub>C</sub>, respectively. Subsequently, BL<sub>T</sub> and BL<sub>C</sub> are transferred to SL<sub>T</sub> and SL<sub>C</sub> by the turned-on MUX transistors, while $M_{S10}$ and $M_{S11}$ are turned on by the low PRE. This results in the charging of SO<sub>T</sub> and SO<sub>C</sub> by $M_{S10}$ and $M_{S11}$. In this manner, the SAOC achieves sensing operations while compensating the mismatch between $M_{S10}$ and $M_{S11}$. However, the mismatch of the other transistor pairs, such as the pair including $M_{S7}$, is not compensated, and pulling up SL<sub>T</sub> and SL<sub>C</sub> with nFETs based on BL<sub>T</sub> and BL<sub>C</sub> incurs losses when transferring the BL voltage difference to $\Delta V_{IN,SA}$.





Figure 17. Structure of (a) SAOC, (b) DIBBSA-FL, (c) DIBBSA-PD, and (d) CDOR.

In [46], the body-biasing technique is used at critical sensing transistors for auto-offset mitigation features. A differential-input body-biased sense amplifier with floating output nodes (DIBBSA-FL) and a differential-input body-biased sense amplifier with pre-discharge output nodes (DIBBSA-PD) are shown in Figure 17b,c, respectively. The difference between the DIBBSA-FL and the DIBBSA-PD is that the DIBBSA-PD has additional transistors,  $M_{S8}$  and  $M_{S9}$ , to predischarge SO<sub>T</sub> and SO<sub>C</sub>, while the DIBBSA-FL only equalizes SO<sub>T</sub> and SO<sub>C</sub>. The operations of DIBBSA-FL and DIBBSA-PD are as follows. During the sensing operation, the SAEB decreases and  $M_{S3}$  and  $M_{S4}$  turn on. Simultaneously, when BL<sub>T</sub> is higher than BL<sub>C</sub>, through the body-bias effect on  $M_{S1}$ ,  $M_{S2}$ ,  $M_{S3}$ , and  $M_{S4}$ ,  $M_{S1}$  and  $M_{S3}$  become forward body-biased and  $M_{S2}$  and  $M_{S4}$  become reverse body-biased. Therefore, SO<sub>T</sub> pulls up much faster than SO<sub>C</sub>. However, recently, 3D FETs such as the FinFET and GAA FET have become commonly used. In these technologies, the body effect is nearly negligible. Therefore, using the body-bias technique in recent technologies is not suitable.

Figure 17d shows the cancellation based on delay and offset relation (CDOR) structure [47]. Before the sensing operation, the mismatch in the SA is captured by the sensing operation, with SL<sub>T</sub> and SL<sub>C</sub> equally set to V<sub>DD</sub>. Because of the mismatch in the SA, SO<sub>T</sub> and SO<sub>C</sub> become (1, 0) or (0, 1), connected to the gate of M<sub>S15</sub> and M<sub>S14</sub>, respectively. When SO<sub>T</sub> and SO<sub>C</sub> are (1, 0), this means that the pull-up strength on the SO<sub>T</sub> side is higher than that on the SO<sub>C</sub> side. Simultaneously, Q and QB become V<sub>DD</sub> and V<sub>SS</sub>, turning off M<sub>S6</sub> and M<sub>S7</sub>. In the case of (SO<sub>T</sub>, SO<sub>C</sub>) = (1, 0), M<sub>S14</sub> turns on and M<sub>S15</sub> turns off, lowering the SL<sub>T</sub>. Due to the decreased SL<sub>T</sub>, the pull-up strength of the SO<sub>C</sub> side becomes stronger, which operates as offset mitigation. However, the process of adjusting the voltage is highly challenging. This is because the voltage variance is highly dependent on the offset mitigation activation time and the sizes of the M<sub>S6</sub> and M<sub>S7</sub> transistors.

#### 4. Comparison

Table 1 summarizes the comparison among the SRAM sensing circuit designs covered in Section 3.

**Table 1.** Comparison of SRAM sensing circuit designs.

|                         | Structure  | Offset Reduction Technique                                                                            | Components                      | Control Signals                                     | Limitations                                                               |
|-------------------------|------------|-------------------------------------------------------------------------------------------------------|---------------------------------|-----------------------------------------------------|---------------------------------------------------------------------------|
| VLSA                    | Figure 2a  | -                                                                                                     | 7 TR                            | PCB, SAE                                            | Large V <sub>OS</sub>                                                     |
| CLSA                    | Figure 2b  | -                                                                                                     | 9 TR                            | PCB, SAE                                            | Increased V <sub>OS</sub> due to additional TR pair                       |
| STSA                    | Figure 6a  | Driving Internal Nodes of VLSA $(Z_T \text{ and } Z_C)$                                               | 11 TR                           | PCB, SAE                                            | Speed degradation due to stack                                            |
| VBSTSA                  | Figure 6b  | STSA + Negative Boosting $V_{SS}$                                                                     | 14 TR<br>+ NVG (shared)         | PCB, SAE, BSTEN                                     | Requires NVG<br>(power/area cost)                                         |
| VTSA                    | Figure 7a  | Pre-charging $SO_{\rm T}$ and $SO_{\rm C}$<br>to $SL_{\rm T}$ and $SL_{\rm C}$ in CLSA                | 9 TR                            | PCB, SAE                                            | Speed degradation due to stack                                            |
| HYSA-QZ                 | Figure 7b  | Pre-charging output nodes and<br>internal nodes of CLSA                                               | 11 TR                           | PCB, SAE                                            | Speed degradation due to stack                                            |
| TMSA                    | Figure 8a  | Capturing $V_{\rm th}$ of pull-down nFETs through paired cap                                          | 11 TR + INV + Buffer<br>+ 2 C   | PCB, SAE                                            | Capacitor mismatch, Cap<br>power/area overhead                            |
| VTS-SA                  | Figure 10  | Capturing trip points of<br>cross-coupled INVs with input<br>acceptance via coupling cap pair         | 12 TR + 2 C                     | EN, PCB, PRE, SAE                                   | Capacitor mismatch, power/area overhead                                   |
| CSA<sub>COC</sub>      | Figure 12a | Capturing trip points of<br>cross-coupled inverters via single<br>capacitor                           | 16 TR + 1 C<br>+ 2 OR (shared)  | PCB, SAE, $\Phi_{\text{Trs}}$ , $\Phi_{\text{Trb}}$ | Many switches, control<br>signal circuit                                  |
| BP<sup>2</sup>SP      | Figure 14a | Capturing V<sub>th</sub> of pre-amplifying pFET pair at BL pre-charge                                | 6 TR + SA                       | PCB, SAE                                            | Bit-line floating, unstable<br>pre-charge level,<br>power/area overhead   |
| CCN-PP                  | Figure 15a | Pre-amplifying BL via cross-coupled nFET pair, while capturing $V_{\rm th}$ with boosted $V_{\rm DD}$ | 4 TR + 2 C + Buffer +<br>1 TR + SA | PCB, SAE, PBE                                    | Bit-line floating,<br>power/area overhead                                 |
| OCCSA                   | [44]       | Capturing V <sub>th</sub> of MUX nFETs<br>at BL pre-charge                                            | 7 TR                            | PCB, SAE                                            | Additional Vprebl<br>voltage generator,<br>different MUX signal           |
| SAOC                    | [45]       | Capturing V <sub>th</sub> of input pFETs at SA pre-charge                                             | 11 TR                           | PCB, SAE, OCEN                                      | N1, N2 mismatch, control signal circuit                                   |
| DIBBSA-FL,<br>DIBBSA-PD | [46]       | Body biasing                                                                                          | 7 TR, 9 TR<br>+ Body contact    | PCB, SAE                                            | Inapplicable to recent<br>technologies whose body<br>effect is minimal    |
| CDOR                    | [47]       | Lowering input voltage according<br>to SA mismatch                                                    | 15 TR                           | PCB, SAE, Q                                         | Control signal circuit for<br>added Q and different<br>PCB, SAE operation |

Unlike the conventional SAs (VLSA and CLSA), the STSA, VTSA, and HYSA-QZ drive or pre-charge the internal nodes of the SA to favor accurate sensing. In this manner, the offset voltage can be efficiently reduced without additional control signals or operation phases. In terms of reducing the $V_{OS}$, the VTSA and HYSA-QZ, which directly pre-charge the internal nodes using pass gates connected to SL<sub>T</sub> and SL<sub>C</sub>, outperform the STSA. This is because the mismatch effects in the gated FETs controlled by SL<sub>T</sub> and SL<sub>C</sub> in the STSA are larger than the mismatch effects in the transmission gates used by the VTSA or HYSA-QZ to transfer SL<sub>T</sub> and SL<sub>C</sub>. Compared with the VTSA, the HYSA-QZ can achieve a smaller $V_{OS}$ because it pre-charges more internal nodes. However, the SA delay is increased in the STSA, VTSA, and HYSA-QZ compared to the VLSA because of the increased number of stacked transistors.

The TMSA, VTS-SA, and CSA<sub>COC</sub> directly capture mismatches in SAs, utilizing one or more capacitors. In this manner, the $V_{OS}$ can be further reduced compared to the STSA, VTSA, and HYSA-QZ. However, this improvement comes at a cost: introducing additional phases or control signals, biasing through short-circuit currents, and using large capacitors increase the SA delay and energy consumption significantly. The trade-off between BL delay/energy and SA delay/energy becomes evident in this context. More precise compensation of SA mismatches can result in a smaller $V_{OS}$ and reduced BL delay and energy. However, achieving this precision requires additional circuit components, which can lead to increased SA delay and energy consumption.
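This trade-off can be made concrete with the values reported in Table 2. The following is a minimal sketch, assuming that the read delay can be approximated as the sum of the BL delay and the SA delay; the helper names and this simple-sum model are illustrative and are not part of the original evaluation.

```python
# Values copied from Table 2 (28 nm, VDD = 1.0 V).
# Tuple layout: (BL delay [ps], SA delay [ps], four-BL energy [fJ], SA energy [fJ])
TABLE2 = {
    "VLSA":    (203.86, 15.25,  93.86,  2.94),
    "CLSA":    (323.32, 27.69, 110.35,  3.99),
    "STSA":    (159.57, 19.41,  86.90,  3.67),
    "VTSA":    (152.21, 17.67,  87.47,  3.37),
    "HYSA-QZ": (140.25, 16.83,  79.21,  3.65),
    "TMSA":    (138.46, 13.47,  95.43, 25.23),
    "VTS-SA":  ( 91.84, 15.25,  76.60, 16.38),
    "CSA_COC": (133.67, 27.69,  89.27, 18.26),
}


def total_delay_ps(name):
    """Approximate read delay as BL delay + SA delay (simple-sum assumption)."""
    bl_delay, sa_delay, _, _ = TABLE2[name]
    return bl_delay + sa_delay


def sa_to_bl_energy_ratio(name):
    """Ratio of SA energy to four-BL energy, showing where the SA cost grows."""
    _, _, bl_energy, sa_energy = TABLE2[name]
    return sa_energy / bl_energy


if __name__ == "__main__":
    for name in TABLE2:
        print(f"{name:8s} delay = {total_delay_ps(name):7.2f} ps, "
              f"SA/BL energy = {sa_to_bl_energy_ratio(name):.3f}")
```

Under these assumptions, the VTS-SA gives the shortest summed delay among the listed SAs, while the capacitor-based SAs (TMSA, VTS-SA, and CSA<sub>COC</sub>) show the largest SA-to-BL energy ratios.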

The BL pre-charging pre-amplifier circuits, BP<sup>2</sup>SP and CCN-PP, offer an alternative approach that captures the transistor $V_{\text{th}}$ values and reduces the required BL voltage development. They can be implemented more simply than the SA mismatch compensation structures because pre-amplifiers have a simpler structure than SAs. However, controlling the BL pre-charge levels can be challenging in practice, especially because the bit-lines are left floating when diode-connected transistors are used for pre-charging.

In addition to the sensing circuits covered in Section 3, there are several other approaches for reducing $V_{BL}$ requirements [44–47], as shown in the last four rows in Table 1. However, it is worth noting that these methods have specific characteristics that may affect their applicability. One of these structures, the SAOC, addresses the mismatch between the two input pFETs at the beginning of the read access to reduce the $V_{OS}$. However, mismatches in other transistor pairs, which are also critical for $V_{OS}$, cannot be compensated. Thus, it may have an increased $V_{OS}$ even compared to the conventional SAs. In addition, short-circuit current paths are inevitably formed, which limits its practical applicability.

The OCCSA utilizes the MUX transistors as a common-gate amplifier to pre-amplify the $V_{BL}$. Although this is effective, operating the MUX as an amplifier requires an additional high-voltage source for the bit-line pre-charge ($V_{prebl}$), which incurs significant power and area overheads. In addition, to compensate the mismatch between the MUX transistor pair, a significant amount of time is required for the separate bit-line pre-charge phase before the access phase, which substantially degrades the cycle time.

The DIBBSA-FL and DIBBSA-PD are proposed in [46]. In these structures, differential bit-line inputs are transferred to differential output nodes through pull-up pFETs, while the bodies of the output pull-up pFETs are biased with the bit-lines to enhance sensing accuracy. However, a critical limitation of these approaches arises from the fact that most recent SRAMs utilize multi-gate FETs, such as FinFETs and gate-all-around FETs, which exhibit minimal body effects. Consequently, the current or threshold voltage remains nearly independent of body voltage changes, rendering these structures inapplicable.

The CDOR-based offset-compensating sensing circuit is introduced in [47]. This structure captures the mismatch in the SA during the pre-charge phase of the SRAM. This is achieved by enabling the SA (SAE = 1) under the condition $SL_T = SL_C = V_{DD}$. In this manner, the mismatch information is stored at the differential output nodes, $SO_T$ and $SO_C$. For example, if the mismatch biases the SA toward pulling $SO_T$ low, this mismatch-capturing process drives $SO_T$ to 0 while $SO_C$ becomes high during the pre-charge phase. Then, when the sensing phase starts, $SL_T$ and $SL_C$ are calibrated using this stored information to compensate the mismatch. Although the compensation technique is innovative, its accuracy is highly dependent on factors such as the duration of the calibration window and the sizing of the calibration transistors. This dependency can potentially result in an increase in the effective $V_{OS}$ of the SA, which may render the structure less practical.

Figure 18 shows the minimum operating voltage of SAs according to technology scalability. The minimum operating voltage represents the minimum voltage that satisfies the  $6\sigma$  sensing yield at the operating frequency of 1 GHz in the 7 nm, 14 nm, and 28 nm processes.



Figure 18. Minimum operating voltage of SAs according to technology scalability.

A quantitative comparison among the different SAs covered in Section 3 is shown in Table 2. The SAs are simulated in TSMC 28 nm technology with a four-to-one MUX, $V_{DD} = 1.0$ V, and 256 bit-cells per column. The distribution of $V_{OS}$ in the SAs is estimated as follows [63]: First, we assume that $V_{OS}$ follows a Gaussian distribution. Thus, $P_{FailSA}$, the probability of sensing failure, can be expressed as follows:

$$P_{\text{FailSA}} = P(V_{\text{OS}} > \Delta V_{\text{IN,SA}}) = P\left(\frac{V_{\text{OS}} - \mu_{\text{OS}}}{\sigma_{\text{OS}}} > \frac{\Delta V_{\text{IN,SA}} - \mu_{\text{OS}}}{\sigma_{\text{OS}}}\right) = P\left(Z > \frac{\Delta V_{\text{IN,SA}} - \mu_{\text{OS}}}{\sigma_{\text{OS}}}\right) \tag{1}$$

In (1), $\Delta V_{\text{IN,SA}}$ is the SA input voltage difference, $\mu_{\text{OS}}$ is the mean of $V_{\text{OS}}$, $\sigma_{\text{OS}}$ is the standard deviation of $V_{\text{OS}}$, and Z is the standard Gaussian random variable. Second, representing the standard Gaussian cumulative distribution function (CDF) as $\Phi(z)$, Equation (1) can be rewritten as follows:

$$P_{\text{FailSA}} = 1 - \Phi\left(\frac{\Delta V_{\text{IN,SA}} - \mu_{\text{OS}}}{\sigma_{\text{OS}}}\right) \tag{2}$$
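Equation (2) can be evaluated numerically with a standard Gaussian CDF. The following is a minimal sketch using Python's standard library; the function name is hypothetical, and the zero-mean offset used in the demonstration is an illustrative assumption rather than a reported value (only the standard deviation is taken from Table 2).

```python
from statistics import NormalDist


def sensing_fail_probability(dv_in_sa, mu_os, sigma_os):
    """Evaluate Equation (2); all three arguments share one unit (e.g., mV)."""
    if sigma_os <= 0:
        raise ValueError("sigma_os must be positive")
    z = (dv_in_sa - mu_os) / sigma_os
    return 1.0 - NormalDist().cdf(z)


if __name__ == "__main__":
    sigma = 16.46  # mV, standard deviation of V_OS for the VLSA in Table 2
    mu = 0.0       # mV, assumed zero-mean offset (illustrative assumption)
    for k in (1, 3, 6):
        p = sensing_fail_probability(k * sigma, mu, sigma)
        print(f"dV_IN,SA = {k} sigma = {k * sigma:6.2f} mV -> P_FailSA = {p:.3e}")
```

The loop shows how quickly the failure probability drops as the SA input voltage difference grows from one to six standard deviations of the offset.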

|                    | Standard Dev.<br>of V <sub>OS</sub> (mV) | BL Delay<br>(ps) | SA Delay<br>(ps) | Energy Consumption<br>for Four BLs (fJ) | SA Energy<br>Consumption (fJ) | Area (µm²) |
|--------------------|------------------------------------------|------------------|------------------|-----------------------------------------|-------------------------------|------------|
| VLSA               | 16.46                                    | 203.86           | 15.25            | 93.86                                   | 2.94                          | 6.48       |
| CLSA               | 27.77                                    | 323.32           | 27.69            | 110.35                                  | 3.99                          | 7.88       |
| STSA               | 12.24                                    | 159.57           | 19.41            | 86.90                                   | 3.67                          | 8.49       |
| VTSA               | 11.54                                    | 152.21           | 17.67            | 87.47                                   | 3.37                          | 6.79       |
| HYSA-QZ            | 10.39                                    | 140.25           | 16.83            | 79.21                                   | 3.65                          | 7.09       |
| TMSA               | 9.96                                     | 138.46           | 13.47            | 95.43                                   | 25.23                         | 8.63       |
| VTS-SA             | 5.75                                     | 91.84            | 15.25            | 76.60                                   | 16.38                         | 7.11       |
| CSA <sub>COC</sub> | 9.76                                     | 133.67           | 27.69            | 89.27                                   | 18.26                         | 9.55       |

**Table 2.** Quantitative comparison of SRAM SAs at  $V_{DD}$  = 1.0 V in 28 nm technology.

Third, by applying the inverse function $\Phi^{-1}$, (2) can be rewritten as follows:

$$\mu_{\rm OS} + \sigma_{\rm OS} \Phi^{-1} (1 - P_{\rm FailSA}) = \Delta V_{\rm IN,SA} \tag{3}$$

In (3), both $P_{\text{FailSA}}$ and $\Delta V_{\text{IN,SA}}$ are obtainable through simulation. Once these two values are specified, only $\mu_{\text{OS}}$ and $\sigma_{\text{OS}}$ remain unknown in (3). Thus, two instances of (3) are sufficient to derive the two unknowns, $\mu_{\text{OS}}$ and $\sigma_{\text{OS}}$. Therefore, through a 1000-sample Monte Carlo simulation of $V_{\text{INtest1}}$ ($\Delta V_{\text{IN,SA}} = 10$ mV) and $V_{\text{INtest2}}$ ($\Delta V_{\text{IN,SA}} = -10$ mV), $P_{\text{FailSA1}}$ and $P_{\text{FailSA2}}$ can be determined, and substituting them into (3) gives the following two equations:

$$\mu_{\rm OS} + \sigma_{\rm OS} \Phi^{-1} (1 - P_{\rm FailSA1}) = V_{\rm INtest1} \tag{4}$$

$$\mu_{\rm OS} + \sigma_{\rm OS} \Phi^{-1} (1 - P_{\rm FailSA2}) = V_{\rm INtest2} \tag{5}$$

Finally, by solving (4) and (5) simultaneously, $\mu_{OS}$ and $\sigma_{OS}$ can be expressed as follows:

$$\mu_{\rm OS} = \frac{\Phi^{-1}(1 - P_{\rm FailSA2})V_{\rm INtest1} - \Phi^{-1}(1 - P_{\rm FailSA1})V_{\rm INtest2}}{\Phi^{-1}(1 - P_{\rm FailSA2}) - \Phi^{-1}(1 - P_{\rm FailSA1})} \tag{6}$$

$$\sigma_{\rm OS} = \frac{V_{\rm INtest1} - V_{\rm INtest2}}{\Phi^{-1}(1 - P_{\rm FailSA1}) - \Phi^{-1}(1 - P_{\rm FailSA2})} \tag{7}$$
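As a numerical illustration, (6) and (7) can be evaluated with a short script. The following Python sketch is not taken from the simulation flow used here. It assumes that $\Phi$ is the standard normal cumulative distribution function, and the function names (`fail_probability`, `estimate_offset`) and the example offset statistics (2 mV mean, 12 mV standard deviation) are hypothetical values chosen only for a round-trip check.

```python
from statistics import NormalDist

# Assumption: Phi is the standard normal CDF, so Phi^-1 is its inverse CDF.
STD_NORMAL = NormalDist()


def fail_probability(v_in_mv, mu_os_mv, sigma_os_mv):
    """Failure probability implied by (3) for an input difference v_in_mv (mV)."""
    return 1.0 - STD_NORMAL.cdf((v_in_mv - mu_os_mv) / sigma_os_mv)


def estimate_offset(v_in_test1, p_fail1, v_in_test2, p_fail2):
    """Evaluate (6) and (7); returns (mu_os, sigma_os) in the units of v_in."""
    z1 = STD_NORMAL.inv_cdf(1.0 - p_fail1)  # Phi^-1(1 - P_FailSA1)
    z2 = STD_NORMAL.inv_cdf(1.0 - p_fail2)  # Phi^-1(1 - P_FailSA2)
    if z1 == z2:
        raise ValueError("the two test points must give different failure rates")
    mu_os = (z2 * v_in_test1 - z1 * v_in_test2) / (z2 - z1)  # Equation (6)
    sigma_os = (v_in_test1 - v_in_test2) / (z1 - z2)         # Equation (7)
    return mu_os, sigma_os


if __name__ == "__main__":
    # Hypothetical offset statistics, used only to generate the two
    # failure probabilities that a Monte Carlo run would otherwise provide.
    p1 = fail_probability(10.0, 2.0, 12.0)    # V_INtest1 = +10 mV
    p2 = fail_probability(-10.0, 2.0, 12.0)   # V_INtest2 = -10 mV
    mu, sigma = estimate_offset(10.0, p1, -10.0, p2)
    print(f"mu_OS = {mu:.3f} mV, sigma_OS = {sigma:.3f} mV")
```

Note that `inv_cdf` only accepts probabilities strictly between 0 and 1, so a test point at which no sample (or every sample) fails cannot be used in this form.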

In (6) and (7), because $V_{INtest1}$, $V_{INtest2}$, $P_{FailSA1}$, and $P_{FailSA2}$ are determined through simulation, $\mu_{OS}$ and $\sigma_{OS}$ can be estimated. Additionally, the energy consumption is measured by integrating the sum of all currents flowing during one cycle. The BL energy values reported correspond to the energy consumed in four BL columns. As mentioned earlier, a reduction in $V_{OS}$ is observed to improve the BL delay/energy and the SA delay/energy.
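The energy measurement described above can be sketched as a numerical integration of a sampled current waveform. This is only a minimal sketch under stated assumptions: the summed supply current is assumed to be available as time/current samples over one cycle, the integrated charge is assumed to be multiplied by a constant $V_{DD}$ to obtain energy, and the function name `cycle_energy` and the sample values are hypothetical.

```python
def cycle_energy(time_s, current_a, vdd):
    """Energy (J) over one cycle: V_DD times the trapezoidal integral of current.

    time_s    -- sample times in seconds, strictly increasing
    current_a -- summed supply current in amperes at each sample time
    vdd       -- supply voltage in volts (assumed constant over the cycle)
    """
    if len(time_s) != len(current_a) or len(time_s) < 2:
        raise ValueError("need at least two matching time/current samples")
    charge = 0.0
    for i in range(1, len(time_s)):
        dt = time_s[i] - time_s[i - 1]
        if dt <= 0.0:
            raise ValueError("time samples must be strictly increasing")
        # Trapezoidal rule for the charge delivered between two samples.
        charge += 0.5 * (current_a[i] + current_a[i - 1]) * dt
    return vdd * charge


if __name__ == "__main__":
    # Hypothetical waveform: a constant 100 uA drawn for 1 ns at V_DD = 1.0 V.
    t = [0.0, 0.5e-9, 1.0e-9]
    i_sum = [1e-4, 1e-4, 1e-4]
    print(f"{cycle_energy(t, i_sum, 1.0) * 1e15:.2f} fJ")
```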

**Funding:** This research was supported by the Core Research Institute Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (No. 2018R1A6A1A03025242), MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2023-RS-2022-00156225), supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation), and the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2023-11-0830, Development of memory module and memory compiler for non-volatile PIM optimized for data characteristics and data access characteristics of AI processor).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** Author Hanwool Jeong is a founder of the company Articron Inc. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

## References

- 1. Zhu, H.; Kursun, V. A comprehensive comparison of data stability enhancement techniques with novel nanoscale SRAM cells under parameter fluctuations. *IEEE Trans. Circuits Syst. I Regul. Pap.* **2011**, *61*, 1062–1070. [CrossRef]
- Indumathi, G.; Aarthi alias Ananthakirupa, V.P.M.B. Energy optimization techniques on SRAM: A survey. In Proceedings of the 2014 International Conference on Communication and Network Technologies, Sivakasi, India, 18–19 December 2014; pp. 216–221. [CrossRef]
- Saleh, R.; Lim, G.; Kadowaki, T.; Uchiyama, K. Trends in low power digital system-on-chip designs. In Proceedings of the International Symposium on Quality Electronic Design, San Jose, CA, USA, 18–21 March 2002; pp. 373–378. [CrossRef]
- Lin, S.; Kim, Y.-B.; Lombardi, F. A 32nm SRAM design for low power and high stability. In Proceedings of the 2008 51st Midwest Symposium on Circuits and Systems, Knoxville, TN, USA, 10–13 August 2008; pp. 422–425. [CrossRef]
- 5. Pelgrom, M.; Duinmaijer, A.C.J.; Wlebers, A.P.G. Matching properties of MOS transistors. *IEEE J. Solid-State Circuits* **1989**, 24, 1433–1439. [CrossRef]
- Cho, K.; Park, J.; Oh, T.W.; Jung, S.-O. One-Sided Schmitt-Trigger-Based 9T SRAM Cell for Near-Threshold Operation. *IEEE Trans. Circuits Syst. I Regul. Pap.* 2020, 67, 1551–1561. [CrossRef]
- Nabavi, M.; Sachdev, M. A 290-mV 3.34-MHz 6T SRAM with pMOS access transistors and boosted wordline in 65-nm CMOS technology. IEEE J. Solid-State Circuits 2018, 53, 656–667. [CrossRef]
- Abbasian, E. A Highly Stable Low-Energy 10T SRAM for Near-Threshold Operation. *IEEE Trans. Circuits Syst. I Regul. Pap.* 2022, 69, 5195–5205. [CrossRef]
- 9. Khayatzadeh, M.; Lian, Y. Average-8 T differential-sensing Subthreshold SRAM with bit Interleaving and 1kbits per Bitline. *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.* 2014, 22, 971–982. [CrossRef]

- 10. Teman, A.; Pergament, L.; Cohen, O.; Fish, A. A 250 mV 8 kb 40 nm ultra-low power 9 T supply feedback SRAM (SF-SRAM). *IEEE J. Solid-State Circuits* 2011, 46, 2713–2726. [CrossRef]
- 11. Weste, N.; Harris, D. CMOS VLSI Design: A Circuits and Systems Perspective; Addison-Wesley: Boston, MA, USA, 2005.
- 12. Mu, J.; Kim, H.; Kim, B. SRAM-Based In-Memory Computing Macro Featuring Voltage-Mode Accumulator and Row-by-Row ADC for Processing Neural Networks. *IEEE Trans. Circuits Syst. I Regul. Pap.* **2022**, *69*, 2412–2422. [CrossRef]
- 13. Yu, C.; Yoo, T.; Kim, H.; Kim, T.T.-H.; Chuan, K.C.T.; Kim, B. A Logic-Compatible eDRAM Compute-In-Memory With Embedded ADCs for Processing Neural Networks. *IEEE Trans. Circuits Syst. I Regul. Pap.* **2021**, *68*, 667–679. [CrossRef]
- Yu, C.; Yoo, T.; Kim, T.T.-H.; Chuan, K.C.T.; Kim, B. A 16K Current-Based 8T SRAM Compute-In-Memory Macro with Decoupled Read/Write and 1-5bit Column ADC. In Proceedings of the 2020 IEEE Custom Integrated Circuits Conference (CICC), Boston, MA, USA, 22–25 March 2020; pp. 1–4. [CrossRef]
- Kim, H.; Kim, Y.; Ryu, S.; Kim, J.-J. Algorithm/Hardware Co-Design for In-Memory Neural Network Computing with Minimal Peripheral Circuit Overhead. In Proceedings of the 2020 57th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 20–24 July 2020; pp. 1–6. [CrossRef]
- Sun, X.; Peng, X.; Chen, P.-Y.; Liu, R.; Seo, J.-S.; Yu, S. Fully parallel RRAM synaptic array for implementing binary neural network with (+1, -1) weights and (+1, 0) neurons. In Proceedings of the 2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC), Jeju, Republic of Korea, 22–25 January 2018; pp. 574–579. [CrossRef]
- 17. Lee, S.-T.; Woo, S.Y.; Lee, J.-H. Low-Power Binary Neuron Circuit with Adjustable Threshold for Binary Neural Networks Using NAND Flash Memory. *IEEE Access* 2020, *8*, 153334–153340. [CrossRef]
- Liu, R.; Peng, X.; Sun, X.; Khwa, W.S.; Si, X.; Chen, J.J.; Li, J.F.; Chang, M.F.; Yu, S. Parallelizing SRAM Arrays with Customized Bit-Cell for Binary Neural Networks. In Proceedings of the 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 24–28 June 2018; pp. 1–6.
- Dong, Q.; Sinangil, M.E.; Erbagci, B.; Sun, D.; Khwa, W.-S.; Liao, H.-J.; Wang, Y.; Chang, J. 15.3 A 351TOPS/W and 372.4GOPS Compute-in-Memory SRAM Macro in 7nm FinFET CMOS for Machine-Learning Applications. In Proceedings of the 2020 IEEE International Solid-State Circuits Conference—(ISSCC), San Francisco, CA, USA, 16–20 February 2020; pp. 242–244. [CrossRef]
- Sun, J.; Wang, Y.; Liu, P.; Wen, S.; Wang, Y. Memristor-Based Neural Network Circuit With Multimode Generalization and Differentiation on Pavlov Associative Memory. *IEEE Trans. Cybern.* 2023, *53*, 3351–3362. [CrossRef] [PubMed]
- Lai, Q.; Wan, Z.; Kuate, P.D.K. Generating Grid Multi-Scroll Attractors in Memristive Neural Networks. *IEEE Trans. Circuits Syst.* I Regul. Pap. 2023, 70, 1324–1336. [CrossRef]
- 22. Lovett, S.J.; Gibbs, G.A.; Pancholy, A. Yield and matching implications for static RAM memory array sense amplifier design. *IEEE J. Solid-State Circuits* 2000, *35*, 1200–1204. [CrossRef]
- Zhang, K.; Hose, K.; De, V.; Senyk, B. The scaling of data sensing schemes for high speed cache design in sub-0.18 μm technologies. In Proceedings of the 2000 Symposium on VLSI Circuits, Honolulu, HI, USA, 15–17 June 2000; pp. 226–227.
- Boley, J.; Calhoun, B. Stack based sense amplifier designs for reducing input-referred offset. In Proceedings of the Sixteenth International Symposium on Quality Electronic Design, Santa Clara, CA, USA, 2–4 March 2015; pp. 1–4.
- Reniwal, B.S.; Singh, P.; Vijayvargiya, V.; Vishvakarma, S.K. A new sense amplifier design with improved input referred offset characteristics for energy-efficient sram. In Proceedings of the 2017 30th International Conference on VLSI Design and 2017 16th International Conference on Embedded Systems (VLSID), Hyderabad, India, 7–11 January 2017; pp. 335–340.
- Patil, V.; Grover, A.; Parashar, A. Design of sense amplifier for wide voltage range operation of split supply memories in 22nm HKMG CMOS technology. In Proceedings of the 2020 33rd International Conference on VLSI Design and 2020 19th International Conference on Embedded Systems (VLSID), Bangalore, India, 4–8 January 2020; pp. 37–42.
- Saraswat, G.; Parashar, A. Voltage Boosted Schmitt Trigger Sense Amplifier (VBSTSA) with Improved Offset and Reaction Time For High Speed SRAMs. In Proceedings of the 2023 36th International Conference on VLSI Design and 2023 22nd International Conference on Embedded Systems (VLSID), Hyderabad, India, 8–12 January 2023.
- Liu, B.; Cai, J.; Yuan, J.; Hei, Y. A low-voltage SRAM sense amplifier with offset cancelling using digitized multiple body biasing. IEEE Trans. Circuits Syst. II Express Briefs 2017, 64, 442–446. [CrossRef]
- 29. Dhong, S.; Takahashi, O.; White, M.; Asano, T.; Nakazato, T.; Silberman, J.; Kawasumi, A.; Yoshihara, H. A 4.8GHz fully pipelined embedded SRAM in the streaming processor of a CELL processor. In Proceedings of the ISSCC. 2005 IEEE International Digest of Technical Papers. Solid-State Circuits Conference, 2005, San Francisco, CA, USA, 10 February 2005; Volume 1, pp. 486–612.
- 30. Chen, N.; Chaba, R. Dual Sensing Current Latched Sense Amplifier. U.S. Patent 12 731 623, 3 March 2010.
- Tsai, M.-F.; Tsai, J.-H.; Fan, M.-L.; Su, P.; Chuang, C.-T. Variation tolerant CLSAs for nanoscale bulk-CMOS and FinFET SRAM. In Proceedings of the 2012 IEEE Asia Pacific Conference on Circuits and Systems, Kaohsiung, Taiwan, 2–5 December 2012; pp. 471–474.
- 32. Sarfraz, K.; He, J.; Chan, M. A 140-mV variation-tolerant deep sub-threshold SRAM in 65-nm CMOS. *IEEE J. Solid-State Circuits* 2017, 52, 2215–2220. [CrossRef]
- Patel, D.; Neale, A.; Wright, D.; Sachdev, M. Hybrid latch-type offset tolerant sense amplifier for low-voltage SRAMs. *IEEE Trans. Circuits Syst. I Regul. Pap.* 2019, 66, 2519–2532. [CrossRef]
- 34. Kawasumi, A.; Takeyama, Y.; Hirabayashi, O.; Kushida, K.; Tachibana, F.; Niki, Y.; Sasaki, S.; Yabe, T. Energy efficiency deterioration by variability in SRAM and circuit techniques for energy saving without voltage reduction. In Proceedings of the 2012 IEEE International Conference on IC Design & Technology (ICICDT), Austin, TX, USA, 30 May–1 June 2012; pp. 1–4.

- 35. Verma, N.; Chandrakasan, A.P. A High-Density 45 nm SRAM Using Small-Signal Non-Strobed Regenerative Sensing. *IEEE J. Solid-State Circuits* 2008, 44, 163–173. [CrossRef]
- Qazi, M.; Stawiasz, K.; Chang, L.; Chandrakasan, A. A 512kb 8T SRAM Macro Operating Down to 0.57 V with an AC-Coupled Sense Amplifier and Embedded Data-Retention-Voltage Sensor in 45nm SOI CMOS. *IEEE J. Solid-State Circuits* 2010, 46, 85–96. [CrossRef]
- Fragasse, R.; Dupaix, B.; Tantawy, R.; James, T.; Khalil, W. Sense amplifier offset cancellation and replica timing calibration for high-speed SRAMs. In Proceedings of the 2018 IEEE 9th Latin American Symposium on Circuits & Systems (LASCAS), Puerto Vallarta, Mexico, 25–28 February 2018; pp. 1–5.
- 38. Sinangil, M.E.; Poulton, J.W.; Fojtik, M.R.; Greer, T.H.; Tell, S.G.; Gotterba, A.J.; Wang, J.; Golbus, J.; Zimmer, B.; Dally, W.J.; et al. A 28 nm 2 Mbit 6 T SRAM with highly configurable low-voltage write-ability assist implementation and capacitor-based sense-amplifier input offset compensation. *IEEE J. Solid-State Circuits* **2015**, *51*, 557–567. [CrossRef]
- Giridhar, B.; Pinckney, N.; Sylvester, D.; Blaauw, D. 13.7 A reconfigurable sense amplifier with auto-zero calibration and preamplification in 28nm CMOS. In Proceedings of the 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), San Francisco, CA, USA, 9–13 February 2014.
- Fragasse, R.; Tantawy, R.; Dupaix, B.; Dean, T.; Disabato, D.; Belz, M.R.; Smith, D.; Mccue, J.; Khalil, W. Analysis of SRAM enhancements through sense amplifier capacitive offset correction and replica self-timing. *IEEE Trans. Circuits Syst. I Regul. Pap.* 2019, 66, 2037–2050. [CrossRef]
- Schinkel, D.; Mensink, E.; Klumperink, E.; van Tuijl, E.; Nauta, B. A Double-Tail Latch-Type Voltage Sense Amplifier with 18ps Setup+Hold Time. In Proceedings of the 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, San Francisco, CA, USA, 11–15 February 2007; pp. 314–315.
- 42. Jeong, H.; Park, J.; Oh, T.W.; Rim, W.; Song, T.; Kim, G.; Won, H.-S.; Jung, S.-O. Bitline precharging and preamplifying switching pMOS for high-speed low-power SRAM. *IEEE Trans. Circuits Syst. II Express Briefs* **2016**, *63*, 1059–1063. [CrossRef]
- Lee, S.; Park, J.; Jeong, H. Cross-Coupled nFET Preamplifier for Low Voltage SRAM. *IEEE Trans. Circuits Syst. II Express Briefs* 2023, 70, 3604–3608. [CrossRef]
- 44. Sharifkhani, M.; Rahiminejad, E.; Jahinuzzaman, S.M.; Sachdev, M. A compact hybrid current/voltage sense amplifier with offset cancellation for high-speed SRAMs. *IEEE Trans. Very Large Scale Integr. VLSI Syst.* **2011**, *19*, 883–894. [CrossRef]
- Shah, J.S.; Nairn, D.; Sachdev, M. An energy-efficient offset cancelling sense amplifier. *IEEE Trans. Circuits Syst. II Express Briefs* 2013, 60, 477–481. [CrossRef]
- Patel, D.; Neale, A.; Wright, D.; Sachdev, M. Body Biased Sense Amplifier With Auto-Offset Mitigation for Low-Voltage SRAMs. IEEE Trans. Circuits Syst. I Regul. Pap. 2021, 68, 3265–3278. [CrossRef]
- Zhao, Y.; Wang, J.; Tong, Z.; Wu, X.; Peng, C.; Lu, W.; Zhao, Q.; Lin, Z. An offset cancellation technique for SRAM sense amplifier based on relation of the delay and offset. *Microelectron. J.* 2022, 128, 105578. [CrossRef]
- Mohammad, B.; Dadabhoy, P.; Lin, K.; Bassett, P. Comparative study of current mode and voltage mode sense amplifier used for 28nm SRAM. In Proceedings of the 2012 24th International Conference on Microelectronics (ICM), Algiers, Algeria, 16–20 December 2012; pp. 1–6. [CrossRef]
- Pu, Y.; Zhang, X.; Huang, J.; Muramatsu, A.; Nomura, M.; Hirairi, K.; Takata, H.; Sakurabayashi, T.; Miyano, S.; Takamiya, M.; et al. Misleading energy and performance claims in sub/near threshold digital systems. In Proceedings of the 2010 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Jose, CA, USA, 7–11 November 2010; pp. 625–631.
- Moritz, G.; Giraud, B.; Noel, J.; Turgis, D.; Grover, A. Optimization of a voltage sense amplifier operating in ultra wide voltage range with back bias design techniques in 28nm utbb fd-soi technology. In Proceedings of the 2013 International Conference on IC Design Technology (ICICDT), Pavia, Italy, 29–31 May 2013; pp. 53–56.
- 51. Niki, Y.; Kawasumi, A.; Suzuki, A.; Takeyama, Y.; Hirabayashi, O.; Kushida, K.; Tachibana, F.; Fujimura, Y.; Yabe, T. A digitized replica bitline delay technique for random-variation-tolerant timing generation of SRAM sense amplifiers. *IEEE J. Solid-State Circuits* **2011**, *46*, 2545–2551. [CrossRef]
- Arslan, U.; McCartney, M.P.; Bhargava, M.; Li, X.; Mai, K.; Pileggi, L.T. Variation-tolerant SRAM sense-amplifier timing using configurable replica bitlines. In Proceedings of the IEEE Custom Integrated Circuits Conference (CICC), San Jose, CA, USA, 21–24 September 2008; pp. 415–418.
- 53. Wang, P.; Zhou, K.; Zhang, H.; Gong, D. Design of replica bit line control circuit to optimize power for SRAM. *J. Semicond.* 2016, 37, 125002. [CrossRef]
- 54. Lin, Z.; Wu, X.; Li, Z.; Guan, L.; Peng, C.; Liu, C.; Chen, J. A pipeline replica bitline technique for suppressing timing variation of SRAM sense amplifiers in a 28-nm CMOS process. *IEEE J. Solid-State Circuits* **2016**, *52*, 669–677. [CrossRef]
- 55. Komatsu, S.; Yamaoka, M.; Morimoto, M.; Maeda, N.; Shimazaki, Y.; Osada, K. A 40-nm low-power SRAM with multi-stage replica-bitline technique for reducing timing variation. In Proceedings of the IEEE Custom Integrated Circuits Conference, San Jose, CA, USA, 13–16 September 2009; pp. 701–704.
- 56. Amrutur, B.S.; Horowitz, M.A. Fast low-power decoders for RAMs. IEEE J. Solid-State Circuits 2001, 36, 1506–1515. [CrossRef]
- Chang, M.-F.; Yang, S.-M.; Chen, K.-T.; Liao, H.-J.; Lee, R. Improving the speed and power of compilable SRAM using dual-mode selftimed technique. In Proceedings of the 2007 IEEE International Workshop on Memory Technology, Design and Testing, Taipei, Taiwan, 3–5 December 2007; pp. 57–60.

- 58. Kim, T.-H.; Liu, J.; Keane, J.; Kim, C.H. A 0.2 V, 480 kb subthreshold SRAM with 1 k cells per bitline for ultra-low-voltage computing. *IEEE J. Solid-State Circuits* 2008, 43, 518–529. [CrossRef]
- 59. Wang, D.; Liao, H.; Yamauchi, H.; Chen, Y.; Lin, Y.; Lin, S.; Liu, D.C.; Chang, H.; Hwang, W. A 45nm dual-port SRAM with write and read capability enhancement at low voltage. In Proceedings of the 2007 IEEE International SOC Conference, Hsinchu, Taiwan, 26–29 September 2007; pp. 211–214.
- 60. Karl, E.; Wang, Y.; Ng, Y.-G.; Guo, Z.; Hamzaoglu, F.; Bhattacharya, U.; Zhang, K.; Mistry, K.; Bohr, M. A 4.6 GHz 162 Mb SRAM design in 22 nm trigate CMOS technology with integrated active VMIN-enhancing assist circuitry. In Proceedings of the 2012 IEEE International Solid-State Circuits Conference, San Francisco, CA, USA, 19–23 February 2012.
- Chang, J.; Chen, Y.H.; Cheng, H.; Chan, W.M.; Liao, H.J.; Li, Q.; Chang, S.; Natarajan, S.; Lee, R.; Wang, P.W.; et al. A 20 nm 112Mb SRAM in highmetal-gate with assist circuitry for low-leakage and low-VMIN applications. In Proceedings of the 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers, San Francisco, CA, USA, 17–21 February 2013; pp. 316–333.
- 62. Song, T.; Rim, W.; Park, S.; Kim, Y.; Yang, G.; Kim, H.; Baek, S.; Jung, J.; Kwon, B.; Cho, S.; et al. A 10 nm FinFET 128 Mb SRAM with assist adjustment system for power, performance, and area optimization. *IEEE J. Solid State Circuits* 2017, *52*, 240–249. [CrossRef]
- 63. Baek, G.; Jeong, H. High-Density SRAM Read Access Yield Estimation Methodology. *IEEE Access* 2021, 9, 128288–128301. [CrossRef]
