Design of High-Speed, Low-Power Sensing Circuits for Nano-Scale Embedded Memory

Lee, Sangheon; Park, Gwanwoo; Jeong, Hanwool

doi:10.3390/s24010016

Open Access Review


by Sangheon Lee ¹, Gwanwoo Park ¹ and Hanwool Jeong ¹,²,*

¹ Department of Electronic Engineering, Kwangwoon University, Seoul 01897, Republic of Korea

² Articron Inc., Ansan-si 15588, Republic of Korea

* Author to whom correspondence should be addressed.

Sensors 2024, 24(1), 16; https://doi.org/10.3390/s24010016

Submission received: 8 November 2023 / Revised: 14 December 2023 / Accepted: 17 December 2023 / Published: 19 December 2023

(This article belongs to the Special Issue Electronics for Sensors, Volume 3)


Abstract

This paper comparatively reviews sensing circuit designs for the most widely used embedded memory, static random-access memory (SRAM). Many sensing circuits for SRAM have been proposed to improve power efficiency and speed, because sensing operations in SRAM largely determine the overall speed and power consumption of the system-on-chip. This is more pronounced in the nanoscale era, where SRAM bit-cells implemented with near-minimum-sized transistors are strongly affected by variation. Under this condition, for stable sensing, the control signal for accessing the selected bit-cell (word-line, WL) must be asserted for a long time, which increases both power dissipation and delay. Sensing circuits that reduce the WL pulse width can therefore improve sensing power and speed simultaneously. Throughout this paper, the strengths and weaknesses of many SRAM sensing circuits are discussed in terms of speed, area, power, and other aspects.

Keywords:

static random-access memory; sensing circuit; offset voltage

1. Introduction

System-on-chip design encounters considerable challenges related to power consumption and latency, much of which stems from static random-access memory (SRAM) [1,2,3,4]. Thus, the efficient management of SRAM power consumption and the enhancement of SRAM access speed become highly important. Although reducing the supply voltage (V_DD) is effective in reducing power consumption, it introduces potential performance and stability trade-offs. In particular, the SRAM bit-cell, a circuit component for binary data storage, is typically constructed with near-minimum-sized transistors to achieve high-density integration, resulting in significant performance variability due to process deviations [5,6,7,8]. Furthermore, to address read stability issues, read assist circuits are employed to suppress the word-line voltage, which can exacerbate performance degradation. Consequently, the optimization of SRAM circuits to minimize both power consumption and delay becomes crucial.

By analyzing the read operation, we can identify a method to simultaneously reduce power consumption and delay in SRAM. During the read operation, the bit-cell generates a voltage difference across the bit-line pair. Then, a sensing circuit measures this voltage difference and subsequently delivers the results to the external system. Importantly, the bit-line pair, which plays a fundamental role, has a significant capacitance, enough to make it the dominant contributor to both delay and power consumption during the read operation. Consequently, when a substantial voltage swing in the bit-line is necessitated for the read operation, it inevitably results in increased delays and power consumption. Thus, reducing the bit-line swing during the read operation can effectively decrease the power consumption and delay at the same time [9,10,11].
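The link between bit-line swing, delay, and power can be illustrated with a first-order estimate. The sketch below is not taken from the reviewed papers: it assumes a constant bit-cell read current and a purely capacitive bit-line, and the function name and numeric values (C_BL, I_CELL, V_DD) are hypothetical placeholders chosen only to show the trend.

```python
def bitline_read_cost(c_bl, i_cell, v_dd, delta_v_bl):
    """First-order delay and energy needed to develop a bit-line swing.

    Assumptions (illustrative only): the bit-cell sinks a constant current
    i_cell, and the bit-line is an ideal capacitor c_bl.
    """
    # Time for the cell current to discharge the bit-line by delta_v_bl
    t_develop = c_bl * delta_v_bl / i_cell
    # Energy drawn from V_DD to restore the discharged bit-line in pre-charge
    e_precharge = c_bl * v_dd * delta_v_bl
    return t_develop, e_precharge


# Hypothetical values, not measured data
C_BL = 50e-15    # bit-line capacitance [F]
I_CELL = 20e-6   # bit-cell read current [A]
V_DD = 0.8       # supply voltage [V]

for swing in (0.05, 0.10, 0.20):
    t, e = bitline_read_cost(C_BL, I_CELL, V_DD, swing)
    print(f"swing = {swing * 1e3:.0f} mV: "
          f"delay = {t * 1e12:.0f} ps, energy = {e * 1e15:.1f} fJ")
```

Under these assumptions both quantities scale linearly with the swing, which is why a smaller required ΔV_BL helps delay and power at the same time.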

However, it is highly challenging to reduce bit-line voltage swing. This is because sensing circuits, especially the sense amplifier (SA) responsible for detecting bit-line swing, necessitate a sufficiently large bit-line voltage difference (ΔV_BL) for precise operation. This need arises due to transistor mismatch within the SA, causing asymmetry in its characteristics. The minimum input voltage difference (in this case, ΔV_BL) required for stable SA operation is known as the SA offset voltage (V_OS). To reduce the ΔV_BL, it becomes essential to lower the V_OS.

Additionally, the SA is crucially utilized not only in SRAM but also in novel components, improving the efficiency of data processing [12,13,14,15,16,17,18,19,20,21]. SAs are used as row ADCs in [12,13,14], binary activation functions in [15,16,17], multilevel sense amplifiers in [18], four-bit flash ADCs in [19], and sensing circuits in [20,21]. Therefore, research on low V_OS for high accuracy, low power consumption, fast speed, and high integration for efficient performance is crucial for SAs.

Consequently, numerous prior studies have aimed to reduce the V_OS, the most important performance metric of SAs. The simplest method is to use wider transistors in the SA, which reduces the mismatch between paired transistors. However, this approach incurs area and power overhead. To reduce the V_OS while minimizing the area and power overhead, various offset-reducing circuit techniques have been proposed [22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47]. This paper aims to conduct a comparative analysis of these circuits, explaining their effectiveness in reducing the V_OS and achieving power and performance benefits.

The rest of this paper is organized as follows: Section 2 provides essential background information on SRAM read operations and conventional SRAM sensing circuits, including an examination of their limitations. This foundation is crucial for understanding the subsequent content. Section 3 delves into comprehensive introductions of various previously researched SRAM sensing circuits designed to reduce the V_OS, ultimately enhancing speed and power efficiency. Section 4 details a comparative analysis and discussion of the SRAM sensing circuits introduced in Section 3 from various perspectives.

2. Backgrounds on SRAM Read Operation and Conventional Sensing Circuits

Figure 1 presents the simplified circuits in the conventional SRAM for the read operation. In the following, we provide brief explanations for the structure and operation of each circuit shown in Figure 1.

At the top of Figure 1, the bit-cell is composed of six transistors. In this 6T bit-cell, two cross-coupled inverters are formed of M₁, M₂, M₃, and M₄ for storing and latching the binary data at two storage nodes, Q_T and Q_C. The two access transistors, M₅ and M₆, serve as control elements that regulate connections between the bit-line pair (BL_T and BL_C) and storage nodes (Q_T and Q_C). When the WL activates (i.e., WL = 1), access transistors are turned on to connect bit-lines to storage nodes.

Next, the bit-line pre-charge circuit is shown, which is formed of M_PCT, M_PCC, and M_EQ. These transistors are controlled by the low-enable pre-charge trigger signal, PCB, with their gates connected. When PCB = 0, M_PCT and M_PCC are turned on to pre-charge BL_T and BL_C up to V_DD, while M_EQ ensures that BL_T and BL_C are pre-charged to equal voltages.

The column multiplexer (MUX) implemented with M_C1, M_C2, …, M_C8 selects one bit-line pair from multiple pairs (four pairs in Figure 1) and connects it to the SA input pair SL_T and SL_C. The specific bit-line pair to be connected is determined by the column address signal, COLB[0:3], with only one of these signals set to low.

The SA plays a key role in the SRAM read operation. It amplifies the voltage difference between SL_T and SL_C, converting it into a full-logic swing voltage. This amplified signal is then made available at the SA’s differential outputs, SO_T and SO_C. Two commonly used conventional SA structures are the voltage-type latch SA (VLSA) and the current-type latch SA (CLSA), which are shown in Figure 2a,b, respectively [48]. Unlike VLSAs, CLSAs receive the SA input voltages, SL_C and SL_T, at the gates of the access transistors, M_S1 and M_S2. The SA inputs therefore drive a high impedance, and the CLSA is less sensitive to timing mismatch. However, CLSAs require additional transistors for the sensing operation, so they are slower, consume more energy, and occupy a larger area than VLSAs. The SA enable signal (SAE), connected to M_S5–M_S7 of the VLSA and M_S7–M_S9 of the CLSA, triggers the amplifying operation of the SA.

Figure 3 provides operational waveforms of relevant signals during the conventional SRAM read operation, divided into three phases: the pre-charge phase, the access phase, and the evaluation phase. In the pre-charge phase, the PCB becomes low, which pre-charges the bit-lines (BL_T and BL_C) and SA inputs (SL_T and SL_C) to V_DD through the bit-line pre-charge circuit and the SA input pre-charge circuit. Then, the access phase starts by making PCB = 1 to turn off the pre-charge circuits, while the WL for the selected bit-cell is asserted to reflect the data at Q_T and Q_C onto the bit-line pair of BL_T and BL_C. Figure 3 shows an example of bit-cell storing datum “1” (Q_T = 1 and Q_C = 0). In this example, BL_T remains high while BL_C falls due to the bit-cell current through M₆, creating a voltage difference between BL_T and BL_C. By lowering the COLB[i] in the selected column, the column MUX transistors transfer only the selected bit-line pair voltage to the SA inputs, SL_T and SL_C.

During the subsequent evaluation phase, the SA enable signal (SAE) becomes high to trigger the positive feedback configuration in the SA. In this manner, a small voltage difference between SL_T and SL_C, ΔV_IN,SA (See Figure 3), is amplified into the digital voltage difference at SA output nodes SO_T and SO_C. For example, the sensing operation of a VLSA in Figure 2a is shown in Figure 4.

When the sensing datum is “1”, the SL_T remains at V_DD while the SL_C decreases due to the bit-cell, reaching V_DD − ΔV_IN,SA, as shown on the left side of Figure 4. The voltages at the SA outputs, SO_T and SO_C, are equal to those at SL_T and SL_C, respectively, through the pass transistors M_S5 and M_S6. During the subsequent evaluation phase, the SAE rises, and currents flow through the paired nFETs in the SA, M_S1 and M_S2.

These currents are depicted as I_S1 and I_S2 in the middle of Figure 4. At the beginning of the evaluation phase, the V_GS of M_S2 (SO_T = V_DD) is greater than that of M_S1 (SO_C = V_DD − ΔV_IN,SA). Consequently, I_S2 > I_S1 makes SO_C fall faster than SO_T. This leads to positive feedback, formed by M_S1–M_S2–M_S3–M_S4. As a result, SO_T and SO_C eventually reach V_DD and 0 V, respectively, as shown on the right side of Figure 4, indicating a successful “1” datum sensing process.

However, it is not always guaranteed that the SA operation is stably performed. In Figure 5, there is a scenario where sensing failure occurs. The access phase is the same as the previous normal sensing operation (the left side of Figure 5). However, when the evaluation starts by triggering the SA, as shown in the middle of Figure 5, problems can arise. It should be noted that, although the V_GS of M_S2 (SO_T = V_DD) is greater than the V_GS of M_S1 (SO_C = V_DD − ΔV_IN,SA), I_S2 < I_S1. This can occur because there is a mismatch between the M_S1–M_S2 pair, specifically since the V_th of M_S1 is lower than the V_th of M_S2 [22]. Consequently, the SO_T (initially V_DD) falls more quickly than the SO_C (initially V_DD − ΔV_IN,SA). Therefore, SO_T and SO_C end up with 0 V and V_DD, respectively, meaning that sensing fails in attempting to sense datum “1”.

Here, the key point is that the mismatch between the paired transistors is responsible for the sensing failure. To prevent this sensing failure, ΔV_IN,SA should be large enough to compensate for the effects of the transistor mismatch. This minimum required ΔV_IN,SA for stable sensing is the offset voltage of the SA, referred to as V_OS, so the condition ΔV_IN,SA > V_OS must hold. The V_OS problem becomes more severe in low-V_DD regions and is significantly affected by temperature [49,50]. To meet this condition, the WL pulse width is extended to achieve a sufficiently large ΔV_BL, which, in turn, results in a large ΔV_IN,SA. However, this increased ΔV_BL requirement not only increases delay but also raises power consumption, since more power is needed to pre-charge the significant capacitance of the BL pair, which stems from the combined effects of the long BL wire capacitance and the parasitic capacitance of the bit-cells.
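The relationship between V_th mismatch and the required ΔV_IN,SA can be sketched with a small Monte Carlo experiment. This is a first-order behavioral model, not a circuit simulation: it assumes that a datum “1” is sensed correctly whenever the initial overdrive of M_S2 exceeds that of M_S1, and that the two threshold voltages are independent Gaussian variables. The nominal V_th, its standard deviation, and V_DD are hypothetical values.

```python
import random


def senses_correctly(delta_v_in, vth_ms1, vth_ms2, v_dd=0.8):
    """Datum '1' case: SO_T = V_DD, SO_C = V_DD - delta_v_in.

    Assumption: sensing succeeds if M_S2 (gate at SO_T) starts with a larger
    overdrive than M_S1 (gate at SO_C), so that I_S2 > I_S1.
    """
    ov_ms1 = (v_dd - delta_v_in) - vth_ms1
    ov_ms2 = v_dd - vth_ms2
    return ov_ms2 > ov_ms1


def failure_rate(delta_v_in, sigma_vth, trials=20000, vth_nominal=0.3, seed=0):
    """Fraction of random mismatch samples in which sensing fails."""
    rng = random.Random(seed)
    fails = 0
    for _ in range(trials):
        vth1 = rng.gauss(vth_nominal, sigma_vth)
        vth2 = rng.gauss(vth_nominal, sigma_vth)
        if not senses_correctly(delta_v_in, vth1, vth2):
            fails += 1
    return fails / trials


# Hypothetical sigma(V_th) = 20 mV; sweep the SA input difference
for dv in (0.02, 0.05, 0.10, 0.20):
    print(f"dV_IN = {dv * 1e3:.0f} mV: failure rate = {failure_rate(dv, 0.02):.4f}")
```

In this simplified model the failure condition reduces to V_th,MS2 − V_th,MS1 > ΔV_IN,SA, so the failure rate drops as ΔV_IN,SA grows, at the cost of the longer WL pulse described above.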

Although employing large transistors in sensing schemes can mitigate the mismatch problem, it incurs power, speed, and area overhead in the sensing stage [18]. In addition, various replica bit-line delay and self-timed SAE generation techniques have been proposed to minimize WL pulses [51,52,53,54,55,56,57,58], but their effects are limited because they cannot account for local variations. The speed and power issue due to the ΔV_BL requirement in SRAM becomes more severe in today’s advanced nanometer-scale technology nodes, because WL-suppressing assist circuits are widely used, which necessitates longer WL pulses to meet the ΔV_BL requirement [59,60,61,62].

Therefore, it would be highly beneficial to reduce the V_OS, as it would alleviate the demand for a large ΔV_BL. In the following section, we describe SRAM sensing circuits designed to reduce the V_OS for the purpose of improving speed and power efficiency. We will explore these circuits in terms of their structure, operation, and key performance characteristics.

3. SRAM Sensing Circuits for Offset Reduction

3.1. Schmitt Trigger Sense Amplifiers

Schmitt triggers are often used to improve the robustness of a standard inverter by modifying the switching threshold. Utilizing this feature, the authors in [24,25,26] proposed the Schmitt trigger-based SA (STSA) to reduce V_OS, where one example structure is shown in Figure 6a. This structure intends to weaken the pull-down network of the inverter holding high voltages relative to that of the low-voltage inverter.

For example, when SL_T is V_DD while SL_C is V_DD − ΔV_IN,SA for datum “1” sensing, SO_T and SO_C become V_DD and V_DD − ΔV_IN,SA, respectively, at the end of the access phase. When the evaluation phase starts with SAE rising, M_S5 is more strongly turned on than M_S6 because SO_T > SO_C. Thus, the Z_T node (the source of M_S3) is more strongly pulled up than Z_C (the source of M_S4). Because this scheme controls not only the gate voltages but also the source voltages of M_S3 and M_S4 according to SO_T and SO_C, the V_GS of M_S3 is greatly suppressed. That is, the V_GS difference between the two paired nFETs (M_S3–M_S4) in the STSA is larger than that between M_S1–M_S2 in the VLSA, which makes the STSA more tolerant to mismatch effects. In this manner, the STSA attempts to provide a reduced V_OS compared to the VLSA.

However, the STSA has a limited ability to reduce the V_OS because it contains additional transistor pairs, which can enlarge the mismatch effect. In particular, the mismatch between M_S5 and M_S6 and the mismatch between M_S1 and M_S2, which are not present in the VLSA, increase the asymmetry in the SA and thus the V_OS. Nevertheless, the circuit technique implemented in the STSA, performed by M_S1, M_S2, M_S5, and M_S6, mitigates mismatch effects enough to outweigh the increase caused by the additional transistor pairs. As a result, the final V_OS is reduced compared to the VLSA. On the other hand, the sensing delay is increased compared to the VLSA due to the use of a stacked nFET structure [26].

To mitigate the speed problem of STSAs, voltage-boosted STSAs (VBSTSAs) have been proposed [27], as shown in Figure 6b. In VBSTSAs, the negative voltage generator (NVG) used for the negative bit-line write-assist circuit is reused to accelerate the operation of STSAs. When the NVG operation starts, BSTEN rises and BSTENb falls. The falling BSTENb turns off M_S13, which was holding OUT at V_SS, leaving OUT floating. After M_S13 is completely turned off, BSTENd, which is delayed through inverters, falls, and OUT is lowered to a negative voltage through the coupling capacitor, C. Note that BSTENd must fall only after M_S13 is fully turned off, so the inverters in the NVG must provide sufficient delay. The ground voltage for the SA is then pulled down to the negative voltage at the rising edge of the SAE and is held at 0 V otherwise. This is realized by a switch, turned on only when the SAE is high, that delivers the negative voltage generated by the NVG. Although the sensing speed can be enhanced in this manner, it incurs a significant power overhead. In addition, NVGs are not always available, because other types of write-assist circuit, such as cell voltage collapse write assist, do not use NVGs.
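The negative level produced by the coupling capacitor can be estimated with a simple charge-sharing relation. The sketch below is an idealized illustration, not the model used in [27]: it assumes OUT is perfectly floating at 0 V when BSTENd falls by V_DD, and it lumps everything attached to OUT into a single hypothetical load capacitance `c_load`.

```python
def nvg_output(v_dd, c_couple, c_load):
    """Idealized negative voltage at the floating OUT node.

    Assumptions (illustrative only): OUT floats at 0 V, the driven plate of
    the coupling capacitor falls by v_dd, and c_load is a hypothetical lumped
    capacitance on OUT (SA ground rail, switch, wiring).
    """
    # Capacitive divider: only part of the V_DD step reaches OUT
    return -v_dd * c_couple / (c_couple + c_load)


# Hypothetical values: a larger coupling capacitor gives a deeper boost
for c in (50e-15, 100e-15, 400e-15):
    v_out = nvg_output(0.8, c, 100e-15)
    print(f"C = {c * 1e15:.0f} fF: OUT = {v_out * 1e3:.0f} mV")
```

Under these assumptions, a deeper negative level requires a coupling capacitor that is large relative to the load on OUT, which is one way to see where the power overhead mentioned above comes from.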

3.2. Hybrid Latch-Type Sense Amplifiers

Some previously proposed SAs combine the features of VLSAs and CLSAs to reduce the V_OS, which can be referred to as hybrid latch-type SAs (HYSA) [28,29,30,31,32,33]. Figure 7a shows one example of an HYSA proposed in [32], the variation-tolerant SA (VTSA). For consistency in explanation with other structures, the polarity in this VTSA example is reversed from the original structure. The VTSA is primarily based on the CLSA structure but also incorporates features of a VLSA. Specifically, the SA outputs, SO_T and SO_C, are pre-charged to the SA inputs, SL_T and SL_C, using pass transistors M_S7 and M_S8.

When comparing VTSAs with VLSAs, a notable difference is observed in the pull-down networks of the positive feedback configurations in the SA. In the VTSA, these networks, consisting of M_S3 and M_S4, are not directly connected to the CM node as in the VLSA. Instead, they are connected to Z_T and Z_C nodes, as shown in Figure 7a. These nodes are pulled down by M_S1 and M_S2, respectively, with their gates controlled by SL_C and SL_T. This configuration effectively adjusts the V_GS of M_S3 and M_S4 for proper sensing.

The detailed operation of the VTSA is as follows: During the access phase, when SAE = 0 and datum “1” is being sensed, the SL_T is at V_DD, and SL_C is at V_DD − ΔV_IN,SA, making SO_T and SO_C pre-charged to V_DD and V_DD − ΔV_IN,SA, respectively, through M_S7 and M_S8, similar to the VLSA. Additionally, the gate voltages of M_S1 and M_S2, V_G,MS1 and V_G,MS2, become V_DD − ΔV_IN,SA and V_DD, respectively. When the evaluation phase begins with SAE = 1, Z_T and Z_C are pulled down by M_S1 and M_S2, respectively. In this configuration, since SL_T > SL_C, M_S2 can drive more current than M_S1, resulting in Z_C being pulled down more strongly than Z_T (i.e., Z_T > Z_C). As a result, compared to the VLSA, the difference between V_GS,MS3 and V_GS,MS4 is larger in the VTSA, indicating that the amplification is more stable, and thus, V_OS can be reduced. This is due to adjustments made not only in the gate voltage conditions of M_S3 and M_S4 (V_G,MS3 < V_G,MS4), but also in their source voltage conditions (V_S,MS3 > V_S,MS4).

However, the VTSA has an additional pair of nFET transistors compared to the VLSA—M_S1 and M_S2—involved in the initial amplification of signals. This additional pair not only incurs area overhead but also potentially increases the mismatch effects. That is, the mismatch between M_S1 and M_S2, which does not need to be considered in VLSAs, can result in unintentional changes in Z_T and Z_C and degrade the sensing stability. In addition, stacked nFETs degrade the sensing delay and power consumption, like STSAs.

Figure 7b shows another example of an HYSA, the HYSA-QZ, which is proposed in [33]. This structure more aggressively pre-charges the internal nodes of the SA than the VTSA. The notation of QZ here means that not only output nodes (Q), but the internal nodes between the M_S1–M_S2 pair and M_S3–M_S4 pair (Z) are also pre-charged to SA inputs in a direction for precise sensing. As shown in Figure 7b, not only SO_T and SO_C are pre-charged to SL_T and SL_C, but also Z_T and Z_C are pre-charged to SL_T and SL_C, respectively. In this manner, the bias condition of the SA becomes more favorable for accurate sensing than the VTSA.

3.3. Capacitor-Based Offset-Compensated SAs

Several previously proposed SAs have addressed transistor mismatches by employing capacitors [34,35,36,37,38,39,40]. These capacitors capture the mismatches between paired transistors, and the stored mismatch information is subsequently utilized to bias the internal nodes of the SA for compensation. Figure 8a illustrates the configuration of a capacitor-based threshold-matching SA (TMSA), as presented in [38].

As demonstrated in Figure 8b,c, the TMSA comprises two main components: a VLSA part and a capacitor-based threshold-matching part. The primary goal of the TMSA is to compensate for the mismatch between the M_S1–M_S2 pair, which is the most critical pair in a VLSA. This correction is accomplished by initially sampling the V_th of M_S1 and M_S2 (V_th,MS1 and V_th,MS2) during the pre-charge phase. Then, the sampled V_th,MS1 and V_th,MS2 are stored at the source nodes of M_S1 and M_S2. This ensures that the currents through M_S1 and M_S2 during the amplification operation, I_S1 and I_S2, are independent of their V_th mismatch.

The detailed operation that achieves this objective is illustrated in Figure 9a–d, in the example of sensing datum “1”, with a comprehensive explanation provided as follows.

(1): Pre-charge phase (Figure 9a): During this phase, the input and output nodes of the SA (SL_T, SL_C, SO_T, and SO_C) are pre-charged to V_DD. Then, the top-plate nodes of C₀ and C₁, CT_T and CT_C, are pre-charged to V_DD − V_th,MS1 and V_DD − V_th,MS2, respectively, at which point M_S1 and M_S2 turn off. This pre-charge assumes that CT_T and CT_C are initially at 0 V (the rationale is explained in the latching phase). In addition, the common bottom-plate node of C₀ and C₁, NRSC, is pre-charged to V_DD by M_S8, which is turned on by PCB = 0.
(2): Access phase (Figure 9b): In this phase, SL_C is lowered to V_DD − ΔV_IN,SA by the bit-cell, causing the SO_C to also be V_DD − ΔV_IN,SA. In addition, the PCB becomes high, so the common bottom-plate node of C₀ and C₁, NRSC, floats high.
(3): Evaluation phase (Figure 9c): This phase starts with the SAE rising, turning on M_S7, so the NRSC is pulled down. This results in negative capacitive voltage coupling from NRSC to CT_T and CT_C, through C₀ and C₁, respectively. Thus, CT_T and CT_C are decreased by ΔV, becoming V_DD − V_th,MS1 − ΔV and V_DD − V_th,MS2 − ΔV, respectively. This turns on M_S1 and M_S2, where the overdrive voltages (V_OV = V_GS − V_th) of M_S1 and M_S2, V_OV,MS1 and V_OV,MS2, are as follows:

V_OV,MS1 = V_GS,MS1 − V_th,MS1 = V(SO_C) − V(CT_T) − V_th,MS1 = (V_DD − ΔV_IN,SA) − (V_DD − V_th,MS1 − ΔV) − V_th,MS1 = ΔV − ΔV_IN,SA

V_OV,MS2 = V_GS,MS2 − V_th,MS2 = V(SO_T) − V(CT_C) − V_th,MS2 = V_DD − (V_DD − V_th,MS2 − ΔV) − V_th,MS2 = ΔV

The notable point is that V_OV,MS1 and V_OV,MS2, which determine I_S1 and I_S2, are independent of V_th,MS1 and V_th,MS2, respectively. Thus, even in the presence of a mismatch between V_th,MS1 and V_th,MS2, I_S1 and I_S2 can be stably generated (e.g., I_S1 < I_S2 for datum “1” sensing as in Figure 9c) at the beginning of the evaluation phase. This makes the TMSA notably more robust than the conventional VLSA, leading to a reduced V_OS.
(4): Latching phase (Figure 9d): After the NRSC becomes low in the evaluation phase, this change in NRSC propagates through a delay buffer to make LAT = V_DD, which starts the latching phase. In this phase, CT_T and CT_C become 0 V, so SO_T and SO_C can latch the sensing results at the full digital level. This state is kept until the next pre-charge phase. Because CT_T and CT_C are at 0 V at this point, they can be charged up to V_DD − V_th,MS1 and V_DD − V_th,MS2, respectively, in the next pre-charge phase.
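The cancellation of V_th in the overdrive voltages can be checked numerically by following the node voltages through the pre-charge, access, and evaluation phases described above. The sketch below is an idealized voltage bookkeeping for the datum “1” case, not a circuit simulation; the function name and all numeric values are hypothetical.

```python
def tmsa_overdrives(v_dd, delta_v_in, delta_v, vth_ms1, vth_ms2):
    """Overdrive voltages of M_S1 and M_S2 at the start of evaluation.

    Idealized bookkeeping for datum '1'; all inputs are illustrative values.
    """
    # Pre-charge phase: top plates settle one threshold below V_DD
    ct_t = v_dd - vth_ms1
    ct_c = v_dd - vth_ms2
    # Access phase: SO_T stays at V_DD, SO_C follows SL_C down
    so_t = v_dd
    so_c = v_dd - delta_v_in
    # Evaluation phase: NRSC falls and couples -delta_v onto both top plates
    ct_t -= delta_v
    ct_c -= delta_v
    # V_OV = V_GS - V_th, with the gate at the opposite output node
    ov_ms1 = (so_c - ct_t) - vth_ms1
    ov_ms2 = (so_t - ct_c) - vth_ms2
    return ov_ms1, ov_ms2


# Matched and strongly mismatched thresholds give the same overdrives
print(tmsa_overdrives(0.8, 0.03, 0.10, 0.30, 0.30))
print(tmsa_overdrives(0.8, 0.03, 0.10, 0.25, 0.35))
```

Both calls return approximately (ΔV − ΔV_IN,SA, ΔV), up to floating-point rounding, which matches the expressions above regardless of the threshold values chosen.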

Although the TMSA effectively reduces the V_OS by compensating for the mismatch between M_S1 and M_S2, this structure has several shortcomings. First, it is still affected by the mismatch between the capacitors, C₀ and C₁. This mismatch, however, is typically much smaller than the transistor V_th mismatch. Second, the capacitors and delay buffers in the TMSA significantly increase power consumption and area. In particular, because a sufficiently large ΔV is necessary to turn on M_S1 and M_S2 in the early stage of amplification, large capacitors are unavoidable for C₀ and C₁. Consequently, a significant amount of power is required to charge up the NRSC from 0 V to V_DD in the pre-charge phase. The area overhead, however, can be avoided by placing metal–oxide–metal (MOM) capacitors on top of the circuit layout [39].

As an alternative approach, the variation-tolerant small-signal SA (VTS-SA) is proposed in [39], specifically addressing mismatches between the two inverters in the SA. This is achieved through the utilization of capacitors at the input acceptance part. The structure of the VTS-SA is shown in Figure 10 below.

The VTS-SA is based on a VLSA composed of M_S1–M_S2–M_S3–M_S4, while the SA input nodes, SL_T and SL_C, are received through coupling capacitors C_C1 and C_C2, respectively. By utilizing capacitors, the VTS-SA can capture and store the trip points of the two inverters in the SA, INV₁ (M_S1 and M_S3) and INV₂ (M_S2 and M_S4), shown in Figure 10. When biased at their respective trip points, the two inverters become highly sensitive to small input voltage variations. That is, even small input voltage changes can push the inverters to switch their output states. This enhanced voltage gain of the inverters contributes to the improved speed of the SA. Furthermore, trip-point biasing in the VTS-SA serves another crucial purpose: it allows the SA to adapt to process variations within the inverters. By individually setting the trip points, the VTS-SA makes each inverter operate primarily in response to input changes, minimizing its dependence on process variations as much as possible.

The detailed operations of the VTS-SA are illustrated in Figure 11a–c, where there are three main operation phases: (1) the trip-point bias phase, (2) the access phase, and (3) the evaluation phase.

(1): Trip-point bias phase (Figure 11a): In this phase, the input and output are shorted in INV₁ and INV₂ of the SA. As a result, the input and output of INV₁ and INV₂ are set to their respective trip points—V_bias,INV1 and V_bias,INV2. This is accomplished by turning on the M_S7 and M_S8 transistors through PRE = 1, while also turning on the header and footer switches M_S11 and M_S12 with EN = 1. In addition, SAE = 0 in this phase, to make the bottom plate of the coupling capacitors, SLI_T and SLI_C, also be equal to the trip points of the inverters.
(2): Access phase (Figure 11b): In this phase, the input–output connections are disconnected, and the two trip-point-biased inverters are ready to accept changes in SL_T and SL_C through capacitive coupling. Specifically, when sensing datum “1”, as demonstrated in Figure 11b, SL_C is decreased by ΔV_IN,SA. Then, SLI_C is decreased by ΔV_coup through capacitive coupling via C_C2. Due to the trip-point bias, this input change of INV₂ leads to a significant change in the output of INV₂, SO_T. As a result, an amplified voltage difference of K × ΔV_IN,SA, where K > 1, is observed between SO_T and SO_C. It is important to note that, as previously mentioned, because the inverters are biased at their respective trip points, the output change is determined almost entirely by the input change and is largely independent of process variations.
(3): Evaluation phase (Figure 11c): In this phase, the SAE becomes high; thus, the two inverters are connected in a cross-coupled fashion, by turning on M_S10 and M_S9. At the same time, the two cross-coupled inverters are isolated from the input by turning off M_S5 and M_S6. Through the positive feedback of the cross-coupled inverters, the final data are latched onto SO_T and SO_C at the full digital level, similar to the operation of other SAs.

Although the VTS-SA tries to reduce the V_OS by capturing the mismatch between INV₁ and INV₂ through trip-point biasing, this structure has several limitations. First, mismatches between M_S5–M_S6, M_S7–M_S8, and M_S9–M_S10 are newly introduced in this structure, which limits the V_OS reduction. Second, similar to the TMSA, the VTS-SA is still affected by the mismatch between C_C1 and C_C2, although this is less influential than the transistor mismatch. Third, because the input voltage is transferred through capacitive coupling, not all of the ΔV_IN,SA is delivered to the SA. This inefficiency contributes to an increase in the effective V_OS. Fourth, the trip-point biasing process must be completed before the ΔV_IN,SA appears between SL_T and SL_C. This requirement potentially increases the circuit complexity. In addition, a short-circuit current from V_DD to V_SS is inevitable during trip-point biasing, resulting in high power consumption.
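The trade-off between coupling loss and trip-point gain can be illustrated with a two-step estimate. This is a rough sketch under stated assumptions, not the analysis from [39]: the coupled input step is modeled as a capacitive divider between the coupling capacitor and a hypothetical parasitic capacitance `c_par` at the inverter input, and the inverter is modeled as a fixed small-signal gain `inv_gain` around its trip point.

```python
def vts_sa_preamp(delta_v_in, c_couple, c_par, inv_gain):
    """Coupled input step and resulting inverter output step.

    Assumptions (illustrative only): c_par is a hypothetical parasitic
    capacitance at the inverter input, and inv_gain is the magnitude of the
    small-signal gain of an inverter biased at its trip point.
    """
    # Only a fraction of delta_v_in reaches the inverter input
    delta_v_coup = delta_v_in * c_couple / (c_couple + c_par)
    # The trip-point-biased inverter amplifies the coupled step
    delta_v_out = inv_gain * delta_v_coup
    return delta_v_coup, delta_v_out


# Hypothetical values: 50 mV input, 20 fF coupling, 5 fF parasitic, gain of 5
coup, out = vts_sa_preamp(0.05, 20e-15, 5e-15, 5.0)
print(f"dV_coup = {coup * 1e3:.0f} mV, output step = {out * 1e3:.0f} mV")
```

In this simplified view the effective factor K is the product of the divider ratio and the inverter gain, so a small coupling capacitor relative to the input parasitic reduces the benefit of trip-point biasing.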

The current-mode SA with a capacitive offset correction (CSA_COC) structure proposed in [40] utilizes a single capacitor for storing the trip points of inverters, so it is free from capacitor mismatch effects. The schematic of the CSA_COC is shown in Figure 12a, and the operation waveforms of its three main control clock signals—the trip-point storage enable, Φ_Trs; the trip-point bias enable, Φ_Trb; and the sense enable, SAE—are illustrated in Figure 12b.

The key concept of the CSA_COC is to store the difference between the trip-point voltages of the two inverters, INV₁ and INV₂, in Figure 12a. This difference, ΔV_Tr = V_Tr1 − V_Tr2, is stored across the single capacitor, C₀. Then, the two inverters are biased to compensate for the trip-point difference, effectively correcting for the mismatch. The operation of the CSA_COC unfolds in three phases, as illustrated in Figure 13a–c, with explanations for each provided as follows.

(1): Trip-point storage phase (Φ_Trs = 1, Figure 13a): In this phase, SL_T and SL_C are pre-charged to V_DD, and the input and output of each inverter, INV₁ and INV₂, are shorted. In this manner, the trip points of INV₁ and INV₂, V_Tr1 and V_Tr2, are captured and stored at the input and output nodes of the respective inverters, as shown in Figure 13a. It is accomplished by turning on M_S7, M_S8, M_S9, and M_S10 while turning off T₁, T₂, M_S5, and M_S6. The difference between two inverter trip points, ΔV_Tr, is stored across the capacitor, C₀.
(2): Trip-point bias phase (Φ_Trb = 1, Figure 13b): During this phase, the input and output of INV₁ and INV₂ are disconnected by turning off M_S7 and M_S10. Subsequently, by utilizing the ΔV_Tr stored in C₀ in the previous phase, the input of each inverter is held at its respective trip point, while INV₁ and INV₂ are configured in a cross-coupled connection. For example, the input of INV₁ is kept at V_Tr1 while it is connected to the output of INV₂ (= SO_C), and vice versa. This is achieved by turning on M_S5 and M_S6 while turning off M_S11. Then, the bit-cell creates a voltage difference between SL_T and SL_C, which develops a differential current through M_S3 and M_S4.
(3): Evaluation phase (SAE = 1, Figure 13c): In this phase, the two cross-coupled inverters are disconnected from C₀ by turning off M_S8 and M_S9. Simultaneously, the positive feedback of the cross-coupled inverters is initiated by turning on M_S11, T₁, and T₂. As a result, the full digital voltage level appears at the two differential outputs of the SA, SO_T and SO_C.

The CSA_COC is immune to capacitor mismatch due to the use of a single capacitor, unlike the TMSA and VTS-SA. However, in the previous SAs, the voltage between SL_T and SL_C is transferred to SO_T and SO_C through fully turned-on pFETs during the access phase, whereas in the CSA_COC, the voltage difference between SO_T and SO_C follows that of SL_T and SL_C through partially turned-on pFETs (current-based). This leads to voltage loss, effectively increasing the V_OS. In addition, numerous switches and control signal generation logic are required, which increases the circuit design complexity and incurs power and area overhead.

3.4. Offset-Compensated Pre-Amplifiers

Another approach in offset compensation is the use of pre-amplifiers that amplify the bit-line signal preceding the SA stage, as seen in [41,42,43,44]. Instead of directly modifying the SA structure, these additional offset-compensating pre-amplifiers are employed in front of the SA. This allows for the required offset compensation while maintaining the original SA structure. One such example is the bit-line pre-charge and pre-amplifying switching pFET circuit (BP²SP), with its structure and key operational waveforms depicted in Figure 14a,b.

As shown in Figure 14b, BP²SP is operated in three phases, as explained below.

(1): Pre-charge phase (PCB = 0): In this phase, M_S13 and M_S14 in BP²SP are turned on to pre-charge BL_C and BL_T, respectively. This pre-charges BL_C and BL_T to V_DD − V_th,MS15 and V_DD − V_th,MS16, respectively, through a diode connection. It ensures that M_S15 and M_S16 have V_GS = V_th, allowing them to turn on immediately, regardless of V_th variations, when BL_C or BL_T is discharged in the subsequent phase. This compensates the V_th mismatch between M_S15 and M_S16. In the SA side, SL_T and SL_C are pre-discharged to 0 V through M_S8 and M_S9.
(2): Access phase (PCB = 1, WL = 1): During this phase, the data stored in the selected bit-cell are reflected to BL_T and BL_C. In the example shown in Figure 14b, datum “1” is sensed, so BL_T remains close to its pre-charge level, V_DD − V_th,MS16, while BL_C decreases from V_DD − V_th,MS15. Because BL_C is pre-charged at V_DD − V_th,MS15, M_S15 turns on instantly as soon as BL_C decreases. This causes BLX_T to increase rapidly. Simultaneously, the COLB is lowered to enable the column MUX, resulting in SL_T increasing and SL_C remaining at 0 V. As shown in Figure 14b, this phase effectively pre-amplifies the voltage difference between BL_T and BL_C into the voltage difference between SL_T and SL_C.
(3): Evaluation phase (SAE = 1): In this phase, the SAE is raised, meaning /SAE is lowered. Consequently, the VLSA is enabled to store the final sensing data in the form of a full digital voltage at the SO_T and SO_C nodes. In addition, during this phase, the bit-line equalization circuit—transmission gate T₁—is activated to equalize BL_T and BL_C. This ensures that the subsequent pre-charge operation of BL_T and BL_C can start with both bit-lines having the same low voltage level as the initial condition. This equalization step is important for maintaining consistency in the subsequent memory operation.

The operating principle of BP²SP is to use the same pFETs both to pre-charge the bit-lines and to pre-amplify the bit-line voltages. Specifically, by pre-charging the bit-lines so that they capture the V_th variation of the pre-amplifying pFETs, these pFETs can turn on instantly in response to bit-line pair voltage development. This allows the amplified voltage to be observed at SL_T and SL_C, reducing the required ΔV_BL for stable sensing and leading to improvements in speed and power efficiency. However, to bring the bit-line pair to V_DD − V_th, it is necessary to ensure that the bit-line voltages are sufficiently lower than V_DD − V_th before pre-charge. This requirement increases the circuit complexity, especially when the memory is awakened from power-down mode or standby mode. In addition, after the bit-line pair is pre-charged to V_DD − V_th, the bit-lines become floating, making them susceptible to noise. Moreover, the initial V_GS condition of the pre-amplifier pFETs can vary significantly according to the pre-charge period, which means that the overall speed of the read operation is highly affected by the pre-charge time.
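The threshold-capturing idea can be illustrated with a small numerical sketch. It is an idealized model rather than the circuit of [42]: the pFET is assumed to have its source at V_DD and its gate on the bit-line, the diode-connected pre-charge is assumed to settle fully, and the values `VDD`, `VTH_NOM`, and `SIGMA_VTH` as well as the function names are hypothetical.

```python
import random

VDD = 1.0          # supply voltage in volts (illustrative value)
VTH_NOM = 0.30     # nominal |V_th| of the pre-amplifying pFET (assumed)
SIGMA_VTH = 0.03   # assumed standard deviation of |V_th|


def overdrive_after_drop(bl_precharge, vth_abs, bl_drop):
    """Return V_SG - |V_th| of a pFET whose source is at VDD and whose gate
    follows the bit-line, after the bit-line has fallen by bl_drop."""
    v_sg = VDD - (bl_precharge - bl_drop)
    return v_sg - vth_abs


def compare(bl_drop=0.02, samples=5, seed=0):
    """Compare self-referenced and fixed pre-charge levels over random |V_th|."""
    rng = random.Random(seed)
    rows = []
    for _ in range(samples):
        vth = rng.gauss(VTH_NOM, SIGMA_VTH)
        # Diode-connected pre-charge: the bit-line settles at VDD - |V_th| of
        # this particular device, so the device sits exactly at threshold.
        self_ref = overdrive_after_drop(VDD - vth, vth, bl_drop)
        # Fixed pre-charge level derived from the nominal |V_th| only.
        fixed = overdrive_after_drop(VDD - VTH_NOM, vth, bl_drop)
        rows.append((vth, self_ref, fixed))
    return rows


if __name__ == "__main__":
    for vth, self_ref, fixed in compare():
        print(f"|Vth| = {vth:.3f} V  self-referenced = {self_ref * 1e3:6.1f} mV"
              f"  fixed = {fixed * 1e3:6.1f} mV")
```

In this model the self-referenced overdrive always equals the bit-line drop, whereas the overdrive under a fixed pre-charge level shifts with each device's V_th deviation. The model ignores the finite settling time of the diode-connected pre-charge, which is the dependence on the pre-charge period noted above.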

In [43], another pre-amplifier circuit for SRAM, the cross-coupled nFET pre-amplifier and pre-charge circuit (CCN-PP), is presented. The structure and operational waveforms of the CCN-PP are shown in Figure 15a,b. As depicted in Figure 15b, the CCN-PP operates in four phases.

(1): Pre-charge phase (PBE = 0, PCB = 0): During this phase, the pre-charging boost enable signal (PBE) and PCB are low, so the SA input pre-charge circuit (M_S3–M_S4–M_S5) and M_S6 are turned on. This maintains VDDSA at V_DD, while SLX_T and SLX_C are pre-charged to V_DD. It should be noted that, unlike the conventional pre-charge operation, all the column MUX transistors and bit-line equalization circuits (T₁) are turned on. As a result, SL_T, SL_C, BL_T, and BL_C are pre-charged through the CCN-PP. Because the CCN-PP is composed of nFETs, there is a threshold voltage drop in the pre-charge voltages. That is, BL_T and BL_C are pre-charged to V_DD − min(V_th,MS1, V_th,MS2).
(2): Access phase 1 (PBE = 1): During this phase, the unselected column MUX transistors are turned off and the PBE is raised. As a result, M_S6 is turned off and then the PBEd rises, boosting the VDDSA to V_DD + ΔV_C through C₀ coupling. Thus, the SA inputs, SLX_T and SLX_C, are also pre-charged to V_DD + ΔV_C. Accordingly, BL_T and BL_C can be slightly raised. In this phase, the WL is activated, so BL_T and BL_C start to develop according to the bit-cell data.
(3): Access phase 2 (PBE = 0, PCB = 1): With PCB rising, SLX_T and SLX_C are affected by the change in BL_T and BL_C through the CCN-PP. For example, when accessing the datum “1”, as shown in Figure 15b, BL_C and SL_C decrease, causing M_S2 to turn on while M_S1 is kept turned off. The turned-on M_S2 makes SLX_C fall while SLX_T is kept high, close to V_DD + ΔV_C. Due to the positive feedback nature of cross-coupled nFETs, the voltage difference between SLX_T and SLX_C is larger than that between BL_T and BL_C, meaning that the bit-line voltage is pre-amplified.
(4): Evaluation phase (SAE = 1): A high SAE activates the SA to latch the data at the SA outputs, SO_T and SO_C. In addition, similar to BP²SP, the bit-line equalization circuit is activated to provide proper bit-line initial conditions for the subsequent pre-charge phase.

Unlike BP²SP, the initial V_GS of the pre-amplifier transistors in the CCN-PP is determined by access phase 1. Thus, the performance is less dependent on the pre-charge period, so a stable speed can be provided with the CCN-PP. However, as in BP²SP, the CCN-PP still suffers from floating BL_T and BL_C during the pre-charge phase. In addition, the CCN-PP cannot compensate the mismatch between M_S1 and M_S2, which is a disadvantage compared to BP²SP. Furthermore, utilizing the VDDSA boosting circuit can incur a significant amount of power and area overhead.
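As a rough illustration of the VDDSA boost in access phase 1, the following sketch applies the standard capacitive-divider relation to a floating node. It rests on assumptions not stated in the text: VDDSA is treated as ideally floating once M_S6 is off, all other capacitance on the node is lumped into a hypothetical `c_par`, and the swing of PBEd is taken as `v_swing`. The function name and the numbers are illustrative only.

```python
def boosted_supply(vdd, v_swing, c0, c_par):
    """Return the voltage of a floating node initially at vdd after a step of
    v_swing is coupled onto it through c0, with c_par to ground.

    Charge conservation on the floating node gives
        delta_vc = v_swing * c0 / (c0 + c_par).
    """
    if c0 < 0 or c_par < 0 or (c0 + c_par) == 0:
        raise ValueError("capacitances must be non-negative and not both zero")
    delta_vc = v_swing * c0 / (c0 + c_par)
    return vdd + delta_vc


if __name__ == "__main__":
    # Hypothetical values: 1.0 V supply, full-swing PBEd, 10 fF coupling
    # capacitor, and 30 fF of lumped parasitic capacitance on VDDSA.
    print(boosted_supply(vdd=1.0, v_swing=1.0, c0=10e-15, c_par=30e-15))
```

Under this model, a larger boost requires a larger C₀ relative to the parasitic load, which is one way to see the area overhead mentioned above.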

In [44], the offset-cancelled current SA (OCCSA) is proposed. As shown in Figure 16, the OCCSA uses nFET MUX transistors instead of pFET MUX transistors. Here, the nFET MUX (PSA) operates as a common-gate amplifier, so it effectively pre-amplifies the BL. To bias these PSAs properly with offset-compensating features, BL_T and BL_C should be pre-charged lower than V_DD − V_th,MS1 and V_DD − V_th,MS2, respectively. To realize this, a separate supply voltage, V_prebl, is required. However, the incorporation of this new voltage source is highly costly due to its substantial power and area overheads, making the circuit impractical for actual implementation.

3.5. Other Structures

In [45], an SA with inherent offset cancellation (SAOC) is proposed, with its structure shown in Figure 17a. The SAOC utilizes pFETs (M_S10 and M_S11 in Figure 17a) for input reception, connecting SL_T and SL_C to the gate nodes of these pFETs. Before sensing, by driving SL_T and SL_C low and toggling the PRE from low to high, the |V_thp| of M_S10 and M_S11 is captured at the output nodes of the SA, SO_T and SO_C, respectively. Subsequently, BL_T and BL_C are transferred into SL_T and SL_C by turned-on MUX transistors, while M_S10 and M_S11 are turned on by the low PRE. This results in the charging of SO_T and SO_C by M_S10 and M_S11. In this manner, the SAOC achieves sensing operations, compensating the mismatch between M_S10 and M_S11. However, it should be noted that the mismatch between the nFET MUX pair (M_S6 and M_S7) is not compensated, and pulling up SL_T and SL_C with nFETs based on BL_T and BL_C incurs losses when transferring the BL voltage difference to ΔV_IN,SA.

In [46], the body-biasing technique is used at critical sensing transistors for auto-offset mitigation features. A differential-input body-biased sense amplifier with floating output nodes (DIBBSA-FL) and a differential-input body-biased sense amplifier with pre-discharge output nodes (DIBBSA-PD) are shown in Figure 17b,c, respectively. The difference between the DIBBSA-FL and the DIBBSA-PD is that the DIBBSA-PD has additional transistors, M_S8 and M_S9, to pre-discharge SO_T and SO_C, while the DIBBSA-FL only equalizes SO_T and SO_C. The operations of the DIBBSA-FL and DIBBSA-PD are as follows. During the sensing operation, the SAEB decreases and M_S3 and M_S4 turn on. Simultaneously, when BL_T is higher than BL_C, through the body-bias effect on M_S1, M_S2, M_S3, and M_S4, M_S1 and M_S3 become forward body-biased and M_S2 and M_S4 become reverse body-biased. Therefore, SO_T is pulled up much faster than SO_C. However, 3D FETs such as the FinFET and GAA FET have recently become commonly used, and in these technologies the body effect is nearly negligible. Therefore, the body-bias technique is not suitable for recent technologies.

Figure 17d shows the cancellation based on delay and offset relation (CDOR) structure [47]. Before the sensing operation, the mismatch in the SA is captured by the sensing operation, with SL_T and SL_C equally set to V_DD. Because of the mismatch in the SA, SO_T and SO_C become (1, 0) or (0, 1), connected to the gate of M_S15 and M_S14, respectively. When SO_T and SO_C are (1, 0), this means that the pull-up strength on the SO_T side is higher than that on the SO_C side. Simultaneously, Q and QB become V_DD and V_SS, turning off M_S6 and M_S7. In the case of (SO_T, SO_C) = (1, 0), M_S14 turns on and M_S15 turns off, lowering the SL_T. Due to the decreased SL_T, the pull-up strength of the SO_C side becomes stronger, which operates as offset mitigation. However, the process of adjusting the voltage is highly challenging. This is because the voltage variance is highly dependent on the offset mitigation activation time and the sizes of the M_S6 and M_S7 transistors.

4. Comparison

Table 1 summarizes the comparison among the SRAM sensing circuit designs covered in Section 3.

Unlike the conventional SAs (VLSA and CLSA), the STSA, VTSA, and HYSA-QZ drive or pre-charge the internal nodes of the SA in favor of accurate sensing. In this manner, without using additional control signals or employing additional operation phases, the offset voltage can be efficiently reduced. In terms of reducing the V_OS, the VTSA and HYSA-QZ, which directly pre-charge the internal nodes using pass gates connected to SL_T and SL_C, outperform the STSA. This is because the mismatch effects in the gated FETs controlling the SL_T and SL_C in the STSA are larger than the mismatch effects in the transmission gates used by the VTSA or HYSA-QZ to transfer SL_T and SL_C. Compared with the VTSA, the HYSA-QZ can achieve a smaller V_OS because more internal nodes are pre-charged than in the VTSA. However, the SA delay is increased in the STSA, VTSA, and HYSA-QZ compared to the VLSA because of the increased number of stacked transistors.

The TMSA, VTS-SA, and CSA_COC directly capture mismatches in SAs, utilizing a capacitor(s). In this manner, the V_OS can be further reduced compared to the STSA, VTSA, and HYSA-QZ. However, this improvement comes at a cost: introducing additional phases or control signals, biasing through short circuit currents, and using large capacitors increase the SA delay and energy consumption significantly. The trade-off between BL delay/energy and SA delay/energy becomes evident in this context. More precise compensation of SA mismatches can result in a smaller V_OS and reduced BL delay and energy. However, achieving this delicacy requires additional circuit components, which can lead to increased SA delay and energy consumption.
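The link between V_OS and BL delay/energy can be made concrete with a first-order sketch. It uses the textbook approximations t_BL ≈ C_BL·ΔV_BL/I_cell and E_BL ≈ C_BL·V_DD·ΔV_BL, which are assumptions for illustration and not the simulation method behind Table 1 or Table 2. All numerical values and function names are hypothetical.

```python
def bitline_delay(c_bl, dv_bl, i_cell):
    """Time for a constant cell current i_cell to discharge c_bl by dv_bl."""
    return c_bl * dv_bl / i_cell


def bitline_energy(c_bl, vdd, dv_bl):
    """Energy drawn from the supply to restore a swing of dv_bl on c_bl."""
    return c_bl * vdd * dv_bl


if __name__ == "__main__":
    # Hypothetical numbers chosen only for illustration.
    C_BL = 20e-15    # bit-line capacitance in farads
    I_CELL = 10e-6   # bit-cell read current in amperes
    VDD = 1.0        # supply voltage in volts

    # Required bit-line swing before and after offset compensation (assumed).
    for dv_bl in (0.10, 0.05):
        t_bl = bitline_delay(C_BL, dv_bl, I_CELL)
        e_bl = bitline_energy(C_BL, VDD, dv_bl)
        print(f"dV_BL = {dv_bl:.2f} V: t_BL = {t_bl:.3e} s, E_BL = {e_bl:.3e} J")
```

In this model, both the BL delay and the BL energy scale linearly with the required ΔV_BL, so halving the swing that the SA needs halves both. The SA-side delay and energy added by the compensation circuitry are not modeled here and must be weighed against this gain.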

The BL pre-charging circuits, BP²SP and CCN-PP, offer an alternative approach that captures transistor V_th values and reduces the required BL voltage development. They can be implemented more simply than SA mismatch compensation structures because pre-amplifiers have a simpler structure than SAs. However, controlling BL pre-charge levels can be challenging in practice, especially since the bit-lines are left floating when diode-connected transistors are used for pre-charging.

In addition to the sensing circuits covered in Section 3, there are several other approaches for reducing V_BL requirements [44,45,46,47], as shown in the last four rows in Table 1. However, it is worth noting that these methods have specific characteristics that may affect their applicability. In one of these structures, the SAOC, the mismatch between the two input pFETs is addressed at the beginning of the read access to reduce the V_OS. However, the mismatches of other transistor pairs, which are also critical for V_OS, cannot be compensated. Thus, the SAOC may have an increased V_OS even compared to the conventional SAs. In addition, short-circuit current paths are inevitably formed, which limits its practical applicability.

The OCCSA utilizes the MUX transistors as a common-gate amplifier to pre-amplify the V_BL. Although this is effective, operating the MUX as an amplifier requires an additional high-voltage source for bit-line pre-charge (V_prebl), which incurs significant power and area overheads. In addition, to compensate the mismatch between the MUX transistor pair, a significant amount of time is required for the separate bit-line pre-charge phase before the access phase, which substantially degrades the cycle time.

In the DIBBSA-FL and DIBBSA-PD, differential bit-line inputs are transferred to differential output nodes through pull-up pFETs, while the bodies of the output pull-up pFETs are biased with the bit-lines to enhance sensing accuracy. However, a critical limitation of these approaches arises from the fact that most recent SRAMs utilize multiple-gate FETs, such as FinFETs and gate-all-around FETs, which exhibit minimal body effects. Consequently, the current or threshold voltage remains nearly independent of body voltage changes, rendering these structures inapplicable.

The CDOR-based offset-compensating sensing circuit captures the mismatch in the SA during the pre-charge phase of the SRAM. This is achieved by enabling the SA (SAE = 1) under the condition of SL_T = SL_C = V_DD. In this manner, the mismatch information is stored at the differential output nodes, SO_T and SO_C. For example, if the mismatch biases the SA toward pulling SO_T low, this mismatch-capturing process makes SO_T become 0 while SO_C becomes high during the pre-charge phase. Then, utilizing this stored information, SL_T and SL_C are calibrated to compensate the mismatch when the sensing phase starts. Although the compensation technique is innovative, its accuracy is highly dependent on factors such as the width of the calibration timing and the sizing of the calibration transistors. This dependency can potentially result in an increase in the effective V_OS of the SA, which may render the structure less practical.

Figure 18 shows the minimum operating voltage of SAs according to technology scalability. The minimum operating voltage represents the minimum voltage that satisfies the 6σ sensing yield at the operating frequency of 1 GHz in the 7 nm, 14 nm, and 28 nm processes.

A quantitative comparison among the different SAs covered in Section 3 is shown in Table 2. The SAs are simulated in TSMC 28 nm technology with a four-to-one MUX, V_DD = 1.0 V, and 256 bit-cells per column. The distribution of V_OS in the SAs is estimated as follows [63]: First, we assume that V_OS follows a Gaussian distribution. Thus, P_FailSA, the probability of sensing failure, can be expressed as follows:

P_{FailSA} = P\left(V_{OS} > ΔV_{IN,SA}\right) = P\left(\frac{V_{OS} - μ_{OS}}{σ_{OS}} > \frac{ΔV_{IN,SA} - μ_{OS}}{σ_{OS}}\right) = P\left(Z > \frac{ΔV_{IN,SA} - μ_{OS}}{σ_{OS}}\right)

(1)

In (1), ΔV_IN,SA is the SA input voltage difference, µ_OS is the mean V_OS, σ_OS is the standard deviation of the V_OS, and Z is the standard Gaussian random variable. Second, representing the standard Gaussian cumulative distribution function (CDF) as Φ(z), Equation (1) can be shown as follows:

P_{FailSA} = 1 - Φ\left(\frac{ΔV_{IN,SA} - μ_{OS}}{σ_{OS}}\right)

(2)

Third, through the inverse function, (2) can be expressed as follows:

μ_{OS} + σ_{OS} Φ^{- 1} (1 - P_{FailSA}) = Δ V_{IN, SA}

(3)

In (3), both P_FailSA and ΔV_IN,SA are values obtainable through simulation. With P_FailSA and ΔV_IN,SA specified, only µ_OS and σ_OS remain as unknowns in (3). Thus, with two instances of (3), the two unknowns, µ_OS and σ_OS, can be derived. Therefore, from 1000-sample Monte Carlo simulations at V_INtest1 (ΔV_IN,SA = 10 mV) and V_INtest2 (ΔV_IN,SA = −10 mV), P_FailSA1 and P_FailSA2 can be determined, which gives the following two equations using (3):

μ_{OS} + σ_{OS} Φ^{-1}\left(1 - P_{FailSA1}\right) = V_{INtest1}

(4)

μ_{OS} + σ_{OS} Φ^{-1}\left(1 - P_{FailSA2}\right) = V_{INtest2}

(5)

Finally, by solving (4) and (5), μ_OS and σ_OS can be expressed as follows:

μ_{OS} = \frac{Φ^{-1}\left(1 - P_{FailSA2}\right) V_{INtest1} - Φ^{-1}\left(1 - P_{FailSA1}\right) V_{INtest2}}{Φ^{-1}\left(1 - P_{FailSA2}\right) - Φ^{-1}\left(1 - P_{FailSA1}\right)}

(6)

σ_{OS} = \frac{V_{INtest1} - V_{INtest2}}{Φ^{-1}\left(1 - P_{FailSA1}\right) - Φ^{-1}\left(1 - P_{FailSA2}\right)}

(7)

In (6) and (7), because V_INtest1, V_INtest2, P_FailSA1, and P_FailSA2 are determined through simulation, μ_OS and σ_OS can be estimated. Additionally, the energy consumption is measured by integrating the sum of all currents flowing during one cycle. The energy consumptions shown correspond to those consumed at the four columns of the BL. As mentioned earlier, the reduction in V_OS can be observed to enhance the performance of BL delay/energy and SA delay/energy.
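The estimation procedure in Equations (2), (3), (6), and (7) can be sketched in a few lines of Python using the standard library's `statistics.NormalDist`. This is a minimal sketch under the same Gaussian assumption as the derivation. The function names and the example numbers are hypothetical, and the failure probabilities would in practice come from the Monte Carlo simulations described above.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard Gaussian: mean 0, standard deviation 1


def fail_probability(mu_os, sigma_os, dv_in):
    """Equation (2): probability that V_OS exceeds the SA input difference."""
    return 1.0 - _STD.cdf((dv_in - mu_os) / sigma_os)


def estimate_offset(p_fail1, v_test1, p_fail2, v_test2):
    """Equations (6) and (7): solve the two instances of (3) for mu and sigma.

    Both failure probabilities must lie strictly between 0 and 1, otherwise
    the inverse CDF is undefined and inv_cdf raises StatisticsError.
    """
    a1 = _STD.inv_cdf(1.0 - p_fail1)
    a2 = _STD.inv_cdf(1.0 - p_fail2)
    if a1 == a2:
        raise ValueError("test points must yield different failure rates")
    mu_os = (a2 * v_test1 - a1 * v_test2) / (a2 - a1)
    sigma_os = (v_test1 - v_test2) / (a1 - a2)
    return mu_os, sigma_os


def required_input(mu_os, sigma_os, n_sigma=6.0):
    """Equation (3) with the inverse-CDF term set to n_sigma (e.g. 6 sigma)."""
    return mu_os + n_sigma * sigma_os


if __name__ == "__main__":
    # Hypothetical failure rates observed at +10 mV and -10 mV test inputs.
    mu, sigma = estimate_offset(0.30, 0.010, 0.79, -0.010)
    print(f"mu_OS = {mu * 1e3:.2f} mV, sigma_OS = {sigma * 1e3:.2f} mV")
    print(f"6-sigma input requirement = {required_input(mu, sigma) * 1e3:.1f} mV")
```

With a finite sample count, a test point whose failure rate comes out as exactly 0 or 1 cannot be used, so the two test voltages should be chosen so that both observed failure rates fall strictly between these extremes.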

Funding

This research was supported by the Core Research Institute Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (No. 2018R1A6A1A03025242), MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2023-RS-2022-00156225), supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation), and the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT). (No. 2023-11-0830, Development of memory module and memory compiler for non-volatile PIM optimized for data characteristics and data access characteristics of AI processor).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

Author Hanwool Jeong is a founder of the company Articron Inc. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Zhu, H.; Kursun, V. A comprehensive comparison of data stability enhancement techniques with novel nanoscale SRAM cells under parameter fluctuations. IEEE Trans. Circuits Syst. I Regul. Pap. 2011, 61, 1062–1070.
Indumathi, G.; Aarthi alias Ananthakirupa, V.P.M.B. Energy optimization techniques on SRAM: A survey. In Proceedings of the 2014 International Conference on Communication and Network Technologies, Sivakasi, India, 18–19 December 2014; pp. 216–221.
Saleh, R.; Lim, G.; Kadowaki, T.; Uchiyama, K. Trends in low power digital system-on-chip designs. In Proceedings of the International Symposium on Quality Electronic Design, San Jose, CA, USA, 18–21 March 2002; pp. 373–378.
Lin, S.; Kim, Y.-B.; Lombardi, F. A 32nm SRAM design for low power and high stability. In Proceedings of the 2008 51st Midwest Symposium on Circuits and Systems, Knoxville, TN, USA, 10–13 August 2008; pp. 422–425.
Pelgrom, M.; Duinmaijer, A.C.J.; Welbers, A.P.G. Matching properties of MOS transistors. IEEE J. Solid-State Circuits 1989, 24, 1433–1439.
Cho, K.; Park, J.; Oh, T.W.; Jung, S.-O. One-Sided Schmitt-Trigger-Based 9T SRAM Cell for Near-Threshold Operation. IEEE Trans. Circuits Syst. I Regul. Pap. 2020, 67, 1551–1561.
Nabavi, M.; Sachdev, M. A 290-mV 3.34-MHz 6T SRAM with pMOS access transistors and boosted wordline in 65-nm CMOS technology. IEEE J. Solid-State Circuits 2018, 53, 656–667.
Abbasian, E. A Highly Stable Low-Energy 10T SRAM for Near-Threshold Operation. IEEE Trans. Circuits Syst. I Regul. Pap. 2022, 69, 5195–5205.
Khayatzadeh, M.; Lian, Y. Average-8 T differential-sensing Subthreshold SRAM with bit Interleaving and 1kbits per Bitline. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2014, 22, 971–982.
Teman, A.; Pergament, L.; Cohen, O.; Fish, A. A 250 mV 8 kb 40 nm ultra-low power 9 T supply feedback SRAM (SF-SRAM). IEEE J. Solid-State Circuits 2011, 46, 2713–2726.
Weste, N.; Harris, D. CMOS VLSI Design: A Circuits and Systems Perspective; Addison-Wesley: Boston, MA, USA, 2005.
Mu, J.; Kim, H.; Kim, B. SRAM-Based In-Memory Computing Macro Featuring Voltage-Mode Accumulator and Row-by-Row ADC for Processing Neural Networks. IEEE Trans. Circuits Syst. I Regul. Pap. 2022, 69, 2412–2422.
Yu, C.; Yoo, T.; Kim, H.; Kim, T.T.-H.; Chuan, K.C.T.; Kim, B. A Logic-Compatible eDRAM Compute-In-Memory With Embedded ADCs for Processing Neural Networks. IEEE Trans. Circuits Syst. I Regul. Pap. 2021, 68, 667–679.
Yu, C.; Yoo, T.; Kim, T.T.-H.; Chuan, K.C.T.; Kim, B. A 16K Current-Based 8T SRAM Compute-In-Memory Macro with Decoupled Read/Write and 1-5bit Column ADC. In Proceedings of the 2020 IEEE Custom Integrated Circuits Conference (CICC), Boston, MA, USA, 22–25 March 2020; pp. 1–4.
Kim, H.; Kim, Y.; Ryu, S.; Kim, J.-J. Algorithm/Hardware Co-Design for In-Memory Neural Network Computing with Minimal Peripheral Circuit Overhead. In Proceedings of the 2020 57th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 20–24 July 2020; pp. 1–6.
Sun, X.; Peng, X.; Chen, P.-Y.; Liu, R.; Seo, J.-S.; Yu, S. Fully parallel RRAM synaptic array for implementing binary neural network with (+1, −1) weights and (+1, 0) neurons. In Proceedings of the 2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC), Jeju, Republic of Korea, 22–25 January 2018; pp. 574–579.
Lee, S.-T.; Woo, S.Y.; Lee, J.-H. Low-Power Binary Neuron Circuit with Adjustable Threshold for Binary Neural Networks Using NAND Flash Memory. IEEE Access 2020, 8, 153334–153340.
Liu, R.; Peng, X.; Sun, X.; Khwa, W.S.; Si, X.; Chen, J.J.; Li, J.F.; Chang, M.F.; Yu, S. Parallelizing SRAM Arrays with Customized Bit-Cell for Binary Neural Networks. In Proceedings of the 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 24–28 June 2018; pp. 1–6.
Dong, Q.; Sinangil, M.E.; Erbagci, B.; Sun, D.; Khwa, W.-S.; Liao, H.-J.; Wang, Y.; Chang, J. 15.3 A 351TOPS/W and 372.4GOPS Compute-in-Memory SRAM Macro in 7nm FinFET CMOS for Machine-Learning Applications. In Proceedings of the 2020 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 16–20 February 2020; pp. 242–244.
Sun, J.; Wang, Y.; Liu, P.; Wen, S.; Wang, Y. Memristor-Based Neural Network Circuit With Multimode Generalization and Differentiation on Pavlov Associative Memory. IEEE Trans. Cybern. 2023, 53, 3351–3362.
Lai, Q.; Wan, Z.; Kuate, P.D.K. Generating Grid Multi-Scroll Attractors in Memristive Neural Networks. IEEE Trans. Circuits Syst. I Regul. Pap. 2023, 70, 1324–1336.
Lovett, S.J.; Gibbs, G.A.; Pancholy, A. Yield and matching implications for static RAM memory array sense amplifier design. IEEE J. Solid-State Circuits 2000, 35, 1200–1204.
Zhang, K.; Hose, K.; De, V.; Senyk, B. The scaling of data sensing schemes for high speed cache design in sub-0.18 μm technologies. In Proceedings of the 2000 Symposium on VLSI Circuits, Honolulu, HI, USA, 15–17 June 2000; pp. 226–227.
Boley, J.; Calhoun, B. Stack based sense amplifier designs for reducing input-referred offset. In Proceedings of the Sixteenth International Symposium on Quality Electronic Design, Santa Clara, CA, USA, 2–4 March 2015; pp. 1–4.
Reniwal, B.S.; Singh, P.; Vijayvargiya, V.; Vishvakarma, S.K. A new sense amplifier design with improved input referred offset characteristics for energy-efficient sram. In Proceedings of the 2017 30th International Conference on VLSI Design and 2017 16th International Conference on Embedded Systems (VLSID), Hyderabad, India, 7–11 January 2017; pp. 335–340.
Patil, V.; Grover, A.; Parashar, A. Design of sense amplifier for wide voltage range operation of split supply memories in 22nm HKMG CMOS technology. In Proceedings of the 2020 33rd International Conference on VLSI Design and 2020 19th International Conference on Embedded Systems (VLSID), Bangalore, India, 4–8 January 2020; pp. 37–42.
Saraswat, G.; Parashar, A. Voltage Boosted Schmitt Trigger Sense Amplifier (VBSTSA) with Improved Offset and Reaction Time For High Speed SRAMs. In Proceedings of the 2023 36th International Conference on VLSI Design and 2023 22nd International Conference on Embedded Systems (VLSID), Hyderabad, India, 8–12 January 2023.
Liu, B.; Cai, J.; Yuan, J.; Hei, Y. A low-voltage SRAM sense amplifier with offset cancelling using digitized multiple body biasing. IEEE Trans. Circuits Syst. II Express Briefs 2017, 64, 442–446.
Dhong, S.; Takahashi, O.; White, M.; Asano, T.; Nakazato, T.; Silberman, J.; Kawasumi, A.; Yoshihara, H. A 4.8GHz fully pipelined embedded SRAM in the streaming processor of a CELL processor. In Proceedings of the ISSCC. 2005 IEEE International Digest of Technical Papers. Solid-State Circuits Conference, 2005, San Francisco, CA, USA, 10 February 2005; Volume 1, pp. 486–612.
Chen, N.; Chaba, R. Dual Sensing Current Latched Sense Amplifier. U.S. Patent 12 731 623, 3 March 2010.
Tsai, M.-F.; Tsai, J.-H.; Fan, M.-L.; Su, P.; Chuang, C.-T. Variation tolerant CLSAs for nanoscale bulk-CMOS and FinFET SRAM. In Proceedings of the 2012 IEEE Asia Pacific Conference on Circuits and Systems, Kaohsiung, Taiwan, 2–5 December 2012; pp. 471–474.
Sarfraz, K.; He, J.; Chan, M. A 140-mV variation-tolerant deep sub-threshold SRAM in 65-nm CMOS. IEEE J. Solid-State Circuits 2017, 52, 2215–2220.
Patel, D.; Neale, A.; Wright, D.; Sachdev, M. Hybrid latch-type offset tolerant sense amplifier for low-voltage SRAMs. IEEE Trans. Circuits Syst. I Regul. Pap. 2019, 66, 2519–2532.
Kawasumi, A.; Takeyama, Y.; Hirabayashi, O.; Kushida, K.; Tachibana, F.; Niki, Y.; Sasaki, S.; Yabe, T. Energy efficiency deterioration by variability in SRAM and circuit techniques for energy saving without voltage reduction. In Proceedings of the 2012 IEEE International Conference on IC Design & Technology (ICICDT), Austin, TX, USA, 30 May–1 June 2012; pp. 1–4.
Verma, N.; Chandrakasan, A.P. A High-Density 45 nm SRAM Using Small-Signal Non-Strobed Regenerative Sensing. IEEE J. Solid-State Circuits 2008, 44, 163–173.
Qazi, M.; Stawiasz, K.; Chang, L.; Chandrakasan, A. A 512kb 8T SRAM Macro Operating Down to 0.57 V with an AC-Coupled Sense Amplifier and Embedded Data-Retention-Voltage Sensor in 45nm SOI CMOS. IEEE J. Solid-State Circuits 2010, 46, 85–96.
Fragasse, R.; Dupaix, B.; Tantawy, R.; James, T.; Khalil, W. Sense amplifier offset cancellation and replica timing calibration for high-speed SRAMs. In Proceedings of the 2018 IEEE 9th Latin American Symposium on Circuits & Systems (LASCAS), Puerto Vallarta, Mexico, 25–28 February 2018; pp. 1–5.
Sinangil, M.E.; Poulton, J.W.; Fojtik, M.R.; Greer, T.H.; Tell, S.G.; Gotterba, A.J.; Wang, J.; Golbus, J.; Zimmer, B.; Dally, W.J.; et al. A 28 nm 2 Mbit 6 T SRAM with highly configurable low-voltage write-ability assist implementation and capacitor-based sense-amplifier input offset compensation. IEEE J. Solid-State Circuits 2015, 51, 557–567.
Giridhar, B.; Pinckney, N.; Sylvester, D.; Blaauw, D. 13.7 A reconfigurable sense amplifier with auto-zero calibration and pre-amplification in 28nm CMOS. In Proceedings of the 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), San Francisco, CA, USA, 9–13 February 2014.
Fragasse, R.; Tantawy, R.; Dupaix, B.; Dean, T.; Disabato, D.; Belz, M.R.; Smith, D.; Mccue, J.; Khalil, W. Analysis of SRAM enhancements through sense amplifier capacitive offset correction and replica self-timing. IEEE Trans. Circuits Syst. I Regul. Pap. 2019, 66, 2037–2050. [Google Scholar] [CrossRef]
Schinkel, D.; Mensink, E.; Klumperink, E.; van Tuijl, E.; Nauta, B. A Double-Tail Latch-Type Voltage Sense Amplifier with 18ps Setup+Hold Time. In Proceedings of the 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, San Francisco, CA, USA, 11–15 February 2007; pp. 314–315. [Google Scholar]
Jeong, H.; Park, J.; Oh, T.W.; Rim, W.; Song, T.; Kim, G.; Won, H.-S.; Jung, S.-O. Bitline precharging and preamplifying switching pMOS for high-speed low-power SRAM. IEEE Trans. Circuits Syst. II Express Briefs 2016, 63, 1059–1063. [Google Scholar] [CrossRef]
Lee, S.; Park, J.; Jeong, H. Cross-Coupled nFET Preamplifier for Low Voltage SRAM. IEEE Trans. Circuits Syst. II Express Briefs 2023, 70, 3604–3608. [Google Scholar] [CrossRef]
Sharifkhani, M.; Rahiminejad, E.; Jahinuzzaman, S.M.; Sachdev, M. A compact hybrid current/voltage sense amplifier with offset cancellation for high-speed SRAMs. IEEE Trans. Very Large Scale Integr. VLSI Syst. 2011, 19, 883–894. [Google Scholar] [CrossRef]
Shah, J.S.; Nairn, D.; Sachdev, M. An energy-efficient offset cancelling sense amplifier. IEEE Trans. Circuits Syst. II Express Briefs 2013, 60, 477–481. [Google Scholar] [CrossRef]
Patel, D.; Neale, A.; Wright, D.; Sachdev, M. Body Biased Sense Amplifier With Auto-Offset Mitigation for Low-Voltage SRAMs. IEEE Trans. Circuits Syst. I Regul. Pap. 2021, 68, 3265–3278. [Google Scholar] [CrossRef]
Zhao, Y.; Wang, J.; Tong, Z.; Wu, X.; Peng, C.; Lu, W.; Zhao, Q.; Lin, Z. An offset cancellation technique for SRAM sense amplifier based on relation of the delay and offset. Microelectron. J. 2022, 128, 105578. [Google Scholar] [CrossRef]
Mohammad, B.; Dadabhoy, P.; Lin, K.; Bassett, P. Comparative study of current mode and voltage mode sense amplifier used for 28nm SRAM. In Proceedings of the 2012 24th International Conference on Microelectronics (ICM), Algiers, Algeria, 16–20 December 2012; pp. 1–6. [Google Scholar] [CrossRef]
Pu, Y.; Zhang, X.; Huang, J.; Muramatsu, A.; Nomura, M.; Hirairi, K.; Takata, H.; Sakurabayashi, T.; Miyano, S.; Takamiya, M.; et al. Misleading energy and performance claims in sub/near threshold digital systems. In Proceedings of the 2010 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Jose, CA, USA, 7–11 November 2010; pp. 625–631. [Google Scholar]
Moritz, G.; Giraud, B.; Noel, J.; Turgis, D.; Grover, A. Optimization of a voltage sense amplifier operating in ultra wide voltage range with back bias design techniques in 28nm UTBB FD-SOI technology. In Proceedings of the 2013 International Conference on IC Design Technology (ICICDT), Pavia, Italy, 29–31 May 2013; pp. 53–56. [Google Scholar]
Niki, Y.; Kawasumi, A.; Suzuki, A.; Takeyama, Y.; Hirabayashi, O.; Kushida, K.; Tachibana, F.; Fujimura, Y.; Yabe, T. A digitized replica bitline delay technique for random-variation-tolerant timing generation of SRAM sense amplifiers. IEEE J. Solid-State Circuits 2011, 46, 2545–2551. [Google Scholar] [CrossRef]
Arslan, U.; McCartney, M.P.; Bhargava, M.; Li, X.; Mai, K.; Pileggi, L.T. Variation-tolerant SRAM sense-amplifier timing using configurable replica bitlines. In Proceedings of the IEEE Custom Integrated Circuits Conference (CICC), San Jose, CA, USA, 21–24 September 2008; pp. 415–418. [Google Scholar]
Wang, P.; Zhou, K.; Zhang, H.; Gong, D. Design of replica bit line control circuit to optimize power for SRAM. J. Semicond. 2016, 37, 125002. [Google Scholar] [CrossRef]
Lin, Z.; Wu, X.; Li, Z.; Guan, L.; Peng, C.; Liu, C.; Chen, J. A pipeline replica bitline technique for suppressing timing variation of SRAM sense amplifiers in a 28-nm CMOS process. IEEE J. Solid-State Circuits 2016, 52, 669–677. [Google Scholar] [CrossRef]
Komatsu, S.; Yamaoka, M.; Morimoto, M.; Maeda, N.; Shimazaki, Y.; Osada, K. A 40-nm low-power SRAM with multi-stage replica-bitline technique for reducing timing variation. In Proceedings of the IEEE Custom Integrated Circuits Conference, San Jose, CA, USA, 13–16 September 2009; pp. 701–704. [Google Scholar]
Amrutur, B.S.; Horowitz, M.A. Fast low-power decoders for RAMs. IEEE J. Solid-State Circuits 2001, 36, 1506–1515. [Google Scholar] [CrossRef]
Chang, M.-F.; Yang, S.-M.; Chen, K.-T.; Liao, H.-J.; Lee, R. Improving the speed and power of compilable SRAM using dual-mode self-timed technique. In Proceedings of the 2007 IEEE International Workshop on Memory Technology, Design and Testing, Taipei, Taiwan, 3–5 December 2007; pp. 57–60. [Google Scholar]
Kim, T.-H.; Liu, J.; Keane, J.; Kim, C.H. A 0.2 V, 480 kb subthreshold SRAM with 1 k cells per bitline for ultra-low-voltage computing. IEEE J. Solid-State Circuits 2008, 43, 518–529. [Google Scholar] [CrossRef]
Wang, D.; Liao, H.; Yamauchi, H.; Chen, Y.; Lin, Y.; Lin, S.; Liu, D.C.; Chang, H.; Hwang, W. A 45nm dual-port SRAM with write and read capability enhancement at low voltage. In Proceedings of the 2007 IEEE International SOC Conference, Hsinchu, Taiwan, 26–29 September 2007; pp. 211–214. [Google Scholar]
Karl, E.; Wang, Y.; Ng, Y.-G.; Guo, Z.; Hamzaoglu, F.; Bhattacharya, U.; Zhang, K.; Mistry, K.; Bohr, M. A 4.6 GHz 162 Mb SRAM design in 22 nm trigate CMOS technology with integrated active VMIN-enhancing assist circuitry. In Proceedings of the 2012 IEEE International Solid-State Circuits Conference, San Francisco, CA, USA, 19–23 February 2012. [Google Scholar]
Chang, J.; Chen, Y.H.; Cheng, H.; Chan, W.M.; Liao, H.J.; Li, Q.; Chang, S.; Natarajan, S.; Lee, R.; Wang, P.W.; et al. A 20 nm 112Mb SRAM in high-κ metal-gate with assist circuitry for low-leakage and low-VMIN applications. In Proceedings of the 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers, San Francisco, CA, USA, 17–21 February 2013; pp. 316–333. [Google Scholar]
Song, T.; Rim, W.; Park, S.; Kim, Y.; Yang, G.; Kim, H.; Baek, S.; Jung, J.; Kwon, B.; Cho, S.; et al. A 10 nm FinFET 128 Mb SRAM with assist adjustment system for power, performance, and area optimization. IEEE J. Solid-State Circuits 2017, 52, 240–249. [Google Scholar] [CrossRef]
Baek, G.; Jeong, H. High-Density SRAM Read Access Yield Estimation Methodology. IEEE Access 2021, 9, 128288–128301. [Google Scholar] [CrossRef]

Figure 1. Simplified schematic of the conventional SRAM for the read operation.

Figure 2. Schematic of two commonly used SAs in SRAM: (a) voltage-type latch SA (VLSA) and (b) current-type latch SA.

Figure 3. Operational waveforms of the signals relevant to the read operation in the conventional SRAM.

Figure 4. Description of VLSA operation for sensing datum “1”.

Figure 5. Description of sensing failure in VLSA for sensing datum “1”.

Figure 6. Schematics of (a) Schmitt trigger-based SA (STSA) and (b) the voltage-boosted STSA (VBSTSA).

Figure 7. Schematics of two representative hybrid latch-type SAs: (a) variation-tolerant SA (VTSA) in [32] and (b) hybrid latch-type SA-QZ (HYSA-QZ) in [33].

Figure 8. (a) Schematic of capacitor-based threshold-matching SA (TMSA), (b) VLSA part in TMSA, and (c) capacitor-based threshold-matching circuit part.

Figure 9. Four-step operation of TMSA: (a) pre-charge phase, (b) access phase, (c) evaluation phase, and (d) latching phase.

Figure 10. Structure of VTS-SA.

Figure 11. Three operation phases of VTS-SA: (a) trip-point bias, (b) access phase, and (c) evaluation phase.

Figure 12. (a) Schematic of CSA_COC and (b) operation waveforms of three control clock signals.

Figure 13. Three operation phases of CSA_COC: (a) trip-point storage phase (Φ_Trs = 1), (b) trip-point bias phase (Φ_Trb = 1), and (c) evaluation phase (SAE = 1).

Figure 14. (a) Schematic of BP²SP and (b) its operational waveforms.

Figure 15. (a) Schematic of CCN-PP and (b) its operational waveforms.

Figure 16. Structure of OCCSA.

Figure 17. Structure of (a) SAOC, (b) DIBBSA-FL, (c) DIBBSA-PD, and (d) CDOR.

Figure 18. Minimum operating voltage of SAs with technology scaling.

Table 1. Comparison of SRAM sensing circuit designs.

SA	Structure	Offset Reduction Technique	Components	Control Signals	Limitations
VLSA	Figure 2a	-	7 TR	PCB, SAE	Large V_OS
CLSA	Figure 2b	-	9 TR	PCB, SAE	Increased V_OS due to additional TR pair
STSA	Figure 6a	Driving internal nodes of VLSA (Z_T and Z_C)	11 TR	PCB, SAE	Speed degradation due to stack
VBSTSA	Figure 6b	STSA + negative boosting of V_SS	14 TR + NVG (shared)	PCB, SAE, BSTEN	Requires NVG (power/area cost)
VTSA	Figure 7a	Pre-charging SO_T and SO_C to SL_T and SL_C in CLSA	9 TR	PCB, SAE	Speed degradation due to stack
HYSA-QZ	Figure 7b	Pre-charging output nodes and internal nodes of CLSA	11 TR	PCB, SAE	Speed degradation due to stack
TMSA	Figure 8a	Capturing V_th of pull-down nFETs through a capacitor pair	11 TR + INV + Buffer + 2 C	PCB, SAE	Capacitor mismatch, capacitor power/area overhead
VTS-SA	Figure 10	Capturing trip points of cross-coupled INVs, with inputs applied via a coupling capacitor pair	12 TR + 2 C	EN, PCB, PRE, SAE	Capacitor mismatch, power/area overhead
CSA_COC	Figure 12a	Capturing trip points of cross-coupled inverters via a single capacitor	16 TR + 1 C + 2 OR (shared)	PCB, SAE, Φ_Trs, Φ_Trb	Many switches, control signal circuit
BP²SP	Figure 14a	Capturing V_th of pre-amplifying pFET pair at BL pre-charge	6 TR + SA	PCB, SAE	Bit-line floating, unstable pre-charge level, power/area overhead
CCN-PP	Figure 15a	Pre-amplifying BL via cross-coupled nFET pair, while capturing V_th with boosted V_DD	4 TR + 2 C + Buffer + 1 TR + SA	PCB, SAE, PBE	Bit-line floating, power/area overhead
OCCSA	[44]	Capturing V_th of MUX nFETs at BL pre-charge	7 TR	PCB, SAE	Additional Vprebl voltage generator, different MUX signal
SAOC	[45]	Capturing V_th of input pFETs at SA pre-charge	11 TR	PCB, SAE, OCEN	N1, N2 mismatch, control signal circuit
DIBBSA-FL, DIBBSA-PD	[46]	Body biasing	7 TR, 9 TR + body contact	PCB, SAE	Inapplicable to recent technologies with minimal body effect
CDOR	[47]	Lowering input voltage according to SA mismatch	15 TR	PCB, SAE, Q	Control signal circuit for added Q and different PCB, SAE operation

Table 2. Quantitative comparison of SRAM SAs at V_DD = 1.0 V in 28 nm technology.

SA	Standard Dev. of V_OS (mV)	BL Delay (ps)	SA Delay (ps)	Energy Consumption for Four BLs (fJ)	SA Energy Consumption (fJ)	Area (µm²)
VLSA	16.46	203.86	15.25	93.86	2.94	6.48
CLSA	27.77	323.32	27.69	110.35	3.99	7.88
STSA	12.24	159.57	19.41	86.90	3.67	8.49
VTSA	11.54	152.21	17.67	87.47	3.37	6.79
HYSA-QZ	10.39	140.25	16.83	79.21	3.65	7.09
TMSA	9.96	138.46	13.47	95.43	25.23	8.63
VTS-SA	5.75	91.84	15.25	76.60	16.38	7.11
CSA_COC	9.76	133.67	27.69	89.27	18.26	9.55
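
The figures in Table 2 can be combined into simple per-design summaries. The following is a minimal sketch, not part of the original evaluation: it assumes that the BL delay and the SA delay add directly to give a total sensing delay, and it normalizes the standard deviation of V_OS to that of the VLSA. The names `TABLE2` and `summarize` are hypothetical and introduced only for illustration.

```python
# Values copied from Table 2 (V_DD = 1.0 V, 28 nm technology).
# Tuple layout: (standard deviation of V_OS in mV, BL delay in ps, SA delay in ps).
TABLE2 = {
    "VLSA":    (16.46, 203.86, 15.25),
    "CLSA":    (27.77, 323.32, 27.69),
    "STSA":    (12.24, 159.57, 19.41),
    "VTSA":    (11.54, 152.21, 17.67),
    "HYSA-QZ": (10.39, 140.25, 16.83),
    "TMSA":    (9.96, 138.46, 13.47),
    "VTS-SA":  (5.75, 91.84, 15.25),
    "CSA_COC": (9.76, 133.67, 27.69),
}


def summarize(table, baseline="VLSA"):
    """Return {name: (total_delay_ps, sigma_ratio)} for every design.

    total_delay_ps is BL delay + SA delay (an assumption of this sketch);
    sigma_ratio is the V_OS standard deviation divided by the baseline's.
    """
    base_sigma = table[baseline][0]
    summary = {}
    for name, (sigma_mv, bl_ps, sa_ps) in table.items():
        total_ps = bl_ps + sa_ps  # assumed: the two delays simply add
        summary[name] = (round(total_ps, 2), round(sigma_mv / base_sigma, 3))
    return summary


if __name__ == "__main__":
    # Print the designs from the shortest to the longest assumed total delay.
    rows = sorted(summarize(TABLE2).items(), key=lambda kv: kv[1][0])
    for name, (total_ps, ratio) in rows:
        print(f"{name:8s} total delay = {total_ps:7.2f} ps, "
              f"sigma(V_OS) relative to VLSA = {ratio:.3f}")
```

Under this additive assumption, the ranking mainly follows the BL delay, which is far larger than the SA delay for every design in Table 2.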

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
