# Two-Layer Error Control Codes Combining Rectangular and Hamming Product Codes for Cache Error

^{1}

^{2}

^{*}

## Abstract

**:**

## 1. Introduction

- The cache error rate increases for large volume and high density cache in many-core systems, and can compromise overall system reliability if not well-protected. Meanwhile, future many-core systems place a strict budget on fault-tolerant mechanisms in terms of latency, power and area overhead. Therefore, high reliability but low cost fault-tolerant mechanisms are required;
- The encoding and decoding of rectangular codes, which simply consists of several exclusive or (XOR) gates, can be very fast compared to other error control code, such as Hamming and BCH codes. In contrast, Hamming product codes can provide high error correction capability at the cost of extra check bits overhead;
- Much of cache lines in systems are clean, which means there is another copy in next-level memory. Therefore, it is enough to detect error in clean cache lines, and only correct error in dirty cache lines [5].

## 2. Related Works

#### 2.1. Soft Error Management Techniques

#### 2.2. Hard Error Management Techniques

#### 2.3. Reliable Circuits and Device Design for Cache

## 3. Two-Layer Error Control Codes Combining Rectangular and Hamming Product Codes

#### 3.1. Rectangular Codes

_{r}rows and k

_{c}columns (as shown in Figure 1a). For each of k

_{r}rows, a row-direction parity check bit is added to detect single errors in the row; similarly, for each of k

_{c}columns, a column-direction parity check bit is used to detect single errors in the column; the bottom right corner can be either the column parity check bit or the row parity check bit. Therefore, the code word length for k bits data is n = (k + k

_{r}+ k

_{c}+ 1), and the code rate is k/n = k/(k + k

_{r}+ k

_{c}+ 1). For example, supposing that each cache line has 1024-bit data: d0, d1, …, d1023. For rectangular codes, we add one row parity bit r0 for d0, d1, d2, …, d31, add one row parity bit r1 for d32, d33, d34, …, d63, and so on. Similarly, we add one column parity bit c0 for d0, d32, d64, …, d992, add one column parity bit c1 for d1, d33, d65, …, d993, and so on. Finally, we will add one extra check-on-check bit c. Then, we will store all these data and check bits in the following one-dimensional way: d0, d1, d2, …, d31, r0, d32, d33, d34, …, d63, r1, …, d992, d993, d994, …, d1023, r31, c0, c1, … , c32, c.

#### 3.2. Hamming Product Codes

_{r}rows and k

_{c}columns (as shown in Figure 1b). For each of k

_{r}rows, (n

_{c}− k

_{c}) row-direction Hamming check bits are added to correct single errors in the row; for each of the k

_{c}columns, (n

_{r}− k

_{r}) column-direction Hamming check bits are used to correct single errors in the column; the bottom right corner can be either column- or row-direction Hamming check bits. Therefore, the code word length for k bits data is n = n

_{r}×n

_{c}, and the code rate is k/n = k/(n

_{r}+ n

_{c}).

Cache Line Size | n_{c} | n_{r} | Rectangular Check Bits | Rectangular Code Rate | HPC Check Bits | HPC Code Rate |
---|---|---|---|---|---|---|

1024-bit cache line | 32 | 32 | 64 | 94.1% | 384 | 72.8% |

512-bit cache line | 32 | 16 | 48 | 91.4% | 256 | 66.7% |

256-bit cache line | 16 | 16 | 32 | 88.9% | 160 | 61.5%% |

#### 3.3. Two-Layer ECC Combining Rectangular and Hamming Product Codes

## 4. Analysis and Experimental Results

- 16-way SECDED: 16 interleaved SECDED codes are applied for every cache line;
- 8-way SECDED: 8 interleaved SECDED codes are applied for every cache line;
- 8-way DECTED: 8 interleaved DECTED codes are applied for every cache line;
- 4-way DECTED: 4 interleaved DECTED codes are applied for every cache line;
- 4-way 4EC5ED: 4 interleaved 4EC5ED codes are applied for every cache line;
- 2-way 4EC5ED: 2 interleaved 4EC5ED codes are applied for every cache line.

#### 4.1. Experimental Setup

Parameters | Value | |
---|---|---|

Tile Number | 1024 | |

Core Type | 64 bits single-issue | |

L1-I/L1-D Cache, Private | ||

Cache Size | 32 kb | |

Associativity | 4 | |

Replacement Policy | LRU | |

Tag Access Time | 1 cycle | |

Data Access Time | 1 cycle | |

L2 Cache, Private | ||

Cache Size | 256 kb | |

Associativity | 8 | |

Replacement Policy | LRU | |

Tag Access Time | 3 cycles | |

Data Access Time | 8 cycles | |

External Memory | ||

Memory Access Time | 100 cycles |

#### 4.2. Burst Error Control Capability

_{c}is odd, and the parity codes can detect odd numbers of errors.

_{c}+ 1 bits.

_{c}+ 1 bits, each column in Figure 1 has 1 error at most except for one, which has two errors. Otherwise, if two of the columns had two or more errors, this burst error would have at least n

_{c}+ 2 bits. Therefore, the Column SECDED Decoders can correct all errors except those in the column with two errors. Then, the Row SECDED Decoder at Stage 3 can correct this column. Thus, the proposed approach can correct single burst errors up to n

_{c}+ 1 bits. [Q.E.D.]

_{c}= k

_{r}= 32 to achieve minimal storage overhead; and in the case of 512-bit cache line, we set k

_{c}= 32, and k

_{r}= 16 for the same goal. The proposed approach can achieve the highest burst error detection and correction capability. If we take a 1024-bit cache line as example, the proposed approach can detect a 1024-bit burst error, which is 22×, 45×, 33×, 68×, 45×, 93× that achieved by 16-way SECDED, 8-way SECDED, 8-way DECTED, 4-way DECTED, 4-way 4EC5ED and 2-way 4EC5ED, respectively. Meanwhile, the proposed approach can correct 33-bit burst errors, which is 4.1×, 2.1×, 4.1×, 2.1×, 4.1×, 2.1× of that achieved by 16-way SECDED, 8-way SECDED, 8-way DECTED, 4-way DECTED, 4-way 4EC5ED and 2-way 4EC5ED, respectively.

**Figure 6.**Burst error (

**a**) detection capability; and (

**b**) correction capability (1024-bit cache line).

#### 4.3. Mean-Error-to-Failure Evaluation

^{4}, while METF of 16-way SECDED, 8-way SECDED, 8-way DECTED, 4-way DECTED, 4-way 4EC5ED, and 2-way 4EC5ED are 562, 442, 3.84 × 10

^{3}, 3.12 × 10

^{3}, 1.76 × 10

^{4}, and 1.08 × 10

^{4}, respectively. The proposed two-layer ECCs can improve METF by 96×, 123×, 14×, 17×, 3× and 5×, respectively. Even for a cache size of 128 kb, the proposed approach provides METF of 1241, while 16-way SECDED, 8-way SECDED, 8-way DECTED, 4-way DECTED, 4-way 4EC5ED, and 2-way 4EC5ED only obtain METF of 56, 41, 183, 112, 44 and 206 respectively. The proposed two-layer ECCs can improve METF by 22×, 30×, 6.8×, 11×, 2.8×, and 6.0×, respectively.

^{4}, while METF of 16-way SECDED, 8-way SECDED, 8-way DECTED, 4-way DECTED, 4-way 4EC5ED, and 2-way 4EC5ED are 629, 538, 5.91 × 10

^{3}, 3.89 × 10

^{3}, 3.16 × 10

^{4}, and 1.87 × 10

^{4}, respectively. The proposed two-layer ECCs can improve METF by 114×, 133×, 12×, 18×, 2.3× and 3.8×, respectively. Even for a cache size of 128 kb, the proposed approach provides METF of 1507, while 16-way SECDED, 8-way SECDED, 8-way DECTED, 4-way DECTED, 4-way 4EC5ED, and 2-way 4EC5ED only obtain METF of 68, 53, 298, 178, 700 and 449, respectively. The proposed two-layer ECCs can improve METF by 22×, 28×, 5.1×, 8.5×, 2.2×, and 3.4×, respectively.

#### 4.4. Mean-Time-to-Failure Evaluation

^{−5}/day and 10

^{−4}/day, respectively. As shown in the figure, the proposed approach can achieve the highest MTTF compared to all other approaches. For example, in the case of a 10

^{−5}/day fault rate, the proposed approach can correctly work for 947, 786, 762, 616, 481, 441, 408, and 323 days before reaching cache failure for cache sizes of 128 kb, 256 kb, 512 kb, 1 Mb, 2 Mb, 4 Mb, 8 Mb, and 16 MB, which are about 2.8×–3.3× those achieved by the next-best solution, the 4-way 4EC5ED. Even in the case of the high fault rate of 10

^{−4}/day, the proposed approach can still provide MTTF of 95, 79, 76, 62, 48, 44, 41, and 32 days for various cache sizes, while the next-best solution, 4-way 4EC5ED, can only achieve MTTF of 16, 13, 13, 12, 9.5, 9.5, 7.2 and 6.4 days, respectively.

^{−5}/day and 10

^{−4}/day, respectively. As shown in the figure, the proposed approach can achieve the highest MTTF compared to all other approaches. For example, in the case of a 10

^{−5}/day fault rate, the proposed approach can correctly work for 1150, 1024, 907, 727, 669, 556, 523, and 428 days before reaching cache failure for cache sizes of 128 kb, 256 kb, 512 kb, 1 Mb, 2 Mb, 4 Mb, 8 Mb, and 16 MB, which are about 2.0×–2.3× those achieved by next-best solution, the 4-way 4EC5ED. Even in the case of the high fault rate of 10

^{−4}/day, the proposed approach can still provide MTTF of 115, 102, 91, 73, 67, 56, 52 and 43 days for various cache sizes, while next-best solution, 4-way 4EC5ED, can only achieve MTTF of 34, 30, 24, 20, 19, 15, 13 and 11 days, respectively.

**Figure 10.**METF comparison (

**a**) Cache line size = 1024 bits, fault rate 10

^{−5}/day; and (

**b**) Cache line size = 1024 bits, fault rate 10

^{−4}/day.

**Figure 11.**METF comparison (

**a**) Cache line size = 512 bits, fault rate 10

^{−5}/day, and (

**b**) Cache line size = 512 bits, fault rate 10

^{−4}/day.

#### 4.5. Overhead Evaluation

^{m}. Therefore, SECDED requires 9, 8, and 7 check bits for data lengths of 128, 64 and 32 bits; DECTED requires 19, 17, and 15 check bits for data lengths of 256, 128 and 64 bits; and 4EC5ED requires 41, 37 and 33 check bits for data lengths of 512, 256 and 128 bits. In addition, SEC requires six and five bits for data lengths of 32 and 16 bits, and the parity code requires 1 bit for any data length. Based on this information, we can evaluate the storage overhead of various cache error control methods.

(a) | Data/ Way | Check bits/Way | #Way | #Total bits | (b) | Data/ Way | Check bits/Way | #Way | #Total bits |
---|---|---|---|---|---|---|---|---|---|

16-way SECDED | 64 | 8 | 16 | 128 | 16-way SECDED | 32 | 7 | 16 | 112 |

8-way SECDED | 128 | 9 | 8 | 72 | 8-way SECDED | 64 | 8 | 8 | 4 |

8-way DECTED | 128 | 17 | 8 | 136 | 8-way DECTED | 64 | 15 | 8 | 120 |

4-way DECTED | 256 | 19 | 4 | 76 | 4-way DECTED | 128 | 17 | 4 | 68 |

4-way 4EC5ED | 256 | 37 | 4 | 148 | 4-way 4EC5ED | 128 | 33 | 4 | 132 |

2-way 4EC5ED | 512 | 41 | 2 | 84 | 2-way 4EC5ED | 256 | 37 | 2 | 74 |

Proposed | * | * | * | 64 | Proposed | * | * | * | 48 |

#### 4.6. Performance Degradation

## 5. Conclusions

## References

- Agostinelli, M.; Hicks, J.; Xu, J.; Woolery, B.; Mistry, K.; Zhang, K.; Jacobs, S.; Jopling, J.; Yang, W.; Lee, B.; et al. Erratic Fluctuations of SRAM Cache V
_{min}at the 90 nm Process Technology Node. In Proceedings of the IEEE International Electron Devices Meeting, IEDM Technical Digest, Washington, DC, USA, 5–7 December 2005. - Borkar, S. Designing reliable systems from unreliable components: The challenges of transistor variability and degradation. IEEE Micro
**2005**, 25, 10–17. [Google Scholar] [CrossRef] - Ruckerbauer, F.X.; Georgakos, X. Soft Error Rates in 65 nm SRAMs—Analysis of New Phenomena. In Proceedings of the IEEE International On-Line Testing Symposium (IOLTS), Crete, Greece, 8–11 July 2007; pp. 203–204.
- Zhang, M.; Ampadu, P. Variation-Tolerant Cache by Two-Layer Error Control Codes. In Proceedings of the 2013 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), New York, NY, USA, 2–4 October 2013; pp. 161–166.
- Yoon, D.H.; Erez, M. Memory Mapped ECC: Low-Cost Error Protection for Last Level Caches. In Proceedings of the 36th International Symposium on Computer Architecture (ISCA-36), Austin, TX, USA, 20–24 June 2009; pp. 116–127.
- Reick, K.; Sanda, P.N.; Swaney, S.; Kellington, J.W.; Mack, M.J.; Floyd, M.S.; Henderson, D. Fault-Tolerant Design of the IBM Power6 Microprocessor. IEEE Micro
**2008**, 28, 30–38. [Google Scholar] [CrossRef] - Chishti, Z.; Alameldeen, A.R.; Wilkerson, C.; Wu, W.; Lu, S.-L. Improving Cache Lifetime Reliability at Ultra-Low Voltages. In Proceedings of the International Symposium on Microarchitecture, New York, NY, USA, 12–16 December 2009; pp. 89–99.
- Wilkerson, C.; Alameldeen, A.R.; Chishti, Z.; Wu, W.; Somasekhar, D.; Lu, S.-L. Reducing Cache Power with Low-Cost, Multi-bit Error-Correcting Codes. In Proceedings of the International Symposium on Computer Architecture (ISCA), Saint-Malo, France, 19–23 June 2010.
- Radaelli, D.; Puchner, H.; Wong, S.; Daniel, S. Investigation of multi-bit upsets in a 150 nm technology SRAM device. IEEE Trans. Nucl. Sci.
**2005**, 52, 2433–2437. [Google Scholar] [CrossRef] - Baeg, S.; Wen, S.; Wong, R. SRAM Interleaving distance selection with a soft error failure model. IEEE Trans. Nucl. Sci.
**2009**, 56, 2111–2118. [Google Scholar] [CrossRef] - Kim, J.; Hardavellas, N.; Mai, K.; Falsafi, B.; Hoe, J.C. Multi-bit Error Tolerant Caches Using Two-Dimensional Error Coding. In Proceedings of the 40th International Symposium on Micro-architecture (Micro-40), Chicago, IL, USA, 1–5 December 2007.
- Argyrides, C.; Pradhan, D.K.; Kocak, T. Matrix codes for reliable and cost efficient memory chips. IEEE Trans. Very Large Scale Integr. (VLSI) Syst.
**2011**, 19, 420–428. [Google Scholar] [CrossRef] - Wang, S.; Hu, J.; Ziavras, S.G. Replicating tag entries for reliability enhancement in cache tag arrays. IEEE Trans. Very Large Scale Integr. (VLSI) Syst.
**2012**, 20, 634–654. [Google Scholar] - Kim, J.; Kim, S.; Lee, Y. SimTag: Exploiting Tag Bits Similarity to Improve the Reliability of the Data Caches. In Proceedings of the Conference on DesignAutomationand Test in Europe (DATE’10), Dresden, Germany, 8–12 March 2010; pp. 941–944.
- Wang, S. Characterizing System-Level Vulnerability for Instruction Caches against Soft Errors. In Proceedings of 2011 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT’11), Vancouver, Canada, 3–5 October2011; pp. 356–363.
- Schuster, S. Multiple word/bit line redundancy for semiconductor memories. IEEE J. Solid-State Circuits
**1978**, 13, 698–703. [Google Scholar] [CrossRef] - Horiguchi, M. Redundancy Techniques for High-Density DRAMS. In Proceedings of the 2nd Annual IEEE International Conference Innovative Systems Silicon, Austin, TX, USA, 8–10 October 1997; pp. 22–29.
- Agarwal, A.; Paul, B.C.; Mahmoodi, H.; Datta, A.; Roy, K. A process-tolerant cache architecture for improved yield in nanoscale technologies. IEEE Trans. Very Large Scale Integr. (VLSI) Syst
**2005**, 13, 27–38. [Google Scholar] [CrossRef] - Lee, H.; Cho, S.; Childers, B.R. Performance of Graceful Degradation for Cache faults. In Proceedings of the IEEE Computer Society Annual Symposium on VLSI (ISVLSI ’07), Porto Alegre, Brazil, 9–11 March 2007.
- Abella, J.;Carretero, J.; Chaparro, P.; Vera, X.; Gonzalez, A. Low Vcc
_{min}Fault-Tolerant Cache with Highly Predictable Performance. In Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’09), New York, NY, USA, 12–16 December 2009. - Wilkerson, C.; Gao, H.; Alameldeen, A.R.; Chishti, Z.; Khellah, M.; Lu, S.-L. Trading off Cache Capacity for Reliability to Enable Low Voltage Operation. In Proceedings of the 35th International Symposium on Computer Architecture (ISCA’08), Beijing, China, 21–25 June 2008; pp. 203–214.
- Lu, S.-L.; Alameldeen, A.; Bowman, K.; Chishti, Z.; Wilkerson, C.; Wei, W. Architectural-Level Error-Tolerant Techniques for Low Supply Voltage Cache Operation. In Proceedings of the IEEE International Conference on IC Design & Technology (ICICDT’11), Kaohsiung, Taiwan, 2–4 May 2011.
- Aladmeldeen, A.; Wagner, I.; Chishti, Z.; Wu, W.; Wilkerson, C.; Lu, S.-L. Energy-Efficient Cache Design Using Variable-Strength Error-Correcting Codes. In Proceedings of the 38th International Symposium on Computer Architecture, San Jose, CA, USA, 4–8 June 2011.
- Zhang, M.; Stojanovic, V.M.; Ampadu, P. Reliable ultra-low-voltage cache design for many-core systems. IEEE Trans. Circuits Syst. II
**2012**, 59, 858–862. [Google Scholar] [CrossRef] - Ampadu, P.; Zhang, M.; Stojanovic, V. Breaking the Energy Barrier in Fault-Tolerant Caches for Multicore Systems. In Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, Grenoble, France, 18–22 March 2013; pp. 731–736.
- Li, C.; Zhang, M.; Ampadu, P. Reliable Ultra-Low Voltage Cache with Variation-Tolerance. In Proceedings of the IEEE 56th International Midwest Symposium on Circuits and Systems (MWSCAS), Columbus, OH, USA, 4–7 August 2013; pp. 121–124.
- Chang, L.; Fried, D.; Hergenrother, J.; Sleight, J.; Dennard, R.; Montoye, R.R.; Sekaric, L.; McNab, S.; Topol, W.; Adams, C.; et al. Stable SRAM Cell Design for the 32 nm Node and Beyond. In Proceedings of the 2005 Symposium on VLSI Technology, Digest of Technical Papers, Kyoto, Japan, 14–16 June 2005; pp. 128–129.
- Calhoun, B.H.; Chandrakasan, A. A 256 kb sub-threshold SRAM in 65 nm CMOS. IEEE J. Solid State Circuits
**2007**, 42, 680–688. [Google Scholar] [CrossRef] - Fu, B.; Ampadu, P. On hamming product codes with type-II hybrid ARQ for on-chip interconnects. IEEE Trans. Circuits Syst. I
**2009**, 56, 2042–2054. [Google Scholar] [CrossRef] - Kurian, G.; Chen, S.; Chen, C.-H.O.; Miller, J.E.; Michel, J.; Wei, L.; Antoniadis, D.A.; Peh, L.S.; Kimerling, L.; Stojanovic, V.; Agarwal, A. Cross-Layer Energy and Performance Evaluation of a Nanophotonic Manycore Processor System using Real Application Workloads. In Proceedings of the IEEE 26th International Parallel & Distributed Processing Symposium (IPDPS), Shanghai, China, 21–25 May 2012.
- Miller, J.; Kasture, H.; Kurian, G.; Gruenwald, C.; Beckmann, N.; Celio, C.; Eastep, J.; Agarwal, A. Graphite: A Distributed Parallel Simulator for Multicores. In Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA), Bangalore, India, 9–14 January 2009.
- Thoziyoor, S.; Ahn, J.; Monchiero, M.; Brockman, J.; Jouppi, N. A Comprehensive Memory Modeling Tool and its Application to the Design and Analysis of Future Memory Hierarchies. In Proceedings of the 35th International Symposium on Computer Architecture (ISCA), Beijing, China, 21–25 June 2008.
- Naseer, R.; Draper, J. Parallel Double Error Correcting Code Design to Mitigate Multi-Bit Upsets in SRAMs. In Proceedings of the 34th European Solid-State Circuits Conference (ESSCIRC), Edinburgh, UK, 15–19 September 2008; pp. 222–225.

© 2014 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

## Share and Cite

**MDPI and ACS Style**

Zhang, M.; Ampadu, P.
Two-Layer Error Control Codes Combining Rectangular and Hamming Product Codes for Cache Error. *J. Low Power Electron. Appl.* **2014**, *4*, 44-62.
https://doi.org/10.3390/jlpea4010044

**AMA Style**

Zhang M, Ampadu P.
Two-Layer Error Control Codes Combining Rectangular and Hamming Product Codes for Cache Error. *Journal of Low Power Electronics and Applications*. 2014; 4(1):44-62.
https://doi.org/10.3390/jlpea4010044

**Chicago/Turabian Style**

Zhang, Meilin, and Paul Ampadu.
2014. "Two-Layer Error Control Codes Combining Rectangular and Hamming Product Codes for Cache Error" *Journal of Low Power Electronics and Applications* 4, no. 1: 44-62.
https://doi.org/10.3390/jlpea4010044