## 1. Introduction

- Definition-based matrix multiplication (will be referred to as D3.0 in this work).
- Basic divide-and-conquer matrix multiplication by Strassen (D2.8).
- Optimized divide-and-conquer multiplication (D2.4).
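For orientation, the first two methods in the list above can be sketched as follows. This is a minimal illustration, not the implementation measured in this work: the `Matrix` type, the helper names, and the fallback to the definition-based product for small or odd-sized blocks are all assumptions made for the example. The labels likely reflect the exponents of the operation counts (n^3 for the definition, and roughly n^2.81 for Strassen's seven-multiplication scheme). D2.4 is not sketched because this section does not describe its construction.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical square int matrix stored row-major in one contiguous buffer.
struct Matrix {
    std::size_t n;
    std::vector<int> data;
    explicit Matrix(std::size_t size) : n(size), data(size * size, 0) {}
    int& at(std::size_t r, std::size_t c) { return data[r * n + c]; }
    int at(std::size_t r, std::size_t c) const { return data[r * n + c]; }
};

// D3.0: C[i][j] is the sum over k of A[i][k] * B[k][j].
Matrix multiply_definition(const Matrix& a, const Matrix& b) {
    Matrix c(a.n);
    for (std::size_t i = 0; i < a.n; ++i)
        for (std::size_t j = 0; j < a.n; ++j) {
            int sum = 0;
            for (std::size_t k = 0; k < a.n; ++k) sum += a.at(i, k) * b.at(k, j);
            c.at(i, j) = sum;
        }
    return c;
}

// Element-wise a + sign * b (sign is +1 or -1).
Matrix combine(const Matrix& a, const Matrix& b, int sign) {
    Matrix c(a.n);
    for (std::size_t i = 0; i < c.data.size(); ++i) c.data[i] = a.data[i] + sign * b.data[i];
    return c;
}

// Copy the (n/2) x (n/2) block of m whose top-left corner is (r0, c0).
Matrix quadrant(const Matrix& m, std::size_t r0, std::size_t c0) {
    Matrix q(m.n / 2);
    for (std::size_t r = 0; r < q.n; ++r)
        for (std::size_t c = 0; c < q.n; ++c) q.at(r, c) = m.at(r0 + r, c0 + c);
    return q;
}

// Write block q into dst with its top-left corner at (r0, c0).
void paste(Matrix& dst, const Matrix& q, std::size_t r0, std::size_t c0) {
    for (std::size_t r = 0; r < q.n; ++r)
        for (std::size_t c = 0; c < q.n; ++c) dst.at(r0 + r, c0 + c) = q.at(r, c);
}

// D2.8: Strassen's recursion with seven block products per level.
// Assumption: small or odd-sized blocks fall back to the definition-based product.
Matrix multiply_strassen(const Matrix& a, const Matrix& b) {
    if (a.n <= 2 || a.n % 2 != 0) return multiply_definition(a, b);
    const std::size_t h = a.n / 2;
    const Matrix a11 = quadrant(a, 0, 0), a12 = quadrant(a, 0, h);
    const Matrix a21 = quadrant(a, h, 0), a22 = quadrant(a, h, h);
    const Matrix b11 = quadrant(b, 0, 0), b12 = quadrant(b, 0, h);
    const Matrix b21 = quadrant(b, h, 0), b22 = quadrant(b, h, h);

    const Matrix m1 = multiply_strassen(combine(a11, a22, +1), combine(b11, b22, +1));
    const Matrix m2 = multiply_strassen(combine(a21, a22, +1), b11);
    const Matrix m3 = multiply_strassen(a11, combine(b12, b22, -1));
    const Matrix m4 = multiply_strassen(a22, combine(b21, b11, -1));
    const Matrix m5 = multiply_strassen(combine(a11, a12, +1), b22);
    const Matrix m6 = multiply_strassen(combine(a21, a11, -1), combine(b11, b12, +1));
    const Matrix m7 = multiply_strassen(combine(a12, a22, -1), combine(b21, b22, +1));

    Matrix c(a.n);
    paste(c, combine(combine(m1, m4, +1), combine(m7, m5, -1), +1), 0, 0);  // C11 = M1 + M4 - M5 + M7
    paste(c, combine(m3, m5, +1), 0, h);                                    // C12 = M3 + M5
    paste(c, combine(m2, m4, +1), h, 0);                                    // C21 = M2 + M4
    paste(c, combine(combine(m1, m2, -1), combine(m3, m6, +1), +1), h, h);  // C22 = M1 - M2 + M3 + M6
    return c;
}
```

The temporaries created at every recursion level (quadrant copies and the seven intermediate products) are the kind of extra storage that a recursive method carries beyond the three operand matrices.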

## 2. Literature Review

## 3. Methods and Procedures

#### 3.1. Experimental Environment

#### 3.2. Test Dataset Generation

`int` in C++). In each case, a dataset was run 300 times; the first 20 runs were discarded so that measurements bypassed the initial thermal state of the system and started from a consistent point. The remaining 280 runs were enough to obtain a reliable average.
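The warm-up-and-average procedure described above can be sketched as follows. This is an illustrative harness only: the function names are hypothetical, and wall-clock time from `std::chrono::steady_clock` stands in for whichever quantity is being sampled, since this section does not specify how the energy and power readings were collected.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Mean of the samples that remain after dropping the first `warmup` entries.
// Returns 0.0 if no samples remain.
double mean_after_warmup(const std::vector<double>& samples, std::size_t warmup) {
    if (samples.size() <= warmup) return 0.0;
    const auto first = samples.begin() + static_cast<std::ptrdiff_t>(warmup);
    const double sum = std::accumulate(first, samples.end(), 0.0);
    return sum / static_cast<double>(samples.size() - warmup);
}

// Runs `workload` total_runs times and returns the mean wall time in seconds
// of the runs that follow the warm-up runs.
// The defaults mirror the text: 300 runs, the first 20 ignored.
double mean_runtime_seconds(const std::function<void()>& workload,
                            std::size_t total_runs = 300,
                            std::size_t warmup_runs = 20) {
    std::vector<double> samples;
    samples.reserve(total_runs);
    for (std::size_t run = 0; run < total_runs; ++run) {
        const auto start = std::chrono::steady_clock::now();
        workload();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double>(stop - start).count());
    }
    return mean_after_warmup(samples, warmup_runs);
}
```

Keeping the averaging step separate from the sampling loop means the same discard rule can be applied to any per-run series, for example energy readings, without changing the loop.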

#### 3.3. Executable Files

#### 3.4. Profiling Tools

## 4. Results and Discussion

#### 4.1. Miss Rate Analysis

#### 4.2. Main Memory Trends

#### 4.3. Algorithm Behavior

## 5. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

Abbreviation | Definition
---|---
HPC | High Performance Computing
D3.0 | The definition-based matrix multiplication
D2.8 | Strassen’s divide-and-conquer matrix multiplication
D2.4 | An optimized divide-and-conquer matrix multiplication
CPU | Central Processing Unit
GPU | Graphics Processing Unit
EXE | Executable

**Figure 2.** Power consumption in watts (W), with the estimated boundaries of the L1, L2, and L3 caches marked for each case. Note that the recursive divide-and-conquer methods spilled earlier because of increased stack storage overheads.

**Figure 3.** Total energy consumption in kJ, with the estimated points of spill-out to main memory marked.

**Figure 4.** Total energy consumption in joules (J) for small matrix dimensions, where computation is estimated to stay within the L1 cache.

**Figure 5.** Execution time in seconds. Note that the time trend closely follows the total energy consumption.

**Figure 6.** A detailed view of execution time (ms) for small matrix dimensions within the estimated L1 cache boundary.

Component | Specification
---|---
Processor | Intel Xeon E5-2680 v3, 2.50 GHz, 12 cores
L1 data cache | 12 × 32 KB (8-way set associative)
L1 instruction cache | 12 × 32 KB (8-way set associative)
L2 cache | 12 × 256 KB (8-way set associative)
L3 cache | 30 MB shared (20-way set associative)
Memory | 8 GB
Operating System | Linux Ubuntu 16.04 64-bit
Compiler | GCC 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
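The cache sizes listed above suggest one simple way to approximate where a computation might leave each cache level. The sketch below assumes a working set of three square `int` matrices (two operands and the result) that must fit entirely in a given cache; both the function name and this working-set model are assumptions made for illustration. The boundaries marked in the figures may have been derived differently, and, as the Figure 2 caption notes, recursive methods spill earlier than such a model predicts because of their additional stack storage.

```cpp
#include <cstddef>

// Largest matrix dimension n for which `matrices_resident` square matrices of
// n x n elements, each `element_bytes` wide, fit together in `cache_bytes`.
// Assumption: the whole working set must be resident and nothing else
// (stack frames, temporaries, other processes) competes for the cache.
std::size_t max_dimension_in_cache(std::size_t cache_bytes,
                                   std::size_t element_bytes = sizeof(int),
                                   std::size_t matrices_resident = 3) {
    std::size_t n = 0;
    // Grow n while the next larger working set still fits.
    while (matrices_resident * (n + 1) * (n + 1) * element_bytes <= cache_bytes) {
        ++n;
    }
    return n;
}
```

Calling `max_dimension_in_cache(32 * 1024)` gives the estimate for one core's L1 data cache; passing the L2 or L3 capacity gives the corresponding upper bounds under the same assumptions.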

**Table 2.** Average energy in millijoules (mJ), power in watts (W), and percentage difference of D2.4 relative to the other methods, where a negative value indicates better performance. The region of best power savings is marked.

Matrix Dimension | Energy D3.0 (mJ) | Energy D2.8 (mJ) | Energy D2.4 (mJ) | Power D3.0 (W) | Power D2.8 (W) | Power D2.4 (W) | D2.4 Energy % vs. D3.0 | D2.4 Energy % vs. D2.8 | D2.4 Power % vs. D3.0 | D2.4 Power % vs. D2.8
---|---|---|---|---|---|---|---|---|---|---
50 | 39.6 | 47.2 | 53.5 | 13.2 | 11.8 | 10.7 | 35 | 13 | −19 | −9
100 | 132.3 | 136 | 141.9 | 14.7 | 13.6 | 12.9 | 7 | 4 | −12 | −5
150 | 385 | 402.3 | 394.8 | 15.4 | 14.9 | 14.1 | 3 | −2 | −8 | −5
200 | 1333.8 | 1312 | 1271.7 | 17.1 | 16.4 | 15.7 | −5 | −3 | −8 | −4
250 | 3860.6 | 3698.4 | 3633.7 | 19.4 | 18.4 | 17.9 | −6 | −2 | −8 | −3
300 | 14,552 | 10,341.2 | 7819.5 | 21.4 | 20.6 | 19.5 | −46 | −24 | −9 | −5
350 | 50,020 | 29,106 | 18,144 | 24.4 | 23.1 | 22.4 | −64 | −38 | −8 | −3
400 | 166,123 | 79,679.8 | 37,861.2 | 27.1 | 25.4 | 23.4 | −77 | −52 | −14 | −8
450 | 559,056 | 256,522 | 85,106.8 | 30.4 | 29.2 | 26.3 | −85 | −67 | −13 | −10
500 | 1,771,599 | 755,158.6 | 183,804.8 | 32.1 | 30.7 | 28.4 | −90 | −76 | −12 | −7
550 | 5,644,914 | 2,231,517.6 | 380,553.6 | 34.1 | 32.4 | 29.4 | −93 | −83 | −14 | −9
600 | 18,422,747 | 6,576,116.8 | 833,593.6 | 37.1 | 34.1 | 32.2 | −95 | −87 | −13 | −6
650 | 58,690,200.6 | 19,493,061.4 | 1,775,916.8 | 39.4 | 36.1 | 34.3 | −97 | −91 | −13 | −5
700 | 184,560,201 | 58,058,035.2 | 3,738,227.2 | 41.3 | 38.4 | 36.1 | −98 | −94 | −13 | −6
750 | 587,196,378 | 169,003,328.4 | 8,201,318.4 | 43.8 | 41.4 | 39.6 | −99 | −95 | −10 | −4
800 | 1,825,939,422 | 493,783,689.6 | 17,065,369.6 | 45.4 | 43.2 | 41.2 | −99 | −97 | −9 | −5
850 | 5,815,657,278 | 1,449,803,759 | 35,870,412.8 | 48.2 | 45.3 | 43.3 | −99 | −98 | −10 | −4
900 | 8,591,024,333 | 3,852,873,000 | 75,220,172.8 | 50.4 | 47.9 | 45.4 | −99 | −98 | −10 | −5
1000 | 10,566,721,109 | 5,256,980,789 | 156,404,940.8 | 52.4 | 50.7 | 47.2 | −99 | −97 | −10 | −7
1100 | 12,542,732,410 | 6,813,082,979 | 327,390,003.2 | 54.4 | 52.1 | 49.4 | −97 | −95 | −9 | −5
1200 | 14,483,051,326 | 8,602,704,239 | 677,312,921.6 | 56.8 | 54.8 | 51.1 | −95 | −92 | −10 | −7
1300 | 18,771,993,578 | 14,159,804,022 | 2,046,518,886 | 60.1 | 71.2 | 77.2 | −89 | −86 | 28 | 8
1400 | 21,728,744,360 | 15,771,824,266 | 4,214,980,608 | 64.2 | 73.4 | 79.5 | −81 | −73 | 24 | 8
1500 | 24,178,945,769 | 19,860,856,069 | 8,493,583,565 | 66.8 | 76.1 | 80.1 | −65 | −57 | 20 | 5
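The percentage columns in the table above appear to follow the usual relative-difference formula, (D2.4 − other) / other × 100, rounded to the nearest integer. The sketch below is an assumption inferred from the tabulated values rather than a formula stated in this section, and the function name is hypothetical.

```cpp
#include <cmath>

// Relative difference of the D2.4 measurement with respect to a reference
// method, in percent. Negative values mean D2.4 used less energy or power.
// Assumption: this matches how the table's percentage columns were computed.
double percent_difference(double d24_value, double reference_value) {
    return (d24_value - reference_value) / reference_value * 100.0;
}

// Same quantity rounded to the nearest integer, as the table reports it.
long rounded_percent_difference(double d24_value, double reference_value) {
    return std::lround(percent_difference(d24_value, reference_value));
}
```

For the row with matrix dimension 50, `rounded_percent_difference(53.5, 39.6)` yields 35 for energy against D3.0 and `rounded_percent_difference(10.7, 13.2)` yields −19 for power against D3.0, in agreement with the tabulated entries.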

Matrix Dimension | L1 Misses D3.0 | L1 Misses D2.8 | L1 Misses D2.4 | L2 Misses D3.0 | L2 Misses D2.8 | L2 Misses D2.4 | L3 Misses D3.0 | L3 Misses D2.8 | L3 Misses D2.4
---|---|---|---|---|---|---|---|---|---
50 | 50,641 | 24,312 | 21,643 | 12,471 | 10,478 | 8741 | 24 | 17 | 15
100 | 48,531 | 24,781 | 22,314 | 12,781 | 10,241 | 8914 | 22 | 19 | 17
150 | 53,152 | 26,140 | 23,146 | 13,784 | 11,364 | 9246 | 24 | 21 | 19
200 | 53,941 | 27,140 | 24,691 | 17,425 | 12,634 | 10,656 | 25 | 23 | 20
250 | 54,631 | 28,631 | 25,631 | 18,421 | 13,847 | 11,634 | 27 | 27 | 23
300 | 55,981 | 29,140 | 26,147 | 22,641 | 14,852 | 12,647 | 29 | 28 | 24
350 | 56,910 | 30,147 | 27,931 | 30,145 | 15,362 | 14,654 | 31 | 30 | 26
400 | 57,931 | 31,651 | 28,146 | 33,652 | 16,324 | 15,698 | 33 | 31 | 27
450 | 59,713 | 32,950 | 30,147 | 41,320 | 17,422 | 16,874 | 35 | 33 | 28
500 | 60,235 | 33,165 | 31,460 | 50,361 | 21,632 | 21,698 | 38 | 34 | 30
550 | 75,321 | 33,714 | 32,785 | 55,617 | 23,547 | 22,948 | 41 | 36 | 33
600 | 79,310 | 34,601 | 33,147 | 70,142 | 29,841 | 25,478 | 43 | 38 | 34
650 | 85,312 | 33,631 | 33,910 | 82,156 | 35,261 | 33,695 | 45 | 39 | 39
700 | 87,932 | 35,489 | 34,942 | 83,149 | 39,475 | 35,954 | 46 | 41 | 40
750 | 91,324 | 36,631 | 35,147 | 90,145 | 44,361 | 41,658 | 49 | 43 | 41
800 | 95,312 | 37,326 | 36,147 | 95,961 | 55,641 | 50,647 | 48 | 46 | 43
850 | 98,123 | 38,971 | 37,120 | 97,447 | 62,145 | 59,841 | 50 | 49 | 45
900 | 99,145 | 39,361 | 28,147 | 99,147 | 70,456 | 63,587 | 52 | 50 | 47
1000 | 99,569 | 42,698 | 30,958 | 99,365 | 72,941 | 65,941 | 58 | 54 | 56
1100 | 914,320 | 714,327 | 678,910 | 916,347 | 578,912 | 469,820 | 60 | 59 | 58
1200 | 4,678,940 | 3,768,453 | 2,876,453 | 1,090,657 | 1,019,876 | 1,009,765 | 5698 | 4698 | 3548
1300 | 3,547,931 | 4,236,941 | 5,631,740 | 1,011,649 | 1,156,941 | 1,296,148 | 70,658 | 82,658 | 90,568
1400 | 7,890,147 | 8,316,740 | 12,321,945 | 1,260,478 | 1,340,658 | 1,345,964 | 150,968 | 192,689 | 210,658
1500 | 11,365,741 | 12,630,948 | 20,103,941 | 1,345,968 | 1,406,157 | 1,469,123 | 185,698 | 245,698 | 410,698

