High-Performance Computing and Its Applications

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: closed (15 March 2024) | Viewed by 8210

Special Issue Editors


Guest Editor
School of Software, Shanghai Jiao Tong University, Shanghai 200240, China
Interests: distributed systems; virtualization in clouds; industrial big data
Enflame-Tech Inc., Shanghai 201203, China
Interests: machine learning system; heterogeneous computing; scientific computing

Special Issue Information

Dear Colleagues,

The main topic of this Special Issue will be the design and optimization of high-performance computing systems and their application to advanced research frontiers.

High-performance computing is rapidly evolving from a focus on dedicated computational systems to a broader view of the system stack, spanning the hardware architecture and its corresponding software. Thus, the design and implementation of a modern, advanced high-performance computing system is now a composite problem that must be considered across multiple components: hardware devices and their architecture; tooling and infrastructural software components such as high-performance runtime systems and device drivers; proper abstractions in the form of languages and APIs that simplify workloads for the benefit of the system; and solutions, optimizations, and strategies that build on top of the former components. To this end, the proper and elegant design of HPC systems remains challenging and of great importance, as it is beneficial not only to HPC research itself but also to all research and industrial frontiers that utilize HPC solutions to solve their own problems.

This Special Issue seeks high-quality manuscripts discussing the design and optimization of HPC systems and their application in cross-domain research frontiers that can utilize HPC solutions. Topics include, but are not limited to, the following:

  • Design of novel hardware architecture on CPU, GPU, ASIC, FPGA, memory devices, and Chiplets for accommodating future HPC workloads;
  • Design and practices of HPC systems in innovative architectures including CPU, GPU, ASIC, FPGA, memory devices for in-memory computing, and Chiplets;
  • Architecture-oriented high-performance computing methodologies, algorithms, and energy efficiency optimizations;
  • Design of and research into innovative HPC paradigms and tooling in the form of programming models, compilers, developer toolkits, software stacks, and domain-specific languages;
  • Scalable HPC methodologies and applications to large-scale deep learning models with distributed computing, heterogeneous computing, virtualization, and cloud computing;
  • High-performance technologies and applications in domain-specific scientific computing, such as numerical methods, quantum computing, fluid dynamics, computational economics, computational chemistry, and many other domain-specific topics;
  • Security, privacy, and reliability in high-performance computing technologies on distributed, federated systems;
  • Industrial HPC use cases, solutions, and practices.

Prof. Dr. Jianguo Yao
Dr. Heng Shi
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • high-performance computing
  • heterogeneous computing
  • deep learning
  • distributed systems
  • chiplets
  • in-memory computing
  • scientific computing

Published Papers (5 papers)


Research

21 pages, 7471 KiB  
Article
Vectorization Programming Based on HR DSP Using SIMD
by Chunhu Xie, Huachun Wu and Jian Zhou
Electronics 2023, 12(13), 2922; https://doi.org/10.3390/electronics12132922 - 03 Jul 2023
Viewed by 1227
Abstract
Single instruction multiple data (SIMD) vector extension has become an essential feature of high-performance processors. Architectures such as x86, ARM, MIPS, and PowerPC have specific vector extension instruction sets and SIMD micro-architectures. Using SIMD vectorization programming can significantly improve the performance of application algorithms while keeping the hardware overhead low. In addition, other methods can enhance algorithm performance, such as selecting the best SIMD vectorization model for algorithms, ensuring sufficient instruction streams, implementing reasonable and effective cache data prefetching, and aligning data access and storage addresses according to instruction characteristics. The goal of this paper is three-fold. First, we introduce the basic structural characteristics of a general RISC processor, Hua Rui (HR) DSP, with a custom vector instruction set based on compatibility with an MIPS64 fixed-point and floating-point instruction set, as well as a Fei Teng (FT) processor compatible with an ARMv8 instruction set. Second, we summarize the fundamental principles of SIMD vectorization programming design for the HR DSP, which provides ideas for other scholars or engineering and technical personnel to study the algorithm performance using SIMD vectorization optimization. Third, we implement representative typical algorithms based on the HR and FT platforms and obtain experimental results that show improvement in algorithm SIMD vectorization optimization according to the vector programming design principles summarized in this article can improve the single-core performance of scalar implementation without vectorization, instruction streams, and cache data prefetching by 4–22 times for mean filter, accumulation, and matrix–matrix multiplication, which is significantly better than the performance improvement of 3–13 times for the FT platform. 
Moreover, the performance of matrix–matrix multiplication using the best vectorization model on the HR platform is about 84% higher than that of the common SIMD vectorization model.
(This article belongs to the Special Issue High-Performance Computing and Its Applications)

16 pages, 1846 KiB  
Article
Applying Address Encryption and Timing Noise to Enhance the Security of Caches
by Dehua Wu, Sha Tao and Wanlin Gao
Electronics 2023, 12(8), 1799; https://doi.org/10.3390/electronics12081799 - 11 Apr 2023
Viewed by 1000
Abstract
Encrypting the mapping relationship between physical and cache addresses has been a promising technique to prevent conflict-based cache side-channel attacks. However, this method is not foolproof and the attackers can still build a side-channel despite the increased difficulty of finding the minimal eviction set. To address this issue, we propose a new protection method that integrates both address encryption and timing noise extension mechanisms. By adding the timing noise extension mechanism to the address encryption method, we can randomly generate cache misses that prevent the attackers from pruning the eviction set. Our analysis shows that the timing noise extension mechanism can cause the attackers to fail in obtaining accurate timing information for accessing memory. Furthermore, our proposal reduces the timing noise generating rate, minimizing performance overhead. Our experiments on SPEC CPU 2017 show that the integrated mechanism only resulted in a tiny performance overhead of 2.9%.
(This article belongs to the Special Issue High-Performance Computing and Its Applications)

22 pages, 3973 KiB  
Article
Reliability Analysis of FinFET Based High Performance Circuits
by Alluri Navaneetha and Kalagadda Bikshalu
Electronics 2023, 12(6), 1407; https://doi.org/10.3390/electronics12061407 - 15 Mar 2023
Cited by 4 | Viewed by 2662
Abstract
In the VLSI industry, the ability to anticipate variability tolerance is essential to understanding the circuits’ potential future performance. The cadence virtuoso tool is used in this study to assess how PVT fluctuations affect various fin-shaped field effect transistor (FinFET) circuits. In this research, high-performance FinFET-based circuits at 7 nm are discussed with a variation in temperature and voltage. The idea behind the technology is the improvement of power dissipation and delay reduction at the rise of temperature and reduced supply voltage. With the use of a multi-gate predictive model, simulation is carried out employing diverse domino logic at the 7 nm technology node of FinFET files. The proposed set-reset logic circuit and high-speed cascade circuit method shows less power dissipation and delay compared to the existing current mirror footed domino, high-speed clocked delay, and modified high-speed clocked delay with a variation of temperature and supply voltage. For the proposed set-reset logic circuit and high speed cascade circuit, a Monte Carlo simulation is done to find the mean and standard deviation. FinFET simulations are run on the suggested circuit for the reduction of delay for the rise of temperature and reduction of supply voltage from 0.7 V to 0.3 V. In comparison, the proposed method results in a maximum power decrease compared to existing ones. Compared to the existing one, proposed techniques achieve a maximum delay and area reduction.
(This article belongs to the Special Issue High-Performance Computing and Its Applications)

12 pages, 3084 KiB  
Communication
A Hybrid GPU and CPU Parallel Computing Method to Accelerate Millimeter-Wave Imaging
by Li Ding, Zhaomiao Dong, Huagang He and Qibin Zheng
Electronics 2023, 12(4), 840; https://doi.org/10.3390/electronics12040840 - 07 Feb 2023
Cited by 2 | Viewed by 1443
Abstract
The range migration algorithm (RMA) based on Fourier transformation is widely applied in millimeter-wave (MMW) close-range imaging because of its few operations and small approximation. However, its interpolation stage is not effective due to the involved intensive logic controls, which limits the speed performance in a graphics processing unit (GPU) platform. Therefore, in this paper, we present an acceleration optimization method based on the hybrid GPU and central processing unit (CPU) parallel computation for implementing the RMA. The proposed method exploits the strong logic-control capability of the CPU to assist the GPU in processing the logic controls of the interpolation stage. The common positions of wavenumber-domain components to be interpolated are calculated by the CPU and stored in the constant memory for broadcast at any time. This avoids the repetitive computation consumed in a GPU-only scheme. Then the GPU is responsible for the remaining matrix-related steps and outputs the needed wavenumber-domain values. The imaging experiments verify the acceleration efficiency of the proposed method and demonstrate that the speedup ratio of our proposed method is more than 15 times of that by the CPU-only method, and more than 2 times of that by the GPU-only method.
(This article belongs to the Special Issue High-Performance Computing and Its Applications)

11 pages, 500 KiB  
Article
Mapping and Optimization Method of SpMV on Multi-DSP Accelerator
by Sheng Liu, Yasong Cao and Shuwei Sun
Electronics 2022, 11(22), 3699; https://doi.org/10.3390/electronics11223699 - 11 Nov 2022
Cited by 1 | Viewed by 1241
Abstract
Sparse matrix-vector multiplication (SpMV) solves the product of a sparse matrix and dense vector, and the sparseness of a sparse matrix is often more than 90%. Usually, the sparse matrix is compressed to save storage resources, but this causes irregular access to dense vectors in the algorithm, which takes a lot of time and degrades the SpMV performance of the system. In this study, we design a dedicated channel in the DMA to implement an indirect memory access process to speed up the SpMV operation. On this basis, we propose six SpMV algorithm schemes and map them to optimize the performance of SpMV. The results show that the M processor’s SpMV performance reached 6.88 GFLOPS. Besides, the average performance of the HPCG benchmark is 2.8 GFLOPS.
(This article belongs to the Special Issue High-Performance Computing and Its Applications)
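
The irregular access pattern mentioned in this abstract can be illustrated with a minimal sketch. It assumes the compressed sparse row (CSR) format, which the abstract does not name, and the helper names `dense_to_csr` and `spmv_csr` are hypothetical. It does not model the paper's DMA channel or any of its six algorithm schemes; it only shows why compressing the matrix turns reads of the dense vector into indirect, data-dependent accesses.

```python
from typing import List, Sequence, Tuple


def dense_to_csr(
    matrix: Sequence[Sequence[float]],
) -> Tuple[List[float], List[int], List[int]]:
    """Compress a dense matrix into CSR arrays (assumed format, for illustration)."""
    values: List[float] = []   # nonzero entries, stored row by row
    col_idx: List[int] = []    # column index of each stored entry
    row_ptr: List[int] = [0]   # row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def spmv_csr(
    values: Sequence[float],
    col_idx: Sequence[int],
    row_ptr: Sequence[int],
    x: Sequence[float],
) -> List[float]:
    """Compute y = A @ x for a CSR matrix A and a dense vector x."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # values[k] is read sequentially, but x is read at col_idx[k]:
            # an indirect access whose address depends on the matrix data.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


if __name__ == "__main__":
    a = [
        [0.0, 0.0, 3.0],
        [4.0, 0.0, 0.0],
        [0.0, 0.0, 0.0],
        [0.0, 5.0, 6.0],
    ]
    vals, cols, ptrs = dense_to_csr(a)
    print(spmv_csr(vals, cols, ptrs, [1.0, 2.0, 3.0]))
```

The inner loop reads `values` and `col_idx` with unit stride, while the reads of `x` jump according to `col_idx`. That gather step is the kind of indirect memory access the abstract describes offloading to a dedicated DMA channel.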
