Topic Editors

Department of Applied Mathematics and Mathematical Modeling, North-Caucasus Federal University, 355009 Stavropol, Russia
Computing Platform Lab, Samsung Advanced Institute of Technology, Samsung Electronics, Suwon 16678, Republic of Korea

Theory and Applications of High Performance Computing

Abstract submission deadline
31 August 2024
Manuscript submission deadline
30 November 2024

Topic Information

Dear Colleagues,

The performance of computing devices is an important topic in the context of global digitalization and the widespread introduction of digital data processing systems. Device speed does not keep pace with the growth of information that must be registered, stored, processed, and transmitted, and this insufficient performance has become a central problem of modern computer technologies, placing it at the forefront of many areas of modern science and technology. Increases in computing speed are achieved through many different approaches: parallelized computation, reduced-accuracy representation and processing of digital data, non-traditional number systems, modification of the computing blocks of existing hardware architectures, and more.

The subject of “Theory and Applications of High Performance Computing” is interdisciplinary in nature and welcomes articles on theoretical aspects, practical aspects, and applications of modern computer technologies.

The following topics are considered from the point of view of theoretical aspects, among others:

  • The organization of computations using non-traditional number systems;
  • High-speed data processing with reduced accuracy;
  • Effective methods of parallel computing organization.
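As a concrete illustration of the first item, a residue number system (RNS) replaces positional arithmetic with independent, carry-free operations on small residues, which is what makes it attractive for parallel hardware. A minimal sketch in Python (the moduli here are an arbitrary toy choice, not tied to any particular paper):

```python
from math import prod

MODULI = (3, 5, 7)  # pairwise coprime; dynamic range M = 105

def to_rns(x):
    """Encode an integer as a tuple of residues, one per modulus."""
    return tuple(x % m for m in MODULI)

def rns_add(a, b):
    """Addition acts channel-wise: no carries propagate between residues."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, MODULI))

def rns_mul(a, b):
    """Multiplication is likewise independent per channel."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, MODULI))

def from_rns(r):
    """Decode via the Chinese Remainder Theorem."""
    M = prod(MODULI)
    x = 0
    for ri, m in zip(r, MODULI):
        Mi = M // m
        x += ri * Mi * pow(Mi, -1, m)  # modular inverse of Mi mod m
    return x % M
```

Because each residue channel is independent, the three small modular units can run fully in parallel, which is the property hardware implementations exploit.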

Consideration of the following issues is welcome from the point of view of practical aspects:

  • Approaches to the implementation of digital data processing methods in modern specialized devices such as ASIC and FPGA;
  • Architectures of data-processing units with reduced computational complexity.

Practical applications include:

  • Systems for processing signals, images, and video data;
  • High-performance devices for the registration of digital information;
  • Highly parallel multiprocessor computing systems;
  • Intelligent data-processing systems;
  • Data-transmission systems.

Dr. Pavel Lyakhov
Dr. Maxim Deryabin
Topic Editors

Keywords

  • high-speed data processing
  • high-performance devices
  • parallelized computing
  • non-traditional number systems
  • reduced accuracy
  • signal processing
  • image processing
  • video processing
  • ASIC
  • FPGA

Participating Journals

Journal Name | Impact Factor | CiteScore | Launched Year | First Decision (median) | APC
Electronics | 2.9 | 4.7 | 2012 | 15.6 days | CHF 2400
Applied Sciences | 2.7 | 4.5 | 2011 | 16.9 days | CHF 2400
Big Data and Cognitive Computing | 3.7 | 4.9 | 2017 | 18.2 days | CHF 1800
Mathematics | 2.4 | 3.5 | 2013 | 16.9 days | CHF 2600
Chips | - | - | 2022 | 15.0 days * | CHF 1000

* Median value for all MDPI journals in the second half of 2023.


Preprints.org is a multidiscipline platform providing a preprint service that is dedicated to sharing your research from the start and empowering your research journey.

MDPI Topics is cooperating with Preprints.org and has built a direct connection between MDPI journals and Preprints.org. Authors are encouraged to enjoy the benefits by posting a preprint at Preprints.org prior to publication:

  1. Immediately share your ideas ahead of publication and establish your research priority;
  2. Protect your idea with a time-stamped preprint record;
  3. Enhance the exposure and impact of your research;
  4. Receive feedback from your peers in advance;
  5. Have it indexed in Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (9 papers)

19 pages, 5239 KiB  
Article
Enhancing Regular Expression Processing through Field-Programmable Gate Array-Based Multi-Character Non-Deterministic Finite Automata
by Chuang Zhang, Xuebin Tang and Yuanxi Peng
Electronics 2024, 13(9), 1635; https://doi.org/10.3390/electronics13091635 - 24 Apr 2024
Abstract
This work investigates the advantages of FPGA-based Multi-Character Non-Deterministic Finite Automata (MC-NFA) for enhancing regular expression processing over traditional software-based methods. By integrating Field-Programmable Gate Arrays (FPGAs) within a data processing framework, our study showcases significant improvements in processing efficiency, accuracy, and resource utilization for complex pattern matching tasks. We present a novel approach that not only accelerates database and network security applications, but also contributes to the evolving landscape of computational efficiency and hardware acceleration. The findings illustrate that FPGA’s coherent access to main memory and the efficient use of resources lead to considerable gains in processing times and throughput for handling regular expressions, unaffected by expression complexity and driven primarily by dataset size and match location. Our research further introduces a phase shift compensation technique that elevates match accuracy to optimal levels, highlighting FPGA’s potential for real-time, accurate data processing. The study confirms that the benefits of using FPGA for these tasks do not linearly correlate with an increase in resource consumption, underscoring the technology’s efficiency. This paper not only solidifies the case for adopting FPGA technology in complex data processing tasks, but also lays the groundwork for future explorations into optimizing hardware accelerators for broader applications.
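As a software reference for the multi-character idea, the sketch below steps a tiny NFA (for the illustrative pattern a b* c, not one from the paper) two characters per transition by composing the one-character step with itself; hardware instead precomputes such multi-character transitions in logic, processing several input bytes per clock:

```python
# Single-character NFA for the pattern "a b* c": state -> {char -> next states}
DELTA = {
    0: {"a": {1}},
    1: {"b": {1}, "c": {2}},
}
ACCEPT = {2}

def step(states, ch):
    """One standard NFA step: union of transitions from all active states."""
    nxt = set()
    for s in states:
        nxt |= DELTA.get(s, {}).get(ch, set())
    return nxt

def match_1char(text):
    states = {0}
    for ch in text:
        states = step(states, ch)
    return bool(states & ACCEPT)

def match_2char(text):
    """Multi-character stepping: consume two characters per transition by
    composing the one-character step with itself (hardware precomputes
    this composed table and so halves the number of clocked steps)."""
    states = {0}
    i = 0
    while i + 1 < len(text):
        states = step(step(states, text[i]), text[i + 1])
        i += 2
    if i < len(text):               # odd-length tail: one 1-char step
        states = step(states, text[i])
    return bool(states & ACCEPT)
```

Both matchers accept exactly the same language; the multi-character version simply needs half as many sequential steps, which is where the FPGA throughput gain comes from.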
(This article belongs to the Topic Theory and Applications of High Performance Computing)

32 pages, 9164 KiB  
Article
Performance Analysis and Improvement for CRUD Operations in Relational Databases from Java Programs Using JPA, Hibernate, Spring Data JPA
by Alexandru Marius Bonteanu and Cătălin Tudose
Appl. Sci. 2024, 14(7), 2743; https://doi.org/10.3390/app14072743 - 25 Mar 2024
Abstract
The role of databases is to allow for the persistence of data, whether of the SQL or NoSQL type. In SQL databases, data are structured in a set of tables in the relational model, grouped in rows and columns. CRUD operations (create, read, update, and delete) manage the information contained in relational databases. Several dialects of the SQL language exist, as well as frameworks for mapping Java classes (models) to a relational database. The question is what we should choose for our Java application, and why. A comparison of the most frequently used relational database management systems, combined with the most frequently used frameworks, gives some guidance about when to use what. The evaluation is based on the time taken by each CRUD operation, from thousands to hundreds of thousands of entries, across the possible combinations of database system and framework. Aiming to assess and improve performance, the experiments included warming up the Java Virtual Machine before executing the queries. The research also measured the time spent in different regions of the code to locate bottlenecks. The conclusions provide a comprehensive overview of the performance of Java applications accessing databases depending on the database type, the framework in use, and the type of operation, with clear comparisons between the alternatives and key findings on the advantages and drawbacks of each, supporting architects and developers in their technological decisions and in improving the speed of their programs.
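The warm-up idea generalizes beyond the JVM: discard the first few timed runs so caches, JIT compilers, and connection state settle before measuring. A minimal sketch in Python with sqlite3 (a stand-in for the paper's Java/JPA setup; the table, row counts, and run counts are arbitrary):

```python
import sqlite3
import time

def bench(op, runs=5, warmup=2):
    """Time an operation, discarding warm-up runs (the paper's JVM
    warm-up idea, sketched here with Python and sqlite3)."""
    times = []
    for i in range(warmup + runs):
        t0 = time.perf_counter()
        op()
        if i >= warmup:             # keep only post-warm-up timings
            times.append(time.perf_counter() - t0)
    return sum(times) / len(times)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

def create():
    conn.executemany("INSERT INTO users (name) VALUES (?)",
                     [("u%d" % i,) for i in range(1000)])

def read():
    conn.execute("SELECT COUNT(*) FROM users").fetchone()

print("create:", bench(create), "s/run")
print("read:  ", bench(read), "s/run")
```

The same harness applies to update and delete statements; averaging only the post-warm-up runs is what makes results comparable across database/framework combinations.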
(This article belongs to the Topic Theory and Applications of High Performance Computing)

10 pages, 1557 KiB  
Article
High-Speed Wavelet Image Processing Using the Winograd Method with Downsampling
by Pavel Lyakhov, Nataliya Semyonova, Nikolay Nagornov, Maxim Bergerman and Albina Abdulsalyamova
Mathematics 2023, 11(22), 4644; https://doi.org/10.3390/math11224644 - 14 Nov 2023
Abstract
Wavelets are actively used to solve a wide range of image processing problems in various fields of science and technology. Modern image processing systems cannot keep up with the rapid growth in digital visual information. Various approaches are used to reduce the computational complexity and increase computational speeds. The Winograd method (WM) is one of the most promising. However, this method is used to obtain sequential values. Its use for wavelet image processing requires expanding the calculation methodology to cases of downsampling. This paper proposes a new approach to reduce the computational complexity of wavelet image processing based on the WM with decimation. Calculations have been carried out and formulas have been derived that implement digital filtering using the WM with downsampling. The derived formulas can be used for 1D filtering with an arbitrary downsampling stride. Hardware modeling of wavelet image filtering on an FPGA showed that the WM reduces the computational time by up to 66%, with increases in the hardware costs and power consumption of 95% and 344%, respectively, compared to the direct method. A promising direction for further research is the implementation of the developed approach on ASIC and the use of modular computing for more efficient parallelization of calculations and an even greater increase in the device speed.
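For readers unfamiliar with the WM, the classic F(2,3) tile computes two neighbouring outputs of a 3-tap filter with four multiplications instead of six; under stride-2 downsampling only one output of each pair is kept. The sketch below shows only this baseline idea, decimating after the tile, whereas the paper's derived formulas go further and avoid computing the discarded outputs at all:

```python
def winograd_f23(d, g):
    """Winograd F(2,3): two correlation outputs from a 4-sample tile
    with 4 multiplications instead of 6."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]

def conv_stride(x, g, stride):
    """Direct 3-tap correlation with downsampling (reference)."""
    return [sum(x[i + k] * g[k] for k in range(3))
            for i in range(0, len(x) - 2, stride)]

def winograd_downsampled(x, g):
    """Stride-2 filtering via Winograd tiles: keep only the even output
    of each tile (a plain sketch of combining the WM with decimation)."""
    out = []
    for i in range(0, len(x) - 3, 2):
        out.append(winograd_f23(x[i:i + 4], g)[0])
    return out
```

Verifying the tile against direct correlation (e.g. with g = [1, 1, 1]) confirms the algebra; in hardware the saved multiplications translate directly into fewer DSP blocks per output.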
(This article belongs to the Topic Theory and Applications of High Performance Computing)

14 pages, 912 KiB  
Article
Improved Parallel Implementation of 1D Discrete Wavelet Transform Using CPU-GPU
by Eduardo Rodriguez-Martinez, Cesar Benavides-Alvarez, Carlos Aviles-Cruz, Fidel Lopez-Saca and Andres Ferreyra-Ramirez
Electronics 2023, 12(16), 3400; https://doi.org/10.3390/electronics12163400 - 10 Aug 2023
Abstract
This work describes a data-level parallelization strategy to accelerate the discrete wavelet transform (DWT), implemented and compared on two multi-threaded shared-memory architectures: a multi-core server and a graphics processing unit (GPU). The main goal of the research is to improve the computation times of popular DWT algorithms on representative modern GPU architectures. Comparisons were based on performance metrics (i.e., execution time, speedup, efficiency, and cost) for five decomposition levels of the DWT Daubechies db6 over random arrays of lengths 10^3, 10^4, 10^5, 10^6, 10^7, 10^8, and 10^9. The execution times in our proposed GPU strategy were around 1.2 × 10^5 s, compared to 3501 × 10^5 s for the sequential implementation. On the other hand, the maximum achievable speedup and efficiency were reached by our proposed multi-core strategy with 32 assigned threads.
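Data-level parallelization here means splitting the signal into chunks and transforming each concurrently. The toy below uses the Haar wavelet (filter length 2, so chunks need no halo samples; the paper's db6 filter would require overlap between neighbouring chunks) with Python threads standing in for the CPU/GPU implementations:

```python
import math
from concurrent.futures import ThreadPoolExecutor

H = 1 / math.sqrt(2)

def haar_level(x):
    """One sequential DWT level (Haar): approximation and detail bands."""
    a = [(x[2 * i] + x[2 * i + 1]) * H for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) * H for i in range(len(x) // 2)]
    return a, d

def haar_level_parallel(x, workers=4):
    """Data-level parallel version: split the signal into even-aligned
    chunks and transform them concurrently (assumes len(x) >= 2*workers)."""
    n = len(x) // 2 // workers * 2        # even chunk length per worker
    chunks = [x[i * n:(i + 1) * n] for i in range(workers - 1)]
    chunks.append(x[(workers - 1) * n:])  # last worker takes the remainder
    with ThreadPoolExecutor(max_workers=workers) as ex:
        parts = list(ex.map(haar_level, chunks))
    a = [v for pa, _ in parts for v in pa]
    d = [v for _, pd in parts for v in pd]
    return a, d
```

With Haar each chunk is fully independent; for longer filters such as db6 each chunk additionally needs a few trailing samples from its neighbour, which is the main bookkeeping cost of the strategy.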
(This article belongs to the Topic Theory and Applications of High Performance Computing)

14 pages, 2155 KiB  
Article
Reinforcement Learning for Reducing the Interruptions and Increasing Fault Tolerance in the Cloud Environment
by Prathamesh Lahande, Parag Kaveri and Jatinderkumar Saini
Informatics 2023, 10(3), 64; https://doi.org/10.3390/informatics10030064 - 02 Aug 2023
Abstract
Cloud computing delivers robust computational services by processing tasks on its virtual machines (VMs) using resource-scheduling algorithms. The cloud’s existing algorithms provide limited results due to inappropriate resource scheduling, and they cannot process tasks that generate faults while being computed. The primary reason is that these algorithms lack an intelligence mechanism to enhance their abilities. To improve the resource-scheduling process and provide a fault-tolerance mechanism, an algorithm named reinforcement learning-shortest job first (RL-SJF) has been implemented by integrating the RL technique with the existing SJF algorithm. An experiment was conducted on a simulation platform to compare RL-SJF with SJF, with challenging tasks computed in multiple scenarios. The experimental results convey that the RL-SJF algorithm enhances the resource-scheduling process, improving the aggregate cost by 14.88% compared to the SJF algorithm. Additionally, RL-SJF provided a fault-tolerance mechanism, computing 55.52% of the total tasks compared to 11.11% for SJF. Thus, the RL-SJF algorithm improves overall cloud performance and provides the ideal quality of service (QoS).
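The RL ingredient can be sketched as a one-state Q-learning agent that assigns jobs to VMs and learns from the negative execution time as reward. Everything below (two VMs, their speeds, the reward shape) is an illustrative assumption, not the paper's actual setup:

```python
import random

def rl_schedule(episodes=2000, alpha=0.1, eps=0.1, seed=0):
    """Toy sketch of RL-guided VM selection: epsilon-greedy Q-learning
    over two VMs, rewarded with negative execution time. Illustrative
    only; the paper couples RL feedback with the SJF queue."""
    rng = random.Random(seed)
    vm_speed = [1.0, 4.0]          # hypothetical: VM 1 is 4x faster
    q = [0.0, 0.0]                 # one state, two actions (pick a VM)
    for _ in range(episodes):
        if rng.random() < eps:                       # explore
            a = rng.randrange(2)
        else:                                        # exploit
            a = max((0, 1), key=lambda i: q[i])
        job = rng.uniform(1, 10)                     # job length
        reward = -job / vm_speed[a]                  # faster VM => less cost
        q[a] += alpha * (reward - q[a])              # running-average update
    return q
```

After enough episodes the learned values rank the faster VM above the slower one, which is the feedback signal that lets an RL-augmented scheduler steer work away from poor placements.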
(This article belongs to the Topic Theory and Applications of High Performance Computing)

20 pages, 1917 KiB  
Article
An FPGA Architecture for the RRT Algorithm Based on Membrane Computing
by Zeyi Shang, Zhe Wei, Sergey Verlan, Jianming Li and Zhige He
Electronics 2023, 12(12), 2741; https://doi.org/10.3390/electronics12122741 - 20 Jun 2023
Abstract
This paper investigates an FPGA architecture whose primary function is to accelerate the parallel computations involved in the rapidly-exploring random tree (RRT) algorithm. The RRT algorithm is inherently serial, yet each computing step contains many computations that can be executed simultaneously. How to carry out these parallel computations on an FPGA so that a high degree of acceleration can be realized is the key issue. Membrane computing is a parallel computing paradigm inspired by the structures and functions of eukaryotic cells. As a recently proposed membrane computing model, the generalized numerical P system (GNPS) is intrinsically parallel, making it a good candidate for modeling the parallel computations in the RRT algorithm. Open problems for the FPGA implementation of the RRT algorithm and GNPS include: (1) whether it is possible to model the RRT with a GNPS; (2) if so, how to design an FPGA architecture that achieves a better speedup; and (3) instead of implementing GNPSs in a fixed-point number format, how to devise a GNPS FPGA architecture working in a floating-point number format. In this paper, we first modeled the RRT with a GNPS, showing that such modeling is feasible. An FPGA architecture was then built according to the GNPS-modeled RRT. In this architecture, computations that can be executed in parallel are accommodated in different inner membranes of the GNPS. These membranes are designed as Verilog modules at the register transfer level. All the computations within a membrane are triggered by the same clock impulse to implement parallel computing. The proposed architecture is validated by implementing it on the Xilinx VC707 FPGA evaluation board. Compared with the software simulation of the GNPS-modeled RRT, the FPGA architecture achieves a speedup of four orders of magnitude (10^4). Although this speedup is obtained on a small map, it indicates that this architecture promises to accelerate the RRT algorithm beyond previously reported architectures.
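For reference, the RRT iteration being accelerated looks as follows in software; the nearest-neighbour distance computations inside the loop are the naturally parallel part that the architecture maps onto concurrently evaluated membranes. Map size, step length, and tolerance below are arbitrary:

```python
import math
import random

def rrt(start, goal, n_iter=300, step=0.5, goal_tol=0.5, seed=1):
    """Minimal 2D RRT on an obstacle-free map: sample a point, find the
    nearest tree node, steer one step toward the sample."""
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}
    for _ in range(n_iter):
        sample = (rng.uniform(0, 10), rng.uniform(0, 10))
        # Nearest-neighbour search: the distance to every existing node
        # is independent, so all distances can be computed in parallel.
        i = min(range(len(nodes)),
                key=lambda k: math.dist(nodes[k], sample))
        nx, ny = nodes[i]
        d = math.dist((nx, ny), sample)
        t = min(1.0, step / d) if d > 0 else 0.0
        new = (nx + t * (sample[0] - nx), ny + t * (sample[1] - ny))
        parent[len(nodes)] = i
        nodes.append(new)
        if math.dist(new, goal) < goal_tol:
            break
    return nodes, parent
```

Because the distance loop grows with the tree, it dominates runtime on large trees, which is why evaluating all distances in one hardware cycle yields such large speedups.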
(This article belongs to the Topic Theory and Applications of High Performance Computing)

17 pages, 7511 KiB  
Article
Acceleration of a Production-Level Unstructured Grid Finite Volume CFD Code on GPU
by Jian Zhang, Zhe Dai, Ruitian Li, Liang Deng, Jie Liu and Naichun Zhou
Appl. Sci. 2023, 13(10), 6193; https://doi.org/10.3390/app13106193 - 18 May 2023
Abstract
Due to complex topological relationships, poor data locality, and data-racing problems in unstructured CFD computing, parallelizing finite volume method algorithms in shared memory to efficiently exploit the hardware capabilities of many-core GPUs is a significant challenge. Based on production-level unstructured CFD software, three shared-memory parallel programming strategies (atomic operation, colouring, and reduction) were designed and implemented through a deep analysis of its computing behaviour and memory access patterns. Several data locality optimization methods were proposed: grid reordering, loop fusion, and multi-level memory access. To address the sequential nature of the LU-SGS solution, two methods based on cell colouring and hyperplanes were implemented. All the parallel methods and optimization techniques were comprehensively analysed and evaluated on three-dimensional grids of the M6 wing and the CHN-T1 aeroplane. The results show that the Cuthill–McKee grid renumbering and loop fusion optimizations improve memory access performance by 10%. The proposed reduction strategy, combined with multi-level memory access optimization, has a significant acceleration effect, speeding up the hot-spot subroutine with data races by a factor of three. Compared with the serial CPU version, the overall speed-up of the GPU code reaches 127; compared with the parallel CPU version, it exceeds thirty times at the same number of Message Passing Interface (MPI) ranks.
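The colouring strategy can be sketched as a greedy graph colouring of the cell-adjacency graph: cells sharing a face receive different colours, so all cells of one colour can be updated concurrently without data races. A minimal sketch (the tiny chain graph below is illustrative only):

```python
def greedy_colouring(adjacency):
    """Greedy colouring of grid cells: assign each cell the smallest
    colour not used by an already-coloured neighbour. Cells of equal
    colour share no face, so they can be updated race-free in parallel."""
    colour = {}
    for cell in adjacency:
        used = {colour[n] for n in adjacency[cell] if n in colour}
        c = 0
        while c in used:
            c += 1
        colour[cell] = c
    return colour

# A tiny 1D chain of cells, 0-1-2-3, standing in for a face-adjacency graph
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

A parallel sweep then iterates over colours sequentially but processes all cells within a colour simultaneously, trading a few extra kernel launches for the elimination of atomic operations.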
(This article belongs to the Topic Theory and Applications of High Performance Computing)

16 pages, 3095 KiB  
Article
A High-Performance Computing Cluster for Distributed Deep Learning: A Practical Case of Weed Classification Using Convolutional Neural Network Models
by Manuel López-Martínez, Germán Díaz-Flórez, Santiago Villagrana-Barraza, Luis O. Solís-Sánchez, Héctor A. Guerrero-Osuna, Genaro M. Soto-Zarazúa and Carlos A. Olvera-Olvera
Appl. Sci. 2023, 13(10), 6007; https://doi.org/10.3390/app13106007 - 13 May 2023
Abstract
One of the main concerns in precision agriculture (PA) is the growth of weeds within a crop field. To prevent the spread of weeds, automatic techniques and computational tools are currently used to help identify, classify, and detect the different types of weeds found in agricultural fields. One technology that can help process the digital information gathered from agricultural fields is high-performance computing (HPC), which has been adopted for projects requiring extra processing and storage to execute tasks with a large computational cost. This paper presents the implementation of an HPC cluster (HPCC) in which image processing (IP) and analysis are executed using deep learning (DL) techniques, specifically convolutional neural networks (CNNs) with the VGG16 and InceptionV3 models, to classify different weed species. The results show the great benefits of using HPC clusters in PA, specifically for classifying images. To apply distributed computing within the HPCC, the Keras and Horovod frameworks were used to train the CNN models; the best training time, 37 min 55.193 s, was obtained with the InceptionV3 model using six HPCC cores, with a resulting accuracy of 0.65.
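At its core, Horovod-style data parallelism averages the per-worker gradients after each training step so that every replica applies the same update. A minimal sketch of that reduction, with plain Python lists standing in for tensors and for a real ring-allreduce:

```python
def allreduce_average(grads_per_worker):
    """Element-wise average of each worker's gradient, returned to every
    worker -- the reduction that Horovod performs (efficiently, via
    ring-allreduce) between the backward pass and the weight update."""
    n = len(grads_per_worker)
    length = len(grads_per_worker[0])
    avg = [sum(g[i] for g in grads_per_worker) / n for i in range(length)]
    # every worker receives an identical copy of the averaged gradient
    return [avg[:] for _ in range(n)]
```

Because all replicas then apply identical updates, the distributed run stays mathematically equivalent to a single large-batch run, which is what lets the cluster cut wall-clock training time.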
(This article belongs to the Topic Theory and Applications of High Performance Computing)

16 pages, 1187 KiB  
Article
PGA: A New Hybrid PSO and GA Method for Task Scheduling with Deadline Constraints in Distributed Computing
by Kaili Shao, Ying Song and Bo Wang
Mathematics 2023, 11(6), 1548; https://doi.org/10.3390/math11061548 - 22 Mar 2023
Abstract
Distributed computing, e.g., cluster and cloud computing, has been applied in almost all areas of data processing, while high resource efficiency and user satisfaction remain the ambitions of distributed computing. Task scheduling is indispensable for achieving these goals. As the task scheduling problem is NP-hard, heuristics and meta-heuristics are frequently applied, and every method has its own advantages and limitations. In this paper, we therefore designed a hybrid heuristic task scheduling method, PGA, by exploiting the high global search ability of the Genetic Algorithm (GA) and the fast convergence of Particle Swarm Optimization (PSO). Different from existing hybrid heuristic approaches that simply perform two or more algorithms in sequence, the PGA applies the evolutionary method of a GA and integrates self- and social cognition into the evolution. We conducted extensive simulations for the performance evaluation, with simulation parameters set with reference to recent related works. Experimental results show that the PGA achieves 27.9–65.4% and 33.8–69.6% better performance than several recent works, on average, in user satisfaction and resource efficiency, respectively.
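The hybrid idea can be sketched as a GA-style population whose offspring are additionally pulled toward their personal best and the global best, as in PSO. The toy below minimizes a sphere function; the paper's actual operators and scheduling objective are more elaborate:

```python
import random

def pga_minimise(f, dim=4, pop=20, iters=100, seed=0):
    """Toy hybrid in the spirit of PGA: GA crossover produces a child,
    then a PSO-style cognitive/social pull moves it toward the personal
    and global bests; greedy acceptance keeps only improvements."""
    rng = random.Random(seed)
    X = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop)]
    pbest = [x[:] for x in X]
    gbest = min(X, key=f)[:]
    for _ in range(iters):
        for i in range(pop):
            # GA step: uniform crossover with a random mate
            mate = X[rng.randrange(pop)]
            child = [xi if rng.random() < 0.5 else mi
                     for xi, mi in zip(X[i], mate)]
            # PSO step: pull toward personal best and global best
            child = [c + rng.random() * (pb - c) + rng.random() * (gb - c)
                     for c, pb, gb in zip(child, pbest[i], gbest)]
            if f(child) < f(X[i]):          # greedy acceptance
                X[i] = child
            if f(X[i]) < f(pbest[i]):
                pbest[i] = X[i][:]
        gbest = min(pbest, key=f)[:]
    return gbest
```

Integrating the pull into the evolutionary step, rather than running GA and PSO back to back, is the design point the abstract contrasts with prior sequential hybrids.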
(This article belongs to the Topic Theory and Applications of High Performance Computing)
