Topic Editors

Department of Applied Mathematics and Mathematical Modeling, North-Caucasus Federal University, 355009 Stavropol, Russia
Computing Platform Lab, Samsung Advanced Institute of Technology, Samsung Electronics, Suwon 16678, Republic of Korea

Theory and Applications of High Performance Computing

Abstract submission deadline
31 August 2024
Manuscript submission deadline
30 November 2024

Topic Information

Dear Colleagues,

The performance of computing devices is an important topic in the context of global digitalization and the widespread introduction of digital data processing systems. Device speed does not keep pace with the growth of information that must be registered, stored, processed, and transmitted, and this insufficient performance has become a central problem of modern computer technologies, placing it at the forefront of many areas of modern science and technology. Increases in computing speed are achieved through many different approaches: parallelized computation, reduced-accuracy representation and processing of digital data, non-traditional number systems, modification of the computing blocks of existing hardware architectures, and more.

The subject of “Theory and Applications of High Performance Computing” is interdisciplinary in nature and welcomes articles on theoretical aspects, practical aspects, and applications of modern computer technologies.

The following topics are considered from the point of view of theoretical aspects, among others:

  • The organization of computations using non-traditional number systems;
  • High-speed data processing with reduced accuracy;
  • Effective methods of parallel computing organization.
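As a concrete illustration of the first item, a residue number system (RNS) replaces positional arithmetic with independent, carry-free operations on small residues, which is what makes it attractive for parallel hardware. A minimal sketch in Python (the moduli here are an arbitrary toy choice, not tied to any particular paper):

```python
from math import prod

MODULI = (3, 5, 7)  # pairwise coprime; dynamic range M = 105

def to_rns(x):
    """Encode an integer as a tuple of residues, one per modulus."""
    return tuple(x % m for m in MODULI)

def rns_add(a, b):
    """Addition acts channel-wise: no carries propagate between residues."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, MODULI))

def rns_mul(a, b):
    """Multiplication is likewise independent per channel."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, MODULI))

def from_rns(r):
    """Decode via the Chinese Remainder Theorem."""
    M = prod(MODULI)
    x = 0
    for ri, m in zip(r, MODULI):
        Mi = M // m
        x += ri * Mi * pow(Mi, -1, m)  # modular inverse of Mi mod m
    return x % M
```

Because each residue channel is independent, the three small modular units can run fully in parallel, which is the property hardware implementations exploit.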

Consideration of the following issues is welcome from the point of view of practical aspects:

  • Approaches to the implementation of digital data processing methods in modern specialized devices such as ASIC and FPGA;
  • Architectures of data-processing units with reduced computational complexity.

Practical applications include:

  • Systems for processing signals, images, and video data;
  • High-performance devices for the registration of digital information;
  • Highly parallel multiprocessor computing systems;
  • Intelligent data-processing systems;
  • Data-transmission systems.

Dr. Pavel Lyakhov
Dr. Maxim Deryabin
Topic Editors

Keywords

  • high-speed data processing
  • high-performance devices
  • parallelized computing
  • non-traditional number systems
  • reduced accuracy
  • signal processing
  • image processing
  • video processing
  • ASIC
  • FPGA

Participating Journals

Journal Name | Impact Factor | CiteScore | Launched Year | First Decision (median) | APC
Electronics | 2.9 | 4.7 | 2012 | 15.6 days | CHF 2400
Applied Sciences | 2.7 | 4.5 | 2011 | 16.9 days | CHF 2400
Big Data and Cognitive Computing | 3.7 | 4.9 | 2017 | 18.2 days | CHF 1800
Mathematics | 2.4 | 3.5 | 2013 | 16.9 days | CHF 2600
Chips | - | - | 2022 | 15.0 days * | CHF 1000

* Median value for all MDPI journals in the second half of 2023.


Preprints.org is a multidiscipline platform providing a preprint service that is dedicated to sharing your research from the start and empowering your research journey.

MDPI Topics is cooperating with Preprints.org and has built a direct connection between MDPI journals and Preprints.org. Authors are encouraged to enjoy the benefits by posting a preprint at Preprints.org prior to publication:

  1. Immediately share your ideas ahead of publication and establish your research priority;
  2. Protect your idea with a time-stamped preprint record;
  3. Enhance the exposure and impact of your research;
  4. Receive feedback from your peers in advance;
  5. Have it indexed in Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (9 papers)

19 pages, 5239 KiB  
Article
Enhancing Regular Expression Processing through Field-Programmable Gate Array-Based Multi-Character Non-Deterministic Finite Automata
by Chuang Zhang, Xuebin Tang and Yuanxi Peng
Electronics 2024, 13(9), 1635; https://doi.org/10.3390/electronics13091635 - 24 Apr 2024
Abstract
This work investigates the advantages of FPGA-based Multi-Character Non-Deterministic Finite Automata (MC-NFA) for enhancing regular expression processing over traditional software-based methods. By integrating Field-Programmable Gate Arrays (FPGAs) within a data processing framework, our study showcases significant improvements in processing efficiency, accuracy, and resource utilization for complex pattern matching tasks. We present a novel approach that not only accelerates database and network security applications, but also contributes to the evolving landscape of computational efficiency and hardware acceleration. The findings illustrate that FPGA’s coherent access to main memory and the efficient use of resources lead to considerable gains in processing times and throughput for handling regular expressions, unaffected by expression complexity and driven primarily by dataset size and match location. Our research further introduces a phase shift compensation technique that elevates match accuracy to optimal levels, highlighting FPGA’s potential for real-time, accurate data processing. The study confirms that the benefits of using FPGA for these tasks do not linearly correlate with an increase in resource consumption, underscoring the technology’s efficiency. This paper not only solidifies the case for adopting FPGA technology in complex data processing tasks, but also lays the groundwork for future explorations into optimizing hardware accelerators for broader applications.
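As a software reference for the multi-character idea, the sketch below steps a tiny NFA (for the illustrative pattern a b* c, not one from the paper) two characters per transition by composing the one-character step with itself; hardware instead precomputes such multi-character transitions in logic, processing several input bytes per clock:

```python
# Single-character NFA for the pattern "a b* c": state -> {char -> next states}
DELTA = {
    0: {"a": {1}},
    1: {"b": {1}, "c": {2}},
}
ACCEPT = {2}

def step(states, ch):
    """One standard NFA step: union of transitions from all active states."""
    nxt = set()
    for s in states:
        nxt |= DELTA.get(s, {}).get(ch, set())
    return nxt

def match_1char(text):
    states = {0}
    for ch in text:
        states = step(states, ch)
    return bool(states & ACCEPT)

def match_2char(text):
    """Multi-character stepping: consume two characters per transition by
    composing the one-character step with itself (hardware precomputes
    this composed table and so halves the number of clocked steps)."""
    states = {0}
    i = 0
    while i + 1 < len(text):
        states = step(step(states, text[i]), text[i + 1])
        i += 2
    if i < len(text):               # odd-length tail: one 1-char step
        states = step(states, text[i])
    return bool(states & ACCEPT)
```

Both matchers accept exactly the same language; the multi-character version simply needs half as many sequential steps, which is where the FPGA throughput gain comes from.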
(This article belongs to the Topic Theory and Applications of High Performance Computing)

32 pages, 9164 KiB  
Article
Performance Analysis and Improvement for CRUD Operations in Relational Databases from Java Programs Using JPA, Hibernate, Spring Data JPA
by Alexandru Marius Bonteanu and Cătălin Tudose
Appl. Sci. 2024, 14(7), 2743; https://doi.org/10.3390/app14072743 - 25 Mar 2024
Abstract
The role of databases is to allow for the persistence of data, whether of the SQL or NoSQL type. In SQL databases, data are structured in a set of tables in the relational model, grouped in rows and columns. CRUD operations (create, read, update, and delete) manage the information contained in relational databases. Several dialects of the SQL language exist, as well as frameworks for mapping Java classes (models) to a relational database. The question is what we should choose for our Java application, and why. A comparison of the most frequently used relational database management systems, combined with the most frequently used frameworks, gives some guidance about when to use what. The evaluation is based on the time taken by each CRUD operation, from thousands to hundreds of thousands of entries, across the possible combinations of database system and framework. Aiming to assess and improve performance, the experiments included warming up the Java Virtual Machine before executing the queries. The research also measured the time spent in different regions of the code to locate bottlenecks. The conclusions provide a comprehensive overview of the performance of Java applications accessing databases depending on the database type, the framework in use, and the type of operation, with clear comparisons between the alternatives and key findings on the advantages and drawbacks of each, supporting architects and developers in their technological decisions and in improving the speed of their programs.
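The warm-up idea generalizes beyond the JVM: discard the first few timed runs so caches, JIT compilers, and connection state settle before measuring. A minimal sketch in Python with sqlite3 (a stand-in for the paper's Java/JPA setup; the table, row counts, and run counts are arbitrary):

```python
import sqlite3
import time

def bench(op, runs=5, warmup=2):
    """Time an operation, discarding warm-up runs (the paper's JVM
    warm-up idea, sketched here with Python and sqlite3)."""
    times = []
    for i in range(warmup + runs):
        t0 = time.perf_counter()
        op()
        if i >= warmup:             # keep only post-warm-up timings
            times.append(time.perf_counter() - t0)
    return sum(times) / len(times)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

def create():
    conn.executemany("INSERT INTO users (name) VALUES (?)",
                     [("u%d" % i,) for i in range(1000)])

def read():
    conn.execute("SELECT COUNT(*) FROM users").fetchone()

print("create:", bench(create), "s/run")
print("read:  ", bench(read), "s/run")
```

The same harness applies to update and delete statements; averaging only the post-warm-up runs is what makes results comparable across database/framework combinations.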
(This article belongs to the Topic Theory and Applications of High Performance Computing)

10 pages, 1557 KiB  
Article
High-Speed Wavelet Image Processing Using the Winograd Method with Downsampling
by Pavel Lyakhov, Nataliya Semyonova, Nikolay Nagornov, Maxim Bergerman and Albina Abdulsalyamova
Mathematics 2023, 11(22), 4644; https://doi.org/10.3390/math11224644 - 14 Nov 2023
Abstract
Wavelets are actively used to solve a wide range of image processing problems in various fields of science and technology. Modern image processing systems cannot keep up with the rapid growth in digital visual information. Various approaches are used to reduce the computational complexity and increase computational speeds. The Winograd method (WM) is one of the most promising. However, this method is used to obtain sequential values. Its use for wavelet image processing requires expanding the calculation methodology to cases of downsampling. This paper proposes a new approach to reduce the computational complexity of wavelet image processing based on the WM with decimation. Calculations have been carried out and formulas have been derived that implement digital filtering using the WM with downsampling. The derived formulas can be used for 1D filtering with an arbitrary downsampling stride. Hardware modeling of wavelet image filtering on an FPGA showed that the WM reduces the computational time by up to 66%, with increases in the hardware costs and power consumption of 95% and 344%, respectively, compared to the direct method. A promising direction for further research is the implementation of the developed approach on ASIC and the use of modular computing for more efficient parallelization of calculations and an even greater increase in the device speed.
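For readers unfamiliar with the WM, the classic F(2,3) tile computes two neighbouring outputs of a 3-tap filter with four multiplications instead of six; under stride-2 downsampling only one output of each pair is kept. The sketch below shows only this baseline idea, decimating after the tile, whereas the paper's derived formulas go further and avoid computing the discarded outputs at all:

```python
def winograd_f23(d, g):
    """Winograd F(2,3): two correlation outputs from a 4-sample tile
    with 4 multiplications instead of 6."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]

def conv_stride(x, g, stride):
    """Direct 3-tap correlation with downsampling (reference)."""
    return [sum(x[i + k] * g[k] for k in range(3))
            for i in range(0, len(x) - 2, stride)]

def winograd_downsampled(x, g):
    """Stride-2 filtering via Winograd tiles: keep only the even output
    of each tile (a plain sketch of combining the WM with decimation)."""
    out = []
    for i in range(0, len(x) - 3, 2):
        out.append(winograd_f23(x[i:i + 4], g)[0])
    return out
```

Verifying the tile against direct correlation (e.g. with g = [1, 1, 1]) confirms the algebra; in hardware the saved multiplications translate directly into fewer DSP blocks per output.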
(This article belongs to the Topic Theory and Applications of High Performance Computing)

14 pages, 912 KiB  
Article
Improved Parallel Implementation of 1D Discrete Wavelet Transform Using CPU-GPU
by Eduardo Rodriguez-Martinez, Cesar Benavides-Alvarez, Carlos Aviles-Cruz, Fidel Lopez-Saca and Andres Ferreyra-Ramirez
Electronics 2023, 12(16), 3400; https://doi.org/10.3390/electronics12163400 - 10 Aug 2023
Abstract
This work describes a data-level parallelization strategy to accelerate the discrete wavelet transform (DWT), implemented and compared on two multi-threaded shared-memory architectures: a multi-core server and a graphics processing unit (GPU). The main goal of the research is to improve the computation times of popular DWT algorithms on representative modern GPU architectures. Comparisons were based on performance metrics (i.e., execution time, speedup, efficiency, and cost) for five decomposition levels of the DWT Daubechies db6 over random arrays of lengths 10^3, 10^4, 10^5, 10^6, 10^7, 10^8, and 10^9. The execution times in our proposed GPU strategy were around 1.2 × 10^5 s, compared to 3501 × 10^5 s for the sequential implementation. On the other hand, the maximum achievable speedup and efficiency were reached by our proposed multi-core strategy with 32 assigned threads.
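Data-level parallelization here means splitting the signal into chunks and transforming each concurrently. The toy below uses the Haar wavelet (filter length 2, so chunks need no halo samples; the paper's db6 filter would require overlap between neighbouring chunks) with Python threads standing in for the CPU/GPU implementations:

```python
import math
from concurrent.futures import ThreadPoolExecutor

H = 1 / math.sqrt(2)

def haar_level(x):
    """One sequential DWT level (Haar): approximation and detail bands."""
    a = [(x[2 * i] + x[2 * i + 1]) * H for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) * H for i in range(len(x) // 2)]
    return a, d

def haar_level_parallel(x, workers=4):
    """Data-level parallel version: split the signal into even-aligned
    chunks and transform them concurrently (assumes len(x) >= 2*workers)."""
    n = len(x) // 2 // workers * 2        # even chunk length per worker
    chunks = [x[i * n:(i + 1) * n] for i in range(workers - 1)]
    chunks.append(x[(workers - 1) * n:])  # last worker takes the remainder
    with ThreadPoolExecutor(max_workers=workers) as ex:
        parts = list(ex.map(haar_level, chunks))
    a = [v for pa, _ in parts for v in pa]
    d = [v for _, pd in parts for v in pd]
    return a, d
```

With Haar each chunk is fully independent; for longer filters such as db6 each chunk additionally needs a few trailing samples from its neighbour, which is the main bookkeeping cost of the strategy.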
(This article belongs to the Topic Theory and Applications of High Performance Computing)

14 pages, 2155 KiB  
Article
Reinforcement Learning for Reducing the Interruptions and Increasing Fault Tolerance in the Cloud Environment
by Prathamesh Lahande, Parag Kaveri and Jatinderkumar Saini
Informatics 2023, 10(3), 64; https://doi.org/10.3390/informatics10030064 - 02 Aug 2023
Abstract
Cloud computing delivers robust computational services by processing tasks on its virtual machines (VMs) using resource-scheduling algorithms. The cloud’s existing algorithms provide limited results due to inappropriate resource scheduling, and they cannot process tasks that generate faults while being computed. The primary reason is that these algorithms lack an intelligence mechanism to enhance their abilities. To improve the resource-scheduling process and provide a fault-tolerance mechanism, an algorithm named reinforcement learning-shortest job first (RL-SJF) has been implemented by integrating the RL technique with the existing SJF algorithm. An experiment was conducted on a simulation platform to compare RL-SJF with SJF, with challenging tasks computed in multiple scenarios. The experimental results convey that the RL-SJF algorithm enhances the resource-scheduling process, improving the aggregate cost by 14.88% compared to the SJF algorithm. Additionally, RL-SJF provided a fault-tolerance mechanism, computing 55.52% of the total tasks compared to 11.11% for SJF. Thus, the RL-SJF algorithm improves overall cloud performance and provides the ideal quality of service (QoS).
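The RL ingredient can be sketched as a one-state Q-learning agent that assigns jobs to VMs and learns from the negative execution time as reward. Everything below (two VMs, their speeds, the reward shape) is an illustrative assumption, not the paper's actual setup:

```python
import random

def rl_schedule(episodes=2000, alpha=0.1, eps=0.1, seed=0):
    """Toy sketch of RL-guided VM selection: epsilon-greedy Q-learning
    over two VMs, rewarded with negative execution time. Illustrative
    only; the paper couples RL feedback with the SJF queue."""
    rng = random.Random(seed)
    vm_speed = [1.0, 4.0]          # hypothetical: VM 1 is 4x faster
    q = [0.0, 0.0]                 # one state, two actions (pick a VM)
    for _ in range(episodes):
        if rng.random() < eps:                       # explore
            a = rng.randrange(2)
        else:                                        # exploit
            a = max((0, 1), key=lambda i: q[i])
        job = rng.uniform(1, 10)                     # job length
        reward = -job / vm_speed[a]                  # faster VM => less cost
        q[a] += alpha * (reward - q[a])              # running-average update
    return q
```

After enough episodes the learned values rank the faster VM above the slower one, which is the feedback signal that lets an RL-augmented scheduler steer work away from poor placements.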
(This article belongs to the Topic Theory and Applications of High Performance Computing)

20 pages, 1917 KiB  
Article
An FPGA Architecture for the RRT Algorithm Based on Membrane Computing
by Zeyi Shang, Zhe Wei, Sergey Verlan, Jianming Li and Zhige He
Electronics 2023, 12(12), 2741; https://doi.org/10.3390/electronics12122741 - 20 Jun 2023
Abstract
This paper investigates an FPGA architecture whose primary function is to accelerate the parallel computations involved in the rapidly-exploring random tree (RRT) algorithm. The RRT algorithm is inherently serial, yet each computing step contains many computations that can be executed simultaneously. How to carry out these parallel computations on an FPGA so that a high degree of acceleration can be realized is the key issue. Membrane computing is a parallel computing paradigm inspired by the structures and functions of eukaryotic cells. As a recently proposed membrane computing model, the generalized numerical P system (GNPS) is intrinsically parallel, making it a good candidate for modeling the parallel computations in the RRT algorithm. Open problems for the FPGA implementation of the RRT algorithm and GNPS include: (1) whether it is possible to model the RRT with a GNPS; (2) if so, how to design an FPGA architecture that achieves a better speedup; and (3) instead of implementing GNPSs in a fixed-point number format, how to devise a GNPS FPGA architecture working in a floating-point number format. In this paper, we first modeled the RRT with a GNPS, showing that such modeling is feasible. An FPGA architecture was then built according to the GNPS-modeled RRT. In this architecture, computations that can be executed in parallel are accommodated in different inner membranes of the GNPS. These membranes are designed as Verilog modules at the register transfer level. All the computations within a membrane are triggered by the same clock impulse to implement parallel computing. The proposed architecture is validated by implementing it on the Xilinx VC707 FPGA evaluation board. Compared with the software simulation of the GNPS-modeled RRT, the FPGA architecture achieves a speedup of four orders of magnitude (10^4). Although this speedup is obtained on a small map, it indicates that this architecture promises to accelerate the RRT algorithm beyond previously reported architectures.
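For reference, the RRT iteration being accelerated looks as follows in software; the nearest-neighbour distance computations inside the loop are the naturally parallel part that the architecture maps onto concurrently evaluated membranes. Map size, step length, and tolerance below are arbitrary:

```python
import math
import random

def rrt(start, goal, n_iter=300, step=0.5, goal_tol=0.5, seed=1):
    """Minimal 2D RRT on an obstacle-free map: sample a point, find the
    nearest tree node, steer one step toward the sample."""
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}
    for _ in range(n_iter):
        sample = (rng.uniform(0, 10), rng.uniform(0, 10))
        # Nearest-neighbour search: the distance to every existing node
        # is independent, so all distances can be computed in parallel.
        i = min(range(len(nodes)),
                key=lambda k: math.dist(nodes[k], sample))
        nx, ny = nodes[i]
        d = math.dist((nx, ny), sample)
        t = min(1.0, step / d) if d > 0 else 0.0
        new = (nx + t * (sample[0] - nx), ny + t * (sample[1] - ny))
        parent[len(nodes)] = i
        nodes.append(new)
        if math.dist(new, goal) < goal_tol:
            break
    return nodes, parent
```

Because the distance loop grows with the tree, it dominates runtime on large trees, which is why evaluating all distances in one hardware cycle yields such large speedups.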
(This article belongs to the Topic Theory and Applications of High Performance Computing)

17 pages, 7511 KiB  
Article
Acceleration of a Production-Level Unstructured Grid Finite Volume CFD Code on GPU
by Jian Zhang, Zhe Dai, Ruitian Li, Liang Deng, Jie Liu and Naichun Zhou
Appl. Sci. 2023, 13(10), 6193; https://doi.org/10.3390/app13106193 - 18 May 2023
Abstract
Due to complex topological relationships, poor data locality, and data-racing problems in unstructured CFD computing, parallelizing finite volume method algorithms in shared memory to efficiently exploit the hardware capabilities of many-core GPUs is a significant challenge. Based on production-level unstructured CFD software, three shared-memory parallel programming strategies (atomic operation, colouring, and reduction) were designed and implemented through a deep analysis of its computing behaviour and memory access patterns. Several data locality optimization methods were proposed: grid reordering, loop fusion, and multi-level memory access. To address the sequential nature of the LU-SGS solution, two methods based on cell colouring and hyperplanes were implemented. All the parallel methods and optimization techniques were comprehensively analysed and evaluated on three-dimensional grids of the M6 wing and the CHN-T1 aeroplane. The results show that the Cuthill–McKee grid renumbering and loop fusion optimizations improve memory access performance by 10%. The proposed reduction strategy, combined with multi-level memory access optimization, has a significant acceleration effect, speeding up the hot-spot subroutine with data races by a factor of three. Compared with the serial CPU version, the overall speed-up of the GPU code reaches 127; compared with the parallel CPU version, it exceeds thirty times at the same number of Message Passing Interface (MPI) ranks.
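The colouring strategy can be sketched as a greedy graph colouring of the cell-adjacency graph: cells sharing a face receive different colours, so all cells of one colour can be updated concurrently without data races. A minimal sketch (the tiny chain graph below is illustrative only):

```python
def greedy_colouring(adjacency):
    """Greedy colouring of grid cells: assign each cell the smallest
    colour not used by an already-coloured neighbour. Cells of equal
    colour share no face, so they can be updated race-free in parallel."""
    colour = {}
    for cell in adjacency:
        used = {colour[n] for n in adjacency[cell] if n in colour}
        c = 0
        while c in used:
            c += 1
        colour[cell] = c
    return colour

# A tiny 1D chain of cells, 0-1-2-3, standing in for a face-adjacency graph
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

A parallel sweep then iterates over colours sequentially but processes all cells within a colour simultaneously, trading a few extra kernel launches for the elimination of atomic operations.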
(This article belongs to the Topic Theory and Applications of High Performance Computing)

16 pages, 3095 KiB  
Article
A High-Performance Computing Cluster for Distributed Deep Learning: A Practical Case of Weed Classification Using Convolutional Neural Network Models
by Manuel López-Martínez, Germán Díaz-Flórez, Santiago Villagrana-Barraza, Luis O. Solís-Sánchez, Héctor A. Guerrero-Osuna, Genaro M. Soto-Zarazúa and Carlos A. Olvera-Olvera
Appl. Sci. 2023, 13(10), 6007; https://doi.org/10.3390/app13106007 - 13 May 2023
Abstract
One of the main concerns in precision agriculture (PA) is the growth of weeds within a crop field. To prevent the spread of weeds, automatic techniques and computational tools are currently used to help identify, classify, and detect the different types of weeds found in agricultural fields. One technology that can help process the digital information gathered from agricultural fields is high-performance computing (HPC), which has been adopted for projects requiring extra processing and storage to execute tasks with a large computational cost. This paper presents the implementation of an HPC cluster (HPCC) in which image processing (IP) and analysis are executed using deep learning (DL) techniques, specifically convolutional neural networks (CNNs) with the VGG16 and InceptionV3 models, to classify different weed species. The results show the great benefits of using HPC clusters in PA, specifically for classifying images. To apply distributed computing within the HPCC, the Keras and Horovod frameworks were used to train the CNN models; the best training time, 37 min 55.193 s, was obtained with the InceptionV3 model using six HPCC cores, with a resulting accuracy of 0.65.
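At its core, Horovod-style data parallelism averages the per-worker gradients after each training step so that every replica applies the same update. A minimal sketch of that reduction, with plain Python lists standing in for tensors and for a real ring-allreduce:

```python
def allreduce_average(grads_per_worker):
    """Element-wise average of each worker's gradient, returned to every
    worker -- the reduction that Horovod performs (efficiently, via
    ring-allreduce) between the backward pass and the weight update."""
    n = len(grads_per_worker)
    length = len(grads_per_worker[0])
    avg = [sum(g[i] for g in grads_per_worker) / n for i in range(length)]
    # every worker receives an identical copy of the averaged gradient
    return [avg[:] for _ in range(n)]
```

Because all replicas then apply identical updates, the distributed run stays mathematically equivalent to a single large-batch run, which is what lets the cluster cut wall-clock training time.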
(This article belongs to the Topic Theory and Applications of High Performance Computing)

16 pages, 1187 KiB  
Article
PGA: A New Hybrid PSO and GA Method for Task Scheduling with Deadline Constraints in Distributed Computing
by Kaili Shao, Ying Song and Bo Wang
Mathematics 2023, 11(6), 1548; https://doi.org/10.3390/math11061548 - 22 Mar 2023
Abstract
Distributed computing, e.g., cluster and cloud computing, has been applied in almost all areas of data processing, while high resource efficiency and user satisfaction remain the ambitions of distributed computing. Task scheduling is indispensable for achieving these goals. As the task scheduling problem is NP-hard, heuristics and meta-heuristics are frequently applied, and every method has its own advantages and limitations. In this paper, we therefore designed a hybrid heuristic task scheduling method, PGA, by exploiting the high global search ability of the Genetic Algorithm (GA) and the fast convergence of Particle Swarm Optimization (PSO). Different from existing hybrid heuristic approaches that simply perform two or more algorithms in sequence, the PGA applies the evolutionary method of a GA and integrates self- and social cognition into the evolution. We conducted extensive simulations for the performance evaluation, with simulation parameters set with reference to recent related works. Experimental results show that the PGA achieves 27.9–65.4% and 33.8–69.6% better performance than several recent works, on average, in user satisfaction and resource efficiency, respectively.
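The hybrid idea can be sketched as a GA-style population whose offspring are additionally pulled toward their personal best and the global best, as in PSO. The toy below minimizes a sphere function; the paper's actual operators and scheduling objective are more elaborate:

```python
import random

def pga_minimise(f, dim=4, pop=20, iters=100, seed=0):
    """Toy hybrid in the spirit of PGA: GA crossover produces a child,
    then a PSO-style cognitive/social pull moves it toward the personal
    and global bests; greedy acceptance keeps only improvements."""
    rng = random.Random(seed)
    X = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop)]
    pbest = [x[:] for x in X]
    gbest = min(X, key=f)[:]
    for _ in range(iters):
        for i in range(pop):
            # GA step: uniform crossover with a random mate
            mate = X[rng.randrange(pop)]
            child = [xi if rng.random() < 0.5 else mi
                     for xi, mi in zip(X[i], mate)]
            # PSO step: pull toward personal best and global best
            child = [c + rng.random() * (pb - c) + rng.random() * (gb - c)
                     for c, pb, gb in zip(child, pbest[i], gbest)]
            if f(child) < f(X[i]):          # greedy acceptance
                X[i] = child
            if f(X[i]) < f(pbest[i]):
                pbest[i] = X[i][:]
        gbest = min(pbest, key=f)[:]
    return gbest
```

Integrating the pull into the evolutionary step, rather than running GA and PSO back to back, is the design point the abstract contrasts with prior sequential hybrids.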
(This article belongs to the Topic Theory and Applications of High Performance Computing)
