High-Performance Computer Architectures and Applications

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: closed (31 August 2021) | Viewed by 37640

Special Issue Editor


Dr. Antonio F. Díaz
Guest Editor
Department of Computer Architecture and Technology/CITIC, University of Granada, 18071 Granada, Spain
Interests: computer architecture; communication networks; distributed systems; security; neutrino telescopes

Special Issue Information

Dear Colleagues,

High-performance computing (HPC) offers stunning opportunities to solve complex problems thanks to the use of advanced hardware platforms, parallelization, distributed programming, and algorithms. Computational power rises steadily due to the continuous evolution of computer architectures, networks, and software. However, taking advantage of this wealth of resources is a great challenge.

Scientists and engineers whose creative vision keeps them one step ahead of the latest technologies can offer new solutions that ultimately improve the wellbeing of our society. A combination of these technologies and algorithms can unravel a broad spectrum of tough problems through simulation models, optimization techniques, extensive dataset processing, or machine learning. Furthermore, related issues have to be considered, such as energy-efficient solutions, secure frameworks, scalability, load balancing, I/O systems, and cloud platforms, among others.

This Special Issue aims to gather innovative contributions, with new approaches, proposals, techniques, and applications in this field. The topics of interest include but are not limited to:

- High-performance computer architecture;

- High-performance computer system software;

- Multicore and multithreaded architecture methods;

- Applications related to high-performance computing;

- Distributed systems;

- High-performance computing in cloud platforms;

- Data-intensive computing;

- Optimal energy solutions;

- Energy-aware scheduling;

- Accelerators such as GPUs and TPUs.

Dr. Antonio F. Díaz
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • High-performance computing
  • Cluster computing
  • Simulation tools
  • Distributed computing
  • Parallelization techniques
  • Computer architecture
  • Computer networks
  • GPU
  • Energy-aware scheduling
  • HPC-I/O systems

Published Papers (14 papers)


Research

13 pages, 13276 KiB  
Article
Performance Evaluation of NVMe-over-TCP Using Journaling File Systems in International WAN
by Se-young Yu
Electronics 2021, 10(20), 2486; https://doi.org/10.3390/electronics10202486 - 13 Oct 2021
Viewed by 2117
Abstract
Distributing Big Data for science is pushing the capabilities of networks and computing systems. However, the fundamental concept of copying data from one machine to another has not been challenged in collaborative science. As recent storage system development uses modern fabrics to provide faster remote data access with lower overhead, traditional data movement using Data Transfer Nodes (DTNs) must cope with the paradigm shift from a store-and-forward model to streaming data with direct storage access over the networks. This study evaluates NVMe-over-TCP (NVMe-TCP) in a long-distance network using different file systems and configurations to characterize remote NVMe file system access performance in MAN and WAN data-moving scenarios. We found that NVMe-TCP is more suitable for remote data reads than remote data writes over the networks, and that using RAID0 can significantly improve performance in a long-distance network. Additionally, fine-tuning the file system can improve remote write performance in DTNs over a long-distance network.
(This article belongs to the Special Issue High-Performance Computer Architectures and Applications)
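
As a rough sketch of the read-side measurements such a study involves, the following C++ snippet times sequential O_DIRECT reads from a file on a mounted NVMe-TCP volume; the mount path, file name, and block size are hypothetical, and this is only a minimal probe, not the paper's benchmark harness.

```cpp
// Minimal sequential-read throughput probe (illustrative; the path is made up).
#include <fcntl.h>
#include <unistd.h>
#include <chrono>
#include <cstdio>
#include <cstdlib>

int main() {
    const size_t block = 1 << 20;                       // 1 MiB per read
    void *buf = nullptr;
    if (posix_memalign(&buf, 4096, block)) return 1;    // O_DIRECT needs aligned buffers

    int fd = open("/mnt/nvme_tcp/testfile", O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    size_t total = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (ssize_t n; (n = read(fd, buf, block)) > 0; )   // stream until EOF
        total += static_cast<size_t>(n);
    double s = std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();

    std::printf("%.1f MiB in %.2f s -> %.1f MiB/s\n",
                total / 1048576.0, s, total / 1048576.0 / s);
    close(fd);
    free(buf);
}
```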

25 pages, 921 KiB  
Article
Straightforward Heterogeneous Computing with the oneAPI Coexecutor Runtime
by Raúl Nozal and Jose Luis Bosque
Electronics 2021, 10(19), 2386; https://doi.org/10.3390/electronics10192386 - 29 Sep 2021
Cited by 5 | Viewed by 2415
Abstract
Heterogeneous systems are the core architecture of most computing systems, from high-performance computing nodes to embedded devices, due to their excellent performance and energy efficiency. Efficiently programming these systems has become a major challenge due to the complexity of their architectures and the effort required to provide them with co-execution capabilities that applications can fully exploit. There are many proposals to simplify the programming and management of acceleration devices and multi-core CPUs. However, in many cases, portability and ease of use compromise the efficiency of the different devices, even more so when co-executing. Intel oneAPI, a new and powerful standards-based unified programming model built on top of SYCL, addresses these issues. In this paper, oneAPI is provided with co-execution strategies to run the same kernel between different devices, enabling the exploitation of static and dynamic policies. This work evaluates the performance and energy efficiency of a well-known set of regular and irregular HPC benchmarks, using two heterogeneous systems composed of an integrated GPU and CPU. Static and dynamic load balancers are integrated and evaluated, highlighting single and co-execution strategies and the most significant key points of this promising technology. Experimental results show that co-execution is worthwhile when using dynamic algorithms, and improves efficiency even further when using unified shared memory.
(This article belongs to the Special Issue High-Performance Computer Architectures and Applications)
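
The paper's Coexecutor runtime implements far more elaborate policies, but a minimal SYCL sketch of static co-execution conveys the core idea: one kernel's iteration space split between a CPU queue and a GPU queue. The 30% split and the SAXPY-like kernel below are assumptions, not the paper's benchmarks.

```cpp
// Static co-execution sketch in SYCL: one kernel, two device queues.
#include <sycl/sycl.hpp>
#include <cstdio>
#include <vector>

int main() {
    const size_t N = 1 << 20;
    const size_t n_cpu = N * 3 / 10;                    // 30% of the range to the CPU
    std::vector<float> data(N, 1.0f);

    sycl::queue cpu_q{sycl::cpu_selector_v};
    sycl::queue gpu_q{sycl::gpu_selector_v};
    {
        // each device works on its own slice of the host array
        sycl::buffer<float, 1> b_cpu(data.data(), sycl::range<1>(n_cpu));
        sycl::buffer<float, 1> b_gpu(data.data() + n_cpu, sycl::range<1>(N - n_cpu));

        cpu_q.submit([&](sycl::handler &h) {
            sycl::accessor a{b_cpu, h, sycl::read_write};
            h.parallel_for(sycl::range<1>(n_cpu),
                           [=](sycl::id<1> i) { a[i] = 2.0f * a[i] + 1.0f; });
        });
        gpu_q.submit([&](sycl::handler &h) {
            sycl::accessor a{b_gpu, h, sycl::read_write};
            h.parallel_for(sycl::range<1>(N - n_cpu),
                           [=](sycl::id<1> i) { a[i] = 2.0f * a[i] + 1.0f; });
        });
    }   // buffer destructors wait for both kernels and write results back

    std::printf("data[0] = %.1f, data[N-1] = %.1f\n", data[0], data[N - 1]);
}
```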

16 pages, 1252 KiB  
Article
Performance Evaluations of Distributed File Systems for Scientific Big Data in FUSE Environment
by Jun-Yeong Lee, Moon-Hyun Kim, Syed Asif Raza Shah, Sang-Un Ahn, Heejun Yoon and Seo-Young Noh
Electronics 2021, 10(12), 1471; https://doi.org/10.3390/electronics10121471 - 18 Jun 2021
Cited by 5 | Viewed by 3732
Abstract
Data are important and ever growing in data-intensive scientific environments. Such research data growth requires data storage systems that play pivotal roles in data management and analysis for scientific discoveries. Redundant Array of Independent Disks (RAID), a well-known storage technology combining multiple disks into a single large logical volume, has been widely used for data redundancy and performance improvement. However, this requires RAID-capable hardware or software to build a RAID-enabled disk array. In addition, it is difficult to scale up RAID-based storage. To mitigate these problems, many distributed file systems have been developed and are being actively used in various environments, especially in data-intensive computing facilities where a tremendous amount of data has to be handled. In this study, we investigated and benchmarked various distributed file systems, such as Ceph, GlusterFS, Lustre and EOS, for data-intensive environments. In our experiment, we configured the distributed file systems under a Reliable Array of Independent Nodes (RAIN) structure and a Filesystem in Userspace (FUSE) environment. Our results identify the characteristics of each file system that affect read and write performance depending on the features of the data, which have to be considered in data-intensive computing environments.
(This article belongs to the Special Issue High-Performance Computer Architectures and Applications)
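
As an illustration of the kind of throughput probe used to compare POSIX-mounted file systems, here is a minimal timed sequential write through a FUSE mount point; the /mnt/cephfs path and transfer sizes are made up, and a real benchmark would also vary file counts, sizes, and access patterns.

```cpp
// Timed sequential write + fsync through a POSIX/FUSE mount (illustrative path).
#include <fcntl.h>
#include <unistd.h>
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    const size_t block = 4 << 20, blocks = 256;          // 4 MiB x 256 = 1 GiB
    std::vector<char> buf(block, 'x');

    int fd = open("/mnt/cephfs/bench.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    auto t0 = std::chrono::steady_clock::now();
    for (size_t i = 0; i < blocks; ++i)
        if (write(fd, buf.data(), block) != (ssize_t)block) { perror("write"); return 1; }
    fsync(fd);                                           // include flush time in the figure
    double s = std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();

    std::printf("write: %.1f MiB/s\n", blocks * (block / 1048576.0) / s);
    close(fd);
}
```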

19 pages, 357 KiB  
Article
Assessment of OpenMP Master–Slave Implementations for Selected Irregular Parallel Applications
by Paweł Czarnul
Electronics 2021, 10(10), 1188; https://doi.org/10.3390/electronics10101188 - 16 May 2021
Cited by 5 | Viewed by 1962
Abstract
The paper investigates various implementations of the master–slave paradigm using the popular OpenMP API, and their relative performance on modern multi-core workstation CPUs. It is assumed that a master partitions the available input into a batch of a predefined number of data chunks, which are then processed in parallel by a set of slaves, and the procedure is repeated until all input data has been processed. The paper experimentally assesses the performance of six implementations using OpenMP locks, the tasking construct, and a dynamically partitioned for loop, each without and with overlapping of result merging and data generation, using the gcc compiler. Two distinct parallel applications are tested, each using the six aforementioned implementations, on two systems representing desktop and workstation environments: one with an Intel i7-7700 3.60 GHz Kaby Lake CPU and eight logical processors, and the other with two Intel Xeon E5-2620 v4 2.10 GHz Broadwell CPUs and 32 logical processors. From the application point of view, irregular adaptive quadrature numerical integration, as well as finding a region of interest within an irregular image, are tested. Various compute intensities are investigated by setting various computing accuracies per subrange and numbers of image passes, respectively. The results allow programmers to assess which solution and configuration settings, such as the numbers of threads and thread affinities, should be preferred.
(This article belongs to the Special Issue High-Performance Computer Architectures and Applications)
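
A minimal sketch of two of the six studied variants, assuming a stand-in process() function for the per-chunk work; the lock-based variants and the overlapped versions differ mainly in how chunks are handed out and results merged.

```cpp
// Two master–slave variants in OpenMP (sketch; process() is a stand-in).
#include <omp.h>
#include <cmath>
#include <cstdio>
#include <vector>

double process(double chunk) { return std::sqrt(chunk); }

int main() {
    const int n_chunks = 1 << 16;
    std::vector<double> in(n_chunks), out(n_chunks);
    for (int i = 0; i < n_chunks; ++i) in[i] = i;

    // Variant 1: the master generates one task per chunk, slaves execute them
    #pragma omp parallel
    #pragma omp single
    for (int i = 0; i < n_chunks; ++i)
        #pragma omp task firstprivate(i)
        out[i] = process(in[i]);

    // Variant 2: a dynamically partitioned for loop over the batch
    #pragma omp parallel for schedule(dynamic, 64)
    for (int i = 0; i < n_chunks; ++i)
        out[i] = process(in[i]);

    std::printf("out[4] = %.1f\n", out[4]);
}
```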

18 pages, 1710 KiB  
Article
A Web-Based Tool for Automatic Detection and Visualization of DNA Differentially Methylated Regions
by Lisardo Fernández, Ricardo Olanda, Mariano Pérez and Juan M. Orduña
Electronics 2021, 10(9), 1083; https://doi.org/10.3390/electronics10091083 - 3 May 2021
Cited by 2 | Viewed by 2493
Abstract
The study of Deoxyribonucleic Acid (DNA) methylation has allowed important advances in the understanding of genetic diseases related to abnormal cell behavior. DNA methylation analysis tools have become especially relevant in recent years. However, these tools have a high computational cost, and some of them require the configuration of specific hardware and software, extending the time needed for research and diagnosis. In previous works, we proposed some tools for DNA methylation analysis and a new tool, called HPG-DHunter, for the detection and visualization of Differentially Methylated Regions (DMRs). Even though this tool offers a user-friendly interface, its installation and maintenance require the information technology knowledge mentioned above. In this paper, we propose our tool as a web-based application, which gives biomedical researchers access to a powerful tool for methylation analysis, even those not specialized in the management of Graphics Processing Units (GPUs) and their related software. The performance evaluation results show that this web-based version of the HPG-DHunter tool improves the response time offered to the user, while also offering an improved interface and higher visualization quality, and shows the same efficiency in DMR identification as the standalone version.
(This article belongs to the Special Issue High-Performance Computer Architectures and Applications)

17 pages, 6553 KiB  
Article
Enabling Large-Scale Simulations of Quantum Transport with Manycore Computing
by Yosang Jeong and Hoon Ryu
Electronics 2021, 10(3), 253; https://doi.org/10.3390/electronics10030253 - 22 Jan 2021
Viewed by 1395
Abstract
The non-equilibrium Green's function (NEGF) is being utilized in the field of nanoscience to predict transport behaviors of electronic devices. This work explores how much performance improvement can be achieved for quantum transport simulations with the aid of manycore computing, where the core numerical operation involves a recursive process of matrix multiplication. The major techniques adopted for performance enhancement are data restructuring, matrix tiling, thread scheduling, and offload computing, and we present technical details on how they are applied to optimize the performance of simulations on computing hardware including Intel Xeon Phi Knights Landing (KNL) systems and NVIDIA general-purpose graphics processing unit (GPU) devices. With a target structure of a silicon nanowire that consists of 100,000 atoms and is described with an atomistic tight-binding model, the effects of the optimization techniques on the performance of simulations are rigorously tested in a KNL node equipped with two Quadro GV100 GPU devices, and we observe that computation is accelerated by a factor of up to ∼20 against the unoptimized case. The feasibility of handling large-scale workloads in a huge computing environment is also examined with nanowire simulations over a wide energy range, where good scalability is procured up to 2048 KNL nodes.
(This article belongs to the Special Issue High-Performance Computer Architectures and Applications)
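
The recursive Green's function process is dominated by dense matrix products, so the optimizations named above target kernels of roughly this shape; below is a minimal tiled, threaded multiply as a sketch (matrix size and tile width are arbitrary, not the paper's settings).

```cpp
// Cache-tiled, threaded dense multiply — the kernel class such work optimizes.
#include <omp.h>
#include <algorithm>
#include <cstdio>
#include <vector>

void matmul_tiled(const std::vector<double> &A, const std::vector<double> &B,
                  std::vector<double> &C, int n, int tile) {
    #pragma omp parallel for collapse(2) schedule(static)
    for (int ii = 0; ii < n; ii += tile)                  // each thread owns whole C
        for (int jj = 0; jj < n; jj += tile)              // tiles, so no write conflicts
            for (int kk = 0; kk < n; kk += tile)
                for (int i = ii; i < std::min(ii + tile, n); ++i)
                    for (int k = kk; k < std::min(kk + tile, n); ++k) {
                        const double a = A[i * n + k];
                        for (int j = jj; j < std::min(jj + tile, n); ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}

int main() {
    const int n = 512;
    std::vector<double> A(n * n, 1.0), B(n * n, 1.0), C(n * n, 0.0);
    matmul_tiled(A, B, C, n, 64);
    std::printf("C[0] = %.1f (expect %d)\n", C[0], n);
}
```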

18 pages, 4990 KiB  
Article
A Web-Based Tool for Simulating Molecular Dynamics in Cloud Environments
by Gonzalo Nicolas-Barreales, Aaron Sujar and Alberto Sanchez
Electronics 2021, 10(2), 185; https://doi.org/10.3390/electronics10020185 - 15 Jan 2021
Cited by 6 | Viewed by 4594
Abstract
Molecular dynamics simulations take advantage of supercomputing environments, e.g., to solve molecular systems composed of millions of atoms. Supercomputers are increasing their computing and memory power while becoming more complex with the introduction of multi-GPU environments. Despite these capabilities, molecular dynamics simulation is not an easy process. It requires properly preparing the simulation data and configuring the entire operation, e.g., installing and managing specific software packages to take advantage of the potential of multi-GPU supercomputers. We propose a web-based tool that facilitates the management of molecular dynamics workflows in combination with a multi-GPU cloud environment. The tool allows users to set up the data pipeline and run the simulation in a cloud environment, even users who are not specialized in the development of molecular dynamics simulators or in cloud management.
(This article belongs to the Special Issue High-Performance Computer Architectures and Applications)

14 pages, 3147 KiB  
Article
Parallel Multiphysics Simulation of Package Systems Using an Efficient Domain Decomposition Method
by Weijie Wang, Yannan Liu, Zhenguo Zhao and Haijing Zhou
Electronics 2021, 10(2), 158; https://doi.org/10.3390/electronics10020158 - 13 Jan 2021
Cited by 3 | Viewed by 1892
Abstract
With the continuing downscaling of feature sizes, the thermal impact on material properties and geometrical deformations can no longer be ignored in the analysis of the electromagnetic compatibility or electromagnetic interference of package systems, including System-in-Package and antenna arrays. We present a high-performance numerical simulation program intended to perform large-scale multiphysics simulations using the finite element method. An efficient domain decomposition method was developed to accelerate the multiphysics loops of electromagnetic–thermal-stress simulations by exploiting the fact that the electromagnetic field perturbations caused by geometrical deformation are small and constrained to one or a few subdomains. Multi-level parallelism of the algorithm was also obtained based on an in-house developed parallel infrastructure. Numerical examples showed that our algorithm enables simulation with multiple processors in parallel and, more importantly, achieves a significant reduction in computation time compared with traditional methods.
(This article belongs to the Special Issue High-Performance Computer Architectures and Applications)
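
The paper's electromagnetic–thermal FEM solver is far richer, but the essence of overlapping domain decomposition can be shown on a toy problem: two 1-D Poisson subdomains repeatedly exchange boundary values until the global solution converges. Grid size, overlap, and iteration counts below are arbitrary.

```cpp
// Toy 1-D alternating Schwarz iteration for -u'' = 1 on [0,1], u(0)=u(1)=0.
#include <cstdio>
#include <vector>

int main() {
    const int n = 101;                 // grid points
    const double h = 1.0 / (n - 1), f = 1.0;
    std::vector<double> u(n, 0.0);
    const int lo2 = 40, hi1 = 60;      // subdomain 1 = [0,hi1], 2 = [lo2,n-1]; overlap [lo2,hi1]

    auto sweep = [&](int a, int b) {   // Gauss–Seidel sweeps on interior points a..b
        for (int s = 0; s < 200; ++s)
            for (int i = a; i <= b; ++i)
                u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f);
    };

    for (int it = 0; it < 50; ++it) {  // each subdomain solve uses the other's
        sweep(1, hi1 - 1);             // latest values as boundary data
        sweep(lo2 + 1, n - 2);
    }
    std::printf("u(0.5) = %.6f (exact 0.125)\n", u[n / 2]);   // exact u = x(1-x)/2
}
```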

18 pages, 2578 KiB  
Article
DTaPO: Dynamic Thermal-Aware Performance Optimization for Dark Silicon Many-Core Systems
by Mohammed Sultan Mohammed, Ali A. M. Al-Kubati, Norlina Paraman, Ab Al-Hadi Ab Rahman and M. N. Marsono
Electronics 2020, 9(11), 1980; https://doi.org/10.3390/electronics9111980 - 23 Nov 2020
Cited by 6 | Viewed by 1978
Abstract
Future many-core systems need to handle high power density and chip temperature effectively. Some cores in many-core systems need to be turned off, or 'dark', to manage chip power and thermal density; this phenomenon is known as the dark silicon problem. It prevents many-core systems from utilizing and gaining improved performance from their large number of processing cores. This paper presents DTaPO, a dynamic thermal-aware performance optimization technique for dark silicon many-core systems, which optimizes performance under a temperature constraint. The proposed technique utilizes both task migration and dynamic voltage and frequency scaling (DVFS) to optimize the performance of a many-core system while keeping the system temperature within a safe operating limit. Task migration puts hot cores in low-power states and moves tasks to cooler dark cores to aggressively reduce chip temperature while maintaining high overall system performance. To reduce task migration overhead due to cold starts, the source core (i.e., the active core) keeps its L2 cache content during the initial migration phase, and the destination core (i.e., the dark core) can access it to reduce the impact of cold-start misses. Moreover, the proposed technique limits task migration to cores that share the last-level cache (LLC). In case of a major thermal violation when no cooler cores are available, DVFS is used to reduce the hot cores' temperature gradually by reducing their frequency. Experimental results for different threshold temperatures show that DTaPO can keep the average system temperature below the thermal limit, while the execution time penalty is reduced by up to 18% compared with using only DVFS across all thermal thresholds. Moreover, the average peak temperature is reduced by up to 10.8 °C. In addition, the experimental results show that DTaPO improves the system's performance by up to 80% compared to optimal sprinting patterns (OSP) and reduces the temperature by up to 13.6 °C.
(This article belongs to the Special Issue High-Performance Computer Architectures and Applications)
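
A schematic of the migrate-first, throttle-second policy described above, with invented temperatures, thresholds, and LLC clusters; this is only a decision-loop sketch, not the paper's simulator.

```cpp
// Illustrative thermal-aware migration + DVFS decision loop (toy values).
#include <cstdio>
#include <vector>

struct Core { int id; int cluster; double temp_c; double freq_ghz; bool active; };

int main() {
    const double T_LIMIT = 80.0, T_COOL = 60.0, F_MIN = 1.0, F_STEP = 0.2;
    std::vector<Core> cores = {
        {0, 0, 85.0, 3.0, true}, {1, 0, 55.0, 3.0, false},   // cluster = shared-LLC group
        {2, 1, 82.0, 3.0, true}, {3, 1, 78.0, 3.0, false},
    };
    for (Core &hot : cores) {
        if (!hot.active || hot.temp_c <= T_LIMIT) continue;
        Core *target = nullptr;
        for (Core &c : cores)                   // prefer a cool dark core behind the same LLC
            if (!c.active && c.cluster == hot.cluster && c.temp_c < T_COOL) { target = &c; break; }
        if (target) {                           // migrate: the hot core goes dark (DTaPO also
            target->active = true;              // keeps its L2 warm during the initial phase)
            hot.active = false;
            std::printf("migrate task: core %d -> core %d\n", hot.id, target->id);
        } else if (hot.freq_ghz > F_MIN) {      // no cool dark core: throttle via DVFS
            hot.freq_ghz -= F_STEP;
            std::printf("DVFS: core %d -> %.1f GHz\n", hot.id, hot.freq_ghz);
        }
    }
}
```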

18 pages, 431 KiB  
Article
Exploiting Heterogeneous Parallelism on Hybrid Metaheuristics for Vector Autoregression Models
by Javier Cuenca, José-Matías Cutillas-Lozano, Domingo Giménez, Alberto Pérez-Bernabeu and José J. López-Espín
Electronics 2020, 9(11), 1781; https://doi.org/10.3390/electronics9111781 - 27 Oct 2020
Viewed by 1455
Abstract
In recent years, the huge amount of data available in many disciplines has made mathematical modeling, and more concretely econometric models, a very important technique for explaining those data. One of the most used econometric techniques is the Vector Autoregression (VAR) model, a multi-equation model that linearly describes the interactions and behavior of a group of variables by using their past values. Traditionally, Ordinary Least Squares and Maximum Likelihood estimators have been used in the estimation of VAR models. These techniques are consistent and asymptotically efficient under ideal conditions of the data and the identification problem; otherwise, they yield inconsistent parameter estimations. This paper considers the estimation of a VAR model by minimizing the difference between the dependent variables at a certain time and the expression of their own past and the exogenous variables of the model (in this case denoted as a VARX model). The solution of this optimization problem is approached through hybrid metaheuristics. The high computational cost due to the huge amount of data makes it necessary to exploit High-Performance Computing for the acceleration of the methods used to obtain the models. The parameterized, parallel implementation of the metaheuristics and the matrix formulation ease the simultaneous exploitation of parallelism for groups of hybrid metaheuristics. Multilevel and heterogeneous parallelism are exploited in multicore CPU plus multi-GPU nodes, with the optimum combination of the different parallelism parameters depending on the particular metaheuristic and the problem to which it is applied.
(This article belongs to the Special Issue High-Performance Computer Architectures and Applications)
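
The objective such hybrid metaheuristics minimize is essentially the sum of squared one-step residuals of the model. Below is a compact version for a plain VAR(p), with the exogenous VARX terms omitted and a made-up flattened coefficient layout; it is a sketch of the fitness function only, not the paper's implementation.

```cpp
// Sum of squared residuals of y_t - sum_i A_i y_{t-i} for a VAR(p) model.
#include <cstdio>
#include <vector>

// y: T observations of k variables (row-major, y[t*k + j]);
// coef: p lag matrices A_i of size k x k, flattened lag-major.
double var_sse(const std::vector<double> &y, const std::vector<double> &coef,
               int T, int k, int p) {
    double sse = 0.0;
    for (int t = p; t < T; ++t)
        for (int j = 0; j < k; ++j) {            // j-th equation of the system
            double pred = 0.0;
            for (int i = 1; i <= p; ++i)         // lag i, coefficient matrix A_i
                for (int m = 0; m < k; ++m)
                    pred += coef[((i - 1) * k + j) * k + m] * y[(t - i) * k + m];
            const double r = y[t * k + j] - pred;
            sse += r * r;
        }
    return sse;
}

int main() {
    const int T = 100, k = 2, p = 1;
    std::vector<double> y(T * k, 1.0), coef(p * k * k, 0.4);
    std::printf("SSE = %f\n", var_sse(y, coef, T, k, p));   // a metaheuristic tunes coef
}
```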

12 pages, 260 KiB  
Article
A Parallel Algorithm for Matheuristics: A Comparison of Optimization Solvers
by Martín González, Jose J. López-Espín and Juan Aparicio
Electronics 2020, 9(9), 1541; https://doi.org/10.3390/electronics9091541 - 21 Sep 2020
Cited by 3 | Viewed by 2936
Abstract
Metaheuristic and exact methods are among the most common tools to solve Mixed-Integer Optimization Problems (MIPs). Most of these problems are NP-hard, making it intractable to obtain optimal solutions in a reasonable time when the size of the problem is huge. In this paper, a hybrid parallel optimization algorithm for matheuristics is studied. In this algorithm, exact and metaheuristic methods work together to solve a Mixed-Integer Linear Programming (MILP) problem, which is divided into two different subproblems, one of which is linear (and easier to solve by exact methods) and the other discrete (and solved using metaheuristic methods). Even so, solving this problem has a high computational cost. The proposed algorithm follows an efficient decomposition based on the nature of the decision variables (continuous versus discrete). Because of the high cost of the algorithm, as this kind of problem is NP-hard, parallelism techniques have been incorporated at different levels to reduce the computing cost. The matheuristic has been optimized both at the level of the problem division and internally. This configuration offers the opportunity to improve both the computational time and the fitness function. The paper also focuses on the performance of different optimization software packages working in parallel. In particular, two well-known optimization software packages (CPLEX and GUROBI) are compared when executing several simultaneous instances, solving various problems at the same time. Thus, this paper proposes and studies a two-level parallel algorithm based on message-passing (MPI) and shared-memory (OpenMP) schemes, where the two subproblems are considered and the linear problem is solved using the optimization software packages CPLEX and GUROBI. Experiments have also been carried out to ascertain the performance of the application using different programming paradigms (shared memory and distributed memory).
(This article belongs to the Special Issue High-Performance Computer Architectures and Applications)
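
A skeleton of the two-level scheme, assuming a toy fitness function in place of the exactly solved linear subproblem: MPI ranks run independent metaheuristic instances, and OpenMP threads evaluate candidates within each rank. This is a structural sketch only, not the paper's matheuristic.

```cpp
// Two-level MPI + OpenMP skeleton for groups of metaheuristics (sketch).
#include <mpi.h>
#include <omp.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

double evaluate(const std::vector<double> &sol) {
    double f = 0.0;
    for (double v : sol) f += v * v;                 // toy fitness function
    return f;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    std::srand(rank + 1);                            // a different instance per rank

    const int pop = 64, dim = 16;
    std::vector<std::vector<double>> population(pop, std::vector<double>(dim));
    for (auto &s : population)
        for (double &v : s) v = std::rand() / (double)RAND_MAX;

    double best = 1e300;
    #pragma omp parallel for reduction(min : best)   // level 2: threads within one node
    for (int i = 0; i < pop; ++i) {
        const double f = evaluate(population[i]);
        if (f < best) best = f;
    }

    double global_best;                              // level 1: best across MPI ranks
    MPI_Reduce(&best, &global_best, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
    if (rank == 0) std::printf("best fitness = %f\n", global_best);
    MPI_Finalize();
}
```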

26 pages, 1722 KiB  
Article
Multiprotocol Authentication Device for HPC and Cloud Environments Based on Elliptic Curve Cryptography
by Antonio F. Díaz, Ilia Blokhin, Mancia Anguita, Julio Ortega and Juan J. Escobar
Electronics 2020, 9(7), 1148; https://doi.org/10.3390/electronics9071148 - 16 Jul 2020
Cited by 1 | Viewed by 2677
Abstract
Multifactor authentication is a relevant tool for securing IT infrastructures, combining two or more credentials. Smartcards and hardware tokens can leverage the authentication process, but they have some limitations. Users connect these devices to the client node to log in or request access to services. Alternatively, if an application wants to use these resources, its code has to be amended with bespoke solutions to provide access. Thanks to advances in system-on-chip devices, we can integrate cryptographically robust, low-cost solutions. In this work, we present an autonomous device that allows multifactor authentication in client–server systems in a transparent way, which facilitates its integration in High-Performance Computing (HPC) and cloud systems through a generic gateway. The proposed electronic token (eToken), based on the ESP32 system-on-chip, provides an extra layer of security based on elliptic curve cryptography. Secure communications between elements use Message Queuing Telemetry Transport (MQTT) to facilitate their interconnection. We have evaluated different types of possible attacks and their impact on communications. The proposed system offers an efficient solution to increase the security of access to services and systems.
(This article belongs to the Special Issue High-Performance Computer Architectures and Applications)
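
To give a flavor of the challenge–response exchange such a token performs, here is a toy flow in which a keyed hash stands in for the elliptic-curve signature and direct function calls stand in for MQTT publish/subscribe; it is illustrative only, neither cryptographically secure nor the paper's actual protocol.

```cpp
// Toy challenge-response login flow (a keyed FNV-1a hash replaces the real
// ECC signature; direct calls replace MQTT transport; NOT secure).
#include <cstdint>
#include <cstdio>
#include <string>

uint64_t keyed_hash(uint64_t key, const std::string &msg) {
    uint64_t h = 1469598103934665603ull ^ key;        // FNV-1a seeded with the key
    for (unsigned char c : msg) { h ^= c; h *= 1099511628211ull; }
    return h;
}

int main() {
    const uint64_t shared_key = 0xC0FFEE;             // provisioned on token and gateway

    // 1. the gateway publishes a fresh nonce on the token's topic
    const std::string nonce = "n-8f3a91d2";

    // 2. the eToken computes its response over the nonce and publishes it back
    const uint64_t response = keyed_hash(shared_key, nonce);

    // 3. the gateway recomputes and compares before granting access to the service
    const bool ok = (keyed_hash(shared_key, nonce) == response);
    std::printf("access %s\n", ok ? "granted" : "denied");
}
```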

13 pages, 7882 KiB  
Article
Comparative Performance Evaluation of Modern Heterogeneous High-Performance Computing Systems CPUs
by Aleksei Sorokin, Sergey Malkovsky, Georgiy Tsoy, Alexander Zatsarinnyy and Konstantin Volovich
Electronics 2020, 9(6), 1035; https://doi.org/10.3390/electronics9061035 - 23 Jun 2020
Cited by 4 | Viewed by 3302
Abstract
The study presents a comparison of computing systems based on IBM POWER8, IBM POWER9, and Intel Xeon Platinum 8160 processors running parallel applications. Memory subsystem bandwidth was studied, parallel programming technologies were compared, and the operating modes and capabilities of simultaneous multithreading technology were analyzed. Performance analysis for the studied computing systems running parallel applications based on the OpenMP and MPI technologies was carried out by using the NAS Parallel Benchmarks. An assessment of the results obtained during experimental calculations led to the conclusion that IBM POWER8 and Intel Xeon Platinum 8160 systems have almost the same maximum memory bandwidth, but require a different number of threads for efficient utilization. The IBM POWER9 system has the highest maximum bandwidth, which can be attributed to the large number of memory channels per socket. Based on the results of numerical experiments, recommendations are given on how hardware of a similar grade can be utilized to solve various scientific problems, including recommendations on optimal processor architecture choice for leveraging the operation of high-performance hybrid computing platforms.
(This article belongs to the Special Issue High-Performance Computer Architectures and Applications)
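
A STREAM-style triad loop is the classic probe behind memory-bandwidth comparisons like these; below is a minimal threaded version (array sizes arbitrary) that can be rerun at different thread counts, e.g., via OMP_NUM_THREADS, to reproduce the "bandwidth versus threads" curve in spirit.

```cpp
// STREAM-style triad probe: rough sustainable memory bandwidth measurement.
#include <omp.h>
#include <cstdio>
#include <vector>

int main() {
    const size_t N = 1 << 25;                          // three 256 MiB arrays
    std::vector<double> a(N), b(N, 1.0), c(N, 2.0);
    const double scalar = 3.0;

    const double t0 = omp_get_wtime();
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < N; ++i)                     // reads b and c, writes a
        a[i] = b[i] + scalar * c[i];
    const double dt = omp_get_wtime() - t0;

    const double gib = 3.0 * N * sizeof(double) / (1024.0 * 1024.0 * 1024.0);
    std::printf("%d threads: %.2f GiB/s\n", omp_get_max_threads(), gib / dt);
}
```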

15 pages, 2380 KiB  
Article
PipeCache: High Hit Rate Rule-Caching Scheme Based on Multi-Stage Cache Tables
by Jialun Yang, Tao Li, Jinli Yan, Junnan Li, Chenglong Li and Baosheng Wang
Electronics 2020, 9(6), 999; https://doi.org/10.3390/electronics9060999 - 15 Jun 2020
Cited by 8 | Viewed by 2375
Abstract
OpenFlow switch hardware cannot store all OpenFlow rules due to limited resources, and rule caching is one of the best solutions to this size limitation. In OpenFlow switches, Multiple Flow Tables (MFTs) provide more flexible flow control than a single table. Exact match and wildcard match are two typical matching methods: exact match applies to both a single flow table and multiple flow tables, but its performance is low under frequently changing traffic, while many commodity switches use Ternary Content-Addressable Memory (TCAM) to support fast wildcard lookups. Earlier works on wildcard-match rule caching focus on the dependency problem caused by overlapping match fields. Their designs cannot handle the rule-caching problem of MFTs because they are based on a single flow table instead of the widely used MFT structure. We therefore propose a new design named PipeCache that solves the wildcard-match rule-caching problem for MFTs. In our design, we logically split the TCAM resources and assign them to each flow table according to its size. Each flow table caches selected rules into its assigned TCAM resources, which are updated in time by our algorithms to make the most of the limited TCAM resources. We compare our structure with the exact-match scheme and with the single-table wildcard-match scheme under different cache sizes and traffic localities. Experimental results show that PipeCache improves the cache hit rate by up to 18.2% compared to the exact-match scheme and by up to 21.2% compared to the single-table wildcard-match scheme.
(This article belongs to the Special Issue High-Performance Computer Architectures and Applications)
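
A minimal sketch of the per-stage cache idea: each flow table in the pipeline receives its own slice of "TCAM" entries, and packets are matched stage by stage. Rules, sizes, and match fields are invented, and a linear search stands in for real TCAM hardware; the eviction and update algorithms of PipeCache are not modeled.

```cpp
// Per-stage rule caches for a multi-table pipeline (illustrative).
#include <cstdint>
#include <cstdio>
#include <vector>

struct Rule { uint32_t value, mask; int priority; int action; };

struct StageCache {
    size_t capacity;                                 // this stage's share of the TCAM
    std::vector<Rule> rules;
    const Rule *match(uint32_t key) const {          // wildcard match: highest priority wins
        const Rule *best = nullptr;
        for (const Rule &r : rules)
            if ((key & r.mask) == (r.value & r.mask) && (!best || r.priority > best->priority))
                best = &r;
        return best;
    }
    bool insert(const Rule &r) {                     // caller would evict when full
        if (rules.size() >= capacity) return false;
        rules.push_back(r);
        return true;
    }
};

int main() {
    std::vector<StageCache> pipeline(2);
    pipeline[0].capacity = 2;                        // slices sized per flow table
    pipeline[1].capacity = 2;
    pipeline[0].insert({0x0A000000, 0xFF000000, 10, 1});  // 10.*.*.*   -> action 1
    pipeline[1].insert({0x00000050, 0x0000FFFF, 5, 2});   // dst port 80 -> action 2

    const uint32_t pkt_ip = 0x0A010203, pkt_port = 0x00000050;
    const Rule *r0 = pipeline[0].match(pkt_ip);      // stage-by-stage lookup
    const Rule *r1 = pipeline[1].match(pkt_port);
    std::printf("stage0 %s, stage1 %s\n", r0 ? "hit" : "miss", r1 ? "hit" : "miss");
}
```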