Symmetry/Asymmetry in Speech and Audio Processing: Topics, Challenges and Advances

A special issue of Symmetry (ISSN 2073-8994). This special issue belongs to the section "Computer".

Deadline for manuscript submissions: closed (31 December 2023) | Viewed by 9352

Special Issue Editors


Prof. Dr. Chengshi Zheng
Guest Editor
1. Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
Interests: statistical-model-based speech processing; microphone array signal processing; machine learning for speech and audio processing

Prof. Dr. Xiaodong Li
Guest Editor
1. Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
Interests: audio/speech signal processing and system development; electroacoustic device/system design and development; active noise and vibration control; sound and vibration signal monitoring and analysis; acoustic measurement and metering

Prof. Dr. Jinqiu Sang
Guest Editor
1. Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
Interests: 3D audio reproduction; binaural hearing

Special Issue Information

Dear Colleagues,

Audio and speech processing has a broad range of applications in daily life and is already widely used in many types of systems, such as audio entertainment, human–machine speech interaction, privacy and security, audio-visual conferencing, and hearing-assistive systems. Over the last half-century, both statistical signal processing (SSP)-based and machine learning (ML)-based technologies have made great progress, accelerating the research and deployment of audio and speech processing in a large number of devices. In general, a signal processing method should match the configuration of the sound recording/reproduction devices, the characteristics of auditory perception, and the intended application. In audio and speech processing, symmetric and asymmetric problems can arise from the placement of microphones and/or loudspeakers. Human hearing is likewise not always symmetric: some listeners' ears are well matched, while others differ between the two sides. Numerous studies have therefore focused on addressing symmetric and asymmetric problems in practical applications.

This Special Issue invites original research on symmetry/asymmetry in audio and speech processing as well as in auditory perception. We welcome work on the mechanisms, methodologies, and treatment of symmetric and asymmetric problems in the field of audio and speech processing.

Potential topics of interest include, but are not limited to:

  • Symmetric/asymmetric microphone arrays and microphone array networks for speech and audio signal processing;
  • Symmetric/asymmetric windows and filter-bank design for speech and audio processing;
  • 3D audio reproduction with symmetric/asymmetric speaker arrays;
  • Symmetric/asymmetric hearing impairments and hearing-assistive devices;
  • Symmetric/asymmetric binaural hearing and signal processing;
  • Symmetric/asymmetric beamforming patterns for audio and speech signals;
  • Symmetric/asymmetric tinnitus and treatments.

Prof. Dr. Chengshi Zheng
Prof. Dr. Xiaodong Li
Prof. Dr. Jinqiu Sang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Symmetry is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • statistical signal processing
  • microphone array
  • binaural hearing
  • wireless acoustic sensor network
  • beamforming
  • audio signal processing
  • speaker array
  • privacy and security

Published Papers (5 papers)


Research

19 pages, 14716 KiB  
Article
An Improved Two-Stage Spherical Harmonic ESPRIT-Type Algorithm
by Haocheng Zhou, Zhenghong Liu, Liyan Luo, Mei Wang and Xiyu Song
Symmetry 2023, 15(8), 1607; https://doi.org/10.3390/sym15081607 - 19 Aug 2023
Cited by 1 | Viewed by 878
Abstract
Sensor arrays have become a research hotspot thanks to their flexible beam control, high signal gain, robustness against strong interference, and high spatial resolution. Among them, spherical microphone arrays with complex rotational symmetry can capture more sound field information than planar arrays, and the captured multichannel speech signals can be converted into the spherical harmonic domain for processing through spherical modal decomposition. Subspace-based direction-of-arrival (DOA) estimation algorithms are sensitive to noise and reverberation; their performance can be improved by introducing the relative sound pressure and frequency-smoothing techniques. The relative sound pressure widens the gap between the eigenvalues associated with the signal subspace and those associated with the noise subspace, which helps estimate the number of active sound sources. The eigenbeam estimation of signal parameters via rotational invariance technique (EB-ESPRIT) is a well-known subspace-based algorithm for spherical microphone arrays, but it cannot estimate the DOA when the elevation angle approaches 90°. Huang et al. proposed a two-step ESPRIT (TS-ESPRIT) algorithm to solve this problem; however, because TS-ESPRIT estimates the elevation and azimuth angles independently, a DOA parameter-pairing problem arises. In this paper, the pairing problem of the TS-ESPRIT algorithm is solved by introducing generalized eigenvalue decomposition without increasing the computational load of the algorithm. In addition, the elevation angle is estimated with the arctangent function, which improves its estimation accuracy, and the robustness of the algorithm in noisy environments is enhanced by incorporating the relative sound pressure. Simulation and field-test results show that the proposed method not only solves the DOA parameter-pairing problem but also outperforms traditional methods in DOA estimation accuracy.
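The pairing remedy can be illustrated numerically. The sketch below is not the paper's spherical-harmonic algorithm; it is a minimal demo, with hypothetical recurrence matrices Psi_el and Psi_az that share a common set of eigenvectors (as the TS-ESPRIT matrices do), showing how a generalized eigenvalue decomposition recovers each source's elevation- and azimuth-related eigenvalues already paired:

```python
import numpy as np
from scipy.linalg import eig

# Hypothetical demo: Psi_el and Psi_az stand in for the two recurrence
# matrices of a TS-ESPRIT-style method. They share the same eigenvectors
# (columns of T); their eigenvalues encode per-source elevation and azimuth.
# Eigendecomposing them separately returns eigenvalues in arbitrary order,
# which is exactly the pairing problem.
rng = np.random.default_rng(0)
K = 3                                                # number of sources
T = rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))
el = np.exp(1j * rng.uniform(0.1, np.pi - 0.1, K))   # elevation eigenvalues
az = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, K))   # azimuth eigenvalues
Psi_el = T @ np.diag(el) @ np.linalg.inv(T)
Psi_az = T @ np.diag(az) @ np.linalg.inv(T)

# Generalized eigenvalue problem Psi_el v = lambda * Psi_az v: the
# generalized eigenvectors are the shared eigenvectors, so evaluating both
# matrices on each of them yields automatically paired eigenvalue estimates.
_, V = eig(Psi_el, Psi_az)
for k in range(K):
    v = V[:, k]
    lam_el = (v.conj() @ Psi_el @ v) / (v.conj() @ v)  # exact Rayleigh quotient
    lam_az = (v.conj() @ Psi_az @ v) / (v.conj() @ v)
    print(f"source {k}: elevation phase {np.angle(lam_el):+.3f}, "
          f"azimuth phase {np.angle(lam_az):+.3f}")
```

Because the generalized eigenvectors are the shared eigenvectors, no extra pairing search is needed, which is consistent with the paper's claim that the fix adds essentially no computation.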

11 pages, 536 KiB  
Article
A Pre-Separation and All-Neural Beamformer Framework for Multi-Channel Speech Separation
by Wupeng Xie, Xiaoxiao Xiang, Xiaojuan Zhang and Guanghong Liu
Symmetry 2023, 15(2), 261; https://doi.org/10.3390/sym15020261 - 17 Jan 2023
Cited by 1 | Viewed by 1689
Abstract
Thanks to the use of deep neural networks (DNNs), microphone array speech separation methods have achieved impressive performance. However, most existing neural beamforming methods explicitly follow traditional beamformer formulas, which can lead to sub-optimal performance. In this study, a pre-separation and all-neural beamformer framework is proposed for multi-channel speech separation that does not follow the solutions of conventional beamformers, such as the minimum variance distortionless response (MVDR) beamformer. More specifically, the proposed framework consists of two modules: a pre-separation module and an all-neural beamforming module. The pre-separation module produces pre-separated speech and interference, which the all-neural beamforming module then uses to obtain frame-level beamforming weights without computing spatial covariance matrices. Evaluation results on multi-channel speech separation tasks, including speech enhancement and speaker separation subtasks, demonstrate that the proposed method is more effective than several advanced baselines. Furthermore, the method can be applied to symmetrical stereo speech.
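To make the contrast with conventional beamformers concrete, here is a minimal NumPy sketch (the array shapes and the function name apply_framewise_beamformer are illustrative assumptions; the network that predicts the weights is not reproduced) of applying a separate complex weight vector per time-frequency bin:

```python
import numpy as np

def apply_framewise_beamformer(X, W):
    """Apply frame-level beamforming weights to a multi-channel STFT.

    X: complex mixture STFT, shape (M, F, T) = (mics, freq bins, frames).
    W: complex weights, shape (M, F, T), i.e., one M-dim vector per (f, t).
    Returns the single-channel beamformed STFT, shape (F, T).
    """
    # y(f, t) = w(f, t)^H x(f, t): conjugate the weights, sum over mics.
    return np.einsum('mft,mft->ft', W.conj(), X)

# Toy usage with random arrays standing in for a real mixture and for the
# weights a trained network would predict.
M, F, T = 6, 257, 100
X = np.random.randn(M, F, T) + 1j * np.random.randn(M, F, T)
W = np.random.randn(M, F, T) + 1j * np.random.randn(M, F, T)
Y = apply_framewise_beamformer(X, W)   # (F, T)
```

A conventional MVDR beamformer would instead derive one weight vector per frequency from estimated spatial covariance matrices; predicting W directly at frame level is what makes the beamformer "all-neural".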

13 pages, 1600 KiB  
Article
Multichannel Variational Autoencoder-Based Speech Separation in Designated Speaker Order
by Lele Liao, Guoliang Cheng, Haoxin Ruan, Kai Chen and Jing Lu
Symmetry 2022, 14(12), 2514; https://doi.org/10.3390/sym14122514 - 28 Nov 2022
Cited by 1 | Viewed by 1344
Abstract
The multichannel variational autoencoder (MVAE) integrates the rule-based update of a separation matrix with a deep generative model and has proven to be a competitive speech separation method. However, the output (global) permutation ambiguity remains a fundamental problem in applications. In this paper, we address this problem by employing two dedicated encoders: one encodes the speaker identity to guide the output sorting, and the other encodes the linguistic information to reconstruct the source signals. Instance normalization (IN) and adaptive instance normalization (adaIN) are applied to the networks to disentangle the speaker representations from the content representations. The separated sources are arranged in the designated order by a symmetric permutation alignment scheme. In the experiments, we test the proposed method on different gender combinations and under various reverberant conditions, and we generalize it to unseen speakers. The results validate its reliable sorting accuracy and good separation performance. The proposed method outperforms the baseline methods and maintains stable performance, achieving over 20 dB SIR improvement even in highly reverberant environments.
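For readers unfamiliar with this normalization pair, the following is a minimal NumPy sketch of IN and adaIN as used for speaker/content disentanglement (a simplification under assumed feature shapes; the paper's actual encoder networks are not reproduced):

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Instance normalization over time. x: (channels, frames).

    Removing per-channel temporal statistics tends to strip global,
    speaker-like characteristics, leaving content-like structure.
    """
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)
    return (x - mean) / (std + eps)

def adain(content, spk_mean, spk_std, eps=1e-5):
    """Adaptive instance normalization. content: (channels, frames);
    spk_mean, spk_std: (channels, 1), e.g., produced by a speaker encoder.

    Re-injects speaker statistics into the normalized content features.
    """
    return instance_norm(content, eps) * spk_std + spk_mean
```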

17 pages, 14081 KiB  
Article
A Neural Beamspace-Domain Filter for Real-Time Multi-Channel Speech Enhancement
by Wenzhe Liu, Andong Li, Xiao Wang, Minmin Yuan, Yi Chen, Chengshi Zheng and Xiaodong Li
Symmetry 2022, 14(6), 1081; https://doi.org/10.3390/sym14061081 - 24 May 2022
Cited by 5 | Viewed by 2154
Abstract
Most deep-learning-based multi-channel speech enhancement methods focus on designing a set of beamforming coefficients to directly filter the low signal-to-noise-ratio signals received by the microphones, which limits the performance of these approaches. To handle this problem, this paper presents a causal neural filter that fully exploits spectro-temporal-spatial information in the beamspace domain. Specifically, in the first stage, multiple beams are steered toward all directions using a parameterized super-directive beamformer. In the second stage, a deep-learning-based filter is learned by simultaneously modeling the spectro-temporal-spatial discriminability of the speech and the interference, so as to coarsely extract the desired speech. Finally, to further suppress the interference components, especially at low frequencies, a residual estimation module refines the output of the second stage. Experimental results demonstrate that the proposed approach outperforms many state-of-the-art (SOTA) multi-channel methods on a multi-channel speech dataset generated from the DNS-Challenge dataset.
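The first-stage beamspace transform can be sketched as follows (a minimal NumPy illustration; the design of the parameterized super-directive weights and all shapes are assumptions, and the two later neural stages are not reproduced):

```python
import numpy as np

def to_beamspace(X, W_fixed):
    """First-stage beamspace transform.

    X:       multi-channel mixture STFT, shape (M, F, T).
    W_fixed: fixed beamformer weights for D look directions, shape (D, M, F).
    Returns D beam outputs, shape (D, F, T); a second-stage neural filter
    can then weight and combine the beams per time-frequency bin.
    """
    # b(d, f, t) = w_d(f)^H x(f, t) for every look direction d.
    return np.einsum('dmf,mft->dft', W_fixed.conj(), X)
```

Working on a fixed grid of beams rather than on raw microphone signals is what gives the later neural stages explicit spatial discriminability to exploit.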

12 pages, 2560 KiB  
Article
Narrowband Active Noise Control Using Decimated Controller for Disturbance with Close Frequencies
by Fengyan An and Bilong Liu
Symmetry 2022, 14(3), 607; https://doi.org/10.3390/sym14030607 - 18 Mar 2022
Viewed by 1438
Abstract
In this paper, multi-channel active noise control systems subjected to narrowband disturbances with closely spaced frequencies are investigated. Instead of controlling each frequency separately, a mixed-reference signal is assumed and a transversal controller is therefore utilized. First, the convergence behavior of a generalized FxLMS-based algorithm is analyzed theoretically in the mean sense, revealing the influence of the controller structure on the convergence rate. A novel narrowband algorithm is then proposed, in which a decimated transversal controller is used to alleviate the computational burden. Simulations based on a 4 × 8 active noise control system verify the proposed method. The results show that a good convergence rate can be obtained while the computational complexity is greatly reduced.
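As background for the analysis, a minimal single-channel FxLMS sketch is given below; it shows the filtered-reference update that the paper generalizes, while the multi-channel 4 × 8 structure and the decimated controller are not reproduced (the function name and parameter values are illustrative assumptions):

```python
import numpy as np

def fxlms(x, d, s_hat, L=64, mu=1e-3):
    """Single-channel FxLMS sketch.

    x:     reference signal (e.g., a mixed-frequency narrowband reference).
    d:     disturbance observed at the error microphone.
    s_hat: estimated secondary-path impulse response.
    L:     length of the transversal controller.
    mu:    adaptation step size.
    Returns the error signal and the final controller taps.
    """
    w = np.zeros(L)                          # transversal controller taps
    xf = np.convolve(x, s_hat)[:len(x)]      # reference filtered by s_hat
    y = np.zeros(len(x))                     # controller output
    e = np.zeros(len(x))                     # residual at the error mic
    Ls = len(s_hat)
    for n in range(max(L, Ls), len(x)):
        x_vec = x[n - L + 1:n + 1][::-1]     # [x(n), ..., x(n-L+1)]
        y[n] = w @ x_vec                     # anti-noise before secondary path
        y_sec = s_hat @ y[n - Ls + 1:n + 1][::-1]  # anti-noise at error mic
        e[n] = d[n] - y_sec                  # residual noise
        xf_vec = xf[n - L + 1:n + 1][::-1]   # filtered-x regressor
        w += mu * e[n] * xf_vec              # LMS update on the filtered x
    return e, w
```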
