Symmetry/Asymmetry in Speech and Audio Processing: Topics, Challenges and Advances

A special issue of Symmetry (ISSN 2073-8994). This special issue belongs to the section "Computer".

Deadline for manuscript submissions: closed (31 December 2023) | Viewed by 9352

Special Issue Editors


Prof. Dr. Chengshi Zheng
Guest Editor
1. Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
Interests: statistical-model-based speech processing; microphone array signal processing; machine learning for speech and audio processing

Prof. Dr. Xiaodong Li
Guest Editor
1. Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
Interests: audio/speech signal processing and system development; electroacoustic device/system design and development; active noise and vibration control; sound and vibration signal monitoring and analysis; acoustic measurement and metering

Prof. Dr. Jinqiu Sang
Guest Editor
1. Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
Interests: 3D audio reproduction; binaural hearing

Special Issue Information

Dear Colleagues,

Audio and speech processing has a broad range of applications in daily life and is already widely used in many types of systems, such as audio entertainment, human–machine speech interaction, privacy and security, audio-visual conferencing, and hearing-assistive systems. Over the last half-century, both statistical signal processing (SSP)-based and machine learning (ML)-based technologies have made great progress, accelerating the research and deployment of audio and speech processing in a large number of devices. In general, a signal processing method should match the configuration of the sound recording/reproduction devices, the characteristics of auditory perception, and the intended application. In audio and speech processing, symmetric and asymmetric problems can arise from the placement of microphones and/or loudspeakers. Human hearing is likewise not always symmetric: some listeners' ears are well matched, while others differ between the two sides. Numerous studies have therefore focused on addressing symmetric and asymmetric problems in practical applications.

This Special Issue invites original research on symmetry/asymmetry in audio and speech processing as well as in auditory perception. We welcome work on the mechanisms, methodologies, and treatment of symmetric and asymmetric problems in the field of audio and speech processing.

Potential topics of interest include, but are not limited to:

  • Symmetric/asymmetric microphone arrays and microphone array networks for speech and audio signal processing;
  • Symmetric/asymmetric windows and filter-bank design for speech and audio processing;
  • 3D audio reproduction with symmetric/asymmetric speaker arrays;
  • Symmetric/asymmetric hearing impairments and hearing-assistive devices;
  • Symmetric/asymmetric binaural hearing and signal processing;
  • Symmetric/asymmetric beamforming patterns for audio and speech signals;
  • Symmetric/asymmetric tinnitus and treatments.

Prof. Dr. Chengshi Zheng
Prof. Dr. Xiaodong Li
Prof. Dr. Jinqiu Sang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Symmetry is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • statistical signal processing
  • microphone array
  • binaural hearing
  • wireless acoustic sensor network
  • beamforming
  • audio signal processing
  • speaker array
  • privacy and security

Published Papers (5 papers)


Research

19 pages, 14716 KiB  
Article
An Improved Two-Stage Spherical Harmonic ESPRIT-Type Algorithm
by Haocheng Zhou, Zhenghong Liu, Liyan Luo, Mei Wang and Xiyu Song
Symmetry 2023, 15(8), 1607; https://doi.org/10.3390/sym15081607 - 19 Aug 2023
Cited by 1 | Viewed by 878
Abstract
Sensor arrays have become a research hotspot thanks to their flexible beam control, high signal gain, robustness against strong interference, and high spatial resolution. Among them, spherical microphone arrays with complex rotational symmetry can capture more sound field information than planar arrays, and the captured multichannel speech signals can be converted into the spherical harmonic domain for processing through spherical modal decomposition. Subspace-based direction-of-arrival (DOA) estimation algorithms are sensitive to noise and reverberation; their performance can be improved by introducing the relative sound pressure and frequency-smoothing techniques. The relative sound pressure widens the gap between the eigenvalues associated with the signal subspace and those associated with the noise subspace, which helps estimate the number of active sound sources. The eigenbeam estimation of signal parameters via rotational invariance technique (EB-ESPRIT) is a well-known subspace-based algorithm for spherical microphone arrays, but it cannot estimate the DOA when the elevation angle approaches 90°. Huang et al. proposed a two-step ESPRIT (TS-ESPRIT) algorithm to solve this problem; however, because TS-ESPRIT estimates the elevation and azimuth angles independently, a DOA parameter-pairing problem arises. In this paper, the pairing problem of the TS-ESPRIT algorithm is solved by introducing generalized eigenvalue decomposition without increasing the computational load of the algorithm. In addition, the elevation angle is estimated with the arctangent function, which improves its estimation accuracy, and the robustness of the algorithm in noisy environments is enhanced by incorporating the relative sound pressure. Simulation and field-test results show that the proposed method not only solves the DOA parameter-pairing problem but also outperforms traditional methods in DOA estimation accuracy.
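The pairing remedy can be illustrated numerically. The sketch below is not the paper's spherical-harmonic algorithm; it is a minimal demo, with hypothetical recurrence matrices Psi_el and Psi_az that share a common set of eigenvectors (as the TS-ESPRIT matrices do), showing how a generalized eigenvalue decomposition recovers each source's elevation- and azimuth-related eigenvalues already paired:

```python
import numpy as np
from scipy.linalg import eig

# Hypothetical demo: Psi_el and Psi_az stand in for the two recurrence
# matrices of a TS-ESPRIT-style method. They share the same eigenvectors
# (columns of T); their eigenvalues encode per-source elevation and azimuth.
# Eigendecomposing them separately returns eigenvalues in arbitrary order,
# which is exactly the pairing problem.
rng = np.random.default_rng(0)
K = 3                                                # number of sources
T = rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))
el = np.exp(1j * rng.uniform(0.1, np.pi - 0.1, K))   # elevation eigenvalues
az = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, K))   # azimuth eigenvalues
Psi_el = T @ np.diag(el) @ np.linalg.inv(T)
Psi_az = T @ np.diag(az) @ np.linalg.inv(T)

# Generalized eigenvalue problem Psi_el v = lambda * Psi_az v: the
# generalized eigenvectors are the shared eigenvectors, so evaluating both
# matrices on each of them yields automatically paired eigenvalue estimates.
_, V = eig(Psi_el, Psi_az)
for k in range(K):
    v = V[:, k]
    lam_el = (v.conj() @ Psi_el @ v) / (v.conj() @ v)  # exact Rayleigh quotient
    lam_az = (v.conj() @ Psi_az @ v) / (v.conj() @ v)
    print(f"source {k}: elevation phase {np.angle(lam_el):+.3f}, "
          f"azimuth phase {np.angle(lam_az):+.3f}")
```

Because the generalized eigenvectors are the shared eigenvectors, no extra pairing search is needed, which is consistent with the paper's claim that the fix adds essentially no computation.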

11 pages, 536 KiB  
Article
A Pre-Separation and All-Neural Beamformer Framework for Multi-Channel Speech Separation
by Wupeng Xie, Xiaoxiao Xiang, Xiaojuan Zhang and Guanghong Liu
Symmetry 2023, 15(2), 261; https://doi.org/10.3390/sym15020261 - 17 Jan 2023
Cited by 1 | Viewed by 1689
Abstract
Thanks to the use of deep neural networks (DNNs), microphone array speech separation methods have achieved impressive performance. However, most existing neural beamforming methods explicitly follow traditional beamformer formulas, which can lead to sub-optimal performance. In this study, a pre-separation and all-neural beamformer framework is proposed for multi-channel speech separation that does not follow the solutions of conventional beamformers, such as the minimum variance distortionless response (MVDR) beamformer. More specifically, the proposed framework consists of two modules: a pre-separation module and an all-neural beamforming module. The pre-separation module produces pre-separated speech and interference, which the all-neural beamforming module then uses to obtain frame-level beamforming weights without computing spatial covariance matrices. Evaluation results on multi-channel speech separation tasks, including speech enhancement and speaker separation subtasks, demonstrate that the proposed method is more effective than several advanced baselines. Furthermore, the method can be applied to symmetrical stereo speech.
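To make the contrast with conventional beamformers concrete, here is a minimal NumPy sketch (the array shapes and the function name apply_framewise_beamformer are illustrative assumptions; the network that predicts the weights is not reproduced) of applying a separate complex weight vector per time-frequency bin:

```python
import numpy as np

def apply_framewise_beamformer(X, W):
    """Apply frame-level beamforming weights to a multi-channel STFT.

    X: complex mixture STFT, shape (M, F, T) = (mics, freq bins, frames).
    W: complex weights, shape (M, F, T), i.e., one M-dim vector per (f, t).
    Returns the single-channel beamformed STFT, shape (F, T).
    """
    # y(f, t) = w(f, t)^H x(f, t): conjugate the weights, sum over mics.
    return np.einsum('mft,mft->ft', W.conj(), X)

# Toy usage with random arrays standing in for a real mixture and for the
# weights a trained network would predict.
M, F, T = 6, 257, 100
X = np.random.randn(M, F, T) + 1j * np.random.randn(M, F, T)
W = np.random.randn(M, F, T) + 1j * np.random.randn(M, F, T)
Y = apply_framewise_beamformer(X, W)   # (F, T)
```

A conventional MVDR beamformer would instead derive one weight vector per frequency from estimated spatial covariance matrices; predicting W directly at frame level is what makes the beamformer "all-neural".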

13 pages, 1600 KiB  
Article
Multichannel Variational Autoencoder-Based Speech Separation in Designated Speaker Order
by Lele Liao, Guoliang Cheng, Haoxin Ruan, Kai Chen and Jing Lu
Symmetry 2022, 14(12), 2514; https://doi.org/10.3390/sym14122514 - 28 Nov 2022
Cited by 1 | Viewed by 1344
Abstract
The multichannel variational autoencoder (MVAE) integrates the rule-based update of a separation matrix with a deep generative model and has proven to be a competitive speech separation method. However, the output (global) permutation ambiguity remains a fundamental problem in applications. In this paper, we address this problem by employing two dedicated encoders: one encodes the speaker identity to guide the output sorting, and the other encodes the linguistic information to reconstruct the source signals. Instance normalization (IN) and adaptive instance normalization (adaIN) are applied to the networks to disentangle the speaker representations from the content representations. The separated sources are arranged in the designated order by a symmetric permutation alignment scheme. In the experiments, we test the proposed method on different gender combinations and under various reverberant conditions, and we generalize it to unseen speakers. The results validate its reliable sorting accuracy and good separation performance. The proposed method outperforms the baseline methods and maintains stable performance, achieving over 20 dB SIR improvement even in highly reverberant environments.
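For readers unfamiliar with this normalization pair, the following is a minimal NumPy sketch of IN and adaIN as used for speaker/content disentanglement (a simplification under assumed feature shapes; the paper's actual encoder networks are not reproduced):

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Instance normalization over time. x: (channels, frames).

    Removing per-channel temporal statistics tends to strip global,
    speaker-like characteristics, leaving content-like structure.
    """
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)
    return (x - mean) / (std + eps)

def adain(content, spk_mean, spk_std, eps=1e-5):
    """Adaptive instance normalization. content: (channels, frames);
    spk_mean, spk_std: (channels, 1), e.g., produced by a speaker encoder.

    Re-injects speaker statistics into the normalized content features.
    """
    return instance_norm(content, eps) * spk_std + spk_mean
```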

17 pages, 14081 KiB  
Article
A Neural Beamspace-Domain Filter for Real-Time Multi-Channel Speech Enhancement
by Wenzhe Liu, Andong Li, Xiao Wang, Minmin Yuan, Yi Chen, Chengshi Zheng and Xiaodong Li
Symmetry 2022, 14(6), 1081; https://doi.org/10.3390/sym14061081 - 24 May 2022
Cited by 5 | Viewed by 2154
Abstract
Most deep-learning-based multi-channel speech enhancement methods focus on designing a set of beamforming coefficients to directly filter the low signal-to-noise-ratio signals received by the microphones, which limits the performance of these approaches. To handle this problem, this paper presents a causal neural filter that fully exploits spectro-temporal-spatial information in the beamspace domain. Specifically, in the first stage, multiple beams are steered toward all directions using a parameterized super-directive beamformer. In the second stage, a deep-learning-based filter is learned by simultaneously modeling the spectro-temporal-spatial discriminability of the speech and the interference, so as to coarsely extract the desired speech. Finally, to further suppress the interference components, especially at low frequencies, a residual estimation module refines the output of the second stage. Experimental results demonstrate that the proposed approach outperforms many state-of-the-art (SOTA) multi-channel methods on a multi-channel speech dataset generated from the DNS-Challenge dataset.
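The first-stage beamspace transform can be sketched as follows (a minimal NumPy illustration; the design of the parameterized super-directive weights and all shapes are assumptions, and the two later neural stages are not reproduced):

```python
import numpy as np

def to_beamspace(X, W_fixed):
    """First-stage beamspace transform.

    X:       multi-channel mixture STFT, shape (M, F, T).
    W_fixed: fixed beamformer weights for D look directions, shape (D, M, F).
    Returns D beam outputs, shape (D, F, T); a second-stage neural filter
    can then weight and combine the beams per time-frequency bin.
    """
    # b(d, f, t) = w_d(f)^H x(f, t) for every look direction d.
    return np.einsum('dmf,mft->dft', W_fixed.conj(), X)
```

Working on a fixed grid of beams rather than on raw microphone signals is what gives the later neural stages explicit spatial discriminability to exploit.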

12 pages, 2560 KiB  
Article
Narrowband Active Noise Control Using Decimated Controller for Disturbance with Close Frequencies
by Fengyan An and Bilong Liu
Symmetry 2022, 14(3), 607; https://doi.org/10.3390/sym14030607 - 18 Mar 2022
Viewed by 1438
Abstract
In this paper, multi-channel active noise control systems subjected to narrowband disturbances with closely spaced frequencies are investigated. Instead of controlling each frequency separately, a mixed-reference signal is assumed and a transversal controller is therefore utilized. First, the convergence behavior of a generalized FxLMS-based algorithm is analyzed theoretically in the mean sense, revealing the influence of the controller structure on the convergence rate. A novel narrowband algorithm is then proposed, in which a decimated transversal controller is used to alleviate the computational burden. Simulations based on a 4 × 8 active noise control system verify the proposed method. The results show that a good convergence rate can be obtained while the computational complexity is greatly reduced.
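As background for the analysis, a minimal single-channel FxLMS sketch is given below; it shows the filtered-reference update that the paper generalizes, while the multi-channel 4 × 8 structure and the decimated controller are not reproduced (the function name and parameter values are illustrative assumptions):

```python
import numpy as np

def fxlms(x, d, s_hat, L=64, mu=1e-3):
    """Single-channel FxLMS sketch.

    x:     reference signal (e.g., a mixed-frequency narrowband reference).
    d:     disturbance observed at the error microphone.
    s_hat: estimated secondary-path impulse response.
    L:     length of the transversal controller.
    mu:    adaptation step size.
    Returns the error signal and the final controller taps.
    """
    w = np.zeros(L)                          # transversal controller taps
    xf = np.convolve(x, s_hat)[:len(x)]      # reference filtered by s_hat
    y = np.zeros(len(x))                     # controller output
    e = np.zeros(len(x))                     # residual at the error mic
    Ls = len(s_hat)
    for n in range(max(L, Ls), len(x)):
        x_vec = x[n - L + 1:n + 1][::-1]     # [x(n), ..., x(n-L+1)]
        y[n] = w @ x_vec                     # anti-noise before secondary path
        y_sec = s_hat @ y[n - Ls + 1:n + 1][::-1]  # anti-noise at error mic
        e[n] = d[n] - y_sec                  # residual noise
        xf_vec = xf[n - L + 1:n + 1][::-1]   # filtered-x regressor
        w += mu * e[n] * xf_vec              # LMS update on the filtered x
    return e, w
```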
