Recent Advances in Audio, Speech and Music Processing and Analysis

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Circuit and Signal Processing".

Deadline for manuscript submissions: 16 May 2024

Special Issue Editors


Dr. Athanasios Koutras
Guest Editor
Department of Electrical and Computer Engineering, University of the Peloponnese, 24100 Kalamata, Greece
Interests: digital sound processing and analysis; blind speech separation; speech recognition; EEG/MEG brain signal analysis; medical image analysis

Dr. Chrisoula Alexandraki
Guest Editor
Department of Music Technology & Acoustics, Hellenic Mediterranean University, 74133 Rethymnon, Greece
Interests: networked music performance; machine musicianship; music information retrieval; musical acoustics

Special Issue Information

Dear Colleagues,

Audio plays an important role in everyday life, as it is incorporated in applications ranging from broadcasting and telecommunications to the entertainment, multimedia, and gaming industries. Although less visible than image processing, which has dominated the industry in recent years, audio processing remains the subject of vigorous academic research and technological development. Relevant research addresses speech recognition, audio compression, noise cancellation, speaker verification and identification, voice synthesis, and voice transcription systems, to name a few. With respect to music signals, research focuses on music information retrieval for music streaming and recommendation; networked music making, teaching, and performing; autonomous and semi-autonomous computer musicians; and many more. This Special Issue provides an opportunity to disseminate state-of-the-art progress on emerging applications, algorithms, and systems related to audio, speech, and music processing and analysis.

Topics of interest include, but are not limited to:

  • Audio and speech analysis and recognition.
  • Deep learning for robust speech recognition systems.
  • Active noise cancelling systems.
  • Blind speech separation.
  • Robust speech recognition in environments with multiple simultaneous speakers.
  • Room acoustics modeling.
  • Environmental sound recognition.
  • Music information retrieval.
  • Networked music performance systems.
  • Internet of Sounds technologies and applications.
  • Computer accompaniment and machine musicianship.
  • Digital music representations and collaborative music making.
  • Online music education technologies.
  • Computational approaches to musical acoustics.
  • Music generation using deep learning.

Dr. Athanasios Koutras
Dr. Chrisoula Alexandraki
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • sound analysis
  • sound processing
  • music information retrieval
  • audio analysis
  • audio recognition
  • music technology
  • computational music cognition

Published Papers (4 papers)

Research

12 pages, 1510 KiB  
Article
Modeling Temporal Lobe Epilepsy during Music Large-Scale Form Perception Using the Impulse Pattern Formulation (IPF) Brain Model
by Rolf Bader
Electronics 2024, 13(2), 362; https://doi.org/10.3390/electronics13020362 - 15 Jan 2024
Abstract
Musical large-scale form is investigated using an electronic dance music piece fed into a Finite-Difference Time-Domain physical model of the cochlea, which in turn is input into an Impulse Pattern Formulation (IPF) Brain model. In previous studies, experimental EEG data showed an enhanced correlation between brain synchronization and the musical piece's amplitude and fractal correlation dimension, representing musical tension and expectancy time points within the large-scale form of musical pieces. This is also in good agreement with a FitzHugh–Nagumo oscillator model. However, that model cannot display temporal developments in large-scale forms. The IPF Brain model shows a high correlation between cochlea input and brain synchronization in the gamma band range around 50 Hz, and also a strong negative correlation with low frequencies, associated with musical rhythm, during time frames with low cochlea input amplitudes. Such high synchronization corresponds to temporal lobe epilepsy, which is often associated with creativity or spirituality. Therefore, the IPF Brain model results suggest that these conscious states occur at times of low external input at low frequencies, where isochronous musical rhythms are present.
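
As a rough illustration of the FitzHugh–Nagumo oscillator mentioned above as a comparison model, the following Python sketch integrates a single driven oscillator with forward Euler. The parameter values (a=0.7, b=0.8, tau=12.5) and the constant drive are textbook defaults chosen for illustration; in the studies cited, the driving input is derived from a cochlea model rather than held constant.

    import numpy as np

    # Forward-Euler integration of a driven FitzHugh-Nagumo oscillator:
    #   dv/dt = v - v^3/3 - w + I_ext
    #   dw/dt = (v + a - b*w) / tau
    def fitzhugh_nagumo(I_ext, a=0.7, b=0.8, tau=12.5, dt=0.01):
        v = np.zeros(len(I_ext))
        w = np.zeros(len(I_ext))
        for t in range(1, len(I_ext)):
            dv = v[t-1] - v[t-1]**3 / 3.0 - w[t-1] + I_ext[t-1]
            dw = (v[t-1] + a - b * w[t-1]) / tau
            v[t] = v[t-1] + dt * dv
            w[t] = w[t-1] + dt * dw
        return v, w

    # A constant suprathreshold drive produces sustained relaxation
    # oscillations, the regime in which synchronization effects appear.
    drive = np.full(20000, 0.5)
    v, w = fitzhugh_nagumo(drive)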

18 pages, 4347 KiB  
Article
Applying the Lombard Effect to Speech-in-Noise Communication
by Gražina Korvel, Krzysztof Kąkol, Povilas Treigys and Bożena Kostek
Electronics 2023, 12(24), 4933; https://doi.org/10.3390/electronics12244933 - 08 Dec 2023
Abstract
This study explored how the Lombard effect, a natural or artificial increase in speech loudness in noisy environments, can improve speech-in-noise communication. The study consisted of several experiments that measured the impact of different types of noise on synthesizing the Lombard effect. The main steps were as follows: first, a dataset of speech samples with and without the Lombard effect was collected in a controlled setting; then, the frequency changes in the speech signals were detected using the McAulay and Quartieri algorithm based on a 2D speech representation; next, an average formant track error was computed as a metric to evaluate the quality of the speech signals in noise. Three image assessment methods, namely the SSIM (Structural SIMilarity) index, RMSE (Root Mean Square Error), and dHash (Difference Hash), were used for this purpose. Furthermore, the study analyzed various spectral features of the speech signals in relation to the Lombard effect and the noise types. Finally, the study proposed a method for automatic noise profiling and applied pitch modifications to neutral speech signals according to the profile and the frequency change patterns. Overlap-add synthesis in the STRAIGHT vocoder was used to generate the synthesized speech.
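
To make the image-based comparison step concrete, here is a minimal Python sketch that scores the similarity of two log-magnitude spectrograms with SSIM and RMSE, two of the three measures named above. The synthetic signals and analysis parameters are placeholders for illustration, not the study's data or settings.

    import numpy as np
    from scipy.signal import spectrogram
    from skimage.metrics import structural_similarity

    def spectrogram_db(x, fs, nperseg=512):
        # Log-magnitude spectrogram: a 2D "image" of the signal.
        f, t, Sxx = spectrogram(x, fs=fs, nperseg=nperseg)
        return 10.0 * np.log10(Sxx + 1e-12)

    fs = 16000
    t = np.arange(0, 1.0, 1.0 / fs)
    clean = np.sin(2 * np.pi * 220 * t)   # placeholder "speech" tone
    noisy = clean + 0.1 * np.random.default_rng(0).standard_normal(len(t))

    S1 = spectrogram_db(clean, fs)
    S2 = spectrogram_db(noisy, fs)

    # SSIM needs the shared dynamic range of the two "images";
    # RMSE is computed directly on the dB values.
    rng_db = max(S1.max(), S2.max()) - min(S1.min(), S2.min())
    print("SSIM:", structural_similarity(S1, S2, data_range=rng_db))
    print("RMSE:", np.sqrt(np.mean((S1 - S2) ** 2)), "dB")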

13 pages, 1515 KiB  
Article
Blind Source Separation with Strength Pareto Evolutionary Algorithm 2 (SPEA2) Using Discrete Wavelet Transform
by Husamettin Celik and Nurhan Karaboga
Electronics 2023, 12(21), 4383; https://doi.org/10.3390/electronics12214383 - 24 Oct 2023
Abstract
This paper presents a new method for separating the mixed audio signals of simultaneous speakers using Blind Source Separation (BSS). The separation of mixed signals is an important issue today. In order to obtain more efficient and superior source estimation performance, a new algorithm that solves the BSS problem with Multi-Objective Optimization (MOO) methods was developed in this study. To this end, two methods were applied. First, the Discrete Wavelet Transform (DWT) was used to overcome the limitations of the traditional methods used in BSS and to suppress small coefficients in the signals; the BSS process was then optimized with the multi-objective Strength Pareto Evolutionary Algorithm 2 (SPEA2). Second, the Minkowski distance was proposed as the distance measure in the density information used, together with raw fitness values, to discriminate individuals under the Pareto dominance concept. With this proposed method, the original source signals were estimated by separating randomly mixed speech signals from one male and two female speakers. Simulation and experimental results showed that the proposed method can solve BSS problems effectively and efficiently. In addition, the method's Pareto front approximation performance was superior on the Inverted Generational Distance (IGD) indicator.
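
Two ingredients of the pipeline described above lend themselves to a short sketch: DWT-based suppression of small coefficients and the Minkowski distance used in density estimation. The Python below (using PyWavelets) is a hedged illustration under assumed parameters; the SPEA2 evolutionary loop itself is omitted, and the signals are random stand-ins rather than speech.

    import numpy as np
    import pywt

    def dwt_denoise(x, wavelet="db4", level=3, frac=0.02):
        # Decompose, hard-threshold small coefficients, reconstruct.
        coeffs = pywt.wavedec(x, wavelet, level=level)
        coeffs = [pywt.threshold(c, frac * np.max(np.abs(c)), mode="hard")
                  for c in coeffs]
        return pywt.waverec(coeffs, wavelet)

    def minkowski(u, v, p=2):
        # Minkowski distance of order p (p=2 is the Euclidean case);
        # in SPEA2-style density estimation such distances between
        # objective vectors rank how crowded each individual's
        # neighborhood is.
        return float(np.sum(np.abs(u - v) ** p) ** (1.0 / p))

    # Placeholder stand-ins for one male and two female source signals.
    rng = np.random.default_rng(0)
    sources = rng.standard_normal((3, 4096))
    A = rng.standard_normal((3, 3))          # unknown mixing matrix
    mixtures = A @ sources
    cleaned = np.array([dwt_denoise(m) for m in mixtures])
    print(minkowski(cleaned[0], cleaned[1], p=2))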

Review

14 pages, 2488 KiB  
Review
Review of Advances in Speech Processing with Focus on Artificial Neural Networks
by Douglas O’Shaughnessy
Electronics 2023, 12(13), 2887; https://doi.org/10.3390/electronics12132887 - 30 Jun 2023
Abstract
Speech is the primary means by which most humans communicate. Computers facilitate this transfer of information, especially when people interact with databases. While some methods to manipulate and interpret speech date back many decades (e.g., Fourier analysis), other processing techniques were developed late last century (e.g., linear predictive coding and hidden Markov models). Nonetheless, the last 25 years have seen major advances leading to the wide acceptance of computer-based speech processing, e.g., in cellular telephones and real-time online conversations. This paper reviews older techniques as well as recent methods that focus largely on artificial neural networks. The major highlights in speech research are examined, without delving into mathematical detail, while giving insight into the research choices that have been made. The focus of this work is to understand how and why the discussed methods function well.
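
As one concrete example of the classical techniques the review covers, the sketch below computes linear predictive coding (LPC) coefficients with the Levinson–Durbin recursion; the toy AR(2) "vocal tract" signal and the predictor order are illustrative assumptions, not material from the paper.

    import numpy as np
    from scipy.signal import lfilter

    def lpc(x, order):
        # Autocorrelation r[0..order] of the frame.
        r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
        a = np.zeros(order + 1)
        a[0] = 1.0
        err = r[0]
        # Levinson-Durbin: extend the predictor one order at a time.
        for i in range(1, order + 1):
            acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
            k = -acc / err                   # reflection coefficient
            a_prev = a.copy()
            a[1:i + 1] = a_prev[1:i + 1] + k * a_prev[i - 1::-1]
            err *= (1.0 - k * k)             # residual prediction error
        return a, err

    # Toy "vocal tract": an AR(2) filter excited by white noise.
    rng = np.random.default_rng(1)
    true_a = np.array([1.0, -1.5, 0.7])
    x = lfilter([1.0], true_a, rng.standard_normal(8000))
    a, err = lpc(x, order=2)
    print(a)  # approximately [1.0, -1.5, 0.7]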
