Modeling of Multimodal Speech Recognition and Language Processing

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Electronic Multimedia".

Deadline for manuscript submissions: 15 May 2024

Special Issue Editors


Guest Editor
Department of Electrical and Computer Engineering, National University of Singapore, Singapore 117576, Singapore
Interests: automatic lyrics transcription; speech recognition; speech-to-singing conversion; singing information processing; music information retrieval and multi-modal processing.

Guest Editor
School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
Interests: multi-modal fusion; speaker localization and tracking; speech-related topics

Guest Editor
Department of Electrical and Computer Engineering, National University of Singapore, Singapore 117576, Singapore
Interests: multi-modal processing; speaker recognition; active speaker detection; self-supervised learning

Guest Editor
School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
Interests: neuromorphic computing; deep learning; speech recognition

Guest Editor
The Institute of Scientific and Industrial Research, Osaka University, Suita 565-0871, Japan
Interests: voice conversion; speech synthesis; facial expression recognition; multimodal emotion recognition; statistical signal processing

Special Issue Information

Dear Colleagues,

This Special Issue, ‘Modeling of Multimodal Speech Recognition and Language Processing,’ aims to delve into the rapidly evolving landscape of automatic speech recognition (ASR) and language processing. It seeks to collate papers that explore innovative approaches to bridging the gap between human speech comprehension and computational interpretation, and that emphasize the development of novel techniques to enhance ASR and language modeling, particularly in challenging settings such as diverse noisy environments, multimodal contexts, and multi-lingual speech recognition.

By concentrating on challenging real-world scenarios, we encourage researchers to push the boundaries of existing knowledge and contribute ground-breaking solutions to the field. Furthermore, this Special Issue is designed to provide a comprehensive resource for researchers, both newcomers and experts, by presenting cutting-edge research, methodologies, and insights that are directly applicable to real-world ASR and language processing challenges.

Relative to existing approaches, this Special Issue seeks to build upon the foundation laid by prior research in ASR and language processing, and to extend the existing literature by focusing on emerging challenges, such as multimodal recognition and security concerns, that have gained prominence in recent years. By addressing these gaps in the literature, we aim to offer a forward-looking perspective on ASR and language processing, showcasing practical solutions and insights that align with contemporary demands. Researchers can expect to find valuable references and inspiration to address the most pressing issues in the field, making this Special Issue a pivotal addition to the existing body of work.

Topics of interest include, but are not limited to, the following:

  • Robust speech recognition;
  • Language modeling;
  • Multi-lingual speech recognition;
  • Audio-visual speech recognition;
  • Fast decoding techniques;
  • Representation learning for audio, text, and/or vision;
  • Speaker recognition for speech recognition;
  • Audio security and adversarial attacks on speech recognition models;
  • Large speech models;
  • Large language models for speech recognition.

Dr. Xiaoxue Gao
Dr. Xinyuan Qian
Dr. Ruijie Tao
Dr. Malu Zhang
Dr. Zhaojie Luo
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • robust speech recognition
  • novel approaches for speech recognition
  • multi-lingual speech recognition
  • audio-visual speech recognition
  • language modelling
  • self-supervised learning for speech processing
  • representation learning for audio, language and vision

Published Papers (1 paper)


Research

15 pages, 1256 KiB  
Article
Testing in Noise Based on the First Adaptive Matrix Sentence Test in Slovak Language
by Eva Kiktová, Rudolph Sock and Peter Getlík
Electronics 2024, 13(3), 602; https://doi.org/10.3390/electronics13030602 - 01 Feb 2024
Abstract
This study deals with an acoustic perceptual test performed on the basis of adaptive matrix tests, which represent a modern and reliable tool that can be used not only in perceptual phonetics but also for detecting problems related to hearing. The tests used, based on the first Slovak adaptive matrix, provided extensive test material, which was evaluated through a series of tests implemented according to ICRA (International Collegium of Rehabilitative Audiology) guidelines. Healthy listeners took part in the tests, and, during the tests, they listened to prepared sentence stimuli simultaneously with noise. Out of a total number of 30 tests, 15 tests met the demanding criteria. The tests were evaluated from the point of view of the word recognition score, the slope of the psychometric curve function, and also the threshold values corresponding to word recognition at the levels of 20%, 50%, and 80%. We also investigated and compared the impact of two different testing strategies (open and closed test format) and also the impact of experience or unfamiliarity with the test routine used. The created tests achieved SRT50 = −7.03 ± 0.79 dB and a slope of 13.13 ± 1.60%/dB.
(This article belongs to the Special Issue Modeling of Multimodal Speech Recognition and Language Processing)