Machine Learning and Deep Learning in Speech Recognition

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 September 2023) | Viewed by 1827

Special Issue Editor


Guest Editor
School of Computer Science, College of Engineering, University of Seoul, Seoul, Korea
Interests: artificial intelligence; speech recognition; speaker recognition

Special Issue Information

Dear Colleagues,

Speech recognition has been researched for more than 50 years, and machine learning methods, especially deep learning, have recently led to breakthroughs in the field. The technology continues to improve rapidly.

Therefore, this Special Issue intends to present new ideas and experimental results in speech recognition and other speech-related engineering technologies, from design, service, and theory to their practical uses. Areas relevant to speech recognition include, but are not limited to:

  • Robust speech recognition in adverse environments
  • Speech enhancement
  • Recognition of speaker, emotion, or health conditions
  • Technology for disordered speech
  • Spoken machine translation
  • Speech embedding
  • Network architecture
  • Anti-spoofing detection

This Special Issue will publish high-quality, original research papers and surveys of recent methods. 

Prof. Dr. Ha-Jin Yu
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • speech recognition
  • speech enhancement
  • speaker recognition
  • emotion recognition
  • health condition recognition

Published Papers (1 paper)


Research

14 pages, 3292 KiB  
Article
A Method Improves Speech Recognition with Contrastive Learning in Low-Resource Languages
by Lixu Sun, Nurmemet Yolwas and Lina Jiang
Appl. Sci. 2023, 13(8), 4836; https://doi.org/10.3390/app13084836 - 12 Apr 2023
Cited by 2 | Viewed by 1333
Abstract
Building an effective automatic speech recognition system typically requires a large amount of high-quality labeled data; however, this can be challenging for low-resource languages. Currently, self-supervised contrastive learning has shown promising results in low-resource automatic speech recognition, but there is no discussion on the quality of negative sample sets in speech contrastive learning. In this paper, we propose the false negatives impact elimination (FNIE) method to filter false negative samples and improve the quality of the negative sample set in speech. FNIE compares the support vector with the negative sample vector set and optimizes the corresponding loss function, allowing the model to learn better speech representations and achieve superior results in low-resource speech recognition. Experiments demonstrate that FNIE effectively filters negative samples, enhances the quality of the negative sample set, and improves the accuracy of speech recognition. The quality of the negative sample set significantly affects the model’s learning ability, and using too many negative samples can deteriorate it. In a low-resource setting, our FNIE method achieved a relative improvement of 2.98% in WER on the English dataset, 14.3% in WER on the Uyghur dataset, and 4.04% in CER on the Mandarin dataset compared to the baseline model.
(This article belongs to the Special Issue Machine Learning and Deep Learning in Speech Recognition)
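
To illustrate the general idea behind the abstract above, here is a minimal sketch of filtering likely false negatives before computing a contrastive (InfoNCE-style) loss. It is not the authors' FNIE implementation: the cosine-similarity threshold rule, the function names, the temperature, and the toy vectors are all assumptions chosen for illustration. The last helper shows how a relative error-rate improvement is usually computed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def filter_false_negatives(anchor, negatives, threshold=0.9):
    """Keep only negatives whose similarity to the anchor is below the threshold.

    Assumption: a "negative" that is this close to the anchor probably
    belongs to the same class, so it is treated as a false negative.
    """
    return [n for n in negatives if cosine(anchor, n) < threshold]


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for a single anchor."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


def relative_improvement(baseline_error, new_error):
    """Relative error-rate reduction in percent (lower error is better)."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Toy vectors (hypothetical): the third "negative" nearly matches the anchor.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0], [0.99, 0.05]]

kept = filter_false_negatives(anchor, negatives)
loss_unfiltered = info_nce(anchor, positive, negatives)
loss_filtered = info_nce(anchor, positive, kept)
print(len(kept), loss_unfiltered, loss_filtered)
```

In this toy setup, removing the near-duplicate negative shrinks the denominator of the loss, so the anchor is no longer penalized for being close to a sample that resembles it. The threshold is a tunable assumption; the paper's own selection criterion may differ.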