Reprint

Future Speech Interfaces with Sensors and Machine Intelligence

Edited by
March 2023
252 pages
  • ISBN 978-3-0365-6938-3 (Hardback)
  • ISBN 978-3-0365-6939-0 (PDF)

This book is a reprint of the Special Issue “Future Speech Interfaces with Sensors and Machine Intelligence” that was published in Sensors.

Summary

Speech is the most spontaneous and natural means of communication, as well as the preferred modality for interacting with mobile or fixed electronic devices. Speech interfaces nevertheless have drawbacks, such as a lack of user privacy, non-inclusivity for certain users, poor robustness in noisy conditions, and the difficulty of creating complex man–machine interfaces. The Special Issue “Future Speech Interfaces with Sensors and Machine Intelligence” assembles eleven contributions covering multimodal and silent speech interfaces, lip reading applications, novel sensors for speech interfaces, and enhanced speech inclusivity tools for future speech interfaces. The articles make important improvements beyond the state of the art, in some cases advancing it to new frontiers. Short summaries of all articles, grouped by topic, are presented, followed by a global commentary and evaluation.

Format
  • Hardback
License
© 2022 by the authors; CC BY-NC-ND license
Keywords
neural machine translation (NMT); transformer; Arabic dialects; modern standard Arabic; subword units; multi-head attention; shared vocabulary; self-attention; 3D densely connected CNN; 3D multi-layer feature fusion CNN; convolutional neural network; deep learning; lipreading; speech recognition; visual speech recognition; silent speech; continuous-wave radar; European Portuguese; machine learning; multimodal speech; lip reading; ultrasound tongue imaging; pose estimation; speech kinematics; keypoints; landmarks; audio-visual speech recognition; lip-reading; application programming interface; multi-modal interaction; deep neural networks; lipreading; visual speech recognition; multi-view VSR; deep learning; attention mechanism; spatial attention module; convolutional neural network; local self-attention; connectionist temporal classification; text-to-lip; speech synthesis; text-to-speech; speech-to-lip; zero-shot adaptation; generative models; deep learning; artificial intelligence; objective measures; audio-visual speech recognition; hybrid models; end-to-end recognition; reliability measures; decision fusion net; articulation-to-speech synthesis; silent speech interface; speaker adaption; voice conversion; deep learning; audiovisual speech recognition; lipreading; multimodal interaction; edutainment; virtual aquarium; speech processing; ultrasound imaging; deep learning; multimodal speech; silent speech interfaces; lip reading; speech sensors