

Analytics and Applications of Audio and Image Sensing Techniques

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Intelligent Sensors".

Deadline for manuscript submissions: closed (28 February 2022) | Viewed by 48955
Please contact the Guest Editor or the Section Managing Editor at ava.jiang@mdpi.com for any queries.

Special Issue Editor


Prof. Alicja Wieczorkowska
Guest Editor
Polish-Japanese Academy of Information Technology, Warsaw, Poland
Interests: audio signal analysis; music information retrieval; knowledge discovery in databases; multimedia; human–computer interaction; data mining; computer science; artificial intelligence

Special Issue Information

Dear Colleagues,

This Special Issue of the journal Sensors focuses on original research involving the use of various audio and image sensing devices, both simultaneously and separately. The goal is to collect a diverse set of papers that span a wide range of analyses and possible applications.

Specifically, we are interested in papers that address the use of environment-based sensors, i.e., sensors placed on the ground, cased in the air or water, etc., and the development of software utilizing the output of these sensors. Papers focused on the construction of optimized sensors are also welcome.

Prof. Alicja Wieczorkowska
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Digital signal processing
  • Audio signal analysis
  • Image analysis
  • Pattern recognition

Published Papers (18 papers)


Editorial


4 pages, 200 KiB  
Editorial
Analytics and Applications of Audio and Image Sensing Techniques
by Alicja Wieczorkowska
Sensors 2022, 22(21), 8443; https://doi.org/10.3390/s22218443 - 03 Nov 2022
Cited by 1 | Viewed by 904
Abstract
Nowadays, with numerous sensors placed everywhere around us, we can obtain signals collected from a variety of environment-based sensors, including the ones placed on the ground, cased in the air or water, etc. [...] Full article

Research


18 pages, 3663 KiB  
Article
Implementing a Statistical Parametric Speech Synthesis System for a Patient with Laryngeal Cancer
by Krzysztof Szklanny and Jakub Lachowicz
Sensors 2022, 22(9), 3188; https://doi.org/10.3390/s22093188 - 21 Apr 2022
Cited by 2 | Viewed by 1766
Abstract
Total laryngectomy, i.e., the surgical removal of the larynx, has a profound influence on a patient’s quality of life. The procedure results in a loss of natural voice, which in effect constitutes a significant socio-psychological problem for the patient. The main aim of the study was to develop a statistical parametric speech synthesis system for a patient with laryngeal cancer, on the basis of the patient’s speech samples recorded shortly before the surgery, and to check whether it was possible to generate speech quality close to that of the original recordings. The recording made use of a representative corpus of the Polish language, consisting of 2150 sentences. The recorded voice proved to indicate dysphonia, which was confirmed by the auditory-perceptual RBH scale (roughness, breathiness, hoarseness) and by acoustical analysis using the AVQI (Acoustic Voice Quality Index). The speech synthesis model was trained using the Merlin repository. Twenty-five experts participated in the MUSHRA listening tests, rating the synthetic voice at 69.4 relative to the professional voice-over talent recording, on a 0–100 scale, which is a very good result. The authors compared the quality of this synthetic voice to another model of synthetic speech trained on the same corpus, but with the speech samples provided by a voice-over talent. The same experts rated that voice at 63.63, which means the synthetic voice of the patient with laryngeal cancer obtained a higher score than the one based on the talent’s recordings. As such, the method enabled the creation of a statistical parametric speech synthesizer for patients awaiting total laryngectomy. As a result, the solution can improve the quality of life as well as the mental wellbeing of the patient. Full article
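
A note on the evaluation: MUSHRA scores like the 69.4 and 63.63 above are conventionally summarized as the mean rating across listeners with a confidence interval. A minimal illustrative sketch in Python; the ratings are made up, not the study’s data:

    import numpy as np
    from scipy import stats

    # Hypothetical ratings from 25 experts on the 0-100 MUSHRA scale (not the study's data)
    ratings = np.array([72, 65, 70, 68, 74, 66, 71, 69, 73, 67, 70, 68, 72,
                        66, 69, 71, 70, 68, 73, 65, 69, 72, 67, 70, 71], dtype=float)

    mean = ratings.mean()
    # 95% confidence interval based on the t-distribution
    ci_low, ci_high = stats.t.interval(0.95, df=len(ratings) - 1,
                                       loc=mean, scale=stats.sem(ratings))
    print(f"mean MUSHRA score: {mean:.1f}, 95% CI: ({ci_low:.1f}, {ci_high:.1f})")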

20 pages, 9279 KiB  
Article
Comparison of Infrared Thermography and Other Traditional Techniques to Assess Moisture Content of Wall Specimens
by Letícia C. M. Dafico, Eva Barreira, Ricardo M. S. F. Almeida and Helena Carasek
Sensors 2022, 22(9), 3182; https://doi.org/10.3390/s22093182 - 21 Apr 2022
Cited by 5 | Viewed by 1654
Abstract
High moisture content is a recurrent problem in masonry and can jeopardize durability. Therefore, precise and easy-to-use techniques are welcome both to evaluate the state of conservation and to help in the diagnosis of moisture-related problems. In this research, the humidification and drying processes of two wall specimens were assessed by infrared thermography, and the results were compared with two traditional techniques: the surface moisture meter and the gravimetric method. Two climatic chambers were used to impose different ambient conditions on each specimen, to evaluate the impact of air temperature and relative humidity on the results. The qualitative analysis of the thermal images allowed the identification of the phenomena. The quantitative analysis showed that the order of magnitude of the temperature gradient that reflects high humidity levels is substantially different in the two chambers, pointing to the influence of the surrounding environment. The presented analysis contributes to identifying the criteria indicative of moisture-related problems in two different scenarios and discusses the correlation between the non-destructive techniques and the moisture content in the masonry walls. The limitations and future research gaps regarding the use of IRT to assess moisture are also highlighted. Full article
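
To make the temperature-gradient criterion concrete: one common way to quantify it is the difference between the mean surface temperature of a dry reference area and that of the suspected damp area in the same thermogram. A minimal sketch on synthetic data; the region coordinates and the simulated cooling are illustrative assumptions, not the authors’ processing chain:

    import numpy as np

    # Synthetic thermogram: 2-D array of surface temperatures (deg C) from an IR camera
    thermogram = np.random.normal(loc=20.0, scale=0.3, size=(240, 320))
    thermogram[100:140, 120:180] -= 1.5   # evaporative cooling over a damp patch (simulated)

    dry_ref = thermogram[10:50, 10:70]    # region assumed to be dry
    damp = thermogram[100:140, 120:180]   # region suspected to be damp

    delta_t = dry_ref.mean() - damp.mean()
    print(f"dry-to-damp temperature gradient: {delta_t:.2f} K")
    # A larger delta_t suggests ongoing evaporation and thus higher moisture content;
    # as the study's two-chamber comparison shows, the gradient magnitude that signals
    # a problem depends strongly on the ambient air temperature and relative humidity.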

18 pages, 953 KiB  
Article
Context-Unsupervised Adversarial Network for Video Sensors
by Gemma Canet Tarrés and Montse Pardàs
Sensors 2022, 22(9), 3171; https://doi.org/10.3390/s22093171 - 21 Apr 2022
Cited by 2 | Viewed by 1276
Abstract
Foreground object segmentation is a crucial first step for surveillance systems based on networks of video sensors. This problem in the context of dynamic scenes has been widely explored in the last two decades, but it still has open research questions due to challenges such as strong shadows, background clutter and illumination changes. After years of solid work based on statistical background pixel modeling, most current proposals use convolutional neural networks (CNNs) either to model the background or to make the foreground/background decision. Although these new techniques achieve outstanding results, they usually require specific training for each scene, which is unfeasible if we aim at designing software for embedded video systems and smart cameras. Our approach to the problem does not require specific context or scene training, and thus no manual labeling. We propose a network for a refinement step on top of conventional state-of-the-art background subtraction systems. By using a statistical technique to produce a rough mask, we do not need to train the network for each scene. The proposed method can take advantage of the specificity of the classic techniques, while obtaining the highly accurate segmentation that a deep learning system provides. We also show the advantage of using an adversarial network to improve the generalization ability of the network and produce more consistent results than an equivalent non-adversarial network. The results provided were obtained by training the network on a common database, without fine-tuning for specific scenes. Experiments on the unseen part of the CDNet database provided an F-score of 0.82, and 0.87 was achieved for the LASIESTA database, which is unrelated to the training one. On this last database, the results outperformed those available in the official table by 8.75%. The results achieved for CDNet are well above those of the methods not based on CNNs and, according to the literature, among the best for context-unsupervised CNN systems. Full article
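
For reference, the F-score reported above (0.82 on CDNet, 0.87 on LASIESTA) is the harmonic mean of precision and recall of the predicted foreground masks. A minimal sketch of the computation on binary masks, with hypothetical arrays:

    import numpy as np

    def f_score(pred: np.ndarray, gt: np.ndarray) -> float:
        # F-score of a predicted binary foreground mask against the ground-truth mask
        tp = np.logical_and(pred, gt).sum()
        fp = np.logical_and(pred, ~gt).sum()
        fn = np.logical_and(~pred, gt).sum()
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall == 0.0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

    # Tiny hypothetical example: 2 true positives, 1 false positive, 0 false negatives
    pred = np.array([[True, False], [True, True]])
    gt = np.array([[True, False], [False, True]])
    print(f_score(pred, gt))  # 0.8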

18 pages, 3840 KiB  
Article
Musical Instrument Identification Using Deep Learning Approach
by Maciej Blaszke and Bożena Kostek
Sensors 2022, 22(8), 3033; https://doi.org/10.3390/s22083033 - 15 Apr 2022
Cited by 12 | Viewed by 5952
Abstract
The work aims to propose a novel approach for automatically identifying all instruments present in an audio excerpt using sets of individual convolutional neural networks (CNNs) per tested instrument. The paper starts with the background, i.e., a metadata description and a review of works related to musical instrument identification, focusing on the tasks performed, input types, algorithms employed, and metrics used. This is followed by a presentation of the dataset prepared for the experiment and its division into subsets: training, validation, and evaluation. Then, the analyzed architecture of the neural network model is presented. Based on the described model, training is performed, and several quality metrics are determined for the training and validation sets. The results of the evaluation of the trained network on a separate set are shown. Detailed values for precision, recall, and the number of true and false positive and negative detections are presented. The model efficiency is high, with the metric values ranging from 0.86 for the guitar to 0.99 for drums. Finally, a discussion and a summary of the results obtained follow. Full article
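
A minimal sketch of the one-binary-CNN-per-instrument idea in PyTorch, applied to mel-spectrogram patches; the architecture, instrument list, and input shape are illustrative assumptions, not the paper’s exact model:

    import torch
    import torch.nn as nn

    class InstrumentDetector(nn.Module):
        # Binary CNN: is a given instrument audible in this spectrogram excerpt?
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.AdaptiveAvgPool2d(1),
            )
            self.head = nn.Sequential(nn.Flatten(), nn.Linear(32, 1))

        def forward(self, x):
            return self.head(self.features(x))  # logit; sigmoid gives the probability

    # One independent detector per tested instrument
    detectors = {name: InstrumentDetector() for name in ["guitar", "drums", "bass", "piano"]}
    batch = torch.randn(8, 1, 128, 128)  # hypothetical mel-spectrogram patches
    probs = {name: torch.sigmoid(net(batch)) for name, net in detectors.items()}

Because each network answers a yes/no question for one instrument, any subset of instruments can be reported as present in the same excerpt.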

12 pages, 1762 KiB  
Article
Detection and Recognition of Pollen Grains in Multilabel Microscopic Images
by Elżbieta Kubera, Agnieszka Kubik-Komar, Paweł Kurasiński, Krystyna Piotrowska-Weryszko and Magdalena Skrzypiec
Sensors 2022, 22(7), 2690; https://doi.org/10.3390/s22072690 - 31 Mar 2022
Cited by 16 | Viewed by 2944
Abstract
Analysis of pollen material obtained from the Hirst-type apparatus, which is a tedious and labor-intensive process, is usually performed by hand under a microscope by specialists in palynology. This research evaluated the automatic analysis of pollen material performed based on digital microscopic photos. A deep neural network called YOLO was used to analyze microscopic images containing the reference grains of three taxa typical of Central and Eastern Europe. YOLO networks perform recognition and detection; hence, there is no need to segment the image before classification. The obtained results were compared to other deep learning object detection methods, i.e., Faster R-CNN and RetinaNet. YOLO outperformed the other methods, as it gave a mean average precision (mAP@.5:.95) between 86.8% and 92.4% for the test sets included in the study. Among the difficulties related to the correct classification of the research material, the following should be noted: significant similarities between the grains of the analyzed taxa, the possibility of their simultaneous occurrence in one image, and mutual overlapping of objects. Full article
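
A minimal sketch of running such a detector with the ultralytics YOLO package, which is one convenient implementation of this family of models; the weights file and image path are placeholders, not the authors’ trained model:

    from ultralytics import YOLO

    # Hypothetical weights fine-tuned on pollen micrographs (placeholder path)
    model = YOLO("pollen_yolo.pt")

    results = model("slide_0001.jpg")  # one pass: bounding box + taxon per pollen grain
    for box in results[0].boxes:
        taxon = results[0].names[int(box.cls)]
        print(taxon, float(box.conf), box.xyxy.tolist())

Because detection and classification happen in one pass, overlapping grains of different taxa in the same image are handled without a prior segmentation step.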

22 pages, 83637 KiB  
Article
Image Denoising Using a Compressive Sensing Approach Based on Regularization Constraints
by Assia El Mahdaoui, Abdeldjalil Ouahabi and Mohamed Said Moulay
Sensors 2022, 22(6), 2199; https://doi.org/10.3390/s22062199 - 11 Mar 2022
Cited by 68 | Viewed by 5256
Abstract
In remote sensing applications and medical imaging, one of the key points is the acquisition, real-time preprocessing and storage of information. Due to the large amount of information present in the form of images or videos, compression of these data is necessary. Compressed sensing is an efficient technique to meet this challenge. It consists in acquiring a signal, assuming that it can have a sparse representation, by using a minimum number of nonadaptive linear measurements. After this compressed sensing process, a reconstruction of the original signal must be performed at the receiver. Reconstruction techniques are often unable to preserve the texture of the image and tend to smooth out its details. To overcome this problem, we propose, in this work, a compressed sensing reconstruction method that combines total variation regularization and the non-local self-similarity constraint. The optimization of this method is performed by using an augmented Lagrangian that avoids the difficult problem of nonlinearity and nondifferentiability of the regularization terms. The proposed algorithm, called denoising-compressed sensing by regularization (DCSR), will not only perform image reconstruction but also denoising. To evaluate the performance of the proposed algorithm, we compare it with state-of-the-art methods, such as Nesterov’s algorithm, group-based sparse representation and wavelet-based methods, in terms of denoising and preservation of edges, texture and image details, as well as from the point of view of computational complexity. Our approach permits a gain of up to 25% in terms of denoising efficiency and visual quality using two metrics: peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). Full article
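
In equation form, the general shape of such a reconstruction problem (a sketch of a standard total-variation plus non-local-self-similarity formulation, not necessarily the paper’s exact notation) is

    \min_{x}\; \mathrm{TV}(x) + \lambda\, \Phi_{\mathrm{NLSS}}(x)
    \quad \text{subject to} \quad y = A x,

which the augmented Lagrangian turns into the unconstrained form

    \mathcal{L}_{\rho}(x, \mu) = \mathrm{TV}(x) + \lambda\, \Phi_{\mathrm{NLSS}}(x)
        + \mu^{\top} (A x - y) + \frac{\rho}{2}\, \lVert A x - y \rVert_2^2,

where A is the measurement matrix, y the compressed measurements, \Phi_{\mathrm{NLSS}} the non-local self-similarity prior, \mu the multiplier, and \rho the penalty weight. The quadratic penalty term is what sidesteps the nonlinearity and nondifferentiability of the regularization terms during optimization.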

18 pages, 1768 KiB  
Article
A Novel Method for Intelligibility Assessment of Nonlinearly Processed Speech in Spaces Characterized by Long Reverberation Times
by Adam Kurowski, Jozef Kotus, Piotr Odya and Bozena Kostek
Sensors 2022, 22(4), 1641; https://doi.org/10.3390/s22041641 - 19 Feb 2022
Cited by 2 | Viewed by 1663
Abstract
Objective assessment of speech intelligibility is a complex task that requires taking into account a number of factors, such as the different perception of each speech sub-band by the human hearing sense or the different physical properties of each frequency band of a speech signal. Currently, the state-of-the-art method used for assessing the quality of speech transmission is the speech transmission index (STI). It is a standardized way of objectively measuring the quality of, e.g., an acoustical adaptation of conference rooms or public address systems. The wide use of this measure and the implementation of this method on numerous measurement devices make STI a popular choice when the speech-related quality of rooms has to be estimated. However, the STI measure has a significant drawback which excludes it from some particular use cases. For instance, if one would like to enhance speech intelligibility by employing a nonlinear digital processing algorithm, the STI method is not suitable to measure the impact of such an algorithm, as it requires that the measurement signal should not be altered in a nonlinear way. Consequently, if a nonlinear speech enhancing algorithm has to be tested, the STI, a standard way of estimating speech transmission, cannot be used. In this work, we propose a method based on the STI but modified in such a way that it can be employed to estimate the performance of nonlinear speech intelligibility enhancement methods. The proposed approach is based upon a broadband comparison of the cumulated energy of the transmitted envelope modulation and the received modulation, so we called it broadband STI (bSTI). Its credibility with regard to signals altered by the environment or speech changed nonlinearly by a DSP algorithm is checked by performing a comparative analysis of ten selected impulse responses for which a baseline value of STI was known. Full article
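
For context, the classical STI rests on the modulation transfer function computed from the room impulse response h(t) (Schroeder’s formula), evaluated at modulation frequencies F in each octave band:

    m(F) = \frac{\left| \int_{0}^{\infty} h^{2}(t)\, e^{-j 2 \pi F t}\, \mathrm{d}t \right|}
                {\int_{0}^{\infty} h^{2}(t)\, \mathrm{d}t},
    \qquad
    \mathrm{SNR}_{\mathrm{app}}(F) = 10 \log_{10} \frac{m(F)}{1 - m(F)}.

The apparent SNR values are clipped and averaged into the final index, and it is exactly this construction that assumes a linear transmission channel. The proposed bSTI instead compares the cumulated energy of the transmitted and received envelope modulations broadband, so a nonlinear processing stage does not invalidate the measurement.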

16 pages, 8291 KiB  
Article
A Setup for Camera-Based Detection of Simulated Pathological States Using a Neonatal Phantom
by Florian Voss, Simon Lyra, Daniel Blase, Steffen Leonhardt and Markus Lüken
Sensors 2022, 22(3), 957; https://doi.org/10.3390/s22030957 - 26 Jan 2022
Cited by 3 | Viewed by 2541
Abstract
Premature infants are among the most vulnerable patients in a hospital. Due to numerous complications associated with immaturity, continuous monitoring of vital signs with a high sensitivity and accuracy is required. Today, wired sensors are attached to the patient’s skin. However, adhesive electrodes can be potentially harmful, as they can damage the very thin immature skin. Although unobtrusive monitoring systems using cameras show the potential to replace cable-based techniques, advanced image processing algorithms are data-driven and, therefore, need large amounts of data for training. Due to the low availability of public neonatal image data, a patient phantom could help to implement algorithms for the robust extraction of vital signs from video recordings. In this work, a camera-based system is presented and validated using a neonatal phantom, which enabled a simulation of common neonatal pathologies such as hypo-/hyperthermia and brady-/tachycardia. The implemented algorithm was able to continuously measure and analyze the heart rate via photoplethysmography imaging with a mean absolute error of 0.91 bpm, as well as the distribution of a neonate’s skin temperature with a mean absolute error of less than 0.55 °C. For accurate measurements, a temperature gain offset correction was performed on the registered images from two infrared thermography cameras. A deep learning-based keypoint detector was applied for temperature mapping and guidance for the feature extraction. The presented setup successfully detected several levels of hypo- and hyperthermia, an increased central-peripheral temperature difference, tachycardia and bradycardia. Full article
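
A minimal sketch of the photoplethysmography-imaging step: spatially average a skin region of interest over the frames, then take the dominant spectral peak in the cardiac band as the heart rate. The frame rate, array shapes, and synthetic signal are assumptions, not the authors’ pipeline:

    import numpy as np

    fps = 30.0
    # frames: (n_frames, h, w) intensities of a skin ROI from the camera; here synthetic
    t = np.arange(900) / fps
    frames = (0.01 * np.sin(2 * np.pi * 2.0 * t)[:, None, None]   # 2 Hz pulse wave
              + np.random.normal(0.0, 0.005, (900, 8, 8)))        # sensor noise

    signal = frames.mean(axis=(1, 2))     # spatially averaged PPGI waveform
    signal -= signal.mean()
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1 / fps)

    band = (freqs > 1.0) & (freqs < 4.0)  # 60-240 bpm, a plausible neonatal range
    hr_bpm = 60 * freqs[band][np.argmax(spectrum[band])]
    print(f"estimated heart rate: {hr_bpm:.1f} bpm")  # ~120 bpm for this synthetic input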

19 pages, 6360 KiB  
Article
Prototyping Mobile Storytelling Applications for People with Aphasia
by Krzysztof Szklanny, Marcin Wichrowski and Alicja Wieczorkowska
Sensors 2022, 22(1), 14; https://doi.org/10.3390/s22010014 - 21 Dec 2021
Cited by 4 | Viewed by 3152
Abstract
Aphasia is a partial or total loss of the ability to articulate ideas or comprehend spoken language, resulting from brain damage, in a person whose language skills were previously normal. Our goal was to find out how a storytelling app can help people with aphasia to communicate and share daily experiences. For this purpose, the Aphasia Create app was created for tablets, along with Aphastory for the Google Glass device. These applications facilitate social participation and enhance quality of life by using visual storytelling forms composed of photos, drawings, icons, etc., that can be saved and shared. We performed usability tests (supervised by a neuropsychologist) on six participants with aphasia who were able to communicate. Our work contributes (1) evidence that the functions implemented in the Aphasia Create tablet app suit the needs of target users, but older people are often not familiar with tactile devices, (2) reports that the Google Glass device may be problematic for persons with right-hand paresis, and (3) a characterization of the design guidelines for apps for aphasics. Both applications can be used to work with people with aphasia, and can be further developed. Aphasic centers, in which the apps were presented, expressed interest in using them to work with patients. The Aphasia Create app won the Enactus Poland National Competition in 2015. Full article

15 pages, 4716 KiB  
Article
The Dependence of Flue Pipe Airflow Parameters on the Proximity of an Obstacle to the Pipe’s Mouth
by Damian Węgrzyn, Piotr Wrzeciono and Alicja Wieczorkowska
Sensors 2022, 22(1), 10; https://doi.org/10.3390/s22010010 - 21 Dec 2021
Cited by 2 | Viewed by 2512
Abstract
This paper describes the influence of the presence of an obstacle near the flue pipe’s mouth on the air jet, which directly affects the parameters of the sound generated by the flue pipe. Labial pipes of the most common types of mouth were tested. The method of interval calculus was used instead of invasive measuring instruments. The obtained results prove that the proximity of an obstacle affects the sound’s fundamental frequency, as the airflow speed coming out of the flue pipe’s mouth changes. The relationship between the airflow speed, the value of the Reynolds number, and the Strouhal number was also established. The thesis of the influence of the proximity of an obstacle on the fundamental frequency of the sound of a flue pipe was generalized, and formulas for calculating the untuning of the sound of the pipe were presented for various types of mouth. Full article
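
For reference, the two dimensionless quantities mentioned above are, in standard notation (the characteristic length used in the paper may differ),

    Re = \frac{v\, d}{\nu}, \qquad St = \frac{f\, d}{v},

where v is the airflow speed at the pipe’s mouth, d a characteristic dimension of the jet, \nu the kinematic viscosity of air, and f the oscillation frequency. An obstacle near the mouth changes v, which shifts the Strouhal balance and, with it, the fundamental frequency of the pipe.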

20 pages, 3644 KiB  
Article
Toward Capturing Scientific Evidence in Elderly Care: Efficient Extraction of Changing Facial Feature Points
by Kosuke Hirayama, Sinan Chen, Sachio Saiki and Masahide Nakamura
Sensors 2021, 21(20), 6726; https://doi.org/10.3390/s21206726 - 10 Oct 2021
Cited by 7 | Viewed by 1917
Abstract
To capture scientific evidence in elderly care, a user-defined facial expression sensing service was proposed in our previous study. Since the time-series data of feature values grow rapidly as the measurement time increases, it may be difficult to find points of interest, especially when detecting changes in elderly facial expressions: many elderly people show only micro facial expressions due to facial wrinkles and aging. The purpose of this paper is to implement a method to efficiently find points of interest (PoI) in the facial feature time-series data of the elderly. In the proposed method, we incorporate change-point detection into the analysis of feature values, to automatically detect big fluctuations or changes in the trend of feature values and to detect the moment when the subject’s facial expression changed significantly. Our key idea is to introduce the novel concept of a composite feature value to achieve higher accuracy, and to apply change-point detection to it as well as to single feature values. Furthermore, the PoI finding results from the facial feature time-series data of young volunteers and the elderly are analyzed and evaluated. The experiments show that the proposed method is able to capture the moment of large facial movements even for people with micro facial expressions, and to obtain information that can be used as a clue to investigate their response to care. Full article
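
A minimal sketch of the change-point idea using the ruptures library on a facial-feature time series. The composite feature here is simply the per-frame Euclidean norm across several features, an illustrative assumption rather than the paper’s definition:

    import numpy as np
    import ruptures as rpt

    # Hypothetical time series: (n_frames, n_features) facial feature values
    rng = np.random.default_rng(0)
    features = rng.normal(0.0, 0.1, (600, 5))
    features[300:] += 0.8                 # a sustained facial movement from frame 300 on

    # Composite feature value: per-frame magnitude across all single features
    composite = np.linalg.norm(features, axis=1)

    algo = rpt.Pelt(model="rbf").fit(composite.reshape(-1, 1))
    change_points = algo.predict(pen=5)   # indices where the signal's behavior shifts
    print(change_points)                  # e.g., [300, 600] - candidate points of interest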

10 pages, 771 KiB  
Article
Deep Learning-Based Analysis of Face Images as a Screening Tool for Genetic Syndromes
by Maciej Geremek and Krzysztof Szklanny
Sensors 2021, 21(19), 6595; https://doi.org/10.3390/s21196595 - 02 Oct 2021
Cited by 5 | Viewed by 2106
Abstract
Approximately 4% of the world’s population suffers from rare diseases. A vast majority of these disorders have a genetic background. The number of genes that have been linked to human diseases is constantly growing, but there are still genetic syndromes that remain to be discovered. The diagnostic yield of genetic testing is continuously developing, and the need for testing is becoming more significant. Due to limited resources, including trained clinical geneticists, patients referred to clinical genetics units must be accurately selected. Around 30–40% of genetic disorders are associated with specific facial characteristics called dysmorphic features. As part of our research, we analyzed the performance of classifiers based on deep learning face recognition models in detecting dysmorphic features. We tested two classification problems: a multiclass problem (15 genetic disorders vs. controls) and a two-class problem (disease vs. controls). In the multiclass task, the best result reached an accuracy level of 84%. The best accuracy result in the two-class problem reached 96%. More importantly, the binary classifier detected disease features in patients with diseases that were not previously present in the training dataset. The classifier was able to generalize differences between patients and controls, and to detect abnormalities without information about the specific disorder. This indicates that a screening tool based on deep learning and facial recognition could not only detect known diseases, but also detect patients with diseases that were not previously known. In the future, this tool could help in screening patients before they are referred to the genetic unit. Full article
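
A minimal sketch of the general pipeline: embeddings from a pre-trained face-recognition network feed a simple binary classifier. Here embed_face is a hypothetical placeholder for whatever embedding model is used, and the data are random stand-ins:

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    def embed_face(image: np.ndarray) -> np.ndarray:
        # Placeholder: in practice a pre-trained face-recognition CNN would
        # map the aligned face image to, e.g., a 512-dimensional embedding.
        return np.random.default_rng(int(image.sum() * 1e6) % 2**32).normal(size=512)

    # Hypothetical dataset: face photos of patients (label 1) and controls (label 0)
    images = [np.random.rand(160, 160, 3) for _ in range(40)]
    labels = np.array([1] * 20 + [0] * 20)

    X = np.stack([embed_face(img) for img in images])
    clf = SVC(kernel="linear")
    print(cross_val_score(clf, X, labels, cv=5).mean())  # disease-vs-control accuracy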

16 pages, 1606 KiB  
Article
Polynomial, Neural Network, and Spline Wavelet Models for Continuous Wavelet Transform of Signals
by Andrey Stepanov
Sensors 2021, 21(19), 6416; https://doi.org/10.3390/s21196416 - 26 Sep 2021
Cited by 5 | Viewed by 1951
Abstract
In this paper a modified wavelet synthesis algorithm for continuous wavelet transform is proposed, allowing one to obtain a guaranteed approximation of the mother wavelet to the sample of the analyzed signal (overlap match) and, at the same time, a formalized representation of the wavelet. What distinguishes this method from similar ones is that, during the procedure of wavelet synthesis for continuous wavelet transform, splines and artificial neural networks are used. The paper also provides a comparative analysis of polynomial, neural network, and spline wavelet models. It also deals with the feasibility of using these models in the synthesis of wavelets in studies of the fine structure of signals, as well as in the analysis of large parts of signals whose shape is variable. A number of studies have shown that, during wavelet synthesis, the use of artificial neural networks (based on radial basis functions) and cubic splines enables the possibility of obtaining guaranteed accuracy in approximating the mother wavelet to the signal’s sample (with no approximation error). It also allows for its formalized representation, which is especially important during software implementation of the algorithm for calculating the continuous transform on digital signal processors and microcontrollers. This paper demonstrates the possibility of using a synthesized wavelet, obtained based on polynomial, neural network, and spline models, in the performance of an inverse continuous wavelet transform. Full article
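
A minimal sketch of the cubic-spline part of the idea: fit a spline through a zero-mean sample of the analyzed signal, so that the resulting piecewise polynomial can act as a mother wavelet with a closed-form representation. The signal excerpt and knot grid are assumptions:

    import numpy as np
    from scipy.interpolate import CubicSpline

    # Hypothetical signal excerpt that the mother wavelet should match (overlap match)
    t = np.linspace(0.0, 1.0, 33)
    sample = np.sin(6 * np.pi * t) * np.exp(-16 * (t - 0.5) ** 2)
    sample -= sample.mean()        # zero mean, in the spirit of wavelet admissibility

    psi = CubicSpline(t, sample)   # interpolates every sample point exactly
    # psi is now a formal piecewise-polynomial representation of the wavelet: it
    # reproduces the signal sample with no approximation error at the knots and can
    # be evaluated anywhere, which is convenient when implementing a continuous
    # wavelet transform on a DSP or a microcontroller.
    print(psi(0.37), psi.c.shape)  # value at an arbitrary point; (4, 32) coefficients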

18 pages, 1916 KiB  
Article
Acoustic Sensing Analytics Applied to Speech in Reverberation Conditions
by Piotr Odya, Jozef Kotus, Adam Kurowski and Bozena Kostek
Sensors 2021, 21(18), 6320; https://doi.org/10.3390/s21186320 - 21 Sep 2021
Cited by 6 | Viewed by 2040
Abstract
The paper aims to discuss a case study of sensing analytics and technology in acoustics when applied to reverberation conditions. Reverberation is one of the issues that makes speech in indoor spaces challenging to understand. This problem is particularly critical in large spaces with few absorbing or diffusing surfaces. One natural remedy to improve speech intelligibility in such conditions is to speak slowly. It is possible to use algorithms that reduce the rate of speech (RoS) in real time. Therefore, the study aims to find recommended values of RoS in the context of STI (speech transmission index) in different acoustic environments. In the experiments, speech intelligibility for six impulse responses recorded in spaces with different STIs is investigated using a sentence test (for the Polish language). Fifteen subjects with normal hearing participated in these tests. The results of the analysis enabled us to propose a curve specifying the maximum RoS values that translate into understandable speech under given acoustic conditions. This curve can be used in speech processing control technology as well as compressive reverse acoustic sensing. Full article
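
One standard way to reduce the rate of speech without changing pitch is phase-vocoder time stretching; a minimal sketch with librosa, where the input file and the stretch factor are illustrative assumptions rather than the study’s recommended RoS curve:

    import librosa
    import soundfile as sf

    y, sr = librosa.load("speech.wav", sr=None)  # hypothetical input recording

    # rate < 1 slows speech down; 0.8 lengthens the utterance by 25%
    y_slow = librosa.effects.time_stretch(y, rate=0.8)

    sf.write("speech_slow.wav", y_slow, sr)

In a real-time system the same operation would run on a stream, with the stretch factor chosen according to the measured STI of the room, as the proposed curve suggests.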

18 pages, 1732 KiB  
Article
Incorporating Interpersonal Synchronization Features for Automatic Emotion Recognition from Visual and Audio Data during Communication
by Jingyu Quan, Yoshihiro Miyake and Takayuki Nozawa
Sensors 2021, 21(16), 5317; https://doi.org/10.3390/s21165317 - 06 Aug 2021
Cited by 6 | Viewed by 2945
Abstract
During social interaction, humans recognize others’ emotions via individual features and interpersonal features. However, most previous automatic emotion recognition techniques only used individual features and have not tested the importance of interpersonal features. In the present study, we asked whether interpersonal features, especially time-lagged synchronization features, are beneficial to the performance of automatic emotion recognition techniques. We explored this question in the main experiment (speaker-dependent emotion recognition) and a supplementary experiment (speaker-independent emotion recognition) by building an individual framework and an interpersonal framework in visual, audio, and cross-modality, respectively. Our main experiment results showed that the interpersonal framework outperformed the individual framework in every modality. Our supplementary experiment showed that, even for unknown communication pairs, the interpersonal framework led to a better performance. Therefore, we concluded that interpersonal features are useful for boosting the performance of automatic emotion recognition tasks. We hope this study draws attention to interpersonal features. Full article
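
A minimal sketch of a time-lagged synchronization feature: the normalized cross-correlation of two speakers’ feature tracks over a range of lags, whose peak value and peak lag can then feed the recognizer. Signal names and the mirroring delay are hypothetical:

    import numpy as np

    def lagged_sync(a: np.ndarray, b: np.ndarray, max_lag: int):
        # Normalized cross-correlation of two feature tracks for lags in [-max_lag, max_lag]
        a = (a - a.mean()) / (a.std() + 1e-12)
        b = (b - b.mean()) / (b.std() + 1e-12)
        corr = [np.mean(a[max(0, -k):len(a) - max(0, k)] * b[max(0, k):len(b) - max(0, -k)])
                for k in range(-max_lag, max_lag + 1)]
        best = int(np.argmax(np.abs(corr)))
        return corr[best], best - max_lag  # peak synchronization and the lag it occurs at

    # Hypothetical per-frame features (e.g., smile intensity) of two interacting speakers
    rng = np.random.default_rng(1)
    x = rng.normal(size=500)
    y = np.roll(x, 12) + 0.5 * rng.normal(size=500)  # B mirrors A about 12 frames later
    print(lagged_sync(x, y, max_lag=30))             # peak near lag +12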

17 pages, 26142 KiB  
Article
Pattern Recognition in Music on the Example of Reconstruction of Chest Organ from Kamień Pomorski
by Piotr Wrzeciono
Sensors 2021, 21(12), 4163; https://doi.org/10.3390/s21124163 - 17 Jun 2021
Cited by 1 | Viewed by 2417
Abstract
The chest organ, which gained popularity at the beginning of the 17th century, is a small pipe organ the size of a large box. Several years ago, while compiling an inventory, a previously unidentified chest organ was discovered at St. John the Baptist’s Co-Cathedral in Kamień Pomorski. Regrettably, the instrument did not possess any of its original pipes. What remained, however, was an image of the front pipes preserved on the chest door. The main issue involved in the reconstruction of a historic instrument is the restoration of its original tuning (temperament). Additionally, it is important to establish the frequency of A4, as this sound serves as a standard pitch reference in instrument tuning. The study presents a new method that aims to address the above-mentioned problems. To this end, techniques to search for the most probable temperament and establish the correct A4 frequency were developed. The solution is based on the modeling of sound generation in flue pipes, as well as statistical analysis to help match a model to the parameters preserved in the chest organ drawing. Additionally, different values of the A4 sound were defined for temperatures ranging from 10 °C to 20 °C. The tuning system proposed in 1523 by Pietro Aaron proved to be the most probable temperament. In the process of testing the developed flue pipe model, the maximum tuning temperature was established as 15.8 °C. Full article
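
The role of temperature follows from elementary flue pipe physics: the pitch is set by the speed of sound in air, which grows with temperature. As a sketch of the standard relations (not the paper’s full sound-generation model):

    c(T) \approx 331.3\, \sqrt{1 + \frac{T}{273.15}}\ \text{m/s},
    \qquad
    f \approx \frac{c(T)}{2 (L + \Delta L)},

where T is the air temperature in °C, L the acoustic length of an open flue pipe, and \Delta L the end correction. With the pipe geometry fixed by the drawing, the inferred A4 reference frequency therefore shifts with the assumed tuning temperature, which is why values over the 10 °C to 20 °C range are tabulated.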

10 pages, 1891 KiB  
Article
Deep Learning Methods for Improving Pollen Monitoring
by Elżbieta Kubera, Agnieszka Kubik-Komar, Krystyna Piotrowska-Weryszko and Magdalena Skrzypiec
Sensors 2021, 21(10), 3526; https://doi.org/10.3390/s21103526 - 19 May 2021
Cited by 13 | Viewed by 3298
Abstract
The risk of pollen-induced allergies can be determined and predicted based on data derived from pollen monitoring. Hirst-type samplers are sensors that allow airborne pollen grains to be detected and their number to be determined. Airborne pollen grains are deposited on adhesive-coated tape, and slides are then prepared, which require further analysis by specialized personnel. Deep learning can be used to recognize pollen taxa based on microscopic images. This paper presents a method for recognizing a taxon based on microscopic images of pollen grains, allowing the pollen monitoring process to be automated. In this research, a deep CNN (convolutional neural network) model was built from scratch. Publicly available deep neural network models, pre-trained on image data (not including microscopic pictures), were also used. The results show that even a simple deep learning model produces quite good results when the classification of pollen grain taxa is performed directly from the images. The best deep learning model achieved 97.88% accuracy in the difficult task of recognizing three types of pollen grains (birch, alder, and hazel) with similar structures. The derived models can be used to build a system to support pollen monitoring experts in their work. Full article
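
As an illustration of the pre-trained route mentioned above, a minimal transfer-learning sketch in PyTorch/torchvision; the backbone choice, hyperparameters, and random stand-in data are assumptions, not the paper’s configuration:

    import torch
    import torch.nn as nn
    from torchvision import models

    # Backbone pre-trained on ImageNet (no microscopy images), adapted to 3 pollen taxa
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, 3)  # birch, alder, hazel

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()

    # One hypothetical training step on a batch of pollen-grain crops
    images = torch.randn(16, 3, 224, 224)
    labels = torch.randint(0, 3, (16,))
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()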
