

Machine Learning and Deep Learning in Image/Video Processing and Sensing

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Intelligent Sensors".

Deadline for manuscript submissions: 25 May 2024 | Viewed by 8737

Special Issue Editors


Guest Editor
Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, Xidian University, Xi’an 710071, China
Interests: machine learning; image processing; SAR

Guest Editor
College of Intelligence and Computing, Tianjin University, Tianjin, China
Interests: machine learning; optimization methods

Special Issue Information

Dear Colleagues,

Machine learning has become a powerful tool in many aspects of daily life. The efforts of scientists have enabled machines and computers to process data and make decisions so that we may live more conveniently. Deep learning is an important subcategory of machine learning. With the advance of the internet and the development of various sensors, the amount of available data is growing rapidly, and models are also becoming increasingly large in scale. These facts pose great challenges, prompting questions such as: How can we construct large models? How can we solve large-scale problems with machine learning or deep learning methods? And how can we apply these methods to process various data, e.g., text, images, and videos? This Special Issue will focus on machine learning methods for image/video processing and recognition.

Topics include, but are not limited to, the following:

  • Machine learning methods for image processing;
  • Machine learning methods for image recognition;
  • Machine learning methods for video processing;
  • Deep learning methods for image/video processing;
  • Deep learning methods for image/video recognition;
  • Image analysis and enhancement;
  • Video analysis and enhancement;
  • Data mining methods.

Dr. Hongying Liu
Prof. Dr. Fanhua Shang 
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (5 papers)


Research

22 pages, 24020 KiB  
Article
MAD-UNet: A Multi-Region UAV Remote Sensing Network for Rural Building Extraction
by Hang Xue, Ke Liu, Yumeng Wang, Yuxin Chen, Caiyi Huang, Pengfei Wang and Lin Li
Sensors 2024, 24(8), 2393; https://doi.org/10.3390/s24082393 - 09 Apr 2024
Viewed by 308
Abstract
For the development of an idyllic rural landscape, an accurate survey of rural buildings is essential. The extraction of rural structures from unmanned aerial vehicle (UAV) remote sensing imagery is prone to errors such as misclassifications, omissions, and subpar edge detailing. This study introduces a multi-scale fusion and detail enhancement network for rural building extraction, termed the Multi-Attention-Detail U-shaped Network (MAD-UNet). Initially, an atrous convolutional pyramid pooling module is integrated between the encoder and decoder to enhance the main network’s ability to identify buildings of varying sizes, thereby reducing omissions. Additionally, a Multi-scale Feature Fusion Module (MFFM) is constructed within the decoder, utilizing superficial detail features to refine the layered detail information, which improves the extraction of small-sized structures and their edges. A coordination attention mechanism and deep supervision modules are simultaneously incorporated to minimize misclassifications. MAD-UNet has been tested on a private UAV building dataset and the publicly available Wuhan University (WHU) Building Dataset and benchmarked against models such as U-Net, PSPNet, DeepLabV3+, HRNet, ISANet, and AGSCNet, achieving Intersection over Union (IoU) scores of 77.43% and 91.02%, respectively. The results demonstrate its effectiveness in extracting rural buildings from UAV remote sensing images across different regions.

14 pages, 978 KiB  
Article
Incremental Learning for Online Data Using QR Factorization on Convolutional Neural Networks
by Jonghong Kim, WonHee Lee, Sungdae Baek, Jeong-Ho Hong and Minho Lee
Sensors 2023, 23(19), 8117; https://doi.org/10.3390/s23198117 - 27 Sep 2023
Viewed by 773
Abstract
Catastrophic forgetting, the rapid loss of learned representations while learning new data/samples, is one of the main problems of deep neural networks. In this paper, we propose a novel incremental learning framework that addresses the forgetting problem by learning new incoming data in an online manner, so that extra data or new classes can be learned with less catastrophic forgetting. We adapt the hippocampal memory process to deep neural networks by defining the effective maximum of neural activation and its boundary to represent a feature distribution. In addition, we incorporate incremental QR factorization into the deep neural networks to learn new data with both existing labels and new labels with less forgetting. The QR factorization provides an accurate subspace prior, and incremental QR factorization can reasonably express the relationship between new data and both existing and new classes with less forgetting. In our framework, a set of appropriate features (i.e., nodes) provides an improved representation for each class. We apply our method to convolutional neural networks (CNNs) on the CIFAR-100 and CIFAR-10 datasets. The experimental results show that the proposed method efficiently alleviates the stability–plasticity dilemma in deep neural networks by preserving the performance of a trained network while effectively learning unseen data and additional new classes.

22 pages, 16026 KiB  
Article
Image Recommendation System Based on Environmental and Human Face Information
by Hye-min Won, Yong Seok Heo and Nojun Kwak
Sensors 2023, 23(11), 5304; https://doi.org/10.3390/s23115304 - 02 Jun 2023
Cited by 1 | Viewed by 1841
Abstract
With the advancement of computer hardware and communication technologies, deep learning technology has made significant progress, enabling the development of systems that can accurately estimate human emotions. Factors such as facial expressions, gender, age, and the environment influence human emotions, making it crucial to understand and capture these intricate factors. Our system aims to recommend personalized images by accurately estimating human emotions, age, and gender in real time. The primary objective of our system is to enhance user experiences by recommending images that align with their current emotional state and characteristics. To achieve this, our system collects environmental information, including weather conditions and user-specific environment data, through APIs and smartphone sensors. Additionally, we employ deep learning algorithms for real-time classification of eight types of facial expressions, age, and gender. By combining this facial information with the environmental data, we categorize the user’s current situation into positive, neutral, and negative stages. Based on this categorization, our system recommends natural landscape images that are colorized using Generative Adversarial Networks (GANs). These recommendations are personalized to match the user’s current emotional state and preferences, providing a more engaging and tailored experience. Through rigorous testing and user evaluations, we assessed the effectiveness and user-friendliness of our system. Users expressed satisfaction with the system’s ability to generate appropriate images based on the surrounding environment, emotional state, and demographic factors such as age and gender. The visual output of our system significantly impacted users’ emotional responses, resulting in a positive mood change for most users. Moreover, the system’s scalability was positively received, with users acknowledging its potential benefits when installed outdoors and expressing a willingness to continue using it. Compared to other recommender systems, our integration of age, gender, and weather information provides personalized recommendations, contextual relevance, increased engagement, and a deeper understanding of user preferences, thereby enhancing the overall user experience. The system’s ability to comprehend and capture intricate factors that influence human emotions holds promise in various domains, including human–computer interaction, psychology, and social sciences.

24 pages, 11682 KiB  
Article
EDPNet: An Encoding–Decoding Network with Pyramidal Representation for Semantic Image Segmentation
by Dong Chen, Xianghong Li, Fan Hu, P. Takis Mathiopoulos, Shaoning Di, Mingming Sui and Jiju Peethambaran
Sensors 2023, 23(6), 3205; https://doi.org/10.3390/s23063205 - 17 Mar 2023
Cited by 2 | Viewed by 1544
Abstract
This paper proposes an encoding–decoding network with a pyramidal representation module, which will be referred to as EDPNet, and is designed for efficient semantic image segmentation. On the one hand, during the encoding process of the proposed EDPNet, an enhancement of the Xception network, i.e., Xception+, is employed as a backbone to learn the discriminative feature maps. The obtained discriminative features are then fed into the pyramidal representation module, from which the context-augmented features are learned and optimized by leveraging a multi-level feature representation and aggregation process. On the other hand, during the image restoration decoding process, the encoded semantic-rich features are progressively recovered with the assistance of a simplified skip connection mechanism, which performs channel concatenation between high-level encoded features with rich semantic information and low-level features with spatial detail information. The proposed hybrid representation employing the proposed encoding–decoding and pyramidal structures has a global-aware perception and captures fine-grained contours of various geographical objects very well with high computational efficiency. The performance of the proposed EDPNet has been compared against PSPNet, DeepLabv3, and U-Net, employing four benchmark datasets, namely eTRIMS, Cityscapes, PASCAL VOC2012, and CamVid. EDPNet acquired the highest accuracies of 83.6% and 73.8% mIoU on the eTRIMS and PASCAL VOC2012 datasets, while its accuracy on the other two datasets was comparable to that of the PSPNet, DeepLabv3, and U-Net models. EDPNet achieved the highest efficiency among the compared models on all datasets.

22 pages, 2835 KiB  
Article
Artificial Intelligence-Based Voice Assessment of Patients with Parkinson’s Disease Off and On Treatment: Machine vs. Deep-Learning Comparison
by Giovanni Costantini, Valerio Cesarini, Pietro Di Leo, Federica Amato, Antonio Suppa, Francesco Asci, Antonio Pisani, Alessandra Calculli and Giovanni Saggio
Sensors 2023, 23(4), 2293; https://doi.org/10.3390/s23042293 - 18 Feb 2023
Cited by 18 | Viewed by 3619
Abstract
Parkinson’s Disease (PD) is one of the most common non-curable neurodegenerative diseases. Diagnosis is achieved clinically on the basis of different symptoms, with considerable delays from the onset of neurodegenerative processes in the central nervous system. In this study, we investigated early and full-blown PD patients based on the analysis of their voice characteristics with the aid of the most commonly employed machine learning (ML) techniques. A custom dataset was made with hi-fi quality recordings of vocal tasks gathered from Italian healthy control subjects and PD patients, divided into early diagnosed, off-medication patients on the one hand, and mid-advanced patients treated with L-Dopa on the other. Following the current state of the art, several ML pipelines were compared using different feature selection and classification algorithms, and deep learning was also explored with a custom CNN architecture. Results show how feature-based ML and deep learning achieve comparable classification results, with the KNN, SVM, and naïve Bayes classifiers performing similarly and a slight edge for KNN. Much more evident is the predominance of CFS as the best feature selector. The selected features act as relevant vocal biomarkers capable of differentiating healthy subjects, early untreated PD patients, and mid-advanced L-Dopa-treated patients.
