Mathematical Methods and Applications for Artificial Intelligence and Computer Vision

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: closed (15 February 2023) | Viewed by 50652

A printed edition of this Special Issue is available.

Special Issue Editors


Guest Editor
1. Department of Computer Languages and Computer Science, University of Málaga, 29071 Málaga, Spain
2. Biomedical Research Institute of Málaga (IBIMA), 29010 Málaga, Spain
Interests: neural networks; deep learning; computer vision; pattern recognition; unsupervised learning

Guest Editor
1. Department of Computer Languages and Computer Science, University of Málaga, 29071 Málaga, Spain
2. Biomedical Research Institute of Málaga (IBIMA), 29010 Málaga, Spain
Interests: artificial intelligence; artificial neural networks; deep learning; unsupervised learning; computer vision; image processing

Guest Editor
1. Department of Computer Languages and Computer Science, University of Málaga, 29071 Málaga, Spain
2. Biomedical Research Institute of Málaga (IBIMA), 29010 Málaga, Spain
Interests: neural networks; deep learning; computer vision; pattern recognition; optimization problems; discrete mathematics; location problems; logistics

Special Issue Information

Dear Colleagues,

Recent advances in machine learning and pattern recognition have sparked a revolution in many fields of artificial intelligence. Automated extraction of knowledge from Big Data has led to myriad accomplishments in science and technology. In particular, computer vision has benefited dramatically from deep learning and other advanced methods. Mathematical models are key to the success of these strategies, since they enable a quantitative understanding of the underlying learning processes and provide a principled, solid foundation for evaluating such approaches.

This Special Issue will focus on recent theoretical and applied studies of computational intelligence and related fields, emphasizing computer vision. Topics include but are not limited to:

  • Supervised learning;
  • Unsupervised learning;
  • Reinforcement learning;
  • Deep learning;
  • Pattern recognition;
  • Image analysis and enhancement;
  • Computer vision;
  • Natural language processing;
  • Time-series analysis;
  • Data mining.

Advances in mathematical methods for artificial intelligence and computer vision and their cutting-edge applications are particularly welcome in this Special Issue.

Prof. Dr. Ezequiel López-Rubio
Dr. Esteban J. Palomo
Prof. Dr. Enrique Domínguez
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass the pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as they are accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information on manuscript submission are available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Machine learning
  • Deep learning
  • Image processing
  • Computer vision
  • Data mining

Published Papers (15 papers)


Research


17 pages, 1412 KiB  
Article
Analysis and Recognition of Human Gait Activity Based on Multimodal Sensors
by Diego Teran-Pineda, Karl Thurnhofer-Hemsi and Enrique Dominguez
Mathematics 2023, 11(6), 1538; https://doi.org/10.3390/math11061538 - 22 Mar 2023
Cited by 3 | Viewed by 1775
Abstract
Remote health monitoring plays a significant role in research areas related to medicine, neurology, rehabilitation, and robotic systems. These applications include Human Activity Recognition (HAR) using wearable sensors, signal processing, mathematical methods, and machine learning to improve the accuracy of remote health monitoring systems. To improve the detection and accuracy of human activity recognition, we create a novel method that reduces the complexity of feature extraction using the HuGaDB dataset. Our model extracts power spectra; because the features are high-dimensional, sliding-window techniques are used to determine the frequency bandwidth automatically, and an improved QRS algorithm selects the first dominant spectrum amplitude. In addition, the bandwidth algorithm is used to reduce the dimensionality of the data, remove redundant dimensions, and improve feature extraction. In this work, we consider widely used machine learning classifiers. Our proposed method was evaluated using the angle spectra of accelerometers installed on six parts of the body, and the bandwidth was then progressively reduced to observe how performance evolves. Our approach attains an accuracy of 95.1% on the HuGaDB dataset with 70% of the bandwidth, outperforming other methods in human activity recognition accuracy.
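The abstract does not spell out the bandwidth or improved QRS algorithm, but the basic operation it builds on, computing a power spectrum per sliding window and locating its dominant peak, can be sketched in plain Python. This is a minimal illustration, assuming a naive DFT, a hypothetical window length of 32 samples with a hop of 16, and "dominant" meaning the largest non-DC bin; it is not the authors' implementation.

```python
import cmath
import math


def power_spectrum(window):
    """Naive DFT power spectrum (bins 0..n//2) of a real-valued window."""
    n = len(window)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(window))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def dominant_bins(signal, window_len=32, hop=16):
    """Index of the strongest non-DC spectral bin in each sliding window."""
    peaks = []
    for start in range(0, len(signal) - window_len + 1, hop):
        spec = power_spectrum(signal[start:start + window_len])
        # Bin 0 holds the window mean (DC component), so it is skipped.
        peaks.append(max(range(1, len(spec)), key=lambda k: spec[k]))
    return peaks


# Example: a sine with exactly 3 cycles per 32-sample window.
signal = [math.sin(2 * math.pi * 3 * t / 32) for t in range(64)]
print(dominant_bins(signal))  # → [3, 3, 3]
```

For a sensor sampled at fs Hz, bin k corresponds to k * fs / window_len Hz; a practical pipeline would normally use an FFT (for example, `numpy.fft.rfft`) rather than this O(n²) loop.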

20 pages, 3221 KiB  
Article
Breast Abnormality Boundary Extraction in Mammography Image Using Variational Level Set and Self-Organizing Map (SOM)
by Noor Ain Syazwani Mohd Ghani, Abdul Kadir Jumaat, Rozi Mahmud, Mohd Azdi Maasar, Farizuwana Akma Zulkifle and Aisyah Mat Jasin
Mathematics 2023, 11(4), 976; https://doi.org/10.3390/math11040976 - 14 Feb 2023
Cited by 3 | Viewed by 1122
Abstract
Mammography provides a grayscale image of the breast. The main challenge in analyzing mammography images is extracting the boundary of the breast abnormality region for further analysis. In computer vision, this task is known as image segmentation. The variational level set mathematical model has been proven effective for image segmentation. Several selective variational level set models have recently been formulated to accurately segment a specific object in an image. However, these models cannot handle images with complex intensity inhomogeneity, and their segmentation process tends to be slow. Therefore, this study formulated a new selective variational level set model for segmenting mammography images that incorporates a machine learning algorithm known as the Self-Organizing Map (SOM). In addition, a Gaussian function was applied in the model as a regularizer to speed up processing. The accuracy of the segmentation output was then evaluated using the Jaccard, Dice, Accuracy, and Error metrics, while efficiency was assessed by recording the computational time. Experimental results indicated that the proposed model segments mammography images with the highest segmentation accuracy and the fastest computational speed compared with other iterative models.
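The four evaluation metrics named above can be computed from pixel-wise counts of a predicted mask against a ground-truth mask. A minimal sketch, assuming both masks are flat lists of 0/1 labels and taking Error as 1 - Accuracy (an assumption; the abstract does not define it):

```python
def segmentation_metrics(pred, truth):
    """Jaccard, Dice, Accuracy, and Error for two equally sized binary masks."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("masks must be non-empty and have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    tn = len(pred) - tp - fp - fn
    union = tp + fp + fn
    # Two empty masks agree perfectly, so overlap scores default to 1.0.
    jaccard = tp / union if union else 1.0
    dice = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    accuracy = (tp + tn) / len(pred)
    return {"jaccard": jaccard, "dice": dice, "accuracy": accuracy, "error": 1 - accuracy}


print(segmentation_metrics([1, 1, 0, 0], [1, 0, 1, 0]))
```

For a single mask pair, the two overlap scores are linked by J = D / (2 - D), so they rank segmentations identically.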

26 pages, 14183 KiB  
Article
VPP: Visual Pollution Prediction Framework Based on a Deep Active Learning Approach Using Public Road Images
by Mohammad AlElaiwi, Mugahed A. Al-antari, Hafiz Farooq Ahmad, Areeba Azhar, Badar Almarri and Jamil Hussain
Mathematics 2023, 11(1), 186; https://doi.org/10.3390/math11010186 - 29 Dec 2022
Cited by 6 | Viewed by 4386
Abstract
Visual pollution (VP) is the deterioration or disruption of natural and man-made landscapes that ruins the aesthetic appeal of an area. It also refers to physical elements that limit people's mobility on public roads, such as excavation barriers, potholes, and dilapidated sidewalks. In this paper, an end-to-end visual pollution prediction (VPP) framework based on a deep active learning (DAL) approach is proposed to simultaneously detect and classify visual pollutants in whole public road images. The proposed framework is built around the following steps: real VP dataset collection, pre-processing, a DAL approach for automatic data annotation, data splitting and augmentation, and simultaneous VP detection and classification. The framework is designed to localize VP and classify it into three categories: excavation barriers, potholes, and dilapidated sidewalks. A real dataset of 34,460 VP images was collected from various regions across the Kingdom of Saudi Arabia (KSA) via the Ministry of Municipal and Rural Affairs and Housing (MOMRAH) and was used to develop and fine-tune the proposed artificial intelligence (AI) framework with five AI predictors: MobileNetSSDv2, EfficientDet, Faster RCNN, Detectron2, and YOLO. The proposed VPP-based YOLO framework outperforms the competing AI predictors, reaching 89% precision, 88% recall, 89% F1-score, and 93% mAP. The DAL approach plays a crucial role in automatically annotating the VP images and helps the VPP framework improve prediction performance by 18% in precision, 27% in recall, and 25% in mAP. The proposed VPP framework can simultaneously detect and classify distinct visual pollutants in images annotated via the DAL strategy. This technique is applicable to real-time monitoring applications.
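The precision, recall, and F1-score figures reported above follow the standard definitions over true positives (TP), false positives (FP), and false negatives (FN). A short sketch, assuming detections have already been matched to ground-truth objects (the matching rule and IoU threshold are not given in the abstract):

```python
def detection_scores(tp, fp, fn):
    """Precision, recall, and F1 from matched detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


print([round(v, 3) for v in detection_scores(8, 2, 2)])  # → [0.8, 0.8, 0.8]
```

mAP additionally averages precision over recall levels and classes, so it cannot be derived from these three counts alone.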

22 pages, 7540 KiB  
Article
Infrared Target-Background Separation Based on Weighted Nuclear Norm Minimization and Robust Principal Component Analysis
by Sur Singh Rawat, Sukhendra Singh, Youseef Alotaibi, Saleh Alghamdi and Gyanendra Kumar
Mathematics 2022, 10(16), 2829; https://doi.org/10.3390/math10162829 - 09 Aug 2022
Cited by 12 | Viewed by 1468
Abstract
The ability to detect targets with an infrared small target detection (ISTD) system is valuable in many applications. The highly varied nature of background images and the characteristics of small targets make detection extremely difficult. To address this issue, this study proposes an infrared patch-image model based on non-convex weighted nuclear norm minimization (WNNM) and robust principal component analysis (RPCA), named IPNCWNNM. As observed in the most advanced infrared patch-image (IPI) methods, edges, sometimes in cluttered backgrounds, can be detected as targets because the singular values (SV) are shrunk excessively. Therefore, this paper uses non-convex WNNM and RPCA, assigning different weights to the SV instead of the equal weights applied to all SV by the nuclear norm minimization (NNM) of existing IPI-based methods. The alternating direction method of multipliers (ADMM) is employed in the mathematical solution of the proposed model. The evaluations demonstrated that the proposed technique outperformed the cited baseline methods in terms of background suppression and target detection.
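The difference between NNM and WNNM is easiest to see in the shrinkage step applied to the singular values inside each ADMM iteration. The sketch below assumes the singular values are already computed (for example, with `numpy.linalg.svd`) and uses the common reweighting w_i = c / (sigma_i + eps) as an illustrative choice; the constants `c` and `eps` are hypothetical, and this is not the authors' code.

```python
def nnm_shrink(singular_values, tau):
    """Nuclear norm proximal step: the same threshold tau for every singular value."""
    return [max(s - tau, 0.0) for s in singular_values]


def wnnm_shrink(singular_values, c, eps=1e-6):
    """Weighted shrinkage with w_i = c / (sigma_i + eps).

    Larger singular values receive smaller weights and are therefore shrunk less.
    With singular values sorted in descending order the weights are non-descending,
    the setting in which this elementwise step solves the weighted proximal problem.
    """
    weights = [c / (s + eps) for s in singular_values]
    return [max(s - w, 0.0) for s, w in zip(singular_values, weights)]


sv = [10.0, 2.0, 0.5]  # illustrative singular values, largest first
print(nnm_shrink(sv, 1.0))  # → [9.0, 1.0, 0.0]
print([round(v, 3) for v in wnnm_shrink(sv, 5.0)])  # → [9.5, 0.0, 0.0]
```

The dominant (background) singular value loses less under the weighted rule, while small ones are still suppressed, which is the behavior the abstract motivates.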

20 pages, 3022 KiB  
Article
Geodesics in the TPS Space
by Valerio Varano, Stefano Gabriele, Franco Milicchio, Stefan Shlager, Ian Dryden and Paolo Piras
Mathematics 2022, 10(9), 1562; https://doi.org/10.3390/math10091562 - 05 May 2022
Viewed by 1227
Abstract
In shape analysis, shape trajectories are often interpolated by means of geodesics in an appropriate Riemannian shape space. Over the past several decades, different metrics and shape spaces have been proposed, including Kendall shape space, LDDMM-based approaches, and elastic contours, among others. Once a Riemannian space is chosen, geodesics and parallel transport can be used to build splines or piecewise geodesic paths. In a recent paper, we introduced a new Riemannian shape space named the TPS Space, based on the Thin Plate Spline interpolant and characterized by an appropriate metric and parallel transport rule. In the present paper, we further explore the geometry of the TPS Space by characterizing the properties of its geodesics. Several applications show the ability of the proposed formulation to conserve important physical properties of deformation, such as local strains and global elastic energy.

19 pages, 4041 KiB  
Article
SVseg: Stacked Sparse Autoencoder-Based Patch Classification Modeling for Vertebrae Segmentation
by Syed Furqan Qadri, Linlin Shen, Mubashir Ahmad, Salman Qadri, Syeda Shamaila Zareen and Muhammad Azeem Akbar
Mathematics 2022, 10(5), 796; https://doi.org/10.3390/math10050796 - 02 Mar 2022
Cited by 36 | Viewed by 4129
Abstract
Precise vertebrae segmentation is essential for the image-based analysis of spine pathologies, such as vertebral compression fractures and other abnormalities, as well as for clinical diagnosis, treatment, and surgical planning. An automatic and objective vertebra segmentation system is required, but its development is likely to face difficulties such as low segmentation accuracy and the need for prior knowledge or human intervention. Recent vertebral segmentation methods have focused on deep learning-based techniques. To mitigate these challenges, we propose SVseg, a stacked sparse autoencoder-based patch classification model built from deep learning primitives for vertebrae segmentation from Computed Tomography (CT) images. After data preprocessing, we extract overlapping patches from the CT images as input to train the model. The stacked sparse autoencoder learns high-level features from unlabeled image patches in an unsupervised way. We then employ supervised learning to refine the feature representation and improve the discriminability of the learned features. These high-level features are fed into a logistic regression classifier to fine-tune the model. A sigmoid classifier is added to the network to discriminate vertebrae patches from non-vertebrae patches by selecting the class with the highest probability. We validated the proposed SVseg model on the publicly available MICCAI Computational Spine Imaging (CSI) dataset. After configuration optimization, SVseg achieved a Dice Similarity Coefficient (DSC) of 87.39%, a Jaccard Similarity Coefficient (JSC) of 77.60%, a precision (PRE) of 91.53%, and a sensitivity (SEN) of 90.88%. The experimental results demonstrate the method's efficiency and its significant potential for diagnosing and treating clinical spinal diseases.
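The overlapping-patch input described above can be illustrated with a sliding window over a 2-D slice. A minimal sketch, assuming a square patch size and a stride chosen by the user (hypothetical parameters; the paper's values are not given in the abstract); a stride smaller than the patch size yields overlapping patches:

```python
def extract_patches(image, patch_size, stride):
    """Return (row, col, patch) tuples for every patch_size x patch_size window.

    `image` is a list of equal-length rows. A stride smaller than patch_size
    produces overlapping patches; windows that would cross the border are skipped.
    """
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(0, rows - patch_size + 1, stride):
        for c in range(0, cols - patch_size + 1, stride):
            patch = [row[c:c + patch_size] for row in image[r:r + patch_size]]
            patches.append((r, c, patch))
    return patches


# A 4 x 4 toy "slice" whose pixel values encode their position.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
print(len(extract_patches(image, 2, 1)), len(extract_patches(image, 2, 2)))  # → 9 4
```

Each patch would then be flattened into a vector for the autoencoder; the returned (row, col) offsets allow patch-level predictions to be mapped back onto the slice.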

19 pages, 5372 KiB  
Article
Infrared Small Target Detection Based on Partial Sum Minimization and Total Variation
by Sur Singh Rawat, Saleh Alghamdi, Gyanendra Kumar, Youseef Alotaibi, Osamah Ibrahim Khalaf and Lal Pratap Verma
Mathematics 2022, 10(4), 671; https://doi.org/10.3390/math10040671 - 21 Feb 2022
Cited by 40 | Viewed by 3056
Abstract
In advanced applications based on infrared detection systems, precisely detecting small targets remains a difficult task. It becomes even more difficult when the background is highly cluttered, in addition to the inherent nature of small targets. This problem has been addressed in various ways, including infrared patch-image (IPI)-based methods, which are considered to perform best. However, the nuclear norm minimization (NNM) used in IPI-based methods over-shrinks the singular values, which can cause small targets to be recognized incorrectly in highly complex backgrounds. Hence, this paper proposes a new method for infrared small target detection (ISTD) via total variation and partial sum minimization of singular values (TV-PSMSV). The proposed TV-PSMSV replaces the NNM of IPI with partial sum minimization (PSM) of singular values and, additionally, applies a total variation (TV) regularization term to the background patch image (BPI) to suppress the complex background and enhance the target of interest. The proposed TV-PSMSV model was solved using the alternating direction method of multipliers (ADMM). Experimental evaluation on real and synthetic datasets revealed that TV-PSMSV outperformed the referenced methods in terms of the background suppression factor (BSF) and the signal-to-clutter ratio gain (SCRG).
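The two ingredients of TV-PSMSV can be sketched separately: partial sum minimization leaves the N largest singular values untouched and soft-thresholds only the rest, while the TV term penalizes intensity differences between neighboring pixels. This sketch assumes precomputed singular values, a hypothetical target rank `n_keep` and threshold `tau`, and the anisotropic form of TV (the abstract does not say which TV variant is used).

```python
def pssv_shrink(singular_values, n_keep, tau):
    """Partial-sum shrinkage: keep the n_keep largest singular values, soft-threshold the rest."""
    ordered = sorted(singular_values, reverse=True)
    return [s if i < n_keep else max(s - tau, 0.0) for i, s in enumerate(ordered)]


def anisotropic_tv(image):
    """Sum of absolute horizontal and vertical neighbor differences of a 2-D list."""
    rows, cols = len(image), len(image[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                total += abs(image[r][c + 1] - image[r][c])
            if r + 1 < rows:
                total += abs(image[r + 1][c] - image[r][c])
    return total


print(pssv_shrink([0.5, 10.0, 2.0], n_keep=1, tau=1.0))  # → [10.0, 1.0, 0.0]
print(anisotropic_tv([[0, 1], [0, 1]]))  # → 2.0
```

Unlike the plain NNM step, the leading singular value that models the low-rank background is never shrunk, which is how PSM avoids over-shrinking it.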

21 pages, 1489 KiB  
Article
A Class-Incremental Learning Method Based on Preserving the Learned Feature Space for EEG-Based Emotion Recognition
by Magdiel Jiménez-Guarneros and Roberto Alejo-Eleuterio
Mathematics 2022, 10(4), 598; https://doi.org/10.3390/math10040598 - 15 Feb 2022
Cited by 3 | Viewed by 2521
Abstract
Deep learning-based models have become one of the most active research topics in emotion recognition from electroencephalogram (EEG) signals. However, a significant challenge is to effectively recognize new emotions that are incorporated sequentially, as current models must be retrained from scratch. In this paper, we propose a Class-Incremental Learning (CIL) method named Incremental Learning preserving the Learned Feature Space (IL2FS), which enables deep learning models to incorporate new emotions (classes) into those already known. IL2FS performs weight aligning to correct the bias toward new classes, while incorporating margin ranking loss and triplet loss to preserve inter-class separation and feature space alignment for known classes. We evaluated IL2FS on two public emotion recognition datasets (DREAMER and DEAP) and compared it with other recent and popular CIL methods reported in computer vision. Experimental results show that IL2FS outperforms the other CIL methods, obtaining average accuracies of 59.08 ± 8.26% on DREAMER and 79.36 ± 4.68% on DEAP when recognizing data from new emotions that are incorporated sequentially.
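IL2FS combines a triplet loss and a margin ranking loss; how they are weighted and which embeddings they act on is not stated in the abstract. The standard single-sample forms of both losses are sketched below, assuming Euclidean distance for the triplet loss and the sign convention used by common deep learning libraries (y = 1 means the first score should rank higher):

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Penalize unless the negative is at least `margin` farther than the positive."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


def margin_ranking_loss(x1, x2, y, margin=0.0):
    """y = 1: x1 should exceed x2 by `margin`; y = -1: the reverse."""
    return max(-y * (x1 - x2) + margin, 0.0)


print(triplet_loss([0, 0], [0, 1], [3, 4]))  # → 0.0 (already well separated)
print(margin_ranking_loss(1.0, 2.0, 1, margin=0.5))  # → 1.5 (ranked the wrong way)
```

Both are hinge losses: they are zero once the desired separation is reached, so they only push on examples that violate it.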

19 pages, 5676 KiB  
Article
Enhanced Convolutional Neural Network Model for Cassava Leaf Disease Identification and Classification
by Umesh Kumar Lilhore, Agbotiname Lucky Imoize, Cheng-Chi Lee, Sarita Simaiya, Subhendu Kumar Pani, Nitin Goyal, Arun Kumar and Chun-Ta Li
Mathematics 2022, 10(4), 580; https://doi.org/10.3390/math10040580 - 13 Feb 2022
Cited by 35 | Viewed by 4326
Abstract
Cassava is a crucial food and nutrition security crop cultivated by small-scale farmers, and it can survive in harsh environments. It is a significant source of carbohydrates in African countries. Cassava crops can be infected by leaf diseases, which affect overall production and reduce farmers' income. Existing cassava disease research faces several challenges, such as poor detection rates, long processing times, and poor accuracy. This research provides a comprehensive learning strategy for real-time cassava leaf disease identification based on enhanced CNN models (ECNN). The existing standard CNN model relies on extensive data processing features, which increases the computational overhead. The proposed ECNN model uses a depth-wise separable convolution layer to resolve this issue, minimizing the feature count and computational overhead. It uses a distinct block processing feature to handle imbalanced images, Gamma correction to resolve the color segregation issue, and global average pooling with batch normalization to reduce the variable selection process and increase computational efficiency. An experimental analysis is performed on an online cassava image dataset containing 6256 images of cassava leaves in five classes: class 0: “Cassava Bacterial Blight (CBB)”; class 1: “Cassava Brown Streak Disease (CBSD)”; class 2: “Cassava Green Mottle (CGM)”; class 3: “Cassava Mosaic Disease (CMD)”; and class 4: “Healthy”. Performance measures, i.e., precision, recall, F-measure, and accuracy, are calculated for the existing standard CNN and the proposed ECNN model. The proposed ECNN classifier significantly outperforms the standard CNN and achieves 99.3% accuracy on the balanced dataset. The test findings show that using a balanced image dataset improves classification performance.
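The computational saving of the depth-wise separable convolution mentioned above comes from splitting one k x k convolution into a per-channel k x k filter followed by a 1 x 1 convolution that mixes channels. A small sketch of the resulting weight counts; biases are ignored by default, and the layer sizes in the example are arbitrary, not taken from the ECNN model:

```python
def standard_conv_params(k, c_in, c_out, bias=False):
    """Weights of a standard k x k convolution from c_in to c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(k, c_in, c_out, bias=False):
    """Weights of a depth-wise k x k convolution followed by a 1 x 1 point-wise one."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 convolution mixing the channels
    return depthwise + pointwise + ((c_in + c_out) if bias else 0)


print(standard_conv_params(3, 32, 64), depthwise_separable_params(3, 32, 64))  # → 18432 2336
```

The ratio of the two counts is 1/c_out + 1/k², so for 3 x 3 kernels the saving approaches a factor of 9 as the number of output channels grows.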

19 pages, 19985 KiB  
Article
Single Image Super-Resolution with Arbitrary Magnification Based on High-Frequency Attention Network
by Jun-Seok Yun and Seok-Bong Yoo
Mathematics 2022, 10(2), 275; https://doi.org/10.3390/math10020275 - 16 Jan 2022
Cited by 10 | Viewed by 2933
Abstract
Among the various developments in computer vision, single image super-resolution is one of the most essential tasks. However, compared with integer-magnification super-resolution, research on arbitrary magnification has been overlooked, even though single image super-resolution at arbitrary magnification is important for tasks such as object recognition and satellite image magnification. In this study, we propose a model that performs arbitrary magnification while retaining the advantages of integer magnification. The proposed model extends the integer-magnification image to the target magnification in the discrete cosine transform (DCT) spectral domain. Broadening the DCT spectrum in this way leaves the image lacking high-frequency components. To solve this problem, we propose a high-frequency attention network for arbitrary magnification so that high-frequency information can be restored. In addition, only the high-frequency components are extracted from the image, using a mask generated by a hyperparameter in the DCT domain, so that the high-frequency components that have a substantial impact on image quality are recovered. The proposed framework achieves the performance of integer magnification and correctly retrieves the high-frequency components lost between arbitrary magnifications. We experimentally validated our model's superiority over state-of-the-art models.
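The idea of extending an image in the DCT spectral domain, and the missing high frequencies this produces, can be illustrated in one dimension. This sketch assumes an orthonormal DCT-II and zero-padding of the coefficients to the target length; the paper's exact extension and mask design are not given in the abstract, and the `cutoff` parameter is a hypothetical stand-in for its mask hyperparameter. Images would apply the same transform along rows and then columns.

```python
import math


def _scale(k, n):
    # Orthonormal scaling factors of the DCT-II basis.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_ii(x):
    """Orthonormal 1-D DCT-II."""
    n = len(x)
    return [_scale(k, n) * sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]


def idct_ii(coeffs):
    """Inverse of dct_ii (the orthonormal DCT-III)."""
    n = len(coeffs)
    return [sum(_scale(k, n) * coeffs[k] * math.cos(math.pi * (t + 0.5) * k / n) for k in range(n))
            for t in range(n)]


def resize_by_dct(x, new_len):
    """Zero-pad (or truncate) the DCT spectrum to new_len samples and invert it."""
    coeffs = dct_ii(x)
    padded = coeffs[:new_len] + [0.0] * max(0, new_len - len(coeffs))
    gain = math.sqrt(new_len / len(x))  # keeps the mean intensity unchanged
    return [gain * v for v in idct_ii(padded)]


def high_frequency_part(x, cutoff):
    """Keep only the DCT coefficients with index >= cutoff and invert."""
    coeffs = dct_ii(x)
    return idct_ii([c if k >= cutoff else 0.0 for k, c in enumerate(coeffs)])


upscaled = resize_by_dct([1.0, 3.0, 2.0, 4.0], 6)  # 1.5x magnification
```

The padded coefficients above index len(x) are exactly zero, which is the "lack of high-frequency components" that the attention network is meant to restore.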

19 pages, 14563 KiB  
Article
Super-Resolved Recognition of License Plate Characters
by Sung-Jin Lee and Seok Bong Yoo
Mathematics 2021, 9(19), 2494; https://doi.org/10.3390/math9192494 - 05 Oct 2021
Cited by 9 | Viewed by 2471
Abstract
Object detection and recognition are crucial in computer vision and are an active area of research. However, in practical object recognition, accuracy is often degraded by resolution mismatches between training and test images. To solve this problem, we designed and developed an integrated object recognition and super-resolution framework by proposing an image super-resolution technique that improves object recognition accuracy. Specifically, we collected license plate training images through web crawling and artificial data generation, and trained the image super-resolution neural network with an objective function defined to be robust to image flips. To verify the performance of the proposed algorithm, we applied the trained super-resolution and recognition models to representative test images and confirmed that the proposed super-resolution technique improves character recognition accuracy. For character recognition at 4× magnification, the proposed method increased the mean average precision by 49.94% compared with the existing state-of-the-art method.

29 pages, 10615 KiB  
Article
Enlargement of the Field of View Based on Image Region Prediction Using Thermal Videos
by Ganbayar Batchuluun, Na Rae Baek and Kang Ryoung Park
Mathematics 2021, 9(19), 2379; https://doi.org/10.3390/math9192379 - 25 Sep 2021
Cited by 1 | Viewed by 1400
Abstract
Various studies have addressed detecting humans in images. However, part of a human body may leave the camera field of view (FOV) and disappear from the input image. Conversely, a pedestrian may enter the FOV gradually, with the body appearing part by part. In these cases, existing methods fail at human detection and tracking. Therefore, we propose a method for predicting a region wider than the FOV of a thermal camera, based on the image prediction generative adversarial network version 2 (IPGAN-2). In experiments on the marathon subdataset of the Boston University thermal infrared video benchmark open dataset, the proposed method achieved higher image prediction accuracy (structural similarity index measure (SSIM) of 0.9437) and object detection accuracy (F1 score of 0.866, accuracy of 0.914, and intersection over union (IoU) of 0.730) than state-of-the-art methods.
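For detection results evaluated with bounding boxes, IoU is the overlap area divided by the union area. A minimal sketch, assuming boxes in (x_min, y_min, x_max, y_max) form; the abstract does not state whether IoU is computed on boxes or on pixel masks:

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width, 0 if disjoint
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height, 0 if disjoint
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(round(box_iou((0, 0, 2, 2), (1, 1, 3, 3)), 4))  # → 0.1429 (1 / 7)
```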

19 pages, 2800 KiB  
Article
Alternative Thresholding Technique for Image Segmentation Based on Cuckoo Search and Generalized Gaussians
by Jorge Munoz-Minjares, Osbaldo Vite-Chavez, Jorge Flores-Troncoso and Jorge M. Cruz-Duarte
Mathematics 2021, 9(18), 2287; https://doi.org/10.3390/math9182287 - 17 Sep 2021
Cited by 3 | Viewed by 2922
Abstract
Object segmentation is a widely studied topic in digital image processing, as it can be used in countless applications across many fields. This process is traditionally achieved by computing an optimal threshold from the image intensity histogram. Several algorithms have been proposed to find this threshold based on different statistical principles. However, these algorithms often yield contradictory results because of the many variables that can disturb an image. An accepted strategy for finding the optimal histogram threshold to distinguish between the object and the background is to estimate two data distributions and find their intersection. This work proposes a strategy based on the Cuckoo Search Algorithm (CSA) and the Generalized Gaussian (GG) distribution to find the optimal threshold. To test this methodology, we carried out several experiments in synthetic and practical scenarios and compared our results against other well-known algorithms from the literature. The practical cases comprise a medical image database and our own generated database. The results in a simulated environment show a clear advantage of the proposed strategy over other algorithms. In a real environment, it ranks among the best algorithms, making it a reliable alternative.
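The threshold described above is the point where the two fitted class distributions intersect. In the paper, the Generalized Gaussian parameters are found with Cuckoo Search; the sketch below skips that fitting step, assumes the weights and (mu, alpha, beta) parameters of both distributions are already known, and locates the intersection between the two means by bisection, a swapped-in root-finding method rather than the authors' procedure. It also assumes the weighted densities cross exactly once between the means.

```python
import math


def gg_pdf(x, mu, alpha, beta):
    """Generalized Gaussian density with location mu, scale alpha, and shape beta."""
    coeff = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return coeff * math.exp(-((abs(x - mu) / alpha) ** beta))


def intersection_threshold(w1, params1, w2, params2, iters=60):
    """Bisection for x between the two means where w1*f1(x) == w2*f2(x).

    params1 and params2 are (mu, alpha, beta) tuples with params1[0] < params2[0].
    """
    def diff(x):
        return w1 * gg_pdf(x, *params1) - w2 * gg_pdf(x, *params2)

    lo, hi = params1[0], params2[0]
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if diff(lo) * diff(mid) <= 0:
            hi = mid  # sign change (or exact root) lies in [lo, mid]
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Background around intensity 50 and object around 150, equal weights.
t = intersection_threshold(0.5, (50.0, 20.0, 2.0), 0.5, (150.0, 20.0, 2.0))
print(round(t, 3))  # → 100.0
```

With beta = 2 the density reduces to an ordinary Gaussian; smaller beta gives heavier tails and larger beta flatter peaks, which is what lets the GG model fit histogram modes that a Gaussian cannot.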

20 pages, 45632 KiB  
Article
Style Transformation Method of Stage Background Images by Emotion Words of Lyrics
by Hyewon Yoon, Shuyu Li and Yunsick Sung
Mathematics 2021, 9(15), 1831; https://doi.org/10.3390/math9151831 - 03 Aug 2021
Cited by 1 | Viewed by 2222
Abstract
Recently, with the development of computer technology, deep learning has expanded into the field of art, which requires creativity, a uniquely human ability, as well as an understanding of the human emotions expressed in art in order to process them as data. Art is integrating with various industrial fields; in particular, artificial intelligence (AI) is being used in stage art to create visual images. Because it is difficult for a computer to process the emotions expressed in songs as data, existing stage background images for song performances are designed by humans. Recently, research has been conducted to enable AI to design stage background images on behalf of humans. However, no research has reflected the emotions contained in song lyrics in stage background images. This paper proposes a style transformation method that reflects these emotions in stage background images. First, the song lyrics are divided into verses and choruses, and the emotion words in each verse and chorus are extracted. Second, the probability distribution of the emotion words is calculated for each verse and chorus, and, for each of them, the image with the most similar probability distribution is selected from an image dataset tagged in advance with emotion words. Finally, a style-transferred stage background image is output for each verse and chorus. In an experiment, the similarity between the stage background image and the image transferred to the style of an image with a similar emotion word probability distribution was 38%, whereas the similarity between the stage background image and the image transferred to the style of an image with a completely different emotion word probability distribution was 8%. The proposed method reduced the total variation loss from 1.0777 to 0.1597, where the total variation loss is defined as the weighted sum of the content loss and the style loss. This indicates that the style-transferred image preserves the edge information of the input content image while its style approaches that of the target style image.
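Step two of the method, turning emotion-word counts into a probability distribution and picking the tagged image whose distribution is closest, can be sketched as follows. The emotion vocabulary, the image names, and the use of L1 distance as the similarity measure are all illustrative assumptions; the abstract does not name the measure.

```python
from collections import Counter


def emotion_distribution(words, vocabulary):
    """Relative frequency of each emotion word; non-emotion words are ignored."""
    counts = Counter(w for w in words if w in vocabulary)
    total = sum(counts.values())
    return {e: (counts[e] / total if total else 0.0) for e in vocabulary}


def l1_distance(p, q):
    """Sum of absolute differences over the union of both key sets."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def select_style_image(section_words, tagged_images, vocabulary):
    """Name of the tagged image whose emotion distribution is closest to the section's."""
    target = emotion_distribution(section_words, vocabulary)
    return min(tagged_images, key=lambda name: l1_distance(target, tagged_images[name]))


# Hypothetical vocabulary and image tags, for illustration only.
vocabulary = ["joy", "sadness", "anger"]
chorus = ["joy", "joy", "sadness", "the", "night"]
images = {
    "sunrise": {"joy": 0.7, "sadness": 0.3, "anger": 0.0},
    "storm": {"joy": 0.0, "sadness": 0.2, "anger": 0.8},
}
print(select_style_image(chorus, images, vocabulary))  # → sunrise
```

The selected image would then serve as the style target for a neural style-transfer step, which is not shown here.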

Review


54 pages, 3508 KiB  
Review
Auto-Encoders in Deep Learning—A Review with New Perspectives
by Shuangshuang Chen and Wei Guo
Mathematics 2023, 11(8), 1777; https://doi.org/10.3390/math11081777 - 07 Apr 2023
Cited by 22 | Viewed by 11986
Abstract
Deep learning, a subfield of machine learning, has opened a new era in the development of neural networks. The auto-encoder is a key component of deep architectures; it can be used to realize transfer learning and plays an important role in both unsupervised learning and non-linear feature extraction. By highlighting the contributions and challenges of recent research papers, this work aims to review state-of-the-art auto-encoder algorithms. First, we introduce the basic auto-encoder, its concept, and its structure. Second, we present a comprehensive summary of different variants of the auto-encoder. Third, we analyze and study auto-encoders from three different perspectives. We also discuss the relationships between auto-encoders, shallow models, and other deep learning models. The auto-encoder and its variants have been successfully applied in a wide range of fields, such as pattern recognition, computer vision, data generation, and recommender systems. We then focus on the available toolkits for auto-encoders. Finally, this paper summarizes future trends and challenges in designing and training auto-encoders. We hope that this survey will provide a good reference for using and designing AE models.
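The basic auto-encoder maps an input to a lower-dimensional code with an encoder and reconstructs it with a decoder, and is trained to minimize the reconstruction error. A minimal forward-pass sketch, assuming a sigmoid encoder, a linear decoder, and mean squared error (common textbook choices, not a specific model from the review), with hand-picked weights in place of training:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(weights, vector, bias):
    """Affine map: weights (list of rows) times vector, plus bias."""
    return [sum(w * v for w, v in zip(row, vector)) + b for row, b in zip(weights, bias)]


def autoencoder_forward(x, enc_w, enc_b, dec_w, dec_b):
    """Encoder h = sigmoid(W x + b); linear decoder x_hat = W' h + b'."""
    h = [sigmoid(z) for z in matvec(enc_w, x, enc_b)]
    x_hat = matvec(dec_w, h, dec_b)
    return h, x_hat


def reconstruction_loss(x, x_hat):
    """Mean squared error between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


# A 3 -> 2 -> 3 bottleneck with hand-picked (untrained) weights.
x = [1.0, 0.0, 1.0]
h, x_hat = autoencoder_forward(x, [[0, 0, 0], [0, 0, 0]], [0, 0],
                               [[1, 1], [0, 0], [1, 1]], [0, 0, 0])
print(h, x_hat, reconstruction_loss(x, x_hat))  # → [0.5, 0.5] [1.0, 0.0, 1.0] 0.0
```

Because the code h has fewer units than the input, the network must compress the data; training would adjust the weights by gradient descent on the reconstruction loss.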
