AI Multimedia Applications

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: closed (15 September 2022) | Viewed by 25512

Special Issue Editors


Guest Editor
Department of Electronic Engineering, National Taipei University of Technology, Taipei, Taiwan
Interests: intelligent multimedia systems; deep learning and artificial intelligence; image processing and video coding; intelligent video surveillance systems; cloud computing and big data analytics; mobile applications and systems

Guest Editor
Department of Computer Science, National Chengchi University, Taipei 116011, Taiwan
Interests: image processing; video compression; machine learning and its applications

Guest Editor
Canada Research Chair, School of Information Studies, McGill University, Montreal, Canada
Interests: data mining; artificial intelligence; data privacy; machine learning; cybersecurity

Guest Editor
Department of Mechanical Systems Engineering, College of Engineering, Ibaraki University, Ibaraki 316-8511, Japan
Interests: communication network engineering; machine control algorithm; Internet of Things; AI robot; reinforcement learning; embedded software and systems

Special Issue Information

Dear Colleagues,

Advances in AI technology are transforming the world at a rapid pace. These technologies have created a better environment for people to live in, particularly through multimedia applications and medical signal processing, such as surveillance systems, medical sensor systems, robotics, computer vision systems, image restoration systems, electroencephalogram signal processing, and so on. In addition, these systems involve all kinds of image, vision, camera, and acoustic sensors and sensing systems to acquire the data needed for their development and verification. To offer more adaptive approaches to better understand the complex and changing world, we invite researchers to submit papers on AI and multimedia applications. This Special Issue addresses all types of AI-based multimedia sensors designed to help people understand the world and live more easily.

Prof. Dr. Shih-Chia Huang
Dr. Yan-Tsung Peng
Prof. Dr. Benjamin C. M. Fung
Dr. Cheng Zhang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • multimedia applications
  • surveillance
  • medical sensors
  • robotics
  • computer vision
  • image restoration

Published Papers (11 papers)


Research


31 pages, 23995 KiB  
Article
A Transformer-Based Model for Super-Resolution of Anime Image
by Shizhuo Xu, Vibekananda Dutta, Xin He and Takafumi Matsumaru
Sensors 2022, 22(21), 8126; https://doi.org/10.3390/s22218126 - 24 Oct 2022
Cited by 4 | Viewed by 3655
Abstract
Image super-resolution (ISR) technology aims to enhance resolution and improve image quality. It is widely applied to various real-world applications related to image processing, especially in medical images, while relatively little has been applied to anime image production. Furthermore, contemporary ISR tools are often based on convolutional neural networks (CNNs), while few methods attempt to use transformers that perform well in other advanced vision tasks. We propose an anime image super-resolution (AISR) method based on the Swin Transformer in this work. The work was carried out in several stages. First, a shallow feature extraction approach was employed to obtain a feature map of the input image's low-frequency information, which mainly approximates the distribution of detailed information in a spatial structure (shallow feature). Next, we applied deep feature extraction to extract the image semantic information (deep feature). Finally, the image reconstruction method combines shallow and deep features, upsamples the feature maps, and performs sub-pixel convolution to obtain many feature map channels. The novelty of the proposal is the enhancement of the low-frequency information using a Gaussian filter and the introduction of different window sizes to replace the patch merging operations in the Swin Transformer. A high-quality anime dataset was constructed to train the model and test its robustness. We trained our model on this dataset and tested the model quality. We implement anime image super-resolution tasks at different magnifications (2×, 4×, 8×). The results were compared numerically and graphically with those delivered by conventional convolutional neural network-based and transformer-based methods. We evaluate the experiments numerically using the standard peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) metrics. The series of experiments and the ablation study show that our proposal outperforms the others. Full article
(This article belongs to the Special Issue AI Multimedia Applications)
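The reconstruction step mentioned in the abstract ends with a sub-pixel convolution that trades channels for spatial resolution. Below is a minimal sketch of such an upsampling head in PyTorch; the layer widths, kernel sizes, and 4× scale are illustrative assumptions, not the configuration used in the paper.

```python
# A minimal sub-pixel convolution (pixel-shuffle) upsampling head: a convolution
# expands the channel count, then PixelShuffle rearranges channels into spatial
# resolution. Layer sizes are illustrative only.
import torch
import torch.nn as nn

class SubPixelUpsampler(nn.Module):
    def __init__(self, channels: int = 64, scale: int = 4):
        super().__init__()
        self.expand = nn.Conv2d(channels, channels * scale ** 2, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)      # (C*s^2, H, W) -> (C, s*H, s*W)
        self.to_rgb = nn.Conv2d(channels, 3, kernel_size=3, padding=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.to_rgb(self.shuffle(self.expand(feats)))

feats = torch.randn(1, 64, 32, 32)                 # stand-in for fused shallow + deep features
print(SubPixelUpsampler()(feats).shape)            # torch.Size([1, 3, 128, 128])
```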

15 pages, 3465 KiB  
Article
Bed-Exit Behavior Recognition for Real-Time Images within Limited Range
by Cheng-Jian Lin, Ta-Sen Wei, Peng-Ta Liu, Bing-Hong Chen and Chi-Huang Shih
Sensors 2022, 22(15), 5495; https://doi.org/10.3390/s22155495 - 23 Jul 2022
Cited by 1 | Viewed by 1277
Abstract
In the context of behavior recognition, the emerging bed-exit monitoring system demands a rapid deployment in the ward to support mobility and personalization. Mobility means the system can be installed and removed as required without construction; personalization indicates human body tracking is limited to the bed region so that only the target is monitored. To satisfy the above-mentioned requirements, the behavior recognition system aims to: (1) operate in a small-size device, typically an embedded system; (2) process a series of images with narrow fields of view (NFV) to detect bed-related behaviors. In general, wide-range images are preferred to obtain a good recognition performance for diverse behaviors, while NFV images are used with abrupt activities and therefore fit single-purpose applications. This paper develops an NFV-based behavior recognition system with low complexity to realize a bed-exit monitoring application on embedded systems. To achieve effectiveness and low complexity, a queueing-based behavior classification is proposed to keep memories of object tracking information and a specific behavior can be identified from continuous object movement. The experimental results show that the developed system can recognize three bed behaviors, namely off bed, on bed and return, for NFV images with accuracy rates of 95~100%. Full article
(This article belongs to the Special Issue AI Multimedia Applications)
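As a rough illustration of the queueing-based classification described above, the sketch below keeps a fixed-length queue of tracked centroids and reads a bed-related behavior off the start and end of the queue. The bed rectangle, queue length, labels, and decision rules are illustrative assumptions rather than the paper's exact logic.

```python
# Queue-based bed-exit classification sketch: recent object centroids (from any
# tracker) are kept in a fixed-length queue, and a behavior is inferred from
# where the track starts and ends relative to the bed region.
from collections import deque

BED = (100, 50, 400, 300)        # hypothetical bed region in image coordinates: x0, y0, x1, y1
QUEUE_LEN = 30                   # roughly one second of frames at 30 fps

def inside_bed(cx, cy, bed=BED):
    x0, y0, x1, y1 = bed
    return x0 <= cx <= x1 and y0 <= cy <= y1

def classify(track: deque) -> str:
    """Map a queue of (cx, cy) centroids to a bed-related behavior."""
    if len(track) < track.maxlen:
        return "unknown"
    first_in = inside_bed(*track[0])
    last_in = inside_bed(*track[-1])
    if first_in and not last_in:
        return "off bed"
    if not first_in and last_in:
        return "return"
    if first_in and last_in:
        return "on bed"
    return "away"

track = deque(maxlen=QUEUE_LEN)
for centroid in [(250, 150)] * 15 + [(450, 150)] * 15:   # simulated tracker output
    track.append(centroid)
print(classify(track))           # -> "off bed"
```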

15 pages, 8206 KiB  
Article
RCA-LF: Dense Light Field Reconstruction Using Residual Channel Attention Networks
by Ahmed Salem, Hatem Ibrahem and Hyun-Soo Kang
Sensors 2022, 22(14), 5254; https://doi.org/10.3390/s22145254 - 14 Jul 2022
Cited by 2 | Viewed by 1359
Abstract
Dense multi-view image reconstruction has played an active role in research for a long time and interest has recently increased. Multi-view images can solve many problems and enhance the efficiency of many applications. This paper presents a more specific solution for reconstructing high-density light field (LF) images. We present this solution for images captured by Lytro Illum cameras to solve the implicit problem related to the discrepancy between angular and spatial resolution resulting from poor sensor resolution. We introduce the residual channel attention light field (RCA-LF) structure to solve different LF reconstruction tasks. In our approach, view images are grouped in one stack where epipolar information is available. We use 2D convolution layers to process and extract features from the stacked view images. Our method adopts the channel attention mechanism to learn the relation between different views and give higher weight to the most important features, restoring more texture details. Finally, experimental results indicate that the proposed model outperforms earlier state-of-the-art methods for visual and numerical evaluation. Full article
(This article belongs to the Special Issue AI Multimedia Applications)
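The channel attention mechanism referred to above can be sketched as a squeeze-and-excitation-style block inside a residual branch. The PyTorch snippet below is a minimal illustration with assumed channel and reduction sizes, not the RCA-LF architecture itself.

```python
# Residual channel attention block sketch: features are pooled per channel,
# passed through a small bottleneck, and used to rescale the channels before
# the residual addition.
import torch
import torch.nn as nn

class ResidualChannelAttention(nn.Module):
    def __init__(self, channels: int = 64, reduction: int = 16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # squeeze: one statistic per channel
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.body(x)
        return x + feats * self.attention(feats)           # reweight channels, add residual

stack = torch.randn(1, 64, 48, 48)                         # features from stacked LF views
print(ResidualChannelAttention()(stack).shape)             # torch.Size([1, 64, 48, 48])
```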

15 pages, 1683 KiB  
Article
Sentiment Analysis: An ERNIE-BiLSTM Approach to Bullet Screen Comments
by Yen-Hao Hsieh and Xin-Ping Zeng
Sensors 2022, 22(14), 5223; https://doi.org/10.3390/s22145223 - 13 Jul 2022
Cited by 8 | Viewed by 2379
Abstract
Sentiment analysis is one of the fields of affective computing, which detects and evaluates people’s psychological states and sentiments through text analysis. It is an important application of text mining technology and is widely used to analyze comments. Bullet screen videos have become a popular way for people to interact and communicate while watching online videos. Existing studies have focused on the form, content, and function of bullet screen comments, but few have examined bullet screen comments using natural language processing. Bullet screen comments are short text messages of varying length that carry ambiguous emotional information, which makes them extremely challenging for natural language processing. Hence, it is important to understand how we can use the characteristics of bullet screen comments and sentiment analysis to understand the sentiments expressed and trends in bullet screen comments. This study poses the following research question: how can one analyze the sentiments expressed in bullet screen comments accurately and effectively? This study proposes an ERNIE-BiLSTM approach for sentiment analysis on bullet screen comments, which provides effective and innovative thinking for the sentiment analysis of bullet screen comments. The experimental results show that the ERNIE-BiLSTM approach has a higher accuracy rate, precision rate, recall rate, and F1-score than other methods. Full article
(This article belongs to the Special Issue AI Multimedia Applications)
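A minimal sketch of the recurrent half of the ERNIE-BiLSTM approach is given below. For brevity, the ERNIE encoder is stood in for by a plain embedding layer, and the vocabulary size, hidden width, and number of sentiment classes are illustrative assumptions.

```python
# BiLSTM sentiment classifier sketch: embedded tokens pass through a
# bidirectional LSTM, and the final time step is mapped to class logits.
import torch
import torch.nn as nn

class BiLSTMSentiment(nn.Module):
    def __init__(self, vocab_size=30000, embed_dim=128, hidden=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # placeholder for ERNIE features
        self.bilstm = nn.LSTM(embed_dim, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        states, _ = self.bilstm(self.embed(token_ids))     # (B, T, 2*hidden)
        return self.classifier(states[:, -1, :])           # last time step -> class logits

comments = torch.randint(0, 30000, (8, 20))                # a batch of 8 short comments
print(BiLSTMSentiment()(comments).shape)                   # torch.Size([8, 2])
```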

14 pages, 1764 KiB  
Article
End-to-End Train Horn Detection for Railway Transit Safety
by Van-Thuan Tran, Wei-Ho Tsai, Yury Furletov and Mikhail Gorodnichev
Sensors 2022, 22(12), 4453; https://doi.org/10.3390/s22124453 - 12 Jun 2022
Cited by 2 | Viewed by 2065
Abstract
The train horn sound is an active audible warning signal used for warning commuters and railway employees of oncoming trains, assuring smooth operation and traffic safety, especially at barrier-free crossings. This work studies deep learning-based approaches to develop a system providing the early detection of train arrival based on the recognition of train horn sounds from the traffic soundscape. A custom dataset of train horn sounds, car horn sounds, and traffic noises is developed to conduct experiments and analysis. We propose a novel two-stream end-to-end CNN model (i.e., THD-RawNet), which combines two approaches of feature extraction from raw audio waveforms, for audio classification in train horn detection (THD). Besides a stream with a sequential one-dimensional CNN (1D-CNN) as in existing sound classification works, we propose to utilize multiple 1D-CNN branches to process raw waves at different temporal resolutions to extract an image-like representation for the 2D-CNN classification part. Our experimental results and comparative analysis prove the effectiveness of the proposed two-stream network and the method of combining features extracted at multiple temporal resolutions. THD-RawNet obtained better accuracy and robustness compared to baseline models trained on either raw audio or handcrafted features: at an input size of one second, the network yielded an accuracy of 95.11% for testing data in normal traffic conditions and remained above 93% accuracy in the considerably noisy condition of -10 dB SNR. The proposed THD system can be integrated into smart railway crossing systems, private cars, and self-driving cars to improve railway transit safety. Full article
(This article belongs to the Special Issue AI Multimedia Applications)
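The idea of running raw waveforms through several 1D-CNN branches at different temporal resolutions and stacking the results into an image-like map can be sketched as below. Kernel sizes, strides, and channel counts are illustrative assumptions; this is not the THD-RawNet configuration reported in the paper.

```python
# Multi-resolution 1D-CNN front end sketch: three branches with increasingly
# long kernels and strides (fine to coarse) process the raw wave, their outputs
# are length-aligned and stacked into an image-like tensor for a 2D-CNN.
import torch
import torch.nn as nn

class MultiResolutionFrontEnd(nn.Module):
    def __init__(self, out_len: int = 128, channels: int = 64):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv1d(1, channels, kernel_size=k, stride=s, padding=k // 2)
            for k, s in [(16, 8), (64, 32), (256, 64)]
        ])
        self.pool = nn.AdaptiveAvgPool1d(out_len)            # align branch lengths

    def forward(self, wave: torch.Tensor) -> torch.Tensor:   # wave: (B, 1, samples)
        maps = [self.pool(torch.relu(branch(wave))) for branch in self.branches]
        return torch.stack(maps, dim=1)                       # (B, 3, channels, out_len)

wave = torch.randn(4, 1, 16000)                               # 1 s of audio at 16 kHz
features = MultiResolutionFrontEnd()(wave)
print(features.shape)                                         # torch.Size([4, 3, 64, 128])
# `features` can now be fed to an ordinary 2D-CNN classifier.
```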

18 pages, 17708 KiB  
Article
A Underwater Sequence Image Dataset for Sharpness and Color Analysis
by Miao Yang, Ge Yin, Haiwen Wang, Jinnai Dong, Zhuoran Xie and Bing Zheng
Sensors 2022, 22(9), 3550; https://doi.org/10.3390/s22093550 - 07 May 2022
Cited by 5 | Viewed by 1815
Abstract
The complex underwater environment usually leads to quality degradation in underwater images, and distortions in sharpness and color are the main factors affecting underwater image quality. This paper discloses an underwater sequence image dataset called TankImage-I, with gradually changing sharpness and color distortion, collected in a pool. TankImage-I contains two plane targets and a total of 78 images. It includes two lighting conditions and three different levels of water transparency. The imaging distance is also changed during the photographing process. The paper introduces the relevant details of the photographing process and provides measurement results of the sharpness and color distortion of the sequence images. In addition, we verify the performance of 14 image quality assessment methods on TankImage-I and analyze their results in terms of sharpness and color, which provides a reference for the design and improvement of underwater image quality assessment algorithms and underwater imaging systems. Full article
(This article belongs to the Special Issue AI Multimedia Applications)
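For readers who want a starting point for the sharpness and color measurements discussed above, the sketch below computes two generic no-reference statistics: Laplacian-variance sharpness and the Hasler–Süsstrunk colorfulness measure. These measures and the file name are illustrative assumptions, not the measurement protocol used for TankImage-I.

```python
# Simple no-reference image statistics: sharpness as variance of the Laplacian
# and colorfulness as the Hasler-Susstrunk statistic over opponent channels.
import cv2
import numpy as np

def sharpness_laplacian_variance(bgr: np.ndarray) -> float:
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    return float(cv2.Laplacian(gray, cv2.CV_64F).var())

def colorfulness_hasler_susstrunk(bgr: np.ndarray) -> float:
    b, g, r = [c.astype(np.float64) for c in cv2.split(bgr)]
    rg, yb = r - g, 0.5 * (r + g) - b
    return float(np.hypot(rg.std(), yb.std()) + 0.3 * np.hypot(rg.mean(), yb.mean()))

img = cv2.imread("tank_frame_001.png")     # placeholder file name
print(sharpness_laplacian_variance(img), colorfulness_hasler_susstrunk(img))
```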

14 pages, 5397 KiB  
Article
End-to-End Residual Network for Light Field Reconstruction on Raw Images and View Image Stacks
by Ahmed Salem, Hatem Ibrahem, Bilel Yagoub and Hyun-Soo Kang
Sensors 2022, 22(9), 3540; https://doi.org/10.3390/s22093540 - 06 May 2022
Cited by 1 | Viewed by 1525
Abstract
Light field (LF) technology has become a focus of great interest (due to its use in many applications), especially since the introduction of the consumer LF camera, which facilitated the acquisition of dense LF images. Obtaining densely sampled LF images is costly due to the trade-off between spatial and angular resolution. Accordingly, in this research, we suggest a learning-based solution to this challenging problem, reconstructing dense, high-quality LF images. Instead of training our model with several images of the same scene, we used raw LF images (lenslet images). The raw LF format enables the encoding of several images of the same scene into one image. Consequently, it helps the network to understand and simulate the relationship between different images, resulting in higher-quality images. We divided our model into two successive modules: LF reconstruction (LFR) and LF augmentation (LFA). Each module is implemented as a convolutional neural network (CNN)-based residual network. We trained our network to lessen the absolute error between the novel and reference views. Experimental findings on real-world datasets show that our suggested method has excellent performance and superiority over state-of-the-art approaches. Full article
(This article belongs to the Special Issue AI Multimedia Applications)
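A minimal sketch of the two successive residual modules and the absolute-error training objective described above is given below; the module depths, channel counts, and toy tensors are illustrative assumptions, not the paper's architecture.

```python
# Two-stage layout sketch: two residual CNN modules applied in sequence
# (standing in for LFR and LFA), trained with an L1 loss against reference views.
import torch
import torch.nn as nn

def residual_module(in_ch: int, out_ch: int, width: int = 64) -> nn.Module:
    class Module(nn.Module):
        def __init__(self):
            super().__init__()
            self.head = nn.Conv2d(in_ch, width, 3, padding=1)
            self.body = nn.Sequential(
                nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(width, width, 3, padding=1),
            )
            self.tail = nn.Conv2d(width, out_ch, 3, padding=1)

        def forward(self, x):
            h = self.head(x)
            return self.tail(h + self.body(h))            # residual connection
    return Module()

lfr = residual_module(in_ch=1, out_ch=16)                  # raw lenslet image -> coarse views
lfa = residual_module(in_ch=16, out_ch=16)                 # refine the reconstructed views
optimizer = torch.optim.Adam(list(lfr.parameters()) + list(lfa.parameters()), lr=1e-4)

lenslet = torch.randn(2, 1, 64, 64)                        # toy raw LF input
reference = torch.randn(2, 16, 64, 64)                     # toy reference view stack
optimizer.zero_grad()
loss = nn.L1Loss()(lfa(lfr(lenslet)), reference)           # "lessen the absolute error"
loss.backward()
optimizer.step()
print(loss.item())
```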

19 pages, 105796 KiB  
Article
Variational Model for Single-Image Reflection Suppression Based on Multiscale Thresholding
by Pei-Chiang Shao
Sensors 2022, 22(6), 2271; https://doi.org/10.3390/s22062271 - 15 Mar 2022
Viewed by 1625
Abstract
Reflections often cause degradation in image quality for pictures taken through a glass medium. Removing the undesired reflections is becoming increasingly important. For human vision, it can produce much more pleasing results for multimedia applications. For machine vision, it can benefit various applications such as image segmentation and classification. Reflection removal is itself a highly ill-posed inverse problem that is very difficult to solve, especially for a single input image. Existing methods mainly rely on various prior information and assumptions to alleviate the ill-posedness. In this paper, we design a variational model based on multiscale hard thresholding to both effectively and efficiently suppress image reflections. A direct solver using the discrete cosine transform for implementing the proposed variational model is also provided. Both synthetic and real glass images are used in the numerical experiments to compare the performance of the proposed algorithm with other representative algorithms. The experimental results show the superiority of our algorithm over the previous ones. Full article
(This article belongs to the Special Issue AI Multimedia Applications)
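To make the DCT-based solution concrete, the sketch below shows a schematic single-scale variant of gradient hard thresholding followed by a Poisson reconstruction solved with the discrete cosine transform. The threshold value, boundary handling, and single-scale formulation are simplifying assumptions; the paper's variational model applies thresholding at multiple scales.

```python
# Schematic reflection suppression sketch: weak gradients (attributed to
# reflections) are hard-thresholded away, and the image is rebuilt from the
# remaining gradients by solving a Neumann-boundary Poisson equation via DCT.
import numpy as np
from scipy.fft import dctn, idctn

def grad(u):
    gx, gy = np.zeros_like(u), np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]          # forward differences, zero at the far border
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(gx, gy):
    dx, dy = np.zeros_like(gx), np.zeros_like(gy)
    dx[0, :], dx[1:, :] = gx[0, :], gx[1:, :] - gx[:-1, :]
    dy[:, 0], dy[:, 1:] = gy[:, 0], gy[:, 1:] - gy[:, :-1]
    return dx + dy

def poisson_dct(f):
    """Solve the Neumann-boundary Poisson equation Laplace(u) = f via DCT."""
    m, n = f.shape
    i = np.arange(m)[:, None]
    j = np.arange(n)[None, :]
    denom = 2.0 * np.cos(np.pi * i / m) + 2.0 * np.cos(np.pi * j / n) - 4.0
    denom[0, 0] = 1.0                          # avoid division by zero at the DC term
    u_hat = dctn(f, norm="ortho") / denom
    u_hat[0, 0] = 0.0                          # the mean is unconstrained; restored by the caller
    return idctn(u_hat, norm="ortho")

def suppress_reflection(channel: np.ndarray, eps: float = 0.03) -> np.ndarray:
    """Single-scale sketch for one channel with values in [0, 1]."""
    gx, gy = grad(channel)
    keep = np.hypot(gx, gy) > eps              # hard threshold: drop weak (reflection) gradients
    u = poisson_dct(div(gx * keep, gy * keep))
    return np.clip(u - u.mean() + channel.mean(), 0.0, 1.0)

demo = np.random.default_rng(0).random((64, 64))
print(suppress_reflection(demo).shape)          # (64, 64)
```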

16 pages, 2847 KiB  
Article
Convolutional Blur Attention Network for Cell Nuclei Segmentation
by Phuong Thi Le, Tuan Pham, Yi-Chiung Hsu and Jia-Ching Wang
Sensors 2022, 22(4), 1586; https://doi.org/10.3390/s22041586 - 18 Feb 2022
Cited by 12 | Viewed by 2657
Abstract
Accurately segmented nuclei are important, not only for cancer classification, but also for predicting treatment effectiveness and other biomedical applications. However, the diversity of cell types, various external factors, and illumination conditions make nucleus segmentation a challenging task. In this work, we present a new deep learning-based method for cell nucleus segmentation. The proposed convolutional blur attention (CBA) network consists of downsampling and upsampling procedures. A blur attention module and a blur pooling operation are used to retain the feature salience and avoid noise generation in the downsampling procedure. A pyramid blur pooling (PBP) module is proposed to capture the multi-scale information in the upsampling procedure. The proposed method has been compared with several prior segmentation models, namely U-Net, ENet, SegNet, LinkNet, and Mask RCNN, on the 2018 Data Science Bowl (DSB) challenge dataset and the multi-organ nucleus segmentation (MoNuSeg) dataset from MICCAI 2018. The Dice similarity coefficient and evaluation metrics such as the F1 score, recall, precision, and average Jaccard index (AJI) were used to evaluate the segmentation efficiency of these models. Overall, the method proposed in this paper achieves the best performance, with AJI values of 0.8429 and 0.7985 on the DSB and MoNuSeg datasets, respectively. Full article
(This article belongs to the Special Issue AI Multimedia Applications)
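The blur pooling operation mentioned above, anti-aliased downsampling that smooths each channel before subsampling, can be sketched in PyTorch as below. The fixed 3x3 binomial kernel and stride are illustrative defaults rather than the CBA network's exact configuration.

```python
# Blur pooling sketch: a fixed binomial filter is applied depthwise to each
# channel before stride-2 subsampling, reducing the aliasing of plain pooling.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurPool2d(nn.Module):
    def __init__(self, channels: int, stride: int = 2):
        super().__init__()
        self.stride = stride
        k = torch.tensor([1.0, 2.0, 1.0])
        kernel = (k[:, None] * k[None, :]) / 16.0             # 3x3 binomial filter
        self.register_buffer("kernel", kernel.repeat(channels, 1, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.conv2d(x, self.kernel, stride=self.stride,
                        padding=1, groups=x.shape[1])          # depthwise blur + subsample

nuclei_feats = torch.randn(1, 32, 256, 256)
print(BlurPool2d(32)(nuclei_feats).shape)                      # torch.Size([1, 32, 128, 128])
```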

20 pages, 7490 KiB  
Article
Detail Preserving Low Illumination Image and Video Enhancement Algorithm Based on Dark Channel Prior
by Lingli Guo, Zhenhong Jia, Jie Yang and Nikola K. Kasabov
Sensors 2022, 22(1), 85; https://doi.org/10.3390/s22010085 - 23 Dec 2021
Cited by 4 | Viewed by 2409
Abstract
In low illumination situations, insufficient light in the monitoring device results in poor visibility of effective information, which cannot meet the needs of practical applications. To overcome these problems, a detail preserving low illumination video image enhancement algorithm based on the dark channel prior is proposed in this paper. First, a dark channel refinement method is proposed, which imposes a structure prior on the initial dark channel to improve image brightness. Second, an anisotropic guided filter (AnisGF) is used to refine the transmission, which preserves the edges of the image. Finally, a detail enhancement algorithm is proposed to avoid the problem of insufficient detail in the initial enhanced image. To avoid video flicker, subsequent video frames are enhanced based on the brightness of the first enhanced frame. Qualitative and quantitative analysis shows that the proposed algorithm is superior to the comparison algorithms, ranking first in average gradient, edge intensity, contrast, and the patch-based contrast quality index. It can be effectively applied to the enhancement of surveillance video images and to wider computer vision applications. Full article
(This article belongs to the Special Issue AI Multimedia Applications)
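For context, the sketch below shows the classical dark-channel building blocks that the proposed algorithm refines: the low-light frame is inverted, a dark channel and transmission map are estimated as in haze removal, and inverting the dehazed result yields the enhanced frame. The window size, omega, crude airlight estimate, and file names are standard defaults for illustration, not the paper's refined dark channel or anisotropic guided filtering.

```python
# Classical dark-channel-prior low-light enhancement sketch: invert the frame,
# dehaze it with a dark channel / transmission estimate, then invert back.
import cv2
import numpy as np

def dark_channel(img: np.ndarray, window: int = 15) -> np.ndarray:
    """Per-pixel minimum over channels and a local window (He et al.)."""
    min_rgb = img.min(axis=2)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (window, window))
    return cv2.erode(min_rgb, kernel)

def estimate_transmission(img: np.ndarray, airlight: np.ndarray, omega: float = 0.95):
    return 1.0 - omega * dark_channel(img / airlight)

def enhance_low_light(bgr: np.ndarray, t_floor: float = 0.1) -> np.ndarray:
    inverted = 1.0 - bgr.astype(np.float64) / 255.0        # treat darkness as "haze"
    flat = inverted.reshape(-1, 3)
    brightest = flat[np.argsort(dark_channel(inverted).ravel())[-100:]]
    airlight = brightest.max(axis=0)                        # crude atmospheric light estimate
    t = np.clip(estimate_transmission(inverted, airlight), t_floor, 1.0)[..., None]
    dehazed = (inverted - airlight) / t + airlight
    return np.clip((1.0 - dehazed) * 255.0, 0, 255).astype(np.uint8)

frame = cv2.imread("night_frame.png")                       # placeholder file name
cv2.imwrite("night_frame_enhanced.png", enhance_low_light(frame))
```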

Review


65 pages, 9169 KiB  
Review
A Review of Multi-Modal Learning from the Text-Guided Visual Processing Viewpoint
by Ubaid Ullah, Jeong-Sik Lee, Chang-Hyeon An, Hyeonjin Lee, Su-Yeong Park, Rock-Hyun Baek and Hyun-Chul Choi
Sensors 2022, 22(18), 6816; https://doi.org/10.3390/s22186816 - 08 Sep 2022
Cited by 2 | Viewed by 3347
Abstract
For decades, co-relating different data domains to attain the maximum potential of machines has driven research, especially in neural networks. Similarly, text and visual data (images and videos) are two distinct data domains with extensive research in the past. Recently, using natural language to process 2D or 3D images and videos with the immense power of neural nets has witnessed a promising future. Despite the diverse range of remarkable work in this field, notably in the past few years, rapid improvements have also solved future challenges for researchers. Moreover, the connection between these two domains is mainly subjected to GAN, thus limiting the horizons of this field. This review analyzes Text-to-Image (T2I) synthesis as a broader picture, Text-guided Visual-output (T2Vo), with the primary goal being to highlight the gaps by proposing a more comprehensive taxonomy. We broadly categorize text-guided visual output into three main divisions and meaningful subdivisions by critically examining an extensive body of literature from top-tier computer vision venues and closely related fields, such as machine learning and human–computer interaction, aiming at state-of-the-art models with a comparative analysis. This study successively follows previous surveys on T2I, adding value by analogously evaluating the diverse range of existing methods, including different generative models, several types of visual output, critical examination of various approaches, and highlighting the shortcomings, suggesting the future direction of research. Full article
(This article belongs to the Special Issue AI Multimedia Applications)
