Editorial

4 pages, 177 KiB

Open Access Editorial

Digital Signal, Image and Video Processing for Emerging Multimedia Technology

by Byung-Gyu Kim

Electronics 2020, 9(12), 2012; https://doi.org/10.3390/electronics9122012 - 27 Nov 2020

Cited by 3 | Viewed by 2006

Recent developments in image/video-based deep learning technology have enabled new services in the field of multimedia and recognition technology [...]

(This article belongs to the Special Issue Digital Signal, Image and Video Processing for Emerging Multimedia Technology)

Research

16 pages, 4663 KiB

Open Access Article

Multiple Feature Dependency Detection for Deep Learning Technology—Smart Pet Surveillance System Implementation

by Ming-Fong Tsai, Pei-Ching Lin, Zi-Hao Huang and Cheng-Hsun Lin

Electronics 2020, 9(9), 1387; https://doi.org/10.3390/electronics9091387 - 27 Aug 2020

Cited by 7 | Viewed by 3291

Abstract

Image identification, machine learning, and deep learning technologies have been applied in various fields. However, current applications of image identification focus on detecting and identifying objects in a single momentary picture. This paper not only proposes multiple feature dependency detection to identify key parts of pets (mouth and tail) but also incorporates the meaning of the pet’s bark (growl and cry) to identify the pet’s mood and state. The system must also account for changes in the pet’s hair and age over time. To this end, we add an automatic optimization identification subsystem that responds to these changes in real time. Each time it successfully identifies a featured part, the system captures an image of that part and stores it as an effective sample for subsequent training, which improves its identification ability. Each identification result is transmitted to the owner, who can thus follow the pet’s current mood and state in real time. Experimental results show that, using a Faster R-CNN model, our system improves on the accuracy of traditional image identification by 27.47%, 68.17%, and 26.23% for the happy, angry, and sad moods, respectively.

(This article belongs to the Special Issue Digital Signal, Image and Video Processing for Emerging Multimedia Technology)

15 pages, 2926 KiB

Open Access Article

Learning to See in Extremely Low-Light Environments with Small Data

by Yifeng Xu, Huigang Wang, Garth Douglas Cooper, Shaowei Rong and Weitao Sun

Electronics 2020, 9(6), 1011; https://doi.org/10.3390/electronics9061011 - 17 Jun 2020

Cited by 1 | Viewed by 2749

Abstract

Recent advances in deep learning have shown exciting promise in various artificial intelligence vision tasks, such as image classification, image noise reduction, object detection, and semantic segmentation. Restoring images captured in extremely dark environments is one such subtask in computer vision. Some of the latest progress in this field depends on sophisticated algorithms and massive numbers of image pairs taken in low-light and normal-light conditions. However, it is difficult to capture pictures of the same scene, at the same size and position, under two different light levels. We propose a method named NL2LL that collects underexposed images and the corresponding normally exposed images by adjusting camera settings under a “normal” light level during the daytime. Daylight provides better conditions for taking high-quality image pairs quickly and accurately. Additionally, we show that a regularized denoising autoencoder is effective for restoring low-light images. Owing to the high-quality training data, the proposed restoration algorithm achieves superior results for images taken in an extremely low-light environment (about 100× underexposure). Relying on only a small amount of training data (20 image pairs), our algorithm surpasses most of the compared methods. Experiments also show that the model adapts to environments of different brightness.
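
NL2LL obtains underexposed and normally exposed pairs by changing camera settings in daylight. To make the "about 100× underexposure" figure concrete, the minimal sketch below uses the standard photographic rule that image brightness scales roughly with shutter time × ISO / f-number². The specific settings are hypothetical and are not taken from the paper.

```python
import math
from typing import Tuple

Settings = Tuple[float, float, float]  # (shutter time in s, ISO, f-number)

def relative_brightness(settings: Settings) -> float:
    """Relative image brightness: exposure time x sensor gain / f-number squared."""
    shutter_s, iso, f_number = settings
    return shutter_s * iso / f_number ** 2

def underexposure_ratio(normal: Settings, dark: Settings) -> float:
    """How many times darker the 'dark' settings render the same daylight scene."""
    return relative_brightness(normal) / relative_brightness(dark)

def stops(ratio: float) -> float:
    """Express a brightness ratio in photographic stops (powers of two)."""
    return math.log2(ratio)

# Hypothetical settings: same ISO and aperture, shutter 100x shorter for the dark shot.
normal_shot = (1 / 10, 100, 4.0)
dark_shot = (1 / 1000, 100, 4.0)
ratio = underexposure_ratio(normal_shot, dark_shot)
print(f"{ratio:.1f}x underexposure, about {stops(ratio):.2f} stops")
```

Shortening the shutter is only one option; under this relation, lowering the ISO by the same factor gives the same ratio, although it affects sensor noise differently.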

(This article belongs to the Special Issue Digital Signal, Image and Video Processing for Emerging Multimedia Technology)

13 pages, 13557 KiB

Open Access Article

Development Design of Wrist-Mounted Dive Computer for Marine Leisure Activities

by Jeongho Lee and Dongsan Jun

Electronics 2020, 9(5), 727; https://doi.org/10.3390/electronics9050727 - 28 Apr 2020

Cited by 2 | Viewed by 4399

Abstract

Divers conventionally use underwater notepads or flashlights to communicate with each other in the water. For safe marine leisure activities, intuitive touchscreen-based means of communication, such as drawing and writing, need to be integrated into conventional dive computers. In this paper, we propose a wrist-mounted dive computer, called DiverPAD, for underwater drawing and writing. In the framework design of DiverPAD, the firmware, communication protocol, user interface (UI), and underwater touchscreen functions are designed and integrated into the device. As a key feature, we deploy a capacitive touchscreen based on an electrical insulator, which enables divers to draw and write underwater for clear and immediate information delivery.

(This article belongs to the Special Issue Digital Signal, Image and Video Processing for Emerging Multimedia Technology)

18 pages, 32425 KiB

Open Access Article

Detecting Objects from Space: An Evaluation of Deep-Learning Modern Approaches

by Khang Nguyen, Nhut T. Huynh, Phat C. Nguyen, Khanh-Duy Nguyen, Nguyen D. Vo and Tam V. Nguyen

Electronics 2020, 9(4), 583; https://doi.org/10.3390/electronics9040583 - 30 Mar 2020

Cited by 25 | Viewed by 6491

Abstract

Unmanned aircraft systems, or drones, enable us to capture many scenes from a bird’s-eye view, and they have been rapidly deployed in a wide range of practical domains, e.g., agriculture, aerial photography, fast delivery, and surveillance. Object detection is one of the core steps in understanding videos collected by drones. However, this task is very challenging due to the unconstrained viewpoints and low resolution of the captured videos. While modern deep-learning object detectors have recently achieved great success on general benchmarks, e.g., PASCAL-VOC and MS-COCO, the robustness of these detectors on aerial images captured by drones has not been well studied. In this paper, we present an evaluation of state-of-the-art deep-learning detectors, including Faster R-CNN (Faster Region-based CNN), RFCN (Region-based Fully Convolutional Networks), SNIPER (Scale Normalization for Image Pyramids with Efficient Resampling), Single-Shot Detector (SSD), YOLO (You Only Look Once), RetinaNet, and CenterNet, for object detection in videos captured by drones. We conduct experiments on the VisDrone2019 dataset, which contains 96 videos with 39,988 annotated frames, and provide insights into efficient object detectors for aerial images.
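
The abstract does not spell out its scoring protocol, but benchmarks such as MS-COCO decide whether a detection counts as correct by its intersection over union (IoU) with a ground-truth box. The minimal sketch below shows IoU and the greedy, score-ordered matching that precedes average-precision computation; the 0.5 threshold is an illustrative assumption.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_detections(preds: Sequence[Tuple[Box, float]], gts: Sequence[Box],
                     threshold: float = 0.5) -> List[bool]:
    """Mark each prediction (in descending score order) as a true or false positive.

    Each ground-truth box can be claimed by at most one prediction.
    """
    matched = set()
    flags = []
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, threshold
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched.add(best_j)
        flags.append(best_j >= 0)
    return flags
```

The duplicate-suppression effect matters for aerial footage, where many small objects sit close together and a detector may fire twice on one target.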

(This article belongs to the Special Issue Digital Signal, Image and Video Processing for Emerging Multimedia Technology)

16 pages, 9144 KiB

Open Access Article

Multiscale Image Matting Based Multi-Focus Image Fusion Technique

by Sarmad Maqsood, Umer Javed, Muhammad Mohsin Riaz, Muhammad Muzammil, Fazal Muhammad and Sunghwan Kim

Electronics 2020, 9(3), 472; https://doi.org/10.3390/electronics9030472 - 12 Mar 2020

Cited by 17 | Viewed by 3404

Abstract

Multi-focus image fusion is an essential method for obtaining an all-in-focus image from multiple source images. The fused image eliminates out-of-focus regions, so the result contains only sharp, focused regions. A novel multiscale image fusion system based on contrast enhancement, spatial gradient information, and multiscale image matting is proposed to extract focused-region information from multiple source images. In the proposed approach, the multi-focus source images are first refined with an image enhancement algorithm that improves their intensity distribution for better visualization. An edge detection method based on the spatial gradient is then employed to obtain edge information from the contrast-stretched images. This improved edge information is further used by a multiscale window technique to produce local and global activity maps. A trimap and decision maps are then obtained from these near- and far-focus activity maps. Finally, the fused image is obtained using the enhanced decision maps and a fusion rule. The proposed multiscale image matting (MSIM) makes full use of the spatial consistency and the correlation among source images and therefore achieves superior performance at object boundaries compared with region-based methods. The performance of the proposed method is compared with that of several recent techniques through qualitative and quantitative evaluations.
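
The focus-measure and decision-map steps can be illustrated with a simplified sketch: gradient energy from central differences as the spatial-gradient cue, a windowed sum as the activity map, and a hard per-pixel decision between two sources. This is not the authors' MSIM pipeline; contrast enhancement, the trimap, and matting are omitted, and the window radius is an assumption.

```python
from typing import List

Image = List[List[float]]

def gradient_energy(img: Image) -> Image:
    """Squared central-difference gradient magnitude, clamping indices at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            out[y][x] = gx * gx + gy * gy
    return out

def activity_map(img: Image, radius: int = 1) -> Image:
    """Sum gradient energy over a (2r+1)x(2r+1) window: a local focus measure."""
    g = gradient_energy(img)
    h, w = len(g), len(g[0])
    return [[sum(g[yy][xx]
                 for yy in range(max(0, y - radius), min(h, y + radius + 1))
                 for xx in range(max(0, x - radius), min(w, x + radius + 1)))
             for x in range(w)] for y in range(h)]

def fuse(a: Image, b: Image, radius: int = 1) -> Image:
    """Take each pixel from the source whose neighborhood is sharper (ties go to a)."""
    act_a, act_b = activity_map(a, radius), activity_map(b, radius)
    return [[a[y][x] if act_a[y][x] >= act_b[y][x] else b[y][x]
             for x in range(len(a[0]))] for y in range(len(a))]
```

A hard decision like this tends to produce jagged seams at focus boundaries, which is the weakness the paper's matting stage is designed to address.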

(This article belongs to the Special Issue Digital Signal, Image and Video Processing for Emerging Multimedia Technology)

14 pages, 543 KiB

Open Access Feature Paper Article

Exploring Impact of Age and Gender on Sentiment Analysis Using Machine Learning

by Sudhanshu Kumar, Monika Gahalawat, Partha Pratim Roy, Debi Prosad Dogra and Byung-Gyu Kim

Electronics 2020, 9(2), 374; https://doi.org/10.3390/electronics9020374 - 22 Feb 2020

Cited by 62 | Viewed by 11040

Abstract

Sentiment analysis is a rapidly growing field of research due to the explosive growth in digital information. In the modern world of artificial intelligence, sentiment analysis is one of the essential tools for extracting emotion information from massive data. It is applied to a variety of user data, from customer reviews to social network posts. To the best of our knowledge, there has been little work on sentiment analysis that categorizes users by demographics. Demographics play an important role in deciding marketing strategies for different products. In this study, we explore the impact of age and gender on sentiment analysis, as this can help e-commerce retailers market their products to specific demographics. The dataset was created by collecting book reviews from Facebook users, who answered a questionnaire about their book preferences and provided their age group and gender. Next, the segmented data are analyzed for sentiment by age group and gender. Finally, sentiment analysis is performed with different machine learning (ML) approaches, including maximum entropy, support vector machines, convolutional neural networks, and long short-term memory, to study the impact of age and gender on user reviews. Experiments were conducted to gain new insights into the effect of age and gender on sentiment analysis.
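
Segmenting reviews by demographic before analyzing sentiment amounts to grouping on (age group, gender). A minimal sketch, assuming each review is a dict with hypothetical `age_group`, `gender`, and `label` fields, where `label` has already been produced by one of the classifiers or by annotation:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def sentiment_by_segment(reviews: Iterable[dict]) -> Dict[Tuple[str, str], float]:
    """Share of positive reviews per (age_group, gender) segment."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [positive count, total count]
    for review in reviews:
        key = (review["age_group"], review["gender"])
        counts[key][0] += 1 if review["label"] == "positive" else 0
        counts[key][1] += 1
    return {segment: pos / total for segment, (pos, total) in counts.items()}

reviews = [
    {"age_group": "18-25", "gender": "F", "label": "positive"},
    {"age_group": "18-25", "gender": "F", "label": "negative"},
    {"age_group": "26-35", "gender": "M", "label": "positive"},
]
print(sentiment_by_segment(reviews))
```

The same grouping key can be used to split training data, so that a separate model is trained and evaluated per segment.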

(This article belongs to the Special Issue Digital Signal, Image and Video Processing for Emerging Multimedia Technology)

14 pages, 3525 KiB

Open Access Article

Image Text Deblurring Method Based on Generative Adversarial Network

by Chunxue Wu, Haiyan Du, Qunhui Wu and Sheng Zhang

Electronics 2020, 9(2), 220; https://doi.org/10.3390/electronics9020220 - 27 Jan 2020

Cited by 10 | Viewed by 4935

Abstract

In automatic express-delivery sorting, a three-segment code represents a specific area assigned to a specific delivery person. When the camera captures courier order information, it is affected by factors such as lighting, noise, and subject shake, which blur the information on the courier order and cause some of it to be lost. Therefore, this paper proposes an image text deblurring method based on a generative adversarial network. The model consists of two generative adversarial networks combined with the Wasserstein distance; it is trained on unpaired datasets with a combination of adversarial loss and perceptual loss to restore captured blurred images into clear, natural images. Compared with traditional methods, the advantage of this approach is that the loss between the input and output images can be computed indirectly through the positive and negative generative adversarial networks. The Wasserstein distance yields a more stable training process and more realistic generated images. The adversarial and perceptual loss constraints allow the model to be trained on unpaired datasets. Experimental results on the GOPRO test dataset and a self-built unpaired dataset showed that the two indicators, peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM), increased by 13.3% and 3%, respectively. Human perception tests also showed that the proposed algorithm produces a better deblurring effect than traditional deblurring algorithms.
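
The Wasserstein-1 distance that the deblurring model relies on has a closed form for one-dimensional samples of equal size: sort both samples and average the absolute differences. The sketch below shows that, plus the WGAN-style critic and generator losses written over critic scores. It illustrates the distance only and is not the authors' two-GAN architecture, perceptual loss, or training loop.

```python
from statistics import fmean
from typing import Sequence

def wasserstein_1d(p: Sequence[float], q: Sequence[float]) -> float:
    """W1 between two equal-size empirical 1-D distributions: mean |sorted difference|."""
    if len(p) != len(q):
        raise ValueError("this sketch assumes equal sample counts")
    return sum(abs(a - b) for a, b in zip(sorted(p), sorted(q))) / len(p)

def critic_loss(real_scores: Sequence[float], fake_scores: Sequence[float]) -> float:
    """The critic maximizes E[D(real)] - E[D(fake)], so it minimizes the negation."""
    return fmean(fake_scores) - fmean(real_scores)

def generator_adversarial_loss(fake_scores: Sequence[float]) -> float:
    """The generator tries to raise the critic's scores on its outputs."""
    return -fmean(fake_scores)
```

In a WGAN the critic's score gap approximates the Wasserstein distance only when the critic is kept (approximately) 1-Lipschitz, for example by weight clipping or a gradient penalty.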

(This article belongs to the Special Issue Digital Signal, Image and Video Processing for Emerging Multimedia Technology)

13 pages, 2453 KiB

Open Access Article

A Novel Rate Control Algorithm Based on ρ Model for Multiview High Efficiency Video Coding

by Tao Yan, In-Ho Ra, Qian Zhang, Hang Xu and Linyun Huang

Electronics 2020, 9(1), 166; https://doi.org/10.3390/electronics9010166 - 16 Jan 2020

Cited by 6 | Viewed by 2958

Abstract

Most existing rate control algorithms are based on the rate-quantization (R-Q) model. However, as video coding schemes become more flexible, it is very difficult to model the R-Q relationship accurately. Therefore, in this study we propose a novel ρ-domain rate control algorithm for multiview high efficiency video coding (MV-HEVC). First, to further improve the efficiency of MV-HEVC, we use an algorithm from our previous research to optimize the MV-HEVC prediction structure. Then, we establish a ρ-domain rate control model based on multi-objective optimization. Finally, image similarity is used to analyze the correlation between viewpoints, and encoded information and frame complexity are used to perform bit allocation and rate control at the inter-view, frame-layer, and basic-unit levels. Simulation results show that the algorithm maintains high coding efficiency while keeping the average error between the actual and target bit rates to only 0.9%.
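
The abstract does not give the authors' multi-objective model, but the classic ρ-domain formulation of He and Mitra assumes that the bit count falls linearly with ρ, the fraction of quantized coefficients that are zero: R(ρ) = θ(1 - ρ). A minimal sketch of one control step under that linear model, with a simple dead-zone quantizer and hypothetical numbers:

```python
from typing import Iterable, Sequence

def rho(coeffs: Sequence[float], qstep: float) -> float:
    """Fraction of coefficients quantized to zero (dead-zone rule: |c| < qstep)."""
    return sum(1 for c in coeffs if abs(c) < qstep) / len(coeffs)

def estimate_theta(bits_used: float, rho_used: float) -> float:
    """Fit the slope of R = theta * (1 - rho) from the last coded unit (needs rho_used < 1)."""
    return bits_used / (1.0 - rho_used)

def target_rho(target_bits: float, theta: float) -> float:
    """Invert the linear model for the bit budget and clamp rho to [0, 1]."""
    return min(1.0, max(0.0, 1.0 - target_bits / theta))

def choose_qstep(coeffs: Sequence[float], rho_goal: float,
                 candidates: Iterable[float]) -> float:
    """Smallest candidate step whose zero fraction reaches the target rho."""
    ordered = sorted(candidates)
    for q in ordered:
        if rho(coeffs, q) >= rho_goal:
            return q
    return ordered[-1]

# Hypothetical step: the previous unit spent 500 bits at rho = 0.5; the budget is now 250 bits.
theta = estimate_theta(500, 0.5)
goal = target_rho(250, theta)
qstep = choose_qstep([1, 2, 3, 4, 5, 6, 7, 8], goal, range(1, 10))
print(theta, goal, qstep)
```

The appeal of the ρ domain is that ρ can be computed directly from the transform coefficients for any candidate step, so no R-Q curve has to be fitted.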

(This article belongs to the Special Issue Digital Signal, Image and Video Processing for Emerging Multimedia Technology)

16 pages, 5935 KiB

Open Access Article

Real-Time Detection and Recognition of Multiple Moving Objects for Aerial Surveillance

by Wahyu Rahmaniar, Wen-June Wang and Hsiang-Chieh Chen

Electronics 2019, 8(12), 1373; https://doi.org/10.3390/electronics8121373 - 20 Nov 2019

Cited by 17 | Viewed by 4477

Abstract

Detection of moving objects by unmanned aerial vehicles (UAVs) is an important application in aerial transportation systems. However, many problems must be handled, such as high-frequency UAV jitter, small objects, low-quality images, computation time, and detection correctness. This paper considers the problem of detecting and recognizing moving objects in a sequence of images captured from a UAV. A new and efficient technique is proposed to achieve this objective in real time in real environments. First, feature points in two successive frames are found to estimate the camera movement and stabilize the image sequence. Then, regions of interest (ROIs) of the objects are detected as moving-object candidates (foreground). Furthermore, static and dynamic objects are classified based on the dominant motion vectors in the foreground and background. Experimental results show that the proposed method achieves a precision of 94% at a processing speed of 47.08 frames per second (fps), and that its performance surpasses that of existing methods.
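
The stabilization and static/dynamic split can be sketched under a strong simplifying assumption: the camera's motion between frames is a pure translation, estimated as the median displacement of matched feature points, and points that deviate from it by more than a (hypothetical) pixel threshold are treated as moving. The paper may use a richer motion model, and feature detection and matching are assumed to be done already.

```python
from math import hypot
from statistics import median
from typing import List, Tuple

Point = Tuple[float, float]
Match = Tuple[Point, Point]  # (position in frame t, position in frame t+1)

def camera_translation(matches: List[Match]) -> Point:
    """Dominant (median) displacement, treated as the camera's own motion."""
    dxs = [b[0] - a[0] for a, b in matches]
    dys = [b[1] - a[1] for a, b in matches]
    return median(dxs), median(dys)

def moving_points(matches: List[Match], threshold: float = 2.0) -> List[bool]:
    """Flag matches whose motion differs from the camera motion by more than threshold px."""
    cx, cy = camera_translation(matches)
    return [hypot((b[0] - a[0]) - cx, (b[1] - a[1]) - cy) > threshold
            for a, b in matches]

def ms_per_frame(fps: float) -> float:
    """Per-frame time budget implied by a frame rate."""
    return 1000.0 / fps
```

The median is used rather than the mean so that a minority of points on moving objects does not drag the camera estimate. At the reported 47.08 fps, `ms_per_frame` gives a budget of roughly 21 ms per frame.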

(This article belongs to the Special Issue Digital Signal, Image and Video Processing for Emerging Multimedia Technology)

19 pages, 2064 KiB

Open Access Article

An Approach to Hyperparameter Optimization for the Objective Function in Machine Learning

by Yonghoon Kim and Mokdong Chung

Electronics 2019, 8(11), 1267; https://doi.org/10.3390/electronics8111267 - 01 Nov 2019

Cited by 14 | Viewed by 4139

Abstract

In machine learning, performance is of great value. However, setting the parameters of each learning process takes considerable time and effort. A critical problem in machine learning is determining the hyperparameters, such as the learning rate, mini-batch size, and regularization coefficient. In particular, we focus on the learning rate, which is directly related to learning efficiency and performance. Bayesian optimization using a Gaussian process is common for this purpose. In this paper, based on Bayesian optimization, we attempt to optimize the hyperparameters automatically by using a Gamma distribution instead of a Gaussian distribution, to improve training performance on an image discrimination task. As a result, the proposed method proves more reasonable and efficient in estimating the learning rate during training and can be useful in machine learning.
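
The abstract does not detail how the Gamma distribution enters the Bayesian optimization loop, so the sketch below is deliberately simpler: a Gamma-distributed random search over learning rates. Unlike a Gaussian, a Gamma distribution puts no mass on non-positive values, a natural property for a learning rate. `toy_loss` is a hypothetical stand-in for a validation loss. A full Bayesian optimizer would replace the random proposals with an acquisition function evaluated on a surrogate model.

```python
import math
import random
from typing import Callable, List, Tuple

def gamma_candidates(n: int, shape: float, scale: float, seed: int = 0) -> List[float]:
    """Draw n positive learning-rate proposals from Gamma(shape, scale); mean = shape * scale."""
    rng = random.Random(seed)
    return [rng.gammavariate(shape, scale) for _ in range(n)]

def pick_learning_rate(loss: Callable[[float], float],
                       candidates: List[float]) -> Tuple[float, float]:
    """Evaluate every proposal and keep the one with the lowest loss."""
    best_loss, best_lr = min((loss(lr), lr) for lr in candidates)
    return best_lr, best_loss

def toy_loss(lr: float) -> float:
    """Hypothetical validation-loss proxy with its minimum at lr = 0.01."""
    return (math.log10(lr) + 2.0) ** 2

proposals = gamma_candidates(50, shape=2.0, scale=0.005, seed=1)
best_lr, best_loss = pick_learning_rate(toy_loss, proposals)
print(f"best learning rate ~ {best_lr:.4g} (toy loss {best_loss:.3g})")
```

With shape 2 and scale 0.005 the proposals center on a mean of 0.01 and have a right-skewed tail, so a few larger rates are still explored.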

(This article belongs to the Special Issue Digital Signal, Image and Video Processing for Emerging Multimedia Technology)

15 pages, 2082 KiB

Open Access Article

CNN-Based Ternary Classification for Image Steganalysis

by Sanghoon Kang, Hanhoon Park and Jong-Il Park

Electronics 2019, 8(11), 1225; https://doi.org/10.3390/electronics8111225 - 26 Oct 2019

Cited by 10 | Viewed by 3460

Abstract

This study proposes a convolutional neural network (CNN)-based steganalytic method that performs ternary classification to simultaneously identify WOW and UNIWARD, two representative adaptive image steganographic algorithms. WOW and UNIWARD embed messages in very similar ways: both measure and minimize the distortion that message embedding causes in the image. This similarity makes the two algorithms difficult to distinguish, even with a CNN-based classifier. In particular, our experiments show that WOW and UNIWARD cannot be distinguished by simply combining binary CNN-based classifiers trained to identify each algorithm separately. Therefore, to identify and classify WOW and UNIWARD, both must be learned at the same time by a single CNN-based classifier designed for ternary classification. This study proposes a method for ternary classification that learns and classifies cover, WOW stego, and UNIWARD stego images using a single CNN-based classifier. A CNN structure and a preprocessing filter are also proposed to effectively classify and identify WOW and UNIWARD. Experiments using BOSSBase 1.01 database images confirmed that the proposed method achieves ternary classification with an accuracy of approximately 72%.
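
CNN steganalyzers commonly begin with a fixed high-pass filter that suppresses image content and exposes embedding noise. The paper proposes its own preprocessing filter; as a stand-in, the sketch below uses the widely used 5×5 KV kernel, followed by an argmax over three classifier scores for the cover/WOW/UNIWARD labels. The label order is an assumption.

```python
from typing import List, Sequence

# The 5x5 "KV" high-pass kernel common in steganalysis front ends (applied with a 1/12 scale).
KV = [
    [-1, 2, -2, 2, -1],
    [2, -6, 8, -6, 2],
    [-2, 8, -12, 8, -2],
    [2, -6, 8, -6, 2],
    [-1, 2, -2, 2, -1],
]

def kv_residual(img: List[List[float]]) -> List[List[float]]:
    """Valid 2-D filtering of img with KV/12; the output shrinks by 4 in each dimension.

    The kernel is symmetric, so correlation and convolution coincide.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h - 4):
        row = []
        for x in range(w - 4):
            acc = sum(KV[i][j] * img[y + i][x + j] for i in range(5) for j in range(5))
            row.append(acc / 12)
        out.append(row)
    return out

CLASSES = ("cover", "WOW", "UNIWARD")

def ternary_label(scores: Sequence[float]) -> str:
    """Map the classifier's three output scores to a label by argmax."""
    return CLASSES[max(range(3), key=lambda k: scores[k])]
```

Because every row of the kernel sums to zero, smooth image regions produce near-zero residuals, leaving the small embedding changes for the CNN to learn from.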

(This article belongs to the Special Issue Digital Signal, Image and Video Processing for Emerging Multimedia Technology)

16 pages, 6974 KiB

Open Access Article

False Positive Decremented Research for Fire and Smoke Detection in Surveillance Camera using Spatial and Temporal Features Based on Deep Learning

by Yeunghak Lee and Jaechang Shim

Electronics 2019, 8(10), 1167; https://doi.org/10.3390/electronics8101167 - 15 Oct 2019

Cited by 12 | Viewed by 4265

Abstract

Fire must be extinguished early, as it leads to economic losses and the loss of precious lives. Vision-based detection algorithms face many difficulties due to the atypical nature of fire flames and smoke. In this study, we introduce a novel smoke detection algorithm that reduces false positive detections by using deep-learning-based spatial and temporal features from factory-installed surveillance cameras. First, we calculated the global frame similarity and mean square error (MSE) to detect the movement of fire flames and smoke in the surveillance camera input. Second, we extracted fire flame and smoke candidate areas using a deep learning algorithm, the Faster Region-based Convolutional Neural Network (Faster R-CNN). Third, the final fire flame and smoke areas were determined from local spatial and temporal information: frame difference, color, similarity, wavelet transform, coefficient of variation, and MSE. The proposed algorithm thus uses global and local frame features, which represent object information well, to reduce false positives in the deep-learning-based method. Experimental results show that false positive detection was reduced to about 99.9% while the smoke and fire detection performance was maintained. These results confirm that the proposed method performs excellently at suppressing false detections.
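
The first stage (global frame similarity and MSE) and one of the local cues (coefficient of variation) are simple statistics. A minimal sketch on grayscale frames stored as nested lists; the motion threshold is hypothetical and would need tuning for each camera.

```python
from statistics import fmean, pstdev
from typing import List, Sequence

Frame = List[List[float]]

def frame_mse(a: Frame, b: Frame) -> float:
    """Mean squared error between two equally sized grayscale frames."""
    n = len(a) * len(a[0])
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)) / n

def has_motion(prev: Frame, curr: Frame, threshold: float) -> bool:
    """Global motion gate: only frames that changed enough are passed to the detector."""
    return frame_mse(prev, curr) > threshold

def coefficient_of_variation(values: Sequence[float]) -> float:
    """Population standard deviation divided by the mean (mean must be nonzero)."""
    return pstdev(values) / fmean(values)
```

Gating on global change before running a Faster R-CNN detector saves computation on static frames; the local statistics then help reject candidates whose texture or flicker does not resemble flame or smoke.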

(This article belongs to the Special Issue Digital Signal, Image and Video Processing for Emerging Multimedia Technology)

20 pages, 5287 KiB

Open Access Article

WS-AM: Weakly Supervised Attention Map for Scene Recognition

by Shifeng Xia, Jiexian Zeng, Lu Leng and Xiang Fu

Electronics 2019, 8(10), 1072; https://doi.org/10.3390/electronics8101072 - 21 Sep 2019

Cited by 13 | Viewed by 3364

Abstract

Recently, convolutional neural networks (CNNs) have achieved great success in scene recognition. Compared with traditional hand-crafted features, CNNs can extract more robust and generalizable features for scene recognition. However, existing CNN-based scene recognition methods do not sufficiently take into account the relationship between image regions and categories when choosing local regions, which results in many redundant local regions and degrades recognition accuracy. In this paper, we propose an effective method for exploring the discriminative regions of scene images. Our method uses the gradient-weighted class activation mapping (Grad-CAM) technique and weakly supervised information to generate the attention map (AM) of scene images, dubbed WS-AM (weakly supervised attention map). Regions where both the local mean and the local center value of the AM are large correspond to discriminative regions that help scene recognition. We sampled discriminative regions at multiple scales and extracted the features of large-scale and small-scale regions with two different pre-trained CNNs, respectively. The features from the two scales were aggregated by improved vector of locally aggregated descriptors (VLAD) coding and max pooling, respectively. Finally, a pre-trained CNN was used to extract the global feature of the image in the fully connected (fc) layer, and the local features were combined with the global feature to obtain the image representation. We validated the effectiveness of our method on three benchmark datasets, MIT Indoor 67, Scene 15, and UIUC Sports, and obtained 85.67%, 94.80%, and 95.12% accuracy, respectively. Compared with some state-of-the-art methods, the WS-AM method requires fewer local regions, which gives it better real-time performance.
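
Two ingredients of WS-AM can be sketched directly. The first is the Grad-CAM combination step: average each feature map's gradient to get a weight, sum the weighted maps, and apply ReLU. The second is selecting regions where both the center value and the local mean of the attention map are large. Activations and gradients are assumed to come from a CNN's backward pass, the thresholds are hypothetical, and multi-scale sampling and VLAD coding are omitted.

```python
from typing import List, Tuple

Map = List[List[float]]

def grad_cam(activations: List[Map], gradients: List[Map]) -> Map:
    """Grad-CAM: weight each feature map by its spatially averaged gradient, sum, then ReLU."""
    h, w = len(activations[0]), len(activations[0][0])
    weights = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    return [[max(0.0, sum(wk * a[y][x] for wk, a in zip(weights, activations)))
             for x in range(w)] for y in range(h)]

def discriminative_centers(am: Map, radius: int, t_center: float,
                           t_mean: float) -> List[Tuple[int, int]]:
    """Window centers whose own value and window mean both reach (hypothetical) thresholds."""
    h, w = len(am), len(am[0])
    picks = []
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            window = [am[yy][xx]
                      for yy in range(y - radius, y + radius + 1)
                      for xx in range(x - radius, x + radius + 1)]
            if am[y][x] >= t_center and sum(window) / len(window) >= t_mean:
                picks.append((y, x))
    return picks
```

Requiring both conditions filters out isolated attention spikes (high center, low mean) as well as broad, weak plateaus (high mean contributions but low center).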

(This article belongs to the Special Issue Digital Signal, Image and Video Processing for Emerging Multimedia Technology)

14 pages, 1326 KiB

Open Access Article

Layer Selection in Progressive Transmission of Motion-Compensated JPEG2000 Video

by José Carmelo Maturana-Espinosa, Juan Pablo García-Ortiz, Daniel Müller and Vicente González-Ruiz

Electronics 2019, 8(9), 1032; https://doi.org/10.3390/electronics8091032 - 13 Sep 2019

Cited by 1 | Viewed by 2346

Abstract

MCJ2K (Motion-Compensated JPEG2000) is a video codec based on MCTF (Motion-Compensated Temporal Filtering) and J2K (JPEG2000). MCTF analyzes a sequence of images, generating a collection of temporal sub-bands, which are compressed with J2K. The R/D (Rate-Distortion) performance of MCJ2K is better than that of the MJ2K (Motion JPEG2000) extension, especially when there is a high level of temporal redundancy. MCJ2K codestreams can be served by standard JPIP (J2K Interactive Protocol) servers, thanks to the use of only J2K standard file formats. In bandwidth-constrained scenarios, an important issue in MCJ2K is determining how much data from each temporal sub-band must be transmitted to maximize the quality of the reconstructions at the client side. To solve this problem, we propose two rate-allocation algorithms that provide reconstructions that are progressive in quality. The first, OSLA (Optimized Sub-band Layers Allocation), determines the best progression of quality layers but is computationally expensive. The second, ESLA (Estimated-Slope sub-band Layers Allocation), is sub-optimal in most cases but much faster and more convenient for real-time streaming scenarios. An experimental comparison shows that, even when a straightforward motion compensation scheme is used, the R/D performance of MCJ2K is competitive not only with MJ2K but also with other standard scalable video codecs.
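
MCTF can be illustrated by its simplest instance, Haar lifting along the time axis with all motion vectors set to zero. A predict step produces a high-pass (residual) band, and an update step produces a low-pass (average) band. A real MCJ2K encoder compensates motion in both steps; here frames are flattened into lists for brevity.

```python
from typing import List, Tuple

Frame = List[float]

def mctf_forward(frames: List[Frame]) -> Tuple[List[Frame], List[Frame]]:
    """One temporal level of Haar lifting with zero motion vectors."""
    if len(frames) % 2:
        raise ValueError("this sketch expects an even number of frames")
    lows, highs = [], []
    for even, odd in zip(frames[0::2], frames[1::2]):
        high = [o - e for e, o in zip(even, odd)]      # predict: temporal residual
        low = [e + h / 2 for e, h in zip(even, high)]  # update: equals (even + odd) / 2
        highs.append(high)
        lows.append(low)
    return lows, highs

def mctf_inverse(lows: List[Frame], highs: List[Frame]) -> List[Frame]:
    """Undo the lifting steps in reverse order; reconstruction is exact."""
    frames = []
    for low, high in zip(lows, highs):
        even = [lo - h / 2 for lo, h in zip(low, high)]
        odd = [h + e for h, e in zip(high, even)]
        frames.extend([even, odd])
    return frames
```

Applying `mctf_forward` again to the low bands yields the next temporal level. The resulting high bands and the final low band are the temporal sub-bands that would then be J2K-compressed and whose quality layers a rate allocator such as OSLA or ESLA would choose between.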

(This article belongs to the Special Issue Digital Signal, Image and Video Processing for Emerging Multimedia Technology)

15 pages, 2844 KiB

Open Access Feature Paper Article

Design of Efficient Perspective Affine Motion Estimation/Compensation for Versatile Video Coding (VVC) Standard

by Young-Ju Choi, Dong-San Jun, Won-Sik Cheong and Byung-Gyu Kim

Electronics 2019, 8(9), 993; https://doi.org/10.3390/electronics8090993 - 05 Sep 2019

Cited by 11 | Viewed by 5262

Abstract

The fundamental motion model of conventional block-based motion compensation in High Efficiency Video Coding (HEVC) is a translational motion model. However, in the real world, the motion of an object is a combination of many kinds of motion. In Versatile Video Coding (VVC), block-based 4-parameter and 6-parameter affine motion compensation (AMC) is applied. In natural videos, objects in the majority of cases move without regularity rather than maintaining their shape or transforming at a constant rate. For this reason, AMC is still limited in handling complex motion, and a more flexible motion model is desired as a new video coding tool. In this paper, we design a perspective affine motion compensation (PAMC) method that can cope with more complex motions such as shear and shape distortion. The proposed PAMC uses both perspective and affine motion models. The perspective-model-based method uses four control point motion vectors (CPMVs) to give all four corner vertices degrees of freedom. In addition, the proposed algorithm is integrated into the AMC structure so that the existing affine mode and the proposed perspective mode can be selected adaptively. Because the block under the perspective motion model is a rectangle without specific features, the proposed PAMC is especially effective for test sequences containing irregular object distortions or rapid dynamic motions. Our proposed algorithm is implemented in VTM 2.0. Experimental results show that the proposed technique achieves BD-rate reductions of up to 0.45% and 0.30% on the Y component for the random access (RA) and low delay P (LDP) configurations, respectively.
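
The difference between the VVC affine model and a perspective model can be sketched as follows. The 4-parameter affine field below follows the standard VVC formulation from two CPMVs. The perspective field fits a homography that moves each of the four block corners by its own CPMV and evaluates it at (x, y). This homography derivation is an assumption for illustration, not necessarily the exact PAMC formulation in VTM.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]

def affine_4param(mv0: Vec, mv1: Vec, w: float, x: float, y: float) -> Vec:
    """4-parameter affine MV at (x, y) from the top-left and top-right CPMVs."""
    a = (mv1[0] - mv0[0]) / w
    b = (mv1[1] - mv0[1]) / w
    return (a * x - b * y + mv0[0], b * x + a * y + mv0[1])

def _solve(a: List[List[float]], rhs: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for i in reversed(range(n)):
        sol[i] = (m[i][n] - sum(m[i][j] * sol[j] for j in range(i + 1, n))) / m[i][i]
    return sol

def perspective_mv(cpmvs: Sequence[Vec], w: float, h: float, x: float, y: float) -> Vec:
    """MV at (x, y) from a homography fitted to four corner CPMVs (TL, TR, BL, BR)."""
    corners = [(0.0, 0.0), (w, 0.0), (0.0, h), (w, h)]
    rows, rhs = [], []
    for (cx, cy), (mx, my) in zip(corners, cpmvs):
        u, v = cx + mx, cy + my  # where this corner lands in the reference picture
        rows.append([cx, cy, 1.0, 0.0, 0.0, 0.0, -u * cx, -u * cy])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, cx, cy, 1.0, -v * cx, -v * cy])
        rhs.append(v)
    h0, h1, h2, h3, h4, h5, h6, h7 = _solve(rows, rhs)
    den = h6 * x + h7 * y + 1.0
    u = (h0 * x + h1 * y + h2) / den
    v = (h3 * x + h4 * y + h5) / den
    return (u - x, v - y)
```

With four identical CPMVs the homography degenerates to a pure translation, so the perspective mode strictly generalizes translational prediction; two CPMVs, by contrast, can only express rotation, zoom, and translation.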

(This article belongs to the Special Issue Digital Signal, Image and Video Processing for Emerging Multimedia Technology)

15 pages, 9262 KiB

Open Access Article

MID Filter: An Orientation-Based Nonlinear Filter For Reducing Multiplicative Noise

by Ibrahim Furkan Ince, Omer Faruk Ince and Faruk Bulut

Electronics 2019, 8(9), 936; https://doi.org/10.3390/electronics8090936 - 26 Aug 2019

Cited by 4 | Viewed by 3210

Abstract

In this study, an edge-preserving nonlinear filter is proposed to reduce multiplicative noise by using a filter structure based on mathematical morphology. This method is called the minimum index of dispersion (MID) filter. MID is an improved and extended version of the MCV (minimum coefficient of variation) and MLV (mean least variance) filters. Unlike these filters, MID adds an extra layer to the value-and-criterion function, in which orientation information is employed in addition to intensity information. Furthermore, the selection function is remodeled by performing low-pass filtering (mean filtering) to reduce multiplicative noise. MID outputs are benchmarked against the outputs of the MCV and MLV filters in terms of the structural similarity index (SSIM), peak signal-to-noise ratio (PSNR), mean squared error (MSE), standard deviation, and contrast value metrics. Additionally, the F Score, a hybrid metric that combines all five of these metrics, is presented in order to evaluate all the filters. Experimental results and extensive benchmarking studies show that the proposed method is more robust than the conventional MCV and MLV filters in both edge preservation and noise removal. Noise filters usually cannot improve noise removal and edge preservation at the same time; this study shows that the MID filter produces better results in both noise cleaning and edge preservation.
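
The value-and-criterion structure shared by MCV, MLV and MID can be sketched in Python. This is a simplified illustration under stated assumptions: each pixel looks at the four (r+1)x(r+1) subwindows that have it as a corner (a hypothetical layout), picks the subwindow whose criterion is smallest, and outputs that subwindow's mean. Only the criterion changes between filters; MID's orientation layer and remodeled selection function are not reproduced.

```python
import numpy as np


def coefficient_of_variation(sub):  # MCV-style criterion
    return sub.std() / max(sub.mean(), 1e-12)


def variance(sub):  # MLV-style criterion
    return sub.var()


def index_of_dispersion(sub):  # MID-style criterion: variance / mean
    return sub.var() / max(sub.mean(), 1e-12)


def value_and_criterion_filter(img, criterion, r=1):
    """Output, for every pixel, the mean of the neighboring subwindow that
    minimizes `criterion` (hypothetical four-subwindow layout)."""
    img = np.asarray(img, dtype=float)
    pad = np.pad(img, r, mode="reflect")
    out = np.empty_like(img)
    rows, cols = img.shape
    for i in range(rows):
        for j in range(cols):
            pi, pj = i + r, j + r  # pixel position in the padded image
            best_val, best_crit = None, np.inf
            # subwindows with the pixel at the bottom-right, bottom-left, top-right, top-left corner
            for di, dj in ((-r, -r), (-r, 0), (0, -r), (0, 0)):
                sub = pad[pi + di:pi + di + r + 1, pj + dj:pj + dj + r + 1]
                c = criterion(sub)
                if c < best_crit:
                    best_crit, best_val = c, sub.mean()
            out[i, j] = best_val
    return out
```

Because a homogeneous subwindow on the pixel's own side of an edge has zero dispersion, a clean step edge passes through such a filter unchanged, which is the edge-preserving behavior the abstract refers to.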

(This article belongs to the Special Issue Digital Signal, Image and Video Processing for Emerging Multimedia Technology)

26 pages, 5706 KiB

Open Access Article

Adaptive Algorithm on Block-Compressive Sensing and Noisy Data Estimation

by Yongjun Zhu, Wenbo Liu and Qian Shen

Electronics 2019, 8(7), 753; https://doi.org/10.3390/electronics8070753 - 03 Jul 2019

Cited by 10 | Viewed by 3174

Abstract

In this paper, a modified adaptive algorithm for block-compressive sensing (BCS) is developed using saliency and error analysis. It is observed that the performance of BCS can be improved by rational blocking and uneven sampling ratios, as well as by adopting error analysis during reconstruction. The weighted mean information entropy is adopted as the basis for partitioning in BCS, which results in flexible block groups. Furthermore, a synthetic feature (SF) based on local saliency and variance is introduced for stepless adaptive sampling, which works well in distinguishing and sampling smooth blocks and detail blocks. An error analysis method is used to estimate the optimal number of iterations in sparse reconstruction. Based on these points, the article proposes a modified adaptive block-compressive sensing algorithm with flexible partitioning and error analysis. On the one hand, it provides a feasible solution for partitioning and sampling an image; on the other hand, it changes the iteration stop condition of the reconstruction, which improves the quality of the reconstructed image. The experimental results verify the effectiveness of the proposed algorithm and show improvements in the Peak Signal to Noise Ratio (PSNR), Structural Similarity (SSIM), Gradient Magnitude Similarity Deviation (GMSD), and Block Effect Index (BEI).
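
The idea of adaptive (uneven) sampling ratios can be illustrated with a small Python sketch of the sampling side only. Several simplifying assumptions apply: blocks are a fixed BxB grid rather than entropy-based flexible groups, plain block variance stands in for the paper's synthetic feature, the measurement matrices are Gaussian, and reconstruction is not shown. All function names are hypothetical.

```python
import numpy as np


def allocate_measurements(features, total, n_pix, m_min=1):
    """Split a total measurement budget across blocks in proportion to a per-block
    feature score (variance here, a stand-in for the paper's synthetic feature)."""
    features = np.asarray(features, dtype=float)
    k = len(features)
    spare = total - m_min * k
    weights = features / features.sum() if features.sum() > 0 else np.full(k, 1.0 / k)
    m = m_min + np.floor(weights * spare).astype(int)
    # hand the few measurements lost to flooring to the highest-scoring blocks
    for idx in np.argsort(-weights)[: total - m.sum()]:
        m[idx] += 1
    return np.minimum(m, n_pix)  # a block never needs more measurements than pixels


def bcs_sample(img, B, ratio, rng):
    """Measure each BxB block x_i as y_i = Phi_i @ x_i with a block-specific row count."""
    blocks = [img[r:r + B, c:c + B].ravel()
              for r in range(0, img.shape[0], B)
              for c in range(0, img.shape[1], B)]
    ms = allocate_measurements([b.var() for b in blocks], int(ratio * img.size), B * B)
    ys = []
    for x, m in zip(blocks, ms):
        phi = rng.standard_normal((m, B * B)) / np.sqrt(m)  # Gaussian measurement matrix
        ys.append(phi @ x)
    return ms, ys
```

With this allocation, smooth blocks fall back to the minimum number of measurements and the saved budget goes to detail blocks, which is the intuition behind distinguishing smooth and detail blocks.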

(This article belongs to the Special Issue Digital Signal, Image and Video Processing for Emerging Multimedia Technology)

25 pages, 4648 KiB

Open Access Article

Wiener–Granger Causality Theory Supported by a Genetic Algorithm to Characterize Natural Scenery

by César Benavides-Álvarez, Juan Villegas-Cortez, Graciela Román-Alonso and Carlos Avilés-Cruz

Electronics 2019, 8(7), 726; https://doi.org/10.3390/electronics8070726 - 26 Jun 2019

Cited by 2 | Viewed by 3195

Abstract

Image recognition and classification have been widely researched in computer vision systems. This paper implements a new strategy based on Wiener-Granger causality theory for classifying natural scenery images. The strategy is based on self-content images extracted using a Content-Based Image Retrieval (CBIR) methodology (to obtain different texture features); a Genetic Algorithm (GA) is then used to select the most relevant natural elements from images that share similar causality patterns. The proposed method consists of a sequential feature extraction stage, a time-series formation task, a causality estimation phase, and causality feature selection through the GA (with the classification process used inside the fitness function). A classification stage was implemented, and 700 images of natural scenery were used to validate the results. Tested in a distributed-system implementation, the developed system achieves a technical efficiency of 100% and 96% for the resubstitution and cross-validation methodologies, respectively. This proposal could help recognize natural scenery during the navigation of an autonomous car or possibly a drone, an important element in the safety of autonomous vehicle navigation.
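
The causality estimation phase rests on the standard Wiener-Granger idea: a series x "causes" y if x's past improves the prediction of y beyond what y's own past gives. A minimal Python sketch uses least-squares autoregressions with a hypothetical lag order `p`. The index is the log ratio of residual variances. How the paper turns image texture features into time series is not reproduced here.

```python
import numpy as np


def lagged(series_list, p):
    """Design matrix with an intercept and lags 1..p of every series."""
    n = len(series_list[0])
    cols = [np.ones(n - p)]
    for s in series_list:
        for k in range(1, p + 1):
            cols.append(s[p - k:n - k])  # row t holds s[t - k]
    return np.column_stack(cols)


def granger_index(cause, effect, p=2):
    """Wiener-Granger index for 'cause -> effect': log of the residual variance of
    effect's own autoregression over that of the model that adds cause's past."""
    cause, effect = np.asarray(cause, float), np.asarray(effect, float)
    target = effect[p:]
    restricted = lagged([effect], p)
    full = lagged([effect, cause], p)
    res_r = target - restricted @ np.linalg.lstsq(restricted, target, rcond=None)[0]
    res_f = target - full @ np.linalg.lstsq(full, target, rcond=None)[0]
    return float(np.log(res_r.var() / res_f.var()))
```

An index near zero means the candidate cause adds no predictive information; a clearly positive index indicates a directed dependency, which is the kind of pattern a GA could then select among.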

(This article belongs to the Special Issue Digital Signal, Image and Video Processing for Emerging Multimedia Technology)

16 pages, 3930 KiB

Open Access Article

An Efficient Separable Reversible Data Hiding Using Paillier Cryptosystem for Preserving Privacy in Cloud Domain

by Ahmad Neyaz Khan, Ming Yu Fan, Muhammad Irshad Nazeer, Raheel Ahmed Memon, Asad Malik and Mohammed Aslam Husain

Electronics 2019, 8(6), 682; https://doi.org/10.3390/electronics8060682 - 17 Jun 2019

Cited by 12 | Viewed by 3724

Abstract

Reversible data hiding in encrypted images (RDHEI) is advantageous in scenarios where complete recovery of both the original cover image and the additional data is required. In some existing RDHEI schemes, the required image pre-processing step is an overhead for resource-constrained devices on the sender’s side. In this paper, an efficient separable reversible data hiding scheme over a homomorphically encrypted image is proposed that preserves the privacy of the contents in the cloud environment. The proposed scheme involves three stakeholders: the content-owner, the data hider, and the receiver. First, the content-owner encrypts the original image and sends the encrypted image to the data hider. The data hider embeds the encrypted additional data into the encrypted image and then sends the marked encrypted image to the receiver. On the receiver’s side, the additional data and the original image are extracted in a separable manner, i.e., each is extracted independently and completely from the marked encrypted image. The scheme uses public-key cryptography and lets the content-owner encrypt the original image without any pre-processing step. In addition, the experiments use distinct images to demonstrate image independence, and the results show a high embedding rate, with a peak signal-to-noise ratio (PSNR) of +∞ dB for the directly decrypted image. Finally, a comparison shows that the proposed scheme is an optimized approach for resource-constrained devices because it omits the image pre-processing step.
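
The Paillier properties that make encrypted-domain embedding possible can be shown with a toy Python sketch. It uses the common choice g = n + 1 and deliberately tiny, insecure primes. Multiplying ciphertexts adds plaintexts, and multiplying by an encryption of 0 changes the ciphertext without changing what decrypts. This illustrates the cryptosystem only; it is not the paper's embedding or extraction procedure, and the helper names are hypothetical.

```python
import math
import random


def keygen(p, q):
    """Toy Paillier keys with g = n + 1 (tiny primes: illustration only, not secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid modular inverse because g = n + 1
    return n, (lam, mu)


def encrypt(n, m, rng):
    """c = g^m * r^n mod n^2 with a random r coprime to n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """m = L(c^lambda mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    lam, mu = priv
    n2 = n * n
    return ((pow(c, lam, n2) - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Homomorphic addition: E(m1) * E(m2) mod n^2 decrypts to (m1 + m2) mod n."""
    return (c1 * c2) % (n * n)
```

The second property (re-randomizing with an encryption of 0) is why an encrypted-domain scheme can alter ciphertexts while direct decryption still returns the exact pixel values.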

(This article belongs to the Special Issue Digital Signal, Image and Video Processing for Emerging Multimedia Technology)

16 pages, 8435 KiB

Open Access Article

Wavelet-Integrated Deep Networks for Single Image Super-Resolution

by Faisal Sahito, Pan Zhiwen, Junaid Ahmed and Raheel Ahmed Memon

Electronics 2019, 8(5), 553; https://doi.org/10.3390/electronics8050553 - 17 May 2019

Cited by 12 | Viewed by 3845

Abstract

We propose a scale-invariant deep neural network model based on wavelets for single image super-resolution (SISR). The wavelet approximation images and their corresponding wavelet sub-bands across all predefined scale factors are combined to form a large training dataset. Mappings are then determined between the wavelet sub-band images and their corresponding approximation images. Gradient clipping is used to speed up training. Furthermore, the stationary wavelet transform (SWT) is used instead of the discrete wavelet transform (DWT) because of its up-scaling property: the SWT does not downsample its sub-bands, so more information about the image is preserved. In the proposed model, the high-resolution image is recovered with detailed features owing to the redundancy (across scales) property of wavelets. Experimental results show that the proposed model outperforms state-of-the-art algorithms in terms of peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM).
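
The practical difference between the SWT and the DWT can be illustrated with a hand-rolled one-level Haar transform in NumPy. The Haar wavelet, periodic boundary handling, and the function names are assumptions chosen for brevity; the paper's wavelet and network are not shown. The undecimated sub-bands keep the full image size, while the decimated ones are half size in each direction.

```python
import numpy as np


def haar_swt_1d(x, axis):
    """One undecimated Haar level along `axis` (periodic extension, no downsampling)."""
    nxt = np.roll(x, -1, axis=axis)
    return (x + nxt) / np.sqrt(2), (x - nxt) / np.sqrt(2)


def haar_swt2(img):
    """Rows first, then columns: four sub-bands, each the same size as `img`."""
    lo, hi = haar_swt_1d(img, axis=1)
    ll, lh = haar_swt_1d(lo, axis=0)
    hl, hh = haar_swt_1d(hi, axis=0)
    return ll, lh, hl, hh


def haar_dwt2(img):
    """Decimated counterpart: keep every second coefficient in each direction."""
    return tuple(band[::2, ::2] for band in haar_swt2(img))


def haar_iswt_1d(a, d, axis):
    """Inverse: average the two redundant estimates of every sample."""
    return (a + d + np.roll(a - d, 1, axis=axis)) / (2 * np.sqrt(2))


def haar_iswt2(ll, lh, hl, hh):
    lo = haar_iswt_1d(ll, lh, axis=0)
    hi = haar_iswt_1d(hl, hh, axis=0)
    return haar_iswt_1d(lo, hi, axis=1)
```

Because every SWT sub-band is aligned pixel-for-pixel with the input, a network can learn mappings between sub-bands and approximation images of identical size, and the redundancy lets the inverse recover the image exactly.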

(This article belongs to the Special Issue Digital Signal, Image and Video Processing for Emerging Multimedia Technology)

17 pages, 25762 KiB

Open Access Article

Reversible Data Hiding Using Inter-Component Prediction in Multiview Video Plus Depth

by Jin Young Lee, Cheonshik Kim and Ching-Nung Yang

Electronics 2019, 8(5), 514; https://doi.org/10.3390/electronics8050514 - 09 May 2019

Cited by 3 | Viewed by 2901

Abstract

With the advent of 3D video compression and Internet technology, 3D videos have been deployed worldwide. Data hiding belongs to the family of watermarking technologies and has many applications. In this paper, we use 3D video as a cover medium for secret communication using reversible data hiding (RDH). RDH is advantageous because the cover image can be completely recovered after the hidden data are extracted. Recently, Chung et al. introduced RDH for depth maps using prediction-error expansion (PEE) and rhombus prediction for marking 3D videos. Chung et al.’s method is efficient, but it does not exploit the available pixel resources to maximize data capacity. In this paper, we improve the embedding capacity using PEE, inter-component prediction, and allowable pixel ranges. Inter-component prediction exploits the strong correlation between the texture image and the depth map in multiview video plus depth (MVD). Moreover, the proposed scheme can control the quality of the depth map with a simple formula. Experimental results demonstrate that the proposed method is more efficient than existing RDH methods in terms of capacity.
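
Prediction-error expansion can be sketched in Python for a single pixel. The sketch assumes the common threshold-based variant: errors in [-T, T) are expanded to carry one bit, and larger errors are shifted so the two ranges never overlap. A rhombus predictor is included. The inter-component (texture-to-depth) predictor, the allowable pixel ranges, and overflow handling from the paper are not reproduced, and `T` and the function names are hypothetical.

```python
def rhombus_pred(img, i, j):
    """Rhombus prediction: floor of the mean of the four direct neighbors."""
    return (img[i - 1][j] + img[i + 1][j] + img[i][j - 1] + img[i][j + 1]) // 4


def pee_embed(x, pred, bit, T):
    """Embed one bit into pixel x. Returns (marked value, whether a bit was embedded)."""
    e = x - pred
    if -T <= e < T:
        return pred + 2 * e + bit, True      # expanded error carries the bit in its LSB
    return (pred + e + T if e >= T else pred + e - T), False  # shift outer errors


def pee_extract(y, pred, T):
    """Recover the original pixel and the bit (None if the pixel was only shifted)."""
    e2 = y - pred
    if -2 * T <= e2 < 2 * T:
        return pred + (e2 >> 1), e2 & 1      # floor-halving restores e; LSB is the bit
    return (pred + e2 - T if e2 >= 2 * T else pred + e2 + T), None
```

Reversibility requires the decoder to reproduce the same prediction, which is why rhombus schemes split pixels into two interleaved sets and never predict from a pixel that has already been modified.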

(This article belongs to the Special Issue Digital Signal, Image and Video Processing for Emerging Multimedia Technology)

15 pages, 3061 KiB

Open Access Article

CCTV Video Processing Metadata Security Scheme Using Character Order Preserving-Transformation in the Emerging Multimedia

by Jinsu Kim, Namje Park, Geonwoo Kim and Seunghun Jin

Electronics 2019, 8(4), 412; https://doi.org/10.3390/electronics8040412 - 09 Apr 2019

Cited by 38 | Viewed by 4748

Abstract

Intelligent video surveillance environments make it possible to gather various types of information about the recorded objects by analyzing real-time video data collected from CCTV systems and automatically processing that information. However, such surveillance environments face risks of privacy exposure, which necessitates secure countermeasures. Video meta-data, in particular, contain various types of personal information that are analyzed using big data, and they are therefore highly exposed to confidentiality breaches. Despite these risks, encrypting video meta-data is not appropriate because of efficiency concerns. This paper proposes a character order preserving (COP)-transformation technique for the secure protection of video meta-data. The proposed technique prevents the recovery of the original meta information through meta transformation while allowing direct queries on the transformed data, significantly increasing both security and efficiency in video meta-data processing.
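
Why an order-preserving transformation still permits direct queries can be shown with a toy keyed, character-wise mapping in Python. Each character receives a secret code, and codes increase strictly with the character's ordinal, so transformed strings compare exactly like the originals. This is a hypothetical illustration of the order-preserving property only. It is not the paper's COP-transformation, and a plain substitution like this is not a secure scheme.

```python
import random
import string


def make_cop_table(key, alphabet=string.ascii_letters + string.digits):
    """Hypothetical keyed table: strictly increasing secret codes in character order."""
    rng = random.Random(key)
    table, code = {}, 0
    for ch in sorted(alphabet):
        code += rng.randint(1, 1000)  # a random positive gap keeps codes strictly increasing
        table[ch] = code
    return table


def cop_transform(text, table):
    """Map a string character by character; tuples compare lexicographically,
    so the order of transformed values matches the order of the original strings."""
    return tuple(table[ch] for ch in text)
```

Usage: a server holding only transformed meta-data can answer an equality or range query such as `cop_transform("cam1", t) <= v <= cop_transform("cam9", t)` by comparing transformed values directly, without decrypting anything.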

(This article belongs to the Special Issue Digital Signal, Image and Video Processing for Emerging Multimedia Technology)

Other

22 pages, 2179 KiB

Open Access Brief Report

Image Classification with Convolutional Neural Networks Using Gulf of Maine Humpback Whale Catalog

by Nuria Gómez Blas, Luis Fernando de Mingo López, Alberto Arteta Albert and Javier Martínez Llamas

Electronics 2020, 9(5), 731; https://doi.org/10.3390/electronics9050731 - 29 Apr 2020

Cited by 6 | Viewed by 4228

Abstract

Whale cataloging provides an opportunity to demonstrate the potential of bio-preservation for sustainable development, but it requires automatic identification models. This paper presents a study and implementation of a convolutional neural network that identifies and recognizes humpback whale specimens by processing their tail patterns. This work collects datasets of composed images of whale tails and then trains a neural network by analyzing and pre-processing the images with the TensorFlow and Keras frameworks. The paper focuses on an identification problem: each whale is a separate class, each whale was photographed multiple times, and the goal is to identify the whale class of each image in the testing set. Lower-cost alternatives are also introduced and discussed. The reported network is not necessarily the best in terms of accuracy; instead, the work tries to minimize resource usage through image downsampling and a small architecture, which is attractive for embedded systems.
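
The resource argument (downsampling plus a small architecture) can be made concrete with a NumPy sketch. It block-averages an image and counts the parameters of a hypothetical small CNN: "same"-padded 3x3 convolutions, each followed by 2x2 pooling, then a dense softmax layer. The layer sizes, input sizes, and class count are illustrative assumptions, not the paper's architecture. The sketch avoids a TensorFlow dependency, but it uses the same per-layer counts Keras reports for `Conv2D` and `Dense` layers.

```python
import numpy as np


def downsample(img, f):
    """Block-average downsampling by an integer factor f (crops to a multiple of f)."""
    h, w = (img.shape[0] // f) * f, (img.shape[1] // f) * f
    return img[:h, :w].reshape(h // f, f, w // f, f).mean(axis=(1, 3))


def small_cnn_params(h, w, n_classes, channels=(16, 32), k=3):
    """Parameter count of a hypothetical small CNN on single-channel input."""
    params, c_in = 0, 1
    for c_out in channels:
        params += k * k * c_in * c_out + c_out  # conv kernel weights + biases
        c_in = c_out
        h, w = h // 2, w // 2                   # spatial size after 2x2 pooling
    return params + h * w * c_in * n_classes + n_classes  # dense layer weights + biases
```

For example, `small_cnn_params(64, 64, 10)` is 86,730, most of it in the dense layer, whose size grows with the square of the input resolution. This is why downsampling the tail images shrinks the model far more than trimming convolution channels would.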

(This article belongs to the Special Issue Digital Signal, Image and Video Processing for Emerging Multimedia Technology)
