Digital Signal, Image and Video Processing for Emerging Multimedia Technology, Volume II

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: closed (31 August 2022) | Viewed by 27615

Special Issue Information

Dear Colleagues,

Recent developments in image/video-based deep learning technology have enabled new services in the field of multimedia and recognition technology. The technologies underlying the development of these recognition and emerging services are based on essential signal and image processing algorithms. In addition, the recent realistic media services, mixed reality, augmented reality and virtual reality media services also require very high-definition media creation, personalization, and transmission technologies, and this demand continues to grow. To accommodate these needs, international standardization and industry are studying various digital signal and image processing technologies to provide a variety of new or future media services.While this Special Issue invites topics broadly across the advanced signal, image and video processing algorithms and technologies for emerging multimedia services, some specific topics include, but are not limited to:

  • Signal/image/video processing algorithms for advanced machine learning
  • Fast, complexity-reducing mechanisms to support real-time systems
  • Technologies for protecting privacy and personalized information
  • Advanced circuit and system design and implementation for emerging multimedia services
  • Image/video-based recognition algorithms using deep neural networks
  • Novel applications for emerging multimedia services
  • Efficient media sharing schemes in distributed environments

Prof. Dr. Byung-Gyu Kim
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Emerging multimedia
  • Signal/image/video processing
  • Real-time systems
  • Advanced machine learning
  • Image/video-based deep learning

Published Papers (10 papers)


Research

14 pages, 3414 KiB  
Article
Speech Emotion Recognition Based on Parallel CNN-Attention Networks with Multi-Fold Data Augmentation
by John Lorenzo Bautista, Yun Kyung Lee and Hyun Soon Shin
Electronics 2022, 11(23), 3935; https://doi.org/10.3390/electronics11233935 - 28 Nov 2022
Cited by 13 | Viewed by 2262
Abstract
In this paper, an automatic speech emotion recognition (SER) task of classifying eight different emotions was carried out using parallel networks trained on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). A combination of a CNN-based network and attention-based networks, running in parallel, was used to model both spatial and temporal feature representations. Multiple augmentation techniques using additive white Gaussian noise (AWGN), SpecAugment, room impulse response (RIR), and tanh distortion were used to augment the training data and further generalize the model representation. Raw audio data were transformed into Mel-spectrograms as the model's input. Drawing on CNNs' proven capability in image classification and spatial feature representation, each spectrogram was treated as an image with the height and width represented by the spectrogram's time and frequency scales. Temporal feature representations were captured by attention-based models: Transformer and BLSTM-Attention modules. The proposed parallel architectures, combining CNN-based networks with Transformer or BLSTM-Attention modules, were compared with standalone CNN architectures and attention-based networks, as well as with hybrid architectures in which CNN layers wrapped in time-distributed wrappers were stacked on attention-based networks. In these experiments, the highest accuracies of 89.33% for the parallel CNN-Transformer network and 85.67% for the parallel CNN-BLSTM-Attention network were achieved on a 10% hold-out test set from the dataset. These networks showed promising results based on their accuracies, while keeping significantly fewer training parameters than non-parallel hybrid models.
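
As a concrete illustration of the input pipeline described above, the following is a minimal sketch of Mel-spectrogram extraction with AWGN augmentation, assuming the librosa library; the sampling rate, SNR, and Mel-band count are illustrative values, not the authors' settings.

```python
# Sketch: Mel-spectrogram input with AWGN augmentation (one of the four
# augmentations named above). Values such as sr=22050, snr_db=20, and
# n_mels=128 are illustrative assumptions.
import numpy as np
import librosa

def awgn(signal: np.ndarray, snr_db: float = 20.0) -> np.ndarray:
    """Add white Gaussian noise at a target signal-to-noise ratio (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), signal.shape)
    return signal + noise

def to_mel_spectrogram(path: str, sr: int = 22050, n_mels: int = 128) -> np.ndarray:
    """Load audio, augment with AWGN, and return a log-Mel 'image'."""
    y, sr = librosa.load(path, sr=sr)
    y = awgn(y)                                    # augmentation step
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)    # 2D array fed to the CNN
```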

13 pages, 1086 KiB  
Article
An Adjacency Encoding Information-Based Fast Affine Motion Estimation Method for Versatile Video Coding
by Ximei Li, Jun He, Qi Li and Xingru Chen
Electronics 2022, 11(21), 3429; https://doi.org/10.3390/electronics11213429 - 23 Oct 2022
Cited by 3 | Viewed by 1282
Abstract
Versatile video coding (VVC), the new-generation video coding standard, achieves significant improvements over high efficiency video coding (HEVC) due to its newly added coding tools. Although the affine motion estimation adopted in VVC accounts for the translational, rotational, and scaling motions of objects to improve the accuracy of inter prediction, this technique adds high computational complexity, making VVC unsuitable for real-time applications. To address this issue, an adjacency encoding information-based fast affine motion estimation method for VVC is proposed in this paper. First, we measure the probability of the affine mode being used in inter prediction. We then analyze the trade-off between computational complexity and performance improvement based on this statistical information. Finally, by exploiting the mutual exclusivity between the skip and affine modes, an enhanced method is proposed to reduce inter-prediction complexity. Experimental results show that, compared with the original VVC encoder, the proposed low-complexity method achieves a 10.11% reduction in total encoding time and a 40.85% time saving in affine motion estimation, with a 0.16% Bjøntegaard delta bitrate (BDBR) increase.
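
The skip/affine mutual exclusivity lends itself to a simple gating rule. Below is a hedged sketch of such a decision function; the CU container, mode names, and both heuristics are illustrative assumptions rather than the authors' implementation.

```python
# Sketch of a skip/affine gating rule. The CU structure, mode names, and
# both conditions are illustrative assumptions, not the authors' code.
from dataclasses import dataclass, field

@dataclass
class CU:
    best_mode_so_far: str                     # e.g. "skip", "merge", "inter"
    neighbor_modes: list = field(default_factory=list)

def should_run_affine_me(cu: CU) -> bool:
    # Skip and affine are (near-)mutually exclusive: if skip is already
    # the best candidate, bypass the costly affine search.
    if cu.best_mode_so_far == "skip":
        return False
    # If no adjacent CU chose affine, the prior for affine here is low.
    if cu.neighbor_modes and "affine" not in cu.neighbor_modes:
        return False
    return True

print(should_run_affine_me(CU("merge", ["affine", "inter"])))  # True
```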

17 pages, 4815 KiB  
Article
Assessment of Compressed and Decompressed ECG Databases for Telecardiology Applying a Convolution Neural Network
by Ekta Soni, Arpita Nagpal, Puneet Garg and Plácido Rogerio Pinheiro
Electronics 2022, 11(17), 2708; https://doi.org/10.3390/electronics11172708 - 29 Aug 2022
Cited by 12 | Viewed by 1639
Abstract
The overwhelming number of hospital patients during COVID-19 made the screening of heart patients arduous. Patients who need regular heart monitoring were affected the most. Telecardiology is used for regular remote heart monitoring of such patients. However, the huge volume of electrocardiogram (ECG) data generated by regular monitoring strains available storage space and transmission bandwidth. These signals take less space if stored or sent in compressed form, and they are decompressed at the receiver end for recovery. We combined telecardiology with automatic ECG arrhythmia classification using a CNN and propose an algorithm named TELecardiology using a Deep Convolution Neural Network (TELDCNN). Discrete cosine transform (DCT), 16-bit quantization, and run-length encoding (RLE) were used for compression, and a convolution neural network (CNN) was applied for classification. The database was formed by combining real-time signals (taken from a designed ECG device) with an online database from Physionet. Four kinds of databases were considered and classified. The attained compression ratio was 2.56, and the classification accuracies for the compressed and decompressed databases were 0.966 and 0.990, respectively. Comparing the classification performance of the compressed and decompressed databases shows that decompressed signals allow arrhythmias to be classified more accurately than their compressed-only form, although at the cost of increased computational time.
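
The compression chain (DCT, 16-bit quantization, RLE) can be sketched as follows; the global (non-blocked) DCT, the scaling scheme, and the (value, run) layout are illustrative assumptions, not the exact TELDCNN pre-processing.

```python
# Sketch of the DCT -> 16-bit quantization -> RLE chain and its inverse.
import numpy as np
from scipy.fft import dct, idct

def compress(ecg: np.ndarray):
    coeffs = dct(ecg.astype(float), norm="ortho")
    scale = max(np.abs(coeffs).max() / 32767.0, 1e-12)
    q = np.round(coeffs / scale).astype(np.int16)   # 16-bit quantization
    runs, prev, count = [], q[0], 1                 # run-length encoding
    for v in q[1:]:
        if v == prev:
            count += 1
        else:
            runs.append((prev, count))
            prev, count = v, 1
    runs.append((prev, count))
    return runs, scale

def decompress(runs, scale):
    q = np.concatenate([np.full(c, v, dtype=np.int16) for v, c in runs])
    return idct(q.astype(float) * scale, norm="ortho")
```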

15 pages, 824 KiB  
Article
Aspect-Based Sentiment Analysis in Hindi Language by Ensembling Pre-Trained mBERT Models
by Abhilash Pathak, Sudhanshu Kumar, Partha Pratim Roy and Byung-Gyu Kim
Electronics 2021, 10(21), 2641; https://doi.org/10.3390/electronics10212641 - 28 Oct 2021
Cited by 13 | Viewed by 3130
Abstract
Sentiment analysis is becoming an essential task for academia as well as for commercial companies. However, most current approaches only identify the overall polarity of a sentence, rather than the polarity of each aspect mentioned in it. Aspect-Based Sentiment Analysis (ABSA) identifies the aspects within a given sentence and the sentiment expressed for each aspect. Recently, the use of pre-trained models such as BERT has achieved state-of-the-art results in natural language processing. In this paper, we propose two ensemble models based on multilingual BERT, namely, mBERT-E-MV and mBERT-E-AS. Using different methods, we construct an auxiliary sentence from the aspect and convert the ABSA problem into a sentence-pair classification task. We then fine-tune different pre-trained BERT models and ensemble them for a final prediction; with the proposed models, we achieve new state-of-the-art results on datasets from different domains in the Hindi language.
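
A minimal sketch of the two core ideas, auxiliary-sentence construction and majority-vote ensembling (the "MV" variant), is shown below; the question template and the label strings are assumptions for illustration.

```python
# Sketch: auxiliary-sentence construction plus majority-vote ensembling.
from collections import Counter

def auxiliary_sentence(aspect: str) -> str:
    # ABSA becomes sentence-pair classification: (review, auxiliary sentence).
    return f"What do you think of the {aspect}?"

def ensemble_majority_vote(predictions: list) -> list:
    """predictions[m][i] = label of model m on example i."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*predictions)]

preds = [["positive", "negative"],      # fine-tuned model 1
         ["positive", "neutral"],       # fine-tuned model 2
         ["negative", "negative"]]      # fine-tuned model 3
print(ensemble_majority_vote(preds))    # ['positive', 'negative']
```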

12 pages, 2994 KiB  
Article
Context-Based Inter Mode Decision Method for Fast Affine Prediction in Versatile Video Coding
by Seongwon Jung and Dongsan Jun
Electronics 2021, 10(11), 1243; https://doi.org/10.3390/electronics10111243 - 24 May 2021
Cited by 14 | Viewed by 2235
Abstract
Versatile Video Coding (VVC) is the most recent video coding standard, developed by the Joint Video Experts Team (JVET); it achieves a bit-rate reduction of 50% at perceptually similar quality compared with its predecessor, High Efficiency Video Coding (HEVC). Although VVC provides significantly better coding performance, it entails a tremendous increase in the computational complexity of the encoder. In particular, VVC newly adopted an affine motion estimation (AME) method to overcome the limitations of the translational motion model, at the expense of higher encoding complexity. In this paper, we propose a context-based inter mode decision method for fast affine prediction that determines whether AME is performed during rate-distortion (RD) optimization for the optimal CU-mode decision. Experimental results show that the proposed method reduces the encoding complexity of AME by up to 33% with unnoticeable coding loss compared with the VVC Test Model (VTM).
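
Structurally, the proposed decision amounts to gating the AME step inside the CU-mode RD loop. The sketch below illustrates that placement under stated assumptions: rd_cost is a hypothetical callable, and the mode list is illustrative rather than the VTM's actual candidate set.

```python
# Sketch of the CU-mode RD loop with a context-gated AME step.
def best_inter_mode(cu, rd_cost, affine_unlikely: bool):
    best, best_cost = None, float("inf")
    for mode in ("merge", "skip", "translational"):
        cost = rd_cost(cu, mode)
        if cost < best_cost:
            best, best_cost = mode, cost
    # AME is evaluated only when context does not rule it out, removing
    # its large search cost from the common path.
    if not affine_unlikely:
        cost = rd_cost(cu, "affine")
        if cost < best_cost:
            best, best_cost = "affine", cost
    return best, best_cost

toy_cost = lambda cu, m: {"skip": 1.0, "affine": 0.5}.get(m, 2.0)
print(best_inter_mode(None, toy_cost, affine_unlikely=True))   # ('skip', 1.0)
```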

12 pages, 8519 KiB  
Article
Two-Dimensional Audio Compression Method Using Video Coding Schemes
by Seonjae Kim, Dongsan Jun, Byung-Gyu Kim, Seungkwon Beack, Misuk Lee and Taejin Lee
Electronics 2021, 10(9), 1094; https://doi.org/10.3390/electronics10091094 - 6 May 2021
Cited by 2 | Viewed by 2182
Abstract
As video compression is one of the core technologies enabling seamless media streaming within the available network bandwidth, it is crucial that media codecs deliver powerful coding performance and high visual quality. Versatile Video Coding (VVC), the latest video coding standard developed by the Joint Video Experts Team (JVET), can compress original image or video data by factors of several hundred, whereas the latest audio coding standard, Unified Speech and Audio Coding (USAC), achieves a compression ratio of only about 20 for audio or speech data. In this paper, we propose a pre-processing method that generates a two-dimensional (2D) audio signal as the input of a VVC encoder, and we investigate the applicability of video coding schemes to 2D audio compression. To evaluate the coding performance, we measure both the signal-to-noise ratio (SNR) and bits per sample (bps). The experimental results demonstrate the feasibility of 2D audio encoding using video coding schemes.
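
The pre-processing step, packing a 1D audio signal into 2D frames that a video encoder can treat as pictures, might look like the following sketch; the 256x256 frame size and the 8-bit min-max normalization are assumptions, as the paper's exact mapping is not reproduced here.

```python
# Sketch: pack a 1D audio signal into 2D 8-bit frames for a video encoder.
import numpy as np

def audio_to_frames(samples: np.ndarray, width: int = 256, height: int = 256) -> np.ndarray:
    x = samples.astype(np.float64)
    x = (x - x.min()) / (x.max() - x.min() + 1e-12)   # normalize to [0, 1]
    x = np.round(x * 255).astype(np.uint8)            # 8-bit "pixel" values
    per_frame = width * height
    n_frames = int(np.ceil(x.size / per_frame))
    x = np.pad(x, (0, n_frames * per_frame - x.size)) # zero-pad last frame
    return x.reshape(n_frames, height, width)         # picture sequence

frames = audio_to_frames(np.sin(np.linspace(0, 1000, 200_000)))
print(frames.shape)   # (4, 256, 256)
```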

17 pages, 9369 KiB  
Article
New Image Encryption Algorithm Using Hyperchaotic System and Fibonacci Q-Matrix
by Khalid M. Hosny, Sara T. Kamal, Mohamed M. Darwish and George A. Papakostas
Electronics 2021, 10(9), 1066; https://doi.org/10.3390/electronics10091066 - 30 Apr 2021
Cited by 64 | Viewed by 4476
Abstract
In the age of information technology, daily life requires the transmission of millions of images between users. Securing these images is essential. Digital image encryption is a well-known technique for securing image content. In image encryption, digital images are converted into noise images using secret keys, and restoring them to their originals requires the same keys. Most image encryption techniques depend on two steps: confusion and diffusion. In this work, a new image encryption algorithm using a hyperchaotic system and the Fibonacci Q-matrix is presented. In this algorithm, the original image is confused using random numbers generated by a six-dimensional hyperchaotic system; the permutated image is then diffused using the Fibonacci Q-matrix. The proposed algorithm was tested against noise and data-cut attacks and evaluated in terms of histograms, keyspace, and sensitivity. Moreover, its performance was compared with several existing algorithms using entropy, correlation coefficients, and robustness against attack. The proposed algorithm achieves an excellent security level and outperforms the existing image encryption algorithms.
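
A minimal sketch of the Fibonacci Q-matrix diffusion step is given below; the matrix power n=10 is an illustrative choice (an even power has determinant 1, so an inverse modulo 256 exists for decryption), and the hyperchaotic confusion step is omitted.

```python
# Sketch of the Fibonacci Q-matrix diffusion on pixel pairs modulo 256.
# Assumes an image with an even number of pixels.
import numpy as np

Q = np.array([[1, 1], [1, 0]], dtype=np.int64)

def q_power(n: int) -> np.ndarray:
    out = np.eye(2, dtype=np.int64)
    for _ in range(n):
        out = out @ Q
    return out

def diffuse(image: np.ndarray, n: int = 10) -> np.ndarray:
    qn = q_power(n) % 256                            # e.g. [[89, 55], [55, 34]]
    pairs = image.astype(np.int64).reshape(-1, 2)    # consecutive pixel pairs
    return ((pairs @ qn.T) % 256).astype(np.uint8).reshape(image.shape)
```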

12 pages, 2359 KiB  
Article
WMNet: A Lossless Watermarking Technique Using Deep Learning for Medical Image Authentication
by Yueh-Peng Chen, Tzuo-Yau Fan and Her-Chang Chao
Electronics 2021, 10(8), 932; https://doi.org/10.3390/electronics10080932 - 14 Apr 2021
Cited by 11 | Viewed by 2749
Abstract
Traditional watermarking techniques extract the watermark from a suspected image, allowing the copyright information regarding the image owner to be identified by the naked eye or by similarity estimation methods such as bit error rate and normalized correlation. However, this process should be more objective. In this paper, we implemented a model based on deep learning technology, called WMNet, that can accurately identify watermark copyright. Establishing deep learning models has traditionally required collecting a large amount of training data; in constructing WMNet, we instead implemented a simulated process to generate a large number of distorted watermarks and collected them to form a training dataset. However, not all watermarks in the training dataset could properly provide copyright information. Therefore, according to the set restrictions, we divided the watermarks in the training dataset into two categories; consequently, WMNet could learn to identify the copyright information that the watermarks contained, so as to assist in the copyright verification process. Even if the retrieved watermark information is incomplete, the copyright information it contains can still be interpreted objectively and accurately. The results show that the proposed method is effective.
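
The simulated distortion process for building the training dataset can be sketched as follows; the random bit-flip model and the flip rate are illustrative assumptions, not the distortions used by the authors.

```python
# Sketch of training-data generation: simulate many distorted copies of a
# binary watermark via random bit flips.
import numpy as np

def distorted_copies(watermark: np.ndarray, n: int, flip_rate: float = 0.2) -> np.ndarray:
    """watermark: binary {0,1} array; returns n noisy copies."""
    copies = np.repeat(watermark[None, ...], n, axis=0)
    flips = np.random.random(copies.shape) < flip_rate
    return np.where(flips, 1 - copies, copies)

wm = (np.random.random((32, 32)) > 0.5).astype(np.uint8)
train = distorted_copies(wm, n=1000)
print(train.shape)   # (1000, 32, 32)
```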

16 pages, 3560 KiB  
Article
Cross-Modal Learning Based on Semantic Correlation and Multi-Task Learning for Text-Video Retrieval
by Xiaoyu Wu, Tiantian Wang and Shengjin Wang
Electronics 2020, 9(12), 2125; https://doi.org/10.3390/electronics9122125 - 11 Dec 2020
Cited by 5 | Viewed by 2505
Abstract
Text-video retrieval faces a great challenge in bridging the semantic gap between cross-modal information. Some existing methods transform the text or video into the same subspace to measure their similarity. However, such methods do not impose a semantic-consistency constraint when associating the semantic encodings of the two modalities, and the association results are poor. In this paper, we propose a multi-modal retrieval algorithm based on semantic association and multi-task learning. First, multi-level features of the video or text are extracted using multiple deep learning networks so that the information of both modalities is fully encoded. Then, in the common feature space into which the two modalities are mapped, we propose a multi-task learning framework combining a semantic similarity measurement task and a semantic-consistency classification task based on text-video features. The semantic-consistency classification task constrains the learning of the semantic association task, so multi-task learning guides better feature mapping of the two modalities and optimizes the construction of the unified feature subspace. Finally, the experimental results of our proposed algorithm on the Microsoft Video Description (MSVD) and MSR-Video to Text (MSR-VTT) datasets surpass existing work, proving that our algorithm can improve the performance of cross-modal retrieval.
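
The two-task objective can be sketched as a weighted sum of a similarity loss and a consistency-classification loss. The PyTorch snippet below is a minimal illustration; the specific loss functions and the weighting factor alpha are assumptions, not the paper's exact formulation.

```python
# Sketch of the two-task objective: a cosine similarity loss on matched
# text-video embeddings plus a semantic-consistency classification loss.
import torch
import torch.nn.functional as F

def multitask_loss(text_emb, video_emb, class_logits, class_labels, alpha=0.5):
    # Task 1: pull matched text/video embeddings together in the shared space.
    target = torch.ones(text_emb.size(0), device=text_emb.device)
    sim_loss = F.cosine_embedding_loss(text_emb, video_emb, target)
    # Task 2: classify semantic consistency, constraining the shared space.
    cls_loss = F.cross_entropy(class_logits, class_labels)
    return sim_loss + alpha * cls_loss
```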

22 pages, 3563 KiB  
Article
A Robust Forgery Detection Method for Copy–Move and Splicing Attacks in Images
by Mohammad Manzurul Islam, Gour Karmakar, Joarder Kamruzzaman and Manzur Murshed
Electronics 2020, 9(9), 1500; https://doi.org/10.3390/electronics9091500 - 12 Sep 2020
Cited by 14 | Viewed by 3678
Abstract
Internet of Things (IoT) image sensors, social media, and smartphones generate huge volumes of digital images every day. The easy availability and usability of photo-editing tools have made forgery attacks, primarily splicing and copy–move attacks, effortless, causing cybercrime to rise. While several models have been proposed in the literature for detecting these attacks, their robustness has not been investigated when (i) only a small number of tampered images are available for model building or (ii) images from IoT sensors are distorted by rotation or scaling caused by unwanted or unexpected changes in the sensors' physical set-up. Moreover, further improvement in detection accuracy is needed for real-world security management systems. To address these limitations, this paper proposes an innovative image forgery detection method based on the Discrete Cosine Transform (DCT) and the Local Binary Pattern (LBP), together with a new feature extraction method using the mean operator. First, images are divided into non-overlapping fixed-size blocks, and the 2D block DCT is applied to capture changes due to image forgery. Then, the LBP is applied to the magnitude of the DCT array to enhance forgery artifacts. Finally, the mean value of each cell position across all LBP blocks is computed, which yields a fixed number of features and a more computationally efficient method. Using a Support Vector Machine (SVM), the proposed method was extensively tested on four well-known, publicly available gray-scale and color image forgery datasets, and additionally on an IoT-based image forgery dataset that we built. Experimental results reveal the superiority of the proposed method over recent state-of-the-art methods in terms of widely used performance metrics and computational time, and demonstrate robustness against low availability of forged training samples.
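
The feature pipeline (block DCT, LBP on DCT magnitudes, per-cell mean across blocks) can be sketched as follows, assuming scipy and scikit-image; the block size and LBP parameters are illustrative, not the paper's settings.

```python
# Sketch of the feature pipeline: block DCT -> LBP of DCT magnitudes ->
# per-cell mean across blocks. The resulting fixed-length vector would
# feed the SVM classifier.
import numpy as np
from scipy.fft import dctn
from skimage.feature import local_binary_pattern

def forgery_features(gray: np.ndarray, block: int = 16) -> np.ndarray:
    cells = []
    h, w = gray.shape
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            d = dctn(gray[i:i + block, j:j + block].astype(float), norm="ortho")
            cells.append(local_binary_pattern(np.abs(d), P=8, R=1.0))
    # Mean of each cell position across all blocks: fixed-length features.
    return np.mean(np.stack(cells), axis=0).ravel()
```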
