New Technologies in Digital Media Processing: When Computer Vision Meets Natural Language Processing

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: closed (31 May 2023) | Viewed by 8100

Special Issue Editors

College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
Interests: computer vision; pattern recognition; machine learning
Computer School, Beijing Information Science and Technology University, Beijing, China
Interests: computer vision; computer graphics; virtual reality
Computer School, Beijing Information Science and Technology University, Beijing 100192, China
Interests: natural language processing; computer vision; computer graphics; virtual reality

Special Issue Information

Dear Colleagues,

With the rapid development of deep learning, computer vision (CV) methods have been applied widely and successfully in areas such as city security, autonomous driving, face recognition, computer-aided medical diagnosis, and remote sensing. Meanwhile, the central objective of natural language processing (NLP) is to understand word-based data at the level of semantics. The application scope of NLP therefore differs from that of conventional image processing, leaving a clear gap between the two fields, and the deep learning toolset of the NLP community has in some respects lagged behind that of CV. At the same time, growing evidence illustrates the value of mature deep learning solutions and multi-modality data fusion. We therefore believe that more research should consider how the CV community can benefit from progress in NLP. This Special Issue brings together researchers in both CV and NLP to share the latest research and technical progress on multi-modality applications, bridging the gap between these two research fields. We welcome all submissions that cover both CV and NLP.

Prof. Dr. Chenglizhao Chen
Dr. Wenfeng Song
Dr. Xia Hou
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website; once registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • natural language processing
  • machine learning

Published Papers (5 papers)


Research

17 pages, 5104 KiB  
Article
Video Saliency Object Detection with Motion Quality Compensation
by Hengsen Wang, Chenglizhao Chen, Linfeng Li and Chong Peng
Electronics 2023, 12(7), 1618; https://doi.org/10.3390/electronics12071618 - 30 Mar 2023
Cited by 1 | Viewed by 1083
Abstract
Video saliency object detection is one of the classic research problems in computer vision, yet existing works rarely focus on the impact of input quality on model performance. As optical flow is a key input for video saliency detection models, its quality significantly affects model performance. Traditional optical flow models only compute the flow between two consecutive video frames, ignoring the motion state of objects over a longer period of time; this yields low-quality optical flow and reduces the performance of video saliency object detection models. This paper therefore proposes a new optical flow model that improves flow quality by expanding the flow-perception range, and uses the resulting high-quality optical flow to enhance the performance of video saliency object detection models. Experimental results show that the proposed model significantly improves optical flow quality, with S-M values on the DAVSOD dataset increasing by about 39%, 49%, and 44% over optical flow models such as PWCNet, SpyNet, and LFNet. In addition, experiments fine-tuning the benchmark model LIMS demonstrate that improving input quality can further improve model performance.
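The core idea of widening the flow-perception range beyond two consecutive frames can be illustrated with a simplified sketch (this is a hypothetical toy, not the model from the paper): compose per-pair flows over a window of frames by warping each subsequent flow field along the motion accumulated so far.

```python
import numpy as np

def accumulate_flow(pairwise_flows):
    """Compose consecutive pairwise flows into a longer-range flow field.

    pairwise_flows: list of (H, W, 2) arrays; pairwise_flows[t] maps
    frame t -> t+1. Returns an (H, W, 2) flow mapping frame 0 -> frame T,
    a crude stand-in for a wider flow-perception range.
    """
    h, w, _ = pairwise_flows[0].shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    total = pairwise_flows[0].copy()
    for flow in pairwise_flows[1:]:
        # Sample the next flow at positions displaced by the accumulated
        # flow (nearest-neighbour warp keeps the sketch dependency-free).
        xq = np.clip(np.rint(xs + total[..., 0]), 0, w - 1).astype(int)
        yq = np.clip(np.rint(ys + total[..., 1]), 0, h - 1).astype(int)
        total += flow[yq, xq]
    return total
```

For an object moving one pixel per frame over three frames, the accumulated field reports a three-pixel displacement, which captures motion that any single frame pair would miss.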

18 pages, 21778 KiB  
Article
Semi-Supervised Portrait Matting via the Collaboration of Teacher–Student Network and Adaptive Strategies
by Xinyue Zhang, Guodong Wang, Chenglizhao Chen, Hao Dong and Mingju Shao
Electronics 2022, 11(24), 4080; https://doi.org/10.3390/electronics11244080 - 08 Dec 2022
Cited by 1 | Viewed by 1044
Abstract
In the portrait matting domain, existing methods rely entirely on annotated images for learning. However, delicate manual annotation is time-consuming, and few finely annotated datasets are available. To reduce complete dependence on labeled datasets, we design a semi-supervised network (ASSN) with two kinds of innovative adaptive strategies for portrait matting. Three pivotal sub-modules are embedded in our architecture: a static teacher network (S-TN), a static student network (S-SN), and an adaptive student network (A-SN). S-TN and S-SN are trained on a small amount of high-quality labeled data, and A-SN shares its parameters with S-SN. When processing unlabeled data, A-SN adopts our adaptive strategies to discard the dependence on labels. The adaptive strategies include: (i) an auxiliary adaptation, in which the more elaborate teacher network not only provides alpha mattes for the adaptive student network but also transmits rough segmentation results and edge graphs as optimization references; and (ii) a self-adjusting adaptation, in which the adaptive network supervises itself according to the characteristics of different layers. In addition, we have produced a finely annotated dataset for scholars in the field. Compared with existing datasets, ours complements two types of data neglected previously: (i) images containing multiple people and (ii) images under low-light conditions.
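The teacher-to-student supervision on unlabeled data can be sketched in miniature (a toy stand-in for the S-TN → A-SN flow described above, not the authors' architecture): the teacher's prediction on an unlabeled image serves as the regression target for one gradient step on the student.

```python
import numpy as np

def predict(weights, x):
    # Toy stand-in for a matting network: a per-pixel linear map
    # squashed to [0, 1], the valid range of an alpha matte.
    return 1.0 / (1.0 + np.exp(-(x * weights[0] + weights[1])))

def pseudo_label_step(teacher_w, student_w, unlabeled_x, lr=0.5):
    """One semi-supervised step: the teacher's prediction on unlabeled
    data is the mean-squared-error target for the student.
    Returns the updated student weights and the current loss."""
    target = predict(teacher_w, unlabeled_x)   # pseudo alpha matte
    pred = predict(student_w, unlabeled_x)
    err = pred - target
    # Gradient of the MSE through the sigmoid output.
    grad_logit = 2 * err * pred * (1 - pred) / err.size
    grad_w = np.array([np.sum(grad_logit * unlabeled_x), np.sum(grad_logit)])
    return student_w - lr * grad_w, float(np.mean(err ** 2))
```

Iterating this step drives the student toward the teacher's predictions without any ground-truth mattes; the paper's auxiliary signals (rough segmentations, edge graphs) would enter as additional loss terms.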

14 pages, 19358 KiB  
Article
RefinePose: Towards More Refined Human Pose Estimation
by Hao Dong, Guodong Wang, Chenglizhao Chen and Xinyue Zhang
Electronics 2022, 11(23), 4060; https://doi.org/10.3390/electronics11234060 - 06 Dec 2022
Cited by 2 | Viewed by 2400
Abstract
Human pose estimation is an important research topic in computer vision that attracts increasing attention. Recently, ViTPose, based on heatmap representation, set a new state of the art among pose estimation methods. However, we find that ViTPose still has room for improvement. On the one hand, the PatchEmbedding module of ViTPose uses a convolutional layer with a 14 × 14 stride to downsample the input image, losing a significant amount of feature information. On the other hand, the two decoders used by ViTPose (the Classical Decoder and the Simple Decoder) are not refined enough: transposed convolution in the Classical Decoder produces the inherent checkerboard effect, while the upsampling factor in the Simple Decoder is too large, resulting in blurry heatmaps. To this end, we propose a novel pose estimation method based on ViTPose, termed RefinePose, in which we design the GradualEmbedding module and the Fusion Decoder to address these problems. More specifically, the GradualEmbedding module downsamples the image to only 1/2 of its size in each stage, reaching the fixed target size (16 × 112 in ViTPose) through multiple downsampling stages, and fuses the outputs of max pooling layers and convolutional layers in each stage, retaining more meaningful feature information. In the decoding stage, our Fusion Decoder combines bilinear interpolation with max unpooling layers and gradually upsamples the feature maps to restore the predicted heatmap. We also design the FeatureAggregation module to aggregate features after sampling (upsampling and downsampling). We validate RefinePose on the COCO dataset; experiments show that it achieves better performance than ViTPose.
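The gradual-embedding idea can be sketched as follows (an illustrative simplification, not the paper's module: a 2 × 2 average stands in for the learned strided convolution, and the branches are fused by simple addition):

```python
import numpy as np

def halve(x):
    """One GradualEmbedding-style stage: downsample by 2 with both a
    max-pooling branch and a 2x2 averaging branch (a placeholder for a
    learned strided convolution), then fuse the branches by addition."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    blocks = x[:h, :w].reshape(h // 2, 2, w // 2, 2)
    max_branch = blocks.max(axis=(1, 3))
    conv_branch = blocks.mean(axis=(1, 3))
    return max_branch + conv_branch

def gradual_embed(img, stages=4):
    """Reduce the input by 1/2 per stage instead of in one large-stride
    step, so fine detail is merged in gradually rather than discarded."""
    for _ in range(stages):
        img = halve(img)
    return img
```

Four such stages give an overall 16× reduction, matching a single stride-16 patch embedding in output size while mixing pooled and convolved features at every scale on the way down.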

12 pages, 436 KiB  
Article
A Novel Knowledge Base Question Answering Method Based on Graph Convolutional Network and Optimized Search Space
by Xia Hou, Jintao Luo, Junzhe Li, Liangguo Wang and Hongbo Yang
Electronics 2022, 11(23), 3897; https://doi.org/10.3390/electronics11233897 - 25 Nov 2022
Cited by 3 | Viewed by 1146
Abstract
Knowledge base question answering (KBQA) aims to answer natural language questions from information in a knowledge base. Although many methods perform well on simple questions, two challenges remain for complex questions: a huge search space, and the loss of structural information from the query graphs. To address these problems, we propose a novel KBQA method based on a graph convolutional network and an optimized search space. When generating query graphs, we rank them by both their semantic and structural similarity to the question and keep only the top k for the next step. In this process, we extract the structural information of the query graphs with a graph convolutional network while extracting semantic information with a pre-trained model, enhancing the method's ability to understand complex questions. We also introduce a constraint function to optimize the search space, and use the beam search algorithm to reduce it further. Experiments on the WebQuestionsSP dataset demonstrate that our method outperforms several baseline methods, showing that the structural information of the query graph has a significant impact on the KBQA task.
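The top-k pruning loop at the heart of this approach is ordinary beam search; a generic sketch (the `expand` and `score` functions here are placeholders, where in the paper's setting `score` would combine GCN-based structural similarity with semantic similarity from a pre-trained model):

```python
import heapq

def beam_search(start, expand, score, beam_width=3, depth=2):
    """Generic beam search: at each step keep only the top-k candidates
    by score, pruning the rest of the search space.

    expand(graph) -> iterable of extended candidate query graphs
    score(graph)  -> higher is better, e.g. a weighted sum of semantic
                     and structural similarity to the question
    """
    beam = [start]
    for _ in range(depth):
        candidates = [g for b in beam for g in expand(b)]
        if not candidates:
            break
        beam = heapq.nlargest(beam_width, candidates, key=score)
    return beam
```

With a branching factor b and depth d, this caps the frontier at `beam_width` candidates per step instead of the b^d graphs an exhaustive search would enumerate, which is the search-space reduction the abstract refers to.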

17 pages, 8393 KiB  
Article
Railway Obstacle Intrusion Detection Based on Convolution Neural Network Multitask Learning
by Haixia Pan, Yanan Li, Hongqiang Wang and Xiaomeng Tian
Electronics 2022, 11(17), 2697; https://doi.org/10.3390/electronics11172697 - 28 Aug 2022
Cited by 10 | Viewed by 1821
Abstract
The detection of obstacle intrusion is very important for the safe running of trains. In this paper, we design a multitask intrusion detection model to warn of detected obstacles intruding into railway scenes. In addition, we design a multiobjective optimization algorithm that handles tasks of differing complexity. Through a shared structure-reparameterized backbone network, our multitask learning model uses resources effectively. Our work achieves competitive results on both object detection and line detection, with excellent inference-time performance (50 FPS), and is the first to introduce a multitask approach to realize an assisted-driving function in a railway scene.
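Balancing tasks of differing complexity in a shared-backbone model is commonly done by weighting each task loss by a learned uncertainty term; the sketch below shows that standard scheme (Kendall-style homoscedastic weighting) as one plausible formulation, not necessarily the paper's optimization algorithm.

```python
import math

def multitask_loss(losses, log_vars):
    """Combine per-task losses with learned uncertainty weights.

    losses:   list of scalar task losses (e.g. object detection and
              line detection)
    log_vars: list of log-variance parameters, one per task, learned
              jointly with the network
    """
    total = 0.0
    for loss, log_var in zip(losses, log_vars):
        precision = math.exp(-log_var)
        # High-uncertainty (hard) tasks are down-weighted; the +log_var
        # term keeps the variances from growing without bound.
        total += precision * loss + log_var
    return total
```

Raising a task's log-variance shrinks its contribution to the total, so the optimizer can trade off an easy task against a hard one instead of letting the largest raw loss dominate the shared backbone.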
