Deep Learning for Computer Vision

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: closed (15 August 2023) | Viewed by 22422

Special Issue Editors


Guest Editor
School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250101, China
Interests: machine learning; data mining; computer vision

Guest Editor
College of Computer Science and Technology, Ocean University of China, Qingdao 266100, China
Interests: deep learning; computer vision

Guest Editor
School of Software, Shandong University, Jinan 250100, China
Interests: deep learning; representation learning; data mining

Guest Editor
School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
Interests: image analysis; computer vision; pattern recognition; deep learning

Guest Editor
College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
Interests: computer vision; deep learning; visual analytics

Special Issue Information

Dear Colleagues,

Deep learning has achieved significant success in many application areas, such as object recognition, recommender systems, and natural language processing. Owing to the vast number of images and videos available on the Internet, deep learning technologies have also been adopted in many computer vision tasks, such as image generation and enhancement as well as object detection and tracking, and have greatly advanced the development of the computer vision field.

This Special Issue aims to provide an academic platform to publish high-quality research papers on deep learning methods and their applications to computer vision, including (but not limited to) extended versions of the outstanding SDAI2022 (https://www.sdaai.org.cn/sdai2022) papers.

Potential topics of interest for this Special Issue include:

  • Deep learning theory.
  • Deep learning algorithms.
  • Neural architecture search.
  • Generative neural networks.
  • Deep reinforcement learning.
  • Object recognition.
  • Object detection.
  • Object tracking.
  • Image generation.
  • Super resolution.
  • Other deep learning applications for computer vision.

Prof. Dr. Xiushan Nie
Dr. Guoqiang Zhong
Dr. Yongshun Gong
Prof. Dr. Bin Fan
Dr. Xin Li
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning
  • computer vision
  • representation learning
  • object recognition
  • object detection
  • object tracking

Published Papers (14 papers)


Research

18 pages, 18605 KiB  
Article
A Lightweight Network Based on Improved YOLOv5s for Insulator Defect Detection
by Cong Liu, Wentao Yi, Min Liu, Yifeng Wang, Sheng Hu and Minghu Wu
Electronics 2023, 12(20), 4292; https://doi.org/10.3390/electronics12204292 - 17 Oct 2023
Viewed by 1047
Abstract
Insulators on transmission lines can be damaged to varying degrees by extreme weather conditions, which threatens the safe operation of the power system. To detect damaged insulators in time and meet the needs of real-time detection, this paper proposes a multi-defect, lightweight detection algorithm for insulators based on an improved YOLOv5s. To reduce the number of network parameters, we integrate the Ghost module and introduce C3Ghost as a replacement for the backbone network, yielding a more efficient detection model. Moreover, we add a new detection layer specifically designed for small objects and embed an attention mechanism into the network, significantly improving its detection capability for smaller insulators. Furthermore, we use the K-means++ algorithm to recluster the prior boxes and adopt Efficient IoU Loss as the new loss function, which matches and converges better on the insulator defect dataset we constructed. The experimental results demonstrate the effectiveness of the proposed algorithm: compared to the original algorithm, our model reduces the number of parameters by 41.1% while achieving an mAP@0.5 of 94.8% and a processing speed of 32.52 frames per second. These improvements make the algorithm well suited for practical insulator detection and enable its deployment on edge devices. Full article
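The K-means++ reclustering of prior boxes mentioned in this abstract can be sketched generically. The following is an illustrative implementation over (width, height) pairs using the 1 − IoU distance commonly used for YOLO-style anchor clustering; it is not the authors' code, and the data below is simulated.

```python
import numpy as np

def iou_dist(boxes, centers):
    # boxes: (N, 2) widths/heights; centers: (K, 2)
    # IoU of boxes anchored at a shared corner, as in YOLO anchor clustering
    inter = np.minimum(boxes[:, None, 0], centers[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], centers[None, :, 1])
    union = boxes[:, 0] * boxes[:, 1]
    union = union[:, None] + centers[None, :, 0] * centers[None, :, 1] - inter
    return 1.0 - inter / union          # (N, K) distance matrix

def kmeanspp_anchors(boxes, k, iters=50, seed=0):
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    centers = [boxes[rng.integers(len(boxes))]]
    while len(centers) < k:             # k-means++ seeding: prefer distant boxes
        d = iou_dist(boxes, np.array(centers)).min(axis=1)
        centers.append(boxes[rng.choice(len(boxes), p=d / d.sum())])
    centers = np.array(centers)
    for _ in range(iters):              # standard Lloyd refinement
        assign = iou_dist(boxes, centers).argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = boxes[assign == j].mean(axis=0)
    return centers[np.argsort(centers[:, 0] * centers[:, 1])]  # sorted by area
```

In practice one would feed the ground-truth box sizes of the defect dataset and request one anchor set per detection layer.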
(This article belongs to the Special Issue Deep Learning for Computer Vision)

16 pages, 9171 KiB  
Article
Pairwise Guided Multilayer Cross-Fusion Network for Bird Image Recognition
by Jingsheng Lei, Yao Jin, Liya Huang, Yuan Ji and Shengying Yang
Electronics 2023, 12(18), 3817; https://doi.org/10.3390/electronics12183817 - 09 Sep 2023
Viewed by 624
Abstract
Bird identification is the first step in collecting data on bird diversity and abundance, and it also supports research on bird distribution and population measurement. Most research has built end-to-end training models for bird detection via CNNs or attention-based models, but many perform unsatisfactorily in fine-grained bird recognition. Bird recognition is strongly affected by several factors, including the similar appearance of different subcategories, diverse bird postures, and interference from the background, such as tree branches and leaves. To tackle this challenge, we propose the Progressive Cross-Union Network (PC-Net) to capture more subtle parts with low-level attention maps. Based on cross-layer information exchange and pairwise learning, the proposed method uses two modules to improve feature representation and localization. First, it fuses low- and high-level information across layers, which enables the network to extract more comprehensive and discriminative features. Second, the network incorporates deep semantic localization to identify and enhance the most relevant regions in the images. In addition, the network is designed with a semantic guidance loss to improve its generalization to variable bird poses. PC-Net was evaluated on a widely used bird dataset (CUB-200-2011), which contains 200 bird subcategories. The results demonstrate that PC-Net achieved a recognition accuracy of 89.2%, outperforming mainstream methods in bird subcategory identification. We also achieved competitive results on two other datasets covering cars and airplanes. These results indicate that PC-Net improves the accuracy of diverse bird recognition, as well as other fine-grained recognition scenarios. Full article

17 pages, 5201 KiB  
Article
LPE-Unet: An Improved UNet Network Based on Perceptual Enhancement
by Suwei Wang, Chenxun Yuan and Caiming Zhang
Electronics 2023, 12(12), 2750; https://doi.org/10.3390/electronics12122750 - 20 Jun 2023
Cited by 1 | Viewed by 1073
Abstract
In Computed Tomography (CT) images of the coronary arteries, the segmentation of calcified plaques is extremely important for the examination, diagnosis, and treatment of coronary heart disease. However, the lesions are small, which brings two difficulties: class imbalance when computing the loss function, and the tendency of small-scale targets to lose detail during continuous downsampling, where blurred boundaries make segmentation accuracy less satisfactory. The segmentation of calcified plaques is therefore a very challenging task. To address these problems, we design a framework named LPE-UNet, which adopts an encoder–decoder structure similar to UNet. The framework includes two modules: a low-rank perception enhancement module and a noise filtering module. The low-rank perception enhancement module extracts multi-scale context features by enlarging the receptive field to aid target detection, and then uses an attention mechanism to filter out redundant features. The noise filtering module suppresses noise passed from shallow features to high-level features during multi-scale feature fusion: it computes a pixel-wise weight map of the low-level features and filters out useless and harmful information. To alleviate the class imbalance caused by small lesions, we train the network with mixed supervision using a weighted cross-entropy loss and a Dice loss. The proposed method was evaluated on a calcified plaque segmentation dataset, achieving an F1 score of 0.941, an IoU of 0.895, and a Dice coefficient of 0.944, which verifies the effectiveness and superiority of our approach for accurately segmenting calcified plaques. As there is currently no authoritative publicly available calcified plaque segmentation dataset, we have constructed a new dataset for coronary artery calcified plaque segmentation (Calcified Plaque Segmentation Dataset, CPS Dataset). Full article
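The mixed supervision described in this abstract, weighted cross-entropy plus Dice loss, is a standard recipe for imbalanced segmentation and can be sketched in a few lines of NumPy. The weight value and smoothing constant below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def mixed_loss(probs, target, pos_weight=10.0, eps=1e-6):
    """Weighted cross-entropy + Dice loss for binary segmentation.

    probs  : (H, W) predicted foreground probabilities in (0, 1)
    target : (H, W) binary ground-truth mask
    pos_weight up-weights the rare foreground pixels (illustrative value).
    """
    probs = np.clip(probs, eps, 1.0 - eps)
    # class-weighted binary cross-entropy: rare positives count more
    wce = -(pos_weight * target * np.log(probs)
            + (1.0 - target) * np.log(1.0 - probs)).mean()
    # soft Dice loss: overlap-based, so it is insensitive to class imbalance
    inter = (probs * target).sum()
    dice = 1.0 - (2.0 * inter + eps) / (probs.sum() + target.sum() + eps)
    return wce + dice
```

A deep learning framework's tensor ops would replace NumPy in training, but the arithmetic is the same.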

15 pages, 1539 KiB  
Article
Detection of Illegal Transactions of Cryptocurrency Based on Mutual Information
by Kewei Zhao, Guixin Dong and Dong Bian
Electronics 2023, 12(7), 1542; https://doi.org/10.3390/electronics12071542 - 24 Mar 2023
Cited by 3 | Viewed by 1753
Abstract
In recent times, there has been swift advancement in the field of cryptocurrency. Its advent has brought convenience and prosperity, but has also given rise to illicit and unlawful activities. Unlike classical currency, cryptocurrency conceals the identities of criminals while exposing their behavioral patterns, allowing us to determine whether a cryptocurrency transaction is legitimate by analyzing those patterns. Two issues arise when making this determination. One is that most cryptocurrency transactions comply with laws and regulations, and only a small portion are used for illegal activities, which creates a sample imbalance problem. The other is the excessive volume of data: some illegal transactions remain unknown, so the data set contains an abundance of unlabeled data. Accurately distinguishing which transactions among the plethora of cryptocurrency transactions are legitimate and which are illegal is therefore critical, and presents quite a difficult challenge. Consequently, this paper combines mutual information and self-supervised learning to create a self-supervised model based on mutual information, which is used to exploit the massive amount of unlabeled data in the data set. Simultaneously, by merging the conventional cross-entropy loss function with mutual information, a novel loss function is created and employed to address the sample imbalance in the data set. The F1-score results from our experiments demonstrate that the novel loss function improves the performance of cryptocurrency illegal behavior detection by four points compared with the traditional cross-entropy loss in the GCN method; the self-supervised network based on mutual information improves performance by three points compared with the original GCN method; and using both together improves performance by six points. Full article

21 pages, 9328 KiB  
Article
Stretching Deep Architectures: A Deep Learning Method without Back-Propagation Optimization
by Li-Na Wang, Yuchen Zheng, Hongxu Wei, Junyu Dong and Guoqiang Zhong
Electronics 2023, 12(7), 1537; https://doi.org/10.3390/electronics12071537 - 24 Mar 2023
Viewed by 1164
Abstract
In recent years, researchers have proposed many deep learning algorithms for data representation learning. However, most deep networks require extensive training data and long training times to obtain good results. In this paper, we propose a novel deep learning method based on stretching deep architectures composed of stacked feature learning models; hence, the method is called "stretching deep architectures" (SDA). In the feedforward propagation of SDA, feature learning models are first stacked and learned layer by layer, and then a stretching technique is applied to map the last layer of features to a high-dimensional space. Since the feature learning models are optimized effectively and the stretching technique can be computed easily, the training of SDA is very fast. More importantly, the learning of SDA does not need back-propagation optimization, which is quite different from most existing deep learning models. We have tested SDA on visual texture perception, handwritten text recognition, and natural image classification applications. Extensive experiments demonstrate the advantages of SDA over traditional feature learning models and related deep learning models. Full article

15 pages, 11506 KiB  
Article
Attention-Oriented Deep Multi-Task Hash Learning
by Letian Wang, Ziyu Meng, Fei Dong, Xiao Yang, Xiaoming Xi and Xiushan Nie
Electronics 2023, 12(5), 1226; https://doi.org/10.3390/electronics12051226 - 04 Mar 2023
Viewed by 1107
Abstract
Hashing is widely used in large-scale image retrieval because it is an efficient approach to approximate nearest neighbor search: it compresses complex high-dimensional vectors via binarization while maintaining the semantic properties of the original samples. Currently, most existing hashing methods fix the hash code length before training the model, so whenever changing task requirements demand a different code length, these methods inevitably incur additional computing time; moreover, a single hash code fails to fully reflect semantic relevance. To solve these issues, we put forward an attention-oriented deep multi-task hash learning (ADMTH) method, in which multiple hash codes of varying lengths can be learned simultaneously. Compared with existing methods, ADMTH is one of the first attempts to apply multi-task learning theory to the deep hashing framework to generate and explore multi-length hash codes. Meanwhile, it embeds an attention mechanism in the backbone network to further extract discriminative information. We demonstrate its effectiveness on two commonly used large-scale datasets. The proposed method substantially improves retrieval efficiency while assuring image characterization quality. Full article
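The core idea of multi-task hashing, one shared representation feeding several heads that each emit a hash code of a different length, can be illustrated with a toy sketch. Random projections stand in for the learned backbone and heads here; all names are hypothetical, and this is not the ADMTH implementation.

```python
import numpy as np

class MultiLengthHasher:
    """Toy stand-in for multi-task hashing: one shared feature vector,
    several heads emitting binary codes of different lengths."""

    def __init__(self, feat_dim, code_lengths, seed=0):
        rng = np.random.default_rng(seed)
        # one projection head per requested code length
        self.heads = {L: rng.standard_normal((feat_dim, L))
                      for L in code_lengths}

    def encode(self, features):
        # sign-binarize each head's projection -> {0, 1} codes
        return {L: (features @ W > 0).astype(np.uint8)
                for L, W in self.heads.items()}

# usage: one forward pass yields 16-, 32-, and 64-bit codes at once
hasher = MultiLengthHasher(feat_dim=128, code_lengths=(16, 32, 64))
codes = hasher.encode(np.random.default_rng(1).standard_normal((5, 128)))
```

In the real method the heads are trained jointly under a multi-task objective rather than drawn at random.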

17 pages, 8696 KiB  
Article
Camouflaged Insect Segmentation Using a Progressive Refinement Network
by Jing Wang, Minglin Hong, Xia Hu, Xiaolin Li, Shiguo Huang, Rong Wang and Feiping Zhang
Electronics 2023, 12(4), 804; https://doi.org/10.3390/electronics12040804 - 06 Feb 2023
Cited by 1 | Viewed by 1579
Abstract
Accurately segmenting an insect from its original ecological image is the core technology limiting the accuracy and efficiency of automatic recognition. However, the performance of existing segmentation methods is unsatisfactory on insect images shot against wild backgrounds, owing to several challenges: various sizes, colors or textures similar to the surroundings, transparent body parts, and vague outlines. These challenges are accentuated when dealing with camouflaged insects. Here, we developed a deep-learning-based insect image segmentation method termed the progressive refinement network (PRNet), designed especially for camouflaged insects. Unlike existing insect segmentation methods, PRNet captures the possible scale and location of insects by extracting the contextual information of the image and fuses comprehensive features to suppress distractors, thereby clearly segmenting insect outlines. Experimental results on 1900 camouflaged insect images demonstrated that PRNet could effectively segment camouflaged insects and achieved superior detection performance, with a mean absolute error of 3.2%, a pixel-matching degree of 89.7%, a structural similarity of 83.6%, and a precision and recall error of 72%, improvements of 8.1%, 25.9%, 19.5%, and 35.8%, respectively, over recent salient object detection methods. As a foundational technology for insect detection, PRNet provides new opportunities for understanding insect camouflage and has the potential to substantially improve the accuracy of intelligent identification of insects in general. Full article

16 pages, 2690 KiB  
Article
Framework for Detecting Breast Cancer Risk Presence Using Deep Learning
by Mamoona Humayun, Muhammad Ibrahim Khalil, Saleh Naif Almuayqil and N. Z. Jhanjhi
Electronics 2023, 12(2), 403; https://doi.org/10.3390/electronics12020403 - 12 Jan 2023
Cited by 16 | Viewed by 2570
Abstract
Cancer is a complicated global health concern with a significant fatality rate, and breast cancer is among the leading causes of mortality each year. Owing to the fast growth of high-throughput sequencing techniques and the deep learning approaches that have arisen in the past few years, prognoses have been increasingly based on gene expression, offering insight into robust and appropriate healthcare decisions. Diagnostic-imaging disease indicators such as breast density and tissue texture are widely used by physicians and automated technology, and the effective and specific identification of cancer risk can inform tailored screening and prevention decisions. Deep learning has increasingly emerged as an effective method for classification and prediction applications such as breast imaging. On this foundation, we present a deep learning approach for predicting breast cancer risk, based on transfer learning with the InceptionResNetV2 model. Our experiments on a breast cancer dataset demonstrate high model performance, with 91% accuracy. The proposed model incorporates risk markers to improve breast cancer risk assessment scores and shows promising results compared to existing approaches. This article describes breast cancer risk indicators, defines the proper usage, features, and limits of each risk forecasting model, and examines the increasing role of deep learning (DL) in risk detection. The proposed model could potentially be used to automate various types of medical imaging techniques. Full article

14 pages, 11026 KiB  
Article
Wafer Surface Defect Detection Based on Feature Enhancement and Predicted Box Aggregation
by Jiebing Zheng, Jiangtao Dang and Tao Zhang
Electronics 2023, 12(1), 76; https://doi.org/10.3390/electronics12010076 - 25 Dec 2022
Viewed by 1962
Abstract
For wafer surface defect detection, a new method based on an improved Faster RCNN is proposed here to solve the problems of missed detections due to small objects and multiple-box detections due to discontinuous objects. First, to address missed detections of small objects, a feature enhancement module (FEM) based on dynamic convolution is proposed to extract high-frequency image features, enrich the semantic information of shallow feature maps, and improve detection performance for small-scale defects. Second, to address the multiple-box detections caused by discontinuous objects, a predicted box aggregation method is proposed to merge redundant predicted boxes and fine-tune the real predicted boxes, further improving positioning accuracy. Experimental results show that the mean average precision of the proposed method, validated on a self-developed dataset, reached 87.5%, with a detection speed of 0.26 s per image. The proposed method therefore has practical engineering value. Full article
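Predicted box aggregation in this spirit, merging highly overlapping predicted boxes into one averaged box, can be sketched as follows. This is a generic greedy IoU-grouping illustration with an assumed threshold, not the paper's exact procedure.

```python
import numpy as np

def box_iou(a, b):
    # boxes as (x1, y1, x2, y2)
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def aggregate_boxes(boxes, iou_thr=0.5):
    """Greedily group boxes whose IoU with a seed box exceeds iou_thr,
    then replace each group with its coordinate-wise average."""
    boxes = [np.asarray(b, dtype=float) for b in boxes]
    merged = []
    while boxes:
        seed = boxes.pop(0)
        group, rest = [seed], []
        for b in boxes:
            (group if box_iou(seed, b) >= iou_thr else rest).append(b)
        boxes = rest
        merged.append(np.mean(group, axis=0))
    return merged
```

Unlike plain non-maximum suppression, which keeps a single winner per group, averaging lets every redundant prediction contribute to the final box position.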

14 pages, 3596 KiB  
Article
MFVT: Multilevel Feature Fusion Vision Transformer and RAMix Data Augmentation for Fine-Grained Visual Categorization
by Xinyao Lv, Hao Xia, Na Li, Xudong Li and Ruoming Lan
Electronics 2022, 11(21), 3552; https://doi.org/10.3390/electronics11213552 - 31 Oct 2022
Cited by 1 | Viewed by 1456
Abstract
The introduction of the Vision Transformer (ViT) has promoted the development of fine-grained visual categorization (FGVC). However, directly applying ViT to FGVC tasks is problematic: ViT classifies using only the class token in the last layer, ignoring the local and low-level features necessary for FGVC. We propose a ViT-based multilevel feature fusion transformer (MFVT) for FGVC tasks. In this framework, following ViT, the backbone network adopts 12 Transformer blocks, divided into four stages, with multilevel feature fusion (MFF) added between Transformer layers. We also design RAMix, a CutMix-based data augmentation strategy that uses a resize strategy for crop-paste images and attention-based label assignment. Experiments on the CUB-200-2011, Stanford Dogs, and iNaturalist 2017 datasets gave competitive results, especially on the challenging iNaturalist 2017, with an accuracy of 72.6%. Full article

15 pages, 578 KiB  
Article
SIFT-Flow-Based Virtual Sample Generation for Single-Sample Finger Vein Recognition
by Lizhen Zhou, Lu Yang, Deqian Fu and Gongping Yang
Electronics 2022, 11(20), 3382; https://doi.org/10.3390/electronics11203382 - 19 Oct 2022
Cited by 1 | Viewed by 1000
Abstract
Finger vein recognition is considered to be a very promising biometric identification technology due to its excellent recognition performance. However, in the real world, the finger vein recognition system inevitably suffers from the single-sample problem: that is, only one sample is registered per class. In this case, the performance of many classical finger vein recognition algorithms will decline or fail because they cannot learn enough intra-class variations. To solve this problem, in this paper, we propose a SIFT-flow-based virtual sample generation (SVSG) method. Specifically, first, on the generic set with multiple registered samples per class, the displacement matrix of each class is obtained using the scale-invariant feature transform flow (SIFT-flow) algorithm. Then, the key displacements of each displacement matrix are extracted to form a variation matrix. After removing noise displacements and redundant displacements, the final global variation matrix is obtained. On the single sample set, multiple virtual samples are generated for the single sample according to the global variation matrix. Experimental results on the public database show that this method can effectively improve the performance of single-sample finger vein recognition. Full article
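The final step described above, generating virtual samples by applying a displacement matrix to the single registered image, can be sketched with a pure-NumPy nearest-neighbor warp. Estimating the displacements with SIFT-flow is outside this sketch, and the integer-displacement assumption is a simplification of the method.

```python
import numpy as np

def warp_by_displacement(img, dx, dy):
    """Generate a virtual sample by shifting pixel content according to
    per-pixel displacements, via backward nearest-neighbor sampling.

    img    : (H, W) grayscale vein image
    dx, dy : (H, W) integer displacement fields (e.g. drawn from the
             global variation matrix learned on a generic set)
    """
    h, w = img.shape
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # backward mapping: for each target pixel, look up its source pixel,
    # clamping coordinates at the image border
    src_y = np.clip(yy - dy, 0, h - 1)
    src_x = np.clip(xx - dx, 0, w - 1)
    return img[src_y, src_x]
```

Applying several different displacement fields to the one registered image yields several virtual samples with plausible intra-class variation.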

14 pages, 2781 KiB  
Article
MSEDTNet: Multi-Scale Encoder and Decoder with Transformer for Bladder Tumor Segmentation
by Yixing Wang and Xiufen Ye
Electronics 2022, 11(20), 3347; https://doi.org/10.3390/electronics11203347 - 17 Oct 2022
Cited by 1 | Viewed by 1472
Abstract
The precise segmentation of bladder tumors from MRI is essential for bladder cancer diagnosis and personalized therapy selection, but the properties of tumor morphology make precise segmentation from MRI images challenging. In recent years, deep convolutional neural networks have provided a promising solution for bladder tumor segmentation from MRI. However, deep-learning-based methods still face two weaknesses: (1) multi-scale feature extraction and utilization are inadequate, being limited by the learning approach; and (2) establishing explicit long-distance dependence is difficult due to the limited receptive field of convolution kernels. These limitations hinder the learning of global semantic information, which is critical for bladder cancer segmentation. To tackle this problem, a new auxiliary segmentation algorithm integrating a multi-scale encoder and decoder with a transformer, called MSEDTNet, is proposed. Specifically, the designed encoder with multi-scale pyramidal convolution (MSPC) generates compact feature maps that capture the richly detailed local features of the image. The transformer bottleneck then models the long-distance dependency between high-level tumor semantics in a global space. Finally, a decoder with a spatial context fusion module (SCFM) fuses the context information and gradually produces high-resolution segmentation results. Experimental results on T2-weighted MRI scans from 86 patients show that MSEDTNet achieves an overall Jaccard index of 83.46% and a Dice similarity coefficient of 92.35%, with lower complexity than other, similar models. This suggests that the proposed method can serve as an efficient tool for clinical bladder cancer segmentation. Full article

15 pages, 5508 KiB  
Article
Chinese Calligraphy Generation Based on Involution
by Yao Song, Fang Yang and Te Li
Electronics 2022, 11(14), 2201; https://doi.org/10.3390/electronics11142201 - 14 Jul 2022
Viewed by 2027
Abstract
The calligraphic works of particular calligraphers often contain only a limited number of characters rather than the full set of Chinese characters required for typography, which does not meet practical needs; there is therefore a need to develop complete sets of calligraphic characters for calligraphers. Most recently popular methods for generating calligraphic characters are based on deep learning, using an end-to-end approach to generate the target image, but such methods often fail to convert stroke structures correctly when the printed font differs significantly from the target font structure. In this paper, we propose an involution-based calligraphic character generation model that converts printed fonts into target calligraphic fonts. We improve the Pix2Pix model with involution, a new neural operator that focuses on spatial feature processing and handles the relationships between strokes better than convolution-only models, so that the generated calligraphic characters have an accurate stroke structure. A self-attention module and a residual block are also added to deepen the network and improve its feature processing capability. We evaluated our method and several baseline methods on the same dataset, and the experimental results demonstrate that our model is superior in both visual and quantitative evaluation. Full article

15 pages, 1418 KiB  
Article
Low-Light Image Enhancement with an Anti-Attention Block-Based Generative Adversarial Network
by Junbo Qiao, Xing Wang, Ji Chen and Muwei Jian
Electronics 2022, 11(10), 1627; https://doi.org/10.3390/electronics11101627 - 19 May 2022
Cited by 2 | Viewed by 1826
Abstract
High-quality images are difficult to obtain in complex environments, such as underground or underwater, and the poor quality of images captured under low-light conditions significantly restricts the development of various engineering applications. Existing algorithms exhibit color distortion or under-/overexposure when addressing non-uniformly illuminated images, and they introduce high levels of noise when processing extremely dark images. In this paper, we propose a novel generative adversarial network (GAN) structure to generate high-quality enhanced images, called the anti-attention block (AAB)-based generative adversarial network (AABGAN). Specifically, we propose the AAB to suppress undesired chromatic aberrations and establish a mapping relationship between different channels. A deep aggregation pyramid pooling module guides the network in combining multi-scale context information. Furthermore, we design a new multiple loss function to adjust images to the range best suited to human vision. The results of extensive experiments show that our method outperforms state-of-the-art unsupervised image enhancement methods in terms of noise reduction and produces well-perceived results. Full article
