Application of Machine Learning in Graphics and Images

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: closed (15 March 2024) | Viewed by 14821

Special Issue Editors

School of Computer Science, China University of Geosciences, Wuhan 430074, China
Interests: computer graphics; computer-aided design; computer vision; computer-supported cooperative work

Guest Editor
School of Computer Science, China University of Geosciences, Wuhan 430074, China
Interests: computer graphics; computer-aided design; computer vision; image and video processing

School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China
Interests: intelligent optimization; medical image processing

Special Issue Information

Dear Colleagues,

Computer graphics and image processing technologies are now widely used in industrial production and in many aspects of daily life, offering solutions with greatly improved efficiency and quality. Meanwhile, over the last few decades, machine learning models have become effective and ubiquitous approaches to a variety of challenging real-world and virtual tasks. Image processing and computer graphics are both important application scenarios for machine learning; they have stimulated strong research interest and given rise to a series of popular research directions.

In this Special Issue, we welcome novel research papers and comprehensive surveys of state-of-the-art work that contribute innovative machine learning application models, improvements to classical computer graphics and image processing tasks, and interesting new applications. Topics of interest cover all aspects of the application of machine learning in graphics and images and include, but are not limited to, the following:

  • Computer graphics;
  • Image processing;
  • Computer vision;
  • Machine learning and deep learning;
  • Pattern recognition;
  • Object detection, recognition, and tracking;
  • Part and semantic segmentation;
  • Rigid and non-rigid registration;
  • 3D reconstruction;
  • Virtual reality/augmented reality/mixed reality;
  • Computer-aided design/engineering;
  • Human pose and behavior understanding;
  • Autonomous driving.

Technical Committee Member:
Mr. Jun Sun, College of Resources, Sichuan Agricultural University

Dr. Yiqi Wu
Dr. Dejun Zhang
Dr. Yilin Chen
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer graphics
  • image processing
  • computer vision
  • machine learning
  • deep learning
  • pattern recognition

Published Papers (13 papers)

Research

17 pages, 7298 KiB  
Article
Improved Transformer-Based Deblurring of Commodity Videos in Dynamic Visual Cabinets
by Shuangyi Huang, Qianjie Liang, Kai Xie, Zhengfang He, Chang Wen, Jianbiao He and Wei Zhang
Electronics 2024, 13(8), 1440; https://doi.org/10.3390/electronics13081440 - 11 Apr 2024
Viewed by 257
Abstract
In dynamic visual cabinets, the motion blur that occurs when consumers take out commodities reduces the accuracy of commodity detection. Recently, although Transformer-based video deblurring networks have achieved results compared to Convolutional Neural Networks in some blurring scenarios, they still struggle with the non-uniform blur that occurs when consumers pick up commodities: blurred video frames of small commodities are difficult to align, and the effective information between commodity video frames is underutilized. Therefore, an improved Transformer video deblurring network is proposed. Firstly, a multi-scale Transformer feature extraction method is used to handle non-uniform blurring. Secondly, to address the difficult alignment of blurred video frames of small items, a temporal interactive attention mechanism is designed for video frame alignment. Finally, a feature recurrent fusion mechanism is introduced to supplement the effective information of commodity features. The experimental results show that the proposed method has practical significance in improving the accuracy of commodity detection. Moreover, compared with the recent Transformer deblurring algorithm Video Restoration Transformer, the proposed algorithm achieves a Peak Signal-to-Noise Ratio that is 0.23 dB and 0.81 dB higher on the Deep Video Deblurring dataset and the Fuzzy Commodity Dataset, respectively.
(This article belongs to the Special Issue Application of Machine Learning in Graphics and Images)

22 pages, 27609 KiB  
Article
Three-Dimensional-Consistent Scene Inpainting via Uncertainty-Aware Neural Radiance Field
by Meng Wang, Qinkang Yu and Haipeng Liu 
Electronics 2024, 13(2), 448; https://doi.org/10.3390/electronics13020448 - 22 Jan 2024
Viewed by 820
Abstract
Three-dimensional (3D) scene inpainting aims to remove objects from scenes and generate visually plausible regions to fill the resulting holes. Building on Neural Radiance Fields (NeRF), considerable advances have been made in 3D scene inpainting. However, two prevalent issues persist: inconsistent 3D details across different viewpoints, and the loss of real background details in inpainted regions due to occlusion. This paper presents a NeRF-based inpainting approach using uncertainty estimation that formulates mask and uncertainty branches for consistency enhancement. In the initial training, the mask branch learns a 3D-consistent representation from inaccurate input masks, and after background rendering, the background regions can be fully exposed to the views. The uncertainty branch learns the visibility of spatial points by modeling them as Gaussian distributions, generating variances to identify regions to be inpainted. During the inpainting training phase, the uncertainty branch measures 3D consistency in the inpainted views and calculates the confidence from the variance as dynamic weights, which are used to balance the color and adversarial losses to achieve 3D-consistent inpainting of both structure and texture. The results were evaluated on datasets such as Spin-NeRF and NeRF-Object-Removal. The proposed approach outperformed the baselines on the LPIPS and FID inpainting metrics and preserved more spatial details from real backgrounds in multi-scene settings, thus achieving 3D-consistent restoration.
(This article belongs to the Special Issue Application of Machine Learning in Graphics and Images)

22 pages, 16257 KiB  
Article
RCA-GAN: An Improved Image Denoising Algorithm Based on Generative Adversarial Networks
by Yuming Wang, Shuaili Luo, Liyun Ma and Min Huang
Electronics 2023, 12(22), 4595; https://doi.org/10.3390/electronics12224595 - 10 Nov 2023
Viewed by 885
Abstract
Image denoising, an essential component of image pre-processing, reduces noise interference to enhance image quality and is therefore of considerable research importance. Traditional denoising methods often blur image details and produce unrealistic image edges. To deal with these issues, we propose an image denoising algorithm named Residual structure and Cooperative Attention mechanism based on Generative Adversarial Networks (RCA-GAN). This algorithm reduces noise while focusing on preserving image texture details. To maximize feature extraction, the model first employs residual learning within a portion of the generator's backbone, conducting extensive multi-dimensional feature extraction to preserve more image details. Secondly, it introduces a simple yet efficient cooperative attention module to enhance the representation capacity of edge and texture features, further improving the preservation of intricate image details. Finally, this paper constructs a novel loss function, the Multimodal Loss Function, for the network training process. The experimental results were evaluated using Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) as metrics. They demonstrate that the proposed RCA-GAN image denoising algorithm increased the average PSNR from 24.71 dB to 33.76 dB, a 36.6% improvement. Additionally, the average SSIM value rose from 0.8451 to 0.9503, a 12.4% improvement. The algorithm achieves superior visual outcomes, preserving image texture details to a greater extent and excelling in edge preservation and noise suppression.
(This article belongs to the Special Issue Application of Machine Learning in Graphics and Images)
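
As a rough check on the figures quoted in the RCA-GAN abstract above, the sketch below computes PSNR from its standard definition and recomputes the two relative gains from the reported before and after values. The function names `psnr` and `relative_gain`, the flat pixel-list input, and the 8-bit peak value of 255 are assumptions made for illustration; none of them is taken from the paper.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equal-length pixel sequences.

    `peak` is the maximum possible pixel value (255 assumes 8-bit images).
    """
    if len(reference) != len(estimate):
        raise ValueError("inputs must contain the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def relative_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return 100.0 * (after - before) / before


# Values reported in the abstract: average PSNR (dB) and average SSIM.
psnr_gain = relative_gain(24.71, 33.76)    # about 36.6 percent
ssim_gain = relative_gain(0.8451, 0.9503)  # about 12.4 percent
```

Both percentages in the abstract are relative to the starting value, which is why a 9.05 dB PSNR increase is reported as 36.6%.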

20 pages, 9989 KiB  
Article
FSTT: Flow-Guided Spatial Temporal Transformer for Deep Video Inpainting
by Ruixin Liu and Yuesheng Zhu
Electronics 2023, 12(21), 4452; https://doi.org/10.3390/electronics12214452 - 29 Oct 2023
Viewed by 1047
Abstract
Video inpainting aims to complete the missing regions with content that is consistent both spatially and temporally. Effectively utilizing the spatio-temporal information in videos is critical for video inpainting. Recent video inpainting methods combine optical flow and transformers to capture spatio-temporal information. However, these methods fail to fully explore the potential of optical flow within the transformer. Furthermore, the designed transformer block cannot effectively integrate spatio-temporal information across frames. To address these problems, we propose a novel video inpainting model, named Flow-Guided Spatial Temporal Transformer (FSTT), which effectively establishes correspondences between missing regions and valid regions in both spatial and temporal dimensions under the guidance of completed optical flow. Specifically, a Flow-Guided Fusion Feed-Forward module is developed to enhance features with the assistance of optical flow, mitigating the inaccuracies caused by hole pixels when performing multi-head self-attention (MHSA). Additionally, a decomposed spatio-temporal MHSA module is proposed to effectively capture spatio-temporal dependencies in videos. To improve the efficiency of the model, a Global–Local Temporal MHSA module is further designed based on the window partition strategy. Extensive quantitative and qualitative experiments on the DAVIS and YouTube-VOS datasets demonstrate the superiority of our proposed method.
(This article belongs to the Special Issue Application of Machine Learning in Graphics and Images)
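
The FSTT abstract above builds on multi-head self-attention. For readers unfamiliar with the basic operation, here is a minimal sketch of generic scaled dot-product attention for a single head, with no learned projections. It is not the FSTT module or any part of the paper's code; the function names `softmax` and `attention` and the list-of-lists representation are assumptions chosen to keep the example dependency-free.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention for one head.

    Each argument is a list of vectors (lists of floats). Every query is
    compared with every key; the resulting weights mix the value vectors.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to each key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors, channel by channel.
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs


# Two tokens with identical keys receive equal weight, so the output is the mean value.
mixed = attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[0.0, 2.0], [2.0, 0.0]])
```

In a video setting, the tokens would be patches drawn from several frames, which is where hole pixels can distort the similarity scores that the abstract mentions.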

13 pages, 2169 KiB  
Article
Road Scene Instance Segmentation Based on Improved SOLOv2
by Qing Yang, Jiansheng Peng, Dunhua Chen and Hongyu Zhang
Electronics 2023, 12(19), 4169; https://doi.org/10.3390/electronics12194169 - 08 Oct 2023
Viewed by 1013
Abstract
Road instance segmentation is vital for autonomous driving, yet current algorithms struggle in complex city environments, with issues such as poor small object segmentation, low-quality mask edge contours, slow processing, and limited model adaptability. This paper introduces an enhanced instance segmentation method based on SOLOv2. It integrates the Bottleneck Transformer (BoT) module into VoVNetV2, replacing the standard convolutions with ghost convolutions. Additionally, it replaces ResNet with the improved VoVNetV2 backbone to enhance feature extraction and segmentation speed. Furthermore, the algorithm employs Feature Pyramid Grids (FPGs) instead of Feature Pyramid Networks (FPNs) to introduce multi-directional lateral connections for better feature fusion. Lastly, it incorporates a Convolutional Block Attention Module (CBAM) into the detection head to refine features by considering the attention weight coefficients in both the channel and spatial dimensions. The experimental results demonstrate the algorithm's effectiveness, achieving a 27.6% mAP on Cityscapes, a 4.2% improvement over SOLOv2. It also attains a segmentation speed of 8.9 FPS, a 1.7 FPS increase over SOLOv2, confirming its practicality for real-world engineering applications.
(This article belongs to the Special Issue Application of Machine Learning in Graphics and Images)

14 pages, 4443 KiB  
Article
Few-Shot Object Detection with Memory Contrastive Proposal Based on Semantic Priors
by Linlin Xiao, Huahu Xu, Junsheng Xiao and Yuzhe Huang
Electronics 2023, 12(18), 3835; https://doi.org/10.3390/electronics12183835 - 11 Sep 2023
Viewed by 739
Abstract
Few-shot object detection (FSOD) aims to detect objects belonging to novel classes with few training samples. With so few novel class samples, the extracted visual information is insufficient to accurately represent the object itself, which leads to significant intra-class variance and confusion between classes of similar samples, and in turn to large errors in the detection results for novel class samples. We propose a few-shot object detection framework that achieves effective classification and detection by embedding semantic information and contrastive learning. Firstly, we introduce a semantic fusion (SF) module, which projects semantic spatial information into visual space for interaction, to compensate for the lack of visual information and further enhance the representation of feature information. To further improve the classification performance, we embed a memory contrastive proposal (MCP) module, which adjusts the distribution of the feature space by calculating the contrastive loss between the class-centered features of previous samples and the current input features. This yields a more discriminative embedding space with better intra-class aggregation and inter-class separation for subsequent classification and detection. Extensive experiments on the PASCAL VOC and MS-COCO datasets show that the performance of our proposed method is effectively improved. Our proposed method improves nAP50 over the baseline model by 4.5% and 3.5%.
(This article belongs to the Special Issue Application of Machine Learning in Graphics and Images)

23 pages, 19373 KiB  
Article
Development of a Hybrid Method for Multi-Stage End-to-End Recognition of Grocery Products in Shelf Images
by Ceren Gulra Melek, Elena Battini Sonmez, Hakan Ayral and Songul Varli
Electronics 2023, 12(17), 3640; https://doi.org/10.3390/electronics12173640 - 29 Aug 2023
Viewed by 1328
Abstract
Product recognition on grocery shelf images is a challenging object detection task because of the similarity between products, the wide range of product sizes, and the high number of classes, in addition to the difficulty of data collection caused by constantly renewed packaging and newly added products. The use of conventional methods alone is not enough to solve a number of retail problems such as planogram compliance, stock tracking on shelves, and customer support. The purpose of this study is to achieve significant results using the suggested multi-stage end-to-end process, including product detection, product classification, and refinement. To compare different methods, a traditional computer vision approach, Aggregate Channel Features (ACF), and Single-Shot Detectors (SSD) are used in the product detection stage, and Speeded-Up Robust Features (SURF), Binary Robust Invariant Scalable Keypoints (BRISK), Oriented FAST (Features from Accelerated Segment Test) and Rotated BRIEF (Binary Robust Independent Elementary Features) (ORB), and hybrids of these methods are used in the product classification stage. The experiments used the entire Grocery Products dataset and subsets of it with different numbers of products and images. The best performance was achieved with SSD in the product detection stage and the hybrid use of SURF, BRISK, and ORB in the product classification stage. Additionally, the proposed approach performed comparably to or better than existing models.
(This article belongs to the Special Issue Application of Machine Learning in Graphics and Images)

19 pages, 3281 KiB  
Article
Internal Detection of Ground-Penetrating Radar Images Using YOLOX-s with Modified Backbone
by Xibin Zheng, Sinan Fang, Haitao Chen, Liang Peng and Zhi Ye
Electronics 2023, 12(16), 3520; https://doi.org/10.3390/electronics12163520 - 20 Aug 2023
Viewed by 969
Abstract
Geological radar is an important method used for detecting internal defects in tunnels. Automatic interpretation techniques can effectively reduce the subjectivity of manual identification, improve recognition accuracy, and increase detection efficiency. This paper proposes an automatic recognition approach for geological (ground-penetrating) radar (GPR) images based on YOLOX-s, aimed at accurately detecting defects and steel arches in any direction. The method utilizes the YOLOX-s neural network and improves the backbone with Swin Transformer to enhance the recognition capability for small targets in geological radar images. To address the irregular voids commonly observed in radar images, the CBAM attention mechanism is incorporated to improve the accuracy of detection annotations. We construct a dataset from field detection data that includes targets of different sizes and orientations, representing "voids" and "steel arches". Our model tackles the challenges of traditional GPR image interpretation and enhances the automatic recognition accuracy and efficiency of radar image detection. In comparative experiments on the constructed dataset, our improved model achieves a recognition accuracy of 92% for voids and 94% for steel arches. Compared to YOLOX-s, the average precision is improved by 6.51%. These results indicate the superiority of our model in geological radar image interpretation.
(This article belongs to the Special Issue Application of Machine Learning in Graphics and Images)

18 pages, 9331 KiB  
Article
CAS-UNet: A Retinal Segmentation Method Based on Attention
by Zeyu You, Haiping Yu, Zhuohan Xiao, Tao Peng and Yinzhen Wei
Electronics 2023, 12(15), 3359; https://doi.org/10.3390/electronics12153359 - 06 Aug 2023
Cited by 3 | Viewed by 1360
Abstract
Retinal vessel segmentation is an important task in medical image analysis that can aid doctors in diagnosing various eye diseases. However, due to the complexity and blurred boundaries of retinal vessel structures, existing methods face many challenges in practical applications. To overcome these challenges, this paper proposes a retinal vessel segmentation algorithm based on an attention mechanism, called CAS-UNet. Firstly, the Cross-Fusion Channel Attention mechanism is introduced, and the Structured Convolutional Attention block is used to replace the original convolutional block of U-Net to achieve channel enhancement for retinal blood vessels. Secondly, an Additive Attention Gate is added to the skip-connection layer of the network to achieve spatial enhancement for retinal blood vessels. Finally, the SoftPool pooling method is used to reduce information loss. Experimental results on the CHASEDB1 and DRIVE datasets show that the proposed algorithm achieves an accuracy of 96.68% and 95.86%, and a sensitivity of 83.21% and 83.75%, respectively. The proposed CAS-UNet thus outperforms the existing U-Net-based classic algorithms.
(This article belongs to the Special Issue Application of Machine Learning in Graphics and Images)
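
The CAS-UNet abstract above reports accuracy and sensitivity side by side. The sketch below shows how both are conventionally computed from pixel-level confusion counts. The function names and the example counts are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all pixels that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """Fraction of true vessel pixels that are detected (also called recall)."""
    return tp / (tp + fn)


# Hypothetical counts for a 1000-pixel image in which 100 pixels are vessels.
tp, fn = 83, 17    # vessel pixels found / missed
tn, fp = 880, 20   # background pixels kept / wrongly marked as vessel

acc = accuracy(tp, tn, fp, fn)   # 0.963
sens = sensitivity(tp, fn)       # 0.83
```

When vessel pixels are a small minority of the image, as in these made-up counts, accuracy is dominated by the background class, so it can sit well above sensitivity. That is one plausible reading of the gap between the two figures in the abstract, not a claim made by the authors.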

20 pages, 777 KiB  
Article
A Robust Adaptive Hierarchical Learning Crow Search Algorithm for Feature Selection
by Yilin Chen, Zhi Ye, Bo Gao, Yiqi Wu, Xiaohu Yan and Xiangyun Liao
Electronics 2023, 12(14), 3123; https://doi.org/10.3390/electronics12143123 - 18 Jul 2023
Cited by 5 | Viewed by 1056
Abstract
Feature selection is a multi-objective problem: it aims to eliminate irrelevant and redundant features while improving classification accuracy. Balancing the conflict between the two goals of selection accuracy and feature selection ratio is a great challenge. Evolutionary algorithms have been shown to be suitable for feature selection. Recently, a new meta-heuristic algorithm named the crow search algorithm has been applied to the problem of feature selection. This algorithm has few parameters and has achieved good results. However, due to the lack of diversity in late iterations, the algorithm tends to fall into local optima. To solve this problem, we propose the adaptive hierarchical learning crow search algorithm (AHL-CSA). Firstly, an adaptive hierarchical learning technique is used to adaptively divide the crow population into several layers, with each layer learning from the top-layer particles and the topmost-layer particles learning from each other. This strategy encourages more exploration by lower individuals and more exploitation by higher individuals, thus improving the diversity of the population. In addition, in order to make full use of the search information of each level in the population and reduce the impact of local optima on the overall search performance of the algorithm, we introduce an information sharing mechanism to help adjust the search direction of the population and improve the convergence accuracy of the algorithm. Finally, different difference operators are used to update the positions of particles at different levels, which further improves the diversity of the population. The performance of the method was tested on 18 standard UCI datasets and compared with eight other representative algorithms. The comparison of experimental results shows that the proposed algorithm is superior to the other competitive algorithms. Furthermore, the Wilcoxon rank-sum test was used to verify the validity of the results.
(This article belongs to the Special Issue Application of Machine Learning in Graphics and Images)
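
For context on the baseline that the abstract above improves upon, here is a minimal sketch of the basic continuous crow search algorithm, assuming the commonly described update rule: each crow follows the remembered best position of a randomly chosen crow unless that crow is "aware", in which case the follower jumps to a random position. This is not AHL-CSA and not the binary feature-selection variant; the function name `crow_search`, the parameter defaults, and the sphere test function are all assumptions for illustration.

```python
import random


def sphere(x):
    """Simple test objective: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)


def crow_search(objective, dim, bounds, n_crows=20, iters=200,
                flight_length=2.0, awareness_prob=0.1, seed=0):
    """Minimize `objective` with a basic crow search; returns (best, fitness, initial_best)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def random_position():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    positions = [random_position() for _ in range(n_crows)]
    memory = [p[:] for p in positions]            # best position each crow remembers
    memory_fit = [objective(p) for p in memory]
    initial_best = min(memory_fit)

    for _ in range(iters):
        for i in range(n_crows):
            j = rng.randrange(n_crows)            # crow i follows a random crow j
            if rng.random() >= awareness_prob:
                # Crow j is unaware: move toward its remembered position.
                r = rng.random()
                candidate = [x + r * flight_length * (m - x)
                             for x, m in zip(positions[i], memory[j])]
            else:
                # Crow j is aware: crow i is fooled into a random position.
                candidate = random_position()
            # Infeasible moves are rejected and the crow stays where it is.
            if all(lo <= c <= hi for c in candidate):
                positions[i] = candidate
                fit = objective(candidate)
                if fit < memory_fit[i]:           # memory only ever improves
                    memory[i] = candidate[:]
                    memory_fit[i] = fit

    best = min(range(n_crows), key=memory_fit.__getitem__)
    return memory[best], memory_fit[best], initial_best


best_position, best_fitness, initial_fitness = crow_search(sphere, dim=5, bounds=(-10.0, 10.0))
```

Because every crow follows the same small set of remembered positions, the population can collapse around one region late in the run, which is the loss of diversity the abstract says AHL-CSA is designed to counter.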

18 pages, 5001 KiB  
Article
A Shallow Pooled Weighted Feature Enhancement Network for Small-Sized Pine Wilt Diseased Tree Detection
by Mei Yu, Sha Ye, Yuelin Zheng, Yanjing Jiang, Yisheng Peng, Yuyang Sheng, Chongjing Huang and Hang Sun
Electronics 2023, 12(11), 2463; https://doi.org/10.3390/electronics12112463 - 30 May 2023
Viewed by 756
Abstract
Pine wilt disease poses a serious threat to the ecological environment of national forests. Combining object detection algorithms with Unmanned Aerial Vehicles (UAVs) to detect pine wilt diseased trees (PWDT) is a significant step in preventing the spread of pine wilt disease. To address the inability of shallow feature layers in existing detection algorithms to fully extract features from small-sized diseased trees, as well as the small number of small-sized diseased trees in a single image, a Shallow Pooled Weighted Feature Enhancement Network (SPW-FEN) based on Small Target Expansion (STE) is proposed for detecting PWDT. First, a Pooled Weighted Channel Attention (PWCA) module is presented and introduced into the shallow feature layer with rich small target information to enhance the network's expressive ability regarding the characteristics of two-layer shallow feature maps. Additionally, an STE data enhancement method is introduced for small-sized targets, which effectively increases the sample size of small-sized diseased trees in a single image. The experimental results on the PWDT dataset indicate that the proposed algorithm achieved an average precision and recall of 79.1% and 86.9%, respectively. This is 3.6 and 3.8 percentage points higher, respectively, than the recognition recall and average precision of the existing state-of-the-art method Faster-RCNN, and 6.4 and 5.5 percentage points higher than those of the newly proposed YOLOv6 method.
(This article belongs to the Special Issue Application of Machine Learning in Graphics and Images)

14 pages, 2624 KiB  
Article
A Generate Adversarial Network with Structural Branch Assistance for Image Inpainting
by Jin Wang, Dongli Jia and Heng Zhang
Electronics 2023, 12(9), 2108; https://doi.org/10.3390/electronics12092108 - 05 May 2023
Viewed by 1007
Abstract
In digital image inpainting tasks, existing deep-learning-based methods have achieved remarkable interim results by introducing structural prior information into the network. However, the correspondence between texture and structure is not fully considered, and inconsistencies between texture and structure appear in the results of current methods. In this paper, we propose a dual-branch network with structural branch assistance, which decouples the inpainting of low-frequency and high-frequency information using parallel branches. The feature fusion (FF) module is introduced to integrate the feature information from the two branches, which effectively ensures the consistency of structure and texture in the image. The feature attention (FA) module is introduced to extract long-distance feature information, which enhances the consistency between the local features of the image and the overall image and gives the image a more detailed texture. Experiments on the Paris StreetView and CelebA-HQ datasets demonstrate the effectiveness and superiority of our method.
(This article belongs to the Special Issue Application of Machine Learning in Graphics and Images)

15 pages, 17556 KiB  
Article
A Compositional Transformer Based Autoencoder for Image Style Transfer
by Jianxin Feng, Geng Zhang, Xinhui Li, Yuanming Ding, Zhiguo Liu, Chengsheng Pan, Siyuan Deng and Hui Fang
Electronics 2023, 12(5), 1184; https://doi.org/10.3390/electronics12051184 - 01 Mar 2023
Cited by 2 | Viewed by 2282
Abstract
Image style transfer has become a key technique in modern photo-editing applications. Although significant progress has been made in blending content from one image with style from another, the synthesized image may show a hallucinatory effect in high-resolution style transfer tasks when the texture of the style image is rich. In this paper, we propose a novel attention mechanism, named compositional attention, and use it to design a compositional transformer-based autoencoder (CTA) that addresses this issue. With the support of this module, our model is capable of generating high-quality images when transferring from texture-rich style images to content images with semantics. Additionally, we embed region-based consistency terms in our loss function to ensure that the internal semantic structure is preserved in the synthesized image. Moreover, an information theory-based CTA is discussed, and a Kullback–Leibler divergence loss is introduced to preserve more brightness information for photo-realistic style transfer. Extensive experimental results on three benchmark datasets, namely Churches, Flickr Landscapes, and Flickr Faces HQ, confirmed excellent performance when compared to several state-of-the-art methods. In a user study, the majority of users, ranging from 61% to 66%, gave high scores to the transfer effects of our method, compared to the 9% of users who supported the second-best method. Further, on the questions of realism and style transfer quality, we achieved the best score among the compared style transfer methods, an average of 4.5 out of 5.
(This article belongs to the Special Issue Application of Machine Learning in Graphics and Images)