Research

22 pages, 950 KiB

Open AccessArticle

4.6-Bit Quantization for Fast and Accurate Neural Network Inference on CPUs

by Anton Trusov, Elena Limonova, Dmitry Nikolaev and Vladimir V. Arlazarov

Mathematics 2024, 12(5), 651; https://doi.org/10.3390/math12050651 - 23 Feb 2024

Viewed by 762

Quantization is a widespread method for reducing the inference time of neural networks on mobile Central Processing Units (CPUs). Eight-bit quantized networks demonstrate similarly high quality as full precision models and perfectly fit the hardware architecture with one-byte coefficients and thirty-two-bit dot product [...] Read more.

Quantization is a widespread method for reducing the inference time of neural networks on mobile Central Processing Units (CPUs). Eight-bit quantized networks demonstrate similarly high quality as full precision models and perfectly fit the hardware architecture with one-byte coefficients and thirty-two-bit dot product accumulators. Lower precision quantizations usually suffer from noticeable quality loss and require specific computational algorithms to outperform eight-bit quantization. In this paper, we propose a novel 4.6-bit quantization scheme that allows for more efficient use of CPU resources. This scheme has more quantization bins than four-bit quantization and is more accurate while preserving the computational efficiency of the later (it runs only 4% slower). Our multiplication uses a combination of 16- and 32-bit accumulators and avoids multiplication depth limitation, which the previous 4-bit multiplication algorithm had. The experiments with different convolutional neural networks on CIFAR-10 and ImageNet datasets show that 4.6-bit quantized networks are 1.5–1.6 times faster than eight-bit networks on the ARMv8 CPU. Regarding the quality, the results of the 4.6-bit quantized network are close to the mean of four-bit and eight-bit networks of the same architecture. Therefore, 4.6-bit quantization may serve as an intermediate solution between fast and inaccurate low-bit network quantizations and accurate but relatively slow eight-bit ones. Full article

(This article belongs to the Special Issue Application of Machine Learning in Image Processing and Computer Vision)

► Show Figures

Figure 1

18 pages, 10242 KiB

Open AccessArticle

Gaussian Process-Based Transfer Kernel Learning for Unsupervised Domain Adaptation

by Pengfei Ge and Yesen Sun

Mathematics 2023, 11(22), 4695; https://doi.org/10.3390/math11224695 - 19 Nov 2023

Cited by 1 | Viewed by 814

Abstract

The discriminability and transferability of models are two important factors for the success of domain adaptation methods. Recently, some domain adaptation methods have improved models by adding a discriminant information extraction module. However, these methods need to carefully balance the discriminability and transferability [...] Read more.

The discriminability and transferability of models are two important factors for the success of domain adaptation methods. Recently, some domain adaptation methods have improved models by adding a discriminant information extraction module. However, these methods need to carefully balance the discriminability and transferability of a model. To address this problem, we propose a new deep domain adaptation method, Gaussian Process-based Transfer Kernel Learning (GPTKL), which can perform domain knowledge transfer and improve the discrimination ability of the model simultaneously. GPTKL uses the kernel similarity between all samples in the source and target domains as a priori information to establish a cross-domain Gaussian process. By maximizing its likelihood function, GPTKL reduces the domain discrepancy between the source and target domains, thereby enhancing generalization across domains. At the same time, GPTKL introduces the deep kernel learning strategy into the cross-domain Gaussian process to learn a transfer kernel function based on deep features. Through transfer kernel learning, GPTKL learns a deep feature space with both discriminability and transferability. In addition, GPTKL uses cross-entropy and mutual information to learn a classification model shared by the source and target domains. Experiments on four benchmarks show that GPTKL achieves superior classification performance over state-of-the-art methods. Full article

(This article belongs to the Special Issue Application of Machine Learning in Image Processing and Computer Vision)

► Show Figures

Figure 1

15 pages, 5107 KiB

Open AccessArticle

PIDFusion: Fusing Dense LiDAR Points and Camera Images at Pixel-Instance Level for 3D Object Detection

by Zheng Zhang, Ruyu Xu and Qing Tian

Mathematics 2023, 11(20), 4277; https://doi.org/10.3390/math11204277 - 13 Oct 2023

Viewed by 999

Abstract

In driverless systems (scenarios such as subways, buses, trucks, etc.), multi-modal data fusion, such as light detection and ranging (LiDAR) points and camera images, is essential for accurate 3D object detection. In the fusion process, the information interaction between the modes is challenging [...] Read more.

In driverless systems (scenarios such as subways, buses, trucks, etc.), multi-modal data fusion, such as light detection and ranging (LiDAR) points and camera images, is essential for accurate 3D object detection. In the fusion process, the information interaction between the modes is challenging due to the different coordinate systems of various sensors and the significant difference in the density of the collected data. It is necessary to fully consider the consistency and complementarity of multi-modal information, make up for the gap between multi-source data density, and achieve the joint interactive processing of multi-source information. Therefore, this paper is based on Transformer to improve a new multi-modal fusion model called PIDFusion for 3D object detection. Firstly, the method uses the results of 2D instance segmentation to generate dense 3D virtual points to enhance the original sparse 3D point clouds. This optimizes the issue that the nearest Euclidean distance in the 2D image space cannot ensure the nearest in the 3D space. Secondly, a new cross-modal fusion architecture is designed to maintain individual per-modality features to take advantage of their unique characteristics during 3D object detection. Finally, an instance-level fusion module is proposed to enhance semantic consistency through cross-modal feature interaction. Experiments show that PIDFusion is far ahead of existing 3D object detection methods, especially for small and long-range objects, with 70.8 mAP and 73.5 NDS on the nuScenes test set. Full article

(This article belongs to the Special Issue Application of Machine Learning in Image Processing and Computer Vision)

► Show Figures

Figure 1

17 pages, 2151 KiB

Open AccessArticle

Alzheimer’s Disease Prediction Using Deep Feature Extraction and Optimization

by Farah Mohammad and Saad Al Ahmadi

Mathematics 2023, 11(17), 3712; https://doi.org/10.3390/math11173712 - 29 Aug 2023

Cited by 1 | Viewed by 1464

Abstract

Alzheimer’s disease (AD) is a prevalent neurodegenerative disorder that affects a substantial proportion of the population. The accurate and timely prediction of AD carries considerable importance in enhancing the diagnostic process and improved treatment. This study provides a thorough examination of AD prediction [...] Read more.

Alzheimer’s disease (AD) is a prevalent neurodegenerative disorder that affects a substantial proportion of the population. The accurate and timely prediction of AD carries considerable importance in enhancing the diagnostic process and improved treatment. This study provides a thorough examination of AD prediction using the VGG19 deep learning model. The primary objective of this study is to investigate the effectiveness of feature fusion and optimization techniques in enhancing the accuracy of classification. The generation of a comprehensive feature map is achieved through the fusion of features that have been extracted from the fc7 and fc8 layers of VGG19. Several machine learning algorithms are employed to classify integrated features and recognize AD. The amalgamated feature map demonstrates a significant level of accuracy of 98% in the prognostication of AD, outperforming present cutting-edge methodologies. In this study, a methodology is utilized that makes use of the whale optimization algorithm (WoA), a metaheuristic approach to optimize features through feature selection. Feature optimization aims to eliminate redundant features and enhance the discriminatory power of the selected features. Following the optimization procedure, the F-KNN algorithm attained a precision level of 99%, surpassing the present state-of-the-art (SOTA) results reported in the current literature. Full article

(This article belongs to the Special Issue Application of Machine Learning in Image Processing and Computer Vision)

► Show Figures

Figure 1

28 pages, 1784 KiB

Open AccessArticle

Real-Time Detection of Unrecognized Objects in Logistics Warehouses Using Semantic Segmentation

by Serban Vasile Carata, Marian Ghenescu and Roxana Mihaescu

Mathematics 2023, 11(11), 2445; https://doi.org/10.3390/math11112445 - 25 May 2023

Cited by 1 | Viewed by 1413

Abstract

Pallet detection and tracking using computer vision is challenging due to the complexity of the object and its contents, lighting conditions, background clutter, and occlusions in industrial areas. Using semantic segmentation, this paper aims to detect pallets in a logistics warehouse. The proposed [...] Read more.

Pallet detection and tracking using computer vision is challenging due to the complexity of the object and its contents, lighting conditions, background clutter, and occlusions in industrial areas. Using semantic segmentation, this paper aims to detect pallets in a logistics warehouse. The proposed method examines changes in image segmentation from one frame to the next using semantic segmentation, taking into account the position and stationary behavior of newly introduced objects in the scene. The results indicate that the proposed method can detect pallets despite the complexity of the object and its contents. This demonstrates the utility of semantic segmentation for detecting unrecognized objects in real-world scenarios where a precise definition of the class cannot be given. Full article

(This article belongs to the Special Issue Application of Machine Learning in Image Processing and Computer Vision)

► Show Figures

Figure 1

17 pages, 2597 KiB

Open AccessArticle

Neuron-by-Neuron Quantization for Efficient Low-Bit QNN Training

by Artem Sher, Anton Trusov, Elena Limonova, Dmitry Nikolaev and Vladimir V. Arlazarov

Mathematics 2023, 11(9), 2112; https://doi.org/10.3390/math11092112 - 29 Apr 2023

Cited by 2 | Viewed by 1380

Abstract

Quantized neural networks (QNNs) are widely used to achieve computationally efficient solutions to recognition problems. Overall, eight-bit QNNs have almost the same accuracy as full-precision networks, but working several times faster. However, the networks with lower quantization levels demonstrate inferior accuracy in comparison [...] Read more.

Quantized neural networks (QNNs) are widely used to achieve computationally efficient solutions to recognition problems. Overall, eight-bit QNNs have almost the same accuracy as full-precision networks, but working several times faster. However, the networks with lower quantization levels demonstrate inferior accuracy in comparison to their classical analogs. To solve this issue, a number of quantization-aware training (QAT) approaches were proposed. In this paper, we study QAT approaches for two- to eight-bit linear quantization schemes and propose a new combined QAT approach: neuron-by-neuron quantization with straight-through estimator (STE) gradient forwarding. It is suitable for quantizations with two- to eight-bit widths and eliminates significant accuracy drops during training, which results in better accuracy of the final QNN. We experimentally evaluate our approach on CIFAR-10 and ImageNet classification and show that it is comparable to other approaches for four to eight bits and outperforms some of them for two to three bits while being easier to implement. For example, the proposed approach to three-bit quantization of the CIFAR-10 dataset results in 73.2% accuracy, while baseline direct and layer-by-layer result in 71.4% and 67.2% accuracy, respectively. The results for two-bit quantization for ResNet18 on the ImageNet dataset are 63.69% for our approach and 61.55% for the direct baseline. Full article

(This article belongs to the Special Issue Application of Machine Learning in Image Processing and Computer Vision)

► Show Figures

Figure 1

15 pages, 4919 KiB

Open AccessArticle

Efficient and Low Color Information Dependency Skin Segmentation Model

by Hojoon You, Kunyoung Lee, Jaemu Oh and Eui Chul Lee

Mathematics 2023, 11(9), 2057; https://doi.org/10.3390/math11092057 - 26 Apr 2023

Cited by 1 | Viewed by 1304

Abstract

Skin segmentation involves segmenting the human skin region in an image. It is a preprocessing technique mainly used in many applications such as face detection, hand gesture recognition, and remote biosignal measurements. As the performance of skin segmentation directly affects the performance of [...] Read more.

Skin segmentation involves segmenting the human skin region in an image. It is a preprocessing technique mainly used in many applications such as face detection, hand gesture recognition, and remote biosignal measurements. As the performance of skin segmentation directly affects the performance of these applications, precise skin segmentation methods have been studied. However, previous skin segmentation methods are unsuitable for real-world environments because they rely heavily on color information. In addition, deep-learning-based skin segmentation methods incur high computational costs, even though skin segmentation is mainly used for preprocessing. This study proposes a lightweight skin segmentation model with a high performance. Additionally, we used data augmentation techniques that modify the hue, saturation, and values, allowing the model to learn texture or contextual information better without relying on color information. Our proposed model requires 1.09M parameters and 5.04 giga multiply-accumulate. Through experiments, we demonstrated that our proposed model shows high performance with an F-score of 0.9492 and consistent performance even for modified images. Furthermore, our proposed model showed a fast processing speed of approximately 68 fps, based on 3 × 512 × 512 images and an NVIDIA RTX 2080TI GPU (11GB VRAM) graphics card. Full article

(This article belongs to the Special Issue Application of Machine Learning in Image Processing and Computer Vision)

► Show Figures

Figure 1

21 pages, 13135 KiB

Open AccessArticle

Multi-Task Learning Approach Using Dynamic Hyperparameter for Multi-Exposure Fusion

by Chan-Gi Im, Dong-Min Son, Hyuk-Ju Kwon and Sung-Hak Lee

Mathematics 2023, 11(7), 1620; https://doi.org/10.3390/math11071620 - 27 Mar 2023

Viewed by 1181

Abstract

High-dynamic-range (HDR) image synthesis is a technology developed to accurately reproduce the actual scene of an image on a display by extending the dynamic range of an image. Multi-exposure fusion (MEF) technology, which synthesizes multiple low-dynamic-range (LDR) images to create an HDR image, [...] Read more.

High-dynamic-range (HDR) image synthesis is a technology developed to accurately reproduce the actual scene of an image on a display by extending the dynamic range of an image. Multi-exposure fusion (MEF) technology, which synthesizes multiple low-dynamic-range (LDR) images to create an HDR image, has been developed in various ways including pixel-based, patch-based, and deep learning-based methods. Recently, methods to improve the synthesis quality of images using deep-learning-based algorithms have mainly been studied in the field of MEF. Despite the various advantages of deep learning, deep-learning-based methods have a problem in that numerous multi-exposed and ground-truth images are required for training. In this study, we propose a self-supervised learning method that generates and learns reference images based on input images during the training process. In addition, we propose a method to train a deep learning model for an MEF with multiple tasks using dynamic hyperparameters on the loss functions. It enables effective network optimization across multiple tasks and high-quality image synthesis while preserving a simple network architecture. Our learning method applied to the deep learning model shows superior synthesis results compared to other existing deep-learning-based image synthesis algorithms. Full article

(This article belongs to the Special Issue Application of Machine Learning in Image Processing and Computer Vision)

► Show Figures

Figure 1

18 pages, 3183 KiB

Open AccessArticle

Automated Fire Extinguishing System Using a Deep Learning Based Framework

by Senthil Kumar Jagatheesaperumal, Khan Muhammad, Abdul Khader Jilani Saudagar and Joel J. P. C. Rodrigues

Mathematics 2023, 11(3), 608; https://doi.org/10.3390/math11030608 - 26 Jan 2023

Cited by 3 | Viewed by 2836

Abstract

Fire accidents occur in every part of the world and cause a large number of casualties because of the risks involved in manually extinguishing the fire. In most cases, humans cannot detect and extinguish fire manually. Fire extinguishing robots with sophisticated functionalities are [...] Read more.

Fire accidents occur in every part of the world and cause a large number of casualties because of the risks involved in manually extinguishing the fire. In most cases, humans cannot detect and extinguish fire manually. Fire extinguishing robots with sophisticated functionalities are being rapidly developed nowadays, and most of these systems use fire sensors and detectors. However, they lack mechanisms for the early detection of fire, in case of casualties. To detect and prevent such fire accidents in its early stages, a deep learning-based automatic fire extinguishing mechanism was introduced in this work. Fire detection and human presence in fire locations were carried out using convolution neural networks (CNNs), configured to operate on the chosen fire dataset. For fire detection, a custom learning network was formed by tweaking the layer parameters of CNN for detecting fires with better accuracy. For human detection, Alex-net architecture was employed to detect the presence of humans in the fire accident zone. We experimented and analyzed the proposed model using various optimizers, activation functions, and learning rates, based on the accuracy and loss metrics generated for the chosen fire dataset. The best combination of neural network parameters was evaluated from the model configured with an Adam optimizer and softmax activation, driven with a learning rate of 0.001, providing better accuracy for the learning model. Finally, the experiments were tested using a mobile robotic system by configuring them in automatic and wireless control modes. In automatic mode, the robot was made to patrol around and monitor for fire casualties and fire accidents. It automatically extinguished the fire using the learned features triggered through the developed model. Full article

(This article belongs to the Special Issue Application of Machine Learning in Image Processing and Computer Vision)

► Show Figures

Figure 1

26 pages, 7552 KiB

Open AccessArticle

Categorical Variable Mapping Considerations in Classification Problems: Protein Application

by Gerardo Alfonso Perez and Raquel Castillo

Mathematics 2023, 11(2), 279; https://doi.org/10.3390/math11020279 - 05 Jan 2023

Viewed by 1755

Abstract

The mapping of categorical variables into numerical values is common in machine learning classification problems. This type of mapping is frequently performed in a relatively arbitrary manner. We present a series of four assumptions (tested numerically) regarding these mappings in the context of [...] Read more.

The mapping of categorical variables into numerical values is common in machine learning classification problems. This type of mapping is frequently performed in a relatively arbitrary manner. We present a series of four assumptions (tested numerically) regarding these mappings in the context of protein classification using amino acid information. This assumption involves the mapping of categorical variables into protein classification problems without the need to use approaches such as natural language process (NLP). The first three assumptions relate to equivalent mappings, and the fourth involves a comparable mapping using a proposed eigenvalue-based matrix representation of the amino acid chain. These assumptions were tested across a range of 23 different machine learning algorithms. It is shown that the numerical simulations are consistent with the presented assumptions, such as translation and permutations, and that the eigenvalue approach generates classifications that are statistically not different from the base case or that have higher mean values while at the same time providing some advantages such as having a fixed predetermined dimensions regardless of the size of the analyzed protein. This approach generated an accuracy of 83.25%. An optimization algorithm is also presented that selects an appropriate number of neurons in an artificial neural network applied to the above-mentioned protein classification problem, achieving an accuracy of 85.02%. The model includes a quadratic penalty function to decrease the chances of overfitting. Full article

(This article belongs to the Special Issue Application of Machine Learning in Image Processing and Computer Vision)

► Show Figures

Figure 1

16 pages, 8046 KiB

Open AccessArticle

Unsupervised Image Translation Using Multi-Scale Residual GAN

by Yifei Zhang, Weipeng Li, Daling Wang and Shi Feng

Mathematics 2022, 10(22), 4347; https://doi.org/10.3390/math10224347 - 19 Nov 2022

Cited by 1 | Viewed by 2407

Abstract

Image translation is a classic problem of image processing and computer vision for transforming an image from one domain to another by learning the mapping between an input image and an output image. A novel Multi-scale Residual Generative Adversarial Network (MRGAN) based on [...] Read more.

Image translation is a classic problem of image processing and computer vision for transforming an image from one domain to another by learning the mapping between an input image and an output image. A novel Multi-scale Residual Generative Adversarial Network (MRGAN) based on unsupervised learning is proposed in this paper for transforming images between different domains using unpaired data. In the model, a dual generater architecture is used to eliminate the dependence on paired training samples and introduce a multi-scale layered residual network in generators for reducing semantic loss of images in the process of encoding. The Wasserstein GAN architecture with gradient penalty (WGAN-GP) is employed in the discriminator to optimize the training process and speed up the network convergence. Comparative experiments on several image translation tasks over style transfers and object migrations show that the proposed MRGAN outperforms strong baseline models by large margins. Full article

(This article belongs to the Special Issue Application of Machine Learning in Image Processing and Computer Vision)

► Show Figures

Figure 1

14 pages, 6095 KiB

Open AccessArticle

Cross-Section Dimension Measurement of Construction Steel Pipe Based on Machine Vision

by Fuxing Yu, Zhihu Qin, Ruina Li and Zhanlin Ji

Mathematics 2022, 10(19), 3535; https://doi.org/10.3390/math10193535 - 28 Sep 2022

Cited by 1 | Viewed by 1623

Abstract

Currently, the on-site measuring of the size of a steel pipe cross-section for scaffold construction relies on manual measurement tools, which is a time-consuming process with poor accuracy. Therefore, this paper proposes a new method for steel pipe size measurements that is based [...] Read more.

Currently, the on-site measuring of the size of a steel pipe cross-section for scaffold construction relies on manual measurement tools, which is a time-consuming process with poor accuracy. Therefore, this paper proposes a new method for steel pipe size measurements that is based on edge extraction and image processing. Our primary aim is to solve the problems of poor accuracy and waste of labor in practical applications of construction steel pipe inspection. Therefore, the developed method utilizes a convolutional neural network and image processing technology to find an optimum solution. Our experiment revealed that the edge image that is proposed in the existing convolutional neural network technology is relatively rough and is unable to calculate the steel pipe’s cross-sectional size. Thus, the suggested network model optimizes the current technology and combines it with image processing technology. The results demonstrate that compared with the richer convolutional features (RCF) network, the optimal dataset scale (ODS) is improved by 3%, and the optimal image scale (OIS) is improved by 2.2%. At the same time, the error value of the Hough transform can be effectively reduced after improving the Hough algorithm. Full article

(This article belongs to the Special Issue Application of Machine Learning in Image Processing and Computer Vision)

► Show Figures

Figure 1

Journal Menu

Journal Browser

Application of Machine Learning in Image Processing and Computer Vision

Share This Special Issue

Special Issue Editors

Special Issue Information

Keywords

Published Papers (12 papers)

Research

Further Information

Guidelines

MDPI Initiatives

Follow MDPI