Journal of Imaging

Research

Jump to: Review

25 pages, 9712 KiB

Open AccessArticle

Comparative Analysis of Color Space and Channel, Detector, and Descriptor for Feature-Based Image Registration

by Wenan Yuan, Sai Raghavendra Prasad Poosa and Rutger Francisco Dirks

J. Imaging 2024, 10(5), 105; https://doi.org/10.3390/jimaging10050105 - 28 Apr 2024

Viewed by 204

Abstract

The current study aimed to quantify the value of color spaces and channels as a potential superior replacement for standard grayscale images, as well as the relative performance of open-source detectors and descriptors for general feature-based image registration purposes, based on a large [...] Read more.

The current study aimed to quantify the value of color spaces and channels as a potential superior replacement for standard grayscale images, as well as the relative performance of open-source detectors and descriptors for general feature-based image registration purposes, based on a large benchmark dataset. The public dataset UDIS-D, with 1106 diverse image pairs, was selected. In total, 21 color spaces or channels including RGB, XYZ, Y′CrCb, HLS, L*a*b* and their corresponding channels in addition to grayscale, nine feature detectors including AKAZE, BRISK, CSE, FAST, HL, KAZE, ORB, SIFT, and TBMR, and 11 feature descriptors including AKAZE, BB, BRIEF, BRISK, DAISY, FREAK, KAZE, LATCH, ORB, SIFT, and VGG were evaluated according to reprojection error (RE), root mean square error (RMSE), structural similarity index measure (SSIM), registration failure rate, and feature number, based on 1,950,984 image registrations. No meaningful benefits from color space or channel were observed, although XYZ, RGB color space and L* color channel were able to outperform grayscale by a very minor margin. Per the dataset, the best-performing color space or channel, detector, and descriptor were XYZ/RGB, SIFT/FAST, and AKAZE. The most robust color space or channel, detector, and descriptor were L*a*b*, TBMR, and VGG. The color channel, detector, and descriptor with the most initial detector features and final homography features were Z/L*, FAST, and KAZE. In terms of the best overall unfailing combinations, XYZ/RGB+SIFT/FAST+VGG/SIFT seemed to provide the highest image registration quality, while Z+FAST+VGG provided the most image features. Full article

(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)

► Show Figures

Figure 1

16 pages, 2688 KiB

Open AccessArticle

Derivative-Free Iterative One-Step Reconstruction for Multispectral CT

by Thomas Prohaszka, Lukas Neumann and Markus Haltmeier

J. Imaging 2024, 10(5), 98; https://doi.org/10.3390/jimaging10050098 (registering DOI) - 24 Apr 2024

Viewed by 198

Abstract

Image reconstruction in multispectral computed tomography (MSCT) requires solving a challenging nonlinear inverse problem, commonly tackled via iterative optimization algorithms. Existing methods necessitate computing the derivative of the forward map and potentially its regularized inverse. In this work, we present a simple yet [...] Read more.

Image reconstruction in multispectral computed tomography (MSCT) requires solving a challenging nonlinear inverse problem, commonly tackled via iterative optimization algorithms. Existing methods necessitate computing the derivative of the forward map and potentially its regularized inverse. In this work, we present a simple yet highly effective algorithm for MSCT image reconstruction, utilizing iterative update mechanisms that leverage the full forward model in the forward step and a derivative-free adjoint problem. Our approach demonstrates both fast convergence and superior performance compared to existing algorithms, making it an interesting candidate for future work. We also discuss further generalizations of our method and its combination with additional regularization and other data discrepancy terms. Full article

(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)

18 pages, 10105 KiB

Open AccessArticle

Multi-View Gait Analysis by Temporal Geometric Features of Human Body Parts

by Thanyamon Pattanapisont, Kazunori Kotani, Prarinya Siritanawan, Toshiaki Kondo and Jessada Karnjana

J. Imaging 2024, 10(4), 88; https://doi.org/10.3390/jimaging10040088 - 09 Apr 2024

Viewed by 496

Abstract

A gait is a walking pattern that can help identify a person. Recently, gait analysis employed a vision-based pose estimation for further feature extraction. This research aims to identify a person by analyzing their walking pattern. Moreover, the authors intend to expand gait [...] Read more.

A gait is a walking pattern that can help identify a person. Recently, gait analysis employed a vision-based pose estimation for further feature extraction. This research aims to identify a person by analyzing their walking pattern. Moreover, the authors intend to expand gait analysis for other tasks, e.g., the analysis of clinical, psychological, and emotional tasks. The vision-based human pose estimation method is used in this study to extract the joint angles and rank correlation between them. We deploy the multi-view gait databases for the experiment, i.e., CASIA-B and OUMVLP-Pose. The features are separated into three parts, i.e., whole, upper, and lower body features, to study the effect of the human body part features on an analysis of the gait. For person identity matching, a minimum Dynamic Time Warping (DTW) distance is determined. Additionally, we apply a majority voting algorithm to integrate the separated matching results from multiple cameras to enhance accuracy, and it improved up to approximately 30% compared to matching without majority voting. Full article

(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)

► Show Figures

Graphical abstract

15 pages, 3905 KiB

Open AccessArticle

An Efficient and Effective Image Decolorization Algorithm Based on Cumulative Distribution Function

by Tirui Wu, Ciaran Eising, Martin Glavin and Edward Jones

J. Imaging 2024, 10(3), 51; https://doi.org/10.3390/jimaging10030051 - 20 Feb 2024

Viewed by 1213

Abstract

Image decolorization is an image pre-processing step which is widely used in image analysis, computer vision, and printing applications. The most commonly used methods give each color channel (e.g., the R component in RGB format, or the Y component of an image in [...] Read more.

Image decolorization is an image pre-processing step which is widely used in image analysis, computer vision, and printing applications. The most commonly used methods give each color channel (e.g., the R component in RGB format, or the Y component of an image in CIE-XYZ format) a constant weight without considering image content. This approach is simple and fast, but it may cause significant information loss when images contain too many isoluminant colors. In this paper, we propose a new method which is not only efficient, but also can preserve a higher level of image contrast and detail than the traditional methods. It uses the information from the cumulative distribution function (CDF) of the information in each color channel to compute a weight for each pixel in each color channel. Then, these weights are used to combine the three color channels (red, green, and blue) to obtain the final grayscale value. The algorithm works in RGB color space directly without any color conversion. In order to evaluate the proposed algorithm objectively, two new metrics are also developed. Experimental results show that the proposed algorithm can run as efficiently as the traditional methods and obtain the best overall performance across four different metrics. Full article

(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)

► Show Figures

Figure 1

18 pages, 2942 KiB

Open AccessArticle

Point Projection Mapping System for Tracking, Registering, Labeling, and Validating Optical Tissue Measurements

by Lianne Feenstra, Stefan D. van der Stel, Marcos Da Silva Guimaraes, Behdad Dashtbozorg and Theo J. M. Ruers

J. Imaging 2024, 10(2), 37; https://doi.org/10.3390/jimaging10020037 - 30 Jan 2024

Viewed by 1467

Abstract

The validation of newly developed optical tissue-sensing techniques for tumor detection during cancer surgery requires an accurate correlation with the histological results. Additionally, such an accurate correlation facilitates precise data labeling for developing high-performance machine learning tissue-classification models. In this paper, a newly [...] Read more.

The validation of newly developed optical tissue-sensing techniques for tumor detection during cancer surgery requires an accurate correlation with the histological results. Additionally, such an accurate correlation facilitates precise data labeling for developing high-performance machine learning tissue-classification models. In this paper, a newly developed Point Projection Mapping system will be introduced, which allows non-destructive tracking of the measurement locations on tissue specimens. Additionally, a framework for accurate registration, validation, and labeling with the histopathology results is proposed and validated on a case study. The proposed framework provides a more-robust and accurate method for the tracking and validation of optical tissue-sensing techniques, which saves time and resources compared to the available conventional techniques. Full article

(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)

► Show Figures

Figure 1

14 pages, 1335 KiB

Open AccessArticle

Automated Coronary Artery Tracking with a Voronoi-Based 3D Centerline Extraction Algorithm

by Rodrigo Dalvit Carvalho da Silva, Ramin Soltanzadeh and Chase R. Figley

J. Imaging 2023, 9(12), 268; https://doi.org/10.3390/jimaging9120268 - 01 Dec 2023

Viewed by 1532

Abstract

Coronary artery disease is one of the leading causes of death worldwide, and medical imaging methods such as coronary artery computed tomography are vitally important in its detection. More recently, various computational approaches have been proposed to automatically extract important artery coronary features [...] Read more.

Coronary artery disease is one of the leading causes of death worldwide, and medical imaging methods such as coronary artery computed tomography are vitally important in its detection. More recently, various computational approaches have been proposed to automatically extract important artery coronary features (e.g., vessel centerlines, cross-sectional areas along vessel branches, etc.) that may ultimately be able to assist with more accurate and timely diagnoses. The current study therefore validated and benchmarked a recently developed automated 3D centerline extraction method for coronary artery centerline tracking using synthetically segmented coronary artery models based on the widely used Rotterdam Coronary Artery Algorithm Evaluation Framework (RCAAEF) training dataset. Based on standard accuracy metrics and the ground truth centerlines of all 32 coronary vessel branches in the RCAAEF training dataset, this 3D divide and conquer Voronoi diagram method performed exceptionally well, achieving an average overlap accuracy (OV) of 99.97%, overlap until first error (OF) of 100%, overlap of the clinically relevant portion of the vessel (OT) of 99.98%, and an average error distance inside the vessels (AI) of only 0.13 mm. Accuracy was also found to be exceptionally for all four coronary artery sub-types, with average OV values of 99.99% for right coronary arteries, 100% for left anterior descending arteries, 99.96% for left circumflex arteries, and 100% for large side-branch vessels. These results validate that the proposed method can be employed to quickly, accurately, and automatically extract 3D centerlines from segmented coronary arteries, and indicate that it is likely worthy of further exploration given the importance of this topic. Full article

(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)

► Show Figures

Figure 1

14 pages, 2634 KiB

Open AccessArticle

Innovative Bacterial Colony Detection: Leveraging Multi-Feature Selection with the Improved Salp Swarm Algorithm

by Ahmad Ihsan, Khairul Muttaqin, Rahmatul Fajri, Mursyidah Mursyidah and Islam Md Rizwanul Fattah

J. Imaging 2023, 9(12), 263; https://doi.org/10.3390/jimaging9120263 - 28 Nov 2023

Viewed by 1381

Abstract

In this paper, we introduce a new and advanced multi-feature selection method for bacterial classification that uses the salp swarm algorithm (SSA). We improve the SSA’s performance by using opposition-based learning (OBL) and a local search algorithm (LSA). The proposed method has three [...] Read more.

In this paper, we introduce a new and advanced multi-feature selection method for bacterial classification that uses the salp swarm algorithm (SSA). We improve the SSA’s performance by using opposition-based learning (OBL) and a local search algorithm (LSA). The proposed method has three main stages, which automate the categorization of bacteria based on their unique characteristics. The method uses a multi-feature selection approach augmented by an enhanced version of the SSA. The enhancements include using OBL to increase population diversity during the search process and LSA to address local optimization problems. The improved salp swarm algorithm (ISSA) is designed to optimize multi-feature selection by increasing the number of selected features and improving classification accuracy. We compare the ISSA’s performance to that of several other algorithms on ten different test datasets. The results show that the ISSA outperforms the other algorithms in terms of classification accuracy on three datasets with 19 features, achieving an accuracy of 73.75%. Additionally, the ISSA excels at determining the optimal number of features and producing a better fit value, with a classification error rate of 0.249. Therefore, the ISSA method is expected to make a significant contribution to solving feature selection problems in bacterial analysis. Full article

(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)

► Show Figures

Figure 1

20 pages, 44749 KiB

Open AccessArticle

Impact of ISP Tuning on Object Detection

by Dara Molloy, Brian Deegan, Darragh Mullins, Enda Ward, Jonathan Horgan, Ciaran Eising, Patrick Denny, Edward Jones and Martin Glavin

J. Imaging 2023, 9(12), 260; https://doi.org/10.3390/jimaging9120260 - 24 Nov 2023

Cited by 1 | Viewed by 3146

Abstract

In advanced driver assistance systems (ADAS) or autonomous vehicle research, acquiring semantic information about the surrounding environment generally relies heavily on camera-based object detection. Image signal processors (ISPs) in cameras are generally tuned for human perception. In most cases, ISP parameters are selected [...] Read more.

In advanced driver assistance systems (ADAS) or autonomous vehicle research, acquiring semantic information about the surrounding environment generally relies heavily on camera-based object detection. Image signal processors (ISPs) in cameras are generally tuned for human perception. In most cases, ISP parameters are selected subjectively and the resulting image differs depending on the individual who tuned it. While the installation of cameras on cars started as a means of providing a view of the vehicle’s environment to the driver, cameras are increasingly becoming part of safety-critical object detection systems for ADAS. Deep learning-based object detection has become prominent, but the effect of varying the ISP parameters has an unknown performance impact. In this study, we analyze the performance of 14 popular object detection models in the context of changes in the ISP parameters. We consider eight ISP blocks: demosaicing, gamma, denoising, edge enhancement, local tone mapping, saturation, contrast, and hue angle. We investigate two raw datasets, PASCALRAW and a custom raw dataset collected from an advanced driver assistance system (ADAS) perspective. We found that varying from a default ISP degrades the object detection performance and that the models differ in sensitivity to varying ISP parameters. Finally, we propose a novel methodology that increases object detection model robustness via ISP variation data augmentation. Full article

(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)

► Show Figures

Figure 1

21 pages, 4044 KiB

Open AccessArticle

A Point-Cloud Segmentation Network Based on SqueezeNet and Time Series for Plants

by Xingshuo Peng, Keyuan Wang, Zelin Zhang, Nan Geng and Zhiyi Zhang

J. Imaging 2023, 9(12), 258; https://doi.org/10.3390/jimaging9120258 - 23 Nov 2023

Cited by 1 | Viewed by 1890

Abstract

The phenotyping of plant growth enriches our understanding of intricate genetic characteristics, paving the way for advancements in modern breeding and precision agriculture. Within the domain of phenotyping, segmenting 3D point clouds of plant organs is the basis of extracting plant phenotypic parameters. [...] Read more.

The phenotyping of plant growth enriches our understanding of intricate genetic characteristics, paving the way for advancements in modern breeding and precision agriculture. Within the domain of phenotyping, segmenting 3D point clouds of plant organs is the basis of extracting plant phenotypic parameters. In this study, we introduce a novel method for point-cloud downsampling that adeptly mitigates the challenges posed by sample imbalances. In subsequent developments, we architect a deep learning framework founded on the principles of SqueezeNet for the segmentation of plant point clouds. In addition, we also use the time series as input variables, which effectively improves the segmentation accuracy of the network. Based on semantic segmentation, the MeanShift algorithm is employed to execute instance segmentation on the point-cloud data of crops. In semantic segmentation, the average Precision, Recall, F1-score, and IoU of maize reached 99.35%, 99.26%, 99.30%, and 98.61%, and the average Precision, Recall, F1-score, and IoU of tomato reached 97.98%, 97.92%, 97.95%, and 95.98%. In instance segmentation, the accuracy of maize and tomato reached 98.45% and 96.12%. This research holds the potential to advance the fields of plant phenotypic extraction, ideotype selection, and precision agriculture. Full article

(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)

► Show Figures

Figure 1

18 pages, 5809 KiB

Open AccessArticle

Understanding Error Patterns: An Analysis of Alignment Errors in Rigid 3D Body Scans

by Julian Meißner, Michael Kisiel, Nagarajan M. Thoppey, Michael M. Morlock and Sebastian Bannwarth

J. Imaging 2023, 9(12), 255; https://doi.org/10.3390/jimaging9120255 - 21 Nov 2023

Viewed by 1327

Abstract

Three-dimensional body scanners are attracting increasing interest in various application areas. To evaluate their accuracy, their 3D point clouds must be compared to a reference system by using a reference object. Since different scanning systems use different coordinate systems, an alignment is required [...] Read more.

Three-dimensional body scanners are attracting increasing interest in various application areas. To evaluate their accuracy, their 3D point clouds must be compared to a reference system by using a reference object. Since different scanning systems use different coordinate systems, an alignment is required for their evaluation. However, this process can result in translational and rotational misalignment. To understand the effects of alignment errors on the accuracy of measured circumferences of the human lower body, such misalignment is simulated in this paper and the resulting characteristic error patterns are analyzed. The results show that the total error consists of two components, namely translational and tilt. Linear correlations were found between the translational error (R² = 0.90, … 0.97) and the change in circumferences as well as between the tilt error (R² = 0.55, … 0.78) and the change in the body’s mean outline. Finally, by systematic analysis of the error patterns, recommendations were derived and applied to 3D body scans of human subjects resulting in a reduction of error by 67% and 84%. Full article

(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)

► Show Figures

Figure 1

14 pages, 2336 KiB

Open AccessArticle

An Automatic Pixel-Wise Multi-Penalty Approach to Image Restoration

by Villiam Bortolotti, Germana Landi and Fabiana Zama

J. Imaging 2023, 9(11), 249; https://doi.org/10.3390/jimaging9110249 - 15 Nov 2023

Viewed by 1173

Abstract

This work tackles the problem of image restoration, a crucial task in many fields of applied sciences, focusing on removing degradation caused by blur and noise during the acquisition process. Drawing inspiration from the multi-penalty approach based on the Uniform Penalty principle, discussed [...] Read more.

This work tackles the problem of image restoration, a crucial task in many fields of applied sciences, focusing on removing degradation caused by blur and noise during the acquisition process. Drawing inspiration from the multi-penalty approach based on the Uniform Penalty principle, discussed in previous work, here we develop a new image restoration model and an iterative algorithm for its effective solution. The model incorporates pixel-wise regularization terms and establishes a rule for parameter selection, aiming to restore images through the solution of a sequence of constrained optimization problems. To achieve this, we present a modified version of the Newton Projection method, adapted to multi-penalty scenarios, and prove its convergence. Numerical experiments demonstrate the efficacy of the method in eliminating noise and blur while preserving the image edges. Full article

(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)

► Show Figures

Figure 1

13 pages, 1754 KiB

Open AccessArticle

Breast Cancer Detection with an Ensemble of Deep Learning Networks Using a Consensus-Adaptive Weighting Method

by Mohammad Dehghan Rouzi, Behzad Moshiri, Mohammad Khoshnevisan, Mohammad Ali Akhaee, Farhang Jaryani, Samaneh Salehi Nasab and Myeounggon Lee

J. Imaging 2023, 9(11), 247; https://doi.org/10.3390/jimaging9110247 - 13 Nov 2023

Cited by 3 | Viewed by 2300

Abstract

Breast cancer’s high mortality rate is often linked to late diagnosis, with mammograms as key but sometimes limited tools in early detection. To enhance diagnostic accuracy and speed, this study introduces a novel computer-aided detection (CAD) ensemble system. This system incorporates advanced deep [...] Read more.

Breast cancer’s high mortality rate is often linked to late diagnosis, with mammograms as key but sometimes limited tools in early detection. To enhance diagnostic accuracy and speed, this study introduces a novel computer-aided detection (CAD) ensemble system. This system incorporates advanced deep learning networks—EfficientNet, Xception, MobileNetV2, InceptionV3, and Resnet50—integrated via our innovative consensus-adaptive weighting (CAW) method. This method permits the dynamic adjustment of multiple deep networks, bolstering the system’s detection capabilities. Our approach also addresses a major challenge in pixel-level data annotation of faster R-CNNs, highlighted in a prominent previous study. Evaluations on various datasets, including the cropped DDSM (Digital Database for Screening Mammography), DDSM, and INbreast, demonstrated the system’s superior performance. In particular, our CAD system showed marked improvement on the cropped DDSM dataset, enhancing detection rates by approximately 1.59% and achieving an accuracy of 95.48%. This innovative system represents a significant advancement in early breast cancer detection, offering the potential for more precise and timely diagnosis, ultimately fostering improved patient outcomes. Full article

(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)

► Show Figures

Figure 1

30 pages, 10861 KiB

Open AccessArticle

Human Activity Recognition Using Cascaded Dual Attention CNN and Bi-Directional GRU Framework

by Hayat Ullah and Arslan Munir

J. Imaging 2023, 9(7), 130; https://doi.org/10.3390/jimaging9070130 - 26 Jun 2023

Cited by 2 | Viewed by 1932

Abstract

Vision-based human activity recognition (HAR) has emerged as one of the essential research areas in video analytics. Over the last decade, numerous advanced deep learning algorithms have been introduced to recognize complex human actions from video streams. These deep learning algorithms have shown [...] Read more.

Vision-based human activity recognition (HAR) has emerged as one of the essential research areas in video analytics. Over the last decade, numerous advanced deep learning algorithms have been introduced to recognize complex human actions from video streams. These deep learning algorithms have shown impressive performance for the video analytics task. However, these newly introduced methods either exclusively focus on model performance or the effectiveness of these models in terms of computational efficiency, resulting in a biased trade-off between robustness and computational efficiency in their proposed methods to deal with challenging HAR problem. To enhance both the accuracy and computational efficiency, this paper presents a computationally efficient yet generic spatial–temporal cascaded framework that exploits the deep discriminative spatial and temporal features for HAR. For efficient representation of human actions, we propose an efficient dual attentional convolutional neural network (DA-CNN) architecture that leverages a unified channel–spatial attention mechanism to extract human-centric salient features in video frames. The dual channel–spatial attention layers together with the convolutional layers learn to be more selective in the spatial receptive fields having objects within the feature maps. The extracted discriminative salient features are then forwarded to a stacked bi-directional gated recurrent unit (Bi-GRU) for long-term temporal modeling and recognition of human actions using both forward and backward pass gradient learning. Extensive experiments are conducted on three publicly available human action datasets, where the obtained results verify the effectiveness of our proposed framework (DA-CNN+Bi-GRU) over the state-of-the-art methods in terms of model accuracy and inference runtime across each dataset. Experimental results show that the DA-CNN+Bi-GRU framework attains an improvement in execution time up to 167× in terms of frames per second as compared to most of the contemporary action-recognition methods. Full article

(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)

► Show Figures

Figure 1

21 pages, 11132 KiB

Open AccessArticle

The Effectiveness of Pan-Sharpening Algorithms on Different Land Cover Types in GeoEye-1 Satellite Images

by Emanuele Alcaras and Claudio Parente

J. Imaging 2023, 9(5), 93; https://doi.org/10.3390/jimaging9050093 - 30 Apr 2023

Cited by 2 | Viewed by 1774

Abstract

In recent years, the demand for very high geometric resolution satellite images has increased significantly. The pan-sharpening techniques, which are part of the data fusion techniques, enable the increase in the geometric resolution of multispectral images using panchromatic imagery of the same scene. [...] Read more.

In recent years, the demand for very high geometric resolution satellite images has increased significantly. The pan-sharpening techniques, which are part of the data fusion techniques, enable the increase in the geometric resolution of multispectral images using panchromatic imagery of the same scene. However, it is not trivial to choose a suitable pan-sharpening algorithm: there are several, but none of these is universally recognized as the best for any type of sensor, in addition to the fact that they can provide different results with regard to the investigated scene. This article focuses on the latter aspect: analyzing pan-sharpening algorithms in relation to different land covers. A dataset of GeoEye-1 images is selected from which four study areas (frames) are extracted: one natural, one rural, one urban and one semi-urban. The type of study area is determined considering the quantity of vegetation included in it based on the normalized difference vegetation index (NDVI). Nine pan-sharpening methods are applied to each frame and the resulting pan-sharpened images are compared by means of spectral and spatial quality indicators. Multicriteria analysis permits to define the best performing method related to each specific area as well as the most suitable one, considering the co-presence of different land covers in the analyzed scene. Brovey transformation fast supplies the best results among the methods analyzed in this study. Full article

(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)

► Show Figures

Figure 1

27 pages, 5914 KiB

Open AccessArticle

A 3DCNN-Based Knowledge Distillation Framework for Human Activity Recognition

by Hayat Ullah and Arslan Munir

J. Imaging 2023, 9(4), 82; https://doi.org/10.3390/jimaging9040082 - 14 Apr 2023

Cited by 1 | Viewed by 1915

Abstract

Human action recognition has been actively explored over the past two decades to further advancements in video analytics domain. Numerous research studies have been conducted to investigate the complex sequential patterns of human actions in video streams. In this paper, we propose a [...] Read more.

Human action recognition has been actively explored over the past two decades to further advancements in video analytics domain. Numerous research studies have been conducted to investigate the complex sequential patterns of human actions in video streams. In this paper, we propose a knowledge distillation framework, which distills spatio-temporal knowledge from a large teacher model to a lightweight student model using an offline knowledge distillation technique. The proposed offline knowledge distillation framework takes two models: a large pre-trained 3DCNN (three-dimensional convolutional neural network) teacher model and a lightweight 3DCNN student model (i.e., the teacher model is pre-trained on the same dataset on which the student model is to be trained on). During offline knowledge distillation training, the distillation algorithm trains only the student model to help enable the student model to achieve the same level of prediction accuracy as the teacher model. To evaluate the performance of the proposed method, we conduct extensive experiments on four benchmark human action datasets. The obtained quantitative results verify the efficiency and robustness of the proposed method over the state-of-the-art human action recognition methods by obtaining up to 35% improvement in accuracy over existing methods. Furthermore, we evaluate the inference time of the proposed method and compare the obtained results with the inference time of the state-of-the-art methods. Experimental results reveal that the proposed method attains an improvement of up to 50× in terms of frames per seconds (FPS) over the state-of-the-art methods. The short inference time and high accuracy make our proposed framework suitable for human activity recognition in real-time applications. Full article

(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)

► Show Figures

Figure 1

Review

Jump to: Research

20 pages, 4022 KiB

Open AccessReview

Review of Image-Processing-Based Technology for Structural Health Monitoring of Civil Infrastructures

by Ji-Woo Kim, Hee-Wook Choi, Sung-Keun Kim and Wongi S. Na

J. Imaging 2024, 10(4), 93; https://doi.org/10.3390/jimaging10040093 - 16 Apr 2024

Viewed by 608

Abstract

The continuous monitoring of civil infrastructures is crucial for ensuring public safety and extending the lifespan of structures. In recent years, image-processing-based technologies have emerged as powerful tools for the structural health monitoring (SHM) of civil infrastructures. This review provides a comprehensive overview [...] Read more.

The continuous monitoring of civil infrastructures is crucial for ensuring public safety and extending the lifespan of structures. In recent years, image-processing-based technologies have emerged as powerful tools for the structural health monitoring (SHM) of civil infrastructures. This review provides a comprehensive overview of the advancements, applications, and challenges associated with image processing in the field of SHM. The discussion encompasses various imaging techniques such as satellite imagery, Light Detection and Ranging (LiDAR), optical cameras, and other non-destructive testing methods. Key topics include the use of image processing for damage detection, crack identification, deformation monitoring, and overall structural assessment. This review explores the integration of artificial intelligence and machine learning techniques with image processing for enhanced automation and accuracy in SHM. By consolidating the current state of image-processing-based technology for SHM, this review aims to show the full potential of image-based approaches for researchers, engineers, and professionals involved in civil engineering, SHM, image processing, and related fields. Full article

(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)

► Show Figures

Figure 1

31 pages, 2698 KiB

Open AccessReview

Decision Fusion at Pixel Level of Multi-Band Data for Land Cover Classification—A Review

by Spiros Papadopoulos, Georgia Koukiou and Vassilis Anastassopoulos

J. Imaging 2024, 10(1), 15; https://doi.org/10.3390/jimaging10010015 - 05 Jan 2024

Viewed by 1605

Abstract

According to existing signatures for various kinds of land cover coming from different spectral bands, i.e., optical, thermal infrared and PolSAR, it is possible to infer about the land cover type having a single decision from each of the spectral bands. Fusing these [...] Read more.

According to existing signatures for various kinds of land cover coming from different spectral bands, i.e., optical, thermal infrared and PolSAR, it is possible to infer about the land cover type having a single decision from each of the spectral bands. Fusing these decisions, it is possible to radically improve the reliability of the decision regarding each pixel, taking into consideration the correlation of the individual decisions of the specific pixel as well as additional information transferred from the pixels’ neighborhood. Different remotely sensed data contribute their own information regarding the characteristics of the materials lying in each separate pixel. Hyperspectral and multispectral images give analytic information regarding the reflectance of each pixel in a very detailed manner. Thermal infrared images give valuable information regarding the temperature of the surface covered by each pixel, which is very important for recording thermal locations in urban regions. Finally, SAR data provide structural and electrical characteristics of each pixel. Combining information from some of these sources further improves the capability for reliable categorization of each pixel. The necessary mathematical background regarding pixel-based classification and decision fusion methods is analytically presented. Full article

(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)

► Show Figures

Figure 1

38 pages, 20530 KiB

Open AccessReview

A Systematic Review of Recent Deep Learning Approaches for 3D Human Pose Estimation

by Amal El Kaid and Karim Baïna

J. Imaging 2023, 9(12), 275; https://doi.org/10.3390/jimaging9120275 - 12 Dec 2023

Viewed by 3199

Abstract

Three-dimensional human pose estimation has made significant advancements through the integration of deep learning techniques. This survey provides a comprehensive review of recent 3D human pose estimation methods, with a focus on monocular images, videos, and multi-view cameras. Our approach stands out through [...] Read more.

Three-dimensional human pose estimation has made significant advancements through the integration of deep learning techniques. This survey provides a comprehensive review of recent 3D human pose estimation methods, with a focus on monocular images, videos, and multi-view cameras. Our approach stands out through a systematic literature review methodology, ensuring an up-to-date and meticulous overview. Unlike many existing surveys that categorize approaches based on learning paradigms, our survey offers a fresh perspective, delving deeper into the subject. For image-based approaches, we not only follow existing categorizations but also introduce and compare significant 2D models. Additionally, we provide a comparative analysis of these methods, enhancing the understanding of image-based pose estimation techniques. In the realm of video-based approaches, we categorize them based on the types of models used to capture inter-frame information. Furthermore, in the context of multi-person pose estimation, our survey uniquely differentiates between approaches focusing on relative poses and those addressing absolute poses. Our survey aims to serve as a pivotal resource for researchers, highlighting state-of-the-art deep learning strategies and identifying promising directions for future exploration in 3D human pose estimation. Full article

(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)

► Show Figures

Figure 1

Journal Menu

Journal Browser

Image Processing and Computer Vision: Algorithms and Applications

Share This Special Issue

Special Issue Editor

Special Issue Information

Keywords

Published Papers (18 papers)

Research

Review

Further Information

Guidelines

MDPI Initiatives

Follow MDPI