Image Processing and Computer Vision: Algorithms and Applications

A special issue of Journal of Imaging (ISSN 2313-433X).

Deadline for manuscript submissions: 30 June 2024 | Viewed by 29595

Special Issue Editor


Dr. Arslan Munir
Guest Editor
Department of Computer Science, Kansas State University, Manhattan, KS 66506, USA
Interests: artificial intelligence; computer vision; parallel computing; embedded systems; secure and trustworthy systems

Special Issue Information

Dear Colleagues,

Modern image processing is a process of transforming an image into a digital form and using computing systems to process, manipulate, and/or enhance digital images through various algorithms. Image processing is also a requisite for many computer vision tasks as it helps to pre-process images and prepares the data in a form which is suitable for various computer vision models. Computer vision generally refers to techniques that enable computers to understand and make sense of images. Computer vision enables machines to extract latent information from visual data and to mimic the human perception of sight with computational algorithms. Active research is ongoing on developing novel image processing and computer vision algorithms, including deep learning-based algorithms for enabling new and fascinating applications. Advances in image processing and computer vision have enabled many exciting new applications, such as autonomous vehicles, unmanned aerial vehicles, computational photography, augmented reality, surveillance, optical character recognition, machine inspection, autonomous package delivery, photogrammetry, biometrics, the computer-aided inspection of medical images, and remote patient monitoring. Image processing and computer vision have applications in various domains including healthcare, transportation, retail, agriculture, business, manufacturing, construction, space, and the military.

This Special Issue focuses on algorithms and applications of image processing and computer vision. This Special Issue invites original research articles and reviews that relate to computing, architecture, algorithms, security, and applications of image processing and computer vision. The topics of interest include, but are not limited to, the following:

  • Image interpretation;
  • Object detection and recognition;
  • Spatial artificial intelligence;
  • Event detection and activity recognition;
  • Image segmentation;
  • Video classification and analysis;
  • Face and gesture recognition;
  • Pose estimation;
  • Computational photography;
  • Image security;
  • Vision hardware and/or software architectures;
  • Image/vision acceleration techniques;
  • Monitoring and surveillance;
  • Situational awareness.

Dr. Arslan Munir
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Journal of Imaging is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image processing
  • computer vision
  • image fusion
  • vision algorithms
  • deep learning
  • stereo vision
  • activity recognition
  • image/video analysis
  • image encryption
  • computational photography
  • vision hardware/software
  • monitoring and surveillance
  • biometrics
  • robotics
  • augmented reality

Published Papers (18 papers)


Research


25 pages, 9712 KiB  
Article
Comparative Analysis of Color Space and Channel, Detector, and Descriptor for Feature-Based Image Registration
by Wenan Yuan, Sai Raghavendra Prasad Poosa and Rutger Francisco Dirks
J. Imaging 2024, 10(5), 105; https://doi.org/10.3390/jimaging10050105 - 28 Apr 2024
Viewed by 204
Abstract
The current study aimed to quantify the value of color spaces and channels as a potentially superior replacement for standard grayscale images, as well as the relative performance of open-source detectors and descriptors for general feature-based image registration purposes, based on a large benchmark dataset. The public dataset UDIS-D, with 1106 diverse image pairs, was selected. In total, 21 color spaces or channels including RGB, XYZ, Y′CrCb, HLS, L*a*b* and their corresponding channels in addition to grayscale, nine feature detectors including AKAZE, BRISK, CSE, FAST, HL, KAZE, ORB, SIFT, and TBMR, and 11 feature descriptors including AKAZE, BB, BRIEF, BRISK, DAISY, FREAK, KAZE, LATCH, ORB, SIFT, and VGG were evaluated according to reprojection error (RE), root mean square error (RMSE), structural similarity index measure (SSIM), registration failure rate, and feature number, based on 1,950,984 image registrations. No meaningful benefits from color space or channel were observed, although the XYZ and RGB color spaces and the L* color channel were able to outperform grayscale by a very minor margin. Per the dataset, the best-performing color space or channel, detector, and descriptor were XYZ/RGB, SIFT/FAST, and AKAZE. The most robust color space or channel, detector, and descriptor were L*a*b*, TBMR, and VGG. The color channel, detector, and descriptor with the most initial detector features and final homography features were Z/L*, FAST, and KAZE. In terms of the best overall unfailing combinations, XYZ/RGB+SIFT/FAST+VGG/SIFT seemed to provide the highest image registration quality, while Z+FAST+VGG provided the most image features.
(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)
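Two of the evaluation metrics named in the abstract above, RMSE and reprojection error, can be illustrated with a minimal sketch. The function names and the exact definitions used here (mean point-to-point distance after applying a 3x3 homography, and RMSE over flattened pixel values) are assumptions for illustration; the paper may define and compute them differently.

```python
import math

def rmse(a, b):
    """Root mean square error between two equal-length sequences of pixel values."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def reprojection_error(H, src_pts, dst_pts):
    """Mean Euclidean distance between source points mapped through the
    3x3 homography H (nested lists) and their matched destination points."""
    total = 0.0
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        w = H[2][0] * x + H[2][1] * y + H[2][2]          # homogeneous scale
        px = (H[0][0] * x + H[0][1] * y + H[0][2]) / w   # projected x
        py = (H[1][0] * x + H[1][1] * y + H[1][2]) / w   # projected y
        total += math.hypot(px - u, py - v)
    return total / len(src_pts)

# Hypothetical example: a pure translation by (2, 3) maps the points exactly.
H = [[1, 0, 2], [0, 1, 3], [0, 0, 1]]
print(reprojection_error(H, [(0, 0), (1, 1)], [(2, 3), (3, 4)]))
```

A lower value of either metric indicates a closer match between the registered image (or projected keypoints) and the reference.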

16 pages, 2688 KiB  
Article
Derivative-Free Iterative One-Step Reconstruction for Multispectral CT
by Thomas Prohaszka, Lukas Neumann and Markus Haltmeier
J. Imaging 2024, 10(5), 98; https://doi.org/10.3390/jimaging10050098 (registering DOI) - 24 Apr 2024
Viewed by 198
Abstract
Image reconstruction in multispectral computed tomography (MSCT) requires solving a challenging nonlinear inverse problem, commonly tackled via iterative optimization algorithms. Existing methods necessitate computing the derivative of the forward map and potentially its regularized inverse. In this work, we present a simple yet highly effective algorithm for MSCT image reconstruction, utilizing iterative update mechanisms that leverage the full forward model in the forward step and a derivative-free adjoint problem. Our approach demonstrates both fast convergence and superior performance compared to existing algorithms, making it an interesting candidate for future work. We also discuss further generalizations of our method and its combination with additional regularization and other data discrepancy terms.
18 pages, 10105 KiB  
Article
Multi-View Gait Analysis by Temporal Geometric Features of Human Body Parts
by Thanyamon Pattanapisont, Kazunori Kotani, Prarinya Siritanawan, Toshiaki Kondo and Jessada Karnjana
J. Imaging 2024, 10(4), 88; https://doi.org/10.3390/jimaging10040088 - 09 Apr 2024
Viewed by 496
Abstract
A gait is a walking pattern that can help identify a person. Recently, gait analysis has employed vision-based pose estimation for further feature extraction. This research aims to identify a person by analyzing their walking pattern. Moreover, the authors intend to expand gait analysis to other tasks, e.g., the analysis of clinical, psychological, and emotional tasks. The vision-based human pose estimation method is used in this study to extract the joint angles and the rank correlation between them. We use multi-view gait databases for the experiment, i.e., CASIA-B and OUMVLP-Pose. The features are separated into three parts, i.e., whole, upper, and lower body features, to study the effect of the human body part features on an analysis of the gait. For person identity matching, a minimum Dynamic Time Warping (DTW) distance is determined. Additionally, we apply a majority voting algorithm to integrate the separate matching results from multiple cameras to enhance accuracy, which improved by up to approximately 30% compared to matching without majority voting.
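The matching step described above (minimum DTW distance per camera, then majority voting across cameras) can be sketched as follows. This is a minimal illustration on 1-D sequences with an absolute-difference cost; the function names, the toy gallery, and the per-camera predictions are hypothetical, and the paper's features and cost function may differ.

```python
from collections import Counter

def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two 1-D sequences,
    using the absolute difference as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed warping moves.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def match_identity(probe, gallery):
    """Return the gallery ID whose sequence has the minimum DTW distance to the probe."""
    return min(gallery, key=lambda pid: dtw_distance(probe, gallery[pid]))

def majority_vote(labels):
    """Return the most frequent label among per-camera predictions."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical joint-angle sequences for two enrolled subjects.
gallery = {"A": [0, 1, 2, 1, 0], "B": [5, 6, 7, 6, 5]}
per_camera = [match_identity([0, 1, 1, 2, 1, 0], gallery), "B", "A"]
print(majority_vote(per_camera))
```

DTW tolerates differences in walking speed because a single sample in one sequence may align with several samples in the other.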

15 pages, 3905 KiB  
Article
An Efficient and Effective Image Decolorization Algorithm Based on Cumulative Distribution Function
by Tirui Wu, Ciaran Eising, Martin Glavin and Edward Jones
J. Imaging 2024, 10(3), 51; https://doi.org/10.3390/jimaging10030051 - 20 Feb 2024
Viewed by 1213
Abstract
Image decolorization is an image pre-processing step which is widely used in image analysis, computer vision, and printing applications. The most commonly used methods give each color channel (e.g., the R component in RGB format, or the Y component of an image in CIE-XYZ format) a constant weight without considering image content. This approach is simple and fast, but it may cause significant information loss when images contain too many isoluminant colors. In this paper, we propose a new method which is not only efficient but can also preserve a higher level of image contrast and detail than the traditional methods. It uses the cumulative distribution function (CDF) of the information in each color channel to compute a weight for each pixel in each color channel. Then, these weights are used to combine the three color channels (red, green, and blue) to obtain the final grayscale value. The algorithm works in RGB color space directly without any color conversion. In order to evaluate the proposed algorithm objectively, two new metrics are also developed. Experimental results show that the proposed algorithm can run as efficiently as the traditional methods and obtain the best overall performance across four different metrics.
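The limitation of constant-weight conversion noted above can be shown with a short sketch. The Rec. 601 luma weights used here are an assumed baseline (a common choice, not necessarily the one benchmarked in the paper), and the two test colors are hypothetical. The abstract does not state how the proposed method turns the CDF into per-pixel weights, so the sketch stops at computing an empirical per-channel CDF.

```python
# Rec. 601 luma weights: an assumed constant-weight baseline.
WEIGHTS = (0.299, 0.587, 0.114)

def to_gray(rgb, weights=WEIGHTS):
    """Constant-weight grayscale value of one (R, G, B) pixel."""
    r, g, b = rgb
    wr, wg, wb = weights
    return wr * r + wg * g + wb * b

def empirical_cdf(values, levels=256):
    """Fraction of samples less than or equal to each intensity level
    for one 8-bit color channel."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running / len(values))
    return cdf

# Two visually distinct colors that are nearly isoluminant under these weights.
red, green = (255, 0, 0), (0, 130, 0)
print(to_gray(red), to_gray(green))
```

Under these weights the pure red and the mid green differ by less than one gray level, so the contrast between them is lost after conversion.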

18 pages, 2942 KiB  
Article
Point Projection Mapping System for Tracking, Registering, Labeling, and Validating Optical Tissue Measurements
by Lianne Feenstra, Stefan D. van der Stel, Marcos Da Silva Guimaraes, Behdad Dashtbozorg and Theo J. M. Ruers
J. Imaging 2024, 10(2), 37; https://doi.org/10.3390/jimaging10020037 - 30 Jan 2024
Viewed by 1467
Abstract
The validation of newly developed optical tissue-sensing techniques for tumor detection during cancer surgery requires an accurate correlation with the histological results. Additionally, such an accurate correlation facilitates precise data labeling for developing high-performance machine learning tissue-classification models. In this paper, a newly developed Point Projection Mapping system is introduced, which allows non-destructive tracking of the measurement locations on tissue specimens. Additionally, a framework for accurate registration, validation, and labeling with the histopathology results is proposed and validated on a case study. The proposed framework provides a more robust and accurate method for the tracking and validation of optical tissue-sensing techniques, which saves time and resources compared to the available conventional techniques.

14 pages, 1335 KiB  
Article
Automated Coronary Artery Tracking with a Voronoi-Based 3D Centerline Extraction Algorithm
by Rodrigo Dalvit Carvalho da Silva, Ramin Soltanzadeh and Chase R. Figley
J. Imaging 2023, 9(12), 268; https://doi.org/10.3390/jimaging9120268 - 01 Dec 2023
Viewed by 1532
Abstract
Coronary artery disease is one of the leading causes of death worldwide, and medical imaging methods such as coronary artery computed tomography are vitally important in its detection. More recently, various computational approaches have been proposed to automatically extract important coronary artery features (e.g., vessel centerlines, cross-sectional areas along vessel branches, etc.) that may ultimately be able to assist with more accurate and timely diagnoses. The current study therefore validated and benchmarked a recently developed automated 3D centerline extraction method for coronary artery centerline tracking using synthetically segmented coronary artery models based on the widely used Rotterdam Coronary Artery Algorithm Evaluation Framework (RCAAEF) training dataset. Based on standard accuracy metrics and the ground truth centerlines of all 32 coronary vessel branches in the RCAAEF training dataset, this 3D divide and conquer Voronoi diagram method performed exceptionally well, achieving an average overlap accuracy (OV) of 99.97%, overlap until first error (OF) of 100%, overlap of the clinically relevant portion of the vessel (OT) of 99.98%, and an average error distance inside the vessels (AI) of only 0.13 mm. Accuracy was also found to be exceptionally high for all four coronary artery sub-types, with average OV values of 99.99% for right coronary arteries, 100% for left anterior descending arteries, 99.96% for left circumflex arteries, and 100% for large side-branch vessels. These results validate that the proposed method can be employed to quickly, accurately, and automatically extract 3D centerlines from segmented coronary arteries, and indicate that it is likely worthy of further exploration given the importance of this topic.

14 pages, 2634 KiB  
Article
Innovative Bacterial Colony Detection: Leveraging Multi-Feature Selection with the Improved Salp Swarm Algorithm
by Ahmad Ihsan, Khairul Muttaqin, Rahmatul Fajri, Mursyidah Mursyidah and Islam Md Rizwanul Fattah
J. Imaging 2023, 9(12), 263; https://doi.org/10.3390/jimaging9120263 - 28 Nov 2023
Viewed by 1381
Abstract
In this paper, we introduce a new and advanced multi-feature selection method for bacterial classification that uses the salp swarm algorithm (SSA). We improve the SSA’s performance by using opposition-based learning (OBL) and a local search algorithm (LSA). The proposed method has three main stages, which automate the categorization of bacteria based on their unique characteristics. The method uses a multi-feature selection approach augmented by an enhanced version of the SSA. The enhancements include using OBL to increase population diversity during the search process and LSA to address local optimization problems. The improved salp swarm algorithm (ISSA) is designed to optimize multi-feature selection by increasing the number of selected features and improving classification accuracy. We compare the ISSA’s performance to that of several other algorithms on ten different test datasets. The results show that the ISSA outperforms the other algorithms in terms of classification accuracy on three datasets with 19 features, achieving an accuracy of 73.75%. Additionally, the ISSA excels at determining the optimal number of features and producing a better fit value, with a classification error rate of 0.249. Therefore, the ISSA method is expected to make a significant contribution to solving feature selection problems in bacterial analysis.
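Opposition-based learning, mentioned above as a way to increase population diversity, is commonly defined by reflecting a candidate within its search bounds. The sketch below shows that general idea only; the function names are hypothetical, and how the ISSA actually integrates OBL with the salp swarm update and the local search is not described in the abstract.

```python
def opposite_solution(x, lower, upper):
    """Opposition-based learning: reflect each coordinate within its bounds,
    x_opposite = lower + upper - x."""
    return [lo + hi - xi for xi, lo, hi in zip(x, lower, upper)]

def keep_fitter(x, lower, upper, fitness):
    """Return whichever of x and its opposite has the lower fitness (minimization)."""
    opp = opposite_solution(x, lower, upper)
    return opp if fitness(opp) < fitness(x) else x

def sphere(v):
    """Toy objective used only for this illustration."""
    return sum(c * c for c in v)

# Hypothetical candidate in the box [0, 10] x [0, 10].
print(keep_fitter([9, 4], [0, 0], [10, 10], sphere))
```

Evaluating both a candidate and its opposite doubles the number of fitness evaluations for that candidate, which is the usual cost of the added diversity.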

20 pages, 44749 KiB  
Article
Impact of ISP Tuning on Object Detection
by Dara Molloy, Brian Deegan, Darragh Mullins, Enda Ward, Jonathan Horgan, Ciaran Eising, Patrick Denny, Edward Jones and Martin Glavin
J. Imaging 2023, 9(12), 260; https://doi.org/10.3390/jimaging9120260 - 24 Nov 2023
Cited by 1 | Viewed by 3146
Abstract
In advanced driver assistance systems (ADAS) or autonomous vehicle research, acquiring semantic information about the surrounding environment generally relies heavily on camera-based object detection. Image signal processors (ISPs) in cameras are generally tuned for human perception. In most cases, ISP parameters are selected subjectively and the resulting image differs depending on the individual who tuned it. While the installation of cameras on cars started as a means of providing a view of the vehicle’s environment to the driver, cameras are increasingly becoming part of safety-critical object detection systems for ADAS. Deep learning-based object detection has become prominent, but the performance impact of varying the ISP parameters is unknown. In this study, we analyze the performance of 14 popular object detection models in the context of changes in the ISP parameters. We consider eight ISP blocks: demosaicing, gamma, denoising, edge enhancement, local tone mapping, saturation, contrast, and hue angle. We investigate two raw datasets, PASCALRAW and a custom raw dataset collected from an ADAS perspective. We found that varying from a default ISP degrades the object detection performance and that the models differ in sensitivity to varying ISP parameters. Finally, we propose a novel methodology that increases object detection model robustness via ISP variation data augmentation.
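Of the eight ISP blocks listed above, gamma is the simplest to illustrate. The sketch below applies a standard power-law gamma encoding to a single 8-bit intensity and then sweeps a few gamma values, loosely in the spirit of varying an ISP parameter for augmentation. The function name, the default of 2.2, and the sweep values are assumptions; the paper's ISP pipeline and parameter ranges are not given in the abstract.

```python
def apply_gamma(pixel, gamma=2.2, max_value=255):
    """Gamma-encode a linear intensity: out = max * (in / max) ** (1 / gamma)."""
    normalized = pixel / max_value
    return round(max_value * normalized ** (1.0 / gamma))

# Hypothetical sweep: render the same linear value under several gamma settings.
variants = {g: apply_gamma(64, gamma=g) for g in (1.0, 1.8, 2.2, 2.6)}
print(variants)
```

Larger gamma values brighten mid-tones while leaving pure black and pure white unchanged, which alters the contrast a detector sees.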

21 pages, 4044 KiB  
Article
A Point-Cloud Segmentation Network Based on SqueezeNet and Time Series for Plants
by Xingshuo Peng, Keyuan Wang, Zelin Zhang, Nan Geng and Zhiyi Zhang
J. Imaging 2023, 9(12), 258; https://doi.org/10.3390/jimaging9120258 - 23 Nov 2023
Cited by 1 | Viewed by 1890
Abstract
The phenotyping of plant growth enriches our understanding of intricate genetic characteristics, paving the way for advancements in modern breeding and precision agriculture. Within the domain of phenotyping, segmenting 3D point clouds of plant organs is the basis of extracting plant phenotypic parameters. In this study, we introduce a novel method for point-cloud downsampling that mitigates the challenges posed by sample imbalances. We then design a deep learning framework based on SqueezeNet for the segmentation of plant point clouds. We also use the time series as input variables, which effectively improves the segmentation accuracy of the network. Based on semantic segmentation, the MeanShift algorithm is employed to execute instance segmentation on the point-cloud data of crops. In semantic segmentation, the average Precision, Recall, F1-score, and IoU of maize reached 99.35%, 99.26%, 99.30%, and 98.61%, and the average Precision, Recall, F1-score, and IoU of tomato reached 97.98%, 97.92%, 97.95%, and 95.98%. In instance segmentation, the accuracy of maize and tomato reached 98.45% and 96.12%. This research holds the potential to advance the fields of plant phenotypic extraction, ideotype selection, and precision agriculture.
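The four segmentation metrics reported above are related through the counts of true positives, false positives, and false negatives. The sketch below uses the standard definitions (function names are hypothetical) and checks the reported figures against two identities: F1 is the harmonic mean of Precision and Recall, and for a single class IoU = F1 / (2 - F1). Because the paper reports averages, these identities need only hold approximately; in this case they agree with the reported values to within rounding.

```python
def metrics_from_counts(tp, fp, fn):
    """Precision, Recall, F1, and IoU for one class from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return precision, recall, f1, iou

def f1_from_pr(precision, recall):
    """F1 as the harmonic mean of Precision and Recall."""
    return 2 * precision * recall / (precision + recall)

def iou_from_f1(f1):
    """Single-class identity linking IoU and F1 (both derive from TP, FP, FN)."""
    return f1 / (2 - f1)

# Reported maize and tomato figures from the abstract, as fractions.
for name, p, r, f1 in (("maize", 0.9935, 0.9926, 0.9930),
                       ("tomato", 0.9798, 0.9792, 0.9795)):
    print(name, round(100 * f1_from_pr(p, r), 2), round(100 * iou_from_f1(f1), 2))
```

IoU is always lower than F1 for an imperfect segmentation because it counts each error once against a smaller denominator.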

18 pages, 5809 KiB  
Article
Understanding Error Patterns: An Analysis of Alignment Errors in Rigid 3D Body Scans
by Julian Meißner, Michael Kisiel, Nagarajan M. Thoppey, Michael M. Morlock and Sebastian Bannwarth
J. Imaging 2023, 9(12), 255; https://doi.org/10.3390/jimaging9120255 - 21 Nov 2023
Viewed by 1327
Abstract
Three-dimensional body scanners are attracting increasing interest in various application areas. To evaluate their accuracy, their 3D point clouds must be compared to a reference system by using a reference object. Since different scanning systems use different coordinate systems, an alignment is required for their evaluation. However, this process can result in translational and rotational misalignment. To understand the effects of alignment errors on the accuracy of measured circumferences of the human lower body, such misalignment is simulated in this paper and the resulting characteristic error patterns are analyzed. The results show that the total error consists of two components, namely translational and tilt. Linear correlations were found between the translational error (R² = 0.90, … 0.97) and the change in circumferences as well as between the tilt error (R² = 0.55, … 0.78) and the change in the body’s mean outline. Finally, by systematic analysis of the error patterns, recommendations were derived and applied to 3D body scans of human subjects resulting in a reduction of error by 67% and 84%.

14 pages, 2336 KiB  
Article
An Automatic Pixel-Wise Multi-Penalty Approach to Image Restoration
by Villiam Bortolotti, Germana Landi and Fabiana Zama
J. Imaging 2023, 9(11), 249; https://doi.org/10.3390/jimaging9110249 - 15 Nov 2023
Viewed by 1173
Abstract
This work tackles the problem of image restoration, a crucial task in many fields of applied sciences, focusing on removing degradation caused by blur and noise during the acquisition process. Drawing inspiration from the multi-penalty approach based on the Uniform Penalty principle, discussed in previous work, here we develop a new image restoration model and an iterative algorithm for its effective solution. The model incorporates pixel-wise regularization terms and establishes a rule for parameter selection, aiming to restore images through the solution of a sequence of constrained optimization problems. To achieve this, we present a modified version of the Newton Projection method, adapted to multi-penalty scenarios, and prove its convergence. Numerical experiments demonstrate the efficacy of the method in eliminating noise and blur while preserving the image edges.

13 pages, 1754 KiB  
Article
Breast Cancer Detection with an Ensemble of Deep Learning Networks Using a Consensus-Adaptive Weighting Method
by Mohammad Dehghan Rouzi, Behzad Moshiri, Mohammad Khoshnevisan, Mohammad Ali Akhaee, Farhang Jaryani, Samaneh Salehi Nasab and Myeounggon Lee
J. Imaging 2023, 9(11), 247; https://doi.org/10.3390/jimaging9110247 - 13 Nov 2023
Cited by 3 | Viewed by 2300
Abstract
Breast cancer’s high mortality rate is often linked to late diagnosis, with mammograms as key but sometimes limited tools in early detection. To enhance diagnostic accuracy and speed, this study introduces a novel computer-aided detection (CAD) ensemble system. This system incorporates advanced deep learning networks (EfficientNet, Xception, MobileNetV2, InceptionV3, and ResNet50) integrated via our innovative consensus-adaptive weighting (CAW) method. This method permits the dynamic adjustment of multiple deep networks, bolstering the system’s detection capabilities. Our approach also addresses a major challenge in pixel-level data annotation of Faster R-CNNs, highlighted in a prominent previous study. Evaluations on various datasets, including the cropped DDSM (Digital Database for Screening Mammography), DDSM, and INbreast, demonstrated the system’s superior performance. In particular, our CAD system showed marked improvement on the cropped DDSM dataset, enhancing detection rates by approximately 1.59% and achieving an accuracy of 95.48%. This innovative system represents a significant advancement in early breast cancer detection, offering the potential for more precise and timely diagnosis, ultimately fostering improved patient outcomes.
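The general idea of combining several networks through per-model weights can be sketched with a plain weighted average of class probabilities. This is a generic ensemble baseline, not the CAW method, whose weight-adaptation rule is not described in the abstract; the function name, the probabilities, and the weights below are all hypothetical.

```python
def weighted_ensemble(probabilities, weights):
    """Weighted average of per-model class probabilities.
    Weights are normalized by their sum, so they need not add up to 1."""
    total = sum(weights)
    n_classes = len(probabilities[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]

# Hypothetical outputs of two models for the classes (benign, malignant).
model_probs = [[0.9, 0.1], [0.4, 0.6]]
combined = weighted_ensemble(model_probs, weights=[3, 1])
print(combined, combined.index(max(combined)))
```

An adaptive scheme would update the weights from the models' agreement or validation performance rather than fixing them in advance.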

30 pages, 10861 KiB  
Article
Human Activity Recognition Using Cascaded Dual Attention CNN and Bi-Directional GRU Framework
by Hayat Ullah and Arslan Munir
J. Imaging 2023, 9(7), 130; https://doi.org/10.3390/jimaging9070130 - 26 Jun 2023
Cited by 2 | Viewed by 1932
Abstract
Vision-based human activity recognition (HAR) has emerged as one of the essential research areas in video analytics. Over the last decade, numerous advanced deep learning algorithms have been introduced to recognize complex human actions from video streams. These deep learning algorithms have shown impressive performance for the video analytics task. However, these newly introduced methods focus exclusively either on model performance or on computational efficiency, resulting in a biased trade-off between robustness and computational efficiency when dealing with the challenging HAR problem. To enhance both the accuracy and computational efficiency, this paper presents a computationally efficient yet generic spatial–temporal cascaded framework that exploits the deep discriminative spatial and temporal features for HAR. For efficient representation of human actions, we propose an efficient dual attentional convolutional neural network (DA-CNN) architecture that leverages a unified channel–spatial attention mechanism to extract human-centric salient features in video frames. The dual channel–spatial attention layers together with the convolutional layers learn to be more selective in the spatial receptive fields having objects within the feature maps. The extracted discriminative salient features are then forwarded to a stacked bi-directional gated recurrent unit (Bi-GRU) for long-term temporal modeling and recognition of human actions using both forward and backward pass gradient learning. Extensive experiments are conducted on three publicly available human action datasets, where the obtained results verify the effectiveness of our proposed framework (DA-CNN+Bi-GRU) over the state-of-the-art methods in terms of model accuracy and inference runtime across each dataset. Experimental results show that the DA-CNN+Bi-GRU framework attains an improvement in execution time up to 167× in terms of frames per second as compared to most of the contemporary action-recognition methods.

21 pages, 11132 KiB  
Article
The Effectiveness of Pan-Sharpening Algorithms on Different Land Cover Types in GeoEye-1 Satellite Images
by Emanuele Alcaras and Claudio Parente
J. Imaging 2023, 9(5), 93; https://doi.org/10.3390/jimaging9050093 - 30 Apr 2023
Cited by 2 | Viewed by 1774
Abstract
In recent years, the demand for very high geometric resolution satellite images has increased significantly. Pan-sharpening techniques, which are a family of data fusion techniques, increase the geometric resolution of multispectral images using panchromatic imagery of the same scene. However, choosing a suitable pan-sharpening algorithm is not trivial: several exist, but none is universally recognized as the best for every type of sensor, and their results can also differ depending on the investigated scene. This article focuses on the latter aspect: analyzing pan-sharpening algorithms in relation to different land covers. A dataset of GeoEye-1 images is selected, from which four study areas (frames) are extracted: one natural, one rural, one urban and one semi-urban. The type of each study area is determined from the quantity of vegetation it contains, based on the normalized difference vegetation index (NDVI). Nine pan-sharpening methods are applied to each frame, and the resulting pan-sharpened images are compared by means of spectral and spatial quality indicators. Multicriteria analysis makes it possible to identify the best-performing method for each specific area as well as the most suitable one when different land covers are present together in the analyzed scene. The Brovey transformation fast method supplies the best results among the methods analyzed in this study.
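As a rough illustration of two quantities mentioned in this abstract, the sketch below computes NDVI and a classic Brovey transform for a single pixel. It is a minimal sketch under stated assumptions: the function names `ndvi` and `brovey_pixel` are hypothetical, the multispectral bands are assumed to be already resampled to the panchromatic grid, and the classic Brovey ratio is shown rather than the "fast" variant evaluated in the article, whose details the abstract does not give.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel.

    NDVI = (NIR - Red) / (NIR + Red); returns 0.0 when both bands are zero.
    """
    denom = nir + red
    if denom == 0:
        return 0.0
    return (nir - red) / denom


def brovey_pixel(ms_bands, pan):
    """Classic Brovey transform for one pixel (illustrative, not the paper's variant).

    Each multispectral band is scaled by pan / sum(bands), so the fused
    bands keep their relative proportions while their sum equals the
    panchromatic intensity.
    """
    total = sum(ms_bands)
    if total == 0:
        return [0.0 for _ in ms_bands]
    ratio = pan / total
    return [band * ratio for band in ms_bands]


# Example with made-up reflectance values (assumption, not data from the article).
vegetation_index = ndvi(nir=0.6, red=0.2)
fused = brovey_pixel([0.2, 0.3, 0.5], pan=2.0)
```

In this sketch the band ratios of the fused pixel match those of the input pixel, which is the property that makes Brovey-style methods simple to apply per pixel.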

27 pages, 5914 KiB  
Article
A 3DCNN-Based Knowledge Distillation Framework for Human Activity Recognition
by Hayat Ullah and Arslan Munir
J. Imaging 2023, 9(4), 82; https://doi.org/10.3390/jimaging9040082 - 14 Apr 2023
Cited by 1 | Viewed by 1915
Abstract
Human action recognition has been actively explored over the past two decades to further advancements in the video analytics domain. Numerous research studies have been conducted to investigate the complex sequential patterns of human actions in video streams. In this paper, we propose a knowledge distillation framework, which distills spatio-temporal knowledge from a large teacher model to a lightweight student model using an offline knowledge distillation technique. The proposed offline knowledge distillation framework takes two models: a large pre-trained 3DCNN (three-dimensional convolutional neural network) teacher model and a lightweight 3DCNN student model (i.e., the teacher model is pre-trained on the same dataset on which the student model is to be trained). During offline knowledge distillation training, the distillation algorithm trains only the student model, helping it achieve the same level of prediction accuracy as the teacher model. To evaluate the performance of the proposed method, we conduct extensive experiments on four benchmark human action datasets. The obtained quantitative results verify the efficiency and robustness of the proposed method over the state-of-the-art human action recognition methods by obtaining up to 35% improvement in accuracy over existing methods. Furthermore, we evaluate the inference time of the proposed method and compare the obtained results with the inference time of the state-of-the-art methods. Experimental results reveal that the proposed method attains an improvement of up to 50× in terms of frames per second (FPS) over the state-of-the-art methods. The short inference time and high accuracy make our proposed framework suitable for human activity recognition in real-time applications.
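The abstract does not state which distillation loss is used, so the following is only a minimal sketch of the widely used soft-target formulation: a weighted sum of the student's cross-entropy on the true label and a temperature-scaled KL divergence from the frozen teacher's outputs. The names `softmax` and `distillation_loss`, and the default `temperature` and `alpha` values, are assumptions for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Soft-target distillation loss for one sample (illustrative only).

    hard term: cross-entropy of the student against the true label.
    soft term: KL(teacher || student) on temperature-softened outputs,
               scaled by temperature**2 as is customary.
    Only the student would be updated; teacher logits are treated as fixed.
    """
    hard = -math.log(softmax(student_logits)[label])
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s)
               for t, s in zip(p_teacher, p_student) if t > 0)
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


# Made-up logits for a three-class problem (assumption, not data from the paper).
teacher = [2.0, 1.0, 0.1]
loss_matching = distillation_loss([2.0, 1.0, 0.1], teacher, label=0)
loss_mismatch = distillation_loss([0.1, 1.0, 2.0], teacher, label=0)
```

When the student reproduces the teacher's logits, the soft term vanishes and only the weighted cross-entropy remains, so `loss_matching` is smaller than `loss_mismatch`.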

Review

20 pages, 4022 KiB  
Review
Review of Image-Processing-Based Technology for Structural Health Monitoring of Civil Infrastructures
by Ji-Woo Kim, Hee-Wook Choi, Sung-Keun Kim and Wongi S. Na
J. Imaging 2024, 10(4), 93; https://doi.org/10.3390/jimaging10040093 - 16 Apr 2024
Viewed by 608
Abstract
The continuous monitoring of civil infrastructures is crucial for ensuring public safety and extending the lifespan of structures. In recent years, image-processing-based technologies have emerged as powerful tools for the structural health monitoring (SHM) of civil infrastructures. This review provides a comprehensive overview of the advancements, applications, and challenges associated with image processing in the field of SHM. The discussion encompasses various imaging techniques such as satellite imagery, Light Detection and Ranging (LiDAR), optical cameras, and other non-destructive testing methods. Key topics include the use of image processing for damage detection, crack identification, deformation monitoring, and overall structural assessment. This review explores the integration of artificial intelligence and machine learning techniques with image processing for enhanced automation and accuracy in SHM. By consolidating the current state of image-processing-based technology for SHM, this review aims to show the full potential of image-based approaches for researchers, engineers, and professionals involved in civil engineering, SHM, image processing, and related fields.

31 pages, 2698 KiB  
Review
Decision Fusion at Pixel Level of Multi-Band Data for Land Cover Classification—A Review
by Spiros Papadopoulos, Georgia Koukiou and Vassilis Anastassopoulos
J. Imaging 2024, 10(1), 15; https://doi.org/10.3390/jimaging10010015 - 05 Jan 2024
Viewed by 1605
Abstract
Using existing signatures for various kinds of land cover from different spectral bands (i.e., optical, thermal infrared and PolSAR), it is possible to infer the land cover type, obtaining a single decision from each of the spectral bands. Fusing these decisions can radically improve the reliability of the decision for each pixel, taking into consideration the correlation of the individual decisions for that pixel as well as additional information from the pixel's neighborhood. Different remotely sensed data contribute their own information regarding the characteristics of the materials lying in each separate pixel. Hyperspectral and multispectral images give detailed information regarding the reflectance of each pixel. Thermal infrared images give valuable information regarding the temperature of the surface covered by each pixel, which is very important for recording thermal locations in urban regions. Finally, SAR data provide structural and electrical characteristics of each pixel. Combining information from some of these sources further improves the capability for reliable categorization of each pixel. The necessary mathematical background regarding pixel-based classification and decision fusion methods is presented in detail.
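To make the idea of fusing one decision per band concrete, the sketch below applies a weighted majority vote at each pixel. This is only one of the simplest possible fusion rules, chosen for illustration; it ignores the correlation between decisions and the neighborhood information that the review discusses. The function names `fuse_pixel` and `fuse_maps`, the class labels, and the weights are hypothetical.

```python
def fuse_pixel(decisions, weights=None):
    """Weighted majority vote over the per-band decisions for one pixel.

    decisions: list of class labels, one per data source.
    weights:   optional per-source reliabilities (defaults to equal weights).
    Ties are resolved in favor of the label that appears first.
    """
    if weights is None:
        weights = [1.0] * len(decisions)
    scores = {}
    for label, weight in zip(decisions, weights):
        scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get)


def fuse_maps(label_maps, weights=None):
    """Fuse several equally sized 2D label maps pixel by pixel."""
    rows = len(label_maps[0])
    cols = len(label_maps[0][0])
    return [
        [fuse_pixel([m[r][c] for m in label_maps], weights)
         for c in range(cols)]
        for r in range(rows)
    ]


# Made-up 2x2 decision maps from three sources (assumption, not data from the review).
optical = [["water", "urban"], ["forest", "forest"]]
thermal = [["water", "urban"], ["urban", "forest"]]
sar = [["soil", "urban"], ["forest", "soil"]]
fused = fuse_maps([optical, thermal, sar])
```

Passing `weights` lets a source that is considered more reliable outvote the others, which is a crude stand-in for the reliability-aware fusion schemes covered in the review.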

38 pages, 20530 KiB  
Review
A Systematic Review of Recent Deep Learning Approaches for 3D Human Pose Estimation
by Amal El Kaid and Karim Baïna
J. Imaging 2023, 9(12), 275; https://doi.org/10.3390/jimaging9120275 - 12 Dec 2023
Viewed by 3199
Abstract
Three-dimensional human pose estimation has made significant advancements through the integration of deep learning techniques. This survey provides a comprehensive review of recent 3D human pose estimation methods, with a focus on monocular images, videos, and multi-view cameras. Our approach stands out through a systematic literature review methodology, ensuring an up-to-date and meticulous overview. Unlike many existing surveys that categorize approaches based on learning paradigms, our survey offers a fresh perspective, delving deeper into the subject. For image-based approaches, we not only follow existing categorizations but also introduce and compare significant 2D models. Additionally, we provide a comparative analysis of these methods, enhancing the understanding of image-based pose estimation techniques. In the realm of video-based approaches, we categorize them based on the types of models used to capture inter-frame information. Furthermore, in the context of multi-person pose estimation, our survey uniquely differentiates between approaches focusing on relative poses and those addressing absolute poses. Our survey aims to serve as a pivotal resource for researchers, highlighting state-of-the-art deep learning strategies and identifying promising directions for future exploration in 3D human pose estimation.
