Article

A Deep Recurrent Learning-Based Region-Focused Feature Detection for Enhanced Target Detection in Multi-Object Media

1 College of Information Science & Technology, Zhejiang Shuren University, Hangzhou 310015, China
2 Department of Electrical Engineering, College of Engineering, Jouf University, Sakakah 72388, Saudi Arabia
3 School of Electrical Engineering, Southeast University, Nanjing 210096, China
4 Institute for Intelligent Systems and Robotics (ISIR), CNRS, Sorbonne University, 75006 Paris, France
5 Faculty of Electrical and Computer Engineering, Cracow University of Technology, Warszawska 24 Str., 31-155 Cracow, Poland
* Author to whom correspondence should be addressed.
Sensors 2023, 23(17), 7556; https://doi.org/10.3390/s23177556
Submission received: 13 August 2023 / Revised: 25 August 2023 / Accepted: 29 August 2023 / Published: 31 August 2023
(This article belongs to the Section Physical Sensors)

Abstract

Target detection in high-contrast, multi-object images and videos is challenging. The difficulty arises because different regions and objects/people exhibit varying pixel distributions, contrast, and intensity properties. This work introduces a region-focused feature detection (RFD) method to tackle this problem and improve target detection accuracy. The RFD method segregates the input image into the maximum possible number of regions so that as much of the image as possible is processed, and the contrast and intensity attributes of each region are computed. Deep recurrent learning then iteratively extracts these features using a similarity measure against training inputs corresponding to the various regions. The target is located by concatenating overlapping features from multiple regions. To increase accuracy, the recognized target is matched against the training inputs using the contrast and intensity attributes, and the feature distribution across regions is used for the repeated training of the learning paradigm. This method efficiently lowers false rates during region selection and pattern matching across numerous extraction instances, providing greater accuracy by singling out distinct regions and filtering out false-rate-generating features. The accuracy, similarity index, false rate, extraction ratio, and processing time are used to assess the effectiveness of the proposed approach. The proposed RFD improves the similarity index by 10.69%, the extraction ratio by 9.04%, and precision by 13.27%. The false rate and processing time are reduced by 7.78% and 9.19%, respectively.

1. Introduction

Object detection in densely populated images is a complicated task for any application or system. Object detection is mostly used in computer vision and image-processing systems. Populated images contain various objects and background elements presented among people [1], so object detection in them consumes more time than in ordinary images. Deep learning-based techniques are widely used in the object detection process. Deep learning implements the feature extraction step, which pulls out the key patterns and features in the image [2]. The extracted features supply the information the detector needs, reducing latency in the computation and classification processes. Deep learning also extracts the salient aspects and details, producing optimal data for the subsequent stages of the detection method [3]. The many items present in populated images decrease a system's energy efficiency. The convolutional neural network (CNN) algorithm minimizes computing time and energy requirements, enhancing system performance and effectiveness [4]. CNNs detect the actual data present in populated images and maximize accuracy in object detection, which enhances the effectiveness of the detection and prediction processes [5].
Region of interest (ROI) detection provides the information needed for the object detection procedure. Populated images contain information essential to the detection process, and ROI detection lowers latency in the classification and segmentation processes by supplying only the pertinent data [6]. ROI-based object detection mostly uses CNN and artificial intelligence (AI) techniques to enhance the energy efficiency of the systems; both local and global regions are detected from populated images [7]. ROI detection increases a system's viability, robustness, and efficiency by maximizing the overall accuracy of the object detection process. CNN-based models are mostly used for ROI detection, predicting spatial regions and important pixels in an image [8]. The pack and detect (PaD) method is also used in ROI detection; it reduces the resource requirements of the object detection process based on the ROI, which produces optimal information for detection. Both spatial and temporal regions are detected from the given image, which reduces the complexity of the object classification process [9,10].
Object detection also uses machine learning (ML) methods, whose primary goal is to increase the accuracy of the detection process. ML-based techniques are commonly used to identify the important region of interest (ROI) in populated images [11]. The deep reinforcement learning (DRL) algorithm identifies the ROI in object detection; it achieves high ROI detection accuracy, reducing latency in the classification and optimization processes [12]. DRL employs feature extraction to obtain pertinent data from an image, enhancing the performance and effectiveness of the object detection process. The deep neural network (DNN) technique is also employed to detect objects [13]. The ROI provides the DNN with the information it needs, decreasing the time and energy needed for computation and classification. The DNN lowers computational complexity, increasing the relevance and viability of the systems, and detects the data required for object detection [14]. The DNN improves object detection accuracy, which lowers mistakes in subsequent detection and prediction procedures. The support vector machine (SVM) technique, which extracts accurate data from ROI datasets, is frequently employed in object detection; SVM performs image classification to identify the objects present in an image [15].
The main contributions of this paper are as follows:
  • Designing the region-focused feature detection (RFD) method to improve target detection accuracy.
  • Evaluating the deep recurrent learning mathematical model that iteratively extracts region-wise features using a similarity measure from training inputs corresponding to various regions.
  • According to the experimental outcomes, the suggested RFD model improves the similarity index, extraction ratio, and precision while reducing the false rate and processing time.
The rest of the article is arranged as follows: Section 2 reviews related works, Section 3 proposes the RFD framework, Section 4 presents the experiments and results, Section 5 provides a discussion, and Section 6 concludes the paper.

2. Related Works

Kim et al. [16] proposed a bounding-box critic network (BBC Net) framework for object detection. BBC Net identifies the exact occlusions and object regions presented in an image. Based on specific traits and patterns, object categories and parameters are categorized. BBC estimates the crucial areas and pixels from images to reduce computing latency. The suggested BBC Net framework successfully detects objects with high accuracy.
A context-driven detection network (CDD-Net) for multiple-class object detection was introduced by Wu et al. [17]. The newly developed framework mainly uses remote-sensing images, which give the detection process the data and patterns it needs. In this study, features and nearby objects from an image are detected using a local context feature network (LCFN). According to experimental findings, the suggested CDD-Net architecture improves object detection accuracy, expanding the performance envelope of the systems.
A hierarchical context embedding (HCE) framework for region-based item detection was created by Chen et al. [18]. In this study, precise patterns and regions of objects are detected from an image using region-based detectors. The HCE framework also detects region of interest (ROI) features and parameters, which lowers the detection process’s time and energy requirements. The suggested HCE architecture raises the systems’ overall efficacy and viability by enhancing object detection accuracy.
A patch-based three-stage aggregation network (PTAN) was created by Sui et al. [19] for object detection. The main application of the suggested system is object detection in high-resolution remote-sensing images. PTAN maximizes the quality of the ROI and features that provide optimal information for further detection. A patch-based strategy is mostly used here to train the parameters and regions presented in an image. Compared with other frameworks, the proposed PTAN framework achieves high accuracy in object detection for remote-sensing images.
Chen et al. [20] introduced a deep neural network method named RoIFusion for 3D object detection. RoIFusion merges multi-modality features and patterns to identify the exact objects required from the image. The method reduces the ROI and pixel detection load, which reduces the energy consumed in computation. The suggested RoIFusion approach increases the effectiveness and dependability of the systems.
Han et al. [21] designed a compressive sensing (CS)-based atomic force microscopy (CS-AFM) scheme for object detection systems. Both high- and low-resolution images are used in CS-AFM, which detects accurate pixels for further processing. Scanners and detectors are used to predict the objects' class from an image, and supplementary scanning is implemented to finalize the object types. The proposed CS-AFM method reduces computation time, improving the quality and accuracy of object detection.
A cascaded multi-3D-view fusion approach (CM3DV) for object detection was created by Sun et al. [22]. This work uses a cascaded multi-view feature fusion module to determine the specific categories of items from 3D images. A modulated rotation head (MRH) is developed within the CM3DV model, providing the necessary features and patterns of objects. According to experimental data, the suggested CM3DV approach maximizes the overall precision and energy efficiency of the object detection process.
An encoder-steered multi-modality feature guidance network (EFGNet) for salient object recognition in RGB images and depth maps was introduced by Xia et al. [23]. The unimodal characteristics and patterns required for object detection can be extracted from RGB images; these unimodal features deliver precise information about objects. The newly developed EFGNet approach improves system efficiency by improving object detection accuracy.
A multi-level fusion detection (MFD) algorithm-based object detection technique was created by Peng et al. [24]. Here, essential features and patterns present in heterogeneous images are extracted via feature extraction. The collected features give the object detection procedure the best data possible. Additionally, MFD recognizes the pixel and vision range from an image, reducing computation latency. The created MFD increases the viability and performance of object-detecting systems.
Yue et al. [25] developed a low-light image salient object detection (LLISOD) network for various applications. An unfolded implicit non-linear mapping (UINM) module is used here that detects the features for polishing feature maps. Object detection in the application from the low-light image is challenging. The suggested LLISOD framework lowers the detection process’s energy and time requirements. According to experimental data, the suggested LLISOD framework achieves good object detection accuracy.
Xu et al. [26] proposed a two-stage 3D object detection method using position encoding. Position encoding produces information useful for the detection process from raw point data and aggregating voxel features. Here, the major purpose of the feature aggregation module is to lower the error and latency range in the object detection process. From provided 3D images, context and specifics are derived. The proposed method improves object detection accuracy when compared to previous methods.
A corners-based fully convolutional network (C-FCN) for visual object identification systems was introduced by Jiao et al. [27]. This work predicts objects in an image’s right and left corners using a corner region proposal network (CRPN). The FCN is mainly used here to identify the end-to-end objects presented in an image. The FCN increases the accuracy of object detection with a minimum energy consumption range. Object detection is made simpler and more effective by the newly developed C-FCN approach.
A depth-wise separable convolutional network (D-SCNet) was created by Quan et al. [28] for object detection. This application uses the region convolutional neural network (R-CNN) technique to recognize an image's key details and elements. A feature map is employed to offer the features and patterns of an object that are essential to the detection process. The suggested D-SCNet method improves the mobility, feasibility, stability, and accuracy of the object identification process compared to previous methods.
A framework for object detection based on graph neural networks was suggested by You et al. [29]. The primary goal of the suggested framework is to establish a connection between an image’s label embedding space and visual feature space. Both region and label proposals are detected from the relation graph, reducing the time consumption level in object classification. The proposed method is mainly used to perform relational reasoning in object detection. The proposed strategy broadens the scope of the reasoning process’s applicability and efficacy.
In their study, Pathak et al. [30] introduce an innovative approach for detecting faults in photovoltaic panels. This method involves analyzing thermal images of solar panels, which are obtained using a thermographic camera. Two advanced convolutional neural network models are employed in this study. The primary objective of the first model is to accurately classify the type of fault affecting the panel. Meanwhile, the second model is specifically designed to identify the region of interest within the faulty panel. The proposed approach employs the F1 score as a metric for evaluating and comparing multiple classification models. Among these models, the ResNet-50 transfer learning model achieves the highest F1 score.
A cross-diffusion-based salient object recognition approach for compact images was introduced by Wang et al. [31]. In this case, the cross-diffusion technique extracts the key details and areas from an image. The salient object detection procedure receives the best information possible from the retrieved features. The error range in detection is reduced by extracting both high-level and low-level image features. The newly developed technique boosts object detection accuracy, enhancing system performance.
Choi and Kim [32] suggested a sensor fusion system that combines a thermal infrared camera with a LiDAR sensor to accurately detect and identify objects in low-visibility situations, such as at night. The system's effectiveness was tested experimentally: the thermal infrared camera and LiDAR sensor are calibrated remotely using a 3D calibration target. The suggested sensor system and fusion algorithm demonstrate their capacity to detect and identify objects in difficult settings, showing their promise for autonomous vehicle perception technologies.
Zhang et al. [33] developed a vehicle object detection method named candidate region aggregation network (CRAN). The primary objective of the suggested approach is to enhance the detection process’s ability to aggregate data. The majority of optimization issues are resolved by CRAN, which lowers the levels of computational time and energy usage. The procedure of detecting objects in vehicles is more effective and accurate according to the suggested strategy.
Rahman et al. [34] demonstrated that RetinaNet (R101-FPN) and YOLOv5n could detect weeds in cotton fields, with RetinaNet having higher accuracy and YOLOv5n having the ability to be used in real time on devices with limited resources. The study emphasizes the value of data augmentation in enhancing weed identification model accuracy. Creating reliable and effective weed detection systems can be a key component of sustainable agriculture and weed management methods by utilizing deep learning capabilities and continual improvement in model training.
Dai and Nagahara [35] proposed a distributed safety control mechanism for multi-agent systems and applications for collision avoidance of mobile robotic networks. In the suggested method, each agent corrects its control input by resolving a distributed optimization problem to maximize the effectiveness of a predetermined cooperative control strategy while ensuring fulfillment of the safety constraint. The usefulness of the current methodology was proved through case studies examining issues with circular/elliptical vehicle accident avoidance.
Ramachandran Alagarsamy and Dhamodaran Muneeswaran [36] suggested the reptile search optimization algorithm with deep learning for multi-object detection and tracking (RSOADL-MODT). Position estimation, tracking, and action recognition are all parts of the RSOADL-MODT model shown here. The steps involved include “object detection”, “object classification”, and “object tracking”. The feature extraction process is enhanced in the first stage of the described RSOADL-MODT method using a path-augmented RetinaNet-based (PA-RetinaNet) object detection module. The RSOA is employed as a hyperparameter optimizer to enhance the network potential of the PA-RetinaNet approach. Finally, the classification capabilities of a quasi-recurrent neural network (QRNN) classifier are utilized. To evaluate the efficacy of the RSOADL-MODT algorithm’s object detection results, extensive experimental validation is conducted on the DanceTrack and MOT17 datasets. The simulation results validated the advantages of the RSOADL-MODT method over competing DL methods.
Hossein Adeli et al. [37] discussed the brain-inspired object-based attention network for multi-object recognition and visual reasoning. The authors demonstrate how the attention mechanism greatly enhances the precision with which substantially overlapping digits can be categorized. The model achieves near-perfect accuracy in a visual reasoning task that requires comparing two objects, and it significantly outperforms larger models in generalizing to unseen stimuli. This study highlights the usefulness of object-based attention systems that glimpse things in rapid succession.

3. The Proposed RFD Framework

Generic target detection is a crucial challenge in computer vision. In a given densely populated image/video (scene), target object detection is complicated by the presence of many different objects. The proposed technique segregates the input image into the maximum possible regions to detect the precise target easily. In this generic target detection scenario, the feature types are considered first and then matched against targets in the raw imagery, accounting for multiple objects, different locations, regions of interest, scales, and orientations. The features are extracted to compute similarity measures against the training inputs' features in various regions through deep recurrent learning. Based on the similarity measure computation, the textural features are extracted with the previously available information; this information matches the current features with existing ones during generic target detection in the populated image. The proposed RFD technique performs precise target detection and textural feature extraction using a similarity measure over the maximum possible regions. The proposed RFD is illustrated in Figure 1.
A similarity measure, together with the training input target, is used to identify overlapping and non-overlapping features in the original image. This matching identifies the target with maximum features despite the multiple objects in the raw imagery at different locations. For non-overlapping features, a region-wise feature distribution is performed for target detection; in distinguishable regions, overlapping features are concatenated to detect the exact target by matching it with the training input using intensity and contrast features. The RFD technique operates between the training input and the densely populated image. The overlapping and non-overlapping features are identified through similarity measures, where decisions and computations are made to identify the target. The decision for identifying overlapped and non-overlapped features uses the training inputs from the given densely populated image $P_{image}$. The input raw imagery from any source $S$ is processed, and the maximum possible regions are segregated, within which the contrast and intensity are evaluated. First, the two proposed computational segments, region selection and pattern matching, are described. Features acquired for target detection include the object type, edges, bounding box, color, textures, background information, position, and object labels.

3.1. Region Selection

We compute region selection using intensity and contrast feature matching with the previously available data, and then evaluate the basic region selection for each pattern. In the raw imagery, features such as color, intensity, orientation, scale, and pixel distribution are available. The proposed technique extracts multiple objects and new features to identify the target effectively. Heterogeneous, densely populated images/videos containing multiple objects are analyzed, and feature extraction is performed using deep recurrent learning iterations that compute similarity measures over different regions. The two features extracted in this paper are contrast and intensity; the textural feature does not help, since the input images are often grayscale. This study estimates the intensity and contrast features by analyzing $16 \times 16$ regions, each 16 times smaller than the raw imagery vertically and horizontally (one region = $16 \times 16$ pixels). The remaining features are extracted using a similarity measure, i.e., matching over different regions based on the training inputs. The given raw imagery is used for identifying the target through the region selection process: the contrast and intensity features of the populated input image are computed over the different regions and objects/people. The given image $P_{image}$ is estimated as in Equations (1)–(3):
$P_{image}(x, y) = RS(C, I) \quad (1)$

$R_i = \frac{r \times \left( C_{Max} - C_{Min} \right)}{T} + I_{Min} \quad (2)$

$F(x) = \frac{r}{T}(x, y) + \frac{C_{Min}}{C_{Max}} \cdot \frac{r}{R^2} \left( R - \bar{R} \right) \quad (3)$
where $RS$ denotes region selection based on intensity $I$ and contrast $C$ from the given $P_{image}$, and the variables $x$ and $y$ denote the row and column of the image patch. Here, $r$ is the region selected for target object detection, $r \in R$; $C_{Max}$ and $C_{Min}$ represent the maximum and minimum contrast features required in the input image at the different processing time intervals $T$. The variables $F(x)$ and $\bar{R}$ represent the feature extraction and the previously available information used for computing the similarity measure. Figure 2 presents the region segregation and feature extraction process.
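To make Equations (1)–(3) concrete, the following minimal Python sketch (our own illustration, not the authors' code; the function name `segregate_regions` and the use of the patch mean and standard deviation as the intensity and contrast features are assumptions) segregates a grayscale image into 16 × 16 regions and computes the per-region features:

```python
import numpy as np

def segregate_regions(image, region=16):
    """Split a grayscale image into non-overlapping region x region patches
    and compute a per-patch intensity (mean) and contrast (std) feature."""
    h, w = image.shape
    h, w = h - h % region, w - w % region          # crop to a multiple of the region size
    patches = image[:h, :w].reshape(h // region, region, w // region, region)
    patches = patches.transpose(0, 2, 1, 3)        # (region rows, region cols, 16, 16)
    intensity = patches.mean(axis=(2, 3))          # I for each region
    contrast = patches.std(axis=(2, 3))            # C for each region
    return patches, intensity, contrast

img = np.random.rand(256, 256)                     # stand-in for a populated grayscale frame
_, I, C = segregate_regions(img)
print(I.shape, C.min(), C.max())                   # region grid, C_Min, C_Max
```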
The region $R_i$, $\forall\, r \in R$, is extracted using the $I$ and $C$ of an input image. The $RS$ is based on the available $(x, y) \in T$, such that $r \in R$ is either a $C_{Max}$ or $C_{Min}$ extract. Either $C$ or $I$ is considered because of the overlapping pixels, and the regions are identified accordingly. This identification is used for specific feature extraction such that $\bar{R}$ matches any of the $R_i$ (Figure 2). The features extracted in the deep recurrent learning iterations are computed as the number of overlapping features obtained in the different regions. In this case, the false rate in $R$ due to multiple objects in the raw imagery is mitigated. This false rate for targeted object region selection affects the region at any instance in which the similarity measure is required from the training input matching over the different regions, which is expressed as
$SM(C, I) = \frac{OV_f \left( C_{Min} - C_{Max} \right)}{NOV_f^{2}} \quad (4)$

$SM(x, y) = \sqrt{\frac{\sum_{N_p} P_{image}^{2}(x, y) - N_p \left[ \mathrm{Mean}\left( P_{image} \right) \right]^{2}}{N_p - 1}} \quad (5)$

$NOV_f = \frac{1}{RS - 1} \sum_{r = 1}^{R} \sum_{(x, y) = 1}^{N_p} \left( \bar{R} - N_p \right)^2 \quad (6)$
Equations (4)–(6) compute the similarity measure based on the extracted features and the training inputs and outputs, in terms of the overlapping features $OV_f$ and non-overlapping features $NOV_f$. The non-overlapping features are distributed, whereas the overlapping features are concatenated for identifying the target; $N_p$ is the total number of pixels in the given image over which the targeted object region selection is computed using the similarity measure in deep recurrent learning iterations. Based on $P_{image}$ and $SM(x, y)$, the continuous identification of non-overlapping features is expressed as
$F(x)\left( P_{image}, SM(x, y) \right) = \left( SM(x, y) - P_{image_1} \right)^2 + \left( SM(x, y) - P_{image_2} \right)^2 + \cdots + \frac{1}{\bar{R}} \left( P_{image_{N_p}} - r \right)^2, \quad r \in R \quad (7)$
Equation (7) estimates the non-overlapping features from the training inputs until the targeted object region is selected despite the multiple objects in the raw imagery. The pixel size, contrast, and intensity features of the selected region are analyzed for extraction, relying on the processing time, until the similarity verification requires the training input matching in various regions. The above region selection based on the non-overlapping features is distributed using deep recurrent learning iterations. In this scenario, the populated image must be segregated into regions disseminated at precise processing instances to improve the feature extraction ratio and identify distinguishable regions in the image patch. In addition, the distributed non-overlapping features must be available instantaneously to perform the pattern matching; therefore, deep recurrent learning iterations are used for both feature extraction and pattern matching. The output of the learning process identifies and segregates the regions over which non-overlapping features are distributed, using the training inputs and previously available data. If the learning process identifies overlapping features in the input image, the distinguishable regions are concatenated for detecting the precisely targeted object. Concatenation over the distributed regions and the previous information yields the best output for targeted object region selection in an image.
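The paper does not specify the recurrent network's architecture. Purely as a hedged illustration, the following PyTorch sketch (the class name `RegionRecurrentMatcher`, the GRU backbone, the hidden size, and the sigmoid overlap score are all our assumptions) shows how a deep recurrent model could read the per-region feature sequence and score each region as overlapping or non-overlapping:

```python
import torch
import torch.nn as nn

class RegionRecurrentMatcher(nn.Module):
    """Hypothetical sketch of the deep recurrent learning stage: a GRU reads
    the per-region (contrast, intensity) feature sequence and emits a score
    per region, used here as an overlap/similarity indicator."""
    def __init__(self, feat_dim=2, hidden=64):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)

    def forward(self, region_feats):                 # (batch, n_regions, feat_dim)
        states, _ = self.gru(region_feats)           # recurrent pass over the region sequence
        return torch.sigmoid(self.score(states))     # (batch, n_regions, 1), in [0, 1]

model = RegionRecurrentMatcher()
feats = torch.rand(1, 256, 2)                        # 16 x 16 = 256 regions, (C, I) per region
overlap_scores = model(feats)
candidate_regions = overlap_scores.squeeze(-1) > 0.5 # candidate overlapping regions
```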

3.2. Concatenation of Distributed Regions

The similarity measure computed here requires the intensity and contrast feature values measured from the raw imagery over $16 \times 16$ image regions. The identified overlapping features in the distributed and target regions are concatenated, analogous to human eye fixations. Multiple region-wise feature distribution measures can be performed on the given image. In densely populated image processing, the region of interest $ROI$ is computed from the probability distribution of the image contrast and intensity. The $ROI$ can be estimated as described in Equation (8):
$P_{image}(ROI) = \sum_{r \in R_{SM}} \rho(RS) \times \log \rho(RS) \quad (8)$
where $R_{SM}$ is the neighborhood region of the targeted object region identified in the given input image. Figure 3 presents the $ROI$ selection for the $F(x)$-processed image.
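Equation (8) has the form of a sum of $\rho \log \rho$ terms over the neighborhood regions, i.e., a sign-flipped Shannon entropy of the region distribution. Below is a minimal sketch, assuming $\rho(RS)$ is estimated from a histogram of per-region contrast values (the function name `roi_entropy` and the bin count are our choices):

```python
import numpy as np

def roi_entropy(region_values, bins=8):
    """Entropy-style ROI score over a neighborhood of regions (Equation (8)).
    region_values: 1-D array of contrast (or intensity) values for r in R_SM."""
    hist, _ = np.histogram(region_values, bins=bins)
    rho = hist / max(hist.sum(), 1)            # empirical distribution rho(RS)
    rho = rho[rho > 0]                         # avoid log(0)
    return float(np.sum(rho * np.log(rho)))    # sum of rho * log(rho), as in Eq. (8)

neighborhood = np.random.rand(16)              # contrast values of neighboring regions
print(roi_entropy(neighborhood))               # more negative => more spread-out distribution
```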
The $F(x)$ input is utilized for classifying $OV_f$ and $NOV_f$ through $F(x)(P_{image}, SM(x, y))$, from which $R_{SM}$ is extracted. This extraction is required before the concatenation measure $\rho(P, Q)$ can be computed precisely. The distribution is then performed for concatenation using similarity estimation; depending on the matching preference, $SM(C, I)$ is used for detecting the concatenation preference (Figure 3). This represents the probability of the maximum possible contrast and intensity over the different regions in the analyzed neighborhood. Considering two random variables $P$ and $Q$, their concatenation can be expressed as in Equations (9) and (10):
$\rho(P, Q) = \frac{\mathrm{Concat}(P, Q)}{\Delta P\, \Delta Q} \quad (9)$

$\rho(P, Q) = \frac{\mathrm{Concat}(PQ) - \mathrm{Concat}(P)\, \mathrm{Concat}(Q)}{\sqrt{\mathrm{Concat}\left( P^2 \right) - \mathrm{Concat}^2(P)}\ \sqrt{\mathrm{Concat}\left( Q^2 \right) - \mathrm{Concat}^2(Q)}} \quad (10)$
Here, pattern matching is computed at different locations between the overlapping features of the $16 \times 16$ pixel regions and the non-overlapping features at distinguishable regions of the given image. It verifies the similarity of the training inputs and neighbors for precise target identification. In the distinguishable region concatenation process, the maximum possible region concatenation indicates a highly distinctive feature in the input image, i.e., low similarity.
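Equation (10) is recognizably a Pearson-style correlation, with $\mathrm{Concat}(\cdot)$ acting as an expectation over pixels. The sketch below (the function name `concat_score` and the 0.5 concatenation threshold are our choices, not the paper's) scores two regions and concatenates them only when the score is high:

```python
import numpy as np

def concat_score(P, Q):
    """rho(P, Q) per Equation (10): a Pearson-style correlation between two
    region feature vectors, reading Concat(.) as a mean over pixels."""
    P, Q = np.asarray(P, float).ravel(), np.asarray(Q, float).ravel()
    num = (P * Q).mean() - P.mean() * Q.mean()
    den = np.sqrt(((P ** 2).mean() - P.mean() ** 2) *
                  ((Q ** 2).mean() - Q.mean() ** 2))
    return num / den if den > 0 else 0.0

# Concatenate two distinguishable regions only when rho exceeds a threshold
# (the threshold value is our assumption).
regionA, regionB = np.random.rand(16, 16), np.random.rand(16, 16)
if concat_score(regionA, regionB) > 0.5:
    merged = np.concatenate([regionA, regionB], axis=1)
```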

3.3. Feature Distribution Detection

All features except intensity and contrast can be extracted for training the learning process recurrently. The feature extraction mechanism identifies the overlapped features based on strong peaks at conspicuous locations in a given image while suppressing features that confound region selection and peak responses. The false rate is reduced while the region-wise feature distribution is carried out using multiple extraction instances and the previously available data. We estimate four statistic values to characterize each feature map: the mean value $MV$, the number of maximum possible region segregations $MPR$ over the pixels, the standard deviation $SD$ over the feature map pixels, and the number of peaks $k_N$ in the input image feature map. The computation is expressed as in Equations (11)–(13):
$MV = \frac{1}{x \times y} \sum_{T} SM_T(P, Q) \quad (11)$

$SD = \sqrt{\frac{1}{N_p - 1} \sum_{x, y} \left( SM_T(P, Q) - ROI \right)^2} \quad (12)$

$MPR = \mathrm{mean}\left( \sqrt{\left( x_P - x_Q \right)^2 + \left( y_P - y_Q \right)^2} \right) \quad (13)$
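A hedged sketch of the four per-feature-map statistics follows (requires SciPy): $MV$ and $SD$ follow Equations (11) and (12) directly, $MPR$ is read here as the mean pairwise distance between peak locations per Equation (13), and the peak count $k_N$ is approximated with a generic 1-D peak finder; these implementation choices are ours, not the paper's:

```python
import numpy as np
from scipy.signal import find_peaks

def feature_map_stats(sm_map, peak_coords):
    """Four statistics per feature map (Equations (11)-(13) plus the peak
    count k_N). sm_map: 2-D similarity-measure map; peak_coords: (x, y)
    locations of candidate peaks."""
    MV = sm_map.mean()                              # mean value, Eq. (11)
    SD = sm_map.std(ddof=1)                         # sample standard deviation, Eq. (12)
    xy = np.asarray(peak_coords, float)
    # MPR: mean pairwise Euclidean distance between peak locations, Eq. (13)
    d = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(-1))
    MPR = d[np.triu_indices(len(xy), k=1)].mean() if len(xy) > 1 else 0.0
    peaks, _ = find_peaks(sm_map.ravel())           # crude 1-D approximation of k_N
    return MV, SD, MPR, len(peaks)

stats = feature_map_stats(np.random.rand(16, 16), [(2, 3), (7, 1), (12, 9)])
```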
The deep recurrent learning assessment proceeds from the sequential instances, with the first training inputs matched in different regions based on $MPR$, $MV$, and $SD$, to identify the target object region in the given populated image. Figure 4 presents the learning process.
The $R_{SM}$ is split into $P$ and $Q$ over the varying $r \in R$ such that $k_N$ is identified at any variational peak. The intermediate $MV$ for $\rho(P, Q)$ over $SM(C, I)$ is computed using the available $r$ and the $C$, $I$ differences. After $SM(C, I)$, the $OV_f$ and $NOV_f$ are segregated using $C$ (over $I$) for $SD$ and $MPR$ detection. The $MPR \in R\ \forall\, t$ is concatenated using $R_{SM}$ for image detection. Conversely, the $SD$ is reduced for the next ROI, such that $T$ is used for the next (consecutive) $r(P, Q)$ (Figure 4). The region distribution from the training inputs and the feature map is used for identifying the overlapping features; if such a feature is observed in any region, concatenation is performed and maximum similarity is achieved. Skipping the false rate, however, does not affect the performance of generic target object detection in populated images.

Pattern Matching

The pattern-matching estimation is related to the target region selection, except that overlapped feature cooperation across regions is analyzed for precise object detection. The extracted features summarize the overall populated image data using the mean value, which denotes the pattern matching in the selected region identified through multiple extraction instances. The pattern matching can be expressed as
$\vartheta_m = \frac{1}{x \times y} \sum_{T} \vartheta_m(C, I) \quad (14)$
Before performing the region selection and pattern extraction to identify the target in an image, the contrast and intensity feature values must be normalized. The extracted and normalized features are then passed to the pattern-matching process for generic target detection. The region selection and pattern matching above detect a target while satisfying low similarity. The non-overlapping contrast and intensity features of the raw imagery are compared with previously available data and then distributed per instance. The non-overlapping contrast and intensity features jointly produce the output $F(x)(P_{image}, SM(x, y))$ at its maximum possible concatenation. In this technique, through the first and consecutive segregations of the input image into the maximum possible regions, the targeted object location is identified using the training inputs and the extracted features. The pattern-matching process is illustrated in Figure 5.
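As a sketch of this normalize-then-match step (Equation (14) averages per-region match scores), the following assumes min-max normalization and a mean-absolute-agreement score; both choices are ours, since the paper does not fix them:

```python
import numpy as np

def normalize(f):
    """Min-max normalize a feature map before pattern matching."""
    f = np.asarray(f, float)
    span = f.max() - f.min()
    return (f - f.min()) / span if span > 0 else np.zeros_like(f)

def pattern_match(region_feats, template_feats):
    """theta_m-style score per Equation (14): mean agreement between the
    normalized contrast/intensity features of a candidate region and a
    training template. The exact agreement function is our assumption."""
    r, t = normalize(region_feats), normalize(template_feats)
    return 1.0 - np.abs(r - t).mean()          # 1 = perfect match, 0 = none

score = pattern_match(np.random.rand(16, 16), np.random.rand(16, 16))
```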
The pattern matching between $P_{image}$ and the training inputs is performed using $NOV_f$ and $OV_f\ \forall\, T$. For any single $r \in R_i$ identified, $\vartheta_m$ is computed such that $\rho(P, Q)$ is maximum. The possible $(P, Q)$ combinations are identified from $F(x)$, $SM(x, y)$, and $ROI \in R_{SM}$ for concatenation. Therefore, the regions with $\vartheta_m = \text{maximum}$ and $\rho(P, Q) = \text{maximum}$ (without $OV_f$) are jointly used for detecting the object (Figure 5). In the first instance, pattern matching identifies the overlapped and non-overlapped features and the maximum peaks in the feature map. The maximum-similarity feature identification therefore outputs precise target region detection, and the region-wise feature distribution mitigates the false rates while retaining the instance without non-overlapping features. The false-rate-generating features in the images identify distinguishable regions through improved precision, wherein the extracted features, such as intensity and contrast, impact the training inputs. The output of the deep recurrent learning is to identify the overlapping features and mitigate the false rates in region selection and pattern matching. This computation jointly generates feature predictions for the instances, even if they overlap in the given image, for optimum generic target detection performance. In this manuscript, the important differences between the features and regions are identified through deep recurrent learning, and the non-overlapping features are suppressed to reduce the false rates; hence, minimum false failures and iterations are achieved. Region selection and pattern matching are thus processed consecutively using deep recurrent learning iterations to improve precision and identify the region of interest in the image. This generic target detection using similarity verification reduces the processing time and false rate. The extracted feature map and dense embedding features are used for target object detection. The analysis of $NOV_f$, $OV_f$, and $R_{SM}$ for varying $r$ is presented in Figure 6.
The overlapping and non-overlapping regions are identified using $\bar{R}$ from the distribution of $F(x)\ \forall\, r$. In the learning process, $k_N$ identifies the $\rho(P, Q)$ for maximization region detection. The $MPR$ from $R_{SM}$ maximizes $OV_f$ over the identified $ROI$ and maximizes $C_{Max}$ and $C_{Min}$ in $F(x)(P_{image}, SM(x, y))$. This is required to prevent false rates due to $C$ and $I$ overlapping. The need for $R_{SM}$ therefore increases, due to which $SD$ decreases. In this case, $SM(x, y)$ is validated for identifying any possible input across the different $MPR$. The presence of a similar region is thus detected for $\vartheta_m$ through the recurrent $T\ \forall\, (x, y)$ (Figure 6). In Figure 7, the analysis of $SD$ and $\vartheta_m$ (%) is presented.
The proposed technique identifies both negative and positive $SD$ until $0 > \rho(P, Q)$ is observed. This case changes after $\rho(P, Q) > 0$ or $\rho(P, Q) = 1$; the deviations are suppressed for the detected $r$. If $r \in R_i\ \forall\, T$ and $\rho(P, Q)$ is maximum, then $SD$ reduces; this case is observed at $r = 8$ and $10$. This means the regions have high $OV_f$ relative to $NOV_f$, and hence $R_{SM}$ is high. Depending on the available $R_{SM}$, $\vartheta_m$ increases with the available $SM(C, I)$ and $\bar{R}$. Therefore, $\bar{R}$ is required to compensate $MV$ without increasing the false rate; this holds for any recurrent iteration in which $SD$ is observed.

4. Experiments and Results

The performance of the proposed technique is assessed in this section using a comparative study. The study analyzes the similarity index, extraction ratio, precision, false rate, and processing time metrics by varying the number of features and regions: the features vary between 2 and 26, and the regions between 1 and 10. The existing C-FCN [27], CDD-Net [17], and PTAN [19] are compared with the proposed technique. The experimental images used in Figure 1, Figure 2, Figure 3, Figure 4 and Figure 5 are extracted from [36,38], with 10,000+ training and testing inputs.
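The paper does not state formulas for the comparison metrics. Purely as a hedged sketch, the following computes a Jaccard-style similarity index, precision, and a false-positive-based false rate on binary region masks; all definitions here are our assumptions about how Figures 8–12 could be computed:

```python
import numpy as np

def detection_metrics(pred, truth):
    """Sketch of the comparison metrics on binary region masks.
    pred/truth: boolean arrays marking detected vs. ground-truth target regions."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    tp = np.logical_and(pred, truth).sum()      # correctly detected regions
    fp = np.logical_and(pred, ~truth).sum()     # falsely detected regions
    fn = np.logical_and(~pred, truth).sum()     # missed regions
    similarity = tp / max(tp + fp + fn, 1)      # Jaccard-style similarity index
    precision = tp / max(tp + fp, 1)
    false_rate = fp / max(pred.size, 1)
    return similarity, precision, false_rate

print(detection_metrics(np.random.rand(256) > 0.5, np.random.rand(256) > 0.5))
```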

4.1. Similarity Index

Heterogeneous, densely populated images/videos are analyzed to identify the target object region in the feature map through similarity analysis, which amounts to skipping the false-rate-generating features during region selection and pattern matching (refer to Figure 8). The intensity and contrast features achieve the high similarity index required for precise target detection at different time intervals using deep recurrent learning iterations. Errors in region selection and pattern matching are mitigated using multiple extraction instances and the extracted features for the $RS(C, I)$ computation. The learning process compares the training inputs and previously available data to obtain similarity measure outputs, using the contrast and intensity features to augment precision by analyzing $16 \times 16$ regions. Continuous image processing identifies distinguishable regions through improved precision for target object detection and achieves maximum similarity through the overlapping features in a given image/video. Based on this image processing, region segregation is used for predicting false rates with improved precision. Therefore, the similarity index is high in the proposed technique.

4.2. Extraction Ratio

The proposed technique achieves a high feature extraction ratio for populated image processing with region segregation; the false rate is mitigated for detecting precise targets in the given image (refer to Figure 9). The region-wise feature distribution is performed for training the deep recurrent learning recurrently, with the contrast and intensity features computed over the different regions and objects/people. Based on $F(x)$ and $\bar{R}$, the similarity is analyzed at various processing time intervals. The proposed technique first segregates the input image into the maximum possible regions with improved precision based on region selection. The false rate is addressed when the target is matched with the training inputs for identification. The extraction ratio is estimated over different objects using the previously available data, gaining high precision in detecting the target object location/region in the feature map. Therefore, $SM(C, I)$ is computed for improving the distinctive features and the processing time at the different regions, and the similarity verification must satisfy high feature extraction to reduce the processing time. In the proposed technique, image processing is performed to identify the target region, and this improves precision.

4.3. Precision

In the proposed technique, the deep recurrent learning iterations using similarity measures rely on the extracted features to detect the target object region more easily in the given populated image despite the presence of multiple objects. Addressing selected regions appropriately and accurately in densely populated images is difficult; region selection and pattern matching, in terms of processing time and training input, are used to reduce these difficulties at the different instances. The false rates and multiple objects are identified in the feature map through deep recurrent learning. From the overlapping feature instances, the distinguishable regions are concatenated for identifying the target without the training input in the feature map, based on region segregation through the deep recurrent learning process, preventing false rates. Continuous image processing is performed with similarity feature verification for improving precision; the region-wise distribution relies on the training inputs for training the learning recurrently. In the proposed technique, the similarity measure is computed to increase the extraction rate, and a lower false rate is achieved, as illustrated in Figure 10.

4.4. False Rate

The proposed RFD technique for detecting a precise target in densely populated images with region selection achieves a lower false rate than the compared techniques, as represented in Figure 11. The distinguishable regions are concatenated for identifying the precise target using a similarity measure, while the non-overlapping features are distributed across the densely populated image using deep recurrent learning. The reduction in false-rate-generating features at the different processing time intervals is computed for target object detection in populated images. From the training inputs, the extracted features and previously available data are matched to detect the generic target and increase the similarity index. The false rate in region selection and pattern matching, caused by multiple objects over the different regions, is mitigated. Identifying the false rate in the feature map across the various instances is difficult, so this technique requires image processing with training input matching in the different regions. The proposed technique estimates the four statistic values for each feature map using multiple extraction instances, and the processing time of this analysis is low.

4.5. Processing Time

The false-rate-generating features are skipped in the proposed technique to identify distinguishable regions, achieving low processing time for heterogeneous, densely populated image processing (refer to Figure 12). The process improves precision with the previously available data and mitigates the false rate relative to the compared techniques in region-of-interest-based target object detection. Based on the feature extraction, the overlapping and non-overlapping features are identified through similarity measures for accurate region distribution based on $F(x)(P_{image}, SM(x, y))$, and its maximum possible concatenation is achieved. The maximum similarity thereby leads to appropriate and accurate target object detection through the feature map, and the continuous instances of populated image processing mitigate the false rate during the processing time. The technique reduces the processing time and the false rate to maximize precision. The identified false-rate-generating features are analyzed and compared with the available data for region selection; hence, the false rate is mitigated in distinguishable regions for each feature map with less processing time.

4.6. Error Probability Ratio

This paper uses deep recurrent learning for calculating the error probability, frequently utilized with target location estimates. The error probability relates measurement errors to the resulting difference in a quantity calculated from the feature measurements. A mathematical basis is provided for the error probability estimate, and the deep recurrent learning estimate is shown to be highly accurate: it deviates from the true error probability by less than 1% on average, with a maximum error of 1.5%. As such, it is a useful method for evaluating the sensitivity and accuracy of any computed quantity to errored input. The proposed RFD model achieves less error when detecting similar targets. Graphical details of the error probability ratio are given in Figure 13.

5. Discussion

Ablation Study

This study presents a series of experiments designed to illustrate the impact of patch size on the RFD model. Experiments with single-scale and multi-scale combinations are carried out, covering both the target region and a specific background area, to select the patch size. Table 1 shows the patch sizes used for target detection.
Table 2 and Table 3 summarize the comparative analysis results. The object-level sensitivity, precision, and accuracy scores achieve reliability. The high-precision and high-sensitivity regions of interest (ROIs) collectively occupy a large share of an image on average. Sensitivity analysis, as used in this paper, is a viable technique for testing the robustness of the deep learning model, identifying how effective the model is when given low-quality images.

6. Conclusions

This article introduces a region-focused feature detection technique for identifying specific object targets in densely populated images. The input image is classified based on intensity and contrast over the maximum possible regions. The regions are distinguished using overlapping and non-overlapping features distributed across different boundaries. For the extracted features within a boundary, matching of distinguishable features is performed to maximize the detection precision. The overlapping features with the maximum concatenation probability are fused in the alternate, overlapping region for regenerating the actual image, and the fused image is identified from the external inputs across various means and deviations. The entire process is administered using deep recurrent learning to reduce false rates, and the decision between new region identification and feature extraction is made by the learning paradigm to control processing time. The proposed RFD improves the similarity index by 10.69%, the extraction ratio by 9.04%, and precision by 13.27%. The false rate and processing time are reduced by 7.78% and 9.19%, respectively.
The limitation of the proposed RFD model is its inability to deal with new object classes. Future research will focus on extracting visual features in various environments and weather conditions, including bright and dim lighting, dense fog, and intense rain. Applications of the proposed RFD model include image classification, surveillance, entertainment, gaming, autonomous vehicles, and scene understanding.

Author Contributions

Conceptualization, J.W., A.A. and G.A.; methodology, K.K.; software, G.A.; validation, G.N., W.A. and A.S.; formal analysis, A.A.; investigation, K.K.; resources, G.A.; data curation, K.K.; writing—original draft preparation, J.W., A.A. and G.A.; writing—review and editing, G.A., M.S., S.A. and M.A.; visualization, A.S. and W.A.; supervision, K.K.; project administration, G.N.; funding acquisition, G.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Faculty of Electrical and Computer Engineering, Cracow University of Technology, and the Ministry of Science and Higher Education, Republic of Poland (grant no. E-1/2023).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors extend their appreciation to the Deputyship for Research and Innovation, Ministry of Education in Saudi Arabia for funding this research work through the project number 223202.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Su, Y.; Tao, F.; Jin, J.; Zhang, C. Automated overheated region object detection of photovoltaic module with thermography image. IEEE J. Photovoltaics 2021, 11, 535–544. [Google Scholar]
  2. Cheng, G.; Si, Y.; Hong, H.; Yao, X.; Guo, L. Cross-scale feature fusion for object detection in optical remote sensing images. IEEE Geosci. Remote Sens. Lett. 2020, 18, 431–435. [Google Scholar] [CrossRef]
  3. Liu, Z.; Wang, K.; Dong, H.; Wang, Y. A cross-modal edge-guided salient object detection for RGB-D image. Neurocomputing 2021, 454, 168–177. [Google Scholar] [CrossRef]
  4. Lin, W.J.; Chen, J.W.; Jhuang, J.P.; Tsai, M.S.; Hung, C.L.; Li, K.M.; Young, H.T. Integrating object detection and image segmentation for detecting the tool wear area on stitched image. Sci. Rep. 2021, 11, 19938. [Google Scholar] [CrossRef] [PubMed]
  5. Scheiner, N.; Kraus, F.; Appenrodt, N.; Dickmann, J.; Sick, B. Object detection for automotive radar point clouds—A comparison. AI Perspect. 2021, 3, 1–23. [Google Scholar] [CrossRef]
  6. Chen, J.; Huang, H.W.; Rupp, P.; Sinha, A.; Ehmke, C.; Traverso, G. Closed-Loop Region of Interest Enabling High Spatial and Temporal Resolutions in Object Detection and Tracking via Wireless Camera. IEEE Access 2021, 9, 87340–87350. [Google Scholar] [CrossRef]
  7. Xiao, J.; Zhang, S.; Dai, Y.; Jiang, Z.; Yi, B.; Xu, C. Multiclass Object Detection in UAV Images Based on Rotation Region Network. IEEE J. Miniaturization Air Space Syst. 2020, 1, 188–196. [Google Scholar] [CrossRef]
  8. Sun, T.; Pan, W.; Wang, Y.; Liu, Y. Region of Interest Constrained Negative Obstacle Detection and Tracking with a Stereo Camera. IEEE Sens. J. 2022, 22, 3616–3625. [Google Scholar] [CrossRef]
  9. Fan, Z.; Liu, Q. Adaptive region-aware feature enhancement for object detection. Pattern Recognit. 2022, 124, 108437. [Google Scholar] [CrossRef]
  10. Yao, Z.; Wang, L. ERBANet: Enhancing region and boundary awareness for salient object detection. Neurocomputing 2021, 448, 152–167. [Google Scholar] [CrossRef]
  11. Fang, B.; Fang, L. Concise feature pyramid region proposal network for multi-scale object detection. J. Supercomput. 2020, 76, 3327–3337. [Google Scholar] [CrossRef]
  12. Zhu, D.; Xia, S.; Zhao, J.; Zhou, Y.; Niu, Q.; Yao, R.; Chen, Y. Spatial hierarchy perception and hard samples metric learning for high-resolution remote sensing image object detection. Appl. Intell. 2022, 52, 3193–3208. [Google Scholar] [CrossRef]
  13. Huang, L.; Dai, S.; He, Z. Few-shot object detection with dense-global feature interaction and dual-contrastive learning. Appl. Intell. 2023, 53, 14547–14564. [Google Scholar] [CrossRef]
  14. Fuentes-Jimenez, D.; Losada-Gutierrez, C.; Casillas-Perez, D.; Macias-Guarasa, J.; Pizarro, D.; Martin-Lopez, R.; Luna, C.A. Towards dense people detection with deep learning and depth images. Eng. Appl. Artif. Intell. 2021, 106, 104484. [Google Scholar] [CrossRef]
  15. Zhou, M.; Wang, R.; Xie, C.; Liu, L.; Li, R.; Wang, F.; Li, D. ReinforceNet: A reinforcement learning embedded object detection framework with region selection network. Neurocomputing 2021, 443, 369–379. [Google Scholar] [CrossRef]
  16. Kim, J.U.; Kwon, J.; Kim, H.G.; Ro, Y.M. BBC Net: Bounding-box critic network for occlusion-robust object detection. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 1037–1050. [Google Scholar] [CrossRef]
  17. Wu, Y.; Zhang, K.; Wang, J.; Wang, Y.; Wang, Q.; Li, Q. CDD-Net: A context-driven detection network for multiclass object detection. IEEE Geosci. Remote Sens. Lett. 2020, 19, 8004905. [Google Scholar] [CrossRef]
  18. Chen, Z.M.; Jin, X.; Zhao, B.R.; Zhang, X.; Guo, Y. HCE: Hierarchical context embedding for region-based object detection. IEEE Trans. Image Process. 2021, 30, 6917–6929. [Google Scholar] [CrossRef]
  19. Sui, B.; Xu, M.; Gao, F. Patch-Based Three-Stage Aggregation Network for Object Detection in High Resolution Remote Sensing Images. IEEE Access 2020, 8, 184934–184944. [Google Scholar] [CrossRef]
  20. Chen, C.; Fragonara, L.Z.; Tsourdos, A. RoIFusion: 3D object detection from LiDAR and vision. IEEE Access 2021, 9, 51710–51721. [Google Scholar] [CrossRef]
  21. Han, G.; Chen, Y.; Wu, T.; Li, H.; Luo, J. Adaptive AFM imaging based on object detection using compressive sensing. Micron 2022, 154, 103197. [Google Scholar] [CrossRef]
  22. Sun, J.; Xu, J.; Ji, Y.M.; Wu, F.; Sun, Y. Cascaded multi-3D-view fusion for 3D-oriented object detection. Comput. Electr. Eng. 2022, 103, 108312. [Google Scholar] [CrossRef]
  23. Xia, C.; Duan, S.; Fang, X.; Gao, X.; Sun, Y.; Ge, B.; Zhang, H.; Li, K.C. EFGNet: Encoder steered multi-modality feature guidance network for RGB-D salient object detection. Digit. Signal Process. 2022, 131, 103775. [Google Scholar] [CrossRef]
  24. Peng, Y.; Liu, G.; Xu, X.; Bavirisetti, D.P.; Gu, X.; Zhang, X. MFDetection: A highly generalized object detection network unified with multilevel heterogeneous image fusion. Optik 2022, 266, 169599. [Google Scholar] [CrossRef]
  25. Yue, H.; Guo, J.; Yin, X.; Zhang, Y.; Zheng, S.; Zhang, Z.; Li, C. Salient object detection in low-light images via functional optimization-inspired feature polishing. Knowl.-Based Syst. 2022, 257, 109938. [Google Scholar] [CrossRef]
  26. Xu, W.; Zou, L.; Fu, Z.; Wu, L.; Qi, Y. Two-stage 3D object detection guided by position encoding. Neurocomputing 2022, 501, 811–821. [Google Scholar] [CrossRef]
  27. Jiao, L.; Wang, R.; Xie, C. C-FCN: Corners-based fully convolutional network for visual object detection. Multimed. Tools Appl. 2020, 79, 28841–28857. [Google Scholar] [CrossRef]
  28. Quan, Y.; Li, Z.; Chen, S.; Zhang, C.; Ma, H. Joint deep separable convolution network and border regression reinforcement for object detection. Neural Comput. Appl. 2021, 33, 4299–4314. [Google Scholar] [CrossRef]
  29. You, X.; Liu, H.; Wang, T.; Feng, S.; Lang, C. Object detection by crossing relational reasoning based on graph neural network. Mach. Vis. Appl. 2022, 33, 1. [Google Scholar] [CrossRef]
  30. Pathak, S.P.; Patil, D.S.; Patel, S. Solar Panel Hotspot Localization and Fault Classification Using Deep Learning Approach. Procedia Comput. Sci. 2022, 204, 698–705. [Google Scholar] [CrossRef]
  31. Wang, F.; Peng, G. Salient object detection via cross diffusion-based compactness on multiple graphs. Multimed. Tools Appl. 2021, 80, 15959–15976. [Google Scholar] [CrossRef]
  32. Choi, J.D.; Kim, M.Y. A sensor fusion system with thermal infrared camera and LiDAR for autonomous vehicles and deep learning based object detection. ICT Express 2023, 9, 222–227. [Google Scholar] [CrossRef]
  33. Zhang, L.; Wang, H.; Wang, X.; Liu, Q.; Wang, H.; Wang, H. Vehicle object detection method based on candidate region aggregation. Pattern Anal. Appl. 2021, 24, 1635–1647. [Google Scholar] [CrossRef]
  34. Rahman, A.; Lu, Y.; Wang, H. Performance evaluation of deep learning object detectors for weed detection for cotton. Smart Agric. Technol. 2023, 3, 100126. [Google Scholar] [CrossRef]
  35. Dai, X.; Nagahara, M. Platooning control of drones with real-time deep learning object detection. Adv. Robot. 2023, 37, 220–225. [Google Scholar] [CrossRef]
  36. Pierdicca, R.; Paolanti, M.; Felicetti, A.; Piccinini, F.; Zingaretti, P. Automatic Faults Detection of Photovoltaic Farms: SolAIr, a Deep Learning-Based System for Thermal Images. Energies 2020, 13, 6496. [Google Scholar] [CrossRef]
  37. Adeli, H.; Ahn, S.; Zelinsky, G.J. A brain-inspired object-based attention network for multi-object recognition and visual reasoning. bioRxiv 2022. [Google Scholar] [CrossRef]
  38. Open Images 2019—Object Detection. Available online: https://www.kaggle.com/c/open-images-2019-object-detection (accessed on 29 June 2023).
Figure 1. The Proposed RFD.
Figure 2. Region Segregation and Feature Extraction.
Figure 3. ROI Selection Process.
Figure 4. Learning Process.
Figure 5. Pattern-Matching Process.
Figure 6. $NOV_f$, $OV_f$, and $R_{SM}$ Analysis.
Figure 7. Analysis of $SD$ and $\vartheta_m$.
Figure 8. Similarity Index Comparison.
Figure 9. Extraction Ratio Comparison.
Figure 10. Precision Comparison.
Figure 11. False Rate Comparison.
Figure 12. Processing Time Comparison.
Figure 13. Error Probability Ratio.
Table 1. Patch Size.

Image Type | Patch Size | Number of Training and Testing Inputs
Label Image | 5, 5, 5, 5 | 145
Label Image | 10, 6, 5, 3 | 156
Annotated Image | 15, 10, 8, 5 | 278
Annotated Image | 20, 16, 15, 10 | 1610
Table 2. Comparative Analysis Results (# Features).

Metrics | C-FCN | CDD-Net | PTAN | RFD
Similarity Index | 0.588 | 0.714 | 0.842 | 0.9285
Extraction Ratio | 62.09 | 71.88 | 82.79 | 90.326
Precision | 0.739 | 0.824 | 0.908 | 0.9564
False Rate | 0.169 | 0.139 | 0.094 | 0.0562
Processing Time (ms) | 1435.46 | 1071.21 | 788.43 | 492.623

Findings: The proposed RFD improves the similarity index by 10.69%, the extraction ratio by 9.04%, and precision by 13.27%. The false rate and processing time are reduced by 7.78% and 9.19%, respectively.
Table 3. Comparative Analysis Results (# Regions).

Metrics | C-FCN | CDD-Net | PTAN | RFD
Similarity Index | 0.559 | 0.651 | 0.832 | 0.9328
Extraction Ratio | 62.55 | 71.79 | 79.29 | 91.025
Precision | 0.714 | 0.796 | 0.909 | 0.9511
False Rate | 0.166 | 0.131 | 0.097 | 0.0534
Processing Time (ms) | 1430.83 | 944.14 | 717.49 | 478.172

Findings: The proposed RFD improves the similarity index by 12.61%, the extraction ratio by 9.91%, and precision by 14.48%. The false rate and processing time are reduced by 7.79% and 8.94%, respectively.

