Article

Real-Time Person Detection in Wooded Areas Using Thermal Images from an Aerial Perspective

by Oscar Ramírez-Ayala, Iván González-Hernández, Sergio Salazar, Jonathan Flores and Rogelio Lozano *
Aerial and Submarine Autonomous Navigation Systems Program, Cinvestav, Mexico City 07360, Mexico
* Author to whom correspondence should be addressed.
Sensors 2023, 23(22), 9216; https://doi.org/10.3390/s23229216
Submission received: 20 September 2023 / Revised: 1 November 2023 / Accepted: 9 November 2023 / Published: 16 November 2023
(This article belongs to the Special Issue Internet of Things and Sensor Technologies in Smart Agriculture)

Abstract

Detecting people in images and videos captured from an aerial platform over wooded areas for search and rescue operations remains an open problem. Detection is difficult because the person occupies relatively small dimensions in the sensor's field of view compared to the environment, and the environment can generate occlusion, complicating timely detection. Numerous RGB image datasets are currently available for person-detection tasks in urban and wooded areas; they consider the general characteristics of a person, such as size, shape, and height, without considering the occlusion of the object of interest. The present research work focuses on developing a thermal image dataset that takes the occlusion situation into account and on using it to build convolutional neural network (CNN) deep learning models that perform detection tasks in real-time from an aerial perspective, using altitude control on a quadcopter prototype. Extended models that consider the occlusion of the person are proposed in conjunction with a thermal sensor, which highlights the desired characteristics of the occluded person.
Keywords:
UAV; CNN; robust control

1. Introduction

Unmanned Aerial Vehicles (UAVs) are applied in many fields. UAVs have many advantages compared to ground vehicles [1], such as more degrees of freedom to avoid obstacles, coverage of a wide area in less time [2], and the detection of small objects, improving inspection coverage [3]. Object detection is one of the main machine vision applications used by UAVs. This task includes search and rescue missions in areas that are difficult to access, wide search areas, or areas affected by natural disasters [4]. Quick and planned actions are crucial to save as many lives as possible. Detecting people from aerial platforms has become an important aspect of the deployment of autonomous unmanned aerial vehicle systems in search and rescue missions. UAV systems have increased in popularity in various civil and scientific applications due to their ability to operate over large areas and difficult terrain [5,6].
In [7], the authors presented a robust automatic person-detection algorithm for search missions. These missions are challenging due to occlusion and the radiation of trees in direct-sunlight scenarios: sunlit tree surfaces reach apparent temperatures similar to the body temperature captured by the camera sensor. Therefore, bodies in thermal images are poorly detected when they are partially hidden by trees, and a simple threshold on the heat signal is not appropriate for the detection of people [8].
In such situations, autonomous reporting of the locations of detected objects of interest, or automatic detection and recognition of human bodies, can eliminate the need for manual analysis of live UAV video images [9,10,11].
Image variation caused by movements and aerial-vehicle instability generates blurred images. Changes in the UAV's attitude alter the visual shape of the captured object, affecting the size and position of the target. These visual changes transform the appearance of the object, making detection harder [12].
In critical situations, like forest fire monitoring and fighting, the use of UAVs has increased [13,14]. For example, in [15], a perception algorithm implemented on a UAV was presented to perform surveillance tasks using aerial RGB and thermal sensors to monitor a specific area; see Figure 1.
Sliding Mode Control (SMC) was developed by Utkin in [16]. Its main contribution is to guarantee that a constraint defining the sliding surface is satisfied; once the constraint is verified, the system trajectories converge to the sliding surface in finite time. The advantages of SMC are the simplicity and robustness of the control strategy once the sliding variable has been chosen [17]. Fixed-time stability was introduced by Polyakov in [18]. Fixed-time stability ensures that the settling time does not depend on the initial conditions and provides a predefined convergence time. Fixed-time stabilization with initial-condition-independent settling-time controllers was proposed in [19,20,21].

2. Related Work

The present paper describes an approach to real-time person detection in thermal images captured from an aerial perspective in complex forest scenarios [22]. There are many challenges due to the fast movements of the UAV, the image instability, and the relatively small size of the target person. Because of the flight altitude and attitude of the vehicle, the appearance of observed objects is complex, and it is difficult to detect targets and maintain stabilization in real-time during the inspection of wooded areas with aerial vehicles.
In this work, the proposed CNN architecture analyzes images at a rate of six frames per second (fps) in order to maximize the image quality available for detection considering the embedded hardware limitations. Therefore, the proposed CNN + Haar model focuses on robust human detection within each individual captured frame [23].
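As a rough sketch of how such a frame-budgeted pipeline can be organized (this is our illustration, not the authors' code; the camera index and the detect_person stub are assumptions standing in for the CNN + Haar detector of Section 5.6):

```python
import time
import cv2

TARGET_FPS = 6                      # embedded-hardware budget reported above
PERIOD = 1.0 / TARGET_FPS

def detect_person(frame):
    # Placeholder for the CNN + Haar pipeline (Section 5.6); returns boxes.
    return []

cap = cv2.VideoCapture(0)           # camera index is an assumption
while cap.isOpened():
    t0 = time.time()
    ok, frame = cap.read()
    if not ok:
        break
    boxes = detect_person(frame)    # per-frame detection, no tracking
    elapsed = time.time() - t0
    time.sleep(max(0.0, PERIOD - elapsed))  # cap the loop at ~6 fps
cap.release()
```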
To carry out human detection, sets of thermal images of people and a wooded area without people were formed. There are very few databases of this type, and so, the aim of the present work was to provide a new database that allows the improvement of the characteristics and detection conditions for deep learning algorithms when detecting partially occluded people. For the classification task, a Convolutional Neural Network (CNN) was used. The CNN is a deep learning technique that has been used successfully to solve problems related to computer vision [24].
Person detection presents a complex problem due to the small size of the target, the occlusion of people, and the low contrast of the human with respect to the background (Figure 1). Thermal imaging is used to reduce the optical camouflage present in the image. However, within thermal images, the detection of human traces remains a challenging problem due to the variability in these thermal signatures, generated by changing weather conditions, the thermography of the ambient environment, occlusion, or light-generated noise in the image. The main challenge is the dynamic and robust detection of people in different environments, both in clear and cluttered aerial views, regardless of the location or weather conditions.
We propose a real-time detection approach for detecting people using thermal images with the analysis of the thermal signatures.
The contributions are a database of thermal images of people and a compact CNN architecture assisted by a Haar cascade classifier for real-time applications, allowing the CNN model to be evaluated in an embedded computer from an aerial perspective. In order to obtain real-time images, a quadrotor aircraft was used. The hover flight was stabilized by a fixed-time sliding mode controller to compensate for unmodeled dynamics and external perturbations.

3. Quadrotor Aircraft Dynamical Model

Quadrotors are under-actuated systems with four control inputs and six degrees of freedom, which evolve in three-dimensional space. Their dynamical model makes them a system of interest to the scientific community because it contains under-actuated, strongly coupled, multi-variable nonlinear dynamics. The aerial vehicle is considered a rigid body that evolves in three dimensions, subject to a main force u generated by the propulsion of the rotors. The dynamical model of the drone is obtained from the Euler–Lagrange approach. Figure 2 shows a free-body diagram of the quadrotor aircraft.
The vehicle center-of-mass position with respect to the inertial frame I is denoted by
$\xi = [x, y, z]^{\top} \in \mathbb{R}^3$ (1)
where x and y are the coordinates in the horizontal plane and z is the vertical position. The Euler angles are represented by
$\eta = [\phi, \theta, \psi]^{\top} \in \mathbb{R}^3$ (2)
where $\phi$ is the roll angle around the x-axis, $\theta$ is the pitch angle around the y-axis, and $\psi$ is the yaw angle around the z-axis; see Figure 2. The generalized coordinates of the vehicle are given by
$q = [x, y, z, \phi, \theta, \psi]^{\top} \in \mathbb{R}^6$ (3)
The quadrotor dynamical model is obtained from the Euler–Lagrange methodology, and the equations can be divided into translational and rotational displacements.
The dynamical model of the quadrotor vehicle is
$m\ddot{x} = u(\cos\phi \sin\theta \cos\psi + \sin\phi \sin\psi) + d_x$ (4)
$m\ddot{y} = u(\cos\phi \sin\theta \sin\psi - \sin\phi \cos\psi) + d_y$ (5)
$m\ddot{z} = u \cos\theta \cos\phi - mg + d_z$ (6)
$\ddot{\phi} = \dot{\theta}\dot{\psi}\,\frac{I_y - I_z}{I_x} + \frac{l}{I_x}\tilde{\tau}_{\phi} + \tilde{d}_{\phi}$ (7)
$\ddot{\theta} = \dot{\phi}\dot{\psi}\,\frac{I_x - I_z}{I_y} + \frac{l}{I_y}\tilde{\tau}_{\theta} + \tilde{d}_{\theta}$ (8)
$\ddot{\psi} = \dot{\phi}\dot{\theta}\,\frac{I_y - I_x}{I_z} + \frac{l}{I_z}\tilde{\tau}_{\psi} + \tilde{d}_{\psi}$ (9)
where m is the mass and l is the distance from the center of mass to each rotor. $I_x$, $I_y$, and $I_z$ are the inertia constants about each axis. $\tilde{\tau}_{\phi}$, $\tilde{\tau}_{\theta}$, and $\tilde{\tau}_{\psi}$ represent the roll, pitch, and yaw torques, respectively. $\tilde{d}_{\phi}$, $\tilde{d}_{\theta}$, and $\tilde{d}_{\psi}$ represent the external perturbations, and $d_x$, $d_y$, and $d_z$ represent unmodeled dynamics, coupling, and external perturbations.
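For illustration, Equations (4)–(9) can be coded directly as a state-space right-hand side for simulation; the following Python sketch is ours, and the state ordering and parameter packing are assumptions:

```python
import numpy as np

def quadrotor_dynamics(state, u, tau, d, params):
    """Right-hand side of Eqs. (4)-(9).

    state  = positions/angles followed by their time derivatives
    u      = total thrust; tau = (tau_phi, tau_theta, tau_psi) tilde-torques
    d      = lumped disturbances (dx, dy, dz, d_phi, d_theta, d_psi)
    params = (m, g, l, Ix, Iy, Iz)
    """
    x, y, z, phi, th, psi, xd, yd, zd, phid, thd, psid = state
    m, g, l, Ix, Iy, Iz = params
    t_ph, t_th, t_ps = tau
    dx, dy, dz, dph, dth, dps = d
    xdd = (u * (np.cos(phi) * np.sin(th) * np.cos(psi)
                + np.sin(phi) * np.sin(psi)) + dx) / m          # Eq. (4)
    ydd = (u * (np.cos(phi) * np.sin(th) * np.sin(psi)
                - np.sin(phi) * np.cos(psi)) + dy) / m          # Eq. (5)
    zdd = (u * np.cos(th) * np.cos(phi) - m * g + dz) / m       # Eq. (6)
    phidd = thd * psid * (Iy - Iz) / Ix + (l / Ix) * t_ph + dph # Eq. (7)
    thdd = phid * psid * (Ix - Iz) / Iy + (l / Iy) * t_th + dth # Eq. (8)
    psidd = phid * thd * (Iy - Ix) / Iz + (l / Iz) * t_ps + dps # Eq. (9)
    return np.array([xd, yd, zd, phid, thd, psid,
                     xdd, ydd, zdd, phidd, thdd, psidd])
```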

4. Control Based on the SMC Algorithm in Fixed-Time

For the implementation of the fixed-time sliding mode controller, we first performed simulations to verify the behavior of the quadrotor dynamics. In real-time applications, however, there are disturbances such as wind gusts and unmodeled dynamics, so it is necessary to use a nonlinear control that provides robustness [25,26]. A well-known method is Sliding Mode Control (SMC), whose strength is its robustness in the face of the unmodeled dynamics of the system, which is essential in the search and rescue application.

4.1. Control Design

The objective is to control the altitude by means of the force u; the movement along the $X_b$- and $Y_b$-axes is controlled through the desired angles $\theta_d$ and $\phi_d$, since it is not possible to apply forces directly in the $X_b$ and $Y_b$ directions with the rotors. The desired angles for the quadrotor must be defined so that the force u generates components in the $X_b$ and $Y_b$ directions. The design problem is to enforce the convergence of the states towards the desired trajectory. The following procedure describes how to determine the control law for any of the quadrotor dynamics (x, y, z, $\psi$, $\theta$, $\phi$).
The translational dynamics of the quadrotor are defined in Equations (4)–(6). Considering the dynamics in z, the vertical thrust force u is proposed as
$u = \frac{m(\nu_z + g)}{\cos\theta \cos\phi}$ (10)
where $\nu_z$ is an auxiliary control that will be defined later. Introducing (10) into (6), the translational dynamics along the z-axis become
$\ddot{z} = \nu_z + d_z$ (11)
Introducing (10) into the $\ddot{x}$ and $\ddot{y}$ dynamics, it follows that
$\ddot{x} = (\nu_z + g)\left(\tan\theta \cos\psi + \frac{\tan\phi \sin\psi}{\cos\theta}\right) + d_x$ (12)
$\ddot{y} = (\nu_z + g)\left(\tan\theta \sin\psi - \frac{\tan\phi \cos\psi}{\cos\theta}\right) + d_y$ (13)
where $d_x$, $d_y$, and $d_z$ are the lumped disturbances, including bounded external disturbances and unmodeled dynamics. The above equations can be written as follows:
$\begin{bmatrix} \ddot{x} \\ \ddot{y} \end{bmatrix} = (\nu_z + g) \begin{bmatrix} \cos\psi & \sin\psi \\ \sin\psi & -\cos\psi \end{bmatrix} \begin{bmatrix} \tan\theta_d \\ \frac{\tan\phi_d}{\cos\theta} \end{bmatrix} + \begin{bmatrix} d_x \\ d_y \end{bmatrix}$ (14)
Defining the virtual control inputs $\theta_d$ and $\phi_d$,
$\begin{bmatrix} \tan\theta_d \\ \frac{\tan\phi_d}{\cos\theta} \end{bmatrix} = \frac{1}{\nu_z + g} \begin{bmatrix} \cos\psi & \sin\psi \\ \sin\psi & -\cos\psi \end{bmatrix}^{-1} \begin{bmatrix} \nu_x \\ \nu_y \end{bmatrix} = \begin{bmatrix} \frac{\nu_x \cos\psi + \nu_y \sin\psi}{\nu_z + g} \\ \frac{\nu_x \sin\psi - \nu_y \cos\psi}{\nu_z + g} \end{bmatrix}$ (15)
Introducing (15) into (14), we obtain the translational dynamics:
$\ddot{x} = \nu_x + d_x$ (16)
$\ddot{y} = \nu_y + d_y$ (17)
where ν x and ν y are auxiliary controls, which will be defined later.
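For completeness, Equation (15) can be inverted into explicit attitude references. The following is a minimal sketch under our assumptions (in particular, using $\theta_d$ inside the cosine when recovering $\phi_d$):

```python
import numpy as np

def desired_angles(nu_x, nu_y, nu_z, psi, g=9.81):
    """Attitude references theta_d, phi_d from Eq. (15)."""
    denom = nu_z + g                        # assumed nonzero (thrust positive)
    a = (nu_x * np.cos(psi) + nu_y * np.sin(psi)) / denom
    b = (nu_x * np.sin(psi) - nu_y * np.cos(psi)) / denom
    theta_d = np.arctan(a)                  # tan(theta_d) = a
    phi_d = np.arctan(b * np.cos(theta_d))  # tan(phi_d)/cos(theta) = b
    return theta_d, phi_d
```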
The error is defined as
$e(t) = \begin{bmatrix} e_x \\ e_y \\ e_z \end{bmatrix} = \begin{bmatrix} x - x_d \\ y - y_d \\ z - z_d \end{bmatrix}$ (18)
We can rewrite the translational dynamics as follows:
$\begin{bmatrix} \ddot{x} \\ \ddot{y} \\ \ddot{z} \end{bmatrix} = \begin{bmatrix} \nu_x \\ \nu_y \\ \nu_z \end{bmatrix} + \begin{bmatrix} d_x \\ d_y \\ d_z \end{bmatrix}$ (19)
Sliding surfaces are defined for each auxiliary control of the translational dynamics:
$s_x = \dot{x} + \beta_x (x - x_d)$ (20)
$s_y = \dot{y} + \beta_y (y - y_d)$ (21)
$s_z = \dot{z} + \beta_z (z - z_d)$ (22)
These sliding surfaces are an important design stage of the SMC; they guarantee the fixed-time convergence of the involved states.
Define the auxiliary controls of the translational dynamics using the constant-exponent-coefficient SMC in the following expressions:
$\nu_x = -\beta_x \dot{e}_x - k_{1x} \mathrm{sign}(s_x) - k_{2x} |s_x|^{\alpha} \mathrm{sign}(s_x) - k_{3x} |s_x|^{\gamma} \mathrm{sign}(s_x) - k_{4x} s_x$ (23)
$\nu_y = -\beta_y \dot{e}_y - k_{1y} \mathrm{sign}(s_y) - k_{2y} |s_y|^{\alpha} \mathrm{sign}(s_y) - k_{3y} |s_y|^{\gamma} \mathrm{sign}(s_y) - k_{4y} s_y$ (24)
$\nu_z = -\beta_z \dot{e}_z - k_{1z} \mathrm{sign}(s_z) - k_{2z} |s_z|^{\alpha} \mathrm{sign}(s_z) - k_{3z} |s_z|^{\gamma} \mathrm{sign}(s_z) - k_{4z} s_z$ (25)
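As an illustration, the per-axis control law (20)–(25) reduces to a few lines; the gain values passed in are placeholders, not the values used by the authors:

```python
import numpy as np

def smc_aux_control(e, e_dot, beta, k1, k2, k3, k4, alpha=1.5, gamma=0.5):
    """Constant-exponent-coefficient SMC term, Eqs. (20)-(25).

    e, e_dot are the tracking error and its derivative for one axis.
    """
    s = e_dot + beta * e                  # sliding surface, Eqs. (20)-(22)
    return (-beta * e_dot
            - k1 * np.sign(s)
            - k2 * np.abs(s) ** alpha * np.sign(s)
            - k3 * np.abs(s) ** gamma * np.sign(s)
            - k4 * s)
```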
Define the torque control inputs as
$\tilde{\tau}_{\phi} = \frac{I_x}{l}\left(\tau_{\phi} - \dot{\theta}\dot{\psi}\,\frac{I_y - I_z}{I_x}\right)$ (26)
$\tilde{\tau}_{\theta} = \frac{I_y}{l}\left(\tau_{\theta} - \dot{\phi}\dot{\psi}\,\frac{I_x - I_z}{I_y}\right)$ (27)
$\tilde{\tau}_{\psi} = \frac{I_z}{l}\left(\tau_{\psi} - \dot{\phi}\dot{\theta}\,\frac{I_y - I_x}{I_z}\right)$ (28)
Then, we obtain
$\begin{bmatrix} \ddot{\phi} \\ \ddot{\theta} \\ \ddot{\psi} \end{bmatrix} = \begin{bmatrix} \tau_{\phi} \\ \tau_{\theta} \\ \tau_{\psi} \end{bmatrix} + \begin{bmatrix} d_{\phi} \\ d_{\theta} \\ d_{\psi} \end{bmatrix}$ (29)
where d ϕ , d θ , and d ψ are lumped perturbations including the external and coupling dynamics. The attitude error is defined as
$\tilde{\eta} = \eta - \eta_d = \begin{bmatrix} \phi - \phi_d \\ \theta - \theta_d \\ \psi - \psi_d \end{bmatrix}$ (30)
where the sliding surfaces are defined for each auxiliary control of the attitude dynamics:
$s_{\phi} = \dot{\phi} + \beta_{\phi} (\phi - \phi_d)$ (31)
$s_{\theta} = \dot{\theta} + \beta_{\theta} (\theta - \theta_d)$ (32)
$s_{\psi} = \dot{\psi} + \beta_{\psi} (\psi - \psi_d)$ (33)
Define the auxiliary controls of the attitude dynamics using the constant-exponent-coefficient SMC in the following expressions:
$\tau_{\phi} = -\beta_{\phi} \dot{\tilde{\eta}}_{\phi} - k_{1\phi} \mathrm{sign}(s_{\phi}) - k_{2\phi} |s_{\phi}|^{\alpha} \mathrm{sign}(s_{\phi}) - k_{3\phi} |s_{\phi}|^{\gamma} \mathrm{sign}(s_{\phi}) - k_{4\phi} s_{\phi}$ (34)
$\tau_{\theta} = -\beta_{\theta} \dot{\tilde{\eta}}_{\theta} - k_{1\theta} \mathrm{sign}(s_{\theta}) - k_{2\theta} |s_{\theta}|^{\alpha} \mathrm{sign}(s_{\theta}) - k_{3\theta} |s_{\theta}|^{\gamma} \mathrm{sign}(s_{\theta}) - k_{4\theta} s_{\theta}$ (35)
$\tau_{\psi} = -\beta_{\psi} \dot{\tilde{\eta}}_{\psi} - k_{1\psi} \mathrm{sign}(s_{\psi}) - k_{2\psi} |s_{\psi}|^{\alpha} \mathrm{sign}(s_{\psi}) - k_{3\psi} |s_{\psi}|^{\gamma} \mathrm{sign}(s_{\psi}) - k_{4\psi} s_{\psi}$ (36)

4.2. Constant Exponent Coefficient Sliding Mode Control Stability Analysis

Consider the following uncertain nonlinear second-order system:
$\ddot{z} = -g + \frac{\cos\theta \cos\phi}{m} u + d_z$ (37)
with the sliding surface $s_z$ proposed for the z-axis dynamics
$s_z = \dot{z} + \beta_z z$ (38)
with $\beta_z > 0$; the controller is proposed as
$u = \frac{m}{\cos\theta \cos\phi}\left[g - \beta_z \dot{z} - k_{1z} \mathrm{sign}(s_z) - k_{2z} |s_z|^{\alpha} \mathrm{sign}(s_z) - k_{3z} |s_z|^{\gamma} \mathrm{sign}(s_z) - k_{4z} s_z\right]$ (39)
with $k_{1z} > \delta_z \geq |d_z|$, $k_{2z} > 0$, $k_{3z} \geq 0$, $k_{4z} \geq 0$, $\alpha > 1$, and $0 < \gamma < 1$.
Proposition 1.
The closed-loop system (37)–(39) reaches the sliding surface $s_z = 0$ in fixed time, with the settling time $T(s_0)$ satisfying
$T(s_0) \leq \frac{1}{k_{1z} - \delta_z} + \frac{1}{k_{2z}(\alpha - 1)}$ (40)
The closed-loop system is globally asymptotically stable.

4.3. Stability Proof

Consider the sliding surface $s_z$ in (38) and the $\ddot{z}$ dynamics (37) with the control u defined in (39), where g represents the acceleration of gravity and m the mass of the vehicle. The derivative of the proposed sliding surface is
$\dot{s}_z = -g + \frac{\cos\theta \cos\phi}{m} u + \beta_z \dot{z} + d_z$ (41)
Introducing (39) into the above, we obtain
$\dot{s}_z = -k_{1z} \mathrm{sign}(s_z) - k_{2z} |s_z|^{\alpha} \mathrm{sign}(s_z) - k_{3z} |s_z|^{\gamma} \mathrm{sign}(s_z) - k_{4z} s_z + d_z$ (42)
To demonstrate the stability of the system, consider the following Lyapunov candidate function [23]:
$V(s_z) = s_z^2$ (43)
Then,
$\dot{V} = -2 k_{1z} |s_z| - 2 k_{2z} |s_z|^{\alpha+1} - 2 k_{3z} |s_z|^{\gamma+1} - 2 k_{4z} s_z^2 + 2 d_z s_z$ (44)
$\dot{V} \leq -2 (k_{1z} - \delta_z) |s_z| - 2 k_{2z} |s_z|^{\alpha+1}$ (45)
$\dot{V} \leq -2 (k_{1z} - \delta_z) V(s_z)^{1/2} - 2 k_{2z} V(s_z)^{(\alpha+1)/2}$ (46)
with $(\alpha + 1)/2 > 1$.
Remark 1.
The function $x \mapsto |x|^{\gamma} \mathrm{sign}(x)$ with $x(t) \in \mathbb{R}$, together with constants $\lambda > 0$ and $\mu > 0$ such that $\theta = \lambda(1 + \mu) > 1$ and $0 < \gamma < 1$, ensures the asymptotic stability of the closed-loop system (37)–(39) towards the origin.
The stability test is similar to the one in [25] applied to the second-order z ¨ dynamics of the quadrotor and also to the other dynamics.

5. Results

5.1. Simulation Results of the Aerial Vehicle

The simulation of each dynamics was developed with different parameters because the dynamics are nonlinear and under-actuated. The perturbation for the translational dynamics is defined as $d = [\sin(t), \sin(t), 2\sin(10t)]^{\top}$, and the parameters $\beta = 1$, $\alpha = 1.5$, and $\gamma = 0.5$ are the same for all the translational dynamics, with $\delta = [1.1, 1.1, 2.1]^{\top}$. For the gains $k_i$, $i = 1, \ldots, 4$, defined for each independent translational dynamics, the settling time satisfies $T_z(s_0) \leq 1.833\ \mathrm{s}$; for the other dynamics, which are coupled and depend directly on the z dynamics, a different settling time $T_{x,y}(s_0) \leq 6\ \mathrm{s}$ was defined with different parameters.
A desired trajectory is defined by
$q_d = \begin{bmatrix} \cos(\omega t) \\ \sin(\omega t) \\ 6 \end{bmatrix}$ (47)
where $\omega = 2\pi/20$.
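To make the fixed-time behavior concrete, the following self-contained sketch integrates only the z-axis loop (11), (22), (25) under the stated disturbance $2\sin(10t)$. The gains and the Euler step are our assumptions; with $k_{1z} = 3 > \delta_z = 2.1$, $k_{2z} = 2$, and $\alpha = 1.5$, the bound (40) gives $T \leq 1/0.9 + 1/1 \approx 2.11$ s:

```python
import numpy as np

# z-axis closed loop only: z_ddot = nu_z + d_z, Eqs. (11), (22), (25).
beta, alpha, gamma = 1.0, 1.5, 0.5        # parameters stated above
k1, k2, k3, k4 = 3.0, 2.0, 1.0, 1.0       # illustrative gains, k1 > 2.1
dt, T = 1e-3, 10.0
z, z_dot, z_d = 0.0, 0.0, 6.0             # constant altitude reference 6 m

for i in range(int(T / dt)):
    t = i * dt
    e, e_dot = z - z_d, z_dot
    s = e_dot + beta * e                  # sliding surface (22)
    nu = (-beta * e_dot - k1 * np.sign(s)
          - k2 * abs(s) ** alpha * np.sign(s)
          - k3 * abs(s) ** gamma * np.sign(s) - k4 * s)   # Eq. (25)
    z_ddot = nu + 2.0 * np.sin(10.0 * t)  # disturbance d_z = 2 sin(10t)
    z_dot += z_ddot * dt                  # forward-Euler integration
    z += z_dot * dt

print(f"z(10 s) = {z:.3f} m (reference 6 m)")
```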
The trajectory tracking of the translational dynamics x, y, and z with initial conditions $IC = [0.5, -0.4, 0]^{\top}$ is shown in Figure 3.
The tracking errors are shown in Figure 4, and the convergence to zero of the states of the translational dynamics is observed.
The behavior of the trajectory tracking of the attitude dynamics $\phi$, $\theta$, and $\psi$ is shown in Figure 5.
Finally, Figure 6 shows the three-dimensional trajectory tracking carried out by the quadrotor vehicle.

5.2. CNN People-Detection Results

For the development of the person-detection algorithm in complex wooded areas using thermal images, it must be considered that, in many situations, the object of interest (a person) may be occluded by objects in the environment, which makes detection difficult. This complication requires models that learn many representative features from image datasets captured under the desired detection conditions. The detection model was first developed using the generalized thermal imaging dataset of fully visible persons, producing some initial predictions. The dataset was then extended with thermal images focused on characterizing the occlusion condition. Under these conditions, we implemented a CNN model with more-representative characteristics of a lost person in forested areas. A FLIR Vue thermal camera was used to capture the videos in the three forested settings shown in Figure 7.
Figure 7a corresponds to the site where video capture was carried out with the camera mounted on the UAV, recording multiple videos that were divided into two parts: one part was used for the training process and the other for model validation. Figure 7b corresponds to videos captured in a controlled way, without occlusion of the persons; from this set of videos, the dataset of fully visible persons was created. Finally, from the scenario in Figure 7c, a dataset of images never seen before by the model was generated and was not used in the training process. The image dataset was divided into two classes: person and non-person. The first class must contain persons, or representative parts of them, somewhere in the image, and the second class contains general objects representing everything that does not correspond to the representative characteristics of a person.
This architecture was tested on small image datasets and may represent a viable option considering that the detection application focuses on persons, who occupy a small portion of the overall image. The models developed up to that point had complications in defining the architecture because each convolution reduces the image by a certain proportion; therefore, very deep networks extracted few representative features from the small portion of the image occupied by a person.

5.3. Dataset (Thermal Images)

The dataset was made up of images of persons and images containing no person: cutouts, or sections of images, that contain a complete person or a person with a certain portion of their characteristics occluded by the environment, together with sections containing no person, which represent the characteristics of the environment to be inspected. A large number of positive image samples allows the algorithm to generalize the characteristics of a person for the classification and detection processes.
This procedure was carried out on all the images captured in the three proposed forest scenarios, obtaining a total of 10,000 thermal images of 640 × 520 px (database available as People Thermal Images, https://doi.org/10.6084/m9.figshare.24473002.v1 (accessed on 18 September 2023)). The sectioned datasets were made up as follows:
Complete positive images: 3000 images (150 × 150 px) sectioned as people divided into two sets: 2000 training images and 1000 validation images, see Figure 8.
Positive occlusion images: 2000 images (150 × 150 px) sectioned as persons divided into two sets: 1500 training images and 500 validation images, see Figure 9.
Negative images: 5000 images (150 × 150 px) sectioned as non-person regions, divided into two sets: 3500 training images and 1500 validation images, see Figure 10.
The negative forest dataset comprises all images that do not depict a person; these may contain objects, vegetation, or anything else that provides information helping the detection model to accurately generalize the representative characteristics of a person.
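A minimal sketch of how such a two-class split can be loaded for training, assuming a hypothetical directory layout (thermal_dataset/train and thermal_dataset/val, each with person/ and non_person/ sub-folders); this is an illustration, not the authors' released code:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Hypothetical layout: thermal_dataset/{train,val}/{person,non_person}/
train_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "thermal_dataset/train", target_size=(150, 150),
    color_mode="grayscale", batch_size=32, class_mode="binary")
val_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "thermal_dataset/val", target_size=(150, 150),
    color_mode="grayscale", batch_size=32, class_mode="binary")
```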

5.4. Training the CNN Models

The CNN architecture has a sequence of consecutive layers; this is important to increase the capacity of the network and to reduce the size of the feature maps so that they are not too large when they reach the Flatten layer, which converts them into a one-dimensional vector of values. The procedure starts with input images of size 150 × 150 px and ends with feature maps of size 2 × 2, followed by the conversion of the 2 × 2 2D array into a 1D vector using the Flatten layer. The depth of the feature maps increases progressively through the network (from 64 to 128). This pattern can be observed in most CNNs: the image starts at a set size and becomes progressively smaller in order to highlight the most-representative features of the dataset.
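A plausible Keras realization of this description is sketched below, with five 3 × 3 convolution blocks shrinking the maps 150 → 148 → 74 → 72 → 36 → 34 → 17 → 15 → 7 → 5 → 2; the exact filter counts, dense head, and optimizer are our assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(150, 150, 1)),              # single-channel thermal crops
    layers.Conv2D(64, (3, 3), activation="relu"),   # 148x148
    layers.MaxPooling2D((2, 2)),                    # 74x74
    layers.Conv2D(64, (3, 3), activation="relu"),   # 72x72
    layers.MaxPooling2D((2, 2)),                    # 36x36
    layers.Conv2D(128, (3, 3), activation="relu"),  # 34x34
    layers.MaxPooling2D((2, 2)),                    # 17x17
    layers.Conv2D(128, (3, 3), activation="relu"),  # 15x15
    layers.MaxPooling2D((2, 2)),                    # 7x7
    layers.Conv2D(128, (3, 3), activation="relu"),  # 5x5
    layers.MaxPooling2D((2, 2)),                    # 2x2 feature maps
    layers.Flatten(),                               # 2*2*128 = 512 values
    layers.Dense(512, activation="relu"),
    layers.Dense(1, activation="sigmoid"),          # person / non-person
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy",
              metrics=["accuracy"])
```

The filter depth grows from 64 to 128 while the spatial size shrinks, matching the progression described above.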
Several training processes were carried out with the CNN architecture on the thermal image dataset. This model has a shallow architecture requiring few computational resources, which allowed the proposed code to be optimized and tested more fluidly. Subsequently, with the tested code, the architecture was trained with the extended data.
Since the first CNN model represented only a rapid test model, it was trained several times with less precise training, which allowed the detection algorithms to be tested in real-time in a fluid and fast way prior to further training. The CNN model with the modified architecture was trained for 100 epochs, in addition to the modification of some of the hyperparameters.
The training processes ran for 100 epochs using the established dataset, considering the partial occlusion of persons in the thermal images. The maximum precision and loss values obtained for the two training processes with the best results are shown in Table 1.
The CNN1 and CNN2 architectures were the same; the difference was that CNN1 was trained with data of fully visible persons, while for CNN2, data of persons under occlusion were added. The temporal evolution of the precision and loss values during the training of the CNN2 model showed adequate training for the detection of features never seen before by the model, as shown in Figure 11. The training and validation curves progressed together without separating, indicating that the predictions for both the training and the validation data reached low loss values.
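A sketch of a CNN2-style training run, reusing the model and generators sketched above and "extending" the data with standard augmentation; the specific transforms are assumptions, not the paper's exact settings:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

aug_gen = ImageDataGenerator(
    rescale=1.0 / 255, rotation_range=20,
    width_shift_range=0.1, height_shift_range=0.1,
    zoom_range=0.2, horizontal_flip=True,       # assumed augmentations
).flow_from_directory(
    "thermal_dataset/train", target_size=(150, 150),
    color_mode="grayscale", batch_size=32, class_mode="binary")

history = model.fit(aug_gen, validation_data=val_gen, epochs=100)
print("best train acc:", max(history.history["accuracy"]),
      "best val acc:", max(history.history["val_accuracy"]))
```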

5.5. Detection of a Person in Thermal Images Using Sliding Window

To evaluate the performance of the CNN detection model, a sliding window was applied using the weights learned during the training process. In combination with image pyramids, the sliding window turns the image classifier into a detector that can recognize objects at different scales and locations in the image. Although these techniques are not a solution for real-time detection, they play a critical role in the performance analysis of object detection and image classification.
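A minimal sketch of the image pyramid and sliding window used in this offline evaluation (the window matches the 150 × 150 training crops; the step and scale factors are assumptions). Each crop yielded here would be scored by the trained CNN, and windows above a threshold kept:

```python
import cv2

def pyramid(image, scale=1.5, min_size=(150, 150)):
    """Yield the image at progressively smaller scales."""
    yield image
    while True:
        w = int(image.shape[1] / scale)
        h = int(image.shape[0] / scale)
        if w < min_size[0] or h < min_size[1]:
            break
        image = cv2.resize(image, (w, h))
        yield image

def sliding_window(image, step=32, window=(150, 150)):
    """Yield (x, y, crop) for every window position on one pyramid level."""
    for y in range(0, image.shape[0] - window[1] + 1, step):
        for x in range(0, image.shape[1] - window[0] + 1, step):
            yield x, y, image[y:y + window[1], x:x + window[0]]
```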
Different complex scenarios of thermal images captured from an aerial perspective, not used in the model training process, were evaluated, obtaining adequate classification results even where the environment is very rugged and persons are partially occluded, as shown in Figure 12.

5.6. Real-Time CNN Classifier + Haar Cascade

In the real-time implementation (see Figure 13), the Haar model was run on an NVIDIA Jetson Nano mini-computer with restricted processing resources. The Haar model is a good alternative for real-time object-detection applications [27].
This allowed us to evaluate the CNN model in real-time on board a prototype quadcopter vehicle. To test the CNN deep learning model for the detection of persons in thermal images from an aerial perspective while reducing the computational processing requirements, the developed CNN model evaluates only the selector boxes generated by the Haar model and assigns a label within each frame. A selector box may represent a person or part of the woodland environment. This implementation allows the processing to be carried out on board the prototype vehicle. The prototype vehicle used for the CNN model test is shown in Figure 14.
In training the Haar cascade model on our thermal image dataset, we sought to obtain a similar result, considering that our dataset was relatively small and that no data augmentation functions are available for Haar classifiers.
The second block of results focused on combining the Haar classifier with a CNN model, which generates deep learning predictions on the regions of interest detected by the Haar cascade classifier. This is very useful because, if the Haar classifier generates false positives in its classification process, a second evaluation is still possible: the prediction values generated by the modified CNN model define whether the region contains a person or not, allowing us to rule out a large number of false detections. A condensed sketch of this two-stage pipeline is given after this paragraph.
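The sketch below illustrates the two-stage pipeline under our assumptions (the cascade and model file names are hypothetical, and the 0.5 decision threshold is ours):

```python
import cv2
import numpy as np
from tensorflow import keras

cascade = cv2.CascadeClassifier("haar_thermal_person.xml")  # hypothetical file
cnn = keras.models.load_model("cnn2_thermal.h5")            # hypothetical file

def detect(frame_gray):
    """Haar proposes regions of interest; the CNN confirms or rejects each."""
    boxes = cascade.detectMultiScale(frame_gray, scaleFactor=1.1,
                                     minNeighbors=3)
    results = []
    for (x, y, w, h) in boxes:
        roi = cv2.resize(frame_gray[y:y + h, x:x + w], (150, 150)) / 255.0
        p = float(cnn.predict(roi[np.newaxis, :, :, np.newaxis],
                              verbose=0)[0, 0])
        results.append((x, y, w, h, "person" if p >= 0.5 else "no person", p))
    return results
```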
The results in Figure 15 show that, in both images, the regions of interest with exposed persons without occlusion were classified correctly, while the regions of interest in which no person was found were assigned the label “no person”.

6. Discussion

The results presented in this work are consistent with other publications in the literature. For example, in [28], the authors used several high-performance deep-learning-based object-detection algorithms for detecting small objects in aerial thermal images; their detection algorithm was implemented on a workstation computer.
In [29], the authors used a thermal camera with a deep learning model for human detection in low-visibility fire-smoke scenarios. Furthermore, Reference [24] presented a new dataset of thermal image sequences for person detection and tracking, and its authors proposed a new framework based on particle filters to track persons in aerial thermal images using a small computer.
The detection of persons in real-time using thermal images from an aerial perspective is adequate for the different circumstances in which portions of the features of the human body are occluded in a wooded area. We observed that false positives may be generated by the Haar cascade model but were correctly classified by the CNN model with the label “non-person”. The proposed deep learning CNN model correctly classified most of these positive detections, with 99.8% training accuracy for a complete person and 95% training accuracy for an occluded person. Several scenes captured during the test flights with the quadrotor prototype vehicle were presented. The capture of aerial information allowed the creation of a perspective image database that adequately generalizes the detection conditions of lost persons in forested areas and was used to train the CNN deep learning model.

Author Contributions

Investigation, O.R.-A.; project administration, I.G.-H.; supervision, S.S.; data curation, J.F.; conceptualization, R.L. All authors have read and agreed to the published version of the manuscript.

Funding

This project was funded by the Aerial and Submarine Autonomous Navigation Systems PhD program of the Center for Research and Advanced Studies (CINVESTAV) of the National Polytechnic Institute.

Data Availability Statement

The people thermal images database is available at https://doi.org/10.6084/m9.figshare.24473002.v1 (accessed on 18 September 2023).

Acknowledgments

The authors are grateful to CONAHCYT (National Council of Science, Humanities and Technology) for its support.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Gaszczak, A.; Breckon, T.P.; Han, J. Real-time people and vehicle detection from UAV imagery. In Intelligent Robots and Computer Vision XXVIII: Algorithms and Techniques; SPIE: Bellingham, WA, USA, 2011; Volume 7878, pp. 71–83.
  2. Saif, A.F.M.S.; Prabuwono, A.S.; Mahayuddin, Z.R. A review of machine vision based on moving objects: Object detection from UAV aerial images. Int. J. Adv. Comput. Technol. 2013, 5, 57–72.
  3. Matese, A.; Toscano, P.; Di Gennaro, S.F.; Genesio, L.; Vaccari, F.P.; Primicerio, J.; Belli, C.; Zaldei, A.; Bianconi, R.; Gioli, B. Intercomparison of UAV, aircraft and satellite remote sensing platforms for precision viticulture. Remote Sens. 2015, 7, 2971–2990.
  4. Molina, P.; Colomina, I.; Victoria, T.; Skaloud, J.; Kornus, W.; Prades, R.; Aguilera, C. Searching lost people with UAVs: The system and results of the CLOSE-SEARCH project. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2012, 39, 441–446.
  5. De Oliveira, D.C.; Wehrmeister, M.A. Towards real-time people recognition on aerial imagery using convolutional neural networks. In Proceedings of the 2016 IEEE 19th International Symposium on Real-Time Distributed Computing (ISORC), York, UK, 17–20 May 2016; pp. 27–34.
  6. Burke, C.; McWhirter, P.R.; Veitch-Michaelis, J.; McAree, O.; Pointon, H.A.; Wich, S.; Longmore, S. Requirements and limitations of thermal drones for effective search and rescue in marine and coastal areas. Drones 2019, 3, 78.
  7. Lygouras, E.; Santavas, N.; Taitzoglou, A.; Tarchanidis, K.; Mitropoulos, A.; Gasteratos, A. Unsupervised human detection with an embedded vision system on a fully autonomous UAV for search and rescue operations. Sensors 2019, 19, 3542.
  8. Brunetti, A.; Buongiorno, D.; Trotta, G.F.; Bevilacqua, V. Computer vision and deep learning techniques for pedestrian detection and tracking: A survey. Neurocomputing 2018, 300, 17–33.
  9. Yurtsever, E.; Lambert, J.; Carballo, A.; Takeda, K. A survey of autonomous driving: Common practices and emerging technologies. IEEE Access 2020, 8, 58443–58469.
  10. Liu, C.; Szirányi, T. Real-time human detection and gesture recognition for on-board UAV rescue. Sensors 2021, 21, 2180.
  11. Dong, J.; Ota, K.; Dong, M. UAV-based real-time survivor detection system in post-disaster search and rescue operations. IEEE J. Miniaturization Air Space Syst. 2021, 2, 209–219.
  12. Ammour, N.; Alhichri, H.; Bazi, Y.; Benjdira, B.; Alajlan, N.; Zuair, M. Deep learning approach for car detection in UAV imagery. Remote Sens. 2017, 9, 312.
  13. Barmpoutis, P.; Papaioannou, P.; Dimitropoulos, K.; Grammalidis, N. A review on early forest fire detection systems using optical remote sensing. Sensors 2020, 20, 6442.
  14. Lee, S.; Song, Y.; Kil, S.H. Feasibility analyses of real-time detection of wildlife using UAV-derived thermal and RGB images. Remote Sens. 2021, 13, 2169.
  15. Al-Kaff, A.; Madridano, Á.; Campos, S.; García, F.; Martín, D.; de la Escalera, A. Emergency support unmanned aerial vehicle for forest fire surveillance. Electronics 2020, 9, 260.
  16. Utkin, V. Variable structure systems with sliding modes. IEEE Trans. Autom. Control 1977, 22, 212–222.
  17. Utkin, V.; Shi, J. Integral sliding mode in systems operating under uncertainty conditions. In Proceedings of the 35th IEEE Conference on Decision and Control, Kobe, Japan, 13 December 1996; Volume 4, pp. 4591–4596.
  18. Polyakov, A. Nonlinear feedback design for fixed-time stabilization of linear control systems. IEEE Trans. Autom. Control 2012, 57, 2106–2110.
  19. Levant, A. On fixed and finite time stability in sliding mode control. In Proceedings of the 52nd IEEE Conference on Decision and Control, Florence, Italy, 10–13 December 2013; pp. 4260–4265.
  20. Zuo, Z. Non-singular fixed-time terminal sliding mode control of non-linear systems. IET Control Theory Appl. 2015, 9, 545–552.
  21. Corradini, M.L.; Cristofaro, A. Nonsingular terminal sliding-mode control of nonlinear planar systems with global fixed-time stability guarantees. Automatica 2018, 95, 561–565.
  22. Mishra, B.; Garg, D.; Narang, P.; Mishra, V. Drone-surveillance for search and rescue in natural disaster. Comput. Commun. 2020, 156, 1–10.
  23. Ribeiro, D.; Mateus, A.; Nascimento, J.C.; Miraldo, P. A real-time pedestrian detector using deep learning for human-aware navigation. arXiv 2016, arXiv:1607.04441.
  24. Portmann, J.; Lynen, S.; Chli, M.; Siegwart, R. People detection and tracking from aerial thermal views. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 1794–1800.
  25. Hu, C.; Yu, J.; Chen, Z.; Jiang, H.; Huang, T. Fixed-time stability of dynamical systems and fixed-time synchronization of coupled discontinuous neural networks. Neural Netw. 2017, 89, 74–83.
  26. Moulay, E.; Léchappé, V.; Bernuau, E.; Plestan, F. Robust fixed-time stability: Application to sliding-mode control. IEEE Trans. Autom. Control 2021, 67, 1061–1066.
  27. Setjo, C.H.; Achmad, B. Thermal image human detection using Haar-cascade classifier. In Proceedings of the 2017 7th International Annual Engineering Seminar (InAES), Yogyakarta, Indonesia, 1–2 August 2017; pp. 1–6.
  28. Akshatha, K.R.; Karunakar, A.K.; Shenoy, S.B.; Pai, A.K.; Nagaraj, N.H.; Rohatgi, S.S. Human detection in aerial thermal images using faster R-CNN and SSD algorithms. Electronics 2022, 11, 1151.
  29. Tsai, P.F.; Liao, C.H.; Yuan, S.M. Using deep learning with thermal imaging for human detection in heavy smoke scenarios. Sensors 2022, 22, 5351.
Figure 1. People aerial thermal image.
Figure 2. Free-body diagram of the quadrotor aircraft.
Figure 3. Behavior of the trajectory tracking of the translational dynamics.
Figure 4. Convergence of tracking errors in the translational dynamics of the quadrotor vehicle.
Figure 5. Behavior of the ϕ, θ and ψ dynamics.
Figure 6. Three-dimensional trajectory tracking.
Figure 7. CNN model human detection in thermal images: (a) training and validation scenario; (b) scenario without occlusion of people; (c) real-application scenario.
Figure 8. Positive image section containing people.
Figure 9. Positive image section with occlusion containing persons.
Figure 10. Negative image section containing wooded areas.
Figure 11. Precision and loss plots of the CNN2 model for the training data extended over 100 epochs.
Figure 12. CNN2 model detection using sliding window for images never seen before by the model.
Figure 13. Person detection using Haar cascade model (Viola and Jones) in thermal images.
Figure 14. Prototype vehicle used for the evaluation of the CNN model in real-time.
Figure 15. CNN model people detection in real-time for thermal images.
Table 1. Metrics obtained in the training processes of the CNN models.

Model | Train_acc | Train_loss | Val_acc | Val_loss
CNN1 | 0.98 (98%) | 0.00247 | 0.96 (96%) | 0.00183
CNN2 | 0.998 (99.8%) | 0.00093 | 0.98 (98%) | 0.00032
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
