Article

MTL-FFDET: A Multi-Task Learning-Based Model for Forest Fire Detection

1 College of Information Science and Technology, Nanjing Forestry University, Nanjing 210037, China
2 School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, Sydney, NSW 2006, Australia
* Author to whom correspondence should be addressed.
Forests 2022, 13(9), 1448; https://doi.org/10.3390/f13091448
Submission received: 28 July 2022 / Revised: 2 September 2022 / Accepted: 7 September 2022 / Published: 9 September 2022
(This article belongs to the Section Natural Hazards and Risk Management)

Abstract

Deep learning-based forest fire vision monitoring methods have developed rapidly and are becoming mainstream. The existing methods, however, are based on enormous amounts of data, and have issues with weak feature extraction, poor small target recognition and many missed and false detections in complex forest scenes. In order to solve these problems, we proposed a multi-task learning-based forest fire detection model (MTL-FFDet), which contains three tasks (the detection task, the segmentation task and the classification task) and shares the feature extraction module. In addition, to improve detection accuracy and decrease missed and false detections, we proposed the joint multi-task non-maximum suppression (NMS) processing algorithm that fully utilizes the advantages of each task. Furthermore, considering the objective fact that divided flame targets in an image are still flame targets, our proposed data augmentation strategy of a diagonal swap of random origin is a good remedy for the poor detection effect caused by small fire targets. Experiments showed that our model outperforms YOLOv5-s in terms of mAP (mean average precision) by 3.2%, AP_S (average precision for small objects) by 4.8%, AR_S (average recall for small objects) by 4.0%, and other metrics by 1% to 2%. Finally, the visualization analysis showed that our multi-task model can focus on the target region better than the single-task model during feature extraction, with superior extraction ability.

1. Introduction

Forests provide a fundamental habitat for terrestrial plants and animals, and they play an important role in maintaining the ecological balance of the ecosystem. Forest fires are among the most devastating forestry disasters, harming the global carbon cycle, soil characteristics and species richness as well as contributing to climate change [1]. Forest fires can even endanger human lives and public property, which will result in losses for the economy and resources. Forest fires spread swiftly, and are currently difficult to effectively control or prevent [2]. Therefore, it is crucial to locate fires quickly and put them out before they become serious incidents.
Traditionally, forest fire monitoring has relied mainly on manual inspection [3] and conventional sensor monitoring [4,5]. However, manual inspection not only consumes considerable human and material resources, but also struggles to locate fires accurately. Although sensor monitoring can locate fires quickly and precisely, the expense of covering an entire forest with a wireless sensor network of traditional sensors, such as temperature, humidity, wind and rain sensors [6], is extremely high. With the rapid development of computer vision, remote sensing and artificial intelligence technology, optical sensors are now widely used in forest fire detection due to their low cost, broad coverage, real-time processing and efficient recognition [2]. Many optical sensor-based forest fire detection studies involving visible [7,8,9], infrared [10,11,12], hyperspectral [13], multispectral [14] and 360-degree [15] cameras have demonstrated good progress. In addition, the platforms used for forest fire detection are constantly evolving. In their earlier stages of development, forest fire detection platforms relied on watchtower monitoring or satellite remote sensing. However, watchtower monitoring is very inflexible, and the spatial resolution of satellite images is too coarse to reveal early forest fires. Unmanned aerial vehicles (UAVs), with their flexibility and real-time information for early forest fire warning, are a recent trend [16,17].
Deep learning methods have outperformed traditional image processing methods in many fields, and have been widely used in forest fire detection. Pan et al. [18] trained an AddNet [19] to be a forest fire classifier. Images were divided into multiple small patches, and were accurately classified by the system to ascertain forest fire size and location. Zhang et al. [20] trained a joined classifier, and used a cascade approach to detect both full image and fine-grained patches. Jiao et al. [21] improved YOLOv3 [22] for forest fire detection and implemented it aboard UAVs for forest patrols. Wu et al. [23] explored the performance of detectors such as the you only look once (YOLO) network, single shot multi-box detector (SSD) [24], and the faster region-convolutional neural network (Faster R-CNN) [25]; they improved the structure of the YOLO network to enhance fire detection accuracy and inference speed. Xu et al. [26] adopted an ensemble learning scheme that improved detection accuracy and reduced false detection. However, the model was not end-to-end and was also highly complex. The aforementioned methods are able to discern the presence of fire as well as locate fires in images. Additionally, some studies employed the semantic segmentation technique for forest fire detection, which offers pixel-level information of fire areas that is more comprehensive and accurate. For example, Zhang [27] proposed ATT Squeeze U-Net, which added an attention mechanism to the original U-Net, and achieved a lightweight as well as precise system. Song [28] proposed the squeezed fire binary segmentation model, and transplanted it into an embedded device, achieving real-time speed, effectiveness, and high precision.
In previous studies of forest fire detection, many implementations of image classification, object detection and semantic segmentation models have been used, and each of these methods has its advantages. However, few studies have combined these methods and investigated how their unified advantages complement each other. In this paper, we explore a fusion of multiple tasks and exploit the connections between them in order to improve the performance of forest fire detection. For example, image classification focuses on global information, while object detection focuses on regional information and semantic segmentation focuses on pixel information; combining features of these different granularities can help extract more characteristics of forest fires. In addition, fire images of forest scenes are difficult to obtain, so it is crucial to make full use of the available data. Multi-task learning offers a way to benefit jointly from similar, but not identical, tasks.
Compared with the system used in Xu et al.'s previous study, our MTL-FFDet model is more lightweight, and is a practical end-to-end model for training, inference and deployment. Although the previous studies cited above achieved satisfying results, many problems with the visual detection of forest fires in complex forest scenes still exist. Firstly, deep learning is a data-driven approach, and the limited amount of publicly available forest fire video and image data makes it difficult for a model to learn effective features, resulting in insufficient generalization capability. Secondly, it is a challenge to adequately extract the essential characteristics of flames, because of their rich dynamic and static characteristics and their lack of fixed colors, textures and shapes. Furthermore, there are several disturbances in the forest environment, such as red leaves, light, shadow and twilight glow, all of which lead to many false detections. Thirdly, in practical applications, whether in inspection by UAVs or in the fixed monitoring of watchtowers, vision-based algorithms are limited by camera resolution and are weak at detecting small flame targets. In particular, due to limitations of the visual perspective, the flame subject may not be in the middle of the image but instead appears at the image edges as a small and incomplete target. If these small and incomplete targets can be effectively detected, the performance of forest fire detection models will be greatly improved. Therefore, our model focuses on improving feature extraction, few-shot learning and small target detection.
The main contributions of our paper are three-fold:
  • We propose a multi-task learning-based forest fire detection (MTL-FFDet) model that involves three tasks to enhance its feature extraction and learning capabilities from small samples for better performance.
  • In order to minimize the occurrence of missed and false detections in complex forest scenes, a joint multi-task NMS processing algorithm is proposed to filter out redundant and poor-quality prediction boxes.
  • A data augmentation approach involving a diagonal swap of random origin is proposed to increase the number of small targets and improve the detection performance for small flame targets, particularly incomplete small flame targets at the edge of the viewpoint.
The rest of this paper is organized as follows: in Section 2, we introduce our data set, the MTL-FFDet model and optimizations in detail; in Section 3, we provide the experimental results and visualization analysis. The discussion and conclusions are presented in Section 4 and Section 5.

2. Materials and Methods

2.1. Data set and Annotation

The preparation of the data set is an essential part of implementing the algorithm in this paper. We collected images from several public forest fire data sets, including VisiFire [29], ForestryImages [30], FiSmo [31], BowFire [32], Firesense [33] and EFD-Dataset [34], to form our data set. This self-built forest fire data set contains day fires, night fires, aerial views, fixed-camera shots, mountain fires, surface fires, trunk fires, canopy fires, etc., as well as natural forest images with disturbances. The diversity of the data set enables the algorithm to generalize better in complex forest environments. Our data set contains a total of 6595 images, of which 3987 are images with forest fires and 2608 are non-fire images of forest scenes. We randomly divided the entire data set into a training set and a validation set in an 8:2 ratio, for the training and testing processes, respectively. Some representative samples are shown in Figure 1.
Our detection model is based on the multi-task learning scheme proposed in this paper, which implements three tasks, namely object detection, semantic segmentation and image classification. Each image in our data set was therefore given three types of annotations. The image classification labels were assigned during collection, while the annotation for the other two tasks was slightly more complex and required some care. For object detection, we framed the flame region with a rectangular box using the LabelImg tool [35]; it is worth noting that the four boundaries of the annotation had to fit the flame target with an error of no more than two pixels. For segmentation, which is a pixel-level classification, we outlined the flame target with polygon annotations using the Labelme tool [36]. Annotation examples are shown in Figure 2.

2.2. MTL-FFDet

Multi-task learning (MTL) seeks to improve generalization and feature extraction by drawing on the domain-specific information found in related tasks; this approach is in contrast to traditional single-task learning, which strives to fulfill a task using a particular model [37]. In addition, MTL brings several advantages. It contains an implicit mechanism for data augmentation that can improve the representation by averaging the noise patterns while learning multiple tasks [38]. Meanwhile, it may be challenging for a model to separate useful from irrelevant variables if a task is very noisy, has little data, or is high-dimensional [39]. MTL can aid the model by concentrating on key traits, since the other tasks provide additional evidence for the relevance or irrelevance of features. Additionally, inductive bias is introduced as a regularization term [40,41,42]. Both the risk of overfitting and the complexity of the model decrease when several related tasks share complementary information and serve as regularizers for one another. Furthermore, by sharing layers across multiple tasks, computation duplication is minimized, inference speed is increased and memory utilization is decreased.
In this paper, our MTL-FFDet model consists of three tasks: forest fire object detection, forest fire semantic segmentation and image classification. The detection task is the primary task, while the other two are secondary tasks. These three tasks share the convolutional neural network-based backbone for better extraction of fire features, while the detection and segmentation tasks share the multi-scale fusion network for better feature expression ability. The shared layers boost both the learning capacity for a small number of samples as well as the feature extraction capabilities for forest fires, leading to greater performance and generalization. The architecture of our MTL-FFDet is shown in Figure 3, and the design details of the backbone, neck and head will be specified below.

2.2.1. Backbone

The backbone is used as a shared layer for our model to extract features for later modules, and serves as the encoder of our MTL-FFDet model. In our backbone design, the cross-stage-partial (CSP) bottleneck [43] with three convolutions (C3) is used due to its excellent performance in YOLOv5. Additionally, inspired by the lightweight GhostNet architecture [44], we replace the Conv module with the GhostConv module, which uses fewer parameters at the same precision. In our tests, the overall parameter count dropped from 8.9 M to 4.8 M, a reduction of nearly one-half, while maintaining the same accuracy. Figure 4 illustrates the backbone architecture. The input image size was set to 416 × 416 pixels so that the model could achieve a better trade-off between inference latency and accuracy. In the first layer of the backbone, the Conv module converts information from the width and height dimensions to the channel dimension, which expands the receptive field without losing features from the raw image. After that, the network is deepened by stacking GhostConv and C3Ghost modules to extract more exact features. The final spatial pyramid pooling-fast (SPPF) structure [45] addresses the problem of different input sizes producing different output dimensions, and reduces the risk of overfitting.
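For illustration, the GhostConv idea can be sketched in a few lines of PyTorch: an ordinary convolution produces half of the output channels, and a cheap depthwise convolution generates the remaining "ghost" channels from that output. The channel widths and the SiLU activation below are illustrative assumptions, not the exact MTL-FFDet implementation.

```python
import torch
import torch.nn as nn

class ConvBNAct(nn.Module):
    """Convolution + batch normalization + SiLU activation."""
    def __init__(self, c_in, c_out, k=1, s=1, g=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class GhostConv(nn.Module):
    """Ghost convolution: half of the output channels come from a normal
    convolution, the other half from a cheap depthwise convolution on them."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_hidden = c_out // 2
        self.primary = ConvBNAct(c_in, c_hidden, k, s)
        self.cheap = ConvBNAct(c_hidden, c_hidden, 5, 1, g=c_hidden)  # depthwise "ghost" branch

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

# quick shape check on a 416 x 416 input
x = torch.randn(1, 3, 416, 416)
print(GhostConv(3, 64, k=3, s=2)(x).shape)  # torch.Size([1, 64, 208, 208])
```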

2.2.2. Neck

The neck fuses the feature maps, which have 8×, 16×, and 32× down-sampling rates from the backbone. Our neck network references the design ideas of the feature pyramid network (FPN) [46] and the path aggregation network (PAN) [47], which have a top-down fusion and a bottom-up fusion, respectively, as shown in Figure 5. The fusion can combine semantic features and spatial features in a forest scene, in order to obtain a better feature expression effect.
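As a rough sketch of this top-down/bottom-up fusion pattern (assuming channel widths of 128, 256 and 512 for the three backbone scales, and omitting the C3Ghost blocks used in the real neck), the flow of features might look as follows; up-sampling uses bilinear interpolation, as in Figure 5.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPNPANNeck(nn.Module):
    """Top-down (FPN) then bottom-up (PAN) fusion of three feature scales."""
    def __init__(self, channels=(128, 256, 512), width=128):
        super().__init__()
        self.reduce = nn.ModuleList([nn.Conv2d(c, width, 1) for c in channels])
        self.fuse_td = nn.ModuleList([nn.Conv2d(2 * width, width, 3, padding=1) for _ in range(2)])
        self.down = nn.ModuleList([nn.Conv2d(width, width, 3, stride=2, padding=1) for _ in range(2)])
        self.fuse_bu = nn.ModuleList([nn.Conv2d(2 * width, width, 3, padding=1) for _ in range(2)])

    def forward(self, p3, p4, p5):
        # p3, p4, p5: backbone feature maps at 8x, 16x and 32x down-sampling
        p3, p4, p5 = [r(x) for r, x in zip(self.reduce, (p3, p4, p5))]
        up = lambda t: F.interpolate(t, scale_factor=2, mode="bilinear", align_corners=False)
        # top-down path: propagate semantic features to higher resolutions
        t4 = self.fuse_td[0](torch.cat([up(p5), p4], 1))
        t3 = self.fuse_td[1](torch.cat([up(t4), p3], 1))
        # bottom-up path: propagate spatial details back to lower resolutions
        b4 = self.fuse_bu[0](torch.cat([self.down[0](t3), t4], 1))
        b5 = self.fuse_bu[1](torch.cat([self.down[1](b4), p5], 1))
        return t3, b4, b5  # fused maps at 8x, 16x and 32x

neck = FPNPANNeck()
outs = neck(torch.randn(1, 128, 52, 52), torch.randn(1, 256, 26, 26), torch.randn(1, 512, 13, 13))
print([tuple(o.shape) for o in outs])  # [(1, 128, 52, 52), (1, 128, 26, 26), (1, 128, 13, 13)]
```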

2.2.3. Head

The head serves as a decoder for our MTL-FFDet to solve different tasks. Our model has three tasks, so we needed to design a specific head for each task, as shown in Figure 6.
The segmentation head uses the fused feature map from the neck and then up-samples it to the same size as the original (416 × 416). The number of output channels represents the categories, which determine where the pixels belong. In our segmentation task, background and flame are the two categories.
For the detection head, we adopted an anchor-based scheme for multi-scale detection, which is similar to the YOLO series [22,48]. The first two of the output dimensions represent the grid size in the scale, while the third dimension represents the number of anchors. The last dimension represents the number of predictions, including the coordinates of the prediction bounding box, the confidence score and the categories of the detected object.
The classification head is used to do binary classification on the input image, in order to distinguish whether the image is a forest fire image or not.
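The three decoder heads can be sketched roughly as below. The input channel widths, anchor count and class count are placeholder assumptions, and the detection head shows only one scale of the multi-scale anchor-based scheme.

```python
import torch
import torch.nn as nn

class SegHead(nn.Module):
    """Predicts per-pixel logits for the two categories (background, flame)
    and up-samples them to the original 416 x 416 resolution."""
    def __init__(self, c_in=128, n_classes=2):
        super().__init__()
        self.conv = nn.Conv2d(c_in, n_classes, 1)

    def forward(self, x):
        return nn.functional.interpolate(self.conv(x), size=(416, 416),
                                         mode="bilinear", align_corners=False)

class DetHead(nn.Module):
    """Single-scale anchor-based detection head: for every grid cell and anchor,
    predicts box coordinates (4), an objectness score (1) and class scores."""
    def __init__(self, c_in=128, n_anchors=3, n_classes=1):
        super().__init__()
        self.n_anchors, self.n_pred = n_anchors, 5 + n_classes
        self.conv = nn.Conv2d(c_in, n_anchors * self.n_pred, 1)

    def forward(self, x):
        b, _, h, w = x.shape
        out = self.conv(x).view(b, self.n_anchors, self.n_pred, h, w)
        return out.permute(0, 3, 4, 1, 2)  # (batch, grid, grid, anchors, predictions)

class ClsHead(nn.Module):
    """Global average pooling followed by a linear layer for the binary
    fire / non-fire image classification."""
    def __init__(self, c_in=512, n_classes=2):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(c_in, n_classes)

    def forward(self, x):
        return self.fc(self.pool(x).flatten(1))

print(DetHead()(torch.randn(1, 128, 52, 52)).shape)  # torch.Size([1, 52, 52, 3, 6])
```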

2.2.4. Loss Function

The design of the loss function is crucial for the training of deep neural networks. Since there are three tasks in our model, the multi-task loss contains three parts: the detection loss $\mathcal{L}_{det}$, the segmentation loss $\mathcal{L}_{seg}$ and the classification loss $\mathcal{L}_{cls}$.
Considering that the detection task not only needs to identify the fire, but also needs to frame the fire region with a rectangle, the detection loss $\mathcal{L}_{det}$ is composed of the object classification loss $\mathcal{L}_{dcls}$, the object confidence loss $\mathcal{L}_{dobj}$ and the bounding box loss $\mathcal{L}_{dbox}$, and is a weighted sum of these three, as shown in Equation (1):

$\mathcal{L}_{det} = \alpha_1 \mathcal{L}_{dcls} + \alpha_2 \mathcal{L}_{dobj} + \alpha_3 \mathcal{L}_{dbox}$    (1)

where $\mathcal{L}_{dcls}$ and $\mathcal{L}_{dobj}$ use the weighted binary cross-entropy (W-BCE) [49] loss to cope with the sample imbalance problem. $\mathcal{L}_{dbox}$ uses the complete intersection over union (CIoU) [50] loss, which takes the overlapping rate, the distance and the aspect ratio between the predictions and the ground truth into overall consideration.
As for the segmentation loss $\mathcal{L}_{seg}$, it has two terms. Firstly, we want each pixel point to be classified correctly. Secondly, we expect fewer false positives (FP) and false negatives (FN). Thus, its expression can be written as Equation (2):

$\mathcal{L}_{seg} = \beta_1 \mathcal{L}_{scls} + \beta_2 \mathcal{L}_{siou}$    (2)

where $\mathcal{L}_{scls}$ uses the cross-entropy (CE) [51] loss, and $\mathcal{L}_{siou} = 1 - \frac{TP}{FP + TP + FN}$.
Similarly, we use the CE loss as the classification loss $\mathcal{L}_{cls}$. Thus, our total loss is a weighted sum of the three task losses, as shown in Equation (3):

$\mathcal{L}_{MTL} = \lambda_1 \mathcal{L}_{det} + \lambda_2 \mathcal{L}_{seg} + \lambda_3 \mathcal{L}_{cls}$    (3)

where $\alpha_1, \alpha_2, \alpha_3, \beta_1, \beta_2, \lambda_1, \lambda_2, \lambda_3$ in the preceding three equations are adjustable parameters used to balance each loss.
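A minimal sketch of how the weighted combination of Equations (1)-(3) can be assembled is given below; the individual loss terms are assumed to have been computed elsewhere (e.g., with the W-BCE, CIoU and CE losses described above), and the default weights follow the values reported later in Section 3.1.

```python
import torch

def multi_task_loss(det_losses, seg_losses, cls_loss,
                    alphas=(0.5, 1.0, 0.05), betas=(0.5, 0.4), lambdas=(1.0, 0.1, 0.2)):
    """Weighted multi-task loss of Equations (1)-(3).

    det_losses: (L_dcls, L_dobj, L_dbox) scalar tensors
    seg_losses: (L_scls, L_siou) scalar tensors
    cls_loss:   scalar tensor
    """
    l_det = sum(a * l for a, l in zip(alphas, det_losses))                    # Equation (1)
    l_seg = sum(b * l for b, l in zip(betas, seg_losses))                     # Equation (2)
    l_mtl = lambdas[0] * l_det + lambdas[1] * l_seg + lambdas[2] * cls_loss   # Equation (3)
    return l_mtl, l_det, l_seg

# toy usage with placeholder loss values
det = tuple(torch.tensor(v) for v in (0.8, 0.5, 0.3))
seg = tuple(torch.tensor(v) for v in (0.4, 0.6))
total, _, _ = multi_task_loss(det, seg, torch.tensor(0.2))
print(float(total))
```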

2.3. Joint Multi-Task NMS

The complex forest environment causes the characteristics of forest fires to change. For example, irregularities in the shape of the flame target due to leaf and branch shading can render previously learned shape features inapplicable. Another complicating factor is that overexposure or underexposure in strong or low light causes losses of texture and color characteristics in forest fire images. These external changes inevitably present the model with many situations that lead to missed and false detections. Considering that our model is composed of multiple tasks, we combined their respective characteristics in order to obtain better performance and reduce false and missed detections. Although the segmentation and detection tasks share the feature extraction and feature fusion modules, the tasks are different and their utilization of features is different. The semantic segmentation task is pixel-based and focuses more on fine-grained category differentiation, while the object detection task is regression-box-based and focuses more on regional differentiation. As shown in Figure 7, the false and missed detections that occur in the detection task are well identified in the segmentation task, while the missed detections that occur in the segmentation task are well detected in the detection task.
Facing this problem, we proposed an improved non-maximum suppression method, the joint multi-task non-maximum suppression (JM-NMS) algorithm, in order to filter redundant and low-confidence boxes in conjunction with the segmentation task.
First, for each generated prediction box, we calculate the pixel ratio $\mathcal{R}_{obj}$ of the object within the box using Equation (4):

$\mathcal{R}_{obj} = \frac{n_{obj}}{N_t}$    (4)

where $N_t$ represents the number of all pixels in the prediction box, and $n_{obj}$ represents the number of pixels belonging to the flame object.
Second, we map the pixel ratio $\mathcal{R}_{obj}$ onto a nonlinear space. We use a sigmoid function, as shown in Equation (5), as our mapping function:

$\mathcal{F}(x) = \frac{1}{1 + e^{-ax + b}}$    (5)

where $a$ and $b$ are the scaling and bias factors, respectively. In our experiments, the effect was best when $a = 6$ and $b = 1$. The function graph is shown in Figure 8. We can see that the majority of the high-quality prediction boxes are maintained when $\mathcal{R}_{obj}$ is larger than 0.5, since their coefficients are large and differ little from one another. However, when $\mathcal{R}_{obj}$ is less than 0.5, the coefficients are lower and differ more, making it easier to censor prediction boxes of poor quality. Additionally, the coefficient does not reach zero when $\mathcal{R}_{obj}$ is close to 0, preventing normal boxes from being incorrectly censored as a result of poor segmentation results.
Based on this, we improved the original NMS [52] algorithm by replacing the confidence-based ranking with a mixed-score ranking. The mixed score $\mathcal{P}_{obj}$ is calculated by Equation (6), and our JM-NMS procedure is shown in Algorithm 1:

$\mathcal{P}_{obj} = \mathcal{F}(\mathcal{R}_{obj}) \times S_{obj}$    (6)

where $S_{obj}$ represents the confidence score of the prediction box.
Algorithm 1 Joint Multi-task Non-Maximum Suppression (JM-NMS)
Input: B = {b_1, ..., b_N}, S = {s_1, ..., s_N}, M = {m_1, ..., m_N}, T_nms
    B is the list of initial detection boxes
    S contains the corresponding detection scores
    M contains the corresponding segmentation areas within the detection boxes
Begin:
    R ← Cal(B, M)
    C ← S × F(R)
    D ← {}
    While B ≠ empty do
        i ← argmax(C)
        b ← b_i
        D ← D ∪ {b};  B ← B − {b}
        For b_j in B do
            If CIoU(b, b_j) ≥ T_nms then
                B ← B − {b_j};  C ← C − {c_j};  S ← S − {s_j}
            End
        End
    End
    Return D, C, S
End
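A possible Python rendering of Algorithm 1 is sketched below. The box format, the pixel-ratio computation from the segmentation mask and the overlap measure are simplified (plain IoU stands in for CIoU), so this is an illustration of the procedure rather than the reference implementation.

```python
import numpy as np

def sigmoid_map(r, a=6.0, b=1.0):
    """Equation (5): maps the pixel ratio onto a nonlinear space."""
    return 1.0 / (1.0 + np.exp(-a * r + b))

def pixel_ratio(box, seg_mask):
    """Equation (4): fraction of pixels inside the box that the segmentation
    branch labels as flame; box = (x1, y1, x2, y2) in pixel coordinates."""
    x1, y1, x2, y2 = map(int, box)
    region = seg_mask[y1:y2, x1:x2]
    return float(region.mean()) if region.size else 0.0

def iou(box, boxes):
    """Plain IoU between one box and an (N, 4) array of boxes
    (a simplified stand-in for the CIoU used in the paper)."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def jm_nms(boxes, scores, seg_mask, t_nms=0.45):
    """Joint multi-task NMS: rank boxes by the mixed score of Equation (6) and
    suppress heavily overlapping boxes, following Algorithm 1."""
    boxes, scores = np.asarray(boxes, float), np.asarray(scores, float)
    ratios = np.array([pixel_ratio(b, seg_mask) for b in boxes])
    mixed = sigmoid_map(ratios) * scores               # P_obj = F(R_obj) * S_obj
    keep_boxes, keep_scores = [], []
    idx = np.arange(len(boxes))
    while idx.size:
        i = idx[np.argmax(mixed[idx])]                  # best remaining mixed score
        keep_boxes.append(boxes[i]); keep_scores.append(scores[i])
        idx = idx[idx != i]
        idx = idx[iou(boxes[i], boxes[idx]) < t_nms]    # suppress overlapping boxes
    return np.array(keep_boxes), np.array(keep_scores)

# toy example: two overlapping candidate boxes and a segmentation mask
mask = np.zeros((416, 416)); mask[100:200, 100:200] = 1
kept, kept_scores = jm_nms([[100, 100, 200, 200], [110, 110, 210, 210]], [0.70, 0.72], mask)
print(kept, kept_scores)
```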

2.4. Data Augmentation

Data augmentation is a common means of improving model performance in deep learning training. In the training of our model, HSV enhancement, left-right flipping, top-down flipping, random scale transformation and Mosaic [53] data augmentation were used to improve model robustness, effectively increase the training data, and avoid overfitting and sample imbalance.
In our study, we found that current algorithms are less effective in recognizing small flame targets, including those formed by occlusion and those appearing at the edges of the image due to viewpoint limitations. This situation is caused by two factors: firstly, the number of pixels that can represent small target features is low; and secondly, the number of small targets in the forest fire data set is low. Thus, we proposed a new data augmentation approach, a diagonal swap of random origin, in order to enhance the identification of small targets in forest fire detection from the perspective of the data set. The diagram of this approach is shown in Figure 9.
The approach is comparatively simple but practical. Initially, we determine whether the image has only one target label, and whether that target is a medium-sized object with an area between $32^2$ and $96^2$ pixels. The next step is to generate a random origin in the area between 25% and 75% of the label width and height (the green area in Figure 9); the image is then divided into four quadrants based on this origin. The two diagonal pairs of quadrants are then swapped to form a new image. This approach not only increases the number of small targets, but also increases the number of scenarios with incomplete flame targets caused by viewpoint limitations, which improves the generalization of the model. Note that this data augmentation is applied randomly, and its probability varies depending on the data set; it should not be too high, generally between 10% and 20%.
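A minimal sketch of the diagonal swap for an image array is shown below. It assumes the single medium-sized target has already been verified and that its bounding box is given; remapping the detection and segmentation labels to the new layout is omitted for brevity.

```python
import numpy as np

def diagonal_swap(image, box, rng=None):
    """Diagonal swap of random origin.

    `box` = (x1, y1, x2, y2) is the single medium-sized flame target. A random
    origin is drawn inside the central 25-75% region of that box, the image is
    cut into four quadrants at the origin, and the two diagonal pairs of
    quadrants are swapped, so one complete flame becomes several small,
    incomplete targets at the borders of the new image.
    """
    rng = rng or np.random.default_rng()
    x1, y1, x2, y2 = box
    ox = int(x1 + rng.uniform(0.25, 0.75) * (x2 - x1))   # origin column
    oy = int(y1 + rng.uniform(0.25, 0.75) * (y2 - y1))   # origin row
    tl, tr = image[:oy, :ox], image[:oy, ox:]
    bl, br = image[oy:, :ox], image[oy:, ox:]
    top = np.concatenate([br, bl], axis=1)        # diagonal quadrants trade places
    bottom = np.concatenate([tr, tl], axis=1)
    return np.concatenate([top, bottom], axis=0)  # same result as np.roll by (h - oy, w - ox)

img = np.zeros((416, 416, 3), dtype=np.uint8)
img[180:260, 180:260] = 255                            # one medium-sized "target" (80 x 80 px)
print(diagonal_swap(img, (180, 180, 260, 260)).shape)  # (416, 416, 3)
```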

3. Results

3.1. Training

Our forest fire detection model, MTL-FFDet, was built with PyTorch (v1.7.1, Facebook AI Research, New York, NY, USA) and trained on an NVIDIA RTX 3080 (NVIDIA Corporation, Santa Clara, CA, USA) with 10 GB of memory. In training, we used warmup and cosine learning rate strategies, which help to mitigate early overfitting, keep the distribution smooth and maintain the stability of a model with deep layers. There are two ways of training multiple tasks: one is to backpropagate the loss of each task step by step; the other is to backpropagate the total loss, which is a weighted sum of the individual task losses. The two approaches perform about equally, but the second is more convenient. The coefficients $\alpha_1, \alpha_2, \alpha_3, \beta_1, \beta_2, \lambda_1, \lambda_2, \lambda_3$ in the total loss (see Section 2.2.4) were set to 0.5, 1, 0.05, 0.5, 0.4, 1, 0.1 and 0.2, respectively. The other training parameters are listed in Table 1.
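For illustration, a warmup-plus-cosine schedule under the hyper-parameters of Table 1 (LR0 = 0.01, LRF = 0.1, 200 epochs) might look like the sketch below; the warmup length of three epochs is an assumption, not a value reported in the paper.

```python
import math

def lr_at_epoch(epoch, epochs=200, lr0=0.01, lrf=0.1, warmup_epochs=3):
    """Linear warmup followed by cosine decay from lr0 down to lr0 * lrf."""
    if epoch < warmup_epochs:
        return lr0 * (epoch + 1) / warmup_epochs                  # linear warmup
    progress = (epoch - warmup_epochs) / max(1, epochs - warmup_epochs)
    cosine = 0.5 * (1 + math.cos(math.pi * progress))             # decays from 1 to 0
    return lr0 * (lrf + (1 - lrf) * cosine)                       # lr0 -> lr0 * lrf

for e in (0, 3, 100, 199):
    print(e, round(lr_at_epoch(e), 5))
```

In practice, such a per-epoch factor can be attached to an SGD optimizer through torch.optim.lr_scheduler.LambdaLR, with the function returning the multiplier of the base learning rate.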

3.2. Comparisons

In order to demonstrate the performance of our MTL-FFDet model, we used the validation set to evaluate each task. In our experiments, our model was not only compared with other models, but its performance was also compared between the single-task and multi-task settings (see Table 2 and Figure 10). Since the detection task is the principal task of our model while the other two act as auxiliaries, improvement in the detection task was considered more important than in the other two tasks. It is worth noting that when training our model for a single task, the other two task branches were frozen.
The Microsoft COCO criteria [54] are widely used to evaluate object detection tasks. The criteria use two main metrics, average precision (AP) and average recall (AR), whose formulas are given in Equations (7)-(10):

$P = \frac{TP}{TP + FP}$    (7)

$R = \frac{TP}{TP + FN}$    (8)

$AP = \sum_{i=1}^{n-1} (R_{i+1} - R_i) P(R_{i+1})$    (9)

$AR = 2 \int_{0.5}^{1} R(o)\,do$    (10)

where $TP$, $FP$ and $FN$ represent the numbers of true positives, false positives and false negatives, respectively. $P$ and $R$ represent precision and recall, respectively. The variable $n$ represents the number of recall levels (the COCO criteria use 11 levels, ranging from 0.0 to 1.0 in intervals of 0.1), and $o$ represents the IoU between the prediction box and the ground-truth box.
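As a toy illustration of Equation (9), the summation over recall levels can be evaluated from a list of (recall, precision) points sorted by increasing recall; this is only a sketch, not the full COCO evaluation pipeline (which also averages over IoU thresholds and object sizes).

```python
def average_precision(recalls, precisions):
    """Equation (9): AP = sum over recall levels of (R_{i+1} - R_i) * P(R_{i+1}),
    with an implicit starting recall of 0."""
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap

# toy precision-recall points
print(average_precision([0.2, 0.4, 0.6, 0.8, 1.0], [1.0, 0.9, 0.75, 0.6, 0.5]))  # 0.75
```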
As shown in Table 2, our MTL-FFDet model using the data augmentation and JM-NMS proposed in this paper is significantly better than the other methods, including YOLOv5-s, YOLOv3-tiny and NanoDet-g, reaching 56.3% mAP. Compared with single-task learning, the shared feature extraction module for multi-task learning yielded excellent performance, with an improvement of 3.6%. After applying the data augmentation strategy of the diagonal swap of random origin, the detection of small targets greatly improved, by 4.8% in AP_S and 4.0% in AR_S (compared with YOLOv5-s). In addition, the joint multi-task NMS processing algorithm allowed some missed and false targets to be correctly identified, improving most metrics by roughly 1% to 2%.
For the other two tasks, we used three metrics, Acc, IoU and mIoU [55], to evaluate the segmentation task, and the P, R and F1-score [56] metrics for the classification task. As seen in Table 3, although these two tasks are not the main task, their metrics still show a slight improvement compared with other models. The evaluation metrics are calculated by Equations (11)-(14):

$Acc = \frac{TP + TN}{N_p}$    (11)

$IoU = \frac{TP}{TP + FP + FN}$    (12)

$mIoU = \frac{1}{k+1} \sum_{i=0}^{k} \frac{TP}{TP + FP + FN}$    (13)

$F1\text{-}score = \frac{2 P \times R}{P + R}$    (14)

where $N_p$ represents the number of all pixels in the test image, and $k$ represents the number of classes.
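The segmentation metrics of Equations (11)-(13) and the F1-score of Equation (14) can be computed from simple confusion counts, roughly as follows.

```python
import numpy as np

def segmentation_metrics(pred, gt, num_classes=2):
    """Pixel accuracy, per-class IoU and mIoU (Equations (11)-(13)) from two
    integer label maps of the same shape."""
    acc = float((pred == gt).mean())
    ious = []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (gt == c))
        fp = np.sum((pred == c) & (gt != c))
        fn = np.sum((pred != c) & (gt == c))
        ious.append(float(tp / (tp + fp + fn + 1e-9)))
    return acc, ious, float(np.mean(ious))

def f1_score(precision, recall):
    """Equation (14)."""
    return 2 * precision * recall / (precision + recall + 1e-9)

pred = np.array([[0, 1], [1, 1]]); gt = np.array([[0, 1], [0, 1]])
print(segmentation_metrics(pred, gt))   # (0.75, [0.5, 0.667], 0.583)
print(round(f1_score(0.972, 0.981), 3))
```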
Our multi-task model (Figure 10a,e) showed advantages over the other models in both the detection task and the segmentation task, especially in the recognition of small fire targets (IMG1, IMG3 and IMG4). In addition, thanks to the joint multi-task NMS processing algorithm, our multi-task model reduced the missed and false detections, such as the firefighter in IMG2 and the red hat in IMG3. According to the segmentation results (Figure 10e-g), the fire target profile is more precise and fire-like targets are effectively distinguished.

3.3. Visualization and Analysis

In order to investigate why our multi-task model, MTL-FFDet, performs better than the single-task model, we used Eigen-CAM [57] for visualization. Considering that the biggest difference between the multi-task and single-task settings is the shared feature extraction module, the final layer of our feature extraction module, the eighth layer (see Section 2.2.1), was analyzed for feature visualization. As shown in Figure 11, the multi-task model focused more precisely on the flame region, while the single-task model was limited to extracting a rough flame region and contained some redundant features. This difference in the accuracy of flame feature extraction carries over to the subsequent feature fusion and detection modules of the two models.
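Eigen-CAM requires no gradients or class labels: it projects the activations of the chosen layer onto their first principal component. A from-scratch sketch for a single feature map is given below; it illustrates the idea rather than reproducing the exact visualization code used for Figure 11.

```python
import torch

def eigen_cam(feature_map):
    """Eigen-CAM for a single feature map of shape (C, H, W): project the
    activations onto their first principal component (via SVD) and return an
    (H, W) saliency map normalized to [0, 1]."""
    c, h, w = feature_map.shape
    flat = feature_map.reshape(c, h * w).T                # (H*W, C) activation matrix
    _, _, vh = torch.linalg.svd(flat, full_matrices=False)
    cam = flat @ vh[0]                                    # projection onto the first right singular vector
    cam = torch.relu(cam).reshape(h, w)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-9)

# e.g., the output of the last shared backbone layer for one image
saliency = eigen_cam(torch.randn(256, 13, 13))
print(saliency.shape)   # torch.Size([13, 13])
```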

4. Discussion

Forest fire detection is more challenging to carry out compared with other types of vision inspection tasks (e.g., face detection, defect detection, lane line detection, etc.). The irregular shape of the flame target varies from moment to moment, and the interference of many features brought about by the complex forest environment makes the detection task difficult. A delayed or even missed detection may turn into a large-scale fire, causing devastating losses. Therefore, the use of computer vision technology instead of manual inspection is an advantageous and feasible solution, and improving computer vision technology’s detection accuracy is one of the key objectives.
In ideal circumstances, color gamut-based studies [8,9] are indeed straightforward and efficient for the visual detection of forest fires, as the backgrounds in forests mostly contrast with flame targets. However, many misidentifications are brought on by changes in seasons, lighting and in the environment, which make forest fire detection systems malfunction. Consequently, numerous researchers have added shape [58], texture [59] and spatio-temporal [60] features in order to reduce the occurrence of missed and false detections, as well as improve the reliability of the forest fire detection task. Similarly, from the perspective of reducing missed and false detections, Xu et al. [26] used an ensemble of three deep-learning models, which achieved higher accuracy. However, this ensemble learning-based model is too large to deploy on edge computing devices for real-time detection. Real-time and high accuracy detection models enable early identification and early warning, which are critical in controlling the development and spread of fires in the field of forest fire safety.
In order to improve the detection accuracy in this paper, we began by improving feature extraction. A novel multi-task learning-based forest fire detection model was proposed in this paper, which was built using hard parameter sharing [41], i.e., using the same feature extraction layer. The aim is to extract more accurate feature information through the joint learning of multiple tasks (the detection task, the segmentation task and the classification task), especially when the sample size is not particularly large. Through a series of experiments (Table 2 and Table 3), our model achieved better results than other common models in each task, with improvements shown in most metrics. With its lightweight design, the shared feature extraction module will reduce memory usage and enable efficient detection in real-time. In addition, the parametric visualization analysis (Figure 11) of our model in multi-tasking versus single-tasking also showed that multi-task learning in the backbone of feature extraction can better and more accurately focus on the flame region.
Furthermore, we proposed two strategies to further improve flame detection accuracy. Considering that our model produces outputs for three tasks, the joint multi-task NMS processing algorithm was proposed to make full use of these tasks and consequently reduce the occurrence of missed and false detections. In the experimental comparison, mAP improved by 2.5%, while the other metrics achieved increments ranging from 0.5% to 2%. The other implemented strategy was the diagonal swap of random origin for data augmentation. The detection of small targets has always been a tricky problem in vision-based detection, and this also holds for small forest fire targets, which often appear obscured by trees or are captured as incomplete targets at image edges due to view limitations. The limited pixel representation of small flame targets, and the few small targets contained in forest fire data sets, cause poor performance in extracting and learning small flame features. Considering the objective fact that divided flame targets in images are still flame targets, our proposed data augmentation strategy compensates well for the small number of small flame targets and improves the AP_S and AR_S metrics by 5.4% and 3.1% (compared with the original model), respectively.
However, there are still some limitations in our research. Firstly, small target detection is difficult to address fundamentally, because of the small amount of pixel information available to characterize such targets; our method only improves detection quality from the perspective of data balancing. Secondly, smoke is a relatively common feature of early fires, and our model is not adapted for smoke. Finally, in complex forest environments, occlusion by vegetation can render our model ineffective, whereas traditional sensors such as temperature and humidity sensors, infrared cameras, etc., are not affected. Therefore, multi-sensor fusion is one of the main directions for future study.

5. Conclusions

In this study, we proposed a novel forest fire detection model, MTL-FFDet, and two improvement strategies based on it, namely the joint multi-task NMS processing algorithm, and the diagonal swap of random origin data augmentation. The main contributions are as follows: (1) the feature extraction module shared by the three tasks makes the network more sensitive and attentive to flame features, improving the performance of forest fire detection; (2) the joint multi-task NMS processing algorithm takes advantage of the differences between multiple tasks, and combines the advantages of each to reduce the occurrence of missed and false detections; (3) the number of small targets is enriched by the diagonal swap of random origin data augmentation, which greatly improves the detection accuracy of our model for small targets. Experiments show that our model substantially outperforms other models in most of the metrics. Our model also achieves an excellent trade-off between performance and efficiency thanks to the shared backbone and the use of lightweight convolutional modules which are suitable for real-time detection tasks deployed on UAVs for forest fire inspection.
For further study, we intend to concentrate our research efforts on multi-sensor fusion, in order to improve the detection of forest fires, and deploy our system on low-power and high-performance edge devices.

Author Contributions

Conceptualization, K.L.; data curation, J.H., J.L. and J.Z.; funding acquisition, Y.L.; methodology, K.L.; resources, J.H., X.C. and J.L.; software, K.L. and J.H.; validation, Y.L.; visualization, K.L. and J.H.; writing—original draft, K.L.; writing—review and editing, Y.L. and X.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China (grant number 2017YFD0600904).

Data Availability Statement

The data presented in this study are openly available in VisiFire [29], ForestryImages [30], FiSmo [31], BowFire [32], Firesense [33] and EFD-Dataset [34].

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Abid, F. A survey of machine learning algorithms based forest fires prediction and detection systems. Fire Technol. 2021, 57, 559–590. [Google Scholar] [CrossRef]
  2. Barmpoutis, P.; Papaioannou, P.; Dimitropoulos, K.; Grammalidis, N. A review on early forest fire detection systems using optical remote sensing. Sensors 2020, 20, 6442. [Google Scholar] [CrossRef] [PubMed]
  3. Yang, X.; Tang, L.; Wang, H.; He, X. Early detection of forest fire based on unmaned aerial vehicle platform. In Proceedings of the 2019 IEEE International Conference on Signal, Information and Data Processing (ICSIDP), Chongqing, China, 11–13 December 2019; pp. 1–4. [Google Scholar]
  4. Yu, L.; Wang, N.; Meng, X. Real-time forest fire detection with wireless sensor networks. In Proceedings of the 2005 International Conference on Wireless Communications, Networking and Mobile Computing, Nagasaki, Japan, 6–9 December 2005; Springer: Berlin/Heidelberg, Germany, 2005; pp. 1214–1217. [Google Scholar]
  5. Lloret, J.; Garcia, M.; Bri, D.; Sendra, S. A wireless sensor network deployment for rural and forest fire detection and verification. Sensors 2009, 9, 8722–8747. [Google Scholar] [CrossRef] [PubMed]
  6. Bouabdellah, K.; Noureddine, H.; Larbi, S. Using wireless sensor networks for reliable forest fires detection. Procedia Comput. Sci. 2013, 19, 794–801. [Google Scholar] [CrossRef]
  7. Mahmoud, M.A.; Ren, H. Forest fire detection using a rule-based image processing algorithm and temporal variation. Math. Probl. Eng. 2018, 2018, 7612487. [Google Scholar] [CrossRef]
  8. Cruz, H.; Eckert, M.; Meneses, J.; Martínez, J.-F. Efficient forest fire detection index for application in unmanned aerial systems (UASs). Sensors 2016, 16, 893. [Google Scholar] [CrossRef]
  9. Premal, C.E.; Vinsley, S. Image processing based forest fire detection using YCbCr colour model. In Proceedings of the 2014 International Conference on Circuits, Power and Computing Technologies (ICCPCT-2014), Thuckalay, India, 20–21 March 2014; pp. 1229–1237. [Google Scholar]
  10. Hua, L.; Shao, G. The progress of operational forest fire monitoring with infrared remote sensing. J. For. Res. 2017, 28, 215–229. [Google Scholar] [CrossRef]
  11. Arrue, B.C.; Ollero, A.; De Dios, J.M. An intelligent system for false alarm reduction in infrared forest-fire detection. IEEE Intell. Syst. Appl. 2000, 15, 64–73. [Google Scholar] [CrossRef]
  12. Yuan, C.; Liu, Z.; Zhang, Y. Fire detection using infrared images for UAV-based forest fire surveillance. In Proceedings of the 2017 International Conference on Unmanned Aircraft Systems (ICUAS), Miami, FL, USA, 13–16 June 2017; pp. 567–572. [Google Scholar]
  13. Shaik, R.U.; Laneve, G.; Fusilli, L. An automatic procedure for forest fire fuel mapping using hyperspectral (PRISMA) imagery: A semi-supervised classification approach. Remote Sens. 2022, 14, 1264. [Google Scholar] [CrossRef]
  14. Qadir, A.; Talukdar, N.R.; Uddin, M.M.; Ahmad, F.; Goparaju, L. Predicting forest fire using multispectral satellite measurements in Nepal. Remote Sens. Appl. Soc. Environ. 2021, 23, 100539. [Google Scholar] [CrossRef]
  15. Barmpoutis, P.; Stathaki, T.; Dimitropoulos, K.; Grammalidis, N. Early fire detection based on aerial 360-degree sensors, deep convolution neural networks and exploitation of fire dynamic textures. Remote Sens. 2020, 12, 3177. [Google Scholar] [CrossRef]
  16. Lu, K.; Xu, R.; Li, J.; Lv, Y.; Lin, H.; Liu, Y. A Vision-Based Detection and Spatial Localization Scheme for Forest Fire Inspection from UAV. Forests 2022, 13, 383. [Google Scholar] [CrossRef]
  17. Sherstjuk, V.; Zharikova, M.; Dorovskaja, I. 3d fire front reconstruction in uav-based forest-fire monitoring system. In Proceedings of the 2020 IEEE Third International Conference on Data Stream Mining & Processing (DSMP), Lviv, Ukraine, 21–25 August 2020; pp. 243–248. [Google Scholar]
  18. Pan, H.; Badawi, D.; Zhang, X.; Cetin, A.E. Additive neural network for forest fire detection. Signal Image Video Process. 2020, 14, 675–682. [Google Scholar] [CrossRef]
  19. Faraone, J.; Kumm, M.; Hardieck, M.; Zipf, P.; Liu, X.; Boland, D.; Leong, P.H. Addnet: Deep neural networks using fpga-optimized multipliers. IEEE Trans. Very Large Scale Integr. Syst. 2019, 28, 115–128. [Google Scholar] [CrossRef]
  20. Zhang, Q.; Xu, J.; Xu, L.; Guo, H. Deep convolutional neural networks for forest fire detection. In Proceedings of the 2016 International Forum on Management, Education and Information Technology Application, Guangzhou, China, 30–31 January 2016; pp. 568–575. [Google Scholar]
  21. Jiao, Z.; Zhang, Y.; Xin, J.; Mu, L.; Yi, Y.; Liu, H.; Liu, D. A deep learning based forest fire detection approach using UAV and YOLOv3. In Proceedings of the 2019 1st International Conference on Industrial Artificial Intelligence (IAI), Shenyang, China, 22–26 July 2019; pp. 1–5. [Google Scholar]
  22. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  23. Wu, S.; Zhang, L. Using popular object detection methods for real time forest fire detection. In Proceedings of the 2018 11th International Symposium on Computational Intelligence and Design (ISCID), Hangzhou, China, 8–9 December 2018; pp. 280–284. [Google Scholar]
  24. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 21–37. [Google Scholar]
  25. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99. [Google Scholar] [CrossRef] [PubMed]
  26. Xu, R.; Lin, H.; Lu, K.; Cao, L.; Liu, Y. A Forest Fire Detection System Based on Ensemble Learning. Forests 2021, 12, 217. [Google Scholar] [CrossRef]
  27. Zhang, J.; Zhu, H.; Wang, P.; Ling, X. ATT squeeze U-Net: A lightweight network for forest fire detection and recognition. IEEE Access 2021, 9, 10858–10870. [Google Scholar] [CrossRef]
  28. Song, K.; Choi, H.-S.; Kang, M. Squeezed fire binary segmentation model using convolutional neural network for outdoor images on embedded device. Mach. Vis. Appl. 2021, 32, 120. [Google Scholar] [CrossRef]
  29. Kong, S.G.; Jin, D.; Li, S.; Kim, H. Fast fire flame detection in surveillance video using logistic regression and temporal smoothing. Fire Saf. J. 2016, 79, 37–43. [Google Scholar] [CrossRef]
  30. Douce, G.K.; Moorhead, D.J.; Bargeron, C.T., IV. Forestry Images. org: High resolution image archive and web-available image system. J. For. Sci. 2001, 47, 77–79. [Google Scholar]
  31. Cazzolato, M.T.; Avalhais, L.; Chino, D.; Ramos, J.S.; de Souza, J.A.; Rodrigues, J.F., Jr.; Traina, A. Fismo: A compilation of datasets from emergency situations for fire and smoke analysis. In Proceedings of the Brazilian Symposium on Databases-SBBD, Uberlandia, Brazil, 4–7 October 2017; pp. 213–223. [Google Scholar]
  32. Chino, D.Y.; Avalhais, L.P.; Rodrigues, J.F.; Traina, A.J. Bowfire: Detection of fire in still images by integrating pixel color and texture analysis. In Proceedings of the 2015 28th SIBGRAPI Conference on Graphics, Patterns and Images, Salvador, Brazil, 26–29 August 2015; pp. 95–102. [Google Scholar]
  33. Kucuk, G.; Kosucu, B.; Yavas, A.; Baydere, S. FireSense: Forest Fire Prediction and Detection System using Wireless Sensor Networks. In Proceedings of the the 4th IEEE/ACM International Conference on Distributed Computing in Sensor Systems (DCOSS’08), Santorini, Greece, 11–14 June 2008. [Google Scholar]
  34. Li, S.; Yan, Q.; Liu, P. An efficient fire detection method based on multiscale feature extraction, implicit deep supervision and channel attention mechanism. IEEE Trans. Image Process. 2020, 29, 8467–8475. [Google Scholar] [CrossRef] [PubMed]
  35. LabelImg. Available online: https://github.com/heartexlabs/labelImg (accessed on 22 August 2022).
  36. Labelme. Available online: https://github.com/wkentaro/labelme (accessed on 22 August 2022).
  37. Ma, J.; Zhao, Z.; Yi, X.; Chen, J.; Hong, L.; Chi, E.H. Modeling task relationships in multi-task learning with multi-gate mixture-of-experts. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 16 November 2018; pp. 1930–1939. [Google Scholar]
  38. Caruana, R. Multitask learning. Mach. Learn. 1997, 28, 41–75. [Google Scholar] [CrossRef]
  39. Multi-Task Learning: Theory, Algorithms, and Applications. Available online: https://104.239.175.136/meetings/sdm12/zhou_chen_ye.pdf (accessed on 22 August 2022).
  40. Vandenhende, S.; Georgoulis, S.; Van Gansbeke, W.; Proesmans, M.; Dai, D.; Van Gool, L. Multi-task learning for dense prediction tasks: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3614–3633. [Google Scholar] [CrossRef] [PubMed]
  41. Liu, X.; He, P.; Chen, W.; Gao, J. Multi-task deep neural networks for natural language understanding. arXiv 2019, arXiv:1901.11504. [Google Scholar]
  42. Ruder, S. An overview of multi-task learning in deep neural networks. arXiv 2017, arXiv:1706.05098. [Google Scholar]
  43. Sun, K.; Zhang, Y.-J.; Tong, S.-Y.; Wang, C.-B. Study on Rice Grain Mildewed Region Recognition Based on Microscopic Computer Vision and YOLO-v5 Model. 8 June 2022, PREPRINT (Version 1). Available online: https://www.researchsquare.com/article/rs-1716276/v1 (accessed on 22 August 2022). [CrossRef]
  44. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. Ghostnet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589. [Google Scholar]
  45. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef]
  46. Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
  47. Wang, K.; Liew, J.H.; Zou, Y.; Zhou, D.; Feng, J. Panet: Few-shot image semantic segmentation with prototype alignment. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019; pp. 9197–9206. [Google Scholar]
  48. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  49. Ruby, U.; Yendapalli, V. Binary cross entropy with deep learning technique for image classification. Int. J. Adv. Trends Comput. Sci. Eng. 2020, 9, 5393–5397. [Google Scholar]
  50. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI conference on artificial intelligence, New York, NY, USA, 7–12 February 2020; pp. 12993–13000. [Google Scholar]
  51. Martinez, M.; Stiefelhagen, R. Taming the cross entropy loss. In Proceedings of the German Conference on Pattern Recognition, Stuttgart, Germany, 9–12 October 2018; pp. 628–637. [Google Scholar]
  52. Bodla, N.; Singh, B.; Chellappa, R.; Davis, L.S. Soft-NMS—Improving object detection with one line of code. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5561–5569. [Google Scholar]
  53. Niu, J.; Chen, Y.; Yu, X.; Li, Z.; Gao, H. Data augmentation on defect detection of sanitary ceramics. In Proceedings of the IECON 2020 the 46th Annual Conference of the IEEE Industrial Electronics Society, Singapore, 18–21 October 2020; pp. 5317–5322. [Google Scholar]
  54. Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 740–755. [Google Scholar]
  55. Ghorbanzadeh, O.; Blaschke, T. Optimizing Sample Patches Selection of CNN to Improve the mIOU on Landslide Detection. In Proceedings of the Geographical Information Systems Theory, Applications and Management, Heraklion, Greece, 3–5 May 2019; pp. 33–40. [Google Scholar]
  56. Lipton, Z.C.; Elkan, C.; Narayanaswamy, B. Thresholding classifiers to maximize F1 score. arXiv 2014, arXiv:1402.1892. [Google Scholar]
  57. Bany Muhammad, M.; Yeasin, M. Eigen-CAM: Visual explanations for deep convolutional neural networks. SN Comput. Sci. 2021, 2, 47. [Google Scholar] [CrossRef]
  58. Ryu, J.; Kwak, D. Flame Detection Using Appearance-Based Pre-Processing and Convolutional Neural Network. Appl. Sci. 2021, 11, 5138. [Google Scholar] [CrossRef]
  59. Emmy Prema, C.; Vinsley, S.; Suresh, S. Efficient flame detection based on static and dynamic texture analysis in forest fire detection. Fire Technol. 2018, 54, 255–288. [Google Scholar] [CrossRef]
  60. Chen, J.; Mu, X.; Song, Y.; Yu, M.; Zhang, B. Flame recognition in video images with color and dynamic features of flames. J. Auton. Intell. 2019, 2, 30–45. [Google Scholar] [CrossRef] [Green Version]
Figure 1. Samples of the data set. (a) Forest fire images; (b) non-fire images in the forest scene.
Figure 2. An example of data annotation, where x and y denote the center point coordinates of the rectangular box, and w and h denote the width and height of the rectangular box, respectively.
Figure 3. The architecture of our MTL-FFDet. It contains three tasks, which are the detection task, the segmentation task, and the classification task.
Figure 4. The architecture of the backbone in our MTL-FFDet. The parameter ‘k’ denotes the kernel size, the parameter ‘s’ denotes the stride and the parameter ‘g’ denotes the group size.
Figure 5. The architecture of the neck in our MTL-FFDet. Its input comes from three different scales of the backbone. The up-sampling layer uses the bilinear interpolation method.
Figure 6. The architecture of the three heads in our MTL-FFDet. The inputs of the segmentation head and the detection head come from the corresponding output in the neck, while the input of the classification head comes from the backbone.
Figure 7. Samples of false and missed detections in the detection task and the segmentation task. The blue and green rectangles represent the missed detection and the false detection in the detection task, respectively, and the yellow rectangle represents missed detection in the segmentation task. (a) has one missed detection and one false detection; (b) has one missed detection; (c) has two missed detections in the segmentation task.
Figure 8. The graph of $\mathcal{F}(x)$ when $a = 6$ and $b = 1$.
Figure 9. Diagram of the diagonal swap of random origin strategy.
Figure 10. Prediction results for the segmentation task and the detection task. (a) Detection outputs of our MTL-FFDet (multi-task); (b) detection outputs of our MTL-FFDet (det only); (c) detection outputs of YOLOv5-s; (d) detection outputs of YOLOv3-tiny; (e) segmentation outputs of our MTL-FFDet (multi-task); (f) segmentation outputs of FCN-16; (g) segmentation outputs of SegNet.
Figure 11. Eigen-CAM visualization results on multi-task and single-task detection. (a) Raw image; (b) multi-task; (c) single-task.
Table 1. Training parameters of our model.

Batch Size | Epoch | Optimizer | Momentum | LR0 | LRF | WD
32 | 200 | SGD | 0.937 | 0.01 | 0.1 | 0.0005
Table 2. Comparison and ablation experiments for the detection tasks using our data set.

Model | mAP | AP_50 | AP_S | AP_M | AP_L | mAR | AR_S | AR_M | AR_L | Params | Speed
MTL-FFDet(Det only) * | 52.7 | 79.5 | 32.7 | 41.7 | 61.6 | 64.2 | 43.1 | 54.3 | 63.8 | 3.02 M | 20 ms
MTL-FFDet(multi-task) | 53.8 | 80.4 | 32.8 | 42.2 | 62.1 | 64.9 | 42.9 | 54.6 | 65.2 | 4.85 M | 23 ms
+ diagonal swap | 53.7 | 80.2 | 38.2 | 42.4 | 62.4 | 64.6 | 46.0 | 54.2 | 66.0 | - | -
+ JM-NMS | 56.3 | 80.3 | 37.8 | 44.8 | 64.2 | 65.5 | 46.2 | 54.4 | 67.1 | - | -
YOLOv3-tiny | 46.8 | 75.7 | 27.7 | 34.9 | 54.8 | 55.7 | 37.7 | 45.0 | 60.9 | 8.86 M | 18 ms
YOLOv5-s | 53.1 | 80.1 | 33.4 | 43.7 | 61.3 | 64.4 | 42.2 | 55.8 | 64.4 | 7.23 M | 25 ms
NanoDet-g | 49.3 | 78.6 | 29.2 | 39.8 | 56.7 | 58.2 | 40.6 | 48.5 | 55.0 | 3.81 M | 17 ms

The subscript 50 denotes the condition IoU = 0.50, and the subscripts S, M and L denote small objects (area < $32^2$), medium objects ($32^2$ < area < $96^2$) and large objects (area > $96^2$), respectively. Note that mAP, AP_50, AP_S, AP_M, AP_L, mAR, AR_S, AR_M and AR_L are all percentages, and that the speed was tested on an NVIDIA RTX 3080. The bolded numbers indicate the best performance in the comparison. + Ablation experiments added on the basis of MTL-FFDet(multi-task). * MTL-FFDet with the detection (Det) task only.
Table 3. Comparison results between different models in the segmentation task and classification task.

Model | Segmentation (Acc / IoU / mIoU) | Classification (P / R / F1-Score)
MTL-FFDet(Seg only) * | 98.3 / 74.9 / 86.4 | -
MTL-FFDet(multi-task) | 97.9 / 75.5 / 86.5 | -
FCN-16 | 97.8 / 75.3 / 85.9 | -
SegNet | 92.6 / 72.7 / 80.6 | -
MTL-FFDet(Cls only) * | - | 97.8 / 98.2 / 98.0
MTL-FFDet(multi-task) | - | 97.2 / 98.1 / 97.6
ResNet50 | - | 97.4 / 98.5 / 97.9
EfficientNet-B2 | - | 97.5 / 97.4 / 97.4

Note that Acc, IoU, mIoU, P, R and F1-score are all shown as percentages. The bolded numbers indicate the best performance in the comparison. * MTL-FFDet with the segmentation (Seg) task only and the classification (Cls) task only.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
