Article

Time Series Forest Fire Prediction Based on Improved Transformer

1 College of Information Science and Technology, Nanjing Forestry University, Nanjing 210037, China
2 Department of Criminal Investigation, Nanjing Forest Police College, Nanjing 210023, China
3 Institute of Meteorological Sciences of Jilin Province, Changchun 130062, China
4 National Satellite Meteorological Center, Beijing 100081, China
5 School of Design Arts and Media, Nanjing University of Science and Technology, Nanjing 210094, China
* Author to whom correspondence should be addressed.
Forests 2023, 14(8), 1596; https://doi.org/10.3390/f14081596
Submission received: 17 June 2023 / Revised: 11 July 2023 / Accepted: 2 August 2023 / Published: 7 August 2023

Abstract:
Forest fires, severe natural disasters causing substantial damage, necessitate accurate predictive modeling to guide preventative measures effectively. This study introduces an enhanced window-based Transformer time series forecasting model aimed at improving the precision of forest fire predictions. Leveraging time series data from 2020 to 2021 in Chongli, a myriad of forest fire influencing factors were ascertained using remote sensing satellite and GIS technologies, with their interrelationships estimated through a multicollinearity test. Given the intricate nature of real-world forest fire prediction tasks, we propose a novel window-based Transformer architecture complemented by a dual time series input strategy premised on 13 influential factors. Subsequently, time series data were incorporated into the model to generate a forest fire risk prediction map in Chongli District. The model’s effectiveness was then evaluated using various metrics, including accuracy (ACC), root mean square error (RMSE), and mean absolute error (MAE), and compared with traditional deep learning methods. Our model demonstrated superior predictive performance (ACC = 91.56%, RMSE = 0.37, MAE = 0.05), harnessing spatial background information efficiently and effectively utilizing the periodicity of forest fire factors. Consequently, the study proves this method to be a novel and potent approach for time series fire prediction.

1. Introduction

Fire timing prediction refers to the application of computational models, machine learning, and artificial intelligence techniques to predict the occurrence and spread of fires. The history of this field spans several decades, tracing its origins back to traditional statistical models and evolving to sophisticated machine learning algorithms, which have become increasingly prevalent with the advancement of computing power.
The early efforts to predict fire occurrence and timing were primarily based on statistical models and empirical relationships. Researchers sought to identify factors that influenced fire ignition and spread, such as weather variables, topography, and vegetation characteristics. Rothermel (1972) provided a seminal work in this area, developing a mathematical model for predicting the rate of spread of wildfires [1]. His work formed the foundation for many subsequent fire prediction models. Another notable early study was that of Andrews (1986), who developed the BEHAVE fire behavior prediction system, which utilized Rothermel’s equations and additional inputs to estimate fire spread rates and intensities [2]. The BEHAVE system was one of the first operational tools used by fire management agencies to aid in decision-making. The scholarly discourse surrounding forest fire prediction and prevention has recently seen innovative strides. For instance, a study by Yang (2023) puts forth the Preferred Vector Machine method for detecting forest fires, underscoring the potential of deep learning in improving prediction accuracy [3].
The study by Ting Yun et al. demonstrated the important role of deep learning in forest resource management. By combining deep learning techniques and LiDAR data, they achieved efficient and accurate forest parameter inversion [4,5] and forest structure reconstruction [6,7]. The wide application of deep learning techniques allows for better understanding and analysis of forest ecosystems through large amounts of data-driven model learning and optimization. The results of these studies help to enhance the understanding of forest conservation, management and sustainable use, and promote the sustainable development of forest resources. Therefore, the importance of deep learning techniques in the field of forest resource management cannot be ignored.
With advances in data transmission technology, real-time access to remote sensing, meteorological, and other data facilitates forest fire prediction [8,9]. At the same time, the advent of remote sensing technology and Geographic Information Systems (GIS) greatly enhanced the capabilities of fire prediction models [10]. Satellite imagery and remote sensing data allowed researchers to monitor large areas and obtain real-time information on fire occurrences, land cover, and weather conditions. Kogan (1997) was among the first to use remote sensing data to predict fire occurrences, specifically by using Advanced Very High Resolution Radiometer (AVHRR) data to monitor vegetation conditions and estimate fire risk [11]. Keeley et al. (1999) integrated GIS with remote sensing data to create spatially explicit fire risk models [12]. Zhao (2021) showcased the utility of GIS in predicting forest fires, exemplifying how geospatial technology helps create detailed risk maps [10]. These advancements in technology and data availability marked a significant shift in the way fire prediction models were developed and used.
With the advent of artificial intelligence, data-driven approaches were introduced to fire prediction. Viegas, Viegas, and Ferreira (1992) proposed a neural network model for predicting fire propagation. The model was trained using a set of historical fire incidents and demonstrated its potential in predicting future incidents [13]. Around the same period, expert systems were also applied in fire timing prediction. Expert systems such as FARSITE (Finney, 1998) and BEHAVE (Andrews, 1986) were developed to simulate fire spread based on various environmental and geographical factors [2,14]. These expert systems remained popular until the emergence of advanced machine learning models.
With the proliferation of machine learning in the 21st century, ensemble learning techniques started being applied in fire prediction. Cortez and Morais (2007) demonstrated that the combination of several weak learning models can provide accurate and reliable fire predictions [15].
In recent years, the rapid advancements in machine learning and artificial intelligence (AI) have provided new opportunities for improving fire timing prediction algorithms. Researchers have employed various machine learning techniques, such as Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), and Random Forests, to predict fire occurrences and timings with increased accuracy. Chen et al. (2011) used ANNs to predict daily fire occurrence in the Mediterranean region with high accuracy [16]. Rodrigues et al. (2014) employed SVMs to predict fire occurrences in Portugal and found them to be more effective than traditional statistical models [17]. Vakalis et al. (2016) compared the performance of multiple machine learning techniques for predicting fire occurrences and found that Random Forests performed the best [18]. Moreover, deep learning models such as convolutional neural networks (CNN) and recurrent neural networks (RNN) were employed to predict fire incidents based on various factors such as weather conditions, geographical features, and historical fire data (Chen, Li, Zhu, and Goldsborough, 2018) [19].
The Transformer architecture, introduced by Vaswani et al. (2017) as part of the “Attention is All You Need” paper, marked a watershed moment in the field of natural language processing and has had profound implications for machine learning, including time series prediction [20]. The Transformer model deviates from traditional sequence prediction models that rely on recurrent architectures, such as Long Short-Term Memory (LSTM) or Gated Recurrent Units (GRU). These recurrent models process data sequentially, which limits their ability to parallelize computations and consequently affects scalability. The Transformer model, on the other hand, uses a mechanism known as the “attention mechanism”, which captures dependencies between all input elements regardless of their distance from each other. This architecture allows for parallelized computations, making it more efficient.
In recent years, the Transformer model has achieved great success in fields such as natural language processing and computer vision. At the same time, the model can effectively capture the long-distance dependencies in a sequence through the self-attention mechanism and provide better modeling capabilities. Therefore, Transformers are widely used in the field of time series prediction. For example, Bai, Kolter, and Koltun (2018) developed the Temporal Transformer Network (TTN), which applied the self-attention mechanism to time series forecasting [21]. Guo, Yang, and Lu (2020) developed a Transformer encoder-decoder for multivariate time series forecasting, which proved superior to traditional LSTM and GRU models [22].
As for the application of Transformer-based models in fire timing prediction, research is still in its nascent stage. However, the inherent potential of Transformer-based models in capturing complex temporal dependencies, together with their scalability and efficiency, makes them ideal candidates for fire timing prediction, where the data often comprise long-term dependencies and are subject to sudden changes. In this paper, forest fire time series prediction is based on the Transformer, and the algorithm is improved to obtain better prediction results.
In addition to the algorithm, this paper gives thorough consideration to the selection and processing of the data. Although previous methods have proven effective in predicting forest fires, they rely heavily on the raw data and cannot effectively extract features to improve classification accuracy when faced with larger sample datasets. Forest fires have temporal and spatial patterns. In space, different factors affecting wildfires lead to different probabilities of wildfire occurrence; in time, the factors affecting wildfires over several consecutive days also affect the possibility of wildfires. Furthermore, multivariate temporal forecasting poses a major challenge: how to efficiently capture and exploit the correlations among multiple variables.
To naturally align with local temporal dependencies common in time series data, and to handle longer input sequences, this paper introduces a windowed attention mechanism to improve performance and computational efficiency. The main idea behind it is to restrict the self-attention operation to a fixed-sized window centered on each element. The original Transformer model proposed by Vaswani et al. (2017) computes attention weights between each pair of elements in the input sequence [20]. While this allows the model to capture long-range dependencies, it also incurs quadratic computational and memory complexity relative to sequence length, which can be prohibitive for very long sequences.
For Transformer, an input sequence is first passed through an encoder, which produces a sequence of context-aware encodings. These codes are then fed into the decoder, which makes predictions based on the codes and its own previous output, reflecting the autoregressive nature of time series forecasting. Furthermore, an important aspect of the Transformer model is the incorporation of positional encoding. Since the self-attention mechanism does not have any inherent notion of the position or order of the input data points, a positional encoding is added to the input embedding at the bottom of the encoder and decoder stack to encode the order of the time steps.
Consequently, the model produces a series of predicted data points. The model can effectively predict forest fires by using the local dependence of forest fire factors and the periodicity of long-term changes in some data. On this basis, the window attention mechanism is added for simulation to simulate the local time dependence of forest fires in the time series, so as to improve the effect of long-sequence time series forecasting (LSTF).
To be more specific, in this paper, we first select 16 factors related to forest fires and connect them with historical fire point data according to time stamp, latitude, and longitude to establish the original data sample library; at the same time, we preprocess the dataset and use a multicollinearity test and the information gain ratio to analyze all influencing factors and perform feature screening, reducing the impact of sample imbalance. We then design the Transformer model structure, optimize the model by introducing the window attention mechanism, and adjust its parameters to train the model classifier. Finally, a forest fire prediction model based on the Window-based Transformer was developed, and a forest fire sensitivity map of the study area was drawn. Several metrics, including Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percent Error (MAPE), and Accuracy (ACC), were used to compare the performance of our models with other machine learning methods.
In this study, a novel forest fire prediction model called the Window-based Transformer, combined with the filtered data, is proposed. The primary objectives of this paper are to (1) select and process various factors related to meteorology, topography, and human activities as input features for the model; (2) evaluate the performance of the proposed method on a case study of Chongli District in China, where forest fires are frequent and pose a serious threat to the environment and human safety; (3) compare the proposed method with several traditional deep learning methods, such as LSTM, RNN, and SVM, and demonstrate its superior performance; and (4) draw a forest fire prediction risk map based on the predicted data, which can provide guidance for fire prevention and management. Our study is based on the concept of LSTF, combined with a Window-based Transformer, which effectively performs automatic processing of long time series data for fire prediction in the Chongli district.

2. Materials and Study Area

2.1. Overview of the Study Area

The research area is located in Chongli District, Zhangjiakou City, Hebei Province, in the transition zone between the Inner Mongolia Plateau and the North China Plain (114°17′–115°34′ E, 40°47′–41°17′ N), with a total area of 2334 km²; 80% of the area is mountainous, with a forest coverage rate of 52.38%, as shown in Figure 1. The landform is a transitional mountainous area above and below the dam, mostly oriented northeast–southwest and east–west, with steep mountains at an altitude of 813–2174 m and a maximum elevation difference of 1361 m. It belongs to the temperate subarid zone of the continental monsoon climate of East Asia. The temperature rises rapidly but fluctuates greatly, the frost period is late, rainfall is relatively low, and strong winds occur on many days. The surface water system is fed mainly by precipitation. Due to the influence of geographical location and terrain, fires show strong seasonality, concentrated mainly from mid-February to mid-May.
The climate in Chongli District belongs to the temperate subarid zone of the East Asia continental monsoon climate. The average temperature is 19 °C in summer and −12 °C in winter. In addition, the average wind speed in this area is only level 2, the snow falls early, and the snow accumulation period is long. The average annual precipitation is 483.3 mm, the total precipitation is 1.13 billion cubic meters, and the annual runoff is 100.69 million cubic meters.
Chongli has experienced several historical fires in the past decades, mainly caused by human activities, lightning strikes, and droughts. According to historical records, Chongli had 14 major forest fires from 2000 to 2017, with a total burned area of 1156.4 ha. The study also found that the fire occurrence was influenced by factors such as elevation, slope, aspect, vegetation type, road density, population density, and meteorological conditions. Chongli has a rich and diverse vegetation cover, and the forest coverage rate reached 67% in 2020, mainly consisting of forest, grassland, shrubland, and cropland. The forest vegetation is dominated by coniferous species, such as Pinus tabuliformis and Picea meyeri, and broad-leaved species, such as Betula platyphylla and Quercus mongolica. The grassland vegetation is mainly composed of Stipa grandis and Leymus chinensis. The cropland vegetation mainly grows wheat, corn, and potato. Chongli is also an important area for socioeconomic activities in Hebei Province. It has a population of about 130,000 people and had a GDP of about CNY 6 billion in 2019. As one of the main venues for the 2022 Beijing Winter Olympics, Chongli is famous for its winter sports industry and tourism resources. Chongli also has rich cultural heritage and natural scenery. Therefore, it is necessary to predict the occurrence of forest fires in this area.

2.2. Data Source

2.2.1. Generations of Datasets

The occurrence of forest fires is caused by many factors and is always affected by the natural environment and human factors [23]. Therefore, it is very important to quickly and accurately obtain information on the geography and environment of an area [24] and to find out the spatiotemporal laws of fire occurrence. Additionally, reasonably determining the influencing factors of forest fire prediction is very important for improving the accuracy of the model.
The four types of data sources selected in this paper to study the Chongli area are terrain data, meteorological data, vegetation data, and data related to human activities. The selection of these sources involved satellite imagery information and the current record of fire information from previous years. Geographical factors and vegetation factors are obtained from remote sensing data. Weather information includes information about temperature, wind, rainfall, and other variables. Human activity data include population data, distance to roads, railways, and rivers.
In ArcGIS 10.4, we used remote sensing data (Landsat and MODIS) to obtain geographic and vegetation data, such as slope and vegetation cover. We then extracted and merged these remote sensing data with human activity and meteorological data to create a dataset containing 16 impact factors. After that, rasterization was performed using ArcGIS 10.4, and each feature was converted to a raster format with the same pixel size (1 km × 1 km), data type (8-bit unsigned integer), and coordinate system (GCS_WGS_1984), yielding a total of 2125 rasters. Each feature obtained from the raster feature inversion was converted into point elements, the pixel value of each raster was obtained, and the latitude and longitude were calculated using GCS_WGS_1984 coordinates, producing a dataset indexed by the combination of time stamp and latitude and longitude, with 16 influencing factors and labeled by the occurrence of fire.
In the forest fire problem, the dataset often suffers from sample imbalance, with far more no-fire samples than fire samples. Oversampling has been shown to be a robust solution to this problem [25]. In addition, considering the spatial aggregation characteristics of fires, according to the first law of geography proposed by Waldo Tobler (the correlation between features is related to distance; in general, the closer the distance, the greater the correlation between features, and the further the distance, the greater the dissimilarity), any raster near a fire point is more likely to be a fire point [26]. Therefore, this study used a combination of buffer analysis and oversampling to increase the number of fire-class samples and to reduce the effect of sample imbalance to some extent. First, 20% of the 176,732 samples (35,346 samples) were set aside as the test set; the remaining samples were oversampled by creating a buffer based on a square boundary, resampling the fire label of rasters within the 5 km buffer to 1 (fire) and rasters outside the buffer to 0 (no fire).
The workflow for data acquisition and processing is shown in Figure 2, and each process will be specified in the subsequent sections.

2.2.2. Factors Influencing Forest Fires

In this study, the data from 2020–2021 were chosen as the historical fire dataset; during this period, Chongli District experienced 22 fires, 15 of which occurred in spring, accounting for 68% of the total, with the majority taking place in the eastern part of the study area. Based on previous studies, and considering various influencing factors, such as meteorology and geography [27], 16 forest fire influencing factors were finally identified, as shown in Table 1.
The daily values of temperature, wind speed, rainfall, specific humidity, and atmospheric pressure are selected as meteorological influencing factors. According to the data provided by the national comprehensive meteorological information sharing platform CIMISS, temperature, specific humidity, and relative humidity (RH) have the following relations:
$$ RH = \frac{(TMP - 273.5) \times PRS \,/\, \left[\,0.378 \times (TMP - 273.5) + 0.622\,\right]}{6.112\, e^{\,17.67\, SHU / (SHU + 243.5)}}. $$
In the formula, $TMP$ represents temperature, $SHU$ represents specific humidity, and $PRS$ represents atmospheric pressure. Because there is a mapping relationship between temperature, specific humidity, and relative humidity, specific humidity is selected to represent the water vapor content in the air in order to prevent feature collinearity.
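Transcribed directly into code, the relation reads as follows; this is our literal rendering of the formula as printed above (including the 273.5 offset), with function and variable names, and assumed units, as our own illustrative choices rather than details from the paper:

```python
import math

def relative_humidity(tmp: float, shu: float, prs: float) -> float:
    """Direct transcription of the RH relation above.

    tmp: temperature (TMP, kelvin), shu: specific humidity (SHU),
    prs: atmospheric pressure (PRS, hPa); the units are assumptions.
    """
    t = tmp - 273.5                                    # offset as printed above
    vapour = t * prs / (0.378 * t + 0.622)             # numerator term
    saturation = 6.112 * math.exp(17.67 * shu / (shu + 243.5))
    return vapour / saturation
```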
Meteorological influencing factors have dense and evenly spaced data, and the inverse distance weighted (IDW) method is used to fill in the missing values of the five factors. In forest fire assessment and prediction, terrain characteristics are also considered an important influence, with slope and aspect being the most important features. Vegetation distribution has an important impact on forest fires. The normalized vegetation index (NDVI), plant moisture content (divided into canopy vegetation moisture content and surface litter moisture content), combustibility, and vegetation coverage were selected as five features. NDVI can reflect the health status of vegetation and the basic distribution of vegetation coverage, and partially eliminate the influence of radiation changes related to atmospheric conditions, such as solar altitude angle, satellite observation angle, terrain, and cloud shadow. It quantifies vegetation by calculating the difference between the near-infrared band reflectance (NIR) and the red band reflectance (Red); the specific calculation formula is as follows:
$$ NDVI = \frac{NIR - Red}{NIR + Red}. $$
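Per pixel, this is a one-line computation; in the sketch below (ours, not the authors' code), a small epsilon guards against zero denominators over no-data pixels:

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """NDVI = (NIR - Red) / (NIR + Red), computed per pixel."""
    return (nir - red) / (nir + red + 1e-10)  # epsilon avoids division by zero
```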
Slope and aspect (Figure 3a,b) were obtained by inverting the digital elevation model (DEM) derived from Landsat satellite data, and vegetation coverage was obtained through NDVI (Figure 3c); the flammability of plant species was classified according to the Forest and Grassland Fire Risk Assessment Technical Procedure (Figure 3d); the moisture content was obtained by weighing the dry and fresh weights of plants and litter (Figure 3e,f); and the buffer tool in the ArcGIS 10.4 toolbox was applied to calculate the Euclidean distances from each grid to rivers, railways, and highways (Figure 3g–i).

2.2.3. Impact Factor Assessment

Feature selection is particularly important in forest fire prediction [28], as high-dimensional training samples complicate the prediction process and reduce prediction accuracy [29]. In this study, multicollinearity analysis and the information gain ratio (IGR) were used to evaluate forest fire impact factors, with multicollinearity analysis used to estimate the correlation between impact factors. A multicollinearity test was carried out on the forest fire impact factors using tolerance (TOL) and the variance inflation factor (VIF), calculated as follows:
$$ VIF = \frac{1}{1 - R_i^2}, \qquad TOL = 1 - R_i^2. $$
where R i represents the degree of linear correlation between the ith independent variable and all other independent variables. Factors were considered collinear if TOL was less than 0.1 or VIF was greater than 10 (Colkesen et al., 2016) [30]. The test results of each influencing factor are shown in Table 2.
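A compact sketch of this screening (ours, not the authors' code): each factor is regressed on the remaining factors by ordinary least squares to obtain $R_i^2$, from which TOL and VIF follow.

```python
import numpy as np

def multicollinearity_test(X: np.ndarray):
    """Return (TOL, VIF) per column of the factor matrix X (samples x factors)."""
    n, m = X.shape
    results = []
    for i in range(m):
        y = X[:, i]
        others = np.delete(X, i, axis=1)
        A = np.column_stack([others, np.ones(n)])   # regressors plus intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()            # R_i^2 of factor i on the rest
        tol = 1.0 - r2
        results.append((tol, 1.0 / tol))            # (TOL, VIF)
    return results

# A factor is flagged as collinear when TOL < 0.1 or VIF > 10.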
After testing, among the 16 factors related to forest fires, the normalized difference vegetation index, atmospheric pressure, and distance to the railway meet the critical value requirements; they are considered collinear and are screened out to improve the prediction performance of the model. After excluding the collinear features, the average information gain rate of each feature is calculated to obtain the contribution of each factor to the occurrence of forest fires, further screening the features used in the model. The steps to calculate the average gain ratio are as follows:
First, we calculate the entropy of the dataset. For prediction problems, the entropy is calculated based on the distribution of class labels; it is a measure of the uncertainty and disorder of a dataset. ArcGIS 10.4 was used for rasterization, and each feature was converted into a raster format with the same pixel size (1000 m × 1000 m). For each plot A, i.e., one raster pixel, all its time series data constitute a dataset D. The formula for calculating the entropy of $D_i$ is
$$ Ent(D_i) = -\sum_{i=1}^{t} \frac{c_i}{n} \log_2 \frac{c_i}{n}, $$
in which n represents the number of labels over the corresponding plot’s two fire occurrence situations (fire or no fire), and $c_i$ is the fire label at time i for plot A in set D.
Then, for dataset D, there are m = 13 forest fire influencing factors. Divided by these influencing factors, D is split into $D_1, D_2, \ldots, D_m$. Every subset is a time series of duration t, which also indicates the length of each set. For instance, subset $D_i$ consists of the sequence $d_1^i, d_2^i, \ldots, d_t^i$. The formula to calculate the average gain rate of dataset $D_i$ is shown below:
$$ AverageGainRate(D_i) = \frac{1}{p} \sum_{i=1}^{p} \frac{Ent(D) - \sum_{j=1}^{t} \frac{d_j^i}{n}\, Ent(D_i)}{-\sum_{j=1}^{t} \frac{d_j^i}{n} \log_2 \frac{d_j^i}{n}}, $$
where p is the number of rasters we collect in the study.
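The sketch below illustrates one plausible implementation of this entropy and gain-ratio computation for a single raster's time series (our own illustration). The discretization of continuous factor values into equal-width bins is an assumption on our part, since the paper does not state how factor values are binned:

```python
import numpy as np

def entropy(labels: np.ndarray) -> float:
    """Shannon entropy of a label vector (fire / no fire)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def gain_ratio(factor: np.ndarray, labels: np.ndarray, bins: int = 10) -> float:
    """Gain ratio of one influencing factor for one raster's time series."""
    edges = np.histogram_bin_edges(factor, bins=bins)
    binned = np.digitize(factor, edges[1:-1])        # equal-width bins (assumed)
    n = len(labels)
    cond_ent = sum(
        (labels[binned == b].size / n) * entropy(labels[binned == b])
        for b in np.unique(binned)
    )
    gain = entropy(labels) - cond_ent                # information gain
    split_info = entropy(binned)                     # penalizes many-valued splits
    return gain / split_info if split_info > 0 else 0.0

# Average over the p rasters of the study area:
# avg = np.mean([gain_ratio(f, y) for f, y in per_raster_series])
```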
Consequently, the information gain rate of each feature is shown in Figure 4.
As can be seen from Figure 4, the average information gain rate of each of the 13 features is greater than 0, meaning each has an impact on the predictive ability of the model, and all of them are retained as inputs for the subsequent model. Combustibility has the greatest impact on the occurrence of forest fires, with an average information gain rate of 0.133, followed by temperature, wind speed, and rainfall.

2.2.4. Establishment of Datasets

The selected 13 influencing factors, covering meteorology, topography, vegetation, and human activities, were normalized to eliminate dimensional effects. ArcGIS 10.4 was used for rasterization, and each feature was converted into a raster format with the same pixel size (1000 m × 1000 m), data type (8-bit unsigned integer), and coordinate system (GCS_WGS_1984), yielding a total of 2125 rasters. The features obtained from raster feature inversion were converted into point elements, the pixel value of each raster was obtained, and the latitude and longitude were calculated using GCS_WGS_1984 coordinates. The result is a dataset indexed by the combination of time stamp and latitude and longitude, characterized by the 13 influencing factors, and labeled by whether a fire occurred.
In the forest fire problem, the dataset often exhibits sample imbalance, with far more no-fire samples than fire samples. Oversampling has been proven to be a robust solution to this problem [25]. At the same time, considering the spatial aggregation characteristics of fires, according to the first law of geography proposed by Waldo Tobler (1970), the correlation between objects is related to distance: generally speaking, the closer the distance, the greater the correlation between fire points. Therefore, this study uses a combination of buffer analysis and oversampling to increase the number of fire events and reduce the impact of sample imbalance to a certain extent. First, 20% of the 176,732 samples (35,346 samples) were set aside as the test set; the remaining samples were oversampled by establishing a buffer zone based on a square boundary, resampling the grid cells within the 5 km buffer to 1 (fire label) and the pixels outside the buffer to 0 (no-fire label), finally yielding 272,583 samples. The dataset obtained after oversampling uses a sliding-window validation format; splitting according to the time stamp with a step of seven finally produced a training set of 194,703 samples and a validation set of 77,880 samples.
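A minimal sketch of the square-buffer relabeling described above (our own illustration; the paper performs this step in ArcGIS rather than in code). We assume projected coordinates in kilometers, so a square 5 km buffer corresponds to a Chebyshev-distance test:

```python
import numpy as np

def buffer_relabel(coords_km: np.ndarray, fire_coords_km: np.ndarray,
                   radius_km: float = 5.0) -> np.ndarray:
    """Label each raster cell 1 (fire) if it lies within a square buffer of
    side 2 * radius_km around any fire point, else 0 (no fire).

    coords_km: (N, 2) cell centers; fire_coords_km: (F, 2) fire points.
    """
    diff = np.abs(coords_km[:, None, :] - fire_coords_km[None, :, :])
    cheb = diff.max(axis=2)               # square buffer -> Chebyshev distance
    return (cheb.min(axis=1) <= radius_km).astype(np.uint8)

# The chronological 20% test split and the step-7 sliding-window split of the
# remaining samples would then be applied on top of these relabeled samples.
```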

3. Research Method

3.1. Algorithm Model

The window-based Transformer model for time series prediction is a sophisticated framework designed for capturing intricate temporal relationships within sequential data. It is composed of four main sections:
Input Embedding Layer: The section of the model where the input data are initially received and processed.
Encoder: The part of the model where the primary analysis of the input data is conducted. It contains subcomponents, such as multihead self-attention with window attention and position-wise feed-forward network.
Decoder: This part of the model is designed to generate outputs based on the processed inputs from the encoder. It mirrors the structure of the encoder but also includes an additional multihead self-attention subcomponent.
Output Layer: The final part of the model, where the final prediction is made based on the information processed in the earlier sections.
These sections work together to capture both the local and global temporal dependencies present in time series data, thus enabling the model to make highly accurate predictions. The whole structure is shown in Figure 5 below:

3.1.1. Input Embedding

The initial step in the window-based Transformer model’s processing pipeline is the input embedding layer. This layer is designed to convert discrete input tokens into continuous vector representations that can be handled by the model. The influencing factor $d_t^j$ denotes the value of the jth of the 13 influencing factors at time step t for the corresponding plot. Let us denote our two input sequences of tokens as $X_a = (x_1, x_2, \ldots, x_{t-1}, 0)$ and $X_b = (0, x_2, \ldots, x_{t-1}, x_t)$, where $x_i$ corresponds to a d-dimensional vector token at the ith time step and t represents the sequence length. In sequence $X_a$, $x_t$ is set to zero, while in sequence $X_b$, $x_1$ is set to zero. Modifying the time series input in this way creates two distinct views of the same sequence.
The input embedding layer begins by mapping each token x i to a vector e i using an embedding matrix E. This process can be represented by the following equation:
$$ e_i = E[x_i]. $$
This equation represents a lookup operation where the $x_i$th row of the matrix E is selected. Note that E is a learnable parameter in the model.
Subsequent to this embedding process, positional encodings are added to infuse the order of the tokens within the sequence. Given the lack of inherent sequential processing in the Transformer model, these encodings are crucial for recognizing patterns across different positions in the sequence.
These processed vectors $Z = (z_1, z_2, \ldots, z_t)$ constitute the output of the input embedding layer, which is then passed on to the subsequent layers in the Transformer model for further processing.
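To make the dual-view construction and the embedding step concrete, the sketch below gives one plausible realization (ours, not the paper's code). Since the influencing factors are continuous rather than discrete tokens, we assume the embedding matrix E acts as a learned linear projection, and we use the standard sinusoidal positional encoding of Vaswani et al. (2017); the dimensions (13 factors, d_model = 64, t = 30) are illustrative:

```python
import torch
import torch.nn as nn

def dual_views(x: torch.Tensor):
    """Build the two input views from a (t, d) factor sequence:
    X_a zeroes the last time step, X_b zeroes the first."""
    xa, xb = x.clone(), x.clone()
    xa[-1] = 0.0
    xb[0] = 0.0
    return xa, xb

def sinusoidal_pe(t: int, d_model: int) -> torch.Tensor:
    """Standard sinusoidal positional encoding (Vaswani et al., 2017)."""
    pos = torch.arange(t).unsqueeze(1).float()
    div = torch.exp(torch.arange(0, d_model, 2).float()
                    * (-torch.log(torch.tensor(10000.0)) / d_model))
    pe = torch.zeros(t, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

embed = nn.Linear(13, 64)                       # assumed linear "embedding matrix E"
xa, xb = dual_views(torch.randn(30, 13))        # two views of one factor sequence
z = embed(xa) + sinusoidal_pe(30, 64)           # embedding + positional encoding
```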

3.1.2. Encoder

The encoder of the Transformer model with window attention is composed of a stack of identical layers. Each layer consists of two sublayers: a self-attention mechanism with window attention modification and a position-wise feed-forward network. Then we employ a residual connection around each of the two sublayers, followed by layer normalization.
(1)
Window-Based Self-Attention Mechanism
This mechanism allows the model to weigh the relevance of each token in the sequences $Z_1, Z_2$ with respect to the others. However, in the standard self-attention mechanism, every token attends to every other token, which may not be computationally feasible for large sequences and might not be necessary if long-range dependencies are rare.
The window attention modification addresses this by limiting the attention to a fixed-size window of adjacent positions. It introduces a parameter w that defines the window size within which each position can attend. The window size w restricts the range of interaction, reducing computational complexity and focusing on local dependencies. For position i, the window of attention ranges from $(i - w)$ to $(i + w)$.
The attention score a i j between positions i and j is computed as follows:
$$ a_{ij} = \frac{\exp\!\left(Q_i K_j^{T}/\sqrt{d}\right)}{\sum_{m=\max(i-w,\,1)}^{\min(i+w,\,t)} \exp\!\left(Q_i K_m^{T}/\sqrt{d}\right)}, $$
where $Q_i$ and $K_j$ represent the query and key vectors for positions i and j, respectively, d is the dimensionality of these vectors, and the denominator is a normalization term ensuring the attention scores sum to 1 over the window of attention. These scores are then used to compute a weighted sum of value vectors V:
$$ H_i = \sum_{m=\max(i-w,\,1)}^{\min(i+w,\,t)} a_{im} V_m, $$
where $H_i$ is the output vector for position i.
In the next step, we add a residual connection to the output of the self-attention mechanism. This is performed by adding the original input $z_i$ to the output $H_i$ of the self-attention mechanism for each position i; the result is normalized by layer normalization to yield $O_i^x$:
$$ O_i^{x} = \mathrm{LayerNorm}(z_i + H_i). $$
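A minimal single-head sketch of this windowed attention computation (our own illustration, omitting the learned Q/K/V projections, multihead splitting, and the residual/LayerNorm wrapper described above):

```python
import torch

def window_self_attention(q: torch.Tensor, k: torch.Tensor,
                          v: torch.Tensor, w: int) -> torch.Tensor:
    """Single-head window-based self-attention: position i attends only to
    positions in [i - w, i + w]. q, k, v: (t, d) tensors; returns (t, d)."""
    t, d = q.shape
    scores = q @ k.T / d ** 0.5                       # scaled dot products, (t, t)
    idx = torch.arange(t)
    mask = (idx[None, :] - idx[:, None]).abs() > w    # True outside the window
    scores = scores.masked_fill(mask, float("-inf"))  # forbid out-of-window pairs
    attn = torch.softmax(scores, dim=-1)              # normalizes over the window
    return attn @ v                                   # weighted sum of values

h = window_self_attention(torch.randn(30, 64), torch.randn(30, 64),
                          torch.randn(30, 64), w=7)
```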
(2)
Position-wise Feed-Forward Network
Following layer normalization, each vector $O_i^x$ is funneled through a position-wise feed-forward network (FFN), which essentially comprises two linear transformations separated by a ReLU (rectified linear unit) activation function. The associated formula for this transformation is as follows:
$$ \mathrm{FFN}(x) = \max(0,\, x W_1 + b_1)\, W_2 + b_2. $$
The output of this feed-forward network is generated for every position i directly from the input $O_i^x$. It then goes through a second residual connection and is subsequently normalized, producing the encoder layer output $O^x$, a transformed version of the original sequence, which can then be relayed to the next encoder layer or to the decoder for further processing.
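As a compact sketch (ours): the FFN is applied identically at every position. The model width of 64 and the conventional 4x hidden width are assumptions, not values reported in the paper:

```python
import torch
import torch.nn as nn

# Position-wise FFN: two linear maps with a ReLU in between.
ffn = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 64))
out = ffn(torch.randn(30, 64))   # one transformed 64-d vector per time step
```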

3.1.3. Decoder

The decoder in the Transformer model, which also consists of several identical layers, has a critical role in generating the output sequence based on the encoded input sequence. Each layer in the decoder has three primary components: masked window self-attention, encoder–decoder attention, and a position-wise feed-forward network (FFN).
(1)
Window-Based Masked Self-Attention Mechanism
This module is similar to the window self-attention mechanism in the encoder, but it incorporates a masking operation to ensure that the prediction at any time step can only depend on known outputs at preceding steps, thereby maintaining the autoregressive property.
The token embeddings are transformed into query (Q), key (K), and value (V) vectors, which are computed using learned linear transformations. Next, a masked self-attention mechanism is applied, which allows each token to attend to all positions up to and including its own position within a fixed window of positions. The attention score between positions i and j, $a_{ij}$, is computed as
$$ a_{ij} = \frac{\exp\!\left(Q_i K_j^{T}/\sqrt{d}\right)}{\sum_{m=\max(i-w,\,1)}^{i} \exp\!\left(Q_i K_m^{T}/\sqrt{d}\right)}. $$
The output from the self-attention mechanism for position i, $H_i$, is the weighted sum of the value vectors V:
$$ H_i = \sum_{m=\max(i-w,\,1)}^{i} a_{im} V_m. $$
Assume that $Y = (y_1, y_2, \ldots, y_t)$ is the output of the previous window-based Transformer layer, where each $y_i$ is a d-dimensional vector and t is the sequence length. The output $H_i$ of the self-attention mechanism for each position i is then summed with the original input token embedding $y_i$ (a residual connection), and this sum is normalized using layer normalization:
$$ O_i = \mathrm{LayerNorm}(y_i + H_i). $$
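The only change relative to the encoder's window attention is the mask, which combines the causal (autoregressive) constraint with the window bound; a sketch of the allowed-attention mask (ours), which can be plugged into the same attention routine shown earlier:

```python
import torch

def causal_window_mask(t: int, w: int) -> torch.Tensor:
    """Mask for masked window self-attention: position i may attend to
    positions j with max(i - w, 0) <= j <= i. True means attention allowed."""
    idx = torch.arange(t)
    rel = idx[None, :] - idx[:, None]     # j - i for every (i, j) pair
    return (rel <= 0) & (rel >= -w)       # causal AND within the window
```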
(2)
Encoder–Decoder Attention Mechanism with Window Attention
The output from the layer normalization, $O_i$, is then passed into a second attention mechanism, which performs attention over the encoder’s output, $O^x$. This attention mechanism is also modified to attend within a window of positions in the encoder’s output. The attention scores are calculated as
$$ a_{ij} = \frac{\exp\!\left(O_i K_j^{T}/\sqrt{d}\right)}{\sum_{m=\max(j-w,\,1)}^{\min(j+w,\,t)} \exp\!\left(O_i K_m^{T}/\sqrt{d}\right)}. $$
The output of the encoder–decoder attention for position i, $P_i$, is then the weighted sum of the encoder’s output vectors:
$$ P_i = \sum_{m=\max(j-w,\,1)}^{\min(j+w,\,t)} a_{im}\, O_m^{x}. $$
The output $P_i$ then goes through another residual connection (added with $O_i$) and layer normalization:
$$ O_i^{y} = \mathrm{LayerNorm}(O_i + P_i). $$
(3)
Position-wise Feed-Forward Network
Each normalized output $O_i^y$ undergoes a transformation through a position-wise feed-forward network (FFN). The transformed output from the FFN is then added to its original input, and this sum is subsequently normalized as $\mathrm{Output}_i$, culminating in the final output of the decoder layer.

3.1.4. Output Layer

The final component of the Transformer model for time series prediction is the output layer. The output from the decoder is then passed to the final output layer of the model. The output layer is typically a linear layer followed by a softmax operation, which converts the decoder’s output into a probability distribution over the target vocabulary. The process can be expressed as follows:
Each output vector $\mathrm{Output}_i$ from the decoder is passed through a learned linear transformation, represented by the weight matrix W (dimension $d \times V$) and bias vector b (dimension V), where V is the size of the target vocabulary:
$$ \mathrm{Pred}_i = W\, \mathrm{Output}_i + b. $$
Here, $\mathrm{Pred}_i$ is a V-dimensional vector, where each element corresponds to a word in the target vocabulary.
Then, a softmax operation is applied to convert each P r e d i into a probability distribution over the target vocabulary:
$$ p_i = \mathrm{softmax}(\mathrm{Pred}_i). $$
Here, $p_i$ is also a V-dimensional vector, but now each element is a probability (between 0 and 1, with all elements summing to 1), corresponding to the model’s estimated probability of each word in the target vocabulary being the correct next word in the output sequence.
This completes the process of converting the decoder’s output into a sequence of predicted word probabilities. The word with the highest probability can be chosen as the model’s prediction for each position in the output sequence.

3.2. Performance Evaluation Index

The evaluation index is the key factor in establishing the classifier model and verifying its performance. This study established a regional forest fire classification prediction model, using the overall accuracy (ACC), specificity, sensitivity, positive predictive value (PPV), negative predictive value (NPV), root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) indicators to evaluate the classification ability of the model; the calculation formulas are as follows:
$$ ACC = \frac{TP + TN}{TP + TN + FP + FN}, \quad \mathit{Specificity} = \frac{TN}{TN + FP}, \quad \mathit{Sensitivity} = \frac{TP}{TP + FN}, \quad PPV = \frac{TP}{TP + FP}, \quad NPV = \frac{TN}{TN + FN}, \quad RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}, \quad MAE = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|, \quad MAPE = \frac{100\%}{n}\sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|. $$
In these formulas, true positive (TP) and true negative (TN) represent the number of samples correctly classified as fire and nonfire, respectively; false positive (FP) and false negative (FN), respectively, represent the number of samples misclassified as fire and nonfire; sensitivity is the proportion of correctly classified fire in the total fire; and specificity is the proportion of correctly classified non-fire categories out of the total non-fire categories.
At the same time, the global performance of the prediction model is evaluated and validated using the ROC curve and AUC value, which describe the trade-off between classification with and without fire labels. The ROC curve of a good forest fire prediction classifier tends to rise sharply at the starting point, quickly identifying and learning the characteristics of fire samples, and then stabilizes near the maximum value of 1; the ROC curve of an ordinary classifier lies relatively closer to the diagonal line, with sensitivity close to specificity and a forest fire risk prediction accuracy close to 50%. The AUC value is considered an important indicator for quantitatively evaluating the overall accuracy of classifier performance: the closer the AUC value is to 1, the better the performance of the classifier.
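These metrics are straightforward to compute from a vector of predicted fire probabilities and the true labels; the sketch below (ours) assumes a 0.5 decision threshold and uses scikit-learn's roc_auc_score for the AUC:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate(y_true, y_prob, threshold: float = 0.5):
    """Confusion-matrix and error metrics from the formulas above.
    y_true: 0/1 fire labels; y_prob: predicted fire probabilities."""
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    y_pred = (y_prob >= threshold).astype(int)
    tp = ((y_pred == 1) & (y_true == 1)).sum()
    tn = ((y_pred == 0) & (y_true == 0)).sum()
    fp = ((y_pred == 1) & (y_true == 0)).sum()
    fn = ((y_pred == 0) & (y_true == 1)).sum()
    err = y_prob - y_true
    # MAPE is omitted here: it is undefined when y_i = 0 (no-fire labels).
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "Specificity": tn / (tn + fp),
        "Sensitivity": tp / (tp + fn),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "RMSE": np.sqrt((err ** 2).mean()),
        "MAE": np.abs(err).mean(),
        "AUC": roc_auc_score(y_true, y_prob),
    }
```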

4. Results

4.1. Model Parameters and Accuracy

The encoder is composed of a stack of two identical layers. Each layer has two sublayers: the first is a multihead self-attention mechanism, and the second is a simple, position-wise fully connected feed-forward network. The decoder is also composed of a stack of two identical layers. In addition to the two sublayers in each encoder layer, the decoder inserts a third sublayer, which performs multihead attention over the output of the encoder stack. For the window attention mechanism we introduced, a window size (w) of about 7 proved to give the best prediction performance, meaning that the data from the week before and after a time point are considered, which is more in line with how forest fires occur. The Adam optimization algorithm is used to find the optimal solution of the overall loss function, the batch size is set to 32, the number of epochs is set to 100, the initial learning rate is set to 0.001, and sliding-window validation is performed on the validation set.
The learning rate was adjusted continually to minimize the model error; the loss value is lowest at a learning rate of 0.00075, where the model reaches the ideal accuracy. The cross-entropy loss curves for the training and validation sets are shown in Figure 6. When the number of iterations reaches 100, training has essentially converged, so the number of iterations is set to 100 to improve the efficiency of model training.
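The reported training configuration can be summarized in a short PyTorch sketch; the model here is a stand-in module and the data are random placeholders, so only the hyperparameters (Adam, batch size 32, 100 epochs, initial learning rate 0.001, cross-entropy loss) reflect the text:

```python
import torch
import torch.nn as nn

# Stand-in network so the loop is runnable; not the paper's actual model.
model = nn.Sequential(nn.Linear(13, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # later tuned to 0.00075
criterion = nn.CrossEntropyLoss()

for epoch in range(100):                         # epochs = 100
    for _ in range(4):                           # placeholder mini-batches
        x = torch.randn(32, 13)                  # batch size = 32, 13 factors
        y = torch.randint(0, 2, (32,))           # fire / no-fire labels
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```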

4.2. Forest Fire Susceptibility Mapping

After training, our study transforms the forest fire prediction outcomes generated by the network model into more understandable, discrete risk levels. As the network model outputs probabilities (ranging from 0 to 1), a direct interpretation of forest fire risk levels cannot be readily ascertained. To address this, we devised a classification scheme that would convert these probabilities into five distinct fire risk levels: very low, low, moderate, high, and very high. To initiate this process, we mapped the continuous range of probabilities output by the model to these five categories. Probabilities close to 0 would correspond to very low risk, and as the probability increases, the risk level escalates, with probabilities close to 1 corresponding to very high risk. We determined the precise probability thresholds for each category based on standard practices or our specific requirements.
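A simple way to realize this mapping is fixed-threshold binning; the cut points below are illustrative assumptions on our part, since the paper describes its thresholds only as following standard practices or specific requirements:

```python
import numpy as np

THRESHOLDS = [0.2, 0.4, 0.6, 0.8]                 # assumed, equally spaced cut points
LEVELS = ["very low", "low", "moderate", "high", "very high"]

def risk_level(prob: np.ndarray) -> np.ndarray:
    """Map predicted fire probabilities in [0, 1] to discrete risk levels."""
    return np.asarray(LEVELS)[np.digitize(prob, THRESHOLDS)]

print(risk_level(np.array([0.05, 0.55, 0.93])))   # ['very low' 'moderate' 'very high']
```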
Once we obtained the risk level for each grid cell in the dataset, we proceeded to visualize this information on a map. To provide a continuous and comprehensive view of the risk levels across the entire study area, we utilized an interpolation method within the ArcGIS environment. Interpolation allows us to estimate risk levels for locations where we did not have direct observations, based on the known values from nearby locations. The interpolation procedure generates a smooth surface where each point represents an estimated risk level. This results in a comprehensive, areawide forest fire risk map, offering an easily understandable, visual representation of the areas with varying degrees of risk. The produced map can assist forest fire management authorities to allocate resources more efficiently and develop targeted prevention and control strategies.
We also use traditional machine learning models, such as support vector machines, to predict forest fire susceptibility by feeding them data from the test set. After several tests, we randomly selected eight plots in Chongli District to compare their forest fire risk levels as predicted by the different methods, as shown in Table 3. Additionally, the prediction results of the other methods are drawn on the map with ArcGIS 10.4, as shown in Figure 7.
From the forest fire risk forecast map, it can be seen that the forest fire risk points in Chongli District are mainly distributed in the eastern and southeastern regions. The forest fire susceptibility maps drawn by the Transformer model and the long short-term memory (LSTM) model are relatively uniform in risk level, which is more in line with the actual situation. Most of the areas predicted by the traditional models are in the middle and high grades. In fact, the possibility of wildfires in most areas of Chongli is relatively small; the traditional machine learning models predict a relatively high forest fire risk across the entire Chongli area, indicating overfitting and showing that traditional machine learning models have deficiencies in long-term forest fire risk prediction.

4.3. Performance Evaluation and Comparison

A performance comparison of the four models on the verification set and the test set is shown in Table 4. The accuracy (91.56%), specificity (98.21%), sensitivity (81.34%), positive predictive value (98.02%), root mean square error (0.37), mean absolute error (0.05), and mean absolute percentage error (0.25) of the forest fire risk prediction model based on the Transformer algorithm on the test set are all better than the other traditional machine learning methods.
The ROC curves and AUC values of the four models are shown in Figure 8. The ROC curve of the Transformer algorithm forest fire risk prediction model performed best, with the largest upward trend at the starting point and the fastest rising speed, and then quickly stabilized. On top of the algorithm, its AUC value is also the largest among the four models.
The primary goal of this research was to enhance the accuracy and efficiency of forest fire prediction models, thus aiding in fire prevention efforts. Our results strongly corroborated this hypothesis, yielding an accuracy of 91.56%, having better generalization performance, and demonstrating a considerable improvement over traditional deep learning methods.
A key feature of our model is the window attention mechanism coupled with the dual time series input strategy. This allows the model to effectively capture the local dependence and periodicity of forest fire influencing factors. By capturing local dependence, our model can better understand the immediate influences on a given point’s fire risk. At the same time, by acknowledging the periodicity in the data, it can understand and forecast trends over time. The advancements in this study will likely aid significantly in forest fire prevention and control, ultimately contributing to the preservation of our natural ecosystems.

5. Discussion

5.1. Major Findings

Current methods for forest fire prediction often directly feed remote sensing data related to forestry into deep learning networks for prediction [31]. While this approach has its merits, it often neglects a critical component: sensitivity analysis of the data. The importance of certain features in contributing to the model’s predictive power is largely overlooked, leading to a considerable amount of data redundancy and repetition. This oversight invariably impacts the learning ability of the model, as it may overfit on redundant features, thereby diminishing its capacity to generalize.
Notably, several prevalent deep learning models have been employed for fire prediction, including LSTM, RNN, and SVM. LSTM and RNN, though effective for sequence data, struggle with long-term dependencies due to the problem of vanishing and exploding gradients. Moreover, they process data sequentially, resulting in slower training times. SVM, on the other hand, may underperform when dealing with high-dimensional and large-scale data, often resulting in suboptimal solutions due to its inherent quadratic programming problem [32]. These limitations, coupled with the issue of data redundancy, invariably affect the accuracy of fire predictions. In contrast, the proposed window-based Transformer architecture addresses these issues with its ability to process sequences in parallel and handle longer dependencies effectively.
The window-based transformer architecture, combined with a dual time series input strategy and a screening of remote sensing data based on impact factors, provides an innovative and effective approach to forest fire time series prediction.
First, the window-based attention mechanism uniquely embedded in the transformer model plays a vital role in grasping long-range dependencies in the time series data. Traditional models, such as LSTM and RNN, struggle with such dependencies due to the problem of vanishing and exploding gradients. In contrast, the window-based attention mechanism overcomes this limitation by incorporating a larger context for each position in the sequence. Consequently, this improved processing of dependencies directly contributes to the model’s enhanced predictive power [20].
Moreover, the transformer model’s inherent structure facilitates the parallel processing of sequences, resulting in faster training times. This advantage is especially valuable when dealing with large-scale fire prediction datasets, allowing for the efficient utilization of computational resources and timely predictions, a feature that is critically important in practical applications for early warning and mitigation planning.
Second, the utilization of a dual time series input strategy provides a comprehensive view of the temporal dynamics in the data. Instead of relying on a single time stamp, this approach involves setting two different time stamps to zero, thereby offering a more balanced perspective of the sequence data. This method potentially mitigates biases that could arise from over-reliance on a specific time stamp, consequently enhancing the robustness of the predictions.
Lastly, the preliminary screening of input features based on their significance helps to focus the model’s attention on the most pertinent information. This process reduces data redundancy, thereby preventing overfitting and improving the learning ability of the model. It ensures that the model is informed by relevant features and not misled by noise or less important factors, ultimately enhancing the accuracy of forest fire predictions.
In conclusion, the key findings of this research work include the following:
(1) Enhanced temporal predictive capabilities: The application of Transformer models with a window attention mechanism to time series prediction tasks significantly enhances the model’s predictive capabilities compared with traditional Transformer models [33]. By integrating a window-based attention mechanism that focuses on a specific subset of the input data, the model effectively prioritizes more relevant temporal data, leading to more accurate and efficient predictions.
(2) Effective forecasting in fire prediction: The use of the Transformer with a window attention mechanism for the specific task of forest fire prediction shows promising results [34]. The model accurately predicted the timing and potential severity of fires, demonstrating its potential for real-world application in natural disaster prevention and response.
(3) Improved computational efficiency: The window attention mechanism not only improved the Transformer model’s performance but also enhanced its computational efficiency. By limiting the attention scope to a specific window, the model reduces the computational cost associated with the self-attention mechanism, thereby processing data faster and more effectively [35].
(4) Robustness to hyperparameters: The robustness of the model to different choices of hyperparameters, such as batch size, time step, and epochs, was noted. This characteristic makes the model versatile for different application contexts and data characteristics [36].
These major findings were made possible by building upon existing work in the field of time series prediction and the development of Transformer models, as well as incorporating innovative modifications, such as the window attention mechanism.

5.2. Model Comparison

In this study, we proposed a novel method for forest fire risk prediction based on a window-based Transformer model. Here, we review some previous studies and discuss the results of various methodologies in the area of forest fire research. CNN, a widely used network in the deep learning community, has shown superior performance in forest fire detection (accuracy ranging from 90% [37] to 94% [38]). Furthermore, MLP methods have been successfully implemented for the real-time detection of predominant combustion phases, yielding an impressive accuracy rate (approximately 82.5% [39]). Lozano applied the CART algorithm to fire occurrence probability modeling (accuracy of about 88.39%). Catry [40], in an exploration of spatial pattern prediction for ignitions, incorporated human activities in a logistic-regression-based model, achieving acceptable results (accuracy of about 79.8% [41]). Furthermore, to evaluate the performance and applicability of our method, we apply these deep learning methods to the same Chongli dataset for a fire prediction performance comparison. Compared with traditional deep learning methods, our model is better suited to time series prediction, especially long sequence time series forecasting (LSTF).
The LSTM (long short-term memory) is a type of recurrent neural network (RNN) that is designed to handle long-term dependencies. The LSTM method has been widely used for forest fire risk prediction, where the input data typically consist of a series of remote sensing time series data capturing various factors that affect forest fire risk, such as temperature, humidity, wind speed, and vegetation conditions. However, due to the complex and dynamic nature of forest environments, the LSTM method may face limitations when dealing with irregular and overlapping canopy shapes. This is because the input data may contain noise and incomplete information, such as atmospheric interferences caused by clouds, smoke, and other factors, as well as occlusions and shadows caused by the forest canopy. These factors can lead to inaccurate predictions, especially in areas with complex terrain and forest structures. Additionally, the LSTM method may struggle with accurately predicting forest fire risk in areas where there are multiple local apices in the forest canopy, as it may have difficulty differentiating between the different layers of the forest. Finally, the presence of outliers in the input data can affect the performance of the LSTM method, leading to over- or underprediction of forest fire risk.
The RNN (recurrent neural network) method is a popular choice for time series data modeling due to its inherent capability of handling sequential information. In the context of forest fire risk prediction, RNNs exhibit certain drawbacks when processing complex forest fire risk prediction data. Notably, they suffer from the infamous problem of vanishing and exploding gradients, particularly when dealing with long sequences of data [42]. This issue is prominent in the processing of remote sensing data for forest fire prediction, as these datasets often encompass lengthy time series to fully capture the temporal variations of various environmental variables. The capacity of traditional RNNs to learn and recall information from past inputs significantly decreases as the gap length increases. This limitation, termed the “long-term dependency” issue, directly impacts the performance of RNNs in forest fire risk prediction, which requires understanding patterns across extended time horizons [43]. Moreover, RNNs do not inherently account for the multiscale temporal dynamics that are often present in forest fire risk factors. For instance, certain variables, such as vegetation health, may exhibit changes over longer time scales, while others, such as temperature or humidity, may fluctuate within a day. RNNs may struggle to simultaneously account for these differing temporal scales, potentially leading to suboptimal predictions.
The SVM (support vector machine) is a commonly applied machine learning model with powerful capabilities for binary classification. It has been applied to forest fire risk prediction tasks in which the goal is to discriminate between fire-prone and fire-resilient areas based on remote sensing time series encompassing different environmental parameters. Despite its widespread usage, the SVM has several limitations when employed for complex tasks such as forest fire prediction. SVMs are essentially memoryless models and inherently lack the capacity to handle sequential data such as time series, a characteristic fundamental to forest fire risk prediction [44]. Furthermore, SVMs operate as "black-box" models, providing very little interpretability regarding the importance or significance of individual features in the model's decision-making process [45]. This lack of interpretability can be a disadvantage in a field like forest fire prediction, where understanding the key driving factors behind a prediction can be as crucial as the prediction itself. Lastly, SVMs are sensitive to the choice of kernel and the tuning of hyperparameters, which can significantly affect the model's performance; finding the optimal settings often requires extensive computational resources and domain expertise, making SVMs less practical for large-scale, real-time forest fire prediction.
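The kernel and hyperparameter sensitivity noted above can be sketched with scikit-learn; the synthetic data and grid values below are placeholders standing in for per-pixel fire/no-fire samples, not the settings used in our experiments.

```python
# Sketch of SVM kernel/hyperparameter sensitivity using scikit-learn.
# The synthetic 13-feature data and grid values are illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=13, random_state=0)
grid = GridSearchCV(
    SVC(),
    {"kernel": ["rbf", "linear"], "C": [0.1, 1, 10], "gamma": ["scale", 0.01]},
    cv=5,
)
grid.fit(X, y)
# Scores typically vary widely across the grid, which is the practical cost
# of tuning discussed above.
print(grid.best_params_, grid.best_score_)
```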
Here, we apply the improved Transformer method, LSTM, RNN, and SVM to four different experimental datasets (forest fire, weather, electricity, traffic) and use four prediction horizons (96, 192, 336, 720) to measure and compare their performance in terms of mean squared error (MSE) and mean absolute error (MAE). The results are shown in Table 5.
The results show that our method achieves better long sequence time series forecasting (LSTF) performance across different types of datasets and prediction horizons. The experiments demonstrate that the proposed method is robust across contexts, enhancing the generalizability of our results. Using different time horizons is also key to illustrating how well a model performs not only in immediate forecasts but also over longer periods, which is crucial for many practical applications.
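For clarity, the horizon-wise evaluation underlying Table 5 can be expressed as a short sketch; the arrays below are random placeholders standing in for model forecasts and ground truth.

```python
# Sketch of the horizon-wise evaluation behind Table 5: MSE and MAE are
# computed separately for each prediction horizon. The random arrays are
# placeholders for real targets and forecasts.
import numpy as np

def horizon_metrics(y_true, y_pred):
    err = y_pred - y_true
    return float(np.mean(err ** 2)), float(np.mean(np.abs(err)))

for horizon in (96, 192, 336, 720):
    y_true = np.random.rand(100, horizon)                   # placeholder targets
    y_pred = y_true + 0.1 * np.random.randn(100, horizon)   # placeholder forecasts
    mse, mae = horizon_metrics(y_true, y_pred)
    print(f"horizon={horizon}: MSE={mse:.3f}, MAE={mae:.3f}")
```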

5.3. Limitations

While the incorporation of the window attention mechanism into the Transformer model for time series prediction has shown promising results, this research is not without limitations:
(1) Fixed window size: A significant limitation of this approach lies in its utilization of a fixed window size for the window attention mechanism. This could potentially lead to suboptimal predictions when the input data patterns change over different time scales. Future work could explore dynamic window sizes that adapt based on the input data [46].
(2) Assumption of independence: The model assumes that each time series prediction is independent of the others. This assumption may not hold true in complex, real-world scenarios where multiple factors can interact and influence each other over time, thereby affecting the fire timing.
(3) Lack of interpretability: Similar to other deep learning models, the Transformer model with a window attention mechanism suffers from a lack of interpretability. This makes it challenging to understand why the model makes certain predictions, which can be a significant obstacle in fields where explainability is crucial [47].
(4) Computational resources: Despite the window attention mechanism reducing the computational cost, training Transformer models, particularly on large datasets, still requires significant computational resources. This limitation can pose a barrier for users with restricted computational power [48].

5.4. Further Development

Based on the findings and limitations identified in this research, several potential directions emerge for future research:
(1) Dynamic window sizes: The window attention mechanism, as implemented in this research, relies on a fixed window size. In the future, more flexible, dynamic window sizes that adjust based on the data’s temporal characteristics could be investigated. This could potentially improve the model’s predictive performance by accounting for variable temporal dependencies in the data [49].
(2) Interconnected time series prediction: The model could be further extended to handle interconnected time series predictions where multiple factors interact and influence each other over time. This could involve incorporating techniques from multivariate time series forecasting or the use of graph-based methods to represent interdependencies among different time series [42].
(3) Enhanced interpretability: Improving the model’s interpretability is an important direction for future work. Techniques such as attention visualization, model distillation, or integration with interpretable models could be explored to make the model’s predictions more understandable.
(4) Model efficiency: Despite the computational efficiency brought by the window attention mechanism, there is still room to improve the model’s efficiency, especially for large-scale datasets. This could involve further research into more efficient attention mechanisms or the use of model parallelism and other advanced training techniques [48].
(5) Data size: Moving forward, we recognize the value of leveraging a larger dataset for enhancing the accuracy of time series fire prediction. Larger datasets not only provide a more comprehensive view of the variables at play but also allow our model to better learn and capture complex patterns and dependencies over time. By incorporating more temporal data, we aim to refine the predictive power of our model, especially in terms of capturing the cyclic nature of certain factors influencing forest fires. This approach will improve our model’s understanding of both the immediate and long-term effects of these factors. Additionally, expanding the dataset will enable the model to generalize better across different periods and variations, thus making the predictions more robust against unexpected changes. This, in turn, will contribute to more effective and strategic decision making in forest fire prevention and control.
These limitations present opportunities for future research to enhance the Transformer model with a window attention mechanism, making it more robust, interpretable, and efficient for real-world applications.

6. Conclusions

Considering the complexity of real-world forest fire prediction tasks, this paper proposed a window-based Transformer architecture combined with a dual time series input strategy for the spatial prediction of forest fire susceptibility in Chongli, Zhangjiakou. A multicollinearity test was used to filter and validate the forest fire influencing factors, and 13 factors were ultimately chosen.
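For reference, the multicollinearity screening reported in Table 2 follows the standard definitions TOL = 1 − R² and VIF = 1/TOL, where R² comes from regressing each factor on the remaining factors. The sketch below illustrates this computation on a placeholder factor matrix; it is not the exact pipeline used in this study.

```python
# Hedged sketch of TOL/VIF multicollinearity screening (cf. Table 2):
# regress each factor on the others; TOL = 1 - R^2, VIF = 1/TOL.
# The random 16-column matrix is a placeholder for the candidate factors.
import numpy as np
from sklearn.linear_model import LinearRegression

def tol_vif(X):                        # X: (samples, factors)
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        tol = 1.0 - r2                 # near 0 under strong collinearity
        out.append((tol, 1.0 / tol))
    return out

X = np.random.rand(200, 16)            # placeholder factor matrix
for j, (tol, vif) in enumerate(tol_vif(X), 1):
    print(f"factor {j}: TOL={tol:.3f}, VIF={vif:.2f}")
```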
In this study, we examined the use of the Transformer model incorporating a window attention mechanism for the time series prediction of forest fires. The model demonstrates superior performance by harnessing the inherent temporal dependencies in the data. Furthermore, the window attention mechanism successfully limits the range of positional dependencies, which is particularly beneficial when processing long sequences of data. The window-based Transformer architecture combined with the dual time series input strategy we proposed offers significant improvements by focusing the model's attention on a fixed-size window of adjacent time steps, balancing computational efficiency against the ability to capture important temporal dependencies. This approach is particularly effective in situations where recent observations are more likely to influence future predictions, a condition often observed in time series data; a minimal sketch of the mechanism follows.
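The sketch below illustrates the window attention idea: a band-shaped mask restricts each time step to its neighbors within a fixed window before the softmax. The helper names and the window size are illustrative assumptions, not the paper's exact layer.

```python
# Minimal sketch of window-based attention: a band mask restricts each
# time step to attend only to neighbors within a fixed-size window.
# This illustrates the mechanism; it is not the paper's exact layer.
import torch

def window_mask(seq_len, window):
    idx = torch.arange(seq_len)
    return (idx[None, :] - idx[:, None]).abs() <= window   # True = may attend

def windowed_attention(q, k, v, window):
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5  # scaled dot product
    scores = scores.masked_fill(~window_mask(q.shape[-2], window), float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 96, 64)              # (batch, time, d_model)
out = windowed_attention(q, k, v, window=8)     # each step sees +/- 8 neighbors
```

Because entries outside the band are masked to −inf, each row of the softmax spends all of its attention budget inside the window, which is what bounds both the positional dependencies and the effective computational cost.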
The results show that the window-based Transformer method proposed in this paper performs well on the LSTF task and achieves high accuracy (ACC = 91.56%) in long-sequence forest fire prediction. The overall fire risk prediction within the raster performs relatively well despite the vastness of the test area and the large extent of the data raster. Our work points to future directions for forest fire prediction and provides practical guidance for forest managers' fire prevention initiatives and for national policies related to forest fire prevention.

Author Contributions

Conceptualization, X.M.; methodology, D.G.; validation, X.M., J.L. and D.G.; formal analysis, X.M. and C.H.; investigation, Y.M. (Yunjie Mu) and W.W.; resources, J.C., W.W. and Y.M. (Yunfei Ma); data curation, C.H.; writing—original draft preparation, X.M.; writing—review and editing, J.L. and D.G.; visualization, X.M. and C.H.; supervision, D.G. and J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by a project funded by China Postdoctoral Science Foundation (2018T110505, 2017M611828), National Natural Science Foundation of China Youth Fund Project (62002171), National Natural Science Foundation of China (31870643, 31901321), Natural Science Foundation Project Youth Fund Project of Jiangsu (BK20200464), National Natural Science Foundation of Jiangsu (BK20201337), and Priority Academic Program Development (PAPD) of Jiangsu Higher Education Institutions.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data underlying the results presented in this paper are not publicly available at this time, but may be obtained from the authors upon reasonable request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

References

1. Rothermel, R.C. A Mathematical Model for Predicting Fire Spread in Wildland Fuels; Res. Pap. INT-115; U.S. Department of Agriculture, Forest Service, Intermountain Forest and Range Experiment Station: Ogden, UT, USA, 1972; 40p.
2. Andrews, P.L. BEHAVE: Fire Behavior Prediction and Fuel Modeling System—BURN Subsystem, Part 1; Gen. Tech. Rep. INT-194; U.S. Department of Agriculture, Forest Service, Intermountain Research Station: Ogden, UT, USA, 1986; 130p.
3. Yang, X.; Hua, Z.; Zhang, L.; Fan, X.; Zhang, F.; Ye, Q.; Fu, L. Preferred Vector Machine for Forest Fire Detection. Pattern Recognit. 2023, 143, 109722.
4. Xue, X.; Jin, S.; An, F.; Zhang, H.; Fan, J.; Eichhorn, M.P.; Jin, C.; Chen, B.; Jiang, L.; Yun, T. Shortwave radiation calculation for forest plots using airborne LiDAR data and computer graphics. Plant Phenomics 2022, 2022, 9856739.
5. Cao, L.; Zhang, Z.; Yun, T.; Wang, G.; Ruan, H.; She, G. Estimating tree volume distributions in subtropical forests using airborne LiDAR data. Remote Sens. 2019, 11, 97.
6. Liu, H.; Shen, X.; Cao, L.; Yun, T.; Zhang, Z.; Fu, X.; Liu, F. Deep learning in forest structural parameter estimation using airborne lidar data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 14, 1603–1618.
7. Yun, T.; Li, W.; Sun, Y.; Xue, L. Study of subtropical forestry index retrieval using terrestrial laser scanning and hemispherical photography. Math. Probl. Eng. 2015, 2015, 206108.
8. Gao, D.; Liu, Y.; Hu, B.; Wang, L.; Chen, W.; Chen, Y.; He, T. Time synchronization based on cross-technology communication for IoT networks. IEEE Internet Things J. 2023, 2023, 1.
9. Gao, D.; Wang, L.; Hu, B. Spectrum efficient communication for heterogeneous IoT networks. IEEE Trans. Netw. Sci. Eng. 2022, 9, 3945–3955.
10. Zhao, P.; Zhang, F.; Lin, H.; Xu, S. GIS-based forest fire risk model: A case study in Laoshan National Forest Park, Nanjing. Remote Sens. 2021, 13, 3704.
11. Kogan, F.N. Global drought watch from space. Bull. Am. Meteorol. Soc. 1997, 78, 621–636.
12. Keeley, J.E.; Fotheringham, C.J.; Morais, M. Re-examining fire suppression impacts on brushland fire regimes. Science 1999, 284, 1829–1832.
13. Viegas, D.X.; Viegas, M.T.; Ferreira, A.D. A stochastic differential equation approach to the modeling of fire spread. Int. J. Wildland Fire 1992, 2, 63–66.
14. Finney, M.A. FARSITE: Fire Area Simulator—Model Development and Evaluation; Res. Pap. RMRS-RP-4; U.S. Department of Agriculture, Forest Service, Rocky Mountain Research Station: Ogden, UT, USA, 1998; 47p.
15. Cortez, P.; Morais, A. A data mining approach to predict forest fires using meteorological data. In New Trends in Artificial Intelligence, Proceedings of the 13th EPIA 2007—Portuguese Conference on Artificial Intelligence, Guimarães, Portugal, 3–7 December 2007; pp. 512–523.
16. Chen, W.; Moriya, K.; Sakai, T.; Kunifuji, S. Prediction of daily fire occurrence using artificial neural networks. In Proceedings of the 2011 IEEE International Conference on Systems, Man, and Cybernetics, Anchorage, AK, USA, 9–12 October 2011; pp. 140–145.
17. Rodrigues, M.; de la Riva, J.; Fotheringham, S. Modeling the spatial variation of the explanatory factors of human-caused wildfires in Spain using geographically weighted logistic regression. Appl. Geogr. 2014, 48, 52–63.
18. Vakalis, D.; Sarimveis, H.; Kiranoudis, C.T.; Alexandridis, A. A comparison of artificial neural networks, random forests, and gradient boosting machines for the prediction of human-caused wildfires. Fire Saf. J. 2016, 81, 212–222.
19. Chen, Y.; Li, W.; Zhu, Q.; Goldsborough, P. A deep learning approach to conflating heterogeneous geospatial data for corn yield estimation: A case study of the US Corn Belt at the county level. Int. J. Appl. Earth Obs. Geoinf. 2018, 66, 61–77.
20. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762.
21. Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271.
22. Guo, S.; Yang, Y.; Lu, C. A Transformer-based framework for multivariate time series representation learning. arXiv 2020, arXiv:2010.02803.
23. Kim, D. Characteristics of Korean forest fires and forest fire policies in the Joseon Dynasty period (1392–1910) derived from historical records. Forests 2019, 10, 29.
24. Donnegan, J.A.; Veblen, T.T.; Sibold, J.S. Climatic and human influences on fire history in Pike National Forest, central Colorado. Can. J. For. Res. 2001, 31, 1525–1539.
25. Buda, M.; Maki, A.; Mazurowski, M.A. A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw. 2018, 106, 249–259.
26. Tobler, W.R. A computer movie simulating urban growth in the Detroit region. Econ. Geogr. 1970, 46 (Suppl. 1), 234–240.
27. Zumbrunnen, T.; Pezzatti, G.B.; Menéndez, P.; Bugmann, H.; Bürgi, M.; Conedera, M. Weather and human impacts on forest fires: 100 years of fire history in two climatic regions of Switzerland. For. Ecol. Manag. 2011, 261, 2188–2199.
28. Maxwell, N.M.; Koprowski, J.L. Response to fire by a forest specialist in isolated montane forest. For. Ecol. Manag. 2020, 462, 117996.
29. Sudhakar, S.; Vijayakumar, V.; Kumar, C.S.; Priya, V.; Ravi, L.; Subramaniyaswamy, V. Unmanned aerial vehicle (UAV) based forest fire detection and monitoring for reducing false alarms in forest fires. Comput. Commun. 2020, 149, 1–16.
30. Colkesen, I.; Sahin, E.K.; Kavzoglu, T. Susceptibility mapping of shallow landslides using kernel-based Gaussian process, support vector machines and logistic regression. J. Afr. Earth Sci. 2016, 118, 53–64.
31. Brown, M. Application of remote sensing techniques in forest fire management. Int. J. Wildland Fire 2020, 30, 1–18.
32. Huang, C.L.; Wang, C.J. A GA-based feature selection and parameters optimization for support vector machines. Expert Syst. Appl. 2006, 31, 231–240.
33. Kumar, A.; Irsoy, O.; Ondruska, P.; Iyyer, M.; Bradbury, J.; Gulrajani, I.; Socher, R. Ask me anything: Dynamic memory networks for natural language processing. In Proceedings of the International Conference on Machine Learning, New York City, NY, USA, 19–24 June 2016; pp. 1378–1387.
34. Cheng, W.; Dong, W.; Zhang, W.; Liu, C.; Lu, X.; Xu, D. Deep forest: Towards an alternative to deep neural networks. IJCAI 2020, 2020, 3553–3559.
35. Luong, M.T.; Pham, H.; Manning, C.D. Effective approaches to attention-based neural machine translation. arXiv 2015, arXiv:1508.04025.
36. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
37. Muhammad, K.; Ahmad, J.; Baik, S.W. Early fire detection using convolutional neural networks during surveillance for effective disaster management. Neurocomputing 2018, 288, 30–42.
38. Zhang, Q.; Xu, J.; Xu, L.; Guo, H. Deep convolutional neural networks for forest fire detection. In Proceedings of the International Forum on Management, Education and Information Technology Application (IFMEITA); Atlantis Press: Amsterdam, The Netherlands, 2016; pp. 568–575.
39. Yan, X.; Cheng, H.; Zhao, Y.; Yu, W.; Huang, H.; Zheng, X. Real-time identification of smoldering and flaming combustion phases in forest using a wireless sensor network-based multi-sensor system and artificial neural network. Sensors 2016, 16, 1228.
40. Catry, F.X.; Rego, F.; Bacao, F.; Moreira, F. Modeling and mapping wildfire ignition risk in Portugal. Int. J. Wildland Fire 2009, 18, 921–931.
41. Lozano, F.J.; Suarez-Seoane, S.; Kelly, M.; Luis, E. A multi-scale approach for modeling fire occurrence probability using satellite data and classification trees. Remote Sens. Environ. 2008, 112, 708–719.
42. Zhou, N.; Jiang, L.; Chen, L.; Zou, J.; Yang, Q. Temporal relational ranking for stock prediction. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Torino, Italy, 22–26 October 2018; pp. 283–292.
43. Pascanu, R.; Mikolov, T.; Bengio, Y. On the difficulty of training recurrent neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Atlanta, GA, USA, 17–19 June 2013; pp. 1310–1318.
44. Suykens, J.A.; Vandewalle, J. Least squares support vector machine classifiers. Neural Process. Lett. 1999, 9, 293–300.
45. Ribeiro, M.T.; Singh, S.; Guestrin, C. "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144.
46. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical vision Transformer using shifted windows. arXiv 2021, arXiv:2103.14030.
47. Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473.
48. Zoph, B.; Vasudevan, V.; Shlens, J.; Le, Q.V. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8697–8710.
49. Bengio, Y.; Simard, P.; Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 1994, 5, 157–166.
Figure 1. Chongli District, Zhangjiakou City, Hebei Province, China.
Figure 2. An overview of the workflow for data acquisition and processing.
Figure 3. Parameters influencing forest fire risk.
Figure 4. Average gain rate of influencing factors of forest fire (MCC indicates the moisture content of canopy; MCS indicates the moisture content of surface litter).
Figure 5. Structure of the window-based Transformer.
Figure 6. Loss function comparison.
Figure 7. Forest fire risk prediction results.
Figure 8. ROC curve and AUC of comparison method.
Table 1. Data description of forest fire influence.

No. | Influence Factor | Scale/Resolution | Unit | Source
1 | Temperature | - | °C | CIMISS
2 | Wind speed | - | m·s⁻¹ | CIMISS
3 | Rainfall | - | mm·h⁻¹ | CIMISS
4 | Specific humidity | - | kg·kg⁻¹ | CIMISS
5 | Atmospheric pressure | - | Pa | CIMISS
6 | NDVI | 500 m | - | MODIS
7 | Vegetation coverage | 500 m | - | MODIS
8 | Slope | 30 m | ° | Landsat
9 | Aspect | 30 m | ° | Landsat
10 | Moisture content of canopy | 30 m | - | Landsat
11 | Moisture content of surface litter | 30 m | - | Landsat
12 | Flammability | 30 m | - | Landsat
13 | Population density | - | people·km⁻² | DSNLV
14 | Distance to highway | 1:250,000 | km | DSNLV
15 | Distance to railway | 1:250,000 | km | DSNLV
16 | Distance to river | 1:250,000 | km | DSNLV
Table 2. Multicollinearity analysis of forest fire influencing factors.

No. | Influence Factor | TOL | VIF
1 | NDVI | 0.011 | 90.909
2 | Distance to railway | 0.017 | 58.824
3 | Atmospheric pressure | 0.037 | 27.027
4 | Wind speed | 0.112 | 8.929
5 | Temperature | 0.161 | 6.211
6 | Moisture content of canopy | 0.400 | 2.500
7 | Rainfall | 0.419 | 2.387
8 | Specific humidity | 0.601 | 1.664
9 | Moisture content of surface litter | 0.724 | 1.381
10 | Slope | 0.823 | 1.215
11 | Vegetation coverage | 0.865 | 1.156
12 | Aspect | 0.917 | 1.091
13 | Distance to river | 0.962 | 1.040
14 | Distance to highway | 0.968 | 1.033
15 | Population density | 0.974 | 1.0267
16 | Flammability | 0.999 | 1.001

Rows are listed in ascending TOL order.
Table 3. Forest fire risk level prediction for eight plots in the relevant area with the window-based Transformer, LSTM, RNN, and SVM.

Plot | Longitude | Latitude | Transformer | LSTM | RNN | SVM
1 | 115°07′12″ E | 41°10′07″ N | 0.086 ± 0.012 | 0.074 ± 0.031 | 0.091 ± 0.116 | 0.076 ± 0.021
2 | 115°26′04″ E | 41°11′31″ N | 0.155 ± 0.045 | 0.124 ± 0.091 | 0.284 ± 0.037 | 0.235 ± 0.068
3 | 115°18′54″ E | 41°07′43″ N | 0.534 ± 0.023 | 0.546 ± 0.016 | 0.637 ± 0.023 | 0.723 ± 0.129
4 | 115°01′35″ E | 41°04′56″ N | 0.016 ± 0.003 | 0.014 ± 0.022 | 0.014 ± 0.003 | 0.015 ± 0.001
5 | 115°02′01″ E | 40°57′27″ N | 0.125 ± 0.036 | 0.168 ± 0.023 | 0.146 ± 0.021 | 0.173 ± 0.023
6 | 115°21′44″ E | 40°08′39″ N | 0.981 ± 0.036 | 0.972 ± 0.053 | 0.975 ± 0.075 | 0.992 ± 0.038
7 | 115°22′28″ E | 40°04′19″ N | 0.834 ± 0.093 | 0.845 ± 0.076 | 0.897 ± 0.069 | 0.943 ± 0.036
8 | 115°11′24″ E | 40°01′21″ N | 0.663 ± 0.036 | 0.732 ± 0.013 | 0.869 ± 0.129 | 0.875 ± 0.056
Table 4. Performance comparison between models. Accuracy, specificity, sensitivity, PPV, and NPV are given in %.

Phase | Performance | Transformer | LSTM | RNN | SVM
Training set | Accuracy | 95.94 ± 1.45 | 94.13 ± 2.43 | 91.53 ± 1.95 | 80.58 ± 3.95
Training set | Specificity | 94.25 ± 2.69 | 93.04 ± 2.73 | 87.51 ± 1.76 | 73.52 ± 3.46
Training set | Sensitivity | 96.83 ± 0.65 | 97.11 ± 0.93 | 95.56 ± 1.26 | 87.64 ± 1.13
Training set | PPV | 95.02 ± 1.30 | 93.42 ± 1.65 | 88.44 ± 2.19 | 76.82 ± 3.63
Training set | NPV | 95.76 ± 1.12 | 96.69 ± 1.32 | 95.17 ± 1.27 | 85.67 ± 1.69
Training set | RMSE | 0.39 ± 0.03 | 0.41 ± 0.02 | 0.42 ± 0.01 | 0.40 ± 0.02
Training set | MAE | 0.05 ± 0.02 | 0.15 ± 0.01 | 0.18 ± 0.03 | 0.12 ± 0.03
Training set | MAPE | 0.24 ± 0.04 | 0.34 ± 0.02 | 0.37 ± 0.04 | 0.36 ± 0.02
Test set | Accuracy | 91.56 ± 1.72 | 88.06 ± 1.93 | 84.36 ± 2.34 | 78.47 ± 1.93
Test set | Specificity | 98.21 ± 0.30 | 97.99 ± 0.15 | 96.12 ± 0.24 | 95.49 ± 0.29
Test set | Sensitivity | 81.34 ± 3.67 | 76.12 ± 2.67 | 72.59 ± 1.27 | 61.45 ± 3.44
Test set | PPV | 98.02 ± 0.87 | 97.43 ± 1.29 | 93.94 ± 1.27 | 93.17 ± 0.91
Test set | NPV | 82.88 ± 2.31 | 83.41 ± 2.16 | 77.81 ± 2.68 | 71.24 ± 2.39
Test set | RMSE | 0.37 ± 0.01 | 0.38 ± 0.02 | 0.39 ± 0.01 | 0.37 ± 0.01
Test set | MAE | 0.05 ± 0.01 | 0.14 ± 0.03 | 0.16 ± 0.01 | 0.11 ± 0.01
Test set | MAPE | 0.25 ± 0.02 | 0.32 ± 0.01 | 0.34 ± 0.034 | 0.34 ± 0.02
Table 5. Long time series forecasting performance comparison between models. Each model column reports MSE/MAE.

Dataset | Horizon | Transformer (MSE/MAE) | LSTM (MSE/MAE) | RNN (MSE/MAE) | SVM (MSE/MAE)
Forest Fire | 96 | 0.316/0.347 | 0.316/0.364 | 0.764/0.416 | 0.505/0.475
Forest Fire | 192 | 0.363/0.370 | 0.363/0.390 | 0.426/0.441 | 0.553/0.496
Forest Fire | 336 | 0.392/0.390 | 0.408/0.426 | 0.445/0.459 | 0.621/0.537
Forest Fire | 720 | 0.458/0.425 | 0.459/0.464 | 0.543/0.490 | 0.671/0.561
Forest Fire | avg | 0.383/0.384 | 0.387/0.411 | 0.448/0.452 | 0.558/0.517
Weather | 96 | 0.150/0.188 | 0.161/0.229 | 0.217/0.296 | 0.266/0.336
Weather | 192 | 0.202/0.238 | 0.220/0.281 | 0.276/0.336 | 0.307/0.367
Weather | 336 | 0.260/0.282 | 0.278/0.331 | 0.339/0.380 | 0.359/0.395
Weather | 720 | 0.343/0.353 | 0.311/0.356 | 0.403/0.428 | 0.419/0.428
Weather | avg | 0.239/0.261 | 0.243/0.299 | 0.309/0.360 | 0.419/0.428
Electricity | 96 | 0.141/0.233 | 0.164/0.269 | 0.193/0.308 | 0.201/0.317
Electricity | 192 | 0.160/0.250 | 0.177/0.285 | 0.201/0.315 | 0.222/0.334
Electricity | 336 | 0.173/0.263 | 0.193/0.304 | 0.214/0.329 | 0.254/0.361
Electricity | 720 | 0.197/0.284 | 0.212/0.321 | 0.246/0.355 | 0.254/0.361
Electricity | avg | 0.168/0.258 | 0.187/0.295 | 0.214/0.327 | 0.227/0.338
Traffic | 96 | 0.419/0.269 | 0.519/0.309 | 0.587/0.366 | 0.613/0.388
Traffic | 192 | 0.443/0.276 | 0.537/0.315 | 0.604/0.373 | 0.616/0.382
Traffic | 336 | 0.460/0.283 | 0.534/0.313 | 0.621/0.383 | 0.622/0.337
Traffic | 720 | 0.490/0.299 | 0.577/0.325 | 0.626/0.382 | 0.660/0.408
Traffic | avg | 0.453/0.282 | 0.542/0.316 | 0.610/0.376 | 0.628/0.379