Article

Spatially Weighted Bayesian Classification of Spatio-Temporal Areal Data Based on Gaussian-Hidden Markov Models

by Kęstutis Dučinskas, Marta Karaliutė * and Laura Šaltytė-Vaisiauskė
Faculty of Marine Technologies and Natural Sciences, Klaipeda University, 92294 Klaipeda, Lithuania
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(2), 347; https://doi.org/10.3390/math11020347
Submission received: 10 December 2022 / Revised: 30 December 2022 / Accepted: 5 January 2023 / Published: 9 January 2023
(This article belongs to the Special Issue Mathematical and Computational Statistics and Applications)

Abstract
This article is concerned with an original approach to generative classification of spatiotemporal areal (or lattice) data based on the implementation of spatial weighting in Hidden Markov Models (HMMs). In the framework of this approach, the data model at each areal unit is specified by conditionally independent Gaussian observations and a first-order Markov chain for labels; we call it a local HMM. The proposed classification is based on a modification of the conventional HMM through the implementation of spatially weighted estimators of local HMM parameters. We focus on classification rules based on the Bayes discriminant function (BDF) with plugged-in spatially weighted parameter estimators obtained from the labeled training sample. For each local HMM, the estimators of regression coefficients and variances and two types of transition probabilities are used at two levels (higher and lower) of spatial weighting. The average accuracy rate (ACC) and balanced accuracy rate (BAC), computed from confusion matrices evaluated on a test sample, are used as performance measures of the classifiers. The proposed methodology is illustrated for simulated data and for a real dataset, i.e., annual death rate data collected by the Institute of Hygiene of the Republic of Lithuania from 60 municipalities in the period from 2001 to 2019. A critical comparison of the proposed classifiers is carried out. The experimental results showed that classifiers based on HMMs with the higher level of spatial weighting in the majority of cases have an advantage in spatial–temporal consistency and classification accuracy over those with the lower level of spatial weighting.

1. Introduction

This article is concerned with a generative approach (see, e.g., [1]) to supervised classification of spatiotemporal data collected at fixed areal units (sites, locations) and specified by HMMs with continuous observations in each class (state), under the traditionally invoked conditional independence assumption for the observation distribution and the Markov assumption for the label transition probabilities (see, e.g., [2]). This is an extension of the basic HMM in which an observation may take a continuous real value instead of a discrete one. We focus on models that describe the process of randomly generating an unobservable class label (state) sequence from a first-order Markov chain and generating a Gaussian observation from each class to produce an observable sequence from a single probability distribution [3,4]. Unlike the conventional conditional density HMM, the proposed HMMs consider the temporal information of each areal unit and obtain the most probable label sequence by incorporating spatial information through spatial weights. It should be noted that HMMs with spatial weighting have been successfully used in land-cover classification [5]. Shekhar [6], Atkinson [7], and Tang et al. [8] extended pixel-based geostatistically weighted classifiers to object-based image classification. Kadhem, Hewson, and Kaimi [9] developed a Poisson HMM to model spatial dependence in a network. However, only cases with geostatistical models of observations based on spatial covariances or semivariograms are considered there.
A comprehensive literature survey on state-of-the-art advances in spatiotemporal data analysis is provided by Hamdi et al. [10]. A systematic review of methods in spatial deep learning is reported by Mishra et al. [11]. It should be noted that in environmental, agricultural, and other research, data are often collected across space and through time. Spatiotemporal data are usually recorded at regular time intervals (time lags) and at irregular stations (areas) in a compact region (see, e.g., [12,13]). Modeling and prediction of such data have been studied by numerous authors (see, e.g., [14,15,16,17]). Often, before analyzing spatiotemporal datasets, spatiotemporal discretization (or aggregation) is applied. Discretization is useful to summarize information and helps in extracting features within a spatiotemporal range rather than measuring a single point [18]. Compared with the general classification problem, spatial classification needs to consider the location information of the data and the interaction between feature and label variables at every time moment. There are currently a variety of ways to achieve the classification goal, but one of the most effective is to use the BDF. The generative approach to spatial classification has been studied by numerous authors (see, e.g., [19,20,21,22]). An approach to Bayesian classification of Gaussian Markov Random Field (GMRF) observations on the lattice has been developed by Dreižienė and Dučinskas [23,24,25], and a generative classification method for the geostatistical spatiotemporal Gaussian model was introduced by Karaliutė and Dučinskas [26,27].
The novelty of our approach lies in the incorporation of spatial weights in HMM parameter estimation from the labeled training sample and the use of a plug-in BDF for classification (decoding) of the test observation. This enables us to enrich and generalize the existing generative classification methods for spatiotemporal data modeled by HMMs.
Our approach is aimed at situations in which the BDF is implemented for data collected at fixed areal units (or sites, as they are also known) over fixed sequential time periods. At the beginning, for each local HMM, the maximum likelihood estimators of regression coefficients and variances and two types of transition probability estimators are derived from the labeled training sample.
Furthermore, the HMM parameter estimators at two spatial weighting levels are plugged into the BDF. The two levels of spatial weighting are specified as follows:
  • the lower level, called partial spatial weighting (PSW), indicates that the parameter estimators of the Gaussian component are spatially weighted while the transition probability estimators remain unweighted;
  • the higher level, called complete spatial weighting (CSW), comprises the cases with spatially weighted estimators of both the Gaussian parameters and the transition probabilities.
The performance measures of the classifiers are chosen to be the ACC and BAC evaluated from the confusion matrices for a test sample. The proposed methodology is illustrated for simulated data and for a real dataset, i.e., annual death rate data collected by the Institute of Hygiene of the Republic of Lithuania from 60 municipalities in the period from 2001 to 2019. The presented critical comparison of the proposed classifiers by the introduced criteria with existing ones can aid in the selection of proper classification rules for spatiotemporal areal data.
The experimental results showed that the proposed HMM at the CSW level in the majority of cases has an advantage in classification accuracy over the HMM at the PSW level by both performance measures.
This paper is organized as follows. First, Section 2 introduces the HMM for spatiotemporal Gaussian data. Next, Section 3 introduces Bayesian classification based on the HMM, while Section 4 compares the classifiers by two criteria via simulation. In Section 5, we offer a real application of our approach to the classification of annual death rate data collected by the Institute of Hygiene of the Republic of Lithuania from 60 municipalities in the period from 2001 to 2019. Finally, conclusions and comments are given in the last section.

2. HMM for Spatiotemporal Data

Formally, we identify the set of possible class labels (e.g., diseases, land cover labels, etc.) as Ω = {0, 1} across n areal units over T years in the time series. Let S = {s_i ⊂ D; i = 1, …, n} be a set of n nonoverlapping areal units (AUs). Assume that S is endowed with a neighborhood system N = {N_i : i = 1, …, n}, where N_i denotes the collection of areas that are neighbors of area s_i. Usually, the neighborhood N_i is defined to be those areas with which area s_i shares a common border. For each areal unit s_i, the data consist of a feature variable observation Z(s_i, t) (further simply an observation) and a label Y(s_i, t) ∈ Ω observed at consecutive time periods t = 1, 2, …. Let |N_i| denote the number of elements in the set N_i.
In what follows, we use the notation Y(s_i, t) = Y_t^(i) and Z(s_i, t) = Z_t^(i), for i = 1, …, n and t = 1, …, T + 1.
In this study, we assume that for l = 0 , 1 , the model of observation Z t ( i ) conditional on Y t ( i ) = l is
Z t ( i ) = μ l ( s i ; t ) + ε t ( i ) ,
where μ_l(s_i; t) is a deterministic spatiotemporal mean and the errors ε_1^(i), …, ε_T^(i) are independent normally distributed random variables with zero mean and finite variance σ_i².
In what follows, with an insignificant loss of generality, we focus on the linear time-independent mean μ_l(s_i; t) = (β_l^(i))′ x(s_i), where x(s) = (x_1(s), …, x_q(s))′ is the vector of explanatory variables and β_l^(i) is a q-dimensional vector of parameters, l = 0, 1. This choice is motivated by considering only the cases when the explanatory variables represent the spatial coordinates or their functions.
Then the design matrix X^(i) for the observation vector Z^(i) has rows ((1 − y_t^(i)) x_i′, y_t^(i) x_i′), t = 1, …, T, where x_i = x(s_i), i = 1, …, n.
The distribution of Z^(i) conditional on {Y^(i) = y^(i)} is T-dimensional Gaussian, i.e.,
Z^(i) | y^(i) ~ N(X^(i) β^(i), σ_i² I_T),
where β^(i) = ((β_0^(i))′, (β_1^(i))′)′ and I_T is the T × T identity matrix.
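As a concrete illustration, the block below builds the design matrix under the row structure described above, i.e., row t places the covariate vector x_i in the β_0 block when y_t = 0 and in the β_1 block when y_t = 1, so that X^(i) β^(i) yields the label-dependent mean at every time period. This is a minimal sketch; the function name and example values are ours.

```python
import numpy as np

def design_matrix(y, x_i):
    """Rows ((1 - y_t) x_i', y_t x_i'), t = 1..T, for the stacked beta = (beta0', beta1')'."""
    y = np.asarray(y, dtype=float)
    # Left block selects beta_0 when y_t = 0; right block selects beta_1 when y_t = 1.
    return np.column_stack([np.outer(1.0 - y, x_i), np.outer(y, x_i)])

# Toy example: T = 3 labels, q = 2 covariates
X = design_matrix([0, 1, 1], x_i=[1.0, 0.5])  # shape (3, 4)
```

Multiplying this X by a stacked parameter vector (β_0′, β_1′)′ reproduces the mean β_{y_t}′ x_i in each row.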
Denote by Z^(i) = (Z_1^(i), …, Z_T^(i))′ and Y^(i) = (Y_1^(i), …, Y_T^(i))′ the vectors of observations and class labels associated with the ith AU over T temporal periods.
In what follows, denote the realized values of the random variables Y_t^(i) and Z_t^(i) by y_t^(i) and z_t^(i), respectively.
The label sequence Y constitutes a first-order Markov chain, and the observed sequence Z is related only to the corresponding class label sequence, assuming that the label of an observation is related only to the previous year's label. Hence, the dependence between Y and Z constitutes a first-order HMM. The label variable Y for the ith AU is specified by the transition probability matrix A^(i) = (a_kl^(i)), where a_kl^(i) = P(Y_{t+1}^(i) = l | Y_t^(i) = k) for k, l = 0, 1. An obvious extension of the basic HMM is to allow a continuous observation space instead of a finite number of discrete symbols. In this model, the emission probabilities cannot be described by a simple matrix of point probabilities but rather by a complete probability density function (PDF) over the continuous observation space for each state. Therefore, the emission probabilities are replaced by conditional PDFs: the conditional PDF of Z_t^(i) given class label Y_t^(i) = k is denoted by p_k(z_t^(i)) = p(Z_t^(i) = z_t^(i) | Y_t^(i) = k). The conditional distributions can in principle be arbitrary, but usually they are restricted to simple parametric distributions, such as Gaussians. The set of parameters of the above conditional densities is denoted by B^(i) = {β^(i), σ_i²}. The last component of the HMM is the initial label distribution over classes, denoted by π^(i) = (π_0^(i), π_1^(i)), where π_k^(i) = P(Y_1^(i) = k), k = 0, 1. Specify the HMM parameter set for the ith AU by λ^(i) = (π^(i), A^(i), B^(i)), i = 1, …, n.
The HMM allows us to talk about both observed events that we meet in the input and hidden events that we think of as causal factors in our probabilistic model. A first-order HMM instantiates two simplifying assumptions.
First, Markov Assumption: as with a first-order Markov chain, the probability of a particular state depends only on the previous state: P ( Y t + 1 ( i ) | Y 1 ( i ) , , Y t ( i ) ) = P ( Y t + 1 ( i ) | Y t ( i ) ) .
Second, Output Independence: the probability of an output observation Z_t^(i) depends only on the class Y_t^(i) that produced the observation and not on any other states or observations: P(Z_t^(i) | Z_1^(i), …, Z_T^(i), Y_1^(i), …, Y_T^(i)) = P(Z_t^(i) | Y_t^(i)).
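These two assumptions can be illustrated generatively. The following is a minimal simulation of one local HMM with two states and Gaussian emissions; all parameter values and names are ours, chosen for illustration only.

```python
import numpy as np

def simulate_local_hmm(T, pi0, A, mu, sigma2, rng):
    """Generate labels Y_t (first-order Markov chain) and observations Z_t | Y_t = l ~ N(mu_l, sigma2)."""
    y = np.empty(T, dtype=int)
    z = np.empty(T)
    y[0] = rng.choice(2, p=[pi0, 1.0 - pi0])          # initial label distribution
    for t in range(T):
        if t > 0:
            y[t] = rng.choice(2, p=A[y[t - 1]])       # Markov assumption: depends on previous label only
        z[t] = rng.normal(mu[y[t]], np.sqrt(sigma2))  # output independence: depends on current label only
    return y, z

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.2, 0.8]])                # assumed transition matrix
y, z = simulate_local_hmm(50, 0.5, A, mu=(0.0, 3.0), sigma2=1.0, rng=rng)
```

Each run produces one label sequence and one observation sequence of equal length, matching the one-to-one mapping noted below.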
The goal of HMMs is to infer the most likely Y given the observed variables Z.
Then, under the HMM independence assumptions, the conditional distribution of Z_{T+1}^(i) given Z^(i) = z^(i) and Y_{T+1}^(i) = l is Gaussian, i.e.,
Z_{T+1}^(i) | z^(i), y^(i), y_{T+1}^(i) = l ~ N((β_l^(i))′ x_i, σ_i²), l = 0, 1.
Recall that for HMM, each label realization produces only a single observation. Thus, the sequence of labels and the sequence of observations have the same length.
Given this one-to-one mapping and the Markov assumptions, for a particular hidden state sequence y ( i ) and an observation sequence z ( i ) , the conditional likelihood of the observation sequence is
l_i(z^(i) | y^(i), β^(i), σ_i²) = ∏_{t=1}^{T} p(z_t^(i) | y_t^(i)) = ∏_{t=1}^{T} n(z_t^(i) | (β_{y_t^(i)}^(i))′ x_i, σ_i²),
where n(z_t^(i) | (β_{y_t^(i)}^(i))′ x_i, σ_i²) denotes the PDF of the univariate Gaussian distribution N((β_{y_t^(i)}^(i))′ x_i, σ_i²).
Given λ^(i) = (π^(i), A^(i), B^(i)) and observation z^(i) and label y^(i) sequences (i.e., the labeled training sample is known), the likelihood L_i = L(z^(i), y^(i), λ^(i)) has the form L_i = P(y^(i)) l_i(z^(i) | y^(i)), where P(y^(i)) = P(Y_1^(i) = y_1^(i)) ∏_{t=1}^{T−1} P(Y_{t+1}^(i) = y_{t+1}^(i) | Y_t^(i) = y_t^(i)).
Specify a matrix of spatial weights W = (w_ij : i, j = 1, …, n) with w_ij = 0 if s_j ∉ N_i and w_ij > 0 otherwise.
Define an augmented likelihood
L_i^+(z_{T+1}^(i), y_{T+1}^(i), λ^(i)) = L(z^(i), z_{T+1}^(i), y^(i), y_{T+1}^(i), λ^(i)) = P(y^(i), y_{T+1}^(i)) l_i(z^(i), z_{T+1}^(i) | y^(i), y_{T+1}^(i)),
where P(y^(i), y_{T+1}^(i)) = P(Y_1^(i) = y_1^(i)) ∏_{t=1}^{T} P(Y_{t+1}^(i) = y_{t+1}^(i) | Y_t^(i) = y_t^(i)) and l_i(z^(i), z_{T+1}^(i) | y^(i), y_{T+1}^(i)) = ∏_{t=1}^{T+1} p(z_t^(i) | y_t^(i)).
Then, the criterion for the Bayes classification rule is ŷ_{T+1}^(i) = arg max_{k=0,1} L_i^+(z_{T+1}^(i), y_{T+1}^(i) = k, λ^(i)).
In the HMM and machine learning context, this problem is called the decoding problem.

3. Bayesian Classification Based on HMM

Under the assumption that the HMM is completely specified given the labeled training sample (z^(i), y^(i)), it is known that the BDF minimizing the total probability of misclassification of Z_{T+1}^(i) is formed by the log-ratio of the augmented conditional likelihoods of the distributions (see [28]) specified in Equation (2), that is,
W_Z(z_{T+1}^(i) | y_T^(i), λ^(i)) = ln [ L_i^+(z_{T+1}^(i), y_{T+1}^(i) = 1, λ^(i)) / L_i^+(z_{T+1}^(i), y_{T+1}^(i) = 0, λ^(i)) ].
So, the BDF classifies the observation Z_{T+1}^(i) = z_{T+1}^(i) in the following way: the class label takes value 1 if W_Z(z_{T+1}^(i) | y_T^(i), λ^(i)) ≥ 0, and 0 otherwise.
Definition 1. 
Probability of correct classification (PCC) incurred by W Z ( z T + 1 ( i ) | y T ( i ) , λ ( i ) ) given y T + 1 ( i ) = l is defined as
P_l^(i) = ∫ H((−1)^{l+1} W_Z(z_{T+1}^(i) | y_T^(i), λ^(i))) n_l(z_{T+1}^(i)) dz_{T+1}^(i),
where H ( . ) denotes Heaviside step function, and n l ( z T + 1 ( i ) ) denotes the PDF of Gaussian distribution N ( ( β l ( i ) ) x i , σ i 2 ) .
Lemma 1. 
Under the above HMM, the BDF defined in Equation (3) has the form
W_Z(z_{T+1}^(i) | y_T^(i), λ^(i)) = (z_{T+1}^(i) − (β^(i))′F x_i / 2)(β^(i))′G x_i / σ_i² + γ^(i)
 with PCC
P_l^(i) = Φ(Δ^(i)/2 + (−1)^{l+1} γ^(i)/Δ^(i)),
where F = (I_q, I_q)′ and G = (−I_q, I_q)′, Δ^(i) = |(β^(i))′G x_i| / σ_i, γ^(i) = ln(a_{y_T^(i),1}^(i) / a_{y_T^(i),0}^(i)), and Φ(·) is the standard Gaussian distribution function.
Proof of Lemma 1.
From Equations (1) and (2), we observe that
L_i^+(z_{T+1}^(i), y_{T+1}^(i) = l, λ^(i)) = P(y^(i), y_{T+1}^(i) = l) l_i(z^(i), z_{T+1}^(i) | y^(i), y_{T+1}^(i) = l)
with P(y^(i), y_{T+1}^(i) = l) = P(Y_1^(i) = y_1^(i)) ∏_{t=1}^{T−1} P(Y_{t+1}^(i) = y_{t+1}^(i) | Y_t^(i) = y_t^(i)) P(Y_{T+1}^(i) = l | Y_T^(i) = y_T^(i)) and l_i(z^(i), z_{T+1}^(i) | y^(i), y_{T+1}^(i) = l) = ∏_{t=1}^{T} p(z_t^(i) | y_t^(i)) p(z_{T+1}^(i) | y_{T+1}^(i) = l), l = 0, 1.
Substituting these expressions directly in Equation (3), we obtain closed-form expression for BDF presented in Equation (4).
Under the assumptions described above and formula (4), it follows that given Y_{T+1}^(i) = l, the conditional distribution of W_Z(Z_{T+1}^(i) | y_T^(i), λ^(i)) is Gaussian with mean (−1)^{l+1}(Δ^(i))²/2 + γ^(i) and variance (Δ^(i))², l = 0, 1.
Consequently, using the properties of Gaussian distribution, we complete the proof of Lemma 1. □
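As a numerical check of the closed form above in the simplest scalar case (x_i = 1, q = 1, so (β^(i))′F x_i reduces to β_0 + β_1 and (β^(i))′G x_i to β_1 − β_0), the sketch below evaluates the BDF and the PCC. All parameter values are illustrative assumptions, not estimates from the paper.

```python
import math

def bdf_closed_form(z, beta0, beta1, sigma2, gamma):
    # W_Z = (z - (beta0 + beta1)/2) (beta1 - beta0) / sigma^2 + gamma
    return (z - (beta0 + beta1) / 2.0) * (beta1 - beta0) / sigma2 + gamma

def std_normal_cdf(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def pcc(l, beta0, beta1, sigma, gamma):
    # P_l = Phi(Delta/2 + (-1)^(l+1) gamma/Delta), with Delta = |beta1 - beta0| / sigma
    delta = abs(beta1 - beta0) / sigma
    return std_normal_cdf(delta / 2.0 + (-1) ** (l + 1) * gamma / delta)

w = bdf_closed_form(z=2.5, beta0=0.0, beta1=3.0, sigma2=1.0, gamma=0.0)
label = 1 if w >= 0 else 0   # Bayes rule: assign class 1 when W_Z >= 0
```

With γ = 0 (equal transition log-odds), the two conditional correct-classification probabilities coincide at Φ(Δ/2), as the symmetry of the closed form suggests.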
So, it is obvious from Equation (5) that P_l^(i) depends on only one label realization of the training sample, i.e., Y_T^(i) = y_T^(i). The PCC and its plug-in versions are natural performance measures for the BDF, similar to the mean squared prediction error (MSPE) for the universal kriging predictor (see [29]). The MSPE and its plug-in versions are usually used as a spatial sampling design criterion for prediction (see [30]). These facts strengthen the motivation for deriving an explicit expression of the PCC for any spatial classification procedure.
However, in practical applications, the statistical parameters β^(i), σ_i², a_{y_T^(i),0}^(i), a_{y_T^(i),1}^(i) of the populations are rarely known.
Then, the estimators β̂^(i), σ̂_i² of the unknown parameters are derived by the maximum likelihood (ML) method by maximizing l_i(z^(i) | y^(i)), and â_{y_T^(i),0}^(i), â_{y_T^(i),1}^(i) by maximizing P(y^(i)). In the classical HMM context, this is also called the learning problem.
Hence, we obtain the following estimators of the local HMM parameters:
the ML estimator of the regression parameters β̂^(i) = ((X^(i))′X^(i))^{−1}(X^(i))′Z^(i), and the bias-adjusted ML estimator of the variance σ̂_i² = (Z^(i) − X^(i)β̂^(i))′(Z^(i) − X^(i)β̂^(i)) / (T − 2q).
ML estimators of transition probabilities
â_00^(i) = Σ_{t=2}^T I(y_{t−1}^(i) = 0) I(y_t^(i) = 0) / Σ_{t=2}^T I(y_{t−1}^(i) = 0),  â_01^(i) = Σ_{t=2}^T I(y_{t−1}^(i) = 0) I(y_t^(i) = 1) / Σ_{t=2}^T I(y_{t−1}^(i) = 0),
â_10^(i) = Σ_{t=2}^T I(y_{t−1}^(i) = 1) I(y_t^(i) = 0) / Σ_{t=2}^T I(y_{t−1}^(i) = 1),  â_11^(i) = Σ_{t=2}^T I(y_{t−1}^(i) = 1) I(y_t^(i) = 1) / Σ_{t=2}^T I(y_{t−1}^(i) = 1),
where I ( A ) is the indicator of event A.
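These counting estimators can be sketched directly: each â_kl is the number of observed k → l transitions divided by the number of times the chain leaves state k. The function name and the toy label sequence below are ours.

```python
import numpy as np

def estimate_transitions(y):
    """ML estimates of the 2x2 transition matrix from one label sequence."""
    y = np.asarray(y)
    A_hat = np.zeros((2, 2))
    for k in (0, 1):
        from_k = y[:-1] == k                       # times t-1 with label k
        for l in (0, 1):
            A_hat[k, l] = np.sum(from_k & (y[1:] == l))  # count k -> l transitions
        if from_k.sum() > 0:
            A_hat[k] /= from_k.sum()               # normalize row k
    return A_hat

A_hat = estimate_transitions([0, 0, 1, 1, 1, 0, 0, 1])
```

Each row of the estimated matrix sums to one, as required of transition probabilities.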
It should be noted that we do not care about the initial label distribution π^(i) and consider it fixed, since it is not present in the BDF expression in Equation (4).
In this study, we propose two types of the transition probability estimators for the ith AU:
(M1)
a ^ y T ( i ) , 1 ( i ) = a ^ 01 ( i ) I ( y T ( i ) = 0 ) + a ^ 11 ( i ) I ( y T ( i ) = 1 )  and  a ^ y T ( i ) , 0 ( i ) = a ^ 00 ( i ) I ( y T ( i ) = 0 ) + a ^ 10 ( i ) I ( y T ( i ) = 1 )
(M2)
a ^ y T ( i ) , 1 ( i ) = q ( i ) I ( y T ( i ) = 0 ) + p ( i ) I ( y T ( i ) = 1 ) and a ^ y T ( i ) , 0 ( i ) = p ( i ) I ( y T ( i ) = 0 ) + q ( i ) I ( y T ( i ) = 1 )
where p^(i) = Σ_{t=2}^T I(y_{t−1}^(i) = y_t^(i)) / (T − 1) and q^(i) = 1 − p^(i) = Σ_{t=2}^T I(y_{t−1}^(i) ≠ y_t^(i)) / (T − 1) denote the label persistence rate and change rate, respectively.
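The M2 persistence rate p^(i) can be sketched in the same counting style as the M1 estimators; the toy sequence below is ours.

```python
import numpy as np

def persistence_rate(y):
    """p: fraction of consecutive time periods whose labels coincide."""
    y = np.asarray(y)
    return np.mean(y[:-1] == y[1:])

p = persistence_rate([0, 0, 1, 1, 1, 0, 0, 1])  # 4 of the 7 consecutive pairs persist
q = 1.0 - p                                      # change rate
```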
For greater interpretability, we impose the following spatial weights, popular for areal data:
w_ij = 1 if s_j ∈ N_i and i ≠ j, and w_ij = 0 otherwise (in particular, w_ii = 0); the row-standardized weights are w*_ij = w_ij / Σ_{k=1}^n w_ik.
We propose the following spatial weighting procedures for local model parameter estimators
β̃^(i) = (Σ_{j=1}^n w*_ij β̂^(j) + β̂^(i)) / 2,  σ̃_i² = (Σ_{j=1}^n w*_ij σ̂_j² + σ̂_i²) / 2,
ã_{y_T^(i),1}^(i) = (Σ_{j=1}^n w*_ij â_{y_T^(j),1}^(j) + â_{y_T^(i),1}^(i)) / 2,  ã_{y_T^(i),0}^(i) = (Σ_{j=1}^n w*_ij â_{y_T^(j),0}^(j) + â_{y_T^(i),0}^(i)) / 2.
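A numerical sketch of this weighting step: each local estimate is averaged with the row-standardized mean of its neighbours' estimates. The function name, the toy neighbourhood matrix, and the estimate values below are ours.

```python
import numpy as np

def spatially_weight(theta_hat, W):
    """theta_hat: (n,) local estimates; W: (n, n) 0/1 neighbourhood matrix with zero diagonal."""
    W_star = W / W.sum(axis=1, keepdims=True)      # row-standardized weights w*_ij
    # Average of the neighbour-weighted mean and the unit's own estimate
    return (W_star @ theta_hat + theta_hat) / 2.0

# Toy example: unit 1 neighbours units 2 and 3; units 2 and 3 neighbour unit 1 only
W = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
theta_tilde = spatially_weight(np.array([1.0, 2.0, 4.0]), W)
```

The same operation applies componentwise to β̂^(i), σ̂_i², and the transition probability estimates, which keeps the weighted transition probabilities in [0, 1].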
Replacing the parameters with their estimators in the BDF, we introduce four plug-in BDFs (PBDFs):
1. Ŵ_Z(z_{T+1}^(i)) = (z_{T+1}^(i) − (β̃^(i))′F x_i / 2)((β̃^(i))′G x_i) / σ̃_i² + ln(â_{y_T^(i),1}^(i) / â_{y_T^(i),0}^(i)): the partial spatial weighting level with inserted M1-type transition probability estimators, denoted by PSW1;
2. Ŵ_Z(z_{T+1}^(i)) = (z_{T+1}^(i) − (β̃^(i))′F x_i / 2)((β̃^(i))′G x_i) / σ̃_i² + ln(â_{y_T^(i),1}^(i) / â_{y_T^(i),0}^(i)): the partial spatial weighting level with inserted M2-type transition probability estimators, denoted by PSW2;
3. Ŵ_Z(z_{T+1}^(i)) = (z_{T+1}^(i) − (β̃^(i))′F x_i / 2)((β̃^(i))′G x_i) / σ̃_i² + ln(ã_{y_T^(i),1}^(i) / ã_{y_T^(i),0}^(i)): the complete spatial weighting level with inserted M1-type transition probability estimators, denoted by CSW1;
4. Ŵ_Z(z_{T+1}^(i)) = (z_{T+1}^(i) − (β̃^(i))′F x_i / 2)((β̃^(i))′G x_i) / σ̃_i² + ln(ã_{y_T^(i),1}^(i) / ã_{y_T^(i),0}^(i)): the complete spatial weighting level with inserted M2-type transition probability estimators, denoted by CSW2.
They differ in the type and level of the incorporated spatiotemporal information.
The performance of the generative classifier based on a PBDF is evaluated by a confusion matrix formed for the test data, which records the numbers of correctly and incorrectly recognized test observations of each class.
This procedure is realized by partitioning the observed data into training and testing sets. The classifier is then designed on the training data and its accuracy is validated on the test data. In this paper, our focus is on using T temporal observations for training, while the observations at time moment t = T + 1 are used for testing.
Label prediction in areal unit s i at time moment t = T + 1 based on PBDF is given by Y ^ T + 1 ( i ) = H ( W ^ Z ( Z T + 1 ( i ) ) ) .
The form of the confusion matrix that will be applied for the assessment of the proposed classifier performances is shown in Table 1.
Define n k l = i = 1 n m k l ( i ) , for k , l = 0 , 1 .
Traditionally, the most commonly used empirical measure of classifier performance is the accuracy rate, which shows the percentage of correctly classified test data and is given by the formula
ACC = (n_00 + n_11) / (n_00 + n_01 + n_10 + n_11).
However, in practice, situations with significant imbalance between the majority and minority class examples frequently occur (see, e.g., [31,32,33]). Then, the evaluation of the classifiers' performance must be carried out using specific metrics that take the class distribution into account. In this article, we also use another performance evaluation measure based on the confusion matrix, usually called the Balanced Accuracy (see, e.g., [31,34]), specified by the formula
BAC = (n_00 / (n_00 + n_01) + n_11 / (n_10 + n_11)) / 2.
Below, we apply the Receiver Operator Characteristic (ROC) chart [35], which allows one to visualize the trade-off between sensitivity (true positive rate), i.e., TPR = n_00 / (n_00 + n_01), and 1 − specificity (false positive rate), i.e., FPR = n_10 / (n_10 + n_11), for any confusion matrix corresponding to a selected PBDF.
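A minimal sketch of ACC and BAC computed from the pooled confusion counts n_kl (row k = true class, column l = predicted class); the example counts are ours, not from the paper's tables.

```python
def acc_bac(n00, n01, n10, n11):
    """Accuracy rate and Balanced Accuracy from pooled confusion counts."""
    acc = (n00 + n11) / (n00 + n01 + n10 + n11)            # share of correct decisions
    bac = (n00 / (n00 + n01) + n11 / (n10 + n11)) / 2.0    # mean of per-class recalls
    return acc, bac

# Imbalanced toy example: 50 true class-0 cases, 10 true class-1 cases
acc, bac = acc_bac(40, 10, 5, 5)
```

The example illustrates why BAC matters under imbalance: the overall accuracy looks moderate, while the balanced accuracy exposes the weak recall of the minority class.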

4. Simulation Study

To conduct a critical comparison of the four proposed classifiers, we begin with simulation studies based on a spatial layout of n = 20 AUs distributed in S ⊂ [0, 5] × [0, 5]. Data collected at each location follow the HMM described above with T = 50 and normally distributed observations, with q = 3 as in a first-order trend surface model, i.e., the first explanatory variable is a constant equal to 1, and the second and third explanatory variables are the x and y coordinates. Define the imbalance ratio for the training sample by IR = Σ_{t=1}^T I(Y_t^(i) = 1) / T, assuming that it is constant for all i = 1, …, n.
The spatial sampling set, illustrated by a graph with 20 vertices, is depicted in Figure 1. The neighborhood system is specified by the graph edges.
For all 20 AUs, spatially weighted estimates of Gaussian distribution parameters and transition probabilities are presented in Table 2.
The conditional distribution of Y_{T+1}^(i) given Y_T^(i) = 1 is Bernoulli with parameter â_11^(i), i.e., (Y_{T+1}^(i) | Y_T^(i) = 1) ~ Be(â_11^(i)). Similarly, (Y_{T+1}^(i) | Y_T^(i) = 0) ~ Be(â_00^(i)).
First, we generate m = 100 simulation runs (replications) of the test labels from the Bernoulli distributions described above and the corresponding conditional Gaussian observations.
Furthermore, we compute the performance measures ACC and BAC for each AU and present their spatially averaged values in Table 3.
As can be seen from Table 3, classifiers based on the HMM with the CSW level of spatial weighting in the majority of cases have an advantage over those with the PSW level by both the ACC and BAC performance measures for various IR values.
Estimation method M1 has an advantage over M2 for smaller values of IR = 0.1, 0.3, 0.4. However, M2 ensures greater classification accuracy for IR = 0.6, 0.7, 0.9.

5. Real Data Example

A numerical analysis of the annual death rate data collected by the Institute of Hygiene of the Republic of Lithuania from 60 municipalities in the period from 2001 to 2019 is carried out.
The crude death rate for each municipality, measured in units per one hundred thousand population, is considered as the variable Z. We consider two label variables: one is specified by thresholding the mortality index due to acute cardiovascular events (ACE) and the other by thresholding the mortality index due to diseases of the circulatory system (CSD).
Cases with index values less than the threshold have label value 0, and the remaining cases have label value 1. Here, we consider the case with a constant mean, i.e., μ_l(s; t) = β_l, with x(s) = 1.
Then, for i = 1, …, 60, the design matrix X^(i) has rows (1 − y_t^(i), y_t^(i)), t = 1, …, T.
Numerical illustrations are performed on the 60 areas in the two-dimensional region depicted in Figure 2.
Data in the period from 2001 to 2018 (t = 1, …, 18) are used for training, and the remaining data (year 2019) are used for testing. Hence, T = 18 and n = 60, i.e., we consider 18 × 60 observations for training and 60 for testing. Here, the IR values are calculated for the testing data by changing the thresholds in the specification of the mortality levels.
The values of performance measures for the proposed classifiers specified in Equations (6) and (7) are presented in Table 4.
As can be seen from Table 4, classifiers based on the HMM with the CSW level of spatial weighting in the majority of cases have an advantage over those with the PSW level by both the ACC and BAC performance measures for various IR values. A similar trend is detected for the CSD disease case. Exceptions to this rule are depicted in the table in bold.
The ROC plot presented in Figure 3 visualizes the classifier performances with IR = 0.2 induced by the different classifiers. The depicted points represent the four considered classifiers and a random classifier. It is easy to check that the area under the curve in the ROC plot equals the performance measure BAC.
As can be seen from Figure 3, the classifier CSW2 shows an advantage over the others, since it has the largest area under the curve, equal to 0.95 (see Table 4).

6. Conclusions

In this paper, we developed a novel supervised generative approach to Bayesian classification of areal Gaussian data based on HMMs, implementing spatial weighting at several levels. A real data study has been conducted, and a critical comparison of the performance of the classifiers with decision threshold values induced by different spatial autocorrelation indexes was performed.
The proposed methodology has several attractive features that make it compare favorably with other approaches to generative supervised spatial classification based on HMMs.
First, the method is applicable to HMMs with discrete and continuous observations on regular and irregular areal units.
Second, the approach provides easily interpretable methods of spatial weighting for the local HMM parameters.
Third, the proposed classification methodology can be easily specified in terms of undirected and directed graphs.
The obtained results with simulated and real data showed that the spatial weighting level significantly influenced the performance of proposed classifiers.
Based on the calculation results, suggestions for the selection of the transition probability estimation method are presented.
It should also be noted that the derived closed-form PCC could be used in specifying theoretical values of the model parameters.
There are several directions for further research. First, there is scope for extending the supervised classification techniques based on HMMs of spatiotemporal Gaussian data to the multiclass case. Second, future research may also include implementation of the proposed classification technique in the context of other continuous non-Gaussian observation models. Third, it is reasonable to consider a second-order Markov chain model for the labels instead of a first-order one. This would enable us to incorporate more information on stochastic temporal dependence.
The presented critical comparison of the performances of classifiers with different levels of incorporated spatial information and types of parameter estimators can aid in the selection of proper classification rules for spatiotemporal areal data specified by HMMs with Gaussian or other continuous observations.

Author Contributions

Conceptualization, K.D.; methodology, M.K. and K.D.; software, M.K.; data curation, M.K.; formal analysis, L.Š.-V.; writing—original draft preparation, M.K. and L.Š.-V.; writing—review and editing, M.K. and L.Š.-V.; visualization, M.K.; supervision, K.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found at the Institute of Hygiene of the Republic of Lithuania: https://stat.hi.lt/ (accessed on 1 January 2020).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
HMM: Hidden Markov Model
BDF: Bayes discriminant function
ACC: Average accuracy rate
BAC: Balanced accuracy rate
PSW: Partial spatial weighting
CSW: Complete spatial weighting
AU: Areal unit
PDF: Probability density function
PCC: Probability of correct classification
MSPE: Mean squared prediction error
ML: Maximum likelihood
PBDF: Plug-in Bayes discriminant function
ROC: Receiver Operator Characteristic
ACE: Acute cardiovascular event

Figure 1. Spatial sampling set S20.
Figure 2. Classified Lithuanian municipalities in 2018 (left) and 2019 (right). Yellow areas indicate municipalities with a low level of mortality due to ACE (label value 0), and red areas indicate municipalities with a high level of mortality due to ACE (label value 1).
Figure 3. ROC plots for the problems with the class label variable ACE with IR = 0.2.
Table 1. Confusion matrix for the ith AU.

Y_{T+1}(i) \ Ŷ_{T+1}(i) |    0    |    1
0                       | m00(i)  | m01(i)
1                       | m10(i)  | m11(i)

where m_kl(i) = I(Y_{T+1}(i) = k) · I(Ŷ_{T+1}(i) = l) for k, l = 0, 1.
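The ACC and BAC performance measures reported in Tables 3 and 4 follow directly from these confusion-matrix counts: ACC is the share of correctly classified cases, and BAC is the average of the per-class correct-classification rates. A minimal sketch (the function name and example counts below are illustrative):

```python
def acc_bac(m00, m01, m10, m11):
    """ACC and BAC from the binary confusion-matrix counts of Table 1;
    m_kl counts test cases with true label k classified as l."""
    total = m00 + m01 + m10 + m11
    acc = (m00 + m11) / total        # average accuracy rate
    tnr = m00 / (m00 + m01)          # correct-classification rate, class 0
    tpr = m11 / (m10 + m11)          # correct-classification rate, class 1
    bac = (tnr + tpr) / 2            # balanced accuracy rate
    return acc, bac

acc, bac = acc_bac(m00=45, m01=5, m10=10, m11=40)  # acc = 0.85, bac = 0.85
```

Unlike ACC, BAC is insensitive to class imbalance, which is why both measures are reported across the imbalance ratios (IR) considered below.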
Table 2. Spatially weighted estimators of the HMM parameters for 20 AUs, IR = 0.7.

i  | β̃0(i)                 | β̃1(i)                 | σ̃²(i) | ã_{yT(i),0}(i): M1 | ã_{yT(i),0}(i): M2 | ã_{yT(i),1}(i): M1 | ã_{yT(i),1}(i): M2
1  | −0.043, −0.133, −0.126 | 0.043, 0.133, 0.126    | 0.249 | 0.361 | 0.561 | 0.341 | 0.439
2  | −0.035, −0.095, −0.128 | 0.044, 0.125, 0.164    | 0.323 | 0.167 | 0.684 | 0.200 | 0.316
3  | −0.040, −0.103, −0.141 | 0.044, 0.117, 0.157    | 0.317 | 0.806 | 0.342 | 0.777 | 0.658
4  | −0.015, −0.069, −0.065 | 0.044, 0.206, 0.194    | 0.320 | 0.369 | 0.663 | 0.241 | 0.337
5  | −0.015, −0.069, −0.068 | 0.043, 0.205, 0.201    | 0.314 | 0.329 | 0.663 | 0.233 | 0.337
6  | −0.099, −0.114, −0.262 | 0.008, 0.013, 0.021    | 0.318 | 0.665 | 0.469 | 0.635 | 0.531
7  | −0.044, −0.122, −0.140 | 0.044, 0.121, 0.138    | 0.280 | 0.708 | 0.403 | 0.706 | 0.597
8  | −0.015, −0.069, −0.066 | 0.044, 0.206, 0.197    | 0.317 | 0.651 | 0.337 | 0.763 | 0.663
9  | −0.049, −0.024, −0.211 | 0.024, 0.012, 0.101    | 0.316 | 0.751 | 0.429 | 0.699 | 0.571
10 | −0.142, −0.351, −0.067 | −0.028, −0.061, −0.010 | 0.261 | 0.646 | 0.459 | 0.639 | 0.541
11 | −0.165, −0.256, −0.296 | −0.013, −0.014, −0.021 | 0.279 | 0.413 | 0.602 | 0.298 | 0.398
12 | −0.119, −0.233, −0.232 | 0.015, 0.031, 0.030    | 0.331 | 0.672 | 0.429 | 0.692 | 0.571
13 | −0.201, −0.405, −0.080 | −0.092, −0.180, −0.037 | 0.324 | 0.726 | 0.480 | 0.633 | 0.520
14 | −0.049, −0.024, −0.211 | 0.024, 0.012, 0.101    | 0.316 | 0.751 | 0.429 | 0.699 | 0.571
15 | −0.263, −0.064, −0.466 | −0.132, −0.033, −0.223 | 0.263 | 0.523 | 0.327 | 0.767 | 0.674
16 | −0.171, −0.378, −0.073 | −0.060, −0.120, −0.024 | 0.293 | 0.314 | 0.531 | 0.364 | 0.469
17 | −0.142, −0.244, −0.264 | 0.001, 0.008, 0.005    | 0.305 | 0.630 | 0.413 | 0.697 | 0.587
18 | −0.099, −0.114, −0.262 | 0.008, 0.013, 0.021    | 0.318 | 0.665 | 0.469 | 0.635 | 0.531
19 | −0.561, −0.150, −0.466 | −0.464, −0.125, −0.348 | 0.338 | 0.400 | 0.653 | 0.252 | 0.347
20 | −0.412, −0.107, −0.466 | −0.298, −0.079, −0.285 | 0.301 | 0.562 | 0.337 | 0.758 | 0.663
Table 3. Averaged performance of classifiers based on PBDF for simulated data.

IR  | Measure | PSW1   | CSW1   | PSW2   | CSW2
0.1 | ACC     | 0.8395 | 0.8850 | 0.5480 | 0.6125
0.1 | BAC     | 0.8864 | 0.9241 | 0.6778 | 0.7075
0.3 | ACC     | 0.8660 | 0.8875 | 0.8100 | 0.8235
0.3 | BAC     | 0.8647 | 0.8823 | 0.8300 | 0.8450
0.4 | ACC     | 0.7395 | 0.7525 | 0.7365 | 0.7455
0.4 | BAC     | 0.7421 | 0.7568 | 0.7413 | 0.7519
0.5 | ACC     | 0.8790 | 0.9240 | 0.8770 | 0.9220
0.5 | BAC     | 0.8853 | 0.9278 | 0.8831 | 0.9261
0.6 | ACC     | 0.7825 | 0.7945 | 0.8035 | 0.8260
0.6 | BAC     | 0.7879 | 0.7938 | 0.8114 | 0.8283
0.7 | ACC     | 0.9150 | 0.9175 | 0.9215 | 0.9255
0.7 | BAC     | 0.9053 | 0.9095 | 0.9066 | 0.9137
0.9 | ACC     | 0.8610 | 0.8585 | 0.9625 | 0.9655
0.9 | BAC     | 0.7878 | 0.7864 | 0.8777 | 0.8891
Table 4. Performance measures of classifiers based on PBDF for real data.

Variable | IR  | Measure | PSW1   | CSW1   | PSW2   | CSW2
ACE      | 0.2 | ACC     | 0.7167 | 0.8333 | 0.9167 | 0.9500
ACE      | 0.2 | BAC     | 0.8559 | 0.9153 | 0.9576 | 0.9746
ACE      | 0.3 | ACC     | 0.6667 | 0.7833 | 0.8000 | 0.8667
ACE      | 0.3 | BAC     | 0.8246 | 0.8860 | 0.8947 | 0.7719
ACE      | 0.4 | ACC     | 0.6000 | 0.6833 | 0.7500 | 0.8333
ACE      | 0.4 | BAC     | 0.6296 | 0.6759 | 0.7130 | 0.6852
ACE      | 0.5 | ACC     | 0.6333 | 0.6500 | 0.7500 | 0.8333
ACE      | 0.5 | BAC     | 0.6471 | 0.5654 | 0.6699 | 0.7190
CSD      | 0.2 | ACC     | 0.6500 | 0.8167 | 0.7833 | 0.8500
CSD      | 0.2 | BAC     | 0.5804 | 0.6696 | 0.6518 | 0.6875
CSD      | 0.3 | ACC     | 0.7500 | 0.8167 | 0.8167 | 0.8167
CSD      | 0.3 | BAC     | 0.7500 | 0.7857 | 0.6696 | 0.6696
CSD      | 0.4 | ACC     | 0.6667 | 0.7833 | 0.8000 | 0.7667
CSD      | 0.4 | BAC     | 0.6979 | 0.8021 | 0.7813 | 0.7604
CSD      | 0.5 | ACC     | 0.6000 | 0.6167 | 0.7500 | 0.7000
CSD      | 0.5 | BAC     | 0.6125 | 0.6375 | 0.7500 | 0.7000