Article

Hidden Markov Model Based on Logistic Regression

Byeongheon Lee, Joowon Park and Yongku Kim

1 Department of Statistics, Kyungpook National University, Daegu 41566, Republic of Korea
2 School of Forest Science and Landscape Architecture, Kyungpook National University, Daegu 41566, Republic of Korea
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(20), 4396; https://doi.org/10.3390/math11204396
Submission received: 28 August 2023 / Revised: 20 October 2023 / Accepted: 20 October 2023 / Published: 23 October 2023
(This article belongs to the Section Probability and Statistics)

Abstract: A hidden Markov model (HMM) is a useful tool for modeling dependent heterogeneous phenomena. It can be used to find factors that affect real-world events even when those factors cannot be directly observed. HMMs differ from traditional methods in that they model the hidden states through state variables and mixture distributions, which allows them to uncover relationships involving variables that cannot be directly observed. HMMs can be extended by allowing the transition probabilities to depend on covariates, which makes them more flexible and powerful, as they can then model a wider range of sequential data. However, modeling covariates in a hidden Markov model is particularly difficult when the dimension of the state variable is large. To avoid these difficulties, we achieve the Markovian property by incorporating the previous state variable into a logistic regression model. We apply the proposed method to find the factors that affect the hidden state of matsutake mushroom growth in Korea, where it is hard to find covariates that directly affect matsutake mushroom growth. We believe that this method can be used to identify factors that are difficult to find using traditional methods.

1. Introduction

The hidden Markov model (HMM), introduced by Baum and his colleagues in a series of papers (Baum and Petrie [1]), is a useful way to represent dependent heterogeneous phenomena. It is often used to model stochastic processes and time-dependent sequences and has been applied in a wide range of areas such as medical diagnosis, financial forecasting, and natural language processing. In many applications, the marginal distribution of the observations is clearly multimodal, meaning that the observations come from a mixture of different distributions associated with different regimes. This behavior is a key attribute of HMMs. HMMs can be used to detect these regimes and find the hidden states that correspond to them. Once the hidden states have been found, a mixture of different distributions is identified over them (Titterington et al. [2]; Zucchini and MacDonald [3]). Inference in HMMs is typically based on maximum likelihood or Bayesian approaches. However, the dependence structure of HMMs can be more difficult to compute than that of regular mixtures. Robert et al. [4] provided an efficient Bayesian estimate of HMMs via Gibbs sampling, and Chib [5] introduced a widely used state simulation procedure (see also Campbell et al. [6]). Li et al. [7] developed an approach to multivariate time series anomaly detection in which an HMM is used to detect anomalies in multivariate time series. Nguyen [8] used an HMM to predict the daily stock prices of three actively traded stocks (Apple, Google, and Facebook) based on their historical data.
HMMs have a number of advantages, including flexibility, interpretability, and efficiency, but they also have disadvantages, such as the conditional independence assumption and the limited state space. Another limitation of HMMs lies in directly incorporating additional explanatory variables. Covariate-dependent hidden Markov models (CDHMMs) are a class of statistical models used to model sequential data where the transition probabilities between hidden states depend on observed covariates. This makes CDHMMs well suited to a wide range of applications, such as medical diagnosis, financial forecasting, and natural language processing. CDHMMs can be used to identify different market states, such as bull markets, bear markets, and sideways markets, or to model the progression of Alzheimer's disease and to identify its different stages. CDHMMs can also be used to segment customers into different groups based on their purchase behavior; for example, they can identify customers who are likely to churn and customers who are likely to be high-value.
One of the most common approaches to modeling covariates in HMMs is to use a mixture transition distribution (MTD) model. In an MTD model, the transition probabilities between hidden states are modeled as a mixture of distributions, where the weights of the mixture components are determined by the covariates. This approach is relatively simple to implement and can be used to model a wide range of covariate effects. Another approach is to use a covariate-dependent transition matrix model, in which the transition probabilities between hidden states are parameterized directly using the covariates. This approach is more flexible than the MTD approach, but it can also be more difficult to implement and estimate.
Rabiner and Juang [9] applied the CDHMM to speech recognition and showed that it outperformed HMMs without covariate-dependent transition probabilities. Marshall and Jones [10] discussed the application of a multi-state model to diabetic retinopathy under the assumption that a continuous-time Markov process determines the transition times between disease stages. Altman [11] presented a class of mixed HMMs where covariates and random effects captured differences between an observed process and an underlying hidden process. Chamroukhi et al. [12] proposed a finite mixture model of hidden Markov regression with covariate dependence; they applied it to financial time series analysis and showed that it outperformed other methods for financial time series modeling. Maruotti [13] provided a general review of statistical methods that combine HMMs and random effects models in a longitudinal setting.
Rubin et al. [14] proposed a joint logistic regression and Markov chain model to describe a binary cross-sectional response, where the unobserved transition rates of a two-state continuous-time Markov chain are included as covariates. Sarkar and Zhu [15] proposed a novel approach for detecting clusters and regimes in time series data in the presence of random covariates, based on a CDHMM with covariate-dependent transition probabilities. In a covariate-dependent transition matrix model, HMMs are extended by allowing the transition probabilities to depend on covariates through regression models. In this case, each transition probability requires a different regression model, which can be especially problematic when the dimension of the state variable is large. To avoid this problem, we here introduce the previous hidden state as another explanatory variable in the HMM through a logistic regression model. That is, the Markovian property is achieved by incorporating the previous state variable into the logistic regression model. Note that the proposed model can be applied regardless of the dimension of the state variable in the hidden Markov model.
In some cases, it can be difficult to find the factors that affect a particular phenomenon. This is because the factors may be hidden or may interact in complex ways. HMMs can be used to analyze hidden states without using observed values as the dependent variable. This can be an effective way to find significant variables if they cannot be found directly. In Korea, matsutake mushrooms have not been cultivated on a large scale. Instead, they are harvested from natural forests. The effect of climatic factors on matsutake mushroom yield has not been studied in detail. Some studies have been conducted to investigate the relationship between matsutake mushroom occurrence and weather patterns, but they have not been able to identify the significant meteorological factors. The proposed model is used to identify the factors that indirectly affect matsutake mushroom production. This could help to improve the cultivation of matsutake mushrooms in Korea.
The remaining sections are organized as follows. In Section 2, a brief summary of the hidden Markov model is given. In Section 3, we introduce a hidden Markov model based on logistic regression. Then, we present the hierarchical Bayesian procedure of the proposed model in Section 4. In Section 5 and Section 6, we demonstrate the hidden Markov model based on logistic regression by way of a simulation study and a case study example. The last section is the conclusion of this paper and mentions further study.

2. Hidden Markov Model

We here introduce the hidden Markov model, giving a brief definition and the assumptions of the model. A hidden Markov model $\{y_t : t \in \mathbb{N}\}$ is a particular kind of dependent mixture. In its most general form, the hidden Markov model is defined as follows:
$$P(z_t \mid Z^{(t-1)}) = P(z_t \mid z_{t-1}) \quad \text{for } t = 2, 3, \ldots, T, \tag{1}$$
$$P(y_t \mid Y^{(t-1)}, Z^{(t)}) = P(y_t \mid z_t) \quad \text{for } t = 1, 2, \ldots, T, \tag{2}$$
where $z_t$ and $y_t$ represent an unobserved state and an observation at time $t$, respectively. Note that $Z^{(t)}$ denotes the vector of unobserved states from time 1 to time $t$, $Y^{(t)}$ denotes the vector of observations from time 1 to time $t$, and $T$ is the total number of observations. That is, $Z^{(t)} = (z_1, z_2, \ldots, z_t)$ and $Y^{(t)} = (y_1, y_2, \ldots, y_t)$.
Unlike typical mixture models, the HMM assumes that the observed data are generated through a finite-valued, unobserved process. This unobserved process is assumed to occupy one of a finite number of discrete states at each discrete time point and, given the previous state or states, to transition stochastically in the Markov fashion. The data observed at each time point depend only on the value of the corresponding hidden state and are independent of the others. The heterogeneity of the data is represented by the hidden Markov states. In other words, we have pairs $(z_t, y_t)$ with state $z_t \in \{1, \ldots, k\}$ and random value $y_t \mid z_t \sim f_{z_t}(y_t)$, where $y_t$ is conditionally independent given $z_t$. The HMM derives its name from two defining attributes: first, $z_t$ is distributed as a (finite-state) Markov chain; second, $z_t$ is not observed.
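To make these two defining assumptions concrete, the following minimal sketch (ours, not from the paper; the transition matrix, means, and standard deviations are illustrative values) simulates a two-state HMM with Gaussian emissions:

```python
import numpy as np

rng = np.random.default_rng(0)

T = 200
P = np.array([[0.9, 0.1],        # P(z_t = j | z_{t-1} = i)
              [0.2, 0.8]])
means, sds = np.array([5.0, 13.0]), np.array([2.0, 3.0])

z = np.empty(T, dtype=int)
y = np.empty(T)
z[0] = rng.integers(2)                          # initial state
y[0] = rng.normal(means[z[0]], sds[z[0]])
for t in range(1, T):
    z[t] = rng.choice(2, p=P[z[t - 1]])         # Markov transition (attribute 1)
    y[t] = rng.normal(means[z[t]], sds[z[t]])   # emission depends only on z_t
# Only y is observed in practice; z is hidden (attribute 2).
```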

3. Logistic Regression on Hidden State

Logistic regression is used to model the probability of a certain class or event, such as pass/fail; that is, to model a binary dependent variable. Consider a model with $p$ predictors, $x_1, \ldots, x_p$, and one binary response variable $Y$ with probability $\pi = P(Y = 1)$. A linear relationship is assumed between the predictor variables and the log-odds of the event $Y = 1$. This relationship can be written in the following form, where $(\beta_0, \beta_1, \ldots, \beta_p)$ are the parameters of the model:
$$\log\frac{\pi}{1-\pi} = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p.$$
By simple algebraic manipulation, the probability that $Y = 1$ is
$$\pi = \frac{\exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p)}{1 + \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p)}.$$
Therefore, the predictors affecting the binary dependent variable $Y$ can be found through logistic regression analysis by estimating the parameters $(\beta_0, \beta_1, \ldots, \beta_p)$.
We assume that $(x_{t1}, x_{t2}, \ldots, x_{tp})$ are the predictor variables and that the hidden-state space of the HMM at time point $t$ is $\{0, 1\}$; that is, $z_t \in \{0, 1\}$ is a binary variable. We then model the following logistic regression:
$$P(z_t = 1) = \pi_t,$$
$$\log\frac{\pi_t}{1-\pi_t} = \beta_0 + \beta_1 x_{t1} + \cdots + \beta_p x_{tp} + \gamma z_{t-1}$$
for $t = 1, \ldots, T$, and then the probability that $z_t = 1$ is
$$\pi_t = \frac{\exp(\beta_0 + \beta_1 x_{t1} + \cdots + \beta_p x_{tp} + \gamma z_{t-1})}{1 + \exp(\beta_0 + \beta_1 x_{t1} + \cdots + \beta_p x_{tp} + \gamma z_{t-1})},$$
where $\gamma z_{t-1}$ represents the Markovian property in Equation (1). That is, the term involving $z_{t-1}$ allows for a shift in the transition probability depending on the previous hidden state. Therefore, the hidden-state model $f_{z_t \mid \beta, \gamma, z_{t-1}}(z_t)$ can be expressed as follows:
$$f_{z_t \mid \beta, \gamma, z_{t-1}}(z_t) \sim \mathrm{Ber}(\pi_t), \quad \pi_t = \frac{\exp(\beta X_t + \gamma z_{t-1})}{1 + \exp(\beta X_t + \gamma z_{t-1})},$$
where $X_t = (1, x_{t1}, x_{t2}, \ldots, x_{tp})$ and $\beta = (\beta_0, \beta_1, \ldots, \beta_p)$. Additionally, in the function $f_{z_1 \mid \beta, \gamma, z_0}(z_1)$, $z_0$ is the starting hidden state immediately before the hidden state $z_1$, and it is assumed that
$$P(z_0 = 1) = \pi_0, \qquad f_{z_0 \mid \pi_0}(z_0) \sim \mathrm{Ber}(\pi_0),$$
where $\pi_0$ is the parameter of the distribution of the state $z_0$. Note that $\mathrm{Ber}(\cdot)$ denotes the Bernoulli distribution.
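As a small illustration (a sketch of ours, not code from the paper; all parameter values are assumptions), the transition probability above can be computed as follows. Evaluating it at $z_{t-1} = 0$ and $z_{t-1} = 1$ gives the two rows of the covariate-dependent transition matrix:

```python
import numpy as np

def transition_prob(beta, gamma, x_t, z_prev):
    """P(z_t = 1 | x_t, z_{t-1}) under the logistic model above.

    beta holds (beta_0, ..., beta_p); x_t holds (x_t1, ..., x_tp);
    z_prev is the previous hidden state in {0, 1}."""
    X_t = np.concatenate(([1.0], np.asarray(x_t, dtype=float)))  # prepend intercept
    eta = X_t @ np.asarray(beta) + gamma * z_prev                # linear predictor
    return 1.0 / (1.0 + np.exp(-eta))                            # inverse logit

# Illustrative values: one covariate, beta = (0.5, 1.2), gamma = 0.9
p_from_0 = transition_prob([0.5, 1.2], 0.9, [0.3], z_prev=0)  # P(z_t = 1 | z_{t-1} = 0)
p_from_1 = transition_prob([0.5, 1.2], 0.9, [0.3], z_prev=1)  # P(z_t = 1 | z_{t-1} = 1)
```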

4. Hierarchical Bayesian Approach to HMM Based on Logistic Regression

Now we introduce explanatory variables to HMMs through a logistic regression model, in which the previous state is added as another explanatory variable to maintain the temporal dependence of the HMM. First, the hierarchical structure is described; then, we derive the full conditional distributions of each parameter for Gibbs sampling. In addition, the prediction steps of the proposed model are shown.

4.1. HMM Based on Logistic Regression

In the observation distribution $f_{y_t \mid z_t}(y_t)$, it is assumed that
$$y_t \mid z_t = 1 \sim f_1(y_t \mid \theta_1), \qquad y_t \mid z_t = 0 \sim f_0(y_t \mid \theta_0)$$
for $t = 1, \ldots, T$, where $\theta_1$ and $\theta_0$ are the parameters of the mixture distributions. Therefore, the data model $f_{Y^{(T)} \mid Z^{(T)}, \theta}(Y^{(T)})$ can be expressed as
$$f_{Y^{(T)} \mid Z^{(T)}, \theta}(Y^{(T)}) = \prod_{t=1}^{T} f_{y_t \mid z_t, \theta}(y_t),$$
where $\theta = (\theta_0, \theta_1)$. The process model $f_{Z^{(T)} \mid \beta, \gamma, z_0}(Z^{(T)})$ is given by
$$f_{Z^{(T)} \mid \beta, \gamma, z_0}(Z^{(T)}) = \prod_{t=1}^{T} f_{z_t \mid \beta, \gamma, z_{t-1}}(z_t).$$
Finally, we consider prior distributions for all model parameters $\beta, \gamma, \theta$ and the initial state $z_0$. In summary, the hierarchical structure can be expressed as follows:
  • Data model: $f_{Y^{(T)} \mid Z^{(T)}, \theta}(Y^{(T)}) = \prod_{t=1}^{T} f_{y_t \mid z_t, \theta}(y_t)$
  • Process model: $f_{Z^{(T)} \mid \beta, \gamma, z_0}(Z^{(T)}) = \prod_{t=1}^{T} f_{z_t \mid \beta, \gamma, z_{t-1}}(z_t)$
  • Prior model: $\pi(\beta, \gamma)\, \pi(\theta)\, \pi(z_0, \pi_0) = \pi(\beta, \gamma)\, \pi(\theta)\, \pi(z_0 \mid \pi_0)\, \pi(\pi_0)$

4.2. Bayesian Analysis

Since direct sampling from the joint posterior distribution of the model parameters is computationally difficult, simplified or approximate MCMC approaches, such as the Metropolis–Hastings (M-H) algorithm within the Gibbs sampler, are needed.
The joint posterior distribution $\pi(\beta, \gamma, \theta, \pi_0, z_0, Z^{(T)} \mid Y^{(T)})$ is proportional to
$$f_{Y^{(T)} \mid Z^{(T)}, \theta}(Y^{(T)})\, f_{Z^{(T)} \mid \beta, \gamma, z_0}(Z^{(T)})\, \pi(\beta, \gamma)\, \pi(\theta)\, \pi(z_0, \pi_0) = \left[\prod_{t=1}^{T} f_{y_t \mid z_t, \theta}(y_t)\, f_{z_t \mid \beta, \gamma, z_{t-1}}(z_t)\right] \pi(\beta, \gamma)\, \pi(\theta)\, \pi(z_0, \pi_0).$$
Note that $\pi(\cdot)$ and $\pi(\cdot \mid Y^{(T)})$ denote a prior distribution function and a posterior distribution function given $Y^{(T)}$, respectively. The full conditional distributions used in the Gibbs algorithm can be easily found through this joint posterior distribution.
The full conditional distribution of the logistic regression parameters $(\beta, \gamma)$ is of the form
$$\pi(\beta, \gamma \mid \theta, \pi_0, z_0, Z^{(T)}, Y^{(T)}) \propto \prod_{t=1}^{T} f_{z_t \mid \beta, \gamma, z_{t-1}}(z_t)\, \pi(\beta, \gamma),$$
where the prior distribution $\pi(\beta, \gamma)$ is generally assumed to be non-informative, i.e., constant. Note that $\pi(\cdot \mid \cdot, Y^{(T)})$ denotes a conditional posterior distribution function.
The full conditional distribution of the data model parameters $\theta$ can be expressed as
$$\pi(\theta \mid \beta, \gamma, \pi_0, z_0, Z^{(T)}, Y^{(T)}) \propto \prod_{t=1}^{T} f_{y_t \mid z_t, \theta}(y_t)\, \pi(\theta),$$
where the prior distribution $\pi(\theta)$ is also assumed to be non-informative, i.e., constant.
The full conditional distributions of the parameters $\pi_0$ and $z_0$ are given by
$$\pi(\pi_0 \mid \beta, \gamma, \theta, z_0, Z^{(T)}, Y^{(T)}) \propto f_{z_0 \mid \pi_0}(z_0)\, \pi(\pi_0) \propto \pi_0^{z_0} (1 - \pi_0)^{1 - z_0}\, \pi(\pi_0)$$
and
$$\pi(z_0 \mid \beta, \gamma, \theta, \pi_0, Z^{(T)}, Y^{(T)}) \propto f_{z_0 \mid \pi_0}(z_0)\, f_{z_1 \mid \beta, \gamma, z_0}(z_1),$$
respectively. Note that we here assume $U(0, 1)$ as the prior distribution $\pi(\pi_0)$. Under this non-informative prior, the full conditional distribution of the parameter $\pi_0$ is a beta distribution of the form
$$\pi(\pi_0 \mid \beta, \gamma, \theta, z_0, Z^{(T)}, Y^{(T)}) \sim \mathrm{Beta}(z_0 + 1,\, 2 - z_0).$$
Note that $\mathrm{Beta}(\cdot, \cdot)$ denotes a beta distribution. In addition, since the hidden-state space is assumed to be $\{0, 1\}$, the probability of the hidden state $z_0$ can be expressed as
$$P\!\left(z_0 = s \mid \beta, \gamma, \theta, \pi_0, Z^{(T)}, Y^{(T)}\right) = \frac{P(z_0 = s \mid \pi_0)\, f_{z_1 \mid \beta, \gamma, z_0 = s}(z_1)}{\sum_{k=0}^{1} P(z_0 = k \mid \pi_0)\, f_{z_1 \mid \beta, \gamma, z_0 = k}(z_1)}$$
for $s = 0, 1$. As a result, the full conditional distribution of $z_0$ is the two-point distribution
$$\pi(z_0 \mid \beta, \gamma, \theta, \pi_0, Z^{(T)}, Y^{(T)}) \sim \mathrm{Ber}\!\left(\frac{P(z_0 = 1 \mid \pi_0)\, f_{z_1 \mid \beta, \gamma, z_0 = 1}(z_1)}{\sum_{k=0}^{1} P(z_0 = k \mid \pi_0)\, f_{z_1 \mid \beta, \gamma, z_0 = k}(z_1)}\right).$$
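As a minimal sketch (our own illustration, not the authors' code; function and variable names are ours), the updates for $\pi_0$ and $z_0$ can be implemented directly from the two full conditionals above:

```python
import numpy as np

def update_pi0_z0(z0, z1, X1, beta, gamma, rng):
    """Draw pi_0 from its Beta full conditional, then z_0 from its
    two-point (Bernoulli) full conditional, as derived above.
    X1 is the covariate vector (with intercept) at time 1."""
    pi0 = rng.beta(z0 + 1, 2 - z0)                       # Beta(z0 + 1, 2 - z0)
    w = np.empty(2)
    for s in (0, 1):                                     # candidate values of z_0
        pi1 = 1.0 / (1.0 + np.exp(-(X1 @ beta + gamma * s)))
        f_z1 = pi1 if z1 == 1 else 1.0 - pi1             # f_{z_1 | beta, gamma, z_0 = s}
        w[s] = (pi0 if s == 1 else 1.0 - pi0) * f_z1
    z0_new = int(rng.uniform() < w[1] / w.sum())         # Bernoulli draw
    return pi0, z0_new
```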
The full conditional distribution of a hidden state $z_j$ can be expressed as
$$\pi(z_j \mid \beta, \gamma, \theta, \pi_0, z_0, Z_{-j}^{(T)}, Y^{(T)}) \propto f_{y_j \mid z_j, \theta}(y_j)\, f_{z_{j+1} \mid \beta, \gamma, z_j}(z_{j+1})\, f_{z_j \mid \beta, \gamma, z_{j-1}}(z_j)$$
for $j = 1, \ldots, T-1$, and
$$\pi(z_T \mid \beta, \gamma, \theta, \pi_0, z_0, Z^{(T-1)}, Y^{(T)}) \propto f_{y_T \mid z_T, \theta}(y_T)\, f_{z_T \mid \beta, \gamma, z_{T-1}}(z_T),$$
where $Z_{-j}^{(T)}$ denotes the unobserved states from time 1 to time $T$ except time $j$; that is, $Z_{-j}^{(T)} = (z_1, \ldots, z_{j-1}, z_{j+1}, \ldots, z_T)$. In the same way as for $z_0$, the full conditional distribution of $z_j$ can be obtained as
$$\pi(z_j \mid \beta, \gamma, \theta, \pi_0, z_0, Z_{-j}^{(T)}, Y^{(T)}) \sim \mathrm{Ber}\!\left(\frac{f_{y_j \mid z_j = 1, \theta}(y_j)\, f_{z_{j+1} \mid \beta, \gamma, z_j = 1}(z_{j+1})\, P(z_j = 1 \mid \beta, \gamma, z_{j-1})}{\sum_{k=0}^{1} f_{y_j \mid z_j = k, \theta}(y_j)\, f_{z_{j+1} \mid \beta, \gamma, z_j = k}(z_{j+1})\, P(z_j = k \mid \beta, \gamma, z_{j-1})}\right)$$
for $j = 1, \ldots, T-1$, and
$$\pi(z_T \mid \beta, \gamma, \theta, \pi_0, z_0, Z^{(T-1)}, Y^{(T)}) \sim \mathrm{Ber}\!\left(\frac{f_{y_T \mid z_T = 1, \theta}(y_T)\, P(z_T = 1 \mid \beta, \gamma, z_{T-1})}{\sum_{k=0}^{1} f_{y_T \mid z_T = k, \theta}(y_T)\, P(z_T = k \mid \beta, \gamma, z_{T-1})}\right).$$
Each full conditional distribution obtained in this way can be used for sampling within the Gibbs algorithm. Note that the full conditional distributions of the logistic parameters $(\beta, \gamma)$ and the data model parameters $\theta$ are not available in closed form; thus, the M-H algorithm is required for sampling from these distributions.
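The sketch below (our own illustration under a Gaussian-mixture observation model; the flat prior, random-walk proposal, and step size are assumptions rather than the authors' settings) shows one sweep of the M-H-within-Gibbs sampler: a random-walk M-H update for $(\beta, \gamma)$, followed by exact Bernoulli draws for the hidden states.

```python
import numpy as np
from scipy.stats import norm

def sigmoid(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def log_fc_beta_gamma(beta, gamma, X, z, z0):
    """Unnormalized log full conditional of (beta, gamma) under a flat prior:
    the Bernoulli likelihood of the hidden-state sequence."""
    z_prev = np.concatenate(([z0], z[:-1]))
    pi = sigmoid(X @ beta + gamma * z_prev)
    return np.sum(z * np.log(pi) + (1 - z) * np.log1p(-pi))

def gibbs_sweep(y, X, z, z0, beta, gamma, theta, rng, step=0.1):
    """One M-H-within-Gibbs sweep; theta = (mu0, sd0, mu1, sd1).
    Updates for theta, pi_0, and z_0 follow the same pattern and are omitted."""
    # Random-walk M-H update for (beta, gamma)
    prop = np.concatenate((beta, [gamma])) + step * rng.normal(size=len(beta) + 1)
    b_new, g_new = prop[:-1], prop[-1]
    log_ratio = (log_fc_beta_gamma(b_new, g_new, X, z, z0)
                 - log_fc_beta_gamma(beta, gamma, X, z, z0))
    if np.log(rng.uniform()) < log_ratio:
        beta, gamma = b_new, g_new
    # Exact Bernoulli draw for each hidden state z_j from its full conditional
    mu0, sd0, mu1, sd1 = theta
    T = len(y)
    for j in range(T):
        z_prev = z0 if j == 0 else z[j - 1]
        w = np.empty(2)
        for k in (0, 1):
            pi_j = sigmoid(X[j] @ beta + gamma * z_prev)
            w[k] = norm.pdf(y[j], (mu0, mu1)[k], (sd0, sd1)[k]) \
                   * (pi_j if k == 1 else 1.0 - pi_j)
            if j < T - 1:                                # term from z_{j+1}
                pi_next = sigmoid(X[j + 1] @ beta + gamma * k)
                w[k] *= pi_next if z[j + 1] == 1 else 1.0 - pi_next
        z[j] = int(rng.uniform() < w[1] / w.sum())
    return beta, gamma, z
```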

4.3. Prediction

Posterior samples can be generated as in Section 4.2, and the process and observation models are given in Equations (1) and (2). The Bayesian prediction process is then carried out in the following steps, sketched in code below:
  • Generate model parameters $\beta, \gamma, \theta, z_T$ from $\pi(\beta, \gamma, \theta, Z^{(T)} \mid Y^{(T)})$.
  • Predict the state variable $z_{T+1}$ from $f_{z_{T+1} \mid \beta, \gamma, z_T}(z_{T+1})$.
  • Finally, predict $y_{T+1}$ from $f_{y_{T+1} \mid z_{T+1}, \theta}(y_{T+1})$.
Note that step 1 can be done through the Gibbs algorithm.
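A minimal sketch of steps 2 and 3 for a single posterior draw, assuming the Gaussian-mixture observation model used later in the case study (function and variable names are ours):

```python
import numpy as np

def predict_one_step(beta, gamma, theta, x_next, z_T, rng):
    """One-step-ahead prediction from a single posterior draw.
    theta = (mu0, sd0, mu1, sd1); x_next holds the covariates at time T+1."""
    X_next = np.concatenate(([1.0], np.asarray(x_next, dtype=float)))
    pi = 1.0 / (1.0 + np.exp(-(X_next @ np.asarray(beta) + gamma * z_T)))
    z_next = int(rng.uniform() < pi)                             # step 2: draw z_{T+1}
    mu0, sd0, mu1, sd1 = theta
    y_next = rng.normal((mu0, mu1)[z_next], (sd0, sd1)[z_next])  # step 3: draw y_{T+1}
    return z_next, y_next

# Repeating this over many posterior draws (step 1, via the Gibbs algorithm)
# yields the posterior predictive distribution of y_{T+1}.
```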

5. Simulation Study

We present the results of a small simulation study designed to investigate the proposed model-based approach in terms of parameter estimation. We consider two models, based on a Gaussian mixture and an exponential mixture, respectively. Since our interest is in the effects of the previous hidden state and of the degree of mixing of the two distributions on estimation, both are varied in the simulation. For each model, we execute the following two steps:
  • Generate hidden states sequentially based on the transition probabilities given by the logistic regression.
  • Generate a sequence of observations from the mixture distribution corresponding to the hidden states.
Let $Y_t$ be the observation at time $t$, let $Z_t$ be the associated hidden state, and let the covariate $x_t$ be drawn from $N(0, 1)$. We consider the following eight models:
  • $y_t \mid z_t = 0 \sim N(\mu_0, \sigma_0^2)$, $y_t \mid z_t = 1 \sim N(\mu_1, \sigma_1^2)$, and $\mathrm{logit}(P(z_t = 1 \mid z_{t-1})) = \beta_0 + \beta_1 x_t + \gamma z_{t-1}$:
    • $(\mu_0 = 5, \sigma_0^2 = 4, \mu_1 = 13, \sigma_1^2 = 9)$ and $(\beta_0 = 1, \beta_1 = 2, \gamma = 0.9)$;
    • $(\mu_0 = 5, \sigma_0^2 = 4, \mu_1 = 13, \sigma_1^2 = 9)$ and $(\beta_0 = 1, \beta_1 = 2, \gamma = 0.3)$;
    • $(\mu_0 = 5, \sigma_0^2 = 9, \mu_1 = 15, \sigma_1^2 = 16)$ and $(\beta_0 = 1, \beta_1 = 2, \gamma = 0.9)$;
    • $(\mu_0 = 5, \sigma_0^2 = 9, \mu_1 = 15, \sigma_1^2 = 16)$ and $(\beta_0 = 1, \beta_1 = 2, \gamma = 0.3)$.
  • $y_t \mid z_t = 0 \sim \mathrm{Exp}(\theta_0)$, $y_t \mid z_t = 1 \sim \mathrm{Exp}(\theta_1)$, and $\mathrm{logit}(P(z_t = 1 \mid z_{t-1})) = \beta_0 + \beta_1 x_t + \gamma z_{t-1}$:
    • $(\theta_0 = 2, \theta_1 = 5)$ and $(\beta_0 = 1, \beta_1 = 2, \gamma = 0.9)$;
    • $(\theta_0 = 2, \theta_1 = 5)$ and $(\beta_0 = 1, \beta_1 = 2, \gamma = 0.4)$;
    • $(\theta_0 = 2, \theta_1 = 3)$ and $(\beta_0 = 1, \beta_1 = 2, \gamma = 0.9)$;
    • $(\theta_0 = 2, \theta_1 = 3)$ and $(\beta_0 = 1, \beta_1 = 2, \gamma = 0.4)$.
Note that the settings $(\mu_0 = 5, \sigma_0^2 = 4, \mu_1 = 13, \sigma_1^2 = 9)$ and $(\theta_0 = 2, \theta_1 = 5)$ correspond to mild mixing of the two distributions, while the settings $(\mu_0 = 5, \sigma_0^2 = 9, \mu_1 = 15, \sigma_1^2 = 16)$ and $(\theta_0 = 2, \theta_1 = 3)$ correspond to strong mixing. In addition, $\gamma$ controls the effect of the previous hidden state. The eight models were simulated, with the simulation exercise repeated 200 times, and averaged statistics were calculated. Table 1 and Table 2 show the averaged posterior means and posterior standard deviations for the parameters based on the Gaussian mixtures and the exponential mixtures, respectively. It turns out that the stronger the mixing and the weaker the influence of the hidden states, the greater the uncertainty in the estimation. Overall, however, the method demonstrates good performance.
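For illustration, the following sketch (ours, using the first Gaussian setting's parameter values; the series length is our choice) generates one synthetic dataset by the two steps above:

```python
import numpy as np

rng = np.random.default_rng(42)

# First Gaussian setting: mu0=5, sigma0^2=4, mu1=13, sigma1^2=9,
# beta0=1, beta1=2, gamma=0.9.
T = 500
beta0, beta1, gamma = 1.0, 2.0, 0.9
x = rng.normal(size=T)                      # covariate x_t ~ N(0, 1)

z = np.empty(T, dtype=int)
z_prev = rng.integers(2)                    # arbitrary starting state z_0
for t in range(T):                          # step 1: hidden states
    pi_t = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x[t] + gamma * z_prev)))
    z[t] = int(rng.uniform() < pi_t)
    z_prev = z[t]

# Step 2: observations from the mixture component selected by z_t
y = np.where(z == 1, rng.normal(13.0, 3.0, size=T), rng.normal(5.0, 2.0, size=T))
```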

6. Data Analysis

Matsutake mushrooms are a valuable forest product that can increase rural incomes by up to 100%. They are wild mushrooms that grow naturally in Pinus densiflora forests. In Korea, matsutake mushrooms have not been cultivated due to a lack of cultivation technology, and the impact of climate on matsutake mushroom yield has not been well studied. Previous studies have attempted to relate matsutake mushroom occurrence to weather patterns in Korea but have not been able to identify the specific meteorological factors that are most influential. Here, we use hidden Markov models (HMMs) to identify hidden states in a specific area of Bonghwa-gun, Gyeongsangbuk-do. We then use these hidden states to identify the meteorological factors that indirectly affect matsutake mushroom yield.

6.1. Data Description

The data for this study are the annual production of matsutake mushrooms (kg) observed in a region of Bonghwa-gun, Gyeongsangbuk-do, from 1997 to 2016. Matsutake mushrooms are harvested in Korea from late August to late October, with the peak harvest period occurring for about 10 days from late September to early October. The production of matsutake mushrooms can vary significantly depending on the weather. Therefore, this study considered the meteorological factors in May, June, July, and August from 1997 to 2016. The meteorological factors were combined and analyzed in terms of time and space. The detailed variables are summarized in Table 3.

6.2. Hierarchical Modeling

We here assume that the production of matsutake mushrooms is in one of two states: a lean year or a bumper year. Each hidden state has its own distribution of total matsutake mushroom production, which is assumed to be Gaussian. The persistence of each state varies from year to year because the transition probability is governed by the previous state and the predictors at each time point. Let $z_t$ denote the (unobserved) year state at time $t$ (i.e., $z_t = 1$ for a bumper year and $z_t = 0$ for a lean year), and let $y_t$ be the (observed) total matsutake mushroom production at time $t$ for $1 \le t \le T$. The total production $y_t$ is conditionally independent given the current year state $z_t$. For the hierarchical modeling, we now express the full conditional distributions more specifically.
We consider a Gaussian mixture for an observation $y_t$; that is, $f_1(y_t) = N(y_t \mid \theta_1, \sigma_1^2)$ and $f_0(y_t) = N(y_t \mid \theta_0, \sigma_0^2)$. We assume the non-informative prior $\pi(\beta, \gamma) \propto c$, where $c$ is a constant. Then, the full conditional distribution of the logistic parameters $(\beta, \gamma)$ is obtained as
$$\pi(\beta, \gamma \mid \theta, \pi_0, z_0, Z^{(T)}, Y^{(T)}) \propto \prod_{t=1}^{T} \left[\frac{\exp(\beta X_t + \gamma z_{t-1})}{1 + \exp(\beta X_t + \gamma z_{t-1})}\right]^{z_t} \left[\frac{1}{1 + \exp(\beta X_t + \gamma z_{t-1})}\right]^{1 - z_t}.$$
Based on the non-informative prior $\pi(\theta) \propto c$, where $c$ is a constant, the full conditional distribution of the data model parameters $\theta = (\theta_0, \theta_1, \sigma_0^2, \sigma_1^2)$ is given by
$$\pi(\theta \mid \beta, \gamma, \pi_0, z_0, Z^{(T)}, Y^{(T)}) \propto \prod_{t=1}^{T} \left[z_t f_1(y_t) + (1 - z_t) f_0(y_t)\right].$$
Assuming $\pi(\pi_0) \sim U(0, 1)$, the full conditional distributions of the hyperparameter $\pi_0$ and the initial state $z_0$ can be expressed as
$$\pi(\pi_0 \mid \beta, \gamma, \theta, z_0, Z^{(T)}, Y^{(T)}) \sim \mathrm{Beta}(z_0 + 1,\, 2 - z_0)$$
and
$$\pi(z_0 \mid \beta, \gamma, \theta, \pi_0, Z^{(T)}, Y^{(T)}) \sim \mathrm{Ber}\!\left(\frac{P(z_0 = 1 \mid \pi_0)\, f_{z_1 \mid \beta, \gamma, z_0 = 1}(z_1)}{\sum_{k=0}^{1} P(z_0 = k \mid \pi_0)\, f_{z_1 \mid \beta, \gamma, z_0 = k}(z_1)}\right),$$
where
$$f_{z_0 \mid \pi_0}(z_0)\, f_{z_1 \mid \beta, \gamma, z_0}(z_1) = \pi_0^{z_0} (1 - \pi_0)^{1 - z_0} \left[\frac{\exp(\beta X_1 + \gamma z_0)}{1 + \exp(\beta X_1 + \gamma z_0)}\right]^{z_1} \left[\frac{1}{1 + \exp(\beta X_1 + \gamma z_0)}\right]^{1 - z_1}.$$
Finally, the full conditional distribution of the hidden state $z_j$ is of the form
$$\pi(z_j \mid \beta, \gamma, \theta, \pi_0, z_0, Z_{-j}^{(T)}, Y^{(T)}) \sim \mathrm{Ber}\!\left(\frac{f_{y_j \mid z_j = 1, \theta}(y_j)\, f_{z_{j+1} \mid \beta, \gamma, z_j = 1}(z_{j+1})\, P(z_j = 1 \mid \beta, \gamma, z_{j-1})}{\sum_{k=0}^{1} f_{y_j \mid z_j = k, \theta}(y_j)\, f_{z_{j+1} \mid \beta, \gamma, z_j = k}(z_{j+1})\, P(z_j = k \mid \beta, \gamma, z_{j-1})}\right),$$
where
$$f_{y_j \mid z_j, \theta}(y_j)\, f_{z_{j+1} \mid \beta, \gamma, z_j}(z_{j+1})\, f_{z_j \mid \beta, \gamma, z_{j-1}}(z_j) = \left[z_j f_1(y_j) + (1 - z_j) f_0(y_j)\right] \times \prod_{k=j}^{j+1} \left[\frac{\exp(\beta X_k + \gamma z_{k-1})}{1 + \exp(\beta X_k + \gamma z_{k-1})}\right]^{z_k} \left[\frac{1}{1 + \exp(\beta X_k + \gamma z_{k-1})}\right]^{1 - z_k}$$
for $j = 1, \ldots, T-1$, and
$$\pi(z_T \mid \beta, \gamma, \theta, \pi_0, z_0, Z^{(T-1)}, Y^{(T)}) \sim \mathrm{Ber}\!\left(\frac{f_{y_T \mid z_T = 1, \theta}(y_T)\, P(z_T = 1 \mid \beta, \gamma, z_{T-1})}{\sum_{k=0}^{1} f_{y_T \mid z_T = k, \theta}(y_T)\, P(z_T = k \mid \beta, \gamma, z_{T-1})}\right),$$
where
$$f_{y_T \mid z_T, \theta}(y_T)\, f_{z_T \mid \beta, \gamma, z_{T-1}}(z_T) = \left[z_T f_1(y_T) + (1 - z_T) f_0(y_T)\right] \times \left[\frac{\exp(\beta X_T + \gamma z_{T-1})}{1 + \exp(\beta X_T + \gamma z_{T-1})}\right]^{z_T} \left[\frac{1}{1 + \exp(\beta X_T + \gamma z_{T-1})}\right]^{1 - z_T}.$$

6.3. Analysis

Before examining the parameters from the posterior distribution, we first perform a regression analysis using the production of matsutake mushrooms as the dependent variable to identify the meteorological factors that explain the variation in matsutake mushroom production. Two weather variables are found to be significant (see Table 4).
As a comparison for our proposed method, Altman's approach can be considered. Altman [11] considered the following model:
  • $\mathrm{logit}(P(z_t = 1 \mid z_{t-1} = k)) = \gamma_k$ at time point $t$.
  • Conditional on the hidden state $z_t$, the observation $y_t$ is assumed to be normally distributed with mean $\theta_t$ and variance $\sigma^2$, where $\theta_t = \alpha z_t + \beta X_t$.
This model can also be extended by allowing the transition probabilities to depend on covariates or random effects through logistic regression models. Table 5 shows the estimated coefficients with their standard errors for the comparison model. In Altman's approach, only the hidden state and the maximum temperature during June are found to be significant, which means that most meteorological variables do not directly affect the observed annual production of matsutake mushrooms.
For the Bayesian analysis of the proposed HMM based on logistic regression, two parallel chains are used to check the convergence of the chains. After generating 35,000 samples, the initial 15,000 samples are discarded to eliminate the influence of the initial values, and 1000 samples are then extracted by selecting every 20th sample to eliminate the autocorrelation. Gelman–Rubin (G-R) statistics are also checked (Gilks et al. [16]); a sketch of this diagnostic is given below. First, consider the sequence of hidden states $z_t$ corresponding to the matsutake production (see Figure 1). The dashed line represents the annual matsutake mushroom production, and the solid line shows the yearly hidden state corresponding to it: $z_t = 0$ indicates a lean year, and $z_t = 1$ a bumper year. Second, consider the logistic parameters $(\beta, \gamma)$. In the logistic regression based on the HMM, we use the hidden state in each year as the dependent variable of the logistic regression model to identify the meteorological factors that indirectly explain the variation in matsutake mushroom production (see Table 6). As a result of the analysis, four variables are found to be significant. The analysis shows that the total precipitation in August and the mean ground temperature in May affect matsutake mushroom production, in addition to the variables found by the previous model. There is also a significant shift in the transition probability depending on whether the previous year was a lean year or a bumper year.
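As a sketch of the convergence diagnostic (a standard two-chain Gelman–Rubin computation of ours, not the authors' code), the potential scale reduction factor for one parameter can be computed as follows:

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for one parameter.

    chains: array of shape (m, n) holding m parallel chains of n
    post-burn-in, thinned draws each (here m = 2 and n = 1000)."""
    m, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)     # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()       # within-chain variance
    var_hat = (n - 1) / n * W + B / n           # pooled variance estimate
    return np.sqrt(var_hat / W)                 # values near 1 indicate convergence
```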

7. Concluding Remarks

Covariate-dependent hidden Markov models, which extend HMMs by allowing the transition probabilities to depend on covariates, are a powerful class of statistical models and can be used to model a wider range of sequential data. In this paper, the Markovian property was achieved by incorporating the previous state variable into a logistic regression model. The logistic regression based on the hidden Markov model was applied to identify significant variables affecting the annual matsutake mushroom production (kg) observed in the region of Bonghwa-gun from 1997 to 2016. The proposed method differs from existing analysis methods by using the state variable in the logistic regression and the mixture distribution of the states, rather than using the observed values directly in the analysis. It is particularly useful for identifying the relevance of variables that are difficult to detect with existing models. As a result, we found additional meteorological factors affecting the annual production of matsutake mushrooms compared to the existing methods. The proposed covariate-dependent hidden Markov model can be a useful tool for sequential data in the presence of covariates and can be used in a variety of applications, including financial time series analysis, medical diagnosis, and customer segmentation. In addition, it can be applied to modeling covariates in HMMs regardless of the dimension of the state variable.

Author Contributions

Conceptualization, Y.K.; Methodology, Y.K.; Software, B.L.; Validation, J.P.; Formal analysis, B.L.; Data curation, J.P.; Writing—original draft, Y.K.; Supervision, Y.K. All authors have read and agreed to the published version of the manuscript.

Funding

The research was supported by the R&D Program for Forest Science Technology (Project No. 2019149B10-2323-0301) provided by the Korea Forest Service (Korea Forestry Promotion Institute).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Baum, L.E.; Petrie, T. Statistical inference for probabilistic functions of finite state Markov chains. Ann. Math. Stat. 1966, 37, 1554–1563. [Google Scholar] [CrossRef]
  2. Titterington, D.; Smith, A.; Makov, U. Statistical Analysis of Finite Mixture Distributions; John Wiley & Sons: New York, NY, USA, 1985. [Google Scholar]
  3. Zucchini, W.; MacDonald, I.L. Hidden Markov and Other Models for Discrete-Valued Time Series; Chapman & Hall/CRC: New York, NY, USA, 1997. [Google Scholar]
  4. Robert, C.P.; Celeux, G.; Diebolt, J. Bayesian estimation of hidden Markov chains: A stochastic implementation. Stat. Probab. Lett. 1993, 16, 77–83. [Google Scholar] [CrossRef]
  5. Chib, S. Calculating posterior distributions and modal estimates in Markov mixture models. J. Econom. 1996, 75, 79–97. [Google Scholar] [CrossRef]
  6. Campbell, M.J.; Macdonald, I.L.; Zucchini, W. Hidden Markov and other Models for Discrete-Valued Time Series. Biometrics 1998, 54, 394. [Google Scholar] [CrossRef]
  7. Li, J.; Pedrycz, W.; Jamal, I. Multivariate time series anomaly detection: A framework of Hidden Markov Models. Appl. Soft Comput. 2017, 60, 229–240. [Google Scholar] [CrossRef]
  8. Nguyen, N. An Analysis and Implementation of the Hidden Markov Model to Technology Stock Prediction. Risks 2017, 5, 62. [Google Scholar] [CrossRef]
  9. Rabiner, L.; Juang, B. An introduction to hidden Markov models. IEEE Acoust. Speech Signal Process. Newsl. 1986, 3, 4–16. [Google Scholar] [CrossRef]
  10. Marshall, G.; Jones, R.H. Multi-state models and diabetic retinopathy. Stat. Med. 1995, 14, 1975–1983. [Google Scholar] [CrossRef] [PubMed]
  11. Altman, R.M. Mixed hidden Markov models: An extension of the hidden Markov model to the longitudinal data setting. J. Am. Stat. Assoc. 2007, 102, 201–210. [Google Scholar] [CrossRef]
  12. Chamroukhi, F.; Samé, A.; Aknin, P.; Govaert, G. Model-based clustering with Hidden Markov Model regression for time series with regime changes. In Proceedings of the 2011 International Joint Conference on Neural Networks, San Jose, CA, USA, 31 July–5 August 2011; pp. 2814–2821. [Google Scholar]
  13. Maruotti, A. Mixed hidden Markov models for longitudinal data: An overview. Int. Stat. Rev. 2011, 79, 427–454. [Google Scholar] [CrossRef]
  14. Rubin, M.L.; Chan, W.; Yamal, J.-M.; Robertson, C.S. A joint logistic regression and covariate-adjusted continuous-time Markov chain model. Stat. Med. 2017, 36, 4570–4582. [Google Scholar] [CrossRef] [PubMed]
  15. Sarkar, S.; Zhu, X. Finite mixture model of hidden Markov regression with covariate dependence. Stat 2022, 11, e469. [Google Scholar] [CrossRef]
  16. Gilks, W.R.; Richardson, S.; Spiegelhalter, D.J. Markov Chain Monte Carlo in Practice; Chapman and Hall: London, UK, 1996. [Google Scholar]
Figure 1. Hidden states sampled from the posterior distribution for the total matsutake mushroom production of Bonghwa.
Table 1. Averaged posterior means (with posterior standard deviations) for parameters based on Gaussian mixtures.

| True $(\mu_0, \sigma_0^2, \mu_1, \sigma_1^2, \beta_0, \beta_1, \gamma)$ | $\mu_0$ | $\sigma_0^2$ | $\mu_1$ | $\sigma_1^2$ | $\beta_0$ | $\beta_1$ | $\gamma$ |
|---|---|---|---|---|---|---|---|
| (5, 4, 13, 9, 1, 2, 0.9) | 5.0219 (0.1983) | 4.1093 (0.5922) | 13.1212 (0.4342) | 8.9132 (1.4984) | 1.0426 (0.0897) | 1.9873 (0.1464) | 0.8852 (0.1742) |
| (5, 4, 13, 9, 1, 2, 0.3) | 4.9823 (0.2134) | 4.1232 (0.6312) | 12.8334 (0.4724) | 8.9342 (1.6243) | 1.0426 (0.0903) | 1.9873 (0.1483) | 0.2942 (0.1383) |
| (5, 9, 15, 16, 1, 2, 0.9) | 5.1423 (0.2072) | 9.2742 (0.6001) | 14.8976 (0.4872) | 16.5872 (1.9874) | 1.0634 (0.0932) | 1.9674 (0.1512) | 0.9853 (0.1932) |
| (5, 9, 15, 16, 1, 2, 0.3) | 5.2014 (0.2303) | 9.4234 (0.7423) | 14.8773 (0.5023) | 16.9532 (2.1532) | 1.0721 (0.1102) | 1.9532 (0.1772) | 0.2912 (0.1492) |
Table 2. Averaged posterior means (with posterior standard deviations) for parameters based on exponential mixtures.

| True $(\theta_0, \theta_1, \beta_0, \beta_1, \gamma)$ | $\theta_0$ | $\theta_1$ | $\beta_0$ | $\beta_1$ | $\gamma$ |
|---|---|---|---|---|---|
| (2, 5, 1, 2, 0.9) | 2.1032 (0.1072) | 4.9882 (0.2314) | 1.0723 (0.0932) | 1.991 (0.1498) | 0.8872 (0.1821) |
| (2, 5, 1, 2, 0.4) | 2.1032 (0.1212) | 4.9882 (0.2532) | 1.0723 (0.0987) | 1.991 (0.1534) | 0.4123 (0.1431) |
| (2, 3, 1, 2, 0.9) | 2.1432 (0.1523) | 2.9432 (0.2944) | 1.0934 (0.0963) | 1.9845 (0.1517) | 0.8773 (0.1889) |
| (2, 3, 1, 2, 0.4) | 2.1632 (0.1873) | 2.8982 (0.3214) | 1.0723 (0.1083) | 1.9741 (0.1562) | 0.3889 (0.1589) |
Table 3. Variables and their descriptions.

| Variable | Description |
|---|---|
| m_temp [5/6/7/8] | mean temperature during [May/June/July/August] (°C) |
| h_temp [5/6/7/8] | maximum temperature during [May/June/July/August] (°C) |
| l_temp [5/6/7/8] | minimum temperature during [May/June/July/August] (°C) |
| mdew_temp [5/6/7/8] | mean dew point temperature during [May/June/July/August] (°C) |
| ms_pre [5/6/7/8] | mean spot atmospheric pressure during [May/June/July/August] (hPa) |
| msea_pre [5/6/7/8] | mean sea-level pressure during [May/June/July/August] (hPa) |
| hsea_pre [5/6/7/8] | maximum sea-level pressure during [May/June/July/August] (hPa) |
| lsea_pre [5/6/7/8] | minimum sea-level pressure during [May/June/July/August] (hPa) |
| mwat_pre [5/6/7/8] | mean water vapor pressure during [May/June/July/August] (hPa) |
| hwat_pre [5/6/7/8] | maximum water vapor pressure during [May/June/July/August] (hPa) |
| lwat_pre [5/6/7/8] | minimum water vapor pressure during [May/June/July/August] (hPa) |
| sun [5/6/7/8] | amount of sunlight during [May/June/July/August] (hours) |
| daylight [5/6/7/8] | daylight ratio in [May/June/July/August] (%) |
| m_humid [5/6/7/8] | mean relative humidity in [May/June/July/August] (%) |
| l_humid [5/6/7/8] | minimum relative humidity in [May/June/July/August] (%) |
| precip [5/6/7/8] | total precipitation during [May/June/July/August] (mm) |
| day_precip [5/6/7/8] | largest daily precipitation in [May/June/July/August] (mm) |
| hhour_precip [5/6/7/8] | largest one-hour precipitation in [May/June/July/August] (mm) |
| hmin_precip [5/6/7/8] | largest ten-minute precipitation in [May/June/July/August] (mm) |
| m_wind [5/6/7/8] | mean wind speed during [May/June/July/August] (m/s) |
| h_wind [5/6/7/8] | maximum wind speed during [May/June/July/August] (m/s) |
| h_instwind [5/6/7/8] | maximum instantaneous wind speed during [May/June/July/August] (m/s) |
| lgra_temp [5/6/7/8] | minimum grass temperature during [May/June/July/August] (°C) |
| mground_temp [5/6/7/8] | mean ground temperature during [May/June/July/August] (°C) |
Table 4. Posterior means (Est.) and posterior standard deviations (S.E.) for parameters of the regression model.

| Variable | Intercept | h_temp6 | lwat_pre6 |
|---|---|---|---|
| Est. | −240,276.5654 | 153.3783 | 8741.5625 |
| S.E. | 110,388.0470 | 36.5569 | 3981.2952 |
Table 5. Posterior means (Est.) and posterior standard deviations (S.E.) by Altman's approach.

Model for transition probability:

| Variable | $\gamma_0$ | $\gamma_1$ |
|---|---|---|
| Est. | 0.4167 | 0.3750 |
| S.E. | 0.1263 | 0.1089 |

Model for observation:

| Variable | Intercept | h_temp6 | lwat_pre6 | precip8 | mground_temp5 | $Z_t$ |
|---|---|---|---|---|---|---|
| Est. | 190,359.78 | −6054.89 | 106.57 | 36.61 | 2518.04 | −51,275.40 |
| S.E. | 115,617.48 | 2408.28 | 1085.01 | 22.49 | 3561.73 | 8208.23 |
Table 6. Posterior means (Est.) and posterior standard deviations (S.E.) for parameters of the proposed model.

Mixture (observation) parameters:

| Variable | $\theta_0$ | $\theta_1$ | $\sigma_0$ | $\sigma_1$ |
|---|---|---|---|---|
| Est. | 11,136.43 | 58,555.87 | 8829.52 | 14,328.17 |
| S.E. | 284.76 | 395.82 | 149.51 | 263.08 |

Logistic (transition) parameters:

| Variable | Intercept | h_temp6 | lwat_pre6 | precip8 | mground_temp5 | $Z_{t-1}$ |
|---|---|---|---|---|---|---|
| Est. | −13.6391 | 0.2367 | −0.1378 | 0.0032 | 0.2520 | 0.3321 |
| S.E. | 2.4270 | 0.0482 | 0.0451 | 0.0005 | 0.0576 | 0.1533 |

