Forecasting Hotel Room Occupancy Using Long Short-Term Memory Networks with Sentiment Analysis and Scores of Customer Online Reviews

Chang, Yu-Ming; Chen, Chieh-Huang; Lai, Jung-Pin; Lin, Ying-Lei; Pai, Ping-Feng

doi:10.3390/app112110291


by Yu-Ming Chang ^1,2, Chieh-Huang Chen ^1, Jung-Pin Lai ^1, Ying-Lei Lin ^1 and Ping-Feng Pai ^1,3,*

^1 PhD Program in Strategy and Development of Emerging Industries, National Chi Nan University, Nantou 54561, Taiwan

^2 Department of Culinary Arts and Hotel Management, Hung Kuang University, Taichung 43302, Taiwan

^3 Department of Information Management, National Chi Nan University, Nantou 54561, Taiwan

^* Author to whom correspondence should be addressed.

Appl. Sci. 2021, 11(21), 10291; https://doi.org/10.3390/app112110291

Submission received: 6 September 2021 / Revised: 24 October 2021 / Accepted: 31 October 2021 / Published: 2 November 2021


Abstract

For hotel management, occupancy is a crucial indicator. Online customer reviews have gradually become the main reference for customers evaluating accommodation choices. Thus, this study employed online customer rating scores and review text provided by booking systems to forecast monthly hotel occupancy using long short-term memory networks (LSTMs). Online customer reviews of hotels in Taiwan in various languages were gathered, and Google’s natural language application programming interface was used to convert online customer reviews into sentiment scores. Five other forecasting models, namely back propagation neural networks (BPNN), general regression neural networks (GRNN), least squares support vector regression (LSSVR), random forest (RF), and Gaussian process regression (GPR), were employed to predict hotel occupancy using the same datasets. The numerical results indicated that the long short-term memory network model outperformed the other five models in terms of forecasting accuracy. Integrating hotel online customer review sentiment scores and customer rating scores can lead to more accurate results than using either type of score alone. The novelty and applicability of this study lie in the application of deep learning techniques to forecasting room occupancy rates in multilingual comment scenarios with data gathered from review text and customers’ rating scores. This study reveals that using long short-term memory networks with sentiment analysis of review text and customers’ rating scores is a feasible and promising alternative in forecasting hotel room occupancy.

Keywords:

sentiment analysis; online reviews; long short-term memory networks; forecast; hotel room occupancy

1. Introduction

Business performance management is significant for enterprises in any industry, especially in the tourism and hotel industry, because its service products are fixed in content and quantity. Once a room passes its sale date unsold, its fixed cost has still been incurred, but the room can no longer be sold, which directly affects the company’s revenue and operating performance. Therefore, the prediction and management of occupancy are important. Hotel performance management is usually composed of two aspects: revenue management and cost control. The discipline of revenue management includes two strategic levers, namely duration control and demand-based pricing [1]. Research on hotel demand consists of two major categories. One category involves developing new methods to improve the accuracy of demand prediction. This type of research usually employs many alternative forecasting models to predict travel or hotel demand and compares their forecasting performance. The other category involves identifying the factors that determine hotel demand and the relationships among them. Existing econometric models are applied to quantify the impact of demand factors using demand elasticity analysis [2].

A variety of factors influence hotel operating performance. In the past, historical sales data have been used to analyze demand patterns and trends in addition to hotel factors, such as geographic locations, hotel facilities, service quality, and marketing activities. Recently, with the rise of online booking platforms, customers’ behavior patterns in searching for and choosing hotel accommodations have changed. When online customers choose a hotel, they first specify the date and area. Then, they filter for hotels that meet their needs on the basis of the accommodation options provided by the online travel agent booking website. Finally, when they decide on accommodation, previous customer reviews tend to be an influential factor. Thus, more studies are using the analysis of online customer comments as the subject of hotel management research. The Tourism Bureau of the Ministry of Transportation in Taiwan publishes statistical indicators for ranking the operating performance of hotels, such as total revenue, annual occupancy rate, annual average price, and the annual average price of saleable rooms. The revenue of tourist hotels usually comes from the guest room and catering departments. However, the consumer reviews on booking websites mainly concern guests’ accommodation experience. Thus, hotels’ total operating incomes are not appropriate for further analysis. Furthermore, differences in total operating income and average room prices may be caused by the number of operating rooms, locations, and decor and facilities. Therefore, this study employed hotel room occupancy rates as a criterion for evaluating hotel operating performance. The rest of this study is organized as follows: Section 2 reviews the literature, Section 3 briefly introduces long short-term memory networks, Section 4 introduces the proposed forecasting architecture for hotel room occupancy, Section 5 presents the numerical results, and Section 6 presents the conclusions.

2. Related Literature

2.1. Online Customer Reviews Analysis

Recently, Online Travel Agencies (OTAs) have emerged, and customers can search, compare prices, and make online reservations through various platforms. Customer reviews have thus become increasingly important sources of information for customers making purchasing decisions. The literature on online customer reviews can be grouped into three categories. The first category involves using statistical analysis or hypothesis testing. Salehan and Kim [3] employed correlation analysis to investigate review text and sentiments. The study provided insights regarding the reviews’ impact on potential customers. De Pelsmacker et al. [4] used the number of reviews and review valence to analyze occupancy and revenue per available room. This study revealed that review volume and review valence influenced room occupancy and revenue per available room, respectively. Furthermore, hotel locations and room quality are the two most essential terms in review comments. Anagnostopoulou et al. [5] used statistical methods to analyze the impact of online customer reviews on financial profitability. It was reported that positive comments were significantly related to hotel financial performance. Liu et al. [6] employed customers’ reviews to analyze distributions of domestic and international customers’ hotel ratings. This study pointed out that domestic travel, international travel, language, and culture had a statistically significant impact on hotel preference and satisfaction. The second category involves employing data-driven approaches to explore the relationship between customers’ reviews and hotel performance. Fernandes et al. [7] used a cross-industry standard process for data mining to investigate the influences of online reviewers’ comments on restaurant performance. Phillips et al. [8] utilized a partial least squares path method to examine the influences of positive and negative reviews on revenue per available room. 
This study concluded that “stars” and percentages of positive comments had a critical impact on revenue per available room. The third category uses sentiment analysis tools, such as a sentiment dictionary, to understand online customer reviews and the impact of reviews. Morente-Molinera et al. [9] indicated that the users’ opinions were usually unstructured, and users were used to expressing opinions in their own ways. This study employed fuzzy ontology to homogenize information; therefore, subjective and unformatted information could be handled. Hu et al. [10] used standardized coefficients to analyze the interrelationship between ratings, sentiments, and sales rankings. The study reported that ratings did not have a significant impact on sales directly but did have an indirect impact through sentiments. Zhang et al. [11] utilized a majority decision algorithm and a stochastic gradient descent technique to classify sentimental orientations of WeChat texts. The numerical results indicated that the presented method outperformed the other four machine learning approaches.

The three categories of studies mentioned above employed descriptive statistics, hypothesis testing, correlation analysis, cluster analysis, and regression analysis to analyze the influence of customers’ online reviews on hotel performance. Some investigations focused on exploring the relationship between emotional words and ratings. However, these studies did not forecast hotel performance measures such as room occupancy.

2.2. Occupancy Rate Forecast

First, for the online data sources employed for occupancy rate predictions, Li et al. [12] pointed out that online data utilized to forecast hotel room occupancy includes Baidu Index [13], Google Trends [14], web traffic volume-Google Analytics [15,16], and Twitter tweets and retweets [17]. Second, the approaches employed in forecasting hotel room occupancy are depicted as follows. Aliyev et al. [18] analyzed time series data and applied the fuzzy c-means clustering algorithm to forecast hotel occupancy rates. Ampountolas [19] investigated hotel daily demand in a U.S. metropolitan city. The attributes used included temperature, holidays, and hotel ranking. The results indicated that a seasonal autoregressive integrated moving average with exogenous variables model was superior to the other forecasting models. Zhang et al. [20] forecasted hotel occupancy in Charleston, South Carolina, USA using weekly data. The authors reported that for the mid-term prediction of hotel stays in tourist destinations, the revised model provided more accurate prediction results than the original EEMD-ARIMA (ensemble empirical mode decomposition–autoregressive integrated moving average) model. Ginindza and Tichaawa [21] studied hotel occupancy rates in the kingdom of Swaziland using the factors of Airbnb room occupancy rates, foreign visitor arrival numbers, unemployment rates, and exchange rates with statistical approaches. The numerical result showed a positive relationship between the hotel occupancy rates and Airbnb room occupancy rates. Fiori and Foroni [22] integrated historical and advanced booking methods into a combination of prediction models. The results revealed that the proposed model had a practical impact on the design and implementation of effective demand-side policies for hotels. Assaf and Tsionas [23] proposed two Bayesian compression methods and applied them in the hotel occupancy rate forecasting. 
This study indicated that the presented Bayesian compression models outperformed the other forecasting models in terms of forecasting accuracy over all forecast horizons. Al Shehhi and Karathanasopoulos [24] employed the adaptive network-based fuzzy inference system to forecast hotel room prices in the Gulf Cooperation Council region. The authors claimed that forecasting results provided by the proposed adaptive network-based fuzzy inference system were superior to those of the other three forecasting models in terms of forecasting accuracy. Zhang et al. [25] used deep neural networks to predict the occupancy and prices of hotels on weekdays and peak days. The numerical results indicated that deep neural network models could significantly outperform the other seven models, including time-series forecasting models and machine learning models. Sánchez-Medina and Eleazar [26] employed artificial neural networks with genetic algorithms to forecast booking cancellations. The designed method could reach an accuracy of 98 percent and was significantly superior to the other three forecasting techniques. Wang and Duggasani [27] designed two long short-term memory models to predict the daily booking volume of hotels from arrival dates, booking dates, and the day of the week. The proposed method could improve the forecasting accuracy by about three percent compared with the best results generated by the other six machine learning methods. Huang and Zheng [28] proposed combining the agglomeration effect with the attention mechanism and a Bayesian optimization algorithm to predict the daily demand of hotels with agglomeration effects. The results showed that the proposed model was helpful in improving hotel revenue management by decreasing the gap between the potential demand and available capacity. Das et al. [29] used a light gradient boosting machine to predict demand with seasonality using historical booking data. 
In this study, customer booking patterns and potential trends in the booking curve were investigated. Then, daily, monthly, and holiday occupancy were predicted. The experiment results revealed that the light gradient boosting machine could generate more accurate forecasting results than the other five forecasting methods, including the time-series techniques. Lee et al. [30] proposed a combined neural network framework architecture with time series booked data and seasonal components to predict accommodation demand. It was reported that the designed model could generate more accurate forecasting results than the other time-series methods. Phumchusri and Ungtrakul [31] developed an artificial neural networks model to forecast daily hotel demand in Thailand. The numerical results showed that artificial neural networks could obtain more accurate forecasting results than the other forecasting models employed in the study.

According to the review of related literature, deep learning techniques have not been broadly explored in predicting hotel room occupancy rates. Furthermore, most of the previous research on online customer sentiment analysis has focused on reviews in English and lacks multilingual comment content. Thus, this study collected daily and monthly research data through web crawling and from government databases, and it used long short-term memory networks to forecast hotel room occupancy using sentiment analysis of review text and consumers’ rating scores in multilingual comment content. Five other machine learning forecasting approaches were used to conduct predictions with the same datasets to compare forecasting performance.

3. Long Short-Term Memory Networks

Long short-term memory networks extend recurrent neural networks to mitigate the vanishing gradient problem that arises when processing long sequences. They retain essential information for a long time so that earlier information can be associated with current tasks. Long short-term memory networks provide memory units that allow the network to update hidden units on the basis of new information. Thus, they are especially able to remember the most important information for future tasks. In addition, long short-term memory networks can use the tanh function as an activation function, which is a stable choice that is beneficial in regression problems. Figure 1 shows the neural network architecture of long short-term memory networks. Several variants are in use, such as bidirectional long short-term memory networks. The cell, or memory area, of a long short-term memory network is composed of the input gate, forget gate, output gate, and cell state. The forget gate controls whether the memory should be cleared and how much of it is carried into the next cell’s memory. The input gate controls whether the input value enters the memory. The output gate controls whether the updated value should be output to the next layer of the network. Together, these three gates control all additions to and deletions from the cell state. The long short-term memory network maps the input sequence x, represented by Equation (1):

x = (x_{t-1}, x_{t}, x_{t+1}, \dots, x_{t+n})    (1)

to the output sequence y, denoted by Equation (2).

y = (y_{t-1}, y_{t}, y_{t+1}, \dots, y_{t+n})    (2)

For mapping, the first step is to compute the input gate activation i_{t} and the candidate cell state \tilde{c}_{t}, represented by Equations (3) and (4), respectively.

i_{t} = \mathrm{sigmoid}(W_{i} [x_{t}, h_{t-1}] + b_{i})    (3)

\tilde{c}_{t} = \tanh(W_{c} [x_{t}, h_{t-1}] + b_{c})    (4)

The sigmoid is the activation function that maps a variable to a value between zero and one, W is the weight matrix, and b is the bias. The candidate \tilde{c}_{t} is created by a tanh layer, and the forget gate f_{t} is created by Equation (5).

f_{t} = \mathrm{sigmoid}(W_{f} [x_{t}, h_{t-1}] + b_{f})    (5)

The quantities i_{t}, \tilde{c}_{t}, and f_{t} are employed to generate the new state of the memory cell c_{t}, which can be expressed by Equation (6):

c_{t} = i_{t} \tilde{c}_{t} + f_{t} c_{t-1}    (6)

The output gate o_{t} is calculated by Equation (7).

o_{t} = \mathrm{sigmoid}(W_{o} [x_{t}, h_{t-1}] + b_{o})    (7)

Finally, the output vector h_{t} of long short-term memory networks is determined from the cell state c_{t} and the output gate o_{t} by Equation (8).

h_{t} = \tanh(c_{t}) o_{t}    (8)

The output is calculated from the cell state and the output gate activation, and the weight matrices and bias vectors are learned in the training phase. The long short-term memory network architecture has been successful and effective in dealing with vanishing gradient problems [32,33,34,35].
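As an illustration of Equations (3) to (8), the following is a minimal pure-Python sketch of a single LSTM time step. It is not the implementation used in this study: the function names (`sigmoid`, `affine`, `lstm_step`) and the `params` layout are assumptions made for the example, and the products in Equations (6) and (8) are taken elementwise.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W v + b, where W is a list of rows and v, b are lists."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following Equations (3)-(8).

    params (a hypothetical layout) maps each of "i", "c", "f", "o" to a
    (W, b) pair; W has one row per hidden unit and
    len(x_t) + len(h_prev) columns.
    """
    v = list(x_t) + list(h_prev)  # concatenation [x_t, h_{t-1}]

    def layer(name, activation):
        W, b = params[name]
        return [activation(z) for z in affine(W, v, b)]

    i_t = layer("i", sigmoid)        # Equation (3): input gate
    c_tilde = layer("c", math.tanh)  # Equation (4): candidate cell state
    f_t = layer("f", sigmoid)        # Equation (5): forget gate
    # Equation (6): new cell state, elementwise products
    c_t = [i * g + f * c for i, g, f, c in zip(i_t, c_tilde, f_t, c_prev)]
    o_t = layer("o", sigmoid)        # Equation (7): output gate
    # Equation (8): output vector
    h_t = [math.tanh(c) * o for c, o in zip(c_t, o_t)]
    return h_t, c_t
```

With all weights and biases set to zero, every gate equals 0.5 and the candidate state is zero, so the step simply halves the previous cell state; this makes a convenient sanity check for the update rule.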

4. The Proposed Hotel Occupancy Forecasting Architecture

Figure 2 shows this study’s proposed forecasting architecture for hotel room occupancy. The designed architecture uses the customer-generated experience content on the official database and online travel agent reservation websites to predict monthly hotel room occupancy rates in Taiwan. In particular, three parts are included in the framework, namely data collection, data preprocessing, and model testing. For data collection, hotel lists and customer reviews were gathered. In the data preprocessing stage, multilingual customer reviews were converted into sentiment scores, and the rating scores and sentiment scores were scaled into ranking variables. Then, the counts of each ranking variable were accumulated by month and mapped to the room occupancy in the next month. Finally, data were split into a training dataset and a testing dataset for modeling and performance evaluation of forecasting models.

4.1. Data Collection

This study collected data from the booking website Booking.com (accessed on 27 April 2021) and the Taiwan travel and accommodation website taiwanstay.net.tw, which is commissioned by the Tourism Bureau of the Ministry of Transportation of Taiwan. The customer hotel database provided by the Tourism Bureau contained monthly data, such as revenue, average room prices, occupancy rates, and number of employees. Hotels with both room occupancy rates in this database and customer reviews on the Booking.com website were included in this study. The datasets were based on the customer hotel database listed by the Tourism Bureau and text crawled from the booking website, including customers’ online rating scores and comments. Figure 3 depicts the data collection flow of this study. The monthly data collection period was from July 2017 to December 2019. In total, 53 hotels with 1590 monthly hotel records (53 hotels over 30 months) were obtained. The customer review page obtained from the Booking.com website is illustrated in Figure 4. The data of the hotels contain multiple customer review entries, including overall satisfaction scores for consumption experiences as well as positive and negative customer comments.

4.2. Data Preprocessing

In this study, online customer reviews gathered from websites were processed using the natural language application programming interface developed by Google (https://cloud.google.com/natural-language?hl=zh-tw (accessed on 27 April 2021)) to generate sentiment scores. Because the reviews were written in several languages, multilingual sentiment analysis was performed. Google’s natural language application programming interface is based on machine learning and provides user-friendly functions and interfaces for multilingual sentiment analysis. Some of its successful applications have been reported [36,37]. Figure 5 shows the flow chart of the sentiment analysis of customer reviews. Online customer reviews are divided into positive and negative comments. Therefore, this study separately calculated the sentiment scores of positive and negative comments and then integrated the scores. Table 1 lists data collected from the booking websites and the results of sentiment analysis. R_s refers to the customers’ rating scores, whereas P_s and N_s refer to positive and negative sentiment scores, respectively, obtained from the comments. In addition, the term Lang. refers to the language categories of the customer reviews.

For use in further analysis, this study transformed the sentiment scores of the review text and customers’ rating scores into ranking data. The sentiment scores of Google’s application programming interface were between −1 and 1. The greater the positive value, the stronger the positive sentiment. Conversely, the more negative the value, the stronger the negative sentiment. Thus, the study divided sentiment scores into seven ranks, as shown in Table 2. In Table 2, the fourth rank is “more than or equal to −0.1 and less than 0.1”. This rank represents a roughly “neutral” state. Negative reviews contain both important aspects and minor issues with respect to various ranks, with scores from −0.1 to −1. Customers’ ratings, illustrated in Table 3, are based on the hotel ratings of the booking website. Table 3 reflects major or minor issues collected from customers in terms of six ranks provided by booking systems with scores from 0 to 10. Then, by integrating Table 2 and Table 3, Table 4 illustrates the sample datasets employed in this study. To compare the performance of the sentiment analysis of the review text, customers’ rating scores, and the combination of both in forecasting monthly hotel occupancy rates, three datasets are listed in Table 5.
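The conversion of a sentiment score into one of seven ranks can be sketched as a simple binning step. In the sketch below, only the fourth-rank interval (at least −0.1 and less than 0.1) comes from the text; the other bin edges in `ASSUMED_EDGES`, the function name `sentiment_rank`, and the choice that rank 1 is the most negative are hypothetical placeholders and should be replaced by the actual boundaries in Table 2.

```python
from bisect import bisect_right

# Hypothetical bin edges. Only the interval [-0.1, 0.1) for the fourth rank is
# stated in the text; the remaining edges are placeholders, not Table 2 values.
ASSUMED_EDGES = [-0.7, -0.4, -0.1, 0.1, 0.4, 0.7]


def sentiment_rank(score, edges=ASSUMED_EDGES):
    """Map a sentiment score in [-1, 1] to a rank from 1 to len(edges) + 1.

    Each rank covers a half-open interval [lower, upper), so a score equal
    to an edge falls into the higher rank. Rank 1 is assumed to be the most
    negative sentiment.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("sentiment score must lie in [-1, 1]")
    return bisect_right(edges, score) + 1
```

Using half-open intervals keeps the boundary behavior consistent with the stated fourth rank: a score of exactly −0.1 is neutral, whereas a score of exactly 0.1 already belongs to the next rank.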

4.3. Modeling and Testing

The data were divided into two datasets: a modeling dataset and a testing dataset. The modeling dataset, with 1272 records (53 hotels over 24 months), included monthly data from July 2017 to June 2019, whereas the testing dataset, with 318 records (53 hotels over 6 months), consisted of monthly data from July 2019 to December 2019. The modeling dataset was employed to train forecasting models, and the testing dataset was used to evaluate the performance of forecasting models. Figure 6 illustrates the modeling dataset and the testing dataset. A one-month-ahead forecasting policy was used in this study. Therefore, data gathered from the booking website in a given month were employed to predict hotel room occupancy rates in the next month.
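The monthly accumulation of rank counts, the one-month-ahead pairing, and the chronological split can be sketched as follows. This is a minimal illustration under stated assumptions rather than the pipeline used in this study: the record layouts, the function names (`next_month`, `build_samples`, `chronological_split`), and the decision to split on the month of the input features are all hypothetical.

```python
from collections import Counter


def next_month(year_month):
    """Return the (year, month) tuple that follows the given one."""
    year, month = year_month
    return (year + 1, 1) if month == 12 else (year, month + 1)


def build_samples(reviews, occupancy, n_ranks=7):
    """Pair monthly rank counts with the next month's occupancy rate.

    reviews:   iterable of (hotel, (year, month), rank) records (assumed layout)
    occupancy: dict mapping (hotel, (year, month)) to an occupancy rate
    Returns a list of (hotel, (year, month), features, target) tuples, where
    features[r - 1] is the number of reviews with rank r in that month.
    """
    counts = {}
    for hotel, year_month, rank in reviews:
        counts.setdefault((hotel, year_month), Counter())[rank] += 1

    samples = []
    for hotel, year_month in sorted(counts):
        target_key = (hotel, next_month(year_month))
        if target_key not in occupancy:
            continue  # no occupancy figure available for the following month
        c = counts[(hotel, year_month)]
        features = [c.get(r, 0) for r in range(1, n_ranks + 1)]
        samples.append((hotel, year_month, features, occupancy[target_key]))
    return samples


def chronological_split(samples, first_test_month):
    """Split samples by feature month; earlier months form the modeling set."""
    train = [s for s in samples if s[1] < first_test_month]
    test = [s for s in samples if s[1] >= first_test_month]
    return train, test
```

Splitting by time rather than at random keeps every test month later than every training month, which matches the one-month-ahead setting in which future data are never available at forecast time.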

5. Numerical Results

In this study, six forecasting models were used to predict hotel room occupancy rates. The long short-term memory network model used in this study contained 64 neurons. The time step, batch size, and number of epochs were 1, 72, and 100, respectively. For the structure of backpropagation neural networks, one hidden layer with 10 hidden nodes was used. Table 6 lists the parameters used by the forecasting models in this study. Two indices, the mean absolute percentage error (MAPE) and the root mean square error (RMSE), expressed as Equations (9) and (10), were employed to measure the performance of the forecasting models. Table 7 shows the two measurements of forecasting accuracy generated by the six models with the three datasets. The long short-term memory network models generated more accurate forecasts than the other five forecasting models. According to the numerical results in Table 7, using both sentiment analysis of review text and customers’ rating scores in forecasting models yields more accurate results than using either sentiment analysis of review text or customers’ rating scores alone. Thus, in this study, hybrid data were recommended for hotel room occupancy forecasting. Figure 7, Figure 8 and Figure 9 plot the absolute error values of hotel occupancies provided by the forecasting models with the three datasets. A boxplot was used to plot the error values. The boxplot provides a clear view of the data distribution when there is a large number of values to represent visually [38].

MAPE(\%) = \frac{100}{N} \sum_{t=1}^{N} \left| \frac{Y_{t} - F_{t}}{Y_{t}} \right|    (9)

RMSE = \sqrt{\frac{\sum_{t=1}^{N} (Y_{t} - F_{t})^{2}}{N}}    (10)

where N is the number of forecasting periods, Y_t is the actual value at period t, and F_t is the forecast value at period t. In addition, the Wilcoxon signed-rank test [39] was performed to examine the significance of the accuracy improvement of the LSTM models. Table 8 lists the results of the Wilcoxon signed-rank test with α = 0.025. The results indicated that the LSTM models statistically significantly outperformed the other five forecasting models.
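Equations (9) and (10) translate directly into code. The following is a minimal sketch, with the function names `mape` and `rmse` chosen for this example; note that MAPE is undefined when any actual value Y_t is zero, so the sketch assumes strictly nonzero occupancy rates.

```python
import math


def _check_lengths(actual, forecast):
    """Reject empty inputs and inputs of different lengths."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equal in length")


def mape(actual, forecast):
    """Mean absolute percentage error in percent, as in Equation (9).

    Assumes every actual value is nonzero.
    """
    _check_lengths(actual, forecast)
    total = sum(abs((y - f) / y) for y, f in zip(actual, forecast))
    return 100.0 / len(actual) * total


def rmse(actual, forecast):
    """Root mean square error, as in Equation (10)."""
    _check_lengths(actual, forecast)
    squared = sum((y - f) ** 2 for y, f in zip(actual, forecast))
    return math.sqrt(squared / len(actual))
```

For example, actual values of 100 and 200 with forecasts of 110 and 180 each miss by 10 percent, so `mape([100.0, 200.0], [110.0, 180.0])` evaluates to approximately 10.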

6. Conclusions

Because of the expansion of online travel agent booking systems, the collection of customers’ opinions on accommodations has become more diversified. Customers’ online rating scores and review text in booking systems have become major sources of information on customers’ experiences; hence, they are essential inputs for forecasting hotel occupancy. This study used three datasets (a sentiment analysis of review text, customers’ rating scores, and the combination of both) to predict the monthly hotel occupancy rates with a one-month-ahead forecasting policy. In addition, multilingual comment contents were included in this study. Six forecasting models (LSTM, BPNN, GRNN, LSSVR, RF, and GPR) were applied to the same datasets. The numerical results indicated that the long short-term memory network model could provide more accurate hotel occupancy forecasts than the other five models on all three datasets. In addition, the hybrid data of the sentiment analysis of review text and customers’ rating scores produced the most accurate results among all forecasting models.

For future studies, more detailed data, such as attributes of hotels and customers, can be added as input data for forecasting models. The hotel attributes include international type, customer type, location, the number of rooms, and prices. The customer attributes may include gender, nationality, and stay type. Sentiment analysis and comparison according to clusters of different countries and the study of site-specific peculiarities could be paths for future study. With more detailed data, forecasting models are able to provide more informative results for hotel management and decision-making. In addition, the hotel room occupancy used in this study was gathered from public sectors that do not provide bed and breakfast room occupancy data. Bed and breakfast room occupancy data, mostly used for the Airbnb system, could be another possible direction for further study to examine the performance of the proposed hotel room occupancy forecasting architecture. Finally, because the COVID-19 crisis affected hotel room occupancy, forecasting with data collected in the COVID-19 period is a challenging topic worthy of future investigation.

Author Contributions

Conceptualization, P.-F.P.; data curation, Y.-M.C., C.-H.C. and J.-P.L.; formal analysis, Y.-M.C. and P.-F.P.; methodology, Y.-M.C. and P.-F.P.; software, C.-H.C., J.-P.L. and Y.-L.L.; visualization, Y.-M.C., J.-P.L. and Y.-L.L.; writing – original draft preparation, Y.-M.C. and P.-F.P.; writing – review and editing, P.-F.P.; funding acquisition, P.-F.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ministry of Science and Technology, the Republic of China, Taiwan, under the contract number MOST 109-2410-H-260-023.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

Kimes, S.E.; Chase, R.B.; Choi, S.; Lee, P.Y.; Ngonzi, E.N. Restaurant Revenue Management. Cornell Hotel Restaur. Adm. Q. 1998, 39, 32–39.
Wu, D.C.; Song, H.; Shen, S. New developments in tourism and hotel demand modeling and forecasting. Int. J. Contemp. Hosp. Manag. 2017, 29, 507–529.
Salehan, M.; Kim, D.J. Predicting the performance of online customer reviews: A sentiment mining approach to big data analytics. Decis. Support Syst. 2016, 81, 30–40.
De Pelsmacker, P.; van Tilburg, S.; Holthof, C. Digital marketing strategies, online reviews and hotel performance. Int. J. Hosp. Manag. 2018, 72, 47–55.
Anagnostopoulou, S.C.; Buhalis, D.; Kountouri, I.L.; Manousakis, E.G.; Tsekrekos, A.E. The impact of online reputation on hotel profitability. Int. J. Hosp. Manag. 2020, 32, 20–39.
Liu, Y.; Teichert, T.; Rossi, M.; Li, H.; Hu, F. Big data for big insights: Investigating language-specific drivers of hotel satisfaction with 412,784 user-generated reviews. Tour. Manag. 2017, 59, 554–563.
Fernandes, E.; Moro, S.; Cortez, P.; Batista, F.; Ribeiro, R. A data-driven approach to measure restaurant performance by combining online reviews with historical sales data. Int. J. Hosp. Manag. 2020, 94, 102830.
Phillips, P.; Zigan, K.; Silva, M.M.S.; Schegg, R. The interactive effects of online reviews on the determinants of Swiss hotel performance: A neural network analysis. Tour. Manag. 2015, 50, 130–141.
Morente-Molinera, J.; Kou, G.; Pang, C.; Cabrerizo, F.; Herrera-Viedma, E. An automatic procedure to create fuzzy ontologies from users’ opinions using sentiment analysis procedures and multi-granular fuzzy linguistic modelling methods. Inf. Sci. 2018, 476, 222–238.
Hu, N.; Koh, N.S.; Reddy, S.K. Ratings lead you to the product, reviews help you clinch it? The mediating role of online review sentiments on product sales. Decis. Support Syst. 2013, 57, 42–53.
Zhang, S.T.; Wang, F.F.; Duo, F.; Zhang, J.L. Research on the Majority Decision Algorithm based on WeChat sentiment classification. J. Intell. Fuzzy Syst. 2018, 35, 2975–2984.
Li, X.; Law, R.; Xie, G.; Wang, S. Review of tourism forecasting research with internet data. Tour. Manag. 2020, 83, 104245.
Zhang, B.; Pu, Y.; Wang, Y.; Li, J. Forecasting Hotel Accommodation Demand Based on LSTM Model Incorporating Internet Search Index. Sustainability 2019, 11, 4708.
Pan, B.; Wu, D.C.; Song, H. Forecasting hotel room demand using search engine data. J. Hosp. Tour. Technol. 2012, 3, 196–210.
Pan, B.; Yang, Y. Forecasting Destination Weekly Hotel Occupancy with Big Data. J. Travel Res. 2016, 56, 957–970.
Yang, Y.; Pan, B.; Song, H. Predicting Hotel Demand Using Destination Marketing Organization’s Web Traffic Data. J. Travel Res. 2013, 53, 433–447.
Bigne, E.; Oltra, E.; Andreu, L. Harnessing stakeholder input on twitter: A case study of short breaks in Spanish tourist cities. Tour. Manag. 2019, 71, 490–503.
Aliyev, R.; Salehi, S.; Aliyev, R. Development of fuzzy time series model for hotel occupancy forecasting. Sustainability 2019, 11, 793.
Ampountolas, A. Modeling and Forecasting Daily Hotel Demand: A Comparison Based on SARIMAX, Neural Networks, and GARCH Models. Forecasting 2021, 3, 580–595.
Zhang, M.; Li, J.; Pan, B.; Zhang, G. Weekly Hotel Occupancy Forecasting of a Tourism Destination. Sustainability 2018, 10, 4351.
Ginindza, S.; Tichaawa, T.M. The impact of sharing accommodation on the hotel occupancy rate in the kingdom of Swaziland. Curr. Issues Tour. 2017, 22, 1975–1991.
Fiori, A.M.; Foroni, I. Reservation Forecasting Models for Hospitality SMEs with a View to Enhance Their Economic Sustainability. Sustainability 2019, 11, 1274.
Assaf, A.G.; Tsionas, M.G. Forecasting occupancy rate with Bayesian compression methods. Ann. Tour. Res. 2019, 75, 439–449.
Al Shehhi, M.; Karathanasopoulos, A. Forecasting hotel room prices in selected GCC cities using deep learning. J. Hosp. Tour. Manag. 2019, 42, 40–50.
Zhang, Q.; Qiu, L.; Wu, H.; Wang, J.; Luo, H. Deep Learning Based Dynamic Pricing Model for Hotel Revenue Management. In Proceedings of the 2019 International Conference on Data Mining Workshops (ICDMW), Beijing, China, 8–11 November 2019.
Sánchez-Medina, A.J.; C.-Sánchez, E. Using machine learning and big data for efficient forecasting of hotel booking cancellations. Int. J. Hosp. Manag. 2020, 89, 102546. [Google Scholar] [CrossRef]
Wang, J.; Duggasani, A. Forecasting hotel reservations with long short-term memory-based recurrent neural networks. Int. J. Data Sci. Anal. 2018, 9, 77–94. [Google Scholar] [CrossRef]
Huang, L.; Zheng, W. Novel deep learning approach for forecasting daily hotel demand with agglomeration effect. Int. J. Hosp. Manag. 2021, 98, 103038. [Google Scholar] [CrossRef]
Das, R.; Chadha, H.; Banerjee, S. Multi-layered market forecast framework for hotel revenue management by continuously learning market dynamics. J. Revenue Pricing Manag. 2021, 20, 351–367. [Google Scholar] [CrossRef]
Lee, M.; Mu, X.; Zhang, Y. A machine learning approach to improving forecasting accuracy of hotel demand: A comparative analysis of neural networks and traditional models. Issues Inf. Syst. 2020, 21, 12–21. [Google Scholar]
Phumchusri, N.; Ungtrakul, P. Hotel daily demand forecasting for high-frequency and complex seasonality data: A case study in Thailand. J. Revenue Pricing Manag. 2019, 19, 8–25. [Google Scholar] [CrossRef]
Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–17808. [Google Scholar] [CrossRef]
Fischer, T.; Krauss, C. Deep learning with long short-term memory networks for financial market predictions. Eur. J. Oper. Res. 2018, 270, 654–669. [Google Scholar] [CrossRef] [Green Version]
Hua, Y.; Zhao, Z.; Li, R.; Chen, X.; Liu, Z.; Zhang, H. Deep Learning with Long Short-Term Memory for Time Series Prediction. IEEE Commun. Mag. 2019, 57, 114–119. [Google Scholar] [CrossRef] [Green Version]
Chandra, R.; Jain, A.; Chauhan, D.S. Deep learning via LSTM models for COVID-19 infection forecasting in India. arXiv 2021, preprint. arXiv:2101.11881. [Google Scholar]
White, T.E.; Rege, M. Sentiment Analysis on Google Cloud Platform. Issues Inf. Syst. 2020, 21, 221–228. [Google Scholar]
Luo, X.; Zimet, G.; Shah, S. A natural language processing framework to analyse the opinions on HPV vaccination reflected in twitter over 10 years (2008–2017). Hum. Vaccines Immunother. 2019, 15, 1496–1504. [Google Scholar] [CrossRef]
Tukey, J.W. Exploratory Data Analysis; Addison-Wesley: Boston, MA, USA, 1977; ISBN 978-0201076165. [Google Scholar]
Wilcoxon, F. Individual Comparisons by Ranking Methods. Biom. Bull. 1945, 1, 80. [Google Scholar] [CrossRef]

Figure 1. Architecture of a long short-term memory network.

Figure 2. The proposed hotel room occupancy forecasting architecture.

Figure 3. Data collection process.

Figure 4. Customer review content data items.

Figure 5. The flow chart of sentiment analysis generated from customer reviews.

Figure 6. The modeling dataset and the testing dataset.

Figure 7. Absolute errors with sentiment analysis of review text.

Figure 8. Absolute errors with customers’ rating scores.

Figure 9. Absolute errors with sentiment analysis of review text and customers’ rating scores.

Table 1. Sentiment analysis in multilingual comment content.

R_s (rating score)	Positive Reviews	Negative Reviews	P_s (sentiment score of positive review)	N_s (sentiment score of negative review)	Lang.
9.2	Clean, spacious room.	Corridors leading to room could …	0.9	−0.8	en
7.5	ロケーション抜群	施設が古い	0.9	−0.9	ja
7.9	交通特别方便，捷运出来…	空调声音略有点大	0.9	−0.4	zh
9.6	중앙역 바로 맞은 편이라…	특별한 것은 없음.	0.8	−0.4	ko
7.1	Hotel très bien placé, …	C’est dommage qu’un si grand…	0.9	−0.3	fr
10	ทำเลดีมาก. สดวก	บางครั้งน้ำร้อนไม่ค่อยพอ	0.8	−0.2	th
7.9	Ein sehr großes und …	Es waren sehr kalte und…	0.8	−0.3	de
7.9	La habitación era muy ….	El desayuno tenía poca oferta …	0.3	0	es
7.9	早餐好吃，但是幾乎沒什麼…	衛浴設備過度老舊	0.2	−0.7	zh-Hant

Table 2. Sentiment scales.

Rank	Abbreviations	Boundaries
1	X₁	less than −0.7
2	X₂	more than or equal to −0.7 and less than −0.4
3	X₃	more than or equal to −0.4 and less than −0.1
4	X₄	more than or equal to −0.1 and less than 0.1
5	X₅	more than or equal to 0.1 and less than 0.4
6	X₆	more than or equal to 0.4 and less than 0.7
7	X₇	more than or equal to 0.7
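
The boundaries in Table 2 can be applied with a sorted list of cut points. The following is a minimal sketch, not the authors' implementation; the names `SENTIMENT_CUTS` and `sentiment_rank` are hypothetical, and scores are assumed to lie in [−1, 1].

```python
from bisect import bisect_right

# Cut points from Table 2. Each interval includes its lower bound
# and excludes its upper bound, which matches bisect_right.
SENTIMENT_CUTS = [-0.7, -0.4, -0.1, 0.1, 0.4, 0.7]


def sentiment_rank(score: float) -> int:
    """Return the Table 2 rank (1..7, i.e. X1..X7) for a sentiment score."""
    return bisect_right(SENTIMENT_CUTS, score) + 1


# Scores taken from the first row of Table 1.
print(sentiment_rank(0.9))   # positive review score
print(sentiment_rank(-0.8))  # negative review score
```

With these cut points, a positive score of 0.9 falls in rank 7 (X₇) and a negative score of −0.8 falls in rank 1 (X₁).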

Table 3. Customer rating scales.

Rank	Abbreviations	Boundaries
1	X₈	less than or equal to 6.9
2	X₉	between 7.0 and 7.9
3	X₁₀	between 8.0 and 8.4
4	X₁₁	between 8.5 and 8.9
5	X₁₂	between 9.0 and 9.4
6	X₁₃	more than or equal to 9.5
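
The rating scales in Table 3 can be coded the same way. This sketch assumes that ratings are reported to one decimal place, so the lower bound of each band can serve as a cut point; `RATING_CUTS` and `rating_rank` are hypothetical names.

```python
from bisect import bisect_right

# Lower bounds of ranks 2..6 in Table 3 (assumes one-decimal ratings,
# so no value falls between 6.9 and 7.0, 7.9 and 8.0, and so on).
RATING_CUTS = [7.0, 8.0, 8.5, 9.0, 9.5]


def rating_rank(rating: float) -> int:
    """Return the Table 3 rank (1..6, i.e. X8..X13) for a customer rating."""
    return bisect_right(RATING_CUTS, rating) + 1


# Ratings taken from Table 1.
for r in (9.2, 7.5, 10):
    print(r, "-> rank", rating_rank(r))
```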

Table 4. All independent variables and monthly occupancy rates.

X₁	X₂	X₃	X₄	X₅	X₆	X₇	X₈	X₉	X₁₀	X₁₁	X₁₂	X₁₃	Y (%)
3	3	1	2	3	2	7	0	2	0	1	3	5	56.23
5	10	3	1	4	6	20	5	4	2	2	3	15	86.20
2	5	0	1	2	3	11	1	1	0	1	1	11	45.13
5	7	3	1	4	5	12	4	7	1	2	1	9	64.90
2	2	1	0	2	1	6	2	2	0	1	1	4	67.35
0	1	0	0	1	0	1	0	0	2	0	2	0	11.96
3	4	2	0	2	4	14	3	5	1	2	3	9	74.91
0	2	0	0	0	2	1	0	0	0	0	2	3	77.18
4	7	3	1	2	4	14	1	3	0	8	5	8	60.81
5	7	5	0	8	8	16	3	4	5	8	2	13	66.00
1	2	0	1	0	0	5	3	0	1	0	0	2	19.31

Y (%) = monthly occupancy rates.
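
One way to obtain a row like those in Table 4 is to count, for each month, how many scores fall into each rank of Tables 2 and 3. The sketch below assumes that X₁–X₁₃ are such monthly counts, which the table itself does not state; the function name and the sample inputs are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

SENTIMENT_CUTS = [-0.7, -0.4, -0.1, 0.1, 0.4, 0.7]  # Table 2
RATING_CUTS = [7.0, 8.0, 8.5, 9.0, 9.5]             # Table 3


def monthly_features(sentiments, ratings):
    """Count scores per rank and return [X1..X7] + [X8..X13] for one month."""
    s_counts = Counter(bisect_right(SENTIMENT_CUTS, s) for s in sentiments)
    r_counts = Counter(bisect_right(RATING_CUTS, r) for r in ratings)
    return [s_counts[i] for i in range(7)] + [r_counts[i] for i in range(6)]


# Hypothetical month with two reviews (scores borrowed from Table 1).
row = monthly_features(sentiments=[0.9, -0.8, 0.9, -0.9], ratings=[9.2, 7.5])
print(row)
```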

Table 5. The three divided datasets used in this study.

	Content of Data	Variables
Dataset I	sentiment analysis of review text	X₁–X₇
Dataset II	customers’ rating scores	X₈–X₁₃
Dataset III	sentiment analysis of review text and customers’ rating scores	X₁–X₁₃
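
The three datasets in Table 5 are column subsets of the 13 variables in Table 4. As an illustration only, the subsets can be expressed as slices over a feature row; `DATASET_COLUMNS` and `select_features` are hypothetical names.

```python
# Column ranges from Table 5, as zero-based slices over [X1..X13].
DATASET_COLUMNS = {
    "I": slice(0, 7),     # X1-X7: sentiment analysis of review text
    "II": slice(7, 13),   # X8-X13: customers' rating scores
    "III": slice(0, 13),  # X1-X13: both
}


def select_features(row, dataset):
    """Return the independent variables of one Table 4 row for a dataset."""
    return row[DATASET_COLUMNS[dataset]]


# First data row of Table 4 (without the occupancy rate Y).
first_row = [3, 3, 1, 2, 3, 2, 7, 0, 2, 0, 1, 3, 5]
print(select_features(first_row, "I"))
print(select_features(first_row, "II"))
```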

Table 6. Parameters of forecasting models.

Model	Parameters	Dataset I	Dataset II	Dataset III
LSTM	Learning rate	0.020	0.029	0.016
LSTM	Beta_1	0.725	0.883	0.900
LSTM	Beta_2	0.812	0.913	0.997
LSTM	Decay	0.023	0.012	0.004
BPNN	Learning rate	0.900	0.200	0.100
BPNN	Momentum	0.500	0.800	0.800
GRNN	Spread	0.100	0.200	0.900
LSSVR	γ	480	500	350
LSSVR	σ	1	50	100
RF	Number of trees	1000	1100	1050
RF	Minimum number of samples to split	2	4	4
RF	Minimum number of samples at a leaf node	1	2	3
RF	Maximum depth	50	54	60
GPR	Kernel bounds	[−11,11]; [−11,11]	[−16,16]; [−11,11]	[−18,18]; [−11,11]
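
The LSTM parameter names in Table 6 (learning rate, Beta_1, Beta_2, decay) resemble those of an Adam-style optimizer. Assuming that reading, which Table 6 does not confirm, the sketch below shows how the four values would enter a single update of one scalar weight. The epsilon constant and the time-based decay schedule are further assumptions, and `adam_step` is a hypothetical helper.

```python
import math

# LSTM values for Dataset III from Table 6.
LEARNING_RATE, BETA_1, BETA_2, DECAY = 0.016, 0.900, 0.997, 0.004
EPSILON = 1e-8  # assumed; not listed in Table 6


def adam_step(theta, grad, m, v, t):
    """One Adam-style update of a scalar weight at step t (t starts at 1)."""
    # Assumed schedule: the learning rate shrinks as 1 / (1 + decay * steps).
    lr_t = LEARNING_RATE / (1.0 + DECAY * (t - 1))
    m = BETA_1 * m + (1.0 - BETA_1) * grad          # first-moment estimate
    v = BETA_2 * v + (1.0 - BETA_2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - BETA_1 ** t)                 # bias correction
    v_hat = v / (1.0 - BETA_2 ** t)
    theta -= lr_t * m_hat / (math.sqrt(v_hat) + EPSILON)
    return theta, m, v


theta, m, v = adam_step(theta=0.0, grad=1.0, m=0.0, v=0.0, t=1)
print(theta)
```

On the first step the bias-corrected moments cancel the gradient scale, so the weight moves by approximately the learning rate.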

Table 7. Measurements of forecasting accuracy by six models with different datasets.

Datasets	Measurements	LSTM	BPNN	GRNN	LSSVR	RF	GPR
Dataset I	MAPE (%)	16.618	17.391	20.254	19.449	21.747	22.447
Dataset I	RMSE	13.221	14.698	17.364	16.494	18.753	22.043
Dataset II	MAPE (%)	16.883	17.898	20.721	19.062	20.777	25.296
Dataset II	RMSE	13.494	15.177	18.139	16.876	17.210	25.673
Dataset III	MAPE (%)	16.416	17.006	19.717	18.905	21.471	21.049
Dataset III	RMSE	13.076	13.826	17.197	16.401	17.946	20.624
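
For reference, the two accuracy measures in Table 7 can be computed as follows. This sketch uses the standard definitions of the mean absolute percentage error (MAPE) and the root mean square error (RMSE); the occupancy values in the usage example are hypothetical and are not taken from the study.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the same unit as the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical monthly occupancy rates (%) and forecasts.
actual = [50.0, 80.0]
predicted = [45.0, 88.0]
print(round(mape(actual, predicted), 3), round(rmse(actual, predicted), 3))
```

Because occupancy is itself a percentage, the RMSE is expressed in percentage points, whereas the MAPE is relative to each month's actual occupancy.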

Table 8. The results of the Wilcoxon signed-rank test with α = 0.025.

Dataset	Measurements	LSTM vs. LSSVR	LSTM vs. BPNN	LSTM vs. GRNN	LSTM vs. GPR	LSTM vs. RF
Dataset I	Z Value	4.042	3.296	4.534	7.507	5.903
Dataset I	Significance	Yes	Yes	Yes	Yes	Yes
Dataset I	Positive number	184	188	185	206	192
Dataset I	Negative number	134	130	133	112	126
Dataset II	Z Value	3.821	3.327	4.204	5.545	5.315
Dataset II	Significance	Yes	Yes	Yes	Yes	Yes
Dataset II	Positive number	184	177	180	193	195
Dataset II	Negative number	134	141	138	125	123
Dataset III	Z Value	4.065	2.783	3.164	3.849	6.052
Dataset III	Significance	Yes	Yes	Yes	Yes	Yes
Dataset III	Positive number	194	174	178	179	198
Dataset III	Negative number	124	144	140	139	120
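
A Z value and the positive and negative counts of the kind reported in Table 8 can be obtained from paired forecasting errors with the normal approximation of the Wilcoxon signed-rank test. The sketch below is a textbook version without a tie correction for the variance, not the authors' procedure. It assumes that each difference is the competing model's absolute error minus the LSTM's absolute error, so a positive difference favors the LSTM; `wilcoxon_z` and the sample errors are hypothetical.

```python
import math


def wilcoxon_z(errors_other, errors_lstm):
    """Return (z, n_positive, n_negative) for paired absolute errors."""
    # Zero differences are discarded, as in the classical test.
    diffs = [o - l for o, l in zip(errors_other, errors_lstm) if o != l]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Tied absolute differences share the average of their ranks.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    n_positive = sum(1 for d in diffs if d > 0)
    return (w_plus - mean) / sd, n_positive, n - n_positive


# Hypothetical absolute errors for four test months.
z, n_pos, n_neg = wilcoxon_z([5.0, 6.0, 7.0, 8.0], [1.0, 1.0, 1.0, 1.0])
print(round(z, 4), n_pos, n_neg)
```

The resulting Z value is then compared with the critical value of the standard normal distribution for the chosen α.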

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

