Brief Report

Twitter-Based Sentiment Analysis and Topic Modeling of Social Media Posts Using Natural Language Processing, to Understand People’s Perspectives Regarding COVID-19 Booster Vaccine Shots in India: Crucial to Expanding Vaccination Coverage

Praveen SV
Jose Manuel Lorenz
Rajesh Ittamalla
Kuldeep Dhama
Chiranjib Chakraborty
Daruri Venkata Srinivas Kumar
Thivyaa Mohan
Department of Management Studies, National Institute of Technology, Tiruchirappalli 620015, Tamil Nadu, India
Centro Tecnológico de la Carne de Galicia, Avda. Galicia n° 4, Parque Tecnológico de Galicia, San Cibrao das Viñas, 32900 Ourense, Spain
Facultade de Ciencias de Ourense, Universidade de Vigo, Área de Tecnoloxía dos Alimentos, 32004 Ourense, Spain
Department of Management Studies, Indian Institute of Technology, Hyderabad 502285, Telangana, India
Division of Pathology, Indian Veterinary Research Institute, Izatnagar, Bareilly 243122, Uttar Pradesh, India
Department of Biotechnology, School of Life Science and Biotechnology, Admas University, Kolkata 700126, West Bengal, India
School of Management Studies, University of Hyderabad, Hyderabad 500046, Telangana, India
Author to whom correspondence should be addressed.
Vaccines 2022, 10(11), 1929;
Submission received: 9 September 2022 / Revised: 5 November 2022 / Accepted: 10 November 2022 / Published: 15 November 2022
(This article belongs to the Collection COVID-19 Vaccine Hesitancy: Correlates and Interventions)

Abstract

This study analyzed Indians' perceptions of COVID-19 booster doses using natural language processing techniques, particularly sentiment analysis and topic modeling. The data were tweets posted by Indian citizens. In late July 2022, the Indian government accelerated the COVID-19 booster vaccination drive. Understanding citizens' emotions and concerns about a health policy can help the government, health officials, and policymakers implement that policy efficiently so that the desired results are achieved. A total of 76,979 tweets were analyzed. Sentiment analysis showed that more than half of these tweets (n = 40,719; 52.9%) expressed negative sentiments, 24,242 (31.5%) expressed neutral sentiments, and 12,018 (15.6%) expressed positive sentiments. Negative social media posts by Indians about COVID-19 booster doses focused on two feelings: that younger people do not need booster doses and that the vaccinations are unhealthy.

1. Introduction

The first case of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was recorded in late 2019 in China. Within a few months, the disease had spread rapidly and caused a devastating pandemic affecting more than 200 countries worldwide [1,2,3]. At the time of writing (5 September 2022), more than 600 million cases and nearly 6.5 million deaths had been reported worldwide, including around 44.5 million cases and about 528,000 deaths in India alone [4]. According to a modelling study published in The Lancet Infectious Diseases, COVID-19 vaccination reduced the potential global death toll by nearly two-thirds in its first year, saving an estimated 19.8 million lives [5]. Governments worldwide have recommended that their citizens receive two vaccine doses to gain adequate immunity [6]. The initial two doses protect against severe COVID-19 and death, but this immunity wanes over time. SARS-CoV-2 also keeps mutating, and newer variants continue to emerge that can evade host immunity. Booster shots are therefore needed to sustain protective levels of immunity [3,7,8]. Emerging SARS-CoV-2 variants of concern (VOCs), such as Delta, Omicron, and the Omicron lineages, have had significant adverse impacts by overcoming the protective immunity induced by COVID-19 vaccines and antibody-based therapies. The results have been vaccine breakthrough infections, re-infections, and surges in cases and deaths across successive waves of the pandemic.
Efforts are therefore under way on two fronts. The first is developing more effective vaccines, including variant-specific, mutation-proof, and universal next-generation vaccines. The second is administering additional doses (booster shots) that strengthen protective immunity against emerging variants [9,10,11,12,13,14,15]. Several vaccines have been developed and a global vaccination drive is ongoing. Even so, vaccine hesitancy, vaccine diplomacy, and inequitable access to vaccines, particularly in low- and middle-income countries, have hindered that drive, including the uptake of booster shots, and have contributed to the sustained global burden of COVID-19. Global vaccination coverage therefore needs to be expanded holistically [16,17,18,19,20,21].

2. Materials and Methods

In this study, we analyzed the social media posts of Indians to understand their perspectives on COVID-19 booster doses and the concerns they shared about them. Recent studies suggest that analyzing social media data is a useful way to predict, monitor, and help prevent health crises and pandemics [22,23]. Governments and policymakers need to understand public opinion about the health policies they implement, because a policy that lacks the support of most of the population is unlikely to achieve its intended results.
Since the beginning of the COVID-19 pandemic, Twitter has become a medium through which people express their experiences, emotions, and views on health policies, so we chose tweets as the data source for this study. Several studies conducted in the early stages of the pandemic used Twitter data to analyze the situation and to understand how the general public perceived health policies and other aspects of the pandemic [24,25]. To implement successful policies and promote adequate disease prevention strategies and public safety measures, government officials and policymakers must understand citizens' beliefs and perceptions about COVID-19 vaccination and booster doses. In this study, we used natural language processing (NLP) techniques, in particular sentiment analysis and topic modeling, to understand the Indian general public's perceptions of COVID-19 booster doses.

2.1. Data Collection

Tweets containing the words "COVID-19 Booster" posted by users in India between 1 March 2022 and 7 September 2022 were scraped using the Python library Twint. Twint is an advanced Python Twitter scraping tool that lets researchers access Twitter data without the Twitter Application Programming Interface (API), which is why we used it for data collection [26]. After removing duplicate tweets and tweets in languages other than English, 76,979 English tweets remained.

Non-English tweets were excluded because most Indian tweets in other languages (such as Hindi, Tamil, and Telugu) were code-mixed with English. For example, most Hindi tweets were written not in the Devanagari script but as Hindi words transliterated into the English alphabet. Sentiments and topics are difficult to extract reliably from such text, so these tweets were removed from the corpus. Because the tweets in the dataset came from users in different states of India, the results reflect opinions from across the country rather than from a single region.
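The filtering step described above (date window, English only, no duplicates) can be sketched in a few lines. This is a minimal illustration and not the authors' code. It assumes each scraped tweet is a dictionary with hypothetical fields `date` (ISO string), `language`, and `tweet`, and it treats tweets that differ only in case or spacing as duplicates, which is also an assumption. A language tag alone may not catch every romanized, code-mixed tweet.

```python
from datetime import date

# Hypothetical record layout: {"date": "YYYY-MM-DD", "language": "en", "tweet": "..."}.
# Field names are assumptions for illustration, not the study's actual schema.
START, END = date(2022, 3, 1), date(2022, 9, 7)  # inclusive window (assumed)


def filter_tweets(records):
    """Keep English tweets inside the study window, dropping duplicates."""
    seen = set()
    kept = []
    for rec in records:
        day = date.fromisoformat(rec["date"])
        if not (START <= day <= END):
            continue  # outside 1 March to 7 September 2022
        if rec["language"] != "en":
            continue  # non-English tweets are excluded
        # Normalise case and whitespace so trivial variants count as duplicates.
        key = " ".join(rec["tweet"].lower().split())
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept
```

For example, two tweets reading "Booster dose today" and "booster  dose TODAY" collapse to a single record.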

2.2. Data Cleaning

Data cleaning is a vital step in text analytics. Before starting the analysis, we removed everything not needed for textual analysis, including stop words, punctuation, URLs, and other unwanted entities. Stop words are words such as 'a', 'an', and 'the' that carry little meaning on their own and are therefore not needed for the analysis. We also stemmed and lemmatized the words in the corpus. Stemming reduces words to a root form by chopping off their endings, for example 'goals -> goal' and 'pens -> pen' [27]. Lemmatization maps the different inflected forms of a word to a single dictionary form (its lemma), for example 'vaccinated -> vaccinate', so that these forms are analyzed together [28]. The data collection and pre-processing steps are summarized in Figure 1.
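A minimal sketch of the cleaning steps named above follows. It uses only the standard library, so both the stop-word list and the stemmer are deliberately tiny stand-ins. `crude_stem` only strips a plural "s" to illustrate the idea of chopping endings; it is not the Porter algorithm or whatever stemmer the study used. Lemmatization needs a dictionary (for example NLTK's `WordNetLemmatizer`) and is not shown.

```python
import re
import string

# Tiny illustrative stop-word list; real pipelines load a full list.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "for", "my"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def crude_stem(word):
    """Toy suffix stripper (goals -> goal, pens -> pen); not Porter stemming."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def clean_tweet(text):
    """Remove URLs, mentions, punctuation and stop words, then stem."""
    text = URL_RE.sub(" ", text)                 # drop URLs
    text = MENTION_RE.sub(" ", text)             # drop @mentions
    text = text.lower().translate(PUNCT_TABLE)   # lowercase, strip punctuation
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return [crude_stem(t) for t in tokens]
```

For instance, `clean_tweet("The goals of the booster drive: see https://t.co/xyz @MoHFW_INDIA #pens")` yields `["goal", "booster", "drive", "see", "pen"]`.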

2.3. Sentiment Analysis

Sentiment analysis is an automated method for extracting and analyzing subjective judgments about different aspects of an item or entity. It helps reveal the attitude of a text and the emotions expressed by its author [29]. Understanding the general public's sentiments about a particular issue, such as a health policy, can help governments and policymakers judge whether that policy has public support. In this study, we used sentiment analysis to understand Indian social media users' perceptions of COVID-19 booster doses, with the Python library TextBlob. TextBlob's default sentiment analyzer is lexicon-based. It follows the bag-of-words model and uses a predefined dictionary that scores words as positive or negative. It goes through each word in a document, assigns a score to each word it finds in the dictionary, and determines the document's overall score with a pooling operation (averaging the individual scores) [30]. The resulting polarity ranges from −1 (most negative) to +1 (most positive), and each document is classified as positive, negative, or neutral on the basis of this score.
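The average-then-classify idea above can be sketched with a toy lexicon. This is an illustration under stated assumptions, not TextBlob itself. The word scores are invented, and mapping a score above, below, or equal to zero to positive, negative, or neutral is an assumed convention. TextBlob's real lexicon is much larger and also adjusts for negations and intensifiers, which this sketch does not. With TextBlob installed, the corresponding score is `TextBlob(text).sentiment.polarity`.

```python
# Hypothetical mini-lexicon with polarities in [-1, 1]; values are made up.
LEXICON = {
    "safe": 0.5,
    "good": 0.7,
    "protect": 0.4,
    "waste": -0.5,
    "unhealthy": -0.6,
    "bad": -0.7,
}


def polarity(text):
    """Pool word scores by averaging over the words found in the lexicon."""
    scores = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def label(text):
    """Map the pooled score to a class (thresholding at 0 is an assumption)."""
    p = polarity(text)
    if p > 0:
        return "positive"
    if p < 0:
        return "negative"
    return "neutral"
```

Here `label("booster is a waste")` gives `"negative"`, and a tweet with no lexicon words, such as `"booster dose today"`, is `"neutral"`.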

2.4. Topic Modeling

Sentiment analysis shows how the public feels about a particular health policy, but identifying the factors that drive those feelings requires topic modeling. Topic models are generative statistical models that capture the main themes of a collection of texts. Latent Dirichlet allocation (LDA) is a prominent topic modeling technique used to uncover the underlying themes on which a corpus is built. LDA follows the bag-of-words model and rests on two assumptions [31]: every document in the corpus is a mixture of topics, and every topic is a probability distribution over words [32]. The Dirichlet distribution used in LDA is a probability distribution whose samples are themselves probability distributions. A graphical representation of the LDA model is shown in Figure 2.
In this graphical model, all nodes are random variables, and the observed variable (Wd,n) is shaded. Alpha (α) is the Dirichlet parameter for the per-document topic proportions θd. Zd,n is the topic assigned to the n-th word of document d, and Wd,n is that observed word. K is the number of topics, N is the number of words in a document, and D is the total number of documents. βk is the probability distribution over the vocabulary for topic k, and eta (η) is the Dirichlet hyperparameter for these topic–word distributions.
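For reference, the generative process that this graphical model encodes can be written out as below. This is the standard textbook formulation of LDA, expressed in the symbols defined above, with N_d denoting the number of words N in document d. It is not reproduced from the authors' materials.

```latex
\begin{aligned}
\beta_k &\sim \operatorname{Dirichlet}(\eta), && k = 1,\dots,K\\
\theta_d &\sim \operatorname{Dirichlet}(\alpha), && d = 1,\dots,D\\
z_{d,n} \mid \theta_d &\sim \operatorname{Categorical}(\theta_d), && n = 1,\dots,N_d\\
w_{d,n} \mid z_{d,n}, \beta_{1:K} &\sim \operatorname{Categorical}(\beta_{z_{d,n}})
\end{aligned}

p(\beta_{1:K}, \theta_{1:D}, z, w) =
  \prod_{k=1}^{K} p(\beta_k \mid \eta)
  \prod_{d=1}^{D} \Big( p(\theta_d \mid \alpha)
  \prod_{n=1}^{N_d} p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid \beta_{1:K}, z_{d,n}) \Big)
```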
LDA finds the latent (hidden) topics in the corpus; the observed variables are the words. The hyperparameters are alpha (α) and eta (η). A higher α spreads each document's probability mass over many topics, so that most topics tend to appear in every document. A lower α makes the model prefer a single dominant topic per document, and for this reason we set α as low as possible in our model. We ran the model multiple times with different parameter settings before settling on the final results. The per-document topic proportions (θd) are drawn from a Dirichlet distribution, an exponential-family distribution over the simplex (the set of non-negative vectors whose components sum to one).
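The effect of α described above can be checked numerically. The sketch below samples symmetric Dirichlet topic proportions using the standard construction (normalized Gamma draws) and reports how much weight the dominant topic gets on average. K = 10 and the α values 0.1 and 10 are arbitrary choices for illustration, not the study's settings.

```python
import random


def sample_dirichlet(alpha, k, rng):
    """One draw from a symmetric Dirichlet(alpha) over k topics."""
    while True:
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
        total = sum(draws)
        if total > 0:  # guard against (very unlikely) underflow to all zeros
            return [x / total for x in draws]


def mean_largest_share(alpha, k=10, samples=2000, seed=0):
    """Average weight of the dominant topic in a sampled document mixture."""
    rng = random.Random(seed)
    return sum(max(sample_dirichlet(alpha, k, rng)) for _ in range(samples)) / samples
```

With α = 0.1, a typical document is dominated by one topic; with α = 10, the weights stay close to 1/K, so no single topic stands out.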
The values of θd, βk, and Zd,n are estimated from the posterior distribution of these latent variables given the observed words. The posterior distribution is the distribution of a set of unknown parameters or latent variables conditioned on the observed data. Because this posterior cannot be computed exactly for LDA, it was approximated by Gibbs sampling. Gibbs sampling is a Markov chain Monte Carlo method that simulates a high-dimensional distribution by repeatedly sampling a lower-dimensional subset of variables, each conditioned on the current values of all the others. Sampling proceeds sequentially and continues until the samples approximate the target distribution. In this approach, the posterior distribution over the topic assignments Z is estimated directly, and estimates of β and θ are then derived from the sampled assignments.
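The procedure just described, sampling Z with θ and β integrated out and then reading estimates of θ and β off the counts, is known as collapsed Gibbs sampling. A compact pure-Python sketch is below. It is illustrative only: the default α, η, and iteration count are arbitrary, and a corpus of tens of thousands of tweets would normally be handled by an optimized library implementation.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, eta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA (a sketch).

    docs: list of token lists. Returns (theta, beta, vocab), where theta[d][k]
    estimates topic k's share of document d and beta[k][v] estimates the
    probability of word vocab[v] under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and tokens per topic.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Random initial topic assignment z[d][n] for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v, k = word_id[w], z[d][n]
                # Take the current token out of the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = j | other assignments, words),
                # with theta and beta integrated out.
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][v] + eta) / (n_k[j] + V * eta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Put the token back under its newly sampled topic.
                z[d][n] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Point estimates of theta and beta from the final assignments.
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    beta = [
        [(n_kw[k][v] + eta) / (n_k[k] + V * eta) for v in range(V)]
        for k in range(K)
    ]
    return theta, beta, vocab
```

Each row of `theta` and `beta` sums to one by construction, because the smoothed counts in each row are divided by their own total.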
Compared with earlier methods such as manual content analysis and word frequency analysis, LDA topic modeling is better suited to identifying the topics on which a corpus is built, particularly for unstructured data. Manual content analysis was the first approach used to understand the determinants of perception in textual data [33]. However, it relies heavily on the judgment of the expert coding the texts, which makes its results less reliable. Word frequency analysis was later used for the same purpose. Its major drawback is that it ignores the context of words and merely counts them, so conclusions based on it can be confusing and ambiguous [34]. LDA is a standard method used by many researchers because it applies a probabilistic framework, built on the bag-of-words approach, to detect hidden themes and topics in a corpus. We therefore employed LDA to understand the concerns Indian citizens discuss about COVID-19 booster doses.
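The context problem of word frequency analysis is easy to see with two hypothetical tweets. Raw counts cannot distinguish a statement from its negation. LDA is also a bag-of-words method; its advantage lies in grouping co-occurring words into themes, not in reading word order.

```python
from collections import Counter

tweets = ["booster is safe", "booster is not safe"]  # hypothetical examples

# Word-frequency analysis: count every word across all tweets.
counts = Counter(word for tweet in tweets for word in tweet.split())

# "safe" is counted twice even though the second tweet negates it.
print(counts["safe"])  # 2
```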

3. Results

This study was conducted in two parts. First, sentiment analysis was performed to understand people's sentiments towards COVID-19 booster doses. Sentiment analysis detects the sentiments a person expresses in a text. TextBlob examines each word in a tweet and determines whether the overall sentiment of that text is positive, negative, or neutral [35]. Second, LDA topic modeling was used to identify the main aspects of COVID-19 booster doses that Indian social media users discussed. Topic modeling is a family of algorithms that summarize a large corpus of texts by automatically identifying the latent subjects and themes it covers. LDA follows a Bayesian approach: each text in the corpus is assumed to be composed of several discrete topics, each with a multinomial distribution over words [36,37,38]. A total of 76,979 tweets were used in this study. To allow an effective comparison across months, we selected an equal number of tweets (10,997) for each month. Of the 76,979 tweets, more than half (n = 40,719; 52.9%) expressed negative sentiments about COVID-19 booster doses, 24,242 (31.5%) expressed neutral sentiments, and 12,018 (15.6%) expressed positive sentiments. The monthly distribution of sentiments is presented in Table 1.
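The overall percentages, and the meaning of the percentage columns in Table 1, can be checked directly from the monthly counts in Table 1. The recomputation shows that each Table 1 percentage is a month's share of that sentiment's total, not a share of the month's 10,997 tweets. A few printed percentages are 0.1 lower than the rounded value (for example, March negative recomputes to 11.9 against 11.8 in the table), which suggests truncation rather than rounding.

```python
# Monthly counts (positive, neutral, negative) copied from Table 1.
TABLE_1 = {
    "March 2022": (1409, 4754, 4834),
    "April 2022": (1572, 3736, 5689),
    "May 2022": (1417, 3057, 6523),
    "June 2022": (1680, 2699, 6618),
    "July 2022": (1945, 3612, 5440),
    "August 2022": (2058, 3057, 5882),
    "September 2022": (1937, 3327, 5733),
}

totals = [sum(row[i] for row in TABLE_1.values()) for i in range(3)]
grand_total = sum(totals)

# Overall share of each sentiment across all tweets.
overall_pct = [round(100 * t / grand_total, 1) for t in totals]

# Table 1 percentages: each month's share of that sentiment's column total.
column_pct = {
    month: [round(100 * n / totals[i], 1) for i, n in enumerate(row)]
    for month, row in TABLE_1.items()
}

print(totals, grand_total)        # [12018, 24242, 40719] 76979
print(overall_pct)                # [15.6, 31.5, 52.9]
print(column_pct["March 2022"])   # [11.7, 19.6, 11.9]
```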
Figure 3 and Figure 4 present Table 1 graphically, by number of tweets and by percentage, respectively. In the second part, topic modeling was performed to identify the main aspects Indians discuss when tweeting about COVID-19 booster doses. Only tweets with negative sentiments were used for topic modeling, because the objective was to understand Indians' concerns about COVID-19 booster doses. The results of topic modeling are presented in Table 2.
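The two mechanical steps around Table 2 can be sketched as small helpers. The first keeps only tweets labelled negative. The second lists each topic's most probable words from a topic–word matrix (β in the notation of Section 2.4). Six words per topic matches Table 2, but the helper names and the toy β values in the example are hypothetical. As the note under Table 2 says, the topic labels themselves were assigned manually.

```python
def negative_only(tweets, labels):
    """Keep the tweets whose sentiment label is 'negative'."""
    return [t for t, lab in zip(tweets, labels) if lab == "negative"]


def top_words(beta, vocab, n=6):
    """Return the n most probable words for each topic.

    beta[k][v] is the probability of vocab[v] under topic k.
    """
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: topic[v], reverse=True)[:n]]
        for topic in beta
    ]
```

For example, with `vocab = ["chest", "pain", "pharma", "profit"]` and `beta = [[0.5, 0.4, 0.06, 0.04], [0.05, 0.05, 0.3, 0.6]]`, `top_words(beta, vocab, n=2)` returns `[["chest", "pain"], ["profit", "pharma"]]`.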

4. Discussion

Our sentiment analysis showed that 84.4% of Indian social media posts about COVID-19 booster doses were either negative or neutral. A previous study of Indians' perceptions of the first two (primary) doses of COVID-19 vaccines found that 17% of opinions were negative and 47% were neutral [39]. Compared with those figures, the share of negative posts increased by approximately 36 percentage points and the share of neutral posts decreased by about 16 percentage points. Our results therefore indicate that Indians' opinions on booster doses are more negative and more polarized than their opinions on the primary doses. As shown in Figure 3 and Figure 4, the proportion of posts about COVID-19 booster doses with neutral sentiment fluctuated over the study period. Compared with March 2022, the proportion of positive posts about booster doses increased slightly in the later months, whereas the proportion of neutral posts decreased considerably. These results suggest that Indians have become more polarized in their opinions about booster doses than they were in the early months of 2022.
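The changes in the negative and neutral shares discussed above follow from two sets of figures: the shares in this study (40,719/76,979 ≈ 52.9% negative; 24,242/76,979 ≈ 31.5% neutral) and those reported in [39] (17% negative; 47% neutral).

```latex
\begin{aligned}
\Delta_{\text{negative}} &= 52.9\% - 17\% \approx 35.9 \text{ percentage points}\\
\Delta_{\text{neutral}} &= 31.5\% - 47\% = -15.5 \text{ percentage points}
\end{aligned}
```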
Our topic modeling results showed that Indian citizens' concerns about COVID-19 booster doses include the following:

- the feeling that young people do not need booster doses;
- the feeling that taking booster doses is not healthy;
- skepticism towards big pharmaceutical companies;
- fear of illness;
- distrust of COVID-19 vaccines;
- the feeling that the primary doses are enough;
- fear of severe side effects;
- negative perceptions created by the media;
- fear of chest pain;
- the feeling that booster doses are unnecessary.

Only 15.6% of the analyzed tweets expressed positive sentiment towards booster doses, so Indian governments and health policymakers are likely to find it difficult to encourage more citizens to take additional doses. They should therefore run effective awareness programs through social media and all other necessary channels of communication to educate the public about the need for booster doses. The aim is to achieve protective immunity in the population and safeguard public health amid the continuing emergence of SARS-CoV-2 variants, sub-variants, and lineages.

This research has a few limitations. We analyzed Indians' perceptions of booster doses over seven calendar months (March to September 2022), and the results may vary across other periods. We also did not consider the role of subculture in shaping individuals' perceptions of COVID-19 vaccines; future research could examine how strongly subculture influences these perceptions. Finally, only English tweets were used, so our results reflect only the perceptions of English-speaking Indians on Twitter. Future research could examine how perceptions differ among Indians who speak different languages.

Author Contributions

Designed the study, P.S. and K.D.; made the first draft, J.M.L.; updated the manuscript, R.I., C.C., D.V.S.K. and T.M.; reviewed and edited the final draft, K.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Dhama, K.; Khan, S.; Tiwari, R.; Sircar, S.; Bhat, S.; Malik, Y.S.; Singh, K.P.; Chaicumpa, W.; Bonilla-Aldana, D.K.; Rodriguez-Morales, A.J. Coronavirus Disease 2019-COVID-19. Clin. Microbiol. Rev. 2020, 33, e00028-20.
  2. Brust, K.B.; Papineni, V.; Columbus, C.; Arroliga, A.C. COVID-19-from emerging global threat to ongoing pandemic crisis. Proc. Bayl. Univ. Med. Cent. 2022, 35, 468–475.
  3. WHO. WHO Coronavirus (COVID-19) Dashboard. Available online: (accessed on 7 September 2022).
  4. WHO. Interim Statement on the Use of Additional Booster Doses of Emergency Use Listed mRNA Vaccines Against COVID-19. Available online: (accessed on 17 May 2022).
  5. Watson, O.J.; Barnsley, G.; Toor, J.; Hogan, A.B.; Winskill, P.; Ghani, A.C. Global impact of the first year of COVID-19 vaccination: A mathematical modelling study. Lancet Infect. Dis. 2022, 22, 1293–1302.
  6. Locht, C. Vaccines against COVID-19. Anaesth. Crit. Care Pain Med. 2020, 39, 703–705.
  7. Barouch, D.H. COVID-19 Vaccines–Immunity, Variants, Boosters. N. Engl. J. Med. 2022, 387, 1011–1020.
  8. Mohapatra, R.K.; El-Shall, N.A.; Tiwari, R.; Nainu, F.; Kandi, V.; Sarangi, A.K.; Mohammed, T.A.; Desingu, P.A.; Chakraborty, C.; Dhama, K. Need of booster vaccine doses to counteract the emergence of SARS-CoV-2 variants in the context of the Omicron variant and increasing COVID-19 cases: An update. Hum. Vaccines Immunother. 2022, 18, 2065824.
  9. Bhattacharya, M.; Chatterjee, S.; Sharma, A.R.; Lee, S.S.; Chakraborty, C. Delta variant (B.1.617.2) of SARS-CoV-2: Current understanding of infection, transmission, immune escape, and mutational landscape. Folia Microbiol. 2022, 12, 1–12.
  10. Hadizadeh, N.; Naderi, M.; Khezri, J.; Yazdani, M.; Shamsara, M.; Hashemi, E. Appraisal of SARS-CoV-2 mutations and their impact on vaccination efficacy: An overview. J. Diabetes Metab. Disord. 2022, 22, 1–21.
  11. Iacobucci, G. COVID-19: Fourth dose of mRNA vaccines is safe and boosts immunity, study finds. BMJ 2022, 377, o1170.
  12. Khandia, R.; Singhal, S.; Alqahtani, T.; Kamal, M.A.; El-Shall, N.A.; Nainu, F.; Desingu, P.A.; Dhama, K. Emergence of SARS-CoV-2 Omicron (B.1.1.529) variant, salient features, high global health concerns and strategies to counter it amid ongoing COVID-19 pandemic. Environ. Res. 2022, 209, 112816.
  13. Tareq, A.M.; Emran, T.B.; Dhama, K.; Dhawan, M.; Tallei, T.E. Impact of SARS-CoV-2 delta variant (B.1.617.2) in surging second wave of COVID-19 and efficacy of vaccines in tackling the ongoing pandemic. Hum. Vaccines Immunother. 2021, 17, 4126–4127.
  14. Gong, W.; Parkkila, S.; Wu, X.; Aspatwar, A. SARS-CoV-2 variants and COVID-19 vaccines: Current challenges and future strategies. Int. Rev. Immunol. 2022, 1–22.
  15. Zhou, H.; Møhlenberg, M.; Thakor, J.C.; Tuli, H.S.; Wang, P.; Assaraf, Y.G.; Dhama, K.; Jiang, S. Sensitivity to Vaccines, Therapeutic Antibodies, and Viral Entry Inhibitors and Advances To Counter the SARS-CoV-2 Omicron Variant. Clin. Microbiol. Rev. 2022, 35, e0001422.
  16. Sharun, K.; Dhama, K. COVID-19 Vaccine Diplomacy and Equitable Access to Vaccines Amid Ongoing Pandemic. Arch. Med. Res. 2021, 52, 761–763.
  17. Bell, E.; Brassel, S.; Oliver, E.; Schirrmacher, H.; Arnetorp, S.; Berg, K.; Darroch-Thompson, D.; Pohja-Hutchison, P.; Mungall, B.; Carroll, S.; et al. Estimates of the Global Burden of COVID-19 and the Value of Broad and Equitable Access to COVID-19 Vaccines. Vaccines 2022, 10, 1320.
  18. Chatterjee, B.; Thakur, S.S. Diverse vaccine platforms safeguarding against SARS-CoV-2 and its variants. Expert Rev. Vaccines 2022, 21, 47–67.
  19. Fajar, J.K.; Sallam, M.; Soegiarto, G.; Sugiri, Y.J.; Anshory, M.; Wulandari, L.; Kosasih, S.A.P.; Ilmawan, M.; Kusnaeni, K.; Fikri, M.; et al. Global Prevalence and Potential Influencing Factors of COVID-19 Vaccination Hesitancy: A Meta-Analysis. Vaccines 2022, 10, 1356.
  20. Khairi, L.N.H.M.; Fahrni, M.L.; Lazzarino, A.I. The Race for Global Equitable Access to COVID-19 Vaccines. Vaccines 2022, 10, 1306.
  21. Park, T.; Hwang, H.; Moon, S.; Kang, S.G.; Song, S.; Kim, Y.H.; Kim, H.; Ko, E.J.; Yoon, S.D.; Kang, S.M.; et al. Vaccines against SARS-CoV-2 variants and future pandemics. Expert Rev. Vaccines 2022, 21, 1363–1376.
  22. Ghazvini, K.; Keikha, M. Social networks and human monkeypox outbreak 2022: Hazards and opportunities—Correspondence. Int. J. Surg. 2022, 104, 106831.
  23. Martins-Filho, P.R.; Souza Araújo, A.A.; Quintans-Júnior, L.J. Global online public interest in monkeypox compared with COVID-19: Google trends in 2022. J. Travel Med. 2022.
  24. Praveen, S.V.; Ittamalla, R. An analysis of attitude of general public toward COVID-19 crises—Sentimental analysis and a topic modeling study. Inf. Discov. Deliv. 2021. ahead-of-print.
  25. Sv, P.; Ittamalla, R. Psychological Issues COVID-19 Survivors Face—A Text Analysis Study. J. Loss Trauma 2020, 26, 405–407.
  26. Sv, P.; Ittamalla, R. What concerns the general public the most about monkeypox virus?—A text analytics study based on Natural Language Processing (NLP). Travel Med. Infect. Dis. 2022, 49, 102404.
  27. Sv, P.; Tandon, J.; Vikas Hinduja, H. Indian citizen's perspective about side effects of COVID-19 vaccine—A machine learning study. Diabetes Metab. Syndr. Clin. Res. Rev. 2021, 15, 102172.
  28. Praveen, S.V.; Ittamalla, R.; Deepak, G. Analyzing Indian general public's perspective on anxiety, stress and trauma during COVID-19—A machine learning study of 840,000 tweets. Diabetes Metab. Syndr. Clin. Res. Rev. 2021, 15, 667–671.
  29. Sv, P.; Ittamalla, R. General public's attitude toward governments implementing digital contact tracing to curb COVID-19—A study based on natural language processing. Int. J. Pervasive Comput. Commun. 2020. ahead-of-print.
  30. Sv, P.; Ittamalla, R. Analyzing Indian citizen's perspective towards government using wearable sensors to tackle COVID-19 crisis—A Text analytics study. Health Policy Technol. 2021, 10, 100521.
  31. Negara, E.S.; Triadi, D.; Andryani, R. Topic Modelling Twitter Data with Latent Dirichlet Allocation Method. In Proceedings of the 2019 International Conference on Electrical Engineering and Computer Science (ICECOS), Piscataway, NJ, USA, 2–3 October 2019.
  32. Jelodar, H.; Wang, Y.; Yuan, C.; Feng, X.; Jiang, X.; Li, Y.; Zhao, L. Latent Dirichlet allocation (LDA) and Topic Modeling: Models, Applications, a Survey. Multimed. Tools Appl. 2018, 78, 15169–15211. Available online: (accessed on 28 November 2018).
  33. Zhou, L.; Ye, S.; Pearce, P.L.; Wu, M.-Y. Refreshing hotel satisfaction studies by reconfiguring customer review data. Int. J. Hosp. Manag. 2014, 38, 1–10. Available online: (accessed on 1 April 2014).
  34. Berezina, K.; Bilgihan, A.; Cobanoglu, C.; Okumus, F. Understanding Satisfied and Dissatisfied Hotel Customers: Text Mining of Online Hotel Reviews. J. Hosp. Mark. Manag. 2015, 25, 1–24.
  35. Sv, P.; Ittamalla, R.; Subramanian, D. How optimistic do citizens feel about digital contact tracing?—Perspectives from developing countries. Int. J. Pervasive Comput. Commun. 2020. ahead-of-print.
  36. Sv, P.; Ittamalla, R.; Subramanian, D. Challenges in successful implementation of Digital contact tracing to curb COVID-19 from global citizen's perspective: A text analysis study. Int. J. Pervasive Comput. Commun. 2020. ahead-of-print.
  37. Praveen, S.V.; Ittamalla, R. Post COVID-19 Attitude of Consumers towards Processed Food—A Study Based on Natural Language Processing. In Intelligent Systems Design and Applications; Springer: Cham, Switzerland, 2020; pp. 863–868.
  38. Sv, P.; Ittamalla, R.; Spoorthi, K. A Study of People's Perception of Childhood Trauma Using Text Analysis Techniques. J. Loss Trauma 2022, 27, 773–775.
  39. Praveen, S.V.; Ittamalla, R.; Deepak, G. Analyzing the attitude of Indian citizens towards COVID-19 vaccine—A text analytics study. Diabetes Metab. Syndr. 2021, 15, 595–599.
Figure 1. Data collection and data pre-processing.
Figure 2. Graphical representation of LDA model.
Figure 3. Graphical representation of Table 1 (by number of tweets).
Figure 4. Graphical representation of Table 1 (by percentage).
Table 1. Sentiment analysis.
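Month          | Total tweets | Positive, n (%) | Neutral, n (%) | Negative, n (%)
March 2022     | 10,997       | 1,409 (11.7)    | 4,754 (19.6)   | 4,834 (11.8)
April 2022     | 10,997       | 1,572 (13.0)    | 3,736 (15.4)   | 5,689 (13.9)
May 2022       | 10,997       | 1,417 (11.7)    | 3,057 (12.6)   | 6,523 (16.0)
June 2022      | 10,997       | 1,680 (13.9)    | 2,699 (11.1)   | 6,618 (16.2)
July 2022      | 10,997       | 1,945 (16.1)    | 3,612 (14.8)   | 5,440 (13.3)
August 2022    | 10,997       | 2,058 (17.1)    | 3,057 (12.6)   | 5,882 (14.4)
September 2022 | 10,997       | 1,937 (16.1)    | 3,327 (13.7)   | 5,733 (14.0)
Total          | 76,979       | 12,018          | 24,242         | 40,719
Note: Percentages in parentheses are each month's share of that sentiment's total over the study period (e.g., 1,409 of the 12,018 positive tweets, 11.7%, were posted in March 2022); they are not shares of the month's 10,997 tweets.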
Table 2. Topic modeling.
Topic label                                         | Top words
Feeling that young people don't need booster doses  | Age, dose, young, waste, booster, first
Not healthy to take booster dose                    | Dose, booster, higher, risk, condition, health
Skepticism towards big Pharma                       | BioNTech, news, pharma, shit, profit, dose data
Fear of illness                                     | booster, risk, COVID, severe, ill, mrna
COVID-19 vaccines not trustworthy                   | vaccines, COVID, taken, even, reinfect, distrust
Feeling already immune enough                       | person, require, immune, vaccine, enough, taken
Fear of side effects                                | pain, hand, tired, vaccine, work, high
Negative perceptions created by media               | article, booster, media, news, negative, can
Chest pain                                          | booster, COVID, chest, pain, will, infect
Feeling not necessary                               | immune, new, healthy, food, nature, develop
Note: Top words are generated by the model. Topic names were manually created.

Share and Cite

MDPI and ACS Style

SV, P.; Lorenz, J.M.; Ittamalla, R.; Dhama, K.; Chakraborty, C.; Kumar, D.V.S.; Mohan, T. Twitter-Based Sentiment Analysis and Topic Modeling of Social Media Posts Using Natural Language Processing, to Understand People’s Perspectives Regarding COVID-19 Booster Vaccine Shots in India: Crucial to Expanding Vaccination Coverage. Vaccines 2022, 10, 1929.

AMA Style

SV P, Lorenz JM, Ittamalla R, Dhama K, Chakraborty C, Kumar DVS, Mohan T. Twitter-Based Sentiment Analysis and Topic Modeling of Social Media Posts Using Natural Language Processing, to Understand People’s Perspectives Regarding COVID-19 Booster Vaccine Shots in India: Crucial to Expanding Vaccination Coverage. Vaccines. 2022; 10(11):1929.

Chicago/Turabian Style

SV, Praveen, Jose Manuel Lorenz, Rajesh Ittamalla, Kuldeep Dhama, Chiranjib Chakraborty, Daruri Venkata Srinivas Kumar, and Thivyaa Mohan. 2022. "Twitter-Based Sentiment Analysis and Topic Modeling of Social Media Posts Using Natural Language Processing, to Understand People’s Perspectives Regarding COVID-19 Booster Vaccine Shots in India: Crucial to Expanding Vaccination Coverage" Vaccines 10, no. 11: 1929.

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
