Artificial Intelligence and Public Health: Evaluating ChatGPT Responses to Vaccination Myths and Misconceptions

Deiana, Giovanna; Dettori, Marco; Arghittu, Antonella; Azara, Antonio; Gabutti, Giovanni; Castiglia, Paolo

doi:10.3390/vaccines11071217

Open AccessArticle

Artificial Intelligence and Public Health: Evaluating ChatGPT Responses to Vaccination Myths and Misconceptions

by

Giovanna Deiana

^1,2

,

Marco Dettori

^2,3,4,*

,

Antonella Arghittu

³

,

Antonio Azara

^2,3

,

Giovanni Gabutti

⁵

and

Paolo Castiglia

^2,3,5

¹

Department of Biomedical Sciences, University of Sassari, 07100 Sassari, Italy

²

Department of Medical, Surgical and Experimental Sciences, University Hospital of Sassari, 07100 Sassari, Italy

³

Department of Medicine, Surgery and Pharmacy, University of Sassari, 07100 Sassari, Italy

⁴

Department of Restorative, Pediatric and Preventive Dentistry, University of Bern, 3012 Bern, Switzerland

⁵

Working Group “Vaccines and Immunization Policies”, Italian Society of Hygiene, Preventive Medicine and Public Health, 16030 Cogorno, Italy

^*

Author to whom correspondence should be addressed.

Vaccines 2023, 11(7), 1217; https://doi.org/10.3390/vaccines11071217

Submission received: 1 June 2023 / Revised: 4 July 2023 / Accepted: 5 July 2023 / Published: 7 July 2023

(This article belongs to the Special Issue New Insight in Vaccination and Public Health)

Download Versions Notes

Abstract

:

Artificial intelligence (AI) tools, such as ChatGPT, are the subject of intense debate regarding their possible applications in contexts such as health care. This study evaluates the Correctness, Clarity, and Exhaustiveness of the answers provided by ChatGPT on the topic of vaccination. The World Health Organization’s 11 “myths and misconceptions” about vaccinations were administered to both the free (GPT-3.5) and paid version (GPT-4.0) of ChatGPT. The AI tool’s responses were evaluated qualitatively and quantitatively, in reference to those myth and misconceptions provided by WHO, independently by two expert Raters. The agreement between the Raters was significant for both versions (p of K < 0.05). Overall, ChatGPT responses were easy to understand and 85.4% accurate although one of the questions was misinterpreted. Qualitatively, the GPT-4.0 responses were superior to the GPT-3.5 responses in terms of Correctness, Clarity, and Exhaustiveness (Δ = 5.6%, 17.9%, 9.3%, respectively). The study shows that, if appropriately questioned, AI tools can represent a useful aid in the health care field. However, when consulted by non-expert users, without the support of expert medical advice, these tools are not free from the risk of eliciting misleading responses. Moreover, given the existing social divide in information access, the improved accuracy of answers from the paid version raises further ethical issues.

Keywords:

ChatGPT; vaccines; immunization; myths and misconceptions; public health; artificial intelligence

1. Introduction

Large Language Models (LLMs) are a type of Artificial Intelligence (AI) designed to reproduce human language processing capabilities. They use deep learning techniques, such as artificial neural networks, and are capable of learning and processing large amounts of language data from various sources [1,2]. With extensive training they can generate highly coherent and realistic text. LLMs analyze patterns and connections within the data they have been trained on and use that knowledge to understand and generate language in various fields such as machine translation and text generation [3,4]. LLMs have become increasingly common over the past decade and have been applied across a variety of sectors, including content marketing, customer services, and numerous business applications [5,6].

Launched on 30 November 2022, ChatGPT, an AI-based LLM developed as a non-profit venture by OpenAI (OpenAI, L.L.C., San Francisco, CA, USA), is an advanced modeling conversational chatbot, a program that can understand and generate responses using a text interface. It has gained widespread popularity in a very short time and its latest version GPT-4.0 was released on 14 March 2023. Two versions are currently available: GPT-3.5, which is free to use and is the fastest version, and GPT-4.0, which must be paid for and is considered the most capable version [7,8].

This conversational system is based on a Generative-Pre-Trained Transformer (GPT) architecture, an LLM with over 175 billion parameters, which can be trained on a broad range of internet sources in multiple languages, including books, articles, and websites [9]. ChatGPT shows considerable proficiency in understanding natural language text and is able to generate highly sophisticated, tailored, human-like responses based on detailed questions related to the context of the input text [10,11]. The LLM in ChatGPT uses supervised fine-tuning modeling, reward model building, proximal policy optimization, and reinforcement learning from human feedback, enabling it to incorporate the complexity of user intentions and respond profitably to various end-user activities, interacting in a conversational manner [12,13].

In the scientific and academic community, the rise of LLMs has generated great interest and inspired an increasingly sophisticated debate about the relative benefits and risks and their ethical implications [14,15]. On the one hand, LLMs can be useful in speaking and writing tasks, helping to increase the efficiency and accuracy of the required output. Moreover, they could be incorporated into teaching and learning, such as mentoring and student assignment assistance, and also into academic writing, where these tools could help researchers optimize the time needed for manuscript preparation [16,17]. On the other hand, concerns have been raised about possible biases based on the datasets used, which may limit their capabilities and could result in factual inaccuracies. Additionally, security issues and the potential for spreading misinformation must be considered, as well as ethical and fair access issues relating to the accessibility of these digital tools with particular reference to the availability of a paid version [18,19].

LLMs have also generated intense interest and debate among healthcare professionals and medical researchers, considering their potential to improve health and patients’ lives [20,21]. In particular, taking into consideration the growing amount of medical data and the complexity of clinical decisions, these tools could theoretically help physicians make timely and informed decisions and improve the overall quality and efficiency of healthcare [22,23]. Reference is made here to “long-distance care”, a concept regarding the use of digital technologies to support the healthcare system in order to make service delivery more effective, streamline the communication between healthcare facilities and citizens, simplify booking systems, and ensure quality healthcare. In particular, in the years of the COVID-19 pandemic, in which the digital transition of the health sector was accelerated, examples of this kind were witnessed with advanced technologies related to AI, tele-medicine, tele-rehabilitation, self-medication, digital health interventions (e-health and m-health), electronic referral and online counseling systems, and systems for monitoring and measuring healthy lifestyle behaviors (e.g., remote monitoring of physical activity and proper nutrition, digital education, and self-medication) [24,25,26].

However, while on the one hand numerous efforts have been made in public health to enable citizens to make informed health choices (e.g., voluntary adherence to vaccination) for example, through digital technologies, on the other hand, the World Wide Web is saturated with data and information, and not all of the information may be accurate. This appears even more worrisome when one considers that, nowadays, technological advances have led to the democratization of knowledge, whereby patients no longer rely solely on healthcare professionals for medical information, but they provide their own health education and information themselves. Monitoring this trend through the study of people’s behaviors and attitudes could be a useful tool to help public health authorities, in guiding vaccination policies, designing new health education, and continuing information interventions aimed at both the general public and responsible cohorts such as health care workers [27,28,29].

This has been evident during the recent COVID-19 pandemic, particularly with regard to vaccinations [30,31]. Indeed, previous evidence has shown that the reliance on online sources of information, which can sometimes provide authoritative answers to complex medical questions, was significantly associated with a greater tendency to vaccine hesitancy and a lower willingness to adhere to vaccination recommendation [32,33]. Moreover, anti-vaccine content on the Web exacerbated the already precarious decision-making process, a dynamic conditioned by the traditional influence of social, cultural, political and religious determinants on vaccine acceptance. Due to a marked decrease in vaccine coverage, this exposes the population to the risk of the reappearance of infectious diseases now under control. Despite the fact that the COVID-19 pandemic has reaffirmed the importance of vaccination as an indispensable tool of primary prevention, the vaccine hesitancy phenomenon continues to affect more than 15 percent of the world’s population, compounded by the recent phenomenon of vaccine fatigue [34,35].

Moreover, in addition to the presence of incorrect information on the Web that can exacerbate vaccine hesitancy, the enormous body of information available online is not equally accessible to the entire population. Nowadays, the digital divide represents a recognized critical aspect of health inequality [36,37].

Given the importance of accurate information regarding vaccines, the study aimed to determine the Correctness, Clarity, and Exhaustiveness of ChatGPT’s responses to misleading questions about vaccines and immunization, in order to: (i) evaluate how these new information tools may be able to provide relevant and correct information with regard to vaccination adherence; (ii) evaluate if GPT-3.5, being free, has significant differences from the more advanced, paid version; and (iii) evaluate whether the use of AI, such as ChatGPT, could help increase health literacy and reduce vaccine hesitancy.

2. Materials and Methods

2.1. Study Design

The study was based on the answers given by ChatGPT to the list of the 11 questions concerning “Vaccines and immunization: Myths and misconceptions” published on 19 October 2020, taken into consideration alongside those given by the World Health Organization (WHO) (Table 1) [38].

This list, originally written by the U.S. Centers for Disease Control and Prevention, addresses common misconceptions about vaccination that are often cited by concerned parents as reasons to question the wisdom of having their children vaccinated [39]. Thus, the WHO responded to the listed questions, in order to provide a useful information tool for the general population and health professionals charged with carrying out vaccination. In order to assess whether the answers provided by the chatbot were equally accurate, the listed questions were administered in an individual chat by an investigator (G.D.) to both the free (GPT-3.5) and paid (GPT-4.0) versions of ChatGPT.

2.2. Quantitative and Qualitative Analysis

ChatGPT responses were independently assessed by two Raters with proven experience in vaccination and health communication topics (P.C. and G.G.), randomly identified as Rater 1 and Rater 2. The Raters were aware of the chatbot version from which the answer was formulated. The responses were evaluated according to predefined scales of accuracy considering three items: Correctness, Clarity, and Exhaustiveness. Each response was rated using a 4-point Likert scale scoring from 1 (strongly disagree) to 4 (strongly agree).

The Raters qualitatively analyzed the responses according to the following determinants: (i) Correctness, in terms of plausibility, coherence, scientific veracity, and evidence; (ii) Clarity, in terms of ease of understanding, appropriateness of vocabulary, conciseness, and logical order; (iii) Exhaustiveness, in terms of the degree of completeness of the answer.

2.3. Statistical Analysis

Results were recorded descriptively as mean (±standard deviation; percentage); the percentage was calculated by the formula: (Xob − Xminp)/(Xmaxp − Xminp) × 100, where Xob is “Obtained score”; Xminp is “Minimum score”; and Xmaxp is “Maximum score”. Differences observed in the scores across ChatGPT versions were compared using the Mann–Whitney U test. Inter-observer reliability and overall agreement between Raters were assessed using Cohen’s kappa statistic on all scores. Differences between proportions were tested with the z-test. A statistical significance of p-value < 0.05 was set for all analyses. Differences among groups were tested via the Kruskal–Wallis H test. Statistical analyses were performed with STATA 17 (StatsCorp., College Station, TX, USA).

3. Results

3.1. Quantitative Analysis

Overall, 132 scores were obtained: 11 questions per 3 items per 2 ChatGPT versions per 2 Raters. The scores are listed in Table 2, divided into four groups.

The average answer score for the eleven questions was: 3.18 (±0.80; 79.5%) for GPT-3.5 and 3.61 (±0.65; 90.2%) for GPT-4.0 according to Rater 1; and 3.30 (±0.94; 82.6%) for GPT-3.5 and 3.58 (±0.65; 89.4%) for GPT-4.0 according to Rater 2.

Considering the four groups, Rater 1 assigned the highest value to 12 out of 33 (36.4%) evaluations for the GPT-3.5 version, and 23 out of 33 (69.7%) for the GPT-4.0 (p-value = 0.0067); likewise, Rater 2 assigned the highest value to 18 out of 33 evaluations (54.5%) for the GPT-3.5, and 22 out of 33 (66.7%) for the GPT-4.0. (p-value = 0.311).

The mean scores for the three items were 3.36 (±0.72; 84.1%), 3.32 (±0.86; 83%), and 3.57 (±0.89; 89.2%) for correctness, clarity, and exhaustiveness, respectively, without statistically significant differences by the groups (KW p-value = 0.78, 0.18 and 0.09, respectively). Inter-observer reliability indicated by Cohen’s Kappa value was 0.52 (p-value = 0.0000) for GPT-3.5 and 0.30 (p-value = 0.0147) for GPT-4.0.

Both versions of ChatGPT obtained the maximum score for accuracy in answering question number 8. Answers to questions 2 and 7 obtained full marks for version GPT-4.0. Conversely, the answer to question number 11 was completely accurate in version GPT-3.5, the only answer which obtained a higher score than version GPT-4.0. A significant difference in mean scores between the two versions was found by Rater 1 (p-value = 0.0107), who indicated that version GPT-4.0 was the most accurate. The answer to question number 3 was graded as completely incorrect by the Raters for both ChatGPT versions.

3.2. Qualitative Analysis

Overall, the mean score assigned by the Raters, based on the determinants reported in the Materials and Methods section, to the GPT-4.0 responses was higher than that of the GPT-3.5 responses, with Δ equal to 5.6% for Correctness, 17.9% for Clarity and 9.3% for Exhaustiveness of the answer (Table 3).

In particular, the 11 questions and the evaluations carried out on the basis of the determinants by the two Raters are reported below (S1).

Q.1: Weren’t Diseases Already Disappearing before Vaccines were Introduced Because of Better Hygiene and Sanitation?

Regarding Clarity of content, both Raters judged the answers offered by the GPT-3.5 version as inaccurate. The imprecise information regarding the transmission route of polio and the reference to the eradication of other vaccine-preventable infectious diseases, apart from smallpox, affected the scoring. Both Raters described the use of more appropriate vocabulary and more complete content as reasons for the higher score given to the Clarity item in the GPT-4.0 version.

Q.2: Which Disease Show the Impact of Vaccines the Best?

Regarding the response offered by GPT-4.0, the Raters were unanimous in awarding the highest score for all items considered. In contrast, the lack of appropriateness of vocabulary and scientific veracity negatively affected the scores for the Clarity and Correctness items generated by the GPT-3.5 version.

Q.3: What about Hepatitis B? Does That Mean the Vaccine Didn’t Work?

Overall, the responses given by GPT-3.5 and GPT-4.0 to question Q.3 scored the lowest. In particular, the Raters agreed that the responses from both versions were haphazard from the point of view of the logical description of the content; there were not very exhaustive, and they were difficult to understand. As for the Correctness item, while both Raters considered the information provided by the GPT-4.0 version to be more complete, the inaccuracies in both versions’ responses made the content misleading thereby negatively affecting the score attributed.

Q.4: What Happens if Countries Don’t Immunize against Diseases?

For both versions of the chatbot, plausibility and scientific veracity positively affected the assigned score, especially in the opinion of Rater 1 for the GPT-4.0 version. On the other hand, the order of the content and the difficulty of comprehension detracted from its Clarity. Finally, for the Exhaustiveness item, the response of the GPT-4.0 version was rated by Rater 2 as less complete than that offered by GPT-3.5.

Q.5: Can Vaccines Cause the Disease? I’ve Heard That the Majority of People Who Get Disease Have Been Vaccinated.

In the GPT-3.5 version, the logical order negatively affected the Clarity of the response for both Raters. In contrast, scientific veracity for Rater 1 and degree of comprehensiveness of the response for Rater 2 were the determinants that accounted for the highest score awarded to Correctness and Exhaustiveness, respectively. In the GPT-4.0 version, for both Raters, ease of comprehension and logical order positively affected the scoring, while imprecision regarding HBV and HPV vaccine definitions negatively affected the rating given for Correctness according to Rater 1.

Q.6: Will Vaccines Cause Harmful Side Effects, Illnesses or Even Death? Could There Be Long Term Effects We Don’t Know about Yet?

The GPT-4.0 version was considered more correct, clear, and exhaustive than the GPT-3.5 version. Specifically, with regard to Correctness, the Raters considered both responses to be sufficiently plausible, but the lack of appropriate references to pharmacovigilance accounted for the lower score in the response provided by the GPT-3.5 version.

Q.7: Is it True That There Is a Link between the Diphtheria-Tetanus-Pertussis (DTP) Vaccine and Sudden Infant Death Syndrome (SIDS)?

The Raters agreed that the answers provided by the chatbots were sufficiently plausible and evidence-based. This resulted in the highest score being given to the Correctness item. However, the level of comprehension and appropriateness of vocabulary allowed a higher score to be assigned to the Clarity of the response of GPT-4.0 than to GPT-3.5. In addition, Rater 1 considered the GPT-4.0 version more complete than GPT-3.5.

Q.8: Isn’t Even a Small Risk Too Much to Justify Vaccination?

For both versions of the chatbot, the Raters considered Correctness, Clarity, and Exhaustiveness of the answers to be no less accurate than those of the answers provided by WHO, assigning the highest score to all items.

Q.9: Vaccine-Preventable Diseases Have Been Virtually Eliminated from My Country. Why Should I Still Vaccinate My Child?

The Raters agreed in assigning the highest score to the response provided by GPT-3.5 considering the contents to be correct, clear, and exhaustive. According to Rater 2, some of the content of the response provided by GPT-4.0 was considered inaccurate, particularly in the definition of the concept of herd immunity, resulting in a lower score being assigned to the Correctness item.

Q.10: Is It True That Giving a Child Multiple Vaccinations for Different Diseases at the Same Time Increases the Risk of Harmful Side Effects and Can Overload the Immune System?

Rater 1 considered the GPT-4.0 version more correct, clear, and exhaustive than the GPT-3.5 version, as the closure provided in the latter penalized the consistency, logical order, and degree of completeness of the response. In contrast, the responses of both versions were considered equivalent by Rater 2, although the inaccuracy in reference to the co-administration of vaccines negatively affected the assessment of Correctness.

Q.11: Why Are Some Vaccines Grouped Together, Such as Those for Measles, Mumps and Rubella?

For both Raters, the GPT-3.5 version was the most correct, clear, and exhaustive for the entire set of responses. In contrast, serious content errors were found in the GPT-4.0 version in relation to potential negative interactions among combined vaccines.

4. Discussion

The emergence of innovative and advanced LLMs such as ChatGPT has given rise to a range of concerns and debates, and as such, it is crucial to discuss its potential benefits, future perspectives, and limitations [40,41]. On the one hand, such LLMs could constitute a revolutionary change in education as a whole, as well as in research and academic writing [42,43]. On the other hand, the same technology could facilitate the spread of misinformation and of other types of information detrimental to users, especially in the field of health topics [44,45].

In the present study, we examined the Correctness, Clarity, and Exhaustiveness of ChatGPT responses to common vaccination myths and misconceptions similarly to what WHO did with its responses. Overall, the Raters perceived that the ChatGPT findings provided accurate and comprehensive information on common myths and misconceptions about vaccination in an easy-to-understand, conversational manner, without providing misinformation or harmful information. In particular, the determinants that had the greatest impact on the scores assigned were: scientific veracity, appropriateness of vocabulary, and the logical order chosen for the description of the contents with regard to Clarity and to the completeness of the answer for the Exhaustiveness item.

Nevertheless, in some cases, several aspects of the description of the contents could be improved. For example, in the Raters’ opinion, the answers given by both versions of ChatGPT to Question 2 were misleading. In particular, citing immunization against smallpox as the only example of the significant impact of vaccination, the chatbot suggested that the eradication of the disease they prevent is the only tangible benefit. From ChatGPT, it is not clear why the implementation of mass vaccination is not directly followed by a dramatic drop in the disease incidence. Indeed, the AI tool appears to entirely disregard the benefits offered by vaccination in the short term (e.g., the management of infection clusters and management of the disease as demonstrated with the COVID-19 vaccination) and in the long term (e.g., the impact of vaccination on economic growth and on the sustainability and efficiency of health systems) [46,47,48].

This is worrying if one considers that nowadays “convenience” and “complacency” are among the main determinants of vaccine hesitancy and any perception that the vaccine may not be essential in the prevention of infectious diseases may discourage citizens from adhering to vaccination programs [49,50,51,52]. Indeed, alongside advanced technologies, accurate and accessible medical information communicated by public health operators, particularly in a context of low health literacy, is essential to providing patients with the information needed to improve their understanding and enable them to make informed decisions about their care [53,54,55,56]. It should be noted that the same chatbot advises, both in the answer to Question 3 and Question 6, that it is important to consult your doctor to discuss any concerns or specific circumstances that could influence your decision to be vaccinated.

Moreover, regarding the Correctness of the answers provided, the Raters identified numerous inaccuracies for both versions. In particular, errors regarding the transmission route and the eradication circumstances of some infectious diseases (Question 1) were found. Misclassifications of the HBV (Hepatitis B Vaccine) and HPV (Human Papilloma Virus Vaccine) vaccines, cited as examples of live attenuated vaccines, were noted in the response to Question 5. Other serious inaccuracies were found in the answers to Question 10 and Question 11. In particular, in Q.10, there are clear references only to combined vaccines, with no mention of the rare cases in which the co-administration of vaccines is expressly contraindicated. In the Raters’ opinion, this limits the transparency of the answer and could cause the user to suspect a potential cover-up of the albeit limited contraindications to the co-administration of vaccines, which are expressly reported in the Summary of Product Characteristics (SPC) as for any other drug [57]. Similarly, in Q.11, it is asserted that “combining vaccines can reduce the likelihood of side effects and the potential for negative interactions between vaccines” without mentioning that the combination of several vaccines can sometimes increase reactogenicity (as in the case of the MMRV vaccine, with side effects such as febrile seizures). This concept should also have been expressed more clearly by mentioning that the administration of separate doses can lead to repeated occasions of local events, also described in each SPC [58].

A separate consideration must be made for the answer to Question 3, which received a considerably lower score than the others, causing the authors to suspect that the question may not have been asked correctly. In this regard, the literature describes how even in the common administration of a survey, the consequentiality of the questions could influence the answers given. In fact, even in the WHO questions, Question 3 seems to follow on from the previous one. Therefore, since ChatGPT remembers previous interactions within the same conversation, we decided to resubmit the two questions to both versions of ChatGPT consecutively (within the same conversation) as opposed to independently. In this case, albeit with further room for improvement in terms of Clarity and Exhaustiveness, the Raters deemed the answers returned by GPT-3.5 and GPT-4.0 to have improved significantly, highlighting the fact that the tool may have misunderstood the original question or did not have sufficient elements to generate a completely exhaustive answer. This could stem from the fact that some answers to topics which are as widely debated and rich in history as vaccinations not only assume an in-depth knowledge but also imply that this very knowledge gives rise to a reasoning which is then applied [59,60].

The above-mentioned is relevant when one considers that people are often unaware how accurate and personalized information is obtained and tend to implicitly trust something that mimics human behaviors and responses, such as AI. They therefore fail to validate the information which, when conveyed by tools as up-to-date and widely discussed by the virtual community as ChatGPT, is deemed to be accurate and reliable [61,62,63].

All things considered, given that ChatGPT is expected to improve significantly in very little time, thanks to the continuous updating and refinement of the algorithms and model parameters, the quality and reproducibility of the responses are likely to improve. On the other hand, the fact that only one of the two Raters found a significant difference between the two versions implies that even experts may have differing opinions when answering these questions.

In this regard, many studies in the literature describe how the interpretation of a concept is not only the result of scientific knowledge but also the product of the coordinated actions of various processes such as perception, attention, imagination, thought and memory, which, when added to knowledge, contribute to the elaboration of the perceived concept [64,65,66]. Thus, it follows that a lack of the basic knowledge necessary to discern between what is correct, clear, and exhaustive versus what is not, must be taken into account when referring to how the general public can question an AI whose aseptic and decontextualized responses can influence the reader’s interpretation of the content.

This means that the use of these tools in healthcare settings will require careful consideration in order to prevent potentially detrimental uses, such as bypassing professional medical advice and ethical issues, including the potential risk of bias and factual inaccuracies [67,68]. This was clearly seen during the COVID-19 pandemic, where the spread of misinformation resulted in a growing infodemic [69,70]. In fact, in a context of continuous media exposure to an enormous volume of apparently conflicting news for an inexperienced user, as well as the conflicting opinions on the efficacy of the different vaccines available, finding reliable and safe sources of information was described as a major source of uncertainty [71,72].

Additionally, since these AI tools are only as trustworthy as the data they are trained on, it is important to consider privacy and ethical issues as well. Indeed, the fact that the system does not clarify the sources from which it draws the information could certainly constitute a problem, especially for those aiming to address or investigate scientific issues. Furthermore, many scientific models contain “black boxes”, simplified constructs that omit or completely ignore the details of the underlying mechanisms, constituting a serious methodological problem in the scientific field and highlighting the existence of an approach to science focused solely on explanation and/or simplification. However, ChatGPT’s own answers underlined the importance of reliable and in-depth sources of information, as well as the use of terms associated with uncertainty, emphasizing that the results generated are no substitute for clinical consultation of healthcare professionals.

Finally, the fact that ChatGPT is available for free allows even the most economically disadvantaged patients to access reliable and personalized medical information. On the other hand, the availability of a better-performing version (GPT-4.0) only for paying users, poses the problem of equality in accessing information. Even if we take into consideration the fact that although ChatGPT-3.5 is free, many cannot access it for economic or cultural reasons and are therefore excluded from these sources of information [73,74].

Overall, ChatGPT, and AI tools in general, has the potential to be a valuable resource both for providing immediate medical information to patients and for improving healthcare efficiency and decision-making for healthcare professionals. Indeed, if evaluated and trained by experts on controlled medical information, LLMs like ChatGPT could rapidly transform the communication of medical knowledge.

Study Limitations

The results of the present study should be evaluated based on the following limitations. First, given that the general body of text data ChatGPT is trained on dates back to 2021, accuracy could be scientifically outdated for some topics. However, the WHO published their myths and misconceptions about vaccinations in late 2020, so the information available for compiling answers overlapped. Second, this study was based on a subjective assessment of the content, and this approach may produce slightly varying results based on the expertise of individual evaluators. Moreover, the Raters knew which version the responses were from, so their rating may have been influenced by the pre-conception of higher capacity of GPT-4.0 versus GPT-3.5. However, it is essential to take into consideration the very high-level of professionalism of the experts involved, as well as their skills in the field of vaccination communication, and the fact that the primary objective of the study was not specifically to make a comparison between the two versions but to verify, also in consideration of the fact that the more advanced version is paid and therefore less accessible to many users, whether both versions could provide information suitable for users. In fact, in one question, ChatGPT-3.5 received a higher score than the more advanced version. In any case, as stated on the ChatGPT landing page, it may occasionally produce malicious instructions or biased content, especially considering that the quality and accuracy of the dataset used to train the tool are unknown.

5. Conclusions

LLM technologies, including ChatGPT, represent a further incremental step, and they are rapidly becoming more widespread, generating both opportunities and concerns regarding their potential misuse. Considering their wide availability and potential societal impact, it is critical to exercise caution, acknowledge their limitations and develop appropriate guidelines and regulations with the involvement of all the relevant stakeholders. In particular, the quality of this innovative approach depends, and will continue to depend, more and more on the ability to ask the correct questions as well as on the critical ability of those who use it and will use it, as possible ethical and legal issues could limit potential future applications.

If implemented correctly, ChatGPT could have a transformative impact both in research, by making it more automated or simplified, and in healthcare, by augmenting rather than replacing human expertise, thus ultimately improving the quality of life for many patients. However, despite displaying a high level of Correctness, Clarity, and Exhaustiveness, further studies are needed to improve the reliability of these tools in the online communication environment, particularly concerning patient education, and to ensure their safe and effective use before clinical integration.

Author Contributions

Conceptualization, G.D. and P.C.; data curation, G.D., M.D., A.A. (Antonio Azara), G.G. and P.C.; formal analysis, G.D., A.A. (Antonella Arghittu), G.G. and P.C.; investigation, G.D., M.D., A.A. (Antonella Arghittu), A.A. (Antonio Azara), G.G. and P.C.; methodology, G.D., M.D., A.A. (Antonella Arghittu) and P.C.; Software, P.C.; Supervision, P.C.; validation, G.G. and P.C.; writing—original draft, G.D., A.A. (Antonella Arghittu) and A.A. (Antonio Azara); writing—review and editing, M.D., G.G. and P.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available on reasonable request.

Conflicts of Interest

The authors declare no conflict of interest directly relating to this manuscript. GG declares outside this paper having received personal fees for advisory board membership and consultancy from Emergent BioSolutions, the GSK group of companies, Merck Sharp & Dohme, Pfizer, Sanofi Pasteur Italy, Moderna and Seqirus, as well as personal fees for lectures from Merck Sharp & Dohme, Moderna, Pfizer, and Seqirus. PC declares outside this paper having received personal fees for advisory board membership and travel expenses for lectures from the GSK group of companies, Merck Sharp & Dohme, Pfizer, Sanofi Pasteur Italy, Moderna and Seqirus.

References

Hore, S. What Are Large Language Models (LLMs)? Analitycs Vidhya. 2023. Available online: https://www.analyticsvidhya.com/blog/2023/03/an-introduction-to-large-language-models-llms/ (accessed on 28 June 2023).
Muehmel, K. What Is a Large Language Model, the Tech Behind ChatGPT? Data Iku. 2023. Available online: https://blog.dataiku.com/large-language-model-chatgpt (accessed on 28 June 2023).
Sarker, I.H. AI-Based Modeling: Techniques, Applications and Research Issues Towards Automation, Intelligent and Smart Systems. SN Comput. Sci. 2022, 3, 158. [Google Scholar] [CrossRef] [PubMed]
Korteling, J.E.; van de Boer-Visschedijk, G.C.; Blankendaal, R.A.M.; Boonekamp, R.C.; Eikelboom, A.R. Human-versus Artificial Intelligence. Front. Artif. Intell. 2021, 4, 622364. [Google Scholar] [CrossRef]
Howard, J. Artificial Intelligence: Implications for the Future of Work. Am. J. Ind. Med. 2019, 62, 917–926. [Google Scholar] [CrossRef]
Castelvecchi, D. Are ChatGPT and AlphaCode Going to Replace Programmers? Nature 2022. [Google Scholar] [CrossRef] [PubMed]
OpenAI. Introducing ChatGPT. OpenAI 2022. Available online: https://openai.com/blog/chatgpt (accessed on 28 June 2023).
Burak, A. OpenAI ChatGPT, the Most Powerful Language Model: An Overview. Relevant 2023. Available online: https://relevant.software/blog/openai-chatgpt-the-most-powerful-language-model-an-overview/ (accessed on 28 June 2023).
Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models Are Few-Shot Learners. 2020. Available online: https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf (accessed on 28 June 2023).
Ray, P.P. ChatGPT: A Comprehensive Review on Background, Applications, Key Challenges, Bias, Ethics, Limitations and Future Scope. Internet Things Cyber-Phys. Syst. 2023, 3, 121–154. [Google Scholar] [CrossRef]
Al Hakim, Z. ChatGPT: A Revolution in Natural Language Processing. Boer. Tecnology. 2023. Available online: https://btech.id/news/chatgpt-a-revolution-in-natural-language-processing/ (accessed on 28 June 2023).
Christiano, P.; Leike, J.; Brown, T.B.; Martic, M.; Legg, S.; Amodei, D. Deep Reinforcement Learning from Human Preferences. 2017. Available online: https://arxiv.org/pdf/1706.03741.pdf (accessed on 28 June 2023).
Manikandan, B. Demystifying ChatGPT: A Deep Dive into Reinforcement Learning with Human Feedback. Medium 2023. Available online: https://bmanikan.medium.com/demystifying-chatgpt-a-deep-dive-into-reinforcement-learning-with-human-feedback-1b695a770014 (accessed on 28 June 2023).
Strasser, A. On Pitfalls (and Advantages) of Sophisticated Large Language Models. 2023. Available online: https://arxiv.org/pdf/2303.17511.pdf (accessed on 28 June 2023).
Deng, J.; Lin, Y. The Benefits and Challenges of ChatGPT: An Overview. Front. Comput. Intell. Syst. 2022, 2. Available online: https://drpress.org/ojs/index.php/fcis/article/view/4465 (accessed on 28 June 2023). [CrossRef]
Huh, S. Are ChatGPT’s Knowledge and Interpretation Ability Comparable to Those of Medical Students in Korea for Taking a Parasitology Examination?: A Descriptive Study. J. Educ. Eval. Health Prof. 2023, 20, 1. [Google Scholar] [CrossRef] [PubMed]
Else, H. Abstracts written by ChatGPT fool scientists. Nature 2023, 613, 423. [Google Scholar] [CrossRef]
Tai, M.C.T. The Impact of Artificial Intelligence on Human Society and Bioethics. Tzu Chi Med. J. 2020, 32, 339–343. [Google Scholar] [CrossRef] [PubMed]
Harrer, S. Attention Is Not All You Need: The Complicated Case of Ethically Using Large Language Models in Healthcare and Medicine. EBioMedicine 2023, 90, 104512. [Google Scholar] [CrossRef]
Li, H.; Moon, J.T.; Purkayastha, S.; Celi, L.A.; Trivedi, H.; Gichoya, J.W. Ethics of Large Language Models in Medicine and Medical Research. Lancet Digit. Health 2023. [Google Scholar] [CrossRef]
Goodman, R.S.; Patrinely, J.R.; Osterman, T.; Wheless, L.; Johnson, D.B. On the Cusp: Considering the Impact of Artificial Intelligence Language Models in Healthcare. Med 2023, 4, 139–140. [Google Scholar] [CrossRef] [PubMed]
Karabacak, M.; Margetis, K. Embracing Large Language Models for Medical Applications: Opportunities and Challenges. Cureus 2023, 15, e39305. [Google Scholar] [CrossRef] [PubMed]
Singhal, K.; Azizi, S.; Tu, T.; Mahdavi, S.S.; Wei, J.; Chung, H.W.; Scales, N.; Tanwani, A.; Cole-Lewis, H.; Pfohl, S.; et al. Large Language Models Encode Clinical Knowledge. arXiv 2022, arXiv:2212.13138. [Google Scholar] [CrossRef]
Das, A.; Padala, K.P.; Bagla, P.; Padala, P.R. Stress of Overseas Long-Distance Care During COVID-19: Potential “CALM”ing Strategies. Front. Psychiatry 2021, 12, 734967. [Google Scholar] [CrossRef]
World Health Organization. mHealth. Use of Appropriate Digital Technologies for Public Health. Available online: https://apps.who.int/gb/ebwha/pdf_files/WHA71/A71_20-en.pdf (accessed on 28 June 2023).
Flores Mateo, G.; Granado-Font, E.; Ferré-Grau, C.; Montaña-Carreras, X. Mobile Phone Apps to Promote Weight Loss and Increase Physical Activity: A Systematic Review and Meta-Analysis. J. Med. Internet Res. 2015, 17, e253. [Google Scholar] [CrossRef] [Green Version]
Materia, F.T.; Faasse, K.; Smyth, J.M. Understanding and Preventing Health Concerns About Emerging Mobile Health Technologies. JMIR Mhealth Uhealth 2020, 8, e14375. [Google Scholar] [CrossRef] [PubMed]
Arghittu, A.; Dettori, M.; Castiglia, P. First Year of Special Issue “New Insights in Vaccination and Public Health”: Opinions and Considerations. Vaccines 2023, 11, 600. [Google Scholar] [CrossRef]
Sufi, F.K.; Razzak, I.; Khalil, I. Tracking Anti-Vax Social Movement Using AI-Based Social Media Monitoring. IEEE Trans. Technol. Soc. 2022, 3, 290–299. [Google Scholar] [CrossRef]
De Coninck, D.; Frissen, T.; Matthijs, K.; D’haenens, L.; Lits, G.; Champagne-Poirier, O.; Carignan, M.-E.; David, M.D.; Pignard-Cheynel, N.; Salerno, S.; et al. Beliefs in Conspiracy Theories and Misinformation About COVID-19: Comparative Perspectives on the Role of Anxiety, Depression and Exposure to and Trust in Information Sources. Front. Psychol. 2021, 12, 646394. [Google Scholar] [CrossRef]
Arghittu, A.; Dettori, M.; Dempsey, E.; Deiana, G.; Angelini, C.; Bechini, A.; Bertoni, C.; Boccalini, S.; Bonanni, P.; Cinquetti, S.; et al. Health Communication in COVID-19 Era: Experiences from the Italian VaccinarSì Network Websites. Int. J. Environ. Res. Public Heal. 2021, 18, 5642. [Google Scholar] [CrossRef] [PubMed]
Ofri, D. The Emotional Epidemiology of H1N1 Influenza Vaccination. N. Engl. J. Med. 2009, 361, 2594–2595. [Google Scholar] [CrossRef]
Hammershaimb, E.A.; Campbell, J.D.; O’leary, S.T. Coronavirus Disease-2019 Vaccine Hesitancy. Pediatr. Clin. North Am. 2023, 70, 243–257. [Google Scholar] [CrossRef] [PubMed]
World Health Organization. Strategic Advisory Group of Experts on Immunization (SAGE). Available online: https://www.who.int/groups/strategic-advisory-group-of-experts-on-immunization (accessed on 28 June 2023).
Stamm, T.A.; Partheymüller, J.; Mosor, E.; Ritschl, V.; Kritzinger, S.; Alunno, A.; Eberl, J.-M. Determinants of COVID-19 vaccine fatigue. Nat. Med. 2023, 29, 1164–1171. [Google Scholar] [CrossRef]
Wawrzuta, D.; Klejdysz, J.; Jaworski, M.; Gotlib, J.; Panczyk, M. Attitudes toward COVID-19 Vaccination on Social Media: A Cross-Platform Analysis. Vaccines 2022, 10, 1190. [Google Scholar] [CrossRef]
Karami, A.; Zhu, M.; Goldschmidt, B.; Boyajieff, H.R.; Najafabadi, M.M.; Graffigna, G. COVID-19 Vaccine and Social Media in the U.S.: Exploring Emotions and Discussions on Twitter. Vaccines 2021, 9, 1059. [Google Scholar] [CrossRef]
World Health Organization. Vaccines and Immunization: Myths and Misconceptions. Available online: https://www.who.int/news-room/questions-and-answers/item/vaccines-and-immunization-myths-and-misconceptions (accessed on 28 June 2023).
Centers for Disease Control and Prevention. Common Vaccine Safety Questions and Concerns. Available online: https://www.cdc.gov/vaccinesafety/concerns/index.html (accessed on 28 June 2023).
Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, Minnesota, 2–7 June 2019; Volume 1, pp. 4171–4186. [Google Scholar] [CrossRef]
Sallam, M. ChatGPT Utility in Healthcare Education, Research, and Practice: Systematic Review on the Promising Perspectives and Valid Concerns. Healthcare 2023, 11, 887. [Google Scholar] [CrossRef] [PubMed]
Lo, C.K. What Is the Impact of ChatGPT on Education? A Rapid Review of the Literature. Educ. Sci. 2023, 13, 410. [Google Scholar] [CrossRef]
Dwivedi, Y.K.; Kshetri, N.; Hughes, L.; Slade, E.L.; Jeyaraj, A.; Kar, A.K.; Baabdullah, A.M.; Koohang, A.; Raghavan, V.; Ahuja, M.; et al. “So What If ChatGPT Wrote It?” Multidisciplinary Perspectives on Opportunities, Challenges and Implications of Generative Conversational AI for Research, Practice and Policy. Int. J. Inf. Manag. 2023, 71, 102642. [Google Scholar] [CrossRef]
Stokel-Walker, C. AI Bot ChatGPT Writes Smart Essays—Should Professors Worry? Nature 2022. Available online: https://www.nature.com/articles/d41586-022-04397-7 (accessed on 28 June 2023). [CrossRef]
Del Vicario, M.; Bessi, A.; Zollo, F.; Petroni, F.; Scala, A.; Caldarelli, G.; Stanley, H.E.; Quattrociocchi, W. The Spreading of Misinformation Online. Proc. Natl. Acad. Sci. USA 2016, 113, 554–559. [Google Scholar] [CrossRef]
Rodrigues, C.M.C.; Plotkin, S.A. Impact of Vaccines; Health, Economic and Social Perspectives. Front. Microbiol. 2020, 11, 1526. [Google Scholar] [CrossRef]
Tregoning, J.S.; Flight, K.E.; Higham, S.L.; Wang, Z.; Pierce, B.F. Progress of the COVID-19 Vaccine Effort: Viruses, Vaccines and Variants versus Efficacy, Effectiveness and Escape. Nat. Rev. Immunol. 2021, 21, 626–636. [Google Scholar] [CrossRef]
Quilici, S.; Smith, R.; Signorelli, C. Role of Vaccination in Economic Growth. J. Mark. Access Health Policy 2015, 3, 27044. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Arghittu, A.; Dettori, M.; Azara, A.; Gentili, D.; Serra, A.; Contu, B.; Castiglia, P. Flu Vaccination Attitudes, Behaviours, and Knowledge among Health Workers. Int. J. Environ. Res. Public Health 2020, 17, 3185. [Google Scholar] [CrossRef]
Greyling, T.; Rossouw, S. Positive Attitudes towards COVID-19 Vaccines: A Cross-Country Analysis. PLoS ONE 2022, 17, e0264994. [Google Scholar] [CrossRef]
Roberts, C.H.; Brindle, H.; Rogers, N.T.; Eggo, R.M.; Enria, L.; Lees, S. Vaccine Confidence and Hesitancy at the Start of COVID-19 Vaccine Deployment in the UK: An Embedded Mixed-Methods Study. Front. Public Health 2021, 9, 82. [Google Scholar] [CrossRef]
Dettori, M.; Arghittu, A.; Deiana, G.; Azara, A.; Masia, M.D.; Palmieri, A.; Spano, A.L.; Serra, A.; Castiglia, P. Influenza Vaccination Strategies in Healthcare Workers: A Cohort Study (2018–2021) in an Italian University Hospital. Vaccines 2021, 9, 971. [Google Scholar] [CrossRef]
Lee, S.J.; Lee, C.-J.; Hwang, H. The Impact of COVID-19 Misinformation and Trust in Institutions on Preventive Behaviors. Health Educ. Res. 2023, 38, 95–105. [Google Scholar] [CrossRef]
Arghittu, A.; Deiana, G.; Castiglia, E.; Pacifico, A.; Brizzi, P.; Cossu, A.; Castiglia, P.; Dettori, M. Knowledge, Attitudes, and Behaviors towards Proper Nutrition and Lifestyles in Italian Diabetic Patients during the COVID-19 Pandemic. Int. J. Environ. Res. Public Health 2022, 19, 11212. [Google Scholar] [CrossRef] [PubMed]
Lee, S.K.; Sun, J.; Jang, S.; Connelly, S. Misinformation of COVID-19 Vaccines and Vaccine Hesitancy. Sci. Rep. 2022, 12, 13681. [Google Scholar] [CrossRef] [PubMed]
Clemente-Suárez, V.J.; Navarro-Jiménez, E.; Simón-Sanjurjo, J.A.; Beltran-Velasco, A.I.; Laborde-Cárdenas, C.C.; Benitez-Agudelo, J.C.; Bustamante-Sánchez, Á.; Tornero-Aguilera, J.F. Mis–Dis Information in COVID-19 Health Crisis: A Narrative Review. Int. J. Environ. Res. Public Health 2022, 19, 5321. [Google Scholar] [CrossRef]
Miller, E.; Wodi, A.P. General Best Practice Guidance for Immunization. 2021. Available online: https://www.cdc.gov/vaccines/pubs/pinkbook/downloads/genrec.pdf (accessed on 28 June 2023).
Centers for Disease Control and Prevention. Timing and Spacing of Immunobiologics. 2023. Available online: https://www.cdc.gov/vaccines/hcp/acip-recs/general-recs/timing.html (accessed on 28 June 2023).
Grassi, T.; Bagordo, F.; Savio, M.; Rota, M.C.; Vitale, F.; Arghittu, A.; Sticchi, L.; Gabutti, G. Sero-Epidemiological Study of Bordetella Pertussis Infection in the Italian General Population. Vaccines 2022, 10, 2130. [Google Scholar] [CrossRef]
Shukla, V.V.; Shah, R.C. Vaccinations in Primary Care. Indian J. Pediatr. 2018, 85, 1118–1127. [Google Scholar] [CrossRef] [PubMed]
Chatterjee, J.; Dethlefs, N. This New Conversational AI Model Can Be Your Friend, Philosopher, and Guide. and Even Your Worst Enemy. Patterns 2023, 4, 100676. [Google Scholar] [CrossRef]
Dettori, M.; Castiglia, P. COVID-19 and Digital Health: Evolution, Perspectives and Opportunities. Int. J. Environ. Res. Public Health 2022, 19, 8519. [Google Scholar] [CrossRef]
Stokel-Walker, C.; Van Noorden, R. What ChatGPT and Generative AI Mean for Science. Nature 2023, 614, 214–216. [Google Scholar] [CrossRef]
Sarasso, P.; Neppi-Modona, M.; Sacco, K.; Ronga, I. “Stopping for Knowledge”: The Sense of Beauty in the Perception-Action Cycle. Neurosci. Biobehav. Rev. 2020, 118, 723–738. [Google Scholar] [CrossRef]
Arghittu, A.; Deiana, G.; Dettori, M.; Dempsey, E.; Masia, M.D.; Palmieri, A.; Spano, A.L.; Azara, A.; Castiglia, P. Web-Based Analysis on the Role of Digital Media in Health Communication: The Experience of Vaccinarsinsardegna Website. Acta Biomed. 2021, 92, e2021456. [Google Scholar] [CrossRef] [PubMed]
Shin, D.; Kee, K.F.; Shin, E.Y. The Nudging Effect of Accuracy Alerts for Combating the Diffusion of Misinformation: Algorithmic News Sources, Trust in Algorithms, and Users’ Discernment of Fake News. J. Broadcast. Electron. Media 2023, 67, 141–160. [Google Scholar] [CrossRef]
Sallam, M.; Salim, N.A.; Al-Tammemi, A.B.; Barakat, M.; Fayyad, D.; Hallit, S.; Harapan, H.; Hallit, R.; Mahafzah, A. ChatGPT Output Regarding Compulsory Vaccination and COVID-19 Vaccine Conspiracy: A Descriptive Study at the Outset of a Paradigm Shift in Online Search for Information. Cureus 2023, 15, e35029. [Google Scholar] [CrossRef] [PubMed]
Dave, T.; Athaluri, S.A.; Singh, S. ChatGPT in Medicine: An Overview of Its Applications, Advantages, Limitations, Future Prospects, and Ethical Considerations. Front. Artif. Intell. 2023, 6, 1169595. [Google Scholar] [CrossRef]
Castiglia, P.; Arghittu, A. New Insight in Vaccination and Public Health: A Commentary from Special Issue Editors. Vaccines 2022, 10, 183. [Google Scholar] [CrossRef]
Caceres, M.M.F.; Sosa, J.P.; Lawrence, J.A.; Sestacovschi, C.; Tidd-Johnson, A.; UI Rasool, M.H.; Gadamidi, V.K.; Ozair, S.; Pandav, K.; Cuevas-Lou, C.; et al. The Impact of Misinformation on the COVID-19 Pandemic. AIMS Public Health 2022, 9, 262–277. [Google Scholar] [CrossRef] [PubMed]
Dettori, M.; Arghittu, A.; Castiglia, P. Knowledge and Behaviours towards Immunisation Programmes: Vaccine Hesitancy during the COVID-19 Pandemic Era. Int. J. Environ. Res. Public Health 2022, 19, 4359. [Google Scholar] [CrossRef] [PubMed]
Garett, R.; Young, S.D. Online Misinformation and Vaccine Hesitancy. Transl. Behav. Med. 2021, 11, 2194–2199. [Google Scholar] [CrossRef] [PubMed]
Pahl, S. An Emerging Divide: Who Is Benefiting from AI? United Nations Industrial Development Organization. 2023. Available online: https://iap.unido.org/articles/emerging-divide-who-benefiting-ai (accessed on 28 June 2023).
Dozier, M. ChatGPT Creates Digital Divide. Issuu 2023. Available online: https://issuu.com/megmortiz/docs/final_may_for_issuue/s/24995631 (accessed on 28 June 2023).

Table 1. WHO’s list of eleven myths and misconceptions * relating to vaccines and immunization.

Weren’t diseases already disappearing before vaccines were introduced because of better hygiene and sanitation?
Which disease show the impact of vaccines the best?
What about hepatitis B? Does that mean the vaccine didn’t work?
What happens if countries don’t immunize against diseases?
Can vaccines cause the disease? I’ve heard that the majority of people who get disease have been vaccinated.
Will vaccines cause harmful side effects, illnesses or even death? Could there be long term effects we don’t know about yet?
Is it true that there is a link between the diphtheria-tetanus-pertussis (DTP) vaccine and sudden infant death syndrome (SIDS)?
Isn’t even a small risk too much to justify vaccination?
Vaccine-preventable diseases have been virtually eliminated from my country. Why should I still vaccinate my child?
Is it true that giving a child multiple vaccinations for different diseases at the same time increases the risk of harmful side effects and can overload the immune system?
Why are some vaccines grouped together, such as those for measles, mumps and rubella?

* The questions were worded exactly as given on the WHO website, notwithstanding the typo in Question 2.

Table 2. Scores and mean values of accuracy assigned by Raters to the answers provided by GPT-3.5 and GPT-4.0.

Q	Items	Groups				Item Mean	Mean 3.5	Mean 4.0	Total Mean
		Rater 1		Rater 2
		GPT-3.5	GPT-4.0	GPT-3.5	GPT-4.0
1	Correctness	3	3	3	4	3.25	2.67	3.50	3.08
	Clarity	2	3	2	3	2.50
	Exhaustiveness	3	4	3	4	3.50
2	Correctness	3	4	3	4	3.50	3.17	4.00	3.58
	Clarity	3	4	2	4	3.25
	Exhaustiveness	4	4	4	4	4.00
3	Correctness	2	3	1	2	2.00	1.17	2.17	1.67
	Clarity	1	2	1	2	1.50
	Exhaustiveness	1	2	1	2	1.50
4	Correctness	3	4	3	3	3.25	3.17	3.33	3.25
	Clarity	3	3	3	3	3.00
	Exhaustiveness	3	4	4	3	3.50
5	Correctness	4	3	3	4	3.50	3.33	3.83	3.58
	Clarity	3	4	3	4	3.50
	Exhaustiveness	3	4	4	4	3.75
6	Correctness	3	4	3	3	3.25	3.33	3.83	3.58
	Clarity	3	4	4	4	3.75
	Exhaustiveness	3	4	4	4	3.75
7	Correctness	4	4	4	4	4.00	3.50	4.00	3.75
	Clarity	3	4	3	4	3.50
	Exhaustiveness	3	4	4	4	3.75
8	Correctness	4	4	4	4	4.00	4.00	4.00	4.00
	Clarity	4	4	4	4	4.00
	Exhaustiveness	4	4	4	4	4.00
9	Correctness	4	4	4	3	3.75	4.00	3.83	3.92
	Clarity	4	4	4	4	4.00
	Exhaustiveness	4	4	4	4	4.00
10	Correctness	3	4	3	3	3.25	3.33	3.83	3.58
	Clarity	3	4	4	4	3.75
	Exhaustiveness	3	4	4	4	3.75
11	Correctness	4	2	4	3	3.25	4.00	3.17	3.58
	Clarity	4	3	4	4	3.75
	Exhaustiveness	4	3	4	4	3.75
Total items’ mean	Correctness	3.36	3.55	3.18	3.36	3.36
	Clarity	3.00	3.55	3.09	3.64	3.32
	Exhaustiveness	3.18	3.73	3.64	3.73	3.57
Total	mean	3.18	3.61	3.30	3.58	3.42	3.24	3.59	3.42
	DS	0.80	0.65	0.94	0.65	0.65	0.87	0.65	0.79
	%	79.5	90.2	82.6	89.4	85.4	81.1	89.8	85.4

Table 3. Mean values of the three items assigned by Raters on the answers provided by GPT-3.5 and GPT-4.0.

Item	Mean Score		Percentage (%)		Δ (%)
Item	GPT-3.5	GPT-4.0	GPT-3.5	GPT-4.0	Δ (%)
Correctness	3.27	3.45	81.8	86.4	5.6
Clarity	3.05	3.59	76.1	89.8	17.9
Exhaustiveness	3.41	3.73	85.2	93.2	9.3

Correctness: plausibility, coherence, scientific veracity, and evidence; Clarity: ease of understanding, appropriateness of vocabulary, conciseness, and logical order; Exhaustiveness: degree of completeness.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Deiana, G.; Dettori, M.; Arghittu, A.; Azara, A.; Gabutti, G.; Castiglia, P. Artificial Intelligence and Public Health: Evaluating ChatGPT Responses to Vaccination Myths and Misconceptions. Vaccines 2023, 11, 1217. https://doi.org/10.3390/vaccines11071217

AMA Style

Deiana G, Dettori M, Arghittu A, Azara A, Gabutti G, Castiglia P. Artificial Intelligence and Public Health: Evaluating ChatGPT Responses to Vaccination Myths and Misconceptions. Vaccines. 2023; 11(7):1217. https://doi.org/10.3390/vaccines11071217

Chicago/Turabian Style

Deiana, Giovanna, Marco Dettori, Antonella Arghittu, Antonio Azara, Giovanni Gabutti, and Paolo Castiglia. 2023. "Artificial Intelligence and Public Health: Evaluating ChatGPT Responses to Vaccination Myths and Misconceptions" Vaccines 11, no. 7: 1217. https://doi.org/10.3390/vaccines11071217

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Artificial Intelligence and Public Health: Evaluating ChatGPT Responses to Vaccination Myths and Misconceptions

Abstract

1. Introduction

2. Materials and Methods

2.1. Study Design

2.2. Quantitative and Qualitative Analysis

2.3. Statistical Analysis

3. Results

3.1. Quantitative Analysis

3.2. Qualitative Analysis

4. Discussion

Study Limitations

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI