Article

Using an Evidence-Based Approach for Policy-Making Based on Big Data Analysis and Applying Detection Techniques on Twitter

by Somayeh Labafi 1,*, Sanee Ebrahimzadeh 2, Mohamad Mahdi Kavousi 3, Habib Abdolhossein Maregani 4 and Samad Sepasgozar 5

1 Iranian Research Institute for Information Science and Technology (IranDoc), Tehran 1314156545, Iran
2 Department of Media Management, University of Tehran, Tehran 1411713114, Iran
3 Institute of Higher Education of Ershad Damavand, Tehran 1349933851, Iran
4 Department of Business Management, University of Tehran, Tehran 1738953355, Iran
5 Faculty of Arts, Design and Architecture, The University of New South Wales, Sydney, NSW 2052, Australia
* Author to whom correspondence should be addressed.
Big Data Cogn. Comput. 2022, 6(4), 160; https://doi.org/10.3390/bdcc6040160
Submission received: 21 September 2022 / Revised: 1 December 2022 / Accepted: 5 December 2022 / Published: 19 December 2022

Abstract

Evidence-based policy seeks to use evidence in public policy in a systematic way in order to improve the quality of decision-making. Evidence-based policy cannot work properly or achieve the expected results without accurate, appropriate, and sufficient evidence. Given the prevalence of social media and the intensity of user engagement, the question to ask is whether the data on social media can be used as evidence in the policy-making process. This raises the further question of which characteristics of the data qualify them as evidence. Despite the numerous research studies carried out on social media analysis or policy-making, this domain has not been examined through an “evidence detection” lens. Thus, this study addresses the gap in the literature on how to analyze the big text data produced by social media and how to use them for policy-making based on evidence detection. The present paper seeks to fill this gap by developing and offering a model that can help policy-makers distinguish “evidence” from “non-evidence”. To do so, in the first phase of the study, the researchers elicited the characteristics of “evidence” by conducting a thematic analysis of semi-structured interviews with experts and policy-makers. In the second phase, the developed model was tested on six months of data collected from Twitter accounts. The experimental results show that the evidence detection model performed best with the decision tree (DT), which outperformed the other algorithms with an accuracy score of 85.09%. The model thus fulfilled the aim of the present study: detecting Twitter posts that can be used as evidence. This study contributes to the body of knowledge by exploring novel models of text processing and offering an efficient method for analyzing big text data. The practical implication of the study also lies in its efficiency and ease of use, which supplies policy-makers with the evidence they require.

1. Introduction

Social media play a special role in our daily lives. These platforms, which are based on the ideology and technology of Web 2.0 [1], have transformed the way people communicate and interact [2]. They work based on user participation in content generation and have led to the emergence of active and dynamic users instead of passive citizens, so that users are more engaged in diverse social developments [3]. In the meantime, such media are increasingly used to provide feedback on various social issues. In fact, with the emergence of participatory social media, a new ecosystem has come to light that facilitates citizen participation in social events [4]. The sharing of content by citizens can give rise to social values, such as building a social discourse or raising awareness of political and economic issues [5]. The use by governments of the values created in this environment can lead to the formation of good governance in all countries, especially developing countries [6].
In the wake of the platform revolution, the role of policy-makers has changed. They have taken on new roles, including actively analyzing and extracting knowledge from the opinions of citizens using digital platforms [7]. One of the best ways to identify and understand citizens’ views in a bid to take them into account in the policy-making process is using social media platforms [8,9]. Policy-makers need to be aware of citizens’ opinions expressed in the form of social media posts, as they are published through minimum gatekeeping [10]. This creates an exceptional chance for policy-makers to interact more effectively with the citizens, learn about their needs and opinions, and take them into account in the policy-making process [11]. In fact, social media can serve as a channel between users and policy-makers and can be used as a new source for engaging citizens in formulating and implementing policies [12]. This also creates a great opportunity for governments to learn about their citizens’ views and communicate effectively with them. Social media analysis has drawn more attention in recent years, with governments seeking to take advantage of this opportunity to boost user participation on social media [13].
Social media data analysis is not new to the literature, and researchers and experts have already analyzed various datasets in different countries through diverse methods and techniques over the past few years [14,15]. However, the application of these data and their analysis in the field of policy-making is a novel concept and there are many gaps and challenges that should be addressed [12,13,14,15,16]. Policy-makers have started using social media data in various sectors, including education [17], health [18], and communications [10]. Analyzing these data can contribute to improving the performance of governments, boosting the quality of services, creating new and developed forms of interaction with citizens, and promoting the welfare of citizens [3]. Doing so can not only help governments improve their decision-making and governance, but also revolutionize the creation and provision of such services [19,20].
Based on the evidence-based policy approach, the policy-maker is obliged to use various types of evidence. Traditionally, the basis of “evidence” is the knowledge elicited from applied research, statistics, surveys, focus groups, and so on. Now, however, policy-makers cannot ignore the evidence on various issues that is being widely produced by citizens on social media [21,22]. Since the question of how data retrieved from social media can be used in the policy-making process has not yet been answered [12], finding a response is of great significance. Even though various studies have been conducted on social media analysis and policy-making in recent years [20,22,23], social media have not been viewed as a source of much-needed evidence in the policy-making process. The significance of social media data used as evidence in the policy-making process is undisputed [24,25], but the unresolved problem is how to organize a large body of scattered data and use it as evidence in the policy-making process. Policy-makers therefore need to distinguish “evidence” from “non-evidence” among a plethora of social media posts. There is an indication that Iranian policy-makers welcome the use of Twitter analytic tools, including monitoring dashboards, to obtain insights into diverse areas, including public health, education, and technology. To date, however, Iranian technology policy-makers have not systematically used social media data in the policy-making process, despite the incontrovertible significance of such data, which can reflect the opinions and feedback of the users. This is despite the fact that ignoring the views of social media users, who are already sensitive to policy issues, can challenge the formulation, implementation, and evaluation of future policies. To this end, the present study sought to provide a model for detecting evidence in Twitter posts. After being designed and tested, the model was proposed for use in the technology policy-making process in Iran.
In the first phase of this study, the authors drafted the characteristics of tweets to be considered as evidence in the field of technology by reviewing the related literature and by conducting semi-structured interviews with technology policy experts in Iran. By evidence, the authors mean tweets that are relevant, critical, and representative of the facts on the ground, and thus capable of providing a criterion for policy-makers to base their decisions on an evidence-based, rather than an intuition-based, approach. In this way, the evidence-based approach can help reduce uncertainties and bridge the gap between speculation and fact. In the second phase, the researchers collected six months of data from Twitter accounts pertaining to the field of technology and labeled them as evidence or non-evidence. Next, they developed a model, based on this six-month Twitter dataset, to distinguish evidence from non-evidence.
In the following sections, first, the related background studies will be reviewed, then the steps of conducting the research study will be explained, and finally, its applications will be discussed.

2. Background

Technology policy was chosen as a controversial area, given the challenges facing its implementation and the public outcry it occasionally triggers. As a developing country, Iran has been facing complexities in terms of emerging technology policy. The multiplicity of formal and informal policy-making institutions, high conflict of interest among policy stakeholders, lack of transparency in policy formulation and implementation, insufficient knowledge of the technologies, the uncertainty of social, economic, and cultural effects of these technologies, and the complications of the economic environment are among such complexities [26,27]. Needless to say, the policies related to technology in Iran, as in many countries, are limited and too general, lagging behind the emerging technologies and preventing the policy-makers from responding appropriately to the rising problems. To reduce the potential inefficiencies, the policy-makers must be aware of the needs, interests, and desires of all stakeholders in the field of technology.
By January 2020, about 70 percent of Iranians had access to the Internet, a sharp increase of 11 percent from 2019. It is also estimated that more than 33 million Iranians (40%) use social networks. Many Iranians take to Twitter to express their critical and expert-level views on diverse social and political issues. By focusing on content rather than on users’ profiles, Twitter has created an environment where users can easily debate popular topics, with features such as hashtags and algorithmic timelines [28]. This thematic arrangement of content, together with the openness and real-time nature of Twitter [29], has enabled experts and policy-makers in various fields to use it as a source of ideas to set their policy priorities [30].
Twitter is steadily expanding its user base and is estimated to have over 3.2 million users in the selected country of this study, with nearly 790 million tweets (including 200 million retweets) posted between 21 March 2021 and 20 March 2022 [31]. Twitter users in Iran were estimated at around 2 million a year earlier, with 500 million tweets (including 200 million retweets) published every year. The users are mostly from the educated strata of Iranian society, and most of the content is created by the users themselves; unlike on other platforms (such as Instagram), there is less copying and pasting. Twitter has become highly politicized in Iran, and most policy-makers, public opinion leaders, and experts in various fields have official Twitter accounts. Via Twitter, Iranians from different walks of life, from ordinary citizens to members of parliament and cabinet members, can express their views on diverse social, political, and economic issues in society. Iranians have found that launching a Twitter storm is an effective way to spread an idea, belief, demand, or protest. They already use and share hashtags to bring issues underreported by the mainstream media to trend on Twitter, as part of a strategy to grant themselves a voice [32]. It can be argued that Twitter data, as a representation of public opinion, can influence the policy-making agenda.
Nowadays, policy-makers actively take advantage of social media networks to reach a wider audience, raise awareness of the issues that matter to them, promote their views, mobilize supporters, and receive timely feedback [33]. Given the prevalence of Twitter among Iranian intellectual figures, policy-makers have already kept an eye on the content produced and spread on the platform. If detected and analyzed, such data can be translated into relevant evidence that is much needed by the policy-makers in emerging areas, including technology policy.

3. Literature Review

3.1. Evidence-Based Policy

Evidence-based policy is defined as the systematic use of evidence in formulating public policies. It is an approach that has its roots in evidence-based medicine and treatment [34,35]. Evidence-based policy became pervasive once the modernization of governments rose to the top of the agenda of countries and the tendency for policy-making based on social science analysis increased [35,36]. Policy-makers and government officials sought to respond to the needs and pressures of citizens for whom social services were inadequate or unsuitable. Frustrated with the low efficiency of policies used in the social programs, the policy-makers and government managers were prompted to seek new policy approaches [37]. Policies based on old ideologies no longer worked, and the policy-makers welcomed more modern approaches [27]. These new orientations were coupled with the tendencies of major NGOs, guilds and other stakeholders in the public space to become involved in solving the existent problems [38]. The awakening of policy-makers opened up an opportunity for the public policy field to offer solutions in the form of new approaches to gain more control over the ambiguous and confusing realities of the policy-making environment. Policy-makers adopted an evidence-based policy approach as part of an effort to solve their problems and increase policy efficiency. This new approach is the result of the emergence of the inefficiencies of government policy-making and its implementation in various fields [8,39] and aims to develop policy alternatives to boost the quality of decisions made by policy-makers.
The evidence-based policy approach is based on two contexts without which proper enforcement of this policy-making approach is not possible. The first context is a favorable political culture that allows the inclusion of transparency and rationality in the policy-making process. The second includes a research culture committed to analytical studies using rigorous scientific methods that generate a wide range of evidence for policy-making [40]. Sufficient information is one of the prerequisites of a good policy [29–41]. Policy-making requires a variety of evidence in complex, variable, and high-risk policy areas. Therefore, generating up-to-date, multidimensional, and multilevel evidence can further help policy-makers in this field [42]. There are several reasons why the traditional evidence collected through applied research, which is available to the policy-makers, is not sufficient for effective policy-making. First, it is not possible to obtain accurate findings on key issues in many areas through research. Second, policy-makers and politicians are often influenced and motivated by many factors other than research evidence [41,42,43]. Therefore, it can be concluded that the availability of reliable research results alone does not guarantee their effectiveness. Valid, extensive, and multidimensional evidence is largely missing in evidence-based policy. Producing evidence with these characteristics demands consistent and long-term efforts by policy-makers and researchers.

3.2. What Is the Evidence?

An evidence-based policy cannot work properly and yield the expected results without accurate, sufficient, and appropriate evidence. Searching for accurate and reliable evidence and efficiently using it in the policy-making process is viewed as one of the underlying principles of the evidence-based policy approach [39–44]. The following three types of evidence are important in policy-making: political evidence, analytical and technical evidence, and evidence collected from the field and through professional experience [24]. In fact, these types of evidence offer three kinds of lenses for policy-making. Each is produced with its own knowledge base and professional and political protocols and is subject to policy-makers’ interpretations, limitations, and constraints [45].
In evidence-based policy, the scope of evidence must be expanded to include all types of evidence [40–46]. Traditionally, the basis of “evidence” as the foundation of evidence-based policy is the knowledge produced via applied research on comprehensive trends and the explanation of social and organizational phenomena [46]. Depending on the type of policy issue and their data collection capabilities [34–47], governments need a certain level of evidence analysis capability for policy-making. In practice, however, such a capability does not always exist, and governments often fail to systematically analyze the large body of formal and informal evidence and incorporate it into the policy-making process [48,49].
From an evidence-based policy perspective, the question is as follows: what kind of data/information is needed to produce evidence? Some policy researchers and analysts have started asking the question of whether the persistence of complex social problems is due to the lack of data available to policy-makers [50,51]. However, now, the evidence shows that obtaining more data to fill the gaps will not necessarily lead us to good policy solutions because the most important step to take in policy-making is to reconcile different values and political views with scientific evidence within a policy-making system that is capable of approximating the evidence elicited from the opinions of stakeholders.
Hence, there is disagreement about what kind and quality of evidence can help improve policies. In a broader sense, it can be argued that there is no single basis to define what evidence is [27]. All types of different evidence must be used in a more inclusive context [37], that is, formal and informal evidence needs to be integrated to meet policy needs.

3.2.1. Challenges of Using Evidence in Policy-Making

Quantitative data and empirical methods have been used for many years as tools to provide accurate and reliable evidence for policy-makers. Over the years, the focus has been on their advantages, such as the high accuracy of these methods in producing quantitative and analytical evidence [43–48]. However, three main challenges have always confronted the use of this evidence in policy-making. The first stems from the inherently political and value-oriented nature of policy and decision-making [45], which does not allow the use of this type of evidence in policy-making. The second is related to the fact that evidence is produced in different ways by different actors who look at the phenomena through different lenses. The third challenge is the complicated policy network that has made evidence difficult to use. Policy network actors interpret, understand, and prioritize evidence in different ways based on different experiences and values [45]. In the real world, policies do not follow analytical and empirical evidence but are rather based on judgments, values, and so on. Policy-making is a vague, politicized process, sometimes contradictory to the original paradigms [52], and requires compromise, adjustment, and agreement, all of which make it difficult to use evidence.
A crisis or emergency, political priorities, and changing social values in public opinion are among the cases where governments face difficulty in using evidence [27,49,53,54]. These are some of the complicated issues faced by policy-makers who will be entangled with problems in producing appropriate and multidimensional evidence, while pursuing the evidence-based policy approach. Instances of such policies can be found in areas related to moral and ethical issues. These are some of the major problems threatening evidence producers, which have validated the use of different types of evidence-based methods on different foundations in the policy-making process to meet the aforementioned challenges.

3.2.2. Evidence from Public Engagement

Government policies are implemented in the public sphere, or more specifically the sphere of public opinion, in which all the policy stakeholders are present. This environment reacts to different policies, and the success or failure of a policy is determined within it. Policy-makers must respond to this complex environment and encourage the stakeholders present in it to implement the policies. There is an emphasis on the need to collect and produce evidence based on the engagement of the public and all policy stakeholders, which is viewed as the basis for producing multidimensional and comprehensive evidence [54]. Engaging all stakeholders in a policy is often considered part of the policy-makers’ long-term relationship with them, as well as the development of public engagement capabilities [55,56].
Involving citizens’ digital participation in policies and employing it in the policy-making process in the form of evidence is already underway. Policy-makers need to use citizens’ digital interaction data to learn about public views and the way public discourses are formed, who the main stakeholders are, and what their expert groups and communications are like, which are growing on a daily basis [55,56,57]. We are now witnessing an increase in the use of new technologies to collect and analyze citizens’ online participation data [58,59]. These tools operate as a complement to the conventional data collection and analysis methods to generate evidence. This adds to the questions about how these digital audiences can provide useful evidence for policy-makers.

3.3. Social Media Data as Evidence in the Policy-Making Process

Social media data offer a good representation of the big data of any country, opening up many opportunities to raise awareness about the demands of citizens. Social media data enable accurate predictions, create knowledge, bring about new services, and can lead to providing citizens with better facilities [10,16,60]. Using social media data, policy-makers can reduce policy costs and achieve sustainable development in a variety of areas. In addition, social media data can enhance policy transparency, increase the credibility of governments and policy-makers, boost government oversight performance, and narrow the gap between government oversight and the realities in society [61]. Such characteristics of social media data can enable policy-makers to boost the oversight of citizens’ digital interactions [12]. In addition, the ideal combination of social media data with policy-makers’ background knowledge will result in more relevant information. Moreover, social media can generate large amounts of data, in contrast to the traditional sources of evidence, and can garner novel insights into how stakeholders think about policy issues. The value of social media data as part of the policy-making cycle and using evidence collected from social media is highly practical; therefore, it is not possible to ignore its significance [62]. Using social media data in policy-making is not a new issue, as various tools have already been developed for collecting, analyzing, and visualizing social media content almost on a daily basis [60,61,62,63].
Recent research has focused on combining social media data with the policy-making process and how these resources can enable policy-makers to obtain fresh insights. Fernandez et al. [64] employed the citizen participation (CP) framework and linked it to the digital participation created through social media. Panayiotopoulos, Bowen, and Brooker [13] used the crowd capabilities conceptual framework to underline the value of social media data for policy-makers in the policy-making process. They question how policy-makers understand the value of social media data and which of the collective capabilities of social media data can meet the needs of policy-makers for filtering data input in the policy-making process. Edom, Hwang, and Kim [5] examined the roles that communities formed on social media can assume in promoting the social responsibility of governments. They presented a typology of government communication on social media in which social media data become a new resource for policy-makers and increase communication between policy-makers and citizens. Gintova [63] believes that few studies have so far shed light on the behaviors and views of users on government social media. In her study, she analyzed the experiences of social media users and the way they interact on the Twitter and Facebook pages run by a Canadian federal government agency. Driss, Mellouli, and Trabelsi [12] provide a conceptual framework for employing data generated by citizens on Facebook in policy-making. According to their research results, social media data can be used in two phases of the policy-making cycle, i.e., the definition of the problem and policy evaluation. Napoli [64] argues that social media, under the framework of “public resources”, are a public resource that can and should be used to promote public interests, and that there is a positive correlation between extracting and analyzing social media data and improving public life. The authors of [53] investigated the impact of social media on citizens’ participation in events that needed policy intervention and concluded that more access and activity on social media result in better participation in such affairs. Lee, Lee, and Choi [29] examined the impact of politicians’ Twitter communications on advancing their policies, assuming that politicians around the world are increasingly using social media as a channel of direct communication with the public. The study results indicated that communication breakdowns on social media by politicians would have the greatest impact on policy implementation during the process of policy-making, causing the users to distrust the enforcement of the policies. Simonofski et al. [16] present a framework for analyzing social media data and using them in policy-making, integrating data collected from users’ digital participation on electronic platforms and social media and employing them at various stages of the policy-making process.
Previous research works into social media analysis and policy-making already corroborate the importance of using social media data in policy-making. However, some studies have questioned the use of such data as evidence in policy-making [12,41,58,63,64,65].

4. Methods

4.1. Proposed Model for Evidence Detection

One of the major objectives that the present research seeks to realize is the identification of the characteristics that the Twitter posts need to possess to be considered as policy evidence. Since labeled data (evidence/non-evidence) were required to train the evidence detection model, the researchers carried out in-depth, semi-structured interviews with the Iranian technology policy practitioners to elicit the characteristics. The experts were selected based on their expertise and several years of practical experience related to technology policy-making. The interviews began with general questions about the effectiveness of Twitter in policy-making and proceeded based on the statements made by the interviewees. Prior to the interview, an interview guide, which contained a series of open-ended questions aimed at further preparing the interviewees, was emailed to them. The interviews were transcribed and analyzed via the thematic analysis technique to extract the characteristics of tweets that were deemed as tech-related policy evidence by the policy-makers, based on which the tweets were labeled as evidence or non-evidence in the next phase of the study. Some challenges arose while conducting the interviews with the experts and policy-makers about identifying the characteristics of the so-called evidence tweets. The definition of what was referred to as “evidence” was a subjective concept and lacked an objective criterion. In addition, some of the definitions had a broader scope and included professional experience, political knowledge, ideas of stakeholders, etc., while some others had a narrower definition of evidence based on statistical comparisons. In aggregate, 96 themes, 32 sub-themes, and 15 concepts (characteristics) were extracted from 480 comments and meaningful sentences. Table 1 lists the characteristics of the tweets considered as evidence by the interviewees.
To detect evidence tweets, all tweets had to be labeled as evidence or non-evidence based on the characteristics extracted from the interviews with technology policy-making experts described above. Using these characteristics, the tweets were divided into two classes, evidence and non-evidence, during the labeling process.

4.2. Data Set and Feature Engineering

In order to develop an evidence detection model, as one of the other major objectives of this research, the following steps were carried out.

4.2.1. Data Collection

All of the data used in the present study were collected from Persian tweets posted over six months using the Twitter API. First, 39 keywords related to “technology policy in Iran” were selected by reviewing the literature in Persian [59–66]. The tweets were searched and collected based on the selected keywords. As there were few tweets for some keywords, the authors removed them from the list and placed greater focus on other, more frequently used keywords. There were also some relevant keywords commonly used by the users that did not exist in the technology policy-making jargon; the authors included these keywords in order to collect more tweets related to technology policy-making, searching for and collecting tweets that contained their hashtags. Based on the selected keywords, 28,277 tweets were initially collected. Nearly half of the tweets, comprising duplicates, retweets, spam, advertisements, and irrelevant tweets, were removed from the dataset during the pre-processing phase, leaving 14,029 tweets.
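A minimal sketch of this pre-processing step is given below. It assumes the raw tweets have already been retrieved (e.g., via the Twitter API) as dictionaries with text fields; the field names and the simple spam heuristic are illustrative assumptions, not the authors' exact pipeline.

```python
import re

def preprocess(raw_tweets):
    """Drop retweets, duplicate texts, and obviously promotional tweets,
    mirroring the pre-processing step described above (a sketch)."""
    seen = set()
    cleaned = []
    for tweet in raw_tweets:               # each tweet: {"id": ..., "text": ...}
        text = tweet["text"].strip()
        if text.startswith("RT @"):        # retweet
            continue
        normalized = re.sub(r"\s+", " ", text).lower()
        if normalized in seen:             # duplicate content
            continue
        if re.search(r"(buy now|discount|t\.me/)", normalized):
            continue                       # crude spam/ad heuristic (illustrative)
        seen.add(normalized)
        cleaned.append(tweet)
    return cleaned
```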

4.2.2. Feature Extraction

After the pre-processing step, the appropriate features were extracted. Previous research on social network user behavior has employed account-based and text-based features to analyze user data, implying the successful use of these features in analyzing Twitter posts. Many studies on social media analysis [56–67] have already employed account-based features to evaluate user profiles and text-based features to identify the behavioral patterns in the text. To the best of the authors’ knowledge, however, these features have not been used in any study on policy-making to distinguish evidence from non-evidence. Therefore, given the successful use of these features in various studies, the authors decided to employ the selected features to distinguish evidence items from non-evidence items. The research study also aimed to extract new features that can contribute to detecting evidence posts. Accordingly, both text-based and account-based features were used to distinguish evidence posts from non-evidence posts. Table 2 shows the text-based features used in the study.
Table 3 lists account-based features that showcase the characteristics of the accounts.
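As an illustration, a few of the text-based features in Table 2 can be computed with simple string operations. The sketch below is ours: the helper name, the emoji range, and the sentence-splitting rule (including the Persian question mark) are assumptions, not the authors' exact definitions.

```python
import re
import string

URL_RE = re.compile(r"https?://\S+")
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")  # rough emoji range

def text_features(text):
    """Compute a handful of the text-based features listed in Table 2."""
    words = text.split()
    sentences = [s for s in re.split(r"[.!?؟]+", text) if s.strip()]
    return {
        "n_hashtags": text.count("#"),
        "n_mentions": text.count("@"),
        "n_urls": len(URL_RE.findall(text)),
        "n_emojis": len(EMOJI_RE.findall(text)),
        "n_chars": len(text),
        "n_words": len(words),
        "n_lines": text.count("\n") + 1,
        "n_sentences": len(sentences),
        "n_punctuation": sum(ch in string.punctuation for ch in text),
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
    }
```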

4.2.3. Feature Selection

In this phase, to select the best subset of the features, the information gain metric was used to determine the value of each feature. Since the most effective feature for classification is the one that decreases entropy, the information gain metric is used for measuring the amount of entropy decline. The information gain is calculated via the following formula:
Gain(S, A) = Entropy(S) − Σ_{j ∈ Values(A)} (|S_j| / |S|) × Entropy(S_j)
Table 4 shows the gain value of each feature. Some of the extracted features are more important for the evidence detection model, while others are less important.
In calculating the information gain of each feature, the value of some features was zero, which may be because the values of those features were the same for both evidence and non-evidence labels. For example, all the user accounts in the dataset had descriptions and profile pictures in their accounts. It was also made clear that the time of posting tweets (around the clock) was almost equally distributed between the evidence and non-evidence categories, so features such as the time of posting the tweets were not considered an important feature in training the algorithm.
Taking into account the information gain values of the features, different subsets of features were examined to implement the classification model. Accordingly, features 1 to 25 were chosen as the best subset of features in this study, which can train the model to detect evidence tweets with a higher degree of accuracy. This subset included 20 text-based features and 5 account-based features. Figure 1 shows the information gain of each feature.
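For illustration, the gain of a (discretized) feature can be computed directly from the formula above. This is a sketch under our own naming, assuming the features and the evidence/non-evidence labels are held in pandas Series; continuous features would need to be binned first.

```python
import numpy as np
import pandas as pd

def entropy(labels: pd.Series) -> float:
    """Shannon entropy of a label distribution."""
    p = labels.value_counts(normalize=True)
    return float(-(p * np.log2(p)).sum())

def information_gain(feature: pd.Series, labels: pd.Series) -> float:
    """Gain(S, A) = Entropy(S) - sum_j (|S_j|/|S|) * Entropy(S_j),
    where the S_j are the subsets induced by the values of feature A."""
    weighted = sum(
        (len(subset) / len(labels)) * entropy(subset)
        for _, subset in labels.groupby(feature)
    )
    return entropy(labels) - weighted

# Example: rank all features against the evidence label, as in Table 4.
# gains = {c: information_gain(df[c], df["label"]) for c in feature_columns}
```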

4.3. Proposed Classification Approach

Machine learning algorithms were used to distinguish evidence tweets from non-evidence tweets. Since we aim to predict a specific output for each sample via evidence detection, our approach is based on supervised learning and classification techniques. The steps for implementing our proposed model are presented in the framework (Figure 2). The supervised learning approach involves two main processes: training the algorithm and testing the trained algorithm. The data used for this purpose have to be prepared based on the extracted feature set: each sample S is represented as S = {f1, f2, f3, …, fn} and labeled as L, so each instance in the dataset is written as Data = {(S1, L1), (S2, L2), (S3, L3), …, (Sn, Ln)}. To train the machine learning (ML) algorithms, the dataset is split into two parts, a train set and a test set. To train the ML algorithm, all the samples in the larger part of the data (the train set), along with their labels, are given to the ML algorithm (classifier) in the format above, and the second part is then used for testing.
The test set consists of samples that the ML algorithm has never encountered before. All the samples in the test set, without their labels, are given to the trained classifier in the format Test = {S1, S2, …, Sn}. The trained classifier then predicts and assigns a label to each sample. Finally, the predicted class of each sample is compared with its original label. A decision tree is commonly used as an inductive inference algorithm to solve classification problems and develop prediction models; Chapelle and Chang (2011) reported decision trees as the most popular class of functions. Owing to its interpretability and high classification power, the decision tree is one of the most widely used algorithms in supervised learning problems. The structure of a decision tree resembles a tree, consisting of a root, internal nodes, and leaves: the most informative feature is located at the root, each internal node tests a feature value, and the leaves represent the final class assigned to the samples in question. In Section 5.3 of this study, we tested several algorithms, with the decision tree showing the best performance.
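As a concrete illustration, this train/test procedure can be written with scikit-learn. The sketch assumes a feature matrix X (one row of the 25 selected features per tweet) and a label vector y encoded as 1 for evidence and 0 for non-evidence; the 90/10 split mirrors the 4104/456 split reported in Section 5.3, while the hyperparameters are defaults rather than the authors' settings.

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Split the labeled data into train and test sets (~90/10, as in Section 5.3).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=42
)

clf = DecisionTreeClassifier(random_state=42)  # best-performing classifier here
clf.fit(X_train, y_train)      # training: samples are presented with their labels
y_pred = clf.predict(X_test)   # testing: labels are predicted for unseen samples
```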
The performance of each classifier can be assessed by evaluation metrics.

4.4. Evaluation Metrics

To examine the performance of the evidence detection model, we used the metrics widely used by researchers in classification problems. After comparing the labels predicted by the classifiers with the actual label of the samples, the results were grouped into the following four categories:
  • True positive (TP): tweets that belong to class Evidence (E) and are correctly predicted as class E.
  • False positive (FP): tweets that do not belong to class E and are incorrectly predicted as class E.
  • True negative (TN): tweets that do not belong to class E and are correctly predicted as class non-E.
  • False negative (FN): tweets that belong to class E and are incorrectly predicted as class non-E.
The above classifier definitions are displayed via the confusion matrix in Table 5.
The performance of a classifier can be evaluated by accuracy, precision, recall, and F-measure metrics. Precision, recall, and F-measure metrics are calculated using confusion matrix values according to the following formulas:
Precision = TP/(TP + FP)
Recall = TP/(TP + FN)
F-measure = (2 ∗ Precision ∗ Recall)/(Precision + Recall)
Recall is defined as the number of correct results divided by the number of results that should have been returned, while precision is the fraction of returned results that are correct. The F score is the harmonic mean of precision and recall. Accuracy is one of the most common metrics in evaluating classifiers’ performance. This metric is calculated using the following formula:
Accuracy = (TP + TN)/(TP + TN + FP + FN)
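These four metrics can be computed directly from the confusion matrix; the sketch below assumes the y_test and y_pred arrays from the previous step, with 1 denoting the evidence class, and cross-checks one value against scikit-learn's built-in helper.

```python
from sklearn.metrics import confusion_matrix, precision_score

# With binary 0/1 labels, rows/columns are ordered non-evidence (0), evidence (1).
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

# Sanity check against scikit-learn's implementation.
assert abs(precision - precision_score(y_test, y_pred)) < 1e-9
```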

5. Experiments and Results

In this section, first, the significance of the topics from the users’ viewpoint is reviewed based on the number of tweets containing each keyword; then, the users’ behaviors are analyzed; and finally, the proposed evidence detection model is assessed.

5.1. Statistical Report

Figure 3 shows the number of extracted tweets. Among the topics related to technology, “filtering”, with 8817 tweets, and “#filtering”, with 1248 tweets, were the most frequent, meaning that they were considered the most important topics from the perspective of users. Tweets containing the keywords “Personal Information”, “Information Disclosure”, “Access to Information”, and “Free Access” were, respectively, the next most important topics from the users’ point of view.
Figure 4 illustrates the topics most frequently used by the users in a word cloud. The word cloud is arranged according to the number of tweets containing the related keywords, indicating the significance of various topics in technology policy. It reflects the topics that attracted the most attention from the users in the above-mentioned period.
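The frequency ranking behind Figure 3 (and the weights of the word cloud in Figure 4) can be reproduced by counting the tweets that contain each keyword. The keyword list below is an illustrative subset, not the authors' full 39-keyword list.

```python
from collections import Counter

keywords = ["filtering", "#filtering", "personal information"]  # illustrative subset

counts = Counter()
for tweet in cleaned:                       # tweets from the pre-processing step
    text = tweet["text"].lower()
    counts.update(kw for kw in keywords if kw in text)

for kw, n in counts.most_common():
    print(f"{kw}: {n} tweets")

# The counts can also feed a word cloud, e.g. with the wordcloud package:
# WordCloud().generate_from_frequencies(counts)
```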

5.2. Analyzing the Behavior of the Users Posting Evidence Tweets

This section examines the behaviors of the users who posted the evidence tweets. To implement the proposed evidence detection model, first, all of the extracted tweets, regardless of the keywords, were combined into an integrated dataset, from which 4560 tweets were randomly selected. The evidence generated by Twitter users has its own characteristics, which are shown in Figure 5. According to Figure 5a, most of the evidence tweets did not contain a hashtag, and very few had even one hashtag. This indicates that the texts that can be considered as evidence are often not hashtagged by the users. Figure 5b shows that most of the evidence tweets lacked emojis. This means that users whose tweets can be considered as evidence mostly do not use emojis in their tweets; this absence of emojis is indicative of the formal register in which these tweets were posted.
Figure 5c also shows that most evidence-generating users are reluctant to use more than one mention in their tweets, which seems to indicate that the users do not intend to post personal comments or reply to another specific tweet, nor do they intend to address other users or attract their attention. Figure 5d indicates the absence of URL links in more than 87% of the evidence tweets, with the remaining 13% containing only one URL link. This indicates that referencing sources outside Twitter is not very common as far as generating evidence is concerned, as most of the evidence is generated and published within the platform. Figure 5e also shows that evidence-generating users often tend to express their opinions within 1–3 sentences, which also shows the brevity of the evidence tweets.
In Figure 5f, it can be observed that non-evidence tweets use a larger number of characters (mostly over 200) than evidence tweets. Similarly, according to Figure 5g, the users are more inclined to write single-line texts, showing no tendency to post long tweets or threads. Figure 5h shows that the average number of words used in both groups is almost the same (around four). According to Figure 5i, nearly 70% of the users used up to 40 spaces in the text of the evidence tweets to express their views, with the remaining 30% using more than that. Moreover, as the last three charts show, the patterns extracted from the user behavior analysis suggest a less frequent use of punctuation marks, a less frequent use of words exceeding five letters, and a limited use of words exceeding three letters in evidence tweets.
The figures below also show the analysis of the behavior patterns extracted from account-based features (followers, followings, and lists). It was found that 75% of the users who posted evidence tweets had published fewer than 17,000 tweets, while 25% had posted more than 17,000. In contrast, 35% of the users who posted non-evidence tweets had fewer than 17,000 tweets, and 65% had more than 17,000.
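Patterns of this kind can be derived with simple grouped aggregations. The sketch below assumes a DataFrame df holding the extracted features, the account-level statuses_count, and a binary label column; the column names are our assumptions.

```python
import pandas as pd

# Share of tweets in each class that contain no hashtag at all (cf. Figure 5a).
print(df.assign(no_hashtag=df["n_hashtags"] == 0)
        .groupby("label")["no_hashtag"].mean())

# Share of authors above the 17,000-tweet threshold in each class.
print(df.assign(heavy_poster=df["statuses_count"] > 17_000)
        .groupby("label")["heavy_poster"].mean())
```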

5.3. Evaluating Proposed Model

To evaluate the proposed model, the authors used 4560 tweets, which were divided into two groups of evidence and non-evidence tweets. In this dataset, 2978 samples were labeled as evidence and 1583 as non-evidence. The dataset was then divided into two segments to train and evaluate the proposed model: the train set (4104 samples) and the test set (254 non-evidence and 202 evidence samples). In this step, six classifiers, namely support vector machine (SVM), decision tree (DT), K-nearest neighbor (KNN), linear discriminant analysis (LDA), XGBoost, and logistic regression (LR), were used to detect evidence tweets [68], and their performances were assessed. First, the classifier models were fitted on the train set, and after the learning process, the trained classifiers were applied to the test set. Table 6 shows the performance results of each classifier based on the evaluation metrics.
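A sketch of this comparison under the same split is shown below, using scikit-learn for five of the classifiers and the xgboost package for XGBoost; the default hyperparameters are our assumption, as the paper does not report the exact settings.

```python
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from xgboost import XGBClassifier  # requires the xgboost package

classifiers = {
    "SVM": SVC(),
    "DT": DecisionTreeClassifier(random_state=42),
    "KNN": KNeighborsClassifier(),
    "LDA": LinearDiscriminantAnalysis(),
    "XGBoost": XGBClassifier(),
    "LR": LogisticRegression(max_iter=1000),
}

# Train each classifier on the train set and score it on the held-out test set.
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(f"{name}: accuracy={accuracy_score(y_test, y_pred):.2%}, "
          f"F-measure={f1_score(y_test, y_pred):.2%}")
```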
Given that the F-measure is calculated from precision and recall, we chose to report the F-measure instead of the other two metrics. As can be observed, the decision tree classifier achieved the highest F-measure, at 84.26 percent, followed by XGBoost, K-nearest neighbor (KNN), logistic regression (LR), linear discriminant analysis (LDA), and support vector machine (SVM).
In terms of accuracy, the results listed in Table 7 indicate that DT achieved the highest rate, with an accuracy score of 85.09%, compared with the other classifiers.
These results indicate that the decision tree surpassed the other classifiers in terms of both accuracy and F-measure. The experiments suggest that the performance of the proposed model was sufficient to achieve this study’s desired goal, namely distinguishing evidence tweets from non-evidence tweets with acceptable accuracy. Since detecting evidence tweets via the proposed model is a new idea and the features used in this study have not been used in previous works in this field, the accuracy obtained for this model can be considered satisfactory. Further analysis of the proposed model reveals that text-based features perform better than account-based features. The reason for this may be that users who post evidence tweets at one time may post non-evidence tweets at another, which technically means that the account-based features of a user who posts both evidence and non-evidence tweets will be the same for both classes. It is for this reason that the text-based features were found to be more reliable than the account-based features in detecting evidence tweets. Even so, combining these two kinds of features can increase the accuracy of the model.
The best classifier selected for the proposed model enables the researchers to use data collected from one of the most significant social networks among Persian-speaking users in the process of technology policy-making, through an evidence-based policy approach. Labeling the data is one of the most challenging stages of preparing the dataset, since it is a time-consuming process. By applying the proposed model to distinguish evidence from non-evidence tweets, policy-makers can access useful tweets that can be considered as evidence and learn about the users’ opinions, interests, needs, and capacities. This evidence can inform appropriate policy decisions worldwide, especially in the selected case country. Furthermore, the proposed model can be applied to larger datasets to improve its training and increase its accuracy.

6. Discussion

This paper contributes to the current body of knowledge by offering an evidence detection model to facilitate the use of data collected from Twitter, an influential social media platform used by policy-makers. This technology supports policy-makers by offering an evidence-based policy-making approach. The policy-makers can use the proposed evidence detection model to learn about the users’ views, interests, needs, and capacities and make proper policy decisions accordingly. This model can be designed as a dashboard to be used by technology policy-makers.
The development of policy evidence from formal evidence to informal evidence elicited from user-generated data on Twitter is the first theoretical contribution of this research. However, this should be treated with caution, as tweets cannot be deemed as scientific evidence. Formalizing the use of social media data as evidence in the evidence-based policy approach is the second contribution of the study. To this end, features that are capable of turning user-generated social media data into evidence were detected. Some extracted features were general enough to be applied to other policy-making areas. The present study formalized user-generated social media data in the policy-making process by presenting a model based on several scientific fields, including policy-making (evidence-based policy-making), social media, and data science. The theoretical approach adopted in this article was initially derived from evidence-based policy. It also originated from citizens’ digital participation in public policy. In this study, to the best of the authors’ knowledge, for the first time, using social media data as evidence in policy-making was proposed and accordingly, a model was designed to detect evidence. The proposed model can rid the policy-makers of the daunting task of detecting evidence among a plethora of irrelevant data collected simply based on selected keywords. The model, if applied, can facilitate using such data in the evidence-based policy-making process. The authors decided to use posts published on Twitter, a social media platform in Iran with extensive user-generated content on public policies. In addition, the scope of the research was limited to technology policy, which, due to the emerging problems it deals with, needs to include citizens’ views. Such policies can be applied to different communities. Moreover, this research study was conducted in a policy area in Iran where the policy-makers are reluctant to use users’ opinions in the policy-making process. The model proposed in this study can facilitate this process and possibly encourage policy-makers to do so. Thus, implementing the model in similar developing countries can prepare the ground for the participation of citizens in shaping public policies.
This study addressed key challenges and the current gap in the literature by focusing on policy-making and addressing some key issues. Some of the issues raised by the researchers regarding using social media data in policy-making can be summarized as follows:
  • Lack of text-based analytical tools for detecting policy evidence from the posts on social media [12,69];
  • Analytical tools developed so far are more suitable for improving the image of brands or obtaining feedback from customers about the products or services offered by companies and may fail to meet policy-making needs [13,63–70];
  • Comments by non-expert users about specialized policy issues, in the form of social media posts, may not be included in the evidence category. Moreover, there is no specific tool to sort out such data [21,71,72];
  • The big data shared on social media may have been tainted with bias, further complicating the analysis [70–75];
  • The intents of the users that participate in producing posts on social media usually differ from those of the analysts [36,45,71,76–79].
The research also offers a great practical tool for policy-makers to use in various contexts. Due to its acceptable level of accuracy for detecting evidence among large datasets obtained from Twitter, the model verified in the present study can be easily employed by policy-makers. It is also possible to apply it to other social media platforms in other developing countries. The authors suggest that researchers should use the evidence detection model in other communities for detecting evidence on diverse policy issues about which the citizens are sensitive to producing content on social media to provide the policy-makers with evidence elicited from the public opinions on social media. This model enables the policy-makers to add a different, but very important, channel to their evidence-gathering channels.

7. Conclusions

The large amount of data shared on social media is considered one of the obstacles to using this information in the policy-making process. The main reason for this is the complex and challenging process of analyzing big data created by various users over an extensive time period. This paper facilitated the acceptance of a novel approach to analyzing such data by reducing the size of the data and applying machine learning models. Using evidence in policy-making can facilitate the process, obviating the tremendous task of detecting evidence among a large number of posts on social media. The present study contributes to the efforts to develop a model that can help policy-makers distinguish evidence from non-evidence.
This study offers innovative contributions at several levels. First, using a machine learning approach, the investigation identified features of user-generated content that can be converted into policy evidence. Such features had not been identified in the previous, relevant literature in the field. Despite being limited to a specific policy area (technology), many of the features are general enough to be used in detecting evidence posts in other areas as well. These features can also be adapted to the selected social media platform and policy area. Second, this study integrated data analysis techniques and applied them to develop a model; social media data analysis methods had not previously been used for detecting evidence. The proposed model is limited to the tested data and a specific policy area, namely technology in Iran. Given the significance of first-hand evidence elicited from end users’ feedback in technology policy-making, this research can serve as a role model for the implementation of similar studies in other communities and policy areas. Moreover, the features on which the model was developed are of a global nature and can be used in similar cases across the world. The present research, for the first time, proposed a model that distinguishes evidence posts from non-evidence posts on Twitter. It can underpin the decisions made by policy-makers by providing social media users’ views on different issues. Given the large amount of data on social media, including users’ views and comments, it is not possible to use all the data in the policy-making process. The proposed model is capable of detecting evidence posts, supplying policy-makers with suitable data, and encouraging them to use the data in the policy-making process.
However, distinguishing evidence from non-evidence in social media data cannot be considered the only way to improve evidence-based policy. This is especially the case with complex problems, which can only be solved through different types of evidence. The present study simply aimed to develop different types of evidence and did not intend to criticize other types of evidence. Sole reliance on insights from social media analytics can even result in bipolar views and non-constructive arguments for policy-makers [72] and even mislead them, since the social media data analysis methods are not developed enough to offer comprehensive views on policy-making. However, the development of different concepts, methods, and techniques in this area can be helpful and may increase the efficiency of the policy-making process.
For future work, the authors suggest that other researchers should test the proposed model on other social media platforms to answer the question of whether the features capable of turning social media data into policy evidence on Twitter can gain similar results on other social media platforms. This study was limited to technology policy. The question that future research should address is whether diverse policy areas need different evidence detection models.

Author Contributions

Conceptualization, S.L.; methodology, S.L. and M.M.K.; software, M.M.K. and S.E.; validation, S.L., S.S. and S.E.; formal analysis, S.S. and H.A.M.; investigation, M.M.K.; resources, S.E.; data curation, M.M.K.; writing—original draft preparation, S.L. and H.A.M.; writing—review and editing, H.A.M.; visualization, S.S.; supervision, S.L. and S.S.; project administration, S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Kaplan, A.M.; Haenlein, M. Users of the world, unite! The challenges and opportunities of social media. Bus. Horiz. 2010, 53, 59–68.
  2. Nooren, P.; Van Gorp, N.; Van Eijk, N.; Fathaigh, R. Should We Regulate Digital Platforms? A New Framework for Evaluating Policy Options. Policy Internet 2018, 10, 264–301.
  3. Chan-Olmsted, S.M.; Wolter, L. Perceptions and practices of media engagement: A global perspective. Int. J. Media Manag. 2018, 20, 1–24.
  4. Fuchs, C. Social Media: A Critical Introduction; SAGE: Thousand Oaks, CA, USA, 2017.
  5. Eom, S.J.; Hwang, H.; Kim, J.H. Can social media increase government responsiveness? A case study of Seoul, Korea. Gov. Inf. Q. 2018, 35, 109–122.
  6. Evens, T.; Donders, K. Platform Power and Policy in Transforming Television Markets; Springer: Berlin/Heidelberg, Germany, 2018.
  7. Janssen, M.; Helbig, N. Innovating and changing the policy cycle: Policy-makers be prepared! Gov. Inf. Q. 2016, 12, 120–132.
  8. Fernando, S.; Díaz López, J.A.; Şerban, O.; Gómez-Romero, J.; Molina-Solana, M.; Guo, Y. Towards a large-scale twitter observatory for political events. Future Gener. Comput. Syst. 2019, 14, 976–983.
  9. Suzor, N.; Dragiewicz, M.; Harris, B.; Gillett, R.; Burgess, J.; Van Geelen, T. Human Rights by Design: The Responsibilities of Social Media Platforms to Address Gender-Based Violence Online. Policy Internet 2019, 11, 84–103.
  10. Dekker, R.; van den Brink, P.; Meijer, A. Social media adoption in the police: Barriers and strategies. Gov. Inf. Q. 2020, 37, 101441.
  11. DePaula, N.; Dincelli, E.; Harrison, M. Toward a typology of government social media communication: Democratic goals, symbolic acts, and self-presentation. Gov. Inf. Q. 2018, 35, 98–108.
  12. Driss, O.B.; Mellouli, S.; Trabelsi, Z. From citizens to government policy-makers: Social media data analysis. Gov. Inf. Q. 2019, 36, 560–570.
  13. Panagiotopoulos, P.; Bowen, F.; Brooker, P. The value of social media data: Integrating crowd capabilities in evidence-based policy. Gov. Inf. Q. 2017, 34, 601–612.
  14. Chen, C.; Wang, Y.; Zhang, J.; Xiang, Y.; Zhou, W.; Min, G. Statistical features-based real-time detection of drifted twitter spam. IEEE Trans. Inf. Forensics Secur. 2016, 12, 914–925.
  15. Matheus, R.; Janssen, M.; Maheshwari, D. Data science empowering the public: Data-driven dashboards for transparent and accountable decision-making in smart cities. Gov. Inf. Q. 2020, 37, 101284.
  16. Simonofski, A.; Fink, J.; Burnay, C. Supporting policy-making with social media and e-participation platforms data: A policy analytics framework. Gov. Inf. Q. 2021, 38, 101590.
  17. Sun, T.Q.; Medaglia, R. Mapping the Challenges of Artificial Intelligence in the Public Sector: Evidence from Public Healthcare. Gov. Inf. Q. 2019, 36, 368–383.
  18. Clarke, A.; Margetts, H. Governments and citizens getting to know each other? Open, closed, and big data in public management reform. Policy Internet 2014, 6, 393–417.
  19. Enroth, H. Governance: The art of governing after governmentality. Eur. J. Soc. Theory 2014, 17, 60–76.
  20. Davies, H.; Nutley, S.; Smith, P. What Works? Evidence-Based Policy and Practice in Public Services; Policy Press: Bristol, UK, 2001.
  21. Nutley, S.M.; Walter, I.; Davies, H.T. Using Evidence: How Research Can Inform Public Services; Policy Press: Bristol, UK, 2007.
  22. Picazo-Vela, S.; Gutiérrez-Martínez, I.; Luna-Reyes, L.F. Understanding risks, benefits, and strategic alternatives of social media applications in the public sector. Gov. Inf. Q. 2012, 29, 504–511.
  23. Prpić, J.; Taeihagh, A.; Melton, J. The fundamentals of policy crowdsourcing. Policy Internet 2015, 7, 340–361.
  24. Park, C.S.; Kaye, B.K. The tweet goes on: Interconnection of Twitter opinion leadership, network size, and civic engagement. Comput. Hum. Behav. 2017, 69, 174–180.
  25. Stamatelatos, G.; Gyftopoulos, S.; Drosatos, G.; Efraimidis, P.S. Revealing the political affinity of online entities through their Twitter followers. Inf. Process. Manag. 2020, 57, 102172.
  26. Parkhurst, J. The Politics of Evidence: From Evidence-Based Policy to the Good Governance of Evidence; Routledge: London, UK, 2017.
  27. Bucher, T.; Helmond, A. The affordances of social media platforms. In The SAGE Handbook of Social Media; Burgess, J., Marwick, A., Poell, T., Eds.; SAGE Publications Ltd.: London, UK, 2017; pp. 233–253.
  28. Styles, K. Twitter is 10 and It’s Still Not a Social Network. 2016. Available online: http://thenextweb.com/opinion/2016/03/21/twitter-10-still-not-social-network/ (accessed on 24 June 2020).
  29. Lee, E.J.; Lee, H.Y.; Choi, S. Is the message the medium? How politicians’ Twitter blunders affect perceived authenticity of Twitter communication. Comput. Hum. Behav. 2020, 104, 106188.
  30. Beta Research Center. Report on Social Media Networks in Iran. Available online: http://betaco.ir/%da%af%d8%b2%d8%a7%d8%b1%d8%b4%d8%b4%d8%a8%da%a9%d9%87%d9%87%d8%a7%db%8c-%d8%a7%d8%ac%d8%aa%d9%85%d8%a7%d8%b9%db%8c%db%b1%db%b4%db%b0%db%b0-%d9%85%d8%b1%da%a9%d8%b2%d8%a8%d8%aa%d8%a7/ (accessed on 13 April 2022).
  31. Mahdavi, S. Twitter, power and activism in the public sphere. Q. Mod. Media Stud. 2019, 4, 147–188.
  32. Lee, J.; Xu, W. The more attacks, the more retweets: Trump’s and Clinton’s agenda setting on Twitter. Public Relat. Rev. 2018, 44, 201–213.
  33. Howlett, M. Policy analytical capacity and evidence-based policy-making: Lessons from Canada. Can. Public Adm. 2009, 52, 153–175.
  34. Sanderson, I. Evaluation, policy learning and evidence-based policy-making. Public Adm. 2002, 80, 1–22.
  35. Parsons, W. From muddling through to muddling up—Evidence-based policy-making and the modernization of British government. Public Policy Adm. 2002, 17, 43–60.
  36. Cairney, P. The Politics of Evidence-Based Policy-Making; Springer: Berlin/Heidelberg, Germany, 2016.
  37. Head, B. Reconsidering evidence-based policy: Key issues and challenges. Policy Soc. 2010, 29, 77–94.
  38. Shaxson, L. Is your evidence robust enough? Questions for policy-makers and practitioners. Evid. Policy A J. Res. Debate Pract. 2005, 1, 101–112.
  39. Cartwright, N.; Hardie, J. Evidence-Based Policy: A Practical Guide to Doing It Better; Oxford University Press: Oxford, UK, 2012.
  40. Young, K.; Ashby, D.; Boaz, A.; Grayson, L. Social Science and the Evidence-based Policy Movement. Soc. Policy Soc. 2002, 1, 215–224.
  41. Lodge, M.; Wegrich, K. Crowdsourcing and regulatory reviews: A new way of challenging red tape in British government? Regul. Gov. 2014, 9, 30–46.
  42. Misuraca, G.; Codagnone, C.; Rossel, P. From practice to theory and back to practice: Reflexivity in measurement and evaluation for evidence-based policy making in the information society. Gov. Inf. Q. 2013, 30, S68–S82.
  43. Koziarski, J.; Lee, J.R. Connecting evidence-based policing and cybercrime. Polic. Int. J. 2020, 43, 198–211.
  44. Yang, Q. A New Approach to Evidence-Based Practice Evaluation of Mental Health in Psychological Platform under the Background of Internet + Technology. In Proceedings of the 2019 International Conference on Electronic Engineering and Informatics (EEI), Nanjing, China, 8–10 November 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 321–323.
  45. Shiroishi, Y.; Uchiyama, K.; Suzuki, N. Better actions for society 5.0: Using AI for evidence-based policy-making that keeps humans in the loop. Computer 2019, 52, 73–78.
  46. Setiadarma, E.G. Understanding the Evidence-Based Policy Making (EBPM) Discourse in the Making of the Master Plan of National Research (RIRN) Indonesia 2017–2045. STI Policy Rev. 2018, 9, 30–54.
  47. Newman, J.; Cherney, A.; Head, B.W. Policy capacity and evidence-based policy in the public service. Public Manag. Rev. 2017, 19, 157–174.
  48. Head, B.W. Three lenses of evidence-based policy. Aust. J. Public Adm. 2008, 67, 1–11.
  49. Freedman, D. The Politics of Media Policy; Polity: Cambridge, UK, 2008.
  50. Cairney, P.; Oliver, K. Evidence-based policy-making is not like evidence-based medicine, so how far should you go to bridge the divide between evidence and policy? Health Res. Policy Syst. 2017, 15, 1–11.
  51. Pang, M.S.; Lee, G.; DeLone, W.H. IT resources, organizational capabilities, and value creation in public-sector organizations: A public-value management perspective. J. Inf. Technol. 2014, 29, 187–205.
  52. Castelló, I.; Morsing, M.; Schultz, F. Communicative dynamics and the polyphony of corporate social responsibility in the network society. J. Bus. Ethics 2013, 118, 683–694.
  53. Kasabov, E. The challenge of devising public policy for high-tech, science-based, and knowledge-based communities: Evidence from a life science and biotechnology community. Environ. Plan. C Gov. Policy 2008, 26, 210–228.
  54. Zahra, S.A.; George, G. Absorptive capacity: A review, reconceptualization, and extension. Acad. Manag. Rev. 2002, 27, 185–203.
  55. McCay-Peet, L.; Quan-Haase, A. A model of social media engagement: User profiles, gratifications, and experiences. In Why Engagement Matters: Cross-Disciplinary Perspectives and Innovations on User Engagement with Digital Media; O’Brien, H., Lalmas, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2016.
  56. Abayomi-Alli, O.; Misra, S.; Abayomi-Alli, A.; Odusami, M. A review of soft techniques for SMS spam classification: Methods, approaches, and applications. Eng. Appl. Artif. Intell. 2019, 86, 197–212.
  57. Yan, Z. Big Data and Government Governance. In Proceedings of the International Conference on Information Management and Processing, London, UK, 12–14 January 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 110–114.
  58. Lundberg, J.; Laitinen, M. Twitter trolls: A linguistic profile of anti-democratic discourse. Lang. Sci. 2020, 78, 101268.
  59. Bekkers, V.; Edwards, A.; de Kool, D. Social media monitoring: Responsive governance in the shadow of surveillance? Gov. Inf. Q. 2013, 30, 335–342.
  60. Panagiotopoulos, P.; Shan, L.C.; Barnett, J.; Regan, Á.; McConnon, Á. A framework of social media engagement: Case studies with food and consumer organisations in the UK and Ireland. Int. J. Inf. Manag. 2015, 35, 394–402.
  61. Williams, M.L.; Edwards, A.; Housley, W.; Burnap, P.; Rana, O.; Avis, N.; Morgan, J.; Sloan, L. Policing cyber neighbourhoods: Tension monitoring and social media networks. Polic. Soc. 2013, 23, 461–481.
  62. Fernandez, M.; Wandhoefer, T.; Allen, B.; Cano Basave, A.; Alani, H. Using social media to inform policy-making: To whom are we listening? In Proceedings of the European Conference on Social Media (ECSM 2014), Brighton, UK, 10–11 July 2014.
  63. Gintova, M. Understanding government social media users: An analysis of interactions on Immigration, Refugees and Citizenship Canada Twitter and Facebook. Gov. Inf. Q. 2019, 36, 101388.
  64. Napoli, P.M. User data as public resource: Implications for social media regulation. Policy Internet 2019, 11, 439–459.
  65. Benthaus, J.; Risius, M.; Beck, R. Social media management strategies for organizational impression management and their effect on public perception. J. Strateg. Inf. Syst. 2016, 25, 127–139.
  66. Fan, W.; Gordon, M.D. The power of social media analytics. Commun. ACM 2014, 57, 74–81.
  67. Hoffman, D.L.; Fodor, M. Can you measure the ROI of your social media marketing? Sloan Manag. Rev. 2010, 52, 41–49.
  68. Zhang, C.; Liu, C.; Zhang, X.; Almpanidis, G. An up-to-date comparison of state-of-the-art classification algorithms. Expert Syst. Appl. 2017, 82, 128–150.
  69. Schniederjans, D.; Cao, E.S.; Schniederjans, M. Enhancing financial performance with social media: An impression management perspective. Decis. Support Syst. 2013, 55, 911–918.
  70. Boyd, D.; Crawford, K. Critical questions for big data. Inf. Commun. Soc. 2012, 15, 662–679.
  71. Schintler, L.A.; Kulkarni, R. Big data for policy analysis: The good, the bad, and the ugly. Rev. Policy Res. 2014, 31, 343–348.
  72. Ghanei Raad, M.; Mohammadi, A.; Beigdeloo, N. Reviewing interactive patterns of institutions collaborating with science and technology supreme councils of policy. Rahyaft 2011, 49, 5–17.
  73. Ahmadian, M.; Aqajani, H.; Shirkhodaei, M.; Tehranchian, A. Designing science & technology policy model based on economic complexity approach. Public Policy 2018, 4, 27–29.
  74. Kalantari, E.; Montazer, G.; Qazinoori, S. Drafting passe scenarios of enhanced science and technology policy structure in Iran. Strateg. Manag. Res. 2019, 74, 75–102.
  75. Sedhai, S.; Sun, A. Semi-supervised spam detection in the Twitter stream. IEEE Trans. Comput. Soc. Syst. 2017, 5, 169–175.
  76. Mostafa, S.A.; Mustapha, A.; Mohammed, M.A.; Hamed, R.I.; Arunkumar, N.; Abd Ghani, M.K.; Jaber, M.M.; Khaleefah, S.H. Examining multiple feature evaluation and classification methods for improving the diagnosis of Parkinson’s disease. Cogn. Syst. Res. 2019, 54, 90–99.
  77. Ghasemaghaei, M. Are firms ready to use big data analytics to create value? The role of structural and psychological readiness. Enterp. Inf. Syst. 2019, 13, 650–674.
  78. De Paula, N.O.B.; de Araújo Costa, I.P.; Drumond, P.; Moreira, M.Â.L.; Gomes, C.F.S.; Dos Santos, M.; do Nascimento Maêda, S.M. Strategic support for the distribution of vaccines against Covid-19 to Brazilian remote areas: A multicriteria approach in the light of the ELECTRE-MOr method. Procedia Comput. Sci. 2022, 199, 40–47.
  79. Moreira, M.Â.L.; Gomes, C.F.S.; Dos Santos, M.; da Silva Júnior, A.C.; de Araújo Costa, I.P. Sensitivity Analysis by the PROMETHEE-GAIA method: Algorithms evaluation for COVID-19 prediction. Procedia Comput. Sci. 2022, 199, 431–438.
Figure 1. Values of information gain of each feature.
Figure 2. Proposed model for detecting evidence tweets.
Figure 3. The number of tweets related to each keyword.
Figure 4. Word cloud of keywords.
Figure 5. Comparative charts related to the features.
Table 1. The evidence evaluation criteria in the field of technology policy from the perspective of policy-makers.

No | Evaluation Criteria
1 | Relevance to technology policies
2 | Distinguishes individual comments from retweets
3 | Contains a specific need relevant to technology
4 | Contains statistics relevant to technology
5 | Relevance to modern technologies
6 | Provides political knowledge relevant to technology
7 | Provides practical and professional experience relevant to technology
8 | Posted by a technology expert or someone with relevant experience
9 | Contains a critical issue
10 | Indicative of social values in technology
11 | Capable of creating a network effect
12 | The topic is a political priority for the policy-maker
13 | The urgency of the topic mentioned in the tweet
14 | Reveals corruption in technology
15 | Provides analytic and technical knowledge relevant to technology
Table 2. Text-based features used in the study.

No | Feature Name | Description
1 | Swear Word | The tweet contains swear words
2 | Tweet Time | The time the tweet was sent
3 | No_Sentences | The number of sentences in the tweet
4 | No_Lines | The number of lines in the tweet
5 | No_Mentions | The number of mentions included in the tweet
6 | No_Urls | The number of URLs included in the tweet
7 | No_Hashtags | The number of hashtags included in the tweet
8 | No_Digits | The total number of digits in the tweet
9 | No_Emojis | The number of emojis included in the tweet
10 | No_Spaces | The number of spaces included in the tweet
11 | Length of Tweet | The length of the tweet
12 | Max Length of Words | The maximum word length in the tweet
13 | Mean Length of Words | The mean word length in the tweet
14 | No_Exclamation Marks | The number of exclamation marks included in the tweet
15 | No_Question Marks | The number of question marks included in the tweet
16 | No_Punctuations | The number of punctuation marks included in the tweet, except for question and exclamation marks
17 | No_Words | The total number of words in the tweet
18 | No_Characters | The total number of characters used in the tweet
19 | Digits To Chars Ratio | The ratio of the number of digits to the number of characters in the tweet
20 | Lines To Sentences Ratio | The ratio of the number of lines to the number of sentences in the tweet
21 | Words To Sentences Ratio | The ratio of the number of words to the number of sentences in the tweet
22 | Hashtags More Than 2 | The tweet has more than 2 hashtags
23 | No_Words Less Than 3 Chars | The total number of words with fewer than 3 characters in the tweet
24 | No_Words More Than 5 Chars | The total number of words with more than 5 characters in the tweet
25 | Video | The tweet contains a video
26 | Image | The tweet contains an image
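As an illustration, most of the text-based features in Table 2 reduce to simple string operations. The sketch below is a minimal, hypothetical Python implementation added for clarity (the study's own extraction code is not published); the swear-word lexicon and the regular expressions are placeholder assumptions.

```python
import re
import statistics

# Placeholder lexicon; the study's actual (Persian) swear-word list is not published.
SWEAR_WORDS = {"badword1", "badword2"}

def text_features(tweet: str) -> dict:
    """Compute a subset of the Table 2 text-based features for one tweet."""
    words = tweet.split()
    # Split on sentence-ending punctuation, including the Arabic question mark.
    sentences = [s for s in re.split(r"[.!?\u061F]+", tweet) if s.strip()]
    f = {
        "swear_word": int(any(w.lower().strip("#@") in SWEAR_WORDS for w in words)),
        "no_sentences": len(sentences),
        "no_lines": tweet.count("\n") + 1,
        "no_mentions": len(re.findall(r"@\w+", tweet)),
        "no_urls": len(re.findall(r"https?://\S+", tweet)),
        "no_hashtags": len(re.findall(r"#\w+", tweet)),
        "no_digits": sum(c.isdigit() for c in tweet),
        "no_spaces": tweet.count(" "),
        "length_of_tweet": len(tweet),
        "max_word_len": max((len(w) for w in words), default=0),
        "mean_word_len": statistics.fmean(len(w) for w in words) if words else 0.0,
        "no_words": len(words),
    }
    f["hashtags_more_than_2"] = int(f["no_hashtags"] > 2)
    f["digits_to_chars_ratio"] = f["no_digits"] / max(f["length_of_tweet"], 1)
    return f

print(text_features("New #5G sites in 3 cities! https://example.com @telecom"))
```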
Table 3. Account-based features used in the study.

No | Feature Name | Description
1 | No_Followers | The number of followers of the Twitter user
2 | No_Followings | The number of accounts the Twitter user follows
3 | FF_Ratio | The ratio of the number of followers to the number of followings
4 | Description | The profile contains a description
5 | No_Likes | The number of likes (favorites) by the Twitter user
6 | URL In Description | The profile description contains a URL
7 | No_Lists | The number of lists associated with the Twitter user
8 | No_Tweets | The number of tweets the Twitter user has sent
9 | Profile Image | The account has a profile image
10 | Background Image | The account has a background image
11 | Profile Background Image | The account has a profile background image
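The account-based features of Table 3 map almost one-to-one onto fields of the Twitter API v1.1 user object. The following sketch assumes that schema (field names such as friends_count and listed_count are API names, not the paper's); the zero-division guard on FF_Ratio is our own assumption.

```python
def account_features(user: dict) -> dict:
    """Compute the Table 3 account-based features from a Twitter v1.1 user object."""
    followers = user.get("followers_count", 0)
    following = user.get("friends_count", 0)
    description = user.get("description") or ""
    return {
        "no_followers": followers,
        "no_followings": following,
        # Guard against accounts that follow nobody.
        "ff_ratio": followers / following if following else 0.0,
        "description": int(bool(description.strip())),
        "no_likes": user.get("favourites_count", 0),
        "url_in_description": int("http" in description),
        "no_lists": user.get("listed_count", 0),
        "no_tweets": user.get("statuses_count", 0),
        "profile_image": int(bool(user.get("profile_image_url_https"))),
        "background_image": int(bool(user.get("profile_background_image_url_https"))),
        "profile_background_image": int(user.get("profile_use_background_image", False)),
    }
```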
Table 4. Information gain of each feature.

No | Feature Name | IG
1 | No_Hashtags | 0.07309
2 | No_Emojis | 0.07305
3 | No_Digits | 0.07054
4 | No_Lines | 0.06164
5 | Length of Tweet | 0.06003
6 | Digits to Chars Ratio | 0.05954
7 | Swear Words | 0.05654
8 | Hashtags More Than 2 | 0.04358
9 | No_Characters | 0.04293
10 | No_Spaces | 0.04287
11 | No_Punctuations | 0.04163
12 | No_Words | 0.03741
13 | No_Words More Than 5 Chars | 0.03626
14 | No_Mentions | 0.03428
15 | Lines To Sentences Ratio | 0.03036
16 | No_Tweets | 0.02697
17 | No_Words Less Than 3 Chars | 0.02653
18 | No_Urls | 0.02469
19 | No_Likes | 0.02287
20 | Max Length of Words | 0.02279
21 | Mean Length of Words | 0.02258
22 | No_Followings | 0.01878
23 | No_Followers | 0.01825
24 | No_Sentences | 0.01737
25 | No_Lists | 0.01227
26 | Words to Sentences Ratio | 0.00597
27 | FF_Ratio | 0.00544
28 | Background Image | 0.00313
29 | Profile Background Image | 0.00313
30 | No_Question Marks | 0.00258
31 | URL In Description | 0.00199
32 | Tweet Time | 0
33 | No_Exclamation Marks | 0
34 | Profile Image | 0
35 | Description | 0
36 | Video | 0
37 | Image | 0
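For reference, the information gain reported in Table 4 is conventionally computed as the reduction in entropy of the class label $Y$ (evidence vs. non-evidence) after observing a feature $X$:

$$\mathrm{IG}(Y;X) = H(Y) - H(Y \mid X), \qquad H(Y) = -\sum_{y} p(y)\log_2 p(y), \qquad H(Y \mid X) = \sum_{x} p(x)\, H(Y \mid X = x).$$

A feature with $\mathrm{IG}=0$ (e.g., Tweet Time, Video, Image) is, under this criterion, uninformative about whether a tweet is evidence.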
Table 5. Confusion matrix.

                    | Predicted: Evidence | Predicted: Non-Evidence
Label: Evidence     | True positive (TP)  | False negative (FN)
Label: Non-evidence | False positive (FP) | True negative (TN)
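The metrics reported in Tables 6 and 7 follow the usual definitions over these counts:

$$\text{Precision} = \frac{TP}{TP+FP}, \quad \text{Recall} = \frac{TP}{TP+FN}, \quad F\text{-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, \quad \text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}.$$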
Table 6. Performance evaluation comparison based on precision, recall, and F-measure (percent).

Algorithm | Precision | Recall | F-Measure
Decision tree (DT) | 79.13 | 90.10 | 84.26
XGBoost | 80.00 | 83.17 | 81.55
K-nearest neighbor (KNN) | 75.54 | 68.81 | 72.02
Logistic regression (LR) | 72.96 | 70.79 | 71.86
Linear discriminant analysis (LDA) | 73.85 | 47.52 | 57.83
Support vector machine (SVM) | 62.80 | 50.99 | 56.28
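As a consistency check, each F-measure in Table 6 is the harmonic mean of the corresponding precision and recall; for the decision tree, $F = \frac{2 \times 79.13 \times 90.10}{79.13 + 90.10} \approx 84.26$, matching the reported value.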
Table 7. Accuracy of classifiers.

Algorithm | Accuracy (%)
Decision tree (DT) | 85.09
XGBoost | 83.33
K-nearest neighbor (KNN) | 76.32
Logistic regression (LR) | 75.44
Linear discriminant analysis (LDA) | 69.30
Support vector machine (SVM) | 64.91
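A minimal sketch of the classifier comparison behind Tables 6 and 7, assuming the labeled tweets have already been converted into a feature matrix. The paper does not report hyperparameters, so scikit-learn defaults are used here, and synthetic data stands in for the real 37-feature matrix.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Stand-in for the real matrix of Table 2/3 features; y: 1 = evidence, 0 = non-evidence.
X, y = make_classification(n_samples=2000, n_features=37, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "DT": DecisionTreeClassifier(random_state=42),
    "KNN": KNeighborsClassifier(),
    "LR": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "SVM": SVC(),
    # XGBoost (xgboost.XGBClassifier) can be added when the package is installed.
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    y_pred = model.predict(X_te)
    p, r, f, _ = precision_recall_fscore_support(y_te, y_pred, average="binary")
    acc = accuracy_score(y_te, y_pred)
    print(f"{name}: precision={p:.2%} recall={r:.2%} F1={f:.2%} accuracy={acc:.2%}")
```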
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
