Text Mining: Challenges, Algorithms, Tools and Applications

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information Processes".

Deadline for manuscript submissions: 15 June 2024 | Viewed by 14873

Special Issue Editor

Dr. Fei Liu
Department of Computer Science & Computer Engineering, La Trobe University, Melbourne, Australia
Interests: sentiment analysis; text summarization; semantic web; logic programming

Special Issue Information

Dear Colleagues,

Text mining has emerged as a prominent field in data mining. From information retrieval, information extraction, and text classification to sentiment analysis and text summarization, text mining plays a significant role in several application fields. In recent years, various mining techniques have been developed, including rule-based and statistics-based models, support vector machines, clustering, neural networks, and deep learning. In each category, distance and similarity estimation has always been a key issue.

The aim of the Special Issue is to offer an opportunity to publish original research: cutting-edge theories, innovative algorithms, and novel applications. In particular, we welcome manuscripts on text summarization, which is commonly regarded as the most challenging area of text mining. Survey articles describing the state of the art are also welcome.

Topics include, but are not limited to, the following:

  • Information retrieval and extraction;
  • Question-answering systems;
  • Recommendation systems;
  • Security and privacy;
  • Sentiment analysis;
  • Text classification;
  • Text summarization.

Dr. Fei Liu
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • information retrieval
  • information extraction
  • recommendation systems
  • sentiment analysis
  • text summarization
  • text mining
  • machine learning
  • artificial intelligence

Published Papers (7 papers)

Research

17 pages, 852 KiB  
Article
Domain-Specific Dictionary between Human and Machine Languages
by Md Saiful Islam and Fei Liu
Information 2024, 15(3), 144; https://doi.org/10.3390/info15030144 - 05 Mar 2024
Viewed by 862
Abstract
In the realm of artificial intelligence, knowledge graphs have become an effective area of research. Relationships between entities are depicted through a structural framework in knowledge graphs. In this paper, we propose to build a domain-specific medicine dictionary (DSMD) based on the principles of knowledge graphs. Our dictionary is composed of structured triples, where each entity is defined as a concept, and these concepts are interconnected through relationships. This comprehensive dictionary boasts more than 348,000 triples, encompassing over 20,000 medicine brands and 1500 generic medicines. It presents an innovative method of storing and accessing medical data. Our dictionary facilitates various functionalities, including medicine brand information extraction, brand-specific queries, and queries involving two words or question answering. We anticipate that our dictionary will serve a broad spectrum of users, catering to both human users, such as a diverse range of healthcare professionals, and AI applications.
(This article belongs to the Special Issue Text Mining: Challenges, Algorithms, Tools and Applications)

27 pages, 967 KiB  
Article
Detecting Moral Features in TV Series with a Transformer Architecture through Dictionary-Based Word Embedding
by Paolo Fantozzi, Valentina Rotondi, Matteo Rizzolli, Paola Dalla Torre and Maurizio Naldi
Information 2024, 15(3), 128; https://doi.org/10.3390/info15030128 - 24 Feb 2024
Viewed by 937
Abstract
Moral features are essential components of TV series, helping the audience to engage with the story, exploring themes beyond sheer entertainment, reflecting current social issues, and leaving a long-lasting impact on the viewers. Their presence shows through the language employed in the plot description. Detecting them helps in understanding the series writers’ underlying message. In this paper, we propose an approach to detect moral features in TV series. We rely on the Moral Foundations Theory (MFT) framework to classify moral features and use the associated MFT dictionary to identify the words expressing those features. Our approach combines that dictionary with word embedding and similarity analysis through a deep learning SBERT (Sentence-Bidirectional Encoder Representations from Transformers) architecture to quantify the comparative prominence of moral features. We validate the approach by applying it to the definitions of the MFT moral feature labels as they appear in general authoritative dictionaries. We apply our technique to the summaries of a selection of TV series representative of several genres and relate the results to the actual content of each series, showing the consistency of results.

23 pages, 1268 KiB  
Article
Reimagining Literary Analysis: Utilizing Artificial Intelligence to Classify Modernist French Poetry
by Liu Yang, Gang Wang and Hongjun Wang
Information 2024, 15(2), 70; https://doi.org/10.3390/info15020070 - 24 Jan 2024
Viewed by 1377
Abstract
Aligned with global Sustainable Development Goals (SDGs) and multidisciplinary approaches integrating AI with sustainability, this research introduces an innovative AI framework for analyzing Modern French Poetry. It applies feature extraction techniques (TF-IDF and Doc2Vec) and machine learning algorithms (especially SVM) to create a model that objectively classifies poems by their stylistic and thematic attributes, transcending traditional subjective analyses. This work demonstrates AI’s potential in literary analysis and cultural exchange, highlighting the model’s capacity to facilitate cross-cultural understanding and enhance poetry education. The efficiency of the AI model, compared to traditional methods, shows promise in optimizing resources and reducing the environmental impact of education. Future research will refine the model’s technical aspects, ensuring effectiveness, equity, and personalization in education. Expanding the model’s scope to various poetic styles and genres will enhance its accuracy and generalizability. Additionally, efforts will focus on an equitable AI tool implementation for quality education access. This research offers insights into AI’s role in advancing poetry education and contributing to sustainability goals. By overcoming the outlined limitations and integrating the model into educational platforms, it sets a path for impactful developments in computational poetry and educational technology.

27 pages, 4466 KiB  
Article
Understanding Website Privacy Policies—A Longitudinal Analysis Using Natural Language Processing
by Veronika Belcheva, Tatiana Ermakova and Benjamin Fabian
Information 2023, 14(11), 622; https://doi.org/10.3390/info14110622 - 19 Nov 2023
Cited by 1 | Viewed by 2006
Abstract
Privacy policies are the main method for informing Internet users of how their data are collected and shared. This study aims to analyze the deficiencies of privacy policies in terms of readability, vague statements, and the use of pacifying phrases concerning privacy. It represents a step forward in the literature on this topic through a comprehensive analysis encompassing both time and website coverage. It characterizes trends across website categories, top-level domains, and popularity ranks. Furthermore, studying the development in the context of the General Data Protection Regulation (GDPR) offers insights into the impact of regulations on policy comprehensibility. The findings reveal a concerning trend: privacy policies have grown longer and more ambiguous, making it challenging for users to comprehend them. Notably, there is an increased proportion of vague statements, while clear statements have seen a decrease. Despite this, the study highlights a steady rise in the inclusion of reassuring statements aimed at alleviating readers’ privacy concerns.

28 pages, 1207 KiB  
Article
Graph-Based Extractive Text Summarization Sentence Scoring Scheme for Big Data Applications
by Jai Prakash Verma, Shir Bhargav, Madhuri Bhavsar, Pronaya Bhattacharya, Ali Bostani, Subrata Chowdhury, Julian Webber and Abolfazl Mehbodniya
Information 2023, 14(9), 472; https://doi.org/10.3390/info14090472 - 22 Aug 2023
Cited by 1 | Viewed by 1964
Abstract
The recent advancements in big data and natural language processing (NLP) have necessitated proficient text mining (TM) schemes that can interpret and analyze voluminous textual data. Text summarization (TS) acts as an essential pillar within recommendation engines. Despite the prevalent use of abstractive techniques in TS, an anticipated shift towards a graph-based extractive TS (ETS) scheme is becoming apparent. The models, although simpler and less resource-intensive, are key in assessing reviews and feedback on products or services. Nonetheless, current methodologies have not fully resolved concerns surrounding complexity, adaptability, and computational demands. Thus, we propose our scheme, GETS, utilizing a graph-based model to forge connections among words and sentences through statistical procedures. The structure encompasses a post-processing stage that includes graph-based sentence clustering. Employing the Apache Spark framework, the scheme is designed for parallel execution, making it adaptable to real-world applications. For evaluation, we selected 500 documents from the WikiHow and Opinosis datasets, categorized them into five classes, and applied the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) parameters for comparison with measures ROUGE-1, 2, and L. The results include recall scores of 0.3942, 0.0952, and 0.3436 for ROUGE-1, 2, and L, respectively (when using the clustered approach). Through a juxtaposition with existing models such as BERTEXT (with 3-gram, 4-gram) and MATCHSUM, our scheme has demonstrated notable improvements, substantiating its applicability and effectiveness in real-world scenarios.

15 pages, 2511 KiB  
Article
Aspect-Based Sentiment Analysis with Dependency Relation Weighted Graph Attention
by Tingyao Jiang, Zilong Wang, Ming Yang and Cheng Li
Information 2023, 14(3), 185; https://doi.org/10.3390/info14030185 - 16 Mar 2023
Cited by 3 | Viewed by 2136
Abstract
Aspect-based sentiment analysis is a fine-grained sentiment analysis that focuses on the sentiment polarity of different aspects of text, and most current research methods use a combination of dependency syntactic analysis and graph neural networks. In this paper, a graph attention network aspect-based sentiment analysis model based on the weighting of dependencies (WGAT) is designed to address the problem that traditional models do not sufficiently analyse the types of syntactic dependencies; in the proposed model, graph attention networks can be weighted and averaged according to the importance of different nodes when aggregating information. The model first transforms the input text into a low-dimensional word vector through pretraining, while generating a dependency syntax graph by analysing the dependency syntax of the input text and constructing a dependency weighted adjacency matrix according to the importance of different dependencies in the graph. The word vector and the dependency weighted adjacency matrix are then fed into a graph attention network for feature extraction, and sentiment polarity is predicted through the classification layer. The model can focus on syntactic dependencies that are more important for sentiment classification during training, and the results of the comparison experiments on the SemEval-2014 laptop and restaurant datasets and the ACL-14 Twitter social comment dataset show that the WGAT model has significantly improved accuracy and F1 values compared to other baseline models, validating its effectiveness in aspect-level sentiment analysis tasks.

25 pages, 1487 KiB  
Article
From Text Representation to Financial Market Prediction: A Literature Review
by Saeede Anbaee Farimani, Majid Vafaei Jahan and Amin Milani Fard
Information 2022, 13(10), 466; https://doi.org/10.3390/info13100466 - 29 Sep 2022
Cited by 2 | Viewed by 3573
Abstract
News dissemination in social media causes fluctuations in financial markets. (Scope) Recent advanced methods in deep learning-based natural language processing have shown promising results in financial market analysis. However, understanding how to leverage large amounts of textual data alongside financial market information is important for the investors’ behavior analysis. In this study, we review over 150 publications in the field of behavioral finance that jointly investigated natural language processing (NLP) approaches and a market data analysis for financial decision support. This work differs from other reviews by focusing on applied publications in computer science and artificial intelligence that contributed to a heterogeneous information fusion for the investors’ behavior analysis. (Goal) We study various text representation methods, sentiment analysis, and information retrieval methods from heterogeneous data sources. (Findings) We present current and future research directions in text mining and deep learning for correlation analysis, forecasting, and recommendation systems in financial markets, such as stocks, cryptocurrencies, and Forex (Foreign Exchange Market).