Large Language Models: Methods and Applications

A special issue of Machine Learning and Knowledge Extraction (ISSN 2504-4990). This special issue belongs to the section "Network".

Deadline for manuscript submissions: 31 July 2024 | Viewed by 6133

Special Issue Editors

Prof. Dr. Ahmad Taher Azar
1. College of Computer and Information Sciences, Prince Sultan University, Riyadh 11586, Saudi Arabia
2. Automated Systems & Soft Computing Lab (ASSCL), Prince Sultan University, Riyadh 12435, Saudi Arabia
3. Faculty of Computers and Artificial Intelligence, Benha University, Benha 13518, Egypt
Interests: artificial intelligence; robotics; control theory and applications; machine learning; computational intelligence
Prof. Dr. Karin Verspoor
1. School of Computing Technologies, RMIT University, Melbourne 3000, Australia
2. School of Computing and Information Systems, The University of Melbourne, Melbourne 3010, Australia
Interests: biomedical natural language processing; computational linguistics; text mining; health informatics; computational biology
Prof. Dr. Irena Spasic
School of Computer Science & Informatics, Cardiff University, Cardiff CF24 4AG, UK
Interests: text mining; natural language processing; health informatics

Special Issue Information

Dear Colleagues,

In recent years, large language models have emerged as transformative tools in the fields of natural language processing and artificial intelligence. These models, powered by deep learning techniques, attention-based transformer architectures, and vast amounts of training data, have revolutionized the way we understand and interact with human language. Their applications span various domains, including machine translation, text generation, sentiment analysis, question answering, summarization, and much more. As the capabilities of these models continue to expand, it has become increasingly essential to examine the methods behind their development and explore the diverse range of applications that leverage their power.

This Special Issue, titled "Large Language Models: Methods and Applications," aims to provide a comprehensive overview of the latest advancements in the realm of large language models. It aims to bring together researchers, practitioners, and experts from various fields to delve into the intricacies of these models and showcase their broad spectrum of applications. Whether you are a seasoned professional seeking to deepen your understanding of these models or a newcomer curious about their potential, this collection of articles will offer valuable insights.

The Special Issue will encompass a variety of topics, including the architecture and training methodologies of large language models, critical evaluations of their capabilities and limitations, algorithmic approaches to improving their computational and energy efficiency, the ethical considerations surrounding their training or deployment, and their applications in areas such as natural language understanding, content generation, and conversational AI. Each article in this Special Issue provides a unique perspective, contributing to the overall discourse on the methods and applications of large language models.

The articles in this Special Issue not only aim to explore the current state of the art but also provide a glimpse into the future possibilities and challenges associated with these models. Whether you are a researcher seeking inspiration for your next project or a professional eager to harness the potential of large language models for practical applications, the knowledge and insights shared in this Special Issue will prove invaluable.

We invite you to embark on this journey through the world of large language models as we examine the methods that underpin their remarkable capabilities and explore the diverse applications that are shaping the future of natural language processing and artificial intelligence.

We hope that this Special Issue serves as a valuable resource for researchers, practitioners, policymakers, and anyone interested in the exciting advances enabled through large language models.

Prof. Dr. Ahmad Taher Azar
Prof. Dr. Karin Verspoor
Prof. Dr. Irena Spasic
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website; once registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Machine Learning and Knowledge Extraction is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • advanced LLM architectures
  • BERT (Bidirectional Encoder Representations from Transformers) applications
  • complexity of LLMs
  • computational/energy efficiency
  • deep reinforcement learning
  • deep generative models
  • ethics of LLM systems and applications
  • evaluation of LLM performance
  • language understanding
  • machine translation
  • NLP (Natural Language Processing)
  • sentiment analysis
  • text generation
  • transformer architectures
  • transfer learning
  • unsupervised learning
  • attention mechanisms
  • data augmentation
  • explainability
  • GPT (Generative Pre-trained Transformer)
  • multimodal models
  • question-answering
  • summarization
  • tokenization
  • zero-shot learning

Published Papers (3 papers)


Research

18 pages, 1057 KiB  
Article
Prompt Engineering or Fine-Tuning? A Case Study on Phishing Detection with Large Language Models
Mach. Learn. Knowl. Extr. 2024, 6(1), 367-384; https://doi.org/10.3390/make6010018 - 06 Feb 2024
Viewed by 1091
Abstract
Large Language Models (LLMs) are reshaping the landscape of Machine Learning (ML) application development. The emergence of versatile LLMs capable of undertaking a wide array of tasks has reduced the necessity for intensive human involvement in training and maintaining ML models. Despite these advancements, a pivotal question emerges: can these generalized models negate the need for task-specific models? This study addresses this question by comparing the effectiveness of LLMs in detecting phishing URLs when utilized with prompt-engineering techniques versus when fine-tuned. Notably, we explore multiple prompt-engineering strategies for phishing URL detection and apply them to two chat models, GPT-3.5-turbo and Claude 2. In this context, the maximum result achieved was an F1-score of 92.74% by using a test set of 1000 samples. Following this, we fine-tune a range of base LLMs, including GPT-2, Bloom, Baby LLaMA, and DistilGPT-2—all primarily developed for text generation—exclusively for phishing URL detection. The fine-tuning approach culminated in a peak performance, achieving an F1-score of 97.29% and an AUC of 99.56% on the same test set, thereby outperforming existing state-of-the-art methods. These results highlight that while LLMs harnessed through prompt engineering can expedite application development processes, achieving a decent performance, they are not as effective as dedicated, task-specific LLMs.
(This article belongs to the Special Issue Large Language Models: Methods and Applications)
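The prompt-engineering route that this paper compares against fine-tuning can be illustrated with a minimal, hypothetical sketch: build a zero-shot classification prompt for a URL, send it to a chat model, and map the free-text reply onto the two labels. The template wording, the `call_chat_model` parameter, and the parsing rule below are illustrative assumptions, not the authors' code or prompts.

```python
# Hypothetical sketch of zero-shot prompt engineering for phishing-URL
# detection. `call_chat_model` stands in for a real chat-model API
# (e.g. GPT-3.5-turbo or Claude 2) and is supplied by the caller.

PROMPT_TEMPLATE = (
    "You are a security analyst. Classify the following URL as "
    "'phishing' or 'legitimate'. Answer with a single word.\n\nURL: {url}"
)

def build_prompt(url: str) -> str:
    """Fill the zero-shot classification template for one URL."""
    return PROMPT_TEMPLATE.format(url=url)

def parse_label(reply: str) -> str:
    """Map a free-text model reply onto the two task labels."""
    return "phishing" if "phish" in reply.strip().lower() else "legitimate"

def classify(url: str, call_chat_model) -> str:
    """Classify one URL via a caller-supplied chat-model function."""
    return parse_label(call_chat_model(build_prompt(url)))

# Example with a stubbed "model" that flags URLs containing '@':
stub = lambda prompt: "phishing" if "@" in prompt else "legitimate"
print(classify("http://login@evil.example.com", stub))  # phishing
```

A real evaluation would replace the stub with an API client and score the parsed labels against ground truth, which is how the F1-scores reported above are obtained.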

14 pages, 1132 KiB  
Article
Benefits from Variational Regularization in Language Models
Mach. Learn. Knowl. Extr. 2022, 4(2), 542-555; https://doi.org/10.3390/make4020025 - 09 Jun 2022
Cited by 3 | Viewed by 1994
Abstract
Representations from common pre-trained language models have been shown to suffer from the degeneration problem, i.e., they occupy a narrow cone in latent space. This problem can be addressed by enforcing isotropy in latent space. In analogy with variational autoencoders, we suggest applying a token-level variational loss to a Transformer architecture and optimizing the standard deviation of the prior distribution in the loss function as the model parameter to increase isotropy. The resulting latent space is complete and interpretable: any given point is a valid embedding and can be decoded into text again. This allows for text manipulations such as paraphrase generation directly in latent space. Surprisingly, features extracted at the sentence level also show competitive results on benchmark classification tasks.
(This article belongs to the Special Issue Large Language Models: Methods and Applications)
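In analogy with variational autoencoders, the token-level variational loss described above includes a KL-divergence term between a per-token diagonal Gaussian posterior and a zero-mean prior whose standard deviation is treated as a model parameter. The NumPy sketch below shows only that standard closed-form KL term; the array shapes and the exact role of `sigma_p` are assumptions, and the article defines the actual loss.

```python
import numpy as np

def token_kl(mu, log_sigma_q, sigma_p):
    """Per-token KL( N(mu, sigma_q^2) || N(0, sigma_p^2) ), summed over
    embedding dimensions. `mu` and `log_sigma_q` have shape (tokens, dim);
    `sigma_p` is the (learnable) scalar std of the prior."""
    sigma_q2 = np.exp(2.0 * log_sigma_q)  # posterior variance per dimension
    kl = (np.log(sigma_p) - log_sigma_q
          + (sigma_q2 + mu**2) / (2.0 * sigma_p**2)
          - 0.5)
    return kl.sum(axis=-1)  # one KL value per token

# Sanity check: when the posterior equals the prior, the KL term vanishes.
mu = np.zeros((2, 4))
log_sigma_q = np.full((2, 4), np.log(1.5))
print(token_kl(mu, log_sigma_q, sigma_p=1.5))  # [0. 0.]
```

Because `sigma_p` enters the KL term directly, it can be optimized jointly with the encoder, which is the knob the paper tunes to increase isotropy.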

14 pages, 805 KiB  
Article
The Case of Aspect in Sentiment Analysis: Seeking Attention or Co-Dependency?
Mach. Learn. Knowl. Extr. 2022, 4(2), 474-487; https://doi.org/10.3390/make4020021 - 13 May 2022
Cited by 2 | Viewed by 2462
Abstract
(1) Background: Aspect-based sentiment analysis (SA) is a natural language processing task, the aim of which is to classify the sentiment associated with a specific aspect of a written text. The performance of SA methods applied to texts related to health and well-being lags behind that of other domains. (2) Methods: In this study, we present an approach to aspect-based SA of drug reviews. Specifically, we analysed signs and symptoms, which were extracted automatically using the Unified Medical Language System. This information was then passed onto the BERT language model, which was extended by two layers to fine-tune the model for aspect-based SA. The interpretability of the model was analysed using an axiomatic attribution method. We performed a correlation analysis between the attribution scores and syntactic dependencies. (3) Results: Our fine-tuned model achieved accuracy of approximately 95% on a well-balanced test set. It outperformed our previous approach, which used syntactic information to guide the operation of a neural network and achieved an accuracy of approximately 82%. (4) Conclusions: We demonstrated that a BERT-based model of SA overcomes the negative bias associated with health-related aspects and closes the performance gap against the state-of-the-art in other domains.
(This article belongs to the Special Issue Large Language Models: Methods and Applications)
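The "extended by two layers" fine-tuning setup can be pictured abstractly as a small classification head sitting on top of a pooled BERT embedding. The NumPy sketch below uses random weights purely to show the shape of such a head; the hidden size, tanh non-linearity, and three-class output are illustrative assumptions, not the authors' configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def two_layer_head(pooled, W1, b1, W2, b2):
    """Map one pooled encoder embedding to sentiment-class logits via a
    hidden layer with a tanh non-linearity and a linear output layer."""
    hidden = np.tanh(pooled @ W1 + b1)
    return hidden @ W2 + b2

dim, hidden_dim, n_classes = 768, 128, 3  # e.g. negative/neutral/positive
W1 = rng.normal(scale=0.02, size=(dim, hidden_dim)); b1 = np.zeros(hidden_dim)
W2 = rng.normal(scale=0.02, size=(hidden_dim, n_classes)); b2 = np.zeros(n_classes)

pooled_embedding = rng.normal(size=dim)  # stand-in for a BERT [CLS] vector
logits = two_layer_head(pooled_embedding, W1, b1, W2, b2)
print(logits.shape)  # (3,)
```

In the fine-tuning described above, such head parameters would be trained jointly with the BERT encoder on labelled aspect–sentiment pairs rather than left random.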
