Information Extraction and Language Discourse Processing
A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Artificial Intelligence".
Deadline for manuscript submissions: 30 April 2024
Special Issue Editors
Interests: information extraction; text mining; natural language processing; knowledge graphs
Special Issue Information
Dear Colleagues,
Information extraction (IE) plays an increasingly important and pervasive role in today’s era of digitalized communication media based on the Semantic Web. For example, search engine result snippets are gradually being replaced by “rich snippets”, and there is growing interest in converting scholarly publications into structured records for downstream applications such as leaderboards. IE is the task of automatically extracting structured information from unstructured and/or semi-structured electronically represented documents. In most cases, this activity concerns the processing of human language texts by means of natural language processing (NLP). The automatic extraction of information from unstructured sources has opened up new avenues for querying, organizing, and analyzing data by drawing upon the clean semantics of structured databases and the abundance of unstructured data.
Apart from extrinsic models of IE, research in linguistics and computational linguistics has long pointed out that text is not just a simple sequence of clauses and sentences but rather follows a highly elaborate structure formalized within discourse. A long-standing framework for discourse analysis is rhetorical structure theory (RST). Within a well-written text, no unit of the text is completely isolated; interpreting a unit requires understanding its relation to the surrounding context. Research in discourse analysis aims to uncover such relations in the text, which is helpful for many downstream applications such as summarization, information retrieval, and question answering.
This Special Issue seeks novel research reports on the spectrum that blends information extraction and language discourse processing research in diverse communities. The editors welcome submissions along various dimensions derived from the nature of the extraction task, the advanced neural techniques used for extraction, the variety of input resources exploited, and the type of output produced. Quantitative, qualitative, and mixed methods studies are welcome, as are case studies and experience reports if they describe an impactful application at a scale that delivers useful lessons to the journal readership.
Topics of interest include (but are not limited to):
- Knowledge base population with discourse-centric information extraction (IE)
- Coreference resolution and its impact on discourse-centric IE
- Relationship extraction leveraging linguistic discourse
- Template filling
- Impact of pragmatics or rhetoric on information extraction
- Discourse-centric IE at scale
- Intelligent and novel assessment models of discourse-centric IE
- Survey of discourse-centric IE in natural language processing (NLP)
- Challenges implementing discourse-centric IE in real-world scenarios
- Modeling domains using discourse-centric IE
- Human–AI hybrid systems for learning discourse and IE
- Application of discourse-centric IE
Dr. Jennifer D'Souza
Prof. Dr. Chengzhi Zhang
Guest Editors
Manuscript Submission Information
Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.
Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.
Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.
Keywords
- coherence
- topic focus
- information structure
- conversation structure
- discourse processing
- scholarly discourse processing
- anaphora resolution
Planned Papers
The list below represents only planned manuscripts. Some of these manuscripts have not been received by the Editorial Office yet. Papers submitted to MDPI journals are subject to peer review.
Title: Comparative Analysis of ORKG Properties and LLM-Generated Research Dimensions
Authors: Jennifer D'Souza; Vlad Nechakhin; Steffen Eger
Affiliation: TIB and University of Mannheim
Abstract: Structuring papers (or research ideas) in terms of various dimensions/properties is the basis for effectively searching scientific articles beyond mere keyword-based search. Existing endeavours, e.g., the Open Research Knowledge Graph (ORKG), use manually codified attributes to describe papers in terms of such properties, for example, "time period of study" and "location of study population" for the research problem "reproductive number estimates of a population." Manual specification is extremely time-consuming, however, and suffers from inconsistencies among the human coders involved. In this study, we conduct a thorough comparative analysis between manually extracted paper properties from the ORKG and research dimensions generated by Large Language Models (LLMs) such as GPT, Mistral, and Llama. Our objective is to assess the similarity and divergence across various criteria, including semantic alignment and deviation, mapping accuracy between properties, and cosine similarity across embeddings generated by state-of-the-art models like SciNCL. By quantifying the relatedness of LLM-generated dimensions to manually created ORKG properties, we explore LLM performance across diverse research fields. Our findings provide insights into the correspondence between ORKG properties and LLM dimensions, with significant implications for the advancement of automated research metadata generation and effective related work search beyond using keywords.
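As an illustration of the cosine-similarity criterion mentioned in the abstract, the following is a minimal sketch rather than the authors' method. It assumes each property or dimension has already been mapped to an embedding vector. The three-dimensional vectors, the LLM dimension names ("study timeframe", "geographic region"), and the helper `best_match` are hypothetical and exist only for this example; real embeddings from a model such as SciNCL would have many more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def best_match(query_vec, candidates):
    """Return the candidate name whose vector is most similar to query_vec."""
    return max(
        candidates,
        key=lambda name: cosine_similarity(query_vec, candidates[name]),
    )


# Toy vectors, made up for illustration (NOT real model output).
orkg_properties = {
    "time period of study": [1.0, 0.0, 0.0],
    "location of study population": [0.0, 1.0, 0.0],
}
llm_dimensions = {
    "study timeframe": [0.9, 0.1, 0.0],
    "geographic region": [0.1, 0.8, 0.1],
}

# Map each manually created property to its closest LLM-generated dimension.
mapping = {
    prop: best_match(vec, llm_dimensions)
    for prop, vec in orkg_properties.items()
}
print(mapping)
```

With these toy vectors, each ORKG property is paired with the LLM dimension that points in nearly the same direction. An actual study would replace the hand-written vectors with model-generated embeddings and would likely report the similarity scores themselves, not only the best match.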