Data, Volume 8, Issue 11 (November 2023) – 15 articles

Cover Story: A data management plan (DMP) sets out research data management (RDM) practices. In plant sciences, RDM practices across projects and funding sources are often more reusable than the correlated DMPs. Here, we developed DataPLAN, which supports the writing of DMPs aligned to Horizon 2020, Horizon Europe, and German Research Foundation projects. As part of the DataPLANT consortium in the German National Research Data Initiative (NFDI), DataPLAN integrates RDM practices for genomics, transcriptomics, proteomics, metabolomics and plant phenotyping, aligning with recognized minimum information standards. By using DataPLAN, the workload related to DMPs in plant sciences has been reduced by presenting reusable RDM practices optimized for different funding contexts.
11 pages, 2082 KiB  
Data Descriptor
Biodiversity of Terrestrial Testate Amoebae in Western Siberia Lowland Peatlands
Data 2023, 8(11), 173; https://doi.org/10.3390/data8110173 - 17 Nov 2023
Viewed by 1295
Abstract
Testate amoebae are unicellular eukaryotic organisms covered with an external skeleton called a shell. They are an important component of many terrestrial ecosystems, especially peatlands, where they can be preserved in peat deposits and used as a proxy of surface wetness in paleoecological reconstructions. Here, we present a database from a vast but poorly studied region of the Western Siberia Lowland containing information on testate amoeba (TA) occurrences in relation to substrate moisture and water table depth (WTD). The dataset includes 88 species from 32 genera, with 2181 incidences and 21,562 counted individuals. All samples were collected in oligotrophic peatlands and prepared using the method of wet sieving with a subsequent sedimentation of aqueous suspensions. This database contributes to the understanding of the distribution of testate amoebae and can be further used in large-scale investigations.
(This article belongs to the Special Issue Data Science in Invertebrate)

8 pages, 1658 KiB  
Data Descriptor
Testate Amoebae (Amphitremida, Arcellinida, Euglyphida) in Sphagnum Bogs: The Dataset from Eastern Fennoscandia
Data 2023, 8(11), 172; https://doi.org/10.3390/data8110172 - 15 Nov 2023
Viewed by 1472
Abstract
The paper describes a dataset comprising 236 surface moss samples and 143 testate amoeba taxa. The samples were collected in 11 Sphagnum-dominated bogs during the frost-free seasons of 2004, 2007, 2009, 2017, and 2022. For the whole dataset, the sampling effort was sufficient in terms of observed species richness (143 species in total), though the regional species pool appears to be incompletely discovered (143 species is the lower 95% confidence limit of Chao’s estimator). The local community composition demonstrated high heterogeneity in a reduced ordination space. This supports the view that the high versatility of bog ecosystems should be taken into account in ecological studies.
(This article belongs to the Special Issue Data Science in Invertebrate)

19 pages, 5497 KiB  
Article
ChatGPT across Arabic Twitter: A Study of Topics, Sentiments, and Sarcasm
Data 2023, 8(11), 171; https://doi.org/10.3390/data8110171 - 14 Nov 2023
Viewed by 2434
Abstract
While ChatGPT has gained global significance and widespread adoption, its exploration within specific cultural contexts, particularly within the Arab world, remains relatively limited. This study investigates the discussions among early Arab users in Arabic tweets related to ChatGPT, focusing on topics, sentiments, and the presence of sarcasm. Data analysis and topic-modeling techniques were employed to examine 34,760 Arabic tweets collected using specific keywords. This study revealed a strong interest within the Arabic-speaking community in ChatGPT technology, with prevalent discussions spanning various topics, including controversies, regional relevance, fake content, and sector-specific dialogues. Despite the enthusiasm, concerns regarding ethical risks and negative implications of ChatGPT’s emergence were highlighted, indicating apprehension toward advanced artificial intelligence (AI) technology in language generation. Region-specific discussions underscored the diverse adoption of AI applications and ChatGPT technology. Sentiment analysis of the tweets demonstrated a predominantly neutral sentiment distribution (92.8%), suggesting a focus on objectivity and factuality over emotional expression. The prevalence of neutral sentiments indicated a preference for evidence-based reasoning and logical arguments, fostering constructive discussions influenced by cultural norms. Sarcasm was found in 4% of the tweets, distributed across various topics but not dominating the conversation. This study’s implications include the need for AI developers to address ethical concerns and the importance of educating users about the technology’s ethical considerations and risks. Policymakers should consider the regional relevance and potential scams, emphasizing the necessity for ethical guidelines and regulations.
(This article belongs to the Special Issue Sentiment Analysis in Social Media Data)

10 pages, 904 KiB  
Data Descriptor
Introducing DeReKoGram: A Novel Frequency Dataset with Lemma and Part-of-Speech Information for German
Data 2023, 8(11), 170; https://doi.org/10.3390/data8110170 - 10 Nov 2023
Viewed by 1229
Abstract
We introduce DeReKoGram, a novel frequency dataset containing lemma and part-of-speech (POS) information for 1-, 2-, and 3-grams from the German Reference Corpus. The dataset contains information based on a corpus of 43.2 billion tokens and is divided into 16 parts based on 16 corpus folds. We describe how the dataset was created and structured. By evaluating the distribution over the 16 folds, we show that it is possible to work with a subset of the folds in many use cases (e.g., to save computational resources). In a case study, we investigate the growth of vocabulary (as well as the number of hapax legomena) as an increasing number of folds are included in the analysis. We cross-combine this with the various cleaning stages of the dataset. We also give some guidance in the form of Python, R, and Stata markdown scripts on how to work with the resource.
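The vocabulary-growth case study mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes each fold is available as a mapping from token to frequency; the function name `vocabulary_growth` and the toy folds are hypothetical and do not reflect the dataset's actual file format or the authors' scripts.

```python
from collections import Counter

def vocabulary_growth(folds):
    """Track vocabulary size and hapax legomena as folds are added.

    `folds` is an iterable of Counter objects (token -> frequency).
    Returns a list of (folds_included, vocabulary_size, hapax_count).
    """
    total = Counter()
    growth = []
    for n_folds, fold in enumerate(folds, start=1):
        total.update(fold)  # add this fold's frequencies to the running totals
        # A hapax legomenon is a type that occurs exactly once so far.
        hapaxes = sum(1 for freq in total.values() if freq == 1)
        growth.append((n_folds, len(total), hapaxes))
    return growth

# Toy folds standing in for real corpus folds (hypothetical data).
toy_folds = [
    Counter({"der": 3, "Hund": 1}),
    Counter({"der": 2, "Katze": 1}),
    Counter({"Hund": 1, "Maus": 1}),
]

for row in vocabulary_growth(toy_folds):
    print(row)
```

Note that a type can stop being a hapax once a later fold contains it again, which is why the hapax count is recomputed from the cumulative totals rather than summed per fold.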

17 pages, 733 KiB  
Article
Machine Learning for Credit Risk Prediction: A Systematic Literature Review
Data 2023, 8(11), 169; https://doi.org/10.3390/data8110169 - 07 Nov 2023
Viewed by 4227
Abstract
In this systematic review of the literature on using Machine Learning (ML) for credit risk prediction, we highlight the need for financial institutions to use Artificial Intelligence (AI) and ML to assess credit risk by analyzing large volumes of information. We posed research questions about algorithms, metrics, results, datasets, variables, and related limitations in predicting credit risk. To answer them, we searched renowned databases and identified 52 relevant studies within the microfinance credit industry. We identified challenges and approaches in credit risk prediction using ML models, including difficulties with the implemented models such as their black-box nature, the need for explainable artificial intelligence, the importance of selecting relevant features, addressing multicollinearity, and the problem of imbalance in the input data. By answering the research questions, we identified that the Boosted Category is the most researched family of ML models; the most commonly used evaluation metrics are Area Under Curve (AUC), Accuracy (ACC), Recall, the F1 measure (F1), and Precision. Research mainly uses public datasets to compare models, and private ones to generate new knowledge when applied to the real world. The most significant limitation identified is the representativeness of reality, and the variables primarily used in the microcredit industry are data related to the Demographic, Operation, and Payment behavior. This study aims to guide developers of credit risk management tools and software towards the existing ability of ML methods, metrics, and techniques used to forecast it, thereby minimizing possible losses due to default and guiding risk appetite.
(This article belongs to the Special Issue Data Science in Fintech)

27 pages, 7200 KiB  
Article
Applying Eye Tracking with Deep Learning Techniques for Early-Stage Detection of Autism Spectrum Disorders
Data 2023, 8(11), 168; https://doi.org/10.3390/data8110168 - 03 Nov 2023
Viewed by 1550
Abstract
Autism spectrum disorder (ASD) poses a complex challenge to researchers and practitioners, with its multifaceted etiology and varied manifestations. Timely intervention is critical in enhancing the developmental outcomes of individuals with ASD. This paper underscores the paramount significance of early detection and diagnosis as a pivotal precursor to effective intervention. To this end, integrating advanced technological tools, specifically eye-tracking technology and deep learning algorithms, is investigated for its potential to discriminate between children with ASD and their typically developing (TD) peers. By employing these methods, the research aims to contribute to refining early detection strategies and support mechanisms. This study introduces innovative deep learning models grounded in convolutional neural network (CNN) and recurrent neural network (RNN) architectures, employing an eye-tracking dataset for training. Notably, the bidirectional long short-term memory (BiLSTM) achieved an accuracy of 96.44%, the gated recurrent unit (GRU) attained 97.49%, the hybrid CNN-LSTM reached 97.94%, and the LSTM achieved the highest accuracy of 98.33%. These outcomes underscore the efficacy of the applied methodologies and the potential of advanced computational frameworks in achieving substantial accuracy levels in ASD detection and classification.
(This article belongs to the Special Issue Artificial Intelligence and Big Data Applications in Diagnostics)

8 pages, 1043 KiB  
Data Descriptor
Draft Genome Sequence Data of Lysinibacillus sphaericus Strain 1795 with Insecticidal Properties
Data 2023, 8(11), 167; https://doi.org/10.3390/data8110167 - 03 Nov 2023
Cited by 1 | Viewed by 1322
Abstract
Lysinibacillus sphaericus is of significant agricultural importance because it can produce insecticidal toxins and chemical moieties with varying antibacterial and fungicidal activities. In this study, the genome of the L. sphaericus strain 1795 is presented. Illumina short reads sequenced on the HiSeq X platform were used to obtain the genome’s assembly by applying the SPAdes v3.15.4 software. The genome size based on a cumulative length of 23 contigs reached 4.74 Mb, with a respective N50 of 1.34 Mb. The assembled genome carried 4672 genes, including 4643 protein-encoding ones, 5 of which represented loci coding for insecticidal toxins active against the orders Diptera, Lepidoptera, and Blattodea. We also revealed biosynthetic gene clusters responsible for the synthesis of secondary metabolites with predicted antibacterial, fungicidal, and growth-promoting properties. The genomic data provided will be helpful for deepening our understanding of genetic markers determining the efficient application of the L. sphaericus strain 1795 primarily for biocontrol purposes in veterinary and medical applications against several groups of blood-sucking insects.

24 pages, 581 KiB  
Article
A Scalable Data Structure for Efficient Graph Analytics and In-Place Mutations
Data 2023, 8(11), 166; https://doi.org/10.3390/data8110166 - 03 Nov 2023
Viewed by 1126
Abstract
The graph model enables a broad range of analyses; thus, graph processing (GP) is an invaluable tool in data analytics. At the heart of every GP system lies a concurrent graph data structure that stores the graph. Such a data structure needs to be highly efficient for both graph algorithms and queries. Due to the continuous evolution, the sparsity, and the scale-free nature of real-world graphs, GP systems face the challenge of providing an appropriate graph data structure that enables both fast analytical workloads and fast, low-memory graph mutations. Existing graph structures offer a hard tradeoff among read-only performance, update friendliness, and memory consumption upon updates. In this paper, we introduce CSR++, a new graph data structure that removes these tradeoffs and enables both fast read-only analytics, and quick and memory-friendly mutations. CSR++ combines ideas from CSR, the fastest read-only data structure, and adjacency lists (ALs) to achieve the best of both worlds. We compare CSR++ to CSR, ALs from the Boost Graph Library (BGL), and the following state-of-the-art update-friendly graph structures: LLAMA, STINGER, GraphOne, and Teseo. In our evaluation, which is based on popular GP algorithms executed over real-world graphs, we show that CSR++ remains close to CSR in read-only concurrent performance (within 10% on average) while significantly outperforming CSR (by an order of magnitude) and LLAMA (by almost 2×) with frequent updates. We also show that both CSR++’s update throughput and analytics performance exceed those of several state-of-the-art graph structures while maintaining low memory consumption when the workload includes updates.
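For readers unfamiliar with the baseline, the following Python sketch shows the classic compressed sparse row (CSR) layout that this abstract compares against. It is not CSR++ and not the authors' code; the function names `build_csr` and `neighbors` are hypothetical, and the sketch is single-threaded.

```python
def build_csr(num_vertices, edges):
    """Build a plain CSR representation from a list of (src, dst) edges.

    Returns (offsets, targets): the out-neighbors of vertex v are
    targets[offsets[v]:offsets[v + 1]].
    """
    # Count the out-degree of every vertex.
    degree = [0] * num_vertices
    for src, _ in edges:
        degree[src] += 1

    # A prefix sum over the degrees gives the start of each neighbor range.
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + degree[v]

    # Scatter each edge into its vertex's contiguous slot.
    targets = [0] * len(edges)
    cursor = offsets[:-1]  # slicing copies, so offsets stays intact
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbors(offsets, targets, v):
    """Return the out-neighbors of v as one contiguous slice."""
    return targets[offsets[v]:offsets[v + 1]]


offsets, targets = build_csr(3, [(0, 1), (0, 2), (1, 2), (2, 0)])
print(offsets, targets, neighbors(offsets, targets, 0))
```

Because all neighbor ranges share one contiguous array, scans are fast, but inserting an edge in the middle means shifting every later entry and rebuilding the offsets. That cost is the read-versus-update tradeoff the abstract describes.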

11 pages, 707 KiB  
Article
Can We Mathematically Spot the Possible Manipulation of Results in Research Manuscripts Using Benford’s Law?
Data 2023, 8(11), 165; https://doi.org/10.3390/data8110165 - 31 Oct 2023
Viewed by 1221
Abstract
The reproducibility of academic research has long been a persistent issue, contradicting one of the fundamental principles of science. Recently, there has been an increasing number of false claims found in academic manuscripts, casting doubt on the validity of reported results. In this paper, we utilize an adapted version of Benford’s law, a statistical phenomenon that describes the distribution of leading digits in naturally occurring datasets, to identify the potential manipulation of results in research manuscripts, solely using the aggregated data presented in those manuscripts rather than the commonly unavailable raw datasets. Our methodology applies the principles of Benford’s law to commonly employed analyses in academic manuscripts, thus reducing the need for the raw data itself. To validate our approach, we employed 100 open-source datasets and correctly predicted 79% of them using our rules. Moreover, we tested the proposed method on known retracted manuscripts, showing that around half (48.6%) can be detected using the proposed method. Additionally, we analyzed 100 manuscripts published in the last two years across ten prominent economic journals, with 10 manuscripts randomly sampled from each journal. Our analysis predicted a 3% occurrence of results manipulation with a 96% confidence level. Our findings show that Benford’s law, adapted for aggregated data, can be an initial tool for identifying data manipulation; however, it is not a silver bullet, and each flagged manuscript requires further investigation due to the relatively low prediction accuracy.
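As background, the classic form of Benford's law gives the expected share of leading digit d as log10(1 + 1/d). The Python sketch below compares observed leading digits against that distribution with a chi-square statistic. It illustrates only the textbook law, not the adapted version for aggregated data described in the abstract; the function names are hypothetical.

```python
import math
from collections import Counter

def benford_expected():
    """Expected probability of each leading digit 1..9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(x):
    """Return the first significant digit of a non-zero number."""
    x = abs(x)
    if x == 0:
        raise ValueError("zero has no leading digit")
    # In scientific notation the first character is the leading digit.
    return int(f"{x:.15e}"[0])

def benford_chi_square(values):
    """Chi-square statistic of observed leading digits vs. Benford's law."""
    digits = [leading_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one non-zero value")
    n = len(digits)
    counts = Counter(digits)
    return sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in benford_expected().items()
    )

# Powers of two are a standard example of Benford-like data.
powers = [2 ** k for k in range(100)]
print(benford_chi_square(powers))
```

With 9 digit classes the statistic has 8 degrees of freedom, so values above roughly 15.5 would suggest a departure from the classic law at the 5% level. As the abstract stresses, such a flag is only a starting point for closer inspection.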

11 pages, 1730 KiB  
Data Descriptor
Information Competences and Academic Achievement: A Dataset
Data 2023, 8(11), 164; https://doi.org/10.3390/data8110164 - 27 Oct 2023
Viewed by 1296
Abstract
Information literacy (IL) is becoming fundamental in the modern world. Although several IL standards and assessments have been developed for secondary and higher education, there is still no agreement about the possible associations between IL and both academic achievement and student dropout rates. In this article, we present a dataset including IL competences measurements, as well as academic achievement and socioeconomic indicators for 153 Chilean first- and second-year engineering students. The dataset is intended to allow researchers to use machine learning methods to study to what extent, if any, IL and academic achievement are related.

24 pages, 5572 KiB  
Data Descriptor
A Large-Scale Dataset of Search Interests Related to Disease X Originating from Different Geographic Regions
Data 2023, 8(11), 163; https://doi.org/10.3390/data8110163 - 26 Oct 2023
Cited by 1 | Viewed by 1607
Abstract
The World Health Organization (WHO) added Disease X to their shortlist of blueprint priority diseases to represent a hypothetical, unknown pathogen that could cause a future epidemic. During different virus outbreaks of the past, such as COVID-19, Influenza, Lyme Disease, and Zika virus, researchers from various disciplines utilized Google Trends to mine multimodal components of web behavior to study, investigate, and analyze the global awareness, preparedness, and response associated with these respective virus outbreaks. As the world prepares for Disease X, a dataset on web behavior related to Disease X would be crucial to contribute towards the timely advancement of research in this field. Furthermore, none of the prior works in this field have focused on the development of a dataset to compile relevant web behavior data, which would help to prepare for Disease X. To address these research challenges, this work presents a dataset of web behavior related to Disease X, which emerged from different geographic regions of the world, between February 2018 and August 2023. Specifically, this dataset presents the search interests related to Disease X from 94 geographic regions. These regions were chosen for data mining as these regions recorded significant search interests related to Disease X during this timeframe. The dataset was developed by collecting data using Google Trends. The relevant search interests for all these regions for each month in this time range are available in this dataset. This paper also discusses the compliance of this dataset with the FAIR principles of scientific data management. Finally, an analysis of this dataset is presented to uphold the applicability, relevance, and usefulness of this dataset for the investigation of different research questions in the interrelated fields of Big Data, Data Mining, Healthcare, Epidemiology, and Data Analysis with a specific focus on Disease X.

28 pages, 8885 KiB  
Article
The Development of a Water Resource Monitoring Ontology as a Research Tool for Sustainable Regional Development
Data 2023, 8(11), 162; https://doi.org/10.3390/data8110162 - 26 Oct 2023
Viewed by 1143
Abstract
The development of knowledge graphs about water resources as a tool for studying the sustainable development of a region is currently an urgent task, because the growing deterioration of the state of water bodies affects the ecology, economy, and health of the population of the region. This study presents a new ontological approach to water resource monitoring in Kazakhstan, providing data integration from heterogeneous sources, semantic analysis, decision support, querying and searching, and the presentation of new knowledge in the field of water monitoring. The contribution of this work is the integration of table extraction and understanding, semantic web rule language, semantic sensor network, and time ontology methods, together with the inclusion of a module of socioeconomic indicators that reveal the impact of water quality on the quality of life of the population. Using machine learning methods, the study derived six ontological rules to establish new knowledge about water resource monitoring. The results of the queries demonstrate the effectiveness of the proposed method and its potential to improve water monitoring practices, promote sustainable resource management, and support decision-making processes in Kazakhstan; the ontology can also be integrated into the ontology of water resources at the scale of Central Asia.

15 pages, 1801 KiB  
Data Descriptor
Dataset: Biodiversity of Ground Beetles (Coleoptera, Carabidae) of the Republic of Mordovia (Russia)
Data 2023, 8(11), 161; https://doi.org/10.3390/data8110161 - 24 Oct 2023
Viewed by 1370
Abstract
(1) Background: Carabidae is one of the most diverse families of Coleoptera. Many species of Carabidae are sensitive to anthropogenic impacts and are indicators of their environmental state. Some species of large beetles are on the verge of extinction. The aim of this research is to describe the Carabidae fauna of the Republic of Mordovia (central part of European Russia); (2) Methods: The research was carried out in April–September 1979, 1987, 2000, 2001, 2005, 2007–2022. Collections were performed using a variety of methods (light trapping, soil traps, window traps, etc.). For each observation, the coordinates of the sampling location, abundance, and dates were recorded; (3) Results: The dataset contains data on 251 species of Carabidae from 12 subfamilies and 4576 occurrences. A total of 66,378 specimens of Carabidae were studied. Another 29 species are additionally known from other publications. Also, twenty-two species were excluded from the fauna of the region, as they had previously been identified by mistake; (4) Conclusions: The biodiversity of Carabidae in the Republic of Mordovia included 280 species from 12 subfamilies. Four species (Agonum scitulum, Lebia scapularis, Bembidion humerale, and Bembidion tenellum) were identified for the first time in the Republic of Mordovia.
(This article belongs to the Special Issue Data Science in Invertebrate)

13 pages, 2224 KiB  
Data Descriptor
Fabaceae: South African Medicinal Plant Species Used in the Treatment and Management of Sexually Transmitted and Related Opportunistic Infections Associated with HIV-AIDS
Data 2023, 8(11), 160; https://doi.org/10.3390/data8110160 - 24 Oct 2023
Viewed by 1394
Abstract
The use of medicinal plants, particularly in the treatment of sexually transmitted and related infections, is ancient. These plants may well be used as alternative and complementary medicine to a variety of antibiotics that may possess limitations mainly due to an emerging enormous antimicrobial resistance. Several computerized database literature sources such as ScienceDirect, Scopus, Scielo, PubMed, and Google Scholar were used to retrieve information on Fabaceae species used in the treatment and management of sexually transmitted and related infections in South Africa. The other information was sourced from various academic dissertations, theses, and botanical books. A total of 42 medicinal plant species belonging to the Fabaceae family, used in the treatment of sexually transmitted and related opportunistic infections associated with HIV-AIDS, have been documented. Trees were the most reported life form, yielding 47.62%, while Senna and Vachellia were the most frequently cited genera, yielding six and three species, respectively. Peltophorum africanum Sond. was the most preferred medicinal plant, yielding a frequency of citation of 14, while Vachellia karroo (Hayne) Banfi and Galasso as well as Elephantorrhiza burkei Benth. yielded 12 citations each. The most frequently used plant parts were roots, yielding 57.14%, while most of the plant species were administered orally after boiling (51.16%) until the infection subsided. Amazingly, many of the medicinal plant species are recommended for use to treat impotence (29.87%), while the most common STIs such as chlamydia (7.79%), gonorrhea (6.49%), syphilis (5.19%), genital warts (2.60%), and many other unidentified STIs that may include “Makgoma” and “Divhu” were less cited.
Although there are widespread data on the in vitro evidence of the use of the Fabaceae species in the treatment of sexually transmitted and related infections, there is a need to explore the in vivo studies to further ascertain the use of species as a possible complementary and alternative medicine to the currently used antibiotics in both developing and underdeveloped countries. Furthermore, the toxicological profiles of many of these studies need to be further explored. The safety and efficacy of over-the-counter pharmaceutical products developed using these species also need to be explored.

19 pages, 3337 KiB  
Article
DataPLAN: A Web-Based Data Management Plan Generator for the Plant Sciences
Data 2023, 8(11), 159; https://doi.org/10.3390/data8110159 - 24 Oct 2023
Viewed by 1428
Abstract
Research data management (RDM) combines a set of practices for the organization, storage and preservation of data from research projects. The RDM strategy of a project is usually formalized as a data management plan (DMP), a document that sets out procedures to ensure data findability, accessibility, interoperability and reusability (FAIR-ness). Many aspects of RDM are standardized across disciplines so that data and metadata are reusable, but the components of DMPs in the plant sciences are often disconnected. The inability to reuse plant-specific DMP content across projects and funding sources requires additional time and effort to write unique DMPs for different settings. To address this issue, we developed DataPLAN, an open-source tool incorporating prewritten DMP content for the plant sciences that can be used online or offline to prepare multiple DMPs. The current version of DataPLAN supports Horizon 2020 and Horizon Europe projects, as well as projects funded by the German Research Foundation (DFG). Furthermore, DataPLAN offers the option for users to customize their own templates. Additional templates to accommodate other funding schemes will be added in the future. DataPLAN reduces the workload needed to create or update DMPs in the plant sciences by presenting standardized RDM practices optimized for different funding contexts.