Topic Editors

Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan 250061, China
College of Artificial Intelligence, Nankai University, Tianjin 300350, China
College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150086, China

Bioinformatics and Intelligent Information Processing

Abstract submission deadline
closed (31 July 2023)
Manuscript submission deadline
closed (30 November 2023)
Viewed by
9791

Topic Information

Dear Colleagues,

The 2023 Bioinformatics and Intelligent Information Processing Conference (BIIP2023), the annual conference of the Bioinformatics and Artificial Life Committee of the Chinese Association for Artificial Intelligence (CAAI), will be held in Jinan City, Shandong Province, China, from June 18th to June 20th, 2023. The conference is organized by the CAAI and hosted by the Bioinformatics and Artificial Life Committee of CAAI and the School of Control Science and Engineering of Shandong University. Given the current breakthroughs in AI large language models, it is of great significance and value to discuss how new AI technologies can be used to advance biomedical research. BIIP2023 aims to provide such a platform for scientists in related fields. The conference will invite distinguished experts and scholars in the fields of AI, life science, and medical science to give talks and run tutorials. In addition, sessions will be set up for talks on the latest research progress and trends in topics of interest. This topic collection plans to present novel and advanced interdisciplinary research achievements in bioinformatics and intelligent information processing. We warmly welcome scholars in the related fields to submit their work to the journals participating in this topic collection. The topics include, but are not limited to, the following areas:

S1: Self-organization phenomena and mechanisms in natural and human-made systems

S2: Bioanalysis and intelligent processing algorithms

S3: Biological multi-omics data analysis

S4: Biological networks and systems biology

S5: Intelligent drug design

S6: Precision medicine and big data

S7: Biological and health big data analytics

S8: Biomedical image analysis

S9: Digital diagnosis and smart health

S10: Bioinformatics foundations of the brain and brain-like intelligence

S11: Artificial life systems and synthetic biology

S12: Artificial life and artificial intelligence

S13: Digital-based life and intelligent health

S14: Intelligent computing for digital-based life

S15: Other related fields

Prof. Dr. Zhiping Liu
Prof. Dr. Han Zhang
Prof. Dr. Junwei Han
Topic Editors

Keywords

  • bioinformatics
  • artificial intelligence
  • intelligent information processing
  • artificial life
  • models and algorithms
  • systems and simulators
  • systems biology
  • biomedical big data
  • large language models

Participating Journals

Journal Name | Impact Factor | CiteScore | Launched Year | First Decision (median) | APC
AI (ai) | - | - | 2020 | 20.8 Days | CHF 1600
Entropy (entropy) | 2.7 | 4.7 | 1999 | 20.8 Days | CHF 2600
Genes (genes) | 3.5 | 5.1 | 2010 | 16.5 Days | CHF 2600
International Journal of Molecular Sciences (ijms) | 5.6 | 7.8 | 2000 | 16.3 Days | CHF 2900
Machine Learning and Knowledge Extraction (make) | 3.9 | 8.5 | 2019 | 19.9 Days | CHF 1800

Preprints.org is a multidisciplinary platform providing a preprint service dedicated to sharing your research from the start and empowering your research journey.

MDPI Topics is cooperating with Preprints.org and has built a direct connection between MDPI journals and Preprints.org. Authors are encouraged to enjoy the benefits by posting a preprint at Preprints.org prior to publication:

  1. Immediately share your ideas ahead of publication and establish your research priority;
  2. Protect your idea from being stolen with this time-stamped preprint article;
  3. Enhance the exposure and impact of your research;
  4. Receive feedback from your peers in advance;
  5. Have it indexed in Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (7 papers)

15 pages, 911 KiB  
Article
A New and Lightweight R-Peak Detector Using the TEDA Evolving Algorithm
by Lucileide M. D. da Silva, Sérgio N. Silva, Luísa C. de Souza, Karolayne S. de Azevedo, Luiz Affonso Guedes and Marcelo A. C. Fernandes
Mach. Learn. Knowl. Extr. 2024, 6(2), 736-750; https://doi.org/10.3390/make6020034 - 29 Mar 2024
Viewed by 724
Abstract
The literature on ECG delineation algorithms has seen significant growth in recent decades. However, several challenges still need to be addressed. This work aims to propose a lightweight R-peak-detection algorithm that does not require pre-setting and performs classification on a sample-by-sample basis. The novelty of the proposed approach lies in the utilization of the typicality eccentricity detection anomaly (TEDA) algorithm for R-peak detection. The proposed method for R-peak detection consists of three phases. Firstly, the ECG signal is preprocessed by calculating the signal’s slope and applying filtering techniques. Next, the preprocessed signal is inputted into the TEDA algorithm for R-peak estimation. Finally, in the third and last step, the R-peak identification is carried out. To evaluate the effectiveness of the proposed technique, experiments were conducted on the MIT-BIH arrhythmia database (MIT-AD) for R-peak detection and validation. The results of the study demonstrated that the proposed evolutive algorithm achieved a sensitivity (Se in %), positive predictivity (+P in %), and accuracy (ACC in %) of 95.45%, 99.61%, and 95.09%, respectively, with a tolerance (TOL) of 100 milliseconds. One key advantage of the proposed technique is its low computational complexity, as it is based on a statistical framework calculated recursively. It employs the concepts of typicity and eccentricity to determine whether a given sample is normal or abnormal within the dataset. Unlike most traditional methods, it does not require signal buffering or windowing. Furthermore, the proposed technique employs simple decision rules rather than heuristic approaches, further contributing to its computational efficiency. Full article
(This article belongs to the Topic Bioinformatics and Intelligent Information Processing)
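The typicality-and-eccentricity computation at the core of TEDA is what makes the detector lightweight: mean and variance are updated recursively, so no buffering or windowing is needed. The following is a minimal Python sketch of the generic TEDA anomaly test on a scalar stream (the function name and the m-sigma threshold are illustrative assumptions; this is not the authors' full R-peak pipeline, which adds slope preprocessing, filtering, and peak identification):

```python
def teda_stream(samples, m=3.0):
    """Flag eccentric (anomalous) samples with TEDA's recursive statistics.

    Returns a list of booleans, one per sample (True = anomalous).
    """
    flags = []
    mean = 0.0
    var = 0.0
    for k, x in enumerate(samples, start=1):
        # recursive updates of mean and variance
        mean = (k - 1) / k * mean + x / k
        if k == 1:
            flags.append(False)  # a single sample is never eccentric
            continue
        var = (k - 1) / k * var + (x - mean) ** 2 / (k - 1)
        if var == 0.0:
            flags.append(False)  # all samples identical so far
            continue
        # eccentricity and its normalized form
        ecc = 1.0 / k + (mean - x) ** 2 / (k * var)
        norm_ecc = ecc / 2.0
        # Chebyshev-style m-sigma decision rule
        flags.append(norm_ecc > (m ** 2 + 1) / (2 * k))
    return flags
```

For example, a stream of ten identical samples followed by one large outlier flags only the outlier.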

17 pages, 4207 KiB  
Article
A Comprehensive Evaluation of Generalizability of Deep Learning-Based Hi-C Resolution Improvement Methods
by Ghulam Murtaza, Atishay Jain, Madeline Hughes, Justin Wagner and Ritambhara Singh
Genes 2024, 15(1), 54; https://doi.org/10.3390/genes15010054 - 29 Dec 2023
Viewed by 962
Abstract
Hi-C is a widely used technique to study the 3D organization of the genome. Due to its high sequencing cost, most of the generated datasets are of a coarse resolution, which makes it impractical to study finer chromatin features such as Topologically Associating Domains (TADs) and chromatin loops. Multiple deep learning-based methods have recently been proposed to increase the resolution of these datasets by imputing Hi-C reads (typically called upscaling). However, the existing works evaluate these methods on either synthetically downsampled datasets, or a small subset of experimentally generated sparse Hi-C datasets, making it hard to establish their generalizability in the real-world use case. We present our framework—Hi-CY—that compares existing Hi-C resolution upscaling methods on seven experimentally generated low-resolution Hi-C datasets belonging to various levels of read sparsities originating from three cell lines on a comprehensive set of evaluation metrics. Hi-CY also includes four downstream analysis tasks, such as TAD and chromatin loops recall, to provide a thorough report on the generalizability of these methods. We observe that existing deep learning methods fail to generalize to experimentally generated sparse Hi-C datasets, showing a performance reduction of up to 57%. As a potential solution, we find that retraining deep learning-based methods with experimentally generated Hi-C datasets improves performance by up to 31%. More importantly, Hi-CY shows that even with retraining, the existing deep learning-based methods struggle to recover biological features such as chromatin loops and TADs when provided with sparse Hi-C datasets. Our study, through the Hi-CY framework, highlights the need for rigorous evaluation in the future. We identify specific avenues for improvements in the current deep learning-based Hi-C upscaling methods, including but not limited to using experimentally generated datasets for training. Full article
(This article belongs to the Topic Bioinformatics and Intelligent Information Processing)

12 pages, 2481 KiB  
Article
Lambda CI Binding to Related Phage Operator Sequences Validates Alignment Algorithm and Highlights the Importance of Overlooked Bonds
by Jacklin Sedhom and Lee A. Solomon
Genes 2023, 14(12), 2221; https://doi.org/10.3390/genes14122221 - 15 Dec 2023
Viewed by 882
Abstract
Bacteriophage λ’s CI repressor protein controls a genetic switch between the virus’s lysogenic and lytic lifecycles, in part, by selectively binding to six different DNA sequences within the phage genome—collectively referred to as operator sites. However, the minimal level of information needed for CI to recognize and specifically bind these six unique-but-related sequences is unclear. In a previous study, we introduced an algorithm that extracts the minimal direct readout information needed for λ-CI to recognize and bind its six binding sites. We further revealed direct readout information shared among three evolutionarily related lambdoid phages: λ-phage, Enterobacteria phage VT2-Sakai, and Stx2 converting phage I, suggesting that the λ-CI protein could bind to the operator sites of these other phages. In this study, we show that λ-CI can indeed bind the other two phages’ cognate binding sites as predicted using our algorithm, validating the hypotheses from that paper. We go on to demonstrate the importance of specific hydrogen bond donors and acceptors that are maintained despite changes to the nucleobase itself, and another that has an important role in recognition and binding. This in vitro validation of our algorithm supports its use as a tool to predict alternative binding sites for DNA-binding proteins. Full article
(This article belongs to the Topic Bioinformatics and Intelligent Information Processing)

26 pages, 16374 KiB  
Article
Statistical Analysis of Imbalanced Classification with Training Size Variation and Subsampling on Datasets of Research Papers in Biomedical Literature
by Jose Dixon and Md Rahman
Mach. Learn. Knowl. Extr. 2023, 5(4), 1953-1978; https://doi.org/10.3390/make5040095 - 11 Dec 2023
Viewed by 1769
Abstract
The overall purpose of this paper is to demonstrate how data preprocessing, training size variation, and subsampling can dynamically change the performance metrics of imbalanced text classification. The methodology encompasses using two different supervised learning classification approaches of feature engineering and data preprocessing with the use of five machine learning classifiers, five imbalanced sampling techniques, specified intervals of training and subsampling sizes, statistical analysis using R and tidyverse on a dataset of 1000 portable document format files divided into five labels from the World Health Organization Coronavirus Research Downloadable Articles of COVID-19 papers and PubMed Central databases of non-COVID-19 papers for binary classification that affects the performance metrics of precision, recall, receiver operating characteristic area under the curve, and accuracy. One approach that involves labeling rows of sentences based on regular expressions significantly improved the performance of imbalanced sampling techniques verified by performing statistical analysis using a t-test documenting performance metrics of iterations versus another approach that automatically labels the sentences based on how the documents are organized into positive and negative classes. The study demonstrates the effectiveness of ML classifiers and sampling techniques in text classification datasets, with different performance levels and class imbalance issues observed in manual and automatic methods of data processing. Full article
(This article belongs to the Topic Bioinformatics and Intelligent Information Processing)
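Of the imbalanced-sampling techniques the paper compares, random undersampling is the simplest to illustrate: every class is downsampled to the size of the rarest class before training. A minimal stdlib-only Python sketch (function and parameter names are assumed for illustration; the paper's own analysis uses R and tidyverse):

```python
import random
from collections import defaultdict

def random_undersample(rows, labels, seed=0):
    """Downsample every class to the size of the rarest class.

    Returns (rows, labels) of the balanced subset.
    """
    by_label = defaultdict(list)
    for row, label in zip(rows, labels):
        by_label[label].append(row)
    n_min = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed for reproducibility
    out_rows, out_labels = [], []
    for label, group in sorted(by_label.items()):
        for row in rng.sample(group, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels
```

With an 8-to-2 class imbalance, the balanced output contains two examples of each class.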

18 pages, 2565 KiB  
Article
Enhancing Electrocardiogram (ECG) Analysis of Implantable Cardiac Monitor Data: An Efficient Pipeline for Multi-Label Classification
by Amnon Bleich, Antje Linnemann, Benjamin Jaidi, Björn H. Diem and Tim O. F. Conrad
Mach. Learn. Knowl. Extr. 2023, 5(4), 1539-1556; https://doi.org/10.3390/make5040077 - 21 Oct 2023
Viewed by 1543
Abstract
Implantable Cardiac Monitor (ICM) devices currently represent the fastest-growing market for implantable cardiac devices. As such, they are becoming increasingly common in patients for measuring the heart's electrical activity. ICMs constantly monitor and record a patient's heart rhythm and, when triggered, send it to a secure server where health care professionals (HCPs) can review it. Because of energy-consumption constraints, these devices employ a relatively simplistic rule-based algorithm to raise alerts for abnormal heart rhythms. This algorithm is usually parameterized in an over-sensitive mode so as not to miss a case (resulting in a relatively high false-positive rate); combined with the device's constant monitoring of the heart rhythm and its growing popularity, this leaves HCPs with an ever-growing amount of data to analyze and diagnose. To reduce this load, automated methods for ECG analysis are becoming a valuable tool for assisting HCPs in their analysis. While state-of-the-art algorithms are data-driven rather than rule-based, training data for ICMs often have specific characteristics that make their analysis unique and particularly challenging. This study presents the challenges and solutions involved in automatically analyzing ICM data and introduces a classification method that outperforms existing methods on such data. It does so by combining high-frequency noise detection (which often occurs in ICM data) with a semi-supervised learning pipeline that allows for the re-labeling of training episodes, and by using segmentation and dimension-reduction techniques that are robust to the morphology variations of the sECG signal typical of ICM data. As a result, it performs better than state-of-the-art techniques on such data, with, e.g., an F1 score of 0.51 versus 0.38 for our baseline state-of-the-art technique in correctly calling atrial fibrillation in ICM data. As such, it could be used in numerous ways, such as aiding HCPs in the analysis of ECGs originating from ICMs by, e.g., suggesting a rhythm type. Full article
(This article belongs to the Topic Bioinformatics and Intelligent Information Processing)
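As a toy illustration of the kind of screening the pipeline's noise-detection stage performs, one generic indicator is the energy of a signal's first difference relative to its total energy, which rises sharply for high-frequency noise. This sketch is an assumption for illustration only, not the paper's actual detector:

```python
def hf_noise_ratio(signal):
    """Crude high-frequency content indicator.

    Ratio of first-difference energy to total signal energy; a rapidly
    alternating (noisy) signal scores much higher than a smooth one.
    """
    diffs = [b - a for a, b in zip(signal, signal[1:])]
    num = sum(d * d for d in diffs)
    den = sum(x * x for x in signal) or 1.0  # guard against all-zero input
    return num / den
```

Episodes whose ratio exceeds a chosen threshold could then be routed to a noise class rather than a rhythm class.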

14 pages, 2920 KiB  
Article
A Comprehensive Self-Resistance Gene Database for Natural-Product Discovery with an Application to Marine Bacterial Genome Mining
by Hua Dong and Dengming Ming
Int. J. Mol. Sci. 2023, 24(15), 12446; https://doi.org/10.3390/ijms241512446 - 04 Aug 2023
Viewed by 904
Abstract
In the world of microorganisms, the biosynthesis of natural products in secondary metabolism and the self-resistance of the host always occur together and complement each other. Identifying resistance genes from biosynthetic gene clusters (BGCs) helps us understand the self-defense mechanism and predict the biological activity of natural products synthesized by microorganisms. However, a comprehensive database of resistance genes is still lacking, which hinders natural product annotation studies in large-scale genome mining. In this study, we compiled a resistance gene database (RGDB) by scanning the four available databases: CARD, MIBiG, NCBIAMR, and UniProt. Every resistance gene in the database was annotated with resistance mechanisms and possibly involved chemical compounds, using manual annotation and transformation from the resource databases. The RGDB was applied to analyze resistance genes in 7432 BGCs in 1390 genomes from a marine microbiome project. Our calculation showed that the RGDB successfully identified resistance genes for more than half of the BGCs, suggesting that the database helps prioritize BGCs that produce biologically active natural products. Full article
(This article belongs to the Topic Bioinformatics and Intelligent Information Processing)

16 pages, 1733 KiB  
Article
Search for Dispersed Repeats in Bacterial Genomes Using an Iterative Procedure
by Eugene Korotkov, Yulia Suvorova, Dimitry Kostenko and Maria Korotkova
Int. J. Mol. Sci. 2023, 24(13), 10964; https://doi.org/10.3390/ijms241310964 - 30 Jun 2023
Cited by 1 | Viewed by 1758
Abstract
We have developed a de novo method for the identification of dispersed repeats based on the use of random position-weight matrices (PWMs) and an iterative procedure (IP). The created algorithm (IP method) allows detection of dispersed repeats for which the average number of substitutions between any two repeats per nucleotide (x) is less than or equal to 1.5. We have shown that all previously developed methods and algorithms (RED, RECON, and some others) can only find dispersed repeats for x ≤ 1.0. We applied the IP method to find dispersed repeats in the genomes of E. coli and nine other bacterial species. We identify three families of approximately 1.09 × 106, 0.64 × 106, and 0.58 × 106 DNA bases, respectively, constituting almost 50% of the complete E. coli genome. The length of the repeats is in the range of 400 to 600 bp. Other analyzed bacterial genomes contain one to three families of dispersed repeats with a total number of 103 to 6 × 103 copies. The existence of such highly divergent repeats could be associated with the presence of a single-type triplet periodicity in various genes or with the packing of bacterial DNA into a nucleoid. Full article
(This article belongs to the Topic Bioinformatics and Intelligent Information Processing)
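Scoring candidate genome positions against a position-weight matrix is the basic operation underlying PWM-based repeat search. A minimal log-odds scan in Python, assuming a uniform 0.25 background (names are illustrative; the paper's iterative procedure additionally re-estimates each PWM from its hits across iterations):

```python
import math

def pwm_score(pwm, window):
    """Log-odds score of one sequence window against a PWM.

    `pwm` is a list of dicts, one per position, mapping base -> probability.
    """
    return sum(math.log2(pos[base] / 0.25) for pos, base in zip(pwm, window))

def best_hit(pwm, genome):
    """Slide the PWM along the genome; return (best_offset, best_score)."""
    w = len(pwm)
    scored = [(pwm_score(pwm, genome[i:i + w]), i)
              for i in range(len(genome) - w + 1)]
    best_score, best_offset = max(scored)
    return best_offset, best_score
```

For a PWM that strongly favors "ACG", scanning "TTACGTT" finds the match at offset 2 with a positive log-odds score.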
