Article

DLm6Am: A Deep-Learning-Based Tool for Identifying N6,2′-O-Dimethyladenosine Sites in RNA Sequences

1 Computer Department, Jingdezhen Ceramic University, Jingdezhen 333403, China
2 School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Int. J. Mol. Sci. 2022, 23(19), 11026; https://doi.org/10.3390/ijms231911026
Submission received: 8 August 2022 / Revised: 10 September 2022 / Accepted: 15 September 2022 / Published: 20 September 2022

Abstract

N6,2′-O-dimethyladenosine (m6Am) is a post-transcriptional modification that may be associated with regulatory roles in the control of cellular functions. Therefore, it is crucial to accurately identify transcriptome-wide m6Am sites to understand the underlying m6Am-dependent mRNA regulation mechanisms and biological functions. Here, we used three sequence-based feature-encoding schemes, including one-hot, nucleotide chemical property (NCP), and nucleotide density (ND), to represent RNA sequence samples. Additionally, we proposed an ensemble deep learning framework, named DLm6Am, to identify m6Am sites. DLm6Am consists of three similar base classifiers, each of which contains a multi-head attention module, an embedding module with two parallel deep learning sub-modules, a convolutional neural network (CNN) and a Bi-directional long short-term memory (BiLSTM), and a prediction module. To demonstrate the superior performance of our model's architecture, we compared multiple model frameworks with our method by analyzing the training data and independent testing data. Additionally, we compared our model with the existing state-of-the-art computational methods, m6AmPred and MultiRM. The accuracy (ACC) of the DLm6Am model on independent testing data was improved by 6.45% and 8.42% compared to that of m6AmPred and MultiRM, respectively, while the area under the receiver operating characteristic curve (AUROC) was increased by 4.28% and 5.75%, respectively. All the results indicate that DLm6Am achieved the best prediction performance in terms of ACC, Matthews correlation coefficient (MCC), AUROC, and the area under the precision-recall curve (AUPR). To further assess the generalization performance of our proposed model, we implemented hold-out cross-validation at the chromosome level and found that the obtained AUROC values were greater than 0.83, indicating that our proposed method is robust and can accurately predict m6Am sites.

1. Introduction

More than 160 RNA modification types have been discovered so far [1]. Among them, N6-methyladenosine (m6A) is the most widespread post-transcriptional modification of lncRNA and mRNA in mammalian cells [2]. Another reversible modification, termed N6,2′-O-dimethyladenosine (m6Am), was originally found at the 5′ end of mRNA in viruses and animal cells in 1975 [3]. Different from m6A, m6Am is a terminal modification, which is usually 2′-O-methylated at the second base adjacent to the 5′ cap in many mRNAs and is further methylated at the N6 position [4] (see Figure 1). The RNA modification m6Am is catalyzed by phosphorylated C-terminal domain (CTD)-interacting factor 1, i.e., PCIF1 [5], a writer protein specific to the cap-related m6Am, and can be demethylated by the fat mass and obesity-associated protein, i.e., FTO [6], one of the m6A demethylases. Thus, m6Am is dynamically regulated by PCIF1 and FTO, opening a new direction in cap epitranscriptomics.
Since the initial discovery of m6Am modification, some scholars have started to reveal its function. Several recent studies demonstrated that m6Am might be associated with higher protein expression [4], obesity-related translation regulation [4], increased translation efficiency [7,8], and mRNA stability [6,9,10]. Mauer et al. [6] found that the stability of transcripts that begin with m6Am was enhanced, while the stability of m6Am mRNAs could be reduced once demethylated by FTO. However, this observation was recently challenged. Wei et al. [11] reported that the expression levels of transcripts possessing cap m6Am seemed not to change with the knockdown of FTO. Moreover, another two studies [5,8] suggested that the loss of m6Am in PCIF1 knockout (KO) cells did not markedly affect the level of transcripts with m6Am. More corroborating evidence is needed to support the conclusion that m6Am can influence mRNA stability. On the other hand, it seems that m6Am modification can also affect mRNA translation. The positive association between m6Am methylation level and translation level was revealed using ribosome profiling experiments [8]. However, the biological function of m6Am has largely remained a mystery due to the lack of robust methods for sensitively identifying this modification at the transcriptome-wide level. Consequently, accurate identification of transcriptome-wide m6Am sites is crucial to understanding and exploring underlying m6Am-dependent mRNA regulation mechanisms and biological functions.
There have been efforts aiming to identify m6Am sites with wet-lab experimental methods [12,13,14]. However, such experimental methods are still expensive and time-consuming; thus, the development of computational approaches to accurately identify m6Am sites is urgently needed. Recently, researchers have attempted to computationally identify m6Am sites with machine-learning algorithms. In this field, Meng's team first developed a computational method named m6AmPred [15] based on sequence-derived information in 2021. They collected m6Am sequencing data generated by miCLIP-seq technology from [6,16], selected 41 nt sequences centered on adenosine (A) as positive samples, and randomly chose 41 nt sequences centered on A in non-modified BCA (B = C, G, or U) motifs as negative samples. Electron-ion interaction potential and pseudo-EIIP (EIIP-PseEIIP) were employed to encode each mRNA sequence, and the eXtreme Gradient Boosting with Dart algorithm (XgbDart) was used to build the m6Am site predictor. In the same year, this team collected experimental data on twelve widely occurring RNA modifications, including m6Am sequencing data generated by miCLIP-seq [6,10,16]. Additionally, they developed an attention-based multi-label deep learning framework named MultiRM [17], consisting of an embedding module and an LSTM-Attention block, which can simultaneously predict the putative sites of twelve RNA modifications covering m6Am.
Despite recent advances in the computational identification of m6Am sites, some limitations and shortcomings remain. (i) Current computational methods trained on m6Am data from miCLIP-seq [6,10,16] are limited by the confidence level of the training data. The experimental method miCLIP-seq for mapping m6Am relies on m6A antibodies, which cannot reliably distinguish m6Am from 5′-UTR m6A. Such an indirect approach is further constrained by the limited activity of 5′ exonuclease, inaccurate TSS annotation, and the low efficiency of UV crosslinking, thereby greatly reducing the precision and sensitivity of m6Am identification. Moreover, the data sources [6,10,16] of current methods only provide m6Am peak regions 100 to 250 nt in length rather than m6Am sites at single-nucleotide resolution. More high-confidence m6Am site data are needed to construct computational models for m6Am site identification. (ii) Current computational methods do not implement sequence-redundancy removal. Generally, to reduce redundancy among sample sequences, the CD-HIT-EST tool [19] is employed to remove sequences with sequence similarity greater than a certain threshold (typically 80%). After sequence-redundancy removal with the threshold set to 80%, we found that the original positive and negative datasets provided by m6AmPred lost 55 and 3630 sequences, respectively. (iii) The generalization performance of current computational methods remains to be further validated.
Recently, a sensitive and specific approach termed m6Am-seq [14] has been developed to directly profile transcriptome-wide m6Am, which can provide high-confidence m6Am site data at the single-nucleotide level for developing computational methods of m6Am site detection and for future functional studies of m6Am modification. Moreover, in the field of m6A site prediction, MultiRM [17], DeepM6ASeq [20], and MASS [21] were successfully developed using an effective hybrid framework combining a CNN and an LSTM and achieved promising performance. Inspired by these single-nucleotide m6Am data and the successful applications of deep learning frameworks, here we present DLm6Am, an attention-based ensemble deep-learning framework to accurately identify m6Am sites. We first discuss the effect of several frequently used RNA sequence encoding methods, including one-hot, nucleotide chemical property (NCP), and nucleotide density (ND), on six different classifiers: random forest (RF), support vector machine (SVM), eXtreme Gradient Boosting (XGBoost), Bi-directional long short-term memory (BiLSTM), convolutional neural network (CNN), and the embedding model CNN-BiLSTM. We found that the embedding deep learning model CNN-BiLSTM trained on fused features achieved the best prediction performance. Subsequently, we compared the prediction performance of the embedding deep learning model with and without an attention layer. Next, after ranking the ACC values of embedding deep learning models with an attention layer under different hyper-parameters, we selected the three models with the top three ACC values as base classifiers and adopted a voting strategy to build the final ensemble deep-learning model. To further assess the generalization performance of our proposed model, we implemented an independent test and also employed hold-out cross-validation at the chromosome level.
Finally, a user-friendly webserver was established based on the proposed model and made freely accessible at http://47.94.248.117/DLm6Am/ (accessed on 15 September 2022) [22] to serve the research community. Moreover, the source code was provided in a Github repository (https://github.com/pythonLzt/DLm6Am (accessed on 15 September 2022)) to enable future uses.

2. Results and Discussion

2.1. Overview of DLm6Am

DLm6Am identifies m6Am modification sites from RNA sequences through the following steps (see Figure 2): (1) extracting m6Am sequences from the data generated by the sensitive and specific approach termed m6Am-seq, and constructing positive and negative samples using the CD-HIT-EST tool; (2) generating context-based features from the primary RNA sequences using the one-hot, NCP, and ND encoding schemes; (3) establishing the ensemble-learning-based classification model from the three CNN-BiLSTM-attention models with the top three ACC values; (4) averaging the outputs of all individual base classifiers as the final decision to predict m6Am or non-m6Am.
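The soft-voting step (4) above can be sketched in a few lines of Python; the probability values below are illustrative placeholders, not outputs of the actual base classifiers:

```python
# Soft-voting ensemble: the final score is the average of the three base
# classifiers' predicted probabilities P(m6Am) for one candidate site.

def ensemble_predict(base_probs, threshold=0.5):
    """Average per-classifier P(m6Am) scores and apply a decision threshold."""
    avg = sum(base_probs) / len(base_probs)
    label = "m6Am" if avg >= threshold else "non-m6Am"
    return avg, label

# Hypothetical probabilities from the three base classifiers:
score, label = ensemble_predict([0.91, 0.78, 0.64])
```

Averaging probabilities (rather than hard 0/1 votes) preserves each base classifier's confidence, which is why the averaged score can also be used to draw ROC and PR curves for the ensemble.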
Specifically, although the three base classifiers have different hyper-parameters, they share a similar architecture, mainly consisting of three parts (see Figure 2). The first module is a multi-head attention layer, which captures importance scores for individual positions along the input sequence, thereby enhancing the learning ability of the prediction model. The second module is an embedding module that takes the output of the multi-head attention layer and feeds it into two parallel deep learning sub-modules, a CNN and a BiLSTM. The CNN sub-module, which extracts hidden contextual features of RNA sequences, includes two convolution layers to extract different input features and a max-pooling layer to reduce the dimensionality of the extracted features and the amount of computation. The BiLSTM sub-module captures possible long-range dependencies among the features; a fully connected layer follows the BiLSTM for feature dimension reduction, with the rectified linear unit (ReLU) [23] used as the activation function to improve computational efficiency and preserve the gradient. Afterwards, we concatenated the two kinds of features extracted by the embedding module and fed them into the third module, a prediction module consisting of two fully connected layers, each followed by a dropout operation to mitigate overfitting. The Softmax function [24] was applied to the last layer to predict whether the central nucleotide A of a given RNA sequence is an m6Am or non-m6Am site. More details on the actual model configurations used, such as layer sizes, depth, and the number of parameters, are given in Supplementary Note S2.

2.2. Comparison with Different Model Architectures

To demonstrate the superior performance of our model architecture, we compared multiple model frameworks with our method by analyzing the training data described in Section 3.1, “Benchmark dataset”, using the fusion features generated by binary encoding, NCP, and ND. The competitors mainly contained classical traditional classifiers, such as random forest (RF), support vector machine (SVM), and eXtreme Gradient Boosting (XGBoost); and deep learning feature extractors, such as CNN, BiLSTM, CNN-BiLSTM, and CNN-BiLSTM-attention. Among them, CNN-BiLSTM represents the embedding model with CNN and BiLSTM, while CNN-BiLSTM-attention represents the embedding model generated by CNN with an attention layer and BiLSTM with an attention layer. As mentioned above, our model DLm6Am is an ensemble deep learning framework integrated by three CNN-BiLSTM-attention models with the top three ACC values.
To demonstrate the stability of the models, we implemented five-fold cross-validation 20 times for each model with tuned hyper-parameters. The detailed configuration of the models can be found in Supplementary Note S2. All detailed results generated by five-fold cross-validation on the training data, including the areas under the PR (precision-recall) and ROC curves, are listed in Table 1. The standard deviations of most metrics are small, demonstrating the good fitness and stability of these models. Additionally, as shown in Table 1, the average values of the important indicators of the deep-learning models were higher than those of the traditional classifiers. For example, compared to the traditional classifier RF, the average ACC, MCC, AUROC, and AUPR values of the CNN model were higher by 1.02%, 2.12%, 2%, and 3.22%, respectively, while those of the BiLSTM model were higher by 0.34%, 0.71%, 1.08%, and 2.29%, respectively. In particular, our model DLm6Am improved on RF by 3.25%, 6.5%, 2.36%, and 3.45% in terms of average ACC, MCC, AUROC, and AUPR values, respectively.
Additionally, we implemented a series of ablation tests to demonstrate the superiority of our model architecture. First, the framework without ensemble learning, i.e., CNN-BiLSTM-attention, was compared with our proposed model, DLm6Am. As Table 1 shows, the average ACC, MCC, AUROC, and AUPR values of CNN-BiLSTM-attention were lower by 0.35%, 0.7%, 0.21%, and 0.44%, respectively, demonstrating that the prediction performance for m6Am site identification can be further improved by ensemble learning. Second, the embedding architecture without the attention module, CNN-BiLSTM, was also compared with DLm6Am and CNN-BiLSTM-attention. The CNN-BiLSTM-attention framework achieved higher performance than the embedding architecture without an attention layer in terms of ACC, MCC, AUROC, and AUPR, indicating that the attention mechanism can identify key m6Am information and thereby improve model performance. The metric values of DLm6Am improved overall compared to those of CNN-BiLSTM, indicating that ensemble learning and the attention mechanism are both of crucial importance in identifying m6Am sites. Moreover, the embedding deep learning model CNN-BiLSTM is superior to single deep learning models in terms of ACC and MCC, because it combines the advantages of the CNN and the BiLSTM to simultaneously capture local and long-range dependencies among the features. Compared with single deep learning models, our model has clear advantages; e.g., DLm6Am improved on BiLSTM by 2.77%, 3.04%, 2.91%, 5.79%, 1.28%, and 1.16% in terms of the average values of Sn, Sp, ACC, MCC, AUROC, and AUPR, respectively.
As some of the differences from other methods are small, we first sought additional performance gains for the DLm6Am model by varying model complexity (e.g., the number of layers/nodes) as well as the training set size. The results listed in Table S1 of Supplementary Note S3 show that DLm6Am achieves its best prediction performance under the current model configuration used in this study. We then assessed the statistical significance of the observed differences using the Wilcoxon test. The graphical representation of the results, shown in Figure 3, was created mainly with the R package "ggplot2" and the geom_signif layer, which computes the significance of the difference between the metric values of DLm6Am and each other model; the level of significance (NS, *, **, ***) was added to the plot. With the exception of the AUPR values of DLm6Am and CNN, the observed differences between DLm6Am and the other models are significant. Based on these results, the embedding architecture CNN-BiLSTM-attention and ensemble learning were chosen to build the DLm6Am model.
To further confirm the superiority of our proposed framework, we compared the prediction performance of DLm6Am and other combinations of different base classifiers, such as SVM, RF, XGBoost, LSTM, and CNN. The ensemble classification results are listed in Table S2 of Supplementary Note S4. It can be seen that the DLm6Am achieved better prediction performance than other ensemble combinations.
Next, we used the independent testing data described in Section 3.1, "Benchmark Dataset", to evaluate the generalization performance of these different models. All testing results are listed in Table 2. Additionally, Figure 4 illustrates the comparison based on the PR and ROC curves of the different methods. It can be seen from Table 2 and Figure 4 that our proposed DLm6Am is generally better than the other models, further demonstrating the advantages of the embedding deep-learning architecture and ensemble learning. These independent testing results indicate that our method can accurately identify novel m6Am sites not seen in the training data.

2.3. Hold-Out Cross-Validation on Chromosome Level

Hold-out cross-validation is generally an effective method to validate the generalization performance of proposed models. In this study, for each chromosome with more than 50 samples, we used that chromosome's data as the testing set to evaluate a model trained on the data of the remaining chromosomes. Additionally, because chromosomes 13, 18, 20, 21, and 22 each have fewer than 50 samples, their data were pooled into a single testing set, with the data of the remaining chromosomes used for training. For all hold-out cross-validation runs, models were fitted on the training data using the same hyper-parameters as DLm6Am for a strict evaluation of generalization performance.
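The fold construction described above can be sketched as follows; the chromosome names and sample counts are hypothetical, and the 50-sample cutoff follows the text:

```python
from collections import Counter

def chromosome_folds(chrom_labels, min_size=50):
    """Build hold-out folds at the chromosome level: each chromosome with
    more than `min_size` samples becomes its own test fold; all smaller
    chromosomes are pooled into one extra fold."""
    counts = Counter(chrom_labels)
    large = [c for c, n in counts.items() if n > min_size]
    small = [c for c in counts if c not in large]
    folds = [{c} for c in sorted(large)]
    if small:
        folds.append(set(small))
    return folds

# Hypothetical sample-to-chromosome assignment:
labels = ["chr1"] * 120 + ["chr2"] * 80 + ["chr13"] * 30 + ["chr21"] * 20
folds = chromosome_folds(labels)
```

For each fold, the model would then be trained on all samples whose chromosome is outside the fold and tested on the samples inside it, which prevents leakage of near-identical transcripts between training and testing.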
The results of hold-out cross-validation at the chromosome level are reported in Table 3. It can be clearly seen that the most important metric, AUROC, is greater than 0.83 in every case, indicating that our proposed method is robust and can accurately predict m6Am sites across different chromosomes.

2.4. Comparison with Existing Methods

To further assess the performance of DLm6Am, we compared our model with the existing state-of-the-art computational methods, MultiRM and m6AmPred, for identifying m6Am sites in RNA sequences. MultiRM consists of three parts: the first module takes the one-hot encoding of RNA sequences as input and adopts three different embedding techniques to embed features; the embedded features are then fed into an LSTM and an attention layer. For m6AmPred, the electron-ion interaction potential (EIIP) and pseudo-EIIP (PseEIIP) were used as encoding schemes to represent the sample sequences, and the eXtreme Gradient Boosting with Dart algorithm (XgbDart) was employed to build the final model. Here, to provide a fair performance comparison, we applied each method's encoding scheme and classification algorithm to the training data used in this study and evaluated the constructed models on the independent testing data.
All results using five-fold cross-validation on the training data and using the independent testing data are shown in Figure 5a,d, respectively. DLm6Am clearly showed better predictive performance than the other two predictors. More specifically, under five-fold cross-validation on the training data, the Sn, Sp, ACC, MCC, AUROC, and AUPR of the DLm6Am model exceeded those of m6AmPred by 5.43%, 6.84%, 6.13%, 12.26%, 4.60%, and 4.27%, respectively, and those of MultiRM by 5.43%, 11.42%, 8.42%, 17.14%, 7.18%, and 7.55%, respectively. Moreover, on the independent testing data, the Sn, Sp, ACC, MCC, AUROC, and AUPR of the DLm6Am model improved on m6AmPred by 9.60%, 3.32%, 6.45%, 12.95%, 4.28%, and 4.26%, respectively, and on MultiRM by 3.12%, 13.74%, 8.42%, 16.43%, 5.75%, and 6.57%, respectively. These results suggest that the joint use of the hybrid CNN-BiLSTM-attention architecture and ensemble learning has strong potential for application to other RNA modification site prediction tasks.
Additionally, both ROC and PR curves were plotted to demonstrate the performance of MultiRM, m6AmPred, and DLm6Am (Figure 5). The DLm6Am model had much higher performance than MultiRM and m6AmPred, further illustrating the stability and generalization ability of our proposed model, DLm6Am.

2.5. Webserver Functionality

To enable quick prediction of m6Am sites from RNA sequences, a user-friendly web interface for DLm6Am was developed in Python using the Flask web micro-framework. The DLm6Am webserver has several user interfaces and provides multiple functions, including m6Am site prediction, an introduction to the webserver, data download, and citation of the relevant literature. DLm6Am allows users to perform prediction by typing or copying/pasting query RNA sequences in FASTA format into the input box. After a short online wait, the results are displayed on the webserver. Alternatively, users can receive the prediction results by email after uploading a FASTA file with multiple RNA sequences of interest and entering their email address and job ID. In summary, this service allows researchers to identify transcriptome-wide m6Am sites, thereby helping them understand the underlying m6Am-dependent mRNA regulation mechanisms and biological functions.

3. Materials and Methods

3.1. Benchmark Dataset

Recently, Sun et al. [14] developed a sensitive and direct method named m6Am-seq to identify transcriptome-wide m6Am and provided 2166 high-confidence m6Am sites at single-nucleotide resolution throughout the human transcriptome. Based on this site information and the human genome assembly hg19, we can extract the corresponding sample sequences using a (2ξ + 1)-nt sliding window, formulated as below.
$$R_{\xi}(\mathbb{A}) = R_{-\xi}\, R_{-(\xi-1)} \cdots R_{-2}\, R_{-1}\, \mathbb{A}\, R_{+1}\, R_{+2} \cdots R_{+(\xi-1)}\, R_{+\xi}$$
where the double-line character $\mathbb{A}$ represents the nucleotide adenosine in BCA (B = C, G, or U) motifs, the subscript $\xi$ is an integer, $R_{-\xi}$ denotes the $\xi$-th nucleotide upstream of the center, and $R_{+\xi}$ denotes the $\xi$-th nucleotide downstream of the center. In this study, after preliminary analysis, $\xi$ was set to 20, yielding 41 nt sample sequences. If the center of an RNA sequence segment is an experimentally confirmed m6Am site, the sequence is regarded as a positive candidate sample and placed into the positive dataset $S^{+}$; otherwise, it is considered a negative sample and placed into the negative dataset $S^{-}$. After reducing the sequence identity to 80% using the CD-HIT-EST tool [19], we randomly selected negative samples at a 1:1 positive-to-negative ratio to construct the final benchmark dataset $S$ ($S = S^{+} \cup S^{-}$), summarized in Table 4. For constructing and training the prediction model, we randomly sampled 80% of the benchmark dataset as the training set, and the remaining 20% served as the independent testing set to assess the prediction performance of the constructed model. This training/testing split was chosen by analyzing the effect of training set size on performance (Figure S1, Supplementary Note S1). The training and independent testing data can be downloaded from http://47.94.248.117/DLm6Am/download (accessed on 15 September 2022) [22].
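The window-extraction step can be sketched as follows, assuming a 41 nt window (matching the encodings in Section 3.2) around each candidate adenosine; the toy transcript and site index are hypothetical:

```python
def extract_window(seq, center, xi=20):
    """Extract the (2*xi + 1)-nt window R_-xi ... A ... R_+xi around a site.

    `center` is the 0-based index of the candidate adenosine; windows that
    would run past either end of the transcript are discarded (None).
    """
    if center - xi < 0 or center + xi + 1 > len(seq):
        return None
    window = seq[center - xi : center + xi + 1]
    # Both positive and negative samples are centered on an A.
    assert window[xi] == "A"
    return window

# Toy transcript (hypothetical); index 24 holds an 'A' (24 % 4 == 0).
tx = "ACGU" * 20
w = extract_window(tx, 24)
```

Windows centered too close to a transcript end are simply dropped here; padding with a placeholder symbol would be an alternative design choice.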

3.2. Feature Extraction from RNA Sequence

It is essential to carefully design and extract features in order to develop robust and reliable computational methods for m6Am site identification. Various feature-encoding methods have been developed to represent and convert RNA sequences into numeric vectors, such as context-based features, structure-based features, and integrated features. The majority of these features can be easily generated with bioinformatics tools such as PseKNC [25], iLearn [26], and iLearnPlus [27]. In this study, we selected the most prevalent context-based features to represent RNA sequence fragments: one-hot, nucleotide chemical property (NCP), and nucleotide density (ND).

3.2.1. Binary Encoding of Nucleotide

Binary (one-hot) encoding of nucleotides is a simple and efficient way to characterize sample sequences of proteins, DNA, or RNA. The four types of nucleotide, i.e., A (adenine), C (cytosine), G (guanine), and U (uracil), are encoded as (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), and (0, 0, 0, 1), respectively. Thus, each 41 nt sample sequence can be transformed into a 41 × 4 numerical matrix.
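A minimal sketch of this encoding in Python:

```python
# One-hot codes in A, C, G, U column order, as described in the text.
ONE_HOT = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "U": [0, 0, 0, 1],
}

def one_hot_encode(seq):
    """Map an RNA sequence of length L to an L x 4 binary matrix."""
    return [ONE_HOT[nt] for nt in seq]

m = one_hot_encode("ACGU")
```

Applied to a 41 nt sample sequence, this yields the 41 × 4 matrix mentioned above.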

3.2.2. Nucleotide Chemical Property (NCP)

In terms of ring structure (two rings for purines, i.e., A and G; one for pyrimidines, i.e., C and U), chemical functionality (amino group for A and C; keto group for G and U), and the strength of hydrogen bonding between base pairs (stronger between C and G; weaker between A and U), the four types of nucleotides can be classified along three different chemical properties. Thus, the $i$-th nucleotide of a sample sequence formulated by Equation (1) can be represented by a 3-dimensional vector, as shown below.
$$R_i = (x_i,\; y_i,\; z_i)$$
where
$$x_i = \begin{cases} 1, & R_i \in \{\mathrm{A}, \mathrm{G}\} \\ 0, & R_i \in \{\mathrm{C}, \mathrm{U}\} \end{cases} \qquad y_i = \begin{cases} 1, & R_i \in \{\mathrm{A}, \mathrm{C}\} \\ 0, & R_i \in \{\mathrm{G}, \mathrm{U}\} \end{cases} \qquad z_i = \begin{cases} 1, & R_i \in \{\mathrm{A}, \mathrm{U}\} \\ 0, & R_i \in \{\mathrm{C}, \mathrm{G}\} \end{cases}$$
with $x_i$ encoding ring structure, $y_i$ chemical functionality, and $z_i$ hydrogen-bond strength. Therefore, each 41 nt sample sequence can be converted into a 41 × 3 numerical matrix.
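The three chemical-property indicators translate directly into code; a minimal sketch:

```python
def ncp_encode(seq):
    """Encode each nucleotide as (x, y, z): ring structure, chemical
    functionality, and hydrogen-bond strength, as defined in the text."""
    out = []
    for nt in seq:
        x = 1 if nt in "AG" else 0  # purine (two rings) vs. pyrimidine
        y = 1 if nt in "AC" else 0  # amino vs. keto group
        z = 1 if nt in "AU" else 0  # weak (A-U) vs. strong (C-G) pairing
        out.append((x, y, z))
    return out

ncp = ncp_encode("ACGU")
```

Note that A maps to (1, 1, 1), C to (0, 1, 0), G to (1, 0, 0), and U to (0, 0, 1), so the four nucleotides remain distinguishable from the three bits alone.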

3.2.3. Nucleotide Density (ND)

Nucleotide density, also termed accumulated nucleotide frequency (ANF), represents the accumulated frequency of the nucleotide at each position along a given sample sequence. Specifically, the nucleotide density of nucleotide $R_i$ in the sample sequence $R_{\xi}(\mathbb{A})$ is the number of occurrences of $R_i$ in positions 1 to $i$, divided by the prefix length $i$. For example, the sequence "ACGACUUAGA" is transformed into the numerical vector (1, 1/2, 1/3, 2/4, 2/5, 1/6, 2/7, 3/8, 2/9, 4/10). Generally, this feature is combined with NCP to form the widely applied fusion feature NCPD [28,29,30,31,32,33,34,35,36].
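The worked example above can be reproduced with a short prefix-counting sketch, using exact fractions to match the vector given in the text:

```python
from fractions import Fraction

def nd_encode(seq):
    """Accumulated nucleotide frequency: occurrences of seq[i] in the
    prefix seq[:i+1], divided by the prefix length i+1."""
    return [Fraction(seq[: i + 1].count(nt), i + 1) for i, nt in enumerate(seq)]

# The worked example from the text:
vec = nd_encode("ACGACUUAGA")
```

Unlike one-hot and NCP, this encoding is position dependent: the same nucleotide gets a different value depending on how often it has already appeared.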

3.3. Classification Method

Generally, three major types of prediction algorithms have been employed for various prediction tasks in the bioinformatics field: (i) scoring-function-based methods, such as the positional weight matrix (PWM) [37], the position-correlation scoring function (PCSF) [38], and relative stability (DE) [39]; (ii) traditional machine-learning methods, such as random forest (RF) [40], support vector machine (SVM) [41], and decision tree (DT) [42]; and (iii) deep-learning-based methods, such as the convolutional neural network (CNN) [43] and long short-term memory (LSTM) [44].
Deep-learning-based methods have been widely applied in biological research [17,45,46,47,48,49], especially CNNs and LSTMs. The CNN framework can depict latent information in sequential features by integrating local dependencies, while the LSTM architecture can capture possible long-range dependencies among the features. Recently, an effective hybrid framework combining a CNN and an LSTM has been successfully used for m6A site prediction, as in MultiRM [17], DeepM6ASeq [20], and MASS [21]. Inspired by these successful applications, in this study we constructed an embedded deep-learning model with a multi-head attention layer using a CNN and a BiLSTM to identify m6Am sites from RNA sequences. Finally, to further improve the generalization performance, we selected the three embedded models with the top three ACC values under different hyper-parameter combinations as base classifiers and adopted a voting strategy to obtain the final ensemble deep-learning model, named DLm6Am. To demonstrate the superiority of our model architecture, we designed a series of ablation tests, including the model architecture without ensemble learning (CNN-BiLSTM-attention), without the attention layer (CNN-BiLSTM), and single deep-learning models.
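The base-classifier selection step can be sketched as a simple ranking over validation results; the configuration names and ACC values below are hypothetical:

```python
def select_base_classifiers(results, k=3):
    """Pick the k hyper-parameter configurations with the highest validation
    ACC to serve as base classifiers for the ensemble."""
    ranked = sorted(results, key=lambda r: r["acc"], reverse=True)
    return ranked[:k]

# Hypothetical validation results for four hyper-parameter configurations:
configs = [
    {"name": "cfg1", "acc": 0.801},
    {"name": "cfg2", "acc": 0.823},
    {"name": "cfg3", "acc": 0.816},
    {"name": "cfg4", "acc": 0.809},
]
top3 = select_base_classifiers(configs)
```

The three selected configurations would then be trained independently and combined by the soft-voting strategy described in Section 2.1.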

3.4. Performance Evaluation

The prediction performance of the proposed models is generally measured using several metrics, such as sensitivity (Sn), specificity (Sp), accuracy (ACC), and the Matthews correlation coefficient (MCC), formulated by the following equations.
$$\mathrm{Sn} = \frac{TP}{TP+FN} \qquad \mathrm{Sp} = \frac{TN}{TN+FP} \qquad \mathrm{ACC} = \frac{TP+TN}{TP+TN+FP+FN}$$
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FN)(TN+FN)(TP+FP)(TN+FP)}}$$
where TP, FN, TN, and FP denote the numbers of true positives, false negatives, true negatives, and false positives, respectively. Additionally, the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPR) are usually calculated to evaluate prediction performance. Higher values indicate better prediction performance.
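The four threshold-based metrics can be computed directly from the confusion-matrix counts; the counts in the example are arbitrary:

```python
import math

def metrics(tp, fn, tn, fp):
    """Sn, Sp, ACC, and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fn) * (tn + fn) * (tp + fp) * (tn + fp)
    )
    return sn, sp, acc, mcc

# Arbitrary example counts:
sn, sp, acc, mcc = metrics(tp=40, fn=10, tn=45, fp=5)
```

Unlike ACC, the MCC stays informative on imbalanced data because it uses all four cells of the confusion matrix; it ranges from -1 to 1, with 0 corresponding to random prediction.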

4. Conclusions

In this study, we have developed a new computational method, called DLm6Am, to predict transcriptome-wide likelihoods of m6Am sites. DLm6Am is a deep-learning-based framework that integrates the three base classifiers with the top three ACC values, which share a similar network architecture. Each base classifier has three parts: a built-in attention module to help extract useful sequence patterns, an embedding module consisting of a CNN and a BiLSTM to extract features from RNA sample sequences, and a prediction module consisting of two fully connected layers to produce the final prediction decision. The ablation tests in Section 2.2 showed the superiority of our designed framework. Moreover, independent tests and hold-out cross-validation at the chromosome level have demonstrated the generalization capacity of our model in predicting novel m6Am sites. Comparison results on benchmark datasets have illustrated the superior prediction performance of DLm6Am over the other state-of-the-art methods, m6AmPred and MultiRM. However, we currently used only limited experimental data to construct the model for m6Am site prediction. In the future, we will collect large-scale experimental data to develop better prediction algorithms.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms231911026/s1.

Author Contributions

Conceptualization, methodology, writing—original draft preparation, writing—review and editing, supervision, project administration, Z.X. and X.X.; software, visualization, W.S. and Z.L.; validation, Z.L., W.S. and L.L.; formal analysis, investigation, W.S. and W.Q.; data curation, Z.L.; resources, funding acquisition, Z.X., X.X. and W.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant numbers 62062043, 32270789, 31860312, and 62162032.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The training and independent testing data used in this study can be downloaded from http://47.94.248.117/DLm6Am/download (accessed on 15 September 2022).

Acknowledgments

The authors are grateful to Xiang Chen for the extensive revision of the written English. Moreover, the authors thank Guoying Zou and Weizhong Lin for the helpful discussions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Boccaletto, P.; Stefaniak, F.; Ray, A.; Cappannini, A.; Mukherjee, S.; Purta, E.; Kurkowska, M.; Shirvanizadeh, N.; Destefanis, E.; Groza, P.; et al. MODOMICS: A database of RNA modification pathways. 2021 update. Nucleic Acids Res. 2022, 50, D231–D235. [Google Scholar] [CrossRef]
  2. Zhao, B.S.; Roundtree, I.A.; He, C. Post-transcriptional gene regulation by mRNA modifications. Nat. Rev. Mol. Cell Biol. 2017, 18, 31–42. [Google Scholar] [CrossRef]
  3. Wei, C.-M.; Gershowitz, A.; Moss, B. N6, O2′-dimethyladenosine a novel methylated ribonucleoside next to the 5′ terminal of animal cell and virus mRNAs. Nature 1975, 257, 251–253. [Google Scholar] [CrossRef]
  4. Ben-Haim, M.S.; Pinto, Y.; Moshitch-Moshkovitz, S.; Hershkovitz, V.; Kol, N.; Diamant-Levi, T.; Beeri, M.S.; Amariglio, N.; Cohen, H.Y.; Rechavi, G. Dynamic regulation of N6,2′-O-dimethyladenosine (m6Am) in obesity. Nat. Commun. 2021, 12, 7185. [Google Scholar] [CrossRef]
  5. Sendinc, E.; Valle-Garcia, D.; Dhall, A.; Chen, H.; Henriques, T.; Navarrete-Perea, J.; Sheng, W.; Gygi, S.P.; Adelman, K.; Shi, Y. PCIF1 Catalyzes m6Am mRNA Methylation to Regulate Gene Expression. Mol. Cell 2019, 75, 620–630.e9. [Google Scholar] [CrossRef]
  6. Mauer, J.; Luo, X.; Blanjoie, A.; Jiao, X.; Grozhik, A.V.; Patil, D.P.; Linder, B.; Pickering, B.F.; Vasseur, J.-J.; Chen, Q.; et al. Reversible methylation of m6Am in the 5′ cap controls mRNA stability. Nature 2017, 541, 371–375. [Google Scholar] [CrossRef]
  7. Schwartz, S.; Mumbach, M.R.; Jovanovic, M.; Wang, T.; Maciag, K.; Bushkin, G.G.; Mertins, P.; Ter-Ovanesyan, D.; Habib, N.; Cacchiarelli, D.; et al. Perturbation of m6A Writers Reveals Two Distinct Classes of mRNA Methylation at Internal and 5′ Sites. Cell Rep. 2014, 8, 284–296. [Google Scholar] [CrossRef]
  8. Akichika, S.; Hirano, S.; Shichino, Y.; Suzuki, T.; Nishimasu, H.; Ishitani, R.; Sugita, A.; Hirose, Y.; Iwasaki, S.; Nureki, O.; et al. Cap-specific terminal N-6-methylation of RNA by an RNA polymerase II-associated methyltransferase. Science 2019, 363, eaav0080. [Google Scholar] [CrossRef]
  9. Pandey, R.R.; Delfino, E.; Homolka, D.; Roithova, A.; Chen, K.-M.; Li, L.; Franco, G.; Vågbø, C.B.; Taillebourg, E.; Fauvarque, M.-O.; et al. The Mammalian Cap-Specific m6Am RNA Methyltransferase PCIF1 Regulates Transcript Levels in Mouse Tissues. Cell Rep. 2020, 32, 108038. [Google Scholar] [CrossRef]
  10. Boulias, K.; Toczydłowska-Socha, D.; Hawley, B.R.; Liberman, N.; Takashima, K.; Zaccara, S.; Guez, T.; Vasseur, J.-J.; Debart, F.; Aravind, L.; et al. Identification of the m6Am Methyltransferase PCIF1 Reveals the Location and Functions of m6Am in the Transcriptome. Mol. Cell 2019, 75, 631–643.e8. [Google Scholar] [CrossRef]
  11. Wei, J.; Liu, F.; Lu, Z.; Fei, Q.; Ai, Y.; He, P.C.; Shi, H.; Cui, X.; Su, R.; Klungland, A.; et al. Differential m6A, m6Am, and m1A Demethylation Mediated by FTO in the Cell Nucleus and Cytoplasm. Mol. Cell 2018, 71, 973–985.e5. [Google Scholar] [CrossRef]
  12. Hawley, B.R.; Jaffrey, S.R. Transcriptome-Wide Mapping of m6A and m6Am at Single-Nucleotide Resolution Using miCLIP. Curr. Protoc. Mol. Biol. 2019, 126, e88. [Google Scholar] [CrossRef]
  13. Koh, C.W.Q.; Goh, Y.T.; Goh, W.S.S. Atlas of quantitative single-base-resolution N6-methyl-adenine methylomes. Nat. Commun. 2019, 10, 5636. [Google Scholar] [CrossRef]
  14. Sun, H.; Li, K.; Zhang, X.; Liu, J.; Zhang, M.; Meng, H.; Yi, C. m6Am-seq reveals the dynamic m6Am methylation in the human transcriptome. Nat. Commun. 2021, 12, 4778. [Google Scholar] [CrossRef]
  15. Jiang, J.; Song, B.; Chen, K.; Lu, Z.; Rong, R.; Zhong, Y.; Meng, J. m6AmPred: Identifying RNA N6, 2′-O-dimethyladenosine (m6Am) sites based on sequence-derived information. Methods 2021, 203, 328–334. [Google Scholar] [CrossRef]
  16. Linder, B.; Grozhik, A.V.; Olarerin-George, A.O.; Meydan, C.; Mason, C.E.; Jaffrey, S.R. Single-nucleotide-resolution mapping of m6A and m6Am throughout the transcriptome. Nat. Methods 2015, 12, 767–772. [Google Scholar] [CrossRef]
  17. Song, Z.; Huang, D.; Song, B.; Chen, K.; Song, Y.; Liu, G.; Su, J.; de Magalhães, J.P.; Rigden, D.J.; Meng, J. Attention-based multi-label neural networks for integrated prediction and interpretation of twelve widely occurring RNA modifications. Nat. Commun. 2021, 12, 4011. [Google Scholar] [CrossRef]
  18. Liu, J.; Li, K.; Cai, J.; Zhang, M.; Zhang, X.; Xiong, X.; Meng, H.; Xu, X.; Huang, Z.; Peng, J.; et al. Landscape and Regulation of m6A and m6Am Methylome across Human and Mouse Tissues. Mol. Cell 2019, 77, 426–440.e6. [Google Scholar] [CrossRef]
  19. Li, W.; Godzik, A. Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 2006, 22, 1658–1659. [Google Scholar] [CrossRef]
  20. Zhang, Y.; Hamada, M. DeepM6ASeq: Prediction and characterization of m6A-containing sequences using deep learning. BMC Bioinform. 2018, 19, 524. [Google Scholar] [CrossRef]
  21. Xiong, Y.; He, X.; Zhao, D.; Tian, T.; Hong, L.; Jiang, T.; Zeng, J. Modeling multi-species RNA modification through multi-task curriculum learning. Nucleic Acids Res. 2021, 49, 3719–3734. [Google Scholar] [CrossRef] [PubMed]
  22. DLm6Am: A Deep-Learning-Based Tool for Identifying N6,2′-O-Dimethyladenosine Sites in RNA Sequences. Available online: http://47.94.248.117/DLm6Am/ (accessed on 15 September 2022).
  23. Glorot, X.; Bordes, A.; Bengio, Y. Deep Sparse Rectifier Neural Networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 11–13 April 2011; pp. 315–323. [Google Scholar]
  24. Williams, C.; Barber, D. Bayesian classification with Gaussian processes. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 1342–1351. [Google Scholar] [CrossRef]
  25. Chen, W.; Lei, T.-Y.; Jin, D.-C.; Lin, H.; Chou, K.-C. PseKNC: A flexible web server for generating pseudo K-tuple nucleotide composition. Anal. Biochem. 2014, 456, 53–60. [Google Scholar] [CrossRef] [PubMed]
  26. Chen, Z.; Zhao, P.; Li, F.; Marquez-Lago, T.T.; Leier, A.; Revote, J.; Zhu, Y.; Powell, D.R.; Akutsu, T.; Webb, G.I.; et al. iLearn: An integrated platform and meta-learner for feature engineering, machine-learning analysis and modeling of DNA, RNA and protein sequence data. Brief. Bioinform. 2019, 21, 1047–1057. [Google Scholar] [CrossRef]
  27. Chen, Z.; Zhao, P.; Li, C.; Li, F.; Xiang, D.; Chen, Y.Z.; Akutsu, T.; Daly, R.J.; Webb, G.I.; Zhao, Q.; et al. iLearnPlus: A comprehensive and automated machine-learning platform for nucleic acid and protein sequence analysis, prediction and visualization. Nucleic Acids Res. 2021, 49, e60. [Google Scholar] [CrossRef]
  28. Chen, W.; Tran, H.; Liang, Z.; Lin, H.; Zhang, L. Identification and analysis of the N6-methyladenosine in the Saccharomyces cerevisiae transcriptome. Sci. Rep. 2015, 5, 13859. [Google Scholar] [CrossRef]
  29. Feng, P.; Ding, H.; Yang, H.; Chen, W.; Lin, H.; Chou, K.C. iRNA-PseColl: Identifying the Occurrence Sites of Different RNA Modifications by Incorporating Collective Effects of Nucleotides into PseKNC. Mol. Ther. Nucleic Acids 2017, 7, 155–163. [Google Scholar] [CrossRef]
  30. Chen, W.; Tang, H.; Lin, H. MethyRNA: A web server for identification of N6-methyladenosine sites. J. Biomol. Struct. Dyn. 2017, 35, 683–687. [Google Scholar] [CrossRef]
  31. Chen, W.; Feng, P.; Yang, H.; Ding, H.; Lin, H.; Chou, K.-C. iRNA-3typeA: Identifying Three Types of Modification at RNA’s Adenosine Sites. Mol. Ther. Nucleic Acids 2018, 11, 468–474. [Google Scholar] [CrossRef]
  32. Chen, W.; Feng, P.; Ding, H.; Lin, H. Identifying N6-methyladenosine sites in the Arabidopsis thaliana transcriptome. Mol. Genet. Genom. 2016, 291, 2225–2229. [Google Scholar] [CrossRef]
  33. Dao, F.-Y.; Lv, H.; Yang, Y.-H.; Zulfiqar, H.; Gao, H.; Lin, H. Computational identification of N6-methyladenosine sites in multiple tissues of mammals. Comput. Struct. Biotechnol. J. 2020, 18, 1084–1091. [Google Scholar] [CrossRef]
  34. Khan, A.; Rehman, H.U.; Habib, U.; Ijaz, U. Detecting N6-methyladenosine sites from RNA transcriptomes using random forest. J. Comput. Sci. 2020, 47, 101238. [Google Scholar] [CrossRef]
  35. Islam, N.; Park, J. bCNN-Methylpred: Feature-Based Prediction of RNA Sequence Modification Using Branch Convolutional Neural Network. Genes 2021, 12, 1155. [Google Scholar] [CrossRef]
  36. Chen, K.; Wei, Z.; Zhang, Q.; Wu, X.; Rong, R.; Lu, Z.; Su, J.; de Magalhães, J.P.; Rigden, D.J.; Meng, J. WHISTLE: A high-accuracy map of the human N-6-methyladenosine (m6A) epitranscriptome predicted using a machine learning approach. Nucleic Acids Res. 2019, 47, e41. [Google Scholar] [CrossRef]
  37. Georgi, B.; Schliep, A. Context-specific independence mixture modeling for positional weight matrices. Bioinformatics 2006, 22, e166–e173. [Google Scholar] [CrossRef]
  38. Xing, Y.; Zhao, X.; Cai, L. Prediction of nucleosome occupancy in Saccharomyces cerevisiae using position-correlation scoring function. Genomics 2011, 98, 359–366. [Google Scholar] [CrossRef]
  39. Rangannan, V.; Bansal, M. Relative stability of DNA as a generic criterion for promoter prediction: Whole genome annotation of microbial genomes with varying nucleotide base composition. Mol. BioSyst. 2009, 5, 1758–1769. [Google Scholar] [CrossRef]
  40. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar]
  41. Saunders, C.; Stitson, M.O.; Weston, J.; Holloway, R.; Bottou, L.; Scholkopf, B.; Smola, A. Support Vector Machine. Comput. Sci. 2002, 1, 1–28. [Google Scholar]
  42. Dobson, R.J.; Munroe, P.B.; Caulfield, M.J.; Saqi, M.A. Predicting deleterious nsSNPs: An analysis of sequence and structural attributes. BMC Bioinform. 2006, 7, 217. [Google Scholar] [CrossRef]
  43. Kruitbosch, H.T.; Mzayek, Y.; Omlor, S.; Guerra, P.; Milias-Argeitis, A. A convolutional neural network for segmentation of yeast cells without manual training annotations. Bioinformatics 2021, 38, 1427–1433. [Google Scholar] [CrossRef]
  44. Sun, S.; Wu, Q.; Peng, Z.; Yang, J. Enhanced prediction of RNA solvent accessibility with long short-term memory neural networks and improved sequence profiles. Bioinformatics 2018, 35, 1686–1691. [Google Scholar] [CrossRef]
  45. Di Lena, P.; Nagata, K.; Baldi, P. Deep architectures for protein contact map prediction. Bioinformatics 2012, 28, 2449–2457. [Google Scholar] [CrossRef]
  46. Kuksa, P.P.; Min, M.R.; Dugar, R.; Gerstein, M. High-order neural networks and kernel methods for peptide-MHC binding prediction. Bioinformatics 2015, 31, 3600–3607. [Google Scholar] [CrossRef]
  47. Angermueller, C.; Lee, H.; Reik, W.; Stegle, O. DeepCpG: Accurate prediction of single-cell DNA methylation states using deep learning. Genome Biol. 2017, 18, 67. [Google Scholar] [CrossRef]
  48. Lei, Y.; Li, S.; Liu, Z.; Wan, F.; Tian, T.; Li, S.; Zhao, D.; Zeng, J. A deep-learning framework for multi-level peptide–protein interaction prediction. Nat. Commun. 2021, 12, 5465. [Google Scholar] [CrossRef]
  49. Xie, R.; Li, J.; Wang, J.; Dai, W.; Leier, A.; Marquez-Lago, T.T.; Akutsu, T.; Lithgow, T.; Song, J.; Zhang, Y. DeepVF: A deep learning-based hybrid framework for identifying virulence factors using the stacking strategy. Brief. Bioinform. 2021, 22, bbaa125. [Google Scholar] [CrossRef]
Figure 1. Chemical structures of m6Am [18].
Figure 2. The workflow and architecture of DLm6Am. DLm6Am identifies m6Am sites from RNA sequences through several key steps, including feature extraction, model construction, and m6Am site prediction. DLm6Am integrates three CNN-BiLSTM-attention models into an ensemble deep learning predictor, in which each base classifier includes a multi-head attention module, an embedding module, and a prediction module.
Figure 3. Performance analysis of different m6Am prediction models using five-fold cross-validation on training data. Subgraphs (a–d) show boxplots of ACC, MCC, AUROC, and AUPR of the different models, respectively. The significance levels (NS, *, **, ***) represent non-significant (p > 0.05), low (p < 0.05), medium (p < 0.01), and high (p < 0.001) significance, respectively.
Figure 4. Performance analysis of different m6Am prediction models using independent testing data. (a) Receiver operating characteristic (ROC) curves of different models. (b) Precision–recall curves of different models.
Figure 5. Performance comparison with existing methods. (ac) Performance comparison between MultiRM, m6AmPred, and our method DLm6Am using five-fold cross-validation on training data. (df) Performance comparison between MultiRM, m6AmPred, and our method DLm6Am using independent testing data.
Table 1. Comparison of performance between different model architectures using five-fold cross-validation on training data.
| Model | Sn ± SD (%) | Sp ± SD (%) | ACC ± SD (%) | MCC ± SD | AUROC ± SD | AUPR ± SD |
|---|---|---|---|---|---|---|
| RF | 74.74 ± 0.51 | 75.89 ± 0.54 | 75.31 ± 0.39 | 0.5063 ± 0.0078 | 0.8273 ± 0.0012 | 0.8163 ± 0.0021 |
| SVM | 77.06 ± 0.42 | 73.46 ± 0.72 | 75.26 ± 0.39 | 0.5055 ± 0.0078 | 0.8323 ± 0.0027 | 0.8288 ± 0.0029 |
| XGBoost | 75.42 ± 0.62 | 75.54 ± 0.64 | 75.48 ± 0.55 | 0.5096 ± 0.0111 | 0.8325 ± 0.0031 | 0.8362 ± 0.0031 |
| CNN | 75.95 ± 3.21 | 76.71 ± 2.73 | 76.33 ± 0.50 | 0.5275 ± 0.0097 | 0.8473 ± 0.0043 | 0.8485 ± 0.0047 |
| BiLSTM | 76.17 ± 2.32 | 75.14 ± 2.06 | 75.65 ± 0.67 | 0.5134 ± 0.0134 | 0.8381 ± 0.0047 | 0.8392 ± 0.0053 |
| CNN-BiLSTM | 78.33 ± 1.08 | 76.55 ± 1.11 | 77.45 ± 0.29 | 0.5491 ± 0.0058 | 0.8462 ± 0.0034 | 0.8432 ± 0.0045 |
| CNN-BiLSTM-attention | 78.60 ± 0.81 | 77.82 ± 0.78 | 78.21 ± 0.29 | 0.5643 ± 0.0058 | 0.8488 ± 0.0031 | 0.8464 ± 0.0053 |
| DLm6Am | 78.94 ± 0.79 | 78.18 ± 0.60 | 78.56 ± 0.27 | 0.5713 ± 0.0054 | 0.8509 ± 0.0027 | 0.8508 ± 0.0033 |
Table 2. Comparison of performance between different model architectures using independent testing data.
| Model | Sn (%) | Sp (%) | ACC (%) | MCC | AUROC |
|---|---|---|---|---|---|
| RF | 75.77 | 76.90 | 76.34 | 0.5268 | 0.8380 |
| SVM | 75.49 | 77.18 | 76.34 | 0.5268 | 0.8438 |
| XGBoost | 74.37 | 76.06 | 75.21 | 0.5043 | 0.8435 |
| CNN | 69.58 | 86.20 | 77.89 | 0.5656 | 0.8580 |
| BiLSTM | 72.96 | 78.03 | 75.49 | 0.5105 | 0.8341 |
| CNN-BiLSTM | 79.89 | 77.21 | 78.55 | 0.5712 | 0.8603 |
| CNN-BiLSTM-attention | 77.21 | 80.45 | 78.84 | 0.5769 | 0.8612 |
| DLm6Am | 81.71 | 77.40 | 79.55 | 0.5916 | 0.8634 |
Table 3. The performance of our method using hold-out cross-validation at the chromosome level.
| Chromosome | Sn (%) | Sp (%) | ACC (%) | MCC | AUROC |
|---|---|---|---|---|---|
| Chr1 | 76.96 | 80.20 | 78.61 | 0.5721 | 0.8490 |
| Chr2 | 83.48 | 76.52 | 80.00 | 0.6015 | 0.8749 |
| Chr3 | 77.00 | 79.63 | 78.37 | 0.5665 | 0.8575 |
| Chr4 | 81.25 | 78.46 | 79.84 | 0.5973 | 0.8663 |
| Chr5 | 91.25 | 81.94 | 86.84 | 0.7373 | 0.9255 |
| Chr6 | 77.66 | 76.24 | 76.92 | 0.5386 | 0.8332 |
| Chr7 | 84.78 | 77.42 | 81.08 | 0.6236 | 0.8789 |
| Chr8 | 84.62 | 80.70 | 82.57 | 0.6525 | 0.8691 |
| Chr9 | 81.16 | 80.30 | 80.74 | 0.6146 | 0.8678 |
| Chr10 | 87.50 | 77.46 | 82.52 | 0.6532 | 0.8533 |
| Chr11 | 85.19 | 75.53 | 80.69 | 0.6116 | 0.8696 |
| Chr12 | 76.34 | 82.18 | 79.38 | 0.5867 | 0.8549 |
| Chr14 | 83.33 | 77.11 | 80.00 | 0.6030 | 0.8531 |
| Chr15 | 83.10 | 86.36 | 84.67 | 0.6942 | 0.9078 |
| Chr16 | 79.69 | 83.61 | 81.60 | 0.6329 | 0.8768 |
| Chr17 | 82.57 | 77.57 | 80.09 | 0.6023 | 0.8298 |
| Chr19 | 82.41 | 77.78 | 80.19 | 0.6029 | 0.8585 |
| ChrX | 75.82 | 90.28 | 82.21 | 0.6580 | 0.9087 |
| Chr13, 18, 20, 21, and 22 | 87.38 | 72.36 | 79.20 | 0.5979 | 0.8773 |
Table 4. The distribution of sample numbers in different chromosomes.
| Chromosome | Positive | Negative | Chromosome | Positive | Negative |
|---|---|---|---|---|---|
| Chr1 | 191 | 197 | Chr13 | 32 | 39 |
| Chr2 | 115 | 115 | Chr14 | 72 | 83 |
| Chr3 | 100 | 108 | Chr15 | 71 | 66 |
| Chr4 | 64 | 65 | Chr16 | 64 | 61 |
| Chr5 | 80 | 72 | Chr17 | 109 | 107 |
| Chr6 | 94 | 101 | Chr18 | 16 | 18 |
| Chr7 | 92 | 93 | Chr19 | 108 | 99 |
| Chr8 | 52 | 57 | Chr20 | 37 | 44 |
| Chr9 | 69 | 66 | Chr21 | 18 | 22 |
| Chr10 | 72 | 71 | Chr22 | 26 | 23 |
| Chr11 | 108 | 94 | ChrX | 91 | 72 |
| Chr12 | 93 | 101 | Total | 1774 | 1774 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Luo, Z.; Su, W.; Lou, L.; Qiu, W.; Xiao, X.; Xu, Z. DLm6Am: A Deep-Learning-Based Tool for Identifying N6,2′-O-Dimethyladenosine Sites in RNA Sequences. Int. J. Mol. Sci. 2022, 23, 11026. https://doi.org/10.3390/ijms231911026


