Article

Predicting Salmonella MIC and Deciphering Genomic Determinants of Antibiotic Resistance and Susceptibility

1 Department of Comparative Biomedical Sciences, College of Veterinary Medicine, Mississippi State University, Starkville, MS 39762, USA
2 Department of Population Medicine, College of Veterinary Medicine, Mississippi State University, Starkville, MS 39762, USA
3 Department of Computer Science and Engineering, Mississippi State University, Starkville, MS 39762, USA
* Author to whom correspondence should be addressed.
Microorganisms 2024, 12(1), 134; https://doi.org/10.3390/microorganisms12010134
Submission received: 29 November 2023 / Revised: 4 January 2024 / Accepted: 8 January 2024 / Published: 10 January 2024
(This article belongs to the Special Issue Antimicrobial Resistance in the Food Chain)

Abstract

Salmonella spp., a leading cause of foodborne illness, are a formidable global threat because of escalating antimicrobial resistance (AMR). Evaluating the minimum inhibitory concentration (MIC) of antimicrobials is critical for characterizing AMR. Current whole genome sequencing (WGS)-based approaches for predicting MIC are limited by computational cost and by the difficulty of identifying informative features. We propose a methodology, the “Genome Feature Extractor Pipeline”, that integrates traditional machine learning (random forest, RF) with a deep learning model (multilayer perceptron, MLP), interpreted with DeepLift, for WGS-based MIC prediction. We used a dataset from the National Antimicrobial Resistance Monitoring System (NARMS) comprising 4500 assembled genomes of nontyphoidal Salmonella, each annotated with MIC metadata for 15 antibiotics. Our pipeline batch-downloads the annotated genomes, determines feature importance using RF, selects crucial 10-mers based on the Gini index, and expands them to 20-mers. An MLP network with four hidden layers of 1024 neurons each then predicts MIC values, and DeepLift identifies the key 20-mers and associated genes influencing MIC. The 10 most significant 20-mers for each antibiotic are listed, showing that the pipeline can discern genomic features affecting Salmonella MIC prediction with enhanced precision. The methodology replaces binary presence–absence indicators with k-mer counts, offering a more nuanced analysis. The combination of RF and MLP addresses limitations of existing WGS-based approaches, providing a robust and efficient method for predicting MIC values in Salmonella that could potentially be applied to other pathogens.

1. Introduction

Salmonella spp., one of the leading causes of foodborne illness worldwide, can contaminate a wide range of food products, including meat, poultry, eggs, dairy, fruits, and vegetables. Consumption of contaminated food can cause salmonellosis, typically presenting as gastroenteritis with symptoms such as nausea, diarrhea, abdominal pain, and fever. According to the Centers for Disease Control and Prevention (CDC), Salmonella causes an estimated 150 million illnesses and 60,000 deaths globally each year [1]. In the United States alone, salmonellosis accounts for more than one million illnesses and approximately 420 deaths each year [2]. Although many foodborne infections resolve on their own and severe cases can be treated with antibiotics, the emergence of AMR poses a significant challenge to effective therapy.
Comprehensive surveillance has been in place through the National Antimicrobial Resistance Monitoring System (NARMS, [3]), a collaboration among the United States Department of Agriculture (USDA), the CDC, and the Food and Drug Administration (FDA) since 2002. NARMS tracks the prevalence of foodborne pathogens, including Salmonella, Campylobacter, Escherichia, and Enterococcus, and assesses their susceptibility to 40 antibiotics (15 each for Salmonella and Escherichia, 9 for Campylobacter, and 16 for Enterococcus), yielding a wealth of MIC information for these pathogens. These MIC data are critical for choosing treatments that effectively inhibit the growth of these pathogens. In addition to determining MICs, NARMS has been expanding its data collection by gathering WGS data from randomly selected isolates of foodborne pathogens.
The NARMS program has delivered timely insights into trends in antibiotic resistance [4,5]. However, AMR in Salmonella has increased steadily since 2015, especially in chickens (ceca sampled at slaughter, carcasses and parts sampled during processing and inspection, and retail chicken sold to the public) [3]. The reports further pinpoint recent resistance trends, particularly for ciprofloxacin, one of the first-line antibiotics for treating Salmonella infections [6,7,8], and for other antibiotics such as chloramphenicol, trimethoprim-sulfamethoxazole, sulfisoxazole, nalidixic acid, streptomycin, and tetracycline. Determining the MIC quickly, with minimal laboratory testing and while accounting for the genetic diversity among pathogenic strains, is essential for customized, timely, and effective treatment.
MIC values for bacterial strain and antibiotic pairs are traditionally determined using the agar or broth dilution methods described by the Clinical and Laboratory Standards Institute [9,10,11,12]. However, these methods are time-consuming, which hinders the prompt treatment of serious infections [10], and they involve substantial hands-on labor, such as plate preparation and serial dilutions, which increases the risk of errors and operator-dependent variability. Publicly available WGS data paired with clinical AMR metadata have enabled the use of machine learning (ML) to predict MIC values and track temporal trends, rather than relying solely on AMR databases. Using short nucleotide sequences as features (referred to interchangeably as k-mers or genomic features henceforth, where k denotes the sequence length) and laboratory-derived MIC values as labels, precise predictions of susceptibility or resistance can be made even without prior genetic information about the organisms [13,14,15,16,17]. The WGS data from NARMS have been used with XGBoost to predict the MIC values of 15 commonly monitored antibiotics for Salmonella [18,19], with an average accuracy of 95% within ±1 twofold dilution step of the laboratory-determined values. However, owing to computational limitations, that study identified the k-mers crucial to MIC prediction using only a subset of the samples. Random forest and neural network models were used in parallel to predict susceptibility or resistance in Mycobacterium tuberculosis, Escherichia coli, Salmonella enterica, and Staphylococcus aureus [20], while AdaBoost was used to predict resistance to carbapenem, methicillin, and beta-lactams in Acinetobacter baumannii, Staphylococcus aureus, and Streptococcus pneumoniae, respectively [21], using data from the PATRIC database [22].
Similarly, random forest, support vector machine, and XGBoost models were used to predict cefoxitin resistance in S. aureus [23], and logistic regression was used to predict resistance to ethambutol, ethionamide, isoniazid, kanamycin, ofloxacin, rifampicin, and streptomycin in M. tuberculosis and S. aureus [24]. Although existing WGS-based approaches to MIC prediction have made significant strides, they still face limitations. One is the substantial memory required to process WGS data and make MIC predictions. For example, Nguyen et al. successfully predicted MIC using 10-mers across 4500 genomes, but when identifying crucial k-mers through BLAST searches, they were limited to 15-mers and 1000 genomes [19]. Another is that current approaches cannot readily distinguish whether the identified k-mers are associated with low or high MIC values. These challenges highlight the need for more efficient and precise methods to address these shortcomings and improve our understanding of MIC prediction.
In this study, we developed the “Genome Feature Extractor Pipeline” to address the challenges of using 10-mers for MIC prediction. The pipeline reduces approximately one million possible 10-mers to a more manageable set of a few tens of thousands of 20-mers that capture the genomic regions influencing MIC values. It distinguishes the contributions of essential 20-mers and genes to susceptibility and resistance in a dataset of 4500 genomes. Many of the predictive k-mers and genes align with known resistance mechanisms. Additionally, our model reveals potential antibiotic resistance-related genes, although these require experimental validation.

2. Materials and Methods

2.1. Data Curation and Analysis Pipeline

The NARMS dataset used in this study comprises 4500 assembled and annotated genomes of nontyphoidal Salmonella (used to derive features) together with their associated MIC metadata (labels). We identified the genomic features most predictive of MIC for the 15 antibiotics monitored for Salmonella, listed in Table 1 [3]. We used the frequencies (numbers of occurrences) of a specific subset of 20-mers to predict MICs. With 4²⁰ (≈10¹²) possible 20-mers and Salmonella genomes averaging 5 × 10⁶ base pairs, counting the occurrences of every possible 20-mer is computationally prohibitive. We therefore identified the subset of k-mers using the four-step genome feature extraction pipeline depicted in Figure 1. In the first step, we batch-downloaded the annotated genomes from the Bacterial and Viral Bioinformatics Resource Center [22]. In the second step, we selected a set of “important” 10-mers. The total number of possible 10-mers is only of the order of a million (4¹⁰ = (2¹⁰)² ≈ (10³)²), and for each genome sequence we extracted the 10-mer counts, yielding a count matrix of size 4500 × 10⁶. Counts of k-mer occurrences are easy to calculate and have shown promise in MIC prediction [25]; other features, such as the individual or joint (co-occurrence) positional behavior of k-mers, are slightly more computationally intensive but may provide further biological insights.
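The counting in the second step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the pipeline's actual code: each assembled genome is assumed to be a list of contig strings, k-mers are counted within each contig so that none span a contig boundary, windows containing ambiguous bases such as N are skipped, and reverse complements are not merged. The helper names `kmer_counts`, `kmer_to_column`, and `count_row` are hypothetical. The 2-bit encoding maps each 10-mer to one of the 4¹⁰ columns of the genome-by-10-mer count matrix, so each genome becomes a sparse row of that matrix.

```python
from collections import Counter
from typing import Dict, Iterable

# 2-bit code per nucleotide; any other character (e.g. N) is treated as invalid.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_counts(contigs: Iterable[str], k: int = 10) -> Counter:
    """Count overlapping k-mers contig by contig (hypothetical helper)."""
    counts: Counter = Counter()
    for contig in contigs:
        seq = contig.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if all(base in BASE_CODE for base in kmer):  # skip windows with N etc.
                counts[kmer] += 1
    return counts


def kmer_to_column(kmer: str) -> int:
    """Map a k-mer to a column index in [0, 4**k) using 2 bits per base."""
    index = 0
    for base in kmer:
        index = index * 4 + BASE_CODE[base]
    return index


def count_row(contigs: Iterable[str], k: int = 10) -> Dict[int, int]:
    """Sparse row of the genome-by-k-mer count matrix: {column index: count}."""
    return {kmer_to_column(kmer): n for kmer, n in kmer_counts(contigs, k).items()}
```

For example, `count_row(["A" * 12])` returns `{0: 3}`, because a 12-base run of A contains three overlapping copies of the all-A 10-mer, whose column index is 0. Storing rows sparsely matters at this scale: a dense 4500 × 4¹⁰ matrix of 32-bit counts would occupy roughly 19 GB.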
The RF (see Section 2.2) is a collection of decision trees that learns both individual and joint feature interactions; it is nonparametric and computationally efficient. With a limited sample size (4500 genomes) and a high-dimensional feature space (≈10⁶ 10-mers), this prediction problem favors an RF over a deeper neural network. Feature selection or dimensionality reduction approaches explicitly calculate the subset of input features that best describes (or estimates) the target variable (MIC). We computed each feature’s contribution, called its “feature importance”, based on the Gini index [26]. Figure 2 shows an example plot of the feature importance values of 10-mers for ampicillin (AMP). The inflection point, or “elbow point”, is where the gradient of the sorted importance values shifts from large negative values to smaller ones, indicating saturation of the representation. We selected the 10-mers whose importance values lie above the elbow point, as depicted for ampicillin in Figure 2, and repeated this selection for all 15 antibiotics. Expanding these 10-mers to 20-mers, though not exhaustive, is computationally efficient and sufficient for achieving good prediction accuracy, which supports the approach.
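The elbow criterion above is described only qualitatively. One common, parameter-free way to locate such a point, shown here as a swapped-in heuristic rather than the authors' rule, is the maximum-distance-to-chord method: sort the importance values in descending order and pick the point farthest from the straight line joining the first and last values. The importances themselves could come, for example, from the impurity-based `feature_importances_` attribute of a fitted scikit-learn forest. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def elbow_index(values: Sequence[float]) -> int:
    """Position of the elbow in the descending-sorted curve of `values`.

    Uses the maximum-distance-to-chord heuristic: the elbow is the point
    farthest from the straight line joining the first and last points.
    """
    y = sorted(values, reverse=True)
    n = len(y)
    if n < 3:
        return n - 1
    x0, y0, x1, y1 = 0.0, y[0], float(n - 1), y[-1]
    dx, dy = x1 - x0, y1 - y0
    norm = math.hypot(dx, dy)
    best_i, best_d = 0, -1.0
    for i, yi in enumerate(y):
        # Perpendicular distance from (i, yi) to the chord through (x0, y0) and (x1, y1).
        d = abs(dy * i - dx * yi + x1 * y0 - y1 * x0) / norm
        if d > best_d:
            best_i, best_d = i, d
    return best_i


def select_above_elbow(importances: Sequence[float]) -> List[int]:
    """Original feature indices ranked strictly above the elbow (hypothetical helper)."""
    order = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
    return order[:elbow_index(importances)]
```

On the toy importances `[0.01, 0.5, 0.005, 0.1, 0.3, 0.02, 0.008, 0.015]`, the elbow falls at the fourth-largest value (0.02), so `select_above_elbow` keeps the feature indices `[1, 4, 3]`.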
Short k-mers, such as 10-mers, are poorly suited to BLAST searches for identifying antibiotic resistance genes: in a random sequence of 5 × 10⁶ base pairs, a given 10-mer is expected to occur about 5 × 10⁶/4¹⁰ ≈ 5 times by chance, so a 10-mer alone often fails to pinpoint a single locus. We therefore increased the k-mer length to identify antibiotic-responsive genes more accurately. In the third step of the pipeline, we extended each 10-mer equally on either side to obtain a 20-mer, as shown in Figure 3. We also tested all 11 possible placements of the 10-mer within the 20-mer, but the differences in MIC prediction accuracy relative to the centered placement were minimal. This extension considerably improved the specificity and sensitivity of our approach. Expanding the 1352 ‘important’ 10-mers in this way yielded 27,932 unique 20-mers from the database, far fewer than all possible 20-mers (≈10⁹ in our case). This approach provides a more comprehensive representation of potential antibiotic resistance gene sequences, addressing the limitations of shorter k-mers for this purpose.
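The expansion in the third step can be sketched as follows, assuming the 20-mers are harvested by scanning each genome for exact occurrences of an important 10-mer and cutting out the surrounding window. The function name and its `left` parameter are hypothetical: `left` is the number of flanking bases kept to the left of the 10-mer, the default of 5 gives the centered placement, and the values 0 through 10 correspond to the 11 possible placements. Skipping occurrences too close to a contig end to yield a full 20-mer, and scanning only the forward strand, are further assumptions.

```python
from typing import Iterable, Optional, Set


def expand_kmer(
    contigs: Iterable[str],
    core: str,
    k_out: int = 20,
    left: Optional[int] = None,
) -> Set[str]:
    """Collect every k_out-mer containing `core` with `left` bases on its left."""
    core = core.upper()
    pad = k_out - len(core)          # total flanking bases to add (10 for 10 -> 20)
    if left is None:
        left = pad // 2              # centered placement
    if not 0 <= left <= pad:
        raise ValueError("left must lie in [0, k_out - len(core)]")
    found: Set[str] = set()
    for contig in contigs:
        seq = contig.upper()
        start = seq.find(core)
        while start != -1:
            lo = start - left
            hi = lo + k_out
            if lo >= 0 and hi <= len(seq):   # skip hits too close to a contig end
                found.add(seq[lo:hi])
            start = seq.find(core, start + 1)  # also catches overlapping hits
    return found
```

Applied to every important 10-mer across all genomes, the union of the returned sets would form the candidate pool of unique 20-mers.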
In the last (fourth) step, we repeated the feature-importance-based reduction of the second step, but with an MLP regressor in place of the RF (see Section 2.2): the MLP was trained on the expanded 20-mer counts with the MIC values as target labels. After training, we used DeepLift [27] to extract the features (specific 20-mers and the genes they align to; see Section 3.2 for individual results) that contribute positively or negatively to the MIC. We list only the 10 most significant 20-mers (genes) for each antibiotic. In this way, we integrated traditional machine learning (RF) and a multilayer perceptron (MLP) sequentially to select and refine the genomic features that affect MIC prediction for Salmonella in this dataset. Our algorithm uses k-mer counts rather than the binary presence–absence indicators used in earlier work [19].
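Once per-sample attribution scores are available (for example from the `DeepLift` class in PyTorch's Captum library, whose `attribute` method returns one score per input feature), the top-10 lists can be assembled as in the sketch below. Averaging the attributions over samples, and treating a positive mean as "increases MIC" and a negative mean as "decreases MIC", are assumptions about how the scores were summarized; `rank_attributions` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def rank_attributions(
    attributions: Sequence[Sequence[float]],
    feature_names: Sequence[str],
    top: int = 10,
) -> Tuple[Ranked, Ranked]:
    """Return the `top` strongest MIC-increasing and MIC-decreasing features.

    `attributions` is a samples x features matrix of attribution scores.
    """
    n_samples = len(attributions)
    # Mean attribution of each feature across samples.
    mean_attr = [
        sum(row[j] for row in attributions) / n_samples
        for j in range(len(feature_names))
    ]
    scored = list(zip(feature_names, mean_attr))
    increasing = sorted((s for s in scored if s[1] > 0), key=lambda s: s[1], reverse=True)
    decreasing = sorted((s for s in scored if s[1] < 0), key=lambda s: s[1])
    return increasing[:top], decreasing[:top]
```

Features with a mean attribution of exactly zero fall into neither list.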

2.2. RF and MLP

Random forest [28,29] is a powerful ensemble algorithm: a collection of individual decision trees that together perform the classification or regression task. Decision-tree-based algorithms are sensitive to the training data [30] and have low bias but high variance [31]. A random forest addresses this sensitivity by constructing multiple decision trees from random samples drawn from the dataset, often with replacement. Predictions are then derived by majority vote across the trees for classification, or by averaging for regression. In our analysis, we used an RF of 100 trees implemented with the Scikit-learn library (v1.2.2).
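A minimal scikit-learn sketch of this configuration (100 trees, 80/20 train–test split) is shown below on synthetic data; the data, target, and random seeds are hypothetical stand-ins for the NARMS k-mer counts and MICs. `RandomForestRegressor` grows each tree on a bootstrap sample by default and averages the trees' outputs. Its `feature_importances_` are impurity-based and, for regression, use the squared-error criterion; the Gini criterion applies to `RandomForestClassifier`, so a Gini-based ranking would require treating MIC values as classes.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hypothetical data: 200 "genomes" x 50 "k-mer count" features (Poisson counts).
rng = np.random.default_rng(0)
X = rng.poisson(lam=2.0, size=(200, 50)).astype(float)
# Hypothetical MIC-like target: a power of two driven by features 0 and 1 only.
y = 2.0 ** np.clip(X[:, 0] - X[:, 1], -3, 3)

# 80/20 train-test split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 100 trees; each tree is fit on a bootstrap sample (bootstrap=True is the default),
# and the forest's regression output is the average of the trees' outputs.
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

test_r2 = rf.score(X_test, y_test)       # coefficient of determination on held-out data
importances = rf.feature_importances_    # impurity-based, normalized to sum to 1
top_features = np.argsort(importances)[::-1][:5]
```

On this toy target, the two informative features should dominate the importance ranking, which is the behavior the elbow-based selection relies on.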
The MLP network had 4 hidden layers of 1024 neurons each. Because the network regresses the MIC value, a positive variable with gaps in its range, (1) the output layer had a single neuron with a linear activation function, while the other layers used ReLU activations, and (2) the loss function was the mean squared error (MSE), minimized with the Adam optimizer. For regularization, we used both batch normalization (BN) and dropout (DR). BN normalizes each unit's activations using the mean and standard deviation computed over the current batch, while DR randomly deactivates a fraction (0.3) of the units during training. BN and DR help control scaling and overfitting, respectively. For both the RF and the neural network, we used an 80/20 train–test split and estimated the test accuracy.
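The architecture can be made concrete with a NumPy forward pass, shown here as a sketch rather than the authors' implementation. The layer order within each hidden layer (Dense, BN, ReLU, dropout), the He-style weight initialization, and the input width are assumptions not given in the text. Training (minimizing the MSE with Adam) would in practice be done with a deep learning framework and is omitted. BN here always uses the current batch's statistics; a full implementation would switch to running averages at inference.

```python
import numpy as np

HIDDEN_SIZES = [1024, 1024, 1024, 1024]  # four hidden layers of 1024 neurons
DROPOUT_RATE = 0.3                        # fraction of units deactivated in training


def init_params(n_inputs, rng):
    """Random weights for the hidden stack and the single-neuron output layer."""
    hidden = []
    sizes = [n_inputs] + HIDDEN_SIZES
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        hidden.append({
            "W": rng.normal(0.0, np.sqrt(2.0 / fan_in), (fan_in, fan_out)),  # He init (assumed)
            "b": np.zeros(fan_out),
            "gamma": np.ones(fan_out),   # BN scale
            "beta": np.zeros(fan_out),   # BN shift
        })
    output = {"W": rng.normal(0.0, np.sqrt(1.0 / sizes[-1]), (sizes[-1], 1)), "b": np.zeros(1)}
    return hidden, output


def forward(x, hidden, output, rng, train=True, eps=1e-5):
    """Dense -> BN -> ReLU -> dropout for each hidden layer, then a linear output."""
    h = x
    for layer in hidden:
        z = h @ layer["W"] + layer["b"]
        # Batch normalization with the current batch's mean and variance.
        z = (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)
        z = layer["gamma"] * z + layer["beta"]
        h = np.maximum(z, 0.0)  # ReLU
        if train:
            # Inverted dropout: zero about 30% of units, rescale the rest.
            keep = rng.random(h.shape) >= DROPOUT_RATE
            h = h * keep / (1.0 - DROPOUT_RATE)
    return h @ output["W"] + output["b"]  # one neuron, linear activation


def mse(pred, target):
    """Mean squared error between predicted and observed MICs."""
    return float(np.mean((pred.ravel() - np.asarray(target, dtype=float)) ** 2))
```

Calling `forward` on a batch of shape (n, d) returns predictions of shape (n, 1), one MIC estimate per genome.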

3. Results and Discussion

3.1. MIC Prediction Accuracy

For advancing personalized and effective antibiotic treatments, two notable limitations are (1) that the laboratory determination of MIC, crucial for tailored treatment, often takes a substantial amount of time, usually 3 to 5 days [32], and (2) that the MIC values exhibit natural variability, mainly due to genetic differences among pathogenic strains [33]. Genetic makeup significantly influences pathogen susceptibility to antibiotics.
In this study, we integrated traditional machine learning (RF) and deep learning (multilayer perceptron, MLP) to predict Salmonella MIC values from 20-mer counts derived from WGS data. Using our Genome Feature Extractor Pipeline, we reduced roughly one million 10-mers to 27,932 unique 20-mers and identified the ten 20-mers most predictive of increased or decreased MIC for each antibiotic. The average MIC prediction accuracy for the 15 antibiotics over the entire 4500-genome dataset is >96% (Table 2). The lowest accuracy (89.67%) was obtained for sulfisoxazole; its MIC values span the widest range, which probably produced a large mean squared error (MSE) in the predictor outputs. For at least 10 of the 15 antibiotics, MIC prediction reached ≥96.4% accuracy with low MSE. These results underscore the importance of dimension reduction and k-mer filtering as critical steps in optimizing the performance of MIC prediction models.
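The accuracy definition behind Table 2 is not spelled out in this section. A common choice for MIC regression, used in the XGBoost studies cited in the Introduction [18,19], counts a prediction as correct when it falls within ±1 twofold dilution of the laboratory MIC. The sketch below assumes that definition; rounding predictions to the nearest log2 dilution step and the `floor` clamp for non-positive regression outputs are likewise assumptions, and the function name is hypothetical.

```python
import math
from typing import Sequence


def dilution_accuracy(
    predicted: Sequence[float],
    observed: Sequence[float],
    tolerance: int = 1,
    floor: float = 1e-3,
) -> float:
    """Fraction of predictions within `tolerance` twofold dilutions of the lab MIC."""
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("need equally sized, non-empty sequences")
    hits = 0
    for p, o in zip(predicted, observed):
        p = max(p, floor)  # a linear output layer can emit values <= 0
        # Distance in dilution steps = difference of rounded log2 concentrations.
        steps = abs(round(math.log2(p)) - round(math.log2(o)))
        if steps <= tolerance:
            hits += 1
    return hits / len(observed)
```

For instance, predictions of 1.1, 4, 16, and 8 µg/mL against laboratory MICs of 1, 2, 4, and 8 µg/mL give an accuracy of 0.75: only the third prediction is off by two dilutions.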
Because sequencing can vary between laboratories, it has been recommended that prediction models be built from well-controlled experiments using WGS data from the same laboratory [34]. That study achieved an average prediction accuracy of 92% for 24 antibiotics, using 321 genomes as predictors. In contrast, our approach, leveraging 4500 genomes, achieved an accuracy exceeding 96%. This suggests that variation among WGS data from different laboratories may be less critical when ample data points are available, allowing machine learning models to learn more effectively. Moreover, from a prediction perspective, lower variability in the data can enhance accuracy but may compromise the model’s robustness by increasing its sensitivity to variation. Although WGS is effective for predicting AMR, a heteroresistant subpopulation of Salmonella enterica, which varies in its sensitivity to an antimicrobial agent, could lead to an incorrect indication that resistance is absent [35]. This is a notable limitation of machine learning, as the models may struggle to detect heteroresistance when predicting MIC. Nevertheless, our results align with those of previous predictive models that used 10-mer counts from PATRIC data: earlier models based on deep learning (neural networks) [36] and traditional machine learning (XGBoost) [19] achieved prediction accuracies of 85% to 95%. In our study, we employed both deep learning and random forest approaches for a comprehensive analysis. Furthermore, whereas XGBoost feature importance analysis identified important k-mers for MIC prediction [19] without indicating the direction of their effect, our study uses DeepLift to classify the identified k-mers as specifically associated with either high or low MIC values. These observations are discussed in detail in the following sections.

3.2. Identification of Genomic Features Predictive of Antibiotic Susceptibility/Resistance

A positive correlation between the presence of known antibiotic resistance genes and laboratory-determined MIC values was shown in [37]. Other studies have used single nucleotide polymorphisms (SNPs) within known resistance genes to predict susceptibility and resistance [14,38,39]. However, these approaches often overlooked the potential contribution of novel genes or k-mers to MIC values. We hypothesized that using the frequencies (occurrence counts) of “important” k-mers to predict MIC values could reveal novel genes or k-mers relevant to MIC. Our results for the 15 antibiotics are grouped by antibiotic class (Table 1), and the identified k-mers and genes are shown in Figure 4, Figure 5, Figure 6, Figure 7 and Figure 8.

3.2.1. β-Lactams (Ampicillin, Amoxicillin-Clavulanic Acid, Ceftriaxone, Cefoxitin, Ceftiofur)

Of the ten crucial 20-mers for ampicillin, amoxicillin-clavulanic acid, ceftriaxone, cefoxitin, and ceftiofur, seven, seven, seven, seven, and two, respectively, are associated with high MICs and located within Class A and C β-lactamase genes (Figure 4a–e). This finding aligns with the well-established association of Class A and C β-lactamases with penicillin resistance [40]. Our model also identifies other protein-coding genes that have been implicated in antibiotic resistance, such as those encoding mobile element proteins and lipocalin. Lipocalin has been computationally predicted to play an essential role in antibiotic resistance in Salmonella [19,41], and both in vitro and in vivo analyses have confirmed that lipocalin from Burkholderia cenocepacia can induce resistance to quinolone and β-lactam antibiotics [42]. Mobile genetic elements are DNA segments that can move within a genome; they can carry genes, including antibiotic resistance genes, and facilitate their spread among bacteria. Their capacity to transport resistance genes within Salmonella is well established [43]. Although the direct roles of other important k-mers/genes in β-lactam resistance may not be evident, D-alanyl-D-alanine carboxypeptidase is known to be involved in cell wall synthesis in Streptomyces coelicolor [44]. Furthermore, exonuclease activity associated with DNA repair in Salmonella [45] may contribute to its overall fitness and ability to withstand β-lactam antibiotics. Among the genes predicted to contribute to low β-lactam MIC values, indicative of susceptibility, our model identified AmpE, a well-known negative regulator of β-lactamase in E. coli [46] and Pseudomonas aeruginosa [47].
These findings underscore the robustness and versatility of our approach in antibiotic susceptibility and/or resistance prediction.
Figure 4. Predicted impact of k-mers on the MIC of β-lactam antibiotics. We used DeepLift to predict whether k-mers (20-mers) increase (blue) or decrease (orange) the MIC for (a) ampicillin, (b) amoxicillin-clavulanic acid, (c) ceftriaxone, (d) cefoxitin, and (e) ceftiofur. The bar graphs show the Salmonella genes that the k-mers align to; bar length reflects the importance score.

3.2.2. Aminoglycosides (Gentamicin, Kanamycin, Streptomycin)

Aminoglycoside phosphotransferase and nucleotidyltransferase genes [48,49,50], well-established resistance determinants, emerged as the top predictors of high MIC values, accounting for seven, five, and four of the ten most important 20-mers for streptomycin, kanamycin, and gentamicin, respectively (Figure 5). This agreement between our predictions and established knowledge supports the reliability of our model in capturing key antibiotic resistance mechanisms. Although the dependence of MIC on heat shock protein family genes may not be direct, these proteins are involved in protein folding and stability, stress response [51,52], and biofilm formation [53], all of which could ultimately affect antibiotic resistance.
Figure 5. Predicted impact of k-mers on the MIC of aminoglycoside antibiotics. We used DeepLift to predict whether k-mers (20-mers) increase (blue) or decrease (orange) the MIC for (a) gentamicin, (b) kanamycin, and (c) streptomycin. The bar graphs show the Salmonella genes that the k-mers align to; bar length reflects the importance score.

3.2.3. Quinolones (Ciprofloxacin, Nalidixic Acid)

Using RF, mutations in the DNA gyrase gene gyrA and the topoisomerase IV gene parC, together with the quinolone resistance gene qnrS, have been identified as predictors of quinolone resistance in E. coli [54]. The plasmid-mediated quinolone resistance gene B (qnrB) encodes a protein of the pentapeptide repeat family [55], which protects DNA gyrase and topoisomerase IV from inhibition by quinolone antibiotics. The pentapeptide repeat protein, associated here with high MIC values for ciprofloxacin and nalidixic acid (Figure 6), is a well-established quinolone resistance determinant [56,57,58]. Furthermore, in this study, the phage shock protein (PSP) operon, which is necessary to maintain membrane integrity, contributed to high predicted MIC values for quinolone antibiotics. Upregulation of PSP has been linked to quinolone resistance in E. coli [59]. In addition, significant upregulation of outer membrane protein genes is associated with resistance to quinolones in Salmonella Typhi [60]. This study identifies outer membrane porin, a type of outer membrane protein, as important for nalidixic acid (quinolone) resistance (Figure 6).
Figure 6. Predicted impact of k-mers on the MIC of quinolone antibiotics. We used DeepLift to predict whether k-mers (20-mers) increase (blue) or decrease (orange) the MIC for (a) ciprofloxacin and (b) nalidixic acid. The bar graphs show the Salmonella genes that the k-mers align to; bar length reflects the importance score.

3.2.4. Sulfonamides (Trimethoprim-Sulfamethoxazole, Sulfisoxazole)

Dihydrofolate reductase (four of the ten important genes) and dihydropteroate synthase type 2 (seven of the ten important genes) (Figure 7) are the principal contributors to high MIC for trimethoprim-sulfamethoxazole and sulfisoxazole, respectively [61]. These genes are well known to confer resistance to trimethoprim and sulfonamides, respectively, in Salmonella genomes [62]. Mutations in both dihydrofolate reductase and dihydropteroate synthase have been shown to increase Plasmodium falciparum resistance to sulfadoxine-pyrimethamine, an antifolate combination that includes a sulfonamide [61]. Furthermore, tetracycline resistance genes and transposases, both linked to antibiotic resistance [63,64], appear to play a significant secondary role in Salmonella resistance to sulfisoxazole and trimethoprim-sulfamethoxazole in our dataset.
Figure 7. Predicted impact of k-mers on the MIC of sulfonamide antibiotics. We used DeepLift to predict whether k-mers (20-mers) increase (blue) or decrease (orange) the MIC for (a) trimethoprim-sulfamethoxazole and (b) sulfisoxazole. The bar graphs show the Salmonella genes that the k-mers align to; bar length reflects the importance score.

3.2.5. Individual Antibiotic Class (Tetracycline, Chloramphenicol, Azithromycin)

Tetracycline, chloramphenicol, and azithromycin belong to the tetracycline, chloramphenicol, and macrolide classes, respectively. For tetracycline (Tet), all identified genes encode known resistance components, including the major facilitator superfamily (MFS) efflux genes Tet(A) and Tet(B) and the tetracycline regulatory gene involved in tetracycline resistance (Figure 8). Similarly, for chloramphenicol, our model identified chloramphenicol resistance genes, as expected. However, although our model identified several established resistance genes for 14 antibiotics across six classes, it identified only one resistance gene, belonging to the tetracycline class, in the azithromycin (macrolide) model. This is perhaps not surprising, as macrolide resistance genes such as the erythromycin ribosome methylation (erm) genes are often carried on extrachromosomal plasmids [65,66].
Figure 8. Predicted impact of k-mers on the MIC of individual antibiotic classes. We used DeepLift to predict whether k-mers (20-mers) increase (blue) or decrease (orange) the MIC for (a) tetracycline, (b) chloramphenicol, and (c) azithromycin. The bar graphs show the Salmonella genes that the k-mers align to; bar length reflects the importance score.
In summary, we introduce the “Genome Feature Extractor Pipeline” to overcome the challenges of using 10-mers for minimum inhibitory concentration (MIC) prediction. Although 10-mers work well as features for predicting MIC values, they are of limited use in BLAST searches aimed at pinpointing the genomic regions that influence MIC, because sequences this short match many genomic locations by chance. Our pipeline addresses this issue by reducing the pool of approximately one million 10-mers to a more manageable set of tens of thousands of 20-mers that capture the genomic regions with a significant influence on MIC values. This approach not only helps us manage the computational complexity of working with thousands of genomes but also provides a clearer understanding of the genomic features that drive antibiotic susceptibility and resistance. Moreover, the pipeline distinguishes the specific contributions of important 20-mers, and of the genes in which they are embedded, to low MIC values (indicative of susceptibility) or high MIC values (indicative of resistance) across a dataset of 4500 genomes. Importantly, many of the k-mers and genes that our model (a combination of random forest, multilayer perceptron, and DeepLift) identified as predictive of resistance to β-lactams, aminoglycosides, quinolones, sulfonamides, phenicols, and tetracyclines are consistent with known resistance mechanisms reported in the literature. Finally, the model identifies genes encoding various proteins, including lipocalin, heat shock protein, mobile element proteins, phage shock protein, and several hypothetical proteins, that may contribute to antibiotic resistance in Salmonella. Their actual contribution, however, must be validated through rigorous experimental studies, which are beyond the scope of this study.
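The limited specificity of 10-mers can be quantified with a back-of-the-envelope calculation. There are 4^10 = 1,048,576 possible 10-mers, consistent with the pool of approximately one million mentioned above, but 4^20 ≈ 1.1 × 10^12 possible 20-mers. The sketch below estimates how often one particular k-mer would appear by chance in a genome, assuming a length of about 5 Mb (a rough figure for Salmonella) and uniform, independent bases on both strands; real genomes are not random, so the numbers are only indicative.

```python
# Back-of-the-envelope sketch: expected chance occurrences of one specific k-mer
# in a random genome, counting both strands. Assumes uniform, independent bases,
# which real genomes violate; the genome length is an approximation.


def kmer_space(k: int) -> int:
    """Number of distinct DNA k-mers over the alphabet {A, C, G, T}."""
    return 4 ** k


def expected_chance_hits(k: int, genome_length: int, strands: int = 2) -> float:
    """Expected occurrences of a fixed k-mer in a random sequence of the given length."""
    positions = genome_length - k + 1
    return strands * positions / kmer_space(k)


if __name__ == "__main__":
    genome_length = 5_000_000  # roughly the size of a Salmonella genome (approximation)
    for k in (10, 20):
        print(f"k={k}: {kmer_space(k):,} possible k-mers, "
              f"~{expected_chance_hits(k, genome_length):.2g} chance hits per genome")
```

Under these assumptions a given 10-mer is expected roughly 9.5 times per genome by chance, whereas a given 20-mer is expected about 9 × 10^-6 times, which is why 20-mers are far better suited to locating specific genomic regions with BLAST.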

Author Contributions

Conceptualization, M.B.A., A.R.D., B.N., D.R.S. and M.R.; Methodology, M.B.A., A.R.D. and M.R.; Software, M.B.A. and A.R.D.; Validation, M.B.A., A.R.D. and B.S.K.; Resources, B.N. and M.R.; Data curation, A.R.D. and M.B.A.; Original draft preparation, M.B.A. and A.R.D.; Review and editing, all authors; Visualization, B.S.K., M.B.A. and A.R.D.; Supervision, M.R., B.N. and D.R.S.; Funding acquisition, B.N. and M.R. All authors have read and agreed to the published version of the manuscript.

Funding

The dataset used in this study is from the National Antimicrobial Resistance Monitoring System (NARMS). Funding and access to cloud computing were provided by the Agricultural Research Service, USDA, through the NACA project entitled “Advancing Agricultural Research through High Performance Computing”: #58-0200-0-002.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets from the PATRIC database (now the Bacterial and Viral Bioinformatics Resource Center, BV-BRC) were analyzed in this study. These data are available at https://www.bv-brc.org/ (accessed on 15 October 2023).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Plumb, I.; Fields, P.; Bruce, B. Salmonellosis, Nontyphoidal. CDC Yellow Book 2024. Available online: https://wwwnc.cdc.gov/travel/yellowbook/2024/infections-diseases/salmonellosis-nontyphoidal (accessed on 10 October 2023).
2. FDA. Get the Facts about Salmonella. Available online: https://www.fda.gov/animal-veterinary/animal-health-literacy/get-facts-about-salmonella (accessed on 10 October 2023).
3. FDA. 2019 NARMS Update: Integrated Report Summary. Available online: https://www.fda.gov/animal-veterinary/national-antimicrobial-resistance-monitoring-system/2019-narms-update-integrated-report-summary (accessed on 10 October 2023).
4. Henao, O.L.; Jones, T.F.; Vugia, D.J.; Griffin, P.M.; Foodborne Diseases Active Surveillance Network (FoodNet) Workgroup. Foodborne Diseases Active Surveillance Network-2 Decades of Achievements, 1996–2015. Emerg. Infect. Dis. 2015, 21, 1529–1536.
5. Karp, B.E.; Tate, H.; Plumblee, J.R.; Dessai, U.; Whichard, J.M.; Thacker, E.L.; Hale, K.R.; Wilson, W.; Friedman, C.R.; Griffin, P.M.; et al. National Antimicrobial Resistance Monitoring System: Two Decades of Advancing Public Health through Integrated Surveillance of Antimicrobial Resistance. Foodborne Pathog. Dis. 2017, 14, 545–557.
6. Lin, D.; Chen, K.; Chan, E.W.-C.; Chen, S. Increasing prevalence of ciprofloxacin-resistant food-borne Salmonella strains harboring multiple PMQR elements but not target gene mutations. Sci. Rep. 2015, 5, 14754.
7. Jahangir Alam, M.; Renter, D.; Taylor, E.; Mina, D.; Moxley, R.; Smith, D. Antimicrobial Susceptibility Profiles of Salmonella enterica Serotypes Recovered from Pens of Commercial Feedlot Cattle Using Different Types of Composite Samples. Curr. Microbiol. 2009, 58, 354–359.
8. Balbin, M.M.; Hull, D.; Guest, C.; Nichols, L.; Dunn, R.; Thakur, S. Antimicrobial resistance and virulence factors profile of Salmonella spp. and Escherichia coli isolated from different environments exposed to anthropogenic activity. J. Glob. Antimicrob. Resist. 2020, 22, 578–583.
9. CLSI. M100 Performance Standards for Antimicrobial Susceptibility Testing, 32nd ed.; Clinical and Laboratory Standards Institute: Malvern, PA, USA, 2021.
10. Salam, A.; Al-Amin, Y.; Pawar, J.S.; Akhter, N.; Lucy, I.B. Conventional methods and future trends in antimicrobial susceptibility testing. Saudi J. Biol. Sci. 2023, 30, 103582.
11. Weinstein, M.P.; Lewis, J.S. The Clinical and Laboratory Standards Institute Subcommittee on Antimicrobial Susceptibility Testing: Background, Organization, Functions, and Processes. J. Clin. Microbiol. 2020, 58, e01864-19.
12. Satlin, M.J.; Lewis, J.S.; Weinstein, M.P.; Patel, J.; Humphries, R.M.; Kahlmeter, G.; Giske, C.G.; Turnidge, J. Clinical and Laboratory Standards Institute and European Committee on Antimicrobial Susceptibility Testing Position Statements on Polymyxin B and Colistin Clinical Breakpoints. Clin. Infect. Dis. 2020, 71, e523–e529.
13. Bradley, P.; Gordon, N.C.; Walker, T.M.; Dunn, L.; Heys, S.; Huang, B.; Earle, S.; Pankhurst, L.J.; Anson, L.; de Cesare, M.; et al. Rapid antibiotic-resistance predictions from genome sequence data for Staphylococcus aureus and Mycobacterium tuberculosis. Nat. Commun. 2015, 6, 10063.
14. Niehaus, K.E.; Walker, T.M.; Crook, D.W.; Peto, T.E.A.; Clifton, D.A. Machine learning for the prediction of antibacterial susceptibility in Mycobacterium tuberculosis. In Proceedings of the IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI), Valencia, Spain, 1–4 June 2014; pp. 618–621.
15. Pesesky, M.W.; Hussain, T.; Wallace, M.; Patel, S.; Andleeb, S.; Burnham, C.-A.D.; Dantas, G. Evaluation of Machine Learning and Rules-Based Approaches for Predicting Antimicrobial Resistance Profiles in Gram-negative Bacilli from Whole Genome Sequence Data. Front. Microbiol. 2016, 7, 1887.
16. Stoesser, N.; Batty, E.M.; Eyre, D.W.; Morgan, M.; Wyllie, D.H.; Elias, C.D.O.; Johnson, J.R.; Walker, A.S.; Peto, T.E.A.; Crook, D.W. Predicting antimicrobial susceptibilities for Escherichia coli and Klebsiella pneumoniae isolates using whole genomic sequence data. J. Antimicrob. Chemother. 2013, 68, 2234–2244.
17. Coelho, J.R.; Carriço, J.A.; Knight, D.; Martínez, J.-L.; Morrissey, I.; Oggioni, M.R.; Freitas, A.T. The Use of Machine Learning Methodologies to Analyse Antibiotic and Biocide Susceptibility in Staphylococcus aureus. PLoS ONE 2013, 8, e55582.
18. Chen, T.; Guestrin, C. XGBoost. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; ACM: New York, NY, USA, 2016; pp. 785–794.
19. Nguyen, M.; Long, S.W.; McDermott, P.F.; Olsen, R.J.; Olson, R.; Stevens, R.L.; Tyson, G.H.; Zhao, S.; Davis, J.J. Using Machine Learning to Predict Antimicrobial MICs and Associated Genomic Features for Nontyphoidal Salmonella. J. Clin. Microbiol. 2019, 57, e01260-18.
20. Aytan-Aktug, D.; Clausen, P.T.L.C.; Bortolaia, V.; Aarestrup, F.M.; Lund, O. Prediction of Acquired Antimicrobial Resistance for Multiple Bacterial Species Using Neural Networks. mSystems 2020, 5, e00774-19.
21. Davis, J.J.; Boisvert, S.; Brettin, T.; Kenyon, R.W.; Mao, C.; Olson, R.; Overbeek, R.; Santerre, J.; Shukla, M.; Wattam, A.R.; et al. Antimicrobial Resistance Prediction in PATRIC and RAST. Sci. Rep. 2016, 6, 27930.
22. Formerly PATRIC Database. Bacterial and Viral Bioinformatics Resource Centre. Available online: https://www.bv-brc.org/ (accessed on 15 October 2023).
23. Wang, S.; Zhao, C.; Yin, Y.; Chen, F.; Chen, H.; Wang, H. A Practical Approach for Predicting Antimicrobial Phenotype Resistance in Staphylococcus aureus through Machine Learning Analysis of Genome Data. Front. Microbiol. 2022, 13, 841289.
24. Mahé, P.; Tournoud, M. Predicting bacterial resistance from whole-genome sequences using k-mers and stability selection. BMC Bioinform. 2018, 19, 383.
25. Kromer-Edwards, C.; Castanheira, M.; Oliveira, S. K-Mer Fingerprinting with RNN to predict MICs for K. pneumoniae. In Proceedings of the 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Las Vegas, NV, USA, 6–8 December 2022; pp. 1927–1932.
26. Menze, B.H.; Kelm, B.M.; Masuch, R.; Himmelreich, U.; Bachert, P.; Petrich, W.; Hamprecht, F.A. A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data. BMC Bioinform. 2009, 10, 213.
27. Shrikumar, A.; Greenside, P.; Kundaje, A. Learning important features through propagating activation differences. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, Australia, 6–11 August 2017.
28. Ho, T.K. Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14–16 August 1995; pp. 278–282.
29. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
30. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning; Springer: New York, NY, USA, 2013; Volume 103.
31. Hastie, T.; Tibshirani, R.; Friedman, J. Elements of Statistical Learning, 2nd ed.; Springer: New York, NY, USA, 2009; Volume 27.
32. Kowalska-Krochmal, B.; Dudek-Wicher, R. The Minimum Inhibitory Concentration of Antibiotics: Methods, Interpretation, Clinical Relevance. Pathogens 2021, 10, 165.
33. Listorti, V.; Garcia-Vozmediano, A.; Pitti, M.; Maurella, C.; Adriano, D.; Ercolini, C.; Dellepiane, M.; Guardone, L.; Razzuoli, E. Antimicrobial Resistance of Salmonella Strains Isolated from Human, Wild Boar, and Environmental Samples in 2018–2020 in the Northwest of Italy. Pathogens 2022, 11, 1446.
34. Wang, C.-C.; Hung, Y.-T.; Chou, C.-Y.; Hsuan, S.-L.; Chen, Z.-W.; Chang, P.-Y.; Jan, T.-R.; Tung, C.-W. Using random forest to predict antimicrobial minimum inhibitory concentrations of nontyphoidal Salmonella in Taiwan. Veter. Res. 2023, 54, 11.
35. Zwe, Y.H.; Chin, S.F.; Kohli, G.S.; Aung, K.T.; Yang, L.; Yuk, H.-G. Whole genome sequencing (WGS) fails to detect antimicrobial resistance (AMR) from heteroresistant subpopulation of Salmonella enterica. Food Microbiol. 2020, 91, 103530.
36. Barros, C.C. Neural network-based predictions of antimicrobial resistance in Salmonella spp. using k-mers counting from whole-genome sequences. bioRxiv 2021.
37. McDermott, P.F.; Tyson, G.H.; Kabera, C.; Chen, Y.; Li, C.; Folster, J.P.; Ayers, S.L.; Lam, C.; Tate, H.P.; Zhao, S. Whole-Genome Sequencing for Detecting Antimicrobial Resistance in Nontyphoidal Salmonella. Antimicrob. Agents Chemother. 2016, 60, 5515–5520.
38. Lipworth, S.; Hough, N.; Leach, L.; Morgan, M.; Jeffery, K.; Andersson, M.; Robinson, E.; Smith, E.G.; Crook, D.; Peto, T.; et al. Whole-Genome Sequencing for Predicting Clarithromycin Resistance in Mycobacterium abscessus. Antimicrob. Agents Chemother. 2019, 63, e01204-18.
39. Eyre, D.W.; De Silva, D.; Cole, K.; Peters, J.; Cole, M.J.; Grad, Y.H.; Demczuk, W.; Martin, I.; Mulvey, M.R.; Crook, D.W.; et al. WGS to predict antibiotic MICs for Neisseria gonorrhoeae. J. Antimicrob. Chemother. 2017, 72, 1937–1947.
40. Tooke, C.L.; Hinchliffe, P.; Bragginton, E.C.; Colenso, C.K.; Hirvonen, V.H.A.; Takebayashi, Y.; Spencer, J. β-Lactamases and β-Lactamase Inhibitors in the 21st Century. J. Mol. Biol. 2019, 431, 3472–3500.
41. Maguire, F.; Rehman, M.A.; Carrillo, C.; Diarra, M.S.; Beiko, R.G. Identification of Primary Antimicrobial Resistance Drivers in Agricultural Nontyphoidal Salmonella enterica Serovars by Using Machine Learning. mSystems 2019, 4, e00211-19.
42. El-Halfawy, O.M.; Klett, J.; Ingram, R.J.; Loutet, S.A.; Murphy, M.E.P.; Martín-Santamaría, S.; Valvano, M.A. Antibiotic Capture by Bacterial Lipocalins Uncovers an Extracellular Mechanism of Intrinsic Antibiotic Resistance. mBio 2017, 8, e00225-17.
43. Johansson, M.H.K.; Bortolaia, V.; Tansirichaiya, S.; Aarestrup, F.M.; Roberts, A.P.; Petersen, T.N. Detection of mobile genetic elements associated with antibiotic resistance in Salmonella enterica using a newly developed web tool: MobileElementFinder. J. Antimicrob. Chemother. 2021, 76, 101–109.
44. Rioseras, B.; Yagüe, P.; López-García, M.T.; Gonzalez-Quiñonez, N.; Binda, E.; Marinelli, F.; Manteca, A. Characterization of SCO4439, a D-alanyl-D-alanine carboxypeptidase involved in spore cell wall maturation, resistance and germination in Streptomyces coelicolor. Sci. Rep. 2016, 6, 21659.
45. Wang, Z.; Zhu, S.; Li, C.; Lyu, L.; Yu, J.; Wang, D.; Xu, Z.; Ni, J.; Gao, B.; Lu, J.; et al. Gene essentiality profiling reveals a novel determinant of stresses preventing protein aggregation in Salmonella. Emerg. Microbes Infect. 2022, 11, 1554–1571.
46. Mallik, D.; Jain, D.; Bhakta, S.; Ghosh, A.S. Role of AmpC-Inducing Genes in Modulating Other Serine Beta-Lactamases in Escherichia coli. Antibiotics 2022, 11, 67.
47. Juan, C.; Maciá, M.D.; Gutiérrez, O.; Vidal, C.; Pérez, J.L.; Oliver, A. Molecular Mechanisms of β-Lactam Resistance Mediated by AmpC Hyperproduction in Pseudomonas aeruginosa Clinical Strains. Antimicrob. Agents Chemother. 2005, 49, 4733–4738.
48. Maka, L.; Popowska, M. Antimicrobial resistance of Salmonella spp. isolated from food. Rocz. Panstw. Zakl. Hig. 2016, 67, 343–357.
49. Lu, W.; Li, K.; Huang, J.; Sun, Z.; Li, A.; Liu, H.; Zhou, D.; Lin, H.; Zhang, X.; Li, Q.; et al. Identification and characteristics of a novel aminoglycoside phosphotransferase, APH(3′)-IId, from an MDR clinical isolate of Brucella intermedia. J. Antimicrob. Chemother. 2021, 76, 2787–2794.
50. Ramirez, M.S.; Tolmasky, M.E. Amikacin: Uses, Resistance, and Prospects for Inhibition. Molecules 2017, 22, 2267.
51. Wirk, B. Heat shock protein inhibitors for the treatment of fungal infections. Recent Pat. Anti-Infect. Drug Discov. 2011, 6, 38–44.
52. Nachappa, S.A.; Neelambike, S.M.; Ramachandra, N.B. Differential expression of the Mycobacterium tuberculosis heat shock protein genes in response to drug-induced stress. Tuberculosis 2022, 134, 102201.
53. Dong, T.; Wang, W.; Xia, M.; Liang, S.; Hu, G.; Ye, H.; Cao, Q.; Dong, Z.; Zhang, C.; Feng, D.; et al. Involvement of the Heat Shock Protein HtpG of Salmonella Typhimurium in Infection and Proliferation in Hosts. Front. Cell. Infect. Microbiol. 2021, 11, 758898.
54. Pataki, B.; Matamoros, S.; van der Putten, B.C.L.; Remondini, D.; Giampieri, E.; Aytan-Aktug, D.; Hendriksen, R.S.; Lund, O.; Csabai, I.; Schultsz, C.; et al. Understanding and predicting ciprofloxacin minimum inhibitory concentration in Escherichia coli with machine learning. Sci. Rep. 2020, 10, 15026.
55. Jacoby, G.A.; Strahilevitz, J.; Hooper, D.C. Plasmid-Mediated Quinolone Resistance. Microbiol. Spectr. 2014, 2.
56. Hooper, D.C.; Jacoby, G.A. Mechanisms of drug resistance: Quinolone resistance. Ann. N. Y. Acad. Sci. 2015, 1354, 12–31.
57. Truong-Bolduc, Q.C.; Hooper, D.C. The Transcriptional Regulators NorG and MgrA Modulate Resistance to both Quinolones and β-Lactams in Staphylococcus aureus. J. Bacteriol. 2007, 189, 2996–3005.
58. Xiong, X.; Bromley, E.H.C.; Oelschlaeger, P.; Woolfson, D.N.; Spencer, J. Structural insights into quinolone antibiotic resistance mediated by pentapeptide repeat proteins: Conserved surface loops direct the activity of a Qnr protein from a Gram-negative bacterium. Nucleic Acids Res. 2011, 39, 3917–3927.
59. Yamane, T.; Enokida, H.; Hayami, H.; Kawahara, M.; Nakagawa, M. Genome-wide transcriptome analysis of fluoroquinolone resistance in clinical isolates of Escherichia coli. Int. J. Urol. 2011, 19, 360–368.
60. Akshay, S.D.; Nayak, S.; Deekshit, V.K.; Rohit, A.; Maiti, B. Differential expression of outer membrane proteins and quinolone resistance determining region mutations can lead to ciprofloxacin resistance in Salmonella Typhi. Arch. Microbiol. 2023, 205, 136.
61. Carter, T.E.; Warner, M.; Mulligan, C.J.; Existe, A.; Victor, Y.S.; Memnon, G.; Boncy, J.; Oscar, R.; Fukuda, M.M.; Okech, B.A. Evaluation of dihydrofolate reductase and dihydropteroate synthetase genotypes that confer resistance to sulphadoxine-pyrimethamine in Plasmodium falciparum in Haiti. Malar. J. 2012, 11, 275.
62. Yu, K.; Wang, H.; Cao, Z.; Gai, Y.; Liu, M.; Li, G.; Lu, L.; Luan, X. Antimicrobial resistance analysis and whole-genome sequencing of Salmonella enterica serovar Indiana isolate from ducks. J. Glob. Antimicrob. Resist. 2022, 28, 78–83.
63. Hu, L.-F.; Xu, X.-H.; Yang, H.-F.; Ye, Y.; Li, J.-B. Role of sul2 Gene Linked to Transposase in Resistance to Trimethoprim/Sulfamethoxazole Among Stenotrophomonas maltophilia Isolates. Ann. Lab. Med. 2016, 36, 73–75.
64. Wang, M.; Li, Y.; Lin, X.; Xu, H.; Li, Y.; Xue, R.; Wang, G.; Sun, S.; Li, J.; Lan, Z.; et al. Genetic characterization, mechanisms and dissemination risk of antibiotic resistance of multidrug-resistant Rothia nasimurium. Infect. Genet. Evol. 2021, 90, 104770.
65. Doan, T.; Worden, L.; Hinterwirth, A.; Arzika, A.M.; Maliki, R.; Abdou, A.; Zhong, L.; Chen, C.; Cook, C.; Lebas, E.; et al. Macrolide and Nonmacrolide Resistance with Mass Azithromycin Distribution. N. Engl. J. Med. 2020, 383, 1941–1950.
66. Reimer, D.; Cowles, K.N.; Proschak, A.; Nollmann, F.I.; Dowling, A.J.; Kaiser, M.; Constant, R.F.; Goodrich-Blair, H.; Bode, H.B. Rhabdopeptides as Insect-Specific Virulence Factors from Entomopathogenic Bacteria. ChemBioChem 2013, 14, 1991–1997.
Figure 1. The four-step Genome Feature Extractor Pipeline. (1) Downloader: batch-downloads the annotated genome dataset. (2) 10-mer handle: creates 10-mer count vectors for each sample, restricted to a subset of important 10-mers chosen by a model. (3) 10-mer expander: generates 20-mers from the 10-mers. (4) 20-mer handle: builds the 20-mer dataset, trains the model, and extracts the important 20-mers from this extended dataset.
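Step (2) of Figure 1 represents each genome by counts of a chosen subset of 10-mers, rather than by binary presence/absence indicators. A minimal sketch of such a count vector follows; it assumes each genome is given as a list of contig strings and counts only the forward strand (whether the pipeline also counts reverse complements is not stated here). The function names and toy sequences are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def count_kmers(contigs: Iterable[str], k: int = 10) -> Counter:
    """Count all k-mers on the forward strand of each contig.

    Windows containing characters other than A, C, G, T (e.g. N) are skipped,
    and k-mers are not formed across contig boundaries.
    """
    counts: Counter = Counter()
    for contig in contigs:
        seq = contig.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= {"A", "C", "G", "T"}:
                counts[kmer] += 1
    return counts


def count_vector(contigs: Iterable[str], selected: List[str], k: int = 10) -> List[int]:
    """Counts (not 0/1 presence flags) of a fixed, ordered list of selected k-mers."""
    counts = count_kmers(contigs, k)
    return [counts[kmer] for kmer in selected]  # Counter returns 0 for absent keys


if __name__ == "__main__":
    toy_genome = ["ACGTACGTACGTAC", "T" * 12]  # toy contigs, not real data
    selected = ["ACGTACGTAC", "T" * 10, "G" * 10]
    print(count_vector(toy_genome, selected))  # [2, 3, 0]
```

Building the vector over a fixed, ordered list of selected 10-mers keeps every genome's features aligned column by column, which is what a tabular model such as a random forest or an MLP expects.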
Figure 2. Filtering of 10-mers using the feature-importance plot and its elbow point, shown here for ampicillin.
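Figure 2 keeps the 10-mers that lie before the elbow of the sorted random-forest importance curve. Because the exact elbow criterion is not restated here, the sketch below assumes one common heuristic: normalize both axes of the descending curve to [0, 1] and take the point lying farthest below the straight line joining its first and last points. The function names, feature names, and importance values are hypothetical.

```python
from typing import List, Sequence


def elbow_index(importances: Sequence[float]) -> int:
    """Index of the elbow of the importance curve sorted in descending order.

    Assumed heuristic: after scaling rank and importance to [0, 1], the chord from
    the first to the last point is the line x + y = 1; the elbow is the point with
    the largest value of 1 - x - y (proportional to its distance below that line).
    """
    y = sorted(importances, reverse=True)
    n = len(y)
    if n < 3 or y[0] == y[-1]:
        return n - 1  # no meaningful elbow: keep everything
    span = y[0] - y[-1]
    gaps = [1.0 - i / (n - 1) - (yi - y[-1]) / span for i, yi in enumerate(y)]
    return max(range(n), key=gaps.__getitem__)


def select_features(names: Sequence[str], importances: Sequence[float]) -> List[str]:
    """Names of the features ranked at or before the elbow."""
    ranked = sorted(zip(names, importances), key=lambda t: t[1], reverse=True)
    return [name for name, _ in ranked[: elbow_index(importances) + 1]]


if __name__ == "__main__":
    names = ["k1", "k2", "k3", "k4", "k5", "k6", "k7"]  # hypothetical 10-mers
    scores = [0.40, 0.30, 0.05, 0.04, 0.03, 0.02, 0.01]  # invented importances
    print(select_features(names, scores))  # ['k1', 'k2', 'k3']
```

Normalizing both axes keeps the result independent of the raw scale of the Gini importances, which are typically tiny compared with the number of ranked features.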
Figure 3. The 10-mer Expander searches for a 10-mer and extends both ends by 5 nucleotides to generate 20-mers.
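The expansion step in Figure 3 can be sketched as follows. For every occurrence of a 10-mer in a genome sequence, the hypothetical function below takes the 5 nucleotides on each side to form a 20-mer, skipping occurrences too close to a contig end to be extended. Searching only the forward strand and collecting the 20-mers into a set of distinct sequences are assumptions of this sketch.

```python
from typing import Iterable, Set


def expand_kmer(contigs: Iterable[str], kmer: str, flank: int = 5) -> Set[str]:
    """Distinct (len(kmer) + 2*flank)-mers centred on each occurrence of kmer.

    Overlapping occurrences are found; occurrences lacking `flank` bases on either
    side are skipped. Only the forward strand is searched (an assumption).
    """
    kmer = kmer.upper()
    expanded: Set[str] = set()
    for contig in contigs:
        seq = contig.upper()
        start = seq.find(kmer)
        while start != -1:
            left, right = start - flank, start + len(kmer) + flank
            if left >= 0 and right <= len(seq):
                expanded.add(seq[left:right])
            start = seq.find(kmer, start + 1)  # +1 so overlapping hits are found
    return expanded


if __name__ == "__main__":
    contig = "GGGGG" + "ACGTACGTAC" + "TTTTT"  # toy sequence, not real data
    print(expand_kmer([contig], "ACGTACGTAC"))  # {'GGGGGACGTACGTACTTTTT'}
```

Because each 20-mer carries 10 extra bases of genomic context, it is far more likely than the original 10-mer to map to a single locus when searched with BLAST.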
Table 1. Antibiotics, their known biological targets, and the associated groups of resistance genes.
| Antibiotic | Target | Resistance Genes Group |
|---|---|---|
| Ampicillin | Cell Wall | β-lactam |
| Amoxicillin-clavulanic acid | Cell Wall | β-lactam |
| Ceftriaxone | Cell Wall | β-lactam |
| Azithromycin | Protein | Macrolide |
| Chloramphenicol | Protein | Phenicol |
| Ciprofloxacin | DNA | Quinolone |
| Trimethoprim-Sulfamethoxazole | DNA | Sulfonamide |
| Sulfisoxazole | DNA | Sulfonamide |
| Cefoxitin | Cell Wall | β-lactam |
| Gentamicin | Protein | Aminoglycoside |
| Kanamycin | Protein | Aminoglycoside |
| Nalidixic acid | DNA | Quinolone |
| Streptomycin | Protein | Aminoglycoside |
| Tetracycline | Protein | Tetracycline |
| Ceftiofur | Cell Wall | β-lactam |
Table 2. Multilayer perceptron (MLP) prediction accuracy and mean squared error (MSE) of MIC predictions for 15 antibiotics.
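| # | Antibiotic | Prediction Accuracy (%) | MSE for Prediction | MIC Range (µg/mL) |
|---|---|---|---|---|
| 1 | Ampicillin | 96.89 | 0.54 | 1–32 |
| 2 | Amoxicillin-clavulanic acid | 97.44 | 0.35 | 1–32 |
| 3 | Ceftriaxone | 97.46 | 0.17 | 0.25–64 |
| 4 | Azithromycin | 96.37 | 0.28 | 1–16 |
| 5 | Chloramphenicol | 97.44 | 0.21 | 2–32 |
| 6 | Ciprofloxacin | 98.20 | 0.16 | 0.01–2 |
| 7 | Trimethoprim-Sulfamethoxazole | 98.56 | 0.16 | 0.12–4 |
| 8 | Sulfisoxazole | 89.67 | 0.48 | 16–2048 |
| 9 | Cefoxitin | 93.00 | 0.33 | 1–32 |
| 10 | Gentamicin | 92.33 | 0.64 | 0.25–16 |
| 11 | Kanamycin | 97.5 | 0.23 | 8–64 |
| 12 | Nalidixic acid | 95.78 | 0.27 | 1–64 |
| 13 | Streptomycin | 94.14 | 0.47 | 2–64 |
| 14 | Tetracycline | 99.11 | 0.21 | 4–32 |
| 15 | Ceftiofur | 96.67 | 0.18 | 0.25–8 |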
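Table 2 reports an accuracy and an MSE for each antibiotic but does not itself define them. One widely used convention in MIC prediction (for example, in Nguyen et al. [19]) counts a prediction as correct when it lies within ±1 twofold dilution of the laboratory MIC. The sketch below assumes that convention and computes both metrics on log2-transformed MICs; both choices are assumptions rather than a restatement of this article's method, and the function names and example values are hypothetical.

```python
import math
from typing import List, Sequence


def to_log2(mics: Sequence[float]) -> List[float]:
    """log2-transform MICs (µg/mL), so that one twofold dilution equals 1.0."""
    return [math.log2(m) for m in mics]


def dilution_accuracy(true_mics: Sequence[float], pred_mics: Sequence[float],
                      tolerance: float = 1.0) -> float:
    """Percentage of predictions within +/- `tolerance` twofold dilutions of the true MIC."""
    pairs = zip(to_log2(true_mics), to_log2(pred_mics))
    hits = sum(1 for t, p in pairs if abs(t - p) <= tolerance + 1e-9)
    return 100.0 * hits / len(true_mics)


def log2_mse(true_mics: Sequence[float], pred_mics: Sequence[float]) -> float:
    """Mean squared error between log2 true and log2 predicted MICs."""
    t, p = to_log2(true_mics), to_log2(pred_mics)
    return sum((a - b) ** 2 for a, b in zip(t, p)) / len(t)


if __name__ == "__main__":
    measured = [1, 2, 4, 8]     # invented MICs
    predicted = [1, 4, 16, 8]   # invented predictions
    print(dilution_accuracy(measured, predicted), log2_mse(measured, predicted))  # 75.0 1.25
```

If a model predicts log2 MIC directly, the transform step would simply be skipped for the predictions.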

Share and Cite

MDPI and ACS Style

Ayoola, M.B.; Das, A.R.; Krishnan, B.S.; Smith, D.R.; Nanduri, B.; Ramkumar, M. Predicting Salmonella MIC and Deciphering Genomic Determinants of Antibiotic Resistance and Susceptibility. Microorganisms 2024, 12, 134. https://doi.org/10.3390/microorganisms12010134
