# A Model for Rapid Selection and COVID-19 Prediction with Dynamic and Imbalanced Data


## Abstract


## 1. Introduction

## 2. Sustainable Machine Learning

CO₂ emissions. These experiments must be repeated frequently because no classification algorithm used in machine learning generalizes to all datasets. Accordingly, some researchers have argued for a generalization method that takes the characteristics of the dataset into account [10], but no such solution has been found so far.

CO₂e, NAS = 626,155 lbs CO₂e) are greater than the carbon emissions of one passenger traveling from New York to San Francisco, which amount to 1984 lbs CO₂e. Many companies and data scientists generate a large volume of carbon by repeating experiments with AI algorithms to achieve high performance. A guideline for developing AI algorithms could encourage more eco-friendly development by reducing the carbon generated in repeated experiments. In particular, regulation of carbon emissions has become an international concern. Carbon emissions from transportation and factories are currently regulated actively, but with the advent of the Fourth Industrial Revolution, developers of machine-learning-based AI will also become subject to such regulations because of the carbon emitted by graphics processing units (GPUs) and tensor processing units (TPUs). AI-based companies will therefore have to make efforts to reduce their carbon emissions, and they may also benefit from the eco-friendly image associated with these efforts.

## 3. Methods

#### 3.1. Metadata Collection for Finding Classification Algorithm Selection Rules

#### 3.2. Dataset Characteristics Used in the Metadata

#### 3.2.1. Number of Instances

#### 3.2.2. Number of Numeric Variables

#### 3.2.3. Number of Nominal Variables

#### 3.2.4. Number of Missing Values

#### 3.2.5. Herfindahl–Hirschman Index (HHI)

#### 3.2.6. Number of Variables

#### 3.2.7. Number of Classes

#### 3.2.8. Entropy

#### 3.2.9. Silhouette Score

#### 3.2.10. Data Nonlinearity

#### 3.2.11. Hub Score

#### 3.2.12. Feature Overlap

#### 3.2.13. Neighborhood

#### 3.2.14. Dimensionality

#### 3.3. Sampling Methods Used in the Metadata

#### 3.4. Classification Algorithms Used in the Metadata

#### 3.5. Classification Performance Measurement

#### 3.6. Extraction of Classification Algorithm Recommendation Rules According to Data Characteristics

## 4. Validation

#### 4.1. COVID-19 Datasets

#### 4.2. Performance Comparison

#### 4.3. Recommendations

## 5. Discussion

#### 5.1. Contributions

#### 5.2. Limitations

## 6. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

1. Zhong, L.; Mu, L.; Li, J.; Wang, J.; Yin, Z.; Liu, D. Early prediction of the 2019 novel coronavirus outbreak in the mainland China based on simple mathematical model. IEEE Access **2020**, 8, 51761–51769.
2. Zhang, X.; Ma, R.; Wang, L. Predicting turning point, duration and attack rate of COVID-19 outbreaks in major Western countries. Chaos Solitons Fractals **2020**, 135, 109829.
3. Ghosal, S.; Sengupta, S.; Majumder, M.; Sinha, B. Prediction of the number of deaths in India due to SARS-CoV-2 at 5–6 weeks. Diabetes Metab. Syndr. Clin. Res. Rev. **2020**, 14, 311–315.
4. Garcia, L.P.; Lorena, A.C.; de Souto, M.C.; Ho, T.K. Classifier recommendation using data complexity measures. In Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 20–24 August 2018; pp. 874–879.
5. Strubell, E.; Ganesh, A.; McCallum, A. Energy and policy considerations for deep learning in NLP. arXiv **2019**, arXiv:1906.02243.
6. Zhang, Z.; Schott, J.A.; Liu, M.; Chen, H.; Lu, X.; Sumpter, B.G.; Fu, J.; Dai, S. Prediction of carbon dioxide adsorption via deep learning. Angew. Chem. **2019**, 131, 265–269.
7. Mardani, A.; Liao, H.; Nilashi, M.; Alrasheedi, M.; Cavallaro, F. A multi-stage method to predict carbon dioxide emissions using dimensionality reduction, clustering, and machine learning techniques. J. Clean. Prod. **2020**, 275, 122942.
8. Siebert, M.; Krennrich, G.; Seibicke, M.; Siegle, A.F.; Trapp, O. Identifying high-performance catalytic conditions for carbon dioxide reduction to dimethoxymethane by multivariate modelling. Chem. Sci. **2019**, 10, 10466–10474.
9. Schwartz, R.; Dodge, J.; Smith, N.A.; Etzioni, O. Green AI. arXiv **2019**, arXiv:1907.10597.
10. Sun, S.; Shi, H.; Wu, Y. A survey of multi-source domain adaptation. Inf. Fusion **2015**, 24, 84–92.
11. Cano, J.R. Analysis of data complexity measures for classification. Expert Syst. Appl. **2013**, 40, 4820–4831.
12. Barella, V.H.; Garcia, L.P.; de Souto, M.P.; Lorena, A.C.; de Carvalho, A. Data complexity measures for imbalanced classification tasks. In Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–8.
13. Zhu, B.; Baesens, B.; vanden Broucke, S.K. An empirical comparison of techniques for the class imbalance problem in churn prediction. Inf. Sci. **2017**, 408, 84–99.
14. Brazdil, P.; Gama, J.; Henery, B. Characterizing the applicability of classification algorithms using meta-level learning. In Proceedings of the European Conference on Machine Learning, Catania, Italy, 6–8 April 1994; Springer: Berlin/Heidelberg, Germany, 1994; pp. 83–102.
15. Dogan, N.; Tanrikulu, Z. A comparative analysis of classification algorithms in data mining for accuracy, speed and robustness. Inf. Technol. Manag. **2013**, 14, 105–124.
16. Sim, J.; Lee, J.S.; Kwon, O. Missing values and optimal selection of an imputation method and classification algorithm to improve the accuracy of ubiquitous computing applications. Math. Probl. Eng. **2015**, 2015, 538613.
17. Matsumoto, A.; Merlone, U.; Szidarovszky, F. Some notes on applying the Herfindahl–Hirschman Index. Appl. Econ. Lett. **2012**, 19, 181–184.
18. Lu, C.; Qiao, J.; Chang, J. Herfindahl–Hirschman Index based performance analysis on the convergence development. Clust. Comput. **2017**, 20, 121–129.
19. Wu, G.; Chang, E.Y. Aligning boundary in kernel space for learning imbalanced dataset. In Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM’04), Brighton, UK, 1–4 November 2004; pp. 265–272.
20. Andrić, K.; Kalpić, D.; Bohaček, Z. An insight into the effects of class imbalance and sampling on classification accuracy in credit risk assessment. Comput. Sci. Inf. Syst. **2019**, 16, 155–178.
21. Nemhauser, G.; Wolsey, L. The scope of integer and combinatorial optimization. In Integer and Combinatorial Optimization; John Wiley & Sons: New York, NY, USA, 1999; pp. 1–26.
22. Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. **2003**, 3, 1157–1182.
23. Morán-Fernández, L.; Bolón-Canedo, V.; Alonso-Betanzos, A. Can classification performance be predicted by complexity measures? A study using microarray data. Knowl. Inf. Syst. **2017**, 51, 1067–1090.
24. Blagus, R.; Lusa, L. Improved shrunken centroid classifiers for high-dimensional class-imbalanced data. BMC Bioinform. **2013**, 14, 64.
25. Prabakaran, S.; Sahu, R.; Verma, S. Classification of multi class dataset using wavelet power spectrum. Data Min. Knowl. Discov. **2007**, 15, 297–319.
26. Krawczyk, B. Learning from imbalanced data: Open challenges and future directions. Prog. Artif. Intell. **2016**, 5, 221–232.
27. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. **1948**, 27, 379–423.
28. Brissaud, J.B. The meanings of entropy. Entropy **2005**, 7, 68–96.
29. Sáez, J.A.; Luengo, J.; Herrera, F. Predicting noise filtering efficacy with data complexity measures for nearest neighbor classification. Pattern Recognit. **2013**, 46, 355–364.
30. Garcia, L.P.; de Carvalho, A.C.; Lorena, A.C. Effect of label noise in the complexity of classification problems. Neurocomputing **2015**, 160, 108–119.
31. Lorena, A.C.; Maciel, A.I.; de Miranda, P.B.; Costa, I.G.; Prudêncio, R.B. Data complexity meta-features for regression problems. Mach. Learn. **2018**, 107, 209–246.
32. Leyva, E.; González, A.; Perez, R. A set of complexity measures designed for applying meta-learning to instance selection. IEEE Trans. Knowl. Data Eng. **2014**, 27, 354–367.
33. Lorena, A.C.; Costa, I.G.; Spolaôr, N.; De Souto, M.C. Analysis of complexity indices for classification problems: Cancer gene expression data. Neurocomputing **2012**, 75, 33–42.
34. Wolpert, D.H.; Macready, W.G. No free lunch theorems for optimization. IEEE Trans. Evol. Comput. **1997**, 1, 67–82.
35. L’Heureux, A.; Grolinger, K.; Elyamany, H.F.; Capretz, M.A. Machine learning with big data: Challenges and approaches. IEEE Access **2017**, 5, 7776–7797.
36. He, H.; Garcia, E.A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. **2009**, 21, 1263–1284.
37. Johnson, J.M.; Khoshgoftaar, T.M. Survey on deep learning with class imbalance. J. Big Data **2019**, 6, 27.
38. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. **2002**, 16, 321–357.
39. He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Chemnitz, Germany, 1–8 June 2008; pp. 1322–1328.
40. Hart, P. The condensed nearest neighbor rule (Corresp.). IEEE Trans. Inf. Theory **1968**, 14, 515–516.
41. Wilson, D.L. Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans. Syst. Man Cybern. **1972**, 3, 408–421.
42. Tomek, I. Two modifications of CNN. IEEE Trans. Syst. Man Cybern. **1976**, 6, 769–772.
43. Laurikkala, J. Improving identification of difficult small classes by balancing class distribution. In Proceedings of the Conference on Artificial Intelligence in Medicine in Europe, Hong Kong, China, 1–8 June 2008; Springer: Berlin/Heidelberg, Germany, 2001; pp. 63–66.
44. Hussain, M.; Wajid, S.K.; Elzaart, A.; Berbar, M. A comparison of SVM kernel functions for breast cancer detection. In Proceedings of the 2011 Eighth International Conference Computer Graphics, Imaging and Visualization, Washington, DC, USA, 17–19 August 2011; pp. 145–150.
45. Wu, X.; Kumar, V.; Quinlan, J.R.; Ghosh, J.; Yang, Q.; Motoda, H.; McLachlan, G.J.; Ng, A.; Liu, B.; Yu, P.S.; et al. Top 10 algorithms in data mining. Knowl. Inf. Syst. **2008**, 14, 1–37.

**Figure 1.** CO₂e and power consumption (modified from Strubell et al., 2019). ELMo, Embeddings from Language Models; BERT, Bidirectional Encoder Representations from Transformers; NAS, Neural Architecture Search.

**Figure 3.** Decision tree for algorithm selection rules. Abbreviations: RQ, coefficient of determination; Silhouette, mean silhouette score; NN, neighborhood; Net, network; Num of NA, number of missing values; Num of Num, number of numeric variables; Num of Features, number of features. Linearity, Overlap, Dimensionality, Variable Entropy, Entropy of Classes, HHI, and Number of Classes denote the data characteristics of the same name.

**Figure 4.** Decision tree for sampling method selection rules. Abbreviations: RQ, coefficient of determination; Silhouette, mean silhouette score; NN, neighborhood; Net, network; Num of NA, number of missing values; Num of Num, number of numeric variables; Num of Features, number of features. Linearity, Overlap, Dimensionality, Variable Entropy, Entropy of Classes, HHI, and Number of Classes denote the data characteristics of the same name.

Data | Number of Classes | Number of Features | Number of Instances | Imbalance Ratio | HHI |
---|---|---|---|---|---|
Wine | 3 | 13 | 178 | 0.676 | 0.342 |
new_thyroid | 3 | 6 | 215 | 0.200 | 0.533 |
hayes_roth | 3 | 5 | 132 | 0.588 | 0.350 |
Cmc | 3 | 10 | 1473 | 0.529 | 0.354 |
Dermatology | 6 | 35 | 366 | 0.179 | 0.201 |
Glass | 6 | 10 | 214 | 0.118 | 0.263 |
balance_scale | 3 | 5 | 625 | 0.170 | 0.431 |
page_block | 5 | 11 | 5473 | 0.006 | 0.810 |
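
The surviving text does not define the Imbalance Ratio and HHI columns. The reported values are consistent with two simple definitions, which this Python sketch assumes: the imbalance ratio as the size of the smallest class divided by the size of the largest class, and HHI as the sum of squared class shares. For example, UCI Wine has 59, 71 and 48 instances per class, giving 48/71 ≈ 0.676 and an HHI of about 0.342, matching the Wine row. The function names are hypothetical.

```python
from collections import Counter


def class_shares(labels):
    """Proportion of instances in each class."""
    counts = Counter(labels)
    n = sum(counts.values())
    return [c / n for c in counts.values()]


def imbalance_ratio(labels):
    """Assumed definition: smallest class size divided by largest class size."""
    counts = Counter(labels).values()
    return min(counts) / max(counts)


def hhi(labels):
    """Herfindahl-Hirschman Index of the class distribution: the sum of squared
    class shares, from 1/k for k balanced classes up to 1 for a single class."""
    return sum(p * p for p in class_shares(labels))


# UCI Wine: 59, 71 and 48 instances in classes 1, 2 and 3.
wine = [1] * 59 + [2] * 71 + [3] * 48
print(round(imbalance_ratio(wine), 3), round(hhi(wine), 3))  # 0.676 0.342
```

Under the same assumptions, new_thyroid (150, 35 and 30 instances per class) gives 0.200 and 0.533, which also agrees with the table.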

Method | Description |
---|---|
Random Oversampling (ROS) | Minority-class instances are randomly selected and duplicated (sampling with replacement) until the minority class is the same size as the majority class. ROS is very convenient to use and loses almost no information. However, if the sampling rate is increased excessively, overfitting may occur because the same minority-class instances are duplicated repeatedly. |
Synthetic Minority Oversampling Technique (SMOTE) | A random minority-class instance is selected, and new synthetic instances are generated between it and its k nearest minority-class neighbors [38]. Unlike ROS, which duplicates existing minority-class instances, SMOTE was proposed to avoid overfitting by generating new data. |
Adaptive Synthetic Sampling Approach for Imbalanced Learning (ADASYN) | An extension of SMOTE that generates data according to the density distribution of the minority class [39]: more synthetic instances are generated for minority instances that are harder to learn, that is, those with more majority-class neighbors. Because it considers the density distribution of the data, the method stays close to the distribution of the original data. |
Random Undersampling (RUS) | Majority-class instances are randomly deleted to adjust the class proportions. RUS is easy to apply to large-scale data and can reduce computational cost by reducing the amount of data. However, because instances are removed arbitrarily, important information is likely to be lost. |
Condensed Nearest Neighbors (CNN) | CNN removes redundant majority-class instances, retaining only a representative subset of the data distribution that still classifies the training data correctly with the 1-nearest-neighbor rule. The retained instances are mainly those near the boundaries between classes [40]. Instances are added to the stored set one by one, and the reduced dataset is built by discarding redundant instances. |
Edited Nearest Neighbors (ENN) | Unlike CNN, ENN removes an instance from the set X if it is misclassified by its k nearest neighbors [41]. |
Tomek link | Derived from the CNN sampling method [42], the Tomek link method removes instances near the decision boundary. A Tomek link is a pair of instances from different classes that are each other's nearest neighbors; removing these instances eliminates ambiguous data that overlap with other classes. The method is therefore regarded as an efficient way to remove noisy or borderline data. |
Neighborhood Cleaning Rule (NCL) | NCL combines condensed nearest neighbors (CNN) and edited nearest neighbors (ENN). It clarifies class boundaries by removing majority-class instances near the boundary while leaving minority-class instances untouched [43]. |
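
To make the mechanics of these methods concrete, the following NumPy sketch implements simplified versions of ROS, SMOTE, Tomek link detection and ENN. It is an illustration under stated assumptions, not the implementation used in this study: it assumes Euclidean distance and brute-force neighbor search, and all function names are hypothetical. Libraries such as imbalanced-learn provide full implementations with many refinements.

```python
import numpy as np


def _neighbour_distances(X):
    """Pairwise Euclidean distances; a point is never its own neighbour."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return d


def random_oversample(X, y, rng):
    """ROS: duplicate randomly chosen instances of each smaller class
    (sampling with replacement) until all classes match the largest one."""
    classes, counts = np.unique(y, return_counts=True)
    X_parts, y_parts = [X], [y]
    for c, n in zip(classes, counts):
        idx = rng.choice(np.flatnonzero(y == c), size=counts.max() - n, replace=True)
        X_parts.append(X[idx])
        y_parts.append(y[idx])
    return np.vstack(X_parts), np.concatenate(y_parts)


def smote(X_min, n_new, k, rng):
    """SMOTE: each synthetic point lies on the segment between a random minority
    instance and one of its k nearest minority-class neighbours."""
    neighbours = np.argsort(_neighbour_distances(X_min), axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[partner] - X_min[base])


def tomek_links(X, y):
    """Index pairs (i, j), i < j, of opposite-class instances that are
    each other's nearest neighbour."""
    nn = _neighbour_distances(X).argmin(axis=1)
    return [(i, int(j)) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def edited_nearest_neighbours(X, y, k=3):
    """ENN: boolean mask keeping instances whose label matches the majority
    label of their k nearest neighbours (ties go to the smallest label)."""
    nn = np.argsort(_neighbour_distances(X), axis=1)[:, :k]
    keep = []
    for i in range(len(y)):
        labels, votes = np.unique(y[nn[i]], return_counts=True)
        keep.append(labels[votes.argmax()] == y[i])
    return np.array(keep)


rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0],
              [5.0, 5.0], [5.1, 5.0], [5.2, 5.0], [5.3, 5.0], [5.4, 5.0]])
y = np.array([1, 1, 1, 0, 0, 0, 0, 0])
X_ros, y_ros = random_oversample(X, y, rng)
print(np.bincount(y_ros))  # [5 5]
synthetic = smote(X[y == 1], n_new=4, k=2, rng=rng)
print(synthetic.shape)  # (4, 2)
```

The sketch applies ENN to every instance, following the description of Wilson's rule in the table; undersampling variants often restrict removal to the majority class, and a Tomek-link cleaner would typically drop only the majority-class member of each returned pair.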

Data Characteristic | Mean Decrease Accuracy | Mean Decrease Gini |
---|---|---|
Variable Entropy | 41.594 | 33.339 |
Network | 39.014 | 33.990 |
Coefficient of Determination | 38.380 | 27.771 |
Mean Silhouette Score | 36.215 | 26.814 |
Linearity | 34.618 | 25.051 |
Overlap | 33.996 | 25.400 |
Neighborhood | 33.315 | 25.976 |
Number of Instances | 26.958 | 18.619 |
Entropy of Classes | 20.992 | 15.608 |
HHI | 19.424 | 13.738 |
Dimensionality | 15.510 | 8.321 |
Number of Missing Values | 13.542 | 3.068 |
Number of Numeric Variables | 10.892 | 3.782 |
Number of Features | 10.744 | 3.804 |
Number of Classes | 6.075 | 0.964 |
Number of Nominal Features | 0.000 | 0.000 |

Data Characteristic | Mean Decrease Accuracy | Mean Decrease Gini |
---|---|---|
Overlap | 41.423 | 34.153 |
Coefficient of Determination | 39.421 | 29.778 |
Network | 38.493 | 31.584 |
Feature Entropy | 38.077 | 32.219 |
Mean Silhouette Score | 37.022 | 31.640 |
Neighborhood | 36.737 | 29.920 |
Linearity | 33.677 | 24.640 |
Number of Instances | 25.214 | 15.447 |
Entropy of Classes | 23.919 | 14.360 |
HHI | 23.122 | 13.595 |
Number of Missing Values | 18.096 | 4.990 |
Dimensionality | 15.644 | 6.733 |
Number of Features | 8.798 | 1.925 |
Number of Numeric Variables | 8.464 | 1.759 |
Number of Classes | 6.160 | 0.532 |
Number of Nominal Features | 0.000 | 0.000 |
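
"Mean Decrease Accuracy" in the two importance tables is a permutation importance: the drop in classification accuracy when the values of one data characteristic are randomly shuffled, which breaks that characteristic's link to the label. In Breiman's random forest this drop is measured on each tree's out-of-bag samples; the NumPy sketch below simplifies it to a single prediction function evaluated on one dataset. The `predict` interface and the function name are hypothetical, and "Mean Decrease Gini" (the total decrease in Gini impurity from splits on a characteristic, averaged over trees) is not covered here.

```python
import numpy as np


def mean_decrease_accuracy(predict, X, y, n_repeats=10, seed=0):
    """Average accuracy drop per feature when that feature is permuted.

    predict: any fitted classifier's prediction function mapping X -> labels.
    """
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break feature j's link to y
            drops.append(baseline - np.mean(predict(X_perm) == y))
        importances[j] = np.mean(drops)
    return importances


data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(int)


def only_first_feature(A):
    """A stand-in 'model' that looks only at feature 0."""
    return (A[:, 0] > 0).astype(int)


print(mean_decrease_accuracy(only_first_feature, X, y))
```

Because the stand-in model ignores feature 1, shuffling it leaves the predictions unchanged and its importance is exactly zero, while shuffling feature 0 drops accuracy to roughly chance level.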

Data | Number of Classes | Number of Variables | Number of Instances | Imbalance Ratio | HHI |
---|---|---|---|---|---|
Africa | 3 | 5 | 583 | 0.379 | 0.402 |
China | 3 | 18 | 341 | 0.190 | 0.513 |
South Korea | 2 | 29 | 187 | 0.307 | 0.640 |
United States | 3 | 10 | 561 | 0.275 | 0.423 |

Data Characteristic | Africa | China | South Korea | United States |
---|---|---|---|---|
Number of Classes | 3 | 3 | 2 | 3 |
HHI | 0.403 | 0.513 | 0.64 | 0.423 |
Number of Instances | 524.7 | 306.9 | 168.3 | 504.9 |
Number of Variables | 5 | 18 | 29 | 10 |
Number of Numeric Variables | 4 | 17 | 28 | 9 |
Number of Nominal Features | 0 | 0 | 0 | 0 |
Number of Missing Values | 6 | 10 | 16 | 0 |
Coefficient of Determination | 0.013 | 0.199 | 0.684 | 0.233 |
Entropy of Classes | 1.444 | 1.22 | 0.787 | 1.394 |
Entropy of Variables | 9.461 | 8.114 | 7.733 | 8.927 |
Dimensionality | 0.25 | 0.353 | 0.107 | 0.444 |
Linearity | 0.242 | 0.133 | 0.000 | 0.150 |
Network | 0.882 | 0.71 | 0.749 | 0.899 |
Overlap | 0.949 | 0.952 | 0.929 | 0.932 |
Neighborhood | 0.478 | 0.489 | 0.47 | 0.472 |
Mean Silhouette Score | 0.677 | 0.361 | 0.638 | 0.479 |
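
For a two-class dataset, the imbalance ratio alone fixes the class shares: with IR = minority size / majority size, the shares are 1/(1 + IR) and IR/(1 + IR). This allows a quick consistency check of the South Korea column, whose IR is reported as 0.307. Assuming that "Entropy of Classes" is Shannon entropy in bits and that HHI is the sum of squared class shares, the Python sketch below gives about 0.641 and 0.786, matching the reported 0.640 and 0.787 up to the rounding of the IR. These definitions are our reading of the numbers rather than something stated in the surviving text, and the function names are hypothetical.

```python
from math import log2


def two_class_shares(ir):
    """Class shares implied by IR = minority size / majority size for two classes."""
    majority = 1 / (1 + ir)
    return [majority, 1 - majority]


def hhi(shares):
    """Sum of squared class shares."""
    return sum(p * p for p in shares)


def entropy_bits(shares):
    """Shannon entropy of the class distribution, in bits (base 2)."""
    return -sum(p * log2(p) for p in shares if p > 0)


south_korea = two_class_shares(0.307)  # IR reported for the South Korea dataset
print(round(hhi(south_korea), 3), round(entropy_bits(south_korea), 3))
```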

Method | F-Score | F-Score (s.d.) | Elapsed Time (s) | Elapsed Time s.d. (s) |
---|---|---|---|---|
Random | 0.618 | 0.187 | 1.858 | 0.870 |
Ensemble | 0.555 | 0.260 | 5.018 | 1.936 |
Greedy | 0.763 | 0.263 | 9.288 | 4.348 |
Proposed | 0.765 | 0.126 | 0.740 | 0.916 |

**Table 8.** Performance comparison of methods for selecting a class imbalance resolution method.

Method | F-Score | F-Score (s.d.) | Elapsed Time (s) | Elapsed Time s.d. (s) |
---|---|---|---|---|
Ensemble | 0.679 | 0.172 | 5.627 | 2.704 |
Random | 0.568 | 0.241 | 1.654 | 0.750 |
Greedy | 0.693 | 0.418 | 13.236 | 5.998 |
Proposed | 0.688 | 0.173 | 0.567 | 0.371 |
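
The comparison tables report an F-score, its standard deviation and the elapsed time, but the surviving text does not state how the F-score is averaged over classes. The Python sketch below assumes a macro-averaged F1 (the unweighted mean of per-class F1 scores), summarizes repeated runs by their mean and sample standard deviation, and measures wall-clock time with `time.perf_counter`. All of these choices are assumptions, and the helper names are hypothetical.

```python
import time
from statistics import mean, stdev


def f1_for_class(y_true, y_pred, label):
    """F1 of one class treated as positive: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    return mean(f1_for_class(y_true, y_pred, c) for c in labels)


def summarize(scores):
    """Mean and sample standard deviation of repeated F-scores."""
    return mean(scores), stdev(scores)


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


score, seconds = timed(macro_f1, [0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 1, 2])
print(round(score, 3), seconds >= 0)  # 0.867 True
```

Macro averaging weights every class equally, so minority classes count as much as the majority class, which is one common reason to prefer it over accuracy on imbalanced data.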

| Data | Performance | k-NN | Logistic Regression (LR) | Naïve Bayes (NB) | Random Forest (RF) | SVM | Ensemble | Recommended Algorithm |
|---|---|---|---|---|---|---|---|---|
| Africa | F-score | 0.49 | 0.17 | 0.67 | 0.56 | 0.34 | 0.27 | Random Forest |
| | Elapsed | 0.01 | 0.11 | 0.00 | 0.08 | 0.03 | 0.21 | |
| | Elapsed Sum | 0.37 | 3.46 | 0.21 | 4.64 | 1.83 | 6.58 | |
| | Total Time | 4.30 | 5.99 | 0.78 | 5.10 | 2.36 | 9.23 | |
| China | F-score | 0.39 | 0.53 | 0.71 | 0.53 | 0.55 | 0.45 | Naïve Bayes |
| | Elapsed | 0.01 | 0.04 | 0.01 | 0.13 | 0.04 | 0.14 | |
| | Elapsed Sum | 0.46 | 1.16 | 0.37 | 7.18 | 2.35 | 4.71 | |
| | Total Time | 1.94 | 3.22 | 0.96 | 7.79 | 2.96 | 6.80 | |
| South Korea | F-score | 0.72 | 0.81 | 0.94 | 0.95 | 0.88 | 0.88 | k-NN, Logistic Regression |
| | Elapsed | 0.01 | 0.01 | 0.01 | 0.04 | 0.02 | 0.08 | |
| | Elapsed Sum | 0.38 | 0.45 | 0.59 | 2.11 | 0.87 | 2.40 | |
| | Total Time | 1.23 | 1.14 | 0.97 | 2.31 | 1.19 | 3.06 | |
| United States | F-score | 0.54 | 0.56 | 0.73 | 0.67 | 0.62 | 0.62 | Naïve Bayes, Random Forest |
| | Elapsed | 0.01 | 0.05 | 0.00 | 0.11 | 0.04 | 0.19 | |
| | Elapsed Sum | 0.38 | 1.80 | 0.27 | 6.11 | 2.16 | 6.38 | |
| | Total Time | 3.29 | 6.05 | 0.86 | 6.59 | 2.74 | 10.38 | |

| Data | Performance | ADASYN | CNN | ENN | NCL | ROS | RUS | SMOTE | Tomek | Recommended Sampling Method | Suggested Method Elapsed |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Africa | F-score | 0.15 | 0.25 | 0.57 | 0.04 | 0.55 | 0.34 | 0.53 | 0.34 | ADASYN, SMOTE | 0.02 |
| | Sampling Time | 0.05 | 0.11 | 0.02 | 0.09 | 0.04 | 0.10 | 0.04 | 0.21 | | |
| | Sum of Sampling Time | 0.38 | 0.77 | 0.18 | 0.65 | 2.83 | 7.32 | 3.29 | 1.67 | | |
| | Total Time | 0.61 | 8.62 | 0.31 | 1.25 | 3.14 | 7.42 | 4.18 | 2.23 | | |
| China | F-score | 0.28 | 0.45 | 0.49 | 0.06 | 0.62 | 0.51 | 0.62 | 0.41 | ROS | |
| | Sampling Time | 0.08 | 0.09 | 0.03 | 0.06 | 0.06 | 0.06 | 0.06 | 0.07 | | |
| | Sum of Sampling Time | 0.60 | 0.72 | 0.23 | 0.49 | 4.82 | 4.62 | 4.25 | 0.50 | | |
| | Total Time | 0.79 | 5.15 | 0.35 | 1.10 | 5.16 | 4.90 | 5.41 | 0.81 | | |
| South Korea | F-score | 0.91 | 0.81 | 0.92 | 0.89 | 0.92 | 0.79 | 0.92 | 0.85 | SMOTE | |
| | Sampling Time | 0.03 | 0.04 | 0.02 | 0.05 | 0.02 | 0.03 | 0.02 | 0.04 | | |
| | Sum of Sampling Time | 0.22 | 0.25 | 0.17 | 0.36 | 1.54 | 2.30 | 1.64 | 0.32 | | |
| | Total Time | 0.29 | 1.71 | 0.27 | 0.45 | 1.77 | 2.55 | 2.14 | 0.72 | | |
| United States | F-score | 0.68 | 0.54 | 0.67 | 0.60 | 0.68 | 0.58 | 0.66 | 0.56 | ADASYN, SMOTE, ROS | |
| | Sampling Time | 0.06 | 0.15 | 0.03 | 0.09 | 0.05 | 0.08 | 0.05 | 0.10 | | |
| | Sum of Sampling Time | 0.45 | 1.22 | 0.23 | 0.74 | 4.00 | 5.87 | 3.86 | 0.73 | | |
| | Total Time | 0.69 | 10.80 | 0.39 | 1.71 | 4.22 | 5.98 | 4.89 | 1.23 | | |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Kim, J.; Kwon, O.
A Model for Rapid Selection and COVID-19 Prediction with Dynamic and Imbalanced Data. *Sustainability* **2021**, *13*, 3099.
https://doi.org/10.3390/su13063099

**AMA Style**

Kim J, Kwon O.
A Model for Rapid Selection and COVID-19 Prediction with Dynamic and Imbalanced Data. *Sustainability*. 2021; 13(6):3099.
https://doi.org/10.3390/su13063099

**Chicago/Turabian Style**

Kim, Jeonghun, and Ohbyung Kwon.
2021. "A Model for Rapid Selection and COVID-19 Prediction with Dynamic and Imbalanced Data" *Sustainability* 13, no. 6: 3099.
https://doi.org/10.3390/su13063099