# Resampling Imbalanced Network Intrusion Datasets to Identify Rare Attacks


## Abstract


## 1. Introduction

## 2. Background

## 3. Related Works

## 4. The Datasets

#### 4.1. UNSW-NB15

#### 4.2. UWF-ZeekData22

## 5. Experimental Design

#### 5.1. The Classifier Used: Random Forest

#### 5.2. Preprocessing

#### 5.2.1. Information Gain

- $Info(D)$ is the average amount of information needed to identify the class label of a tuple in the data, $D$;
- $Info_A(D)$ is the expected information required to classify a tuple from $D$ based on the partitioning by attribute $A$;
- $p_i$ is the nonzero probability that an arbitrary tuple belongs to a class;
- $|D_j|/|D|$ is the weight of the $j$th partition;
- $v$ is the number of distinct values in attribute $A$.
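The quantities above can be sketched directly in code. The following is a minimal illustration (not the paper's implementation), assuming base-2 logarithms, where `entropy` computes $Info(D)$ and `info_gain` computes $Info(D) - Info_A(D)$:

```python
import math
from collections import Counter

def entropy(labels):
    """Info(D): expected bits needed to identify the class label of a tuple in D."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def info_gain(attribute_values, labels):
    """Gain(A) = Info(D) - Info_A(D): the entropy reduction after
    partitioning D by attribute A, each partition weighted by |Dj|/|D|."""
    total = len(labels)
    partitions = {}
    for v, y in zip(attribute_values, labels):
        partitions.setdefault(v, []).append(y)
    info_a = sum((len(p) / total) * entropy(p) for p in partitions.values())
    return entropy(labels) - info_a
```

An attribute that perfectly separates the classes yields a gain equal to $Info(D)$, while an attribute independent of the class yields a gain of zero.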

#### 5.2.2. Preprocessing UNSW-NB15

- ct_flw_http_mthd and is_ftp_login;
- Unique identifiers and time stamps;
- IP addresses.

- Missing (NaN) values in the attack category column were filled with zeros;
- Categorical data (protocol, state, and attack category) were converted to numeric representations;
- A normalization technique was applied to all continuous numeric variables.
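As an illustration of the last step, assuming the normalization used was min-max scaling to [0, 1] (the specific formula is not reproduced here, so this is a hedged sketch rather than the paper's exact procedure):

```python
def min_max_normalize(column):
    """Rescale a numeric column to the [0, 1] range."""
    lo, hi = min(column), max(column)
    if hi == lo:                      # constant column: no spread to rescale
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]
```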

#### 5.2.3. Preprocessing UWF-ZeekData22

- Continuous features (duration, orig_bytes, orig_pkts, orig_ip_bytes, resp_bytes, resp_pkts, resp_ip_bytes, and missed_bytes) were binned using a moving mean;
- Nominal features, that is, features that contain non-numeric data, were converted to numbers using the StringIndexer method from MLib [25], Apache Spark’s scalable machine learning library. The nominal features in this dataset were proto, conn_state, local_orig, history, and service;
- The IP address columns were categorized using the commonly recognized network classifications [26];
- Port numbers were binned as per the Internet Assigned Numbers Authority (IANA) [27].
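The last two steps can be sketched as follows; `bin_port` and `classify_ip` are hypothetical helper names (not from the paper's code) that follow the IANA port ranges [27] and the classful IPv4 first-octet ranges [26]:

```python
def bin_port(port):
    """Bin a port number into the three IANA-defined ranges."""
    if 0 <= port <= 1023:
        return "well_known"
    if port <= 49151:
        return "registered"
    return "dynamic_private"          # 49152-65535

def classify_ip(ip):
    """Map an IPv4 address to its classful network (A-E) by first octet."""
    first_octet = int(ip.split(".")[0])
    if first_octet < 128:
        return "A"
    if first_octet < 192:
        return "B"
    if first_octet < 224:
        return "C"
    if first_octet < 240:
        return "D"
    return "E"
```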

## 6. Hardware and Software Configurations

#### 6.1. Hardware and Software Used in Random Undersampling before Stratified Splitting

#### 6.2. Python Libraries Used in Random Undersampling before Stratified Splitting

#### 6.3. Hardware and Software Used in Random Undersampling after Stratified Splitting

#### 6.4. Python Libraries Used in Random Undersampling after Stratified Splitting

#### 6.5. Stratified Sampling
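The two building blocks compared in this work, stratified splitting and random undersampling, can be sketched in pure Python. The helper names below are illustrative (the experiments used scikit-learn and imblearn, not this code); the key point is that the order of the two operations — undersampling before or after the stratified split — changes which rows the classifier ever sees:

```python
import random
from collections import defaultdict

def stratified_split(y, test_frac=0.2, seed=0):
    """Split row indices so each class keeps the same train/test proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = int(round(len(indices) * test_frac))
        test.extend(indices[:cut])
        train.extend(indices[cut:])
    return sorted(train), sorted(test)

def random_undersample(indices, y, seed=0):
    """Randomly drop majority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[y[i]].append(i)
    n_min = min(len(v) for v in by_class.values())
    kept = []
    for class_indices in by_class.values():
        kept.extend(rng.sample(class_indices, n_min))
    return sorted(kept)
```

Undersampling after the split (as sketched here, applied only to the training indices) leaves the test set's class distribution untouched, whereas undersampling before the split alters both.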

## 7. Metrics Used for the Assessment of Results

#### 7.1. Classification Metrics Used

- Accuracy = [True Positives + True Negatives]/[True Positives + False Positives + True Negatives + False Negatives];
- Precision = True Positives/[True Positives + False Positives];
- Recall = True Positives/All Real Positives;
- F-score = (2 × Precision × Recall)/(Precision + Recall);
- All Real Positives = [True Positives + False Negatives];
- All Real Negatives = [True Negatives + False Positives].
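These metrics follow directly from the four confusion-matrix counts. The helper below is an illustrative sketch, not the scikit-learn implementation [28,30] actually used for the results:

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F-score from
    confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0   # TP / all real positives
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    return accuracy, precision, recall, f_score
```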

#### 7.2. Welch’s t-Tests

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} $$

where $\bar{x}_1$ and $\bar{x}_2$ are the sample means, $s_1^2$ and $s_2^2$ are sample variances, $n_1$ and $n_2$ are sample sizes, and the df $v$ is calculated using the Satterthwaite approximation.
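The statistic and the Satterthwaite df can be computed directly from two samples; `welch_t` below is a minimal sketch for illustration, not the implementation used to produce the t-values in the tables that follow:

```python
import math

def welch_t(sample1, sample2):
    """Welch's t statistic and the Satterthwaite degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2
    var1 = sum((x - mean1) ** 2 for x in sample1) / (n1 - 1)  # s1^2
    var2 = sum((x - mean2) ** 2 for x in sample2) / (n2 - 1)  # s2^2
    se_sq = var1 / n1 + var2 / n2
    t = (mean1 - mean2) / math.sqrt(se_sq)
    # Satterthwaite approximation for the degrees of freedom
    df = se_sq ** 2 / ((var1 / n1) ** 2 / (n1 - 1)
                       + (var2 / n2) ** 2 / (n2 - 1))
    return t, df
```

Unlike Student's t-test, the pooled-variance assumption is dropped, so the two runs being compared may have unequal variances.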

## 8. Results and Discussion

#### 8.1. Selection of an Oversampling Technique

#### 8.2. Resampling before and after Splitting

#### 8.2.1. Random Undersampling before Stratified Splitting

#### 8.2.2. Random Undersampling after Stratified Splitting

## 9. Conclusions

## 10. Future Work

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

- Zippia. How Many People Use the Internet? Available online: https://www.zippia.com/advice/how-many-people-use-the-internet/ (accessed on 1 March 2023).
- CSO. Up to Three Percent of Internet Traffic is Malicious, Researcher Says. Available online: https://www.csoonline.com/article/2122506/up-to-three-percent-of-internet-traffic-is-malicious--researcher-says.html (accessed on 15 February 2023).
- Bagui, S.; Li, K. Resampling Imbalanced Data for Network Intrusion Detection Datasets. J. Big Data **2021**, 8, 6.
- Moustafa, N.; Slay, J. UNSW-NB15: A Comprehensive Data Set for Network Intrusion Detection Systems (UNSW-NB15 network data set). In Proceedings of the 2015 Military Communications and Information Systems Conference (MilCIS), Canberra, ACT, Australia, 10–12 November 2015; pp. 1–6.
- UWF-ZeekData22 Dataset. Available online: Datasets.uwf.edu (accessed on 1 February 2023).
- Machine Learning Mastery. Random Oversampling and Undersampling for Imbalanced Classification. Available online: https://imbalanced-learn.readthedocs.io/en/stable/generated/imblearn.under_sampling.RandomUnderSampler.html#imblearn.under_sampling.RandomUnderSampler (accessed on 12 December 2022).
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. **2002**, 16, 321–357.
- Han, H.; Wang, W.-Y.; Mao, B.-H. Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning. In Proceedings of the International Conference on Intelligent Computing, Hefei, China, 23–26 August 2005.
- He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–8 June 2008; pp. 1322–1328.
- Abdi, L.; Hashemi, S. To Combat Multi-class Imbalanced Problems by Means of Over-sampling Techniques. IEEE Trans. Knowl. Data Eng. **2016**, 28, 238–251.
- Imbalanced-Learn. RandomUnderSampler. Available online: https://imbalanced-learn.org/stable/references/generated/imblearn.under_sampling.RandomUnderSampler.html (accessed on 5 January 2023).
- Shamsudin, H.; Yusof, U.; Jayalakshmi, A.; Akmal Khalid, M. Combining Oversampling and Undersampling Techniques for Imbalanced Classification: A Comparative Study Using Credit Card Fraudulent Transaction Dataset. In Proceedings of the 2020 IEEE 16th International Conference on Control & Automation, Singapore, 9–11 October 2020.
- Barandela, R.; Sánchez, J.S.; García, V.; Rangel, E. Strategies for Learning in Class Imbalance Problems. Pattern Recognit. **2003**, 36, 849–851.
- Vandewiele, G.; Dehaene, I.; Kovács, G.; Sterckx, L.; Janssens, O.; Ongenae, F.; De Backere, F.; De Turck, F.; Roelens, K.; Decruyenaere, J.; et al. Overly Optimistic Prediction Results on Imbalanced Data: Flaws and Benefits of Applying Over-sampling. Artif. Intell. Med. **2020**, preprint.
- Bajer, D.; Zonć, B.; Dudjak, M.; Martinović, G. Performance Analysis of SMOTE-based Oversampling Techniques When Dealing with Data Imbalance. In Proceedings of the 2019 International Conference on Systems, Signals and Image Processing (IWSSIP), Osijek, Croatia, 5–7 June 2019; pp. 265–271.
- Bagui, S.; Simonds, J.; Plenkers, R.; Bennett, T.A.; Bagui, S. Classifying UNSW-NB15 Network Traffic in the Big Data Framework Using Random Forest in Spark. Int. J. Big Data Intell. Appl. **2021**, 2, 39–61.
- Koziarski, M. CSMOUTE: Combined Synthetic Oversampling and Undersampling Technique for Imbalanced Data Classification. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18–22 July 2021; pp. 1–8.
- Liu, A.Y. The Effect of Oversampling and Undersampling on Classifying Imbalanced Text Datasets. Ph.D. Thesis, The University of Texas at Austin, Austin, TX, USA, 2004.
- Estabrooks, A.; Jo, T.; Japkowicz, N. A Multiple Resampling Method for Learning from Imbalanced Data Sets. Comput. Intell. **2004**, 20, 18–36.
- Gonzalez-Cuautle, D.; Hernandez-Suarez, A.; Sanchez-Perez, G.; Toscano-Medina, L.K.; Portillo-Portillo, J.; Olivares-Mercado, J.; Perez-Meana, H.M.; Sandoval-Orozco, A.L. Synthetic Minority Oversampling Technique for Optimizing Classification Tasks in Botnet and Intrusion-Detection-System Datasets. Appl. Sci. **2020**, 10, 794.
- Bagui, S.S.; Mink, D.; Bagui, S.C.; Ghosh, T.; Plenkers, R.; McElroy, T.; Dulaney, S.; Shabanali, S. Introducing UWF-ZeekData22: A Comprehensive Network Traffic Dataset Based on the MITRE ATT&CK Framework. Data **2023**, 8, 18.
- Bagui, S.; Mink, D.; Bagui, S.; Ghosh, T.; McElroy, T.; Paredes, E.; Khasnavis, N.; Plenkers, R. Detecting Reconnaissance and Discovery Tactics from the MITRE ATT&CK Framework in Zeek Conn Logs Using Spark’s Machine Learning in the Big Data Framework. Sensors **2022**, 22, 7999.
- Han, J.; Kamber, M.; Pei, J. Data Mining: Concepts and Techniques; Morgan Kaufmann: Burlington, MA, USA, 2022.
- Breiman, L. Random Forests. Mach. Learn. **2001**, 45, 5–32.
- Apache Spark. StringIndexer. Available online: https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.feature.StringIndexer.html (accessed on 1 March 2023).
- Understand TCP/IP Addressing and Subnetting Basics. Available online: https://docs.microsoft.com/en-us/troubleshoot/windows-client/networking/tcpip-addressing-and-subnetting (accessed on 1 March 2023).
- Service Name and Transport Protocol Port Number Registry. Available online: https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml (accessed on 2 March 2023).
- Scikit-learn. 3.3 Metrics and Scoring: Quantifying the Quality of Predictions. Available online: https://scikit-learn.org/stable/modules/model_evaluation.html#accuracy-score (accessed on 12 February 2023).
- Powers, D.M.W. Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation. J. Mach. Learn. Technol. **2011**, 2, 37–63.
- sklearn.metrics.precision_recall_fscore_support. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_fscore_support.html (accessed on 12 February 2023).

**Table 1.** UNSW-NB15: distribution of attack families [4].

| Type of Attack | Count | % of Attack Data | % of Benign Data | % of Total Data |
|---|---|---|---|---|
| Worms | 174 | 0.054 | 0.007 | 0.006 |
| Shellcode | 1511 | 0.47 | 0.068 | 0.059 |
| Backdoors | 2329 | 0.724 | 0.104 | 0.091 |
| Analysis | 2677 | 0.833 | 0.12 | 0.105 |
| Reconnaissance | 13,987 | 4.353 | 0.63 | 0.55 |
| DoS | 16,353 | 5.089 | 0.737 | 0.643 |
| Fuzzers | 24,246 | 7.546 | 1.092 | 0.954 |
| Exploits | 44,525 | 13.858 | 2.006 | 1.752 |
| Generic | 215,481 | 67.068 | 9.711 | 8.483 |
| Total attack data | 321,283 | - | - | - |
| Benign data | 2,218,761 | - | - | 87.351 |
| Total | 2,540,044 | - | - | - |

| Label_Tactic | Count | % of Attack Data | % of Benign Data | % of Total Data |
|---|---|---|---|---|
| Persistence | 1 | 0.00001 | 0.00001 | 0.000005 |
| Initial_access | 1 | 0.00001 | 0.00001 | 0.000005 |
| Defense_evasion | 1 | 0.00001 | 0.00001 | 0.000005 |
| Resource_development | 3 | 0.00003 | 0.00003 | 0.00001 |
| Lateral_movement | 4 | 0.00004 | 0.00004 | 0.00002 |
| Exfiltration | 7 | 0.00007 | 0.00007 | 0.00003 |
| Privilege_escalation | 13 | 0.00014 | 0.00014 | 0.00007 |
| Credential_access | 31 | 0.00033 | 0.00033 | 0.00016 |
| Discovery | 2086 | 0.02247 | 0.02247 | 0.01123 |
| Reconnaissance | 9,278,722 | 99.97686 | 99.969 | 49.98646 |
| Total attack data | 9,280,869 | - | - | - |
| Benign_data | 9,281,599 | - | - | 50.00196 |
| Total | 18,562,468 | - | - | - |

| | Random Undersampling before Stratified Splitting | Random Undersampling after Stratified Splitting |
|---|---|---|
| Processor | AMD Ryzen 7 5700 | Intel Core i7-1165G7 |
| RAM | 32 GB | 16 GB |
| OS | Windows 11 Home | Windows 11 Home |
| OS Version | 22H2 | 21H2 |
| OS Build | 22621.819 | 22000.1219 |
| GPU | RTX 3060 | NA |

| | Random Undersampling before Stratified Splitting | Random Undersampling after Stratified Splitting |
|---|---|---|
| Python | 3.9 | 3.10.4 |
| Anaconda | 2022.1 | 2021.5 |
| Pandas | 1.5.2 | 1.5.0 |
| Scikit-learn | 1.9.3 | 1.0.2 |
| Numpy | 1.23.5 | 1.23.4 |
| Imblearn | 0.10.0 | 0 |

| Oversampling % | | Accuracy | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|---|---|
| 0.1 | Avg | 0.9999 | 0.769 | 0.794 | 0.778 | 0.884 |
| | SD | | 0.056 | 0.063 | 0.034 | 0.028 |
| 0.2 | Avg | 0.99991 | 0.806 | 0.779 | 0.789 | 0.902 |
| | SD | | 0.065 | 0.09 | 0.058 | 0.032 |
| 0.3 | Avg | 0.9999 | 0.799 | 0.717 | 0.752 | 0.899 |
| | SD | | 0.059 | 0.075 | 0.047 | 0.029 |
| 0.4 | Avg | 0.99991 | 0.802 | 0.783 | 0.787 | 0.901 |
| | SD | | 0.068 | 0.076 | 0.043 | 0.034 |
| 0.5 | Avg | 0.999 | 0.836 | 0.871 | 0.852 | 0.918 |
| | SD | | 0.051 | 0.037 | 0.033 | 0.026 |
| 0.6 | Avg | 0.999 | 0.828 | 0.851 | 0.838 | 0.914 |
| | SD | | 0.057 | 0.057 | 0.042 | 0.029 |
| 0.7 | Avg | 0.999 | 0.849 | 0.88 | 0.862 | 0.924 |
| | SD | | 0.053 | 0.051 | 0.026 | 0.027 |
| 0.8 | Avg | 0.999 | 0.847 | 0.851 | 0.847 | 0.924 |
| | SD | | 0.056 | 0.051 | 0.039 | 0.028 |
| 0.9 | Avg | 0.999 | 0.807 | 0.892 | 0.845 | 0.903 |
| | SD | | 0.056 | 0.043 | 0.034 | 0.028 |
| 1.0 | Avg | 0.999 | 0.818 | 0.75 | 0.782 | 0.909 |
| | SD | | 0.069 | 0.051 | 0.055 | 0.034 |
| Averages | | | 0.8161 | 0.8168 | 0.8132 | 0.9078 |

| Welch’s t-Test Results (p < 0.10) | Precision t-Value | Recall t-Value | F-Score t-Value | Macro Precision t-Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | −1.366 | 0.441 | −0.451 | −1.366 | 0.1 and 0.2 are statistically equal |
| 0.1 vs. 0.3 | −1.167 | 2.489 | 1.423 | −1.167 | 0.1 is better |
| 0.1 vs. 0.4 | −1.179 | 0.368 | −0.516 | −1.180 | 0.1 and 0.4 are statistically equal |
| 0.1 vs. 0.5 | −2.793 | −3.315 | −4.955 | −2.795 | 0.5 is better than 0.1 |
| 0.5 vs. 0.6 | 0.326 | 0.949 | 0.863 | 0.326 | 0.5 and 0.6 are statistically equal |
| 0.5 vs. 0.7 | −0.551 | −0.423 | −0.722 | −0.551 | 0.5 and 0.7 are statistically equal |
| 0.5 vs. 0.8 | −0.467 | 1.022 | 0.291 | −0.466 | 0.5 and 0.8 are statistically equal |
| 0.5 vs. 0.9 | 1.213 | −1.130 | 0.442 | 1.213 | 0.5 and 0.9 are statistically equal |
| 1.0 vs. 0.5 | 0.649 | 6.075 | 3.451 | 0.651 | 0.5 is better than 1.0 |

| Oversampling % | | Accuracy | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|---|---|
| 0.1 | Avg | 0.9996 | 0.849 | 0.941 | 0.892 | 0.924 |
| | SD | | 0.013 | 0.014 | 0.012 | 0.006 |
| 0.2 | Avg | 0.9997 | 0.889 | 0.969 | 0.927 | 0.944 |
| | SD | | 0.015 | 0.01 | 0.007 | 0.007 |
| 0.3 | Avg | 0.9997 | 0.896 | 0.962 | 0.927 | 0.948 |
| | SD | | 0.013 | 0.012 | 0.007 | 0.006 |
| 0.4 | Avg | 0.9997 | 0.898 | 0.958 | 0.927 | 0.949 |
| | SD | | 0.008 | 0.008 | 0.007 | 0.004 |
| 0.5 | Avg | 0.9996 | 0.887 | 0.964 | 0.924 | 0.944 |
| | SD | | 0.014 | 0.007 | 0.007 | 0.007 |
| 0.6 | Avg | 0.9996 | 0.876 | 0.964 | 0.918 | 0.938 |
| | SD | | 0.012 | 0.014 | 0.008 | 0.006 |
| 0.7 | Avg | 0.9995 | 0.844 | 0.925 | 0.883 | 0.922 |
| | SD | | 0.012 | 0.013 | 0.009 | 0.006 |
| 0.8 | Avg | 0.9995 | 0.846 | 0.932 | 0.887 | 0.923 |
| | SD | | 0.012 | 0.013 | 0.005 | 0.006 |
| 0.9 | Avg | 0.9995 | 0.849 | 0.929 | 0.887 | 0.924 |
| | SD | | 0.013 | 0.01 | 0.007 | 0.006 |
| 1.0 | Avg | 0.9996 | 0.855 | 0.939 | 0.895 | 0.928 |
| | SD | | 0.014 | 0.014 | 0.008 | 0.007 |
| Averages | | 0.9996 | 0.8689 | 0.9483 | 0.9067 | 0.9344 |

| Welch’s t-Test Results (p < 0.10) | Precision t-Value | Recall t-Value | F-Score t-Value | Macro Precision t-Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | −6.530 | −5.250 | −8.049 | −6.537 | 0.2 is better than 0.1 |
| 0.2 vs. 0.3 | −1.093 | 1.438 | −0.107 | −1.092 | 0.2 is better than 0.3 |
| 0.2 vs. 0.4 | −1.613 | 2.616 | −0.005 | −1.609 | 0.2 has better recall |
| 0.4 vs. 0.5 | 2.015 | −1.784 | 0.888 | 2.012 | 0.5 has better recall |
| 0.4 vs. 0.6 | 4.557 | −1.176 | 2.507 | 4.553 | 0.4 is better than 0.6 |
| 0.4 vs. 0.7 | 11.157 | 6.763 | 11.799 | 11.163 | 0.4 is better than 0.7 |
| 0.4 vs. 0.8 | 11.097 | 5.620 | 13.965 | 11.110 | 0.4 is better than 0.8 |
| 0.4 vs. 0.9 | 10.104 | 7.201 | 12.712 | 10.113 | 0.4 is better than 0.9 |
| 0.4 vs. 1.0 | 8.290 | 3.908 | 9.362 | 8.296 | 0.4 is better than 1.0 |

| Oversampling % | | Accuracy | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|---|---|
| 0.1 | Avg | 0.9998 | 0.962 | 0.96 | 0.961 | 0.981 |
| | SD | | 0.006 | 0.006 | 0.003 | 0.003 |
| 0.2 | Avg | 0.9998 | 0.969 | 0.964 | 0.966 | 0.984 |
| | SD | | 0.009 | 0.007 | 0.006 | 0.004 |
| 0.3 | Avg | 0.9997 | 0.962 | 0.95 | 0.956 | 0.981 |
| | SD | | 0.005 | 0.01 | 0.005 | 0.003 |
| 0.4 | Avg | 0.9998 | 0.97 | 0.956 | 0.963 | 0.985 |
| | SD | | 0.007 | 0.009 | 0.005 | 0.003 |
| 0.5 | Avg | 0.9998 | 0.97 | 0.951 | 0.961 | 0.985 |
| | SD | | 0.007 | 0.008 | 0.005 | 0.003 |
| 0.6 | Avg | 0.9998 | 0.965 | 0.954 | 0.96 | 0.982 |
| | SD | | 0.007 | 0.009 | 0.005 | 0.003 |
| 0.7 | Avg | 0.9997 | 0.968 | 0.946 | 0.957 | 0.984 |
| | SD | | 0.006 | 0.011 | 0.006 | 0.003 |
| 0.8 | Avg | 0.9998 | 0.966 | 0.95 | 0.958 | 0.983 |
| | SD | | 0.005 | 0.01 | 0.006 | 0.002 |
| 0.9 | Avg | 0.9998 | 0.967 | 0.957 | 0.962 | 0.983 |
| | SD | | 0.008 | 0.006 | 0.004 | 0.004 |
| 1.0 | Avg | 0.9998 | 0.968 | 0.952 | 0.96 | 0.984 |
| | SD | | 0.009 | 0.01 | 0.008 | 0.004 |
| Averages | | 0.99978 | 0.9667 | 0.954 | 0.9604 | 0.9832 |

| Welch’s t-Test Results (p < 0.10) | Precision t-Value | Recall t-Value | F-Score t-Value | Macro Precision t-Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | −1.813 | −1.602 | −2.697 | −1.818 | 0.2 is better than 0.1 across all metrics |
| 0.2 vs. 0.3 | 1.955 | 3.627 | 4.492 | 1.970 | 0.2 is better than 0.3 across all metrics |
| 0.2 vs. 0.4 | −0.404 | 2.338 | 1.491 | −0.397 | 0.2 is better than 0.4 in recall and F-score |
| 0.2 vs. 0.5 | −0.474 | 3.866 | 2.359 | −0.464 | 0.2 is better than 0.5 in recall and F-score |
| 0.2 vs. 0.6 | 1.070 | 2.621 | 2.861 | 1.079 | 0.2 is better than 0.6 in recall and F-score |
| 0.2 vs. 0.7 | 0.189 | 4.415 | 3.563 | 0.206 | 0.2 is better than 0.7 in recall and F-score |
| 0.2 vs. 0.8 | 0.829 | 3.661 | 3.254 | 0.842 | 0.2 is better than 0.8 in recall and F-score |
| 0.2 vs. 0.9 | 0.524 | 2.378 | 2.136 | 0.531 | 0.2 is better than 0.9 in recall and F-score |
| 0.2 vs. 1.0 | 0.180 | 3.043 | 2.149 | 0.189 | 0.2 is better than 1.0 in recall and F-score |

**Table 11.** UWF-ZeekData22: credential access—classification results for random undersampling before splitting.

| Oversampling % | | Accuracy | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|---|---|
| 0.1 | Avg | 0.999 | 0.742 | 0.889 | 0.801 | 0.871 |
| | SD | | 0.134 | 0.165 | 0.126 | 0.067 |
| 0.2 | Avg | 0.999 | 0.885 | 0.911 | 0.891 | 0.942 |
| | SD | | 0.05 | 0.139 | 0.075 | 0.024 |
| 0.3 | Avg | 0.999 | 0.847 | 0.867 | 0.849 | 0.923 |
| | SD | | 0.101 | 0.147 | 0.107 | 0.05 |
| 0.4 | Avg | 0.999 | 0.836 | 0.856 | 0.83 | 0.918 |
| | SD | | 0.106 | 0.165 | 0.107 | 0.053 |
| 0.5 | Avg | 0.999 | 0.936 | 0.911 | 0.919 | 0.968 |
| | SD | | 0.069 | 0.109 | 0.072 | 0.034 |
| 0.6 | Avg | 0.999 | 0.906 | 0.867 | 0.87 | 0.953 |
| | SD | | 0.103 | 0.163 | 0.094 | 0.052 |
| 0.7 | Avg | 0.999 | 0.882 | 0.922 | 0.894 | 0.941 |
| | SD | | 0.067 | 0.122 | 0.06 | 0.033 |
| 0.8 | Avg | 0.999 | 0.826 | 0.911 | 0.853 | 0.913 |
| | SD | | 0.143 | 0.097 | 0.084 | 0.071 |
| 0.9 | Avg | 0.999 | 0.829 | 0.967 | 0.889 | 0.915 |
| | SD | | 0.079 | 0.071 | 0.054 | 0.04 |
| 1.0 | Avg | 0.999998 | 0.832 | 0.911 | 0.864 | 0.916 |
| | SD | | 0.109 | 0.097 | 0.076 | 0.054 |
| Averages | | 0.9991 | 0.8521 | 0.9012 | 0.866 | 0.926 |

**Table 12.** Welch’s t-test results: UWF-ZeekData22: credential access—random undersampling before splitting.

| Welch’s t-Test Results (p < 0.10) | Precision t-Value | Recall t-Value | F-Score t-Value | Macro Precision t-Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | −3.162 | −0.322 | −1.941 | −3.155 | 0.2 is better than 0.1 in precision, F-score, and macro precision |
| 0.2 vs. 0.3 | 1.066 | 0.688 | 1.016 | 1.083 | 0.2 and 0.3 are statistically the same |
| 0.2 vs. 0.4 | 1.322 | 0.806 | 1.476 | 1.304 | 0.2 is better than 0.4 in F-score |
| 0.2 vs. 0.5 | −1.893 | 0.000 | −0.852 | −1.976 | 0.5 is better than 0.2 in precision and macro precision |
| 0.5 vs. 0.6 | 0.765 | 0.710 | 1.309 | 0.763 | 0.5 and 0.6 are statistically equal |
| 0.5 vs. 0.7 | 0.113 | −0.188 | −0.099 | 0.077 | 0.5 and 0.7 are statistically equal |
| 0.5 vs. 0.8 | 2.191 | 0.000 | 1.886 | 2.209 | 0.5 is better than 0.8 in precision, F-score, and macro precision |
| 0.5 vs. 0.9 | 3.226 | −1.361 | 1.054 | 3.193 | 0.5 is better than 0.9 in precision and macro precision |
| 0.5 vs. 1.0 | 1.398 | 0.000 | 0.800 | 1.391 | 0.5 is better than 1.0 in precision and macro precision |

**Table 13.** UWF-ZeekData22: privilege escalation—classification results for random undersampling before splitting.

| Oversampling % | | Accuracy | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|---|---|
| 0.1 | Avg | 0.9999 | 0.902 | 0.775 | 0.81 | 0.951 |
| | SD | | 0.125 | 0.236 | 0.156 | 0.063 |
| 0.2 | Avg | 0.999996 | 0.904 | 0.8 | 0.828 | 0.952 |
| | SD | | 0.138 | 0.232 | 0.165 | 0.069 |
| 0.3 | Avg | 0.999996 | 0.895 | 0.825 | 0.841 | 0.947 |
| | SD | | 0.15 | 0.243 | 0.185 | 0.075 |
| 0.4 | Avg | 0.999996 | 0.9 | 0.794 | 0.824 | 0.95 |
| | SD | | 0.147 | 0.249 | 0.185 | 0.074 |
| 0.5 | Avg | 0.999996 | 0.908 | 0.815 | 0.839 | 0.954 |
| | SD | | 0.139 | 0.244 | 0.177 | 0.069 |
| 0.6 | Avg | 0.999996 | 0.907 | 0.846 | 0.857 | 0.954 |
| | SD | | 0.136 | 0.233 | 0.169 | 0.068 |
| 0.7 | Avg | 0.999996 | 0.905 | 0.843 | 0.853 | 0.952 |
| | SD | | 0.139 | 0.24 | 0.175 | 0.07 |
| 0.8 | Avg | 0.999996 | 0.906 | 0.834 | 0.848 | 0.953 |
| | SD | | 0.139 | 0.247 | 0.182 | 0.069 |
| 0.9 | Avg | 0.999996 | 0.899 | 0.844 | 0.851 | 0.95 |
| | SD | | 0.14 | 0.24 | 0.178 | 0.07 |
| 1.0 | Avg | 0.999996 | 0.899 | 0.848 | 0.852 | 0.949 |
| | SD | | 0.14 | 0.242 | 0.181 | 0.07 |
| Averages | | 0.9999959 | 0.9025 | 0.8224 | 0.8403 | 0.9512 |

**Table 14.** Welch’s t-test results: UWF-ZeekData22: privilege escalation—random undersampling before splitting.

| Welch’s t-Test Results (p < 0.10) | Precision t-Value | Recall t-Value | F-Score t-Value | Macro Precision t-Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | −0.042 | −0.239 | −0.252 | −0.042 | 0.1 and 0.2 are statistically equal |
| 0.1 vs. 0.3 | 0.108 | −0.467 | −0.411 | 0.108 | 0.1 and 0.3 are statistically equal |
| 0.1 vs. 0.4 | 0.034 | −0.173 | −0.178 | 0.034 | 0.1 and 0.4 are statistically equal |
| 0.1 vs. 0.5 | −0.102 | −0.373 | −0.387 | −0.102 | 0.1 and 0.5 are statistically equal |
| 0.1 vs. 0.6 | −0.100 | −0.675 | −0.644 | −0.100 | 0.1 and 0.6 are statistically equal |
| 0.1 vs. 0.7 | −0.056 | −0.638 | −0.587 | −0.056 | 0.1 and 0.7 are statistically equal |
| 0.1 vs. 0.8 | −0.074 | −0.550 | −0.502 | −0.074 | 0.1 and 0.8 are statistically equal |
| 0.1 vs. 0.9 | 0.037 | −0.652 | −0.555 | 0.037 | 0.1 and 0.9 are statistically equal |
| 0.1 vs. 1.0 | 0.048 | −0.678 | −0.556 | 0.048 | 0.1 and 1.0 are statistically equal |

| Oversampling % | | Accuracy | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|---|---|
| 0.1 | Avg | 0.999 | 0.608 | 0.737 | 0.665 | 0.804 |
| | SD | NA | 0.067 | 0.044 | 0.0495 | 0.034 |
| 0.2 | Avg | 0.999 | 0.601 | 0.712 | 0.646 | 0.8 |
| | SD | NA | 0.108 | 0.089 | 0.083 | 0.054 |
| 0.3 | Avg | 0.999 | 0.566 | 0.773 | 0.651 | 0.783 |
| | SD | NA | 0.066 | 0.059 | 0.051 | 0.033 |
| 0.4 | Avg | 0.999 | 0.566 | 0.781 | 0.654 | 0.783 |
| | SD | NA | 0.035 | 0.063 | 0.026 | 0.018 |
| 0.5 | Avg | 0.999 | 0.581 | 0.738 | 0.65 | 0.791 |
| | SD | NA | 0.078 | 0.082 | 0.079 | 0.039 |
| 0.6 | Avg | 0.999 | 0.587 | 0.76 | 0.656 | 0.793 |
| | SD | NA | 0.097 | 0.044 | 0.061 | 0.049 |
| 0.7 | Avg | 0.999 | 0.62 | 0.753 | 0.679 | 0.81 |
| | SD | NA | 0.053 | 0.046 | 0.041 | 0.026 |
| 0.8 | Avg | 0.999 | 0.54 | 0.719 | 0.614 | 0.77 |
| | SD | NA | 0.081 | 0.036 | 0.041 | 0.018 |
| 0.9 | Avg | 0.999 | 0.573 | 0.711 | 0.629 | 0.787 |
| | SD | NA | 0.117 | 0.097 | 0.089 | 0.058 |
| 1.0 | Avg | 0.999 | 0.601 | 0.75 | 0.666 | 0.801 |
| | SD | NA | 0.062 | 0.081 | 0.06 | 0.031 |
| Averages | | 0.999 | 0.5843 | 0.7434 | 0.651 | 0.7922 |

| Welch’s t-Test Results (p < 0.10) | Precision t-Value | Recall t-Value | F-Score t-Value | Macro Precision t-Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | 0.183 | 0.799 | 0.612 | 0.183 | No significant difference |
| 0.1 vs. 0.3 | 1.413 | −1.563 | 0.592 | 1.413 | 0.1 is better than 0.3 in precision and macro precision, but 0.3 is better in recall |
| 0.1 vs. 0.4 | 1.773 | −1.826 | 0.628 | 1.773 | 0.1 is better than 0.4 in precision and macro precision, but 0.4 is better in recall |
| 0.1 vs. 0.5 | 0.836 | −0.065 | 0.506 | 0.836 | 0.1 and 0.5 are statistically equal |
| 0.1 vs. 0.6 | 0.451 | −0.957 | 0.264 | 0.451 | 0.1 and 0.6 are statistically equal |
| 0.1 vs. 0.7 | −0.431 | −0.859 | −0.706 | −0.431 | 0.1 and 0.7 are statistically equal |
| 0.1 vs. 0.8 | 2.052 | 0.959 | 2.5 | 2.826 | 0.1 is better than 0.8 except for recall, where the two are statistically equal |
| 0.1 vs. 0.9 | 0.821 | 0.745 | 1.131 | 0.821 | 0.1 and 0.9 are statistically equal |
| 0.1 vs. 1.0 | 0.219 | −0.746 | −0.033 | 0.302 | 0.1 and 1.0 are statistically equal |

| Oversampling % | | Accuracy | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|---|---|
| 0.1 | Avg | 0.999 | 0.698 | 0.906 | 0.788 | 0.849 |
| | SD | NA | 0.016 | 0.017 | 0.010 | 0.008 |
| 0.2 | Avg | 0.999 | 0.694 | 0.907 | 0.786 | 0.847 |
| | SD | NA | 0.016 | 0.015 | 0.011 | 0.008 |
| 0.3 | Avg | 0.999 | 0.691 | 0.911 | 0.786 | 0.846 |
| | SD | NA | 0.017 | 0.015 | 0.011 | 0.008 |
| 0.4 | Avg | 0.999 | 0.686 | 0.905 | 0.780 | 0.843 |
| | SD | NA | 0.014 | 0.011 | 0.010 | 0.007 |
| 0.5 | Avg | 0.999 | 0.678 | 0.906 | 0.776 | 0.839 |
| | SD | NA | 0.011 | 0.014 | 0.008 | 0.005 |
| 0.6 | Avg | 0.999 | 0.699 | 0.908 | 0.790 | 0.849 |
| | SD | NA | 0.020 | 0.003 | 0.013 | 0.010 |
| 0.7 | Avg | 0.999 | 0.699 | 0.908 | 0.790 | 0.849 |
| | SD | NA | 0.021 | 0.016 | 0.016 | 0.011 |
| 0.8 | Avg | 0.999 | 0.692 | 0.911 | 0.787 | 0.846 |
| | SD | NA | 0.011 | 0.021 | 0.013 | 0.006 |
| 0.9 | Avg | 0.999 | 0.688 | 0.901 | 0.780 | 0.844 |
| | SD | NA | 0.021 | 0.010 | 0.015 | 0.011 |
| 1.0 | Avg | 0.999 | 0.688 | 0.888 | 0.775 | 0.844 |
| | SD | NA | 0.018 | 0.015 | 0.016 | 0.009 |
| Averages | | 0.999 | 0.6913 | 0.9051 | 0.7838 | 0.8456 |

| Welch’s t-Test Results (p < 0.10) | Precision t-Value | Recall t-Value | F-Score t-Value | Macro Precision t-Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | 0.566 | −0.276 | 0.386 | 0.566 | 0.1 and 0.2 are statistically equal |
| 0.1 vs. 0.3 | 0.874 | −0.706 | 0.442 | 0.874 | 0.1 and 0.3 are statistically equal |
| 0.1 vs. 0.4 | 1.805 | 0.07 | 1.702 | 1.806 | 0.1 is better than 0.4 in precision, F-score, and macro precision |
| 0.1 vs. 0.5 | 3.349 | −0.128 | 3.104 | 3.349 | 0.1 is better than 0.5 in precision, F-score, and macro precision |
| 0.1 vs. 0.6 | −0.084 | −0.489 | −0.284 | −0.084 | 0.1 and 0.6 are statistically equal |
| 0.1 vs. 0.7 | 0.684 | −0.73 | 0.293 | 0.683 | 0.1 and 0.7 are statistically equal |
| 0.1 vs. 0.8 | 1.62 | 0.52 | 1.48 | 1.62 | 0.1 is better than 0.8 in precision, F-score, and macro precision |
| 0.1 vs. 0.9 | 1.202 | 2.814 | 2.187 | 1.204 | 0.1 is better than 0.9 in recall and F-score |
| 0.1 vs. 1.0 | 2.101 | 0.246 | 1.832 | 2.101 | 0.1 is better than 1.0 in all metrics except recall |

| Oversampling % | | Accuracy | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|---|---|
| 0.1 | Avg | 0.999 | 0.939 | 0.951 | 0.945 | 0.969 |
| | SD | NA | 0.01 | 0.005 | 0.004 | 0.005 |
| 0.2 | Avg | 0.999 | 0.938 | 0.948 | 0.943 | 0.969 |
| | SD | NA | 0.013 | 0.008 | 0.006 | 0.006 |
| 0.3 | Avg | 0.999 | 0.936 | 0.945 | 0.941 | 0.968 |
| | SD | NA | 0.009 | 0.011 | 0.007 | 0.004 |
| 0.4 | Avg | 0.999 | 0.939 | 0.95 | 0.944 | 0.969 |
| | SD | NA | 0.01 | 0.008 | 0.007 | 0.005 |
| 0.5 | Avg | 0.999 | 0.938 | 0.948 | 0.943 | 0.969 |
| | SD | NA | 0.01 | 0.007 | 0.006 | 0.005 |
| 0.6 | Avg | 0.999 | 0.941 | 0.945 | 0.943 | 0.97 |
| | SD | NA | 0.006 | 0.011 | 0.005 | 0.003 |
| 0.7 | Avg | 0.999 | 0.946 | 0.948 | 0.947 | 0.973 |
| | SD | NA | 0.007 | 0.003 | 0.003 | 0.003 |
| 0.8 | Avg | 0.999 | 0.946 | 0.943 | 0.944 | 0.973 |
| | SD | NA | 0.012 | 0.009 | 0.01 | 0.006 |
| 0.9 | Avg | 0.999 | 0.94 | 0.949 | 0.944 | 0.97 |
| | SD | NA | 0.007 | 0.011 | 0.003 | 0.003 |
| 1.0 | Avg | 0.999 | 0.944 | 0.943 | 0.943 | 0.972 |
| | SD | NA | 0.012 | 0.007 | 0.005 | 0.006 |
| Averages | | 0.999 | 0.9407 | 0.947 | 0.9437 | 0.9702 |

| Welch’s t-Test Results (p < 0.10) | Precision t-Value | Recall t-Value | F-Score t-Value | Macro Precision t-Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | 0.013 | 0.812 | 0.555 | 0.013 | 0.1 and 0.2 are statistically equal |
| 0.1 vs. 0.3 | 0.493 | 1.461 | 1.579 | 0.495 | 0.1 is better than 0.3 in recall and F-score |
| 0.1 vs. 0.4 | −0.036 | 0.225 | 0.097 | −0.036 | 0.1 and 0.4 are statistically equal |
| 0.1 vs. 0.5 | 0.102 | 0.978 | 0.68 | 0.103 | 0.1 and 0.5 are statistically equal |
| 0.1 vs. 0.6 | −0.676 | 1.296 | 0.657 | −0.675 | 0.1 and 0.6 are statistically equal |
| 0.1 vs. 0.7 | −1.738 | 1.318 | −1.316 | −1.738 | 0.1 and 0.7 are statistically equal |
| 0.1 vs. 0.8 | −1.389 | 2.264 | 0.063 | −1.387 | 0.8 is better than 0.1 in precision and F-score while recall is better in 0.1 |
| 0.1 vs. 0.9 | 1.336 | −1.37 | −0.011 | 1.334 | 0.1 and 0.9 are statistically equal, but 0.9 has better recall |
| 0.1 vs. 1.0 | 0.353 | 0 | 0.282 | 0.353 | 0.1 and 1.0 are statistically equal |

**Table 21.** UWF-ZeekData22: credential access—classification results for random undersampling after splitting.

| Oversampling % | | Accuracy | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|---|---|
| 0.1 | Avg | 0.999 | 0.822 | 0.878 | 0.833 | 0.911 |
| | SD | NA | 0.176 | 0.116 | 0.112 | 0.088 |
| 0.2 | Avg | 0.999 | 0.732 | 0.900 | 0.788 | 0.867 |
| | SD | NA | 0.141 | 0.168 | 0.123 | 0.071 |
| 0.3 | Avg | 0.999 | 0.799 | 0.911 | 0.847 | 0.899 |
| | SD | NA | 0.112 | 0.120 | 0.101 | 0.056 |
| 0.4 | Avg | 0.999 | 0.744 | 0.911 | 0.804 | 0.872 |
| | SD | NA | 0.162 | 0.097 | 0.100 | 0.081 |
| 0.5 | Avg | 0.999 | 0.770 | 0.944 | 0.835 | 0.885 |
| | SD | NA | 0.154 | 0.075 | 0.085 | 0.077 |
| 0.6 | Avg | 0.999 | 0.696 | 0.944 | 0.793 | 0.848 |
| | SD | NA | 0.116 | 0.102 | 0.092 | 0.056 |
| 0.7 | Avg | 0.999 | 0.713 | 0.922 | 0.793 | 0.857 |
| | SD | NA | 0.155 | 0.122 | 0.116 | 0.0777 |
| 0.8 | Avg | 0.999 | 0.639 | 0.933 | 0.749 | 0.820 |
| | SD | NA | 0.067 | 0.133 | 0.058 | 0.034 |
| 0.9 | Avg | 0.999 | 0.722 | 0.922 | 0.800 | 0.861 |
| | SD | NA | 0.110 | 0.100 | 0.052 | 0.055 |
| 1.0 | Avg | 0.999 | 0.742 | 0.889 | 0.789 | 0.871 |
| | SD | NA | 0.146 | 0.131 | 0.083 | 0.073 |
| Averages | | 0.999 | 0.7379 | 0.9154 | 0.8031 | 0.8691 |

**Table 22.** Welch’s t-test results: UWF-ZeekData22: credential access—random undersampling after splitting.

| Welch’s t-Test Results (p < 0.10) | Precision t-Value | Recall t-Value | F-Score t-Value | Macro Precision t-Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | 1.252 | −0.344 | 0.858 | 1.252 | 0.1 and 0.2 are statistically equal |
| 0.1 vs. 0.3 | 0.348 | −0.632 | −0.286 | 0.348 | 0.1 and 0.3 are statistically equal |
| 0.1 vs. 0.4 | 1.023 | −0.697 | 0.611 | 1.023 | 0.1 and 0.4 are statistically equal |
| 0.1 vs. 0.5 | 0.704 | −1.528 | −0.047 | 0.704 | 0.1 and 0.5 are statistically equal, but 0.5 has better recall |
| 0.1 vs. 0.6 | 1.204 | 0 | 1.06 | 1.204 | 0.1 and 0.6 are statistically equal |
| 0.1 vs. 0.7 | 0.814 | 0.49 | 0.936 | 0.814 | 0.1 and 0.7 are statistically equal |
| 0.1 vs. 0.8 | 2.461 | 0.23 | 2.653 | 2.461 | 0.1 is better than 0.8 except for recall |
| 0.1 vs. 0.9 | 0.786 | 0.563 | 1.112 | 0.786 | 0.1 and 0.9 are statistically equal |
| 0.1 vs. 1.0 | 0.408 | 1.162 | 1.22 | 0.408 | 0.1 and 1.0 are statistically equal |

**Table 23.** UWF-ZeekData22: privilege escalation—classification results for random undersampling after splitting.

| Oversampling % | | Accuracy | Precision | Recall | F-Score | Macro Precision |
|---|---|---|---|---|---|---|
| 0.1 | Avg | 0.999 | 0.900 | 0.750 | 0.779 | 0.949 |
| | SD | NA | 0.213 | 0.273 | 0.226 | 0.106 |
| 0.2 | Avg | 0.999 | 0.813 | 0.725 | 0.712 | 0.906 |
| | SD | NA | 0.203 | 0.343 | 0.260 | 0.101 |
| 0.3 | Avg | 0.999 | 0.820 | 0.825 | 0.768 | 0.909 |
| | SD | NA | 0.130 | 0.275 | 0.147 | 0.065 |
| 0.4 | Avg | 0.999 | 0.800 | 0.800 | 0.788 | 0.899 |
| | SD | NA | 0.322 | 0.331 | 0.316 | 0.161 |
| 0.5 | Avg | 0.999 | 0.843 | 0.825 | 0.811 | 0.921 |
| | SD | NA | 0.253 | 0.275 | 0.238 | 0.126 |
| 0.6 | Avg | 0.999 | 0.745 | 0.899 | 0.804 | 0.872 |
| | SD | NA | 0.091 | 0.135 | 0.068 | 0.045 |
| 0.7 | Avg | 0.999 | 0.704 | 0.911 | 0.785 | 0.852 |
| | SD | NA | 0.121 | 0.147 | 0.107 | 0.060 |
| 0.8 | Avg | 0.999 | 0.707 | 0.888 | 0.781 | 0.853 |
| | SD | NA | 0.094 | 0.157 | 0.099 | 0.047 |
| 0.9 | Avg | 0.999 | 0.738 | 0.855 | 0.778 | 0.869 |
| | SD | NA | 0.138 | 0.131 | 0.087 | 0.069 |
| 1.0 | Avg | 0.999 | 0.759 | 0.966 | 0.843 | 0.879 |
| | SD | NA | 0.131 | 0.071 | 0.087 | 0.065 |
| Averages | | 0.999 | 0.7833 | 0.845 | 0.7853 | 0.8915 |

**Table 24.** Welch’s t-test results: UWF-ZeekData22: privilege escalation—random undersampling after splitting.

| Welch’s t-Test Results (p < 0.10) | Precision t-Value | Recall t-Value | F-Score t-Value | Macro Precision t-Value | Analysis |
|---|---|---|---|---|---|
| 0.1 vs. 0.2 | 0.929 | 0.179 | 0.611 | 0.929 | 0.1 and 0.2 are statistically equal |
| 0.1 vs. 0.3 | 1.012 | −0.611 | 0.118 | 1.012 | 0.1 and 0.3 are statistically equal |
| 0.1 vs. 0.4 | 0.817 | −0.367 | −0.08 | 0.817 | 0.1 and 0.4 are statistically equal |
| 0.1 vs. 0.5 | 0.541 | −0.611 | −0.307 | 0.541 | 0.1 and 0.5 are statistically equal |
| 0.1 vs. 0.6 | 2.1 | −1.552 | −0.344 | 2.1 | 0.1 is better than 0.6 except for recall |
| 0.1 vs. 0.7 | 2.52 | −1.638 | −0.086 | 2.52 | 0.1 is better than 0.7 except for recall |
| 0.1 vs. 0.8 | 2.606 | −1.391 | −0.027 | 2.606 | 0.1 is better than 0.8 except for recall |
| 0.1 vs. 0.9 | 1.999 | −1.098 | 0.011 | 1.999 | 0.1 is better than 0.9 in precision and macro precision |
| 0.1 vs. 1.0 | 1.77 | −2.421 | −0.834 | 1.77 | 0.1 is better than 1.0 except for recall |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Bagui, S.; Mink, D.; Bagui, S.; Subramaniam, S.; Wallace, D. Resampling Imbalanced Network Intrusion Datasets to Identify Rare Attacks. *Future Internet* **2023**, *15*, 130.
https://doi.org/10.3390/fi15040130

**AMA Style**

Bagui S, Mink D, Bagui S, Subramaniam S, Wallace D. Resampling Imbalanced Network Intrusion Datasets to Identify Rare Attacks. *Future Internet*. 2023; 15(4):130.
https://doi.org/10.3390/fi15040130

**Chicago/Turabian Style**

Bagui, Sikha, Dustin Mink, Subhash Bagui, Sakthivel Subramaniam, and Daniel Wallace. 2023. "Resampling Imbalanced Network Intrusion Datasets to Identify Rare Attacks" *Future Internet* 15, no. 4: 130.
https://doi.org/10.3390/fi15040130