# Resampling Imbalanced Network Intrusion Datasets to Identify Rare Attacks

## Abstract

## 1. Introduction

## 2. Background

## 3. Related Works

## 4. The Datasets

#### 4.1. UNSW-NB15

#### 4.2. UWF-ZeekData22

## 5. Experimental Design

#### 5.1. The Classifier Used: Random Forest

#### 5.2. Preprocessing

#### 5.2.1. Information Gain

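The information gain of an attribute A is computed as

$$\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2(p_i), \qquad \mathrm{Info}_A(D) = \sum_{j=1}^{V} \frac{|D_j|}{|D|} \times \mathrm{Info}(D_j), \qquad \mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D),$$

where:

- $\mathrm{Info}(D)$ is the average amount of information needed to identify the class label of a tuple in the data, D;
- $\mathrm{Info}_A(D)$ is the expected information required to classify a tuple from D based on the partitioning by attribute A;
- $p_i$ is the nonzero probability that an arbitrary tuple belongs to the ith of the m classes;
- $|D_j|/|D|$ is the weight of the jth partition;
- V is the number of distinct values in attribute A.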
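A minimal pure-Python sketch of the information-gain computation for a single nominal attribute is given below. The function names are hypothetical, and continuous attributes would first need to be binned, which this sketch does not do. For comparison, scikit-learn's `mutual_info_classif` estimates the same quantity, but in nats rather than bits.

```python
import math
from collections import Counter, defaultdict


def info(labels):
    """Info(D): entropy, in bits, of the class labels in D."""
    n = len(labels)
    # Counter only yields classes that actually occur, so every p_i is nonzero.
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_a(values, labels):
    """Info_A(D): weighted entropy after partitioning D by the V distinct values of A."""
    partitions = defaultdict(list)
    for value, label in zip(values, labels):
        partitions[value].append(label)
    n = len(labels)
    # |D_j| / |D| weights the entropy of each partition D_j.
    return sum(len(part) / n * info(part) for part in partitions.values())


def gain(values, labels):
    """Gain(A) = Info(D) - Info_A(D)."""
    return info(labels) - info_a(values, labels)
```

An attribute that separates the classes perfectly gains the full Info(D), whereas a constant attribute gains nothing. Ranking candidate features by `gain` and keeping the top ones is therefore a simple filter-style feature selection.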

#### 5.2.2. Preprocessing UNSW-NB15

- ct_flw_http_mthd and is_ftp_login;
- Unique identifiers and time stamps;
- IP addresses.

- Missing (NaN) values in the attack category column were filled with zeros;
- Categorical features (protocol, state, and attack category) were converted to a numeric representation;
- A normalization technique was applied to all continuous numeric variables.
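The three steps above can be sketched with pandas. Everything specific here is an assumption rather than the authors' code. The column names `proto`, `state`, and `attack_cat` follow the public UNSW-NB15 feature list. The integer codes come from `pandas.factorize`, not from whatever encoder was actually used. Min-max scaling stands in for the unspecified normalization technique.

```python
import pandas as pd

# Assumed UNSW-NB15 column names for protocol, state, and attack category.
CATEGORICAL = ["proto", "state", "attack_cat"]


def preprocess_unsw(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: fill NaN attack categories, encode categoricals, min-max scale numerics."""
    df = df.copy()  # leave the caller's frame untouched
    # Normal traffic has no attack category; fill those NaNs with zeros.
    df["attack_cat"] = df["attack_cat"].fillna(0)
    # Map each categorical column to integer codes in first-seen order.
    for col in CATEGORICAL:
        df[col], _ = pd.factorize(df[col].astype(str))
    # Min-max normalization (assumed technique) of every other numeric column.
    numeric = [c for c in df.select_dtypes(include="number").columns if c not in CATEGORICAL]
    for col in numeric:
        lo, hi = df[col].min(), df[col].max()
        # A constant column carries no information; map it to 0.0 instead of dividing by zero.
        df[col] = 0.0 if hi == lo else (df[col] - lo) / (hi - lo)
    return df
```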

#### 5.2.3. Preprocessing UWF-ZeekData22

- The continuous features duration, orig_bytes, orig_pkts, orig_ip_bytes, resp_bytes, resp_pkts, resp_ip_bytes, and missed_bytes were binned using a moving mean;
- Nominal features (features that contain non-numeric data) were converted to numbers using the StringIndexer method from MLlib [25], Apache Spark’s scalable machine learning library. The nominal features in this dataset were proto, conn_state, local_orig, history, and service;
- The IP address columns were categorized using commonly recognized network classifications [26];
- Port numbers were binned according to the port ranges defined by the Internet Assigned Numbers Authority (IANA) [27].
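The port and IP steps can be sketched in plain Python. The port bins follow the IANA ranges in RFC 6335: system (well-known) ports 0 to 1023, user (registered) ports 1024 to 49151, and dynamic (private) ports 49152 to 65535. This section does not say which network classification [26] refers to. The `ip_class` helper below is hypothetical and assumes the classful A to E scheme keyed on the first octet, purely for illustration.

```python
import ipaddress


def bin_port(port: int) -> str:
    """Map a port number to its IANA range (RFC 6335)."""
    if not 0 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    if port <= 1023:
        return "well_known"  # system ports
    if port <= 49151:
        return "registered"  # user ports
    return "dynamic"  # dynamic/private ports


def ip_class(addr: str) -> str:
    """Hypothetical classful (A to E) label from the first octet of an IPv4 address."""
    first_octet = ipaddress.IPv4Address(addr).packed[0]  # also validates the address
    for upper, label in ((127, "A"), (191, "B"), (223, "C"), (239, "D")):
        if first_octet <= upper:
            return label
    return "E"
```

Binning replaces tens of thousands of distinct port and address values with a handful of categories. This keeps the features usable by a tree-based classifier without it overfitting to individual hosts.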

## 6. Hardware and Software Configurations

#### 6.1. Hardware and Software Used in Random Undersampling before Stratified Splitting

#### 6.2. Python Libraries Used in Random Undersampling before Stratified Splitting

#### 6.3. Hardware and Software Used in Random Undersampling after Stratified Splitting

#### 6.4. Python Libraries Used in Random Undersampling after Stratified Splitting

#### 6.5. Stratified Sampling

## 7. Metrics Used for the Assessment of Results

#### 7.1. Classification Metrics Used

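- Accuracy = [True Positives + True Negatives]/[True Positives + False Positives + True Negatives + False Negatives]
- Precision = True Positives/[True Positives + False Positives]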

- All Real Positives = [True Positives + False Negatives]
- All Real Negatives = [True Negatives + False Positives]
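Using the All Real Positives and All Real Negatives terms defined above, the binary-case metrics can be sketched as follows. Recall and F-score use their standard definitions: recall is True Positives/All Real Positives, and the F-score is the harmonic mean of precision and recall. Macro precision is assumed to be the unweighted mean of the attack-class and benign-class precision. That assumption is consistent with the result tables; for example, a precision of 0.769 pairs with a macro precision of 0.884 ≈ (0.769 + 1.0)/2. Returning 0.0 when a ratio is undefined is also an assumption.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Hypothetical helper: classification metrics from binary confusion-matrix counts."""
    real_pos = tp + fn  # All Real Positives
    real_neg = tn + fp  # All Real Negatives
    accuracy = (tp + tn) / (real_pos + real_neg)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / real_pos if real_pos else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Precision of the negative (benign) class, needed for the macro average.
    neg_precision = tn / (tn + fn) if tn + fn else 0.0
    macro_precision = (precision + neg_precision) / 2
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "macro_precision": macro_precision,
    }
```

Because the benign class dominates, its precision stays close to 1.0. Macro precision therefore sits roughly halfway between the attack-class precision and 1.0, and accuracy stays near 0.999 regardless of how well the rare attack class is detected.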

#### 7.2. Welch’s t-Tests

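Welch’s t statistic and its degrees of freedom are computed as

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}, \qquad v = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}},$$

where $\bar{X}_1$ and $\bar{X}_2$ are the sample means, $s_1^2$ and $s_2^2$ are the sample variances, $n_1$ and $n_2$ are the sample sizes, and the degrees of freedom $v$ are calculated using the Satterthwaite approximation.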
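Welch’s statistic can be reproduced from the per-setting averages and standard deviations in the result tables. The sketch below is pure Python; the name `welch_t` is hypothetical. It assumes n = 10 runs per oversampling setting, which this section does not state. With the precision values of the first comparison in the first Welch’s t-test table (0.769 ± 0.056 vs. 0.806 ± 0.065), it gives t ≈ −1.36, close to the reported −1.366.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Satterthwaite degrees of freedom from summary statistics.

    sd1 and sd2 are sample standard deviations, so sd**2 is the sample variance s**2.
    """
    q1, q2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # s_i^2 / n_i for each sample
    t = (mean1 - mean2) / math.sqrt(q1 + q2)
    v = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return t, v
```

To turn t into a two-sided p-value, compare it against a t distribution with v degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), v)`. On raw per-run scores, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test directly. `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` does the same from summary statistics.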

## 8. Results and Discussion

#### 8.1. Selection of an Oversampling Technique

#### 8.2. Resampling before and after Splitting
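The two pipelines compared here differ only in where random undersampling happens relative to the stratified train/test split. The pandas-only sketch below illustrates the consequence and is not the authors' implementation. According to the library table, they used imblearn and scikit-learn. The function names, the per-class target size, and the test fraction are all assumptions. Undersampling before splitting also balances the test set. Undersampling after splitting balances only the training set and leaves the test set with the original class distribution.

```python
import pandas as pd


def random_undersample(df: pd.DataFrame, label: str, n_per_class: int, seed: int = 0) -> pd.DataFrame:
    """Randomly keep at most n_per_class rows of every class (sampling without replacement)."""
    parts = [group.sample(n=min(len(group), n_per_class), random_state=seed)
             for _, group in df.groupby(label)]
    return pd.concat(parts)


def stratified_split(df: pd.DataFrame, label: str, test_frac: float, seed: int = 0):
    """Hold out test_frac of every class, so both splits keep the input's class ratios."""
    test = df.groupby(label).sample(frac=test_frac, random_state=seed)
    train = df.drop(index=test.index)
    return train, test


def undersample_before_split(df, label, n_per_class, test_frac, seed=0):
    # Balance the whole dataset first: the test set inherits the balanced distribution.
    balanced = random_undersample(df, label, n_per_class, seed)
    return stratified_split(balanced, label, test_frac, seed)


def undersample_after_split(df, label, n_per_class, test_frac, seed=0):
    # Split first: only the training set is balanced; the test set keeps the original ratios.
    train, test = stratified_split(df, label, test_frac, seed)
    return random_undersample(train, label, n_per_class, seed), test
```

Evaluating on a test set that preserves the original imbalance is closer to deployment conditions for rare attacks. This is one reason the two orderings can yield different precision and recall for the minority class.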

#### 8.2.1. Random Undersampling before Stratified Splitting

#### 8.2.2. Random Undersampling after Stratified Splitting

## 9. Conclusions

## 10. Future Work

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

**Table 1.** UNSW-NB15: distribution of attack families [4].

Type of Attack | Count | % of Attack Data | % of Benign Data | % of Total Data |
---|---|---|---|---|
Worms | 174 | 0.054 | 0.007 | 0.006 |
Shellcode | 1,511 | 0.47 | 0.068 | 0.059 |
Backdoors | 2,329 | 0.724 | 0.104 | 0.091 |
Analysis | 2,677 | 0.833 | 0.12 | 0.105 |
Reconnaissance | 13,987 | 4.353 | 0.63 | 0.55 |
DoS | 16,353 | 5.089 | 0.737 | 0.643 |
Fuzzers | 24,246 | 7.546 | 1.092 | 0.954 |
Exploits | 44,525 | 13.858 | 2.006 | 1.752 |
Generic | 215,481 | 67.068 | 9.711 | 8.483 |
Total attack data | 321,283 | - | - | - |
Benign data | 2,218,761 | - | - | 87.351 |
Total | 2,540,044 | - | - | - |

Label_Tactic | Count | % of Attack Data | % of Benign Data | % of Total Data |
---|---|---|---|---|
Persistence | 1 | 0.00001 | 0.00001 | 0.000005 |
Initial_access | 1 | 0.00001 | 0.00001 | 0.000005 |
Defense_evasion | 1 | 0.00001 | 0.00001 | 0.000005 |
Resource_development | 3 | 0.00003 | 0.00003 | 0.00001 |
Lateral_movement | 4 | 0.00004 | 0.00004 | 0.00002 |
Exfiltration | 7 | 0.00007 | 0.00007 | 0.00003 |
Privilege_escalation | 13 | 0.00014 | 0.00014 | 0.00007 |
Credential_access | 31 | 0.00033 | 0.00033 | 0.00016 |
Discovery | 2,086 | 0.02247 | 0.02247 | 0.01123 |
Reconnaissance | 9,278,722 | 99.97686 | 99.969 | 49.98646 |
Total attack data | 9,280,869 | - | - | - |
Benign_data | 9,281,599 | - | - | 50.00196 |
Total | 18,562,468 | - | - | - |

Component | Random Undersampling before Stratified Splitting | Random Undersampling after Stratified Splitting |
---|---|---|
Processor | AMD Ryzen 7 5700 | Intel Core i7-1165G7 |
RAM | 32 GB | 16 GB |
OS | Windows 11 Home | Windows 11 Home |
OS Version | 22H2 | 21H2 |
OS Build | 22621.819 | 22000.1219 |
GPU | RTX 3060 | NA |

Library | Random Undersampling before Stratified Splitting | Random Undersampling after Stratified Splitting |
---|---|---|
Python | 3.9 | 3.10.4 |
Anaconda | 2022.1 | 2021.5 |
Pandas | 1.5.2 | 1.5.0 |
Scikit-learn | 1.9.3 | 1.0.2 |
Numpy | 1.23.5 | 1.23.4 |
Imblearn | 0.10.0 | 0 |

Oversampling % | Statistic | Accuracy | Precision | Recall | F-Score | Macro Precision |
---|---|---|---|---|---|---|
0.1 | Avg | 0.9999 | 0.769 | 0.794 | 0.778 | 0.884 |
0.1 | SD | - | 0.056 | 0.063 | 0.034 | 0.028 |
0.2 | Avg | 0.99991 | 0.806 | 0.779 | 0.789 | 0.902 |
0.2 | SD | - | 0.065 | 0.09 | 0.058 | 0.032 |
0.3 | Avg | 0.9999 | 0.799 | 0.717 | 0.752 | 0.899 |
0.3 | SD | - | 0.059 | 0.075 | 0.047 | 0.029 |
0.4 | Avg | 0.99991 | 0.802 | 0.783 | 0.787 | 0.901 |
0.4 | SD | - | 0.068 | 0.076 | 0.043 | 0.034 |
0.5 | Avg | 0.999 | 0.836 | 0.871 | 0.852 | 0.918 |
0.5 | SD | - | 0.051 | 0.037 | 0.033 | 0.026 |
0.6 | Avg | 0.999 | 0.828 | 0.851 | 0.838 | 0.914 |
0.6 | SD | - | 0.057 | 0.057 | 0.042 | 0.029 |
0.7 | Avg | 0.999 | 0.849 | 0.88 | 0.862 | 0.924 |
0.7 | SD | - | 0.053 | 0.051 | 0.026 | 0.027 |
0.8 | Avg | 0.999 | 0.847 | 0.851 | 0.847 | 0.924 |
0.8 | SD | - | 0.056 | 0.051 | 0.039 | 0.028 |
0.9 | Avg | 0.999 | 0.807 | 0.892 | 0.845 | 0.903 |
0.9 | SD | - | 0.056 | 0.043 | 0.034 | 0.028 |
1.0 | Avg | 0.999 | 0.818 | 0.75 | 0.782 | 0.909 |
1.0 | SD | - | 0.069 | 0.051 | 0.055 | 0.034 |
Averages | | - | 0.8161 | 0.8168 | 0.8132 | 0.9078 |

Welch’s t-Test Results (p < 0.10) | Precision t-Value | Recall t-Value | F-Score t-Value | Macro Precision t-Value | Analysis |
---|---|---|---|---|---|
0.1 vs. 0.2 | −1.366 | 0.441 | −0.451 | −1.366 | 0.1 and 0.2 are statistically equal |
0.1 vs. 0.3 | −1.167 | 2.489 | 1.423 | −1.167 | 0.1 is better than 0.3 in recall |
0.1 vs. 0.4 | −1.179 | 0.368 | −0.516 | −1.180 | 0.1 and 0.4 are statistically equal |
0.1 vs. 0.5 | −2.793 | −3.315 | −4.955 | −2.795 | 0.5 is better than 0.1 |
0.5 vs. 0.6 | 0.326 | 0.949 | 0.863 | 0.326 | 0.5 and 0.6 are statistically equal |
0.5 vs. 0.7 | −0.551 | −0.423 | −0.722 | −0.551 | 0.5 and 0.7 are statistically equal |
0.5 vs. 0.8 | −0.467 | 1.022 | 0.291 | −0.466 | 0.5 and 0.8 are statistically equal |
0.5 vs. 0.9 | 1.213 | −1.130 | 0.442 | 1.213 | 0.5 and 0.9 are statistically equal |
0.5 vs. 1.0 | 0.649 | 6.075 | 3.451 | 0.651 | 0.5 is better than 1.0 |

Oversampling % | Statistic | Accuracy | Precision | Recall | F-Score | Macro Precision |
---|---|---|---|---|---|---|
0.1 | Avg | 0.9996 | 0.849 | 0.941 | 0.892 | 0.924 |
0.1 | SD | - | 0.013 | 0.014 | 0.012 | 0.006 |
0.2 | Avg | 0.9997 | 0.889 | 0.969 | 0.927 | 0.944 |
0.2 | SD | - | 0.015 | 0.01 | 0.007 | 0.007 |
0.3 | Avg | 0.9997 | 0.896 | 0.962 | 0.927 | 0.948 |
0.3 | SD | - | 0.013 | 0.012 | 0.007 | 0.006 |
0.4 | Avg | 0.9997 | 0.898 | 0.958 | 0.927 | 0.949 |
0.4 | SD | - | 0.008 | 0.008 | 0.007 | 0.004 |
0.5 | Avg | 0.9996 | 0.887 | 0.964 | 0.924 | 0.944 |
0.5 | SD | - | 0.014 | 0.007 | 0.007 | 0.007 |
0.6 | Avg | 0.9996 | 0.876 | 0.964 | 0.918 | 0.938 |
0.6 | SD | - | 0.012 | 0.014 | 0.008 | 0.006 |
0.7 | Avg | 0.9995 | 0.844 | 0.925 | 0.883 | 0.922 |
0.7 | SD | - | 0.012 | 0.013 | 0.009 | 0.006 |
0.8 | Avg | 0.9995 | 0.846 | 0.932 | 0.887 | 0.923 |
0.8 | SD | - | 0.012 | 0.013 | 0.005 | 0.006 |
0.9 | Avg | 0.9995 | 0.849 | 0.929 | 0.887 | 0.924 |
0.9 | SD | - | 0.013 | 0.01 | 0.007 | 0.006 |
1 | Avg | 0.9996 | 0.855 | 0.939 | 0.895 | 0.928 |
1 | SD | - | 0.014 | 0.014 | 0.008 | 0.007 |
Averages | | 0.9996 | 0.8689 | 0.9483 | 0.9067 | 0.9344 |

Welch’s t-Test Results (p < 0.10) | Precision t-Value | Recall t-Value | F-Score t-Value | Macro Precision t-Value | Analysis |
---|---|---|---|---|---|
0.1 vs. 0.2 | −6.530 | −5.250 | −8.049 | −6.537 | 0.2 is better than 0.1 |
0.2 vs. 0.3 | −1.093 | 1.438 | −0.107 | −1.092 | 0.2 is better than 0.3 |
0.2 vs. 0.4 | −1.613 | 2.616 | −0.005 | −1.609 | 0.2 has better recall |
0.4 vs. 0.5 | 2.015 | −1.784 | 0.888 | 2.012 | 0.5 has better recall |
0.4 vs. 0.6 | 4.557 | −1.176 | 2.507 | 4.553 | 0.4 is better than 0.6 |
0.4 vs. 0.7 | 11.157 | 6.763 | 11.799 | 11.163 | 0.4 is better than 0.7 |
0.4 vs. 0.8 | 11.097 | 5.620 | 13.965 | 11.110 | 0.4 is better than 0.8 |
0.4 vs. 0.9 | 10.104 | 7.201 | 12.712 | 10.113 | 0.4 is better than 0.9 |
0.4 vs. 1 | 8.290 | 3.908 | 9.362 | 8.296 | 0.4 is better than 1.0 |

Oversampling % | Statistic | Accuracy | Precision | Recall | F-Score | Macro Precision |
---|---|---|---|---|---|---|
0.1 | Avg | 0.9998 | 0.962 | 0.96 | 0.961 | 0.981 |
0.1 | SD | - | 0.006 | 0.006 | 0.003 | 0.003 |
0.2 | Avg | 0.9998 | 0.969 | 0.964 | 0.966 | 0.984 |
0.2 | SD | - | 0.009 | 0.007 | 0.006 | 0.004 |
0.3 | Avg | 0.9997 | 0.962 | 0.95 | 0.956 | 0.981 |
0.3 | SD | - | 0.005 | 0.01 | 0.005 | 0.003 |
0.4 | Avg | 0.9998 | 0.97 | 0.956 | 0.963 | 0.985 |
0.4 | SD | - | 0.007 | 0.009 | 0.005 | 0.003 |
0.5 | Avg | 0.9998 | 0.97 | 0.951 | 0.961 | 0.985 |
0.5 | SD | - | 0.007 | 0.008 | 0.005 | 0.003 |
0.6 | Avg | 0.9998 | 0.965 | 0.954 | 0.96 | 0.982 |
0.6 | SD | - | 0.007 | 0.009 | 0.005 | 0.003 |
0.7 | Avg | 0.9997 | 0.968 | 0.946 | 0.957 | 0.984 |
0.7 | SD | - | 0.006 | 0.011 | 0.006 | 0.003 |
0.8 | Avg | 0.9998 | 0.966 | 0.95 | 0.958 | 0.983 |
0.8 | SD | - | 0.005 | 0.01 | 0.006 | 0.002 |
0.9 | Avg | 0.9998 | 0.967 | 0.957 | 0.962 | 0.983 |
0.9 | SD | - | 0.008 | 0.006 | 0.004 | 0.004 |
1 | Avg | 0.9998 | 0.968 | 0.952 | 0.96 | 0.984 |
1 | SD | - | 0.009 | 0.01 | 0.008 | 0.004 |
Averages | | 0.99978 | 0.9667 | 0.954 | 0.9604 | 0.9832 |

Welch’s t-Test Results (p < 0.10) | Precision t-Value | Recall t-Value | F-Score t-Value | Macro Precision t-Value | Analysis |
---|---|---|---|---|---|
0.1 vs. 0.2 | −1.813 | −1.602 | −2.697 | −1.818 | 0.2 is better than 0.1 across all metrics |
0.2 vs. 0.3 | 1.955 | 3.627 | 4.492 | 1.970 | 0.2 is better than 0.3 across all metrics |
0.2 vs. 0.4 | −0.404 | 2.338 | 1.491 | −0.397 | 0.2 is better than 0.4 in recall and F-score |
0.2 vs. 0.5 | −0.474 | 3.866 | 2.359 | −0.464 | 0.2 is better than 0.5 in recall and F-score |
0.2 vs. 0.6 | 1.070 | 2.621 | 2.861 | 1.079 | 0.2 is better than 0.6 in recall and F-score |
0.2 vs. 0.7 | 0.189 | 4.415 | 3.563 | 0.206 | 0.2 is better than 0.7 in recall and F-score |
0.2 vs. 0.8 | 0.829 | 3.661 | 3.254 | 0.842 | 0.2 is better than 0.8 in recall and F-score |
0.2 vs. 0.9 | 0.524 | 2.378 | 2.136 | 0.531 | 0.2 is better than 0.9 in recall and F-score |
0.2 vs. 1.0 | 0.180 | 3.043 | 2.149 | 0.189 | 0.2 is better than 1.0 in recall and F-score |

**Table 11.** UWF-ZeekData22: credential access—classification results for random undersampling before splitting.

Oversampling % | Statistic | Accuracy | Precision | Recall | F-Score | Macro Precision |
---|---|---|---|---|---|---|
0.1 | Avg | 0.999 | 0.742 | 0.889 | 0.801 | 0.871 |
0.1 | SD | - | 0.134 | 0.165 | 0.126 | 0.067 |
0.2 | Avg | 0.999 | 0.885 | 0.911 | 0.891 | 0.942 |
0.2 | SD | - | 0.05 | 0.139 | 0.075 | 0.024 |
0.3 | Avg | 0.999 | 0.847 | 0.867 | 0.849 | 0.923 |
0.3 | SD | - | 0.101 | 0.147 | 0.107 | 0.05 |
0.4 | Avg | 0.999 | 0.836 | 0.856 | 0.83 | 0.918 |
0.4 | SD | - | 0.106 | 0.165 | 0.107 | 0.053 |
0.5 | Avg | 0.999 | 0.936 | 0.911 | 0.919 | 0.968 |
0.5 | SD | - | 0.069 | 0.109 | 0.072 | 0.034 |
0.6 | Avg | 0.999 | 0.906 | 0.867 | 0.87 | 0.953 |
0.6 | SD | - | 0.103 | 0.163 | 0.094 | 0.052 |
0.7 | Avg | 0.999 | 0.882 | 0.922 | 0.894 | 0.941 |
0.7 | SD | - | 0.067 | 0.122 | 0.06 | 0.033 |
0.8 | Avg | 0.999 | 0.826 | 0.911 | 0.853 | 0.913 |
0.8 | SD | - | 0.143 | 0.097 | 0.084 | 0.071 |
0.9 | Avg | 0.999 | 0.829 | 0.967 | 0.889 | 0.915 |
0.9 | SD | - | 0.079 | 0.071 | 0.054 | 0.04 |
1 | Avg | 0.999998 | 0.832 | 0.911 | 0.864 | 0.916 |
1 | SD | - | 0.109 | 0.097 | 0.076 | 0.054 |
Averages | | 0.9991 | 0.8521 | 0.9012 | 0.866 | 0.926 |

**Table 12.** Welch’s t-test results: UWF-ZeekData22: credential access—random undersampling before splitting.

Welch’s t-Test Results (p < 0.10) | Precision t-Value | Recall t-Value | F-Score t-Value | Macro Precision t-Value | Analysis |
---|---|---|---|---|---|
0.1 vs. 0.2 | −3.162 | −0.322 | −1.941 | −3.155 | 0.2 is better than 0.1 in precision, F-score, and macro precision |
0.2 vs. 0.3 | 1.066 | 0.688 | 1.016 | 1.083 | 0.2 and 0.3 are statistically the same |
0.2 vs. 0.4 | 1.322 | 0.806 | 1.476 | 1.304 | 0.2 is better than 0.4 in F-score |
0.2 vs. 0.5 | −1.893 | 0.000 | −0.852 | −1.976 | 0.5 is better than 0.2 in precision and macro precision |
0.5 vs. 0.6 | 0.765 | 0.710 | 1.309 | 0.763 | 0.5 and 0.6 are statistically equal |
0.5 vs. 0.7 | 0.113 | −0.188 | −0.099 | 0.077 | 0.5 and 0.7 are statistically equal |
0.5 vs. 0.8 | 2.191 | 0.000 | 1.886 | 2.209 | 0.5 is better than 0.8 in precision, F-score, and macro precision |
0.5 vs. 0.9 | 3.226 | −1.361 | 1.054 | 3.193 | 0.5 is better than 0.9 in precision and macro precision |
0.5 vs. 1.0 | 1.398 | 0.000 | 0.800 | 1.391 | 0.5 is better than 1.0 in precision and macro precision |

**Table 13.** UWF-ZeekData22: privilege escalation—classification results for random undersampling before splitting.

Oversampling % | Statistic | Accuracy | Precision | Recall | F-Score | Macro Precision |
---|---|---|---|---|---|---|
0.1 | Avg | 0.9999 | 0.902 | 0.775 | 0.81 | 0.951 |
0.1 | SD | - | 0.125 | 0.236 | 0.156 | 0.063 |
0.2 | Avg | 0.999996 | 0.904 | 0.8 | 0.828 | 0.952 |
0.2 | SD | - | 0.138 | 0.232 | 0.165 | 0.069 |
0.3 | Avg | 0.999996 | 0.895 | 0.825 | 0.841 | 0.947 |
0.3 | SD | - | 0.15 | 0.243 | 0.185 | 0.075 |
0.4 | Avg | 0.999996 | 0.9 | 0.794 | 0.824 | 0.95 |
0.4 | SD | - | 0.147 | 0.249 | 0.185 | 0.074 |
0.5 | Avg | 0.999996 | 0.908 | 0.815 | 0.839 | 0.954 |
0.5 | SD | - | 0.139 | 0.244 | 0.177 | 0.069 |
0.6 | Avg | 0.999996 | 0.907 | 0.846 | 0.857 | 0.954 |
0.6 | SD | - | 0.136 | 0.233 | 0.169 | 0.068 |
0.7 | Avg | 0.999996 | 0.905 | 0.843 | 0.853 | 0.952 |
0.7 | SD | - | 0.139 | 0.24 | 0.175 | 0.07 |
0.8 | Avg | 0.999996 | 0.906 | 0.834 | 0.848 | 0.953 |
0.8 | SD | - | 0.139 | 0.247 | 0.182 | 0.069 |
0.9 | Avg | 0.999996 | 0.899 | 0.844 | 0.851 | 0.95 |
0.9 | SD | - | 0.14 | 0.24 | 0.178 | 0.07 |
1 | Avg | 0.999996 | 0.899 | 0.848 | 0.852 | 0.949 |
1 | SD | - | 0.14 | 0.242 | 0.181 | 0.07 |
Averages | | 0.9999959 | 0.9025 | 0.8224 | 0.8403 | 0.9512 |

**Table 14.** Welch’s t-test results: UWF-ZeekData22: privilege escalation—random undersampling before splitting.

Welch’s t-Test Results (p < 0.10) | Precision t-Value | Recall t-Value | F-Score t-Value | Macro Precision t-Value | Analysis |
---|---|---|---|---|---|
0.1 vs. 0.2 | −0.042 | −0.239 | −0.252 | −0.042 | 0.1 and 0.2 are statistically equal |
0.1 vs. 0.3 | 0.108 | −0.467 | −0.411 | 0.108 | 0.1 and 0.3 are statistically equal |
0.1 vs. 0.4 | 0.034 | −0.173 | −0.178 | 0.034 | 0.1 and 0.4 are statistically equal |
0.1 vs. 0.5 | −0.102 | −0.373 | −0.387 | −0.102 | 0.1 and 0.5 are statistically equal |
0.1 vs. 0.6 | −0.100 | −0.675 | −0.644 | −0.100 | 0.1 and 0.6 are statistically equal |
0.1 vs. 0.7 | −0.056 | −0.638 | −0.587 | −0.056 | 0.1 and 0.7 are statistically equal |
0.1 vs. 0.8 | −0.074 | −0.550 | −0.502 | −0.074 | 0.1 and 0.8 are statistically equal |
0.1 vs. 0.9 | 0.037 | −0.652 | −0.555 | 0.037 | 0.1 and 0.9 are statistically equal |
0.1 vs. 1.0 | 0.048 | −0.678 | −0.556 | 0.048 | 0.1 and 1.0 are statistically equal |

Oversampling % | Statistic | Accuracy | Precision | Recall | F-Score | Macro Precision |
---|---|---|---|---|---|---|
0.1 | Avg | 0.999 | 0.608 | 0.737 | 0.665 | 0.804 |
0.1 | SD | NA | 0.067 | 0.044 | 0.0495 | 0.034 |
0.2 | Avg | 0.999 | 0.601 | 0.712 | 0.646 | 0.8 |
0.2 | SD | NA | 0.108 | 0.089 | 0.083 | 0.054 |
0.3 | Avg | 0.999 | 0.566 | 0.773 | 0.651 | 0.783 |
0.3 | SD | NA | 0.066 | 0.059 | 0.051 | 0.033 |
0.4 | Avg | 0.999 | 0.566 | 0.781 | 0.654 | 0.783 |
0.4 | SD | NA | 0.035 | 0.063 | 0.026 | 0.018 |
0.5 | Avg | 0.999 | 0.581 | 0.738 | 0.65 | 0.791 |
0.5 | SD | NA | 0.078 | 0.082 | 0.079 | 0.039 |
0.6 | Avg | 0.999 | 0.587 | 0.76 | 0.656 | 0.793 |
0.6 | SD | NA | 0.097 | 0.044 | 0.061 | 0.049 |
0.7 | Avg | 0.999 | 0.62 | 0.753 | 0.679 | 0.81 |
0.7 | SD | NA | 0.053 | 0.046 | 0.041 | 0.026 |
0.8 | Avg | 0.999 | 0.54 | 0.719 | 0.614 | 0.77 |
0.8 | SD | NA | 0.081 | 0.036 | 0.041 | 0.018 |
0.9 | Avg | 0.999 | 0.573 | 0.711 | 0.629 | 0.787 |
0.9 | SD | NA | 0.117 | 0.097 | 0.089 | 0.058 |
1 | Avg | 0.999 | 0.601 | 0.75 | 0.666 | 0.801 |
1 | SD | NA | 0.062 | 0.081 | 0.06 | 0.031 |
Averages | | 0.999 | 0.5843 | 0.7434 | 0.651 | 0.7922 |

Welch’s t-Test Results (p < 0.10) | Precision t-Value | Recall t-Value | F-Score t-Value | Macro Precision t-Value | Analysis |
---|---|---|---|---|---|
0.1 vs. 0.2 | 0.183 | 0.799 | 0.612 | 0.183 | No significant difference |
0.1 vs. 0.3 | 1.413 | −1.563 | 0.592 | 1.413 | 0.1 is better than 0.3 in precision and macro precision, but 0.3 is better in recall |
0.1 vs. 0.4 | 1.773 | −1.826 | 0.628 | 1.773 | 0.1 is better than 0.4 in precision and macro precision, but 0.4 is better in recall |
0.1 vs. 0.5 | 0.836 | −0.065 | 0.506 | 0.836 | 0.1 and 0.5 are statistically equal |
0.1 vs. 0.6 | 0.451 | −0.957 | 0.264 | 0.451 | 0.1 and 0.6 are statistically equal |
0.1 vs. 0.7 | −0.431 | −0.859 | −0.706 | −0.431 | 0.1 and 0.7 are statistically equal |
0.1 vs. 0.8 | 2.052 | 0.959 | 2.5 | 2.826 | 0.1 is better than 0.8 except in recall, where the two are statistically equal |
0.1 vs. 0.9 | 0.821 | 0.745 | 1.131 | 0.821 | 0.1 and 0.9 are statistically equal |
0.1 vs. 1 | 0.219 | −0.746 | −0.033 | 0.302 | 0.1 and 1 are statistically equal |

Oversampling % | Statistic | Accuracy | Precision | Recall | F-Score | Macro Precision |
---|---|---|---|---|---|---|
0.1 | Avg | 0.999 | 0.698 | 0.906 | 0.788 | 0.849 |
0.1 | SD | NA | 0.016 | 0.017 | 0.010 | 0.008 |
0.2 | Avg | 0.999 | 0.694 | 0.907 | 0.786 | 0.847 |
0.2 | SD | NA | 0.016 | 0.015 | 0.011 | 0.008 |
0.3 | Avg | 0.999 | 0.691 | 0.911 | 0.786 | 0.846 |
0.3 | SD | NA | 0.017 | 0.015 | 0.011 | 0.008 |
0.4 | Avg | 0.999 | 0.686 | 0.905 | 0.780 | 0.843 |
0.4 | SD | NA | 0.014 | 0.011 | 0.010 | 0.007 |
0.5 | Avg | 0.999 | 0.678 | 0.906 | 0.776 | 0.839 |
0.5 | SD | NA | 0.011 | 0.014 | 0.008 | 0.005 |
0.6 | Avg | 0.999 | 0.699 | 0.908 | 0.790 | 0.849 |
0.6 | SD | NA | 0.020 | 0.003 | 0.013 | 0.010 |
0.7 | Avg | 0.999 | 0.699 | 0.908 | 0.790 | 0.849 |
0.7 | SD | NA | 0.021 | 0.016 | 0.016 | 0.011 |
0.8 | Avg | 0.999 | 0.692 | 0.911 | 0.787 | 0.846 |
0.8 | SD | NA | 0.011 | 0.021 | 0.013 | 0.006 |
0.9 | Avg | 0.999 | 0.688 | 0.901 | 0.780 | 0.844 |
0.9 | SD | NA | 0.021 | 0.010 | 0.015 | 0.011 |
1.0 | Avg | 0.999 | 0.688 | 0.888 | 0.775 | 0.844 |
1.0 | SD | NA | 0.018 | 0.015 | 0.016 | 0.009 |
Averages | | 0.999 | 0.6913 | 0.9051 | 0.7838 | 0.8456 |

Welch’s t-Test Results (p < 0.10) | Precision t-Value | Recall t-Value | F-Score t-Value | Macro Precision t-Value | Analysis |
---|---|---|---|---|---|
0.1 vs. 0.2 | 0.566 | −0.276 | 0.386 | 0.566 | 0.1 and 0.2 are statistically equal |
0.1 vs. 0.3 | 0.874 | −0.706 | 0.442 | 0.874 | 0.1 and 0.3 are statistically equal |
0.1 vs. 0.4 | 1.805 | 0.07 | 1.702 | 1.806 | 0.1 is better than 0.4 in precision, F-score, and macro precision |
0.1 vs. 0.5 | 3.349 | −0.128 | 3.104 | 3.349 | 0.1 is better than 0.5 in precision, F-score, and macro precision |
0.1 vs. 0.6 | −0.084 | −0.489 | −0.284 | −0.084 | 0.1 and 0.6 are statistically equal |
0.1 vs. 0.7 | 0.684 | −0.73 | 0.293 | 0.683 | 0.1 and 0.7 are statistically equal |
0.1 vs. 0.8 | 1.62 | 0.52 | 1.48 | 1.62 | 0.1 is better than 0.8 in precision, F-score, and macro precision |
0.1 vs. 0.9 | 1.202 | 2.814 | 2.187 | 1.204 | 0.1 is better than 0.9 in recall and F-score |
0.1 vs. 1 | 2.101 | 0.246 | 1.832 | 2.101 | 0.1 is better than 1 in all metrics except recall |

Oversampling % | Statistic | Accuracy | Precision | Recall | F-Score | Macro Precision |
---|---|---|---|---|---|---|
0.1 | Avg | 0.999 | 0.939 | 0.951 | 0.945 | 0.969 |
0.1 | SD | NA | 0.01 | 0.005 | 0.004 | 0.005 |
0.2 | Avg | 0.999 | 0.938 | 0.948 | 0.943 | 0.969 |
0.2 | SD | NA | 0.013 | 0.008 | 0.006 | 0.006 |
0.3 | Avg | 0.999 | 0.936 | 0.945 | 0.941 | 0.968 |
0.3 | SD | NA | 0.009 | 0.011 | 0.007 | 0.004 |
0.4 | Avg | 0.999 | 0.939 | 0.95 | 0.944 | 0.969 |
0.4 | SD | NA | 0.01 | 0.008 | 0.007 | 0.005 |
0.5 | Avg | 0.999 | 0.938 | 0.948 | 0.943 | 0.969 |
0.5 | SD | NA | 0.01 | 0.007 | 0.006 | 0.005 |
0.6 | Avg | 0.999 | 0.941 | 0.945 | 0.943 | 0.97 |
0.6 | SD | NA | 0.006 | 0.011 | 0.005 | 0.003 |
0.7 | Avg | 0.999 | 0.946 | 0.948 | 0.947 | 0.973 |
0.7 | SD | NA | 0.007 | 0.003 | 0.003 | 0.003 |
0.8 | Avg | 0.999 | 0.946 | 0.943 | 0.944 | 0.973 |
0.8 | SD | NA | 0.012 | 0.009 | 0.01 | 0.006 |
0.9 | Avg | 0.999 | 0.94 | 0.949 | 0.944 | 0.97 |
0.9 | SD | NA | 0.007 | 0.011 | 0.003 | 0.003 |
1 | Avg | 0.999 | 0.944 | 0.943 | 0.943 | 0.972 |
1 | SD | NA | 0.012 | 0.007 | 0.005 | 0.006 |
Averages | | 0.999 | 0.9407 | 0.947 | 0.9437 | 0.9702 |

Welch’s t-Test Results (p < 0.10) | Precision t-Value | Recall t-Value | F-Score t-Value | Macro Precision t-Value | Analysis |
---|---|---|---|---|---|
0.1 vs. 0.2 | 0.013 | 0.812 | 0.555 | 0.013 | 0.1 and 0.2 are statistically equal |
0.1 vs. 0.3 | 0.493 | 1.461 | 1.579 | 0.495 | 0.1 is better than 0.3 in recall and F-score |
0.1 vs. 0.4 | −0.036 | 0.225 | 0.097 | −0.036 | 0.1 and 0.4 are statistically equal |
0.1 vs. 0.5 | 0.102 | 0.978 | 0.68 | 0.103 | 0.1 and 0.5 are statistically equal |
0.1 vs. 0.6 | −0.676 | 1.296 | 0.657 | −0.675 | 0.1 and 0.6 are statistically equal |
0.1 vs. 0.7 | −1.738 | 1.318 | −1.316 | −1.738 | 0.1 and 0.7 are statistically equal |
0.1 vs. 0.8 | −1.389 | 2.264 | 0.063 | −1.387 | 0.8 is better than 0.1 in precision and F-score, while 0.1 has better recall |
0.8 vs. 0.9 | 1.336 | −1.37 | −0.011 | 1.334 | 0.8 and 0.9 are statistically equal, but 0.9 has better recall |
0.8 vs. 1 | 0.353 | 0 | 0.282 | 0.353 | 0.8 and 1 are statistically equal |

**Table 21.** UWF-ZeekData22: credential access—classification results for random undersampling after splitting.

Oversampling % | Statistic | Accuracy | Precision | Recall | F-Score | Macro Precision |
---|---|---|---|---|---|---|
0.1 | Avg | 0.999 | 0.822 | 0.878 | 0.833 | 0.911 |
0.1 | SD | NA | 0.176 | 0.116 | 0.112 | 0.088 |
0.2 | Avg | 0.999 | 0.732 | 0.900 | 0.788 | 0.867 |
0.2 | SD | NA | 0.141 | 0.168 | 0.123 | 0.071 |
0.3 | Avg | 0.999 | 0.799 | 0.911 | 0.847 | 0.899 |
0.3 | SD | NA | 0.112 | 0.120 | 0.101 | 0.056 |
0.4 | Avg | 0.999 | 0.744 | 0.911 | 0.804 | 0.872 |
0.4 | SD | NA | 0.162 | 0.097 | 0.100 | 0.081 |
0.5 | Avg | 0.999 | 0.770 | 0.944 | 0.835 | 0.885 |
0.5 | SD | NA | 0.154 | 0.075 | 0.085 | 0.077 |
0.6 | Avg | 0.999 | 0.696 | 0.944 | 0.793 | 0.848 |
0.6 | SD | NA | 0.116 | 0.102 | 0.092 | 0.056 |
0.7 | Avg | 0.999 | 0.713 | 0.922 | 0.793 | 0.857 |
0.7 | SD | NA | 0.155 | 0.122 | 0.116 | 0.0777 |
0.8 | Avg | 0.999 | 0.639 | 0.933 | 0.749 | 0.820 |
0.8 | SD | NA | 0.067 | 0.133 | 0.058 | 0.034 |
0.9 | Avg | 0.999 | 0.722 | 0.922 | 0.800 | 0.861 |
0.9 | SD | NA | 0.110 | 0.100 | 0.052 | 0.055 |
1.0 | Avg | 0.999 | 0.742 | 0.889 | 0.789 | 0.871 |
1.0 | SD | NA | 0.146 | 0.131 | 0.083 | 0.073 |
Averages | | 0.999 | 0.7379 | 0.9154 | 0.8031 | 0.8691 |

**Table 22.** Welch’s t-test results: UWF-ZeekData22: credential access—random undersampling after splitting.

Welch’s t-Test Results (p < 0.10) | Precision t-Value | Recall t-Value | F-Score t-Value | Macro Precision t-Value | Analysis |
---|---|---|---|---|---|
0.1 vs. 0.2 | 1.252 | −0.344 | 0.858 | 1.252 | 0.1 and 0.2 are statistically equal |
0.1 vs. 0.3 | 0.348 | −0.632 | −0.286 | 0.348 | 0.1 and 0.3 are statistically equal |
0.1 vs. 0.4 | 1.023 | −0.697 | 0.611 | 1.023 | 0.1 and 0.4 are statistically equal |
0.1 vs. 0.5 | 0.704 | −1.528 | −0.047 | 0.704 | 0.1 and 0.5 are statistically equal, but 0.5 has better recall |
0.5 vs. 0.6 | 1.204 | 0 | 1.06 | 1.204 | 0.5 and 0.6 are statistically equal |
0.5 vs. 0.7 | 0.814 | 0.49 | 0.936 | 0.814 | 0.5 and 0.7 are statistically equal |
0.5 vs. 0.8 | 2.461 | 0.23 | 2.653 | 2.461 | 0.5 is better than 0.8 except for recall |
0.5 vs. 0.9 | 0.786 | 0.563 | 1.112 | 0.786 | 0.5 and 0.9 are statistically equal |
0.5 vs. 1.0 | 0.408 | 1.162 | 1.22 | 0.408 | 0.5 and 1.0 are statistically equal |

**Table 23.** UWF-ZeekData22: privilege escalation—classification results for random undersampling after splitting.

Oversampling % | Statistic | Accuracy | Precision | Recall | F-Score | Macro Precision |
---|---|---|---|---|---|---|
0.1 | Avg | 0.999 | 0.900 | 0.750 | 0.779 | 0.949 |
0.1 | SD | NA | 0.213 | 0.273 | 0.226 | 0.106 |
0.2 | Avg | 0.999 | 0.813 | 0.725 | 0.712 | 0.906 |
0.2 | SD | NA | 0.203 | 0.343 | 0.260 | 0.101 |
0.3 | Avg | 0.999 | 0.820 | 0.825 | 0.768 | 0.909 |
0.3 | SD | NA | 0.130 | 0.275 | 0.147 | 0.065 |
0.4 | Avg | 0.999 | 0.800 | 0.800 | 0.788 | 0.899 |
0.4 | SD | NA | 0.322 | 0.331 | 0.316 | 0.161 |
0.5 | Avg | 0.999 | 0.843 | 0.825 | 0.811 | 0.921 |
0.5 | SD | NA | 0.253 | 0.275 | 0.238 | 0.126 |
0.6 | Avg | 0.999 | 0.745 | 0.899 | 0.804 | 0.872 |
0.6 | SD | NA | 0.091 | 0.135 | 0.068 | 0.045 |
0.7 | Avg | 0.999 | 0.704 | 0.911 | 0.785 | 0.852 |
0.7 | SD | NA | 0.121 | 0.147 | 0.107 | 0.060 |
0.8 | Avg | 0.999 | 0.707 | 0.888 | 0.781 | 0.853 |
0.8 | SD | NA | 0.094 | 0.157 | 0.099 | 0.047 |
0.9 | Avg | 0.999 | 0.738 | 0.855 | 0.778 | 0.869 |
0.9 | SD | NA | 0.138 | 0.131 | 0.087 | 0.069 |
1.0 | Avg | 0.999 | 0.759 | 0.966 | 0.843 | 0.879 |
1.0 | SD | NA | 0.131 | 0.071 | 0.087 | 0.065 |
Averages | | 0.999 | 0.7833 | 0.845 | 0.7853 | 0.8915 |

**Table 24.** Welch’s t-test results: UWF-ZeekData22: privilege escalation—random undersampling after splitting.

Welch’s t-Test Results (p < 0.10) | Precision t-Value | Recall t-Value | F-Score t-Value | Macro Precision t-Value | Analysis |
---|---|---|---|---|---|
0.1 vs. 0.2 | 0.929 | 0.179 | 0.611 | 0.929 | 0.1 and 0.2 are statistically equal |
0.1 vs. 0.3 | 1.012 | −0.611 | 0.118 | 1.012 | 0.1 and 0.3 are statistically equal |
0.1 vs. 0.4 | 0.817 | −0.367 | −0.08 | 0.817 | 0.1 and 0.4 are statistically equal |
0.1 vs. 0.5 | 0.541 | −0.611 | −0.307 | 0.541 | 0.1 and 0.5 are statistically equal |
0.1 vs. 0.6 | 2.1 | −1.552 | −0.344 | 2.1 | 0.1 is better than 0.6 except for recall |
0.1 vs. 0.7 | 2.52 | −1.638 | −0.086 | 2.52 | 0.1 is better than 0.7 except for recall |
0.1 vs. 0.8 | 2.606 | −1.391 | −0.027 | 2.606 | 0.1 is better than 0.8 except for recall |
0.1 vs. 0.9 | 1.999 | −1.098 | 0.011 | 1.999 | 0.1 is better than 0.9 in precision and macro precision |
0.1 vs. 1 | 1.77 | −2.421 | −0.834 | 1.77 | 0.1 is better than 1 except for recall |

