Malicious JavaScript Detection using Obfuscation Analysis and String Reconstruction Techniques

Alfin Gusti Alamsyah; Latius Hermawan

Authors

Alfin Gusti Alamsyah, Department of Informatics, Faculty of Science and Technology, Catholic University of Musi Charitas
Latius Hermawan, Department of Informatics, Faculty of Science and Technology, Catholic University of Musi Charitas

Keywords:

machine learning, malicious code, obfuscated JavaScript, random forest, string reconstruction

Abstract

Detecting malicious JavaScript remains a persistent challenge in cybersecurity, particularly as obfuscation techniques grow more sophisticated. This study presents a dual-model detection framework that separates obfuscation analysis from malicious-behavior analysis to improve precision. The first model detects obfuscated scripts using 20 features, including entropy, string ratios, and syntax. The second model classifies malicious code using 92 features, which incorporate the output of the first model and semantically meaningful strings reconstructed with a novel technique called atomic search. Both models use the random forest algorithm and are trained on balanced datasets of labeled JavaScript samples. In experiments, the obfuscation model achieves 99.1% accuracy and the malicious-code detection model achieves 99.52%. By addressing obfuscation explicitly and incorporating semantic reconstruction, the proposed approach provides a scalable and effective way to detect hidden threats in modern web environments.
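
As a rough illustration of the kind of features the obfuscation model relies on, the sketch below computes character-level Shannon entropy and a string-literal ratio for a JavaScript source string. The abstract does not list the 20 features or how they are computed, so the function names, the simplified string-literal pattern, and the choice of character-level entropy are all assumptions made for this example, not the authors' implementation.

```python
import math
import re
from collections import Counter

# Simplified pattern for single- or double-quoted string literals.
# Assumption for illustration: template literals and regex literals are ignored.
STRING_LITERAL = re.compile(r'"(?:\\.|[^"\\])*"|\'(?:\\.|[^\'\\])*\'')


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def string_ratio(source: str) -> float:
    """Fraction of source characters that fall inside quoted string literals."""
    if not source:
        return 0.0
    literal_chars = sum(len(m.group(0)) for m in STRING_LITERAL.finditer(source))
    return literal_chars / len(source)


def obfuscation_features(source: str) -> dict:
    """Hypothetical feature vector covering two of the feature families named above."""
    return {
        "entropy": shannon_entropy(source),
        "string_ratio": string_ratio(source),
        "length": len(source),
    }
```

In a two-stage setup like the one described, a dictionary of this kind would feed the first classifier, and that classifier's prediction could then be appended to the second model's feature vector. Heavily encoded payloads tend to push both the entropy and the string ratio upward, which is why such measures are common starting points for obfuscation detection.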
