Malicious JavaScript Detection using Obfuscation Analysis and String Reconstruction Techniques
Keywords:
machine learning, malicious code, obfuscated JavaScript, random forest, string reconstruction
Abstract
Detecting malicious JavaScript remains a persistent challenge in cybersecurity, particularly as obfuscation techniques grow more sophisticated. This study presents a dual-model detection framework that separates obfuscation analysis from malicious-behavior analysis to improve precision. The first model detects obfuscated scripts using 20 features, including entropy, string ratios, and syntax characteristics. The second model classifies malicious code using 92 features, which incorporate outputs from the first model and semantically meaningful strings reconstructed with a novel technique called atomic search. Both models use the random forest algorithm and are trained on balanced datasets of labeled JavaScript samples. In experiments, the obfuscation model achieves 99.1% accuracy and the malicious-code detection model reaches 99.52%. By handling obfuscation explicitly and incorporating semantic reconstruction, the proposed approach offers a scalable and effective way to detect hidden threats in modern web environments.
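As an illustration of the kind of features the abstract names for the first model (entropy and string ratios), the following is a minimal sketch only. The abstract does not specify how the study computes its 20 features, so the definitions, the simplified string-literal pattern, and the names `shannon_entropy`, `string_ratio`, and `obfuscation_features` are all assumptions made for this example.

```python
import math
import re
from collections import Counter

# Assumed, simplified patterns for quoted JavaScript string literals.
# Template literals and regex literals are deliberately not handled here.
DOUBLE_QUOTED = r'"(?:\\.|[^"\\])*"'
SINGLE_QUOTED = r"'(?:\\.|[^'\\])*'"
STRING_LITERAL = re.compile(DOUBLE_QUOTED + "|" + SINGLE_QUOTED)


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def string_ratio(source: str) -> float:
    """Fraction of the script's characters that lie inside string literals."""
    if not source:
        return 0.0
    literal_chars = sum(len(m.group(0)) for m in STRING_LITERAL.finditer(source))
    return literal_chars / len(source)


def obfuscation_features(source: str) -> dict:
    """Two hypothetical features; the study itself uses 20."""
    return {
        "entropy": shannon_entropy(source),
        "string_ratio": string_ratio(source),
    }
```

Feature vectors of this kind could then be passed to a random forest classifier; that training step, along with the remaining features, is left out of the sketch.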
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Copyright of this journal is assigned to Jurnal Buana Informatika as the journal publisher with the knowledge of the author, while the moral right of the publication belongs to the author. All printed and electronic publications are open access for educational, research, and library purposes. The editorial board is not responsible for copyright violations arising from uses other than those purposes. Reproduction of any part of this journal (printed or online) is allowed only with written permission from Jurnal Buana Informatika.