Hate Speech Classification of Tweets in Bahasa Indonesia (Klasifikasi Ujaran Kebencian pada Cuitan dalam Bahasa Indonesia)
DOI: https://doi.org/10.24002/jbi.v10i2.2451
Abstract.
The sheer amount of hate speech on social media has become wearying. Hate speech keeps increasing, yet social media platforms have taken no preventive measures to counter it. Existing hate speech detection tools are also not yet available for Bahasa Indonesia. This article describes a machine learning model that can recognize hate speech in Bahasa Indonesia. The study compares three existing machine learning methods: Naive Bayes, SVM, and Logistic Regression. Several parameters are varied during testing to obtain the best detection performance. The expected result is a model that recognizes hate speech in Bahasa Indonesia accurately, with a target accuracy above 85%. In the experiments, the highest accuracy achieved was 98%, while the lowest was 80%.
Keywords: Hate speech detection, machine learning model, social media, Bahasa Indonesia, tweets
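The evaluation idea in the abstract (train a classifier, vary a parameter, measure accuracy) can be illustrated with a small sketch. It implements only one of the three methods named, multinomial Naive Bayes, in plain Python. Everything specific here is an assumption made for illustration: the abstract does not say which features or parameters the authors used, so the whitespace tokenizer, the smoothing parameter `alpha`, the label names, and the tiny example sentences are hypothetical placeholders, not the authors' data or method. Accuracy on such toy data says nothing about the 80% to 98% range reported above.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokens; real tweets need more cleaning.
    return text.lower().split()


def train_nb(texts, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with additive smoothing alpha."""
    word_counts = defaultdict(Counter)  # label -> token frequencies
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "alpha": alpha,
        "class_counts": Counter(labels),
        "word_counts": word_counts,
        "vocab": vocab,
        "n_docs": len(labels),
    }


def predict_nb(model, text):
    """Return the label with the highest log posterior for the given text."""
    alpha = model["alpha"]
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, doc_count in model["class_counts"].items():
        counts = model["word_counts"][label]
        total = sum(counts.values())
        score = math.log(doc_count / model["n_docs"])  # log prior
        for token in tokenize(text):
            if token not in model["vocab"]:
                continue  # skip words never seen in training
            score += math.log((counts[token] + alpha) / (total + alpha * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the true label."""
    correct = sum(predict_nb(model, t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Hypothetical placeholder data (mild wording), not the authors' dataset.
train_texts = [
    "aku benci kamu bodoh",
    "dasar bodoh pergi sana",
    "kalian semua menjijikkan",
    "selamat pagi semua",
    "terima kasih banyak teman",
    "hari ini cuaca cerah",
]
train_labels = ["hate", "hate", "hate", "normal", "normal", "normal"]
test_texts = ["kamu bodoh", "selamat pagi teman"]
test_labels = ["hate", "normal"]

# Vary one parameter and report the accuracy for each setting.
for alpha in (0.1, 1.0, 10.0):
    model = train_nb(train_texts, train_labels, alpha=alpha)
    print(f"alpha={alpha}: accuracy={accuracy(model, test_texts, test_labels):.2f}")
```

The same loop structure would apply to SVM or Logistic Regression: swap the training and prediction functions, sweep the parameter of interest, and compare accuracy on a held-out set.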
License
Copyright of this journal is assigned to Jurnal Buana Informatika as the journal publisher with the knowledge of the author, while the moral rights to the publication belong to the author. All printed and electronic publications are open access for educational, research, and library purposes. The editorial board is not responsible for copyright violations arising from uses other than the purposes mentioned above. Reproduction of any part of this journal (printed or online) is allowed only with written permission from Jurnal Buana Informatika.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.