Hate Speech Classification of Tweets in Bahasa Indonesia

Authors

  • Kevin Antariksa, Universitas Atma Jaya Yogyakarta
  • Y. Sigit Purnomo WP, Universitas Atma Jaya Yogyakarta
  • Ernawati Ernawati, Universitas Atma Jaya Yogyakarta

DOI:

https://doi.org/10.24002/jbi.v10i2.2451

Abstract


People have grown weary of the sheer amount of hate speech on social media. Hate speech keeps increasing, yet social media platforms have made no preventive effort to counter it. Existing hate speech detection tools are also not yet available for Bahasa Indonesia. This article presents a machine learning model that recognizes hate speech in Bahasa Indonesia. Within the model, several existing machine learning methods are compared: Naive Bayes, SVM, and Logistic Regression. During testing, several parameters are varied to find the values that give the best hate speech detection. The goal is a model that recognizes Indonesian-language hate speech accurately, with a target accuracy above 85%. In the experiments, the highest accuracy obtained was 98% and the lowest was 80%.
Keywords: Hate speech detection, machine learning model, social media, Bahasa Indonesia, tweets
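The abstract compares Naive Bayes, SVM, and Logistic Regression while varying their parameters, and it reports accuracy. The sketch below shows that kind of comparison in plain Python without any library. It is not the authors' implementation, because the abstract does not state their features, kernels, or parameter values. Everything here is an assumption:

* whitespace tokenization and bag-of-words counts;
* a six-tweet toy corpus invented for illustration (not the paper's dataset);
* a linear SVM and logistic regression both trained with plain SGD.

The names `MultinomialNB`, `LinearClassifier`, `accuracy`, `TRAIN`, `TEST`, and the hyperparameter values are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data (label 1 = hate speech, 0 = not); NOT the paper's dataset.
TRAIN = [
    ("kamu bodoh dan menjijikkan", 1),
    ("usir mereka semua dasar bodoh", 1),
    ("kelompok itu menjijikkan usir saja", 1),
    ("selamat pagi semua semoga sehat", 0),
    ("terima kasih atas bantuannya", 0),
    ("semoga harimu menyenangkan kawan", 0),
]
TEST = [("dasar menjijikkan", 1), ("terima kasih kawan", 0)]

def tokenize(text):
    # Assumption: lowercase + whitespace split; real tweet preprocessing would
    # also handle mentions, URLs, hashtags, punctuation and slang.
    return text.lower().split()

class MultinomialNB:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace/Lidstone) smoothing strength

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in tokenize(d)}
        self.log_prior, self.counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(w for d in class_docs for w in tokenize(d))
            self.totals[c] = sum(self.counts[c].values())
        return self

    def predict(self, doc):
        v = len(self.vocab)

        def log_posterior(c):
            s = self.log_prior[c]
            for w in tokenize(doc):
                if w in self.vocab:  # words never seen in training are skipped
                    s += math.log((self.counts[c][w] + self.alpha)
                                  / (self.totals[c] + self.alpha * v))
            return s

        return max(self.classes, key=log_posterior)

class LinearClassifier:
    """Binary linear model on word counts, trained with plain SGD.
    loss="log" gives logistic regression, loss="hinge" a linear SVM."""

    def __init__(self, loss="log", lr=0.1, epochs=20, l2=0.01):
        self.loss, self.lr, self.epochs, self.l2 = loss, lr, epochs, l2

    def _score(self, x):
        return self.b + sum(self.w.get(t, 0.0) * n for t, n in x.items())

    def fit(self, docs, labels):
        self.w, self.b = {}, 0.0
        for _ in range(self.epochs):
            for d, label in zip(docs, labels):
                y = 1 if label == 1 else -1
                x = Counter(tokenize(d))
                margin = y * self._score(x)
                if self.loss == "log":
                    # derivative of log(1 + exp(-y*z)) w.r.t. z, clamped to avoid overflow
                    g = -y / (1.0 + math.exp(min(margin, 50.0)))
                else:
                    # subgradient of the hinge loss max(0, 1 - y*z)
                    g = -y if margin < 1.0 else 0.0
                for t, n in x.items():
                    wt = self.w.get(t, 0.0)
                    # L2 shrinkage applied only to this tweet's features (simplification)
                    self.w[t] = wt - self.lr * (g * n + self.l2 * wt)
                self.b -= self.lr * g
        return self

    def predict(self, doc):
        return 1 if self._score(Counter(tokenize(doc))) > 0 else 0

def accuracy(model, data):
    return sum(model.predict(d) == y for d, y in data) / len(data)

# Small hyperparameter sweep over hypothetical candidate settings.
candidates = {
    "naive_bayes(alpha=1.0)": MultinomialNB(alpha=1.0),
    "naive_bayes(alpha=0.1)": MultinomialNB(alpha=0.1),
    "logistic_regression(lr=0.1)": LinearClassifier(loss="log", lr=0.1),
    "linear_svm(lr=0.1)": LinearClassifier(loss="hinge", lr=0.1),
}
train_docs, train_labels = zip(*TRAIN)
results = {name: accuracy(m.fit(train_docs, train_labels), TEST)
           for name, m in candidates.items()}
for name, acc in results.items():
    print(f"{name}: {acc:.0%}")
```

Logistic regression and a linear SVM differ only in their loss: log loss versus hinge loss. This is why a single SGD loop covers both. In a real experiment, the parameter sweep should be scored on a separate validation split rather than the test set, so that the reported accuracy is not optimistically biased.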

Author Biography

Kevin Antariksa, Universitas Atma Jaya Yogyakarta

Informatics Engineering

Published

2019-10-30