Bot Spammer Detection on Twitter Based on Sentiment Analysis and Time Interval Entropy

Authors

  • Christian Sri Kusuma Aditya
  • Mamluatul Hani’ah
  • Alif Akbar Fitrawan
  • Agus Zainal Arifin
  • Diana Purwitasari

DOI:

https://doi.org/10.24002/jbi.v7i3.656

Abstract

Abstract. Spam is the abuse of messaging to send content that recipients do not want, and those who send it are called spammers. Twitter's popularity has attracted spammers who use it to disseminate spam messages. Spam tweets are characterized by neutral emotional sentiment: they express no particular preference of the posting user toward any perspective. In addition, a regular, periodic tweeting pattern indicates automation performed by a bot. This study proposes a new method to distinguish bot spammer accounts from legitimate user accounts by integrating emotion-based sentiment analysis (SA) with time interval entropy (TIE). A combination of knowledge-based and machine learning-based approaches is used to classify tweets as having positive, negative, or neutral sentiment. The collection of timestamps from each account is then used to calculate that account's time interval entropy. The results show that the precision and recall of the proposed method reach 83% and 91%, respectively. This suggests that combining SA and TIE can improve overall system performance in detecting bot spammers.

Keywords: bot spammer, twitter, sentiment analysis, polarity, entropy

 

Abstract (from the Indonesian version). Spam is the misuse of messaging to send messages that the recipient does not want; a person who sends spam is called a spammer. Twitter's popularity invites spammers to use it as a means of spreading spam messages. Tweets categorized as spam have neutral emotional sentiment, meaning that the user who posted the tweet expresses no particular preference toward any perspective. In addition, regular, periodic timing in tweet posting indicates automation performed by a bot. This study proposes a new method for distinguishing bot spammers from legitimate users by integrating emotion-based sentiment analysis and time interval entropy. A combined knowledge-based and machine learning-based approach is used to classify tweets as having positive, negative, or neutral sentiment. The collection of timestamps is then used to calculate the time interval entropy of each account. The experimental results show that the precision and recall of the proposed method reach 83% and 91%. This demonstrates that combining Sentiment Analysis (SA) and Time Interval Entropy (TIE) can optimize overall system performance in detecting bot spammers.

Keywords: bot spammer, twitter, sentiment analysis, polarity, entropy
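
The abstract describes calculating a time interval entropy from each account's timestamps but does not give the formula. As a rough illustration only, the sketch below assumes one common reading: the Shannon entropy of the binned gaps between consecutive posts. The function name time_interval_entropy, the bin_seconds parameter, and the sample timestamps are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def time_interval_entropy(timestamps, bin_seconds=60):
    """Shannon entropy (in bits) of binned gaps between consecutive posts.

    Assumption: gaps are grouped into fixed-width bins of `bin_seconds`;
    the paper's actual binning and formula may differ.
    """
    ordered = sorted(timestamps)
    # Gaps between each pair of consecutive posts.
    intervals = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not intervals:
        return 0.0
    # Count how many gaps fall into each bin.
    counts = Counter(int(gap // bin_seconds) for gap in intervals)
    total = len(intervals)
    # H = sum over bins of p * log2(1 / p).
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical posting times, in seconds from an arbitrary start.
bot_like = [0, 600, 1200, 1800, 2400]   # perfectly regular gaps
human_like = [0, 45, 900, 1000, 7200]   # irregular gaps

print(time_interval_entropy(bot_like))    # 0.0
print(time_interval_entropy(human_like))  # 2.0
```

Under this assumption, an account that posts at fixed intervals puts every gap in the same bin and scores zero, while irregular posting spreads the gaps across bins and scores higher. This matches the abstract's observation that periodic regularity points to automation.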

Published

2016-07-17