Application of Named Entity Recognition via Twitter on SpaCy in Indonesian (Case Study : Power Failure in the Special Region of Yogyakarta)

Authors

  • Rizka Maulida Yanti, Politeknik Statistika STIS
  • Ibnu Santoso, Politeknik Statistika STIS
  • Lya Hulliyyatus Suadaa, Politeknik Statistika STIS

DOI:

https://doi.org/10.24002/ijis.v4i1.4677

Keywords:

information extraction, NER, SpaCy, Twitter, power failure

Abstract

SpaCy is a tool that efficiently handles Natural Language Processing (NLP) tasks, one of which is Named Entity Recognition (NER). NER extracts and identifies named entities in a text. However, SpaCy has not yet officially released a pre-trained NER model for Indonesian. Meanwhile, according to the 2019 PLN statistical report, the Province of D.I. Yogyakarta frequently experiences power failures, and many public complaints about power failures in the province can be found on Twitter. There has been no research on extracting information related to electrical disturbances, and research on NER using SpaCy for Indonesian is still rare. This study therefore extracts information related to power failures in the Province of D.I. Yogyakarta from Twitter using SpaCy for Indonesian. The resulting model performs well, with a precision of 95.52%, a recall of 93.27%, and an F1-score of 94.38%. The tweets related to electrical disturbances are then mapped based on the location entities they contain. This mapping shows that the highest number of location mentions in power-failure tweets came from Sleman Regency, while the lowest number came from Gunung Kidul Regency. The month with the most power failures was March 2020, while the month with the fewest was July 2020.
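As a reading aid, the reported scores can be cross-checked with the standard definition of the F1-score as the harmonic mean of precision and recall. The sketch below is illustrative only: the function name `f1_score` is a hypothetical helper, not code from the study, and the only inputs are the precision and recall values quoted in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, expressed as fractions.
reported_precision = 0.9552
reported_recall = 0.9327

# Recompute the F1-score as a percentage, rounded to two decimals.
f1_percent = round(f1_score(reported_precision, reported_recall) * 100, 2)
print(f1_percent)
```

Under this definition, the recomputed value agrees with the 94.38% F1-score stated in the abstract.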

Author Biography

Rizka Maulida Yanti, Politeknik Statistika STIS

Diploma IV (D4) Study Program in Statistical Computing

Published

2021-08-19

How to Cite

Yanti, R. M., Santoso, I., & Suadaa, L. H. (2021). Application of Named Entity Recognition via Twitter on SpaCy in Indonesian (Case Study : Power Failure in the Special Region of Yogyakarta). Indonesian Journal of Information Systems, 4(1), 76–86. https://doi.org/10.24002/ijis.v4i1.4677

Section

Articles