Application of Named Entity Recognition via Twitter on SpaCy in Indonesian (Case Study : Power Failure in the Special Region of Yogyakarta)
DOI:
https://doi.org/10.24002/ijis.v4i1.4677
Keywords:
information extraction, NER, spacy, twitter, power failure.
Abstract
SpaCy is a tool that can efficiently handle Natural Language Processing (NLP) tasks, one of which is Named Entity Recognition (NER). NER is used to extract and identify named entities in a text. However, SpaCy has not yet officially released a pre-trained NER model for Indonesian. Meanwhile, according to the 2019 PLN statistical report, the Province of D.I. Yogyakarta often experiences power failures, and many public complaints about power failures in the province can be found on Twitter. There has been no research on extracting information related to electrical disturbances, and research on NER using SpaCy in Indonesian is still rare. This study therefore extracts information related to power failures in the Province of D.I. Yogyakarta from Twitter using SpaCy for Indonesian. The resulting model performs well, with 95.52% precision, 93.27% recall, and a 94.38% F1-score. The location entities contained in tweets related to electrical disturbances are then mapped. This mapping shows that the highest number of locations mentioned in tweets related to power failures came from Sleman Regency, while the lowest number came from Gunung Kidul Regency. The month with the most power failures was March 2020, while the month with the fewest was July 2020.
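As a quick consistency check on the scores reported above, the following minimal sketch recomputes the F1-score as the harmonic mean of precision and recall. It assumes the standard definition of F1; the function name `f1_score` is illustrative and does not come from the paper, which does not show its evaluation code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall.

    Both inputs must use the same unit (here: percent).
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract, in percent.
reported_precision = 95.52
reported_recall = 93.27

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.2f}%")  # prints "F1 = 94.38%"
```

The recomputed value agrees with the 94.38% F1-score stated in the abstract.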
License
Indonesian Journal of Information Systems, as the journal publisher, holds the copyright of papers published in this journal. Authors transfer the copyright of their papers by filling out the Copyright Transfer Form and sending it to Indonesian Journal of Information Systems.
![Creative Commons License](https://i.creativecommons.org/l/by-sa/4.0/88x31.png)
Indonesian Journal of Information Systems is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.