Web Scraping with HTML DOM Method for Data Collection of Scientific Articles from Google Scholar

Authors

  • Alam Rahmatulloh, Universitas Siliwangi
  • Rohmat Gunawan

DOI:

https://doi.org/10.24002/ijis.v2i2.3029

Keywords:

Google Scholar, data collection, data recap, web scraping

Abstract

Google Scholar is a web-based service for searching a broad range of academic literature. It gives access to many types of references, such as peer-reviewed papers, theses, books, abstracts, and articles from academic publishers, professional communities, preprint repositories, universities, and other academic organizations. Google Scholar also lets every researcher, expert, and lecturer create a profile. The number of publications from an academic institution, along with detailed data on each scientific article, can be accessed through Google Scholar. A recap of the scientific articles published by each researcher in an institution or organization is needed to assess research performance collectively. However, no recap service for the scientific article publications of each researcher in an institution or organization is available. So that academic institutions and organizations can make use of this publication data, this research collects data from Google Scholar and builds a recap of scientific article publications by applying web scraping. Web scraping makes it possible to retrieve resources available on the web so that the results can be used by other applications. Scraping Google Scholar yields collective publication data, which allows the recap to be produced quickly. The experiments in this study succeeded in retrieving data on 236 researchers from Google Scholar, with 9 attributes and 2,420 articles.
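
The general idea of DOM-based scraping followed by a per-researcher recap can be sketched as below. This is a minimal illustration under stated assumptions, not the implementation used in the study: the markup, the class names (name, article, title, year, cited), and the functions scrape_profile and recap are all hypothetical, and real Google Scholar pages are structured differently. The sketch parses a well-formed sample page into a DOM tree with Python's standard xml.dom.minidom module and walks the tree to collect article data.

```python
from xml.dom import minidom

# Hypothetical, well-formed profile markup (an assumption for illustration;
# real Google Scholar pages use different elements and class names).
SAMPLE_HTML = """
<div id="profile">
  <span class="name">Researcher A</span>
  <table id="articles">
    <tr class="article"><td class="title">First Paper</td><td class="year">2018</td><td class="cited">12</td></tr>
    <tr class="article"><td class="title">Second Paper</td><td class="year">2019</td><td class="cited">3</td></tr>
  </table>
</div>
"""


def text_of(node):
    """Join the text nodes that are direct children of a DOM element."""
    parts = [c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE]
    return "".join(parts).strip()


def by_class(parent, tag, cls):
    """Return descendant elements with the given tag name and class attribute."""
    return [e for e in parent.getElementsByTagName(tag)
            if e.getAttribute("class") == cls]


def scrape_profile(html):
    """Parse one profile page into a DOM tree and extract its article rows."""
    root = minidom.parseString(html).documentElement
    articles = []
    for row in by_class(root, "tr", "article"):
        articles.append({
            "title": text_of(by_class(row, "td", "title")[0]),
            "year": int(text_of(by_class(row, "td", "year")[0])),
            "cited_by": int(text_of(by_class(row, "td", "cited")[0])),
        })
    return {"name": text_of(by_class(root, "span", "name")[0]),
            "articles": articles}


def recap(profiles):
    """Summarize scraped profiles into one recap row per researcher."""
    return [{
        "name": p["name"],
        "article_count": len(p["articles"]),
        "total_citations": sum(a["cited_by"] for a in p["articles"]),
    } for p in profiles]


if __name__ == "__main__":
    for row in recap([scrape_profile(SAMPLE_HTML)]):
        print(row)
```

In practice, pages fetched from the web are rarely well-formed XML, so a tolerant HTML parser would usually replace minidom here; the traversal pattern of locating elements by tag and attribute stays the same.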

Published

2020-02-26

How to Cite

Rahmatulloh, A., & Gunawan, R. (2020). Web Scraping with HTML DOM Method for Data Collection of Scientific Articles from Google Scholar. Indonesian Journal of Information Systems, 2(2), 95–104. https://doi.org/10.24002/ijis.v2i2.3029