English Document Summarization Using Local Sentence Distribution

Authors

  • Aminul Wahib, Department of Informatics Engineering, Institut Teknologi Sepuluh Nopember
  • Agus Zainal Arifin, Department of Informatics Engineering, Institut Teknologi Sepuluh Nopember
  • Diana Purwitasari, Department of Informatics Engineering, Institut Teknologi Sepuluh Nopember

DOI:

https://doi.org/10.24002/jbi.v7i1.482

Abstract

The rapidly growing number of digital documents means that much time is wasted searching for and reading information. To address this problem, many document summarization methods have been developed to find the important, or key, sentences in a source document. This study proposes a new strategy for summarizing English documents that uses a local sentence distribution method to find and uncover hidden important sentences in the source document, with the aim of improving summary quality. Experiments are conducted on the DUC 2004 Task 2 dataset. ROUGE-1 and ROUGE-2 are used to compare the performance of the proposed method against the sentence information density and sentence cluster keyword (SIDeKiCK) method. The experiments show that the proposed method performs better, with an average ROUGE-1 of 0.398, an increase of 1.5% over SIDeKiCK, and an average ROUGE-2 of 0.12, an increase of 13% over SIDeKiCK.
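Because the comparison rests entirely on ROUGE-1 and ROUGE-2, the following minimal sketch shows how ROUGE-N recall is computed: the number of reference n-grams also found in the candidate summary (each credited at most as many times as it occurs in both) divided by the total number of n-grams in the reference summaries. The function names and the simple lowercase, alphanumeric-only tokenizer are illustrative assumptions. The official ROUGE toolkit adds its own preprocessing options (for example, stemming, stopword removal, and length limits), so scores from this sketch are not expected to reproduce the figures reported above.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and keep alphanumeric runs (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=1):
    """ROUGE-N recall of one candidate summary against one or more reference summaries."""
    cand_counts = ngrams(tokenize(candidate), n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(tokenize(ref), n)
        # Clipped match: an n-gram counts at most min(ref count, candidate count) times.
        matched += sum(min(count, cand_counts[g]) for g, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


if __name__ == "__main__":
    candidate = "the cat sat on the mat"
    references = ["the cat lay on the mat"]
    print(round(rouge_n_recall(candidate, references, n=1), 3))  # 5 of 6 unigrams -> 0.833
    print(round(rouge_n_recall(candidate, references, n=2), 3))  # 3 of 5 bigrams -> 0.6
```

ROUGE-2 is stricter than ROUGE-1 because word order must match over adjacent pairs. In the toy example, replacing a single word drops recall from 5/6 for unigrams to 3/5 for bigrams.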

Keywords: Document Summarization, Important Sentences, Local Sentence Distribution, ROUGE.

Published

2016-01-31