Preserving Meher and Woirata Corpus Languages using Neural Machine Translation

Authors

  • Yulius Denny Prabowo, Universitas Bina Nusantara
  • Marthen, Texas A&M University
  • Nazarudin, Universitas Indonesia
  • Ratumanan, Universitas Pattimura
  • Martinus, Universitas Atma Jaya Yogyakarta

DOI:

https://doi.org/10.24002/ijis.v6i2.8542

Abstract

Research on regional languages is difficult to conduct because little or no corpus data is available, particularly for Indonesia's regional languages. This project seeks to build a machine translation system between Indonesian and the Meher and Woirata languages, in both directions. To achieve this, a corpus of the Meher and Woirata languages must first be developed. The corpus was produced through field studies: the researchers asked several speakers of each language to translate manually, then compared the results from the different translators in focus group discussions to identify the appropriate use of words. The outcomes of this translation process were recorded as a database of Indonesian-Meher and Indonesian-Woirata language pairs, which serves as the training data for the machine translation system. The research collected 714,000 words in the Meher language and 805,000 words in the Woirata language. These results were used as the training corpus, and the output of the resulting system was validated through direct assessment by speakers of the two languages. Testing indicated an accuracy above 80% for translation into both the Meher language and the Woirata language. It can be concluded that building the Meher and Woirata corpora through field research succeeded in gathering and establishing a language corpus for these two languages. In addition, the experimental results suggest that machine translation between Indonesian and regional languages is feasible and can provide translations with acceptable accuracy.
The contribution of this research is the establishment of the Meher and Woirata language corpus so that it can be accessed by anyone who requires it.
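
The corpus-building and validation steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how translations were compared or how accuracy was computed, so the majority-vote rule, the function names `consensus` and `assessment_accuracy`, and the placeholder strings are all assumptions made for illustration. In the study itself, disagreements were resolved through focus group discussions rather than by an automatic rule.

```python
from collections import Counter


def consensus(candidates):
    """Return (translation, needs_discussion) for one source sentence.

    candidates: translations of the same Indonesian sentence from
    different speakers. Assumption: a strict majority is accepted and
    anything else is flagged for a focus group discussion.
    """
    # Normalize case and whitespace so trivial differences do not count.
    normalized = [" ".join(c.lower().split()) for c in candidates]
    top, votes = Counter(normalized).most_common(1)[0]
    return top, votes * 2 <= len(normalized)


def assessment_accuracy(judgments):
    """Share of machine outputs that speakers marked acceptable (True)."""
    if not judgments:
        raise ValueError("no judgments given")
    return sum(judgments) / len(judgments)


# Placeholder strings only; no real Meher or Woirata data is shown here.
translation, flagged = consensus(["variant a", "Variant  A", "variant b"])
```

Here `translation` is the normalized majority variant and `flagged` is false because two of the three speakers agree. A pair that passes this check would be stored as one row of the Indonesian-Meher or Indonesian-Woirata database, and `assessment_accuracy` would be applied to the speakers' accept or reject judgments of the system's output.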

Author Biographies

Yulius Denny Prabowo, Universitas Bina Nusantara

Computer Science Department, Binus Online Learning, Universitas Bina Nusantara, Jakarta, Indonesia

Marthen, Texas A&M University

Nuclear Engineering Department, Texas A&M University, Texas, United States of America

Nazarudin, Universitas Indonesia

Department of Linguistics, Faculty of Humanities, Universitas Indonesia, Depok, West Java

Ratumanan, Universitas Pattimura

Faculty of Teacher Training and Education, Universitas Pattimura, Ambon, Maluku, Indonesia

Martinus, Universitas Atma Jaya Yogyakarta

Informatics Study Program, Faculty of Industrial Technology, Universitas Atma Jaya Yogyakarta, Special Region of Yogyakarta, Indonesia

Published

2024-02-29

How to Cite

Prabowo, Y., Gabriel, M., Nazarudin, Ratumanan, T., & Maslim, M. (2024). Preserving Meher and Woirata Corpus Languages using Neural Machine Translation. Indonesian Journal of Information Systems, 6(2), 156–161. https://doi.org/10.24002/ijis.v6i2.8542