Preserving Meher and Woirata Corpus Languages using Neural Machine Translation
DOI:
https://doi.org/10.24002/ijis.v6i2.8542
Abstract
Research on regional languages is difficult to conduct because little or no corpus is available, and this is particularly true of Indonesia's regional languages. This project seeks to build a machine translation system between Indonesian and the Meher and Woirata languages, in both directions. To achieve this, a corpus of the Meher and Woirata languages must first be developed. The corpus was produced through field studies: the researcher asked several speakers of each language to translate manually, then compared the results from the different translators in focus group discussions to identify the appropriate use of words. The outcomes of this translation process were recorded in a database of Indonesian-Meher and Indonesian-Woirata language pairs, which serves as the training data for the translation system. The research collected 714,000 words in the Meher language and 805,000 words in the Woirata language. These results were then used as the machine translation training corpus, and the system's output was validated through direct assessment by speakers of the two languages. Testing indicated an accuracy above 80% for translation into both the Meher language and the Woirata language. It can be concluded that constructing the Meher and Woirata corpora through field research succeeded in gathering and establishing a language corpus for these two languages. In addition, the experimental results suggest that translation algorithms can convert Indonesian into regional languages and vice versa with acceptable accuracy.
The contribution of this research is the establishment of a Meher and Woirata language corpus that can be accessed by anyone who requires it.
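As an illustration of the validation step described in the abstract, the following is a minimal sketch of how speaker judgments could be turned into a per-language accuracy figure. It is not the authors' implementation: the `SentencePair` record, the `accuracy_by_language` helper, and the assumption that each assessment is a simple accept/reject judgment are all hypothetical, and the placeholder strings stand in for real corpus entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentencePair:
    """One corpus entry (hypothetical layout, not the paper's schema)."""
    indonesian: str
    regional: str
    language: str  # e.g. "meher" or "woirata"


def accuracy(judgments):
    """Share of translations that speakers marked as acceptable."""
    judgments = list(judgments)
    if not judgments:
        raise ValueError("no judgments supplied")
    return sum(1 for ok in judgments if ok) / len(judgments)


def accuracy_by_language(rated):
    """Group (SentencePair, accepted) tuples by language and score each group."""
    grouped = {}
    for pair, accepted in rated:
        grouped.setdefault(pair.language, []).append(accepted)
    return {lang: accuracy(flags) for lang, flags in grouped.items()}


# Placeholder data only: the strings are not real Meher or Woirata text,
# and the judgments are invented for illustration.
rated = [
    (SentencePair("<indonesian 1>", "<meher output 1>", "meher"), True),
    (SentencePair("<indonesian 2>", "<meher output 2>", "meher"), False),
    (SentencePair("<indonesian 3>", "<woirata output 3>", "woirata"), True),
]
scores = accuracy_by_language(rated)
```

Under this assumed scheme, a language meets the threshold reported in the abstract when its score exceeds 0.8.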
License
![Creative Commons License](http://i.creativecommons.org/l/by-sa/4.0/88x31.png)
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Indonesian Journal of Information Systems, as the journal publisher, holds the copyright of papers published in this journal. Authors transfer the copyright of their papers by completing the Copyright Transfer Form and sending it to Indonesian Journal of Information Systems.
Indonesian Journal of Information Systems is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.