| Issue | MATEC Web Conf. Volume 252, 2019, III International Conference of Computational Methods in Engineering Science (CMES’18) |
|---|---|
| Article Number | 03006 |
| Number of page(s) | 5 |
| Section | Computational Artificial Intelligence |
| DOI | https://doi.org/10.1051/matecconf/201925203006 |
| Published online | 14 January 2019 |
Neural machine translation system for the Kazakh language based on synthetic corpora
Al-Farabi Kazakh National University, Almaty, Kazakhstan
* Corresponding author: ualsher.tukeyev@gmail.com
There is a lack of large parallel corpora for the Kazakh language, and this seriously impairs the quality of machine translation from and into Kazakh. This article considers neural machine translation for Kazakh on the basis of synthetic corpora. Kazakh belongs to the Turkic languages, which are characterised by rich morphology, while neural machine translation requires large amounts of training data. The article presents a model for creating synthetic corpora, namely the generation of sentences based on the complete system of suffixes of the Kazakh language. Building the generation on this complete suffix system is the novelty of the approach. Using the generated synthetic corpora, we improve translation quality in neural machine translation for the Kazakh-English and Kazakh-Russian pairs.
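The idea of generating material from a complete suffix system can be illustrated with a small enumeration. The sketch below is a minimal illustration under stated assumptions, not the authors' model. It treats a Kazakh noun as a stem followed by ordered suffix slots (plural, possessive, case). The slot order, the tag names such as `POSS.1SG`, the possessive subset, and the helper functions `suffix_combinations` and `generate_forms` are all hypothetical. It outputs tagged analyses rather than surface words, so vowel harmony and consonant assimilation, which select the actual suffix allomorph, are deliberately left out.

```python
from itertools import product

# Hypothetical, simplified suffix slots for a Kazakh noun, in a fixed order.
# The tags and slot order are illustrative assumptions, not the paper's inventory.
# An empty string means the slot is left unfilled.
NOUN_SLOTS = [
    ["", "PL"],                                        # plural
    ["", "POSS.1SG", "POSS.2SG", "POSS.3"],            # possessive (subset only)
    ["", "GEN", "DAT", "ACC", "LOC", "ABL", "INS"],    # case ("" = nominative)
]


def suffix_combinations(slots):
    """Return every ordered combination of suffix tags, dropping empty slots."""
    combos = []
    for choice in product(*slots):
        combos.append(tuple(tag for tag in choice if tag))
    return combos


def generate_forms(stem, slots):
    """Attach each suffix combination to a stem as a tagged analysis string."""
    return ["+".join((stem,) + combo) for combo in suffix_combinations(slots)]


if __name__ == "__main__":
    forms = generate_forms("бала", NOUN_SLOTS)  # "бала" = "child"
    print(len(forms))
    print(forms[:3])
```

With 2 plural options, 4 possessive options and 7 case options (the unmarked nominative plus six marked cases), this sketch yields 2 × 4 × 7 = 56 tag sequences per stem, for example `бала+PL+POSS.1SG+DAT`. Mapping each tag sequence to a surface word form, and from there to sentences, depends on rules not given in this abstract and is not attempted here.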
© The Authors, published by EDP Sciences, 2019
This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.