Diacritic Restoration for Yoruba Text with under dot and Diacritic Marks Based on LSTM

dc.contributor.author: Ogheneruemu, L. Kingsley
dc.contributor.author: Ajao, F. Jumoke
dc.contributor.author: Isiaka, M. Rafiu
dc.contributor.author: Asahiah, O. Franklin
dc.contributor.author: Orimogunje, K. Olumide
dc.date.accessioned: 2026-05-16T19:49:55Z
dc.date.available: 2026-05-16T19:49:55Z
dc.date.issued: 2023-09-18
dc.description.abstract: Yoruba is a tonal language spoken by over 40 million people, primarily in Nigeria, in some other West African countries, and in other parts of the world. Many Yoruba texts written online lack tone marks, which makes them confusing and ambiguous to read and difficult for natural language processing. This paper presents a method that combines a syllable-based approach with long short-term memory (LSTM) for diacritic restoration of standard Yoruba text. By using LSTM to mitigate the vanishing-gradient problem inherent in RNNs, the aim is to recover lost diacritics in Yoruba text, both for characters that carry tone marks and for characters with an underdot, and to return the text with the proper diacritics. Data were acquired from Yoglobalvoice, BBC Yoruba news, and Yoruba words collected from literate indigenous writers: 27,050 entries from Yoglobalvoice, 2,000 Yoruba words extracted from BBC Yoruba news, and 1,470 Yoruba words collected from a Yoruba language teacher. In addition, a syllabification module was developed to group each tokenized word into syllables. The output of the syllabification algorithm was fed into the LSTM module for training. The LSTM model was trained on 70% of the dataset and validated on the remaining 30%. The model achieved 96% accuracy. The results show that using LSTM for diacritic restoration improved the restoration of both characters with an underdot and characters that carry tone marks.
dc.identifier.issn: 2579-0617
dc.identifier.uri: https://kwasuspace.kwasu.edu.ng/handle/123456789/7253
dc.language.iso: en
dc.publisher: FUOYE Journal of Engineering and Technology
dc.title: Diacritic Restoration for Yoruba Text with under dot and Diacritic Marks Based on LSTM
dc.type: Article
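The data preparation described in the abstract (splitting words into syllables, pairing undiacritized input with diacritized targets, and a 70/30 train/validation split) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the simplified V / CV / syllabic-nasal rules, the "gb" digraph handling, and the shuffling seed are all assumptions made for illustration, and the LSTM model itself is not shown.

```python
import random
import unicodedata

VOWELS = set("aeiou")  # base vowels; ẹ and ọ decompose to e/o plus an underdot


def strip_diacritics(text):
    """Drop every combining mark, i.e. both tone marks and underdots."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def graphemes(word):
    """Group each base letter with the combining marks that follow it."""
    units = []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and units:
            units[-1] += ch
        else:
            units.append(ch)
    return units


def is_vowel(unit):
    return unit[0].lower() in VOWELS


def syllabify(word):
    """Hypothetical, simplified syllabifier (assumed rules, not the paper's).

    - consonant(s) + vowel form one syllable, with "gb" kept as one onset
    - a bare "n" after a vowel and not before a vowel joins that vowel
    - any other consonant left without a vowel becomes its own syllable
    """
    units = graphemes(word)
    syllables, onset, i = [], "", 0
    while i < len(units):
        unit = units[i]
        if is_vowel(unit):
            syllable = onset + unit
            onset = ""
            nxt = units[i + 1] if i + 1 < len(units) else None
            after = units[i + 2] if i + 2 < len(units) else None
            if nxt == "n" and (after is None or not is_vowel(after)):
                syllable += nxt  # nasalized vowel such as "an" or "in"
                i += 1
            syllables.append(syllable)
        else:
            is_gb = onset.lower() == "g" and unit.lower() == "b"
            if onset and not is_gb:
                syllables.append(onset)  # e.g. a syllabic nasal
                onset = ""
            onset += unit
        i += 1
    if onset:
        syllables.append(onset)
    return [unicodedata.normalize("NFC", s) for s in syllables]


def make_pairs(word):
    """Return (undiacritized input, diacritized target) pairs per syllable."""
    return [(strip_diacritics(s), s) for s in syllabify(word)]


def split_train_val(items, train_fraction=0.7, seed=0):
    """Shuffle and split; the 70/30 ratio follows the abstract, the seed is assumed."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Under these assumptions, `strip_diacritics("Yorùbá")` returns `"Yoruba"`, and `make_pairs("Yorùbá")` yields one pair per syllable (Yo, rù, bá), each with its stripped form as the input side. Syllabifying the diacritized word first and stripping each syllable afterwards keeps the input and target sequences aligned one to one, which is convenient for a sequence model.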
Files
Original bundle
Name: Diacritic Restoration for Yoruba Text with under dot and Diacritic Marks Based on LSTM.pdf
Size: 918.06 KB
Format: Adobe Portable Document Format