Arabic Diacritization Using the N-Gram and C5.0 Methods (Pemberian Harakat Bahasa Arab Menggunakan Metode N-Gram dan C5.0)

Ajib Hanani, Harry Soekotjo Dachlan, Purnomo Budi Santoso

Abstract


This research presents an Arabic diacritizer based on N-Gram (Quad Gram) and C5.0. The application takes an undiacritized Arabic sentence as input. Quad Gram, together with matching each word against known patterns, is used to diacritize the sentence. C5.0 is used to simplify the rules for assigning the final diacritic mark of each word. The output is an Arabic sentence diacritized according to morphological and syntactic rules. The diacritized sentence is then converted to speech and spoken by a talking avatar through a text-to-speech API. Quad Gram combined with pattern matching achieves an accuracy of 96% with respect to morphological rules, and C5.0 achieves an accuracy of 94% with respect to syntactic rules. Integrating N-Gram (Quad Gram) and C5.0 achieves an accuracy of 90% with respect to both morphological and syntactic rules.

Index Terms — Arabic Diacritics, C5.0, N-Gram, Text to Speech.
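
The abstract does not say how the Quad Gram step is implemented, so the following is only a minimal sketch of one plausible reading: character-level windows of four letters, learned from diacritized words and used to vote on the mark for each letter. The function names (`split_letters`, `train`, `diacritize`), the `#` boundary padding, and the voting scheme are all assumptions made for illustration. The pattern-matching step and the C5.0 rules for the final diacritic are not modeled here.

```python
from collections import Counter, defaultdict

# Arabic diacritic marks (harakat), U+064B through U+0652.
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}
PAD = "#"  # hypothetical word-boundary symbol


def split_letters(word):
    """Split a diacritized word into (letter, marks) pairs."""
    pairs = []
    for ch in word:
        if ch in DIACRITICS and pairs:
            letter, marks = pairs[-1]
            pairs[-1] = (letter, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


def train(diacritized_words, n=4):
    """Count, for each undiacritized n-letter window, the mark sequences seen."""
    table = defaultdict(Counter)
    for word in diacritized_words:
        pad = [(PAD, "")] * (n - 1)
        pairs = pad + split_letters(word) + pad
        for i in range(len(pairs) - n + 1):
            window = pairs[i:i + n]
            key = "".join(letter for letter, _ in window)
            marks = tuple(mark for _, mark in window)
            table[key][marks] += 1
    return table


def diacritize(word, table, n=4):
    """Let every known window that covers a letter vote on that letter's mark."""
    letters = [PAD] * (n - 1) + list(word) + [PAD] * (n - 1)
    votes = [Counter() for _ in letters]
    for i in range(len(letters) - n + 1):
        key = "".join(letters[i:i + n])
        if key in table:
            best_marks, _ = table[key].most_common(1)[0]
            for j, mark in enumerate(best_marks):
                votes[i + j][mark] += 1
    out = []
    for k in range(n - 1, len(letters) - (n - 1)):  # skip the padding
        mark = votes[k].most_common(1)[0][0] if votes[k] else ""
        out.append(letters[k] + mark)
    return "".join(out)


# Toy training data: "kataba" (kaf, ta, ba, each followed by fatha).
KATABA = "\u0643\u064e\u062a\u064e\u0628\u064e"
table = train([KATABA])
restored = diacritize("\u0643\u062a\u0628", table)
```

In this sketch a letter with no matching window is left without a mark, so an unseen word comes back unchanged. A real system would need a much larger diacritized corpus and a fallback for such cases.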

