Arabic Language Modeling with Finite State Transducers

Ilana Heintz
Department of Linguistics
The Ohio State University
Columbus, OH

Abstract

In morphologically rich languages such as Arabic, morpheme combinations produce significantly more word forms than in languages with fewer inflected forms (Kirchhoff et al., 2006). This exacerbates the out-of-vocabulary (OOV) problem: test set words are more likely to be unknown, which limits the effectiveness of the model. The goal of this study is to use the regularities of Arabic inflectional morphology to reduce the OOV problem in that language. We hope that success in this task will result in a decrease in word error rate (WER) in Arabic automatic speech recognition.

1 Introduction

The task of language modeling is to predict the next word in a sequence of words (Jelinek et al., 1991). Predicting words that have not been seen in training is the main obstacle (Gale and Sampson, 1995) and is called the out-of-vocabulary (OOV) problem. In morphologically rich languages, the OOV problem is worsened by the increased number of morpheme combinations. Berton et al. (1996) and Geutner (1995) approached this problem in German, finding that language models built on decomposed words reduce the OOV rate of a test set. In Carki et al. (2000), Turkish words are split into syllables for language modeling; this also reduces the OOV rate but does not improve WER. Morphological decomposition is also used to boost language modeling scores in Korean (Kwon, 2000) and Finnish (Hirsimaki et al., 2006).

We approach the processing of Arabic morphology, both inflectional and derivational, with finite state machines (FSMs). We use a technique that produces many morphological analyses for each word, retaining information about possible stems, affixes …

This work was supported by a student-faculty fellowship from the AFRL Dayton Area Graduate Studies Institute and was carried out in partnership with Ray Slyh and Tim Anderson of the Air Force Research Labs.
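The introduction's two technical claims, that language models built on decomposed words lower the OOV rate and that an analyzer can keep many candidate segmentations per word, can be illustrated with a small Python sketch on made-up data. It is not the paper's method. The affix lists, the Buckwalter-style transliterated words, the fixed proclitic + article + stem + suffix template, and the coverage rule (a test word counts as in-vocabulary if at least one of its analyses uses only units seen in training) are all hypothetical assumptions, and the segmentations are enumerated by brute force rather than with an FST toolkit.

```python
"""Toy comparison of whole-word vs. morpheme-level OOV rates.

Hypothetical setup for illustration only: Buckwalter-style transliterated
words and a tiny made-up affix inventory, not a real Arabic grammar and
not the FSMs used in the paper.
"""

# Hypothetical affix lists, applied in the fixed order
# proclitic + article + stem + suffix.
PROCLITICS = ["", "w", "b"]  # "and", "with"
ARTICLES = ["", "Al"]  # definite article
SUFFIXES = ["", "hA", "At"]  # "her", feminine plural


def analyses(word, min_stem=2):
    """Return every (proclitic, article, stem, suffix) split of `word`.

    All segmentations licensed by the affix lists are kept instead of
    committing to a single one; stems shorter than `min_stem` are rejected.
    """
    results = []
    for c in PROCLITICS:
        for d in ARTICLES:
            prefix = c + d
            if not word.startswith(prefix):
                continue
            rest = word[len(prefix):]
            for s in SUFFIXES:
                if not rest.endswith(s):
                    continue
                stem = rest[: len(rest) - len(s)]
                if len(stem) >= min_stem:
                    results.append((c, d, stem, s))
    return results


def units(analysis):
    """Turn one analysis into LM tokens, marking affixes with '+'."""
    c, d, stem, s = analysis
    tokens = []
    if c:
        tokens.append(c + "+")
    if d:
        tokens.append(d + "+")
    tokens.append(stem)
    if s:
        tokens.append("+" + s)
    return tokens


def word_oov_rate(train, test):
    """Fraction of test tokens never seen as whole words in training."""
    vocab = set(train)
    return sum(w not in vocab for w in test) / len(test)


def morpheme_oov_rate(train, test):
    """Fraction of test tokens with no analysis made only of seen units.

    Assumption: the training vocabulary includes the units of *every*
    analysis of every training word.
    """
    vocab = {u for w in train for a in analyses(w) for u in units(a)}

    def covered(word):
        return any(all(u in vocab for u in units(a)) for a in analyses(word))

    return sum(not covered(w) for w in test) / len(test)


train = ["ktAb", "Alqlm", "wbyt"]
test = ["AlktAb", "qlm", "wAlbyt", "ktAb", "dAr"]

print(analyses("wAlbyt"))
# [('', '', 'wAlbyt', ''), ('w', '', 'Albyt', ''), ('w', 'Al', 'byt', '')]
print(word_oov_rate(train, test), morpheme_oov_rate(train, test))
# 0.8 0.2
```

On this toy data the training word Alqlm contributes the units Al+ and qlm, so the unseen test words AlktAb, qlm, and wAlbyt can all be rebuilt from seen units, and only dAr stays OOV. Keeping every analysis also admits implausible splits such as w+ Albyt, which a real system would have to weight or prune.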
