Scientific report: "Transforming Standard Arabic to Colloquial Arabic"

Emad Mohamed, Behrang Mohit and Kemal Oflazer
Carnegie Mellon University - Qatar, Doha, Qatar
emohamed@ behrang@ ko@

Abstract

We present a method for generating Colloquial Egyptian Arabic (CEA) from morphologically disambiguated Modern Standard Arabic (MSA). When used in POS tagging, this process improves the accuracy from [...] to [...] on unseen CEA text, and reduces the percentage of out-of-vocabulary words from [...] to [...]. The process holds promise for any NLP task targeting the dialectal varieties of Arabic. This approach may provide a cheap way to leverage MSA data and morphological resources to create resources for colloquial Arabic to English machine translation. It can also considerably speed up the annotation of Arabic dialects.

1. Introduction

Most of the research on Arabic is focused on Modern Standard Arabic. Dialectal varieties have not received much attention due to the lack of dialectal tools and annotated texts (Duh and Kirchhoff, 2005). In this paper, we present a rule-based method to generate Colloquial Egyptian Arabic (CEA) from Modern Standard Arabic (MSA), relying on segment-based part-of-speech tags. The transformation process relies on the observation that dialectal varieties of Arabic differ mainly in the use of affixes and function words, while the word stem mostly remains unchanged. For example, given the Buckwalter-encoded MSA sentence "AlAxwAn Almslmwn lm yfwzwA fy AlAntxAbAt", the rules produce "AlAxwAn Almslmyn mfAzw f AlAntxAbAt" ("The Muslim Brotherhood did not win the elections"). The availability of segment-based part-of-speech tags is essential, since many of the affixes in MSA are ambiguous.
For example, lm could be either a negative particle or a question word, and the word AlAxwAn could be made of either two segments (Al xwAn, "the brothers") or three segments (Al xw An, "the two brothers"). We first introduce the transformation rules and show that [...]
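
The idea of rewriting affixes and function words while leaving stems untouched can be illustrated with a minimal Python sketch. It is not the authors' rule set: the tag names (DET, NOUN, PREP, NSUFF_MASC_PL, NSUFF_FEM_PL), the segmentations, and the two rules are assumptions chosen to reproduce the nominal part of the example sentence above. The negated verb (lm yfwzwA) is left out because rewriting it would require re-inflecting the verb stem, which a lookup table cannot do.

```python
from typing import Dict, List, Tuple

Segment = Tuple[str, str]  # (Buckwalter form, segment-level POS tag)
Word = List[Segment]       # a word is a sequence of tagged segments

# Hypothetical rule table: (MSA form, tag) -> CEA form.
# Keying on the tag as well as the form means a rule fires only on the
# intended affix or function word, never on an identical string in a stem.
RULES: Dict[Segment, str] = {
    ("wn", "NSUFF_MASC_PL"): "yn",  # Almslm+wn -> Almslm+yn
    ("fy", "PREP"): "f",            # fy -> f
}


def transform_word(word: Word) -> Word:
    """Rewrite each segment that matches a rule; keep all others as-is."""
    return [(RULES.get((form, tag), form), tag) for form, tag in word]


def transform_sentence(sentence: List[Word]) -> str:
    """Apply the rules word by word and join the segments back together."""
    return " ".join(
        "".join(form for form, _ in transform_word(word)) for word in sentence
    )


# Assumed segmentation of the nominal part of the example sentence.
msa = [
    [("Al", "DET"), ("AxwAn", "NOUN")],
    [("Al", "DET"), ("mslm", "NOUN"), ("wn", "NSUFF_MASC_PL")],
    [("fy", "PREP")],
    [("Al", "DET"), ("AntxAb", "NOUN"), ("At", "NSUFF_FEM_PL")],
]

print(transform_sentence(msa))
```

Because the lookup key includes the tag, a stem that merely ends in the letters wn is not rewritten, which is why disambiguated, segment-level tags matter for this kind of rule.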
