TAILIEUCHUNG - Scientific report: "Parsing Arabic Dialects"

David Chiang, Mona Diab, Nizar Habash, Owen Rambow, Safiullah Shareef
ISI, University of Southern California; CCLS, Columbia University; The Johns Hopkins University
chiang@, {mdiab, habash, rambow}@, safi@

Abstract

The Arabic language is a collection of spoken dialects with important phonological, morphological, lexical, and syntactic differences, along with a standard written language, Modern Standard Arabic (MSA). Since the spoken dialects are not officially written, it is very costly to obtain adequate corpora to use for training dialect NLP tools such as parsers. In this paper, we address the problem of parsing transcribed spoken Levantine Arabic (LA). We do not assume the existence of any annotated LA corpus (except for development and testing), nor of a parallel corpus LA-MSA. Instead, we use explicit knowledge about the relation between LA and MSA.

1 Introduction: Arabic Dialects

The Arabic language is a collection of spoken dialects and a standard written language.¹ The dialects show phonological, morphological, lexical, and syntactic differences comparable to those among the Romance languages. The standard written language is the same throughout the Arab world: Modern Standard Arabic (MSA). MSA is also used in some scripted spoken communication (news casts, parliamentary debates). MSA is based on Classical Arabic and is not a native language of any Arabic-speaking people; children do not learn it from their parents but in school. Most native speakers of Arabic are unable to produce sustained spontaneous MSA. Dialects vary not only along a ...

¹ This paper is based on work done at the 2005 Johns Hopkins Summer Workshop, which was partially supported by the National Science Foundation under Grant No. 0121285. Diab, Habash, and Rambow were supported for additional work by DARPA contract HR0011-06-C-0023 under the GALE program. We wish to thank audiences at JHU for their useful feedback. The authors are listed in alphabetical order.
