Unsupervised Induction of Modern Standard Arabic Verb Classes Using Syntactic Frames and LSA

Neal Snider, Linguistics Department, Stanford University, Stanford, CA 94305, snider@
Mona Diab, Center for Computational Learning Systems, Columbia University, New York, NY 10115, mdiab@

Abstract

We exploit the resources in the Arabic Treebank (ATB) and Arabic Gigaword (AG) to determine the best features for the novel task of automatically creating lexical semantic verb classes for Modern Standard Arabic (MSA). The verbs are classified into groups that share semantic elements of meaning, as they exhibit similar syntactic behavior. The results of the clustering experiments are compared with a gold standard set of classes, which is approximated by using the noisy English translations provided in the ATB to create Levin-like classes for MSA. The quality of the clusters is found to be sensitive to the inclusion of syntactic frames, LSA vectors, morphological pattern, and subject animacy. The best set of parameters yields an Fβ=1 score of […], compared to a random baseline with an Fβ=1 score of […].
1 Introduction

The creation of the Arabic Treebank (ATB) and Arabic Gigaword (AG) facilitates corpus-based studies of many interesting linguistic phenomena in Modern Standard Arabic (MSA). The ATB comprises manually annotated morphological and syntactic analyses of newswire text from different Arabic sources, while the AG is simply a huge collection of raw Arabic newswire text. In our ongoing project, we exploit the ATB and AG to determine the best features for the novel task of automatically creating lexical semantic verb classes for MSA. We are interested in the problem of classifying verbs in MSA into groups that share semantic elements of meaning, as they exhibit similar syntactic behavior. This manner of classifying verbs in a language is mainly advocated by Levin (1993). The Levin Hypothesis (LH) contends that verbs that exhibit similar syntactic behavior
