Scientific report: "A Single Generative Model for Joint Morphological Segmentation and Syntactic Parsing"

Yoav Goldberg
Ben Gurion University of the Negev, Department of Computer Science, POB 653 Be'er Sheva 84105, Israel
yoavg@

Reut Tsarfaty
University of Amsterdam, Institute for Logic, Language and Computation, Plantage Muidergracht 24, Amsterdam, NL
rtsarfat@

Abstract

Morphological processes in Semitic languages deliver space-delimited words which introduce multiple, distinct, syntactic units into the structure of the input sentence. These words are in turn highly ambiguous, breaking the assumption underlying most parsers that the yield of a tree for a given sentence is known in advance. Here we propose a single joint model for performing both morphological segmentation and syntactic disambiguation which bypasses the associated circularity. Using a treebank grammar, a data-driven lexicon, and a linguistically motivated unknown-tokens handling technique, our model outperforms previous pipelined, integrated or factorized systems for Hebrew morphological and syntactic processing, yielding an error reduction of 12% over the best published results so far.
1 Introduction

Current state-of-the-art broad-coverage parsers assume a direct correspondence between the lexical items ingrained in the proposed syntactic analyses (the yields of syntactic parse-trees) and the space-delimited tokens (henceforth, "tokens") that constitute the unanalyzed surface forms (utterances). In Semitic languages the situation is very different. In Modern Hebrew (Hebrew), a Semitic language with very rich morphology, particles marking conjunctions, prepositions, complementizers and relativizers are bound elements prefixed to the word (Glinert, 1989). The Hebrew token bcl [1], for example, stands for the complete prepositional phrase "in the shadow". This token may further embed into a larger utterance, e.g., bcl hneim (literally "in-the-shadow the-pleasant", meaning ...

[1] We adopt here the transliteration of Sima'an et al. (2001).
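
To make the ambiguity of space-delimited tokens concrete, the following is a minimal sketch that enumerates candidate segmentations of a transliterated token into bound prefixes followed by a stem. It is not the authors' model: the function `segmentations` and the sets `PREFIXES` and `STEMS` are hypothetical toy data invented for illustration. In particular, the whole-token entry "bcl" is an assumption added only to show how a single token can receive more than one analysis, and the sketch ignores segments that are not overt in the surface form.

```python
from typing import List, Tuple

# Hypothetical toy lexicon (an assumption for illustration only; the paper
# uses a data-driven lexicon that is not reproduced here).
PREFIXES = {"b", "h", "w"}       # assumed bound particles, transliterated
STEMS = {"cl", "bcl", "neim"}    # assumed stems; "bcl" is a made-up entry


def segmentations(token: str) -> List[Tuple[str, ...]]:
    """Return every reading of `token` as zero or more prefixes plus one stem."""
    results: List[Tuple[str, ...]] = []

    def walk(rest: str, prefix_seq: Tuple[str, ...]) -> None:
        # The remaining string may itself be a complete stem.
        if rest in STEMS:
            results.append(prefix_seq + (rest,))
        # Otherwise (or in addition), peel off one more bound prefix.
        for prefix in PREFIXES:
            if rest.startswith(prefix) and len(rest) > len(prefix):
                walk(rest[len(prefix):], prefix_seq + (prefix,))

    walk(token, ())
    return sorted(results)  # sorted so the output order is deterministic


if __name__ == "__main__":
    for tok in ["bcl", "hneim"]:
        print(tok, segmentations(tok))
```

Under this toy lexicon the token bcl receives two candidate analyses while hneim receives one. A parser that assumes a fixed yield has to commit to one segmentation before parsing, whereas a joint model can, in principle, let syntactic context choose among all of them.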
