Noun Phrase Chunking in Hebrew: Influence of Lexical and Morphological Features

Yoav Goldberg and Meni Adler and Michael Elhadad
Computer Science Department
Ben Gurion University of the Negev
P.O.B 653 Be'er Sheva 84105, Israel
{yoavg, adlerm, elhadad}@cs.bgu.ac.il

This work was funded by the Israel Ministry of Science and Technology under the auspices of the Knowledge Center for Processing Hebrew. Additional funding was provided by the Lynn and William Frankel Center for Computer Sciences.

Abstract

We present a method for Noun Phrase chunking in Hebrew. We show that the traditional definition of base-NPs as non-recursive noun phrases does not apply in Hebrew, and propose an alternative definition of Simple NPs. We review syntactic properties of Hebrew related to noun phrases, which indicate that the task of Hebrew SimpleNP chunking is harder than base-NP chunking in English. As a confirmation, we apply methods known to work well for English to Hebrew data. These methods give low results (F from 76 to 86) in Hebrew. We then discuss our method, which applies SVM induction over lexical and morphological features. Morphological features improve the average precision by 0.5%, recall by 1%, and F-measure by 0.75, resulting in a system with average performance of 93% precision, 93.4% recall, and 93.2 F-measure.

1 Introduction

Modern Hebrew is an agglutinative Semitic language with rich morphology. Like most other non-European languages, it lacks NLP resources and tools, and specifically there are currently no available syntactic parsers for Hebrew. We address the task of NP chunking in Hebrew as a first step to fulfill the need for such tools. We also illustrate how this task can successfully be approached with little resource requirements, and indicate how the method is applicable to other resource-scarce languages.

NP chunking is the task of labelling noun phrases in natural language text. The input to this task is free text with part-of-speech tags. The output is the same text with brackets