A platform for collaborative semantic annotation

Valerio Basile, Johan Bos, Kilian Evang, and Noortje Venhuizen
{v.basile, johan.bos, k.evang, n.j.venhuizen}@rug.nl
Center for Language and Cognition Groningen (CLCG)
University of Groningen, The Netherlands

Abstract

Data-driven approaches in computational semantics are not common because only a few semantically annotated resources are available. We are building a large corpus of public-domain English texts and annotating them semi-automatically with syntactic structures (derivations in Combinatory Categorial Grammar) and semantic representations (Discourse Representation Structures), including events, thematic roles, named entities, anaphora, scope, and rhetorical structure. We have created a wiki-like Web-based platform on which a crowd of expert annotators (i.e. linguists) can log in and adjust linguistic analyses in real time, at various levels of analysis, such as boundaries (tokens, sentences) and tags (part of speech, lexical categories). The demo will illustrate the different features of the platform, including navigation, visualization, and editing.
1 Introduction

Data-driven approaches in computational semantics are still rare because there are not many large annotated resources that provide empirical information about anaphora, presupposition, scope, events, tense, thematic roles, named entities, word senses, ellipsis, discourse segmentation, and rhetorical relations in a single formalism. This is not surprising, as it is challenging and time-consuming to create such a resource from scratch. Nevertheless, our objective is to develop a large annotated corpus of Discourse Representation Structures (Kamp and Reyle, 1993), comprising most of the aforementioned phenomena: the Groningen Meaning Bank (GMB). We aim to reach this goal by:

1. Providing a wiki-like platform supporting collaborative annotation efforts
2. Employing state-of-the-art NLP software for bootstrapping semantic analysis
3. Giving real-time feedback of annotation adjustments in their