Combining Multiple Large-Scale Resources in a Reusable Lexicon for Natural Language Generation

Hongyan Jing and Kathleen McKeown
Department of Computer Science
Columbia University
New York, NY 10027, USA
{hjing, kathy}@cs.columbia.edu

Abstract

A lexicon is an essential component in a generation system, but few efforts have been made to build a rich, large-scale lexicon and make it reusable for different generation applications. In this paper, we describe our work to build such a lexicon by combining multiple, heterogeneous linguistic resources which have been developed for other purposes. Novel transformation and integration of resources is required to reuse them for generation. We also applied the lexicon to the lexical choice and realization component of a practical generation application by using a multi-level feedback architecture. The integration of the lexicon and the architecture is able to effectively improve the system paraphrasing power, minimize the chance of grammatical errors, and simplify the development process substantially.

1 Introduction

Every generation system needs a lexicon, and in almost every case it is acquired anew. Few efforts in building a rich, large-scale, and reusable generation lexicon have been presented in literature. Most generation systems are still supported by a small system lexicon with limited entries and hand-coded knowledge.
Although such lexicons are reported to be sufficient for the specific domain in which a generation system works, there are some obvious deficiencies: (1) Hand-coding is time and labor intensive, and introduction of errors is likely. (2) Even though some knowledge, such as syntactic structures for a verb, is domain-independent, often it is re-encoded each time a new application is under development. (3) Hand-coding seriously restricts the scale and expressive power of generation systems. As natural language generation is used in more ambitious applications, this situation calls for an improvement. Generally existing linguistic .