Choosing the Word Most Typical in Context Using a Lexical Co-occurrence Network

Philip Edmonds
Department of Computer Science, University of Toronto
Toronto, Canada M5S 3G4
pedmonds@cs.toronto.edu

Abstract

This paper presents a partial solution to a component of the problem of lexical choice: choosing the synonym most typical, or expected, in context. We apply a new statistical approach to representing the context of a word through lexical co-occurrence networks. The implementation was trained and evaluated on a large corpus, and results show that the inclusion of second-order co-occurrence relations improves the performance of our implemented lexical choice program.

1 Introduction

Recent work views lexical choice as the process of mapping from a set of concepts (in some representation of knowledge) to a word or phrase (Elhadad, 1992; Stede, 1996). When the same concept admits more than one lexicalization, it is often difficult to choose which of these "synonyms" is the most appropriate for achieving the desired pragmatic goals, but this choice is necessary for high-quality machine translation and natural language generation. Knowledge-based approaches to representing the potentially subtle differences between synonyms have suffered from a serious lexical acquisition bottleneck (DiMarco, Hirst, and Stede, 1993; Hirst, 1995). Statistical approaches, which have sought to explicitly represent differences between pairs of synonyms with respect to their occurrence with other specific words (Church et al., 1994), are inefficient in time and space.

This paper presents a new statistical approach to modeling context that provides a preliminary solution to an important sub-problem: that of determining the near-synonym that is most typical, or expected, if any, in a given context. Although this is weaker than full lexical choice, because it doesn't choose the "best" word, we believe that it is a necessary first step, because it would allow one to determine the effects of choosing a non-typical word in place of the typical word.
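To make the idea of first-order and second-order co-occurrence concrete, the following is a minimal sketch in Python of choosing the most typical near-synonym for a context. It is an illustration under stated assumptions, not the implementation evaluated in this paper: the text above does not specify the significance measure or how the network is traversed, so the sketch uses raw sentence-level co-occurrence counts, scores a second-order path by the weaker of its two links, and discounts it by an arbitrary weight. The names build_network, association, most_typical, and second_order_weight are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_network(sentences):
    """Map each word to a Counter of the words it shares a sentence with."""
    network = defaultdict(Counter)
    for sentence in sentences:
        # Count each unordered pair of distinct words once per sentence.
        for a, b in combinations(sorted(set(sentence)), 2):
            network[a][b] += 1
            network[b][a] += 1
    return dict(network)


def association(network, context_word, candidate, second_order_weight):
    """Score a candidate against one context word.

    First-order: the direct co-occurrence count.
    Second-order: paths through one intermediate word, each scored by its
    weaker link (an assumption of this sketch) and then discounted.
    """
    neighbours = network.get(context_word, Counter())
    first = neighbours[candidate]
    second = sum(
        min(count, network.get(mid, Counter())[candidate])
        for mid, count in neighbours.items()
        if mid != candidate
    )
    return first + second_order_weight * second


def most_typical(network, context, candidates, second_order_weight=0.5):
    """Return the highest-scoring candidate and the score of every candidate."""
    scores = {
        c: sum(association(network, w, c, second_order_weight) for w in context)
        for c in candidates
    }
    return max(scores, key=scores.get), scores


# Toy corpus (invented for illustration only).
corpus = [
    ["spelling", "error", "report"],
    ["spelling", "error", "text"],
    ["serious", "mistake", "judgment"],
    ["text", "typo"],
]
network = build_network(corpus)
for weight in (0.0, 0.5):
    best, scores = most_typical(network, ["typo"], ["error", "mistake"], weight)
    print(weight, best, scores)
```

In this toy corpus, "typo" never co-occurs directly with either candidate, so with a weight of zero the two candidates tie. With a positive weight, "error" is preferred because "typo" and "error" share the neighbour "text". This is the kind of evidence that second-order relations can add when first-order counts are sparse.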