Scientific report: "TOWARDS A CORE VOCABULARY FOR A NATURAL LANGUAGE SYSTEM"
TOWARDS A CORE VOCABULARY FOR A NATURAL LANGUAGE SYSTEM

Hubert Lehmann
IBM Deutschland GmbH
Scientific Center
Institute for Knowledge Based Systems
Wilckensstr. 1a
D-6900 Heidelberg, Germany
Email 1401 at DHDIBM I

ABSTRACT

The desire to construct robust and portable natural language systems has led to research on how a core vocabulary for such systems can be defined. Statistical methods and semantic criteria for doing this are discussed and compared. Currently it does not seem possible to precisely define the notion of core vocabulary, but it is argued that workable criteria can nevertheless be found. Finally, it is emphasized that the implementation of a core vocabulary must be seen as a long-range research program rather than as a short-term goal.

Motivation

Research on natural language processing systems today strives for the construction of robust and portable systems. A system is robust if it can handle a large variety of user inputs without giving up or producing unexpected results. A system is portable in the sense intended here if it is not geared to a single subject domain but can be ported with a reasonable effort to a variety of subject domains. It is common understanding that there exists a central fragment of a language which
1. is required for dealing with virtually any subject domain and
2. is invariant with respect to meaning and use across subject domains.
It is of course a non-trivial empirical question whether such a central fragment really exists and, if so, what it is, but a number of researchers seem to share the assumption that it does (cf. Alshawi et al. 1988). Any robust and portable system would then have to handle this core fragment. In this paper I am concerned with a second, related assumption, namely that there exists a core vocabulary which is needed for handling any subject domain. This assumption is also shared by many researchers, and it underlies the production of basic vocabularies for language learning such as .