Finding Word Substitutions Using a Distributional Similarity Baseline and Immediate Context Overlap

Aurelie Herbelot
University of Cambridge, Computer Laboratory
J.J. Thompson Avenue, Cambridge
ah433@cam.ac.uk

Abstract

This paper deals with the task of finding generally applicable substitutions for a given input term. We show that the output of a distributional similarity system baseline can be filtered to obtain terms that are not simply similar but frequently substitutable. Our filter relies on the fact that when two terms are in a common entailment relation, it should be possible to substitute one for the other in their most frequent surface contexts. Using the Google 5-gram corpus to find such characteristic contexts, we show that, for the given task, our filter improves the precision of a distributional similarity system from 41% to 56% on a test set comprising common transitive verbs.

1 Introduction

This paper looks at the task of finding word substitutions for simple statements in the context of KB querying. Let us assume that we have a knowledge base made of statements of the type "subject - verb - object":

1. Bank of America - acquire - Merrill Lynch
2. Lloyd's - buy - HBOS
3. Iceland - nationalise - Kaupthing

Let us also assume a simple querying facility where the user can enter a word and be presented with all statements containing that word, in a typical search engine fashion. If we want to return all acquisition events present in the knowledge base above, as opposed to nationalisation events, we might search for "acquire". This will return the first statement, about the acquisition of Merrill Lynch, but not the second statement, about HBOS. Ideally, we would like a system able to generate words similar to our query, so that a statement containing the verb "buy" gets returned when we search for "acquire".

This problem is closely related to the clustering of semantically similar terms, which has received much attention in the literature. Systems that perform such clustering usually