Finding Word Substitutions Using a Distributional Similarity Baseline and Immediate Context Overlap

Aurelie Herbelot
University of Cambridge Computer Laboratory
Thompson Avenue, Cambridge
ah433@

Abstract

This paper deals with the task of finding generally applicable substitutions for a given input term. We show that the output of a baseline distributional similarity system can be filtered to obtain terms that are not simply similar but frequently substitutable. Our filter relies on the fact that when two terms are in a common entailment relation, it should be possible to substitute one for the other in their most frequent surface contexts. Using the Google 5-gram corpus to find such characteristic contexts, we show that, for the given task, our filter improves the precision of a distributional similarity system from 41% to 56% on a test set comprising common transitive verbs.

1 Introduction

This paper looks at the task of finding word substitutions for simple statements in the context of knowledge base (KB) querying. Let us assume that we have a knowledge base made of statements of the type subject - verb - object:

1. Bank of America - acquire - Merrill Lynch
2. Lloyd's - buy - HBOS
3. Iceland - nationalise - Kaupthing

Let us also assume a simple querying facility where the user can enter a word and be presented with all statements containing that word, in typical search-engine fashion.
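The querying facility assumed above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the names `Statement`, `KB` and `search` are hypothetical, and we assume that a statement "contains" a word when the word matches one of its whitespace-separated tokens, ignoring case.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """A subject - verb - object statement in the toy knowledge base."""
    subject: str
    verb: str
    obj: str


# Toy knowledge base holding the three example statements.
KB = [
    Statement("Bank of America", "acquire", "Merrill Lynch"),
    Statement("Lloyd's", "buy", "HBOS"),
    Statement("Iceland", "nationalise", "Kaupthing"),
]


def search(kb, word):
    """Return all statements that contain `word` as a token (case-insensitive)."""
    w = word.lower()
    return [s for s in kb if w in " ".join(s).lower().split()]


print(search(KB, "acquire"))
```

With this exact-match lookup, `search(KB, "acquire")` returns only the Bank of America statement: the Lloyd's statement is never matched, because its verb is "buy".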
If we want to return all acquisition events present in the knowledge base above, as opposed to nationalisation events, we might search for "acquire". This will return the first statement, about the acquisition of Merrill Lynch, but not the second statement, about HBOS. Ideally, we would like a system able to generate words similar to our query, so that a statement containing the verb "buy" gets returned when we search for "acquire". This problem is closely related to the clustering of semantically similar terms, which has received much attention in the literature. Systems that perform such clustering usually …
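The filtering idea summarised in the abstract, combined with the query expansion motivated above, can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's exact procedure. The candidate list stands in for the output of a distributional similarity baseline. `NGRAMS` holds toy 3-gram counts invented for the example (not Google 5-gram data). The parameters `k` (number of characteristic contexts) and `threshold` (minimum fraction of those contexts in which a candidate must be attested) are hypothetical choices. A "context" here is an n-gram with the input term replaced by the placeholder `_`.

```python
from collections import Counter
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str
    verb: str
    obj: str


# Toy knowledge base mirroring the example statements.
KB = [
    Statement("Bank of America", "acquire", "Merrill Lynch"),
    Statement("Lloyd's", "buy", "HBOS"),
    Statement("Iceland", "nationalise", "Kaupthing"),
]

# Hypothetical toy n-gram counts (NOT taken from the Google 5-gram corpus).
NGRAMS = {
    ("plans", "to", "acquire"): 50,
    ("to", "acquire", "shares"): 40,
    ("acquire", "a", "stake"): 30,
    ("acquire", "new", "skills"): 5,
    ("plans", "to", "buy"): 60,
    ("to", "buy", "shares"): 70,
    ("buy", "a", "stake"): 25,
    ("to", "sell", "shares"): 45,
    ("to", "nationalise", "banks"): 10,
}


def contexts(term, ngrams):
    """Count the contexts of `term`: each n-gram containing it, with the term replaced by '_'."""
    ctx = Counter()
    for gram, count in ngrams.items():
        for i, tok in enumerate(gram):
            if tok == term:
                ctx[gram[:i] + ("_",) + gram[i + 1:]] += count
    return ctx


def overlap(term, candidate, ngrams, k=3):
    """Fraction of the term's k most frequent contexts in which `candidate` is also attested."""
    top = [c for c, _ in contexts(term, ngrams).most_common(k)]
    if not top:
        return 0.0
    attested = contexts(candidate, ngrams)
    return sum(1 for c in top if c in attested) / len(top)


def filter_substitutes(term, candidates, ngrams, k=3, threshold=0.5):
    """Keep the candidates that fit at least `threshold` of the term's characteristic contexts."""
    return [c for c in candidates if overlap(term, c, ngrams, k) >= threshold]


def expanded_search(kb, word, substitutes):
    """Return statements containing the query word or any of its substitutes as a token."""
    terms = {word.lower(), *(s.lower() for s in substitutes)}
    return [s for s in kb if terms & set(" ".join(s).lower().split())]


# Hypothetical output of a distributional similarity baseline for "acquire".
baseline = ["buy", "sell", "nationalise"]
subs = filter_substitutes("acquire", baseline, NGRAMS)
print(subs)  # ['buy']
for statement in expanded_search(KB, "acquire", subs):
    print(statement)
```

With these toy counts, "buy" is attested in all three of the most frequent contexts of "acquire" and is kept, while "sell" (one of three) and "nationalise" (none) are filtered out; the expanded query then returns both the Merrill Lynch and the HBOS statements. Counting attested contexts, rather than comparing their frequencies, is a deliberate simplification to keep the sketch short.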
