Scientific report: "Modeling Sentences in the Latent Space"

Modeling Sentences in the Latent Space

Weiwei Guo, Department of Computer Science, Columbia University, weiwei@
Mona Diab, Center for Computational Learning Systems, Columbia University, mdiab@

Abstract

Sentence Similarity is the process of computing a similarity score between two sentences. Previous sentence similarity work finds that latent semantics approaches to the problem do not perform well due to insufficient information in single sentences. In this paper, we show that by carefully handling words that are not in the sentences (missing words), we can train a reliable latent variable model on sentences. In the process, we propose a new evaluation framework for sentence similarity: Concept Definition Retrieval. The new framework allows for large scale tuning and testing of Sentence Similarity models. Experiments on the new task and previous data sets show significant improvement of our model over baselines and other traditional latent variable models. Our results indicate comparable and even better performance than current state of the art systems addressing the problem of sentence similarity.

1 Introduction

Identifying the degree of semantic similarity (SS) between two sentences is at the core of many NLP applications that focus on sentence level semantics, such as Machine Translation (Kauchak and Barzilay, 2006), Summarization (Zhou et al., 2006), and Text Coherence Detection (Lapata and Barzilay, 2005). To date, almost all Sentence Similarity (SS) approaches work in the high-dimensional word space and rely mainly on word similarity.
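To make the word-space setting concrete, here is a minimal sketch of a bag-of-words cosine similarity between two sentences. It is not the model proposed in the paper; the function names (bag_of_words, cosine_similarity, observed_fraction), the whitespace tokenizer, and the toy vocabulary are all assumptions made for illustration. It shows two things the text points to: sentences that share no surface words score zero even when they mean similar things, and a single sentence observes only a small share of the vocabulary, so most words are missing from it.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Lowercased, whitespace-tokenized word counts (a deliberately crude tokenizer)."""
    return Counter(sentence.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    # Counter returns 0 for absent words, so unshared words contribute nothing.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def observed_fraction(sentence, vocabulary):
    """Share of a (hypothetical) vocabulary that actually occurs in the sentence."""
    vocab = set(vocabulary)
    if not vocab:
        return 0.0
    return len(set(bag_of_words(sentence)) & vocab) / len(vocab)


if __name__ == "__main__":
    s1 = "a feline sat on the mat"
    s2 = "cats rest upon rugs"
    # No shared surface words, so the word-space score is zero.
    print(cosine_similarity(bag_of_words(s1), bag_of_words(s2)))
    # Toy vocabulary: every word from both sentences.
    vocab = set(bag_of_words(s1)) | set(bag_of_words(s2))
    print(observed_fraction(s2, vocab))
```

With a realistic vocabulary of tens of thousands of words, the observed fraction for any single sentence is close to zero, which is the sparsity that makes sentence-level latent models hard to train.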
There are two main, not unrelated, disadvantages to word similarity based approaches:
1. lexical ambiguity, as the pairwise word similarity ignores the semantic interaction between the word and its sentential context;
2. word co-occurrence information is not sufficiently exploited.

Latent variable models, such as Latent Semantic Analysis (LSA) (Landauer et al., 1998), Probabilistic Latent Semantic Analysis (PLSA) (Hofmann, 1999), Latent ...
