Scientific report: "Paraphrasing with Bilingual Parallel Corpora"

Colin Bannard, Chris Callison-Burch
School of Informatics, University of Edinburgh
2 Buccleuch Place, Edinburgh EH8 9LW
callison-burch @

Abstract

Previous work has used monolingual parallel corpora to extract and generate paraphrases. We show that this task can be done using bilingual parallel corpora, a much more commonly available resource. Using alignment techniques from phrase-based statistical machine translation, we show how paraphrases in one language can be identified using a phrase in another language as a pivot. We define a paraphrase probability that allows paraphrases extracted from a bilingual parallel corpus to be ranked using translation probabilities, and show how it can be refined to take contextual information into account. We evaluate our paraphrase extraction and ranking methods using a set of manual word alignments, and contrast the quality with paraphrases extracted from automatic alignments.

1 Introduction

Paraphrases are alternative ways of conveying the same information. Paraphrases are useful in a number of NLP applications. In natural language generation, the production of paraphrases allows for the creation of more varied and fluent text (Iordanskaja et al., 1991). In multidocument summarization, the identification of paraphrases allows information repeated across documents to be condensed (McKeown et al., 2002). In the automatic evaluation of machine translation, paraphrases may help to alleviate problems presented by the fact that there are often alternative and equally valid ways of translating a text (Pang et al., 2003). In question answering, discovering paraphrased answers may provide additional evidence that an answer is correct (Ibrahim et al., 2003).

In this paper we introduce a novel method for extracting paraphrases that uses bilingual parallel corpora. Past work (Barzilay and McKeown, 2001; Barzilay and Lee, 2003; Pang et al., 2003; Ibrahim et al., 2003) has examined the use of monolingual
