Scientific report: "Discriminative Sentence Compression with Soft Syntactic Evidence"

Ryan McDonald
Department of Computer and Information Science
University of Pennsylvania
Philadelphia, PA 19104
ryantm@

Abstract

We present a model for sentence compression that uses a discriminative large-margin learning framework coupled with a novel feature set defined on compressed bigrams as well as deep syntactic representations provided by auxiliary dependency and phrase-structure parsers. The parsers are trained out-of-domain and contain a significant amount of noise. We argue that the discriminative nature of the learning algorithm allows the model to learn weights relative to any noise in the feature set to optimize compression accuracy directly. This differs from current state-of-the-art models (Knight and Marcu, 2000) that treat noisy parse trees, for both compressed and uncompressed sentences, as gold standard when calculating model parameters.

1 Introduction

The ability to compress sentences grammatically with minimal information loss is an important problem in text summarization. Most summarization systems are evaluated on the amount of relevant information retained as well as their compression rate. Thus, returning highly compressed yet informative sentences allows summarization systems to return larger sets of sentences and increase the overall amount of information extracted.

We focus on the particular instantiation of sentence compression when the goal is to produce the compressed version solely by removing words or phrases from the original, which is the most common setting in the literature (Knight and Marcu, 2000; Riezler et al., 2003; Turner and Charniak, 2005). In this framework, the goal is to find the shortest substring of the original sentence that conveys the most important aspects of the meaning. We will work in a supervised learning setting and assume as input a training set T = {(x_t, y_t)} of original sentences x_t and their compressions y_t. We use the Ziff-Davis corpus, which is a .
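To make the deletion-only setting concrete, the following minimal Python sketch checks whether a candidate compression y can be obtained from an original sentence x purely by removing words, and reports which word positions were kept. It is an illustration of the problem setting only, not the model described in the paper. The function names, the example sentence, and the definition of compression rate as the fraction of words retained are assumptions made for this sketch.

```python
from typing import List, Optional


def deletion_indices(original: List[str], compressed: List[str]) -> Optional[List[int]]:
    """Return the positions in `original` that are kept in `compressed`.

    Returns None when `compressed` cannot be produced from `original`
    by deleting words only (that is, word order must be preserved and
    no new words may appear). Hypothetical helper for illustration.
    """
    kept: List[int] = []
    pos = 0
    for word in compressed:
        # Scan forward to the next occurrence of this word (leftmost match).
        while pos < len(original) and original[pos] != word:
            pos += 1
        if pos == len(original):
            return None  # word missing or out of order
        kept.append(pos)
        pos += 1
    return kept


def compression_rate(original: List[str], compressed: List[str]) -> float:
    """Fraction of words retained (assumed definition; conventions vary)."""
    return len(compressed) / len(original)


# Hypothetical training pair (x_t, y_t); not taken from the Ziff-Davis corpus.
x = "the new parser is fast and quite accurate".split()
y = "the parser is accurate".split()

print(deletion_indices(x, y))   # [0, 2, 3, 7]
print(compression_rate(x, y))   # 0.5
```

A pair for which `deletion_indices` returns None falls outside this setting, since its compression reorders or introduces words rather than only removing them.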
