TAILIEUCHUNG - Scientific report: "MaxSim: A Maximum Similarity Metric for Machine Translation Evaluation"

MaxSim: A Maximum Similarity Metric for Machine Translation Evaluation

Yee Seng Chan and Hwee Tou Ng
Department of Computer Science, National University of Singapore
Law Link, Singapore 117590
chanys nght @

Abstract

We propose an automatic machine translation (MT) evaluation metric that calculates a similarity score (based on precision and recall) of a pair of sentences. Unlike most metrics, we compute a similarity score between items across the two sentences. We then find a maximum weight matching between the items such that each item in one sentence is mapped to at most one item in the other sentence. This general framework allows us to use arbitrary similarity functions between items, and to incorporate different information in our comparison, such as n-grams, dependency relations, etc. When evaluated on data from the ACL-07 MT workshop, our proposed metric achieves higher correlation with human judgements than all 11 automatic MT evaluation metrics that were evaluated during the workshop.

1 Introduction

In recent years, machine translation (MT) research has made much progress, which includes the introduction of automatic metrics for MT evaluation. Since human evaluation of MT output is time-consuming and expensive, having a robust and accurate automatic MT evaluation metric that correlates well with human judgement is invaluable. Among all the automatic MT evaluation metrics, BLEU (Papineni et al., 2002) is the most widely used.
Although BLEU has played a crucial role in the progress of MT research, it is becoming evident that BLEU does not correlate with human judgement well enough, and suffers from several other deficiencies such as the lack of an intuitive interpretation of its scores. During the recent ACL-07 workshop on statistical MT (Callison-Burch et al., 2007), a total of 11 automatic MT evaluation metrics were evaluated for correlation with human judgement. The results show that, as compared to BLEU, several recently proposed metrics such as Semantic-role overlap .
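
The matching idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several parts are assumptions made only for illustration: the item similarity `toy_similarity` (exact match scores 1.0, a shared four-character prefix scores 0.5 as a crude stand-in for a lemma match), the use of plain word tokens as items, the balanced harmonic mean used to combine precision and recall, and all function names. The matching itself is computed with the standard Hungarian (assignment) algorithm on negated weights, which is one common way to obtain a maximum weight bipartite matching.

```python
from typing import Callable, List, Sequence, Tuple


def toy_similarity(a: str, b: str) -> float:
    """Hypothetical item similarity in [0, 1]; not the paper's function."""
    a, b = a.lower(), b.lower()
    if a == b:
        return 1.0
    # Assumption: a shared 4-character prefix stands in for a lemma match.
    if len(a) >= 4 and len(b) >= 4 and a[:4] == b[:4]:
        return 0.5
    return 0.0


def max_weight_matching(weights: List[List[float]]) -> List[Tuple[int, int]]:
    """Maximum weight bipartite matching for a matrix with rows <= columns.

    Runs the Hungarian algorithm on the negated weights and returns the
    (row, column) pairs whose weight is positive.
    """
    n, m = len(weights), len(weights[0])
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j] = row (1-based) assigned to column j, 0 if none
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = -weights[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the assignments along the augmenting path.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    pairs = [(p[j] - 1, j - 1) for j in range(1, m + 1) if p[j] != 0]
    # Zero-weight pairs carry no similarity, so leave those items unmatched.
    return [(i, j) for i, j in pairs if weights[i][j] > 0.0]


def maxsim_scores(
    system: Sequence[str],
    reference: Sequence[str],
    sim: Callable[[str, str], float] = toy_similarity,
) -> Tuple[float, float, float]:
    """Return (precision, recall, F) from the matched similarity mass."""
    if not system or not reference:
        return 0.0, 0.0, 0.0
    w = [[sim(s, r) for r in reference] for s in system]
    if len(system) > len(reference):
        # The matching routine expects rows <= columns, so transpose.
        w = [list(col) for col in zip(*w)]
    total = sum(w[i][j] for i, j in max_weight_matching(w))
    precision = total / len(system)
    recall = total / len(reference)
    if precision + recall == 0.0:
        return 0.0, 0.0, 0.0
    # Assumption: balanced harmonic mean of precision and recall.
    f_mean = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_mean
```

For the system output "the cat sat on the mat" and the reference "the cat is sitting on the mat", split on whitespace, five of the six system tokens are matched with weight 1.0, so `maxsim_scores` returns a precision of 5/6 and a recall of 5/7. Because `sim` is a parameter, other item types and similarity functions (for example over n-grams or dependency relations, as the abstract suggests) could be substituted without changing the matching step.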
