Scientific report: "Improving Domain-Specific Word Alignment for Computer Assisted Translation"

In general, it is not difficult to obtain a large-scale general bilingual corpus, whereas the available domain-specific bilingual corpus is usually quite small. Thus, we use the bilingual corpus in the general domain to improve word alignments for general words and the corpus in the specific domain for domain-specific words. In other words, we adapt the word alignment information in the general domain to the specific domain.

Improving Domain-Specific Word Alignment for Computer Assisted Translation

WU Hua, WANG Haifeng
Toshiba China Research and Development Center
5/F., Tower W2, Oriental Plaza, East Chang An Ave., Dong Cheng District, Beijing, China, 100738
wuhua wanghaifeng @ .cn

Abstract

This paper proposes an approach to improve word alignment in a specific domain in which only a small-scale domain-specific corpus is available, by adapting the word alignment information in the general domain to the specific domain. The approach first trains two statistical word alignment models, one with the large-scale corpus in the general domain and one with the small-scale corpus in the specific domain, and then improves the domain-specific word alignment with these two models. Experimental results show a significant improvement in terms of both alignment precision and recall. The alignment results are also applied in a computer assisted translation system to improve human translation efficiency.

1 Introduction

Bilingual word alignment was first introduced as an intermediate result in statistical machine translation (SMT) (Brown et al., 1993). In previous alignment methods, some researchers modeled the alignments with different statistical models (Wu, 1997; Och and Ney, 2000; Cherry and Lin, 2003). Others used similarity and association measures to build alignment links (Ahrenberg et al., 1998; Tufis and Barbu, 2002). However, all of these methods require a large-scale bilingual corpus for training.
When a large-scale bilingual corpus is not available, some researchers use existing dictionaries to improve word alignment (Ker and Chang, 1997). However, few works address the problem of domain-specific word alignment when neither a large-scale domain-specific bilingual corpus nor a domain-specific translation dictionary is available. This paper addresses the problem of word alignment in a specific domain where only a small domain-specific corpus is available. In the domain-specific corpus there
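
The adaptation idea summarized in the abstract (one model trained on general-domain data, one on domain-specific data, both used to align domain-specific text) can be illustrated with a small sketch. The excerpt does not say how the two models are combined, so the linear interpolation below is an assumption chosen for illustration, not the authors' method. The function names `interpolate` and `align`, the `weight` parameter, and the toy probability tables are all hypothetical.

```python
from collections import defaultdict


def interpolate(general, specific, weight=0.5):
    """Mix two lexical translation tables (assumed combination scheme).

    Each table maps a source word to {target word: probability}.
    The mixed score is weight * specific + (1 - weight) * general.
    """
    mixed = defaultdict(dict)
    for src in set(general) | set(specific):
        gen_row = general.get(src, {})
        spec_row = specific.get(src, {})
        for tgt in set(gen_row) | set(spec_row):
            mixed[src][tgt] = (weight * spec_row.get(tgt, 0.0)
                               + (1.0 - weight) * gen_row.get(tgt, 0.0))
    return dict(mixed)


def align(src_words, tgt_words, table):
    """Link each source word to its highest-scoring target word, if any."""
    links = []
    for i, src in enumerate(src_words):
        row = table.get(src, {})
        best_j, best_p = None, 0.0
        for j, tgt in enumerate(tgt_words):
            p = row.get(tgt, 0.0)
            if p > best_p:
                best_j, best_p = j, p
        if best_j is not None:
            links.append((i, best_j))
    return links


# Toy, made-up tables: the general model prefers the tool sense of "file",
# while the (hypothetical) software-domain model prefers the document sense.
general_table = {"open": {"ouvrir": 0.9}, "file": {"lime": 0.6, "fichier": 0.4}}
specific_table = {"file": {"fichier": 1.0}}

mixed_table = interpolate(general_table, specific_table, weight=0.5)
links = align(["open", "file"], ["ouvrir", "fichier"], mixed_table)
```

In this sketch, general words such as "open" are covered only by the general table, while the domain-specific table shifts the score of "file" toward its in-domain translation. A word missing from one table simply contributes zero from that side, which is a simplification; a real system would need a more careful back-off.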
