A PROGRAM FOR ALIGNING SENTENCES IN BILINGUAL CORPORA

William A. Gale
Kenneth W. Church
AT&T Bell Laboratories
600 Mountain Avenue
Murray Hill, NJ 07974

ABSTRACT

Researchers in both machine translation (e.g., Brown et al., 1990) and bilingual lexicography (e.g., Klavans and Tzoukermann, 1990) have recently become interested in studying parallel texts, such as the Canadian Hansards (parliamentary proceedings), which are available in multiple languages (French and English). This paper describes a method for aligning sentences in these parallel texts, based on a simple statistical model of character lengths. The method was developed and tested on a small trilingual sample of Swiss economic reports. A much larger sample of 90 million words of Canadian Hansards has been aligned and donated to the ACL/DCI.

1. Introduction

Researchers in both machine translation (e.g., Brown et al., 1990) and bilingual lexicography (e.g., Klavans and Tzoukermann, 1990) have recently become interested in studying bilingual corpora, bodies of text such as the Canadian Hansards (parliamentary debates), which are available in multiple languages (such as French and English). The sentence alignment task is to identify correspondences between sentences in one language and sentences in the other language.
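The abstract says only that the method rests on "a simple statistical model of character lengths." The following is a minimal sketch of that idea. It assumes a dynamic-programming search over a small set of match types (1-1, 1-0, 0-1, 2-1, 1-2, 2-2) and a cost of -log P(match type) - log P(delta | match), where delta measures how far one side's length deviates from the length predicted by the other. The constants `C` and `S2`, the `PRIORS` table, and the function names `match_cost` and `align` are hypothetical choices for illustration, not details given in this excerpt. In practice they would be estimated from hand-aligned data.

```python
import math

# Assumed model parameters (illustrative only; not stated in this excerpt).
C = 1.0   # assumed expected characters in text 2 per character in text 1
S2 = 6.8  # assumed variance of that ratio, per character

# Assumed prior probability of each match type:
# (sentences taken from text 1, sentences taken from text 2).
PRIORS = {
    (1, 1): 0.89,
    (1, 0): 0.0099,
    (0, 1): 0.0099,
    (2, 1): 0.089,
    (1, 2): 0.089,
    (2, 2): 0.011,
}


def match_cost(l1: int, l2: int, prior: float) -> float:
    """Cost -log P(match) - log P(delta | match) for one bead of sentences.

    delta compares l2 with the length predicted from l1; P(delta | match)
    is taken as a two-tailed standard-normal tail probability.
    """
    mean = (l1 + l2 / C) / 2.0
    delta = 0.0 if mean == 0 else (C * l1 - l2) / math.sqrt(S2 * mean)
    # erfc(|d| / sqrt(2)) == 2 * (1 - Phi(|d|)); floor it to avoid log(0).
    p_delta = max(math.erfc(abs(delta) / math.sqrt(2.0)), 1e-300)
    return -math.log(prior) - math.log(p_delta)


def align(src_lens: list[int], tgt_lens: list[int]) -> list[tuple[list[int], list[int]]]:
    """Return the minimum-cost sequence of beads (source indices, target indices)."""
    n, m = len(src_lens), len(tgt_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            for (a, b), prior in PRIORS.items():
                if a > i or b > j:
                    continue  # bead would reach past the start of a text
                prev = cost[i - a][j - b]
                if prev == inf:
                    continue
                l1 = sum(src_lens[i - a:i])
                l2 = sum(tgt_lens[j - b:j])
                total = prev + match_cost(l1, l2, prior)
                if total < cost[i][j]:
                    cost[i][j] = total
                    back[i][j] = (a, b)
    # Walk back from the end of both texts to recover the chosen beads.
    beads = []
    i, j = n, m
    while i > 0 or j > 0:
        a, b = back[i][j]
        beads.append((list(range(i - a, i)), list(range(j - b, j))))
        i, j = i - a, j - b
    beads.reverse()
    return beads


if __name__ == "__main__":
    # Made-up sentence lengths (in characters) for two short texts.
    print(align([100, 50, 120], [105, 48, 60, 62]))
    # [([0], [0]), ([1], [1]), ([2], [2, 3])]
```

With these made-up lengths, the sketch pairs the first two sentences one-to-one and matches the third source sentence (120 characters) with the last two target sentences (60 + 62 characters), because that 1-2 bead fits the length model far better than any split involving a 0-1 or 1-0 bead.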
This task is a first step toward the more ambitious task of finding correspondences among words.¹ The input is a pair of texts such as Table 1.

¹ In statistics, string matching problems are divided into two classes: alignment problems and correspondence problems. Crossing dependencies are possible in the latter, but not in the former.

Table 1: Input to Alignment Program

English
-------
According to our survey, 1988 sales of mineral water and soft drinks were much higher than in 1987, reflecting the growing popularity of these products. Cola drink manufacturers in particular achieved above-average growth rates. The higher turnover was largely due to an increase in the sales volume. Employment and ...
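The footnote's distinction between alignment and correspondence problems can be made concrete. Represent a correspondence as a set of links (i, j), meaning sentence i of one text corresponds to sentence j of the other. Two links (i, j) and (k, l) cross when i < k but j > l. The following is a minimal sketch of a check that a set of links has no crossing dependencies, and so forms an alignment in the footnote's sense. The function name `is_alignment` is hypothetical.

```python
from typing import Iterable, Tuple


def is_alignment(links: Iterable[Tuple[int, int]]) -> bool:
    """Return True if no two links (i, j), (k, l) cross, i.e. i < k and j > l.

    After sorting by (i, j), the links are crossing-free exactly when the
    j values never decrease from one link to the next.
    """
    ordered = sorted(set(links))
    return all(j1 <= j2 for (_, j1), (_, j2) in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    # Many-to-one links are allowed; only crossings are ruled out.
    print(is_alignment([(0, 0), (1, 1), (2, 1), (3, 2)]))  # True
    print(is_alignment([(0, 1), (1, 0)]))                  # False: links cross
```

Sorting first keeps the check at O(n log n) rather than comparing every pair of links. Any decrease in j between neighbouring links must involve two different source sentences, so it is a genuine crossing. Conversely, any crossing forces such a decrease somewhere in the sorted order.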
