Scientific report: "ALIGNING SENTENCES IN PARALLEL CORPORA"

Peter F. Brown, Jennifer C. Lai, and Robert L. Mercer
IBM Thomas J. Watson Research Center
P.O. Box 704
Yorktown Heights, NY 10598

ABSTRACT

In this paper we describe a statistical technique for aligning sentences with their translations in two parallel corpora. In addition to certain anchor points that are available in our data, the only information about the sentences that we use for calculating alignments is the number of tokens that they contain. Because we make no use of the lexical details of the sentence, the alignment computation is fast and therefore practical for application to very large collections of text. We have used this technique to align several million sentences in the English-French Hansard corpora and have achieved an accuracy in excess of 99% in a randomly selected set of 1000 sentence pairs that we checked by hand. We show that even without the benefit of anchor points the correlation between the lengths of aligned sentences is strong enough that we should expect to achieve an accuracy of between 96% and 97%. Thus, the technique may be applicable to a wider variety of texts than we have yet tried.

INTRODUCTION

Recent work by Brown et al. [Brown et al., 1988, Brown et al., 1990] has quickened anew the long dormant idea of using statistical techniques to carry out machine translation from one natural language to another. The lynchpin of their approach is a large collection of pairs of sentences that are mutual translations. Beyond providing grist to the statistical mill, such pairs of sentences are valuable to researchers in bilingual lexicography [Klavans and Tzoukermann, 1990, Warwick and Russell, 1990] and may be useful in other approaches to machine translation [Sadler, 1989].

In this paper we consider the problem of extracting from parallel French and English corpora pairs of sentences that are translations of one another. The task is not trivial because at times a single sentence in one language is translated as two or more
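To make the idea of aligning by sentence length alone concrete, the following is a minimal sketch in Python. It is not the model described in the paper: the bead types, the penalty values in `BEADS`, the cost function `bead_cost`, and the function `align` are all hypothetical choices made for illustration, and the sketch ignores anchor points entirely. It only shows that token counts, combined with dynamic programming, can recover groupings in which one sentence corresponds to two.

```python
import math

# Hypothetical penalties per bead type (illustrative values only,
# not parameters estimated in the paper).
BEADS = {
    (1, 1): 0.0,  # one source sentence <-> one target sentence
    (1, 0): 3.0,  # source sentence with no counterpart
    (0, 1): 3.0,  # target sentence with no counterpart
    (2, 1): 1.0,  # two source sentences <-> one target sentence
    (1, 2): 1.0,  # one source sentence <-> two target sentences
}


def bead_cost(src_tokens, tgt_tokens, penalty):
    """Cost of pairing src_tokens source tokens with tgt_tokens target tokens."""
    if src_tokens == 0 or tgt_tokens == 0:
        return penalty
    # Assumed cost: distance of the length ratio from 1 on a log scale.
    return abs(math.log(tgt_tokens / src_tokens)) + penalty


def align(src_lens, tgt_lens):
    """Return the minimum-cost list of (source indices, target indices) beads."""
    n, m = len(src_lens), len(tgt_lens)
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                cost = best[i][j] + bead_cost(
                    sum(src_lens[i:ni]), sum(tgt_lens[j:nj]), penalty
                )
                if cost < best[ni][nj]:
                    best[ni][nj] = cost
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end to recover the beads.
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]


# Token counts of three source sentences and four target sentences.
result = align([10, 20, 8], [11, 9, 12, 8])
print(result)  # [([0], [0]), ([1], [1, 2]), ([2], [3])]
```

In this toy input the second source sentence (20 tokens) is paired with the second and third target sentences (9 + 12 tokens), which is the kind of one-to-two correspondence that makes the task non-trivial. Because no words are compared, the work per pair of positions is constant, consistent with the abstract's remark that ignoring lexical detail keeps the computation fast.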


