Scientific report: "High-Performance Bilingual Text Alignment Using Statistical and Dictionary Information"


Masahiko Haruno, Takefumi Yamazaki
NTT Communication Science Labs.
1-2356 Take, Yokosuka-Shi, Kanagawa 238-03, Japan
haruno@nttkb.ntt.jp, yamazaki@nttkb.ntt.jp

Abstract

This paper describes an accurate and robust text alignment system for structurally different languages. Among structurally different languages such as Japanese and English, there is a limitation on the amount of word correspondences that can be statistically acquired. The proposed method makes use of two kinds of word correspondences in aligning bilingual texts. One is a bilingual dictionary of general use. The other is the word correspondences that are statistically acquired in the alignment process. Our method gradually determines sentence pairs (anchors) that correspond to each other by relaxing parameters. The method, by combining two kinds of word correspondences, achieves adequate word correspondences for complete alignment. As a result, texts of various length and of various genres in structurally different languages can be aligned with high precision. Experimental results show our system outperforms conventional methods for various kinds of Japanese-English texts.

1 Introduction

Corpus-based approaches based on bilingual texts are promising for various applications, i.e., lexical knowledge extraction (Kupiec, 1993; Matsumoto et al., 1993; Smadja et al., 1996; Dagan and Church, 1994; Kumano and Hirakawa, 1994; Haruno et al., 1996), machine translation (Brown and others, 1993; Sato and Nagao, 1990; Kaji et al., 1992) and information retrieval (Sato, 1992). Most of these works assume voluminous aligned corpora.

Many methods have been proposed to align bilingual corpora. One of the major approaches is based on the statistics of simple features such as sentence length in words (Brown and others, 1991) or in characters (Gale and Church, 1993). These techniques are widely used because they can be implemented in an efficient and simple way through ...
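
The abstract's idea of fixing anchor sentence pairs gradually, by relaxing parameters, can be illustrated with a small sketch. This is not the authors' algorithm: the scoring function, the threshold schedule, and the names `overlap_score` and `find_anchors` are all hypothetical, and only dictionary correspondences are used here (the statistically acquired correspondences the paper combines with them are omitted). It shows one plausible reading, in which confident pairs are fixed first and later, looser passes only search between anchors that are already fixed.

```python
from typing import Dict, List, Sequence, Set, Tuple

Sentence = List[str]


def overlap_score(src: Sentence, tgt: Sentence,
                  lexicon: Dict[str, Set[str]]) -> float:
    """Fraction of source words with a listed translation in the target.

    Hypothetical score chosen for illustration; the paper's actual
    similarity measure is not given in this excerpt.
    """
    if not src:
        return 0.0
    tgt_words = set(tgt)
    hits = sum(1 for w in src if lexicon.get(w, set()) & tgt_words)
    return hits / len(src)


def find_anchors(src_sents: List[Sentence], tgt_sents: List[Sentence],
                 lexicon: Dict[str, Set[str]],
                 thresholds: Sequence[float] = (0.8, 0.5)
                 ) -> List[Tuple[int, int]]:
    """Return monotonic (source index, target index) anchor pairs.

    Thresholds are tried from strict to loose (an assumed schedule).
    New anchors are only sought strictly between neighbouring anchors,
    so the result never contains crossing pairs.
    """
    anchors: List[Tuple[int, int]] = []
    for threshold in thresholds:
        changed = True
        while changed:
            changed = False
            # Sentinels bracket the whole text so every gap has two ends.
            bounds = [(-1, -1)] + anchors + [(len(src_sents), len(tgt_sents))]
            for (i0, j0), (i1, j1) in zip(bounds, bounds[1:]):
                best = None  # (score, i, j) of the best pair in this gap
                for i in range(i0 + 1, i1):
                    for j in range(j0 + 1, j1):
                        s = overlap_score(src_sents[i], tgt_sents[j], lexicon)
                        if s >= threshold and (best is None or s > best[0]):
                            best = (s, i, j)
                if best is not None:
                    anchors.append((best[1], best[2]))
                    changed = True
            anchors.sort()
    return anchors
```

With a strict threshold only, weakly supported pairs stay unaligned; adding a looser threshold lets them be picked up inside the gaps that the confident anchors have already narrowed.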
