Scientific report: "A Pattern Matching Method for Finding Noun and Proper Noun Translations from Noisy Parallel Corpora"

Pascale Fung
Computer Science Department, Columbia University, New York, NY 10027

Abstract

We present a pattern matching method for compiling a bilingual lexicon of nouns and proper nouns from unaligned, noisy parallel texts of Asian/Indo-European language pairs. Tagging information of one language is used. Word frequency and position information for high and low frequency words are represented in two different vector forms for pattern matching. New anchor point finding and noise elimination techniques are introduced. We obtained a precision. We also show how the results can be used in the compilation of domain-specific noun phrases.

1 Bilingual lexicon compilation without sentence alignment

Automatically compiling a bilingual lexicon of nouns and proper nouns can contribute significantly to breaking the bottleneck in machine translation and machine-aided translation systems. Domain-specific terms are hard to translate because they often do not appear in dictionaries. Since most of these terms are nouns, proper nouns, or noun phrases, compiling a bilingual lexicon of these word groups is an important first step.

We have been studying robust lexicon compilation methods which do not rely on sentence alignment. Existing lexicon compilation methods (Kupiec 1993; Smadja & McKeown 1994; Kumano & Hirakawa 1994; Dagan et al. 1993; Wu & Xia 1994) all attempt to extract pairs of words or compounds that are translations of each other from previously sentence-aligned parallel texts. However, sentence alignment (Brown et al. 1991; Kay & Röscheisen 1993; Gale & Church 1993; Church 1993; Chen 1993; Wu 1994) is not always practical when corpora have unclear sentence boundaries or contain noisy text segments present in only one language. Our proposed algorithm for bilingual lexicon acquisition bootstraps off of corpus alignment procedures we developed earlier (Fung & Church 1994; Fung & McKeown ...)
