TAILIEUCHUNG - Scientific report: "Identifying Word Translations in Non-Parallel Texts"

Identifying Word Translations in Non-Parallel Texts

Reinhard Rapp
ISSCO, Université de Genève
54 route des Acacias, Genève, Switzerland
rapp@

Abstract

Common algorithms for sentence and word alignment allow the automatic identification of word translations from parallel texts. This study suggests that the identification of word translations should also be possible with non-parallel and even unrelated texts. The method proposed is based on the assumption that there is a correlation between the patterns of word co-occurrences in texts of different languages.

1 Introduction

In a number of recent studies it has been shown that word translations can be automatically derived from the statistical distribution of words in bilingual parallel texts (e.g. Catizone, Russell & Warwick, 1989; Brown et al., 1990; Dagan, Church & Gale, 1993; Kay & Röscheisen, 1993). Most of the proposed algorithms first conduct an alignment of sentences, i.e. those pairs of sentences are located that are translations of each other. In a second step, a word alignment is performed by analyzing the correspondences of words in each pair of sentences. The results achieved with these algorithms have been found useful for the compilation of dictionaries, for checking the consistency of terminological usage in translations, and for assisting the terminological work of translators and interpreters.
However, despite serious efforts in the compilation of corpora (Church & Mercer, 1993; Armstrong & Thompson, 1995), the availability of a large enough parallel corpus in a specific field and for a given pair of languages will always be the exception, not the rule. Since the acquisition of non-parallel texts is usually much easier, it would be desirable to have a program that can determine the translations of words from comparable or even unrelated texts.

2 Approach

It is assumed that there is a correlation between the co-occurrences of words which are translations of each other. If, for example, in a text of one language two words
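
The correlation assumption stated in this section can be illustrated with a small Python sketch. It is not the procedure of the paper, whose details are not given in this excerpt. It assumes two toy, non-parallel texts and a hypothetical three-entry seed lexicon (`SEED`), both invented here for illustration, and it uses sentence-level co-occurrence counts with cosine similarity as one plausible way to compare co-occurrence patterns across languages.

```python
from collections import Counter, defaultdict
from itertools import permutations
import math


def cooccurrence_counts(sentences):
    """For every word, count the words found at other positions of the same sentence."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, j in permutations(range(len(tokens)), 2):
            counts[tokens[i]][tokens[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity of two equal-length count vectors (0.0 if either is all zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match(word, candidates, src_counts, tgt_counts, seed_pairs):
    """Pick the candidate whose co-occurrences with the seed words best match `word`."""
    src_vec = [src_counts[word][s] for s, _ in seed_pairs]
    scores = {
        c: cosine(src_vec, [tgt_counts[c][t] for _, t in seed_pairs])
        for c in candidates
    }
    return max(scores, key=scores.get), scores


# Toy texts (assumption): related in topic, but not translations of each other.
ENGLISH = [
    "the teacher works at the school",
    "the doctor works at the hospital",
    "the teacher likes the school",
]
GERMAN = [
    "die ärztin arbeitet im krankenhaus",
    "der lehrer arbeitet in der schule",
    "der lehrer mag die schule",
]
# Hypothetical seed lexicon (assumption): a few translation pairs known in advance.
SEED = [("school", "schule"), ("hospital", "krankenhaus"), ("works", "arbeitet")]

en = cooccurrence_counts(ENGLISH)
de = cooccurrence_counts(GERMAN)

for word in ("teacher", "doctor"):
    match, scores = best_match(word, ["lehrer", "ärztin"], en, de, SEED)
    print(word, "->", match, scores)
```

In this toy setting "teacher" co-occurs with "school" in the same way that "lehrer" co-occurs with "schule", so the two words receive the highest similarity. Real corpora would need far larger vocabularies and some normalization of the raw counts.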
