TAILIEUCHUNG - Scientific report: "Creating a Multilingual Collocation Dictionary from Large Text Corpora"

Luka Nerima, Violeta Seretan, Eric Wehrli
Language Technology Laboratory (LATL), Dept. of Linguistics, University of Geneva, CH-1211 Geneva 4, Switzerland

Abstract

This paper describes a terminology extraction system capable of handling multi-word expressions, using a powerful syntactic parser. The system includes a concordancing tool that lets the user display the context of a collocation, i.e. the sentence or the whole document in which the collocation occurs. Since the corpora are multilingual, the system also offers an alignment mechanism for the corresponding translated documents.

1 Introduction

Cross-linguistic communication frequently raises the problem of properly understanding idiomatic expressions, i.e. multi-word expressions whose meaning differs from the composition of the individual meanings of their parts. The importance of multi-word expressions is widely recognized in the domains of translation and terminology. These expressions usually cannot be translated literally, and one must find adequate correspondences in the target language. This paper describes a terminology extraction system capable of handling multi-word expressions, based on a detailed linguistic analysis. The originality of our approach comes from the fact that collocations are not extracted from raw texts but rather from syntactically parsed texts.
The linguistic analysis selects the potential word pairs: only words occurring in a specific syntactic configuration are taken into account for further statistical processing. Such a chain of processes significantly increases the quality and the relevance of the extracted collocations. This system will be applied to textual corpora from the World Trade Organisation (WTO), which consist of parallel documents in three languages: English, French and Spanish. All the examples given in this paper are taken from
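
The two-stage process described in the introduction (syntactic selection of word pairs, then statistical processing) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the triple format, the relation names, the sample data, and the choice of the log-likelihood ratio as the association measure, which this excerpt does not name. It is not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical parser output: (head lemma, syntactic relation, dependent lemma).
# A real system would obtain such triples from a syntactic parser.
SAMPLE_TRIPLES = (
    [("take", "verb-object", "decision")] * 3
    + [("take", "verb-object", "measure")]
    + [("adopt", "verb-object", "measure")] * 2
    + [("free", "adjective-noun", "trade")] * 2
    + [("the", "determiner-noun", "decision")] * 5
)

# Assumed set of syntactic configurations considered collocation candidates.
ALLOWED_RELATIONS = {"verb-object", "adjective-noun", "noun-noun"}


def log_likelihood(a, b, c, d):
    """Log-likelihood ratio (G2) for a 2x2 contingency table.

    a: pair occurrences, b: head without this dependent,
    c: dependent without this head, d: all remaining candidates.
    """
    n = a + b + c + d
    cells = (
        (a, a + b, a + c),
        (b, a + b, b + d),
        (c, c + d, a + c),
        (d, c + d, b + d),
    )
    total = 0.0
    for observed, row_sum, col_sum in cells:
        if observed > 0:
            expected = row_sum * col_sum / n
            total += observed * math.log(observed / expected)
    return 2.0 * total


def extract_collocations(triples, allowed=ALLOWED_RELATIONS):
    """Keep pairs in allowed configurations, then rank them by G2."""
    # Stage 1: syntactic filter.
    candidates = [t for t in triples if t[1] in allowed]
    # Stage 2: statistical scoring over the filtered candidates only.
    pair_counts = Counter(candidates)
    head_counts = Counter((h, r) for h, r, _ in candidates)
    dep_counts = Counter((r, dep) for _, r, dep in candidates)
    n = len(candidates)
    scored = []
    for (head, rel, dep), a in pair_counts.items():
        b = head_counts[(head, rel)] - a
        c = dep_counts[(rel, dep)] - a
        d = n - a - b - c
        scored.append(((head, rel, dep), log_likelihood(a, b, c, d)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


if __name__ == "__main__":
    for pair, score in extract_collocations(SAMPLE_TRIPLES):
        print(pair, round(score, 3))
```

Because the filter runs before counting, frequent but uninteresting pairs such as determiner-noun combinations never enter the contingency tables, which is one way to read the claim that parsing first improves the relevance of the extracted collocations.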
