TAILIEUCHUNG - Scientific report: "Word Alignment in English-Hindi Parallel Corpus Using Recency-Vector Approach: Some Studies"

Niladri Chatterjee
Department of Mathematics, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, INDIA - 110016
niladri_iitd@

Saumya Agrawal
Department of Mathematics, Indian Institute of Technology Kharagpur, West Bengal, INDIA - 721302
saumya_agrawal2000@

Abstract

Word alignment using the recency-vector based approach has recently become popular. One major advantage of these techniques is that, unlike other approaches, they perform well even if the parallel corpus is small. This makes these algorithms worth studying for languages where resources are scarce. In this work we studied the performance of two very popular recency-vector based approaches, proposed in (Fung and McKeown, 1994) and (Somers, 1998), respectively, for word alignment in an English-Hindi parallel corpus. The performance of these algorithms was not found to be satisfactory. However, the subsequent addition of some new constraints improved the performance of the recency-vector based alignment technique significantly for the said corpus. The present paper discusses the new version of the algorithm and its performance in detail.

1 Introduction

Several approaches, including statistical techniques (Gale and Church, 1991; Brown et al., 1993), lexical techniques (Huang and Choi, 2000; Tiedemann, 2003) and hybrid techniques (Ahrenberg et al., 2000), have been pursued to design schemes for word alignment, which aims at establishing links between words of a source language and a target language in a parallel corpus. All these schemes rely heavily on rich linguistic resources, either in the form of large volumes of parallel text or various language-grammar-related tools such as a parser, tagger, morphological analyser, etc. The recency-vector based approach has been proposed as an alternative strategy for word alignment. Approaches based on recency vectors typically consider the positions of the word in the corresponding texts.
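To make the idea of working from word positions concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the commonly described formulation in which a word's recency vector holds the distances between its successive occurrences; the text above only states that positions are considered. The function names `position_vectors` and `recency_vector` and the toy sentence are hypothetical, introduced purely for illustration.

```python
from typing import Dict, List


def position_vectors(tokens: List[str]) -> Dict[str, List[int]]:
    """Map each (lower-cased) word to the 1-based positions where it occurs."""
    positions: Dict[str, List[int]] = {}
    for index, token in enumerate(tokens, start=1):
        positions.setdefault(token.lower(), []).append(index)
    return positions


def recency_vector(positions: List[int]) -> List[int]:
    """Distances between successive occurrences of a single word.

    Assumption: this difference-of-positions definition is the usual one
    for recency vectors; the section itself does not spell it out.
    """
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


# Toy example (invented, not taken from the paper's corpus).
text = "the cat saw the dog and the dog saw the cat".split()
vectors = position_vectors(text)
the_positions = vectors["the"]               # [1, 4, 7, 10]
the_recency = recency_vector(the_positions)  # [3, 3, 3]
```

Under this assumption, a source word and a candidate target word would each be reduced to such a vector, and alignment would then compare the two vectors. How that comparison is carried out is not described in this section.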
