Scientific report: "Hierarchical Search for Word Alignment"

Hierarchical Search for Word Alignment

Jason Riesa and Daniel Marcu
Information Sciences Institute, Viterbi School of Engineering, University of Southern California
riesa marcu @

Abstract

We present a simple yet powerful hierarchical search algorithm for automatic word alignment. Our algorithm induces a forest of alignments from which we can efficiently extract a ranked k-best list. We score a given alignment within the forest with a flexible, linear discriminative model incorporating hundreds of features and trained on a relatively small amount of annotated data. We report results on Arabic-English word alignment and translation tasks. Our model outperforms a GIZA++ Model-4 baseline by points in F-measure, yielding a BLEU score increase over a state-of-the-art syntax-based machine translation system.

1 Introduction

Automatic word alignment is generally accepted as a first step in training any statistical machine translation system. It is a vital prerequisite for generating translation tables, phrase tables, or syntactic transformation rules. Generative alignment models like IBM Model-4 (Brown et al., 1993) have been in wide use for over 15 years. While not perfect (see Figure 1), they are completely unsupervised, requiring no annotated training data to learn alignments that have powered many current state-of-the-art translation systems.
Today, human-annotated alignments and an abundance of other information exist for many language pairs, all potentially useful for inducing accurate alignments. How can we take advantage of all of this data at our fingertips? Using feature functions that encode extra information is one good way. Unfortunately, as Moore (2005) points out, it is usually difficult to extend a given generative model with feature functions without changing the entire generative story. This difficulty ...

Figure 1: Model-4 alignment vs. a gold standard. Circles represent links in a human-annotated alignment and ...
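
The idea stated in the abstract, scoring each candidate alignment with a linear model over feature functions and returning a ranked k-best list, can be illustrated with a minimal sketch. It is not the authors' implementation: it ranks a small explicit list of candidates rather than searching a forest hierarchically, and the two feature functions (`num_links`, `diagonal_distance`), their weights, and the helper names `score` and `k_best` are all hypothetical, chosen only for illustration.

```python
import heapq
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

Link = Tuple[int, int]          # (source word index, target word index)
Alignment = FrozenSet[Link]     # an alignment is a set of links
Feature = Callable[[Alignment], float]


def score(alignment: Alignment,
          features: Dict[str, Feature],
          weights: Dict[str, float]) -> float:
    """Linear model: weighted sum of feature values for one alignment."""
    return sum(weights.get(name, 0.0) * f(alignment)
               for name, f in features.items())


def k_best(candidates: Iterable[Alignment],
           features: Dict[str, Feature],
           weights: Dict[str, float],
           k: int) -> List[Alignment]:
    """Return the k highest-scoring candidates, best first.

    Assumption: candidates are enumerated explicitly; the paper instead
    extracts the k-best list from a forest of alignments.
    """
    return heapq.nlargest(k, candidates,
                          key=lambda a: score(a, features, weights))


# Hypothetical features and weights (not taken from the paper).
features: Dict[str, Feature] = {
    "num_links": lambda a: float(len(a)),
    "diagonal_distance": lambda a: float(sum(abs(i - j) for i, j in a)),
}
weights = {"num_links": 1.0, "diagonal_distance": -1.0}

candidates = [
    frozenset({(0, 0), (1, 1)}),   # monotone alignment
    frozenset({(0, 1), (1, 0)}),   # crossed alignment
    frozenset({(0, 0)}),           # single link
]

top_two = k_best(candidates, features, weights, k=2)
```

Because the model is linear, adding a new source of information only requires adding one feature function and one weight, which is the flexibility the introduction contrasts with extending a generative model.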
