Scientific report: "Hierarchical Search for Word Alignment"

Jason Riesa and Daniel Marcu
Information Sciences Institute, Viterbi School of Engineering, University of Southern California

Abstract

We present a simple yet powerful hierarchical search algorithm for automatic word alignment. Our algorithm induces a forest of alignments from which we can efficiently extract a ranked k-best list. We score a given alignment within the forest with a flexible, linear discriminative model incorporating hundreds of features, and trained on a relatively small amount of annotated data. We report results on Arabic-English word alignment and translation tasks. Our model outperforms a GIZA++ Model-4 baseline by points in F-measure, yielding a BLEU score increase over a state-of-the-art syntax-based machine translation system.

1 Introduction

Automatic word alignment is generally accepted as a first step in training any statistical machine translation system. It is a vital prerequisite for generating translation tables, phrase tables, or syntactic transformation rules. Generative alignment models like IBM Model-4 (Brown et al., 1993) have been in wide use for over 15 years, and while not perfect (see Figure 1), they are completely unsupervised, requiring no annotated training data to learn alignments that have powered many current state-of-the-art translation systems.
Today there exist human-annotated alignments and an abundance of other information for many language pairs potentially useful for inducing accurate alignments. How can we take advantage of all of this data at our fingertips? Using feature functions that encode extra information is one good way. Unfortunately, as Moore (2005) points out, it is usually difficult to extend a given generative model with feature functions without changing the entire generative story. This difficulty ...

Figure 1: Model-4 alignment vs. a gold standard. Circles represent links in a human-annotated alignment and ...
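
To make the idea of scoring alignments with feature functions concrete, the following is a minimal sketch of a linear discriminative scorer, a k-best ranking step, and a balanced F-measure against a gold alignment. It is an illustration under stated assumptions, not the authors' method: the three feature functions, the weight names, and the helper names (`features`, `score`, `k_best`, `f_measure`) are all hypothetical, and the candidates are ranked from an explicit list rather than extracted from a forest built by hierarchical search.

```python
import heapq
from typing import Dict, FrozenSet, List, Tuple

Link = Tuple[int, int]        # (source word index, target word index)
Alignment = FrozenSet[Link]   # an alignment is a set of links


def features(alignment: Alignment, src: List[str], tgt: List[str]) -> Dict[str, float]:
    """Hypothetical feature functions (assumed for illustration only)."""
    aligned_src = {i for i, _ in alignment}
    # Distance of each link from the diagonal, using relative positions.
    diagonal = sum(abs(i / len(src) - j / len(tgt)) for i, j in alignment)
    return {
        "num_links": float(len(alignment)),
        "diagonal_distance": diagonal,
        "unaligned_source": float(len(src) - len(aligned_src)),
    }


def score(alignment: Alignment, src: List[str], tgt: List[str],
          weights: Dict[str, float]) -> float:
    """Linear model: dot product of the weight vector and the feature vector."""
    feats = features(alignment, src, tgt)
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def k_best(candidates: List[Alignment], src: List[str], tgt: List[str],
           weights: Dict[str, float], k: int) -> List[Alignment]:
    """Return the k highest-scoring candidates, best first."""
    return heapq.nlargest(k, candidates, key=lambda a: score(a, src, tgt, weights))


def f_measure(predicted: Alignment, gold: Alignment) -> float:
    """Balanced F-measure over links (an assumed definition of the metric)."""
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    src, tgt = ["a", "b"], ["x", "y"]
    gold = frozenset({(0, 0), (1, 1)})
    candidates = [gold, frozenset({(0, 1), (1, 0)}), frozenset({(0, 0)})]
    weights = {"diagonal_distance": -1.0, "unaligned_source": -1.0}
    best = k_best(candidates, src, tgt, weights, k=1)[0]
    print(sorted(best), f_measure(best, gold))
```

Because the model is linear, adding a new source of information only means adding a feature and a weight, which is the flexibility that generative models lack according to the passage above. In a trained system the weights would be learned from annotated alignments rather than set by hand as they are here.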
