Research paper: "Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation"

Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation
Bing Xiang and Abraham Ittycheriah
IBM T. J. Watson Research Center, Yorktown Heights, NY 10598
bxiang, abei @

Abstract

In this paper we present a novel discriminative mixture model for statistical machine translation (SMT). We model the feature space with a log-linear combination of multiple mixture components. Each component contains a large set of features trained in a maximum-entropy framework. All features within the same mixture component are tied and share the same mixture weights, where the mixture weights are trained discriminatively to maximize the translation performance. This approach aims at bridging the gap between maximum-likelihood training and discriminative training for SMT. It is shown that the feature space can be partitioned in a variety of ways, such as based on feature types, word alignments, or domains, for various applications. The proposed approach improves the translation performance significantly on a large-scale Arabic-to-English MT task.

1 Introduction

Significant progress has been made in statistical machine translation (SMT) in recent years. Among all the proposed approaches, the phrase-based method (Koehn et al., 2003) has become the most widely adopted one in SMT due to its capability of capturing local context information from adjacent words. There exists a significant amount of work focused on improving translation performance with better features.
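The feature-tying idea above can be made concrete with a small sketch. This is not the authors' code; the function names, the two toy components, and all feature values are hypothetical. It only illustrates the structure described in the abstract: each mixture component scores a hypothesis with a maximum-entropy-style weighted feature sum, and the component scores are then combined log-linearly, with every feature inside a component tied to that component's single mixture weight.

```python
# Hypothetical sketch of feature-tied mixture scoring.
# All names and values are illustrative, not from the paper.

def component_score(feature_values, feature_weights):
    """Maximum-entropy-style score of one mixture component:
    a weighted sum over that component's (potentially large) feature set."""
    return sum(feature_weights[f] * v for f, v in feature_values.items())

def mixture_score(component_scores, mixture_weights):
    """Log-linear combination of component scores. All features inside a
    component share its single mixture weight, so only len(mixture_weights)
    parameters need to be tuned discriminatively."""
    return sum(lam * s for lam, s in zip(mixture_weights, component_scores))

# Two toy components, e.g. partitioned by feature type.
lex = component_score({"f_lex_1": 1.0, "f_lex_2": 0.5},
                      {"f_lex_1": 0.2, "f_lex_2": 0.4})   # -> 0.4
phr = component_score({"f_phr_1": 2.0},
                      {"f_phr_1": 0.3})                    # -> 0.6
total = mixture_score([lex, phr], mixture_weights=[0.6, 0.4])
```

The key point of the design is the parameter count: the inner feature weights can number in the millions and are trained by maximum likelihood (maximum entropy), while only the handful of tied mixture weights are optimized discriminatively for translation quality.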
The feature set can be either small (on the order of 10 features) or large (up to millions). For example, the system described in Koehn et al. (2003) is a widely known one using a small number of features in a maximum-entropy log-linear model (Och and Ney, 2002). The features include phrase translation probabilities, lexical probabilities, the number of phrases, and language model scores, etc. The feature weights are usually optimized with minimum error rate training (MERT), as in Och (2003). Besides the MERT-based feature …
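For contrast with the mixture model, the baseline described here is a flat log-linear model over a small feature set. The following sketch (feature names and weight values are made up for illustration; MERT tuning itself is not shown) shows how such a decoder score is formed:

```python
import math

# Illustrative sketch of the standard phrase-based log-linear score
# (Och and Ney, 2002). Feature names and numeric values are hypothetical;
# in practice the weights would be tuned with MERT (Och, 2003).

def loglinear_score(features, weights):
    """Decoder score: weighted sum of a small set of (mostly log-domain) features."""
    return sum(weights[name] * value for name, value in features.items())

hypothesis_features = {
    "log_phrase_trans_prob": math.log(0.25),
    "log_lexical_prob": math.log(0.4),
    "phrase_count": 3.0,
    "log_lm_score": math.log(0.01),
}
tuned_weights = {
    "log_phrase_trans_prob": 1.0,
    "log_lexical_prob": 0.5,
    "phrase_count": -0.2,
    "log_lm_score": 0.8,
}
score = loglinear_score(hypothesis_features, tuned_weights)
```

During decoding, the hypothesis with the highest such score is selected; MERT searches the weight space to maximize a translation metric (e.g. BLEU) on a development set.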
