Scientific report: "Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT"

Patrick Simianer and Stefan Riezler
Department of Computational Linguistics, Heidelberg University, 69120 Heidelberg, Germany
simianer riezler @

Chris Dyer
Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
cdyer@

Abstract

With a few exceptions, discriminative training in statistical machine translation (SMT) has been content with tuning weights for large feature sets on small development data. Evidence from machine learning indicates that increasing the training sample size results in better prediction. The goal of this paper is to show that this common wisdom can also be brought to bear upon SMT. We deploy local features for SCFG-based SMT that can be read off from rules at runtime, and present a learning algorithm that applies ℓ1/ℓ2 regularization for joint feature selection over distributed stochastic learning processes. We present experiments on learning from training data on the scale of a million sentences and show significant improvements over tuning discriminative models on small development sets.

1 Introduction

The standard SMT training pipeline combines scores from large count-based translation models and language models with a few other features, and tunes these using the well-understood line-search technique for error minimization of Och (2003). If only a handful of dense features need to be tuned, minimum error rate training can be done on small tuning sets and is hard to beat in terms of accuracy and efficiency.
In contrast, the promise of large-scale discriminative training for SMT is to scale to arbitrary types and numbers of features and to provide sufficient statistical support by parameter estimation on large sample sizes. Features may be lexicalized and sparse, non-local and overlapping, or be designed to generalize beyond surface statistics by incorporating part-of-speech or syntactic labels. The modeler's ...
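As an illustration of the ℓ1/ℓ2 joint feature selection named in the abstract, the following is a minimal sketch of one way such a selection step could look, not the authors' implementation. It assumes each distributed learner (shard) returns a sparse weight vector, scores every feature by the ℓ2 norm of its weights across shards, keeps the `k` highest-scoring features, and averages their weights. The function name `select_features_l1_l2`, the cutoff `k`, and the toy weights are all hypothetical.

```python
import math


def select_features_l1_l2(shard_weights, k):
    """Keep the k features with the largest l2 norm across shards.

    shard_weights: list of dicts mapping feature name -> weight, one per shard
                   (hypothetical representation, assumed for this sketch).
    Returns a dict with the averaged weights of the selected features.
    """
    num_shards = len(shard_weights)

    # Collect every feature that any shard assigned a weight to.
    features = set()
    for weights in shard_weights:
        features.update(weights)

    # l2 norm of each feature's "column" of weights across all shards.
    norms = {
        f: math.sqrt(sum(w.get(f, 0.0) ** 2 for w in shard_weights))
        for f in features
    }

    # Keep the k features with the largest norm (ties broken by name).
    kept = sorted(features, key=lambda f: (-norms[f], f))[:k]

    # Average the selected features' weights over shards; all others are dropped.
    return {
        f: sum(w.get(f, 0.0) for w in shard_weights) / num_shards
        for f in kept
    }


# Toy weights from three hypothetical shards.
shards = [
    {"a": 1.0, "b": 0.1, "c": -2.0},
    {"a": 1.0, "c": -1.0, "d": 0.2},
    {"a": 1.0, "b": -0.1, "c": -3.0},
]
mixed = select_features_l1_l2(shards, k=2)
```

In this toy run, features `a` and `c` carry consistently large weights on all shards and survive, while `b` and `d`, which are small or absent on some shards, are discarded. Selecting on the norm of the whole column rather than on each shard separately is what makes the selection joint.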


