Scientific report: "Cutting the Long Tail: Hybrid Language Models for Translation Style Adaptation"

Arianna Bisazza and Marcello Federico
Fondazione Bruno Kessler, Trento, Italy
bisazza federico @

Abstract

In this paper, we address statistical machine translation of public conference talks. Modeling the style of this genre can be very challenging given the shortage of available in-domain training data. We investigate the use of a hybrid language model (LM), where infrequent words are mapped into classes. Hybrid LMs are used to complement word-based LMs with statistics about the language style of the talks. Extensive experiments comparing different settings of the hybrid LM are reported on publicly available benchmarks based on TED talks, from Arabic to English and from English to French. The proposed models are shown to exploit in-domain data better than conventional word-based LMs for the target language modeling component of a phrase-based statistical machine translation system.

1 Introduction

The translation of TED conference talks [1] is an emerging task in the statistical machine translation (SMT) community (Federico et al., 2011). The variety of topics covered by the speeches, as well as their specific language style, make this a very challenging problem. Fixed expressions, colloquial terms, figures of speech, and other phenomena recurrent in the talks should be properly modeled to produce translations that are not only fluent but that also employ the right register. In this paper, we propose a language modeling technique that leverages in-domain training data for style adaptation.

Hybrid class-based LMs are trained on text where only infrequent words are mapped to Part-of-Speech (POS) classes. In this way, topic-specific words are discarded and the model focuses on generic words, which we assume to be more useful for characterizing the language style. The factorization of similar expressions made possible by this mixed text representation yields a better n-gram coverage but with a much higher …

[1] http talks
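The mapping described above can be illustrated with a minimal sketch. It assumes the training text is already POS-tagged and that "infrequent" means a word occurring fewer than min_count times in that text. The function names, the threshold, and the toy sentences are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def frequent_words(tagged_corpus, min_count):
    """Collect word forms seen at least min_count times (hypothetical criterion)."""
    counts = Counter(word for sentence in tagged_corpus for word, _pos in sentence)
    return {word for word, count in counts.items() if count >= min_count}


def to_hybrid(sentence, vocab):
    """Keep frequent words as they are; replace every other word with its POS tag."""
    return [word if word in vocab else pos for word, pos in sentence]


# Toy POS-tagged corpus: two sentences that differ only in a topic-specific noun.
corpus = [
    [("i", "PRP"), ("want", "VBP"), ("to", "TO"), ("talk", "VB"),
     ("about", "IN"), ("galaxies", "NNS")],
    [("i", "PRP"), ("want", "VBP"), ("to", "TO"), ("talk", "VB"),
     ("about", "IN"), ("vaccines", "NNS")],
]

vocab = frequent_words(corpus, min_count=2)
hybrid_corpus = [to_hybrid(sentence, vocab) for sentence in corpus]

for tokens in hybrid_corpus:
    print(" ".join(tokens))
```

In this toy example both sentences collapse to the same hybrid sequence, because the rare topic words are replaced by their shared tag. This is the kind of factorization of similar expressions that the text credits with better n-gram coverage. An n-gram LM would then be trained on the hybrid text with any standard toolkit.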

