TAILIEUCHUNG - Scientific report: "Splitting Long or Ill-formed Input for Robust Spoken-language Translation"

Osamu FURUSE*, Setsuo YAMADA, Kazuhide YAMAMOTO
ATR Interpreting Telecommunications Research Laboratories
2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan
syamada yamamoto @

Abstract

This paper proposes an input-splitting method for translating spoken language, which includes many long or ill-formed expressions. The proposed method splits input into well-balanced translation units based on a semantic distance calculation. The splitting is performed during left-to-right parsing and does not degrade translation efficiency. The complete translation result is formed by concatenating the partial translation results of each split unit. The proposed method can be incorporated into frameworks like TDMT, which utilize left-to-right parsing and a score for a substructure. Experimental results show that the proposed method gives TDMT the following advantages: (1) elimination of null outputs, (2) splitting of utterances into sentences, and (3) robust translation of erroneous speech recognition results.

1 Introduction

A spoken-language translation system requires the ability to treat long or ill-formed input. An utterance as input of a spoken-language translation system is not always one well-formed sentence.

Also, when treating an utterance in speech translation, the speech recognition result, which is the input of the translation component, might be corrupted even though the input utterance is well-formed. Such a misrecognized result can cause a parsing failure, and consequently no translation output would be produced. Furthermore, we cannot expect that a speech recognition result includes punctuation marks such as a comma or a period between words, which are useful information for parsing. [1]

As a solution for treating long input, long-sentence splitting techniques such as that of

* Current affiliation is NTT Communication Science Laboratories.
[1] Punctuation marks are not used
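The idea summarized in the abstract (split the input into units using a score for each substructure, translate each unit, and concatenate the partial results) can be illustrated with a small sketch. This is not the authors' algorithm: the preview does not give its details, and the paper performs the split during left-to-right parsing, whereas this sketch runs a simple dynamic program after the fact. The names `best_split` and `translate_units`, the `span_scores` table, the per-unit cost, and the penalty for unparsed words are all hypothetical choices made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]  # half-open word span [start, end)


def best_split(
    n_words: int,
    span_scores: Dict[Span, float],
    unit_cost: float = 0.5,          # assumption: small cost per unit, favors fewer units
    unparsed_penalty: float = 10.0,  # assumption: cost of a word with no parse at all
) -> List[Span]:
    """Pick the segmentation of words 0..n_words with the lowest total score.

    span_scores maps a span to a hypothetical distance-like score of the
    substructure covering it (lower is better). A single word that has no
    entry may still form a unit, at the cost of unparsed_penalty.
    """
    inf = float("inf")
    best = [inf] * (n_words + 1)  # best[i]: lowest cost to cover words [0, i)
    back = [0] * (n_words + 1)    # back[i]: start of the last unit in that cover
    best[0] = 0.0
    for end in range(1, n_words + 1):
        for start in range(end):
            if (start, end) in span_scores:
                cost = span_scores[(start, end)]
            elif end - start == 1:
                cost = unparsed_penalty
            else:
                continue  # no substructure covers this span
            total = best[start] + cost + unit_cost
            if total < best[end]:
                best[end], back[end] = total, start
    # Walk the back-pointers from the end of the input to recover the units.
    spans: List[Span] = []
    end = n_words
    while end > 0:
        spans.append((back[end], end))
        end = back[end]
    return spans[::-1]


def translate_units(
    words: List[str],
    spans: List[Span],
    translate: Callable[[List[str]], str],
) -> str:
    """Translate each unit separately and concatenate the partial results."""
    return " ".join(translate(words[s:e]) for s, e in spans)


# Toy usage: no substructure covers the whole utterance, so it must be split.
words = "yes that is right how much is it".split()
scores = {(0, 1): 0.0, (1, 4): 0.2, (4, 8): 0.3, (0, 4): 0.9}
units = best_split(len(words), scores)
result = translate_units(words, units, lambda unit: "<" + " ".join(unit) + ">")
```

Because a lone word is always allowed as a fallback unit, some segmentation always exists, which mirrors the abstract's claim that splitting eliminates null outputs. The toy `translate` callable only brackets each unit; a real system would call its translation component there.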
