Scientific report: "An Unsupervised System for Identifying English Inclusions in German Text"

Beatrice Alex
School of Informatics, University of Edinburgh, Edinburgh EH8 9LW, UK
v1balex@

Abstract

We present an unsupervised system that exploits linguistic knowledge resources, namely English and German lexical databases and the World Wide Web, to identify English inclusions in German text. We describe experiments with this system and the corpus which was developed for this task. We report the classification results of our system and compare them to the performance of a trained machine learner in a series of in- and cross-domain experiments.

1 Introduction

The recognition of foreign words and foreign named entities (NEs) in otherwise monolingual text is beyond the capability of many existing approaches and is only starting to be addressed. This language-mixing phenomenon is prevalent in German, where the number of anglicisms has increased considerably.

We have developed an unsupervised and highly efficient system that identifies English inclusions in German text by means of a computationally inexpensive lookup procedure. By "unsupervised" we mean that the system does not require any annotated training data and relies only on lexicons and the Web. Our system allows linguists and lexicographers to observe language changes over time and to investigate the use and frequency of foreign words in a given language and domain. The output also represents valuable information for a number of applications, including polyglot text-to-speech (TTS) synthesis and machine translation (MT).

We will first explain the issue of foreign inclusions in German text in greater detail, with examples, in Section 2. Sections 3 and 4 describe the data we used and the architecture of our system. In Section 5 we provide an evaluation of the system output and compare the results with those of a series of in- and cross-domain machine learning experiments outlined in Section 6. We conclude and outline future work in Section
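To make the idea of a lexicon-based lookup concrete, the following is a minimal sketch in Python, not the system described in the paper. It assumes two tiny in-memory word sets as stand-ins for the English and German lexical databases, and an optional caller-supplied `hit_count` function as a hypothetical stand-in for a Web-based frequency check. All names (`GERMAN_LEXICON`, `ENGLISH_LEXICON`, `classify_token`, `tag_sentence`, `hit_count`) and the tie-breaking rule are illustrative assumptions.

```python
from typing import Callable, List, Optional, Tuple

# Toy lexicons: hypothetical stand-ins for real German and English lexical databases.
GERMAN_LEXICON = {"das", "ist", "ein", "neues", "die", "firma"}
ENGLISH_LEXICON = {"security", "tool", "software", "die"}

# Hypothetical signature: (word, language code) -> number of hits for that language.
HitCounter = Callable[[str, str], int]


def classify_token(token: str, hit_count: Optional[HitCounter] = None) -> str:
    """Label a token "EN" or "DE" using lexicon lookup, with an optional fallback."""
    word = token.lower()
    in_de = word in GERMAN_LEXICON
    in_en = word in ENGLISH_LEXICON

    # Unambiguous cases: the word occurs in exactly one lexicon.
    if in_en and not in_de:
        return "EN"
    if in_de and not in_en:
        return "DE"

    # Ambiguous or unknown: consult the fallback counter if one was supplied.
    if hit_count is not None and hit_count(word, "en") > hit_count(word, "de"):
        return "EN"

    # Assumed default: treat the token as German, the base language of the text.
    return "DE"


def tag_sentence(
    tokens: List[str], hit_count: Optional[HitCounter] = None
) -> List[Tuple[str, str]]:
    """Pair every token in a sentence with its language label."""
    return [(token, classify_token(token, hit_count)) for token in tokens]
```

Passing the counter as a parameter keeps the sketch self-contained and makes no claim about how a real Web lookup would be performed; a caller could plug in any frequency source, or omit it and rely on the lexicons alone.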


