TAILIEUCHUNG - Scientific report: "Automatic Acquisition of Named Entity Tagged Corpus from World Wide Web"

Automatic Acquisition of Named Entity Tagged Corpus from World Wide Web

Joohui An, Seungwoo Lee, Gary Geunbae Lee
Dept. of CSE, POSTECH, Pohang, Korea 790-784
minnie@, pinesnow@, gblee@

Abstract

In this paper, we present a method that automatically constructs a Named Entity (NE) tagged corpus from the web, to be used for training Named Entity Recognition systems. We use an NE list and a web search engine to collect web documents that contain the NE instances. The documents are refined through sentence separation and text refinement procedures, and the NE instances are finally tagged with the appropriate NE categories. Our experiments demonstrate that the suggested method can acquire, without any human intervention, a sufficiently large NE tagged corpus that is as useful as a manually tagged one.

1 Introduction

The current trend in Named Entity Recognition (NER) is to apply machine learning approaches, which are attractive because they are trainable and adaptable; consequently, porting a machine learning system to another domain is much easier than porting a rule-based one. Various supervised learning methods have been applied successfully to Named Entity (NE) tasks and have shown reasonably satisfactory performance (Zhou and Su, 2002; Borthwick et al., 1998; Sassano and Utsuro, 2000). However, most of these systems rely heavily on a tagged corpus for training.

A machine learning approach requires a large corpus to circumvent the data sparseness problem, but the dilemma is that the cost of annotating a large training corpus is non-trivial. In this paper, we suggest a method that automatically constructs an NE tagged corpus from the web, to be used for training NER systems. We use an NE list and a web search engine to collect web documents that contain the NE instances. The documents are refined through the sentence separation and text refinement procedures, and NE ...
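The pipeline outlined above (collect documents for entries in an NE list, separate sentences, tag NE instances with their categories) can be illustrated with a minimal sketch. It is not the authors' implementation: the web search step is replaced by already-downloaded text, the sentence splitter is a naive regular expression, and the names `NE_LIST`, `split_sentences`, `tag_sentence`, and `build_corpus`, along with the tag format, are hypothetical choices made only for this example.

```python
import re

# Hypothetical NE list: surface form -> category (illustrative entries only).
NE_LIST = {
    "POSTECH": "ORGANIZATION",
    "Pohang": "LOCATION",
}


def split_sentences(text):
    """Naive sentence separation: split after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tag_sentence(sentence, ne_list):
    """Wrap each NE instance in <CATEGORY>...</CATEGORY>.

    Returns (tagged_sentence, number_of_instances_tagged).
    Longer names are tried first so they win over shorter overlapping ones.
    """
    names = sorted(ne_list, key=len, reverse=True)
    if not names:
        return sentence, 0
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")

    def wrap(match):
        name = match.group(1)
        category = ne_list[name]
        return f"<{category}>{name}</{category}>"

    return pattern.subn(wrap, sentence)


def build_corpus(documents, ne_list):
    """Keep only sentences that contain at least one NE instance, tagged."""
    corpus = []
    for doc in documents:
        for sentence in split_sentences(doc):
            tagged, count = tag_sentence(sentence, ne_list)
            if count:
                corpus.append(tagged)
    return corpus
```

Calling `build_corpus(["POSTECH is in Pohang. It rains today."], NE_LIST)` keeps only the first sentence, since the second contains no entry from the list. A real system would also need the text refinement step and some handling of ambiguous names, which this sketch omits.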




