TAILIEUCHUNG - Scientific report: "Noun-phrase co-occurrence statistics for semi-automatic semantic lexicon construction"

1. For a given category, choose a small set of exemplars (or 'seed words')
2. Count co-occurrence of words and seed words within a corpus
3. Use a figure of merit based upon these counts to select new seed words
4. Return to step 2 and iterate n times
5. Use a figure of merit to rank words for category membership and output a ranked list

Our algorithm uses roughly this same generic structure, but achieves notably superior results, by changing the specifics of: what counts as co-occurrence; which figures of merit to use for.

Noun-phrase co-occurrence statistics for semi-automatic semantic lexicon construction

Brian Roark
Cognitive and Linguistic Sciences, Box 1978
Brown University
Providence, RI 02912, USA
Brian_Roark@Brown.edu

Eugene Charniak
Computer Science, Box 1910
Brown University
Providence, RI 02912, USA

Abstract

Generating semantic lexicons semi-automatically could be a great time saver, relative to creating them by hand. In this paper, we present an algorithm for extracting potential entries for a category from an on-line corpus, based upon a small set of exemplars. Our algorithm finds more correct terms and fewer incorrect ones than previous work in this area. Additionally, the entries that are generated potentially provide broader coverage of the category than would occur to an individual coding them by hand. Our algorithm finds many terms not included within Wordnet (many more than previous algorithms), and could be viewed as an enhancer of existing broad-coverage resources.

1 Introduction

Semantic lexicons play an important role in many natural language processing tasks. Effective lexicons must often include many domain-specific terms, so that available broad coverage resources, such as Wordnet (Miller, 1990), are inadequate. For example, both Escort and Chinook are (among other things) types of vehicles (a car and a helicopter, respectively), but neither are cited as so in Wordnet. Manually building domain-specific lexicons can be a costly, time-consuming affair. Utilizing existing resources, such as on-line corpora, to aid in this task could improve performance both by decreasing the time to construct the lexicon and by improving its quality.

Extracting semantic information from word co-occurrence statistics has been effective, particularly for sense disambiguation (Schütze, 1992; Gale et al., 1992; Yarowsky, 1995). In Riloff and Shepherd (1997), noun co-occurrence statistics were used to indicate nominal category membership, for the purpose of aiding in the construction of semantic .
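The five generic steps listed in this excerpt can be sketched in a few lines of Python. This is only a minimal illustration of the generic loop, not the paper's algorithm: the excerpt does not say what counts as co-occurrence or which figures of merit are used, so the sketch assumes that words co-occur when they appear in the same pre-extracted group, and it uses a simple placeholder score (the share of a word's co-occurrences that involve seed words). The names count_cooccurrence, figure_of_merit, bootstrap, n_iterations and new_per_iteration are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def count_cooccurrence(groups):
    """Step 2: count how often each ordered pair of distinct words shares a group.

    Assumption: `groups` is a list of word lists, and two words co-occur
    when they appear in the same list.
    """
    pair_counts = defaultdict(Counter)
    for group in groups:
        for a, b in permutations(set(group), 2):
            pair_counts[a][b] += 1
    return pair_counts


def figure_of_merit(word, seeds, pair_counts):
    """Placeholder score (an assumption, not the paper's statistic):
    the fraction of a word's co-occurrences that are with seed words."""
    counts = pair_counts.get(word, {})
    total = sum(counts.values())
    if total == 0:
        return 0.0
    with_seeds = sum(n for other, n in counts.items() if other in seeds)
    return with_seeds / total


def bootstrap(groups, initial_seeds, n_iterations=2, new_per_iteration=1):
    """Run the generic loop and return (word, score) pairs, best first."""
    seeds = set(initial_seeds)                    # step 1: seed words
    pair_counts = count_cooccurrence(groups)      # step 2: raw counts, computed once

    def ranked_candidates(exclude):
        words = [w for w in pair_counts if w not in exclude]
        scored = [(w, figure_of_merit(w, seeds, pair_counts)) for w in words]
        # Highest score first; ties broken alphabetically for determinism.
        return sorted(scored, key=lambda item: (-item[1], item[0]))

    for _ in range(n_iterations):                 # step 4: iterate n times
        best = ranked_candidates(exclude=seeds)[:new_per_iteration]
        seeds.update(word for word, _ in best)    # step 3: promote new seeds

    # Step 5: rank every word except the original seeds using the final seed set.
    return ranked_candidates(exclude=set(initial_seeds))
```

A call such as bootstrap(groups, {"car", "truck"}) returns the non-seed words ordered by the placeholder score. Because the seed set grows on each pass, the seed-based scores are recomputed every iteration even though the raw pair counts are computed only once.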

