Scientific report: "Improving the Accuracy of Subcategorizations Acquired from Corpora"

Naoki Yoshinaga
Department of Computer Science, University of Tokyo
7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033
yoshinag@

Abstract

This paper presents a method of improving the accuracy of subcategorization frames (SCFs) acquired from corpora to augment existing lexicon resources. I estimate a confidence value of each SCF using corpus-based statistics, and then perform clustering of SCF confidence-value vectors for words to capture cooccurrence tendency among SCFs in the lexicon. I apply my method to SCFs acquired from corpora using lexicons of two large-scale lexicalized grammars. The resulting SCFs achieve higher precision and recall compared to SCFs obtained by naive frequency cut-off.

1 Introduction

Recently, a variety of methods have been proposed for acquisition of subcategorization frames (SCFs) from corpora (surveyed in Korhonen, 2002). One interesting possibility is to use these techniques to improve the coverage of existing large-scale lexicon resources such as lexicons of lexicalized grammars. However, there has been little work on evaluating the impact of acquired SCFs, with the exception of Carroll and Fang (2004).

The problem when we integrate acquired SCFs into existing lexicalized grammars is the lower quality of the acquired SCFs, since they are acquired in an unsupervised manner rather than being manually coded. If we attempt to compensate for the poor precision by being less strict in filtering out less likely SCFs, then we will end up with a larger number of noisy lexical entries, which is problematic for parsing with lexicalized grammars (Sarkar et al., 2000).
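The filtering trade-off described above can be illustrated with a minimal sketch of a relative-frequency cut-off, the kind of naive filter the abstract names as the baseline. The function name `filter_scfs`, the frame labels, the counts, and both thresholds are hypothetical and chosen only for illustration; this is not the confidence estimation or clustering method of this paper.

```python
from collections import Counter


def filter_scfs(observations, threshold):
    """Keep, for each word, the SCFs whose relative frequency is >= threshold.

    observations: iterable of (word, scf) pairs, one per corpus occurrence.
    """
    counts = {}
    for word, scf in observations:
        counts.setdefault(word, Counter())[scf] += 1

    lexicon = {}
    for word, scf_counts in counts.items():
        total = sum(scf_counts.values())
        lexicon[word] = {
            scf for scf, n in scf_counts.items() if n / total >= threshold
        }
    return lexicon


# Hypothetical toy data: ten occurrences of "give"; assume the single
# "S" observation is noise from the acquisition system.
observations = (
    [("give", "NP_NP")] * 6
    + [("give", "NP_PP")] * 3
    + [("give", "S")] * 1
)

strict = filter_scfs(observations, threshold=0.2)
lenient = filter_scfs(observations, threshold=0.05)

print("strict: ", sorted(strict["give"]))
print("lenient:", sorted(lenient["give"]))
```

With the lower threshold the rare frame survives alongside the frequent ones, which is the noisy-entry effect the paragraph above warns about; with the higher threshold it is dropped, at the risk of also discarding genuine but infrequent frames.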
We thus need some method of selecting the most reliable set of SCFs from the system output, as demonstrated in Korhonen (2002). In this paper, I present a method of improving the accuracy of SCFs acquired from corpora in order to augment existing lexicon resources. I first estimate a confidence value that a word can have each SCF using


