TAILIEUCHUNG - Scientific report: "The Manually Annotated Sub-Corpus: A Community Resource For and By the People"

Nancy Ide, Department of Computer Science, Vassar College, Poughkeepsie, NY, USA (ide@)
Christiane Fellbaum, Princeton University, Princeton, New Jersey, USA (fellbaum@)
Collin Baker, International Computer Science Institute, Berkeley, California, USA (collinb@)
Rebecca Passonneau, Columbia University, New York, New York, USA (becky@)

Abstract

The Manually Annotated Sub-Corpus (MASC) project provides data and annotations to serve as the base for a community-wide annotation effort of a subset of the American National Corpus. The MASC infrastructure enables the incorporation of contributed annotations into a single, usable format that can then be analyzed as it is or ported to any of a variety of other formats. MASC includes data from a much wider variety of genres than existing multiply-annotated corpora of English, and the project is committed to a fully open model of distribution, without restriction, for all data and annotations produced or contributed. As such, MASC is the first large-scale, open, community-based effort to create much needed language resources for NLP. This paper describes the MASC project, its corpus and annotations, and serves as a call for contributions of data and annotations from the language processing community.

1 Introduction

The need for corpora annotated for multiple phenomena across a variety of linguistic layers is keenly recognized in the computational linguistics community. Several multiply-annotated corpora exist, especially for Western European languages and for spoken data, but, interestingly, broad-based English language corpora with robust annotation for diverse linguistic phenomena are relatively rare. The most widely-used corpus of English, the British National Corpus, contains only part-of-speech annotation; and although it contains a wider range of annotation types, the fifteen million word Open American National Corpus annotations ...

