Generating Usable Formats for Metadata and Annotations in a Large Meeting Corpus

Andrei Popescu-Belis and Paula Estrella
ISSCO/TIM/ETI, University of Geneva
40 bd. du Pont-d'Arve, 1211 Geneva 4, Switzerland

Abstract

The AMI Meeting Corpus is now publicly available, including manual annotation files generated in the NXT XML format, but lacking explicit metadata for the 171 meetings of the corpus. To increase the usability of this important resource, a representation format based on relational databases is proposed, which maximizes informativeness, simplicity and reusability of the metadata and annotations. The annotation files are converted to a tabular format using an easily adaptable XSLT-based mechanism, and their consistency is verified in the process. Metadata files are generated directly in the IMDI XML format from implicit information and converted to tabular format using a similar procedure. The results and tools will be freely available with the AMI Corpus. Sharing the metadata through the Open Archives network will help increase the visibility of the AMI Corpus.

1 Introduction

The AMI Meeting Corpus (Carletta et al., 2006) is one of the largest and most extensively annotated data sets of multimodal recordings of human interaction. The corpus contains 171 meetings in English, for a total duration of ca. 100 hours. The meetings either follow the remote control design scenario or are naturally occurring meetings. In both cases, they have between 3 and 5 participants.
Perhaps the most valuable resources in this corpus are the high-quality annotations, which can be used to train and test NLP tools. Besides transcripts, the existing annotation dimensions include forced temporal alignment, named entities, topic segmentation, dialogue acts, abstractive and extractive summaries, and hand and head movement and posture. However, these dimensions, as well as the implicit metadata for the corpus, are difficult to exploit.
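The abstract describes converting the NXT XML annotation files into a tabular, relational format and verifying their consistency along the way. The following is a minimal sketch of that idea, not the authors' tool: it swaps the XSLT-based mechanism for Python's standard-library `xml.etree.ElementTree` and `sqlite3` modules, and the XML layout (the `first`/`last` word attributes on dialogue acts and the `nite` namespace URI) is a simplified, hypothetical stand-in for the real AMI/NXT standoff files. It shows why a tabular form helps: once words and dialogue acts are rows, combining two annotation dimensions is a single SQL join.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Namespace URI assumed for illustration; ElementTree expands nite:id to "{uri}id".
NITE = "{http://nite.sourceforge.net/}"

# Hypothetical, simplified NXT-style standoff files: words carry timings,
# and each dialogue act points to its first and last word by id.
WORDS_XML = """<nite:root xmlns:nite="http://nite.sourceforge.net/">
  <w nite:id="w1" starttime="0.00" endtime="0.30">okay</w>
  <w nite:id="w2" starttime="0.30" endtime="0.55">let's</w>
  <w nite:id="w3" starttime="0.55" endtime="0.90">start</w>
</nite:root>"""

DACTS_XML = """<nite:root xmlns:nite="http://nite.sourceforge.net/">
  <dact nite:id="da1" type="inf" first="w1" last="w3"/>
</nite:root>"""


def words_table(xml_text):
    """Flatten <w> elements into (id, start_time, end_time, token) rows."""
    root = ET.fromstring(xml_text)
    return [
        (w.get(NITE + "id"), float(w.get("starttime")),
         float(w.get("endtime")), w.text or "")
        for w in root.iter("w")
    ]


def dacts_table(xml_text, known_word_ids):
    """Flatten <dact> elements into rows, rejecting references to unknown words."""
    rows = []
    for d in ET.fromstring(xml_text).iter("dact"):
        first, last = d.get("first"), d.get("last")
        for ref in (first, last):
            if ref not in known_word_ids:
                raise ValueError(
                    f"dialogue act {d.get(NITE + 'id')} points to unknown word {ref}")
        rows.append((d.get(NITE + "id"), d.get("type"), first, last))
    return rows


def build_db(words, dacts):
    """Load both tables into an in-memory SQLite database."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE words (id TEXT PRIMARY KEY, start_time REAL,"
                " end_time REAL, token TEXT)")
    con.execute("CREATE TABLE dacts (id TEXT PRIMARY KEY, type TEXT,"
                " first_word TEXT, last_word TEXT)")
    con.executemany("INSERT INTO words VALUES (?, ?, ?, ?)", words)
    con.executemany("INSERT INTO dacts VALUES (?, ?, ?, ?)", dacts)
    return con


def dact_spans(con):
    """Return (act id, type, start, end) by joining each act to its boundary words."""
    return con.execute(
        """SELECT d.id, d.type, f.start_time, l.end_time
           FROM dacts d
           JOIN words f ON f.id = d.first_word
           JOIN words l ON l.id = d.last_word
           ORDER BY f.start_time"""
    ).fetchall()


if __name__ == "__main__":
    words = words_table(WORDS_XML)
    dacts = dacts_table(DACTS_XML, {w[0] for w in words})
    print(dact_spans(build_db(words, dacts)))  # [('da1', 'inf', 0.0, 0.9)]
```

The consistency check here is deliberately narrow (every word reference must resolve); a real conversion would likely check more, such as ordering of start and end times.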