An Integrated Term-Based Corpus Query System

Irena Spasic, Computer Science, University of Salford
Goran Nenadic, Dept of Computation, UMIST
Kostas Manios, Computer Science, University of Salford
Sophia Ananiadou, Computer Science, University of Salford

Abstract

In this paper we describe the X-TRACT workbench, which enables efficient term-based querying against a domain-specific literature corpus. Its main aim is to help domain specialists locate and extract new knowledge from scientific literature corpora. Before querying, the corpus is automatically analysed for terminology by the ATRACT system, which performs terminology recognition based on the C/NC-value method, enhanced with term variation handling. The results of terminology processing are annotated in XML, and the resulting XML documents are stored in an XML-native database. All corpus retrieval operations are performed against this database using an XML query language. We illustrate how the X-TRACT workbench can be used for knowledge discovery, literature mining and conceptual information extraction.

1 Introduction

New scientific discoveries usually result in an abundance of publications reporting these findings in an attempt to share new knowledge with other scientists.
Electronically available texts are continually being created and updated, and thus the knowledge represented in such texts is more up to date than in any other medium. The sheer number of published papers¹ makes it difficult for a human to efficiently locate the information of interest, not only in a collection of documents but also within a single document. The growing number of electronically available …

¹ For example, the Medline database PubMed currently contains over 12 million abstracts in the domains of molecular biology, biomedicine and medicine, growing by more than … abstracts each month.
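The abstract names the C/NC-value method as the basis of ATRACT's term recognition but does not spell it out. As a rough illustration, here is a minimal Python sketch of the basic C-value measure as it is commonly stated in the term-extraction literature. For a candidate term a with |a| words and corpus frequency f(a), C-value(a) = log2|a| · f(a) if a is not nested in any longer candidate, and otherwise C-value(a) = log2|a| · (f(a) − (1/P(T_a)) · Σ_{b ∈ T_a} f(b)), where T_a is the set of longer candidates containing a and P(T_a) is their number. This is a sketch under stated assumptions, not the ATRACT implementation: it omits the NC-value context weighting, linguistic filtering of candidates, and the term variation handling mentioned above. The function names `c_value` and `is_nested`, the input format (a dict from word tuples to frequencies) and the example counts are all hypothetical.

```python
import math

# Hypothetical illustration of the basic C-value measure; not ATRACT's code.
Term = tuple[str, ...]


def is_nested(shorter: Term, longer: Term) -> bool:
    """True if `shorter` occurs as a contiguous word sequence inside `longer`."""
    m, n = len(shorter), len(longer)
    return m < n and any(longer[i:i + m] == shorter for i in range(n - m + 1))


def c_value(candidates: dict[Term, int]) -> dict[Term, float]:
    """Score candidate terms (word tuple -> corpus frequency) with C-value.

    Single-word candidates get log2(1) = 0, because the basic measure
    is aimed at multi-word terms.
    """
    scores: dict[Term, float] = {}
    for a, f_a in candidates.items():
        # T_a: the longer candidates in which a appears as a sub-sequence
        t_a = [b for b in candidates if is_nested(a, b)]
        weight = math.log2(len(a))
        if not t_a:
            # a is never nested: use its raw frequency
            scores[a] = weight * f_a
        else:
            # Discount the average frequency of the nesting candidates
            nested_mean = sum(candidates[b] for b in t_a) / len(t_a)
            scores[a] = weight * (f_a - nested_mean)
    return scores


# Hypothetical frequencies, for illustration only
counts = {
    ("basal", "cell", "carcinoma"): 10,
    ("adenoid", "cystic", "basal", "cell", "carcinoma"): 5,
    ("cell", "carcinoma"): 20,
}
for term, score in sorted(c_value(counts).items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 2))
```

With these made-up counts, "cell carcinoma" is nested in both longer candidates, so it scores 1 × (20 − (10 + 5)/2) = 12.5. The five-word candidate is not nested and scores log2(5) × 5 ≈ 11.61. "basal cell carcinoma" scores log2(3) × (10 − 5) ≈ 7.92. The subtraction is the key design choice: a string that mostly occurs inside a longer term is penalised rather than counted as an independent term.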
