Scientific report: "Incorporating Context Information for the Extraction of Terms"

Incorporating Context Information for the Extraction of Terms

Katerina T. Frantzi
Dept. of Computing, Manchester Metropolitan University, Manchester M1 5GD

Abstract

The information used for the extraction of terms can be considered as rather 'internal', i.e. coming from the candidate string itself. This paper presents the incorporation of 'external' information derived from the context of the candidate string. It is embedded in the C-value approach for automatic term recognition (ATR), in the form of weights constructed from statistical characteristics of the context words of the candidate string.

1 Introduction & Related Work

The applications of term recognition (specialised dictionary construction and maintenance, human and machine translation, text categorization, etc.) and the fact that new terms appear with high speed in some domains (e.g. in computer science) reinforce the need for automating the extraction of terms. ATR also makes it possible to work with large amounts of real data that could not be handled manually. We should note that by ATR we mean neither dictionary string matching nor term interpretation (which deals with the relations between terms and concepts).

Terms may consist of one or more words. When the aim is the extraction of single-word terms, domain-dependent linguistic information (e.g. morphology) is used (Ananiadou, 1994). Multi-word ATR usually uses linguistic information in the form of a grammar that mainly allows noun phrases or compounds to be extracted as candidate terms. Bourigault (1992) extracts maximal-length noun phrases and their subgroups (depending on their grammatical structure and position) as candidate terms. Dagan and Church (1994) accept sequences of nouns, which gives them high precision but not as good a recall as that of Justeson and Katz (1995), who allow some prepositions (e.g. "of") to be part of the extracted candidate terms. Frantzi and Ananiadou (1996) stand between these two approaches, allowing the
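The precision/recall trade-off between a noun-only filter and a filter that also admits a preposition such as "of" can be illustrated with a minimal Python sketch. The function name `candidate_terms`, the tag label `NOUN`, and the sample sentence are assumptions made for illustration only; this is a simplified stand-in, not the grammar used by any of the cited systems.

```python
import re


def candidate_terms(tagged, allow_of=False):
    """Return multi-word candidate strings from a list of (word, tag) pairs.

    Hypothetical helper: 'NOUN' is an assumed tag label, and the two
    patterns are deliberately simplified linguistic filters.
    """
    # Encode each token as one character so a regex can scan the tag sequence:
    # N = noun, P = the preposition "of" (only when allowed), x = anything else.
    code = "".join(
        "N" if tag == "NOUN"
        else "P" if allow_of and word.lower() == "of"
        else "x"
        for word, tag in tagged
    )
    # Noun sequences, optionally linked by "of" (e.g. N N, N of N N).
    pattern = r"N+(?:PN+)*" if allow_of else r"N+"
    spans = (m.span() for m in re.finditer(pattern, code))
    # Keep only multi-word candidates (two or more tokens).
    return [" ".join(w for w, _ in tagged[a:b]) for a, b in spans if b - a >= 2]


sentence = [
    ("the", "DET"), ("rate", "NOUN"), ("of", "ADP"),
    ("error", "NOUN"), ("propagation", "NOUN"),
    ("is", "VERB"), ("low", "ADJ"),
]

print(candidate_terms(sentence))                 # ['error propagation']
print(candidate_terms(sentence, allow_of=True))  # ['rate of error propagation']
```

With the stricter filter only the bare noun sequence survives; admitting "of" yields a longer candidate, which is the kind of string a noun-only filter would miss.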
