Scientific report: "Encoding a Parallel Corpus for Automatic Terminology Extraction"

We present a status report about an ongoing research project in the field of (semi-)automatic terminology acquisition at the European Academy Bolzano. The main focus will be on encoding a text corpus, which serves as a basis for applying term extraction programs. The CATEx (Computer Assisted Terminology Extraction) project emerged from the need to support and improve, both qualitatively and quantitatively, the manual acquisition of terminological data. Thus, the main objective of CATEx is the development of a computational framework for (semi-)automatic terminology acquisition, which consists of four modules: a parallel text corpus, term-extraction programs, a term bank linked.

Proceedings of EACL '99

Encoding a Parallel Corpus for Automatic Terminology Extraction

Johann Gamper
European Academy Bolzano/Bozen
Weggensteinstr. 12/A, 39100 Bolzano/Bozen, Italy
jgamper

Abstract

We present a status report about an ongoing research project in the field of (semi-)automatic terminology acquisition at the European Academy Bolzano. The main focus will be on encoding a text corpus, which serves as a basis for applying term extraction programs.

1 Introduction

Text corpora are valuable resources in all areas dealing with natural language processing in one form or another. Terminology is one of these fields, where researchers explore domain-specific language material to investigate terminological issues. The manual acquisition of terminological data from text material is a very work-intensive and error-prone task. Recent advances in automatic corpus analysis favored a modern form of terminology acquisition: (1) a corpus is a collection of language material in machine-readable form, and (2) computer programs scan the corpus for terminologically relevant information and generate lists of term candidates, which have to be post-edited by humans. The CATEx project described below adopts this approach.
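
The corpus-based workflow outlined above (programs scan a machine-readable corpus and propose term candidates for human post-editing) can be illustrated with a minimal sketch. The excerpt does not say which extraction method CATEx uses, so the frequency-based n-gram filter below is only an assumed stand-in, and the names extract_term_candidates, STOP_WORDS, max_len and min_freq are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only (not from the paper).
STOP_WORDS = {"the", "of", "a", "an", "and", "in", "to", "is", "for", "by", "be"}


def extract_term_candidates(corpus, max_len=3, min_freq=2):
    """Return (candidate, frequency) pairs, most frequent first.

    corpus   -- iterable of plain-text documents (the machine-readable corpus)
    max_len  -- longest word sequence considered as a candidate
    min_freq -- minimum number of occurrences to be proposed
    """
    counts = Counter()
    for text in corpus:
        # Split into lower-cased alphabetic tokens (Unicode letters included).
        tokens = re.findall(r"[^\W\d_]+", text.lower())
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # Skip word sequences that start or end with a stop word.
                if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                    continue
                counts[" ".join(gram)] += 1
    candidates = [(term, freq) for term, freq in counts.items() if freq >= min_freq]
    # Sort by descending frequency, then alphabetically for a stable list.
    return sorted(candidates, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    sample_corpus = [
        "The term bank stores legal terms.",
        "A term bank is linked to the corpus.",
    ]
    # The resulting list is what a terminologist would then post-edit by hand.
    for term, freq in extract_term_candidates(sample_corpus):
        print(freq, term)
```

The output is deliberately a plain ranked list: in the approach described here the program only proposes candidates, and the decision about what counts as a term remains with the human post-editor.
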
2 The CATEx Project

Due to the equal status of the Italian and the German language in South Tyrol, legal and administrative documents have to be written in both languages. A prerequisite for high-quality translations is a consistent and comprehensive bilingual terminology, which also forms the basis for an independent German legal language that reflects the Italian legislation. The first systematic effort in this direction was initiated a few years ago at the European Academy Bolzano/Bozen with the goal of compiling an Italian/German legal and administrative terminology for South Tyrol.

The CATEx (Computer Assisted Terminology Extraction) project emerged from the need to support and improve, both qualitatively and quantitatively, the manual acquisition of terminological data.
