Unsupervised Learning of Field Segmentation Models for Information Extraction

Trond Grenager
Computer Science Department, Stanford University, Stanford, CA 94305
grenager@cs.

Dan Klein
Computer Science Division, Berkeley, CA 94709
klein@

Christopher D. Manning
Computer Science Department, Stanford University, Stanford, CA 94305
manning@

Abstract

The applicability of many current information extraction techniques is severely limited by the need for supervised training data. We demonstrate that for certain field structured extraction tasks, such as classified advertisements and bibliographic citations, small amounts of prior knowledge can be used to learn effective models in a primarily unsupervised fashion. Although hidden Markov models (HMMs) provide a suitable generative model for field structured text, general unsupervised HMM learning fails to learn useful structure in either of our domains. However, one can dramatically improve the quality of the learned structure by exploiting simple prior knowledge of the desired solutions.
In both domains, we found that unsupervised methods can attain accuracies with 400 unlabeled examples comparable to those attained by supervised methods on 50 labeled examples, and that semi-supervised methods can make good use of small amounts of labeled data.

1 Introduction

Information extraction is potentially one of the most useful applications enabled by current natural language processing technology. However, unlike general tools like parsers or taggers, which generalize reasonably beyond their training domains, extraction systems must be entirely retrained for each application. As an example, consider the task of turning a set of diverse classified advertisements into a queryable database: each type of ad would require tailored training data for a supervised system. Approaches which required little or no training data would therefore provide substantial resource savings and extend the practicality
