TAILIEUCHUNG - Scientific report: "Joint Inference of Named Entity Recognition and Normalization for Tweets"

Xiaohua Liu, Ming Zhou, Furu Wei, Zhongyang Fu, Xiangyang Zhou

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
School of Computer Science and Technology, Shandong University, Jinan, 250100, China
Microsoft Research Asia, Beijing, 100190, China
{xiaoliu, fuwei, mingzhou}@..., v-xzho@...

Abstract

Tweets represent a critical source of fresh information, in which named entities occur frequently with rich variations. We study the problem of named entity normalization (NEN) for tweets. Two main challenges are the errors propagated from named entity recognition (NER) and the dearth of information in a single tweet. To address these challenges, we propose a novel graphical model that conducts NER and NEN simultaneously on multiple tweets. In particular, our model introduces a binary random variable for each pair of words with the same lemma across similar tweets; its value indicates whether the two words are mentions of the same entity. We evaluate our method on a manually annotated data set and show that it outperforms a baseline that handles the two tasks separately, boosting the F1 for NER and the accuracy for NEN.

1 Introduction

Tweets, short messages of less than 140 characters shared through the Twitter service, have become an important source of fresh information.
As a result, the task of named entity recognition (NER) for tweets, which aims to identify mentions of rigid designators from tweets belonging to named-entity types such as persons, organizations and locations (2007), has attracted increasing research interest. For example, Ritter et al. (2011) develop a system that exploits a CRF model to segment named entities and then uses a ...
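
The abstract describes attaching a binary random variable to each pair of words that share a lemma across similar tweets. The sketch below only enumerates such candidate pairs; it does not perform the joint inference itself. Everything specific in it is an assumption made for illustration and is not taken from the paper: lowercasing stands in for a real lemmatizer, tweet similarity is measured as cosine similarity over bag-of-lemma counts, and the threshold of 0.3 and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def lemma(word):
    # Stand-in for a real lemmatizer (assumption): lowercase only.
    return word.lower()


def cosine(tokens_a, tokens_b):
    # Cosine similarity between the bag-of-lemma count vectors of two tweets.
    a = Counter(lemma(w) for w in tokens_a)
    b = Counter(lemma(w) for w in tokens_b)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def candidate_pairs(tweets, threshold=0.3):
    # Return ((tweet_i, pos_i), (tweet_j, pos_j)) for every pair of words that
    # share a lemma and sit in two different tweets whose similarity reaches
    # the (hypothetical) threshold. Each pair is where a binary
    # "same entity?" variable would be attached.
    pairs = []
    for (i, ti), (j, tj) in combinations(enumerate(tweets), 2):
        if cosine(ti, tj) < threshold:
            continue
        for p, wi in enumerate(ti):
            for q, wj in enumerate(tj):
                if lemma(wi) == lemma(wj):
                    pairs.append(((i, p), (j, q)))
    return pairs


tweets = [
    "Gaga is performing tonight".split(),
    "cannot wait to see gaga tonight".split(),
    "the weather is lovely".split(),
]
pairs = candidate_pairs(tweets)
print(pairs)  # [((0, 0), (1, 4)), ((0, 3), (1, 5))]
```

Here the first two tweets are similar enough to be linked, so "Gaga"/"gaga" and "tonight"/"tonight" become candidate pairs, while the third tweet shares only "is" with the first and falls below the threshold. In a model of the kind the abstract describes, the value of each pair's variable would be inferred jointly with the NER labels rather than fixed by this enumeration.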
