Resolving Personal Names in Email Using Context Expansion

Tamer Elsayed, Douglas W. Oard, and Galileo Namata
Human Language Technology Center of Excellence and UMIACS Laboratory for Computational Linguistics and Information Processing (CLIP)
University of Maryland, College Park, MD 20742
{telsayed, oard, gnamata}@umd.edu

Abstract

This paper describes a computational approach to resolving the true referent of a named mention of a person in the body of an email. A generative model of mention generation is used to guide mention resolution. Results on three relatively small collections indicate that the accuracy of this approach compares favorably to the best known techniques, and results on the full CMU Enron collection indicate that it scales well to larger collections.

1 Introduction

The increasing prevalence of informal text from which a dialog structure can be reconstructed (e.g., email or instant messaging) raises new challenges if we are to help users make sense of this cacophony. Large collections offer greater scope for assembling evidence to help with that task, but they pose additional challenges as well. With well over 100,000 unique email addresses in the CMU version of the Enron collection (Klimt and Yang, 2004), common names (e.g., John) might easily refer to any one of several hundred people. In this paper, we associate named mentions in unstructured text (i.e., the body of an email and/or the subject line) to modeled identities.
We see at least two direct applications for this work: (1) helping searchers who are unfamiliar with the contents of an email collection (e.g., historians or lawyers) better understand the context of emails that they find, and (2) augmenting more typical social networks (based on senders and recipients) with additional links based on references found in unstructured text.

Most approaches to resolving identity can be decomposed into four sub-problems: (1) finding a reference that requires resolution, (2) identifying candidates, (3) assembling evidence, and (4) choosing .
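
The four sub-problems can be pictured as a small pipeline. The Python sketch below is only an illustration under stated assumptions, not the generative model this paper develops: the `Identity` class, all function names, the example addresses, and the toy evidence score (whether a candidate takes part in the email) are hypothetical. The description of step 4 is cut off in the text above, so the rule used here (pick the highest-scoring candidate) is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    """Hypothetical modeled identity: an address plus names observed for it."""
    address: str
    names: frozenset


def find_mentions(body, known_names):
    """Step 1: find tokens that look like references requiring resolution."""
    tokens = [t.strip(".,;:!?()\"'") for t in body.split()]
    return [t for t in tokens if t.lower() in known_names]


def identify_candidates(mention, identities):
    """Step 2: every identity that has been observed with this name."""
    return [i for i in identities if mention.lower() in i.names]


def assemble_evidence(candidate, participants):
    """Step 3 (toy evidence): 1.0 if the candidate takes part in the email."""
    return 1.0 if candidate.address in participants else 0.0


def choose(candidates, participants):
    """Step 4 (assumed rule): the highest-scoring candidate, or None."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: assemble_evidence(c, participants))


def resolve(body, participants, identities):
    """Run the four steps for every mention found in one email body."""
    known_names = {n for i in identities for n in i.names}
    return {
        m: choose(identify_candidates(m, identities), participants)
        for m in find_mentions(body, known_names)
    }


# Two people named John; only one of them is a participant in this email.
identities = [
    Identity("john.a@example.com", frozenset({"john", "adams"})),
    Identity("john.b@example.com", frozenset({"john", "baker"})),
]
participants = {"sue@example.com", "john.b@example.com"}
resolved = resolve("Please ask John about the contract.", participants, identities)
print(resolved["John"].address)  # john.b@example.com
```

A realistic system would replace the single participation check with much richer evidence, but the division into four functions mirrors the decomposition described above.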