Typed Graph Models for Semi-Supervised Learning of Name Ethnicity

Delip Rao, Dept. of Computer Science, Johns Hopkins University, delip@cs.jhu.edu
David Yarowsky, Dept. of Computer Science, Johns Hopkins University, yarowsky@cs.jhu.edu

Abstract

This paper presents an original approach to semi-supervised learning of personal name ethnicity from typed graphs of morphophonemic features and first-name/last-name co-occurrence statistics. We frame this as a general solution to an inference problem over typed graphs, where the edges represent labeled relations between features and these relations are parameterized by the edge types. We propose a framework for parameter estimation on different constructions of typed graphs for this problem, using a gradient-free optimization method based on grid search. Results on both in-domain and out-of-domain data show significant gains, with accuracy improvements of over 30%, using the techniques presented in the paper.

1 Introduction

In the highly relational world of NLP, graphs are a natural way to represent relations and constraints among entities of interest. Even problems that are not obviously graph-based can be effectively and productively encoded as a graph.
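To make the abstract's description more concrete, the Python below loosely sketches inference over a typed graph in which every edge's weight is tied to its edge type, with the per-type weights chosen by exhaustive grid search against held-out labels. This is an illustration under stated assumptions, not the authors' model: the node names, the two edge types (`cooccur` and `suffix`), the seed and development labels, the helpers `propagate`, `accuracy` and `grid_search`, and the weight grid are all hypothetical, and plain iterative label propagation stands in for whatever inference procedure the paper actually uses.

```python
import itertools
from collections import defaultdict

LABELS = ("italian", "japanese")

# Hypothetical toy typed graph: each edge is (node, node, edge_type).
# "cooccur" links a first name and a last name observed together;
# "suffix" links a last name to a shared character-suffix feature node.
EDGES = [
    ("first:giovanni", "last:rossi", "cooccur"),
    ("first:giovanni", "last:bianchi", "cooccur"),
    ("first:hiroshi", "last:tanaka", "cooccur"),
    ("first:hiroshi", "last:suzuki", "cooccur"),
    ("last:rossi", "suf:i", "suffix"),
    ("last:bianchi", "suf:i", "suffix"),
    ("last:suzuki", "suf:i", "suffix"),  # shared suffix that misleads
]
SEEDS = {"last:rossi": "italian", "last:tanaka": "japanese"}  # labeled nodes
DEV = {"last:bianchi": "italian", "last:suzuki": "japanese"}  # held-out labels


def propagate(edges, seeds, weights, iterations=50):
    """Iterative label propagation; an edge's weight is looked up by its type."""
    adj = defaultdict(list)
    for a, b, etype in edges:
        adj[a].append((b, weights[etype]))
        adj[b].append((a, weights[etype]))
    # Seeds hold fixed one-hot distributions; every other node starts at zero.
    dist = {n: {lab: 0.0 for lab in LABELS} for n in adj}
    for n, lab in seeds.items():
        dist[n][lab] = 1.0
    for _ in range(iterations):
        new = {}
        for n, nbrs in adj.items():
            if n in seeds:
                new[n] = dist[n]
                continue
            # Weighted sum of neighbor distributions, then normalize.
            scores = {lab: sum(w * dist[m][lab] for m, w in nbrs) for lab in LABELS}
            total = sum(scores.values())
            new[n] = {lab: s / total for lab, s in scores.items()} if total > 0 else scores
        dist = new
    return {n: max(LABELS, key=lambda lab: d[lab]) for n, d in dist.items()}


def accuracy(weights):
    """Fraction of held-out (DEV) nodes whose propagated label is correct."""
    pred = propagate(EDGES, SEEDS, weights)
    return sum(pred[n] == gold for n, gold in DEV.items()) / len(DEV)


def grid_search(grid=(0.5, 1.0, 2.0)):
    """Gradient-free estimation: score every combination of per-type weights."""
    types = sorted({etype for _, _, etype in EDGES})
    best_w, best_acc = None, -1.0
    for combo in itertools.product(grid, repeat=len(types)):
        w = dict(zip(types, combo))
        acc = accuracy(w)
        if acc > best_acc:  # strict: keep the first best combination found
            best_w, best_acc = w, acc
    return best_w, best_acc


if __name__ == "__main__":
    print(grid_search())
```

On this toy graph, `last:suzuki` is labeled correctly only when co-occurrence edges outweigh suffix edges; otherwise the shared `suf:i` node pulls it toward the Italian label. The search therefore settles on `{'cooccur': 1.0, 'suffix': 0.5}`. Grid search is used here simply because the accuracy objective is piecewise constant in the weights, so it offers no useful gradient.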
Such an encoding will often consist of nodes, edges that represent the relations, weights on the edges that could be a metric or a probability-based value, and type information for the nodes and edges. Typed graphs are a frequently used formalism in natural language problems, including dependency parsing (McDonald et al., 2005), entity disambiguation (Minkov and Cohen, 2007), and social networks, to mention just a few. In this paper, we consider the problem of identifying a personal attribute such as ethnicity from only an observed first-name/last-name pair. This has important applications in targeted advertising and personalization in social networks, as well as in gathering intelligence for business and government research. We propose a parametrized typed graph framework for this problem and perform the hidden attribute