Scientific report: "Typed Graph Models for Semi-Supervised Learning of Name Ethnicity"

Delip Rao, Dept. of Computer Science, Johns Hopkins University, delip@
David Yarowsky, Dept. of Computer Science, Johns Hopkins University, yarowsky@

Abstract

This paper presents an original approach to semi-supervised learning of personal name ethnicity from typed graphs of morphophonemic features and first/last-name co-occurrence statistics. We frame this as a general solution to an inference problem over typed graphs, where the edges represent labeled relations between features that are parameterized by the edge types. We propose a framework for parameter estimation on different constructions of typed graphs for this problem, using a gradient-free optimization method based on grid search. Results on both in-domain and out-of-domain data show significant gains, with accuracy improvements of over 30% from the techniques presented in the paper.

1 Introduction

In the highly relational world of NLP, graphs are a natural way to represent relations and constraints among entities of interest. Even problems that are not obviously graph-based can be effectively and productively encoded as a graph. Such an encoding will often comprise nodes, edges that represent the relations, weights on the edges that could be a metric or a probability-based value, and type information for the nodes and edges. Typed graphs are a frequently used formalism in natural language problems, including dependency parsing (McDonald et al., 2005), entity disambiguation (Minkov and Cohen, 2007), and social networks, to mention just a few.

In this paper we consider the problem of identifying a personal attribute such as ethnicity from only an observed first-name/last-name pair. This has important consequences in targeted advertising and personalization in social networks, and in gathering intelligence for business and government research. We propose a parametrized typed graph framework for this problem and perform the hidden attribute
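The general idea stated in the abstract (edge weights parameterized by edge type, with those parameters chosen by grid search) can be illustrated with a minimal sketch. This is not the authors' implementation: the excerpt does not specify the inference algorithm, so the sketch assumes simple clamped label propagation, and the toy names, the edge types `ngram` and `cooccur`, the candidate weight grid, and all function names are hypothetical.

```python
from itertools import product

def propagate(edges, type_weights, seeds, labels, iterations=20):
    """Clamped label propagation on an undirected typed graph (assumed method).

    edges: list of (node_u, node_v, edge_type)
    type_weights: dict mapping edge_type -> non-negative weight
    seeds: dict mapping node -> known label (kept fixed during propagation)
    """
    # Each edge gets the weight of its type: one parameter per edge type.
    neighbors = {}
    for u, v, etype in edges:
        w = type_weights[etype]
        neighbors.setdefault(u, []).append((v, w))
        neighbors.setdefault(v, []).append((u, w))

    # Seeds start one-hot, every other node starts uniform.
    scores = {}
    for node in neighbors:
        if node in seeds:
            scores[node] = {lab: float(lab == seeds[node]) for lab in labels}
        else:
            scores[node] = {lab: 1.0 / len(labels) for lab in labels}

    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            total = sum(w for _, w in nbrs)
            if node in seeds or total == 0:
                updated[node] = scores[node]  # clamped seed or isolated node
                continue
            # Weighted average of the neighbors' label distributions.
            updated[node] = {
                lab: sum(w * scores[v][lab] for v, w in nbrs) / total
                for lab in labels
            }
        scores = updated
    return scores

def predict(scores, node):
    """Return the highest-scoring label for a node."""
    return max(scores[node], key=scores[node].get)

def grid_search(edges, seeds, dev, labels, grid):
    """Gradient-free search: try every weight combination, keep the best."""
    edge_types = sorted({etype for _, _, etype in edges})
    best_weights, best_acc = None, -1.0
    for combo in product(grid, repeat=len(edge_types)):
        weights = dict(zip(edge_types, combo))
        scores = propagate(edges, weights, seeds, labels)
        acc = sum(predict(scores, n) == y for n, y in dev.items()) / len(dev)
        if acc > best_acc:
            best_weights, best_acc = weights, acc
    return best_weights, best_acc

# Toy data, invented for illustration only.
LABELS = ["japanese", "italian"]
EDGES = [
    ("tanaka", "suffix:ka", "ngram"), ("hidaka", "suffix:ka", "ngram"),
    ("rossi", "suffix:si", "ngram"), ("grassi", "suffix:si", "ngram"),
    ("yuki", "tanaka", "cooccur"), ("yuki", "hidaka", "cooccur"),
    ("marco", "rossi", "cooccur"), ("marco", "grassi", "cooccur"),
    ("marco", "hidaka", "cooccur"),  # a deliberately noisy co-occurrence
]
SEEDS = {"tanaka": "japanese", "rossi": "italian"}  # labeled names
DEV = {"hidaka": "japanese", "grassi": "italian"}   # held-out names

best_weights, best_acc = grid_search(EDGES, SEEDS, DEV, LABELS, [0.0, 0.5, 1.0])
print(best_weights, best_acc)
```

Exhaustive search is practical here only because there are few edge types: the number of combinations grows as the grid size raised to the number of edge types, so a graph with many types would call for a coarser grid or a different search strategy.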
