Scientific report: "Discovering Sociolinguistic Associations with Structured Sparsity"

Jacob Eisenstein, Noah A. Smith, Eric P. Xing
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
{jacobeis, nasmith, epxing}@

Abstract

We present a method to discover robust and interpretable sociolinguistic associations from raw geotagged text data. Using aggregate demographic statistics about the authors' geographic communities, we solve a multi-output regression problem between demographics and lexical frequencies. By imposing a composite ℓ1,∞ regularizer, we obtain structured sparsity, driving entire rows of coefficients to zero. We perform two regression studies. First, we use term frequencies to predict demographic attributes; our method identifies a compact set of words that are strongly associated with author demographics. Next, we conjoin demographic attributes into features, which we use to predict term frequencies. The composite regularizer identifies a small number of features, which correspond to communities of authors united by shared demographic and linguistic properties.

1 Introduction

How is language influenced by the speaker's sociocultural identity? Quantitative sociolinguistics usually addresses this question through carefully crafted studies that correlate individual demographic attributes and linguistic variables, for example, the interaction between income and the "dropped r" feature of the New York accent (Labov, 1966). But such studies require the knowledge to select the "dropped r" and the speaker's income from thousands of other possibilities. In this paper, we present a method to acquire such patterns from raw data.
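The abstract's central idea, a regularizer that drives entire rows of the coefficient matrix to zero, can be illustrated with a small sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the data are synthetic, the names `group_soft_threshold` and `fit_row_sparse` are hypothetical, and the penalty is the closely related row-wise ℓ2 (group lasso) norm, swapped in because its proximal step is a simple closed-form shrinkage. The solver is plain proximal gradient descent.

```python
import math
import random


def group_soft_threshold(row, threshold):
    """Shrink a coefficient row; return all zeros if its L2 norm <= threshold."""
    norm = math.sqrt(sum(v * v for v in row))
    if norm <= threshold:
        return [0.0] * len(row)
    scale = 1.0 - threshold / norm
    return [scale * v for v in row]


def fit_row_sparse(X, Y, lam, step=0.5, iters=200):
    """Proximal gradient descent for
    (1 / 2n) * ||Y - X B||_F^2 + lam * sum_j ||B[j]||_2,
    where B[j] is the coefficient row of input feature j.
    The default step assumes roughly unit-variance, weakly correlated features."""
    n, p, k = len(X), len(X[0]), len(Y[0])
    B = [[0.0] * k for _ in range(p)]
    for _ in range(iters):
        # Residuals of the current fit, one row per sample.
        R = [[sum(X[i][j] * B[j][t] for j in range(p)) - Y[i][t]
              for t in range(k)] for i in range(n)]
        new_B = []
        for j in range(p):
            # Gradient of the squared-error term with respect to row j.
            grad = [sum(X[i][j] * R[i][t] for i in range(n)) / n
                    for t in range(k)]
            moved = [B[j][t] - step * grad[t] for t in range(k)]
            # The proximal step zeroes the whole row or shrinks it jointly.
            new_B.append(group_soft_threshold(moved, step * lam))
        B = new_B
    return B


# Synthetic data (assumed): 6 input features, 3 outputs; only features 0 and 1 matter.
random.seed(0)
n, p = 200, 6
B_true = [[2.0, -1.0, 0.5], [-1.5, 1.0, 1.0]] + [[0.0] * 3 for _ in range(4)]
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
Y = [[sum(x[j] * B_true[j][t] for j in range(p)) + random.gauss(0.0, 0.1)
      for t in range(3)] for x in X]

B_hat = fit_row_sparse(X, Y, lam=0.2)
for j, row in enumerate(B_hat):
    print(j, [round(v, 3) for v in row])
```

In this toy setting the rows for the four irrelevant features should come out as exact zeros, while the two relevant rows survive in slightly shrunken form. Because the penalty acts on whole rows, a feature is kept or discarded for all outputs at once, which is the sense in which the sparsity is structured.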
Using multi-output regression with structured sparsity, our method identifies a small subset of lexical items that are most influenced by demographics, and discovers conjunctions of demographic attributes that are especially salient for lexical variation. Sociolinguistic associations are difficult to model because the space of potentially relevant
