Scientific report: "Discovering Sociolinguistic Associations with Structured Sparsity"

Jacob Eisenstein, Noah A. Smith, Eric P. Xing
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
{jacobeis,nasmith,epxing}@

Abstract

We present a method to discover robust and interpretable sociolinguistic associations from raw geotagged text data. Using aggregate demographic statistics about the authors' geographic communities, we solve a multi-output regression problem between demographics and lexical frequencies. By imposing a composite ℓ1,∞ regularizer, we obtain structured sparsity, driving entire rows of coefficients to zero. We perform two regression studies. First, we use term frequencies to predict demographic attributes; our method identifies a compact set of words that are strongly associated with author demographics. Next, we conjoin demographic attributes into features, which we use to predict term frequencies. The composite regularizer identifies a small number of features, which correspond to communities of authors united by shared demographic and linguistic properties.

1 Introduction

How is language influenced by the speaker's sociocultural identity? Quantitative sociolinguistics usually addresses this question through carefully crafted studies that correlate individual demographic attributes and linguistic variables, for example, the interaction between income and the "dropped r" feature of the New York accent (Labov, 1966). But such studies require the knowledge to select the "dropped r" and the speaker's income from thousands of other possibilities. In this paper, we present a method to acquire such patterns from raw data. Using multi-output regression with structured sparsity, our method identifies a small subset of lexical items that are most influenced by demographics, and discovers conjunctions of demographic attributes that are especially salient for lexical variation.

Sociolinguistic associations are difficult to model because the space of potentially relevant
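As a rough illustration of the structured sparsity described in the abstract, the sketch below fits a multi-output regression with a row-wise ℓ1,∞ penalty (the sum over rows of each row's largest absolute coefficient) and shows whole rows of the coefficient matrix being driven exactly to zero. It is not the authors' implementation: the solver (plain proximal gradient descent), the synthetic data, the value of lam, and all function names are assumptions made for illustration only.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of v onto the l1 ball of the given radius (> 0)."""
    abs_v = np.abs(v)
    if abs_v.sum() <= radius:
        return v.copy()
    u = np.sort(abs_v)[::-1]                 # magnitudes, largest first
    css = np.cumsum(u)
    k = np.arange(1, u.size + 1)
    rho = np.nonzero(u * k > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1)  # soft-threshold level
    return np.sign(v) * np.maximum(abs_v - theta, 0.0)

def prox_linf(v, lam):
    """Proximal operator of lam * ||v||_inf, via Moreau decomposition.

    If ||v||_1 <= lam the result is exactly zero, which is what removes
    an entire row of coefficients at once.
    """
    return v - project_l1_ball(v, lam)

def fit_row_sparse(X, Y, lam, n_iter=500):
    """Minimize 0.5 * ||Y - X B||_F^2 + lam * sum_r ||B[r, :]||_inf."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
    B = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        G = B - step * X.T @ (X @ B - Y)     # gradient step on the squared loss
        B = np.vstack([prox_linf(row, step * lam) for row in G])
    return B

# Hypothetical synthetic data: 10 input features, 3 outputs,
# and only the first two features actually matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
B_true = np.zeros((10, 3))
B_true[0] = [3.0, -3.0, 3.0]
B_true[1] = [-3.0, 3.0, 3.0]
Y = X @ B_true + 0.1 * rng.standard_normal((200, 3))

B_hat = fit_row_sparse(X, Y, lam=30.0)
active_rows = np.flatnonzero(np.abs(B_hat).max(axis=1) > 0)
print("rows kept:", active_rows)
```

In the first study described above, the rows would correspond to words and the columns to demographic attributes, so a zeroed row means a word is dropped for every attribute simultaneously. A plain ℓ1 penalty, by contrast, zeroes individual coefficients and does not favor removing a whole row.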
