Age Prediction in Blogs: A Study of Style, Content, and Online Behavior in Pre- and Post-Social Media Generations

Sara Rosenthal
Department of Computer Science, Columbia University, New York, NY 10027, USA
sara@cs.columbia.edu

Kathleen McKeown
Department of Computer Science, Columbia University, New York, NY 10027, USA
kathy@cs.columbia.edu

Abstract

We investigate whether wording, stylistic choices, and online behavior can be used to predict the age category of blog authors. Our hypothesis is that significant changes in writing style distinguish pre-social media bloggers from post-social media bloggers. Through experimentation with a range of years, we found that the birth dates of students in college at the time when social media such as AIM, SMS text messaging, MySpace, and Facebook first became popular enable accurate age prediction. We also show that internet writing characteristics are important features for age prediction, but that lexical content is also needed to produce significantly more accurate results. Our best results allow for 81.57% accuracy.

1 Introduction

The evolution of the internet has changed the way that people communicate. The introduction of instant messaging, forums, social networking, and blogs has made it possible for people of every age to become authors. The users of these social media platforms have created their own form of unstructured writing that is best characterized as informal.
Even how people communicate has dramatically changed, with multitasking increasing and responses generated immediately. We should be able to exploit those differences to automatically determine from blog posts whether an author is part of a pre- or post-social media generation. This problem is called age prediction and raises two main questions: Is there a point in time that proves to be a significantly better dividing line between pre- and post-social media generations? What features of communication most directly reveal the generation in which a blogger was born? We .