Learning to Read Between the Lines using Bayesian Logic Programs

Sindhu Raghavan, Raymond J. Mooney, Hyeonseo Ku
Department of Computer Science
The University of Texas at Austin
1616 Guadalupe, Suite 2.408
Austin, TX 78701, USA
{sindhu,mooney,yorq}@cs.utexas.edu

Abstract

Most information extraction (IE) systems identify facts that are explicitly stated in text. However, in natural language, some facts are implicit, and identifying them requires "reading between the lines". Human readers naturally use common sense knowledge to infer such implicit information from the explicitly stated facts. We propose an approach that uses Bayesian Logic Programs (BLPs), a statistical relational model combining first-order logic and Bayesian networks, to infer additional implicit information from extracted facts. It involves learning uncertain commonsense knowledge (in the form of probabilistic first-order rules) from natural language text by mining a large corpus of automatically extracted facts. These rules are then used to derive additional facts from extracted information using BLP inference. Experimental evaluation on a benchmark data set for machine reading demonstrates the efficacy of our approach.

1 Introduction

The task of information extraction (IE) involves automatic extraction of typed entities and relations from unstructured text. IE systems (Cowie and Lehnert, 1996; Sarawagi, 2008) are trained to extract facts that are stated explicitly in text. However, some facts are implicit, and human readers naturally "read between the lines" and infer them from the stated facts using commonsense knowledge. Answering many queries can require inferring such implicitly stated facts.
Consider the text "Barack Obama is the president of the United States of America." Given the query "Barack Obama is a citizen of what country?", standard IE systems cannot identify the answer, since citizenship is not explicitly stated in the text. However, a human reader possesses the commonsense knowledge that the president of a .