Fine-Grained Class Label Markup of Search Queries

Joseph Reisinger
Department of Computer Sciences
The University of Texas at Austin
Austin, Texas 78712
joeraii@cs.utexas.edu

Marius Paşca
Google Inc.
1600 Amphitheatre Parkway
Mountain View, California 94043
mars@google.com

Contributions made during an internship at Google.

Abstract

We develop a novel approach to the semantic analysis of short text segments and demonstrate its utility on a large corpus of Web search queries. Extracting meaning from short text segments is difficult as there is little semantic redundancy between terms; hence methods based on shallow semantic analysis may fail to accurately estimate meaning. Furthermore, search queries lack the explicit syntax often used to determine intent in question answering. In this paper we propose a hybrid model of semantic analysis combining explicit class-label extraction with a latent class PCFG. This class-label correlation (CLC) model admits a robust parallel approximation, allowing it to scale to large amounts of query data. We demonstrate its performance in terms of (1) its predicted label accuracy on polysemous queries and (2) its ability to accurately chunk queries into base constituents.

1 Introduction

Search queries are generally short and rarely contain much explicit syntax, making query understanding a purely semantic endeavor. Furthermore, as in noun-phrase understanding, shallow lexical semantics is often irrelevant or misleading; e.g., the query "tropical breeze cleaners" has little to do with island vacations, nor are desert birds relevant to "1970 road runner", which refers to a car model.

This paper introduces class-label correlation (CLC), a novel unsupervised approach to extracting shallow semantic content that combines class-based semantic markup (e.g., "road runner" is a car model) with a latent variable model for capturing weakly compositional interactions between query constituents. Constituents are tagged with IsA class labels from a large, automatically extracted ...