Should we Translate the Documents or the Queries in Cross-language Information Retrieval?

J. Scott McCarley
IBM T.J. Watson Research Center
P.O. Box 218
Yorktown Heights, NY 10598
jsmc@watson.ibm.com

Abstract

Previous comparisons of document and query translation suffered difficulty due to differing quality of machine translation in these two opposite directions. We avoid this difficulty by training identical statistical translation models for both translation directions using the same training data. We investigate information retrieval between English and French, incorporating both translation directions into both document translation and query translation-based information retrieval, as well as into hybrid systems. We find that hybrids of document and query translation-based systems outperform query translation systems, even human-quality query translation systems.

1 Introduction

Should we translate the documents or the queries in cross-language information retrieval? The question is more subtle than the implied two alternatives. The need for translation has itself been questioned: although non-translation based methods of cross-language information retrieval (CLIR), such as cognate-matching (Buckley et al., 1998) and cross-language Latent Semantic Indexing (Dumais et al., 1997), have been developed, the most common approaches have involved coupling information retrieval (IR) with machine translation (MT).
For convenience, we refer to dictionary-lookup techniques and interlingua (Diekema et al., 1999) as "translation" even if these techniques make no attempt to produce coherent or sensibly-ordered language; this distinction is important in other areas, but a stream of words is adequate for IR. Translating the documents into the query's language(s) and translating the queries into the document's language(s) represent two extreme approaches to coupling MT and IR. These two approaches are neither equivalent nor mutually exclusive. They are not equivalent because machine translation is not an .