TAILIEUCHUNG - Scientific report: "Extracting Paraphrases of Technical Terms from Noisy Parallel Software Corpora"

Xiaoyin Wang (1,2), David Lo (1), Jing Jiang (1), Lu Zhang (2), Hong Mei (2)
(1) School of Information Systems, Singapore Management University, Singapore 178902
xywang davidlo jingjiang @
(2) Key Laboratory of High Confidence Software Technologies, Peking University, Ministry of Education, Beijing 100871, China
zhanglu meih @

Abstract

In this paper, we study the problem of extracting technical paraphrases from a parallel software corpus, namely, a collection of duplicate bug reports. Paraphrase acquisition is a fundamental task in the emerging area of text mining for software engineering. Existing paraphrase extraction methods are not entirely suitable here due to the noisy nature of bug reports. We propose a number of techniques to address the noisy data problem. The empirical evaluation shows that our method significantly improves an existing method by up to 58%.

1 Introduction

Using natural language processing (NLP) techniques to mine software corpora such as code comments and bug reports to assist software engineering (SE) is an emerging and promising research direction (Wang et al., 2008; Tan et al., 2007). Paraphrase extraction is one of the fundamental problems that have not been addressed in this area. It has many applications, including software ontology construction and query expansion for retrieving relevant technical documents.

In this paper, we study automatic paraphrase extraction from a large collection of software bug reports. Most large software projects have bug tracking systems (e.g., Bugzilla) to help global users describe and report the bugs they encounter when using the software. However, since the same bug may be seen by many users, many duplicate bug reports are sent to bug tracking systems. The duplicate bug reports are manually tagged and associated with the original bug report by either the system manager or software developers. These families of duplicate bug reports form a .
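
As a rough illustration of how manually tagged duplicates could be collected into families, the sketch below groups report IDs using a union-find structure. It is not the authors' procedure: the `(duplicate_id, original_id)` pair format, the function name `group_duplicate_families`, and the sample IDs are all hypothetical and do not reflect any actual Bugzilla export format.

```python
from collections import defaultdict


def group_duplicate_families(duplicate_links):
    """Group bug report IDs into families.

    duplicate_links: iterable of (duplicate_id, original_id) pairs
    (an assumed input format, chosen only for this illustration).
    Returns a sorted list of sorted ID lists, one list per family.
    """
    parent = {}

    def find(report_id):
        # Register unseen reports as their own root.
        parent.setdefault(report_id, report_id)
        # Walk up to the root, shortening the path along the way.
        while parent[report_id] != report_id:
            parent[report_id] = parent[parent[report_id]]
            report_id = parent[report_id]
        return report_id

    for duplicate_id, original_id in duplicate_links:
        root_dup = find(duplicate_id)
        root_orig = find(original_id)
        if root_dup != root_orig:
            # Merge the duplicate's family into the original's family.
            parent[root_dup] = root_orig

    families = defaultdict(set)
    for report_id in list(parent):
        families[find(report_id)].add(report_id)
    return sorted(sorted(members) for members in families.values())


# Hypothetical links: 104 is tagged as a duplicate of 103,
# which is itself a duplicate of 101.
links = [(102, 101), (103, 101), (205, 204), (104, 103)]
families = group_duplicate_families(links)
```

Union-find is used here so that chains of duplicate tags (a duplicate of a duplicate) end up in the same family; a plain dictionary keyed by original ID would split such chains.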
