Issues in Data Mining and Information Retrieval
ISSN: 2249-5789
Smitha Nayak et al, International Journal of Computer Science & Communication Networks, Vol 2(1), 93-98

Issues in Data Mining and Information Retrieval

Ammar Yassir and Smitha Nayak, alfayumi@ smithank@
Department of Computing, Muscat College, Sultanate of Oman

Abstract: Data mining, as we use the term, is the exploration and analysis of large quantities of data in order to discover meaningful patterns and rules. For the purposes of this book, we assume that the goal of data mining is to allow a corporation to improve its marketing, sales, and customer support operations through a better understanding of its customers. Keep in mind, however, that the data mining techniques and tools described here are equally applicable in fields ranging from law enforcement to radio astronomy, medicine, and industrial process control. In fact, hardly any of the data mining algorithms were first invented with commercial applications in mind. The commercial data miner employs a grab bag of techniques borrowed from statistics, computer science, and machine learning research. The choice of a particular combination of techniques to apply in a particular situation depends on the nature of the data mining task, the nature of the available data, and the skills and preferences of the data miner.

Data mining is largely concerned with building models. A model is simply an algorithm or set of rules that connects a collection of inputs (often in the form of fields in a corporate database) to a particular target or outcome.

• Association of events that can be correlated. A computer purchase, for example, is likely to involve the simultaneous purchase of a printer.
• Sequences as one event leads to another. Computer and printer purchase may be followed by the purchase of a scanner.
• Classification through the recognition of patterns. These can be based on any relevant data: income, sales, location, or even average summer rainfall! It all depends on how you see .
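
The association idea above (a computer purchase tends to come with a printer purchase) can be made concrete with two standard measures, support and confidence. The following is a minimal sketch, not the authors' method: the transaction data and the helper names `support` and `confidence` are hypothetical and chosen only for illustration.

```python
# Hypothetical toy transactions (assumption: not data from the paper).
transactions = [
    frozenset({"computer", "printer"}),
    frozenset({"computer", "printer", "scanner"}),
    frozenset({"computer"}),
    frozenset({"printer"}),
    frozenset({"computer", "printer"}),
]


def count_containing(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain `itemset`."""
    return count_containing(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Among transactions containing `antecedent`, the fraction that
    also contain `consequent`. Returns 0.0 if the antecedent never occurs."""
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    base = count_containing(antecedent, transactions)
    if base == 0:
        return 0.0
    return count_containing(both, transactions) / base


# Rule under test: {computer} -> {printer}
rule_support = support({"computer", "printer"}, transactions)
rule_confidence = confidence({"computer"}, {"printer"}, transactions)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

On this toy data, three of the five baskets contain both items and four contain a computer, so the rule has support 0.60 and confidence 0.75. A sequence rule such as "computer and printer, then scanner" would additionally need an ordering or timestamp on each purchase, which this sketch does not model.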