Đang chuẩn bị nút TẢI XUỐNG, xin hãy chờ
Tải xuống
Keyword Search in Databases- P27:Conceptually, a database can be viewed as a data graph GD(V ,E), where V represents a set of objects, and E represents a set of connections between objects. In this book, we concentrate on two kinds of databases, a relational database (RDB) and an XML database. In an RDB, an object is a tuple that consists of many attribute values where some attribute values are strings or full-text; there is a connection between two objects if there exists at least one reference from one to the other | 5.3. VARIATIONS OF KEYWORD SEARCH ON DATABASES 129 maybe empty . For example for the keyword query Q author number paper XML one of the possible CIs is C author.TID paper.TID paper.title contains XML a paper.TID F count w author.TID . A CI is trivial if one of the following is satisfied 1 C contains a attribute c a such that c functionally determines a or 2 C contains two attributes Ci and Cj that refer to the same attribute or Ci is a foreign key of Cj .The set of non-trivial CI s can be enumerated by using the full text index enabled in rdbms. After enumerating all non-trivial CIs for each CI C a F w it enumerates a set of Simple Query Networks SQN where each SQN is a connected subgraph of the schema graph that satisfies the following conditions Total - All tables in C are contained in the SQN. Minimal - It is not total if any node is removed from SQN. Node Clarity - Each node in SQN has at most one incoming edge. Suppose the cost of a SQN is the summation of all edge costs and node costs. For each CI it needs to get the SQN with the smallest cost which is a NP-Complete problem. A heuristic greedy algorithm is proposed in SQAK. For a CI C a F w it starts at the table o that contains the attribute a. For each of the other tables nodes v e C it finds the shortest path from v to o in a backtrack manner. If after adding the path from v to o the node clarity condition is violated it backtracks to find the next shortest path from v to o until all nodes in C are successfully added. It then outputs the current result to be a good SQN for the CI. After finding the SQN for each CI it can get the top-k SQNs with the smallest cost. And each of the top-k SQNs is translated into an SQL to be output. 5.3.5 SMALL DATABASE AS RESULT Précis Koutrika et al. 2006 Simitsis et al. 2008 returns a small database that contains only the tuples relevant to a given keyword query Q. The schema of a relational database D is modeled as a weighted graph Gs V E where each relation is modeled as