Đang chuẩn bị nút TẢI XUỐNG, xin hãy chờ
Tải xuống
Keyword Search in Databases- P13:Conceptually, a database can be viewed as a data graph GD(V ,E), where V represents a set of objects, and E represents a set of connections between objects. In this book, we concentrate on two kinds of databases, a relational database (RDB) and an XML database. In an RDB, an object is a tuple that consists of many attribute values where some attribute values are strings or full-text; there is a connection between two objects if there exists at least one reference from one to the other | 3.3. STEINER TREE-BASED KEYWORD SEARCH 59 T kl 62 T2 k2 k 4 Figure 3.5 Serialize The theorem also specifies the delay in terms of the running time of Q-subtree . Recall that n and m are the number of nodes and edges of Gd respectively l is the number of keywords and ni is the number of nodes in the i-th output tree. Note that there are at most 2m trees i.e. i 2m. Theorem3.5 Kimelfeld and Sagiv 2006b Consider a data graph Gd andaquery Q ki ki . EnumTreePD enumerates all the Q-SUBTREEs ofGd in the rank order ifQ-SUBTREE returns an optimal tree. If Q-SUBTREE returns a 0 -approximation of optimal tree then EnumTreePD enumerates in a 0 -approximate ranked order. If Q-subtree terminates in time t n m l then EnumTreePD outputs the i 1 -th answer with delay O ni t n m l log n i nf . The task of enumerating Q-subtrees is transformed into finding an optimal Q-subtree under a set of constraints which are specified as inclusion edges I and exclusion edges E. The constraints specified by exclusion edges can be handled easily by removing those edges in E from the data graph Gd . So in the following we only consider the inclusion edges I recall that it is the set of edges that each answer in the subspace should contain. A partial tree PT is any directed subtree of Gd . A set of PTs P is called a set of PT constraints if the PTs in P are pairwise node-disjoint. The set of leaves in the PTs of P is denoted as leaves P . Proposition 3.6 Kimelfeld and Sagiv 2006b The algorithm Q-SUBTREE can be executed efficiently so that for every generated set of inclusion edges I the subgraph of Gd induced by I forms a set of PT constrains P such that leaves P C ki kl and P has at most two PTs. Serialize function at line 7 of EnumTreePD is used to order the set of edges such that the newly generated inclusion edges satisfy the above proposition i.e. P 2. The general idea 60 3. GRAPH-BASED KEYWORD SEARCH Algorithm 20 SuperTree G T Input a data graph G and a set of PT constraints T. Output a minimum