Title | Efficient Combination of Ranked Result Sets in Multi-Feature Applications |
Publication Type | Thesis |
Year of Publication | 2001 |
Authors | Balke, W. - T. |
Academic Department | Institut für Informatik |
Degree | Ph.D. |
University | Universität Augsburg |
City | Augsburg, Germany |
Abstract | Applications like multimedia databases or enterprise-wide information management systems have to meet the challenge of efficiently retrieving best matching objects from vast collections of data. For instance in image retrieval queries can be based on the similarity of objects, using several feature attributes like shape, texture, color or text. Such multi-feature queries return a ranked result set instead of exact matches. Besides, the user wants to see only the k top-ranked objects. In the recent years combining algorithms have been proposed to cope with this essentially different retrieval model. Generally speaking, we distinguish three environments for the combination of ranked results. In homogeneous environments the various features are used on a set of objects that can be identified by a common key. The quasi-homogeneous environment uses features on different collections of data that share some common, standardized attributes. The last and rather rare case are heterogeneous environments, where objects from different collections have to be compared using a complex function. We present a new combining algorithm called Quick-Combine for combining multi-feature result lists in (quasi-) homogeneous environments, guaranteeing the correct retrieval of the k top-ranked results. For score aggregation virtually any combining function can be used, including weighted queries. Compared to common algorithms we have developed an improved termination condition in tuned combination with a heuristic control flow adopting itself narrowly to the particular score distribution. Top-ranked results can be computed and output incrementally. We show that we can dramatically improve performance, in particular for non-uniform score distributions. Benchmarks on practical data indicate efficiency gains by a factor of 30. For very skewed data observed speed-up factors are even larger. These performance results scale through different database sizes and numbers of result sets to combine. Also for heterogeneous environments we present an innovative algorithm called Stream-Combine for processing multi-feature queries on heterogeneous data sources. This algorithm can guarantee the correct retrieval of the k top-ranked results without |
Attachment | Size |
---|---|
balke01.pdf | 1.04 MB |