TopCrowd – Efficient Crowd-enabled Top-k Retrieval on Incomplete Data

Title: TopCrowd – Efficient Crowd-enabled Top-k Retrieval on Incomplete Data
Publication Type: Conference Paper
Year of Publication: 2014
Authors: Nieke, C., U. Güntzer, and W.-T. Balke
Refereed Designation: Does Not Apply
Conference Name: 33rd Int. Conf. on Conceptual Modeling (ER)
Date Published: 10/2014
Conference Location: Atlanta, GA, USA

Abstract. Building databases and information systems over data extracted from heterogeneous sources like the Web poses a severe challenge: most data is incomplete and thus difficult to process in structured queries. This is especially true for sophisticated query techniques like Top-k querying where rankings are aggregated over several sources. The intelligent combination of efficient data processing algorithms with crowdsourced database operators promises to alleviate the situation. Yet the scalability of such combined processing is doubtful. We present TopCrowd, a novel crowd-enabled Top-k query processing algorithm that works effectively on incomplete data, while tightly controlling query processing costs in terms of response time and money spent for crowdsourcing. TopCrowd features probabilistic pruning rules for drastically reduced numbers of crowd accesses (up to 95%), while effectively balancing querying costs and result correctness. Extensive experiments show the benefit of our technique.

2014_ER_TopCrowd.pdf (563.87 KB)