The Semantic GrowBag Algorithm: Automatically Deriving Categorization Systems

Title: The Semantic GrowBag Algorithm: Automatically Deriving Categorization Systems
Publication Type: Conference Paper
Year of Publication: 2007
Authors: Diederich, J., and W.-T. Balke
Conference Name: 11th European Conference on Research and Advanced Technology for Digital Libraries (ECDL)
Conference Location: Budapest, Hungary

Keyword search in digital libraries often returns result sets that are far too large. Based on the metadata associated with the objects, the faceted search paradigm allows users to structure and filter the result set, for example, using a publication type facet to show only books or videos. These facets usually focus on clear-cut characteristics of digital items; however, it is very difficult to also organize the actual semantic content information into such a facet. The Semantic GrowBag approach presented in this paper uses the keywords provided by many authors of digital objects to automatically create lightweight topic categorization systems as a basis for a meaningful and dynamically adaptable topic facet. Using such emergent semantics offers an alternative way to filter large result sets according to the objects’ content without the need to manually classify all objects with respect to a pre-specified vocabulary. We present the details of our algorithm using the DBLP collection of computer science documents and show some experimental evidence about the quality of the achieved results.
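
As a rough illustration of the two ideas in the abstract, the sketch below filters a result set by a metadata facet and counts how often author-supplied keywords appear together. It is not the GrowBag algorithm from the paper: plain pairwise co-occurrence counting is swapped in as a simple stand-in, and the records, field names, and function names are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records; a real collection such as DBLP would supply
# the metadata and the author-provided keywords.
records = [
    {"title": "A", "type": "book", "keywords": {"databases", "query optimization"}},
    {"title": "B", "type": "article", "keywords": {"databases", "indexing"}},
    {"title": "C", "type": "article", "keywords": {"databases", "indexing", "query optimization"}},
    {"title": "D", "type": "video", "keywords": {"machine learning"}},
]


def filter_by_facet(items, facet, value):
    """Keep only the records whose metadata field `facet` equals `value`."""
    return [r for r in items if r[facet] == value]


def filter_by_topic(items, keyword):
    """Keep only the records that carry `keyword` (a naive topic facet)."""
    return [r for r in items if keyword in r["keywords"]]


def related_keywords(items, keyword):
    """Count how often each other keyword appears together with `keyword`.

    Assumption: simple first-order co-occurrence, used here only as a
    stand-in for the paper's actual derivation of topic relations.
    """
    scores = Counter()
    for r in items:
        # Sort so that each unordered pair is generated in a stable order.
        for a, b in combinations(sorted(r["keywords"]), 2):
            if a == keyword:
                scores[b] += 1
            elif b == keyword:
                scores[a] += 1
    return scores.most_common()


articles = filter_by_facet(records, "type", "article")
print([r["title"] for r in articles])
print(related_keywords(records, "databases"))
```

Keywords that frequently co-occur with a given keyword are natural candidates for neighboring entries in a topic facet; how such raw counts are turned into a categorization system is exactly what the paper addresses.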

ecdl07.pdf (240.17 KB)