Alternative access paths to literature beyond mere keyword or bibliographic
search are a major success factor in today’s digital libraries. Especially
in the sciences, users are in dire need of complex knowledge spaces and facettations
where entities such as chemical substances, genes, or mathematical formulae
may play a central role. However, even for clear-cut entities, the requirements
in terms of contextualized similarities or rankings may strongly differ. In
this paper, we show how deep learning techniques used on scientific corpora lead
to a strongly contextualized description of entities. As an application case, we take
pharmaceutical entities in the form of small molecules and demonstrate how their
learned contexts and profiles reflect their actual use as well as possible new uses,
e.g., for drug design or repurposing. As our evaluation shows, the results
are comparable to expensive, manually maintained classifications in the
field. Since our techniques rely only on deep embeddings of textual documents,
our methodology promises to be generalizable to other use cases, too.