Lecture "Multimedia Databases"

Credits: 4 or 5 (depending on examination rules)
Exam: Oral (30 minutes, English or German)
Regular Dates: 
Every Tuesday, 9:45 - 12:15, Starting October 22nd
IZ 161


This course examines how multimedia database systems are built and gives an insight into the techniques they use. It deals with content-based retrieval of multimedia data. The central issue is the efficient storage and subsequent retrieval of multimedia documents.

The general structure of the course is:

  • Basic characteristics of multimedia databases
  • Evaluation of retrieval effectiveness, Precision-Recall Analysis
  • Semantic content of image-content search
  • Image representation, low-level and high-level features
  • Texture features, random-field models
  • Audio formats, sampling, metadata
  • Thematic search within music tracks
  • Query formulation in music databases
  • Media representation for video
  • Frame/Shot Detection, Event Detection
  • Video segmentation and video summarization
  • Video Indexing, MPEG-7
  • Extraction of low- and high-level features
  • Integration of features and efficient similarity comparison
  • Indexing with inverted file indexes, GEMINI indexing, R*-trees



  Date Topic Slides Exercises Videos Comments/Literature
1 22.10.2019 Introduction Slides 1 Video 1 BR99 (P. 1–18), Sch05 (P. 1–15), Chr85
2 29.10.2019 Color-based Retrieval Slides 2 Exercise 1 Video 2
Due to technical problems, this week's video is from SS 2016.
Features Introduction: CB02 (P. 261–284), Sch05 (P. 67–91), Sch05 (P. 91–96)
Color features and color histograms: CB02 (P. 285–311)
Matching of color histograms: CB02 (P. 285–311), Sch05 (P. 170–174), Sch05 (P. 229–231), Sch05 (P. 175–179), Smi97, SB91, HCP95, SD96
3 05.11.2019 Texture-based Image Retrieval Slides 3 Exercise 2 Video 3
Texture Features: CB02 (P. 261–284), Sch05 (P. 67–91), Sch05 (P. 91–96)
Low-Level Texture Features: CB02 (P. 313–344), Jul62, JGSF73, Jul75, Jul81
Tamura Measure: CB02 (P. 313–344), TMY78, RT71, RTL72, EN94
Random Field Models: CB02 (P. 313–344), Sch05 (P. 111–146)
Transform Domain Features: CB02 (P. 313–344), Woo72, Bes74, MJ92
4 12.11.2019 Multi-resolution Analysis Slides 4 Exercise 3 Video 4
Multiresolution Analysis: CB02 (P. 285–311), Sch05 (P. 170–174), Sch05 (P. 229–231), Sch05 (P. 175–179), Smi97, SB91, HCP95, SD96
Form-based Features: CB02 (P. 345–372)
Thresholding: RC78, ZRL77, MM97
Edge Detection: BL79, KWT88
Morphological Operators: HSZ87
5 19.11.2019 Shape-based Features Slides 5 Exercise 4 Video 5
Chain Codes: Fre61a, Fre61b, BG78, CMVZ94
Area-based Retrieval: Bar81, Blu73, SK05
Moment Invariants: Woo96, Hu62
Query by Visual Example: HK92, Ege97
6 26.11.2019 Basics of Audio Retrieval Slides 6 Video 6
Introduction to Audio Retrieval: Nyq28
7 03.12.2019 Audio Retrieval 1 Slides 7 Exercise 5 Video 7
Audio Low-Level Features: LH98, WBKW96
Difference Limen: JWG77
Pitch Recognition: Fle34, Gre90, Gol73, Sch68, Nol69, KS00, GR69
8 10.12.2019 Audio Retrieval 2 Slides 8 Video 8
Query by Humming: GLCS95
Melody Representation: Par75, MS90, KNSYK00, BC94, ZS03
Hidden Markov Model: Rab89
9 17.12.2019 Video Retrieval Slides 9 Video 9 Vit67, BPSW70
10 07.01.2020 Shot Detection Slides 10 Exercise 6 Video 10 ZKS93, Ton91, IP96, TD98, MJC95, VL00
11 14.01.2020 Video Retrieval Slides 11   Video 11 CZ03
12 21.01.2020 Video Abstraction Slides 12 Video 12 SC02, PLE01, RBK98
13 28.01.2020 Indexes for Multimedia Data Slides 13 Video 13
Due to technical problems, this week's video is from WS 14/15.
CB02 (P. 373–434), Sch05 (P. 261–302), Gut84, SRF87, BKSS90, BKK96, CPZ97
14 04.02.2020 Indexes for Multimedia Data 2 Slides 14 Video 14 Sch05 (P. 302–308), WSB98, BGRS98




 [CB02] Vittorio Castelli and Lawrence D. Bergman, editors. Image Databases. Search and Retrieval of Digital Imagery. Wiley, 2002. [ .html ]

[BR99] Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley, 1999. [ http ]

[Par75] Denys Parsons. The Directory of Tunes and Musical Themes. Spencer Brown, 1975.

[Sch05] Ingo Schmitt. Ähnlichkeitssuche in Multimedia-Datenbanken. Retrieval, Suchalgorithmen und Anfragebehandlung. Oldenbourg, 2005. [ http ]

[vR79] Cornelis Joost van Rijsbergen. Information Retrieval. Butterworths, second edition, 1979. [ .html ]



Note: Many of the following documents can be downloaded free of charge through the university network. Those that are not free of charge can be obtained as printed versions from the university library.

[Bar81] Alan H. Barr. Superquadrics and angle-preserving transformations. IEEE Computer Graphics and Applications, 1(1):11-23, 1981. [ http ]

[Bes74] Julian Besag. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, Series B (Methodological), 36(2):192-236, 1974. [ http ]

[BC94] Donald J. Berndt and James Clifford. Using dynamic time warping to find patterns in time series. In Usama M. Fayyad and Ramasamy Uthurusamy, editors, Knowledge Discovery in Databases: Papers from the 1994 AAAI Workshop, pages 359-370. AAAI Press, 1994.

[BDF+97] Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis Ioannidis, Hosagrahar V. Jagadish, Theodore Johnson, Raymond Ng, Viswanath Poosala, Kenneth A. Ross, and Kenneth C. Sevcik. The New Jersey data reduction report. Bulletin of the Technical Committee on Data Engineering, 20(4):3-42, 1997. [ .pdf ]

[BG78] Ernesto Bribiesca and Adolfo Guzmán. Shape detection and shape similarity measurement for two-dimensional regions. In Proceedings of the 4th International Joint Conference on Pattern Recognition, pages 608-612, 1978.

[BGRS98] Kevin S. Beyer, Jonathan Goldstein, Raghu Ramakrishnan, and Uri Shaft. When is “nearest neighbor” meaningful? In Catriel Beeri and Peter Buneman, editors, Proceedings of the 7th International Conference on Database Theory (ICDT 1999), volume 1540 of Lecture Notes in Computer Science, pages 217-235. Springer, 1999. [ http ]

[BKSS90] Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger. The R*-tree: An efficient and robust access method for points and rectangles. In Hector Garcia-Molina and Hosagrahar V. Jagadish, editors, Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data (SIGMOD 1990), pages 322-331. ACM Press, 1990. [ DOI ]

[BKK96] Stefan Berchtold, Daniel A. Keim, and Hans-Peter Kriegel. The X-tree: An index structure for high-dimensional data. In T. M. Vijayaraman, Alejandro P. Buchmann, Chandrasekaran Mohan, and Nandlal L. Sarda, editors, Proceedings of 22nd International Conference on Very Large Data Bases (VLDB 1996), pages 28-39. Morgan Kaufmann, 1996. [ .html ]

[BL79] Serge Beucher and Christian Lantuejoul. Use of watersheds in contour detection. In Proceedings of the International Workshop on Image Processing, Real-Time Edge and Motion Detection/Estimation, 1979. [ .pdf ]

[Blu73] Harry Blum. Biological shape and visual science (part I). Journal of Theoretical Biology, 38(2):205-287, 1973. [ DOI ]

[BPSW70] Leonard E. Baum, Ted Petrie, George Soules, and Norman Weiss. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics, 41(1):164-171, 1970. [ http ]

[Chr85] Stavros Christodoulakis. Multimedia data base management: Applications and problems. A position paper. In Shamkant B. Navathe, editor, Proceedings of the 1985 ACM SIGMOD International Conference on Management of Data (SIGMOD 1985), pages 304-305, 1985. [ DOI ]

[CMVZ94] Guido Cortelazzo, Gian A. Mian, G. Vezzi, and Piero Zamperoni. Trademark shapes description by string-matching techniques. Pattern Recognition, 27(8):1005-1018, 1994. [ DOI ]

[CPZ97] Paolo Ciaccia, Marco Patella, and Pavel Zezula. M-tree: An efficient access method for similarity search in metric spaces. In Matthias Jarke, Michael J. Carey, Klaus R. Dittrich, Frederick H. Lochovsky, Pericles Loucopoulos, and Manfred A. Jeusfeld, editors, Proceedings of 23rd International Conference on Very Large Data Bases (VLDB 1997), pages 426-435. Morgan Kaufmann, 1997. [ .html ]

[CZ03] Sen-ching Samson Cheung and Avideh Zakhor. Efficient video similarity measurement with video signature. IEEE Transactions on Circuits and Systems for Video Technology, 13(1):59-74, 2003. [ DOI ]

[DDFLH90] Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391-407, 1990. [ DOI ]

[Ege97] Max J. Egenhofer. Query processing in spatial-query-by-sketch. Journal of Visual Languages and Computing, 8(4):403-424, 1997. [ DOI ]

[EN94] William Equitz and Wayne Niblack. Retrieving images from a database using texture. Algorithms from the QBIC system. Technical Report RJ-9805, IBM Almaden Research Center, 1994.

[Fal95] Christos Faloutsos. Fast searching by content in multimedia databases. Bulletin of the Technical Committee on Data Engineering, 18(4):31-40, 1995. [ .pdf ]

[FL95] Christos Faloutsos and King-Ip Lin. FastMap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. In Michael J. Carey and Donovan A. Schneider, editors, Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data (SIGMOD 1995), pages 163-174. ACM Press, 1995. [ DOI ]

[Fle34] Harvey Fletcher. Loudness, pitch and the timbre of musical tones and their relation to the intensity, the frequency and the overtone structure. Journal of the Acoustical Society of America, 6(2):59-69, 1934. [ DOI ]

[Fre61a] Herbert Freeman. On the encoding of arbitrary geometric configurations. IRE Transactions on Electronic Computers, 10(2):260-268, 1961.

[Fre61b] Herbert Freeman. A technique for the classification and recognition of geometric patterns. In Actes du 3e Congrès International de Cybernétique, 1961.

[GLCS95] Asif Ghias, Jonathan Logan, David Chamberlin, and Brian C. Smith. Query by humming: Musical information retrieval in an audio database. In Proceedings of the 3rd ACM International Conference on Multimedia (ACM MM 1995), pages 231-236. ACM Press, 1995. [ DOI ]

[Gol73] Julius L. Goldstein. An optimum processor theory for the central formation of the pitch of complex tones. Journal of the Acoustical Society of America, 54(6):1496-1516, 1973. [ DOI ]

[GR69] Bernard Gold and Lawrence R. Rabiner. Parallel processing techniques for estimating pitch periods of speech in the time domain. Journal of the Acoustical Society of America, 46(2), 1969. [ DOI ]

[Gre90] Donald D. Greenwood. A cochlear frequency-position function for several species—29 years later. Journal of the Acoustical Society of America, 87(6):2592-2605, 1990. [ DOI ]

[Gut84] Antonin Guttman. R-trees: A dynamic index structure for spatial searching. In Beatrice Yormark, editor, Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data (SIGMOD 1984), pages 47-57. ACM Press, 1984. [ DOI ]

[HCP95] Wynne Hsu, Tat-Seng Chua, and Hung Keng Pung. An integrated color-spatial approach to content-based image retrieval. In Proceedings of the 3rd ACM International Conference on Multimedia (ACM Multimedia 1995), pages 305-313. ACM Press, 1995. [ DOI ]

[HK92] Kyoji Hirata and Toshikazu Kato. Query by visual example—content based image retrieval. In Alain Pirotte, Claude Delobel, and Georg Gottlob, editors, Advances in Database Technology. Proceedings of the 3rd International Conference on Extending Database Technology (EDBT 1992), volume 580 of Lecture Notes in Computer Science, pages 56-71. Springer, 1992. [ DOI ]

[HSZ87] Robert M. Haralick, Stanley R. Sternberg, and Xinhua Zhuang. Image analysis using mathematical morphology. IEEE Transactions on Pattern Analysis and Machine Intelligence, 9(4):532-550, 1987.

[Hu62] Ming-Kuei Hu. Visual pattern recognition by moment invariants. IEEE Transactions on Information Theory, 8(2):179-187, 1962. [ http ]

[IP96] Fayez M. Idris and Sethuraman Panchanathan. Indexing of compressed video sequences. In Ishwar K. Sethi and Ramesh C. Jain, editors, Storage and Retrieval for Still Image and Video Databases IV, volume 2670 of Proceedings of SPIE, pages 247-253. SPIE, 1996. [ DOI ]

[JGSF73] Bela Julesz, Edgar N. Gilbert, Larry A. Shepp, and Harry L. Frisch. Inability of humans to discriminate between visual textures that agree in second-order statistics—revisited. Perception, 2(4):391-405, 1973. [ DOI ]

[Jul62] Bela Julesz. Visual pattern discrimination. IRE Transactions on Information Theory, 8(2):84-92, 1962. [ http ]

[Jul75] Bela Julesz. Experiments in the visual perception of texture. Scientific American, 232(4):34-43, 1975.

[Jul81] Bela Julesz. Textons, the elements of texture perception, and their interactions. Nature, 290(12):91-97, 1981. [ DOI ]

[JWG77] Walt Jesteadt, Craig C. Wier, and David M. Green. Intensity discrimination as a function of frequency and sensation level. Journal of the Acoustical Society of America, 61(1):169-177, 1977. [ DOI ]

[KS00] Hajime Kobayashi and Tetsuya Shimamura. A weighted autocorrelation method for pitch extraction of noisy speech. In Proceedings of the 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2000), volume 3, pages 1307-1310. IEEE, 2000. [ DOI ]

[KNSYK00] Naoko Kosugi, Yuichi Nishihara, Tetsuo Sakata, Masashi Yamamuro, and Kazuhiko Kushima. A practical query-by-humming system for a large music database. In Proceedings of the 8th ACM International Conference on Multimedia (ACM MM 2000), pages 333-342. ACM Press, 2000. [ DOI ]

[KWT88] Michael Kass, Andrew Witkin, and Demetri Terzopoulos. Snakes: Active contour models. International Journal of Computer Vision, 1(4):321-331, 1988. [ DOI ]

[LH98] Guojun Lu and Templar Hankinson. A technique towards automatic audio classification and retrieval. In Proceedings of the 4th International Conference on Signal Processing (ICSP 1998), volume 2, pages 1142-1145. IEEE, 1998. [ DOI ]

[Mal89] Stéphane G. Mallat. Multifrequency channel decompositions of images and wavelet models. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37(12):2091-2110, 1989. [ DOI ]

[MJ92] Jianchang Mao and Anil K. Jain. Texture classification and segmentation using multiresolution simultaneous autoregressive models. Pattern Recognition, 25(2):173-188, 1992. [ DOI ]

[MJC95] Jianhao Meng, Yujen Juan, and Shih-Fu Chang. Scene change detection in an MPEG compressed video sequence. In Arturo A. Rodriguez, Robert J. Safranek, and Edward J. Delp, editors, Digital Video Compression: Algorithms and Technologies 1995, volume 2419 of Proceedings of SPIE, pages 14-25. SPIE, 1995. [ DOI ]

[MM97] Wei-Ying Ma and B. S. Manjunath. Edge Flow: A framework of boundary detection and image segmentation. In Proceedings of the 1997 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 744-749. IEEE Computer Society, 1997. [ DOI ]

[MS90] Marcel Mongeau and David Sankoff. Comparison of musical sequences. Computers and the Humanities, 24(3):161-175, 1990. [ DOI ]

[Nol69] A. Michael Noll. Pitch determination of human speech by the harmonic product spectrum, the harmonic sum spectrum, and a maximum likelihood estimate. In Jerome Fox, editor, Proceedings of the Symposium on Computer Processing in Communications, volume 19 of Microwave Research Institute Symposia Series, pages 779-797. Polytechnic Press of the Polytechnic Institute of Brooklyn, 1969.

[Nyq28] Harry Nyquist. Certain topics in telegraph transmission theory. Transactions of the American Institute of Electrical Engineers, 47:617-644, 1928. Reprint. [ DOI ]

[PLE01] Silvia Pfeiffer, Rainer Lienhart, and Wolfgang Effelsberg. Scene determination based on video and audio features. Multimedia Tools and Applications, 15(1):59-81, 2001. [ DOI ]

[Rab89] Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257-286, 1989. [ DOI ]

[RBK98] Henry A. Rowley, Shumeet Baluja, and Takeo Kanade. Neural network-based face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1):23-38, 1998. [ DOI ]

[RC78] T. W. Ridler and S. Calvard. Picture thresholding using an iterative selection method. IEEE Transactions on Systems, Man, and Cybernetics, 8(6):630-632, 1978.

[RL93] A. Ravishankar Rao and Gerald L. Lohse. Identifying high level features of texture perception. CVGIP: Graphical Models and Image Processing, 55(3):218-233, 1993. [ DOI ]

[RT71] Azriel Rosenfeld and Mark Thurston. Edge and curve detection for visual scene analysis. IEEE Transactions on Computers, 20(5):562-569, 1971. [ http ]

[RTL72] Azriel Rosenfeld, Mark Thurston, and Yung-Han Lee. Edge and curve detection: Further experiments. IEEE Transactions on Computers, 21(7):677-715, 1972. [ http ]

[SB91] Michael J. Swain and Dana H. Ballard. Color indexing. International Journal of Computer Vision, 7(1):11-32, 1991. [ DOI ]

[SC02] Hari Sundaram and Shih-Fu Chang. Computable scenes and structures in films. IEEE Transactions on Multimedia, 4(4):482-491, 2002. [ DOI ]

[Sch68] Manfred R. Schroeder. Period histogram and product spectrum: New methods for fundamental-frequency measurement. Journal of the Acoustical Society of America, 43(4):829-834, 1968. [ DOI ]

[SD96] Markus Stricker and Alexander Dimai. Color indexing with weak spatial constraints. In Proceedings of Storage and Retrieval for Image and Video Databases IV, 1996. [ .html ]

[SK05] Thomas B. Sebastian and Benjamin B. Kimia. Curves vs. skeletons in object recognition. Signal Processing, 85(2):247-263, 2005. [ DOI ]

[Smi97] John R. Smith. Integrated Spatial and Feature Image Systems: Retrieval, Analysis and Compression. PhD thesis, Columbia University, 1997. [ .html ]

[SRF87] Timos K. Sellis, Nick Roussopoulos, and Christos Faloutsos. The R+-tree: A dynamic index for multi-dimensional objects. In Peter M. Stocker, William Kent, and Peter Hammersley, editors, Proceedings of 13th International Conference on Very Large Data Bases (VLDB 1987), pages 507-518. Morgan Kaufmann, 1987. [ .html ]

[TD98] Cüneyt M. Taskiran and Edward J. Delp. Video scene change detection using the generalized sequence trace. In Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 1998), volume 5, pages 2961-2964. IEEE, 1998. [ DOI ]

[TMY78] Hideyuki Tamura, Shunji Mori, and Takashi Yamawaki. Textural features corresponding to visual perception. IEEE Transactions on Systems, Man, and Cybernetics, 8(6):460-473, 1978. [ DOI ]

[Ton91] Yoshinobu Tonomura. Video handling based on structured information for hypermedia systems. In International Conference on Multimedia Information Systems 1991, pages 333-344. McGraw-Hill, 1991.

[Vit67] Andrew Viterbi. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13(2):260-269, 1967. [ http ]

[VL00] Nuno Vasconcelos and Andrew Lippman. Statistical models of video structure for content analysis and characterization. IEEE Transactions on Image Processing, 9(1):3-19, 2000. [ DOI ]

[WBKW96] Erling Wold, Thom Blum, Douglas Keislar, and James Wheaton. Content-based classification, search, and retrieval of audio. IEEE Multimedia, 3(3):27-36, 1996. [ DOI ]

[Woo72] John W. Woods. Two-dimensional discrete Markovian fields. IEEE Transactions on Information Theory, 18(2):232-240, 1972. [ http ]

[Woo96] Jeffrey Wood. Invariant pattern recognition: A review. Pattern Recognition, 29(1):1-17, 1996. [ DOI ]

[WSB98] Roger Weber, Hans-Jörg Schek, and Stephen Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In Ashish Gupta, Oded Shmueli, and Jennifer Widom, editors, Proceedings of 24th International Conference on Very Large Data Bases (VLDB 1998), pages 194-205. Morgan Kaufmann, 1998. [ .html ]

[ZKS93] HongJiang Zhang, Atreyi Kankanhalli, and Stephen W. Smoliar. Automatic partitioning of full-motion video. Multimedia Systems, 1(1):10-28, 1993. [ DOI ]

[ZRL77] G. W. Zack, W. E. Rogers, and S. A. Latt. Automatic measurement of sister chromatid exchange frequency. The Journal of Histochemistry and Cytochemistry, 25(7):741-753, 1977. [ http ]

[ZS03] Yunyue Zhu and Dennis Shasha. Warping indexes with envelope transforms for query by humming. In Alon Y. Halevy, Zachary G. Ives, and AnHai Doan, editors, Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (SIGMOD 2003), pages 181-192. ACM Press, 2003. [ DOI ]

01-Introduction-WS1920.pdf (2.97 MB)
02-WS1920.pdf (2.45 MB)
03-WS1920.pdf (2.59 MB)
04-WS1920.pdf (3.13 MB)
05-WS1920.pdf (2.28 MB)
06-WS1920.pdf (1.88 MB)
07-WS1920.pdf (2.66 MB)
07-WS1920-2.pdf (2.86 MB)
08-WS1920-2.pdf (2.38 MB)
09-WS1920.pdf (2.39 MB)
10-WS1920.pdf (9.04 MB)
11-WS1920.pdf (4.3 MB)
12-WS1920.pdf (2.31 MB)
13-WS1920.pdf (2.17 MB)
14-WS1920.pdf (1.95 MB)