Project Details
Differentiable Alignment Techniques for Music Information Retrieval
Applicant
Professor Dr. Meinard Müller
Subject Area
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term
since 2023
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 521420645
The research field known as Music Information Retrieval (MIR) aims to develop computational tools that allow users to find, organize, analyze, and interact with music in all its different forms and facets. From a multimedia perspective, music is a challenging domain because of its many time-dependent musical concepts, such as melody, harmony, pitch and instrument activity, loudness, rhythm, and lyrics. Data-driven deep learning approaches that capture these concepts typically require fine-grained target annotations that reflect the local properties of the underlying music recordings. However, such strongly aligned (frame-level) annotations are generally difficult to obtain or generate. In recent years, general time series analysis has seen major advances through differentiable alignment techniques that can be used in the loss functions of deep learning pipelines. Since the alignment process then becomes part of the differentiable model, such techniques make it possible to train a neural network on weakly aligned target annotations, for which only global correspondences need to be known (for example, the sequence of note events occurring in an audio excerpt, without their exact onset times).
In this project, our primary goal is to adapt, explore, and develop differentiable alignment techniques in the context of challenging music analysis and retrieval applications. Building upon recently proposed differentiable versions of dynamic time warping (DTW), we will systematically study their efficiency and approximation properties from both a theoretical and a practical perspective. Furthermore, we will investigate the role of temporal constraints to better handle confounding factors and to improve the explainability of models and learned representations. From an MIR perspective, we aim to achieve substantial advances in music signal analysis by exploiting weakly annotated training data.
To this end, we will consider concrete MIR tasks that still pose many unsolved problems, including multi-pitch estimation, cross-version music retrieval, and score-audio matching of musical patterns such as themes and leitmotifs. In summary, while making substantial progress on various MIR tasks, we want to gain a better understanding of modern alignment techniques and advance research on them, using music as a challenging multimedia domain.
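The idea of putting an alignment inside a loss function can be illustrated with a minimal sketch. It assumes soft-DTW, one well-known differentiable DTW variant that replaces the hard minimum in the DTW recursion with a smooth log-sum-exp minimum controlled by a parameter gamma. The project text does not say which formulation it builds on. The function names (`soft_min`, `squared_distance_matrix`, `soft_dtw`, `soft_dtw_grad`), the squared Euclidean local cost, and the toy data are hypothetical and chosen only for illustration. The forward pass returns the loss value. The backward pass returns its gradient with respect to the cost matrix, which acts as a soft alignment between frame-wise predictions and a target sequence whose timing is unknown.

```python
import math


def soft_min(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    m = min(values)
    total = sum(math.exp(-(v - m) / gamma) for v in values)
    return m - gamma * math.log(total)


def squared_distance_matrix(X, Y):
    """Local cost C[i][j] = ||X[i] - Y[j]||^2 between two feature sequences."""
    return [[sum((a - b) ** 2 for a, b in zip(x, y)) for y in Y] for x in X]


def soft_dtw(C, gamma=1.0):
    """Forward pass of soft-DTW on an N x M cost matrix.

    Returns the soft-DTW value and the accumulated cost matrix R, which uses
    1-based indices with a padded border (R[0][0] = 0, other borders = +inf).
    """
    n, m = len(C), len(C[0])
    R = [[math.inf] * (m + 2) for _ in range(n + 2)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            R[i][j] = C[i - 1][j - 1] + soft_min(
                (R[i - 1][j - 1], R[i - 1][j], R[i][j - 1]), gamma)
    return R[n][m], R


def soft_dtw_grad(C, R, gamma=1.0):
    """Backward pass: E[i][j] = d soft_dtw(C) / d C[i][j].

    E is a soft alignment matrix, a smoothed version of the 0/1 warping path.
    """
    n, m = len(C), len(C[0])
    # Zero-padded copy of C with the same 1-based indexing as R.
    Cp = [[0.0] * (m + 2) for _ in range(n + 2)]
    for i in range(n):
        for j in range(m):
            Cp[i + 1][j + 1] = C[i][j]
    R = [row[:] for row in R]  # do not modify the caller's matrix
    for i in range(1, n + 1):
        R[i][m + 1] = -math.inf  # no path leaves the grid to the right
    for j in range(1, m + 1):
        R[n + 1][j] = -math.inf  # ... or below
    R[n + 1][m + 1] = R[n][m]
    E = [[0.0] * (m + 2) for _ in range(n + 2)]
    E[n + 1][m + 1] = 1.0
    for i in range(n, 0, -1):
        for j in range(m, 0, -1):
            # Local derivatives of each successor cell with respect to R[i][j].
            a = math.exp((R[i + 1][j] - R[i][j] - Cp[i + 1][j]) / gamma)
            b = math.exp((R[i][j + 1] - R[i][j] - Cp[i][j + 1]) / gamma)
            c = math.exp((R[i + 1][j + 1] - R[i][j] - Cp[i + 1][j + 1]) / gamma)
            E[i][j] = E[i + 1][j] * a + E[i][j + 1] * b + E[i + 1][j + 1] * c
    return [row[1:m + 1] for row in E[1:n + 1]]


# Hypothetical toy data: 6 frame-wise network outputs (2-dim activations) and a
# weak target of 3 events whose order is known but whose timing is not.
predictions = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.9, 0.0], [1.0, 0.1]]
targets = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

C = squared_distance_matrix(predictions, targets)
value, R = soft_dtw(C, gamma=0.1)
E = soft_dtw_grad(C, R, gamma=0.1)
print(f"soft-DTW loss: {value:.4f}")
for row in E:
    print(" ".join(f"{e:.2f}" for e in row))
```

Because `E` is the gradient of the loss with respect to the cost matrix, training would propagate it further by the chain rule. For the squared Euclidean cost, this gives the contribution 2 * E[i][j] * (X[i] - Y[j]) to the gradient for prediction X[i]. In practice, a deep learning framework's automatic differentiation would handle this step. A smaller gamma yields a sharper alignment that approaches classical DTW, while a larger gamma yields a smoother alignment with more evenly spread gradients.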
DFG Programme
Research Grants