Project Details

dCortools: Distance Correlation Methods for Detecting Nonlinear Associations in High-Dimensional Molecular Data

Subject Area: Medical Informatics and Medical Bioinformatics
Term: from 2019 to 2022
Project identifier: Deutsche Forschungsgemeinschaft (DFG), project number 417754611
 
Virtually all methods currently used for testing associations in high-dimensional molecular data can detect only linear or monotone associations. This applies both to tests for association between different molecular variables (e.g. gene-gene interactions) and to tests for association between molecular and clinical variables (e.g. gene-environment interactions). However, many biological relationships are known to be more complex and to include nonmonotone or even nonfunctional dependencies. Distance correlation is a novel dependence measure that can detect every kind of dependence between random vectors of arbitrary dimensions: its population value is zero if and only if the vectors are independent (assuming finite first moments). Moreover, the distance correlation coefficient is very easy to compute, which makes it well suited for use in statistical practice. Despite these convincing properties, there have so far been only a few applications of the distance correlation coefficient to high-dimensional molecular data. This is due both to missing methodology for biostatistical problems and to a lack of application-oriented software. The goal of this project is to close this gap.
In the first part of the project, we plan to develop distance correlation methodology for biomedical applications. First, we aim to derive iterative variable selection procedures that are much more efficient than univariate procedures in the presence of strong correlation structures, which are typically found in molecular data. Moreover, we propose to extend the distance correlation coefficient to survival data, which are particularly important in cancer research.
In the second part of the project, we plan to create a user-friendly R package that combines distance correlation methods useful in biostatistics and thus makes this methodology accessible to practitioners. The techniques developed in the first part of the project will be important components of this R package.
Finally, we propose to apply the R package to a data set from the DACHS study consisting of epigenome-wide methylation data as well as epidemiological and clinical data for more than 2000 patients with colorectal cancer. We are confident that the planned project will considerably increase the use of distance correlation methodology in biostatistical practice. For molecular data, this will make it possible to detect complex associations that would be missed by linear procedures. This, in turn, may lead to a better understanding of biological processes.
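As a rough illustration of why the distance correlation coefficient is easy to compute, the following minimal Python sketch implements the sample distance correlation of Székely, Rizzo and Bakirov (2007): with A and B the double-centered Euclidean distance matrices of the two samples, the squared distance covariance is the mean of the elementwise product of A and B, and the squared distance correlation divides it by the square root of the product of the two squared distance variances. A simple permutation test of independence is added on top. The function names (distance_correlation, distance_correlation_test) and the simulated data are hypothetical and chosen for illustration only; this is not the API of the dCortools package.

```python
import numpy as np


def _double_centered_distances(z) -> np.ndarray:
    """Double-centered Euclidean distance matrix of the n observations (rows) of z."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # a 1-D sample is n observations of a scalar
    z = z - z.mean(axis=0)  # distances are translation invariant; reduces rounding error
    # ||z_k - z_l||^2 = ||z_k||^2 + ||z_l||^2 - 2 <z_k, z_l>; this needs only
    # O(n^2) memory, even when each observation has many coordinates
    sq = (z ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (z @ z.T)
    a = np.sqrt(np.maximum(d2, 0.0))  # clip tiny negative rounding errors
    # subtract column and row means, add back the grand mean
    return a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()


def _dcor_from_centered(A: np.ndarray, B: np.ndarray) -> float:
    dcov2 = (A * B).mean()  # squared sample distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    if denom == 0:  # at least one sample is constant: define dCor as 0
        return 0.0
    # dcov2 is non-negative in exact arithmetic; clip rounding noise
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def distance_correlation(x, y) -> float:
    """Sample distance correlation of paired samples x and y (n observations each)."""
    A, B = _double_centered_distances(x), _double_centered_distances(y)
    if A.shape != B.shape:
        raise ValueError("x and y must contain the same number of observations")
    return _dcor_from_centered(A, B)


def distance_correlation_test(x, y, n_perm=999, seed=None):
    """Permutation test of independence; returns (dCor, p-value)."""
    A, B = _double_centered_distances(x), _double_centered_distances(y)
    if A.shape != B.shape:
        raise ValueError("x and y must contain the same number of observations")
    observed = _dcor_from_centered(A, B)
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # permuting the observations of y permutes rows and columns of B;
        # double-centering commutes with this, so B need not be recomputed
        if _dcor_from_centered(A, B[np.ix_(p, p)]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# A nonmonotone association: y depends on x only through x**2.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=200)
y = x ** 2 + 0.05 * rng.normal(size=200)
print("Pearson r:", np.corrcoef(x, y)[0, 1])
print("dCor, p-value:", distance_correlation_test(x, y, n_perm=199, seed=2))
```

In the simulated example, y depends on x only through x squared, so Pearson's correlation is typically close to zero while the distance correlation is clearly positive and the permutation p-value is small. Each evaluation costs O(n^2) time and memory in the number of observations n, which is why the test reuses the centered matrices and only permutes B rather than recomputing distances.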
DFG Programme: Research Grants