Project Details
One comprehensive pipeline for single cell RNA-seq data: Analysis, Experimental Design and Variance Quantification
Applicant
Privatdozentin Dr. Ines Hellmann
Subject Area
Bioinformatics and Theoretical Biology
Term
since 2018
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 407541155
Single cell RNA-seq (scRNA-seq) has become a ubiquitous and central method in biomedical research. Large cell atlas projects serve as reference data sets, and commercial providers enable non-specialists to apply this powerful technology to their specific research questions. This development creates the need to make the necessary computational analysis methods more accessible as well. This requires better tools for making experimental design choices, an integrated pipeline that takes recent benchmarking studies into account, and a better quantification of gene expression variance. We believe that combining state-of-the-art analysis pipelines with realistic simulations and a more detailed quantitative analysis of gene expression variance is the way forward to a more confident interpretation of cluster and trajectory analysis results from scRNA-seq data.
Here, we propose to integrate our previously developed software tools zUMIs and powsimR with other state-of-the-art methods for differential gene expression analysis, data integration, clustering, trajectory analysis and marker gene detection. Most of the methods that we intend to assemble into one comprehensive pipeline were already benchmarked by us during the previous funding period or were evaluated by other groups. Moreover, integrating the analysis pipeline with a simulation tool (powsimR) will ensure that users can continually check performance for their own data and task. In addition, the detailed performance metrics provided will be instrumental for experimental design. This is particularly important for complex scRNA-seq experiments consisting of many cell types with variable frequencies. powsimR can also incorporate batch effects, which will occur if data are analysed in the context of pilot or reference data.
For the tasks described above, appropriate computational methods exist, with one exception: the quantitative analysis of changes in expression variance at the gene level. Existing methods are too simplistic given the complexity of most scRNA-seq data sets. Here, we plan to use double generalized linear models to accommodate complex structures and at the same time disentangle mean and variance shifts. Such an analysis can be interpreted as a measure of stabilizing selection on gene expression, which by extension will help to improve our understanding of cell types, states and their transitions.
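As a rough illustration of how a double generalized linear model can separate mean shifts from variance shifts, the generic textbook form is sketched below. The notation is an assumption made here for illustration and is not taken from the project description: y_{gc} is the expression of gene g in cell c, x_c and z_c are hypothetical covariate vectors (for example cell type, condition, batch), V is the variance function of the chosen distribution family, and the log links are one common choice rather than the project's stated one.

```latex
% Generic double GLM sketch (assumed notation, requires amsmath):
% one linear predictor for the mean, a second one for the dispersion.
\begin{align}
  \mathbb{E}[y_{gc}] &= \mu_{gc}, &
  \operatorname{Var}[y_{gc}] &= \phi_{gc}\, V(\mu_{gc}), \\
  \log \mu_{gc} &= x_c^{\top} \beta_g, &
  \log \phi_{gc} &= z_c^{\top} \gamma_g .
\end{align}
```

Under these assumptions, a nonzero entry of \gamma_g for a condition covariate would point to a change in expression variance that the accompanying change in the mean does not explain, while \beta_g captures the mean shift itself.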
DFG Programme
Research Grants