Project Details
Structured explainability for interactions in deep learning models applied to pathogen phenotype prediction
Subject Area
Medical Informatics and Medical Bioinformatics
Epidemiology and Medical Biometry/Statistics
Statistics and Econometrics
Term
since 2022
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 459422098
Understanding and explaining the interactions between genomic regions is crucial for characterizing pathogen phenotypes, for example when predicting the virulence of an organism or its resistance to drugs. Existing methods for classifying large-scale genome sequence data are hard to explain because of the high dimensionality of the data, which makes it difficult to visualize, access and justify classification decisions. This is particularly the case in the presence of interactions, such as those between genomic regions. To address these challenges, we will develop methods for variable selection and structured explainability that capture the interactions of important input variables. Specifically, (i) we will work within a deep mixed models framework for binary outcomes that fuses generalized linear mixed models with a deep variant of structured predictors. We thereby combine statistical logistic regression models with deep learning to disentangle complex interactions in genomic data. In particular, we enable estimation when no explicitly formulated inputs are available for the models, as is the case, for instance, with genomic data. Further, (ii) we will extend methods for explaining classification decisions, such as layer-wise relevance propagation, so that they explain these interactions. By investigating these two complementary approaches at both the model and the explainability level, our main objective is to formulate structured explanations that not only give first-order, single-variable explanations of classification decisions but also account for their interactions. While our methods are motivated by genomic data, they can be useful in, and extended to, other application areas in which interactions are of interest.
DFG Programme
Research Units