Project Details

Statistical Learning from Dependent Data: Learning Theory, Robust Algorithms, and Applications

Subject Area Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term from 2015 to 2023
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 266702577
 
Final Report Year 2024

Final Report Abstract

In the rapidly advancing field of machine learning, our research addresses a critical challenge that arises when analyzing real-world data: not all data we encounter are independent. Much of the data in fields such as bioinformatics or computer security show patterns of dependency: observations are linked to each other through time, space, or external conditions. This interconnectedness can skew the results of traditional machine learning models, which typically assume that data points are independent of one another.

We developed a comprehensive approach to statistical learning that accounts for these dependencies, enabling more accurate predictions and insights across various scientific and technological domains. Moreover, we integrated methods to automatically interpret the outcomes of these models, making it easier for experts in different fields to apply our findings in their own research or applications. A significant application of our research has been genetic association studies, where improperly handled data dependencies can easily invalidate entire studies. Another focal point of our investigation has been to lay a solid theoretical foundation for understanding how learning from dependent data operates. This understanding ensures that the algorithms we develop are not just innovative but also reliable under a variety of conditions.

We shared our findings at top-tier research conferences and in scientific journals, and we released open-source implementations of our algorithms so that researchers and practitioners can use and build on our work. Our work represents a significant step forward in making machine learning more adaptable and effective in the face of complex, real-world data challenges. By addressing the intricacies of dependent data, we are opening up new avenues for discovery and advancement across various disciplines.
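To illustrate the core problem the abstract describes (this is a minimal sketch for intuition, not one of the project's methods, and all variable names are hypothetical): if each of 50 independent measurements is duplicated 10 times, a naive analysis that treats the resulting 500 points as independent understates the standard error of the mean by roughly a factor of sqrt(10), which is exactly how hidden dependence inflates apparent statistical significance.

```python
import numpy as np

rng = np.random.default_rng(0)

# 50 genuinely independent measurements...
n_independent, n_copies = 50, 10
x = rng.normal(loc=0.2, scale=1.0, size=n_independent)

# ...each duplicated 10 times, giving 500 dependent observations.
x_dependent = np.repeat(x, n_copies)

# Naive standard error pretends all 500 points are independent.
naive_se = x_dependent.std(ddof=1) / np.sqrt(x_dependent.size)

# Correct standard error uses the effective sample size of 50.
correct_se = x.std(ddof=1) / np.sqrt(x.size)

print(f"naive SE:   {naive_se:.4f}")
print(f"correct SE: {correct_se:.4f}")  # roughly sqrt(10) times larger
```

The same mechanism, in subtler form (related individuals in a genetic cohort, consecutive samples in a time series), is why i.i.d.-based models can report overconfident conclusions on dependent data.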

Publications

  • “Machine Learning with Interdependent and Non-Identically Distributed Data (Dagstuhl Seminar 15152)”. In: Dagstuhl Reports 5.4. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2015
    T. Darrell; M. Kloft; M. Pontil; G. Rätsch & E. Rodner
  • “Multi-class SVMs: From tighter data-dependent generalization bounds to novel algorithms”. In: Advances in Neural Information Processing Systems 28 (2015)
    Y. Lei; Ü. Dogan; A. Binder & M. Kloft
  • “Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies”. In: Scientific Reports 6.1 (2016)
    Mieth, Bettina; Kloft, Marius; Rodríguez, Juan Antonio; Sonnenburg, Sören; Vobruba, Robin; Morcillo-Suárez, Carlos; Farré, Xavier; Marigorta, Urko M.; Fehr, Ernst; Dickhaus, Thorsten; Blanchard, Gilles; Schunk, Daniel; Navarro, Arcadi & Müller, Klaus-Robert
  • “Sparse probit linear mixed model”. In: Machine Learning 106.9-10 (2017), pp. 1621–1642
    Mandt, Stephan; Wenzel, Florian; Nakajima, Shinichi; Cunningham, John; Lippert, Christoph & Kloft, Marius
  • “Local Rademacher complexity based learning guarantees for multi-task learning”. In: The Journal of Machine Learning Research 19.1 (2018), pp. 1385–1431
    N. Yousefi; Y. Lei; M. Kloft; M. Mollaghasemi & G. C. Anagnostopoulos
  • “Scalable generalized dynamic topic models”. In: International Conference on Artificial Intelligence and Statistics. PMLR. 2018, pp. 1427–1435
    P. Jahnichen; F. Wenzel; M. Kloft & S. Mandt
  • “Data-Dependent Generalization Bounds for Multi-Class Classification”. In: IEEE Transactions on Information Theory 65.5 (2019), pp. 2995–3021
    Lei, Yunwen; Dogan, Urun; Zhou, Ding-Xuan & Kloft, Marius
  • “Extreme Classification (Dagstuhl Seminar 18291)”. In: Dagstuhl Reports 8.7. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2019
    S. Bengio; K. Dembczynski; T. Joachims; M. Kloft & M. Varma
  • “Two-sample testing using deep learning”. In: International Conference on Artificial Intelligence and Statistics. PMLR. 2020, pp. 1387–1398
    M. Kirchler; S. Khorasani; M. Kloft & C. Lippert
  • “transferGWAS: GWAS of images using deep transfer learning”. In: Bioinformatics 38.14 (2022), pp. 3621–3628
    Kirchler, Matthias; Konigorski, Stefan; Norden, Matthias; Meltendorf, Christian; Kloft, Marius; Schurmann, Claudia & Lippert, Christoph
  • “Training normalizing flows from dependent data”. In: International Conference on Machine Learning. PMLR. 2023, pp. 17105–17121
    M. Kirchler; C. Lippert & M. Kloft
  • “Zero-Shot Anomaly Detection via Batch Normalization”. In: Thirty-seventh Conference on Neural Information Processing Systems. 2023
    A. Li; C. Qiu; M. Kloft; P. Smyth; M. Rudolph & S. Mandt
