Project Details
Bridging biology and paleontology – a novel combined machine-learning approach to species delimitation
Applicant
Dr. Thomas A. Neubauer
Subject Area
Palaeontology
Bioinformatics and Theoretical Biology
Term
since 2024
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 538733775
More than 250 years after Linnaeus defined the first species, there is still disagreement about how to define and delimit species. This disagreement is especially evident when comparing fossil and living species, for which different types and amounts of information are typically available, and it has led to a multitude of species concepts over the centuries. To compare past, present and predicted rates of biodiversity turnover, reconstruct biogeographic patterns and infer evolutionary processes realistically, a standardized species classification system is needed that treats fossil and living taxa equally. Recent developments in machine learning (ML) and image recognition provide a unique opportunity to jointly delimit fossil and recent species in a modern analytical framework. We will develop a novel ML approach that uses image data for recent species, for which species boundaries were established based on molecular data, to allow for a standardized and unified species delimitation of related fossil species. We will use Siamese Convolutional Neural Networks, which require relatively few images to learn similarities, can be applied to unlabeled data without re-training and can even deal with unknown classes. As a model group we will use freshwater gastropods of the family Viviparidae, in which the recent and fossil target groups show comparably high morphological plasticity that has caused taxonomic confusion in the past.
In addition, we will i) assess the value of an ML-based species delimitation system versus traditional taxonomy by having taxonomists carry out independent delimitations, ii) test the limits of the species delimitation system with regard to different types and degrees of fossil preservation, iii) assess the level of detail that is required to reliably delimit fossil species, by feeding the system images of variable quality, including simple drawings from the literature, and finally iv) apply the newly inferred species boundaries to reconstruct accurate biodiversity patterns and estimate diversification processes for the fossil species group. Our new machine-learning-derived approach will be widely usable across different taxonomic groups and form an important starting point for making species comparable through space and time. A standardized species delimitation system that is applicable to extant and extinct species is imperative to compare pathways of turnover events and biodiversity crises throughout geological time and ultimately to provide more realistic outlooks on the Anthropocene Biodiversity Crisis.
DFG Programme
Research Grants