Project Details
Active and batch Segmentation, Clustering, and seriation: toward unified foundations in AI
Subject Area
Mathematics
Theoretical Computer Science
Term
since 2021
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 490860858
Unsupervised learning is one of the most fundamental problems in machine learning and, more generally, in artificial intelligence. In a broad sense, it amounts to learning some unobserved latent structure in data. This structure may be of interest per se, or it may serve as an important stepping stone within a complex data analysis pipeline. Arguably, one of the cornerstones of unsupervised learning is clustering, where the aim is to recover a partition of the data into homogeneous groups. Besides vanilla clustering, unsupervised learning encompasses a large variety of related problems: hierarchical clustering, where the group structure is more complex and reveals both the backbone and the fine-grained organization of the data; segmentation, where the shape of the clusters is constrained by side information; and ranking or seriation problems, where no actual cluster structure exists but there is some implicit ordering of the data. All these problems have already found countless applications, and interest in these methods is growing because of the amount of available unlabelled data. One example is crowdsourcing, where individuals answer a subset of questions and where one might want to cluster them by field of expertise, rank them by performance, or seriate them by affinity. Such problems are extremely relevant for recommender systems and for social network analysis.
The analysis of unsupervised learning procedures has a long history, with roots in both the computer science and the mathematics communities. Thanks to recent bridges between these two communities, groundbreaking advances have been made in the theoretical foundations of vanilla clustering. We believe that these recent advances hold the key to a deep impact on the broader field of unsupervised learning, because of the pervasive nature of clustering.
In this proposal, we first aim to extend these recent groundbreaking advances in vanilla clustering to problems where the latent structure is either more complex or more constrained.
Moreover, many modern unsupervised learning applications are essentially online in nature, and sometimes decisions also have to be made sequentially. For instance, consider a recommender system that sequentially recommends items to users. In this setting, where sequential, active recommendations are made, it is important to leverage the latent structure underlying the individuals. While the fields of unsupervised learning and of sequential, active learning are both thriving, research at their crossroads has been conducted mostly separately by each community, leading to procedures that can be improved. A second aim of this proposal is therefore to bring together the fields of unsupervised learning and active learning, in order to propose new algorithms that leverage the unknown latent structure more efficiently in a sequential manner.
DFG Programme
Research Grants
International Connection
France
Cooperation Partner
Professor Nicolas Verzelen, Ph.D.