Learning Table Similarity Measures

Applicant Professor Dr. Ulf Leser

Subject Area Data Management, Data-Intensive Systems, Computer Science Methods in Business Informatics

Term from 2017 to 2021

Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 388146305

Project Description

Tables are an efficient and popular means of embedding structured pieces of information in unstructured texts, such as reports, publications, or web pages. However, typical retrieval methods disregard the particular properties of tables, such as their two-dimensional structure, headers, and semantic homogeneity within columns or rows. Yet directly finding tables that match a given search criterion would offer fast access to a wealth of structured information. One way to achieve such functionality is table similarity search: given a query table, find the most similar tables in a given table corpus. In this project, we will research methods to learn high-quality table similarity measures as fundamental building blocks of table similarity search methods, but also for other applications such as table information extraction, table clustering, or table fusion. In particular, we will study deep learning methods for designing supervised table similarity measures with the objectives of 1) automatically identifying table orientation, 2) learning appropriate table representations at multiple levels of abstraction, and 3) merging these representations into a single table similarity score. All methods will be evaluated on a gold-standard annotated table corpus and compared to different state-of-the-art methods. All corpora and software will be published under a permissive open-access license.

DFG Programme Research Grants
