Project Details

CHORUS - Top-k Composition of Browsing Scripts for efficient social-enabled Usage of Web-based Services

Applicant: Dr. Sudhir Agarwal
Subject Area: Software Engineering and Programming Languages
Term: 2013 to 2015
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 241316025

Final Report Year: 2019

Final Report Abstract

Oftentimes, users need to quickly integrate and explore data from multiple sources in an ad hoc manner in order to perform planning tasks, make data-driven decisions, verify or falsify hypotheses, or gain entirely new insights. The data can be public or private, and structured or semi-structured; in particular, deep web pages can also contain useful personal data. We have presented a novel approach that enables end users to extract data from any web page, structure it, and store it locally. This alone is already useful, since it allows structured search over previously visited information without revisiting the same pages. We have also shown how end users can integrate the extracted data with the help of Datalog rules, and how queries over the resulting unified view of the data can be answered. Web pages that carry semantic data are a special case: extraction can be done automatically, cleaning becomes largely unnecessary, and the integration step is much easier. The proposed approach has been implemented as a browser plugin and evaluated. Our formalization of cleaning steps and integration rules makes them reusable and thus accelerates the cleaning and integration of extracted data.

We presented Dexter, a tool that empowers users to explore, expressively and in an ad hoc manner, structured data from various sources such as local files, Web APIs, and databases. Dexter stores the user's data locally inside their browser, ensuring that users can combine their private and confidential data with public data without compromising their privacy. Dexter is also equipped with a client-side parallel algorithm for efficiently computing answers to queries that require data from multiple, possibly remote, sources.

Finally, we presented Jabberwocky to fill the gap left by the lack of a tool that enables individuals to make use of the vast amount of data that more and more companies and organizations release under open data provisions.
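The Datalog-based integration of extracted data described above can be illustrated with a minimal Python sketch of naive bottom-up Datalog evaluation. It is not the project's implementation: the relation names (`store_a_book`, `store_b_book`, `offer`, `in_both`), the `?`-prefixed variable convention, and the functions `evaluate` and `query` are hypothetical, rules are assumed to be safe (every head variable occurs in the body), and built-in predicates, negation, and the cleaning steps are omitted.

```python
def is_var(term):
    """Terms written as '?name' are variables (convention assumed for this sketch)."""
    return isinstance(term, str) and term.startswith("?")


def match_atom(atom, fact, subst):
    """Extend `subst` so that `atom` matches the ground `fact`, or return None."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    subst = dict(subst)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if subst.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None  # constant does not match
    return subst


def match_body(body, facts, subst):
    """Yield every substitution under which all atoms in `body` hold in `facts`."""
    if not body:
        yield subst
        return
    for fact in facts:
        extended = match_atom(body[0], fact, subst)
        if extended is not None:
            yield from match_body(body[1:], facts, extended)


def instantiate(atom, subst):
    """Replace the variables of `atom` by their bound values (rules must be safe)."""
    return (atom[0],) + tuple(subst[t] if is_var(t) else t for t in atom[1:])


def evaluate(facts, rules):
    """Naive bottom-up evaluation: fire every rule until no new fact is derived."""
    derived = set(facts)
    while True:
        new = {
            instantiate(head, s)
            for head, body in rules
            for s in match_body(body, derived, {})
        } - derived
        if not new:
            return derived
        derived |= new


def query(facts, pattern):
    """Answer a single-atom query such as ('offer', 'Dune', '?p', '?s')."""
    return {instantiate(pattern, s) for s in match_body([pattern], facts, {})}


# Hypothetical records, as if extracted from two different shopping pages.
EXTRACTED_FACTS = {
    ("store_a_book", "Dune", 9.99),
    ("store_a_book", "Emma", 4.50),
    ("store_b_book", "Dune", 8.75),
}

# Hypothetical integration rules as (head, body) pairs, i.e.
#   offer(?t, ?p, "store_a") :- store_a_book(?t, ?p).
#   offer(?t, ?p, "store_b") :- store_b_book(?t, ?p).
#   in_both(?t) :- store_a_book(?t, ?p1), store_b_book(?t, ?p2).
INTEGRATION_RULES = [
    (("offer", "?t", "?p", "store_a"), [("store_a_book", "?t", "?p")]),
    (("offer", "?t", "?p", "store_b"), [("store_b_book", "?t", "?p")]),
    (("in_both", "?t"), [("store_a_book", "?t", "?p1"), ("store_b_book", "?t", "?p2")]),
]

if __name__ == "__main__":
    unified = evaluate(EXTRACTED_FACTS, INTEGRATION_RULES)
    print(sorted(query(unified, ("offer", "Dune", "?p", "?s"))))
    # [('offer', 'Dune', 8.75, 'store_b'), ('offer', 'Dune', 9.99, 'store_a')]
    print(query(unified, ("in_both", "?t")))  # {('in_both', 'Dune')}
```

The naive fixpoint loop re-derives old facts in every round; semi-naive evaluation, which joins only against newly derived facts, is the usual optimization for larger rule sets.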
Jabberwocky integrates structured data from authoritative data sources using a triples-based conceptual model and can be operated entirely through any popular web browser. It lets users browse the integrated open data as a graph, supports highly expressive queries, and provides modern styles and visualizations for different types of objects in a flexible manner. To ensure adequate data exploration performance and high interactivity, Jabberwocky employs novel hybrid (server- and client-side) data caching techniques. A surprising observation was that almost everyone, once they saw what they could do with Dexter and Jabberwocky, realized how much they cannot do with popular search engines.
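Jabberwocky's triples-based model and graph browsing can be pictured with the following minimal Python sketch of an in-memory triple store that answers conjunctions of triple patterns, in the spirit of a SPARQL basic graph pattern. It is an illustration under assumptions, not Jabberwocky's implementation: the class `TripleStore`, its methods, the `?` variable convention, and the sample `OPEN_DATA` triples are hypothetical, and the hybrid server/client caching is not modeled.

```python
from collections import defaultdict


def is_var(term):
    """Terms written as '?name' are variables (SPARQL-like convention assumed here)."""
    return isinstance(term, str) and term.startswith("?")


def unify(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    b = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if b.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return b


class TripleStore:
    """Hypothetical toy store of (subject, predicate, object) triples."""

    def __init__(self, triples=()):
        self.triples = set()
        self.by_subject = defaultdict(set)  # index used when a node is expanded
        for s, p, o in triples:
            self.add(s, p, o)

    def add(self, s, p, o):
        self.triples.add((s, p, o))
        self.by_subject[s].add((p, o))

    def outgoing(self, node):
        """Labelled edges leaving `node`, i.e. what a graph view shows on expansion."""
        return set(self.by_subject.get(node, ()))

    def match(self, patterns):
        """Answer a conjunction of triple patterns; one dict of bindings per answer."""
        bindings = [{}]
        for pattern in patterns:
            extended = []
            for binding in bindings:
                for triple in self.triples:
                    b = unify(pattern, triple, binding)
                    if b is not None:
                        extended.append(b)
            bindings = extended
        return bindings


# Hypothetical open-data triples, for illustration only.
OPEN_DATA = [
    ("city:Berlin", "rdf:type", "City"),
    ("city:Berlin", "locatedIn", "country:Germany"),
    ("city:Munich", "rdf:type", "City"),
    ("city:Munich", "locatedIn", "country:Germany"),
    ("city:Paris", "rdf:type", "City"),
    ("city:Paris", "locatedIn", "country:France"),
    ("country:Germany", "rdf:type", "Country"),
]

if __name__ == "__main__":
    store = TripleStore(OPEN_DATA)
    print(store.outgoing("city:Berlin"))
    answers = store.match([("?c", "rdf:type", "City"), ("?c", "locatedIn", "country:Germany")])
    print(sorted(a["?c"] for a in answers))  # ['city:Berlin', 'city:Munich']
```

Expanding a node in a graph view corresponds to `outgoing`, while `match` evaluates patterns by nested-loop joins; a practical system would index all three triple positions and rely on caching such as that described above.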

Publications

  • Extraction and integration of web data by end-users. CIKM 2013: 2405-2410
    Sudhir Agarwal, Michael R. Genesereth
    (See online at https://doi.org/10.1145/2505515.2505635)
  • Dexter: Plugging-n-Playing with Data Sources in Your Browser. AAAI Workshop on Semantic Cities, 2014
    Abhijeet Mohapatra, Sudhir Agarwal, Michael R. Genesereth
  • Rule-Based Exploration of Structured Data in the Browser. RuleML 2015: 161-175
    Sudhir Agarwal, Abhijeet Mohapatra, Michael R. Genesereth, Harold Boley
    (See online at https://doi.org/10.1007/978-3-319-21542-6_11)
 
 
