Project Details

Uncertainty and Data Cleansing in the Stratosphere Cloud Data Management System

Subject Area: Security and Dependability, Operating-, Communication- and Distributed Systems
Term: 2010 to 2015
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 132320961
 
Poor data quality, in the form of uncertain, incomplete, and inconsistent data, permeates data management. Complex data management systems such as Stratosphere, which support applications and use cases including text extraction and data integration, must be able to handle the intrinsic uncertainty and often poor quality of both source data and computed results.

Within Stratosphere we will develop methods to detect and represent uncertain, erroneous, inconsistent, and incomplete data. The representation will not be limited to base data but will, more interestingly, extend to data that results from transformations, queries, and other computations. To this end we will extend the basic algebraic operations within Stratosphere.

Merely representing uncertainty and poor quality, however, ignores the potential of the Cloud environment. Stratosphere’s Cloud-based architecture allows and demands a rethinking of the basic notions of uncertainty management and data cleansing. Its scalable, adaptive, and parallel execution engine provides compute capabilities that, for the first time, allow ad hoc data cleansing in the context of queries rather than in long, manually created ETL processes, and that allow a wider exploration of the possible worlds of uncertainty-laced databases.

Uncertainty will not only be represented but will also be queryable: Stratosphere’s optimization component will allow uncertainty-constrained queries. Likewise, poor-quality data will not only be represented but also improved: Stratosphere will include a set of basic data cleansing operators, optimized for the parallel Cloud environment and integrated into Stratosphere’s data model, query language, and execution engine.
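The notions of possible worlds and uncertainty-constrained queries mentioned above can be illustrated with a minimal sketch. It is not Stratosphere code and does not reflect the project's actual data model or operators. The relation `readings`, the tuple-independence assumption, and the helper names `possible_worlds`, `query_probability`, and `confident_answers` are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical uncertain relation: (key, value, probability that the tuple exists).
# Assumption for this sketch: tuples exist independently of one another.
readings = [
    ("sensor_a", 21.5, 0.9),
    ("sensor_b", 35.0, 0.5),
    ("sensor_c", 36.0, 0.2),
]


def possible_worlds(tuples):
    """Yield (world, probability) for every subset of the uncertain tuples."""
    for mask in product([True, False], repeat=len(tuples)):
        prob = 1.0
        world = []
        for present, (key, value, p) in zip(mask, tuples):
            prob *= p if present else 1.0 - p
            if present:
                world.append((key, value))
        yield world, prob


def query_probability(tuples, predicate):
    """Probability that at least one tuple satisfying the predicate exists."""
    return sum(
        prob
        for world, prob in possible_worlds(tuples)
        if any(predicate(value) for _, value in world)
    )


def confident_answers(tuples, predicate, min_prob):
    """Uncertainty-constrained selection: keep matches with probability >= min_prob."""
    return [(k, v) for k, v, p in tuples if predicate(v) and p >= min_prob]


def hot(value):
    return value > 30.0


p_any_hot = query_probability(readings, hot)
answers = confident_answers(readings, hot, 0.4)
print(p_any_hot, answers)
```

Enumerating all worlds is exponential in the number of uncertain tuples, which is one reason a scalable, parallel execution engine matters for exploring them more widely.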
DFG Programme: Research Units
 
 

