Uncertainty and Data Cleansing in the Stratosphere Cloud Data Management System

Subject Area Security and Dependability, Operating, Communication, and Distributed Systems
Term Funded from 2010 to 2015
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 132320961
 
The problem of poor data quality, in the form of uncertain, incomplete, and inconsistent data, permeates data management. Complex data management systems such as Stratosphere, which support applications and use cases including text extraction and data integration, must be able to handle the intrinsic uncertainty and often poor quality of both source data and computed results. Within Stratosphere we will develop methods to detect and represent uncertain, erroneous, inconsistent, and incomplete data. The representation will not be limited to base data but will, more interestingly, extend to data that results from transformations, queries, and other computations. To this end we extend the basic algebraic operations within Stratosphere. Merely representing uncertainty and poor quality, however, would ignore the potential of the Cloud environment. Stratosphere’s Cloud-based architecture allows and demands a rethinking of the basic notions of uncertainty management and data cleansing: the scalable, adaptive, and parallel execution engine provides compute capabilities that, for the first time, allow ad hoc data cleansing in the context of queries rather than within long, manually created ETL processes, and that allow a wider exploration of the possible worlds of uncertainty-laced databases. Uncertainty will not only be represented but will also be queryable; Stratosphere’s optimization component will allow uncertainty-constrained queries. Likewise, poor data quality will not only be represented but also improved: Stratosphere will include a set of basic data cleansing operators that are optimized for the parallel Cloud environment and integrated into Stratosphere’s data model, query language, and execution engine.
DFG Programme Research Units
 
 
