Project Details

Scalable Information Extraction in Stratosphere

Subject Area Security and Dependability, Operating-, Communication- and Distributed Systems
Term from 2010 to 2015
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 132320961
 
The main objective of this project is to enable query-based analysis of large quantities of unstructured text. We envision users formulating information extraction (IE) tasks in the Stratosphere query language. Such a query is parsed, optimized, parallelized, executed, and re-optimized on a cloud infrastructure by methods developed in projects A, B, and C by Markl, Freytag, and Kao. This project develops the IE-specific operators, which turn text into structured representations. Furthermore, in cooperation with Project E, we develop operators for the systematic aggregation of extracted information that fully take its uncertainty into account. All IE operators will be configurable to support different IE strategies, geared towards high throughput, high precision / low uncertainty, or high recall. The high-level operator interfaces must be domain-independent, while their concrete instantiations need to be easily adaptable to the text domain at hand. These requirements call for a carefully balanced mixture of simple IE techniques, advanced NLP, and machine learning. All methods developed within this project will be evaluated on large, realistic IE tasks in the biomedical domain.
DFG Programme Research Units
 
 

