Scalable Information Extraction in Stratosphere

Applicant Professor Dr. Ulf Leser
Subject Area Security and Dependability, Operating, Communication and Distributed Systems
Term Funded from 2010 to 2015
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 132320961
 
The main objective of this project is to enable query-based analysis of large quantities of unstructured text. We envision users formulating information extraction (IE) tasks in the Stratosphere query language. Such a query is parsed, optimized, parallelized, executed, and re-optimized on a cloud infrastructure by methods developed in projects A, B, and C by Markl, Freytag, and Kao. This project develops the IE-specific operators, which transform text into structured representations. Furthermore, in cooperation with Project E, we develop operators for the systematic aggregation of extracted information that fully take the uncertainty of that information into account. All IE operators will be configurable to support different IE strategies, geared towards high throughput, high precision / low uncertainty, or high recall. The high-level operator interfaces must be domain independent, while their concrete instantiations need to be easily adaptable to the text domain at hand. These requirements call for a carefully balanced mixture of simple IE techniques, advanced natural language processing (NLP), and machine learning. All methods developed within this project will be evaluated on large and realistic IE tasks in the biomedical domain.
DFG Programme Research Units
 
 
