Scalable Information Extraction in Stratosphere
Applicant
Professor Dr. Ulf Leser
Subject Area
Security and Dependability, Operating, Communication, and Distributed Systems
Funding
Funded from 2010 to 2015
Project Identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 132320961
The main objective of this project is to enable query-based analysis of large quantities of unstructured text. We envision users formulating information extraction (IE) tasks in the Stratosphere query language. Such a query is parsed, optimized, parallelized, executed, and re-optimized on a cloud infrastructure by methods developed in projects A, B, and C by Markl, Freytag, and Kao. This project develops the IE-specific operators, which turn text into structured representations. Furthermore, in cooperation with Project E, we develop operators for the systematic aggregation of extracted information that fully take its uncertainty into account. All IE operators will be configurable to support different IE strategies, geared towards high throughput, high precision / low uncertainty, or high recall. The high-level operator interfaces must be domain-independent, while their concrete instantiations need to be easily adaptable to the text domain at hand. These requirements call for a carefully balanced mixture of simple IE techniques, advanced NLP, and machine learning. All methods developed within this project will be evaluated on large and realistic IE tasks in the biomedical domain.
DFG Programme
Research Units
Subproject of
FOR 1306:
Stratosphere - Information Management on the Cloud