Project Details
Knowledge-enhanced information extraction across languages for pharmacovigilance
Applicant
Professor Dr.-Ing. Sebastian Möller
Subject Area
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term
since 2020
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 442445488
Nowadays, scientific knowledge is published digitally in many different forms and sources, such as encyclopedias and scientific papers, but also in structured knowledge sources such as ontologies and knowledge bases. In addition, news articles, blog posts and social media contain relevant information. All of this is published every day in a large number of languages. MEDLINE alone, for instance, adds close to one million new articles every year.
The present project aims to design Artificial Intelligence (AI) methods that automatically digest these different types of text sources and jointly extract knowledge and observations from them in order to populate existing knowledge bases. Our project showcases these methods in the domain of pharmacovigilance, which endeavors to maintain up-to-date knowledge on adverse drug reactions (ADR) for the benefit of public health. In this domain, authoritative sources include scientific journals and drug labels, while elementary observations are reported in patient records and social media. Current mainstream information extraction methods learn word representations from large text corpora in a self-supervised way and tend to neglect existing knowledge about the target domain. In contrast, the present project aims to integrate existing knowledge into both the acquisition of word representations and the information extraction process, in order to improve the extraction of new information and knowledge. Additionally, it will take advantage of the fact that similar information is published in multiple languages to pool knowledge across countries.
Language barriers hamper the free flow of knowledge and ideas across languages. Relevant findings need to be conveyed across these barriers, and collecting and translating them into the respective languages takes time and effort. In the not too distant future, tools will assist researchers and other citizens in finding and linking information distributed across sources and languages.
In this project, we will help improve such technologies and demonstrate them for pharmacovigilance. This cross-language dimension clearly benefits from the proposed trilateral collaboration. To strengthen our collaboration and mutual knowledge, we have planned internships for early-career researchers at each of the other two partner teams, under the joint supervision of the partners, as well as plenary, jointly taught training activities, to give them a shared international exposure.
The consortium is composed of three internationally recognized teams specialized in natural language processing (NLP). NAIST (JP) has created the de facto standard NLP tools for Japanese. DFKI (DE) has a strong background in corpus generation, general information extraction and biomedical text processing. LIMSI (FR) has long experience in corpus annotation, hybrid information extraction and question answering, and a strong background in biomedical language processing, including pharmacovigilance from patient forums.
DFG Programme
Research Grants
International Connection
France, Japan
Partner Organisation
Agence Nationale de la Recherche / The French National Research Agency; Japan Science and Technology Agency (JST)
Cooperation Partners
Professor Dr. Yuji Matsumoto; Dr. Pierre Zweigenbaum