Project Details

ScienceLinker: A Framework for Finding, Linking, and Enriching Social Science Linked Data

Subject Area Data Management, Data-Intensive Systems, Computer Science Methods in Business Informatics; Empirical Social Research
Term from 2019 to 2023
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 404417453
 
Scientists in applied empirical research typically search for datasets and, in particular, for measures within those datasets (e.g. variables in social science research) that allow them to investigate their specific research interest. These datasets serve multiple purposes, such as answering a particular research question, replicating a specific finding on a different dataset, or merging with another dataset in order to broaden the possibilities for analysis or to reduce missing values. However, finding suitable data and measures to support one’s own hypothesis is a challenging task. In many cases, a researcher will be able to find the desired data at a research data centre. Given the mass of data available on the web (resulting from the Open Data movement), additional interesting datasets are likely to be available that are not provided by organized infrastructures such as research data centres. Moreover, manual effort is still required to interlink the datasets found, e.g. in order to enrich one’s own datasets with additional content from the found data and metadata, also with a view to later publication in a journal, on a self-archiving platform or on the web.

The ScienceLinker project pursues two approaches to these challenges: (1) to develop methods for identifying datasets published as Linked Open Data on the web that are compatible in content and of appropriate quality; (2) to apply Semantic Web technologies to the use of the data, e.g. for linking, enrichment and publishing. These techniques will be made usable for non-domain users by applying extensive automation wherever possible. The framework aims to guide the user (e.g. an employee of a data provider who is responsible for publishing data, or a scientist who is seeking datasets in order to complete their own dataset with additional metadata) through the following five steps: the automatic identification of a set of related datasets published as Linked Open Data; the assessment of a dataset in terms of compatibility and quality; the linking of entities referenced in the dataset to the identified datasets; the enrichment of the dataset by applying a set of entity-type-specific rules to infer additional information about the entities, including via non-identity links; and the preprocessing of the enriched dataset for publication on self-archiving platforms, as Linked Data or via other publication channels.

The investigations and developments in this project will be kept generic so that the framework can be applied in other domains. For the social sciences, potentially related Linked Data sources, such as DBpedia or GeoNames, may be neither scientific nor from the social science domain at all. So that the ScienceLinker framework can also be run in a neutral environment, we will integrate it into the established data integration platform Karma, which has been developed at ISI.
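
The five steps described above can be pictured with a small toy sketch. Everything in it is an assumption made for illustration: the function names, the in-memory "catalogue", the example.org URIs and the simple label-matching heuristics are hypothetical and are not taken from ScienceLinker or Karma, whose actual methods the abstract does not specify.

```python
import json

# Hypothetical toy data: two candidate Linked Data sources and a small local dataset.
catalogue = [
    {"name": "geo",
     "labels": {"http://example.org/geo/1": "Mannheim",
                "http://example.org/geo/2": "Cologne"},
     "facts": {"http://example.org/geo/1": {"country": "Germany"},
               "http://example.org/geo/2": {"country": "Germany"}}},
    {"name": "music",
     "labels": {"http://example.org/music/1": "Jazz"},
     "facts": {}},
]
records = [{"id": "r1", "place": "Mannheim"},
           {"id": "r2", "place": "Cologne"},
           {"id": "r3", "place": "Atlantis"}]


def label_set(dataset):
    """Lower-cased entity labels of a candidate dataset."""
    return {label.lower() for label in dataset["labels"].values()}


def identify(catalogue, terms):
    """Step 1: keep datasets that share at least one label with our terms."""
    return [d for d in catalogue if terms & label_set(d)]


def assess(dataset, terms, min_coverage=0.5):
    """Step 2: crude compatibility check, the share of our terms the dataset covers."""
    return len(terms & label_set(dataset)) / len(terms) >= min_coverage


def link(records, dataset):
    """Step 3: link records to entities by exact, case-insensitive label match."""
    index = {label.lower(): uri for uri, label in dataset["labels"].items()}
    return {r["id"]: index[r["place"].lower()]
            for r in records if r["place"].lower() in index}


def enrich(records, links, dataset):
    """Step 4: copy the linked entity's properties into each linked record."""
    enriched = []
    for r in records:
        uri = links.get(r["id"])
        row = {**r, **dataset["facts"].get(uri, {})}
        if uri is not None:
            row["same_as"] = uri  # keep the link itself
        enriched.append(row)
    return enriched


def prepare_for_publication(rows):
    """Step 5: serialise the enriched rows (JSON stands in for a real export)."""
    return json.dumps(rows, sort_keys=True, indent=2)


def run_pipeline(records, catalogue):
    terms = {r["place"].lower() for r in records}
    rows = records
    for dataset in identify(catalogue, terms):
        if assess(dataset, terms):
            rows = enrich(rows, link(rows, dataset), dataset)
    return rows


enriched = run_pipeline(records, catalogue)
published = prepare_for_publication(enriched)
```

In this sketch only the "geo" source survives identification and assessment, so records whose place label matches gain a `country` value and a `same_as` link, while unmatched records pass through unchanged. A real implementation would replace the label matching with proper dataset discovery, quality metrics and rule-based inference over RDF data.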
DFG Programme Research data and software (Scientific Library Services and Information Systems)