Project Details
Analyzing Mathematics to Detect Disguised Academic Plagiarism
Applicants
Professor Dr.-Ing. Bela Gipp; Dr. Moritz Schubotz
Subject Area
Data Management, Data-Intensive Systems, Computer Science Methods in Business Informatics
Term
since 2020
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 437179652
The goal of this project is to improve the detection of disguised forms of academic plagiarism, such as paraphrases, translations, and idea plagiarism, particularly in documents from the fields of Science, Technology, Engineering and Mathematics (STEM). To achieve this goal, we will conduct fundamental research on analyzing mathematical expressions as text-independent and language-independent features for determining potentially suspicious similarity between documents. We will combine this novel mathematics-based approach to plagiarism detection with the text-based and citation-based detection approaches that we developed in previous research. Combined approaches that consider different feature types for similarity assessment are the most promising way to detect the wide range of academic plagiarism forms. To enable the mathematics-based approach to plagiarism detection, as well as other information retrieval tasks, in practice, we will research methods for extracting mathematical expressions from scientific documents in various input formats and semantically augmenting them. For this research task, the project team will collaborate closely with Bela Gipp’s DFG-funded project GI 1259/1-1 “Methods and Tools to Advance the Retrieval of Mathematical Knowledge from Digital Libraries for Search-, Recommendation- and Assistance-Systems”. That project develops mathematical information retrieval methods to serve information needs within Wikipedia. The proposed project will extend these methods and make them applicable to scientific documents.
Detecting academic plagiarism is a pressing problem for educational and research institutions, funding agencies, and academic publishers, among others. However, plagiarism detection providers currently focus on identifying the non-obfuscated forms of plagiarism prevalent among students. For economic reasons, they avoid the research and development effort necessary to detect the more strongly obfuscated forms of plagiarism that are characteristic of research papers. To enable the research community and plagiarism detection providers to better identify carefully disguised instances of academic plagiarism, we will implement our contributions in the open-source, free-to-use plagiarism detection system HyPlag (www.hyplag.org) and openly provide the code and data of our research. To ensure that our research meets the needs of domain experts, we will continuously evaluate our research contributions with the help of professional mathematics editors from zbMATH (www.zbMATH.org). By collaborating closely with mathematics experts, we furthermore intend this project to lay the groundwork for future contributions to Mathematical Information Retrieval beyond the plagiarism detection use case.
DFG Programme
Research Grants
Cooperation Partner
Professor Dr. Klaus Hulek