Ouriginal fails more often than not to detect plagiarism

Ctrl-C Ctrl-V

Ouriginal, the plagiarism detection service formerly known as Urkund, has been in use at our university for a very long time.
It is unclear to me which software we are really using, since Ouriginal is a merger of the Swedish Urkund and the German PlagScan, and in 2021 Ouriginal was acquired by the American Turnitin. I don't need any such software, but our university mandates that theses be scanned with it. Since I read and evaluate an increasing number of theses every year, I also use Ouriginal more and more often. Looking at the history of submissions, I realized that until about last week, all submitted PDFs had been dutifully analyzed. But last week something changed: about half of the submitted documents were not processed. The usual tricks (submitting in another format such as .docx, .odt, .ps or .rtf) worked for some, but not all, of the theses. One thesis is especially stubborn, and neither our "experts" nor the Ouriginal helpdesk staff have any idea why their software cannot process it (which again makes me wonder what software their backend is actually using). Ouriginal is investigating...
Our university receives funding from the Ministry of Education based on the number of graduating students, and we lose thousands of euros for each student who fails to graduate on time. Fee-paying students also have a strong incentive to graduate on time in order to avoid additional costs. Therefore, every delay in the graduation schedule is unacceptable.
Since there are dozens of ways to turn a Word document into a PDF, I tried to isolate the problem by submitting test documents to the Ouriginal service, using manuscripts that we had published over the last few years. I was shocked to realize that Ouriginal failed to detect plagiarism in more than half of all cases. Paywalls are not to blame, since almost all our publications are freely available. Neither the recency of the publication nor the quality of the journal made a noticeable difference. However, it seems evident that Ouriginal has a problem with non-English documents. It is very difficult to explain why so many freely available scientific texts from reputable journals (Scientific Reports, Angiogenesis, Annals of Anatomy) are not indexed by Ouriginal. Ouriginal does not even index our own university's freely available PhD theses (https://ethesis.helsinki.fi): none of the PhD theses written in my own lab was recognized.
Does it make any sense to pay for such an imperfect service? I don't think so. The omission of freely available texts from important journals seems inexcusable, since a simple Google search does a better job. I searched Google for sentences from the unrecognized texts, and in most cases Google identified the original source without problems. A Python script that splits a text into sentences, runs an individual Google search for each of them and aggregates the results could be written by any novice programmer in an afternoon. Why maintain your own database if Google does a better job than you can?
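The sentence-by-sentence search described above could look roughly like the following Python sketch. It is a minimal illustration under stated assumptions, not a tested plagiarism checker: the sentence splitter is a naive regular expression (it will trip over abbreviations such as "e.g."), and `search` is a hypothetical callable that takes an exact-phrase query and returns a list of result URLs. Connecting it to Google or another search engine (API key, quotas, terms of service) is deliberately left out; the demo uses a fake backend.

```python
import re
from collections import Counter
from typing import Callable
from urllib.parse import urlparse

# Hypothetical search backend: takes a query string, returns a list of result URLs.
SearchFn = Callable[[str], list[str]]

# Naive rule: a sentence ends with . ! or ? followed by whitespace and a capital letter.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-ZÄÖÜÅ])")


def split_sentences(text: str, min_words: int = 8) -> list[str]:
    """Split text into sentences and drop those too short to be distinctive."""
    sentences = (s.strip() for s in SENTENCE_BOUNDARY.split(text))
    return [s for s in sentences if len(s.split()) >= min_words]


def find_sources(text: str, search: SearchFn, min_words: int = 8) -> Counter:
    """Search each sentence as an exact phrase and count the matching domains."""
    hits: Counter = Counter()
    for sentence in split_sentences(text, min_words):
        urls = search(f'"{sentence}"')  # quotes request an exact-phrase match
        # Count each domain at most once per sentence.
        hits.update({urlparse(url).netloc for url in urls})
    return hits


if __name__ == "__main__":
    def fake_search(query: str) -> list[str]:
        # Stand-in for a real search engine, for demonstration only.
        if "copied" in query:
            return ["https://example.org/article", "https://example.org/pdf"]
        return []

    thesis = (
        "This sentence was copied verbatim from a freely available journal article. "
        "Short one. "
        "This one was also copied from the same article without any attribution."
    )
    for domain, count in find_sources(thesis, fake_search).most_common():
        print(f"{domain}: {count} sentence(s)")
```

Counting each domain at most once per sentence means that the top entry is the site matching the largest number of sentences, which is a rough proxy for the original source.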

Plagiarism of the following published manuscripts/theses/proceedings* was NOT detected (source, publication month/year, similarity score reported by Ouriginal):

  • Scientific Reports 10/2022 17%
  • PhD thesis from University of Helsinki 6/2023 31%
  • Angiogenesis 4/2023 15%
  • PhD thesis from University of Helsinki 5/2020 30%
  • Duodecim (Finnish) 8/2020 19%
  • Duodecim (English version, parallel deposition at Zenodo) 8/2020 20%
  • Preprint server 12/2023 26%
  • LymphForsch (German) 12/2019 17%
  • LymphForsch (English version, parallel deposition at Zenodo) 12/2019 1%
  • Vasomed (German) 7/2018 35%
  • Annals of Anatomy 9/2018 33%

Plagiarism of the following published manuscripts/theses/proceedings was detected (source, publication month/year, similarity score reported by Ouriginal):

  • International Journal of Molecular Sciences 12/2021 99%
  • Biology 2/2021 99%
  • Science Signaling 8/2021 92%
  • Nature Cardiovascular Research 5/2022 86%
  • Molecular Genetics & Genomic Medicine 6/2020 100%
  • 3. Schweizer Lymphsymposium 9/2021 94%
  • Blood 10/2020 100%
  • Frontiers in Bioengineering and Biotechnology 2/2018 88%
  • eLife 5/2019 93%
  • Vasomed (English) 7/2018 93%

* Since all entries in a reference list are copied verbatim, almost every scientific document gets wrongly flagged for plagiarism, to an extent that depends mostly on how many other publications it cites. It is strange that in 2024 Ouriginal is still unable to recognize the reference list as such and exclude it from the analysis. I checked the above percentages manually: they result mostly from the reference list, and the 100% identical original source was not identified by Ouriginal.
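Excluding the reference list before comparison need not be complicated. The following is a minimal Python sketch, assuming plain text extracted from the thesis and a bibliography introduced by a heading on a line of its own; the list of headings (English, German, Finnish) is an assumption and would need extending. It cuts at the last matching heading, so an earlier "References" entry in a table of contents is ignored, but any appendices after the bibliography are dropped as well.

```python
import re

# Bibliography headings on a line of their own (English, German, Finnish).
# This list is an assumption; extend it for other languages and styles.
REFERENCE_HEADING = re.compile(
    r"^[ \t]*(references|bibliography|literature cited|literatur"
    r"|literaturverzeichnis|lähteet|kirjallisuus)[ \t]*:?[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_reference_list(text: str) -> str:
    """Return text with everything from the last reference-list heading onwards removed."""
    matches = list(REFERENCE_HEADING.finditer(text))
    if not matches:
        return text  # no recognizable heading: leave the text untouched
    # Use the last match so a "References" line in a table of contents is ignored.
    return text[: matches[-1].start()].rstrip()


if __name__ == "__main__":
    sample = (
        "Contents\nReferences\n\n1 Introduction\nSome text citing [1].\n\n"
        "References\n1. Doe J. A title. Some Journal. 2020."
    )
    print(strip_reference_list(sample))
```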