Ouriginal fails more often than not to detect plagiarism

Ctrl-C Ctrl-V

Ouriginal, the plagiarism detection service formerly known as Urkund, has been in use at the University of Helsinki for a very long time. Because of Ouriginal's inability to process some of our students' Master's theses, I spent a little bit of time investigating Ouriginal and its problems. Here's what I found out.

From Swedish to European to US-American
It is unclear to me which software we are really using, since Ouriginal is a merger of the Swedish Urkund and the German PlagScan, and in 2021 Ouriginal was acquired by the American Turnitin. I have no need for such software, but our university mandates that theses be scanned with it. Since I read and evaluate an increasing number of theses every year, I also use Ouriginal increasingly often.

Ouriginal supports many document formats, but support is buggy
Looking at the history of submissions, I realized that until about mid-May 2024, all submitted PDFs had been dutifully analyzed. But last week something changed: about half of the submitted documents were no longer processed. The usual tricks (submitting in another format such as .docx, .odt, .ps, .rtf) worked for some, but not all, of the theses. One thesis was especially stubborn, and neither our "experts" nor the Ouriginal helpdesk staff had any clue why their software could not process it (which again makes me wonder what software their backend is actually using). Ouriginal has now been investigating for a month without getting back to me with answers... Our university receives funding from the Ministry of Education based on graduating students, and we lose thousands of euros for each student who fails to graduate on time. Fee-liable students also have a strong incentive to graduate on time in order to avoid additional costs. Therefore, every delay in the graduation schedule is unacceptable.

Ouriginal disappoints with a shockingly low detection rate
Since there are dozens of ways to turn a Word document into a PDF, I tried to isolate the problem by submitting test documents to the Ouriginal service. I used manuscripts that we had published over the last few years. I was shocked to realize that Ouriginal failed to detect plagiarism in more than half of all cases. Paywalls are not to blame, since almost all our publications are freely available. Neither the novelty of the publication nor the quality of the journal made a perceivable difference. However, Ouriginal evidently has more problems with non-English documents. It is very difficult to explain why so many freely available scientific texts from reputable journals (Scientific Reports, Angiogenesis, Annals of Anatomy) are not indexed by Ouriginal.

Failure to detect plagiarism from Helsinki University's own thesis repository
Ouriginal does not even index our own university's freely available PhD theses (https://ethesis.helsinki.fi). None of the PhD theses written in my own lab were recognized. My own PhD thesis was the only one that was recognized, and Ouriginal found it on docslib.org, not in our own university's official thesis repository. Does it make any sense to pay for such an imperfect service? I don't think so. The omission of freely available texts from important journals seems inexcusable, since a simple Google search does a better job. I did Google searches for sentences from the unrecognized texts, and in most cases Google identified the original source without problems. A Python script that splits texts into sentences, runs an individual Google search for each, and aggregates the results could be written by any programmer in an afternoon. Why would you maintain your own database if Google does a better job than you can?
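To make the "afternoon script" claim concrete, here is a minimal sketch of what I mean. The `search` callable is a hypothetical placeholder for whatever web-search backend you have access to (Google's official search APIs require a key and are rate-limited, so I deliberately leave the backend pluggable); the sentence splitter is naive on purpose.

```python
import re
from collections import Counter
from typing import Callable, Iterable


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break on ., ! or ? followed by whitespace,
    # and drop fragments too short to be distinctive search queries.
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if len(p.split()) >= 5]


def find_likely_sources(text: str,
                        search: Callable[[str], Iterable[str]],
                        top_n: int = 5) -> list[tuple[str, int]]:
    # Query each sentence as an exact phrase and count which result
    # URLs recur across sentences; a URL that matches many sentences
    # is a likely source of the text.
    hits: Counter[str] = Counter()
    for sentence in split_sentences(text):
        for url in search(f'"{sentence}"'):
            hits[url] += 1
    return hits.most_common(top_n)
```

Any search backend can be plugged in as `search`; the aggregation step is what turns per-sentence hits into a ranked list of candidate sources.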

Plagiarism to the following published manuscripts/theses/proceedings* was NOT detected:

Plagiarism to the following published manuscripts/theses/proceedings was detected:

Ouriginal's failure extends to MSc theses and they don't care
Our own public MSc theses are not included in Ouriginal's plagiarism scan. What is more serious: Ouriginal doesn't care. About one month after I informed Ouriginal about these blind spots, the same publicly available texts were still not included in their scan. The Ouriginal developers are apparently unable to change their legacy code. At the very least, they should be able to point the software to important text repositories for inclusion in future comparisons. They did not even do that. Either they lack this possibility, or they don't care. Either way, nobody should use their service. There are better alternatives, and perhaps the best alternative is no software at all, but teaching students scientific writing instead.

Free plagiarism detection software alternatives?
Most "free" plagiarism checkers are not really free (e.g. https://www.check-plagiarism.com/ has a 1000-word limit), but I tested them with the first two pages of our own university's public Master's theses, and they mostly recognized the plagiarism, thus outperforming Ouriginal. I also tried the free plagiarism checker by Grammarly, which features a 10000-word limit (usually enough for the recommended manuscript-style MSc thesis). While the free Grammarly version failed to detect any plagiarism, the Pro version found the exact document. Bottom line: there are many free alternatives, and they are at least no worse than Ouriginal. Obviously, I am not the first to test plagiarism detection software (here is a comparison from 2022, which does not include Ouriginal: https://ami.info.umfcluj.ro/index.php/AMI/article/view/904). Based on my own experience, I have to amend the statement of Adithan and Surendiran (2018) that most free plagiarism detection software performs poorly: some commercial offerings perform poorly as well. In that comparison, the best free software was Scribbr (which uses Turnitin's detection software). In fact, it was the only software among those I tested that managed to detect Kukka Aho's Master's thesis from 2006 (available from the ethesis repository of the University of Helsinki).

Do we need plagiarism detection software at all?
The question has been debated extensively elsewhere (e.g. https://theconversation.com/universities-must-stop-relying-on-software-t..., https://doi.org/10.1080/07294360600610438), and I agree with most of what has been said. Using such software creates a false feeling of having done something to prevent plagiarism, while the tool simply does what it does: it makes students submit texts that are not recognized as plagiarism. It neither teaches students to write scientific texts nor makes them understand what scientific writing is about. That would take too much time and cannot easily be outsourced to a software tool. I hope that by the end of this year, our university will discontinue the mandatory use of plagiarism software. After all, we also don't use software to detect the use of AI in students' writing; instead, we try to teach and encourage the transparent and legitimate use of such tools rather than shifting the responsibility to a poorly performing program. And some types of plagiarism are never detected by any of the software tools I have used, namely figures and images!

* Since all entries in the reference list are verbatim, almost every scientific document gets wrongly flagged for plagiarism, depending mostly on how many other publications it cites. It is strange that in 2024, Ouriginal is still unable to recognize the reference list as such and exclude it from the analysis. I checked the above percentages manually: they stem mostly from the reference list, and the original, 100% identical source was not identified by Ouriginal.