Jinfo: Detecting online plagiarism

Jinfo Blog

8th August 2012

Abstract

Every organisation that publishes content on the web needs to be aware of the potential for plagiarism and how to stay alert to it. Plagiarism can damage your organisation's reputation and put revenue at risk.

Item

Plagiarism involves using somebody else’s work and claiming authorship without crediting the original author. It differs from copyright infringement, which is the publication of work without the rights holder’s permission. Copyright infringement may or may not involve the misattribution of authorship.

Online plagiarism (or website plagiarism) is the copying of website or blog content and passing it off as original material on another website. Generally this also involves a breach of copyright. The problem for content owners is detecting online plagiarism and stopping it.

Detecting plagiarised content can be hard work. Nevertheless, there are approaches that can catch obvious plagiarists.

One approach is to search for unique phrases using Google or another search engine. As an example, our website includes the phrase “No business is an island”. Google lists over 100,000 hits for this phrase. However, a few sentences later we use the phrase “businesses are at war”. Adding this second phrase to the search gave 20 or so results, almost all of which came from sites that took the material from our website. The search can be automated using Google Alerts, which sends a notification when new content matching the phrases appears.
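The phrase-combination search described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes the search engine treats double-quoted text as an exact phrase and accepts the query in a `q` parameter, as Google's public search page does.

```python
from urllib.parse import urlencode


def build_phrase_query(phrases):
    """Join phrases into one query, each wrapped in double quotes for exact matching."""
    return " ".join('"{}"'.format(p.strip()) for p in phrases)


def build_search_url(phrases, base="https://www.google.com/search"):
    """Return a search URL for the combined exact-phrase query.

    The base URL is an assumption; swap in another search engine as needed.
    """
    return base + "?" + urlencode({"q": build_phrase_query(phrases)})


if __name__ == "__main__":
    phrases = ["No business is an island", "businesses are at war"]
    print(build_phrase_query(phrases))
    print(build_search_url(phrases))
```

Choosing two or three distinctive phrases that sit close together in the original text keeps the result list short enough to review by hand.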

Another approach to discovering plagiarisers is to use a dedicated tool such as Copyscape. With Copyscape, you enter the URL to be checked and the site returns a list of sites that have copied content: up to 10 in the free version and an unlimited number in the paid version (which also allows a batch search of a complete website). Copyscape also offers an alerting service. TinEye is a similar service that can be used to find copied images.
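Once a tool or search has flagged a suspect page, it can help to quantify how much of your text it reproduces. The sketch below is one simple way to do that, using overlapping word sequences ("shingles"). It is not a description of how Copyscape works, which this article does not cover; the function names and the choice of five-word sequences are assumptions made for illustration.

```python
import re


def shingles(text, k=5):
    """Return the set of k-word sequences in text, lowercased, punctuation ignored."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def containment(original, suspect, k=5):
    """Fraction of the original's k-word sequences that also appear in the suspect text."""
    source = shingles(original, k)
    if not source:
        # The original is shorter than k words, so there is nothing to compare.
        return 0.0
    return len(source & shingles(suspect, k)) / len(source)


if __name__ == "__main__":
    ours = "No business is an island and businesses are at war"
    theirs = "As we always say, no business is an island and businesses are at war."
    print(containment(ours, theirs))
```

A score near 1.0 suggests that most of the original passage appears word for word in the suspect text; a low score does not rule out paraphrased copying.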

Identifying plagiarism is only the first step. Getting the plagiariser to remove the copied content is harder. Firstly, you need proof that the content has been copied and that your content pre-dates the copy. (Sites like the Wayback Machine at Archive.org are useful for this.)
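Looking up an archived copy of a page can be scripted. The sketch below assumes the Internet Archive's Wayback Machine availability endpoint (`https://archive.org/wayback/available`) and its JSON response layout with an `archived_snapshots.closest` entry; the function names are hypothetical. The endpoint returns the snapshot closest to the requested timestamp, which is not guaranteed to be the earliest capture, so treat the result as a starting point for gathering evidence.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine availability service.
API = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp="19960101"):
    """Build the lookup URL; an early timestamp asks for a snapshot near the start of the archive."""
    return API + "?" + urlencode({"url": page_url, "timestamp": timestamp})


def closest_snapshot(response_text):
    """Return (timestamp, snapshot_url) from a JSON response, or None if no snapshot is available."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


def fetch_closest_snapshot(page_url, timestamp="19960101"):
    """Query the service over the network and parse the reply."""
    with urlopen(availability_url(page_url, timestamp), timeout=10) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

Running the same lookup for your own page and for the suspect page gives two capture dates to compare. A capture date shows only that the page existed when it was archived, not when it was first published.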

Contacting plagiarisers sometimes works, and the copied content is removed. More usually, follow-up action is required. The hosting provider can be informed that the site is infringing copyright, and search engines can be asked not to include the infringing pages in their indexes. The final recourse is legal action, which can be expensive. Ultimately, the approach taken should weigh the threat posed by the copied material against the effort required to get it removed.

This is a short version of a longer article on the same topic available as part of the FreePint Subscription. The longer article discusses in more detail how to identify and protect against plagiarism, as well as the risks an organisation faces by ignoring it.
