Creative Duplicate Content Solutions for Enterprise


Duplicate content issues plague websites small and large, from print versions and session IDs to content sharing through distribution partnerships and locally targeted portals. This post focuses on the latter, outlining solutions for enterprise-scale duplicate content issues. Essentially, duplicate content is a significant block of content that appears on multiple URLs and completely or closely matches content elsewhere. The search engines strive to index and present to users only one version of each piece of content.

Google says, "Our users typically want to see a diverse cross-section of unique content when they do searches."

Makes sense. Today, I'm focusing on the challenges of duplicate content when the driving factor is a business need, not a technical problem. Imagine the following scenarios:

  • A national media company owns a variety of local newspapers and the related websites. Certain features like Global Warming, Summer Olympics, and Politics are written by a single group and shared amongst the daily papers. Reasonably, each paper would also like to host the content on their website.
  • Several niche job boards are owned and operated by a single company. One website is focused on Sales and Marketing jobs and another on California jobs. An employer posts a Marketing Manager position on the California jobs website. Shouldn't the job board cross-post it to the Sales & Marketing job board?
  • A general classified ad site powers white-label classified ad functionality and content for niche websites, such as a site focused on NYC real estate. There is clear benefit to sharing targeted content and reaching the additional audience of the NYC real estate site.

As you can see, there are valid business reasons for posting the content in more than one location: customer retention, revenue, branding, usability, and more. As enterprise-aware SEOs, we are tasked with finding effective solutions for content sharing that position our sites with strength in the search engines without compromising an effective business model. The way I see it, pages can be classified into five groups: unique, master, satellite, pass-through, and disallow. Below I describe the attributes and technical treatments for each group.

Unique

A page classified with the unique treatment has independent SEO value. The majority of the text is not replicated elsewhere (on any page, anywhere on the web). A unique page is the single canonical source of valuable information.

Technical treatment: none.

Master

A page classified with the master treatment has independent SEO value. It is the strongest single selection among two or more duplicate pages. Master pages are the recipients of transitive value from pages with the satellite treatment.

Technical treatment: none.

Satellite

A page classified with the satellite treatment has transitive SEO value. It has been deemed weaker than the master version among two or more duplicate pages.

Technical treatment (see the sketch after the list):

  • Include a Meta Directive: <meta name="robots" content="noindex,follow">
  • Link to the related master page, using the destination page's H1 or title tag as the anchor text.
  • Add a rel=nofollow directive to all links (except for the one to the master page) to maximize link flow.
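For illustration, here is a minimal sketch of what a satellite page's markup might look like under this treatment. The URLs, titles, and link targets are hypothetical; substitute your own master page and navigation.

    <head>
      <title>Acme Widgets - Partner Edition</title>
      <!-- Keep this duplicate out of the index, but let crawlers follow its links -->
      <meta name="robots" content="noindex,follow">
    </head>
    <body>
      <!-- Followed link to the master version, using the master's title as anchor text -->
      <a href="http://www.example.com/products/acme-widgets/">Acme Widgets</a>

      <!-- All other links are nofollowed to concentrate link flow on the master -->
      <a href="/about/" rel="nofollow">About</a>
      <a href="/contact/" rel="nofollow">Contact</a>
    </body>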

Disallow

A page classified with the disallow treatment has no SEO value. These pages often exist behind a log-in or do not serve well as entry points to the website (such as terms, privacy, and print pages).

Technical treatment (see the robots.txt sketch after the list):

  • Disallow the page in the robots.txt file if possible; otherwise, add the meta directive <meta name="robots" content="noindex,follow">.
  • Add a rel=nofollow directive to internal links pointing to the page.
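As an example, a robots.txt sketch for disallow-treated pages, assuming hypothetical /print/, /terms/, and /privacy/ paths (substitute the paths your site actually uses):

    User-agent: *
    Disallow: /print/
    Disallow: /terms/
    Disallow: /privacy/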

Pass-Through

A page classified with the pass-through treatment has transitive SEO value. Though not necessarily a duplicate page (as with the satellite classification), pass-through pages are usually the output of a search query with a practically infinite number of possible results (and URLs).

Technical treatment (see the sketch after the list):

  • Include a Meta Directive: <meta name="robots" content="noindex,follow">
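For example, a minimal sketch of the head of an internal search-results page under this treatment; the URL, title, and query parameters are hypothetical. The tag keeps the endless query permutations out of the index while still letting crawlers follow links to the individual listing pages:

    <!-- e.g. http://jobs.example.com/search?q=marketing&location=california&page=7 -->
    <head>
      <title>Search results: marketing jobs in California</title>
      <meta name="robots" content="noindex,follow">
    </head>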

The next step is to assign each page to one, and only one, of the above groups. I'll cover the finer points of making the selection in a subsequent post.
