It was standing room only for the SES San Jose session regarding Duplicate Content & Multiple Site Issues.
Shari Thurow of Omni Marketing started the session off by addressing why duplicate content is a concern.
Duplicate content lowers the number of pages available to rank. Fewer pages to rank can mean fewer rankings, which can lead to lower search traffic.
In addition, Shari mentioned something called a ‘crawler cap’: the maximum number of pages a search engine will crawl on a particular website.
So, if your site delivers duplicate content:
- it lowers your index count
- your best-converting pages might not appear in search results
- pages from your shared-content partner sites (affiliates, syndicates) may have better search visibility than yours
- it wastes search engine resources
- it delivers a poor searcher experience
Here are some details on how the search engines crawl and index content, and how you can help them crawl the right information so you no longer leave which pages rank to chance.
Crawler: the part of the search engine that fetches web pages. Crawling operates under several limits:
- maximum # of links per page
- less likely to follow links on known duplicate pages
- URL fingerprinting (to recognize known duplicates)
- maximum # of links per site
- crawler cap
- limits on URLs per host per search results page, plus domain restrictions
Shari educated the half techie/half marketer audience on shingle comparison.
Content is broken down into groups of adjacent words, called shingles, which are compared across pages for similarity.
More shared shingles = more similarity, and perhaps duplicate content issues.
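As a rough illustration only (not any engine's actual algorithm), shingle comparison can be sketched in a few lines of Python; the 4-word shingle size and the sample texts are arbitrary choices:

```python
def shingles(text, k=4):
    """Break text into the set of overlapping k-word shingles."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(a, b, k=4):
    """Jaccard similarity of two texts' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

# Hypothetical example: a page and its printer-friendly twin
page = "fluffy bunnies are the best pets for quiet apartments"
printer = "printable fluffy bunnies are the best pets for quiet apartments"
print(similarity(page, printer))  # high similarity (6 of 7 shingles shared)
```

A score near 1.0 flags the pair as likely duplicates; unrelated pages share few or no shingles and score near 0.0.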
Some solutions include:
1. Site level: a robots.txt file lets you tell the search engines which parts of the site to crawl
2. Page level: the robots exclusion meta tag and the canonical link element let you do the same for individual pages
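To make those two solutions concrete, here is a hypothetical sketch (the domain and paths are made up). At the site level, a robots.txt rule can keep crawlers out of a duplicate-generating area such as printer-friendly copies:

```
User-agent: *
Disallow: /print/
```

At the page level, tags in the duplicate page's `<head>` can point engines at the preferred version or keep the copy out of the index:

```html
<link rel="canonical" href="http://www.example.com/headphones.html">
<meta name="robots" content="noindex, follow">
```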
Next up is Marty Weintraub from aimClear.
Let me sum up the energetic presentation with this:
- A Clean URL is a happy URL 🙂
- Canonicalization is a discipline made up of many techniques, all aimed at letting there be 1 page
- The CMS works for the marketer, not the other way around
- Build a bridge between the marketer and the SEO and the C-suite
Sasi Parthasarathy of Bing is next and he asks pretty please that webmasters make changes like those being discussed today so that Bing can do their job better for you and your potential customers. Also, if someone can come up with a better and easier name for canonicalization that would be great!
To help with geo-targeting use the country code top-level domain. For example, .ca.
To help with content syndication, use it with caution. From Bing’s perspective, they want to show the version that is best suited and appropriate for the user, and that may not be the version you prefer.
- Don’t use dynamic URLs if you have static content
- 301 Redirects are your BFF
- No 302 Hijacks
- When you do a site update, don’t have links to expired pages
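As one example of the 301 advice, an Apache .htaccess rule (assuming mod_rewrite is enabled; the domain is hypothetical) that permanently redirects the non-www host to the www version, consolidating the two URL variants into one:

```apache
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

The `R=301` flag is what makes this a permanent redirect rather than a temporary 302.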
Canonical tag observations:
- Still not being actively used
- ~50% of canonical tags are not used correctly
- ~38% use the tag to point to the same page
- ~9% point to pages on another domain (a big NO NO)
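For reference, the usage the panel is describing looks like this: the tag lives in the `<head>` of the duplicate page (the URLs here are hypothetical) and points at the preferred version on the same domain:

```html
<!-- In the <head> of http://www.example.com/headphones?sessionid=123 -->
<link rel="canonical" href="http://www.example.com/headphones">
```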
Greg Grothaus, a Google engineer is next in line and he promises to tackle the Duplicate Content Penalty Myth.
When a searcher queries, Google wants to ensure diversity in the results, so it will try to pick 1 page from those with similar content. For example, if you search for ‘fluffy bunnies’ and a site has two matching pages (the web page and a printer version), Google will likely show the web page. However, the site is not penalized for having a printer-friendly version that also talks about fluffy bunnies.
Google removes sites for spam, but not for duplicate content as a standalone issue.
Below is a list of issues, not penalties:
- Dilution of link popularity
- Backlinks pointing to several URL versions of same content
- User-unfriendly URLs in search results
- URLs with useless parameters may offset branding efforts and decrease usability
- Inefficient crawling
- Take note – the more time they spend crawling duplicate content, the less time they have to find new content
Ivan Davtchev of Yahoo! polls the audience: how much of the web is duplicate content? Answer – more than one-third.
3 Types of Duplication
- Session IDs in URLs
- Soft 404s
- Replicating content across multiple domains
  - Scraper spammers
  - Bulk content duplication
How do you figure out if duplicate content issues are a problem on your site?
Greg – start with advanced search queries and see if many URLs you didn’t expect show up in the results. Also, check Google Webmaster Tools.
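For instance, a couple of advanced queries along those lines (with example.com standing in for your own domain) might look like:

```
site:example.com inurl:sessionid
site:example.com "your product name"
```

If the first returns pages you never meant to have indexed, or the second returns many near-identical URLs, you likely have a duplicate content issue worth cleaning up.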
Q: Let’s say you launch a new product and hundreds of partner sites leverage that new content across their sites as well. What then?
Greg – if left to Google, we will pick the page that we think is best. But this may not be your preferred page.
Something you can do is make sure all the syndicates are linking back to you as the original author.
Sasi – have the syndicate sites add to the original content to make it unique.
Q: What do you do with content across multiple sites, manufacturers, distributors?
Shari – It’s all about context. What is the context of the search? Are they looking for a place to purchase the headphones, or to distribute them? Duplicate content may be inevitable, but don’t abuse it.
PJ Fusco, Natural Search Director, Netconcepts
Shari Thurow, Founder & SEO Director, Omni Marketing Interactive
Greg Grothaus, Search Quality Team, Google
Marty Weintraub, President, aimClear
Sasi Parthasarathy, Program Manager, Bing
Ivan Davtchev, Lead Product Manager, Search Relevance, Yahoo! Search
Learn more search marketing and social media strategies from 2009 SES SJ coverage by TopRank Online Marketing