It was standing room only for the SES San Jose session regarding Duplicate Content & Multiple Site Issues.
Shari Thurow of Omni Marketing started the session off by addressing why duplicate content is a concern.
Duplicate content lowers the number of pages available to rank. Fewer pages to rank can mean fewer rankings, which can lead to lower search traffic.
In addition, Shari mentioned something called a ‘crawler cap’: the maximum number of pages a search engine will crawl on a particular website.
So, if your site delivers duplicate content:
- it lowers your index count
- your best-converting pages might not appear in search results
- pages from your shared-content partner sites (affiliates, syndicates) may have better search visibility than yours
- it wastes search engine resources
- it delivers a poor searcher experience
Here are some details on how the search engines crawl and index content, and how you can help them crawl the right information so you no longer leave which pages rank to chance.
Crawler: the part of the search engine that fetches web pages. Crawling operates under several limits:
- maximum # of links per page
- less likely to follow links on known duplicate pages
- URL fingerprinting (to recognize known duplicates)
- maximum # of links per site
- crawler cap
- limits on URLs per host per search results page, plus domain restrictions
Shari educated the half techie/half marketer audience on shingle comparison.
Content is broken down into groups of adjacent words, called shingles, which are compared across pages for similarity.
More shared shingles = more similarity, and perhaps duplicate content issues.
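As a rough illustration only (not any engine's actual algorithm), shingle comparison can be sketched in a few lines of Python; the 4-word shingle size and the sample texts are arbitrary choices:

```python
def shingles(text, k=4):
    """Break text into the set of overlapping k-word shingles."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(a, b, k=4):
    """Jaccard similarity of two texts' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

# Hypothetical example: a page and its printer-friendly twin
page = "fluffy bunnies are the best pets for quiet apartments"
printer = "printable fluffy bunnies are the best pets for quiet apartments"
print(similarity(page, printer))  # high similarity (6 of 7 shingles shared)
```

A score near 1.0 flags the pair as likely duplicates; unrelated pages share few or no shingles and score near 0.0.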
Some solutions include:
1. Site level: a robots.txt file lets you tell the search engines which parts of the site to crawl
2. Page level: the robots exclusion meta tag and the canonical link element let you do the same for individual pages
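To make those two solutions concrete, here is a hypothetical sketch (the domain and paths are made up). At the site level, a robots.txt rule can keep crawlers out of a duplicate-generating area such as printer-friendly copies:

```
User-agent: *
Disallow: /print/
```

At the page level, tags in the duplicate page's `<head>` can point engines at the preferred version or keep the copy out of the index:

```html
<link rel="canonical" href="http://www.example.com/headphones.html">
<meta name="robots" content="noindex, follow">
```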
Next up is Marty Weintraub from aimClear.
Let me sum up the energetic presentation with this:
- A Clean URL is a happy URL 🙂
- Canonicalization is a discipline made up of many techniques, all aimed at letting there be 1 page
- The CMS works for the marketer, not the other way around
- Build a bridge between the marketer and the SEO and the C-suite
Sasi Parthasarathy of Bing is next and he asks pretty please that webmasters make changes like those being discussed today so that Bing can do their job better for you and your potential customers. Also, if someone can come up with a better and easier name for canonicalization that would be great!
To help with geo-targeting use the country code top-level domain. For example, .ca.
To help with content syndication, use it with caution. From Bing’s perspective, they want to show the version that is best suited and appropriate for the user, and that may not be the version you prefer.
- Don’t use dynamic URLs if you have static content
- 301 Redirects are your BFF
- No 302 Hijacks
- When you do a site update, don’t have links to expired pages
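As one example of the 301 advice, an Apache .htaccess rule (assuming mod_rewrite is enabled; the domain is hypothetical) that permanently redirects the non-www host to the www version, consolidating the two URL variants into one:

```apache
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

The `R=301` flag is what makes this a permanent redirect rather than a temporary 302.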
Canonical tag observations:
- Still not being actively used
- ~50% of canonical tags are not used correctly
- ~38% use the tag to point to the same page
- ~9% point to pages on another domain (a big NO NO)
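For reference, the usage the panel is describing looks like this: the tag lives in the `<head>` of the duplicate page (the URLs here are hypothetical) and points at the preferred version on the same domain:

```html
<!-- In the <head> of http://www.example.com/headphones?sessionid=123 -->
<link rel="canonical" href="http://www.example.com/headphones">
```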
Greg Grothaus, a Google engineer is next in line and he promises to tackle the Duplicate Content Penalty Myth.
When a searcher queries, Google wants to ensure diversity in the results, so it will try to pick 1 page from those with similar content. For example, if you search for ‘fluffy bunnies’ and a site has two matching pages (the web page and a printer version), Google will likely show the web page. However, the site is not penalized for having a printer-friendly version that also talks about fluffy bunnies.
Google removes sites for spam, but not for duplicate content as a standalone issue.
Below is a list of issues, not penalties:
- Dilution of link popularity
- Backlinks pointing to several URL versions of same content
- User-unfriendly URLs in search results
- URLs with useless parameters may offset branding efforts and decrease usability
- Inefficient crawling
- Take note – the more time they spend crawling duplicate content, the less time they have to find new content
Ivan Davtchev of Yahoo! polls the audience: how much of the web is duplicate content? Answer – more than one-third.
3 Types of Duplication
- Session IDs in URLs
- Soft 404s
- Replicating content across multiple domains
  - Scraper spammers
  - Bulk content duplication
How do you figure out if duplicate content issues are a problem on your site?
Greg – start with advanced search queries and see if many URLs you didn’t expect show up in the results. Also, check Google Webmaster Tools.
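For instance, a couple of advanced queries along those lines (with example.com standing in for your own domain) might look like:

```
site:example.com inurl:sessionid
site:example.com "your product name"
```

If the first returns pages you never meant to have indexed, or the second returns many near-identical URLs, you likely have a duplicate content issue worth cleaning up.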
Q: Let’s say you launch a new product and hundreds of partner sites leverage that new content across their sites as well. What then?
Greg – if left to Google, we will pick the page that we think is best. But this may not be your preferred page.
Something you can do is make sure all the syndicates are linking back to you as the original author.
Sasi – have the syndicate sites add to the original content to make it unique.
Q: What do you do with content across multiple sites, manufacturers, distributors?
Shari – It’s all about context. What is the context of the search? Are they looking for a place to purchase the headphones, or to distribute them? Duplicate content may be inevitable, but don’t abuse it.
PJ Fusco, Natural Search Director, Netconcepts
Shari Thurow, Founder & SEO Director, Omni Marketing Interactive
Greg Grothaus, Search Quality Team, Google
Marty Weintraub, President, aimClear
Sasi Parthasarathy, Program Manager, Bing
Ivan Davtchev, Lead Product Manager, Search Relevance, Yahoo! Search
Learn more search marketing and social media strategies from 2009 SES SJ coverage by TopRank Online Marketing