The second session on my liveblogging hit list for SES Chicago is “Duplicate Content and Multiple Sites,” moderated by Adam Audette. Unfortunately, Michael Gray was not able to be there, so things started up with Susan Moskwa, Webmaster Trends Analyst from Google, and Shari Thurow was added on the fly.
What is duplicate content? Identical or substantially similar content. Also multiple URLs with the same content. Google realizes that duplication can be deliberate or accidental.
Basically, duplicate content in the context of a search engine is publishing different URLs that present the same content. (My Example:)
Why does Google care about duplicate content? Users don’t like to see 10 nearly identical results. Also, there’s no benefit in crawling multiple urls with the same content. It’s a waste of resources for Googlebot to do that.
How does Google handle duplicate content? Google will filter results at serve time, collapsing the duplicates and then showing the most relevant page in the SERPs for the query. This will likely be the page where the content originated. The original page is determined by considering factors like age and authority. This is not an exact science, and the factors used to determine the “original” could change with the query. Think about what people are searching for and what they expect to see.
The myth of the duplicate content penalty. A lot of people think that if they have duplicate content, they’ll be penalized. In most cases, Google does not penalize sites for accidental duplication. Many, many, many sites have duplicate content.
Google may penalize sites for deliberate or manipulative duplication. For example: auto-generated content, link networks or similar tactics designed to be manipulative. Don’t be afraid of duplicate content; be informed.
Tactics for controlling what version of your content does get shown:
- Control which version of your content is shown, especially if you don’t like the version shown in search.
- If you aggregate all the duplicates and redirect to a canonical page, then it consolidates the ranking signals. It also makes crawling of your site more efficient by focusing on canonical URLs rather than the duplicates.
- You should establish contracts (when syndicating) that require linking back to the original version.
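To make the redirect idea concrete, here is a minimal sketch of a redirect map that sends every known duplicate to one canonical path with a 301. The paths and the `respond` helper are hypothetical and only for illustration; a real site would do this in its web server or framework configuration.

```python
# Hypothetical map from duplicate paths to the one canonical path.
REDIRECTS = {
    "/index.html": "/",
    "/home": "/",
    "/products.php?id=7": "/products/7",
}


def respond(path: str) -> tuple[int, dict[str, str]]:
    """Return (status, headers) for a requested path.

    Known duplicates get a permanent (301) redirect to the canonical
    path; everything else is served normally with a 200.
    """
    target = REDIRECTS.get(path)
    if target is not None:
        return 301, {"Location": target}
    return 200, {}
```

Because every duplicate points at the same target, links to any of the variants end up pointing at one URL, which is the consolidation described above.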
When should you not care about duplicate content?
- Common, minimal duplication.
- When you think the benefit outweighs potential ranking concerns. Consider your cost of fixing the duplicate content situation vs. the benefit you would receive.
- Remember: duplication is common and search engines can handle it.
Ways to indicate your preferred version of content to Google:
- Set preferred domain in Google Webmaster Tools
- Include your preferred URLs in your sitemap
- Always use your preferred URLs when linking to your site
- Use 301 redirects
- Use rel="canonical"
- Google’s “Parameter handling” tool (now has default values)
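Several of the items above come down to picking one preferred URL and using it everywhere. The sketch below shows one way that could look in code: normalize a URL to a preferred scheme and host, drop parameters that don't change the content, and emit a rel="canonical" link tag. The host, scheme, and ignored parameter names are assumptions made up for the example, not anything Google prescribes.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed for illustration: the site's preferred scheme and host, and
# the query parameters that do not change what the page shows.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"
IGNORED_PARAMS = {"sessionid", "utm_source", "utm_medium", "utm_campaign"}


def canonical_url(url: str) -> str:
    """Map any variant of a URL onto the single preferred form."""
    parts = urlsplit(url)
    # Keep only parameters that affect content, in a stable order.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    path = parts.path or "/"
    # The fragment is dropped because it is never sent to the server.
    return urlunsplit(
        (PREFERRED_SCHEME, PREFERRED_HOST, path, urlencode(query), "")
    )


def canonical_link_tag(url: str) -> str:
    """Build the <link> element to place in the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'
```

The same `canonical_url` helper could feed the sitemap and internal links, so all of the signals agree on one version.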
Google doesn’t recommend blocking duplicate URLs with robots.txt because if they can’t crawl a URL they have to assume it’s unique. It’s better to let everything get crawled and to clearly indicate which URLs are duplicates. A number of SEOs including Shari, who also spoke on this panel, do not agree with Google on this.
Robots.txt controls crawling, not indexing. Google may index something (because of a link to it from an external site) but not crawl it. That can create a duplicate content issue.
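Here is a small sketch of the crawling side of that distinction, using Python's standard `urllib.robotparser`. The robots.txt rules and URLs are made up for the example. A crawler that honors these rules never fetches the blocked page, so it cannot see a redirect or a rel="canonical" on it either.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the printer-friendly duplicates.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url: str, agent: str = "Googlebot") -> bool:
    """True if the robots.txt rules above allow this agent to fetch url."""
    return parser.can_fetch(agent, url)
```

With these rules, `may_crawl("http://www.example.com/print/article-1")` is false while the regular article URL is allowed. Nothing in robots.txt says "don't index", which is why a blocked URL can still show up if someone links to it.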
International site duplicate content issues:
- The same ideas in different languages are not considered duplicate content.
- If you have the same content in one language for multiple countries: Consider having one page with the content they share, and separate pages for what they don’t share, such as price, address, etc. Highlight what’s local or unique on each page. Geotarget different folders or sub-domains to indicate that they target different users.
International sites: New tag, rel="alternate".
Indicate when a page is a duplicate of another except for the navigation. Check out the Google blog post on this.
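As a rough illustration of what that markup could look like, the sketch below builds rel="alternate" link tags for a set of language versions. Pairing rel="alternate" with the standard HTML `hreflang` attribute is my assumption here (the session notes only mention the tag name), and the language codes and URLs are invented, so check the Google blog post for the exact recommended form.

```python
from html import escape


def alternate_link_tags(versions: dict[str, str]) -> list[str]:
    """Build one <link rel="alternate"> tag per language version.

    `versions` maps a language code to the URL of that version.
    Tags are returned sorted by language code for stable output.
    """
    return [
        '<link rel="alternate" hreflang="{}" href="{}">'.format(
            escape(lang, quote=True), escape(url, quote=True)
        )
        for lang, url in sorted(versions.items())
    ]
```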
Takeaways: Avoid knee-jerk reactions to duplicate content. Understand the effects and take informed action. Don’t robots.txt out duplicate URLs; instead, indicate which ones are duplicates. Content matters more than the domain name when it comes to duplicate content signals.
Shari Thurow stepped in at the last minute to fill in after Susan and talked about duplicate content from a user experience perspective as well as SEO. Essentially, when you fix duplicate content issues, it results in your best pages being delivered to users in the search results.
Having duplicate content lowers your index count, which means fewer pages are available to rank well in the search results. A consequence of that is your best converting pages might not appear for people searching.
Shari says that if you have too many links on a page, then the page doesn’t seem focused on content. If there are too few links to a page, it seems unimportant, so be sure to have a balance and focus on what would be most useful to a reader.
Here’s a (partial) checklist for duplicate content on a website:
- Boilerplate similarity
- Are linkage properties the same or similar (inbound or outbound)?
- Does the same company control the content of both pages?
- Are the same shingles (word sets) used on both pages?
If the answer to these questions is yes, then you have duplicate content.
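The shingle check is the easiest one on that list to try yourself. Below is a minimal sketch of word shingling with Jaccard similarity, a common way to compare two texts. The shingle size of 3 is an arbitrary choice, and this is not a claim about how any search engine actually scores pages.

```python
def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word sequences of length `size`."""
    words = text.lower().split()
    if len(words) < size:
        # Text shorter than one shingle: treat it as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str, size: int = 3) -> float:
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    set_a, set_b = shingles(a, size), shingles(b, size)
    if not set_a and not set_b:
        return 1.0  # two empty texts are trivially identical
    return len(set_a & set_b) / len(set_a | set_b)
```

A score near 1.0 means the two pages share almost all of their word sequences; a score near 0.0 means they share almost none. Where to draw the line for "duplicate" is a judgment call.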
Duplicate content solutions:
- Information architecture, site navigation, and page interlinking.
- Are URLs linked to consistently throughout the site or the site network?
- Are the links labeled consistently?
- Are you preventing the web page from being spidered with robots.txt?
- If articles are shared across your network of sites are you implementing the rel=canonical tag?
- Use 301 redirects from duplicate versions to the original
- Use the rel=nofollow attribute where appropriate. For ecommerce sites, put it on your “add to cart” buttons
- Use Webmaster Tools and provide the engine a sitemap (Note: don’t robots.txt out a page and then include it in an XML sitemap)
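The sitemap note in that last item is easy to check automatically. Here is a minimal sketch, using only the Python standard library, that lists sitemap URLs your own robots.txt blocks. The function name and the default "Googlebot" agent are my choices for illustration; the namespace is the standard sitemaps.org one.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def blocked_sitemap_urls(
    sitemap_xml: str, robots_txt: str, agent: str = "Googlebot"
) -> list[str]:
    """Return sitemap URLs that robots.txt forbids `agent` to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    root = ET.fromstring(sitemap_xml)
    urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    # A URL listed in the sitemap but blocked from crawling sends
    # the engine two contradictory signals.
    return [url for url in urls if not parser.can_fetch(agent, url)]
```

An empty list means the sitemap and robots.txt agree; anything returned is a page you are both advertising and blocking.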
Audience Question: Noticed someone was planning on duplicating an entire site using different language navigation. What to do?
Answer: File a Digital Millennium Copyright Act (DMCA) complaint with Google.