Many sites, from large content farms to much smaller publishers, recently got hit by the Google algorithm update that is alternately being called “Farmer”, for obvious reasons, and “Panda”, Google’s internal name. The update was intended to weed out the thin content mills that crank out hundreds to thousands of pieces of content per day in an attempt to rank for a myriad of long-tail keyword phrases, drive organic traffic to the site, and then monetize that traffic, usually through CPM advertising or contextual PPC ads. (See the fantastic graphic below from SEOBook for a history of some of the factors behind the update.)
In many cases sites with “good” content were punished while those with thin content, like eHow, seem to have survived. A few factors seem to have been at work where so-called “good” content got hit. I’ve seen evidence of sites being hit where their content has been re-published or even scraped by other sites. Duplicate or near-duplicate content within and across sites seems to have had a large impact. Indeed, one site which I’ve always considered to have pretty good quality content, AskTheBuilder, got hit hard. This is likely because the author publishes his content in exact duplicate form not only on his own site but also across several other blogs and a newspaper site. That last one is the killer.
Google does not do a good job of figuring out what the “original” version of a piece of content is. Even if an article was published on the AskTheBuilder blog first, publishing it on the newspaper site with its much more authoritative domain will cause Google to attribute the piece to the news site as the canonical source. Publication date seems to matter much less at this point as a signal of which location is authoritative than the trust and authority of the site. At the very least, if you must publish duplicate versions of your content, either tell the crawlers to index only one of the versions or use a cross-domain canonical tag.
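As a rough illustration of those two options, here is a minimal Python sketch that inspects a republished page and reports whether it declares a canonical URL or a robots `noindex` directive. The class and function names and the `original.example` URL are made up for the example, and it only checks that the markup is present; it says nothing about how Google will actually weigh it.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collect the canonical URL and robots directives declared by a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel_values = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values:
            # <link rel="canonical" href="..."> names the preferred version.
            self.canonical = attrs.get("href")
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            # <meta name="robots" content="noindex, follow"> style directives.
            content = attrs.get("content") or ""
            self.robots = [p.strip().lower() for p in content.split(",") if p.strip()]


def duplicate_handling(html):
    """Return the duplicate-content signals found in an HTML document."""
    parser = HeadSignals()
    parser.feed(html)
    parser.close()
    return {"canonical": parser.canonical, "noindex": "noindex" in parser.robots}


# Hypothetical syndicated copy that points back at the original article.
syndicated_copy = """
<html><head>
<link rel="canonical" href="https://original.example/articles/deck-stain">
</head><body><p>Republished article text.</p></body></html>
"""

result = duplicate_handling(syndicated_copy)
```

Run against each syndicated copy, a check like this makes it easy to spot a republished article that carries neither signal.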
The other saving grace for some pages and sites actually seems to be social media. At Pubcon Austin 2011 we heard about an undisclosed, very large site which had thousands of pages hit by the Farmer / Panda update; the pages that were saved were those whose URLs had been tweeted from authoritative Twitter accounts. For the moment it seems to be only Twitter that can help the rankings of pages, as Google has announced that, as of today at any rate, Facebook Likes do not affect search rankings. I’ve seen no proof as yet, but eHow’s level of activity in social media may be what helped save it. Unlike some others, I do not believe the conspiracy theories that the fix was in and Google gave eHow a pass.
One factor that I have noted, and that few seem to be talking about, is the link-to-content ratio on the affected pages, particularly where there are few followed, authoritative outgoing links. One thing eHow does that many of the affected sites do not is limit the number of internal and ad links on its pages to a relatively small number. When I look at some sites that didn’t fare so well in the update I see very different patterns. Mahalo, for instance, has a reasonable number of ad links but huge numbers (sometimes 30-40) of internal links per article. Suite101 has the opposite: relatively few internal links but up to 20-30 AdSense links per page. Even AskTheBuilder has a relatively large number of AdSense links and a large number of unmasked affiliate links in the content. None of these sites are apt to link out to other sites which are authorities. I cannot help but think that the ratios of internal/ad/affiliate/external links to each other and to the quantity of copy had some effect on which sites lost rankings and which did not. It’s pretty reasonable to assume (IMHO) that Google would treat sites with large numbers of advertising or internal links as “spammier” than those without.
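If you want to eyeball these ratios on your own pages, something like the following Python sketch is a starting point. It is only my guess at a useful measurement, not anything Google has published: it counts internal, followed external, and nofollowed external links in a page’s static HTML and compares the total to the word count. The names and the sample page are invented for illustration, and ad units that are injected by script would not show up in a static count like this.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkProfile(HTMLParser):
    """Count links by type and visible words in a page's static HTML."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.host = urlparse(page_url).netloc.lower()
        self.counts = {"internal": 0, "external_followed": 0, "external_nofollow": 0}
        self.words = 0
        self._skip = 0  # depth inside <script>/<style>, whose text is not copy

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
            return
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        target = urlparse(urljoin(self.page_url, href))
        if target.scheme not in ("http", "https"):
            return  # ignore mailto:, javascript:, and similar
        if target.netloc.lower() == self.host:
            self.counts["internal"] += 1
        elif "nofollow" in (attrs.get("rel") or "").lower().split():
            self.counts["external_nofollow"] += 1
        else:
            self.counts["external_followed"] += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.words += len(re.findall(r"\w+", data))


def link_profile(page_url, html):
    """Return link counts, word count, and links per 100 words of copy."""
    parser = LinkProfile(page_url)
    parser.feed(html)
    parser.close()
    total = sum(parser.counts.values())
    per_100 = 100 * total / parser.words if parser.words else 0.0
    return {**parser.counts, "words": parser.words, "links_per_100_words": per_100}


# Hypothetical article page with one link of each kind.
sample = """
<body>
<p>Seal the deck before winter. See the <a href="/guides/sealers">sealer guide</a>
and <a href="https://authority.example/wood-care">this reference</a>.</p>
<p><a rel="nofollow" href="https://shop.example/?aff=123">Buy sealer</a></p>
<script>var ignored = "not page copy";</script>
</body>
"""

profile = link_profile("https://site.example/articles/deck", sample)
```

Comparing these numbers across pages that kept their rankings and pages that lost them would be one way to test the hunch, though any threshold you pick is your own assumption.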
The moral of the story here seems to be: