Google Algorithm Problems – Have you noticed anything different with Google lately?

The Webmaster community certainly has, and if recent discussions on several SEO forums are any indication, Webmasters are very frustrated.

For approximately two years, Google has been introducing a series of algorithm and filter changes that have led to unpredictable search engine results. Many clean (non-spam) websites have been dropped from the rankings. Google updates used to be monthly, but they are now quarterly.

Now, with so many servers, several different sets of search results can be rolling through the data centers at any time during a quarter. Part of this is due to the recent Big Daddy update, which is more of a Google infrastructure update than an algorithm update.

We believe Big Daddy uses a 64-bit architecture. Pages seem to drop from a first-page ranking to a spot on the 100th page, or worse, to the Supplemental index. Google algorithm changes began in November 2003 with the Florida update, which is now regarded as a legendary event in the Webmaster community. Then came updates named Austin, Brandy, Bourbon, and Jagger. Now we’re dealing with Big Daddy!

The algorithm problems seem to fall into four categories. There are canonical issues, duplicate content issues, the Sandbox, and supplemental page issues.

1. Canonical Issues: These occur when a search engine treats www.yourdomain.com, yourdomain.com, and yourdomain.com/index.html all as different websites. When Google does this, it then flags the other copies as duplicate content and penalizes them.

Also, if the site is not penalized and has an http://yourdomain.com address, but all the websites linking to your site use www.yourdomain.com, then the version without the www will have no ranking. These are fundamental issues that other major search engines, such as Yahoo and MSN, have no problem dealing with. Google is possibly the greatest search engine in the world, ranking itself a 10 on a scale of 1 to 10. It provides tremendous results for a wide range of topics, yet it cannot resolve some fundamental indexing issues.
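To make the canonical problem concrete, here is a minimal Python sketch of how the three variants mentioned above could be folded into one preferred address. It is only an illustration: the preferred host, the list of default document names, and the function name canonical_url are assumptions for this example, and nothing here reflects how Google itself resolves these URLs.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed preferences for this sketch; a real site would choose its own.
BARE_HOST = "yourdomain.com"
PREFERRED_HOST = "www.yourdomain.com"
DEFAULT_DOCUMENTS = ("index.html", "index.htm")


def canonical_url(url: str) -> str:
    """Map common variants of the same page onto one preferred URL."""
    if "://" not in url:
        # urlsplit only recognizes the host when a scheme is present.
        url = "http://" + url
    parts = urlsplit(url)

    host = parts.netloc.lower()
    if host in (BARE_HOST, PREFERRED_HOST):
        host = PREFERRED_HOST

    # Treat an empty path and a default document as the directory itself.
    path = parts.path or "/"
    for doc in DEFAULT_DOCUMENTS:
        if path.endswith("/" + doc):
            path = path[: -len(doc)]
            break

    return urlunsplit((parts.scheme, host, path, parts.query, ""))


for variant in ("www.yourdomain.com",
                "yourdomain.com",
                "http://yourdomain.com/index.html"):
    print(variant, "->", canonical_url(variant))
```

All three variants come out as the same address. On a live site, the usual way to apply the same idea is a permanent (301) redirect from the non-preferred variants to the preferred one, so that every incoming link points at a single URL.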

2. The Sandbox: This has become a legend in the search engine world. It appears that websites, or links to them, are “sandboxed” for a period before they are given full rank in the index, much like a maturing process.

Some even think it is only applied to a set of competitive keywords, because they were the ones being manipulated the most. The existence of the Sandbox is debated, and Google has never officially confirmed it. The hypothesis behind the Sandbox is that Google knows it takes someone time to create a 100,000-page website, so it has implemented a type of time penalty for new links and sites before fully adding them to the index.

3. Duplicate Content Issues: These have become a significant issue on the Internet. Because web pages drive search engine rankings, black hat SEOs (search engine optimizers) began duplicating entire sites’ content under their domain name, thereby instantly creating a large number of web pages (an example of this is downloading an Encyclopedia onto your website).

As a result of this abuse, Google aggressively attacked duplicate content abusers with its algorithm updates. But in the process, they knocked out many legitimate sites as collateral damage. One example occurs when someone scrapes your website. Google sees both sites and may determine that the legitimate one is a duplicate.

About the only thing a Webmaster can do is track down these sites as they are scraped, and submit a spam report to Google. Another big issue with duplicate content is that it has many legitimate uses. News feeds are a prime example. Many websites cover a news story because it is content that the viewers want. Any filter will inevitably catch some legitimate uses.
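One way to see why a duplicate filter can hit the wrong site is to look at a simple similarity measure. The sketch below compares two texts by their overlapping three-word sequences (often called shingles) and computes the Jaccard similarity between them. This is a generic textbook technique chosen for illustration, not a description of Google's filter, and the function names and sample sentences are assumptions.

```python
def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences of the given size."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a: str, b: str, size: int = 3) -> float:
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "the quick brown fox jumps over the lazy dog"
scraped = "the quick brown fox jumps over the lazy dog today"
unrelated = "google updates used to arrive every month"

print(jaccard_similarity(original, scraped))
print(jaccard_similarity(scraped, original))
print(jaccard_similarity(original, unrelated))
```

The score is the same whichever text is passed first, so a measure like this can say that two pages are near-copies but not which one is the original. Telling the legitimate site from the scraper needs some other signal, which is where the collateral damage described above can come from.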

4. Supplemental Page Issues: Webmasters fondly refer to this as Supplemental Hell. This issue has been reported on sites like WebmasterWorld for over a year, but a significant shakeup around February 23rd has led to a massive outcry from the Webmaster community.

This recent shakeup was part of the ongoing Big Daddy rollout, which is expected to finish this month. This issue remains unclear, but here’s what we know. Google has two indexes: the Main index, which you get when you search, and the Supplemental index, which contains old pages, those that are no longer active, or those with errors.

The Supplemental index is a type of graveyard where web pages go when they are no longer deemed active.

No one disputes the need for a Supplemental index. The problem, though, is that active, recent, and clean pages have been showing up in the Supplemental index. Like a dungeon, once they go in, they rarely come out.

This issue has been reported quietly for over a year, but the February incident has sparked far more discussion. We know little about it, and no one seems to be able to identify a common cause.

Google updates were once fairly predictable, with monthly updates that Webmasters anticipated with both joy and angst.

Google followed a well-publicized algorithm that assigned a PageRank to each website, a number based on the number and rank of other webpages linking to it. When someone searches for a term, all the web pages deemed relevant are then ordered by their PageRank.
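The published description of PageRank can be sketched as a short iteration in which each page repeatedly passes a share of its rank to the pages it links to. The Python below is a minimal sketch of that textbook formulation, not Google's production system. The damping factor of 0.85 is the value commonly quoted for the published algorithm; the function name, the three-page example graph, and the iteration count are assumptions made for illustration.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Estimate PageRank for a small link graph by repeated iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the same small baseline.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank everywhere.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: a and b both link to c, and c links back to a.
example = {"a": ["c"], "b": ["c"], "c": ["a"]}
for page, score in sorted(pagerank(example).items()):
    print(page, round(score, 3))
```

In this toy graph, page c ends up with the highest score because two pages link to it, and page b ends up lowest because nothing links to it. That is the behavior the description above summarizes as a number based on the number and rank of linking pages.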

Google uses several factors, including keyword density, page titles, meta tags, and header tags, to determine which pages are relevant.

This original algorithm favored incoming links and their anchor text. The more links you have with a given anchor text, the better you rank for that keyword. As Google gained the bulk of internet searches in the early part of the decade, ranking well in its engine became highly coveted.

Add to this the release of Google’s AdSense program, and it became very lucrative. If a website could rank high for a popular keyword, it could run Google ads under AdSense and split the revenue with Google!

This combination led to an avalanche of SEOing like the Webmaster world had never seen.

The whole nature of links between websites changed. Websites used to link to one another because they provided good information for their visitors. But now, a link to another website could reduce your search engine rankings, and if it’s a link to a competitor, it might boost theirs.

In Google’s algorithm, links coming into your website increase the site’s PageRank (PR), while links from your web pages to other sites reduce your PR. People started creating link farms, engaging in reciprocal link partnerships, and buying and selling links.

Webmasters started linking to each other for mutual ranking benefits or financial gain, rather than to provide quality content for their visitors. This also led to the wholesale scraping of websites. Black hat SEOs take the entire content of a website, place Google's ads on it, acquire a few high-powered incoming links, and the next thing you know, they are ranking high in Google and generating revenue from Google AdSense without providing any unique website content.

Worse yet, as Google tries to go after this duplicate content, it sometimes gets the real company instead of the scraper. This is all part of the cat-and-mouse game that the Google algorithm has become. Once Google realized what was happening, it aggressively updated its algorithms to prevent it.

After all, their goal is to find the most relevant results for their searchers. At the same time, they also faced massive growth with the internet explosion. This has led to a period of unstable updates, causing many top-ranking websites to disappear, while spam and scraped websites remain.

Despite Google’s efforts, every change seems to catch more high-quality websites. Many spam sites and websites that violate Google’s guidelines are caught, but an endless tide of new spam websites takes their place.

Some people might believe that this is not a problem.

Google is there to provide the best, most relevant listings for what people are searching for, and, for the most part, users have not noticed any issues with Google’s listings. If they only drop thousands of listings out of millions, then the results are still excellent. These problems may not be affecting Google’s bottom line now, but having a search engine that cannot be evolved without producing unintended results will hurt them over time in several ways.

First, as the competition from MSN and Yahoo grows, having the best results will no longer be a given, and these drops in quality listings will hurt them.

Next, to stay competitive, Google will need to continue to change its algorithms. This will be harder if it cannot make changes without producing unintended results. Finally, losing the Webmaster community’s faith will make Google vulnerable to competition. Webmasters provide Google with two things.

They are the word-of-mouth experts. Also, they run websites that use Google’s AdSense program. Unlike with other monopolies, it is easy for users to switch search engines. People might also criticize Webmasters for relying on a business model that requires free search engine traffic. Fluctuations in ranking are part of the internet business, and most Webmasters realize this. Webmasters are simply asking Google to fix bugs that cause unintended issues with their sites.

Most Webmasters may blame ranking losses on Google and their bugs.

However, the truth is that many Webmasters do violate some of the guidelines that Google outlines. Most consider it harmless to bend the rules a little, and assume this is not the reason their websites have issues. In some cases, though, Google is right and has just tweaked its algorithm in the right direction.

Here’s an example: Google appears to be monitoring the incoming links to your site to ensure they don’t all use the same anchor text (the text used in the link on the website that links to you). If too many links use the same anchor text, Google discounts these links.

Some people originally did this to inflate their rankings. Other people did it because one anchor text usually makes sense. This is not really a black hat SEO trick, and it is not mentioned in Google’s guidelines, but it has caused some websites to lose their rank.
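A Webmaster who wants a rough picture of their own anchor text mix could tally it along the following lines. This is a minimal Python sketch: the function name and the sample list of anchors are hypothetical, and nothing is known about what share, if any, Google treats as too high, so the code only reports the figure and applies no threshold.

```python
from collections import Counter


def anchor_text_share(anchors):
    """Return the most common anchor text and the share of links using it."""
    # Ignore case and surrounding whitespace so trivial variants count together.
    normalized = [a.strip().lower() for a in anchors if a.strip()]
    if not normalized:
        return "", 0.0
    text, count = Counter(normalized).most_common(1)[0]
    return text, count / len(normalized)


# Hypothetical anchor texts collected from sites that link to you.
incoming = ["Blue Widgets", "blue widgets", "blue widgets ", "Acme Widget Co"]
text, share = anchor_text_share(incoming)
print(f"{text!r} is used by {share:.0%} of incoming links")
```

Here three of the four links use the same phrase, so the reported share is 75%. A profile that lopsided is the kind of pattern the paragraph above suggests may now be discounted.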

Webmasters realize that Google needs to fight spam and black hat SEO manipulation.

And to Google’s credit, a Google engineer named Matt Cutts runs a blog and participates in SEO forums to assist Webmasters. But given the revenue impact that Google rankings have on companies, Webmasters would like to see even more communication around the known issues, and help with identifying future algorithm issues.

No one expects Google to reveal its algorithm or the changes it is making. Rumors on the forum boards speculate that Google is currently considering factors such as the age of the domain name, websites on the same IP address, and the frequency of fresh content.

It would be nice from a Webmaster standpoint to be able to report potential bugs to Google and get a response. It is in Google’s best interest to have a bug-free algorithm. This will, in turn, provide the best search engine results for everyone.

Post Excerpts from Rodney Ringler
