Duplicate Content
by Matt Cutts · Gadgets, Google, and SEO
Search Engine Optimization · Dec 22, 2006
I woke up early on Thursday and was at the convention center by 8am to check on my backpack and laptop. It was still tucked away under a table, untouched. Whew! I hunkered down in the speaker room and started working on my slides. (I hate seeing the same presentations over and over again at conferences, so I always try to make a new presentation for each show.)
By 9am, I was still really behind, so I decided to skip Danny’s keynote and kept chugging. I missed a few other sessions on Thursday, but I figured it was worth it to be prepared.
Around 1:30 came the site review panel that Brett had signed me up for, alongside Greg Boser, Tim Mayer, Danny Sullivan, and Todd Friesen, with Jake Baillie moderating. The idea behind a site review panel is that people volunteer their sites and get advice on how to do things better. We discussed a promotional gifts company, the Dollar Stretcher site, a real estate company in Las Vegas, a chiropractic doctor, a real estate licensing company, a computer peripheral site, a hifi sounds store, and a day spa in Arizona. In his PubCon recap, Barry said I ripped on sites. For most of the sites I tried to give positive advice, but I wasn't afraid to tell people about potential problems.
Once again, I sat on the end and had my wireless and VPN working so that I could use all of my Google tools. The promotional gifts company had a couple of issues. For one thing, I was immediately able to find 20+ other sites that also belonged to the promotional gifts person. The other sites offered overlapping content and overlapping pages on different urls. The larger issue was that searching for a few words from a product description quickly turned up dozens of other sites with the exact same descriptions. We discussed the difficulty of adding value to feeds when you're running lots of sites. One thing to do is to find ways to incorporate user feedback (forums, reviews, etc.). The wrong thing to do is to add a few extra sentences or scramble a few words or bullet points in an attempt to dodge duplicate content detection. If I can spot duplicate content in a minute with a search, Google has time to do more in-depth duplicate detection in its index.
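To make that concrete, here's a toy sketch of why near-duplicate descriptions are cheap to detect: compute Jaccard similarity over overlapping word n-grams ("shingles"). To be clear, this is just an illustration in Python, not anything resembling Google's actual duplicate detection, and the two sample descriptions are invented:

    # Toy near-duplicate detector: break each text into overlapping
    # word n-grams ("shingles") and measure how much the sets overlap.
    def shingles(text: str, n: int = 5) -> set:
        words = text.lower().split()
        return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

    def similarity(a: str, b: str) -> float:
        """Jaccard similarity of the two texts' shingle sets (0.0 to 1.0)."""
        sa, sb = shingles(a), shingles(b)
        if not sa or not sb:
            return 0.0
        return len(sa & sb) / len(sa | sb)

    # Two made-up descriptions that differ by a single word: the score
    # comes out far above what unrelated texts would get, so adding or
    # scrambling a word or two doesn't fool even this naive check.
    desc1 = "This engraved pen makes a perfect promotional gift for clients"
    desc2 = "This engraved pen makes a perfect promotional gift for your clients"
    print(f"similarity: {similarity(desc1, desc2):.2f}")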
Next up was the Dollar Stretcher. I'd actually heard of this site before (stretcher.com), and everything I checked off-page and on-page looked fine. The site owner said that Google was doing fine on this site, but Yahoo! didn't seem to like it. Greg Boser managed to find a sitemap-type page that listed hundreds of articles, but in my opinion it only looked bad because the site has been live since 1996 and they had tons of original articles that they had written over the years. So how should a webmaster make a sitemap on their site when they have hundreds of articles? My advice would be to break the sitemap up across several pages. Instead of hundreds of links all on one page, you could organize your articles chronologically (each year could be a page), alphabetically, or by topic. Danny and Todd noticed a mismatch between uppercase url titles on the live pages and lowercase url titles according to Yahoo!'s Site Explorer, and a few folks started to wonder if cloaking was going on. The site was definitely hitting a "this is a legit site" chord for me, and I didn't think they were cloaking, so I checked with a quick wget and also told Googlebot to fetch the page. It all checked out: no cloaking going on to Google. I gently tried to suggest that it might be a Site Explorer issue, which a few people took as a diss. No diss was intended toward Site Explorer; I think it's a fine way to explore urls. I just don't think stretcher.com was trying to pull any tricks by lowercasing their titles to search engines (not that lowercasing a title via cloaking would help a site anyway).
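For the curious, the kind of quick check I ran can be sketched in a few lines of Python: fetch the same page with a browser-style User-Agent and with a Googlebot-style User-Agent, then compare the titles that come back. The header strings and the crude title regex here are just illustrative, and a user-agent check can't catch IP-based cloaking; that's exactly why I also told the real Googlebot to fetch the page:

    # Sketch of a user-agent cloaking check: request the same URL with
    # two different User-Agent headers and compare the <title> tags.
    import re
    import urllib.request

    def fetch_title(url: str, user_agent: str) -> str:
        req = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(req, timeout=10) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        match = re.search(r"<title[^>]*>(.*?)</title>", html, re.I | re.S)
        return match.group(1).strip() if match else ""

    browser_ua = "Mozilla/5.0"
    bot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

    url = "http://www.stretcher.com/"
    browser_title = fetch_title(url, browser_ua)
    bot_title = fetch_title(url, bot_ua)
    print("browser  :", browser_title)
    print("googlebot:", bot_title)
    print("same" if browser_title == bot_title else "different: worth a closer look")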
Holy crawp, this is taking forever to write. Let me kick it into high gear. The Las Vegas real estate site was functional (roughly 100 pages about different projects, plus ten or so about-us/contact sort of pages), but it was also pretty close to brochureware. There was nothing compelling or exciting about the site. I recommended looking for good ways to attract links: surveys, articles about the crazy construction levels in Vegas, contests; basically just looking at ways to create a little buzz, as opposed to a standard corporate brochureware site. Linkbait doesn't have to be sneaky or cunning; great content can be linkbait as well, if you let people know about it.
The chiropractor site looked good. Danny Sullivan made some good points: they wanted to show up for a particular keyword phrase, but that phrase didn't occur on the site's home page. Think about what users will type (and what you want to rank for), and make sure to use those words on your page. The site owner was also using Comic Sans, a font that a few people hate; I recommended something more conservative for a medical site. Greg Boser mentioned being aware of local medical associations and similar community organizations. I recommended local newspapers, and gave the example that when my Mom shows up with a prepared article for her small hometown newspaper about a charity event, they're usually happy to run it or something close to it. Don't neglect local resources when you're trying to promote your site.
My favorite moment with the real estate licensing site was that in less than a minute, I was able to find 50+ other domains that this person had: everything from learning Spanish to military training. So I got to say "Let us be frank, you and I: how many sites do you have?" He paused for a while, then said "a handful." After I ran through several of his sites, he agreed that he had quite a few. My quick take is that if you're running 50 or 100 domains yourself, you're fundamentally different from the chiropractor with his one site: with that many domains, each domain doesn't always get as much loving attention, and that can really show. Ask yourself how many domains you have, and whether it's so many that lots of them end up a bit cookie-cutter.
Several times during the session, it was readily apparent that someone had tried to do reciprocal links as a "quick hit" to increase their link popularity. When I saw that in the backlinks, I tried to communicate that 1) it was immediately obvious to me, and therefore our algorithms can do a pretty good job of spotting excessive reciprocal links, and 2) in the instances that I looked at, the reciprocal links weren't doing any good. I urged folks to spend more time looking for ways to make a compelling site that attracts viral buzz or word of mouth. Compelling sites that are well-marketed attract editorially chosen links, which tend to help a site more.
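To illustrate why that pattern stands out, here's a back-of-the-envelope sketch: given a link graph, the reciprocal pairs fall out of a simple set intersection. The graph below is invented, and real link analysis obviously operates at a vastly larger scale:

    # Find reciprocal link pairs in a toy link graph (domain -> domains
    # it links to). A pair (x, y) is reciprocal if x links to y AND
    # y links to x.
    links = {
        "a.com": {"b.com", "c.com", "d.com"},
        "b.com": {"a.com", "c.com"},
        "c.com": {"a.com", "b.com"},
        "d.com": {"e.com"},
    }

    reciprocal = {
        tuple(sorted((src, dst)))
        for src, targets in links.items()
        for dst in targets
        if src in links.get(dst, set())
    }
    for pair in sorted(reciprocal):
        print(pair)
    # A high ratio of reciprocal links to total links is exactly the
    # kind of pattern that jumps out of a backlink profile.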
The computer peripheral site had a few issues, but it was a solid site. They had genuine links from manufacturers, e.g. Lexar listing them as a place to buy its memory cards. When you're a well-known site like that, it's worth trying to find even more manufacturers whose products you sell; links from computer part makers would be pretty good links, for example. The peripheral site had urls like /i-Linksys-WRT54G-Wireless-G-54Mbps-Broadband-Router-4-Port-10100Mbps-Switch-54Mbps-80211G-Access-Point-519, which looks kinda cruddy. Instead of using the first 14-15 words of the description, the panel recommended truncating the keywords in the url to ~4-5 words. The site also had session ID stuff like "sid=te8is439m75w6mp" that I recommended dropping if they could. The site also had product categories, but the urls were like "/s-subcat-NETWORK~.html". Personally, I think having "/network/" and putting the networking products in that subdirectory is a little cleaner.
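As a rough sketch of that cleanup, here's what truncating a slug to the first ~5 keywords and stripping a session-ID parameter might look like. The product name is taken from the url above; the function names, stopword list, and regex are just my illustration:

    # Illustrative URL cleanup: shorten a keyword-stuffed slug and
    # drop session-ID query parameters.
    import re

    STOPWORDS = {"a", "an", "the", "and", "or", "of", "for", "with"}

    def slugify(name: str, max_words: int = 5) -> str:
        """Keep only the first few meaningful words of a product name."""
        words = [w for w in re.split(r"[^a-z0-9]+", name.lower()) if w]
        kept = [w for w in words if w not in STOPWORDS][:max_words]
        return "-".join(kept)

    def strip_session_id(url: str) -> str:
        """Remove a sid=... query parameter, keeping any other parameters."""
        return re.sub(r"([?&])sid=[^&]+&?", r"\1", url).rstrip("?&")

    name = "Linksys WRT54G Wireless-G 54Mbps Broadband Router"
    print("/" + slugify(name) + "-519")
    # -> /linksys-wrt54g-wireless-g-54mbps-519
    print(strip_session_id("/network/some-router-519?sid=te8is439m75w6mp"))
    # -> /network/some-router-519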
The HiFi store was fine, but this was another example where someone had 40+ other sites. Having lots of sites isn't bad in itself, but I've mentioned the risk that not all the sites get as much attention as they should. In this case, 1-2 of the sites were stuff like cheap-cheap-(something-related-to-telephone-calling).com. Rather than any real content, most of the pages were pay-per-click (PPC) parked pages, and when I checked the whois, they all had "whois privacy protection service" enabled. That's relatively unusual. Having lots of sites isn't automatically bad, having PPC sites isn't automatically bad, and having whois privacy turned on isn't automatically bad, but once you get several of these factors together, you're often talking about a very different type of webmaster than the fellow who just has a single site or so.
Closing out on a fun note, the day spa site was built by someone who was definitely a novice. The site owner seemed to have trouble accessing all the pages on her site, so she had loaded a ton of content onto the main page. But it was a real site, and it still ranked well at Yahoo! and Google. The site owner said that she'd been trying to find a good SEO for ages, so Todd guilted the audience and said "Will someone volunteer to help out this nice person for free?" A whole mess of people raised their hands: good evidence that SEOs have big hearts, and that this site owner picked the right conference to ask for help.
Okay, that’s a rundown of SEO feedback on some real-world sites, and this post is like a gajillion words long, so I’ll stop now and save more write-ups for later.