2/24/2007 12:15 PM (Bangkok)
Posted by Dan Crow, Product Manager
This is the second in a short series of posts about the Robots Exclusion Protocol, the standard for controlling how web pages on your site are indexed. This post provides more details and examples of mechanisms to control how Google accesses and indexes your website.
In the first post in this series, I introduced robots.txt and robots META tags, giving an overview of when to use them. In this post, I'll look at some examples of the power of the protocol. These examples illustrate the detailed, fine-grained control online publishers have over how their websites are indexed.
Preventing Googlebot from following a link
Usually, when Googlebot finds a page, it reads all the links on that page, then fetches and indexes the pages they point to. This is the basic process by which Googlebot "crawls" the web. It is useful because it allows Google to include all the pages on your site, as long as they are linked together. Let's say you run the website TheHighsteadPost.com. Here's a map of part of the site:

When Googlebot crawls the index.html file, it finds the links to breakingnews.html and articles.html. From breakingnews.html, it can find valentinesday.html and promnight.html, and so on.
What if you didn't want valentinesday.html and promnight.html appearing in Google's index? The articles in the Breaking News section may only appear for a few hours before being updated and moved to the Articles section. In this case you want the full articles indexed, not the breaking news version. You could put the NOINDEX tag on both those pages. But if the set of pages in the Breaking News section changes frequently, it would be a lot of work to keep adding the NOINDEX tag to pages and then removing it again when they move into the Articles section.
Instead, you can add the NOFOLLOW tag to the breakingnews.html page. This tells Googlebot not to follow any links it finds on that page, thus hiding valentinesday.html, promnight.html and any other pages linked from there. To do so, add a robots META tag with the value NOFOLLOW to the <head> section of breakingnews.html.
However, there is an important caveat to NOFOLLOW that you should know about. It only stops Google from following links from one page to another. If one of the linked pages is also linked from somewhere else, Google can still find and index that page via that other link. For example, if promnight.html is also linked from HighsteadCourier.com, Google can still find and index promnight.html when it indexes HighsteadCourier.com and follows the link from there to promnight.html.
Using NOFOLLOW is generally not the best method to ensure content does not appear in our search results. Using the NOINDEX tag on individual pages or controlling access using robots.txt is the best way to achieve this.
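The NOFOLLOW tag described above can be written as a robots META tag. The fragment below is a minimal sketch that assumes the standard robots META syntax; the page title is a hypothetical placeholder, not taken from the site described in this post.

```html
<!-- breakingnews.html: ask crawlers not to follow any links found on this page -->
<head>
  <title>Breaking News</title> <!-- hypothetical title, for illustration only -->
  <meta name="robots" content="nofollow">
</head>
```

For the per-page alternative mentioned above, the same tag with content="noindex" would go in the <head> of each page you want kept out of the index.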
Controlling Caching and Snippets
The Robots Exclusion Protocol allows you to specify, to some extent, how you would like your web pages to appear in Google's search results. Usually search results show a cached page link and a snippet, two features that our users tell us are very useful. Here, for example, is the first result I got when I searched for "Mallard duck":
The snippet is the extract of text from the web page; in this case it starts "The mallard duck is found mostly in North America...". We know from user studies that users are more likely to visit your site if the search results show the snippet. Why? Because snippets make it much easier for users to see why the result is relevant to their query. If a user isn't able to make this determination quickly, he or she usually moves on to the next search result.
Underneath the snippet is the URL of the page followed by the "cached" link. Clicking on this link takes you to a copy of the page stored on Google's servers. This is useful in a number of cases: for sites that are temporarily unavailable, for news sites that get overloaded in the aftermath of a major event (for example, 9/11), and for sites that are accidentally deleted. Another advantage is that Google's cached copy highlights the words a person searched for, allowing them to quickly see how the page is relevant to their query.
Usually you want Google to display both the snippet and the cached link. However, there are some cases where you might want to disable one or both of these. For example, say you are a newspaper publisher with a page whose content changes several times a day. It may take longer than a day for us to reindex a page, so users may have access to a cached copy of the page that is not the same as the one currently on your site. In this case, you probably don't want the cached link appearing in our results.
Again, the Robots Exclusion Protocol comes to your aid. Add the NOARCHIVE tag to a web page and Google won't show a cached copy of that page in search results. Similarly, you can tell Google not to display a snippet for a page; the NOSNIPPET tag achieves this. Adding NOSNIPPET also has the effect of preventing a cache link from being shown, so if you specify NOSNIPPET you automatically get NOARCHIVE too.
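As a rough illustration of the two tags just described, here is how they might look in a page's <head>, again assuming the standard robots META syntax. Each tag is shown on its own; a given page would normally carry only the one it needs.

```html
<!-- Ask search engines not to show a "cached" link for this page -->
<meta name="robots" content="noarchive">

<!-- Ask search engines not to show a snippet for this page -->
<meta name="robots" content="nosnippet">
```

Several values can also share one tag as a comma-separated list, for example content="noindex, nofollow".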
Learn more
As usual, the Google Webmaster Help pages have a lot of useful information:
More on Googlebot and robots.txt
Our robots.txt analysis tool
Next time...
The final post in this series will take some common exclusion problems that webmasters have told us about and show how to solve them using the Robots Exclusion Protocol.
Labels: Googlebot, search index, webmaster tools
You never know what you'll need to know
2/22/2007 10:40:00 AM
Posted by Anita Yuen, Senior Product Marketing Manager
When people share their stories about how Google search has made a difference in their lives, we know we're doing our job. It's also given us an opportunity to learn about the breadth of information that you can find on Google, such as how to find a lost tortoise, as Jim Lyness did. Here's his story:
"After Christmas, my son Sam wanted a turtle. We bought a Russian Tortoise instead and named him Rocky. Well, one day, we let Rocky out for a stroll around the house. We could not find him that night and into the afternoon the following day. After the boys went to school, my wife, Susan, and I were stumped. Did Rocky get out the front door? My wife told me I was crazy. Susan googled [how to find a Russian Tortoise] and bang -- we had a game plan. Russian Tortoises like warm, dark spaces. We started in the boys' bedroom, again. We pulled the bunk bed back and there was Rocky at the head of the bed. Case solved. When we tell friends and family about googling How to Find a Russian Tortoise, they bust a gut in laughter!"
If you have a story about how Google search has made an impact on you, we'd love to hear it. Tell us here or post a video (be sure to tag it "google testimonial"). You never know when you'll need to search for a lost pet.
P.S. While our lawyers may not be happy with Jim's use of 'googled' and 'googling', we are very pleased that Jim and Susan were able to find what they needed by searching on Google.
Labels: search
Google Apps grows up
2/22/2007 07:13:00 AM
Posted by Derek Parham, Lead Software Engineer, Google Apps
Back in 2005, Google Apps was conceived in a few lines of code, and then it was born in February 2006. Our team has had such a close relationship with it that you might understand how we have nurtured it as we would a child.
So first there was Gmail for your domain -- a limited service that helped organizations like San Jose City College offer personalized Gmail inboxes to all their users. As our little guy picked up new skills (Calendar, Talk, and Page Creator), it grew out of its old name and into Google Apps for Your Domain.
A quick learner, by October Google Apps had perfected 17 more languages, so we could help bring our communication tools all around the globe. Later in the fall, we improved our organizational skills with the Start Page, which brought all the Apps together into a centralized place. Then it was time to start school. Google Apps entered Arizona State University and stood out as one of those high achievers. Today, students and administrators at large universities like ASU and Lakehead are raving about Apps: how it saves money and IT resources, and makes students' lives easier with bigger spam-free mailboxes and a set of tools for working together.
Now, I'm excited to tell you that our baby has finally graduated and is entering the business world. Google Apps Premier Edition is a new version designed to take on all the challenges presented by businesses with complex IT needs.
For $50 per account per year, you get the whole Google Apps package plus many new business-oriented features, including access to our APIs and partner solutions (so it’s easy to integrate with existing systems), conference room scheduling for Calendar, 10GB of inbox storage, extended business hours phone support, and mobile access to your email on BlackBerry devices (just in case you can't get enough at the office).
Already, companies big and small, like Procter & Gamble, General Electric Corporation, Prudential, and SF Bay Pediatrics, are talking about how this new version of Google Apps makes it easy to offer low-cost communication and collaboration tools to all their employees so they can get on with what they do best.
Google Apps also won't forget its roots anytime soon. The Standard and Education Editions will continue to be offered for free, and we'll keep working on all three flavors of Google Apps with the help of feedback from all of you. As a start, we’ve just integrated Google Docs & Spreadsheets in all three editions so that everyone can share and edit documents online. Since August, we’ve also added five more major features you've requested, including customized service URLs (mail.yourcompany.com) and domain registration for organizations that don’t yet have a custom domain. Our appearance has matured too, with updates to the administrator control panel that make it easier to set up and manage your services.
Labels: Google Apps
--------------------------------------------------------------------------------------
From Gmail with <3
2/14/2007 12:08:00 PM
Posted by David Murray, Associate Product Manager
Gmail sign-ups are now open worldwide! No more waiting for someone to invite you: just create an account directly at http://www.gmail.com/. What better way to share the love with the people you care about than with Gmailchatwith. And Gmail is available in over 40 interface languages (though <3
Check out our 4-part video: http://www.youtube.com/watch?v=_YUugB4IUl4