Some of you may already know about Google’s FeedFetcher, but I thought a short explanation might be useful to those who’ve seen it listed as a User Agent in their logs and don’t know what it is.
Released in 2005, Google’s FeedFetcher is a web bot that is activated by a human action.
FeedFetcher lets you get your feed included in the search results for Google Reader, and lets visitors add your feed to their Google Personalized Homepage.
It’s quite simple to ensure that your feed is part of this index. All you need to do is add a <link> tag to the <head> section of your webpage to enable feed autodiscovery.
The <link> tag for your Atom feed will look like this:
- <link rel="alternate" type="application/atom+xml" title="Your Feed Title" href="http://www.domain.com/atom.xml" />
Or if you use an RSS feed your <link> tag will look like this:
- <link rel="alternate" type="application/rss+xml" title="Your Feed Title" href="http://www.domain.com/rss.xml" />
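If you want to check that a page really advertises its feed this way, a small script can look for the tag. The following is only a rough sketch using Python's standard library: the names `FeedLinkFinder` and `find_feeds` are my own, and it simply looks for <link rel="alternate"> tags with an Atom or RSS type, which is not necessarily how FeedFetcher itself does its discovery.

```python
from html.parser import HTMLParser

# The two feed MIME types used in the <link> examples above.
FEED_TYPES = {"application/atom+xml", "application/rss+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect (type, href) pairs from <link rel="alternate"> feed tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rel_tokens = attr.get("rel", "").lower().split()
        feed_type = attr.get("type", "").lower()
        if "alternate" in rel_tokens and feed_type in FEED_TYPES:
            self.feeds.append((feed_type, attr.get("href", "")))


def find_feeds(html):
    """Return every feed advertised in the given HTML text."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds


if __name__ == "__main__":
    page = (
        '<html><head>'
        '<link rel="alternate" type="application/atom+xml" '
        'title="Your Feed Title" href="http://www.domain.com/atom.xml" />'
        '</head><body></body></html>'
    )
    for feed_type, href in find_feeds(page):
        print(feed_type, href)
```

Run it against the HTML of your own homepage; an empty result suggests the autodiscovery tag is missing or has a different type attribute.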
If you are using blogger.com these links will be added for you automatically, which has implications if you are using a private blog. Jessica Cutler is facing a $20 million lawsuit after her "Washingtonienne" blog, featuring lurid details of her sex life with six lawmakers, was discovered by a Washington journalist and made public.
Also bear in mind that FeedFetcher retrieves feeds at the request of human users who have added them to their Google Personalized Homepage or Google Reader. Unfortunately, because it acts as a direct agent for human users, it will ignore your robots.txt file.
So is Google FeedFetcher trying to download files from your “secret” server? If you have a problem like this, it is likely that the request came from a user who knows about your private blog, or the address could have been typed in by mistake. Google does offer a variety of ways to prevent this; for more details you can read their detailed instructions.
An advantage of Google FeedFetcher is that it conserves bandwidth by requesting a common feed only once on behalf of multiple users. How often does Google’s FeedFetcher retrieve your feed? Apparently, on average it shouldn't retrieve a feed more than once every hour, although more frequently updated sites may be refreshed more often.
As is to be expected, Google FeedFetcher’s IP address changes every so often. So if you want to filter your log files, it is better to match on its user-agent string, which includes this URL:
(+http://www.google.com/feedfetcher.html)
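As a rough illustration of that kind of filtering, here is a short Python sketch that splits log lines into FeedFetcher requests and everything else. It assumes your server writes the common "combined" log format, where the user agent is the last quoted field on each line. The helper names and the sample log lines are made up for the example, so adjust them to whatever your own logs look like.

```python
import re

# The URL from the user-agent string shown above, without the scheme.
FEEDFETCHER_MARKER = "google.com/feedfetcher.html"

# Assumption: "combined" log format, user agent is the last quoted field.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def user_agent(line):
    """Return the user-agent field of a log line, or '' if none is found."""
    match = LAST_QUOTED_FIELD.search(line)
    return match.group(1) if match else ""


def is_feedfetcher(line):
    """True if the line's user agent contains the FeedFetcher URL."""
    return FEEDFETCHER_MARKER in user_agent(line).lower()


def split_log(lines):
    """Return (feedfetcher_lines, other_lines)."""
    fetcher, other = [], []
    for line in lines:
        (fetcher if is_feedfetcher(line) else other).append(line)
    return fetcher, other


if __name__ == "__main__":
    # Hypothetical sample lines; the IPs are documentation addresses.
    sample = [
        '192.0.2.10 - - [01/Jan/2007:10:00:00 +0000] "GET /atom.xml HTTP/1.1" '
        '200 5120 "-" "Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)"',
        '192.0.2.11 - - [01/Jan/2007:10:00:05 +0000] "GET /index.html HTTP/1.1" '
        '200 2048 "-" "Mozilla/5.0"',
    ]
    fetcher, other = split_log(sample)
    print(len(fetcher), "FeedFetcher request(s),", len(other), "other request(s)")
```

Matching on the URL rather than on an IP address means the filter keeps working when Google's addresses change, although anyone can put the same text in their own user-agent string, so treat a match as a hint and not as proof.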
For any questions on Google FeedFetcher, you may leave a comment here.
More information can also be found here: Google FeedFetcher