Ticket #4205 (new defect)

Opened 5 months ago

Last modified 1 month ago

Add robots.txt

Reported by: bettse
Assigned to: cedenoj
Priority: major
Milestone: Elgg1.5
Component: Trac
Version:
Keywords:
Cc:

Description

After the migration to the OSL, we removed our block on Google crawling us. To be on the safe side, we should have a robots.txt to manage what gets crawled. Here is a sample robots.txt given to us by the OSL (we'll need to modify the list for our projects):

User-Agent: *
Crawl-delay: 15

# Disallow crawling of any gitplugin based code browser
Disallow: /artool/browser/
Disallow: /bouncer/browser/
Disallow: /donations/browser/
Disallow: /helpdesk-dev/browser/
Disallow: /maintain/browser/
Disallow: /producer-edit/browser/
Disallow: /producer-record/browser/
Disallow: /raiv/browser/
Disallow: /teachengineering/browser/
Disallow: /watch-listen/browser/
Disallow: /unify/browser/

# Disallow crawling of any gitplugin based changeset
Disallow: /artool/changeset/
Disallow: /bouncer/changeset/
Disallow: /donations/changeset/
Disallow: /helpdesk-dev/changeset/
Disallow: /maintain/changeset/
Disallow: /producer-edit/changeset/
Disallow: /producer-record/changeset/
Disallow: /raiv/changeset/
Disallow: /teachengineering/changeset/
Disallow: /watch-listen/changeset/
Disallow: /unify/changeset/
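
Since the list has to be adapted per project, one option is to generate the file instead of editing it by hand. The following is only a sketch: the function name build_robots_txt and the PROJECTS list are hypothetical, and the two names in it are placeholders borrowed from the OSL sample, not our actual projects.

```python
def build_robots_txt(projects, crawl_delay=15):
    """Return robots.txt text that blocks the code browser and
    changeset views for every project name in `projects`."""
    lines = ["User-Agent: *", f"Crawl-delay: {crawl_delay}"]
    # One section per Trac view, using the same comments as the OSL sample.
    sections = [
        ("browser", "# Disallow crawling of any gitplugin based code browser"),
        ("changeset", "# Disallow crawling of any gitplugin based changeset"),
    ]
    for view, comment in sections:
        lines.append("")
        lines.append(comment)
        lines.extend(f"Disallow: /{name}/{view}/" for name in projects)
    return "\n".join(lines) + "\n"


# Placeholder project names (assumption); replace with our own list.
PROJECTS = ["artool", "bouncer"]

if __name__ == "__main__":
    print(build_robots_txt(PROJECTS), end="")
```

Adding or removing a project is then a one-line change to the list, and the browser and changeset sections cannot drift out of sync.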

Change History

07/07/09 11:27:04 changed by bettse

Although the standard robots.txt format doesn't support wildcards in Disallow statements, according to this post (http://www.seobook.com/archives/001329.shtml) Googlebot might understand them. We may want to make use of that, since Google's crawling is what we specifically blocked before.
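
To get a feel for what a wildcard rule would cover, here is a rough sketch of the extended matching Google documents for its crawler: "*" matches any run of characters, a trailing "$" anchors the end of the path, and anything else is a plain prefix match. The helper name rule_matches is hypothetical, and this only emulates the matching; it says nothing about whether other crawlers honor wildcards.

```python
import re


def rule_matches(pattern, path):
    """Return True if a Disallow pattern covers the given URL path.

    '*' matches any sequence of characters; a trailing '$' requires
    the path to end there; otherwise the pattern is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start of the path, giving prefix semantics.
    return re.match(regex, path) is not None


if __name__ == "__main__":
    for path in ["/artool/browser/trunk", "/unify/changeset/12", "/artool/wiki"]:
        blocked = any(rule_matches(p, path) for p in ["/*/browser/", "/*/changeset/"])
        print(path, "blocked" if blocked else "allowed")
```

Under that assumption, the two patterns /*/browser/ and /*/changeset/ would stand in for the per-project lists, at least for crawlers that understand wildcards. Crawlers that do not would treat the "*" literally, so the explicit list stays the safer default.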

10/01/09 11:01:25 changed by cedenoj

Run this by the OSL before adding it in.

10/14/09 23:35:21 changed by cedenoj

  • owner set to cedenoj.