How to Prevent Google from Crawling Your Joomla URLs Using robots.txt

One of our students was having trouble removing URLs from Google and received this message:

"Your request has been denied because the webmaster of the site hasn't applied the appropriate robots.txt file or meta tags to block us from indexing or archiving this page. Please work with the webmaster of this site or select an alternate removal option from the webpage removal request tool"

So we created this tutorial for him. It shows how to edit Joomla's robots.txt file to block search engines from crawling specific URLs, or the whole site if desired.

Access robots.txt in Joomla Root

Open your host's file manager, e.g. in cPanel or Plesk.

In the root of your Joomla installation you will find a robots.txt file. Open it for editing.

Default robots.txt

By default, Joomla's robots.txt file should contain rules like these, which ask crawlers to stay out of Joomla's system directories:

User-agent: *
 Disallow: /administrator/
 Disallow: /cache/
 Disallow: /components/
 Disallow: /images/
 Disallow: /includes/
 Disallow: /installation/
 Disallow: /language/
 Disallow: /libraries/
 Disallow: /logs/
 Disallow: /media/
 Disallow: /modules/
 Disallow: /plugins/
 Disallow: /templates/
 Disallow: /tmp/

Explanation:

  • User-agent: specifies which search engine crawler the rules that follow apply to
  • asterisk (*): a wildcard; in this case it means the rules apply to all search engine crawlers
  • Disallow: specifies that we don't want the user-agent to crawl this specific directory
  • pound (#): starts a comment, which crawlers ignore; it is there for people to add clarification. The examples that follow include a few comments for clarification.
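
To illustrate how the User-agent line scopes the rules beneath it, here is a minimal Python sketch using the standard library's urllib.robotparser. The /private/ path and the crawler names used in the checks are placeholders chosen for illustration; they are not part of Joomla's default file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for a single crawler, one for everyone else.
RULES = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""


def can_crawl(rules: str, user_agent: str, path: str) -> bool:
    """Return True if the given rules allow user_agent to crawl path."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Print the verdict for each crawler/path combination.
    for agent in ("Googlebot", "Bingbot"):
        for path in ("/private/page.html", "/tmp/file.txt"):
            print(agent, path, can_crawl(RULES, agent, path))
```

A crawler that finds a group naming it follows only that group, so in this sketch Googlebot is kept out of /private/ but not /tmp/, while any other crawler falls back to the * group.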

How to block

Disallow: /pathto/page.html # blocks just this page
Disallow: /pathto/page* # blocks just this page including all suffixes, e.g. .html, .php, etc.
Disallow: /pathto/* # blocks all pages under this directory

For example, if you want to block www.yoursite.com/clients/testimonials/business.html, use:

Disallow: /clients/testimonials/business.html # blocks just this page
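
To sanity-check a rule like this before uploading the file, a short Python sketch with the standard library's urllib.robotparser can help. It assumes the rule sits under User-agent: * as in the default file, and other.html is a made-up neighbouring page. This parser matches Disallow paths as plain prefixes and, as far as we know, does not interpret * wildcards inside paths, so the sketch sticks to literal paths.

```python
from urllib.robotparser import RobotFileParser

# A shortened rule set: one default directory plus the single-page rule.
RULES = """\
User-agent: *
Disallow: /administrator/
Disallow: /clients/testimonials/business.html # blocks just this page
"""


def is_blocked(rules: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the rules disallow user_agent from crawling url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    base = "http://www.yoursite.com"
    for path in (
        "/clients/testimonials/business.html",  # the page we blocked
        "/clients/testimonials/other.html",     # hypothetical sibling page
        "/administrator/index.php",             # inside a blocked directory
    ):
        print(path, "blocked" if is_blocked(RULES, base + path) else "allowed")
```

Only the exact page and anything under /administrator/ come back as blocked; the sibling page in the same directory stays crawlable.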

Example:

User-agent: *
 Disallow: /administrator/
 Disallow: /cache/
 Disallow: /components/
 Disallow: /images/
 Disallow: /includes/
 Disallow: /installation/
 Disallow: /language/
 Disallow: /libraries/
 Disallow: /logs/
 Disallow: /media/
 Disallow: /modules/
 Disallow: /plugins/
 Disallow: /templates/
 Disallow: /tmp/
 Disallow: /clients/testimonials/business.html # blocks just this page

Once you are done, save the robots.txt file.
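
The introduction also mentions blocking the whole site. Under the standard robots.txt rules, that takes a single "Disallow: /" line under "User-agent: *", because every path on the site begins with a slash. A minimal Python sketch, again using urllib.robotparser, shows the effect; the URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Rules that ask every crawler to stay away from the entire site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def blocked_urls(rules: str, urls: list[str], user_agent: str = "*") -> list[str]:
    """Return the subset of urls that the rules disallow for user_agent."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    sample = [
        "http://www.yoursite.com/",
        "http://www.yoursite.com/index.php",
        "http://www.yoursite.com/clients/testimonials/business.html",
    ]
    # With "Disallow: /" every sample URL should be reported as blocked.
    print(blocked_urls(BLOCK_ALL, sample))
```

Use this rule with care: it asks all well-behaved crawlers to skip every page, so it is usually only suitable for development or staging sites.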