
What is a robots.txt File?

A robots.txt file is a plain text file placed in the root of your website or blog. It contains instructions for bots (also called spiders or search engine crawlers).
When a bot first arrives at a site, it looks for the robots.txt file. If it does not find one, it will look for and gather information about every file it can reach on your site. For example, if Google's bot (Googlebot) visits your site and there is no robots.txt file, or no instruction in the file limiting what it may look at, it will look at everything it can find, and eventually whatever it found can appear in its search results. That is not a good thing if there is material you want kept out of the search results.
Even if you have a robots.txt file, some not-so-polite bots will ignore your instructions. Anything you want kept away from these bots is best placed in a password-protected area.
You should also be aware that anyone can read your robots.txt file. Try it yourself: type www.domainname.com/robots.txt in the browser address bar and you will see the contents of that site's robots.txt file. This is why, even though polite bots obey your instructions, it is best to keep private material in password-protected folders. A folder marked "keep out, please" tells any snoopy person or bot exactly where to look.
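
To illustrate how easily a robots.txt file gives away the folders it names, here is a minimal Python sketch that lists every path mentioned in a Disallow line. The function name listed_paths and the sample contents (including the folder /private-reports/) are invented for this illustration and do not come from any real site.

```python
def listed_paths(robots_text):
    """Return every path named in a Disallow line, in order of appearance."""
    paths = []
    for line in robots_text.splitlines():
        # Drop comments (anything after '#') and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt contents, invented for this illustration.
SAMPLE = """\
User-agent: *
Disallow: /private-reports/
Disallow: /images/
"""

print(listed_paths(SAMPLE))  # ['/private-reports/', '/images/']
```

Anyone who can open the file in a browser can do the same by eye, which is why a Disallow line is no substitute for a password.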

Purpose of the robots.txt File

The purpose of the robots.txt file is to tell the nice bots which areas of the site you do not want included in their search index.

What Can You Do with a robots.txt File?

Each instruction, and each part of an instruction, needs to be on its own line.
Do not put blank lines between the parts of an instruction; a blank line marks the end of the instruction.
If you wish to pick and choose which files are indexed, it is easier to move the files you do not want indexed into a separate folder and tell the bots in robots.txt to stay out of that folder.
Here are a few samples of what instructions you can place in your robots.txt file:

Disallow Indexing of Specific Folders

To disallow the indexing of the contents in a specific folder the instruction is:
User-agent: *
Disallow: /images/
* indicates all bots
/images/ is the name of the folder. Don’t forget the / at the beginning of the folder name and at the end.
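
If you would like to check a folder rule on your own computer, the following is a rough sketch using Python's standard urllib.robotparser module. The helper name is_allowed and the bot name AnyBot are made up for this example, and the result shows how Python's parser reads the rule, which may differ in detail from how a particular search engine reads it.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(rules_text, user_agent, url):
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


WITH_SLASH = "User-agent: *\nDisallow: /images/\n"
WITHOUT_SLASH = "User-agent: *\nDisallow: /images\n"

site = "http://www.domainname.com"  # placeholder domain

# Files inside the folder are blocked; the rest of the site is not.
print(is_allowed(WITH_SLASH, "AnyBot", site + "/images/photo.jpg"))  # False
print(is_allowed(WITH_SLASH, "AnyBot", site + "/index.html"))        # True

# Without the trailing slash the rule is a plain prefix match, so it also
# catches other paths that merely start with "/images".
print(is_allowed(WITHOUT_SLASH, "AnyBot", site + "/images-old/a.jpg"))  # False
print(is_allowed(WITH_SLASH, "AnyBot", site + "/images-old/a.jpg"))     # True
```

The last two lines show why the closing / matters: it limits the rule to the folder itself.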

Disallow Specific Bots

Maybe there is a specific bot you do not want indexing your site. Replace Bot name with that bot's User-agent name:
User-agent: Bot name
Disallow: /
There are lists of User-agents and bad bots if you want to look up a specific User-agent/bot name.
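
The sketch below, again using Python's standard urllib.robotparser, shows a bot-specific instruction at work and what can happen when a blank line slips in between its two parts. BadBot and OtherBot are made-up names, is_allowed is a helper written for this example, and other parsers may treat the stray blank line differently.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(rules_text, user_agent, url):
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


# "BadBot" is a made-up name standing in for the bot you want to block.
GOOD = "User-agent: BadBot\nDisallow: /\n"
# The same instruction with a stray blank line between its two parts.
BROKEN = "User-agent: BadBot\n\nDisallow: /\n"

page = "http://www.domainname.com/index.html"  # placeholder address

print(is_allowed(GOOD, "BadBot", page))    # False: the named bot is shut out
print(is_allowed(GOOD, "OtherBot", page))  # True: other bots are unaffected
print(is_allowed(BROKEN, "BadBot", page))  # True: this parser drops the rule
```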

Stop Images Indexed in Image Search

Some people want their images indexed in the Google, Bing and Yahoo! image searches for the possible traffic. If you don't, you can tell the image bots so. For example, for Google's image bot:
User-agent: Googlebot-Image
Disallow: /
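
As a quick local check of the rule above, this short Python sketch asks the standard urllib.robotparser module whether two different bot names may fetch an image. The image address is a placeholder, and the answer reflects Python's own name matching rather than Google's.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: Googlebot-Image
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

photo = "http://www.domainname.com/images/photo.jpg"  # placeholder address

image_bot_allowed = parser.can_fetch("Googlebot-Image", photo)
web_bot_allowed = parser.can_fetch("Googlebot", photo)

print(image_bot_allowed)  # False: the image bot is told to stay out
print(web_bot_allowed)    # True: the rule does not name the regular bot
```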

How to Create a robots.txt File

You will need a plain text editor, such as Notepad (which comes with Windows) or Notepad++. Word and other word processing software are not plain text editors.
You will also need a folder on your computer to store this file until you are finished editing it. You have a backup of your site, right? If not, create one! Use FTP software to back up your site. For WordPress we also have specific instructions: Backup WordPress.
  1. Open your plain text editor.
  2. Use File/Save As from its top menu bar to navigate to the folder which contains a local copy of your website or blog.
    Make sure you are in the root of the folder, not inside a folder within the website folder.
  3. Name the file robots.txt in the File Name box.
  4. Left click Save to save the file.
    The empty file’s screen becomes active again.
  5. Type in the instructions you wish to have in your robots.txt file (see above for samples).
  6. Save the file when you are done.
    The file can be closed also.
  7. Using FTP software (or your web hosting File Manager function) upload the robots.txt file to the root of your website/blog.
    The root of your website is the folder where your website files are. Sorry, we can't be more specific, as each web hosting setup is different. If you are not sure which folder, look at your web hosting's documentation.
    Test that you uploaded the robots.txt file to the right spot by opening your browser and typing http://www.yourdomainname.com/robots.txt. If you can see the contents of the file you just created, you uploaded it to the right spot.
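
If you would rather generate the file than type it, steps 1 to 6 can be sketched in Python as follows. The function write_robots_txt is a hypothetical helper written for this example, and the temporary folder merely stands in for the local copy of your site. Uploading (step 7) is still done with your FTP software or File Manager.

```python
from pathlib import Path
import tempfile


def write_robots_txt(site_root, groups):
    """Write robots.txt into site_root and return the file's path.

    groups is a list of (user_agent, [disallowed paths]) pairs. Each pair
    becomes one instruction, and instructions are separated by a blank line.
    """
    blocks = []
    for user_agent, paths in groups:
        lines = [f"User-agent: {user_agent}"]
        lines += [f"Disallow: {path}" for path in paths]
        blocks.append("\n".join(lines))
    target = Path(site_root) / "robots.txt"
    target.write_text("\n\n".join(blocks) + "\n", encoding="utf-8")
    return target


# Demo in a temporary folder standing in for the local copy of the site.
with tempfile.TemporaryDirectory() as folder:
    path = write_robots_txt(
        folder, [("*", ["/images/"]), ("Googlebot-Image", ["/"])]
    )
    print(path.read_text(encoding="utf-8"))
```

Writing each instruction as one block and joining the blocks with a single blank line keeps the parts of an instruction together, as the rules earlier on this page require.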

Testing the robots.txt File

Once you have created a robots.txt file, test it to make sure there are no errors in it. Here are a few ways to test the file:
  • Google Webmaster Tools

    Within Google Webmaster Tools (since renamed Google Search Console), under Health/Blocked URLs, there is a tool to test your robots.txt file. You will need a Google account and a Webmaster Tools account to use it.
  • Robots.txt Checker

    The Robots.txt Checker testing tool is available to the public. Enter the web address of your robots.txt file in the box provided then click the Check robots.txt button below. On the resulting page it will explain each set of instructions you have entered. At the top of the page it will tell you if you have errors or not. The results also point out what line is incorrect.
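
For a rough idea of what such a checker looks for, here is a small Python sketch that flags a few common slips. It is not the code behind either tool above. The function check_robots_txt, the list of accepted field names and the messages are all assumptions made for this example, and it follows this page's rule that a blank line ends an instruction.

```python
# Field names this sketch accepts; the list is an assumption, not a standard.
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def check_robots_txt(text):
    """Return a list of (line_number, message) problems found in text."""
    problems = []
    in_group = False  # True once a User-agent line has opened an instruction
    for number, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            in_group = False  # a blank line ends the current instruction
            continue
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # the line held only a comment
        field, sep, value = line.partition(":")
        field = field.strip().lower()
        value = value.strip()
        if not sep:
            problems.append((number, "missing ':' between field and value"))
        elif field not in KNOWN_FIELDS:
            problems.append((number, f"unknown field '{field}'"))
        elif field == "user-agent":
            in_group = True
        elif field in ("disallow", "allow") and not in_group:
            problems.append((number, f"'{field}' is not under a User-agent line"))
        elif field in ("disallow", "allow") and value and not value.startswith(("/", "*")):
            problems.append((number, "path should start with '/'"))
    return problems


# Sample with a misspelled field (line 2) and an orphaned rule (line 4).
SAMPLE = "User-agent: *\nDisalow: /images/\n\nDisallow: /private/\n"

for number, message in check_robots_txt(SAMPLE):
    print(f"line {number}: {message}")
```

A clean file produces an empty list. Passing such a check does not guarantee that every bot will read the file the way you intend.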

Search Engine robots.txt Information

Below are links to two of the search engines’ robots.txt information:

Use the robots.txt File Carefully

Be sure to understand the instructions you are placing in the robots.txt file of your site. A simple mistake could be disastrous for your site.
