Affiliate Marketing Tutorial 41 | Robots.txt – Telling The Search Engines What They Index
When your site is indexed by the search engines, it is first "crawled" by search engine spiders (Googlebot, Yahoo Slurp, Bingbot) so they can find all the content on your site and other people can find it through search.
But what if you've got sections of your website that you don't want indexed? The bots dumbly index whatever they can find. They don't know, for example, that the photos on the hidden part of your site are strictly for friends and family. They also don't know that there are pages you'd really rather not have popping up in the search engine listings or being archived by that pesky Internet Archive bot, like your long-expired special offers. In this lesson we look at robots.txt: telling the search engines what they can and cannot index.
What is the robots.txt file?
Robots.txt is a small text document that lives in the root of your website and tells the "robots" visiting your website which pages they can and cannot access. When one of these robots visits your site, the first thing it does is look for the robots.txt file. Well-behaved robots respect your requests and won't visit pages that you've disallowed.
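To see that check from the robot's side, Python's standard-library `urllib.robotparser` module gives a rough approximation. It implements the original robots.txt rules, while real search-engine crawlers use their own parsers. The sketch below is a minimal illustration under that assumption. The helper name `polite_can_fetch`, the `/personal/` rule and the example.com URLs are all made up.

```python
from urllib.robotparser import RobotFileParser


def polite_can_fetch(robots_lines, user_agent, url):
    """Hypothetical helper: mimic the check a well-behaved bot makes before requesting a page."""
    parser = RobotFileParser()
    # A live crawler would call parser.set_url("https://host/robots.txt")
    # and parser.read() to download the file; here we parse its lines directly.
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# Example rules (an assumption for illustration): keep every bot out of /personal/.
rules = ["User-agent: *", "Disallow: /personal/"]
print(polite_can_fetch(rules, "Googlebot", "https://www.example.com/personal/diary.html"))  # False
print(polite_can_fetch(rules, "Googlebot", "https://www.example.com/blog/"))  # True
```

Nothing here stops a bot that simply skips this check, which is why robots.txt is a request rather than a lock.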
How do you make a robots.txt file?
Decide which areas of your website you want the spiders to index, and which ones you don't want them crawling through. Also decide whether there are any bots you'd rather not have crawling your site at all.
Open up your plaintext editor of choice, create a new, blank text file and save it as robots.txt, then write this information into the file:
To block all spiders from your entire website:
User-agent: *
Disallow: /
To let all spiders see all content on your site:
User-agent: *
Disallow:
To block certain directories:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /personal/
Disallow: /photos/staffchristmasparty/
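Under the original rules, each Disallow value is matched as a prefix of the requested path. This means a trailing slash matters, and blocking a subdirectory leaves its parent open. The sketch below checks the directory example with Python's standard-library parser, used here as a stand-in for a polite crawler. The test paths are made up.

```python
from urllib.robotparser import RobotFileParser

# The "block certain directories" example, one rule per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /personal/
Disallow: /photos/staffchristmasparty/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Each path is compared against the Disallow values as a simple prefix.
for path in ["/tmp/notes.txt", "/tmpfiles/notes.txt",
             "/photos/staffchristmasparty/img1.jpg", "/photos/holiday.jpg"]:
    print(path, parser.can_fetch("Bingbot", path))
```

Only the first and third paths come back as blocked. `/tmpfiles/` doesn't start with `/tmp/`, and the rest of `/photos/` stays crawlable.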
To block a certain spider:
User-agent: Googlebot
Disallow: /
To allow a certain spider, while blocking others:
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
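The last example relies on groups: a bot follows the group whose User-agent line names it and falls back to the `*` group otherwise. An empty Disallow means nothing is blocked. Here is a minimal check with Python's `urllib.robotparser`, assuming its matching is close enough to the big crawlers' for this simple case. The page path is made up.

```python
from urllib.robotparser import RobotFileParser

# "Allow one spider, block the rest": two groups separated by a blank line.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot matches its own group (empty Disallow = allow everything);
# any other bot falls through to the "*" group, which blocks the whole site.
print(parser.can_fetch("Googlebot", "/offers/spring-sale.html"))  # True
print(parser.can_fetch("Bingbot", "/offers/spring-sale.html"))    # False
```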
- You must use a new line for each instruction.
- Blank lines are used to show separate groups of instructions (as in the last example).
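- In the original robots.txt standard, the asterisk only has a special meaning in the User-agent line and can't be used as a wildcard in paths. If you wanted to disallow all GIF images on your website, you couldn't just write Disallow: *.gif; that won't work. (Some major crawlers, including Googlebot and Bingbot, do recognise * and $ in paths, as in Disallow: /*.gif$, but don't count on every bot supporting it.)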
- Your file must be called robots.txt, all in lower-case.
- Your file must be located in the root directory of your website: www.yoursite.com/robots.txt. That’s where the spiders look when they visit your site, and they won’t find it if you put it anywhere else.
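Two of the rules above can be checked in a few lines of Python. First, a crawler builds the robots.txt address from the site root, whatever page it started on. Second, a prefix-only parser never matches a pattern like `*.gif`. The helper name `robots_url` and the URLs are hypothetical. Python's `RobotFileParser` follows the original prefix matching, so it stands in here for a bot without wildcard support.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    """Hypothetical helper: where a crawler looks for robots.txt for any page."""
    # A reference starting with "/" replaces the page's whole path,
    # leaving only the scheme and host: the root of the site.
    return urljoin(page_url, "/robots.txt")


print(robots_url("https://www.yoursite.com/blog/2017/some-post.html"))
# https://www.yoursite.com/robots.txt

# "*.gif" is not a path prefix, so a prefix-only parser never matches it.
parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: *.gif"])
print(parser.can_fetch("SomeBot", "/photos/cat.gif"))  # True: the GIF is NOT blocked
```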
Now simply save your file and upload it to your website.
Robots.txt and your XML sitemap
If you've seen our lesson on creating XML sitemaps, you'll know that your robots.txt file is a really handy place to tell the search engines where your sitemap is.
All you have to do is leave a blank line after the last rule in your robots.txt file, and then paste this little line, using your own sitemap's address:
Sitemap: http://www.example.com/sitemap.xml
If you’ve got more than one sitemap, you can enter more than one line.
Sitemap: http://www.example.com/sitemap1.xml
Sitemap: http://www.example.com/sitemap2.xml
Sitemap: http://www.example.com/sitemap3.xml
This way you don't need to tell each and every search engine where to find your sitemap. They'll see it as soon as they look for your robots.txt file, which every polite bot does when it visits your site anyway.
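As a rough illustration of a bot picking up those lines, Python's `urllib.robotparser` (version 3.8 or later) collects every Sitemap entry it sees into `site_maps()`, independent of the User-agent groups. The Disallow rule and sitemap URLs below are placeholders.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /tmp/

Sitemap: http://www.example.com/sitemap1.xml
Sitemap: http://www.example.com/sitemap2.xml
Sitemap: http://www.example.com/sitemap3.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the Sitemap URLs in file order, or None if there are none.
for sitemap in parser.site_maps() or []:
    print(sitemap)
```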
Things you need to know
Not all spiders honor robots.txt
"Polite" spiders, such as those run by the major search engines, won't crawl items you've disallowed in your robots.txt file. However, not all robots are polite (some smaller search engines' crawlers and general data-scraping bots, for example), and these will collect any content they can reach anyway.
Your robots.txt is publicly accessible!
Don't try to use your robots.txt file to hide content on your site. Anybody can view the file simply by typing www.yoursite.com/robots.txt into their browser, so anybody can see the things you've said you don't want indexed!
If there's content on your website that you really, really don't want anybody else seeing, your best bet is to password-protect that directory. There will usually be a tool to help you do this in your hosting control panel (cPanel or similar). Note that password-protecting your content (if done right) will also keep the "impolite" bots out.
In this lesson we’ve looked at robots.txt – what it is, what it’s used for, and how to create one. We’ve looked at certain things you can do with robots.txt including:
- Blocking your entire site from indexing
- Blocking certain directories
- Blocking certain bots
- Identifying the location of the sitemap