Robots.txt
Quick Definition
Robots.txt is a text file placed in a website's root directory that instructs search engine crawlers which pages or sections they are allowed or disallowed from crawling. It helps manage crawl budget and protect private content.
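A minimal robots.txt might look like the following sketch (the paths and sitemap URL here are illustrative, not taken from any real site):

```text
User-agent: *
Disallow: /admin/
Disallow: /checkout/

Sitemap: https://www.example.com/sitemap.xml
```

User-agent selects which crawler the rules apply to (* means all crawlers), each Disallow line names a URL path prefix crawlers are asked to skip, and the optional Sitemap line points crawlers to the site's XML sitemap.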
Why It Matters
Because robots.txt sits at the very front door of your site, getting it wrong can accidentally block your entire site from Google. Every technical SEO role requires you to know how to read and write robots.txt rules.
Real-World Example
If you visit amazon.in/robots.txt in your browser, you can see Amazon India's actual robots.txt file. It tells Google which sections to crawl and which to skip, like admin pages or checkout flows that should not appear in search results.
Signal Connection
Presence -- robots.txt controls whether search engine crawlers can access your pages at all. A misconfigured robots.txt can make your entire site invisible in Google search.
Pro Tip
Check your site's robots.txt right now by adding /robots.txt to your domain. Then use the robots.txt report in Google Search Console to verify it is not accidentally blocking important pages.
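You can also sanity-check rules programmatically with Python's standard-library urllib.robotparser. This sketch parses a hypothetical rules snippet in memory rather than fetching a live site:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /checkout/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Ask whether a generic crawler ("*") may fetch each URL
print(rp.can_fetch("*", "https://example.com/admin/login"))  # False
print(rp.can_fetch("*", "https://example.com/products/"))    # True
```

To check a live site instead, you would call rp.set_url("https://yourdomain.com/robots.txt") followed by rp.read() before querying can_fetch.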
Common Mistake
Beginners sometimes add "Disallow: /" to block everything while the site is under development, then forget to remove it before launch. This single line blocks ALL search engines from your entire site. Always double-check robots.txt after going live.
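The difference between blocking everything and allowing everything is a single slash, which is why this mistake is so easy to ship. A sketch of the two forms:

```text
# Blocks ALL crawlers from the ENTIRE site -- fine on staging, fatal in production
User-agent: *
Disallow: /

# An empty Disallow value permits all crawling
User-agent: *
Disallow:
```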
Test Your Knowledge
What does the robots.txt directive "Disallow: /admin/" do?
A. Deletes the /admin/ section from the server
B. Asks search engine crawlers not to access the /admin/ section
C. Password-protects the /admin/ section
D. Redirects /admin/ to the homepage
Show Answer
Answer: B. Asks search engine crawlers not to access the /admin/ section
The Disallow directive in robots.txt asks search engine crawlers not to access that URL path. It does not delete anything, add passwords, or create redirects. Note that it is a polite request, not a security measure: well-behaved crawlers will follow it, but anyone can still open the URL directly, and a disallowed page can even appear in search results if other sites link to it.