Block Search Engines from Crawling CDN-Hosted Static Pages
🛡️ Prevent CDN Subdomains from Being Indexed: Protect Your WordPress SEO
Problem: Search engines can index a CDN subdomain as a full copy of your site. The same pages then live on two hostnames, and this “mirrored content” is treated as duplicate content that can hurt your rankings.
🔍 Why This Happens
When using CDN caching with WordPress:
- CDN domains sharing your origin IP may serve HTML pages directly.
- Search engines crawl these as separate sites, creating duplicate content.
- Without static page caching, requests on the CDN hostname are redirected to the main site. Once HTML is cached statically, the cached page is served directly on the CDN hostname, and that is what causes the issue.
✅ 4-Step Solution to Block CDN Indexing
1️⃣ Create a Dedicated robots2.txt for CDN
User-agent: *
Allow: /robots.txt
Allow: /*.png*
Allow: /*.jpg*
Allow: /*.jpeg*
Allow: /*.gif*
Allow: /*.bmp*
Allow: /*.ico*
Allow: /*.js*
Allow: /*.css*
Allow: /wp-content/*
Disallow: /
What it does:
- ✅ Permits crawling of static assets (images, JS, CSS).
- 🚫 Blocks all other content to prevent CDN mirroring.
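To see how these rules resolve for a particular URL, here is a minimal Python sketch of the matching behaviour Google documents for robots.txt: the longest matching pattern wins, and Allow wins a tie. The names `RULES`, `rule_to_regex`, and `is_allowed` are illustrative helpers, not part of any library, and other crawlers may resolve conflicts differently.

```python
import re

# Rules copied from robots2.txt above: (directive, path pattern).
RULES = [
    ("allow", "/robots.txt"),
    ("allow", "/*.png*"),
    ("allow", "/*.jpg*"),
    ("allow", "/*.jpeg*"),
    ("allow", "/*.gif*"),
    ("allow", "/*.bmp*"),
    ("allow", "/*.ico*"),
    ("allow", "/*.js*"),
    ("allow", "/*.css*"),
    ("allow", "/wp-content/*"),
    ("disallow", "/"),
]


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern ('*' wildcard, optional '$') to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if `path` may be crawled under `rules`."""
    best = None  # (pattern length, is_allow); longer wins, Allow wins ties
    for directive, pattern in rules:
        if rule_to_regex(pattern).match(path):
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    # A path that matches no rule at all is crawlable by default.
    return True if best is None else best[1]


print(is_allowed("/wp-content/uploads/logo.png"))  # True
print(is_allowed("/hello-world/"))                 # False
```

Note that `Allow: /wp-content/*` permits everything under that directory, not just images, so any HTML cached there would still be crawlable on the CDN host.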
2️⃣ Nginx: Redirect robots.txt Requests
Add to server config:
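# Serve robots2.txt to ALL non-primary hosts (internal rewrite, no HTTP redirect)
if ($http_host != "www.yourdomain.com") {
    rewrite ^/robots\.txt$ /robots2.txt last;
}
Why: Ensures only your main domain uses the standard robots.txt; CDN hostnames and other subdomains get the restrictive version. The rewrite is internal, so crawlers still request /robots.txt and never see the robots2.txt name.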
3️⃣ Apache: Implement via .htaccess
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.yourdomain\.com$ [NC]
RewriteRule ^robots\.txt$ robots2.txt [L]
Note: Replace yourdomain.com with your actual domain.
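The Nginx and Apache rules both come down to one decision: compare the request's Host header with the primary hostname and serve robots2.txt to everything else. The Python sketch below models that decision purely for illustration. `PRIMARY_HOST` and `robots_file_for` are hypothetical names; the lowercasing mirrors Apache's `[NC]` flag, and stripping the port is an extra normalization that neither config above performs.

```python
PRIMARY_HOST = "www.yourdomain.com"  # placeholder, as in the configs above


def robots_file_for(host_header, primary=PRIMARY_HOST):
    """Pick which file a request for /robots.txt should receive."""
    host = host_header.strip().lower()
    # Drop an explicit port such as ":443" (assumption: no IPv6 literals).
    host = host.split(":", 1)[0]
    return "robots.txt" if host == primary else "robots2.txt"


for host in ("www.yourdomain.com", "cdn.yourdomain.com", "yourdomain.com"):
    print(host, "->", robots_file_for(host))
```

The bare `yourdomain.com` also lands in the restrictive branch, exactly as it would under the configs as written. That is harmless if the apex redirects to `www`, but worth checking if it serves pages itself.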
4️⃣ Critical Verification
Test immediately after setup:
- Visit `cdn.yourdomain.com/robots.txt` → Should show `robots2.txt` content.
- Visit `www.yourdomain.com/robots.txt` → Should show your standard rules.
🚨 Failure = SEO disaster: Incorrect blocking can hide your entire site from search engines!
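The two manual checks can also be scripted. This is a rough Python sketch, assuming both hosts answer over HTTPS and that the restrictive file is recognizable by its bare `Disallow: /` line; `fetch_robots`, `blocks_everything`, and `check_hosts` are hypothetical helper names.

```python
from urllib.request import urlopen


def fetch_robots(host):
    """Download /robots.txt from `host` over HTTPS and return it as text."""
    with urlopen(f"https://{host}/robots.txt", timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def blocks_everything(robots_text):
    """True if the file contains a bare 'Disallow: /' line."""
    for line in robots_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "disallow" and value.strip() == "/":
            return True
    return False


def check_hosts(main_host, cdn_host, fetch=fetch_robots):
    """Return a list of problems; an empty list means both hosts look right."""
    problems = []
    if blocks_everything(fetch(main_host)):
        problems.append(f"{main_host} serves the restrictive rules")
    if not blocks_everything(fetch(cdn_host)):
        problems.append(f"{cdn_host} is missing 'Disallow: /'")
    return problems
```

Calling `check_hosts("www.yourdomain.com", "cdn.yourdomain.com")` performs two live requests. The `fetch` parameter exists so the logic can be exercised without network access. If the CDN caches robots.txt itself, purge it first, or the check may report stale content.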
📌 Key Recommendations
| Action | Purpose | Risk if Ignored |
|---|---|---|
| Separate robots2.txt for CDN | Allow static assets + block HTML | Duplicate content penalties |
| Strict host-based redirects | Isolate CDN vs. main domain | Search engines index mirror sites |
| Post-setup validation | Confirm correct behavior | Accidental site-wide blocking |
💡 Pro Tips
- Use DNS CNAMEs: Point `cdn.yourdomain.com` to your CDN provider; don’t let it resolve to the origin IP.
- Cache-Control Headers: Set `Cache-Control: public` only for static assets, `private` for HTML.
- Monitor Search Console: Check the “Coverage” report (now labelled “Page indexing”) for unexpected CDN indexing.
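As an illustration of the Cache-Control tip, the split between static assets and HTML can be expressed as a small lookup. This Python sketch is only a model of the policy: `STATIC_EXTENSIONS` reuses the file types from robots2.txt, `cache_control_for` is a hypothetical helper, and a real setup would normally set these headers in the web server or CDN configuration and add a `max-age`.

```python
import posixpath

# Same asset types that robots2.txt allows crawlers to fetch.
STATIC_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".ico", ".js", ".css"}


def cache_control_for(url_path):
    """Return the Cache-Control value for a request path."""
    path = url_path.split("?", 1)[0]  # ignore query strings such as ?ver=6
    ext = posixpath.splitext(path)[1].lower()
    return "public" if ext in STATIC_EXTENSIONS else "private"


print(cache_control_for("/wp-content/themes/x/style.css?ver=6"))  # public
print(cache_control_for("/hello-world/"))                         # private
```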
✨ Why this works: Search engines treat `robots.txt` rules as host-specific. By isolating CDN directives, you protect your main domain’s SEO authority while allowing static resource delivery.
Implement this today to maintain SEO integrity in WordPress+CDN environments! 🚀