Block Search Engines from Crawling CDN-Hosted Static Pages

🛡️ Prevent CDN Subdomains from Being Indexed: Protect Your WordPress SEO
Problem: Search engines are indexing CDN subdomains as duplicate content, risking SEO penalties due to “mirrored content” across domains.
🔍 Why This Happens
When using CDN caching with WordPress:
- CDN domains that share your origin IP may serve HTML pages directly.
- Search engines crawl these as separate sites, creating duplicate content.
- Without static page caching, page requests on the CDN host are redirected to the main site, but cached HTML files are served as-is, and that is what causes the issue.
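To check whether your own site is affected, you can request a post through the CDN hostname and see whether it comes back as HTML (200, text/html) instead of a redirect. The minimal Python sketch below assumes the hostname cdn.yourdomain.com. The path /hello-world/ and the function names are hypothetical. The fetcher can be swapped out so the logic can be exercised without network access.

```python
from typing import Callable
from urllib.error import HTTPError
from urllib.request import HTTPRedirectHandler, build_opener

# (status code, Content-Type header, body) for one GET request
Response = tuple[int, str, bytes]


class NoRedirect(HTTPRedirectHandler):
    """Surface 3xx responses instead of silently following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch(url: str) -> Response:
    """GET url without following redirects."""
    opener = build_opener(NoRedirect)
    try:
        with opener.open(url, timeout=10) as resp:
            return resp.status, resp.headers.get("Content-Type", ""), resp.read()
    except HTTPError as err:  # 3xx (because of NoRedirect), 4xx and 5xx end up here
        return err.code, err.headers.get("Content-Type", ""), err.read()


def cdn_serves_html(path: str, cdn_host: str = "cdn.yourdomain.com",
                    get: Callable[[str], Response] = fetch) -> bool:
    """True if the CDN host answers a page request with HTML itself
    (200 + text/html) rather than redirecting to the main site."""
    status, content_type, _ = get(f"https://{cdn_host}{path}")
    return status == 200 and content_type.lower().startswith("text/html")
```

On a live site, cdn_serves_html("/hello-world/") returning True suggests the CDN is caching and serving your pages as a mirror.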
✅ 4-Step Solution to Block CDN Indexing
1️⃣ Create a Dedicated robots2.txt for the CDN
Save the following as robots2.txt in your site's document root, next to your regular robots.txt:
User-agent: *
Allow: /robots.txt
Allow: /*.png*
Allow: /*.jpg*
Allow: /*.jpeg*
Allow: /*.gif*
Allow: /*.bmp*
Allow: /*.ico*
Allow: /*.js*
Allow: /*.css*
Allow: /wp-content/*
Disallow: /
What it does:
- ✅ Permits crawling of static assets (images, JS, CSS) and of everything under /wp-content/.
- 🚫 Blocks all other content, including HTML pages, to prevent CDN mirroring.
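Python's standard urllib.robotparser does plain prefix matching and does not understand the * wildcards used in robots2.txt. To sanity-check which URLs the file permits, here is a minimal sketch of Google-style matching (as described in RFC 9309): * matches any run of characters, a trailing $ anchors the end, and when several rules match, the longest pattern wins, with Allow winning a tie. The helper names are hypothetical, and the sketch ignores user-agent groups.

```python
import re

ROBOTS2 = """\
User-agent: *
Allow: /robots.txt
Allow: /*.png*
Allow: /*.jpg*
Allow: /*.jpeg*
Allow: /*.gif*
Allow: /*.bmp*
Allow: /*.ico*
Allow: /*.js*
Allow: /*.css*
Allow: /wp-content/*
Disallow: /
"""


def parse_rules(text: str) -> list[tuple[bool, str]]:
    """Collect (is_allow, pattern) pairs; user-agent groups are ignored."""
    rules = []
    for line in text.splitlines():
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field in ("allow", "disallow") and value:
            rules.append((field == "allow", value))
    return rules


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate '*' (any characters) and a trailing '$' (end of URL)."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[bool, str]]) -> bool:
    """Longest matching pattern wins; Allow wins a tie; no match means allowed."""
    best_len, allowed = -1, True
    for is_allow, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


rules = parse_rules(ROBOTS2)
for path in ["/", "/hello-world/", "/wp-content/uploads/logo.png", "/style.css?ver=6.5"]:
    print(f"{path:32} {'allowed' if is_allowed(path, rules) else 'blocked'}")
```

Pages such as / and /hello-world/ match only Disallow: /, so they are blocked, while asset URLs match a longer Allow pattern. Note that /*.js* also matches paths such as /data.json, because the trailing * accepts anything after .js.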
2️⃣ Nginx: Rewrite robots.txt Requests
Add this inside your site's server block:
# Serve robots2.txt for robots.txt requests on every host except the main domain
if ($http_host != "www.yourdomain.com") {
    rewrite ^/robots\.txt$ /robots2.txt last;
}
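Why: Only your main domain gets the standard robots.txt; the CDN and any other subdomain get the restrictive version. This relies on the CDN forwarding its own hostname (e.g. cdn.yourdomain.com) in the Host header when it pulls from your origin. If your CDN sends your main domain as the Host header instead, the origin cannot tell the two apart.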
3️⃣ Apache: Implement via .htaccess
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.yourdomain\.com$ [NC]
RewriteRule ^robots\.txt$ robots2.txt [L]
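Note: Replace yourdomain.com with your actual domain. If WordPress already manages this .htaccess file, place these lines above the # BEGIN WordPress block so they run before WordPress's own rewrite rules.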
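The Nginx and Apache rules make slightly different decisions at the edges. In Nginx, $http_host is the raw Host header and != is an exact, case-sensitive string comparison. The Apache condition is a regular expression with [NC] (case-insensitive). Neither strips a port number. The Python sketch below only models the two checks so you can see which hostnames receive robots2.txt. The servers do not run it, and the function names are hypothetical.

```python
import re

MAIN_HOST = "www.yourdomain.com"  # placeholder for your primary domain


def nginx_file(host_header: str, uri: str) -> str:
    r"""Model of: if ($http_host != "www.yourdomain.com") { rewrite ^/robots\.txt$ /robots2.txt last; }
    $http_host is the raw Host header and != is an exact string comparison."""
    if host_header != MAIN_HOST and re.fullmatch(r"/robots\.txt", uri):
        return "/robots2.txt"
    return uri


# RewriteCond %{HTTP_HOST} !^www\.yourdomain\.com$ [NC]   ([NC] = case-insensitive)
APACHE_MAIN_HOST = re.compile(r"^www\.yourdomain\.com$", re.IGNORECASE)


def apache_file(host_header: str, uri: str) -> str:
    """Model of the .htaccess rules; in .htaccess, RewriteRule sees the path
    without its leading slash."""
    if not APACHE_MAIN_HOST.search(host_header) and re.fullmatch(r"robots\.txt", uri.lstrip("/")):
        return "/robots2.txt"
    return uri


for host in ["www.yourdomain.com", "cdn.yourdomain.com",
             "WWW.yourdomain.com", "www.yourdomain.com:443"]:
    print(f"{host:24} nginx -> {nginx_file(host, '/robots.txt'):14} "
          f"apache -> {apache_file(host, '/robots.txt')}")
```

The uppercase row is where the servers differ: Nginx hands it robots2.txt, while Apache's [NC] flag still treats it as the main domain. The port-qualified row shows that neither rule strips a port. Browsers and crawlers normally send lowercase hostnames without default ports, so in practice both rules usually agree.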
4️⃣ Critical Verification
Purge any copy of robots.txt your CDN has already cached, then test immediately:
- Visit cdn.yourdomain.com/robots.txt → it should show the robots2.txt content.
- Visit www.yourdomain.com/robots.txt → it should show your standard rules.
🚨 Failure = SEO disaster: if your main domain ends up serving robots2.txt, its Disallow: / rule stops search engines from crawling your entire site!
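The two checks can also be scripted so you can rerun them after every config or CDN change. This minimal Python sketch assumes the hostnames www.yourdomain.com and cdn.yourdomain.com, and it treats a bare Disallow: / line as the signature of robots2.txt. It ignores user-agent groups, so a standard robots.txt that blocks one specific bot with Disallow: / would also be flagged. The function names are hypothetical, and the fetcher can be swapped out for testing.

```python
from typing import Callable
from urllib.request import urlopen


def fetch_text(url: str) -> str:
    """Download a text file (follows redirects, 10 s timeout)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def blocks_everything(robots: str) -> bool:
    """True if any line is a bare 'Disallow: /' (comments and whitespace ignored)."""
    for line in robots.splitlines():
        rule = "".join(line.split("#", 1)[0].split()).lower()
        if rule == "disallow:/":
            return True
    return False


def verify(main: str = "www.yourdomain.com", cdn: str = "cdn.yourdomain.com",
           get: Callable[[str], str] = fetch_text) -> list[str]:
    """Return a list of problems; an empty list means both hosts look right."""
    problems = []
    if not blocks_everything(get(f"https://{cdn}/robots.txt")):
        problems.append(f"{cdn}/robots.txt is not serving the robots2.txt rules")
    if blocks_everything(get(f"https://{main}/robots.txt")):
        problems.append(f"{main}/robots.txt contains 'Disallow: /' and may block the whole site")
    return problems
```

Against a live site, print(verify() or "OK") either lists what is wrong or confirms both hosts.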
📌 Key Recommendations
| Action | Purpose | Risk if Ignored |
|---|---|---|
| Separate robots2.txt for the CDN | Allow static assets, block HTML | Duplicate content penalties |
| Strict host-based rewrites | Isolate the CDN from the main domain | Search engines index mirror sites |
| Post-setup validation | Confirm correct behavior | Accidental site-wide blocking |
💡 Pro Tips
- Use DNS CNAMEs: Point cdn.yourdomain.com at your CDN provider; don't let it resolve to your origin IP.
- Cache-Control headers: Send Cache-Control: public only for static assets and private for HTML, so shared caches such as the CDN don't store your pages.
- Monitor Search Console: Check the Page indexing report (formerly "Coverage") for unexpected CDN URLs.
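The first two tips can be spot-checked with a short Python sketch. It assumes you know your origin server's IP address; 203.0.113.10 below is a documentation placeholder. It flags any HTML response whose Cache-Control has neither private nor no-store, since shared caches may store such a response. The function names are hypothetical.

```python
import socket
from typing import Callable


def resolved_ips(host: str) -> set[str]:
    """Every IPv4/IPv6 address the hostname currently resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def cdn_hits_origin(cdn_host: str, origin_ip: str,
                    resolve: Callable[[str], set[str]] = resolved_ips) -> bool:
    """True if the CDN hostname resolves straight to your origin server's IP."""
    return origin_ip in resolve(cdn_host)


def html_missing_private(content_type: str, cache_control: str) -> bool:
    """True for an HTML response whose Cache-Control lacks 'private'
    (or 'no-store'), i.e. one a shared cache such as a CDN may store."""
    if not content_type.lower().startswith("text/html"):
        return False
    directives = {part.split("=", 1)[0].strip().lower()
                  for part in cache_control.split(",") if part.strip()}
    return not directives & {"private", "no-store"}
```

For example, cdn_hits_origin("cdn.yourdomain.com", "203.0.113.10") returning True means the CNAME advice above has not been applied yet.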
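✨ Why this works: Search engines apply robots.txt rules per host, so cdn.yourdomain.com/robots.txt governs only the CDN host and www.yourdomain.com/robots.txt governs only the main site. By isolating CDN directives, you protect your main domain's SEO authority while still allowing static resource delivery.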
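Crawlers look up robots.txt at the root of the exact scheme and host (plus port, if any) of each URL, which is why the CDN host can carry different rules from the main domain. A minimal Python sketch of that lookup, using only urllib.parse (the function name is hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and netloc (host[:port]); replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.yourdomain.com/hello-world/"))
# → https://www.yourdomain.com/robots.txt
print(robots_url("https://cdn.yourdomain.com/hello-world/"))
# → https://cdn.yourdomain.com/robots.txt
```

The same post path maps to two different robots.txt files, one per host, so each host can be given its own rules.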
Implement this today to maintain SEO integrity in WordPress+CDN environments! 🚀