
Block Search Engines from Crawling CDN-Hosted Static Pages

🛡️ Prevent CDN Subdomains from Being Indexed: Protect Your WordPress SEO

Problem: Search engines can index pages served from your CDN subdomain as a second copy of your site, which risks SEO penalties for “mirrored content” across domains.


🔍 Why This Happens

When using CDN caching with WordPress:

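  • CDN hostnames that share your origin IP may serve full HTML pages, not just static assets.

  • Search engines crawl these hostnames as separate sites, creating duplicate content.

  • Without static HTML caching, the CDN hostname redirects to the main site. Once HTML pages are cached, the CDN serves them under its own hostname, and that is what causes the issue.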


✅ 4-Step Solution to Block CDN Indexing

1️⃣ Create a Dedicated robots2.txt for CDN

User-agent: *
Allow: /robots.txt
Allow: /*.png*
Allow: /*.jpg*
Allow: /*.jpeg*
Allow: /*.gif*
Allow: /*.bmp*
Allow: /*.ico*
Allow: /*.js*
Allow: /*.css*
Allow: /wp-content/*
Disallow: /

What it does:

  • ✅ Permits crawling of static assets (images, JS, CSS).

  • 🚫 Blocks all other content to prevent CDN mirroring.
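
To see how the Allow lines and the final Disallow interact, here is a minimal Python sketch of the matching rule described in RFC 9309 and in Google's robots.txt documentation: the longest matching pattern wins, and Allow wins a tie. The helper names (parse_rules, pattern_to_regex, is_allowed) are made up for illustration. The sketch ignores User-agent groups because this file has only one, and real crawlers may differ in edge cases.

```python
import re

ROBOTS2 = """\
User-agent: *
Allow: /robots.txt
Allow: /*.png*
Allow: /*.jpg*
Allow: /*.jpeg*
Allow: /*.gif*
Allow: /*.bmp*
Allow: /*.ico*
Allow: /*.js*
Allow: /*.css*
Allow: /wp-content/*
Disallow: /
"""


def parse_rules(text):
    """Collect (is_allow, pattern) pairs; User-agent grouping is ignored (assumption)."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() in ("allow", "disallow") and value:
            rules.append((field.lower() == "allow", value))
    return rules


def pattern_to_regex(pattern):
    """Translate a robots.txt pattern ('*' wildcard, optional trailing '$') to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Longest matching pattern wins; on equal length, Allow wins."""
    best_len, best_allow = -1, True  # no matching rule means the path is allowed
    for allow, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


rules = parse_rules(ROBOTS2)
for path in ("/wp-content/uploads/logo.png", "/wp-includes/js/jquery.min.js",
             "/robots.txt", "/2024/hello-world/", "/"):
    print(path, "->", "allowed" if is_allowed(rules, path) else "blocked")
```

Under these rules the asset paths and /robots.txt come out as allowed, while the post URL and the home page only match Disallow: / and come out as blocked.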


2️⃣ Nginx: Redirect robots.txt Requests

Add to the server block:

# Serve robots2.txt for every hostname except the primary domain.
# $host is lowercased and carries no port, unlike $http_host.
if ($host != "www.yourdomain.com") {
    rewrite ^/robots\.txt$ /robots2.txt last;
}

Why: Ensures only your main domain uses the standard robots.txt; CDN/subdomains use the restrictive version.


3️⃣ Apache: Implement via .htaccess

RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.yourdomain\.com$ [NC]
RewriteRule ^robots\.txt$ robots2.txt [L]

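Note: Replace yourdomain.com with your actual domain, and place these rules above the # BEGIN WordPress block so they run before WordPress's own rewrite rules.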
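
Both server snippets make the same decision: the primary hostname keeps robots.txt, and every other hostname gets robots2.txt. A minimal Python sketch of that decision is shown here so you can reason about which hostnames end up on which file. The function name robots_file_for is hypothetical, and the lowercasing and port stripping are assumptions about how you want hosts compared, not a copy of what Nginx or Apache do internally.

```python
PRIMARY_HOST = "www.yourdomain.com"  # placeholder from the article


def robots_file_for(host_header):
    """Return the file that should answer /robots.txt for a given Host header."""
    host = host_header.strip().lower()
    # Drop an optional port such as ":443" (assumption: hostnames only, no IPv6 literals).
    if ":" in host:
        host = host.rsplit(":", 1)[0]
    return "robots.txt" if host == PRIMARY_HOST else "robots2.txt"


for header in ("www.yourdomain.com", "cdn.yourdomain.com", "yourdomain.com"):
    print(header, "->", robots_file_for(header))
```

Note the last case: the bare domain without www also receives robots2.txt. That is usually harmless if it redirects to the www hostname, but if the bare domain is your canonical site, use it as the primary host instead.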


4️⃣ Critical Verification

Test immediately after setup:

  1. Visit cdn.yourdomain.com/robots.txt → Should show robots2.txt content.

  2. Visit www.yourdomain.com/robots.txt → Should show your standard rules.

🚨 Failure = SEO disaster: Incorrect blocking can hide your entire site from search engines!
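
If you would rather script the two checks than open them in a browser, a rough Python sketch could look like this. The hostnames are the article's placeholders, the function names are made up for illustration, and HTTPS is assumed. The check simply looks for a catch-all Disallow: / line, so it will raise a false alarm if your main robots.txt deliberately blocks one specific bot that way.

```python
from urllib.request import urlopen

PRIMARY_HOST = "www.yourdomain.com"  # placeholder from the article
CDN_HOST = "cdn.yourdomain.com"      # placeholder from the article


def fetch_robots(host, timeout=10):
    """Download https://<host>/robots.txt and return the body as text."""
    with urlopen(f"https://{host}/robots.txt", timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def blocks_everything_else(robots_text):
    """True if the file contains a catch-all 'Disallow: /' line."""
    for line in robots_text.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip() == "/":
            return True
    return False


def verify(fetch=fetch_robots):
    """Return a list of problems; an empty list means both hosts look right."""
    problems = []
    if not blocks_everything_else(fetch(CDN_HOST)):
        problems.append(f"{CDN_HOST} is not serving the restrictive robots2.txt rules")
    if blocks_everything_else(fetch(PRIMARY_HOST)):
        problems.append(f"{PRIMARY_HOST} is serving 'Disallow: /' and would be blocked")
    return problems
```

Calling verify() with no arguments performs two real HTTP requests. Passing your own fetch function lets you try the logic against saved copies of the files first.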


📌 Key Recommendations

Action                       | Purpose                          | Risk if Ignored
-----------------------------+----------------------------------+----------------------------------
Separate robots2.txt for CDN | Allow static assets + block HTML | Duplicate content penalties
Strict host-based redirects  | Isolate CDN vs. main domain      | Search engines index mirror sites
Post-setup validation        | Confirm correct behavior         | Accidental site-wide blocking

💡 Pro Tips

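  • Use DNS CNAMEs: Point cdn.yourdomain.com at the hostname your CDN provider gives you. Don't let it resolve straight to your origin IP.

  • Cache-Control Headers: Send Cache-Control: public only for static assets, and private for HTML, so that shared caches such as the CDN do not store your pages.

  • Monitor Search Console: Check the “Coverage” report (shown as “Pages” under Indexing in newer versions of Search Console) for unexpected CDN URLs.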
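
For the Cache-Control tip, a small Python sketch can flag HTML responses that a shared cache would be allowed to store. The function name html_may_reach_cdn_cache is hypothetical, and the check only reads the two headers you pass in. Your CDN may apply its own caching rules on top of these headers, so treat the result as a hint.

```python
def html_may_reach_cdn_cache(content_type, cache_control=""):
    """Return True if the response is HTML and a shared cache is allowed to store it."""
    if not content_type.lower().startswith("text/html"):
        return False  # static assets are meant to be cached
    directives = {
        part.strip().lower().split("=", 1)[0]
        for part in cache_control.split(",")
        if part.strip()
    }
    # 'private' and 'no-store' both forbid storage in shared caches.
    # With neither present (or no header at all), storage is not ruled out.
    return not (directives & {"private", "no-store"})


print(html_may_reach_cdn_cache("text/html; charset=UTF-8", "public, max-age=3600"))
print(html_may_reach_cdn_cache("text/html; charset=UTF-8", "private, max-age=0"))
```

The first call reports an HTML page that is open to CDN caching. The second reports one that is protected by private.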

✨ Why this works: Search engines treat robots.txt rules as host-specific. By isolating CDN directives, you protect your main domain’s SEO authority while allowing static resource delivery.

Implement this today to maintain SEO integrity in WordPress+CDN environments! 🚀