Web Publishers Should Block AI Bots. Here’s Why & How.
Do you publish a website? AI bots are likely scraping your content without providing you fair value for what they take. It’s time for the open web to fight back.
This simple guide explains why you should block AI bots and how to do it.
Why Block AI Bots?
Generative AI breaks the open web’s social contract.
In the past, search engines and platforms sent real human users to websites. Today, they increasingly send AI bots instead.
The bots scrape your content and provide it directly to users on their platforms.
In order for the open web to survive, website owners must demand control and compensation from AI search companies.
A new model for the web is needed and, indeed, one is forming. AI companies are already paying hundreds of millions of dollars in licensing fees to some larger websites in exchange for those sites unblocking their AI scrapers. Many more deals, including with Google, are being negotiated in backrooms.
Small and medium websites have so far mostly been left out. But entities like Cloudflare, TollBit, Fastly, and Really Simple Licensing (“RSL”) are building the digital infrastructure and standards that would allow websites of all sizes to participate in this new AI content licensing marketplace.
We have the technology. We just need the AI companies to come to the table.
But, in my opinion, you as a website owner will lack negotiating leverage until you stop giving AI bots your content for free.
And more websites blocking AI bots puts all publishers in a stronger position to demand fair compensation for our content.
For a (much) deeper explanation of all this, add my YouTube video to your watchlist:
If you need more convincing, set aside an hour of your day and watch my video. I promise it’s worth your time if you publish a website for a living.
How to Block AI Bots (3 Steps to Take)
Step 1: Update Your Robots.txt File to Disallow AI Bots
What Is Robots.txt? Robots.txt is a plain-text file in your website’s root directory that gives instructions to web robots under the Robots Exclusion Protocol.
You can usually see your file by visiting “yourwebsite.com/robots.txt”.
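For reference, a robots.txt file is usually only a few lines long. Here is an illustrative example of what one might contain before any AI-bot rules are added (this mirrors the default file WordPress generates; yours may differ, and the sitemap URL is a placeholder):

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://yourwebsite.com/sitemap.xml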
How to Edit Your Robots.txt File:
- WordPress.org:
- If you use an SEO plugin, you likely already have a built-in editor. Visit these links for instructions from Yoast, RankMath, SEOPress, All in One SEO, and SEO Framework.
- You can also find standalone plugins in the directory.
- Wix: follow these instructions.
- Squarespace: Settings > Crawlers > “Block known artificial intelligence crawlers” (more info here)
- Other Websites:
- Search for instructions for your specific platform or setup
- Or ask your host or web developer for help
- Or, if you are on Cloudflare, do it via their managed robots.txt service
- Or do it yourself by finding your robots.txt file in the root directory of your hosting account (usually /public_html/) and editing it directly. Here are instructions from Google. Be careful editing your root files if you are not technically proficient.
Which AI Bots to Disallow:
You have to decide for yourself, but I recommend disallowing as many AI bots as possible.
The following is a list of some of the most significant AI crawlers, formatted in robots.txt syntax:
User-agent: GPTBot
Disallow: /
User-agent: OAI-SearchBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Google-CloudVertexBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: Claude-User
Disallow: /
User-agent: Claude-SearchBot
Disallow: /
User-agent: cohere-ai
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Perplexity-User
Disallow: /
User-agent: meta-externalagent
Disallow: /
The above list blocks some of the most common AI bots from OpenAI, Google, Perplexity, Anthropic, Meta, Cohere, and Common Crawl. For more bots to consider disallowing, see this comprehensive list of AI bots.
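Note that these directives are additive: each group applies only to the named AI crawler, so any rules you already have for other crawlers (and your Sitemap line) stay in place and keep working. As a rough sketch, assuming a fairly typical existing file, the combined result might look like this (placeholder sitemap URL):

Sitemap: https://yourwebsite.com/sitemap.xml

User-agent: *
Disallow: /wp-admin/

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# ...followed by the rest of the AI bot groups listed above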
Will Disallowing Google-Extended or Google-CloudVertexBot Affect My Appearance in Google?
No. Disallowing these tokens stops Google’s Gemini AI from training new models on your content, but it will not affect Search.
According to Google’s crawler documentation, “Google-Extended does not impact a site’s inclusion in Google Search nor is it used as a ranking signal in Google Search.”
The same is true for Google-CloudVertexBot, which is a crawler used for Google’s enterprise clients that want to scrape your content to build AI apps (that might compete with you). According to Google, blocking it has “no effect on Google Search or other products.”
Note that blocking either of these crawlers will not block your website’s appearance in AI Overviews or AI Mode. Currently it is not possible to block those AI applications without blocking search appearance. This is something publishers should pressure Google to change.
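To make the separation concrete, here is an illustrative robots.txt fragment. Googlebot (Search) and Google-Extended / Google-CloudVertexBot (AI) are independent user agent tokens, so disallowing the AI tokens has no effect on the Search crawler. The Googlebot group is shown only for emphasis; you do not need to add it, since Googlebot is allowed unless you disallow it:

User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Disallow: /

User-agent: Google-CloudVertexBot
Disallow: /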
Will Disallowing These Bots Affect My Appearance in ChatGPT, Perplexity, or Claude?
Yes, that’s the idea.
But you’re likely not getting much value from these bots anyway. AI search chatbots send 96% less traffic per search than traditional search engines. And data from Raptive shows no significant difference in traffic between sites that block AI bots and sites that do not.
In the short term, you may miss out on a few marginal clicks by blocking them. But, by giving your content for free now, you may be undermining your entire business model in the long term. Personally, I think blocking AI bots is a “no brainer” – but you have to make your own decision as a website owner.
Step 2: Block AI Bots at the Server Level
Why Server Blocks are Also Necessary: Unfortunately, robots.txt isn’t a perfect solution. It is a voluntary protocol and it relies on voluntary compliance by the AI companies. Some AI companies have been caught using undeclared “stealth” crawlers to bypass robots.txt directives.
An additional, complementary option for website owners is to also block AI bots at the server level, so AI crawlers cannot access your content in the first place. This too is an imperfect solution, since you still have to identify which crawlers to block, but it adds another layer of protection.
How to Block AI Bots at the Server Level:
- Option 1 (easiest): Via Cloudflare or Another Provider
- If you are on the Cloudflare CDN (including their free tier), you can block AI scrapers in one click. Instructions here.
- Just go to Security > Bots > AI Scrapers and Crawlers, and make sure the toggle is set to “on.”
- You can read more about why and how Cloudflare is doing this in this BBC article.
- If you are not on Cloudflare, I can recommend it for smaller publishers (note: I am a customer but have no other affiliation). Their CDN can help your site load faster, and their free tier should be more than enough for smaller websites. Talk to your host or developer about getting it set up.
- If Cloudflare isn’t right for you, there are other managed services you could consider like TollBit.
- Option 2: Manually
- For non-technical publishers, ask your hosting provider or your web developer.
- If you are technically proficient, Jeff Starr has guides on how to do this via Apache/.htaccess and via Nginx.
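Whichever guide you follow, the underlying idea is the same: match the incoming request’s User-Agent header against a list of AI crawlers and refuse the request. Below is a minimal sketch for Apache via .htaccess, assuming mod_rewrite is available; the bot list here is illustrative and should be kept in sync with your robots.txt.

# Return 403 Forbidden to requests whose User-Agent matches known AI crawlers
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (GPTBot|OAI-SearchBot|CCBot|ClaudeBot|anthropic-ai|PerplexityBot|meta-externalagent) [NC]
RewriteRule .* - [F,L]
</IfModule>

Remember that this only stops crawlers that identify themselves honestly; bots that spoof a browser User-Agent will get through, which is one reason managed services like Cloudflare layer additional bot detection on top.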
Step 3: Update Your Terms of Service (ToS)
Why Update Your ToS: If you do not agree with AI bots scraping your site without permission and using your content to compete with you, you should make that clear in your website’s terms of service. This won’t stop the scraping by itself, but it may help establish “actual legal notice,” which can be an important prerequisite in some jurisdictions before you can enforce your rights in court.
How to Update Your ToS: Simply navigate to your website’s existing terms of service page and insert a provision with whatever language you want. If you need inspiration, you could refer to Raptive’s Terms of Use (see below).
If you do not have a ToS, you can make one (make sure it’s possible to navigate to it from all pages on your site, such as by putting a link to it in your footer).
Raptive’s Terms of Use: The Raptive network now inserts, by default, the following “Terms of Content Use” provision in the footer of member sites:
Your Use of Our Content. The content we make available on this website [and through our other channels] (the “Service”) was created, developed, compiled, prepared, revised, selected, and/or arranged by us, using our own methods and judgment, and through the expenditure of substantial time and effort. This Service and the content we make available are proprietary, and are protected by these Terms of Service (which is a contract between us and you), copyright laws, and other intellectual property laws and treaties. This Service is also protected as a collective work or compilation under U.S. copyright and other laws and treaties. We provide it for your personal, non-commercial use only.
You may not use, and may not authorize any third party to use, this Service or any content we make available on this Service in any manner that (i) is a source of or substitute for the Service or the content; (ii) affects our ability to earn money in connection with the Service or the content; or (iii) competes with the Service we provide. These restrictions apply to any robot, spider, scraper, web crawler, or other automated means or any similar manual process, or any software used to access the Service. You further agree not to violate the restrictions in any robot exclusion headers of this Service, if any, or bypass or circumvent other measures employed to prevent or limit access to the Service by automated means.
You can read why and how they do this here.
Note that this is not legal advice. This is provided for informational purposes and without warranty. If you choose to use or modify this language for your own site, you of course assume all responsibility for your terms.
4 Bonus Ways to Fight for a Fair AI Future for the Web
Blocking AI bots is an important first step towards an open web licensing marketplace, but web publishers will still need AI companies (especially Google) to participate in the marketplace as buyers. Fortunately, there are ways we can all help bring the AI companies to the negotiating table.
Bonus 1: Talk to Your Audience About AI
If you are a web creator or publisher, you likely already have an existing audience. Talk to your audience about AI.
It can be a blog post, an email newsletter, a podcast, a social media post, a video – whatever channel(s) you normally use to reach your audience, you should talk about how AI affects your ability to continue providing content.
In my experience, many “normal” web users are already apprehensive about generative AI and are very sympathetic to our cause.
Bonus 2: Lobby Your Governments
Although we can’t wait for governments to act (which is why we need to block bots), it sure would help if they would pass laws or regulations. In many cases, politicians simply don’t understand what is happening online. Calling your local representatives and explaining how AI is affecting you as a business owner and constituent really can help make a difference.
Bonus 3: Lobby Google to Let Us Separately Block AI Mode & AI Overviews
Google does not currently allow publishers to separately block their content from AI Mode and AI Overviews without also blocking or affecting their appearance in Search.
This is a major problem, because it effectively means that Google is forcing web publishers to provide our content for AI products that compete with us.
Back in 2023, Google put out a blog post claiming to recognize this and promising a “public discussion” about more options for publisher choice and control. That discussion never really happened. Instead, internal documents produced in court revealed that in 2024 Google considered giving us granular control and rejected the idea as a “hard red line.”
We as a community really need to lobby Google to change this. It may require all of us coming together and collectively threatening to block Googlebot entirely until we get granular control.
Bonus 4: Spread the Word to Other Publishers!
The more web publishers who block AI bots, the stronger negotiating position we will all have.
If you know other web publishers, or hang out in web publisher communities online, spread the word.
You could start by sharing this post.
***
If you have any questions or concerns, drop me a comment and I’ll help if I can.
Cloudflare has reorganized the Bots area: Security -> Bot traffic -> Block AI bots (I also enabled AI Labyrinth).
I also made a code snippet for blocking all the AI bots (if you use The SEO Framework)
[ 'by' => 'AddSearch', 'link' => 'https://www.addsearch.com' ],
'Agentic' => [ 'by' => 'Agentic', 'link' => 'https://agentic.ai' ],
'AgentQL' => [ 'by' => 'AgentQL', 'link' => 'https://agentql.com' ],
'Alexa' => [ 'by' => 'Amazon', 'link' => 'https://developer.amazon.com/amazonbot' ],
'Amazonbot' => [ 'by' => 'Amazon', 'link' => 'https://developer.amazon.com/amazonbot' ],
'Anthropic' => [ 'by' => 'Anthropic', 'link' => 'https://support.anthropic.com/en/articles/8896518' ],
'anthropic-ai' => [ 'by' => 'Anthropic', 'link' => 'https://support.anthropic.com/en/articles/8896518' ],
'Anyword' => [ 'by' => 'Anyword', 'link' => 'https://www.anyword.com' ],
'Applebot' => [ 'by' => 'Apple', 'link' => 'https://support.apple.com/en-us/119829' ],
'Applebot-Extended' => [ 'by' => 'Apple', 'link' => 'https://support.apple.com/en-us/119829' ],
'AnyPicker' => [ 'by' => 'AnyPicker', 'link' => 'https://www.anypicker.com' ],
'Articoolo' => [ 'by' => 'Articoolo', 'link' => 'https://articoolo.com' ],
'Botsonic' => [ 'by' => 'Botsonic', 'link' => 'https://www.botsonic.com' ],
'Bytespider' => [ 'by' => 'ByteDance', 'link' => 'https://www.bytedance.com' ],
'CCBot' => [ 'by' => 'Common Crawl', 'link' => 'https://commoncrawl.org/ccbot' ],
'Chatsonic' => [ 'by' => 'Chatsonic', 'link' => 'https://www.chatsonic.com' ],
'Claude' => [ 'by' => 'Anthropic', 'link' => 'https://support.anthropic.com/en/articles/8896518' ],
'ClaudeBot' => [ 'by' => 'Anthropic', 'link' => 'https://support.anthropic.com/en/articles/8896518' ],
'Cohere' => [ 'by' => 'Cohere', 'link' => 'https://cohere.ai' ],
'cohere-ai' => [ 'by' => 'Cohere', 'link' => 'https://cohere.ai' ],
'ContentAtScale' => [ 'by' => 'ContentAtScale', 'link' => 'https://www.contentatscale.ai' ],
'Copyscape' => [ 'by' => 'Copyscape', 'link' => 'https://www.copyscape.com' ],
'CrewAI' => [ 'by' => 'CrewAI', 'link' => 'https://www.crewai.com' ],
'Crawl4AI' => [ 'by' => 'Crawl4AI', 'link' => 'https://crawl4ai.com' ],
'DALL-E' => [ 'by' => 'OpenAI', 'link' => 'https://openai.com/dall-e' ],
'DeepAI' => [ 'by' => 'DeepAI', 'link' => 'https://deepai.org' ],
'DeepL' => [ 'by' => 'DeepL', 'link' => 'https://www.deepl.com' ],
'DeepMind' => [ 'by' => 'Google', 'link' => 'https://deepmind.com' ],
'DeepSeek' => [ 'by' => 'DeepSeek', 'link' => 'https://www.deepseek.com' ],
'Diffbot' => [ 'by' => 'Diffbot', 'link' => 'https://www.diffbot.com' ],
'FacebookBot' => [ 'by' => 'Meta', 'link' => 'https://developers.facebook.com/docs/sharing/bot' ],
'FacebookExternalHit' => [ 'by' => 'Meta', 'link' => 'https://developers.facebook.com/docs/sharing/bot' ],
'Firecrawl' => [ 'by' => 'Firecrawl', 'link' => 'https://firecrawl.dev' ],
'Genspark' => [ 'by' => 'Genspark', 'link' => 'https://www.genspark.ai' ],
'Gemini' => [ 'by' => 'Google', 'link' => 'https://gemini.google.com' ],
'GigaChat' => [ 'by' => 'GigaChat', 'link' => 'https://gigachat.ai' ],
'Google-Extended' => [ 'by' => 'Google', 'link' => 'https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers' ],
'Google-CloudVertexBot' => [ 'by' => 'Google', 'link' => 'https://cloud.google.com/vertex-ai' ],
'GPTBot' => [ 'by' => 'OpenAI', 'link' => 'https://platform.openai.com/docs/bots' ],
'Grok' => [ 'by' => 'xAI', 'link' => 'https://grok.x.com' ],
'Grammarly' => [ 'by' => 'Grammarly', 'link' => 'https://www.grammarly.com' ],
'Jasper' => [ 'by' => 'Jasper', 'link' => 'https://www.jasper.ai' ],
'Jinja' => [ 'by' => 'Jinja', 'link' => 'https://jinja.ai' ],
'Kaggle' => [ 'by' => 'Kaggle', 'link' => 'https://www.kaggle.com' ],
'Kimi' => [ 'by' => 'Kimi', 'link' => 'https://kimi.moonshot.cn' ],
'LangChain' => [ 'by' => 'LangChain', 'link' => 'https://www.langchain.com' ],
'Lightpanda' => [ 'by' => 'Lightpanda', 'link' => 'https://www.lightpanda.io' ],
'LLaMA' => [ 'by' => 'Meta', 'link' => 'https://www.llama.com' ],
'LLM' => [ 'by' => 'LLM', 'link' => 'https://llm.ai' ],
'magpie-crawler' => [ 'by' => 'Magpie', 'link' => 'https://magpie.com' ],
'Maroofy' => [ 'by' => 'Maroofy', 'link' => 'https://maroofy.com' ],
'Meta-External' => [ 'by' => 'Meta', 'link' => 'https://developers.facebook.com/docs/sharing/webmasters/web-crawlers' ],
'Meta-ExternalAgent' => [ 'by' => 'Meta', 'link' => 'https://developers.facebook.com/docs/sharing/webmasters/web-crawlers' ],
'Meta-Webindexer' => [ 'by' => 'Meta', 'link' => 'https://developers.facebook.com/docs/sharing/webmasters/web-crawlers' ],
'MetaAI' => [ 'by' => 'Meta', 'link' => 'https://www.meta.ai' ],
'Midjourney' => [ 'by' => 'Midjourney', 'link' => 'https://www.midjourney.com' ],
'Mistral' => [ 'by' => 'Mistral AI', 'link' => 'https://mistral.ai' ],
'Mixtral' => [ 'by' => 'Mistral AI', 'link' => 'https://mistral.ai' ],
'Monica' => [ 'by' => 'Monica', 'link' => 'https://www.monica.im' ],
'OAI-SearchBot' => [ 'by' => 'OpenAI', 'link' => 'https://platform.openai.com/docs/bots' ],
'OpenAI' => [ 'by' => 'OpenAI', 'link' => 'https://openai.com' ],
'OpenRouter' => [ 'by' => 'OpenRouter', 'link' => 'https://openrouter.ai' ],
'Perplexity' => [ 'by' => 'Perplexity', 'link' => 'https://perplexity.ai' ],
'PerplexityBot' => [ 'by' => 'Perplexity', 'link' => 'https://perplexity.ai' ],
'Perplexity-User' => [ 'by' => 'Perplexity', 'link' => 'https://perplexity.ai' ],
'Phind' => [ 'by' => 'Phind', 'link' => 'https://www.phind.com' ],
'Puppeteer' => [ 'by' => 'Puppeteer', 'link' => 'https://pptr.dev' ],
'Qwen' => [ 'by' => 'Alibaba', 'link' => 'https://www.aliyun.com' ],
'QuillBot' => [ 'by' => 'QuillBot', 'link' => 'https://quillbot.com' ],
'Rytr' => [ 'by' => 'Rytr', 'link' => 'https://rytr.me' ],
'SemrushBot' => [ 'by' => 'SEMrush', 'link' => 'https://www.semrush.com/bot' ],
'SemrushBot-BA' => [ 'by' => 'SEMrush', 'link' => 'https://www.semrush.com/bot' ],
'SiteAuditBot' => [ 'by' => 'SEMrush', 'link' => 'https://www.semrush.com/bot' ],
'Serper' => [ 'by' => 'Serper', 'link' => 'https://serper.dev' ],
'Sora' => [ 'by' => 'OpenAI', 'link' => 'https://openai.com/sora' ],
'StableDiffusionBot' => [ 'by' => 'Stability AI', 'link' => 'https://stability.ai' ],
'Sudowrite' => [ 'by' => 'Sudowrite', 'link' => 'https://www.sudowrite.com' ],
'Surfer' => [ 'by' => 'Surfer', 'link' => 'https://surferseo.com' ],
'TextCortex' => [ 'by' => 'TextCortex', 'link' => 'https://textcortex.com' ],
'Together' => [ 'by' => 'Together AI', 'link' => 'https://www.together.ai' ],
'TurnitinBot' => [ 'by' => 'Turnitin', 'link' => 'https://www.turnitin.com' ],
'Writesonic' => [ 'by' => 'Writesonic', 'link' => 'https://writesonic.com' ],
'Writescope' => [ 'by' => 'Writescope', 'link' => 'https://www.writescope.ai' ],
'Wordtune' => [ 'by' => 'Wordtune', 'link' => 'https://www.wordtune.com' ],
'xAI' => [ 'by' => 'xAI', 'link' => 'https://x.ai' ],
'YandexAdditional' => [ 'by' => 'Yandex', 'link' => 'https://yandex.com' ],
'Zhipu' => [ 'by' => 'Zhipu', 'link' => 'https://www.zhipuai.com' ],
];
// Merge with existing agents
$agents = array_merge( $agents, $ai_bots );
break;
case 'seo':
// Remove specific SEO bots from the blocklist if needed,
// allowing them to crawl your site again, while still blocking others.
unset( $agents['AhrefsBot'], $agents['AhrefsSiteAudit'] );
break;
}
return $agents;
}, 10, 2 );
}, 1 );
This works if everyone does it, but sadly not everyone will. And so if many don’t, all you are doing by blocking the bots is harming your own blog (because you don’t even get linked to from things like AI Overviews).
Have I got this right or am I missing something?
Hi Paul!
It works if enough publishers do it that Google feels a need to negotiate. We know they are doing private deals already. Just last week, testimony in the ad tech antitrust case revealed that Google has an AI licensing deal with wikiHow. And Google has told the UK government it is only willing to pay for content it cannot otherwise access.
If you give away your content for free, why would Google or any other AI company pay you?
Personally, I don’t feel that trading 1% traffic now for all my bargaining chips is a good deal. AI content licensing deals are likely to be the primary monetization method of the future web, and the best deals are going to go to the publishers with the most leverage. Again, all just my opinion.
Thank you for reading, thinking about this, and commenting!
Also, just to clarify: nothing I mention in this post will block you from appearing in AI Overviews. The only way to do that is to block Googlebot entirely (at least until Google offers us more granular control mechanisms). We’re only talking about other AI search bots, like Perplexity’s, and training bots like Google-Extended, and blocking those does not affect Search, AI Overviews, or AI Mode.