Cloudflare Radar finds AI bots consuming thousands of pages per human referral

Serge Bulaev

Serge Bulaev

Cloudflare Radar finds that AI bots are reading hundreds or even thousands of web pages for each visit they send to publishers, much more than search engines like Google. This may be causing higher costs and changing how publishers think about traffic and licensing. Some publishers are trying stricter bot controls, direct licensing, or changing their content to handle this. Regulators are watching but have not required new rules yet. The amount of traffic from AI bots appears unlikely to go down soon, so publishers might need to find ways to get paid for the content AI uses and avoid extra costs.

Cloudflare Radar finds AI bots consuming thousands of pages per human referral

New data from Cloudflare Radar finds AI bots consuming thousands of web pages for each human referral, a vast disparity that is forcing publishers to rethink traffic, costs, and content licensing. This lopsided "crawl-to-referral" ratio turns the traditional search engine value exchange into what many now call an extraction economy.

The Scale of AI's Crawl-to-Referral Imbalance

Data from late May highlights the extreme difference between AI crawlers and traditional search. While Google maintains roughly a 5:1 ratio of pages crawled to visits sent, AI platforms are far more demanding:
* Anthropic: ~70,900 pages crawled per referral
* OpenAI: ~1,091 pages crawled per referral
* Perplexity: ~195 pages crawled per referral

The Hidden Costs: Server Strain and Inflated Bills

This aggressive crawling creates significant operational and financial burdens. Newsrooms report traffic spikes resembling denial-of-service attacks, with industry reports noting that AI bots can overwhelm origin servers with high-volume requests. For example, Read the Docs cut its AI crawler bandwidth by 75% to save approximately $1,500 a month after blocking several high-volume bots, as noted in a Search Engine Journal report.

Publisher Strategies: Monetization and Control

Publishers are actively testing three main strategies to manage this new reality: stricter bot controls, direct licensing deals, and structural content changes. A Digital Content Next analysis suggests that usage-based licensing - where each AI request is metered and billed - is emerging as a primary model. This is supported by firms like EY, which describe middleware designed to track and charge LLMs on a per-query basis.

Publishers are responding to the high volume of AI crawling by exploring direct licensing agreements, implementing stricter bot controls via robots.txt, and adopting usage-based metering to bill for content consumption. Others are embedding metadata to enforce attribution and investigating content marketplace platforms for monetization.

To secure their content, publishers are advised to conduct AI vulnerability audits, define commercial intent in robots.txt files, and embed licensing metadata. Some are also preparing machine-readable content archives for emerging platforms that would allow them to set price cards for AI buyers.

The Regulatory Landscape: US vs. EU Approaches

Regulators are observing the compensation debate, but a consensus has not yet formed. In the U.S., policy discussions encourage exploring collective licensing systems to help rights holders negotiate fees without antitrust risk, but stop short of mandating licensing. In contrast, the EU AI Act, which entered into force in 2024 with phased implementation timelines, prioritizes transparency and includes requirements for generative AI providers to disclose certain information about their training data and outputs.

Navigating the New Reality

The operational pressure on publishers is not expected to decrease, as industry reports indicate AI crawlers make up a significant portion of all bot traffic. The Electronic Frontier Foundation warns that even legitimate scrapers can mimic DDoS attacks if poorly configured. Publishers now face the dual challenge of monetizing the content AI bots consume while simultaneously avoiding the infrastructure costs incurred by their high-volume traffic.


How extreme is the crawl-to-referral imbalance for AI platforms?

Cloudflare Radar measured the ratio during the last full week of May:
- Anthropic crawlers read ~70,900 pages for every human referral
- OpenAI: ~1,091 pages per human referral
- Perplexity: ~195 pages per human referral
- Google's traditional crawler: ~5 pages per human referral

These figures reveal that AI bots extract content on an industrial scale while returning almost no direct traffic, breaking the long-standing bargain where crawling translated into meaningful referrals.

Why does the imbalance matter for publishers?

Beyond lost traffic, publishers now face higher hosting and infrastructure costs:
- Industry reports indicate AI bots can generate significant request volumes that create DDoS-like strain on origin servers.
- Read the Docs cut AI crawler traffic from 800 GB to 200 GB daily after blocking bots, saving ~$1,500 per month in bandwidth.
- ProtonDB reported crawler traffic pushed usage beyond a 1 TB plan, adding $500/month in extra costs.

Operational drag also grows: more CPU cycles, database queries, distorted analytics, and the need for constant bot-management tooling.

How are publishers trying to monetize AI use?

Publishers are exploring usage-based licensing and metering as a potential model:
1. Per-query pricing: Middleware charges LLM providers each time content is accessed.
2. Direct licensing deals: OpenAI, Google, Apple, Microsoft, and Perplexity have already signed agreements worth millions with major publishers.
3. Marketplace participation: Publishers are exploring content marketplace platforms that would let them set terms and get paid when AI products use their material.
4. Attribution-by-design: Embedding metadata that forces AI outputs to include citations or backlinks, restoring referral visibility.

What is happening on the regulatory front?

Policy discussions have bifurcated:

  • United States: Policy frameworks encourage Congress to consider collective licensing or rights-management systems so rights holders can negotiate compensation without antitrust liability, while not mandating whether licensing is required.
  • Europe & Asia: Focus is on transparency and labeling rather than direct payment.
  • The EU AI Act includes phased requirements for generative-AI providers to disclose information about training data and label AI-generated content in certain contexts.
  • Various jurisdictions are considering similar watermark/metadata rules for AI outputs.

The landscape splits between compensation models (US) and ethics/transparency models (EU/APAC).

What practical steps can publishers take right now?

  1. Audit your exposure: Use tools to map which AI bots are hitting your site and how often.
  2. Implement usage tracking: Convert every AI hit into a billable event with rate cards and partner payouts.
  3. Structure content for machine readability: Clear metadata, Really Simple Licensing tags, and machine-readable sections raise the odds of proper citation and paid reuse.
  4. Strengthen first-party channels: Newsletters, apps, and subscription offerings reduce dependence on AI referrals.
  5. Deploy selective blocking: Allow only licensed crawlers via robots.txt while blocking the rest to cut bandwidth waste.