LLMs: The Digital Parasites & The Gluttons

https://blog.cloudflare.com/ai-search-crawl-refer-ratio-on-radar/

https://blog.cloudflare.com/control-content-use-for-ai-training/

https://blog.cloudflare.com/ai-crawler-traffic-by-purpose-and-industry/

The Parasites:

I’ve been wanting to write this post ever since Cloudflare dropped its latest report on AI web crawlers, confirming a truth many of us have felt for a while: the relationship between AI companies and the open web is fundamentally broken. It's a purely extractive relationship. I'll only be talking a bit about this part; the reports I linked are far more detailed, and they are worth a read as well.

To be clear, when I talk about parasitic behaviour in this context, I’m referring to the death of the web’s old grand bargain. For decades, the deal was simple: publishers created content, and search engines crawled it in exchange for sending referral traffic back. It was a loop of value. Now, AI crawlers ingest that same content not to refer, but to replace. They scrape the web’s knowledge to provide answers directly within their own closed interface, cutting the original creators out of the loop entirely. They take the value and give almost nothing back. That’s digital parasitism.

You’d think the biggest offender would be the biggest name, right? OpenAI, with its massive scale, must surely be the worst parasite. But, to the surprise of absolutely no one who’s been paying attention, the crown for the most parasitic of all generative AI companies goes to the self-proclaimed "ethical" public benefit corporation: Anthropic.

Anthropic: The Apex Parasite

Anthropic’s behaviour is in a league of its own. The key metric here is the "crawl-to-refer ratio," which measures how many pages a bot scrapes for every single human visitor it sends back. The numbers for Anthropic are mind-blowing.

In January 2025, Anthropic’s ratio was an almost unbelievable 286,930:1. While that has since "improved," recent reports still peg it between 50,000:1 and 73,000:1. To put that in perspective, OpenAI’s ratio hovers around 880:1 to 1,700:1, and Google’s is around 9:1. At its known peak, Anthropic’s ratio was more than 100 times that of OpenAI, the biggest genAI company in the world. It is an absolutely insane situation.
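To make the arithmetic concrete, here is a minimal Python sketch of the ratio and of the comparison above. It is a simplification: the function names are my own, and Cloudflare's exact methodology is described in the linked reports, not reproduced here. The only inputs are the figures already quoted in this post.

```python
def crawl_to_refer_ratio(crawled_pages: int, referred_visits: int) -> float:
    """Pages crawled per referred visit (simplified; not Cloudflare's exact method)."""
    if referred_visits <= 0:
        raise ValueError("referred_visits must be positive")
    return crawled_pages / referred_visits


def times_higher(ratio_a: float, ratio_b: float) -> float:
    """How many times larger ratio_a is than ratio_b."""
    return ratio_a / ratio_b


# Figures quoted in this post (pages crawled per one referral).
ANTHROPIC_PEAK = 286_930
OPENAI_LOW, OPENAI_HIGH = 880, 1_700

# Compare the Anthropic peak against both ends of the OpenAI range.
peak_vs_openai_high = times_higher(ANTHROPIC_PEAK, OPENAI_HIGH)
peak_vs_openai_low = times_higher(ANTHROPIC_PEAK, OPENAI_LOW)

print(f"{peak_vs_openai_high:.0f}x to {peak_vs_openai_low:.0f}x")
```

Either end of the OpenAI range gives a multiple well above 100, which is where the "more than 100x" claim comes from. Note that the comparison mixes a January 2025 peak with a later OpenAI range, so treat it as a rough order of magnitude.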

This isn’t a new problem, and as with many of Anthropic’s other questionable behaviours, the company doesn’t stop when called out. Just ask iFixit, which reported that its servers were hammered with nearly a million requests from ClaudeBot in a single 24-hour period, in direct violation of iFixit’s terms of service. Anthropic’s response? A shrug and a link to its FAQ, telling site owners to use a robots.txt file to opt out. This has been a consistent pattern, with system administrators across the web describing ClaudeBot’s activity as so aggressive it resembles a DDoS attack.
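For anyone who hasn’t written one, the robots.txt opt-out amounts to a couple of lines. The sketch below uses Python’s standard `urllib.robotparser` to check what a well-behaved crawler would conclude from such a file. The rules and the example.com URL are illustrative assumptions, not iFixit’s actual configuration, and robots.txt is only advisory: it works only if the crawler chooses to honour it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block ClaudeBot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# example.com is a placeholder URL used only for illustration.
url = "https://example.com/guide/1"
claudebot_allowed = parser.can_fetch("ClaudeBot", url)
other_allowed = parser.can_fetch("SomeBrowser", url)

print("ClaudeBot allowed:", claudebot_allowed)
print("Other agents allowed:", other_allowed)
```

The point of the complaint stands regardless: the file puts the burden on every publisher to opt out, one bot name at a time.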

There are absolutely tangible costs to these behaviours. The open-source project Read the Docs reported saving $1,500 per month in bandwidth costs after blocking AI crawlers. It’s a forced subsidy from creators to a multi-billion-dollar corporation. More importantly, it erodes the entire economic model of the web. No clicks mean no ad revenue, no subscriptions, and no brand visibility for the people who actually create the information.

Meta: The Digital Strip-Miner

On the other side of this grimy coin, we have another tech giant. A company spending billions on a generative AI strategy that seems to have no clear direction, churning out bottom-of-the-barrel products while its peers pull ahead. No, not Microsoft, silly. I’m talking about Meta!

If Anthropic is the parasite, quietly draining the lifeblood of its host with an impossibly imbalanced exchange, then Meta is the digital strip-miner. It’s less about finesse and all about brute-force, overwhelming volume.

Since last year, as Meta scrambled to build a "superintelligence" team, its data hunger has accelerated into a frenzy of web scraping. The numbers are, once again, courtesy of the web’s watchdogs. According to a report from Fastly, Meta’s AI crawlers are the most dominant on the web by a massive margin, accounting for 52% of all AI crawler traffic. That is more than Google (23%) and OpenAI (20%) combined, and more than double either one on its own. In the span of just one year, from July 2024 to July 2025, raw requests from its Meta-ExternalAgent bot exploded by 843%, which means more than nine times the starting volume.

This is a torrential deluge. This kind of high-volume scraping overwhelms servers, consumes vast amounts of bandwidth, and can mimic the effects of a DDoS attack, even if unintentional. It also pollutes analytics, with one report noting that AI scrapers contributed to an 86% year-over-year increase in general invalid traffic, making it harder for businesses to understand their real human audience.
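If you want a rough sense of how much of your own traffic this is, a first pass is to tally requests by user-agent string. The sketch below is a minimal illustration under stated assumptions: it only looks for the two crawler names mentioned in this post, the sample user-agent strings are made up, and it assumes you have already extracted the user-agent field from your access logs. Crawlers that spoof a browser user agent will not be caught this way.

```python
from collections import Counter

# Crawler names mentioned in this post; extend with others as needed.
AI_CRAWLER_TOKENS = ("ClaudeBot", "Meta-ExternalAgent")


def tally_ai_crawlers(user_agents):
    """Count requests per known AI crawler token; everything else is 'other'."""
    counts = Counter()
    for ua in user_agents:
        lowered = ua.lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in lowered:
                counts[token] += 1
                break
        else:
            counts["other"] += 1
    return counts


def ai_share(counts):
    """Fraction of all tallied requests that matched a known AI crawler."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (total - counts["other"]) / total


# Made-up user-agent strings, one per request, for illustration only.
sample = [
    "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "meta-externalagent/1.1",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
]
counts = tally_ai_crawlers(sample)
print(dict(counts), f"AI share: {ai_share(counts):.0%}")
```

Subtracting the matched requests from your totals gives a slightly cleaner picture of the human audience, which is exactly the number these crawlers are muddying.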

And for what? After generating half of the web’s AI crawler traffic, what does Meta have to show for it? A suite of AI products that are widely regarded as lagging behind the competition. The company has consumed immense resources, strained the web’s infrastructure, and devalued content, all while failing to produce anything of significant worth. It’s the digital equivalent of leveling a rainforest to produce a single toothpick.

The web is being assaulted from two fronts: the insidious, imbalanced extraction of the parasite and the overwhelming, brute-force consumption of the strip-miner. Both are unsustainable, and both are destroying the ecosystem they claim to be learning from.
