ChatGPT User Agent Identified in Nginx Logs
Requests with the user agent "ChatGPT-User" arrived from several source IPs within a single burst, so rate limiting by IP is less effective against it. This behavior matches OpenAI's documentation.
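To check the multi-IP behavior in your own logs, the Python sketch below groups requests carrying a given user-agent token into bursts and collects the client IPs seen in each burst. It assumes the log has already been parsed into (timestamp, IP, user agent) tuples. The 10-second gap that separates bursts is an arbitrary assumption, and ips_per_burst is a hypothetical helper, not part of any tool. The IPs come from the documentation ranges and the user-agent strings are shortened.

```python
from datetime import datetime, timedelta

# Hypothetical pre-parsed log records: (timestamp, client_ip, user_agent).
# Parsing the nginx log into these tuples is assumed to happen elsewhere.
Record = tuple[datetime, str, str]


def ips_per_burst(records: list[Record], ua_token: str,
                  gap: timedelta = timedelta(seconds=10)) -> list[set[str]]:
    """Group requests whose user agent contains ua_token into bursts.

    A new burst starts whenever two consecutive matching requests are more
    than `gap` apart (an assumed threshold). Returns the set of client IPs
    seen in each burst, in time order.
    """
    matching = sorted(r for r in records if ua_token in r[2])
    bursts: list[set[str]] = []
    last_ts = None
    for ts, ip, _ua in matching:
        if last_ts is None or ts - last_ts > gap:
            bursts.append(set())  # gap exceeded: open a new burst
        bursts[-1].add(ip)
        last_ts = ts
    return bursts


t0 = datetime(2024, 1, 1, 12, 0, 0)
records = [
    (t0, "203.0.113.5", "Mozilla/5.0; compatible; ChatGPT-User/1.0"),
    (t0 + timedelta(seconds=1), "203.0.113.9", "Mozilla/5.0; compatible; ChatGPT-User/1.0"),
    (t0 + timedelta(seconds=2), "198.51.100.7", "Mozilla/5.0; compatible; ChatGPT-User/1.0"),
    (t0 + timedelta(minutes=5), "203.0.113.5", "Mozilla/5.0; compatible; ChatGPT-User/1.0"),
]
print([len(b) for b in ips_per_burst(records, "ChatGPT-User")])  # [3, 1]
```

When a burst has more than one IP, a per-IP limit sees several light clients where there is really one heavy one. Keying the limit on the user-agent token is one possible alternative.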
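Claude User Agent Identified
Claude's user agent, "Claude-User", requested /robots.txt before every page fetch and followed redirects. Because it checks robots.txt each time, a group consisting of "User-agent: Claude-User" followed by "Disallow: /" should stop Claude from fetching the site. A bare "Disallow: Claude-User" line is not valid robots.txt syntax and blocks nothing.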
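The robots.txt rule can be checked locally with Python's standard urllib.robotparser. This parser is only a stand-in for a compliant crawler, and Anthropic's own parser may differ in edge cases. The example.com URL and the "SomeOtherBot" name are placeholders.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt group: the User-agent line names the crawler's product token,
# and Disallow lists the path prefix it may not fetch ("/" means every path).
ROBOTS_TXT = """\
User-agent: Claude-User
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Claude-User matches the group and is blocked; agents with no matching
# group (and no "User-agent: *" fallback) are allowed.
print(parser.can_fetch("Claude-User", "https://example.com/docs/page"))   # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/docs/page"))  # True

# A bare "Disallow: Claude-User" line sits outside any User-agent group,
# so the parser ignores it and Claude-User is still allowed.
broken = RobotFileParser()
broken.parse(["Disallow: Claude-User"])
print(broken.can_fetch("Claude-User", "https://example.com/docs/page"))  # True
```

Pass can_fetch the product token ("Claude-User"), not the full user-agent string. This parser matches on the text before the first "/", so a string beginning with "Mozilla/5.0" would not match the group.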
Perplexity Fetches Pages Directly
Perplexity-User fetched pages directly and sent neither an Accept header nor a referrer, which indicates a live provider-side fetch. Perplexity can also answer from its own index without hitting the origin, so not every Perplexity answer produces a request in the logs.
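The probe's signature for a live Perplexity fetch can be written as a small heuristic over request headers, as the sketch below does. It assumes you already have the headers for each request. nginx's predefined "combined" format does not record Accept, so you would need to add $http_accept to a custom log_format. The function name looks_like_live_fetch is hypothetical, and treating "-" as missing follows nginx's convention of logging "-" for empty values.

```python
from typing import Optional


def _missing(value: Optional[str]) -> bool:
    # nginx writes "-" for an empty header in access logs, so treat it as absent.
    return value is None or value.strip() in ("", "-")


def looks_like_live_fetch(headers: dict[str, str],
                          ua_token: str = "Perplexity-User") -> bool:
    """Heuristic from the probe: retrieval UA present, no Accept, no Referer."""
    # HTTP header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}
    return (
        ua_token in h.get("user-agent", "")
        and _missing(h.get("accept"))
        and _missing(h.get("referer"))
    )


print(looks_like_live_fetch({"User-Agent": "Perplexity-User/1.0",
                             "Accept": "-", "Referer": "-"}))          # True
print(looks_like_live_fetch({"User-Agent": "Perplexity-User/1.0",
                             "Accept": "text/html", "Referer": "-"}))  # False
```

A False result does not prove Perplexity ignored the page, because it may have answered from its index.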
Gemini Did Not Perform a Live Fetch
The probe detected no requests from a Gemini user agent during the prompt window. This suggests that, in this test, Gemini answered from its own index rather than performing a live provider-side fetch.
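The probe method reduces to a window check: note when the prompt was sent, then look for matching requests shortly afterwards. The sketch below assumes pre-parsed (timestamp, IP, user agent) tuples and an arbitrary two-minute window. fetches_in_window is a hypothetical helper, and "ExampleBot" is a placeholder token, since no particular Gemini user agent is assumed here.

```python
from datetime import datetime, timedelta

# Hypothetical pre-parsed log records: (timestamp, client_ip, user_agent).
Record = tuple[datetime, str, str]


def fetches_in_window(records: list[Record], ua_tokens: list[str],
                      prompt_sent: datetime,
                      window: timedelta = timedelta(minutes=2)) -> list[Record]:
    """Return records whose UA contains any token and whose timestamp falls
    within [prompt_sent, prompt_sent + window] (window length is an assumption)."""
    end = prompt_sent + window
    return [
        rec for rec in records
        if prompt_sent <= rec[0] <= end
        and any(tok in rec[2] for tok in ua_tokens)
    ]


sent = datetime(2024, 1, 1, 12, 0, 0)
log = [
    (sent + timedelta(seconds=5), "203.0.113.5", "Perplexity-User/1.0"),
    (sent + timedelta(minutes=10), "203.0.113.9", "Perplexity-User/1.0"),
]
print(len(fetches_in_window(log, ["Perplexity-User"], sent)))  # 1
print(len(fetches_in_window(log, ["ExampleBot"], sent)))       # 0
```

An empty result is consistent with an answer served from an index. It cannot rule out a fetch made under a user agent you did not list, such as a plain browser string.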
Microsoft Copilot and Grok Are Indistinguishable from Human Visitors
Copilot and Grok fetched pages as plain browsers without a distinctive user agent. Their requests therefore look like ordinary human visits, and their AI-driven traffic cannot be measured from logs alone.
Meta AI Documents Two Bots, Observed One
Meta appears to maintain its own index and can answer from it when possible. Meta documents two live-fetch bots, but the probe observed only one of them, meta-webindexer/1.1, during its session.
Manus Announces Itself Clearly
Manus fetched with the user agent "Mozilla/5.0 ... Chrome/132.0 ... ; Manus-User/1.0". The trailing Manus-User token makes it easy to identify in logs.
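Tokens like Manus-User/1.0 can be picked out of a user-agent string by splitting it into "Product/Version" pairs and comparing each product name against a list of known retrieval tokens. The sketch below takes its token list from the names in this write-up and is not exhaustive. retrieval_agent is a hypothetical helper. The example Manus string is shortened and omits the parts elided above. Exact name matching avoids false hits on substrings. Copilot and Grok, which send plain browser strings, return None.

```python
import re
from typing import Optional

# Retrieval tokens named in this write-up (not an exhaustive list).
KNOWN_RETRIEVAL_TOKENS = {
    "ChatGPT-User", "Claude-User", "Perplexity-User",
    "meta-webindexer", "Manus-User",
}

# Matches "Product/Version" pairs such as "Chrome/132.0" or "Manus-User/1.0".
PRODUCT_RE = re.compile(r"([A-Za-z][\w.-]*)/([\w.]+)")


def retrieval_agent(user_agent: str) -> Optional[str]:
    """Return the first known retrieval token in the UA string, or None."""
    for name, _version in PRODUCT_RE.findall(user_agent):
        if name in KNOWN_RETRIEVAL_TOKENS:
            return name
    return None


# Shortened example; the real string's elided parts are left out.
manus_ua = "Mozilla/5.0 Chrome/132.0; Manus-User/1.0"
print(retrieval_agent(manus_ua))                                   # Manus-User
print(retrieval_agent("Mozilla/5.0 (Windows NT 10.0) Chrome/132.0.0.0"))  # None
```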
Measurable Signals from Logs
Two measurable signals can be identified from logs:
- Provider fetch: a vendor-documented or probe-observed retrieval user agent requesting pages from the origin.
- Real visit: a normal browser user agent with a chatbot's site as the referrer.
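The two signals can be sketched as a classifier over nginx's predefined "combined" log format. A line counts as a provider fetch if its user agent contains a retrieval token, as a real visit if its referrer host belongs to a chatbot domain, and as other in every remaining case. Both lists in the code are assumptions. The tokens come from this write-up, and the referrer domains are an illustrative, non-exhaustive set. classify is a hypothetical helper.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# nginx's predefined "combined" log format:
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

# Assumed lists: retrieval tokens from this write-up and example chatbot domains.
RETRIEVAL_TOKENS = ("ChatGPT-User", "Claude-User", "Perplexity-User",
                    "meta-webindexer", "Manus-User")
CHATBOT_DOMAINS = ("chatgpt.com", "perplexity.ai", "claude.ai",
                   "gemini.google.com", "copilot.microsoft.com")


def _is_chatbot_host(host: Optional[str]) -> bool:
    # Accept the domain itself or any subdomain of it.
    if not host:
        return False
    return any(host == d or host.endswith("." + d) for d in CHATBOT_DOMAINS)


def classify(line: str) -> str:
    """Label a combined-format line as 'provider_fetch', 'real_visit' or 'other'."""
    m = COMBINED_RE.match(line)
    if m is None:
        return "other"  # line does not match the combined format
    if any(tok in m["ua"] for tok in RETRIEVAL_TOKENS):
        return "provider_fetch"
    if _is_chatbot_host(urlparse(m["referer"]).hostname):
        return "real_visit"
    return "other"


browser = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/132.0.0.0 Safari/537.36"
fetch = ('203.0.113.5 - - [01/Jan/2024:12:00:00 +0000] "GET /docs HTTP/1.1" 200 512 '
         '"-" "Mozilla/5.0; compatible; ChatGPT-User/1.0"')
visit = ('198.51.100.7 - - [01/Jan/2024:12:01:00 +0000] "GET /docs HTTP/1.1" 200 512 '
         f'"https://chatgpt.com/" "{browser}"')
print(classify(fetch))  # provider_fetch
print(classify(visit))  # real_visit
```

Plain-browser fetches from Copilot or Grok that carry no chatbot referrer land in "other", which matches the limitation described above.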