Arcjet Bot Protection — Concepts and Identifying Bots | Guide

🤖 Overview

Arcjet helps you manage automated traffic (bots) so the right requests reach your app. You can allow good bots (e.g., search crawlers) and deny unwanted ones (e.g., scrapers) with simple, composable rules. For details, see the Concepts and Identifying bots pages of the official docs, listed under References.

🧭 Why configure bot protection?

Good vs bad bots: Some bots index your site or monitor uptime; others scrape content or abuse forms.
robots.txt is not enough: Not all bots respect it.
Granular control: Allow bots for APIs but deny them for sensitive flows like signup forms.

Reference: Concepts

🧠 Bot detection realities

No solution blocks 100% of bad bots. Sophisticated bots mimic real users.
Plans with IP analysis: Starter/Business plans add IP-based verification to catch impostors (e.g., a client that claims to be a Google crawler but is not).
Combine with rate limiting: Best results come from layering bot protection with rate limits.
Trade-offs exist: Fingerprint-based blocking may temporarily affect some legitimate users behind shared IPs.
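
To make the layering idea concrete, here is a minimal sketch of a bot check followed by a per-IP rate limit. It is a self-contained local model, not the Arcjet SDK: `decideLayered`, `FixedWindowLimiter`, `looksLikeDeniedBot`, and the User-Agent fragments are all hypothetical names chosen for illustration.

```typescript
// Hypothetical local model of two layered checks. Not the Arcjet SDK.
type LayeredVerdict = "ALLOW" | "DENY_BOT" | "DENY_RATE_LIMIT";

interface ClientInfo {
  ip: string;
  userAgent: string;
}

// Layer 1: a deliberately naive bot check on User-Agent substrings.
const deniedAgentFragments = ["scrapy", "python-requests"];

function looksLikeDeniedBot(userAgent: string): boolean {
  const ua = userAgent.toLowerCase();
  return deniedAgentFragments.some((fragment) => ua.includes(fragment));
}

// Layer 2: a fixed-window rate limit keyed by IP address.
class FixedWindowLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();
  private maxRequests: number;
  private windowMs: number;

  constructor(maxRequests: number, windowMs: number) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
  }

  // Returns true if the request fits in the current window.
  tryConsume(key: string, nowMs: number): boolean {
    const entry = this.counts.get(key);
    if (!entry || nowMs - entry.windowStart >= this.windowMs) {
      this.counts.set(key, { windowStart: nowMs, count: 1 });
      return true;
    }
    if (entry.count >= this.maxRequests) return false;
    entry.count += 1;
    return true;
  }
}

function decideLayered(
  client: ClientInfo,
  limiter: FixedWindowLimiter,
  nowMs: number,
): LayeredVerdict {
  if (looksLikeDeniedBot(client.userAgent)) return "DENY_BOT";
  if (!limiter.tryConsume(client.ip, nowMs)) return "DENY_RATE_LIMIT";
  return "ALLOW";
}
```

A client that disguises its User-Agent as a browser passes the first layer, but it is still stopped by the second once it exceeds the per-window budget. That is the case layering is meant to cover.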

Reference: Concepts

🔍 Identifying and configuring bots

You can explicitly allow or deny individual bots, entire categories, or both. In TypeScript, bot and category identifiers autocomplete in your editor, which makes configuration easier.

Allow specific known bots

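
The following is a minimal sketch of allow-list semantics for individual bots. It models the idea locally and does not use the Arcjet SDK: the `KnownBotId` type, the identifiers in it, the toy `detectBotIds` detector, and `isRequestAllowed` are all assumptions made for illustration.

```typescript
// Hypothetical sketch: local types and helpers, not the Arcjet SDK.
// A string-literal union is the kind of type that lets an editor
// autocomplete bot identifiers.
type KnownBotId = "CURL" | "EXAMPLE_SEARCH_CRAWLER" | "EXAMPLE_SCRAPER";

// Bots we explicitly allow. Any other detected bot is denied.
const allowedBotIds: KnownBotId[] = ["CURL", "EXAMPLE_SEARCH_CRAWLER"];

// Toy detector mapping User-Agent fragments to identifiers (illustrative only).
const agentFragments: Array<[string, KnownBotId]> = [
  ["curl/", "CURL"],
  ["examplesearchbot", "EXAMPLE_SEARCH_CRAWLER"],
  ["examplescraper", "EXAMPLE_SCRAPER"],
];

function detectBotIds(userAgent: string): KnownBotId[] {
  const ua = userAgent.toLowerCase();
  return agentFragments
    .filter(([fragment]) => ua.includes(fragment))
    .map(([, botId]) => botId);
}

function isRequestAllowed(userAgent: string, allow: KnownBotId[]): boolean {
  // Requests with no detected bot pass. Detected bots must all be on the list.
  return detectBotIds(userAgent).every((botId) => allow.includes(botId));
}

const curlAllowed = isRequestAllowed("curl/8.5.0", allowedBotIds); // true
const scraperAllowed = isRequestAllowed("ExampleScraper/1.0", allowedBotIds); // false
```

Typing the list as a union rather than `string[]` means a misspelled identifier fails at compile time instead of silently never matching.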

Allow categories (plus optional individuals)

Available categories include: CATEGORY:SEARCH_ENGINE, CATEGORY:PROGRAMMATIC, CATEGORY:PREVIEW, CATEGORY:AI, CATEGORY:MONITOR, CATEGORY:FEEDFETCHER, CATEGORY:GOOGLE, CATEGORY:MICROSOFT, CATEGORY:META, CATEGORY:TOOL, CATEGORY:VERCEL, CATEGORY:YAHOO, CATEGORY:ARCHIVE, CATEGORY:OPTIMIZER, CATEGORY:ACADEMIC, CATEGORY:SOCIAL, CATEGORY:AMAZON, CATEGORY:UNKNOWN.


Note: For performance, only the categories you configure are checked. The worst-case number of comparisons is roughly count(detectedBot) * count(configuredCategories).
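
As a rough illustration of category matching and the comparison bound in the note above, the sketch below checks each detected bot against each configured category and counts the comparisons. It is a local model only: the membership table, the bot names, and `matchCategories` are hypothetical, and individually allowed bots are left out for brevity. The category identifiers are taken from the list above.

```typescript
// Hypothetical sketch: local model of category matching, not the Arcjet SDK.
type CategoryId = "CATEGORY:SEARCH_ENGINE" | "CATEGORY:MONITOR" | "CATEGORY:AI";

// Illustrative membership table. Real category contents are defined by Arcjet.
const categoryMembers: Record<CategoryId, string[]> = {
  "CATEGORY:SEARCH_ENGINE": ["EXAMPLE_SEARCH_CRAWLER"],
  "CATEGORY:MONITOR": ["EXAMPLE_UPTIME_CHECKER"],
  "CATEGORY:AI": ["EXAMPLE_AI_CRAWLER"],
};

interface CategoryMatch {
  allowed: boolean; // true when every detected bot is in a configured category
  comparisons: number; // how many (bot, category) pairs were checked
}

function matchCategories(
  detectedBots: string[],
  configured: CategoryId[],
): CategoryMatch {
  let comparisons = 0;
  let allowed = true;
  for (const bot of detectedBots) {
    let botAllowed = false;
    // Only configured categories are visited. Unconfigured ones cost nothing.
    for (const category of configured) {
      comparisons += 1;
      if (categoryMembers[category].includes(bot)) {
        botAllowed = true;
        break;
      }
    }
    if (!botAllowed) allowed = false;
  }
  return { allowed, comparisons };
}

const configuredCategories: CategoryId[] = [
  "CATEGORY:SEARCH_ENGINE",
  "CATEGORY:MONITOR",
];

// Worst case: 2 detected bots * 2 configured categories = 4 comparisons.
const worstCase = matchCategories(
  ["EXAMPLE_SCRAPER", "EXAMPLE_FETCHER"],
  configuredCategories,
);
```

In this model the worst case occurs when no detected bot matches any configured category, so every pair is compared. A match on the first category ends the inner loop after one comparison.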

Reference: Identifying bots

🧾 Handling missing User-Agent headers

Requests without a User-Agent header cannot be reliably matched to a bot, so they are marked as errored. Most legitimate clients send one (RFC 7231 says a user agent SHOULD send this header). You can choose to block requests that are missing a User-Agent with a filter rule:

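
The sketch below shows what such a filter does, written as a plain function rather than an Arcjet rule. `HeaderMap`, `getUserAgent`, and `filterMissingUserAgent` are hypothetical names, and treating an empty or whitespace-only value as missing is an assumption of this sketch.

```typescript
// Hypothetical sketch: a plain pre-check, not Arcjet's filter rule syntax.
type HeaderMap = Record<string, string | undefined>;

// Header names are case-insensitive, so compare them in lowercase.
function getUserAgent(headers: HeaderMap): string | undefined {
  for (const [headerName, value] of Object.entries(headers)) {
    if (headerName.toLowerCase() === "user-agent") {
      const trimmed = (value ?? "").trim();
      // Assumption: an empty or whitespace-only value counts as missing.
      return trimmed === "" ? undefined : trimmed;
    }
  }
  return undefined;
}

interface FilterOutcome {
  allowed: boolean;
  statusCode: number;
}

function filterMissingUserAgent(headers: HeaderMap): FilterOutcome {
  return getUserAgent(headers) === undefined
    ? { allowed: false, statusCode: 400 }
    : { allowed: true, statusCode: 200 };
}

const withAgent = filterMissingUserAgent({ "User-Agent": "curl/8.5.0" });
const withoutAgent = filterMissingUserAgent({ accept: "*/*" });
```

Rejecting with 400 rather than 403 is a design choice in this sketch: it signals a malformed request instead of a policy denial.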

Alternatively, check results after a decision using @arcjet/inspect:

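
The post-decision pattern can be sketched as follows. The shapes here are hypothetical stand-ins: `DecisionSketch`, `RuleResultSketch`, `flaggedMissingUserAgent`, and the error message text are assumptions for illustration, and the real decision object and the helpers exported by @arcjet/inspect may look different.

```typescript
// Hypothetical shapes: the real decision object and inspect helpers may differ.
interface RuleResultSketch {
  rule: string;
  state: "OK" | "ERRORED";
  errorMessage?: string;
}

interface DecisionSketch {
  denied: boolean;
  results: RuleResultSketch[];
}

// Assumption: an errored result mentions the User-Agent header in its message.
function flaggedMissingUserAgent(result: RuleResultSketch): boolean {
  return (
    result.state === "ERRORED" &&
    (result.errorMessage ?? "").toLowerCase().includes("user-agent")
  );
}

// Maps a decision to an HTTP status code.
function respondTo(decision: DecisionSketch): number {
  if (decision.denied) return 403;
  // The request was not denied, but the bot rule could not evaluate it.
  if (decision.results.some(flaggedMissingUserAgent)) return 400;
  return 200;
}

const erroredDecision: DecisionSketch = {
  denied: false,
  results: [
    { rule: "bot", state: "ERRORED", errorMessage: "Missing User-Agent header" },
  ],
};
const erroredStatus = respondTo(erroredDecision); // 400
```

Checking after the decision keeps the choice in application code, so different routes can treat a missing header differently.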

Reference: Concepts

Start in DRY_RUN to observe impact before enforcing.
Layer with rate limiting for defense-in-depth and fewer false positives.
Allow good bots you rely on (search, monitors); deny the rest.
Handle missing User-Agent with a filter or post-decision check.
Monitor and iterate; bot ecosystems evolve, and so should rules.
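
The first practice above can be sketched as a mode switch: in dry-run mode the rule's conclusion is recorded but never enforced. This is a local model with hypothetical names (`applyRule`, `EnforcementOutcome`). `DRY_RUN` comes from the list above, while `LIVE` as the name of the enforcing mode is an assumption here.

```typescript
// Hypothetical sketch of dry-run versus enforcing behaviour. Not the Arcjet SDK.
type EnforcementMode = "LIVE" | "DRY_RUN";

interface EnforcementOutcome {
  blocked: boolean; // whether the request was actually stopped
  wouldBlock: boolean; // what the rule concluded, kept for logs and review
}

function applyRule(
  mode: EnforcementMode,
  ruleSaysDeny: boolean,
): EnforcementOutcome {
  return {
    blocked: mode === "LIVE" && ruleSaysDeny,
    wouldBlock: ruleSaysDeny,
  };
}

// Replay the same rule verdicts under both modes to compare impact.
const sampleVerdicts = [true, false, true, true];
const dryRunBlocked = sampleVerdicts.filter(
  (deny) => applyRule("DRY_RUN", deny).blocked,
).length; // 0
const liveBlocked = sampleVerdicts.filter(
  (deny) => applyRule("LIVE", deny).blocked,
).length; // 3
```

Comparing the `wouldBlock` count gathered in dry-run mode against known-good traffic gives an estimate of false positives before any user is affected.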

📚 References

Arcjet — Bot protection concepts: https://docs.arcjet.com/bot-protection/concepts
Arcjet — Identifying bots: https://docs.arcjet.com/bot-protection/identifying-bots
