Blocking User Agents on Nginx and Apache | NOC.org

Why Block User Agents?

Every HTTP request includes a User-Agent header that identifies the client software making the request. Legitimate browsers send user agent strings like Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36, while automated tools, scrapers, and vulnerability scanners often identify themselves with distinctive user agent strings — or use generic defaults that are easy to detect.
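For illustration, the header is just one line of the request. A minimal GET request from a browser looks like this (host and exact version string are placeholders):

```http
GET / HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
Accept: */*
```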

Blocking known malicious user agents reduces server load, prevents content scraping, thwarts automated vulnerability scanning, and eliminates noise in your access logs. While sophisticated attackers can trivially spoof their user agent, the vast majority of automated attacks use default tool identifiers. Blocking these provides a meaningful first layer of defense before deeper inspection by a web application firewall.

Common Malicious User Agents to Block

The following user agents are associated with automated tools, vulnerability scanners, and content scrapers that should be blocked on most production servers:

Vulnerability Scanners

  • Nikto — popular open-source web server scanner
  • sqlmap — automated SQL injection tool
  • Nessus — network vulnerability scanner
  • OpenVAS — open-source vulnerability assessment
  • w3af — web application attack and audit framework
  • Acunetix — web vulnerability scanner
  • Netsparker — web application security scanner
  • ZmEu — scanner targeting phpMyAdmin vulnerabilities
  • masscan — high-speed port scanner

Content Scrapers and Bots

  • HTTrack — website copier
  • SiteSucker — downloads entire websites
  • WebCopier — website downloading tool
  • Scrapy — Python web scraping framework
  • MJ12bot — aggressive crawler (Majestic SEO)
  • AhrefsBot — Ahrefs SEO crawler (block if you do not use the service)
  • SemrushBot — Semrush SEO crawler (block if you do not use the service)

Default Tool User Agents

  • curl/ — default curl user agent (used by many automated scripts)
  • Wget/ — default wget user agent
  • Python-urllib — Python's default HTTP library user agent
  • python-requests — Python requests library default
  • Go-http-client — Go's default HTTP client
  • Java/ — Java's default HTTP user agent
  • libwww-perl — Perl's LWP library (used by many automated scripts)

Note: Blocking curl and wget default user agents may affect legitimate monitoring tools and health checks. Ensure your monitoring uses custom user agents before blocking these.
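If you do decide to block these defaults, one way to spare your own health checks (sketched here for the Nginx map method shown below; MyMonitor/1.0 is a hypothetical custom agent, not a real product) is to match your monitoring string first, since regex entries in a map are tested in declaration order:

```nginx
map $http_user_agent $block_user_agent {
    default         0;
    "~^MyMonitor/"  0;  # hypothetical custom UA set by your health checks — matched first
    ~*curl          1;
    ~*wget          1;
}
```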

Blocking User Agents on Nginx

Method 1: Using map and if

The most efficient Nginx method uses a map block to evaluate the user agent once and store the result in a variable, then an if block to act on it:

# In the http {} block of nginx.conf
map $http_user_agent $block_user_agent {
    default         0;
    ~*nikto         1;
    ~*sqlmap        1;
    ~*nessus        1;
    ~*openvas       1;
    ~*w3af          1;
    ~*acunetix      1;
    ~*netsparker    1;
    ~*zmeu          1;
    ~*masscan       1;
    ~*httrack       1;
    ~*sitesucker    1;
    ~*scrapy        1;
    ~*python-urllib  1;
    ~*python-requests 1;
    ~*libwww-perl   1;
    ~*go-http-client 1;
    ~*java/         1;
    ""              1;  # Block empty user agents
}

# In the server {} block
server {
    ...
    if ($block_user_agent) {
        return 403;
    }
    ...
}

The ~* prefix makes the regex match case-insensitive. The map variable is evaluated lazily: Nginx computes it only when $block_user_agent is first referenced and caches the result for the rest of the request, so even a long list adds little per-request overhead.
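Before reloading Nginx, you can sanity-check the pattern list from the shell. grep -iE approximates the case-insensitive ~* matching used in the map above (the sample user agent strings are illustrative):

```shell
# Case-insensitive alternation mirroring the map entries above
pattern='nikto|sqlmap|nessus|openvas|w3af|acunetix|netsparker|zmeu|masscan|httrack|sitesucker|scrapy|python-urllib|python-requests|libwww-perl|go-http-client|java/'

# Prints "blocked" if the string would match the list, "allowed" otherwise
check() { echo "$1" | grep -qiE "$pattern" && echo blocked || echo allowed; }

check "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"   # allowed
check "sqlmap/1.7.2#stable (http://sqlmap.org)"     # blocked
check "Python-urllib/3.11"                          # blocked
```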

Method 2: Simple if Block

For a smaller list, you can use a single regex in the server block:

server {
    ...
    if ($http_user_agent ~* (nikto|sqlmap|nessus|zmeu|masscan|httrack|scrapy|libwww-perl)) {
        return 403;
    }
    ...
}

This is simpler but less maintainable for long lists. A map also keeps the patterns in one place and caches its result in the variable; note, though, that only exact-string map entries use a hash table, while regex entries such as these are still tested in order.

Returning Different Responses

# Return 403 Forbidden
return 403;

# Return 444 (Nginx-specific: close connection with no response)
return 444;

# Return 403 with a short plain-text body
return 403 "Access denied.";

# Redirect to a honeypot
return 301 https://example.com/honeypot;

Using return 444 is the most efficient option for Nginx: it closes the connection without sending any response at all, so no bandwidth is spent on the scanner, and many tools simply log the dropped connection as an error and move on.

Blocking User Agents on Apache

Method 1: mod_rewrite in VirtualHost or .htaccess

# Requires mod_rewrite to be enabled (a2enmod rewrite on Debian/Ubuntu)
RewriteEngine On

# Block vulnerability scanners
RewriteCond %{HTTP_USER_AGENT} nikto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} sqlmap [NC,OR]
RewriteCond %{HTTP_USER_AGENT} nessus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} openvas [NC,OR]
RewriteCond %{HTTP_USER_AGENT} zmeu [NC,OR]
RewriteCond %{HTTP_USER_AGENT} masscan [NC,OR]

# Block scrapers
RewriteCond %{HTTP_USER_AGENT} httrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} sitesucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} scrapy [NC,OR]

# Block default tool user agents
RewriteCond %{HTTP_USER_AGENT} python-urllib [NC,OR]
RewriteCond %{HTTP_USER_AGENT} python-requests [NC,OR]
RewriteCond %{HTTP_USER_AGENT} libwww-perl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} go-http-client [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^java/ [NC,OR]

# Block empty user agents
RewriteCond %{HTTP_USER_AGENT} ^$ [NC]

RewriteRule .* - [F,L]

The [NC] flag makes the match case-insensitive. The [OR] flag chains conditions with OR logic (default is AND). The [F,L] flags return a 403 Forbidden response and stop processing further rules.

Method 2: Using a Combined Regex

A single RewriteCond with a combined regex is more compact:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (nikto|sqlmap|nessus|openvas|zmeu|masscan|httrack|sitesucker|scrapy|python-urllib|python-requests|libwww-perl|go-http-client|^java/) [NC]
RewriteRule .* - [F,L]
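Note that the ^java/ alternative is anchored: it only matches user agents that begin with Java/ (Java's default looks like Java/1.8.0_292), not any string that merely contains those characters. You can check the combined pattern with grep -iE, which approximates Apache's [NC] matching:

```shell
# Same alternation as the RewriteCond above
pattern='nikto|sqlmap|nessus|openvas|zmeu|masscan|httrack|sitesucker|scrapy|python-urllib|python-requests|libwww-perl|go-http-client|^java/'

# Prints "blocked" if the string would match, "allowed" otherwise
match() { echo "$1" | grep -qiE "$pattern" && echo blocked || echo allowed; }

match "Java/1.8.0_292"                       # blocked — starts with Java/
match "Mozilla/5.0 (compatible; java/fan)"   # allowed — java/ is not at the start
```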

Method 3: Using mod_setenvif

An alternative to mod_rewrite that some administrators prefer:

# Set an environment variable for bad user agents
SetEnvIfNoCase User-Agent "nikto" bad_bot
SetEnvIfNoCase User-Agent "sqlmap" bad_bot
SetEnvIfNoCase User-Agent "nessus" bad_bot
SetEnvIfNoCase User-Agent "zmeu" bad_bot
SetEnvIfNoCase User-Agent "httrack" bad_bot
SetEnvIfNoCase User-Agent "scrapy" bad_bot
SetEnvIfNoCase User-Agent "libwww-perl" bad_bot
SetEnvIfNoCase User-Agent "^$" bad_bot

# Deny access based on the variable (Apache 2.4+ syntax)
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>

.htaccess Method

If you do not have access to the main server configuration (shared hosting), you can place rules in .htaccess:

# .htaccess
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (nikto|sqlmap|nessus|zmeu|masscan|httrack|scrapy|libwww-perl|python-urllib|^$) [NC]
RewriteRule .* - [F,L]

Note that .htaccess rules are processed on every request and have a performance cost compared to VirtualHost-level configuration. For high-traffic sites, configure blocking in the main server config or VirtualHost block when possible.
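Also note that mod_rewrite directives in .htaccess only take effect where the main configuration permits them; the directory path below is an assumption for illustration:

```apache
# In the VirtualHost or main config — allow .htaccess rewrite rules
<Directory /var/www/html>
    AllowOverride FileInfo
</Directory>
```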

Testing Your Rules

After implementing user agent blocking, test from the command line to verify the rules are working:

# Test with a blocked user agent (should return 403)
curl -A "nikto" -o /dev/null -s -w "%{http_code}" https://example.com
# Expected output: 403

# Test with a normal user agent (should return 200)
curl -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" -o /dev/null -s -w "%{http_code}" https://example.com
# Expected output: 200

# Test with an empty user agent
curl -A "" -o /dev/null -s -w "%{http_code}" https://example.com
# Expected output: 403

# Test case sensitivity
curl -A "NIKTO" -o /dev/null -s -w "%{http_code}" https://example.com
# Expected output: 403 (if case-insensitive matching is configured)

Monitoring Blocked Requests

Track blocked requests in your access logs to understand what you are blocking and whether legitimate traffic is affected:

# Nginx — count 403 responses by user agent (last 1000 lines, combined log format)
tail -1000 /var/log/nginx/access.log | awk -F'"' '$3 ~ /^ 403 / {print $6}' | sort | uniq -c | sort -rn

# Apache — same, for the default combined log format
tail -1000 /var/log/apache2/access.log | awk -F'"' '$3 ~ /^ 403 / {print $6}' | sort | uniq -c | sort -rn
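Because the user agent is a quoted field that can contain spaces, splitting on double quotes is more robust than counting space-separated fields. Here is how the parse behaves on a sample combined-log line (fabricated for illustration):

```shell
# A sample combined-log entry: status 403, user agent sqlmap
line='203.0.113.7 - - [01/Jan/2024:00:00:00 +0000] "GET /admin HTTP/1.1" 403 153 "-" "sqlmap/1.7.2#stable"'

# Splitting on double quotes: field 3 holds " status bytes ", field 6 the user agent
echo "$line" | awk -F'"' '$3 ~ /^ 403 / {print $6}'
# prints: sqlmap/1.7.2#stable
```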

Limitations of User Agent Blocking

User agent blocking is a useful first filter, but it has significant limitations:

  • Trivial to bypass: Any attacker can set a custom user agent string. Sophisticated bots use real browser user agent strings.
  • False positives: Legitimate monitoring tools, API clients, and health checkers may use default library user agents. Always whitelist your own tools.
  • Not a substitute for a WAF: User agent blocking catches unsophisticated automated attacks but does nothing against targeted attacks using real browser user agents.
  • Maintenance burden: New tools and bots appear constantly. The block list needs regular updates.

For comprehensive bot detection and attack mitigation, a web application firewall analyzes request patterns, headers, payloads, and behavioral signals — far beyond just the user agent string.

Comprehensive Web Server Protection

User agent blocking is one layer of a defense-in-depth strategy. Combine it with rate limiting, IP-based blocking for known brute force sources, and a WAF that provides intelligent bot detection, request inspection, and real-time threat intelligence. Explore NOC.org's WAF plans for automated protection that goes far beyond user agent filtering.

Improve Your Website's Speed and Security

14-day free trial. No credit card required.