Why Block User Agents?
Every HTTP request includes a User-Agent header that identifies the client software making the request. Legitimate browsers send user agent strings like Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36, while automated tools, scrapers, and vulnerability scanners often identify themselves with distinctive user agent strings — or use generic defaults that are easy to detect.
Blocking known malicious user agents reduces server load, prevents content scraping, thwarts automated vulnerability scanning, and eliminates noise in your access logs. While sophisticated attackers can trivially spoof their user agent, the vast majority of automated attacks use default tool identifiers. Blocking these provides a meaningful first layer of defense before deeper inspection by a web application firewall.
Common Malicious User Agents to Block
The following user agents are associated with automated tools, vulnerability scanners, and content scrapers that should be blocked on most production servers:
Vulnerability Scanners
- Nikto — popular open-source web server scanner
- sqlmap — automated SQL injection tool
- Nessus — network vulnerability scanner
- OpenVAS — open-source vulnerability assessment
- w3af — web application attack and audit framework
- Acunetix — web vulnerability scanner
- Netsparker — web application security scanner
- ZmEu — scanner targeting phpMyAdmin vulnerabilities
- masscan — high-speed port scanner
Content Scrapers and Bots
- HTTrack — website copier
- SiteSucker — downloads entire websites
- WebCopier — website downloading tool
- Scrapy — Python web scraping framework
- MJ12bot — aggressive crawler (Majestic SEO)
- AhrefsBot — SEO crawler (block if not wanted)
- SemrushBot — SEO crawler (block if not wanted)
Default Tool User Agents
- curl/ — default curl user agent (used by many automated scripts)
- Wget/ — default wget user agent
- Python-urllib — Python's default HTTP library user agent
- python-requests — Python requests library default
- Go-http-client — Go's default HTTP client
- Java/ — Java's default HTTP user agent
- libwww-perl — Perl's LWP library (used by many automated scripts)
Note: Blocking curl and wget default user agents may affect legitimate monitoring tools and health checks. Ensure your monitoring uses custom user agents before blocking these.
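These defaults are easy to observe locally. For example, Python's standard-library urllib attaches a Python-urllib/<version> identifier to every request unless you override it:

```python
import urllib.request

# OpenerDirector carries the default headers urllib sends with each
# request, including the "Python-urllib/<version>" User-agent token
# that blocklists match on.
opener = urllib.request.OpenerDirector()
print(opener.addheaders)
# e.g. [('User-agent', 'Python-urllib/3.12')]
```

Overriding this header is trivial for the client, which is exactly why user agent blocking is only a first-pass filter.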
Blocking User Agents on Nginx
Method 1: Using map and if
The most efficient Nginx method uses a map block to evaluate the user agent once and store the result in a variable, then an if block to act on it:
# In the http {} block of nginx.conf
map $http_user_agent $block_user_agent {
    default 0;
    ~*nikto 1;
    ~*sqlmap 1;
    ~*nessus 1;
    ~*openvas 1;
    ~*w3af 1;
    ~*acunetix 1;
    ~*netsparker 1;
    ~*zmeu 1;
    ~*masscan 1;
    ~*httrack 1;
    ~*sitesucker 1;
    ~*scrapy 1;
    ~*python-urllib 1;
    ~*python-requests 1;
    ~*libwww-perl 1;
    ~*go-http-client 1;
    ~*java/ 1;
    "" 1; # Block empty user agents
}

# In the server {} block
server {
    ...
    if ($block_user_agent) {
        return 403;
    }
    ...
}
The ~* prefix marks a case-insensitive regex match. Map variables are evaluated lazily, only when $block_user_agent is actually referenced, and the result is cached for the rest of the request, so even a long list costs at most one pass per request.
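Because regex entries in a map are tested in the order they appear, you can exempt your own tooling by adding a 0 entry ahead of the broader patterns. The MyMonitor token below is a hypothetical identifier for your health checks, not a real product:

```nginx
map $http_user_agent $block_user_agent {
    default 0;
    ~*mymonitor 0;          # hypothetical health-check token; first regex match wins
    ~*python-requests 1;    # still blocks other requests-based clients
    # ... remaining block patterns ...
}
```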
Method 2: Simple if Block
For a smaller list, you can use a single regex in the server block:
server {
    ...
    if ($http_user_agent ~* (nikto|sqlmap|nessus|zmeu|masscan|httrack|scrapy|libwww-perl)) {
        return 403;
    }
    ...
}
This is simpler but harder to maintain as the list grows: every addition means editing a single long alternation regex in place, and unlike a map, the pattern cannot be shared across multiple server blocks or mix hashed string matches with regex entries.
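Both methods hinge on a single regex, so it is worth sanity-checking the pattern against known-good and known-bad strings before deploying it. A quick offline check in Python (the sample user agent strings are illustrative):

```python
import re

# Same alternation as the server config, matched case-insensitively
blocklist = re.compile(
    r"nikto|sqlmap|nessus|zmeu|masscan|httrack|scrapy|libwww-perl",
    re.IGNORECASE,
)

blocked = ["Nikto/2.5.0", "sqlmap/1.7", "Scrapy/2.11 (+https://scrapy.org)"]
allowed = ["Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"]

for ua in blocked:
    assert blocklist.search(ua), f"should be blocked: {ua}"
for ua in allowed:
    assert blocklist.search(ua) is None, f"should be allowed: {ua}"
print("pattern behaves as expected")
```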
Returning Different Responses
# Return 403 Forbidden
return 403;

# Return 444 (Nginx-specific: close the connection with no response)
return 444;

# Return 403 with a short text body
return 403 "Access denied.";

# Redirect to a honeypot
return 301 https://example.com/honeypot;
Using return 444 is the most efficient option for Nginx — it drops the connection immediately without sending any response headers, wasting the scanner's time and bandwidth.
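Before enforcing any of these responses, you can dry-run the list. The access_log directive accepts an if= parameter (nginx 1.7.0 and later) that skips logging when the value is empty or "0", so reusing $block_user_agent from the map method records would-be-blocked requests without rejecting anyone:

```nginx
server {
    # Log flagged requests to a separate file; nothing is blocked yet
    access_log /var/log/nginx/blocked_ua.log combined if=$block_user_agent;
    ...
}
```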
Blocking User Agents on Apache
Method 1: mod_rewrite in VirtualHost or .htaccess
# Requires mod_rewrite to be enabled
RewriteEngine On
# Block vulnerability scanners
RewriteCond %{HTTP_USER_AGENT} nikto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} sqlmap [NC,OR]
RewriteCond %{HTTP_USER_AGENT} nessus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} openvas [NC,OR]
RewriteCond %{HTTP_USER_AGENT} zmeu [NC,OR]
RewriteCond %{HTTP_USER_AGENT} masscan [NC,OR]
# Block scrapers
RewriteCond %{HTTP_USER_AGENT} httrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} sitesucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} scrapy [NC,OR]
# Block default tool user agents
RewriteCond %{HTTP_USER_AGENT} python-urllib [NC,OR]
RewriteCond %{HTTP_USER_AGENT} python-requests [NC,OR]
RewriteCond %{HTTP_USER_AGENT} libwww-perl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} go-http-client [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^java/ [NC,OR]
# Block empty user agents
RewriteCond %{HTTP_USER_AGENT} ^$ [NC]
RewriteRule .* - [F,L]
The [NC] flag makes the match case-insensitive. The [OR] flag chains conditions with OR logic (default is AND). The [F,L] flags return a 403 Forbidden response and stop processing further rules.
Method 2: Using a Combined Regex
A single RewriteCond with a combined regex is more compact:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (nikto|sqlmap|nessus|openvas|zmeu|masscan|httrack|sitesucker|scrapy|python-urllib|python-requests|libwww-perl|go-http-client|^java/) [NC]
RewriteRule .* - [F,L]
Method 3: Using mod_setenvif
An alternative to mod_rewrite that some administrators prefer:
# Set an environment variable for bad user agents
SetEnvIfNoCase User-Agent "nikto" bad_bot
SetEnvIfNoCase User-Agent "sqlmap" bad_bot
SetEnvIfNoCase User-Agent "nessus" bad_bot
SetEnvIfNoCase User-Agent "zmeu" bad_bot
SetEnvIfNoCase User-Agent "httrack" bad_bot
SetEnvIfNoCase User-Agent "scrapy" bad_bot
SetEnvIfNoCase User-Agent "libwww-perl" bad_bot
SetEnvIfNoCase User-Agent "^$" bad_bot

# Deny access based on the variable (Apache 2.4 syntax; a negated
# Require cannot stand alone, it must live inside <RequireAll>)
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
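If you are still on Apache 2.2, which predates the Require directive family, the older access-control syntax (available on 2.4 through mod_access_compat) achieves the same effect:

```apache
# Apache 2.2 equivalent
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```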
.htaccess Method
If you do not have access to the main server configuration (shared hosting), you can place rules in .htaccess:
# .htaccess
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (nikto|sqlmap|nessus|zmeu|masscan|httrack|scrapy|libwww-perl|python-urllib|^$) [NC]
RewriteRule .* - [F,L]
Note that .htaccess rules are processed on every request and have a performance cost compared to VirtualHost-level configuration. For high-traffic sites, configure blocking in the main server config or VirtualHost block when possible.
Testing Your Rules
After implementing user agent blocking, test from the command line to verify the rules are working:
# Test with a blocked user agent (should return 403)
curl -A "nikto" -o /dev/null -s -w "%{http_code}" https://example.com
# Expected output: 403
# Test with a normal user agent (should return 200)
curl -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" -o /dev/null -s -w "%{http_code}" https://example.com
# Expected output: 200
# Test with an empty user agent
curl -A "" -o /dev/null -s -w "%{http_code}" https://example.com
# Expected output: 403
# Test case sensitivity
curl -A "NIKTO" -o /dev/null -s -w "%{http_code}" https://example.com
# Expected output: 403 (if case-insensitive matching is configured)
Monitoring Blocked Requests
Track blocked requests in your access logs to understand what you are blocking and whether legitimate traffic is affected:
# Nginx — count 403 responses by user agent (last 1000 lines, combined log format)
tail -1000 /var/log/nginx/access.log | awk -F'"' '$3 ~ /^ 403 / {print $6}' | sort | uniq -c | sort -rn

# Apache — the same for the combined log format
tail -1000 /var/log/apache2/access.log | awk -F'"' '$3 ~ /^ 403 / {print $6}' | sort | uniq -c | sort -rn
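Splitting on whitespace breaks as soon as a referer or user agent contains spaces, so splitting on the quote characters is more robust. A minimal Python sketch for the standard combined log format (the sample lines are illustrative):

```python
from collections import Counter

def blocked_user_agents(lines, status="403"):
    """Count user agents of requests that received the given status.

    Assumes the combined log format, where splitting on '"' yields:
    prefix, request line, ' status bytes ', referer, ' ', user agent.
    """
    counts = Counter()
    for line in lines:
        parts = line.split('"')
        if len(parts) < 7:
            continue  # not a combined-format line
        status_field = parts[2].split()
        if status_field and status_field[0] == status:
            counts[parts[5]] += 1
    return counts

# Illustrative sample lines
sample = [
    '1.2.3.4 - - [01/Jan/2025:00:00:00 +0000] "GET / HTTP/1.1" 403 153 "-" "sqlmap/1.7"',
    '5.6.7.8 - - [01/Jan/2025:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '1.2.3.4 - - [01/Jan/2025:00:00:02 +0000] "GET /a HTTP/1.1" 403 153 "-" "sqlmap/1.7"',
]
print(blocked_user_agents(sample).most_common())
# [('sqlmap/1.7', 2)]
```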
Limitations of User Agent Blocking
User agent blocking is a useful first filter, but it has significant limitations:
- Trivial to bypass: Any attacker can set a custom user agent string. Sophisticated bots use real browser user agent strings.
- False positives: Legitimate monitoring tools, API clients, and health checkers may use default library user agents. Always whitelist your own tools.
- Not a substitute for a WAF: User agent blocking catches unsophisticated automated attacks but does nothing against targeted attacks using real browser user agents.
- Maintenance burden: New tools and bots appear constantly. The block list needs regular updates.
For comprehensive bot detection and attack mitigation, a web application firewall analyzes request patterns, headers, payloads, and behavioral signals — far beyond just the user agent string.
Comprehensive Web Server Protection
User agent blocking is one layer of a defense-in-depth strategy. Combine it with rate limiting, IP-based blocking for known brute force sources, and a WAF that provides intelligent bot detection, request inspection, and real-time threat intelligence. Explore NOC.org's WAF plans for automated protection that goes far beyond user agent filtering.