Preventing web scraping is a common concern for website administrators. Beyond content theft, the server resources consumed by aggressive crawlers can be a significant cost for many site owners.

This article presents a fundamental approach: analyzing Nginx logs to identify high-frequency IPs, then blocking them from accessing your site.

Identifying Scraper IPs

Execute the following command:

awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -n 10

Where:

  • access.log: your Nginx access log file (commonly /var/log/nginx/access.log)

This pipeline extracts the client IP from each log line, counts the requests per IP, and lists the ten most active addresses. An IP issuing an unusually large number of requests in a short window is a likely scraper.
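The count above spans the entire log. To focus on a burst of traffic, you can first narrow the log to a time window. The sketch below assumes the default combined log format, where the timestamp follows the client IP, and uses an illustrative date; adjust the pattern to the window you want to inspect:

# Top 10 IPs within a single minute of traffic (timestamp is illustrative)
grep '12/Mar/2024:10:05' access.log | awk '{print $1}' | sort | uniq -c | sort -nr | head -n 10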

Blocking IP Access

Create a blockip.conf file in your Nginx configuration directory to manage blocked IPs.

Add one line per scraper IP, in this format (a sample file follows the parameter list):

deny IP;

Parameters:

  • deny: Nginx access control directive for restricting server access
  • IP: the address to block; a single IPv4 or IPv6 address, or a CIDR range
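For instance, a minimal blockip.conf might look like the following; the addresses come from reserved documentation ranges and are shown purely for illustration:

# blockip.conf - one deny directive per blocked address
deny 192.0.2.10;
deny 198.51.100.25;
deny 2001:db8::1;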

Include this configuration in your http, server, or location block:

include blockip.conf;
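As a sketch, assuming a standard layout under /etc/nginx (adjust the path to wherever blockip.conf actually lives), placing the include at the top of the http block applies it to every virtual host:

http {
    # Apply the block list to all server blocks below
    include /etc/nginx/blockip.conf;

    server {
        listen 80;
        server_name example.com;
        # ... remaining server configuration
    }
}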

Reload Nginx to apply the changes; a full restart is not required.
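For example, assuming the nginx binary is on your PATH (both commands may require root privileges):

nginx -t          # check the configuration for syntax errors first
nginx -s reload   # reload worker processes without dropping connections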

Blocked IPs will receive 403 Forbidden responses.

Before adding an IP to the block list, make sure it is not a search engine spider or a CDN back-to-origin address: blocking a legitimate spider hurts your search ranking, and blocking your CDN's origin-fetch IPs breaks delivery of cached content.
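One common way to verify a suspected search engine spider is a reverse DNS lookup followed by a forward confirmation; the address below is illustrative:

# Reverse-resolve the suspect IP (address shown is illustrative)
host 66.249.66.1
# A genuine Googlebot resolves to a *.googlebot.com or *.google.com
# hostname; forward-resolve that hostname and confirm it maps back
# to the same IP before trusting it
host crawl-66-249-66-1.googlebot.com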

Advanced Blocking Patterns

# Block a single IP (192.0.2.x addresses are from a reserved
# documentation range, used here purely for illustration)
deny 192.0.2.10;

# Allow a single IP
allow 192.0.2.20;

# Block all IPs
deny all;

# Allow all IPs
allow all;

# Block an IP range (CIDR notation: network address plus prefix length)
deny 192.0.2.0/24;

Combination example:

# Whitelist one address while blocking everyone else.
# Nginx evaluates allow/deny directives in order; the first match
# wins, so the allow must appear before deny all.
allow 192.0.2.10;
deny all;
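A slightly fuller sketch, assuming a hypothetical /admin/ area that should only be reachable from an internal network (the 10.0.0.0/8 range is an assumption for illustration):

location /admin/ {
    allow 10.0.0.0/8;   # internal network only (illustrative range)
    deny  all;          # everyone else receives 403 Forbidden
}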