Using HTTP 403 and HTTP 444 to Deny Bot Access

Introduction

Denying access to unwanted bots is crucial for protecting resources and maintaining the performance of your web server. Two HTTP status codes commonly used for this purpose are HTTP 403 and HTTP 444. This guide shows how to use them to block bots on both Apache and Nginx, with configuration examples you can adapt to your own setup.

Part 1: Using HTTP 403 to Deny Bot Access

1.1 What is HTTP 403?

HTTP 403 (Forbidden) is a status code indicating that the server understands the request but refuses to authorize it. This is a common method to deny access to bots that you don’t want crawling your website.

1.2 Configuration in Apache

To return HTTP 403 in Apache, add the following rules to your site's .htaccess file (or to the virtual host configuration). The example below blocks bots based on their User-Agent string:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (MJ12bot|SemrushBot|DotBot|MegaIndex|AspiegelBot|PetalBot|BLEXBot|serpstatbot|bingbot|proximic|Barkrowler|SeekportBot|YandexBot|Pinterestbot|DataForSeoBot|GrapeshotCrawler|FemtosearchBot|CriteoBot|Amazonbot|GeedoBot|Bytespider|ClaudeBot|SEOkicks|GeedoProductSearch|ImagesiftBot|AwarioBot) [NC]
RewriteRule .* - [F,L]
</IfModule>

Explanation:

  • RewriteEngine On: Enables the rewrite engine.
  • RewriteCond %{HTTP_USER_AGENT} ... [NC]: Matches the User-Agent header against the list of bot names; the [NC] flag makes the match case-insensitive.
  • RewriteRule .* - [F,L]: For any matching request, [F] returns a 403 Forbidden response and [L] stops processing further rewrite rules.

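Note that these rules only take effect if mod_rewrite is loaded and .htaccess overrides are allowed for the site (for example AllowOverride All in the virtual host). The commands below are a minimal sketch assuming a Debian/Ubuntu-style Apache installation and a placeholder domain:

# Enable mod_rewrite and restart Apache (Debian/Ubuntu layout assumed)
sudo a2enmod rewrite
sudo systemctl restart apache2

# Send a request with a blocked User-Agent; the server should answer 403 Forbidden
curl -I -A "MJ12bot" https://example.com/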

1.3 Configuration in Nginx

In Nginx, you can block the same bots by adding the following snippet inside the server block of nginx.conf (or of your site's configuration file):

if ($http_user_agent ~* (MJ12bot|SemrushBot|DotBot|MegaIndex|AspiegelBot|PetalBot|BLEXBot|serpstatbot|bingbot|proximic|Barkrowler|SeekportBot|YandexBot|Pinterestbot|DataForSeoBot|GrapeshotCrawler|FemtosearchBot|CriteoBot|Amazonbot|GeedoBot|Bytespider|ClaudeBot|SEOkicks|GeedoProductSearch|ImagesiftBot|AwarioBot)) {
    return 403;
}

Explanation:

  • if ($http_user_agent ~* ...): Matches the User-Agent header against the bot list; the ~* operator makes the regular expression match case-insensitive.
  • return 403: Returns a 403 Forbidden response to the matching client.

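If you prefer to keep the bot list out of the server block, the same check can also be written with a map in the http context, which tends to be easier to maintain as the list grows. The snippet below is only a sketch of one possible layout: the bot list is shortened for readability, the variable name $block_bot is arbitrary, and example.com is a placeholder.

# In the http {} context: flag requests whose User-Agent matches a blocked bot
map $http_user_agent $block_bot {
    default 0;
    "~*(MJ12bot|SemrushBot|DotBot|Bytespider|ClaudeBot)" 1;
}

# In the server {} block: refuse flagged requests
server {
    listen 80;
    server_name example.com;  # placeholder domain

    if ($block_bot) {
        return 403;
    }
}

Extending the regular expression to the full list above works exactly the same way.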

Part 2: Using HTTP 444 to Deny Bot Access

2.1 What is HTTP 444?

HTTP 444 is a non-standard status code specific to Nginx: when a request is answered with return 444, Nginx closes the connection without sending any response to the client. This conserves server resources and gives the blocked bot no HTTP response to act on.

2.2 Configuration in Nginx

To use HTTP 444 in Nginx, add the following inside the server block, just as with the 403 example:

if ($http_user_agent ~* (MJ12bot|SemrushBot|DotBot|MegaIndex|AspiegelBot|PetalBot|BLEXBot|serpstatbot|bingbot|proximic|Barkrowler|SeekportBot|YandexBot|Pinterestbot|DataForSeoBot|GrapeshotCrawler|FemtosearchBot|CriteoBot|Amazonbot|GeedoBot|Bytespider|ClaudeBot|SEOkicks|GeedoProductSearch|ImagesiftBot|AwarioBot)) {
    return 444;
}

Explanation:

  • if ($http_user_agent ~* ...): The same case-insensitive User-Agent match as in the 403 example.
  • return 444: Tells Nginx to close the connection immediately without sending anything; the 444 code is internal to Nginx and never reaches the client.

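After adding the rule, validate and reload Nginx, then check the behaviour with a spoofed User-Agent. The commands below assume a systemd-based system and a placeholder domain; because the connection is closed before any response is written, curl usually reports an empty reply instead of a status code:

# Validate the configuration and reload Nginx
sudo nginx -t
sudo systemctl reload nginx

# Request with a blocked User-Agent; expect no HTTP response at all
curl -I -A "MJ12bot" http://example.com/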

Part 3: Conclusion

Using HTTP 403 and 444 status codes to deny bot access is an effective way to protect your website from harmful or unwanted bots. HTTP 403 provides a clear indication that access is forbidden, while HTTP 444 conserves server resources by closing the connection without a response.

The examples above should help you apply these configurations to your own web server. Choose the method that best suits your needs and system architecture to protect your website effectively.

