API Rate Limiting by IP with NGINX

Last time I wrote about HTTP request rate limiting in ASP.NET Core. It works well when you are hosting a simple REST API or website on the Kestrel web server. But you can achieve even more by hosting your application behind a reverse proxy server. The most popular one nowadays is NGINX: according to W3Techs, it served 32.7% of websites at the beginning of 2021.

Before showing you how to set up NGINX rate limits, I would like to discuss why Kestrel alone is not enough and why you may want a reverse proxy server. Kestrel is an excellent high-performance web server. It is even at the top of the TechEmpower Web Framework Benchmarks. Kestrel supports HTTPS and HTTP/2, and it comes with .NET Core. But, as with most built-in web servers on the market, it does not support more advanced features like load balancing. Also, Kestrel is a bit slow when hosting static files. I have seen performance boosts in a few past projects after moving static content out of the ASP.NET Core application to NGINX and IIS. Of course, moving to a CDN, e.g., Amazon CloudFront, will give you the best performance.

Let's reuse the API I built in the previous article:

[ApiController]
[Route("[controller]")]
public class SampleController : ControllerBase
{
    [HttpGet]
    [Route("time")]
    public TimeResponse GetTime()
    {
        var response = new TimeResponse { Time = DateTime.Now };
        return response;
    }

    [HttpGet]
    [Route("status")]
    public IActionResult GetStatus()
    {
        return Ok("OK");
    }
}

Our deployment will include a public NGINX proxy server that passes traffic to the .NET API. The next step is to prepare a Dockerfile that builds and runs our application on .NET 5.

# build the app
FROM mcr.microsoft.com/dotnet/sdk:5.0 AS build
WORKDIR /src

COPY . .
RUN dotnet restore
RUN dotnet publish -c release -o /app --no-restore

# build the final image
FROM mcr.microsoft.com/dotnet/aspnet:5.0
WORKDIR /app
COPY --from=build /app ./
ENTRYPOINT ["dotnet", "RateLimiting.dll"]

The next step is to run the NGINX proxy server. I usually use the official NGINX image from Docker Hub. By default, it is ready to serve static content, but you can customize it by placing your default.conf or any domain-specific configuration files in the /etc/nginx/conf.d folder.

server {
    listen       80;
    server_name  localhost;
                                                                                                                                       
    location /time {
        proxy_pass http://rate_api/sample/time;
    }

    location /status {
        proxy_pass http://rate_api/sample/status;
    }
}

Here is the Dockerfile that builds the NGINX image for the project.

FROM nginx
WORKDIR /etc/nginx/conf.d/
COPY default.conf .

The final step is to create a Docker Compose file to run the services.

version: "3.9"
services:
  web:
    build: ./nginx
    ports:
      - "8080:80"
    links:
      - rate_api
  rate_api:
    build: ./RateLimiting

Let's implement the requirements from the previous article:

The Time API allows only two requests/minute per IP. This rate does not make sense in the real world, but it is OK for testing purposes to see the actual rate limiting errors.
The Status API has no restrictions.

NGINX supports three kinds of limits:

The number of connections per IP address.
The request rate, e.g., total requests per IP address per second.
The download speed per client connection.

You configure the request rate limit using the limit_req_zone and limit_req directives. First, you use limit_req_zone to define a key (usually the client IP address), a shared memory zone that stores the state of each key (how often it has made requests), and the allowed request rate. The rate is given in requests per second (r/s), or in requests per minute (r/m) when a rate of less than one request per second is desired.

The next step is to apply the rate limit to a route, e.g., the /time endpoint, using the limit_req directive within a location {} context.

limit_req_zone $binary_remote_addr zone=time_api:10m rate=2r/m;

server {
    ...
    location /time {
        limit_req zone=time_api;
    }
    ...
}
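One detail that is easy to miss: as far as I understand the NGINX documentation, the rate is enforced as a steady pace rather than as a per-minute quota, so 2r/m works out to roughly one request every 30 seconds. Here is a small Python sketch of that arithmetic. The helper name min_interval_seconds is my own and is not part of NGINX.

```python
import re


def min_interval_seconds(rate: str) -> float:
    """Return the average spacing between requests for a rate like '2r/m' or '10r/s'."""
    match = re.fullmatch(r"(\d+)r/([sm])", rate)
    if match is None or int(match.group(1)) == 0:
        raise ValueError(f"unsupported rate: {rate!r}")
    count = int(match.group(1))
    # 's' means requests per second, 'm' means requests per minute.
    window = 1.0 if match.group(2) == "s" else 60.0
    return window / count


print(min_interval_seconds("2r/m"))   # 30.0
print(min_interval_seconds("10r/s"))  # 0.1
```

So with the configuration above, a second request sent one second after the first one is already over the limit.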

In the example above, I allocated 10 MB to keep the request state per IP address. Note that I used the $binary_remote_addr variable as a key instead of $remote_addr, which also holds the client's IP address. The reason is that $binary_remote_addr holds the binary representation of the IP address (4 bytes for IPv4, 16 bytes for IPv6), so it requires less memory and is more efficient.
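To get a feeling for how many clients a zone can track, here is a rough estimate in Python. It assumes the per-key state sizes given in the NGINX documentation for limit_req_zone (128 bytes on 64-bit platforms, 64 bytes on 32-bit ones). The function name approx_states is mine, and the result is an upper bound because the shared memory allocator has some overhead of its own.

```python
def approx_states(zone_megabytes: int, state_bytes: int = 128) -> int:
    """Estimate how many per-key states fit into a shared memory zone."""
    return zone_megabytes * 1024 * 1024 // state_bytes


print(approx_states(10))      # 81920 on a 64-bit platform
print(approx_states(10, 64))  # 163840 on a 32-bit platform
```

In other words, the 10 MB zone should be enough for roughly 80 thousand distinct IP addresses on a typical 64-bit server.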

Sometimes you may want to test the limits first before enabling them on a production server. You can do that by adding the limit_req_dry_run on; directive to your context.

location /time {
    limit_req zone=time_api;
    limit_req_dry_run on;
    ...
}

Once an IP address reaches the limit, you should see error messages in the NGINX logs, but the requests will pass through anyway.

web_1       | 172.27.0.1 - - [17/Jan/2021:13:29:58 +0000] "GET /time HTTP/1.1" 200 55 "-" "curl/7.74.0" "-"
web_1       | 2021/01/17 13:29:59 [error] 28#28: *3 limiting requests, dry run, excess: 0.963 by zone "time_api", client: 172.27.0.1, server: localhost, request: "GET /time HTTP/1.1", host: "localhost:8080"
web_1       | 172.27.0.1 - - [17/Jan/2021:13:29:59 +0000] "GET /time HTTP/1.1" 200 55 "-" "curl/7.74.0" "-"
web_1       | 2021/01/17 13:30:00 [error] 28#28: *5 limiting requests, dry run, excess: 0.918 by zone "time_api", client: 172.27.0.1, server: localhost, request: "GET /time HTTP/1.1", host: "localhost:8080"
web_1       | 172.27.0.1 - - [17/Jan/2021:13:30:00 +0000] "GET /time HTTP/1.1" 200 55 "-" "curl/7.74.0" "-"
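If you run in dry-run mode for a while, you will probably want to know which clients would have been limited. Below is a minimal Python sketch that counts the dry-run messages per zone and client. It assumes the log lines look exactly like the ones above; the regular expression and the function name count_dry_run_hits are my own, so adjust them if your log format differs.

```python
import re
from collections import Counter

# Matches the "limiting requests, dry run" error lines shown above.
DRY_RUN = re.compile(
    r'limiting requests, dry run, excess: (?P<excess>[\d.]+) '
    r'by zone "(?P<zone>[^"]+)", client: (?P<client>[^,]+),'
)


def count_dry_run_hits(lines):
    """Count would-be rejections per (zone, client) pair."""
    hits = Counter()
    for line in lines:
        match = DRY_RUN.search(line)
        if match:
            hits[(match.group("zone"), match.group("client"))] += 1
    return hits


sample = [
    'web_1       | 172.27.0.1 - - [17/Jan/2021:13:29:58 +0000] "GET /time HTTP/1.1" 200 55 "-" "curl/7.74.0" "-"',
    'web_1       | 2021/01/17 13:29:59 [error] 28#28: *3 limiting requests, dry run, excess: 0.963 by zone "time_api", client: 172.27.0.1, server: localhost, request: "GET /time HTTP/1.1", host: "localhost:8080"',
    'web_1       | 2021/01/17 13:30:00 [error] 28#28: *5 limiting requests, dry run, excess: 0.918 by zone "time_api", client: 172.27.0.1, server: localhost, request: "GET /time HTTP/1.1", host: "localhost:8080"',
]

for (zone, client), count in count_dry_run_hits(sample).items():
    print(f"{zone} {client} {count}")
```

For the sample above it reports two would-be rejections for 172.27.0.1 in the time_api zone; the access log line is ignored.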

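By default, once the number of requests exceeds the specified rate, NGINX rejects the request with a 503 (Service Temporarily Unavailable) error. You can change the status code, e.g., to 429 (Too Many Requests), with the limit_req_status directive.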

$ curl -v http://localhost:8080/time
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying ::1:8080...
* Connected to localhost (::1) port 8080 (#0)
> GET /time HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.74.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 503 Service Temporarily Unavailable
< Server: nginx/1.19.6
< Date: Sun, 17 Jan 2021 13:37:35 GMT
< Content-Type: text/html
< Content-Length: 197
< Connection: keep-alive
<
{ [197 bytes data]
100   197  100   197    0     0  19700      0 --:--:-- --:--:-- --:--:-- 21888<html>
<head><title>503 Service Temporarily Unavailable</title></head>
<body>
<center><h1>503 Service Temporarily Unavailable</h1></center>
<hr><center>nginx/1.19.6</center>
</body>
</html>

* Connection #0 to host localhost left intact

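Sometimes you may want to keep the requests beyond the allowed limit in a queue and execute them later instead of rejecting them. You can do that with the burst parameter of the limit_req directive: up to burst excess requests are delayed so that they are processed at the configured rate, and only requests beyond the burst size are rejected.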
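To make the effect of burst more tangible, here is a simplified leaky bucket model in Python. It is only a sketch of the idea and not NGINX's actual code: it uses floats, tracks a single client, and ignores the nodelay and delay parameters. The class name LeakyBucket and the method check are my own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeakyBucket:
    rate_per_second: float        # e.g. 2 / 60 for rate=2r/m
    burst: int = 0                # how many excess requests may be queued
    excess: float = 0.0           # requests currently waiting in the bucket
    last: Optional[float] = None  # time of the last accepted request

    def check(self, now: float) -> Optional[float]:
        """Return the delay in seconds for an accepted request, or None if rejected."""
        if self.last is None:
            self.last = now
            return 0.0
        elapsed = max(0.0, now - self.last)
        # The bucket drains at the configured rate; this request adds one unit.
        excess = max(0.0, self.excess - self.rate_per_second * elapsed + 1.0)
        if excess > self.burst:
            return None  # over the limit: the state is left untouched
        self.excess = excess
        self.last = now
        return excess / self.rate_per_second


strict = LeakyBucket(rate_per_second=2 / 60)
print([strict.check(t) is not None for t in (0, 1, 2, 31)])
# [True, False, False, True]

queued = LeakyBucket(rate_per_second=2 / 60, burst=10)
print([queued.check(t) is not None for t in range(12)])
# eleven True values followed by one False
```

In this model, without burst, every request that arrives less than 30 seconds after the last accepted one is rejected. With burst=10, the same client can send about ten extra requests that are accepted but delayed (the second request waits roughly 29 seconds), and only then do rejections start.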

The final NGINX configuration file looks as follows:

limit_req_zone $binary_remote_addr zone=time_api:10m rate=2r/m;

server {
    listen       80;
    server_name  localhost;
                                                                                                                                       
    location /time {
        limit_req zone=time_api burst=10;
        proxy_pass http://rate_api/sample/time;
    }

    location /status {
        proxy_pass http://rate_api/sample/status;
    }
}

Now you can run the services on your own and test them.

$ docker-compose up -d
Creating network "nginx_demo_default" with the default driver
Creating nginx_demo_rate_api_1 ... done
Creating nginx_demo_web_1      ... done
$ curl -v http://localhost:8080/time
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying ::1:8080...
* Connected to localhost (::1) port 8080 (#0)
> GET /time HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.74.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: nginx/1.19.6
< Date: Sun, 17 Jan 2021 13:44:50 GMT
< Content-Type: application/json; charset=utf-8
< Transfer-Encoding: chunked
< Connection: keep-alive
<
{ [55 bytes data]
100    44    0    44    0     0   3142      0 --:--:-- --:--:-- --:--:--  3142{"time":"2021-01-17T13:44:50.7810712+00:00"}
* Connection #0 to host localhost left intact

In this article, I showed how you can set up an ASP.NET Core REST API behind the NGINX proxy, run it using Docker and Docker Compose, and use NGINX built-in features to set up request rate limits. Please consult the official NGINX documentation if you need more details on its rate limiting capabilities, or message me if you are looking for a consultant to help with a project.