Using NGINX as a Queue for JSON Data

06 May 2020

We need a queue to push our incoming performance data to. What we have is NGINX. Can we use NGINX and a custom log format to create a poor man’s queue? You bet we can.

Request Metrics is a real user monitoring service for your Core Web Vitals and website performance. As such, we need to build a data ingestion pipeline that consumes performance data and processes it. When building data processing pipelines, it’s advisable to separate the ingestion piece from the processing piece. This gives us flexibility and an opportunity to add redundancy and reliability to each part independently.

One common approach to ingestion is to shove incoming data into a queue. Queues are designed to handle huge volumes of data and store it for future processing. A queue sounds like exactly what we want in this case. The problem is, we like simple things, and figuring out fancy distributed queues is anything but. So we’re going to eschew best practices and use NGINX as our queue!

NGINX as a Queue?

At its core, a queue is essentially an ordered list of data items. One other property of most queues is “durability.” That is, once you shove some data into a queue, it will be stored reliably until it is processed and removed. Things can crash, processes can fail, reboots occur, but that data will survive.
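To make “durability” concrete for a plain file, here is a minimal Python sketch that appends one item as a single line and asks the operating system to commit it to disk before returning. The durable_append helper is hypothetical and only illustrates the property; it is not part of NGINX.

```python
import json
import os
import tempfile


def durable_append(path: str, record: dict) -> None:
    """Append one JSON record as a single line and force it to disk.

    Hypothetical helper for illustration only.
    """
    line = json.dumps(record, separators=(",", ":")) + "\n"
    # Append mode puts every write at the end of the file,
    # which preserves insertion order.
    with open(path, "a", encoding="utf-8") as f:
        f.write(line)
        f.flush()             # hand Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to commit the data to disk


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "queue.log")
    durable_append(demo_path, {"id": 1})
    durable_append(demo_path, {"id": 2})
    with open(demo_path, encoding="utf-8") as f:
        print([json.loads(line)["id"] for line in f])  # [1, 2]
```

The fsync call is what separates “written” from “survives a power loss”; without it, data can sit in the OS page cache for a while.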

There are dozens of queueing systems in the wild. Many have advanced features like replication, at-least-once delivery, and support for many consumers. We don’t need all that. We really just need a relatively durable ordered list of data. That sounds an awful lot like a simple log file to us! And you know what’s really good at writing log files to disk? NGINX!
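Treating a log file as a queue mostly comes down to remembering how far the consumer has read. The sketch below assumes Python 3.9+ and a hypothetical checkpoint file stored next to the log. It reads only complete lines past a saved byte offset and leaves committing the new offset to the caller, so a consumer that crashes mid-batch re-reads lines rather than skipping them.

```python
import os


def load_offset(offset_path: str) -> int:
    """Read the saved byte offset, or 0 if no checkpoint exists yet."""
    if not os.path.exists(offset_path):
        return 0
    with open(offset_path, encoding="utf-8") as f:
        return int(f.read().strip() or 0)


def save_offset(offset_path: str, offset: int) -> None:
    """Write the offset to a temp file, then swap it in with os.replace."""
    tmp = offset_path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(str(offset))
    os.replace(tmp, offset_path)


def read_new_lines(log_path: str, offset: int) -> tuple[list[str], int]:
    """Return complete lines written after offset, plus the offset past them."""
    lines = []
    with open(log_path, "rb") as f:
        f.seek(offset)
        for raw in f:
            # A final line without "\n" may still be mid-write;
            # leave it for the next poll.
            if not raw.endswith(b"\n"):
                break
            lines.append(raw.decode("utf-8").rstrip("\n"))
            offset += len(raw)
    return lines, offset
```

A polling loop would call load_offset, then read_new_lines, process each line, and only then call save_offset with the returned offset.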

Logging to the Rescue

In order to make our log file more “queue-like” and palatable for later processing, we want to format it as JSON. We want each line in the file to be one “performance payload” from a remote browser agent. Then our processor can rip through the log line by line and operate on the JSON.
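With one JSON document per line, the processor side can stay very small. A minimal Python sketch, assuming the processor simply skips blank or malformed lines (a real one might log or set them aside instead):

```python
import json
from typing import Iterable, Iterator


def parse_payloads(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded payload per non-empty line, skipping bad lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            # Hypothetical policy: ignore lines that are not valid JSON.
            continue
```

Because a file object iterates line by line, usage is just opening /var/log/nginx/ingest.log and passing the file object to parse_payloads.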

Let’s imagine that we’re sending all of our performance data to an /ingest endpoint backed by NGINX. We want any data transmitted to that endpoint logged in a specific ingest.log file, separate from the normal NGINX access.log.

Specifying a dedicated log file for an endpoint


location /ingest {
    # We can use the access_log directive to specify a custom log file location
    access_log /var/log/nginx/ingest.log;

    return 202; # Acknowledge the request has been received but may require further processing
}

NGINX site configuration file
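On the sending side, an agent or test script only needs to POST its payload to /ingest. Here is a minimal Python sketch using the standard library’s urllib.request; the base URL and payload fields are placeholders, not what Request Metrics actually sends.

```python
import json
import urllib.request


def build_ingest_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build a POST request carrying one JSON performance payload."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/ingest",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example (requires a running server, so it is left commented out):
# req = build_ingest_request("http://localhost", {"lcp": 1200})
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)  # expected to be the 202 returned by /ingest
```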

This is a good start: we’ve got a dedicated log file we’re writing to now. But we want a JSON-formatted log, not the default NGINX format! Fortunately, it’s as easy as telling NGINX exactly what format we want.

Creating a custom logging format in NGINX


# Inside nginx.conf
http {
    # The log_format directive takes three parameters.
    # "json_ingest" is the name
    # escape=json escapes characters in variable values so they are valid inside JSON strings
    # the third argument is the format string, where you can use many variable replacements
    log_format json_ingest escape=json '{ "timestamp": "$time_iso8601", "body": "$request_body" }';
}

Custom log format declaration inside nginx.conf
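One consequence of escape=json is that the request body lands in the log as an escaped JSON string, not a nested object, so the processor has to decode each line twice. A minimal Python sketch, using a made-up sample line in the json_ingest format above:

```python
import json


def decode_ingest_line(line: str) -> dict:
    """Decode one json_ingest log line, then decode its escaped body."""
    record = json.loads(line)
    body = record.get("body", "")
    # $request_body can be empty, so guard before the second decode.
    record["body"] = json.loads(body) if body else None
    return record


# Hypothetical sample line; the body was escaped by escape=json.
sample = r'{ "timestamp": "2020-05-06T12:00:00+00:00", "body": "{\"lcp\": 1200}" }'
print(decode_ingest_line(sample))
# {'timestamp': '2020-05-06T12:00:00+00:00', 'body': {'lcp': 1200}}
```

The second json.loads will raise if a client sent a body that is not valid JSON, so a real processor would want to catch that.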

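In the configuration above, we’re creating a very basic JSON structure, and we’re piping the raw request body into the log file as well. But there is an issue: the $request_body variable is not always populated. NGINX only fills it in when it has actually read the request body into a memory buffer, for example in a location that passes the request upstream with proxy_pass.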

There are a number of workarounds, but for now we’ve chosen to simply proxy_pass to a local endpoint. The other thing we need to do is tell NGINX to use our new custom log format! Both changes go in the site configuration file we modified earlier.

Updated NGINX site configuration


location /ingest {
    # Update the access_log directive to use our custom format (named json_ingest)
    access_log /var/log/nginx/ingest.log json_ingest;

    # This ensures the $request_body variable is populated in our logs
    proxy_pass 'http://localhost/dev/null';
}

# This is a "no op" endpoint that exists solely as a workaround
location /dev/null {
    # We don't want to double log this request, so turn access_log off here
    access_log off;

    return 202;
}

Updated NGINX site configuration file

That’s it!

Now we have a specific endpoint (/ingest) that is logging to a bespoke file (/var/log/nginx/ingest.log), in a format that will make later processing much easier. There are still some things we could do to improve this, like buffering log writes, but this will get us moving for now!

Eric Brandes

CTO, Request Metrics