Request Metrics is a performance analytics tool. As such, we need to build a data ingestion pipeline that will consume performance data and process it. When building data processing pipelines it’s advisable to separate the ingestion piece from the processing piece. This allows some flexibility and an opportunity to add redundancy and reliability to each part.
One common approach to ingestion is to shove incoming data into a queue. Queues are meant to handle huge volumes of data and store it for future processing. A queue sounds like exactly what we want in this case. The problem is, we like simple things, and figuring out fancy distributed queues is anything but simple. So we’re going to eschew best practices and use NGINX as our queue!
NGINX as a Queue?
At its core, a queue is essentially an ordered list of data items. One other property of most queues is “durability.” That is, once you shove some data into a queue, it will be stored reliably until it is processed and removed. Things can crash, processes can fail, and reboots can occur, but that data will survive.
There are dozens of queueing systems in the wild. Many have advanced features like replication, at-least-once delivery, and support for many consumers. We don’t need all that. We really just need a relatively durable ordered list of data. That sounds an awful lot like a simple log file to us! And you know what’s really good at writing log files to disk? NGINX!
Logging to the Rescue
To make our log file more “queue-like” and palatable for later processing, we want to format it as JSON. We want each line in the file to be one “performance payload” from a remote browser agent. Then we can rip through the log with our processor, line by line, and operate on the JSON.
Let’s imagine that we’re sending all of our performance data to an /ingest endpoint backed by NGINX. We want any data transmitted to that endpoint logged in a specific ingest.log file, separate from the normal NGINX logs.
Specifying a specific logfile for an endpoint
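A minimal sketch of what such a site configuration might look like. The listen port, the server_name, and the 204 response are assumptions for illustration; only the /ingest location and the ingest.log file name come from the text.

```nginx
server {
    listen 80;
    server_name ingest.example.com;   # hypothetical host name

    location /ingest {
        # Requests to this location are logged to their own file.
        # No format name is given, so NGINX uses its default "combined" format.
        access_log /var/log/nginx/ingest.log;

        # Placeholder response so the endpoint answers without a backend.
        return 204;
    }
}
```

Because access_log is set inside the location block, it overrides whatever access_log is inherited from the server or http level for these requests only.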
This is a good start: we now have a specific log file to write to. But we want a JSON-formatted log, not the default NGINX format! Fortunately, it’s as easy as telling NGINX exactly what format we want.
Creating a custom logging format in NGINX
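One way such a format could be declared is sketched below. The format name ingest_json and the choice of fields are hypothetical. The escape=json parameter (available since NGINX 1.11.8) is our assumption here: it escapes quotes and control characters in variable values so that each log line stays valid JSON.

```nginx
# log_format is only valid in the http context (typically nginx.conf),
# not inside a server or location block.
log_format ingest_json escape=json
    '{'
        '"time":"$time_iso8601",'
        '"remote_addr":"$remote_addr",'
        '"body":"$request_body"'
    '}';
```

With escape=json the request body is written as a JSON string, so a payload that is itself JSON has to be parsed a second time by the processor. Using escape=none (NGINX 1.13.10 and later) would embed the body verbatim instead, at the cost of a malformed payload being able to corrupt a log line.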
In the configuration above you can see we’re creating a very basic JSON structure. And we’re taking the raw request body and piping it into the log file as well. But there is an issue. The $request_body variable is not always populated: NGINX only fills it in when it actually reads the body, which happens in locations handled by directives such as proxy_pass.
There are a number of workarounds, but for now we’ve chosen to simply proxy_pass to a local endpoint. The other thing we need to do is tell NGINX to use our new custom log format! Both changes go in the site configuration file we modified earlier.
Updated NGINX site configuration
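A sketch of how the two changes might fit together. The format name ingest_json, the local port 8888, the /ingest_sink path, the host name, and the buffer size are all hypothetical values chosen for illustration, and the format itself is assumed to be declared with log_format in the http context.

```nginx
server {
    listen 80;
    server_name ingest.example.com;   # hypothetical host name

    location /ingest {
        # Second argument selects the custom format instead of "combined".
        access_log /var/log/nginx/ingest.log ingest_json;

        # $request_body is only set when the body is read into memory.
        # A body larger than this buffer is spooled to a temp file
        # and would be logged as empty.
        client_body_buffer_size 64k;

        # Proxying forces NGINX to read the request body.
        proxy_pass http://127.0.0.1:8888/ingest_sink;
    }
}

# Local-only endpoint that exists just to accept the proxied request.
server {
    listen 127.0.0.1:8888;
    access_log off;

    location /ingest_sink {
        return 204;
    }
}
```

The local server does no work of its own. Its only job is to give proxy_pass a destination so that the body is read and becomes available to the log format.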
Now we have a specific endpoint (/ingest) that is logging to a bespoke file (/var/log/nginx/ingest.log), in a format that will make later processing much easier. There are still some things we could do to improve this, like buffering log writes, but this will get us moving for now!
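For reference, buffered log writes could look something like the sketch below. The buffer size and flush interval are arbitrary example values, and ingest_json and the proxy target are the same hypothetical names as before, not settings taken from a real deployment.

```nginx
location /ingest {
    # Collect log lines in a 64k memory buffer and write them out when the
    # buffer fills up or at least every 5 seconds, whichever comes first.
    access_log /var/log/nginx/ingest.log ingest_json buffer=64k flush=5s;

    proxy_pass http://127.0.0.1:8888/ingest_sink;
}
```

The trade-off is durability: lines still sitting in the buffer are lost if NGINX or the machine crashes before the next flush, so a shorter flush interval keeps the log closer to the queue-like behavior described earlier.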