In this post we will cover the details and considerations that go into a successful deployment of a product or service leveraging the Redis Enterprise TimeSeries Module.
We will walk through:
- Identifying a problem area that TimeSeries can solve
- Laying out the path to success by:
  - Setting up the proper instance in Azure
  - Understanding the TimeSeries Module features
  - Key/data topology
  - Planning for data sets that are downsampled for quick aggregation
Where the TimeSeries Module Fits In
For this scenario we will look at replacing some long-standing processes that carry both fixed costs and ongoing development costs. We will leverage Node with redistimeseries-js, and we will stand up our Redis TimeSeries instance in Azure.
Success with Redis Enterprise TimeSeries Module
Getting started in Azure
Since the first post in this series, Redis Labs has fully launched Redis Enterprise in Azure!
- In the Azure portal, search for Azure Cache for Redis.
- Click + Add.
- Complete the initial screen with your details, and select the Enterprise plan that fits your immediate needs.
- Select the appropriate public or private endpoint.
- On the advanced screen we will select the modules that we need to use in this instance of Redis Enterprise.
- Review and create the instance!
- Once the instance has been created, configure RedisInsight so that you can see the new cluster.
Understanding the TimeSeries Module Features
The Redis Enterprise platform brings with it the standard expectation of a tried-and-true tool in any modern stack but with scale in mind. It is important to cover the features that can be leveraged and combined within the TimeSeries Module so early planning can take these into consideration.
There are a few key concepts that we need to expand on so that you can get the most out of the implementation. Let’s cover the concepts that need to be understood before ingesting data. (We will cover consumption in depth in the next post.)
Keys, the way to sink data into the instance. This is not a new concept, but keys carry attributes that let you fine-tune data policies.
- retention, give your data a TTL
- labels, give context and additional attributes to your data
- duplicate policy, determine what happens when two samples arrive with the same timestamp
- compression policy, choose whether samples are stored compressed (the default) or uncompressed
Rules, make the data you are storing reactive to data ingest.
- are applied to a specific key.
- are predefined aggregation procedures
- a destination key for the reactive data must be present
- a predefined aggregation option must be provided
- the time interval for the sampled data must be provided
Labels, the metadata option to give your data more context.
- Label/Value collection that represents broader context of the underlying data
- valuable at the digest/consumption layer to provide aggregation across keys by other similar characteristics
If migrating from another system or starting from the ground up there will be some data points that need to be mapped to appropriate key structures. Data points can be ingested as singular readings but are usually accompanied by additional data that gives a full picture of what just occurred. Data ingest could come in at extremely high sampling rates (health devices, IoT sensors, weather apparatus) for the same system providing the data point, and we need to plan on how to handle this at scale so that we can surface downsampled data concisely without impacting performance.
Let’s take a look at a sample log reading for a weather sensor.
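As an illustration (the field names and values below are hypothetical, since the exact log format will vary by sensor), a single reading might look like:

```json
{
  "deviceId": "sensor-01",
  "region": "northwest",
  "timestamp": 1609459200000,
  "temp": 41.2,
  "humidity": 87,
  "windSpeed": 6.4,
  "pressure": 1013.5
}
```

Here deviceId and region stay constant per sensor, while temp, humidity, windSpeed, and pressure vary with every reading.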
This sample log has many constant attributes, and some interesting data points that can vary with every reading. If the scientific equipment produces a sample each second, then over a period of 24 hours we will receive 86,400 readings. If we had 100 devices in a particular region providing realtime weather data, we could easily ingest close to 9M logs. To perform some level of analytics (either visually or with additional ETL/ML pipelines) we need to downsample the footprint.
Traditionally, one might sink this log reading into Cosmos or Mongo and aggregate on demand while handling the downsampling in code. This is problematic, does not scale well, and is destined to fail at some point. The better approach is to sink this data into Redis Enterprise using the TimeSeries Module. The first step in planning a successful TimeSeries implementation is flattening your logs by identifying both reading data and metadata.
With our sample log we can build a pattern for key definitions to sink readings into, define the metadata we will attach to the keys, and resolve a key path.
Let’s evaluate again with some context.
With this context we can outline the keys we will need to represent each attribute in our log. If it is a reading, it needs a key. We will leverage the identifier combined with the readings to create the following keys based on this pattern.
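One workable convention (illustrative, not prescriptive) is to combine the device identifier with each reading attribute:

```
[deviceId]:[attribute]

sensor-01:temp
sensor-01:humidity
sensor-01:windSpeed
sensor-01:pressure
```

Constant values like the device identifier and region become labels rather than keys, since they describe the series instead of varying with it.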
With this topology defined, we need to create keys for the attributes.
Before we create the keys, we need to identify the metadata that we are going to associate with them, as well as the TTL for series data and our duplicate policy. TTL defaults to 0 (keep the data forever), but in our case we want to keep the data for 30 days and take the last reading if there is contention on a series.
Based on our sample log we need to add the following labels to our keys.
Now let’s create the keys. You can always create the keys from the CLI in RedisInsight like so.
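For example, assuming a hypothetical sensor-01 device, the 30-day retention (2,592,000,000 ms), the LAST duplicate policy decided above, and illustrative label names, the temp key could be created like this:

```
TS.CREATE sensor-01:temp RETENTION 2592000000 DUPLICATE_POLICY LAST LABELS deviceId sensor-01 region northwest attribute temp
```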
This works, but to complete this for many sensors across many attributes we should lean on some code to streamline the generation. Using the redistimeseries-js library we can handle this programmatically.
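As a sketch (the key scheme, labels, and helper below are illustrative, and the exact client call signatures are in the redistimeseries-js docs), we can generate the TS.CREATE argument lists for every device/attribute pair and dispatch them through any Redis client that accepts raw commands:

```javascript
// 30 days of retention, expressed in milliseconds.
const RETENTION_MS = 30 * 24 * 60 * 60 * 1000;

// Build the raw TS.CREATE argument list for one device/attribute pair,
// applying the retention, duplicate policy, and labels decided above.
function buildCreateCommand(deviceId, attribute, labels = {}) {
  const args = [
    'TS.CREATE',
    `${deviceId}:${attribute}`,
    'RETENTION', String(RETENTION_MS),
    'DUPLICATE_POLICY', 'LAST',
    'LABELS', 'deviceId', deviceId, 'attribute', attribute,
  ];
  for (const [name, value] of Object.entries(labels)) {
    args.push(name, String(value));
  }
  return args;
}

// One command per reading attribute for a hypothetical sensor.
const attributes = ['temp', 'humidity', 'windSpeed', 'pressure'];
const createCommands = attributes.map((attr) =>
  buildCreateCommand('sensor-01', attr, { region: 'northwest' })
);
```

Each entry in createCommands can then be sent to the instance, so adding a new sensor or attribute becomes a one-line change rather than a hand-typed CLI session.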
We can validate that our key has been created by using the RedisInsight CLI command TS.INFO.
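For example (key name illustrative):

```
TS.INFO sensor-01:temp
```

The reply includes details such as totalSamples, retentionTime, duplicatePolicy, the labels we attached, and any rules bound to the key.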
Making Data Reactive with Rules
Aggregation out of the Redis Enterprise TimeSeries Module is very fast, but it can be optimized to provide even better performance if you know what kind of data summaries your data scientists, analysts, or systems need to make correct decisions. Using our single device that is producing real-time weather information, we have determined that there are a few intervals that will be extracted regularly.
This is where we can define rules that react to data as it is ingested, downsampling it with the built-in aggregation techniques to condense many raw logs into a real-time data point.
NOTE: there are a couple of things about rules that need to be understood upfront.
- rules need an aggregation type.
- rules need a time interval to retain samples for.
- rules need downstream keys to sink the reactive data into.
- rules will not resample destination key data if a rule is added after ingesting has begun; that is why we are defining these before ingesting data.
For this post we are going to focus on creating rules for the temp attribute. We can create our 3 destination keys and 3 rules from the CLI as well. We will encode the aggregation type, and the interval it is concerned with, into the destination key structure itself.
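A sketch of those CLI commands, using an hourly average plus a daily min and max (the key names and intervals here are illustrative choices, not the only sensible ones):

```
TS.CREATE sensor-01:temp:avg:1h
TS.CREATE sensor-01:temp:min:24h
TS.CREATE sensor-01:temp:max:24h

TS.CREATERULE sensor-01:temp sensor-01:temp:avg:1h AGGREGATION avg 3600000
TS.CREATERULE sensor-01:temp sensor-01:temp:min:24h AGGREGATION min 86400000
TS.CREATERULE sensor-01:temp sensor-01:temp:max:24h AGGREGATION max 86400000
```

The time buckets are given in milliseconds: 3,600,000 for one hour and 86,400,000 for one day.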
This can also be completed via code!
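Following the same raw-command approach as before (the key names, aggregation types, and intervals are illustrative), a small helper can emit the TS.CREATE/TS.CREATERULE pairs for each aggregation:

```javascript
const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

// For each aggregation, emit the destination-key TS.CREATE and the
// TS.CREATERULE that wires the source key to that destination.
function buildRuleCommands(sourceKey, aggregations) {
  const commands = [];
  for (const { type, bucketMs, suffix } of aggregations) {
    const destKey = `${sourceKey}:${type}:${suffix}`;
    commands.push(['TS.CREATE', destKey]);
    commands.push([
      'TS.CREATERULE', sourceKey, destKey,
      'AGGREGATION', type, String(bucketMs),
    ]);
  }
  return commands;
}

// Hourly average plus daily min/max for the temp series.
const ruleCommands = buildRuleCommands('sensor-01:temp', [
  { type: 'avg', bucketMs: HOUR_MS, suffix: '1h' },
  { type: 'min', bucketMs: DAY_MS, suffix: '24h' },
  { type: 'max', bucketMs: DAY_MS, suffix: '24h' },
]);
```

Because the destination keys are created before the rules reference them, this ordering satisfies the requirement above that the destination key must already exist.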
We can use TS.INFO to see our new rules just the same and now running it against our base key we will see the rules attached.
Now that we have all of our keys and rules created, we can look into ingesting some sample data.
For the purpose of this post, we will continue to focus on ingesting a single attribute from our sample log. As data readings come from the system, we would have an API on the edge that parses the log and updates the TimeSeries instance. We are going to simply ingest a few days’ worth of sample data for demo purposes around the temp attribute.
Using the base code and redistimeseries-js module that we set up earlier we can add some functionality to simulate this process. You can tweak the setInterval so that you can ingest different attributes or intervals. I ingested Jan 1 12AM through Jan 2 3:40AM with one minute interval logs.
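A minimal sketch of that simulation (the key name and value range are hypothetical; rather than a live setInterval timer, it precomputes one TS.ADD command per minute across the stated window):

```javascript
// Generate one TS.ADD per minute across the demo window, inclusive.
function buildIngestCommands(key, startMs, endMs, stepMs, makeValue) {
  const commands = [];
  for (let ts = startMs; ts <= endMs; ts += stepMs) {
    commands.push(['TS.ADD', key, String(ts), String(makeValue(ts))]);
  }
  return commands;
}

const MINUTE_MS = 60 * 1000;
const start = Date.UTC(2021, 0, 1, 0, 0);  // Jan 1, 12:00 AM UTC
const end = Date.UTC(2021, 0, 2, 3, 40);   // Jan 2, 3:40 AM UTC

// Hypothetical temp readings drifting around 40 degrees.
const ingestCommands = buildIngestCommands(
  'sensor-01:temp', start, end, MINUTE_MS,
  () => (40 + Math.random() * 5).toFixed(1)
);
```

That window produces 1,661 one-minute samples; sending them through the client gives the rules something to react to.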
Once ingest is complete, we can return to RedisInsight and use TS.INFO to retrieve our base key details, as well as take a peek at the destination keys that have the groomed data represented in real time!
A quick look at the base key’s sample count (highlighted) and the destination keys gives us good context. All the downsampled data is accurately represented in the time buckets we allocated.
Awesome! Hopefully at this point you have a grasp on the following concepts:
- Setting up an instance of Redis Enterprise in Azure with the TimeSeries Module
- Data Topology and flattening logs to linear attributes as keys
- Rules for reactive data and destination keys
- Using RedisInsight and some basic commands for checking series state (TS.INFO, TS.CREATE, TS.CREATERULE)
- How to programmatically interact with the TimeSeries instance to ingest and process data to scale
In the next post we will cover the exciting parts that really show off the power of extracting and working with the data in the Redis Enterprise TimeSeries instance. What to look forward to:
- Querying Data Out
- Exploring the destination key samples in detail
- Filtering with Labels
- API Code for composing complex objects across multiple keys
- Visualizing with Grafana