Amazon Kinesis Data Streams is a fully managed, serverless data streaming service designed to decouple data producers from data consumers. Decoupling lets each system scale up or down independently, with the stream acting as a buffer between systems that operate at different speeds.
Data streams are used in many business and gaming applications, where data is produced continuously and then analyzed by AWS or other services. Kinesis then distributes that data, at scale, to the applications that read from the stream.
Developers can depend on Kinesis for common requirements that previously required manual coding. Out of the box, Kinesis delivers:
- High availability
- High durability
- High read concurrency
- Data retention
Further, developers can enable other common data stream requirements by configuring Kinesis:
- High throughput with low latency
- Fan out to multiple consumers
- Ordered consumption of data (ordering is preserved within a shard, not across shards)
- Replay data for downstream failures
Kinesis is commonly paired with other AWS big data services, including Amazon S3, Amazon Redshift, and Amazon DynamoDB, which store or analyze the data after it has been processed.
Data Streams and Shards
Data streams are made up of “shards”. A shard is a unit of capacity that provides 1 MB/second of write and 2 MB/second of read throughput. Total capacity of a given stream is the sum of the capacities of its shards. Some other useful information to remember:
- All data is stored for 24 hours by default
- Retention can be extended up to 365 days using the IncreaseStreamRetentionPeriod operation
- Shards can be split or merged to scale a stream up or down
- Data can be replayed at any point within the retention window (24 hours by default)
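The capacity rule above (total stream capacity is the sum of its shards' capacities) can be sketched in a few lines. This is a minimal illustration using only the per-shard figures quoted in the text; the function names `stream_capacity` and `shards_for_throughput` are hypothetical helpers, not part of any AWS SDK.

```python
import math

# Per-shard limits quoted in the text above.
WRITE_MB_PER_SEC_PER_SHARD = 1
READ_MB_PER_SEC_PER_SHARD = 2


def stream_capacity(shard_count: int) -> dict:
    """Total stream capacity is the sum of its shards' capacities."""
    return {
        "write_mb_per_sec": shard_count * WRITE_MB_PER_SEC_PER_SHARD,
        "read_mb_per_sec": shard_count * READ_MB_PER_SEC_PER_SHARD,
    }


def shards_for_throughput(write_mb_per_sec: float, read_mb_per_sec: float) -> int:
    """Smallest shard count that covers both the write and the read rate."""
    by_write = math.ceil(write_mb_per_sec / WRITE_MB_PER_SEC_PER_SHARD)
    by_read = math.ceil(read_mb_per_sec / READ_MB_PER_SEC_PER_SHARD)
    # A stream always has at least one shard.
    return max(1, by_write, by_read)
```

For example, a workload writing 3.5 MB/second and reading 5 MB/second needs four shards, because the write side is the tighter constraint.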
Amazon Kinesis Capacity Modes
Similar to DynamoDB, AWS offers two capacity modes for Amazon Kinesis Data Streams: the On-Demand mode and the Provisioned mode.
The On-Demand mode is more expensive, but it is the best choice for unknown or unpredictable workloads, including:
- Unknown application traffic, including spiky or variable usage
- Business-critical streams where the risk of throttling is unacceptable
- Low application usage, where creating provisioned streams is not worth the effort
The Provisioned mode requires you to provision the number of shards necessary to satisfy your app’s write and read request rate. Provisioned mode is dramatically cheaper than On-Demand mode and is ideal for predictable application traffic.
You can read more about selecting between data stream capacity modes in the AWS documentation.
Comparing Costs between Data Stream Capacity Modes
In Provisioned Mode, you are charged for:
- Volume of data written to a shard ($0.014 per 1,000,000 payload units – each is 25KB)
- Per-hour charge for each shard ($0.015 per shard/hour)
One shard running at maximum utilization processes about 30 TB of data per year, which would cost roughly $149.44 ($0.015 * 24 * 365 + $0.014 * 30 * 1024 * 1024 * 1024 / (25 * 1,000,000)).
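The provisioned-mode arithmetic can be reproduced with a short sketch. The prices are the ones listed above and may differ by region or change over time; the function name `provisioned_annual_cost` is hypothetical, and the calculation assumes every record fills its 25 KB payload units exactly (smaller records are billed as a full unit, which would raise the cost).

```python
SHARD_HOUR_USD = 0.015           # per shard per hour, from the list above
PUT_UNITS_PER_MILLION_USD = 0.014  # per 1,000,000 payload units
PAYLOAD_UNIT_KB = 25             # size of one payload unit
HOURS_PER_YEAR = 24 * 365


def provisioned_annual_cost(shards: int, tb_per_year: float) -> float:
    """Yearly cost in USD for a provisioned stream (sketch, see assumptions)."""
    shard_hours_cost = shards * HOURS_PER_YEAR * SHARD_HOUR_USD
    kb_written = tb_per_year * 1024 ** 3           # TB -> KB
    payload_units = kb_written / PAYLOAD_UNIT_KB   # assumes fully packed units
    put_cost = payload_units / 1_000_000 * PUT_UNITS_PER_MILLION_USD
    return shard_hours_cost + put_cost


# One shard, 30 TB per year: about 131.40 for shard hours plus about 18.04 for writes.
cost = provisioned_annual_cost(shards=1, tb_per_year=30)
```

Most of the bill comes from the per-hour shard charge, which is why idle provisioned shards are the main thing to avoid.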
In On-Demand Mode, you are charged for:
- The volume of data ingested ($0.08 per GB)
- The volume of data retrieved ($0.04 per GB)
- The per-hour charge for each on-demand data stream ($0.04 per hour)
Processing 30 TB of data per year in a single on-demand stream, with each record ingested and retrieved once, would cost $4,036.80 ($0.08 * 30 * 1024 + $0.04 * 30 * 1024 + $0.04 * 24 * 365).
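The on-demand figure can be checked the same way. This sketch uses the three on-demand prices listed above (which may vary by region or over time) and assumes the data is retrieved exactly once; the function name `on_demand_annual_cost` is hypothetical.

```python
INGEST_USD_PER_GB = 0.08      # data ingested
RETRIEVE_USD_PER_GB = 0.04    # data retrieved
STREAM_HOUR_USD = 0.04        # per on-demand stream per hour
HOURS_PER_YEAR = 24 * 365


def on_demand_annual_cost(tb_ingested: float, tb_retrieved: float, streams: int = 1) -> float:
    """Yearly cost in USD for on-demand streams (sketch, see assumptions)."""
    ingest_cost = tb_ingested * 1024 * INGEST_USD_PER_GB        # TB -> GB
    retrieve_cost = tb_retrieved * 1024 * RETRIEVE_USD_PER_GB   # TB -> GB
    stream_cost = streams * HOURS_PER_YEAR * STREAM_HOUR_USD
    return ingest_cost + retrieve_cost + stream_cost


# 30 TB in, 30 TB out, one stream: 2457.60 + 1228.80 + 350.40
cost = on_demand_annual_cost(tb_ingested=30, tb_retrieved=30)
```

Unlike the provisioned case, the bill here is dominated by data volume rather than by hours, so every additional consumer that re-reads the stream adds to the retrieval line.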
To do your own cost estimates, use the AWS Pricing Calculator for Kinesis Data Streams.
Optimizing Shards
Selecting the right number of shards is critical: over-provisioning leads to unnecessary costs, while under-provisioning results in throttled data processing. In practice, data rarely arrives at a constant rate, so the ability to dynamically adjust the number of shards is essential.
Unfortunately, AWS does not provide a way to automatically set the optimal number of shards.
How to manually enable Autoscaling for Amazon Kinesis Data Stream Modes
To create our own autoscaling for optimized Kinesis pricing, we will use a basic formula that measures demand and responds dynamically to data stream capacity needs.
- Set up two CloudWatch Email Alarms per Kinesis stream; one Alarm to determine if the Kinesis stream needs to be scaled up, and one to determine if capacity should be scaled down.
- Each Alarm calculates the optimal number of shards for a Kinesis stream based on the current value of IncomingRecords, IncomingBytes, and GetRecords.Bytes metrics.
- The alarm then compares this number to the current number of shards, and the Alarm is triggered if the number is above/below a specified threshold:
- When incoming data exceeds 80% of the total bandwidth of the stream's shards for one minute, we recommend scaling up.
- When incoming data stays below 60% of the total bandwidth of the stream's shards for 30 minutes, we recommend scaling down.
- To determine the optimal number of shards for a Kinesis stream, use the following equations:
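As a rough illustration of the steps above, the sketch below estimates shard demand from the three CloudWatch metrics and applies the 80% and 60% thresholds. It is not necessarily the calculation the alarms in this article use: it assumes each metric is the per-minute sum, that a shard accepts 1 MB/second and 1,000 records/second of writes and serves 2 MB/second of reads, and that demand is the largest of the three ratios. The names `shard_demand` and `scaling_action` are hypothetical.

```python
# Assumed per-shard limits (1 MB taken as 1024 * 1024 bytes).
WRITE_BYTES_PER_SEC = 1024 * 1024
WRITE_RECORDS_PER_SEC = 1000
READ_BYTES_PER_SEC = 2 * 1024 * 1024


def shard_demand(incoming_bytes, incoming_records, get_records_bytes, period_seconds=60):
    """Shards' worth of capacity consumed in one period (may be fractional)."""
    return max(
        incoming_bytes / (period_seconds * WRITE_BYTES_PER_SEC),
        incoming_records / (period_seconds * WRITE_RECORDS_PER_SEC),
        get_records_bytes / (period_seconds * READ_BYTES_PER_SEC),
    )


def scaling_action(demand_by_minute, current_shards,
                   scale_up_at=0.8, scale_down_at=0.6, scale_down_minutes=30):
    """Return 'scale_up', 'scale_down', or 'none'.

    demand_by_minute holds one shard_demand value per minute, most recent last.
    """
    if not demand_by_minute:
        return "none"
    utilization = [d / current_shards for d in demand_by_minute]
    # Scale up as soon as the latest minute exceeds the upper threshold.
    if utilization[-1] > scale_up_at:
        return "scale_up"
    # Scale down only after a full window below the lower threshold.
    recent = utilization[-scale_down_minutes:]
    if len(recent) == scale_down_minutes and all(u < scale_down_at for u in recent):
        return "scale_down"
    return "none"
```

The asymmetric windows (one minute to scale up, 30 minutes to scale down) mirror the thresholds above: reacting quickly to load avoids throttling, while waiting before shrinking avoids flapping between shard counts.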
For most applications, we recommend a combination of provisioned and on-demand modes.
For example, if your data streams have predictable demand during working hours, you can use a simple Lambda function to scale the stream to a set capacity in provisioned mode. Then, when demand is lower and less predictable at night, switch to on-demand mode.
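A minimal sketch of such a Lambda function follows. It assumes the function runs on a schedule at the start and end of the working day, that the working hours (08:00 to 18:00 UTC here) and the `STREAM_ARN` environment variable are placeholders you would set yourself, and that the function's role is allowed to call the Kinesis `UpdateStreamMode` API. The helper names `desired_mode` and `switch_mode` are hypothetical.

```python
from datetime import datetime, timezone


def desired_mode(hour_utc: int, work_start: int = 8, work_end: int = 18) -> str:
    """Provisioned during the (assumed) working hours, on-demand otherwise."""
    return "PROVISIONED" if work_start <= hour_utc < work_end else "ON_DEMAND"


def switch_mode(kinesis_client, stream_arn: str, mode: str) -> None:
    """Ask Kinesis to change the stream's capacity mode."""
    kinesis_client.update_stream_mode(
        StreamARN=stream_arn,
        StreamModeDetails={"StreamMode": mode},
    )


def handler(event, context):
    """Lambda entry point, meant to be invoked on a schedule."""
    import os
    import boto3  # imported here so the helpers above can be tested without it

    mode = desired_mode(datetime.now(timezone.utc).hour)
    switch_mode(boto3.client("kinesis"), os.environ["STREAM_ARN"], mode)
    return {"mode": mode}
```

A production version would likely check the stream's current mode before calling `update_stream_mode`, and would follow a switch to provisioned mode with a shard-count update to reach the intended capacity. AWS also limits how often a stream can change capacity modes within 24 hours, so a twice-daily schedule leaves no room for extra switches.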