Efficiently Manage S3 Object Lifecycles with AWS CDK
Chapter 1: Introduction to AWS S3 and CDK
Amazon Simple Storage Service (S3) is a versatile cloud object storage service with a wide range of features. These include automatically triggering events and workflows when objects are created or deleted, and lifecycle rules that transition objects to cheaper storage classes, such as Glacier, or delete them once they are no longer needed.
The AWS Cloud Development Kit (CDK) lets developers define their infrastructure as code in familiar programming languages. CDK synthesizes the CloudFormation templates for you, so there is no need to manage resources by hand or write raw templates.
Background
In my current setup, I maintain an S3 bucket that stores the data dumps produced by a nightly data pipeline. The bucket has two top-level directories:
- raw: where the newly downloaded data is initially stored.
- processed: where data resides after it has completed the necessary processing steps.
It’s important to note that although the AWS console presents these as directories, they are actually prefixes in the S3 keys. For simplicity, I will refer to them as directories.
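To make the prefix idea concrete, here is a small sketch that derives console-style "directories" from a flat list of keys, much like listing a bucket with `/` as the delimiter. The key names and the helper functions `topLevelPrefix` and `groupByPrefix` are hypothetical, chosen to follow the article's date-based naming.

```typescript
// Hypothetical keys following the article's layout. S3 stores them in a flat
// key space; the "/" is just another character in each key.
const keys = [
  "raw/2024-03-01.raw",
  "raw/2024-03-02.raw",
  "processed/2024-03-01.json",
];

/** Returns everything up to and including the first "/", or "" if there is none. */
function topLevelPrefix(key: string): string {
  const slash = key.indexOf("/");
  return slash === -1 ? "" : key.slice(0, slash + 1);
}

/** Groups keys by their top-level prefix, similar to the folder view in the AWS console. */
function groupByPrefix(allKeys: string[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const key of allKeys) {
    const prefix = topLevelPrefix(key);
    const group = groups.get(prefix) ?? [];
    group.push(key);
    groups.set(prefix, group);
  }
  return groups;
}

console.log(groupByPrefix(keys));
```

The "directories" exist only as shared leading characters of the keys, which is why lifecycle rules later in this article target them by prefix.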
The pipeline is defined in CDK, and the bucket is declared as follows:
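import * as s3 from "aws-cdk-lib/aws-s3";

// Inside the pipeline stack's constructor, where `this` is the stack.
const dataBucket = new s3.Bucket(this, "dataBucket", {
  blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL, // no public access of any kind
  encryption: s3.BucketEncryption.S3_MANAGED, // SSE-S3 encryption at rest
});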
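The snippet above runs inside a stack's constructor, which is why it passes `this` as the scope. A minimal sketch of the surrounding stack and app entry point might look like the following; the class name `DataPipelineStack` is a hypothetical placeholder, not the author's actual code.

```typescript
import * as cdk from "aws-cdk-lib";
import * as s3 from "aws-cdk-lib/aws-s3";
import { Construct } from "constructs";

// Hypothetical stack name: the article only shows the bucket itself.
export class DataPipelineStack extends cdk.Stack {
  public readonly dataBucket: s3.Bucket;

  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Inside the constructor, `this` is the stack that owns the bucket.
    this.dataBucket = new s3.Bucket(this, "dataBucket", {
      blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
      encryption: s3.BucketEncryption.S3_MANAGED,
    });
  }
}

// Entry point: build the app and synthesize the CloudFormation template.
const app = new cdk.App();
new DataPipelineStack(app, "DataPipelineStack");
const assembly = app.synth();
```

Exposing the bucket as a public property is one common way to let other stacks or constructs in the pipeline reference it.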
Growing Data Challenges
As mentioned earlier, this pipeline runs nightly. Each night, new data is downloaded into the raw directory and named by date (e.g., 2024-03-01.raw). Over several months the bucket grew significantly, because it kept raw data that was no longer needed. S3 Standard storage is relatively cheap at $0.023 per GB per month in the us-east-1 region for the first 50 TB, but the accumulated data was steadily driving up my costs.
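To see how quickly retained dumps add up, here is a rough estimate using the quoted $0.023 per GB-month rate. The 5 GB dump size and the 180 nights are hypothetical numbers, not figures from the pipeline, and the calculation ignores request charges and the cheaper tiers above 50 TB.

```typescript
// Assumption: flat S3 Standard rate for the first 50 TB in us-east-1, as quoted
// above. Request, transfer, and higher-tier pricing are ignored.
const STANDARD_USD_PER_GB_MONTH = 0.023;

/** Stored GB after `nights` nightly dumps of `dumpGb` each, if nothing is deleted. */
function accumulatedGb(dumpGb: number, nights: number): number {
  return dumpGb * nights;
}

/** Approximate monthly storage cost in USD for `storedGb` gigabytes. */
function monthlyStorageCostUsd(storedGb: number): number {
  return storedGb * STANDARD_USD_PER_GB_MONTH;
}

// Hypothetical example: a 5 GB dump every night for 180 nights.
const stored = accumulatedGb(5, 180);
const cost = monthlyStorageCostUsd(stored);
console.log(`${stored} GB -> $${cost.toFixed(2)} per month`);
```

Without any cleanup the stored volume, and therefore the monthly bill, grows linearly with the number of nights, whereas a short retention window keeps it roughly constant.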
Implementing a Lifecycle Rule
To address this issue, I decided to add a lifecycle rule to my bucket. S3 offers a variety of lifecycle management options, but my requirement was straightforward: I wanted objects to expire and be deleted automatically three days after they are created.
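import * as cdk from "aws-cdk-lib"; // provides cdk.Duration

const dataBucket = new s3.Bucket(this, "dataBucket", {
  blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
  encryption: s3.BucketEncryption.S3_MANAGED,
  lifecycleRules: [
    {
      // Expire (delete) every object in the bucket three days after creation.
      expiration: cdk.Duration.days(3),
    },
  ],
});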
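"Three days" is not measured to the minute. According to the S3 documentation, S3 computes the expiration date by adding the rule's number of days to the object's creation time and rounding up to the next midnight UTC; the actual deletion then happens asynchronously, and the documentation notes that you are not charged for storage of an object once it has expired. The hypothetical helper below sketches that date calculation.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

/**
 * Approximate time at which an object becomes eligible for expiration:
 * creation time + `days`, rounded up to the following midnight UTC.
 * Assumption: a result landing exactly on midnight is pushed to the next day.
 */
function expirationEligibleAt(createdAt: Date, days: number): Date {
  const shifted = createdAt.getTime() + days * MS_PER_DAY;
  // Epoch milliseconds divide evenly into UTC days, so flooring gives midnight UTC.
  const startOfDay = Math.floor(shifted / MS_PER_DAY) * MS_PER_DAY;
  return new Date(startOfDay + MS_PER_DAY);
}

// Hypothetical nightly dump written at 02:15 UTC on 2024-03-01.
const created = new Date("2024-03-01T02:15:00Z");
console.log(expirationEligibleAt(created, 3).toISOString()); // "2024-03-05T00:00:00.000Z"
```

In practice each dump therefore lingers for a bit more than three days, which is worth keeping in mind when choosing the retention period.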
Narrowing the Scope of the Lifecycle Rule
However, this first version of the rule introduced a bug. The files in the raw directory were only needed for a couple of days, but some items in the processed directory had to be kept longer. Because the rule applied to the whole bucket, it deleted those processed files as well, and the application that consumes the processed data started failing with errors about missing files.
To resolve this, I narrowed the rule's scope by specifying a key prefix, keeping in mind that "directories" are not actual entities in S3:
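const dataBucket = new s3.Bucket(this, "dataBucket", {
  blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
  encryption: s3.BucketEncryption.S3_MANAGED,
  lifecycleRules: [
    {
      expiration: cdk.Duration.days(3),
      // Only keys starting with "raw/" are affected; the trailing slash keeps
      // the rule from also matching keys such as "raw-archive/...".
      prefix: "raw/",
    },
  ],
});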
Now, the bucket will continue to automatically delete data in the raw directory while leaving my processed data intact.
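A lifecycle prefix is compared against the beginning of each object key, nothing more. The sketch below (with hypothetical keys and a hypothetical `ruleApplies` helper) shows why a prefix of `raw` without a trailing slash would also catch keys such as `raw-archive/...` or `rawdata.csv`, whereas `raw/` limits the rule to the raw directory.

```typescript
/** A lifecycle prefix filter is a plain "starts with" test on the object key. */
function ruleApplies(key: string, prefix: string): boolean {
  return key.startsWith(prefix);
}

// Hypothetical keys: one nightly raw dump, one processed file, and two keys
// that merely start with the letters "raw".
const keys = [
  "raw/2024-03-01.raw",
  "processed/2024-03-01.json",
  "raw-archive/2023-12-31.raw",
  "rawdata.csv",
];

for (const prefix of ["raw", "raw/"]) {
  const matched = keys.filter((k) => ruleApplies(k, prefix));
  console.log(prefix, "->", matched);
}
// "raw"  matches raw/2024-03-01.raw, raw-archive/2023-12-31.raw, and rawdata.csv
// "raw/" matches only raw/2024-03-01.raw
```

Ending the prefix with the delimiter is a cheap safeguard against a future top-level key that happens to share the same leading characters.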
For further details, I highly recommend reviewing the official documentation for the LifecycleRule interface in CDK, which offers various options to effectively manage your data and potentially reduce costs.
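As an illustration of those options, the sketch below combines the article's rule with two hypothetical ones: processed data moves to Glacier after 30 days instead of being deleted, and leftover parts of incomplete multipart uploads are removed after one day. The stack name, rule IDs, and day counts are assumptions chosen for the example, not settings from the author's pipeline.

```typescript
import * as cdk from "aws-cdk-lib";
import * as s3 from "aws-cdk-lib/aws-s3";

const app = new cdk.App();
// Hypothetical stack used only to host the bucket for this sketch.
const stack = new cdk.Stack(app, "LifecycleSketchStack");

new s3.Bucket(stack, "dataBucket", {
  blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
  encryption: s3.BucketEncryption.S3_MANAGED,
  lifecycleRules: [
    {
      // The article's rule: raw nightly dumps expire after three days.
      id: "expire-raw",
      prefix: "raw/",
      expiration: cdk.Duration.days(3),
    },
    {
      // Hypothetical: archive processed data to Glacier instead of deleting it.
      id: "archive-processed",
      prefix: "processed/",
      transitions: [
        {
          storageClass: s3.StorageClass.GLACIER,
          transitionAfter: cdk.Duration.days(30),
        },
      ],
    },
    {
      // Hypothetical: clean up parts left behind by failed multipart uploads.
      id: "abort-incomplete-uploads",
      abortIncompleteMultipartUploadAfter: cdk.Duration.days(1),
    },
  ],
});

// Synthesize so the resulting LifecycleConfiguration can be inspected.
const assembly = app.synth();
```

Giving each rule an `id` makes the rules easier to recognize in the S3 console and in CloudFormation diffs.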
Other CDK Resources
Here are additional resources that might prove beneficial when working with AWS CDK:
- The official AWS Reference for documentation on all constructs.
- Construct Hub for open-source constructs.
- My articles detailing solutions I’ve implemented using CDK.
I enjoy sharing insights on software development, project management, and my experiences with AWS Cloud. If you’re interested in more content, consider following me on Medium, Twitter, or LinkedIn.
Chapter 2: Video Tutorials on Managing S3 Lifecycle
This video tutorial provides a comprehensive guide on implementing AWS S3 lifecycle rules, detailing how to delete files from S3 after a specified number of days.
Explore seven distinct methods for deleting objects from AWS S3, including practical guidance on when and how to utilize each approach.