Efficiently Manage S3 Object Lifecycles with AWS CDK

Chapter 1: Introduction to AWS S3 and CDK

Amazon's Simple Storage Service (S3) is a versatile cloud storage service with a wide range of features. These include event notifications that trigger workflows when objects are added or removed, and lifecycle rules that transition objects to cheaper storage classes, such as Glacier.

The AWS Cloud Development Kit (CDK) lets developers describe their infrastructure as code in familiar programming languages, instead of managing it by hand or writing raw CloudFormation templates.

Background

In my current setup, I maintain an S3 bucket that serves as a repository for a data dump generated through a nightly data pipeline. This bucket consists of two primary directories:

  • raw: where the newly downloaded data is initially stored.
  • processed: where data resides after it has completed the necessary processing steps.

It’s important to note that although the AWS console presents these as directories, they are actually prefixes in the S3 keys. For simplicity, I will refer to them as directories.
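To make the point about prefixes concrete, here is a minimal sketch in plain TypeScript with no AWS calls. The key names and the `topLevelPrefixes` helper are made up for illustration. It shows how a "directory" view can be derived from flat keys by grouping on the text before the first `/`:

```typescript
// Hypothetical object keys; S3 stores each one as a single flat string.
const exampleKeys: string[] = [
  "raw/2024-03-01.raw",
  "raw/2024-03-02.raw",
  "processed/2024-03-01.out",
];

// Collect the distinct text up to and including the first delimiter.
// Keys without a delimiter sit at the top level and are skipped.
function topLevelPrefixes(objectKeys: string[], delimiter: string = "/"): string[] {
  const prefixes: string[] = [];
  for (const key of objectKeys) {
    const cut = key.indexOf(delimiter);
    if (cut === -1) {
      continue;
    }
    const prefix = key.slice(0, cut + delimiter.length);
    if (prefixes.indexOf(prefix) === -1) {
      prefixes.push(prefix);
    }
  }
  return prefixes.sort();
}

console.log(topLevelPrefixes(exampleKeys).join(", ")); // processed/, raw/
```

Nothing named `raw/` exists on its own in this model. The "directory" appears only because some keys happen to start with that string.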

The pipeline is defined with CDK, and that includes the definition of my bucket:

import * as s3 from "aws-cdk-lib/aws-s3";

const dataBucket = new s3.Bucket(this, "dataBucket", {
  // Keep the bucket private and encrypt objects with S3-managed keys.
  blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
  encryption: s3.BucketEncryption.S3_MANAGED,
});
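The snippet uses `this` as the construct scope, so it has to sit inside a stack or construct class. The article does not show that wrapper, so the following is only a sketch of what it might look like. The class name `DataPipelineStack` and the stack id are my own placeholders:

```typescript
import * as cdk from "aws-cdk-lib";
import * as s3 from "aws-cdk-lib/aws-s3";
import { Construct } from "constructs";

// Hypothetical enclosing stack; the name is an assumption, not from the article.
export class DataPipelineStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Same bucket definition as above; `this` is the stack instance.
    new s3.Bucket(this, "dataBucket", {
      blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
      encryption: s3.BucketEncryption.S3_MANAGED,
    });
  }
}

// Entry point: instantiate the app and add the stack to it.
const app = new cdk.App();
new DataPipelineStack(app, "DataPipelineStack");
```

Running `cdk synth` against an app like this should produce the CloudFormation template for the bucket without deploying anything.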

Growing Data Challenges

As mentioned earlier, this pipeline runs nightly. Each night, new data is downloaded into the raw directory and named by date (e.g., 2024-03-01.raw). Over several months the bucket grew significantly, holding a large amount of raw data that was no longer needed. S3 Standard storage is relatively affordable at $0.023 per GB per month in the us-east-1 region for the first 50 TB, but the accumulated data was still driving my costs up.
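To get a feel for how retention drives the bill, here is a rough back-of-the-envelope sketch. The price is the S3 Standard figure quoted above. The 20 GB nightly dump size is purely an assumption for illustration, not a number from my pipeline:

```typescript
// S3 Standard price quoted in the text (us-east-1, first 50 TB), per GB per month.
const PRICE_PER_GB_MONTH = 0.023;

// Assumed for illustration only: the size of one nightly raw dump in GB.
const ASSUMED_NIGHTLY_GB = 20;

// Monthly storage cost when `nightsRetained` dumps are kept in the bucket.
function monthlyCostUsd(nightsRetained: number, nightlyGb: number = ASSUMED_NIGHTLY_GB): number {
  return nightsRetained * nightlyGb * PRICE_PER_GB_MONTH;
}

console.log(monthlyCostUsd(180).toFixed(2)); // 82.80 (about six months of dumps)
console.log(monthlyCostUsd(3).toFixed(2));   // 1.38 (a three-day window)
```

The estimate ignores request and transfer charges. The point is only that the cost scales linearly with the number of nights retained.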

Implementing a Lifecycle Rule

To address this issue, I decided to implement a lifecycle rule for my bucket. S3 offers a variety of lifecycle management options, but my requirement was straightforward: I wanted to automatically expire and delete items after three days.

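import * as cdk from "aws-cdk-lib";
import * as s3 from "aws-cdk-lib/aws-s3";

const dataBucket = new s3.Bucket(this, "dataBucket", {
  blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
  encryption: s3.BucketEncryption.S3_MANAGED,
  lifecycleRules: [
    {
      // Expire (delete) every object in the bucket three days after creation.
      expiration: cdk.Duration.days(3),
    },
  ],
});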
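One detail worth knowing is that "three days" is not exact to the minute. As I understand the S3 documentation, the expiration time is the creation time plus the configured number of days, rounded up to the next midnight UTC, and the actual removal can lag a little behind that. The helper below is a hypothetical sketch of that calculation, not an AWS API:

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Estimate when an object becomes eligible for expiration:
// creation time + `days`, rounded up to the next midnight UTC.
// Unix timestamps that are multiples of one day fall on midnight UTC.
function estimateExpiryEligibility(createdAt: Date, days: number): Date {
  const unrounded = createdAt.getTime() + days * MS_PER_DAY;
  const rounded = Math.ceil(unrounded / MS_PER_DAY) * MS_PER_DAY;
  return new Date(rounded);
}

// Hypothetical upload time for the 2024-03-01 raw file.
const uploadedAt = new Date("2024-03-01T02:15:00Z");
console.log(estimateExpiryEligibility(uploadedAt, 3).toISOString());
// 2024-03-05T00:00:00.000Z
```

So a file uploaded early on March 1 can stay around until the start of March 5, which is worth keeping in mind when picking the number of days.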

Narrowing the Scope of the Lifecycle Rule

However, my first version of the lifecycle rule introduced a bug. Without a filter, the rule applied to every object in the bucket. Files in the raw directory are only needed for a couple of days, but some items in the processed directory need to be kept longer. The application that consumes the processed data started failing because the lifecycle rule had deleted files it expected to find.

To resolve this, I simply adjusted the scope of my rule by specifying the prefix of the key, keeping in mind that "directories" are not actual entities in S3:

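import * as cdk from "aws-cdk-lib";
import * as s3 from "aws-cdk-lib/aws-s3";

const dataBucket = new s3.Bucket(this, "dataBucket", {
  blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
  encryption: s3.BucketEncryption.S3_MANAGED,
  lifecycleRules: [
    {
      expiration: cdk.Duration.days(3),
      // Only keys that start with this prefix are expired. The trailing
      // slash keeps the rule from matching other keys that begin with "raw".
      prefix: "raw/",
    },
  ],
});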

Now, the bucket will continue to automatically delete data in the raw directory while leaving my processed data intact.
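Because S3 compares a rule's prefix against the start of each key as a plain string, the exact prefix value matters. The sketch below simulates that matching locally. The keys are invented, including a `raw-archive/` key that exists only to show the edge case, and `keysMatchingRule` is a hypothetical helper:

```typescript
// Hypothetical keys; "raw-archive/" is invented to show the edge case.
const sampleKeys: string[] = [
  "raw/2024-03-01.raw",
  "raw-archive/2023.tar",
  "processed/2024-03-01.out",
];

// Return the keys a lifecycle rule with this prefix would apply to.
function keysMatchingRule(objectKeys: string[], rulePrefix: string): string[] {
  return objectKeys.filter(
    (key) => key.slice(0, rulePrefix.length) === rulePrefix
  );
}

console.log(keysMatchingRule(sampleKeys, "raw").join(", "));
// raw/2024-03-01.raw, raw-archive/2023.tar
console.log(keysMatchingRule(sampleKeys, "raw/").join(", "));
// raw/2024-03-01.raw
```

With only the raw and processed prefixes in the bucket, both values behave the same. The trailing slash becomes important as soon as another prefix starting with "raw" appears.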

For further details, I highly recommend reviewing the official documentation for the LifecycleRule interface in CDK, which offers various options to effectively manage your data and potentially reduce costs.
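As a taste of those other options, here is a hedged sketch of two additional rules. I am not using them in my pipeline. The rule ids, the 90-day and 7-day values, and the `addExtraLifecycleRules` helper are all assumptions for illustration. Archiving processed data to Glacier only makes sense if the consuming application can tolerate retrieval delays:

```typescript
import * as cdk from "aws-cdk-lib";
import * as s3 from "aws-cdk-lib/aws-s3";

// Hypothetical helper that adds extra rules to an existing bucket construct.
export function addExtraLifecycleRules(bucket: s3.Bucket): void {
  // Move older processed data to a cheaper storage class instead of deleting it.
  bucket.addLifecycleRule({
    id: "archive-processed",
    prefix: "processed/",
    transitions: [
      {
        storageClass: s3.StorageClass.GLACIER,
        transitionAfter: cdk.Duration.days(90), // assumed value
      },
    ],
  });

  // Clean up multipart uploads that were started but never completed.
  bucket.addLifecycleRule({
    id: "abort-stale-uploads",
    abortIncompleteMultipartUploadAfter: cdk.Duration.days(7), // assumed value
  });
}
```

The same rule objects could instead be listed in the `lifecycleRules` array when the bucket is created.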

Other CDK Resources

Here are additional resources that might prove beneficial when working with AWS CDK:

  • The official AWS CDK API Reference, which documents every construct.
  • Construct Hub for open-source constructs.
  • My articles detailing solutions I’ve implemented using CDK.

I enjoy sharing insights on software development, project management, and my experiences with AWS Cloud. If you’re interested in more content, consider following me on Medium, Twitter, or LinkedIn.
