
Manage Your Amazon S3 Objects with Amazon S3 Metadata!


Written by Minhyeok Cha


How do you manage your Amazon S3 objects? Do you search for them directly in the console? Use the CLI or an SDK? Or do you rely on AWS Glue crawlers? Recently, my company noticed that our S3 costs were gradually piling up, so I started looking for ways to reduce them.


Initially, I thought, "We can just move unused data to Glacier, and that’s it." However, managing the massive amount of data accumulated over about six years in a single bucket turned out to be a bit tricky. That’s when I noticed the "table bucket" feature and thought, “Why not give the relatively new S3 Metadata a try?” Fortunately, it worked out well, and I’d like to share my experience.


 


 

What is Amazon S3 Metadata?

[Image: Amazon S3 Metadata overview (Source: AWS)]

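You can find an introduction to Amazon S3 Metadata in an article I wrote previously, "A Summary of Key Announcements from AWS re:Invent in 10 Minutes." In short, S3 Metadata automatically captures metadata about the objects in a general purpose bucket, such as the object key, size, storage class, and last-modified time, and stores it in a read-only Apache Iceberg table in an S3 table bucket, where you can query it with SQL.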

In that article, I mentioned that S3 Metadata can be integrated with the AWS Glue Data Catalog. In this post, however, I’ll use AWS Lake Formation instead. I had originally planned to use AWS Glue crawlers, but decided to experiment with table buckets and Amazon S3 Metadata, which became generally available earlier this year.


 

What is AWS Lake Formation?

So, what exactly is AWS Lake Formation? AWS Lake Formation simplifies and automates the complex, time-consuming tasks involved in building a data lake, such as collecting, cleaning, moving, and cataloging data, and securing access to it for analytics and machine learning.


It also provides its own permissions model, which augments the AWS Identity and Access Management (IAM) permissions model.

This centralized model enables fine-grained access control over the data lake through a simple grant/revoke mechanism, and permissions can be applied at the table and column levels for every dataset in the data lake. Services integrated with this permissions model include AWS Glue, Amazon Athena, Amazon Redshift Spectrum, and Amazon QuickSight. Our primary goal, however, is to query S3 object metadata without crawling, so we’ll use Lake Formation mainly as the connection between the table bucket and Athena.


 

Demo

Since my company account has restricted permissions, this demo will be conducted using a test account.


💡 At the time of writing, table buckets and Amazon S3 Metadata were available only in a limited set of Regions, such as US East (Ohio) and US East (N. Virginia).

Step 1: Create an S3 Table Bucket

[Screenshot: creating an S3 table bucket]
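If you prefer the SDK to the console, Step 1 might look like the minimal sketch below, assuming boto3 is installed and AWS credentials are configured. It uses the `create_table_bucket` call of boto3's `s3tables` client. The name check encodes only the basic table bucket naming rules as I understand them (3 to 63 characters; lowercase letters, digits, and hyphens; starting and ending with a letter or digit), so AWS may still reject a name that passes it. The helper names and the bucket name `my-metadata-table-bucket` are hypothetical.

```python
import re

# Basic table bucket naming rules as I understand them (not exhaustive):
# 3-63 characters; lowercase letters, digits, and hyphens only;
# must start and end with a letter or digit.
_TABLE_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def is_plausible_table_bucket_name(name: str) -> bool:
    """Return True if `name` passes the basic naming checks above."""
    return _TABLE_BUCKET_NAME.fullmatch(name) is not None


def create_table_bucket(name: str, region: str = "us-east-1", client=None) -> str:
    """Create an S3 table bucket and return its ARN.

    Pass `client` to inject a stub; otherwise a boto3 `s3tables`
    client is created for `region`.
    """
    if not is_plausible_table_bucket_name(name):
        raise ValueError(f"invalid table bucket name: {name!r}")
    if client is None:
        import boto3  # lazy import so the name check works without boto3

        client = boto3.client("s3tables", region_name=region)
    response = client.create_table_bucket(name=name)
    return response["arn"]
```

For example, `create_table_bucket("my-metadata-table-bucket", region="us-east-2")` would create the bucket in US East (Ohio) and return its ARN, which the next step needs.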

Step 2: Create a Metadata Configuration for the Test Bucket

[Screenshot: creating the metadata configuration for the test bucket]

That completes the connection between S3 and the table bucket.
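The same connection can be sketched with the S3 `CreateBucketMetadataTableConfiguration` API, exposed in boto3 as `create_bucket_metadata_table_configuration`. This is a minimal sketch assuming boto3 and configured credentials; as I understand it, the general purpose bucket and the table bucket must be in the same Region and account. The helper names and the table name in the example are hypothetical.

```python
def metadata_table_configuration(table_bucket_arn: str, table_name: str) -> dict:
    """Build the MetadataTableConfiguration argument for the S3 API."""
    return {
        "S3TablesDestination": {
            "TableBucketArn": table_bucket_arn,
            "TableName": table_name,
        }
    }


def enable_s3_metadata(bucket: str, table_bucket_arn: str, table_name: str,
                       region: str = "us-east-1", client=None) -> dict:
    """Ask S3 to record `bucket`'s object metadata in `table_name`.

    Pass `client` to inject a stub; otherwise a boto3 `s3` client is
    created for `region` (assumed to be the bucket's Region).
    """
    if client is None:
        import boto3  # lazy import: only needed for the real API call

        client = boto3.client("s3", region_name=region)
    return client.create_bucket_metadata_table_configuration(
        Bucket=bucket,
        MetadataTableConfiguration=metadata_table_configuration(
            table_bucket_arn, table_name
        ),
    )
```

For example, `enable_s3_metadata("my-test-bucket", table_bucket_arn, "my_metadata_table")` would point a hypothetical bucket `my-test-bucket` at the table bucket created in Step 1.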


Step 3: Check with Amazon Athena

[Screenshot: checking Amazon Athena]

However, if you open Athena before the table has been cataloged, nothing shows up: Athena needs the table to be registered in the AWS Glue Data Catalog first. Fortunately, Lake Formation now offers an S3 Tables integration that registers table buckets in the catalog automatically, which makes the setup much smoother.


Step 4: Enable S3 Table Integration in AWS Lake Formation

[Screenshot: enabling the S3 Table integration in AWS Lake Formation]

When enabling the integration, specify an IAM role that Lake Formation can assume and that has permission to access your S3 table buckets.


[Screenshot: specifying the IAM role for the integration]
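The role can also be created with boto3's IAM client (`create_role` and `put_role_policy`). Treat this as a sketch under loud assumptions: the trust policy's `sts:SetSourceIdentity` and `sts:SetContext` actions and the broad `s3tables:*` permission reflect my understanding of what the integration needs, not a verified minimum, so compare them with the AWS documentation before use. The role name and policy name are hypothetical.

```python
import json

# Hypothetical role name; choose your own.
ROLE_NAME = "S3TablesRoleForLakeFormation"


def trust_policy() -> dict:
    """Trust policy that lets Lake Formation assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "lakeformation.amazonaws.com"},
                # Assumption: the S3 Tables integration also uses
                # SetSourceIdentity and SetContext besides AssumeRole.
                "Action": ["sts:AssumeRole", "sts:SetSourceIdentity", "sts:SetContext"],
            }
        ],
    }


def table_bucket_access_policy(region: str, account_id: str) -> dict:
    """Broad S3 Tables permissions for every table bucket in one account/Region."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3tables:*",  # deliberately broad; narrow it for production
                "Resource": f"arn:aws:s3tables:{region}:{account_id}:bucket/*",
            },
            {
                # Assumption: listing table buckets needs a wildcard resource.
                "Effect": "Allow",
                "Action": "s3tables:ListTableBuckets",
                "Resource": "*",
            },
        ],
    }


def create_integration_role(region: str, account_id: str, client=None) -> str:
    """Create the role, attach the inline policy, and return the role ARN."""
    if client is None:
        import boto3  # lazy import: only needed for the real API calls

        client = boto3.client("iam")
    role = client.create_role(
        RoleName=ROLE_NAME,
        AssumeRolePolicyDocument=json.dumps(trust_policy()),
    )
    client.put_role_policy(
        RoleName=ROLE_NAME,
        PolicyName="S3TablesAccess",  # hypothetical inline policy name
        PolicyDocument=json.dumps(table_bucket_access_policy(region, account_id)),
    )
    return role["Role"]["Arn"]
```

The returned ARN is what you would select as the role when enabling the integration.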

Once the integration succeeds, the s3tablescatalog catalog appears as shown below. Open the catalog and configure its permissions.


[Screenshot: the integrated catalog in AWS Lake Formation]

In the Permissions section, click Grant to continue.


[Screenshots: granting permissions in AWS Lake Formation]
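The Grant step can be sketched with boto3's Lake Formation `grant_permissions` call. In this minimal sketch, the table bucket is addressed through a catalog ID of the form `<account-id>:s3tablescatalog/<table-bucket-name>`, and the namespace is passed as the database name. I am assuming the S3 Metadata table lives in the `aws_s3_metadata` namespace, and the principal ARN, bucket, and table names in the example are hypothetical.

```python
def table_grant_request(principal_arn: str, account_id: str, table_bucket: str,
                        namespace: str, table: str,
                        permissions=("SELECT", "DESCRIBE")) -> dict:
    """Build GrantPermissions arguments for a table in an S3 table bucket."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "Table": {
                # Integrated table buckets appear under the s3tablescatalog catalog.
                "CatalogId": f"{account_id}:s3tablescatalog/{table_bucket}",
                "DatabaseName": namespace,  # the table bucket namespace
                "Name": table,
            }
        },
        "Permissions": list(permissions),
    }


def grant_table_access(principal_arn: str, account_id: str, table_bucket: str,
                       namespace: str, table: str,
                       region: str = "us-east-1", client=None) -> dict:
    """Grant read access (SELECT, DESCRIBE) on one table to one principal."""
    if client is None:
        import boto3  # lazy import: only needed for the real API call

        client = boto3.client("lakeformation", region_name=region)
    return client.grant_permissions(
        **table_grant_request(principal_arn, account_id, table_bucket, namespace, table)
    )
```

Granting only `SELECT` and `DESCRIBE` keeps the analyst role read-only, which matches the read-only nature of the metadata table.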

Once the permissions are granted, go back to Athena and check that the metadata table appears as expected.


Step 5: Successful Amazon Athena Query!

[Screenshot: successful Amazon Athena query results]

The table appeared without running an AWS Glue crawler, and the query executed successfully.
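To run the same kind of query outside the console, a minimal boto3 sketch could start an Athena query with the table bucket as its catalog, poll until it finishes, and read the rows through the `get_query_results` paginator. It assumes the `s3tablescatalog/<table-bucket-name>` catalog naming, the `aws_s3_metadata` namespace, and an S3 location for query results (for example a hypothetical `s3://my-athena-results/`); all names are illustrative.

```python
import time


def athena_context(table_bucket: str, namespace: str = "aws_s3_metadata") -> dict:
    """QueryExecutionContext pointing Athena at a table bucket namespace."""
    return {"Catalog": f"s3tablescatalog/{table_bucket}", "Database": namespace}


def run_athena_query(sql: str, table_bucket: str, output_location: str,
                     namespace: str = "aws_s3_metadata", client=None,
                     poll_seconds: float = 1.0) -> list:
    """Run `sql` and return all rows as lists of strings (header row first)."""
    if client is None:
        import boto3  # lazy import: only needed for the real API calls

        client = boto3.client("athena")
    query_id = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext=athena_context(table_bucket, namespace),
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]
        state = status["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        reason = status.get("StateChangeReason", "")
        raise RuntimeError(f"query {query_id} ended in state {state}: {reason}")

    # For SELECT queries, the first row returned is the column header row.
    rows = []
    for page in client.get_paginator("get_query_results").paginate(QueryExecutionId=query_id):
        for row in page["ResultSet"]["Rows"]:
            rows.append([col.get("VarCharValue") for col in row["Data"]])
    return rows
```

For example, `run_athena_query('SELECT key, size, storage_class FROM "my_metadata_table" LIMIT 10', "my-metadata-table-bucket", "s3://my-athena-results/")` would return up to ten metadata rows plus the header.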


 

S3 Cost Optimization Strategy

The optimization process was straightforward. I created queries as shown below, downloaded the result as a CSV, and used the CLI to move the objects identified by the query to the Glacier storage class.

[Screenshot: cost optimization query in Amazon Athena]
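The query itself appears only in the screenshot, so the following is a comparable sketch rather than the exact query used here. It assumes the metadata table exposes the `bucket`, `key`, `size`, `last_modified_date`, `record_type`, and `storage_class` columns and contains the objects you want to evaluate. It selects `CREATE` records for `STANDARD` objects older than N days, reads the downloaded Athena CSV, and moves each object to Glacier with an in-place `copy_object` that changes only the storage class (boto3 here, where the post used the CLI). The helper names and table name are hypothetical.

```python
import csv
import io


def stale_standard_objects_sql(table: str, older_than_days: int) -> str:
    """Athena (Trino) SQL listing STANDARD objects created more than N days ago."""
    if older_than_days <= 0:
        raise ValueError("older_than_days must be positive")
    return (
        f'SELECT bucket, key, size, last_modified_date FROM "{table}" '
        "WHERE record_type = 'CREATE' AND storage_class = 'STANDARD' "
        f"AND last_modified_date < current_timestamp - INTERVAL '{older_than_days}' DAY"
    )


def objects_from_csv(csv_text: str) -> list[tuple[str, str]]:
    """Extract (bucket, key) pairs from an Athena result CSV with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["bucket"], row["key"]) for row in reader]


def move_to_glacier(objects, client=None, storage_class: str = "GLACIER") -> int:
    """Copy each object onto itself with a new storage class; return the count."""
    if client is None:
        import boto3  # lazy import: only needed for the real API calls

        client = boto3.client("s3")
    moved = 0
    for bucket, key in objects:
        # Metadata is preserved because MetadataDirective defaults to COPY.
        client.copy_object(
            Bucket=bucket,
            Key=key,
            CopySource={"Bucket": bucket, "Key": key},
            StorageClass=storage_class,
        )
        moved += 1
    return moved
```

Two caveats: a single `copy_object` call handles objects up to 5 GB, and this simple query does not exclude objects deleted after their `CREATE` record, so copying those will fail with `NoSuchKey`. In a versioned bucket, the copy also creates a new current version while the previous `STANDARD` version remains and keeps incurring charges.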

S3 Lifecycle Management

[Screenshot: S3 Lifecycle management]

Following that, I configured S3 Lifecycle policies to automatically move data to Glacier over time.
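A lifecycle rule of this kind can be sketched with boto3's `put_bucket_lifecycle_configuration`. This minimal sketch transitions objects under an optional prefix to the `GLACIER` storage class (S3 Glacier Flexible Retrieval) after a given number of days; the rule ID, prefix, and day count are hypothetical. Note that this call replaces the bucket's entire existing lifecycle configuration, so merge in any rules you already have.

```python
def glacier_lifecycle_configuration(days: int, prefix: str = "",
                                    rule_id: str = "move-to-glacier") -> dict:
    """One enabled rule that transitions matching objects to GLACIER after `days`."""
    if days < 0:
        raise ValueError("days must be non-negative")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},  # empty prefix = whole bucket
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


def apply_glacier_lifecycle(bucket: str, days: int, prefix: str = "", client=None):
    """Overwrite `bucket`'s lifecycle configuration with the rule above."""
    if client is None:
        import boto3  # lazy import: only needed for the real API call

        client = boto3.client("s3")
    return client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=glacier_lifecycle_configuration(days, prefix),
    )
```

For example, `apply_glacier_lifecycle("my-test-bucket", 90, prefix="logs/")` would move objects under `logs/` to Glacier 90 days after creation.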


 

Conclusion

I had been wanting to try these new AWS features and finally got around to it in March 2025. I had heard about S3 cost optimization countless times, but doing it myself instead of relying on consultants felt refreshing.


If you haven’t actively managed your S3 buckets before, this approach is definitely worth considering. It’s simpler than setting up Glue crawlers, which I found particularly appealing, although I did find the AWS Lake Formation setup a bit tricky at first. Still, if you need to get a handle on the data in your buckets, give it a try.


Note: At the time of writing, S3 table buckets can be deleted only through the AWS CLI or an SDK, not the console, so keep that in mind.
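With the SDK, a table bucket must be empty before it can be deleted, so a cleanup script deletes its tables, then its namespaces, then the bucket. The sketch below assumes boto3's `s3tables` client with `list_tables`, `list_namespaces`, `delete_table`, `delete_namespace`, and `delete_table_bucket`, and that list results return each namespace as a one-element list of strings. It also assumes you have already removed the general purpose bucket's metadata configuration (for example with the S3 `DeleteBucketMetadataTableConfiguration` API) so nothing is still writing to the metadata table. `collect_pages` is a hypothetical helper.

```python
def collect_pages(call, item_key: str, **kwargs) -> list:
    """Call a paginated S3 Tables list API until no continuationToken is returned."""
    items, token = [], None
    while True:
        params = dict(kwargs)
        if token:
            params["continuationToken"] = token
        page = call(**params)
        items.extend(page.get(item_key, []))
        token = page.get("continuationToken")
        if not token:
            return items


def delete_table_bucket(table_bucket_arn: str, client=None) -> None:
    """Empty and delete a table bucket: tables, then namespaces, then the bucket."""
    if client is None:
        import boto3  # lazy import: only needed for the real API calls

        # ARN format: arn:aws:s3tables:<region>:<account-id>:bucket/<name>
        region = table_bucket_arn.split(":")[3]
        client = boto3.client("s3tables", region_name=region)

    # Collect everything first so deletions don't disturb pagination.
    tables = collect_pages(client.list_tables, "tables", tableBucketARN=table_bucket_arn)
    for table in tables:
        client.delete_table(
            tableBucketARN=table_bucket_arn,
            namespace=table["namespace"][0],  # assumed one-element list
            name=table["name"],
        )

    namespaces = collect_pages(client.list_namespaces, "namespaces", tableBucketARN=table_bucket_arn)
    for ns in namespaces:
        client.delete_namespace(tableBucketARN=table_bucket_arn, namespace=ns["namespace"][0])

    client.delete_table_bucket(tableBucketARN=table_bucket_arn)
```

Because this permanently deletes every table in the bucket, double-check the ARN before running it.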

 
 
 
