AWS Made Easy

Tip #44: AWS S3: Top 5 performance and cost questions

Learn about the major AWS S3 use cases and the top five questions to ask about performance and cost.

Amazon S3 is a managed object storage service designed to store a virtually unlimited amount of data.

Unlike file storage, AWS S3: 

  • does not allow appending data to an object; each object has to be rewritten in full
  • does not provide hierarchical (tree) storage; data is organized in S3 buckets with a single, flat level of objects, although directories can be emulated with key prefixes.
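
To illustrate how prefixes emulate directories, here is a minimal sketch in plain Python that does not call S3. It mimics the way a listing with a prefix and a delimiter splits a flat set of keys into "files" and "subdirectories". The function name `list_directory` and the sample keys are assumptions made for illustration.

```python
def list_directory(keys, prefix="", delimiter="/"):
    """Emulate a directory listing over a flat collection of object keys.

    Returns (files, subdirs): keys that sit directly under the prefix, and
    the distinct "subdirectory" prefixes found one level below it.
    """
    files, subdirs = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Another delimiter follows, so the key lives in a "subdirectory".
            subdirs.add(prefix + head + delimiter)
        else:
            files.append(key)
    return sorted(files), sorted(subdirs)


# Hypothetical object keys; the bucket itself stores them as a flat list.
keys = [
    "logs/2024/app.log",
    "logs/2024/db.log",
    "logs/2023/app.log",
    "logs/readme.txt",
    "data/a.parquet",
]

files, subdirs = list_directory(keys, prefix="logs/")
print(files)
print(subdirs)
```

The keys never form a real tree: "logs/2024/" exists only because some keys happen to start with that string.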

In general, AWS S3 is used for these use cases:

  1. Data lakes
  2. Backup and restore
  3. Data archives
  4. Running cloud-native applications

To make the most of the AWS S3 service, consider the following five questions.

Question 1: How long do you need to keep old data?

Use case: Data lakes

Machine learning and artificial intelligence models are generally better calibrated with larger data sets, but for some models older data sets stop being useful once the underlying patterns have changed.

Identify the datasets that are no longer useful after a certain number of years, mark them with an S3 tag or group them under a prefix, and set up an S3 lifecycle rule that expires (deletes) the objects matching that prefix or tag after that number of years.

Use case: Backup and restore, Data archives

You very rarely need to access old backups or archives, though compliance requirements may oblige you to keep them for a certain number of years.

Set up an S3 lifecycle rule to expire (delete) all backups and archives older than the required number of years.

If you use incremental backups, ensure that all files belonging to the same incremental backup series expire together.

Use case: Running cloud-native applications

The logs of cloud-native applications should expire (be deleted) after a certain number of months.
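
As a rough sketch of such expiration rules, the Python below builds a lifecycle configuration as a plain dictionary in the shape the S3 lifecycle API expects (`Rules`, `Filter`, `Expiration`). The helper name `expiration_rule`, the rule IDs, the prefix, the tag, and the retention periods are all assumptions for illustration. Nothing is sent to AWS.

```python
import json


def expiration_rule(rule_id, years, prefix=None, tag=None):
    """Build one lifecycle rule that expires objects selected by prefix OR tag."""
    if (prefix is None) == (tag is None):
        raise ValueError("pass exactly one of prefix or tag")
    if prefix is not None:
        rule_filter = {"Prefix": prefix}
    else:
        key, value = tag
        rule_filter = {"Tag": {"Key": key, "Value": value}}
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": rule_filter,
        # Lifecycle expiration is expressed in days, not years.
        "Expiration": {"Days": years * 365},
    }


# Hypothetical rules: one selected by prefix, one selected by tag.
lifecycle = {
    "Rules": [
        expiration_rule("expire-old-clickstream", years=3,
                        prefix="datasets/clickstream/"),
        expiration_rule("expire-tagged-backups", years=7,
                        tag=("retention", "7y")),
    ]
}

print(json.dumps(lifecycle, indent=2))
```

With boto3, a dictionary of this shape can be passed as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`. Note that this call replaces the bucket's existing lifecycle configuration, so all rules for the bucket need to be included in one document.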

Question 2: What data should you store in AWS S3?

Use case: Data lakes

Use compressed binary files rather than CSV files whenever possible; we recommend compressed Parquet files.

Ensure you do not store duplicate files.

Use case: Backup and restore

Use incremental backups whenever possible. Use compression and do not back up duplicate files.

Use case: Data archives

Use compression whenever possible and do not archive duplicate files.

Use case: Running cloud-native applications

Set up AWS CloudFront to serve the static files from S3.
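
One possible way to catch duplicate files before uploading them is to compare content hashes locally. The sketch below uses only the Python standard library. The helper names `sha256_of` and `find_duplicates` and the sample file names are assumptions, and this is not a built-in S3 feature.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Return groups of paths whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for path in paths:
        by_digest[sha256_of(path)].append(Path(path))
    return sorted(sorted(group) for group in by_digest.values() if len(group) > 1)


# Demo with three hypothetical files, two of which have the same content.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.parquet").write_bytes(b"same bytes")
    (root / "b.parquet").write_bytes(b"same bytes")
    (root / "c.parquet").write_bytes(b"different bytes")
    duplicates = [[p.name for p in group]
                  for group in find_duplicates(root.iterdir())]

print(duplicates)
```

Only one file from each reported group needs to be uploaded.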

Question 3: What S3 storage class should you use?

Use case: Data lakes

Data lakes usually have unpredictable access patterns.

Use S3 Intelligent-Tiering to optimize the storage costs.

Use case: Backup and restore, Data archives

The choice depends on how quickly you need the files to be restored.

In general, the longer the retrieval time, the lower the storage price.

Use one of:
  • Glacier Instant Retrieval
  • Glacier Flexible Retrieval
  • Glacier Deep Archive

Use case: Running cloud-native applications

For production files, use Intelligent-Tiering.

For logs, use S3 Standard for up to 7 days, and then let a lifecycle rule transition the logs to Glacier Flexible Retrieval. If you prefer S3 One Zone-Infrequent Access, note that a lifecycle rule can transition objects to an Infrequent Access class only after they have spent at least 30 days in S3 Standard.
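
A log transition rule along these lines might look like the following sketch. It builds the rule as a plain dictionary and checks locally that a transition to an Infrequent Access class happens after at least 30 days, which is the minimum S3 enforces for lifecycle transitions out of S3 Standard. The helper name `transition_rule`, the rule ID, and the `logs/` prefix are assumptions. `GLACIER` is the API name of the Glacier Flexible Retrieval storage class.

```python
# Storage classes that lifecycle rules can target only after 30 days.
INFREQUENT_ACCESS = {"STANDARD_IA", "ONEZONE_IA"}


def transition_rule(rule_id, prefix, days, storage_class):
    """Build a lifecycle rule that moves objects under a prefix to another class."""
    if storage_class in INFREQUENT_ACCESS and days < 30:
        raise ValueError(
            f"{storage_class} transitions need at least 30 days, got {days}"
        )
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "Transitions": [{"Days": days, "StorageClass": storage_class}],
    }


# Hypothetical rule: logs move to Glacier Flexible Retrieval after 7 days.
logs_rule = transition_rule("archive-logs", "logs/", 7, "GLACIER")
print(logs_rule)
```

Calling `transition_rule("logs-to-ia", "logs/", 7, "ONEZONE_IA")` raises a `ValueError` instead of producing a rule that S3 would reject.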

Question 4: What AWS region should you use for the S3 bucket?

Use case: Data lakes

Use the AWS region where the majority of the applications reside. For global use, consider S3 Multi-Region Access Points.

Use case: Backup and restore, Data archives

Use the AWS region of the source application to minimize the data transfer costs. In addition, consider the AWS region with the lowest storage costs.

Use case: Running cloud-native applications

Use the AWS region where the majority of the applications reside. For global use, consider S3 Multi-Region Access Points.

Question 5: Do you need to enable object versioning?

Use case: Data lakes

Yes. However, use a lifecycle rule to limit the number of versions you keep of every file, and transition older versions first to Standard-Infrequent Access and then to a Glacier storage class.

Use case: Backup and restore

No.

Use case: Data archives

Yes. Transition older versions first to Standard-Infrequent Access and then to a Glacier storage class.

Use case: Running cloud-native applications

No.
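
For a versioned bucket, a lifecycle rule covering older (noncurrent) versions could be sketched as below. The helper name `versioning_rule`, the rule ID, and all day counts are assumptions chosen for illustration. The field names follow the S3 lifecycle API, where `NewerNoncurrentVersions` is the number of noncurrent versions to retain.

```python
def versioning_rule(rule_id, keep_versions, ia_after_days=30,
                    glacier_after_days=90, expire_after_days=365):
    """Build a lifecycle rule for noncurrent versions of every object in a bucket."""
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # an empty prefix matches all objects
        "NoncurrentVersionTransitions": [
            # Older versions go to Standard-Infrequent Access first...
            {"NoncurrentDays": ia_after_days, "StorageClass": "STANDARD_IA"},
            # ...and later to Glacier Flexible Retrieval.
            {"NoncurrentDays": glacier_after_days, "StorageClass": "GLACIER"},
        ],
        "NoncurrentVersionExpiration": {
            # A version is deleted once it has been noncurrent this long AND
            # more than keep_versions newer noncurrent versions exist.
            "NoncurrentDays": expire_after_days,
            "NewerNoncurrentVersions": keep_versions,
        },
    }


# Hypothetical rule that retains up to three noncurrent versions per object.
rule = versioning_rule("limit-old-versions", keep_versions=3)
print(rule)
```

The rule only affects noncurrent versions; the current version of each object stays in its original storage class.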