The Frugal Approach to AWS S3 Storage Costs
Craig Conboy

With usage-billed storage services like S3, cost optimization starts with your code. Applications don't just store data; they spend money. Every putObject call, every file upload, every byte stored contributes to your bill.

Taking an application- and code-centric approach to cost reduction means understanding what your code stores before and after each operation. Here's a practical walk-through of what to look for and how to optimize S3 costs at the code level.

Cost Trap | Efficiency Pattern
Storing uncompressed compressible data | Compress before upload
Small objects in IA/Glacier storage classes | Consolidate small objects into larger files
Short-lived objects in minimum-duration classes | Avoid IA/Glacier for temporary data
Inefficient bucket listing operations | Use prefix-based listing and caching
Inefficient request patterns | Optimize request patterns (reduce unnecessary operations)
Frequent cross-region transfers | Process data in the same region where it's stored
Scanning entire objects for queries | Use S3 Select to query in place

Attribute the Costs

Start with your bill. Break down costs by bucket, application, and object prefix to understand where spend concentrates. S3 charges across three dimensions: storage volume (~$0.023/GB-month for Standard), requests (~$0.005 per 1,000 PUTs, ~$0.0004 per 1,000 GETs), and data transfer/retrieval (~$0.09/GB egress, ~$0.01-$0.03/GB for IA/Glacier retrieval). Your bill reveals whether you're burning budget on storage volume, excessive request operations, or data transfer costs.

Attribute costs down to specific usage patterns in your codebase. Which buckets store uncompressed data? Which applications generate millions of small objects? Which workloads drive cross-region transfers? This granular attribution reveals which of the 7 efficiency patterns below deliver the highest impact. Once you can attach a price tag to specific storage decisions, optimization priorities become clear.
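
For a first pass at this kind of attribution, here is a rough sketch that sums stored bytes per top-level prefix with boto3 and prices them at the assumed Standard rate. The bucket name is a placeholder, and for very large buckets S3 Inventory or Storage Lens is a better source than listing, since the listing itself generates LIST requests:

import boto3
from collections import defaultdict

def storage_by_prefix(bucket, price_per_gb_month=0.023):
    """Rough attribution sketch: estimate Standard storage spend per top-level prefix.

    Only covers storage volume; requests, transfer, and other storage
    classes still need to come from the bill itself.
    """
    s3 = boto3.client('s3')
    bytes_by_prefix = defaultdict(int)

    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get('Contents', []):
            prefix = obj['Key'].split('/', 1)[0]  # top-level "folder"
            bytes_by_prefix[prefix] += obj['Size']

    for prefix, size in sorted(bytes_by_prefix.items(), key=lambda kv: -kv[1]):
        gb = size / (1024 ** 3)
        print(f"{prefix}: {gb:.2f} GB, ~${gb * price_per_gb_month:.2f}/month")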

Note: AWS platform features like lifecycle policies, intelligent tiering, and storage class analysis complement code-level optimizations. The patterns below focus on what your application code can control.
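
One of those platform features is still within reach of application code: you can request Intelligent-Tiering per object at upload time and let S3 move it between access tiers automatically. A minimal sketch; the bucket, key, and file name are placeholders:

import boto3

s3 = boto3.client('s3')

# Let S3 manage tiering for this object automatically.
# Intelligent-Tiering adds a small per-object monitoring charge, and
# objects under 128 KB are not auto-tiered, so it suits larger,
# long-lived objects with unpredictable access patterns.
with open('dataset.parquet', 'rb') as f:
    s3.put_object(
        Bucket='my-bucket',
        Key='archives/dataset.parquet',
        Body=f,
        StorageClass='INTELLIGENT_TIERING'
    )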


Tackling Storage Volume Costs

The most direct levers for storage volume are in your application code:

Compress data before upload.

Consolidate small objects into larger files.

Avoid minimum duration penalties.

Compress before upload

Cost impact: Storage

Text-based data (logs, JSON, CSV, XML) typically compresses 5-10x. Storing uncompressed data is leaving money on the table.

import gzip
import json
import boto3

s3 = boto3.client('s3')

# Bad: upload uncompressed JSON (costs 10x more)
data = json.dumps(large_dataset)
s3.put_object(
    Bucket='my-bucket',
    Key='data.json',
    Body=data
)

# Good: compress before upload
data = json.dumps(large_dataset).encode('utf-8')
compressed = gzip.compress(data)
s3.put_object(
    Bucket='my-bucket',
    Key='data.json.gz',
    Body=compressed,
    ContentEncoding='gzip',
    ContentType='application/json'
)

For log files, always compress:

const zlib = require('zlib');
const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');

const s3 = new S3Client({});

async function uploadLogs(logData) {
    // Compress log data
    const compressed = zlib.gzipSync(Buffer.from(logData));

    await s3.send(new PutObjectCommand({
        Bucket: 'my-logs',
        Key: `logs/${Date.now()}.log.gz`,
        Body: compressed,
        ContentEncoding: 'gzip'
    }));
}

Benefit: 80-90% reduction in storage costs and transfer costs for compressible data.
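
Reading the data back is the mirror image: the object comes back exactly as stored, so download and decompress it explicitly. A quick sketch against the data.json.gz object uploaded above:

import gzip
import json
import boto3

s3 = boto3.client('s3')

# Download the compressed object and decompress in memory
obj = s3.get_object(Bucket='my-bucket', Key='data.json.gz')
large_dataset = json.loads(gzip.decompress(obj['Body'].read()))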

Consolidate small objects

Cost impact: Storage and Request

Storing millions of small objects is expensive for two reasons: S3 charges per request, and storage classes like Standard-IA and Glacier Instant Retrieval have a minimum billable object size (128 KB). A 10 KB object in Standard-IA is billed as 128 KB, 12.8x its actual size, which more than cancels out IA's lower per-GB price. Consolidating small objects into larger files avoids both excessive PUT costs and minimum size penalties.

import gzip
import json
import boto3
from datetime import datetime

s3 = boto3.client('s3')

# Bad: one object per record (millions of PUTs, minimum size penalties)
def save_record(record):
    s3.put_object(
        Bucket='my-bucket',
        Key=f"records/{record['id']}.json",
        Body=json.dumps(record)
    )

# Good: batch records into larger files
class RecordBatcher:
    def __init__(self, batch_size=1000):
        self.batch = []
        self.batch_size = batch_size

    def add_record(self, record):
        self.batch.append(record)

        if len(self.batch) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.batch:
            return

        # Combine 1,000 records into one file
        timestamp = datetime.utcnow().isoformat()
        key = f"records/batch-{timestamp}.json.gz"

        data = '\n'.join(json.dumps(r) for r in self.batch)
        compressed = gzip.compress(data.encode('utf-8'))

        s3.put_object(
            Bucket='my-bucket',
            Key=key,
            Body=compressed,
            ContentEncoding='gzip'
        )

        self.batch = []
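
For illustration, a hypothetical ingest loop around the batcher; the trailing flush() writes out whatever remains in the final, partial batch (incoming_records is a placeholder for your record source):

batcher = RecordBatcher(batch_size=1000)

for record in incoming_records():  # placeholder: your record/event source
    batcher.add_record(record)

batcher.flush()  # don't drop the last partial batch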

Benefit: Reduces PUT operations by 100-1000x; avoids minimum size penalties; reduces costs by 90%+ for small object workloads.

Avoid minimum duration penalties

Cost impact: Storage

Infrequent Access and Glacier storage classes have minimum storage durations: 30 days for Standard-IA and One Zone-IA, 90 days for Glacier Instant Retrieval and Flexible Retrieval, and 180 days for Glacier Deep Archive. Deleting objects early still bills them as if they had been stored for the full minimum duration.

import boto3
from datetime import datetime

s3 = boto3.client('s3')

# Bad: upload short-lived data to Standard-IA (30-day minimum)
# Object lives 5 days, but you pay for 30 days
def upload_weekly_report(data):
    s3.put_object(
        Bucket='my-bucket',
        Key=f"reports/{datetime.now().isoformat()}.pdf",
        Body=data,
        StorageClass='STANDARD_IA'  # 30-day minimum duration
    )
    # Report is deleted after 7 days → pay for 30 days

# Good: use Standard storage for short-lived data
def upload_weekly_report(data):
    s3.put_object(
        Bucket='my-bucket',
        Key=f"reports/weekly/{datetime.now().isoformat()}.pdf",
        Body=data,
        StorageClass='STANDARD'  # No minimum duration
    )
    # Report deleted after 7 days → pay for 7 days

Lifecycle rules should transition objects only when they'll stay in the target class:

# If objects are deleted after 60 days:
# - Standard for first 30 days
# - Transition to IA after 30 days (objects stay 30+ days in IA)
# - Delete after 60 days total
# This avoids early deletion fees
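
The same schedule expressed as a lifecycle rule through boto3, as a sketch; the bucket name and the data/ prefix are placeholders:

import boto3

s3 = boto3.client('s3')

# Transition to Standard-IA only once objects will sit out IA's 30-day
# minimum before expiring at day 60, avoiding early deletion fees.
s3.put_bucket_lifecycle_configuration(
    Bucket='my-bucket',
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'standard-to-ia-at-30-expire-at-60',
            'Filter': {'Prefix': 'data/'},
            'Status': 'Enabled',
            'Transitions': [{'Days': 30, 'StorageClass': 'STANDARD_IA'}],
            'Expiration': {'Days': 60},
        }]
    }
)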

Benefit: Avoids early deletion charges; ensures cheaper storage classes actually save money.


Tackling Request Costs

S3 charges per operation. High-frequency small operations drive costs quickly.

Consolidate objects (as shown above)

Optimize listing operations with prefixes and caching

Optimize request patterns to reduce unnecessary operations

Optimize bucket listing

Cost impact: Request

Listing large buckets without prefixes generates expensive LIST operations:

import boto3

s3 = boto3.client('s3')

# Bad: list entire bucket (10,000 LIST calls for 10M objects)
def get_all_objects():
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket='my-bucket'):
        for obj in page.get('Contents', []):
            process(obj)

# Good: use prefix-based listing
def get_objects_by_date(year, month, day):
    prefix = f"data/year={year}/month={month:02d}/day={day:02d}/"

    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket='my-bucket', Prefix=prefix):
        for obj in page.get('Contents', []):
            process(obj)

Structure your keys hierarchically to enable prefix-based queries:

# Good key structure
key = f"logs/year={year}/month={month:02d}/day={day:02d}/hour={hour:02d}/{uuid}.log.gz"

# This enables efficient prefix queries:
# - All logs for a day: prefix="logs/year=2024/month=01/day=15/"
# - All logs for an hour: prefix="logs/year=2024/month=01/day=15/hour=14/"

Cache listing results when appropriate:

import boto3
from datetime import datetime, timedelta

class S3Lister:
    def __init__(self):
        self.cache = {}
        self.cache_ttl = timedelta(minutes=5)

    def list_with_cache(self, bucket, prefix):
        cache_key = f"{bucket}:{prefix}"

        # Check cache
        if cache_key in self.cache:
            cached_time, cached_result = self.cache[cache_key]
            if datetime.now() - cached_time < self.cache_ttl:
                return cached_result

        # Fetch from S3
        result = self._list_objects(bucket, prefix)
        self.cache[cache_key] = (datetime.now(), result)
        return result

    def _list_objects(self, bucket, prefix):
        s3 = boto3.client('s3')
        paginator = s3.get_paginator('list_objects_v2')
        objects = []

        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            objects.extend(page.get('Contents', []))

        return objects

Benefit: Reduces LIST operations by 90-99%; improves application performance.

Optimize request patterns

Cost impact: Request

Reduce unnecessary operations in application code:

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')

# Bad: check if object exists before every upload (2 requests)
# (head_object signals a missing key via ClientError with a 404 code)
def upload_if_not_exists(key, data):
    try:
        s3.head_object(Bucket='my-bucket', Key=key)
        return  # Already exists
    except ClientError as e:
        if e.response['Error']['Code'] != '404':
            raise
        s3.put_object(Bucket='my-bucket', Key=key, Body=data)

# Good: just upload (1 request, S3 handles overwrites)
def upload(key, data):
    s3.put_object(Bucket='my-bucket', Key=key, Body=data)

Use conditional requests to avoid unnecessary transfers:

# Use If-None-Match to avoid re-downloading unchanged objects
def download_if_modified(key, etag_cache):
    kwargs = {'Bucket': 'my-bucket', 'Key': key}

    cached_etag = etag_cache.get(key)
    if cached_etag:
        # Only send the header once we have an ETag from a previous download
        kwargs['IfNoneMatch'] = cached_etag

    try:
        response = s3.get_object(**kwargs)
        # Object is new or was modified, process it
        etag_cache[key] = response['ETag']
        return response['Body'].read()
    except s3.exceptions.ClientError as e:
        if e.response['Error']['Code'] == '304':  # Not Modified
            return None  # Use cached version
        raise

Benefit: Reduces request costs by 30-60%; improves application efficiency.


Tackling Data Transfer & Retrieval Costs

Data transfer (egress) costs can exceed storage costs for frequently accessed data. A 10 GB file downloaded 1,000 times costs $900 in egress.

Process data in the same region where it's stored

Use S3 Select to query data in place

Process data in the same region

Cost impact: Transfer

Don't transfer data cross-region if you can process it locally:

import boto3

# Bad: code running in us-west-2 downloads data from a bucket in us-east-1
s3_east = boto3.client('s3', region_name='us-east-1')
data = s3_east.get_object(Bucket='my-bucket-east', Key='data.json')

# Processing happens in us-west-2, so every read pays cross-region transfer
process(data['Body'].read())

# Good: process in the same region as the data
# Deploy your Lambda/EC2 to us-east-1, or replicate the data to us-west-2
s3_local = boto3.client('s3', region_name='us-west-2')
data = s3_local.get_object(Bucket='my-bucket-west', Key='data.json')
process(data['Body'].read())

Use S3 Select to query in place

Cost impact: Transfer

Instead of downloading entire objects, query them server-side:

import boto3

s3 = boto3.client('s3')

# Bad: download entire 1 GB CSV to extract 100 rows
obj = s3.get_object(Bucket='my-bucket', Key='large-dataset.csv')
data = obj['Body'].read()
# Parse CSV, filter rows (1 GB download)

# Good: query with S3 Select (scan only, minimal transfer)
response = s3.select_object_content(
    Bucket='my-bucket',
    Key='large-dataset.csv',
    ExpressionType='SQL',
    Expression="SELECT * FROM s3object s WHERE s.status = 'active'",
    InputSerialization={'CSV': {'FileHeaderInfo': 'USE'}},
    OutputSerialization={'CSV': {}}
)

# Process only matching rows (maybe 10 MB transfer)
for event in response['Payload']:
    if 'Records' in event:
        records = event['Records']['Payload'].decode('utf-8')
        process(records)

Benefit: Reduces data transfer by 80-99% for selective queries; faster query execution.


Closing Thoughts

Cost-effective S3 storage comes down to two steps. First, let observed costs guide what needs optimization. Attribute your bill down to specific buckets, applications, and object prefixes: which workloads store uncompressed data, which generate millions of small objects, which drive cross-region transfers. This granular attribution reveals which of the 7 efficiency patterns above matter most for your workloads. Second, apply the optimizations that address your cost concentrations: compress before upload (80-90% storage reduction for text data), consolidate small objects into larger files (eliminating minimum size penalties and cutting PUT costs by 100-1000x), structure keys hierarchically for prefix-based listing (90-99% fewer LIST operations), pair appropriate storage classes with lifecycle policies and intelligent tiering, trim unnecessary requests, and process data in the same region where it's stored to avoid transfer costs.

You don't lose performance or availability. You gain precision. Cost-effective storage is an engineering discipline—treat each storage decision as having a price tag and optimize based on measured impact.

Looking for help with cost optimizations like these? Sign up for Early Access to Frugal. Frugal attributes AWS S3 Storage costs to your code, finds inefficient usage, provides Frugal Fixes that reduce your bill, and helps keep you out of cost traps for new and changing code.
