Cost Engineering
GCP
The Frugal Approach to GCS Storage Costs
Craig Conboy

With usage-billed storage services like Google Cloud Storage, cost optimization starts with your code. Applications don't just store data; they spend money. Every file upload, every storage operation, every byte stored contributes to your bill.

Taking an application- and code-centric approach to cost reduction means understanding what your code stores before and after each operation. Here's a practical walk-through of what to look for and how to optimize GCS costs at the code level.

Cost Trap | Efficiency Pattern
Storing uncompressed compressible data | Compress before upload
Small objects in Nearline/Coldline/Archive classes | Consolidate small objects into larger files
Short-lived objects in minimum-duration classes | Avoid Nearline/Coldline/Archive for temporary data
Inefficient bucket listing operations | Use prefix-based listing and caching
Inefficient request patterns | Optimize request patterns (reduce unnecessary operations)
High operation counts | Batch operations to reduce operation counts
Frequent cross-region transfers | Process data in the same region where it's stored
Expensive Archive operations | Avoid high-cost operations on Archive storage

Attribute the Costs

Start with your bill. Break down costs by bucket, application, and object prefix to understand where spend concentrates. GCS charges across three dimensions: storage volume (~$0.020/GB-month for Standard storage in a single US region), operations (~$0.05 per 10,000 Class A operations, ~$0.004 per 10,000 Class B operations), and data transfer/retrieval (~$0.12/GB egress to North America, ~$0.01-$0.05/GB for Nearline/Coldline/Archive retrieval). Your bill reveals whether you're burning budget on storage volume, excessive operation counts, or data transfer costs.

Attribute costs down to specific usage patterns in your codebase. Which buckets store uncompressed data? Which applications generate millions of small objects? Which workloads drive excessive list operations or cross-region transfers? This granular attribution reveals which of the 8 efficiency patterns below deliver the highest impact. Once you can attach a price tag to specific storage decisions, optimization priorities become clear.
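
If you want a quick code-level view of where volume concentrates before setting up billing exports, you can sum bytes and object counts by prefix directly. A minimal sketch (the bucket name and the assumption that top-level prefixes map to applications are illustrative); note that the listing pass is itself billed as Class A operations, so treat it as an occasional audit rather than something to run continuously:

from collections import Counter

from google.cloud import storage

def storage_by_prefix(bucket_name, delimiter='/'):
    """Rough attribution of storage volume by top-level prefix."""
    client = storage.Client()
    sizes, counts = Counter(), Counter()

    # Each page of this listing is a Class A operation
    for blob in client.list_blobs(bucket_name):
        prefix = blob.name.split(delimiter, 1)[0]
        sizes[prefix] += blob.size or 0
        counts[prefix] += 1

    for prefix, total_bytes in sizes.most_common():
        print(f"{prefix}: {total_bytes / 1e9:.2f} GB across {counts[prefix]:,} objects")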

Note: GCS platform features like lifecycle management, Autoclass, and object versioning complement code-level optimizations. The patterns below focus on what your application code can control.


Tackling Storage Volume Costs

The most direct levers for storage volume are in your application code:

Compress data before upload.

Consolidate small objects into larger files.

Avoid minimum duration penalties.

Compress before upload

Cost impact: Storage

Text-based data (logs, JSON, CSV, XML) typically compresses 5-10x. Storing uncompressed data is leaving money on the table.

import gzip
import json
from google.cloud import storage

client = storage.Client()
bucket = client.bucket('my-bucket')

# Bad: upload uncompressed JSON (costs 10x more)
def upload_uncompressed(data):
    blob = bucket.blob('data.json')
    blob.upload_from_string(
        json.dumps(data),
        content_type='application/json'
    )

# Good: compress before upload
def upload_compressed(data):
    blob = bucket.blob('data.json.gz')

    # Compress data
    json_data = json.dumps(data).encode('utf-8')
    compressed = gzip.compress(json_data)

    blob.upload_from_string(
        compressed,
        content_type='application/json',
        content_encoding='gzip'
    )

For log files, always compress:

const { Storage } = require('@google-cloud/storage');
const zlib = require('zlib');

const storage = new Storage();
const bucket = storage.bucket('my-logs');

async function uploadLogs(logData) {
    // Compress log data
    const compressed = zlib.gzipSync(Buffer.from(logData));

    await bucket.file(`logs/${Date.now()}.log.gz`).save(compressed, {
        metadata: {
            contentType: 'text/plain',
            contentEncoding: 'gzip'
        }
    });
}

Benefit: 80-90% reduction in storage costs and transfer costs for compressible data.

Consolidate small objects

Cost impact: Storage

Storing millions of small objects is expensive: every upload is a billed operation, so per-object overhead multiplies quickly, and classes like Nearline/Coldline/Archive carry higher operation costs plus early deletion fees.

from google.cloud import storage
from datetime import datetime
import json
import gzip

client = storage.Client()
bucket = client.bucket('my-bucket')

# Bad: one object per record (millions of Class A operations)
def save_record(record):
    blob = bucket.blob(f"records/{record['id']}.json")
    blob.upload_from_string(json.dumps(record))

# Good: batch records into larger files
class RecordBatcher:
    def __init__(self, batch_size=1000):
        self.batch = []
        self.batch_size = batch_size

    def add_record(self, record):
        self.batch.append(record)

        if len(self.batch) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.batch:
            return

        # Combine 1,000 records into one file
        timestamp = datetime.utcnow().isoformat()
        blob = bucket.blob(f"records/batch-{timestamp}.json.gz")

        # Compress batched data
        data = '\n'.join(json.dumps(r) for r in self.batch)
        compressed = gzip.compress(data.encode('utf-8'))

        blob.upload_from_string(
            compressed,
            content_type='application/json',
            content_encoding='gzip'
        )

        self.batch = []

Benefit: Reduces Class A operations by 100-1000x; reduces costs by 90%+ for small object workloads.

Avoid minimum duration penalties

Cost impact: Storage

Storage classes like Nearline (30 days), Coldline (90 days), and Archive (365 days) have minimum storage durations. Deleting objects early triggers charges for the full duration.

from datetime import datetime
from google.cloud import storage

client = storage.Client()
bucket = client.bucket('my-bucket')

# Bad: upload short-lived data to a Nearline bucket (30-day minimum)
# Object lives 7 days, but you pay for 30 days
def upload_weekly_report(data):
    blob = bucket.blob(f"reports/{datetime.now().isoformat()}.pdf")
    blob.upload_from_string(
        data,
        content_type='application/pdf'
    )
    # If bucket is Nearline class, deleted after 7 days → pay for 30 days

# Good: use a Standard storage bucket for temporary/short-lived objects
def upload_weekly_report_to_standard(data):
    standard_bucket = client.bucket('my-standard-bucket')
    blob = standard_bucket.blob(f"reports/weekly/{datetime.now().isoformat()}.pdf")
    blob.upload_from_string(
        data,
        content_type='application/pdf'
    )
    # Deleted after 7 days → pay for 7 days only

Lifecycle rules should transition objects only when they'll stay in the target class long enough to avoid early deletion penalties:

# If objects are deleted after 120 days:
# - Standard for first 30 days
# - Transition to Nearline after 30 days (objects stay 90 days in Nearline)
# - Delete after 120 days total
# This ensures objects stay in Nearline for 90+ days, avoiding early deletion fees
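
If you manage bucket configuration alongside application code, the schedule above can be applied with the Python client's lifecycle helpers. A sketch, assuming a bucket you administer:

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('my-bucket')

# Move objects to Nearline at 30 days and delete them at 120 days,
# so they spend roughly 90 days in Nearline and clear its 30-day minimum
bucket.add_lifecycle_set_storage_class_rule('NEARLINE', age=30)
bucket.add_lifecycle_delete_rule(age=120)
bucket.patch()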

Early deletion fees also apply when manually transitioning objects between storage classes. Moving an object from Nearline to Coldline before the 30-day minimum triggers the same penalty:

# Bad: move objects between storage classes too early
# (triggers early deletion fees from previous class)
def move_to_coldline(bucket_name, object_name):
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(object_name)
    blob.update_storage_class('COLDLINE')
    # If object was in Nearline for < 30 days, incurs early deletion fee

# Good: transition only after minimum duration in current class
def should_transition_to_coldline(blob):
    age_days = (datetime.now(blob.updated.tzinfo) - blob.updated).days

    # Only transition if object has been in Nearline for 30+ days
    if blob.storage_class == 'NEARLINE' and age_days >= 30:
        return True
    return False

Benefit: Avoids early deletion charges from both object deletion and storage class transitions; ensures cheaper storage classes actually save money.


Tackling Operation Costs

GCS charges per operation. Class A operations (writes, lists) cost ~10x more than Class B operations (reads).

Consolidate objects (as shown above)

Optimize listing operations with prefixes and caching

Optimize request patterns to reduce unnecessary operations

Optimize bucket listing

Cost impact: Operations

Every list call is a Class A operation, and listing an entire bucket without a prefix pages through many of them:

from google.cloud import storage

client = storage.Client()
bucket = client.bucket('my-bucket')

# Bad: list entire bucket (expensive Class A operations)
def get_all_objects():
    blobs = bucket.list_blobs()
    for blob in blobs:
        process(blob)

# Good: use prefix-based listing
def get_objects_by_date(year, month, day):
    prefix = f"data/year={year}/month={month:02d}/day={day:02d}/"
    blobs = bucket.list_blobs(prefix=prefix)

    for blob in blobs:
        process(blob)

Structure your object names hierarchically to enable prefix-based queries:

# Good object naming structure
def get_object_name(timestamp, uuid):
    year = timestamp.year
    month = timestamp.month
    day = timestamp.day
    hour = timestamp.hour

    return f"logs/year={year}/month={month:02d}/day={day:02d}/hour={hour:02d}/{uuid}.log.gz"

# This enables efficient prefix queries:
# - All logs for a day: prefix="logs/year=2024/month=01/day=15/"
# - All logs for an hour: prefix="logs/year=2024/month=01/day=15/hour=14/"

Cache listing results when appropriate:

from google.cloud import storage
from datetime import datetime, timedelta

class GCSLister:
    def __init__(self):
        self.client = storage.Client()  # reuse one client across calls
        self.cache = {}
        self.cache_ttl = timedelta(minutes=5)

    def list_with_cache(self, bucket_name, prefix):
        cache_key = f"{bucket_name}:{prefix}"

        # Check cache
        if cache_key in self.cache:
            cached_time, cached_result = self.cache[cache_key]
            if datetime.now() - cached_time < self.cache_ttl:
                return cached_result

        # Fetch from GCS
        result = self._list_objects(bucket_name, prefix)
        self.cache[cache_key] = (datetime.now(), result)
        return result

    def _list_objects(self, bucket_name, prefix):
        bucket = self.client.bucket(bucket_name)
        return list(bucket.list_blobs(prefix=prefix))

Benefit: Reduces Class A operations by 90-99%; improves application performance.

Optimize request patterns

Cost impact: Operations

Reduce unnecessary operations in application code:

from google.cloud import storage

client = storage.Client()
bucket = client.bucket('my-bucket')

# Bad: check if object exists before every upload (2 operations)
def upload_if_not_exists(name, data):
    blob = bucket.blob(name)
    if blob.exists():
        return  # Already exists

    blob.upload_from_string(data)

# Good: just upload with overwrite (1 operation)
def upload(name, data):
    blob = bucket.blob(name)
    blob.upload_from_string(data)
    # GCS handles overwrites; no need to check first

Use generation matching to avoid race conditions without extra operations:

from google.api_core import exceptions

# Use if-generation-match for conditional uploads
def upload_if_not_changed(name, data, expected_generation):
    blob = bucket.blob(name)

    try:
        # Only upload if the generation matches (object unchanged since it was read)
        blob.upload_from_string(
            data,
            if_generation_match=expected_generation
        )
    except exceptions.PreconditionFailed:
        # Object was modified by another process
        pass
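
The same precondition mechanism also gives you create-only-if-absent semantics in a single request: if_generation_match=0 tells GCS to accept the upload only when no live object exists under that name. A sketch, reusing the bucket and exceptions import from the snippet above:

def upload_if_absent(name, data):
    blob = bucket.blob(name)
    try:
        # Generation 0 means "no live object", so the upload succeeds only
        # when the object does not already exist; no separate exists() call
        blob.upload_from_string(data, if_generation_match=0)
        return True
    except exceptions.PreconditionFailed:
        return False  # another writer created it first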

Cache object metadata to avoid redundant reads:

class ObjectCache:
    def __init__(self):
        self.client = storage.Client()  # reuse one client across lookups
        self.metadata_cache = {}

    def get_metadata(self, bucket_name, object_name):
        cache_key = f"{bucket_name}/{object_name}"

        if cache_key in self.metadata_cache:
            return self.metadata_cache[cache_key]

        # Fetch from GCS (one Class B operation)
        bucket = self.client.bucket(bucket_name)
        blob = bucket.blob(object_name)
        blob.reload()  # fetch current metadata

        metadata = {
            'size': blob.size,
            'content_type': blob.content_type,
            'updated': blob.updated
        }

        self.metadata_cache[cache_key] = metadata
        return metadata

Benefit: Reduces operation costs by 30-60%; improves application efficiency.


Tackling Data Transfer & Retrieval Costs

Data transfer (egress) costs can exceed storage costs for frequently accessed data. A 10 GB file downloaded 1,000 times costs $1,200 in egress.

Process data in the same region where it's stored

Batch operations to reduce operation counts

Avoid high-cost operations on Archive storage

Process data in the same region

Cost impact: Transfer

Don't transfer data cross-region if you can process it locally:

from google.cloud import storage

# Bad: download data from us-central1 to europe-west1 for processing
client = storage.Client()
bucket = client.bucket('my-bucket-us')  # us-central1
blob = bucket.blob('data.json')
data = blob.download_as_bytes()  # Cross-region transfer charges

# Process in europe-west1 (incurs cross-region transfer cost)
process(data)

# Good: process in the same region as the data
# Deploy your Cloud Run/GCE to us-central1 or replicate data to europe-west1
client_local = storage.Client()
bucket_local = client_local.bucket('my-bucket-local')  # Same region as compute
blob_local = bucket_local.blob('data.json')
data = blob_local.download_as_bytes()  # No cross-region charges
process(data)

Batch operations to reduce operation counts

Cost impact: Operations

Performing bulk deletes or metadata updates one call at a time costs an HTTP round trip per object. The client's batch support groups many calls into a single request. Note that each call inside a batch is still billed as a separate operation, so batching saves request overhead and latency; to cut billed operation counts, eliminate or consolidate operations as shown above:

from google.cloud import storage

client = storage.Client()
bucket = client.bucket('my-bucket')

# Bad: delete objects one at a time (one HTTP round trip per object)
def delete_objects(object_names):
    for name in object_names:
        blob = bucket.blob(name)
        blob.delete()

# Good: group deletes into batch requests
def delete_objects_batch(object_names, chunk_size=100):
    # The JSON API accepts up to 100 calls per batch request
    for i in range(0, len(object_names), chunk_size):
        with client.batch():
            for name in object_names[i:i + chunk_size]:
                bucket.blob(name).delete()

Benefit: Far fewer HTTP round trips for bulk deletes and metadata updates; combine with object consolidation to actually lower billed operation counts.

Avoid high-cost operations on Archive storage

Cost impact: Operations

Archive storage has the lowest storage cost but highest operation and retrieval costs. Use it only for truly infrequently accessed data:

from google.cloud import storage

client = storage.Client()

# Bad: frequently access objects in Archive storage
# Class B operations cost ~$0.50 per 10,000 (100x more than Standard)
# Retrieval costs ~$0.05/GB
def read_from_archive(bucket_name, object_name):
    bucket = client.bucket(bucket_name)  # Archive storage class
    blob = bucket.blob(object_name)
    data = blob.download_as_bytes()  # Expensive operation + retrieval cost
    return data

# Good: use Archive only for compliance/rarely accessed data
# For data accessed monthly or quarterly, use Coldline instead
# For data accessed weekly or monthly, use Nearline
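
One code-level way to keep storage class aligned with access patterns is to route uploads by how often you expect to read the data back. A sketch, assuming one bucket per storage class (the bucket names and access thresholds are illustrative):

# Hypothetical buckets, each created with a different default storage class
BUCKETS_BY_ACCESS = {
    'frequent':  'my-standard-bucket',   # read more than about once a month
    'monthly':   'my-nearline-bucket',   # read about once a month
    'quarterly': 'my-coldline-bucket',   # read about once a quarter
    'rarely':    'my-archive-bucket',    # compliance / disaster recovery only
}

def upload_by_access_pattern(name, data, access='frequent'):
    # Send the object to the bucket whose default storage class
    # matches its expected access frequency
    target = client.bucket(BUCKETS_BY_ACCESS[access])
    target.blob(name).upload_from_string(data)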

Benefit: Avoids excessive operation and retrieval costs; matches storage class to access pattern.


Closing Thoughts

Cost-effective GCS storage requires two steps. First, let observed costs guide what needs optimization. Attribute your bill down to specific buckets, applications, and object prefixes: which workloads store uncompressed data, which generate millions of small objects, which drive excessive list operations or cross-region transfers. This granular attribution reveals which of the 8 efficiency patterns above matter most for your workloads. Second, apply the optimizations that address your cost concentrations: compress before upload (80-90% storage reduction for text data), consolidate small objects into larger files (100-1000x fewer Class A operations), structure object names hierarchically to enable prefix-based listing (90-99% fewer list operations), match storage classes to access patterns with lifecycle management and Autoclass, trim unnecessary requests and batch bulk work to keep operation costs down, and process data in the same region where it's stored to avoid transfer costs.

You don't lose performance or availability. You gain precision. Cost-effective storage is an engineering discipline—treat each storage decision as having a price tag and optimize based on measured impact.

Looking for help with cost optimizations like these? Sign up for Early Access to Frugal. Frugal attributes GCS Storage costs to your code, finds inefficient usage, provides Frugal Fixes that reduce your bill, and helps keep you out of cost traps for new and changing code.
