Optimizing Large-Scale Merges

The MERGE command is the workhorse of data warehousing, handling upserts (update or insert) in a single transaction. However, as dataset sizes grow into the terabytes, a poorly optimized MERGE can become a major performance bottleneck and cost driver.

The mechanics of MERGE#

MERGE requires joining the source (incoming data) with the target (existing table) to identify matches. If your target table has billions of rows, this join is expensive.

Optimization Strategy 1: Pruning#

The most effective way to speed up a MERGE is to reduce the number of micro-partitions Snowflake needs to read from the target table.

Dynamic Pruning#

Ensure your ON clause includes the clustering key of the target table.

-- Bad: Only joining on ID
MERGE INTO target t USING source s
ON t.id = s.id
...

-- Good: Joining on ID and Partition Key (e.g., Event Date)
MERGE INTO target t USING source s
ON t.id = s.id AND t.event_date = s.event_date
...

sql

Even if id is unique, adding event_date helps the extensive pruning if the table is clustered by date.

Optimization Strategy 2: Pre-Filtering Source Data#

Don’t try to merge the entire world. If your upstream process sends full snapshots, filter the source data to changes_only before it hits the MERGE.

MERGE INTO target t
USING (
    SELECT * FROM staging_table
    WHERE event_date >= DATEADD(day, -3, CURRENT_DATE()) -- safety net
) s
ON ...

sql

Optimization Strategy 3: Clustering Quality#

If your target table is clustered but the depth is poor, pruning won’t work. Check SYSTEM$CLUSTERING_INFORMATION and consider enabling Automatic Clustering for tables with heavy churn.

Optimization Strategy 4: Warehouse Sizing and Spilling#

MERGE operations are memory-intensive. Check your Query Profile.

Bytes Spilled to Local Storage: You need a larger Warehouse (e.g., move from L to XL).
Bytes Spilled to Remote Storage: You seriously need a larger Warehouse.

Spilling kills performance. Scaling up the warehouse for the duration of the batch job is often cheaper than running a smaller warehouse for a much longer time.

Conclusion#

Large scale MERGE operations are a common scaling pain point. By focusing on pruning and proper sizing, you can keep your pipelines fast and your credits under control.