Optimizing Large-Scale Merges
Strategies for handling massive MERGE operations in Snowflake efficiently.
The MERGE command is the workhorse of data warehousing, handling upserts (update or insert) in a single transaction.
However, as dataset sizes grow into the terabytes, a poorly optimized MERGE can become a major performance bottleneck
and cost driver.
The mechanics of MERGE#
MERGE requires joining the source (incoming data) with the target (existing table) to identify matches. If
your target table has billions of rows, this join is expensive.
Optimization Strategy 1: Pruning#
The most effective way to speed up a MERGE is to reduce the number of micro-partitions Snowflake needs to read from
the target table.
Dynamic Pruning#
Ensure your ON clause includes the clustering key of the target table.
-- Bad: Only joining on ID
MERGE INTO target t USING source s
ON t.id = s.id
...
-- Good: Joining on ID and Partition Key (e.g., Event Date)
MERGE INTO target t USING source s
ON t.id = s.id AND t.event_date = s.event_date
...sqlEven if id is unique, adding event_date helps the extensive pruning if the table is clustered by date.
Optimization Strategy 2: Pre-Filtering Source Data#
Don’t try to merge the entire world. If your upstream process sends full snapshots, filter the source data to
changes_only before it hits the MERGE.
MERGE INTO target t
USING (
SELECT * FROM staging_table
WHERE event_date >= DATEADD(day, -3, CURRENT_DATE()) -- safety net
) s
ON ...sqlOptimization Strategy 3: Clustering Quality#
If your target table is clustered but the depth is poor, pruning won’t work. Check SYSTEM$CLUSTERING_INFORMATION and
consider enabling Automatic Clustering for tables with heavy churn.
Optimization Strategy 4: Warehouse Sizing and Spilling#
MERGE operations are memory-intensive. Check your Query Profile.
- Bytes Spilled to Local Storage: You need a larger Warehouse (e.g., move from L to XL).
- Bytes Spilled to Remote Storage: You seriously need a larger Warehouse.
Spilling kills performance. Scaling up the warehouse for the duration of the batch job is often cheaper than running a smaller warehouse for a much longer time.
Conclusion#
Large scale MERGE operations are a common scaling pain point. By focusing on pruning and proper sizing, you can keep
your pipelines fast and your credits under control.