Optimizing Large-Scale Merges
Strategies for handling massive MERGE operations in Snowflake efficiently.
The MERGE command is the workhorse of data warehousing, handling upserts (update or insert) in a single transaction.
However, as dataset sizes grow into the terabytes, a poorly optimized MERGE can become a major performance bottleneck
and cost driver.
The mechanics of MERGE
MERGE requires joining the source (incoming data) with the target (existing table) to identify matches. If
your target table has billions of rows, this join is expensive.
Optimization Strategy 1: Pruning
The most effective way to speed up a MERGE is to reduce the number of micro-partitions Snowflake needs to read from
the target table.
Dynamic Pruning
Ensure your ON clause includes the clustering key of the target table.
-- Bad: Only joining on ID
MERGE INTO target t USING source s
ON t.id = s.id
...
-- Good: Joining on ID and Partition Key (e.g., Event Date)
MERGE INTO target t USING source s
ON t.id = s.id AND t.event_date = s.event_date
...sqlEven if id is unique, adding event_date helps the extensive pruning if the table is clustered by date.
Optimization Strategy 2: Pre-Filtering Source Data
Don’t try to merge the entire world. If your upstream process sends full snapshots, filter the source data to
changes_only before it hits the MERGE.
MERGE INTO target t
USING (
SELECT * FROM staging_table
WHERE event_date >= DATEADD(day, -3, CURRENT_DATE()) -- safety net
) s
ON ...sqlOptimization Strategy 3: Clustering Quality
If your target table is clustered but the depth is poor, pruning won’t work. Check SYSTEM$CLUSTERING_INFORMATION and
consider enabling Automatic Clustering for tables with heavy churn.
Optimization Strategy 4: Warehouse Sizing and Spilling
MERGE operations are memory-intensive. Check your Query Profile.
- Bytes Spilled to Local Storage: You need a larger Warehouse (e.g., move from L to XL).
- Bytes Spilled to Remote Storage: You seriously need a larger Warehouse.
Spilling kills performance. Scaling up the warehouse for the duration of the batch job is often cheaper than running a smaller warehouse for a much longer time.
Conclusion
Large scale MERGE operations are a common scaling pain point. By focusing on pruning and proper sizing, you can keep
your pipelines fast and your credits under control.