❄️
Data Flakes

Back

The MERGE command is the workhorse of data warehousing, handling upserts (update or insert) in a single transaction. However, as dataset sizes grow into the terabytes, a poorly optimized MERGE can become a major performance bottleneck and cost driver.

The mechanics of MERGE#

MERGE requires joining the source (incoming data) with the target (existing table) to identify matches. If your target table has billions of rows, this join is expensive.

Optimization Strategy 1: Pruning#

The most effective way to speed up a MERGE is to reduce the number of micro-partitions Snowflake needs to read from the target table.

Dynamic Pruning#

Ensure your ON clause includes the clustering key of the target table.

-- Bad: Only joining on ID
MERGE INTO target t USING source s
ON t.id = s.id
...

-- Good: Joining on ID and Partition Key (e.g., Event Date)
MERGE INTO target t USING source s
ON t.id = s.id AND t.event_date = s.event_date
...
sql

Even if id is unique, adding event_date helps the extensive pruning if the table is clustered by date.

Optimization Strategy 2: Pre-Filtering Source Data#

Don’t try to merge the entire world. If your upstream process sends full snapshots, filter the source data to changes_only before it hits the MERGE.

MERGE INTO target t
USING (
    SELECT * FROM staging_table
    WHERE event_date >= DATEADD(day, -3, CURRENT_DATE()) -- safety net
) s
ON ...
sql

Optimization Strategy 3: Clustering Quality#

If your target table is clustered but the depth is poor, pruning won’t work. Check SYSTEM$CLUSTERING_INFORMATION and consider enabling Automatic Clustering for tables with heavy churn.

Optimization Strategy 4: Warehouse Sizing and Spilling#

MERGE operations are memory-intensive. Check your Query Profile.

  • Bytes Spilled to Local Storage: You need a larger Warehouse (e.g., move from L to XL).
  • Bytes Spilled to Remote Storage: You seriously need a larger Warehouse.

Spilling kills performance. Scaling up the warehouse for the duration of the batch job is often cheaper than running a smaller warehouse for a much longer time.

Conclusion#

Large scale MERGE operations are a common scaling pain point. By focusing on pruning and proper sizing, you can keep your pipelines fast and your credits under control.

Disclaimer

The information provided on this website is for general informational purposes only. While we strive to keep the information up to date and correct, there may be instances where information is outdated or links are no longer valid. We make no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability, or availability with respect to the website or the information, products, services, or related graphics contained on the website for any purpose. Any reliance you place on such information is therefore strictly at your own risk.