❄️
Data Flakes

Back

In this article we discuss Snowflake’s External Tables. As cloud data platforms evolve, organisations often face the decision of whether to store data inside Snowflake’s managed storage or leave it in external cloud storage such as Amazon S3, Azure Blob, or Google Cloud Storage. Snowflake’s external tables allow you to query this externally managed data directly, without first loading it into Snowflake.

Understanding when to use external tables — and when not to — is critical to building a cost-effective and performant architecture. This article explains the differences between external tables and managed storage, lists the supported file types, highlights the benefits and limitations, provides example SQL, and concludes with a practical decision framework.

Features change from time to time with new capabilities being added regularly, it is recommended that you review the documentation for the latest on supported file formats and limitations.

External Tables vs Managed Storage#

Snowflake provides two main ways of storing and querying data:

  • Managed Storage (Native Tables) Data is ingested into Snowflake, benefiting from advanced features such as time travel, zero-copy cloning, and automatic optimisation.

  • External Tables Metadata about files stored in external cloud storage is registered in Snowflake, enabling queries without ingestion. The data remains in the external location, and Snowflake only maintains metadata.

The choice between these options depends on balancing performance, cost, and operational flexibility.

Supported File Types#

External tables support querying data stored in semi-structured and structured file formats. Commonly supported types include:

  • Parquet
  • ORC
  • Avro
  • JSON
  • Delimited text files (CSV/TSV, with limitations)

Each file type has varying levels of support for schema detection, column mapping, and optimisation. For semi-structured formats such as JSON, Snowflake leverages the VARIANT type to handle flexible schemas.

Benefits of External Tables#

External tables offer several advantages in specific scenarios:

  • Cost Savings: Avoids duplicating large datasets into Snowflake storage when data is already in a cloud data lake.
  • Flexibility: Enables hybrid architectures where some data remains in external storage but is still accessible from Snowflake.
  • Integration: Useful for data-sharing scenarios across multiple systems without centralising all data in Snowflake.
  • Incremental Adoption: Allows teams to begin querying existing data lakes with minimal migration effort.

Limitations of External Tables#

Despite their benefits, external tables come with trade-offs:

  • No Time Travel: You cannot roll back changes or query historical versions.
  • No Cloning: Zero-copy cloning is only available for managed tables.
  • Performance Overheads: Querying external storage is typically slower than querying Snowflake-managed storage due to reliance on remote file systems.
  • Limited Optimisation: Features such as clustering and automatic micro-partitioning are not available.
  • Refresh Requirements: Metadata must be refreshed (ALTER EXTERNAL TABLE ... REFRESH) to reflect new or changed files.

For performance-sensitive or frequently accessed data, managed storage is usually the better option.

Implementation: Example SQL#

Below is an example of creating an external table that references files in an Amazon S3 stage:

-- Create an external table on top of Parquet files in S3
CREATE OR REPLACE EXTERNAL TABLE my_ext_table (
    id INT AS (value:c1::INT),
    name STRING AS (value:c2::STRING),
    created_at TIMESTAMP_NTZ AS (value:c3::TIMESTAMP_NTZ)
)
WITH LOCATION = @my_s3_stage/path/to/files/
FILE_FORMAT = (TYPE = PARQUET)
AUTO_REFRESH = FALSE;
sql

Queries against the table can then be run like any other Snowflake table, though performance will depend on the underlying storage and file organisation.

Features change from time to time with new features being added regularly, it is recommended that you review the documentation for the latest on what specific features are included with any of the Editions.

Decision Framework: When to Use (and Not Use) External Tables#

When designing your architecture, consider the following framework:

Good Use Cases#

  • Querying large, immutable datasets stored in cloud object storage.

  • Performing ad-hoc or exploratory analysis on raw data without ingestion overhead.

  • Supporting hybrid data lake + data warehouse architectures.

  • Integrating with third-party data sources that are delivered into cloud storage.

Anti-Patterns#

High-frequency transactional workloads requiring sub-second latency.

Datasets that benefit from time travel, cloning, or clustering.

Frequently updated or small tables where metadata refresh overhead outweighs ingestion costs.

Mission-critical analytics where predictable performance is essential.

Conclusion#

Snowflake’s external tables provide a powerful way to bridge data lakes and the Snowflake platform, enabling flexible access to externally managed data. They shine in scenarios where cost and flexibility outweigh performance and feature requirements.

However, for workloads that demand time travel, cloning, and performance consistency, managed Snowflake storage remains the recommended approach. By applying the decision framework outlined above, you can ensure that external tables are used where they provide maximum value — and avoided where they may create inefficiencies.

Features change from time to time with new features being added regularly, it is recommended that you review the documentation for the latest on what specific features are included with any of the Editions.

Disclaimer

The information provided on this website is for general informational purposes only. While we strive to keep the information up to date and correct, there may be instances where information is outdated or links are no longer valid. We make no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability, or availability with respect to the website or the information, products, services, or related graphics contained on the website for any purpose. Any reliance you place on such information is therefore strictly at your own risk.