In this article we discuss Snowflake’s External Tables. As cloud data platforms evolve, organisations often face the decision of whether to store data inside Snowflake’s managed storage or leave it in external cloud storage such as Amazon S3, Azure Blob, or Google Cloud Storage. Snowflake’s external tables allow you to query this externally managed data directly, without first loading it into Snowflake.
External Tables vs Managed Storage
Snowflake provides two main ways of storing and querying data:
- Managed Storage (Native Tables): Data is ingested into Snowflake, benefiting from advanced features such as time travel, zero-copy cloning, and automatic optimisation.
- External Tables: Metadata about files stored in external cloud storage is registered in Snowflake, enabling queries without ingestion. The data remains in the external location, and Snowflake only maintains metadata.
The choice between these options depends on balancing performance, cost, and operational flexibility.
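For context, the managed-storage route typically involves staging files and loading them with a COPY INTO statement. The sketch below is illustrative only; the stage and table names (my_s3_stage, my_native_table) are assumptions rather than references to any particular environment:

-- Illustrative sketch: loading Parquet files from a stage into a Snowflake-managed (native) table
CREATE TABLE IF NOT EXISTS my_native_table (
  id INT,
  name STRING,
  created_at TIMESTAMP_NTZ
);

COPY INTO my_native_table
FROM @my_s3_stage/path/to/files/
FILE_FORMAT = (TYPE = PARQUET)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE; -- map Parquet column names onto the table columns

Once loaded, the data sits in Snowflake-managed storage and gains the features listed above; the trade-off is the ingestion step itself and the duplicated storage.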
Supported File Types
External tables support querying data stored in semi-structured and structured file formats. Commonly supported types include:
- Parquet
- ORC
- Avro
- JSON
- Delimited text files (CSV/TSV, with limitations)
Each file type has varying levels of support for schema detection, column mapping, and optimisation. For semi-structured formats such as JSON, Snowflake leverages the VARIANT type to handle flexible schemas.
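As a brief sketch, a JSON-backed external table exposes each row through a VARIANT column named VALUE, from which individual fields can be extracted and cast. The table and field names below (my_json_ext_table, event_type, and so on) are hypothetical:

-- Hypothetical JSON-backed external table: fields are extracted from the VARIANT VALUE column
SELECT
  value:event_id::STRING AS event_id,
  value:payload.amount::NUMBER(10,2) AS amount,
  value:ts::TIMESTAMP_NTZ AS event_ts
FROM my_json_ext_table
WHERE value:event_type::STRING = 'purchase';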
Benefits of External Tables
External tables offer several advantages in specific scenarios:
- Cost Savings: Avoids duplicating large datasets into Snowflake storage when data is already in a cloud data lake.
- Flexibility: Enables hybrid architectures where some data remains in external storage but is still accessible from Snowflake.
- Integration: Useful for data-sharing scenarios across multiple systems without centralising all data in Snowflake.
- Incremental Adoption: Allows teams to begin querying existing data lakes with minimal migration effort.
Limitations of External Tables
Despite their benefits, external tables come with trade-offs:
- No Time Travel: You cannot roll back changes or query historical versions.
- No Cloning: Zero-copy cloning is only available for managed tables.
- Performance Overheads: Querying external storage is typically slower than querying Snowflake-managed storage due to reliance on remote file systems.
- Limited Optimisation: Features such as clustering and automatic micro-partitioning are not available.
- Refresh Requirements: Metadata must be refreshed (ALTER EXTERNAL TABLE ... REFRESH, as shown below) to reflect new or changed files.
For performance-sensitive or frequently accessed data, managed storage is usually the better option.
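For example, assuming an external table named my_ext_table (the same illustrative name used in the next section), a manual metadata refresh looks like this:

-- Re-scan the external location so new, changed, or removed files are reflected in query results
ALTER EXTERNAL TABLE my_ext_table REFRESH;

Alternatively, AUTO_REFRESH can be enabled together with cloud storage event notifications so that the metadata is kept up to date automatically.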
Implementation: Example SQL
Below is an example of creating an external table over Parquet files referenced through an external stage pointing to Amazon S3:
-- Create an external table on top of Parquet files in S3
CREATE OR REPLACE EXTERNAL TABLE my_ext_table (
  -- For Parquet, each virtual column is cast out of the VALUE variant using the file's column names
  id INT AS (value:id::INT),
  name STRING AS (value:name::STRING),
  created_at TIMESTAMP_NTZ AS (value:created_at::TIMESTAMP_NTZ)
)
WITH LOCATION = @my_s3_stage/path/to/files/
FILE_FORMAT = (TYPE = PARQUET)
AUTO_REFRESH = FALSE;
Queries against the table can then be run like any other Snowflake table, though performance will depend on the underlying storage and file organisation.
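For instance, a simple aggregation over the illustrative table created above might look like the following; both the declared virtual columns and the raw VALUE variant are available to queries:

-- Query the external table like any other table, using its virtual columns
SELECT
  DATE_TRUNC('day', created_at) AS created_day,
  COUNT(*) AS row_count
FROM my_ext_table
WHERE created_at >= DATEADD('day', -30, CURRENT_DATE)
GROUP BY 1
ORDER BY 1;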
Decision Framework: When to Use (and Not Use) External Tables
When designing your architecture, consider the following framework:
Good Use Cases
- Querying large, immutable datasets stored in cloud object storage.
- Performing ad-hoc or exploratory analysis on raw data without ingestion overhead.
- Supporting hybrid data lake + data warehouse architectures.
- Integrating with third-party data sources that are delivered into cloud storage.
Anti-Patterns
- High-frequency transactional workloads requiring sub-second latency.
- Datasets that benefit from time travel, cloning, or clustering.
- Frequently updated or small tables where metadata refresh overhead outweighs ingestion costs.
- Mission-critical analytics where predictable performance is essential.
Conclusion
Snowflake’s external tables provide a powerful way to bridge data lakes and the Snowflake platform, enabling flexible access to externally managed data. They shine in scenarios where cost and flexibility outweigh performance and feature requirements.
However, for workloads that demand time travel, cloning, and performance consistency, managed Snowflake storage remains the recommended approach. By applying the decision framework outlined above, you can ensure that external tables are used where they provide maximum value — and avoided where they may create inefficiencies.
Features change over time, with new capabilities added regularly; it is recommended that you review the Snowflake documentation for the latest details on which features are included with each edition.