Last year, Snowflake surprised everyone by announcing Polaris Catalog, an open-source technical catalog for Apache Iceberg.
Wait, why would a closed-source SaaS company release open-source infrastructure? Because they want to be the center of gravity for metadata, even if they aren’t storing the data.
The Problem: Catalog Chaos
You have data in S3.
- Spark uses the Hive Metastore.
- Trino uses the Glue Catalog.
- Snowflake uses its internal catalog.
They all disagree on the schema. It’s a mess.
The Solution: REST Protocol
Polaris implements the Iceberg REST catalog API, which the Apache Iceberg project publishes as an OpenAPI specification. It sits in the middle.
- Spark asks Polaris: “I want to write to Table A.”
- Snowflake asks Polaris: “I want to read Table A.”
Every commit goes through Polaris, which applies it atomically, so both engines always see the same committed snapshot of the table.
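To make the "same snapshot" guarantee concrete, here is a minimal in-memory sketch of the compare-and-swap idea behind a catalog commit. It is a toy model, not the Polaris implementation: `ToyCatalog`, `CommitConflict`, and the integer snapshot IDs are all hypothetical names invented for illustration.

```python
import threading


class CommitConflict(Exception):
    """Raised when a writer's view of the table is stale."""


class ToyCatalog:
    """In-memory stand-in for a catalog (hypothetical, for illustration only)."""

    def __init__(self):
        self._current = {}              # table name -> current snapshot id
        self._lock = threading.Lock()   # serializes pointer reads and swaps

    def load_table(self, table):
        # Every engine reads the same pointer, so every engine
        # resolves the table to the same committed snapshot.
        with self._lock:
            return self._current.get(table, 0)

    def commit(self, table, expected, new):
        # Compare-and-swap: the pointer moves only if the writer built
        # its changes on the snapshot that is still current.
        with self._lock:
            if self._current.get(table, 0) != expected:
                raise CommitConflict(f"{table}: snapshot {expected} is stale")
            self._current[table] = new


catalog = ToyCatalog()
base = catalog.load_table("table_a")              # the writer loads the table
catalog.commit("table_a", expected=base, new=1)   # the writer commits
reader_view = catalog.load_table("table_a")       # a reader now resolves to snapshot 1

try:
    # A second writer that still holds the old base is rejected, not merged.
    catalog.commit("table_a", expected=base, new=2)
    stale_rejected = False
except CommitConflict:
    stale_rejected = True
```

A rejected writer is expected to reload the table and retry on top of the new snapshot, which is why readers never observe a half-applied write.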
Setting it up
Polaris can be hosted by Snowflake (managed) or run in your own Kubernetes cluster (self-hosted).
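On the Spark side, pointing an engine at Polaris mostly comes down to a handful of Iceberg REST catalog properties. The helper below is a rough sketch that only builds those properties as a dictionary; `polaris_spark_conf` is a hypothetical name, and the alias, warehouse, and credential values are placeholders you would replace with your own. It assumes the Iceberg Spark runtime is already on the classpath.

```python
from typing import Dict


def polaris_spark_conf(alias: str, uri: str, warehouse: str, credential: str) -> Dict[str, str]:
    """Build Spark properties for an Iceberg REST catalog (hypothetical helper)."""
    prefix = f"spark.sql.catalog.{alias}"
    return {
        # Register the alias as an Iceberg catalog backed by the REST client.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
        # The catalog name on the Polaris side (placeholder value).
        f"{prefix}.warehouse": warehouse,
        # OAuth client credentials in "client_id:client_secret" form (placeholder).
        f"{prefix}.credential": credential,
        f"{prefix}.scope": "PRINCIPAL_ROLE:ALL",
    }


conf = polaris_spark_conf(
    alias="polaris",
    uri="https://my-polaris-url/api/catalog",
    warehouse="my_catalog",
    credential="<client_id>:<client_secret>",
)
```

Each key and value can then be passed to `SparkSession.builder.config(key, value)` before the session is created, after which tables resolve as `polaris.<namespace>.<table>`.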
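-- Connecting Snowflake to a Polaris Catalog (credentials are placeholders)
CREATE CATALOG INTEGRATION my_polaris
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  REST_CONFIG = (CATALOG_URI = 'https://my-polaris-url/api/catalog')
  REST_AUTHENTICATION = (
    TYPE = OAUTH
    OAUTH_CLIENT_ID = '<client_id>'
    OAUTH_CLIENT_SECRET = '<client_secret>'
    OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
  )
  ENABLED = TRUE;
Governance at the Layer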
The killer feature of Polaris is that it brings role-based access control (RBAC) to the open lake. You define who can read or write each table once, in Polaris, and those rules apply whether the user is coming from Spark, Flink, or Dremio.
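One way to picture why the rules follow the user across engines: the access decision depends only on the principal, the table, and the privilege, never on which engine sent the request. The sketch below is a toy model under that assumption. `ToyAccessPolicy`, the role names, and the privilege strings are illustrative, not the Polaris API.

```python
from collections import defaultdict


class ToyAccessPolicy:
    """Toy role-based policy store (hypothetical, for illustration only)."""

    def __init__(self):
        self._grants = defaultdict(set)   # role -> {(table, privilege)}
        self._roles = defaultdict(set)    # principal -> {role}

    def grant(self, role, table, privilege):
        self._grants[role].add((table, privilege))

    def assign(self, principal, role):
        self._roles[principal].add(role)

    def is_allowed(self, principal, table, privilege, engine=None):
        # `engine` is accepted but deliberately ignored: the decision is
        # made from the principal's roles alone.
        return any(
            (table, privilege) in self._grants.get(role, set())
            for role in self._roles.get(principal, set())
        )


policy = ToyAccessPolicy()
policy.grant("analyst", "table_a", "TABLE_READ_DATA")
policy.assign("alice", "analyst")

read_from_spark = policy.is_allowed("alice", "table_a", "TABLE_READ_DATA", engine="spark")
read_from_dremio = policy.is_allowed("alice", "table_a", "TABLE_READ_DATA", engine="dremio")
write_from_flink = policy.is_allowed("alice", "table_a", "TABLE_WRITE_DATA", engine="flink")
```

Here `alice` can read `table_a` from either engine and cannot write to it from any of them, because the grant lives in one place instead of being copied into each engine's own permission system.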
Conclusion
Polaris is the “Switzerland” of the Data Wars. It allows you to pick best-of-breed engines for different tasks while sharing a single source of truth for table structure.