❄️
Data Flakes

Last year, Snowflake surprised everyone by announcing Polaris Catalog, an open-source technical catalog for Apache Iceberg.

Wait, why would a closed-source SaaS company release open-source infrastructure? Because they want to be the center of gravity for metadata, even if they aren’t storing the data.

The Problem: Catalog Chaos

You have data in S3.

  • Spark uses the Hive Metastore.
  • Trino uses the Glue Catalog.
  • Snowflake uses its internal catalog.

Each catalog keeps its own copy of the table metadata, so the copies drift apart and the engines end up disagreeing on the schema. It’s a mess.

The Solution: REST Protocol

Polaris implements the Iceberg REST catalog API, which is published as an OpenAPI specification. It sits in the middle, between every engine and the table metadata.

  • Spark asks Polaris: “I want to write to Table A.”
  • Snowflake asks Polaris: “I want to read Table A.”

Polaris commits each table change atomically, so both engines see the same committed snapshot of Table A and never a half-finished write.
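To make that guarantee concrete, here is a minimal sketch of the idea in Python. It is a toy in-memory model, not the Polaris or Iceberg API: `ToyCatalog`, `CommitConflict`, and the integer snapshot ids are all made up for illustration. The point is only that a single catalog holds the "current snapshot" pointer and moves it with a compare-and-swap.

```python
import threading
from dataclasses import dataclass, field


class CommitConflict(Exception):
    """Raised when a writer's base snapshot is no longer the current one."""


@dataclass
class ToyCatalog:
    """In-memory stand-in for a catalog (hypothetical, not the Polaris API)."""
    _current: dict = field(default_factory=dict)  # table name -> snapshot id
    _lock: threading.Lock = field(default_factory=threading.Lock)

    def load_table(self, table: str) -> int:
        # Every engine reads the same pointer, so every engine sees the same snapshot.
        with self._lock:
            return self._current.get(table, 0)

    def commit(self, table: str, expected_snapshot: int, new_snapshot: int) -> None:
        # Compare-and-swap: the pointer moves only if nobody committed in between.
        with self._lock:
            if self._current.get(table, 0) != expected_snapshot:
                raise CommitConflict(f"{table}: expected snapshot {expected_snapshot}")
            self._current[table] = new_snapshot


catalog = ToyCatalog()

# "Spark" writes Table A on top of the snapshot it loaded.
base = catalog.load_table("table_a")
catalog.commit("table_a", expected_snapshot=base, new_snapshot=base + 1)

# "Snowflake" reads Table A and sees the snapshot Spark just committed.
seen_by_reader = catalog.load_table("table_a")

# A second writer still holding the stale base is rejected instead of overwriting.
try:
    catalog.commit("table_a", expected_snapshot=base, new_snapshot=base + 2)
    stale_commit_rejected = False
except CommitConflict:
    stale_commit_rejected = True
```

Because the stale writer is rejected rather than silently winning, it has to reload the table and retry, which is what keeps every engine on one consistent history.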

Setting it up

Polaris can be hosted by Snowflake (managed) or run in your own Kubernetes cluster (self-hosted).

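```sql
-- Connecting Snowflake to a Polaris Catalog (placeholder URL and credentials)
CREATE CATALOG INTEGRATION my_polaris
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  REST_CONFIG = (CATALOG_URI = 'https://my-polaris-url/api/catalog')
  REST_AUTHENTICATION = (
    TYPE = OAUTH
    OAUTH_CLIENT_ID = '<client_id>'
    OAUTH_CLIENT_SECRET = '<client_secret>'
    OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
  )
  ENABLED = TRUE;
```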
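The other engines point at the same endpoint. As a rough sketch of the Spark side, the helper below builds the properties that Iceberg's Spark integration reads for a REST catalog. The helper name `polaris_spark_conf`, the alias `polaris`, the catalog name `my_catalog`, and the credential placeholder are assumptions for illustration; check the property set against the Iceberg and Polaris versions you actually run.

```python
def polaris_spark_conf(alias: str, uri: str, warehouse: str, credential: str) -> dict:
    """Build Spark properties for an Iceberg REST catalog registered under `alias`."""
    prefix = f"spark.sql.catalog.{alias}"
    return {
        # Enable Iceberg's SQL extensions in the Spark session.
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Register the catalog and tell Iceberg to talk to it over REST.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
        f"{prefix}.warehouse": warehouse,    # name of the catalog inside Polaris (assumed)
        f"{prefix}.credential": credential,  # "<client_id>:<client_secret>" placeholder
        f"{prefix}.scope": "PRINCIPAL_ROLE:ALL",
    }


conf = polaris_spark_conf(
    alias="polaris",
    uri="https://my-polaris-url/api/catalog",
    warehouse="my_catalog",
    credential="<client_id>:<client_secret>",
)

# With PySpark and the Iceberg runtime JAR available, the properties
# would be applied to a session roughly like this:
#   builder = SparkSession.builder
#   for key, value in conf.items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
```

Keeping the properties in a plain dictionary makes it easy to reuse the same settings for `spark-submit` flags or a notebook session.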

Governance at the Catalog Layer

The killer feature of Polaris is that it brings role-based access control (RBAC) to the open lake. You define who can read or write each table once, in Polaris, and those rules apply whether the user is coming from Spark, Flink, or Dremio.
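Here is a small sketch of why central rules behave the same for every engine. It is a toy model, not the Polaris management API: `ToyAccessPolicy`, the role `analyst`, and the principal `alice` are hypothetical, and the privilege strings are modeled on Polaris's table privileges purely as labels.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAccessPolicy:
    """Toy role-based policy (hypothetical; a model of the idea only)."""
    role_grants: dict = field(default_factory=dict)      # role -> {(table, privilege)}
    principal_roles: dict = field(default_factory=dict)  # principal -> {role}

    def grant(self, role: str, table: str, privilege: str) -> None:
        self.role_grants.setdefault(role, set()).add((table, privilege))

    def assign(self, principal: str, role: str) -> None:
        self.principal_roles.setdefault(principal, set()).add(role)

    def is_allowed(self, principal: str, table: str, privilege: str, engine: str) -> bool:
        # `engine` is accepted but deliberately ignored: the decision depends
        # on who is asking, not on which engine carries the request.
        roles = self.principal_roles.get(principal, set())
        return any((table, privilege) in self.role_grants.get(r, set()) for r in roles)


policy = ToyAccessPolicy()
policy.grant("analyst", "table_a", "TABLE_READ_DATA")
policy.assign("alice", "analyst")

# The same read request arrives through three different engines.
read_decisions = {
    engine: policy.is_allowed("alice", "table_a", "TABLE_READ_DATA", engine)
    for engine in ("spark", "flink", "dremio")
}

# Writing was never granted, so it is denied no matter the engine.
can_write = policy.is_allowed("alice", "table_a", "TABLE_WRITE_DATA", "spark")
```

The read is allowed from all three engines and the write is denied, because the check lives in one place instead of being reimplemented per engine.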

Conclusion

Polaris is the “Switzerland” of the Data Wars. It lets you pick best-of-breed engines for different tasks while keeping a single source of truth for table structure.
