“Zero-ETL” sounds like marketing hype—until you realize AWS just killed half your job description.

At re:Invent 2024, AWS expanded its zero-ETL integrations to include Salesforce, SAP, and ServiceNow. These connectors reached General Availability in December 2024, and they represent a fundamental architectural shift. The days of writing custom Airflow DAGs to sync CRM data are ending.

But what does this actually mean in practice? Not just “faster data”: how does zero-ETL change what data engineers build, and what they no longer have to build?

What Zero-ETL Actually Is (and Isn’t)#

Let’s clear up the naming confusion: Zero-ETL doesn’t mean zero transformation. It means zero separate transformation infrastructure.

Traditional ETL Pattern:

```plaintext
Source (Salesforce)
  → Extract (Python script)
  → Transform (dbt/Spark on EC2)
  → Load (COPY to Redshift)
```

Zero-ETL Pattern:

```plaintext
Source (Salesforce)
  → AWS Glue Zero-ETL
  → Redshift (transform on-read with SQL/MV)
```

The transformation still happens. But instead of maintaining a Python-based extraction layer and a separate compute cluster for transformation, you transform in the warehouse using materialized views, stored procedures, or dbt running directly on Redshift.
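For instance, the extraction layer disappears entirely, and a transformation that used to be a Python job can become a plain dbt model reading the replicated schema directly. A minimal sketch, assuming dbt-redshift is pointed at the warehouse (model, schema, and column names are illustrative, not AWS defaults):

```sql
-- models/staging/stg_salesforce_account.sql
-- A dbt-redshift model selecting straight from the zero-ETL schema;
-- no extraction script, no separate compute cluster.
{{ config(materialized='view') }}

select
    id   as account_id,
    name as account_name
from salesforce_zero_etl.account
```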

The Three Zero-ETL Architectures#

AWS offers three patterns, and understanding which to use when is critical.

Pattern 1: Database Mirroring (Aurora, RDS → Redshift)#

Latency: Sub-minute
Use Case: Operational database replication for analytics

This is the gold standard. AWS manages continuous CDC (Change Data Capture) from your transactional database into Redshift. You get near-real-time analytics without touching your production database.

Gotcha: The Redshift tables are read-only. You must create materialized views or new tables for any aggregation/transformation logic.
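A minimal sketch of that workaround, assuming a replicated `orders` table (database, table, and column names are hypothetical):

```sql
-- The replicated tables are read-only, so aggregations live in objects you
-- create yourself. AUTO REFRESH keeps the view current as new changes land,
-- where incremental refresh is supported for the query shape.
CREATE MATERIALIZED VIEW daily_orders
AUTO REFRESH YES
AS
SELECT
    order_date,
    COUNT(*)    AS order_count,
    SUM(amount) AS revenue
FROM aurora_zero_etl.orders
GROUP BY order_date;
```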

Pattern 2: SaaS Integration (Salesforce, SAP, ServiceNow → Redshift)#

Latency: ~1 hour minimum
Use Case: Enterprise app data warehousing

This is where it gets interesting. AWS Glue zero-ETL pulls data from SaaS platforms using their APIs (e.g., Salesforce Bulk API, SAP OData).

Example:

```sql
-- In Redshift, you now have:
SELECT * FROM salesforce_zero_etl.account;      -- read-only
SELECT * FROM salesforce_zero_etl.opportunity;  -- read-only

-- Build your analytics layer:
CREATE MATERIALIZED VIEW sales_pipeline AS
SELECT
    a.name AS account_name,
    o.stage_name,
    SUM(o.amount) AS pipeline_value
FROM salesforce_zero_etl.opportunity o
JOIN salesforce_zero_etl.account a ON o.account_id = a.id
WHERE o.is_closed = false
GROUP BY a.name, o.stage_name;
```
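Dashboards then query `sales_pipeline` instead of the raw replicated objects. Since the source schema refreshes on AWS's cadence, refresh the view after each sync window, either manually, via Redshift's query scheduler, or from dbt:

```sql
-- Recompute the pipeline view once the latest zero-ETL sync has landed.
REFRESH MATERIALIZED VIEW sales_pipeline;
```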

Gotcha: Salesforce rate limits matter. AWS uses the Bulk API, but if your Salesforce org has strict API limits, you could hit throttling.

Pattern 3: Zero-Copy Data Sharing (Salesforce Data Cloud ↔ Redshift)#

Latency: Instant (query-time)
Use Case: Federated queries across platforms

This is the most futuristic pattern. You don’t replicate any data. Instead, Redshift can query Salesforce Data Cloud directly using external schemas, and vice versa.

```sql
-- In Redshift, query Salesforce Data Cloud without copying data:
CREATE EXTERNAL SCHEMA sfdc_live
FROM DATA CATALOG
DATABASE 'salesforce_data_cloud'
IAM_ROLE 'arn:aws:iam::...';

SELECT * FROM sfdc_live.unified_customer_profile;
```

Gotcha: Query performance depends on Salesforce’s infrastructure. This is best for ad-hoc exploration, not mission-critical dashboards that need sub-second response.
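If a dashboard does need predictable latency, one hedge is to snapshot the live schema into a local table on a schedule and point the dashboard at the copy (the table name here is illustrative):

```sql
-- Point-in-time local copy for dashboards; ad-hoc queries can still hit
-- sfdc_live directly.
DROP TABLE IF EXISTS customer_profile_snapshot;
CREATE TABLE customer_profile_snapshot AS
SELECT * FROM sfdc_live.unified_customer_profile;
```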

When Zero-ETL Fails (and Why You Still Need Traditional ETL)#

Zero-ETL is not a silver bullet. Here are the scenarios where it breaks down:

1. Complex Business Logic#

If your “transformation” involves fuzzy matching customer names across 5 different source systems using custom Python libraries, zero-ETL won’t cut it. You need a general-purpose compute layer (Spark, Fargate, etc.).

2. Non-AWS SaaS Platforms#

Zero-ETL only works for AWS-supported sources. If you need HubSpot, Stripe, or Zendesk data, you’re back to writing custom connectors (or using Fivetran).

3. Cost at Massive Scale#

Zero-ETL stores data in Redshift. For archival or cold data (logs you only query once a quarter), Redshift is expensive compared to S3 Parquet. Here, traditional ELT (Extract → Load to S3 → Transform on-demand with Athena) wins.
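A sketch of that alternative with Athena, assuming Parquet files have already landed in S3 (the bucket, path, and columns are placeholders):

```sql
-- Cold data stays in S3 as Parquet; Athena only scans it when asked.
CREATE EXTERNAL TABLE raw_logs (
    event_time timestamp,
    user_id    string,
    payload    string
)
STORED AS PARQUET
LOCATION 's3://my-data-lake/raw/logs/';

-- Transform on read, quarterly or whenever the question actually comes up.
SELECT date_trunc('quarter', event_time) AS quarter, COUNT(*) AS events
FROM raw_logs
GROUP BY 1;
```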

4. Cross-Cloud Integrations#

If your data lake is in Snowflake or BigQuery, AWS zero-ETL doesn’t help. You need a platform-agnostic solution.

The Death of the “Data Integration Engineer”?#

Hot Take: Zero-ETL is the Kubernetes of data engineering. It abstracts away infrastructure if you stay within the ecosystem.

In 2020, a “senior data engineer” role description included:

  • Building custom connectors
  • Managing Airflow DAGs for extractions
  • Tuning Spark jobs for transformations

In 2026, those tasks are increasingly automated or eliminated by zero-ETL. What remains?

The New Skillset:

  • SQL Mastery: Transformations now happen in the warehouse. You need to be fluent in window functions, recursive CTEs, and materialized view optimization (see the sketch after this list).
  • Data Modeling: Without custom Python to hide complexity, your dimensional model is your transformation layer.
  • Cost Awareness: Zero-ETL trades engineering time for cloud costs. You need to understand Redshift pricing (storage, compute, concurrency scaling) to avoid bill shock.
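To make the SQL point concrete, here is the kind of warehouse-native query that replaces a small Spark job (the column names are hypothetical):

```sql
-- Rank open opportunities within each account entirely in Redshift;
-- no extraction script or Spark cluster involved.
SELECT
    account_id,
    id AS opportunity_id,
    amount,
    RANK() OVER (PARTITION BY account_id ORDER BY amount DESC) AS amount_rank
FROM salesforce_zero_etl.opportunity
WHERE is_closed = false;
```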

Architecture Decision Flowchart#

```plaintext
Do you need real-time (< 1 min) analytics?
  ├─ Yes → Is your source Aurora/RDS?
  │         ├─ Yes → Use Database Mirroring (Pattern 1)
  │         └─ No → Build custom streaming pipeline (Kinesis/Kafka)
  └─ No → Is your source a supported SaaS platform?
            ├─ Yes → Use SaaS Integration (Pattern 2)
            └─ No → Traditional ETL (Glue/Airflow)
```

Practical Implementation: Salesforce → Redshift#

Here’s a real-world setup guide for AWS Glue zero-ETL with Salesforce.

Step 1: Configure AWS Glue Connection#

```bash
# Credentials are referenced from Secrets Manager via PASSWORD_SECRET_ID,
# never inlined in the connection definition.
aws glue create-connection \
  --connection-input '{
    "Name": "salesforce-zero-etl",
    "ConnectionType": "CUSTOM",
    "ConnectionProperties": {
      "CONNECTOR_TYPE": "salesforce",
      "CONNECTOR_URL": "https://login.salesforce.com",
      "USERNAME": "your-salesforce-user",
      "PASSWORD_SECRET_ID": "salesforce/api/token"
    }
  }'
```

Step 2: Create Zero-ETL Integration#

```sql
-- The integration itself is created outside Redshift (via the AWS Glue
-- console or the Glue CreateIntegration API), using the connection from
-- Step 1 as the source and your Redshift namespace as the target.
-- Once it exists, mount it in Redshift as a local database:
CREATE DATABASE salesforce_zero_etl
FROM INTEGRATION '<integration-id>';  -- ID/ARN of the integration you created
```

Step 3: Build Transformation Layer#
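The replicated objects are read-only, so the transformation layer is plain SQL on top of them. A minimal sketch, following the Pattern 2 example (the schema and grouping are illustrative):

```sql
-- A thin analytics layer over the read-only zero-ETL schema.
CREATE SCHEMA IF NOT EXISTS analytics;

CREATE MATERIALIZED VIEW analytics.open_pipeline_by_stage AS
SELECT
    o.stage_name,
    COUNT(*)      AS open_opportunities,
    SUM(o.amount) AS pipeline_value
FROM salesforce_zero_etl.opportunity o
WHERE o.is_closed = false
GROUP BY o.stage_name;
```

From here, dbt models or stored procedures can layer further logic on top; nothing upstream needs to change.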

Conclusion: The Unbundling#

Zero-ETL is part of a larger trend: the unbundling of the data stack. Tasks that used to require general-purpose code (Python, Scala) are being absorbed into specialized platforms (warehouses, catalogs, orchestrators).

This isn’t the death of data engineering. It’s the maturation of it. Just as DevOps engineers stopped manually provisioning servers and started writing Terraform, data engineers will stop writing extraction scripts and start architecting semantic layers.

The question isn’t “Should I use zero-ETL?” It’s “Which parts of my stack can become declarative SQL, and which still need procedural code?”

Choose wisely. Your AWS bill depends on it.
