Continuous Data Replication to Snowflake with Oracle GoldenGate
Oracle GoldenGate enables near real-time CDC from Oracle databases into Snowflake via S3 and Snowpipe. Here is how the pipeline works and when to use it.
Many organisations running Snowflake as their analytical platform also maintain substantial Oracle database estates for transactional workloads. Getting data from those Oracle instances into Snowflake continuously — with minimal latency and without impacting production database performance — is one of the more common integration challenges in enterprise data engineering.
Oracle GoldenGate is the established solution for this problem. In this article, we will discuss what GoldenGate is, how it works, why it remains relevant for Oracle-to-Snowflake replication, and how the end-to-end pipeline operates in practice.
Features and edition packaging change over time, with new capabilities added regularly; review the documentation ↗ for the latest detail on which features are included in each edition.
What is Oracle GoldenGate?#
Oracle GoldenGate (OGG) is a data replication technology that captures database transactions in near real-time using Change Data Capture (CDC). Rather than running periodic bulk extracts, GoldenGate reads the Oracle redo log directly, capturing every insert, update, and delete as it occurs and propagating those changes to target systems with latency measured in seconds.
GoldenGate was originally designed for database-to-database replication — keeping two Oracle instances in sync for high availability or active-active configurations. Over time it evolved to support heterogeneous targets, including cloud data warehouses, and its “for Big Data” edition added handlers for writing to file formats, message queues, and cloud storage.
How GoldenGate Works with Snowflake#
There is no direct GoldenGate-to-Snowflake connector. Instead, the integration uses a three-stage pipeline that routes through AWS S3 and Snowflake’s Snowpipe service:
Each stage is discrete and independently configurable, which makes the pipeline flexible but requires careful co-ordination during setup.
The Three Stages in Detail#
Stage 1: Extract (Oracle to GoldenGate Trail Files)#
The extract process connects to the Oracle source database and reads from the redo log. It is configured with a parameter file specifying which tables to capture:
```plaintext
EXTRACT hr
USERIDALIAS ogg
LOGALLSUPCOLS
UPDATERECORDFORMAT COMPACT
EXTTRAIL /gghome/ogg/dirdat/hr
SOURCECATALOG orcl
TABLE hr.employee;
TABLE hr.departments;
```

Explanation: `LOGALLSUPCOLS` ensures that all columns are captured for update operations, not just the changed columns. `UPDATERECORDFORMAT COMPACT` reduces trail file size by omitting unchanged column values. `EXTTRAIL` specifies the local path where GoldenGate writes its internal trail files (binary transaction logs).
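The parameter file alone does not start capture; the extract must also be registered and started from GGSCI. A minimal sketch, assuming integrated capture and the trail path above (exact commands vary by GoldenGate version and capture mode):

```plaintext
GGSCI> DBLOGIN USERIDALIAS ogg
GGSCI> REGISTER EXTRACT hr DATABASE
GGSCI> ADD EXTRACT hr, INTEGRATED TRANLOG, BEGIN NOW
GGSCI> ADD EXTTRAIL /gghome/ogg/dirdat/hr, EXTRACT hr
GGSCI> START EXTRACT hr
GGSCI> INFO EXTRACT hr, DETAIL
```

`INFO EXTRACT` reports process status and lag, which is also the metric to alert on once the pipeline is live.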
Stage 2: Transform (Trail Files to JSON via GoldenGate for Big Data)#
The GoldenGate for Big Data (GG4BD) component reads the trail files and converts them into target formats. A properties file configures the file writer handler and S3 event handler:
```plaintext
gg.handlerlist=filewriter,s3
gg.handler.filewriter.type=filewriter
gg.handler.filewriter.format=json
gg.handler.filewriter.eventHandler=s3
gg.handler.s3.type=aws_s3
gg.handler.s3.region=eu-west-1
gg.handler.s3.bucketMappingTemplate=my-goldengate-bucket
gg.handler.s3.pathMappingTemplate=goldengate/${fullyQualifiedTableName}/
goldengate.userexit.writers=javawriter
javawriter.bootoptions=-Xmx512m -Djava.class.path=.:ggjava/ggjava.jar:./dirprm
```

Important: The `javawriter.bootoptions` property must appear on a single line in the properties file. This is a common configuration error: documentation examples sometimes show it spanning multiple lines, but the parser requires it to be on one line. Similarly, `goldengate.userexit.writers=javawriter` must be present for the boot options to be recognised.
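For orientation, a single captured update would land in S3 as a JSON document along these lines. The values are illustrative, and the exact field set depends on the formatter configuration:

```json
{
  "table": "ORCL.HR.EMPLOYEE",
  "op_type": "U",
  "op_ts": "2019-05-21 10:23:41.000452",
  "current_ts": "2019-05-21T10:23:43.112000",
  "pos": "00000000020030005321",
  "before": { "EMPLOYEE_ID": 101, "SALARY": 9000 },
  "after":  { "EMPLOYEE_ID": 101, "SALARY": 9500 }
}
```

The `op_type`, `op_ts`, `before`, and `after` fields are what the Snowflake queries later in this article rely on.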
Stage 3: Load (S3 to Snowflake via Snowpipe)#
Once JSON files land in S3, Snowpipe ingests them automatically using an SQS event notification:
```sql
-- Create a stage pointing to the S3 bucket
CREATE OR REPLACE STAGE goldengate_stage
  URL = 's3://my-goldengate-bucket/goldengate/'
  CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...');

-- Create a landing table using VARIANT for the JSON payload
CREATE OR REPLACE TABLE hr_employee_raw (
  oggdata VARIANT,
  batch_ts TIMESTAMP_LTZ(9) DEFAULT CURRENT_TIMESTAMP()
);

-- Create the pipe with auto_ingest enabled
CREATE OR REPLACE PIPE hr_employee_pipe
  AUTO_INGEST = TRUE
AS
COPY INTO hr_employee_raw (oggdata, batch_ts)
FROM (
  SELECT $1, CURRENT_TIMESTAMP()
  FROM @goldengate_stage
)
FILE_FORMAT = (TYPE = 'JSON');
```

Explanation: `AUTO_INGEST = TRUE` configures Snowpipe to listen for SQS notifications from the S3 bucket. When GoldenGate writes a new JSON file, S3 publishes an event to SQS, and Snowpipe picks it up automatically, with no polling or scheduling required. The data lands in the `oggdata` VARIANT column, preserving the full JSON structure including GoldenGate's transaction metadata.
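Once the pipe exists, a few commands help wire up and verify the notification path. `SHOW PIPES` exposes the `notification_channel` column, which holds the SQS queue ARN that the S3 bucket's event notification must target, and `SYSTEM$PIPE_STATUS` confirms the pipe is listening:

```sql
-- notification_channel holds the SQS ARN to configure
-- on the S3 bucket's event notification
SHOW PIPES LIKE 'hr_employee_pipe';

-- Confirm the pipe is running and inspect pending file counts
SELECT SYSTEM$PIPE_STATUS('hr_employee_pipe');

-- Backfill any files that landed in the stage before the pipe was created
ALTER PIPE hr_employee_pipe REFRESH;
```

The `REFRESH` step matters on first deployment: files written before the pipe existed generate no SQS events, so they are never auto-ingested without it.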
Querying the Replicated Data#
GoldenGate’s JSON output includes both the row data and transaction context. Snowflake’s VARIANT type lets you query this directly using dot notation:
```sql
-- Extract employee data from GoldenGate JSON payload
SELECT
    oggdata:op_type::STRING          AS operation,       -- I=insert, U=update, D=delete
    oggdata:op_ts::TIMESTAMP_LTZ     AS operation_time,
    oggdata:after:EMPLOYEE_ID::INT   AS employee_id,
    oggdata:after:FIRST_NAME::STRING AS first_name,
    oggdata:after:LAST_NAME::STRING  AS last_name,
    oggdata:after:SALARY::FLOAT      AS salary,
    batch_ts
FROM hr_employee_raw
WHERE oggdata:op_type::STRING != 'D'  -- exclude deletes for current state
ORDER BY operation_time DESC;
```

Explanation: GoldenGate's JSON structure places the operation type in `op_type` and the changed row values under `after` (for inserts and updates) or `before` (for deletes). This schema-on-read approach means changes to the Oracle table structure do not break the Snowflake landing table; new columns simply appear in the JSON.
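The query above returns every change event. To reduce the event stream to current state, one row per key, a window function over the operation timestamp works well. A sketch (it keeps only the latest insert or update per employee; a complete current-state view would also filter out keys whose most recent event is a delete):

```sql
-- Latest surviving row per employee, derived from the CDC event stream
SELECT
    oggdata:after:EMPLOYEE_ID::INT   AS employee_id,
    oggdata:after:FIRST_NAME::STRING AS first_name,
    oggdata:after:LAST_NAME::STRING  AS last_name,
    oggdata:after:SALARY::FLOAT      AS salary
FROM hr_employee_raw
WHERE oggdata:op_type::STRING IN ('I', 'U')
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY oggdata:after:EMPLOYEE_ID::INT
    ORDER BY oggdata:op_ts::TIMESTAMP_LTZ DESC
) = 1;
```

Snowflake's `QUALIFY` clause filters on the window function directly, avoiding a wrapping subquery.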
When to Use Oracle GoldenGate#
GoldenGate is best suited to specific scenarios. It is not always the right choice:
Use GoldenGate when:
- The source is Oracle and you need genuine CDC (not periodic bulk extracts)
- Latency requirements are under 60 seconds
- You need to capture all DML operations including updates and deletes with full before/after images
- You are already running GoldenGate for other replication use cases in your estate
Consider alternatives when:
- The source is not Oracle (Debezium, AWS DMS, or Fivetran are better fits for other databases)
- You only need daily loads (bulk export with Snowpipe or COPY INTO is simpler to operate)
- You want a managed, connector-based solution without GoldenGate’s operational overhead
Best Practices#
Based on operating this pipeline in practice, I would recommend the following:

- Use LOGALLSUPCOLS: Without this, update records capture only the changed columns. Downstream queries that need the full row state for updates will be incomplete.
- Flatten JSON into structured tables via Dynamic Tables or Tasks: The VARIANT landing table is a staging layer. Create a downstream structured layer using Dynamic Tables or scheduled Tasks to materialise typed columns for analytical queries.
- Monitor trail file lag: GoldenGate's extract lag metric tells you how far behind the redo log the process is running. Set up alerting for when lag exceeds your SLA threshold.
- Test S3 event notifications independently: SQS misconfiguration is a common failure point. Verify that S3 events flow through to Snowpipe before going live.
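The flattening recommendation can be sketched with a Dynamic Table. This is illustrative, assuming a warehouse named `transform_wh` and the landing table from earlier; note that window functions in a dynamic table definition may force full rather than incremental refreshes:

```sql
-- Typed, analytics-ready layer on top of the VARIANT landing table.
-- Snowflake keeps it refreshed within the stated target lag.
CREATE OR REPLACE DYNAMIC TABLE hr_employee_current
  TARGET_LAG = '5 minutes'
  WAREHOUSE = transform_wh
AS
SELECT
    oggdata:after:EMPLOYEE_ID::INT   AS employee_id,
    oggdata:after:FIRST_NAME::STRING AS first_name,
    oggdata:after:LAST_NAME::STRING  AS last_name,
    oggdata:after:SALARY::FLOAT      AS salary,
    oggdata:op_ts::TIMESTAMP_LTZ     AS last_change_ts
FROM hr_employee_raw
WHERE oggdata:op_type::STRING IN ('I', 'U')
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY oggdata:after:EMPLOYEE_ID::INT
    ORDER BY oggdata:op_ts::TIMESTAMP_LTZ DESC
) = 1;
```

A scheduled Task running a MERGE into a conventional table achieves the same result with more control over refresh behaviour.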
Common Pitfalls#
Pitfall: javawriter.bootoptions on Multiple Lines#
Problem: GoldenGate for Big Data fails to start or ignores AWS SDK configuration because the boot options property spans multiple lines.
Solution: Ensure javawriter.bootoptions is a single continuous line in the properties file, regardless of how
documentation examples format it.
Pitfall: Snowpipe Stalling on Malformed JSON#
Problem: GoldenGate occasionally writes malformed JSON files (e.g. during process restart), causing Snowpipe to skip files silently.
Solution: Monitor SYSTEM$PIPE_STATUS and query INFORMATION_SCHEMA.COPY_HISTORY regularly. Configure Snowpipe
error notifications to catch load failures promptly.
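A monitoring query along these lines surfaces failed or partial loads. `COPY_HISTORY` is an INFORMATION_SCHEMA table function; the table name and 24-hour window here match the earlier examples and are illustrative:

```sql
-- Recent load problems for the landing table
SELECT file_name, last_load_time, status, first_error_message
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
    TABLE_NAME => 'HR_EMPLOYEE_RAW',
    START_TIME => DATEADD('hour', -24, CURRENT_TIMESTAMP())
))
WHERE status != 'Loaded'
ORDER BY last_load_time DESC;
```

Running this on a schedule, or wiring Snowpipe error notifications into an alerting channel, turns silent skips into visible failures.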
Conclusion#
Oracle GoldenGate remains one of the most capable CDC solutions for Oracle-to-Snowflake replication. Its combination of redo-log-based capture, flexible output formats, and near-real-time latency makes it well-suited to enterprise scenarios where Snowflake needs a live view of Oracle transactional data.
The three-stage architecture — Extract → S3 → Snowpipe — is more complex than managed connectors, but it provides full control over the replication pipeline and operates entirely within your cloud and network boundaries.
Key Takeaways:
- GoldenGate uses CDC via Oracle’s redo log, achieving sub-minute latency without impacting source performance
- The Snowflake integration routes through S3 and Snowpipe — there is no direct GoldenGate-to-Snowflake connector
- Data lands in a VARIANT column, supporting schema-on-read and absorbing Oracle schema changes gracefully
- javawriter.bootoptions must be on a single line — this is the most common configuration error
- Downstream structured tables should be created via Dynamic Tables or Tasks on top of the VARIANT landing layer