❄️
Data Flakes


Many organisations running Snowflake as their analytical platform also maintain substantial Oracle database estates for transactional workloads. Getting data from those Oracle instances into Snowflake continuously — with minimal latency and without impacting production database performance — is one of the more common integration challenges in enterprise data engineering.

Oracle GoldenGate is the established solution for this problem. In this article, we will discuss what GoldenGate is, how it works, why it remains relevant for Oracle-to-Snowflake replication, and how the end-to-end pipeline operates in practice.

Features change over time, with new capabilities added regularly, so it is recommended that you review the documentation for the latest detail on which features are included in each edition.

What is Oracle GoldenGate?#

Oracle GoldenGate (OGG) is a data replication technology that captures database transactions in near real-time using Change Data Capture (CDC). Rather than running periodic bulk extracts, GoldenGate reads the Oracle redo log directly, capturing every insert, update, and delete as it occurs and propagating those changes to target systems with latency measured in seconds.

GoldenGate was originally designed for database-to-database replication — keeping two Oracle instances in sync for high availability or active-active configurations. Over time it evolved to support heterogeneous targets, including cloud data warehouses, and its “for Big Data” edition added handlers for writing to file formats, message queues, and cloud storage.

How GoldenGate Works with Snowflake#

There is no direct GoldenGate-to-Snowflake connector. Instead, the integration uses a three-stage pipeline that routes through AWS S3 and Snowflake’s Snowpipe service:

```mermaid
graph LR
    A[(Oracle Database<br/>source)] --> B[GoldenGate Extract<br/>reads redo log]
    B --> C[GoldenGate for Big Data<br/>File Writer Handler]
    C --> D[GoldenGate S3 Handler<br/>writes JSON files]
    D --> E[(AWS S3<br/>staging bucket)]
    E --> F[SQS Event Notification]
    F --> G[Snowpipe<br/>auto-ingest]
    G --> H[(Snowflake<br/>VARIANT table)]
```

Each stage is discrete and independently configurable, which makes the pipeline flexible but means the stages must be carefully co-ordinated during setup.

The Three Stages in Detail#

Stage 1: Extract (Oracle to GoldenGate Trail Files)#

The extract process connects to the Oracle source database and reads from the redo log. It is configured with a parameter file specifying which tables to capture:

```plaintext
EXTRACT hr
USERIDALIAS ogg
LOGALLSUPCOLS
UPDATERECORDFORMAT COMPACT
EXTTRAIL /gghome/ogg/dirdat/hr
SOURCECATALOG orcl
TABLE hr.employee;
TABLE hr.departments;
```

Explanation: LOGALLSUPCOLS ensures that all columns are captured for update operations, not just the changed columns. UPDATERECORDFORMAT COMPACT reduces trail file size by omitting unchanged column values. EXTTRAIL specifies the local path where GoldenGate writes its internal trail files (binary transaction logs).
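For completeness, the extract is then registered and started from GGSCI. The commands below are an illustrative sketch assuming integrated capture and the trail path configured above:

```plaintext
ADD EXTRACT hr, INTEGRATED TRANLOG, BEGIN NOW
ADD EXTTRAIL /gghome/ogg/dirdat/hr, EXTRACT hr
START EXTRACT hr
```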

Stage 2: Transform (Trail Files to JSON via GoldenGate for Big Data)#

The GoldenGate for Big Data (GG4BD) component reads the trail files and converts them into target formats. A properties file configures the file writer handler and S3 event handler:

```plaintext
gg.handlerlist=filewriter,s3
gg.handler.filewriter.type=filewriter
gg.handler.filewriter.format=json
gg.handler.filewriter.eventHandler=s3
gg.handler.s3.type=aws_s3
gg.handler.s3.region=eu-west-1
gg.handler.s3.bucketMappingTemplate=my-goldengate-bucket
gg.handler.s3.pathMappingTemplate=goldengate/${fullyQualifiedTableName}/
goldengate.userexit.writers=javawriter
javawriter.bootoptions=-Xmx512m -Djava.class.path=.:ggjava/ggjava.jar:./dirprm
```

Important: The javawriter.bootoptions property must appear on a single line in the properties file. This is a common configuration error — documentation examples sometimes show it spanning multiple lines, but the parser requires it to be on one line. Similarly, goldengate.userexit.writers=javawriter must be present for the boot options to be recognised.

Stage 3: Load (S3 to Snowflake via Snowpipe)#

Once JSON files land in S3, Snowpipe ingests them automatically using an SQS event notification:
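The exact DDL depends on your stage and storage integration configuration; a minimal sketch (stage, pipe, integration, and bucket names are illustrative) looks like this:

```sql
-- External stage pointing at the GoldenGate output prefix in S3
-- (bucket and storage integration names are illustrative)
CREATE STAGE ogg_stage
  URL = 's3://my-goldengate-bucket/goldengate/'
  STORAGE_INTEGRATION = s3_int;

-- Landing table: raw JSON in a VARIANT column plus a load timestamp
CREATE TABLE hr_employee_raw (
  oggdata  VARIANT,
  batch_ts TIMESTAMP_LTZ DEFAULT CURRENT_TIMESTAMP()
);

-- Snowpipe with auto-ingest: loads each new file as S3/SQS announces it
CREATE PIPE hr_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO hr_employee_raw (oggdata)
  FROM @ogg_stage
  FILE_FORMAT = (TYPE = 'JSON');
```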

Explanation: AUTO_INGEST = TRUE configures Snowpipe to listen for SQS notifications from the S3 bucket. When GoldenGate writes a new JSON file, S3 publishes an event to SQS, and Snowpipe picks it up automatically — no polling or scheduling required. The data lands in the oggdata VARIANT column, preserving the full JSON structure including GoldenGate’s transaction metadata.

Querying the Replicated Data#

GoldenGate’s JSON output includes both the row data and transaction context. Snowflake’s VARIANT type lets you query this directly using dot notation:

```sql
-- Extract employee data from GoldenGate JSON payload
SELECT
    oggdata:op_type::STRING          AS operation,       -- I=insert, U=update, D=delete
    oggdata:op_ts::TIMESTAMP_LTZ     AS operation_time,
    oggdata:after:EMPLOYEE_ID::INT   AS employee_id,
    oggdata:after:FIRST_NAME::STRING AS first_name,
    oggdata:after:LAST_NAME::STRING  AS last_name,
    oggdata:after:SALARY::FLOAT      AS salary,
    batch_ts
FROM hr_employee_raw
WHERE oggdata:op_type::STRING != 'D'   -- exclude deletes for current state
ORDER BY operation_time DESC;
```

Explanation: GoldenGate’s JSON structure places the operation type in op_type and the changed row values under after (for inserts and updates) or before (for deletes). This schema-on-read approach means changes to the Oracle table structure do not break the Snowflake landing table — new columns simply appear in the JSON.
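For reference, an update record emitted by the JSON formatter looks roughly like this (values are illustrative, and the exact fields depend on formatter settings):

```json
{
  "table": "ORCL.HR.EMPLOYEE",
  "op_type": "U",
  "op_ts": "2024-03-01 09:15:42.000000",
  "current_ts": "2024-03-01T09:15:44.120000",
  "pos": "00000000020030005321",
  "before": { "EMPLOYEE_ID": 101, "SALARY": 52000 },
  "after":  { "EMPLOYEE_ID": 101, "SALARY": 55000 }
}
```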

When to Use Oracle GoldenGate#

GoldenGate is best suited to specific scenarios. It is not always the right choice:

```mermaid
graph TD
    A{Source database?} -->|Oracle| B{Latency requirement?}
    B -->|Under 60 seconds| C{Volume?}
    C -->|High volume CDC| D[GoldenGate is a strong fit]
    C -->|Low volume| E[Consider Fivetran or Airbyte<br/>simpler to operate]
    B -->|Hourly or daily acceptable| F[Consider Snowpipe with bulk export<br/>or managed connectors]
    A -->|Non-Oracle| G[GoldenGate not appropriate<br/>use source-native CDC]
```

Use GoldenGate when:

  • The source is Oracle and you need genuine CDC (not periodic bulk extracts)
  • Latency requirements are under 60 seconds
  • You need to capture all DML operations including updates and deletes with full before/after images
  • You are already running GoldenGate for other replication use cases in your estate

Consider alternatives when:

  • The source is not Oracle (Debezium, AWS DMS, or Fivetran are better fits for other databases)
  • You only need daily loads (bulk export with Snowpipe or COPY INTO is simpler to operate)
  • You want a managed, connector-based solution without GoldenGate’s operational overhead

Best Practices#

From experience, I would recommend the following:

  1. Use LOGALLSUPCOLS: Without this, update records capture only changed columns. Downstream queries that need the full row state for updates will be incomplete.

  2. Flatten JSON into structured tables via Dynamic Tables or Tasks: The VARIANT landing table is a staging layer. Create a downstream structured layer using Dynamic Tables or scheduled Tasks to materialise typed columns for analytical queries.

  3. Monitor trail file lag: GoldenGate’s extract lag metric tells you how far behind the process is from the redo log. Set up alerting when lag exceeds your SLA threshold.

  4. Test S3 event notifications independently: SQS misconfiguration is a common failure point. Test that S3 events flow to Snowpipe before going live.
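Practice 2 above can be sketched with a Dynamic Table (table, warehouse, and target lag are illustrative):

```sql
-- Typed, structured layer on top of the VARIANT landing table.
-- Keeps the latest record per key and drops deletes -- a sketch,
-- not a full SCD/merge treatment.
CREATE DYNAMIC TABLE hr_employee
  TARGET_LAG = '5 minutes'
  WAREHOUSE = transform_wh
AS
SELECT
    oggdata:after:EMPLOYEE_ID::INT    AS employee_id,
    oggdata:after:FIRST_NAME::STRING  AS first_name,
    oggdata:after:LAST_NAME::STRING   AS last_name,
    oggdata:after:SALARY::FLOAT       AS salary,
    oggdata:op_ts::TIMESTAMP_LTZ      AS operation_time
FROM hr_employee_raw
WHERE oggdata:op_type::STRING != 'D'
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY oggdata:after:EMPLOYEE_ID::INT
    ORDER BY oggdata:op_ts::TIMESTAMP_LTZ DESC
) = 1;
```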

Common Pitfalls#

Pitfall: javawriter.bootoptions on Multiple Lines#

Problem: GoldenGate for Big Data fails to start or ignores AWS SDK configuration because the boot options property spans multiple lines.

Solution: Ensure javawriter.bootoptions is a single continuous line in the properties file, regardless of how documentation examples format it.

Pitfall: Snowpipe Stalling on Malformed JSON#

Problem: GoldenGate occasionally writes malformed JSON files (e.g. during process restart), causing Snowpipe to skip files silently.

Solution: Monitor SYSTEM$PIPE_STATUS and query INFORMATION_SCHEMA.COPY_HISTORY regularly. Configure Snowpipe error notifications to catch load failures promptly.
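These checks can be run as queries (pipe and table names are illustrative):

```sql
-- Current pipe state, including any last-received error
SELECT SYSTEM$PIPE_STATUS('hr_pipe');

-- Files that failed or only partially loaded in the last 24 hours
SELECT file_name, status, first_error_message
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
    TABLE_NAME => 'HR_EMPLOYEE_RAW',
    START_TIME => DATEADD('hour', -24, CURRENT_TIMESTAMP())
))
WHERE status != 'Loaded';
```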

Conclusion#

Oracle GoldenGate remains one of the most capable CDC solutions for Oracle-to-Snowflake replication. Its combination of redo-log-based capture, flexible output formats, and near-real-time latency makes it well-suited to enterprise scenarios where Snowflake needs a live view of Oracle transactional data.

The three-stage architecture — Extract → S3 → Snowpipe — is more complex than managed connectors, but it provides full control over the replication pipeline and operates entirely within your cloud and network boundaries.

Key Takeaways:

  • GoldenGate uses CDC via Oracle’s redo log, achieving sub-minute latency without impacting source performance
  • The Snowflake integration routes through S3 and Snowpipe — there is no direct GoldenGate-to-Snowflake connector
  • Data lands in a VARIANT column, supporting schema-on-read and absorbing Oracle schema changes gracefully
  • javawriter.bootoptions must be on a single line — this is the most common configuration error
  • Downstream structured tables should be created via Dynamic Tables or Tasks on top of the VARIANT landing layer


Disclaimer

The information provided on this website is for general informational purposes only. While we strive to keep the information up to date and correct, there may be instances where information is outdated or links are no longer valid. We make no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability, or availability with respect to the website or the information, products, services, or related graphics contained on the website for any purpose. Any reliance you place on such information is therefore strictly at your own risk.