Introduction

In our latest demonstration, we showcased how Smart Data Frameworks (SDF) can be used to migrate and continuously replicate data from IBM Netezza to Databricks. This capability is a significant milestone—not just for SDF, but for teams looking to modernize their data infrastructure while maintaining consistency across environments.

Setting the Stage

The walkthrough begins with configuring connections to the source (Netezza) and target (Databricks), alongside an AWS S3 bucket used as a staging area. This setup enables SDF to orchestrate a multi-step data pipeline that handles both initial migration and ongoing replication.
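To make the setup concrete, the three roles (source, target, staging) can be pictured as a single pipeline configuration. The keys and values below are purely illustrative; SDF's actual configuration format and field names may differ.

```python
# Illustrative connection configuration for a source-to-cloud pipeline.
# All keys, hostnames, and bucket names here are hypothetical.
pipeline_config = {
    "source": {
        "type": "netezza",
        "host": "nz-prod.example.com",
        "port": 5480,
        "database": "SALES_DW",
    },
    "staging": {
        "type": "s3",
        "bucket": "my-sdf-staging-bucket",
        "prefix": "netezza-export/",
    },
    "target": {
        "type": "databricks",
        "workspace_url": "https://example.cloud.databricks.com",
        "catalog": "main",
        "schema": "sales_dw",
    },
}

def validate_config(cfg: dict) -> list:
    """Return the missing top-level sections; empty means all three roles are set."""
    required = ("source", "staging", "target")
    return [section for section in required if section not in cfg]

print(validate_config(pipeline_config))  # [] -> source, staging, and target all present
```

The point of the sketch is simply that the staging area is configured as a first-class participant alongside the two databases, not bolted on afterwards.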

Migration Workflow Highlights

Using SDF’s migration wizard, the process includes:

  • Schema Migration: Extracting table definitions from Netezza, generating DDL, and creating equivalent schemas in Databricks.
  • Initial Data Migration: Exporting ~9.8M rows from Netezza, staging in S3, and loading into Databricks. Validation confirmed identical row counts across source and target.
  • Continuous Replication Setup: Transitioning the migration job into a replication job with configurable scheduling—supporting near real-time or batch intervals.
  • Replication Demonstration: Validating replication for inserts, deletes, updates, and truncations. For example:
    • Insert of 280k records
    • Delete operations reducing dataset size
    • Bulk updates to values
    • Full table truncation
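The replication demonstration above exercised four change types. A minimal sketch of what "applying" such change events to a target means, with the target modeled as an in-memory dict keyed by primary key, looks like this. This illustrates the event semantics only, not SDF's internal replication mechanism, and the event format is invented for the example.

```python
# Apply CDC-style change events (insert, update, delete, truncate) to a
# target table, modeled here as a dict keyed by primary key.
def apply_event(table: dict, event: dict) -> None:
    op = event["op"]
    if op == "insert":
        table[event["key"]] = event["row"]
    elif op == "update":
        if event["key"] in table:
            table[event["key"]].update(event["changes"])
    elif op == "delete":
        table.pop(event["key"], None)  # deleting a missing key is a no-op
    elif op == "truncate":
        table.clear()  # truncation removes every row at once
    else:
        raise ValueError(f"unknown operation: {op}")

target = {}
events = [
    {"op": "insert", "key": 1, "row": {"amount": 100}},
    {"op": "insert", "key": 2, "row": {"amount": 200}},
    {"op": "update", "key": 2, "changes": {"amount": 250}},
    {"op": "delete", "key": 1},
]
for e in events:
    apply_event(target, e)
print(target)  # {2: {'amount': 250}}

apply_event(target, {"op": "truncate"})
print(len(target))  # 0
```

Note that truncation is its own event type rather than a burst of deletes: it clears the table in one operation, which is why it was validated separately in the demo.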

What Makes This Different?

While SDF has long supported data migration and replication, this project introduced several unique challenges:

  • Near Real-Time Change Data Capture (CDC) from Netezza: SDF now supports heterogeneous replication from Netezza to any target database, including Databricks. This is a capability not typically offered by other platforms, which generally support only homogeneous replication (e.g., Netezza to Netezza).
  • Cloud Database Complexity: Unlike on-prem systems where data can be streamed directly, cloud targets like Databricks require multi-step ingestion. Data must be:
    1. Unloaded into local files
    2. Uploaded to a cloud object store (e.g., S3)
    3. Ingested by the target database

    These extra hops add latency and operational overhead, making replication to cloud targets more intricate than to traditional on-prem systems.
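The three-step pattern above can be sketched as follows. Local directories stand in for Netezza, S3, and Databricks so the example is self-contained; in a real pipeline these steps would go through the respective client libraries (e.g., boto3 for the S3 upload and a SQL COPY-style load on the target side).

```python
# A simplified, local simulation of the three-step cloud ingestion pattern:
# unload to local files, upload to object storage, ingest into the target.
import csv
import shutil
import tempfile
from pathlib import Path

def unload(rows: list, local_dir: Path) -> Path:
    """Step 1: unload source rows into a local CSV file."""
    path = local_dir / "extract.csv"
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
    return path

def upload(local_file: Path, staging_dir: Path) -> Path:
    """Step 2: copy the file to the staging area (stand-in for an S3 put)."""
    dest = staging_dir / local_file.name
    shutil.copy(local_file, dest)
    return dest

def ingest(staged_file: Path) -> list:
    """Step 3: the target reads the staged file (stand-in for a bulk load)."""
    with staged_file.open(newline="") as f:
        return list(csv.DictReader(f))

source_rows = [{"id": "1", "amount": "100"}, {"id": "2", "amount": "200"}]
with tempfile.TemporaryDirectory() as d:
    work = Path(d)
    (work / "staging").mkdir()
    staged = upload(unload(source_rows, work), work / "staging")
    loaded = ingest(staged)

# Row-count validation, as in the demo: source and target must match.
print(len(loaded) == len(source_rows))  # True
```

Each hop is a place where a partial failure can occur, which is why a row-count check at the end, as performed in the demonstration, is a useful closing step.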

Why It Matters

This enhancement to SDF reflects our deep expertise with Netezza and our commitment to supporting modern, cloud-native architectures. By enabling both initial synchronization and ongoing replication, SDF empowers organizations to maintain data consistency across hybrid environments—without vendor lock-in or proprietary constraints.

Conclusion

Whether you’re migrating legacy systems or building out a cloud-first data strategy, SDF’s new capabilities offer a powerful, flexible solution for data movement and replication. And with support for near real-time CDC from Netezza to any target, the possibilities are wide open.

Roy Hammett
I am an IT consultant with 30 years' experience in Data Warehousing and Data Analytics. I have written blog articles and website content for Smart Associates for the past 8 years, focusing on their range of products and services, data warehousing, data analytics, business intelligence, partner products and more.