Last updated: April 23, 2026

Change Data Capture on Guara Cloud

Guara Cloud ships two complementary catalog services for moving data between your systems in near real time: Debezium Server for log-based Change Data Capture (CDC) from your databases and Conduit for general-purpose streaming between databases, message buses, and HTTP endpoints.

What is CDC?

Change Data Capture turns the writes already happening inside your database into a stream of events. Instead of running a nightly SELECT * dump or hand-rolling “after save” hooks in every service, a CDC engine reads the transaction log (the PostgreSQL WAL, the MySQL binlog, the MongoDB oplog) and emits one structured event per insert, update, and delete.

The practical difference: CDC tails the same log the database uses to replicate to its own standbys, so you pick up every committed change, in commit order, with near-zero lag and minimal load on the source. Delivery to the sink is at-least-once, so consumers should be idempotent. Whenever an order is inserted into your PostgreSQL orders table, an event lands on the sink within a fraction of a second — no polling, no dual writes, no missed rows during a migration window.

Downstream, consumers do whatever they want with that stream: keep a search index fresh, invalidate caches by key, mirror writes into an analytics store, ship notifications to a webhook, hydrate a read model. The source database does not know and does not care.
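To make the consumer side concrete, here is a minimal sketch in plain Python (no Guara-specific APIs) of dispatching on a Debezium-style change event. The op / before / after envelope matches Debezium's standard event format; the orders payload and the cache actions are illustrative.

```python
import json

# A Debezium-style change event: "op" is "c" (create), "u" (update),
# "d" (delete), or "r" (snapshot read); "before"/"after" hold row images.
event = json.loads("""
{
  "op": "u",
  "before": {"id": 42, "status": "pending"},
  "after":  {"id": 42, "status": "shipped"},
  "source": {"table": "orders", "ts_ms": 1714000000000}
}
""")

def apply_change(evt):
    """Route one change event to the right cache/index action."""
    op = evt["op"]
    if op in ("c", "r", "u"):     # insert, snapshot row, or update
        return ("upsert", evt["after"]["id"])
    if op == "d":                 # delete: invalidate by the old key
        return ("evict", evt["before"]["id"])
    raise ValueError(f"unknown op {op!r}")

print(apply_change(event))  # ('upsert', 42)
```

Because delivery is at-least-once, handlers like this should be safe to run twice on the same event — upserts and evictions by key already are.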

Debezium vs Conduit

Both services run as one pod per pipeline, both bridge sibling catalog services automatically, and both give you an Insights tab. They differ in the shape of the problems they handle best.

| You want to… | Pick |
| --- | --- |
| CDC from PostgreSQL, MySQL, or MongoDB to NATS, Kafka, or HTTP | Debezium |
| Lightweight Go runtime, no JVM, Hobby tier okay | Conduit |
| Stream from a non-database source (NATS subject, HTTP endpoint) | Conduit |
| Cross-database sync (e.g. Postgres to MySQL) | Conduit |
| Strip PII fields with an in-line processor | Conduit |
| Battle-tested snapshot + streaming on a production OLTP database | Debezium |
| Apache Kafka sink with SASL/SSL pass-through | Either |

Rule of thumb: if the problem is “I have a database, I want every row change as an event,” Debezium is the tested default. If the problem is “I have a pipeline between two systems, one of which might not be a database,” reach for Conduit.

Source and sink matrix

The two engines cover overlapping but not identical ground.

| | Debezium | Conduit |
| --- | --- | --- |
| Source: PostgreSQL | Yes (logical WAL, pgoutput) | Yes (logical WAL, built-in connector) |
| Source: MySQL | Yes (binlog, row format) | Yes (standalone connector) |
| Source: MongoDB | Yes (oplog, replica set) | Not in v1 — use Debezium |
| Source: NATS JetStream | No | Yes (standalone connector) |
| Source: HTTP | No | Yes (standalone connector) |
| Sink: NATS JetStream | Yes | Yes |
| Sink: Apache Kafka | Yes (external) | Yes (built-in connector) |
| Sink: PostgreSQL | No | Yes |
| Sink: MySQL | No | Yes |
| Sink: HTTP webhook | Yes | Yes |

Binding modes: sibling vs external

Both services let you point each side of the pipeline (source and sink) at either:

  • Sibling catalog service — the wizard shows a picker listing every eligible catalog service already deployed in this project. Credentials, host, port, database, and URL are injected into the CDC pod as BRIDGED_SOURCE_* / BRIDGED_SINK_* env vars automatically — there is nothing to paste.
  • External URL — you paste a full URL plus a username and password. Useful for an RDS instance outside Guara, a managed Kafka cluster, a webhook you already own, or anything else off-platform. The values land in a per-service Secret and are loaded into the pod as SOURCE_EXTERNAL_* / SINK_EXTERNAL_* env vars.

You can mix modes freely: sibling source with external sink, external source with sibling sink, or both sides external. The two sides never talk to each other through Guara’s internals — they only see the env vars the orchestrator injects.
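As an illustration, a pipeline with a sibling Postgres source and an external Kafka sink might see an environment like the following. The documented prefixes (BRIDGED_SOURCE_*, SINK_EXTERNAL_*) come from the text above; the exact suffixes and values shown here are assumptions.

```
# Injected by the bridge for the sibling source
BRIDGED_SOURCE_HOST=orders-db.internal
BRIDGED_SOURCE_PORT=5432
BRIDGED_SOURCE_DATABASE=orders
BRIDGED_SOURCE_USERNAME=dbuser
BRIDGED_SOURCE_PASSWORD=********

# Loaded from the per-service Secret for the external sink
SINK_EXTERNAL_URL=kafka://broker.example.com:9092
SINK_EXTERNAL_USERNAME=cdc
SINK_EXTERNAL_PASSWORD=********
```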

Prerequisites

Sibling PostgreSQL

Guara-managed PostgreSQL already ships with everything CDC needs:

  • wal_level=logical is enabled at startup.
  • The dbuser role is granted the REPLICATION attribute during init.

Deploy a PostgreSQL catalog service, deploy Debezium or Conduit on top of it, and CDC works with no extra steps. Existing PostgreSQL catalog services pick up the flags after their next pod restart.

External PostgreSQL

Outside Guara, the database administrator needs to configure the source before you point a pipeline at it:

  • Set wal_level=logical in postgresql.conf (requires a restart).
  • Create a role with both LOGIN and REPLICATION attributes: CREATE ROLE cdc_user LOGIN REPLICATION PASSWORD '...';.
  • Grant SELECT on the tables you want to capture, plus CREATE on the database so the engine can create its publication.
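Assuming a database named appdb, a schema named public, and a role named cdc_user (all illustrative), the steps above reduce to roughly:

```sql
-- postgresql.conf (restart required)
-- wal_level = logical

-- as a superuser, connected to the source database
CREATE ROLE cdc_user LOGIN REPLICATION PASSWORD '...';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO cdc_user;
GRANT CREATE ON DATABASE appdb TO cdc_user;
```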

External MySQL

  • Enable the binary log in row format: log_bin=ON, binlog_format=ROW, binlog_row_image=FULL.
  • Create a user with REPLICATION SLAVE, REPLICATION CLIENT, and SELECT on the capture tables.
  • Keep binlog_expire_logs_seconds high enough that the pipeline can catch up after a restart.
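A sketch of the equivalent MySQL setup, with an illustrative database name (appdb) and user (cdc_user); the retention value is only an example:

```
# my.cnf / mysqld.cnf
log_bin                    = ON
binlog_format              = ROW
binlog_row_image           = FULL
binlog_expire_logs_seconds = 604800   # e.g. 7 days of catch-up headroom

-- as an administrator
CREATE USER 'cdc_user'@'%' IDENTIFIED BY '...';
GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'cdc_user'@'%';
GRANT SELECT ON appdb.* TO 'cdc_user'@'%';
```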

External MongoDB (Debezium only)

  • The source must be a replica set, because the oplog only exists on replica sets; single-node replica sets are fine for development.
  • Provide a user with read on the capture databases and clusterMonitor so Debezium can discover shards.
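In mongosh, such a user might be created like this (the user name and capture database are illustrative):

```
use admin
db.createUser({
  user: "cdc_user",
  pwd: "...",
  roles: [
    { role: "read", db: "inventory" },      // capture database(s)
    { role: "clusterMonitor", db: "admin" } // topology discovery
  ]
})
```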

Sink side

  • NATS JetStream (sibling): nothing to configure — Guara-managed NATS already exposes the credentials Debezium and Conduit need, and the bridge handles the rest.
  • Apache Kafka (external): paste the bootstrap-servers URL and the SASL username/password; SSL is negotiated if the broker advertises it.
  • HTTP webhook: paste the full URL; basic-auth username/password is sent on every request.

Quick start

The shortest path from nothing to a live stream:

  1. Deploy the source and sink catalog services first. For a typical flow you would deploy a PostgreSQL catalog service (source) and a NATS catalog service (sink) and wait for both to reach Running.
  2. Deploy Debezium or Conduit. In the wizard, pick sibling mode on both sides, select your Postgres service as the source, and your NATS service as the sink. Pick which tables to include in the tableFilter field if you only want a subset.
  3. Verify. Open the new service’s Insights tab — within a minute the replication-lag panel (Debezium) or throughput panel (Conduit) should show a non-zero value. Insert a row in PostgreSQL and watch it land on the configured NATS subject.
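For the verification step, a quick smoke test from psql might look like this (the orders schema is illustrative):

```sql
INSERT INTO orders (customer_id, status) VALUES (101, 'pending');
```

If you have the NATS CLI installed, running `nats sub` on the subject you configured in the wizard should print the corresponding change event within a second or two.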

Next steps