Last updated: April 23, 2026
Change Data Capture on Guara Cloud
Guara Cloud ships two complementary catalog services for moving data between your systems in near real time: Debezium Server for log-based Change Data Capture (CDC) from your databases and Conduit for general-purpose streaming between databases, message buses, and HTTP endpoints.
What is CDC?
Change Data Capture turns the writes already happening inside your database into a stream of events. Instead of running a nightly SELECT * dump or hand-rolling “after save” hooks in every service, a CDC engine reads the transaction log (the PostgreSQL WAL, the MySQL binlog, the MongoDB oplog) and emits one structured event per insert, update, and delete.
The practical difference: CDC tails the same log the database uses to replicate to its own standbys, so you pick up every committed change in commit order, with low lag and little extra load on the source. Delivery is at-least-once, so downstream consumers should be idempotent. Whenever an order is inserted into your PostgreSQL orders table, an event lands on the sink within a fraction of a second — no polling, no dual writes, no missed rows during a migration window.
Downstream, consumers do whatever they want with that stream: keep a search index fresh, invalidate caches by key, mirror writes into an analytics store, ship notifications to a webhook, hydrate a read model. The source database does not know and does not care.
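To make that concrete, here is a minimal sketch of one such consumer: a Go cache invalidator reading from NATS JetStream. It assumes the pipeline delivers Debezium-style envelopes (an `op` code plus `before`/`after` row images) on a subject named `cdc.public.orders`; the subject name and envelope shape are illustrative, not fixed by Guara.

```go
package main

import (
	"encoding/json"
	"log"

	"github.com/nats-io/nats.go"
)

// changeEvent mirrors the core of a Debezium-style envelope:
// op is "c" (create), "u" (update), or "d" (delete).
type changeEvent struct {
	Op     string          `json:"op"`
	Before json.RawMessage `json:"before"`
	After  json.RawMessage `json:"after"`
}

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Durable subscription: resumes where it left off after a restart.
	_, err = js.Subscribe("cdc.public.orders", func(m *nats.Msg) {
		var ev changeEvent
		if err := json.Unmarshal(m.Data, &ev); err != nil {
			log.Printf("skipping malformed event: %v", err)
			m.Ack()
			return
		}
		switch ev.Op {
		case "c", "u":
			log.Printf("refresh cache from new row image: %s", ev.After)
		case "d":
			log.Printf("evict cache entry for old row image: %s", ev.Before)
		}
		// Delivery is at-least-once, so this handler must stay idempotent.
		m.Ack()
	}, nats.Durable("cache-invalidator"), nats.ManualAck())
	if err != nil {
		log.Fatal(err)
	}
	select {} // keep the consumer running
}
```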
Debezium vs Conduit
Both services run as one pod per pipeline, both bridge sibling catalog services automatically, and both give you an Insights tab. They differ in the shape of the problems they handle best.
| You want to… | Pick |
|---|---|
| CDC from PostgreSQL, MySQL, or MongoDB to NATS, Kafka, or HTTP | Debezium |
| Lightweight Go runtime, no JVM, Hobby tier okay | Conduit |
| Stream from a non-database source (NATS subject, HTTP endpoint) | Conduit |
| Cross-database sync (e.g. Postgres to MySQL) | Conduit |
| Strip PII fields with an in-line processor | Conduit |
| Battle-tested snapshot + streaming on a production OLTP database | Debezium |
| Apache Kafka sink with SASL/SSL pass-through | Either |
Rule of thumb: if the problem is “I have a database, I want every row change as an event,” Debezium is the tested default. If the problem is “I have a pipeline between two systems, one of which might not be a database,” reach for Conduit.
Source and sink matrix
The two engines cover overlapping but not identical ground.
| | Debezium | Conduit |
|---|---|---|
| Source: PostgreSQL | Yes (logical WAL, pgoutput) | Yes (logical WAL, built-in connector) |
| Source: MySQL | Yes (binlog, row format) | Yes (standalone connector) |
| Source: MongoDB | Yes (oplog, replica set) | Not in v1 — use Debezium |
| Source: NATS JetStream | No | Yes (standalone connector) |
| Source: HTTP | No | Yes (standalone connector) |
| Sink: NATS JetStream | Yes | Yes |
| Sink: Apache Kafka | Yes (external) | Yes (built-in connector) |
| Sink: PostgreSQL | No | Yes |
| Sink: MySQL | No | Yes |
| Sink: HTTP webhook | Yes | Yes |
Binding modes: sibling vs external
Both services let you point each side of the pipeline (source and sink) at either:
- Sibling catalog service — the wizard shows a picker listing every eligible catalog service already deployed in this project. Credentials, host, port, database, and URL are injected into the CDC pod as `BRIDGED_SOURCE_*`/`BRIDGED_SINK_*` env vars automatically — there is nothing to paste.
- External URL — you paste a full URL plus a username and password. Useful for an RDS instance outside Guara, a managed Kafka cluster, a webhook you already own, or anything else off-platform. The values land in a per-service Secret and are loaded into the pod as `SOURCE_EXTERNAL_*`/`SINK_EXTERNAL_*` env vars.
You can mix modes freely: sibling source with external sink, external source with sibling sink, or both sides external. The two sides never talk to each other through Guara’s internals — they only see the env vars the orchestrator injects.
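As an illustration of what a pipeline pod sees, the sketch below resolves the source side from those variables. Only the `BRIDGED_SOURCE_*` and `SOURCE_EXTERNAL_*` prefixes come from this page; the specific suffixes (`_HOST`, `_USER`, and so on) are assumptions made for the example.

```go
package main

import (
	"fmt"
	"os"
)

// resolveSource prefers the sibling-bridge vars and falls back to the
// external ones. The suffixes below are assumed for illustration; only
// the prefixes are documented.
func resolveSource() (dsn string, mode string) {
	if host := os.Getenv("BRIDGED_SOURCE_HOST"); host != "" {
		return fmt.Sprintf("postgres://%s:%s@%s:%s/%s",
			os.Getenv("BRIDGED_SOURCE_USER"),
			os.Getenv("BRIDGED_SOURCE_PASSWORD"),
			host,
			os.Getenv("BRIDGED_SOURCE_PORT"),
			os.Getenv("BRIDGED_SOURCE_DATABASE"),
		), "sibling"
	}
	return os.Getenv("SOURCE_EXTERNAL_URL"), "external"
}

func main() {
	dsn, mode := resolveSource()
	fmt.Printf("source mode=%s dsn=%s\n", mode, dsn)
}
```

The sink side follows the same precedence with the `BRIDGED_SINK_*`/`SINK_EXTERNAL_*` prefixes.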
Prerequisites
Sibling PostgreSQL
Guara-managed PostgreSQL already ships with everything CDC needs:
- `wal_level=logical` is enabled at startup.
- The `dbuser` role is granted the `REPLICATION` attribute during init.
Deploy a PostgreSQL catalog service, deploy Debezium or Conduit on top of it, and CDC works with no extra steps. Existing PostgreSQL catalog services pick up the flags after their next pod restart.
External PostgreSQL
Outside Guara, the database administrator needs to configure the source before you point a pipeline at it (a preflight check follows the list):
- Set `wal_level=logical` in `postgresql.conf` (requires a restart).
- Create a role with both `LOGIN` and `REPLICATION` attributes: `CREATE ROLE cdc_user LOGIN REPLICATION PASSWORD '...';`.
- Grant `SELECT` on the tables you want to capture, plus `CREATE` on the database so the engine can create its publication.
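That preflight check, as a minimal Go sketch assuming the `github.com/lib/pq` driver; the DSN is illustrative:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	db, err := sql.Open("postgres",
		"postgres://cdc_user:secret@db.example.com:5432/app?sslmode=require")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// wal_level must report "logical" for CDC to work.
	var walLevel string
	if err := db.QueryRow(`SELECT current_setting('wal_level')`).Scan(&walLevel); err != nil {
		log.Fatal(err)
	}

	// The connecting role needs the REPLICATION attribute.
	var canReplicate bool
	if err := db.QueryRow(
		`SELECT rolreplication FROM pg_roles WHERE rolname = current_user`,
	).Scan(&canReplicate); err != nil {
		log.Fatal(err)
	}

	fmt.Printf("wal_level=%s replication=%v\n", walLevel, canReplicate)
}
```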
External MySQL
- Enable the binary log in row format: `log_bin=ON`, `binlog_format=ROW`, `binlog_row_image=FULL`.
- Create a user with `REPLICATION SLAVE`, `REPLICATION CLIENT`, and `SELECT` on the capture tables.
- Keep `binlog_expire_logs_seconds` high enough that the pipeline can catch up after a restart (a settings check follows this list).
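That check, sketched with the `github.com/go-sql-driver/mysql` driver and assuming MySQL 8.0 or later (where `binlog_expire_logs_seconds` lives); the DSN is illustrative:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	db, err := sql.Open("mysql", "cdc_user:secret@tcp(db.example.com:3306)/")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Expect log_bin=1, binlog_format=ROW, binlog_row_image=FULL,
	// and a comfortably large retention window.
	var logBin, format, rowImage, expire string
	err = db.QueryRow(
		"SELECT @@log_bin, @@binlog_format, @@binlog_row_image, @@binlog_expire_logs_seconds",
	).Scan(&logBin, &format, &rowImage, &expire)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("log_bin=%s binlog_format=%s binlog_row_image=%s expire=%ss\n",
		logBin, format, rowImage, expire)
}
```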
External MongoDB (Debezium only)
- The source must be a replica set (a single-node replica set is fine for development) — oplog access is only available on replica sets.
- Provide a user with `read` on the capture databases and `clusterMonitor` so Debezium can discover shards (a connectivity check follows this list).
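That check, as a sketch assuming the official `go.mongodb.org/mongo-driver` package (v1); the URI is illustrative:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	client, err := mongo.Connect(ctx,
		options.Client().ApplyURI("mongodb://cdc_user:secret@db.example.com:27017"))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)

	// "hello" reports setName only when the node belongs to a replica
	// set, which is the prerequisite for oplog-based capture.
	var res bson.M
	if err := client.Database("admin").
		RunCommand(ctx, bson.D{{Key: "hello", Value: 1}}).Decode(&res); err != nil {
		log.Fatal(err)
	}
	if setName, ok := res["setName"]; ok {
		fmt.Printf("replica set: %v\n", setName)
	} else {
		fmt.Println("standalone node: not eligible for CDC")
	}
}
```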
Sink side
- NATS JetStream (sibling): nothing to configure — Guara-managed NATS already exposes the credentials Debezium and Conduit need, and the bridge handles the rest.
- Apache Kafka (external): paste the bootstrap-servers URL and the SASL username/password; SSL is negotiated if the broker advertises it.
- HTTP webhook: paste the full URL; the basic-auth username/password is sent on every request (a minimal receiver sketch follows).
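The receiving end is plain HTTP: validate the basic-auth header, read the body, respond 2xx. A minimal stand-in sink in Go, with illustrative credentials, path, and port:

```go
package main

import (
	"io"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/events", func(w http.ResponseWriter, r *http.Request) {
		// The pipeline sends basic auth on every request.
		user, pass, ok := r.BasicAuth()
		if !ok || user != "cdc" || pass != "secret" {
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		body, err := io.ReadAll(r.Body)
		if err != nil {
			w.WriteHeader(http.StatusBadRequest)
			return
		}
		log.Printf("event: %s", body)
		w.WriteHeader(http.StatusNoContent) // respond 2xx to acknowledge receipt
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```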
Quick start
- Debezium — log-based CDC from PostgreSQL, MySQL, or MongoDB, battle-tested against production OLTP databases.
- Conduit — lightweight Go streaming between databases, NATS, Kafka, and HTTP, with in-line PII scrubbing.
The shortest path from nothing to a live stream:
- Deploy the source and sink catalog services first. For a typical flow you would deploy a PostgreSQL catalog service (source) and a NATS catalog service (sink) and wait for both to reach Running.
- Deploy Debezium or Conduit. In the wizard, pick `sibling` mode on both sides, select your Postgres service as the source, and your NATS service as the sink. Pick which tables to include in the `tableFilter` field if you only want a subset.
- Verify. Open the new service’s Insights tab — within a minute the replication-lag panel (Debezium) or throughput panel (Conduit) should show a non-zero value. Insert a row in PostgreSQL and watch it land on the configured NATS subject (a smoke-test sketch follows).
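That smoke test, scripted in Go; the connection strings, the `orders` columns, and the `cdc.public.orders` subject are illustrative, not fixed by Guara:

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq"
	"github.com/nats-io/nats.go"
)

func main() {
	// Subscribe to the sink subject first so the event cannot race past us.
	nc, err := nats.Connect("nats://nats.internal.example:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()
	sub, err := nc.SubscribeSync("cdc.public.orders")
	if err != nil {
		log.Fatal(err)
	}

	// Write one row to the source.
	db, err := sql.Open("postgres",
		"postgres://dbuser:secret@pg.internal.example:5432/app?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	if _, err := db.Exec(
		`INSERT INTO orders (customer, total) VALUES ('smoke-test', 1)`); err != nil {
		log.Fatal(err)
	}

	// The change event should arrive within seconds if the pipeline is live.
	msg, err := sub.NextMsg(15 * time.Second)
	if err != nil {
		log.Fatal("no event arrived: ", err)
	}
	log.Printf("event landed: %s", msg.Data)
}
```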