With a Dgraph enterprise license, you can use change data capture (CDC) to track data changes over time, including mutations and drops in your database. Dgraph’s CDC implementation lets you use Kafka or a local file as a sink to store CDC updates streamed by Dgraph Alpha leader nodes.
When CDC is enabled, Dgraph streams events for all `set` and `delete` mutations, except those that affect password fields, along with any drop events. Live Loader events are recorded by CDC, but Bulk Loader events aren’t.
CDC events are based on changes to Raft logs. So, if the sink is not reachable by the Alpha leader node, then Raft logs expand as events are collected on that node until the sink is available again. You should enable CDC on all Dgraph Alpha nodes to avoid interruptions in the stream of CDC events.
Kafka records CDC events under the `dgraph-cdc` topic. The topic must be created before events are sent to the broker. To enable CDC and sink events to Kafka, start Dgraph Alpha with the `--cdc` option and the Kafka sub-options it needs, for example `dgraph alpha --cdc "kafka=kafka-hostname; sasl-user=tstark; sasl-password=m3Ta11ic"`.
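If you use Kafka on the localhost without SASL authentication, you only need to specify the hostname and port used by Kafka in the `kafka` sub-option, for example `--cdc "kafka=localhost:9092"` when Kafka listens on its default port, 9092.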
If the Kafka cluster to which you are connecting requires TLS, the `ca-cert` option is required. Note that this certificate can be self-signed.
To enable CDC and sink results to a local unencrypted file, start Dgraph Alpha with the `--cdc` option and the `file` sub-option, for example `dgraph alpha --cdc "file=/sink-dir/cdc-file"`.
The `--cdc` option includes several sub-options that you can use to configure CDC when running the `dgraph alpha` command:
Sub-option | Example dgraph alpha command option | Notes |
---|---|---|
tls | --cdc "tls=false" | Boolean flag to enable or disable TLS while connecting to Kafka |
ca-cert | --cdc "ca-cert=/cert-dir/ca.crt" | Path and filename of the CA root certificate used for TLS encryption. If not specified and tls=true, Dgraph uses the system certificates |
client-cert | --cdc "client-cert=/c-certs/client.crt" | Path and filename of the client certificate used for TLS encryption |
client-key | --cdc "client-key=/c-certs/client.key" | Path and filename of the client certificate private key |
file | --cdc "file=/sink-dir/cdc-file" | Path and filename of a local file sink (alternative to Kafka sink) |
kafka | --cdc "kafka=kafka-hostname; sasl-user=tstark; sasl-password=m3Ta11ic" | Hostname(s) of the Kafka hosts. May require authentication using the sasl-user and sasl-password sub-options |
sasl-user | --cdc "kafka=kafka-hostname; sasl-user=tstark; sasl-password=m3Ta11ic" | SASL username for Kafka. Requires the kafka and sasl-password sub-options |
sasl-password | --cdc "kafka=kafka-hostname; sasl-user=tstark; sasl-password=m3Ta11ic" | SASL password for Kafka. Requires the kafka and sasl-user sub-options |
sasl-mechanism | --cdc "kafka=kafka-hostname; sasl-mechanism=PLAIN" | The SASL mechanism for Kafka (PLAIN, SCRAM-SHA-256 or SCRAM-SHA-512). The default is PLAIN |
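
The examples in the table all pass `--cdc` a single quoted string of `name=value` pairs separated by `; `. As a minimal sketch of assembling that string from a script, the following Python helper joins sub-options in that form and checks the two dependencies the Notes column states for `sasl-user` and `sasl-password`. The function names `build_cdc_option` and `build_alpha_argv` are hypothetical and not part of Dgraph; the sketch only builds the argument list and does not start Dgraph.

```python
from typing import Dict, List, Mapping, Set

# Dependencies taken from the Notes column of the sub-option table.
REQUIRED_WITH: Dict[str, Set[str]] = {
    "sasl-user": {"kafka", "sasl-password"},
    "sasl-password": {"kafka", "sasl-user"},
}


def build_cdc_option(sub_options: Mapping[str, str]) -> str:
    """Join sub-options into the "name=value; name=value" string given to --cdc."""
    if not sub_options:
        raise ValueError("at least one CDC sub-option is required")
    for name in sub_options:
        missing = REQUIRED_WITH.get(name, set()) - set(sub_options)
        if missing:
            raise ValueError(f"{name} also requires: {', '.join(sorted(missing))}")
    # Insertion order of the mapping is kept, so the output mirrors the input order.
    return "; ".join(f"{name}={value}" for name, value in sub_options.items())


def build_alpha_argv(sub_options: Mapping[str, str]) -> List[str]:
    """Return the argument list for `dgraph alpha`; nothing is executed here."""
    return ["dgraph", "alpha", "--cdc", build_cdc_option(sub_options)]


if __name__ == "__main__":
    print(build_alpha_argv({
        "kafka": "kafka-hostname",
        "sasl-user": "tstark",
        "sasl-password": "m3Ta11ic",
    }))
```

Returning a list rather than one shell string means the value can be handed to `subprocess.run` without extra quoting around the spaces and semicolons inside the `--cdc` value.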
CDC events are in JSON format. Most CDC events look like the following example:
The `Meta.Commit_Ts` value (which appears in the event JSON as `"meta":{"commit_ts":5}`) increases with each CDC event, so you can use this value to find duplicate events if those occur due to Raft leadership changes in your Dgraph Alpha group.
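One way a consumer might act on this is sketched below in Python. It assumes each event has already been parsed from JSON into a dictionary with a `meta.commit_ts` field, as described above; every other field in the sample data (`event`, `operation`, `attr`, `value`) is illustrative only. The helper name `drop_duplicate_events` is hypothetical. The sketch keys on the commit timestamp together with the serialized event, a cautious choice in case distinct events can share a commit timestamp, which this page does not confirm either way.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Set, Tuple


def drop_duplicate_events(events: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield each event once, skipping exact replays of an already seen event."""
    seen: Set[Tuple[int, str]] = set()
    for event in events:
        commit_ts = event["meta"]["commit_ts"]
        # sort_keys makes the fingerprint independent of key order in the JSON.
        fingerprint = json.dumps(event, sort_keys=True)
        key = (commit_ts, fingerprint)
        if key in seen:
            continue  # same commit_ts and same content: treat as a duplicate
        seen.add(key)
        yield event


if __name__ == "__main__":
    # Illustrative events; only meta.commit_ts is taken from the documentation.
    sample = [
        {"meta": {"commit_ts": 5}, "event": {"operation": "set", "attr": "counter.val", "value": 10}},
        {"meta": {"commit_ts": 5}, "event": {"operation": "set", "attr": "counter.val", "value": 10}},
        {"meta": {"commit_ts": 6}, "event": {"operation": "set", "attr": "counter.val", "value": 11}},
    ]
    for unique_event in drop_duplicate_events(sample):
        print(unique_event["meta"]["commit_ts"])
```

The set of seen keys grows without bound in this form; a long-running consumer would likely prune entries older than some commit timestamp.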
A `set` mutation event updating `counter.val` to 10 would look like the following:
Similarly, a `delete` mutation event that removes all values for the `Author.name` field for a specified node would look like the following:
CDC drop events look like the following example event for “drop all”:
The `operation` field specifies which drop operation (attribute, type, specified data, or all data) is tracked by the CDC event.
When you enable CDC in a multi-tenant environment, CDC events streamed to Kafka are distributed by the Kafka client. It distributes events between the available Kafka partitions based on their multi-tenancy namespace.
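To illustrate what namespace-based distribution implies for a consumer, the Python sketch below maps a namespace to a partition by hashing it and taking the remainder. This is an assumption for illustration, not the Kafka client's actual partitioner: CRC32 stands in for whatever hash the client uses, and `partition_for_namespace` is a hypothetical name. The property it demonstrates is that all events from one namespace land on the same partition.

```python
import zlib


def partition_for_namespace(namespace: int, num_partitions: int) -> int:
    """Pick a partition from the namespace so one tenant always maps to one partition."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # CRC32 is a stand-in hash; a real Kafka client may hash keys differently.
    key = str(namespace).encode("utf-8")
    return zlib.crc32(key) % num_partitions


if __name__ == "__main__":
    for ns in (0, 1, 2, 1):
        print(ns, partition_for_namespace(ns, num_partitions=3))
```

Under this kind of scheme, ordering is preserved per namespace within its partition, while different namespaces may share a partition when there are more tenants than partitions.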
CDC has the following known limitations:
`dgraph alpha` command