The context and implications of CDC in Xcalar differ from those in an RDBMS because Xcalar does not persist the originating data; it imports the data from external source systems.
Xcalar's ability to scale and process data with tremendous parallelism lets you apply many concurrent resources to the problem of deducing and distilling what has changed.
If the source systems truly update the data in place and do not persist any indicator in the data (such as an update timestamp or version number), then you should use Xcalar to compare versions of the data and deduce what has changed.
To do that, you need two versions of the data that can be compared.
Xcalar is extremely helpful in the case of semi-structured data, because you really need to convert the data into rows and columns so that you can perform a reasonable comparison on a row-by-row and column-by-column basis.
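As a rough illustration of what converting semi-structured data into rows and columns means, the sketch below flattens a nested JSON record into a single flat row. It is plain Python rather than Xcalar's own import mechanism, and the `flatten` helper, the dotted column names, and the sample record are all assumptions made for the example.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into one flat row with dotted column names."""
    row = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, extending the column prefix.
            row.update(flatten(value, prefix=f"{column}."))
        else:
            row[column] = value
    return row


# Hypothetical semi-structured input record.
raw = '{"id": 7, "customer": {"name": "Ada", "address": {"city": "Oslo"}}}'
row = flatten(json.loads(raw))
```

Once every record has the same flat shape, two versions of the data can be lined up column by column.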
Start by using keys to uniquely identify the rows to be compared. If you do have metadata to help (such as a timestamp or version number), then compare that information first.
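A minimal sketch of the key-plus-metadata approach follows, again in plain Python rather than Xcalar operators. The column names `id` and `updated_at`, the function name, and the sample rows are hypothetical; the point is only that rows are matched on a key and then flagged when their metadata marker differs.

```python
def changed_keys_by_metadata(old_rows, new_rows, key="id", marker="updated_at"):
    """Return keys whose metadata marker differs between the two versions."""
    # Index the old version by key so each new row can find its counterpart.
    old_by_key = {row[key]: row for row in old_rows}
    changed = []
    for row in new_rows:
        previous = old_by_key.get(row[key])
        # Rows with no counterpart are inserts, not updates; skip them here.
        if previous is not None and previous[marker] != row[marker]:
            changed.append(row[key])
    return changed


old_version = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-01-01"},
]
new_version = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-02-15"},
    {"id": 3, "updated_at": "2024-02-15"},
]
changed = changed_keys_by_metadata(old_version, new_version)
```

In this sample only the row with key 2 is reported as changed; the row with key 3 is new and would be handled as an insert.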
If you have no metadata, then you need to do a systematic field-by-field comparison.
The Xcalar map eq operator is very useful here: it gives you a boolean result, after which you filter for the false values. If you have a sense of which fields change most frequently, compare those fields first so that you can identify the changed rows as early as possible in your comparison.
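The compare-then-filter pattern can be sketched as follows. This is plain Python standing in for the map and filter steps, not Xcalar's actual operator syntax, and the function names, field names, and sample rows are assumptions. The field list is ordered from most to least volatile, and `all()` stops at the first mismatch, so a changed row is detected as early as possible.

```python
def rows_equal(previous, current, fields):
    """Equality check over the given fields; stops at the first difference."""
    return all(previous.get(f) == current.get(f) for f in fields)


def find_changed_keys(old_rows, new_rows, key, fields_by_volatility):
    """Return keys of rows whose compared fields differ between versions."""
    old_by_key = {row[key]: row for row in old_rows}
    # "Map" step: attach a boolean equality result to every matched row.
    flagged = [
        (row[key], rows_equal(old_by_key[row[key]], row, fields_by_volatility))
        for row in new_rows
        if row[key] in old_by_key
    ]
    # "Filter" step: keep only the rows where the comparison was False.
    return [row_key for row_key, same in flagged if not same]


old_version = [
    {"id": 1, "status": "open", "price": 10, "name": "a"},
    {"id": 2, "status": "open", "price": 20, "name": "b"},
]
new_version = [
    {"id": 1, "status": "open", "price": 10, "name": "a"},
    {"id": 2, "status": "closed", "price": 20, "name": "b"},
]
# Most frequently changing fields first (an assumed ordering).
changed = find_changed_keys(old_version, new_version, "id", ["status", "price", "name"])
```

Putting the volatile fields first does not change the result, only how much work is done per changed row.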