Great question. Yes you can use Xcalar to determine if two tables are identical. Several GBs per table is no problem, sizewise. Here’s how:
- Add your Excel file as a data source, name the new table ExcelTab.
- Add a column to your Excel file by right-clicking on a column header and selecting Add a column → On the Right/Left. Name it InExcel.
- Assign a string value of “Excel” to InExcel using the Formula Bar formula =map(string(“Excel”)).
- Repeat the above steps to your Xcalar modeling dataflow output table, creating a column InXcalar with value Xcalar.
- If you have a common primary key field in the two tables, right-click on the key field's column from XcalarTab, and select Join....
- Choose the Join Type: Full Outer Join.
- Select ExcelTab as the 2nd table. Select the column in ExcelTab that matches the column in XcalarTab.If you do not have a primary key, instead, you will need to join all of the columns in the table, except InExcel and InXcalar. Select the Add Another Clause icon and add the 2nd column from XcalarTab and ExcelTab. Repeat the previous step for all columns in XcalarTab and ExcelTab except InXcalar and InExcel.
- Click NEXT.
- Type in the name for your new table and click JOIN TABLES.
- Next, right click on the InXcalar column in your new table and select Filter..., and select Filter Function: neq and Operand 2(b) as Xcalar. Click the + Additional Condition icon and add a second filter so that InExcel is not equal to Excel. Confirm that the two clauses are combined using the logical OR operation and click FILTER. This removes all rows that are in both tables, XcalarTab and ExcelTab.
- If any rows remain, then the two tables are not identical. You will notice that remaining rows will have the value FNF (field not found) in either the InExcel or InXcalar column. This means that when the table was joined, this row came from the Excel or Xcalar table, and there was no equivalent row in the table with the FNF value.
This process also identifies each difference between the two tables.
Note: This operation is memory intensive, but shouldn’t pose a challenge for your Xcalar cluster. If you run into any concerns with memory, schedule a time to execute this operation when the rest of your team has suspended their Xcalar Workbooks.
Let me know if you have any questions regarding this.