Dear mlarson,
You can attach headers to source data files in Xcalar Design during the import phase that creates a dataset.
Directory: /var/local/datasets/credit
Files: transactions.csv, transactions.hdr
The header file contains a single line with column headers.
Both the header file and the data file are in CSV format with "|" as the field separator.
A Python User-Defined Function (UDF) is invoked by Xcalar to attach the column headers.
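Before writing the UDF, it can help to confirm that the header file and the data file agree on the number of columns. The following is a minimal sketch, not part of Xcalar: the function name check_columns and the sample rows are hypothetical, and it assumes the .hdr file sits next to the data file as described above.

```python
import os
import tempfile

def check_columns(data_path, sep="|"):
    """Return (header_count, set of field counts seen in the data file)."""
    # The header file shares the data file's path and base name
    hdr_path = os.path.splitext(data_path)[0] + ".hdr"
    with open(hdr_path) as hdr_file:
        header_count = len(hdr_file.readline().rstrip("\r\n").split(sep))
    with open(data_path) as data_file:
        field_counts = {
            len(line.rstrip("\r\n").split(sep))
            for line in data_file
            if line.strip()  # skip blank lines
        }
    return header_count, field_counts

# Hypothetical sample files, written to a temporary directory
demo_dir = tempfile.mkdtemp()
data_path = os.path.join(demo_dir, "transactions.csv")
with open(os.path.join(demo_dir, "transactions.hdr"), "w") as f:
    f.write("id|amount|merchant\n")
with open(data_path, "w") as f:
    f.write("1|19.99|Acme\n2|5.00|Globex\n")

header_count, field_counts = check_columns(data_path)
print(header_count, field_counts)
```

If the set of field counts contains anything other than the header count, some rows will not line up with the column names.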
Create a file called attach_headers.py using any text editor, with these contents:
import os

def attach_headers(inPath, inStream):
    headers = []
    # The header file has the same path and base name, with a .hdr extension
    hdrFilename = os.path.splitext(inPath)[0] + '.hdr'
    # Read the first line of the header file and parse its fields
    try:
        with open(hdrFilename, 'r') as hdrFile:
            firstLine = hdrFile.readline().rstrip('\r\n')
            if firstLine:
                headers = firstLine.split('|')
    except IOError:  # ignore if there is no header file
        pass
    # Build a Python dictionary for each record, with column name as key
    # and data field as value. Yield one record at a time in this loop.
    for line in inStream:
        fields = line.rstrip('\r\n').split('|')
        record = {}
        for i, field in enumerate(fields):
            # Fall back to a generated name when no header covers this column
            name = headers[i] if i < len(headers) else 'column' + str(i)
            record[name] = field
        yield record
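To try the UDF outside Xcalar before uploading it, a small local harness along these lines may be useful. It is only a sketch: it assumes Xcalar calls the function with the file's full path and an iterable of text lines, as the signature suggests, and mimics that with an ordinary open file. The sample rows are hypothetical, and a condensed variant of the UDF is repeated so the block runs on its own.

```python
import os
import tempfile

def attach_headers(inPath, inStream):
    # Condensed variant of the UDF, repeated so this sketch is standalone
    headers = []
    try:
        with open(os.path.splitext(inPath)[0] + ".hdr") as hdrFile:
            firstLine = hdrFile.readline().rstrip("\r\n")
            if firstLine:
                headers = firstLine.split("|")
    except IOError:  # no header file: use generated column names
        pass
    for line in inStream:
        fields = line.rstrip("\r\n").split("|")
        yield {
            (headers[i] if i < len(headers) else "column" + str(i)): field
            for i, field in enumerate(fields)
        }

# Hypothetical sample data; the second row has one extra field
workDir = tempfile.mkdtemp()
dataPath = os.path.join(workDir, "transactions.csv")
with open(os.path.join(workDir, "transactions.hdr"), "w") as f:
    f.write("id|amount|merchant\n")
with open(dataPath, "w") as f:
    f.write("1|19.99|Acme\n2|5.00|Globex|extra\n")

# Stand-in for how Xcalar is assumed to invoke the UDF
with open(dataPath) as stream:
    records = list(attach_headers(dataPath, stream))

for record in records:
    print(record)
```

The extra field in the second row ends up under a generated name (column3) rather than raising an error, which makes misaligned rows easy to spot in the preview.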
- Create a new workbook or reopen an existing one in Xcalar Design
- Select the Import Datasets icon in the left panel
- Select file:/// from the Data Source Protocol pulldown
- Browse to /var/local/datasets/credit and click Next to select the entire directory
- Select the Parse Data With UDF checkbox
- Click the UDF icon in the left panel (shaped like a scroll)
- Click Browse File, select attach_headers.py, name the module attach_headers and click ADD UDF
- Now choose attach_headers from the Parse Data With UDF dialog
- Click refresh preview to apply the UDF and check for correct header/data output
- Select CREATE TABLE to create the Transactions virtual table
The Transactions table can then be analyzed and transformed in Xcalar Design, and exported as a CSV file with a header row.