Hi Dale,
This is a good question! You would do this by writing an Import UDF. We have several examples of Import UDFs that you can search for on this discussion group. But let me take a stab at this:
Here is example UDF code for this in Python:
import json
import io
import apache_log_parser

def parseAccessLog(fullPath, inStream):
    # Build a parser matching the server's access log format
    line_parser = apache_log_parser.make_parser("%h %l %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"")
    for line in inStream.readlines():
        try:
            logData = line_parser(line)
        except Exception:
            # Skip lines that fail to parse. We recommend you do some
            # processing here and generate an ICV row instead.
            continue
        yield logData
Notice the use of yield above. yield is a Python keyword that turns the function into a generator: instead of building the whole result in memory, the function hands records back one at a time to the consumer of the UDF, which in this case is the Xcalar Compute Environment (XCE). Each log line is parsed into a dictionary, and yield emits that dictionary as the next record in the stream.
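To make the generator idea concrete, here is a minimal sketch, independent of Xcalar and apache_log_parser (the makeRecords function and its fields are just for illustration), showing how a generator yields one dictionary per input line:

import io

def makeRecords(lines):
    # A generator: each yield hands one record (a dict) back to the caller
    for i, line in enumerate(lines):
        yield {"lineNumber": i, "text": line.strip()}

# The caller iterates over the records as they are produced
for record in makeRecords(["GET /index.html", "POST /login"]):
    print record

This is exactly how XCE consumes your Import UDF: it iterates over the generator and turns each dictionary into a row.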
If you need to test your UDF with a main driver, make sure you pass it an open input stream, just as XCE would. Here is example code for this:
# Open the log file as a stream and run the UDF over it
inStream = io.open("access_log")
gen = parseAccessLog("", inStream)
for record in gen:
    print "Printing record..."
    for fieldName, fieldValue in record.iteritems():
        print "field {}: value {}".format(fieldName, fieldValue)
Once you have written your UDF, you can test it using Xcalar Design (XD). Use the Point to Data Source feature and browse to the dataset that needs to be streamed. Choose the JSON format for conversion.
Once you have tested the UDF in a command-line shell, you can paste it into the UDF editor in Xcalar and upload it. You can then perform the import using the Import Data Source feature.
Hope this helps!
Cheers,
Manoj