Hi -- thanks for your question!
There are many reasons why data import could take longer than desired. If the data is in a simple format like CSV, and the I/O performance of your cluster is reasonably high, the problem might simply be too few data source files. Xcalar parallelizes data import by using multiple threads of execution, one per data source file. If there are only two files, Xcalar will use only two threads to perform the work. One way you might be able to speed up the import is to break the dataset up into a larger number of smaller files.
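If you don't already have a tool for splitting the data, here is a minimal Python sketch of one way to do it for CSV. The function name `split_csv`, the output naming scheme, and the `rows_per_file` parameter are illustrative choices of mine, not part of Xcalar. It assumes the first row is a header and repeats it in every output file.

```python
import csv
from pathlib import Path


def split_csv(src, out_dir, rows_per_file):
    """Split one CSV into numbered files of at most rows_per_file data rows.

    The header row is copied to the top of every output file.
    Returns the list of paths that were written.
    """
    if rows_per_file < 1:
        raise ValueError("rows_per_file must be at least 1")
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    out_file = None
    writer = None
    rows_in_part = 0

    with src.open(newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:  # empty input: nothing to split
            return written
        try:
            for row in reader:
                # Start a new part on the first row or when the current part is full.
                if writer is None or rows_in_part == rows_per_file:
                    if out_file is not None:
                        out_file.close()
                    part = out_dir / f"{src.stem}_part{len(written):04d}.csv"
                    out_file = part.open("w", newline="")
                    writer = csv.writer(out_file)
                    writer.writerow(header)
                    written.append(part)
                    rows_in_part = 0
                writer.writerow(row)
                rows_in_part += 1
        finally:
            if out_file is not None:
                out_file.close()
    return written
```

Pick `rows_per_file` so that the number of output files is comfortably larger than two; the right number depends on your cluster, so it is worth trying a few values.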
Two other possible causes of slow data import are:
A complex transformation executing in an import UDF, which could be CPU-bound or could consume a large amount of memory. This may require tuning the import UDF.
Configurations that result in poor I/O performance, such as a multi-node cluster attached to a shared filesystem via a single 1Gb network link. In such a configuration, importing 40GB of data will take at least 5 minutes (40GB is 320Gb, which needs 320 seconds at 1Gb per second), even if there is no other load on the network and the shared storage server is otherwise idle. This holds regardless of the number of nodes in the cluster, because all of the data has to pass through the same link.
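To check whether a shared link like the one in the second item is your bottleneck, you can work out the best-case transfer time yourself. This is a rough sketch with a made-up helper name; it ignores protocol overhead and assumes decimal units (1GB = 8Gb), so real imports will be somewhat slower than the figure it gives.

```python
def min_transfer_seconds(data_gigabytes, link_gigabits_per_second):
    """Lower bound on the time to move the data over a single network link."""
    data_gigabits = data_gigabytes * 8  # 8 bits per byte
    return data_gigabits / link_gigabits_per_second


# 40GB over a single 1Gb link
print(min_transfer_seconds(40, 1))  # 320.0 seconds, a little over 5 minutes
```

If your observed import time is close to this lower bound, adding nodes or files won't help; the link itself needs to be faster.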
Hope this helps!