Since Xcalar is itself a compute platform, most of our customers do not use Hive for heavy-duty compute queries; Xcalar handles that work directly. However, you can still connect to your Hive data lake from Xcalar and run a low-bandwidth Hive SQL query to ingest small amounts of data for sampling.
And you guessed it right: we do not recommend executing high-bandwidth queries through Hive, as Hive can take a while to compute results. If your use case requires ingesting data in parallel at very high bandwidth, you can do this easily by connecting directly to the HDFS layer that the Hive metastore references. This is the preferred option for high-bandwidth or high-throughput imports.
To answer your question, here are the steps involved in setting up a connection to your Hive data lake.
Step 1: Install Hive Package
The first step is to install the Hive client packages on each node of the Xcalar cluster. Xcalar can also provide you with the Python wheels to make this easier, which is especially useful if you do not have internet access to the Python package repositories. Note that you will need to be root to execute the following.
apt-get install libsasl2-dev
pip3 install sasl
pip3 install --ignore-installed six thrift # avoids conflicts with preinstalled six/distutils
pip3 install thrift-sasl
pip3 install pyhive
Step 2: Restart Xcalar Cluster
This step requires you to be an Xcalar administrator. Restart the Xcalar cluster.
Step 3: Test Hive Import
You can log in to Xcalar Design and open a Jupyter notebook to test whether you can import pyhive.
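For example, a minimal sketch of such a check in a notebook cell, verifying that the packages installed in Step 1 resolve (the module names below are the import names of the pip packages above):

```python
import importlib.util

# Check that each Hive client package from Step 1 is importable.
for module in ("sasl", "thrift", "thrift_sasl", "pyhive"):
    found = importlib.util.find_spec(module) is not None
    print(f"{module}: {'OK' if found else 'MISSING'}")
```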
If the above succeeds, you are all set.
Step 4: Setup Hive Connection
This step requires you to be an Xcalar administrator. To connect to your Hive data lake, use the Database Connector Target. For instance, here is an example configuration.
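The exact fields vary by Xcalar version, but a Hive target configuration generally needs values along these lines (the host, port, and database shown are placeholder assumptions; only the pswprovider_ and pswarguments_ fields are taken from this guide):

Host: hive2.example.com   (your HiveServer2 hostname)
Port: 10000               (default HiveServer2 Thrift port)
Database: default
Auth: CUSTOM              (or NONE / LDAP / KERBEROS, per your setup)
pswprovider_: password server hostname, if used
pswarguments_: password retrieval function name, if used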
The above assumes that you are not using a password server. If you do use one, specify the hostname of the password server in the pswprovider_ field and the name of the function that retrieves the password in the pswarguments_ field. The authentication mode is CUSTOM, assuming you have implemented a CUSTOM authenticator.
Press Add to add this Data Target.
Step 5: Test Hive Connection
On the Import Data Source screen, select the Hive target you just created and press Next. On the next screen, you can enter an example SQL query to test your Hive connection.
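Any small, bounded query will do for this test; for instance (the table name here is hypothetical), a LIMIT clause keeps the sample cheap:

SELECT * FROM default.web_logs LIMIT 10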