There are essentially two ways of talking to an HDFS cluster: 1) by implementing the native Hadoop RPC (i.e. speaking the HDFS protocol, which depends on the HDFS RPC version), or 2) over HTTP, which is RPC-version agnostic.
Two clients implement the Hadoop RPC and let you talk HDFS:
1) The Java client (i.e. hadoop fs). The problem is that it is available only on nodes within the Hadoop cluster.
2) The Snakebite client. The problem is that it doesn't support transparent HDFS encryption.
To let applications outside the cluster access HDFS data in an RPC-version-agnostic way, newer versions of HDFS introduced the WebHDFS protocol. There are two ways to use it:
1) You can talk WebHDFS directly to the HDFS cluster. More specifically, you talk WebHDFS to the namenodes, which redirect data reads and writes to the datanodes.
2) You can set up a proxy node, called an HttpFs node, which speaks WebHDFS on one side and translates it to native HDFS RPC on the other side.
Talking WebHDFS directly to the namenodes is preferred because it avoids the bottleneck at the HttpFs node. Cloudera mitigates this bottleneck by letting you run multiple HttpFs nodes behind a load balancer, but that still incurs an additional network hop, since data is streamed through the HttpFs nodes instead of directly from the datanodes.
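Since both options speak the same WebHDFS REST interface, a client mostly differs in which host and port it targets. A minimal sketch of building such a URL follows. The hostnames are hypothetical, and the ports are the commonly used defaults (50070 for the Hadoop 2.x namenode web port, 14000 for HttpFs), so treat them as assumptions to check against the actual cluster configuration.

```python
from urllib.parse import quote, urlencode

# Assumed default ports; adjust to the cluster's actual configuration.
NAMENODE_HTTP_PORT = 50070  # Hadoop 2.x namenode web port (9870 in Hadoop 3.x)
HTTPFS_PORT = 14000         # usual HttpFs port


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL for an absolute HDFS path."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # 'op' selects the operation (e.g. OPEN, LISTSTATUS); extras go in params.
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical hosts, for illustration only.
direct = webhdfs_url("namenode.example.com", NAMENODE_HTTP_PORT,
                     "/data/part-0.csv", "OPEN")
proxied = webhdfs_url("httpfs.example.com", HTTPFS_PORT,
                      "/data/part-0.csv", "OPEN")
print(direct)
print(proxied)
```

With the direct URL, an OPEN request to the namenode is answered with a redirect to a datanode that serves the bytes. With the proxied URL, the HttpFs node streams the data itself, which is the extra hop described above.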
Unfortunately, talking WebHDFS directly to the namenodes is not always an option, because a bug introduced in Java 8 broke the SPNEGO protocol, and because WebHDFS can't impersonate other users and will always act as the "hdfs" user.
Thus, Xcalar has both HttpFs and WebHDFS as possible data target types. Whenever possible, we should try WebHDFS first, then fall back to HttpFs.
However, older versions of Hadoop have neither WebHDFS nor HttpFs, and we can only talk native HDFS RPC to them. For that, we have our snakebite client. So the last fallback is our snakebite data target type, which shows up as "Unsecured HDFS" in Xcalar.
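The fallback order described above can be sketched as a simple preference list. This is only an illustration: the target-type identifiers and the function name are hypothetical, not the names Xcalar actually uses.

```python
# Preference order from the text, most preferred first.
# These identifiers are hypothetical labels for the three data target types.
PREFERENCE = ("webhdfs", "httpfs", "unsecured_hdfs")


def pick_target_type(available):
    """Return the most preferred target type present in `available`."""
    for target in PREFERENCE:
        if target in available:
            return target
    raise ValueError("no supported HDFS access method available")


# A cluster where direct WebHDFS is not usable but HttpFs is:
print(pick_target_type({"httpfs", "unsecured_hdfs"}))
```

Keeping the order in one tuple means that dropping "Unsecured HDFS" later is a one-line change.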
Unfortunately, the snakebite client is not available for Python 3, which means "Unsecured HDFS" might be removed once we've moved to Python 3. This seems OK so far, since only really old versions of Hadoop lack WebHDFS, and we haven't seen any of those yet.