Accessing HBase via Spotfire

It works! You really can access HBase from Spotfire. Even though it's a dirt simple chart accessing a handful of rows, it's enough to prove out that HBase data is available via Spotfire.

For me, part of the roadblock to using Hadoop technologies was getting data out of Hadoop for end-users to consume. Sure, I could write some custom front end site leveraging D3.js to do the graphical representation. I even started down this path using node.js to access HBase

For an environment where Spotfire and other reporting tools are used, I wanted to hook those tools up to Hadoop and visualize my data rather than building custom sites to display data visually.

Rather than using Hive, I wanted to use HBase for the faster retrieval times and also the easier insert/update/delete semantics. The Phoenix project applies a SQL layer on top of HBase and with that SQL layer, comes a JDBC driver. That's all that's needed to hook Spotfire or any other BI tool to HBase.

Configuring Spotfire

The first step is to configure Phoenix for use with your HBase cluster. The Phoenix site has documentation around this setup. Once Phoenix is enabled on each of your HBase nodes, copy the Phoenix JDBC client/driver jar file to the Spotfire server's webapps\spotfire\WEB-INF\lib directory.

The next step is to setup a datasource template to use the Phoenix driver. From the Spotfire server configuration tool, add a new datasource template with the following XML configuration:

<jdbc-type-settings>
    <type-name>hadoop_phoenix</type-name>
    <driver>org.apache.phoenix.jdbc.PhoenixDriver</driver>
    <connection-url-pattern>jdbc:phoenix:&lt;host&gt;:2181</connection-url-pattern>
    <supports-catalogs>false</supports-catalogs>
    <supports-schemas>false</supports-schemas>
    <supports-procedures>false</supports-procedures>
    <table-types>TABLE, VIEW</table-types>
    <table-expression-pattern>$$table$$</table-expression-pattern>
</jdbc-type-settings>

It took me a few tries to get the combination of "supports" options correct. Basically, I just kept switching options to false until I got the driver to work. The key piece of the template is the connection-url-pattern. When an actual datasource is configured, the host will be replaced with the Zookeeper host that supports the HBase cluster.

With the template defined, a new datasource can be configured within the information designer tool in the Spotfire client. I did not use any security on my HBase servers so no user or password is required. Phoenix does support authentication through the driver if that's required.

Once the datasource is saved with the proper ZooKeeper hostname, it's time to visualize some HBase data through an information link. At this point, Spotfire is accessing just another JDBC data source. No additional configuration or mapping is required.

Simply build your information link, use it in a data table, and then watch your HBase data come to life in all kinds of charts and maps and other visualizations.