Introduction
I have started a GitHub Wiki with practical guidelines and steps related to HDP and BigSQL. I do not want to duplicate or retell the official Hortonworks or IBM documents and tutorials. The Wiki contains just my success stories and some practical advice worth remembering, so that you can avoid rolling your boulder up that Kerberos hill again and again, like Sisyphus.
The list and the content are not closed or static; I'm constantly updating and enhancing them.
https://github.com/stanislawbartkowski/wikis
Below is the current content.
LDAP and Kerberos authentication for CentOS
https://github.com/stanislawbartkowski/wikis/wiki/Centos---Kerberos---LDAP
Guidelines on how to set up Kerberos and LDAP authentication for CentOS. It is good practice to have the host systems secured before installing HDP. Although this is described in many places, I had a hard time putting it all together before ending up in the loving arms of Kerberos. In particular, the secure LDAP connection and two-way LDAP security required a lot of patience and gritted teeth.
Dockerized Kerberos
https://github.com/stanislawbartkowski/docker-kerberos
Docker is the buzzword these days. Several Kerberos-on-Docker implementations are in circulation, but I decided to create my own, one I'm confident with.
HDP and GPFS
https://github.com/stanislawbartkowski/javahotel/tree/hdpgpf
GPFS is an alternative to HDFS. HDP running on top of GPFS has some advantages and disadvantages. Before going live, it is a good idea to practice using a local cluster. Here I present practical steps on how to set up HDP using GPFS as data storage. Important: you need a valid IBM GPFS license to do that.
DB2 11 and aggregate UDF
https://github.com/stanislawbartkowski/javahotel/tree/db2aggr
It seems strange, but until DB2 11 it was not possible to implement a custom aggregate UDF in DB2. The feature was finally added in DB2 11, but I found it poorly documented. It took me some time to create even a simple aggregate UDF, but in the end I made it.
IBM BigSQL, monitoring and maintenance queries
https://github.com/stanislawbartkowski/wikis/wiki/BigSQL,-useful-commands
For some time, I supported an IBM client on issues related to IBM BigSQL. During my engagement, I created a notebook with a number of useful SQL queries covering different aspects of maintenance, performance, security, etc., ready for copying and pasting.
Dockerized IBM DB2
https://github.com/stanislawbartkowski/docker-db2
Yet another DB2 on Docker. Download the free release of DB2 or, if you are lucky enough, your licensed edition, and enjoy bringing DB2 up at the tap of a finger.
Enable CentOS for Active Directory
https://github.com/stanislawbartkowski/wikis/wiki/CentOS---Active-Directory
Enabling CentOS or Red Hat for Active Directory authentication is easy compared to MIT Kerberos/OpenLDAP, but there are still some nooks and crannies to know.
Monitoring tool for IBM BigSQL
https://github.com/stanislawbartkowski/bigsqlmoni
IBM BigSQL/DB2 offers a huge variety of monitoring queries, but in most cases what matters is not the absolute value but the delta, that is, the change between two measurements. So I created a simple tool to store and provide deltas for some performance and monitoring indicators. I'm still looking for a way to make practical use of it.
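The delta idea can be sketched in a few lines of Python. This is only a minimal illustration and not the code of the tool itself: the function name `compute_deltas` and the counter names in the sample snapshots are hypothetical, and the sketch assumes that each snapshot is a plain mapping from an indicator name to a cumulative counter value.

```python
from typing import Dict


def compute_deltas(previous: Dict[str, float],
                   current: Dict[str, float]) -> Dict[str, float]:
    """Return current - previous for every indicator present in both snapshots."""
    deltas = {}
    for name, value in current.items():
        # An indicator missing from the earlier snapshot has no baseline,
        # so no delta can be reported for it.
        if name in previous:
            deltas[name] = value - previous[name]
    return deltas


# Two hypothetical snapshots of cumulative counters taken at different times.
snapshot_t0 = {"rows_read": 1000, "total_cpu_time": 500}
snapshot_t1 = {"rows_read": 1750, "total_cpu_time": 620}

deltas = compute_deltas(snapshot_t0, snapshot_t1)
print(deltas)
```

The raw counters only grow over time, so a single reading says little; the difference between two readings shows how much work was done in the interval between them.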
Enable HDP for Active Directory/Kerberos/LDAP
https://github.com/stanislawbartkowski/wikis/wiki/HDP-2.6.5-3.1-and-Active-Directory
Practical steps for implementing Kerberos security in HDP, along with a simple test to make sure that security is in place.
Enable HDP services for Active Directory/Kerberos/LDAP
https://github.com/stanislawbartkowski/hdpactivedirectory
Wiki:
https://github.com/stanislawbartkowski/hdpactivedirectory/wiki
Basic Kerberization enables Kerberos authentication for Hadoop services and makes HDFS secure. The next step is to enable each particular service for Kerberos authentication and LDAP authorization. It is highly recommended to activate Ranger security. The attached GitHub Wiki contains guidelines and practical steps to enable Hadoop services for AD/Kerberos. Every chapter also contains a basic test to make sure that security is enforced and has teeth.
HiBench for HDP
https://github.com/stanislawbartkowski/MyHiBench
HiBench is widely recognized as a leading benchmark tool for Hadoop services. Unfortunately, development seems to have stopped two years ago, and I found it difficult to run on HDP 3.1. The tool also does not seem to be enabled for Kerberos security. After spending some time trying to find a workaround, I decided to create my own fork of HiBench.
This fork is dedicated to HDP only, whereas the original HiBench can also be used against a standalone installation of some Hadoop services. It also required redeveloping some Java and Scala code, particularly the parts related to Kafka and Spark Streaming.
Several Java/Scala tools to test Kerberos security
HDFS Java client
https://github.com/stanislawbartkowski/KafkaSample
Kafka Java client
https://github.com/stanislawbartkowski/KafkaSample
Scala Spark Streaming against Kafka
https://github.com/stanislawbartkowski/SampleSparkStreaming
Several simple Java/Scala tools to test Kerberos connectivity. The tools come with source code for review. They are used as an additional test of Kerberos security, alongside batch or command-line tests.