Blog for the Open Source JavaHotel project

Tuesday, 22 October 2019

BigSQL and HDP upgrade

Problem
I spent several sleepless nights trying to resolve a really nasty problem. It happened after an upgrade from HDP 2.6.4 and BigSQL 5.0 to HDP 3.1 and BigSQL 6.0. Everything ran smoothly, even BigSQL Healthcheck was smiling. The only exception was the "LOAD HADOOP" command, which failed. BigSQL can run on top of Hive tables, but it is an alternative SQL engine; it uses the HCatalog service to get access to Hive tables. In order to ingest data into Hive tables, it launches a separate MapReduce job to accomplish the task.
An example command:
db2 "begin execute immediate 'load hadoop using file url ''/tmp/data_1211057166.txt'' with source properties (''field.delimiter''=''|'', ''ignore.extra.fields''=''true'') into table testuser.smoke_hadoop2_2248299375'; end"
A closer examination of the MapReduce logs brought up a more detailed error message.
Service org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.NumberFormatException: For input string: "30s"
Good uncle Google suggested that it could be caused by running the old MapReduce engine against new configuration files: Hadoop 3 writes time values with units, like "30s", into properties which the Hadoop 2 code parses as plain numbers. But how could it happen, since all the stuff related to HDP 2.6.4 and BigSQL 5.0.0 had been meticulously annihilated?
What is more, another HDP 3.1/BigSQL 6.0 installation was executing the LOAD HADOOP command without any interruption. Comparing all the configuration data between both environments did not reveal any difference.
After a closer examination, I discovered that the MapReduce job launched by LOAD HADOOP was powered by the HDP 2.6.4 environment, including legacy jar files; pay attention to the 2.6.4.0-91 parameter.
exec /bin/bash -c "$JAVA_HOME/bin/java -server -XX:NewRatio=8 -Djava.net.preferIPv4Stack=true -Dhdp.version=2.6.4.0-91 -Xmx545m 
Also, the corresponding local cache seemed to be populated with the old jar files.
ll /data/hadoop/yarn/local/filecache/14/mapreduce.tar.gz/hadoop/
drwxr-xr-x. 2 yarn hadoop 4096 Jan 4 2018 bin
drwxr-xr-x. 3 yarn hadoop 4096 Jan 4 2018 etc
drwxr-xr-x. 2 yarn hadoop 4096 Jan 4 2018 include
drwxr-xr-x. 3 yarn hadoop 4096 Jan 4 2018 lib
drwxr-xr-x. 2 yarn hadoop 4096 Jan 4 2018 libexec

-r-xr-xr-x. 1 yarn hadoop 87303 Jan 4 2018 LICENSE.txt
-r-xr-xr-x. 1 yarn hadoop 15753 Jan 4 2018 NOTICE.txt
-r-xr-xr-x. 1 yarn hadoop 1366 Jan 4 2018 README.txt
drwxr-xr-x. 2 yarn hadoop 4096 Jan 4 2018 sbin
drwxr-xr-x. 4 yarn hadoop 4096 Jan 4 2018 share

But how could that be possible, when all remnants of the old HDP were wiped out and there was no sign of any reference to 2.6.4, even after running the grep command against every directory suspected of retaining this nefarious mark?
grep 2\.6\.4 /etc/hadoop -R
Solution
The nutcracker turned out to be the BigSQL/DB2 db2set command.
db2set
DB2_BIGSQL_JVM_STARTARGS=-Dhdp.version=3.1.0.0-78 -Dlog4j.configuration=file:///usr/ibmpacks/bigsql/6.0.0.0/bigsql/conf/log4j.properties -Dbigsql.logid.prefix=BSL-${DB2NODE}
DB2_DEFERRED_PREPARE_SEMANTICS=YES
DB2_ATS_ENABLE=YES
DB2_COMPATIBILITY_VECTOR=40B
DB2RSHTIMEOUT=60
DB2RSHCMD=/usr/bin/ssh
DB2FODC=CORESHM=OFF
DB2_JVM_STARTARGS=-Xnocompressedrefs -Dhdp.version=2.6.4.0-91 -Dlog4j.configuration=file:///usr/ibmpacks/bigsql/5.0.4.0/bigsql/conf/log4j.properties -Dbigsql.logid.prefix=BSL-${DB2NODE}
DB2_EXTENDED_OPTIMIZATION=BI_INFER_CC ON
DB2COMM=TCPIP
DB2AUTOSTART=NO

Obviously, DB2_JVM_STARTARGS took precedence over DB2_BIGSQL_JVM_STARTARGS, and that was the reason why the old MapReduce framework was resurrected. The legacy jar files were downloaded from the HDFS /hdp/apps directory.
hdfs dfs -ls /hdp/apps
Found 2 items
drwxr-xr-x   - hdfs hdfs          0 2019-10-07 22:09 /hdp/apps/2.6.4.0-91
drwxr-xr-x   - hdfs hdfs          0 2019-10-12 00:16 /hdp/apps/3.1.0.0-78

The problem was sorted out by a single command unsetting the malicious DB2_JVM_STARTARGS variable, followed by a restart of BigSQL for the change to take effect.
db2set DB2_JVM_STARTARGS=
I also removed the /hdp/apps/2.6.4.0-91 HDFS directory to make sure that the vampire was ultimately killed.
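As a quick sanity check after this kind of upgrade, the db2set output can be filtered for entries still pointing at a different HDP stack version. A minimal sketch (the function name is mine; usage: db2set | check_hdp_version 3.1.0.0-78):

```shell
# Print any db2set line that mentions hdp.version but does not carry the
# expected HDP stack version passed as the first argument. Empty output
# means no stale -Dhdp.version entries remain.
check_hdp_version() {
    expected="$1"
    grep 'hdp\.version=' | grep -v "hdp.version=$expected"
}
```

Had this been run right after the upgrade, it would have flagged the stale DB2_JVM_STARTARGS line immediately.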

Monday, 23 September 2019

How to obtain an active NameNode remotely

Problem
While using the WebHDFS REST API, the client deals directly with the NameNode. In an HA (high availability) environment, only one NameNode is active and the second is standby. If the standby NameNode is addressed, the request is denied, so the client should be aware of which NameNode is active and construct a valid URL. But how to discover the active NameNode remotely and automatically, avoiding manual redirection of the client in case of a failover?
It sounds strange, but there is no Ambari REST API call to detect the active NameNode.
One obvious solution is to use the WebHDFS Knox Gateway which, assuming it is configured properly, propagates the query to the valid NameNode.
Solution
There are two convenient methods to discover the active NameNode outside the Knox Gateway: a JMX query and the hdfs haadmin command.
The solution is described here in more detail. I also added a convenient bash script that extracts the active NameNode using both methods. The script can be easily customized. The hdfs haadmin method can be executed inside the cluster only, so for remote use a remote shell call should be implemented.
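The JMX method can be sketched as a small bash script. The host names and HTTP port below are placeholders for your own NameNodes (the port depends on dfs.namenode.http-address; 50070 is the HDP default), and the haadmin alternative is shown as a comment:

```shell
#!/bin/bash
# Placeholders -- replace with your own NameNode hosts and HTTP port.
NN_HOSTS="nn1.example.com nn2.example.com"
NN_PORT=50070

# Extract the HA state ("active"/"standby") from a NameNodeStatus JMX reply.
jmx_state() {
    grep -o '"State" *: *"[a-z]*"' | sed 's/.*"\([a-z]*\)"$/\1/'
}

# Print the host name of the active NameNode; return 1 if none answers.
active_namenode() {
    for h in $NN_HOSTS; do
        s=$(curl -s "http://$h:$NN_PORT/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus" | jmx_state)
        if [ "$s" = "active" ]; then echo "$h"; return 0; fi
    done
    return 1
}

# Inside the cluster, the same answer comes from:
#   hdfs haadmin -getServiceState <serviceId>
```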


Friday, 30 August 2019

The 'krb5-conf' configuration is not available

HDP 3.1
A nasty message, as visible above, suddenly popped up out of the blue. Every configuration change and every attempt to stop or start a service was blocked because of it. The message was related to Kerberization, but the "Disable Kerberos" option was also under the spell. It seemed that the only option was to plough everything under and rebuild the cluster from the bare ground.
The problem is described here, but no solution is proposed.
Solution
The solution was quite simple: remove the "Kerberos" marker from the cluster by modifying the Ambari database. In the case of a PostgreSQL database, execute the command:
update clusters set security_type='NONE'

After that magic, the "Enable Kerberos" button is active and, after performing the Kerberization, the cluster is happy and healthy again.
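The whole repair can be sketched as follows. The helper only emits the SQL so it can be reviewed first; "ambari" as the database name is the Ambari default and an assumption here, adjust it to your installation:

```shell
# Emit the SQL that resets the cluster security marker.
reset_security_marker_sql() {
    printf "update clusters set security_type='NONE';\n"
}

# Usage on the Ambari server host (assumption: default PostgreSQL backend):
#   ambari-server stop
#   reset_security_marker_sql | sudo -u postgres psql ambari
#   ambari-server start
```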

Sunday, 4 August 2019

HDP 3.1, HBase REST API, security gap

Problem
I found a nasty problem in HDP 3.1 which cost me several sleepless nights: a security gap in the HBase REST API. The HBase REST API service does not impersonate users, and all HBase commands are executed as the hbase user. The same behaviour is passed on to Knox HBase. It means that any user having access to the HBase REST API or the Knox Gateway HBase service is authorized to perform any action, bypassing any security settings in Ranger or in the HBase service directly.
Solution
The only solution I found was to compile the current version of HBase downloaded from GitHub and replace the legacy hbase-rest jar with the new one.

Clone the GitHub repository and build the packages
git clone https://github.com/apache/hbase.git -b branch-2.0
cd hbase
mvn package -DskipTests

As the root user
cd /usr/hdp/3.1.0.0-78/hbase/lib

Archive the existing jar
mkdir arch
mv hbase-rest-2.0.2.3.1.0.0-78.jar arch/
unlink hbase-rest.jar

Replace it with the new one
ln -s /home/hbase/hbase/hbase-rest/target/hbase-rest-2.0.6-SNAPSHOT.jar hbase-rest.jar

Restart the HBase REST API server.
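The archive-and-relink steps above can be rehearsed on throwaway files before touching /usr/hdp. A sketch (the function and its arguments are mine; the paths in the comments mirror the real procedure):

```shell
# Move any versioned hbase-rest-*.jar aside into arch/ and point the
# hbase-rest.jar symlink at the freshly built jar.
relink_rest_jar() {
    libdir="$1"   # e.g. /usr/hdp/3.1.0.0-78/hbase/lib
    newjar="$2"   # e.g. .../hbase-rest/target/hbase-rest-2.0.6-SNAPSHOT.jar
    cd "$libdir" || return 1
    mkdir -p arch
    mv hbase-rest-*.jar arch/ 2>/dev/null
    [ -L hbase-rest.jar ] && unlink hbase-rest.jar
    ln -s "$newjar" hbase-rest.jar
}
```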

Wednesday, 31 July 2019

HDP 3.1, Wire Encryption

Introduction
https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/configuring-wire-encryption/content/wire_encryption.html
Wire encryption not only adds the next layer of security, but also the next layer of complexity. Although everything is described in the HortonWorks documentation, it is not easy to extract the practical steps on how to set up the encryption; it took me some time to accomplish it. So I decided to recap my experience and publish several useful scripts and a practical procedure.
https://github.com/stanislawbartkowski/hdpwiredencryption
What is described
  • Self-signed certificates
  • Enable SSL for WebHDFS, MapReduce, Tez, and YARN
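As a flavour of the self-signed certificate step listed above, a minimal sketch with openssl (the host name, file names and scratch directory are placeholders; the HortonWorks procedure itself uses Java keytool and a distributed truststore, as covered in the repository above):

```shell
# Generate a throwaway self-signed key/certificate pair for one host.
CERTDIR=$(mktemp -d)       # scratch directory; use a real path in practice
HOST=node1.example.com     # placeholder host name
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -subj "/CN=$HOST" \
    -keyout "$CERTDIR/$HOST.key" -out "$CERTDIR/$HOST.crt"
```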
Future plans
  • Certificates signed by CA
  • Other HDP services
  • Application connection
Problem to solve
After enabling the encryption, the BigSQL LOAD HADOOP command refused to work. It is related to the TimeLine service secure connection. I will try to sort it out.

Sunday, 30 June 2019

My Wiki

Introduction
I started creating my GitHub wiki containing some practical guidelines and steps related to HDP and BigSQL. I do not want to duplicate or retell the official HortonWorks or IBM documents and tutorials, just my success stories and some practical advice to remember, so as to avoid, like Sisyphus, rolling your boulder up that Kerberos hill again and again.
The list and the content are not closed and static; I'm constantly updating and enhancing them.
https://github.com/stanislawbartkowski/wikis
Below is the current content.
LDAP and Kerberos authentication for CentOS
https://github.com/stanislawbartkowski/wikis/wiki/Centos---Kerberos---LDAP
Guidelines on how to set up Kerberos and LDAP authentication for CentOS. It is a good practice to have the host systems secured before installing HDP. Although it is described in many places, I had a hard time putting it together and ending up in the loving arms of Kerberos. Particularly, the secure LDAP connection and two-way LDAP security required a lot of patience and gritted teeth.
Dockerized Kerberos
https://github.com/stanislawbartkowski/docker-kerberos
Docker is the buzzword these days. There are several Kerberos-on-Docker implementations circulating around, but I decided to create my own that I'm confident with.
HDP and GPFS
https://github.com/stanislawbartkowski/javahotel/tree/hdpgpf
GPFS is an alternative to HDFS. HDP running on top of GPFS has some advantages and disadvantages. Before going live, it is a good idea to practice using a local cluster. Here I'm presenting practical steps on how to set up HDP using GPFS as the data storage. Important: you need a valid IBM GPFS license to do that.
DB2 11 and aggregate UDF
https://github.com/stanislawbartkowski/javahotel/tree/db2aggr
It seems strange, but until DB2 11 it was not possible to implement a custom aggregate UDF in DB2. This feature was finally added in DB2 11, but I found it not well documented. So it took me some time to create even a simple aggregate UDF, but in the end, I made it.
IBM BigSQL, monitoring and maintenance queries
https://github.com/stanislawbartkowski/wikis/wiki/BigSQL,-useful-commands
For some time, I was supporting an IBM client on issues related to IBM BigSQL. During my engagement, I created a notebook with a number of useful SQL queries related to different aspects of maintenance, performance, security etc., ready for copying and pasting.
Dockerized IBM DB2
https://github.com/stanislawbartkowski/docker-db2
Yet another DB2 on Docker. Download the free release of DB2 or, if you are lucky enough, your licensed edition, and be happy to get DB2 up at the tap of your finger.
Enable CentOS for Active Directory
https://github.com/stanislawbartkowski/wikis/wiki/CentOS---Active-Directory
Enabling CentOS or RedHat for Active Directory authentication is easy compared to MIT Kerberos/OpenLDAP, but still there are some nooks and crannies to know.
Monitoring tool for IBM BigSQL
https://github.com/stanislawbartkowski/bigsqlmoni
IBM BigSQL/DB2 contains a huge variety of monitoring queries, but in most cases what matters is not the value itself but the delta. So I created a simple tool to store and provide deltas for some performance and monitoring indicators. I am still looking for a way to make practical use of it.
Enable HDP for Active Directory/Kerberos/LDAP
https://github.com/stanislawbartkowski/wikis/wiki/HDP-2.6.5-3.1-and-Active-Directory
Practical steps for implementing Kerberos security in HDP, with a simple test to make sure that the security is in place.
Enable HDP services for Active Directory/Kerberos/LDAP
https://github.com/stanislawbartkowski/hdpactivedirectory
Wiki: https://github.com/stanislawbartkowski/hdpactivedirectory/wiki
Basic Kerberization enables Kerberos authentication for Hadoop services and makes HDFS secure. The next step is to enable particular services for Kerberos authentication and LDAP authorization. It is highly recommended to activate Ranger security as well. The attached GitHub wiki contains guidelines and practical steps to enable Hadoop services for AD/Kerberos. Every chapter also contains a basic test to make sure that security is enforced and has teeth.
HiBench for HDP
https://github.com/stanislawbartkowski/MyHiBench
HiBench is widely recognized as a leading benchmark tool for Hadoop services. Unfortunately, development seems to have stopped two years ago, and I found it difficult to run it on HDP 3.1. Also, the tool does not seem to be enabled for Kerberos security. After spending some time trying to find a workaround, I decided to create my own fork of HiBench.
This fork is dedicated to HDP only; the original HiBench can also be used against a standalone installation of some Hadoop services. It also required redeveloping some Java and Scala code, related particularly to Kafka and Spark Streaming.
Several Java/Scala tools to test Kerberos security
HDFS Java client https://github.com/stanislawbartkowski/KafkaSample
Kafka Java client https://github.com/stanislawbartkowski/KafkaSample
Scala Spark Streaming against Kafka https://github.com/stanislawbartkowski/SampleSparkStreaming
Several simple Java/Scala tools to test Kerberos connectivity. The tools come with source code to review. They are used as an additional test, next to the batch or command-line tests, for testing Kerberos security.






Sunday, 5 May 2019

HDP, Ranger and plugins

Problem
I spent several sleepless nights trying to resolve a strange problem related to HDP (HortonWorks Data Platform), the Ranger service and its plugins.
After installing Ranger and enabling any plugin, an appropriate service entry should be created and become visible in the Ranger Admin UI. More details are here. At the beginning, a default policy is created, which can be customized later according to needs.
But in my environment, the service entry was not created, thus blocking any attempt to implement an authorization policy. What is more, even disabling/enabling the plugin or stopping/restarting the cluster did not make any change; I was unable to conjure up the service entry. At some point, I even removed Ranger, recreated the Ranger database and reinstalled the service from scratch, but it did not help.
Solution
Finally, after carefully browsing through the log files, I found the solution. The culprit is the local directory /etc/ranger. It contains a subdirectory reflecting the service entry in the Ranger Admin UI.

ls /etc/ranger/MyCluster_hadoop/
cred.jceks
policycache

This directory contains a copy of the Ranger service policy and is used as a recovery point in case of a database failure. It seems that if, after enabling the plugin, the service discovers this cache, the Ranger service policy is recreated from it, but in this scenario the Ranger Admin UI service entry is not restored. This cache is not removed after disabling the plugin, or even after removing the whole Ranger service.
Unfortunately, this behaviour is not documented and is badly implemented.
The solution is to switch off the plugin, manually remove the /etc/ranger/{service name} directory and switch the plugin on again. The service entry and the default policy are then recreated.
Keep in mind that the /etc/ranger/{service name} directory is located on the host where the appropriate service is installed, not on the Ranger service host.
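The manual removal step can be wrapped in a small helper. A sketch (the function name and the RANGER_DIR override are mine; run it on the service host after disabling the plugin, then switch the plugin back on):

```shell
# Base directory of the Ranger policy cache; overridable for testing.
RANGER_DIR=${RANGER_DIR:-/etc/ranger}

# Remove the stale policy cache for one service, e.g.:
#   clear_ranger_cache MyCluster_hadoop
clear_ranger_cache() {
    svc="$1"
    [ -n "$svc" ] || { echo "usage: clear_ranger_cache <service name>" >&2; return 1; }
    rm -rf "${RANGER_DIR:?}/$svc"
}
```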