Blog for the Open Source JavaHotel project

Wednesday, October 30, 2019

HDP, Kafka, LEADER_NOT_AVAILABLE

Problem
After HDP cluster kerberization, Kafka does not work even though the Kafka Healthcheck passes green. Every execution of kafka-console-producer.sh ends with the error messages below:
/usr/hdp/3.1.0.0-78/kafka/bin/kafka-console-producer.sh --broker-list kafka-host:6667 --producer-property security.protocol=SASL_PLAINTEXT --topic xxx

WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 1 : {xxxx=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 2 : {xxxx=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 3 : {xxxx=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
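For the record, on a kerberized cluster the console producer itself needs a valid Kerberos ticket and a client JAAS configuration; a sketch of the typical HDP client setup (the principal and the JAAS file path are assumptions):

kinit someuser@EXAMPLE.COM
export KAFKA_OPTS="-Djava.security.auth.login.config=/etc/kafka/conf/kafka_client_jaas.conf"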
kafka-topics.sh reports that no leader is assigned to the partitions:
/usr/hdp/3.1.0.0-78/kafka/bin/kafka-topics.sh --zookeeper zookeeper-host:2181 --describe --unavailable-partitions

Topic: ambari_kafka_service_check Partition: 0 Leader: 1001 Replicas: 1001 Isr: 1001
Topic: identity Partition: 0 Leader: none Replicas: 1002 Isr:
Topic: xxx Partition: 0 Leader: none Replicas: 1002 Isr:
Topic: xxxx Partition: 0 Leader: none Replicas: 1002 Isr:

There is nothing special in the Kafka /var/log/kafka/server.log file. Only /var/log/kafka/controller.log suggests that something went wrong:
cat controller.log

INFO [ControllerEventThread controllerId=1002] Starting (kafka.controller.ControllerEventManager$ControllerEventThread)
ERROR [ControllerEventThread controllerId=1002] Error processing event Startup (kafka.controller.ControllerEventManager$ControllerEventThread)
java.lang.NullPointerException
at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:857)
at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2571)
at kafka.utils.Json$.parseBytes(Json.scala:62)
at kafka.zk.ControllerZNode$.decode(ZkData.scala:56)
at kafka.zk.KafkaZkClient.getControllerId(KafkaZkClient.scala:902)
at kafka.controller.KafkaController.kafka$controller$KafkaController$$elect(KafkaController.scala:1199)
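The stack trace shows the controller failing inside kafka.zk.ControllerZNode$.decode, i.e. while parsing the content of the /controller znode, which hints that the znode data itself is broken. It can be inspected from the Zookeeper command line (a sketch; the zkCli.sh path is the standard HDP one):

/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server zookeeper-host:2181
get /controller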

Solution
Uncle Google brings back many entries related to the LEADER_NOT_AVAILABLE error, but none of them led to the solution. Finally, I found this entry.
So the cure is very simple:
  • Stop Kafka
  • Run the zkCli.sh Zookeeper command line
  • Remove the /controller znode with rmr /controller (see the session sketch after this list)
  • Start Kafka again
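Put together, the whole procedure looks like this (a sketch; stopping and starting the brokers from Ambari and the host names are assumptions):

# Stop the Kafka brokers first (e.g. from Ambari), then:
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server zookeeper-host:2181
rmr /controller
quit
# Start the Kafka brokers again; a fresh /controller znode is created
# when one of the brokers wins the controller election.
# Verify the result; no output means no unavailable partitions are left:
/usr/hdp/3.1.0.0-78/kafka/bin/kafka-topics.sh --zookeeper zookeeper-host:2181 --describe --unavailable-partitions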
The evil spell is defeated.

Tuesday, October 29, 2019

RedHat, Steam and Crusader Kings 2

I have been playing Crusader Kings II on my RedHat desktop for some time. Although Steam officially supports only the Ubuntu distribution, the games, particularly older ones developed for Ubuntu 16.04, also work fine on RedHat/CentOS. Unfortunately, after the latest background update of Crusader Kings, the game stubbornly refused to run. On closer examination, I discovered that the updated game was probably compiled with a newer version of the GNU GCC compiler, and the required level of the libstdc++.so.6 library is not available on my platform.
./ck2: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by ./ck2)
./ck2: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by ./ck2)

That was the bad news. The good news is that, after an epic battle, I straightened it out: it was as simple as installing a newer version of GNU GCC to get more modern libraries.
The solution is described here.
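In short: check which GLIBCXX symbol versions the system libstdc++ provides, and if the game requires newer ones, install a newer GCC and point the game at its runtime library. A sketch, where the /usr/local/gcc-7 installation prefix is an assumption:

# List the GLIBCXX versions available in the system library
strings /lib64/libstdc++.so.6 | grep GLIBCXX
# After installing a newer GCC under /usr/local/gcc-7 (assumed prefix),
# run the game against its newer libstdc++:
LD_LIBRARY_PATH=/usr/local/gcc-7/lib64 ./ck2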

Wednesday, October 23, 2019

BigSQL 6.0 and HDP 3.1.4

Problem
There is a problem with BigSQL 6.0 installed on top of HDP 3.1.4, or after an upgrade from HDP 3.1; look at this product support web page. Installation is successful, but the BigSQL diagnostic log is full of entries like:
java.lang.NoSuchMethodError: com/google/common/base/Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V (loaded from file:/usr/ibmpacks/bigsql/6.0.0.0/bigsql/lib/java/guava-14.0.1.jar by sun.misc.Launcher$AppClassLoader@28528006) called from class org.apache.hadoop.conf.Configuration (loaded from file:/usr/hdp/3.1.4.0-315/hadoop/hadoop-common-3.1.1.3.1.4.0-315.jar by sun.misc.Launcher$AppClassLoader@28528006).
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1358)
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1339)
        at org.apache.hadoop.mapred.JobConf.setJar(JobConf.java:518)
        at org.apache.hadoop.mapred.JobConf.setJarByClass(JobConf.java:536)
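The checkArgument(boolean, String, Object) overload that Hadoop 3 calls simply does not exist in Guava 14; it appeared only in later Guava releases. A quick way to confirm, assuming a JDK javap on the PATH:

javap -cp /usr/ibmpacks/bigsql/6.0.0.0/bigsql/lib/java/guava-14.0.1.jar com.google.common.base.Preconditions | grep checkArgument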

Solution
The cause of the problem is the old Guava jar file in /usr/ibmpacks/bigsql/6.0.0.0/bigsql/lib/java. Replace the old Guava with any version greater than 20, or simply make a link to an existing Guava jar at the proper level. The fix should be applied on all BigSQL nodes, both Head and Worker, and BigSQL should be restarted for the change to take effect.
rm /usr/ibmpacks/bigsql/6.0.0.0/bigsql/lib/java/guava-14.0.1.jar
cd /usr/ibmpacks/bigsql/6.0.0.0/bigsql/lib/java/
ln -s /usr/hdp/3.1.4.0-315/hadoop/lib/guava-28.0-jre.jar
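After the restart it is worth checking on each node that the link really points at the new jar:

ls -l /usr/ibmpacks/bigsql/6.0.0.0/bigsql/lib/java/ | grep guava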

Tuesday, October 22, 2019

BigSQL and HDP upgrade

Problem
I spent several sleepless nights trying to resolve a really nasty problem. It happened after an upgrade from HDP 2.6.4 and BigSQL 5.0 to HDP 3.1 and BigSQL 6.0. Everything ran smoothly, even the BigSQL Healthcheck was smiling. The only exception was the "LOAD HADOOP" command, which failed. BigSQL can run on top of Hive tables, but it is an alternative SQL engine: it uses the HCatalog service to access Hive tables, and in order to ingest data into them it launches a separate MapReduce job.
An example command:
db2 "begin execute immediate 'load hadoop using file url ''/tmp/data_1211057166.txt'' with source properties (''field.delimiter''=''|'', ''ignore.extra.fields''=''true'') into table testuser.smoke_hadoop2_2248299375'; end" Closer examination of MapReduce logs brought up a more detailed error message.
Service org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.NumberFormatException: For input string: "30s"
Good uncle Google suggests that it could be caused by the old MapReduce engine running against new configuration files: Hadoop 3 configuration values may carry time-unit suffixes like "30s", which the Hadoop 2 code tries to parse as a plain number. But how could that happen, since everything related to HDP 2.6.4 and BigSQL 5.0.0 was meticulously annihilated?
What is more, another HDP 3.1/BigSQL 6.0 installation executes the LOAD HADOOP command without any problem, and comparing all configuration data between the two environments did not reveal any difference.
After an even closer examination, I discovered that the MapReduce job related to LOAD HADOOP is powered by the HDP 2.6.4 environment, including legacy jar files; pay attention to the 2.6.4.0-91 version below.
exec /bin/bash -c "$JAVA_HOME/bin/java -server -XX:NewRatio=8 -Djava.net.preferIPv4Stack=true -Dhdp.version=2.6.4.0-91 -Xmx545m 
Also, the corresponding local cache seems to be populated with the old jar files.
ll /data/hadoop/yarn/local/filecache/14/mapreduce.tar.gz/hadoop/
drwxr-xr-x. 2 yarn hadoop 4096 Jan 4 2018 bin
drwxr-xr-x. 3 yarn hadoop 4096 Jan 4 2018 etc
drwxr-xr-x. 2 yarn hadoop 4096 Jan 4 2018 include
drwxr-xr-x. 3 yarn hadoop 4096 Jan 4 2018 lib
drwxr-xr-x. 2 yarn hadoop 4096 Jan 4 2018 libexec
-r-xr-xr-x. 1 yarn hadoop 87303 Jan 4 2018 LICENSE.txt
-r-xr-xr-x. 1 yarn hadoop 15753 Jan 4 2018 NOTICE.txt
-r-xr-xr-x. 1 yarn hadoop 1366 Jan 4 2018 README.txt
drwxr-xr-x. 2 yarn hadoop 4096 Jan 4 2018 sbin
drwxr-xr-x. 4 yarn hadoop 4096 Jan 4 2018 share

But how could it be possible, when all remnants of the old HDP were wiped out and there was no sign of any reference to 2.6.4, even after running grep against every directory suspected of retaining this nefarious mark?
grep 2\.6\.4 /etc/hadoop -R
Solution
The nutcracker turned out to be the BigSQL/DB2 db2set command.
db2set
DB2_BIGSQL_JVM_STARTARGS=-Dhdp.version=3.1.0.0-78 -Dlog4j.configuration=file:///usr/ibmpacks/bigsql/6.0.0.0/bigsql/conf/log4j.properties -Dbigsql.logid.prefix=BSL-${DB2NODE}
DB2_DEFERRED_PREPARE_SEMANTICS=YES
DB2_ATS_ENABLE=YES
DB2_COMPATIBILITY_VECTOR=40B
DB2RSHTIMEOUT=60
DB2RSHCMD=/usr/bin/ssh
DB2FODC=CORESHM=OFF
DB2_JVM_STARTARGS=-Xnocompressedrefs -Dhdp.version=2.6.4.0-91 -Dlog4j.configuration=file:///usr/ibmpacks/bigsql/5.0.4.0/bigsql/conf/log4j.properties -Dbigsql.logid.prefix=BSL-${DB2NODE}
DB2_EXTENDED_OPTIMIZATION=BI_INFER_CC ON
DB2COMM=TCPIP
DB2AUTOSTART=NO

Obviously, DB2_JVM_STARTARGS took precedence over DB2_BIGSQL_JVM_STARTARGS, and that was the reason why the old MapReduce framework was resurrected. The legacy jar files were downloaded from the HDFS /hdp/apps directory.
hdfs dfs -ls /hdp/apps
Found 2 items
drwxr-xr-x   - hdfs hdfs          0 2019-10-07 22:09 /hdp/apps/2.6.4.0-91
drwxr-xr-x   - hdfs hdfs          0 2019-10-12 00:16 /hdp/apps/3.1.0.0-78

The problem was sorted out by a single command unsetting the malicious DB2_JVM_STARTARGS variable and restarting BigSQL for the change to take effect.
db2set DB2_JVM_STARTARGS=
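To verify that the registry variable is really gone, list all variables again as the BigSQL instance owner; no output from the grep means it was removed:

db2set -all | grep DB2_JVM_STARTARGS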
I also removed the /hdp/apps/2.6.4.0-91 HDFS directory to make sure that the vampire was ultimately killed.
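The removal itself is a single HDFS command (a sketch; running it as the hdfs superuser is an assumption):

sudo -u hdfs hdfs dfs -rm -r -skipTrash /hdp/apps/2.6.4.0-91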