Blog for the Open Source JavaHotel project

Thursday, August 13, 2020

HBase, Phoenix and CsvBulkLoadTool

I'm running MyBench in a new environment and it fails while loading data into a Phoenix table using the CsvBulkLoadTool utility.

WARN tool.LoadIncrementalHFiles: Attempt to bulk load region containing  into table BENCH.USERVISITS with files [family:0 path:hdfs://bidev/tmp/386640ec-d49e-4760-8257-05858a409321/BENCH.USERVISITS/0/b467b5560eee4d61a42d4c9e6a78eb7e] failed.  This is recoverable and they will be retried.

INFO tool.LoadIncrementalHFiles: Split occurred while grouping HFiles, retry attempt 100 with 1 files remaining to group or split

ERROR tool.LoadIncrementalHFiles: -------------------------------------------------

Bulk load aborted with some files not yet loaded:

After closer examination, I discovered that the error occurs while moving/renaming the input file into the HBase staging directory /apps/hbase/data/staging. In this cluster, the HBase data is encrypted, and moving data between an encrypted and a normal zone is not possible.

java.io.IOException: Failed to move HFile: hdfs://bidev/apps/hbase/data/staging/ambari-qa__BENCH.USERVISITS__dbb5qdfppq1diggr0dmdbcb1ji74ol4b9jn9ee2dgp1ttn9n5i6llfih7101fi1d/0/3a7f2d612c034253ad375ae002cc6ade to hdfs://bidev/tmp/fc43e454-00b3-4db0-8bdd-8b475885ab49/BENCH.USERVISITS/0/3a7f2d612c034253ad375ae002cc6ade

at org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$SecureBulkLoadListener.failedBulkLoad(SecureBulkLoadManager.java:423)

at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:6035)

at org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$3.run(SecureBulkLoadManager.java:284)

The source code can be found here.
if (!FSUtils.isSameHdfs(conf, srcFs, fs)) {
  LOG.debug("Bulk-load file " + srcPath + " is on different filesystem than " +
      "the destination filesystem. Copying file over to destination staging dir.");
  FileUtil.copy(srcFs, p, fs, stageP, false, conf);
} else if (copyFile) {
  LOG.debug("Bulk-load file " + srcPath + " is copied to destination staging dir.");
  FileUtil.copy(srcFs, p, fs, stageP, false, conf);
} else {
  LOG.debug("Moving " + p + " to " + stageP);
  FileStatus origFileStatus = fs.getFileStatus(p);
  origPermissions.put(srcPath, origFileStatus.getPermission());
  if (!fs.rename(p, stageP)) {
    throw new IOException("Failed to move HFile: " + p + " to " + stageP);
  }
}
When data is moved between different file systems, copying is enforced; unfortunately, data movement between an encrypted and an unencrypted zone is not covered here.
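To illustrate what is missing (this is only my sketch, not HBase code), the staging logic would need an additional check along these lines, using the Hadoop HdfsAdmin API to detect that the source and the staging path sit in different encryption zones, in which case a rename cannot succeed and the file has to be copied instead:

// Sketch only, not part of HBase: rename() cannot cross an HDFS encryption zone
// boundary, so such files have to be copied instead of moved.
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsAdmin;
import org.apache.hadoop.hdfs.protocol.EncryptionZone;

public class EncryptionZoneCheck {
  // Returns true when p and stageP are in different encryption zones
  // (including "one encrypted, one not"), i.e. the file must be copied, not renamed.
  static boolean mustCopy(Configuration conf, Path p, Path stageP) throws IOException {
    HdfsAdmin admin = new HdfsAdmin(URI.create(conf.get("fs.defaultFS")), conf);
    EncryptionZone srcZone = admin.getEncryptionZoneForPath(p);
    EncryptionZone dstZone = admin.getEncryptionZoneForPath(stageP);
    String src = srcZone == null ? null : srcZone.getPath();
    String dst = dstZone == null ? null : dstZone.getPath();
    return src == null ? dst != null : !src.equals(dst);
  }
}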

Another option is to make use of the "copyFile" parameter, which enforces copying. After analyzing the control flow, I discovered that there is an "hbase-site.xml" parameter, always.copy.files, which seems to be the solution to the problem. But after applying this parameter, nothing changed.
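For reference, the switch itself is just a boolean configuration entry; a minimal sketch of setting it programmatically on a Hadoop Configuration (the key name is the one mentioned above, the rest is my assumption; in hbase-site.xml it would be the equivalent property element) looks like this, although in this code path it turns out to be ignored anyway, as explained next:

// Sketch: the "always.copy.files" switch as a plain boolean configuration entry.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class AlwaysCopyFiles {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    conf.setBoolean("always.copy.files", true);
    System.out.println("always.copy.files = " + conf.getBoolean("always.copy.files", false));
  }
}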
Further examination, with a little help from remote debugging, unearthed a sad truth. CsvBulkLoadTool passes control to LoadIncrementalHFiles.java and its "doBulkLoad" method.

public Map<LoadQueueItem, ByteBuffer> doBulkLoad(Path hfofDir, final Admin admin, Table table,
    RegionLocator regionLocator) throws TableNotFoundException, IOException {
  return doBulkLoad(hfofDir, admin, table, regionLocator, false, false);
}

Unfortunately, the "copyFile" parameter is hardcoded as "false", although there is a sound, ready-to-use "isAlwaysCopyFiles()" function utilizing the "hbase-site.xml" config file.
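A hypothetical one-line fix (my sketch, not the code shipped with HDP, and assuming the "isAlwaysCopyFiles()" helper mentioned above) would be to delegate with the configured value instead of the literal false:

// Hypothetical patch to LoadIncrementalHFiles: honour the always.copy.files
// setting instead of hardcoding copyFile = false.
public Map<LoadQueueItem, ByteBuffer> doBulkLoad(Path hfofDir, final Admin admin, Table table,
    RegionLocator regionLocator) throws TableNotFoundException, IOException {
  // isAlwaysCopyFiles() is expected to read the "always.copy.files" entry from hbase-site.xml
  return doBulkLoad(hfofDir, admin, table, regionLocator, false, isAlwaysCopyFiles());
}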
The only solution is a manual fix and recreating the package from the source files. But it does not go easily, because one has to combine different and outdated versions of HBase and Phoenix to create a "Phoenix client" package matching HDP 3.1.
So two days spent without a solution.

Friday, August 7, 2020

HDP 3.1, HBase Phoenix and IBM Java

I spent several sleepless nights trying to sort out a nasty problem. While connecting to HBase Phoenix, the Phoenix client stalled before displaying the prompt.
/usr/hdp/current/phoenix-client/bin/sqlline.py
Setting property: [incremental, false]
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect jdbc:phoenix: none none org.apache.phoenix.jdbc.PhoenixDriver
Connecting to jdbc:phoenix:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/phoenix/phoenix-5.0.0.3.1.0.0-78-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
20/08/07 11:28:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

What's more, there was not a single error message, neither from the client nor in the HBase log files, Master or Region Servers. Simply nothing, no clue. Other than that, everything was healthy and running soundly.

The first step toward the solution was to enable tracing in the Phoenix client. This setting is not managed by Ambari and requires a manual modification.

vi /usr/hdp/3.1.0.0-78/phoenix/bin/log4j.properties
....
#psql.root.logger=WARN,console
psql.root.logger=TRACE,console

After tracing was enabled, the sqlline client was more talkative and provided the first clue.

INFO client.RpcRetryingCallerImpl: Call exception, tries=6, retries=36, started=4724 ms ago, cancelled=false, msg=Call to data2-worker.mycloud.com/10.135.118.222:16020 failed on local exception: javax.security.sasl.SaslException: Failure to initialize security context [Caused by org.ietf.jgss.GSSException, major code: 13, minor code: 0
major string: Invalid credentials
minor string: SubjectCredFinder: no JAAS Subject], details=row 'SYSTEM:CATALOG' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=data2-worker.mycloud.com,16020,1596794212924, seqNum=-1

The HDP cluster was Kerberized, and the problem seemed to be related to Kerberos authentication. Kerberos is the source of many problems, but everything else had been working smoothly here.
Finally, I found the hint under this link. The culprit was IBM Java. The cluster is running under the control of OpenJDK Java, but the sqlline client is using the default host Java.

update-alternatives --config java

There are 3 programs which provide 'java'.

  Selection    Command
-----------------------------------------------
*+ 1           /usr/lib/jvm/java-1.8.0-ibm-1.8.0.5.10-1jpp.1.el7.x86_64/jre/bin/java
   2           /usr/jdk64/java-1.8.0-openjdk-1.8.0.77-0.b03.el7_2.x86_64/jre/bin/java
   3           /usr/java/jdk1.8.0_202-amd64/jre/bin/java

So the solution was extremely simple: just switch to OpenJDK Java, and the Phoenix client works like a dream.
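As a quick sanity check (my own snippet, nothing HDP-specific), a trivial probe compiled and run with the host's default java shows immediately which JVM vendor and home the Phoenix client would pick up:

// Tiny probe: print which JVM is actually resolved on the host.
public class WhichJava {
  public static void main(String[] args) {
    System.out.println(System.getProperty("java.vendor") + " " + System.getProperty("java.version"));
    System.out.println(System.getProperty("java.home"));
  }
}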