Blog of the Open Source JavaHotel project

Monday, December 28, 2020

Mail server in a single docker

I created a simple mail server, SMTP and IMAPS, in a single docker/podman container. The mail service engines are Postfix (SMTP) and Dovecot (IMAPS). The solution is described here. I also added guidelines on how to test and configure several mail clients: Evolution and mutt.

The storage is ephemeral, so it is not recommended for any production environment, but it is ideal for testing: easy to create and easy to dismantle.

I also added a sample yaml configuration file and remarks on how to deploy the container to an OpenShift/Kubernetes cluster.
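For a quick programmatic smoke test of the IMAPS service, something along these lines can be used. This is only a sketch: it assumes the JavaMail/Jakarta Mail library on the classpath, and the host name, port and credentials are placeholders.

import java.util.Properties;
import javax.mail.Folder;
import javax.mail.Session;
import javax.mail.Store;

// Sketch of an IMAPS smoke test against the container; all values are placeholders.
public class ImapsSmokeTest {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("mail.store.protocol", "imaps");
        // trust the test server's self-signed certificate blindly (testing only)
        props.put("mail.imaps.ssl.trust", "*");
        Session session = Session.getInstance(props);
        Store store = session.getStore("imaps");
        store.connect("mail.example.com", 993, "testuser", "secret");
        Folder inbox = store.getFolder("INBOX");
        inbox.open(Folder.READ_ONLY);
        System.out.println("Messages in INBOX: " + inbox.getMessageCount());
        inbox.close(false);
        store.close();
    }
}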

Sunday, November 29, 2020

My own Hadoop/HDP benchmark

 Introduction

I spent some time trying to come to terms with HiBench but finally I gave up. The project seemed to be abandoned, there were a lot of problems running it in a secure (Kerberized) environment and adjusting it to new versions of Hive and Spark. Also, it is a project with a long history, and there are a lot of layers that are not consistent with each other.

So I ended up creating my own version of the HiBench benchmark. The only code migrated from HiBench is the Spark/Scala and Java source code, upgraded to new versions of dependencies and with more consistent parameter handling.

Features

  • Dedicated to HDP 3.1.5. Standalone services are not supported.  
  • Enabled for Kerberos
  • Hive 3.0
  • Spark 2.x
  • Simple to run and expand, minimal configuration
Features not supported
  • Streaming, pending
  • Nutch indexing, not under development any longer
  • Flink, Gearpump, not part of HDP stack
Configuration 

    All details are here.


Saturday, October 31, 2020

SSL for the masses

Motivation

I expanded my tool for enabling wire encryption in the HDP cluster.

https://github.com/stanislawbartkowski/hdpwiredencryption

Previously, only self-signed certificates were supported. I added automation for CA-signed certificates. Important: it works only if the CA-signed certificate package follows the supported format.

There are two paths possible: self-signed certificates and CA-signed certificates.

Self-signed certificates

  1. ./run.sh 0 Creates self-signed certificates and truststores for every node.
  2. ./run.sh 1 Creates and distributes the all-client truststore.
  3. ./run.sh 2 Secures keystores and truststores, applying owner and Linux permissions.
CA-signed certificates
  1. ./run.sh 3 Creates self-signed certificates and a CSR (Certificate Signing Request) for every node.
  2. Manual step: send all CSRs to the CA centre for signing. The CA-signed certificates should be stored in the designated format.
  3. ./run.sh 4 CA-signed certificates are imported into the corresponding keystores, replacing the self-signed certificates. Truststores are created.
  4. ./run.sh 1 Creates and distributes the all-client truststore.
  5. ./run.sh 2 Secures keystores and truststores. A quick keystore sanity check is sketched below.
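To double-check the outcome on a node, the generated keystore can be listed with keytool or programmatically. A minimal sketch, assuming a JKS keystore; the path and password below are placeholders.

import java.io.FileInputStream;
import java.security.KeyStore;
import java.security.cert.X509Certificate;
import java.util.Collections;

// Sketch: list aliases and certificate subjects in a keystore produced by the scripts.
// Keystore path, type (JKS) and password are placeholders.
public class KeystoreCheck {
    public static void main(String[] args) throws Exception {
        KeyStore ks = KeyStore.getInstance("JKS");
        try (FileInputStream in = new FileInputStream("/etc/security/serverKeys/keystore.jks")) {
            ks.load(in, "changeit".toCharArray());
        }
        for (String alias : Collections.list(ks.aliases())) {
            X509Certificate cert = (X509Certificate) ks.getCertificate(alias);
            System.out.println(alias + " -> " + cert.getSubjectX500Principal()
                    + ", issued by " + cert.getIssuerX500Principal());
        }
    }
}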

Bonus

https://github.com/stanislawbartkowski/hdpwiredencryption/wiki

There are a number of pages containing practical steps on how to enable SSL for HDP components. They are based on the documentation but are more practical, drawing on hands-on experience.

For instance:

HDFS Ranger Plugin for SSL

NiFi service for SSL



Thursday, August 13, 2020

HBase, Phoenix and CsvBulkLoadTool

I was running MyBench in a new environment and it failed while loading data into a Phoenix table using the CsvBulkLoadTool utility.

WARN tool.LoadIncrementalHFiles: Attempt to bulk load region containing  into table BENCH.USERVISITS with files [family:0 path:hdfs://bidev/tmp/386640ec-d49e-4760-8257-05858a409321/BENCH.USERVISITS/0/b467b5560eee4d61a42d4c9e6a78eb7e] failed.  This is recoverable and they will be retried.

INFO tool.LoadIncrementalHFiles: Split occurred while grouping HFiles, retry attempt 100 with 1 files remaining to group or split

ERROR tool.LoadIncrementalHFiles: -------------------------------------------------

Bulk load aborted with some files not yet loaded:

After closer examination, I discovered that the error takes place while moving/renaming the input file into the HBase staging directory /apps/hbase/data/staging. In this cluster, the HBase data is encrypted, and moving data between an encrypted and a normal zone is not possible.

java.io.IOException: Failed to move HFile: hdfs://bidev/apps/hbase/data/staging/ambari-qa__BENCH.USERVISITS__dbb5qdfppq1diggr0dmdbcb1ji74ol4b9jn9ee2dgp1ttn9n5i6llfih7101fi1d/0/3a7f2d612c034253ad375ae002cc6ade to hdfs://bidev/tmp/fc43e454-00b3-4db0-8bdd-8b475885ab49/BENCH.USERVISITS/0/3a7f2d612c034253ad375ae002cc6ade

at org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$SecureBulkLoadListener.failedBulkLoad(SecureBulkLoadManager.java:423)

at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:6035)

at org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$3.run(SecureBulkLoadManager.java:284)

The source code can be found here.
if (!FSUtils.isSameHdfs(conf, srcFs, fs)) {
  LOG.debug("Bulk-load file " + srcPath + " is on different filesystem than " +
      "the destination filesystem. Copying file over to destination staging dir.");
  FileUtil.copy(srcFs, p, fs, stageP, false, conf);
} else if (copyFile) {
  LOG.debug("Bulk-load file " + srcPath + " is copied to destination staging dir.");
  FileUtil.copy(srcFs, p, fs, stageP, false, conf);
} else {
  LOG.debug("Moving " + p + " to " + stageP);
  FileStatus origFileStatus = fs.getFileStatus(p);
  origPermissions.put(srcPath, origFileStatus.getPermission());
  if (!fs.rename(p, stageP)) {
    throw new IOException("Failed to move HFile: " + p + " to " + stageP);
  }
}
When data is moved between different file systems, copying is enforced, but unfortunately data movement between an encrypted and an unencrypted zone is not covered here.

Another option is to make use of the "copyFile" parameter, which enforces copying. After analyzing the control flow, I discovered that there is an "hbase-site.xml" parameter, always.copy.files, which seemed to be the solution to the problem. But after applying this parameter, nothing changed.
Further examination, with a little help from remote debugging, unearthed a sad truth. CsvBulkLoadTool passes control to the "doBulkLoad" function in LoadIncrementalHFiles.java.

public Map<LoadQueueItem, ByteBuffer> doBulkLoad(Path hfofDir, final Admin admin, Table table,
    RegionLocator regionLocator) throws TableNotFoundException, IOException {
  return doBulkLoad(hfofDir, admin, table, regionLocator, false, false);
}

Unfortunately, the "copyFiles" parameter is hardcoded as "false", although there is a sound and ready-to-use "isAlwaysCopyFiles()" function utilizing the "hbase-site.xml" config file.
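For illustration only, the fix I have in mind is a one-liner. The sketch below is not the shipped HBase code; it assumes the last boolean in the full doBulkLoad signature is the copyFile flag.

// Hypothetical patch sketch: honour always.copy.files (via isAlwaysCopyFiles())
// instead of hardcoding false.
public Map<LoadQueueItem, ByteBuffer> doBulkLoad(Path hfofDir, final Admin admin, Table table,
    RegionLocator regionLocator) throws TableNotFoundException, IOException {
  return doBulkLoad(hfofDir, admin, table, regionLocator, false, isAlwaysCopyFiles());
}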
The only solution is a manual fix and recreating the package from source files. But it does not go easily, because one has to juggle different and outdated versions of HBase and Phoenix to create a "Phoenix client" package matching HDP 3.1.
So, two days spent without a solution.

Friday, August 7, 2020

HDP 3.1, HBase Phoenix and IBM Java

I spent several sleepless nights trying to sort out a nasty problem. While connecting to HBase Phoenix, the Phoenix client stalled before displaying the prompt.
/usr/hdp/current/phoenix-client/bin/sqlline.py
Setting property: [incremental, false]
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect jdbc:phoenix: none none org.apache.phoenix.jdbc.PhoenixDriver
Connecting to jdbc:phoenix:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/phoenix/phoenix-5.0.0.3.1.0.0-78-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
20/08/07 11:28:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

What's more, there was not a single error message, neither from the client nor in the HBase log files, Master or Region Servers. Simply nothing, no clue. Apart from that, everything was healthy and running soundly.

The first step toward the solution was to enable tracing in the Phoenix client. This setting is not managed by Ambari and requires manual modification.

vi /usr/hdp/3.1.0.0-78/phoenix/bin/log4j.properties
....
#psql.root.logger=WARN,console
psql.root.logger=TRACE,console

After tracing was enabled, the sqlline client was more talkative and provided the first clue.

INFO client.RpcRetryingCallerImpl: Call exception, tries=6, retries=36, started=4724 ms ago, cancelled=false, msg=Call to data2-worker.mycloud.com/10.135.118.222:16020 failed on local exception: javax.security.sasl.SaslException: Failure to initialize security context [Caused by org.ietf.jgss.GSSException, major code: 13, minor code: 0
major string: Invalid credentials
minor string: SubjectCredFinder: no JAAS Subject], details=row 'SYSTEM:CATALOG' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=data2-worker.mycloud.com,16020,1596794212924, seqNum=-1

The HDP cluster was Kerberized, and the problem seemed to be related to Kerberos authentication. Kerberos is the source of many problems, but everything else worked smoothly here.
Finally, I found the hint under this link. The culprit was IBM Java. The cluster is running under the control of OpenJDK, but the sqlline client was using the default host Java.

update-alternatives --config java

There are 3 programs which provide 'java'.

  Selection    Command
-----------------------------------------------
*+ 1           /usr/lib/jvm/java-1.8.0-ibm-1.8.0.5.10-1jpp.1.el7.x86_64/jre/bin/java
   2           /usr/jdk64/java-1.8.0-openjdk-1.8.0.77-0.b03.el7_2.x86_64/jre/bin/java
   3           /usr/java/jdk1.8.0_202-amd64/jre/bin/java

So the solution was extremely simple: just switch to OpenJDK and the Phoenix client works like a dream.

Friday, July 31, 2020

DB2 ODBC FDW PostgreSQL extension

I reinvigorated my PostgreSQL DB2 wrapper. It now also works with PostgreSQL 11 and 12. The source code and installation instructions are available here.

It took me some time to accomplish it. PostgreSQL failed because of a memory-violation-related problem after returning from the GetForeignPaths function. The crash happened inside the server code. What's more, it worked after recompiling the server directly from source code, so nailing down the problem was impossible. Finally, the solution was to migrate db2odbc_fdw.c to pure C code and get rid of all warnings. Probably the reason for the failure was a discrepancy between the data types in the C function signatures and the calling sequence.

Tuesday, June 30, 2020

My TPC/DS, new features

Introduction

I uploaded a new version of mytpcds, a wrapper around the TPC/DS benchmark allowing an easy and quick roll-out of the TPC/DS test against leading RDBMSs, including Hadoop SQL engines. Just deploy, configure using a provided template, and run. The new version adds an implementation of the Query Validation Test.

Query Validation Test

The Query Validation Test verifies the accuracy of the SQL engine. The RDBMS runs a sequence of SQL statements, called Validation Queries, on the Qualification database, and the result data set is compared against the expected data set. During the Validation Test, the queries should come back with the same results. The Validation Queries are standard TPC/DS query templates where the query parameters are substituted with predefined constants.
The substitution values for the Validation Queries are defined in the "TPC-DS Specification" manual, chapter "Appendix B: Business Questions". It is a mundane and error-prone task to prepare the Validation Queries manually across 99 query templates, and I also wanted to avoid having two different versions of the TPC/DS queries. So I decided to make the process automatic. First, I extracted all parameters and substitution values into separate configuration files. The naming convention is <n>.par, where <n> maps to the appropriate TPC/DS query. For instance, 1.par contains the substitution values for query1.tpl.
YEAR=2000
STATE=TN
AGG_FIELD=SR_RETURN_AMT
The run.sh launcher contains a separate task: ./tpc.sh queryqualification. This task replaces all parameter placeholders with the corresponding validation values and puts the queries in the <TPC/DS root dir>/work/{dbtype}queries directory, ready to be picked up by other run.sh tasks.
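The substitution itself is trivial; a sketch of the idea in Java is shown below. The ${NAME} placeholder syntax is purely illustrative here, it is not the convention used by the real templates.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Illustrative sketch of the parameter substitution step.
// Usage: java Qualify 1.par query1.tpl query1.sql
public class Qualify {
    public static void main(String[] args) throws IOException {
        Properties pars = new Properties();
        try (var in = Files.newBufferedReader(Path.of(args[0]))) {
            pars.load(in);                                     // e.g. YEAR=2000, STATE=TN
        }
        String query = Files.readString(Path.of(args[1]));     // query template
        for (String name : pars.stringPropertyNames()) {
            query = query.replace("${" + name + "}", pars.getProperty(name));
        }
        Files.writeString(Path.of(args[2]), query);            // validation query ready to run
    }
}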
The TPC/DS package comes with expected data sets for Query Validation, included in <TPC/DS root>/answer_sets. Unfortunately, the format of the answer sets is not consistent, which makes automated verification impossible. So I prepared my own version of the answer sets using DB2 output. Unfortunately, it does not match the output from other RDBMSs, including Hive, so determining which output is invalid is still a pending task.

QueryRunner

The Query Validation Test requires comparing the current result sets against a reference result set. Unfortunately, the output from different RDBMS command-line clients varies significantly, which makes automated comparison impossible. So I decided to prepare my own Java QueryRunner using JDBC and have full control over how the result is produced. The target jar is produced by the mvn package command. The only prerequisite for every database is the JDBC driver jar. So far, I have tested the QueryRunner with NPS/Netezza, DB2, IBM BigSQL, SQL Server and Hadoop Hive.
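The core of such a runner is small. A minimal sketch is shown below; the JDBC URL, credentials and the pipe-separated output format are illustrative, not the project's actual conventions.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

// Minimal JDBC query runner producing a uniform, comparable output for any database.
public class MiniQueryRunner {
    // args: <jdbc-url> <user> <password> <sql>
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(args[0], args[1], args[2]);
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(args[3])) {
            ResultSetMetaData meta = rs.getMetaData();
            int cols = meta.getColumnCount();
            while (rs.next()) {
                StringBuilder row = new StringBuilder();
                for (int i = 1; i <= cols; i++) {
                    if (i > 1) row.append('|');      // fixed separator, identical for every RDBMS
                    row.append(rs.getString(i));
                }
                System.out.println(row);
            }
        }
    }
}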

QueryRunner, Hive and Kerberos

Because life is never an easy road free of stones, the real challenge was to execute the QueryRunner for Hadoop/Hive in a Kerberized environment. The parameters regarding Kerberos cannot be included in the URL connection string. Before executing DriverManager.getConnection(url, user, password); the client should authenticate in the Hadoop cluster using Hadoop-related libraries. Of course, I wanted to avoid keeping two versions of the QueryRunner and to keep the development consistent. So I developed a separate HadoopAuth package; in the Hadoop/Hive environment, the Hadoop Kerberos authentication is done using the HadoopAuth package, but through the Java reflection feature. This way I was able to keep the QueryRunner clean. How to configure the QueryRunner for Hadoop/Hive is described here.
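The reflection trick boils down to a few lines. A sketch, under the assumption that the Hadoop client jars and a Kerberos-enabled core-site.xml are on the classpath; the principal and keytab below are placeholders.

// Kerberos login through reflection, so the main code has no compile-time dependency on Hadoop.
public class ReflectiveHadoopLogin {
    public static void login(String principal, String keytab) throws Exception {
        // Equivalent to: UserGroupInformation.loginUserFromKeytab(principal, keytab);
        Class<?> ugi = Class.forName("org.apache.hadoop.security.UserGroupInformation");
        ugi.getMethod("loginUserFromKeytab", String.class, String.class)
           .invoke(null, principal, keytab);
    }

    public static void main(String[] args) throws Exception {
        login("hive/host.example.com@EXAMPLE.COM", "/etc/security/keytabs/hive.keytab");
        // ... DriverManager.getConnection(hiveUrl, "", "") can proceed afterwards ...
    }
}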

Next steps

  • Further analysis of the reference Query Validation answer sets.
  • Add Microsoft SQL Server to the RDBMSs supported by myTPC-DS.
  • Enable the QueryRunner for all supported RDBMSs.



Sunday, May 31, 2020

Simple RestService library continued

Introduction

I enhanced my Simple RestService library to be a little more complex, but still simple.
The full source code is here.
The sample application depending on it is here.
The principle behind this project is to set up a REST API server based on Java only, without any additional dependencies. I added two features: SSL and Kerberos authentication.

SSL 

Java source code: here
Allows setting up an HTTPS RestService. The certificate can be self-signed or CA-signed. Client certificate authentication is not supported.
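For orientation, this is roughly the mechanism underneath: HTTPS on top of the JDK built-in server, with the SSL context loaded from a keystore. A sketch only; the keystore path, password and port are placeholders, and the library wires this up in its own way.

import com.sun.net.httpserver.HttpsConfigurator;
import com.sun.net.httpserver.HttpsServer;
import java.io.FileInputStream;
import java.net.InetSocketAddress;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;

// Sketch: minimal HTTPS endpoint using the JDK HttpsServer.
public class HttpsSketch {
    public static void main(String[] args) throws Exception {
        char[] password = "changeit".toCharArray();
        KeyStore ks = KeyStore.getInstance("JKS");
        try (FileInputStream in = new FileInputStream("server.keystore")) {
            ks.load(in, password);                  // keystore holding the server certificate
        }
        KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(ks, password);
        SSLContext ssl = SSLContext.getInstance("TLS");
        ssl.init(kmf.getKeyManagers(), null, null);

        HttpsServer server = HttpsServer.create(new InetSocketAddress(8443), 0);
        server.setHttpsConfigurator(new HttpsConfigurator(ssl));
        server.createContext("/ping", exchange -> {
            byte[] body = "pong".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
    }
}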

Kerberos

Java Source code: here
Allows Kerberos authentication. Tested with AD and MIT KDC. Only authentication is implemented; there is no doAs action.






Thursday, April 30, 2020

My private CA Center

Certificates, self-signed certificates, certificates signed by a Certificate Authority: it sounds complicated. But the devil is not as black as he is painted. It is easy to create a self-signed certificate, but sometimes one needs a CA-signed certificate without paying fees. So be the authority yourself. I found a very good article on how to create a private CA using open-source tools. But following the procedure manually is not a good way to spend your free time, so I created a solution which automates it all.
The solution and description are available here. The solution comes with three components.
  • Bash script ca.sh. The script automates the procedure described in the article. It creates a new Certificate Authority containing root and intermediate certificates. It also produces a certificate signed by the CA, either from a CSR (Certificate Signing Request) or from all necessary data provided directly, including the CN.
  • Java Rest/API server. Assuming the CA is created, the Java server generates a signed certificate through a Rest/API.
  • Docker script. The CA centre is created during Docker image creation, and the container exposes a Rest/API for certificate signing.

Monday, March 23, 2020

HDP 3.1.5, OpenJDK, Infra Solr and AD/Kerberos

Problem 
I spent several sleepless nights on a very nasty problem coming up after HDP 3.1.5 Kerberization. The Infra Solr components could not start, blocking the whole cluster. The message in the Ambari console said:
Skip /infra-solr/configs and /infra-solr/collections
Set world:anyone to 'cr' on  /infra-solr/configs and /infra-solr/collections
KeeperErrorCode = NoAuth for /infra-solr/configs
org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /infra-solr/configs
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.setACL(ZooKeeper.java:1399)

It looked as if the infra-solr user could not update the ZooKeeper /infra-solr znode because of insufficient privileges. But the ACL privileges looked correct.
[zk: localhost:2181(CONNECTED) 0] getAcl /infra-solr
'sasl,'infra-solr
: cdrwa
'world,'anyone
: r
[zk: localhost:2181(CONNECTED) 1]

After closer examination, I discovered strange stuff in the ZooKeeper log.

2020-03-23 01:33:12,260 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@120] - Successfully authenticated client: authenticationID=$6O1000-3NO0GILCOJUA@FYRE.NET; authorizationID=infra-solr/a1.fyre.ibm.com@FYRE.NET.
2020-03-23 01:33:12,261 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@136] - Setting authorizedID: $6O1000-3NO0GILCOJUA
2020-03-23 01:33:12,261 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@1030] - adding SASL authorization for authorizationID: $6O1000-3NO0GILCOJUA
2020-03-23 01:33:24,011 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x17104ad52230007, likely client has closed socket
So it seemed that ZooKeeper, when applying authorization rights, was using not the AD principal name (infra-solr) but the sAMAccountName attribute of the infra-solr AD principal ($6O1000-3NO0GILCOJUA). The Ambari Kerberos Wizard fills this attribute with random data only to keep it unique.
Solution 
The problem is described here; it is a bug coming with the 1.8.0_242 version of OpenJDK.
The only workaround is to downgrade OpenJDK to the 232 level or switch to Oracle JDK.
yum downgrade java*

java -version
openjdk version "1.8.0_232"
OpenJDK Runtime Environment (build 1.8.0_232-b09)
OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)

And last but not least: block the Java upgrade until the bug is fixed.
vi /etc/yum.conf

exclude=java*

Saturday, February 29, 2020

Simple RestService library

Motivation
REST API is the means of choice for communication between loosely coupled applications. There are plenty of REST API implementations, but I was looking for a solution as simple as possible, with minimal external dependencies or prerequisites. Finally, I ended up with a compact library utilizing the HttpServer already existing in the Java JDK.
Links
Intellij IDEA project, source code and javadoc.
Sample project utilizing the RestService project.
Another project having RestService dependency.
Highlights
  • Very lightweight, no external dependency, just Java JDK.
  • Can be dockerized, sample Dockerfile.
  • Add-ons making the life of a developer easier.
    • Validating and extracting URL query parameters, including type control.
    • Uploading data.
    • CORS relaxation.
    • Sending data with a valid HTTP response code.
  • "Dynamic" and "static" REST API calls. "Dynamic" means that the specification of a particular REST API endpoint can be defined after the request has reached the server but before handling it, thus allowing different custom logic depending on the URL path.
Usage
The service class should extend the RestHelper.RestServiceHelper abstract class and implement two methods:
  • getParams: delivers the REST API call specification (see below), including the URL query parameter definitions. The method is called after the REST API request is accepted by the HTTP server but before validating and running the call.
  • servicehandle: custom logic to serve the particular REST API endpoint. The method should conclude handling the request with a proper "produceresponse" call. The "servicehandle" method can take URL query parameters and utilize several helper methods.
REST API specification
The REST API endpoint specification is defined through the RestParams class. The specification consists of:
  • HTTP request method: GET, POST, PUT, etc.
  • List of allowed URL query parameters. Three parameter types are supported: BOOLEAN, INT and STRING (meaning any other).
  • Whether CORS should be relaxed for this particular endpoint.
  • Response content type (TEXT, JSON or not specified), Content-Type.
  • List of methods allowed in the response header, Access-Control-Allow-Methods.
Main
The main class should extend the RestStart abstract class.
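For comparison, this is roughly the boilerplate the library hides: a raw JDK HttpServer endpoint with manual query-parameter validation. A sketch of the underlying mechanism, not of the library's own API; the endpoint name and port are placeholders.

import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Raw JDK HttpServer endpoint: the query parameter "count" is parsed and type-checked by hand.
public class RawEndpoint {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/hello", exchange -> {
            String query = exchange.getRequestURI().getQuery();    // e.g. "count=3"
            int count = 1;
            if (query != null && query.startsWith("count=")) {
                try {
                    count = Integer.parseInt(query.substring("count=".length()));
                } catch (NumberFormatException e) {
                    exchange.sendResponseHeaders(400, -1);         // INT type control done manually
                    exchange.close();
                    return;
                }
            }
            byte[] body = "hello\n".repeat(count).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "text/plain");
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
    }
}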






Tuesday, February 4, 2020

HDP 3.1 and Spark job

I spent several sleepless nights trying to solve a problem with running a Spark/HBase application. The application was dying with a nasty error stack.
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.socket.ChannelOutputShutdownException: Channel output shutdown
at io.netty.channel.AbstractChannel$AbstractUnsafe.shutdownOutput(AbstractChannel.java:587)
... 22 more
Caused by: java.lang.NoSuchMethodError: org.apache.spark.network.util.AbstractFileRegion.transferred()J
at org.apache.spark.network.util.AbstractFileRegion.transfered(AbstractFileRegion.java:28)
at io.netty.channel.nio.AbstractNioByteChannel.doWrite(AbstractNioByteChannel.java:228)
at io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:282)
at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:879)
Usually, problems like that point to a versioning problem. But how could it happen if, while running, the application depends only on the client libraries provided by the HDP cluster?
Finally, after browsing through the source code and comparing different versions of the libraries, I crawled out of the swamp.
The culprit was an incompatible library in the HBase client directory, netty-all-4.0.52.Final.jar. This library calls the deprecated transfered method which, in turn, calls a non-existent transferred method in a Spark class. The newer netty-all-4.1.17.Final.jar calls the correct transferred method.

The solution was dazzlingly simple: just reverse the order of the classpath in the spark-submit command and give precedence to the correct libraries in the Spark client jars.
spark-submit ... --conf "spark.driver.extraClassPath=$CONF:$LIB" ...
  • wrong: LIB=/usr/hdp/current/hbase-client/lib/*:/usr/hdp/current/spark2-client/jars/*
  • correct: LIB=/usr/hdp/current/spark2-client/jars/*:/usr/hdp/current/hbase-client/lib/*

Friday, January 31, 2020

Kerberos and Phoenix

Phoenix is a SQL solution on top of HBase. There are two methods of connecting to Phoenix: using the full JDBC driver and the thin JDBC driver. The latter requires an additional server component, the Phoenix Query Server.
I wrote an article about how to connect to Phoenix using both methods in a Kerberized environment. I also created a simple IntelliJ IDEA project demonstrating a connection to Phoenix from a Java program. Both JDBC drivers are supported.
I also added information on how to connect to Phoenix from a Zeppelin notebook.
The article and sample project are available here.
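As a taste of the full-driver path, a connection in a Kerberized cluster can be opened with a URL that carries the principal and keytab. A sketch only; the hosts, znode, principal and keytab path are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch of a thick-driver Phoenix connection in a Kerberized cluster; values are placeholders.
public class PhoenixKerberosSketch {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:phoenix:zk1.example.com,zk2.example.com,zk3.example.com:2181"
                + ":/hbase-secure"                            // secure HBase znode
                + ":hbase-user@EXAMPLE.COM"                   // Kerberos principal
                + ":/etc/security/keytabs/hbase-user.keytab"; // keytab
        try (Connection con = DriverManager.getConnection(url);
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM SYSTEM.CATALOG")) {
            while (rs.next()) {
                System.out.println(rs.getLong(1));
            }
        }
    }
}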

AMS (Ambari Metrics System) is powered by a standalone HBase and Phoenix server. Using the command line, one can access AMS Phoenix and query the metrics without going to the UI. I also added information on how to accomplish this.
I was successful only when launching the command-line tool from the node where AMS is installed. I will work on how to access it remotely.