javahotel: Java, Parquet and JDBC

Sunday, 28 February 2021

Java, Parquet and JDBC

Strangely, it is almost impossible to access Parquet files outside a Hadoop/Spark context. I tried to move data locked in Parquet files into a relational database accessed through JDBC, using a standalone Java application, and failed.

So I ended up writing a Spark/Java application to address the issue.

Source and description: https://github.com/stanislawbartkowski/ParquetJDBC

The application reads Parquet-formatted data from a single file or a directory, partitions the data into several chunks, launches executors, and loads the chunks into a JDBC database in parallel. Both the number of partitions and the number of executors are configurable.
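
The partition-then-load-in-parallel idea described above can be illustrated without Spark. The following is a minimal plain-Java sketch, not the code from the repository: the class `ChunkedParallelLoad`, its methods, and the `sink` callback are hypothetical names introduced here, and a fixed thread pool stands in for Spark executors. In a real loader the sink would typically be a JDBC batch insert; in Spark the corresponding steps would roughly be `Dataset.repartition` followed by `DataFrameWriter.jdbc`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical illustration only; a thread pool plays the role of Spark executors.
class ChunkedParallelLoad {

    // Split rows into at most numPartitions chunks of near-equal size.
    public static <T> List<List<T>> partition(List<T> rows, int numPartitions) {
        if (numPartitions < 1) {
            throw new IllegalArgumentException("numPartitions must be >= 1");
        }
        List<List<T>> chunks = new ArrayList<>();
        int base = rows.size() / numPartitions;
        int extra = rows.size() % numPartitions;
        int start = 0;
        for (int i = 0; i < numPartitions; i++) {
            int len = base + (i < extra ? 1 : 0);
            if (len == 0) {
                break; // fewer rows than partitions: no empty chunks
            }
            chunks.add(new ArrayList<>(rows.subList(start, start + len)));
            start += len;
        }
        return chunks;
    }

    // Hand every chunk to the sink using numExecutors worker threads.
    // Returns the total number of rows passed to the sink.
    public static <T> int loadInParallel(List<List<T>> chunks, int numExecutors,
                                         Consumer<List<T>> sink) {
        ExecutorService pool = Executors.newFixedThreadPool(numExecutors);
        try {
            List<CompletableFuture<Integer>> results = new ArrayList<>();
            for (List<T> chunk : chunks) {
                results.add(CompletableFuture.supplyAsync(() -> {
                    // Assumption: a real sink would open a JDBC connection
                    // and insert the chunk as one batch.
                    sink.accept(chunk);
                    return chunk.size();
                }, pool));
            }
            int loaded = 0;
            for (CompletableFuture<Integer> result : results) {
                loaded += result.join(); // waits for the chunk to finish
            }
            return loaded;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(i);
        }
        // 4 partitions, 2 workers: both values are arbitrary example settings.
        List<List<Integer>> chunks = partition(rows, 4);
        int loaded = loadInParallel(chunks, 2, chunk ->
                System.out.println(Thread.currentThread().getName() + " loads " + chunk));
        System.out.println("rows loaded: " + loaded);
    }
}
```

Keeping the number of chunks and the number of workers as separate parameters mirrors the two settings mentioned above: when there are more chunks than workers, each worker simply picks up the next chunk as soon as it finishes the previous one.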

So far the application has been tested only in local and single-node Spark configurations. The next step is to configure and test it in a distributed Hadoop environment.
