Blog do projektu Open Source JavaHotel

niedziela, 28 lutego 2021

Java, Parquet and JDBC

 That is strange but it is almost impossible to access Parquet files outside Hadoop/Spark context. I was trying to move data imprisoned in Parquet to JDBC accessed relational database using standalone Java application and failed.

So I ended up with Spark/Java application to address the issue. 

Source and description: https://github.com/stanislawbartkowski/ParquetJDBC

The application loads Parquet formatted data, it can be a single file or a directory, partition the data into several chunks, launches executors and loads data into JDBC databases in parallel. The number of partitions and executors are configurable. 

The application was tested as a local and single-node Spark configuration. The next step is to configure and test the application in a distributed Hadoop environment.

Brak komentarzy:

Prześlij komentarz