Strange as it may seem, it is almost impossible to access Parquet files outside a Hadoop/Spark context. I tried to move data imprisoned in Parquet into a JDBC-accessible relational database using a standalone Java application and failed. So I ended up writing a Spark/Java application to address the issue.
Source and description: https://github.com/stanislawbartkowski/ParquetJDBC
The application loads Parquet-formatted data (a single file or a whole directory), partitions the data into several chunks, launches executors, and loads the data into a JDBC database in parallel. The number of partitions and the number of executors are configurable.
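This is not the project's actual code (see the repository above for that), but a minimal sketch of the idea using the standard Spark DataFrame API. The input path, the PostgreSQL URL, the credentials, and the partition count of 8 are placeholder assumptions:

import java.util.Properties;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class ParquetToJdbc {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("ParquetToJdbc")
                .getOrCreate();

        // Read a single Parquet file or a whole directory of Parquet files.
        Dataset<Row> df = spark.read().parquet("/data/input.parquet");

        Properties props = new Properties();
        props.setProperty("user", "dbuser");
        props.setProperty("password", "dbpassword");
        props.setProperty("driver", "org.postgresql.Driver");

        // Repartition so that several tasks write to the database in parallel;
        // each partition becomes a separate JDBC connection and insert stream.
        df.repartition(8)
          .write()
          .mode(SaveMode.Append)
          .jdbc("jdbc:postgresql://dbhost:5432/mydb", "target_table", props);

        spark.stop();
    }
}

The degree of parallelism on the database side is driven by the repartition count, while the number of executors is set when the job is submitted (for example via spark-submit --num-executors).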
The application has been tested in local and single-node Spark configurations. The next step is to configure and test it in a distributed Hadoop environment.