Blog for the Open Source JavaHotel project

Wednesday, June 30, 2021

Recursive HDFS

Strangely, I could not find any existing solution for downloading an entire HDFS directory tree, so I developed a project to fill the gap.

https://github.com/stanislawbartkowski/webhdfsdirectory

The only dependencies are Python 3 and the requests package. The HDFS tree is downloaded through the HDFS REST API (WebHDFS). Kerberos authentication is not supported, and the tool has not been tested over a secured/encrypted connection.
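To give an idea of how a recursive walk over WebHDFS can look, here is a minimal sketch. It is not the code from the repository: the function names webhdfs_list and walk are made up for illustration, and it uses the standard library urllib instead of requests so that it runs without extra packages. It assumes simple authentication through the user.name parameter and that the root path is a directory.

```python
import json
import posixpath
import urllib.parse
import urllib.request


def webhdfs_list(base_url, path, user=None):
    """Return the FileStatus entries of one HDFS directory via LISTSTATUS."""
    query = {"op": "LISTSTATUS"}
    if user:
        query["user.name"] = user  # simple (non-Kerberos) authentication
    url = "{}/webhdfs/v1{}?{}".format(
        base_url.rstrip("/"),
        urllib.parse.quote(path),
        urllib.parse.urlencode(query),
    )
    with urllib.request.urlopen(url) as response:
        return json.load(response)["FileStatuses"]["FileStatus"]


def walk(list_dir, root):
    """Yield (path, type) for every entry below root, depth first.

    list_dir is any callable that maps a directory path to its list of
    FileStatus dictionaries (hypothetical hook, handy for testing).
    """
    for entry in list_dir(root):
        path = posixpath.join(root, entry["pathSuffix"])
        yield path, entry["type"]
        if entry["type"] == "DIRECTORY":
            yield from walk(list_dir, path)


# Example call (host, port, user and path are assumptions):
# lister = lambda p: webhdfs_list("http://namenode:9870", p, user="hdfs")
# for path, kind in walk(lister, "/data"):
#     print(kind, path)
```

Passing the listing function as a parameter keeps the traversal separate from the HTTP call, so the walk can be tried against an in-memory dictionary before pointing it at a real cluster.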

The solution consists of a simple hdfs package that does the main work, plus Python and bash wrappers. It also allows applying a regular-expression directory selector and offers a "dry-run" option to test the selection without downloading anything. It was tested in a real production environment.
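The idea of a regular-expression selector combined with a dry run can be sketched in a few lines. This is only an illustration under my own assumptions, not the interface of the hdfs package: select_directories, download_tree and the fetch callback are hypothetical names.

```python
import re


def select_directories(paths, pattern):
    """Keep only the directory paths that the regular expression matches."""
    regex = re.compile(pattern)
    return [p for p in paths if regex.search(p)]


def download_tree(directories, pattern, fetch, dry_run=False):
    """Call fetch(path) for each selected directory.

    With dry_run=True nothing is fetched; the selection is only printed.
    Returns the list of selected directories in both cases.
    """
    selected = select_directories(directories, pattern)
    for path in selected:
        if dry_run:
            print("would download:", path)
        else:
            fetch(path)
    return selected
```

Running it first with dry_run=True shows which directories the expression picks up, which is a cheap way to check the pattern before any data is transferred.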