Surprisingly, I could not find any existing tool for downloading an HDFS directory tree, so I developed a project to fill that gap.
https://github.com/stanislawbartkowski/webhdfsdirectory
The only dependencies are Python 3 and the requests package. The HDFS tree is downloaded through the HDFS REST API (WebHDFS). Kerberos authentication is not supported, and the tool has not been tested over a secured/encrypted connection.
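To give an idea of how a tree can be read over WebHDFS, here is a minimal sketch. It is not the code from the project: the function names are made up for illustration, and it uses the standard library's urllib instead of requests so that it runs without any extra install. It relies only on the documented LISTSTATUS operation and, like the project, assumes an unsecured cluster with no Kerberos.

```python
import json
import posixpath
import urllib.parse
import urllib.request


def list_status(base_url, path, user):
    """Return the FileStatus entries of one HDFS directory (WebHDFS LISTSTATUS).

    base_url, path and user are illustrative parameters, not the project's API.
    """
    query = urllib.parse.urlencode({"op": "LISTSTATUS", "user.name": user})
    url = f"{base_url}/webhdfs/v1{urllib.parse.quote(path)}?{query}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["FileStatuses"]["FileStatus"]


def walk(path, lister):
    """Yield (hdfs_path, is_dir) for everything below `path`, depth first.

    `lister` is any callable that maps a directory path to its FileStatus list,
    which keeps the traversal independent of the HTTP layer.
    """
    for entry in lister(path):
        child = posixpath.join(path, entry["pathSuffix"])
        is_dir = entry["type"] == "DIRECTORY"
        yield child, is_dir
        if is_dir:
            yield from walk(child, lister)
```

Against a real cluster it could be called as `walk("/data", lambda p: list_status("http://namenode:9870", p, "hdfs"))`, where the host, port, and user are placeholders for your own NameNode address and HDFS user. Files found this way can then be fetched with the WebHDFS OPEN operation.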
The solution consists of a simple hdfs package that does the main work, plus Python and bash wrappers. It also lets you apply a regular expression directory selector and offers a "dry-run" option to test the selection without downloading anything. It was tested in a real production environment.
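The idea behind the directory selector and the dry-run option can be sketched as follows. This is only an illustration and not the project's actual interface: the function name and parameters are hypothetical, and I assume here that a directory is selected when the pattern matches anywhere in its path.

```python
import re


def download_selected(dirs, pattern, fetch, dry_run=False):
    """Apply a regular expression selector to directory paths.

    dirs    -- iterable of HDFS directory paths
    pattern -- regular expression; matching directories are selected (assumption)
    fetch   -- callable that downloads one directory (hypothetical hook)
    dry_run -- when True, only report what would be downloaded
    """
    selector = re.compile(pattern)
    selected = [d for d in dirs if selector.search(d)]
    for d in selected:
        if dry_run:
            print(f"dry-run: would download {d}")
        else:
            fetch(d)
    return selected
```

Running it first with `dry_run=True` shows which directories the expression picks up, so the pattern can be corrected before any data is transferred.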