uftp (Unicore FTP)
Introduction
uftp is a data transfer tool that uses parallel streams to transfer large amounts of data efficiently. It follows a client/server architecture: users run the uftp client to transfer data to and from existing uftp servers. DKRZ provides a uftp server (backed by multiple transfer nodes) for data transfers to and from the DKRZ Lustre file system. To transfer data to or from your local computer, you need the uftp client installed there; it is available on GitHub.
uftp usage
uftp uses SSH keys to authenticate users to the uftp server, so you first need to create an SSH key that is used for data transfer only. To create the key, use ssh-keygen, e.g.:
ssh-keygen -b 4096 -t rsa -f uftpkey
To use this key for authentication to the DKRZ uftp server, upload the public key to your personal profile on https://luv.dkrz.de: choose “User -> public keys” and click the “Add key” button. Paste the public key (in our example, the contents of “uftpkey.pub”) into the text box, select the “uftp” check box, and click “register key”.
Keys from luv are transferred to the uftp server every twenty minutes, so it can take up to twenty minutes before you can use uftp with a newly registered key.
The uftp client is available in the DKRZ software tree. To use the client, simply load the environment module:
$ module load uftp-client
To get information about the uftp server (and to test whether authentication with your key works), use the uftp client:
$ uftp info -u k202066 -i ./uftpkey https://uftp.dkrz.de:9000/rest/auth/HPCDATA
Client identity: CN=k202066, OU=ssh-local-users
Client auth method: SSHKEY
Auth server type: AuthServer
Server: HPCDATA
URL base: https://uftp.dkrz.de:9000/rest/auth/HPCDATA:
Description: HPCDATA
Remote user info: uid=k202066;gid=N/A
Sharing support: not available
Server status: OK [2 of 2 UFTPD servers available]
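If you want to check the server status from a script, the number of available servers can be extracted from the “Server status” line. The following is a minimal sketch; the function name uftp_server_count is hypothetical, and it assumes the status line keeps exactly the format shown above.

```shell
# uftp_server_count (hypothetical helper): reads "uftp info" output on stdin and
# prints the number of available UFTPD servers, i.e. the first number in
# "Server status: OK [2 of 2 UFTPD servers available]".
# Assumes the status line has exactly this format; prints nothing otherwise.
uftp_server_count() {
    sed -n 's/^Server status: .*\[\([0-9][0-9]*\) of [0-9][0-9]* UFTPD servers available].*/\1/p'
}

# Example:
#   uftp info -u k202066 -i ./uftpkey https://uftp.dkrz.de:9000/rest/auth/HPCDATA | uftp_server_count
```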
To list the contents of a directory on the uftp server, append the remote path to the server URL after a colon:
$ uftp ls -u k202066 -i ./uftpkey -v https://uftp.dkrz.de:9000/rest/auth/HPCDATA:/scratch/k/k202066
Verbose mode
Using SSH agent
Using SSH key </home/k202066/uftpkey>
drwx 4096 2016-01-28 18:11 test
drwx 4096 2020-01-03 13:09 tmp
To copy a file from the uftp server, use:
$ uftp cp -u k202066 -i ./uftpkey https://uftp.dkrz.de:9000/rest/auth/HPCDATA:/scratch/k/k202066/myfile.nc ./myfile.nc
To copy a file to the uftp server, use:
$ uftp cp -u k202066 -i ./uftpkey ./myfile.nc https://uftp.dkrz.de:9000/rest/auth/HPCDATA:/home/dkrz/k202066/myfile.nc
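Since the user name and key file are the same on every command line, a small shell wrapper can save typing. This is a sketch for bash; the names dkrz_uftp, UFTP_USER, UFTP_KEY and UFTP_URL are our own choices and not part of uftp.

```shell
# Hypothetical convenience settings; adjust to your account and key file.
UFTP_USER=${UFTP_USER:-k202066}
UFTP_KEY=${UFTP_KEY:-./uftpkey}
UFTP_URL=${UFTP_URL:-https://uftp.dkrz.de:9000/rest/auth/HPCDATA}

# dkrz_uftp SUBCOMMAND [ARGS...] (hypothetical helper)
# Runs "uftp SUBCOMMAND -u USER -i KEY ARGS...", so that user name and key
# do not have to be repeated on every command line.
dkrz_uftp() {
    local subcmd=$1
    shift
    uftp "$subcmd" -u "$UFTP_USER" -i "$UFTP_KEY" "$@"
}

# Examples (equivalent to the ls and download commands above):
#   dkrz_uftp ls -v "$UFTP_URL:/scratch/k/k202066"
#   dkrz_uftp cp "$UFTP_URL:/scratch/k/k202066/myfile.nc" ./myfile.nc
```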
For large files, you can use multiple threads to transfer the data in parallel. This is set with the option “-t <threadnumber>”:
$ uftp cp -u k202066 -i ./uftpkey -t4 ./myfile.nc https://uftp.dkrz.de:9000/rest/auth/HPCDATA:/home/dkrz/k202066/myfile.nc
Experiments show that this pays off for files larger than about 20 GB.
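The 20 GB rule of thumb can be written as a small helper that suggests a value for -t. This is a sketch only: the function name is hypothetical, “20 GB” is taken as 20·10^9 bytes, 4 is simply the thread count from the example above rather than a tuned value, and it assumes “-t1” behaves like a plain single-threaded copy.

```shell
# uftp_threads_for_size BYTES (hypothetical helper)
# Prints a suggested thread count: 4 above 20 GB (assumed to mean 20 * 10^9 bytes),
# otherwise 1. The value 4 is taken from the example above, not tuned.
uftp_threads_for_size() {
    if [ "$1" -gt 20000000000 ]; then
        echo 4
    else
        echo 1
    fi
}

# Usage (GNU stat; on macOS use "stat -f %z" instead of "stat -c %s"):
#   t=$(uftp_threads_for_size "$(stat -c %s ./myfile.nc)")
#   uftp cp -u k202066 -i ./uftpkey -t"$t" ./myfile.nc https://uftp.dkrz.de:9000/rest/auth/HPCDATA:/home/dkrz/k202066/myfile.nc
```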
Other options of uftp can be listed with “uftp --help”. The “uftp cp” command has many more options (e.g. to preserve the file modification time, to use multiple TCP streams, or to limit the bandwidth used); see “uftp cp -h” for a list of all options.
Use cases
Copy a data tree
To copy a directory tree (without creating a tar archive first), we recommend adding the following options to uftp cp:
-p,--preserve Preserve file modification timestamp
-R,--resume Check existing target file(s) and try to resume
-r,--recurse Recurse into subdirectories, if applicable
Take the file sizes into account when configuring the parallelization of the transfer (see “Transfer speed optimization” below).
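To see how large the files in a tree are before copying it, you can count them and compute their mean size. A minimal sketch, assuming GNU find (for -printf) and awk are available; tree_size_summary is a hypothetical name. The commented command combines the options listed above; the thread count and the target path are illustrative only, so check how uftp places the copied tree with a small test directory first.

```shell
# tree_size_summary DIR (hypothetical helper): print the number of regular files
# below DIR, their total size and their mean size in bytes.
# Requires GNU find (for -printf).
tree_size_summary() {
    find "$1" -type f -printf '%s\n' |
        awk '{ n++; total += $1 }
             END {
                 mean = (n > 0) ? total / n : 0
                 printf "files=%d total_bytes=%.0f mean_bytes=%.0f\n", n, total, mean
             }'
}

# Example: inspect the tree, then copy it recursively with the options above
# (-t4 is only an example value):
#   tree_size_summary ./mydata
#   uftp cp -r -p -R -t4 -u k202066 -i ./uftpkey ./mydata https://uftp.dkrz.de:9000/rest/auth/HPCDATA:/scratch/k/k202066/
```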
Synchronization
To synchronize on file level, use
$ uftp sync
To resume an interrupted copy, use
$ uftp cp -R
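Since “uftp cp -R” checks existing target files and tries to resume, it lends itself to a simple retry loop for long transfers over unstable connections. The sketch below retries a copy a limited number of times; it assumes uftp exits with a non-zero status when a transfer fails, and the function name uftp_cp_retry and the variable UFTP_RETRY_DELAY are hypothetical.

```shell
# uftp_cp_retry MAX_TRIES [uftp cp options and paths...] (hypothetical helper)
# Re-runs "uftp cp -R ..." until it succeeds or MAX_TRIES attempts are used up.
# Assumes uftp exits with a non-zero status when a transfer fails.
uftp_cp_retry() {
    local max=$1 attempt=1
    shift
    until uftp cp -R "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "uftp cp failed after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        # Pause before the next attempt (seconds); UFTP_RETRY_DELAY is our own variable.
        sleep "${UFTP_RETRY_DELAY:-60}"
    done
}

# Example:
#   uftp_cp_retry 5 -u k202066 -i ./uftpkey ./myfile.nc https://uftp.dkrz.de:9000/rest/auth/HPCDATA:/scratch/k/k202066/myfile.nc
```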
Transfer speed optimization
To optimize the uftp transfer speed, three points need to be considered:
The number of running uftp servers
At DKRZ, two uftp servers are running. Each uftp client command is directed to exactly one server, so two commands need to be submitted at the same time to fully utilize both servers; a proxy then delegates the commands to the two servers. The number of running servers is shown in the “Server status” line printed by “uftp info”:
$ uftp info -u k204210 -i uftpkey https://uftp.dkrz.de:9000/rest/auth/HPCDATA
Server status: OK [2 of 2 UFTPD servers available]
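One way to submit two commands at once from a single shell is to start them in the background and wait for both. A minimal bash sketch; uftp_cp_parallel is a hypothetical helper that starts one command per file given, so pass roughly as many large files as there are servers (two at DKRZ). The user name and key path are example values.

```shell
# Example settings; adjust to your account and key file.
UFTP_USER=${UFTP_USER:-k202066}
UFTP_KEY=${UFTP_KEY:-./uftpkey}

# uftp_cp_parallel DEST_URL FILE... (hypothetical helper, bash)
# Starts one "uftp cp" per FILE in the background so that the proxy can hand
# the commands to different servers, then waits for all of them.
# Returns non-zero if any of the transfers failed.
uftp_cp_parallel() {
    local dest=$1 f pid rc=0
    local pids=()
    shift
    for f in "$@"; do
        uftp cp -u "$UFTP_USER" -i "$UFTP_KEY" "$f" "$dest/$(basename "$f")" &
        pids+=("$!")
    done
    for pid in "${pids[@]}"; do
        wait "$pid" || rc=1
    done
    return "$rc"
}

# Example:
#   uftp_cp_parallel https://uftp.dkrz.de:9000/rest/auth/HPCDATA:/scratch/k/k202066 ./part1.nc ./part2.nc
```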
The target storage/filesystem connection
If the target storage is the Levante Lustre file system, the transfer speed depends on the connection between the nodes where uftp is executed and the Lustre servers. Experiments show a maximum speed of about 500 MB/s per rack. Since all nodes in one rack share their Lustre connections, the uftp commands need to be submitted on nodes in different racks. Please contact our support for further help with implementing this.
Data size
If the target storage is the Levante Lustre file system, small files decrease the transfer speed. For small files, use many threads within uftp (a large THREADS value in “uftp cp -tTHREADS”) so that the transfer is well parallelized.
Note that the DKRZ uplink limits the transfer speed to 2 GB/s; more realistically, you can expect to reach about 1 GB/s.
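To put these limits into perspective, a quick estimate of the transfer time helps. A sketch under simplifying assumptions: decimal units (1 GB = 1000 MB), a rate that is sustained for the whole transfer, and a hypothetical function name transfer_seconds. For example, 1 TB at about 1 GB/s takes roughly 1000 s, i.e. about 17 minutes.

```shell
# transfer_seconds SIZE_GB RATE_MB_PER_S (hypothetical helper)
# Rough lower bound on the transfer time in whole seconds, assuming decimal
# units (1 GB = 1000 MB) and a rate sustained over the whole transfer.
transfer_seconds() {
    local size_gb=$1 rate_mb_s=$2
    # Integer ceiling division: (a + b - 1) / b
    echo $(( (size_gb * 1000 + rate_mb_s - 1) / rate_mb_s ))
}

# 1 TB (1000 GB) at the 2 GB/s uplink limit: transfer_seconds 1000 2000 -> 500
# 1 TB at a more realistic 1 GB/s:           transfer_seconds 1000 1000 -> 1000 (about 17 min)
# 1 TB at 500 MB/s (one rack, see above):    transfer_seconds 1000 500  -> 2000
```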