uftp

Introduction

uftp is a data transfer tool that uses parallel streams to transfer large amounts of data efficiently. It has a client/server architecture: a user runs the uftp client to transfer data to and from existing uftp servers. DKRZ provides a uftp server (backed by multiple transfer nodes) to enable data transfers to and from the DKRZ Lustre filesystem.

uftp usage

uftp uses SSH keys to authenticate the user to the uftp server. To use uftp, you therefore have to create an SSH key that is used for data transfer only. To create a key, use ssh-keygen, e.g.:

ssh-keygen -b 4096 -t rsa -f uftpkey

To use this key for authentication to the DKRZ uftp server, you have to upload the public key to your personal profile on https://luv.dkrz.de. Choose “User -> public keys” and select the “Add key” button. Paste the contents of the public key file (in our example “uftpkey.pub”) into the text box, select the “uftp” check box, then select “register key”.

Keys from luv are transferred to the uftp server every twenty minutes, so it can take up to twenty minutes before you are able to use uftp with a newly registered key.

The uftp client is available in the DKRZ software tree. To use the client, simply load the environment module:

$ module load uftp-client

To get information about the uftp server (and to test if authentication works with your key), use the uftp client:

$ uftp info -u k202066 -i ./uftpkey https://uftp.dkrz.de:9000/rest/auth/HPCDATA
Client identity:    CN=k202066, OU=ssh-local-users
Client auth method: SSHKEY
Auth server type:   AuthServer
Server: HPCDATA
  URL base:         https://uftp.dkrz.de:9000/rest/auth/HPCDATA:
  Description:      HPCDATA
  Remote user info: uid=k202066;gid=N/A
  Sharing support:  not available
  Server status:    OK [2 of 2 UFTPD servers available]
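The availability figure in the "Server status" line can also be read in a script, for example before starting several transfers. The following is a minimal sketch, assuming the status line keeps the format shown above; the function name available_servers is hypothetical and not part of uftp.

```shell
#!/bin/sh
# Hypothetical helper: read "uftp info" output on stdin and print the
# number of available UFTPD servers (the first number in "[2 of 2 ...]").
# Prints nothing if no matching "Server status" line is found.
available_servers() {
    sed -n 's/.*Server status:.*\[\([0-9][0-9]*\) of [0-9][0-9]* UFTPD servers available].*/\1/p'
}

# Example using the output shown above instead of a live server:
sample='  Server status:    OK [2 of 2 UFTPD servers available]'
printf '%s\n' "$sample" | available_servers
```

Against the real server, the output of the uftp info command shown above would be piped into available_servers in the same way.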

To list the content of a directory on the uftp server, use:

$ uftp ls -u k202066 -i ./uftpkey -v https://uftp.dkrz.de:9000/rest/auth/HPCDATA:/mnt/lustre01/scratch/k/k202066
Verbose mode
Using SSH agent
Using SSH key </home/k202066/uftpkey>
drwx 4096 2016-01-28 18:11 test
drwx 4096 2020-01-03 13:09 tmp

To copy a file from the uftp server, use:

$ uftp cp -u k202066 -i ./uftpkey https://uftp.dkrz.de:9000/rest/auth/HPCDATA:/mnt/lustre01/scratch/k/k202066/myfile.nc ./myfile.nc

To copy a file to the uftp server, use:

$ uftp cp -u k202066 -i ./uftpkey ./myfile.nc https://uftp.dkrz.de:9000/rest/auth/HPCDATA:/home/dkrz/k202066/myfile.nc

If your file is large, you can use multiple threads to transfer the data in parallel. The number of threads is set with the option “-t <threadnumber>”:

$ uftp cp -u k202066 -i ./uftpkey -t4 ./myfile.nc https://uftp.dkrz.de:9000/rest/auth/HPCDATA:/home/dkrz/k202066/myfile.nc

Experiments show that this is useful when files are larger than 20GB.
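The size rule above can be applied in a script before calling uftp cp. This is a minimal sketch, assuming that 20GB is read as 20 * 1024^3 bytes and that four threads (the value from the example above) are wanted; the helper name thread_option is hypothetical.

```shell
#!/bin/sh
# Threshold from the text above: 20 GB, interpreted here as 20 * 1024^3 bytes.
THRESHOLD_BYTES=$((20 * 1024 * 1024 * 1024))

# Hypothetical helper: print "-t4" for a file size (in bytes) above the
# threshold, and nothing otherwise.
thread_option() {
    size_bytes=$1
    if [ "$size_bytes" -gt "$THRESHOLD_BYTES" ]; then
        printf '%s\n' "-t4"
    fi
}

thread_option 32212254720   # 30 * 1024^3 bytes, above the threshold
thread_option 1048576       # 1 MiB, below the threshold
```

The file size in bytes can be obtained with, for example, wc -c < ./myfile.nc, and the printed option can then be added to the uftp cp command line.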

Other options of uftp can be listed with “uftp --help”. The “uftp cp” command has many more options (e.g. to preserve the file modification time, to use multiple TCP streams or to limit the bandwidth used); see “uftp cp -h” for a list of all options.

Use cases

  1. Copy a data tree

To copy a data tree (without packing it into a tar archive first), we recommend adding the following options to uftp cp:

-p,--preserve                        Preserve file modification timestamp
-R,--resume                          Check existing target file(s) and
                                     try to resume
-r,--recurse                         Recurse into subdirectories, if applicable

Consider file sizes when configuring the parallelization of the transfer.

  2. Synchronization

To synchronize at file level, you can use

$ uftp sync

To resume an interrupted copy process, use

$ uftp cp -R
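The options recommended for copying a data tree can be combined into a single command. The sketch below only prints that command so that it can be checked before it is run. The local directory ./mydata, the remote target path and the helper name tree_copy_command are assumptions for illustration. Whether a plain directory path or a wildcard pattern is needed as the source may depend on the client version, so check “uftp cp -h” first.

```shell
#!/bin/sh
# Assumed values for illustration; replace with your own account, key and paths.
UFTP_USER=k202066
UFTP_KEY=./uftpkey
UFTP_BASE=https://uftp.dkrz.de:9000/rest/auth/HPCDATA

# Hypothetical helper: print the uftp command that copies a local directory
# tree to a remote directory with the recommended options:
#   -r  recurse into subdirectories
#   -p  preserve file modification timestamps
#   -R  check existing target files and try to resume
tree_copy_command() {
    local_dir=$1
    remote_dir=$2
    printf 'uftp cp -u %s -i %s -r -p -R %s %s:%s\n' \
        "$UFTP_USER" "$UFTP_KEY" "$local_dir" "$UFTP_BASE" "$remote_dir"
}

tree_copy_command ./mydata /mnt/lustre01/scratch/k/k202066/mydata
```

Because -R is included, the same command can be issued again after an interruption to resume the copy.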

Transfer speed optimization

To optimize uftp transfer speed, three points need to be considered and addressed:

  1. The number of running uftp servers

At DKRZ, two uftp servers are running. One uftp client command is directed to exactly one server. To fully utilize both servers, two commands need to be submitted at the same time, so that a proxy delegates them to both servers. The number of running servers can be read from the server status printed by “uftp info”:

$ uftp info -u k204210 -i uftpkey https://uftp.dkrz.de:9000/rest/auth/HPCDATA

 Server status:    OK [2 of 2 UFTPD servers available]

  2. The target storage/filesystem connection

If the target storage is the Levante Lustre filesystem, the transfer speed depends on the connection between the client node where uftp is executed and the Lustre servers. Experiments show a maximum speed of about 500MB/s per rack. Since all nodes in one rack share their Lustre connections, the uftp commands need to be submitted on nodes in different racks. Please contact our support for help with implementing this.

  3. Data size

If the target storage is the Levante Lustre filesystem, small files decrease the transfer speed. For small files, use a large number of threads within uftp (a large “THREADS” value in “uftp cp -tTHREADS”) so that the transfer can be parallelized well.

Note that the DKRZ uplink limits the transfer speed to 2GB/s. In practice, you are more likely to reach only about 1GB/s.
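Submitting two commands at once, as described in the first point, can be done with shell background jobs. This is a minimal sketch under stated assumptions: the file names and the helper parallel_upload are hypothetical, and UFTP_CMD defaults to echo so that the script only prints the commands. Which server handles which command is decided on the server side, and the sketch does not place the commands on nodes in different racks.

```shell
#!/bin/sh
# Dry run by default: the commands are printed, not executed.
# Set UFTP_CMD=uftp in the environment to start real transfers.
UFTP_CMD=${UFTP_CMD:-echo}

# Hypothetical helper: start one "uftp cp" per source file in the background,
# then wait until all of them have finished.
parallel_upload() {
    remote_dir=$1
    shift
    for src in "$@"; do
        "$UFTP_CMD" cp -u k202066 -i ./uftpkey "$src" \
            "$remote_dir/$(basename "$src")" &
    done
    wait   # block until every background job has finished
}

parallel_upload https://uftp.dkrz.de:9000/rest/auth/HPCDATA:/home/dkrz/k202066 \
    ./part1.nc ./part2.nc
```

Since the jobs run concurrently, the order of the printed lines (and of any real transfer output) is not fixed.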