GridFTP

Note

Support for the open source Globus Toolkit ended in January 2018. We are working on a transition to the Globus cloud transfer service. Existing installations of the GridFTP client globus-url-copy continue to work.

Introduction

GridFTP is an extension of the File Transfer Protocol (FTP) for secure, efficient and fault-tolerant data transfer. The GridFTP implementation employed at DKRZ is part of the Globus Toolkit, a bundle of tools for grid computing, which uses GSI (Grid Security Infrastructure) or SSH mechanisms for authentication and encryption. It provides numerous tuning options and enhanced features for optimal use of bandwidth, for example multiple parallel data streams or striping of data transfers over multiple connections or clusters of servers. For performance reasons, only the control connection is both authenticated and encrypted; the data channel is authenticated but not encrypted by default. Encryption and integrity protection of the data can be activated if required.

In short, GridFTP offers significantly higher transfer performance than basic tools like scp, sftp or rsync and should be used for moving large amounts of data.

GridFTP Usage

A GridFTP server is running on two mistral frontend nodes. The connection load is distributed via Round Robin DNS across them. The DKRZ GridFTP server can be accessed under the name gridftp.dkrz.de. No additional environment modules need to be loaded on mistral.

In the following, we assume that a GridFTP client and proxy utils are already installed on your local system.

The globus-url-copy program is a GridFTP client tool supplied by the Globus Project. The command line tool is suitable for scripting and supports different data movement protocols. The command syntax is:

globus-url-copy [options] source_url destination_url

The source and destination URLs combine specifications on data movement protocol, GridFTP server name and paths to the file to be transferred, i.e. protocol://server[:port]/path. For example:

  • gsiftp://hostname/path/to/remote_file

  • sshftp://username@hostname/path/to/remote_file

  • file:///absolute_path/to/local_file

The available protocol specifiers in the URL are

  • gsiftp:// - Access remote file using GridFTP with GSI security

  • sshftp:// - Access remote file using GridFTP over SSH

  • file:// - Locally accessible file

  • ftp:// - Access remote file using original FTP

  • http://, https:// - Access remote file using HTTP or HTTPS

Inexperienced users are advised to use absolute file paths in the source and destination URLs.

Note

If the data path refers to a directory, it must be terminated with a slash (/).
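The two recommendations above (absolute paths, and a trailing slash for directories) can be applied in scripts with small helper functions. The following is a minimal sketch; the function names local_url and dir_url are hypothetical and not part of the Globus Toolkit.

```shell
# Hypothetical helpers for building URLs (not part of globus-url-copy).

# local_url PATH: print a file:// URL with an absolute path.
# Relative paths are resolved against the current working directory.
local_url() {
    case "$1" in
        /*) printf 'file://%s\n' "$1" ;;
        *)  printf 'file://%s/%s\n' "$PWD" "$1" ;;
    esac
}

# dir_url URL: print URL unchanged if it already ends with "/",
# otherwise append a single "/" so that it refers to a directory.
dir_url() {
    case "$1" in
        */) printf '%s\n' "$1" ;;
        *)  printf '%s/\n' "$1" ;;
    esac
}

local_url /local_path/local_filename
dir_url sshftp://userid@gridftp.dkrz.de/remote_path/remote_dir
```

The output of these functions can be used directly as the source_url or destination_url argument of globus-url-copy.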

The following list shows some important globus-url-copy options in alphabetical order:

  • -cd - Create the destination directory if it does not exist

  • -dbg - Print debug information for the control channel

  • -f FILENAME - Read a list of source and destination URLs from the file FILENAME

  • -fast - Use MODE E (extended block mode) for data transfers (allows for data channel re-use etc.)

  • -list URL - List files located at URL

  • -p N - Enable N parallel streams (N is usually between 4 and 8). Note that this option switches from passive to active mode FTP, which causes the server to initiate the data connection. If your local machine is behind a firewall, you cannot use this performance-improving option.

  • -r - Transfer directories recursively

  • -sync - Do not transfer files that already exist identically at the destination

  • -tcp-bs SIZE - Specify the TCP buffer size (in bytes) for the underlying FTP data channels

  • -vb - Display source and destination URLs, number of bytes transferred and transfer rates
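A suitable SIZE for -tcp-bs depends on the network path. A common rule of thumb, given here as an assumption and not as a DKRZ recommendation, is the bandwidth-delay product: bandwidth multiplied by round-trip time. The sketch below computes it in bytes; the function name bdp_bytes and the example values (1000 Mbit/s, 20 ms) are hypothetical.

```shell
# bdp_bytes MBIT_PER_S RTT_MS
# Print the bandwidth-delay product in bytes.
# 1 Mbit/s over 1 ms corresponds to 1000 bits = 125 bytes,
# hence the factor 125.
bdp_bytes() {
    echo $(( $1 * $2 * 125 ))
}

# Hypothetical example: 1000 Mbit/s link with 20 ms round-trip time
bdp_bytes 1000 20
```

The printed value could then be passed as SIZE, for example with -tcp-bs "$(bdp_bytes 1000 20)". The round-trip time to a server can be estimated with ping.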

For the description of all options and arguments please refer to the manual pages:

$ man globus-url-copy

GridFTP over SSH

For the sshftp protocol you only need a valid user account with mistral access. It is convenient to set up SSH keys for password-less authentication.

To list the contents of a remote mistral directory use:

$ globus-url-copy -list sshftp://userid@gridftp.dkrz.de/some/path/

Note that the trailing “/” at the end of the directory path is required to list the contents of the directory and not just the directory name.

To upload a single file from a local file system to mistral you can use:

$ globus-url-copy -vb -p 4 file:///local_path/local_filename \
                           sshftp://userid@gridftp.dkrz.de/remote_path/remote_filename

For directory upload use:

$ globus-url-copy -vb -p 4 -r -cd file:///local_path/local_dir/ \
                                  sshftp://userid@gridftp.dkrz.de/remote_path/remote_dir/

Note that URLs must be terminated with a slash (/) to refer to a directory.

To download a single file or a whole directory from mistral to your local system use:

$ globus-url-copy -vb -p 4 sshftp://userid@gridftp.dkrz.de/remote_path/remote_filename \
                           file:///local_path/local_filename

$ globus-url-copy -vb -p 4 -r -cd sshftp://userid@gridftp.dkrz.de/remote_path/remote_dir/ \
                                  file:///local_path/local_dir/

globus-url-copy can read the source and destination URLs of multiple files and directories from an input file, whose name is specified with the option -f:

$ globus-url-copy -vb -p 4 -sync -r -cd -f FILENAME

Each line of the input file, which can be generated automatically with an appropriate script, contains one source URL and one destination URL, for example:

sshftp://userid@gridftp.dkrz.de/remote_path1/remote_filename1 file:///local_path1/local_filename1
sshftp://userid@gridftp.dkrz.de/remote_path2/remote_filename2 file:///local_path2/local_filename2
sshftp://userid@gridftp.dkrz.de/remote_path3/remote_dir3/     file:///local_path3/local_dir3/
file:///local_path4/local_filename4                           sshftp://userid@gridftp.dkrz.de/remote_path4/remote_filename4
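Such an input file can be generated with a few lines of shell. The following is a minimal sketch for downloads only; the function name make_download_list, its argument order and the file names are hypothetical. It assumes file names without whitespace, since each line holds two whitespace-separated URLs.

```shell
# make_download_list USERID REMOTE_DIR LOCAL_DIR
# Read file names from stdin (one per line) and print one
# "source_url destination_url" line per file.
# REMOTE_DIR and LOCAL_DIR must be absolute paths without a trailing slash.
make_download_list() {
    while IFS= read -r name; do
        [ -n "$name" ] || continue   # skip blank lines
        printf 'sshftp://%s@gridftp.dkrz.de%s/%s file://%s/%s\n' \
            "$1" "$2" "$name" "$3" "$name"
    done
}

# Hypothetical example with two file names
printf '%s\n' remote_filename1 remote_filename2 |
    make_download_list userid /remote_path /local_path
```

Redirecting the output to a file gives an input file that can be passed to globus-url-copy with the option -f.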

The globus-url-copy command can specify source and destination systems independently of the local system (a so-called third-party transfer). This is different from scp and sftp, where either source or destination must be a local path. The command to move data between two GridFTP servers at different computer centers using a locally installed GridFTP client is:

$ globus-url-copy -vb -p 4 sshftp://userid@gridftp.dkrz.de/tmp/foo \
                           sshftp://other_userid@other_machine/tmp/bar

GridFTP with GSI security

To use the GridFTP protocol gsiftp you need a digital X.509 certificate from a grid Certificate Authority (CA) trusted by GridFTP. A step-by-step description of how to obtain and install a user certificate can be found here.

Proxy certificates allow password-free authentication to remote hosts. The first step is therefore to generate a short-lived proxy certificate using the grid-proxy-init command, for example:

$ grid-proxy-init -valid 1:00
Enter GRID pass phrase for this identity:
Your identity: /C=DE/O=GridGermany/OU=Deutsches Klimarechenzentrum GmbH/CN=Max Mustermann
Creating proxy ............................................. Done
Your proxy is valid until: Mon May 15 20:13:52 2017

The grid-proxy-init call above grants you access for the next hour (option -valid 1:00) without entering your grid pass phrase again. The default validity period is 12 hours. Authentication uses the short-lived proxy certificate, which is stored in the file /tmp/x509up_u<UID> by default. To write the proxy certificate file to an alternative location, use the option -out:

$ grid-proxy-init -out $HOME/.globus/credentials

or set the path and file name of the user proxy via the environment variable X509_USER_PROXY:

# sh, bash, ksh, zsh
export X509_USER_PROXY=$HOME/.globus/credentials
# csh, tcsh
setenv X509_USER_PROXY $HOME/.globus/credentials

The grid-proxy-info command shows information about the generated proxy certificate:

$ grid-proxy-info
subject  : /C=DE/O=GridGermany/OU=Deutsches Klimarechenzentrum GmbH/CN=Erika Mustermann/CN=1234567898
issuer   : /C=DE/O=GridGermany/OU=Deutsches Klimarechenzentrum GmbH/CN=Erika Mustermann
identity : /C=DE/O=GridGermany/OU=Deutsches Klimarechenzentrum GmbH/CN=Erika Mustermann
type     : RFC 3820 compliant impersonation proxy
strength : 1024 bits
path     : /tmp/x509up_u1234
timeleft : 0:59:31

A non-default location of the proxy certificate file can be specified using the environment variable X509_USER_PROXY or with the option -file (short form -f):

$ grid-proxy-info -f $HOME/.globus/credentials

Your credential has expired if the grid-proxy-info command displays either

  • timeleft : 0:00:00 or

  • an error message like “ERROR: Couldn’t find a valid proxy …”.

In this case you have to generate a new proxy certificate by calling grid-proxy-init again.
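For unattended transfers it can be useful to check the remaining lifetime in a script before starting globus-url-copy. The following is a minimal sketch that parses the timeleft line in the format shown above; the function names and the threshold of 600 seconds are assumptions for illustration.

```shell
# proxy_seconds_left: read grid-proxy-info output on stdin and print the
# remaining lifetime in seconds. Prints 0 if no "timeleft" line is found,
# for example when only an error message was printed.
proxy_seconds_left() {
    awk '/^timeleft/ {
        sub(/^timeleft[ \t]*:[ \t]*/, "")      # keep only H:MM:SS
        n = split($0, t, ":")
        if (n == 3) { print t[1] * 3600 + t[2] * 60 + t[3]; found = 1 }
    }
    END { if (!found) print 0 }'
}

# proxy_needs_renewal MIN_SECONDS: succeed (exit status 0) if the proxy
# described on stdin has less than MIN_SECONDS left.
proxy_needs_renewal() {
    left=$(proxy_seconds_left)
    [ "$left" -lt "$1" ]
}

# Example with the sample output from above
printf 'timeleft : 0:59:31\n' | proxy_seconds_left

# Possible use in a transfer script (not executed here):
#   if grid-proxy-info 2>/dev/null | proxy_needs_renewal 600; then
#       grid-proxy-init -valid 1:00
#   fi
```

Because a missing timeleft line is treated as zero seconds, both expiry cases listed above lead to a renewal.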

For the complete set of command line options of the grid-proxy-init and grid-proxy-info commands, please refer to the corresponding manual pages:

$ man grid-proxy-init
$ man grid-proxy-info

With a valid proxy certificate, you can copy local files to mistral using the globus-url-copy command:

$ globus-url-copy -vb file:///local_path/local_filename \
                      gsiftp://userid@gridftp.dkrz.de/remote_path/remote_filename

The non-default location of the credential can be explicitly specified with the -cred option in the globus-url-copy call:

$ globus-url-copy -cred $HOME/.globus/credentials -vb file:///local_path/local_filename \
                  gsiftp://userid@gridftp.dkrz.de/remote_path/remote_filename

or via the environment variable X509_USER_PROXY. All examples from the section GridFTP over SSH apply correspondingly if you replace the protocol specification ‘sshftp://’ with ‘gsiftp://’.
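For an existing input file for the option -f, the replacement can be done with sed. The following is a minimal sketch; the function name to_gsiftp is hypothetical.

```shell
# to_gsiftp: read URL list lines on stdin and replace every sshftp://
# protocol specifier with gsiftp://. Other URLs (e.g. file://) are unchanged.
to_gsiftp() {
    sed 's#sshftp://#gsiftp://#g'
}

# Example with one line in the input file format shown earlier
printf '%s\n' \
    'sshftp://userid@gridftp.dkrz.de/remote_path1/remote_filename1 file:///local_path1/local_filename1' |
    to_gsiftp
```

Applied to a whole file, for example to_gsiftp < FILENAME, this yields a list that can be used with gsiftp once a valid proxy certificate exists.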

Getting help

If any questions or problems arise, please contact DKRZ User Support.