Moving data on and off the cluster

Note

The recommended way to move your data on and off the cluster is by using rsync.


Using rsync

Rsync is a fast and versatile file-copying tool. Its most useful feature is its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files at the destination.

Basic examples

# Copy a local directory to your cluster home directory
rsync -avz example_local_dir abc123@login.hpc.qmul.ac.uk:

# Copy only the contents of a local directory (note the trailing slash)
rsync -avz example_local_dir/ abc123@login.hpc.qmul.ac.uk:

# Copy the contents of a local directory into a specific directory
rsync -avz example_local_dir/ abc123@login.hpc.qmul.ac.uk:/data/example/directory

# Copy a remote directory into the current local directory
rsync -avz abc123@login.hpc.qmul.ac.uk:/data/home/abc123/remote_directory .

The switches in use here are:

-a, --archive               archive mode; equals -rlptgoD (no -H,-A,-X)
    -r, --recursive             recurse into directories
    -l, --links                 copy symlinks as symlinks
    -p, --perms                 preserve permissions
    -t, --times                 preserve modification times
    -g, --group                 preserve group
    -o, --owner                 preserve owner (super-user only)
    -D                          same as --devices --specials
        --devices               preserve device files (super-user only)
        --specials              preserve special files
-v, --verbose               increase verbosity
-z, --compress              compress file data during the transfer

Dry Run

Rsync command lines can get complicated. Adding -n (--dry-run) lets you test a command without transferring or changing any data; combined with -v, it lists what would be transferred.


Using SCP

SCP (secure copy) can be used to copy files over SSH. Unlike rsync, however, it cannot resume an interrupted transfer: if your connection drops, you will have to repeat the whole upload.

# Copy a file to your cluster home directory
scp example_file abc123@login.hpc.qmul.ac.uk:

# Copy a file to a specific directory
scp example_file abc123@login.hpc.qmul.ac.uk:/data/example/directory

# Copy a whole directory (-r recurses into it)
scp -r example_directory abc123@login.hpc.qmul.ac.uk:
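Because scp cannot resume, rsync is usually the better choice for large files over an unreliable connection. This sketch assumes a bash-compatible shell on your own machine; rsync_retry is a hypothetical helper written for this example, not a command provided on the cluster. It re-runs rsync with -P (shorthand for --partial --progress) until the transfer succeeds or five attempts have failed. --partial keeps any partially transferred file, so the next attempt can reuse the data that already arrived instead of starting again.

```shell
# Hypothetical helper for illustration: retry rsync until it succeeds.
# -P is shorthand for --partial --progress: partial files are kept,
# so a retry can reuse the data that already arrived.
rsync_retry() {
    max_attempts=5
    attempt=1
    until rsync -avzP "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "rsync still failing after $max_attempts attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        echo "rsync failed; retrying in 30 seconds (attempt $attempt of $max_attempts)" >&2
        sleep 30
    done
}

# Example: rsync_retry example_file abc123@login.hpc.qmul.ac.uk:
```

The fixed 30-second pause keeps the loop simple. Errors that retrying cannot fix, such as a mistyped path, will also be retried until the attempts run out.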

Using SFTP

SFTP (SSH File Transfer Protocol) can be used to transfer files interactively over SSH.

Command Line

$ sftp abc123@login.hpc.qmul.ac.uk
sftp> ls
example_remote_file1  example_remote_file2
sftp> lls
example_local_file1  example_local_file2
sftp> get example_remote_file1
Fetching /data/home/abc123/example_remote_file1 to example_remote_file1
sftp> put example_local_file1
Uploading example_local_file1 to /data/home/abc123/example_local_file1

Further commands are listed by typing help at the sftp> prompt, or in the sftp man page (man sftp).

GUI - FileZilla

For a GUI on Windows, macOS or Linux, we suggest FileZilla.

To connect to the cluster:

  • From the File menu open the Site Manager.
  • Click the New Site button and name the connection Apocrita.
  • In the Host box put login.hpc.qmul.ac.uk.
  • Change the Protocol (Servertype in older versions) to SFTP.
  • Change the Logon Type to Ask for password.
  • Enter your username in the User box, e.g. btw999.
  • Click Connect.


If you have access to shared storage, you can add it as a bookmark under the Apocrita site:

  • From the File menu open the Site Manager.
  • Select the Apocrita site.
  • Press the New Bookmark button.
  • Enter /data/YOURSHARE-NAME in the Remote directory box.
  • Give the bookmark an appropriate name (e.g. YOURSHARE-NAME).
  • Select the bookmark and click Connect to open the folder.

Using MobaXterm on Windows

MobaXterm can transfer files either with its bundled rsync or through its GUI.

MobaXterm - rsync

MobaXterm is bundled with a command-line rsync tool that functions identically to the one described above. Use full paths, as MobaXterm may misinterpret path shortcuts.

# On QMUL-managed computers, /drives/g should point to your Windows home folder.
rsync -avz abc123@login.hpc.qmul.ac.uk:/data/home/example /drives/g

MobaXterm - GUI

Log in to Apocrita as described in Logging in; the left sidebar should then display a list of files on the remote server.
Files can be downloaded by right-clicking them and selecting 'Download'.


Files can be uploaded by clicking the upload button at the top of the sidebar.

Alternatively, files can be dragged and dropped from File Explorer.



Using SSHFS

As all of your data is available over an SSH connection, you can use SSHFS to mount your files as a network drive on your local computer. However, this is not supported by ITS Research.


Aspera - ASCP

Aspera's ascp is a high-speed file transfer application, commonly used for the download of genome data and other large datasets.

To add the ascp binary to your PATH, run the command module load aspera.

Usage: ascp [OPTION] SRC... DEST
          SRC to DEST, or multiple SRC to DEST dir
          SRC, DEST format: [[user@]host:]PATH

Transfer rate limit

By default, ascp will utilise all available bandwidth, which impacts other cluster users. To avoid this, please limit the maximum transfer rate to 300 Mbit/s by passing the -l 300M switch.
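To avoid having to remember the limit, one option is a small shell function in your ~/.bashrc on the cluster. ascp_capped is a hypothetical name chosen for this sketch, not part of the Aspera module; it simply prepends -l 300M to whatever arguments you pass, so it accepts the same [OPTION] SRC... DEST form shown in the usage summary.

```shell
# Hypothetical helper (not provided by the aspera module):
# always cap ascp at 300 Mbit/s.
# Run 'module load aspera' first so that ascp is on your PATH.
ascp_capped() {
    ascp -l 300M "$@"
}

# Example: ascp_capped [OPTION] SRC... DEST
```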