Moving data on and off the cluster
The recommended way to move your data on and off the cluster is by using rsync.
Rsync is a fast and versatile file copying tool. It is most useful for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files at the destination.
    # Copy a local directory to your cluster home directory
    rsync -avz --partial example_local_dir firstname.lastname@example.org:

    # Copy the contents of a local directory to your cluster home directory
    rsync -avz --partial example_local_dir/ email@example.com:

    # Copy the contents of a local directory to a specific directory
    rsync -avz --partial example_local_dir/ firstname.lastname@example.org:/data/example/directory

    # Copy a remote directory to current local directory
    rsync -avz --partial email@example.com:/data/home/abc123/remote_directory .

    # Copy a local directory to a different local directory
    rsync -av --partial /data/home/abc123/source /data/example/destination
The switches in use here are:
    -a, --archive    archive mode; equals -rlptgoD (no -H,-A,-X)
    -r, --recursive  recurse into directories
    -l, --links      copy symlinks as symlinks
    -p, --perms      preserve permissions
    -t, --times      preserve modification times
    -g, --group      preserve group
    -o, --owner      preserve owner (super-user only)
    -D               same as --devices --specials
        --devices    preserve device files (super-user only)
        --specials   preserve special files
        --partial    keep partially transferred files
    -v, --verbose    increase verbosity
    -z, --compress   compress file data during the transfer
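Whether rsync copies a directory itself or only its contents depends on the trailing slash on the source path. The following is a minimal sketch that uses temporary local directories (created purely for illustration, not real cluster paths) so you can see the effect safely:

```shell
#!/bin/bash
# Illustrative only: every path below lives in a throwaway temp directory.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dest_dir" "$work/dest_contents"
echo "hello" > "$work/src/file.txt"

# No trailing slash on the source: the directory itself is copied,
# giving dest_dir/src/file.txt
rsync -a "$work/src" "$work/dest_dir/"

# Trailing slash on the source: only the contents are copied,
# giving dest_contents/file.txt
rsync -a "$work/src/" "$work/dest_contents/"

# Show where file.txt ended up in each case
find "$work" -name file.txt | sort
```

The same rule applies when the destination is on the cluster.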
Sometimes an rsync command line can get complicated. Using the --dry-run (-n) option will allow you to test your command without actually affecting any data.
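As a rough illustration of testing a command first, the sketch below runs rsync with its -n (--dry-run) option against temporary local directories. The directory names are made up for the example; substitute your own source and destination.

```shell
#!/bin/bash
# Illustrative only: temporary directories stand in for real paths.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dest"
echo "data" > "$work/src/file.txt"

# -n (--dry-run) reports what would be transferred without copying anything
rsync -avn "$work/src/" "$work/dest/"

# The destination is still empty after the dry run
ls -A "$work/dest"
```

Once the listed files look right, remove -n and run the command again to perform the transfer.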
Large transfers should be run as a job e.g.
    #!/bin/bash
    #$ -cwd
    #$ -j y
    #$ -pe smp 1
    #$ -l h_rt=24:0:0
    #$ -l h_vmem=1G
    rsync -av --partial <source> <destination>
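For long transfers over an unreliable link, one possible approach is to wrap rsync in a small retry loop, relying on --partial to keep partially transferred files between attempts. The function name retry_rsync, the limit of 5 attempts and the 30-second pause are all assumptions made for this sketch rather than site recommendations.

```shell
#!/bin/bash
# Hypothetical helper: retry an rsync transfer a limited number of times.
# --partial keeps partially transferred files, so a later attempt can reuse them.
retry_rsync() {
    max_attempts=5                    # arbitrary limit for illustration
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if rsync -av --partial "$@"; then
            return 0                  # transfer completed
        fi
        echo "rsync attempt $attempt failed" >&2
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-30}"    # pause before trying again
    done
    return 1                          # give up after max_attempts
}
```

Inside a job script, the final line would then become retry_rsync <source> <destination>.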
SCP (secure copy) can be used to copy individual files over ssh, although unlike rsync, resuming of file copying is not supported. If your connection is interrupted, you will have to repeat the upload.
    # Basic Copy
    scp example_file firstname.lastname@example.org:

    # Copy to specific directory
    scp example_file email@example.com:/data/example/directory

    # Copy whole directory
    scp -r example_directory firstname.lastname@example.org:
SFTP (SSH File Transfer Protocol) can be used to transfer files interactively over ssh.
    $ sftp email@example.com
    sftp> ls
    example_remote_file1  example_remote_file_2
    sftp> lls
    example_local_file1  example_local_file_2
    sftp> get example_remote_file1
    Fetching /data/home/abc123/example_remote_file1 to example_remote_file1
    sftp> put example_local_file1
    Uploading example_local_file1 to /data/home/abc123/example_local_file1
Further commands are available via the help command or the man pages.
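If you repeat the same sftp commands often, they can also be placed in a batch file and run with the -b option of sftp. The sketch below only writes such a batch file, reusing the file names from the session above. The sftp line is left commented out because user@host is a placeholder, and batch mode needs non-interactive (for example key-based) authentication.

```shell
#!/bin/bash
# Write the sftp commands to a temporary batch file (file names are examples).
batch=$(mktemp)
cat > "$batch" <<'EOF'
get example_remote_file1
put example_local_file1
EOF

# Run the batch non-interactively; replace user@host with your own login details.
# sftp -b "$batch" user@host
```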
GUI - FileZilla
For a GUI on Windows, macOS or Linux, we suggest FileZilla.
Adding a site connection
To connect to the cluster:
- From the File menu, open the Site Manager.
- Click the New Site button and name the connection Apocrita.
- In the
- Set the
- Set the
- Enter your Apocrita username in the
- Enter your Apocrita login password in the
Importing a private key into FileZilla
Your private key can be presented via the SSH agent, or you can import the key into FileZilla:
- In the
Add key file... to import an existing private key into FileZilla.
- Browse to the relevant OpenSSH private key.
- Click Yes when asked if you would like to convert it into a supported format.
- Provide a filename for the converted key file, e.g. filezilla-apocrita-key.ppk, then click Save to import it.
Finally, you can return to the Site Manager and click Connect on the Apocrita site you created.
If you ask FileZilla to save passwords for you, it is recommended that you
protect passwords with a master password. This can be found under
Adding bookmarks to commonly used folders
You can optionally set up bookmarks under the Apocrita site, to jump to specific folders, such as scratch, or shared storage:
- From the File menu, open the Site Manager.
- Select the Apocrita site.
- Press the
- Give the bookmark an appropriate name (e.g.
- Enter the path to the desired folder in the Remote Directory box e.g.
Selecting the bookmark and clicking Connect will open the folder.
Using MobaXterm on Windows
MobaXterm can use rsync or the GUI to download and upload files.
MobaXterm - RSYNC
MobaXterm is bundled with a command-line rsync tool that functions identically to the one described above. Be sure to use full paths, as MobaXterm may interpret shortcuts incorrectly.
    # On QMUL-managed computers /drives/g should point to your Windows home folder.
    rsync -avz firstname.lastname@example.org:/data/home/example /drives/g
MobaXterm - GUI
Log in to Apocrita as described in Logging in. The left sidebar should then display a list of files on the remote server.
Files can be downloaded by right-clicking and selecting 'Download'.
Files can be uploaded by clicking the upload button at the top of the sidebar.
Alternatively, files can be dragged and dropped from File Explorer.
As all of your data is available over an SSH connection, you can use SSHFS to access your files as a network drive mounted on your local computer. However, this is not supported by ITS Research.
Aspera - ASCP
ascp is a high-speed file transfer application, commonly used for the download of genome data and other large datasets.
To load the ascp binary into your PATH, run the command module load aspera.
    Usage: ascp [OPTION] SRC... DEST
           SRC to DEST, or multiple SRC to DEST dir
           SRC, DEST format: [[user@]host:]PATH
Transfer rate limit
ascp will utilise all available bandwidth, which impacts other cluster users. To avoid this, please set the maximum transfer rate to 300Mbit/s by passing the -l 300M switch.
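To avoid forgetting the rate limit, one option is a small shell function that always passes -l 300M. The name ascp_limited is made up for this sketch, and the source and destination in the usage comment are placeholders rather than real paths.

```shell
#!/bin/bash
# Hypothetical wrapper: always cap ascp at 300 Mbit/s, as requested above.
ascp_limited() {
    ascp -l 300M "$@"
}

# Example usage (placeholders, not real paths):
# ascp_limited user@host:/path/to/dataset /path/to/destination
```

Any other ascp options can still be passed through the wrapper after the function name.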