Moving data on and off the cluster¶
The recommended way to move your data on and off the cluster is by using rsync.
Rsync is a fast and versatile file copying tool. It is most useful for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination.
    # Copy a local directory to your cluster home directory
    rsync -avz --partial example_local_dir email@example.com:

    # Copy the contents of a local directory to your cluster home directory
    rsync -avz --partial example_local_dir/ firstname.lastname@example.org:

    # Copy the contents of a local directory to a specific directory
    rsync -avz --partial example_local_dir/ email@example.com:/data/example/directory

    # Copy a remote directory to the current local directory
    rsync -avz --partial firstname.lastname@example.org:/data/home/abc123/remote_directory .

    # Copy a local directory to a different local directory
    rsync -av --partial /data/home/abc123/source /data/example/destination
The switches in use here are:
    -a, --archive    archive mode; equals -rlptgoD (no -H,-A,-X)
    -r, --recursive  recurse into directories
    -l, --links      copy symlinks as symlinks
    -p, --perms      preserve permissions
    -t, --times      preserve modification times
    -g, --group      preserve group
    -o, --owner      preserve owner (super-user only)
    -D               same as --devices --specials
        --devices    preserve device files (super-user only)
        --specials   preserve special files
        --partial    keep partially transferred files
    -v, --verbose    increase verbosity
    -z, --compress   compress file data during the transfer
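The first two rsync examples above differ only in the trailing slash on the source, which is easy to miss. The following is a minimal local sketch of that difference, assuming rsync is installed on the machine you run it on; the temporary directory and the file name file.txt are made up for the demonstration.

```shell
# Work in a throwaway directory so nothing real is touched.
demo=$(mktemp -d)
mkdir -p "$demo/example_local_dir" "$demo/dest_a" "$demo/dest_b"
echo "hello" > "$demo/example_local_dir/file.txt"

# No trailing slash: the directory itself is created inside the destination.
rsync -a "$demo/example_local_dir" "$demo/dest_a/"

# Trailing slash: only the contents of the directory are copied.
rsync -a "$demo/example_local_dir/" "$demo/dest_b/"

# List the resulting files to compare the two layouts.
find "$demo/dest_a" "$demo/dest_b" -type f
```

After running this, dest_a should contain example_local_dir/file.txt, while dest_b should contain file.txt directly.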
Sometimes an rsync command line can get complicated. Using the -n (--dry-run) option
will allow you to test your command without actually affecting any data.
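As an illustration, rsync's -n (--dry-run) option can be tried safely on a pair of local directories. This is a small sketch, assuming rsync is installed locally; the directory and file names are invented for the example.

```shell
# Create a throwaway source and destination.
demo=$(mktemp -d)
mkdir -p "$demo/source" "$demo/destination"
echo "data" > "$demo/source/results.txt"

# -n (--dry-run) lists what would be transferred without copying anything.
rsync -avn --partial "$demo/source/" "$demo/destination/"

# The destination is still empty after the dry run, so this prints nothing.
ls -A "$demo/destination"
```

Once the listed files look right, removing the n from the switches runs the real transfer.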
Large transfers should be run as a job e.g.
    #!/bin/bash
    #$ -cwd
    #$ -j y
    #$ -pe smp 1
    #$ -l h_rt=24:0:0
    #$ -l h_vmem=1G
    rsync -av --partial <source> <destination>
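Because --partial keeps partially transferred files, re-running the same rsync command after an interruption can pick up from the partial file rather than starting over. One possible way to use that in a long-running job is a small retry helper. The sketch below is an assumption, not part of the cluster's tooling: the retry function and the RETRY_DELAY variable are hypothetical names introduced here.

```shell
# retry MAX CMD...: run CMD until it succeeds, trying at most MAX times.
# RETRY_DELAY (seconds between attempts, default 30) is a hypothetical knob.
retry() {
    max=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max failed" >&2
        # Wait before the next attempt, but not after the last one.
        if [ "$attempt" -lt "$max" ]; then
            sleep "${RETRY_DELAY:-30}"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Hypothetical use in the job script, in place of the bare rsync line:
# retry 5 rsync -av --partial <source> <destination>
```

The helper returns a non-zero status if every attempt fails, so the job still reports the failure.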
SCP (secure copy) can be used to copy individual files over ssh, although unlike rsync, resuming of file copying is not supported. If your connection is interrupted, you will have to repeat the upload.
    # Basic Copy
    scp example_file email@example.com:

    # Copy to specific directory
    scp example_file firstname.lastname@example.org:/data/example/directory

    # Copy whole directory
    scp -r example_directory email@example.com:
SFTP (SSH File Transfer Protocol) can be used to transfer files interactively over SSH.
    $ sftp firstname.lastname@example.org
    sftp> ls
    example_remote_file1  example_remote_file_2
    sftp> lls
    example_local_file1  example_local_file_2
    sftp> get example_remote_file1
    Fetching /data/home/abc123/example_remote_file1 to example_remote_file1
    sftp> put example_local_file1
    Uploading example_local_file1 to /data/home/abc123/example_local_file1
Further commands are available via the help command or the man pages.
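The same get and put commands can also be run without an interactive session by using sftp's batch mode (the -b option). The sketch below only writes the batch file; the final sftp line is left commented out because the login details are placeholders, and batch mode generally needs non-interactive authentication such as SSH keys.

```shell
# Write the sftp commands to a batch file, one per line.
batch_file=$(mktemp)
cat > "$batch_file" <<'EOF'
get example_remote_file1
put example_local_file1
EOF

# Show what will be run.
cat "$batch_file"

# Run the batch non-interactively (substitute your own login details):
# sftp -b "$batch_file" <username>@<login-host>
```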
GUI - FileZilla¶
For a GUI on Windows, macOS or Linux we suggest FileZilla.
To connect to the cluster:
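- From the File menu open the Site Manager
- Click the New Site button and name the connection (e.g. Apocrita)
- In the Host field, enter the cluster login hostname
- Change the Protocol to SFTP
- Change the Logon Type to Ask for password.
- Enter your username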
If you have access to shared storage, then that can be set up as a bookmark under the Apocrita site:
- From the File menu open the Site Manager
- Select the Apocrita site
- Press the New Bookmark button
- Give the bookmark an appropriate name and set its remote directory to the shared storage folder
- Selecting the bookmark and clicking Connect will open the folder
Using MobaXterm on Windows¶
MobaXterm can use rsync or the GUI to download and upload files.
MobaXterm - RSYNC¶
MobaXterm is bundled with a command-line rsync tool that functions identically to the one described above. Be sure to use full paths, as MobaXterm may interpret shortcuts incorrectly.
    # On QMUL Managed computers /drives/g should point to your Windows home folder.
    rsync -avz email@example.com:/data/home/example /drives/g
MobaXterm - GUI¶
Log in to Apocrita as described in Logging in. The left
sidebar should then display a list of files on the remote server.
Files can be downloaded by right clicking and selecting 'Download'.
Files can be uploaded by clicking the upload button at the top of the sidebar.
Alternatively, files can be dragged and dropped from File Explorer.
As all of your data is available over an SSH connection, you can use SSHFS to access your files as a network drive mounted on your local computer. However, this is not supported by ITS Research.
Aspera - ASCP¶
ascp is a high-speed file transfer application, commonly used
for the download of genome data and other large datasets.
To load the ascp binary into your PATH, run the command module load aspera.
    Usage: ascp [OPTION] SRC... DEST
           SRC to DEST, or multiple SRC to DEST dir
           SRC, DEST format: [[user@]host:]PATH
Transfer rate limit
By default, ascp will utilise all available bandwidth, which impacts
other cluster users. To avoid this, please set the maximum transfer
rate to 300 Mbit/s by passing the -l 300M switch.
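One way to avoid forgetting the rate limit is to wrap ascp in a small shell function that always passes -l 300M. This is only a sketch: the function name ascp_limited is hypothetical, and the usage line contains placeholders rather than a real source or destination.

```shell
# ascp_limited SRC... DEST: call ascp with the transfer rate capped at 300 Mbit/s.
# ascp_limited is a hypothetical helper name, not part of Aspera.
ascp_limited() {
    ascp -l 300M "$@"
}

# Hypothetical usage (substitute a real source and destination):
# module load aspera
# ascp_limited <user>@<host>:<remote_path> <local_destination>
```

Any other ascp options can still be passed through the wrapper, since all arguments are forwarded unchanged after the rate limit.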