Monitoring Jobs

Check status of submitted jobs

To monitor the jobs you have submitted to the queue, run the qstat command:

qstat
job-ID  prior   name     user   state submit/start at     queue        slots ja-task-ID
--------------------------------------------------------------------------------------
3583807 5.76204 exampleA btw999 r     01/21/2016 23:10:10 serial.q@dn94   5 16
3583807 5.76204 exampleA btw999 r     01/21/2016 23:54:11 serial.q@dn58   5 17
3599804 5.00001 exampleB btw999 r     01/16/2016 12:05:39 serial.q@dn75   1
3599852 5.00001 exampleC btw999 r     01/16/2016 17:39:27 smserial.q@sm3  1
3602902 0.00000 exampleD btw863 qw    01/22/2016 10:49:47                 8
3602902 0.00000 exampleE btw863 Eqw   01/22/2016 10:49:47                 8

To see the status of a particular job (e.g. job 999) you can run the qstat command with the -j option:

qstat -j 999

To see the resources requested by your jobs, run the qstat command with the -r option:

qstat -r

More details on the qstat command are available in the man page.
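
When you have many jobs, a per-state summary can be easier to read than the full listing. The following is a minimal sketch, not a command provided on the cluster: count_job_states is a hypothetical helper name, and it assumes the default qstat layout shown above (two header lines, with the state in the fifth column).

```shell
# Count jobs by state from qstat-style output read on standard input.
# Assumption: default qstat layout, i.e. two header lines and the
# job state (r, qw, Eqw, ...) in column 5.
count_job_states() {
    awk 'NR > 2 && NF >= 5 { count[$5]++ }
         END { for (s in count) print s, count[s] }' | LC_ALL=C sort
}

# Usage on the cluster (not run here):
#   qstat | count_job_states
```

Each output line holds a state followed by the number of qstat lines in that state. Array jobs with several running tasks appear once per task line, as they do in qstat itself.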

Checking where my job is in the queue

We provide some additional commands to display current activity of the cluster and the queues in a more readable format than the standard qhost and qstat commands.

nodestatus

nodestatus shows the current core and memory usage of every node. Some nodes belong to queues that are not available to all users; these are coloured blue in the list of nodes. nodestatus -F gives a detailed view of the jobs running on each node, including the number of slots each job uses and its latest possible finish time, based on the maximum requested runtime.

The -N option allows you to inspect the jobs running on a selected node. For example:

nodestatus -N dn55

Node       cores           memory
           used/total      used/available/total

dn55       (12/12)         (9/2/24)
           abc123          serial.q        4008031    Thu, 22 Jun 2017 15:20:39          1
           xyz126          serial.q        4012201    Mon, 26 Jun 2017 17:08:21          1
           xyz126          serial.q        4012202    Tue, 27 Jun 2017 05:02:36          1
           abc985          serial.q        4015392    Sat, 24 Jun 2017 16:03:05          8
           hij208          serial.q        4019133    Wed, 21 Jun 2017 13:48:18          1

The memory used is the actual RAM in use on the node at that moment. The memory available value is the approximate amount of RAM available for use by the scheduler. Note that the sum of these values may not equal the total memory on the node - for example, if a job has requested memory but not used it.
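
As a worked example using the dn55 figures above, the gap between the total and the sum of the used and available values can be computed directly. This is only an illustrative sketch: the variable names are made up here, and reading the gap as memory that has been requested but is not in use is one possible explanation, as noted above.

```shell
# Memory figures from the dn55 line above: (used/available/total) = (9/2/24)
used=9
available=2
total=24

# Memory that is neither in use nor available to the scheduler,
# for example because a job has requested it but is not using it.
unaccounted=$((total - used - available))
echo "$unaccounted"
```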

nodestatus command usage

nodestatus -h will provide a summary of available options.

showqueue

showqueue shows all of the jobs waiting to run, in order of priority. It is useful for inspecting typical wait times for different job sizes. Note that some jobs may be waiting because they are restricted by resource quotas rather than by a lack of free resources. showqueue -F gives additional detail on each queued job, such as the total RAM requested and any resource quotas that apply. Most users won't hit core or memory quotas, but they can be inspected using the qquota command.

showqueue command usage

showqueue -h will provide a summary of available options.

Jobs with error statuses

The qstat output may include jobs whose state contains E (for example, Eqw). This indicates that an error occurred - not an error within the job itself, but one that prevented the job from being started. A common cause is that the user has run out of file space.

Jobs in an error state can be deleted from the queue using:

qdel <job ID>

Alternatively, if the cause of the error has been cleared and the jobs need to run, the error state can be cleared using:

qmod -cj <job ID>
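
If several jobs are in an error state, their IDs can be extracted from the qstat output rather than copied by hand. The sketch below is an illustration only: error_job_ids is a hypothetical helper name, and it assumes the default qstat layout shown earlier (two header lines, state in the fifth column).

```shell
# Print the unique IDs of jobs whose state contains "E" (for example Eqw).
# Assumption: default qstat layout, i.e. two header lines, job ID in
# column 1 and state in column 5.
error_job_ids() {
    awk 'NR > 2 && NF >= 5 && $5 ~ /E/ { print $1 }' | sort -u
}

# Usage on the cluster (not run here):
#   qstat | error_job_ids
#   qstat | error_job_ids | while read -r id; do qmod -cj "$id"; done
```

Check the list of IDs before passing it to qdel or qmod -cj, and clear the error state only once the underlying cause has been fixed.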

Email notifications

You can request email notifications when the status of your jobs changes by adding the following lines to the "Grid Engine options" section of your submission script:

#$ -m bea # Send email at the beginning and end of the job and if aborted
#$ -M my_name@qmul.ac.uk # The email address to notify

If you do not add the -M line, you will be emailed at the address that is registered with us (usually your QMUL email address).
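
To show where these lines sit, here is a minimal sketch of a submission script with the notification options in place. The -cwd and -j y options and the echo command are assumptions added for illustration, not requirements stated above; replace them with the options and commands your job actually needs.

```shell
#!/bin/bash
# Grid Engine options
#$ -cwd                   # Assumed for illustration: run from the current directory
#$ -j y                   # Assumed for illustration: merge stderr into stdout
#$ -m bea                 # Send email at the beginning and end of the job and if aborted
#$ -M my_name@qmul.ac.uk  # The email address to notify

# Job commands follow the options (placeholder command)
echo "Job started on $(hostname)"
```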