Debugging your jobs
Failure to submit
It may be that you have chosen incorrect scheduler settings and the scheduler rejects the request.
Make sure to check:
- Your memory request has been adjusted for the number of cores (the memory request is multiplied by the number of cores)
- You have selected a reasonable h_rt (maximum run time) and number of cores
- There are no errors in your job script
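As a worked example of adjusting memory for cores: assuming memory is requested per core (for example via h_vmem on many Grid Engine installations; check your site's documentation), a job that needs 10G in total on 4 cores should request ceil(10 / 4) = 3G per core, i.e. 12G in total. The hypothetical helper below does that rounding in whole gigabytes:

```shell
#!/bin/bash
# per_core_mem TOTAL_GB CORES
# Hypothetical helper (not a scheduler command): prints the per-core memory
# request needed to cover TOTAL_GB across CORES, rounded up to whole GB.
# Assumes the scheduler multiplies the memory request by the number of cores.
per_core_mem() {
    local total_gb=$1 cores=$2
    if [ "$cores" -lt 1 ]; then
        echo "cores must be at least 1" >&2
        return 1
    fi
    # Integer ceiling division: (a + b - 1) / b
    echo "$(( (total_gb + cores - 1) / cores ))G"
}
```

For example, per_core_mem 10 4 prints 3G, which would then be requested alongside 4 cores.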
You can verify a job with:
$ qsub -w v job_script
verification: found suitable queue(s)
Or a queued job with:
$ qalter -w v job_id
verification: found suitable queue(s)
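A local syntax check and the scheduler's verify mode can be combined into one pre-submission step. This is a sketch: check_job_script is a hypothetical function name, and it assumes the job script is a bash script. bash -n parses the script without executing any of its commands, which catches syntax errors, and qsub -w v asks the scheduler to verify the request without submitting the job:

```shell
#!/bin/bash
# check_job_script SCRIPT
# Hypothetical helper: catches syntax errors locally, then asks the
# scheduler to verify the resource requests without submitting the job.
check_job_script() {
    local script=$1
    if [ ! -r "$script" ]; then
        echo "cannot read $script" >&2
        return 1
    fi
    # Parse the script without running any of its commands
    if ! bash -n "$script"; then
        echo "syntax error in $script" >&2
        return 1
    fi
    # Only ask the scheduler if qsub is available (e.g. on a login node)
    if command -v qsub >/dev/null 2>&1; then
        qsub -w v "$script"
    else
        echo "qsub not found; skipping scheduler verification" >&2
    fi
}
```

Usage: check_job_script job_script. Running the syntax check first avoids waiting on the scheduler for a script that could never run.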
Failure to run
Your job may fail for a number of reasons:
- you are out of disk quota
- there are bad characters in your script, such as DOS / Windows newline characters
- you have requested insufficient resources to complete your job (check the job's resource usage)
You can check the job output for the following:
- syntax errors in your script
- the code run by your job exits with an error
- the code failed to run because an expected file or directory did not exist
- a permissions problem (the job can't read or write certain files)
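A quick way to look for these problems is to search the job's output files for common shell error messages. A minimal sketch, assuming bash-style and standard system error messages; the helper name and the pattern list are illustrative, not exhaustive:

```shell
#!/bin/bash
# scan_job_output FILE...
# Hypothetical helper: prints lines from job output/error files that match
# common failure messages, prefixed with file name (-H) and line number (-n).
# Returns 0 if anything matched, 1 if nothing did (grep's convention).
scan_job_output() {
    grep -H -n -i -E \
        'syntax error|command not found|no such file or directory|permission denied|disk quota exceeded' \
        "$@"
}
```

Usage: scan_job_output myjob.o12345 myjob.e12345. No match does not prove the job succeeded; it only rules out these particular messages.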
Job output files
If the job is scheduled and runs but dies instantly, check the output files in the job's working directory first. They contain the output and error messages produced by the job, and are worth checking whenever a job fails or misbehaves.
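By default the scheduler places output files in the current working directory, unless another location is specified with the -o option. If you are using the -j y option, standard error is merged into standard output, so all output will be in the output (.o) file.
The default file names have the form job_name.ojob_id for output and job_name.ejob_id for errors, or job_name.ojob_id.task_id and job_name.ejob_id.task_id for array job tasks. If you're using a parallel environment, there will also be job_name.pojob_id and job_name.pejob_id files.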
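To find the files for a particular job, the names can be built from the job name, job ID and, for array jobs, the task ID. This sketch assumes the standard Grid Engine naming, job_name.ojob_id for standard output and job_name.ejob_id for standard error, with .task_id appended for array tasks; job_output_files is a hypothetical helper:

```shell
#!/bin/bash
# job_output_files JOB_NAME JOB_ID [TASK_ID]
# Hypothetical helper: prints the default stdout and stderr file names,
# one per line, assuming Grid Engine's default naming scheme.
job_output_files() {
    local name=$1 id=$2 task=${3:-}
    local suffix=""
    # Array job tasks get ".<task_id>" appended to each name
    if [ -n "$task" ]; then
        suffix=".$task"
    fi
    echo "${name}.o${id}${suffix}"
    echo "${name}.e${id}${suffix}"
}
```

For example, job_output_files myjob 12345 7 prints myjob.o12345.7 and myjob.e12345.7, and tail -n 20 $(job_output_files myjob 12345) shows the end of both files for a non-array job.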
Job exit status
All jobs should complete with an exit status of zero. Check the exit status even if the data from your job looks correct: a non-zero status means something went wrong. You can check it by enabling email notifications in your submission script, or by checking the job statistics.
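Once a job has finished, its accounting record can be queried with qacct -j <job-id>. The sketch below (hypothetical helper job_summary) filters that output down to exit_status and failed, plus the resource usage fields ru_wallclock and maxvmem, which also help check whether the job ran short of time or memory. It assumes qacct's usual layout of one "field value" pair per line:

```shell
#!/bin/bash
# job_summary
# Hypothetical helper: reads 'qacct -j <job-id>' output on stdin and prints
# only the exit and resource-usage lines. Assumes one "field value" pair
# per line, with the field name in the first column.
job_summary() {
    awk '$1 == "exit_status" || $1 == "failed" ||
         $1 == "ru_wallclock" || $1 == "maxvmem" { print }'
}
```

Usage: qacct -j 12345 | job_summary. A non-zero value in either exit_status or failed is worth investigating; for array jobs, one set of lines is printed per task.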
Job in Eqw state
When a job dies due to filesystem errors, node crashes or similar system issues, the job may be placed in the Eqw state, where it waits for further action. This avoids running hundreds of jobs that would inevitably crash.
If you have determined the cause of the error and think your job should now run
correctly, the error state can be cleared using
qmod -cj <job-id>.
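To find which of your jobs are in the error state, the state column of qstat output can be filtered. A sketch, assuming the default qstat layout where the fifth column is the job state; list_eqw_jobs is a hypothetical helper:

```shell
#!/bin/bash
# list_eqw_jobs
# Hypothetical helper: reads 'qstat' output on stdin and prints the job IDs
# whose state column (assumed to be column 5) is exactly "Eqw".
list_eqw_jobs() {
    awk '$5 == "Eqw" { print $1 }'
}
```

Usage: qstat | list_eqw_jobs. For each ID it prints, qstat -j <job-id> shows the job's details, which normally include the reason for the error, before you clear it with qmod -cj <job-id>.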
DOS / Windows newline characters
DOS / Windows uses different characters from Unix to represent newlines in files. This can cause issues when a script has been written on a Windows machine and transferred to the cluster.
Incorrect newline characters can be detected with:
$ cat -v <script> | grep "\^M"
The file can then be fixed with:
$ dos2unix <script>
dos2unix: converting file <script> to UNIX format ...
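If dos2unix is not available, the same check and fix can be done with standard tools. A minimal sketch with hypothetical helper names: has_crlf looks for carriage-return characters, and strip_crlf removes them with tr, writing to a temporary file and then copying the result back. Note that tr removes every carriage return, not only those at line ends:

```shell
#!/bin/bash
# has_crlf FILE: succeeds (returns 0) if FILE contains any carriage returns
has_crlf() {
    grep -q "$(printf '\r')" "$1"
}

# strip_crlf FILE: deletes every carriage return in FILE, in place
strip_crlf() {
    local file=$1 tmp
    tmp=$(mktemp) || return 1
    # The original is only overwritten if tr succeeds
    if tr -d '\r' < "$file" > "$tmp"; then
        cat "$tmp" > "$file"   # overwrite contents, keeping permissions
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Usage: has_crlf job_script && strip_crlf job_script. Copying back with cat rather than mv keeps the original file's permissions, such as the execute bit.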
If you cannot resolve the issue with your job, please contact us, supplying all the relevant information.