Diagnosing Univa Grid Engine - sge_diagnostics()

NAME
       Diagnostics - Diagnostics and Debugging of Univa Grid Engine

DESCRIPTION
       The sections below describe aspects of diagnosing qmaster behaviour
       and obtaining more detailed information about the state of Univa
       Grid Engine.

LOGGING
       Certain components such as sge_qmaster(1) or sge_execd(1) create
       informative, warning, error or debugging messages that are written
       to a message file of the corresponding component.

       The loglevel parameter of the global configuration of Univa Grid
       Engine allows changing the level of information that is written to
       the message file. When loglevel is set to log_debug, more detailed
       information is written that shows details of the internal state of
       the component and helps to debug certain error scenarios that would
       otherwise be difficult to diagnose.

   Received and Sent Messages
       When the loglevel log_debug is active, Univa Grid Engine writes log
       messages whenever sge_qmaster receives or sends messages. Messages
       have the following format:

       ACTION: HOSTNAME/COMPROC-NAME/COMPROC-ID/MESSAGE-ID:MESSAGE-TAG:SIZE

       o  ACTION: SEND or RECEIVE
       o  HOSTNAME: Identifies the host where the message was sent from.
       o  COMPROC-NAME: Name of the daemon or command that sent the
          message (e.g. qsub, execd, qmon, ...)
       o  COMPROC-ID: Univa Grid Engine internal ID used for communication
       o  MESSAGE-ID: Message ID that identifies the request on the
          communication layer.
       o  MESSAGE-TAG: Type of message: TAG_GDI_REQUEST, TAG_ACK_REQUEST,
          TAG_REPORT_REQUEST, ...
       o  SIZE: Size of the message in bytes

   Request execution
       When the loglevel log_debug is active, Univa Grid Engine writes log
       messages whenever sge_qmaster accepts new requests from client
       commands (e.g. qsub(1), qalter(1), qconf(1), ...), other server
       components (e.g. sge_execd) or qmaster internal threads (lothread
       when the Univa Grid Engine cluster is connected to Univa License
       Orchestrator).
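The SEND/RECEIVE lines described under Received and Sent Messages can be split mechanically into their named fields. A minimal sketch (the sample line is invented for illustration; the field layout follows the format described above):

```python
import re

# Field layout: ACTION: HOSTNAME/COMPROC-NAME/COMPROC-ID/MESSAGE-ID:MESSAGE-TAG:SIZE
LINE_RE = re.compile(
    r"^(?P<action>SEND|RECEIVE): "
    r"(?P<host>[^/]+)/(?P<comproc_name>[^/]+)/(?P<comproc_id>\d+)/"
    r"(?P<message_id>\d+):(?P<tag>[A-Z_]+):(?P<size>\d+)$"
)

def parse_message_line(line):
    """Split one SEND/RECEIVE log message into its named fields."""
    m = LINE_RE.match(line.strip())
    return m.groupdict() if m else None

# Hypothetical sample line:
sample = "RECEIVE: host1/qsub/101/7:TAG_GDI_REQUEST:2048"
fields = parse_message_line(sample)
```

A parser like this makes it easy to aggregate message counts per host or per component when hunting down a chatty client.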
       Incoming requests are stored in qmaster internal queues until a
       thread is available that is able to handle the request properly.
       Log messages are also written when one of the internal qmaster
       threads starts executing such a request and when request handling
       has finished. In low performing clusters this allows identifying
       the hosts, users, request types etc. that are the root cause of
       the performance decrease.

       Messages related to request execution have the following format:

       ACTION: HOSTNAME/COMPROC-NAME/COMPROC-ID/MESSAGE-ID:USER:SIZE:INTERFACE:REQUEST-DETAILS[:DURATION]

       o  ACTION: QUEUE, FETCHED, STARTED or FINISHED
       o  HOSTNAME: Identifies the host where the request was sent from.
       o  COMPROC-NAME: Name of the daemon or command that sent the
          request (e.g. qsub, execd, qmon, ...)
       o  COMPROC-ID: Univa Grid Engine internal ID used for communication
       o  MESSAGE-ID: Message ID that identifies the request on the
          communication layer.
       o  USER: Name of the user that caused the request to be sent to
          qmaster.
       o  SIZE: Size of the request in bytes (the commlib message) when
          receiving requests from external clients, else 0
       o  INTERFACE: Interface that was used to trigger the request (GDI
          or REP)
       o  REQUEST-DETAILS: For GDI requests this will show the operation
          type (e.g. ADD, MOD, DEL, ...) and the object type (JB for job
          object, CQ for cluster queue object, ...)
       o  DURATION: optional: time in seconds since the last action on the
          request, e.g. the time a request was queued, the time it took
          from fetching a request until it could be processed (acquiring
          locks), or the time for processing a request

       Messages related to non-GDI requests modifying event clients (e.g.
       acknowledging receipt of an event package) have the following
       format:

       ACTION(E): REQUEST:ID[:DURATION]

       o  ACTION: QUEUE, STARTED or FINISHED
       o  REQUEST: type of request, e.g. ACK
       o  ID: the event client id, see qconf -secl
       o  DURATION: optional: time in seconds since the last action on the
          request, e.g. the time a request was queued or the time for
          processing a request

JOB LIMITS
   SUPPORTED LIMITS
       The following table shows what kind of limits are supported via job
       submission and queue settings and where the observation is
       implemented (sge_execd, sge_shepherd or via cgroups):

       limit            execd  shepherd  cgroups  description
       -------------------------------------------------------------
       h_cpu/s_cpu       yes     yes       no     cpu time limit in
                                                  seconds
       h_vmem/s_vmem     yes*    yes*      yes    virtual memory size
       h_rss/s_rss       yes*    yes*      no     resident set size
       h_stack/s_stack   no      yes       no     stack size limit
       h_data/s_data     no      yes       no     data segment size
                                                  limit
       h_core/s_core     no      yes       no     max. size of a core
                                                  file**
       h_fsize/s_fsize   no      yes       no     max. file size**

       (*  = If supported by OS)
       (** = This kind of limit is not adjusted on pe settings)

       In order to set up limit observation by sge_execd or sge_shepherd,
       the "execd_params" parameter "ENFORCE_LIMITS" in the configuration
       of the execution hosts is used (see the sge_conf(5) man page). This
       parameter only allows settings for the supported limits (cpu, vmem
       and rss). The remaining limits (stack, data, core and fsize) cannot
       be switched off by this parameter.

       If the virtual memory size is set to be observed by cgroups, the
       sge_execd observation is disabled for the "h_vmem" limit. If the
       cgroups limit setting did not report any error at sge_shepherd
       startup, the "h_vmem" resource limit will be set to "infinity" with
       the setrlimit() system call. How to enable cgroups "h_vmem" limit
       observation is described in the man page sge_conf(5)
       ("h_vmem_limit" parameter of "cgroups_params").

       If a limit is observed by sge_execd, the execd is responsible for
       killing the job. For sge_shepherd the limits are set via the
       setrlimit() system call to let the kernel enforce the process
       limit. The cgroups implementation will write the corresponding
       limit for all processes of the job into the
       "memory.memsw.limit_in_bytes" file which is created in the cgroups
       directory of the job.
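The setrlimit() behaviour described above can be sketched with Python's resource module: an s_*/h_* pair maps onto the (soft, hard) tuple of a resource limit, and a value of "infinity" (as set for "h_vmem" when cgroups take over observation) maps onto RLIM_INFINITY. This is an illustration of the mapping, not the actual shepherd code; the helper name is invented:

```python
import resource

def to_rlimit_pair(s_value, h_value):
    """Translate an s_*/h_* limit pair (bytes or 'infinity') into the
    (soft, hard) tuple expected by resource.setrlimit()."""
    def conv(v):
        # 'infinity' corresponds to an unenforced limit
        return resource.RLIM_INFINITY if v == "infinity" else int(v)
    return (conv(s_value), conv(h_value))

# Applying the pair to the current process would then look like:
#   resource.setrlimit(resource.RLIMIT_AS, to_rlimit_pair("infinity", "infinity"))
# (not executed here, since it would constrain this process itself)
```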
   SUPPORTED LIMITS VIA EXECD CONFIGURATION
       The following table shows the limits that can be set by
       sge_shepherd at job start via the setrlimit() system call to enable
       kernel enforced process limits. The OS must of course support the
       limit type.

       Limit                          Description
       ------------------------------------------------------
       h_descriptors/s_descriptors    nr of open file descriptors
       h_maxproc/s_maxproc            max nr of processes
       h_locks/s_locks                nr of locks
       h_memorylocked/s_memorylocked  maximum number of bytes of
                                      memory locked into RAM

       Please see also the sge_conf(5) man page. The section
       "execd_params" contains information on how to enable these limits.

   LIMIT ADJUSTMENT DEPENDING ON PE SETUP
       For parallel jobs the resulting limit value depends on the settings
       of the parallel environment (PE) that is used. The following
       diagrams explain this in more detail.

       List of abbreviations:

       Name          Description
       ------------------------------------------
       slave limit   Value specified with "-l"
       master limit  Value specified with "-masterl"
       n             Nr of slave slots on this host
       CS            Boolean "control_slaves" PE option
       JFT           Boolean "job_is_first_task" PE option
       MFS           Boolean "master_forks_slaves" PE option
       DFS           Boolean "daemon_forks_slaves" PE option
       n/a           Not applicable situation

   Master task limit adjustments
       This diagram shows how the resulting limit for the master task of a
       parallel job is calculated:

                            MFS
                             |
                   TRUE-----+------FALSE
                     |                |
                    JFT            Case C
                     |
             TRUE----+----FALSE
               |            |
            Case A       Case B

       Case A:
       =======
                    master limit requested?
                              |
                TRUE----------+---------FALSE
                  |                       |
         master limit +          slave limit +
         (n-1) * slave limit     (n-1) * slave limit

       Case B:
       =======
                    master limit requested?
                              |
                TRUE----------+---------FALSE
                  |                       |
         master limit +          slave limit +
         n * slave limit         n * slave limit

       Case C:
       =======
                    master limit requested?
                              |
                TRUE----------+---------FALSE
                  |                       |
            master limit            slave limit

   Adjustments for slave tasks running on the master host
       This diagram shows the resulting limit for any slave task of a
       parallel job which is started on the master task host:

                       CS
                        |
               TRUE----+----FALSE
                 |            |
                MFS          n/a
                 |
            TRUE---+---FALSE
              |          |
             n/a        DFS
                         |
                    TRUE---+---FALSE
                      |          |
                     JFT     slave limit
                      |
                TRUE---+----FALSE
                  |           |
             (n-1) *         n *
             slave limit     slave limit

   Adjustments for slave tasks running on a slave host
       This diagram shows the resulting limit for any slave task of a
       parallel job which is started on a slave host:

                       CS
                        |
               TRUE----+----FALSE
                 |            |
                DFS          n/a
                 |
            TRUE----+----------FALSE
              |                  |
         n * slave limit    slave limit

   Examples for master_forks_slaves=true in the pe setting

       qsub -l h_vmem=1G -pe mpi 3
          h_vmem = 1G + 1G * 3 = 4G    (job_is_first_task = false)
          h_vmem = 1G + 1G * 2 = 3G    (job_is_first_task = true)

       qsub -masterl h_vmem=0.5G -l h_vmem=1G -pe mpi 3
          h_vmem = 0.5G + 3 * 1G = 3.5G    (job_is_first_task = false)
          h_vmem = 0.5G + 2 * 1G = 2.5G    (job_is_first_task = true)

       qsub -pe fixed 16 -masterl h_vmem=64G
          h_vmem = 64G + INFINITY * 16 = INFINITY    (job_is_first_task = false)
          h_vmem = 64G + INFINITY * 15 = INFINITY    (job_is_first_task = true)

       qsub -pe fixed 16 -masterl h_vmem=2G -l h_vmem=4G
          h_vmem = 2G + 4G * 16 = 2G + 64G = 66G    (job_is_first_task = false)
          h_vmem = 2G + 4G * 15 = 2G + 60G = 62G    (job_is_first_task = true)

       qsub -pe fixed 16 -l h_vmem=4G
          h_vmem = 4G + 4G * 16 = 4G + 64G = 68G    (job_is_first_task = false)
          h_vmem = 4G + 4G * 15 = 4G + 60G = 64G    (job_is_first_task = true)

   h_vmem limit for cgroups
       The cgroups h_vmem limit will be the sum of the limits of all tasks
       started on this host.
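The master task diagrams and h_vmem examples above condense into a small function. This is a sketch, assuming n counts the slots on the master host and modelling an unrequested slave limit as infinity; the function name is illustrative:

```python
INFINITY = float("inf")

def master_task_limit(slave_limit, n, mfs, jft, master_limit=None):
    """Resulting limit for the master task of a parallel job.

    slave_limit  -- value requested with -l (INFINITY if not requested)
    master_limit -- value requested with -masterl (None if not requested)
    n            -- number of slave slots on the master host
    mfs, jft     -- master_forks_slaves / job_is_first_task PE options
    """
    base = master_limit if master_limit is not None else slave_limit
    if not mfs:                       # Case C: master limit or slave limit only
        return base
    slaves = n - 1 if jft else n      # Case A (JFT true) vs. Case B (JFT false)
    return base + slaves * slave_limit

# Reproducing the h_vmem examples above (values in GB):
# qsub -l h_vmem=1G -pe mpi 3                      -> 4G (JFT false), 3G (JFT true)
# qsub -masterl h_vmem=0.5G -l h_vmem=1G -pe mpi 3 -> 3.5G / 2.5G
```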
       Once the individual limits for the master task, slave tasks on the
       master host and slave tasks on a slave host are calculated, the
       resulting sum for the cgroups h_vmem setting is computed the
       following way:

       On the master task host:

          resulting master task limit + nr of started slave tasks *
          resulting slave limit

       On a slave task host:

          nr of started slave tasks * resulting slave limit

       Note: The PE parameters daemon_forks_slaves and master_forks_slaves
       have an influence on the nr of slave tasks that can be started on
       each host. More information about these parameters can be found in
       the sge_pe(5) man page.

MONITORING
   MESSAGE FILE MONITORING
       Monitoring output of the sge_qmaster(1) component is disabled by
       default. It can be enabled by defining MONITOR_TIME as
       qmaster_param in the global configuration of Univa Grid Engine (see
       sge_conf(5)). MONITOR_TIME defines the time interval in which
       monitoring information is printed. The generated output provides
       information per thread and it is written to the message file or
       displayed with qping(1).

       The messages that are shown start with the name of a qmaster thread
       followed by a three digit number and a colon character (:). The
       number makes it possible to distinguish monitoring output of
       different threads that are part of the same thread pool.

       All counters are reset when the monitoring output is printed. This
       means that all numbers show activity characteristics of about one
       MONITOR_TIME interval. Please note that MONITOR_TIME is only a
       guideline and not a fixed interval. The interval that is actually
       used is shown by time in the monitoring output.

       For each thread type the output contains the following parameters:

       o  runs: [iterations per second] number of cycles per second a
          thread executed its main loop. Threads typically handle one work
          package (message, request) per iteration.

       o  out: [messages per second] number of outgoing TCP/IP
          communication messages per second.
          Outgoing messages are only triggered by threads that handle
          requests which were initiated by external commands or interfaces
          (client commands, DRMAA, ...).

       o  APT: [cpu time per message] average processing time per message
          or request.

       o  idle: [%] percentage of time the thread was idle and waiting for
          work.

       o  wait: [%] percentage of time the thread was waiting for required
          resources that were already in use by other threads.

       o  time: [seconds] time since the last monitoring output for this
          thread was written.

       Depending on the thread type the output will contain more details:

   LISTENER
       Listener threads listen for incoming messages that are sent to
       qmaster via the generic data interface, event client interface,
       mirror interface or reporting interface. Requests are unpacked and
       verified. For simple requests a response will also be sent back to
       the client, but in most cases the request will be stored in one of
       the request queues that are processed by reader threads, worker
       threads or the event master thread.

       o  IN g: [requests per second] number of requests received via the
          GDI interface.

       o  IN a: [messages per second] handled ack's for a request
          response.

       o  IN e: [requests per second] event client requests received from
          applications using the event client or mirror interface.

       o  IN r: [requests per second] number of reporting requests
          received from execution hosts.

       o  OTHER wql: [requests] number of pending read-write requests that
          can immediately be handled by a worker thread.

       o  OTHER rql: [requests] number of pending read-only requests that
          can immediately be handled by a reader thread.

       o  OTHER wrql: [requests] number of waiting read-only requests.
          Read-only requests in waiting state have to be executed as part
          of a GDI session, and the data store of the read-only thread
          pool is not in a state to execute those requests immediately.

   READER/WORKER
       Reader and worker threads handle GDI and reporting requests.
       Reader threads handle read-only requests only, whereas all requests
       that require read-write access are processed by worker threads.

       o  EXECD l: [reports per second] handled load reports per second.
       o  EXECD j: [reports per second] handled job reports per second.
       o  EXECD c: [reports per second] handled configuration version
          requests.
       o  EXECD p: [reports per second] handled processor reports.
       o  EXECD a: [messages per second] handled ack's for a request
          response.
       o  GDI a: [requests per second] handled GDI add requests per
          second.
       o  GDI g: [requests per second] handled GDI get requests per
          second.
       o  GDI m: [requests per second] handled GDI modify requests per
          second.
       o  GDI d: [requests per second] handled GDI delete requests per
          second.
       o  GDI c: [requests per second] handled GDI copy requests per
          second.
       o  GDI t: [requests per second] handled GDI trigger requests per
          second.
       o  GDI p: [requests per second] handled GDI permission requests per
          second.

   EVENT MASTER
       The event master thread is responsible for handling activities for
       registered event clients that use either the event client or the
       mirror interface. These interfaces can be used to register and to
       subscribe all or a subset of event types. Clients automatically
       receive updates for subscribed information as soon as it is added,
       modified or deleted within qmaster. Clients using those interfaces
       do not need to poll for required information.

       o  clients: [clients] connected event clients.
       o  mod: [modifications per second] event client modifications per
          second.
       o  ack: [messages per second] handled ack's per second.
       o  blocked: [clients] number of event clients blocked during send.
       o  busy: [clients] number of event clients busy during send.
       o  events: [events per second] newly added events per second.
       o  added: [events per second] number of all events per second.
       o  skipped: [events per second] ignored events per second (because
          no client has subscribed them).
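The push model described above (subscribe once, receive every matching change, skip events that nobody subscribed) can be illustrated with a generic publish/subscribe sketch. This is a conceptual illustration only, not the actual Univa Grid Engine event client API; all names are invented:

```python
class EventMaster:
    """Toy event master: callbacks are registered per event type and
    invoked on every matching change, so clients never poll."""

    def __init__(self):
        self._subscribers = {}          # event type -> list of callbacks

    def subscribe(self, event_type, callback):
        self._subscribers.setdefault(event_type, []).append(callback)

    def publish(self, event_type, payload):
        """Deliver an event to all subscribers of its type; types without
        subscribers are skipped (the 'skipped' counter above)."""
        delivered = 0
        for cb in self._subscribers.get(event_type, []):
            cb(payload)
            delivered += 1
        return delivered

master = EventMaster()
seen = []
master.subscribe("JOB_ADD", seen.append)
master.publish("JOB_ADD", {"job_id": 1})         # delivered to one client
master.publish("QUEUE_MOD", {"queue": "all.q"})  # no subscriber: skipped
```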
   TIMED EVENT
       The timed event thread is used within qmaster to trigger activities
       either once at a certain point in time or in regular time
       intervals.

       o  pending: [events] number of events waiting for their start time
          to be reached.
       o  executed: [events per second] executed events per second.

   QPING MONITORING
       The qping(1) command provides monitoring output of Univa Grid
       Engine components.

   REQUEST QUEUES
       Requests that are accepted by qmaster but cannot immediately be
       handled by one of the reader or worker threads are stored in
       qmaster internal request queues. qping(1) is able to show details
       about those pending requests when this is enabled by defining the
       parameter MONITOR_REQUEST_QUEUES as qmaster_param in the global
       configuration of Univa Grid Engine. The output format of requests
       is the same as for request log messages (explained in the section
       Logging -> Request execution above).

GRID ENGINE ERROR, FAILURE AND EXIT CODES
       Univa Grid Engine provides a number of job or feature related error
       and exit codes. These codes determine the resulting consequence for
       the job or the queue, e.g. whether a job is re-queued or a queue is
       set into an error state. The codes and their consequences are shown
       in the following tables.

   Job related error and exit codes
       The following table lists the consequences of different job-related
       error codes or exit codes. These codes are valid for every type of
       job.

       Script/Method   Exit or Error Code   Consequence
       ---------------------------------------------------------
       Job Scripts     0                    Success
                       99                   Re-queue
                       Rest                 Success: Exit code in
                                            accounting
       Epilog/Prolog   0                    Success
                       99                   Re-queue
                       100                  Job in Error state
                       Rest                 Queue in Error state,
                                            Job re-queued

   Parallel-Environment-Related Error or Exit Codes
       The following table lists the consequences of error codes or exit
       codes of jobs related to parallel environment (PE) configuration.
       Script/Method   Error or Exit Code   Consequence
       ---------------------------------------------------------
       pe_start        0                    Success
                       Rest                 Queue set to error state,
                                            job re-queued
       pe_stop         0                    Success
                       Rest                 Queue set to error state,
                                            job not re-queued

   Queue-Related Error or Exit Codes
       The following table lists the consequences of error codes or exit
       codes of jobs related to queue configuration. These codes are valid
       only if the corresponding methods were overridden.

       Script/Method   Error or Exit Code   Consequence
       ---------------------------------------------------------
       Job Starter     0                    Success
                       Rest                 Success, no other special
                                            meaning
       Suspend         0                    Success
                       Rest                 Success, no other special
                                            meaning
       Resume          0                    Success
                       Rest                 Success, no other special
                                            meaning
       Terminate       0                    Success
                       Rest                 Success, no other special
                                            meaning

   Checkpointing-Related Error or Exit Codes
       The following table lists the consequences of error or exit codes
       of jobs related to checkpointing.

       Script/Method   Error or Exit Code   Consequence
       ---------------------------------------------------------
       Checkpoint      0                    Success
                       Rest                 Success. For kernel
                                            checkpointing, however, this
                                            means that the checkpoint
                                            was not successful.
       Migrate         0                    Success
                       Rest                 Success. For kernel
                                            checkpointing, however, this
                                            means that the checkpoint
                                            was not successful.
                                            Migration will occur.
       Restart         0                    Success
                       Rest                 Success, no other special
                                            meaning
       Clean           0                    Success
                       Rest                 Success, no other special
                                            meaning

   qacct -j failed line Codes
       For jobs that run successfully, the qacct -j command output shows a
       value of 0 in the failed field, and the output shows the exit
       status of the job in the exit_status field. However, the shepherd
       might not be able to run a job successfully. For example, the
       epilog script might fail, or the shepherd might not be able to
       start the job. In such cases, the failed field displays one of the
       code values listed in the following table.
       Code  Description             Accounting  Meaning for Job
                                     valid
       --------------------------------------------------------------
       0     No failure              t           Job ran, exited normally
       1     Presumably before job   f           Job could not be started
       3     Before writing config   f           Job could not be started
       4     Before writing PID      f           Job could not be started
       5     On reading config file  f           Job could not be started
       6     Setting processor set   f           Job could not be started
       7     Before prolog           f           Job could not be started
       8     In prolog               f           Job could not be started
       9     Before pestart          f           Job could not be started
       10    In pestart              f           Job could not be started
       11    Before job              f           Job could not be started
       12    Before pestop           t           Job ran, failed before
                                                 calling PE stop procedure
       13    In pestop               t           Job ran, PE stop
                                                 procedure failed
       14    Before epilog           t           Job ran, failed before
                                                 calling epilog script
       15    In epilog               t           Job ran, failed in
                                                 epilog script
       16    Releasing processor     t           Job ran, processor set
             set                                 could not be released
       24    Migrating (check-       t           Job ran, job will be
             pointing jobs)                      migrated
       25    Rescheduling            t           Job ran, job will be
                                                 rescheduled
       26    Opening output file     f           Job could not be started,
                                                 stderr/stdout file could
                                                 not be opened
       27    Searching requested     f           Job could not be started,
             shell                               shell not found
       28    Changing to working     f           Job could not be started,
             directory                           error changing to start
                                                 directory
       29    No message -> AFS       f           Job could not be started
             problem
       30    Rescheduling on         f           Job ran until application
             application error                   failed, rescheduling
       31    Accessing sgepasswd     f           Job could not be started,
             file                                job failure
       32    Entry is missing in     f           Job could not be started,
             password file                       job failure
       33    Wrong password          f           Job could not be started,
                                                 job failure
       34    Communicating with      f           Job could not be started,
             Grid Engine Helper                  job failure
             Service
       35    Before job in Grid      f           Job could not be started,
             Engine Helper Service               job failure
       36    Checking configured     f           Job could not be started,
             daemons                             job failure
       37    Qmaster enforced h_rt   t           Job was killed by
             limit                               qmaster, enforcing a
                                                 resource limit, job
                                                 failure
       38    No Message ->           f           Job could not be started,
             ADD_GRP_ID can not                  ADD_GRP_ID can not be set
             be set
       100   Assumedly after job     t           Job ran, job killed by a
                                                 signal

       The Code column lists the value of the failed field. The
       Description column lists the text that appears in the qacct -j
       output. If Accounting valid is set to t, the job accounting values
       are valid. If Accounting valid is set to f, the resource usage
       values of the accounting record are not valid. The Meaning for Job
       column indicates whether the job ran or not.

SEE ALSO
       sge_intro(1), sge_qmaster(1), sge_execd(1), qconf(1), qping(1),
       sge_conf(5)

COPYRIGHT
       See sge_intro(1) for a full statement of rights and permissions.

AUTHORS
       Copyright (c) 2015-2017 Univa Corporation.