SGE_PE(5)                    File Formats Manual                    SGE_PE(5)

NAME
    sge_pe - Univa Grid Engine parallel environment configuration file
    format

DESCRIPTION
    Parallel environments are parallel programming and runtime environments
    allowing for the execution of shared memory or distributed memory
    parallelized applications. Parallel environments usually require some
    kind of setup before parallel applications can be started in them.
    Examples of common parallel environments are shared memory parallel
    operating systems and the distributed memory environments Parallel
    Virtual Machine (PVM) and Message Passing Interface (MPI).

    sge_pe allows for the definition of interfaces to arbitrary parallel
    environments. Once a parallel environment has been defined or modified
    with the -ap or -mp option to qconf(1) and linked with one or more
    queues via pe_list in queue_conf(5), the environment can be requested
    for a job via the -pe switch to qsub(1), together with a request for a
    range of the number of parallel processes to be allocated by the job.
    Additional -l options may be used to specify the job's requirements in
    further detail.

    Note that Univa Grid Engine allows backslashes (\) to be used to escape
    newline characters. The backslash and the newline are replaced with a
    space (" ") character before any interpretation.

FORMAT
    The format of an sge_pe file is defined as follows:

    pe_name
        The name of the parallel environment as defined for pe_name in
        sge_types(1). It is used with the qsub(1) -pe switch.

    slots
        The total number of parallel processes allowed to run concurrently
        under the parallel environment. The type is number; valid values
        are 0 to 9999999.

    user_lists
        A comma-separated list of user access list names (see
        access_list(5)). Each user contained in at least one of the listed
        access lists has access to the parallel environment. If the
        user_lists parameter is set to NONE (the default), any user not
        explicitly excluded via the xuser_lists parameter described below
        has access.
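    For illustration, the parameters described so far might be combined in
    a PE definition as follows. This is a sketch only; the PE name, slot
    count and the access list name "power_users" are hypothetical values,
    not defaults:

    ```
    pe_name      mympi
    slots        512
    user_lists   power_users
    xuser_lists  NONE
    ```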
        If a user is contained in access lists given in both xuser_lists
        and user_lists, the user is denied access to the parallel
        environment.

    xuser_lists
        The xuser_lists parameter contains a comma-separated list of user
        access lists as described in access_list(5). Each user contained in
        at least one of the listed access lists is not allowed to access
        the parallel environment. If the xuser_lists parameter is set to
        NONE (the default), any user has access.

        If a user is contained in access lists given in both xuser_lists
        and user_lists, the user is denied access to the parallel
        environment.

    start_proc_args
        The invocation command line of a start-up procedure for the
        parallel environment, or the keyword NONE if no start-up script
        should be executed. The start-up procedure is invoked by
        sge_shepherd(8) for the master task of a parallel job after a
        possibly configured prolog (see sge_conf(5)) and prior to executing
        the job script. Its purpose is to set up the parallel environment
        according to its needs. An optional prefix "user@" specifies the
        user under which this procedure is to be started.

        The standard output of the start-up procedure is redirected to the
        file REQUEST.poJID in the job's working directory (see qsub(1)),
        with REQUEST being the name of the job as displayed by qstat(1) and
        JID being the job's identification number. Likewise, the standard
        error output is redirected to REQUEST.peJID.

        Scripts whose execution duration would exceed 2 minutes are
        terminated. This timeout can be adjusted by defining SCRIPT_TIMEOUT
        as an execd_param in the configuration (see sge_conf(5)).

        The following special variables, expanded at runtime, can be used
        (besides any other strings which have to be interpreted by the
        start and stop procedures) to constitute a command line:

        $pe_hostfile
            The pathname of a file containing a detailed description of
            the layout of the parallel environment to be set up by the
            start-up procedure.
            Each line of the file refers to a host on which parallel
            processes are to be run. The first entry of each line denotes
            the hostname, the second entry the number of parallel processes
            to be run on the host, the third entry the name of the queue,
            and the fourth entry a processor range to be used in case of a
            multiprocessor machine.

        $host
            The name of the host on which the start-up or stop procedures
            are started.

        $job_owner
            The user name of the job owner.

        $job_id
            Univa Grid Engine's unique job identification number.

        $job_name
            The name of the job.

        $pe
            The name of the parallel environment in use.

        $pe_slots
            The number of slots granted for the job.

        $processors
            The processors string as contained in the queue configuration
            (see queue_conf(5)) of the master queue (the queue in which the
            start-up and stop procedures are started).

        $queue
            The cluster queue of the master queue instance.

    stop_proc_args
        The invocation command line of a shutdown procedure for the
        parallel environment, or the keyword NONE if no shutdown procedure
        should be executed. The shutdown procedure is invoked by
        sge_shepherd(8) after the job script has finished, but before a
        possibly configured epilog (see sge_conf(5)) is started. Its
        purpose is to stop the parallel environment and to remove it from
        all participating systems. An optional prefix "user@" specifies the
        user under which this procedure is to be started.

        The standard output of the shutdown procedure is also redirected to
        the file REQUEST.poJID in the job's working directory (see
        qsub(1)), with REQUEST being the name of the job as displayed by
        qstat(1) and JID being the job's identification number. Likewise,
        the standard error output is redirected to REQUEST.peJID.

        Scripts whose execution duration would exceed 2 minutes are
        terminated. This timeout can be adjusted by defining SCRIPT_TIMEOUT
        as an execd_param in the configuration (see sge_conf(5)).

        The same special variables as for start_proc_args can be used to
        constitute a command line.
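    As an illustration of the $pe_hostfile layout described above, a
    start-up procedure can translate it into a machine file for an MPI
    implementation. The following is a minimal sketch, not a script shipped
    with Univa Grid Engine; the function name make_machines and the file
    paths are hypothetical, while the four-column line format follows the
    description of $pe_hostfile above:

    ```shell
    #!/bin/sh
    # Sketch of a start_proc_args helper. It could be configured as, e.g.:
    #   start_proc_args  /usr/local/bin/startmpi.sh $pe_hostfile
    # and would then invoke: make_machines "$1" "$TMPDIR/machines"

    # make_machines PE_HOSTFILE MACHINES_OUT
    # Reads lines of the form "hostname slots queue processor-range" and
    # writes each hostname once per granted slot, as many MPI machine
    # files expect.
    make_machines() {
        pe_hostfile=$1
        machines=$2
        : > "$machines"                      # truncate/create output file
        while read -r host nslots queue procs; do
            i=0
            while [ "$i" -lt "$nslots" ]; do
                echo "$host" >> "$machines"
                i=$((i + 1))
            done
        done < "$pe_hostfile"
    }
    ```

    The corresponding stop_proc_args script would typically remove the
    generated machine file again.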
    allocation_rule
        The allocation rule is interpreted by the scheduler thread and
        helps the scheduler decide how to distribute parallel processes
        among the available machines. If, for instance, a parallel
        environment is built for shared memory applications only, all
        parallel processes have to be assigned to a single machine, no
        matter how many suitable machines are available. If, however, the
        parallel environment follows the distributed memory paradigm, an
        even distribution of processes among machines may be preferable.

        The allocation rule always refers to hosts, not to queues. So if
        specific queues are requested for the parallel job, e.g. using the
        "-q" or "-masterq" switch (see qsub(1)), the tasks of the job might
        get distributed over several queues, but the sum of tasks on one
        host will always be the one defined by the allocation rule.

        The current version of the scheduler only understands the
        following allocation rules:

        <int>:
            An integer number that sets the number of processes per host.
            A value of 1 limits the job to 1 process per host. When the
            "-pe" (parallel environment) option is specified (see
            qsub(1)), the number of tasks specified with the "-pe" option
            must be evenly divisible by <int>, otherwise the job will not
            be scheduled. If a master queue is requested using the
            "-masterq" option, a list of queues is specified with "-q" for
            the slave tasks, and the master queue is not a member of the
            slave queue list specified by "-q", then the job will be
            scheduled if and only if the number of tasks minus one is
            evenly divisible by <int>.

        $pe_slots:
            If the special denominator $pe_slots is used, the full range
            of processes as specified with the qsub(1) -pe switch has to
            be allocated on a single host (no matter which value belonging
            to the range is finally chosen for the job to be allocated).

        $fill_up:
            Starting from the best suitable host/queue, all available
            slots are allocated.
            Further hosts and queues are "filled up" as long as the job
            still requires slots for parallel tasks.

        $round_robin:
            From all suitable hosts a single slot is allocated until all
            tasks requested by the parallel job are dispatched. If more
            tasks are requested than suitable hosts are found, allocation
            starts again from the first host. The allocation scheme walks
            through suitable hosts in a best-suitable-first order.

        Note: If the master queue (i.e. the queue where the master task of
        the parallel job is located) is not requested explicitly, these
        allocation rules are always obeyed exactly.

        If a master queue is requested explicitly by adding the -masterq,
        -masterl queue=<queue> or -masterl hostname=<host> switch to the
        submit command line (see submit(1)), then the scheduler tries to
        fulfill both the allocation rule and the master queue request,
        which may be contradictory requirements. If the allocation rule
        and the distribution of both the master queue and the slave queues
        over the execution hosts allow one of these tasks to become the
        master task, then both the -masterq (or -masterl queue=<queue> or
        -masterl hostname=<host>) request and the allocation rule are
        obeyed exactly. If this is not possible, the scheduler
        automatically adds one task that will become the master task.
        Generally, the scheduler will have to add one task for fixed
        allocation rules (i.e. "<int>" or "$pe_slots") if the requested
        master queue is not part of the set of slave queues and none of
        the slave queues has an instance on the master host. The exception
        to this rule is an allocation rule of "1".

    control_slaves
        This parameter can be set to TRUE or FALSE (the default). It
        indicates whether Univa Grid Engine is the creator of the slave
        tasks of a parallel application via sge_execd(8) and
        sge_shepherd(8) and thus has full control over all processes in a
        parallel application, which enables capabilities such as resource
        limitation and correct accounting.
        However, to gain control over the slave tasks of a parallel
        application, a sophisticated PE interface is required which works
        closely together with Univa Grid Engine facilities. Such PE
        interfaces are available through your local Univa Grid Engine
        support office. Set the control_slaves parameter to FALSE for all
        other PE interfaces.

    job_is_first_task
        The job_is_first_task parameter can be set to TRUE or FALSE. A
        value of TRUE indicates that the Univa Grid Engine job script
        already contains one of the tasks of the parallel application (the
        number of slots reserved for the job is the number of slots
        requested with the -pe switch), while a value of FALSE indicates
        that the job script (and its child processes) is not part of the
        parallel program (the number of slots reserved for the job is the
        number of slots requested with the -pe switch + 1).

        If wallclock accounting is used (execd_params ACCT_RESERVED_USAGE
        and/or SHARETREE_RESERVED_USAGE set to TRUE) and control_slaves is
        set to FALSE, the job_is_first_task parameter influences the
        accounting for the job: a value of TRUE means that accounting for
        cpu and requested memory is multiplied by the number of slots
        requested with the -pe switch; if job_is_first_task is set to
        FALSE, the accounting information is multiplied by the number of
        slots + 1.

    urgency_slots
        For pending jobs with a slot range PE request, the number of slots
        is not yet determined. This setting specifies the method to be
        used by Univa Grid Engine to assess the number of slots such jobs
        might finally get. The assumed slot allocation has a meaning when
        determining the resource-request-based priority contribution for
        numeric resources as described in sge_priority(5) and is displayed
        when qstat(1) is run without the -g t option. The following
        methods are supported:

        <int>:
            The specified integer number is directly used as the
            prospective slot amount.

        min:
            The slot range minimum is used as the prospective slot amount.
            If no lower bound is specified with the range, 1 is assumed.

        max:
            The slot range maximum is used as the prospective slot amount.
            If no upper bound is specified with the range, the absolute
            maximum possible due to the PE's slots setting is assumed.

        avg:
            The average of all numbers occurring within the job's PE range
            request is assumed.

    accounting_summary
        This parameter is only checked if control_slaves (see above) is
        set to TRUE and thus Univa Grid Engine is the creator of the slave
        tasks of a parallel application via sge_execd(8) and
        sge_shepherd(8). In this case, accounting information is available
        for every single slave task started by Univa Grid Engine.

        The accounting_summary parameter can be set to TRUE or FALSE. A
        value of TRUE indicates that only a single accounting record is
        written to the accounting(5) file, containing the accounting
        summary of the whole job including all slave tasks, while a value
        of FALSE indicates that an individual accounting(5) record is
        written for every slave task as well as for the master task.

        Note: When running tightly integrated jobs with
        SHARETREE_RESERVED_USAGE set, and with accounting_summary enabled
        in the parallel environment, reserved usage will only be reported
        by the master task of the parallel job. No per-task usage records
        will be sent from execd to qmaster, which can significantly reduce
        the load on qmaster when running large tightly integrated parallel
        jobs.

    daemon_forks_slaves
        This parameter is only checked if control_slaves (see above) is
        set to TRUE and thus Univa Grid Engine is the creator of the slave
        tasks of a parallel application via sge_execd(8) and
        sge_shepherd(8).

        The daemon_forks_slaves parameter defines whether every task of a
        tightly integrated parallel job gets started individually via qrsh
        -inherit (default value FALSE, e.g. used for mpich integration) or
        whether a single daemon is started via qrsh -inherit on every
        slave host which then forks the slave tasks (value TRUE, e.g. used
        for openmpi or lam integration). With daemon_forks_slaves set to
        TRUE, only a single task (the daemon) may get started per slave
        host; all limits set for this task are multiplied by the number of
        slots granted on the host.

    master_forks_slaves
        The master_forks_slaves parameter can be set to TRUE if the master
        task (e.g. mpirun called in the job script) starts tasks running
        on the master host via fork/exec instead of starting them via qrsh
        -inherit. With master_forks_slaves set to TRUE, all limits set for
        the master task (the job script) are increased by the slave task
        limit multiplied by the number of slots granted on the host. No
        further tasks can be started on the master host via qrsh -inherit.

RESTRICTIONS
    Note that the functionality of the start-up, shutdown and signaling
    procedures remains the full responsibility of the administrator
    configuring the parallel environment. Univa Grid Engine will just
    invoke these procedures and evaluate their exit status. If the
    procedures do not perform their tasks properly, or if the parallel
    environment or the parallel application behave unexpectedly, Univa
    Grid Engine has no means to detect this.

SEE ALSO
    sge_intro(1), sge_types(1), qconf(1), qdel(1), qmod(1), qsub(1),
    access_list(5), sge_qmaster(8), sge_shepherd(8).

COPYRIGHT
    See sge_intro(1) for a full statement of rights and permissions.

Univa Grid Engine File Formats        UGE 8.5.4                     SGE_PE(5)