SCHED_CONF(5)                 File Formats Manual                SCHED_CONF(5)

NAME
       sched_conf - Univa Grid Engine default scheduler configuration file

DESCRIPTION
sched_conf defines the configuration file format for Univa Grid Engine's scheduler. In order to modify the configuration, use the graphical user interface qmon(1) or the -msconf option of the qconf(1) command. A default configuration is provided together with the Univa Grid Engine distribution package.

Note that Univa Grid Engine allows backslashes (\) to be used to escape newline (\newline) characters. The backslash and the newline are replaced with a space (" ") character before any interpretation.

FORMAT
The following parameters are recognized by the Univa Grid Engine scheduler if present in sched_conf:

algorithm

Note: Deprecated, may be removed in a future release.

Allows for the selection of alternative scheduling algorithms. Currently default is the only allowed setting.

load_formula

A simple algebraic expression used to derive a single weighted load value from all or part of the load parameters reported by sge_execd(8) for each host, and from all or part of the consumable resources (see complex(5)) being maintained for each host. The load formula expression syntax is that of a summation of weighted load values:

       {w1|load_val1[*w1]}[{+|-}{w2|load_val2[*w2]}[{+|-}...]]

Note that no blanks are allowed in the load formula.

The load values and consumable resources (load_val1, ...) are specified by the name defined in the complex (see complex(5)).

Note: Administrator-defined load values (see the load_sensor parameter in sge_conf(5) for details) and consumable resources available for all hosts (see complex(5)) may be used, as well as the Univa Grid Engine default load parameters.

The weighting factors (w1, ...) are positive integers. After the expression has been evaluated for each host, the results are assigned to the hosts and are used to sort the hosts according to the weighted load. The sorted host list is subsequently used to sort the queues. The default load formula is "np_load_avg".

job_load_adjustments

The load imposed by the Univa Grid Engine jobs running on a system varies over time and often, e.g. for the CPU load, requires some amount of time to be reported in the appropriate quantity by the operating system. Consequently, if a job was started very recently, the reported load may not sufficiently represent the load which the job already imposes on that host. The reported load will adapt to the real load over time, but the period in which the reported load is too low may already lead to an oversubscription of that host. Univa Grid Engine allows the administrator to specify job_load_adjustments, which are used by the Univa Grid Engine scheduler to compensate for this problem.

The job_load_adjustments are specified as a comma-separated list of arbitrary load parameters or consumable resources and (separated by an equal sign) an associated load correction value. Whenever a job is dispatched to a host by the scheduler, the load parameter and consumable value set of that host is increased by the values provided in the job_load_adjustments list. These correction values are decayed linearly over time until, after load_adjustment_decay_time from the job start, the corrections reach the value 0. If the job_load_adjustments list is assigned the special denominator NONE, no load corrections are performed.

The adjusted load and consumable values are used to compute the combined and weighted load of the hosts with the load_formula (see above) and to compare the load and consumable values against the load threshold lists defined in the queue configurations (see queue_conf(5)). If the load_formula simply consists of the default CPU load average parameter np_load_avg, and if the jobs are very compute intensive, one might want to set the job_load_adjustments list to np_load_avg=1.00, which means that every new job dispatched to a host will require 100 % CPU time, and thus the machine's load is instantly increased by 1.00.
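As a combined illustration of the two parameters above (the weighting factor and the correction value are invented for this sketch, not recommendations; both np_load_avg and swap_used are assumed to be reported for all hosts, see complex(5)), a configuration that sorts hosts by normalized CPU load, penalizes hosts that are already swapping, and pre-loads every freshly dispatched job with half a CPU might read:

       load_formula            np_load_avg+swap_used*2
       job_load_adjustments    np_load_avg=0.50

Note again that the load formula must not contain blanks.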
load_adjustment_decay_time

The load corrections in the job_load_adjustments list above are decayed linearly over time: at the point of the job start the corresponding load or consumable parameter is raised by the full correction value, and after a time period of load_adjustment_decay_time the correction becomes 0. Proper values for load_adjustment_decay_time depend greatly on the load or consumable parameters used and on the specific operating system(s). Therefore, they can only be determined on-site and experimentally. For the default np_load_avg load parameter, a load_adjustment_decay_time of 7 minutes has proven to yield reasonable results.

maxujobs

The maximum number of jobs any user may have running in a Univa Grid Engine cluster at the same time. If set to 0 (the default), users may run an arbitrary number of jobs.

schedule_interval

At the time the scheduler thread initially registers at the event master thread in the sge_qmaster(8) process, schedule_interval is used to set the time interval in which the event master thread sends scheduling event updates to the scheduler thread. A scheduling event is a status change that has occurred within sge_qmaster(8) which may trigger or affect scheduler decisions (e.g. a job has finished and thus the allocated resources are available again). In the Univa Grid Engine default scheduler, the arrival of a scheduling event report triggers a scheduler run; otherwise the scheduler waits for event reports. schedule_interval is a time value (see queue_conf(5) for a definition of the syntax of time values).

queue_sort_method

This parameter determines the order in which several criteria are taken into account to produce a sorted queue list. Currently, two settings are valid: seqno and load. In both cases, Univa Grid Engine attempts to maximize the number of soft requests (see the qsub(1) -soft option) fulfilled by the queues for a particular job as the primary criterion. If the queue_sort_method parameter is set to seqno, Univa Grid Engine then uses the seq_no parameter as configured in the current queue configurations (see queue_conf(5)) as the next criterion to sort the queue list; the load_formula (see above) only has a meaning if two queues have equal sequence numbers. If queue_sort_method is set to load, the load according to the load_formula is the criterion after maximizing a job's soft requests, and the sequence number is only used if two hosts have the same load. The sequence number sorting is most useful if you want to define a fixed order in which queues are to be filled (e.g. the cheapest resource first). The default for this parameter is load.

halftime

When executing under a share based policy, the scheduler "ages" (i.e. decreases) usage to implement a sliding window for achieving the share entitlements as defined by the share tree. The halftime defines the time interval in which accumulated usage decays to half its original value. Valid values are specified in hours or according to the time format as specified in queue_conf(5). If the value is set to 0, the usage is not decayed; -1 results in immediate decay.
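With a halftime of h hours, usage recorded t hours ago therefore contributes to a user's or project's current usage with a weight of approximately

       0.5^(t/h)

so that, e.g. with an illustrative halftime of 168 (one week), week-old usage counts half and two-week-old usage counts a quarter.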
usage_weight_list

Univa Grid Engine accounts for the consumption of the resources wallclock-time, CPU-time, memory and IO to determine the usage which a job imposes on a system. A single usage value is computed from these four input parameters by multiplying the individual values by weights and adding them up. The weights are defined in the usage_weight_list. The format of the list is

       wallclock=wwallclock,cpu=wcpu,mem=wmem,io=wio

where wwallclock, wcpu, wmem and wio are the configurable weights. The weights are real numbers; the sum of all four weights should be 1.

compensation_factor

Determines how fast Univa Grid Engine should compensate for past usage below or above the share entitlement defined in the share tree. Recommended values are between 2 and 10, where 10 means faster compensation.

weight_user

The relative importance of the user shares in the functional policy. Values are of type real.

weight_project

The relative importance of the project shares in the functional policy. Values are of type real.

weight_department

The relative importance of the department shares in the functional policy. Values are of type real.

weight_job

The relative importance of the job shares in the functional policy. Values are of type real.

weight_tickets_functional

The maximum number of functional tickets available for distribution by Univa Grid Engine. Determines the relative importance of the functional policy. See sge_priority(5) for an overview of job priorities.

weight_tickets_share

The maximum number of share based tickets available for distribution by Univa Grid Engine. Determines the relative importance of the share tree policy. See sge_priority(5) for an overview of job priorities.

weight_deadline

The weight applied to the remaining time until a job's latest start time. Determines the relative importance of the deadline. See sge_priority(5) for an overview of job priorities.

weight_waiting_time

The weight applied to a job's waiting time since submission. Determines the relative importance of the waiting time. See sge_priority(5) for an overview of job priorities.

weight_urgency

The weight applied to a job's normalized urgency when determining the priority finally used. Determines the relative importance of urgency. See sge_priority(5) for an overview of job priorities.

weight_priority

The weight applied to a job's normalized POSIX priority when determining the priority finally used. Determines the relative importance of the POSIX priority. See sge_priority(5) for an overview of job priorities.

weight_ticket

The weight applied to the normalized ticket amount when determining the priority finally used. Determines the relative importance of the ticket policies. See sge_priority(5) for an overview of job priorities.

flush_finish_sec

This parameter is provided for tuning the system's scheduling behavior. By default, a scheduler run is triggered in the scheduler interval. When this parameter is set to 1 or larger, the scheduler will be triggered x seconds after a job has finished. Setting this parameter to 0 disables the flush after a job has finished.

flush_submit_sec

This parameter is provided for tuning the system's scheduling behavior. By default, a scheduler run is triggered in the scheduler interval. When this parameter is set to 1 or larger, the scheduler will be triggered x seconds after a job was submitted to the system. Setting this parameter to 0 disables the flush after a job was submitted.
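For example, to have the scheduler react within one second to job submissions and job completions instead of waiting for the next regular scheduling run (the values are chosen purely for illustration):

       flush_submit_sec    1
       flush_finish_sec    1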
schedd_job_info

The default scheduler can keep track of why jobs could not be scheduled during the last scheduler run. This parameter enables or disables the observation. The value true enables the monitoring; false turns it off. It is also possible to activate the observation only for certain jobs. This is done by setting the parameter to job_list followed by a comma-separated list of job ids. The user can obtain the collected information with the command qstat -j.

params

This is foreseen for passing additional parameters to the Univa Grid Engine scheduler. The following values are recognized:

DURATION_OFFSET

If set, overrides the default value of 60 seconds. This parameter is used by the Univa Grid Engine scheduler when planning resource utilization as the delta between net job runtimes and the total time until resources become available again. Net job runtime as specified with -l h_rt=... or -l s_rt=... or -l d_rt=... or default_duration always differs from total job runtime due to delays before and after the actual job start and finish. The delays before job start include the time until the end of a schedule_interval, the time it takes to deliver a job to sge_execd(8), and the delays caused by prolog in queue_conf(5), start_proc_args in sge_pe(5) and starter_method in queue_conf(5). The delays after the actual job finish include notify, terminate_method and checkpointing procedures, procedures run after the actual job finish such as stop_proc_args in sge_pe(5) or epilog in queue_conf(5), and the delay until a new schedule_interval.

If the offset is too low, resource reservations (see max_reservation) can be delayed repeatedly due to an overly optimistic job circulation time.

JC_FILTER

Note: Deprecated, may be removed in a future release.

If set to true, the scheduler limits the number of jobs it looks at during a scheduling run. At the beginning of the scheduling run it assigns each job a specific category, which is based on the job's requests, priority settings, and the job owner. All scheduling policies assign the same importance to each job in one category. Therefore the jobs within a category have a FIFO order, and their number can be limited to the number of free slots in the system. An exception are jobs which request a resource reservation; they are included regardless of the number of jobs in a category.

This setting is turned off by default, because in very rare cases the scheduler can make a wrong decision. It is also advised to turn report_pjob_tickets off; otherwise qstat -ext can report outdated ticket amounts. The information shown by qstat -j for a job that was excluded in a scheduling run is very limited.

PROFILE

If set equal to 1, the scheduler logs profiling information summarizing each scheduling run. In combination with WARN_DISPATCHING_TIME it is possible to get profiling data for the longest and shortest job scheduling.

MONITOR

If set equal to 1, the scheduler records information for each scheduling run, allowing one to reproduce the job resource utilization, in the file <sge_root>/<cell>/common/schedule. In order to see entries in the schedule file, resource reservation must be turned on (max_reservation must be greater than 0) and jobs need a runtime (using h_rt, s_rt, d_rt or setting a default_duration). Each line of the schedule file consists of the following colon-separated fields:

       job_id          The job's id.
       task_id         The array task id, or 1 in case of non-array jobs.
       state           One of RUNNING, SUSPENDED, MIGRATING, STARTING,
                       RESERVING.
       start_time      Start time in seconds after 1.1.1970.
       duration        Assumed job duration in seconds.
       level           One of {P, G, H, Q}, standing for {PE, Global,
                       Host, Queue}.
       object_name     The name of the PE, global, host or queue.
       resource_name   The name of the consumable resource.
       utilization     The resource utilization debited for the job.

A line "::::::::" marks the beginning of a new schedule interval.

Please note that this file is not truncated. Make sure the monitoring is switched off if you have no automated procedure set up that truncates the schedule file.
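A hypothetical entry for a job 1004 holding a reservation for one slot in queue all.q@host1 might look as follows (all values are invented for illustration):

       1004:1:RESERVING:1088684340:600:Q:all.q@host1:slots:1.000000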
PE_RANGE_ALG

This parameter sets the algorithm for the PE range computation. The default is "bin", which means that the scheduler uses a binary search to select the best slot amount. It should not be necessary to change this setting in normal operation. If a custom setting is needed, the following values are available:

       auto      the scheduler selects the best algorithm
       least     starts the resource matching with the lowest slot
                 amount first
       bin       starts the resource matching in the middle of the PE
                 slot range
       highest   starts the resource matching with the highest slot
                 amount first

PREFER_SOFT_REQUESTS

If this parameter is set, the scheduler will try to find an assignment or a resource reservation which matches as many soft requests as possible. "PREFER_SOFT_REQUESTS" only has an impact on parallel jobs. When dispatching jobs (no reservation), by default (PREFER_SOFT_REQUESTS not set) resources are preferred which provide more slots (in case of PE ranges); with the parameter set, resources are preferred which have fewer infringements of soft requests. For resource reservation without the parameter set, the scheduler reserves the earliest available resources in time, even when soft requests for the job cannot be fulfilled; with the parameter set, resources are preferred which have fewer infringements of soft requests.

PE_SORT_ORDER

When using wildcard parallel environment selection at submission time, the parallel environment the scheduler chooses is arbitrary. In order to fill up the parallel environments in a specific order, this parameter allows changing the sorting of matching parallel environments to an ascending or a descending order. When PE_SORT_ORDER is set to ASCENDING (or 1), the first PE which is tested for job selection is the alpha-numerically first one (test1pe before test2pe, and test10pe before test2pe, when submitting with -pe test*). When it is set to DESCENDING (or 2), the PE which is tested first is the alpha-numerically last one (test2pe in the previous example). When it is set to 0 or NONE, the first matching PE is arbitrary (the default), which is a good choice for balancing PEs and the same as when the parameter is absent.

COUNT_CORES_AS_THREADS

If set to 1 or TRUE, the scheduler treats the requested amount of cores of a job (with the -binding parameter) as a request for hardware supported threads. On hosts with SMT (topology string with threads, like SCTTCTT) the amount of requested cores is divided by the number of threads per core. In case a core would be filled only partially, the complete core is requested by the job. Example: When a job requests 3 cores, on a host with hyper-threading (2 hardware threads per core) the request is transformed to 2 cores (because 3 threads are needed). On a host without hyper-threading the job requests 3 cores, and on a host with 4 hardware threads supported per core the job requests 1 core.
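The transformation described above amounts to rounding up (this formula is a restatement of the examples, not a quotation from the implementation):

       requested_cores = ceil(requested / threads_per_core)

e.g. ceil(3/2) = 2 on a two-way SMT host, ceil(3/4) = 1 on a host with four hardware threads per core, and ceil(3/1) = 3 without SMT.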
WRITE_SCHEDD_RUNLOG

If set equal to 1, the scheduler will write trace messages of the next scheduling run to the file <sge_root>/<cell>/common/schedd_runlog when triggered by qconf -tsm. Writing the schedd_runlog file can have a significant impact on scheduler performance. This feature should only be enabled when the debugging information contained in the file is actually needed. The default setting is disabled.

MAX_SCHEDULING_TIME

This parameter can be used to specify a maximum time interval (time_specifier, see sge_types(1)) for one scheduling run. If the scheduler has not finished a dispatching run within this time interval, job dispatching is stopped for this one scheduling run. In the next scheduling run, job dispatching again starts with the highest priority job. The default for this parameter is 0 (do full dispatching from the highest priority job down to the lowest priority job). In huge clusters with a high number of pending jobs, setting this parameter to a reasonable value (e.g. one minute) can improve cluster utilization and the responsiveness of sge_qmaster.

MAX_DISPATCHED_JOBS

This parameter can be used to limit the number of jobs which get scheduled in one scheduling interval. It can be set to any positive number, or to 0 (do not limit the number of scheduled jobs). The default is 0. Limiting the number of jobs getting scheduled in a single scheduling interval can be useful to avoid overload on the cluster, especially on file servers, due to many jobs starting up at the same time. But use this option with care: setting it to too low a value can lead to bad utilization of the cluster.

HIGH_PRIO_DRAINS_CLUSTER

When this parameter is set to 1 or TRUE, the cluster will be drained until the highest priority job can be scheduled. This can be used as a workaround to avoid starvation of parallel jobs when resource reservation cannot be applied, e.g. because job runtimes are unknown. Use this parameter with care and only temporarily: it can lead to very bad utilization of the cluster.

WARN_DISPATCHING_TIME

When this parameter is set to a threshold in milliseconds, the Univa Grid Engine scheduler will print a warning to the sge_qmaster(8) messages file when dispatching a job takes longer than the given threshold. If this parameter is enabled and PROFILE is turned on, the profiling output will contain additional information about the longest and shortest job scheduling times. The default for "WARN_DISPATCHING_TIME" is 0 (switched off).

SHARE_BASED_ON_SLOTS

When this parameter is set to 1 or TRUE, the scheduler will consider the number of slots being used by running jobs and by pending jobs when pushing users and projects toward their sharing targets as defined by the share tree. That is, a parallel job using 4 slots will be considered equal to 4 serial jobs. When the parameter is set to FALSE (the default), every job is considered equal. The urgency_slots PE attribute in sge_pe(5) is used to determine the number of slots when a job is submitted with a PE range.

Changing params will take immediate effect. The default for params is none.

reprioritize_interval

The interval (HH:MM:SS) in which to reprioritize jobs on the execution hosts based on the current ticket amount of the running jobs. If the interval is set to 00:00:00, reprioritization is turned off. The default value is 00:00:00. The reprioritization tickets are calculated by the scheduler, and update events for running jobs are only sent after the scheduler has calculated new values. How often the scheduler should calculate the tickets is defined by the reprioritize_interval. Because the scheduler is only triggered in a specific interval (schedule_interval), the reprioritize_interval only has a meaning if set greater than the schedule_interval. For example, if the schedule_interval is 2 minutes and reprioritize_interval is set to 10 seconds, the jobs get re-prioritized every 2 minutes.
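The example above corresponds to the following settings (a sketch; the time values are the ones used in the description):

       schedule_interval        0:2:0
       reprioritize_interval    00:00:10

Since new ticket values only become available with each scheduler run, the effective reprioritization period here is two minutes, not ten seconds.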
report_pjob_tickets

This parameter allows tuning of the system's scheduling run time. It is used to enable or disable the reporting of pending job tickets to the qmaster. It does not influence the ticket calculation. When the reporting is turned off, the sort order of jobs in qstat and qmon is based only on the submit time. The reporting should be turned off in a system with a very large amount of jobs by setting this parameter to "false".

halflife_decay_list

The halflife_decay_list allows the configuration of different decay rates for the "finished_jobs" usage types, which are used in the pending job ticket calculation to account for jobs which have just ended. This allows the pending jobs algorithm to count finished jobs against a user or project for a configurable decayed time period. This feature is turned off by default, and the halftime is used instead.

The halflife_decay_list also allows one to configure different decay rates for each usage type being tracked (cpu, io, and mem). The list is specified in the following format:

       <usage_type>=<decay_time>[:<usage_type>=<decay_time>[:...]]
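A hypothetical entry that applies a dedicated decay to finished-job cpu usage while leaving io and mem at the default behavior might look like the following (purely illustrative; the decay time here is assumed to follow the same conventions as the halftime parameter described above):

       halflife_decay_list    cpu=168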