SGE_CONF(5) File Formats Manual SGE_CONF(5) NAME sge_conf - Univa Grid Engine configuration files DESCRIPTION sge_conf defines the global and local Univa Grid Engine configurations and can be shown/modified by qconf(1) using the -sconf/-mconf options. Only root or the cluster administrator may modify sge_conf. At its initial start-up, sge_qmaster(8) checks to see if a valid Univa Grid Engine configuration is available at a well known location in the Univa Grid Engine internal directory hierarchy. If so, it loads that configuration information and proceeds. If not, sge_qmaster(8) writes a generic configuration containing default values to that same loca- tion. The Univa Grid Engine execution daemons sge_execd(8) upon start- up retrieve their configuration from sge_qmaster(8). The actual configuration for both sge_qmaster(8) and sge_execd(8) is a superposition of a global configuration and a local configuration per- tinent for the host on which a master or execution daemon resides. If a local configuration is available, its entries overwrite the corre- sponding entries of the global configuration. Note: The local configu- ration does not have to contain all valid configuration entries, but only those which need to be modified against the global entries. Note: Univa Grid Engine allows backslashes (\) be used to escape new- line (\newline) characters. The backslash and the newline are replaced with a space (" ") character before any interpretation. FORMAT The paragraphs that follow provide brief descriptions of the individual parameters that compose the global and local configurations for a Univa Grid Engine cluster: execd_spool_dir The execution daemon spool directory path. Again, a feasible spool directory requires read/write access permission for root. The entry in the global configuration for this parameter can be overwritten by exe- cution host local configurations, i.e. each sge_execd(8) may have a private spool directory with a different path, in which case it needs to provide read/write permission for the root account of the corre- sponding execution host only. Under execd_spool_dir a directory named corresponding to the unquali- fied hostname of the execution host is opened and contains all informa- tion spooled to disk. Thus, it is possible for the execd_spool_dirs of all execution hosts to physically reference the same directory path (the root access restrictions mentioned above need to be met, however). Changing the global execd_spool_dir parameter set at installation time is not supported in a running system. If the change should still be done it is required to restart all affected execution daemons. Please make sure running jobs have finished before doing so, otherwise running jobs will be lost. The default location for the execution daemon spool directory is $SGE_ROOT/$SGE_CELL/spool. The global configuration entry for this value may be overwritten by the execution host local configuration. mailer mailer is the absolute pathname to the electronic mail delivery agent on your system. It must accept the following syntax: mailer -s Each sge_execd(8) may use a private mail agent. Changing mailer will take immediate effect. The default for mailer depends on the operating system of the host on which the Univa Grid Engine master installation was run. Common values are /bin/mail or /usr/bin/Mail. The global configuration entry for this value may be overwritten by the execution host local configuration. 
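As an illustration of the relationship between the global and host-local configuration described above, the following sketch shows how both can be inspected and modified with qconf(1). This is a minimal sketch only; the host name exec01 is a placeholder and option behavior may differ slightly between releases.
--------------------------------
# show the global configuration (entries such as execd_spool_dir, mailer)
qconf -sconf

# show the local configuration of one execution host; it contains only
# the entries that override the global configuration for that host
qconf -sconf exec01

# modify the global configuration, or add/modify a host-local one
qconf -mconf
qconf -aconf exec01
qconf -mconf exec01
--------------------------------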
xterm
xterm is the absolute pathname to the X Window System terminal emulator, xterm(1).
Changing xterm will take immediate effect. The default for xterm is /usr/bin/X11/xterm.
The global configuration entry for this value may be overwritten by the execution host local configuration.

load_sensor
A comma separated list of executable shell script paths or programs to be started by sge_execd(8) and to be used in order to retrieve site configurable load information (e.g. free space on a certain disk partition). Each sge_execd(8) may use a set of private load_sensor programs or scripts. Changing load_sensor will take effect after two load report intervals (see load_report_time). A load sensor will be restarted automatically if the file modification time of the load sensor executable changes.
The global configuration entry for this value may be overwritten by the execution host local configuration.
In addition to the load sensors configured via load_sensor, sge_execd(8) searches for an executable file named qloadsensor in the execution host's Univa Grid Engine binary directory path. If such a file is found, it is treated like the configurable load sensors defined in load_sensor. This facility is intended for pre-installing a default load sensor.

prolog
The executable path of a shell script that is started before execution of Univa Grid Engine jobs with the same environment setting as that for the Univa Grid Engine jobs to be started afterwards. An optional prefix "user@" specifies the user under which this procedure is to be started. The procedure's standard output and error output streams are written to the same file used for the standard output and error output of each job. This procedure is intended as a means for the Univa Grid Engine administrator to automate the execution of general site specific tasks like the preparation of temporary file systems with the need for the same context information as the job.
Each sge_execd(8) may use a private prolog script. Correspondingly, the execution host local configuration can be overwritten by the queue configuration (see queue_conf(5)). Changing prolog will take immediate effect. The default for prolog is the special value NONE, which prevents execution of a prolog script.
Scripts whose execution duration would exceed 2 minutes will be terminated. This timeout can be adjusted by defining SCRIPT_TIMEOUT as execd_params.
The following special variables expanded at runtime can be used (besides any other strings which have to be interpreted by the procedure) to constitute a command line:

$host The name of the host on which the prolog or epilog procedures are started.

$job_owner The user name of the job owner.

$job_id Univa Grid Engine's unique job identification number.

$job_name The name of the job.

$processors The processors string as contained in the queue configuration (see queue_conf(5)) of the master queue (the queue in which the prolog and epilog procedures are started).

$queue The cluster queue name of the master queue instance, i.e. the cluster queue in which the prolog and epilog procedures are started.

$stdin_path The pathname of the stdin file. This is always /dev/null for prolog, pe_start, pe_stop and epilog. It is the pathname of the stdin file for the job in the job script. When delegated file staging is enabled, this path is set to $fs_stdin_tmp_path. When delegated file staging is not enabled, it is the stdin pathname given via DRMAA or qsub.
$stdout_path $stderr_path The pathname of the stdout/stderr file. This always points to the output/error file. When delegated file staging is enabled, this path is set to $fs_stdout_tmp_path/$fs_stderr_tmp_path. When delegated file staging is not enabled, it is the std- out/stderr pathname given via DRMAA or qsub. $merge_stderr If merging of stderr and stdout is requested, this flag is "1", otherwise it is "0". If this flag is 1, stdout and stderr are merged in one file, the stdout file. Merging of stderr and std- out can be requested via the DRMAA job template attribute 'drmaa_join_files' (see drmaa_attributes(3) ) or the qsub param- eter '-j y' (see qsub(1) ). $fs_stdin_host When delegated file staging is requested for the stdin file, this is the name of the host where the stdin file has to be copied from before the job is started. $fs_stdout_host $fs_stderr_host When delegated file staging is requested for the stdout/stderr file, this is the name of the host where the stdout/stderr file has to be copied to after the job has run. $fs_stdin_path When delegated file staging is requested for the stdin file, this is the pathname of the stdin file on the host $fs_stdin_host. $fs_stdout_path $fs_stderr_path When delegated file staging is requested for the stdout/stderr file, this is the pathname of the stdout/stderr file on the host $fs_stdout_host/$fs_stderr_host. $fs_stdin_tmp_path When delegated file staging is requested for the stdin file, this is the destination pathname of the stdin file on the execu- tion host. The prolog script must copy the stdin file from $fs_stdin_host:$fs_stdin_path to localhost:$fs_stdin_tmp_path to establish delegated file staging of the stdin file. $fs_stdout_tmp_path $fs_stderr_tmp_path When delegated file staging is requested for the stdout/stderr file, this is the source pathname of the stdout/stderr file on the execution host. The epilog script must copy the stdout file from localhost:$fs_stdout_tmp_path to $fs_stdout_host:$fs_std- out_path (the stderr file from localhost:$fs_stderr_tmp_path to $fs_stderr_host:$fs_stderr_path) to establish delegated file staging of the stdout/stderr file. $fs_stdin_file_staging $fs_stdout_file_staging $fs_stderr_file_staging When delegated file staging is requested for the stdin/std- out/stderr file, the flag is set to "1", otherwise it is set to "0" (see in delegated_file_staging how to enable delegated file staging). These three flags correspond to the DRMAA job template attribute 'drmaa_transfer_files' (see drmaa_attributes(3) ). The global configuration entry for this value may be overwritten by the execution host local configuration. Exit codes for the prolog attribute can be interpreted based on the following exit values: 0: Success 99: Reschedule job 100: Put job in error state Anything else: Put queue in error state epilog The executable path of a shell script that is started after execution of Univa Grid Engine jobs with the same environment setting as that for the Univa Grid Engine jobs that has just completed. An optional prefix "user@" specifies the user under which this procedure is to be started. The procedures standard output and the error output stream are written to the same file used also for the standard output and error output of each job. This procedure is intended as a means for the Univa Grid Engine administrator to automate the execution of general site specific tasks like the cleaning up of temporary file systems with the need for the same context information as the job. 
Each sge_execd(8) may use a private epilog script. Correspondingly, the execution host local configuration can be overwritten by the queue configuration (see queue_conf(5)). Changing epilog will take immediate effect. The default for epilog is the special value NONE, which prevents execution of an epilog script. The same special variables as for prolog can be used to constitute a command line.
Scripts whose execution duration would exceed 2 minutes will be terminated. This timeout can be adjusted by defining SCRIPT_TIMEOUT as execd_params.
The global configuration entry for this value may be overwritten by the execution host local configuration.
Exit codes for the epilog attribute can be interpreted based on the following exit values:
0: Success
99: Reschedule job
100: Put job in error state
Any other value <= 127: Put queue in error state, re-queue the job
Any value > 127: If RESCHEDULE_ON_KILLED_EPILOG is set to "true" or "1", the queue is put in error state and the job is re-queued. If this parameter is set to "false" or "0", the job simply finishes.

shell_start_mode
Note: Deprecated, may be removed in future release.
This parameter defines the mechanisms which are used to actually invoke the job scripts on the execution hosts. The following values are recognized:

unix_behavior
If a user starts a job shell script under UNIX interactively by invoking it just with the script name, the operating system's executable loader uses the information provided in a comment such as `#!/bin/csh' in the first line of the script to detect which command interpreter to start to interpret the script. This mechanism is used by Univa Grid Engine when starting jobs if unix_behavior is defined as shell_start_mode.

posix_compliant
POSIX does not consider first script line comments such as `#!/bin/csh' as significant. The POSIX standard for batch queuing systems (P1003.2d) therefore requires a compliant queuing system to ignore such lines but to use user specified or configured default command interpreters instead. Thus, if shell_start_mode is set to posix_compliant Univa Grid Engine will either use the command interpreter indicated by the -S option of the qsub(1) command or the shell parameter of the queue to be used (see queue_conf(5) for details).

script_from_stdin
Setting the shell_start_mode parameter either to posix_compliant or unix_behavior requires you to set the umask in use for sge_execd(8) such that every user has read access to the active_jobs directory in the spool directory of the corresponding execution daemon. In case you have prolog and epilog scripts configured, they also need to be readable by any user who may execute jobs.
If this violates your site's security policies you may want to set shell_start_mode to script_from_stdin. This will force Univa Grid Engine to open the job script as well as the epilog and prolog scripts for reading into STDIN as root (if sge_execd(8) was started as root) before changing to the job owner's user account. The script is then fed into the STDIN stream of the command interpreter indicated by the -S option of the qsub(1) command or the shell parameter of the queue to be used (see queue_conf(5) for details).
Thus setting shell_start_mode to script_from_stdin also implies posix_compliant behavior. Note, however, that feeding scripts into the STDIN stream of a command interpreter may cause trouble if commands like rsh(1) are invoked inside a job script as they also process the STDIN stream of the command interpreter.
These problems can usually be resolved by redirecting the STDIN chan- nel of those commands to come from /dev/null (e.g. rsh host date < /dev/null). Note also, that any command-line options associ- ated with the job are passed to the executing shell. The shell will only forward them to the job if they are not recognized as valid shell options. Changes to shell_start_mode will take immediate effect. The default for shell_start_mode is unix_behavior. This value is a global configuration parameter only. It cannot be over- written by the execution host local configuration. login_shells UNIX command interpreters like the Bourne-Shell (see sh(1)) or the C- Shell (see csh(1)) can be used by Univa Grid Engine to start job scripts. The command interpreters can either be started as login-shells (i.e. all system and user default resource files like .login or .pro- file will be executed when the command interpreter is started and the environment for the job will be set up as if the user has just logged in) or just for command execution (i.e. only shell specific resource files like .cshrc will be executed and a minimal default environment is set up by Univa Grid Engine - see qsub(1)). The parameter login_shells contains a comma separated list of the executable names of the command interpreters to be started as login-shells. Shells in this list are only started as login shells if the parameter shell_start_mode (see above) is set to posix_compliant. Changes to login_shells will take immediate effect. The default for login_shells is sh,bash,csh,tcsh,ksh. This value is a global configuration parameter only. It cannot be over- written by the execution host local configuration. min_uid min_uid places a lower bound on user IDs that may use the cluster. Users whose user ID (as returned by getpwnam(3)) is less than min_uid will not be allowed to run jobs on the cluster. Changes to min_uid will take immediate effect. The default for min_uid is 0. This value is a global configuration parameter only. It cannot be over- written by the execution host local configuration. min_gid This parameter sets the lower bound on group IDs that may use the clus- ter. Users whose default group ID (as returned by getpwnam(3)) is less than min_gid will not be allowed to run jobs on the cluster. Changes to min_gid will take immediate effect. The default for min_gid is 0. This value is a global configuration parameter only. It cannot be over- written by the execution host local configuration. user_lists The user_lists parameter contains a comma separated list of user access lists as described in access_list(5). Each user contained in at least one of the enlisted access lists has access to the cluster. If the user_lists parameter is set to NONE (the default) any user has access not explicitly excluded via the xuser_lists parameter described below. If a user is contained both in an access list enlisted in xuser_lists and user_lists the user is denied access to the cluster. Changes to user_lists will take immediate effect This value is a global configuration parameter only. It cannot be over- written by the execution host local configuration. xuser_lists The xuser_lists parameter contains a comma separated list of user access lists as described in access_list(5). Each user contained in at least one of the enlisted access lists is denied access to the cluster. If the xuser_lists parameter is set to NONE (the default) any user has access. 
If a user is contained both in an access list enlisted in xuser_lists and user_lists (see above) the user is denied access to the cluster. Changes to xuser_lists will take immediate effect This value is a global configuration parameter only. It cannot be over- written by the execution host local configuration. administrator_mail administrator_mail specifies a comma separated list of the electronic mail address(es) of the cluster administrator(s) to whom internally- generated problem reports are sent. The mail address format depends on your electronic mail system and how it is configured; consult your sys- tem's configuration guide for more information. Changing administrator_mail takes immediate effect. The default for administrator_mail is an empty mail list. This value is a global configuration parameter only. It cannot be over- written by the execution host local configuration. projects The projects list contains all projects which a job can be submitted to or pending jobs can be altered to. If the projects list is defined, only jobs which are submitted or altered to one of these projects are accepted by Univa Grid Engine, all other jobs are rejected. If the projects list is not defined (i.e. it is "none"), jobs are not rejected because of their project membership. Changing projects takes immediate effect. Changing projects doesn't affect pending or running jobs, except for altering them. The default for projects is none. This value is a global configuration parameter only. It cannot be over- written by the execution host local configuration. xprojects The xprojects list contains all projects which a job can not be submit- ted to or pending job can not be altered to. If the xprojects list is defined, all jobs that are submitted or altered to one of these projects are rejected by Univa Grid Engine, all other jobs are not rejected. If the xprojects list is not defined (i.e. it is "none"), jobs are not rejected because of the project membership. Changing xprojects takes immediate effect. Changing xprojects doesn't affect pending or running jobs, except for altering them. The default for xprojects is none. This value is a global configuration parameter only. It cannot be over- written by the execution host local configuration. load_report_time System load is reported periodically by the execution daemons to sge_qmaster(8). The parameter load_report_time defines the time inter- val between load reports. Each sge_execd(8) may use a different load report time. Changing load_report_time will take immediate effect. Note: Be careful when modifying load_report_time. Reporting load too frequently might block sge_qmaster(8) especially if the number of exe- cution hosts is large. Moreover, since the system load typically increases and decreases smoothly, frequent load reports hardly offer any benefit. The default for load_report_time is 40 seconds. The global configuration entry for this value may be overwritten by the execution host local configuration. gdi_request_limits This parameter can be used to define a maximum number of requests per second that sge_qmaster(1) will accept before it starts rejecting incoming requests. The value NONE, which is the default for this param- eter, means that all valid requests that will be received by the sge_qmaster(1) process will also be accepted, processed and answered. Incoming requests that are accepted and that can not immediately be answered will be stored in request queues till a thread is available to handle the request and send a response to the client. 
Instead of NONE a comma separated list of limit rules can be specified. A limit rule consists of a set of filters and a number that expresses how many requests per second are allowed for those requests that match the corresponding filters. There are filters for the request source (name of the command line client), request type (ADD, MOD, DEL, GET), object type that should be addressed by the request (e.g. JOB, CLUSTER_QUEUE, JOB_CLASS, ...), users that triggered the request (user name) and the hostname of the host where the request is coming from. For each part of such a filter expression it is allowed to specify "*" so that the corresponding part of that expression will match any incoming request. The full syntax for this parameter is as follows:

gdi_request_limits ::= "NONE" | limit_rule [ "," limit_rule ]* .

limit_rule ::= source ":" request_type ":" obj_type ":" user ":" hostname "=" max_requests .

source ::= "*" | "drmaa" | "qacct" | "qalter" | "qsub" | "qsh" | "qlogin" | "qrsh" | "qconf" | "qdel" | "qhost" | "qmod" | "qquota" | "qmon" | "qrdel" | "qrstat" | "qrsub" | "qselect" | "qstat" .

request_type ::= "*" | "ADD" | "MOD" | "DEL" | "GET" .

obj_type ::= "*" | "JOB" | "ADMIN_HOST" | "SUBMIT_HOST" | "EXEC_HOST" | "CLUSTER_QUEUE" | "CPLX_ENTRY" | "CONFIG" | "MANAGER" | "OPERATOR" | "PARALLEL_ENV" | "SCHED_CONFIG" | "USER" | "USER_SET" | "PROJECT" | "SHARETREE_NODE" | "CKPT_ENV" | "CALENDAR" | "HOST_GROUP" | "RESOURCE_QUOTA" | "ADVANCE_RESERVATION" | "RESOURCE_RESERVATION" | "JOB_CLASS" | "SESSION" | "CLUSTER" | "LICENSE_MANAGER" .

user ::= "*" | <username> .

hostname ::= "*" | <hostname> .

max_requests ::= <integer >= 1> .

If multiple limit rules are defined then all of them are taken into account, i.e. none of the maximum values defined in those rules is allowed to be exceeded. Requests that are not accepted will be rejected with an error message that shows the first limit rule that rejected the request. Limit rules will be tested in the order in which they appear.

Example: qsub:ADD:JOB:peter:*=400,qstat:GET:JOB:*:*=400,qstat:GET:JOB:*:poipu=10

The example above will limit the number of job submissions (done via qsub(1)) for the user named peter to a maximum of 400 submits per second. The second and third limit rules limit the number of qstat job-get requests for all users on all hosts to 400, and to 10 for such requests that are received from the host poipu. This means qstat(1) commands that will show job related information (like qstat -f, qstat -j, qstat -ext, ...) might get rejected if those limits get exceeded. Also commands might get rejected that do not show jobs directly but that require job information to generate the output (like qstat -gc, which shows the used job slots of queues).

gdi_request_limits replaces the functionality provided by gdi_multi_read_req. gdi_multi_read_req is deprecated since Univa Grid Engine 8.2.

reschedule_unknown
Determines whether jobs on hosts in unknown state are rescheduled and thus sent to other hosts. Hosts are registered as unknown if sge_qmaster(8) cannot establish contact to the sge_execd(8) on those hosts (see max_unheard). Likely reasons are a breakdown of the host or a breakdown of the network connection in between, but also sge_execd(8) may not be executing on such hosts.
In any case, Univa Grid Engine can reschedule jobs running on such hosts to another system. reschedule_unknown controls the time which Univa Grid Engine will wait before jobs are rescheduled after a host became unknown. The time format specification is hh:mm:ss.
If the special value 00:00:00 is set, then jobs will not be rescheduled from this host.
Rescheduling is only initiated for jobs which have activated the rerun flag (see the -r y option of qsub(1) and the rerun option of queue_conf(5)). Parallel jobs are only rescheduled if the host on which their master task executes is in unknown state. The behavior of reschedule_unknown for parallel jobs and for jobs without the rerun flag set can be adjusted using the qmaster_params settings ENABLE_RESCHEDULE_KILL and ENABLE_RESCHEDULE_SLAVE.
Checkpointing jobs will only be rescheduled when the when option of the corresponding checkpointing environment contains an appropriate flag (see checkpoint(5)). Interactive jobs (see qsh(1), qrsh(1), qtcsh(1)) are not rescheduled.
The default for reschedule_unknown is 00:00:00.
The global configuration entry for this value may be overwritten by the execution host local configuration.

max_unheard
If sge_qmaster(8) could not contact or was not contacted by the execution daemon of a host for max_unheard seconds, all queues residing on that particular host are set to status unknown. sge_qmaster(8), at least, should be contacted by the execution daemons in order to get the load reports. Thus, max_unheard should be greater than the load_report_time (see above).
Changing max_unheard takes immediate effect. The default for max_unheard is 5 minutes.
This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.

loglevel
This parameter specifies the level of detail that Univa Grid Engine components such as sge_qmaster(8) or sge_execd(8) use to produce informative, warning or error messages which are logged to the messages files in the master and execution daemon spool directories (see the description of the execd_spool_dir parameter above). The following message levels are available:

log_err All error events being recognized are logged.

log_warning All error events being recognized and all detected signs of potentially erroneous behavior are logged.

log_info All error events being recognized, all detected signs of potentially erroneous behavior and a variety of informative messages are logged.

Changing loglevel will take immediate effect. The default for loglevel is log_warning.
This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.

max_aj_instances
This parameter defines the maximum number of array tasks to be scheduled to run simultaneously per array job. An instance of an array task will be created within the master daemon when it gets a start order from the scheduler. The instance will be destroyed when the array task finishes. Thus the parameter provides control mainly over the memory consumption of array jobs in the master and scheduler daemon. It is most useful for very large clusters and very large array jobs. The default for this parameter is 2000. The value 0 will deactivate this limit and will allow the scheduler to start as many array job tasks as suitable resources are available in the cluster.
Changing max_aj_instances will take immediate effect.
This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.

max_aj_tasks
This parameter defines the maximum number of array job tasks within an array job. sge_qmaster(8) will reject all array job submissions which request more than max_aj_tasks array job tasks. The default for this parameter is 75000.
The value 0 will deactivate this limit. Changing max_aj_tasks will take immediate effect. This value is a global configuration parameter only. It cannot be over- written by the execution host local configuration. max_u_jobs The number of active (not finished) jobs which each Univa Grid Engine user can have in the system simultaneously is controlled by this param- eter. A value greater than 0 defines the limit. The default value 0 means "unlimited". If the max_u_jobs limit is exceeded by a job submis- sion then the submission command exits with exit status 25 and an appropriate error message. Changing max_u_jobs will take immediate effect. This value is a global configuration parameter only. It cannot be over- written by the execution host local configuration. max_jobs The number of active (not finished) jobs simultaneously allowed in Univa Grid Engine is controlled by this parameter. A value greater than 0 defines the limit. The default value 0 means "unlimited". If the max_jobs limit is exceeded by a job submission then the submission com- mand exits with exit status 25 and an appropriate error message. Changing max_jobs will take immediate effect. This value is a global configuration parameter only. It cannot be over- written by the execution host local configuration. max_advance_reservations The number of active (not finished) Advance Reservations simultaneously allowed in Univa Grid Engine is controlled by this parameter. A value greater than 0 defines the limit. The default value 0 means "unlim- ited". If the max_advance_reservations limit is exceeded by an Advance Reservation request then the submission command exits with exit status 25 and an appropriate error message. Changing max_advance_reservations will take immediate effect. This value is a global configuration parameter only. It cannot be over- written by the execution host local configuration. enforce_project If set to true, users are required to request a project whenever sub- mitting a job. See the -P option to qsub(1) for details. Changing enforce_project will take immediate effect. The default for enforce_project is false. This value is a global configuration parameter only. It cannot be over- written by the execution host local configuration. enforce_jc If set to true, users are required to specify a job class whenever sub- mitting a job. Default value for this parameter is false. Manager can define a default job class with the default_jc parameter of this con- figuration. This allows to define a fallback job class that will be automatically used if the user does not specify a job class. default_jc This parameter allows to specify a job class that will be used as default for each submitted job if the user itself does not request a certain job class. Default for this parameter is NONE. enforce_user If set to true, a user(5) must exist to allow for job submission. Jobs are rejected if no corresponding user exists. If set to auto, a user(5) object for the submitting user will automati- cally be created during job submission, if one does not already exist. The auto_user_oticket, auto_user_fshare, auto_user_default_project, and auto_user_delete_time configuration parameters will be used as default attributes of the new user(5) object. Changing enforce_user will take immediate effect. The default for enforce_user is auto. This value is a global configuration parameter only. It cannot be over- written by the execution host local configuration. 
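To make these global-only parameters more concrete, the following excerpt sketches how such entries might appear in the output of qconf -sconf. The values shown are the documented defaults (the time format for max_unheard is an assumption) and are illustrative only, not a recommendation.
--------------------------------
max_unheard                  00:05:00
loglevel                     log_warning
max_aj_instances             2000
max_aj_tasks                 75000
max_u_jobs                   0
max_jobs                     0
max_advance_reservations     0
enforce_project              false
enforce_user                 auto
--------------------------------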
auto_user_oticket The number of override tickets to assign to automatically created user(5) objects. User objects are created automatically if the enforce_user attribute is set to auto. Changing auto_user_oticket will affect any newly created user objects, but will not change user objects created in the past. This value is a global configuration parameter only. It cannot be over- written by the execution host local configuration. auto_user_fshare The number of functional shares to assign to automatically created user(5) objects. User objects are created automatically if the enforce_user attribute is set to auto. Changing auto_user_fshare will affect any newly created user objects, but will not change user objects created in the past. This value is a global configuration parameter only. It cannot be over- written by the execution host local configuration. auto_user_default_project The default project to assign to automatically created user(5) objects. User objects are created automatically if the enforce_user attribute is set to auto. Changing auto_user_default_project will affect any newly created user objects, but will not change user objects created in the past. This value is a global configuration parameter only. It cannot be over- written by the execution host local configuration. auto_user_delete_time The number of seconds of inactivity after which automatically created user(5) objects will be deleted. User objects are created automatically if the enforce_user attribute is set to auto. If the user has no active or pending jobs for the specified amount of time, the object will auto- matically be deleted. A value of 0 can be used to indicate that the automatically created user object is permanent and should not be auto- matically deleted. Changing auto_user_delete_time will affect the deletion time for all users with active jobs. This value is a global configuration parameter only. It cannot be over- written by the execution host local configuration. set_token_cmd Note: Deprecated, may be removed in future release. This parameter is only present if your Univa Grid Engine system is licensed to support AFS. Set_token_cmd points to a command which sets and extends AFS tokens for Univa Grid Engine jobs. In the standard Univa Grid Engine AFS distribu- tion, it is supplied as a script which expects two command line parame- ters. It reads the token from STDIN, extends the token's expiration time and sets the token: As a shell script this command will call the programs: - SetToken - forge which are provided by your distributor as source code. The script looks as follows: -------------------------------- #!/bin/sh # set_token_cmd forge -u $1 -t $2 | SetToken -------------------------------- Since it is necessary for forge to read the secret AFS server key, a site might wish to replace the set_token_cmd script by a command, which connects to a custom daemon at the AFS server. The token must be forged at the AFS server and returned to the local machine, where SetToken is executed. Changing set_token_cmd will take immediate effect. The default for set_token_cmd is none. The global configuration entry for this value may be overwritten by the execution host local configuration. pag_cmd Note: Deprecated, may be removed in future release. This parameter is only present if your Univa Grid Engine system is licensed to support AFS. The path to your pagsh is specified via this parameter. The sge_shep- herd(8) process and the job run in a pagsh. Please ask your AFS admin- istrator for details. 
Changing pag_cmd will take immediate effect. The default for pag_cmd is none.
The global configuration entry for this value may be overwritten by the execution host local configuration.

token_extend_time
Note: Deprecated, may be removed in future release.
This parameter is only present if your Univa Grid Engine system is licensed to support AFS.
The token_extend_time is the time period for which AFS tokens are periodically extended. Univa Grid Engine will call the token extension 30 minutes before the tokens expire until jobs have finished and the corresponding tokens are no longer required.
Changing token_extend_time will take immediate effect. The default for token_extend_time is 24:0:0, i.e. 24 hours.
The global configuration entry for this value may be overwritten by the execution host local configuration.

shepherd_cmd
Alternative path to the shepherd binary. Typically used to call the shepherd binary via a wrapper script or command.
Changing shepherd_cmd will take immediate effect. The default for shepherd_cmd is none.
The global configuration entry for this value may be overwritten by the execution host local configuration.

gid_range
The gid_range is a comma separated list of range expressions of the form n-m (n as well as m are integer numbers greater than 99), where a single number m is an abbreviation for m-m. These numbers are used in sge_execd(8) to identify processes belonging to the same job. Each sge_execd(8) may use a separate set of group ids for this purpose. All numbers in the group id range have to be unused supplementary group ids on the system where the sge_execd(8) is started.
Changing gid_range will take immediate effect. There is no default for gid_range. The administrator will have to assign a value for gid_range during installation of Univa Grid Engine.
The global configuration entry for this value may be overwritten by the execution host local configuration.

qmaster_params
A list of additional parameters can be passed to the Univa Grid Engine qmaster. The following values are recognized:

ALLOW_INCREASE_POSIX_PRIORITY
If this parameter is set then the POSIX priority of jobs might be increased by users up to level 0 for their own jobs even if they do not have the operator or manager role. In case of absence of this parameter users are only allowed to decrease the priority of their jobs whereas operators and managers might increase/decrease the priority of jobs independent of ownership.

ALLOW_REQUEST_CHANGE_FOR_ALL_USERS
If this parameter is set then all users are allowed to change the assigned resources of running jobs (see qalter -when NOW), which is also the default in case of absence of this parameter. It can be set to 0 to disallow the modification for all users that do not have the manager role. This parameter does not restrict resource modification of resource requests that will get active on reschedule (see qalter -when ON_RESCHEDULE).

ALLOW_JC_AS_VIOLATION
If this parameter is set then managers are allowed to change job attributes of jobs derived from a job class where the access specifier would normally not allow adjustment.

ALLOW_PREEMPT_OWN_JOBS
If this parameter is set then users are allowed to trigger manual preemption requests for their own jobs. By default only managers and operators are allowed to trigger manual preemption requests.

ENABLE_ENFORCE_MASTER_LIMIT
If this parameter is set then the s_rt and h_rt limits of a running job are tested and enforced by sge_qmaster(8) when the sge_execd(8) where the job is running is in unknown state.
After s_rt or h_rt limit of a job is expired then the master daemon will wait additional time defined by DURATION_OFFSET (see sched_conf(5)). If the execution daemon still cannot be con- tacted when this additional time is elapsed, then the master daemon will force the deletion of the job (see -f of qdel(1)). For jobs which will be deleted that way an accounting record will be created. As usage the record will contain the last reported online usage, when the execution daemon could contact qmaster. The failed state in the record will be set to 37 to indicate that the job was terminated by a limit enforcement of master daemon. After the restart of sge_qmaster(8) the limit enforcement will at first be triggered after the double of the biggest load_report_interval interval defined in sge_conf(5) has been elapsed. This will give the execution daemons enough time to reregister at master daemon. ENABLE_FORCED_QDEL_IF_UNKNOWN If this parameter is set then a deletion request for a job is automatically interpreted as a forced deletion request (see -f of qdel(1)) if the host, where the job is running is in unknown state. ENABLE_SUP_GRP_EVAL By default all UNIX group entries in access lists, the manager or operator list will only be evaluated against the primary UNIX group of users. If such group entries should also be evaluated against secondary groups then this paramter can be defined. ENABLE_FORCED_QDEL If this parameter is set, non-administrative users can force deletion of their own jobs via the -f option of qdel(1). With- out this parameter, forced deletion of jobs is only allowed by the Univa Grid Engine manager or operator. Note: Forced deletion for jobs is executed differently depending on whether users are Univa Grid Engine administrators or not. In case of administrative users, the jobs are removed from the internal database of Univa Grid Engine immediately. For regular users, the equivalent of a normal qdel(1) is executed first, and deletion is forced only if the normal cancellation was unsuc- cessful. FORBID_RESCHEDULE If this parameter is set, re-queuing of jobs cannot be initiated by the job script which is under control of the user. Without this parameter jobs returning the value 99 are rescheduled. This can be used to cause the job to be restarted at a different machine, for instance if there are not enough resources on the current one. FORBID_APPERROR If this parameter is set, the application cannot set itself to error state. Without this parameter jobs returning the value 100 are set to error state (and therefore can be manually rescheduled by clearing the error state). This can be used to set the job to error state when a starting condition of the application is not fulfilled before the application itself has been started, or when a clean up procedure (e.g. in the epilog) decides that it is necessary to run the job again, by returning 100 in the prolog, pe_start, job script, pe_stop or epilog script. DISABLE_AUTO_RESCHEDULING Note: Deprecated, may be removed in future release. If set to "true" or "1", the reschedule_unknown parameter is not taken into account. ENABLE_RESCHEDULE_KILL If set to "true" or "1", the reschedule_unknown parameter affects also jobs which have the rerun flag not activated (see the -r y option of qsub(1) and the rerun option of queue_conf(5)), but they are just finished as they can't be rescheduled. 
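As an illustration of the exit code conventions that FORBID_RESCHEDULE and FORBID_APPERROR (described above) act upon, the following hedged job script sketch requests a reschedule with exit code 99 and an error state with exit code 100. The /scratch path and run_simulation command are placeholders, not part of Univa Grid Engine.
--------------------------------
#!/bin/sh
# placeholder check for a job starting condition
if [ ! -d "/scratch/$JOB_ID" ]; then
    # ask Univa Grid Engine to reschedule the job
    # (ignored if FORBID_RESCHEDULE is set in qmaster_params)
    exit 99
fi
# placeholder workload; on failure put the job into error state
# (suppressed if FORBID_APPERROR is set in qmaster_params)
run_simulation || exit 100
exit 0
--------------------------------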
ENABLE_RESCHEDULE_SLAVE If set to "true" or "1" Univa Grid Engine triggers job rescheduling also when the host where the slave tasks of a par- allel job executes is in unknown state, if the resched- ule_unknown parameter is activated. MAX_DYN_EC Sets the max number of dynamic event clients (as used by qsub -sync y and by Univa Grid Engine DRMAA API library sessions). The default is set to 1000. The number of dynamic event clients should not be bigger than half of the number of file descriptors the system has. The number of file descriptors are shared among the connections to all exec hosts, all event clients, and file handles that the qmaster needs. MONITOR_TIME Specifies the time interval when the monitoring information should be printed. The monitoring is disabled by default and can be enabled by specifying an interval. The monitoring is per thread and is written to the messages file or displayed by the "qping -f" command line tool. Example: MONITOR_TIME=0:0:10 gen- erates and prints the monitoring information approximately every 10 seconds. The specified time is a guideline only and not a fixed interval. The interval that is actually used is printed. In this example, the interval could be anything between 9 sec- onds and 20 seconds. MONITOR_REQUEST_QUEUES If set to "true" or "1" then addition information about the qmaster internal request queues will be provided in the monitor- ing output of qping. Find more information in sge_diagnos- tics(1). LOG_MONITOR_MESSAGE Monitoring information is logged into the messages files by default. This information can be accessed via by qping(1). If monitoring is always enabled, the messages files can become quite large. This switch disables logging into the messages files, making qping -f the only source of monitoring data. PROF_SIGNAL Profiling provides the user with the possibility to get system measurements. This can be useful for debugging or optimization of the system. The profiling output will be done within the mes- sages file. Enables the profiling for qmaster signal thread. (e.g. PROF_SIGNAL=true) PROF_WORKER Enables the profiling for qmaster worker threads. (e.g. PROF_WORKER=true) PROF_LISTENER Enables the profiling for qmaster listener threads. (e.g. PROF_LISTENER=true) PROF_DELIVER Enables the profiling for qmaster event deliver thread. (e.g. PROF_DELIVER=true) PROF_TEVENT Enables the profiling for qmaster timed event thread. (e.g. PROF_TEVENT=true) PROF_COMMLIB_TIME Enables the profiling for communication library. The value spec- ifies the log interval for commlib profiling into the messages file. The logging shows the number of connected clients, the number of buffered messages at commlib layer (incoming/outgo- ing), the memory needed within the commlib layer for the buffered messages (incoming/outgoing) and the number of cached resolved hostnames. LOG_INCOMING_MESSAGE_SIZE This parameter is used to define if profiling information about incoming requests is logged into the messages file. The speci- fied value will be used as threshold. All incoming messages needing more memory size than specified will be logged. Default value for this parameter is 0 which means the feature is turned off. (e.g. LOG_INCOMING_MESSAGE_SIZE=20M) LOG_OUTGOING_MESSAGE_SIZE This parameter is used to define if profiling information about outgoing requests is logged into the messages file. The speci- fied value will be used as threshold. All outgoing messages needing more memory size than specified will be logged. 
Default value for this parameter is 0 which means the feature is turned off. (e.g. LOG_OUTGOING_MESSAGE_SIZE=20M) MAX_INCOMING_MESSAGE_SIZE This parameter is used to define a message size limit for accepting incoming requests. All incoming client requests using more memory than specified are rejected. The client will get an error message for the request. The value cannot be set below 1M. Values < 1M will be interpreted as 0 (=turned off). All rejected client requests are logged into the messages file. Default value for this parameter is 0 which means the feature is turned off. (e.g. MAX_INCOMING_MESSAGE_SIZE=1G) MAX_OUTGOING_MESSAGE_SIZE This parameter is used to define a message size limit for creat- ing client responses like qstat -j "*". All client requests that result in creating a response message exceeding the speci- fied memory size will get an error message. The value cannot be set below 1M. Values < 1M will be interpreted as 0 (=turned off). All rejected client requests are logged into the messages file. Default value for this parameter is 0 which means the feature is turned off. (e.g. MAX_OUTGOING_MESSAGE_SIZE=1G) STREE_SPOOL_INTERVAL Sets the time interval for spooling the sharetree usage. The default is set to 00:04:00. The setting accepts colon-separated string or seconds. There is no setting to turn the sharetree spooling off. (e.g. STREE_SPOOL_INTERVAL=00:02:00) MAX_JOB_DELETION_TIME Sets the value of how long the qmaster will spend deleting jobs. After this time, the qmaster will continue with other tasks and schedule the deletion of remaining jobs at a later time. The default value is 3 seconds, and will be used if no value is entered. The range of valid values is > 0 and <= 5. (e.g. MAX_JOB_DELETION_TIME=1) MAX_MASTER_TASK_WAIT_TIME Sets the value of how long the qmaster will wait for getting all slave task reports for parallel jobs when the master task already has been finished. The value is the waiting time in sec- onds. The range of valid values is >= 20 and <= 720. The default for this parameter is to wait 20 seconds. (e.g. MAX_MAS- TER_TASK_WAIT_TIME=30) ENABLE_JOB_FAILURE_IF_SLAVE_TASK_MISSING If this parameter is set to true a missing slave task report of a tightly integrated parallel job will set the failed state of the master task to 101. If the master task is already in failure state the value of the master task will not be overwritten. (e.g. ENABLE_JOB_FAILURE_IF_SLAVE_TASK_MISSING=true) ENABLE_JOB_FAILURE_ON_SLAVE_TASK_ERROR If this parameter is set to true a slave task which is reporting a failure or reports a non-zero exit status will automatically set the failed state for the master task of the parallel job. The first slave job which is reporting a non-zero exit status will set the master task failure field in the accounting file to the value 102. If a slave task is reporting some general fail- ure the master task failure state would be set to 103. If the master task is already in failure state the value will not be overwritten. This option is only valid for tight integration jobs. (e.g. ENABLE_JOB_FAILURE_ON_SLAVE_TASK_ERROR=true) LOST_JOB_TIMEOUT If this timeout parameter is set the qmaster worker threads will monitor the jobs reported by the execution daemons. If a task of a job that was started on an execution node is not reported for longer than the defined timeout the job is logged in the qmaster messages file. Job loss is e.g. possible if an execution daemon cannot read one ore more files in his spooling directory at startup. 
This can happen when the spooling directory runs out of disc space or on any other possible file problems. Such jobs typically are shown as running and occupy a slot on the execu- tion daemon indefinitely (see also "enable_lost_job_resched- ule"). If an execution daemon is not online or came online shortly the timeout will be extended until all preconditions are fulfilled. The minimum timeout depends also on the used "max_unheard" and "load_report_time" settings. If the timeout is set below the allowed minimum timeout - the calculated minimum timeout is used. The resulting timeout will be logged in the qmaster messages file. If the parameter is changed the job time- outs will be reinitialized. If the timeout is set to 00:00:00 the lost job detection is turned off. This is also the default setting for this parameter. The timeout is specified in seconds. ENABLE_LOST_JOB_RESCHEDULE This parameter is only valid if there is a "lost_job_timeout" parameter configured. If it is enabled the jobs for which the timeout was detected are set to error and will show up in the pending job list again. The accounting record will contain the "failed" state 22. Such jobs will not occupy a slot on the exe- cution node and the slots are free again for other jobs. The administrator might remove the error state and let the job run again or just delete them after solving the reported problem. The default for "enable_lost_job_reschedule" is false. GDI_TIMEOUT Sets how long the communication will wait for gdi send/receive operations. The default value is set to 60 seconds. After this time, the communication library will retry, if "gdi_retries" is configured, receiving the gdi request. In case of not configured "gdi_retries" the communication will return with a "gdi receive failure" (e.g. gdi_timeout=120 will set the timeout time to 120 sec) Configuring no gdi_timeout value, the value defaults to 60 sec. GDI_REQUEST_SESSION_TIMEOUT Default duration of a session as defined for "time" in sge_types(1). When this value is not defined then 00:15:00 (= 900 seconds) will be used by default duration for new sessions. Changing this value will not change the duration of existing sessions. MAX_READER_DELAY If defined then the value for this parameter has to be an inte- ger value in the range from 0 to 5000. It defines the number of milliseconds before the event processing thread of the read- only-thread thread-pool will enforce the update of the read-only data store. 0 means that the event processing thread will interrupt all other read-only threads as soon as possible (regularly when cur- rently processed requests are finished) so that it can update the read-only thread data store immediately. With values >0 the event processing thread also tries to process immediately but if there are pending read-only-requests then handling of this requests will be preferred as long as the defined time value did not elapse. When this value is not specified then the used reader delay value is 1000 which is also the recommended value for up to 8 read-only threads. If more read-only threads are started then it is recommended to increase the delay (8-16 threads => 2500; 16-32 threads => 3750; 32-64 threads 5000). Please note that the delay is the same that you might see for command line clients that use a session (see session_conf(5)) ENFORCE_GDI_WORKER GDI (Grid Engine Data Interface) is the name of an interface that command line clients use to communicate with qmaster. 
When enforce_gdi_worker is set to 1 then all GDI requests (read- only and read-write) will be handled by worker threads in sge_qmaster(1) even if reader threads are activated. Request will then be handled in FCFS manner like it was done in prior versions of Univa Grid Engine. Read-only threads can also be disabled by setting the bootstrap parameter reader to 0. This does not only disable reader threads but also disables the creation of the read-only thread pool in sge_qmaster(1). Please note that changing the bootstrap file requires to restart sge_qmaster(1) before the changes get active. ENFORCE_GDI_READER_FOR_EXECD If set to "true" or "1" also incoming get requests from the exe- cution daemons are handled by the reader threads if such threads are configured in the bootstrap configuration file. Default value for this parameter is true. GDI_RETRIES Sets how often the gdi receive call will be repeated until the gdi receive error appears. The default is set to 1. In this case the call will be done 1 time with no retry. Setting the value to -1 the call will be done permanently. In combination with gdi_timeout parameter it is possible to configure a system with eg. slow NFS, to make sure that all jobs will be submitted. (e.g. gdi_retries=4) CL_PING Turns on/off a communication library ping. This parameter will create additional debug output. This output shows information about the error messages which are returned by communication and it will give information about the application status of the qmaster. eg, if it's unclear what's the reason for gdi timeouts, this may show you some useful messages. The default value is false (off) (e.g cl_ping=false) SCHEDULER_TIMEOUT Setting this parameter allows the scheduler GDI event acknowl- edge timeout to be manually configured to a specific value. Cur- rently the default value is 10 minutes with the default sched- uler configuration and limited between 600 and 1200 seconds. Value is limited only in case of default value. The default value depends on the current scheduler configuration. The SCHED- ULER_TIMEOUT value is specified in seconds. JSV_TIMEOUT This parameter measures the response time of the server JSV. In the event that the response time of the JSV is longer than the timeout value specified, this will cause the JSV to be re- started. The default value for the timeout is 10 seconds and if modified, must be greater than 0. If the timeout has been reach, the JSV will only try to re-start once, if the timeout is reached again an error will occur. JSV_THRESHOLD The threshold of a JSV is measured as the time it takes to per- form a server job verification. If this value is greater than the user defined value, it will cause logging to appear in the qmaster messages file at the INFO level. By setting this value to 0, all jobs will be logged in the qmaster messages file. This value is specified in milliseconds and has a default value of 5000. GDI_THRESHOLD When processing a gdi request (e.g. submitting a job or querying job information via qstat) takes too long a warning is printed into the qmaster messages file. The time being considered too long can be defined by setting gdi_threshold in seconds. Default is a threshold of 60 seconds. GDI_MULTI_READ_REQ This parameter is deprecated since Univa Grid Engine 8.2. Func- tionality will be removed with the next minor or major release. Instead gdi_request_limits can be used (see above). This parameters defines the maximum number of multi-gdi-get requests that are accepted by qmaster per second. 
Multi-gdi-get requests are sent by command line clients like qstat, qhost or qmon to receive status information about the running cluster. When a command line client recognizes that it runs into this limit then it will retry to receive this information for 60 seconds. If within this time qmaster does not accept the request then the client will abort. The default value for this parameter is 0 to disable this limit. In clusters where users execute qstat -f or qhost regularly the limit might be set to a positive integer value so that these commands will not have a negative impact on the cluster throughput. This limit has no influence on single-gdi-get requests that are sent by administration commands invoked via qconf.

OLD_RESCHEDULE_BEHAVIOR
Beginning with version 8.0.0 of Univa Grid Engine the scheduling behavior changed for jobs that are rescheduled by users. Rescheduled jobs will not be put at the beginning of the pending job list anymore. The submit time of those jobs is set to the end time of the previous run. Due to that those rescheduled jobs will be appended at the end of the pending job list as if a new job had been submitted. To achieve the old behavior the parameter OLD_RESCHEDULE_BEHAVIOR has to be set. Please note that this parameter is declared as deprecated. So it might be removed with the next minor release.

OLD_RESCHEDULE_BEHAVIOR_ARRAY_JOB
Beginning with version 8.0.0 of Univa Grid Engine the scheduling behavior changed for array job tasks that are rescheduled by users. As soon as an array job task gets scheduled all remaining pending tasks of that job will be put at the end of the pending job list. To achieve the old scheduling behavior the parameter OLD_RESCHEDULE_BEHAVIOR_ARRAY_JOB has to be set. Please note that this parameter is declared as deprecated. So it might be removed with the next minor release.

ENABLE_SUBMIT_LIB_PATH
Beginning with version 8.0.1p3 of Univa Grid Engine environment variables like LD_PRELOAD, LD_LIBRARY_PATH and similar variables by default may no longer be set via submit option -v or -V. Setting these variables could be misused to execute malicious code from user jobs, if the execution environment contained methods (e.g. prolog) to be executed as the root user, or if the old interactive job support (e.g. via ssh) was configured. Should it be necessary to allow setting environment variables like LD_LIBRARY_PATH (except LD_PRELOAD, see ENABLE_SUBMIT_LD_PRELOAD) via submit option -v or -V, this can be enabled again by setting ENABLE_SUBMIT_LIB_PATH to TRUE.
In general the correct job environment should be set up in the job script or in a prolog, making the use of the -v or -V option for this purpose unnecessary.

ENABLE_SUBMIT_LD_PRELOAD
Setting this variable could be misused to execute malicious code from user jobs, if the execution environment contained methods (e.g. prolog) to be executed as the root user, or if the old interactive job support (e.g. via ssh) was configured. Should it be necessary to allow setting LD_PRELOAD via submit option -v or -V, this can be enabled again by setting ENABLE_SUBMIT_LD_PRELOAD to TRUE.
In general the correct job environment should be set up in the job script or in a prolog, making the use of the -v or -V option for this purpose unnecessary. See also ENABLE_SUBMIT_LIB_PATH for more information.

ALLOW_EMPTY_AFS_TOKEN
This parameter is considered only if Univa Grid Engine is installed with AFS support. If this parameter is set to TRUE the AFS token generation can be done with the set_token_cmd only.
The configured script can be used to completely generate the token at job execution time. The default method of generating the token by setting up the script $SGE_ROOT/util/get_token_cmd is still active with this setting, but it will not result in an error if the get_token_cmd script is not available. If this parameter is set to TRUE, it is suggested to move the get_token_cmd script away in order to get better submit performance.

ENABLE_REDUCE_MEM_FREE
When the mem_free complex is configured as a consumable, this parameter allows users to reduce the requested amount of mem_free during job run-time. When the parameter is set, the user is able to reduce the requested amount of memory by performing a qalter on the previously requested mem_free consumable. If this parameter is disabled or not set, mem_free (when configured as a consumable) cannot be changed for a running job.

MAX_JOB_ID
This parameter can be used for setting the maximum job id used by Univa Grid Engine. Job ids are allocated from id 1 to the maximum set. Setting MAX_JOB_ID to 0 disables job submission. The default maximum job id is 4294967295 (the maximum 32-bit number). This parameter also has an effect on the advance reservation ids.

MIN_PENDING_ENROLLED_TASKS
Ticket calculation for the sharetree, functional and override policies is done per job and for already existing tasks of array jobs. Already existing array tasks are running array tasks (getting running tickets) and array tasks which have been running but are pending again, e.g. due to rescheduling (getting pending tickets). If different pending tickets shall be computed for the tasks of an array job, it is necessary to create (enroll) pending array tasks. This can be controlled via the MIN_PENDING_ENROLLED_TASKS parameter. The default setting is 0: no pending array tasks will be enrolled for ticket calculation, the tasks are not created before they are scheduled, pending tickets are calculated for the whole job, and all tasks of an array job will get the same amount of tickets. Setting it to a positive number triggers the creation of this number of pending array tasks per job. When it is set to -1, all tasks of an array job will get enrolled.

SGE_DEBUG_LEVEL
With the environment variable SGE_DEBUG_LEVEL, debug output of sge_qmaster can be enabled when sge_qmaster is running non-daemonized. The qmaster_params SGE_DEBUG_LEVEL serves the same purpose but can be switched on and off during runtime. See sge_diagnostics(1) for details about SGE_DEBUG_LEVEL. Please note that the delimiter between multiple levels is : (colon) for the qmaster_params, so the environment setting SGE_DEBUG_LEVEL="3 0 0 0 0 0 0 0" translates to the qmaster_params setting SGE_DEBUG_LEVEL=3:0:0:0:0:0:0:0. Debug output by default goes to stderr, which means that with a daemonized sge_qmaster it would get lost. Please use the qmaster_params SGE_DEBUG_TARGET to redirect debug output to a file.

SGE_DEBUG_TARGET
With the SGE_DEBUG_TARGET parameter, debug output of sge_qmaster can be redirected into a file, to stdout or to stderr. The default is stderr if the parameter is not set or if a given file cannot be opened.

AR_RESERVE_AVAILABLE_ONLY
When this parameter is set to 1 or true, advance reservations submitted via qrsub will only be scheduled to currently available resources, i.e. not to queue instances that are disabled, suspended or in error state. This is a cluster wide setting which, when enabled, overrides the AR specific setting done via the qrsub option -rao, see also qrsub(1).
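Purely for illustration, several of the qmaster_params entries described above can be combined into a single setting; the values here are arbitrary examples, and the comma-separated NAME=VALUE list form is an assumption about the usual notation, not a recommendation for any particular cluster:

   qmaster_params   MAX_JOB_ID=9999999,ENABLE_REDUCE_MEM_FREE=1,AR_RESERVE_AVAILABLE_ONLY=true

The currently active form can be reviewed with qconf -sconf before changing it via qconf -mconf.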
MAX_TCON_TASKS
This parameter can be used to disable (value 0) concurrent array jobs or to limit the maximum size of concurrent array jobs. Submission of concurrent array jobs will be rejected if their size (number of array tasks) exceeds the value of MAX_TCON_TASKS. See also the documentation of the -tcon submit option in sge_submit(1).

MAX_AR_CAL_DEPTH
Can be used for increasing or decreasing the maximum allowable calendar depth for Standing Reservation requests using qrsub -cal_depth. Per default the limit is set to 8.

MAX_AR_CAL_JMP
Can be used for increasing or decreasing the maximum allowable number of un-allocated (skippable) Standing Reservation instances. Per default the maximum is set to 8. The default request for a Standing Reservation is 0.

DISABLE_NAME_SERVICE_LOOKUP_CACHE
This parameter can be used to disable (value 0) caching of name service lookup calls. The default setting is enabled (value 1). Switching off the cache might decrease the performance significantly.

NAME_SERVICE_LOOKUP_CACHE_ENTRY_LIFE_TIME
This parameter can be used to define when a cached entry in the name service lookup cache is removed. The default setting is zero (value 0). The default setting will auto-adjust the timeout for compatibility reasons to 600 seconds. The value cannot be set > 86400 (1 day). Changing this parameter might have a significant performance influence.

NAME_SERVICE_LOOKUP_CACHE_ENTRY_UPDATE_TIME
This parameter can be used to define when a cached entry in the name service lookup cache should get re-resolved. The default setting is zero (value 0). The default setting will auto-adjust the timeout for compatibility reasons to 120 seconds. The value cannot be set > 1800 (30 min). Changing this parameter might have a significant performance influence.

NAME_SERVICE_LOOKUP_CACHE_ENTRY_RERESOLVE_TIME
This parameter can be used to define when a cached entry is re-resolved that was not resolvable (the name service returned an error) when the host was added or at the last cache entry update. The default setting is zero (value 0). The default setting will auto-adjust the timeout for compatibility reasons to 60 seconds. The value cannot be set > 600 (10 min). Changing this parameter might have a significant performance influence.

LOG_JOB_VERIFICATION_TIME
This parameter can be used to enable profiling logging in the qmaster messages file for the job verification done with the submission option -w, see also submit(1). When the parameter is not set or is set to a negative value, no profiling will be done. When the parameter is set to 0, profiling will be done and the resulting verification time will always be logged as an INFO message. When the parameter is set to a value (threshold) > 0, profiling will be done; if the verification time exceeds the given threshold a WARNING message will be logged.

LOG_REQUEST_PROCESSING_TIME
This parameter can be used to enable profiling logging in the qmaster messages file for the processing of requests by worker, reader and event master threads. When the parameter is not set or is set to a negative value, no profiling will be done. When the parameter is set to 0, profiling will be done and the resulting processing time will always be logged as an INFO message. When the parameter is set to a value (threshold) > 0, profiling will be done; if the processing time exceeds the given threshold a WARNING message will be logged.

LOG_SPOOLING_TIME
This parameter can be used to enable profiling logging in the qmaster messages file for spooling operations.
When the parameter is not set or is set to a negative value, no profiling will be done. When the parameter is set to 0, profiling will be done and the resulting spooling time will always be logged as an INFO message. When the parameter is set to a value (threshold) > 0, profiling will be done; if the spooling time exceeds the given threshold a WARNING message will be logged.

Changing qmaster_params will take immediate effect, except for gdi_timeout, gdi_retries and cl_ping, which take effect only for new connections. The default for qmaster_params is none. This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.

execd_params
This is used for passing additional parameters to the Univa Grid Engine execution daemon. The following values are recognized:

ENABLE_DIR_SERVICE_TIMEOUT
Enables a timeout that is used for operations that require a connection to a directory service (like NIS, LDAP, Active Directory, ...). Such operations are triggered by the execd and shepherd to retrieve the user/group specific information needed to start the corresponding jobs. Valid timeout values are in the range between 1 and 10 seconds. The built-in default is 1 second and is used if this parameter is not defined.

IGNORE_NGROUPS_MAX_LIMIT
If a user is assigned to NGROUPS_MAX-1 supplementary groups, so that Univa Grid Engine is not able to add an additional one for job tracking, then the job will go into error state when it is started. Administrators who want to prevent the system from doing so can set this parameter. In this case the NGROUPS_MAX limit is ignored and the additional group (see gid_range) is not set. As a result, no online usage will be available for those jobs. Also the parameter ENABLE_ADDGRP_KILL will have no effect. Please note that it is not recommended to use this parameter. Instead the group membership of the submit user should be reduced.

KEEP_ACTIVE
If set to ERROR, the spool directory of the job (maintained by sge_shepherd(8)), the job script, a file which includes all job related messages from the execution daemon, as well as a list of all files located in the job's temp directory will be sent to sge_qmaster(8) if the job had an exit status != 0 or if the job failed (see accounting(5)). If set to ALWAYS, the execution daemon will send the spool directory as well as the debugging files for every job. These files can be found at $SGE_ROOT/$SGE_CELL/faulty_jobs. If set to true, the execution daemon will not remove the spool directory maintained by sge_shepherd(8) for a job (this value should only be set for debugging purposes).

KEEP_ACTIVE_SIZE
If KEEP_ACTIVE is set to ERROR or ALWAYS, the execution daemon will transfer files to sge_qmaster(8). As big files might lead to high memory consumption in sge_qmaster(8), files bigger than KEEP_ACTIVE_SIZE (in bytes) will not be sent. If not set, the file size is limited to 20 MB.

KEEP_OPEN_FDS
As part of the start process for a job, sge_shepherd(8) will close all open file descriptors. If this should not be done for one or more file descriptors, this variable can be set to one number or to a range of numbers (like 4-9). The corresponding file descriptors will then not be closed. Please note that certain implementations of services (like Active Directory) require this functionality so that the processes that are part of the job can use the underlying library functionality.
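As a sketch only (the values are illustrative and the comma-separated NAME=VALUE list form is assumed), a host-local configuration might combine the debugging-related entries above as follows:

   execd_params   KEEP_ACTIVE=ERROR,KEEP_ACTIVE_SIZE=10485760,KEEP_OPEN_FDS=4-9

Here 10485760 bytes corresponds to 10 MB; whether such values are appropriate depends entirely on the site.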
MAX_IJS_CLIENT_WAIT_TIME
This parameter defines the waiting time for builtin interactive job support jobs to flush job output at job end and deliver the exit state to the connected qrsh or qlogin client. Once an interactive job has finished, sge_shepherd(8) will wait for an acknowledgment from the connected qrsh client. The already finished job will be shown as running during this time. If the qrsh client component is suspended for some reason or the network has an outage, it might not always be useful to wait until the data is acknowledged. If set to INFINITY, sge_shepherd(8) will only finish on connection errors. If specified as a time value in the format HH:MM:SS, all values >= 00:00:01 are used as the timeout value. The default value for this parameter is 1 minute (00:01:00). Once the timeout occurs, the job will be reported as finished and the final exit state of the job is available in the accounting file. The MAX_IJS_CLIENT_WAIT_TIME parameter has no influence on the suspend state of the qrsh client. A suspended qrsh client which was used to submit the job will stay suspended until the user unsuspends the qrsh. NOTE: The job control feature of the shell where the qrsh job is submitted may be responsible for suspending the qrsh when it tries to read from stdin or tries to write to stdout if qrsh is started in the background. This can be bypassed using the qrsh -bgio parameter when submitting the qrsh job as a background job (see the -bgio option in the submit(1) man page for more information).

PTF_MIN_PRIORITY, PTF_MAX_PRIORITY
The maximum/minimum priority which Univa Grid Engine will assign to a job. Typically this is a negative/positive value in the range of -20 (maximum) to 19 (minimum) for systems which allow setting of priorities with the nice(2) system call. Other systems may provide different ranges. The default priority range (which varies from system to system) is installed either by removing the parameters or by setting a value of -999. See the "messages" file of the execution daemon for the predefined default value on your hosts. The values are logged during the startup of the execution daemon.

PROF_EXECD
Enables profiling for the execution daemon (e.g. PROF_EXECD=true).

SCRIPT_TIMEOUT
This parameter defines the timeout value for scripts that are executed by sge_shepherd(8) (e.g. the prolog/epilog of a job). Scripts whose execution duration would exceed the configured timeout value will be terminated by sge_shepherd(8) automatically. The default for this parameter is 2 minutes (00:02:00). It can be set to any value greater than 0.

NOTIFY_KILL
This parameter allows you to change the notification signal for the signal SIGKILL (see the -notify option of qsub(1)). The parameter either accepts signal names (use the -l option of kill(1)) or the special value none. If set to none, no notification signal will be sent. If it is set to TERM, for instance, or another signal name, then this signal will be sent as the notification signal.

NOTIFY_SUSP
With this parameter it is possible to modify the notification signal for the signal SIGSTOP (see the -notify parameter of qsub(1)). The parameter either accepts signal names (use the -l option of kill(1)) or the special value none. If set to none, no notification signal will be sent. If it is set to TSTP, for instance, or another signal name, then this signal will be sent as the notification signal.

USE_QSUB_GID
If this parameter is set to true, the primary group id active when a job was submitted will become the primary group id for job execution.
If the parameter is not set, the primary group id as defined for the job owner in the execution host's passwd(5) file is used. The feature is only available for jobs submitted via qsub(1), qrsh(1), qmake(1) and qtcsh(1). Also, it only works for qrsh(1) jobs (and thus also for qtcsh(1) and qmake(1)) if the rsh and rshd components provided with Univa Grid Engine are used (i.e., the rsh_daemon and rsh_command parameters may not be changed from the default).

S_DESCRIPTORS, H_DESCRIPTORS, S_MAXPROC, H_MAXPROC, S_MEMORYLOCKED, H_MEMORYLOCKED, S_LOCKS, H_LOCKS
These specify soft and hard resource limits as implemented by the setrlimit(2) system call. See this manual page on your system for more information. These parameters complete the list of limits set by the RESOURCE LIMITS parameter of the queue configuration as described in queue_conf(5). Unlike the resource limits in the queue configuration, these resource limits are set for every job on this execution host. If a value is not specified, the resource limit is inherited from the execution daemon process. Because this would lead to unpredictable results if only one limit of a resource is set (soft or hard), the corresponding other limit is set to the same value. S_DESCRIPTORS and H_DESCRIPTORS specify a value one greater than the maximum file descriptor number that can be opened by any process of a job. S_MAXPROC and H_MAXPROC specify the maximum number of processes that can be created by the job user on this execution host. S_MEMORYLOCKED and H_MEMORYLOCKED specify the maximum number of bytes of virtual memory that may be locked into RAM. The value type is memory_specifier as described in the sge_types(1) manual page. S_LOCKS and H_LOCKS specify the maximum number of file locks any process of a job may establish. All of these values can be specified using the multiplier letters k, K, m, M, g and G, see sge_types(1) for details. For all of these values, the keyword "INFINITY" (which means RLIM_INFINITY as described in the setrlimit(2) manual page) can be used to set the resource limit to "unlimited".

INHERIT_ENV
This parameter indicates whether the shepherd should allow the environment inherited by the execution daemon from the shell that started it to be inherited by the job it is starting. When true, any environment variable that is set in the shell which starts the execution daemon at the time the execution daemon is started will be set in the environment of any jobs run by that execution daemon, unless the environment variable is explicitly overridden, such as PATH or LOGNAME. If set to false, each job starts with only the environment variables that are explicitly passed on by the execution daemon, such as PATH and LOGNAME. The default value is true.

SET_LIB_PATH
This parameter tells the execution daemon whether to add the Univa Grid Engine shared library directory to the library path of executed jobs. If set to true, and INHERIT_ENV is also set to true, the Univa Grid Engine shared library directory will be prepended to the library path which is inherited from the shell which started the execution daemon. If INHERIT_ENV is set to false, the library path will contain only the Univa Grid Engine shared library directory. If set to false, and INHERIT_ENV is set to true, the library path exported to the job will be the one inherited from the shell which started the execution daemon. If INHERIT_ENV is also set to false, the library path will be empty.
After the execution daemon has set the library path, it may be further altered by the shell in which the job is executed, or by the job script itself. The default value for SET_LIB_PATH is false.

ENABLE_ADDGRP_KILL
If this parameter is set, Univa Grid Engine uses the supplementary group ids (see gid_range) to identify all processes which are to be terminated when a job is deleted, or when sge_shepherd(8) cleans up after job termination.

SUSPEND_PE_TASKS
With this parameter set to TRUE, tasks of tightly integrated jobs get suspended and unsuspended when the job gets suspended or unsuspended. Some MPI implementations are known to fail when tasks get suspended; if you are running such jobs, set SUSPEND_PE_TASKS to FALSE and handle suspension/unsuspension through a suspend_method and resume_method, see queue_conf(5).

PDC_INTERVAL
This parameter defines how often the PDC (Portable Data Collector) is executed by the execution daemon. The PDC is responsible for enforcing the resource limits s_cpu, h_cpu, s_vmem and h_vmem (see queue_conf(5)) and for job usage collection. The parameter can be set to a time_specifier (see sge_types(1)), to PER_LOAD_REPORT or to NEVER. If this parameter is set to PER_LOAD_REPORT, the PDC is triggered in the same interval as load_report_time (see above). If this parameter is set to NEVER, the PDC run is never triggered. The default value for this parameter is 5 seconds. Note: A PDC run is quite compute intensive and may degrade the performance of the running jobs. But if the PDC runs less often or never, the online usage can be incomplete or totally missing (for example, online usage of very short running jobs might be missing) and the resource limit enforcement is less accurate or would not happen at all if the PDC is turned off completely.

PDC_CACHE_UPDATE_TIMEOUT
This parameter defines how long the cached process data of a process that was not identified as belonging to a Univa Grid Engine started job remains valid. If the timeout is not reached, the cached proc table information will be used. Once the information for the process is older than the defined timeout, it will be re-read from the proc table. If the parameter is set to "0", the cache refreshing is turned off. This means that once the process information is cached there will never be an update of the cached information. If you have a high throughput of jobs in your Univa Grid Engine cluster and your process id roll-over time is short, it is recommended to set this parameter to a value below your typical process id wrap-around time. The default for this parameter is 120 seconds. This means that the cached process information for such processes is updated every two minutes.

ENABLE_BINDING
If this parameter is set, Univa Grid Engine enables the core binding module within the execution daemon to apply binding parameters that are specified at submission time of a job. This parameter is not set per default; therefore all binding related information will be ignored on hosts other than lx-amd64 and lx-x86. On hosts with the lx-amd64 or lx-x86 architecture the module is internally turned on per default. Find more information about job to core binding in the -binding section of qsub(1).

DISABLE_GID_RANGE_OBSERVATION
If this parameter is set to 1 (or true), gid range observation is turned off in the Univa Grid Engine execution daemon. The default for this option is 0 (or false), which means the execd will per default observe the processes running on the execution host.
If a process is using a group id that is reserved for starting Univa Grid Engine jobs and it does not belong to a currently running Univa Grid Engine job, this group id will be blocked for starting further Univa Grid Engine jobs until the unexpected processes are gone.

DISABLE_M_MEM_FREE
If this parameter is set to 1 (or true), the execution daemon does not report load values for m_mem_free and m_mem_free_n anymore. This is needed especially in cases where resource reservation for jobs requesting such a complex value must be enabled. When a load value of a specific host is lower than the value requested by a job, the scheduler does no resource reservation for that host. In order to prevent this, the sending of the load value can be turned off.

ENABLE_MEM_DETAILS
If this parameter is set to 1 (or true), execution daemons on Linux report additional per job memory usage: rss (resident set size), pss (proportional set size), smem (shared memory), pmem (private memory), maxrss (maximum resident set size), maxpss (maximum proportional set size). These additional memory usage values can be retrieved via qstat -j <job_id>.

ENFORCE_LIMITS
When a job is started by sge_execd(8), limits configured in queue_conf(5) or specified during job submission will be set as per-process resource limits, see also setrlimit(2). The following limits are in addition enforced by sge_execd(8) as per-job limits: h_cpu, s_cpu, h_rss, s_rss, h_vmem, s_vmem. If cgroups_params is set to true, h_vmem is controlled only by cgroups (see cgroups_params for more information). The ENFORCE_LIMITS parameter allows the specification of where the limits h_cpu, s_cpu, h_rss, s_rss, h_vmem and s_vmem are enforced. This parameter has no influence on the limits h_stack, s_stack, h_data, s_data, h_core, s_core, h_fsize and s_fsize, which are always set as resource limits with the setrlimit() system call. If ENFORCE_LIMITS is not set or is set to the value ALL, the limits are both set as resource limits and enforced by sge_execd(8) as well. If ENFORCE_LIMITS is set to SHELL, only the resource limits are set; sge_execd(8) will not enforce them. If ENFORCE_LIMITS is set to EXECD, these limits are only enforced by sge_execd(8); INFINITY is set as the resource limit. If ENFORCE_LIMITS is set to OFF, sge_execd(8) will not enforce them and INFINITY is set as the resource limit.

MONITOR_PDC
When this parameter is set to true, sge_execd(8) will write information and errors reported by the data collector to its messages file, e.g. errors when reading from the /proc file system. MONITOR_PDC can be set to 0 (or false) or 1 (true). Use this parameter with care and only when suggested by Univa Support, e.g. for debugging issues with the reporting of online usage, as a significant amount of information might get written to the sge_execd(8) messages file.

JOB_START_FLUSH_DELAY
When a job is started by sge_execd(8), a job report will be sent to sge_qmaster(8) to trigger the job state transition from transferring to running. For short running jobs (runtime of a few seconds) the sending of the first job report can be delayed via the JOB_START_FLUSH_DELAY parameter. It specifies in seconds how long the sending of the first job report will be delayed. Valid values for JOB_START_FLUSH_DELAY are 0 to 10; the default is 0 (the first job report will be sent immediately after receipt of the job). Delaying the sending of the first job report can reduce the load on sge_qmaster(8) when many short jobs are run in the cluster.
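Purely as an illustration (the values are examples and the comma-separated NAME=VALUE list form is assumed), the enforcement- and reporting-related entries above might be combined like this:

   execd_params   ENFORCE_LIMITS=EXECD,MONITOR_PDC=false,JOB_START_FLUSH_DELAY=5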
RESCHEDULE_ON_KILLED_EPILOG
If set to "true" or "1", the behavior depending on the exit status of the epilog as described in the epilog section is in effect, which means that if the epilog dies because of a signal, causing its exit_status to be larger than 127, the queue is put into error state and the job is re-queued to the pending job list. If set to "false" or "0" and the epilog dies because of a signal, the job finishes normally and the queue is not put into error state. The "failed" field of the job is then set to "15 : in epilog", but the "exit_status" is the one of the job itself. To detect whether an epilog was signaled, solely its exit status is taken into account, i.e. an epilog that exits with a status > 127 is handled like an epilog that was signaled. The default value of this parameter is "true".

RESCHEDULE_ON_MISSING_EPILOG
If set to "true" or "1", the behavior depending on the exit status of the epilog as described in the epilog section is in effect, which means that if the epilog is configured but the epilog script cannot be found, the queue is put into error state and the job is re-queued to the pending job list. If set to "false" or "0" and the epilog is configured but the epilog script cannot be found, the job finishes normally and the queue is not put into error state. The "failed" field of the job is then set to "15 : in epilog", but the "exit_status" is the one of the job itself. The default value of this parameter is "true".

Changing execd_params will take effect after it has been propagated to the execution daemons. The propagation is done within one load report interval. The default for execd_params is none. The global configuration entry for this value may be overwritten by the execution host local configuration.

cgroups_params
A list of parameters for enabling and controlling the behavior of cgroups. This list can be set globally in the global configuration or be overridden in the host configuration for particular hosts. The cgroups feature is only available for lx-amd64 hosts. The OS must support cgroups (e.g. RHEL >= 6 with installed cgroup packages). Each cgroup subsystem must be mounted to a different subdirectory below cgroup_path. The following values are recognized:

cgroup_path
If set to none, the cgroup support is disabled; otherwise the path to the cgroup main directory is set here (usually /cgroup). All cgroup subsystems must be available in subdirectories here (either as links or as mounted directories). Example: For memory limitation /cgroup/memory and for core binding /cgroup/cpuset must exist when cgroup_path is set to /cgroup.

cpuset
If set to true (or 1), core binding is done by the cgroup cpuset subsystem. This affects only jobs requesting a core binding with the -binding submission parameter. Using cpuset is recommended since it limits the job to the chosen CPU cores without the possibility for the user to overcome these limits.

m_mem_free_limit_hard
If set to true (or 1), it restricts the usage of main memory to the value requested with the m_mem_free parameter. For using this parameter a mounted memory subsystem ($cgroup_path/memory) and a main memory request using the m_mem_free complex are required. If a job consumes more memory than requested it is usually aborted since further malloc calls will fail. Internally the value memory.limit_in_bytes is set. More details can be found in the operating system / cgroups documentation.
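For illustration only (the path matches the example above, and the comma-separated list form is an assumption, not a requirement), a cgroups configuration using the values introduced so far could look like this:

   cgroups_params   cgroup_path=/cgroup,cpuset=true,m_mem_free_limit_hard=true

Further values such as m_mem_free_limit_soft or min_memory_limit, described below, can be added to the same list.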
m_mem_free_limit_soft
If set to true (or 1), this parameter restricts the usage of main memory only if the memory limit is exceeded and the operating system detects memory contention. Main memory restriction is usually applied by pushing back the job's main memory usage to the soft limit. Please consult the operating system / Linux kernel documentation for more details. Internally the value memory.soft_limit_in_bytes is set in the cgroup memory subsystem. If m_mem_free_limit_hard is active as well, the hard memory limit rule is applied by the cgroup subsystem.

h_vmem_limit
If set to true (or 1), this parameter restricts the usage of the sum of main memory usage and swap space usage for a job if h_vmem is requested. Internally the cgroup parameter memory.memsw.limit_in_bytes is set. Please note that when the limit is applied successfully, the h_vmem rlimit is not set for the job anymore. The execution daemon will also not enforce the h_vmem limit. This means only cgroups will handle the specified h_vmem limit. If the value is lower than min_memory_limit, it is automatically increased to the configured amount. If m_mem_free_limit_hard is used and m_mem_free is requested with a higher value than h_vmem, then m_mem_free is reduced to the h_vmem limit. If m_mem_free is set to a lower value than h_vmem, then the kernel ensures that only m_mem_free main memory is available for the job; when the job requests more memory, it is automatically taken from swap space. Only when the total memory usage exceeds the h_vmem limit will cgroups take action. If h_vmem is requested but no m_mem_free, then a hard cgroup limit for main memory with the size of h_vmem is applied automatically; otherwise virtual memory limitation would not work. This is a cgroup limitation. In this case a min_memory_limit value affects h_vmem as well.

min_memory_limit
If set to a memory value (like 10M), each m_mem_free request (or h_vmem request, when mixed with m_mem_free) which restricts the job via m_mem_free_limit_hard or m_mem_free_limit_soft and which is lower than this value is automatically increased to the specified min_memory_limit value. Example: If m_mem_free_limit_hard is enabled and the job requests 100M but min_memory_limit is set to 150M, then the internal limit for the job (memory.limit_in_bytes) is set to 150M. This does not affect qstat or internal bookkeeping. The parameter is used to solve OS specific issues with too large memory footprints of small jobs (the shepherd is part of the restriction). The memory is not multiplied by the amount of slots requested by the job. The parameter is turned off by setting it to 0 or by not setting it at all. Jobs just requesting cgroups h_vmem without m_mem_free are not affected; here the same limits as for h_vmem are used.

freezer
If set to true (or 1), it enables the cgroup freezer subsystem for job suspension and resumption. The freezer subsystem needs to be mounted under $cgroup_path/freezer. If enabled, a job is no longer suspended with SIGSTOP and resumed with SIGCONT; instead the job is prevented from being scheduled by the Linux kernel by means of the freezer subsystem. No signal is sent to the job. The processes are usually put in D state (which is an uninterruptible sleep, like for IO). If the job needs to be notified, the -notify submission option can be used. The queue configuration can override the cgroups suspension mechanism for certain jobs. This is done by putting the standard signals in the suspend_method (SIGSTOP) and resume_method (SIGCONT).
This can be needed for certain job types which rely on signaling. For tightly integrated jobs, only the master task is put into suspend state (the first task, regardless of whether JOB_IS_FIRST_TASK is configured in the parallel environment configuration or not). If all tasks of a parallel job have to be put into the freezer, then freeze_pe_tasks needs to be activated. If the queue overrides the freezer with its own signals, freeze_pe_tasks is set to true, but SUSPEND_PE_TASKS (execd_params) is set to false, then slave tasks are not signaled. The freezer is available for batch and parallel jobs, but not for interactive jobs (qlogin and qrsh, except for qrsh -inherit).

freeze_pe_tasks
If set to true (or 1) and the freezer subsystem is turned on, then not only the master task is suspended but also all slave tasks of the parallel job are frozen. If the queue overrides the freezer with its own signals and freeze_pe_tasks is set to true, but SUSPEND_PE_TASKS (execd_params) is set to false, then slave tasks are not signaled. If SUSPEND_PE_TASKS is true (this is the default when not set as an execd_param), then slave tasks are signaled with the overridden queue signal / suspend_method.

killing
If set to true (or 1), the job is killed by using the tasks file of the cpuset subsystem (which, when killing is enabled, is automatically used for all jobs). As long as there are processes in the file, the processes are signaled. This prevents any leftover processes of a job from running after the job has finished.

mount
Tries to mount the cgroup subsystems if they are not already mounted to $cgroup_path/subsystem before a cgroup is created. If $cgroup_path does not exist, an error occurs (no attempt is made to create it). If the subsystem directory does not exist, it will be created. The subsystem is not unmounted by Grid Engine. Usually the mounting is done automatically by the operating system when it is started, so this parameter is usually turned off. Typically (like in RHEL 6) the configuration file for OS auto-mounting of cgroups is /etc/cgconfig.conf.

forced_numa
When memory binding was requested with -mbind cores:strict, so that only memory from the NUMA node the job is bound to (by using -binding) should be taken, this is set in the cgroups setting cpuset.mems. If turned on by setting forced_numa to 1 or true, this limit is ensured by the Linux kernel. In contrast to the traditional memory enforcement, the job cannot reset the value in order to get memory from other NUMA nodes.

reporting_params
Used to define the behavior of the reporting modules in the Univa Grid Engine qmaster. Changes to the reporting_params take immediate effect. The following values are recognized:

accounting
If this parameter is set to true, the accounting file is written. The accounting file is a prerequisite for using the qacct command.

reporting
If this parameter is set to true, the reporting file is written. The reporting file contains data that can be used for monitoring and analysis, like job accounting, job log, host load and consumables, queue status and consumables, and sharetree configuration and usage. Attention: Depending on the size and load of the cluster, the reporting file can become quite large. Only activate the reporting file if you have a process running that will consume the reporting file! See reporting(5) for further information about the format and contents of the reporting file.

flush_time
Contents of the reporting file are buffered in the Univa Grid Engine qmaster and flushed at a fixed interval.
This interval can be configured with the flush_time parameter. It is specified as a time value in the format HH:MM:SS. Sensible values range from a few seconds to one minute. Setting it too low may slow down the qmaster. Setting it too high will make the qmaster consume large amounts of memory for buffering data.

accounting_flush_time
Contents of the accounting file are buffered in the Univa Grid Engine qmaster and flushed at a fixed interval. This interval can be configured with the accounting_flush_time parameter. It is specified as a time value in the format HH:MM:SS. Sensible values range from a few seconds to one minute. Setting it too low may slow down the qmaster. Setting it too high will make the qmaster consume large amounts of memory for buffering data. Setting it to 00:00:00 will disable accounting data buffering; as soon as data is generated, it will be written to the accounting file. If this parameter is not set, the accounting data flush interval will default to the value of the flush_time parameter.

joblog
If this parameter is set to true, the reporting file will contain job logging information. See reporting(5) for more information about job logging.

sharelog
The Univa Grid Engine qmaster can dump information about sharetree configuration and usage to the reporting file. The parameter sharelog sets an interval in which sharetree information will be dumped. It is set in the format HH:MM:SS. A value of 00:00:00 configures qmaster not to dump sharetree information. Intervals of several minutes up to hours are sensible values for this parameter. See reporting(5) for further information about sharelog.

online_usage
Online usage information of running jobs (e.g. cpu, mem, vmem, ...) can be written to the reporting(5) file. Which variables to report is configured as a colon separated list, e.g. online_usage=cpu:mem:vmem.

finished_jobs
Note: Deprecated, may be removed in a future release. Univa Grid Engine stores a certain number of just finished jobs to provide post mortem status information. The finished_jobs parameter defines the number of finished jobs stored. If this maximum number is reached, the oldest finished job will be discarded for every new job added to the finished job list. Changing finished_jobs will take immediate effect. The default for finished_jobs is 100. This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.

qlogin_daemon
This parameter specifies the mechanism that is to be started on the server side of a qlogin(1) request. Usually this is the builtin mechanism. It is also possible to configure an external executable by specifying the fully qualified pathname, e.g. of the system's telnet daemon. Changing qlogin_daemon will take immediate effect. The default value for qlogin_daemon is builtin. The global configuration entry for this value may be overwritten by the execution host local configuration. Examples for the two allowed kinds of attributes are: qlogin_daemon builtin or qlogin_daemon /usr/sbin/in.telnetd

qlogin_command
This is the command to be executed on the client side of a qlogin(1) request. Usually this is the builtin qlogin mechanism. It is also possible to configure an external mechanism, usually the absolute pathname of the system's telnet client program. It is automatically started with the target host and port number as parameters. Changing qlogin_command will take immediate effect. The default value for qlogin_command is builtin.
The global configuration entry for this value may be overwritten by the execution host local configuration. Examples for the two allowed kinds of attributes are: qlogin_command builtin or qlogin_command /usr/bin/telnet

rlogin_daemon
This parameter specifies the mechanism that is to be started on the server side of a qrsh(1) request without a command argument to be executed remotely. Usually this is the builtin mechanism. It is also possible to configure an external executable by specifying the absolute pathname, e.g. of the system's rlogin daemon. Changing rlogin_daemon will take immediate effect. The default for rlogin_daemon is builtin. The global configuration entry for this value may be overwritten by the execution host local configuration. The allowed values are similar to the ones in the examples for qlogin_daemon.

rlogin_command
This is the mechanism to be executed on the client side of a qrsh(1) request without a command argument to be executed remotely. Usually this is the builtin mechanism. If no value is given, a specialized Univa Grid Engine component is used. The command is automatically started with the target host and port number as parameters. The Univa Grid Engine rlogin client has been extended to accept and use the port number argument. You can only use clients, such as ssh, which also understand this syntax. Changing rlogin_command will take immediate effect. The default value for rlogin_command is builtin. The global configuration entry for this value may be overwritten by the execution host local configuration. In addition to the examples for qlogin_command, this value is allowed: rlogin_command none

rsh_daemon
This parameter specifies the mechanism that is to be started on the server side of a qrsh(1) request with a command argument to be executed remotely. Usually this is the builtin mechanism. If no value is given, a specialized Univa Grid Engine component is used. Changing rsh_daemon will take immediate effect. The default value for rsh_daemon is builtin. The global configuration entry for this value may be overwritten by the execution host local configuration. In addition to the examples for qlogin_daemon, this value is allowed: rsh_daemon none

rsh_command
This is the mechanism to be executed on the client side of a qrsh(1) request with a command argument to be executed remotely. Usually this is the builtin mechanism. If no value is given, a specialized Univa Grid Engine component is used. The command is automatically started with the target host and port number as parameters, as required for telnet(1), plus the command with its arguments to be executed remotely. The Univa Grid Engine rsh client has been extended to accept and use the port number argument. You can only use clients, such as ssh, which also understand this syntax. Changing rsh_command will take immediate effect. The default value for rsh_command is builtin. The global configuration entry for this value may be overwritten by the execution host local configuration. In addition to the examples for qlogin_command, this value is allowed: rsh_command none

port_range
This parameter is used to define fixed port ranges that are used by the interactive job execution modules ("builtin" or "daemon" based). The builtin and the "daemon" based components will bind TCP/IP ports within the specified port range. The use case would be to set up an open port range in a firewall configuration. If there is no free port available in the specified range, the interactive command will fail.
The configured range should be large enough to handle all interactive clients (qrsh, qlogin, qsh, ...) that might be started on a host at the same time. The "daemon" based interactive job methods will also bind ports on the execution host. The "builtin" method will only bind ports on the client side where the interactive command is started. Changing this parameter will have immediate effect. For the "daemon" based interactive job methods the execd must get the new configuration first, which usually happens within the next load report interval. If no value is given (port_range=none), the resulting port is provided by the operating system. The global configuration entry for this value may be overwritten by a local configuration. The default value for port_range is "none". The syntax for this parameter is: none|port_range[,port_range,...] where port_range is defined as: port_nr[-port_nr] Example: port_range 30400-30800,40400-40800

delegated_file_staging
This flag must be set to "true" when the prolog and epilog are ready for delegated file staging, so that the DRMAA attribute 'drmaa_transfer_files' is supported. To establish delegated file staging, use the variables beginning with "$fs_..." in prolog and epilog to move the input, output and error files from one host to the other. When this flag is set to "false", no file staging is available for the DRMAA interface. File staging is currently implemented only via the DRMAA interface. When an error occurs while moving the input, output and error files, return error code 100 so that the error handling mechanism can handle the error correctly. (See also FORBID_APPERROR.)

reprioritize
Note: Deprecated, may be removed in a future release. This flag enables or disables the reprioritization of jobs based on their ticket amount. The reprioritize_interval in sched_conf(5) takes effect only if reprioritize is set to true. To turn off job reprioritization, the reprioritize flag must be set to false and the reprioritize_interval to 0, which is the default. This value is a global configuration parameter only. It cannot be overridden by the execution host local configuration.

jsv_url
This setting defines a server JSV instance which will be started and triggered by the sge_qmaster(8) process. This JSV instance will be used to verify job specifications of jobs before they are accepted and stored in the internal master database. The global configuration entry for this value cannot be overwritten by execution host local configurations. Find more details concerning JSV in jsv(1) and sge_request(1). The syntax of the jsv_url is specified in sge_types(1).

jsv_allowed_mod
If there is a server JSV script defined with the jsv_url parameter, then all qalter(1) or qmon(1) modification requests for jobs are rejected by qmaster. With the jsv_allowed_mod parameter an administrator has the possibility to allow a set of switches which can then be used with clients to modify certain job attributes. The value for this parameter has to be a comma separated list of JSV job parameter names as they are documented in qsub(1), or the value none to indicate that no modification should be allowed. Please note that even if none is specified, the switches -w and -t are allowed for qalter.

libjvm_path
libjvm_path is usually set during qmaster installation and points to the absolute path of libjvm.so (or the corresponding library depending on your architecture, e.g. /usr/java/jre/lib/i386/server/libjvm.so). The referenced libjvm version must be at least 1.5.
It is needed by the JVM qmaster thread only. If the Java VM needs additional starting parameters, they can be set in additional_jvm_args. Whether the JVM thread is started at all can be defined in the bootstrap(5) file. If libjvm_path is empty or points to an incorrect path, the JVM thread fails to start. The global configuration entry for this value may be overwritten by the execution host local configuration.

additional_jvm_args
additional_jvm_args is usually set during qmaster installation. Details about possible values for additional_jvm_args can be found in the help output of the accompanying Java command. This setting is normally not needed. The global configuration entry for this value may be overwritten by the execution host local configuration.

SEE ALSO
sge_intro(1), csh(1), qconf(1), qsub(1), jsv(1), rsh(1), sh(1), getpwnam(3), drmaa_attributes(3), queue_conf(5), sched_conf(5), sge_types(1), sge_execd(8), sge_qmaster(8), sge_shepherd(8), sge_diagnostics(1), cron(8), Univa Grid Engine Installation and Administration Guide.

COPYRIGHT
See sge_intro(1) for a full statement of rights and permissions.

Univa Grid Engine File Formats UGE 8.5.4 SGE_CONF(5)