SGE_PE(5)                    File Formats Manual                    SGE_PE(5)

NAME
    sge_pe - Univa Grid Engine parallel environment configuration file
    format

DESCRIPTION
    Parallel environments are parallel programming and runtime environments
    allowing for the execution of shared memory or distributed memory
    parallelized applications. Parallel environments usually require some
    kind of setup before parallel applications can be started in them.
    Examples of common parallel environments are shared memory parallel
    operating systems and the distributed memory environments Parallel
    Virtual Machine (PVM) and Message Passing Interface (MPI).

    sge_pe allows for the definition of interfaces to arbitrary parallel
    environments. Once a parallel environment has been defined or modified
    with the -ap or -mp option to qconf(1) and linked with one or more
    queues via pe_list in queue_conf(5), the environment can be requested
    for a job via the -pe switch to qsub(1), together with a request for a
    range of the number of parallel processes to be allocated by the job.
    Additional -l options may be used to specify the job's requirements in
    further detail.

    Note that Univa Grid Engine allows backslashes (\) to be used to escape
    newline characters. The backslash and the newline are replaced with a
    space (" ") character before any interpretation.

FORMAT
    The format of an sge_pe file is defined as follows:

    pe_name
        The name of the parallel environment as defined for pe_name in
        sge_types(1). It is used with the qsub(1) -pe switch.

    slots
        The total number of parallel processes allowed to run concurrently
        under the parallel environment. The type is number; valid values
        are 0 to 9999999.

    user_lists
        A comma-separated list of user access list names (see
        access_list(5)). Each user contained in at least one of the listed
        access lists has access to the parallel environment. If the
        user_lists parameter is set to NONE (the default), any user not
        explicitly excluded via the xuser_lists parameter described below
        has access.
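    For illustration, the parameters described so far might be combined in
    a PE definition as follows. This is a sketch only; the PE name, slot
    count and the access list name "power_users" are hypothetical values,
    not defaults:

    ```
    pe_name      mympi
    slots        512
    user_lists   power_users
    xuser_lists  NONE
    ```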
        If a user is contained in access lists given in both xuser_lists
        and user_lists, the user is denied access to the parallel
        environment.

    xuser_lists
        The xuser_lists parameter contains a comma-separated list of user
        access lists as described in access_list(5). Each user contained in
        at least one of the listed access lists is not allowed to access
        the parallel environment. If the xuser_lists parameter is set to
        NONE (the default), any user has access.

        If a user is contained in access lists given in both xuser_lists
        and user_lists, the user is denied access to the parallel
        environment.

    start_proc_args
        The invocation command line of a start-up procedure for the
        parallel environment, or the keyword NONE if no start-up script
        should be executed. The start-up procedure is invoked by
        sge_shepherd(8) for the master task of a parallel job after a
        possibly configured prolog (see sge_conf(5)) and prior to executing
        the job script. Its purpose is to set up the parallel environment
        according to its needs. An optional prefix "user@" specifies the
        user under which this procedure is to be started.

        The standard output of the start-up procedure is redirected to the
        file REQUEST.poJID in the job's working directory (see qsub(1)),
        with REQUEST being the name of the job as displayed by qstat(1) and
        JID being the job's identification number. Likewise, the standard
        error output is redirected to REQUEST.peJID.

        Scripts whose execution duration would exceed 2 minutes are
        terminated. This timeout can be adjusted by defining SCRIPT_TIMEOUT
        as an execd_param in the configuration (see sge_conf(5)).

        The following special variables, expanded at runtime, can be used
        (besides any other strings which have to be interpreted by the
        start and stop procedures) to constitute a command line:

        $pe_hostfile
            The pathname of a file containing a detailed description of
            the layout of the parallel environment to be set up by the
            start-up procedure.
            Each line of the file refers to a host on which parallel
            processes are to be run. The first entry of each line denotes
            the hostname, the second entry the number of parallel processes
            to be run on the host, the third entry the name of the queue,
            and the fourth entry a processor range to be used in case of a
            multiprocessor machine.

        $host
            The name of the host on which the start-up or stop procedures
            are started.

        $job_owner
            The user name of the job owner.

        $job_id
            Univa Grid Engine's unique job identification number.

        $job_name
            The name of the job.

        $pe
            The name of the parallel environment in use.

        $pe_slots
            The number of slots granted for the job.

        $processors
            The processors string as contained in the queue configuration
            (see queue_conf(5)) of the master queue (the queue in which the
            start-up and stop procedures are started).

        $queue
            The cluster queue of the master queue instance.

    stop_proc_args
        The invocation command line of a shutdown procedure for the
        parallel environment, or the keyword NONE if no shutdown procedure
        should be executed. The shutdown procedure is invoked by
        sge_shepherd(8) after the job script has finished, but before a
        possibly configured epilog (see sge_conf(5)) is started. Its
        purpose is to stop the parallel environment and to remove it from
        all participating systems. An optional prefix "user@" specifies the
        user under which this procedure is to be started.

        The standard output of the shutdown procedure is also redirected to
        the file REQUEST.poJID in the job's working directory (see
        qsub(1)), with REQUEST being the name of the job as displayed by
        qstat(1) and JID being the job's identification number. Likewise,
        the standard error output is redirected to REQUEST.peJID.

        Scripts whose execution duration would exceed 2 minutes are
        terminated. This timeout can be adjusted by defining SCRIPT_TIMEOUT
        as an execd_param in the configuration (see sge_conf(5)).

        The same special variables as for start_proc_args can be used to
        constitute a command line.
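    As an illustration of the $pe_hostfile layout described above, a
    start-up procedure can translate it into a machine file for an MPI
    implementation. The following is a minimal sketch, not a script shipped
    with Univa Grid Engine; the function name make_machines and the file
    paths are hypothetical, while the four-column line format follows the
    description of $pe_hostfile above:

    ```shell
    #!/bin/sh
    # Sketch of a start_proc_args helper. It could be configured as, e.g.:
    #   start_proc_args  /usr/local/bin/startmpi.sh $pe_hostfile
    # and would then invoke: make_machines "$1" "$TMPDIR/machines"

    # make_machines PE_HOSTFILE MACHINES_OUT
    # Reads lines of the form "hostname slots queue processor-range" and
    # writes each hostname once per granted slot, as many MPI machine
    # files expect.
    make_machines() {
        pe_hostfile=$1
        machines=$2
        : > "$machines"                      # truncate/create output file
        while read -r host nslots queue procs; do
            i=0
            while [ "$i" -lt "$nslots" ]; do
                echo "$host" >> "$machines"
                i=$((i + 1))
            done
        done < "$pe_hostfile"
    }
    ```

    The corresponding stop_proc_args script would typically remove the
    generated machine file again.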
    allocation_rule
        The allocation rule is interpreted by the scheduler thread and
        helps the scheduler decide how to distribute parallel processes
        among the available machines. If, for instance, a parallel
        environment is built for shared memory applications only, all
        parallel processes have to be assigned to a single machine, no
        matter how many suitable machines are available. If, however, the
        parallel environment follows the distributed memory paradigm, an
        even distribution of processes among machines may be preferable.

        The allocation rule always refers to hosts, not to queues. So if
        specific queues are requested for the parallel job, e.g. using the
        "-q" or "-masterq" switch (see qsub(1)), the tasks of the job might
        get distributed over several queues, but the sum of tasks on one
        host will always be the one defined by the allocation rule.

        The current version of the scheduler only understands the
        following allocation rules:

        <int>:
            An integer number that sets the number of processes per host.
            A value of 1 limits the job to 1 process per host. When the
            "-pe" (parallel environment) option is specified (see
            qsub(1)), the number of tasks specified with the "-pe" option
            must be evenly divisible by <int>, otherwise the job will not
            be scheduled. If a master queue is requested using the
            "-masterq" option, a list of queues is specified with "-q" for
            the slave tasks, and the master queue is not a member of the
            slave queue list specified by "-q", then the job will be
            scheduled if and only if the number of tasks minus one is
            evenly divisible by <int>.

        $pe_slots:
            If the special denominator $pe_slots is used, the full range
            of processes as specified with the qsub(1) -pe switch has to
            be allocated on a single host (no matter which value belonging
            to the range is finally chosen for the job to be allocated).

        $fill_up:
            Starting from the best suitable host/queue, all available
            slots are allocated.
            Further hosts and queues are "filled up" as long as the job
            still requires slots for parallel tasks.

        $round_robin:
            From all suitable hosts a single slot is allocated until all
            tasks requested by the parallel job are dispatched. If more
            tasks are requested than suitable hosts are found, allocation
            starts again from the first host. The allocation scheme walks
            through suitable hosts in a best-suitable-first order.

        Note: If the master queue (i.e. the queue where the master task of
        the parallel job is located) is not requested explicitly, these
        allocation rules are always obeyed exactly.

        If a master queue is requested explicitly by adding the -masterq,
        -masterl queue=<queue> or -masterl hostname=<host> switch to the
        submit command line (see submit(1)), then the scheduler tries to
        fulfill both the allocation rule and the master queue request,
        which may be contradictory requirements. If the allocation rule
        and the distribution of both the master queue and the slave queues
        over the execution hosts allow one of these tasks to become the
        master task, then both the -masterq (or -masterl queue=<queue> or
        -masterl hostname=<host>) request and the allocation rule are
        obeyed exactly. If this is not possible, the scheduler
        automatically adds one task that will become the master task.
        Generally, the scheduler will have to add one task for fixed
        allocation rules (i.e. "<int>" or "$pe_slots") if the requested
        master queue is not part of the set of slave queues and none of
        the slave queues has an instance on the master host. The exception
        to this rule is an allocation rule of "1".

    control_slaves
        This parameter can be set to TRUE or FALSE (the default). It
        indicates whether Univa Grid Engine is the creator of the slave
        tasks of a parallel application via sge_execd(8) and
        sge_shepherd(8) and thus has full control over all processes in a
        parallel application, which enables capabilities such as resource
        limitation and correct accounting.
        However, to gain control over the slave tasks of a parallel
        application, a sophisticated PE interface is required which works
        closely together with Univa Grid Engine facilities. Such PE
        interfaces are available through your local Univa Grid Engine
        support office. Set the control_slaves parameter to FALSE for all
        other PE interfaces.

    job_is_first_task
        The job_is_first_task parameter can be set to TRUE or FALSE. A
        value of TRUE indicates that the Univa Grid Engine job script
        already contains one of the tasks of the parallel application (the
        number of slots reserved for the job is the number of slots
        requested with the -pe switch), while a value of FALSE indicates
        that the job script (and its child processes) is not part of the
        parallel program (the number of slots reserved for the job is the
        number of slots requested with the -pe switch + 1).

        If wallclock accounting is used (execd_params ACCT_RESERVED_USAGE
        and/or SHARETREE_RESERVED_USAGE set to TRUE) and control_slaves is
        set to FALSE, the job_is_first_task parameter influences the
        accounting for the job: a value of TRUE means that accounting for
        cpu and requested memory is multiplied by the number of slots
        requested with the -pe switch; if job_is_first_task is set to
        FALSE, the accounting information is multiplied by the number of
        slots + 1.

    urgency_slots
        For pending jobs with a slot range PE request, the number of slots
        is not yet determined. This setting specifies the method to be
        used by Univa Grid Engine to assess the number of slots such jobs
        might finally get. The assumed slot allocation has a meaning when
        determining the resource-request-based priority contribution for
        numeric resources as described in sge_priority(5) and is displayed
        when qstat(1) is run without the -g t option. The following
        methods are supported:

        <int>:
            The specified integer number is directly used as the
            prospective slot amount.

        min:
            The slot range minimum is used as the prospective slot amount.
            If no lower bound is specified with the range, 1 is assumed.

        max:
            The slot range maximum is used as the prospective slot amount.
            If no upper bound is specified with the range, the absolute
            maximum possible due to the PE's slots setting is assumed.

        avg:
            The average of all numbers occurring within the job's PE range
            request is assumed.

    accounting_summary
        This parameter is only checked if control_slaves (see above) is
        set to TRUE and thus Univa Grid Engine is the creator of the slave
        tasks of a parallel application via sge_execd(8) and
        sge_shepherd(8). In this case, accounting information is available
        for every single slave task started by Univa Grid Engine.

        The accounting_summary parameter can be set to TRUE or FALSE. A
        value of TRUE indicates that only a single accounting record is
        written to the accounting(5) file, containing the accounting
        summary of the whole job including all slave tasks, while a value
        of FALSE indicates that an individual accounting(5) record is
        written for every slave task as well as for the master task.

        Note: When running tightly integrated jobs with
        SHARETREE_RESERVED_USAGE set, and with accounting_summary enabled
        in the parallel environment, reserved usage will only be reported
        by the master task of the parallel job. No per-task usage records
        will be sent from execd to qmaster, which can significantly reduce
        the load on qmaster when running large tightly integrated parallel
        jobs.

    daemon_forks_slaves
        This parameter is only checked if control_slaves (see above) is
        set to TRUE and thus Univa Grid Engine is the creator of the slave
        tasks of a parallel application via sge_execd(8) and
        sge_shepherd(8).

        The daemon_forks_slaves parameter defines whether every task of a
        tightly integrated parallel job gets started individually via qrsh
        -inherit (default value FALSE, e.g. used for mpich integration) or
        whether a single daemon is started via qrsh -inherit on every
        slave host which then forks the slave tasks (value TRUE, e.g. used
        for openmpi or lam integration). With daemon_forks_slaves set to
        TRUE, only a single task (the daemon) may get started per slave
        host; all limits set for this task are multiplied by the number of
        slots granted on the host.

    master_forks_slaves
        The master_forks_slaves parameter can be set to TRUE if the master
        task (e.g. mpirun called in the job script) starts tasks running
        on the master host via fork/exec instead of starting them via qrsh
        -inherit. With master_forks_slaves set to TRUE, all limits set for
        the master task (the job script) are increased by the slave task
        limit multiplied by the number of slots granted on the host. No
        further tasks can be started on the master host via qrsh -inherit.

RESTRICTIONS
    Note that the functionality of the start-up, shutdown and signaling
    procedures remains the full responsibility of the administrator
    configuring the parallel environment. Univa Grid Engine will just
    invoke these procedures and evaluate their exit status. If the
    procedures do not perform their tasks properly, or if the parallel
    environment or the parallel application behave unexpectedly, Univa
    Grid Engine has no means to detect this.

SEE ALSO
    sge_intro(1), sge_types(1), qconf(1), qdel(1), qmod(1), qsub(1),
    access_list(5), sge_qmaster(8), sge_shepherd(8).

COPYRIGHT
    See sge_intro(1) for a full statement of rights and permissions.

Univa Grid Engine File Formats        UGE 8.5.4                     SGE_PE(5)