Development and Testing
Revision as of 15:50, 20 May 2020
Generally speaking, interactive work other than managing jobs and data is discouraged on the login nodes. However, short test jobs (processes) are permitted as long as they fall within the following limits.
- No more than 16 cores
- No longer than 10 minutes (wall time)
- No more than 64 GB of RAM
The above resource limits essentially define what we mean by a “small, short” job or process. These limits should allow for the testing of job submission scripts or even simple application development tests. If you need to run multiple instances of some process (gzip, make, cp, etc.), you should observe the above limits: do not run more than 16 simultaneous instances of any single process, and do not let the collection of such processes consume more than 64 GB of RAM.
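As an illustration of staying under the 16-instance limit, the sketch below compresses every file in a directory while capping the number of simultaneous gzip processes. The function name compress_capped is hypothetical, and the sketch assumes GNU find and xargs (for -print0, -0, -r and -P); it does nothing about the 64 GB memory limit, which you would still need to watch yourself.

```shell
#!/usr/bin/env bash
# compress_capped (hypothetical helper): gzip every uncompressed file under a
# directory, running at most MAX_PROCS gzip processes at once (default 16,
# matching the login-node limit described above).
compress_capped() {
    local dir="$1"
    local max_procs="${2:-16}"
    # -print0 / -0 keep file names with spaces intact.
    # -r skips running gzip when no files are found (GNU xargs).
    # -n 1 hands one file to each gzip; -P caps the parallel processes.
    find "$dir" -type f ! -name '*.gz' -print0 \
        | xargs -0 -r -n 1 -P "$max_procs" gzip
}
```

For example, compress_capped /path/to/data 8 would use at most 8 simultaneous gzip processes.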
Data management operations such as gzip, rsync, scp, sftp, etc. can take a long time to complete and are exempt from the 10-minute time limit.
If you have development and testing requirements that exceed the above resource limits, there are several options, as described below.
SLURM Interactive Session
You can request resources for an interactive session (i.e. job) and start a command shell (bash, for example). Within that command shell you will have access to the resources you requested and can run whatever commands and processes you wish. Consider the example below. It will give you 4 GB of memory for 8 hours on a real compute host and present you, once the job starts, with an interactive command shell (bash). From that shell you can run commands and launch processes just as you would from any other host (login or otherwise).
$ srun --mem=4gb --time=08:00:00 --pty bash -i
Note: Because the requested resources must be allocated and scheduled by the batch scheduler, it could take anywhere from a few seconds to a few hours for your interactive session to start. How long it takes depends on many factors, including how busy the system is overall and what percentage of your group's allocation is already in use.
See the SchedMD srun documentation for further information and details regarding the srun command.
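Once the interactive shell starts, it can be useful to confirm what the scheduler actually granted. The following is a minimal sketch, not a site-provided tool: describe_session is a hypothetical function name, and it relies only on the environment variables SLURM_JOB_ID, SLURM_CPUS_ON_NODE and SLURM_MEM_PER_NODE, which SLURM sets inside a job (the last one when --mem is requested, reported in MB).

```shell
#!/usr/bin/env bash
# describe_session (hypothetical helper): print the resources visible to the
# current shell. Outside of a SLURM job the variables are unset, so each
# field falls back to "not set".
describe_session() {
    echo "job_id=${SLURM_JOB_ID:-not set}"
    echo "cpus_on_node=${SLURM_CPUS_ON_NODE:-not set}"
    echo "mem_per_node_mb=${SLURM_MEM_PER_NODE:-not set}"
}
```

Running describe_session on a login node should report "not set" for every field, which is a quick way to tell whether you are inside a job.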
SLURM Development Session
A small number of servers have been placed into a SLURM partition (a collection of hosts) dedicated to software development. You can access it by passing the development partition name (hpg2-dev in the examples here) to the appropriate SLURM command. For example, to obtain resources for an interactive session (job) in the dev partition, you could run the following command.
srun --partition=hpg2-dev --mem=4gb --time=04:00:00 --pty bash -i
or, if you need more cores,
srun --partition=hpg2-dev --mem=4gb --ntasks=1 --cpus-per-task=8 --time=04:00:00 --pty bash -i
By loading the ufrc environment module, you can take advantage of srundev and simplify the above to:
module load ufrc
srundev --mem=4gb --ntasks=1 --cpus-per-task=8 --time=04:00:00
The srundev command is a wrapper encapsulating srun --partition=hpg2-dev --pty bash -i.
Note: The default time limit for the SLURM development partition is 00:10:00 (10 minutes). The maximum time limit in the development partition is 12 hours.
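To make the wrapper idea concrete, here is a rough sketch of how a command like srundev could be written. This is an assumption for illustration only, not the actual implementation shipped in the ufrc module: the name srundev_sketch and the SRUN override variable are hypothetical, and the override exists only so the command line can be previewed without submitting anything.

```shell
#!/usr/bin/env bash
# srundev_sketch (hypothetical): forward any user-supplied options to srun,
# adding the development partition and an interactive bash shell.
# Set SRUN=echo to print the resulting arguments instead of running srun.
srundev_sketch() {
    local launcher="${SRUN:-srun}"
    "$launcher" --partition=hpg2-dev "$@" --pty bash -i
}
```

With SRUN=echo, calling srundev_sketch --mem=4gb --time=04:00:00 prints the options that would be passed to srun. If no --time is given, the 10-minute partition default noted above would apply.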
Pre-Allocation of Resources
Finally, you can also use salloc to create a SLURM allocation under which you can run commands or scripts with srun for as long as the allocation is valid. Whatever you srun under the allocation will be executed within a job environment, but there will be no delay since the resources have already been allocated.
$ salloc -n 1 --cpus-per-task=2 --mem=8gb --time=10:00:00
salloc: Pending job allocation 33359121
salloc: job 33333333 queued and waiting for resources
salloc: job 33333333 has been allocated resources
salloc: Granted job allocation 33333333
$ srun hostname
c99a-s1.ufhpc
$ srun echo "Running inside an allocation"
Running inside an allocation
$ echo $SLURM_MEM_PER_NODE
8192
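When several commands are to be run under one allocation, a small guard can prevent them from being launched by accident outside of it. The sketch below is one possible approach under stated assumptions: run_steps and the LAUNCHER override are hypothetical names, and the check simply tests whether SLURM_JOB_ID is set, which is the case in the shell that salloc starts.

```shell
#!/usr/bin/env bash
# run_steps (hypothetical helper): run each argument as its own job step via
# srun, but only if an allocation is active. Each argument is a simple
# command line such as "hostname"; it is split on whitespace, so arguments
# that need quoting are not supported in this sketch.
run_steps() {
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "No active allocation; run salloc first." >&2
        return 1
    fi
    local launcher="${LAUNCHER:-srun}"
    local step
    for step in "$@"; do
        # Unquoted on purpose so "cmd arg" becomes separate words.
        $launcher $step || return $?
    done
}
```

Inside an allocation, run_steps "hostname" "uname -n" would launch two steps in sequence and stop at the first one that fails.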