Development and Testing

Login Nodes

Generally speaking, interactive work other than managing jobs and data is discouraged on the login nodes. However, short test jobs (processes) are permitted as long as they fall within the following limits.

  1. No more than 16 cores
  2. No longer than 10 minutes (wall time)
  3. No more than 64 GB of RAM

The above resource limits essentially define what we mean by a “small, short” job or process. These limits should allow for testing job submission scripts or even running simple application development tests. If you need to run multiple instances of some process (gzip, make, cp, etc.), you should observe the above limits: do not run more than 16 simultaneous instances of any single process, and do not let the collection of such processes consume more than 64 GB of RAM.
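
For instance, if you need to compress a batch of files on a login node, capping the concurrency keeps you within those limits. The sketch below (the *.txt pattern is just a placeholder) lets xargs run at most 16 gzip instances at a time:

$ find . -maxdepth 1 -name '*.txt' -print0 | xargs -0 -n 1 -P 16 gzip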

Data management operations such as gzip, rsync, scp, sftp, etc. can take a long time to complete and are exempt from the 10-minute time limit.
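
As a hedged illustration (the paths and remote host below are placeholders, not actual UFRC locations), a long-running transfer of this kind might look like:

$ rsync -av /path/to/large_dataset/ user@remote.example.org:/backup/large_dataset/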

If you have development and testing requirements that exceed the above resource limits, you have several options, as described below.

Interactive SLURM Session

An alternative to working on the login nodes is to create an environment similar to what you would get in a real job on HiPerGator. One way to do this is to start an interactive job on a compute node with srun. For example:

$ srun --mem=4gb --time=08:00:00 --pty bash -i

This command will give you 4gb of memory for 8 hours on a real compute node and, once the job starts, present you with an interactive prompt. You can run your job scripts as regular shell scripts in that environment.
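
If your test needs more than the defaults, the same pattern applies. As a sketch (the script name below is hypothetical), you could request four cores and then run your submission script directly in the session:

$ srun --ntasks=1 --cpus-per-task=4 --mem=8gb --time=02:00:00 --pty bash -i
$ bash my_job_script.sh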

SLURM Development Session

If it is taking a while to start a regular interactive session, try the developmental ('dev') partition. The 'dev' nodes are set up to start jobs faster as long as resources are available. The software environment on the nodes within the dev partition is consistent with that of the compute nodes, so you can run jobs and get an accurate idea of what resources are needed to successfully complete your jobs.

For example, to get a four-hour session with the default 1 processor core and 2gb of memory:

$ module load ufrc
$ srundev --time=04:00:00

The srundev command is a wrapper around the srun --partition=hpg2-dev --pty bash -i command.
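
In other words, the srundev call above should be roughly equivalent to running the underlying srun command with the same time limit:

$ srun --partition=hpg2-dev --time=04:00:00 --pty bash -i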

Other SLURM directives can also be added to request more processors or memory. For example:

$ module load ufrc
$ srundev --time=60 --ntasks=1 --cpus-per-task=4 --mem=4gb

Note
  • The default time limit for the developmental SLURM partition is 00:10:00 (10 minutes). The maximum time limit in the dev partition is 12 hours.
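
For instance, to request a dev session at the 12-hour maximum (a sketch using the same srundev wrapper shown above):

$ module load ufrc
$ srundev --time=12:00:00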

Pre-Allocation of Resources

Yet another approach is to log into HiPerGator and create a SLURM allocation under which you can run commands or scripts with 'srun' for as long as the allocation is valid. Whatever you srun under the allocation will be executed within a job environment, but there will be no delay for job startup. For example:

$ salloc -n 1 --cpus-per-task=2 --mem=8gb --time=10:00:00
salloc: Pending job allocation 33333333
salloc: job 33333333 queued and waiting for resources
salloc: job 33333333 has been allocated resources
salloc: Granted job allocation 33333333

$ srun hostname
c99a-s1.ufhpc

$ srun echo "Running inside an allocation"
Running inside an allocation

$ echo $SLURM_MEM_PER_NODE
8192
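
When you are finished, release the allocation. As a sketch, depending on whether salloc dropped you into a new shell, you can either exit that shell or cancel the allocation explicitly by its job ID:

$ scancel 33333333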

Enjoy the many ways to make sure your jobs are set up right. Test responsibly.