Development and Testing

From UFRC

Latest revision as of 20:31, 5 September 2024

Development and testing often call for quick turnaround on small-scale or interactive work. The options available for interactive or small-scale testing are described below.

Warning:

Do not run full-scale (normal) analyses on login nodes. The main way to run computational analyses is to write job scripts and submit them to the scheduler. Some interfaces, such as Open OnDemand, JupyterHub, and Galaxy, can manage job scheduling behind the scenes and may be more convenient than command-line job submission when appropriate.

Only run workloads from blue storage. Blue is a fast storage system that can handle the I/O involved in research workloads. Before using sbatch or launching a workload interactively, make sure your working directory is a blue file path, e.g. /blue/<group>/<user>, and not your /orange or /home directory (~ or /home/<user>). Use pwd to print your working directory.
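The pwd check can be automated before you submit or launch anything. The following is a minimal sketch, assuming that a path is "on blue" when it starts with /blue/; the function name on_blue is hypothetical and not part of any HiPerGator tool.

```shell
#!/usr/bin/env bash
# Minimal sketch: succeed only if a directory (default: the current
# working directory) lies under /blue. "on_blue" is a hypothetical name.
on_blue() {
    local dir="${1:-$(pwd)}"
    case "$dir" in
        /blue/*) return 0 ;;   # e.g. /blue/<group>/<user>/project
        *)       return 1 ;;   # /home, ~, /orange, or anything else
    esac
}

# Example: warn before starting a workload from the wrong place.
# on_blue || echo "Warning: $(pwd) is not on /blue" >&2
```

Putting the check in a shell startup file or at the top of a job script gives an early warning instead of a slow or failed run on /home or /orange.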

Login Nodes

Running resource-intensive applications on the login nodes is against HPG usage policy; learn more at HPG Computation. Interactive work other than managing jobs and data is discouraged on the login nodes. However, short test jobs (processes) are permitted as long as they fall within the following limits.

  1. No more than 16 cores
  2. No longer than 10 minutes (wall time)
  3. No more than 64 GB of RAM

The above resource limits essentially define what we mean by a “small, short” job or process. These limits should allow for testing job submission scripts or even simple application development tests. If you need to run multiple instances of some process (gzip, make, cp, etc.), observe the same limits: do not run more than 16 simultaneous instances of any single process, and do not let the collection of such processes consume more than 64 GB of RAM.
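One way to stay within the 16-instance limit when processing many files is to let xargs cap the number of concurrent processes. This is a minimal sketch, assuming GNU findutils (for xargs -r) and a directory of *.log files; the function name compress_logs and the default directory logs are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: gzip every *.log file in a directory while keeping at
# most 16 gzip processes running at once (the login-node core limit).
# "compress_logs" and the default "logs" directory are hypothetical.
compress_logs() {
    local dir="${1:-logs}"
    local max_procs=16

    # -print0 / -0 keep file names containing spaces intact.
    # -P caps concurrent gzip processes; -n 1 gives each gzip one file.
    # -r (GNU xargs) skips running gzip at all when no files match.
    find "$dir" -maxdepth 1 -type f -name '*.log' -print0 |
        xargs -0 -r -P "$max_procs" -n 1 gzip
}
```

Usage: `compress_logs /blue/<group>/<user>/logs`. Lowering max_procs leaves headroom for other users on the shared login node.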

Data management operations such as gzip, rsync, scp, sftp, etc. can take a long time to complete and are exempt from the 10-minute time limit. If your development and testing requirements exceed the above resource limits, there are several options, described below.

SLURM Interactive Session

You can request resources for an interactive session (i.e. a job) and start a command shell (bash, for example). Within that shell you have access to the resources you requested and can run whatever commands and processes you wish. The example below requests 4 GB of memory for 8 hours on a compute node and, once the job starts, presents you with an interactive bash shell. From that shell you can run commands and launch processes just as you would from any other host (login or otherwise).

$ srun --mem=4gb --time=08:00:00 --pty bash -i

Note: Because the requested resources must be allocated and scheduled by the batch scheduler, it could take anywhere from a few seconds to a few hours for your interactive session to start. How long it takes depends on many factors, including how busy the system is overall and what percentage of your group's allocation is already in use.

See the SchedMD srun documentation for further information and details regarding the srun command.
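Once an interactive test works, the same resource request can usually be moved into a batch job script for the scheduler. This is a minimal sketch, assuming standard sbatch directives; the file name dev_test.sh, the job name, and the contents of main() are placeholders to replace with your own.

```shell
#!/bin/bash
# Minimal sketch: a batch job asking for the same resources as the
# interactive srun example (4 GB of memory for 8 hours).
# Submit with:  sbatch dev_test.sh   (file and job names are placeholders)
#SBATCH --job-name=dev-test
#SBATCH --mem=4gb
#SBATCH --time=08:00:00
#SBATCH --output=%x_%j.log

main() {
    # SLURM sets SLURM_JOB_ID inside a job; "none" means we are outside one.
    echo "Job ${SLURM_JOB_ID:-none} on ${HOSTNAME}"
    echo "Working directory: $(pwd)"
    # Replace with your own commands, run from a /blue path.
}

main "$@"
```

The #SBATCH lines are plain comments to bash, so running the file directly with bash is a quick way to check the script body before submitting it; %x and %j in the output name expand to the job name and job ID.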

SLURM Development Session

A small number of servers have been placed into a SLURM partition (collection of hosts) dedicated to software development. Jobs in this partition are not subject to a user's normal QOS limits. You can access it by specifying the "dev" partition (hpg-dev) to the appropriate SLURM command. For example, to obtain resources for an interactive session (job) in the dev partition, you could run the following command.

srun --partition=hpg-dev --mem=4gb --time=04:00:00 --pty bash -i
# or, if you need more cores:
srun --partition=hpg-dev --mem=4gb --ntasks=1 --cpus-per-task=8 --time=04:00:00 --pty bash -i

By loading the ufrc environment module, you can take advantage of srundev and simplify the above to:

module load ufrc
srundev --mem=4gb --ntasks=1 --cpus-per-task=8 --time=04:00:00 

The srundev command is a wrapper encapsulating srun --partition=hpg-dev --pty bash -i.
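To illustrate what such a wrapper does, here is a minimal sketch of a similar shell function. It is NOT the ufrc module's implementation; the name my_srundev and the DRY_RUN switch are hypothetical. Extra options are placed before the command because srun passes anything after the command name to that command.

```shell
#!/usr/bin/env bash
# Minimal sketch of an srundev-style wrapper (hypothetical, not the real
# ufrc srundev). Options such as --time or --mem go to srun itself.
my_srundev() {
    local cmd=(srun --partition=hpg-dev "$@" --pty bash -i)
    if [[ -n "${DRY_RUN:-}" ]]; then
        # Print the assembled command instead of running it.
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

Usage: `DRY_RUN=1 my_srundev --time=04:00:00 --mem=4gb` prints the srun command it would run; without DRY_RUN, on the cluster, it starts the session.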

Note: The default time limit for the SLURM development partition is 00:10:00 (10 minutes). The maximum time limit in the development partition is 12 hours.
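Because the dev partition caps jobs at 12 hours, it can help to check a --time value before requesting it. The following is a minimal sketch, assuming the time formats listed in the SLURM srun/sbatch documentation (minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes, days-hours:minutes:seconds); the function names are hypothetical, and special values such as UNLIMITED are rejected rather than handled.

```shell
#!/usr/bin/env bash
# Minimal sketch: convert a SLURM --time string to seconds and check it
# against the 12-hour dev partition maximum. Function names are hypothetical.
slurm_time_to_seconds() {
    local t="$1" days=0 rest h=0 m=0 s=0
    local -a f
    local re='^[0-9]+(-[0-9]+)?(:[0-9]+){0,2}$'
    [[ $t =~ $re ]] || return 1           # reject malformed input

    if [[ $t == *-* ]]; then              # days-hours[:minutes[:seconds]]
        days=${t%%-*}
        rest=${t#*-}
        IFS=: read -r -a f <<< "$rest"
        case ${#f[@]} in
            1) h=${f[0]} ;;
            2) h=${f[0]}; m=${f[1]} ;;
            3) h=${f[0]}; m=${f[1]}; s=${f[2]} ;;
        esac
    else                                  # minutes, min:sec, or h:m:s
        IFS=: read -r -a f <<< "$t"
        case ${#f[@]} in
            1) m=${f[0]} ;;
            2) m=${f[0]}; s=${f[1]} ;;
            3) h=${f[0]}; m=${f[1]}; s=${f[2]} ;;
        esac
    fi
    # 10# forces base 10 so values like "08" are not parsed as octal.
    echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

DEV_MAX_SECONDS=$(( 12 * 3600 ))          # 12-hour dev partition maximum

fits_dev_limit() {
    local secs
    secs=$(slurm_time_to_seconds "$1") || return 2
    (( secs <= DEV_MAX_SECONDS ))
}
```

Usage: `fits_dev_limit 04:00:00 && srundev --time=04:00:00`. Note that a bare number such as 10 means minutes, so omitting --time leaves you with the 10-minute default.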

Pre-Allocation of Resources

Finally, you can use salloc to create a SLURM allocation under which you can run commands or scripts with srun for as long as the allocation is valid. Whatever you srun under the allocation runs within the context of the allocated resources without any wait, since the resources have already been allocated.

For example:

$ salloc -n 1 --cpus-per-task=2 --mem=8gb --time=10:00:00
salloc: Pending job allocation 52219029
salloc: job 52219029 queued and waiting for resources
salloc: job 52219029 has been allocated resources
salloc: Granted job allocation 52219029

[chasman@login4 slurm]$ printenv | grep SLURM
SLURM_NODELIST=c6a-s26
SLURM_JOB_NAME=bash
SLURM_NODE_ALIASES=(null)
SLURM_JOB_QOS=ufhpc
SLURM_NNODES=1
SLURM_JOBID=52219029
SLURM_NTASKS=1
SLURM_TASKS_PER_NODE=1
SLURM_CPUS_PER_TASK=2
SLURM_JOB_ID=52219029
SLURM_SUBMIT_DIR=/home/chasman
SLURM_NPROCS=1
SLURM_JOB_NODELIST=c6a-s26
SLURM_CLUSTER_NAME=hipergator
SLURM_JOB_CPUS_PER_NODE=2
SLURM_SUBMIT_HOST=login4.ufhpc
SLURM_JOB_PARTITION=hpg-default
SLURM_JOB_ACCOUNT=ufhpc
SLURM_JOB_NUM_NODES=1
SLURM_MEM_PER_NODE=8192
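The variables in the printenv listing above can be used to size programs to the allocation. This is a minimal sketch, assuming that SLURM_MEM_PER_NODE is reported in megabytes (8192 for --mem=8gb, as shown) and that your tool honors OMP_NUM_THREADS; the function name alloc_summary is hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: read allocation details from SLURM variables, falling
# back to conservative defaults when run outside an allocation.
alloc_summary() {
    local cpus="${SLURM_CPUS_PER_TASK:-1}"          # CPUs per task
    local mem_mb="${SLURM_MEM_PER_NODE:-unset}"     # memory per node, MB
    local nodes="${SLURM_JOB_NODELIST:-$HOSTNAME}"  # allocated node(s)
    printf 'cpus=%s mem_mb=%s nodes=%s\n' "$cpus" "$mem_mb" "$nodes"
}

# Many multithreaded tools read OMP_NUM_THREADS; matching it to the
# allocation avoids using more CPUs than were granted.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
```

Inside the allocation shown above, alloc_summary would report cpus=2 and mem_mb=8192 for node c6a-s26.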

Interactive Terminals

You can also get a more familiar interactive terminal, similar to one you would open on a local workstation, through a web or GUI environment. In Open OnDemand you can choose between a terminal session and a full Linux desktop (Xfce4), and JupyterHub offers a terminal inside a JupyterLab session.

Avoiding Session Interruption

Terminal multiplexers let you run multiple sessions simultaneously and keep them running if your connection drops. HiPerGator offers options for terminal multiplexing; see Persistent Terminal Sessions for more details.