SAS is a commercial integrated system for statistical analysis, data mining, and graphics, along with many enterprise-oriented features. Because of its cost and the breadth of its features, mastering SAS requires a significant investment of both money and time. The SAS 9.3 documentation is vast. For research purposes, the SAS 9.3 User's Guide is a thorough reference for the functions and procedures you may need for statistical analysis with SAS.
Execution Environment and Modules
To use SAS with the environment modules system at HPC, the following commands are available:
Get module information for sas:
$ module spider sas
Load the default application module:
$ module load sas
The modulefile for this software adds the directory with executable files to the shell execution PATH and sets the following environment variables:
- HPC_SAS_DIR - directory where sas is located.
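A quick way to confirm that the module is loaded is to check the variable it sets. The following is a minimal sketch; check_sas_env is a hypothetical helper name, and the only thing it relies on is the HPC_SAS_DIR variable described above.

```shell
# Report where the sas module placed SAS, or explain that it is not loaded.
# check_sas_env is a hypothetical helper; it relies only on HPC_SAS_DIR,
# the variable the modulefile is documented to set.
check_sas_env() {
    if [ -n "${HPC_SAS_DIR:-}" ]; then
        echo "SAS module loaded: HPC_SAS_DIR=${HPC_SAS_DIR}"
    else
        echo "HPC_SAS_DIR is not set; run 'module load sas' first"
        return 1
    fi
}

# Run the check; '|| true' keeps a surrounding script going if SAS is absent.
check_sas_env || true
```

Calling the function at the top of a job script makes a missing module load visible before SAS itself is invoked.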
How To Run
SAS accepts a number of command-line options, including:
- -nodms : Runs SAS in text-only mode instead of starting its GUI.
- -filelocks fail : Causes SAS to stop and print an error when multiple SAS processes try to use the same file.
- -nonews : Prevents SAS from printing the news header to the output.
- -memsize xxxxM : Specifies the total amount of memory available to each SAS session and enforces a limit on the amount of virtual memory that SAS can dynamically allocate at any one time.
- -realmemsize xxxxM : Sets the recommended upper limit on working memory for procedures that can use both memory and utility disk space, such as PROC SUMMARY and PROC SORT, so that they can avoid virtual memory thrashing.
- -work $TMPDIR : Sets the directory where SAS stores its temporary files. Using $TMPDIR lets your program run much faster and avoids the network-related file access issues that SAS is prone to.
- -sysin file : Designates a file for SAS to load as its input.
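The options above are usually combined on a single command line. As a rough sketch, the helper below only prints such a command so it can be inspected before use. The name sas_batch_cmd, the 1024 MB default, and the /tmp fallback for an unset TMPDIR are assumptions made for illustration.

```shell
# Print a SAS batch command line built from the options listed above.
# sas_batch_cmd is a hypothetical helper: it prints the command, it does not run SAS.
#   $1 = SAS program file, $2 = memory limit in MB (assumed default: 1024)
sas_batch_cmd() {
    program=$1
    mem_mb=${2:-1024}
    # Fall back to /tmp when TMPDIR is unset (an assumption for interactive use).
    workdir=${TMPDIR:-/tmp}
    echo "sas -nodms -nonews -memsize ${mem_mb}M -work ${workdir} -sysin ${program}"
}

# Example: show the command for analysis.sas with a 2048 MB limit.
sas_batch_cmd analysis.sas 2048
```

Printing the command instead of executing it makes it easy to paste the result into a job script after checking the memory and work directory values.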
To run a SAS script in batch mode, write all of your SAS commands in a file and pass that file with the -sysin command-line option. You can then submit the job to the batch queue system to run it on the cluster.
SAS is a mature product with a long history. In a modern high-performance environment, this means that extra steps are needed to mitigate potential issues stemming from SAS's focus on filesystem I/O rather than memory.
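For illustration, a small input file for -sysin could be created as follows. The data set and its values are made up for this sketch; the file name sas.inp matches the one used in the sample job script.

```shell
# Write a tiny SAS program to sas.inp. The quoted 'EOF' keeps the shell
# from expanding anything inside the SAS code.
cat > sas.inp <<'EOF'
/* Illustrative only: read three inline observations and summarize them. */
data scores;
    input id score;
    datalines;
1 90
2 85
3 77
;
run;

proc means data=scores;
    var score;
run;
EOF

# Confirm that the file was written.
echo "Wrote $(wc -l < sas.inp) lines to sas.inp"
```

The resulting file can then be passed to SAS with -sysin sas.inp.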
To connect to a test node, for example test01, where you can run the SAS graphical interface, use the following or a similar command on a Linux or Mac OS X system:
ssh USER@test01.ufhpc -o ForwardX11=yes -o ForwardX11Trusted=yes -o ProxyCommand='ssh USER@submit.hpc.ufl.edu exec nc test01 %p'
or add the following to your ~/.ssh/config file:
Host test01
    User USER
    KeepAlive yes
    ProxyCommand ssh USER@submit.hpc.ufl.edu exec nc test01 %p
    ForwardX11 yes
    ForwardX11Trusted yes
where USER is your username. After editing ~/.ssh/config you can connect with:
ssh test01
Once connected, run:
module load sas
sas
PBS Script Examples
Sample Job Script
#!/bin/bash
#
#PBS -N Research
#PBS -o market.out
#PBS -e market.err
#PBS -m abe
#PBS -M <EMAIL ADDRESS>
#PBS -l nodes=1:ppn=1
#PBS -l pmem=1gb
#PBS -l walltime=01:00:00

module load sas
cd /scratch/hpc/USERNAME
sas -memsize 1024M -nodms -nonews -work $TMPDIR -filelocks none -sysin sas.inp
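The sample script requests pmem=1gb and passes -memsize 1024M, so the two values have to be kept in step when the request changes. One possible way to derive the SAS value from the PBS one is sketched below. The helper name pmem_to_memsize is hypothetical, and it assumes that only lowercase gb and mb suffixes are used.

```shell
# Convert a PBS pmem value such as "1gb" or "512mb" into a -memsize argument.
# pmem_to_memsize is a hypothetical helper; only the gb and mb suffixes are handled.
pmem_to_memsize() {
    case $1 in
        *gb) echo "$(( ${1%gb} * 1024 ))M" ;;
        *mb) echo "${1%mb}M" ;;
        *)   echo "unsupported pmem value: $1" >&2; return 1 ;;
    esac
}

pmem_to_memsize 1gb   # prints 1024M
```

Whether -memsize should match the full request or leave some headroom for SAS overhead is a judgment call that this sketch does not settle.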