Econometrics Laboratory

Help Documentation


How do I use the EML Linux cluster?


The EML operates a high-performance Linux-based computing cluster. As of June 29, 2017, this cluster combines our former "production" cluster and our former "high priority" cluster, and the combined cluster uses the SLURM queueing software to manage jobs. What was formerly known as our "production" cluster is now the low priority partition in SLURM, and what was formerly the "high priority" cluster is now the high priority partition. For those familiar with the low and high priority queues under SGE on the production cluster, the low priority partition plays the role of the low priority SGE queue and the high priority partition plays the role of the high priority SGE queue. However, jobs on the two partitions now run on separate cluster nodes.

The regular priority (low) partition has eight nodes, each with two 16-core CPUs available for compute jobs (i.e., 32 cores per node), for a total of 256 cores. Each node has 248 GB of dedicated RAM.

The high priority partition provides faster cores than the regular priority partition (but with more restricted usage to ensure that the nodes are likely to be available when a user needs to run a job). This partition has four nodes, each with two 14-core CPUs available for compute jobs (i.e., 28 cores per node). Each core has two hyperthreads, for a total of 224 processing units. In the remainder of this document, we'll refer to these processing units as 'cores'. Each node has 132 GB dedicated RAM. As currently set up, this partition is limited to at most 28 cores per user at a single time, whether in a single job or spread across multiple jobs. 

Both partitions are managed by the SLURM queueing software. SLURM provides a standard batch queueing system through which users submit jobs to the cluster. Jobs are submitted to SLURM using a user-defined shell script that executes one's application code. Interactive use is also an option. Users may also query the cluster to see job status. As currently set up, the cluster is designed for processing single-core and multi-core/threaded jobs, as well as distributed memory jobs that use MPI. All software running on EML Linux machines is available on the cluster. Users can also compile programs on any EML Linux machine and then run that program on the cluster.

Below is more detailed information about how to use the cluster.

Access and Job Restrictions/Time Limits

The cluster is open to a restricted set of Department of Economics faculty, grad students, project account collaborators, and visitors using their EML logon.

Currently users may submit jobs on the following submit hosts:

  blundell, frisch, hicks, jorgensen, klein, laffont, logit, marlowe, marshall, nerlove, radner, theil, durban

The cluster has two job queues (called partitions in SLURM), named 'low' and 'high'. Interactive jobs can be run in either partition.

One important rule in using the cluster is that your code should not use any more cores than you have requested via SLURM when submitting your job. If your code uses more than one core, you must follow the instructions given in the section on Submitting Parallel Jobs.

At the moment the default time limit on each job is five days of run-time, but users can request a longer limit (up to a maximum of 28 days) with the -t flag. The scheduling software can better balance usage across multiple users when it has information about how long each job is expected to take, so if possible please indicate a time limit for your job even if it is less than five days. Feel free to be generous in this time limit to avoid having your job killed if it runs longer than expected. Also feel free to set the time limit as a round number, such as 1 hour, 4 hours, 1 day, 3 days, 10 days, or 28 days, rather than trying to be more exact.

This table outlines the job restrictions in each partition.

Partition/Queue   Max. # cores per user (running)   Time limit   Max. memory/job   Max. # cores/job
low (default)     256                               28 days*     264 GB            32**
high              28                                28 days*     128 GB            28

 

* See the Section on "Submitting Long Jobs" for jobs you expect to take more than three days.

** If you use MPI (including foreach with doMPI in R), you can run individual jobs on more than 32 cores. See the Section on "Submitting MPI Jobs" for such cases. You can also use up to 64 cores for a single Matlab job; see the Section on "Submitting MATLAB DCS Jobs". 

We have implemented a 'fair share' policy that governs the order in which jobs that are waiting in a given queue start when cores become available. In particular, if two users each have a job sitting in a queue, the job that will start first will be that of the user who has made less use of the cluster recently (measured in terms of CPU time). The measurement of CPU time downweights usage over time, with a half-life of one month, so a job that ran a month ago will count half as much as a job that ran yesterday. Apart from this prioritization based on recent use, all users are treated equally.

Basic SLURM Usage

SLURM and SGE Syntax Comparison

If you're used to the SGE commands on our former production cluster, the following table summarizes the analogous commands under SLURM.

Comparison of SGE and SLURM commands

 

SGE                       SLURM
qsub job.sh               sbatch job.sh
qstat                     squeue
qdel {job_id}             scancel {job_id}
qrsh -q interactive.q     srun --pty /bin/bash
                          srun --pty --x11=first {command}

Submitting a Simple Single-Core Job

Prepare a shell script containing the instructions you would like the system to execute. When submitted using the instructions in this section, your code should only use a single core at a time; it should not start any additional processes. In the later sections of this document, we describe how to submit jobs that use a variety of types of parallelization.

For example, a simple script to run the Matlab code in the file 'simulate.m' would contain these lines:

#!/bin/bash
matlab -nodisplay -nodesktop -singleCompThread < simulate.m > simulate.out

Note that the first line, indicating which UNIX shell to use, is required. You can specify tcsh or another shell if you prefer.

Once logged onto a submit host, use the sbatch command with the name of the shell script (assumed to be job.sh here) to enter a job into the queue:

theil:~/Desktop$ sbatch job.sh
Submitted batch job 380

Here the job is assigned job ID 380. Results that would normally be printed to the screen via standard output and standard error will be written to a file called simulate.out per the invocation of Matlab in the job script.

Important: when submitting jobs in this fashion you must follow the instructions below to ensure your job uses only the single core requested.

Any Matlab jobs submitted in this fashion must start Matlab with the -singleCompThread flag in your job script.

Similarly, any SAS jobs submitted in this fashion must start SAS with the following flags in your job script.

sas -nothreads -cpucount 1

For Stata, if you submit a job without requesting multiple cores, please make sure to use Stata/SE so that Stata only uses a single core.
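For example, a minimal single-core Stata job script might look like the following sketch (this assumes the plain 'stata' command launches Stata/SE on EML machines, and 'myStataCode.do' is a placeholder for your own do-file):

#!/bin/bash
# Run Stata (Stata/SE) in batch mode; uses a single core
stata -b do myStataCode.do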

SLURM provides a number of additional flags (input options) to control what happens; you can see the man page for sbatch for help with these. Here are some examples, placed in the job script file, where we name the job, ask for email updates and name the output and error files:

#!/bin/bash
#SBATCH --job-name=myAnalysisName
#SBATCH --mail-type=ALL                       
#SBATCH --mail-user=blah@berkeley.edu
#SBATCH -o myAnalysisName.out #File to which standard out will be written
#SBATCH -e myAnalysisName.err #File to which standard err will be written
matlab -nodisplay -nodesktop -singleCompThread < simulate.m > simulate.out

Any of these sbatch flags may be included in the job script as shown above, or given on the command line when you submit the job, just after 'sbatch' and before the name of the submission script, for example:

theil:~/Desktop$ sbatch --job-name=foo --mail-user=blah@berkeley.edu job.sh

How to Kill a Job

First, find the job ID of the job by typing 'squeue' at the command line of a submit host (see the section on 'How to Monitor Jobs' below).

Then use scancel to delete the job (with id 380 in this case):

theil:~/Desktop$ scancel 380

Submitting a High-Priority Job

To submit a job to the faster nodes in the high priority partition, you must include either the '--partition=high' or '-p high' flag. Without this flag, jobs will be run by default in the low partition. For example:

theil:~/Desktop$ sbatch -p high job.sh
Submitted batch job 380

You can also submit interactive jobs (see next section) to the high partition, by simply adding the flag for the high partition to the srun command.
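For example, to start an interactive bash shell on the high priority partition:

srun --pty -p high /bin/bash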

Interactive Jobs

You can work interactively on a node from the Linux shell command line by starting an interactive job (in either the low or high priority partitions). Please do not forget to close your interactive sessions when you finish your work so the cores are available to other users.

The syntax for requesting an interactive (bash) shell session is:

srun --pty /bin/bash

This will start a shell on one of the nodes. You can then act as you would on any EML Linux compute server. For example, you might use top to assess the status of one of your non-interactive (i.e., batch) cluster jobs. Or you might test some code before running it as a batch job. You can also transfer files to the local disk of the cluster node.

If you want to run a program that involves a graphical interface (requiring an X11 window), you need to add --x11=first to your srun command. So you could directly run Matlab, e.g., on a cluster node as follows:

srun --pty --x11=first matlab 

or you could add the --x11=first flag when requesting an interactive shell session and then subsequently start a program that has a graphical interface.
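For example, to request an interactive shell with X11 forwarding enabled:

srun --pty --x11=first /bin/bash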

Please note that you should only use one core in your interactive job unless you specifically request more cores. To run an interactive session in which you would like to use multiple cores, do the following (here we request 4 cores for our use):

srun --pty --cpus-per-task 4 /bin/bash 

Note that "-c" is a shorthand for "--cpus-per-task". More details on jobs that use more than one core can be found below in the section on Submitting Parallel Jobs.

To monitor a job on a specific node or transfer files to the local disk of a specific node, you need to request that your interactive session be started on the node of interest (in this case eml-sm10):

srun --pty -w eml-sm10 /bin/bash

Note that if that specific node does not have sufficient free cores to run your job, you will need to wait until cores become available on that node before your interactive session will start. The squeue command (see below in the section on How to Monitor Jobs) will tell you on which node a given job is running.

Submitting Long Jobs and Setting Job Time Limits

As mentioned earlier, the default time limit on each job is five days of run-time, but users can request a longer limit (up to a maximum of 28 days) with the -t flag, as illustrated here to request a 10-day limit:

theil:~/Desktop$ sbatch -t 10-00:00:00 job.sh

The scheduling software can better balance usage across multiple users when it has information about how long each job is expected to take, so if possible please indicate a time limit for your job even if it is less than the five-day default. Feel free to be generous in this time limit to avoid having your job killed if it runs longer than expected. (For this reason, if you expect your job to take close to or more than five days, you may want to increase the limit relative to the five-day default.) Also feel free to set the time limit as a round number, such as 1 hour, 4 hours, 1 day, 3 days, 10 days, or 28 days, rather than trying to be more exact.

Here is an example of requesting three hours for a job:

theil:~/Desktop$ sbatch -t 3:00:00 job.sh
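As with any sbatch flag, you can instead set the time limit inside the job script itself. For example, this script requests a three-hour limit for the single-core Matlab job shown earlier:

#!/bin/bash
#SBATCH -t 3:00:00
matlab -nodisplay -nodesktop -singleCompThread < simulate.m > simulate.out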

How to Monitor Jobs

The SLURM command squeue provides info on job status:

theil:~/Desktop> squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               381      high   job.sh paciorek  R      25:28      1 eml-sm20
               380      low    job.sh paciorek  R      25:37      1 eml-sm11

The following will tailor the output to include information on the number of cores (the CPUs column below) being used:

theil:~/Desktop> squeue -o "%.7i %.9P %.8j %.8u %.2t %.9M %.5C %.8r %.6D %R"
   JOBID PARTITION     NAME     USER ST      TIME  CPUS   REASON  NODES NODELIST(REASON)
     381      high   job.sh paciorek  R     28:00     4     None      1 eml-sm20
     380      low    job.sh paciorek  R     28:09     4     None      1 eml-sm11 

The 'ST' field indicates whether a job is running (R), failed (F), or pending (PD). The latter occurs when there are not yet enough resources on the system for your job to run.

Submitting Parallel Jobs

One can use SLURM to submit a variety of types of parallel code. Here is a set of potentially useful templates that we expect will account for most user needs. If you have a situation that does not fall into these categories, have questions about parallel programming or submitting jobs that use more than one core, or are not sure how to follow these rules, please email consult@econ.berkeley.edu.

For additional details, please see the notes from SCF workshops on the basics of parallel programming in R, Python, Matlab and C, with some additional details on using the cluster. If you're making use of the threaded BLAS, it's worth doing some testing to make sure that threading is giving a non-negligible speedup; see the notes above for more information.

Submitting Threaded Jobs

Here's an example job script to use multiple threads (4 in this case) in R (or with your own OpenMP-based program):

#!/bin/bash
#SBATCH --cpus-per-task 4
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
R CMD BATCH --no-save simulate.R simulate.Rout

This will allow your R code to use the system's threaded BLAS and LAPACK routines. [Note that in R you can instead use the omp_set_num_threads() function in the RhpcBLASctl package, again making use of the SLURM_CPUS_PER_TASK environment variable.]

The same syntax in your job script will work if you've compiled a C/C++/Fortran program that makes use of OpenMP for threading. Just replace the R CMD BATCH line with the line calling your program, as sketched below.
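For example (a minimal sketch; 'myOpenMPprogram' is a placeholder name for your own compiled OpenMP executable):

#!/bin/bash
#SBATCH --cpus-per-task 4
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# Run your own OpenMP-threaded executable with the requested number of threads
./myOpenMPprogram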

Here's an example job script to use multiple threads (4 in this case) in Matlab:

#!/bin/bash
#SBATCH --cpus-per-task 4
matlab -nodesktop -nodisplay < simulate.m > simulate.out

IMPORTANT: At the start of your Matlab code file you should include this line:

feature('numThreads', str2num(getenv('SLURM_CPUS_PER_TASK')));

Threading in Matlab uses at most 16 cores, so do not request more than 16 cores for a Matlab job that exploits threaded calculations.

Here's an example job script to use multiple threads (4 in this case) in SAS:

#!/bin/bash
#SBATCH --cpus-per-task 4
sas -threads -cpucount $SLURM_CPUS_PER_TASK

A number of SAS procs are set to take advantage of threading, including SORT, SUMMARY, TABULATE, GLM, LOESS, and REG. SAS enables threading by default, with the default number of threads set to four. Starting SAS as above ensures that the number of threads is set to the number of cores you requested. You can check that threading is enabled from within SAS by running the following and looking for the cpucount and threads options in the printout.

Proc Options group=performance; run;

You can use up to eight cores with Stata/MP (limited by the EML license for Stata). If you request eight cores (--cpus-per-task=8), you are all set. If you request fewer than eight (i.e., 2-7), you need to set the number of processors in Stata to be the number you requested in your job submission. One way to do this is to hard-code the following line at the start of your Stata code (in this case assuming you requested four cores):

set processors 4

You can do this in an automated fashion by first invoking Stata in your job script as:

stata-mp -b do myStataCode.do ${SLURM_CPUS_PER_TASK}

and then at the start of your Stata code including these two lines:

args ncores
set processors `ncores'

Submitting Multi-core Jobs

The following example job script files pertain to jobs that need to use multiple cores on a single node in ways that do not fall under the threading/OpenMP context. This is relevant for parallel code in R that starts multiple R processes (e.g., foreach, mclapply, parLapply), for parfor in Matlab, and for IPython parallel, Pool.map, and pp.Server in Python.

Here's an example script that uses multiple cores (4 in this case):

#!/bin/bash
#SBATCH --cpus-per-task 4
R CMD BATCH --no-save simulate.R simulate.Rout

IMPORTANT: Your R, Python, or any other code should use no more than the number of total cores requested (4 in this case). You can use the SLURM_CPUS_PER_TASK environment variable to programmatically control this.
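For example, one way (a sketch, not the only approach) is to pass the value through to your code as a command-line argument; here the R script (simulate.R) is assumed to read it via commandArgs():

#!/bin/bash
#SBATCH --cpus-per-task 4
# Pass the requested core count to the R script so it starts no more workers than requested
R CMD BATCH --no-save "--args ncores=$SLURM_CPUS_PER_TASK" simulate.R simulate.Rout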

The same syntax for your job script pertains to Matlab. IMPORTANT: when using parpool in Matlab, you should do the following:

parpool(str2num(getenv('SLURM_CPUS_PER_TASK')))

To use more than 32 workers (32 cores) in Matlab in a parpool context (or to use cores spread across multiple nodes), you need to use MATLAB DCS, discussed below. 

Submitting MPI Jobs

You can use MPI to run jobs across multiple nodes. This modality allows you to use more cores than exist on a single node or to gather free cores that are scattered across the nodes when the cluster is heavily used. (Note that on the high priority partition, given the 28-core limit per user, this is primarily useful for when there are not sufficient free cores on a single node for your job.) 

Here's an example script that uses multiple processors via MPI (28 in this case):

#!/bin/bash
#SBATCH --ntasks 28
mpirun -np $SLURM_NTASKS myMPIexecutable

Note that "-n" is a shorthand for "--ntasks". 

"myMPIexecutable" could be C/C++/Fortran code you've written that uses MPI, or R or Python code that makes use of MPI. More details are available here. One simple way to use MPI is to use the doMPI back-end to foreach in R. In this case you invoke R via mpirun as:

mpirun -np $SLURM_NTASKS R CMD BATCH file.R file.out

Note that in this case, unlike some other invocations of R via mpirun, mpirun starts all of the R processes.

Another use case for R in a distributed computing context is to use functions such as parSapply and parLapply after using the makeCluster command with a character vector indicating the nodes allocated by SLURM. If you run the following as part of your job script before the command invoking R, the file slurm.hosts will contain a list of the node names that you can read into R and pass to makeCluster.

srun hostname -s > slurm.hosts
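Putting this together, a job script might look like the following sketch ('cluster_code.R' is a hypothetical script that reads slurm.hosts and passes the node names to makeCluster()):

#!/bin/bash
#SBATCH --ntasks 28
# Record the allocated node names (one line per task) for use by makeCluster() in R
srun hostname -s > slurm.hosts
R CMD BATCH --no-save cluster_code.R cluster_code.Rout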

To run an MPI job with each process threaded, your job script would look like the following (here with 14 processes and two threads per process):

#!/bin/bash
#SBATCH --ntasks 14 --cpus-per-task 2
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
mpirun -np $SLURM_NTASKS -x OMP_NUM_THREADS myMPIexecutable

Submitting MATLAB DCS Jobs

MATLAB DCS allows you to run a parallel MATLAB job across multiple nodes. You can use up to 64 cores, but since we only have licenses for 64 cores, if other users are using some of the license slots, your job may be queued until sufficient license slots become available.

To submit a MATLAB DCS job, you'll need to specify the number of MATLAB workers by using the -n flag and also indicate that a DCS job will be run using '-C dcs'. Here's an example job script:

#!/bin/bash
#SBATCH --ntasks 14 -C dcs
matlab -nodesktop -nodisplay < code.m > code.mout

Then in your MATLAB code, simply invoke parpool as:

pool = parpool(str2num(getenv('SLURM_NTASKS')))

Note that the workers will run as the user "matlabdcs", so if you interactively log into a node with some of the workers on it, you will see MATLAB processes running as this user if you use 'top' or 'ps'.

Automating Submission of Multiple Jobs

Using Job Arrays to Submit Multiple jobs at Once

Job array submissions are a nice way to submit multiple jobs in which you vary a parameter across the different jobs.

Here's what your job script would look like, in this case to run a total of 5 jobs with parameter values of 0, 1, 2, 5, 7:

#!/bin/bash
#SBATCH -a 0-2,5,7
myExecutable

Your program should then make use of the SLURM_ARRAY_TASK_ID environment variable, which for a given job will contain one of the values from the set given with the -a flag (in this case from {0,1,2,5,7}). You could, for example, read SLURM_ARRAY_TASK_ID into your R, Python, Matlab, or C code.

Here's a concrete example where SLURM_ARRAY_TASK_ID is sufficient to distinguish the input files, when you need to run the same command (the bioinformatics program tophat in this case) on multiple input files (trans0.fq, trans1.fq, ...):

#!/bin/bash
#SBATCH -a 0-2,5,7
tophat BowtieIndex trans${SLURM_ARRAY_TASK_ID}.fq
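Similarly, here's a sketch of an array job that passes the task ID through to an R batch job ('simulate.R' is assumed to read the value with Sys.getenv('SLURM_ARRAY_TASK_ID') to select its parameter value):

#!/bin/bash
#SBATCH -a 0-2,5,7
# Each array task writes its output to a distinct file named after its task ID
R CMD BATCH --no-save simulate.R simulate-${SLURM_ARRAY_TASK_ID}.Rout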

Submitting Data Parallel (SPMD) Code

Here's how you would set up your job script if you want to run multiple instances (18 in this case) of the same code as part of a single job.

#!/bin/bash
#SBATCH --ntasks 18 
srun myExecutable

To have each instance behave differently, you can make use of the SLURM_PROCID environment variable, which will be distinct (and have values 0, 1, 2, ...) between the different instances.
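For example, 'myExecutable' could itself be a small wrapper script (a hypothetical sketch) that uses SLURM_PROCID to give each instance its own output file:

#!/bin/bash
# Hypothetical contents of myExecutable: each of the 18 instances sees a distinct
# SLURM_PROCID (0, 1, 2, ...) and writes its results to its own output file.
R CMD BATCH --no-save simulate.R simulate-${SLURM_PROCID}.Rout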

To have each process be threaded, see the syntax under the MPI section above.

"Manually" Automating Job Submission

The above approaches are more elegant, but you can also use UNIX shell tools to submit multiple SLURM jobs. Here are some approaches and example syntax. We've tested these a bit but email consult@econ.berkeley.edu if you have problems or find a better way to do this. (Of course you can also manually create lots of individual job submission scripts, each of which calls a different script.)

First, remember that each individual job must be submitted through sbatch. I.e., no job submission script should execute jobs in parallel, except via the mechanisms discussed earlier in this document.

Here is some example bash shell code (which could be placed in a shell script file) that loops over two variables (one numeric and the other a string):

for ((it = 1; it <= 10; it++)); do
    for mode in short long; do
        sbatch job.sh $it $mode
    done
done

You now have a couple of options in terms of how job.sh is specified. The following illustrates this for Matlab jobs, but it shouldn't be too hard to modify for other types of jobs.

Option #1:

#!/bin/bash
# contents of job.sh

echo "it = $1; mode = '$2'; myMatlabCode" > tmp-$1-$2.m
matlab -nodesktop -nodisplay -singleCompThread < tmp-$1-$2.m > tmp-$1-$2.out 2> tmp-$1-$2.err

In this case myMatlabCode.m would use the variables 'it' and 'mode' but not define them.

Option #2:

#!/bin/bash
# contents of job.sh

export it=$1; export mode=$2;
matlab -nodesktop -nodisplay -singleCompThread < myMatlabCode.m > tmp-$1-$2.out 2> tmp-$1-$2.err

In this case you need to insert the following Matlab code at the start of myMatlabCode.m so that Matlab correctly reads the values of 'it' and 'mode' from the UNIX environment variables:

it = str2num(getenv('it'));
mode = getenv('mode');

For Stata jobs, there's an easier mechanism for passing arguments into a batch job. Invoke Stata as follows in job.sh:

stata -b do myStataCode.do $1 $2

Then, in the first line of your Stata code file (myStataCode.do above), assign the input values to variables (in this case named 'it' and 'mode' to match the shell variables, but they can be named differently):

args it mode

The remainder of your code can then make use of these variables (referenced as `it' and `mode').