Econometrics Laboratory

Help Documentation


How do I use the EML High Priority Linux cluster?


In addition to our 256-core "production" cluster, the EML operates a 224-processing-unit high-priority Linux cluster that provides faster cores than the production cluster for certain research projects.

The cluster has four nodes, each with two 14-core CPUs available for compute jobs (i.e., 28 physical cores per node). Each core has two hyperthreads, for a total of 224 processing units; in the remainder of this document, we'll refer to these processing units as 'cores'. Each node has 132 GB of dedicated RAM.

The cluster is managed by the SLURM queueing software, which provides a standard batch queueing system through which users submit jobs. Jobs are typically submitted using a user-defined shell script that executes one's application code; interactive use is also an option, and users may query the cluster to see job status.

As currently set up, the cluster is designed for processing single-core and multi-core/threaded jobs (at most 28 cores per user, whether in a single job or spread across multiple jobs), as well as distributed-memory jobs that use MPI. All software running on EML Linux machines is available on the cluster, and users can compile a program on any EML Linux machine and then run it on the cluster.

Below is more detailed information about the cluster and how to use it.

Access and Job Restrictions

The cluster is open to a restricted set of Department of Economics faculty, grad students, project account collaborators, and visitors using their EML logon.

Currently users may submit jobs on the following submit hosts:

  blundell, frisch, hicks, jorgensen, klein, laffont, logit, marlowe, marshall, nerlove, radner, theil, durbin

The cluster has a single job queue (called a partition in SLURM) named 'high'. Interactive jobs can also be run.

At the moment the default time limit on each job is three days of run-time, but users can request a longer limit (up to a maximum of 28 days) with the -t flag. The scheduling software can better balance usage across multiple users when it has information about how long each job is expected to take, so if possible please indicate a time limit for your job even if it is less than three days. Feel free to be generous in this time limit to avoid having your job killed if it runs longer than expected, and feel free to set the limit as a round number, such as 1 hour, 4 hours, 1 day, 3 days, 10 days, or 28 days, rather than trying to be more exact.
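
For example, here is a minimal sketch of a job script that sets a 4-hour limit (myExecutable is a placeholder for your own program):

#!/bin/bash
#SBATCH -t 4:00:00        # 4-hour limit; days-hours:minutes:seconds also works, e.g., -t 10-00:00:00
./myExecutable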

We have implemented a 'fair share' policy that governs the order in which jobs that are waiting in a given queue start when resources become available. In particular, if two users each have a job sitting in a queue, the job that will start first will be that of the user who has made less use of the cluster recently (measured in terms of CPU time). The measurement of CPU time downweights usage over time, with a half-life of one month, so a job that ran a month ago will count half as much as a job that ran yesterday. Apart from this prioritization based on recent use, all users are treated equally.

SLURM and SGE Syntax Comparison

If you're used to the SGE commands on our production cluster, the following table summarizes the analogous commands under SLURM.

Comparison of SGE and SLURM commands

    SGE                        SLURM
    qsub job.sh                sbatch job.sh
    qstat                      squeue
    qdel {job_id}              scancel {job_id}
    qrsh -q interactive.q      srun --pty /bin/bash
                               srun --pty --x11=first {command}

How to Submit a Simple Single-core Job

Prepare a shell script containing the instructions you would like the system to execute. When submitted using the instructions in this section, your code should only use a single core at a time; it should not start any additional processes. In the later sections of this document, we describe how to submit jobs that use a variety of types of parallelization.

For example, a simple script to run the Matlab code in the file 'simulate.m' would contain these lines:

#!/bin/bash
matlab -nodisplay -nodesktop -singleCompThread < simulate.m > simulate.out

Once logged onto a submit host, use the sbatch command with the name of the shell script (assumed to be job.sh here) to enter a job into the queue:

theil:~/Desktop$ sbatch job.sh
Submitted batch job 380

Here the job was assigned job ID 380. Results that would normally be printed to the screen via standard output and standard error will be written to the file simulate.out, per the redirection in the job script.

Any Matlab jobs submitted in this fashion must start Matlab with the -singleCompThread flag in your job script.

Similarly, any SAS jobs submitted in this fashion must start SAS with the following flags in your job script.

sas -nothreads -cpucount 1
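
For example, a complete single-core SAS job script might look like the following sketch (mySasProgram.sas is a placeholder for your own SAS program file):

#!/bin/bash
# Run SAS on a single core
sas -nothreads -cpucount 1 mySasProgram.sas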

For Stata, if you submit a job without requesting multiple cores, please make sure to use Stata/SE so that Stata only uses a single core.
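
For example, here is a minimal sketch of a single-core Stata job script, assuming Stata/SE is invoked as stata-se (analogous to the stata-mp command shown later; myStataCode.do is a placeholder for your own do-file):

#!/bin/bash
# Run Stata/SE in batch mode; it uses a single core
stata-se -b do myStataCode.do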

SLURM provides a number of additional flags to control how your job runs; see the man page for sbatch for details. Here are some examples, placed in the job script file, where we name the job, request email updates, and name the output and error files:

#!/bin/bash
#SBATCH --job-name=myAnalysisName
#SBATCH --mail-type=ALL                       
#SBATCH --mail-user=blah@berkeley.edu
#SBATCH -o myAnalysisName.out #File to which standard out will be written
#SBATCH -e myAnalysisName.err #File to which standard err will be written
matlab -nodisplay -nodesktop -singleCompThread < simulate.m > simulate.out

How to Kill a Job

First, find the job ID of the job by typing squeue at the command line of a submit host (see the section on 'Monitoring Jobs' at the end of this document).

Then use scancel to delete the job (with id 380 in this case):

scancel 380

Interactive Jobs

You can work interactively on a node from the Linux shell command line by starting an interactive job.

The syntax for requesting an interactive (bash) shell session is:

srun --pty /bin/bash

This will start a shell on one of the four nodes. You can then act as you would on any EML Linux compute server. For example, you might use top to assess the status of one of your non-interactive (i.e., batch) cluster jobs. Or you might test some code before running it as a batch job. You can also transfer files to the local disk of the cluster node.
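
For instance, once your interactive shell has started, you might check on your own processes (a minimal illustration):

top -u $USER    # show only your own processes on this node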

If you want to run a program that involves a graphical interface (requiring an X11 window), you need to add --x11=first to your srun command. So you could directly run Matlab, e.g., as follows:

srun --pty --x11=first matlab 

or you could add the --x11=first flag when requesting an interactive shell session and then subsequently start a program that has a graphical interface.
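
For example, the two-step approach might look like this:

srun --pty --x11=first /bin/bash    # interactive shell with X11 forwarding
matlab                              # then start the graphical program from that shell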

To run an interactive session in which you would like to use multiple cores, do the following (here we request 4 cores for our use):

srun --pty --cpus-per-task 4 /bin/bash 

Note that "-c" is a shorthand for "--cpus-per-task".

To monitor a job on a specific node or transfer files to the local disk of a specific node, you need to request that your interactive session be started on the node of interest (in this case eml-sm20):

srun --pty -w eml-sm20 /bin/bash

Note that if that specific node has all of its cores in use, you will need to wait until resources become available on that node before your interactive session will start. The squeue command (see below in the section on monitoring jobs) will tell you on which node (eml-sm20, eml-sm21, eml-sm22, eml-sm23) a given job is running.

Finally, as with batch jobs, you can request multiple cores using -c, and you can change OMP_NUM_THREADS from its default of one, provided you make sure that the total number of cores used (the number of processes your code starts multiplied by the number of threads per process) does not exceed the number of cores you request.
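
For example, here is a minimal sketch of a threaded interactive workflow (myThreadedProgram is a placeholder for your own executable):

srun --pty -c 4 /bin/bash                    # request 4 cores for the interactive session
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK  # inside the session: allow one thread per allocated core
./myThreadedProgram                          # your own threaded program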

How To Monitor Jobs

The SLURM command squeue provides info on job status:

theil:~/Desktop> squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               381      high   job.sh paciorek  R      25:28      1 eml-sm20
               380      high   job.sh paciorek  R      25:37      1 eml-sm20

The following will tailor the output to include information on the number of cores (the CPUs column below) being used:

theil:~/Desktop> squeue -o "%.7i %.9P %.8j %.8u %.2t %.9M %.5C %.8r %.6D %R"
   JOBID PARTITION     NAME     USER ST      TIME  CPUS   REASON  NODES NODELIST(REASON)
     381      high   job.sh paciorek  R     28:00     4     None      1 eml-sm20
     380      high   job.sh paciorek  R     28:09     4     None      1 eml-sm20 

The 'ST' field indicates whether a job is running (R), failed (F), or pending (PD). The latter occurs when there are not yet enough resources on the system for your job to run.
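
To list only your own jobs, you can filter squeue by user:

squeue -u $USER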

Submitting Parallel Jobs

One can use SLURM to submit a variety of types of parallel code. Here is a set of potentially useful templates that we expect will account for most user needs. If your situation does not fall into these categories, if you have questions about parallel programming or about submitting jobs that use more than one core, or if you are not sure how to follow these guidelines, please email consult@econ.berkeley.edu.

For additional details, please see the notes from SCF workshops on the basics of parallel programming in R, Python, Matlab and C, which include some additional details on using the cluster; note, however, that the job submission details there are given in terms of SGE rather than SLURM. If you're making use of the threaded BLAS, it's worth doing some testing to make sure that threading is giving a non-negligible speedup; see the notes above for more information.

Submitting Threaded Jobs

Here's an example job script to use multiple threads (4 in this case) in R (or with your own OpenMP-based program):

#!/bin/bash
#SBATCH --cpus-per-task 4
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
R CMD BATCH --no-save simulate.R simulate.Rout

This will allow your R code to use the system's threaded BLAS and LAPACK routines. [Note that in R you can instead use the omp_set_num_threads() function in the RhpcBLASctl package, again making use of the SLURM_CPUS_PER_TASK environment variable.] The same syntax in your job script will work if you've compiled a C/C++/Fortran program that makes use of OpenMP for threading; just replace the R CMD BATCH line with the line calling your program.
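
For instance, here is a sketch for a compiled OpenMP executable (myOpenMPprogram is a placeholder for your own program):

#!/bin/bash
#SBATCH --cpus-per-task 4
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # match the thread count to the allocated cores
./myOpenMPprogram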

Here's an example job script to use multiple threads (4 in this case) in Matlab:

#!/bin/bash
#SBATCH --cpus-per-task 4
matlab -nodesktop -nodisplay < simulate.m > simulate.out

IMPORTANT: At the start of your Matlab code file you should include this line:

feature('numThreads', str2num(getenv('SLURM_CPUS_PER_TASK')));

Here's an example job script to use multiple threads (4 in this case) in SAS:

#!/bin/bash
#SBATCH --cpus-per-task 4
sas -threads -cpucount $SLURM_CPUS_PER_TASK

You can use up to eight cores with Stata/MP (limited by the EML license for Stata). If you request eight cores (--cpus-per-task=8), you are all set. If you request fewer than eight (i.e., 2-7), you need to set the number of processors in Stata to the number you requested in your job submission. One way to do this is to hard-code the following line at the start of your Stata code (in this case assuming you requested four cores):

set processors 4

You can do this in an automated fashion by first invoking Stata in your job script as:

stata-mp -b do myStataCode.do ${SLURM_CPUS_PER_TASK}

and then at the start of your Stata code including these two lines:

args ncores
set processors `ncores'

Submitting Multi-core Jobs

The following example job scripts pertain to jobs that need to use multiple cores on a single node but do not fall under the threading/OpenMP context. This is relevant for parallel code in R that starts multiple R processes (e.g., foreach, mclapply, parLapply), for parfor in Matlab, and for Pool.map and pp.Server in Python.

Here's an example script that uses multiple cores (4 in this case):

#!/bin/bash
#SBATCH --cpus-per-task 4
R CMD BATCH --no-save simulate.R simulate.Rout

IMPORTANT: Your R, Python, or any other code should use no more than the number of total cores requested (4 in this case). You can use the SLURM_CPUS_PER_TASK environment variable to programmatically control this.

The same syntax for your job script pertains to Matlab. IMPORTANT: when using parpool in Matlab, you should do the following:

parpool(str2num(getenv('SLURM_CPUS_PER_TASK')))

Submitting MPI Jobs

You can use MPI to run jobs across multiple nodes. Given the 28-core limit per user, this is primarily useful when there are not sufficient free cores on a single node for your job; it allows you to gather free cores that are scattered across the nodes.

Here's an example script that uses multiple processors via MPI (28 in this case):

#!/bin/bash
#SBATCH -n 28
mpirun -np $SLURM_NTASKS myMPIexecutable

"myMPIexecutable" could be C/C++/Fortran code you've written that uses MPI, or R or Python code that makes use of MPI. More details are available here. One simple way to use MPI is to use the doMPI back-end to foreach in R. In this case you invoke R via mpirun as:

mpirun -np $SLURM_NTASKS R CMD BATCH file.R file.out

Note that in this case, unlike some other invocations of R via mpirun, mpirun starts all of the R processes.

Another use case for R in a distributed computing context is to use functions such as parSapply and parLapply after using the makeCluster command with a character vector indicating the nodes allocated by SLURM. If you run the following as part of your job script before the command invoking R, the file slurm.hosts will contain a list of the node names that you can read into R and pass to makeCluster.

srun hostname -s > slurm.hosts

To run an MPI job with each process threaded, your job script would look like the following (here with 14 processes and two threads per process):

#!/bin/bash
#SBATCH -n 14 -c 2
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
mpirun -np $SLURM_NTASKS -x OMP_NUM_THREADS myMPIexecutable

Submitting Data Parallel (SPMD) Code

Here's how you would set up your job script if you want to run multiple instances (18 in this case) of the same code.

#!/bin/bash
#SBATCH -n 18 
srun myExecutable

To have each instance behave differently, you can make use of the SLURM_PROCID environment variable, which will be distinct (and have values 0, 1, 2, ...) between the different instances.
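
As a minimal illustration, the following sketch just has each instance print its own ID:

#!/bin/bash
#SBATCH -n 18
# Each of the 18 instances sees a distinct SLURM_PROCID (0 through 17)
srun bash -c 'echo "this is instance $SLURM_PROCID of $SLURM_NTASKS"'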

To have each process be threaded, see the syntax in the MPI section above.

Job Arrays: Submitting Multiple Jobs in an Automated Fashion

Job array submissions are a nice way to submit multiple jobs in which you vary a parameter across the different jobs.

Here's what your job script would look like, in this case to run a total of 5 jobs with parameter values of 0, 1, 2, 5, 7:

#!/bin/bash
#SBATCH -a 0-2,5,7
myExecutable

Your program should then make use of the SLURM_ARRAY_TASK_ID environment variable, which for a given job will contain one of the values from the set given with the -a flag (in this case from {0,1,2,5,7}). You could, for example, read SLURM_ARRAY_TASK_ID into your R, Python, Matlab, or C code.

Here's a concrete example where it's sufficient to use SLURM_ARRAY_TASK_ID to distinguish different input files if you need to run the same command (the bioinformatics program tophat in this case) on multiple input files (in this case, trans0.fq, trans1.fq, ...):

#!/bin/bash
#SBATCH -a 0-2,5,7
tophat BowtieIndex trans$SLURM_ARRAY_TASK_ID.fq