Stata

Stata is a general-purpose statistical software package that provides data management, statistical analysis, graphics, simulations, regression, and custom programming.

Stata is available as a module on Apocrita. It can be run from the command line or through a GUI. In both cases, it must be run via the cluster's queuing system.

Usage

To run the latest installed version of Stata, simply load the stata module:

$ module load stata
$ stata -h
stata-mp:  usage:  stata-mp [-h -q -s -b] ["stata command"]
        where:
             -h      show this display
             -q      suppress logo, initialization messages
             -s      "batch" mode creating .smcl log
             -b      "batch" mode creating .log file

Core Usage

To ensure that Stata uses the correct number of cores, use the set processors command, for example:

$ stata
. set processors 4
The maximum number of processors or cores being used is changed from 8 to 4.
It can be set to any number between 1 and 8

Licensing

The Stata/MP8 perpetual licence on Apocrita allows up to 8 cores per process and 6 concurrent users. When running the application, you can request a licence for 8 cores via -l stata.
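
Because the licence limits each process to 8 cores, a job script may want to cap the core count it passes to set processors. The following is a minimal sketch, not part of the official Apocrita guidance: the function name clamp_cores and the variable MAX_STATA_CORES are hypothetical, and NSLOTS is assumed to be set by the scheduler inside a job.

```shell
#!/bin/bash
# Hypothetical helper: keep a requested core count within 1..MAX_STATA_CORES.
# The limit of 8 comes from the Stata/MP8 licence described above.
MAX_STATA_CORES=8

clamp_cores() {
    local requested="$1"
    if [ "$requested" -lt 1 ]; then
        echo 1
    elif [ "$requested" -gt "$MAX_STATA_CORES" ]; then
        echo "$MAX_STATA_CORES"
    else
        echo "$requested"
    fi
}

# NSLOTS is assumed to be set by the scheduler; default to 1 outside a job.
ncores=$(clamp_cores "${NSLOTS:-1}")
echo "Stata would be given ${ncores} core(s)"
```

The resulting value of ncores could then be passed to Stata in place of the raw slot count.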

Example jobs

Interactive jobs

Interactive mode can be useful for diagnosing issues with your Stata code.

Text mode

Run qlogin then load the module and run the binary:

qlogin -l stata
module load stata
stata
set processors 1

GUI mode

Run qsh to get a session on a compute node and display an xterm window, then load the module and run the xstata binary:

qsh -l stata
module load stata
xstata

After starting xstata, run the following command to set the core usage correctly, e.g. for a single-core job:

set processors 1

X Forwarding Required

To run the Stata GUI, you must have logged in to Apocrita with X-windows forwarding enabled.

Additional memory and CPU requirements can be added to the qsh command in the same way as the qsub command, as per the section on submitting jobs.

Use of the login nodes

Please ensure stata and xstata are run only via the scheduling system; running on the login nodes is against the Usage Policy.

Serial job (batch mode)

Batch mode preferred

HPC clusters are designed to run queued jobs from the command line. The graphical and interactive modes should typically be used only for debugging or early testing. Once your job runs correctly, use command-line batch mode to submit it to the scheduler queue.

Running a job via the scheduler requires a Stata code file (e.g. example.do) and a job submission script. For example, to request 4 cores and 8GB total RAM, create a job submission script called submit_stata.sh:

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 4
#$ -l h_vmem=2G
#$ -l h_rt=4:0:0
#$ -l stata

module load stata

# Run example.do in batch mode, passing the allocated core count as its first argument
stata -b do example.do ${NSLOTS}
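
The 8GB total in the description follows from the two resource lines in the script. The sketch below works through that arithmetic, assuming h_vmem is a per-core request, which the "4 cores and 8GB total RAM" figures imply. The variable names are illustrative only.

```shell
#!/bin/bash
# Values copied from the job submission script above.
cores=4             # from: #$ -pe smp 4
mem_per_core_gb=2   # from: #$ -l h_vmem=2G (assumed to be a per-core request)

# Total memory is the per-core request multiplied by the number of cores.
total_mem_gb=$(( cores * mem_per_core_gb ))
echo "Total memory requested: ${total_mem_gb}G"
```

Under the same assumption, changing either the core count or the per-core memory changes the total.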

Example .do file:

args ncores
set processors `ncores'

use http://www.stata-press.com/data/r13/census5
tabulate region
summarize marriage_rate divorce_rate median_age if state!="Nevada"

Stata quoting

Note that ncores is preceded by a single backtick (`) and followed by a single quote (').

Submit your job to the scheduler with the qsub command:

qsub submit_stata.sh
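
Once the job has finished, it can be worth checking the batch log. With the -b option, running example.do writes its output to example.log in the working directory. The following is a minimal sketch that scans such a log for Stata return-code lines. It assumes an error shows up as a line of the form r(N) followed by a semicolon, and the function name stata_log_ok is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: return 0 if a Stata batch log contains no error code,
# and 1 if the log is missing or contains a line such as "r(198);".
stata_log_ok() {
    local log_file="$1"
    if [ ! -f "$log_file" ]; then
        echo "Log file not found: ${log_file}" >&2
        return 1
    fi
    if grep -Eq '^r\([0-9]+\);' "$log_file"; then
        echo "Stata reported an error in ${log_file}:" >&2
        # Print the matching lines with their line numbers.
        grep -En '^r\([0-9]+\);' "$log_file" >&2
        return 1
    fi
    return 0
}
```

For the example job, this could be called as stata_log_ok example.log after the job completes.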

Custom Stata commands using "ado" files

Users can extend the Stata programming language by writing custom commands, which are saved as ado-files. When a non-system command is executed, Stata searches its ado-file directories for it.

Use the sysdir command to list where Stata searches for ado-files. On Apocrita, Stata expects the PERSONAL directory at ~/ado/personal/, and this directory must exist. If it does not exist, execute:

mkdir -p ~/ado/personal

You can then save your "ado" files in that folder and Stata should be able to find them.
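
As an illustration, the sketch below creates the directory and writes a very small ado-file into it. The command name hello and the variable ADO_DIR are hypothetical, chosen only for this example. The file name matches the program name, which is how Stata locates the command. Note that running this overwrites any existing hello.ado in that directory.

```shell
#!/bin/bash
# ADO_DIR defaults to the PERSONAL directory named above.
# It can be overridden to try the example elsewhere.
ADO_DIR="${ADO_DIR:-${HOME}/ado/personal}"
mkdir -p "${ADO_DIR}"

# Write a hypothetical command called "hello" (name chosen for illustration).
# The quoted heredoc delimiter stops the shell expanding anything in the body.
cat > "${ADO_DIR}/hello.ado" <<'EOF'
program define hello
    display "Hello from a personal ado-file"
end
EOF

echo "Wrote ${ADO_DIR}/hello.ado"
```

In a Stata session started afterwards, typing hello at the prompt should then run the new command.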
