- Setting the range for procedures
- Obtaining frequency distributions
- Contingency tables
- Multiway contingency tables
- Univariate statistics
- Correlations and covariances
- Options to the COVA command

The range of observations used by a procedure can be set with the `IF` and `OBS` subops. However, you may want to use the same range for several procedures and avoid typing the `IF` and `OBS` subops over and over. To do this, issue a `RANGE` statement with the `IF` or `OBS` subops. For example:

range obs[100-200] if[x > 0]

Until you issue a new `RANGE` statement, only observations numbered between 100 and 200 for which the value of the variable `x` is positive will be used.
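The effect of a `RANGE` statement can be pictured as a filter that every later procedure applies before it computes anything. The Python sketch below is only an illustration, not SST code: the function name `in_range` and the sample data are hypothetical, and it assumes that observations are numbered starting at 1.

```python
# Illustrative sketch (not SST code): mimic "range obs[100-200] if[x > 0]".
# Assumption: observations are numbered starting at 1.

def in_range(obs_number, x, first=100, last=200):
    """Return True when an observation passes both the OBS and the IF test."""
    return first <= obs_number <= last and x > 0

# Hypothetical data: 250 observations whose x value alternates in sign.
x_values = [1.0 if i % 2 == 0 else -1.0 for i in range(1, 251)]

# Observation numbers that a later procedure would actually use.
selected = [i for i, x in enumerate(x_values, start=1) if in_range(i, x)]

print(len(selected), selected[:3])
```

Both conditions must hold at once, so an observation outside 100 to 200 is excluded even when its `x` value is positive.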

The `FREQ` command calculates one-way frequency distributions. In our example, the command:

freq var[party]

produces the following output:

    party
                      0          1
               Democrat      Repub
             ---------- ----------
    Count            12          9
    Percent       57.14      42.86

Across the top of the frequency table are the values of the variable `party`. Underneath each value is its label (if any). The rows of the table give the number of observations taking each value and the percent of nonmissing observations taking that value.
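The arithmetic behind such a table is simple enough to sketch. The Python fragment below is an illustration rather than SST code; it uses the counts shown above (12 and 9) and adds one hypothetical missing value, represented by `None`, to show that percentages are taken over nonmissing observations only.

```python
from collections import Counter

# Illustrative sketch (not SST code): a one-way frequency table.
# 12 zeros and 9 ones reproduce the counts in the output above.
# The trailing None is a hypothetical missing value added for illustration.
party = [0] * 12 + [1] * 9 + [None]

nonmissing = [v for v in party if v is not None]
counts = Counter(nonmissing)

# Percentages are taken over nonmissing observations only.
percents = {value: 100.0 * n / len(nonmissing) for value, n in counts.items()}

for value in sorted(counts):
    print(value, counts[value], round(percents[value], 2))
```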

The `TABLE` command produces two-way, three-way, and higher-dimensional contingency tables. `TABLE` crosstabulates the first variable specified in the `VAR` subop (the row variable) by the second variable in the `VAR` subop (the column variable). In its simplest and most common use, only two variables are specified and one table is created.

Output from the `TABLE` command shows the number of occurrences in each cell as well as the percentage this represents of the column total. If you want percentages of the row total instead, specify the `ROW` subop.

Contingency tables are most useful when each variable takes on only a few values. In our example, most of the variables are continuous (taking many distinct values, with the same value rarely occurring more than once), so a contingency table on these variables directly would be of little use. One possibility is to recode these variables into a few categories. For our purposes, two categories will be enough for each variable:

recode var[money inflat unemp] map[(lo thru 70=0,else=1]

We will label the categories `low` and `high`. First we try a two-way table:

table var[inflat money]

The `TABLE` output is:

    ********** Crosstabulation of inflat by money **********

                        money
          |---------|---------|---------|
          | COUNT   |      low|     high|   ROW
          | COL PCT |        0|        1|  TOTAL
    inflat|---------|---------|---------|
          |        0|    13   |     3   |   16
          |      low|   81.3  |   60.0  |  76.2
          |---------|---------|---------|
          |        1|     3   |     2   |    5
          |     high|   18.8  |   40.0  |  23.8
          |---------|---------|---------|
            COLUMN       16        5       21
            TOTAL       76.2     23.8    100.0

Note that there are 13 observations for which both `money` and `inflat` are zero, which means that both money growth and inflation are under seven percent. The numbers around the edges of the table are row and column marginal totals. For example, the number of `low` values for the variable `money` is given by the column total 16. Similarly, the row totals give the marginal distribution of the row variable `inflat`. The 16 observations for which inflation is `low` (i.e., under seven percent) represent 76.2% of the total of 21 observations.

The default is for SST to compute column percentages. If you would like the cell entries to be row percentages, specify the subop `ROW` and SST will calculate percentages this way.
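The difference between the two kinds of percentage is only a matter of which marginal total serves as the denominator. The following Python sketch (an illustration, not SST code) works through the cell counts of the `inflat` by `money` table shown earlier; the variable names are chosen for this example only.

```python
# Illustrative sketch (not SST code): column versus row percentages.
# counts[i][j] is the cell count for inflat == i and money == j,
# taken from the two-way table shown earlier.
counts = [[13, 3],
          [3, 2]]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]

# Column percentages: each cell as a share of its column total (the default).
col_pct = [[100.0 * counts[i][j] / col_totals[j] for j in range(2)]
           for i in range(2)]

# Row percentages: each cell as a share of its row total (the ROW subop).
row_pct = [[100.0 * counts[i][j] / row_totals[i] for j in range(2)]
           for i in range(2)]

print("column percentages:", col_pct)
print("row percentages:   ", row_pct)
```

For the cell with 3 observations in the first row and second column, the column percentage is 3 out of 5 (60.0) while the row percentage is 3 out of 16 (18.75).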

SST will also compute chi-square statistics for testing the independence of two discrete variables if you add the `MEASURES` subop to the `TABLE` command:

table var[money inflat] measures

SST prints the value of the chi-square statistic for the table along with its degrees of freedom. The hypothesis of independence is rejected if the computed value of this statistic exceeds the critical value corresponding to the significance level chosen for the test. The critical value is chosen so that, if the null hypothesis of independence is correct, the probability of obtaining a test statistic larger than the critical value equals the significance level. The critical value can be determined by consulting a table of the chi-square distribution.
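To make the test concrete, the Python sketch below computes a Pearson chi-square statistic for the two-way table of `inflat` by `money` shown earlier and compares it with the standard 5% critical value for one degree of freedom (3.841). It is an illustration, not SST code, and it assumes the ordinary Pearson statistic with no continuity correction; the source does not say which variant `MEASURES` prints, so the number here should not be read as SST output.

```python
# Illustrative sketch (not SST code): Pearson chi-square test of independence
# for the 2 x 2 table of inflat by money shown earlier.
# Assumption: no continuity correction is applied.
observed = [[13, 3],
            [3, 2]]

n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        # Expected count under independence: row total * column total / n.
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (count - expected) ** 2 / expected

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (len(observed) - 1) * (len(observed[0]) - 1)

# 3.841 is the 5% critical value of chi-square with 1 degree of freedom.
reject_at_5_percent = chi_square > 3.841

print(round(chi_square, 3), dof, reject_at_5_percent)
```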

Instead of determining the critical value from a table of the chi-square distribution, you may prefer to compute a p-value for the test statistic. The p-value is the probability (under the null hypothesis of independence) of obtaining a value of the test statistic greater than or equal to the observed value of the test statistic. Suppose, for example, that SST produces a chi-square statistic of 5.02 with one degree of freedom. The `CALC` command can be used to evaluate the upper tail probability:

calc 1.0-cumchi(5.02,1)

    0.025

Thus the p-value for this test statistic is 0.025. This means that the null hypothesis can be rejected at any significance level larger than 0.025 (for example, at the conventional 0.05 level).
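The upper tail probability can be cross-checked outside SST. With one degree of freedom, a chi-square variable is the square of a standard normal variable, so the tail probability reduces to the complementary error function. The Python sketch below relies on that identity; the function name is hypothetical and the shortcut applies only to the one-degree-of-freedom case.

```python
import math

def chi_square_upper_tail_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    With 1 degree of freedom the variable is the square of a standard normal Z,
    so P(X >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))

p_value = chi_square_upper_tail_1df(5.02)
print(round(p_value, 3))
```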

Multiway tables are produced by listing more than two variables in the `VAR` subop. For example:

table var[money inflat unemp]

will produce two tables. The variable `unemp` has been recoded so that it takes two values (zero and one). The `TABLE` command will produce a cross-tabulation of the variables `money` and `inflat` for each value of the variable `unemp`:

    ********** Crosstabulation of money by inflat **********
    unemp = low ( 0 )

                       inflat
          |---------|---------|---------|
          | COUNT   |      low|     high|   ROW
          | COL PCT |        0|        1|  TOTAL
     money|---------|---------|---------|
          |        0|    12   |     1   |   13
          |      low|   85.7  |   33.3  |  76.5
          |---------|---------|---------|
          |        1|     2   |     2   |    4
          |     high|   14.3  |   66.7  |  23.5
          |---------|---------|---------|
            COLUMN       14        3       17
            TOTAL       82.4     17.6    100.0

    ********** Crosstabulation of money by inflat **********
    unemp = hi ( 1 )

                       inflat
          |---------|---------|---------|
          | COUNT   |      low|     high|   ROW
          | COL PCT |        0|        1|  TOTAL
     money|---------|---------|---------|
          |        0|     1   |     2   |    3
          |      low|   50.0  |  100.0  |  75.0
          |---------|---------|---------|
          |        1|     1   |     0   |    1
          |     high|   50.0  |    0.0  |  25.0
          |---------|---------|---------|
            COLUMN        2        2        4
            TOTAL       50.0     50.0    100.0

If additional variables are specified in the `VAR` subop, a separate table crosstabulating the first and the second variables will be created for each combination of values in the remaining variables. In this manner, an n-way table is constructed.
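The idea of one two-way table per value of the remaining variables can be sketched in a few lines of Python. This is an illustration, not SST code: the cell counts are copied from the two tables above, keyed as (`money`, `inflat`, `unemp`), and the data structures are chosen for this example only.

```python
from collections import Counter

# Illustrative sketch (not SST code): one two-way table per value of a third
# variable. Cell counts come from the two tables above and are keyed as
# (money, inflat, unemp).
cell_counts = {
    (0, 0, 0): 12, (0, 1, 0): 1, (1, 0, 0): 2, (1, 1, 0): 2,
    (0, 0, 1): 1,  (0, 1, 1): 2, (1, 0, 1): 1, (1, 1, 1): 0,
}

# Expand the counts into one record per observation.
records = [key for key, n in cell_counts.items() for _ in range(n)]

# Build a separate money-by-inflat table for each value of unemp.
tables = {}
for money, inflat, unemp in records:
    tables.setdefault(unemp, Counter())[(money, inflat)] += 1

# Summing the tables over unemp recovers the plain two-way table.
collapsed = sum(tables.values(), Counter())

for unemp in sorted(tables):
    print("unemp =", unemp, dict(tables[unemp]))
print("collapsed:", dict(collapsed))
```

Adding the two conditional tables cell by cell gives back the counts 13, 3, 3, and 2 of the two-way table of these variables.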

The `COVA` command computes descriptive statistics (means, standard deviations, ranges, correlations) on a set of one or more variables. We will first consider the use of the `COVA` command for producing univariate statistics. If you do not include any subops other than the `VAR` subop, `COVA` will calculate by default the mean, minimum, maximum, and standard deviation of each variable in the `VAR` subop. For example, using an asterisk (`*`) in the `VAR` subop:

cova var[*]

SST produces the following output:

                  nobs      mean       min       max   std dev
    year            21      1970      1960      1980     6.055
    money           21       5.3       0.7       9.3     2.211
    inflat          21     4.729       0.9       9.3     2.625
    unemploy        21     5.571       3.5       8.5     1.312
    party           21     0.429         0         1     0.495

Thus we see that the variable `year` ranges from 1960 to 1980, with its mean equal to 1970 and its standard deviation equal to 6.055. The same information is provided for the remaining variables.
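The `year` row of this output can be reproduced by hand. The Python sketch below is an illustration, not SST code. It assumes that `year` takes each value from 1960 through 1980 exactly once, which is consistent with the 21 observations, the minimum, the maximum, and the mean shown above. It also assumes a divisor of n rather than n - 1 for the standard deviation, because that choice reproduces the printed 6.055.

```python
import statistics

# Illustrative sketch (not SST code): default COVA-style summary of one variable.
# Assumption: year takes each value from 1960 through 1980 exactly once.
year = list(range(1960, 1981))

summary = {
    "nobs": len(year),
    "mean": statistics.mean(year),
    "min": min(year),
    "max": max(year),
    # Assumption: divisor n (population form), which matches the printed 6.055.
    "std dev": statistics.pstdev(year),
}

print(summary)
```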

If the subop `COV` is added to the `COVA` command, SST will produce a matrix of correlations and covariances for the variables specified in the `VAR` subop instead of the univariate statistics obtained above:

cova var[*] cov

The output looks like:

    Correlation and Covariance matrix

                      year       money      inflat       unemp       party
    year        36.6666667   9.5285715  14.4666668   3.5428573   0.4761905
    money        0.7117068   4.8885714   2.7971430   0.5428572  -0.0571429
    inflat       0.9101309   0.4819420   6.8906124   1.3865307   0.3639456
    unemploy     0.4458577   0.1870996   0.4025120   1.7220408   0.0931973
    party        0.1589104  -0.0522250   0.2801657   0.1435123   0.2448980

The entries along the diagonal of the matrix are the variances of each variable. Below the diagonal are Pearson correlation coefficients between the row and column variables. Above the diagonal are covariances among the variables.
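The three kinds of entry are linked by the usual identity: a correlation equals the covariance divided by the product of the two standard deviations. The Python sketch below (an illustration, not SST code) checks this for `year` and `money` using values copied from the matrix above.

```python
import math

# Illustrative sketch (not SST code): how the entries of the matrix relate.
# Values are copied from the output above.
var_year = 36.6666667        # diagonal entry for year
var_money = 4.8885714        # diagonal entry for money
cov_year_money = 9.5285715   # above the diagonal: covariance of year and money

# Pearson correlation = covariance / (product of the standard deviations).
corr_year_money = cov_year_money / math.sqrt(var_year * var_money)

print(round(corr_year_money, 4))
```

The result agrees with the entry below the diagonal in the `money` row and `year` column (0.7117068).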

You can control what is printed by the `COVA` command by specifying subops indicating which statistics you want output. The possible options are `MEAN` (for means), `MIN` (for minimums), `MAX` (for maximums), `STDDEV` (for standard deviations), and `COV` (for a correlation/covariance matrix).

As with most SST commands, the range of observations over which the statistics are calculated can be altered by adding the `IF` or `OBS` subops. To obtain the mean inflation rate in Republican administrations, one would enter:

cova var[inflat] mean if[party==1]

                  nobs      mean
    inflat           9     5.578

The `IF` and `OBS` subops determine which of the observations currently active under the `RANGE` statement will be used to calculate the requested statistics; thus these subops act as further qualifications to the `RANGE` statement.

If data is missing for an observation for any variable specified in the variable list, the entire observation is omitted from the calculations on all variables. This is "listwise" deletion of missing data so that the statistics for all variables depend upon a common set of observations.
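Listwise deletion can be illustrated with a short Python sketch. This is not SST code: the observations are hypothetical, and `None` stands in for a missing value.

```python
# Illustrative sketch (not SST code): listwise deletion of missing data.
# Hypothetical observations; None marks a missing value.
observations = [
    {"money": 5.0, "inflat": 4.0},
    {"money": None, "inflat": 6.0},   # dropped: money is missing
    {"money": 7.0, "inflat": None},   # dropped: inflat is missing
    {"money": 9.0, "inflat": 8.0},
]
variables = ["money", "inflat"]

# Keep an observation only if every listed variable is present.
complete = [row for row in observations
            if all(row[name] is not None for name in variables)]

# Every statistic is then computed from the same set of observations.
means = {name: sum(row[name] for row in complete) / len(complete)
         for name in variables}

print(len(complete), means)
```

Note that the second observation is excluded from the mean of `inflat` even though its `inflat` value is present, because its `money` value is missing.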