IF and OBS subops. However, you may want to use the same range for several procedures and avoid typing the IF and OBS subops over and over. To do this, issue a RANGE statement with the IF or OBS subops. For example:
range obs[100-200] if[x > 0]
Until you issue a new RANGE statement, only observations numbered between 100 and 200 for which the value of the variable x is positive will be used.
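The combined effect of obs[100-200] and if[x > 0] can be sketched in Python as a filter that keeps an observation only when both conditions hold. This is an illustration, not SST code: it assumes observations are numbered from 1 and that the bounds in obs[] are inclusive, and the function name active_observations and the data are hypothetical.

```python
def active_observations(x, first, last, condition):
    """Return the 1-based observation numbers that lie in first..last
    (inclusive, an assumption) and whose value satisfies condition."""
    active = []
    for obs_number, value in enumerate(x, start=1):
        if first <= obs_number <= last and condition(value):
            active.append(obs_number)
    return active


if __name__ == "__main__":
    # Hypothetical data: 250 observations alternating between 1.0 and -1.0,
    # so only the odd-numbered observations are positive.
    x = [1.0 if i % 2 == 0 else -1.0 for i in range(250)]
    kept = active_observations(x, 100, 200, lambda v: v > 0)
    print(len(kept), kept[:3])   # 50 [101, 103, 105]
```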
The FREQ command calculates one-way frequency distributions. In our example, the command
freq var[party]
produces the following output:
party               0          1
             Democrat      Repub
           ---------- ----------
Count              12          9
Percent         57.14      42.86
Across the top of the frequency table are the values of the variable party. Underneath each value is its label (if any). The rows of the table give the number of observations taking the value and the percent of nonmissing observations taking that value.
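The two rows of the FREQ table can be reproduced with a short Python sketch: count each distinct value, then divide by the number of nonmissing observations. The function name frequencies is hypothetical, and representing missing values as None is an assumption; the party data below only reproduce the counts shown above (12 zeros and 9 ones), not the order of the years.

```python
from collections import Counter


def frequencies(values):
    """One-way frequency table: {value: (count, percent of nonmissing)}.
    None is treated as missing and excluded from the denominator."""
    nonmissing = [v for v in values if v is not None]
    counts = Counter(nonmissing)
    total = len(nonmissing)
    return {v: (counts[v], 100.0 * counts[v] / total) for v in sorted(counts)}


if __name__ == "__main__":
    party = [0] * 12 + [1] * 9   # 0 = Democrat, 1 = Repub
    for value, (count, pct) in frequencies(party).items():
        print(f"{value}: count={count} percent={pct:.2f}")
    # 0: count=12 percent=57.14
    # 1: count=9 percent=42.86
```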
The TABLE command is used to produce two-way, three-way, and higher-dimensional contingency tables. TABLE crosstabulates the first variable specified in the VAR subop (the row variable) by the second variable in the VAR subop (the column variable). In its simplest and most common use, only two variables are specified and one table is created.
Output from the TABLE command shows the number of occurrences in each cell as well as the percentage this represents of the column total. If you want percentages of the row total instead, specify the ROW subop.
Contingency tables are most useful when each variable takes on only a few values. In our example, most of the variables are continuous (taking many distinct values, with the same value rarely occurring more than once), so a contingency table on these variables directly would be of little use. One possibility is to recode these variables into a few categories. For our purposes, two categories will be enough for each variable:
recode var[money inflat unemp] map[(lo thru 7)=0,else=1]
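The intent of this recode can be sketched in Python as a threshold map. The cutoff of 7 is an assumption taken from the discussion of the two-way table, which reads category 0 as "under seven percent"; treating the lo thru range as including its upper endpoint, keeping missing values missing, and the name recode_low_high are also assumptions, not documented SST behavior.

```python
def recode_low_high(values, cutoff=7.0):
    """Map values up to and including `cutoff` to 0 and larger values to 1.
    Missing values (None) are left missing."""
    return [None if v is None else (0 if v <= cutoff else 1) for v in values]


if __name__ == "__main__":
    print(recode_low_high([0.7, 7.0, 7.1, 9.3, None]))   # [0, 0, 1, 1, None]
```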
We will label the categories low and high. First we try a two-way table:
table var[inflat money]
The TABLE output is:
********** Crosstabulation of inflat by money **********

                         money
          |---------|---------|---------|
          |  COUNT  |      low|     high|    ROW
          | COL PCT |        0|        1|  TOTAL
   inflat |---------|---------|---------|
          |        0|    13   |     3   |    16
          |      low|   81.3  |   60.0  |   76.2
          |---------|---------|---------|
          |        1|     3   |     2   |     5
          |     high|   18.8  |   40.0  |   23.8
          |---------|---------|---------|
           COLUMN        16         5        21
           TOTAL        76.2      23.8     100.0
Note that there are 13 observations for which both money and inflat are zero, which means that both money growth and inflation are under seven percent. The numbers around the edges of the table are row and column marginal totals. For example, the number of low values for the variable money is given by the column total 16. Similarly, the row totals give the marginal distribution of the row variable inflat.
The 16 observations for which inflation is low (i.e., under seven percent) represent 76.2% of the total of 21 observations.
The default is for SST to compute column percentages. If you would like the cell entries to be row percentages, specify the ROW subop and SST will calculate percentages this way.
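The marginal totals and the two kinds of percentages can be checked with a small Python sketch that works directly from the cell counts of the table above (rows are inflat, columns are money). The function names crosstab and _pct are hypothetical, and returning 0.0 for an empty row or column is a choice made here to avoid division by zero, not SST behavior.

```python
def _pct(part, whole):
    # Guard against an empty row or column instead of dividing by zero.
    return 100.0 * part / whole if whole else 0.0


def crosstab(counts):
    """counts[i][j] is the number of observations with row category i and
    column category j. Returns marginal totals and, for every cell, the
    percentage of its column total and of its row total."""
    n_rows, n_cols = len(counts), len(counts[0])
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(counts[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return {
        "row_totals": row_totals,
        "col_totals": col_totals,
        "grand_total": sum(row_totals),
        "col_pct": [[_pct(counts[i][j], col_totals[j]) for j in range(n_cols)]
                    for i in range(n_rows)],
        "row_pct": [[_pct(counts[i][j], row_totals[i]) for j in range(n_cols)]
                    for i in range(n_rows)],
    }


if __name__ == "__main__":
    result = crosstab([[13, 3], [3, 2]])   # inflat (rows) by money (columns)
    print(result["row_totals"], result["col_totals"], result["grand_total"])
    print(result["col_pct"])   # [[81.25, 60.0], [18.75, 40.0]]
```

SST displays these column percentages rounded to one decimal (81.3, 60.0, 18.8, 40.0); the row_pct entries correspond to what the ROW subop is described as producing.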
SST will also compute chi-square statistics for testing the independence of two discrete variables if you add the MEASURES subop to the TABLE command:
table var[money inflat] measures
SST prints out the value of the chi-square statistic for the table along with its degrees of freedom. The hypothesis of independence is rejected if the computed value of this statistic exceeds the critical value corresponding to the significance level chosen for the test. The critical value is chosen so that, if the null hypothesis of independence is correct, the probability of obtaining a value of the test statistic larger than the critical value equals the significance level. The critical value can be determined by consulting a table of the chi-square distribution.
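For readers who want to see where such a statistic comes from, here is a Python sketch of the standard Pearson chi-square statistic: each cell's expected count under independence is (row total × column total) / n, and the statistic sums (observed − expected)² / expected, with (rows − 1) × (columns − 1) degrees of freedom. The text does not show the value SST prints for the money/inflat table, nor whether SST applies a continuity correction, so this uncorrected version is an assumption and its result should not be read as SST output. Note also that three of the four expected counts for this table are below 5, a situation in which the chi-square approximation is usually considered unreliable.

```python
def pearson_chi_square(counts):
    """Pearson chi-square statistic (no continuity correction) and its
    degrees of freedom for a table of observed counts."""
    n_rows, n_cols = len(counts), len(counts[0])
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(counts[i][j] for i in range(n_rows)) for j in range(n_cols)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            expected = row_totals[i] * col_totals[j] / n   # under independence
            stat += (counts[i][j] - expected) ** 2 / expected
    df = (n_rows - 1) * (n_cols - 1)
    return stat, df


if __name__ == "__main__":
    stat, df = pearson_chi_square([[13, 3], [3, 2]])
    print(round(stat, 3), df)   # 0.948 1
```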
Instead of determining the critical value from a table of the chi-square distribution, you may prefer to compute a p-value for the test statistic. The p-value is the probability (under the null hypothesis of independence) of obtaining a value of the test statistic greater than or equal to the observed value. Suppose, for example, that SST produces a chi-square statistic of 5.02 with one degree of freedom. The CALC command can be used to evaluate the upper tail probability:
calc 1.0-cumchi(5.02,1)
0.025
Thus the p-value for this test statistic is 0.025. This means that the null hypothesis of independence can be rejected at any significance level larger than 0.025, such as the conventional 0.05 level, but not at the 0.01 level.
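The same upper-tail probability can be cross-checked in Python without a statistics library for the one-degree-of-freedom case: a chi-square variable with 1 df is the square of a standard normal Z, so P(X ≥ s) = P(|Z| ≥ √s) = erfc(√(s/2)). This sketch covers only 1 df; the function names are hypothetical, and the decision rule simply compares the p-value with the chosen significance level.

```python
import math


def chi_square_upper_tail_df1(stat):
    """P(X >= stat) for X chi-square with 1 degree of freedom,
    using P(X >= s) = erfc(sqrt(s / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))


def reject_independence(p_value, significance_level):
    """Reject the null hypothesis when the p-value is below the level."""
    return p_value < significance_level


if __name__ == "__main__":
    p = chi_square_upper_tail_df1(5.02)
    print(round(p, 3))                     # 0.025
    print(reject_independence(p, 0.05))    # True
    print(reject_independence(p, 0.01))    # False
```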
To produce a three-way table, specify three variables in the VAR subop. For example:
table var[money inflat unemp]
will produce two tables. The variable unemp has been recoded so that it takes two values (zero and one). The TABLE command will produce a cross-tabulation of the variables money and inflat for each value of the variable unemp:
********** Crosstabulation of money by inflat **********
                 unemp = low ( 0 )

                        inflat
          |---------|---------|---------|
          |  COUNT  |      low|     high|    ROW
          | COL PCT |        0|        1|  TOTAL
    money |---------|---------|---------|
          |        0|    12   |     1   |    13
          |      low|   85.7  |   33.3  |   76.5
          |---------|---------|---------|
          |        1|     2   |     2   |     4
          |     high|   14.3  |   66.7  |   23.5
          |---------|---------|---------|
           COLUMN        14         3        17
           TOTAL        82.4      17.6     100.0

********** Crosstabulation of money by inflat **********
                 unemp = hi ( 1 )

                        inflat
          |---------|---------|---------|
          |  COUNT  |      low|     high|    ROW
          | COL PCT |        0|        1|  TOTAL
    money |---------|---------|---------|
          |        0|     1   |     2   |     3
          |      low|   50.0  |  100.0  |   75.0
          |---------|---------|---------|
          |        1|     1   |     0   |     1
          |     high|   50.0  |    0.0  |   25.0
          |---------|---------|---------|
           COLUMN         2         2         4
           TOTAL        50.0      50.0     100.0
If additional variables are specified in the VAR subop, a separate table crosstabulating the first and the second variables for each combination of values in the remaining variables will be created. In this manner, an n-way table is constructed.
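The layering rule (one two-way table per combination of the remaining variables) can be sketched in Python by grouping records on their extra values. The records below are rebuilt from the cell counts printed in the two unemp tables, since the year-by-year data are not reproduced here; the function name layered_crosstab is hypothetical. Summing the layers recovers the counts of the earlier two-way money/inflat table (13, 3, 3, 2).

```python
from collections import Counter, defaultdict


def layered_crosstab(records):
    """Each record is (row_value, col_value, layer_1, layer_2, ...).
    Returns {layer_key: Counter({(row_value, col_value): count})},
    one two-way table per combination of the remaining values."""
    tables = defaultdict(Counter)
    for row_value, col_value, *layers in records:
        tables[tuple(layers)][(row_value, col_value)] += 1
    return dict(tables)


if __name__ == "__main__":
    # (money, inflat, unemp) cell counts read off the printed tables.
    cells = {
        (0, 0, 0): 12, (0, 1, 0): 1, (1, 0, 0): 2, (1, 1, 0): 2,
        (0, 0, 1): 1, (0, 1, 1): 2, (1, 0, 1): 1, (1, 1, 1): 0,
    }
    records = [key for key, n in cells.items() for _ in range(n)]
    for layer, table in sorted(layered_crosstab(records).items()):
        print("unemp =", layer[0], dict(sorted(table.items())))
```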
The COVA command computes descriptive statistics (means, standard deviations, ranges, correlations) on a set of one or more variables. We will first consider the use of the COVA command for producing univariate statistics. If you do not include any subops other than the VAR subop, COVA will by default calculate the mean, minimum, maximum, and standard deviation of each variable in the VAR subop. Using an asterisk (*) to match all variables, we obtain a complete set of univariate statistics for the variables in memory:
cova var[*]
SST produces the following output:
                nobs      mean       min       max   std dev
year              21      1970      1960      1980     6.055
money             21       5.3       0.7       9.3     2.211
inflat            21     4.729       0.9       9.3     2.625
unemploy          21     5.571       3.5       8.5     1.312
party             21     0.429         0         1     0.495
Thus we see that the variable year ranges from 1960 to 1980, with its mean equal to 1970 and its standard deviation equal to 6.055. The same information is provided for the remaining variables.
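A Python sketch of these univariate statistics, assuming year takes each value from 1960 to 1980 exactly once (consistent with 21 observations, the printed minimum, maximum, and mean). The printed standard deviation of 6.055 matches a variance computed with divisor n (36.67, the year entry on the diagonal of the covariance matrix); with divisor n - 1 it would be about 6.205. The function name univariate is hypothetical.

```python
import math


def univariate(values):
    """Return (nobs, mean, min, max, std dev) for a list of numbers.
    The variance divides by n rather than n - 1 (assumed, because that
    reproduces the printed year and party values)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return n, mean, min(values), max(values), math.sqrt(variance)


if __name__ == "__main__":
    years = list(range(1960, 1981))   # assumed: one observation per year
    print(univariate(years))
    party = [0] * 12 + [1] * 9        # 12 Democratic and 9 Republican years
    print(univariate(party))
```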
If the subop COV is added to the COVA command, SST will produce a matrix of correlations and covariances for the variables specified in the VAR subop instead of the univariate statistics obtained above:
cova var[*] cov
The output looks like:
Correlation and Covariance matrix

                 year       money      inflat       unemp       party
year       36.6666667   9.5285715  14.4666668   3.5428573   0.4761905
money       0.7117068   4.8885714   2.7971430   0.5428572  -0.0571429
inflat      0.9101309   0.4819420   6.8906124   1.3865307   0.3639456
unemploy    0.4458577   0.1870996   0.4025120   1.7220408   0.0931973
party       0.1589104  -0.0522250   0.2801657   0.1435123   0.2448980
The entries along the diagonal of the matrix are the variances of each variable. Below the diagonal are Pearson correlation coefficients between the row and column variables. Above the diagonal are covariances among the variables.
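The relationship between the two halves of the matrix is the usual one: the correlation is the covariance divided by the product of the two standard deviations, r = cov(x, y) / (sd_x · sd_y). The Python sketch below builds the same layout from a covariance matrix; the name combined_matrix is hypothetical. Using the printed year and money entries, it recovers the printed correlation 0.7117068.

```python
import math


def combined_matrix(cov):
    """Given a covariance matrix, return a matrix with variances on the
    diagonal, covariances above it, and Pearson correlations below it."""
    k = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(k)]
    out = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i <= j:
                out[i][j] = cov[i][j]                      # variance or covariance
            else:
                out[i][j] = cov[i][j] / (sd[i] * sd[j])    # correlation
    return out


if __name__ == "__main__":
    # year and money entries copied from the printed matrix
    cov = [[36.6666667, 9.5285715],
           [9.5285715, 4.8885714]]
    print(combined_matrix(cov))
```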
You can also control the output of the COVA command by specifying subops indicating which statistics you want. The possible options are MEAN (for means), MIN (for minimums), MAX (for maximums), STDDEV (for standard deviations), and COV (for a correlation/covariance matrix).
As with most SST commands, the range of observations over which the statistics are calculated can be altered by adding the IF or OBS subops. To obtain the mean inflation rate in Republican administrations, one would enter:
cova var[inflat] mean if[party==1]

                nobs      mean
inflat             9     5.578
The IF and OBS subops determine which of the observations currently active under the RANGE statement will be used to calculate the requested statistics; thus these subops act as further qualifications to the RANGE statement.
If data is missing for an observation for any variable specified in the variable list, the entire observation is omitted from the calculations on all variables. This is "listwise" deletion of missing data so that the statistics for all variables depend upon a common set of observations.
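Listwise deletion can be sketched in Python as follows: first keep only the observations that are complete on every listed variable, then compute each statistic on that common set. The data, the use of None for a missing value, and the function names are hypothetical. In the example, the second observation is dropped for a as well as for b, so the mean of a is 3.0 rather than the 5.33 it would be if all three values of a were used.

```python
def listwise_complete(rows, variables):
    """Keep only rows with a nonmissing (not None) value for every variable."""
    return [r for r in rows if all(r.get(v) is not None for v in variables)]


def listwise_means(rows, variables):
    """Means of each variable over the common set of complete rows."""
    complete = listwise_complete(rows, variables)
    if not complete:
        raise ValueError("no observation is complete on all variables")
    n = len(complete)
    return n, {v: sum(r[v] for r in complete) / n for v in variables}


if __name__ == "__main__":
    rows = [{"a": 1.0, "b": 2.0}, {"a": 10.0, "b": None}, {"a": 5.0, "b": 4.0}]
    print(listwise_means(rows, ["a", "b"]))   # (2, {'a': 3.0, 'b': 3.0})
```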