IF
and OBS subops. However, you may want to use the same range
for several procedures and avoid typing the IF and OBS
subops over and over. To do this, issue a RANGE statement
with the IF or OBS subops. For example:
range obs[100-200] if[x > 0]
Until you issue a new RANGE statement, only observations numbered
between 100 and 200 for which the value of the variable x is positive
will be used.
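The selection rule can be sketched outside SST in a few lines of Python. This is only an illustration of the logic, not SST's implementation: it assumes that observation numbers start at 1, that both endpoints of the range are included, and that a missing x (written here as None) never satisfies the condition. The data and the function name are made up.

```python
def active_observations(x_values, first, last):
    """Return the 1-based observation numbers in [first, last] whose x is positive."""
    return [
        obs
        for obs, x in enumerate(x_values, start=1)
        if first <= obs <= last and x is not None and x > 0
    ]

# Made-up data: 250 observations, negative at odd observation numbers.
x = [(-1) ** i * i for i in range(1, 251)]

selected = active_observations(x, 100, 200)
print(len(selected), selected[:3])
```

Both conditions must hold at once, so only the even-numbered observations from 100 through 200 remain active in this made-up data set.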
The FREQ command calculates one-way frequency distributions. In our
example, the command:
freq var[party]
produces the following output:
    party
                     0           1
              Democrat       Repub
            ----------  ----------
Count              12           9
Percent         57.14       42.86
Across the top of the frequency table are the values of the variable
party. Underneath each value is its label (if any). The rows of the
table give the number of observations taking the value and the percent
of nonmissing observations taking that value.
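The arithmetic behind this table is easy to check by hand or with a short Python sketch. The function below is an illustration only; it assumes missing values are represented as None, which is a convention chosen for this example and not SST's.

```python
from collections import Counter

def one_way_freq(values):
    """Map each distinct nonmissing value to (count, percent of nonmissing)."""
    nonmissing = [v for v in values if v is not None]
    counts = Counter(nonmissing)
    total = len(nonmissing)
    return {v: (n, round(100.0 * n / total, 2)) for v, n in sorted(counts.items())}

# The counts shown above: 12 observations with party = 0 and 9 with party = 1.
party = [0] * 12 + [1] * 9
freq = one_way_freq(party)
print(freq)
```

With 21 nonmissing observations, 12/21 and 9/21 give the 57.14 and 42.86 percent figures in the output.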
The TABLE command produces two-way, three-way, and higher-dimensional
contingency tables. TABLE crosstabulates the first
variable specified in the VAR subop (the row variable) by the second
variable in the VAR subop (the column variable). In its simplest and
most common use, only two variables will be specified and one table will be
created.
Output from the TABLE command shows the number of occurrences in each
cell as well as the percentage of the column total that this count
represents. If you want percentages of the row total instead, specify
the ROW subop.
Contingency tables are most useful when each variable takes on only a few values. In our example, most of the variables are continuous (taking many distinct values, with the same value rarely occurring more than once), so a contingency table on these variables directly would be of little use. One possibility is to recode these variables into a few categories. For our purposes, two categories will be enough for each variable:
recode var[money inflat unemp] map[(lo thru 7=0,else=1)]
We will label the categories low and high. First we try a
two-way table:
table var[inflat money]
The TABLE output is:
********** Crosstabulation of inflat by money **********
                   money
         |---------|---------|---------|
         |  COUNT  |      low|     high|   ROW
         | COL PCT |        0|        1|  TOTAL
   inflat|---------|---------|---------|
         |        0|     13  |      3  |     16
         |      low|   81.3  |   60.0  |   76.2
         |---------|---------|---------|
         |        1|      3  |      2  |      5
         |     high|   18.8  |   40.0  |   23.8
         |---------|---------|---------|
            COLUMN       16         5        21
             TOTAL     76.2      23.8     100.0
Note that there are 13 observations for which both money and inflat are
zero, which means that both money growth and inflation are under seven
percent. The numbers around the edges of the table are row and column
marginal totals. For example, the number of low values for the
variable money is given by the column total 16. Similarly, the
row totals give the marginal distribution of the row variable inflat.
The 16 observations for which inflation is low (i.e., under seven
percent) represent 76.2% of the total of 21 observations.
By default, SST computes column percentages. If you would like the
cell entries to be row percentages instead, specify the ROW subop.
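The difference between the two kinds of percentages can be seen by recomputing them from the cell counts in the table above. The following Python sketch is only an illustration of the arithmetic; the function names are hypothetical and the values are left unrounded, whereas the SST table prints one decimal place.

```python
# Cell counts from the table above: rows are inflat (low, high),
# columns are money (low, high).
counts = [[13, 3], [3, 2]]

def col_percent(table):
    """Each cell as a percentage of its column total."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[100.0 * cell / tot for cell, tot in zip(row, col_totals)] for row in table]

def row_percent(table):
    """Each cell as a percentage of its row total."""
    return [[100.0 * cell / sum(row) for cell in row] for row in table]

col_pct = col_percent(counts)
row_pct = row_percent(counts)
for row in col_pct:
    print(["%.2f" % value for value in row])
```

Column percentages sum to 100 down each column, while row percentages sum to 100 across each row.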
SST will also compute chi-square statistics for testing the
independence of two discrete variables if you add the MEASURES
subop to the TABLE command:
table var[money inflat] measures
SST prints the value of the chi-square statistic for the table along with its degrees of freedom. The hypothesis of independence is rejected if the computed value of this statistic exceeds the critical value corresponding to the significance level chosen for the test. The critical value is chosen so that, if the null hypothesis of independence is correct, the probability of obtaining a test statistic larger than the critical value equals the significance level. The critical value can be determined by consulting a table of the chi-square distribution.
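For readers who want to see how such a statistic is formed, here is a minimal Python sketch of the standard Pearson chi-square statistic applied to the cell counts of the two-way table of money and inflat shown earlier. It is an assumption that SST uses this uncorrected form; the manual does not say whether a continuity correction is applied, so the value computed here is not presented as SST output.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a table of cell counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

stat, dof = pearson_chi_square([[13, 3], [3, 2]])
print(round(stat, 3), dof)
```

Each cell contributes its squared deviation from the count expected under independence, scaled by that expected count; a 2 by 2 table has one degree of freedom.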
Instead of determining the critical value from a table of the chi-square
distribution, you may prefer to compute a p-value for the test
statistic. The p-value is the probability (under the null hypothesis
of independence) of obtaining a value of the test statistic greater than
or equal to the observed value of the test statistic. Suppose for
example that SST produces a chi-square statistic of 5.02 with one degree
of freedom. The CALC command can be used to evaluate the upper
tail probability:
calc 1.0-cumchi(5.02,1)
        0.025
Thus the p-value for this test statistic is 0.025. This means that the null hypothesis can be rejected at any significance level of 0.025 or larger, including the conventional 0.05 level.
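The same upper tail probability can be cross-checked outside SST. For one degree of freedom, the chi-square upper tail has the closed form erfc(sqrt(x/2)), which the Python sketch below uses; this shortcut applies to one degree of freedom only, and the function name is hypothetical.

```python
import math

def chi_square_upper_tail_1df(x):
    """P(X >= x) for a chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

p_value = chi_square_upper_tail_1df(5.02)
print(round(p_value, 3))

# Decision rule: reject independence when the p-value does not exceed
# the chosen significance level.
significance_level = 0.05
reject = p_value <= significance_level
```

The result agrees with the 0.025 returned by the CALC command above, so the test rejects at the 0.05 level but would not reject at the 0.01 level.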
VAR subop. For example:
table var[money inflat unemp]
will produce two tables. The variable unemp has been recoded so that
it takes two values (zero and one). The TABLE command will produce a
cross-tabulation of the variables money and inflat for each
value of the variable unemp:
********** Crosstabulation of money by inflat **********
       unemp = low ( 0 )
                   inflat
         |---------|---------|---------|
         |  COUNT  |      low|     high|   ROW
         | COL PCT |        0|        1|  TOTAL
    money|---------|---------|---------|
         |        0|     12  |      1  |     13
         |      low|   85.7  |   33.3  |   76.5
         |---------|---------|---------|
         |        1|      2  |      2  |      4
         |     high|   14.3  |   66.7  |   23.5
         |---------|---------|---------|
            COLUMN       14         3        17
             TOTAL     82.4      17.6     100.0
********** Crosstabulation of money by inflat **********
       unemp = hi ( 1 )
                   inflat
         |---------|---------|---------|
         |  COUNT  |      low|     high|   ROW
         | COL PCT |        0|        1|  TOTAL
    money|---------|---------|---------|
         |        0|      1  |      2  |      3
         |      low|   50.0  |  100.0  |   75.0
         |---------|---------|---------|
         |        1|      1  |      0  |      1
         |     high|   50.0  |    0.0  |   25.0
         |---------|---------|---------|
            COLUMN        2         2         4
             TOTAL     50.0      50.0     100.0
If additional variables are specified in the VAR subop, a separate
table crosstabulating the first and the second variables for each
combination of values in the remaining variables will be created.
In this manner, an n-way table is constructed.
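The layering can be pictured as grouping the cell counts by the values of the remaining variables. The Python sketch below rebuilds the two tables above from their cell counts and then collapses them over unemp; it is an illustration of the idea under that reading, not a description of how SST stores tables, and the function name is hypothetical.

```python
from collections import Counter

# (money, inflat, unemp) -> count, copied from the two tables above.
cells = {
    (0, 0, 0): 12, (0, 1, 0): 1, (1, 0, 0): 2, (1, 1, 0): 2,
    (0, 0, 1): 1, (0, 1, 1): 2, (1, 0, 1): 1, (1, 1, 1): 0,
}

def layered_tables(cells):
    """Build one {(row, col): count} table per combination of the remaining variables."""
    tables = {}
    for (row, col, *layer), n in cells.items():
        tables.setdefault(tuple(layer), Counter())[(row, col)] += n
    return tables

tables = layered_tables(cells)

# Adding the layers cell by cell gives back the two-way counts.
collapsed = sum(tables.values(), Counter())
print(len(tables), dict(collapsed))
```

Summing the two layers cell by cell recovers the counts 13, 3, 3, and 2 of the two-way table of inflat and money shown earlier, over all 21 observations.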
The COVA command computes descriptive statistics (means, standard
deviations, ranges, correlations) on a set of one or more variables. We
will first consider the use of the COVA command for producing
univariate statistics. If you do not include any subops other than
the VAR subop, COVA will calculate by default the mean,
minimum, maximum, and standard deviation of each variable in the VAR
subop. Using an asterisk (`*') to match all variables, we obtain a
complete set of univariate statistics for the variables in memory:
cova var[*]
SST produces the following output:
          nobs        mean         min         max     std dev
year        21        1970        1960        1980       6.055
money       21         5.3         0.7         9.3       2.211
inflat      21       4.729         0.9         9.3       2.625
unemploy    21       5.571         3.5         8.5       1.312
party       21       0.429           0           1       0.495
Thus we see that the variable year ranges from 1960 to 1980 with its mean equal to 1970 and standard deviation equal to 6.055. The same information is provided for the remaining variables.
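The row for year can be reproduced with a short Python sketch, assuming the 21 observations are the consecutive years 1960 through 1980 (an assumption, though it is consistent with the printed minimum, maximum, and mean). The printed standard deviation of 6.055 matches the formula that divides by the number of observations, not by one less; this is inferred from the output above and is not a statement taken from SST's documentation.

```python
import statistics

year = list(range(1960, 1981))  # assumption: 21 consecutive annual observations

summary = {
    "nobs": len(year),
    "mean": statistics.mean(year),
    "min": min(year),
    "max": max(year),
    "std dev": statistics.pstdev(year),  # divides by n
}
print(summary)

# For comparison: the version that divides by n - 1.
sample_std = statistics.stdev(year)
```

Here pstdev gives about 6.055, while the n - 1 version gives about 6.205, so only the former agrees with the table.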
If the COV subop is added to the COVA command, SST will
produce a matrix of correlations and covariances for the variables
specified in the VAR subop instead of the univariate statistics
obtained above:
cova var[*] cov
The output looks like:
Correlation and Covariance matrix
                 year        money       inflat        unemp        party
year       36.6666667    9.5285715   14.4666668    3.5428573    0.4761905
money       0.7117068    4.8885714    2.7971430    0.5428572   -0.0571429
inflat      0.9101309    0.4819420    6.8906124    1.3865307    0.3639456
unemploy    0.4458577    0.1870996    0.4025120    1.7220408    0.0931973
party       0.1589104   -0.0522250    0.2801657    0.1435123    0.2448980
The entries along the diagonal of the matrix are the variances of each variable. Below the diagonal are Pearson correlation coefficients between the row and column variables. Above the diagonal are covariances among the variables.
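The three parts of the matrix are tied together by the usual relation: a correlation is the covariance divided by the square root of the product of the two variances. A minimal Python check using the printed entries for year and money (the function name is hypothetical):

```python
import math

def correlation(cov_xy, var_x, var_y):
    """Pearson correlation from a covariance and the two variances."""
    return cov_xy / math.sqrt(var_x * var_y)

# Entries read from the matrix above.
var_year = 36.6666667   # diagonal entry for year
var_money = 4.8885714   # diagonal entry for money
cov_year_money = 9.5285715  # above the diagonal, row year, column money

r = correlation(cov_year_money, var_year, var_money)
print(round(r, 4))
```

The result agrees with the entry 0.7117068 below the diagonal (row money, column year) up to the rounding of the printed values.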
COVA command by specifying
subops indicating which statistics you want output. The possible options
are MEAN (for means), MIN (for minimums), MAX (for
maximums), STDDEV (for standard deviations), and COV (for a
correlation/covariance matrix).
As with most SST commands, the range of observations over which the
statistics are calculated can be altered by adding the IF or
OBS subops. To obtain the mean inflation rate in Republican
administrations, one would enter:
cova var[inflat] mean if[party==1]
          nobs        mean
inflat       9       5.578
The IF and OBS subops determine which observations currently
active under the RANGE statement will be used to calculate the
requested statistics; thus these subops act as further qualifications to
the RANGE statement.
If data is missing for an observation for any variable specified in the variable list, the entire observation is omitted from the calculations on all variables. This "listwise" deletion of missing data ensures that the statistics for all variables depend upon a common set of observations.
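The effect of listwise deletion is easiest to see on a small made-up data set. In the Python sketch below, the variable names, the values, and the use of None for a missing value are all hypothetical; only the deletion rule itself comes from the description above.

```python
def listwise_complete(rows, variables):
    """Keep only the observations that have a value for every listed variable."""
    return [row for row in rows if all(row.get(v) is not None for v in variables)]

# Hypothetical observations; None marks a missing value.
rows = [
    {"a": 1.0, "b": 10.0},
    {"a": 2.0, "b": None},
    {"a": None, "b": 30.0},
    {"a": 4.0, "b": 40.0},
]

kept = listwise_complete(rows, ["a", "b"])
mean_a_listwise = sum(row["a"] for row in kept) / len(kept)

# For contrast: the mean of a over every observation where a itself is present.
present_a = [row["a"] for row in rows if row["a"] is not None]
mean_a_own = sum(present_a) / len(present_a)
print(len(kept), mean_a_listwise, mean_a_own)
```

The second observation has a valid value for a but is still dropped because b is missing, so the mean of a is computed from two observations and not three.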
 
 
