In the IF subop, you supply a logical expression. This logical
expression is evaluated for each observation. If it is true for that
observation, the command is applied to that observation; otherwise
the observation is ignored for purposes of this command.

We have already encountered logical expressions when we mentioned in
passing the use of the IF subop with the PRINT command. In that case,
recall that we only wanted observations printed when:

    year > 1975

The variable year took values ranging from 1960 to 1980. If, for
instance, year equalled 1964, the above expression would be false, and
the effect of the IF subop would be to skip that observation.

The following operators can be used in logical expressions:

    <     less than
    <=    less than or equal to
    ==    equals
    >=    greater than or equal to
    >     greater than
    !=    not equals
    &     logical and
    |     logical or

The double equals (`==') is used to test equality and should not be
confused with the single equals (`=') which is used in the SET
statement (described below) for assignment. For example:

    print var[money inflat] if[year == 1980]

would print the values of the variables money and inflat for the
year 1980. We add spaces around logical operators to make the expressions
easier to read, but the spaces have no significance for SST and you may
omit them if you like.

The logical and (`&') and logical or (`|') symbols may require some
explanation. SST allows you to build up fairly complicated logical
expressions where multiple conditions are tested. For example:

    print var[money inflat] if[(party == 1) & (inflat > 0)]

would print data only for those observations where party equals one
and inflat is greater than zero. The command:

    print var[money inflat] if[(party == 1) | (inflat > 0)]

would print data for those observations where either party equals one
or inflat is greater than zero or both. Thus `|' means the
non-exclusive or: the expression is true when either condition or both
conditions are true.

The SET statement

The SET statement performs data transformations, either modifying
existing variables or creating new ones using arithmetic expressions.
The simplest SET statement would create a constant taking the
same value for all observations:

    set one = 1

You will now have a variable which is equal to one for all observations.
(This is convenient for regressions and other statistical procedures.)

There is always only one variable name on the lefthand side of the
equals sign in a SET statement. The expression on the righthand
side of the equals sign can be as complicated as you like. Let's start
with some simple examples before building up to more complicated cases.
Suppose, for example, that you have a variable x and you would like
a copy of it. Try:

    set y = x

The variable y contains exactly the same data values as the variable
x. You can now manipulate y as you like without fear of
disturbing the values in x. SST allows you to use virtually any
arithmetic expression that you might desire in the SET statement:

    set y = x+z
    set y = x*z
    set y = x/z
    set y = x^z

SST uses `*' to indicate multiplication, `/' for division, and
`^' for exponentiation. When you build up more complicated
expressions, either surround terms of the expression in parentheses:

    set x = y+(z/w)

or remember the rules of precedence:

    set x = y+z/w

The previous two expressions have the same meaning. When the order of
operations is not determined by parentheses, exponentiation is performed
first, followed by multiplication and division, and addition and
subtraction are performed last. Operations of equal order of precedence
(e.g., multiplication and division) are performed from left to right.
Whenever in doubt, add parentheses to be on the safe side.

Some functions available in SST

SST provides a number of functions that can be used in the SET
statement:

    exp(x)          exponential function, e^x
    log(x)          natural logarithm, ln x
    abs(x)          absolute value, |x|
    sqrt(x)         square root of x
    sin(x)          sine function (x in radians)
    cos(x)          cosine function (x in radians)
    tan(x)          tangent function (x in radians)
    cumnorm(x)      cumulative normal distribution function evaluated at x
    invnorm(x)      inverse cumulative normal evaluated at x
    bvnorm(h,k,r)   bivariate normal probability with correlation r
    phi(x)          normal probability density evaluated at x
    floor(x)        greatest integer less than or equal to x

For the most part, the use of these functions is straightforward. You
type an arithmetic expression substituting variable names, numbers or
other expressions in place of the arguments of these predefined functions.
For example:

    set x = sin(x+exp(y+1))

would be equivalent to x = sin(x + e^(y+1)). When a function
takes multiple arguments (such as the bivariate normal distribution
function), the arguments should be separated by commas. For example:

    set x = bvnorm(0,0,0.5)

gives the probability that two jointly distributed standard normal
variables, each with mean zero and variance one, and with correlation
0.5, will both be negative.

Other functions available in SST

The functions described above are scalar functions: they take as
arguments single numbers. SST also supports several vector functions
which take vectors as arguments. For example, you might want to
"standardize" a variable x by deviating it from its mean and
dividing by its standard deviation. To do this, you would use the
mean() and stddev() functions:

    set z = (x-mean(x))/stddev(x)

The vector functions available in SST include:

    sum(x)      sum of the values of x
    mean(x)     mean of x, sum(x)/n
    stddev(x)   standard deviation of x

where n indicates the number of nonmissing observations in the
variable x.

In evaluating vector functions, SST uses all valid observations of its
argument. This means that mean(x+y) may not equal mean(x)+mean(y) since
missing value deletion is not done listwise. (The COVA command,
on the other hand, uses listwise deletion of missing values so that all
statistics are based on a common set of observations.)

In addition, you can define your own functions using the DEFINE command
described in Chapter 8 of the User's Guide.

Random number generators

SST makes available two random number generators that enable you to
perform Monte Carlo simulations. These are the urnd function for
generating random variables with a uniform distribution over the
interval [0,1] and the nrnd function for generating standard normal
random variables:

    urnd    uniform random variable
    nrnd    normal random variable

Neither of these functions takes arguments, so you do not supply
parentheses or arguments when using them. To create 1000 random values
from a normal distribution with mean 1.0 and standard deviation 4.0, give
the commands:

    range obs[1-1000]
    set x = 1.0 + 4.0*nrnd

Reserved variable names

SST allows you to use a number of predefined variables in SET
statements. These include:

    nobs    number of observations for the variable
    obsno   the number of the observation being evaluated

For example, the variable x can be set equal to the observation
number:

    set x = obsno

The variable x will equal one for the first observation, two for
the second observation, and so forth.

How to lag a variable

SST allows you to refer to the value of a variable for a particular
observation by enclosing the observation number in parentheses after
a reference to that variable. For example, the command:

    set y = x(1)

would set all observations on the variable y equal to
whatever value happened to be stored as the first observation of the
variable x. This device can be used to "lag" variables. For
example, to set xlag equal to the value of x lagged one period:

    set xlag = x(obsno-1)

The values of xlag will correspond to the values of x for the preceding
observation. The first observation of xlag will be missing, since it
has no preceding observation.

Conditional transformations

Sometimes you will want to perform a transformation on only a subset
of the data. This can be done by including the IF or OBS subop in the
SET statement. The IF and OBS subops are optional; if present, they
determine which observations will be affected by the SET statement.

The SET statement differs from other SST commands in that any
subops, such as IF or OBS, must be separated from the
righthand side of the arithmetic expression in a SET statement by a
semicolon. This is how SST knows that you are finished entering your
arithmetic expression. For example, you can't take the square root of a
negative number and it's best not to tempt SST to do so. If you
want to set y to be the square root of x when x is
non-negative, give the command:

    set y = sqrt(x); if[x>=0.0]

and SST will avoid those observations for which x is negative.
Values not included in the range specified by the IF subop are not
affected by the command. (Had you omitted the IF subop, SST would have
assigned missing values to those observations for which the operation was
illegal.)

The RP subop

In performing a conditional transformation, observations not satisfying
the condition specified in the IF or OBS subop will not be affected
by the SET statement. If the variable on the lefthand side of the
SET statement is mentioned for the first time, these observations
will be assigned missing values. If the variable has already been
created, however, observations not in the active observation range
determined by the IF and OBS subops will be left with
their old values.

Occasionally you will want to replace old values outside of the
active observation range with missing values. To do this, use the
RP subop and these observations will be
overwritten as missing data. For example:

    set x = 1; obs[1-10] rp

For observations one through ten, the value of x will equal one. For
all other observations, x will be missing, regardless of what previous
data values were stored there.

Calculating single values

Sometimes you may want to transform only a single value of a variable
or to check one data value. This can be done by putting SST into
"calculator mode". Type:

    calc

and SST responds with the calculator prompt:

    CALC>

You can now enter the same kind of expressions that you would
with the SET statement and the answer will appear on your
screen:

    CALC>sqrt(4)
    2.00000

If you specify a variable name in calculator mode, it should normally
be accompanied by a reference to the observation you want. For example,

    CALC>year(1)
    1960.00000

since 1960 is the value of the variable year corresponding to the first
observation. With vector functions, no reference to an observation number
is used. For example, to obtain the mean of the variable x, type:

    CALC>mean(x)

and SST would respond by printing the mean of x. To exit calculator
mode, type `quit' or `q'.

CALC can be run from the command line without entering calculator
mode by typing the expression that you want on the command line:

    calc 2+2

SST will calculate the value you want, print it on the screen, and return
you to normal command mode.

Recoding data

The RECODE command allows you to reassign values of a variable.
You supply a list of variable names in the VAR subop and a "map"
in the MAP subop that instructs SST how to reassign the values
of the variables. In the MAP subop, you provide a list of values,
enclosed in parentheses if the list consists of more than one value,
followed by a new value to which the old values in the list are to be
recoded. For example,
suppose the variable x takes the values 1, 2 and 3, but we want to
make it into a 0-1 dummy variable with the old value 1 coded as 0 and the
values 2 and 3 coded as 1. Enter:

    recode var[x] map[1=0,(2,3)=1]

The same operation could be performed specifying how each old value is
to be recoded, though this is tedious:

    recode var[x] map[1=0,2=1,3=1]

The RECODE command allows you to simplify the MAP subop using
the keywords hi, lo, thru and else. For
example, to recode all negative values of x as -1 and positive
values as 1, you could enter:

    recode var[x] map[(lo thru 0)=-1,(0 thru hi)=1,0=0]

When a range is specified using the keyword thru, SST assumes that
the range includes its endpoints. When a value falls into more than one
range specified in the MAP subop, the last recoding is the
one used. Thus, in the above example, zero matches all three entries and
is recoded as zero by the final entry (0=0); without that entry it
would be recoded as one.

If you want to preserve the old variable in the RECODE command,
specify a new variable name for the recoded values using the TO
subop:

    recode var[x] map[(lo thru 0)=-1,(0 thru hi)=1,0=0] to[y]

In the example above, the variable x would remain unchanged while the
new variable y would receive the recoded values. If the TO subop
is specified, the same number of variables must be included in the TO
subop as in the VAR subop or an error will occur.

The RECODE command can also be used in conjunction with the
IF, OBS, and RP subops. IF and OBS control
which observations the recoding will be applied to, while RP determines
whether or not old values of the variable not included in any of the
ranges specified in the MAP subop will be recoded as missing.
Missing values are admissible as either old or new values of the variable
and are designated using either `md' or a period (`.').

Note that anything that can be done by the RECODE command can also be
done using a series of SET statements. For example, the above recoding
could also be accomplished by:

    set y = -1; if[x < 0]
    set y = 0; if[x == 0]
    set y = 1; if[x > 0]

The RECODE command just decreases the amount of typing that is
necessary.

Setting missing values

The RECODE command can be used to assign missing values. For
example, if you would like the value -99 of the variable x to
be treated as missing, use:

    recode var[x] map[-99=md]

To change all missing values to a numerical value, use (for example):

    recode var[x] map[md=-99]

Once SST marks a value as missing it does not preserve the old
value. Thus the second command above would change all values
which were missing to -99, not just the ones that were recoded to
missing by the first RECODE.

Creating dummy variables

Many times you may want to create a dummy variable which takes the value
one if some condition holds and otherwise takes the value zero. There are
two simple ways to create a dummy variable in SST which we now illustrate.
Suppose we want to create a dummy variable y which equals one if the
variable x is greater than or equal to 100 and zero otherwise. We
could use the RECODE command:

    recode var[x] map[(lo thru 100)=0,(100 thru hi)=1] to[y]

The same operation could be performed using the SET command:

    set y = (x>=100)

The last command works because the logical expression x>=100 is
assigned the value one if it is true and zero if it is false. This
is a somewhat exotic usage of the SET statement, but if you
understand it, it can save you some typing.
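
As an informal check that the two approaches to a dummy variable agree at the boundary value 100, here is a small Python sketch. It is not SST code. It assumes only the rules stated under "Recoding data": ranges include their endpoints, and when a value falls into more than one range the last recoding is used. The helper recode and the list dummy_map are hypothetical names introduced purely for illustration.

```python
import math

def recode(value, mapping):
    """Mimic a recoding map: ranges include their endpoints, last match wins."""
    result = value  # a value matching no range is left unchanged in this sketch
    for low, high, new in mapping:
        if low <= value <= high:
            result = new  # a later matching range overrides an earlier one
    return result

# Illustrative counterpart of a map sending "lo thru 100" to 0
# and "100 thru hi" to 1 (assumed semantics, not SST itself).
dummy_map = [(-math.inf, 100, 0), (100, math.inf, 1)]

x = [5, 99.5, 100, 250]
y_recode = [recode(v, dummy_map) for v in x]   # recoding approach
y_logical = [int(v >= 100) for v in x]         # logical-expression approach

print(y_recode)
print(y_logical)
```

Under these assumptions both lists come out the same: the value 100 falls into both ranges, and the later range assigns it the value one, which matches the result of the comparison x>=100.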