Annual data for the period from 1960 to 1980 are taken from the Economic Report of the President. The data are as follows:
Year  Money  Inflation  Unemployment  Party
1960    0.7        1.6           5.5      1
1961    3.2        0.9           6.7      0
1962    1.8        1.8           5.5      0
1963    3.7        1.5           5.7      0
1964    4.6        1.5           5.2      0
1965    4.7        2.2           4.5      0
1966    2.5        3.2           3.8      0
1967    6.6        3.0           3.8      0
1968    7.7        4.4           3.6      0
1969    3.2        5.1           3.5      1
1970    5.3        5.4           4.9      1
1971    6.5        5.0           5.9      1
1972    9.3        4.2           5.6      1
1973    5.5        5.7           4.9      1
1974    4.4        8.7           5.6      1
1975    5.0        9.3           8.5      1
1976    6.6        5.2           7.7      1
1977    8.1        5.8           7.1      0
1978    8.3        7.3           6.1      0
1979    7.2        8.5           5.8      0
1980    6.4        9.0           7.1      0
Money is the money supply growth rate (percent increase in M1 over each year). Inflation is the percent increase in the implicit GNP price deflator. Unemployment is measured as a percent of the civilian labor force. Party is the party holding the presidency (one for Republicans, zero for Democrats).
SST has three commands for entering data: ENTER, READ, and LOAD. ENTER is used to input small amounts of data from the keyboard, READ to enter data stored in text files, and LOAD to enter previously saved data from SST or other programs.
The ENTER command is used to enter new data or change existing data from the keyboard in interactive mode. You tell SST the variables you wish to create or alter and a range of observations, and then SST prompts you for data values. The syntax for the ENTER command is:
enter to[variable list] obs[observation list]
SST will prompt you for data values on the variables specified in the TO subop in the range specified by the OBS subop. When finished with data entry, type the letter `q' or `quit'.
SST will supply the variable name, with the observation number in parentheses, followed by the current value of that variable in brackets. ("MD" indicates missing data if no value currently exists for the particular observation.) You can either change the value by typing a new value, followed by a carriage return, or leave the value unchanged by typing a carriage return. To enter a missing value, type either `MD', `md' or a period `.'.
Multiple values can be entered on one line, separated by blanks or commas. After the carriage return is pressed, the program then prompts you for the next data value. If you have not supplied a data value for all variables for the particular observation, it will remind you which variable comes next. If all data has been entered for a particular observation, it then prompts you for the next observation.
For example, to enter the data listed at the start of this chapter, type:
enter to[year money inflat unemp party] obs[1-21]
SST responds with the prompt:
year(1) [ MD ]:
You could then type `1960' followed by a carriage return. The remainder of the session might continue as follows (carriage returns are entered after each list of data values):
money(1) [ MD ]: 0.7
inflat(1) [ MD ]: 1.6
unemp(1) [ MD ]: 5.5
party(1) [ MD ]: 1
To speed things up, you may want to type more than one value after each prompt. For example:
year(2) [ MD ]: 1961 3.2 0.9
unemp(2) [ MD ]: 6.7 0
Thus, the value of money for observation 2 is 3.2, the value of inflat for observation 2 is 0.9, and so forth.
For small amounts of data, ENTER works well, but you probably would not want to enter large amounts of data this way. To stop entering data at any point during the ENTER command, just type `q' or `quit':
year(3) [ MD ]: quit
and SST will be ready to accept new commands.
SST allows you to designate some values as missing with the ENTER command. When prompted for a value, type either `MD' or a period (`.'), and SST will mark that data value as missing.
The ENTER command lets you input data from the keyboard in response to prompts. In some situations, however, it may be faster to create a text file with your data and input it using the READ command. For instance, you may have already typed your data into a file or have been given your data in this format.
Text files on the IBM-PC (and most other computers) represent characters using standard ASCII codes that can be displayed by using the DOS type command. Some spreadsheet and statistical programs (including SST) store data in a more compact binary format that cannot be displayed using the type command. If your data is in this form, check the LOAD command for details.
A data file can be created using a text editor or word processor (such as WordStar). If a word processor is used, be sure to use the "non-document" mode so that your word processor does not insert "invisible" control characters into the text file. SST normally ignores control characters. Although data need only be separated by a comma, space, or carriage return, the data set will be easier for you to read if the data is separated into fixed columns. You may want to set up tabs to input data into a fixed column format.
For our example, the "observations" correspond to years. For the data to be organized by observation, the input file would look like:
1960 0.7 1.6 5.5 1
1961 3.2 0.9 6.7 0
and so on. On each line of the input file, there are five data values corresponding to the year, money supply growth rate, inflation rate, unemployment rate, and party holding the presidency for the particular observation (year) in question. The data for a single observation can occupy more than one line of the input file, but in general you might think of a data file organized by observation as being a rectangular array with variables defining columns and observations defining rows. Unless you state otherwise, the READ command expects data to be stored by observation.
Data organized by variable for the above example might look like:
1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980
0.7 3.2 1.8 3.7 4.6 4.7 2.5 6.6 7.7 3.2 5.3
and so on. For data organized by variable, all data is input for one variable before data is entered on the next variable. If the data for each variable could fit on one line of the input file, then data organized by variable could be viewed as a rectangular array with variables defining rows and observations defining columns.
In the example we are using, the distinction between variables and observations is probably pretty clear. In other cases, however, the distinction will depend on how one wants to use the data. For example, each year a number of organizations prepare forecasts of GNP, inflation and other macroeconomic indicators. Suppose one wanted to analyze this data. In this case, it is not obvious what the variables should be. One possibility is to make the variable the forecast of a particular organization. Thus one has variables like OMB, CBO, DRI, CHASE, and WHARTON, with observations defined by which macroeconomic indicator is being forecast. Another possibility is to have variables corresponding to different macroeconomic indicators, with each observation corresponding to the organization that produced the forecast. Which way of dividing the data into variables and observations is appropriate will depend on how one wants to use the data.
Note that the computer simply reads left to right, by row. Thus there is no difference to the computer between the following two data sets:
4.2 3.5 4.0 3.3
0.5 1.0 1.5 1.2
and
4.2 3.5
4.0 3.3
0.5 1.0
1.5 1.2
Of course if one is creating the data set, it may be simpler to read if each column and row correspond to a different variable or observation.
To summarize, decide which are variables and which are observations for your purpose. If the data set exists, then see whether the computer reading by row will first see different observations for the first variable (data by variable) or the values of different variables for the first observation (data by observation). If you are creating the data set, the method of organization is a matter of convenience.
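The distinction can be illustrated outside SST. The following Python sketch is not SST code; it takes the eight values from the example above as one left-to-right stream and interprets them both ways. The variable names `x` and `y` and both function names are hypothetical, chosen only for this illustration.

```python
def by_observation(values, names):
    """Interpret a flat stream as consecutive observations (rows)."""
    k = len(names)
    rows = [values[i:i + k] for i in range(0, len(values), k)]
    return {name: [row[j] for row in rows] for j, name in enumerate(names)}


def by_variable(values, names):
    """Interpret the same flat stream as one block of values per variable."""
    n_obs = len(values) // len(names)
    return {name: values[j * n_obs:(j + 1) * n_obs]
            for j, name in enumerate(names)}


# The same stream of numbers, read left to right, by row.
stream = [4.2, 3.5, 4.0, 3.3, 0.5, 1.0, 1.5, 1.2]
names = ["x", "y"]  # hypothetical variable names

obs_view = by_observation(stream, names)
var_view = by_variable(stream, names)
print(obs_view)
print(var_view)
```

The stream itself is identical in both calls; only the interpretation changes, which is why the choice has to be stated when the data is read.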
To use the READ command you must supply two pieces of information: the name of the file which contains the data and the names of the variables in the file. If the data in our example were organized by observation, the SST command to read the data from the file mydata would be:
read to[year money inflat unemp party] file[mydata]
Unless instructed otherwise, the READ command expects data to be organized by observation. If the data in the file mydata were organized by variable, the appropriate command would be instead:
read to[year money inflat unemp party] obs[1-21] byvar
The BYVAR subop tells SST that the data is organized by variable instead of by observation.
Several things can go wrong when using the READ command. First, the data file may contain illegal characters. Data files used in the READ command should only contain valid numbers. Valid numbers can be in integer, decimal, or exponential format. For example:
1 1.0 +1.0e0
are all examples of valid numbers (each with the same meaning). On the other hand, a file containing:
1 abc 2.0
would cause SST to issue an error message and abort the READ command.
Second, the number of values in the data file may not correspond to your instructions in the READ command. The number of data values in a file should be a multiple of the number of variables specified in the TO subop. If data is read by variable, SST has no way to determine the number of observations in the file other than to divide the total number of data values in the file by the number of variables. If the total is not an exact multiple of the number of variables, it assumes that you have made a mistake and issues an error. If data is read by observation, it will issue a warning; generally this means something is amiss and you should examine your data file.
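The two checks described so far (every value is a valid number, and the count is a multiple of the number of variables) can be sketched in Python. This is an illustration only, not SST's implementation: the function name `check_values` is hypothetical, Python's `float` accepts a few forms (such as `inf`) that SST may not, and the sketch simply raises an error whenever values are left over, regardless of how the data is organized.

```python
def check_values(tokens, n_vars):
    """Return the number of complete observations, or raise ValueError."""
    for tok in tokens:
        try:
            float(tok)  # accepts integer, decimal, and exponential forms
        except ValueError:
            raise ValueError(f"not a valid number: {tok!r}") from None
    n_obs, leftover = divmod(len(tokens), n_vars)
    if leftover:
        raise ValueError(
            f"{len(tokens)} values is not a multiple of {n_vars} variables"
        )
    return n_obs


# Three spellings of the same number, read as one observation on 3 variables.
n = check_values(["1", "1.0", "+1.0e0"], 3)
print(n)
```

Running a check of this kind on a file before issuing READ is one way to locate a stray character or a dropped value.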
Third, reading data normally requires two passes through the data file: one to determine how many observations are in the data file, and another to process the data. If you know how many observations are in a file, you can speed up the READ command by giving it this information in the NOBS subop:
read to[year money inflat unemp party] nobs[21]
In reading large datasets, you may run out of memory. Reading data by variable is somewhat more efficient than reading data by observation. The former only requires that the data on a single variable fit into memory at once, while the latter requires that the entire data set fit into memory at once. If you run out of memory, try entering data in smaller batches, saving each batch using the SAVE command, and bringing the saved data back in with the LOAD command, which can handle very large data sets efficiently.
SST is flexible about how data values are separated. It will read:
1 2 3 , 4 5
the same as it will read:
1,2,3 4,5
For your own sanity, we suggest that you use a consistent system for data entry, but don't worry about SST -- it's very tolerant.
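As an illustration of delimiter-tolerant reading, the following Python sketch splits text on any run of commas, blanks, or line breaks. It is not SST's parser: the function name `split_values` is hypothetical, and treating a run of mixed delimiters as a single separator is an assumption of this sketch.

```python
import re


def split_values(text):
    """Split text on any run of commas and/or whitespace."""
    return [tok for tok in re.split(r"[,\s]+", text) if tok]


a = split_values("1 2 3 , 4 5")
b = split_values("1,2,3\n4,5")
print(a)
print(b)
```

Both inputs produce the same five tokens, which is the sense in which the two layouts above are equivalent.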
Users accustomed to mainframe computing often prefer to store their data in a fixed column format without spaces, commas, or other delimiters between data values. This format saves space, though it is somewhat difficult to examine. SST allows users to specify a FORTRAN style format statement for data in this form using the FMT subop.
In fixed format the data are required to appear in specified positions within the file. A summary of FORTRAN format statements appears in an appendix, so we will only provide a few simple examples here. The letter F in a FORTRAN format statement tells SST that you will be inputting a floating point number. The letter F is followed by an integer indicating how many columns the number will occupy. Thus F3 tells SST that you will be inputting a floating point number occupying three columns in the data file. For example, the following data file:
123456
could be read using the FMT statement:
fmt[F3,F3]
The first number (123) occupies three columns and the second number (456) also occupies three columns. Instead of repeating the specification F3 twice, you could specify repeats of the same specification by preceding the letter F with an integer indicating the number of times the specification is to be repeated. Thus:
fmt[2F3]
is equivalent to the specification above. FORTRAN format statements are quite flexible, though perhaps a bit complicated for new users.
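To make the column arithmetic concrete, here is a small Python sketch of fixed-column reading. It is not SST's FMT implementation and handles only the simple `F3` and `2F3` forms used in this section; the function names `expand_format` and `read_fixed` are hypothetical.

```python
import re


def expand_format(fmt):
    """Expand a format such as 'F3,F3' or '2F3' into a list of field widths."""
    widths = []
    for item in fmt.split(","):
        m = re.fullmatch(r"\s*(\d*)F(\d+)\s*", item, flags=re.IGNORECASE)
        if m is None:
            raise ValueError(f"unsupported format item: {item!r}")
        repeat = int(m.group(1) or 1)  # a missing repeat count means once
        widths.extend([int(m.group(2))] * repeat)
    return widths


def read_fixed(line, fmt):
    """Cut a line into consecutive fixed-width fields and convert each."""
    values, pos = [], 0
    for width in expand_format(fmt):
        values.append(float(line[pos:pos + width]))
        pos += width
    return values


first = read_fixed("123456", "F3,F3")
second = read_fixed("123456", "2F3")
print(first, second)
```

Both format strings expand to two three-column fields, so the two calls return the same pair of numbers.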
To check what you have entered, use the LIST command:
list
SST will now provide you a listing of all variables entered, the number of non-missing observations on each variable, the date created, and the variable's label, if any (see below, for details of how to label a variable). For example:
Listing of variables in memory:
year      21  Thu Jan 09 14:41:06 1986
money     21  Thu Jan 09 14:41:06 1986  change in M1 from year earlier
inflat    21  Thu Jan 09 14:41:06 1986  change in GNP implicit price deflator
unemploy  21  Thu Jan 09 14:41:06 1986  civilian unemployment rate
party     21  Thu Jan 09 14:41:06 1986  republican president dummy
Are all the variables entered that you thought should be entered? Does each variable have the number of observations that you expected? If you just input a variable using the READ command, the date and time on the variable should be very recent.
Even if the information supplied by the LIST command is what you expected, you will still want to check if the data values are correct. There are several ways to do this. If you don't have too much data, you can examine it using the PRINT command. For example, type:
print var[year money]
and SST will print the values of the variables year and money that you have input. For large datasets, you will probably want to restrict the observations printed by specifying a limited observation range:
print var[year money] obs[1-10]
The OBS subop restricts which values will be printed out on the screen. The above example would only print the data for observations one through ten. Alternatively, the observation range can be restricted using the IF subop:
print var[year money] if[year > 1975]
which would print out data for years after 1975.
Another way to check the data that you have input is to compute some descriptive statistics on the data. If the data are discrete (i.e., take only a few distinct values), the FREQ command will show you which values the variable takes and the percentage of observations falling into each category. For example:
freq var[party]
would compute a frequency distribution for the variable party.
For variables that take a large number of distinct values (any of the other variables in our data set), the COVA command will produce a few useful descriptive statistics on the variable:
cova var[year money inflat unemp]
The COVA command automatically produces the mean, standard deviation, minimum, and maximum of each variable specified in the VAR subop. Usually if there has been some error in data entry, one or more of these statistics will tip you off.
Further details of the PRINT, FREQ and COVA commands can be found in Chapter 4 of the User's Guide.
A descriptive label can be attached to a variable using the LABEL command. For example, type:
label var[money] lab[change in M1 from year earlier]
Inside the LAB subop, you type whatever description you want attached to the variable. The variable label ordinarily should not exceed thirty characters. The label will be printed when you issue the LIST command and at other points when you access the variable.
A variable such as party, which takes only a few values (in our case, Republican and Democratic), can also have labels assigned to specific values. We have coded party equal to one when Republicans hold the White House and zero when the Democrats hold the White House:
label var[party] val[1 Repub 0 Democrat]
In the VAL subop, you first list a value of the variable named in the VAR subop followed by its label, and continue until you have finished labelling the values. Value labels are restricted to a maximum of eight characters and must not contain spaces or commas. Multiple variables whose categories have the same value labels can be labelled simultaneously by specifying more than one variable name in the VAR subop. In principle, there is no limit to the number of value labels that can be assigned, but few people have enough patience to type more than ten labels.
If you give the LABEL command with the VAR subop, but omit both the LAB and VAL subops, SST will remove all labelling information from the variables specified. Since labelling information requires relatively little storage and is an invaluable reminder when you return to a data set that you have not worked with for a while, we recommend that you keep as much labelling information as possible.
READ command). The command to save all data currently in SST into a file myfile.sav is:
save file[myfile]
SST automatically adds the extension `.sav' to the filename you specify in the FILE subop. (If for some reason you wanted another extension, you would have to specify the full filename and extension in the FILE subop.) You may not want to save all variables in memory. In this case SST allows you to list which variables you want saved:
save file[myfile] var[year money inflat]
Alternatively, you might only want to save some subset of the data. To save only the first ten observations, add an observation range using the OBS subop:
save file[myfile] obs[1-10]
The observation range can also be restricted using the IF subop. To save only the post-1975 data, type:
save file[myfile] if[year > 1975]
To see what is stored in a system file, use the LIST command with the FILE subop:
list file[myfile]
SST only reads the "header" off the system file, so issuing this command does not cause the data to be actually entered into SST. It tells you what is in the file, but does not waste time reading through the entire file.
The LOAD command is used to load a data set previously saved during an SST session. Once you have gone to the trouble of saving data in the form of an SST system file, reloading it is easy. Just type:
load file[myfile]
and SST loads the data and labelling information. It's fast and simple. If no filename extension is specified in the FILE subop, SST assumes the extension `.sav'. To load only selected variables stored in the system file myfile.sav, include the variables that you want in the TO subop:
load file[myfile] to[year money]
To add variables from a second system file to those already in memory, issue another LOAD command, and the additional variables will be loaded into memory. (Caution: If some of the variables in the second data set have the same names as variables in the first, the old values will be overwritten.)
On other occasions, you may have two or more samples of data on the same variables. For example, you may have several household expenditure surveys conducted in different years. The variables in each data set are the same (or at least overlapping) and you want to combine the various samples. To do this, just add the APPEND subop to the LOAD command:
load file[yourdata] append
The variables in the file yourdata will be appended to whatever variables are currently in memory. The starting observation for the new data is determined by the maximum observation number of the data currently in memory (which can be determined using the LIST command).
Data stored in dBASE II files can also be read with the LOAD command. Give the LOAD command and add the DB2 subop:
load file[filename] db2
SST assumes an extension of `.dbf' to the filename specified in the FILE subop, unless told otherwise. (dBASE II uses this extension by default when it produces a file in its standard format.)
Another common format for files produced by spreadsheet programs is the DIF (Data Interchange Format) format used by VisiCalc and other programs. To load a DIF file, enter:
load file[filename] dif
If no extension is specified, `.dif' is assumed when the DIF subop is present. With DIF files, column labels are used for variable names. If no column labels are present, names are assigned by SST.
To delete variables from memory, use the DEL command:
del var[x y]
After they have been deleted, the variables x and y are lost unless you previously saved them in a file. Use the DEL command with caution!
A wholesale delete of all variables (and everything else) from memory can be accomplished using the CLEAR command:
clear
CLEAR also resets the range, so you should issue a new RANGE statement after the CLEAR command. The primary use of the CLEAR command is to restart an SST session without having to QUIT and reload the program into memory. Remember, however, that CLEAR removes everything from memory. It does not affect files that have been written to disk, but it is your responsibility to SAVE any data that you will need in the future.
The SORT command sorts the observations on the variables specified in the VAR subop according to values of the variables specified in the BY subop. If more than one variable is specified in the BY subop, the sort is lexicographic--that is, first the data is sorted according to the first variable, and the second variable is only used to break ties in the first variable, and so on. Variables are sorted in ascending order: low values are put ahead of high values. Missing values are treated as large values so that missing values in the variables specified in the BY subop tend to end up at the bottom of the data file.
The VAR subop is optional. If it is omitted, SST assumes that you want all of your data sorted so that observations are kept intact. The SORT command writes over the variables specified in the VAR subop; it is wise to save your data using the SAVE command prior to using SORT.
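The ordering rules can be illustrated with a short Python sketch. It is not SST's implementation: the function name `sort_rows`, the variable names `a` and `b`, and the data values are all hypothetical, and `None` stands in for a missing value. Whole observations are moved together, which corresponds to omitting the VAR subop.

```python
import math


def sort_rows(rows, by):
    """Sort observations ascending on the `by` variables, missing values last."""
    def key(row):
        # A missing value (None) is treated as larger than any number.
        return tuple(math.inf if row[name] is None else row[name]
                     for name in by)
    return sorted(rows, key=key)


# Hypothetical observations on two hypothetical variables.
rows = [
    {"a": 2, "b": 1},
    {"a": 1, "b": 5},
    {"a": 2, "b": None},
    {"a": None, "b": 0},
    {"a": 1, "b": 3},
]
result = sort_rows(rows, by=["a", "b"])
for row in result:
    print(row)
```

The second variable matters only where the first is tied, and the observation with a missing first variable ends up at the bottom.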
One use of the SORT command is to arrange your data in a way that permits visual (as opposed to statistical) analysis. For example, suppose you wanted to examine the relationship between growth in the money supply and inflation. You could sort the data by money supply growth, and then look at the associated inflation rates, as below:
sort by[money]
print var[year money inflat unemp]
OBS   VARIABLES
      year  money  inflat  unemp
 1:   1960    0.7     1.6    5.5
 2:   1962    1.8     1.8    5.5
 3:   1966    2.5     3.2    3.8
 4:   1969    3.2     5.1    3.5
 5:   1961    3.2     0.9    6.7
 6:   1963    3.7     1.5    5.7
 7:   1974    4.4     8.7    5.6
 8:   1964    4.6     1.5    5.2
 9:   1965    4.7     2.2    4.5
10:   1975    5.0     9.3    8.5
11:   1970    5.3     5.4    4.9
12:   1973    5.5     5.7    4.9
13:   1980    6.4     9.0    7.1
14:   1971    6.5     5.0    5.9
15:   1967    6.6     3.0    3.8
16:   1976    6.6     5.2    7.7
17:   1979    7.2     8.5    5.8
18:   1968    7.7     4.4    3.6
19:   1977    8.1     5.8    7.1
20:   1978    8.3     7.3    6.1
21:   1972    9.3     4.2    5.6
It appears that years with high money supply growth rates accompany years with high inflation rates. This relationship could then be investigated further using the statistical procedures described in later chapters.
The same sorting could be accomplished with:
sort by[money] var[year money inflat unemploy party]
since SST assumes that you want all variables sorted if the VAR subop is omitted. If we had specified only a subset of variables in the VAR subop, then the data on different observations would be split up.
If you add the DIF or DB2 subops to the SAVE command, SST will write either a DIF file or a dBASE II file. For DIF files, variable names will be used for column names. For dBASE II files, variable names will be used for field names.
Data can be written to a text file using the WRITE command. Unless a FORTRAN format is specified using the FMT subop, the data will be output with a space separating each data value. The default output format is by observation. For example:
write var[year money] file[myfile.out]
would create a file myfile.out with contents:
1960 0.7
1961 3.2
and so on.