SST expansion filters


An Overview

SST uses several filters when parsing commands. These filters can be broken up into three distinct levels:

Level                   Filters
-----                   -------
command preprocessor    history, macros, arrays
scanner                 comments, output redirection
expansion               wildcards, ranges, sets

The command preprocessor and scanner filters are covered elsewhere; this section concerns only the expansion-level filters. These filters process the list of variable names used by certain commands and subops. The wildcard filter replaces wildcard symbols (`*' or `?') with all matching variable names. The range filter selects the existing variables that fall within a variable range. The set expansion filter provides a shorthand for specifying a list of words or numbers.

When the expansion filters can be used

The expansion filters are used only inside subops that take a list of variables as their argument:

CENSOR  WEIGHT  KEY     IV      BY      DEP     IND     VAR
PROB    RSD     SRSD    PRED    HAT     TO      IVALT   MODEL

The expansion filters also process the FOREACH and ARRAY commands. The list of values for the index of a FOREACH statement is passed through the expansion filters before being broken up into individual words. Similarly, the string to the right of the equals sign in an ARRAY command is sent through the expansion filters before being parsed into a list of words.

Wildcard expansion

The wildcard characters, `*' and `?', allow variable name substitution in a manner similar to MSDOS and UNIX filename substitution. If a word contains wildcard characters, it is expanded to a list of matching variable names separated by spaces. The following rules determine which variables match a wildcard string:

  1. Any character except a special character matches itself.
  2. A `?' matches any single character.
  3. A `*' matches any string of characters (including the empty string).
  4. A backslash followed by a special character (`?', `*', or `"') matches that character (see the section on quoting below).

It is an error if a wildcard expression does not match any variables.

Consider the following sample list of variables:

pop15   pop45   pop75   pop105

The following examples illustrate some of the valid wildcard expansions:

pop1*   -->     pop15 pop105
pop1?   -->     pop15
pop1?*  -->     pop15 pop105
pop?5   -->     pop15 pop45 pop75
pop*5   -->     pop15 pop45 pop75 pop105
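The matching rules above can be sketched in Python by translating each wildcard string into a regular expression and testing it against every defined variable. This is an illustration, not SST's own code: the function names wildcard_to_regex and expand_wildcard are hypothetical, and returning matches in the order the variables were defined is an assumption.

```python
import re

# Characters that rule 4 allows to be escaped with a backslash.
ESCAPABLE = {"*", "?", '"'}


def wildcard_to_regex(pattern: str) -> str:
    """Translate a wildcard string into an equivalent regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern) and pattern[i + 1] in ESCAPABLE:
            parts.append(re.escape(pattern[i + 1]))  # rule 4: escaped character
            i += 2
            continue
        if c == "?":
            parts.append(".")  # rule 2: any single character
        elif c == "*":
            parts.append(".*")  # rule 3: any string, including the empty one
        else:
            parts.append(re.escape(c))  # rule 1: ordinary characters match themselves
        i += 1
    return "".join(parts)


def expand_wildcard(pattern: str, variables: list[str]) -> list[str]:
    """Return the matching variables in definition order; error if none match."""
    regex = re.compile(wildcard_to_regex(pattern))
    matches = [v for v in variables if regex.fullmatch(v)]
    if not matches:
        raise ValueError(f"no variables match {pattern}")
    return matches


variables = ["pop15", "pop45", "pop75", "pop105"]
for pat in ["pop1*", "pop1?", "pop1?*", "pop?5", "pop*5"]:
    print(f"{pat:8}-->  {' '.join(expand_wildcard(pat, variables))}")
```

Using fullmatch rather than search is what makes pop1? reject pop105: the pattern must account for every character of the name.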

Range expansion

Range expansion operates on strings of the form

stemXXX-stemYYY

where XXX and YYY are two positive integers such that XXX <= YYY. SST replaces such strings with a list of all variables whose names consist of the prefix stem followed by a number in the range XXX to YYY (inclusive). Using our previous set of variables, the following expansions hold:

pop10-pop75     -->     pop15 pop45 pop75
pop1-pop200     -->     pop15 pop45 pop75 pop105

Stem matching is performed using the wildcard matching algorithm, so the characters `*' and `?' can appear in the stem:

p*1-p*100       -->     pop15 pop45 pop75

It is an error if the stems are not identical or if no variables in the range exist.
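Range expansion can be sketched the same way. The sketch below is hypothetical (expand_range and stem_regex are illustrative names) and assumes that a variable's number is the longest run of digits at the end of its name; the text above does not say exactly how SST splits a name into stem and number.

```python
import re


def stem_regex(stem: str) -> str:
    """Apply the wildcard rules to a stem: '*' and '?' are special, all else literal."""
    return "".join(".*" if c == "*" else "." if c == "?" else re.escape(c) for c in stem)


def expand_range(word: str, variables: list[str]) -> list[str]:
    """Expand 'stemXXX-stemYYY' into the existing variables in the range."""
    m = re.fullmatch(r"(.*?)(\d+)-(.*?)(\d+)", word)
    if m is None:
        raise ValueError(f"{word} is not a valid variable range")
    stem, low, stem2, high = m.group(1), int(m.group(2)), m.group(3), int(m.group(4))
    if stem != stem2:
        raise ValueError(f"stems differ in {word}")
    stem_rx = re.compile(stem_regex(stem))
    selected = []
    for v in variables:
        # Assumption: the number is the longest run of trailing digits.
        parts = re.fullmatch(r"(.*?)(\d+)", v)
        if parts and stem_rx.fullmatch(parts.group(1)) and low <= int(parts.group(2)) <= high:
            selected.append(v)
    if not selected:  # also covers XXX > YYY, which selects nothing
        raise ValueError(f"no variables exist in the range {word}")
    return selected


variables = ["pop15", "pop45", "pop75", "pop105"]
for word in ["pop10-pop75", "pop1-pop200", "p*1-p*100"]:
    print(f"{word:16}-->  {' '.join(expand_range(word, variables))}")
```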

Set expansion

A set is a string of the form {set_description} (the braces are required). The set description is one of two things:

  1. A list of words separated by commas or spaces
  2. A numeric range of the form start-stop (where start and stop are integers with start <= stop).

In either case the string expands to all the possible values of the set expression. The following examples illustrate the use of set expansion:

pop{15,17,99}   -->     pop15 pop17 pop99
pop{15-20}      -->     pop15 pop16 pop17 pop18 pop19 pop20

A head string and a tail string may also be attached to the set; each element of the set is then placed between them:

he{5,l,ar}d     -->     he5d held heard

Unlike wildcard and range expansions, the expanded words do not have to already exist as variables. The set description may also contain other sets:

{a,b{1,2,3},c}  -->     a b1 b2 b3 c
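Set expansion is naturally recursive: find the first brace, find its matching close, expand each item of the description (which may contain further sets), and glue the head and tail back on. The Python sketch below follows that outline; the names expand_set, split_items, and matching_brace are hypothetical. It assumes that a description is read as a numeric range only when the whole description has the form start-stop, that the tail may itself contain further sets, and it ignores quoting and backslash escapes.

```python
import re


def matching_brace(s: str, start: int) -> int:
    """Index of the '}' that closes the '{' at position start."""
    depth = 0
    for i in range(start, len(s)):
        if s[i] == "{":
            depth += 1
        elif s[i] == "}":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError(f"missing }} in {s}")


def split_items(desc: str) -> list[str]:
    """Split a set description on commas/spaces that are not inside nested braces."""
    items, current, depth = [], [], 0
    for c in desc:
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
        if depth == 0 and c in ", ":
            if current:
                items.append("".join(current))
                current = []
        else:
            current.append(c)
    if current:
        items.append("".join(current))
    return items


def expand_set(word: str) -> list[str]:
    """Expand head{description}tail into every word the set describes."""
    start = word.find("{")
    if start < 0:
        return [word]  # no set: the word stands for itself
    end = matching_brace(word, start)
    head, desc, tail = word[:start], word[start + 1:end], word[end + 1:]
    numeric = re.fullmatch(r"(\d+)-(\d+)", desc.strip())
    if numeric:
        low, high = int(numeric.group(1)), int(numeric.group(2))
        if low > high:
            raise ValueError(f"empty numeric range in {word}")
        alternatives = [str(n) for n in range(low, high + 1)]
    else:
        # Each item may itself contain sets, so expand it recursively.
        alternatives = [x for item in split_items(desc) for x in expand_set(item)]
    tails = expand_set(tail)  # assumption: the tail may contain further sets
    return [head + a + t for a in alternatives for t in tails]


for word in ["pop{15,17,99}", "pop{15-20}", "he{5,l,ar}d", "{a,b{1,2,3},c}"]:
    print(f"{word:16}-->  {' '.join(expand_set(word))}")
```

Note that, unlike the wildcard and range sketches, expand_set never consults a list of variables, which mirrors the rule that set elements need not exist.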

Lists

Most of the commands that use the expansion filters treat the expanded string as a list of words (often variable names). In SST, a list is a string of words separated by spaces or commas. Commas and spaces inside nested parentheses and braces are not considered word separators. One consequence of this definition is that the expansion filters take a list of words as input and produce another list of words as output.
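A minimal sketch of this splitting rule in Python tracks the nesting depth of parentheses and braces and splits only at depth zero. The function name split_list is hypothetical; the sketch ignores quoting and backslash escapes (described under Quoting below) and assumes that consecutive separators, such as a comma followed by a space, produce no empty words.

```python
def split_list(text: str) -> list[str]:
    """Split an SST-style list on spaces and commas outside () and {}."""
    words, current, depth = [], [], 0
    for c in text:
        if c in "({":
            depth += 1
        elif c in ")}":
            depth -= 1
            if depth < 0:
                raise ValueError("unexpected closing bracket")
        if depth == 0 and c in " ,":
            if current:  # runs of separators such as ", " yield no empty words
                words.append("".join(current))
                current = []
        else:
            current.append(c)
    if depth != 0:
        raise ValueError("missing closing bracket")
    if current:
        words.append("".join(current))
    return words


print(split_list("pop{15,17}, (a, b) x,y"))  # ['pop{15,17}', '(a, b)', 'x', 'y']
```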

The grammar for the expansion routines might look something like:

list:   word
        list ' ' word
        list ',' word

word:   any number of non-special characters
        word '*' word                   # wildcard strings
        word '?' word
        word int '-' word int           # variable range
        word '{' list '}' word          # set expansion
        word '{' int '-' int '}' word
        '(' expression ')'              # parenthetical expression
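The grammar does not say which rule applies when a word could match several. The hypothetical classify_word below is one plausible reading that agrees with the examples in this section: braces are checked before dashes, because pop{15-20} is a set rather than a range, and dashes are checked before wildcards, because p*1-p*100 is a range. Treating any other unescaped dash as a malformed range follows the cmd-file1 example under Quoting. The sketch ignores quoting and escapes.

```python
import re

# A variable range: anything, an integer, '-', anything, an integer.
RANGE = re.compile(r".*?\d+-.*?\d+")


def classify_word(word: str) -> str:
    """Name the grammar rule that a single, already-split word would use."""
    if word.startswith("(") and word.endswith(")"):
        return "parenthetical expression"
    if "{" in word:
        return "set expansion"
    if "-" in word:
        # An unescaped dash always means a range, so a malformed one is an error.
        return "variable range" if RANGE.fullmatch(word) else "invalid range"
    if "*" in word or "?" in word:
        return "wildcard string"
    return "plain word"


for w in ["pop15", "pop1?*", "p*1-p*100", "pop{15-20}", "cmd-file1", "(a, b)"]:
    print(f"{w:12}{classify_word(w)}")
```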

Quoting

There are two mechanisms for suppressing the special characters in the expansion filters. The first is the backslash character, `\'. A backslash suppresses the default action of the single character that follows it (this holds throughout SST, not just in the expansion routines). Thus we could use a backslash to include a dash in a FOREACH command:

foreach (i; cmd\-file1 cmd\-file2) {
    run $i
}

Without the backslash, the expansion routines would treat the dash as part of a variable range. Because cmd-file1 is not a valid variable range, an error message would be printed. Backslashes can also be used to include spaces and commas within a word, for example to run a FOREACH loop over a set of variable pairs:

foreach (i; a\ b, b\ c, c\ d) {
    reg ind[x] dep[$i]
}

This loop would execute the commands:

reg ind[x] dep[a b]
reg ind[x] dep[b c]
reg ind[x] dep[c d]

For compatibility with MSDOS, where a backslash is often the directory separator, backslashes in front of regular characters are passed through without modification. To enter a single backslash in front of a special character without disabling that character's special meaning, precede the backslash with another backslash. Thus

\\{a,b,c}     -->     \a \b \c
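The backslash rules can be sketched as a pass that tags each character as special or ordinary before the list is split. This is a hypothetical illustration: the set of special characters is an assumption assembled from the operators described in this section, and the function names mark_specials and split_words are made up. Note that split_words returns plain strings and so discards which characters were escaped; a real implementation would have to carry that information on to the later filters.

```python
# Characters this sketch treats as special in the expansion filters
# (an assumption collected from the operators described in this section).
SPECIAL = set('*?"{}()-, ')


def mark_specials(text: str) -> list[tuple[str, bool]]:
    """Pair each character with a flag telling whether it keeps its special meaning."""
    out = []
    i = 0
    while i < len(text):
        c = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        after = text[i + 2] if i + 2 < len(text) else ""
        if c == "\\" and nxt == "\\" and after in SPECIAL:
            # Double backslash before a special character: one literal
            # backslash, and the special character keeps its meaning.
            out.append(("\\", False))
            i += 2
        elif c == "\\" and nxt in SPECIAL:
            # Backslash before a special character: that character is ordinary.
            out.append((nxt, False))
            i += 2
        else:
            # Everything else, including a backslash before an ordinary
            # character (e.g. an MSDOS path), passes through unchanged.
            out.append((c, c in SPECIAL))
            i += 1
    return out


def split_words(text: str) -> list[str]:
    """Split on unescaped spaces and commas outside () and {}."""
    words, current, depth = [], [], 0
    for c, special in mark_specials(text):
        if special and c in "({":
            depth += 1
        elif special and c in ")}":
            depth -= 1
        if special and depth == 0 and c in " ,":
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(c)
    if current:
        words.append("".join(current))
    return words


print(split_words(r"a\ b, b\ c, c\ d"))        # ['a b', 'b c', 'c d']
print(split_words(r"cmd\-file1 cmd\-file2"))   # ['cmd-file1', 'cmd-file2']
```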

The second way to turn off the expansion filters temporarily is to use double quotes. When a string passed through the expansion filters is enclosed in double quotes, all characters inside the quotes lose their special meaning. In particular:

  1. No expansions are performed on the string
  2. Blanks and commas in the string are not treated as word separators when the string is split into a list of words.

To include a double quote inside a quoted string, precede it with a backslash. All other backslashes are treated as regular (non-special) characters, i.e. they pass through the filter unchanged. Our previous example of multiple regressions using pairs of variables could have been implemented as follows:

foreach (i; "a b", "b c", "c d") {
    reg ind[x] dep[$i]
}
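Double-quote handling can be sketched as a splitter that returns each word together with a flag recording whether it was quoted, so that later filters know to skip expansion for it (rule 1). As in the FOREACH example, the quotes themselves are assumed to be dropped. The function name split_quoted is hypothetical, and for brevity the sketch ignores parentheses, braces, and backslash escapes outside quotes.

```python
def split_quoted(text: str) -> list[tuple[str, bool]]:
    """Split a list into (word, was_quoted) pairs, honouring double quotes."""
    words = []
    current, quoted, in_quotes = [], False, False
    i = 0
    while i < len(text):
        c = text[i]
        if in_quotes:
            if c == "\\" and i + 1 < len(text) and text[i + 1] == '"':
                current.append('"')  # \" inside quotes is a literal quote
                i += 2
                continue
            if c == '"':
                in_quotes = False  # closing quote; the quote itself is dropped
            else:
                current.append(c)  # blanks, commas, other backslashes are ordinary
        elif c == '"':
            in_quotes, quoted = True, True
        elif c in " ,":
            if current or quoted:
                words.append(("".join(current), quoted))
            current, quoted = [], False
        else:
            current.append(c)
        i += 1
    if in_quotes:
        raise ValueError("missing closing quote")
    if current or quoted:
        words.append(("".join(current), quoted))
    return words


print(split_quoted('"a b", "b c", "c d"'))
```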

Technical note:

Special characters must have only one meaning; otherwise the escape mechanisms described above (quotes and backslashes) cannot be used reliably. As an example, consider the problem of echoing the following words: `(1', `2', and `3' (the quotes are used here for clarity). The obvious solution seems to be:

foreach (i; \(1 2 3) echo $i

The backslash in front of the first open parenthesis is intended to keep the scanner from miscounting the nesting level. If we try this, however, we get the following error message:

Error: missing )

The problem is that the scanner converts `\(' to `(' and treats it as a normal character (i.e. it does not increment the nesting level). When this string is passed on, the expansion filters receive:

(1 2 3

The expansion filters see the unmatched open parenthesis (remember that a string enclosed in parentheses is treated as a single word) and issue an error message.
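The failure can be reproduced with two toy passes. Both functions are hypothetical stand-ins, not SST's scanner or expansion code: scanner_pass counts only unescaped parentheses and emits an escaped one as a bare character, while expansion_depth receives a plain string and therefore counts every parenthesis.

```python
def scanner_pass(text: str) -> tuple[str, int]:
    # Hypothetical scanner: an escaped parenthesis is emitted as a bare one
    # and is not counted; only unescaped parentheses change the depth.
    out, depth, i = [], 0, 0
    while i < len(text):
        c = text[i]
        if c == "\\" and i + 1 < len(text) and text[i + 1] in "()":
            out.append(text[i + 1])
            i += 2
            continue
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
        out.append(c)
        i += 1
    return "".join(out), depth


def expansion_depth(text: str) -> int:
    # Hypothetical expansion filter: it receives a plain string, so it cannot
    # tell which parentheses were escaped and counts every one of them.
    depth = 0
    for c in text:
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
    return depth


scanned, depth = scanner_pass(r"\(1 2 3")
print(repr(scanned), depth)  # '(1 2 3' 0 : the scanner is satisfied
if expansion_depth(scanned) > 0:
    print("Error: missing )")  # the expansion filter sees an unmatched '('
```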

Another attempt at a solution might be to use double or triple backslashes: "\\(1 2 3" or "\\\(1 2 3". However, if double backslashes were converted to a single backslash at each level, then entering one backslash in a FOREACH statement would require typing 8 backslashes. For this reason, SST does not convert double backslashes until the very last stage of processing.
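The arithmetic behind the figure of 8 can be checked with a short simulation of the scheme SST avoids. It assumes three processing levels (matching the three levels in the overview table), each of which converts every doubled backslash into a single one; collapse is a hypothetical name.

```python
LEVELS = 3  # assumption: preprocessor, scanner, expansion


def collapse(text: str) -> str:
    # One hypothetical level that converts every doubled backslash to a single one.
    return text.replace("\\" * 2, "\\")


typed = "\\" * 2 ** LEVELS  # 8 backslashes typed by the user
result = typed
for level in range(LEVELS):
    result = collapse(result)
    print(f"after level {level + 1}: {len(result)} backslash(es)")
```

Each level halves the count, so n levels require 2^n typed backslashes to deliver one.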

