TABULATE & REPORT

This document is based on a SAS session (1992-1993) of KUCC (Korea University Computer Club) and a statistics seminar (1998) of the Department of Public Administration, the Graduate School of Korea University. It was primarily tested with SAS release 6.12, but there is no difference in grammars, except detailed system specifications and requirements, across the releases.

What is the SAS Tabulate?

PROC TABULATE "summarizes values for all observations in a data set and prints the summaries in table format. It can produce tables with one to three dimensions, and within each dimension it allows multiple variables to be reported one after another or hierarchically."

"PROC TABULATE constructs tables of descriptive statistics from class variables, analysis variables, and keywords for statistics." It "treats the values of variables used to create a table in one of the two ways: as discrete classifications of data (class variables) or as continuous values for which you can request descriptive statistics (analysis variables)."

If you want to directly generate complicated and customized tables without a spreadsheet program, the TABULATE will be best alternative. There might be some difficulties being familiar with the grammars of TABULATE, but it will highly worth investing some time and efforts to do that (I never agree with Mr. Cho!). As a good tool for the customized tables, TABULATE is equipped with high flexibility, various functionality, and diverse formats.

Although FREQ provides similar features including chi-square test (see the below cross table), it cannot support various statistics (e.g., mean, sum, and standard deviation). And PROC REPORT provides excellent functionality of generating professional reports, but its focus is on dealing efficiently with huge data set, not producing table-based outputs.

Available statistics are as follows. N and NMISS do not require any nonmissing observations. SUM, MEAN, MAX, MIN, RANGE, USS, and CSS require at ltast one nonmissing observation. VAR, STD, STDERR, CV, T, and PRT require at least two observations.

- CSS: The sum of squares corrected for the mean
- CV: The percent coefficient of variation, MEAN<>0
- MAX: The maximum value
- MEAN: The arithmetic mean
- MIN: The minimum value
- N: the number of observations with nonmissing variable values (frequency)
- NMISS: The number of observations with missing variable values
- PCTN: The percentage that one frequency represents of another frequency
- PCTSUM: The percentage that one sum represents of another sum
- PRT: The two-tailed p-value for Student's t with n-1 degree of freedom, STD>0
- RANGE: The range= MAX-MIN
- STD: The standard deviation
- STDERR: The standard error of the mean
- SUM: The weighted sum
- SUMWGT: The sum of weight
- USS: The uncorrected sum of squares
- T: Student's t for H0: population mean=0, STD>0
- VAR: The variance
- ALL: Total value

Basic Principles

The TABULATE statement is always accompanied by one or more TABLE statements specifying the tables to be produced. You need to specify the variables to be used in the table in either a VAR statement (for analysis variables) or a CLASS statement (for class variables), but not both.

Class variable means "any variable, numeric or character, that you want to use to classify data into groups or categories of information." Normally each class variable has a small number of discrete values or unique levels. It can have character, integer, or even decimal values, but the number of unique values should be limited. CLASS statement is to "identify variables in the input data set as class variables.

Analysis variable means "any numeric variable for which you want to compute statistics." It is a quantitative or continuous variable as well as a qualitative or discrete variable for which you want descriptive statistics. VAR statement is to identify analysis variables in the input data set.

Missing values: "If an observation contains missing values for any variable listed in the CLASS statement, the observation is not included in the table unless you specify the MISSING option in the PROC TABULATE statement." Missing values for class variables cause observations to be omitted (skipped) from the analysis performed to produce tables. "If an observation contains missing values for a variable listed in the VAR statement, the value is omitted from calculations of all statistics except N and NMISS; Missing value for analysis variable affect only the statistics for those variables. A different type of missing value that is often confused with missing values in observations is the missing value used to represent empty table cells."

"By default, PROC TABULATE evaluates each page it prints and omits columns and rows for combinations of class variable values that do not exist. To change the default, you can specify the PRINTMISS option in the TABLE statement, and TABULATE will print the same headings for each subtable."

TABULATE provides the PCTN and PCTSUM statistics to allow you to print the percentage of the value in a single cell to the value in another cell or to the total of a group of cells. You need to construct the PCTN or PCTSUM expression using a denominator definition that describes to TABULATE what categories of information should be summed to arrive at the denominator. Without the denominator, TABULATE automatically summarizes the values in all SUM cells (for the PCTSUM denominator) or all N cells (for the PCTN denominator): defaults of each.

The format for specifying a denominator definition is to enclose it in brackets <> that is appended to the PCTN or PCTSUM statistcs (e.g., PCTN<gender*marital>).To obtain the percentage of the value in one table cell to the total for the column containing the cell, use all class variables that define the row as the denominator definition (TABLE gender*marital, party*PCTN<gender*marital>;). For the percentage of the value in one table cell to the total for the row containing the cell, use all class variable that define the column as the denominator definition (TABLE gender, marital*party*PCTN<marital*party>;).

When you concatenate ALL with other elements in the column dimension, TABULATE prints a separate column that summarizes the observations reported in each row of the table.

Removing levels of headings: there are times when the multiple levels of heading for class and analysis variables are not necessary in your tables. this often occurs when you replace a default heading with more descriptive text that actually includes the information in two level of heading, table share='number of shares'*sum=' '; no cell for sum title

Typical Form of the Tabulate

PROC TABULATE DATA=korea F=12.;

CLASS gender marital;

VAR income;

TABLE gender,income*(N MEAN SUM STD MIN MAX)*(F=7.1)

LABEL income='Annual Income' marital='Marital Status';

RUN;

CLASS gender marital;

VAR income;

TABLE gender,income*(N MEAN SUM STD MIN MAX)*(F=7.1)

LABEL income='Annual Income' marital='Marital Status';

RUN;

- PROC TABULATE DATA=korea; It represents the beginning of the TABULATE function.
- F=12.; It is a format option, which represents the cell format of 12 digits without decimal point (default is 12.2).
- CLASS gender; It specifies classification variables that are located in the left (rows). All classification variables shown in the table must be specified beforehand.
- VAR income; It specifies variables to be analyzed and to be located in the column.
- Table gender,income*(N MEAN SUM STD MIN MAX); it provides the form of a table to be generated.
- Comma divides the location of variables; the former gender will be shown in the left (row) in the table and the latter income will be located in the column position.
- income*(N MEAN SUM STD MIN MAX); It specifies that mean, sum, and total will be included in the income variable.
- (F=7.1); It specifies the format of each cell, total 7 digit with 1 decimal digit.

- LABEL income='Annual Income'; It define the label to be shown in the output. Instead of 'income', 'Annual Income' will be shown in the table.
- RUN; It tells the computer to begin the procedure.

Options

TABULATE Specifications (Basic grammar)

PROC TABULATE <option list>;

CLASS...; VAR ...; TABLE ... <option list>;

FORMAT...; LABEL...; KEYLABEL...;

BY ...; FREQ...; WEIGHT...;

RUN;

CLASS...; VAR ...; TABLE ... <option list>;

FORMAT...; LABEL...; KEYLABEL...;

BY ...; FREQ...; WEIGHT...;

RUN;

PROC TABULATE <options> includes DATA=data set; DEPTH=number; FORMAT=format name; FC or FORMCHAR<Index list>='string'; MISSING; NOSEPS; ORDER=order; and VARDEF=divisor.

ORDER=FORMATTED (ordered by the formatted values); ORDER=DATA (the order that the observations are read from the data set); ORDER=FREQ (order the values of the class variables so the value that occurs most frequently in the data set appears first); ORDER=INTERNAL (ordered by the SORT procedure: defaults)

The FORMCHAR=a string of 11 characters. The default is FORMCHAR='|----|+---'; FORMCHAR=' 11 blank space '; for no horizontal and vertical line. FORMCHAR='4FBFACBFBC4F8F4FABBFBB'X; for IMB 1403 printer FORMCHAR='FABFACCCBCEB8FECABCBBB'X; for IBM 6670 printer FORMCHAR='B3C4DAC2BFC3C5B2C0C1D9'X;

MISSING considers missing values as valid levels for the class variables. Unless the Missing option is specified, TABULATE does not include observations with a missing value for one or more class variables in the analysis.

NOSEPS eliminates horizontal separator lines from the row titles and body of the printed table except the column title section of the table.

VARDEF=divisor specifies the divisor to be used in the calculation of the variances. If divisor is DF(default), the degrees of Freedom (N-1) is used as the divisor.

TABLE page expression, row-expression, column expression /option list;

/BOX=value specifies the text to be placed in the empty box above the row titles. It allows you to move the page heading into the top left box of the table or insert either a variable name (or label) or a descriptive string in the box. When BOX=_PAGE_, the page-dimension text appears in the box. If the page-dimension text does not fit, it is placed in its default position and the box is left empty. When BOX='sting', the quoted string appears in the box. Any name, label, or quoted string that does not fit is truncated. When BOX=variable, the name or label of a variable appears in the box. (TABLE gender*marital, income tax /BOX='Income and Tax';).

/CONDENSE prints multiple logical pages on a single physical page.

- Comma ,: Separates dimensions of table and crosses elements across dimensions
- Asterisk *: Crosses elements within a dimension
- Blank space: Concatenates elements in a dimension
- Parentheses (): Group elements and associate operator with entire group
- Brackets <>: Specify denominator definitions
- Equal sign =: Assigns a label to a variable or statistic, or completes a format modifier

/PRINTMISS species that row and column headings are the same for all logical pages of the table.

/ROW=spacing specifies whether all title elements in a row crossing are allotted space even when they are blank. When ROW=CONSTANT (OR CONST), the default, all row title elements have space allotted to them; when ROW=FLOAT, the row title space is divided equally among the nonblank title elements in the crossing.

/RTSPACE=number or RTS=number supplies an integer value that species the number of print positions allotted to the headings in the row dimension. the default value I on-fourth of the LINESIZE=value.

Format USAGE in the TABLE statement: variable*F=format (income*F=6.1).

Remove or change variable name in the TABLE statement (TABLE sum*stock=' ';)

FORMAT variable list formats the values of class variables used as headings in the page, row, and column dimensions. It may be used in combination with the PROC FORMAT; to group values of class variables. This statement has no effect on either analysis variables or the content of table cells.

PROC FORMAT;

VALUE inc_FMT

.='Income Unknown'

0-999='Under 999'

1000-4999='1,000-4,999'

5000-9999='5,000-9,999'

OTHERS='10,000 and Over';

PROC TABULATE DATA=korea F=12.;

CLASS ...; VAR income...;

FORMAT income inc_FMT;

TABLE ...

RUN;

VALUE inc_FMT

.='Income Unknown'

0-999='Under 999'

1000-4999='1,000-4,999'

5000-9999='5,000-9,999'

OTHERS='10,000 and Over';

PROC TABULATE DATA=korea F=12.;

CLASS ...; VAR income...;

FORMAT income inc_FMT;

TABLE ...

RUN;

KEYLABEL keyword='label' ...; replace text in the label anywhere the specified keyword is use, unless another label is assigned in the TABLE statement (KEYLABEL MEAN='Average';).

LABEL variable='label'; replaces the name of the variable in the page, row, or column heading where the variable appears. The maximum length of the label (in both KEYLABEL and LABEL)is 40 characters.

Hierarchical positioning: An Asterisk '*' between variables indicate statistics will be listed in a hierarchical position. On the other hand, a SPACE between variables indicates statistics of them will be listed in the parallel position, not in a hierarchical position. For instance, 'TABLE gender marital, income*...' will add two rows for the marital status below gender.

Frequency Cross Table (One by One)

PROC TABULATE; CLASS gender party;

TABLE gender ALL, party*(N PCTN<party<)*(F=4.) ALL*F=7. /RTS=8;

RUN;

TABLE gender ALL, party*(N PCTN<party<)*(F=4.) ALL*F=7. /RTS=8;

RUN;

/RTS=8; It is an option for adjusting the cell length (Equivalent to /RTSPACE=8;). You can compare the above with the following result of FREQ procedure (PROC FREQ; TABLE gender*party /NOCOL MISSING; RUN;).

ALL and Parentheses Usage

PROC TABULATE; CLASS gender marital party;

VAR income;

TABLE gender*marital, ALL*F=5. party*(N*F=5. PCTN='%'*F=4.1) income*(MEAN*F=5. STD*F=5. PCTSUM<gender*marital>='%' *F=4.1) /RTS=17 BOX='All Usage';

TABLE gender*marital ALL*F=5., party*(N*F=5. PCTN='%'*F=4.1) income*(MEAN*F=5. STD*F=5. PCTSUM<gender*marital ALL>='%' *F=4.1) /RTS=17 BOX='All Usage';

RUN;

VAR income;

TABLE gender*marital, ALL*F=5. party*(N*F=5. PCTN='%'*F=4.1) income*(MEAN*F=5. STD*F=5. PCTSUM<gender*marital>='%' *F=4.1) /RTS=17 BOX='All Usage';

TABLE gender*marital ALL*F=5., party*(N*F=5. PCTN='%'*F=4.1) income*(MEAN*F=5. STD*F=5. PCTSUM<gender*marital ALL>='%' *F=4.1) /RTS=17 BOX='All Usage';

RUN;

PROC TABULATE; CLASS gender marital;

VAR income;

TABLE (gender ALL)*marital*income=''*(MEAN*F=5. STD*F=4.1) /RTS=10;

TABLE gender*(ALL marital)*income=''*(MEAN*F=5. STD*F=4.1) /RTS=10 ;

RUN;

VAR income;

TABLE (gender ALL)*marital*income=''*(MEAN*F=5. STD*F=4.1) /RTS=10;

TABLE gender*(ALL marital)*income=''*(MEAN*F=5. STD*F=4.1) /RTS=10 ;

RUN;

Two Dimentional Tables

The following SAS script illustrates how to construct two dimentional tables.

PROC TABULATE; CLASS gender marital party;

VAR income;

LABEL income='Annual Income' marital='Marital Status';

TABLE (gender ALL)*(party ALL), marital*income*(N MEAN STD)*(F=5.) ALL*F=7.;

RUN;

VAR income;

LABEL income='Annual Income' marital='Marital Status';

TABLE (gender ALL)*(party ALL), marital*income*(N MEAN STD)*(F=5.) ALL*F=7.;

RUN;

PROC TABULATE; CLASS gender marital;

VAR income tax;

TABLE (gender*marital)*F=5. (income tax)*(MEAN STD)*F=5. /RTS=17;

TABLE gender*F=5.*(marital*F=5. (income tax)*(MEAN STD)*F=5.) /RTS=17;

RUN;

VAR income tax;

TABLE (gender*marital)*F=5. (income tax)*(MEAN STD)*F=5. /RTS=17;

TABLE gender*F=5.*(marital*F=5. (income tax)*(MEAN STD)*F=5.) /RTS=17;

RUN;

Three Dimensional Table

The following SAS script constructs a three dimentional table.

PROC TABULATE; CLASS gender marital party religion;

VAR income tax;

TABLE gender ALL, marital party, religion*(income*(MEAN STD)*F=5. tax* (MEAN STD)*F=5.)/RTS=10;

RUN;

VAR income tax;

TABLE gender ALL, marital party, religion*(income*(MEAN STD)*F=5. tax* (MEAN STD)*F=5.)/RTS=10;

RUN;

- marital party Notice that the blank between the variables causes the parallel arrangement of them. (Different from 'marital*party')
- income*(MEAN STD)*F=5.; The tabulate function allows a variety of formats, including decimal point position and cell length, in a table.
- Notice that there are two tables omitted here: one for Gender M and the other for ALL.

PCTN and PCTSUM Usage

PROC TABULATE; CLASS gender marital;

VAR income tax;

TABLE gender*marital ALL, (income tax)*

(SUM*F=5. STD*F=5. PCTSUM<gender*marital ALL>='%'*F=5.1)/RTS=17;

RUN;

VAR income tax;

TABLE gender*marital ALL, (income tax)*

(SUM*F=5. STD*F=5. PCTSUM<gender*marital ALL>='%'*F=5.1)/RTS=17;

RUN;

- PCTSUM<gender*marital ALL>; The percent sum will be calculated by each cell combination of the gender and marital status variables.
- ='%'*F=5.1; It specifies the title of the PCTSUM, '%', and its format of '5.1'.
- For example, 'age='Official Age'*MEAN='Average Age';' allows you to change the names of the variable and statistics. 'Offician Age' instead of 'age' and 'Average Age' instead of 'MEAN' will be shown in the table.
- Notice that the part of the right side is cut for the convenience's sake. But it (table for Christianian) has the same fashion as that of Buddhism.

PROC TABULATE;

CLASS gender marital;

VAR income tax;

TABLE gender ALL, (marital ALL)*(income tax)*F=4.*

(SUM*F=4. PCTSUM<ALL gender>='%'*F=3.) /RTS=8;

TABLE ALL gender, (ALL marital)*(income tax)*F=4.*

(SUM*F=4. PCTSUM<ALL gender>='%'*F=3.) /RTS=8;

CLASS gender marital;

VAR income tax;

TABLE gender ALL, (marital ALL)*(income tax)*F=4.*

(SUM*F=4. PCTSUM<ALL gender>='%'*F=3.) /RTS=8;

TABLE ALL gender, (ALL marital)*(income tax)*F=4.*

(SUM*F=4. PCTSUM<ALL gender>='%'*F=3.) /RTS=8;

More Complicated Tables

The following SAS scripts are examples of more complicated tables.

PROC TABULATE;

CLASS gender marital party;

VAR income tax;

TABLE gender*marital ALL, party ALL*(income*F=7. tax*F=6.)

/RTS=17 BOX='Default Test';

TABLE gender*marital ALL, (party ALL)*F=5.*(income*SUM=''*F=7. tax*

SUM=''*F=6.) /RTS=17 BOX='Sum of Each Cell';

RUN;

CLASS gender marital party;

VAR income tax;

TABLE gender*marital ALL, party ALL*(income*F=7. tax*F=6.)

/RTS=17 BOX='Default Test';

TABLE gender*marital ALL, (party ALL)*F=5.*(income*SUM=''*F=7. tax*

SUM=''*F=6.) /RTS=17 BOX='Sum of Each Cell';

RUN;

- party ALL; Default statistics of a nominal variable is N (frequency), so it will produce the number of observations falling in the each cell. The statement is equivalent to party*N ALL;
- income*F=7. tax*F=6. ; Default statistics of a ordinal or interval variable is SUM, so it will produce the sum of observations falling in the each cell. The statement is equivalent to 'income*SUM*F=7. tax*SUM*F=6.;'.
- /BOX='Default Test'; specifies the location of the table's name.
- SUM=' '; removes variable name in the table.
- The following is the result of adding an option, ...(party ALL)*F=5. * (income...

Standard Normal Distribution Table

The following SAS script, adapted from Kim(1993), generates the Normal probability distribution table using PROBNORM(), which returns the probability from the standard normal distribution.

DATA normal;

DO row=0.0 TO 3.4 BY 0.1;

DO column=0.00 TO 0.09 BY 0.01;

z=row+column;

prob=PROBNORM(z); OUTPUT;

END; END;

RUN;

PROC TABULATE DATA=normal;

CLASS row column;

VAR prob;

TABLE row, column*prob=''*SUM=''*F=5.4/RTSPACE=5;

LABEL row='Z' column='Standard Normal Distribution';

RUN;

DO row=0.0 TO 3.4 BY 0.1;

DO column=0.00 TO 0.09 BY 0.01;

z=row+column;

prob=PROBNORM(z); OUTPUT;

END; END;

RUN;

PROC TABULATE DATA=normal;

CLASS row column;

VAR prob;

TABLE row, column*prob=''*SUM=''*F=5.4/RTSPACE=5;

LABEL row='Z' column='Standard Normal Distribution';

RUN;

Notice that prob=' '*SUM=' '; removes variable names prob and SUM from the table, otherwise they would appear on the top of the table.

Student T Distribution Table

The following SAS script generates the Student T probability distribution table using TINV(), which returns a quantile from the t distribution.

DATA student;

DO df=1 TO 30 BY 1;

DO prob=.80, .85, .90, .925, .95, .975, .99;

t=TINV(prob, df); OUTPUT;

END; END;

RUN;

PROC TABULATE;

CLASS df prob;

VAR t;

TABLE df, prob*t=''*MEAN=''*F=7.4/RTSPACE=7 /BOX='T';

LABEL prob='Probability';

RUN;

DO df=1 TO 30 BY 1;

DO prob=.80, .85, .90, .925, .95, .975, .99;

t=TINV(prob, df); OUTPUT;

END; END;

RUN;

PROC TABULATE;

CLASS df prob;

VAR t;

TABLE df, prob*t=''*MEAN=''*F=7.4/RTSPACE=7 /BOX='T';

LABEL prob='Probability';

RUN;

REFERENCES

- SAS Institute Inc. 1990. SAS Guide to the TABULATE Processing. 2nd ed. NC: SAS Institute Inc.
- SAS Institute Inc. 1990. SAS Guide to the REPORT Procedure: Usage and Reference. 1st ed. NC: SAS Institute Inc.
- Korea University Computer Center. 1980's. SAS Workshop Manual
- Kim, Choong Ryun. 1993). The Statistics Package Called SAS: Focusing on the Statistics Analysis and Marketing Research Methods. Seoul: Data Research.
- Cho, In Ho. 1993. The SAS Lecture and Statistics Consulting. Seoul: Hanwha Institute of Economics.