This page illustrates how to compare group means using the t-test, several forms of ANOVA (analysis of variance) including repeated measures ANOVA, ANCOVA (analysis of covariance), and MANOVA (multivariate analysis of variance).

INTRODUCTION

- The t-test and ANOVA examine whether group means differ from one another. The t-test compares two groups, while ANOVA can compare more than two groups.
- The t-test and ANOVA rest on three assumptions: independence (the elements of one sample are not related to those of the other sample), normality (samples are randomly drawn from normally distributed populations with unknown population means; otherwise the means are no longer the best measures of central tendency and the test is not valid), and equal variance (the population variances of the groups are equal).
- ANCOVA (analysis of covariance) includes covariates, interval independent variables, on the right-hand side to control for their impact. MANOVA (multivariate analysis of variance) has more than one left-hand side variable.

Analysis | LHS (interval) | RHS (categorical) | Notes |
T-test | Single | Single (binary) | |
One-way | Single | Single | |
Two-way | Single | Two (multiple) | |
ANCOVA | Single | Multiple | Covariates |
MANOVA | Multiple | Multiple | |

The following diagram summarizes the t-test and one-way ANOVA.

- SAS has the UNIVARIATE, MEANS, and TTEST procedures for the t-test, while the SAS ANOVA, GLM, and MIXED procedures conduct ANOVA.
- The ANOVA procedure is able to handle balanced data only, but the GLM and MIXED procedures can deal with both balanced and unbalanced data. For the t-test and one-way ANOVA, it does not matter whether the data are balanced.
- STATA has the .ttest and .ttesti commands for the t-test, and the .anova and .manova commands for ANOVA. Note that the STATA .glm command is not used for ANOVA.

DATA STRUCTURE

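It is useful to read multiple observations from a single data line. Note that the trailing @@ in the INPUT statement holds the current data line, so that SAS keeps reading observations from it instead of moving to the next line.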

LIBNAME js 'c:\data\sas';

DATA js.data1;

INPUT group block $ response @@;

DATALINES;

1 A 34.5 1 B 54.5 1 B 25.8 3 C 54.8

2 B 54.8 3 A 15.8 2 C 14.5 2 A 15.1

...

RUN;

/* Data read ******************

1 1 A 34.5

2 1 B 54.5

3 1 B 25.8

...

*******************************/
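
As a rough illustration of what the trailing @@ does, the Python sketch below reads the same two data lines as one continuous stream of values, three per observation. The function name read_held_lines is hypothetical, and the code only mimics the behavior described above; it is not how SAS is implemented.

```python
def read_held_lines(lines, fields_per_obs=3):
    """Mimic INPUT group block $ response @@: keep reading observations
    from the current line and continue on the next line when it runs out."""
    tokens = [tok for line in lines for tok in line.split()]
    observations = []
    for i in range(0, len(tokens) - fields_per_obs + 1, fields_per_obs):
        group, block, response = tokens[i:i + fields_per_obs]
        observations.append((int(group), block, float(response)))
    return observations


datalines = [
    "1 A 34.5 1 B 54.5 1 B 25.8 3 C 54.8",
    "2 B 54.8 3 A 15.8 2 C 14.5 2 A 15.1",
]
observations = read_held_lines(datalines)

# Print the observation number followed by group, block, and response.
for number, obs in enumerate(observations, start=1):
    print(number, *obs)
```

Each data line above yields four observations, so the two lines produce eight in total.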

The DO statement makes it possible to read more complicated data. You may list particular values in the DO statement (e.g., DO treatment=1,5;) rather than set a range of values (e.g., DO treatment=1 TO 2;). The trailing @ in the INPUT statement may not be omitted. This tip is especially useful when you type in data for the randomized complete block design (RCB) and the Latin square design (LSD).

DATA js.data2;

DO block=1 TO 3;

DO treatment=1,5;

INPUT response @;

OUTPUT;

END;

END;

DATALINES;

4.91 4.63 4.76 5.04 5.38 6.21

5.60 5.08 4.91 4.63 4.76 5.04

...

RUN;

/* Data read *********************

1 1 1 4.91

2 1 5 4.63

3 2 1 4.76

4 2 5 5.04

5 3 1 5.38

...

**********************************/
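
The nested DO loops above can be pictured with the following Python sketch, which assigns block and treatment labels to the values of each data line in loop order. The function name read_do_loop is hypothetical; the block and treatment values are taken from the SAS listing, and the sketch assumes every data line holds one full pass of the loops (six values).

```python
def read_do_loop(lines, blocks=(1, 2, 3), treatments=(1, 5)):
    """Mimic DO block / DO treatment / INPUT response @ / OUTPUT."""
    records = []
    for line in lines:                      # one pass of the loops per data line
        values = iter(line.split())
        for block in blocks:                # DO block=1 TO 3;
            for treatment in treatments:    # DO treatment=1,5;
                response = float(next(values))             # INPUT response @;
                records.append((block, treatment, response))  # OUTPUT;
    return records


datalines = [
    "4.91 4.63 4.76 5.04 5.38 6.21",
    "5.60 5.08 4.91 4.63 4.76 5.04",
]
records = read_do_loop(datalines)

for number, record in enumerate(records, start=1):
    print(number, *record)
```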

If data are arranged in the long format, you need to rearrange them into the wide format.

DATA js.wide1;

SET js.long;

IF period=1;

RENAME response=response1;

PROC SORT DATA=js.wide1;

BY id;

RUN;

...

DATA js.wide;

MERGE js.wide1 js.wide2 ...;

BY id;

RUN;
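
The same long-to-wide idea can be sketched in Python as follows. The function name long_to_wide and the sample rows are hypothetical; the sketch assumes each row carries an id, a period, and a response, as in the SAS listing.

```python
def long_to_wide(rows):
    """Turn (id, period, response) rows into one record per id
    with columns response1, response2, ..."""
    wide = {}
    for subject_id, period, response in rows:
        wide.setdefault(subject_id, {})[f"response{period}"] = response
    return wide


# Hypothetical long-format data: two subjects measured in two periods.
long_rows = [
    (100, 1, 74.0),
    (100, 2, 97.0),
    (101, 1, 54.0),
    (101, 2, 84.0),
]
wide = long_to_wide(long_rows)

for subject_id in sorted(wide):
    print(subject_id, wide[subject_id])
```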

STATA has the .pkshape command to transform a data set in the Latin square form into the corresponding data set for analysis.

. list, noobs

+---------------+

|id row c1 c2 c3|

|---------------|

|100 1 74 97 54 |

|101 2 54 84 25 |

|102 3 15 57 64 |

+---------------+

. pkshape id r c1-c3, order(abc cab bca) outcome(y) sequence(row) treat(treat) period(col)

T-TEST

One Sample T-Test

The MU0 option specifies the value of the mean under the null hypothesis. The ALPHA option specifies the significance level. The T and PROBT options in the MEANS procedure report the t statistic and its p-value for the null hypothesis that the mean is zero.

PROC UNIVARIATE MU0=0 ALPHA=.01;

VAR response;

RUN;

. ttest response=0, level(99)

PROC UNIVARIATE MU0=10 VARDEF=DF NORMAL ALPHA=.05;

VAR response;

RUN;

. ttest response=10

PROC MEANS T PROBT;

VAR response;

RUN;

. ttest response=0

PROC MEANS MEAN STD STDERR T VARDEF=DF PROBT CLM ALPHA=.01;

VAR response;

RUN;
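
For readers who want to see the arithmetic behind the one-sample test, here is a minimal Python sketch using only the standard library. The sample values and the function name one_sample_t are hypothetical; the sketch computes only the t statistic and its degrees of freedom, not the p-value.

```python
import math
import statistics


def one_sample_t(sample, mu0=0.0):
    """Return (t statistic, degrees of freedom) for H0: mean = mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)      # divisor n - 1, as with VARDEF=DF
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1


response = [9.0, 10.0, 11.0, 12.0, 13.0]   # hypothetical data
t_stat, df = one_sample_t(response, mu0=10.0)
print(f"t = {t_stat:.4f} with {df} degrees of freedom")
```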

Paired T-Test

PROC TTEST;

PAIRED pre*post;

RUN;

. ttest pre=post,level(95)

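Note that the STATA .ttest command above does not use the unpaired option, so the two variables are treated as paired. The SAS PAIRED statement is able to compare multiple pairs; for example, (a b)*(c d) compares a with c, a with d, b with c, and b with d.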

PROC TTEST;

PAIRED (a b)*(c d);

RUN;
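
A paired t-test is a one-sample t-test on the within-pair differences. The Python sketch below illustrates this with hypothetical pre and post values; the function name paired_t is also an assumption, and only the t statistic and degrees of freedom are computed.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for H0: mean difference = 0."""
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return statistics.mean(differences) / standard_error, n - 1


pre = [10.0, 12.0, 9.0, 11.0]      # hypothetical measurements
post = [12.0, 13.0, 12.0, 15.0]
t_stat, df = paired_t(pre, post)
print(f"t = {t_stat:.4f} with {df} degrees of freedom")
```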

Two Independent Samples T-Test

The TTEST procedure reports two T statistics: one under the equal variance assumption (pooled) and the other for unequal variances. Check the test of equal variances (F test) first. If the null hypothesis of equal variances is not rejected, read the T statistic and p-value of the pooled analysis. If it is rejected, read the T statistic and p-value of the Satterthwaite or Cochran/Cox approximation.

PROC TTEST COCHRAN;

CLASS male;

VAR response;

RUN;

. ttest response, by(male)

. ttest response, by(male) unequal

STATA is able to conduct the t-test for two independent samples even when data are arranged in two variables without a group variable. The unpaired option indicates that the two variables are independent, and the welch option asks STATA to produce Welch's approximation of the degrees of freedom. Note that STATA does not provide the Cochran/Cox approximation.

. ttest response1=response2, unpaired level(99)

. ttest response1=response2, unpaired unequal welch

T-Test on Aggregate Data

The FREQ statement in the TTEST procedure can handle aggregate data, where each row represents as many observations as its frequency variable indicates.

PROC TTEST H0=5 ALPHA=.01;

CLASS smoke;

VAR lung;

FREQ count;

RUN;

The STATA .ttesti command enables you to conduct a t-test using aggregated descriptive statistics. The numbers listed are the number of observations, mean, and standard deviation of the first sample and, for a two-sample test, of the second sample.

. ttesti 30 4.5 0.54 0 // One sample T-test; the last number is the hypothesized mean (0 is used here for illustration)

. ttesti 30 4.5 0.54 30 5.0 1.44 // Two sample T-test
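
To show how a two-sample t-test can be computed from summary statistics alone, the Python sketch below uses the same six numbers as the two-sample .ttesti example. The function name two_sample_from_summary is hypothetical. It returns the pooled (equal variance) t statistic, the unequal variance t statistic, and the Satterthwaite degrees of freedom; p-values are not computed.

```python
import math


def two_sample_from_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Pooled and unequal-variance t statistics from group summaries."""
    difference = mean1 - mean2

    # Equal variance assumption: pool the two sample variances.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    t_pooled = difference / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

    # Unequal variances: separate variance terms, Satterthwaite df.
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t_unequal = difference / math.sqrt(v1 + v2)
    df_satterthwaite = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

    return {
        "t_pooled": t_pooled,
        "df_pooled": n1 + n2 - 2,
        "t_unequal": t_unequal,
        "df_satterthwaite": df_satterthwaite,
    }


result = two_sample_from_summary(30, 4.5, 0.54, 30, 5.0, 1.44)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Because the two groups have the same size here, the pooled and unequal variance t statistics coincide; only the degrees of freedom differ.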

ONE-WAY ANOVA

This experimental design is often called the completely randomized design (CRD). SAS has the ANOVA, GLM (general linear model), and MIXED procedures for one-way ANOVA. Their basic usage (the CLASS and MODEL statements) is the same.

PROC ANOVA;

CLASS treatment;

MODEL response=treatment;

RUN;
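
The F statistic that a one-way ANOVA reports can be reproduced by hand. The following Python sketch does so for three hypothetical treatment groups; the function name one_way_anova and the data are assumptions made for illustration, and no p-value is computed.

```python
def one_way_anova(groups):
    """Return sums of squares, degrees of freedom, and F for a one-way layout."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Between-group (treatment) and within-group (error) sums of squares.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    ss_within = sum(
        (x - mean) ** 2
        for group, mean in zip(groups, group_means)
        for x in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


# Hypothetical responses for three treatments.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
ss_between, ss_within, df_between, df_within, f_stat = one_way_anova(groups)
print(f"F({df_between}, {df_within}) = {f_stat:.4f}")
```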

STATA has the .anova and .oneway commands for one-way ANOVA.

. anova response treatment

. oneway response treatment, tabulate

You may add the MEANS statement in both ANOVA and GLM procedures to compute means of groups and perform multiple comparison tests such as DUNCAN, TUKEY, DUNNETT, and BON.

PROC GLM;

CLASS treatment;

MODEL response=treatment;

MEANS treatment /T DUNCAN;

RUN;

TWO-WAY ANOVA

Randomized Complete Block (RCB): Treatments are assigned at random within blocks of adjacent subjects, each treatment once per block. The number of blocks is the number of replications. Any treatment can be adjacent to any other treatment, but not to the same treatment within the block.

Again, the ANOVA, GLM, and MIXED procedures conduct two-way ANOVA with the same basic usage.

PROC GLM;

CLASS treat1 treat2;

MODEL response=treat1 treat2;

RUN;

In the randomized complete block design, you may have only one observation in each cell. In that case the interaction cannot be separated from error, so including an interaction term is meaningless and produces awkward results (no degrees of freedom are left for error). Note that the sum of squares due to error (SSE) of the model without interaction is then the same quantity as the sum of squares of interaction (SSI).

You may compare group means using the MEANS or the LSMEANS (least squares means) statement. The LSMEANS statement is not available in the ANOVA procedure.

PROC ANOVA;

CLASS treatment block;

MODEL response=treatment block;

MEANS treatment block /TUKEY;

RUN;

PROC GLM;

CLASS treatment block;

MODEL response=treatment block;

LSMEANS treatment block /ADJUST=TUKEY;

RUN;

If there are subsamples, you need to use a nested model as follows.

PROC GLM;

CLASS treatment sub;

MODEL response=treatment sub(treatment); /* sub is nested within treatment */

RUN;

. anova response treatment / sub | treatment /

FACTORIAL DESIGN

If there are subsamples (more than one observation in each cell) in a two-way ANOVA, you may consider the interaction effects. This is the two-way factorial design on CRD.

Treatment | Block1 | Block2 | Block3 |
Treat1 | 54, 67, 87 | 57, 67 | 31, 54, 87, 95 |
Treat2 | 35, 67 | 54, 87, 15, 75, 55 | 68, 17, 16, 68 |
Treat3 | 98, 45, 12, 57, 87 | 31, 14, 54 | 24, 87 |

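The interaction is expressed by an asterisk (*). The bar (|) expands to all possible combinations of the effects it joins, so treatment | block is the same as treatment block treatment*block. Thus, the following procedures return the same result.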

PROC ANOVA;

CLASS treatment block;

MODEL response=treatment | block;

RUN;

PROC GLM;

CLASS treatment block;

MODEL response=treatment block treatment*block;

RUN;
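
The expansion of the bar notation can be sketched in a few lines of Python. The function name expand_bar is hypothetical, and the sketch only reproduces the list of terms for a full factorial; it does not parse SAS syntax.

```python
def expand_bar(factors):
    """Expand factors joined by | into main effects and all interactions."""
    terms = []
    for factor in factors:
        # Add the new main effect, then cross it with every existing term.
        terms += [factor] + [f"{term}*{factor}" for term in terms]
    return terms


two_way = expand_bar(["treatment", "block"])
three_way = expand_bar(["a", "b", "c"])
print(" ".join(two_way))
print(" ".join(three_way))
```

For two factors this gives the three terms written out in the GLM example above; for three factors it gives seven terms, including the three-way interaction.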

You may compare group means using the MEANS or the LSMEANS (least squares means) statement. The LSMEANS statement is not available in the ANOVA procedure.

PROC ANOVA;

CLASS treatment block;

MODEL response=treatment | block;

MEANS treatment block treatment*block/TUKEY;

RUN;

PROC GLM;

CLASS treatment block;

MODEL response=treatment | block;

LSMEANS treatment | block /ADJUST=TUKEY;

RUN;

Two-Way Factorial Design on RCB

PROC GLM;

CLASS treat1 treat2 block;

MODEL response=treat1 treat2 block treat1*treat2;

RUN;

. anova response treat1 treat2 block treat1*treat2

Three-Way Factorial Design on RCB

PROC GLM;

CLASS treat1 treat2 treat3 block;

MODEL response=treat1 treat2 treat3 block treat1*treat2 treat1*treat3 treat2*treat3 treat1*treat2*treat3;

RUN;

SPLIT-PLOT DESIGN

Split-Plot Design on CRD

PROC GLM;

CLASS treat repeat sub;

MODEL response=treat sub repeat(treat) treat*sub; /* repeat is nested within treat */

RUN;

Split-Plot Design on RCB

PROC GLM;

CLASS treat block sub;

MODEL response=treat block sub treat*block treat*sub;

RUN;

LATIN SQUARE DESIGN

The Latin square design (LSD) has an equal number of rows, columns, and treatments. Treatments are assigned at random within rows and columns, with each treatment appearing once per row and once per column. Each cell of the square table has only one observation. The LSD is useful for controlling variation in two directions, row and column.

PROC GLM;

CLASS row column treatment;

MODEL response=row column treatment;

RUN;

. anova response row column treatment

The degrees of freedom of each main effect (row, column, and treatment) are r-1, where r is the number of rows or columns. The degrees of freedom of SSE are (r-1)(r-2). Finally, the degrees of freedom of SST are N-1 = r*r-1, which equals 3(r-1) + (r-1)(r-2).
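
The degrees of freedom of a Latin square can be checked with a short Python sketch. The function name latin_square_df is hypothetical; the sketch assumes that row, column, and treatment each use r-1 degrees of freedom, so that the pieces add up to r*r-1.

```python
def latin_square_df(r):
    """Degrees of freedom for an r x r Latin square with one observation per cell."""
    df = {
        "row": r - 1,
        "column": r - 1,
        "treatment": r - 1,
        "error": (r - 1) * (r - 2),
    }
    df["total"] = r * r - 1
    # The four components must add up to the total degrees of freedom.
    assert df["row"] + df["column"] + df["treatment"] + df["error"] == df["total"]
    return df


df3 = latin_square_df(3)
df4 = latin_square_df(4)
print(df3)
print(df4)
```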

REPEATED MEASURE ANOVA

The REPEATED statement in SAS and the repeated() option in STATA are used to indicate repeated measures analysis.

PROC GLM;

CLASS treat block;

MODEL resp1 resp2 resp3=treat block;

REPEATED response;

RUN;

PROC GLM;

CLASS treat block;

MODEL output1 - output5 = treat block;

REPEATED id 5 (0 1 2 3 4) / POLYNOMIAL SUMMARY PRINTE;

RUN;

. anova response treat time, repeated(time)

RANDOM EFFECT MODELS

The following are examples of random effects models using the MIXED and GLM procedures.

PROC MIXED;

CLASS treat block;

MODEL response = treat /SOLUTION;

RANDOM block /SOLUTION;

RUN;

PROC MIXED COVTEST METHOD=TYPE3;

CLASS subject type; /* type is a characteristic of subject */

MODEL response = type /SOLUTION;

RANDOM subject(type) /SOLUTION;

LSMEANS type /DIFF;

RUN;

PROC GLM; /* COVTEST is a PROC MIXED option, so it is not used here */

CLASS subject type; /* type is a characteristic of subject */

MODEL response = type subject(type);

RANDOM subject(type) /TEST;

RUN;

PROC MIXED COVTEST;

CLASS area block plant treat; /* area matches the effects in the RANDOM statement */

MODEL response = treat /SOLUTION;

RANDOM area plant area*plant block(area) /SOLUTION;

RUN;

ANCOVA

ANCOVA controls variation in an experiment by measuring a covariate (a continuous independent variable) on each experimental subject and adjusting the treatment comparison for it.

PROC GLM;

CLASS treat;

MODEL response=covariate treat /SOLUTION;

LSMEANS treat /STDERR;

RUN;

. anova response treat covariate, continuous(covariate)

MANOVA

The MANOVA statement indicates that this model is the multivariate analysis of variance.

PROC GLM;

CLASS treat1 treat2;

MODEL response1-response3= treat1-treat5/NOUNI;

MANOVA H=_ALL_; /* test every effect in the MODEL statement */

RUN;

. manova response1-response3 = treat1-treat5

REFERENCES

- Littell, Ramon C., Walter W. Stroup, and Rudolf J. Freund. 2002. SAS for Linear Models, 4th ed. Cary, NC: SAS Institute.
- Littell, Ramon C., George A. Milliken, Walter W. Stroup, and Russell D. Wolfinger. 2006. SAS System for Mixed Models, 2nd ed. Cary, NC: SAS Institute.
- Stata Press. 2003. Stata Base Reference Manual Release 8. College Station, TX: Stata Press.
- http://www.tfrec.wsu.edu/ANOVA/