Notes on ANOVA

ANOVA, or ANalysis Of VAriance, is a procedure for comparing two or more group means. It is usually reserved for MORE than two groups, because the two-group case is normally handled with a t-test. However, ANOVA may also be used with only two groups, in which case its results are identical to those of the pooled (equal-variance) 2-sample t-test.

We make the same assumptions for ANOVA that we did for the 2-sample t-test: that each population is normally distributed, that we have an independent random sample from each population (with the samples independent of one another), and that the population variances are all equal. As with the t-test, we should try to check these assumptions before proceeding with ANOVA.
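One quick, informal check of the equal-variance assumption is to compare the sample standard deviations of the groups. The sketch below is written in Python using only the standard library. The heights are made up for illustration, and the helper name `sd_ratio` is hypothetical. It reports the ratio of the largest to the smallest sample SD. A rule of thumb often quoted in introductory texts is that a ratio below about 2 is no cause for concern; treat that cutoff as a heuristic, not a formal test. Normality is usually judged graphically (for example with a normal probability plot for each group), which is not sketched here.

```python
import statistics

# Hypothetical heights (inches) grouped by sex code: 1 = female, 2 = male.
# These values are invented for illustration; they are not the course dataset.
groups = {
    1: [62.0, 64.0, 65.0, 63.0, 66.0],
    2: [67.0, 70.0, 72.0, 69.0, 72.0],
}


def sd_ratio(groups):
    """Return the ratio of the largest to the smallest sample standard deviation."""
    sds = [statistics.stdev(values) for values in groups.values()]
    return max(sds) / min(sds)


for code, values in groups.items():
    print(f"group {code}: n = {len(values)}, sample SD = {statistics.stdev(values):.3f}")
print(f"largest/smallest SD ratio: {sd_ratio(groups):.3f}")
```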

Open the heights dataset that was used in a previous assignment. Recall that 1 means female and 2 means male.

First, run a 2-sample t-test (in the Stat...Basic Statistics menu) to compare males to females, making sure to assume equal variances and to use a two-sided alternative. Keep the output to refer to later.
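For reference, the equal-variance t-test pools the two sample variances as s_p^2 = ((n1 - 1)s1^2 + (n2 - 1)s2^2) / (n1 + n2 - 2) and computes t = (mean1 - mean2) / (s_p * sqrt(1/n1 + 1/n2)) on n1 + n2 - 2 degrees of freedom. The Python sketch below evaluates those formulas on made-up heights (the data and the function name `pooled_t` are hypothetical). It stops short of the p-value, which requires the t distribution; if SciPy is available, `scipy.stats.ttest_ind(females, males, equal_var=True)` should report the same statistic together with a two-sided p-value.

```python
import math
import statistics


def pooled_t(x, y):
    """Return the pooled two-sample t statistic, its degrees of freedom, and the pooled SD."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pooled variance: a df-weighted average of the two sample variances.
    sp = math.sqrt(((n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)) / df)
    t = (statistics.mean(x) - statistics.mean(y)) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, df, sp


# Hypothetical heights in inches (not the course dataset).
females = [62.0, 64.0, 65.0, 63.0, 66.0]
males = [67.0, 70.0, 72.0, 69.0, 72.0]

t, df, sp = pooled_t(females, males)
print(f"t = {t:.3f}, df = {df}, pooled SD = {sp:.3f}")
```

The sign of t depends only on which group is listed first; its magnitude and the two-sided p-value do not.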

Now, run a one-way ANOVA procedure. "One-way" refers to the fact that there is only a single grouping variable, namely sex. Select the Stat...ANOVA...One-way menu to obtain the dialog box shown in Figure 4.


Figure 4: One-way ANOVA dialog box

In the parlance of ANOVA, a factor is a categorical (grouping) variable, whereas the response is the quantitative measure that forms the basis for our comparisons. In this case, therefore, Sex is the factor and Height is the response. Enter these on the appropriate lines of the dialog box shown in Figure 4 and then click OK.

The output from the ANOVA consists of two parts: a table of summary statistics accompanied by a graph of individual 95% confidence intervals, and the ANOVA table itself. The ANOVA table has 6 columns (Source, DF, SS, MS, F, and P) and 3 rows (Sex, Error, Total). Here is what those labels stand for:

Source means "Source of variation";
DF stands for "Degrees of Freedom";
SS stands for "Sum of Squares";
MS stands for "Mean Square";
F is the column for the F statistic(s);
P is the column for the p-value(s).

Sex is the row containing information for the sex effect. It is also sometimes referred to as the between-group variation.
Error is the row containing information on the random variation not accounted for by the main (sex) effect. It is also sometimes referred to as the within-group variation.
Total is the sum of the between- and within-group entries for DF and SS.
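To see how the rows and columns of the table fit together, here is a minimal Python sketch that builds the DF, SS, MS, and F entries from stacked data (one column of factor codes and a parallel column of responses, as in the Minitab worksheet). The function name `one_way_anova`, the generic row label "Factor" (Minitab labels this row with the factor's name, here Sex), and the data are all hypothetical. The p-value is omitted because it needs the F distribution. The function assumes at least two groups and more observations than groups.

```python
import statistics
from collections import defaultdict


def one_way_anova(factor, response):
    """Return DF, SS and MS for the Factor, Error and Total rows, plus the F statistic.

    `factor` and `response` are parallel lists: one grouping code and one
    measurement per observation (stacked layout).
    """
    groups = defaultdict(list)
    for level, value in zip(factor, response):
        groups[level].append(value)

    n, k = len(response), len(groups)
    grand_mean = statistics.mean(response)

    # Between-group SS: squared distance of each group mean from the grand mean, weighted by group size.
    ss_factor = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups.values())
    # Within-group (Error) SS: squared distance of each observation from its own group mean.
    ss_error = sum((x - statistics.mean(g)) ** 2 for g in groups.values() for x in g)
    # Total SS: squared distance of each observation from the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in response)

    df_factor, df_error = k - 1, n - k
    ms_factor, ms_error = ss_factor / df_factor, ss_error / df_error
    return {
        "Factor": (df_factor, ss_factor, ms_factor),
        "Error": (df_error, ss_error, ms_error),
        "Total": (n - 1, ss_total),
        "F": ms_factor / ms_error,
    }


# Hypothetical stacked data: sex code (1 = female, 2 = male) and height in inches.
sex = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
height = [62.0, 64.0, 65.0, 63.0, 66.0, 67.0, 70.0, 72.0, 69.0, 72.0]

table = one_way_anova(sex, height)
for source in ("Factor", "Error", "Total"):
    print(source, table[source])
print("F =", round(table["F"], 3))
```

Each MS is its SS divided by its DF, and F is the ratio of the two mean squares. The DF and SS of the Total row equal the sums of the two rows above it, which is why MS and F are not reported for Total.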

Many of the quantities calculated for the t-test are present in the ANOVA output. Notice first of all that the p-value for the t-test is identical to the p-value for the F test in the ANOVA table. Next, find the pooled standard deviation and the t-statistic in the t-test output and try to locate them in the ANOVA output. You will have to square the t-statistic to find it in the table, because the F statistic equals t squared; likewise, the square of the pooled standard deviation equals the Error MS. The lesson here is that the 2-sample t-test with equal variances is equivalent to the ANOVA F-test applied to the same problem.
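The equivalence can also be checked numerically. The self-contained Python sketch below, using made-up heights (not the course dataset), computes the pooled t statistic and the one-way ANOVA quantities for the same two groups, then prints t squared next to F and the pooled SD next to the square root of the Error MS. Both pairs should agree up to floating-point rounding. Because F with 1 and n1 + n2 - 2 degrees of freedom is the square of t with n1 + n2 - 2 degrees of freedom, the two p-values agree as well.

```python
import math
import statistics

# Hypothetical heights in inches (not the course dataset).
females = [62.0, 64.0, 65.0, 63.0, 66.0]
males = [67.0, 70.0, 72.0, 69.0, 72.0]
n1, n2 = len(females), len(males)

# Pooled (equal-variance) 2-sample t-test quantities.
sp2 = ((n1 - 1) * statistics.variance(females) + (n2 - 1) * statistics.variance(males)) / (n1 + n2 - 2)
t = (statistics.mean(females) - statistics.mean(males)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# One-way ANOVA quantities for the same two groups.
grand = statistics.mean(females + males)
ss_sex = n1 * (statistics.mean(females) - grand) ** 2 + n2 * (statistics.mean(males) - grand) ** 2
ss_error = sum((x - statistics.mean(g)) ** 2 for g in (females, males) for x in g)
ms_error = ss_error / (n1 + n2 - 2)
f = (ss_sex / 1) / ms_error  # the Sex row has 2 - 1 = 1 degree of freedom

print(f"t^2 = {t ** 2:.4f}, F = {f:.4f}")
print(f"pooled SD = {math.sqrt(sp2):.4f}, sqrt(Error MS) = {math.sqrt(ms_error):.4f}")
```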


dhunter@stat.psu.edu