Class 5: ANOVA (Analysis of Variance) and F-tests
I. What is ANOVA
What is ANOVA? ANOVA is short for the Analysis of Variance. The essence of ANOVA is to decompose the total variance of the dependent variable into two additive components: one for the structural part of a regression, and the other for the stochastic part. Today we are going to examine the easiest case.
II. ANOVA: An Introduction
Let the model be

$$y = X\beta + \varepsilon.$$

Assuming $x_i$ is a column vector (of length p) of independent-variable values for the $i$th observation,

$$y_i = x_i'\beta + \varepsilon_i.$$

Then $x_i'b$ is the predicted value.
Sum of squares total:

$$
\begin{aligned}
SST &= \sum_i (y_i - \bar{Y})^2 \\
    &= \sum_i \big(y_i - x_i'b + x_i'b - \bar{Y}\big)^2 \\
    &= \sum_i (y_i - x_i'b)^2 + 2\sum_i (y_i - x_i'b)(x_i'b - \bar{Y}) + \sum_i (x_i'b - \bar{Y})^2 \\
    &= \sum_i (y_i - x_i'b)^2 + \sum_i (x_i'b - \bar{Y})^2
\end{aligned}
$$

because

$$\sum_i (y_i - x_i'b)(x_i'b - \bar{Y}) = \sum_i e_i (x_i'b - \bar{Y}) = 0.$$

This is always true by OLS. Hence

$$SST = SSE + SSR.$$
Important: the total variance of the
dependent variable is decomposed into two additive
parts:
SSE, which is due to errors, and
SSR, which is due to regression.
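The decomposition SST = SSE + SSR can be verified numerically. Here is a minimal sketch with simulated data (the model and coefficients are made up for illustration); the identity holds exactly, up to floating point, for any OLS fit that includes an intercept:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: y = 1 + 2x + noise
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])   # design matrix with intercept
y = 1 + 2 * x + rng.normal(size=n)

# OLS: b = (X'X)^{-1} X'y
b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ b
e = y - y_hat

SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum(e ** 2)                   # due to errors
SSR = np.sum((y_hat - y.mean()) ** 2)  # due to regression

print(SST, SSE + SSR)                  # the two numbers agree
```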
Geometric interpretation:
[blackboard]
Decomposition of Variance
If we treat X as a random variable, we can decompose total variance into the between-group portion and the within-group portion in any population:

$$V(y_i) = V(x_i'\beta) + V(\varepsilon_i) \qquad (1)$$
Proof:

$$
\begin{aligned}
V(y_i) &= V(x_i'\beta + \varepsilon_i) \\
       &= V(x_i'\beta) + V(\varepsilon_i) + 2\,\mathrm{Cov}(x_i'\beta,\ \varepsilon_i) \\
       &= V(x_i'\beta) + V(\varepsilon_i)
\end{aligned}
$$

(by the assumption that $\mathrm{Cov}(x_k'\beta,\ \varepsilon) = 0$ for all possible k.)
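Equation (1) can be illustrated by simulation. In this sketch the parameters ($\beta = 0.8$, $\sigma_x = 1.5$, $\sigma_\varepsilon = 0.5$) are assumed toy values, not from the notes; with x drawn independently of the error, the population variance $V(y) = 0.8^2 \cdot 1.5^2 + 0.5^2 = 1.69$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Population model: y_i = beta * x_i + eps_i, with x independent of eps
# (toy parameters, chosen only for this illustration)
n = 1_000_000
beta = 0.8
x = rng.normal(loc=0.0, scale=1.5, size=n)
eps = rng.normal(loc=0.0, scale=0.5, size=n)
y = beta * x + eps

# V(y) is close to V(x'beta) + V(eps) = 1.44 + 0.25 = 1.69;
# the sample covariance term vanishes as n grows.
print(np.var(y), np.var(beta * x) + np.var(eps))
```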
The ANOVA table estimates the three quantities of equation (1) from the sample. As the sample size gets larger and larger, the ANOVA table approaches the population equation more and more closely. In a sample, the decomposition of estimated variance is not strictly true; we thus need to decompose sums of squares and degrees of freedom separately. Is ANOVA a misnomer?
III. ANOVA in Matrix Form
I will try to give a simplified representation of ANOVA as follows:
$$
\begin{aligned}
SST &= \sum (y_i - \bar{Y})^2 \\
    &= \sum \big(y_i^2 - 2\bar{Y}y_i + \bar{Y}^2\big) \\
    &= \sum y_i^2 - 2\bar{Y}\sum y_i + n\bar{Y}^2 \\
    &= \sum y_i^2 - 2n\bar{Y}^2 + n\bar{Y}^2 \qquad \Big(\text{because } \sum y_i = n\bar{Y}\Big) \\
    &= \sum y_i^2 - n\bar{Y}^2 \\
    &= y'y - n\bar{Y}^2 \\
    &= y'y - \tfrac{1}{n}\,y'Jy \qquad \text{(the monster-looking form in your textbook)}
\end{aligned}
$$
SSE = e'e
$$
\begin{aligned}
SSR &= \sum (x_i'b - \bar{Y})^2 \\
    &= \sum \big[(x_i'b)^2 - 2\bar{Y}(x_i'b) + \bar{Y}^2\big] \\
    &= \sum (x_i'b)^2 - 2\bar{Y}\sum (x_i'b) + n\bar{Y}^2 \\
    &= \sum (x_i'b)^2 - 2n\bar{Y}^2 + n\bar{Y}^2 \qquad \Big(\text{because } \sum x_i'b = \sum (y_i - e_i) = n\bar{Y}, \text{ since } \sum y_i = n\bar{Y} \text{ and } \sum e_i = 0, \text{ as always}\Big) \\
    &= \sum (x_i'b)^2 - n\bar{Y}^2 \\
    &= b'X'Xb - n\bar{Y}^2 \\
    &= b'X'y - \tfrac{1}{n}\,y'Jy \qquad \text{(the monster-looking form in your textbook)}
\end{aligned}
$$
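The scalar and matrix forms can be checked against each other. A sketch with simulated data (all numbers assumed for illustration), where J is the n-by-n matrix of ones:

```python
import numpy as np

rng = np.random.default_rng(2)

n = 50
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
y = 3 + 0.5 * x + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)   # OLS coefficients
J = np.ones((n, n))                     # n x n matrix of ones

SST_sum = np.sum((y - y.mean()) ** 2)
SST_mat = y @ y - (1 / n) * (y @ J @ y)        # y'y - (1/n) y'Jy

SSR_sum = np.sum((X @ b - y.mean()) ** 2)
SSR_mat = b @ (X.T @ y) - (1 / n) * (y @ J @ y)  # b'X'y - (1/n) y'Jy

print(SST_sum, SST_mat)   # the two SST forms agree
print(SSR_sum, SSR_mat)   # the two SSR forms agree
```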
IV. ANOVA Table

SOURCE       SS    DF     MS    F
Regression   SSR   DF(R)  MSR   MSR/MSE with DF(R), DF(E)
Error        SSE   DF(E)  MSE
Total        SST   DF(T)

Let us use a real example. Assume that we have a regression estimated to be

y = -1.70 + 0.840x

We know $\sum x_i = 100$, $\sum y_i = 50$, $\sum x_i^2 = 509.12$, $\sum y_i^2 = 134.84$, $\sum x_i y_i = 257.66$. If we know that the DF for SST = 19, what is n? n = 20.

ANOVA Table

SOURCE       SS    DF  MS    F
Regression   6.44  1   6.44  6.44/0.19 = 33.89 with 1, 18
Error        3.40  18  0.19
Total        9.84  19
$$\bar{Y} = 50/20 = 2.5$$

$$SST = \sum y_i^2 - n\bar{Y}^2 = 134.84 - 20(2.5)(2.5) = 9.84$$
$$
\begin{aligned}
SSR &= \sum (x_i'b - \bar{Y})^2 = \sum (-1.7 + 0.84x_i)^2 - n\bar{Y}^2 \\
    &= \sum \big(1.7 \times 1.7 - 2 \times 1.7 \times 0.84\,x_i + 0.84 \times 0.84\,x_i^2\big) - 125.0 \\
    &= 20 \times 1.7 \times 1.7 + 0.84 \times 0.84 \times 509.12 - 2 \times 1.7 \times 0.84 \times 100 - 125.0 \\
    &= 6.44
\end{aligned}
$$
SSE = SST - SSR = 9.84 - 6.44 = 3.40
DF (Degrees of freedom): demonstration. Note: we discount the intercept when calculating SST.
MS = SS/DF
p = 0.000 [ask students]. What does the p-value say?
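The whole worked example can be reproduced from the summary statistics alone. This sketch uses the rounded coefficients b0 = -1.70, b1 = 0.840, as the notes do, and expands $SSR = \sum(b_0 + b_1 x_i)^2 - n\bar{Y}^2$ in terms of the sums:

```python
# Summary statistics from the example
n = 20
sum_x, sum_y = 100.0, 50.0
sum_x2, sum_y2 = 509.12, 134.84
b0, b1 = -1.70, 0.840          # rounded OLS coefficients from the notes

y_bar = sum_y / n                               # 2.5
SST = sum_y2 - n * y_bar ** 2                   # 134.84 - 125.0 = 9.84

# SSR = sum_i (b0 + b1*x_i)^2 - n*y_bar^2, expanded in terms of the sums
SSR = n * b0 ** 2 + 2 * b0 * b1 * sum_x + b1 ** 2 * sum_x2 - n * y_bar ** 2

SSE = SST - SSR                                 # about 3.40
MSR, MSE = SSR / 1, SSE / (n - 2)               # DF(R) = 1, DF(E) = 18
F = MSR / MSE                                   # about 34
print(round(SST, 2), round(SSR, 2), round(SSE, 2), round(F, 1))
```

The tiny gap between 33.89 (computed from the rounded table entries 6.44/0.19) and the value printed here is pure rounding.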
V. F-Tests
F-tests are more general than t-tests; t-tests can be seen as a special case of F-tests. If you have difficulty with F-tests, please ask your GSIs to review F-tests in the lab.

An F-test takes the form of a ratio of two MS's:

$$F_{df1,\,df2} = MSR/MSE$$

An F statistic has two degrees of freedom associated with it: the degrees of freedom in the numerator, and the degrees of freedom in the denominator.

An F statistic is usually larger than 1. The interpretation of an F statistic is whether the variance explained under the alternative hypothesis is due to chance. In other words, the null hypothesis is that the explained variance is due to chance, i.e., that all the coefficients are zero.
The
larger an F-statistic, the more likely that the
null hypothesis is not true. There is a
table in the back of your book from
which you can find exact probability values.
In our example, the F is 34, which is
highly significant.
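Instead of the table in the back of the book, the exact probability can be computed from the F distribution; a sketch using scipy for the example's F = 33.89 with (1, 18) df, which also shows the t-test special case ($F_{1,\nu} = t_\nu^2$):

```python
from scipy import stats

# p-value for the F statistic in the example: F = 33.89 with (1, 18) df
F, df1, df2 = 33.89, 1, 18
p = stats.f.sf(F, df1, df2)     # survival function = P(F_{1,18} > 33.89)
print(p)                        # far below 0.001, hence "p = 0.000"

# t-test as a special case: with df1 = 1, F equals t squared, and the
# two-sided t p-value matches the F p-value.
t = F ** 0.5
p_t = 2 * stats.t.sf(t, df2)
print(abs(p - p_t) < 1e-9)
```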
VI. R²

R² = SSR / SST

The proportion of variance explained by the model. In our example, R-sq = 65.4%.
VII. What happens if we add more independent variables?
1. SST stays the same.
2. SSR always increases.
3. SSE always decreases.
4. R² always increases.
5. MSR usually increases.
6. MSE usually decreases.
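Points 1-4 can be demonstrated by adding a regressor that is pure noise; R² still does not fall. A sketch with simulated data (all names and numbers assumed):

```python
import numpy as np

rng = np.random.default_rng(3)

n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)          # pure noise, unrelated to y
y = 1 + 2 * x1 + rng.normal(size=n)

def r_squared(X, y):
    """R^2 = SSR/SST for an OLS fit; X must include the intercept column."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    y_hat = X @ b
    SST = np.sum((y - y.mean()) ** 2)
    SSR = np.sum((y_hat - y.mean()) ** 2)
    return SSR / SST

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, x2])   # add the irrelevant regressor

r2_small = r_squared(X_small, y)
r2_big = r_squared(X_big, y)
print(r2_small <= r2_big)        # True: R^2 never falls when a variable is added
```

SST depends only on y, so it is unchanged; since OLS minimizes SSE over a strictly larger set of models, SSE can only fall, so SSR = SST - SSE and R² can only rise.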