Class 5: ANOVA (Analysis of Variance) and F-tests
I. What is ANOVA
What is ANOVA? ANOVA is short for the Analysis of Variance. The essence of ANOVA is to decompose the total variance of the dependent variable into two additive components: one for the structural part of a regression, and the other for the stochastic part. Today we are going to examine the easiest case.
II. ANOVA: An Introduction
Let the model be

$$y = X\beta + \varepsilon.$$

Assuming $x_i$ is a column vector (of length p) of independent-variable values for the $i$th observation,

$$y_i = x_i'\beta + \varepsilon_i.$$

Then $x_i'b$ is the predicted value.
Sum of squares total:

$$
\begin{aligned}
SST &= \sum_i (y_i - \bar{Y})^2 \\
    &= \sum_i \big(y_i - x_i'b + x_i'b - \bar{Y}\big)^2 \\
    &= \sum_i (y_i - x_i'b)^2 + 2\sum_i (y_i - x_i'b)(x_i'b - \bar{Y}) + \sum_i (x_i'b - \bar{Y})^2 \\
    &= \sum_i (y_i - x_i'b)^2 + \sum_i (x_i'b - \bar{Y})^2
\end{aligned}
$$

because

$$\sum_i (y_i - x_i'b)(x_i'b - \bar{Y}) = \sum_i e_i (x_i'b - \bar{Y}) = 0.$$

This is always true by OLS. Hence

$$SST = SSE + SSR.$$
Important: the total variance of the
dependent variable is decomposed into two additive
parts:
SSE, which is due to errors, and
SSR, which is due to regression.
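The decomposition SST = SSE + SSR can be verified numerically. Here is a minimal sketch with simulated data (the model and coefficients are made up for illustration); the identity holds exactly, up to floating point, for any OLS fit that includes an intercept:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: y = 1 + 2x + noise
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])   # design matrix with intercept
y = 1 + 2 * x + rng.normal(size=n)

# OLS: b = (X'X)^{-1} X'y
b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ b
e = y - y_hat

SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum(e ** 2)                   # due to errors
SSR = np.sum((y_hat - y.mean()) ** 2)  # due to regression

print(SST, SSE + SSR)                  # the two numbers agree
```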
Geometric interpretation:
[blackboard]
Decomposition of Variance
If we treat X as a random variable, we can decompose total variance into the between-group portion and the within-group portion in any population:

$$V(y_i) = V(x_i'\beta) + V(\varepsilon_i) \qquad (1)$$
Proof:

$$
\begin{aligned}
V(y_i) &= V(x_i'\beta + \varepsilon_i) \\
       &= V(x_i'\beta) + V(\varepsilon_i) + 2\,\mathrm{Cov}(x_i'\beta,\ \varepsilon_i) \\
       &= V(x_i'\beta) + V(\varepsilon_i)
\end{aligned}
$$

(by the assumption that $\mathrm{Cov}(x_k'\beta,\ \varepsilon) = 0$ for all possible k.)
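Equation (1) can be illustrated by simulation. In this sketch the parameters ($\beta = 0.8$, $\sigma_x = 1.5$, $\sigma_\varepsilon = 0.5$) are assumed toy values, not from the notes; with x drawn independently of the error, the population variance $V(y) = 0.8^2 \cdot 1.5^2 + 0.5^2 = 1.69$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Population model: y_i = beta * x_i + eps_i, with x independent of eps
# (toy parameters, chosen only for this illustration)
n = 1_000_000
beta = 0.8
x = rng.normal(loc=0.0, scale=1.5, size=n)
eps = rng.normal(loc=0.0, scale=0.5, size=n)
y = beta * x + eps

# V(y) is close to V(x'beta) + V(eps) = 1.44 + 0.25 = 1.69;
# the sample covariance term vanishes as n grows.
print(np.var(y), np.var(beta * x) + np.var(eps))
```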
The ANOVA table estimates the three quantities of equation (1) from the sample. As the sample size gets larger and larger, the ANOVA table approaches the population equation more and more closely. In a sample, the decomposition of estimated variance is not strictly true; we thus need to decompose sums of squares and degrees of freedom separately. Is ANOVA a misnomer?
III. ANOVA in Matrix Form
I will try to give a simplified representation of ANOVA as follows:
$$
\begin{aligned}
SST &= \sum (y_i - \bar{Y})^2 \\
    &= \sum \big(y_i^2 - 2\bar{Y}y_i + \bar{Y}^2\big) \\
    &= \sum y_i^2 - 2\bar{Y}\sum y_i + n\bar{Y}^2 \\
    &= \sum y_i^2 - 2n\bar{Y}^2 + n\bar{Y}^2 \qquad \Big(\text{because } \sum y_i = n\bar{Y}\Big) \\
    &= \sum y_i^2 - n\bar{Y}^2 \\
    &= y'y - n\bar{Y}^2 \\
    &= y'y - \tfrac{1}{n}\,y'Jy \qquad \text{(the monster-looking form in your textbook)}
\end{aligned}
$$
SSE = e'e
$$
\begin{aligned}
SSR &= \sum (x_i'b - \bar{Y})^2 \\
    &= \sum \big[(x_i'b)^2 - 2\bar{Y}(x_i'b) + \bar{Y}^2\big] \\
    &= \sum (x_i'b)^2 - 2\bar{Y}\sum (x_i'b) + n\bar{Y}^2 \\
    &= \sum (x_i'b)^2 - 2n\bar{Y}^2 + n\bar{Y}^2 \qquad \Big(\text{because } \sum x_i'b = \sum (y_i - e_i) = n\bar{Y}, \text{ since } \sum y_i = n\bar{Y} \text{ and } \sum e_i = 0, \text{ as always}\Big) \\
    &= \sum (x_i'b)^2 - n\bar{Y}^2 \\
    &= b'X'Xb - n\bar{Y}^2 \\
    &= b'X'y - \tfrac{1}{n}\,y'Jy \qquad \text{(the monster-looking form in your textbook)}
\end{aligned}
$$
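The scalar and matrix forms can be checked against each other. A sketch with simulated data (all numbers assumed for illustration), where J is the n-by-n matrix of ones:

```python
import numpy as np

rng = np.random.default_rng(2)

n = 50
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
y = 3 + 0.5 * x + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)   # OLS coefficients
J = np.ones((n, n))                     # n x n matrix of ones

SST_sum = np.sum((y - y.mean()) ** 2)
SST_mat = y @ y - (1 / n) * (y @ J @ y)        # y'y - (1/n) y'Jy

SSR_sum = np.sum((X @ b - y.mean()) ** 2)
SSR_mat = b @ (X.T @ y) - (1 / n) * (y @ J @ y)  # b'X'y - (1/n) y'Jy

print(SST_sum, SST_mat)   # the two SST forms agree
print(SSR_sum, SSR_mat)   # the two SSR forms agree
```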
IV. ANOVA Table

SOURCE       SS    DF     MS    F
Regression   SSR   DF(R)  MSR   MSR/MSE with DF(R), DF(E)
Error        SSE   DF(E)  MSE
Total        SST   DF(T)

Let us use a real example. Assume that we have a regression estimated to be

y = -1.70 + 0.840x

We know $\sum x_i = 100$, $\sum y_i = 50$, $\sum x_i^2 = 509.12$, $\sum y_i^2 = 134.84$, $\sum x_i y_i = 257.66$. If we know that the DF for SST = 19, what is n? n = 20.

ANOVA Table

SOURCE       SS    DF  MS    F
Regression   6.44  1   6.44  6.44/0.19 = 33.89 with 1, 18
Error        3.40  18  0.19
Total        9.84  19
$$\bar{Y} = 50/20 = 2.5$$

$$SST = \sum y_i^2 - n\bar{Y}^2 = 134.84 - 20(2.5)(2.5) = 9.84$$
$$
\begin{aligned}
SSR &= \sum (x_i'b - \bar{Y})^2 = \sum (-1.7 + 0.84x_i)^2 - n\bar{Y}^2 \\
    &= \sum \big(1.7 \times 1.7 - 2 \times 1.7 \times 0.84\,x_i + 0.84 \times 0.84\,x_i^2\big) - 125.0 \\
    &= 20 \times 1.7 \times 1.7 + 0.84 \times 0.84 \times 509.12 - 2 \times 1.7 \times 0.84 \times 100 - 125.0 \\
    &= 6.44
\end{aligned}
$$
SSE = SST - SSR = 9.84 - 6.44 = 3.40
DF (Degrees of freedom): demonstration. Note: we discount the intercept when calculating SST.
MS = SS/DF
p = 0.000 [ask students]. What does the p-value say?
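The whole worked example can be reproduced from the summary statistics alone. This sketch uses the rounded coefficients b0 = -1.70, b1 = 0.840, as the notes do, and expands $SSR = \sum(b_0 + b_1 x_i)^2 - n\bar{Y}^2$ in terms of the sums:

```python
# Summary statistics from the example
n = 20
sum_x, sum_y = 100.0, 50.0
sum_x2, sum_y2 = 509.12, 134.84
b0, b1 = -1.70, 0.840          # rounded OLS coefficients from the notes

y_bar = sum_y / n                               # 2.5
SST = sum_y2 - n * y_bar ** 2                   # 134.84 - 125.0 = 9.84

# SSR = sum_i (b0 + b1*x_i)^2 - n*y_bar^2, expanded in terms of the sums
SSR = n * b0 ** 2 + 2 * b0 * b1 * sum_x + b1 ** 2 * sum_x2 - n * y_bar ** 2

SSE = SST - SSR                                 # about 3.40
MSR, MSE = SSR / 1, SSE / (n - 2)               # DF(R) = 1, DF(E) = 18
F = MSR / MSE                                   # about 34
print(round(SST, 2), round(SSR, 2), round(SSE, 2), round(F, 1))
```

The tiny gap between 33.89 (computed from the rounded table entries 6.44/0.19) and the value printed here is pure rounding.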
V. F-Tests
F-tests are more general than t-tests; t-tests can be seen as a special case of F-tests. If you have difficulty with F-tests, please ask your GSIs to review F-tests in the lab.

An F-test takes the form of a ratio of two MS's:

$$F_{df1,\,df2} = MSR/MSE$$

An F statistic has two degrees of freedom associated with it: the degrees of freedom in the numerator, and the degrees of freedom in the denominator.

An F statistic is usually larger than 1. The interpretation of an F statistic is whether the variance explained under the alternative hypothesis is due to chance. In other words, the null hypothesis is that the explained variance is due to chance, i.e., that all the coefficients are zero.
The
larger an F-statistic, the more likely that the
null hypothesis is not true. There is a
table in the back of your book from
which you can find exact probability values.
In our example, the F is 34, which is
highly significant.
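Instead of the table in the back of the book, the exact probability can be computed from the F distribution; a sketch using scipy for the example's F = 33.89 with (1, 18) df, which also shows the t-test special case ($F_{1,\nu} = t_\nu^2$):

```python
from scipy import stats

# p-value for the F statistic in the example: F = 33.89 with (1, 18) df
F, df1, df2 = 33.89, 1, 18
p = stats.f.sf(F, df1, df2)     # survival function = P(F_{1,18} > 33.89)
print(p)                        # far below 0.001, hence "p = 0.000"

# t-test as a special case: with df1 = 1, F equals t squared, and the
# two-sided t p-value matches the F p-value.
t = F ** 0.5
p_t = 2 * stats.t.sf(t, df2)
print(abs(p - p_t) < 1e-9)
```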
VI. R²

R² = SSR / SST

The proportion of variance explained by the model. In our example, R-sq = 65.4%.
VII. What happens if we add more independent variables?
1. SST stays the same.
2. SSR always increases.
3. SSE always decreases.
4. R² always increases.
5. MSR usually increases.
6. MSE usually decreases.
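Points 1-4 can be demonstrated by adding a regressor that is pure noise; R² still does not fall. A sketch with simulated data (all names and numbers assumed):

```python
import numpy as np

rng = np.random.default_rng(3)

n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)          # pure noise, unrelated to y
y = 1 + 2 * x1 + rng.normal(size=n)

def r_squared(X, y):
    """R^2 = SSR/SST for an OLS fit; X must include the intercept column."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    y_hat = X @ b
    SST = np.sum((y - y.mean()) ** 2)
    SSR = np.sum((y_hat - y.mean()) ** 2)
    return SSR / SST

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, x2])   # add the irrelevant regressor

r2_small = r_squared(X_small, y)
r2_big = r_squared(X_big, y)
print(r2_small <= r2_big)        # True: R^2 never falls when a variable is added
```

SST depends only on y, so it is unchanged; since OLS minimizes SSE over a strictly larger set of models, SSE can only fall, so SSR = SST - SSE and R² can only rise.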