
PKU Summer Course "Regression Analysis" (Linear-Regression-Analysis) Lecture Notes PKU5, Teaching Document

Author: 高考题库网
Source: https://www.bjmy2z.cn/gaokao
2021-02-13 00:06


Class 5: ANOVA (Analysis of Variance) and F-tests



I. What is ANOVA


What is ANOVA? ANOVA is the short name for the Analysis of Variance. The essence of ANOVA is to decompose the total variance of the dependent variable into two additive components, one for the structural part, and the other for the stochastic part, of a regression. Today we are going to examine the easiest case.



II. ANOVA: An Introduction


Let the model be

$$y = X\beta + \varepsilon.$$

Assuming $x_i$ is a column vector (of length p) of independent variable values for the i-th observation,

$$y_i = x_i'\beta + \varepsilon_i.$$

Then $x_i'b$ is the predicted value, where $b$ is the OLS estimate of $\beta$.


Sum of squares total:

$$
\begin{aligned}
SST &= \sum (y_i - \bar{Y})^2 \\
&= \sum (y_i - x_i'b + x_i'b - \bar{Y})^2 \\
&= \sum (y_i - x_i'b)^2 + 2\sum (y_i - x_i'b)(x_i'b - \bar{Y}) + \sum (x_i'b - \bar{Y})^2 \\
&= \sum e_i^2 + \sum (x_i'b - \bar{Y})^2 \\
&= SSE + SSR
\end{aligned}
$$

because $\sum (y_i - x_i'b)(x_i'b - \bar{Y}) = \sum e_i (x_i'b - \bar{Y}) = 0$. This is always true by OLS.



Important: the total variance of the dependent variable is decomposed into two additive parts: SSE, which is due to errors, and SSR, which is due to regression.
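The decomposition above can be checked numerically. The following is a minimal sketch, assuming a simple regression with an intercept. The data values and the names `x`, `y`, `fitted`, `resid` and `cross` are made up for illustration and are not from the lecture.

```python
# Made-up data for illustration only (not from the lecture).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.5, 1.8, 2.1, 3.9, 4.2, 6.0]
n = len(y)

x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS estimates for y = b0 + b1 * x (closed form for one regressor plus intercept).
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]             # x_i'b
resid = [yi - fi for yi, fi in zip(y, fitted)]  # e_i

sst = sum((yi - y_bar) ** 2 for yi in y)        # total sum of squares
sse = sum(ei ** 2 for ei in resid)              # error sum of squares
ssr = sum((fi - y_bar) ** 2 for fi in fitted)   # regression sum of squares

# Cross term from the expansion of SST; OLS makes it vanish.
cross = sum(ei * (fi - y_bar) for ei, fi in zip(resid, fitted))

print(f"SST = {sst:.6f}, SSE + SSR = {sse + ssr:.6f}, cross term = {cross:.2e}")
```

Up to floating-point error, the cross term is zero and SST equals SSE + SSR.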


Geometric interpretation: [blackboard]



Decomposition of Variance


If we treat X as a random variable, we can decompose the total variance into the between-group portion and the within-group portion in any population:

$$V(y_i) = V(x_i'\beta) + V(\varepsilon_i) \qquad (1)$$



Proof:

$$
\begin{aligned}
V(y_i) &= V(x_i'\beta + \varepsilon_i) \\
&= V(x_i'\beta) + V(\varepsilon_i) + 2\,Cov(x_i'\beta, \varepsilon_i) \\
&= V(x_i'\beta) + V(\varepsilon_i)
\end{aligned}
$$

(by the assumption that $Cov(x_k'\beta, \varepsilon) = 0$, for all possible k.)


The ANOVA table estimates the three quantities of equation (1) from the sample. As the sample size gets larger and larger, the ANOVA table approaches the equation more and more closely.
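The large-sample behaviour described here can be illustrated with a small simulation. This is only a sketch under stated assumptions: one regressor drawn from a standard normal, an error drawn independently of it, and a true coefficient of 2.0. The function `variance_gap` and all its parameters are hypothetical names, not part of the lecture.

```python
import random


def variance_gap(n, beta=2.0, seed=0):
    """Simulate y = x*beta + eps and return (V(y) - [V(x*beta) + V(eps)], 2*Cov(x*beta, eps)).

    All moments are sample moments with divisor n. The simulated model is an
    assumption made for illustration.
    """
    rng = random.Random(seed)
    xb = [beta * rng.gauss(0, 1) for _ in range(n)]  # structural part
    eps = [rng.gauss(0, 1) for _ in range(n)]        # stochastic part, independent of x
    y = [s + e for s, e in zip(xb, eps)]

    def mean(v):
        return sum(v) / len(v)

    def cov(u, v):
        mu, mv = mean(u), mean(v)
        return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / len(u)

    gap = cov(y, y) - (cov(xb, xb) + cov(eps, eps))
    return gap, 2 * cov(xb, eps)


for size in (100, 10_000, 100_000):
    gap, twice_cov = variance_gap(size)
    print(f"n = {size:>7}: gap = {gap:+.5f}, 2*Cov = {twice_cov:+.5f}")
```

In any one sample the gap equals twice the sample covariance between the structural part and the error. That covariance is not exactly zero in a finite sample, but it shrinks toward zero as n grows.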


In a sample, decomposition of estimated variance is not strictly true. We thus need to separately decompose sums of squares and degrees of freedom. Is ANOVA a misnomer?



III. ANOVA in Matrix Form


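I will try to give a simplified representation of ANOVA as follows:

$$
\begin{aligned}
SST &= \sum (y_i - \bar{Y})^2 \\
&= \sum (y_i^2 + \bar{Y}^2 - 2\bar{Y} y_i) \\
&= \sum y_i^2 + \sum \bar{Y}^2 - 2\bar{Y} \sum y_i \\
&= \sum y_i^2 + n\bar{Y}^2 - 2n\bar{Y}^2 \quad (\text{because } \sum y_i = n\bar{Y}) \\
&= \sum y_i^2 - n\bar{Y}^2 \\
&= y'y - n\bar{Y}^2 \\
&= y'y - (1/n)\, y'Jy \quad (\text{in your textbook, monster look})
\end{aligned}
$$

Here $J$ is the n × n matrix of ones, so that $y'Jy = (\sum y_i)^2$.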



SSE = e'e



$$
\begin{aligned}
SSR &= \sum (x_i'b - \bar{Y})^2 \\
&= \sum \left[ (x_i'b)^2 + \bar{Y}^2 - 2\bar{Y}\, x_i'b \right] \\
&= \sum (x_i'b)^2 + n\bar{Y}^2 - 2\bar{Y} \sum x_i'b \\
&= \sum (x_i'b)^2 + n\bar{Y}^2 - 2\bar{Y} \sum (y_i - e_i) \\
&= \sum (x_i'b)^2 + n\bar{Y}^2 - 2n\bar{Y}^2 \quad (\text{because } \sum y_i = n\bar{Y},\ \sum e_i = 0, \text{ as always}) \\
&= \sum (x_i'b)^2 - n\bar{Y}^2 \\
&= b'X'Xb - n\bar{Y}^2 \\
&= b'X'y - (1/n)\, y'Jy \quad (\text{in your textbook, monster look})
\end{aligned}
$$
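The matrix expressions can be compared against the elementwise definitions. The sketch below assumes a simple regression with an intercept and treats J as the n × n matrix of ones, so that y'Jy is the square of the sum of y. The data and all variable names are made up for illustration.

```python
# Made-up data for illustration only (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1]
n = len(y)

x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS coefficients b = (b0, b1) for the design matrix X = [1, x].
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
b = [b0, b1]
X = [[1.0, xi] for xi in x]

fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]  # Xb
e = [yi - fi for yi, fi in zip(y, fitted)]                      # residual vector

# Building blocks of the matrix formulas.
yty = sum(yi * yi for yi in y)   # y'y
yJy = sum(y) ** 2                # y'Jy with J the all-ones matrix
Xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(2)]  # X'y
bXty = sum(bj * v for bj, v in zip(b, Xty))                          # b'X'y

sst_matrix = yty - yJy / n
ssr_matrix = bXty - yJy / n
sse_matrix = sum(ei * ei for ei in e)  # e'e

# Elementwise definitions, for comparison.
sst_direct = sum((yi - y_bar) ** 2 for yi in y)
ssr_direct = sum((fi - y_bar) ** 2 for fi in fitted)

print(f"SST: {sst_matrix:.6f} vs {sst_direct:.6f}")
print(f"SSR: {ssr_matrix:.6f} vs {ssr_direct:.6f}")
print(f"SSE: {sse_matrix:.6f}")
```

The step from b'X'Xb to b'X'y relies on the normal equations X'Xb = X'y, which the OLS estimate satisfies.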



IV. ANOVA Table



| SOURCE | SS | DF | MS | F |
|---|---|---|---|---|
| Regression | SSR | DF(R) | MSR | MSR/MSE with DF(R), DF(E) |
| Error | SSE | DF(E) | MSE | |
| Total | SST | DF(T) | | |

Let us use a real example. Assume that we have a regression estimated to be

$$y = -1.70 + 0.840\,x$$

ANOVA Table

| SOURCE | SS | DF | MS | F |
|---|---|---|---|---|
| Regression | 6.44 | 1 | 6.44 | 6.44/0.19 = 33.89 with 1, 18 DF |
| Error | 3.40 | 18 | 0.19 | |
| Total | 9.84 | 19 | | |

We know $\sum x_i = 100$, $\sum y_i = 50$, $\sum x_i^2 = 509.12$, $\sum y_i^2 = 134.84$, $\sum x_i y_i = 257.66$. If we know that DF for SST = 19, what is n?

$$n = 20$$

$$\bar{Y} = 50/20 = 2.5$$

$$SST = \sum y_i^2 - n\bar{Y}^2 = 134.84 - 20 \times 2.5 \times 2.5 = 9.84$$

$$
\begin{aligned}
SSR &= \sum (-1.7 + 0.84 x_i)^2 - 125.0 \\
&= \sum (1.7 \times 1.7 + 0.84 \times 0.84\, x_i^2 - 2 \times 1.7 \times 0.84\, x_i) - 125.0 \\
&= 20 \times 1.7 \times 1.7 + 0.84 \times 0.84 \times 509.12 - 2 \times 1.7 \times 0.84 \times 100 - 125.0 \\
&= 6.44
\end{aligned}
$$

SSE = SST - SSR = 9.84 - 6.44 = 3.40
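The arithmetic of this example can be reproduced from the summary statistics alone. The sketch below follows the same steps, using the rounded coefficients -1.70 and 0.840 as given. The variable names are chosen here for illustration.

```python
# Summary statistics and fitted coefficients quoted in the example.
df_total = 19
n = df_total + 1                 # DF for SST is n - 1
sum_x, sum_y = 100.0, 50.0
sum_x2, sum_y2 = 509.12, 134.84
b0, b1 = -1.70, 0.840            # rounded coefficients, as given

y_bar = sum_y / n
sst = sum_y2 - n * y_bar ** 2

# sum (b0 + b1*x_i)^2, expanded so that only the sums of x and x^2 are needed.
sum_fitted_sq = n * b0 ** 2 + b1 ** 2 * sum_x2 + 2 * b0 * b1 * sum_x
ssr = sum_fitted_sq - n * y_bar ** 2
sse = sst - ssr

df_r, df_e = 1, n - 2            # one slope; n - 2 residual degrees of freedom
msr = ssr / df_r
mse = sse / df_e
f_stat = msr / mse

print(f"n = {n}, Y-bar = {y_bar}")
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
print(f"MSR = {msr:.2f}, MSE = {mse:.2f}, F = {f_stat:.2f} with {df_r}, {df_e} DF")
```

Computed without intermediate rounding, F comes out at about 34.0. The 33.89 in the table comes from dividing the rounded entries 6.44 and 0.19.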


DF (Degrees of freedom): demonstration. Note: the intercept is discounted when calculating the DF for SST, so DF(T) = n - 1.


MS = SS/DF


p = 0.000 [ask students]. What does the p-value say?



V. F-Tests


F-tests are more general than t-tests; t-tests can be seen as a special case of F-tests.

If you have difficulty with F-tests, please ask your GSIs to review F-tests in the lab. An F-test statistic takes the form of a ratio of two MS's:

$$F_{df_1, df_2} = MSR/MSE$$


An F statistic has two degrees of freedom associated with it: the degrees of freedom in the numerator, and the degrees of freedom in the denominator.

An F statistic is usually larger than 1. An F statistic tests whether the variance explained under the alternative hypothesis is due to chance. In other words, the null hypothesis is that the explained variance is due to chance, or that all the coefficients are zero.

The larger an F statistic, the stronger the evidence that the null hypothesis is not true. There is a table in the back of your book from which you can find exact probability values.

In our example, the F is 34, which is highly significant.
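The remark that a t-test is a special case of an F-test can be illustrated with the same example: with one slope, the squared t statistic for the slope equals the F statistic. The sketch below assumes the summary statistics quoted in the ANOVA example and the usual standard error of a simple-regression slope, sqrt(MSE / Sxx). The names `sxx`, `sxy`, `syy`, `se_b1`, `t_stat` and `f_stat` are illustrative.

```python
import math

# Summary statistics from the worked ANOVA example.
n = 20
sum_x, sum_y = 100.0, 50.0
sum_x2, sum_y2, sum_xy = 509.12, 134.84, 257.66

# Centered sums of squares and cross-products.
sxx = sum_x2 - sum_x ** 2 / n
sxy = sum_xy - sum_x * sum_y / n
syy = sum_y2 - sum_y ** 2 / n    # this is SST

b1 = sxy / sxx                   # OLS slope, unrounded
ssr = b1 * sxy                   # regression sum of squares in simple regression
sse = syy - ssr
mse = sse / (n - 2)

f_stat = (ssr / 1) / mse         # F with 1 and n - 2 degrees of freedom
se_b1 = math.sqrt(mse / sxx)     # standard error of the slope
t_stat = b1 / se_b1              # t with n - 2 degrees of freedom

print(f"slope = {b1:.3f}, t = {t_stat:.2f}, t^2 = {t_stat ** 2:.2f}, F = {f_stat:.2f}")
```

Because this version uses the unrounded slope, its F differs slightly in the second decimal from the table value, but it still rounds to 34.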


VI. $R^2$

$$R^2 = SSR / SST$$

The proportion of variance explained by the model.

In our example, R-sq = 6.44 / 9.84 = 65.4%



VII. What happens if we add more independent variables

1. SST stays the same.

2. SSR never decreases (it increases unless the new variable adds nothing).

3. SSE never increases (it decreases unless the new variable adds nothing).

4. $R^2$ never decreases.

5. MSR usually increases.

6. MSE usually decreases.
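The first four points can be checked by fitting two nested models to the same data. The sketch below is only an illustration on simulated data: the helpers `solve` and `anova` are hypothetical names written for this example, and the second regressor `x2` is generated to be unrelated to `y`.

```python
import random


def solve(a, rhs):
    """Solve the linear system a z = rhs by Gauss-Jordan elimination with partial pivoting."""
    k = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]  # augmented matrix
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def anova(columns, y):
    """Return (SST, SSR, SSE) for an OLS fit of y on an intercept plus the given columns."""
    n = len(y)
    rows = [[1.0] + [c[i] for c in columns] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)  # normal equations X'Xb = X'y
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return sst, ssr, sse


random.seed(0)
n = 50
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]  # unrelated to y by construction
y = [1.0 + 2.0 * a + random.gauss(0, 1) for a in x1]

small = anova([x1], y)      # model with x1 only
large = anova([x1, x2], y)  # model with x1 and x2

for label, (sst, ssr, sse) in (("x1 only", small), ("x1 and x2", large)):
    print(f"{label:>10}: SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}, R^2 = {ssr / sst:.4f}")
```

Even though `x2` has no relation to `y` in the simulation, adding it cannot lower SSR or raise SSE, so R^2 cannot fall. This is why R^2 alone is a poor guide for deciding whether a variable belongs in the model.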
