
Peking University Summer Course "Regression Analysis" (Linear-Regression-Analysis) Lecture Notes PKU5 (teaching manuscript)

Author: 高考题库网
Source: https://www.bjmy2z.cn/gaokao
2021-02-13 00:06


Class 5: ANOVA (Analysis of Variance) and F-tests



I. What is ANOVA


What is ANOVA? ANOVA is the short name for the Analysis of Variance. The essence of ANOVA is to decompose the total variance of the dependent variable into two additive components, one for the structural part, and the other for the stochastic part, of a regression. Today we are going to examine the easiest case.



II. ANOVA: An Introduction


Let the model be

$$y = X\beta + \varepsilon.$$

Assuming $x_i$ is a column vector (of length $p$) of independent variable values for the $i$-th observation,

$$y_i = x_i'\beta + \varepsilon_i.$$

Then $x_i'b$ is the predicted value.


Sum of squares total:

$$
\begin{aligned}
SST &= \sum (y_i - \bar{Y})^2 \\
    &= \sum (y_i - x_i'b + x_i'b - \bar{Y})^2 \\
    &= \sum (y_i - x_i'b)^2 + 2\sum (y_i - x_i'b)(x_i'b - \bar{Y}) + \sum (x_i'b - \bar{Y})^2 \\
    &= \sum e_i^2 + \sum (x_i'b - \bar{Y})^2 \\
    &= SSE + SSR
\end{aligned}
$$

because $\sum (y_i - x_i'b)(x_i'b - \bar{Y}) = \sum e_i (x_i'b - \bar{Y}) = 0$. This is always true by OLS.



Important: the total variance of the dependent variable is decomposed into two additive parts: SSE, which is due to errors, and SSR, which is due to regression.

Geometric interpretation: [blackboard]
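The decomposition can be checked numerically. The sketch below is only an illustration: the six data points are invented, the model is assumed to include an intercept, and `np.linalg.lstsq` stands in for whatever OLS routine the course actually uses.

```python
import numpy as np

# Hypothetical toy data, invented for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1, 5.7])

# Design matrix with an intercept column; each row plays the role of x_i'.
X = np.column_stack([np.ones_like(x), x])

# OLS estimate b, fitted values x_i'b, and residuals e_i.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ b
e = y - fitted

sst = np.sum((y - y.mean()) ** 2)        # sum (y_i - Ybar)^2
sse = np.sum(e ** 2)                     # sum e_i^2
ssr = np.sum((fitted - y.mean()) ** 2)   # sum (x_i'b - Ybar)^2
cross = np.sum(e * (fitted - y.mean()))  # the cross term that should vanish

print(f"SST        = {sst:.6f}")
print(f"SSE + SSR  = {sse + ssr:.6f}")
print(f"cross term = {cross:.2e}")
```

Up to floating-point error, the cross term is zero and the two printed sums agree, which is the content of SST = SSE + SSR.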



Decomposition of Variance


If we treat X as a random variable, we can decompose the total variance into the between-group portion and the within-group portion in any population:

$$V(y_i) = V(x_i'\beta) + V(\varepsilon_i) \tag{1}$$

Proof:

$$
\begin{aligned}
V(y_i) &= V(x_i'\beta + \varepsilon_i) \\
       &= V(x_i'\beta) + V(\varepsilon_i) + 2\,\mathrm{Cov}(x_i'\beta, \varepsilon_i) \\
       &= V(x_i'\beta) + V(\varepsilon_i)
\end{aligned}
$$

(by the assumption that $\mathrm{Cov}(x_k'\beta, \varepsilon) = 0$, for all possible $k$.)


The ANOVA table estimates the three quantities of equation (1) from the sample. As the sample size gets larger and larger, the ANOVA table approximates the equation more and more closely.

In a sample, the decomposition of estimated variance is not strictly true. We thus need to separately decompose sums of squares and degrees of freedom. Is ANOVA a misnomer?
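A small simulation can make the population-versus-sample point concrete. This is a sketch under assumed values: the model `y = 1 + 2x + eps`, the normal distributions, the seed, and the helper name `variance_pieces` are all hypothetical choices, not part of the lecture.

```python
import numpy as np

def variance_pieces(n, seed=0):
    """Simulate y = 1 + 2x + eps (hypothetical model) and return the sample
    variances of y, of the structural part, of the error, and twice their
    sample covariance."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    eps = rng.normal(size=n)        # drawn independently of x
    structural = 1.0 + 2.0 * x      # plays the role of x_i' beta
    y = structural + eps
    cov2 = 2.0 * np.cov(structural, eps)[0, 1]
    return y.var(ddof=1), structural.var(ddof=1), eps.var(ddof=1), cov2

for n in (20, 200_000):
    v_y, v_s, v_e, cov2 = variance_pieces(n)
    print(f"n={n:>7}: V(y)={v_y:.4f}  V(x'b)+V(e)={v_s + v_e:.4f}  2Cov={cov2:.4f}")
```

In any finite sample the covariance term is generally nonzero, so the two-part split of the estimated variance holds only approximately; with the larger `n` the gap shrinks toward zero, in line with the remark about large samples.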



III. ANOVA in Matrix


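I will try to give a simplified representation of ANOVA as follows:

$$
\begin{aligned}
SST &= \sum (y_i - \bar{Y})^2 \\
    &= \sum (y_i^2 + \bar{Y}^2 - 2\bar{Y} y_i) \\
    &= \sum y_i^2 + \sum \bar{Y}^2 - 2\bar{Y} \sum y_i \\
    &= \sum y_i^2 + n\bar{Y}^2 - 2n\bar{Y}^2 \quad (\text{because } \textstyle\sum y_i = n\bar{Y}) \\
    &= \sum y_i^2 - n\bar{Y}^2 \\
    &= y'y - n\bar{Y}^2 \\
    &= y'y - (1/n)\, y'Jy \quad (\text{in your textbook, monster look})
\end{aligned}
$$

Here $J$ is the $n \times n$ matrix of ones, so that $y'Jy = (\sum y_i)^2$.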



SSE = e'e



$$
\begin{aligned}
SSR &= \sum (x_i'b - \bar{Y})^2 \\
    &= \sum \left[ (x_i'b)^2 + \bar{Y}^2 - 2\bar{Y}\, x_i'b \right] \\
    &= \sum (x_i'b)^2 + n\bar{Y}^2 - 2\bar{Y} \sum x_i'b \\
    &= \sum (x_i'b)^2 + n\bar{Y}^2 - 2\bar{Y} \sum (y_i - e_i) \\
    &= \sum (x_i'b)^2 + n\bar{Y}^2 - 2n\bar{Y}^2 \quad (\text{because } \textstyle\sum y_i = n\bar{Y},\ \sum e_i = 0, \text{ as always}) \\
    &= \sum (x_i'b)^2 - n\bar{Y}^2 \\
    &= b'X'Xb - n\bar{Y}^2 \\
    &= b'X'y - (1/n)\, y'Jy \quad (\text{in your textbook, monster look})
\end{aligned}
$$
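The matrix expressions can be compared against the summation forms directly. The following is a minimal sketch, assuming invented data with two predictors plus an intercept; the variable names are illustrative, and `J` is built as the n-by-n matrix of ones.

```python
import numpy as np

# Hypothetical toy data with two predictors, invented for illustration.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([2.1, 2.4, 4.9, 5.2, 8.1, 7.8])

n = len(y)
X = np.column_stack([np.ones(n), x1, x2])  # intercept column included
J = np.ones((n, n))                        # n x n matrix of ones

# OLS estimate from the normal equations X'Xb = X'y.
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

# Matrix forms.
sst_matrix = y @ y - (y @ J @ y) / n
sse_matrix = e @ e
ssr_matrix = b @ X.T @ y - (y @ J @ y) / n

# Summation forms for comparison.
sst_sum = np.sum((y - y.mean()) ** 2)
ssr_sum = np.sum((X @ b - y.mean()) ** 2)

print(f"SST: matrix {sst_matrix:.6f}  sum {sst_sum:.6f}")
print(f"SSR: matrix {ssr_matrix:.6f}  sum {ssr_sum:.6f}")
print(f"SSE: {sse_matrix:.6f}  SST - SSR: {sst_matrix - ssr_matrix:.6f}")
```

The step from b'X'Xb to b'X'y relies on the normal equations, which is why `b` is computed from them here rather than taken as given.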



IV. ANOVA Table



| SOURCE     | SS  | DF    | MS  | F                           |
|------------|-----|-------|-----|-----------------------------|
| Regression | SSR | DF(R) | MSR | MSR/MSE with DF(R), DF(E)   |
| Error      | SSE | DF(E) | MSE |                             |
| Total      | SST | DF(T) |     |                             |

Let us use a real example. Assume that we have a regression estimated to be

$$y = -1.70 + 0.840\,x$$

ANOVA Table

| SOURCE     | SS   | DF | MS   | F                              |
|------------|------|----|------|--------------------------------|
| Regression | 6.44 | 1  | 6.44 | 6.44/0.19 = 33.89 with 1, 18   |
| Error      | 3.40 | 18 | 0.19 |                                |
| Total      | 9.84 | 19 |      |                                |

We know $\sum x_i = 100$, $\sum y_i = 50$, $\sum x_i^2 = 509.12$, $\sum y_i^2 = 134.84$, $\sum x_i y_i = 257.66$. If we know that DF for SST = 19, what is $n$?

$n = 20$

$\bar{Y} = 50/20 = 2.5$

$$SST = \sum y_i^2 - n\bar{Y}^2 = 134.84 - 20 \times 2.5 \times 2.5 = 9.84$$

$$
\begin{aligned}
SSR &= \sum (-1.7 + 0.84\,x_i)^2 - 125.0 \\
    &= \sum (1.7 \times 1.7 + 0.84 \times 0.84\, x_i^2 - 2 \times 1.7 \times 0.84 \times x_i) - 125.0 \\
    &= 20 \times 1.7 \times 1.7 + 0.84 \times 0.84 \times 509.12 - 2 \times 1.7 \times 0.84 \times 100 - 125.0 \\
    &= 6.44
\end{aligned}
$$

SSE = SST - SSR = 9.84 - 6.44 = 3.40

DF (Degrees of freedom): demonstration. Note: discounting the intercept when calculating SST.

MS = SS/DF

p = 0.000 [ask students]. What does the p-value say?
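The whole table can be rebuilt from the five sums quoted in the example. This is a sketch, not the lecture's own computation: it assumes the standard closed-form simple-regression estimates (slope = S_xy / S_xx), which the notes do not spell out, and it works with unrounded coefficients.

```python
# Summary statistics quoted in the example.
n = 20
sum_x, sum_y = 100.0, 50.0
sum_x2, sum_y2, sum_xy = 509.12, 134.84, 257.66

x_bar = sum_x / n
y_bar = sum_y / n

# Corrected sums of squares and cross-products.
s_xx = sum_x2 - n * x_bar ** 2
s_xy = sum_xy - n * x_bar * y_bar

# Closed-form OLS estimates for simple regression (assumed, standard formulas).
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

sst = sum_y2 - n * y_bar ** 2
ssr = b1 ** 2 * s_xx          # equals sum (b0 + b1*x_i - y_bar)^2
sse = sst - ssr

df_r, df_e, df_t = 1, n - 2, n - 1
msr, mse = ssr / df_r, sse / df_e
f_stat = msr / mse

print(f"y = {b0:.2f} + {b1:.3f} x")
print(f"{'SOURCE':<11}{'SS':>7}{'DF':>4}{'MS':>7}{'F':>8}")
print(f"{'Regression':<11}{ssr:>7.2f}{df_r:>4}{msr:>7.2f}{f_stat:>8.2f}")
print(f"{'Error':<11}{sse:>7.2f}{df_e:>4}{mse:>7.2f}")
print(f"{'Total':<11}{sst:>7.2f}{df_t:>4}")
```

Because the coefficients are not rounded to -1.70 and 0.84 first, SSR and F can differ from the table in the last reported digit; the unrounded F comes out close to 34.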



V. F-Tests


F-tests are more general than t-tests; t-tests can be seen as a special case of F-tests.

If you have difficulty with F-tests, please ask your GSIs to review F-tests in the lab. F-tests take the form of a ratio of two MS's:

$$F_{df1,\,df2} = MSR/MSE$$

An F statistic has two degrees of freedom associated with it: the degrees of freedom in the numerator, and the degrees of freedom in the denominator.

An F statistic is usually larger than 1. The F statistic addresses whether the variance explained under the alternative hypothesis is due to chance. In other words, the null hypothesis is that the explained variance is due to chance, or that all the coefficients (other than the intercept) are zero.

The larger an F statistic, the stronger the evidence against the null hypothesis. There is a table in the back of your book from which you can find exact probability values.

In our example, the F is 34, which is highly significant.
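The remark that t-tests are a special case of F-tests can be illustrated with the example's own numbers: with a single regressor, the squared t statistic for the slope equals the F statistic. The sketch below assumes the usual textbook standard error of the slope, sqrt(MSE / S_xx), which is not given in these notes.

```python
import math

# Summary statistics from the worked example (n = 20).
n = 20
s_xx = 509.12 - 100.0 ** 2 / n       # sum x_i^2 - n * xbar^2
s_xy = 257.66 - 100.0 * 50.0 / n     # sum x_i y_i - n * xbar * ybar
sst = 134.84 - 50.0 ** 2 / n         # sum y_i^2 - n * ybar^2

b1 = s_xy / s_xx                     # OLS slope
ssr = b1 * s_xy                      # same as b1^2 * s_xx
mse = (sst - ssr) / (n - 2)

f_stat = ssr / mse                   # MSR / MSE, with DF(R) = 1
se_b1 = math.sqrt(mse / s_xx)        # assumed textbook standard error of the slope
t_stat = b1 / se_b1

print(f"F = {f_stat:.3f}, t = {t_stat:.3f}, t^2 = {t_stat ** 2:.3f}")
```

The equality is algebraic rather than numerical coincidence: both quantities reduce to S_xy squared divided by (S_xx times MSE).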


VI. $R^2$

$R^2$ = SSR / SST

The proportion of variance explained by the model.

In our example, $R^2$ = 6.44 / 9.84 = 65.4%



VII. What happens if we add more independent variables?

1. SST stays the same.
2. SSR never decreases (it almost always increases).
3. SSE never increases (it almost always decreases).
4. $R^2$ never decreases (it almost always increases).
5. MSR usually increases.
6. MSE usually decreases.
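The first four items can be seen by fitting the same response with and without an extra regressor. This is a sketch on simulated data: the coefficients, the seed, the sample size, and the helper name `anova_pieces` are hypothetical, and the added column is deliberately pure noise.

```python
import numpy as np

def anova_pieces(X, y):
    """Return (SST, SSR, SSE, MSE) for an OLS fit of y on X,
    where X already contains the intercept column."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    sst = np.sum((y - y.mean()) ** 2)
    sse = e @ e
    return sst, sst - sse, sse, sse / (len(y) - X.shape[1])

rng = np.random.default_rng(1)       # arbitrary fixed seed
n = 30
x1 = rng.normal(size=n)
noise_var = rng.normal(size=n)       # unrelated to y by construction
y = 0.5 + 1.5 * x1 + rng.normal(size=n)

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([X_small, noise_var])

for label, X in (("x1 only", X_small), ("x1 + noise", X_big)):
    sst, ssr, sse, mse = anova_pieces(X, y)
    print(f"{label:<11} SST={sst:.3f} SSR={ssr:.3f} SSE={sse:.3f} "
          f"R2={ssr / sst:.4f} MSE={mse:.4f}")
```

SST is identical in both fits, while SSR and R-squared cannot go down and SSE cannot go up. MSE is different: DF(E) also drops by one, so when the added variable explains very little, MSE may move in either direction.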




