-
n
?
2
?
?
2.10
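(iii) From (2.57), $\mathrm{Var}(\hat{\beta}_1) = \sigma^2 / \sum_{i=1}^{n} (x_i - \bar{x})^2$. From the hint, $\sum_{i=1}^{n} x_i^2 \ge \sum_{i=1}^{n} (x_i - \bar{x})^2$, and so $\mathrm{Var}(\tilde{\beta}_1) \le \mathrm{Var}(\hat{\beta}_1)$. A more direct way to see this is to write $\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n(\bar{x})^2$, which is less than $\sum_{i=1}^{n} x_i^2$ unless $\bar{x} = 0$.
(iv) For a given sample size, the bias in $\tilde{\beta}_1$ increases as $\bar{x}$ increases (holding the sum of the $x_i^2$ fixed). But as $\bar{x}$ increases, the variance of $\hat{\beta}_1$ increases relative to $\mathrm{Var}(\tilde{\beta}_1)$. The bias in $\tilde{\beta}_1$ is also small when $\beta_0$ is small. Therefore, whether we prefer $\tilde{\beta}_1$ or $\hat{\beta}_1$ on a mean squared error basis depends on the sizes of $\beta_0$, $\bar{x}$, and $n$ (in addition to the size of $\sum_{i=1}^{n} x_i^2$).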
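The comparison in parts (iii) and (iv) can be checked numerically. The following is a minimal Python sketch, not part of the original solution: the sample `x`, the error variance `sigma2`, and the intercept `beta0` are made-up values chosen only for illustration. It assumes the standard results for regression through the origin, namely variance $\sigma^2 / \sum x_i^2$ and bias $\beta_0 \sum x_i / \sum x_i^2$.

```python
# Hypothetical sample, error variance, and intercept (illustration only).
x = [1.0, 2.0, 4.0, 7.0]
sigma2 = 1.0
beta0 = 0.5

n = len(x)
x_bar = sum(x) / n

sum_sq = sum(xi ** 2 for xi in x)             # sum of x_i^2
sst_x = sum((xi - x_bar) ** 2 for xi in x)    # sum of (x_i - x_bar)^2

# Slope variances under homoskedasticity.
var_tilde = sigma2 / sum_sq    # regression through the origin
var_hat = sigma2 / sst_x       # usual OLS with an intercept

# Bias of the through-the-origin slope when the true intercept is beta0.
bias_tilde = beta0 * sum(x) / sum_sq

mse_tilde = var_tilde + bias_tilde ** 2
mse_hat = var_hat              # the OLS slope is unbiased, so MSE = variance

print(sst_x, sum_sq - n * x_bar ** 2)   # both sides of the identity in (iii)
print(var_tilde <= var_hat)
print(mse_tilde, mse_hat)
```

With these particular numbers the through-the-origin estimator has the smaller mean squared error. Raising the hypothetical intercept to `beta0 = 2.0` makes the squared bias dominate and reverses the ranking, which is the trade-off described in part (iv).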
3.7
We can use Table 3.2. By definition, $\beta_2 > 0$, and by assumption, $\mathrm{Corr}(x_1, x_2) < 0$. Therefore, there is a negative bias in $\tilde{\beta}_1$: $E(\tilde{\beta}_1) < \beta_1$. This means that, on average across different random samples, the simple regression estimator underestimates the effect of the training program. It is even possible that $E(\tilde{\beta}_1)$ is negative even though $\beta_1 > 0$.
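The sign argument from Table 3.2 rests on the omitted-variable relationship $\tilde{\beta}_1 = \beta_1 + \beta_2 \tilde{\delta}_1$, where $\tilde{\delta}_1$ is the slope from regressing $x_2$ on $x_1$. A minimal Python sketch of that relationship follows. All numbers are hypothetical, and the outcome is generated without an error term so that the algebra holds exactly in the sample.

```python
def slope(x, y):
    """Simple-regression (OLS) slope of y on x, with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx

# Hypothetical values: x1 plays the role of training, x2 the omitted variable.
beta0, beta1, beta2 = 1.0, 0.5, 2.0     # beta1 > 0 and beta2 > 0
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
x2 = [5.0, 3.0, 4.0, 1.0, 2.0]          # negatively correlated with x1

# Error-free outcome, so the omitted-variable algebra holds exactly.
y = [beta0 + beta1 * a + beta2 * b for a, b in zip(x1, x2)]

delta1 = slope(x1, x2)        # slope from regressing x2 on x1
beta1_tilde = slope(x1, y)    # simple regression of y on x1 only

print(delta1, beta1_tilde, beta1 + beta2 * delta1)
```

In this made-up example the simple-regression slope is negative even though `beta1` is positive, which is the possibility noted at the end of the answer.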
3.8
Only (ii), omitting an important variable, can cause bias, and this is true only when the omitted variable is correlated with the included explanatory variables. The homoskedasticity assumption, MLR.5, played no role in showing that the OLS estimators are unbiased. (Homoskedasticity was used to obtain the usual variance formulas for the $\hat{\beta}_j$.) Further, the degree of collinearity between the explanatory variables in the sample, even if it is reflected in a correlation as high as .95, does not affect the Gauss-Markov assumptions. Only if there is a perfect linear relationship among two or more explanatory variables is MLR.3 violated.
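The distinction between high and perfect collinearity can be illustrated with a small Python sketch. The data are hypothetical. For two regressors, the determinant of the centered cross-product matrix equals $SST_1 \cdot SST_2 \cdot (1 - r^2)$, so it is zero only when the sample correlation $r$ is exactly $\pm 1$, the case that MLR.3 rules out.

```python
def det_and_corr(x1, x2):
    """Determinant of the centered 2x2 cross-product matrix and the sample correlation."""
    n = len(x1)
    m1 = sum(x1) / n
    m2 = sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    det = s11 * s22 - s12 ** 2
    corr = s12 / (s11 * s22) ** 0.5
    return det, corr

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2_high = [1.0, 2.0, 3.0, 4.0, 6.0]          # highly, but not perfectly, correlated
x2_perfect = [1.0 + 2.0 * a for a in x1]     # exact linear function of x1

det_high, corr_high = det_and_corr(x1, x2_high)
det_perfect, corr_perfect = det_and_corr(x1, x2_perfect)

print(corr_high, det_high)         # correlation above .95, determinant still positive
print(corr_perfect, det_perfect)   # perfect correlation, singular matrix
```

Only the second case leaves the OLS estimates undefined; the first merely inflates the sampling variances.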
3.9
(i) Because $x_1$ is highly correlated with $x_2$ and $x_3$, and these latter variables have large partial effects on $y$, the simple and multiple regression coefficients on $x_1$ can differ by large amounts. We have not done this case explicitly, but given equation (3.46) and the discussion with a single omitted variable, the intuition is pretty straightforward.
(ii) Here we would expect $\tilde{\beta}_1$ and $\hat{\beta}_1$ to be similar (subject, of course, to what we mean by “almost uncorrelated”). The amount of correlation between $x_2$ and $x_3$ does not directly affect the multiple regression estimate on $x_1$ if $x_1$ is essentially uncorrelated with $x_2$ and $x_3$.
(iii) In this case we are (unnecessarily) introducing multicollinearity into the regression: $x_2$ and $x_3$ have small partial effects on $y$ and yet $x_2$ and $x_3$ are highly correlated with $x_1$. Adding $x_2$ and $x_3$ likely increases the standard error of the coefficient on $x_1$ substantially, so $\mathrm{se}(\hat{\beta}_1)$ is likely to be much larger than $\mathrm{se}(\tilde{\beta}_1)$.
(iv) In this case, adding $x_2$ and $x_3$ will decrease the residual variance without causing much collinearity (because $x_1$ is almost uncorrelated with $x_2$ and $x_3$), so we should see $\mathrm{se}(\hat{\beta}_1)$ smaller than $\mathrm{se}(\tilde{\beta}_1)$. The amount of correlation between $x_2$ and $x_3$ does not directly affect $\mathrm{se}(\hat{\beta}_1)$.
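Parts (iii) and (iv) both follow from the usual standard error expression $\mathrm{se}(\hat{\beta}_1) = \hat{\sigma} / \sqrt{SST_1 (1 - R_1^2)}$, where $R_1^2$ comes from regressing $x_1$ on the other explanatory variables. The Python sketch below plugs hypothetical values into that expression; none of the numbers come from the exercise, and `se_slope` is an illustrative helper name.

```python
import math

def se_slope(sigma_hat, sst1, r2_1):
    """Standard error of the coefficient on x1: sigma_hat / sqrt(SST_1 * (1 - R_1^2))."""
    return sigma_hat / math.sqrt(sst1 * (1.0 - r2_1))

sst1 = 100.0   # hypothetical total sample variation in x1

# Simple regression: no other regressors (R_1^2 = 0), larger residual spread.
se_simple = se_slope(sigma_hat=2.0, sst1=sst1, r2_1=0.0)

# Case (iii): x2 and x3 barely reduce sigma_hat but are highly correlated with x1.
se_case_iii = se_slope(sigma_hat=1.9, sst1=sst1, r2_1=0.90)

# Case (iv): x2 and x3 cut sigma_hat sharply and are almost uncorrelated with x1.
se_case_iv = se_slope(sigma_hat=1.0, sst1=sst1, r2_1=0.05)

print(se_simple, se_case_iii, se_case_iv)
```

Under these assumed values the standard error rises in case (iii) and falls in case (iv) relative to the simple regression. The correlation between $x_2$ and $x_3$ never enters the expression directly.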
3.11
(i) $\beta_1 < 0$ because more pollution can be expected to lower housing values; note that $\beta_1$ is the elasticity of price with respect to nox. $\beta_2$ is probably positive because rooms roughly measures the size of a house. (However, it does not allow us to distinguish homes where each room is large from homes where each room is small.)
(ii) If we assume that rooms increases with quality of the home, then log(nox) and rooms are negatively correlated when poorer neighborhoods have more pollution, something that is often true. We can use Table 3.2 to determine the direction of the bias. If $\beta_2 > 0$ and $\mathrm{Corr}(x_1, x_2) < 0$, the simple regression estimator $\tilde{\beta}_1$ has a downward bias. But because $\beta_1 < 0$, this means that the simple regression, on average, overstates the importance of pollution. [$E(\tilde{\beta}_1)$ is more negative than $\beta_1$.]
(iii) This is what we expect from the typical sample based on our analysis in part (ii). The simple regression estimate, $-1.043$, is more negative (larger in magnitude) than the multiple regression estimate, $-0.718$. As those estimates are only for one sample, we can never know which is closer to $\beta_1$. But if this is a “typical” sample, $\beta_1$ is closer to $-0.718$.
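The gap between the two estimates in part (iii) can be tied back to the omitted-variable algebra. With two regressors, the sample identity $\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \tilde{\delta}_1$ holds, where $\tilde{\delta}_1$ is the slope from regressing rooms on log(nox). The short Python sketch below uses only the two estimates quoted in the text. It does not assume values for $\hat{\beta}_2$ or $\tilde{\delta}_1$, which are not given here.

```python
beta1_tilde = -1.043   # simple regression estimate on log(nox), from the text
beta1_hat = -0.718     # multiple regression estimate on log(nox), from the text

# The gap between the two estimates equals beta2_hat * delta1_tilde.
implied_gap = beta1_tilde - beta1_hat

print(round(implied_gap, 3))
```

The implied product is about $-0.325$. A negative value is what a positive coefficient on rooms combined with a negative sample relationship between rooms and log(nox) would produce, in line with part (ii).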
6.4
(i) The answer is not entirely obvious, but one must properly interpret the coefficient on alcohol in either case. If we include attend, then we are measuring the effect of alcohol consumption on college GPA, holding attendance fixed. Because attendance is likely to be an important mechanism through which drinking affects performance, we probably do not want to hold it fixed in the analysis. If we do include attend, then we interpret the estimate of $\beta_{alcohol}$ as being those effects on colGPA that are not due to attending class. (For example, we could be measuring the effects that drinking alcohol has on study time.) To get a total effect of alcohol consumption, we would leave attend out.
(ii) We would want to include SAT and hsGPA as controls, as these measure student abilities and motivation. Drinking behavior in college could be correlated with one’s performance in high school and on standardized tests. Other factors, such as family background, would also be good controls.
6.6
The second equation is clearly preferred, as its adjusted R-squared is notably larger than that in the other two equations. The second equation contains the same number of estimated parameters as the first, and one fewer than the third. The second equation is also easier to interpret than the third.
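The role of the parameter count in this comparison can be seen from the adjusted R-squared formula, $\bar{R}^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)$. The Python sketch below uses hypothetical values of $R^2$, $n$, and $k$, since the figures for the three equations are not reproduced in this answer.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared with n observations and k slope parameters."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

n = 50   # hypothetical sample size

# Same raw R-squared, different numbers of slope parameters.
fewer_params = adjusted_r2(r2=0.30, n=n, k=2)
more_params = adjusted_r2(r2=0.30, n=n, k=3)

print(fewer_params, more_params)
```

For a given fit, the equation with fewer parameters has the higher adjusted R-squared, so an equation that is both more parsimonious and has a notably larger adjusted R-squared is preferred on both counts.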
7.3
(i) The t statistic on $hsize^2$ is over four in absolute value, so there is very strong evidence that it belongs in the equation. We obtain this by finding the turnaround point; this is the value of hsize that maximizes $\widehat{sat}$ (other things fixed): $19.3/(2 \cdot 2.19) \approx 4.41$. Because hsize is measured in hundreds, the optimal size of graduating class is about 441.
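The turnaround calculation in part (i) can be reproduced in a few lines of Python. The sketch assumes, as the arithmetic in the answer implies, that the coefficient on hsize is 19.3 and that the coefficient on $hsize^2$ is $-2.19$ (a maximum requires a negative quadratic term).

```python
b_hsize = 19.3       # coefficient on hsize, from the text
b_hsize_sq = 2.19    # magnitude of the (negative) coefficient on hsize^2

# A quadratic b1*h - b2*h^2 peaks at h = b1 / (2*b2).
turnaround = b_hsize / (2 * b_hsize_sq)    # in hundreds of students
class_size = round(turnaround * 100)       # in students

print(round(turnaround, 2), class_size)
```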
(ii) This is given by the coefficient on female (since black = 0): nonblack females have SAT scores about 45 points lower than nonblack males. The t statistic is about $-10.51$, so the difference is very statistically significant. (The very large sample size certainly contributes to the statistical significance.)
(iii) Because female = 0, the coefficient on black implies that a black male has an estimated SAT score almost 170 points less than a comparable nonblack male. The t statistic is over 13 in absolute value, so we easily reject the hypothesis that there is no ceteris paribus difference.
(iv) We plug in black = 1, female = 1 for black females and black = 0 and female = 1 for nonblack females. The difference is therefore $-169.81 + 62.31 = -107.50$. Because the estimate depends on two coefficients, we cannot construct a t statistic from the information given. The easiest approach is to define dummy variables for three of the four race/gender categories and choose nonblack females as the base group. We can then obtain the t statistic we want as the t statistic on the black female dummy variable.
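The difference in part (iv) can be sketched in Python as a comparison of two fitted values. The sketch assumes that 62.31 is the coefficient on the female-by-black interaction, as the arithmetic in the answer implies. The value used for the female coefficient is the rounded "about 45 points" from part (ii) and is only a placeholder, because it cancels in the comparison. `sat_shift` is an illustrative helper name.

```python
b_black = -169.81         # coefficient on black, from the text
b_female_black = 62.31    # assumed coefficient on the female*black interaction

def sat_shift(female, black, b_female=-45.0):
    """Part of the fitted SAT score that depends on the two dummies."""
    return b_female * female + b_black * black + b_female_black * female * black

# Black females versus nonblack females: female = 1 in both groups.
diff = sat_shift(female=1, black=1) - sat_shift(female=1, black=0)

print(round(diff, 2))
```

Because the difference is the sum of two estimated coefficients, its standard error also depends on their covariance, which is why the answer recommends re-estimating with nonblack females as the base group.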
7.4
(i) The approximate difference is just the coefficient on utility times 100, or $-28.3\%$. The t statistic is $-0.283/.099 \approx -2.86$, which is very statistically significant.
(ii) $100 \cdot [\exp(-.283) - 1] \approx -24.7\%$, and so the estimate is somewhat smaller in magnitude.
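The calculations in parts (i) and (ii) can be reproduced with a short Python sketch that uses only the coefficient and standard error quoted in the text. With the rounded coefficient, the exact percentage evaluates to roughly $-24.6$ to $-24.7$, which matches the figure above up to rounding.

```python
import math

b_utility = -0.283    # coefficient on utility, from the text
se_utility = 0.099    # its standard error, from the text

approx_pct = 100 * b_utility                     # log-approximation, in percent
exact_pct = 100 * (math.exp(b_utility) - 1)      # exact percentage difference
t_stat = b_utility / se_utility

print(round(approx_pct, 1), round(exact_pct, 2), round(t_stat, 2))
```

The log approximation overstates the magnitude of a negative effect of this size, which is why the exact figure is smaller in absolute value.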
(iii) The proportionate difference is $0.181 - 0.158 = 0.023$, or about 2.3%. One equation that can be estimated to obtain the standard error of this difference is
$$\log(\mathit{salary}) = \beta_0 + \beta_1 \log(\mathit{sales}) + \beta_2\, \mathit{roe} + \delta_1\, \mathit{consprod} + \delta_2\, \mathit{utility} + \delta_3\, \mathit{trans} + u,$$
where trans is a dummy variable for the transportation industry. Now, the base group is finance, and so the coefficient $\delta_1$ directly measures the difference between the consumer products and finance industries, and we