Practical Stata Study Notes

Posted February 8, 2021 (author: jpf)

University of Science and Technology Beijing

Stata Applications: Study Excerpts

Chapter 1: Basic Stata Operations

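1. Setting the memory size

    set mem 500m, perm

This is needed only in Stata 11 and earlier; later versions manage memory automatically.

2. Displaying typed content: display

    display 1
    display "clive"

3. Displaying the dataset structure: describe (abbreviated d)

    describe
    d

4. Editing data: edit

    edit

5. Renaming a variable: rename

    rename var1 var2

6. Displaying the dataset contents: list / browse

    list in 1
    list in 2/10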

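7. Importing data

(1) For text data files (.csv), use insheet:

    insheet using "C:\Documents and Settings\Administrator\桌面", clear

(2) A dataset can be imported only when memory is empty; otherwise Stata reports "you must start with an empty dataset". There are two remedies:

(a) Clear all variables from memory first:

    drop _all

(b) Add the clear option to the import command.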

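8. Saving a file: save

(1) Saving a new file:

    save "C:\Documents and Settings\Administrator\桌面"

(2) Overwriting an existing file:

    save "C:\Documents and Settings\Administrator\桌面", replace

9. Opening and closing a saved file: use

(1) Open a file by giving its path and name after use:

    use "file path and file name", clear

(2) Clear the data or leave Stata:

    drop _all
    exit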

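10. Recording commands and output: log

(1) Start a log file: log using

(2) Suspend logging: log off

(3) Resume logging: log on

(4) Close the log file: log close

11. Creating and saving program files: doedit, do

(1) Open the do-file editor: doedit

(2) Type the commands.

(3) Save the file with the extension .do.

(4) Run the commands: do followed by the path and name of the program file.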

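12. Combining several datasets with the same variables and structure into one (vertical combination): append

Each using below must be followed by a file name.

    insheet using
    save
    insheet using
    append using
    save

13. Horizontal combination, adding further variables to the original dataset: merge

(1) Sort both datasets on the key variables, then merge:

    insheet using
    sort companyid yearend
    save
    describe
    insheet using
    sort companyid yearend
    merge companyid yearend using
    save
    describe

(2) Meaning of the _merge variable that merge creates:

    _merge==1    observation from the master data only
    _merge==2    observation from the using data only
    _merge==3    observation from both master and using data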
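The sort and merge sequence above uses the older merge syntax. The following is a minimal sketch of the same idea with the `merge 1:1` syntax of Stata 11 and later. The two tiny datasets, their values, the key variable `year` and the temporary file name `fees` are all made up for illustration.

```stata
* Toy "using" dataset: audit fees by company and year (made-up values)
clear
input companyid year auditfees
1 2000 10
1 2001 12
2 2000 20
end
tempfile fees
save `fees'

* Toy "master" dataset: total assets by company and year (made-up values)
clear
input companyid year totalassets
1 2000 100
1 2001 110
3 2000 300
end

* One-to-one match on the key variables; no prior sort is required
merge 1:1 companyid year using `fees'

* _merge: 1 = master only, 2 = using only, 3 = matched in both
tabulate _merge
list, sepby(companyid)
```

Run the block as a whole from a do-file, because the temporary file defined by `tempfile` exists only while that do-file is running.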

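14. Help files: help

    help describe

15. Descriptive statistics

(1) summarize

A single variable:

    summarize incorporationyear

A range of consecutive variables:

    summarize incorporationyear-big6

All variables (summarize _all, or simply summarize):

    summarize _all
    summarize

(2) More detailed statistics

    summarize incorporationyear, detail

(3) Percentiles: centile

    centile auditfees, centile(0(10)100)
    centile auditfees, centile(0(5)100)

(4) tabulate: frequencies and proportions for categorical variables

    tabulate companytype

Column percentages:

    tabulate companytype big6, column

Row percentages:

    tabulate companytype big6, row

Row and column percentages together, restricted by a condition:

    tab companytype big6 if companytype<=3, row col

(5) Counting the observations that satisfy a condition

    count if big6==1
    count if big6==0 | big6==1

(6) Sorting by a discrete variable and computing descriptive statistics of a continuous variable within each group

(a) In one step:

    by companytype, sort: summarize auditfees, detail

(b) In two steps:

    sort companytype
    by companytype: summarize auditfees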

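16. Transforming variables

(1) Using company type, set listed to 1 for publicly listed companies and to 0 for all others:

    gen listed=0
    replace listed=1 if companytype==2
    replace listed=1 if companytype==3
    replace listed=1 if companytype==5
    replace listed=. if companytype==.

17. Generating new variables: gen

    generate newvar = expression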

18. Data types

(1) Numeric types

    Storage type   Bytes   Min                      Max
    byte           1       -127                     +100
    int            2       -32,767                  +32,740
    long           4       -2,147,483,647           +2,147,483,620
    float          4       -1.7*10^38               +1.7*10^36
    double         8       -8.9884656743*10^307     +8.9884656743*10^308

(2) String types

    Storage type   Bytes   Max length (characters)
    str1           1       1
    str2           2       2
    ...
    str80          80      80

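(3) Declaring the data type when creating a variable

    gen str3 gender=
    list gender in 1/10

(4) When a variable is stored with more bytes than it needs, compress reduces it to the smallest type that still holds the data:

    drop gender
    gen str30 gender=
    browse
    describe gender
    compress gender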

(5) Date data type: %d dates, which are a count of the number of days elapsed since January 1, 1960.

(a) Converting a string date: date(date variable, order)

    gen fye=date(yearend,

The second argument depends on the order in which day, month and year appear in the date string. The result is the number of days since January 1, 1960.

    list yearend fye in 1/10

(b) Date formatting (fye is displayed as a date, but the underlying value does not change):

    format fye %d
    list yearend fye in 1/10
    sum fye

(c) Recovering the year, month and day from the day count:

    gen year=year(fye)
    gen month=month(fye)
    gen day=day(fye)
    list yearend fye year month day in 1/10

(d) Combining three variables holding the year, month and day into one date variable:

    drop fye
    gen fye=mdy(month, day, year)
    format fye %d
    list yearend fye in 1/10

(e) Converting a numeric date (such as 20080131) into a date that Stata recognizes:

    gen year=int(date/10000)
    gen month=int((date-year*10000)/100)
    gen day=date-year*10000-month*100
    list date year month day in 1/10
    gen edate=mdy(month, day, year)
    format edate %d
    list edate date in 1/10
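The numeric-date conversion can be tried on a single made-up observation. This is a sketch only: the dataset is invented for illustration, and it uses `%td`, the current spelling of the `%d` display format.

```stata
* One made-up observation holding a date as the number 20080131
clear
input long date
20080131
end

* Split the number into year, month and day
gen year  = int(date/10000)
gen month = int((date - year*10000)/100)
gen day   = date - year*10000 - month*100

* Rebuild a Stata date (days since 1 January 1960) and display it as a date
gen edate = mdy(month, day, year)
format edate %td
list date year month day edate

* The date functions recover the same three components from the day count
assert year(edate) == 2008 & month(edate) == 1 & day(edate) == 31
```

Storing `date` as a long matters here: an eight-digit number cannot be held exactly in the default float type.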

19. Stored results: r()

    sum auditfees
    gen meanadjaf=auditfees-r(mean)
    list meanadjaf in 1/10

Common r() values after summarize:

    r(N)        Number of cases
    r(sum_w)    Sum of weights
    r(mean)     Arithmetic mean
    r(var)      Variance
    r(sd)       Standard deviation
    r(min)      Minimum
    r(max)      Maximum
    r(sum)      Sum of variable

Commands that display these values:

    sum auditfees, detail
    return list
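A short sketch of how r() results are typically used, run on the auto dataset that ships with Stata (it stands in for the audit-fee data, which is not available here). The names `mprice` and `price_dm` are illustrative.

```stata
* Built-in example data shipped with Stata
sysuse auto, clear

* summarize leaves its results in r()
summarize price
return list

* Copy r(mean) into a scalar before another command overwrites r()
scalar mprice = r(mean)
gen price_dm = price - mprice

* A mean-adjusted variable should have a mean very close to zero
summarize price_dm
```

Saving `r(mean)` in a scalar is a defensive habit: every new r-class command replaces the previous r() values.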

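20. The recode command (PPT slide 61)

(1) Creating a dummy variable from a variable with many values: recode

    recode year (min/1999 = 0) (2000/max = 1), gen(yeardum)

min/1999 assigns 0 to every value less than or equal to 1999; 2000/max assigns 1 to every value greater than or equal to 2000.

(2) Grouping a continuous variable into intervals with chosen cut points: the recode() function

    gen assets_categ=recode(totalassets, 100, 500, 1000, 5000, 20000, 100000, 1000000)

The value assigned to each group is the group's upper bound, and the bound itself belongs to the group.

    sort assets_categ
    by assets_categ: sum totalassets assets_categ

(3) Grouping a continuous variable into intervals of equal width: autocode()

    autocode(variable name, # of intervals, min value, max value)

For example:

    gen assets_categ=autocode(totalassets, 10, 0, 10000)

(4) Grouping a continuous variable so that each group has the same number of observations: xtile

    xtile assets_categ=totalassets, nquantiles(10)

The groups will not necessarily contain exactly the same number of observations.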
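The upper-bound rule of recode() and the grouping done by xtile can be checked on the auto dataset that ships with Stata. The cut points and the variable names `price_categ` and `price_q` below are arbitrary choices for illustration.

```stata
sysuse auto, clear

* recode(): each car gets the smallest listed cut point that is >= its price
gen price_categ = recode(price, 4000, 6000, 10000, 16000)
tabulate price_categ

* Check the upper-bound rule for the first two groups
assert price <= 4000 if price_categ == 4000
assert price > 4000 & price <= 6000 if price_categ == 6000

* xtile: four groups of roughly equal size (quartiles of price)
xtile price_q = price, nquantiles(4)
tabulate price_q
```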

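21. Computing the mean of a variable for every group in one step: egen

Sort by company type, then compute the mean audit fee for each type of company and store it in a new variable:

    by companytype, sort: egen meanaf2=mean(auditfees)

Other egen functions include:

    count()
    mean()
    median()
    sum()

In newer versions of Stata the egen function sum() is called total().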

22. _n and _N

(1) Showing the sequence number of each observation and the total number of observations

    sort companyid fye
    capture drop x
    gen x=_n
    capture drop y
    gen y=_N
    list companyid fye x y in 1/30

(2) Showing, group by group, the sequence number within each group and the number of observations in each group

    capture drop x y
    sort companyid fye
    by companyid: gen x=_n
    by companyid: gen y=_N
    list companyid fye x y in 1/30

(3) Creating a new variable equal to the first or the last value within each group

    sort companyid fye
    by companyid: gen auditfees_first=auditfees[1]
    by companyid: gen auditfees_last=auditfees[_N]
    list companyid fye auditfees auditfees_first auditfees_last in 1/30

(4) Creating a new variable equal to the value lagged one or two periods

    sort companyid fye
    by companyid: gen auditfees_lag1=auditfees[_n-1]
    by companyid: gen auditfees_lag2=auditfees[_n-2]
    list companyid fye auditfees auditfees_lag1 auditfees_lag2 in 1/30
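The behaviour of _n and _N under the by prefix is easiest to see on a tiny panel. The data below are made up, and the variable names `obs_no`, `n_obs`, `fee_lag1` and `fee_last` are illustrative.

```stata
* Made-up panel: two companies, three years each
clear
input companyid year auditfees
1 2000 10
1 2001 12
1 2002 15
2 2000 20
2 2001 21
2 2002 25
end

sort companyid year

* Position within the company (1, 2, 3) and number of rows per company (3)
by companyid: gen obs_no = _n
by companyid: gen n_obs  = _N

* One-year lag: missing in the first year of each company
by companyid: gen fee_lag1 = auditfees[_n-1]

* Last value observed for the company
by companyid: gen fee_last = auditfees[_N]

list, sepby(companyid)
```

Because the lag is taken inside `by companyid:`, the first row of company 2 does not pick up the last value of company 1.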

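23. Changing the structure of a dataset: reshape

Different databases deliver data in different layouts. In long form, the data for different years of the same company are in different rows. In wide form, the data for different years of the same company are in the same row. The reshape command converts between the two. Note that reshape places a requirement on the data: each company may have only one observation per year, otherwise the command fails.

(1) Long to wide:

    reshape wide yearend incorporationyear companytype sales auditfees nonauditfees currentassets currentliabilities totalassets big6 fye, i(companyid) j(year)

(2) Wide to long:

    reshape long yearend incorporationyear companytype sales auditfees nonauditfees currentassets currentliabilities totalassets big6 fye, i(companyid) j(year)

(3) After the first conversion the command can be shortened:

    reshape wide
    reshape long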
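A minimal round trip on made-up data shows what the two layouts look like. The dataset is invented for illustration and has exactly one row per company and year, as reshape requires.

```stata
* Made-up long-form panel: one row per company-year
clear
input companyid year auditfees
1 2000 10
1 2001 12
2 2000 20
2 2001 21
end

* Long to wide: one row per company, columns auditfees2000 and auditfees2001
reshape wide auditfees, i(companyid) j(year)
list

* Wide back to long; after the first reshape the short form is enough
reshape long
list
```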

24. Example: computing CAR (cumulative abnormal return)

Given daily stock returns, market returns and the event date, compute the CAR over a three-day window.

(1) Define the three-day window (the event day is 0):

    sort ticker edate
    gen window=0 if eventdate<.
    replace window=-1 if window[_n+1]==0 & ticker==ticker[_n+1]
    replace window=1 if window[_n-1]==0 & ticker==ticker[_n-1]

(2) Compute AR (the abnormal return, the stock return minus the market return) and CAR:

    gen ar=ret-vwretd
    gen car=ar+ar[_n-1]+ar[_n+1] if window==0 & ticker==ticker[_n+1] & ticker==ticker[_n-1]

(3) Check the result:

    list ticker edate ret vwretd ar car window if window<.

25. t tests of means

(1) Testing whether audit fees differ significantly between big6 and non-big6 auditors in the whole sample

    use
    gen lnaf=ln(auditfees)
    by big6, sort: sum lnaf
    ttest lnaf, by(big6)

(2) Comparing the audit fees of big6 and non-big6 auditors year by year: add the by year prefix.

    gen fye=date(yearend,
    format fye %d
    gen year=year(fye)
    sort year
    by year: ttest lnaf, by(big6)

(3) t test of whether the mean equals a specific value

    sum lnaf
    ttest lnaf==2.1

26. Significance tests for the median

(1) Commands that report the median:

    by big6, sort: sum lnaf, detail
    by big6, sort: centile lnaf

(2) Tests of medians:

    median lnaf, by(big6)
    ranksum lnaf, by(big6)

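27. Contingency table tests

(1) Creating a contingency table:

    tabulate companytype big6, row

The first variable supplies the entries of the leftmost column of the table; the second variable supplies the entries of the first row.

(2) Testing the association between two variables: chi2

    tabulate companytype big6, chi2 row

(3) Correlation matrix:

    pwcorr lnaf big6 year listed

(4) Correlation matrix with significance levels:

    pwcorr lnaf big6 year listed, sig

(5) Showing the number of observations in the matrix:

    pwcorr lnaf big6 listed if year==2000, sig obs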

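28. Creating a dataset without missing values

(1) Flagging complete observations. The command below sets samp to 1 when none of the variables is missing; observations with at least one missing value are left as missing (.), not 0:

    gen samp=1 if lnaf<. & big6<. & year<. & listed<.

(2) Counting the number of missing values in each row:

    egen miss=rmiss(lnaf big6 year listed)
    sum miss, detail

In newer versions of Stata the egen function rmiss() is called rowmiss().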

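29. Graphs

(1) Histograms

    histogram incorporationyear, width(1)
    histogram incorporationyear, bin(147)

width sets the width of each bin; bin sets the number of bins. Changing the width can make the graph look better.

Choosing the range and the bin width:

    hist lnaf if lnaf>=0 & lnaf<=5, width(0.25)

Choosing the units and labels on the axes:

    hist lnaf if lnaf>=0 & lnaf<=5, width(0.25) xlabel(0(0.5)5)

Comparing with the normal distribution:

    hist lnaf if lnaf>=0 & lnaf<=5, width(0.25) normal

(2) Scatter plots: scatter

    scatter lnaf lnta

The first variable is plotted on the vertical axis and the second on the horizontal axis.

    twoway (scatter lnaf lnta, msize(tiny)) (lfit lnaf lnta)

This adds the best-fitting straight line to the scatter plot.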

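30. Winsorizing: winsor

winsor is a user-written command available from SSC.

    winsor rev, gen(wrev) p(0.01)

Here 0.01 is the fraction of observations winsorized in each tail.

    winsor rev, gen(wrev) h(5)

Here 5 is the number of observations winsorized in each tail.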

Chapter 2: Linear Regression

Contents:

2.1 The basic idea underlying linear regression
2.2 Single variable OLS
2.3 Correctly interpreting the coefficients
2.4 Examining the residuals
2.5 Multiple regression
2.6 Heteroscedasticity
2.7 Correlated errors
2.8 Multicollinearity
2.9 Outlying observations
2.10 Median regression
2.11 "Looping"

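2.1 The basic idea underlying linear regression

(1) Residuals

Y is the actual value, Ŷ is the predicted value and ε = Y - Ŷ is the residual. OLS regression chooses the coefficients that make the sum of squared residuals as small as possible.

(2) Basic single-variable regression

    regress y x

(3) Where the regression results are stored

The estimated coefficient of a variable is stored in _b[varname], and the estimated constant is stored in _b[_cons].

(4) Predicted values and residuals

    predict yhat
    predict yres, resid

yres is the difference between the actual value and the predicted value.

(5) Scatter plot of the residuals against X

    twoway (scatter yres x) (lfit yres x)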

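(6) Measuring the precision of the estimated coefficients: standard errors

The t statistic is the coefficient divided by its standard error. The p-value is computed from the distribution of the t statistic: it is the probability of obtaining a t statistic at least this extreme if the true coefficient were zero.

This is valid only if the residuals satisfy the following assumptions:

Homoscedasticity (i.e., the residuals have a constant variance)

Non-correlation (i.e., the residuals are not correlated with each other)

Normality (i.e., the residuals are normally distributed)

(7) What the items in the regression output mean

Degrees of freedom of each sum of squares:

For the ESS, df = k - 1, where k = number of regression coefficients (df = 2 - 1)

For the RSS, df = n - k, where n = number of observations (df = 11 - 2)

For the TSS, df = n - 1 (df = 11 - 1)

MS (sum of squares divided by degrees of freedom): the last column (MS) reports the ESS, RSS and TSS divided by their respective degrees of freedom.

R-squared = ESS / TSS

Adj R-squared = 1 - (1 - R2)(n - 1)/(n - k). This corrects the weakness that R-squared rises whenever an explanatory variable is added, even one that is only weakly related.

Root MSE = square root of RSS/(n - k): the typical size of a residual.

F-statistic = (ESS/(k - 1))/(RSS/(n - k)): the overall explanatory power of the model.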
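The formulas above can be verified against the results that regress stores in e(). This is a sketch on the auto dataset that ships with Stata, with an arbitrary two-regressor model; the scalar names `ess`, `rss`, `tss`, `nobs` and `ncoef` are illustrative.

```stata
sysuse auto, clear
regress price weight length

* Pieces of the ANOVA table saved by regress
scalar ess   = e(mss)
scalar rss   = e(rss)
scalar tss   = ess + rss
scalar nobs  = e(N)
* k counts the slope coefficients plus the constant
scalar ncoef = e(df_m) + 1

* Rebuild each summary statistic from the formulas and compare with Stata's value
assert reldif(ess/tss, e(r2)) < 1e-8
assert reldif(1-(1-e(r2))*(nobs-1)/(nobs-ncoef), e(r2_a)) < 1e-8
assert reldif(sqrt(rss/(nobs-ncoef)), e(rmse)) < 1e-8
assert reldif((ess/(ncoef-1))/(rss/(nobs-ncoef)), e(F)) < 1e-8

display "R-squared = " ess/tss
display "Root MSE  = " sqrt(rss/(nobs-ncoef))
```

Each `assert` passes silently when the hand-computed value matches the stored one, so a run without error messages confirms the formulas.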

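2.3 Correctly interpreting the coefficients

(1) To test whether the audit fees of big6 auditors differ between listed and unlisted companies, use an interaction variable: big6*listed.

(2) Interpreting regression coefficients

(a) Continuous variables: the economic significance of an estimated coefficient is the effect of X on Y, and it can be gauged in different ways. One is the change in Y when X moves from its 25th percentile to its 75th percentile. Another is the change in Y when X changes by one standard deviation.

    reg auditfees totalassets
    sum totalassets if auditfees<., detail
    gen fees_low=_b[_cons]+_b[totalassets]*r(p25)
    gen fees_high=_b[_cons]+_b[totalassets]*r(p75)
    sum fees_low fees_high

(b) Non-continuous variables: compare the values 0 and 1 rather than percentiles.

    reg lnaf big6
    gen fees_nb6=exp(_b[_cons])
    gen fees_b6=exp(_b[_cons]+_b[big6])
    sum fees_nb6 fees_b6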

2.4 Examining the residuals

(1) When reporting results, do not rely on R-squared alone; other diagnostics should be reported as well:

is there significant heteroscedasticity?

is there any pattern to the residuals?

are there any problems of outliers?

(2) Using R2. Gu (2007) points out that:

econometricians consider R2 values to be relatively unimportant (accounting researchers put far too much emphasis on the magnitude of the R2)

regression R2s should not be compared across different samples

in contrast there is a large accounting literature that uses R2s to determine whether the value relevance of accounting information has changed over time.

The R2 tells us nothing about whether our hypothesis about the determinants of Y is correct.

(3) Make appropriate use of the residuals (resid) to judge the quality of the model.

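2.5 Multiple regression

(1) Judging whether relevant explanatory variables have been omitted from the model:

theory

prior empirical studies

(2) Checking whether the residuals are independent of the predicted values:

    gen listed=0
    replace listed=1 if companytype==2 | companytype==3 | companytype==5
    reg lnaf lnta big6 listed
    predict lnaf_hat
    predict lnaf_res, resid
    twoway (scatter lnaf_res lnaf_hat) (lfit lnaf_res lnaf_hat)

predict lnaf_hat stores the predicted values (the estimates of the dependent variable), predict lnaf_res, resid stores the residuals in lnaf_res, and the twoway graph shows whether the residuals and the predicted values are related.

(3) Another command that does the same job:

    reg lnaf lnta big6 listed
    rvfplot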

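2.6 Heteroscedasticity (hettest)

(1) Testing for constant variance: run hettest after the regression.

    reg auditfees nonauditfees totalassets big6 listed
    hettest

(2) Heteroscedasticity does not bias the coefficients, but it does bias their standard errors. It can arise because the data are bounded, which produces high skewness. Some heteroscedasticity can be removed by taking logarithms. When heteroscedasticity is found, adjust the standard errors with the Huber/White/sandwich estimator; in Stata this is done by adding robust to the regression.

    reg auditfees nonauditfees totalassets big6 listed, robust

With robust the coefficients are the same but the standard errors differ. In this example the t values fall, the p-values rise and the F value falls, while R2 is unchanged.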
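A small sketch of the comparison, run on the auto dataset that ships with Stata because the audit-fee data are not available here. The model is arbitrary and the scalar names `b_ols` and `se_ols` are illustrative. It uses `estat hettest`, the current name of `hettest`.

```stata
sysuse auto, clear

* Conventional OLS standard errors
regress price weight length
scalar b_ols  = _b[weight]
scalar se_ols = _se[weight]

* Breusch-Pagan / Cook-Weisberg test for heteroscedasticity
estat hettest

* Same model with Huber/White (sandwich) standard errors
regress price weight length, robust
display "coefficient: " b_ols " vs " _b[weight]
display "std. error : " se_ols " vs " _se[weight]

* The point estimate is unchanged; only the standard error is recomputed
assert reldif(b_ols, _b[weight]) < 1e-8
```

Whether the robust standard error comes out larger or smaller than the conventional one depends on the data, so the sketch checks only that the coefficient itself does not move.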

2.7 Correlated errors

(1) The residuals of a given firm are correlated across years ("time series dependence"). In panel data, unobservable characteristics of a company affect its observations in every year, so the observations are not independent: there are likely to be unobserved company-specific characteristics that are relatively constant over time.

(2) The standard errors are then biased downward. This problem can be avoided by adjusting the standard errors for the clustering of yearly observations across a given company.

(3) Correcting for correlated errors: add robust cluster() to the regression.

    reg lnaf lnta big6 listed, robust cluster(companyid)

(4) How to check whether the residuals of the same company are correlated across years:

    reg lnaf lnta
    predict res, resid
    keep companyid year res
    sort companyid year
    drop if companyid==companyid[_n-1] & year==year[_n-1]
    reshape wide res, i(companyid) j(year)
    browse
    pwcorr res1998-res2002

(5) Points to note when using panel data:

If robust is used to control for heteroscedasticity but cluster() is not used to control for time-series dependence, the t statistics are still biased upward.

If heteroscedasticity is not controlled for either, the upward bias in the t statistics is even more severe.

Therefore, when using panel data, add the robust cluster() option; otherwise your "significant" results from pooled regressions may be spurious.

2.8 Multicollinearity

(1) When does multicollinearity arise?

We have seen that when there is perfect collinearity between independent variables, Stata will have to exclude one of them. For example, year_1 + year_2 + year_3 + year_4 + year_5 = 1.

    reg lnaf year_1 year_2 year_3 year_4 year_5, nocons

When the model includes a constant, Stata automatically throws away one of the year dummies so that the model can be estimated; the nocons option above drops the constant instead, so all five dummies can be kept.

Even if the independent variables are not perfectly collinear, there can still be a problem if they are highly correlated.

(2) Consequences:

The standard errors of the coefficients are large (i.e., the coefficients are not estimated precisely).

The coefficient estimates can be highly unstable.

(3) How to measure it: variance-inflation factors (VIF) can be used to judge whether multicollinearity is present.

    reg lnaf lnta big6 lnta1
    vif
    reg lnaf lnta big6
    vif

(4) Severity of multicollinearity: a VIF of 10 can be judged high, and a VIF of 20 very high.

2.9 Outlying observations

(1) Measuring outliers: Cook's D

We can calculate the influence of each observation on the estimated coefficients using Cook's D.

Values of Cook's D that are higher than 4/N are considered large, where N is the number of observations used in the regression.

(2) Computing Cook's D and counting the outliers

    reg lnaf lnta big6
    predict cook, cooksd
    sum cook, detail
    gen max=4/e(N)
    count if cook>max & cook<.

predict cook, cooksd stores Cook's D in the variable cook. gen max=4/e(N) computes the cutoff; e(N) is the number of observations, a result saved by the regression.

(3) Re-estimating the regression without the outliers

    reg lnaf lnta big6 if cook<=max
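The same steps, sketched on the auto dataset that ships with Stata with an arbitrary model. Here the cutoff is kept in a scalar named `cutoff` (an illustrative name) rather than in a variable.

```stata
sysuse auto, clear
regress price weight length

* Cook's distance for every observation used in the regression
predict cook, cooksd

* Rule-of-thumb cutoff 4/N, with N taken from the stored result e(N)
scalar cutoff = 4/e(N)
count if cook > cutoff & cook < .

* Re-estimate without the flagged observations
regress price weight length if cook <= cutoff
```

The condition `cook < .` in the count matters because Stata treats missing values as larger than any number, so observations with a missing Cook's D would otherwise be counted as outliers.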

(4) Dealing with outliers by winsorizing

A disadvantage with "winsorizing" is that the researcher is assuming that outliers lie only at the extremes of the variable's distribution.

    winsor lnaf, gen(wlnaf) p(0.01)
    winsor lnta, gen(wlnta) p(0.01)
    sum lnaf wlnaf lnta wlnta, detail
    reg wlnaf wlnta big6

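2.10 Median regression

(1) Median regression is used when outliers are a problem.

(2) Principle: OLS chooses the coefficients that minimize the sum of squared residuals, whereas median regression minimizes the sum of the absolute residuals.

(3) Estimation: Stata treats median regression as a special case of quantile regression.

    qreg lnaf lnta big6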

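2.11 "Looping"

(1) When a set of commands is used many times, it can be stored as a program: begin with program, put the commands in the body (here a loop introduced by forvalues), and finish with end. Typing the program name, "ten" in the example, then runs the whole set of commands.

Example:

    program ten
    forvalues i = 1(1)10 {
    display `i'
    }
    end

(2) Changing a program: first drop the program from memory, then write it again.

    capture program drop ten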
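A slightly more general sketch of the same pattern: a program that takes the upper limit of the loop as an argument. The program name `countto` and its argument `n` are made up for illustration.

```stata
* Drop any earlier definition so the block can be re-run
capture program drop countto

program countto
    * The first word typed after the program name is stored in local macro n
    args n
    forvalues i = 1/`n' {
        display `i'
    }
end

* Displays the integers 1, 2 and 3, one per line
countto 3
```

Starting with `capture program drop` means the definition can be edited and re-run without the "already defined" error.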

(3) Example: estimating discretionary accruals with the Jones model

    use
    gen one_sic=int(sic/1000)
    gen ncca=current_assets-cash
    gen ndcl=current_liabilities-debt_in_current_liabilities
    sort cik year
    gen ch_ncca=ncca-ncca[_n-1] if cik==cik[_n-1]
    gen ch_ndcl=ndcl-ndcl[_n-1] if cik==cik[_n-1]
    gen accruals=(ch_ncca-ch_ndcl)/assets[_n-1] if cik==cik[_n-1]
    gen lag_assets=assets[_n-1] if cik==cik[_n-1]
    gen ppe_scaled=ppe/assets[_n-1] if cik==cik[_n-1]
    gen chsales_scaled=(sales-sales[_n-1])/assets[_n-1] if cik==cik[_n-1]
    gen ab_acc=.
    capture program drop ab_acc
    program ab_acc
    forvalues i = 0(1)9 {
    capture reg accruals lag_assets ppe_scaled chsales_scaled if one_sic==`i'
    capture predict ab_acc`i' if one_sic==`i', resid
    replace ab_acc=ab_acc`i' if one_sic==`i'
    capture drop ab_acc`i'
    }
    end
    ab_acc

Chapter 3: Regression Analysis When the Dependent Variable Is Not Continuous

Contents:

3.1 Why not OLS?
3.2 The basic idea underlying logit models
3.3 Estimating logit models
3.4 Multinomial models
3.5 Ordinal dependent variables
3.6 Count data models
3.7 Tobit models and interval regression
3.8 Duration models

3.1 Why not OLS?

(1) There are two statistical problems if we use OLS when the dependent variable is categorical:

The predicted values can be negative or greater than one.

The standard errors are biased because the residuals are heteroscedastic.

(2) Instead of OLS, we can use a logit model.

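3.2 The basic idea underlying logit models

(1) We need to create a variable that turns the discrete dependent variable into a form that can be modelled in the same way as in OLS. The variable must:

have an infinite range,

reflect the likelihood of choosing a big6 auditor versus a non-big6 auditor.

(2) The log of the odds ratio, log(odds ratio), meets both requirements.

(3) A worked example: in the example table, the first column is the probability of choosing a big6 auditor, the second and third columns are the odds ratio, and the fourth column is its natural logarithm.

(4) The conversion between L (the log of the odds ratio) and P (the probability).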
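Written out with the standard logit definitions (a reconstruction, chosen to agree with the command `exp(big6hat1)/(1+exp(big6hat1))` used in Section 3.3), the relation between L and P is:

```latex
L = \ln\!\left(\frac{P}{1-P}\right),
\qquad
P = \frac{e^{L}}{1+e^{L}}
```

For example, P = 0.5 gives odds of 1 and L = 0, while P = 0.8 gives odds of 4 and L = ln 4 ≈ 1.39. As P runs from 0 to 1, L runs over the whole real line, which is the infinite range required in item (1).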

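(5) Likelihood function: the model is estimated by maximum likelihood ("maximum likelihood" estimation).

(6) Estimation commands: logit and logistic

logit reports the values of the estimated coefficients.

logistic reports the odds ratios.

Coefficient estimates are what is usually reported, so logit is normally used.

(7) Measures of the model's explanatory power: pseudo-R2 and Chi2

pseudo-R2 = (ln(L0) - ln(LN)) / ln(L0) = (-175224 + 146215) / -175224 ≈ 0.166

ln(L0) is the log likelihood at the first iteration and ln(LN) is the log likelihood at the last iteration.

Chi2 = -2(ln(L0) - ln(LN)) = -2*(-175224 + 146215) = 58018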
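Both statistics can be rebuilt from the log likelihoods that logit stores. This sketch uses the auto dataset that ships with Stata and an arbitrary model, so the numbers differ from those quoted above; the scalar names `ll0` and `llN` are illustrative.

```stata
sysuse auto, clear
logit foreign weight mpg

* Log likelihood of the constant-only model and of the fitted model
scalar ll0 = e(ll_0)
scalar llN = e(ll)

* pseudo-R2 = (ln(L0) - ln(LN)) / ln(L0)
display "pseudo-R2 = " (ll0-llN)/ll0
assert reldif((ll0-llN)/ll0, e(r2_p)) < 1e-6

* Chi2 = -2(ln(L0) - ln(LN)), written here as 2(ln(LN) - ln(L0))
display "LR chi2   = " 2*(llN-ll0)
assert reldif(2*(llN-ll0), e(chi2)) < 1e-6
```

The check against `e(chi2)` applies to a plain logit as estimated here; with the robust or cluster() options Stata reports a Wald chi2 instead of the likelihood-ratio statistic.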

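3.3 Estimating logit models

(1) The regression model

    logit big6 lnta age, robust cluster(companyid)

The robust option corrects for heteroscedasticity, and cluster() corrects for correlated errors.

(2) Predicting the probability of the dependent variable

    logit big6 lnta age, robust cluster(companyid)
    capture drop big6hat
    predict big6hat
    sum big6hat, detail

The value produced by predict is the predicted probability that big6 equals 1. Another way to obtain it is to transform big6hat1, the predicted value generated in item (3):

    gen big6hat2=exp(big6hat1)/(1+exp(big6hat1))
    sum big6hat big6hat1 big6hat2

(3) Generating the predicted value of the dependent variable (the linear combination of the regressors)

    gen big6hat1 = _b[_cons]+_b[lnta]*lnta + _b[age]*age
    sum big6hat1, detail

Another way:

    predict big6hat1, xb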
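The link between the two kinds of prediction can be checked directly. This sketch uses the auto dataset that ships with Stata and an arbitrary model; the variable names `phat`, `phat2`, `xbhat` and `xbhat2` are illustrative.

```stata
sysuse auto, clear
logit foreign weight mpg

* Predicted probability P(foreign = 1), the default for predict after logit
predict phat

* Linear index, obtained in two equivalent ways
predict xbhat, xb
gen xbhat2 = _b[_cons] + _b[weight]*weight + _b[mpg]*mpg

* Converting the index into a probability reproduces phat
gen phat2 = exp(xbhat)/(1 + exp(xbhat))

summarize phat phat2 xbhat xbhat2
assert reldif(phat, phat2) < 1e-5
assert reldif(xbhat, xbhat2) < 1e-5
```

The comparisons use a tolerance rather than exact equality because the new variables are stored as floats.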

(4) Computing the effect of a change in an independent variable on the predicted probability

    logit big6 lnta age, robust cluster(companyid)
    gen big10 = exp(_b[_cons]+_b[lnta]*lnta + _b[age]*10)/(1+(exp(_b[_cons]+_b[lnta]*lnta + _b[age]*10)))
    gen big20 = exp(_b[_cons]+_b[lnta]*lnta + _b[age]*20)/(1+(exp(_b[_cons]+_b[lnta]*lnta + _b[age]*20)))
    sum big10 big20

(5) Checking whether the relation between the dependent variable and an independent variable is monotonic

    xtile lnta_categ=lnta, nquantiles(10)
    tabulate lnta_categ, gen(lnta_)
    logit big6 lnta_2-lnta_10 age, robust cluster(companyid)

(6) Another estimator: probit

Logit maps P(Y=1) into the interval between 0 and 1 using the logistic distribution. Probit maps P(Y=1) into the interval between 0 and 1 using the normal distribution.

The coefficients tend to be larger in logit models, because the logistic distribution has the larger variance, but the levels of statistical significance are often similar.

Example:

    capture drop big6hat big6hat1
    logit big6 lnta age, robust cluster(companyid)
    predict big6hat
    probit big6 lnta age, robust cluster(companyid)
    predict big6hat1
    pwcorr big6hat big6hat1

3.4 Multinomial models

(1) When to use them: the dependent variable has three or more categories, the categories are unordered, and each category can be represented by a variable taking the values 1 and 0. If a separate logit model is estimated for each category, the predicted probabilities do not sum to 1.

Divide company type into three categories:

    gen cotype1=0 if companytype==1 | companytype==6
    replace cotype1=1 if companytype==4
    replace cotype1=2 if companytype==2 | companytype==3 | companytype==5

Create a 0/1 variable for each category:

    gen private=0
    replace private=1 if cotype1==0
    gen public_nontraded=0
    replace public_nontraded=1 if cotype1==1
    gen public_traded=0
    replace public_traded=1 if cotype1==2

Estimate a separate logit model for each variable:

    logit private lnta, robust cluster(companyid)
    predict private_hat
    logit public_nontraded lnta, robust cluster(companyid)
    predict public_nontraded_hat
    logit public_traded lnta, robust cluster(companyid)
    predict public_traded_hat

The probabilities do not sum to 1:

    gen sum_prob=private_hat+public_nontraded_hat+public_traded_hat
    sum sum_prob, detail

(2) Regression when the dependent variable has more than two categories: mprobit or mlogit


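This article was last updated 2021-02-08 16:10. It was provided by the author and does not represent the position of this website. When reposting, please credit the source: https://www.bjmy2z.cn/gaokao/615501.html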
