Using Arellano–Bond Dynamic Panel GMM Estimators in Stata
Tutorial with Examples using Stata 9.0 (xtabond and xtabond2)
Elitza Mileva, Economics Department, Fordham University
July 9, 2007

1. The model

The following model examines the impact of capital flows on investment in a panel dataset of 22 countries for 10 years (1995–2004):

$$I_{it} = \beta_1 I_{i,t-1} + \beta_2 K_{it} + \beta_3 X_{it} + u_{it}. \tag{1}$$

In equation (1) above, $I_{it}$ is gross fixed capital formation as a percentage of GDP and $I_{i,t-1}$ is its lagged value. $K_{it}$ is a matrix of the components of foreign resource flows – FDI, loans and portfolio (equity and bonds) – as percentage shares of GDP. $X_{it}$ is a matrix of the following control variables: lagged real GDP growth to account for the accelerator effect; the absolute value of one-step-ahead growth forecast errors as a measure of uncertainty; the change in the log terms of trade to gauge the price of imported capital goods; and, finally, the deviation of M2 from its three-year trend as a proxy for the liquidity available to finance investment.

2. Why the Arellano–Bond GMM estimator?

Several econometric problems may arise from estimating equation (1):

1. The capital flows variables in $K_{it}$ are assumed to be endogenous. Because causality may run in both directions – from capital inflows to investment and vice versa – these regressors may be correlated with the error term.
2. Time-invariant country characteristics (fixed effects), such as geography and demographics, may be correlated with the explanatory variables. The fixed effects are contained in the error term in equation (1), which consists of the unobserved country-specific effects, $v_i$, and the observation-specific errors, $e_{it}$:

$$u_{it} = v_i + e_{it}. \tag{2}$$

3. The presence of the lagged dependent variable $I_{i,t-1}$ gives rise to autocorrelation.
4. The panel dataset has a short time dimension ($T = 10$) and a larger country dimension ($N = 22$).

To solve problem 1 (and problem 2) one would usually use fixed-effects instrumental variables estimation (two-stage least squares or 2SLS), which is what I tried first. The exogenous instruments I used were the following: the aggregate long-term capital inflows to the countries in our sample as a group, as a percentage of the sum of their cumulative GDP (I labelled these ‘regional flows’), an index of financial openness and the EBRD transition index. However, the first-stage statistics of the 2SLS regressions showed that my instruments were weak. With weak instruments the fixed-effects IV estimators are likely to be biased in the direction of the OLS estimators. Therefore, I decided to use the Arellano–Bond (1991) difference GMM estimator first proposed by Holtz-Eakin, Newey and Rosen (1988).
Instead of using only the exogenous instruments listed above, lagged levels of the endogenous regressors in $K_{it}$ (FDI, loans and portfolio) are also added. This makes the endogenous variables pre-determined and, therefore, not correlated with the error term in equation (1).

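The fixed-effects 2SLS attempt mentioned above is not reproduced in the text. A minimal sketch of how such a regression might be run in Stata is given here. It assumes the data have already been declared as a panel and uses the variable names from the dataset introduced in section 3; the regressor list is an assumption, since the specification actually estimated is not documented.

```stata
* Sketch only: fixed-effects 2SLS with the three external instruments.
* Assumes the panel has been declared (tsset ctry_dum year).
* The regressor list is an assumption, not the documented specification.
* The /// line continuation works in a do-file, not in the command window.
xtivreg inv l.growth uncert tot dev_m2 ///
    (fdi loans portfolio = flows_eeca fin_integr trans_index), fe first
```

The `first` option asks xtivreg to report the first-stage regressions, which is where weak instruments would show up.
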
To cope with problem 2 (fixed effects) the difference GMM uses first-differences to transform equation (1) into

$$\Delta I_{it} = \beta_1 \Delta I_{i,t-1} + \beta_2 \Delta K_{it} + \beta_3 \Delta X_{it} + \Delta u_{it}. \tag{3}$$

(In general form the transformation is given by: $\Delta y_{it} = \alpha \Delta y_{i,t-1} + \Delta x'_{it} \beta + \Delta u_{it}$.)

By transforming the regressors by first differencing, the fixed country-specific effect is removed, because it does not vary with time. From equation (2) we get

$$\Delta u_{it} = \Delta v_i + \Delta e_{it}$$

or

$$u_{it} - u_{i,t-1} = (v_i - v_i) + (e_{it} - e_{i,t-1}) = e_{it} - e_{i,t-1}.$$

The first-differenced lagged dependent variable (problem 3) is also instrumented with its past levels.

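To make the role of past levels concrete, the following sketch uses the symbols of equations (2) and (3). It is not taken from the source and rests on the assumption that the observation-specific errors $e_{it}$ are serially uncorrelated.

```latex
\begin{align*}
% The differenced lag and the differenced error share the shock e_{i,t-1}:
\Delta I_{i,t-1} &= I_{i,t-1} - I_{i,t-2}, &
\Delta e_{it} &= e_{it} - e_{i,t-1}, \\
% I_{i,t-1} depends on e_{i,t-1}, so the differenced lag is endogenous:
\operatorname{Cov}\left(\Delta I_{i,t-1}, \Delta e_{it}\right) &\neq 0. \\
% Assumed moment conditions (require no serial correlation in e_{it}):
E\left[ I_{i,t-s} \, \Delta e_{it} \right] &= 0 \quad \text{for } s \geq 2.
\end{align*}
```

Under this assumption, levels dated $t-2$ and earlier are uncorrelated with $\Delta e_{it}$ and can serve as instruments for $\Delta I_{i,t-1}$.
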
Finally, the Arellano–Bond estimator was designed for small-T large-N panels (problem 4). In large-T panels a shock to the country’s fixed effect, which shows in the error term, will decline with time. Similarly, the correlation of the lagged dependent variable with the error term will be insignificant (see Roodman, 2006). In these cases, one does not necessarily have to use the Arellano–Bond estimator.

3. Using the Arellano–Bond difference GMM estimator in Stata

3.1 Import data into Stata

The easiest way to get panel data into Stata is to organize your Excel spreadsheet in the following way:

ctry  ctry_dum  year     inv   growth  uncert  tot  dev_m2  fin_integr  trans_index    fdi   loans  portfolio  flows_eeca
ALB          1  1995  18.000    8.900       …    …       …       3.000        2.333  0.861  -0.005      0.000       1.121
ALB          1  1996  21.044    9.100       …    …       …       3.000        2.519  0.994   0.050      0.000       1.198
ALB          1  1997  16.829  -10.200       …    …       …       3.000        2.519  0.580  -0.013      0.000       1.783
ALB          1  1998  16.296   12.700       …    …       …       3.024        2.519  0.480  -0.019      0.000       2.365
ALB          1  1999  20.005   10.100       …    …       …       3.024        2.557  0.389  -0.035      0.000       1.826
ALB          1  2000  24.736    7.300       …    …       …       3.024        2.778  1.245  -0.009      0.000       1.488
ALB          1  2001  29.215    7.200       …    …       …       3.024        2.814  1.648  -0.031      0.000       1.263
ALB          1  2002  26.156    3.400       …    …       …       3.024        2.814  1.002   0.005      0.000       1.718
ALB          1  2003  25.013    6.002       …    …       …       3.024        2.814  1.225  -0.019      0.000       1.894
ALB          1  2004  23.686    5.900       …    …       …       3.024        2.889  2.701   0.188      0.000       3.288
ARM          2  1995  16.154    6.900       …    …       …       2.000        2.112  0.394   0.000      0.000       1.121
ARM          2  1996  17.885    5.865       …    …       …       3.000        2.444  0.309   0.000      0.033       1.198
…

Note that all observations (i.e. country 1 period 1; country 1 period 2; etc.) are stacked vertically and the variables are listed horizontally.

Save the Excel worksheet as a text file (.txt, .csv, etc.). Open Stata and import the data by choosing File, Import, ASCII data created by spreadsheet, and click on the Browse button. Alternatively, you can type the following command in the command window, if your text file is located on the C drive:

insheet using
(14 vars, 220 obs)

(Note that from now on text in blue will show Stata commands or their components.)

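As an illustration of the import step, a minimal sketch follows. The file name is hypothetical and stands in for wherever your own text file is saved; it is not the path used in the original example.

```stata
* Hypothetical file name: replace with the location of your own text file.
insheet using "C:\mydata.csv", clear
* List variable names and storage types to confirm the import worked.
describe
```

In more recent Stata releases, insheet has been superseded by import delimited, so the command to use may depend on your version.
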
3.2 Set the dataset as a panel

Next, declare your dataset as a panel by selecting Statistics, Longitudinal / Panel data, Setup & Utilities, Declare dataset to be cross-sectional time series. Choose a variable that identifies the time dimension (year, in this example) and a variable that identifies the panel ID (ctry_dum, in this example). Stata needs a numerical variable for the panel ID, so the variable ctry, which is a string variable, won’t work. Alternatively, you can type the following command:

tsset ctry_dum year
       panel variable:  ctry_dum (strongly balanced)
        time variable:  year, 1995 to 2004

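If a dataset contains only the string identifier (ctry) and no numeric counterpart such as ctry_dum, one possible way to create a numeric panel ID is sketched below. The variable name ctry_id is hypothetical and chosen only for this illustration.

```stata
* ctry_id is a hypothetical name for the new numeric panel identifier.
* encode maps each distinct string in ctry to an integer with a value label.
encode ctry, generate(ctry_id)
* Declare the panel using the numeric identifier and the time variable.
tsset ctry_id year
```
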
3.3 Stata command: xtabond

Two Arellano–Bond estimators are available for Stata 9.0 – one incorporated into Stata 9 (called xtabond) and one user-written program by Roodman (2006) (called xtabond2). The former is discussed first. (Stata 10.0 will have two AB estimators built in, including its own version of the system estimator.)

Click on Statistics, Longitudinal / Panel data, Dynamic panel data, Arellano–Bond regression (RE). Stata displays a window in which you can easily select the dependent variable, the endogenous and exogenous independent variables as well as the lags of the instruments.

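The menu choices can also be typed as a command. The sketch below is an assumption about how equation (1) might be passed to xtabond. The option names follow the current Stata documentation and may differ or be unavailable in Stata 9, so check help xtabond for your version.

```stata
* Sketch only: option names may differ in Stata 9 (see help xtabond).
* lags(1) adds the first lag of inv as a regressor.
* endogenous() lists the capital-flow variables treated as endogenous.
* inst() supplies the additional external instruments.
* The /// line continuation works in a do-file, not in the command window.
xtabond inv l.growth uncert tot dev_m2, lags(1) ///
    endogenous(fdi loans portfolio) inst(fin_integr trans_index flows_eeca)
```

By default xtabond includes a constant and uses all available lags as instruments, so this sketch should not be expected to reproduce results obtained with a restricted lag range.
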
3.4 Stata command: xtabond2

Although the above-mentioned Stata menu option is easier to use, I have found Roodman’s user-written program (xtabond2) better – it is more flexible and has a better help file and a “how to do xtabond2” paper (see the references). xtabond2 can do everything that xtabond does and has many additional features. See the Stata help file or the paper for a description of the improvements offered by Roodman’s program. The disadvantage of xtabond2 is that you actually have to type the program code – there is no menu for it.

Since xtabond2 is not an official command of Stata 9, it has to be downloaded from the Internet /c/boc/bocode/ or by typing the following command:

ssc install xtabond2

If you have to download all xtabond2-related files from the RePEc website, make sure you save each file in the appropriate ado folder in your Stata folder, that is, in the folder named after the first letter of the file name as it is listed on the website. (xtabond2 may be directly available with Stata 10, or Stata 10 may include a different system routine.)

The following command shows you the help file:

help xtabond2

Below is the command I used to estimate equation (1) followed by the Stata output:

xtabond2 inv l.inv fdi loans portfolio l.growth uncert tot dev_m2, gmm(inv fdi loans portfolio, lag(2 2)) iv(fin_integr trans_index flows_eeca l.growth uncert tot dev_m2) nolevel small

Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
Warning: Number of instruments may be large relative to number of observations.
Suggested rule of thumb: keep number of instruments <= number of groups.

Arellano-Bond dynamic panel-data estimation, one-step difference GMM results
------------------------------------------------------------------------------
Group variable: ctry_dum                        Number of obs      =       165
Time variable : year                            Number of groups   =        22
Number of instruments = 39                      Obs per group: min =         3
F(8, 157)     =      6.88                                      avg =      7.50
Prob > F      =     0.000                                      max =         8
------------------------------------------------------------------------------
             |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         inv |
         L1. |   .2922856    .111738     2.62   0.010     .0715819    .5129893
         fdi |   .5202847   .2094545     2.48   0.014     .1065725     .933997
       loans |   .2789421   .1638248     1.70   0.091     -.044643    .6025271
   portfolio |  -.0086876   .3376843    -0.03   0.980    -.6756779    .6583028
      growth |
         L1. |   .1167961   .0555715     2.10   0.037     .0070319    .2265604
      uncert |   .0397982   .0673439     0.59   0.555    -.0932187     .172815
         tot |   .9193659   1.916147     0.48   0.632    -2.865388    4.704119
      dev_m2 |   .0443079   .0760188     0.58   0.561    -.1058435    .1944594
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(31) =  36.42    Prob > chi2 =  0.231

Arellano-Bond test for AR(1) in first differences: z =  -0.01  Pr > z =  0.992
Arellano-Bond test for AR(2) in first differences: z =  -0.48  Pr > z =  0.628

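The counts reported in this output can be cross-checked against each other. The arithmetic below uses only numbers printed in the output and assumes the usual conventions: the Sargan degrees of freedom equal the number of instruments minus the number of estimated coefficients, and with the small option the F denominator equals observations minus coefficients.

```latex
\begin{align*}
% Sargan test: instruments minus estimated coefficients
39 - 8 &= 31 \quad\Rightarrow\quad \chi^2(31), \\
% F statistic: observations minus estimated coefficients
165 - 8 &= 157 \quad\Rightarrow\quad F(8,\,157), \\
% Average number of observations per group
165 / 22 &= 7.5 .
\end{align*}
```

The 39 instruments also exceed the 22 groups, which is what triggers the rule-of-thumb warning at the top of the output.
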
As you can see, the command xtabond2 is followed by the dependent variable (inv) and the list of all right-hand-side variables:

xtabond2 inv l.inv fdi loans portfolio l.growth uncert tot dev_m2

The lag operator is given by l. as in l.inv, or l2.inv for 2 lags of inv.