Downloading your data:
You can download your data from the Econ 508 webpage (here) and save the file in your preferred directory (I'll save mine as "C:phuzics01.txt"). Note that the variable names are slightly different from those in PS4. In the data set:
id: person identifier
yr: current year - 1900
phd: year of PhD - 1900
sex: gender (female==1)
rphd: rank of PhD
ru: dummy for research university (res. pos.==1)
y: page equivalent in current year
Y: discounted cumulative page equivalent
s: current annual salary
In R:
See Appendix A for a quick session in R.
In STATA:
First you need to expand the memory dedicated to STATA in your computer:
set memory 1g
Then you can infile the data by typing:
infile id yr phd sex rphd ru y Y s using "C:phuzics01.txt"
Note: Drop the first observation line, which contains missing values (it comes from the variable labels in the .txt file).
After that you declare your data set as panel, sort by id, and summarize the data:
iis id
tis yr
sort id
summarize
Finally, you can save it in the STATA format (I will save mine as "C:phuzics01.dta") and load it from the little STATA programs you are going to write with your panel functions.
PQ.do
The first step towards panel data estimation is to transform your data into group means and deviations from group means. There is a short STATA program for that, called PQ.do:
* A simple program for computing group means (P) and
* deviations from group means (Q).
capture program drop PQ
program define PQ
version 4.0
local options "Level(integer $S_level)"
local varlist "req ex"
parse "`*'"
parse "`varlist'",parse(" ")
sort id
quietly by id: gen P`1'=sum(`1')/sum(`1'~=.)
quietly by id: replace P`1'=P`1'[_N]
quietly gen Q`1'=`1'-P`1'
end
You can download the code at the Econ 508 webpage (Routines, PQ.do), and save it. In STATA, go to "File", "Do...", and select the PQ.do file you have saved. As you open the file in STATA, it automatically runs the code. After that you can use the function by typing "PQ variablename". For example, if you type PQ y, two transformations of y will be added to your list of variables:
Py for the group means of y (used by the between estimators), and
Qy for the deviations from the group means of y (used by the within estimators).
You should apply this function for all variables used in your estimations. For example, you will see that the PQ routine will be used inside the program ht.do, to run the Hausman-Taylor Instrumental Variables estimators.
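For intuition, the effect of PQ.do on a single variable can be sketched in Python (a toy illustration assuming numpy and made-up data; not the course code):

```python
import numpy as np

def PQ(x, id):
    """Group means (P) and deviations from group means (Q),
    mirroring what PQ.do computes for a single variable."""
    x, id = np.asarray(x, float), np.asarray(id)
    P = np.empty_like(x)
    for g in np.unique(id):
        mask = id == g
        P[mask] = x[mask].mean()   # group mean, repeated for each obs
    return P, x - P                # P feeds the between, Q the within estimator

# Toy panel: two persons observed over a few years
id = np.array([1, 1, 1, 2, 2])
y = np.array([2.0, 4.0, 6.0, 1.0, 3.0])
Py, Qy = PQ(y, id)
```

For a time-invariant variable such as sex, P returns the variable itself and Q is identically zero, which is why d60 and sex drop out of the within (fixed effects) regressions.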
ht.do
The second step is to write your own program to compute the HTIV estimators. The Econ 508 webpage (Routines) provides a base program for this, called ht.do. You can download the file in the same way you did above. Some details must be explained, though:
1) If you haven't run PQ.do yet, please do so. Otherwise the program ht.do will not work.
2) The program ht.do contains some features that should be adjusted according to the user, such as the path to access the data set, the directory where to create a log file, etc. So, don't forget to adjust the program to your machine.
3) The most important detail: the user should specify the model, create new variables, and decide which variables will be included in the regression and/or treated as instruments.
Thus, it is essential to read Koenker's Lecture 13 (2004) and Hausman-Taylor (1981), as well as PS4 and the auxiliary papers, in order to understand what the program is doing and how you need to adjust it.
To facilitate your job, I include below a sample of the ht.do program (with small adjustments) to compute the productivity and wage regressions:
The Productivity Equation
. do "C:produc01.do"
. use "C:phuzics01.dta", clear
. iis id
. tis yr
. sort id
. replace y=log(y)
(10623 real changes made)
. gen exp=yr-phd
. gen expsq=exp^2
. gen ier=1/(exp*rphd)
. gen d60=0
. replace d60=1 if phd>60
(8788 real changes made)
. quietly by id: gen y1=y[_n-1]
. quietly by id: gen y2=y[_n-2]
. drop if y2==.
(1000 observations deleted)
. * Why can't we omit the above command?
. * Stata can handle missing obs, but here we will
. * have problems if we let Stata do our work. Why?
. * Did you forget to run PQ.do before this program? If so, try again; otherwise, go ahead.
. PQ y
. PQ y1
. PQ y2
. PQ exp
. PQ expsq
. PQ rphd
. PQ ier
. PQ d60
. PQ sex
. PQ ru
. PQ Y
. PQ s
. * Note the effect of PQ on the time-invariant variables. E.g.: Pd60=d60, Qd60=0, Psex=sex, Qsex=0.
. * Nonetheless, we need Pd60 and Psex later. Can you see where and why?
. * POOLED OLS
. regress y y1 y2 exp expsq ier d60 sex
Source | SS df MS Number of obs = 9623
---------+------------------------------ F( 7, 9615) = 3485.58
Model | 2482.67217 7 354.667452 Prob > F = 0.0000
Residual | 978.354138 9615 .1017529 R-squared = 0.7173
---------+------------------------------ Adj R-squared = 0.7171
Total | 3461.02631 9622 .359699263 Root MSE = .31899
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
y1 | .7788283 .0099088 78.600 0.000 .759405 .7982516
y2 | -.2373115 .0095526 -24.843 0.000 -.2560365 -.2185865
exp | .0837589 .0021972 38.120 0.000 .0794519 .088066
expsq | -.0016344 .0000441 -37.102 0.000 -.0017207 -.001548
ier | 3.407355 .1403287 24.281 0.000 3.132281 3.682429
d60 | .0204149 .0098334 2.076 0.038 .0011395 .0396904
sex | -.0189182 .0093459 -2.024 0.043 -.0372382 -.0005982
_cons | .4432838 .0192268 23.056 0.000 .4055953 .4809723
------------------------------------------------------------------------------
. * WITHIN ESTIMATORS (FIXED EFFECTS)
. xtreg y y1 y2 exp expsq ier d60 sex, fe
Fixed-effects (within) regression Number of obs = 9623
Group variable (i) : id Number of groups = 500
R-sq: within = 0.6535 Obs per group: min = 5
between = 0.7696 avg = 19.2
overall = 0.6594 max = 46
F(5,9118) = 3438.90
corr(u_i, Xb) = 0.2409 Prob > F = 0.0000
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
y1 | .6573427 .0098663 66.625 0.000 .6380026 .6766828
y2 | -.3369351 .0094686 -35.585 0.000 -.3554957 -.3183745
exp | .101091 .0023692 42.669 0.000 .0964469 .1057351
expsq | -.0020004 .000046 -43.470 0.000 -.0020906 -.0019102
ier | .2556901 .2487721 1.028 0.304 -.231959 .7433392
d60 | (dropped)
sex | (dropped)
_cons | 1.037784 .0282838 36.692 0.000 .9823415 1.093227
------------------------------------------------------------------------------
sigma_u | .20105547
sigma_e | .3014717
rho | .30784989 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(499,9118) = 3.30 Prob > F = 0.0000
. hausman, save
. * BETWEEN ESTIMATORS
. xtreg y y1 y2 exp expsq ier d60 sex, be
Between regression (regression on group means) Number of obs = 9623
Group variable (i) : id Number of groups = 500
R-sq: within = 0.5435 Obs per group: min = 5
between = 0.9916 avg = 19.2
overall = 0.6503 max = 46
F(7,492) = 8303.54
sd(u_i + avg(e_i.))= .0323035 Prob > F = 0.0000
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
y1 | 1.49359 .0292156 51.123 0.000 1.436188 1.550993
y2 | -.519665 .0294238 -17.661 0.000 -.5774767 -.4618533
exp | .0045644 .0027111 1.684 0.093 -.0007622 .0098911
expsq | -.0001547 .0000683 -2.264 0.024 -.0002889 -.0000204
ier | .0081957 .0830353 0.099 0.921 -.1549517 .1713432
d60 | -.0079103 .0113073 -0.700 0.485 -.0301268 .0143062
sex | -.0039254 .0042214 -0.930 0.353 -.0122197 .0043688
_cons | .0810791 .0155373 5.218 0.000 .0505515 .1116067
------------------------------------------------------------------------------
. * GLS ESTIMATORS (RANDOM EFFECTS):
. xtreg y y1 y2 exp expsq ier d60 sex, re
Random-effects GLS regression Number of obs = 9623
Group variable (i) : id Number of groups = 500
R-sq: within = 0.6324 Obs per group: min = 5
between = 0.9390 avg = 19.2
overall = 0.7173 max = 46
Random effects u_i ~ Gaussian Wald chi2(7) = 24399.03
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
y1 | .7788283 .0099088 78.600 0.000 .7594074 .7982492
y2 | -.2373115 .0095526 -24.843 0.000 -.2560342 -.2185889
exp | .0837589 .0021972 38.120 0.000 .0794524 .0880654
expsq | -.0016344 .0000441 -37.102 0.000 -.0017207 -.001548
ier | 3.407355 .1403287 24.281 0.000 3.132316 3.682395
d60 | .0204149 .0098334 2.076 0.038 .0011419 .039688
sex | -.0189182 .0093459 -2.024 0.043 -.0372359 -.0006005
_cons | .4432838 .0192268 23.056 0.000 .4056001 .4809675
---------+--------------------------------------------------------------------
sigma_u | 0
sigma_e | .3014717
rho | 0 (fraction of variance due to u_i)
------------------------------------------------------------------------------
. * HAUSMAN TEST: FIXED VS. RANDOM EFFECTS
. hausman
---- Coefficients ----
| (b) (B) (b-B) sqrt(diag(V_b-V_B))
| Prior Current Difference S.E.
---------+-------------------------------------------------------------
y1 | .6573427 .7788283 -.1214856 .
y2 | -.3369351 -.2373115 -.0996236 .
exp | .101091 .0837589 .017332 .0008861
expsq | -.0020004 -.0016344 -.000366 .0000133
ier | .2556901 3.407355 -3.151665 .2054152
---------+-------------------------------------------------------------
b = less efficient estimates obtained previously from xtreg.
B = fully efficient estimates obtained from xtreg.
Test: Ho: difference in coefficients not systematic
chi2( 5) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 2114.69
Prob>chi2 = 0.0000
. * INSTRUMENTAL VARIABLES (1ST ROUND)
. regress y y1 y2 exp expsq ier d60 sex (Pexp Qexp Pexpsq Qexpsq Qy1 Qy2 Qier Pd60 Psex)
Instrumental variables (2SLS) regression
Source | SS df MS Number of obs = 9623
---------+------------------------------ F( 7, 9615) = 2063.13
Model | 2274.04125 7 324.863035 Prob > F = 0.0000
Residual | 1186.98506 9615 .123451384 R-squared = 0.6570
---------+------------------------------ Adj R-squared = 0.6568
Total | 3461.02631 9622 .359699263 Root MSE = .35136
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
y1 | .6607854 .0114819 57.550 0.000 .6382785 .6832923
y2 | -.3351626 .0110306 -30.385 0.000 -.356785 -.3135403
exp | .1023073 .0027215 37.593 0.000 .0969726 .107642
expsq | -.0020375 .0000529 -38.501 0.000 -.0021412 -.0019338
ier | .3347733 .2895833 1.156 0.248 -.232871 .9024175
d60 | .0324825 .010838 2.997 0.003 .0112378 .0537272
sex | -.0198867 .010298 -1.931 0.053 -.0400729 .0002995
_cons | .9895473 .0332799 29.734 0.000 .9243117 1.054783
------------------------------------------------------------------------------
. predict r,res
. PQ r
. gen Prsq=Pr^2
. quietly by id: gen mark=_n
. *What does mark do? (see next regression)
. quietly by id: gen T=_N
. gen iT=1/T
. regress Prsq iT if mark==1
Source | SS df MS Number of obs = 500
---------+------------------------------ F( 1, 498) = 4.15
Model | .011654194 1 .011654194 Prob > F = 0.0420
Residual | 1.39687301 498 .002804966 R-squared = 0.0083
---------+------------------------------ Adj R-squared = 0.0063
Total | 1.4085272 499 .0028227 Root MSE = .05296
------------------------------------------------------------------------------
Prsq | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
iT | .1525195 .0748252 2.038 0.042 .0055075 .2995315
_cons | .0291845 .0053924 5.412 0.000 .0185899 .0397791
------------------------------------------------------------------------------
. matrix b=get(_b)
. gen theta=sqrt(_b[iT]/(_b[iT]+_b[_cons]*T))
. *Now you need to transform the variables included in your model
. replace y=y-(1-theta)*Py
(9623 real changes made)
. replace y1=y1-(1-theta)*Py1
(9623 real changes made)
. replace y2=y2-(1-theta)*Py2
(9623 real changes made)
. replace exp=exp-(1-theta)*Pexp
(9623 real changes made)
. replace expsq=expsq-(1-theta)*Pexpsq
(9623 real changes made)
. replace ier=ier-(1-theta)*Pier
(9623 real changes made)
. replace d60=d60-(1-theta)*Pd60
(7874 real changes made)
. replace sex=sex-(1-theta)*Psex
(1358 real changes made)
. * INSTRUMENTAL VARIABLES (AFTER THETA CORRECTION)
. regress y y1 y2 exp expsq ier d60 sex theta (Qy1 Qy2 Qier Pexp Qexp Pexpsq Qexpsq Pd60 Psex theta), noconstant
Instrumental variables (2SLS) regression
Source | SS df MS Number of obs = 9623
---------+------------------------------ F( 8, 9615) = .
Model | 18502.9839 8 2312.87299 Prob > F = .
Residual | 907.099508 9615 .094342123 R-squared = .
---------+------------------------------ Adj R-squared = .
Total | 19410.0834 9623 2.01705117 Root MSE = .30715
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
y1 | .6578081 .01005 65.454 0.000 .638108 .6775082
y2 | -.3367606 .0096466 -34.910 0.000 -.3556698 -.3178513
exp | .1012754 .0024055 42.101 0.000 .09656 .1059907
expsq | -.0020057 .0000468 -42.898 0.000 -.0020974 -.0019141
ier | .2589409 .2534411 1.022 0.307 -.237857 .7557388
d60 | .033342 .023551 1.416 0.157 -.0128229 .079507
sex | -.01201 .0203523 -0.590 0.555 -.0519049 .0278848
theta | 1.009543 .0361539 27.924 0.000 .9386737 1.080412
------------------------------------------------------------------------------
. *Why do we have theta as a variable and no intercept here?
. matrix list b
b[1,2]
iT _cons
y1 .15251948 .02918446
. summarize theta
Variable | Obs Mean Std. Dev. Min Max
---------+-----------------------------------------------------
theta | 9623 .4463058 .0806179 .3194048 .7148795
. clear
end of do-file
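The middle of the do-file (from regress Prsq iT through gen theta=...) estimates the variance components from the first-round residuals: since E[rbar_i^2] = sigma_u^2 + sigma_e^2/T_i, the intercept estimates sigma_u^2 and the slope on 1/T estimates sigma_e^2. A numpy sketch of that step, with simulated residuals standing in for the phuzics data (an illustration under assumed true values, not the course code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated unbalanced panel of first-round residuals: identifying the
# slope on 1/T requires T to vary across persons (here 5 to 45 years).
n, sigma_u, sigma_e = 2000, 0.5, 1.0
Ti = rng.integers(5, 46, n)                     # observations per person
u = rng.normal(0, sigma_u, n)                   # person effect
rbar = np.array([u[i] + rng.normal(0, sigma_e, Ti[i]).mean()
                 for i in range(n)])            # group-mean residuals (Pr)

# `regress Prsq iT if mark==1`: intercept ~ sigma_u^2, slope ~ sigma_e^2
X = np.column_stack([1.0 / Ti, np.ones(n)])
b_iT, b_cons = np.linalg.lstsq(X, rbar**2, rcond=None)[0]

# `gen theta = sqrt(_b[iT]/(_b[iT] + _b[_cons]*T))`: the person-specific
# quasi-demeaning weight; each variable is then replaced by x-(1-theta)*Px
theta = np.sqrt(b_iT / (b_iT + b_cons * Ti))
```

With these true values the slope should land near sigma_e^2 = 1 and the intercept near sigma_u^2 = 0.25; theta lies strictly between 0 (pure within) and 1 (pooled OLS), just as the summarize theta output above shows.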
The Wages Equation
. do "C:wages01.do"
. use "C:phuzics01.dta", clear
. iis id
. tis yr
. sort id
. quietly by id: gen s1=s[_n-1]
. quietly by id: gen Y1=Y[_n-1]
. *Shall I drop the missing observations here?
. drop if s1==.
(500 observations deleted)
. replace s=log(s/s1)
(10123 real changes made)
. replace Y=log(Y/Y1)
(10123 real changes made)
. PQ s
. PQ Y
. PQ sex
. PQ ru
. * POOLED OLS
. regress s Y ru sex
Source | SS df MS Number of obs = 10123
---------+------------------------------ F( 3, 10119) = 3335.56
Model | 7.67338445 3 2.55779482 Prob > F = 0.0000
Residual | 7.75952035 10119 .000766827 R-squared = 0.4972
---------+------------------------------ Adj R-squared = 0.4971
Total | 15.4329048 10122 .001524689 Root MSE = .02769
------------------------------------------------------------------------------
s | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
Y | .1795768 .0019558 91.816 0.000 .175743 .1834107
ru | .0142779 .0006994 20.414 0.000 .012907 .0156489
sex | -.0082762 .0007914 -10.458 0.000 -.0098275 -.0067249
_cons | -.0026954 .0004037 -6.676 0.000 -.0034869 -.001904
------------------------------------------------------------------------------
. * WITHIN ESTIMATORS (FIXED EFFECTS)
. xtreg s Y ru sex, fe
Fixed-effects (within) regression Number of obs = 10123
Group variable (i) : id Number of groups = 500
R-sq: within = 0.4614 Obs per group: min = 6
between = 0.6455 avg = 20.2
overall = 0.4855 max = 47
F(2,9621) = 4120.97
corr(u_i, Xb) = 0.0939 Prob > F = 0.0000
------------------------------------------------------------------------------
s | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
Y | .1793187 .0021088 85.035 0.000 .1751851 .1834523
ru | .0064123 .001264 5.073 0.000 .0039346 .0088899
sex | (dropped)
_cons | -.0022472 .0004157 -5.406 0.000 -.0030619 -.0014324
------------------------------------------------------------------------------
sigma_u | .01011922
sigma_e | .02691058
rho | .12388261 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(499,9621) = 2.19 Prob > F = 0.0000
. hausman, save
. * BETWEEN ESTIMATORS
. xtreg s Y ru sex, be
Between regression (regression on group means) Number of obs = 10123
Group variable (i) : id Number of groups = 500
R-sq: within = 0.4591 Obs per group: min = 6
between = 0.7096 avg = 20.2
overall = 0.4971 max = 47
F(3,496) = 404.06
sd(u_i + avg(e_i.))= .0088949 Prob > F = 0.0000
------------------------------------------------------------------------------
s | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
Y | .2020762 .007132 28.334 0.000 .1880636 .2160888
ru | .0167146 .0011768 14.203 0.000 .0144025 .0190268
sex | -.0080017 .0011612 -6.891 0.000 -.0102832 -.0057203
_cons | -.0069383 .0011596 -5.983 0.000 -.0092167 -.00466
------------------------------------------------------------------------------
. * GLS ESTIMATORS (RANDOM EFFECTS):
. xtreg s Y ru sex, re
Random-effects GLS regression Number of obs = 10123
Group variable (i) : id Number of groups = 500
R-sq: within = 0.4602 Obs per group: min = 6
between = 0.7063 avg = 20.2
overall = 0.4969 max = 47
Random effects u_i ~ Gaussian Wald chi2(3) = 9324.37
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
------------------------------------------------------------------------------
s | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
Y | .1785767 .0019929 89.608 0.000 .1746707 .1824826
ru | .0123244 .0008565 14.390 0.000 .0106458 .014003
sex | -.0081517 .001123 -7.259 0.000 -.0103527 -.0059508
_cons | -.0022302 .0005162 -4.320 0.000 -.0032419 -.0012184
---------+--------------------------------------------------------------------
sigma_u | .00597206
sigma_e | .02691058
rho | .04693787 (fraction of variance due to u_i)
------------------------------------------------------------------------------
. * HAUSMAN TEST: FIXED VS. RANDOM EFFECTS
. hausman
---- Coefficients ----
| (b) (B) (b-B) sqrt(diag(V_b-V_B))
| Prior Current Difference S.E.
---------+-------------------------------------------------------------
Y | .1793187 .1785767 .0007421 .0006894
ru | .0064123 .0123244 -.0059121 .0009296
---------+-------------------------------------------------------------
b = less efficient estimates obtained previously from xtreg.
B = fully efficient estimates obtained from xtreg.
Test: Ho: difference in coefficients not systematic
chi2( 2) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 56.15
Prob>chi2 = 0.0000
. * INSTRUMENTAL VARIABLES (1ST ROUND)
. regress s Y ru sex (PY QY Qru Psex)
Instrumental variables (2SLS) regression
Source | SS df MS Number of obs = 10123
---------+------------------------------ F( 3, 10119) = 3168.81
Model | 7.58349777 3 2.52783259 Prob > F = 0.0000
Residual | 7.84940703 10119 .00077571 R-squared = 0.4914
---------+------------------------------ Adj R-squared = 0.4912
Total | 15.4329048 10122 .001524689 Root MSE = .02785
------------------------------------------------------------------------------
s | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
Y | .1835144 .0020489 89.569 0.000 .1794983 .1875306
ru | .0067056 .0013071 5.130 0.000 .0041434 .0092678
sex | -.0084882 .0007966 -10.656 0.000 -.0100496 -.0069268
_cons | -.0016707 .0004326 -3.862 0.000 -.0025186 -.0008227
------------------------------------------------------------------------------
. predict r,res
. PQ r
. gen Prsq=Pr^2
. quietly by id: gen mark=_n
. *What does mark do? (see next regression)
. quietly by id: gen T=_N
. gen iT=1/T
. regress Prsq iT if mark==1
Source | SS df MS Number of obs = 500
---------+------------------------------ F( 1, 498) = 4.46
Model | 7.4757e-08 1 7.4757e-08 Prob > F = 0.0352
Residual | 8.3462e-06 498 1.6759e-08 R-squared = 0.0089
---------+------------------------------ Adj R-squared = 0.0069
Total | 8.4209e-06 499 1.6876e-08 Root MSE = .00013
------------------------------------------------------------------------------
Prsq | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
iT | .0004526 .0002143 2.112 0.035 .0000316 .0008736
_cons | .0000653 .0000141 4.630 0.000 .0000376 .000093
------------------------------------------------------------------------------
. matrix b=get(_b)
. gen theta=sqrt(_b[iT]/(_b[iT]+_b[_cons]*T))
. *Now you need to transform the variables included in your model
. replace s=s-(1-theta)*Ps
(10078 real changes made)
. replace Y=Y-(1-theta)*PY
(10123 real changes made)
. replace sex=sex-(1-theta)*Psex
(1426 real changes made)
. replace ru=ru-(1-theta)*Pru
(6318 real changes made)
. * INSTRUMENTAL VARIABLES (AFTER THETA CORRECTION)
. regress s Y ru sex theta (PY QY Qru Psex theta), noconstant
Instrumental variables (2SLS) regression
Source | SS df MS Number of obs = 10123
---------+------------------------------ F( 4, 10119) = .
Model | 8.00935131 4 2.00233783 Prob > F = .
Residual | 7.19637615 10119 .000711175 R-squared = .
---------+------------------------------ Adj R-squared = .
Total | 15.2057275 10123 .001502097 Root MSE = .02667
------------------------------------------------------------------------------
s | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
Y | .1809964 .0020571 87.988 0.000 .1769642 .1850287
ru | .0065199 .0012524 5.206 0.000 .004065 .0089748
sex | -.0080601 .0015735 -5.122 0.000 -.0111445 -.0049757
theta | -.0014172 .00067 -2.115 0.034 -.0027306 -.0001039
------------------------------------------------------------------------------
. *Why do we have theta as a variable and no intercept here?
. matrix list b
b[1,2]
iT _cons
y1 .00045259 .0000653
. summarize theta
Variable | Obs Mean Std. Dev. Min Max
---------+-----------------------------------------------------
theta | 10123 .4892002 .0797415 .3584998 .7321285
. clear
end of do-file
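The Hausman statistic printed in both do-files is the quadratic form chi2 = (b-B)'[V_b - V_B]^(-1)(b-B). As a sanity check, here is a numpy sketch using the FE and RE coefficients and standard errors from the wages log above; the printed tables give only standard errors, so the covariance matrices below are diagonal approximations and the result will not match Stata's 56.15 exactly:

```python
import numpy as np

# FE (b) and RE (B) estimates for Y and ru, copied from the wages-equation
# log; off-diagonal covariances are not printed, so V_b and V_B are
# diagonal approximations here (Stata uses the full matrices).
b = np.array([0.1793187, 0.0064123])      # within (fixed effects)
B = np.array([0.1785767, 0.0123244])      # GLS (random effects)
V_b = np.diag(np.array([0.0021088, 0.0012640]) ** 2)
V_B = np.diag(np.array([0.0019929, 0.0008565]) ** 2)

# Hausman statistic: (b-B)' [V_b - V_B]^(-1) (b-B)
d = b - B
stat = float(d @ np.linalg.solve(V_b - V_B, d))
```

Note that the diagonal differences reproduce the sqrt(diag(V_b-V_B)) column of the hausman table (.0006894 and .0009296).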
Appendix A: Hausman-Taylor Instrumental Variables in R
Here you can reproduce the results above using R. You should start by running the functions PQ.R, tsls.R, and htiv.R, available at the Econ 508 webpage (Routines, panel.R):
"htiv" <-
function(x, y, id, d, z = NULL)
{
#input:
# x design matrix partitioned as given in d
# y response vector
# id strata indicator
# d list of column numbers indicating partitioning of x
# x[,d[[1]]] is x1 -- exogenous time varying vars
# x[,d[[2]]] is x2 -- endogenous time varying vars
# x[,d[[3]]] is z1 -- exogenous time invariant vars
# x[,d[[4]]] is z2 -- endogenous time invariant vars
# z may contain excluded exogenous variables if there are any
# NB. intercept is automatically included
x <- as.matrix(cbind(x, 1))
Tx <- PQ(x, id)
d[[3]] <- c(d[[3]], dim(x)[2])
Z <- cbind(z, Tx$Ph[, d[[1]]], Tx$Qh[, d[[1]]], Tx$Qh[, d[[2]]],
x[, d[[ 3]]])
r <- tsls(x, Z, y, int = F)$resid
Ti <- table(id)
Ti.inv <- 1/table(id)
rdot2 <- tapply(r, id, mean)^2
v <- lm(rdot2~Ti.inv)
v <- v$coef
theta <- as.vector(sqrt(v[2]/(v[2] + v[1] * Ti[Tx$is])))
x <- x - (1 - theta) * Tx$Ph
y <- y - (1 - theta) * PQ(y, id)$Ph
fit <- tsls(x, Z, y, int = F)
list(fit=fit,v=v)
}
"PQ" <-
function(h, id)
{
if(is.vector(h))
h <- matrix(h, ncol = 1)
Ph <- unique(id)
Ph <- cbind(Ph, table(id))
for(i in 1:ncol(h))
Ph <- cbind(Ph, tapply(h[, i], id, mean))
is <- tapply(id, id)
Ph <- Ph[is, - (1:2)]
Qh <- h - Ph
list(Ph=as.matrix(Ph), Qh=as.matrix(Qh), is=is)
}
"tsls" <-
function(x, z, y, int = T)
{
# two stage least squares
if(int){
x <- cbind(1, x)
z <- cbind(1, z)
}
xhat <- lm(x~z-1)$fitted.values
R <- lm(y ~ xhat -1)
R$residuals <- c(y - x %*% R$coef)
return(R)
}
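The two-stage logic of tsls above translates directly; a Python sketch of the same idea (assuming numpy; not part of the course routines):

```python
import numpy as np

def tsls(x, z, y):
    """Two-stage least squares, mirroring tsls.R: regress X on the
    instruments Z, then regress y on the fitted values. Residuals
    are computed from the original X, exactly as in the R function."""
    x, z, y = (np.asarray(a, float) for a in (x, z, y))
    xhat = z @ np.linalg.lstsq(z, x, rcond=None)[0]   # first stage
    beta = np.linalg.lstsq(xhat, y, rcond=None)[0]    # second stage
    return beta, y - x @ beta

# Sanity check: when Z = X (everything exogenous), 2SLS collapses to OLS
rng = np.random.default_rng(1)
x = np.column_stack([np.ones(50), rng.normal(size=50)])
y = x @ np.array([1.0, 2.0]) + rng.normal(size=50)
beta, resid = tsls(x, x, y)
```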
Next you should prepare your data and then run htiv.R. Here is the code for that:
#a simple analysis of wages
d <- read.table("data.05",header=TRUE)
n <- length(d[,1])
h <- as.matrix(d)
h <- cbind(h[3:n,],h[3:n,]-h[2:(n-1),],h[2:(n-1),]-h[1:(n-2),])#difference
h <- h[h[,10]==0,]#ignore obs whose first diff confounds people
h <- h[h[,19]==0,]#ignore obs whose second diff confounds people
h <- cbind(h[,1:9],h[,7:9]-h[,16:18],h[,7:9]-h[,16:18]-h[,25:27])
h <- cbind(h[,1:4],h[,2]-h[,3],(h[,2]-h[,3])^2,h[,5:15])
dimnames(h)[[2]] <- c("id","yr","phd","sex","exp","exp^2","rphd","ru",
"y","Y","s","y_1","Y_1","s_1",
"y_2","Y_2","s_2")
#h <- h[h[,1]<124,]
#fit productivity equation by Hausman Taylor Method
y <- log(h[,9])
x <- h[,c(12,15,5:7,3,4)]
x[,1:2] <- log(x[,1:2])
#x[,5] <- 1/x[,5]#rank of phd program
x[,6] <- as.numeric(x[,6]>60) #vintage effect of phd
x[,5] <- 1/(x[,3]*x[,5])#rank of phd program
id <- h[,1]
dimnames(x)[[2]] <- c("y","y_1","exp","exp2","rphd","phd","sex")
vl <- list(3:4,c(1:2,5),6:7,NULL)
v <- htiv(x,y,id,vl)
print("these are the variance component estimates")
print(v$v)
print(summary(v$fit))
References:
Hausman, J.A., and W.E. Taylor (1981): "Panel Data and Unobservable Individual Effects," Econometrica, 49(6), 1377-1398.
Koenker, R. (2004): Lecture 13, Econ 508 Lecture Notes, University of Illinois at Urbana-Champaign.