高山流水分享 http://blog.sciencenet.cn/u/friendpine 走在科学路上的一位无名侠客,只是静静的走着.........

博文

R中包含约束的线性回归

已有 12420 次阅读 2010-1-12 21:50 |个人分类:统计学与R语言学习|系统分类:科研笔记

有一个应变量Y和几个自变量X,现在需要对Y和X进行多元回归,但是对于X有约束,这个问题应该怎样做?
如果没有约束,那么可以直接用lm()函数进行,但是如果有约束该怎么办?
无约束的线性回归的原理在于使得实际值和估计值间的差值总和最小,如果有约束的话,自变量的系数取值空间是有限制的,这其实是对应规划问题.R中的约束线性规划可以通过以下两种方法实现:

1 limSolve()包中的lsei()函数,把线性回归转化成解多元线性方程组
lsei()函数解决的问题是:
AX-B=e (A是线性方程组的系数矩阵,B是应变量组成的向量,该方程组构成目标函数)
EX=F    (该等式是约束作用)
GX>=H  (该不等式也是约束作用的)
因为我的问题只是对各个自变量有大小限制,因此不需要等式约束,只需要构建不等数约束.不等式约束通过下面的语句实现:
G <- rbind(diag(N),(-1)*diag(N))
H <- c(各个自变量的大小范围)
然后直接输入到lsei()函数就可以了

2 使用quadprog包中的solve.QP()函数
这是把线性回归转化成二次规划问题(见http://zh.wikipedia.org/wiki/%E4%BA%8C%E6%AC%A1%E8%A7%84%E5%88%92)来解决,详细的可见下面的说明(转自Berwin A Turlach,http://finzi.psych.upenn.edu/Rhelp08/2008-March/155990.html)

library(MASS)   ## to access the Boston data
> designmat <- model.matrix(medv~., data=Boston)
> Dmat <- crossprod(designmat, designmat)
> dvec <- crossprod(designmat, Boston$medv)
> Amat <- cbind(1, diag(NROW(Dmat)))
> bvec <- c(1, rep(0,NROW(Dmat))
> meq <- 1
> library(quadprog)
> res <- solve.QP(Dmat, dvec, Amat, bvec, meq)
#我也不知道为什么Dmat系数矩阵和dvec向量要这么求(望高手指教),它们对应二次规划
f(x) = (1 / 2)xTQx + cTx
中的Q矩阵和C向量.
The solution seems to contain values that are, for all practical
purposes, actually zero:

> res$solution
[1] 4.535581e-16 2.661931e-18 1.016929e-01 -1.850699e-17
[5] 1.458219e-16 -3.892418e-15 8.544939e-01 0.000000e+00
[9] 2.410742e-16 2.905722e-17 -5.700600e-20 -4.227261e-17
[13] 4.381328e-02 -3.723065e-18

So the estimates seem to follow the constraints.

And the unconstrained solution is:

> res$unconstrainted.solution
[1] 3.645949e+01 -1.080114e-01 4.642046e-02 2.055863e-02
[5] 2.686734e+00 -1.776661e+01 3.809865e+00 6.922246e-04
[9] -1.475567e+00 3.060495e-01 -1.233459e-02 -9.527472e-01
[13] 9.311683e-03 -5.247584e-01

which seems to coincide with what lm() thinks it should be:

> coef(lm(medv~., Boston))
(Intercept) crim zn indus chas
3.645949e+01 -1.080114e-01 4.642046e-02 2.055863e-02 2.686734e+00
nox rm age dis rad
-1.776661e+01 3.809865e+00 6.922246e-04 -1.475567e+00 3.060495e-01
tax ptratio black lstat
-1.233459e-02 -9.527472e-01 9.311683e-03 -5.247584e-01

So there seem to be no numeric problems. Otherwise we could have done
something else (e.g calculate the QR factorization of the design
matrix, say X, and give the R factor to solve.QP, instead of
calculating X'X and giving that one to solve.QP).

If the intercept is not supposed to be included in the set of
constrained estimates, then something like the following can be done:

> Amat[1,] <- 0
> res <- solve.QP(Dmat, dvec, Amat, bvec, meq)
> zapsmall(res$solution)
[1] 6.073972 0.000000 0.109124 0.000000 0.000000 0.000000 0.863421
[8] 0.000000 0.000000 0.000000 0.000000 0.000000 0.027455 0.000000

Of course, since after the first command in that last block the second
column of Amat contains only zeros
> Amat[,2]
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0
we might as well have removed it (and the corresponding entry in bvec)
> Amat <- Amat[, -2]
> bvec <- bvec[-2]
before calling solve.QP().

Note, the Boston data set was only used to illustrate how to fit such
models, I do not want to imply that these models are sensible for these
data. :-)
值得注意的是这里的不等式约束中的系数矩阵Amat是没有经过转置的矩阵,在用它表达
不等式约束的时候需要注意.

3 structural equation modeling的方法(转载自Mike W.L. Cheung,http://finzi.psych.upenn.edu/Rhelp08/2008-March/155972.html)

One approach is to use structural equation modeling (SEM). Some SEM
packages, such as LISREL, Mplus and Mx, allow inequality and nonlinear
constraints. Phantom variables (Rindskopf, 1984) may be used to impose
inequality constraints. Your model is basically:
y = b0 + b1*b1*x1 + b2*b2*x2 +...+ bp*bp*xp + e
1 = b1*b1 + b2*b2 +...+ bp*bp

Alternatively, you can set some condition bounds on the parameter
estimates. Then you only have to impose the second constraint.

Rindskopf, D. (1984). Using phantom and imaginary latent variables to
parameterize constraints in linear structural models. Psychometrika,
49, 37-47.
我也没有搞懂这种方法,望高手赐教


https://blog.sciencenet.cn/blog-54276-286639.html

上一篇:Linux中的alias命令
下一篇:confidence interval & prediction interval & tolerance interval
收藏 IP: .*| 热度|

0

发表评论 评论 (0 个评论)

数据加载中...

Archiver|手机版|科学网 ( 京ICP备07017567号-12 )

GMT+8, 2024-3-29 22:24

Powered by ScienceNet.cn

Copyright © 2007- 中国科学报社

返回顶部