With multiple independent variables, how do you determine which one matters more?
梅卫平
Basic knowledge worth spreading!
This article has the answer: http://mp.weixin.qq.com/s/mqUFwAuE1oHpBVEjgs3odg
As the title says: given, for example, y ~ x1, x2, …, which of x1 and x2 has the greater effect on y?
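1. Compare the correlation coefficient r or the coefficient of determination R^2? Wrong: the magnitude of these coefficients has no necessary connection with how important an independent variable is. Correlation does NOT tell anything about the effect of x (independent variable) on y (dependent variable) [1].
Addendum: if all you want is to test whether the difference between correlation coefficients is significant, you can use the R package cocor.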
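2. Compare the significance p-values? Wrong: a statistically significant result is not necessarily an important one in practice [2].
3. Compare standardized regression coefficients? This seems feasible [3,4], though I am not sure it is correct; criticism and corrections are welcome. The calculation can be done with the R package relaimpo; see the references for the detailed description [5] and the user manual [6].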
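Point 2 above can be illustrated with a small simulation. This is only a sketch on made-up data: the sample size, the slope of 0.02, and the seed are arbitrary assumptions chosen to make the effect visible, not values from the original post.

```r
set.seed(1)                    # arbitrary seed, for reproducibility only
n <- 100000                    # assumed: a very large sample
x <- rnorm(n)
y <- 0.02 * x + rnorm(n)       # assumed: the true slope is tiny by construction

fit <- lm(y ~ x)
p_value   <- summary(fit)$coefficients["x", "Pr(>|t|)"]
r_squared <- summary(fit)$r.squared

# A small p-value alongside a near-zero R^2: detectable, but practically negligible
cat("p-value:", format(p_value, digits = 3), "\n")
cat("R^2:    ", format(r_squared, digits = 3), "\n")
```

With a sample this large, even a slope that explains almost none of the variance in y is flagged as significant, which is why p-values alone are a poor ranking of importance.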
Example R code:
library(relaimpo)
data("swiss")
cor(swiss)
linmod <- lm(Fertility ~ ., data = swiss)
summary(linmod)
metrics <- calc.relimp(linmod, type = c("lmg", "first", "last","betasq", "pratt","genizi","car"), rela= TRUE)
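# For type, "first" is not recommended (it may assign a high contribution even to non-significant predictors); "last", "betasq", "pratt", etc. are recommended
# rela = TRUE normalizes the contributions of the predictors so that they sum to 100%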
# Details:
# lmg is the R^2 contribution averaged over orderings among regressors, cf. e.g. Lindeman, Merenda and Gold 1980, p.119ff or Chevan and Sutherland (1991).
# pmvd is the proportional marginal variance decomposition as proposed by Feldman (2005) (non-US version only). It can be interpreted as a weighted average over orderings among regressors, with data-dependent weights.
# last is each variable's contribution when included last, also sometimes called "usefulness".
# first is each variable's contribution when included first, which is just the squared covariance between y and the variable.
# betasq is the squared standardized coefficient.
# pratt is the product of the standardized coefficient and the correlation.
# genizi is the R^2 decomposition according to Genizi 1993.
# car is the R^2 decomposition according to Zuber and Strimmer 2010, also available from package care (squares of scores produced by function carscore).
metrics
plot(metrics)
metrics01 <- calc.relimp(linmod, type = "betasq", rela = TRUE)
metrics01
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:
betasq
Agriculture 0.12911291
Examination 0.03580132
Education 0.59260934
Catholic 0.15931588
Infant.Mortality 0.08316055
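The betasq figures above can be cross-checked without relaimpo. The sketch below assumes that betasq with rela = TRUE is simply each squared standardized coefficient divided by their sum, which is what the definition quoted in the details suggests. The variable names swiss_std, fit_std, beta_std and betasq_share are introduced here for illustration only.

```r
data("swiss")

# Standardize every column (mean 0, sd 1) so that the slopes are unit-free
swiss_std <- as.data.frame(scale(swiss))
fit_std   <- lm(Fertility ~ ., data = swiss_std)

# Drop the intercept; the remaining slopes are the standardized coefficients
beta_std <- coef(fit_std)[-1]

# Square them and rescale so that the shares sum to 1 (the rela = TRUE convention)
betasq_share <- beta_std^2 / sum(beta_std^2)

print(sort(betasq_share, decreasing = TRUE))
```

If the assumption holds, the printed shares should agree with the betasq column reported by calc.relimp.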
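Relative importance (betasq method), sorted from largest to smallest
Relative importance ranking | Education | Catholic | Agriculture | Infant.Mortality | Examination |
Fertility | 0.59260934 | 0.15931588 | 0.12911291 | 0.08316055 | 0.03580132 |
Note: the ordering of the predictors by relative importance differs from their ordering by significance or by correlation.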
Significance ranking | Education | Examination | Catholic | Infant.Mortality | Agriculture |
Fertility | 3.659e-07 | 9.45e-07 | 0.001029 | 0.003585 | 0.01492 |
Correlation ranking | Education | Examination | Catholic | Infant.Mortality | Agriculture |
Fertility | -0.66378886 | -0.6458827 | 0.4636847 | 0.41655603 | 0.35307918 |
plot(metrics01)
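The post does not show how the significance and correlation rows of the table were obtained. The sketch below assumes they come from pairwise Pearson correlation tests between each predictor and Fertility (cor.test), not from the coefficient tests of the full regression. The helper names predictors, pairwise and full_p are illustrative only.

```r
data("swiss")
predictors <- setdiff(names(swiss), "Fertility")

# Pearson correlation of each predictor with Fertility, and the p-value of its test
r_values <- sapply(predictors, function(v) cor(swiss[[v]], swiss$Fertility))
p_values <- sapply(predictors, function(v) cor.test(swiss[[v]], swiss$Fertility)$p.value)

pairwise <- data.frame(r = r_values, p = p_values)
print(pairwise[order(pairwise$p), ])

# For comparison: p-values of the coefficients in the full model
full_p <- summary(lm(Fertility ~ ., data = swiss))$coefficients[-1, "Pr(>|t|)"]
print(sort(full_p))
```

Comparing the two printouts shows that pairwise significance and significance within the full model need not rank the predictors the same way, which is another reason not to rank importance by p-value.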
References
[6] https://cran.r-project.org/web/packages/relaimpo/relaimpo.pdf