|||
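How to Handle Missing Values in Imported Table Data with R
Xiong Rongchuan
Chengdu Institute of Biology, Chinese Academy of Sciences
During data collection we often use spreadsheet software such as Microsoft Excel; during analysis we then usually need to import those tables into an analysis environment such as R.
R typically imports tables in CSV format. To produce one, use Excel's "Save As" function and choose CSV as the file format.
A table consists of rows and columns, but the data we collect are often "ragged": the rows (or the columns) do not all contain the same number of values. After the table is imported into R, missing values (NA) appear in the shorter rows (or columns). If these missing values are not removed, they usually cause trouble in later calculations and analysis.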
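To make the mechanism concrete, here is a minimal sketch of how a ragged CSV turns into NA values on import. The CSV text and the name `demo` are made up for illustration and are not the author's data; only the column names F3, F4 and F5 are borrowed from the example in this post.

```r
# Hypothetical CSV content: F3 has 3 values, F4 has 2, F5 has 4.
# Empty fields stand for the cells that were left blank in the spreadsheet.
csv_text <- "F3,F4,F5
1.5,2.5,3.5
4.5,5.5,6.5
7.5,,8.5
,,9.5"

# read.csv() accepts the text directly through its `text` argument,
# so no file on disk is needed for this demonstration.
demo <- read.csv(text = csv_text)

print(demo)         # blank numeric fields are shown as NA
print(is.na(demo))  # TRUE marks every missing cell
```

Blank fields in numeric columns are read as NA, which is why the shorter columns end with a run of NA values.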
Below I demonstrate how to handle these missing values.
Case 2: reading data from a table
Solution 2 (to be updated) |
||
> |
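data <- read.csv("D:/ziliao/zhuanye/R bear/harmo.csv") |
Import the table harmo.csv and store it in the data frame data. The path uses forward slashes because a single backslash starts an escape sequence inside an R string. |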
> |
data |
Display the contents of data |
Result |
          F3        F4        F5
1   5446.635  5858.746  6843.617
2   6654.305  7005.958  6924.623
3   7172.323  7169.585  6940.106
4   7311.956  7525.422  6977.832
5   8058.262 11214.798  7580.775
6   8116.698 11505.268  7915.856
7   8312.447 11544.472  8472.229
8   8667.151 11810.009  9492.205
9   9093.771 11881.491  9934.308
10  9103.865 11908.340 10787.881
11  9215.153 11909.480 10882.602
12  9461.068 12028.218 11161.032
13  9625.699 12112.602 11307.906
14  9896.211 12115.175 11497.302
15  9912.590 12174.058 11696.347
16  9942.585 12524.577 11800.720
17 10049.769 12533.796 11944.833
18 10120.347 12778.348 12439.379
19 10153.751 12862.208 13929.031
20 10160.672 12894.624 14691.925
21 10365.499 13010.052 14794.472
22 10393.181 13101.747 15368.400
23 10482.846 13188.517 15461.178
24 10583.587 13189.188 15558.937
25 10591.209 13243.095 15559.645
26 10640.676 13300.190 15623.761
27 10675.679 13664.360 15657.390
28 10723.670 14070.819 15714.827
29 10729.272 14155.309 15746.066
30 10900.465 14284.890 15756.113
31 10945.887 14287.759 15777.929
32 10989.468 14297.971 15793.488
33 11042.734 14318.933 15815.089
34 11095.552 14354.718 15817.700
35 11199.642 14530.077 15818.295
36 11200.800 14681.983 15860.937
37 11211.056 14685.578 16003.928
38 11243.513 14766.872 16013.536
39 11324.318 14817.897 16015.512
40 11394.809 14821.364 16052.696
41 11429.242 15015.235 16059.643
42 11514.232 15043.056 16067.201
43 11515.162 15236.188 16069.255
44 11547.771 15280.614 16079.015
45 11576.145 15549.818 16079.814
46 11671.397 15640.638 16081.685
47 11747.088 15746.188 16120.205
48 11775.793 15808.372 16134.122
49 12207.104 16510.259 16140.042
50 12921.229        NA 16151.293
51 12925.543        NA 16152.269
52        NA        NA 16184.463
53        NA        NA 16230.646
54        NA        NA 16243.203
55        NA        NA 16251.307
56        NA        NA 16337.266
57        NA        NA 16360.895
58        NA        NA 16367.404
59        NA        NA 16406.798
60        NA        NA 16709.307
61        NA        NA 16744.672
62        NA        NA 16860.826
63        NA        NA 16957.370
64        NA        NA 16975.136
65        NA        NA 17054.482
66        NA        NA 17193.373
67        NA        NA 18040.702
68        NA        NA 19188.307
|
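The table has three columns. Because the columns contain different numbers of values, missing values (NA) appear in the shorter ones. |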
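Before removing anything, it can help to count how many values are missing in each column. The following is only a sketch on a small stand-in data frame; the name `demo` and its values are hypothetical, chosen so the snippet runs without the original file.

```r
# Hypothetical stand-in for an imported table with ragged columns.
demo <- data.frame(
  F3 = c(1, 2, NA, NA),
  F4 = c(1, NA, NA, NA),
  F5 = c(1, 2, 3, 4)
)

# is.na() returns a logical matrix; colSums() counts the TRUE cells per column.
na_per_column    <- colSums(is.na(demo))
valid_per_column <- colSums(!is.na(demo))

print(na_per_column)     # number of missing values in each column
print(valid_per_column)  # number of usable values in each column
```

Judging from the printed table above, the same call on data should report 17 missing values in F3, 19 in F4 and none in F5.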
> |
tem <- data[,1] |
Assign the first column of data to the temporary vector tem |
> |
x <- tem[!is.na(tem)] |
Assign the non-missing values of tem to x |
> |
x |
Display the vector x |
|
 [1]  5446.635  6654.305  7172.323  7311.956  8058.262  8116.698  8312.447
 [8]  8667.151  9093.771  9103.865  9215.153  9461.068  9625.699  9896.211
[15]  9912.590  9942.585 10049.769 10120.347 10153.751 10160.672 10365.499
[22] 10393.181 10482.846 10583.587 10591.209 10640.676 10675.679 10723.670
[29] 10729.272 10900.465 10945.887 10989.468 11042.734 11095.552 11199.642
[36] 11200.800 11211.056 11243.513 11324.318 11394.809 11429.242 11514.232
[43] 11515.162 11547.771 11576.145 11671.397 11747.088 11775.793 12207.104
[50] 12921.229 12925.543
|
|
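The expression `tem[!is.na(tem)]` packs several steps into one line. The sketch below spells them out on a short hypothetical vector (the values are invented) and compares the result with base R's `na.omit()`, which is an alternative rather than the method used in this post.

```r
# Hypothetical vector with two missing values.
tem <- c(10.5, NA, 20.25, NA, 30)

is_missing <- is.na(tem)   # TRUE where the element is NA
keep       <- !is_missing  # invert: TRUE where the element is usable
x          <- tem[keep]    # logical indexing keeps only the TRUE positions

print(is_missing)
print(x)

# na.omit() drops the NA elements too, but attaches bookkeeping attributes;
# as.vector() strips them so the result is a plain numeric vector.
x_alt <- as.vector(na.omit(tem))
print(identical(x, x_alt))
```

Logical indexing leaves the order of the remaining values unchanged, so x keeps the sorted order seen in the output above.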
> |
tem <- data[,2] |
Assign the second column of data to the temporary vector tem |
> |
y <- tem[!is.na(tem)] |
Assign the non-missing values of tem to y |
> |
y |
Display the vector y |
|
 [1]  5858.746  7005.958  7169.585  7525.422 11214.798 11505.268 11544.472
 [8] 11810.009 11881.491 11908.340 11909.480 12028.218 12112.602 12115.175
[15] 12174.058 12524.577 12533.796 12778.348 12862.208 12894.624 13010.052
[22] 13101.747 13188.517 13189.188 13243.095 13300.190 13664.360 14070.819
[29] 14155.309 14284.890 14287.759 14297.971 14318.933 14354.718 14530.077
[36] 14681.983 14685.578 14766.872 14817.897 14821.364 15015.235 15043.056
[43] 15236.188 15280.614 15549.818 15640.638 15746.188 15808.372 16510.259
|
Query result |
> |
tem <- data[,3] |
Assign the third column of data to the temporary vector tem |
> |
z <- tem[!is.na(tem)] |
Assign the non-missing values of tem to z |
> |
z |
Display the vector z |
|
 [1]  6843.617  6924.623  6940.106  6977.832  7580.775  7915.856  8472.229
 [8]  9492.205  9934.308 10787.881 10882.602 11161.032 11307.906 11497.302
[15] 11696.347 11800.720 11944.833 12439.379 13929.031 14691.925 14794.472
[22] 15368.400 15461.178 15558.937 15559.645 15623.761 15657.390 15714.827
[29] 15746.066 15756.113 15777.929 15793.488 15815.089 15817.700 15818.295
[36] 15860.937 16003.928 16013.536 16015.512 16052.696 16059.643 16067.201
[43] 16069.255 16079.015 16079.814 16081.685 16120.205 16134.122 16140.042
[50] 16151.293 16152.269 16184.463 16230.646 16243.203 16251.307 16337.266
[57] 16360.895 16367.404 16406.798 16709.307 16744.672 16860.826 16957.370
[64] 16975.136 17054.482 17193.373 18040.702 19188.307
|
Query result |
As the example above shows, the missing values have been removed. Happy researching!
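The three column-by-column steps can also be done in one pass. This is a sketch of one possible shortcut, not the author's method; the data frame `demo` and the names `cleaned` and `complete_rows` are hypothetical.

```r
# Hypothetical stand-in for the imported table.
demo <- data.frame(
  F3 = c(1, 2, NA),
  F4 = c(4, NA, NA),
  F5 = c(7, 8, 9)
)

# A data frame is a list of columns, so lapply() visits each column in turn.
# The result is a named list of vectors that may differ in length.
cleaned <- lapply(demo, function(column) column[!is.na(column)])

print(lengths(cleaned))  # number of values kept per column
print(cleaned$F4)        # individual vectors are accessed by column name

# Caution: na.omit() on the whole data frame removes every ROW that contains
# an NA, which would discard valid values from the longer columns.
complete_rows <- na.omit(demo)
print(nrow(complete_rows))
```

A list is used because the cleaned columns no longer have equal lengths, and a data frame requires columns of equal length.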