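1 Reading plain-text files
1.1 read.table reads files in tabular form.
For example, in houses.data the first line holds the variable names and the first column holds the record numbers.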
Price Floor Area Rooms Age Cent.heat
01 52.00 111.0 830 5 6.2 no
02 54.75 128.0 710 5 7.5 no
> d1=read.table("houses.data")
> dim(d1)
[1] 5 6
> str(d1)
'data.frame': 5 obs. of 6 variables:
$ Price : num 52 54.8 57.5 57.5 59.8
$ Floor : num 111 128 101 131 93
$ Area : int 830 710 1000 690 900
$ Rooms : int 5 5 5 6 5
$ Age : num 6.2 7.5 4.2 8.8 1.9
$ Cent.heat: Factor w/ 2 levels "no","yes": 1 1 1 1 2
> row.names(d1)
[1] "01" "02" "03" "04" "05"
If the file has no leading column of record numbers, for example
Price Floor Area Rooms Age Cent.heat
52.00 111.0 830 5 6.2 no
54.75 128.0 710 5 7.5 no
> d2=read.table("houses2.data",header=T)
> dim(d2)
[1] 5 6
> str(d2)
'data.frame': 5 obs. of 6 variables:
$ Price : num 52 54.8 57.5 57.5 59.8
$ Floor : num 111 128 101 131 93
$ Area : int 830 710 1000 690 900
$ Rooms : int 5 5 5 6 5
$ Age : num 6.2 7.5 4.2 8.8 1.9
$ Cent.heat: Factor w/ 2 levels "no","yes": 1 1 1 1 2
> row.names(d2) # R adds record numbers automatically
[1] "1" "2" "3" "4" "5"
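The header rule behind these two cases can be checked without any files on disk. The sketch below is only an illustration: it passes the two sample rows inline through the text argument of read.table, and the object names with_ids and without_ids are made up for this example. When the first line has one field fewer than the data lines, read.table treats it as a header and uses the first column as row names; otherwise header=TRUE has to be given explicitly.

```r
# First line has 6 fields, data lines have 7: header and row names are detected.
with_ids <- read.table(text = c(
  "Price Floor Area Rooms Age Cent.heat",
  "01 52.00 111.0 830 5 6.2 no",
  "02 54.75 128.0 710 5 7.5 no"))

# Every line has 6 fields: the header must be requested explicitly.
without_ids <- read.table(text = c(
  "Price Floor Area Rooms Age Cent.heat",
  "52.00 111.0 830 5 6.2 no",
  "54.75 128.0 710 5 7.5 no"), header = TRUE)

dim(with_ids)           # 2 rows, 6 variables
row.names(with_ids)     # taken from the first column of the input
row.names(without_ids)  # generated by R
```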
Usage of read.table:
read.table(file, header = FALSE, sep = "", quote = "\"'",
dec = ".", row.names, col.names,
as.is = !stringsAsFactors,
na.strings = "NA", colClasses = NA, nrows = -1,
skip = 0, check.names = TRUE, fill = !blank.lines.skip,
strip.white = FALSE, blank.lines.skip = TRUE,
comment.char = "#",
allowEscapes = FALSE, flush = FALSE,
stringsAsFactors = default.stringsAsFactors(),
fileEncoding = "", encoding = "unknown", text)
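file is the name (and path) of the file to read, sep is the field separator, and skip is the number of lines to skip before reading data.
Note that since R 4.0.0 stringsAsFactors defaults to FALSE, so on current versions a column such as Cent.heat is read as character unless stringsAsFactors=TRUE is passed; the Factor lines in the str output above reflect the older default.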
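As a rough illustration of sep and skip, the sketch below reads a small made-up input (not one of the files used in this post) that has one title line to skip and semicolon-separated fields. The input lines and the name d are assumptions for the example only.

```r
# Hypothetical input: a title line, then a header and two data rows.
input_lines <- c("some title line to skip",
                 "id;score;group",
                 "1;3.5;a",
                 "2;4.0;b")

d <- read.table(text = input_lines, header = TRUE, sep = ";", skip = 1,
                stringsAsFactors = FALSE)  # keep group as character
str(d)
```

Passing stringsAsFactors explicitly makes the result independent of the R version in use.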
1.2 scan reads data directly from a plain-text file.
For example, the file scan_test.txt:
25 38 39 29 28 40
> s1=scan("scan_test.txt")
Read 6 items
> dim(s1)
NULL
> str(s1)
num [1:6] 25 38 39 29 28 40
> class(s1) # the data are read into a vector
[1] "numeric"
The file h_w.data stores height/weight pairs: within each pair the first value is the height and the second is the weight.
172.4 75.0 169.3 54.8 169.3 64.0 171.4 64.8 166.5 47.4
171.4 62.2 168.2 66.9 165.1 52.0 168.8 62.2 167.8 65.0
165.8 62.2 167.8 65.0 164.4 58.7 169.9 57.5 164.9 63.5
> s2=scan("h_w.data",list(height=0,weight=0))
Read 100 records
> dim(s2)
NULL
> str(s2)
List of 2
$ height: num [1:100] 172 169 169 171 166 ...
$ weight: num [1:100] 75 54.8 64 64.8 47.4 62.2 66.9 52 62.2 65 ...
> class(s2) # the data are read into a list
[1] "list"
The data in scan_test.txt can also be stored as a matrix.
> s3=matrix(scan("scan_test.txt",0),nrow=3,ncol=2,byrow=T)
Read 6 items
> s3
[,1] [,2]
[1,] 25 38
[2,] 39 29
[3,] 28 40
> dim(s3)
[1] 3 2
> str(s3)
num [1:3, 1:2] 25 39 28 38 29 40
> class(s3)
[1] "matrix"
Usage of scan:
scan(file = "", what = double(), nmax = -1, n = -1, sep = "",
quote = if(identical(sep, "\n")) "" else "'\"", dec = ".",
skip = 0, nlines = 0, na.strings = "NA",
flush = FALSE, fill = FALSE, strip.white = FALSE,
quiet = FALSE, blank.lines.skip = TRUE, multi.line = TRUE,
comment.char = "", allowEscapes = FALSE,
fileEncoding = "", encoding = "unknown", text)
what may be a list whose components give the types of the data to read.
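A small sketch of the what argument, using the text argument of scan so that no file is needed. The first call mirrors the h_w.data example with only its first three pairs; the second uses made-up mixed input. The names hw, hw_df and mixed are illustrative.

```r
# Alternating height/weight values: fields are dealt out to the list
# components in turn, so each component ends up holding one variable.
hw <- scan(text = "172.4 75.0 169.3 54.8 169.3 64.0",
           what = list(height = 0, weight = 0), quiet = TRUE)
hw_df <- as.data.frame(hw)  # a list of equal-length vectors converts directly

# The component types decide how each field is parsed:
# "" requests character, 0L requests integer.
mixed <- scan(text = "A 2 B 6", what = list(id = "", n = 0L), quiet = TRUE)
str(mixed)
```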
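2 Reading data files in other formats
Loading the foreign package lets R read data from other statistical software.
read.spss() reads SPSS files;
read.xport() reads SAS files (XPORT transport format);
read.S() reads S-PLUS files;
read.dta() reads Stata files.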
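Since no SPSS, SAS or Stata file accompanies this post, the following is only a sketch of how these readers are called: it first writes a small made-up data frame to a temporary Stata file with write.dta (also in foreign) and then reads it back with read.dta. The names scores, dta_file and scores_back are illustrative.

```r
library(foreign)  # a recommended package, normally installed with R

scores <- data.frame(id = 1:3, score = c(2.5, 6.0, 5.5))

dta_file <- tempfile(fileext = ".dta")
write.dta(scores, dta_file)         # create a Stata file to read
scores_back <- read.dta(dta_file)   # returns a data frame
str(scores_back)

unlink(dta_file)
```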
Tab-delimited text files are read with read.delim, for example
Col1	Col2	Col3	Col4	Col5
A	2	3	15	1
B	6	8	9	1
> r1=read.delim("educ_scores.txt")
> dim(r1)
[1] 8 5
> str(r1)
'data.frame': 8 obs. of 5 variables:
$ Col1: Factor w/ 8 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8
$ Col2: int 2 6 5 9 11 12 1 7
$ Col3: int 3 8 2 4 10 15 4 3
$ Col4: int 15 9 7 3 2 1 12 4
$ Col5: int 1 1 0 1 0 0 1 0
CSV (comma-separated) files are read with read.csv.
For example, a CSV file as it appears when opened in UE:
Col1,Col2,Col3,Col4,Col5
A,2,3,15,1
B,6,8,9,1
> r2=read.csv("educ_scores.csv")
> dim(r2)
[1] 8 5
> str(r2)
'data.frame': 8 obs. of 5 variables:
$ Col1: Factor w/ 8 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8
$ Col2: int 2 6 5 9 11 12 1 7
$ Col3: int 3 8 2 4 10 15 4 3
$ Col4: int 15 9 7 3 2 1 12 4
$ Col5: int 1 1 0 1 0 0 1 0
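To make the relation between the two readers concrete, the sketch below writes the first two rows shown above to temporary files, once comma-separated and once tab-separated, and reads each back. The file names come from tempfile() and the object names are illustrative; stringsAsFactors=TRUE is passed so that Col1 becomes a factor as in the transcripts, whatever the R version.

```r
csv_file <- tempfile(fileext = ".csv")
tab_file <- tempfile(fileext = ".txt")

writeLines(c("Col1,Col2,Col3,Col4,Col5",
             "A,2,3,15,1",
             "B,6,8,9,1"), csv_file)
writeLines(c("Col1\tCol2\tCol3\tCol4\tCol5",
             "A\t2\t3\t15\t1",
             "B\t6\t8\t9\t1"), tab_file)

r_csv <- read.csv(csv_file, stringsAsFactors = TRUE)
r_tab <- read.delim(tab_file, stringsAsFactors = TRUE)

all.equal(r_csv, r_tab)  # same contents; only the separator differed

unlink(c(csv_file, tab_file))
```

Both functions are wrappers around read.table with header=TRUE and a preset sep, so the other read.table arguments (skip, na.strings and so on) can be passed to them as well.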
3 Writing data files
3.1 write(x, file = "data",
ncolumns = if(is.character(x)) 1 else 5,
append = FALSE, sep = " ") # append=FALSE writes a new file; append=TRUE appends to the existing file
For example, given a matrix x:
> class(x)
[1] "matrix"
> x
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
> write(x)
The default output file name is "data"; its contents are:
1 2 3 4 5
6 7 8 9 10
> write(t(x))
The output is:
1 3 5 7 9
2 4 6 8 10
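The transpose is needed because write emits the elements in column-major order. With a matrix of another width, ncolumns also has to match the number of matrix columns. A minimal sketch, writing to a temporary file instead of "data" (the names out and x2 are illustrative):

```r
x <- matrix(1:10, nrow = 2)   # the same 2 x 5 matrix as above
out <- tempfile()

# t(x) puts each row of x in sequence; ncolumns = ncol(x) breaks the
# output after every matrix row.
write(t(x), file = out, ncolumns = ncol(x))
readLines(out)

# Reading the file back row by row restores the matrix layout.
x2 <- matrix(scan(out, quiet = TRUE), ncol = ncol(x), byrow = TRUE)
x2

unlink(out)
```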
3.2 write.table(x, file = "", append = FALSE, quote = TRUE, sep = " ",
eol = "\n", na = "NA", dec = ".", row.names = TRUE,
col.names = TRUE, qmethod = c("escape", "double"),
fileEncoding = "") # write.csv() is similar
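The arguments in this signature that most often need changing are sep, quote and row.names. The sketch below shows them on a small made-up data frame written to temporary files; people, txt_file, csv_file and people_back are illustrative names, not objects from this post.

```r
people <- data.frame(Name = c("Alice", "Becka"),
                     Age = c(13, 13),
                     Height = c(56.5, 65.3))
txt_file <- tempfile(fileext = ".txt")
csv_file <- tempfile(fileext = ".csv")

# Tab-separated, strings unquoted, no row-name column.
write.table(people, file = txt_file, sep = "\t",
            quote = FALSE, row.names = FALSE)
readLines(txt_file)

# write.csv fixes the separator to "," but still accepts row.names.
write.csv(people, file = csv_file, row.names = FALSE)
people_back <- read.csv(csv_file)
str(people_back)
```

Dropping the row names avoids the empty first header field that otherwise appears in the CSV output and turns into an extra column when the file is read back.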
Given the following data frame:
> dd
Name Sex Age Height Weight
1 Alice F 13 56.5 84.0
2 Becka F 13 65.3 98.0
> str(dd)
'data.frame': 19 obs. of 5 variables:
$ Name : Factor w/ 19 levels "Alfred","Alice",..: 2 3 5 10 11 12 15 16 17 1 ...
$ Sex : Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 2 ...
$ Age : num 13 13 14 12 12 15 11 15 14 14 ...
$ Height: num 56.5 65.3 64.3 56.3 59.8 66.5 51.3 62.5 62.8 69 ...
$ Weight: num 84 98 90 77 84.5 ...
> write.table(dd,file="dd.txt")
The output file looks like this:
"Name" "Sex" "Age" "Height" "Weight"
"1" "Alice" "F" 13 56.5 84
"2" "Becka" "F" 13 65.3 98
> write.csv(dd,file="dd.csv")
The output CSV file looks like this:
"","Name","Sex","Age","Height","Weight"
"1","Alice","F",13,56.5,84
"2","Becka","F",13,65.3,98
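4 Accessing databases
The most convenient route is the RODBC package.
library(RODBC)
dsn_test=odbcConnect("dsn_1",uid="scott",pwd="***") # open a connection, dsn_test
emp1=sqlFetch(dsn_test,"emp") # read the database table emp into the data frame emp1
q1=sqlQuery(dsn_test,"select * from emp") # submit a query and return the result as the data frame q1
sqlDrop(dsn_test,"emp2") # drop the table emp2 from the database
close(dsn_test) # close the connection dsn_test
There are also packages aimed at specific databases, such as RMySQL, ROracle and teradataR.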