vesperlight's personal blog http://blog.sciencenet.cn/u/vesperlight

R package: stringr

Posted 2015-12-3 23:23 | Personal category: R | System category: Research notes | Tags: R language, strings, STR, R packages, stringr

Package author: Hadley Wickham

Purpose: string manipulation

 

I. Basic functions

1. str_to_upper(string)

Converts the letters in a string to uppercase.

Example:

library(stringr)

fruit <- c("apple", "banana", "pear", "pinapple")

str_to_upper(fruit)

Output:

"APPLE"    "BANANA"   "PEAR"     "PINAPPLE"

 

2. str_to_lower(string)

Converts the letters in a string to lowercase.

Example:

fruit <- c("APPLE", "BANANA", "PEAR", "PINAPPLE")

str_to_lower(fruit)

Output: "apple"    "banana"   "pear"     "pinapple"

 

3. str_to_title(string)

Converts the first letter of each word in a string to uppercase.

Example:

fruit <- c("apple", "banana", "pear", "pinapple")

str_to_title(fruit)

Output: "Apple"    "Banana"   "Pear"     "Pinapple"
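
Title case applies to every word in a string, not only the first. A minimal sketch, assuming a stringr version that provides the str_to_* functions is installed; the vector dishes is a made-up input, not from the original post:

```r
library(stringr)

# Hypothetical multi-word input, chosen only for illustration
dishes <- c("apple pie", "banana split")

upper <- str_to_upper(dishes)   # every letter uppercased
title <- str_to_title(dishes)   # first letter of each word uppercased

upper   # "APPLE PIE"    "BANANA SPLIT"
title   # "Apple Pie"    "Banana Split"
```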

 

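4. str_c(..., sep = "", collapse = NULL)

Joins strings. sep is the separator placed between the arguments as they are joined element by element; collapse, when it is not NULL, is the separator used to merge the joined elements into a single string.

Example:

fruit <- c("apple", "banana", "pear", "pinapple")

res <- str_c(1:4, fruit, sep = " ", collapse = " ")

Output (value of res):

"1 apple 2 banana 3 pear 4 pinapple"

The separator between "1" and "apple" is set by sep.

The separator between "apple" and "2" is set by collapse.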
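
Using two different separators makes the roles of sep and collapse easier to tell apart. A small sketch, assuming stringr is installed; the separator characters are arbitrary choices for illustration:

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pinapple")

# Without collapse: one output string per input element
paired <- str_c(1:4, fruit, sep = "-")
paired   # "1-apple" "2-banana" "3-pear" "4-pinapple"

# With collapse: the paired elements are merged into a single string
joined <- str_c(1:4, fruit, sep = "-", collapse = "; ")
joined   # "1-apple; 2-banana; 3-pear; 4-pinapple"
```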

 

 

5. str_length(string)

Returns the number of characters in each string.

Example:

str_length(fruit)

Output:

5 6 4 8

 

6. str_count(string, pattern = "")

Counts the number of matches of pattern in each string.

Example:

str_count(fruit, "a")

Output:

1 3 1 1
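
The pattern is treated as a regular expression, so a character class can count several letters at once. A minimal sketch, assuming stringr is installed; the two patterns are illustrative choices:

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pinapple")

# Count vowels with a character class
vowels <- str_count(fruit, "[aeiou]")
vowels   # 2 3 2 3

# Count a single letter; strings without a match give 0
p_count <- str_count(fruit, "p")
p_count  # 2 0 1 3
```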

 

7. str_sub(string, start = 1L, end = -1L)

Extracts a substring by position; start is the first position and end is the last position (inclusive).

Example:

str_sub(fruit, 1, 3)

Output:

"app" "ban" "pea" "pin"

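The default end = -1L hints that negative positions count back from the end of the string. A short sketch of that behaviour, assuming stringr is installed:

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pinapple")

# Last three characters: -3 is the third character from the end
tails <- str_sub(fruit, -3, -1)
tails   # "ple" "ana" "ear" "ple"

# Omitting end keeps everything from start to the end of the string
rest <- str_sub(fruit, 2)
rest    # "pple" "anana" "ear" "inapple"
```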
 

8. str_dup(string, times)

Repeats strings; times is the number of repetitions for each string.

Example:

str_dup(fruit, 1:4)

Output:

"apple" "bananabanana" "pearpearpear" "pinapplepinapplepinapplepinapple"

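Both arguments are vectorised, so a single string can also be repeated a varying number of times. A small sketch, assuming stringr is installed; the input "ab" is arbitrary:

```r
library(stringr)

# One string, several repetition counts; 0 gives an empty string
steps <- str_dup("ab", 0:2)
steps   # ""     "ab"   "abab"

# A simple divider line built from a repeated character
rule <- str_dup("-", 10)
rule    # "----------"
```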
 

9. str_trim(string, side = c("both", "left", "right"))

Removes whitespace from one end or both ends of a string; side selects which end is trimmed.

Example:

str_trim("  String with trailing and leading white space\t")

Output:

"String with trailing and leading white space"

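The effect of each side value is easiest to see on one padded string. A minimal sketch, assuming stringr is installed; the input x is made up for illustration:

```r
library(stringr)

# Hypothetical input with two spaces on each side
x <- "  text  "

left_trimmed  <- str_trim(x, side = "left")    # "text  "
right_trimmed <- str_trim(x, side = "right")   # "  text"
both_trimmed  <- str_trim(x)                   # "text" (side defaults to "both")
```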
 

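10. str_pad(string, width, side = c("left", "right", "both"), pad = " ")

Pads a string; width is the total length of the string after padding, side is where the padding is added, and pad is the single character used for padding.

Example:

str_pad(fruit, 10, side = "both", pad = "-")

Output:

"--apple---" "--banana--" "---pear---" "-pinapple-"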

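Left and right padding follow the same rules as side = "both". A short sketch, assuming stringr is installed; the vector ids is a made-up input showing a common zero-padding use:

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pinapple")

# Default side is "left", which right-aligns the text
left_padded <- str_pad(fruit, 10)

# Right padding with a visible character
right_padded <- str_pad(fruit, 10, side = "right", pad = ".")
right_padded   # "apple....." "banana...." "pear......" "pinapple.."

# Zero-padding identifiers; strings already at the width are unchanged
ids <- str_pad(c("1", "12", "123"), 3, pad = "0")
ids            # "001" "012" "123"
```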
 

II. Matching functions

1. str_detect(string, pattern)

Tests whether each string contains a match for pattern; returns TRUE if it does and FALSE otherwise.

Example:

str_detect(fruit, "n")

Output:

FALSE  TRUE FALSE  TRUE
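
Because the result is a logical vector, it can be used directly to subset the input. A minimal sketch, assuming stringr is installed; the anchored patterns are illustrative choices:

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pinapple")

# "^p" matches only strings that start with p
starts_p <- str_detect(fruit, "^p")
starts_p          # FALSE FALSE  TRUE  TRUE

# Keep only the matching elements
p_fruit <- fruit[starts_p]
p_fruit           # "pear" "pinapple"

# "e$" matches only strings that end with e
ends_e <- str_detect(fruit, "e$")
ends_e            # TRUE FALSE FALSE  TRUE
```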

 

2. str_locate(string, pattern)

Finds the position of the first match of pattern in each string; to find every match, use str_locate_all.

Example:

str_locate(fruit, "an")

Output:

     start end
[1,]    NA  NA
[2,]     2   3
[3,]    NA  NA
[4,]    NA  NA
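
The difference between the two functions shows up on a string with more than one match. A small sketch, assuming stringr is installed:

```r
library(stringr)

# str_locate reports only the first match: start 2, end 3
first_hit <- str_locate("banana", "an")

# str_locate_all returns a list with one matrix per input string;
# each matrix has one row per match
all_hits <- str_locate_all("banana", "an")
all_hits[[1]]
# two rows: (start 2, end 3) and (start 4, end 5)
```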

 

3. str_extract(string, pattern)

Extracts the first piece of each string that matches pattern; to extract every match, use str_extract_all.

Example:

str_extract(fruit, "[a-k]+")

Output:

"a"  "ba" "ea" "i"

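For comparison, str_extract_all with the same pattern returns every run of letters from a to k, as a list with one element per input string. A minimal sketch, assuming stringr is installed:

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pinapple")

# One character vector of matches per input string
all_runs <- str_extract_all(fruit, "[a-k]+")

all_runs[[1]]   # "a" "e"
all_runs[[2]]   # "ba" "a" "a"
all_runs[[3]]   # "ea"
all_runs[[4]]   # "i" "a" "e"
```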
 

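4. str_match(string, pattern)

Finds the part of each string that matches a given format and extracts it. str_match returns a character matrix holding the full match and each capture group; the example below calls str_extract with the same pattern, which returns only the full match.

Example:

strings <- c(" 219 733 8965", "329-293-8753 ", "banana", "595 794 7569", "387 287 6718", "apple", "233.398.9187  ", "482 952 3315", "239 923 8115 and 842 566 4692", "Work: 579-499-7527", "$1000", "Home: 543.355.3679")

phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"

str_extract(strings, phone)

Output:

"219 733 8965" "329-293-8753" NA "595 794 7569" "387 287 6718" NA "233.398.9187" "482 952 3315" "239 923 8115" "579-499-7527" NA "543.355.3679"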

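Calling str_match itself with the same phone pattern shows the capture groups as separate columns. A short sketch, assuming stringr is installed; the three inputs are a subset of the strings above:

```r
library(stringr)

phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"
samples <- c("329-293-8753", "banana", "Home: 543.355.3679")

# Column 1 is the full match; columns 2 to 4 are the three groups
m <- str_match(samples, phone)

m[1, ]   # "329-293-8753" "329" "293" "8753"
m[2, ]   # NA NA NA NA (no match)
m[3, ]   # "543.355.3679" "543" "355" "3679"
```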
 

5. str_replace(string, pattern, replacement)

Replaces the first match of pattern in each string with replacement; to replace every match, use str_replace_all.

Example:

fruits <- c("one apple", "two pears", "three bananas")

str_replace(fruits, "[aeiou]", "-")

Output:

"-ne apple"     "tw- pears"     "thr-e bananas"

Example:

str_replace_all(fruits, "[aeiou]", "-")

Output:

"-n- -ppl-"     "tw- p--rs"     "thr-- b-n-n-s"

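The replacement can also refer back to what was matched: \\1 stands for the first capture group. A minimal sketch, assuming stringr is installed; the patterns are illustrative choices:

```r
library(stringr)

fruits <- c("one apple", "two pears", "three bananas")

# Double the first vowel by repeating capture group 1
doubled <- str_replace(fruits, "([aeiou])", "\\1\\1")
doubled      # "oone apple" "twoo pears" "threee bananas"

# Replace every space with an underscore
snake <- str_replace_all(fruits, " ", "_")
snake        # "one_apple" "two_pears" "three_bananas"
```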
 

 

6. str_split_fixed(string, pattern, n)

Splits each string at the matches of pattern and returns a character matrix; the matched pattern is dropped from the result, and n is the number of pieces returned.

Example:

fruits <- c("apples and oranges and pears and bananas", "pineapples and mangos and guavas")

str_split_fixed(fruits, " and ", n = 5)

Output:

     [,1]         [,2]      [,3]     [,4]      [,5]
[1,] "apples"     "oranges" "pears"  "bananas" ""
[2,] "pineapples" "mangos"  "guavas" ""        ""
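
When n is smaller than the number of possible pieces, the unsplit remainder stays in the last column; str_split, by contrast, returns a list whose elements can differ in length. A small sketch, assuming stringr is installed:

```r
library(stringr)

fruits <- c("apples and oranges and pears and bananas",
            "pineapples and mangos and guavas")

# n = 2: split once, keep the rest of the string in column 2
two_cols <- str_split_fixed(fruits, " and ", n = 2)
two_cols[1, ]   # "apples" "oranges and pears and bananas"
two_cols[2, ]   # "pineapples" "mangos and guavas"

# str_split: one character vector per input, no padding with ""
pieces <- str_split(fruits, " and ")
lengths(pieces) # 4 3
```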

 

 

References

https://journal.r-project.org/archive/2010-2/RJournal_2010-2_Wickham.pdf

https://cran.r-project.org/web/packages/stringr/stringr.pdf

 




https://blog.sciencenet.cn/blog-2379401-940888.html
