R package: stringr
Author: Hadley Wickham
Purpose: string processing
I. Basic functions
1. str_to_upper(string)
Converts the letters in a string to upper case.
Example:
fruit <- c("apple", "banana", "pear", "pinapple")
str_to_upper(fruit)
Output:
"APPLE" "BANANA" "PEAR" "PINAPPLE"
2. str_to_lower(string)
Converts the letters in a string to lower case.
Example:
fruit <- c("APPLE", "BANANA", "PEAR", "PINAPPLE")
str_to_lower(fruit)
Output:
"apple" "banana" "pear" "pinapple"
3. str_to_title(string)
Converts the first letter of each word to upper case and the remaining letters to lower case.
Example:
fruit <- c("apple", "banana", "pear", "pinapple")
str_to_title(fruit)
Output:
"Apple" "Banana" "Pear" "Pinapple"
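The three case functions above are easiest to compare on a string with several words and mixed case. The following is a minimal sketch; the input phrase is a made-up value and does not come from the examples above.

```r
library(stringr)

# Hypothetical multi-word input, chosen only for illustration
phrase <- "one APPLE and two pears"

upper <- str_to_upper(phrase)  # every letter becomes upper case
lower <- str_to_lower(phrase)  # every letter becomes lower case
title <- str_to_title(phrase)  # each word: first letter upper, rest lower

print(c(upper, lower, title))
```

Note that str_to_title also lowers the letters that follow the first one, so "APPLE" becomes "Apple".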
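4. str_c(..., sep = "", collapse = NULL)
Joins strings. sep is the separator placed between the pieces that are pasted together element by element; collapse, when it is not NULL, is the separator used to join the resulting strings into a single string.
Example:
fruit <- c("apple", "banana", "pear", "pinapple")
res <- str_c(1:4, fruit, sep = " ", collapse = " ")
res
Output:
"1 apple 2 banana 3 pear 4 pinapple"
The separator between "1" and "apple" is set by sep.
The separator between "apple" and "2" is set by collapse.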
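Because the example above uses a space for both sep and collapse, the two roles are hard to tell apart in the output. The sketch below uses two different separators (chosen arbitrarily for illustration) so that each one is visible.

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pinapple")

# sep only: the two vectors are pasted element by element,
# so the result is still a vector of length 4
pairs <- str_c(1:4, fruit, sep = "-")
print(pairs)

# sep plus collapse: the four pasted strings are then joined
# into one single string, separated by ", "
joined <- str_c(1:4, fruit, sep = "-", collapse = ", ")
print(joined)
```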
5. str_length(string)
Returns the number of characters in each string.
Example:
str_length(fruit)
Output:
5 6 4 8
6. str_count(string, pattern = "")
Counts the number of matches of pattern in each string.
Example:
str_count(fruit, "a")
Output:
1 3 1 1
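The pattern passed to str_count is interpreted as a regular expression, so it can describe a class of characters rather than one literal character. A minimal sketch, assuming the same fruit vector as above and a vowel class picked for illustration:

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pinapple")

# Count how many vowels each string contains
vowels <- str_count(fruit, "[aeiou]")
print(vowels)
```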
7. str_sub(string, start = 1L, end = -1L)
Extracts a substring by position; start is the first position and end is the last position.
Example:
str_sub(fruit, 1, 3)
Output:
"app" "ban" "pea" "pin"
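The default end = -1L in the signature hints that negative positions count backwards from the end of the string. The following sketch, using the same fruit vector, shows how this can be used to take the last few characters.

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pinapple")

# Negative positions count from the end: -3 is the third-last character
last_three <- str_sub(fruit, -3, -1)
print(last_three)

# Omitting end keeps everything from start to the end of the string
from_second <- str_sub(fruit, 2)
print(from_second)
```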
8. str_dup(string, times)
Repeats strings; times is the number of repetitions for each string.
Example:
str_dup(fruit, 1:4)
Output:
"apple" "bananabanana" "pearpearpear" "pinapplepinapplepinapplepinapple"
9. str_trim(string, side = c("both", "left", "right"))
Removes whitespace from one end or both ends of a string; side selects which end is trimmed.
Example:
str_trim("  String with trailing and leading white space  ")
Output:
"String with trailing and leading white space"
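10. str_pad(string, width, side = c("left", "right", "both"), pad = " ")
Pads a string. width is the minimum total length after padding, side is the side on which the padding is added, and pad is the single character used for padding.
Example:
str_pad(fruit, 10, side = "both", pad = "-")
Output:
"--apple---" "--banana--" "---pear---" "-pinapple-"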
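A common use of padding on one side is to bring numeric labels to a fixed width. The sketch below is a small illustration with made-up ids; it also shows that a string already longer than width is returned unchanged.

```r
library(stringr)

# Hypothetical ids, chosen only for illustration
ids <- c("7", "42", "365")

# Pad on the left with zeros up to a total width of 3
padded <- str_pad(ids, 3, side = "left", pad = "0")
print(padded)

# width is a minimum: longer strings are not truncated
unchanged <- str_pad("banana", 3, side = "left", pad = "0")
print(unchanged)
```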
II. Matching functions
1. str_detect(string, pattern)
Checks whether each string contains a match for pattern; returns TRUE if it does and FALSE otherwise.
Example:
str_detect(fruit, "n")
Output:
FALSE TRUE FALSE TRUE
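Since str_detect returns a logical vector, it can be used directly to filter or to count matching elements. A minimal sketch with the same fruit vector:

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pinapple")

has_n <- str_detect(fruit, "n")

# Keep only the elements that contain "n"
with_n <- fruit[has_n]
print(with_n)

# Count how many elements contain "n"
n_matches <- sum(has_n)
print(n_matches)

# str_subset() is a shortcut for the filtering step
same <- str_subset(fruit, "n")
```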
2. str_locate(string, pattern)
Finds the position of the first match of pattern in each string; to locate every match, use str_locate_all.
Example:
str_locate(fruit, "an")
Output:
     start end
[1,]    NA  NA
[2,]     2   3
[3,]    NA  NA
[4,]    NA  NA
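In the output above, "banana" reports only positions 2 to 3 even though "an" occurs twice. The sketch below shows str_locate_all on that one string; it returns a list with one matrix per input string, and each row of the matrix is one match.

```r
library(stringr)

# One input string, so the result is a list of length 1
loc <- str_locate_all("banana", "an")

# Each row is one match: positions 2-3 and 4-5
print(loc[[1]])
```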
3. str_extract(string, pattern)
Extracts the first part of each string that matches pattern; to extract every match, use str_extract_all.
Example:
str_extract(fruit, "[a-k]+")
Output:
"a" "ba" "ea" "i"
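For comparison, the sketch below runs str_extract_all with the same pattern on the same fruit vector. It returns a list with one character vector per input string, holding every match rather than only the first.

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pinapple")

# Every run of letters in the range a-k, for each string
all_runs <- str_extract_all(fruit, "[a-k]+")
print(all_runs)
```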
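4. str_match(string, pattern)
Matches strings against a pattern of a given form and extracts the matching part. str_match returns the full match together with each parenthesized group; the example here calls str_extract, which returns only the full match.
Example:
strings <- c(" 219 733 8965", "329-293-8753 ", "banana", "595 794 7569", "387 287 6718", "apple", "233.398.9187 ", "482 952 3315", "239 923 8115 and 842 566 4692", "Work:579-499-7527", "$1000", "Home: 543.355.3679")
phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"
str_extract(strings, phone)
Output:
"219 733 8965" "329-293-8753" NA "595 794 7569" "387 287 6718" NA "233.398.9187" "482 952 3315" "239 923 8115" "579-499-7527" NA "543.355.3679"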
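To show what str_match itself returns for this kind of pattern, the sketch below applies the same phone regular expression to three of the strings. The result is a character matrix: the first column is the full match and the following columns are the parenthesized groups; a string without a match gives a row of NA.

```r
library(stringr)

# A shortened sample of the strings used above
sample_strings <- c("329-293-8753 ", "banana", "Home: 543.355.3679")
phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"

m <- str_match(sample_strings, phone)

# Column 1: full match; columns 2-4: the three captured groups
print(m)
```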
5. str_replace(string, pattern, replacement)
Replaces the first match of pattern in each string with replacement; to replace every match, use str_replace_all.
Example:
fruits <- c("one apple", "two pears", "three bananas")
str_replace(fruits, "[aeiou]", "-")
Output:
"-ne apple" "tw- pears" "thr-e bananas"
Example:
str_replace_all(fruits, "[aeiou]", "-")
Output:
"-n- -ppl-" "tw- p--rs" "thr-- b-n-n-s"
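6. str_split_fixed(string, pattern, n)
Splits each string at the matches of pattern; the matched text itself is dropped. n is the number of pieces to return, and the result is a character matrix with n columns, padded with "" where a string has fewer pieces.
Example:
fruits <- c("apples and oranges and pears and bananas", "pineapples and mangos and guavas")
str_split_fixed(fruits, " and ", n = 5)
Output:
     [,1]         [,2]      [,3]     [,4]      [,5]
[1,] "apples"     "oranges" "pears"  "bananas" ""
[2,] "pineapples" "mangos"  "guavas" ""        ""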
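When the number of pieces is not known in advance, str_split can be used instead of str_split_fixed. The sketch below uses the same two strings; str_split returns a list with one character vector per input string, so no "" padding is needed.

```r
library(stringr)

fruits <- c("apples and oranges and pears and bananas",
            "pineapples and mangos and guavas")

# One character vector per input string, of varying length
pieces <- str_split(fruits, " and ")
print(pieces)

# Number of pieces obtained from each string
piece_counts <- lengths(pieces)
print(piece_counts)
```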
References
https://journal.r-project.org/archive/2010-2/RJournal_2010-2_Wickham.pdf
https://cran.r-project.org/web/packages/stringr/stringr.pdf