String Handling in R (2/3)


2023-10-26 20:17 | Source: compiled from the web

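This post is reposted from the article "R中字符串处理:函数实现" (String Processing in R: Function Implementations) by the WeChat official account 一遇之见. The original is long, so it is studied here in three parts.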

String-splitting functions: strsplit, str_split and str_split_fixed

strsplit, str_split and str_split_fixed all split strings. strsplit and str_split return a list, whereas str_split_fixed returns a character matrix.

    # Note: the third string ends with two spaces
    fruits = c("Small Yellow Banana", " Red Apple", "Big Sweet Pear  ", "Sour PineApple")

    strsplit(fruits, " ")
    # [[1]]
    # [1] "Small"  "Yellow" "Banana"
    # [[2]]
    # [1] ""      "Red"   "Apple"
    # [[3]]
    # [1] "Big"   "Sweet" "Pear"  ""      # there are actually two trailing spaces here
    # [[4]]
    # [1] "Sour"      "PineApple"

    library(stringr)
    str_split(fruits, " ")
    # [[1]]
    # [1] "Small"  "Yellow" "Banana"
    # [[2]]
    # [1] ""      "Red"   "Apple"
    # [[3]]
    # [1] "Big"   "Sweet" "Pear"  ""      ""      # this function picks up both trailing spaces
    # [[4]]
    # [1] "Sour"      "PineApple"

    str_split_fixed(fruits, " ", n = 3)
    #      [,1]    [,2]        [,3]
    # [1,] "Small" "Yellow"    "Banana"
    # [2,] ""      "Red"       "Apple"
    # [3,] "Big"   "Sweet"     "Pear  "
    # [4,] "Sour"  "PineApple" ""
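The difference in how trailing separators are handled can be checked in base R alone. The sketch below is only an illustration, and the variable names `x`, `pieces` and `lead` are made up for it: strsplit keeps an empty string for a separator at the start of the input, but does not emit a final empty string when the input ends with a separator.

```r
# Base R only; `x` is an example string that ends with two spaces.
x <- "Big Sweet Pear  "

# Two trailing separators yield only ONE trailing empty string,
# because strsplit does not emit a final empty piece.
pieces <- strsplit(x, " ", fixed = TRUE)[[1]]
print(pieces)          # "Big" "Sweet" "Pear" ""
print(length(pieces))  # 4

# A leading separator, by contrast, does produce a leading empty string.
lead <- strsplit(" Red Apple", " ", fixed = TRUE)[[1]]
print(lead)            # "" "Red" "Apple"
```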

unlist flattens the list returned by strsplit or str_split into a single character vector.

    unlist(strsplit(fruits, " "))
    #  [1] "Small"     "Yellow"    "Banana"    ""          "Red"       "Apple"
    #  [7] "Big"       "Sweet"     "Pear"      ""          "Sour"      "PineApple"

    unlist(str_split(fruits, " "))
    #  [1] "Small"     "Yellow"    "Banana"    ""          "Red"       "Apple"
    #  [7] "Big"       "Sweet"     "Pear"      ""          ""          "Sour"
    # [13] "PineApple"
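Flattening with unlist discards the information about which pieces came from which input string, and the empty strings produced by leading or trailing spaces stay in the vector. As a small sketch in base R (the names `pieces`, `words` and `words_clean` are chosen here for illustration), lengths() records the piece count per input before flattening, and nzchar() filters out the empty strings afterwards.

```r
# Same data as in the article; the third string ends with two spaces.
fruits <- c("Small Yellow Banana", " Red Apple", "Big Sweet Pear  ", "Sour PineApple")

pieces <- strsplit(fruits, " ", fixed = TRUE)

# Number of pieces per input string, recorded before the grouping is lost.
print(lengths(pieces))   # 3 3 4 2

# Flatten, then keep only the non-empty strings.
words <- unlist(pieces)
words_clean <- words[nzchar(words)]
print(words_clean)
```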

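Of the three splitting functions, only str_split_fixed returns a matrix, which makes it easy to refer to individual pieces by column later on. Both str_split and str_split_fixed take an argument n. In str_split it is optional (the default is Inf) and the result is still a list; in str_split_fixed it must be supplied. When n is smaller than the number of pieces available, splitting stops once n pieces exist and the rest of the string is kept unsplit in the last piece. When n is larger than the number of pieces, str_split simply returns the pieces it finds, while str_split_fixed pads the remaining columns with empty strings.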
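For comparison, base R's strsplit has no counterpart to n. One way to imitate the n = 2 case (split at the first separator only) is to combine regexpr with regmatches(..., invert = TRUE). This is a sketch of a base R idiom, not how stringr is implemented, and the names `first_split` and `mat` are made up for it.

```r
fruits <- c("Small Yellow Banana", " Red Apple", "Sour PineApple")

# regexpr() locates only the FIRST space in each string;
# invert = TRUE returns the text before and after that match.
first_split <- regmatches(fruits, regexpr(" ", fruits), invert = TRUE)
print(first_split)

# Every element has exactly two pieces here, so the list can be
# bound row-wise into a matrix, similar in shape to str_split_fixed(n = 2).
mat <- do.call(rbind, first_split)
print(mat)
```

A string that contains no separator at all would come back as a single piece, so the rbind step assumes every input has at least one space.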

    str_split(fruits, " ", n = 2)
    # [[1]]
    # [1] "Small"         "Yellow Banana"
    # [[2]]
    # [1] ""          "Red Apple"
    # [[3]]
    # [1] "Big"          "Sweet Pear  "
    # [[4]]
    # [1] "Sour"      "PineApple"

    str_split(fruits, " ", n = 5)
    # [[1]]
    # [1] "Small"  "Yellow" "Banana"
    # [[2]]
    # [1] ""      "Red"   "Apple"
    # [[3]]
    # [1] "Big"   "Sweet" "Pear"  ""      ""
    # [[4]]
    # [1] "Sour"      "PineApple"

    str_split_fixed(fruits, " ", n = 3)
    #      [,1]    [,2]        [,3]
    # [1,] "Small" "Yellow"    "Banana"
    # [2,] ""      "Red"       "Apple"
    # [3,] "Big"   "Sweet"     "Pear  "
    # [4,] "Sour"  "PineApple" ""

    str_split_fixed(fruits, " ", n = 2)
    #      [,1]    [,2]
    # [1,] "Small" "Yellow Banana"
    # [2,] ""      "Red Apple"
    # [3,] "Big"   "Sweet Pear  "
    # [4,] "Sour"  "PineApple"

    str_split_fixed(fruits, " ", n = 5)
    #      [,1]    [,2]        [,3]     [,4] [,5]
    # [1,] "Small" "Yellow"    "Banana" ""   ""
    # [2,] ""      "Red"       "Apple"  ""   ""
    # [3,] "Big"   "Sweet"     "Pear"   ""   ""
    # [4,] "Sour"  "PineApple" ""       ""   ""

String extraction

substr(x, start, stop): extracts the substring of x running from position start to position stop.

substring(text, first, last = 1000000L): extracts the substring of text running from first to last. last defaults to 1000000, so it can be omitted.

str_sub(x, start = 1L, end = -1L): extracts the substring of x running from start to end. Both start and end have defaults, so they can be omitted.
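The three signatures above can be tried on a short string. The example below is a minimal sketch: the value of `txt` is invented here for illustration, and the str_sub call is guarded so the snippet still runs where stringr is not installed.

```r
txt <- "Small Yellow Banana"   # illustrative value, 19 characters long

# substr needs both start and stop.
mid <- substr(txt, 7, 12)
print(mid)                     # "Yellow"

# substring can omit `last`, which then runs to the end of the string.
tail_part <- substring(txt, 14)
print(tail_part)               # "Banana"

# substring recycles over several first/last pairs ...
parts <- substring(txt, c(1, 7, 14), c(5, 12, 19))
print(parts)                   # "Small" "Yellow" "Banana"

# ... whereas substr returns one result per element of x.
one <- substr(txt, c(1, 7), c(5, 12))
print(one)                     # "Small"

# str_sub accepts negative positions, counted from the end of the string.
if (requireNamespace("stringr", quietly = TRUE)) {
  print(stringr::str_sub(txt, -6, -1))   # "Banana"
}
```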

txt 1 and only the first element will be used

    # The results and warnings below come from an older stringr. Since stringr 1.5.0,
    # inputs whose lengths cannot be recycled (3 strings against 2 or 4 patterns)
    # raise an error instead of a warning.
    str_replace(c("HacHgd", "aeHfgH", "defg"), c("H", "a"), c("I", "b"))
    ## [1] "IacHgd" "beHfgH" "defg"
    ## Warning in stri_replace_first_regex(string, pattern,
    ## fix_replacement(replacement), : longer object length is not a multiple of
    ## shorter object length

    gsub(c("H", "a"), c("I", "b"), c("HacHgd", "aeHfgH", "defg"))
    ## [1] "IacIgd" "aeIfgI" "defg"
    ## Warning in gsub(c("H", "a"), c("I", "b"), c("HacHgd", "aeHfgH", "defg")):
    ## argument 'pattern' has length > 1 and only the first element will be used
    ## Warning in gsub(c("H", "a"), c("I", "b"), c("HacHgd", "aeHfgH", "defg")):
    ## argument 'replacement' has length > 1 and only the first element will be used

    str_replace_all(c("HacHgd", "aeHfgH", "defg"), c("H", "a"), c("I", "b"))
    ## [1] "IacIgd" "beHfgH" "defg"
    ## Warning in stri_replace_all_regex(string, pattern,
    ## fix_replacement(replacement), : longer object length is not a multiple of
    ## shorter object length

    # the result now has length 4
    str_replace(c("HacHgd", "aeHfgH", "defg"), c("H", "a", "g", "d"), c("I", "b", "H", "e"))
    ## [1] "IacHgd" "beHfgH" "defH"   "HacHge"
    ## Warning in stri_replace_first_regex(string, pattern,
    ## fix_replacement(replacement), : longer object length is not a multiple of
    ## shorter object length

    # the result now has length 4
    str_replace_all(c("HacHgd", "aeHfgH", "defg"), c("H", "a", "g", "d"), c("I", "b", "H", "e"))
    ## [1] "IacIgd" "beHfgH" "defH"   "HacHge"
    ## Warning in stri_replace_all_regex(string, pattern,
    ## fix_replacement(replacement), : longer object length is not a multiple of
    ## shorter object length

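In addition, str_replace_all can apply several replacements to every string in one call when it is given a named vector whose names are the patterns and whose values are the replacements (str_replace does not offer this).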

    y = c("I", "b")
    names(y) = c("H", "a")
    str_replace_all(c("HacHgd", "aeHfgH", "defg"), y)
    ## [1] "IbcIgd" "beIfgI" "defg"
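The result above is consistent with the pairs being applied one after another: first every H becomes I, then every a becomes b. A rough base R imitation of that behaviour is sketched below. The helper name `replace_in_order` is hypothetical, and it uses fixed (literal) matching, whereas stringr treats the names as regular expressions.

```r
x <- c("HacHgd", "aeHfgH", "defg")
y <- c(H = "I", a = "b")   # names are patterns, values are replacements

# Hypothetical helper: apply each pattern/replacement pair in turn,
# feeding the output of one step into the next.
replace_in_order <- function(strings, map) {
  for (pattern in names(map)) {
    strings <- gsub(pattern, map[[pattern]], strings, fixed = TRUE)
  }
  strings
}

result <- replace_in_order(x, y)
print(result)   # "IbcIgd" "beIfgI" "defg"
```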

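The multiple-replacement mode of str_replace_all can produce surprising results, because the replacements are applied one after another, so a later pattern can match text that an earlier replacement has just produced. mgsub::mgsub performs the substitutions simultaneously and therefore gives a different result.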

    y = c("a", "H")
    names(y) = c("H", "a")
    str_replace_all(c("HacHgd", "aeHfgH", "defg"), y)
    ## [1] "HHcHgd" "HeHfgH" "defg"

    mgsub::mgsub(c("HacHgd", "aeHfgH", "defg"), c("H", "a"), c("a", "H"))
    ## [1] "aHcagd" "Heafga" "defg"
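Both results can be reproduced in base R, which may help to show where they diverge. This is only a sketch and not how either package is implemented: nested gsub calls imitate the one-after-another behaviour, and chartr, which translates single characters in one pass, imitates the simultaneous swap. chartr works only for single-character swaps like this one.

```r
x <- c("HacHgd", "aeHfgH", "defg")

# Sequential: H -> a first, then a -> H. The second step also rewrites
# the a's that the first step has just created.
sequential <- gsub("a", "H", gsub("H", "a", x, fixed = TRUE), fixed = TRUE)
print(sequential)     # "HHcHgd" "HeHfgH" "defg"

# Simultaneous: each character is translated once, based on the original text.
simultaneous <- chartr("Ha", "aH", x)
print(simultaneous)   # "aHcagd" "Heafga" "defg"
```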

