package :: stringr

Published by onesixx on

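Overview

A way to handle strings: string manipulation functions built on top of stringi.

Features

1. Factors and character vectors are handled the same way.

2. Consistent function names and arguments
– Every stringr function starts with str_.
– The first argument is always the vector of strings, which makes piping with %>% easy.
– Outputs are easy to feed into other functions: a zero-length input returns a zero-length result.
– If the input contains NA, the corresponding element of the result is NA.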

library(stringr)  # stringr also re-exports the %>% pipe
letters %>% .[1:10] %>% str_pad(3, "right") %>% str_c(letters[2:11])
# [1] "a  b" "b  c" "c  d" "d  e" "e  f" "f  g" "g  h" "h  i" "i  j" "j  k" 

3. Simplified by deliberately dropping string operations that are rarely used.

paste("Hello", c("Jared", "Bob", "David"), c("Goodbye", "Seeya"))
#[1] "Hello Jared Goodbye" "Hello Bob Seeya"     "Hello David Goodbye" 
waitTime <- 25
sprintf("Hello %s, your party of %s will be seated in %s minutes",
        c("Jared", "Bob"), c("eight", 16, "four", 10), waitTime)
#[1] "Hello Jared, your party of eight will be seated in 25 minutes" 
#[2] "Hello Bob, your party of 16 will be seated in 25 minutes"      
#[3] "Hello Jared, your party of four will be seated in 25 minutes"  
#[4] "Hello Bob, your party of 10 will be seated in 25 minutes"
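For comparison, the closest stringr counterpart of the paste() call above is str_c(). Like paste0(), it has no default separator. This is a minimal sketch that assumes stringr is installed. One hedge: as far as I know, stringr 1.5.0 and later follow the tidyverse recycling rules. Under those rules, mixing a length-3 and a length-2 vector, as the paste() example does, is expected to raise an error rather than silently recycle. The call is therefore wrapped in tryCatch().

```r
library(stringr)

# Length-1 inputs are recycled against the length-3 vector
greetings <- str_c("Hello ", c("Jared", "Bob", "David"), " Goodbye")
greetings
# [1] "Hello Jared Goodbye" "Hello Bob Goodbye"   "Hello David Goodbye"

# Lengths 3 and 2 do not recycle cleanly; catch the error so the script keeps running
mixed <- tryCatch(
  str_c("Hello ", c("Jared", "Bob", "David"), " ", c("Goodbye", "Seeya")),
  error = function(e) conditionMessage(e)
)
mixed
```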

Sample strings (character vectors bundled with stringr)

> sentences
  [1] "The birch canoe slid on the smooth planks."                "Glue the sheet to the dark blue background."              
  [3] "It's easy to tell the depth of a well."                    ...
[719] "She called his name many times."                           "When you hear the bell, come quickly."                    

> fruit
 [1] "apple"             "apricot"           "avocado"           "banana"            ...
[78] "tangerine"         "ugli fruit"        "watermelon"       

> words
  [1] "a"           "able"        "about"       "absolute"    "accept"      "account"     ...
[980] "young"      
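The snippet below is a quick way to inspect these bundled datasets, assuming stringr is installed. The lengths in the comments come from the index numbers in the printouts above ([719] with two entries, [78] with three, [980] with one).

```r
library(stringr)

# sentences, fruit and words ship with stringr as plain character vectors
length(sentences)   # 720
length(fruit)       # 80
length(words)       # 980
head(fruit, 3)      # "apple" "apricot" "avocado"

# Handy as test input for the pattern functions
str_subset(fruit, "^a")
```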

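Pattern matching engines

Wrap a pattern in a modifier function to control how it is matched.

fixed(): match exact bytes
coll(): match human letters (locale-aware)
boundary(): match boundaries (character, line break, sentence, word)
regex(): regular expression (the default when the pattern is a plain string)
ignore.case(): deprecated; use regex(pattern, ignore_case = TRUE) instead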
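The sketch below shows how the modifiers change matching. It reuses the bananas vector from the example that follows and assumes stringr is installed. Only regex(), fixed() and boundary() are shown.

```r
library(stringr)

bananas <- c("banana", "Banana", "BANANA")

# A plain string is treated as regex(): case sensitive by default
str_detect(bananas, "banana")                             # TRUE FALSE FALSE
str_detect(bananas, regex("banana", ignore_case = TRUE))  # TRUE TRUE TRUE

# fixed(): "." is a literal dot, not "any character"
str_detect(c("a.b", "abc"), fixed("."))  # TRUE FALSE
str_detect(c("a.b", "abc"), ".")         # TRUE TRUE

# boundary("word") splits on word boundaries instead of a literal separator
str_split("The birch canoe", boundary("word"))[[1]]  # "The" "birch" "canoe"
str_count("The birch canoe", boundary("word"))       # 3
```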

Example:

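library(stringr)

bananas <- c("banana", "Banana", "BANANA")
fruit   <- c("apple", "banana", "pear", "pineapple", "사과")  # masks stringr's fruit dataset
patr <- "\\w{6}$"   # six word characters at the end of the string

str_count(fruit, patr)    # 0 1 0 1 0
str_detect(fruit, patr)   # FALSE TRUE FALSE TRUE FALSE

str_extract_all(fruit, patr)   # list; "banana" and "eapple" are the only matches
str_match_all(fruit, patr)     # list of matrices

str_locate_all(fruit, patr)    # list of start/end matrices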

String Manipulation

str_*(string, ...)

stringr function, description, and base R equivalent:

str_length
  Length of a string (number of characters).
  str_length(string)
  Base: nchar()

str_c
  Concatenate several strings into one.
  str_c(..., sep = "", collapse = NULL)
  - sep: separator placed between the input strings
  - collapse: if not NULL, joins the resulting character vector into a single string using this separator
  Base: paste(), paste0()

str_sub
  Extract substrings from a string, or replace them.
  str_sub(string, start = 1L, end = -1L)
  str_sub(string, start = 1L, end = -1L, omit_na = FALSE) <- value
  e.g. str_sub(str, start = 1, end = 6)
  Base: substr()
str_sub(string, start, end) examples

> "123456" %>% str_sub(start = 2, end = 5)
[1] "2345"
> "123456" %>% str_sub(2, 2)
[1] "2"
>
> "123456" %>% str_sub(start = 2)    # end defaults to the end of the string
[1] "23456"
> "123456" %>% str_sub(start = -2)
[1] "56"
>
> "123456" %>% str_sub(end = 2)      # start defaults to the first character
[1] "12"
> "123456" %>% str_sub(end = -2)
[1] "12345"
>
> # Negative values count positions back from the end of the string,
> # but the substring is always read left to right.
> "123456" %>% str_sub(-5, 3)        # -5 is position 2, so positions 2 to 3
[1] "23"
> "123456" %>% str_sub(5, -3)        # -3 is position 4; start > end gives ""
[1] ""
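The replacement form str_sub(...) <- value listed above has no example yet, so here is a minimal sketch, assuming stringr is installed.

```r
library(stringr)

x <- "123456"
str_sub(x, 1, 2) <- "ab"    # overwrite positions 1-2
x                           # "ab3456"

str_sub(x, -2) <- "XYZ"     # the replacement may be longer than the span it replaces
x                           # "ab34XYZ"

# str_sub() is vectorised over string, start and end
str_sub(c("abc", "def"), 2, 2)   # "b" "e"
```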

cf. glue()

library(lubridate)  # ymd(), year()
library(glue)       # glue()
library(magrittr)   # %>%

date <- "20150205" %>% ymd()
year <- year(date)
url <- glue("http://cran-logs.rstudio.com/{year}/{date}.csv.gz")

# http://cran-logs.rstudio.com/2015/2015-02-05.csv.gz
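stringr also wraps glue as str_glue(), so the URL from the glue() example above can be built without attaching glue yourself. This minimal sketch assumes stringr is installed. To keep it self-contained, it swaps lubridate's ymd()/year() for base R's as.Date() and format(), an assumption on my part rather than the original approach.

```r
library(stringr)

date <- as.Date("2015-02-05")
year <- format(date, "%Y")
url  <- str_glue("http://cran-logs.rstudio.com/{year}/{date}.csv.gz")
url
# http://cran-logs.rstudio.com/2015/2015-02-05.csv.gz
```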

Pattern matching

RegEx
https://rstudio.com/wp-content/uploads/2016/09/RegExCheatsheet.pdf

str_*(string, pattern, ...)

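stringr function, description, result, and base R equivalent:

str_detect
  Detect the presence or absence of a pattern in a string; use it to keep strings that match a pattern or to find their positions.
  Matching is case sensitive; add (?i) to the pattern to ignore case, e.g. for a data frame dd with a name column:
  dd[str_detect(dd$name, "(?i)korea"), ]
  Result: logical (TRUE/FALSE). Base: grepl(pattern, x)

str_subset
  Wrapper around x[str_detect(x, pattern)].
  Result: vector. Base: grep(pattern, x, value = TRUE)

str_which
  Wrapper around str_detect(x, pattern) %>% which().
  Result: index. Base: grep(pattern, x)

str_count
  Count the number of matches in a string. Similar to str_length(), but counts only the matches of a pattern.
  Result: vector

str_extract, str_extract_all
  Extract matching patterns from a string.
  Result: vector (str_extract_all returns a list)

str_match, str_match_all
  Extract matched groups from a string as a matrix: column 1 holds the full match (the result of str_extract(string, pattern)), and each following column holds what the corresponding parenthesized group matched.
  Result: matrix (str_match_all returns a list of matrices)

str_locate, str_locate_all
  Locate the position of patterns in a string.
  Result: start, end

str_replace, str_replace_all
  Replace matched patterns in a string.
  Base: sub(pattern, replacement, x), gsub()

str_replace_na
  Turn NA into "NA".

str_split, str_split_fixed
  Split up a string into pieces; n sets the maximum number of pieces.
  Result: list (str_split_fixed returns a matrix). Base: strsplit(x, pattern)

str_view, str_view_all
  View an HTML rendering of regular expression matches.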
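The detection functions above can be set side by side with their base R equivalents. This is a minimal sketch on a small made-up vector, assuming stringr is installed.

```r
library(stringr)

x <- c("apple", "banana", "pear", "pineapple")

str_detect(x, "apple")           # TRUE FALSE FALSE TRUE
grepl("apple", x)                # base R, same result

str_subset(x, "apple")           # "apple" "pineapple"
grep("apple", x, value = TRUE)   # base R, same result

str_which(x, "apple")            # 1 4
grep("apple", x)                 # base R, same result

str_count(x, "p")                # 2 0 1 3

# (?i) switches on case-insensitive matching inside the pattern
str_detect(c("Korea", "KOREA", "Japan"), "(?i)korea")  # TRUE TRUE FALSE
```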
> str_replace_all("sixx123", "[[:digit:]]","")
[1] "sixx"
> str_replace("sixx123", "[[:digit:]]","")
[1] "sixx23"
> str_replace_all("sixx123", "[^[:digit:]]","")
[1] "123"

> gsub("[[:digit:]]","","sixx123")
[1] "sixx"
> gsub("[^[:digit:]]", "", "sixx123")
[1] "123"
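Capture groups, splitting and NA handling are easier to see on concrete input. The sketch below uses a hypothetical dates vector and assumes stringr is installed.

```r
library(stringr)

dates <- c("2015-02-05", "no date here", "2020-12-31")

# Column 1: the full match; columns 2-4: the three parenthesised groups
m <- str_match(dates, "(\\d{4})-(\\d{2})-(\\d{2})")
m

str_extract(dates, "\\d{4}")   # "2015" NA "2020"
str_locate(dates, "\\d{4}")    # start/end matrix, NA for the row without a match

str_split("a,b,c,d", ",", n = 2)         # list with one element: "a" "b,c,d"
str_split_fixed("a,b,c,d", ",", n = 2)   # 1 x 2 character matrix

str_replace_na(c("x", NA))     # "x" "NA"
```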

Formatting (Whitespace)

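stringr function, description, and usage:

str_pad
  Pad a string: widen it to width characters, filling the extra space on side with the character given in pad.
  str_pad(string, width, side = c("left", "right", "both"), pad = " ")

str_trunc
  Truncate a character string: shorten it to width characters (including the ellipsis) and put ellipsis on side.
  str_trunc(string, width, side = c("right", "left", "center"), ellipsis = "...")

str_trim
  Trim whitespace from the start and/or end of a string.
  str_trim(string, side = c("both", "left", "right"))

str_squish
  Remove whitespace from both ends and collapse each internal run of whitespace into a single space.

str_wrap
  Wrap a string to the given width. indent sets the left margin of the first line; exdent sets the left margin of the remaining lines.
  str_wrap(string, width = 80, indent = 0, exdent = 0)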
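The snippet below is a minimal sketch of the whitespace helpers, assuming stringr is installed. The str_wrap() output is printed with cat() so the line breaks are visible.

```r
library(stringr)

str_pad("7", width = 3, pad = "0")                 # "007"
str_pad("ab", width = 4, side = "right")           # "ab  "
str_trunc("This is a long string", width = 10)     # "This is..."
str_trim("  hello  ")                              # "hello"
str_trim("  hello  ", side = "left")               # "hello  "
str_squish("  too   many    spaces ")              # "too many spaces"

cat(str_wrap("The birch canoe slid on the smooth planks.",
             width = 20, indent = 2, exdent = 4), "\n")
```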

Locale sensitive

stringr function and description:

str_order, str_sort
  Order or sort a character vector (using the collation rules of a locale).

str_to_upper, str_to_lower, str_to_title
  Convert the case of a string.
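The sketch below shows sorting and case conversion, assuming stringr is installed. The Turkish example is there because the dotless/dotted i is the classic case where the locale argument changes the result.

```r
library(stringr)

x <- c("banana", "apple", "cherry")
str_sort(x)                      # "apple" "banana" "cherry"
str_order(x)                     # 2 1 3
str_sort(x, decreasing = TRUE)   # "cherry" "banana" "apple"

str_to_upper("hello world")      # "HELLO WORLD"
str_to_title("hello world")      # "Hello World"
str_to_upper("i", locale = "tr") # "\u0130", Turkish dotted capital I
```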
Miscellaneous

invert_match
  Switch location of matches to location of non-matches.

str_conv
  Specify the encoding of a string.

str_dup
  Duplicate and concatenate strings within a character vector.

str_glue, str_glue_data
  Format and interpolate a string with glue.

word
  Extract words from a sentence.
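The remaining helpers are shown in this minimal sketch, assuming stringr is installed. The party data frame is a made-up example. The invert_match() output is printed without a claimed result.

```r
library(stringr)

str_dup("ab", 3)                     # "ababab"
str_dup("-", 1:3)                    # "-" "--" "---"

s <- "The quick brown fox"
word(s, 2)                           # "quick"
word(s, -1)                          # "fox"
word(s, 1, 2)                        # "The quick"

# str_glue_data() looks up the {} expressions in a data frame or list
party <- data.frame(name = c("Jared", "Bob"), size = c(8, 16))
str_glue_data(party, "{name}: party of {size}")

# invert_match() turns match positions into the positions between matches
loc <- str_locate_all("a1b22c", "[0-9]+")[[1]]
invert_match(loc)
```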
References

http://r4ds.had.co.nz/strings.html
http://stringr.tidyverse.org
https://cran.r-project.org/web/packages/stringr/stringr.pdf
http://wsyang.com/r/2014/07/04/stringr-package/
Data Wrangling with R 5.3
https://stackoverflow.com/questions/12775085/the-difference-between-concatenating-character-strings-with-paste-vs-cat
https://www.jaredlander.com/r-for-everyone/
stringr: modern, consistent string processing
