Korean text in R (encoding issues) – csv, xlsx…

Published by onesixx on

Basic settings

RStudio > Tools >

  • Global Options… > Code > Saving > Default text encoding: UTF-8
  • Project Options… > Code Editing > Text encoding: set to UTF-8
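These settings only control how RStudio saves and opens files. A minimal sketch for checking from R whether a saved script really is valid UTF-8, using base R's validUTF8(). The helper name is_utf8_file() and the two temporary demo files are hypothetical.

```r
# Hypothetical helper: TRUE if the whole file is valid UTF-8
is_utf8_file <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  validUTF8(rawToChar(bytes))
}

# Demo: a UTF-8 script containing Korean, and one containing 0xFF,
# a byte that never occurs in valid UTF-8
f_utf8 <- tempfile(fileext = ".R")
writeBin(charToRaw("x <- \"\ud55c\uae00\"\n"), f_utf8)  # x <- "한글"
f_bad <- tempfile(fileext = ".R")
writeBin(as.raw(c(0x78, 0x20, 0xff, 0x0a)), f_bad)

is_utf8_file(f_utf8)
is_utf8_file(f_bad)
```

A FALSE result usually means the file was saved in a legacy encoding such as EUC-KR/CP949 before the UTF-8 default was set.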

Linux (Ubuntu)

First, Ubuntu's locale must be configured correctly; only then do the fixes found on the web work as expected.

https://lintut.com/how-to-set-up-system-locale-on-ubuntu-18-04/

Checking the system locale

~$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE=en_US.UTF-8
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE=en_US.UTF-8
LC_MONETARY=en_US.UTF-8
LC_MESSAGES=en_US.UTF-8
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=
~$ localectl status
   System Locale: LANG=en_US.UTF-8
       VC Keymap: us
      X11 Layout: us
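The same variables can be read from inside a running R session. A short sketch using base R's Sys.getenv() and Sys.getlocale(); the selection of variables in locale_vars is only an example.

```r
# Locale-related environment variables as R sees them (NA if unset)
locale_vars <- c("LANG", "LANGUAGE", "LC_ALL", "LC_CTYPE")
env_locale <- Sys.getenv(locale_vars, unset = NA)
print(env_locale)

# The LC_CTYPE locale R actually uses for character handling
ctype <- Sys.getlocale("LC_CTYPE")
print(ctype)
```

R inherits these variables when it starts, so a session opened before the locale was changed keeps the old values.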

All locales available on the system

~$ locale -a
C
C.UTF-8
en_US.utf8
ko_KR
ko_KR.euckr
ko_KR.utf8
korean
korean.euc
POSIX
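A minimal sketch for checking from R whether a given locale appears in the locale -a list. It assumes a Unix-like system with the locale command; the helper has_locale() is hypothetical, and its normalisation (lowercase, drop "-") is an approximation of how glibc lists codesets, so that ko_KR.UTF-8 matches ko_KR.utf8.

```r
# Hypothetical helper: is `target` among the installed locales?
has_locale <- function(target,
                       available = system2("locale", "-a", stdout = TRUE)) {
  # Make "ko_KR.UTF-8" and "ko_KR.utf8" compare equal
  normalize <- function(x) tolower(gsub("-", "", x, fixed = TRUE))
  normalize(target) %in% normalize(available)
}

# Example using the list printed above (no system call needed)
installed <- c("C", "C.UTF-8", "en_US.utf8", "ko_KR", "ko_KR.euckr",
               "ko_KR.utf8", "korean", "korean.euc", "POSIX")
has_ko <- has_locale("ko_KR.UTF-8", installed)
has_ja <- has_locale("ja_JP.UTF-8", installed)
```

On a live system, has_locale("ko_KR.UTF-8") queries locale -a directly; a FALSE result is the case where dpkg-reconfigure locales is needed.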

If the system locale you want for your region is not in the list, add it in the interactive screen opened by the following command:

~$ sudo dpkg-reconfigure locales

Changing the locale

~$ sudo vi /etc/default/locale

After editing the file, log out and back in, confirm the new setting, and then restart R as well.
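After restarting R, a quick check that the session really runs in a UTF-8 locale. A minimal sketch using base R's l10n_info(); the warning text is illustrative.

```r
# l10n_info() reports whether the current LC_CTYPE locale is UTF-8
info <- l10n_info()
session_is_utf8 <- isTRUE(info[["UTF-8"]])

if (!session_is_utf8) {
  warning("R is not running in a UTF-8 locale (LC_CTYPE = ",
          Sys.getlocale("LC_CTYPE"), "); Korean text may be garbled")
}
session_is_utf8
```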

A read function that takes the file encoding into account

http://philogrammer.com/2017-03-15/encoding/
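# library(devtools)
# install_github("plgrmr/readAny", force = TRUE)
# library(readAny)
library(readr)       # guess_encoding()
library(readxl)      # read_excel()
library(data.table)  # setDT(), fread()
library(magrittr)    # %>%

uF_readAny <- function(fileNm, sep = "", ...) {
  # Most likely encoding according to readr (first row of the guesses)
  encoding  <- guess_encoding(fileNm)$encoding[1]
  extension <- tolower(tools::file_ext(fileNm))
  if (extension == "xlsx") {
    # .xlsx stores text as Unicode, so no encoding argument is needed
    result <- read_excel(fileNm)
  } else {
    # An explicit sep, or an unknown extension, uses the "custom" separator
    if (sep != "" || !(extension %in% c("csv", "txt"))) extension <- "custom"
    # .txt files are assumed to be tab-delimited
    separate <- list(csv = ",", txt = "\t", custom = sep)
    result <- read.table(fileNm, sep = separate[[extension]],
                         fileEncoding = encoding, ...)
  }
  return(result)
}

fileNm <- "data.csv"  # hypothetical file path
dd <- read.table(fileNm, header = TRUE)
dd <- uF_readAny(fileNm, header = TRUE) %>% setDT()
dd <- fread(fileNm, encoding = "UTF-8")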
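The encoding argument of data.table's fread() only accepts "unknown", "UTF-8" and "Latin-1", and it marks strings rather than re-encoding the input, so an EUC-KR/CP949 file needs converting first. A minimal base-R sketch; the helper convert_to_utf8(), its default from = "EUC-KR", and the demo file contents are assumptions for illustration.

```r
# Hypothetical helper: re-encode a whole file to UTF-8
convert_to_utf8 <- function(src, dest, from = "EUC-KR") {
  raw_in <- readBin(src, what = "raw", n = file.size(src))
  # iconv() accepts a list of raw vectors and returns a UTF-8 string
  txt <- iconv(list(raw_in), from = from, to = "UTF-8")
  if (is.na(txt)) stop("could not convert ", src, " from ", from)
  writeBin(charToRaw(txt), dest)
  invisible(dest)
}

# Demo: a small EUC-KR CSV ("이름,나이" / "홍길동,30") written via Unicode escapes
csv_utf8 <- "\uc774\ub984,\ub098\uc774\n\ud64d\uae38\ub3d9,30\n"
euc_file <- tempfile(fileext = ".csv")
writeBin(iconv(csv_utf8, from = "UTF-8", to = "EUC-KR", toRaw = TRUE)[[1]],
         euc_file)

utf8_file <- convert_to_utf8(euc_file, tempfile(fileext = ".csv"))
```

The converted copy can then be read with fread(utf8_file, encoding = "UTF-8").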

# read.csv("path/to/filename", fileEncoding = "euc-kr")
# read.table("path/to/filename", fileEncoding = "euc-kr")
https://studyforus.tistory.com/167
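A small round trip for the fileEncoding tip above: write a CSV as EUC-KR bytes, then read it back with the encoding declared. This sketch assumes the R session runs in a UTF-8 locale, as set up in the Linux section; the file contents are made up for the demo.

```r
# Demo file: a CSV with a Korean value, saved as EUC-KR bytes
csv_utf8 <- "name,age\n\ud64d\uae38\ub3d9,30\n"  # 홍길동
path <- tempfile(fileext = ".csv")
writeBin(iconv(csv_utf8, from = "UTF-8", to = "EUC-KR", toRaw = TRUE)[[1]],
         path)

# fileEncoding makes R convert EUC-KR to the session's native encoding while reading
df <- read.csv(path, fileEncoding = "EUC-KR")
df$name[1] == "\ud64d\uae38\ub3d9"
```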

RStudio encoding settings

  • Tools -> Global Options…
    Code > Saving     "Default text encoding:" : "UTF-8"
  • Tools -> Project Options…
    Code Editing     "Text encoding:" : "UTF-8"
https://r-bong.blogspot.com/2016/03/rstudio_26.html
> Sys.getlocale()
[1] "LC_CTYPE=en_US.UTF-8;
     LC_NUMERIC=C;
     LC_TIME=en_US.UTF-8;
     LC_COLLATE=en_US.UTF-8;
     LC_MONETARY=en_US.UTF-8;
     LC_MESSAGES=en_US.UTF-8;
     LC_PAPER=en_US.UTF-8;
     LC_NAME=C;
     LC_ADDRESS=C;
     LC_TELEPHONE=C;
     LC_MEASUREMENT=en_US.UTF-8;
     LC_IDENTIFICATION=C"
> Sys.setlocale("LC_CTYPE", "C")  # force LC_CTYPE to "C", dropping the language-specific setting
> localeToCharset()
[1] "UTF-8"     "ISO8859-1"
Sys.getlocale()  # Sys.getlocale("LC_ALL")
#[1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"
Sys.getlocale("LC_COLLATE")
Sys.getlocale("LC_CTYPE")
Sys.getlocale("LC_MONETARY")
Sys.getlocale("LC_NUMERIC")
Sys.getlocale("LC_TIME")

Sys.setlocale(category = "LC_CTYPE", locale = "ko_KR.UTF-8")
#[1] "en_US.UTF-8/ko_KR.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"
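Sys.setlocale() changes the locale for the rest of the session, which is easy to forget after an experiment such as Sys.setlocale("LC_CTYPE", "C"). A minimal sketch of a hypothetical helper with_ctype() that switches LC_CTYPE, evaluates one expression, and restores the previous value through on.exit().

```r
# Hypothetical helper: evaluate `code` under a temporary LC_CTYPE locale
with_ctype <- function(locale, code) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  # Sys.setlocale() returns "" (with a warning) if the locale is unavailable
  if (identical(Sys.setlocale("LC_CTYPE", locale), "")) {
    stop("locale not available: ", locale)
  }
  force(code)  # the lazy argument is evaluated only now, under the new locale
}

ctype_before <- Sys.getlocale("LC_CTYPE")
utf8_under_c <- with_ctype("C", l10n_info()[["UTF-8"]])
ctype_after  <- Sys.getlocale("LC_CTYPE")
```

Because the argument is lazily evaluated, the expression sees the temporary locale, and the original LC_CTYPE is back in place even if the expression fails.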