Korean text in R (encoding issues) – csv, xlsx…
Basic settings
RStudio > Tools >
- Global Options… > Code > Saving > Default text encoding: UTF-8
- Project Options… > Code Editing > Text encoding: set to UTF-8
Linux (Ubuntu)
The Ubuntu locale has to be configured correctly first; only then do the fixes found on the web apply cleanly.
https://lintut.com/how-to-set-up-system-locale-on-ubuntu-18-04/
Checking the system locale
~$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE=en_US.UTF-8
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE=en_US.UTF-8
LC_MONETARY=en_US.UTF-8
LC_MESSAGES=en_US.UTF-8
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=
~$ localectl status
System Locale: LANG=en_US.UTF-8
VC Keymap: us
X11 Layout: us
All available locales
~$ locale -a
C
C.UTF-8
en_US.utf8
ko_KR
ko_KR.euckr
ko_KR.utf8
korean
korean.euc
POSIX
If the system locale you want for your region is not in the list, add it interactively with the command below.
~$ sudo dpkg-reconfigure locales
Editing the locale
~$ sudo vi /etc/default/locale
After editing, log out and back in, check that the change took effect, and then restart R as well.
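Once R has been restarted, a quick check from the R side can confirm that the session picked up a UTF-8 locale. The following is a minimal sketch using base R only; the sample string ("한글", written with Unicode escapes) and the variable names are assumptions for illustration.

```r
# Character-type locale that R inherited from the system
ctype <- Sys.getlocale("LC_CTYPE")
# TRUE when the session encoding is UTF-8
is_utf8 <- isTRUE(l10n_info()[["UTF-8"]])
cat("LC_CTYPE:", ctype, "\n")
cat("UTF-8 session:", is_utf8, "\n")

# "한글" written with Unicode escapes, so the script does not depend on
# the encoding of the source file itself
hangul <- "\ud55c\uae00"
n_chars <- nchar(hangul, type = "chars")  # counts characters
n_bytes <- nchar(hangul, type = "bytes")  # counts bytes (3 per Hangul syllable in UTF-8)
cat("chars:", n_chars, "bytes:", n_bytes, "\n")
```

If the UTF-8 check comes back FALSE, the locale change probably did not reach the R session, and the logout/restart step is worth repeating.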
A read function that takes encoding into account
http://philogrammer.com/2017-03-15/encoding/
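# library(devtools)
# install_github("plgrmr/readAny", force = TRUE)
# library(readAny)
library(readr)       # guess_encoding()
library(readxl)      # read_excel()
library(data.table)  # fread(), setDT()
library(magrittr)    # %>%

# Read xlsx, csv, txt, or custom-delimited files, guessing the text encoding first.
uF_readAny <- function(fileNm, sep = "", ...) {
  extension <- as.character(tools::file_ext(fileNm))
  if (extension == "xlsx") {
    # read_excel() handles the text encoding itself, so no guess is needed
    result <- read_excel(fileNm)
  } else {
    # take the most probable encoding reported by readr
    encoding <- guess_encoding(fileNm)$encoding[1]
    # an explicit sep, or an unknown extension, falls back to the custom separator
    if (sep != "" || !(extension %in% c("csv", "txt"))) extension <- "custom"
    # txt: "\n" makes read.table treat each line as a single field;
    # pass sep = "\t" explicitly for tab-delimited files
    separate <- list(csv = ",", txt = "\n", custom = sep)
    result <- read.table(fileNm, sep = separate[[extension]], fileEncoding = encoding, ...)
  }
  return(result)
}

dd <- read.table(fileNm, header = TRUE)
dd <- uF_readAny(fileNm, header = TRUE) %>% setDT()
dd <- fread(fileNm, encoding = "UTF-8")
# read.csv("path/to/file", fileEncoding = "euc-kr")
# read.table("path/to/file", fileEncoding = "euc-kr")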
https://studyforus.tistory.com/167
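To see what fileEncoding does, it can help to build a small EUC-KR file and read it back. The following is a rough sketch in base R, assuming the session runs in a UTF-8 locale and that the platform's iconv supports EUC-KR; the sample rows and variable names are made up for illustration.

```r
# Sample rows ("김철수", "서울") written with Unicode escapes
lines <- c("name,city", "\uae40\ucca0\uc218,\uc11c\uc6b8")

# Convert the UTF-8 text to EUC-KR bytes and write them to a temporary file
euckr_bytes <- iconv(paste0(paste(lines, collapse = "\n"), "\n"),
                     from = "UTF-8", to = "EUC-KR", toRaw = TRUE)[[1]]
tmp <- tempfile(fileext = ".csv")
writeBin(euckr_bytes, tmp)

# Declaring the file encoding lets R convert the text while reading
df <- read.csv(tmp, fileEncoding = "EUC-KR", stringsAsFactors = FALSE)
print(df)

# Size of the EUC-KR file in bytes (EUC-KR stores each Hangul syllable in 2 bytes)
print(length(euckr_bytes))
```

Reading the same file without fileEncoding in a UTF-8 session would typically produce garbled text or an invalid multibyte string error, which is the symptom these notes are about.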
RStudio encoding settings
- Tools -> Global Options… 
 Code> Saving "Default text encoding: " : "UTF-8"
- Tools -> Project Options…
 Code Editing "Text encoding: " : "UTF-8"
https://r-bong.blogspot.com/2016/03/rstudio_26.html

> Sys.getlocale()
[1] "LC_CTYPE=en_US.UTF-8;
     LC_NUMERIC=C;
     LC_TIME=en_US.UTF-8;
     LC_COLLATE=en_US.UTF-8;
     LC_MONETARY=en_US.UTF-8;
     LC_MESSAGES=en_US.UTF-8;
     LC_PAPER=en_US.UTF-8;
     LC_NAME=C;
     LC_ADDRESS=C;
     LC_TELEPHONE=C;
     LC_MEASUREMENT=en_US.UTF-8;
     LC_IDENTIFICATION=C"
> Sys.setlocale("LC_CTYPE", "C")  # force-reset the language setting to the C locale
> localeToCharset()
[1] "UTF-8"     "ISO8859-1" 
Sys.getlocale()  # same as Sys.getlocale("LC_ALL")
#[1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"
Sys.getlocale("LC_COLLATE")
Sys.getlocale("LC_CTYPE")
Sys.getlocale("LC_MONETARY")
Sys.getlocale("LC_NUMERIC")
Sys.getlocale("LC_TIME")
Sys.setlocale(category = "LC_CTYPE", locale = "ko_KR.UTF-8")
#[1] "en_US.UTF-8/ko_KR.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"
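Because Sys.setlocale() changes the locale for the rest of the session, it can be convenient to switch LC_CTYPE only for one expression and then put it back. The following is a minimal sketch of that idea; with_ctype is a hypothetical helper name, not part of base R or of the pages linked above.

```r
# Hypothetical helper: evaluate expr with LC_CTYPE temporarily set to `locale`
with_ctype <- function(locale, expr) {
  old <- Sys.getlocale("LC_CTYPE")
  # restore the previous locale when the function exits, even on error
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  # Sys.setlocale() returns "" (with a warning) when the locale is unavailable
  res <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale))
  if (identical(res, "")) stop("locale not available: ", locale)
  # expr is a lazy argument, so it is evaluated here, under the new locale
  force(expr)
}

before <- Sys.getlocale("LC_CTYPE")
inside <- with_ctype("C", Sys.getlocale("LC_CTYPE"))
after  <- Sys.getlocale("LC_CTYPE")
cat("before:", before, "| inside:", inside, "| after:", after, "\n")
```

The same call with "ko_KR.UTF-8" only works when that locale appears in the output of locale -a; otherwise the helper stops with an error instead of leaving the session in an unexpected state.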