read.table in R: special cases

Published by onesixx

http://stackoverflow.com/questions/31331135/how-to-remove-special-characters-while-loading-a-csv-in-r

https://martinsbioblogg.wordpress.com/2014/03/06/using-r-common-errors-in-table-import/

> DD <- Auto <- read.table(file="http://www-bcf.usc.edu/~gareth/ISL/Auto.csv", header=T, sep=",")
Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  EOF within quoted string

Fix: set quote = "" (the argument is quote, not quotes).

> DD
   mpg cylinders displacement horsepower weight acceleration year origin
1   18         8          307        130   3504         12.0   70      1
2   15         8          350        165   3693         11.5   70      1
3   18         8          318        150   3436         11.0   70      1
4   16         8          304        150   3433         12.0   70      1
5   17         8          302        140   3449         10.5   70      1
6   15         8          429        198   4341         10.0   70      1
7   14         8          454        220   4354          9.0   70      1
8   14         8          440        215   4312          8.5   70      1
9   14         8          455        225   4425         10.0   70      1
10  15         8          390        190   3850          8.5   70      1
11  15         8          383        170   3563         10.0   70      1
12  14         8          340        160   3609          8.0   70      1
[... wide output truncated ...]

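quote: setting the quoting characters
To disable quoting altogether, use quote = "".
The read.table default is quote = "\"'" (both double and single quotes), so a stray apostrophe inside a field can open a quoted string that never closes.

See scan for the behaviour on quotes embedded in quotes.
From ?scan, quote is: the set of quoting characters as a single character string or NULL. In a multibyte locale the quoting characters must be ASCII (single-byte).
Quoting is only considered for columns read as character, which is all of them unless colClasses is specified.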
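A minimal sketch of the quoting problem on hypothetical inline data (csv_text is made up, not taken from Auto.csv). It uses read.table's text= argument so no file or network access is needed, and it assumes the trouble comes from an apostrophe inside an unquoted field, which the default quote = "\"'" treats as an opening quote.

```r
# Hypothetical CSV text: the name in row 2 contains an apostrophe
csv_text <- "id,name
1,ford pinto
2,o'brien special
3,honda civic"

# With the default quote = "\"'", the apostrophe can start a quoted string
# that never closes, so the following rows may be misread. The exact
# symptom depends on the input; tryCatch() just captures the first
# warning or error message so it can be inspected.
bad <- tryCatch(
  read.table(text = csv_text, header = TRUE, sep = ","),
  warning = function(w) conditionMessage(w),
  error = function(e) conditionMessage(e)
)
print(bad)

# Restricting quoting to double quotes keeps the apostrophe as plain text.
good <- read.table(text = csv_text, header = TRUE, sep = ",",
                   quote = "\"", stringsAsFactors = FALSE)
print(good)
```

quote = "" would also work here; quote = "\"" keeps support for properly double-quoted fields.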

 

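If a field contains #, read.table treats everything from # to the end of the line as a comment and does not read it (the default is comment.char = "#").

To turn comment handling off: comment.char = ""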
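A small sketch on hypothetical inline data (the label values are made up) showing what comment.char does when a real value contains "#":

```r
# Hypothetical data where a field legitimately contains "#"
txt <- "id,label
1,car #1
2,car #2"

# Default comment.char = "#": everything from "#" to the end of the line
# is dropped, so the labels come back truncated.
with_comments <- read.table(text = txt, header = TRUE, sep = ",",
                            stringsAsFactors = FALSE)

# comment.char = "" switches comment handling off and keeps the full text.
no_comments <- read.table(text = txt, header = TRUE, sep = ",",
                          comment.char = "", stringsAsFactors = FALSE)

print(with_comments$label)
print(no_comments$label)
```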

> DD[c(337,338,339),]
     mpg cylinders displacement horsepower weight acceleration year origin               name
337 23.6         4          140          ?   2905         14.3   80      1 ford mustang cobra
338 32.4         4          107         72   2290         17.0   80      3       honda accord
339 27.2         4          135         84   2490         15.7   81      1   plymouth reliant

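Row 337 uses "?" for a missing horsepower value, so without extra settings the whole horsepower column is read as text instead of numbers. Declare such placeholders as missing values with na.strings:

na.strings = c("NA", "-", "?")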

DD <- Auto <- read.table("http://www-bcf.usc.edu/~gareth/ISL/Auto.csv",
                    header = TRUE,
                    sep = ",",
                    quote = "\"",                    # only double quotes delimit strings
                    na.strings = c("NA", "-", "?"))  # treat these as missing values

 

> DD[c(337,338),]
     mpg cylinders displacement horsepower weight acceleration year origin               name
337 23.6         4          140         NA   2905         14.3   80      1 ford mustang cobra
338 32.4         4          107         72   2290         17.0   80      3       honda accord
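The same effect can be reproduced offline with a two-row excerpt modeled on rows 337 and 338 above (inline text, not the downloaded file):

```r
# Two rows mirroring the mpg/horsepower values of rows 337-338
txt <- "mpg,horsepower
23.6,?
32.4,72"

# Without na.strings, "?" is just text, so the whole column stays character.
raw <- read.table(text = txt, header = TRUE, sep = ",", stringsAsFactors = FALSE)
str(raw)

# Declaring "?" (and "-") as missing lets the column be read as numbers.
clean <- read.table(text = txt, header = TRUE, sep = ",",
                    na.strings = c("NA", "-", "?"))
str(clean)
```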

 

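By default, R has historically converted character columns to factors, but sometimes you need them kept as plain character strings. ID-like columns in particular often get read as factors, which can give odd results when comparing or matching IDs. (Since R 4.0.0 the default is stringsAsFactors = FALSE; setting it explicitly makes the intent clear on any version.)
stringsAsFactors = FALSE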

data <- read.table("data.txt", stringsAsFactors = FALSE)
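A sketch of one way factor ids bite when matching, using hypothetical inline data and a hypothetical lookup vector. stringsAsFactors = TRUE is passed explicitly to reproduce the pre-4.0.0 default.

```r
txt <- "id,score
a10,1
a9,2
a100,3"

as_factor <- read.table(text = txt, header = TRUE, sep = ",", stringsAsFactors = TRUE)
as_char   <- read.table(text = txt, header = TRUE, sep = ",", stringsAsFactors = FALSE)

# Hypothetical lookup ids from another source, also stored as a factor
lookup <- factor(c("a10", "a9"))

# Subsetting a factor keeps all its levels, so the level sets differ and
# comparing the two factors with == raises an error.
cmp_factor <- tryCatch(as_factor$id[1:2] == lookup,
                       error = function(e) conditionMessage(e))

# Plain character vectors compare element by element as expected.
cmp_char <- as_char$id[1:2] == as.character(lookup)

print(cmp_factor)
print(cmp_char)
```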


Categories: Reshaping
