Scale(), Normalization (Standardization)

Published by onesixx on 17-03-10

Types of scaling

https://en.wikipedia.org/wiki/Feature_scaling
http://www.projectl33t.xyz/archives/8359
http://pds9.egloos.com/pds/200807/14/28/Chapter4.pdf

Standardization

Subtract each feature's mean and divide by its standard deviation, so that every feature is standardized to mean 0 and variance 1 (standard scores).
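
Written as a formula, standardization looks like the following. This is the standard textbook definition, added here for reference. The symbols \mu (the feature's mean) and \sigma (its standard deviation) are not used in the original post. Note that R's sd() computes the sample standard deviation (n - 1 in the denominator).

```latex
x' = \frac{x - \mu}{\sigma}
```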

Example ML methods: used when feeding multidimensional data to support vector machines, logistic regression, neural networks, and similar models.
Example data: audio signals, image pixel values, and so on.

Re-scaling

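Transform the range of the observations to 0 to 1, or to -1 to 1.

x' = (x - min(x)) / (max(x) - min(x))

Here, x is the original value before the transformation and x' is the normalized value after it.

e.g., if the students in a class weigh between 60 kg and 100 kg, first subtract 60, then divide by the range, 40.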
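
The weight example above can be sketched in R roughly as follows. This is a minimal illustration rather than anything from the original post: the helper name rescale01 and the sample weights are made up for the example.

```r
# Hypothetical helper (not part of base R): min-max rescaling to [0, 1]
rescale01 <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Made-up weights (kg) spanning 60 to 100
weight <- c(60, 70, 80, 90, 100)
rescaled <- rescale01(weight)
rescaled
# 0.00 0.25 0.50 0.75 1.00

# For the range -1 to 1, stretch and shift the [0, 1] result
rescaled_pm1 <- rescaled * 2 - 1
rescaled_pm1
```

A constant vector has max(x) - min(x) equal to 0, so this helper would return NaN for it.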

Scaling to unit length

Divide a feature vector by its Euclidean length so that the vector has length 1.
As a variation, the L1 norm of the feature vector (e.g., Manhattan distance, city-block length, taxicab geometry) is sometimes used instead.
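
As a rough sketch of both variants in R (the vector v is a made-up example, and the variable names are only for illustration):

```r
# Made-up feature vector for illustration
v <- c(3, 4)

# L2 (Euclidean) norm: sqrt(3^2 + 4^2) = 5
l2 <- sqrt(sum(v^2))
v_l2 <- v / l2
v_l2                 # 0.6 0.8
sqrt(sum(v_l2^2))    # Euclidean length of the scaled vector

# L1 (Manhattan) norm: |3| + |4| = 7
l1 <- sum(abs(v))
v_l1 <- v / l1
sum(abs(v_l1))       # L1 length of the scaled vector
```

A zero vector has norm 0 and cannot be scaled this way.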

https://cos.name/cn/topic/101615/

scale()

The defaults are center = TRUE, scale = TRUE.

Center (mean 0)	Scale (sd 1)
T	T
F	T
T	F

#scale
x <- as.matrix(c(1:10))
mean(x); sd(x)

# center=T, scale=T
scale(x, center=T, scale=T)
scale(x, center=T, scale=apply(x,2,sd))
apply(x, 2, function(x){(x-mean(x))/sd(x)})

#  center=T, scale=F
scale(x, center=T, scale=F)
apply(x, 2, function(x){(x-mean(x))})

#  center=F, scale=T
scale(x, center=F, scale=T)
apply(x, 2, function(x){x/sqrt(sum(x^2)/(length(x)-1))})


#  center=F, scale=F (x is returned unchanged)
scale(x, center=F, scale=F)
apply(x, 2, function(x){x})


# Reference: print the source of the default method
scale.default

# center and scale (the defaults)
(centered.scaled.x <- scale(x))
cov(centered.scaled.x)             # 1: unit variance (x has a single column here)
apply(centered.scaled.x, 2, mean)  # approximately 0
apply(centered.scaled.x, 2, sd)    # 1

# center only
(centered.x <- scale(x, scale = FALSE))
apply(x, 2, mean)                  # 5.5: column mean of the original x
colMeans(centered.x)               # 0
apply(centered.x, 2, sd)           # same sd as x: centering does not change the spread


x <- matrix(1:10, ncol = 2)
x1 <- x[,1]
x2 <- x[,2]

(centered01 <- scale(x))
#(centered01 <- scale(x, center=TRUE,  scale=TRUE ))
(centered02 <- scale(x, center=TRUE,  scale=FALSE))
(centered03 <- scale(x, center=FALSE, scale=TRUE ))

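# transform() converts the matrix to a data frame with columns X1 and X2
# min-max rescaling to [0, 1]; numerator and denominator each need parentheses
(centered04 <- transform(x, X1=(x1-min(x1))/(max(x1)-min(x1)), X2=(x2-min(x2))/(max(x2)-min(x2))))
# standardization by hand; should match centered01
(centered05 <- transform(x, X1=(x1-mean(x1))/sd(x1), X2=(x2-mean(x2))/sd(x2)))
# X1 divided by its sd only; X2 standardized
(centered06 <- transform(x, X1=x1/sd(x1), X2=(x2-mean(x2))/sd(x2)))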

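# biocLite() has been retired; Bioconductor packages are now installed with BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("bioDist")
library(bioDist)
h <- matrix(rnorm(200), nrow = 5)
euc(h)  # Euclidean distances between the rows of h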

colMeans(centered01)
colMeans(centered02)
colMeans(centered03)
colMeans(centered04)
colMeans(centered05)

apply(centered01, 2, sd)
apply(centered02, 2, sd)
apply(centered03, 2, sd)
apply(centered04, 2, sd)
apply(centered05, 2, sd)

Why does scale return NaN for zero variance columns?

> x1 <- c(1,1,1,1,1)
> scale(x1)
     [,1]
[1,]  NaN
[2,]  NaN
[3,]  NaN
[4,]  NaN
[5,]  NaN

> scale(x1, 
+       center = TRUE, 
+       scale = (var(x1)!=0))
     [,1]
[1,]    0
[2,]    0
[3,]    0
[4,]    0
[5,]    0
attr(,"scaled:center")
[1] 1
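
A likely explanation for the output above: a constant column has standard deviation 0, so centering gives 0 and dividing by the sd gives 0/0, which is NaN in R. For a matrix with several columns, one possible workaround is to divide zero-variance columns by 1 instead of by their sd. The sketch below is only an illustration under that assumption; safe_scale is a hypothetical helper, not part of base R, and the matrix m is made up.

```r
# Hypothetical helper: standardize columns, leaving zero-variance columns at 0
safe_scale <- function(m) {
  m <- as.matrix(m)
  sds <- apply(m, 2, sd)
  sds[sds == 0] <- 1   # avoid 0/0 for constant columns
  scale(m, center = TRUE, scale = sds)
}

# Made-up data: column a is constant, column b is not
m <- cbind(a = c(1, 1, 1, 1, 1), b = c(1, 2, 3, 4, 5))
scale(m)[, "a"]        # NaN for the constant column
safe_scale(m)[, "a"]   # 0 for the constant column
```

Whether a constant column should become 0 or be dropped altogether depends on the downstream model. This sketch simply keeps it.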
