Sparse matrices

sparse: (of distribution over a wide area) thinly scattered; (of density) thin. A sparse matrix (희소 행렬) is a matrix in which almost all of the elements are zero and only a few hold actual values. In numerical analysis and computer science, a sparse matrix or sparse array is a matrix in which most of the elements are zero. By contrast, if most of Read more…
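The excerpt is cut off here; as a quick illustration of the idea, the sketch below builds a small sparse matrix with the Matrix package (the package choice is my assumption, it is not named in the excerpt) and compares its storage to a dense copy.

# Minimal sketch, assuming the Matrix package
library(Matrix)

set.seed(1)
i <- sample(1000, 50)
j <- sample(1000, 50)
m <- sparseMatrix(i = i, j = j, x = rnorm(50), dims = c(1000, 1000))  # 50 non-zeros out of 1e6 cells

object.size(m)              # sparse storage: only the non-zero entries are kept
object.size(as.matrix(m))   # dense copy: every cell is stored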

Quantile Regression for robust regression

ranger …predict

if (is.null(data)) {
  stop("Error: Argument 'data' is required for non-quantile prediction.")
}

Error: Argument 'data' is required for non-quantile prediction

https://cran.r-project.org/web/packages/quantreg/quantreg.pdf

library(moments)  ## skewness and kurtosis
library(quantreg) ## for robust regression
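Since the excerpt only loads quantreg without showing a fit, here is a minimal quantile-regression sketch on the engel data that ships with quantreg (the dataset choice is my assumption, not from the post).

library(quantreg)

data(engel)   # household food expenditure vs. income
fit_med   <- rq(foodexp ~ income, tau = 0.5, data = engel)          # median (robust) regression
fit_tails <- rq(foodexp ~ income, tau = c(0.1, 0.9), data = engel)  # lower and upper quantiles

summary(fit_med)   # coefficients for the conditional median
coef(fit_tails)    # slopes differ across quantiles when errors are heteroscedastic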

Galton’s father & son height data

https://rstudio-pubs-static.s3.amazonaws.com/204984_dd2112475db84af2a03260c4a4f830ac.html

# install.packages("UsingR")
library(tidyverse)
library(data.table)
library(UsingR)

data(father.son)
dd <- father.son %>% data.table()

g <- dd %>%
  ggplot(aes(x = fheight, y = sheight)) +
  geom_point(size = 2, alpha = 0.7) +
  xlab("Height of father") +
  ylab("Height of son") +
  ggtitle("Father-son Height Data")

# mean and standard deviations of the father and son heights
shmean <- mean(dd$sheight)
fhmean <- mean(dd$fheight)

Read more…
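The excerpt stops at the means; a plausible continuation (my sketch, not the original post) fits the father-son regression and overlays it on the scatterplot to show regression to the mean, reusing dd and g from above.

# Hypothetical continuation of the excerpt above
fit <- lm(sheight ~ fheight, data = dd)
coef(fit)   # slope well below 1: sons of tall fathers tend back toward the mean

g +
  geom_smooth(method = "lm", se = FALSE) +                      # fitted regression line
  geom_abline(slope = 1, intercept = 0, linetype = "dashed")    # identity line for comparison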

pca ex3

https://www.kaggle.com/sagarnildass/red-wine-analysis-by-r/data

library(tidyverse)
library(data.table)

wine <- fread('~/Downloads/wineQualityReds.csv')
dd <- wine[, .(alcohol, pH)]
dd %>% ggplot(aes(alcohol, pH)) + geom_point()

# centering ------------------------------------------------------------------
dd_centering <- data.frame(lapply(dd, function(x) { x - mean(x) }))
dd_centering %>% ggplot(aes(alcohol, pH)) + geom_point()
sigma1 <- cov(dd_centering)

# prcomp in R ----------------------------------------------------------------
pr_c <- prcomp(dd, center = T)  # Xmatrix1
pr_c$sdev^2     # eigen(sigma1)$values     # variance explained
pr_c$rotation   # eigen(sigma1)$vectors * -1

Read more…

The relationship between PCA and eigendecomposition

# Data from https://ratsgo.github.io/machine%20learning/2017/04/24/PCA/
#      n1    n2    n3    n4    n5
# p1   0.2,  0.45, 0.33, 0.54, 0.77
# p2   5.6,  5.89, 6.37, 7.9,  7.87
# p3   3.56, 2.4,  1.95, 1.32, 0.98
Xorigin <- data.frame(p1 = c(0.2,  0.45, 0.33, 0.54, 0.77),
                      p2 = c(5.6,  5.89, 6.37, 7.9,  7.87),
                      p3 = c(3.56, 2.4,  1.95, 1.32, 0.98))
Xmatrix0 <- data.matrix(Xorigin)

# centering ------------------------------------------------------------------
Xorigin_centering <- data.frame(lapply(Xorigin, function(x) { x - mean(x) }))
Xmatrix1 <- data.matrix(Xorigin_centering)
sigma1 <- cov(Xmatrix1)

# Read more…
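The excerpt is truncated right after the covariance matrix; the sketch below is my guess at the comparison the post title points to, checking that prcomp reproduces the eigendecomposition of sigma1 (variable names reuse those above).

# Continuation sketch (assumed, not from the original post)
eig <- eigen(sigma1)
pr  <- prcomp(Xorigin, center = TRUE, scale. = FALSE)

pr$sdev^2     # variances of the PCs ...
eig$values    # ... equal the eigenvalues of the covariance matrix

pr$rotation   # loadings ...
eig$vectors   # ... equal the eigenvectors (possibly with flipped signs)

# Scores: centered data projected onto the eigenvectors
head(Xmatrix1 %*% eig$vectors)
head(pr$x)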

Outlier Detection with Mahalanobis Distance

https://www.steffenruefer.com/2016/12/outlier-detection-with-mahalanobis-distance/

In this tutorial I will discuss how to detect outliers in a multivariate dataset without using the response variable. I will first discuss outlier detection through threshold setting, then using Mahalanobis Distance instead. The Problem: Outliers are data points that do not match the general Read more…
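As a quick companion to the excerpt, here is a minimal sketch (my own, not from the linked tutorial) that flags multivariate outliers with stats::mahalanobis and a chi-squared cutoff; the mtcars columns are just an example choice.

# Minimal sketch: Mahalanobis-distance outlier flagging
X <- mtcars[, c("mpg", "hp", "wt")]

d2     <- mahalanobis(X, center = colMeans(X), cov = cov(X))  # squared distances
cutoff <- qchisq(0.975, df = ncol(X))                         # chi-squared threshold, p degrees of freedom

X[d2 > cutoff, ]   # observations flagged as potential multivariate outliers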

package :: mvnfast

https://cran.r-project.org/web/packages/mvnfast/index.html

Fast Multivariate Normal and Student's t Methods

Provides computationally efficient tools related to the multivariate normal and Student's t distributions. The main functionalities are: simulating multivariate random vectors, evaluating multivariate normal or Student's t densities and Mahalanobis distances. These tools are very efficient thanks to the use of Read more…
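A small usage sketch of the functions the description mentions; the argument details are from memory of the package docs, so double-check against the manual.

library(mvnfast)

mu    <- c(0, 0)
sigma <- matrix(c(1, 0.5, 0.5, 1), 2, 2)

X  <- rmvn(1e4, mu = mu, sigma = sigma)             # simulate multivariate normal vectors
d  <- dmvn(X, mu = mu, sigma = sigma, log = TRUE)   # log-densities
m2 <- maha(X, mu = mu, sigma = sigma)               # squared Mahalanobis distances

# Should agree with the base-R implementation
all.equal(m2, mahalanobis(X, center = mu, cov = sigma))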

PCA basics

PCA is widely used for dimensionality reduction, but rather than thinking of PCA as something that actually reduces the dimensions, it is better to understand it as selectively using dimensions from the PCA output. PCA shows, through the size of each component's variance, how much of the data each dimension explains. PCA is a linear transformation that finds mutually orthogonal (independent) linear axes. In other words, each axis (PC) is a linear combination of the variables, and if the variables of X are not already highly correlated (correlation, off-diagonal values Read more…
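To make the point concrete that PCs are linear combinations with decreasing variance, here is a small sketch on the built-in iris measurements (my example, not from the post).

# PCs as linear combinations of the (centered, scaled) variables
X  <- iris[, 1:4]
pc <- prcomp(X, center = TRUE, scale. = TRUE)

pc$rotation[, 1]        # weights of the linear combination defining PC1
summary(pc)             # proportion of variance explained per PC

# Keeping only the first two PCs is the "selective use of dimensions"
scores_2d <- pc$x[, 1:2]
round(cor(pc$x), 3)     # PCs are uncorrelated (orthogonal axes)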

Chi-squared approximation may be incorrect

https://stats.stackexchange.com/questions/81483/warning-in-r-chi-squared-approximation-may-be-incorrect It gave the warning because many of the expected values will be very small and therefore the approximations of p may not be right. In R you can use chisq.test(a, simulate.p.value = TRUE) to use simulated p-values. However, with such small cell sizes, all estimates will be poor. Read more…
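A tiny reproduction of the situation described; the table values are invented for illustration.

# Small expected counts trigger the warning; simulate.p.value avoids the approximation
a <- matrix(c(3, 1, 2, 0), nrow = 2)   # hypothetical sparse contingency table

chisq.test(a)                                       # warns: Chi-squared approximation may be incorrect
chisq.test(a)$expected                              # several expected counts are well below 5
chisq.test(a, simulate.p.value = TRUE, B = 10000)   # Monte Carlo p-value instead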