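1. Computing correlation coefficients
(1) The cor() function can compute the following three correlation coefficients:
(2) Pearson product-moment correlation coefficient: the degree of linear association between two continuous variables.
(3) Spearman rank correlation coefficient: the degree of association between rank-ordered (ordinal) variables.
(4) Kendall rank correlation coefficient (Kendall's tau): a nonparametric measure of rank correlation.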
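(5) Syntax: cor(x, y = NULL, use = , method = )
x: a matrix or data frame; y: an optional second vector, matrix, or data frame whose columns are correlated with those of x.
use: how missing data are handled. The default is "everything".
all.obs: assumes there are no missing data; a missing value raises an error.
everything: any correlation that involves a variable with missing values is set to NA.
complete.obs: listwise deletion (rows with any missing value are dropped).
pairwise.complete.obs: pairwise deletion (for each pair of variables, only the rows missing in that pair are dropped).
method: the type of correlation coefficient: "pearson" (the default), "spearman", or "kendall".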
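The effect of the method and use arguments can be seen in a minimal sketch. The first part uses the built-in state.x77 data; the toy data frame (columns u, v, w) is hypothetical and exists only to introduce one missing value.

```r
# Built-in dataset, first six columns
states <- state.x77[, 1:6]

# The same pair of variables under the three methods
pearson  <- cor(states[, "Illiteracy"], states[, "Murder"], method = "pearson")
spearman <- cor(states[, "Illiteracy"], states[, "Murder"], method = "spearman")
kendall  <- cor(states[, "Illiteracy"], states[, "Murder"], method = "kendall")
print(c(pearson = pearson, spearman = spearman, kendall = kendall))

# Hypothetical toy data with one missing value in column u
toy <- data.frame(u = c(1, 2, 3, 4, NA),
                  v = c(2, 4, 5, 8, 10),
                  w = c(5, 3, 4, 1, 0))

# Default use = "everything": off-diagonal entries involving u are NA
r_everything <- cor(toy)

# Listwise deletion: row 5 is dropped for every pair
r_complete <- cor(toy, use = "complete.obs")

# Pairwise deletion: row 5 is dropped only for pairs that involve u
r_pairwise <- cor(toy, use = "pairwise.complete.obs")

# use = "all.obs" raises an error because a value is missing
r_allobs <- try(cor(toy, use = "all.obs"), silent = TRUE)

print(r_everything)
print(r_complete)
print(r_pairwise)
```

With complete.obs the v/w correlation is computed from four rows, while pairwise.complete.obs uses all five, so the two matrices can differ even for pairs that contain no missing values.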
Original example
> states <- state.x77[, 1:6]
> x <- states[, c("Population", "Income", "Illiteracy", "HS Grad")]
> y <- states[, c("Life Exp", "Murder")]
> cor(x, y)
Result:
              Life Exp     Murder
Population -0.06805195  0.3436428
Income      0.34025534 -0.2300776
Illiteracy -0.58847793  0.7029752
HS Grad     0.58221620 -0.4879710
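Passing two matrices to cor() returns only the cross-correlations between the columns of the first and the columns of the second. A short sketch of how that rectangular result relates to the full square matrix, using the same built-in data:

```r
states <- state.x77[, 1:6]
x <- states[, c("Population", "Income", "Illiteracy", "HS Grad")]
y <- states[, c("Life Exp", "Murder")]

# Rectangular 4 x 2 matrix: rows come from x, columns from y
direct <- cor(x, y)

# The same numbers are a sub-block of the full 6 x 6 correlation matrix
sub_from_full <- cor(states)[colnames(x), colnames(y)]

print(all.equal(direct, sub_from_full))
```

The two-argument form is convenient when only one set of variables is of interest as "outcomes", since it avoids scanning a large square matrix.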
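Exploring how the unit price of a house correlates with its area, floor, and total number of floors in the building
Data preparation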
> library(sqldf)   # load first; otherwise sqldf() fails with: could not find function "sqldf"
Loading required package: gsubfn
Loading required package: proto
Loading required package: RSQLite
> house <- read.table("house_data.txt", header = TRUE, sep = "|", fileEncoding = "UTF-8",
+                     stringsAsFactors = FALSE,
+                     colClasses = c("character", "character", "numeric",
+                                    "character", "numeric", "numeric", "character",
+                                    "numeric", "numeric", "character"))
> # Exclude the community named 东郊小镇
> houseXQ <- sqldf("select * from house where community_name!='东郊小镇' ", row.names = TRUE)
> # Add the community name as an unordered factor column
> communityFactor <- factor(houseXQ$community_name, ordered = FALSE)
> houseXQ <- cbind(houseXQ, communityFactor)
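If the sqldf package is not available, the same row filter can be written in base R. The sketch below is an assumption-laden illustration: the small house data frame is made up (it is not the contents of house_data.txt) and contains only a few of the columns.

```r
# Hypothetical stand-in for the house data frame (made-up rows)
house <- data.frame(
  community_name = c("A", "东郊小镇", "B", "A"),
  house_area     = c(80, 120, 95, 60),
  house_total    = c(160, 200, 210, 115),
  stringsAsFactors = FALSE
)

# Base-R equivalent of:
#   select * from house where community_name != '东郊小镇'
# Like the SQL comparison, subset() also drops rows where community_name is NA
houseXQ <- subset(house, community_name != "东郊小镇")

# Add the community name as an unordered factor column
houseXQ$communityFactor <- factor(houseXQ$community_name, ordered = FALSE)

str(houseXQ)
```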
Correlation of total price with area, current floor, total floors, and unit price
> x <- houseXQ[, c("house_total")]
> y <- houseXQ[, c("house_area", "house_floor_curr", "house_floor_total", "house_avg")]
> cor(x, y)
Result:
     house_area house_floor_curr house_floor_total house_avg
[1,]  0.9450675      -0.02058832        0.03570221 0.4395242
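Total price is highly correlated with area (r ≈ 0.945), while its correlation with current floor and total floors is close to zero.
Significance test of the correlation coefficient: cor.test() shows that the correlation between total price and area is statistically significant.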
> cor.test(houseXQ[, c("house_total")], houseXQ[, c("house_area")])

	Pearson's product-moment correlation

data:  houseXQ[, c("house_total")] and houseXQ[, c("house_area")]
t = 39.537, df = 187, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.9274393 0.9585053
sample estimates:
      cor
0.9450675
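The numbers in this output can be reproduced by hand from the reported coefficient and degrees of freedom. The sketch below uses the standard formulas for the Pearson test (a t statistic with n - 2 degrees of freedom, and a confidence interval from the Fisher z transformation); the only inputs are the values printed above.

```r
r  <- 0.9450675   # sample correlation reported by cor.test
df <- 187         # degrees of freedom reported by cor.test (n - 2)
n  <- df + 2      # number of complete observation pairs

# t statistic for H0: true correlation equals 0
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_val <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval via the Fisher z transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

print(t_stat)
print(p_val)
print(ci)
```

The computed t statistic and interval should agree with the cor.test output up to rounding, and the p-value falls below the 2.2e-16 threshold that R prints.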
Correlation of unit price with area, current floor, total floors, and total price
> x <- houseXQ[, c("house_avg")]
> y <- houseXQ[, c("house_area", "house_floor_curr", "house_floor_total", "house_total")]
> cor(x, y)
Result:
     house_area house_floor_curr house_floor_total house_total
[1,]  0.1659645        0.2139952         0.3024903   0.4395242