请问大家,有谁做过 k-means的并行计算? 需要利用parallel包,想请教一下,谢谢!

0
已邀请:
0

marxsong 2016-11-15 回答

参考stackoverflow
library(parallel)
library(BLR)

data(wheat)

mc = mclapply(2:6, function(x,centers)kmeans(x, centers), x=X)

> summary(mc)
Length Class Mode
[1,] 9 kmeans list
[2,] 9 kmeans list
[3,] 9 kmeans list
[4,] 9 kmeans list
[5,] 9 kmeans list
改进方案:
(pars = expand.grid(i=1:3, cent=2:4))

i cent
1 1 2
2 2 2
3 3 2
4 1 3
5 2 3
6 3 3
7 1 4
8 2 4
9 3 4

L=list()
# zikes horrible
pars2=apply(pars,1,append, L)
mc = mclapply(pars2, function(x,pars)kmeans(x, centers=pars$cent,nstart=pars$i ), x=X)

> summary(mc)
Length Class Mode
[1,] 9 kmeans list
[2,] 9 kmeans list
[3,] 9 kmeans list
[4,] 9 kmeans list
[5,] 9 kmeans list
[6,] 9 kmeans list
[7,] 9 kmeans list
[8,] 9 kmeans list
[9,] 9 means list
或者是用clusterApply函数:

library(parallel)
nw <- detectCores()
cl <- makeCluster(nw)
clusterSetRNGStream(cl, iseed=1234)
set.seed(88)
mydata <- matrix(rnorm(5000 * 100), nrow=5000, ncol=100)

# Parallelize over the "nstart" argument
nstart <- 100
# Create vector of length "nw" where sum(nstartv) == nstart
nstartv <- rep(ceiling(nstart / nw), nw)
results <- clusterApply(cl, nstartv,
function(n, x) kmeans(x, 3, nstart=n, iter.max=1000),
mydata)
# Pick the best result
i <- sapply(results, function(result) result$tot.withinss)
result <- results[[which.min(i)]]
print(result$tot.withinss)

要回复问题请先登录注册