OpenBlas让R的矩阵计算加速

浏览: 1998

Clipboard Image.png

前言

昨天在IBM的大会上,又一次见到了OpenBlas主题的分享,这次必须要试一下。我第一次了解OpenBlas是在2年前的R语言大会上,听到了OpenBlas的各种优势,不过听完也就过去了。2年后再来这个项目,团队成员稳定,项目进展顺利,已经广泛接受,再不尝试一下就真的要落伍了。

目录

  1. OpenBlas介绍
  2. R和OpenBlas的安装
  3. 让R语言加速

1. OpenBlas介绍

OpenBlas是一个开源项目,是由 中科院软件所并行软件与计算科学实验室 发起的基于GotoBLAS2 1.13 BSD版的开源BLAS库高性能实现。

BLAS(Basic Linear Algebra Subprograms 基础线性代数程序集)是一个应用程序接口(API)标准,用以规范发布基础线性代数操作的数值库(如矢量或矩阵乘法)。该程序集最初发布于1979年,并 用于建立更大的数值程序包(如LAPACK)。在高性能计算领域,BLAS被广泛使用。例如,LINPACK的运算成绩则很大程度上取决于BLAS中子程 序DGEMM的表现。为提高性能,各軟硬件厂商则针对其產品对BLAS接口实现进行高度优化。

项目主页:http://www.openblas.net/

2. R和OpenBlas的安装

OpenBlas可以为各种语言底层,提供矩阵计算的性能提升,那么让我们把R和OpenBlas结合试试吧!

本机的系统环境:

  • Linux Ubuntu 14.01.1
  • CPU 双核 Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
  • 内存 4G

通过命令查看系统参数


# 操作系统
~ cat /etc/issue
Ubuntu 14.04.1 LTS \n \l

# CPU
cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 62
model name : Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
stepping : 4
microcode : 0x428
cpu MHz : 2600.048
cache size : 20480 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl pni ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm
bogomips : 5200.09
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 62
model name : Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
stepping : 4
microcode : 0x428
cpu MHz : 2600.048
cache size : 20480 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl pni ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm
bogomips : 5200.09
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

# 内存
~ cat /proc/meminfo
MemTotal: 4046820 kB
MemFree: 1572372 kB
Buffers: 40588 kB
Cached: 709684 kB
SwapCached: 0 kB
Active: 1953940 kB
Inactive: 418084 kB
Active(anon): 1621840 kB
Inactive(anon): 5732 kB
Active(file): 332100 kB
Inactive(file): 412352 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 24 kB
Writeback: 0 kB
AnonPages: 1623792 kB
Mapped: 34936 kB
Shmem: 5828 kB
Slab: 58024 kB
SReclaimable: 45252 kB
SUnreclaim: 12772 kB
KernelStack: 1512 kB
PageTables: 8980 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 2023408 kB
Committed_AS: 2556460 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 9664 kB
VmallocChunk: 34359723308 kB
HardwareCorrupted: 0 kB
AnonHugePages: 1562624 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 28672 kB
DirectMap2M: 4296704 kB

首先,我们要安装R语言的运行环境,在Linux Ubuntu中一条命令就可以搞定。


# 安装R语言
~ sudo apt-get install r-base

#查看R语言的版本
~ R --version
R version 3.2.2 (2015-08-14) -- "Fire Safety"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3.
For more information about these matters see
http://www.gnu.org/licenses/.

我们安装的R语言最新版本是3.2.2。

接下来,我们跑一个矩阵的计算,让2个3000行列的方阵相乘。


# 启动R
~ R

# 方阵相乘
> x <- matrix(1:(3000 * 3000), 3000, 3000)

# 计算耗时
> system.time(tmp <- x %*% x)
user system elapsed
33.329 0.332 33.788

接下来,我们安装OpenBlas来提高计算性能。在Ubuntu中安装OpenBlas非常简单,只需要一条命令就可以搞定。


~ sudo apt-get install libopenblas-base

切换blas的计算引擎,使用openblas替换libblas。


~ sudo update-alternatives --config libblas.so.3
There are 2 choices for the alternative libblas.so.3 (providing /usr/lib/libblas.so.3).

Selection Path Priority Status
------------------------------------------------------------
* 0 /usr/lib/openblas-base/libblas.so.3 40 auto mode
1 /usr/lib/libblas/libblas.so.3 10 manual mode
2 /usr/lib/openblas-base/libblas.so.3 40 manual mode

Press enter to keep the current choice[*], or type selection number: 0

选择0,使用openblas-base引擎。

我们重新打开R运行环境,再次执行刚才的矩阵相乘计算。


~ R
> x <- matrix(1:(3000 * 3000), 3000, 3000)

# 计算耗时
> system.time(tmp <- x %*% x)
user system elapsed
7.391 0.127 3.869

神奇的事情发生了,速度提升了4倍多。由于OpenBlas可以对矩阵计算加速,那么我们对所有矩阵操作都做一下测试吧。

3. 让R语言加速

通过互联网我找到了两个用于R语言性能测试的脚本,我们可以在自己的环境中测试一下。Benchmarks脚本的发布页,脚本代码下载

我发现Revolution Analytics公司也用这个脚本进行了测试,并对比了Revolution企业版和R的官方发行版的区别。

下载脚本


~ wget http://brettklamer.com/assets/files/statistical/faster-blas-in-r/R-benchmark-25.R
--2015-09-24 12:06:05-- http://brettklamer.com/assets/files/statistical/faster-blas-in-r/R-benchmark-25.R
Resolving brettklamer.com (brettklamer.com)... 199.96.156.242
Connecting to brettklamer.com (brettklamer.com)|199.96.156.242|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13666 (13K)
Saving to: ‘R-benchmark-25.R’

100%[============================================================================================>] 13,666 --.-K/s in 0s

2015-09-24 12:06:06 (203 MB/s) - ‘R-benchmark-25.R’ saved [13666/13666]

执行脚本。


~ R

# 运行脚本
> source("R-benchmark-25.R")
Loading required package: Matrix
Loading required package: SuppDists


R Benchmark 2.5
===============
Number of times each test is run__________________________: 3

I. Matrix calculation
---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec): 1.103
2400x2400 normal distributed random matrix ^1000____ (sec): 0.812333333333333
Sorting of 7,000,000 random values__________________ (sec): 0.962666666666667
2800x2800 cross-product matrix (b = a' * a)_________ (sec): 1.547
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 0.828000000000001
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 0.957989159036612

II. Matrix functions
--------------------
FFT over 2,400,000 random values____________________ (sec): 0.365333333333335
Eigenvalues of a 640x640 random matrix______________ (sec): 1.43466666666667
Determinant of a 2500x2500 random matrix____________ (sec): 0.895999999999998
Cholesky decomposition of a 3000x3000 matrix________ (sec): 0.832000000000003
Inverse of a 1600x1600 random matrix________________ (sec): 0.724333333333334
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 0.814310314522547

III. Programmation
------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec): 0.776666666666661
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 0.269666666666671
Grand common divisors of 400,000 pairs (recursion)__ (sec): 0.570666666666663
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 0.506666666666665
Escoufier's method on a 45x45 matrix (mixed)________ (sec): 0.533000000000001
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 0.536138937440438


Total time for all 15 tests_________________________ (sec): 12.162
Overall mean (sum of I, II and III trimmed means/3)_ (sec): 0.747841037469598
--- End of test ---

我们再切换到,R语言默认的blas引擎运行一次。


~ sudo update-alternatives --config libblas.so.3
There are 2 choices for the alternative libblas.so.3 (providing /usr/lib/libblas.so.3).

Selection Path Priority Status
------------------------------------------------------------
* 0 /usr/lib/openblas-base/libblas.so.3 40 auto mode
1 /usr/lib/libblas/libblas.so.3 10 manual mode
2 /usr/lib/openblas-base/libblas.so.3 40 manual mode

Press enter to keep the current choice[*], or type selection number: 1
update-alternatives: using /usr/lib/libblas/libblas.so.3 to provide /usr/lib/libblas.so.3 (libblas.so.3) in manual mode

选择1,切换到libblas引擎。重启R语言环境,并执行脚本。


~ R
> source("R-benchmark-25.R")
Loading required package: Matrix
Loading required package: SuppDists


R Benchmark 2.5
===============
Number of times each test is run__________________________: 3

I. Matrix calculation
---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec): 1.09366666666667
2400x2400 normal distributed random matrix ^1000____ (sec): 0.817333333333333
Sorting of 7,000,000 random values__________________ (sec): 0.954333333333333
2800x2800 cross-product matrix (b = a' * a)_________ (sec): 15.3033333333333
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 7.155
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 1.95463154033118

II. Matrix functions
--------------------
FFT over 2,400,000 random values____________________ (sec): 0.363666666666669
Eigenvalues of a 640x640 random matrix______________ (sec): 1.131
Determinant of a 2500x2500 random matrix____________ (sec): 5.061
Cholesky decomposition of a 3000x3000 matrix________ (sec): 5.634
Inverse of a 1600x1600 random matrix________________ (sec): 4.142
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 2.87278425762591

III. Programmation
------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec): 0.775000000000006
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 0.259666666666665
Grand common divisors of 400,000 pairs (recursion)__ (sec): 0.633333333333345
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 0.533666666666666
Escoufier's method on a 45x45 matrix (mixed)________ (sec): 0.647999999999996
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 0.602780428790226


Total time for all 15 tests_________________________ (sec): 44.505
Overall mean (sum of I, II and III trimmed means/3)_ (sec): 1.5014435867612
--- End of test ---

从运行结果看到,用OpenBlas库在做矩阵计算时,性能优势是非常明显的。完成15个测试,OpenBlas库用时12秒,而默认的Blas库用时44秒。仅仅是切换一个底层算法库的成本,就可以让计算性能得到非常大的提升,各位R的小伙伴赶紧用起来吧。

作者介绍:

张丹,R语言中文社区专栏特邀作者,《R的极客理想》系列图书作者,民生银行大数据中心数据分析师,前况客创始人兼CTO。

10年IT编程背景,精通R ,Java, Nodejs 编程,获得10项SUN及IBM技术认证。丰富的互联网应用开发架构经验,金融大数据专家。个人博客 http://fens.me, Alexa全球排名70k。

著有《R的极客理想-工具篇》、《R的极客理想-高级开发篇》,合著《数据实践之美》,新书《R的极客理想-量化投资篇》(即将出版)。

Clipboard Image.png

《R的极客理想-工具篇》京东购买快速通道:https://item.jd.com/11524750.html

《R的极客理想-高级开发篇》京东购买快速通道:https://item.jd.com/11731967.html

《数据实践之美》京东购买快速通道:https://item.jd.com/12106224.html

推荐 1
本文由 张丹 创作,采用 知识共享署名-相同方式共享 3.0 中国大陆许可协议 进行许可。
转载、引用前需联系作者,并署名作者且注明文章出处。
本站文章版权归原作者及原出处所有 。内容为作者个人观点, 并不代表本站赞同其观点和对其真实性负责。本站是一个个人学习交流的平台,并不用于任何商业目的,如果有任何问题,请及时联系我们,我们将根据著作权人的要求,立即更正或者删除有关内容。本站拥有对此声明的最终解释权。

0 个评论

要回复文章请先登录注册