平常心 - 天善智能: a vertical community platform focused on business intelligence (BI), data analytics, and big data

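1 recommendation • 1562 views

Writing scripts in Scala

1. Environment: CentOS, Scala 2.10.2. 2. A simple example: # vi hello.sh #!/bin/sh exec scala "$0" "$@" !# println("HellO,Linux World") Here, #! gives the path of the shell that interprets the script, $0 is bound to the script name hello.sh, and $@ ...

Posted an article • 2016-12-07 14:12 • 0 comments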

1 recommendation • 2128 views

1. Converting a DataFrame to an RDD: import org.apache.spark.sql.SparkSession import org.apache.hadoop.hbase.HBaseConfiguration import org.apache.hadoop.mapred.JobConf import org.apache.hadoop.hbase.mapred.TableOutputFormat import org.apache.hadoop.hbase.client.Put import org.apache.hadoop.hbase.util.Bytes import...

Posted an article • 2016-12-05 09:35 • 0 comments

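1 recommendation • 2416 views

Computing max/min, average, and Top N in Spark 2.0

Compared with MapReduce, computations are much more concise to write in Spark. The code is as follows: import org.apache.spark.sql.SparkSession object App { def main(args: Array[String]): Unit = { // test max and min // testMaxMin // test average // testAvg // test Top N testTopN } def testMaxMin:Unit = { val sparkS...

Posted an article • 2016-12-01 15:30 • 0 comments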
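
As a rough illustration of the three computations named in the post above (max/min, average, Top N), here is a minimal plain-Java sketch. It does not use Spark and is not the author's code, whose listing is truncated in this feed. The class name `StatsSketch`, its method names, and the sample numbers are assumptions made up for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java sketch (no Spark): max/min, average, and Top N over a list.
// All names here are hypothetical and chosen only for illustration.
class StatsSketch {

    // Largest element of a non-empty list.
    static int max(List<Integer> values) {
        return Collections.max(values);
    }

    // Smallest element of a non-empty list.
    static int min(List<Integer> values) {
        return Collections.min(values);
    }

    // Arithmetic mean; rejects an empty list instead of returning NaN.
    static double average(List<Integer> values) {
        return values.stream()
                .mapToInt(Integer::intValue)
                .average()
                .orElseThrow(() -> new IllegalArgumentException("empty input"));
    }

    // The n largest elements, in descending order.
    static List<Integer> topN(List<Integer> values, int n) {
        return values.stream()
                .sorted(Comparator.reverseOrder())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> sample = Arrays.asList(5, 1, 9, 3, 7); // made-up data
        System.out.println("max = " + max(sample));
        System.out.println("min = " + min(sample));
        System.out.println("avg = " + average(sample));
        System.out.println("top 3 = " + topN(sample, 3));
    }
}
```

Sorting the whole list to take the top N is the simplest approach for small inputs; for large data sets a bounded heap, or a distributed engine such as the Spark job the post describes, would be the more usual choice.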

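0 recommendations • 1735 views

Java web crawler design

1. Crawler system design 1.1 Overview 1.2 Module breakdown 1.2.1 Data crawling module: HttpClient downloads the HTML pages; HtmlCleaner + XPath; Jsoup ...

Posted an article • 2016-11-29 19:24 • 0 comments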

0 recommendations • 1311 views

Designing a crawler queue on Redis

1. Software download: http://download.redis.io/releases/ redis-2.8.1.tar.gz 2. Linux installation: # cd /opt/ # tar -zxvf redis-2.8.1.tar.gz # cd redis-2.8.1 # make /bin/sh: cc: command not found // gcc-c++ is not installed # yum install gcc-c++ # make error: jemalloc/jemalloc.h: No such file or directory Cause of the error...

Posted an article • 2016-11-28 09:59 • 0 comments

2 recommendations • 2122 views

Greenplum cluster installation

1. Environment 1.1 Servers: CentOS 6.5 64-bit, 8 cores/32 GB. SZB-L0038784: master, segment primary/mirror; SZB-L0038785: standby, segment primary/mirror; SZB-L0038786: segment primary/mirror; SZB-L0038787: segment primary/mirror; SZB-L0038788: segment primary/mirror 1.2 Greenplum version: greenplum-db-4....

Posted an article • 2016-11-16 09:50 • 1 comment

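1 recommendation • 1811 views

Developing a proxy middleware for the Spark and Hive Thrift servers

1. Hive environment: 0.13 2. Functionality: a proxy Thrift server is started and dispatches requests to the Spark Thrift servers that are actually running, so that the Spark Thrift service is not accessed as a single designated user; access control is handed over to the proxy layer. 3. Code structure

Posted an article • 2016-11-14 11:26 • 1 comment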

1 recommendation • 1834 views

Drill cluster installation

1. Environment 1.1 JDK: /opt/jdk1.7.0_79 1.2 Drill version: /opt/apache-drill-1.7.0 1.3 ZooKeeper address: SZB-L0032013:2181,SZB-L0032014:2181,SZB-L0032015:2181 1.4 Drill cluster servers: 10.20.25.199 SZB-L0032015, 10.20.25.241 SZB-L0032016, 10.20.25.137 SZB-L0032017 2....

Posted an article • 2016-11-11 09:15 • 0 comments

1 recommendation • 2376 views

Example: displaying data with the Baidu Maps API

1. Environment: Baidu Maps JS: http://api.map.baidu.com/api?v=2.0&ak=申请的秘钥 (申请的秘钥 stands for the API key you applied for) 2. Framework: SSM 3. Code 3.1 Java code: import java.util.ArrayList; import java.util.List; import javax.servlet.http.HttpServletRequest; import org.springframework.stereotype.Controller; import org.springframework.ui.Model; impor...

Posted an article • 2016-11-10 19:00 • 0 comments

2 recommendations • 2255 views

Displaying Hive data in Zeppelin via Spark

1. Viewing the Hive databases and tables: hive> show databases; OK default tpc hive> use tpc; OK hive> show tables; OK call_center catalog_page catalog_returns catalog_sales customer customer_address customer_demographics date_dim dbgen_version household_demographics income_band inventory item promotion reaso...

Posted an article • 2016-11-04 11:05 • 1 comment

2 recommendations • 2143 views

Creating external tables in Greenplum

Posted an article • 2016-10-13 11:35 • 0 comments

1 recommendation • 1819 views

Three ways to break out of a loop in Scala

Method 1: use a boolean variable. while version: var res = 0 var flag = true while (flag){ println("out put res :" + res) res += 1 if(res == 5) flag = false } for version: var res = 0 var ...

Posted an article • 2016-10-06 14:03 • 0 comments

2 recommendations • 2356 views

Greenplum installation

1. Reference documentation: http://gpdb.docs.pivotal.io/4390/common/welcome.html 2. Environment: Linux: CentOS 6.3 64-bit; greenplum-db-4.3.9.1-build-1-rhel5-x86_64.zip 3. Install as the root user 4. Linux environment setup 4.1 Firewall settings: # vi /etc/selinux/config SELINUX=disabled # chkconfig iptables off # chkconfig --list iptables 4.2 Configure IP mapping and hostname: #...

Posted an article • 2016-10-04 18:34 • 0 comments

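4 recommendations • 3754 views

Integrating Hive and HBase

Hive-HBase integration depends on hive-hbase-handler.jar, which lives under Hive's lib directory. So far I have found only two ways to use it. First: Hive creates a table and creates the HBase table at the same time. Second: Hive maps an existing HBase table. Part 1: Hive creates a table and creates the HBase table at the same time. 1. Create the HBase table from Hive: CREATE TABLE hive2hbase_tes...

Posted an article • 2016-04-22 12:20 • 3 comments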

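3 recommendations • 11421 views

A first look at Atlas, Hadoop's data governance and metadata framework

To understand Atlas, let's first look at metadata and data governance. I. Metadata: metadata is data that describes data. In Java programming terms: public class Customer { private String id; private String name; private String address; private String ID; public Customer(String id, String name...

Posted an article • 2016-03-31 11:06 • 1 comment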
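
To make "metadata is data that describes data" concrete, the sketch below uses Java reflection to list the field names and types of a `Customer` class. The four fields mirror the ones visible in the truncated snippet above; everything else (the wrapper class `MetadataSketch`, the `describe` helper, and the omission of the constructor) is an assumption for illustration, and none of it is Atlas API code.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Sketch: a class's field names and types are metadata about its instances.
// MetadataSketch and describe() are hypothetical names for this example only.
class MetadataSketch {

    // Stand-in for the post's Customer class; only the fields visible in the
    // truncated snippet are reproduced, and the constructor is left out.
    static class Customer {
        private String id;
        private String name;
        private String address;
        private String ID;
    }

    // Returns one "name: Type" entry per declared field of the given class.
    static List<String> describe(Class<?> type) {
        List<String> entries = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            if (field.isSynthetic()) {
                continue; // skip compiler-generated fields, if any
            }
            entries.add(field.getName() + ": " + field.getType().getSimpleName());
        }
        return entries;
    }

    public static void main(String[] args) {
        // Prints the field descriptions, i.e. data describing Customer data.
        for (String entry : describe(Customer.class)) {
            System.out.println(entry);
        }
    }
}
```

The values stored in a `Customer` object are the data; the list returned by `describe` (for example `name: String`) is metadata about that data, which is the kind of information a metadata framework catalogs at a much larger scale.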