Using the sqoop Node in Oozie on the HDP Platform


1. How sqoop import works

a. Read the data from the DB into a temporary HDFS directory.

b. Load the temporary directory into the Hive table (LOAD DATA INPATH '/sparktest/ooziedata/sparktest00/' OVERWRITE INTO TABLE spark_test03;)

sqoop import --connect jdbc:postgresql://bigdatamster:5432/public --username user --password pwd -m 1 --table spark_test --hive-import --hive-database default --hive-overwrite --hive-table spark_test0 --target-dir hdfs://bigdatamster:8020/sparktest/ooziedata/sparktest00
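
For debugging, the two phases above can also be run by hand. This is only a sketch: the HiveServer2 URL below is a placeholder, and the connection details and table names follow the examples in this note.

```shell
# Phase a: import the table into the temporary HDFS directory only (no Hive step).
TARGET_DIR=/sparktest/ooziedata/sparktest00
sqoop import --connect jdbc:postgresql://bigdatamster:5432/public \
  --username user --password pwd -m 1 --table spark_test \
  --target-dir "hdfs://bigdatamster:8020${TARGET_DIR}"

# Phase b: load the directory into the Hive table (HiveServer2 URL is assumed).
beeline -u jdbc:hive2://bigdatamster:10000/default \
  -e "LOAD DATA INPATH '${TARGET_DIR}/' OVERWRITE INTO TABLE spark_test03;"
```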

Note: to use PostgreSQL, postgresql-42.2.2.jar has to be copied into the oozie/lib directory, but that by itself is not what Oozie uses; the driver must also go into the sharelib (see 2.1).

2. Configuring sqoop under Oozie

2.1 Configuring the Oozie sharelib

Because the default Hive jars do not match, you need to create your own Hive library, sqoophive, for sqoop to use under Oozie:

hdfs dfs -put /usr/hdp/2.6.4.0-91/hive/lib/*.jar /user/oozie/share/lib/lib_20180628161326/sqoophive
hdfs dfs -put postgresql-42.2.2.jar /user/oozie/share/lib/lib_20180628161326/sqoophive
Then run:
oozie admin -sharelibupdate
oozie admin -shareliblist
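
Put together, the sharelib setup looks like the sketch below. The lib_<timestamp> directory and the Oozie server URL are specific to my environment; adjust both, and note that a sharelib rebuild creates a new lib_<timestamp> directory.

```shell
# Stage the Hive jars and the JDBC driver into the custom sharelib directory
# (the timestamp and the Oozie URL below are environment-specific placeholders).
SHARELIB_DIR=/user/oozie/share/lib/lib_20180628161326/sqoophive
OOZIE_URL=http://bigdatamster:11000/oozie

hdfs dfs -mkdir -p "$SHARELIB_DIR"
hdfs dfs -put /usr/hdp/2.6.4.0-91/hive/lib/*.jar "$SHARELIB_DIR"
hdfs dfs -put postgresql-42.2.2.jar "$SHARELIB_DIR"

# Refresh and verify: sqoophive should show up in the listing.
oozie admin -oozie "$OOZIE_URL" -sharelibupdate
oozie admin -oozie "$OOZIE_URL" -shareliblist | grep sqoophive
```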

2.2 Running the workflow

After adding a sqoop node in Workflow Manager, fill in:

import --connect jdbc:postgresql://bigdatamster:5432/public --username user --password pwd -m 1 --table spark_test --hive-import --hive-database default --hive-overwrite --hive-table spark_test0 --target-dir hdfs://bigdatamster:8020/sparktest/ooziedata/sparktest00

When submitting, specify oozie.action.sharelib.for.sqoop=sqoop,sqoophive
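
Concretely, that property can go into the job.properties used at submission. This is only a sketch: the host names, ports, and workflow application path are placeholders for this cluster.

```shell
# Minimal job.properties (all hosts, ports, and paths are placeholders).
cat > /tmp/job.properties <<'EOF'
nameNode=hdfs://bigdatamster:8020
jobTracker=bigdatamster:8050
oozie.use.system.libpath=true
oozie.action.sharelib.for.sqoop=sqoop,sqoophive
oozie.wf.application.path=${nameNode}/user/oozie/workflows/sqoop_import
EOF

# Submit against the Oozie server (URL is a placeholder):
# oozie job -oozie http://bigdatamster:11000/oozie -config /tmp/job.properties -run
```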

3. Troubleshooting

3.1 exit code [1]

2018-07-06 10:08:48,535  WARN SqoopActionExecutor:523 - SERVER[bigdatanode0] USER[oozie] GROUP[-] TOKEN[] APP[wkf_example_sqoop] JOB[0000194-180702162526364-oozie-oozi-W] ACTION[0000194-180702162526364-oozie-oozi-W@sqoop_help] Can't find any associated Hadoop job counters
2018-07-06 10:09:03,354  WARN SqoopActionExecutor:523 - SERVER[bigdatanode0] USER[oozie] GROUP[-] TOKEN[] APP[wkf_example_sqoop] JOB[0000194-180702162526364-oozie-oozi-W] ACTION[0000194-180702162526364-oozie-oozi-W@sqoop_dblist] Can't find any associated Hadoop job counters
2018-07-06 10:09:20,309  WARN SqoopActionExecutor:523 - SERVER[bigdatanode0] USER[oozie] GROUP[-] TOKEN[] APP[wkf_example_sqoop] JOB[0000194-180702162526364-oozie-oozi-W] ACTION[0000194-180702162526364-oozie-oozi-W@sqoop_import] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]

This error took a long time to track down. In my environment the Hive lib in the default sharelib was the problem, so I created my own (sqoophive, as in 2.1) and specified oozie.action.sharelib.for.sqoop=sqoop,sqoophive when submitting.

3.2 Exit code 143

This is a memory problem; adjust the YARN and MapReduce memory parameters.

 org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:1 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:0
2018-07-06 13:17:44,629 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1530177734281_2205_m_000000_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143.
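
As a sketch of the fix, per-task memory can be raised through standard MapReduce properties passed as generic -D options before the tool arguments. The numbers below are illustrative only, with the JVM heap set to roughly 80% of the container size (a common rule of thumb); the echo prints the command string to paste into the sqoop node.

```shell
# Illustrative sizes only; tune for your cluster.
MAP_MB=2048                       # mapreduce.map.memory.mb (container size)
HEAP_MB=$(( MAP_MB * 80 / 100 ))  # JVM heap at ~80% of the container

echo "import -Dmapreduce.map.memory.mb=${MAP_MB} \
-Dmapreduce.map.java.opts=-Xmx${HEAP_MB}m \
--connect jdbc:postgresql://bigdatamster:5432/public ..."
```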

3.3 Missing database driver jar

The PostgreSQL JDBC driver is not on the classpath; putting it into the sharelib as in 2.1 fixes this:

18/06/26 14:38:27 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: org.postgresql.Driver
java.lang.RuntimeException: Could not load db driver class: org.postgresql.Driver
        at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:875)
        at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
        at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:763)
        at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786)
        at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:289)
        at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:260)
        at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:246)
        at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:328)
        at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1853)
        at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1653)
        at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:488)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:243)
This article was written by seng and is licensed under the Creative Commons Attribution-ShareAlike 3.0 China Mainland license.