1. sqoop import基本原理
a.读取db数据到临时目录
b.临时目录 load到hive表(LOAD DATA INPATH '/sparktest/ooziedata/sparktest00/' OVERWRITE INTO TABLE spark_test03;)
sqoop import --connect jdbc:postgresql://bigdatamster:5432/public --username user --password pwd -m1 --table spark_test --hive-import --hive-database default --hive-overwrite --hive-table spark_test0 --target-dir hdfs://bigdatamster:8020/sparktest/ooziedata/sparktest00
注意使用postgresql需要将postgresql-42.2.2.jar copy到oozie/lib目录
但不是oozie需要的
2.oozie 下sqoop的配置过程
2.1配置oozielib
由于和默认的hive不匹配,需要自己创建oozie下sqoop使用的hive库sqoophive
hdfs dfs -put /usr/hdp/2.6.4.0-91/hive/lib/*.jar /user/oozie/share/lib/lib_20180628161326/sqoophive
hdfs dfs -put postgresql-42.2.2.jar /user/oozie/share/lib/lib_20180628161326/sqoophive
执行
oozie admin -sharelibupdate
oozie admin -shareliblist
2.2运行
workflow manager增加sqoop节点后填写
import --connect jdbc:postgresql://bigdatamster:5432/public --username user --password pwd -m1 --table spark_test --hive-import --hive-database default --hive-overwrite --hive-table spark_test0 --target-dir hdfs://bigdatamster:8020/sparktest/ooziedata/sparktest00
在提交的时候指定 oozie.action.sharelib.for.sqoop=sqoop,sqoophive
3.问题的寻找过程
3.1 exit code [1]
2018-07-06 10:08:48,535 WARN SqoopActionExecutor:523 - SERVER[bigdatanode0] USER[oozie] GROUP[-] TOKEN[] APP[wkf_example_sqoop] JOB[0000194-180702162526364-oozie-oozi-W] ACTION[0000194-180702162526364-oozie-oozi-W@sqoop_help] Can't find any associated Hadoop job counters
2018-07-06 10:09:03,354 WARN SqoopActionExecutor:523 - SERVER[bigdatanode0] USER[oozie] GROUP[-] TOKEN[] APP[wkf_example_sqoop] JOB[0000194-180702162526364-oozie-oozi-W] ACTION[0000194-180702162526364-oozie-oozi-W@sqoop_dblist] Can't find any associated Hadoop job counters
2018-07-06 10:09:20,309 WARN SqoopActionExecutor:523 - SERVER[bigdatanode0] USER[oozie] GROUP[-] TOKEN[] APP[wkf_example_sqoop] JOB[0000194-180702162526364-oozie-oozi-W] ACTION[0000194-180702162526364-oozie-oozi-W@sqoop_import] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
这个错误找了好久才解决掉,我的环境下是hive lib有问题,自己创建了一个
在提交的时候指定 oozie.action.sharelib.for.sqoop=sqoop,sqoophive
3.2 143错误
内存问题,调整yarn和mapreduce参数
Showing 4096 bytes of 31782 total. Click here for the full log.
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:1 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:0
2018-07-06 13:17:44,629 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1530177734281_2205_m_000000_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143.
3.3 数据库dbjar没有
18/06/26 14:38:27 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: org.postgresql.Driver
java.lang.RuntimeException: Could not load db driver class: org.postgresql.Driver
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:875)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:763)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:289)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:260)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:246)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:328)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1853)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1653)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:488)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.main(Sqoop.java:243)