Question: when I use Sqoop to import MySQL data into Hive, I always end up with exactly one duplicate row. How do I fix this?


regan (run! run! run! happy runner! I'm the running Xiaomi~), answered 2017-03-17:

Try the script below. A single duplicate row often comes from an incremental window that is inclusive on both ends, so the row sitting exactly on the boundary gets picked up by two consecutive runs; the half-open interval here ($CON_DATE >= yesterday and $CON_DATE < today), combined with overwriting the day's partition, avoids that.

# 12 positional parameters: MySQL connection info, source table, Hive target, and the incremental window.
IP=$1
PORT=$2
DB=$3
USERNAME=$4
PASSWORD=$5
TABLE=$6
HIVE_DB=$7
HIVE_TABLE=$8
HDFS_LOCATION=$9
PARTITION_COLUMN=${10}
CON_DATE=${11}
# schedule date, format: 2017-01-09
n_day=${12}
# Convert the schedule day and the day before it to epoch seconds.
# NOTE: the --where filter below compares $CON_DATE against these values,
# so the MySQL column is assumed to store Unix timestamps.
t_day=$(date --date="$n_day" '+%s')
yesterday=$(date -d "yesterday $n_day" '+%s')

HIVE_DB_TABLE=$HIVE_DB.$HIVE_TABLE
MYSQL_JDBC=jdbc:mysql://$IP:$PORT/$DB
PARTITION_PATH=$HDFS_LOCATION/$n_day

echo "--------------------"
echo "--jdbc:mysql://$IP:$PORT/$DB"
echo "--username/password:$USERNAME/$PASSWORD"
echo "--mysql table:$TABLE"
echo "--hive table:$HIVE_DB.$HIVE_TABLE"
echo "--store hdfs:$HDFS_LOCATION"
echo "--partition column:$PARTITION_COLUMN"
echo "--partition path:$PARTITION_PATH"
echo "--increment date column:$CON_DATE"
echo "--schedule date:$n_day"

echo "-----start export------"

echo "select sql: select * from $TABLE where $CON_DATE >= '$yesterday'  and $CON_DATE < '$t_day' and 1=1"
 
# --append cannot be combined with --hive-import (Sqoop rejects the combination),
# and --hive-overwrite makes a re-run replace the day's partition instead of
# appending duplicates. The target database, table, and partition key are taken
# from the script's parameters.
sqoop import --connect "$MYSQL_JDBC" --table "$TABLE" \
    --username "$USERNAME" --password "$PASSWORD" \
    -m 10 \
    --hive-import --hive-overwrite \
    --hive-database "$HIVE_DB" --hive-table "$HIVE_TABLE" \
    --hive-partition-key "$PARTITION_COLUMN" --hive-partition-value "$n_day" \
    --fields-terminated-by "\0001" \
    --where "$CON_DATE >= '$yesterday' and $CON_DATE < '$t_day'"
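
For reference, an invocation of the script would look something like this. The script name and the host, credential, and column values are made-up placeholders for illustration, not anything from the original post:

# Hypothetical example: 12 positional parameters, in the order defined at the top of the script.
sh sqoop_mysql_to_hive.sh 192.168.1.100 3306 order_db etl_user etl_pass orders \
    xxx_dw can_schedule_list /user/hive/warehouse/xxx_dw data_date update_time 2017-01-09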

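If a duplicate still shows up after the load, you can locate it from the shell with a per-key count over the day's partition. This is only a sketch: the key column id is an assumption about your schema, and the database, table, and date are the same placeholder values as above.

# Hypothetical check: list keys that appear more than once in the loaded partition
# ('id' is an assumed primary-key column; adjust to your schema).
hive -e "SELECT id, COUNT(*) AS cnt
FROM xxx_dw.can_schedule_list
WHERE data_date = '2017-01-09'
GROUP BY id
HAVING COUNT(*) > 1;"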