数据仓库工具hive提供了复合数据类型:
- 结构体(struct): 基于对象的数据结构形式,struct内部的数据可以通过(.)来存取,例如,表中一列user的类型为STRUCT{name string; age int},我们可以通过user.name来访问域name
- 数组(array):基于数据(列表)的数据结构形式,array中的数据为相同类型,例如,假如array中元素['a','b','c'],则array[1]的值为'b'
- 字典(map):基于key-value的数据结构形式,map访问指定域可以通过["指定域名称"]进行,例如,一个map 包含了一个[name:天天向上]的key-value对,name的值可以通过map['name']来获取
结构体struct使用
建表:
hive> create table student(id INT, user struct<name:STRING, age:INT>)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> COLLECTION ITEMS TERMINATED BY ':';
OK
Time taken: 0.446 seconds
'FIELDS TERMINATED BY' :字段与字段之间的分隔符
''COLLECTION ITEMS TERMINATED BY' :一个字段各个item的分隔符
导入数据:
$ cat test.txt
1,BIWork:30
2,mu:30
hive> LOAD DATA LOCAL INPATH '/data/test.txt' INTO TABLE student;
Copying data from file:/data/test.txt
Copying file: file:/home/work/data/test.txt
Loading data to table default.student
OK
Time taken: 0.35 seconds
测试:
hive> select user.age from student;
Total MapReduce jobs = 1
......
Total MapReduce CPU Time Spent: 490 msec
OK
30
30
Time taken: 21.677 seconds
数组(array)使用
建表:
hive> create table tbl_class(name string, student_id_list array<INT>)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> COLLECTION ITEMS TERMINATED BY ':';
OK
Time taken: 0.099 seconds
导入数据:
$ cat test1.txt
biwork,1:2:3:4
test_user,5:6
hive> LOAD DATA LOCAL INPATH '/data/test1.txt' INTO TABLE tbl_class;
Copying data from file:/data/test1.txt
Copying file: file:/data/test.txt
Loading data to table default.tbl_class
OK
Time taken: 0.198 seconds
测试:
hive> select student_id_list[3] from tbl_class;
Total MapReduce jobs = 1
......
Total MapReduce CPU Time Spent: 480 msec
OK
4
NULL
Time taken: 21.574 seconds
字典(map)使用
建表:
hive> create table tbl_employee(id string, perf map<string, int>)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> COLLECTION ITEMS TERMINATED BY ','
> MAP KEYS TERMINATED BY ':';
OK
Time taken: 0.144 seconds
导入数据:
$ cat test2.txt
1 job:80,team:60,person:70
2 job:60,team:80
3 job:90,team:70,person:100
hive> LOAD DATA LOCAL INPATH '/data/test2.txt' INTO TABLE tbl_employee;
测试:
hive> select perf['person'] from tbl_employee;
Total MapReduce jobs = 1
......
Total MapReduce CPU Time Spent: 460 msec
OK
70
NULL
100
Time taken: 20.902 seconds