Tuesday, August 6, 2013

Several Hive Problems

1.
insert overwrite table foo
select a.*
from
(select c, d, e from too1
union all
select c, d, e from too2
) a


This won't work as expected if the column order in foo is anything other than c, d, e.
Hive maps the selected columns to the target table by position, not by name, so if the column types happen to match, the data is silently written to the wrong columns.
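
One way to guard against this is to list the columns explicitly in the target table's order instead of using a.*. A minimal sketch, assuming (hypothetically) that foo is declared with its columns in the order e, c, d:

```sql
-- Assumption: foo's columns are declared as (e, c, d).
-- Check the real order with `describe foo;` and adjust the select list.
insert overwrite table foo
select a.e, a.c, a.d          -- same order as foo's column definition
from
(select c, d, e from too1
union all
select c, d, e from too2
) a;
```

Both branches of the union all still need to produce the same columns in the same order; only the outer select list is rearranged.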

2.
select * from table where id not in ('a', 'b', 'c');

The records filtered out are not only those with id = 'a', 'b', or 'c', but also those where id is null.
For a null id, the not in comparison evaluates to null rather than true, so the record is filtered out as well.
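
If the null rows should be kept, the null case has to be handled explicitly. A minimal sketch, where some_table is a hypothetical table name used only for illustration:

```sql
-- Keep rows whose id is null in addition to rows whose id is not 'a', 'b', or 'c'.
-- some_table is a placeholder name, not from the original query.
select *
from some_table
where id is null
   or id not in ('a', 'b', 'c');
```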

3.
Avoid grouping by too many columns, especially long strings: it slows the query down and makes errors more likely when processing the rows. Instead, group by id1, id2, id3 only, and use group_first(other column) to pick up the remaining columns.
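
The idea can be sketched as follows. group_first is not a function I can confirm as a Hive built-in, so it is presumably a custom UDAF; this sketch swaps in the built-in max() instead, which returns the same value as long as the other column is constant within each (id1, id2, id3) group. The table name events and the column name long_description are hypothetical:

```sql
-- Assumption: long_description has a single value per (id1, id2, id3) group,
-- so max() simply returns that value. events and long_description are
-- placeholder names for illustration.
select id1, id2, id3,
       max(long_description) as long_description
from events
group by id1, id2, id3;
```

If the column can differ within a group, max() picks the largest value rather than the first one, so the result would not match what a "first" aggregate returns.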
