# Maximum number of attempts
times=36
# Interval between attempts, in seconds
interval=450
# On Mondays (date +%u prints 1), allow three times as many attempts
if [ "$(date +%u)" -eq 1 ]
then
    times=108
fi
echo "Checking whether /data/hive/insights/brand/irm-free-insights-pipeline/${date}/_SUCCESS exists."
i=0
while [ "$i" -lt "$times" ]
do
    # Count the _SUCCESS entries in the remote HDFS listing
    s=$(ssh insights@jobs-aa-sched1 "hadoop fs -ls /data/hive/insights/brand/irm-free-insights-pipeline/${date} | grep -c _SUCCESS")
    # Compare as a string so an empty result (for example, a failed ssh) just triggers a retry
    if [ "$s" = "1" ]
    then
        echo "insights pipeline succeeded."
        exit 0
    else
        echo "sleeping for $interval seconds before checking again, attempt $((i + 1)) of $times"
        sleep "$interval"
    fi
    i=$((i + 1))
done
echo "insights pipeline failed."
exit 1
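The same wait can also be built around an exit status instead of counting lines of `-ls` output: `hadoop fs -test -e` returns 0 when the path exists. The following is a minimal sketch of that idea, not the script actually in use. The helper name `wait_for` and its argument order are hypothetical, and the commented usage assumes `hadoop` is on the remote PATH and that `${date}` is set by the caller, as in the script above.

```shell
#!/bin/sh
# wait_for TIMES INTERVAL CMD...
# Run CMD up to TIMES times, sleeping INTERVAL seconds between attempts.
# Returns 0 as soon as CMD succeeds, 1 if every attempt fails.
# (Hypothetical helper, not part of the original script.)
wait_for() {
    times=$1
    interval=$2
    shift 2
    i=0
    while [ "$i" -lt "$times" ]
    do
        if "$@"
        then
            return 0
        fi
        i=$((i + 1))
        # Skip the sleep after the final attempt
        if [ "$i" -lt "$times" ]
        then
            sleep "$interval"
        fi
    done
    return 1
}

# Intended use (assumption: same host and path as the script above):
# wait_for 36 450 ssh insights@jobs-aa-sched1 \
#     "hadoop fs -test -e /data/hive/insights/brand/irm-free-insights-pipeline/${date}/_SUCCESS"
```

Because the check is passed in as a command, the retry logic can be exercised locally with any command, without touching the cluster.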
Thursday, August 22, 2013
Wednesday, August 14, 2013
Hubot Lock Structure
1. There are two kinds of locks:
"Uploading Lock": a single node in ZooKeeper.
"Health": a node with two children, "targetingA" and "targetingB".
2. Whenever we upload to (index) one cluster, for example targetingA:
We acquire "Uploading Lock" and set "Health/targetingA" to false, so the API can no longer read from targetingA.
Setting "Health/targetingB" to true (we call this "overwrite to targetingB") allows the API to read from targetingB.
3. "Uploading Lock" ensures that only one cluster is uploading at any time.
The "Health" node marks which cluster the API can use.
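The sequence in step 2 can be sketched with a local stand-in for ZooKeeper. In this sketch a temporary directory plays the role of the znode tree: `mkdir`, which fails if the directory already exists, stands in for acquiring "Uploading Lock", and small files hold the "Health" flags. All function names and the directory layout are hypothetical; a real implementation would use a ZooKeeper client. The notes do not say what happens once an upload finishes, so `end_upload` only releasing the lock is an assumption.

```shell
#!/bin/sh
# Local simulation only: a directory tree stands in for the znodes.
ZK_ROOT=$(mktemp -d)
mkdir -p "$ZK_ROOT/Health"
echo true > "$ZK_ROOT/Health/targetingA"
echo true > "$ZK_ROOT/Health/targetingB"

# set_health CLUSTER VALUE: mark whether the API may read from CLUSTER.
set_health() {
    echo "$2" > "$ZK_ROOT/Health/$1"
}

# Print the clusters the API is currently allowed to read from.
readable_clusters() {
    for f in "$ZK_ROOT"/Health/*
    do
        if [ "$(cat "$f")" = "true" ]
        then
            basename "$f"
        fi
    done
}

# begin_upload CLUSTER OTHER: start indexing CLUSTER while the API reads OTHER.
# Fails without touching the Health flags if another upload holds the lock.
begin_upload() {
    mkdir "$ZK_ROOT/UploadingLock" 2>/dev/null || return 1
    set_health "$1" false   # the API stops reading from CLUSTER
    set_health "$2" true    # "overwrite to OTHER"
}

# end_upload: release "Uploading Lock" (assumption: the Health flags stay as set).
end_upload() {
    rmdir "$ZK_ROOT/UploadingLock"
}
```

Calling `begin_upload targetingA targetingB` leaves only targetingB readable, and a second `begin_upload` fails until `end_upload` has run, which is the guarantee described in step 3.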
Tuesday, August 6, 2013
Several Hive Problems
1.
insert overwrite table foo
select a.*
from
(select c, d, e from too1
union all
select c, d, e from too2
) a
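This does not work as intended if the columns of foo are declared in any order other than c, d, e.
Hive maps the selected columns to the table's columns by position, not by name, so as long as the column types are compatible the data silently lands in the wrong columns. List the columns in the select in the same order that foo declares them.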
2.
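select * from table where id not in ('a', 'b', 'c');
This filters out not only the records where id is 'a', 'b', or 'c' but also the records where id is null.
Comparing null with any value yields null rather than true, so the where clause rejects the record. Add "or id is null" to the condition if those records should be kept.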
3.
Avoid grouping by too many columns, especially long strings: it slows the query down and makes errors more likely when rows are processed. Instead, group by id1, id2, id3 and use group_first(other column) for the remaining columns.