Friday, December 6, 2013

hbase note

1.  pre-split regions
rm /tmp/region.splits; for i in $(seq 1 1 99); do printf "%02d00\n" $i >> /tmp/region.splits ; done
create 'networkProfile3', {NAME => 'c', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'SNAPPY', MIN_VERSIONS => '1', TTL => '7776000', BLOCKSIZE => '65536', IN_MEMORY => 'false ', BLOCKCACHE => 'true'}, {SPLITS_FILE => '/tmp/region.splits'}


2. flush sharded redis server

 for i in $(seq 1 7)
 do 
 host="flredis0"$i"-mini.private"  
 ssh $host '(echo "flushdb" | redis-cli)'

 done

Tuesday, November 19, 2013

Scala Notes

1. A function calling async function should return a future too.

def foo (b: B) : Future[A] {
    val c = aync {
       ...
    }

2. access a future:

 for {
      s <- c
   } yield {
      ...
   }
}

3. change a list of future to future of list
Future.sequence(x)

4. wrap anything with Future:
sync(None)

Thursday, November 14, 2013

Test udf


 java -cp ./insights-etl-0.7.6.jar:~/insights/lib/*:/usr/lib/hive/lib/* com.klout.perk.GnipTupleUDTF

Friday, November 1, 2013

Hive Mapside Join Configuration

Disable mapside join
set hive.auto.convert.join=false;
Other configuration : go to https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties
search mapside join

Wednesday, September 25, 2013

Scala

Means case class returns not single types in different branches:

 Cannot prove that Product with Serializable <:< (T, U).

Tuesday, September 17, 2013

ElasticSearch

1. ElasticSearch transfers data scoll size setup. Will transfer up to 100 documents starting from the first one.

_search?from=0&size=100

Thursday, August 22, 2013

Bash Looping

#times of try
times=36
#interval in seconds
interval=450

if [ $(date +%u) -eq 1 ]
then
 times=108
fi

echo "check if /data/hive/insights/brand/irm-free-insights-pipeline/${date}/_SUCCESS exits."
i=0
while [ $i -lt $times ]
do
s=$(ssh insights@jobs-aa-sched1 "(hadoop fs -ls /data/hive/insights/brand/irm-free-insights-pipeline/${date} | grep _SUCCESS | wc -l)")
if [ $s -eq 1 ]
then
  echo "perks pipeline success."
  exit 0
else
  echo "sleep for $interval seconds and check again, tried $i out of $times"
  sleep $interval
fi
i=$(expr $i + 1)
done

echo "insights pipeline failed."
exit 1