Friday, September 12, 2014

Use scutil to set host name

scutil --set HostName "localhost"

scutil --get HostName 

Note: make sure "localhost" is set in your /etc/hosts file
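
A quick way to check the /etc/hosts note above is sketched below. The helper name hosts_has_name is made up for illustration; it only looks for the name among the aliases of non-comment lines.

```shell
# Hypothetical helper: succeed if a hosts file maps some address to the given name.
# $1: path to the hosts file, $2: host name to look for
hosts_has_name() {
  awk -v name="$2" '
    { sub(/#.*/, "") }   # drop comments, including trailing ones
    { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
    END { exit !found }
  ' "$1"
}

# Usage:
#   hosts_has_name /etc/hosts localhost && echo "localhost is set"
```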

Thursday, September 11, 2014

HBase Snapshot Restrictions

HBase snapshots have two restrictions:

1. If regions are merged (usually manually) after the snapshot is taken, the snapshot becomes invalid. Region splits are fine.

2. If you create a table by restoring data from a snapshot, replication of that new table to another cluster does not guarantee data integrity.

Monday, August 25, 2014

Setup local dev env

Apply the patch for OS X

uname -a

Copy the IP and put it in /etc/hosts as an alias of localhost

Set up DNS servers in System Preferences -> Network -> Advanced -> DNS



Run Service in debug mode

gradle :apps:[Service_Name]:run --debug-jvm -Pdebug

Tuesday, July 15, 2014

Gradle Notes


1. To show the command-line options of the task "test":
 gradle help --task :test

Thursday, July 10, 2014

Find class in jar

In the lib directory, run:

grep package/or/classname *

Result:

abc.jar

Look through the classes inside it:

less abc.jar


shift+g : go to bottom
shift+n : go to the previous search match
space: go forward page by page
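
The grep step above can be wrapped in a small function. This is only a sketch: the name find_jars_with_class is made up, and it relies on entry names being stored uncompressed inside the jar, which is why a plain grep on the binary file finds them.

```shell
# Hypothetical helper: list the jars in a directory that mention a class path.
# $1: directory holding the jars, $2: class path such as com/example/Foo
find_jars_with_class() {
  # -l prints only the names of matching files; -F treats the path as a
  # fixed string so dots are not interpreted as regex wildcards
  grep -lF -- "$2" "$1"/*.jar 2>/dev/null
}

# Usage:
#   find_jars_with_class lib com/example/Foo
```

If less does not list the archive contents on your system, unzip -l abc.jar prints the entries as well.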

Tuesday, May 6, 2014

HBase - advanced configuration on column family level


Block Size
HFile block size. The default is 64 KB. For good sequential scan performance, a larger block size is better.
        Set it at table creation:
        hbase(main):002:0> create 'mytable',   {NAME => 'colfam1', BLOCKSIZE => '65536'}
        Or in code:
        HColumnDescriptor has a method: setBlocksize(int)

Block Cache
        You can disable the block cache for a specific column family, for example to leave more cache for other column families.

hbase(main):002:0> create 'mytable',
{NAME => 'colfam1', BLOCKCACHE => 'false'}

Aggressive caching

       You can give selected column families higher priority in the block cache.

       hbase(main):002:0> create 'mytable',
       {NAME => 'colfam1', IN_MEMORY => 'true'}

Bloom filters

       You enable bloom filters on the column family, like this:
       hbase(main):007:0> create 'mytable',
       {NAME => 'colfam1', BLOOMFILTER => 'ROWCOL'}

       A row-level bloom filter is enabled with ROW, and a row-plus-qualifier-level bloom filter is enabled with ROWCOL.

TTL
       Defining a Time To Live (in seconds) on a column family deletes its data after the given amount of time.
       Example:
      hbase(main):002:0> create 'mytable', {NAME => 'colfam1', TTL => '18000'}

     Data in colfam1 that is older than 5 hours is deleted during the next major compaction.
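
Since TTL is given in seconds, a tiny conversion helper avoids mental arithmetic. The function name hours_to_ttl is made up for this sketch.

```shell
# Hypothetical helper: convert a number of hours into a TTL value in seconds.
hours_to_ttl() {
  echo $(( $1 * 3600 ))
}

hours_to_ttl 5   # prints 18000, the value used in the example above
```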

Compression

    The compression setting controls how data in HFiles is stored. It saves disk I/O at the cost of higher CPU utilization.
    hbase(main):002:0> create 'mytable',
    {NAME => 'colfam1', COMPRESSION => 'SNAPPY'}

Cell versioning

    By default, 3 versions of each value are kept (1 since HBase 0.96). This can be changed:

    hbase(main):002:0> create 'mytable', {NAME => 'colfam1', VERSIONS => 5,
    MIN_VERSIONS => '1'}

Wednesday, April 16, 2014

Ops

disk info: df
io info: iostat
memory usage info: free
cpu info: top / htop (is pretty cool)
network io: iftop

Thursday, February 13, 2014

Redis Note

1) Remove all keys with a prefix
for key in `echo 'KEYS sectionId_*' | redis-cli | awk '{print $1}'`
 do echo DEL $key
done | redis-cli

2) Reset the TTL on all keys
for key in `echo 'KEYS *' | redis-cli | awk '{print $1}'`
do
echo expire $key 86400
done | redis-cli
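
Both loops above follow the same pattern: list keys, turn each into a command, pipe the commands back into redis-cli. A sketch of that pattern as one reusable function follows. The name keys_to_commands is made up, and the usage lines assume a redis-cli that supports --scan, which iterates with SCAN instead of the blocking KEYS command. Keys containing whitespace are not handled.

```shell
# Hypothetical helper: read keys on stdin (one per line) and print one Redis
# command per key. $1: command name, $2: optional trailing argument.
keys_to_commands() {
  cmd=$1
  arg=${2:-}
  while IFS= read -r key; do
    if [ -n "$key" ]; then
      # Appends " <arg>" only when a trailing argument was given
      printf '%s %s%s\n' "$cmd" "$key" "${arg:+ $arg}"
    fi
  done
}

# Usage (not run here):
#   redis-cli --scan --pattern 'sectionId_*' | keys_to_commands DEL | redis-cli
#   redis-cli --scan | keys_to_commands EXPIRE 86400 | redis-cli
```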

Tuesday, February 11, 2014

Edit Bash Prompt

Edit .bash_profile:
Between the quotation marks of the PS1 value, you can use the following escape sequences to customize your Terminal prompt:
  • \d – Current date
  • \t – Current time
  • \h – Host name
  • \# – Command number
  • \u – User name
  • \W – Current working directory, last component only (e.g. Desktop)
  • \w – Current working directory, full path (e.g. /Users/Admin/Desktop)
So, let's say you want your Terminal prompt to display the user, followed by the hostname, followed by the directory. The .bash_profile entry would be:
export PS1="\u@\h\w$ "
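
A variation on the entry above, as a sketch only: it adds the time and a colon between host and path. Single quotes keep the shell from touching the backslashes before bash interprets them in the prompt.

```shell
# Example prompt layout: [time] user@host:full-path$
# \$ shows # when running as root and $ for other users
export PS1='[\t] \u@\h:\w\$ '
```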

Wednesday, February 5, 2014

Generic HBase Export


code/analytics/etl/bing/bing-assembly/src/assembly/oozie/generic_hbase_export.xml

com.klout.bing.hbaseexport.HbaseTableExportMapper

Wednesday, January 29, 2014

Convert word doc to md file

textutil -convert html ~/Downloads/UniquereachImpressionBreakdown.docx  -stdout | pandoc -f html -t markdown -o output.md

Monday, January 27, 2014

Bash Format and Send Email

printTable.sh

#!/bin/bash
#
#   Script that outputs a hive table to a file

if [ $# -ne 3 ]
then
  echo "Script that outputs a hive table to a file."
  echo "Usage: $0 <db_name> <table_name> <output_path>"
  exit 1
fi

DB_NAME=$1
TABLE_NAME=$2
OUTPUT_PATH=$3

# Skip the first two lines and the last line of the hv output
echo "use $DB_NAME; select * from $TABLE_NAME;" | /home/insights/insights/hive/hv | tail -n+3 | head -n-1 > "$OUTPUT_PATH"


Caller that runs printTable.sh remotely, prepends a header row, zips the file, and mails it to each address (expects $email and $date to be set):

if [ "$email" != "email" ]
then
for line in $(echo $email | tr ',' '\n')
do

ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no abc@host1 "(cd insights/bin; sh printTable.sh bi_insights tmp_kfb_user_meta ~/gfan/kfb_user_meta_$date.tsv && echo -e 'brand_ks_uid\thandle\tbrand\temail\ttw_followers\tfp_likes\tnum_activities_90days\tcity\tstate\tcountry\tkdc_created_at' | cat - ~/gfan/kfb_user_meta_$date.tsv > ~/gfan/tmp && mv ~/gfan/tmp ~/gfan/kfb_user_meta_$date.tsv && zip ~/gfan/kfb_user_meta_$date.zip ~/gfan/kfb_user_meta_$date.tsv && echo 'Data Format: brand_ks_uid | handle | brand | email | tw_followers | fp_likes | num_activities_90days | city | state | country | kdc_created_at' | mutt -s 'KFB User Meta' $line -a ~/gfan/kfb_user_meta_$date.zip)"

done
fi
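
The header step buried in the ssh command above (echo the header, cat the file after it, move the result back) can be tried locally in isolation. A minimal sketch, with a made-up function name prepend_header and mktemp in place of the fixed ~/gfan/tmp path:

```shell
# Hypothetical helper: put a header line at the top of a file, in place.
# $1: header line (already tab-separated), $2: file to update
prepend_header() {
  tmp=$(mktemp) || return 1
  # Write the header first, then the original content, then swap the files
  { printf '%s\n' "$1"; cat "$2"; } > "$tmp" && mv "$tmp" "$2"
}

# Usage:
#   prepend_header "$(printf 'brand_ks_uid\thandle\tbrand')" kfb_user_meta.tsv
```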

Friday, January 10, 2014

JUnit Test Note

Run a test from the command line:


java -cp /Users/guangle/.m2/repository/junit/junit/4.8.1/junit-4.8.1.jar:target/thunder-libs-0.0.71.jar org.junit.runner.JUnitCore com.klout.thunder.common.UrlSummaryNormalizerTest