Thursday, March 28, 2013

Bash Tool Crontab

Print all processes:

ps -ef

Bash Tool Crontab

1. To edit the cron config file:

crontab -e

2. To print the cron config file:

crontab -l
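
Since crontab -e opens an editor, adding an entry from a script is usually done by piping the output of crontab -l back into crontab. A minimal sketch of that idea; the helper name add_cron_entry and the script path are made up for illustration, and the install line is left commented out so running the sketch changes nothing:

```shell
# Hypothetical helper: reads an existing crontab on stdin and prints it,
# appending the given entry only if that exact line is not already present.
add_cron_entry() {
  entry="$1"
  current="$(cat)"
  if [ -n "$current" ]; then
    printf '%s\n' "$current"
  fi
  if ! printf '%s\n' "$current" | grep -Fxq -e "$entry"; then
    printf '%s\n' "$entry"
  fi
}

# Example entry (hypothetical script path): run every day at 02:30.
entry='30 2 * * * /path/to/backup.sh'

# Dry run against an empty crontab: prints the entry once.
printf '' | add_cron_entry "$entry"

# To install for real (commented out on purpose):
# crontab -l 2>/dev/null | add_cron_entry "$entry" | crontab -
```

Running the helper twice on its own output leaves the entry in place only once, which makes it safe to call from a setup script.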

Wednesday, March 27, 2013

Bash redirection


To redirect stdout, overwriting the file:
cmd > file.txt
To redirect stdout, appending to the file:
cmd >> file.txt
To redirect both stdout and stderr, overwriting the file:
cmd &> file.txt

To redirect both stdout and stderr, appending to the file:
cmd >> file.txt 2>&1
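
A quick way to check these forms is a function that writes one line to each stream. This is a minimal sketch: the function name emit and the temporary file names are made up for illustration, and the portable "> file 2>&1" spelling stands in for "&>" so the block also runs under plain sh:

```shell
# Writes one line to stdout and one line to stderr.
emit() {
  echo "to stdout"
  echo "to stderr" >&2
}

# Throwaway directory so no real file is overwritten.
tmpdir="$(mktemp -d)"

# stdout and stderr into separate files.
emit > "$tmpdir/out.txt" 2> "$tmpdir/err.txt"

# Both streams into one file, overwriting (same effect as "&>" in bash).
emit > "$tmpdir/both.txt" 2>&1

# Both streams into the same file again, appending this time.
emit >> "$tmpdir/both.txt" 2>&1

cat "$tmpdir/both.txt"
```

The order matters in the last two forms: "2>&1" has to come after the file redirection, otherwise stderr is pointed at the terminal before stdout is moved to the file.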

Monday, March 25, 2013

Java heap space or GC overhead limit exceeded issue


set hive.map.aggr=true;
set hive.map.aggr.hash.force.flush.memory.threshold=0.75;
set hive.map.aggr.hash.percentmemory=0.3;
set hive.groupby.mapaggr.checkinterval=10000;
set mapred.child.java.opts=-Xmx3072M;
set hive.exec.compress.output=true;
set io.seqfile.compression.type=BLOCK;
All of these are good parameters for this issue, except maybe the last two:
set hive.exec.compress.output=true;
and
set io.seqfile.compression.type=BLOCK;

Wednesday, March 20, 2013

Awk

cat ~/12 | awk '{print "/data/prod/"$1}'




echo "list_snapshots" | hbase shell | egrep "\([A-Za-z]{3} [A-Za-z]{3} [0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} \+[0-9]{4} [0-9]{4}\)" | grep $(date +%b) | awk -F ' ' '{printf ("delete_snapshot '\''%s'\''\n", $1)}' | hbase shell




cat 1 | grep -v main | grep "\[.*\]" | egrep -o "\"[^,|^\"]*\"" | tr -d '"' | awk -v date="$dateString" '{printf("snapshot '\''%s'\'', '\''%s-snapshot-%s'\''\n", $1, $1, date)}'
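
The tricky part of the awk commands above is the quoting: '\'' closes the single-quoted awk program, adds one escaped quote, and reopens the program. A minimal sketch with made-up table names and a fixed date string, so it runs without HBase; the function name make_snapshot_cmds is hypothetical:

```shell
# Turn one table name per input line into an HBase shell snapshot command.
# The function's first argument is the date suffix, passed to awk with -v.
make_snapshot_cmds() {
  awk -v date="$1" '{printf("snapshot '\''%s'\'', '\''%s-snapshot-%s'\''\n", $1, $1, date)}'
}

# Sample table names (hypothetical) and a fixed date string.
printf 'users\norders\n' | make_snapshot_cmds 20130320
```

This prints snapshot 'users', 'users-snapshot-20130320' and the matching line for orders. Printing the generated commands first and piping them into hbase shell only after a look at them is a cheap safety check.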

Oozie job weird error: No input path Specified


Today I debugged a strange Oozie error with a couple of colleagues. The MapReduce job failed with the error
"No input path Specified"
even though the input directory was set in both the configuration and the workflow.
It turned out that we were missing two configuration properties that tell Oozie to use the new API:

<property>
    <name>mapred.mapper.new-api</name>
    <value>true</value>
</property>
<property>
    <name>mapred.reducer.new-api</name>
    <value>true</value>
</property>

Tuesday, March 19, 2013

Bash Quick Note

1. Each line in a file (a for loop over $(cat 2) would split on whitespace, not on lines):

while IFS= read -r line; do echo "$line"; done < 2

2. Sed
Prefer '|' as the delimiter when the pattern contains '/':

sed 's|my/home/directory||g' < in > out

In-place replacement:
sed -i 's|analytics/etl/maxwell/src/assembly/hive/maxwell/||g' in


3. Sed: replace \n with ,\n

sed ':a;N;$!ba;s/\n/,\n/g'

4. Sort by column

sort -t "," -k 2,2 -n input.csv
Sorts numerically by column 2 only (a plain -k 2 uses everything from column 2 to the end of the line as the key).

5. Bash loop over a range of numbers

for i in $(seq 0 855)
do
date=$(date --date "$i day ago" "+%Y%m%d")
echo "alter table oauth_user_services add if not exists partition (dt = '$date') location '$date';" >> partitions
done
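
Several of the notes above can be tried on throwaway data. This is a minimal sketch; the file names, sample rows and the table name example_table are made up, and both the sed newline trick and date --date assume the GNU versions of those tools:

```shell
# Work in a throwaway directory so nothing real is touched.
workdir="$(mktemp -d)"

# Item 1: a for loop over $(cat file) iterates over words,
# while a while/read loop iterates over lines.
printf 'first line\nsecond line\n' > "$workdir/lines.txt"
word_count=0
for word in $(cat "$workdir/lines.txt"); do
  word_count=$((word_count + 1))
done
line_count=0
while IFS= read -r line; do
  line_count=$((line_count + 1))
done < "$workdir/lines.txt"
echo "for/cat iterations: $word_count, while/read iterations: $line_count"

# Item 3: join the whole input in the pattern space, then put a comma
# before every newline (GNU sed syntax).
printf 'a\nb\nc\n' | sed ':a;N;$!ba;s/\n/,\n/g'

# Item 4: numeric sort on the second comma-separated column only.
printf 'a,10\nb,9\nc,100\n' > "$workdir/input.csv"
sorted="$(sort -t "," -k 2,2 -n "$workdir/input.csv")"
echo "$sorted"

# Item 5: the same kind of loop with a small range; one statement per day.
for i in $(seq 0 2); do
  day="$(date --date "$i day ago" "+%Y%m%d")"
  echo "alter table example_table add if not exists partition (dt = '$day') location '$day';" >> "$workdir/partitions"
done
cat "$workdir/partitions"
```

The first echo reports 4 iterations for the for loop and 2 for the while loop, which is why while/read is the safer choice when lines contain spaces.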

Friday, March 15, 2013

HBase Lock and Override

The HBase lock acts like a gatekeeper.

Before bulk loading, set the HBase lock first, then set the HBase override.

After bulk loading, release the override first, then release the HBase lock.
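
The ordering is last in, first out: whatever was set last is released first. It can be sketched with placeholder functions. All function names here are hypothetical and each one only echoes its step, since the note does not say how the lock and override are actually set:

```shell
# Placeholder steps: each only prints its name. Replace the bodies with
# whatever really sets and releases the lock and the override.
set_lock()         { echo "set lock"; }
set_override()     { echo "set override"; }
bulkload()         { echo "bulkload"; }
release_override() { echo "release override"; }
release_lock()     { echo "release lock"; }

# Acquire in one order, release in the reverse order.
guarded_bulkload() {
  set_lock
  set_override
  bulkload
  release_override
  release_lock
}

guarded_bulkload
```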

Thursday, March 7, 2013

SSH Key

ssh-keygen
Use no passphrase.
Keep the .ssh directory permission at 700.
Keep the .ssh/id_rsa permission at 600.
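
The two permission rules can be applied in one step. A minimal sketch; the helper name secure_ssh_dir is made up, and it is demonstrated on a temporary directory with an empty stand-in file rather than on a real key:

```shell
# Hypothetical helper: tighten permissions on an SSH directory
# and on its private key, if one is present.
secure_ssh_dir() {
  dir="$1"
  chmod 700 "$dir"
  if [ -f "$dir/id_rsa" ]; then
    chmod 600 "$dir/id_rsa"
  fi
}

# Demonstrate on a throwaway directory with an empty stand-in for the key.
demo_dir="$(mktemp -d)"
touch "$demo_dir/id_rsa"
secure_ssh_dir "$demo_dir"
ls -ld "$demo_dir" "$demo_dir/id_rsa"
```

For a real key, ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa generates one without asking for a passphrase, after which secure_ssh_dir ~/.ssh applies the permissions.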