Friday, January 25, 2013

Thursday, January 17, 2013

Wednesday, January 16, 2013

Github Quick Note


1. Check unpushed commits
git log origin/master..HEAD
git diff origin/master..HEAD

2. Revert uncommitted changes

# Revert changes to modified files.
git reset --hard

# Remove all untracked files and directories.
git clean -fd


3. Unstage a staged file
git reset HEAD <file>

4. Amend the last commit:

$ git commit -m 'initial commit'
$ git add forgotten_file
$ git commit --amend
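
To see that the amend really folds the forgotten file into the previous commit instead of creating a second one, here is a small sketch that replays the steps in a throwaway repository. The function name amend_demo, the file first_file, and the demo user name and email are made up for illustration; it assumes git 1.8.5 or newer (for git -C).

```shell
# Replay "commit, add forgotten file, amend" in a temporary repository.
# amend_demo, first_file and the demo identity are hypothetical names.
amend_demo() {
    repo=$(mktemp -d) || return 1
    git init -q "$repo" || return 1

    # Helper: run git inside the temp repo with a fixed identity,
    # so the demo does not depend on the user's global git config.
    g() {
        git -C "$repo" \
            -c user.name=demo \
            -c user.email=demo@example.com \
            -c commit.gpgsign=false "$@"
    }

    echo one > "$repo/first_file"
    g add first_file
    g commit -q -m 'initial commit'

    # The file that should have been part of the commit.
    echo two > "$repo/forgotten_file"
    g add forgotten_file
    g commit -q --amend --no-edit

    # Number of commits on the branch, then the files in the last commit.
    g rev-list --count HEAD
    g ls-tree --name-only HEAD

    rm -rf "$repo"
}
```

Calling amend_demo prints the commit count followed by the files in that commit; the count stays at one because --amend replaces the last commit rather than adding a new one.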

5. Show the diff of a commit by its hash:
git show 7f1ef64274b588b8d7430f31fbf915257a605f45

6. Reset an unpushed commit:
Delete the most recent commit and discard its changes:
git reset --hard HEAD~1   (or any other commit reference)
Delete the most recent commit without destroying the work you've done (the changes stay staged):
git reset --soft HEAD~1   (or any other commit reference)

7. Revert a single file:
git checkout -- filename
By contrast, git reset --hard reverts all uncommitted changes to tracked files.

8. Keep others' changes from interleaving with my commits
git checkout master
git pull --rebase
git checkout -b your-branch

git commit -m "something"
git commit -m "more things"

git checkout master
git pull --rebase   (puts your commits on top of the stack; short form: git pull -r)

git checkout your-branch
git rebase master

git checkout master
git merge your-branch   (a fast-forward merge, ready to push)


9. Combine the last two commits:
git rebase -i HEAD~2
In the editor that opens, change "pick" to "squash" on the second commit.

Friday, January 11, 2013

Eclipse slow

Tweak eclipse.ini for more heap space, or pass the same JVM options on the command line:

eclipse -vmargs -Xms512m -Xmx1024m

Tuesday, January 8, 2013

Convenient HBase CopyTable jobs


hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=jobs-dev-zoo1,jobs-dev-zoo2,jobs-dev-zoo3:2181:/hbase tableName

reference : http://hbase.apache.org/book/ops_mgt.html#copytable


On the other hand, copying files between clusters is convenient with distcp:


hadoop distcp hdfs://jobs-hnn1/data/prod/jobs/mr/gnip/harvester/user_data/20130110/20130110004838936/output/scor* hdfs://jobs-hnn2/data/prod/jobs/mr/gnip/harvester/user_data/20130110/20130110004838936/output/

Maven version

Sometimes, when an artifact goes missing in an extremely strange way, don't forget to check that your Maven version is compatible with the old pom.

Saturday, January 5, 2013

When your MapReduce job yells "Too many fetch-failures"

That happens when a reducer fails too many times to fetch map output from a specific task node.

Three properties are worth checking for this issue:

- mapred.reduce.slowstart.completed.maps = 0.80

reducers start only after 80% of the maps have completed, which allows reducers from other jobs to run while a big job waits on its mappers

- tasktracker.http.threads = 80

specifies the number of HTTP threads each TaskTracker uses to serve map output to the reducers

- mapred.reduce.parallel.copies = sqrt(#of nodes) with a floor of 10

number of parallel copies used by reducers to fetch map output
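
As a quick way to apply the sqrt rule of thumb above, here is a small sketch that computes a value for mapred.reduce.parallel.copies from the node count. The function name parallel_copies is made up for illustration, and rounding the square root down to an integer is my assumption; the rule itself only says sqrt(# of nodes) with a floor of 10.

```shell
# Suggest a value for mapred.reduce.parallel.copies:
# sqrt(number of nodes), but never less than 10.
# parallel_copies is a hypothetical helper name; truncating the
# square root to an integer is an assumption.
parallel_copies() {
    nodes=$1
    awk -v n="$nodes" 'BEGIN {
        c = int(sqrt(n))      # truncate to a whole number of copies
        if (c < 10) c = 10    # apply the floor of 10
        print c
    }'
}
```

For example, parallel_copies 400 gives 20, while a small cluster of 25 nodes still gets the minimum of 10.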

Friday, January 4, 2013

ssh key agent forwarding ("auto passby")

Scenario:
Log in from your laptop to a cluster, then go from there to some other cluster. The ssh key on the laptop needs to be passed along automatically.

1. Make sure all your ssh keys are added to the ssh agent:
ssh-add
ssh-add -l

2. ssh to the intermediate cluster with -A (agent forwarding), then check that the keys are visible there:
ssh -A user@host
ssh-add -l

3. ssh to the destination cluster:
ssh user@destination

Thursday, January 3, 2013