Monday, August 27, 2012

Use a VIP to work around Hadoop/HBase localhost


In some cases, we want to use a dedicated IP address for a pseudo-distributed cluster instead of "localhost", for example when your demo machine changes its IP address from time to time.

I'm not sure this covers every case, since I haven't tested it by actually changing the IP address; any comments are welcome.
I created a VIP (virtual IP alias) at an arbitrary address, 192.168.1.130:

sudo ifconfig en1 inet 192.168.1.130 netmask 255.255.255.255 alias
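A quick sanity check that the alias is active is to look for the address on en1 and ping it:

ifconfig en1 | grep 192.168.1.130
ping -c 1 192.168.1.130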

Then I substituted 192.168.1.130 for localhost in all config files under my Hadoop folder:

find . -name "*.xml" -exec sed -i "" 's/localhost/192.168.1.130/g' {} \;
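To double-check that nothing was missed, the same find pattern can list any file that still mentions localhost (it should print nothing):

find . -name "*.xml" -exec grep -l localhost {} \;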

I just started HBase/DFS and inserted some data, and they work fine using that VIP. HBase also created

/hbase/.logs/192.168.1.130,60020,1346102408672

using the VIP.
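For reference, "started HBase/DFS" above means the stock start scripts; once they are up, listing the HBase log directory in HDFS shows the server registered under the new address (assuming the scripts are on your PATH):

start-dfs.sh
start-hbase.sh
hadoop fs -ls /hbase/.logs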

To bring the interface down:

sudo ifconfig en1 inet down
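If you only want to drop the alias and leave en1 itself up, the -alias form should work (I haven't verified it on every OS X version):

sudo ifconfig en1 inet 192.168.1.130 -alias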

Note that Mac OS X's sed has slightly different syntax from Linux's: -i requires a backup-suffix argument, and the empty string "" tells it to edit the file in place without keeping a backup.
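For comparison, the same in-place edit on the two platforms:

# GNU sed (Linux): the backup suffix is optional and attached to -i
sed -i 's/localhost/192.168.1.130/g' core-site.xml
# BSD sed (Mac OS X): -i takes a mandatory, possibly empty, suffix argument
sed -i "" 's/localhost/192.168.1.130/g' core-site.xml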

Sunday, August 26, 2012

Hadoop Cluster Setup, SSH Key Authentication


First from your “master” node check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
On your master node try to ssh again (as the hadoop user) to your localhost, and if you are still getting a password prompt, run:
$ chmod go-w $HOME $HOME/.ssh
$ chmod 600 $HOME/.ssh/authorized_keys
$ chown `whoami` $HOME/.ssh/authorized_keys
Now you need to copy (however you want to do this, please go ahead) your public key to all of your "slave" machines (don't forget your secondary name node).  It is possible (depending on whether these are new machines) that the slave's hadoop user does not have a .ssh directory, and if not you should create it ($ mkdir ~/.ssh).
$ scp ~/.ssh/id_dsa.pub slave1:~/.ssh/master.pub
Now login (as the hadoop user) to your slave machine.  While on your slave machine add your master machine’s hadoop user’s public key to the slave machine’s hadoop authorized key store.
$ cat ~/.ssh/master.pub >> ~/.ssh/authorized_keys
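As an aside, on systems that ship ssh-copy-id, the scp-and-cat steps above collapse into a single command run from the master (assuming the remote account is also named hadoop):

$ ssh-copy-id -i ~/.ssh/id_dsa.pub hadoop@slave1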
Now, from the master node try to ssh to slave.
$ ssh slave1
If you are still prompted for a password (which is most likely), it is very often just a simple permission issue.  Go back to your slave node again and, as the hadoop user, run:
$ chmod go-w $HOME $HOME/.ssh
$ chmod 600 $HOME/.ssh/authorized_keys
$ chown `whoami` $HOME/.ssh/authorized_keys
Try again from your master node.
$ ssh slave1
And you should be good to go. Repeat for all Hadoop Cluster Nodes.
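With more than a couple of slaves, a small shell loop saves retyping; the hostnames here are only placeholders for your own slave and secondary name node names:

# example hostnames; substitute your own nodes
$ for node in slave1 slave2 secondary-nn; do ssh-copy-id -i ~/.ssh/id_dsa.pub hadoop@$node; done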

If you run on a local machine that restarts frequently, change "hadoop.tmp.dir" in core-site.xml; the default puts everything under /tmp, which is usually cleared on reboot:


<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/hadoop</value>
  <final>true</final>
</property>
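Whatever directory you point hadoop.tmp.dir at has to exist and be writable by the user that runs the daemons; assuming that user is hadoop:

$ sudo mkdir -p /var/hadoop
$ sudo chown hadoop /var/hadoop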

From http://allthingshadoop.com/2010/04/20/hadoop-cluster-setup-ssh-key-authentication/

Thursday, August 23, 2012

MRUnit

1. To make CDH4 work with MRUnit, use the hadoop2 classifier in the Maven dependency:


<dependency>
  <groupId>org.apache.mrunit</groupId>
  <artifactId>mrunit</artifactId>
  <version>0.9.0-incubating</version>
  <classifier>hadoop2</classifier>
</dependency>
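To verify that Maven really resolves the hadoop2 classifier rather than the default artifact, the dependency tree is a quick check:

mvn dependency:tree -Dincludes=org.apache.mrunit:mrunit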

reference: https://blogs.apache.org/mrunit/entry/apache_mrunit_0_9_0#comment-1337534033000.

Friday, August 10, 2012

Quick Note

1. p7zip is the Unix port of 7-Zip, so use "sudo port install p7zip" to install it with MacPorts.
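Once installed, the binary is 7z (some p7zip builds provide 7za instead); extracting a hypothetical archive.7z looks like:

7z x archive.7z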