Aptana Eclipse plugin update site:
http://download.aptana.com/studio3/plugin/install
Tuesday, December 18, 2012
Friday, December 14, 2012
Install m2e extras for Eclipse
Repository address:
https://repository.sonatype.org/content/repositories/forge-sites/m2e-extras/0.14.0/N/0.14.0.201109282148/
(for the m2e build-helper connector, etc.)
Thursday, December 13, 2012
Hadoop Default Ports Quick Reference
- Is it 50030 or 50300 for that JobTracker UI? I can never remember!
Hadoop’s daemons expose a handful of ports over TCP. Some of these ports are used by Hadoop’s daemons to communicate amongst themselves (to schedule jobs, replicate blocks, etc.). Other ports listen directly to users, either via an interposed Java client, which communicates via internal protocols, or via plain old HTTP.
This post summarizes the ports that Hadoop uses; it’s intended to be a quick reference guide both for users, who struggle with remembering the correct port number, and systems administrators, who need to configure firewalls accordingly.
Web UIs for the Common User
The default Hadoop ports are as follows:
| | Daemon | Default Port | Configuration Parameter |
|---|---|---|---|
| HDFS | Namenode | 50070 | dfs.http.address |
| HDFS | Datanodes | 50075 | dfs.datanode.http.address |
| HDFS | Secondarynamenode | 50090 | dfs.secondary.http.address |
| HDFS | Backup/Checkpoint node* | 50105 | dfs.backup.http.address |
| MR | Jobtracker | 50030 | mapred.job.tracker.http.address |
| MR | Tasktrackers | 50060 | mapred.task.tracker.http.address |

* Replaces the secondarynamenode in 0.21.
Hadoop daemons expose some information over HTTP. All Hadoop daemons expose the following:
- /logs
- Exposes, for downloading, log files in the Java system property hadoop.log.dir.
- /logLevel
- Allows you to dial up or down log4j logging levels. This is similar to hadoop daemonlog on the command line.
- /stacks
- Stack traces for all threads. Useful for debugging.
- /metrics
- Metrics for the server. Use /metrics?format=json to retrieve the data in a structured form. Available in 0.21.
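For example, assuming a namenode whose web UI is on the default port 50070 (the host and port here are assumptions for illustration), these common endpoints can be hit with curl:
# Dump stack traces of every thread in the daemon (handy when it seems hung)
curl http://localhost:50070/stacks
# Fetch server metrics in structured JSON form (0.21+)
curl "http://localhost:50070/metrics?format=json"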
Individual daemons expose extra daemon-specific endpoints as well. Note that these are not necessarily part of Hadoop’s public API, so they tend to change over time.
The Namenode exposes:
- /
- Shows information about the namenode as well as the HDFS. There’s a link from here to browse the filesystem, as well.
- /dfsnodelist.jsp?whatNodes=(DEAD|LIVE)
- Shows lists of nodes that are disconnected from (DEAD) or connected to (LIVE) the namenode.
- /fsck
- Runs the “fsck” command. Not recommended on a busy cluster.
- /listPaths
- Returns an XML-formatted directory listing. This is useful if you wish (for example) to poll HDFS to see if a file exists. The URL can include a path (e.g., /listPaths/user/philip) and can take optional GET arguments: /listPaths?recursive=yes will return all files on the file system; /listPaths/user/philip?filter=s.* will return all files in the home directory that start with s; and /listPaths/user/philip?exclude=.txt will return all files except text files in the home directory. Beware that filter and exclude operate on the directory listed in the URL, and they ignore the recursive flag. (See the curl sketch after this list.)
- /data and /fileChecksum
- These forward your HTTP request to an appropriate datanode, which in turn returns the data or the checksum.
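A minimal curl sketch of the /listPaths variants above; the host, port, and paths are placeholders:
# Recursive XML listing of a home directory
curl "http://namenode:50070/listPaths/user/philip?recursive=yes"
# Only entries in the home directory whose names start with "s"
curl "http://namenode:50070/listPaths/user/philip?filter=s.*"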
Datanodes expose the following:
- /browseBlock.jsp, /browseDirectory.jsp, /tail.jsp, /streamFile, /getFileChecksum
- These are the endpoints that the namenode redirects to when you are browsing filesystem content. You probably wouldn’t use these directly, but this is what’s going on underneath.
- /blockScannerReport
- Every datanode verifies its blocks at configurable intervals. This endpoint provides a listing of that check.
The secondarynamenode exposes a simple status page with information including which namenode it’s talking to, when the last checkpoint was, how big it was, and which directories it’s using.
The jobtracker’s UI is commonly used to look at running jobs, and, especially, to find the causes of failed jobs. The UI is best browsed starting at /jobtracker.jsp. There are over a dozen related pages providing details on tasks, history, scheduling queues, jobs, etc.
Tasktrackers have a simple page (/tasktracker.jsp), which shows running tasks. They also expose /taskLog?taskid=<id> to query logs for a specific task. They use /mapOutput to serve the output of map tasks to reducers, but this is an internal API.
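As a quick sketch (the host, port, and attempt id are placeholders), the tasktracker pages above look like this from curl:
# Running tasks on one tasktracker (default web port 50060, per the table above)
curl http://tasktracker1:50060/tasktracker.jsp
# Logs for a specific task attempt
curl "http://tasktracker1:50060/taskLog?taskid=attempt_201212130000_0001_m_000000_0"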
Under the Covers for the Developer and the System Administrator
Internally, Hadoop mostly uses Hadoop IPC to communicate amongst servers. (Part of the goal of the Apache Avro project is to replace Hadoop IPC with something that is easier to evolve and more language-agnostic; HADOOP-6170 is the relevant ticket.) Hadoop also uses HTTP (for the secondarynamenode communicating with the namenode and for the tasktrackers serving map outputs to the reducers) and a raw network socket protocol (for datanodes copying around data).
The following table presents the ports and protocols (including the relevant Java class) that Hadoop uses. This table does not include the HTTP ports mentioned above.
| Daemon | Default Port | Configuration Parameter | Protocol | Used for |
|---|---|---|---|---|
| Namenode | 8020 | fs.default.name¹ | IPC: ClientProtocol | Filesystem metadata operations |
| Datanode | 50010 | dfs.datanode.address | Custom Hadoop Xceiver: DataNode and DFSClient | DFS data transfer |
| Datanode | 50020 | dfs.datanode.ipc.address | IPC: InterDatanodeProtocol, ClientDatanodeProtocol, ClientProtocol | Block metadata operations and recovery |
| Backupnode | 50100 | dfs.backup.address | Same as namenode | HDFS metadata operations |
| Jobtracker | Ill-defined² | mapred.job.tracker | IPC: JobSubmissionProtocol, InterTrackerProtocol | Job submission, task tracker heartbeats |
| Tasktracker | 127.0.0.1:0³ | mapred.task.tracker.report.address | IPC: TaskUmbilicalProtocol | Communicating with child jobs |

¹ This is the port part of hdfs://host:8020/.
² The default is not well-defined; common values are 8021, 9001, or 8012. See MAPREDUCE-566.
³ Binds to an unused local port.
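A hedged way to check that these ports are reachable through a firewall from a client machine; the hostnames, and the jobtracker port (assumed here to be 8021, one of the common values), are placeholders:
for hostport in namenode:8020 datanode1:50010 datanode1:50020 jobtracker:8021; do
  host=${hostport%:*}; port=${hostport#*:}
  # -z: just probe the port, send no data; -w 3: three-second timeout
  nc -z -w 3 "$host" "$port" && echo "OK   $hostport" || echo "FAIL $hostport"
done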
That’s quite a few ports! I hope this quick overview has been helpful.
Wednesday, December 12, 2012
Solution to Annoying Dependency Error when Running HBase Executable Tool
When using the "hadoop" executable to run HBase programs of any kind, the right way is to do this:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$(hbase classpath):/usr/lib/hadoop/lib/*:/usr/lib/hbase/lib/*
This ensures you run with all HBase dependencies on the classpath, so the code can find its HBase-specific resources.
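For example, with the classpath set as above, an HBase MapReduce tool can be launched through the hadoop executable; the rowcounter driver and the table name below are assumptions for illustration:
# Pick the main HBase jar (same trick as the bulk-load snippet below)
HBASE_JAR=$(ls /usr/lib/hbase/hbase-0.*.*.jar | grep -v test | sort -n -r | head -n 1)
# Count the rows of a (hypothetical) table named my_table
hadoop jar "$HBASE_JAR" rowcounter my_table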
Setting the classpath this way solves Guava dependency errors like:
NoClassDefFoundError: com/google/common/collect/Multimap
When running the bulk loading tool:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hbase/*:/usr/lib/hbase/lib/*:/etc/hbase/conf/
HBASE_JAR=$(ls /usr/lib/hbase/hbase-0.*.*.jar | grep -v test | sort -n -r | head -n 1)
RUN_CMD="hadoop jar $HBASE_JAR completebulkload -conf /etc/hbase/conf/hbase-site.xml "
$RUN_CMD /data/hive/thunder/hfile_user_topic_sets/${DATE} user_topic_sets || exit 1
Alternatively, call the LoadIncrementalHFiles method directly from code.
Scan with a filter from the HBase shell:
scan 'url_meta', {FILTER => "(RowFilter (=,'regexstring:.*jobs.*') )" , LIMIT=>100}
Monday, September 24, 2012
Ganglia Setup On Mac OSX
The platform is OS X Mountain Lion.
A couple of preparations:
Ganglia has so many components and configuration files that setting it up on Mac OS X is quite a task. Using a package manager is a better choice than building from tarballs.
1. I upgraded to Xcode 4.5, since brew complained about my previous Xcode 4.1. Also make sure the "Command Line Tools" are installed from Xcode.
2. Homebrew is good, and I actually blew away MacPorts for it (the two conflict with each other).
Uninstall MacPorts:
sudo port -fp uninstall installed
sudo rm -rf \
  /opt/local \
  /Applications/DarwinPorts \
  /Applications/MacPorts \
  /Library/LaunchDaemons/org.macports.* \
  /Library/Receipts/DarwinPorts*.pkg \
  /Library/Receipts/MacPorts*.pkg \
  /Library/StartupItems/DarwinPortsStartup \
  /Library/Tcl/darwinports1.0 \
  /Library/Tcl/macports1.0 \
  ~/.macports
3. Install brew by running a single command:
ruby -e "$(curl -fsSkL raw.github.com/mxcl/homebrew/go)"
4. Then install and configure ganglia:
brew install ganglia
5. Generate the config file:
gmond --default_config > /usr/local/etc/gmond.conf
6. Enable Apache and PHP. Then install the additional PHP packages for ganglia by running:
curl -s http://php-osx.liip.ch/install.sh | bash -
7. Download the ganglia-web tarball and install it.
8. Suppose you put ganglia-web at ~/Sites/ganglia; in that directory there is a file "conf_default.php". Find "/usr/bin/rrdtool" in that file and replace it with "/usr/local/bin/rrdtool" (or whatever path rrdtool is installed at). Then run "make install" again.
9. Start gmond and gmetad.
sudo gmetad --debug=1
sudo gmond --debug=3
10. Open ganglia web page:
localhost/~{username}/ganglia
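A quick sanity check after starting the daemons, assuming the default ports (gmond's tcp_accept_channel on 8649 and gmetad's xml_port on 8651; adjust if your gmond.conf/gmetad.conf say otherwise):
nc localhost 8649 | head -5   # gmond should dump cluster state as XML
nc localhost 8651 | head -5   # gmetad should answer on its xml_port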
Tuesday, September 18, 2012
Apache/PHP
Make localhost/~gfan/ accessible:
USER_DIR=$(basename $(echo ~))
sudo bash -c "cat > /etc/apache2/users/$USER_DIR.conf" <<TEXT
<Directory "/Users/$USER_DIR/Sites">
Options Indexes MultiViews
AllowOverride None
Order allow,deny
Allow from all
</Directory>
TEXT
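After writing the per-user config, restart Apache so the change is picked up (this assumes the stock OS X Apache):
sudo apachectl restart
# then browse to http://localhost/~$USER_DIR/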
Wednesday, September 12, 2012
Hive/Yarn Debug
localhost:8088 - check MapReduce progress; drill into a single MapReduce task to check its syslog, which is a convenient way to view task logs.
To use the HBaseStorageHandler from Hive, you have to have these dependencies in $HADOOP_HOME/shared/hadoop/mapreduce/ (an example table definition follows the jar list):
zookeeper-3.4.3-cdh4.0.0.jar
guava-11.0.2.jar
hive-common-0.8.1-cdh4.0.0.jar
hive-contrib-0.8.1-cdh4.0.0.jar
hive-exec-0.8.1-cdh4.0.0.jar
datanucleus-connectionpool-2.0.3.jar
datanucleus-enhancer-2.0.3.jar
datanucleus-core-2.0.3.jar
datanucleus-rdbms-2.0.3.jar
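With those jars in place, a hedged sketch of creating an HBase-backed Hive table; the table, column family, and column names are made up for illustration:
hive -e "
CREATE TABLE hbase_user_topics(key string, topics string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:topics')
TBLPROPERTIES ('hbase.table.name' = 'user_topics');
"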
Wednesday, September 5, 2012
Hive Quick Note
1. Turn on INFO-level console logging:
bin/hive -hiveconf hive.root.logger=INFO,console
2. Weird Hive query results: a WHERE clause that uses '=' to compare values of different types returns results instead of raising an error (see the sketch after this list).
3. hive CLI
echo "use bi_thunder; select * from tmp_kfb_task1_4;" | ./hv > ~/gfan/111
4. show create table table_name; will show the script that created the table.
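A small sketch of points 2 and 4; the table and column names are placeholders:
# '=' on mismatched types silently compares after implicit conversion
# instead of raising an error -- double-check column types first.
hive -e "SELECT * FROM mytable WHERE string_id = 12345 LIMIT 10;"
# Show the script that created a table:
hive -e "SHOW CREATE TABLE mytable;"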
Manipulate Repository via SVN+SSH
Example to manipulate a directory in repo on server:
svn ls svn+ssh://gfan@www.serverip.com/path/to/directory
svn delete svn+ssh://gfan@www.serverip.com/path/to/directory -m "remove directory"
(co and copy work the same way against the svn+ssh URL; -m applies to operations that commit, like delete and copy.)
Monday, August 27, 2012
Use VIP to workaround Hadoop/HBase localhost
In some cases, we want to use a dedicated IP address for a pseudo-cluster instead of "localhost", for example when your demo machine changes its IP address from time to time.
I'm not sure this fully solves that problem, since I didn't test it by actually changing the IP; comments welcome.
I created a vip at a random address 192.168.1.130:
sudo ifconfig en1 inet 192.168.1.130 netmask 255.255.255.255 alias
I substituted the VIP for localhost in all config files under my hadoop folder:
find . -name "*.xml" -exec sed -i "" 's/localhost/192.168.1.130/g' {} \;
I then started HBase/DFS and inserted some data, and they work fine using that VIP. HBase also created
/hbase/.logs/192.168.1.130,60020,1346102408672
using the VIP.
To bring the interface down:
sudo ifconfig en1 inet down
Note that Mac OS X sed has slightly different syntax than Linux: the empty double quotes "" after -i mean sed edits the file in place without keeping a backup copy.
Sunday, August 26, 2012
Hadoop Cluster Setup, SSH Key Authentication
First from your “master” node check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
On your master node try to ssh again (as the hadoop user) to your localhost, and if you are still getting a password prompt, then:
$ chmod go-w $HOME $HOME/.ssh
$ chmod 600 $HOME/.ssh/authorized_keys
$ chown `whoami` $HOME/.ssh/authorized_keys
Now you need to copy your public key (however you prefer to do this) to all of your “slave” machines (don’t forget your secondary namenode). It is possible (depending on whether these are new machines) that the slave’s hadoop user does not have a .ssh directory; if not, create it ($ mkdir ~/.ssh).
$ scp ~/.ssh/id_dsa.pub slave1:~/.ssh/master.pub
Now login (as the hadoop user) to your slave machine. While on your slave machine add your master machine’s hadoop user’s public key to the slave machine’s hadoop authorized key store.
$ cat ~/.ssh/master.pub >> ~/.ssh/authorized_keys
Now, from the master node try to ssh to slave.
$ ssh slave1
If you are still prompted for a password (which is most likely) then it is very often just a simple permission issue. Go back to your slave node again and as the hadoop user run this
$ chmod go-w $HOME $HOME/.ssh
$ chmod 600 $HOME/.ssh/authorized_keys
$ chown `whoami` $HOME/.ssh/authorized_keys
Try again from your master node.
$ ssh slave1
And you should be good to go. Repeat for all Hadoop Cluster Nodes.
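A hypothetical convenience loop for the "repeat for all nodes" step, assuming a slaves.txt file with one slave hostname per line (the file name is made up):
for slave in $(cat ~/slaves.txt); do
  # copy the master's public key into the slave's authorized_keys; prompts for each password once
  ssh-copy-id -i ~/.ssh/id_dsa.pub "$slave"
done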
If you run on a localhost that restarts frequently, change "hadoop.tmp.dir" in core-site.xml:
<property>
<name>hadoop.tmp.dir</name>
<value>/var/hadoop</value>
<final>true</final>
</property>
From http://allthingshadoop.com/2010/04/20/hadoop-cluster-setup-ssh-key-authentication/
Thursday, August 23, 2012
Friday, August 10, 2012
Thursday, June 21, 2012
Configure Datanucleus Plugin
1. The MANIFEST.MF must have the correct content in its MANIFEST.MF tag.
2. persistence.xml must list the correct fully qualified class names of all persistable classes.
3. To have the DataNucleus enhancer run on every build, right-click the project name, go to Properties -> Maven -> Lifecycle Mapping, and change the two lifecycles to process-test-classes.
Friday, May 11, 2012
ZooKeeper-Coprocessor-QuickNote
1. In the coprocessor's start() method, a call to env.getTable leads to a connection-refused error.
To solve this, use the HTable constructor HTable(configuration, tableName) instead; passing the configuration is necessary since the ZooKeeper port number is not always consistent.
Remember to call htable.close() after the operation to release resources.
2. To find the process using port 60010:
lsof -i :60010
Then kill that process.
Wednesday, April 25, 2012
mvn quick review
1. Get dependency source: mvn dependency:sources
2. Sometimes it is better to remove the whole local repository and reinstall when updating multiple versions:
rm -Rf ~/.m2/repository/
In Eclipse, run Maven build... with the install goal and Skip Tests checked (equivalent to mvn install -DskipTests on the command line).
3. A workaround for "Internal error when enabling maven dependency management": check the projects in with svn, then Import --> Import Existing Maven Projects, selecting the local projects as a whole.
4. Switch maven versions:
$ cd /usr/share/java
$ ls -q1 | grep maven #check if your desired maven version is there
apache-maven-2.0.9
maven-2.2.0
maven-2.2.1
maven-3.0.2
$ cd .. #go up
$ ls -l | grep maven #check what current version is
maven -> java/maven-3.0.2
$ sudo rm maven #remove unwanted symlink
Password:
$ sudo ln -s java/maven-2.2.1 maven #set it to maven 2.2.1
5. Update the project version:
mvn versions:set -DnewVersion=2.50.1-SNAPSHOT
It will adjust all pom versions, parent versions and dependency versions in a multi-module project.
update child module version
mvn versions:update-child-modules
If you made a mistake, do
mvn versions:revert
afterwards, or
mvn versions:commit
Friday, January 20, 2012
OpenSuse install Chrome
1. Download Chrome from google.com
2. At terminal: sudo zypper in ~/Downloads/[package name]
Thursday, January 19, 2012
Kubuntu Missing Vertical Scroll Bar
sudo modprobe -r psmouse
sudo modprobe psmouse proto=imps
This enables scrolling for your current session - test and make sure it works. If so, you can make it permanent by:
sudo gedit /etc/modprobe.d/options
and adding the line:
options psmouse proto=imps
Then close and reboot.