Monday, July 6, 2015

Remove a region from HBase

Recently one of our HBase clusters had an outage because the disks filled up. Adding new nodes and rebalancing the cluster takes time.

We rebalanced HDFS with:

sudo -u hdfs hdfs dfsadmin -setBalancerBandwidth 65536000

sudo -u hdfs hdfs balancer -threshold 5
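
The bandwidth value passed to -setBalancerBandwidth is in bytes per second per datanode, so it is easy to misjudge. A small sketch for converting it to MiB/s (the helper name bytes_to_mib is just for illustration):

```shell
# Convert a bytes-per-second value to MiB per second, one decimal place.
bytes_to_mib() {
  awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1048576 }'
}

# The value used above: 65536000 bytes/s
bytes_to_mib 65536000   # prints 62.5
```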

The cluster came back and was functional again. But when we ran a MapReduce job from a snapshot, it produced duplicate records in the output.

After digging into it a bit, I realized there were regions in HBase that were not online (as shown in the HBase UI), but since they were still present in the meta table and on disk, the MapReduce job was still running against them.

To remove those regions from HBase:

Find the regions in the hbase:meta table:
echo "scan 'hbase:meta'" | hbase shell &> dump.txt
Then delete them from the HBase shell:
deleteall 'hbase:meta', 'The name of the region'
Then run "hbase hbck PlatformData -repair" a couple of times, until the regions' data is consistent.
Also find and remove the regions' files from HDFS:
sudo -u hdfs hadoop fs -rm -r /hbase/data/default/user_profile/<regions hash>
Then the MapReduce job is good to go.
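
The scan dump is large, so it helps to pull out just the region names for one table. This is a rough sketch, assuming each row in dump.txt starts with the row key followed by "column=..." (the way the HBase shell normally prints a scan) and that start keys contain no spaces. The function name list_regions is made up for this note:

```shell
# List the distinct hbase:meta row keys (region names) for one table
# from a saved scan dump.
# Region row keys look like "<table>,<start key>,<timestamp>.<encoded name>."
list_regions() {
  table="$1"
  dump="$2"
  awk -v t="$table" '
    # Keep rows whose key starts with "<table>," and whose next field is a column
    index($1, t ",") == 1 && $2 ~ /^column=/ { print $1 }
  ' "$dump" | sort -u
}

# Usage (table name and dump file taken from the steps above):
#   list_regions user_profile dump.txt
```

Compare the output with the regions shown as online in the HBase UI before deleting anything from hbase:meta.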

Friday, September 12, 2014

Use scutil to set host name

scutil --set HostName "localhost"

scutil --get HostName 

Note: make sure "localhost" is set in your /etc/hosts file

Thursday, September 11, 2014

HBase Snapshot Restrictions

HBase snapshots have two restrictions:

1. If regions are merged (usually manually) after the snapshot is taken, the snapshot becomes invalid. Region splits are fine.

2. If you restore a snapshot to create a table, the replicated copy of that new table on another cluster is not guaranteed to be consistent with it.

Monday, August 25, 2014

Setup local dev env

apply the patch for OS X

uname -a

copy the IP and put it in /etc/hosts as an alias of localhost

set up the DNS servers in System Preferences -> Network -> Advanced -> DNS



Run Service in debug mode

gradle :apps:[Service_Name]:run --debug-jvm -Pdebug

Tuesday, July 15, 2014

Gradle Notes


1. To show the command-line options of the task "test":
 gradle help --task :test

Thursday, July 10, 2014

Find class in jar

In the lib directory, run:

grep package/or/classname *

Result:

abc.jar

Look through the classes within it:

less abc.jar


shift+g : go to the bottom
shift+n : go backward (to the previous search match)
space: go forward page by page
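
A variation on the grep approach, in case grep on the raw jar is too noisy: list each jar's entries with unzip -l and look for the class file path. This is only a sketch; it assumes unzip is installed, and the function names class_to_path and find_class are invented for this note:

```shell
# Turn a fully qualified class name into its path inside a jar,
# e.g. com.example.Foo -> com/example/Foo.class
class_to_path() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Print every jar in a directory (default: current one) that contains the class.
find_class() {
  entry="$(class_to_path "$1")"
  for jar in "${2:-.}"/*.jar; do
    [ -f "$jar" ] || continue          # skip when the glob matched nothing
    if unzip -l "$jar" 2>/dev/null | grep -qF "$entry"; then
      echo "$jar"
    fi
  done
}

# Usage (class name is a placeholder):
#   find_class com.example.Foo lib
```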

Tuesday, May 6, 2014

HBase - advanced configuration on column family level

Block Size
HFile block size. The default is 64 KB. If you want good sequential scan performance, it's better to have a larger block size.
        It is set during table creation:
        hbase(main):002:0> create 'mytable',   {NAME => 'colfam1', BLOCKSIZE => '65536'}
        Or in code:
        HColumnDescriptor has a method for it: setBlocksize(int)

Block Cache
        You can disable the block cache for a specific column family, for example to leave more cache for other column families.

hbase(main):002:0> create 'mytable',
{NAME => 'colfam1', BLOCKCACHE => 'false'}

Aggressive caching

       You can give selected column families higher priority in the cache.

       hbase(main):002:0> create 'mytable',
       {NAME => 'colfam1', IN_MEMORY => 'true'}

Bloom filters

       You enable bloom filters on the column family, like this:
       hbase(main):007:0> create 'mytable',
       {NAME => 'colfam1', BLOOMFILTER => 'ROWCOL'}

       A row-level bloom filter is enabled with ROW, and a qualifier-level bloom filter is enabled with ROWCOL

TTL
       Defining a Time To Live (in seconds) on a column family deletes its data after the given amount of time.
       Example:
      hbase(main):002:0> create 'mytable', {NAME => 'colfam1', TTL => '18000'}

     Data in colfam1 that is older than 5 hours is deleted during the next major compaction.
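
Since the TTL is given in seconds, a tiny helper avoids doing the conversion in your head. Just a sketch; the name hours_to_ttl is made up:

```shell
# Convert a number of hours to a TTL value in seconds.
hours_to_ttl() {
  echo $(( $1 * 3600 ))
}

hours_to_ttl 5   # prints 18000, the value used in the example above
```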

Compression

    The compression setting applies to the HFiles and their data. It saves disk I/O at the cost of higher CPU utilization.
    hbase(main):002:0> create 'mytable',
    {NAME => 'colfam1', COMPRESSION => 'SNAPPY'}

Cell versioning

    By default 3 versions of each value are kept (HBase 0.96 and later default to 1). This can be changed:

    hbase(main):002:0> create 'mytable', {NAME => 'colfam1', VERSIONS => 5,
    MIN_VERSIONS => '1'}