Monday, April 22, 2013

Read content of HFile via CLI

hbase org.apache.hadoop.hbase.io.hfile.HFile -p -f hdfs://jobs-aa-hnn:8020/data/prod/jobs/hfiles/primaryNetworkProfile/20130415/output/c/d3cc3d77adb8451187be4123a0964062

Wednesday, April 10, 2013

Bash Command

1. cut, rev, uniq, sort

cat 111 | egrep -o "Deleted: /.*/[0-9]{8}" | rev | cut -d "/" -f2- | rev | sort | uniq -c | sort -nr
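
A small sketch of the rev | cut | rev trick on made-up data. The sample log lines below are hypothetical (the real file 111 is not shown here), and grep -E is used as the equivalent of egrep. The lines are sorted before uniq -c because uniq only collapses adjacent duplicates.

```shell
# Hypothetical sample log lines; stand-ins for the contents of the file "111".
sample_log='Deleted: /data/prod/jobs/20130401
Deleted: /data/prod/jobs/20130402
Deleted: /data/prod/users/20130401'

# rev | cut -f2- | rev drops the LAST "/"-separated field (the date directory),
# leaving the parent directory, which is then counted.
parent_counts=$(printf '%s\n' "$sample_log" \
  | grep -Eo 'Deleted: /.*/[0-9]{8}' \
  | rev | cut -d '/' -f2- | rev \
  | sort | uniq -c | sort -nr)

printf '%s\n' "$parent_counts"
```

Reversing each line turns "drop the last field" into "drop the first field", which cut can express directly with -f2-.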


2. Extract all numbers with egrep and sum them up

cat 111 | egrep -o "\[[0-9]+\] bytes" | egrep -o "[0-9]+" | awk '{sum+=$1} END {print sum}'
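
The same pipeline can be tried on a few made-up lines. The sample text is an assumption about what the log looks like (the real file 111 is not shown), and grep -E stands in for egrep.

```shell
# Hypothetical log lines containing "[N] bytes" fragments.
sample_log='Deleted [1024] bytes from /tmp/a
Deleted [2048] bytes from /tmp/b
Deleted [512] bytes from /tmp/c'

# First grep keeps only the "[N] bytes" fragment, second grep keeps only the digits,
# and awk adds them up.
total_bytes=$(printf '%s\n' "$sample_log" \
  | grep -Eo '\[[0-9]+\] bytes' \
  | grep -Eo '[0-9]+' \
  | awk '{sum += $1} END {print sum}')

echo "$total_bytes"
```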

3. sh bash.sh  parameters

$@ expands to all the parameters passed to the script.
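
A quick sketch of how $@ behaves. The function show_args is a hypothetical stand-in for bash.sh; a function receives its arguments the same way a script does.

```shell
# Hypothetical stand-in for bash.sh: prints how many arguments it got, then each one.
show_args() {
  echo "count: $#"
  # Quoting "$@" keeps each argument intact, even if it contains spaces.
  for arg in "$@"; do
    echo "arg: $arg"
  done
}

args_output=$(show_args one "two words" three)
printf '%s\n' "$args_output"
```

Without the quotes, "two words" would be split into two separate arguments inside the loop.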

4. Use strace to log all system calls of a specific bash command

   strace -fvo /home/insights/insights/hive/tt3 -e\!futex -s 8192 bash ./hv


grep " open(" tt3|grep -v ENOENT|grep -v WR|awk -F\" '{print $2}'|sort -u | sed 's/home\/insights/xxx/g'|sed 's/xxx\/insights/yyy/g' | sed 's/yyy-etl-0.3.9-bin/yyy-etl-1.60-cdh4-bin/g' > files.7
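
To see what the filtering part of that pipeline does, here is a sketch run on a few made-up strace lines. The sample trace is hypothetical and only imitates the "pid syscall(args) = result" layout that strace -f -o produces; the sed path rewrites are left out.

```shell
# Hypothetical excerpt of a strace output file such as tt3.
sample_trace='1234 open("/home/insights/conf/a.xml", O_RDONLY) = 3
1234 open("/home/insights/missing", O_RDONLY) = -1 ENOENT (No such file or directory)
1234 open("/tmp/out.log", O_WRONLY|O_CREAT, 0666) = 4
1235 open("/home/insights/conf/a.xml", O_RDONLY) = 3
1235 close(3) = 0'

# Keep open() calls, drop failed lookups (ENOENT) and files opened for writing (WR),
# take the path between the first pair of double quotes, and de-duplicate.
files_read=$(printf '%s\n' "$sample_trace" \
  | grep ' open(' | grep -v ENOENT | grep -v WR \
  | awk -F'"' '{print $2}' | sort -u)

printf '%s\n' "$files_read"
```

The result is the list of distinct files the traced command successfully opened for reading.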


5. For BSD or GNU grep you can use -B num to set how many lines before the match and -A num for the number of lines after the match.
grep -B 3 -A 2 foo README.txt
If you want the same number of lines before and after, you can use -C num.
grep -C 3 foo README.txt
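
A small sketch of the context options on inline text instead of README.txt (the sample lines are made up):

```shell
# Hypothetical sample text with one matching line in the middle.
sample_text='line1
line2
line3
foo
line5
line6'

# Two lines before the match and one line after it.
before_after=$(printf '%s\n' "$sample_text" | grep -B 2 -A 1 foo)
# One line on each side of the match.
around=$(printf '%s\n' "$sample_text" | grep -C 1 foo)

printf '%s\n' "$before_after"
echo "---"
printf '%s\n' "$around"
```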


6. Copy or paste to clipboard (for macOS)

pbcopy
pbpaste



7. Print number of fields of each line delimited by '\t'

cat kfb_topic_task1_v4 | awk -F'\t' '{print NF}' 
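
The same awk call on two made-up tab-separated lines (the real file kfb_topic_task1_v4 is not shown here), handy for checking that every row has the expected number of columns:

```shell
# Hypothetical input: the first line has 3 tab-separated fields, the second has 2.
field_counts=$(printf 'a\tb\tc\nd\te\n' | awk -F'\t' '{print NF}')

printf '%s\n' "$field_counts"
```

Piping the output through sort | uniq -c gives a quick histogram of field counts for a large file.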

8. Redirection doesn't work with sudo
e.g. this won't work if you don't have permission to write the file, because the redirection is performed by your own shell, not by sudo:

sudo echo 1 > /proc/sys/vm/overcommit_memory
To solve this:
sudo sh -c 'echo 1 > /proc/sys/vm/overcommit_memory'

You can also do this with tee: echo 1 | sudo tee /proc/sys/vm/overcommit_memory




Friday, April 5, 2013

Regex



re.match(r"^[a-z]+[*]?$", s)
  1. The ^ matches the start of the string.
  2. The [a-z]+ matches one or more lowercase letters.
  3. The [*]? matches zero or one asterisks.
  4. The $ matches the end of the string.
Your original regex matches exactly one lowercase character followed by one or more asterisks.
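
The pattern can also be checked quickly from the shell. This is a sketch using grep -E instead of Python's re.match, on the assumption that extended regex syntax treats this particular pattern the same way; the helper name matches and the test strings are made up.

```shell
# Hypothetical helper: succeeds if the whole string is lowercase letters
# optionally followed by a single asterisk.
matches() {
  # LC_ALL=C keeps [a-z] limited to ASCII lowercase letters.
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z]+[*]?$'
}

for s in 'abc' 'abc*' 'abc**' 'Abc' '*'; do
  if matches "$s"; then
    echo "$s: match"
  else
    echo "$s: no match"
  fi
done
```

Only the first two strings should match: the third has two asterisks, the fourth starts with an uppercase letter, and the last has no letters at all.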

Monday, April 1, 2013

HBase Maintenance Tool


Usage: fsck [opts] {only tables}
 where [opts] are:
   -help Display help options (this)
   -details Display full report of all regions.
   -timelag {timeInSeconds}  Process only regions that have not experienced any metadata updates in the last {timeInSeconds} seconds.
   -sleepBeforeRerun {timeInSeconds} Sleep this many seconds before checking if the fix worked if run with -fix
   -summary Print only summary of the tables and status.
   -metaonly Only check the state of ROOT and META tables.

  Metadata Repair options: (expert features, use with caution!)
   -fix              Try to fix region assignments.  This is for backwards compatibility
   -fixAssignments   Try to fix region assignments.  Replaces the old -fix
   -fixMeta          Try to fix meta problems.  This assumes HDFS region info is good.
   -fixHdfsHoles     Try to fix region holes in hdfs.
   -fixHdfsOrphans   Try to fix region dirs with no .regioninfo file in hdfs
   -fixHdfsOverlaps  Try to fix region overlaps in hdfs.
   -fixVersionFile   Try to fix missing hbase.version file in hdfs.
   -maxMerge <n>     When fixing region overlaps, allow at most <n> regions to merge. (n=5 by default)
   -sidelineBigOverlaps  When fixing region overlaps, allow to sideline big overlaps
   -maxOverlapsToSideline <n>  When fixing region overlaps, allow at most <n> regions to sideline per group. (n=2 by default)
   -fixSplitParents  Try to force offline split parents to be online.
   -ignorePreCheckPermission  ignore filesystem permission pre-check

  Datafile Repair options: (expert features, use with caution!)
   -checkCorruptHFiles     Check all HFiles by opening them to make sure they are valid
   -sidelineCorruptHFiles  Quarantine corrupted HFiles.  implies -checkCorruptHFiles

  Metadata Repair shortcuts
   -repair           Shortcut for -fixAssignments -fixMeta -fixHdfsHoles -fixHdfsOrphans -fixHdfsOverlaps -fixVersionFile -sidelineBigOverlaps
   -repairHoles      Shortcut for -fixAssignments -fixMeta -fixHdfsHoles
Heap
 par new generation   total 176960K, used 28318K [0x0000000412e00000, 0x000000041ee00000, 0x000000041ee00000)
  eden space 157312K,  18% used [0x0000000412e00000, 0x00000004149a7b50, 0x000000041c7a0000)
  from space 19648K,   0% used [0x000000041c7a0000, 0x000000041c7a0000, 0x000000041dad0000)
  to   space 19648K,   0% used [0x000000041dad0000, 0x000000041dad0000, 0x000000041ee00000)
 concurrent mark-sweep generation total 5312K, used 0K [0x000000041ee00000, 0x000000041f330000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 21248K, used 10311K [0x00000007fae00000, 0x00000007fc2c0000, 0x0000000800000000)