HBase Performance Tuning

Column Family

  • Restrict to 2 o 3 column family

  • Flushing and compactions are done per region basis i.e. if one column family is carrying the bulk of the data bringing on flushes, the adjacent families will also be flushed though the amount of data they carry is small.

  • query one column family or the other but usually not both at the one time.

Row-Key Design

  • keep column family, column name as short as possible

  • useful shorter rowkey

Constants

Instead of

Get get = new Get(rowkey);
Result r = htable.get(get);
byte[] b = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr"));  // returns current version of value

use

public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();
...
Get get = new Get(rowkey);
Result r = htable.get(get);
byte[] b = r.getValue(CF, ATTR);  // returns current version of value

Writing to HBase

  • Use pre-split regions

  • AutoFlush - When performing a lot of Puts, make sure that setAutoFlush is set to false on your HTable instance. Otherwise, the Puts will be sent one at a time to the RegionServer. Puts added via htable.add(Put) and htable.add( <List> Put) wind up in the same write buffer. If autoFlush = false, these messages are not sent until the write-buffer is filled. To explicitly flush the messages, call flushCommits. Calling close on the HTable instance will invoke flushCommits.

Last updated

Was this helpful?