Home
Totally cribbed from HappyBase
There is a lack of good gems to interface with HBase from Ruby (not JRuby).
Gems like hbase-ruby and hbaserb are too basic; hbase-stargate uses REST, not Thrift; Rhino hasn't been updated in 3 years; and finally, Massive Record does not meet our performance expectations.
After limping along with Massive Record for too long, I decided it would be better and quicker to start from scratch than to try to fix Massive Record.
"But, Kiss doesn't really have the amount of message data to really require HBase! Isn't this just wasted effort?"
True, Kiss doesn't yet have the volume of data to require HBase, but we do generate a lot of event data, and, hopefully, so will other Labs products. It is the analysis of and metrics on this data that HBase would be ideal for.
Besides, it's an interesting exercise!
One of the biggest issues we've seen with using Massive Record is speed, or lack thereof. We use HBase to back the messaging system on Kiss.com, and for users with large inboxes, retrieval is painfully slow (~1000+ messages take 20sec or more). Writes were slow too; ~500ms is the norm. Some of this slowness is probably due to improper HBase configuration, but we saw significant differences reading the same data from different libraries. So we know it's possible to have faster HBase code in Ruby.
We have come to the conclusion (somewhat obvious in hindsight) that a traditional ORM pattern is a poor fit for HBase, and the overhead not only slows things down, but unnecessarily complicates things. With ok-hbase I am trying to find a balance: keep it simple, but still provide some nice features above and beyond being just a thin wrapper around the Thrift libraries.
Building just another thin wrapper around the Thrift libraries wouldn't be very useful, so with ok-hbase, I hope to provide a bit more. While a traditional ORM is not a good fit for HBase, there are some ORM-like features that would be nice to have. These features will be implemented as mixins or concerns: You can use the basic table class, or subclass it and add the mixins you want. Check out the TODO section to see what's planned.
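To make the mixin idea concrete, here is a minimal sketch of how an optional feature could be layered onto a basic table class. Every name in it (`BasicTable`, `DefaultFamily`, `default_family`, `qualify`) is hypothetical and chosen for illustration; none of it is ok-hbase's actual API.

```ruby
# Hypothetical sketch: these names are illustrative, not ok-hbase's real API.
module DefaultFamily
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Declare (or read) the column family used when a column has no "family:" prefix.
    def default_family(name = nil)
      @default_family = name.to_s if name
      @default_family
    end
  end

  # Expand "subject" to "d:subject"; leave "meta:sender" untouched.
  def qualify(column)
    column = column.to_s
    return column if column.include?(':')

    family = self.class.default_family
    raise ArgumentError, 'no default column family set' unless family

    "#{family}:#{column}"
  end
end

# Stand-in for the basic table class; a real one would hold a connection.
class BasicTable
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# Subclass the basic table and opt in to only the mixins you want.
class MessagesTable < BasicTable
  include DefaultFamily
  default_family :d
end

table = MessagesTable.new('messages')
puts table.qualify(:subject)
puts table.qualify('meta:sender')
```

A table that does not include the mixin stays a plain `BasicTable`, so the extra behaviour costs nothing unless it is asked for.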
- So far, ok-hbase is performing as fast as, if not faster than, the fastest Ruby test code we've written.
- It appears that we can make use of basic filters at the region server, which will hopefully improve performance.
- We can make use of compression in HBase
- We can make use of the `in_memory=true` setting for column families to improve performance in HBase
- Basic Connection class done
  - Table creation/deletion
  - Table enabling/disabling
  - Table listing
- Basic Table class started
  - scanning implemented, including support for:
    - start/stop scans
    - prefix scans
    - timestamps (versions)
    - limits
    - caching (batch size)
  - column family listing completed
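A prefix scan can be expressed as an ordinary start/stop scan: start at the prefix and stop at the smallest key that sorts after every key beginning with that prefix. The sketch below shows one way to compute that stop row. The helper names `prefix_stop_row` and `scan_range` are made up for this example, and ok-hbase may handle prefixes differently.

```ruby
# Hypothetical helpers; ok-hbase's own prefix handling may differ.

# Return the exclusive stop row for a prefix scan, or nil when the scan
# has no upper bound (empty prefix, or a prefix made only of 0xFF bytes).
def prefix_stop_row(prefix)
  bytes = prefix.b.bytes
  # A trailing 0xFF byte cannot be incremented, so drop it and carry left.
  bytes.pop while bytes.last == 0xFF
  return nil if bytes.empty?

  bytes[-1] += 1
  bytes.pack('C*')
end

# Translate a prefix into the start/stop pair a scanner would use.
def scan_range(prefix)
  { start_row: prefix.b, stop_row: prefix_stop_row(prefix) }
end

range = scan_range('user1')
puts range[:start_row]
puts range[:stop_row]
```

Row keys are compared as raw bytes, which is why the helper works on bytes rather than characters.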
- Table class
  - Basic functionality
    - Cell retrieval
    - Row retrieval by id
    - Row retrieval by id list
    - Data writing
    - Data deletion
    - Atomic counter operations
- Batch class
  - Batch writes
  - Batch deletes
  - Added `transaction` method to emulate HappyBase's use of context managers
- The Good Stuff™
  - Table class
    - Support for optional default column families, so cells can be referenced without an explicit column family
  - Row class
    - Implicit cell access through meta-programming.
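Ruby has no context managers, but a method that takes a block can play the same role for batches. The sketch below assumes the semantics are "send everything if the block finishes, discard everything if it raises". `SketchBatch` and its methods are hypothetical stand-ins, not ok-hbase's real Batch class, and the actual `transaction` method may behave differently.

```ruby
# Hypothetical stand-in for a batch; not ok-hbase's real Batch class.
class SketchBatch
  attr_reader :mutations, :sent

  def initialize
    @mutations = [] # queued but not yet sent
    @sent = []      # what would have gone to HBase
  end

  def put(row, data)
    @mutations << [:put, row, data]
  end

  def delete(row)
    @mutations << [:delete, row]
  end

  # A real implementation would issue a single Thrift call here.
  def send_batch
    @sent.concat(@mutations)
    @mutations.clear
  end

  # Yield the batch, send on normal completion, discard and re-raise on error.
  def transaction
    yield self
    send_batch
  rescue StandardError
    @mutations.clear
    raise
  end
end

batch = SketchBatch.new
batch.transaction do |b|
  b.put('row-1', 'd:subject' => 'hello')
  b.delete('row-2')
end
puts batch.sent.length
```

Queuing mutations and sending them together keeps the number of round trips low, which matters given the write latencies described earlier.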
- Basic Functionality
  - Connection class
    - Connection pooling
    - Instrumentation
      - Leverage ActiveSupport::Notifications, like massive_record does
- The Good Stuff™
  - Table class
    - Support for optional default column families, so cells can be referenced without an explicit column family
    - `filter_string` generators for table scanning (analogous to a SQL `where` clause that is processed at the region server)
    - Support optional indexers: save multiple copies of a row with different row keys for different access patterns
  - Row class
    - Implicit cell access through meta-programming.
    - Support optional per-cell serializers and deserializers (for numbers, hashes, etc)
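As a rough idea of what a `filter_string` generator could look like, the sketch below builds strings in HBase's Thrift filter language from small Ruby helpers. The `FilterString` module and its method names are hypothetical; only the filter names (`PrefixFilter`, `SingleColumnValueFilter`), the `AND`/`OR` operators, and the `binary:` comparator prefix come from HBase itself.

```ruby
# Hypothetical generator; the planned ok-hbase API may look quite different.
module FilterString
  module_function

  # Wrap a value in single quotes, doubling any embedded single quote.
  def quote(value)
    "'#{value.to_s.gsub("'", "''")}'"
  end

  # Match rows whose key starts with the given prefix.
  def prefix(row_prefix)
    "PrefixFilter(#{quote(row_prefix)})"
  end

  # Match rows where family:qualifier equals value (byte-wise comparison).
  def column_equals(family, qualifier, value)
    "SingleColumnValueFilter(#{quote(family)}, #{quote(qualifier)}, =, " \
      "#{quote("binary:#{value}")})"
  end

  # Combine filters, like AND / OR in a SQL where clause.
  def all_of(*filters)
    "(#{filters.join(' AND ')})"
  end

  def any_of(*filters)
    "(#{filters.join(' OR ')})"
  end
end

unread_for_user = FilterString.all_of(
  FilterString.prefix('user1'),
  FilterString.column_equals('d', 'read', '0')
)
puts unread_for_user
```

The resulting string would be passed along with the scan so that the region server does the filtering, instead of shipping every row back to Ruby first.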