Archive

Posts Tagged ‘Cassandra’

Devoxx – University – Cassandra, HBase

December 6th, 2010 at 9:20pm

During the morning session FIXME Ellison gave an introduction to the distributed NoSQL database Cassandra. Being generally based on the Dynamo paper from Amazon the key-value store distributes key/value pairs according to a consistent hashing schema. Nodes can be added dynamically making the system well suited for elastic scaling. In contrast to Dynamo, Cassandra can be tuned for the required consistency level. The system is tuned for storing moderately sized key/value pairs. Large blobs of data should not be stored into it. A recent addition to Cassandra has been the integration with Hadoop’s Map/Reduce implementation for data processing. In addition Cassandra comes with a Thrift interface as well as higher level interfaces such as Hector.

In the afternoon Michael Stack and Jonathan Grey gave an overview of HBase covering basic installation, going into more detail concerning the APIs. Most interesting to me was the HBase architecture and optimisation details. The systems is inspired by Google’s BigTable paper. It uses Apache HDFS as storage back-end inheriting the failure resilience of HDFS. The system uses Zookeeper for co-ordination and meta-data storage. However fear not, Zookeeper comes packaged with HBase, so in case you have not yet setup your own Zookeeper installation, HBase will do that for you on installation.

HBase is split into a master server for holding meta-data and region servers for storing the actual data. When storing data HBase optimises storage for data locality. This is done in two ways: On write the first copy usually goes to the local disk of the client, so even when storage exceeds the size of one block and remote copies get replicated to differing nodes, at least the first copy gets stored on one machine only. During a compaction phase that is scheduled regularly (usually every 24h, jitter time can be added to lower the load on the cluster) data is re-shuffled for optimal layout. When running Map/Reduce jobs against HBase this data locality can be easily exploited. HBase comes with its own input and output formats for Hadoop jobs.

HBase comes not only with Map/Reduce integration, it also publishes a Thrift interface, a REST interface and can be queried from an HBase shell.

General , , ,

Berlin Buzzwords - Early bird registration

April 10th, 2010 at 3:02pm

I would like to invite everyone interested in data storage, analysis and search to join us for two days on June 7/8th in Berlin for Berlin Buzzwords - an in-depth, technical, developer-focused conference located in the heart of Europe. Presentations will range from beginner friendly introductions on the hot data analysis topics up to in-depth technical presentations of scalable architectures.

Our intention is to bring together users and developers of data storage, analysis and search projects. Meet members of the development team working on projects you use. Get in touch with other developers you may know only from mailing list discussions. Exchange ideas with those using your software and get their feedback while having a drink in one of Berlin’s many bars.

Early bird registration has been extended until April 17th - so don’t wait too long.

If you would like to submit a talk yourself: Conference submission is open for little more than one week. More details are available online in the call for presentations:

Looking forward to meeting you in the beautiful, vibrant city of Berlin this summer for a conference packed with high profile speakers, awesome talks and lots of interesting discussions.

Berlin Buzzwords , , , , , , ,