Saturday, January 12, 2013

Looking at Cassandra DB

A few days ago I was asked, "What do you think about Cassandra?" At that moment I had read only a few short notes about this database, and all of them were fairly old (about a year), so there wasn't much useful or compelling information. But I decided to check what had changed in this DB since then. And... it was really interesting.

Within a week I had read the book Cassandra: The Definitive Guide, plus a lot of material from StackOverflow, SlideShare and articles found via Google. The book is badly outdated; most of the useful information can be found in DataStax's Cassandra documentation. To update your knowledge after reading that book, I recommend checking the list of articles at (for example, SuperColumns have been replaced by Composite Columns, and Thrift is being superseded by CQL3 and the native binary protocol).

So what is so interesting about Cassandra?

First of all - fault tolerance

I'm sure that seeing a working service is very important for customer loyalty. Users should never see a "server is down for maintenance" message; it's unacceptable and a shame for any service.

Most web projects currently use MySQL as their database, and with MySQL it's not easy to achieve real fault tolerance. Master-slave replication is not enough: what will you do when the master node has to be rebooted for a software or hardware upgrade, or becomes unreachable because of network issues? You will panic and try to promote a slave to master or restore the master. That's downtime: a few minutes in the good case, a few hours in the bad one. Users' loyalty, the service's reputation, money, time, sleepless nights...

It's good that we have Galera for MySQL. Update: Don't use Galera, it's painful.
But Amazon Dynamo, Cassandra and Riak were built with decentralization as their core idea, so for them it's not a "workaround" bolted onto the database; it's the intended behavior. That's why I personally trust Cassandra's fault tolerance more than MySQL with Galera.
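To make "decentralization" a bit more concrete: Cassandra places data on a token ring where every node owns a range of tokens and no node is a master. Below is a minimal Python sketch of that idea, loosely modeled on SimpleStrategy-style placement (the first replica goes to the node owning the key's token, further replicas to the next nodes clockwise). It assumes MD5 as the hash (similar in spirit to RandomPartitioner); the node names and the `TokenRing`/`replicas_for` helpers are hypothetical illustrations, not Cassandra's API, and real partitioners and virtual nodes differ in detail.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Stand-in hash (assumption): MD5 digest interpreted as an integer.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Toy ring: each node owns one token; no node is special."""

    def __init__(self, nodes):
        # Sort nodes by their token so the ring can be walked clockwise.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas_for(self, key: str, rf: int):
        """Return `rf` distinct nodes: the key's owner, then the next nodes clockwise."""
        if rf > len(self._ring):
            raise ValueError("replication factor exceeds number of nodes")
        # Owner = first node whose token is >= the key's token, wrapping past the end.
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(rf)]


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas_for("user:42", rf=3))
```

Because any node can compute the replica set for a key, a client can send a request to any node, and losing one node only means its ranges are served by the remaining replicas.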
Just read about the replication options and settings in Cassandra and you will see that there are a lot of features for great fault tolerance. Native support for rings spanning multiple datacenters is awesome.
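Two of the most important knobs are the replication factor (RF) and the per-request consistency level. The quick arithmetic below assumes the Dynamo-style rules Cassandra follows: QUORUM means floor(RF / 2) + 1 replicas must respond, and a read is guaranteed to see the latest write when the read and write replica counts satisfy R + W > RF. The helper names are illustrative only.

```python
def quorum(rf: int) -> int:
    # A majority of the replicas: floor(RF / 2) + 1.
    return rf // 2 + 1


def tolerated_failures(rf: int, required: int) -> int:
    # How many replicas can be down while the request still succeeds.
    return rf - required


def is_strongly_consistent(rf: int, read: int, write: int) -> bool:
    # Read and write replica sets must overlap: R + W > RF.
    return read + write > rf


for rf in (1, 3, 5):
    q = quorum(rf)
    print(f"RF={rf}: QUORUM={q}, survives {tolerated_failures(rf, q)} down replica(s), "
          f"QUORUM/QUORUM consistent: {is_strongly_consistent(rf, q, q)}")
```

With RF=3, QUORUM needs 2 of 3 replicas, so reads and writes keep working with one replica down; with RF=5 they survive two. Reading and writing at ONE (1 + 1 is not greater than 3) trades that consistency guarantee for even higher availability.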

Second - performance

I spent a whole day looking for benchmark graphs comparing Cassandra, MySQL, Riak, HBase and MongoDB before I found this page. And here I found very important information: Cassandra's performance is very good, not just "good enough" but genuinely very good.

Ready to use

We, modest programmers of high-level web languages, always depend on "clients" (client libraries) to talk to compiled software.
It was no surprise to find an existing PHP client for Cassandra, PHPCassa, but I don't like static methods, hardcoding and global variables, so I don't want to use code with lines as ugly as "$GLOBALS['\cassandra\E_IndexOperator']" (an actual line from the PHPCassa code). Other clients also use a lot of static methods.

I thought there were two options: write my own client (as a wrapper around the Thrift API) or find the existing client with the cleanest code. But... today I found a very cool thing: a Cassandra PDO driver for PHP! And it uses an SQL-like syntax (CQL). It's amazing, and now I'm deploying a few Linode VPSes to try Cassandra PDO.

I'm inspired, and I wish the same to you :)


  1. How did your testing go, particularly on Linode?

  2. Is there another post where you go into more detail regarding the experience that made you write: "Update: Don't use Galera, it's painful"?