A month ago I left Rackspace to start Riptano, a Cassandra support and services company.
I was in the unusual position of being a technical person looking for a business-savvy co-founder. For whatever reason, the converse seems a lot more common. Maybe technical people tend to stereotype softer skills as being easy.
But despite some examples to the contrary (notably for me, Josh Coates at Mozy), I found that starting a company …
Cassandra has seen some impressive adoption success over the past few months, leading some to conclude that Cassandra is the frontrunner in the highly scalable databases space (a subset of the hot NoSQL category). Amid all the attention, some misunderstandings have been propagated, which I'd like to clear up.
Fiction: "Cassandra relies on high-speed fiber between datacenters" and can't reliably replicate between datacenters with more than a few ms of latency between them.
Cassandra is participating in the Google Summer of Code, which opened for proposal submission today. Cassandra is part of the Apache Software Foundation, which has its own page of guidelines up for students and mentors.
We have a good mix of project ideas involving both core and non-core areas, from straightforward code bashing to some pretty tricky stuff, depending on your appetite. Core tickets aren't necessarily harder than non-core, but they will require reading and understanding more existing code.
There have been a lot of new articles about Cassandra deployments in the past month, enough that I thought it would be useful to summarize them in a post.
Twitter: Ryan King explained in an interview with Alex Popescu why Twitter is moving to Cassandra for tweet storage, and why they selected Cassandra over the alternatives. My experience is that the more someone understands large systems and the problems you can run into with them from an operational standpoint, …
Several of the reports of the recently concluded NoSQL Live event mentioned that I took a contrarian position on the "NoSQL in the Cloud" panel, arguing that traditional bare-metal servers usually make more sense. Here's why.
There are two reasons to use cloud infrastructure (and by "cloud" I mean here commodity VMs such as those provided by Rackspace Cloud Servers or Amazon EC2):
You only need a fraction of the capacity of a single machine
Handling deletes in a distributed, eventually consistent system is a little tricky, as demonstrated by the fairly frequent recurrence of the question, "Why doesn't disk usage immediately decrease when I remove data in Cassandra?"
As background, recall that a Cassandra cluster defines a ReplicationFactor that determines how many nodes each key and associated columns are written to. In Cassandra (as in Dynamo), the client controls how …
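A toy model helps show why a delete does not shrink disk usage right away. The Python sketch below is a deliberate simplification, not Cassandra's storage engine; the class `ToyReplica`, its `grace_period` parameter, and its `compact` method are hypothetical. A delete is written as a timestamped tombstone instead of erasing anything, reads keep whichever entry has the newest timestamp, and tombstones are purged only after a grace period (loosely analogous to Cassandra's GCGraceSeconds setting), so that a stale copy from a replica that missed the delete is still overridden.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Cell:
    value: Optional[str]  # None marks a tombstone
    timestamp: int


class ToyReplica:
    """Hypothetical single-replica store; NOT Cassandra's real storage engine."""

    def __init__(self, grace_period: int) -> None:
        self.grace_period = grace_period  # hypothetical stand-in for GCGraceSeconds
        self.cells: Dict[str, Cell] = {}

    def _apply(self, key: str, cell: Cell) -> None:
        # Last-write-wins: keep only the entry with the newest timestamp.
        current = self.cells.get(key)
        if current is None or cell.timestamp > current.timestamp:
            self.cells[key] = cell

    def insert(self, key: str, value: str, timestamp: int) -> None:
        self._apply(key, Cell(value, timestamp))

    def delete(self, key: str, timestamp: int) -> None:
        # A delete is just another write: the tombstone still occupies space.
        self._apply(key, Cell(None, timestamp))

    def get(self, key: str) -> Optional[str]:
        cell = self.cells.get(key)
        return None if cell is None else cell.value

    def compact(self, now: int) -> None:
        # Drop tombstones only once they are at least grace_period old.
        self.cells = {
            k: c
            for k, c in self.cells.items()
            if c.value is not None or now - c.timestamp < self.grace_period
        }
```

In this toy, a stale insert with an older timestamp is ignored while the tombstone exists, but once `compact` purges the tombstone the same stale insert brings the value back. That resurrection risk is why a tombstone has to outlive the time it takes every replica to hear about the delete, and why the space is reclaimed only later.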
Apache Cassandra 0.5.0 was released over the weekend, four months after 0.4. (Upgrade notes; full changelog.) We're excited about releasing 0.5 because it makes life even better for people using Cassandra as their primary data source -- as opposed to a replica, possibly denormalized, of data that exists somewhere else.
The Cassandra distributed database has always had a commitlog to provide durable writes, and in 0.4 we added an option of waiting for …
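To make the durability trade-off behind a commitlog concrete, here is a minimal append-only log sketch in Python. It is not Cassandra's CommitLog; the `CommitLog` class, its `sync_every_write` flag, and the `replay` helper are hypothetical. With the flag on, each append calls `os.fsync` before returning, so an acknowledged write survives a crash at the cost of one disk sync per write; with it off, the OS may still be buffering recent writes when the machine goes down.

```python
import os
from typing import List


class CommitLog:
    """Hypothetical append-only log; a sketch, not Cassandra's implementation."""

    def __init__(self, path: str, sync_every_write: bool) -> None:
        self.sync_every_write = sync_every_write
        # Append mode: existing entries are never rewritten.
        self.f = open(path, "ab")

    def append(self, entry: bytes) -> None:
        # Length-prefix each entry so a reader can find record boundaries.
        self.f.write(len(entry).to_bytes(4, "big") + entry)
        self.f.flush()  # hand data from Python's buffer to the OS
        if self.sync_every_write:
            os.fsync(self.f.fileno())  # block until the OS reports it is on disk

    def close(self) -> None:
        self.f.flush()
        os.fsync(self.f.fileno())
        self.f.close()


def replay(path: str) -> List[bytes]:
    """Read back every complete entry, e.g. during crash recovery."""
    entries: List[bytes] = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break  # end of log, or a torn length prefix
            size = int.from_bytes(header, "big")
            body = f.read(size)
            if len(body) < size:
                break  # torn final write: ignore the partial entry
            entries.append(body)
    return entries
```

Syncing on every append gives the strongest guarantee but pays for a disk sync per write; syncing periodically instead trades a small window of possible loss for throughput. The length prefix lets `replay` discard a half-written final entry after a crash.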
I want to write about Cassandra performance tuning, but first I need to cover some basics: how to use vmstat, iostat, and top to understand what part of your system is the bottleneck -- not just for Cassandra but for any system.
vmstat

You will typically run vmstat as "vmstat sampling-period", e.g., "vmstat 5". The output looks like this:
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi …
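Reading vmstat by eye is the point here, but a small parser helps when you want to log or compare samples. This Python sketch assumes the layout printed by the common Linux procps vmstat: a group banner line starting with `procs`, then a line of column names starting with `r`. Because the exact columns vary between vmstat versions, it keys values by whatever names the header contains rather than hard-coding positions. The function `parse_vmstat`, the `SAMPLE` text, and its numbers are made up for illustration, not real measurements.

```python
from typing import Dict, List


def parse_vmstat(output: str) -> List[Dict[str, int]]:
    """Turn `vmstat N` text into one dict per sample, keyed by column name."""
    names: List[str] = []
    samples: List[Dict[str, int]] = []
    for line in output.splitlines():
        fields = line.split()
        if not fields or fields[0] == "procs":
            continue  # blank line or group banner ("procs ---memory--- ...")
        if fields[0] == "r":
            names = fields  # column-name line; vmstat repeats it periodically
            continue
        if names and len(fields) == len(names):
            samples.append({n: int(v) for n, v in zip(names, fields)})
    return samples


# Made-up sample in procps vmstat layout (illustrative numbers only).
SAMPLE = """procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 812345  10240 204800    0    0     5    12  100  200  3  1 95  1
 0  2      0 810000  10240 204900    0    0   900   400  300  600  2  2 40 56
"""

rows = parse_vmstat(SAMPLE)
for row in rows:
    # High "wa" (CPU time spent waiting on I/O) together with processes in
    # "b" (uninterruptible sleep) points at the disk rather than the CPU.
    # The threshold of 20 is arbitrary.
    if row["wa"] > 20:
        print(f"likely I/O-bound sample: b={row['b']} wa={row['wa']}%")
```

Keep in mind that vmstat's first data line reports averages since boot, so skip it when you care about current behavior.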
I put together this list for a co-worker who wants to learn more about Cassandra (0.5 beta 2 out now!):
Getting Started: Cassandra is surprisingly easy to try out. This walks you through both single-node and clustered setup.
The Dynamo paper and Amazon's related article on eventual consistency: Cassandra's replication model is strongly influenced by Dynamo's. Almost everything you read here also applies to Cassandra. (The …
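One consequence of the Dynamo replication model worth having at your fingertips is the quorum rule: with N replicas per key, a read that consults R replicas must overlap the W replicas that acknowledged the latest write whenever R + W > N (ignoring failures handled by hinted handoff). The helper names below are hypothetical; the arithmetic is the standard Dynamo condition, and Cassandra's QUORUM consistency level corresponds to floor(N/2) + 1 replicas. The brute-force function checks the rule by enumerating every possible read and write set.

```python
from itertools import combinations


def quorum(n: int) -> int:
    """Size of a majority of n replicas (what QUORUM means for n replicas)."""
    return n // 2 + 1


def reads_see_latest_write(n: int, r: int, w: int) -> bool:
    """True if every R-replica read must overlap every W-replica write set."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


def overlap_by_enumeration(n: int, r: int, w: int) -> bool:
    """Brute force: does every possible read set intersect every write set?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


if __name__ == "__main__":
    n = 3  # e.g. ReplicationFactor = 3
    for r, w in [(1, 1), (quorum(n), quorum(n)), (1, n), (n, 1)]:
        print(f"N={n} R={r} W={w}: overlap guaranteed = {reads_see_latest_write(n, r, w)}")
```

Reading and writing at QUORUM is the balanced choice; R = 1 with W = N makes reads cheap at the cost of requiring every replica to acknowledge each write.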