Hey, it's HighScalability time:
(Earth-sized solar flare, some more flair)
Google I/O to world: Just try to keep up with us. You can't. But go ahead and try. Nah na na na nah...
17 billion: Google Cloud Messaging messages per day with 60ms latency; 1B page views: 500px; 121 billion: edge graph using Titan; 4 billion hours: watched on Netflix per quarter; 4.5 trillion: BigTable transactions per month
As Spanner is a not-so-distant cousin of BigTable, the NoSQL component should be no surprise. Spanner is charged with spanning millions of machines inside any number of geographically distributed datacenters. What is surprising is how OldSQL has been embraced. In an earlier 2011 talk given by Alex at the HotStorage conference, the stated reason for embracing OldSQL was the desire to make it easier and faster for programmers to build applications. The main ideas will seem quite familiar: …
See the website.
See the slides.
He doesn't cover setting up a production cluster.
Using a schema is optional.
Cassandra is like a combination of Dynamo from Amazon and BigTable from Google.
It uses timestamps for conflict resolution. The clients determine the time. There are other approaches to conflict resolution as well.
Data in Cassandra looks like a multi-level dict.
By default, Cassandra eats 1/2 of your RAM. You might want to change that ;)
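Two of the notes above, the multi-level dict and timestamp-based conflict resolution, can be sketched together in plain Python. This is only an illustration, not Cassandra's API: the `write` helper, the level names, and the sample values are hypothetical, and the rule that a tie keeps the existing value is a simplification chosen for the sketch.

```python
from typing import Dict, Tuple

# Assumed nesting for illustration:
# keyspace -> column family -> row key -> column name -> (value, timestamp)
Store = Dict[str, Dict[str, Dict[str, Dict[str, Tuple[str, int]]]]]


def write(store: Store, keyspace: str, column_family: str, row_key: str,
          column: str, value: str, timestamp: int) -> None:
    """Apply a write only if its client-supplied timestamp is newer."""
    row = (store.setdefault(keyspace, {})
                .setdefault(column_family, {})
                .setdefault(row_key, {}))
    current = row.get(column)
    # Last write wins: a stale timestamp never overwrites a newer value.
    if current is None or timestamp > current[1]:
        row[column] = (value, timestamp)


store: Store = {}
write(store, "app", "users", "alice", "email", "alice@new.example", 250)
write(store, "app", "users", "alice", "email", "alice@old.example", 100)  # stale
write(store, "app", "users", "alice", "city", "Oslo", 120)

print(store["app"]["users"]["alice"])
```

Because the clients supply the timestamps, a client with a skewed clock can win or lose a conflict it should not, which is one reason the other approaches to conflict resolution exist.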
…you get is a nice database engine for certain types of workloads. In fact, Google's BigTable, Hadoop's HBase, and Cassandra, amongst others, all use a variant or a direct copy of this very architecture.
Simple on the surface, but as usual, implementation details matter a great deal. Thankfully, Jeff Dean and Sanjay Ghemawat, the original contributors to the SSTable and BigTable infrastructure at Google, released LevelDB earlier last year, …
…can only assume DynamoDB either lets the last write win or has a scheme similar to BigTable, using timestamps for each attribute.
Writes don't allow you to specify something like a quorum that tells DynamoDB how consistent you'd like the write to be. It seems to be up to the system to decide when and how quickly replication to other datacenters is done. Alex Popescu's summary on DynamoDB and Werner Vogels' introduction suggest that writes are replicated …
…around this, even with a range-based key location. HBase (and Google's BigTable, for that matter) stores ranges of data in separate tablets. As tablets grow beyond their maximum size, they're split up and the resulting parts are redistributed. The advantage of this is that the original range ordering is kept, even as you scale up.
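The split-on-growth behaviour described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not HBase or BigTable code: `RangeTable`, `max_tablet_size`, and the in-memory sorted lists are hypothetical stand-ins, and redistribution to other servers is left out.

```python
import bisect


class RangeTable:
    """Toy range-partitioned table: ordered tablets, each a sorted key list."""

    def __init__(self, max_tablet_size: int = 4) -> None:
        self.max_tablet_size = max_tablet_size
        self.tablets = [[]]  # tablets are kept in key-range order

    def _find_tablet(self, key: str) -> int:
        # Pick the first tablet whose largest key is >= key,
        # falling back to the last tablet for keys beyond every range.
        for i, tablet in enumerate(self.tablets):
            if tablet and key <= tablet[-1]:
                return i
        return len(self.tablets) - 1

    def insert(self, key: str) -> None:
        i = self._find_tablet(key)
        tablet = self.tablets[i]
        if key not in tablet:
            bisect.insort(tablet, key)
        # Split an oversized tablet in half; both halves stay adjacent,
        # so the overall key order is preserved.
        if len(tablet) > self.max_tablet_size:
            mid = len(tablet) // 2
            self.tablets[i:i + 1] = [tablet[:mid], tablet[mid:]]


table = RangeTable(max_tablet_size=4)
for key in ["m", "c", "x", "a", "q", "f", "z", "k", "b", "t"]:
    table.insert(key)

for tablet in table.tablets:
    print(tablet[0], "..", tablet[-1], tablet)
```

Since each tablet covers a contiguous key range, a range scan only has to touch the tablets whose ranges overlap the query.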
Consistent Hashing Enables Partitioning
When you have a consistent hash, everything looks like a partition. The idea is simple. Consistent hashing forms a keyspace, …
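As a rough illustration of the idea, here is a minimal consistent-hash ring in Python. It is a sketch under assumptions rather than any particular system's implementation: the `HashRing` class, the node names, and the choice of MD5 as the hash are all hypothetical, and virtual nodes and replication are omitted.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a node name or key to a position in the hash keyspace."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=()) -> None:
        self._positions = []  # sorted positions of nodes on the ring
        self._owners = {}     # position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        pos = ring_position(node)
        bisect.insort(self._positions, pos)
        self._owners[pos] = node

    def lookup(self, key: str) -> str:
        # A key belongs to the first node clockwise from its position,
        # wrapping around to the start of the ring.
        pos = ring_position(key)
        i = bisect.bisect_right(self._positions, pos) % len(self._positions)
        return self._owners[self._positions[i]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{n}" for n in range(1000)]
before = {k: ring.lookup(k) for k in keys}

ring.add("node-d")
after = {k: ring.lookup(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved, all to node-d")
```

The point of the exercise is the last step: adding a node only takes over one arc of the ring, so every key that moves lands on the new node and the rest stay where they were.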
such as SimpleDB, BigTable, Cassandra, CouchDB, and MongoDB.
I recommend a few excellent podcasts on Cassandra, CouchDB and MongoDB to get a sense of what this is all about.
Recently, MongoDB has received a lot of attention due to the following factors:
availability on many platforms
rich language support: C, C++, C#, Java, JavaScript, Perl, PHP, Python, Ruby
binary JSON for efficient storage
equivalent of …
…Google and Amazon's broad use of their non-relational BigTable and Dynamo systems. We evaluated all the usual open source NoSQL suspects. After considerable debate, we decided to go with Cassandra.
We don't deserve anything. Publishers can do whatever they want. If you don't like it, don't send them nasty emails or browse their sites with ad-blockers: just don't support them. Don't read their content, don't link to them, and …
…engineer Stu Hood, who explained Cassandra's appeal: "Over the Bigtable clones, Cassandra has huge high-availability advantages, and no single point of failure. When compared to the Dynamo adherents, Cassandra has the advantage of a more advanced datamodel, allowing for a single row to contain billions of column/value pairs: enough to fill a machine. You also get efficient range queries for the top level key, and even within your values."
…you've got a bunch of other data models in the middle: graph databases, tabular databases like BigTable, and document databases like Mongo.
There are a few different ways that you can think about document databases. One of the nice things about document databases is that they're closely mapped to how most developers are writing code, whereas SQL databases were designed for accounting and banking 30 or 40 years ago, prior to the advent of web applications and the rise of object-oriented …