Tuesday, November 5, 2013

Riak Compared to MongoDB

This is intended to be a brief, objective, and technical comparison of Riak and MongoDB. It describes MongoDB 2.2.x and Riak 1.2.x. If you feel this comparison is unfaithful for any reason, please fix it or send an email to docs@basho.com.

At A Very High Level

  • Riak is Apache 2.0 licensed; MongoDB is distributed under the AGPL
  • Riak is written primarily in Erlang with some bits in C; MongoDB is written in C++

Feature/Capability Comparison

The comparison below gives a high-level overview of Riak and MongoDB features and capabilities. To keep this page relevant in the face of rapid development on both sides, low-level details are left to the Riak and MongoDB online documentation.

Data Model
  • Riak: Riak stores key/value pairs in a higher-level namespace called a bucket.
  • MongoDB: MongoDB's data format is BSON (a binary-encoded serialization of JSON-like documents). Data is stored as documents (self-contained records with no intrinsic relationships); documents may contain any of the defined BSON types and are grouped into collections.

Storage Model
  • Riak: Riak has a modular, extensible local storage system that lets you plug in the backend store best suited to your use case. The default backend is Bitcask. You can also write your own storage backend for Riak using our backend API.
  • MongoDB: MongoDB's default storage system is the memory-mapped storage engine, which uses memory-mapped files for all disk I/O. The operating system is responsible for flushing data to disk and for paging data in and out of memory.
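To make the idea of a pluggable backend in the Storage Model entry concrete, here is a minimal Python sketch. Riak's real backend API is an Erlang behaviour; the `StorageBackend` interface, `InMemoryBackend`, and `StorageNode` below are hypothetical names that only mirror the separation between the layer that serves requests and the store that holds the bytes.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterator, Optional, Tuple


class StorageBackend(ABC):
    """Hypothetical backend interface (Riak's real one is an Erlang behaviour)."""

    @abstractmethod
    def get(self, bucket: str, key: str) -> Optional[bytes]: ...

    @abstractmethod
    def put(self, bucket: str, key: str, value: bytes) -> None: ...

    @abstractmethod
    def delete(self, bucket: str, key: str) -> None: ...

    @abstractmethod
    def keys(self, bucket: str) -> Iterator[str]: ...


class InMemoryBackend(StorageBackend):
    """Toy backend that keeps every (bucket, key) -> value pair in a dict."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], bytes] = {}

    def get(self, bucket: str, key: str) -> Optional[bytes]:
        return self._data.get((bucket, key))

    def put(self, bucket: str, key: str, value: bytes) -> None:
        self._data[(bucket, key)] = value

    def delete(self, bucket: str, key: str) -> None:
        self._data.pop((bucket, key), None)

    def keys(self, bucket: str) -> Iterator[str]:
        return iter([k for (b, k) in self._data if b == bucket])


class StorageNode:
    """Hypothetical request layer that only talks to the StorageBackend interface."""

    def __init__(self, backend_factory: Callable[[], StorageBackend]) -> None:
        # Swapping storage engines means passing a different factory here.
        self.backend = backend_factory()

    def store(self, bucket: str, key: str, value: bytes) -> None:
        self.backend.put(bucket, key, value)

    def fetch(self, bucket: str, key: str) -> Optional[bytes]:
        return self.backend.get(bucket, key)
```

Supporting a disk-backed store would mean writing another `StorageBackend` subclass; `StorageNode` itself would not change.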

Data Access and APIs
  • Riak: Riak offers two primary interfaces, HTTP and Protocol Buffers, in addition to raw Erlang access. Riak client libraries are wrappers around these APIs, and client support exists for dozens of languages.
  • MongoDB: MongoDB uses a custom, socket-based wire protocol with BSON as the interchange format. 10gen and the MongoDB community support many client libraries.

Query Types and Query-ability
  • Riak: There are currently four ways to query data in Riak: primary key operations (GET, PUT, DELETE), MapReduce, secondary indexes, and Riak Search.
  • MongoDB: MongoDB has a query interface with some similarities to relational databases, including secondary indexes that can be derived from the stored documents. MongoDB also has facilities for performing MapReduce queries and ad hoc queries on documents. Hadoop support is available, too.
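Secondary indexes derived from stored documents, as described for MongoDB in the Query Types entry, come down to an ordered mapping from a field value to document ids that supports exact-match and range lookups. The toy `FieldIndex` class below is a hypothetical illustration of that idea using a sorted list; MongoDB's actual indexes are B-trees, and this sketch makes no attempt to mirror its query API.

```python
import bisect
from typing import Any, Dict, List


class FieldIndex:
    """Toy secondary index on a single document field, kept in sorted order."""

    def __init__(self, field: str) -> None:
        self.field = field
        self._values: List[Any] = []  # sorted field values
        self._ids: List[str] = []     # document ids, parallel to _values

    def add(self, doc_id: str, doc: Dict[str, Any]) -> None:
        # The index entry is derived from the stored document itself;
        # documents without the field are simply not indexed.
        if self.field not in doc:
            return
        value = doc[self.field]
        pos = bisect.bisect_right(self._values, value)
        self._values.insert(pos, value)
        self._ids.insert(pos, doc_id)

    def range(self, low: Any, high: Any) -> List[str]:
        """Ids of documents whose field value lies in [low, high]."""
        start = bisect.bisect_left(self._values, low)
        end = bisect.bisect_right(self._values, high)
        return self._ids[start:end]

    def exact(self, value: Any) -> List[str]:
        return self.range(value, value)
```

Because the entries stay sorted, a range query is two binary searches plus a slice rather than a scan over every document.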

Data Versioning and Consistency
  • Riak: Riak uses a data structure called a vector clock to reason about the causality and staleness of stored values. Vector clocks let clients always write to the database, at the cost of consistency conflicts that must be resolved at read time by application or client code. Vector clock pruning can be configured based on the size and age of the clock. There is also an option to disable vector clocks and fall back to simple timestamp-based “last-write-wins” resolution.
  • MongoDB: MongoDB exhibits strong consistency by default. Eventually consistent reads can be performed against secondaries. In a MongoDB cluster (with auto-sharding and replication), each shard has a single master (primary) server at any given time.
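The read-time conflict handling described in the Data Versioning and Consistency entry can be modelled with a few small functions. This is a simplified Python sketch, not Riak's implementation: a clock here is just a map from an actor (a client or replica; the names in the example are made up) to a counter, and it omits the timestamps Riak keeps alongside each entry for pruning.

```python
from typing import Dict, List, Tuple

VClock = Dict[str, int]  # actor id -> counter


def increment(clock: VClock, actor: str) -> VClock:
    """Record one more update by `actor`, returning a new clock."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new


def descends(a: VClock, b: VClock) -> bool:
    """True if clock `a` has seen every update that `b` has seen."""
    return all(a.get(actor, 0) >= n for actor, n in b.items())


def merge(a: VClock, b: VClock) -> VClock:
    """Pointwise maximum: the clock a client writes after resolving siblings."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in set(a) | set(b)}


def siblings(versions: List[Tuple[VClock, str]]) -> List[Tuple[VClock, str]]:
    """Drop versions strictly older than another; what remains are concurrent siblings."""
    result = []
    for i, (clock, value) in enumerate(versions):
        dominated = any(
            j != i and descends(other, clock) and not descends(clock, other)
            for j, (other, _) in enumerate(versions)
        )
        if not dominated:
            result.append((clock, value))
    return result
```

Two clients that both start from the same version and write independently produce clocks where neither descends the other, so both values survive as siblings until a client writes back a merged, incremented clock.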

Concurrency
  • Riak: Any node in a Riak cluster can coordinate a read or write operation, including for data stored on other nodes. Riak stresses availability for both reads and writes and puts the burden of conflict resolution on the client at read time.
  • MongoDB: MongoDB relies on locks for consistency. As of version 2.2, MongoDB uses a database-level lock for all operations.

Replication
  • Riak: Riak’s replication system is heavily influenced by the Dynamo paper and Dr. Eric Brewer’s CAP Theorem. Riak uses consistent hashing to replicate and distribute N copies of each value around a cluster composed of any number of physical machines. Under the hood, Riak uses virtual nodes to handle the distribution and dynamic rebalancing of data, decoupling data distribution from physical assets. The Riak APIs expose tunable consistency and availability parameters that let you choose the configuration best suited to your use case. Replication is configured at the bucket level when data is first stored in Riak; subsequent reads and writes to that data can set request-level parameters.
  • MongoDB: MongoDB manages replication via replica sets, a form of asynchronous master/slave replication. Traditional master/slave replication is also available but not recommended.
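The consistent hashing and tunable N/R/W ideas in the Replication entry can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Riak's riak_core code: keys are hashed onto a 160-bit SHA-1 ring split into equal partitions (64 partitions and an `n_val` of 3 match Riak's defaults), but the hashed string format and the round-robin ownership are simplifications, and `Ring`, `key_hash`, and `quorums_overlap` are hypothetical names.

```python
import hashlib
from typing import List, Tuple

RING_BITS = 160  # SHA-1 output size; the ring spans the whole 160-bit space


def key_hash(bucket: str, key: str) -> int:
    # Assumption: hash the string "bucket/key". Riak hashes an Erlang term instead.
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


class Ring:
    def __init__(self, nodes: List[str], ring_size: int = 64) -> None:
        # ring_size should be a power of two so partitions divide the space evenly.
        self.ring_size = ring_size
        self.partition_width = 2 ** RING_BITS // ring_size
        # Simplified claim: partitions (virtual nodes) go to physical nodes round-robin.
        self.owners = [nodes[i % len(nodes)] for i in range(ring_size)]

    def preference_list(self, bucket: str, key: str, n_val: int = 3) -> List[Tuple[int, str]]:
        """The n_val consecutive partitions, starting at the key's, that hold its replicas."""
        first = (key_hash(bucket, key) // self.partition_width) % self.ring_size
        return [
            ((first + i) % self.ring_size, self.owners[(first + i) % self.ring_size])
            for i in range(n_val)
        ]


def quorums_overlap(n_val: int, r: int, w: int) -> bool:
    """Any R replicas and any W replicas out of N must share one when R + W > N."""
    return r + w > n_val
```

The overlap condition is the classic quorum argument; because Riak can fall back to other nodes when replicas are unavailable, it describes a tendency rather than a strict guarantee.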

Scaling Out and In
  • Riak: Riak allows you to elastically grow and shrink your cluster while evenly balancing the load on each machine. No node in Riak is special or has any particular role; in other words, Riak is masterless. When you add a physical machine to Riak, the cluster learns of its membership through gossip of the ring state. Once it is a member of the ring, it is assigned an equal share of the partitions and takes ownership of the data belonging to them. Removing a machine is the inverse of this process. Riak also ships with a comprehensive suite of command line tools that make node operations simple and straightforward.
  • MongoDB: MongoDB relies on sharding for scaling out: as the data set grows, chunks of the data are assigned to specific servers (shards). To scale in, MongoDB supports removing shards from your database.
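The join and leave steps in the Scaling Out and In entry can be sketched as moving partition ownership between nodes. The hypothetical `join` and `leave` functions below assume ownership is a plain list indexed by partition; Riak's real claim algorithm also aims to keep neighbouring partitions on different machines so that replicas land on distinct nodes, which this sketch ignores.

```python
from collections import Counter
from typing import List


def join(owners: List[str], new_node: str) -> List[str]:
    """New node takes about 1/(nodes+1) of the partitions, one at a time,
    always from the currently most-loaded node, so few partitions move."""
    owners = list(owners)
    target = len(owners) // (len(set(owners)) + 1)
    for _ in range(target):
        counts = Counter(o for o in owners if o != new_node)
        donor = max(counts, key=counts.get)
        owners[owners.index(donor)] = new_node
    return owners


def leave(owners: List[str], node: str) -> List[str]:
    """Inverse of join: hand each of node's partitions to the least-loaded remaining node."""
    owners = list(owners)
    for idx, owner in enumerate(owners):
        if owner == node:
            counts = Counter(o for o in owners if o != node)
            owners[idx] = min(counts, key=counts.get)
    return owners
```

With 64 partitions spread over four nodes, a fifth node claims 12 partitions and only those 12 change hands; the rest of the cluster's data stays where it was.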

Multi-Datacenter Replication and Awareness
  • Riak: Riak features two distinct types of replication. Within a single cluster (usually contained in one datacenter over a LAN), users can replicate to any number of nodes using the Apache 2.0 licensed database. Riak Enterprise, Basho’s commercial extension to Riak, is required for multi-datacenter deployments, that is, the ability to run active Riak clusters in N datacenters.
  • MongoDB: MongoDB can be configured to run in multiple datacenters via various options.

Graphical Monitoring/Admin Console
  • Riak: Riak ships with Riak Control, an open source graphical console for monitoring and managing Riak clusters.
  • MongoDB: MongoDB does not ship with a graphical monitoring/admin console. However, several community projects have developed graphical monitoring/admin programs, and 10gen offers a hosted monitoring service.
