Tuesday, November 5, 2013

Riak Compared to MongoDB

This is intended to be a brief, objective, and technical comparison of Riak and MongoDB. The MongoDB version described is 2.2.x. The Riak version described is Riak 1.2.x. If you feel this comparison is inaccurate in any way, please fix it or send an email to docs@basho.com.

At A Very High Level

Riak is Apache 2.0 licensed; MongoDB is distributed under the AGPL
Riak is written primarily in Erlang with some bits in C; MongoDB is written in C++

Feature/Capability Comparison

The table below gives a high-level comparison of Riak and MongoDB features and capabilities. To keep this page relevant in the face of rapid development on both sides, low-level details are left to the linked Riak and MongoDB online documentation.

Feature/Capability	Riak	MongoDB
Data Model	Riak stores key/value pairs in a higher-level namespace called a bucket. See: Buckets, Keys, and Values.	MongoDB’s data format is BSON (a binary equivalent of JSON), stored as documents (self-contained records with no intrinsic relationships). Documents in MongoDB may store any of the defined BSON types and are grouped in collections. See: Documents; Data Types and Conventions.
Storage Model	Riak has a modular, extensible local storage system that lets you plug in a backend store of your choice to suit your use case. The default backend is Bitcask. You can also write your own storage backend for Riak using the backend API. See: Riak Supported Storage Backends.	MongoDB’s default storage system is the Memory-Mapped Storage Engine. It uses memory-mapped files for all disk I/O. It is the responsibility of the OS to manage flushing data to disk and paging data in and out. See: Caching.
Data Access and APIs	Riak offers two primary interfaces (in addition to raw Erlang access): HTTP and Protocol Buffers. Riak client libraries are wrappers around these APIs, and client support exists for dozens of languages. See: Client Libraries; Community Projects.	MongoDB uses a custom, socket-based wire protocol with BSON as the interchange format. 10gen and the Mongo community support many client libraries. See: Mongo Wire Protocol; Client-Libraries.
Query Types and Query-ability	There are currently four ways to query data in Riak: primary key operations (GET, PUT, DELETE, UPDATE), MapReduce, Secondary Indexes, and Search.	MongoDB has a query interface that has some similarities to relational databases, including secondary indexes that can be derived from the stored documents. MongoDB also has facilities for performing MapReduce queries and ad-hoc queries on documents. Hadoop support is available, too. See: Querying; Indexes; MapReduce; MongoDB Hadoop Adapter.
Data Versioning and Consistency	Riak uses a data structure called a vector clock to reason about causality and staleness of stored values. Vector clocks enable clients to always write to the database, in exchange for consistency conflicts being resolved at read time by either application or client code. Vector clocks can be configured to store copies of a given datum based on the size and age of that datum. There is also an option to disable vector clocks and fall back to simple timestamp-based “last-write-wins”. See: Vector Clocks; Why Vector Clocks Are Easy; Why Vector Clocks Are Hard.	MongoDB exhibits strong consistency. Eventually consistent reads can be accomplished via secondaries. A MongoDB cluster (with auto-sharding and replication) has a master server at a given point in time for each shard. See: On Distributed Consistency.
Concurrency	In Riak, any node in the cluster can coordinate a read/write operation for any other node. Riak stresses availability for writes and reads, and puts the burden of resolution on the client at read time.	MongoDB relies on locks for consistency. As of version 2.2, MongoDB has a database-level lock for all operations. See: Locks; DB Level Locking; How Does Concurrency Work?
Replication	Riak’s replication system is heavily influenced by the Dynamo Paper and Dr. Eric Brewer’s CAP Theorem. Riak uses consistent hashing to replicate and distribute N copies of each value around a Riak cluster composed of any number of physical machines. Under the hood, Riak uses virtual nodes to handle the distribution and dynamic rebalancing of data, thus decoupling the data distribution from physical assets. The Riak APIs expose tunable consistency and availability parameters that let you select the configuration that is best for your use case. Replication is configurable at the bucket level when first storing data in Riak. Subsequent reads and writes to that data can have request-level parameters. See: Replication; Clustering; Reading, Writing, and Updating Data.	Mongo manages replication via replica sets, a form of asynchronous master/slave replication. Traditional master/slave replication is available but not recommended. See: Replication; Replica Sets; Master/Slave.
Scaling Out and In	Riak allows you to elastically grow and shrink your cluster while evenly balancing the load on each machine. No node in Riak is special or has any particular role. In other words, the cluster is masterless. When you add a physical machine to Riak, the cluster is made aware of its membership via gossiping of ring state. Once it’s a member of the ring, it’s assigned an equal percentage of the partitions and subsequently takes ownership of the data belonging to those partitions. The process for removing a machine is the inverse of this. Riak also ships with a comprehensive suite of command line tools to help make node operations simple and straightforward. See: Adding and Removing Nodes; Command Line Tools.	Mongo relies on sharding for scaling out. This involves designating a certain server to hold certain chunks of the data as the data set grows. To scale in, MongoDB has support for removing shards from your database. See: Sharding in MongoDB; Sharding Introduction; Sharding (on Wikipedia); Removing Shards.
Multi-Datacenter Replication and Awareness	Riak features two distinct types of replication. Users can replicate to any number of nodes in one cluster (which is usually contained within one datacenter over a LAN) using the Apache 2.0 licensed database. Riak Enterprise, Basho’s commercial extension to Riak, is required for Multi-Datacenter deployments (meaning the ability to run active Riak clusters in N datacenters). See: Riak Enterprise.	MongoDB can be configured to run in multiple datacenters via various options. See: Datacenter Awareness.
Graphical Monitoring/Admin Console	Riak ships with Riak Control, an open source graphical console for monitoring and managing Riak clusters. See: Riak Control; Introducing Riak Control.	MongoDB does not ship with a graphical monitoring/admin console. However, several community projects have developed graphical monitoring/admin programs, and 10gen offers a hosted monitoring service. See: Monitoring and Diagnostics; Admin UIs; Mongo Monitoring Service.
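
The Data Versioning and Consistency row says Riak uses vector clocks to reason about causality and to detect conflicting writes. The Python below is a rough sketch of that general idea only. It is not Riak's implementation or API: the function names and the plain dict representation are assumptions made for illustration.

```python
# Illustrative vector clock: a dict mapping an actor id to a counter.
# All names here are hypothetical; this is not Riak code.

def increment(clock, actor):
    """Return a copy of clock with actor's counter advanced by one."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated

def descends(a, b):
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())

def compare(a, b):
    """Classify two clocks as equal, ordered, or concurrent."""
    a_sees_b = descends(a, b)
    b_sees_a = descends(b, a)
    if a_sees_b and b_sees_a:
        return "equal"
    if a_sees_b:
        return "a_newer"
    if b_sees_a:
        return "b_newer"
    return "conflict"  # concurrent writes: the reader must resolve them

def merge(a, b):
    """Clock for a resolved value: per-actor maximum of both clocks."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0))
            for actor in set(a) | set(b)}

# Two clients start from the same value and write independently.
base = increment({}, "client_x")
write_x = increment(base, "client_x")
write_y = increment(base, "client_y")
print(compare(write_x, base))     # write_x descends from base
print(compare(write_x, write_y))  # neither descends from the other
```

When `compare` reports a conflict, both values are siblings. The application picks or combines a winner and stores it under the merged clock, which is the "resolved at read time" trade-off the table describes.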
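
The Replication and Scaling rows describe consistent hashing over a ring of partitions, with N copies of each value spread across nodes. The sketch below shows the general technique under simplifying assumptions: partitions are handed out round-robin (Riak's real claim algorithm is more involved), the ring size and node names are made up, and none of the function names are Riak APIs.

```python
import hashlib

RING_SIZE = 64       # number of partitions; illustrative value
RING_TOP = 2 ** 160  # size of the SHA-1 hash space

def partition_for(bucket, key, ring_size=RING_SIZE):
    """Hash bucket/key onto the ring and return its partition index."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position * ring_size // RING_TOP

def assign_partitions(nodes, ring_size=RING_SIZE):
    """Assumed round-robin ownership: partition i belongs to node i mod len(nodes)."""
    return [nodes[i % len(nodes)] for i in range(ring_size)]

def preference_list(bucket, key, owners, n_val=3):
    """Return (partition, node) pairs for the n_val replicas of a key."""
    start = partition_for(bucket, key, len(owners))
    return [((start + i) % len(owners), owners[(start + i) % len(owners)])
            for i in range(n_val)]

# Hypothetical four-node cluster.
owners = assign_partitions(["node_a", "node_b", "node_c", "node_d"])
for partition, node in preference_list("users", "kumar", owners):
    print(partition, node)
```

Because keys map to partitions rather than directly to machines, adding or removing a node only changes the `owners` list. The key-to-partition mapping stays fixed, which is what decouples data distribution from physical assets.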

Kumar's Blog
