Thursday, January 30, 2014

DataNucleus v3 and MongoDB

There has obviously been a recent shift to look at highly scalable datastores for use in "the cloud". Google wrote a plugin for their own BigTable datastore back in 2009, providing access to some of the features of JDO/JPA. Unfortunately they did not intend to provide a full and fair reflection of those persistence specifications, and so reaction to it was mixed. Some people argued that APIs like JDO did not match these types of datastores (I see nothing in the API or query language of JDO that leads to this conclusion, but anyway) and that using standard APIs on them was inappropriate. They were asked to provide concrete examples of features of these datastores that could not be handled adequately by JDO, but unfortunately didn't come up with any.

With DataNucleus v3 we have the opportunity to spend some time on providing good support for these types of datastores, adding support for missing features. A previous blog post documented efforts to upgrade the support for HBase. In this blog post we describe the features in the new plugin for the MongoDB document-based "NoSQL" store.

Features that this plugin currently supports include:
  • Support for single MongoDB instances, and for MongoDB replica sets
  • Support for application identity (defined by the user), and datastore identity (surrogate field in the JSON document)
  • Basic persistence/update/delete of objects
  • Support for persistence of (unembedded) Collections/Maps/arrays by way of storing the identity of any related object(s) in the field.
  • Persistence of related objects (1-1/N-1) as "flat" embedded, where all fields of the related object are fields of the owner JSON document. This also supports nested related objects (unlimited depth).
  • Persistence of related objects (1-1/N-1) as nested embedded, where the related object is stored as a nested JSON document within the owner JSON document. This also supports nested related objects (unlimited depth).
  • Persistence of related collections/arrays (1-N) as nested embedded, where the related objects are stored as an array of nested JSON documents. This also supports nested relations.
  • Persistence of related maps (1-N) as nested embedded, where the related map is a nested array of map entries with fields "key" and "value". This also supports nested relations.
  • Persistence of fields as serialised.
  • Polymorphic queries. If the query requests a class and its subclasses, then instances of all of those classes are returned. This implies executing the query against the MongoDB collection of each of the possible candidate classes.
  • Access to the native MongoDB "DB" object via the standard JDO datastore connection accessor
  • Support for persistence of object version, stored in a separate field in the JSON document and support for optimistic version checking
  • Support for "identity" value generation using the MongoDB "_id" field value. The only restriction is that a field/property using "identity" value generation must be of type String.
  • Support for "increment" value generation (numeric fields).
  • Support for SchemaTool creation/deletion of schemas. This covers the document collections for the classes, as well as any indices required (including unique indices).
  • JDOQL/JPQL querying, including support for fetch groups, so you can restrict how much data is returned by the query.
  • Basic JDOQL/JPQL filter clauses (comparison operations) are evaluated in the datastore where possible.
  • Support for running queries on MongoDB slave instances.
  • Support for persistence of discriminators, so a user can store multiple classes into the same MongoDB collection, and we use the discriminator to determine the object type being returned.
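To make the three embedded layouts in the list above (flat embedded, nested embedded, and embedded maps as "key"/"value" entries) a little more concrete, here is a small hypothetical sketch in plain Java. It does not use DataNucleus or the MongoDB driver; it only builds java.util Map/List structures shaped like the JSON documents those bullets describe, and renders them as JSON text. The field names (name, street, city, address, stock) are made up for illustration, and the field names the plugin actually writes will depend on its own mapping metadata.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical illustration of embedded document layouts.
 * Not DataNucleus code: it only builds plain maps/lists shaped like JSON documents.
 */
public class EmbeddedLayouts {

    // Flat embedded: the related object's fields become top-level fields of the owner document.
    static Map<String, Object> flatEmbedded(String name, String street, String city) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", name);
        doc.put("street", street); // field of the related (address) object
        doc.put("city", city);     // field of the related (address) object
        return doc;
    }

    // Nested embedded: the related object is stored as a nested document inside the owner.
    static Map<String, Object> nestedEmbedded(String name, String street, String city) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("street", street);
        address.put("city", city);
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", name);
        doc.put("address", address);
        return doc;
    }

    // Embedded map: an array of entry documents, each with "key" and "value" fields.
    static List<Map<String, Object>> embeddedMap(Map<String, ?> map) {
        List<Map<String, Object>> entries = new ArrayList<>();
        for (Map.Entry<String, ?> e : map.entrySet()) {
            Map<String, Object> entry = new LinkedHashMap<>();
            entry.put("key", e.getKey());
            entry.put("value", e.getValue());
            entries.add(entry);
        }
        return entries;
    }

    // Minimal JSON rendering for maps, lists, numbers, booleans, strings and null.
    static String toJson(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof Map) {
            StringJoiner sj = new StringJoiner(",", "{", "}");
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                sj.add(quote(String.valueOf(e.getKey())) + ":" + toJson(e.getValue()));
            }
            return sj.toString();
        }
        if (value instanceof List) {
            StringJoiner sj = new StringJoiner(",", "[", "]");
            for (Object item : (List<?>) value) {
                sj.add(toJson(item));
            }
            return sj.toString();
        }
        if (value instanceof Number || value instanceof Boolean) {
            return String.valueOf(value);
        }
        return quote(String.valueOf(value));
    }

    // Quote a string, escaping backslashes and double quotes only.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(toJson(flatEmbedded("Fred", "1 Main St", "Springfield")));
        System.out.println(toJson(nestedEmbedded("Fred", "1 Main St", "Springfield")));

        Map<String, Integer> stock = new LinkedHashMap<>();
        stock.put("apples", 3);
        stock.put("pears", 5);
        Map<String, Object> shop = new LinkedHashMap<>();
        shop.put("name", "Fruit shop");
        shop.put("stock", embeddedMap(stock));
        System.out.println(toJson(shop));
    }
}
```

Running main prints {"name":"Fred","street":"1 Main St","city":"Springfield"} for the flat form, {"name":"Fred","address":{"street":"1 Main St","city":"Springfield"}} for the nested form, and {"name":"Fruit shop","stock":[{"key":"apples","value":3},{"key":"pears","value":5}]} for the map. In this sketch the flat form avoids a sub-document but requires the related object's field names not to clash with the owner's, while the nested form keeps them separate.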

As you can see, we already provide a very good JDO (and JPA) capability for MongoDB, and the feature list is shown as a matrix here.

Input to this plugin is obviously desired, particularly from people with more intimate knowledge of MongoDB. Source code can be found at SourceForge, and issues are tracked via the NUCMONGODB project in JIRA.