Thursday, January 30, 2014

DataNucleus v3 and MongoDB

There has obviously been a recent shift towards looking at highly scalable datastores for use in "the cloud". Google wrote a plugin for their own BigTable datastore back in 2009, providing access to some of the features of JDO/JPA. Unfortunately they didn't intend to provide a full and fair reflection of those persistence specifications, and so reaction to it was mixed. Some people attempted to argue that APIs like JDO did not match these types of datastores (I see nothing in the API or query language of JDO that led to this conclusion, but anyway) and that using standard APIs on them was inappropriate. When asked to provide concrete examples of features of these datastores that could not be handled adequately by JDO, they unfortunately didn't come up with anything.

With DataNucleus v3 we have the opportunity to spend some time on providing good support for these types of datastores, adding support for missing features. A previous blog post documented efforts to upgrade the support for HBase. In this blog post we describe the features in the new plugin for the MongoDB document-based "NoSQL" store.

Features that this plugin currently supports include:
  • Support for single MongoDB instances, and for MongoDB replica sets
  • Support for application identity (defined by the user), and datastore identity (surrogate field in the JSON document)
  • Basic persistence/update/delete of objects
  • Support for persistence of (unembedded) Collections/Maps/arrays by way of storing the identity of any related object(s) in the field.
  • Persistence of related objects (1-1/N-1) as "flat" embedded, where all fields of the related object are fields of the owner JSON document. This also supports nested related objects (unlimited depth).
  • Persistence of related objects (1-1/N-1) as nested embedded, where the related object is stored as a nested JSON document within the owner JSON document. This also supports nested related objects (unlimited depth).
  • Persistence of related collections/arrays (1-N) as nested embedded, where the related objects are stored as an array of nested JSON documents. This also supports nested relations.
  • Persistence of related maps (1-N) as nested embedded, where the related map is a nested array of map entries with fields "key","value". Supports nested relations.
  • Persistence of fields as serialised.
  • Polymorphic queries. If the query requests a class and its subclasses, then objects of all of those classes are returned. This implies executing the query against the MongoDB collection of each possible candidate class.
  • Access to the native MongoDB "DB" object via the standard JDO datastore connection accessor
  • Support for persistence of object version, stored in a separate field in the JSON document and support for optimistic version checking
  • Support for "identity" value generation using the MongoDB "_id" field value. The only restriction on this is that a field/property using "identity" value generation has to be of type String.
  • Support for "increment" value generation (numeric fields).
  • Support for SchemaTool creation/deletion of schemas. This covers the document collection for each class, as well as any indices required (including unique indices).
  • JDOQL/JPQL querying, including support for fetch groups, so you can restrict how much data is returned by the query.
  • Basic JDOQL/JPQL filter clauses (comparison operations) are evaluated in the datastore where possible.
  • Support for running queries on MongoDB slave instances.
  • Support for persistence of discriminators, so a user can store multiple classes into the same MongoDB collection, and we use the discriminator to determine the object type being returned.
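To make the nested-embedded map layout concrete, here is a minimal, JDK-only Java sketch that converts a map into the shape described in the list: an array of nested documents, each with a "key" and a "value" field. It models a JSON document as a plain `Map` and a JSON array as a `List`, and the class name `EmbeddedMapLayout` is hypothetical. It illustrates the stored shape only, not how the DataNucleus MongoDB plugin implements it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: a JSON document is modelled as a Map, a JSON array as a List.
final class EmbeddedMapLayout {

    private EmbeddedMapLayout() {}

    // Convert a Java map field into an array of {"key": ..., "value": ...} entry documents,
    // preserving the iteration order of the source map.
    static List<Map<String, Object>> toEntryArray(Map<?, ?> field) {
        List<Map<String, Object>> entries = new ArrayList<>();
        for (Map.Entry<?, ?> e : field.entrySet()) {
            Map<String, Object> entryDoc = new LinkedHashMap<>();
            entryDoc.put("key", e.getKey());
            entryDoc.put("value", e.getValue());
            entries.add(entryDoc);
        }
        return entries;
    }

    // Reverse direction: rebuild the map from the stored entry documents.
    static Map<Object, Object> fromEntryArray(List<Map<String, Object>> entries) {
        Map<Object, Object> result = new LinkedHashMap<>();
        for (Map<String, Object> entryDoc : entries) {
            result.put(entryDoc.get("key"), entryDoc.get("value"));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> prices = new LinkedHashMap<>();
        prices.put("Sony Discman", 49.99);
        prices.put("Lord of the Rings", 12.50);
        // Prints: [{key=Sony Discman, value=49.99}, {key=Lord of the Rings, value=12.5}]
        System.out.println(toEntryArray(prices));
    }
}
```

Using an array of entry documents (rather than making each key a field name) means keys need not be strings and need not be valid MongoDB field names.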

As you can see, we already provide a very good JDO (and JPA) capability for MongoDB, and the feature list is shown as a matrix here.

Input to this plugin is obviously desired, particularly from people with more intimate knowledge of MongoDB. Source code can be found at SourceForge, and issues are tracked via the NUCMONGODB project in JIRA.

DataNucleus - Tutorial for JDO using MongoDB

Background

An application can be JDO-enabled via many routes depending on the development process of the project in question. For example the project could use Eclipse as the IDE for developing classes. In that case the project would typically use the DataNucleus Eclipse plugin. Alternatively the project could use Ant, Maven or some other build tool. In that case, use this tutorial as a guide to using DataNucleus in the application. The JDO process is quite straightforward:
  1. Prerequisite : Download DataNucleus AccessPlatform
  2. Step 1 : Define the persistence of your classes using Meta-Data.
  3. Step 2 : Define the "persistence-unit"
  4. Step 3 : Compile your classes, and instrument them (using the DataNucleus enhancer).
  5. Step 4 : Write your code to persist your objects within the DAO layer.
  6. Step 5 : Run your application.
The tutorial guides you through this. You can obtain the code referenced in this tutorial from SourceForge (one of the files entitled "datanucleus-samples-jdo-tutorial-*").

Prerequisite : Download DataNucleus AccessPlatform

You can download DataNucleus in many ways, but the simplest is to download the distribution zip appropriate to your datastore (MongoDB in this case). You can do this from the DataNucleus download page on SourceForge. When you open the zip you will find DataNucleus jars in the lib directory, and dependency jars in the deps directory.

Step 1 : Take your model classes and mark which are persistable

For our tutorial, say we have the following classes representing a store of products for sale.
package org.datanucleus.samples.jdo.tutorial;

import java.util.HashSet;
import java.util.Set;

public class Inventory
{
    String name = null;
    Set<Product> products = new HashSet<Product>();

    public Inventory(String name)
    {
        this.name = name;
    }

    public Set<Product> getProducts() {return products;}
}
package org.datanucleus.samples.jdo.tutorial;

public class Product
{
    long id;
    String name = null;
    String description = null;
    double price = 0.0;

    public Product(String name, String desc, double price)
    {
        this.name = name;
        this.description = desc;
        this.price = price;
    }
}
package org.datanucleus.samples.jdo.tutorial;

public class Book extends Product
{
    String author=null;
    String isbn=null;
    String publisher=null;

    public Book(String name, String desc, double price, String author, 
                String isbn, String publisher)
    {
        super(name,desc,price);
        this.author = author;
        this.isbn = isbn;
        this.publisher = publisher;
    }
}
So we have a relationship (Inventory having a set of Products), and inheritance (Product-Book). Now we need to be able to persist objects of all of these types, so we need to define persistence for them. There are many things that you can define when deciding how to persist objects of a type, but the essential parts are:
  • Mark the class as PersistenceCapable so it is visible to the persistence mechanism
  • Identify which field(s) represent the identity of the object (or use datastore-identity if no field meets this requirement).
So this is what we do now. Note that we could define persistence using XML metadata, annotations or via the JDO API. In this tutorial we will use annotations.
package org.datanucleus.samples.jdo.tutorial;

import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.PrimaryKey;

@PersistenceCapable
public class Inventory
{
    @PrimaryKey
    String name = null;

    ...
}
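package org.datanucleus.samples.jdo.tutorial;

import javax.jdo.annotations.IdGeneratorStrategy;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;
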
@PersistenceCapable
public class Product
{
    @PrimaryKey
    @Persistent(valueStrategy=IdGeneratorStrategy.INCREMENT)
    long id;

    ...
}
package org.datanucleus.samples.jdo.tutorial;

import javax.jdo.annotations.PersistenceCapable;

@PersistenceCapable
public class Book extends Product
{
    ...
}
Note that we mark each class that can be persisted with @PersistenceCapable and their primary key field(s) with @PrimaryKey. In addition we defined a valueStrategy for the Product field id so that its values are generated automatically. In this tutorial we are using application identity, which means that all objects of these classes will have their identity defined by the primary key field(s). You can read more about datastore identity and application identity when designing your system's persistence.
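Conceptually, the INCREMENT strategy hands out the next number in a sequence each time a new object needs an id. The JDK-only sketch below models that idea with an in-memory counter per sequence name; `IncrementGenerator` and the sequence names are hypothetical, and this is not DataNucleus's generator. In particular, a real generator has to keep its counters somewhere durable so that ids are not reused after a restart, which this sketch deliberately ignores.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, in-memory illustration of "increment" value generation:
// each named sequence hands out 1, 2, 3, ... independently.
final class IncrementGenerator {

    private final Map<String, AtomicLong> sequences = new ConcurrentHashMap<>();

    // Return the next value for the named sequence, starting from 1.
    // Counters live only in memory here, so they restart on every run.
    long next(String sequenceName) {
        return sequences
                .computeIfAbsent(sequenceName, name -> new AtomicLong())
                .incrementAndGet();
    }

    public static void main(String[] args) {
        IncrementGenerator gen = new IncrementGenerator();
        System.out.println(gen.next("Product"));   // 1
        System.out.println(gen.next("Product"));   // 2
        System.out.println(gen.next("Inventory")); // 1 (independent sequence)
    }
}
```

`AtomicLong` inside a `ConcurrentHashMap` keeps the sketch safe if several threads persist objects at once, which is the situation an id generator has to cope with.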

Step 2 : Define the 'persistence-unit'

Writing your own classes to be persisted is the starting point, but you now need to define the persistence-unit: which of these classes are persisted, and to which datastore. You do this via a file META-INF/persistence.xml at the root of the CLASSPATH, like this:


    
    
        org.datanucleus.samples.jdo.tutorial.Inventory
        org.datanucleus.samples.jdo.tutorial.Product
        org.datanucleus.samples.jdo.tutorial.Book
        
        
            
            
            
            
        
    
Note that you could equally use a properties file to define the persistence with JDO, but in this tutorial we use persistence.xml for convenience.
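As an illustration of the properties route, the JDK-only sketch below assembles persistence settings as `java.util.Properties`. The property names `javax.jdo.PersistenceManagerFactoryClass`, `javax.jdo.option.ConnectionURL` and `javax.jdo.option.Mapping` are standard JDO ones; the values (the DataNucleus v3 factory class, a connection URL of `mongodb:/nucleus1` meaning the local server and database "nucleus1", and a mapping of "mongodb") are assumptions to adapt to your own setup. `TutorialJdoProperties` is a hypothetical helper; the commented-out line shows where the properties would go with the JDO API JAR on the classpath.

```java
import java.util.Properties;

// Sketch: persistence settings expressed as JDO properties.
// Property names are standard JDO; the values are assumptions for a local MongoDB.
final class TutorialJdoProperties {

    private TutorialJdoProperties() {}

    static Properties build(String connectionUrl) {
        Properties props = new Properties();
        // Which JDO implementation to use (assumed DataNucleus v3 JDO API class).
        props.setProperty("javax.jdo.PersistenceManagerFactoryClass",
                "org.datanucleus.api.jdo.JDOPersistenceManagerFactory");
        // Datastore to connect to, e.g. "mongodb:/nucleus1" (assumed URL format).
        props.setProperty("javax.jdo.option.ConnectionURL", connectionUrl);
        // Mapping name; selects ORM files named package-mongodb.orm.
        props.setProperty("javax.jdo.option.Mapping", "mongodb");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("mongodb:/nucleus1");
        // With the JDO API JAR on the classpath you would then call:
        // PersistenceManagerFactory pmf = JDOHelper.getPersistenceManagerFactory(props);
        props.stringPropertyNames().stream().sorted()
             .forEach(k -> System.out.println(k + "=" + props.getProperty(k)));
    }
}
```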

Step 3 : Enhance your classes

JDO relies on the classes that you want to persist implementing PersistenceCapable. You could write your classes manually to do this but this would be laborious. Alternatively you can use a post-processing step to compilation that "enhances" your compiled classes, adding on the necessary extra methods to make them PersistenceCapable. There are several ways to do this, most notably at post-compile, or at runtime. We use the post-compile step in this tutorial. DataNucleus JDO provides its own byte-code enhancer for instrumenting/enhancing your classes (in datanucleus-core) and this is included in the DataNucleus AccessPlatform zip file prerequisite.
To understand how to invoke the enhancer, you need to visualise where the various source and JDO files are stored:
src/main/java/org/datanucleus/samples/jdo/tutorial/Book.java
src/main/java/org/datanucleus/samples/jdo/tutorial/Inventory.java
src/main/java/org/datanucleus/samples/jdo/tutorial/Product.java
src/main/resources/META-INF/persistence.xml

target/classes/org/datanucleus/samples/jdo/tutorial/Book.class
target/classes/org/datanucleus/samples/jdo/tutorial/Inventory.class
target/classes/org/datanucleus/samples/jdo/tutorial/Product.class

[when using Ant]
lib/jdo-api.jar
lib/datanucleus-core.jar
lib/datanucleus-api-jdo.jar
The first thing to do is compile your domain/model classes. You can do this in any way you wish, but the downloadable sample provides an Ant task and a Maven2 project to do this for you.
Using Ant :
ant compile

Using Maven2 :
mvn compile
To enhance classes using the DataNucleus Enhancer, you need to invoke a command something like this from the root of your project.
Using Ant :
ant enhance

Using Maven : (this is usually done automatically after the "compile" goal)
mvn datanucleus:enhance

Manually on Linux/Unix :
java -cp target/classes:lib/datanucleus-core.jar:
         lib/datanucleus-api-jdo.jar:lib/jdo-api.jar
     org.datanucleus.enhancer.DataNucleusEnhancer -pu Tutorial

Manually on Windows :
java -cp target\classes;lib\datanucleus-core.jar;
         lib\datanucleus-api-jdo.jar;lib\jdo-api.jar
     org.datanucleus.enhancer.DataNucleusEnhancer -pu Tutorial

[Command shown on many lines to aid reading - should be on single line]
This command enhances the .class files that have @PersistenceCapable annotations. If you accidentally omitted this step, then at the point of running your application and trying to persist an object you would get a ClassNotPersistenceCapableException thrown. The use of the enhancer is documented in more detail in the Enhancer Guide. The output of this step is a set of class files that represent PersistenceCapable classes.
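If you do hit ClassNotPersistenceCapableException, a quick diagnostic is to check whether the loaded class actually carries the enhancement. The sketch below does this by reflection, walking the class hierarchy for an interface named `javax.jdo.spi.PersistenceCapable`, which is the contract JDO byte-code enhancement adds with DataNucleus v3 (newer DataNucleus releases may use a different enhancement interface, so adjust the name if needed). Comparing by name keeps the sketch compilable without the JDO API JAR; `EnhancementCheck` is a hypothetical helper.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Diagnostic sketch: does a class (or any superclass) implement the
// JDO enhancement interface? Compares names so no JDO JAR is needed to compile it.
final class EnhancementCheck {

    // Interface added by JDO enhancement in DataNucleus v3 (assumption for other versions).
    static final String PC_INTERFACE = "javax.jdo.spi.PersistenceCapable";

    private EnhancementCheck() {}

    static boolean isEnhanced(Class<?> type) {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            // Check the interfaces declared on c, and their super-interfaces.
            Deque<Class<?>> pending = new ArrayDeque<>();
            for (Class<?> i : c.getInterfaces()) pending.push(i);
            while (!pending.isEmpty()) {
                Class<?> i = pending.pop();
                if (PC_INTERFACE.equals(i.getName())) return true;
                for (Class<?> parent : i.getInterfaces()) pending.push(parent);
            }
        }
        return false;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        String name = args.length > 0 ? args[0] : "java.lang.String";
        Class<?> type = Class.forName(name);
        System.out.println(name + " enhanced: " + isEnhanced(type));
    }
}
```

Run it with target/classes (and, for enhanced classes, the JDO API JAR) on the classpath, passing e.g. org.datanucleus.samples.jdo.tutorial.Product as the argument. Because superclasses are checked too, Book is reported as enhanced only if Book or Product implements the interface.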

Step 4 : Write the code to persist objects of your classes

Writing your own classes to be persisted is the starting point, but you now need to define which objects of these classes are actually persisted, and when. Interaction with the persistence framework of JDO is performed via a PersistenceManager. This provides methods for persisting objects, removing objects, querying for persisted objects, etc. This section gives examples of typical scenarios encountered in an application.
The initial step is to obtain access to a PersistenceManager, which you do as follows
PersistenceManagerFactory pmf = JDOHelper.getPersistenceManagerFactory("Tutorial");
PersistenceManager pm = pmf.getPersistenceManager();
Now that the application has a PersistenceManager it can persist objects. This is performed as follows
Transaction tx=pm.currentTransaction();
try
{
    tx.begin();
    Inventory inv = new Inventory("My Inventory");
    Product product = new Product("Sony Discman", "A standard discman from Sony", 49.99);
    inv.getProducts().add(product);
    pm.makePersistent(inv);
    tx.commit();
}
finally
{
    if (tx.isActive())
    {
        tx.rollback();
    }
    pm.close();
}
Note the following
  • We have persisted the Inventory but since this referenced the Product then that is also persisted.
  • The finally step is important to tidy up any connection to the datastore, and close the PersistenceManager
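The first note describes persistence-by-reachability: persisting the Inventory also persists every object reachable from it. The JDK-only sketch below models which objects end up persisted as a graph traversal over a hypothetical `GraphNode` type. It is an illustration of the rule, not DataNucleus's implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical object-graph node standing in for a persistable object.
final class GraphNode {
    final String label;
    final List<GraphNode> references = new ArrayList<>();

    GraphNode(String label) { this.label = label; }
}

final class Reachability {

    private Reachability() {}

    // Collect every object reachable from the root exactly once, in breadth-first order.
    // Identity (not equals) is used, so cycles and shared references are handled.
    static List<GraphNode> toPersist(GraphNode root) {
        Set<GraphNode> seen = Collections.newSetFromMap(new IdentityHashMap<GraphNode, Boolean>());
        List<GraphNode> order = new ArrayList<>();
        Deque<GraphNode> pending = new ArrayDeque<>();
        pending.add(root);
        while (!pending.isEmpty()) {
            GraphNode n = pending.poll();
            if (!seen.add(n)) continue; // already visited
            order.add(n);
            pending.addAll(n.references);
        }
        return order;
    }

    public static void main(String[] args) {
        GraphNode inventory = new GraphNode("My Inventory");
        GraphNode discman = new GraphNode("Sony Discman");
        inventory.references.add(discman);
        discman.references.add(inventory); // a cycle does not cause an infinite loop
        for (GraphNode n : toPersist(inventory)) {
            System.out.println("persist " + n.label);
        }
    }
}
```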
If you want to retrieve an object from persistent storage, something like this will give what you need. This uses a "Query", and retrieves all Product objects that have a price below 150.00, ordering them in ascending price order.
Transaction tx = pm.currentTransaction();
try
{
    tx.begin();

    Query q = pm.newQuery("SELECT FROM " + Product.class.getName() + 
                          " WHERE price < 150.00 ORDER BY price ASC");
    List<Product> products = (List<Product>)q.execute();
    Iterator<Product> iter = products.iterator();
    while (iter.hasNext())
    {
        Product p = iter.next();

        ... (use the retrieved objects)
    }

    tx.commit();
}
finally
{
    if (tx.isActive())
    {
        tx.rollback();
    }

    pm.close();
}
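The feature list notes that basic comparison filters are evaluated in MongoDB itself where possible. As a rough illustration of what that means for the query above, the sketch below turns a single comparison such as price < 150.00 into a MongoDB filter document using MongoDB's $lt/$lte/$gt/$gte/$ne operators (equality is a plain field match), and an ORDER BY into a sort document (1 ascending, -1 descending). `SimpleFilterTranslator` is hypothetical, and DataNucleus's actual query compiler and the exact documents it sends may well differ.

```java
// Hypothetical translator for one "field op literal" comparison into a
// MongoDB-style filter document (as a JSON string), plus ORDER BY into a sort document.
final class SimpleFilterTranslator {

    private SimpleFilterTranslator() {}

    // JDOQL comparison operators mapped to MongoDB query operators.
    private static String mongoOperator(String op) {
        switch (op) {
            case "<":  return "$lt";
            case "<=": return "$lte";
            case ">":  return "$gt";
            case ">=": return "$gte";
            case "!=": return "$ne";
            default: throw new IllegalArgumentException("Unsupported operator: " + op);
        }
    }

    static String filter(String field, String op, double value) {
        if ("==".equals(op)) {
            // Equality is expressed by matching the value directly.
            return "{\"" + field + "\": " + value + "}";
        }
        return "{\"" + field + "\": {\"" + mongoOperator(op) + "\": " + value + "}}";
    }

    static String sort(String field, boolean ascending) {
        return "{\"" + field + "\": " + (ascending ? 1 : -1) + "}";
    }

    public static void main(String[] args) {
        // WHERE price < 150.00 ORDER BY price ASC
        System.out.println(filter("price", "<", 150.00)); // {"price": {"$lt": 150.0}}
        System.out.println(sort("price", true));          // {"price": 1}
    }
}
```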
If you want to delete an object from persistence, you would perform an operation something like
Transaction tx = pm.currentTransaction();
try
{
    tx.begin();

    ... (retrieval of objects etc)

    pm.deletePersistent(product);
    
    tx.commit();
}
finally
{
    if (tx.isActive())
    {
        tx.rollback();
    }

    pm.close();
}
Clearly you can perform a large range of operations on objects. We can't hope to show all of these here. Any good JDO book will provide many examples.

Step 5 : Run your application

Running your JDO-enabled application requires the following to be available in the Java CLASSPATH:
  • Any persistence.xml file for the PersistenceManagerFactory creation
  • Any JDO XML MetaData files for your persistable classes (not used in this example)
  • The MongoDB Java driver JAR, needed for accessing your datastore
  • The JDO API JAR (defining the JDO interface)
  • The DataNucleus Core, DataNucleus JDO API and DataNucleus MongoDB JARs
After that it is simply a question of starting your application and all should be taken care of. You can access the DataNucleus Log file by specifying the logging configuration properties, and any messages from DataNucleus will be output in the normal way. The DataNucleus log is a very powerful way of finding problems since it can list all commands actually sent to the datastore as well as many other parts of the persistence process.
Using Ant (you need the included "persistence.xml" to specify your database)
ant run


Using Maven:
mvn exec:java


Manually on Linux/Unix :
java -cp lib/jdo-api.jar:lib/datanucleus-core.jar:lib/datanucleus-mongodb.jar:
         lib/datanucleus-api-jdo.jar:lib/{mongodb_jars}:target/classes/:. 
             org.datanucleus.samples.jdo.tutorial.Main


Manually on Windows :
java -cp lib\jdo-api.jar;lib\datanucleus-core.jar;lib\datanucleus-mongodb.jar;
         lib\datanucleus-api-jdo.jar;lib\{mongodb_jars};target\classes\;. 
             org.datanucleus.samples.jdo.tutorial.Main


Output :

DataNucleus Tutorial
=============
Persisting products
Product and Book have been persisted

Retrieving Extent for Products
>  Product : Sony Discman [A standard discman from Sony]
>  Book : JRR Tolkien - Lord of the Rings by Tolkien

Executing Query for Products with price below 150.00
>  Book : JRR Tolkien - Lord of the Rings by Tolkien

Deleting all products from persistence
Deleted 2 products

End of Tutorial

Part 2 : Next steps

In the above simple tutorial we showed how to employ JDO and persist objects to MongoDB. Obviously this just scratches the surface of what you can do, and using JDO requires minimal work from the user. In this second part we show some further things that you are likely to want to do.
  1. Step 6 : Controlling the schema.
  2. Step 7 : Generate the schema (collections and indices) where your classes are to be persisted, using SchemaTool.

Step 6 : Controlling the schema

In the above simple tutorial we didn't look at controlling the schema generated for these classes. Now let's pay more attention to this part by defining XML Metadata for the schema.


    
        
            
            
                
            
            
                
            
        

        
            
            
                
            
            
                
            
        

        
            
            
                
            
            
                
            
            
                
            
        
    
With JDO you have various options as to where these XML MetaData files are placed in the file structure, and whether they refer to a single class or to multiple classes in a package. In the above example all three classes are specified in the same file, package-mongodb.orm, placed in the package these classes are in; the "mongodb" part of the name reflects that we want to persist to MongoDB.
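To make the naming convention concrete, the sketch below computes where a package-level ORM file (package-{mapping}.orm) and a per-class ORM file ({Class}-{mapping}.orm) would sit on the classpath for a given class, assuming the mapping name is "mongodb" as in this example. `OrmFileLocator` is a hypothetical helper that only models these two file-name patterns; the full JDO lookup also checks other locations, and nested classes are not handled.

```java
// Sketch: classpath resource paths for ORM metadata of a class, given a mapping
// name (e.g. "mongodb"). Models only the package-level and per-class file patterns.
final class OrmFileLocator {

    private OrmFileLocator() {}

    // e.g. org/datanucleus/samples/jdo/tutorial/package-mongodb.orm
    static String packageOrmPath(String className, String mapping) {
        int lastDot = className.lastIndexOf('.');
        String pkgPath = lastDot < 0 ? "" : className.substring(0, lastDot).replace('.', '/') + "/";
        return pkgPath + "package-" + mapping + ".orm";
    }

    // e.g. org/datanucleus/samples/jdo/tutorial/Product-mongodb.orm
    static String classOrmPath(String className, String mapping) {
        return className.replace('.', '/') + "-" + mapping + ".orm";
    }

    public static void main(String[] args) {
        String cls = "org.datanucleus.samples.jdo.tutorial.Product";
        System.out.println(packageOrmPath(cls, "mongodb"));
        System.out.println(classOrmPath(cls, "mongodb"));
    }
}
```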

Step 7 : Generate any schema required for your domain classes

This step is optional, depending on whether you have an existing database schema. If you haven't, at this point you can use SchemaTool to generate the collections (and indices) where these domain objects will be persisted. DataNucleus SchemaTool is a command line utility (it can be invoked from Maven2/Ant in a similar way to how the Enhancer is invoked). The first thing that you need is to update the persistence.xml file with your database details.


    
    
        org.datanucleus.samples.jdo.tutorial.Inventory
        org.datanucleus.samples.jdo.tutorial.Product
        org.datanucleus.samples.jdo.tutorial.Book
        
        
            
            
            
            
        
    

Now we need to run DataNucleus SchemaTool. For our case above you would do something like this
Using Ant :
ant createschema


Using Maven2 :
mvn datanucleus:schema-create


Manually on Linux/Unix :
java -cp target/classes:lib/datanucleus-core.jar:lib/datanucleus-mongodb.jar:
         lib/datanucleus-api-jdo.jar:lib/jdo-api.jar:lib/{mongodb_driver.jar}
     org.datanucleus.store.schema.SchemaTool
     -create -pu Tutorial

Manually on Windows :
java -cp target\classes;lib\datanucleus-core.jar;lib\datanucleus-mongodb.jar;
         lib\datanucleus-api-jdo.jar;lib\jdo-api.jar;lib\{mongodb_driver.jar}
     org.datanucleus.store.schema.SchemaTool
     -create -pu Tutorial

[Command shown on many lines to aid reading. Should be on single line]
This will generate the required collections (and any indices) for the classes defined in the JDO MetaData file.



Why JDO ?

The majority of applications need to persist (or store) data during their lifecycle. There are many ways of doing this with an application written in Java.
  • If your datastore is RDBMS you can handle the persistence (and retrieval) of data yourself using JDBC. Obviously with this route you have the burden of having to write the persistence layer yourself. This gives much control, but also creates significant work, both in writing the code but also in testing and maintenance.
  • You can use JDO, a standardised persistence API. With JDO you can develop plain old Java objects (POJOs) and persist them as they are, transparently. This requires very little work from the developer. It allows persistence to any type of datastore in principle, being designed with flexibility and datastore agnosticism in mind. This has been a standard since 2002 (JDO1), being upgraded in 2006 (JDO2), and is in the process of being developed further (JDO2.1) by Apache JDO.
  • You can use JPA, a standardised persistence API, and part of the EJB3 specification. This also allows you to develop plain old Java objects (POJOs) and persist them using a standardised API. Its specification is not as mature or as feature rich as the JDO API, nor does it provide the flexibility of using any type of datastore. This was released in 2006 (JPA1) to supersede EJB2. The specification itself really only caters for RDBMS datastores. If you want to persist to other datastores you should consider JDO.
  • If you are stuck with using an EJB2.* architecture you could use Entity Beans. This means that you hand off your objects to the EJB part of the J2EE server. This simplifies things for the developer in some respects but places major restrictions in that your objects have to be Entity Beans.
  • You can also use a proprietary persistence API (e.g. Hibernate's own API, TopLink's own API, iBatis, Castor etc). The disadvantage of this route is that you cannot easily swap to an alternative implementation of the API if you hit problems with your software choice.
To give a guide, here are a few important consideration points when choosing a persistence layer for your application.
Feature | JDBC | JDO | JPA | EJB2 | Custom ORM
Standards-Driven
Choice of datastores
Support POJOs
Usable in J2SE
Usable in J2EE
Out of box implementation (1)
Simple to unit test
Dynamic queries (2)
Comprehensive ORM
Primary Key generation (2)
Supports inherited objects (2)
Schema Creation
Existing schema
  1. refers to whether it is necessary to write the persistence yourself (e.g. as with JDBC) or whether you can just persist by simple calls.
  2. requires the developer to write this layer.