GROUP BY AND SUM WITH MONGO
Deprecated: This post is now made obsolete by the very capable MongoDB Aggregation Framework. I love when a post becomes obsolete, but at least we have the past to look back on…
Original post:
As a dyed-in-the-wool SQL guy, it can be quite humbling to be suddenly stumbling around NoSQL town, snagged by what feels like the easiest query in the world.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
|
SQL 101, right? The old GROUP BY and SUM yields a pretty simple aggregation. We are using Mongoid as an ORM for a Rails 3 app, and I was embarassingly stumped. The Mongoid documentation enables you to perform aggregations with filters right out of the box, so
Band.where(:likes.gt => 100).sum(:likes)
is simple enough.
But “complex” aggregations like grouping require MapReduce? (Complex to whom?!) Previously, I hadn’t had the time to play around with MapReduce, so I was hoping to avoid having to go down that road for something so simple. I am sure there are plenty of you out there who—like me—are/were just using Mongo for schema-less data storage who don’t dig into MapReduce. If so, this may be a helpful entre.
GROUPING BY 1 FIELD
MapReduce, by its nature, requires that you boil down your query to an array of hashes, which you recall are key-value pairs. Naturally, you can have multiple values per row, but you can’t have multiple keys, so grouping by 1 field is a bit easier.
See Kyle Banker’s excellent discussion of MapReduce basics for this one.
1 2 3 4 5 |
|
1 2 3 4 5 6 7 8 |
|
Since you have an array of values with a single key, you can group by that key, sum any arbitrary amount field, and you’re done. As Kyle notes, if you’re doing this in ruby you then just need to parse the results.
GROUPING BY MULTIPLE FIELDS
The issue we ran into was that we needed to group by multiple fields. This is actually no problem, because the “key” in the map function can be a hash itself, so you can “pack the uniqueness” into a hash and then unpack it later.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
Presuming we want to aggregate these granular transactions across above the 6 dimensions and sum the amount_cents field, we would want to employ a summary class method that would work with filters/scopes applied. It could work something like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 |
|
NULL KEYS
It’s worth noting that null values in keys are simply not included in the hash, and so if the logic of your usage for the returned array requires these values either be present pointing to a null value or point to a default value, this logic should supplant the
if reduce["_id"][group_by_field.to_s]
clause above.
There it is. Group by 1 or more fields using Mongoid. Just pack the uniqueness into a hash, aggregate with MapReduce, then unpack the keys and return.
Hope this helps!
No comments:
Post a Comment