88
61
|
What is the most efficient Java Collections library?
A few years ago, I did a lot of Java and had the impression back then that trove is the best (most efficient) Java Collections implementation. But when I read the answers to the question "Most useful free Java libraries?" I noticed that trove is hardly mentioned. So which Java Collections library is best now?
UPDATE: To clarify, I mostly want to know what library to use when I have to store millions of entries in a hash table etc. (need a small runtime and memory footprint).
| ||||||||||||
|
closed as not constructive by casperOne♦ Sep 14 '12 at 15:11
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.If this question can be reworded to fit the rules in the help center, please edit the question.
49
|
From inspection, it looks like Trove is just a library of collections for primitive types - it's not like it's meant to be adding a lot of functionality over the normal collections in the JDK.
Personally (and I'm biased) I love Guava (including the former Google Java Collections project). It makes various tasks (including collections) a lot easier, in a way which is at least reasonably efficient. Given that collection operations rarely form a bottleneck in my code (in my experience) this is "better" than a collections API which may be more efficient but doesn't make my code as readable.
Given that the overlap between Trove and the Guava is pretty much nil, perhaps you could clarify what you're actually looking for from a collections library.
| ||||||||||||||||||||
|
71
|
The question is (now) about storing lots of data, which can be represented using primitive types like
int , in a Map. Some of the answers here are very misleading in my opinion. Let's see why.
I modified the benchmark from trove to measure both runtime and memory consumption. I also addedPCJ to this benchmark, which is another collections library for primitive types (I use that one extensively). The 'official' trove benchmark does not compare IntIntMaps to Java Collection's
Map , probably storing Integers and storing ints is not the same from a technical point of view. But a user might not care about this technical detail, he wants to store data representable withints efficiently.
First the relevant part of the code:
I assume the data comes as primitive
ints , which seems sane. But this implies a runtime penalty for java util, because of the auto-boxing, which is not neccessary for the primitive collections frameworks.
The runtime results (without
gc() calls, of course) on WinXP, jdk1.6.0_10:100000 put operations 100000 contains operations java collections 1938 ms 203 ms trove 234 ms 125 ms pcj 516 ms 94 ms
While this might already seem drastic, this is not the reason to use such a framework.
The reason is memory performance. The results for a Map containing 100000
int entries:java collections oscillates between 6644536 and 7168840 bytes trove 1853296 bytes pcj 1866112 bytes
Java Collections needs more than three times the memory compared to the primitive collection frameworks. I.e. you can keep three times as much data in memory, without resorting to disk IO which lowers runtime performance by magnitudes. And this matters. Read highscalability to find out why.
In my experience high memory consumption is the biggest performance issue with Java, which of course results in worse runtime performance as well. Primitive collection frameworks can really help here.
So: No, java.util is not the answer. And "adding functionality" to Java collections is not the point when asking about efficiency. Also the modern JDK collections do not "out-perform even the specialized Trove collections".
Disclaimer: The benchmark here is far from complete, nor is it perfect. It is meant to drive home the point, which I have experienced in many projects. Primitive collections are useful enough to tolerate fishy API -if you work with lots of data.
| ||||||||||||||||
|
19
|
I know this is an old post and there are a ton of answers here. But, The answers above are superficial and over simplified in terms of suggesting a library. There is no one library that does well across the various benchmarks presented here. The only conclusion I derive is if you care about performance and memory and specifically dealing with primitive types, its more than worth looking at the non jdk alternatives.
Here is a more sound analysis, in terms of benchmark mechanics and the libraries covered. this is a thread in the mahout dev list.
The libraries covered are
The results of the three benchmarks Please look at the thread for a more detailed description of the three Benchmarks here.
| ||||
|
16
|
As other commentators have noticed, the definition of "efficient" casts a wide net. However no one has yet mentioned the Javolution library.
Some of the highlights:
The Javolution distribution includes a benchmark suite so you can see how they stack up against other libraries/the built-in collections.
| |||
add comment |
12
|
Some collection libs to consider:
I would first and foremost reach for the JDK collection library. It covers most common things you need to do and is obviously already available to you.
Google Collections is probably the best high-quality library outside the JDK. It's heavily used and well supported.
Apache Commons Collections is older and suffers a bit from the "too many cooks" problem but has a lot of useful stuff as well.
Trove has very specialized collections for cases like primitive keys/values. These days we find that on modern JDKs and with the Java 5+ collections and concurrent use cases, the JDK collections out-perform even the specialized Trove collections.
If you have really high concurrency use cases, you should definitely check out stuff like the NonBlockingHashMap in the high-scale lib, which is a lock-free implementation and can stomp on ConcurrentHashMap if you have the right use case for it.
| ||||||||||||||||||||
|
8
| java.util
Sorry for the obvious answer, but for most uses, the default Java Collections are more than sufficient.
| ||||||||||||
|
4
|
To store millions of
String in a map, take a look at http://code.google.com/p/flatmap | ||||||||
|
2
|
Depends on how we define "efficient".
Every data structure has its own Big-Oh behavior for reading, writing, iterating, memory footprint, etc. A linked list in one library is likely to be the same as any other. And a hash map will be faster for reading O(1) than a linked list O(n).
This doesn't sound like "most efficient". It sounds like "most popular" to me.
Just some feedback - I've never heard of it, and I don't know anyone who has used it. Collections built into the JDK, Google, or Apache Commons are well-known to me.
| ||
add comment |
2
|
Trove offers a few advantages.
That said, a lot has been done to improve jdk collections since trove was written.
It's the hashing strategies that make it appealing to me though... Google for trove and read their overview.
| |||
add comment |
2
|
If you want to store millions of records in a hash table, chances are that you will run into memory problems. This happened to me when I tried creating a map with 2.3 million String objects, for example. I went with BerkeleyDB, which is very mature and performs well. They have a Java API that wraps the Collections API, so you can easily create arbitrarily large maps with very little memory footprint. Access will be slower though (as it is stored on disk).
Follow-up question: is there a decent (and efficient), well maintained, library for immutable collections? Clojure has excellent support for this, and it would be nice to have something similar for Java.
| ||||
|
1
|
I'm developer of happy-collections from happy-collections on source-forge
| ||
add comment |
0
|
ConcurrentHashMap as well as the
java.util.concurrent package should be mentioned, if you plan to use the HashMap in multiple threads. small memory footprint is assued, since this is part of standard java. | ||
add comment |