db4o is pretty interesting

Monday, December 05, 2005

I have done a lot of work with a number of Java persistence solutions including JDBC, JDO, Hibernate, JSR-220 Persistence and Prevayler (have just tinkered with Prevayler... no real work). Just recently I started investigating yet another persistence solution called db4objects (db4o). db4o has versions for Java, .NET and Mono. I have only investigated the Java version. In the time that I have spent investigating db4o I have found some pretty interesting stuff.

One thing that stands out is that db4o allows you to persist plain ol' Java objects as plain ol' Java objects. POJOs need not extend any magic base class or implement any special interface. POJOs need not have any special id field. POJOs need not have any special constructor. There is no requirement for a no-arg constructor or even a public constructor. db4o doesn't require any object descriptors (XML or otherwise) and doesn't require you to mark persistent classes up with annotations. db4o does not require that persistent fields have JavaBean-compliant getters and setters. db4o pretty much will take your objects as they come.

db4o can run in embedded mode, which means that all of db4o is inside of your application's process. There doesn't need to be a separate db4o process running somewhere. There can be if that suits your deployment needs, but there doesn't need to be.

db4o has some pretty robust schema evolution capabilities that allow fields to be added, removed and renamed. There is support for moving classes to new packages.

db4o is an OO database. It is not an object-to-relational mapping tool; the database itself is an OO database.

One of the issues I initially had concerns about was performance once the database accumulated large numbers of objects. To test some of this, I created a fairly simple object model to represent music CDs. The classes look something like this...


public class CDArtist {
    private String name;
    private List<CD> cds = new ArrayList<CD>();
    // constructor and methods snipped...
}

public class CD {
    private String title;
    private List<CDTrack> tracks = new ArrayList<CDTrack>();
    // constructor and methods snipped...
}

public class CDTrack {
    private String title;
    private int trackNumber;
    private CD cd;
    // constructor and methods snipped...
}
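One way the snipped parts might be filled in is sketched below. It is an assumption, not the original code: only getCd(), getTitle() and the CDTrack constructor argument order are taken from the query examples in this post. The names addCd and addTrack, the remaining getters, the toString format and the sample data are hypothetical, and the classes are package-private only so that they fit in one source file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical completion of the model; package-private so it fits in one file.
class CDArtist {
    private String name;
    private List<CD> cds = new ArrayList<CD>();

    CDArtist(String name) { this.name = name; }

    String getName() { return name; }
    List<CD> getCds() { return cds; }
    void addCd(CD cd) { cds.add(cd); }
}

class CD {
    private String title;
    private List<CDTrack> tracks = new ArrayList<CDTrack>();

    CD(String title) { this.title = title; }

    String getTitle() { return title; }
    List<CDTrack> getTracks() { return tracks; }

    // Creates the track and sets its back-reference to this CD in one step.
    CDTrack addTrack(int trackNumber, String trackTitle) {
        CDTrack track = new CDTrack(this, trackNumber, trackTitle);
        tracks.add(track);
        return track;
    }

    @Override
    public String toString() { return "CD[" + title + "]"; }
}

class CDTrack {
    private String title;
    private int trackNumber;
    private CD cd;

    // Argument order follows the call used in the query examples:
    // new CDTrack(null, 0, "The Trooper")
    CDTrack(CD cd, int trackNumber, String title) {
        this.cd = cd;
        this.trackNumber = trackNumber;
        this.title = title;
    }

    String getTitle() { return title; }
    int getTrackNumber() { return trackNumber; }
    CD getCd() { return cd; }
}

class ModelDemo {
    public static void main(String[] args) {
        // Sample data for illustration only.
        CDArtist artist = new CDArtist("Iron Maiden");
        CD cd = new CD("Piece of Mind");
        artist.addCd(cd);
        CDTrack track = cd.addTrack(5, "The Trooper");
        System.out.println(track.getCd()); // prints CD[Piece of Mind]
    }
}
```

Note that nothing in these classes refers to db4o, which is the point made above: the objects stay plain.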


I downloaded a gaboodle of data from freedb.org and wrote a simple parser to turn that data into instances of my classes and started dropping those into my db4o database. At present I have about 75,000 artists, 185,000 CDs and over 2,000,000 tracks in the database. I realize that for a lot of situations even those 2,000,000+ tracks don't really amount to a lot of data but that is what I am currently working with. It is enough data to exercise some of the things I wanted to look at.

db4o supports 3 query techniques: Query By Example (QBE), S.O.D.A. and Native Queries.

I will show a simple example of each of these here and include some performance figures.

Query By Example (QBE)



The following code uses QBE to retrieve all CDs in the database that contain a track with the name "The Trooper".


Db4o.configure().objectClass(CDTrack.class).maximumActivationDepth(0);
Db4o.configure().objectClass(CDTrack.class).objectField("title").indexed(true);

// my custom factory
ObjectContainer db = ObjectContainerFactory.get().createObjectContainer();

CDTrack myCandidateTrack = new CDTrack(null, 0, "The Trooper");
List<CDTrack> results = db.get(myCandidateTrack);
for (CDTrack track : results) {
    CD theCD = track.getCd();
    System.out.println(theCD);
}

db.close();



That query completes in about 300-400 milliseconds. That is querying over 2,000,000 tracks, identifying the ones that match the name "The Trooper" and retrieving the CD that each track belongs to (note the call to track.getCd() inside of the loop).
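The template in the QBE example passes null and 0 for the fields it does not care about. The usual reading of query by example is that fields left at their default values act as wildcards and every other field must match. The sketch below illustrates that matching rule in plain Java with reflection. It is not db4o's implementation; SampleTrack, ExampleMatcher and QbeDemo are hypothetical names, and the matcher only looks at fields declared directly on the class.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for CDTrack, kept small for the illustration.
class SampleTrack {
    private final String title;
    private final int trackNumber;

    SampleTrack(int trackNumber, String title) {
        this.trackNumber = trackNumber;
        this.title = title;
    }

    String getTitle() { return title; }
    int getTrackNumber() { return trackNumber; }
}

class ExampleMatcher {
    // Returns every candidate whose fields equal the template's non-default fields.
    static <T> List<T> queryByExample(T template, List<T> candidates) {
        List<T> hits = new ArrayList<T>();
        for (T candidate : candidates) {
            if (matches(template, candidate)) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    static boolean matches(Object template, Object candidate) {
        if (template.getClass() != candidate.getClass()) {
            return false;
        }
        try {
            for (Field field : template.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers())) {
                    continue;
                }
                field.setAccessible(true);
                Object wanted = field.get(template);
                if (isDefault(wanted)) {
                    continue; // default value in the template: treat as a wildcard
                }
                if (!wanted.equals(field.get(candidate))) {
                    return false;
                }
            }
            return true;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    // null, numeric zero and false count as "not set" in this sketch.
    static boolean isDefault(Object value) {
        if (value == null) return true;
        if (value instanceof Number) return ((Number) value).doubleValue() == 0.0;
        if (value instanceof Boolean) return !((Boolean) value).booleanValue();
        return false;
    }
}

class QbeDemo {
    public static void main(String[] args) {
        List<SampleTrack> all = new ArrayList<SampleTrack>();
        all.add(new SampleTrack(5, "The Trooper"));
        all.add(new SampleTrack(1, "Where Eagles Dare"));
        all.add(new SampleTrack(3, "The Trooper"));

        // trackNumber 0 is a wildcard, so only the title constrains the result.
        List<SampleTrack> hits =
                ExampleMatcher.queryByExample(new SampleTrack(0, "The Trooper"), all);
        System.out.println(hits.size()); // prints 2
    }
}
```

One consequence of this rule is that a template cannot ask for "track number 0" or "title is null", since those values are indistinguishable from "don't care".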

S.O.D.A.




Db4o.configure().objectClass(CDTrack.class).maximumActivationDepth(0);
Db4o.configure().objectClass(CDTrack.class).objectField("title").indexed(true);

// my custom factory
ObjectContainer db = ObjectContainerFactory.get().createObjectContainer();

Query query = db.query();
query.constrain(CDTrack.class);
query.descend("title").constrain("The Trooper");
ObjectSet<CDTrack> results = query.execute();
for (CDTrack track : results) {
    CD theCD = track.getCd();
    System.out.println(theCD);
}

db.close();



That S.O.D.A. query executes in about 100 milliseconds. Again, that is querying over 2,000,000 tracks and retrieving the matching tracks and their containing CDs.
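Both examples so far mark the title field as indexed before opening the database. A rough way to see why that matters is to compare a full scan with a lookup through an index on the title. The sketch below does that in plain Java. The HashMap is only a stand-in for whatever index structure db4o actually maintains, the names TitledTrack, TitleIndex and IndexDemo are hypothetical, and the generated data is synthetic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for CDTrack.
class TitledTrack {
    final String title;
    final int trackNumber;

    TitledTrack(int trackNumber, String title) {
        this.trackNumber = trackNumber;
        this.title = title;
    }
}

// Maps each title to the tracks carrying it, so a lookup skips non-matching tracks.
class TitleIndex {
    private final Map<String, List<TitledTrack>> byTitle =
            new HashMap<String, List<TitledTrack>>();

    void add(TitledTrack track) {
        List<TitledTrack> bucket = byTitle.get(track.title);
        if (bucket == null) {
            bucket = new ArrayList<TitledTrack>();
            byTitle.put(track.title, bucket);
        }
        bucket.add(track);
    }

    List<TitledTrack> lookup(String title) {
        List<TitledTrack> bucket = byTitle.get(title);
        return bucket == null ? Collections.<TitledTrack>emptyList() : bucket;
    }
}

class IndexDemo {
    // Unindexed path: every track has to be inspected.
    static List<TitledTrack> scan(List<TitledTrack> all, String title) {
        List<TitledTrack> hits = new ArrayList<TitledTrack>();
        for (TitledTrack track : all) {
            if (title.equals(track.title)) {
                hits.add(track);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<TitledTrack> all = new ArrayList<TitledTrack>();
        TitleIndex index = new TitleIndex();
        for (int i = 0; i < 500000; i++) {
            // Synthetic titles; a handful share the title we will search for.
            String title = (i % 100000 == 0) ? "The Trooper" : "Track " + i;
            TitledTrack track = new TitledTrack(i % 20 + 1, title);
            all.add(track);
            index.add(track);
        }

        long t0 = System.nanoTime();
        int scanned = scan(all, "The Trooper").size();
        long t1 = System.nanoTime();
        int indexed = index.lookup("The Trooper").size();
        long t2 = System.nanoTime();

        System.out.println("scan:  " + scanned + " hits in " + (t1 - t0) / 1000 + " us");
        System.out.println("index: " + indexed + " hits in " + (t2 - t1) / 1000 + " us");
    }
}
```

The timings from a toy like this say nothing about db4o itself. They only show the shape of the difference between touching every object and touching just the matches.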

Native Query



This is what a native query might look like in db4o.


Db4o.configure().objectClass(CDTrack.class).maximumActivationDepth(0);
Db4o.configure().objectClass(CDTrack.class).objectField("title").indexed(true);

// my custom factory
ObjectContainer db = ObjectContainerFactory.get().createObjectContainer();

List<CDTrack> results = db.query(new Predicate<CDTrack>() {
    public boolean match(CDTrack candidate) {
        return candidate.getTitle().equals("The Trooper");
    }
});
for (CDTrack track : results) {
    CD theCD = track.getCd();
    System.out.println(theCD);
}

db.close();


This approach has some nice benefits. One is that you get real compile-time type safety. The query isn't some arbitrary string that might or might not be legal at runtime. The query is real Java code that gets compiled. That is nice.

However, that Predicate looks a little suspect to me. Judging from the code, it seems that the db4o engine is going to have to create all of my CDTrack objects and pass each of them one at a time to my match(CDTrack) method so I can decide which of them match my criteria. Since I have over 2,000,000 tracks, that can't be efficient. Yet the code above executes in about 100-150 milliseconds. I am still querying those 2,000,000+ tracks and retrieving all the same stuff I was retrieving in the previous examples.

What is going on here at runtime is that db4o is doing some slick class loading voodoo and figuring out what my Predicate would do, then it optimizes all of that away by turning my Predicate into a S.O.D.A. query. Run the code in a debugger and you will find that my match(CDTrack) method never actually gets called.

There are limits here. The optimizer does a good job of figuring out what you intended to do, but you can do things in your Predicate that the optimizer can't figure out, in which case the Predicate cannot be optimized away and the engine will have to create all of those CDTracks and pass them to the match method. This is easy enough to sort out at development time if you need to make sure the Predicate will be optimized. While experimenting with this, don't try to put a System.out.println call or logging calls in your Predicate to monitor whether your Predicate is getting called. Those fall in the category of things that the optimizer can't handle, and a side effect of them being there is that the method will not get optimized away.

There is a callback mechanism you can hook up to receive notifications that indicate when a Predicate is optimized and when it isn't.


ObjectContainer db = ObjectContainerFactory.get().createObjectContainer();

((YapStream)db).getNativeQueryHandler().addListener(new Db4oQueryExecutionListener() {
    public void notifyQueryExecuted(Predicate filter, String msg) {
        // inspect msg here to see whether the Predicate was optimized
    }
});


That callback will be notified when db4o first deals with any particular Predicate. The msg argument will be "DYNOPTIMIZED" if the query has been dynamically optimized, "UNOPTIMIZED" if the query could not be optimized, and "PREOPTIMIZED" if the query had been preoptimized. db4o has some bytecode manipulation tools to preoptimize Predicates but I have not investigated that.
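One way to use that callback at development time is to tally the messages and fail loudly if any Predicate ran unoptimized. The sketch below shows the idea. QueryExecutionListener is a hypothetical stand-in that mirrors the callback signature shown above (with Object in place of Predicate) so the example compiles without the db4o jar. OptimizationTally, assertAllOptimized and TallyDemo are likewise made-up names, and the notifications in the demo are simulated by hand.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in mirroring the callback signature shown above.
interface QueryExecutionListener {
    void notifyQueryExecuted(Object filter, String msg);
}

// Counts notifications per message and remembers which filters were not optimized.
class OptimizationTally implements QueryExecutionListener {
    private final Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
    private final List<Object> unoptimized = new ArrayList<Object>();

    public void notifyQueryExecuted(Object filter, String msg) {
        Integer seen = counts.get(msg);
        counts.put(msg, seen == null ? 1 : seen + 1);
        if ("UNOPTIMIZED".equals(msg)) {
            unoptimized.add(filter);
        }
    }

    int count(String msg) {
        Integer seen = counts.get(msg);
        return seen == null ? 0 : seen;
    }

    List<Object> unoptimizedFilters() {
        return unoptimized;
    }

    // Intended for a development-time check after the queries have run.
    void assertAllOptimized() {
        if (!unoptimized.isEmpty()) {
            throw new IllegalStateException(
                    unoptimized.size() + " predicate(s) ran unoptimized: " + unoptimized);
        }
    }
}

class TallyDemo {
    public static void main(String[] args) {
        OptimizationTally tally = new OptimizationTally();

        // Simulated notifications; in real use the engine would deliver these.
        tally.notifyQueryExecuted("titleEquals", "DYNOPTIMIZED");
        tally.notifyQueryExecuted("titleEqualsWithLogging", "UNOPTIMIZED");

        System.out.println(tally.count("DYNOPTIMIZED")); // prints 1
        System.out.println(tally.unoptimizedFilters());  // prints [titleEqualsWithLogging]
    }
}
```

Because the tally lives outside the Predicate, it avoids the trap described earlier of adding logging inside match and thereby defeating the optimizer.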

The db4o team is working on some Hibernate replication modules that will allow a db4o database to be kept in sync with a relational database via Hibernate. That code is still in development and I haven't looked at any of it.

The discussion forums are pretty active and as far as I can tell there is a lot of momentum behind the effort right now.

db4o is distributed under a couple of different licenses. There is a GPL version available for open source projects, experimenting and internal projects. There is also a commercial license available for commercial, non-open source applications. As far as I know, the GPL version is the same software as the commercial version. The restrictions have to do with distribution, not the software itself.

No reasonable person is going to claim that db4o is the silver bullet of persistence but it is interesting stuff and probably makes a lot of sense for a lot of applications. If nothing else, it is a good thing to be aware of.

21 comments:

Anonymous said...

Great test! I love db4o for its plain simplicity, and your article shows exactly how simple it is to work with persistent objects.

Anonymous said...

I am just wondering, the Hibernate replication modules aside, would you trust db4o to provide stable data storage for an enterprise application?

Jeff Brown said...

I am not sure at this point. I will say that I don't have a complete warm and fuzzy about the idea but at the same time, everything I have tried to do so far has worked out pretty well.

Anonymous said...

nice post, it really is just POJO with db4o. no nasty OR stuff ;|

Christof Wittig said...

I am just wondering, the Hibernate replication modules aside, would you trust db4o to provide stable data storage for an enterprise application?
Jay -- db4o is designed to be embedded in devices, packaged software and realtime systems (zero-admin environments), so the "classical enterprise system" (i.e., a CRM or so) is not db4o's target. While that doesn't mean that you couldn't use it, you'd rather use it to complement an Oracle (e.g. for a mobile application or a local object cache of very complex object models).
Hope that helps,
Chris
P.S.: Thanks, Jeff, for the great blog!

Anonymous said...

Thanks, Christof.

Anonymous said...

FYI: db4o, as a company, is participating in an effort by the Object Management Group (OMG) to create a standard specification for OODB access. We don't know exactly what that will look like yet but are pretty sure it won't be just ODMG 4.0. If you have input to this spec, or would like to actively participate, you can contact me at kenny.cason@boeing.com

Anonymous said...

Your article is very informative and helped me further.

Thanks, David

lamboap said...

do you still have that code that parsed that data into the db4o db? I'd like to try the same thing. thanks.

Jeff Brown said...

Oh boy... that was throw away code that I wrote almost 2 years ago. I will take a looksy and see if I can track it down. I still have the machine that was my primary development machine at that point so it might be there. I can probably reproduce it without too much trouble if nothing else.

Stay tuned (and give me a few days)...



jb

lamboap said...

thanks!.... [staying tuned]

FYI: I was one of the many people at the NY JavaSIG meeting who left scratching our heads when we walked into an Open Solaris talk when we were sure we had RSVP'ed for Groovy. oh well!

lamboap said...

any luck finding the parsing code?

thanks!

Anonymous said...

Good job! :)

Unknown said...

Good article. I was just about to do the same test myself.

In terms of enterprise use, has any company said they are using db4o, with some numbers (i.e. number of customers using the system, what it is for)?

ehic said...

interesting read

Javed Sunesra said...

Useful information ..I am very happy to read this post. thanks for giving us this useful information. Fantastic walk-through. I appreciate this post.
