db4o is pretty interesting Monday, December 05, 2005

I have done a lot of work with a number of Java persistence solutions including JDBC, JDO, Hibernate, JSR-220 Persistence and Prevayler (I have only tinkered with Prevayler, no real work). Just recently I started investigating yet another persistence solution called db4objects (db4o). db4o has versions for Java, .NET and Mono. I have only investigated the Java version. In the time that I have spent investigating db4o I have found some pretty interesting stuff.

One thing that stands out is that db4o allows you to persist plain ol' Java objects as plain ol' Java objects. POJOs need not extend any magic base class or implement any special interface. POJOs need not have any special id field. POJOs need not have any special constructor. There is no requirement for a no-arg constructor or even a public constructor. db4o doesn't require any object descriptors (XML or otherwise) and doesn't require you to mark persistent classes up with annotations. db4o does not require that persistent fields have JavaBean-compliant getters and setters. db4o pretty much will take your objects as they come.
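How db4o reads objects internally is not covered in this post. As a rough illustration of why getters, setters and public constructors are not strictly needed by a persistence layer, here is a minimal plain-Java sketch that reads private fields through reflection. The FieldSnapshot and Guitar names are made up for this example and have nothing to do with db4o's API.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldSnapshot {

    // A deliberately "unfriendly" POJO (hypothetical): private constructor,
    // no getters or setters, no id field, no annotations.
    static final class Guitar {
        private final String maker;
        private final int strings;

        private Guitar(String maker, int strings) {
            this.maker = maker;
            this.strings = strings;
        }

        static Guitar of(String maker, int strings) {
            return new Guitar(maker, strings);
        }
    }

    // Reads every declared instance field of the object, private ones included.
    static Map<String, Object> snapshot(Object obj) {
        Map<String, Object> values = new LinkedHashMap<String, Object>();
        try {
            for (Field field : obj.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                    continue; // skip static and compiler-generated fields
                }
                field.setAccessible(true); // bypass the private modifier
                values.put(field.getName(), field.get(obj));
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(snapshot(Guitar.of("Gibson", 6)));
    }
}
```

The sketch only looks at fields declared directly on the class. Walking superclasses and object graphs is left out to keep it short.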

db4o can run in embedded mode, which means that all of db4o is inside of your application's process. There doesn't need to be a separate db4o process running somewhere. There can be if that suits your deployment needs, but there doesn't need to be.

db4o has some pretty robust schema evolution capabilities that allow fields to be added, removed and renamed. There is support for moving classes to new packages.

db4o is an OO database. It is not an object-to-relational mapping tool; the database itself is an OO database.

One of the issues I initially had concerns about was performance once the database accumulated large numbers of objects. To test some of this, I created a fairly simple object model to represent music cds. The classes look something like this...


public class CDArtist {
    private String name;
    private List<CD> cds = new ArrayList<CD>();
    // constructor and methods snipped...
}


public class CD {
    private String title;
    private List<CDTrack> tracks = new ArrayList<CDTrack>();
    // constructor and methods snipped...
}

public class CDTrack {
    private String title;
    private int trackNumber;
    private CD cd;
    // constructor and methods snipped...
}
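
The constructors and methods are snipped above. As a minimal sketch of how the three classes might be wired together, here is a self-contained version. The method names (addCd, addTrack and the getters) are my assumptions for illustration, and the CDTrack constructor order (cd, trackNumber, title) is inferred from how it is called in the query examples further down.

```java
import java.util.ArrayList;
import java.util.List;

public class CdModelDemo {

    static class CDArtist {
        private final String name;
        private final List<CD> cds = new ArrayList<CD>();

        CDArtist(String name) { this.name = name; }

        void addCd(CD cd) { cds.add(cd); }
        String getName() { return name; }
        List<CD> getCds() { return cds; }
    }

    static class CD {
        private final String title;
        private final List<CDTrack> tracks = new ArrayList<CDTrack>();

        CD(String title) { this.title = title; }

        // Creates the track and wires up both directions of the association:
        // the CD knows its tracks and each track knows its CD.
        CDTrack addTrack(String trackTitle) {
            CDTrack track = new CDTrack(this, tracks.size() + 1, trackTitle);
            tracks.add(track);
            return track;
        }

        String getTitle() { return title; }
        List<CDTrack> getTracks() { return tracks; }
    }

    static class CDTrack {
        private final String title;
        private final int trackNumber;
        private final CD cd;

        CDTrack(CD cd, int trackNumber, String title) {
            this.cd = cd;
            this.trackNumber = trackNumber;
            this.title = title;
        }

        String getTitle() { return title; }
        int getTrackNumber() { return trackNumber; }
        CD getCd() { return cd; }
    }

    public static void main(String[] args) {
        // Placeholder names, not real freedb data.
        CDArtist artist = new CDArtist("Example Artist");
        CD cd = new CD("Example Album");
        artist.addCd(cd);
        cd.addTrack("Intro");
        CDTrack track = cd.addTrack("The Trooper");
        System.out.println(track.getCd().getTitle() + " #" + track.getTrackNumber());
    }
}
```

The back reference from CDTrack to CD is what lets the queries below call track.getCd() on each result.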


I downloaded a gaboodle of data from freedb.org and wrote a simple parser to turn that data into instances of my classes and started dropping those into my db4o database. At present I have about 75,000 artists, 185,000 cds and over 2,000,000 tracks in the database. I realize that for a lot of situations even those 2,000,000+ tracks don't really amount to a lot of data but that is what I am currently working with. It is enough data to exercise some of the things I wanted to look at.

db4o supports 3 query techniques: Query By Example (QBE), S.O.D.A. and Native Queries.



I will show a simple example of each of these here and include some performance figures.

Query By Example (QBE)



The following code uses QBE to retrieve all CDs in the database that contain a track with the name "The Trooper".


Db4o.configure().objectClass(CDTrack.class).maximumActivationDepth(0);
Db4o.configure().objectClass(CDTrack.class).objectField("title").indexed(true);

// my custom factory
ObjectContainer db = ObjectContainerFactory.get().createObjectContainer();

CDTrack myCandidateTrack = new CDTrack(null, 0, "The Trooper");
List<CDTrack> results = db.get(myCandidateTrack);
for (CDTrack track : results) {
    CD theCD = track.getCd();
    System.out.println(theCD);
}

db.close();



That query completes in roughly 300-400 milliseconds or less. That is querying over 2,000,000 tracks, identifying the ones that match the name "The Trooper" and retrieving the cd that the track belongs to (note the call to track.getCd() inside of the loop).
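
The template passed to db.get above has a null cd and a track number of 0. In QBE those default values act as wildcards and only the title is used as a constraint. Here is a minimal plain-Java sketch of that matching rule. It is not db4o's implementation: the QbeSketch and Track names are made up, Track uses a plain String album instead of a CD reference, and nested object templates are not handled.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class QbeSketch {

    // Simplified, hypothetical stand-in for CDTrack.
    static class Track {
        private final String album;
        private final int trackNumber;
        private final String title;

        Track(String album, int trackNumber, String title) {
            this.album = album;
            this.trackNumber = trackNumber;
            this.title = title;
        }

        String getAlbum() { return album; }
    }

    // null references, numeric zero, false and '\0' count as "not specified".
    static boolean isDefault(Object value) {
        if (value == null) return true;
        if (value instanceof Number) return ((Number) value).doubleValue() == 0.0;
        if (value instanceof Boolean) return !((Boolean) value).booleanValue();
        if (value instanceof Character) return ((Character) value).charValue() == '\0';
        return false;
    }

    // True when every non-default field of the template equals the
    // corresponding field of the candidate.
    static boolean matches(Object template, Object candidate) {
        if (template.getClass() != candidate.getClass()) return false;
        try {
            for (Field field : template.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) continue;
                field.setAccessible(true);
                Object wanted = field.get(template);
                if (isDefault(wanted)) continue; // wildcard, no constraint
                if (!wanted.equals(field.get(candidate))) return false;
            }
            return true;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    // Linear scan over the candidates, keeping the ones that match the template.
    static <T> List<T> queryByExample(T template, List<T> candidates) {
        List<T> hits = new ArrayList<T>();
        for (T candidate : candidates) {
            if (matches(template, candidate)) hits.add(candidate);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Track> all = new ArrayList<Track>();
        all.add(new Track("Album A", 1, "Intro"));
        all.add(new Track("Album A", 2, "The Trooper"));
        all.add(new Track("Album B", 7, "The Trooper"));
        for (Track hit : queryByExample(new Track(null, 0, "The Trooper"), all)) {
            System.out.println(hit.getAlbum());
        }
    }
}
```

One consequence of the wildcard rule is that a template like this cannot ask for "track number 0" explicitly, which is one reason the other two query styles exist.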

S.O.D.A.




Db4o.configure().objectClass(CDTrack.class).maximumActivationDepth(0);
Db4o.configure().objectClass(CDTrack.class).objectField("title").indexed(true);

// my custom factory
ObjectContainer db = ObjectContainerFactory.get().createObjectContainer();

Query query = db.query();
query.constrain(CDTrack.class);
query.descend("title").constrain("The Trooper");
ObjectSet<CDTrack> results = query.execute();
for (CDTrack track : results) {
    CD theCD = track.getCd();
    System.out.println(theCD);
}

db.close();



That S.O.D.A. query executes in about 100 milliseconds. Again querying over 2,000,000 tracks and retrieving the matching tracks and their containing cds.

Native Query



This is what a native query might look like in db4o.


Db4o.configure().objectClass(CDTrack.class).maximumActivationDepth(0);
Db4o.configure().objectClass(CDTrack.class).objectField("title").indexed(true);

// my custom factory
ObjectContainer db = ObjectContainerFactory.get().createObjectContainer();

List<CDTrack> results = db.query(new Predicate<CDTrack>() {
    public boolean match(CDTrack candidate) {
        return candidate.getTitle().equals("The Trooper");
    }
});
for (CDTrack track : results) {
    CD theCD = track.getCd();
    System.out.println(theCD);
}

db.close();


This approach has some nice benefits. One is that you get real compile-time type safety. The query isn't some arbitrary string that might or might not be legal at runtime. The query is real Java code that gets compiled. That is nice.

However, that Predicate looks a little suspect to me. Judging from the code, it seems that the db4o engine is going to have to create all of my CDTrack objects and pass each of them one at a time to my match(CDTrack) method so I can decide which of them match my criteria. Since I have over 2,000,000 tracks, that can't be efficient. Yet the code above executes in about 100-150 milliseconds. I am still querying those 2,000,000+ tracks and retrieving all the same stuff I was retrieving in the previous examples.

What is going on here at runtime is that db4o is doing some slick class loading voodoo and figuring out what my Predicate would do, then it optimizes all of that away by turning my Predicate into a S.O.D.A. query. Run the code in a debugger and you will find that my match(CDTrack) method never actually gets called.

There are limits here. The optimizer does a good job of figuring out what you intended to do, but you can do things in your Predicate that the optimizer can't figure out, in which case the Predicate cannot be optimized away and the engine will have to create all of those CDTracks and pass them to the match method. This is easy enough to sort out at development time if you need to make sure the Predicate will be optimized. While experimenting with this, don't try to put a System.out.println call or logging calls in your Predicate to monitor whether your Predicate is getting called. Those fall in the category of things that the optimizer can't handle, and a side effect of them being there is that the method will not get optimized away. There is a callback mechanism you can hook up to retrieve notifications that indicate when a Predicate is optimized and when it isn't.


ObjectContainer db = ObjectContainerFactory.get().createObjectContainer();

((YapStream)db).getNativeQueryHandler().addListener(new Db4oQueryExecutionListener() {
    public void notifyQueryExecuted(Predicate filter, String msg) {
        // inspect msg here to see whether this Predicate was optimized
    }
});


That callback will be notified when db4o first deals with any particular Predicate. The msg argument will be "DYNOPTIMIZED" if the query has been dynamically optimized. msg will be "UNOPTIMIZED" if the query could not be optimized. msg will be "PREOPTIMIZED" if the query had been preoptimized. db4o has some bytecode manipulation tools to preoptimize Predicates but I have not investigated that.
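
To make the two execution paths and the notifications a bit more concrete, here is a toy plain-Java sketch. Everything in it is an assumption for illustration: the Predicate, Listener, TitleEquals and ToyEngine types are stand-ins I made up, the "optimizer" is just an instanceof check rather than db4o's analysis of the Predicate, the index is a HashMap, and the listener fires on every query rather than only the first time a Predicate is seen.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NativeQuerySketch {

    static final class Track {
        private final String title;
        Track(String title) { this.title = title; }
        String getTitle() { return title; }
    }

    // Hypothetical stand-ins for db4o's Predicate and Db4oQueryExecutionListener.
    interface Predicate<T> { boolean match(T candidate); }
    interface Listener { void notifyQueryExecuted(Predicate<?> filter, String msg); }

    // The one predicate shape this toy engine "understands": equality on title.
    static final class TitleEquals implements Predicate<Track> {
        final String wanted;
        int matchCalls = 0; // only here to show which path ran
        TitleEquals(String wanted) { this.wanted = wanted; }
        public boolean match(Track candidate) {
            matchCalls++;
            return candidate.getTitle().equals(wanted);
        }
    }

    static final class ToyEngine {
        private final List<Track> tracks;
        private final Map<String, List<Track>> titleIndex = new HashMap<String, List<Track>>();
        private final Listener listener;

        ToyEngine(List<Track> tracks, Listener listener) {
            this.tracks = tracks;
            this.listener = listener;
            // Build the title index once, up front.
            for (Track t : tracks) {
                List<Track> bucket = titleIndex.get(t.getTitle());
                if (bucket == null) {
                    bucket = new ArrayList<Track>();
                    titleIndex.put(t.getTitle(), bucket);
                }
                bucket.add(t);
            }
        }

        List<Track> query(Predicate<Track> predicate) {
            if (predicate instanceof TitleEquals) {
                // Optimized path: answer from the index, match() is never called.
                listener.notifyQueryExecuted(predicate, "DYNOPTIMIZED");
                List<Track> hits = titleIndex.get(((TitleEquals) predicate).wanted);
                return hits == null ? new ArrayList<Track>() : new ArrayList<Track>(hits);
            }
            // Fallback path: hand every single track to match().
            listener.notifyQueryExecuted(predicate, "UNOPTIMIZED");
            List<Track> hits = new ArrayList<Track>();
            for (Track t : tracks) {
                if (predicate.match(t)) hits.add(t);
            }
            return hits;
        }
    }

    public static void main(String[] args) {
        List<Track> all = new ArrayList<Track>();
        all.add(new Track("Intro"));
        all.add(new Track("The Trooper"));
        ToyEngine engine = new ToyEngine(all, new Listener() {
            public void notifyQueryExecuted(Predicate<?> filter, String msg) {
                System.out.println(msg);
            }
        });
        TitleEquals byTitle = new TitleEquals("The Trooper");
        engine.query(byTitle);
        System.out.println("match() calls: " + byTitle.matchCalls);
    }
}
```

The matchCalls counter is fine in this toy. In real db4o, extra statements inside match might themselves keep the Predicate from being optimized, which is why the listener is the safer way to check.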

The db4o team is working on some Hibernate replication modules that will allow a db4o database to be kept in sync with a relational database via Hibernate. That code is still in development and I haven't looked at any of that.

The discussion forums are pretty active and as far as I can tell there is a lot of momentum behind the effort right now.

db4o is distributed under a couple of different licenses. There is a GPL version available for open source projects, experimenting and internal projects. There is also a commercial license available for commercial, non-open source applications. As far as I know, the GPL version is the same software as the commercial version. The restrictions have to do with distribution, not the software itself.

No reasonable person is going to claim that db4o is the silver bullet of persistence but it is interesting stuff and probably makes a lot of sense for a lot of applications. If nothing else, it is a good thing to be aware of.

Is Sun Pimping The Java Name? Friday, April 29, 2005

Does the name "Java" really belong on these products?

Sun Java Desktop System

Sun Java Workstation W1100z

Sun Java Workstation W2100z

Free IntelliJ IDEA License Thursday, March 03, 2005

A few weeks ago the guys at JetBrains/IntelliJ announced that they were making IntelliJ IDEA licenses available free of charge to qualifying open source developers. See http://www.jetbrains.com/idea/opensource/ for details.

I already own a current license for IntelliJ IDEA and that license allows me to do whatever development I like (personal, commercial, open source, whatever). I can even install that license on as many machines as I like as long as I am the only person using them. Contrast that with the more limited open source license which is only allowed to be used for open source development and may be limited further to only being used on development of open source projects that the guys at IntelliJ approve. I am not exactly sure about that last part, but the license is limited to open source work. Anyway, because I already own a less restrictive license, the free license doesn't really buy me anything, but I wanted to participate in their program, so I submitted a request based on my involvement with JarSpy. Shortly after sending the request I got an email response letting me know that they have received lots of requests and that each one needs to be evaluated individually and that will be time consuming. I wasn't too worried about it so I saved the email and went on with my business.

Fast forward to today. Today I received an email from them letting me know that they have approved my request and are issuing me a free license. Whooo Hoo! In a way, this is of absolutely no consequence whatsoever. In another way, I am still glad that they are extending this offer and that they are not being overly restrictive about it.

I hope that this helps IntelliJ IDEA continue to grow in popularity and that the growth in turn keeps the innovation going.

JDO 2.0 Has Been Approved! Tuesday, March 01, 2005

I am pleased to report that JDO 2.0 has been approved! This is great news for the JDO community. The initial public review ballot was voted down back in January with a vote of 10 against, 5 for and 1 abstainer. The reconsideration ballot passed with a vote of 0 against, 12 for, 1 no vote and 3 abstainers.

I don't think this vote hurts the JSR 220 effort but I think it does a lot of good for JDO users and JDO vendors.

Free IntelliJ IDEA License Tuesday, February 08, 2005

The guys at IntelliJ are giving away free licenses for IDEA to open source projects. I haven't read all of the details, but at a glance this looks awesome.

Details available here.

Bad News For JDO Wednesday, January 19, 2005

I am very disappointed to see the results of the JSR 243 public review ballot. I believe this means that the Expert Group has 30 days to submit an updated draft for a revote. If a revised draft is not submitted in that timeframe or if the revote fails, the JSR will be closed. If anyone thinks that is not correct, please share what you understand about the process.

I knew that JSR 220 was eventually going to succeed JDO 2.0 but as I understand it, there won't be anything "real" coming from JSR 220 until 2006.

Given the vote count and some of the comments submitted by the voters, it doesn't look very likely to me that the draft will be revised in such a way as to change voters' minds. I think that if the vote is going to turn around it would have to be in response to community feedback but I don't know how much of that there will be.

I really am disappointed in this whole thing. ;(

Someone say it ain't so...

Already I am seeing talk of org.jdo.* popping up.

JarSpy Plugin For IntelliJ Tuesday, January 11, 2005

Tonight I was looking for the IntelliJ Plugin Wiki but I couldn't remember the URL so I did the obvious: I put "intellij plugin" into Google. To my surprise, the first link in the search results referred to my JarSpy Plugin For IntelliJ. I can't say that I had forgotten ever having written that, but I can say that I haven't had any reason to think about it for a long time. A couple of years ago I got an email from one of the billions of JarSpy fans out there and the email was asking me for a JarSpy plugin for IntelliJ. I was an IntelliJ user at the time (I still am) and I was interested in taking a look at their plugin API so I went at it. It didn't take long to write the plugin (although I do remember wishing there was better documentation for their API, maybe that has improved since then).

JarSpy has turned out to be one of those little things that continues to pop up for me from time to time. Just a couple of weeks ago at my regular poker game one of my friends, with whom I had never had any reason to discuss JarSpy, said something like "hey, I was using JarSpy today and...".

It has probably been more than a year since I made any code changes to JarSpy. I have some interesting ideas that I would like to add. Maybe I will get some time to play with that soon.