27 June 2008

ClassLoaders and Web Applications

This post has been changed to correct some errors.

In the previous posting I described the class loader hierarchy, and mentioned that you can use standard Java API classes and methods to create a branching tree structure of loaders. I also touched on the fact that Java EE application servers use this to ensure that each web application gets its own private load path for classes.

However, the loading rules as I described them don't quite apply to web applications, so this article is intended to cover the differences.

I'm describing the way things work in Tomcat; other servers should be doing something similar.

Tomcat Loader Hierarchy

As I mentioned in the previous post, ordinary Java applications have three loaders at run time: the Boot loader, which is usually native, platform-specific code built into the JVM and searches the standard class libraries; the Extension class loader, which searches JARs in the $JAVA_HOME/jre/lib/ext directory; and the System class loader, which searches directories and JARs specified in the CLASSPATH. The Extension loader is the parent of the System loader, and the Boot loader is (in practice) the parent of the Extension loader. Slight correction: some JVMs combine the Boot and Extension loaders into one.

Tomcat adds two more levels to this hierarchy for web applications: at the bottom, a WebappClassLoader (one for each deployed web application) with the application's WEB-INF/classes and all the archive files in the WEB-INF/lib directory set as the search path; and above that a StandardClassLoader (one only for the whole server), with its search path set as Tomcat's lib directory and all the archive files in it.

Tomcat's system class loader is the parent of this StandardClassLoader and has its search path set to include Tomcat's bootstrap.jar file and little else.

The StandardClassLoader class is a subclass of URLClassLoader but doesn't add or override anything from that class - it's functionally identical.

The WebappClassLoader class, on the other hand, subclasses URLClassLoader but reimplements most if not all of the methods. Here's why...

Servlet API loader rules

The Servlet API specification is the root of things here; it says that the loader search algorithm for web applications should ensure that classes and JARs packaged in the WAR are searched before any of the servlet container's library JAR files, but should not allow the application to override any of the standard Java classes.

Tomcat's WebappClassLoader achieves this by changing the usual delegation order. (Major correction: I originally wrote that it simply searches its own repositories before delegating to its parent, and that I wasn't clear on how it stops standard classes being overridden. That was incomplete.) In fact the WebappClassLoader delegates directly to the System class loader first, then searches its own repositories, and only then delegates to its parent. This ensures that JRE classes can't be overridden (they are searched first) and that the application's repositories are searched before the contents of Tomcat's lib directory.
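To make that ordering concrete, here is a minimal sketch of a loader that searches in the order just described: system loader, then its own repositories, then its parent. It is not Tomcat's code; the class name WebappStyleLoader and its helper methods are made up for illustration, and the real WebappClassLoader does a good deal more.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical illustration only; this is not Tomcat's WebappClassLoader.
final class WebappStyleLoader extends URLClassLoader {
    private final ClassLoader system = ClassLoader.getSystemClassLoader();

    WebappStyleLoader(URL[] repositories, ClassLoader parent) {
        super(repositories, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // Reuse the class if this loader has already defined it.
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                c = tryLoad(system, name);     // 1. system loader, so JRE classes always win
            }
            if (c == null) {
                c = tryFindLocal(name);        // 2. this loader's own repositories
            }
            if (c == null) {
                ClassLoader parent = getParent();
                if (parent == null) {
                    throw new ClassNotFoundException(name);
                }
                c = parent.loadClass(name);    // 3. finally the parent; may throw
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    private static Class<?> tryLoad(ClassLoader loader, String name) {
        try {
            return loader.loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    private Class<?> tryFindLocal(String name) {
        try {
            return findClass(name);            // searches only the URLs given to this loader
        } catch (ClassNotFoundException e) {
            return null;
        }
    }
}
```

With an empty repository list, a request for java.lang.String is answered at step 1 and a request for an unknown class falls through all three steps and ends in a ClassNotFoundException.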

That covers the important differences that apply to web applications. There is one other small aspect that I'd like to mention.

Getting Resources

The Servlet API specification also mandates that web applications must be able to locate their own resources using ClassLoader.getResource(). The hierarchy as described achieves this. However, the specification says nothing about the static ClassLoader.getSystemResource() method, and in fact this method is of little use in web applications because the System class loader doesn't know anything about the web application's resources (as you can see from where it sits in the hierarchy as described).
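The difference is easy to see outside a container. The sketch below uses a plain URLClassLoader over a temporary directory as a stand-in for a web application's loader (an assumption for illustration; the class name ResourceVisibility and the resource name are made up).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: a URLClassLoader standing in for a webapp's loader.
final class ResourceVisibility {
    /** Creates a temp directory holding one resource and returns a loader that searches it. */
    static URLClassLoader loaderWithResource(String resourceName) {
        try {
            Path dir = Files.createTempDirectory("webapp-classes");
            Files.write(dir.resolve(resourceName),
                        "greeting=hello\n".getBytes(StandardCharsets.UTF_8));
            // The directory is a child loader's search path, not part of the CLASSPATH.
            return new URLClassLoader(new URL[] { dir.toUri().toURL() });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling getResource() on the returned loader finds the file, while the static ClassLoader.getSystemResource() with the same name returns null, because the system loader sits above the new loader and never searches its path.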


06 June 2008

Modifying the CLASSPATH at run time

Here's a question that comes my way occasionally: How can you change the search path for class loading at run time?

For example, let's say I have an application that reads the name of a JAR file from an external source, and then needs to add that JAR to the classpath so that it can load classes from it. This is something that's more likely to come up in a server environment, where you need the server to be able to add plug-in classes dynamically. For example, application servers like Tomcat need to be able to unpack a WAR file when requested; after unpacking there will be a 'classes' directory and a 'lib' directory full of JARs, all of which have to be added to the loading path so that the application can be started.

The solution to this problem requires an understanding of how the ClassLoader hierarchy works, so I'm going to cover that in some detail first.

ClassLoaders

The JVM includes a loader, usually referred to as the Boot loader. The default search path that this loader uses includes the Java runtime classes - java.lang, java.util, etc.

When the JVM is started it creates a ClassLoader object (loaded by the boot loader), usually referred to as the Extension ClassLoader. Its search path includes several JAR files found in the JVM's jre/lib/ext/ directory.

Then, the System loader is created. The search path for this loader is initialized from the CLASSPATH environment variable or from the value passed as the -cp option on the command line. (The label 'System' is a bit confusing; personally I think 'Application' class loader would be a more accurate and descriptive name.)

ClassLoaders are arranged in a hierarchy; each loader has a parent loader. The extension loader is the parent of the system loader; the parent of the extension loader is usually set as null. The boot loader is something of an exception in this respect - it's technically the parent of the extension loader but because it's part of the native implementation of the JVM and hence not a Class in the usual sense, it normally can't be accessed as a Java object.
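You can see this chain for yourself by following getParent() upwards from the system loader. A minimal sketch (the class name LoaderChain is made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for inspecting the loader chain.
final class LoaderChain {
    /** Returns the loaders from {@code start} up to the topmost one visible as a Java object. */
    static List<ClassLoader> chainOf(ClassLoader start) {
        List<ClassLoader> chain = new ArrayList<>();
        // getParent() returns null where the boot loader would be,
        // because the boot loader is not represented as a Java object.
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            chain.add(cl);
        }
        return chain;
    }
}
```

Printing LoaderChain.chainOf(ClassLoader.getSystemClassLoader()) lists the system loader followed by its parent. For the same reason, String.class.getClassLoader() returns null: classes loaded by the boot loader have no loader object to report. Note that on Java 9 and later the middle loader is called the platform loader rather than the extension loader.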

When the JVM recognizes that it needs to load a new class, it calls a loader to do that. The loader it chooses is the same loader that loaded the class where the new class is first referenced at run time - that means that by default, when your application code first references a class that hasn't yet been loaded, it will call the system loader (the one that loaded your application classes).

Unless it has already loaded the class itself, the first thing the loader does is delegate the request to its parent, if it has one. The result is that requests for classes that haven't been loaded yet get delegated all the way up to the boot loader. So, if your code has requested a class that's to be found in the Java runtime, such as java.util.Map or java.text.Format, the boot loader will find and load the class.

If a loader can't locate the class, it reports that back to the loader that called it. So if the class you requested is not in the boot loader's path, the request falls back to the extension loader; if the class isn't in any of the extension JARs either, it falls back to the system loader. The system loader then attempts to find the class, and this is where your application classes get resolved. (Of course, if the request makes it all the way back down the hierarchy without the class being found, you'll get a ClassNotFoundException.)

To expand slightly: when a loader is called to load a class, this is the sequence of actions:


  • 1 - the loader checks its local data to see if it has already loaded the class. If it finds it, the loader returns at this point.

  • 2 - otherwise, it delegates to the parent loader if there is one (a loader whose parent is null asks the boot loader instead). If the parent finds the class, the loader returns to its caller at this point.

  • 3 - if the parent doesn't find the class, the loader attempts to find the class definition in its own search path. If the class definition is found in the path, the class is loaded and added to the loader's local data, and the loader returns.

  • 4 - if this point is reached, the class hasn't been found - the loader returns control to its caller indicating such.
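As a rough sketch of this lookup, here is a loader that follows the order documented for the standard ClassLoader.loadClass(String, boolean): already-loaded check, then parent, then its own search path. The class name ParentFirstLoader is made up, and this loader has no search path of its own, so its findClass() step always fails; a real loader such as URLClassLoader supplies that part.

```java
// Hypothetical illustration of the default parent-first lookup order.
final class ParentFirstLoader extends ClassLoader {
    ParentFirstLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // Has this loader already defined the class?
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    ClassLoader parent = getParent();
                    // A null parent means "ask the boot loader"; Class.forName
                    // with a null loader does exactly that.
                    c = (parent != null) ? parent.loadClass(name)
                                         : Class.forName(name, false, null);
                } catch (ClassNotFoundException notFoundAbove) {
                    // Nobody above us has it: search our own path.
                    // ClassLoader.findClass() throws ClassNotFoundException
                    // by default, which reports the failure to our caller.
                    c = findClass(name);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

Asking an instance of this loader for java.util.Map returns the same Class object the rest of the JVM uses, because the request is answered at the top of the chain.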



The Answer

Back to the original question: How to add more places to the search? The way to do that is to create a new loader with the locations you want to search set as its search path, and add this new loader into the hierarchy.

ClassLoader is an abstract class, so it can't be instantiated directly. Instead you'd normally use URLClassLoader, which does everything you would usually need. You can create your own loader classes by extending ClassLoader, but this is rarely necessary.

The search path for URLClassLoader is provided as an array of java.net.URL objects; each URL identifies a directory or an archive file (.jar or .zip) to be searched when loading.

Let's say I have a JAR named /tmp/my-jar.jar and it contains a class called com.example.MyClass. I need to create an instance of this class. This code should do the trick:

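    import java.io.File;
    import java.net.URL;
    import java.net.URLClassLoader;

    // First, set the search path
    URL[] searchPath = new URL[1];
    searchPath[0] = new File("/tmp/my-jar.jar")
            .toURI()
            .toURL();

    // Now create a new loader (its parent is the system loader)
    ClassLoader cl = new URLClassLoader(searchPath);

    // Now we can load from the JAR and create an instance.
    // (Class.newInstance() also works but is deprecated since Java 9.)
    Object o = Class.forName("com.example.MyClass",
                             true,
                             cl)
            .getDeclaredConstructor()
            .newInstance();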


A few notes about this code:

First, the URL array can contain URLs for directories as well as JAR and ZIP files. The example here has only one entry but you could provide an array containing hundreds of entries if you needed to. Note that you can't use wildcards here - each entry must point to a single archive file or directory.
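Since wildcards aren't supported, a directory full of JARs (such as a webapp's lib directory) has to be expanded into one URL per file. A minimal sketch of one way to do that; the class and method names are made up for illustration:

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for building a URLClassLoader search path.
final class SearchPaths {
    /** Returns one URL per .jar file directly inside {@code libDir}, sorted by file name. */
    static URL[] jarsIn(File libDir) {
        File[] jars = libDir.listFiles((dir, name) -> name.endsWith(".jar"));
        if (jars == null) {
            return new URL[0];   // libDir is not a directory, or could not be read
        }
        Arrays.sort(jars);       // fixed order, so the search order is predictable
        List<URL> urls = new ArrayList<>();
        for (File jar : jars) {
            try {
                urls.add(jar.toURI().toURL());
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Cannot convert to URL: " + jar, e);
            }
        }
        return urls.toArray(new URL[0]);
    }
}
```

The resulting array can be passed straight to the URLClassLoader constructor. Sorting is a design choice rather than a requirement: the loader searches the URLs in array order, so a stable order avoids surprises when two JARs contain the same class.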

Second, the loader created by the URLClassLoader constructor will have the system loader as a parent by default. You can provide a different parent as a second parameter to the constructor - this allows you to build a full-blown hierarchical tree of loaders within your application if you so wish.
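For instance, a small tree with one shared loader and two sibling children might be wired up like this. It is only a sketch; the class name LoaderTree and the idea of three separate search paths are assumptions for illustration.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical illustration of a small loader tree.
final class LoaderTree {
    /** Returns { shared, childA, childB }, where both children have shared as their parent. */
    static URLClassLoader[] buildTree(URL[] sharedPath, URL[] pathA, URL[] pathB) {
        // One-argument constructor: the parent defaults to the system loader.
        URLClassLoader shared = new URLClassLoader(sharedPath);
        // Two-argument constructor: the parent is given explicitly.
        URLClassLoader childA = new URLClassLoader(pathA, shared);
        URLClassLoader childB = new URLClassLoader(pathB, shared);
        return new URLClassLoader[] { shared, childA, childB };
    }
}
```

Classes on the shared path are visible to both children, while classes on pathA and pathB are private to their own branch.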

Third, note the three-parameter call to Class.forName() - the first parameter is the class name, of course, as in the one-parameter call. The third parameter specifies our new loader as the one to use to load our class; the default is to use the same loader that loaded the calling class (this.getClass().getClassLoader()). The second parameter determines whether or not the class should be initialized (i.e. have its static initializer called) and you'd normally set this to true (offhand I can't think of a circumstance where you wouldn't want to do this).
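The effect of the second parameter can be observed with a class whose static initializer leaves a trace. This is a sketch with made-up names (InitDemo, Plugin); it loads a nested class through the loader that loaded InitDemo itself.

```java
// Hypothetical illustration of the "initialize" flag of Class.forName().
final class InitDemo {
    static boolean pluginInitialized = false;

    static final class Plugin {
        static {
            // Runs when the class is initialized, not when it is merely loaded.
            InitDemo.pluginInitialized = true;
        }
    }

    /** Loads Plugin with the given flag and reports whether its static initializer has run. */
    static boolean loadPlugin(boolean initialize) {
        try {
            // A class literal does not trigger initialization, so taking the name is safe.
            String name = Plugin.class.getName();
            Class.forName(name, initialize, InitDemo.class.getClassLoader());
            return pluginInitialized;
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling loadPlugin(false) first leaves the flag unset; a following loadPlugin(true) runs the static block. Passing false is one way to inspect a class (its annotations, say) without triggering side effects in its static initializer.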

Lastly, note that the new loader becomes the default for classes referenced by the newly-loaded classes. This means that MyClass can reference other classes in my-jar.jar implicitly or explicitly (i.e. using the one-parameter Class.forName() method) and the classes will be loaded correctly.

Using this you can create a structure of loaders organized however you need, with different search paths for different requirements. For example, Tomcat uses one branch of a loader tree for its own server classes and another as a connection point for loading web applications; each webapp gets its own subtree. That's how multiple webapps can coexist even with conflicting class names or versions, and without being able to access the server's internal classes.

Where the Class definitions are kept

Each loader keeps the Class objects that it loads in its own local space.

This means that if you create two loaders, each with the system loader as parent but with common directories and/or archive files in their search paths, it becomes possible to load the same class twice by invoking both class loaders to load the same class.
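Here is a sketch that demonstrates this. To get a class that is not on the system classpath, it compiles a tiny source file into a temporary directory using the javax.tools compiler API, which means it needs a JDK rather than a bare JRE. The names TwiceLoaded and TwiceLoadedSample are made up for illustration.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical illustration: one class file, two loaders, two Class objects.
final class TwiceLoaded {
    /** Compiles a tiny class into a temp directory and loads it through two sibling loaders. */
    static Class<?>[] loadTwice() {
        try {
            Path dir = Files.createTempDirectory("twice-loaded");
            Path source = dir.resolve("TwiceLoadedSample.java");
            Files.write(source,
                        "public class TwiceLoadedSample {}".getBytes(StandardCharsets.UTF_8));

            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            if (compiler == null) {
                throw new IllegalStateException("No system compiler available; a JDK is required");
            }
            int status = compiler.run(null, null, null, "-d", dir.toString(), source.toString());
            if (status != 0) {
                throw new IllegalStateException("Compilation failed with status " + status);
            }

            // Two loaders, same search path, both with the system loader as parent.
            URL[] path = { dir.toUri().toURL() };
            ClassLoader first = new URLClassLoader(path);
            ClassLoader second = new URLClassLoader(path);
            return new Class<?>[] {
                first.loadClass("TwiceLoadedSample"),
                second.loadClass("TwiceLoadedSample")
            };
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The two Class objects returned have the same name but are not equal, and an instance created from one cannot be cast to the other. This is usually the cause when a ClassCastException message appears to name the same class on both sides.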

Other things URLClassLoader can do

To finish up, here are a couple of other useful things that you can do:

First, there's a method URLClassLoader.getURLs() that returns the loader's current search path as an array of URLs. This can be useful for debugging.

Second, loaders aren't limited to finding .class files - you can use them to find other resources that are in the search path. This applies to all loaders (i.e. ClassLoader and all its subclasses, not just URLClassLoader). This is extremely useful because it allows you to, for example, read from a property file embedded inside a JAR. Some methods that are especially useful are:

ClassLoader.getResource() - returns the URL of a named resource;

ClassLoader.getResourceAsStream() - returns an InputStream allowing you to read a named resource directly (handy for loading .properties files);

ClassLoader.getSystemResource() and ClassLoader.getSystemResourceAsStream() - static methods that do the same as the above methods, but use the system loader rather than a specific one that you may have created.
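As an example of the second point, this sketch reads a .properties file through whichever loader you give it. The class name ResourceProps is made up, and returning an empty Properties object when the resource is missing is just one possible design choice.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for reading a properties file via a class loader.
final class ResourceProps {
    /** Reads a .properties resource through the given loader; empty if the resource is absent. */
    static Properties load(ClassLoader loader, String resourceName) {
        Properties props = new Properties();
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            if (in != null) {   // null means the resource is not on this loader's path
                props.load(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```

The resource name is a path relative to the root of each search path entry, separated by '/' characters, so a file stored in a JAR as config/app.properties is requested as "config/app.properties".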
