Hasta La Vista BEA

May 5, 2008

The Oracle acquisition of BEA is now complete and while the result may be good for BEA in the long run, I can’t help but be a little sad. As a former BEA employee I can testify that it was a great place to work even though it had its share of bozo managers.

Some of the remaining BEA folks in San Jose held a mini wake for the company. As for me, I’ll just take my $19.38 a share. A little richer but a lot sadder.


Josh Bloch == Rainman?

January 1, 2008

I was watching this very interesting interview with Josh Bloch about Google GWT and all I could think about was how much he reminded me of Dustin Hoffman in Rainman! Don’t know, maybe it’s just me.

Gotta go – five minutes ’till Wapner.


Schedule your own Java thread dumps

December 29, 2007

So, if you have ever tried to figure out what an apparently hung Java program is doing, you are probably very familiar with the Java thread dump feature. Basically you send a signal to the JVM, which responds by writing a stack trace of each thread in the JVM to the standard output device. In fact, a thread dump contains more useful information than just stack traces: it also shows the state of each thread (i.e. runnable, waiting, etc.) and which Java monitors (synchronized locks) are owned and/or being waited on.

Here is a sample snippet of a thread dump:

Full thread dump Java HotSpot(TM) Client VM (1.5.0_07-b03 mixed mode):

"Timer-5" daemon prio=10 tid=0x092d5720 nid=0x73 in Object.wait() [0x9b52f000..0x9b52fd38]
	at java.lang.Object.wait(Native Method)
	- waiting on <0xa2a4b978> (a java.util.TaskQueue)
	at java.util.TimerThread.mainLoop(Timer.java:509)
	- locked <0xa2a4b978> (a java.util.TaskQueue)
	at java.util.TimerThread.run(Timer.java:462)

"Timer-4" prio=10 tid=0x0925d418 nid=0x72 in Object.wait() [0x9b4ed000..0x9b4edab8]
	at java.lang.Object.wait(Native Method)
	- waiting on <0xa2a49570> (a java.util.TaskQueue)
	at java.util.TimerThread.mainLoop(Timer.java:509)
	- locked <0xa2a49570> (a java.util.TaskQueue)
	at java.util.TimerThread.run(Timer.java:462)

As you can see a thread dump contains a lot of very useful information.
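As an aside, a portable way to get a similar per-thread stack listing from pure Java (5 and later) is Thread.getAllStackTraces(), though it lacks the monitor ownership detail that makes a real signal-triggered dump so valuable. A minimal sketch (the class name is mine):

```java
import java.util.Map;

public class StackSnapshot {
    // Render a rough, thread-dump-like snapshot of all live threads.
    // Requires Java 5+ for Thread.getAllStackTraces() and Thread.getState().
    static String snapshot() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" ")
              .append(t.isDaemon() ? "daemon " : "")
              .append("prio=").append(t.getPriority())
              .append(" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("\tat ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(snapshot());
    }
}
```

Note there is no lock information here, which is one reason the signal-based dump remains worth the trouble.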

The method used to “request” a thread dump is to send a signal to the running JVM. In Unix this is the SIGQUIT signal which may be generated via either:

kill -3 <pid>

or

kill -QUIT <pid>

where <pid> is the process id of the JVM. You can also enter Ctrl-\ in the window where the JVM is running.

On Windows a thread dump is requested by sending a Ctrl-Break to the JVM process. This is pretty simple for foreground windows but requires a program (akin to Unix kill) to be used for JVMs running in the background (i.e. services).

The problem with requesting a thread dump is that it requires manual intervention, i.e. someone has to enter the kill command or press the Ctrl-Break keys to generate a thread dump. If you are having problems with your production site in the wee hours of the morning, your support staff probably won’t appreciate getting out of bed to capture a few dumps for you. In addition, a single thread dump is not as useful as a series of dumps taken over a period of time. With a single dump you only get a snapshot of what is happening. You might see a thread holding a monitor that is causing other threads to block, but you have no idea how long that condition has existed. The lock might have been released a millisecond after the dump was taken. If you have, say, 5 dumps taken over 20 minutes and the same thread is holding the monitor in all of them, then you know you’ve got a problem to investigate.
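That "same thread still stuck across consecutive dumps" check can itself be sketched in a few lines. A hypothetical helper (the names and the thread-name-to-top-frame map are mine, purely illustrative; in practice you would compare whole stacks parsed from the captured dump files):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class DumpCompare {
    // Return the names of threads whose top stack frame is identical in two
    // snapshots taken some time apart - prime suspects for a hang or a
    // long-held monitor.
    static List<String> stuckThreads(Map<String, String> earlier, Map<String, String> later) {
        List<String> stuck = new ArrayList<String>();
        for (Map.Entry<String, String> e : earlier.entrySet()) {
            String topFrameNow = later.get(e.getKey());
            if (topFrameNow != null && topFrameNow.equals(e.getValue())) {
                stuck.add(e.getKey());
            }
        }
        return stuck;
    }
}
```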

The solution I’m going to propose makes use of JNI to request a thread dump of the current JVM and capture that output to a file which may be time stamped. This allows dump output to be segregated from everything else the JVM is sending to STDOUT.

Before you invest any more time in this article let me state that the solution I’m going to present here only partially works for Windows. It is possible to programmatically request a thread dump under Windows, but due to a limitation in Win32, the Microsoft C runtime, or both, the capture to a separate file does not work. Even though Win32 provides APIs for changing the file handles used for STDOUT/STDERR, changing them after a process has started executing does not seem to make any difference. If you do all your Java work on Windows, you’ve been warned – don’t read to the end and then send me a nasty email saying I wasted your time!

Ok, the first thing we need to do is create a Java class that will serve as an interface to our native routine that captures thread dumps:

package com.utils.threaddump;

public class ThreadDumpUtil
{
    public static int performThreadDump(final String fileName)
    {
        return(threadDumpJvm(fileName));
    }

    private static native int threadDumpJvm(final String fileName);

    static
    {
        System.loadLibrary("libthreaddump");
    }
}

This class loads a native library called libthreaddump when the class itself is loaded, and exposes a static method that Java code can call to request a thread dump, specifying the name of the file that should receive the captured output.
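With that in place, "scheduling your own thread dumps" is just a matter of a java.util.Timer (or any cron-style trigger) calling performThreadDump with a timestamped file name. A sketch of the scheduling side (the commented-out line is the JNI call above, which of course needs the native library available; the class name and 4-minute interval are my own choices):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Timer;
import java.util.TimerTask;

public class DumpScheduler {
    // Build a timestamped dump file name, e.g. threaddump-20071229-153000.txt,
    // so each capture lands in its own file.
    static String dumpFileName(Date when) {
        return "threaddump-" + new SimpleDateFormat("yyyyMMdd-HHmmss").format(when) + ".txt";
    }

    public static void main(String[] args) throws InterruptedException {
        Timer timer = new Timer("dump-scheduler", true); // daemon timer thread
        timer.schedule(new TimerTask() {
            public void run() {
                // Hypothetical call into the JNI utility described above:
                // ThreadDumpUtil.performThreadDump(dumpFileName(new Date()));
            }
        }, 0L, 4L * 60L * 1000L); // every 4 minutes: ~5 dumps in 20 minutes
        Thread.sleep(100L); // keep this demo alive long enough for one tick
    }
}
```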

Running this file through the javah tool generates a C header named com_utils_threaddump_ThreadDumpUtil.h which is used to help build our native routine.

The C code for the Unix variant follows:

#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <string.h>
#include <errno.h>
#include "com_utils_threaddump_ThreadDumpUtil.h"

#define FILE_STDOUT             1
#define FILE_STDERR             2

JNIEXPORT jint JNICALL 
Java_com_utils_threaddump_ThreadDumpUtil_threadDumpJvm(JNIEnv *env, jclass clazz, jstring fileName)
{
    /* get my process id */
    pid_t pid = getpid();
    
    /* open the file where we want the thread dump written */
    const char* fName = (*env)->GetStringUTFChars(env, fileName, NULL);
    if (NULL == fName)
    {
        printf("threadDumpJvm: Out of memory converting filename\n");
        return((jint) -1L);
    }
    
    int fd = open(fName, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
    if (-1 == fd)
    {
        printf("threadDumpJvm: Open of file %s failed: %d[%s]\n", fName, errno, strerror(errno));
        (*env)->ReleaseStringUTFChars(env, fileName, fName);
        return((jint) -2L);
    }
    
    /* redirect stdout and stderr to our thread dump file */
    int fdOut = dup(FILE_STDOUT);
    int fdErr = dup(FILE_STDERR);
    dup2(fd, FILE_STDOUT);
    dup2(fd, FILE_STDERR);
    close(fd);
    (*env)->ReleaseStringUTFChars(env, fileName, fName);
    
    /* send signal requesting JVM to perform a thread dump */
    kill(pid, SIGQUIT);
    
    /* this is kind of hokey but we have to wait for the dump to complete - 10 secs should be ok */
    sleep(10);
    
    /* restore the original stdout and stderr file descriptors */
    dup2(fdOut, FILE_STDOUT);
    dup2(fdErr, FILE_STDERR);
    close(fdOut);
    close(fdErr);
    return((jint) 0L);
}

Following are the compile command lines I’ve used on a couple of Unix systems to build this dynamic library:

Mac OSX:
gcc -o liblibthreaddump.dylib -dynamiclib -I. -I$JAVA_HOME/include -L/usr/lib -lc libthreaddump_unix.c

Solaris:
gcc -o liblibthreaddump.so -G -I$JAVA_HOME/include -I$JAVA_HOME/include/solaris libthreaddump_unix.c -lc

Here is the C code for the Windows version of the native library:

#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <stdio.h>
#include "com_utils_threaddump_ThreadDumpUtil.h"

#define FILE_STDOUT             1
#define FILE_STDERR             2

JNIEXPORT jint JNICALL 
Java_com_utils_threaddump_ThreadDumpUtil_threadDumpJvm(JNIEnv *env, jclass clazz, jstring fileName)
{
	auto	HANDLE		fd;
	auto	HANDLE 		fdOut;
	auto	HANDLE		fdErr;
	auto	long		retValue = 0L;
	auto	char* 		errText = "";
	auto	DWORD 		pid = GetCurrentProcessId();
	    
    /* open the file where we want the thread dump written */
    const char* fName = (*env)->GetStringUTFChars(env, fileName, NULL);
    if (NULL == fName)
    {
        printf("threadDumpJvm: Out of memory converting filename\n");
        return((jint) -1L);
    }
	
	fd = CreateFile((LPCTSTR) fName, GENERIC_WRITE, FILE_SHARE_WRITE, 
					NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (INVALID_HANDLE_VALUE == fd)
    {
        printf("threadDumpJvm: Open of file %s failed: %ld\n", fName, (long) GetLastError());
	    (*env)->ReleaseStringUTFChars(env, fileName, fName);
        return((jint) -2L);
    }

    /* redirect stdout and stderr to our thread dump file */
    fdOut = GetStdHandle(STD_OUTPUT_HANDLE);
    fdErr = GetStdHandle(STD_ERROR_HANDLE);
    printf("fdOut=%ld fdErr=%ld\n", (long) GetStdHandle(STD_OUTPUT_HANDLE), (long) GetStdHandle(STD_ERROR_HANDLE));
    if (!SetStdHandle(STD_OUTPUT_HANDLE, fd))
    	printf("SetStdHandle failed: %ld\n", (long) GetLastError());
    	
    SetStdHandle(STD_ERROR_HANDLE, fd);
    
    printf("fdOut=%ld fdErr=%ld\n", (long) GetStdHandle(STD_OUTPUT_HANDLE), (long) GetStdHandle(STD_ERROR_HANDLE));
    if (0 == GenerateConsoleCtrlEvent(CTRL_BREAK_EVENT, 0))  // pid fails here????
    {
    	retValue = (long) GetLastError();
    	errText = "Generate CTRL-BREAK event failed";
   	}
	else
	{
	    /* this is kind of hokey but we have to wait for the dump to complete - 10 secs should be ok */
    	Sleep(10000L);
    }

    printf("This is a test message\n");
    
    /* replace the original stdout and stderr file handles */
    SetStdHandle(STD_OUTPUT_HANDLE, fdOut);
    SetStdHandle(STD_ERROR_HANDLE, fdErr);
	CloseHandle(fd);
    (*env)->ReleaseStringUTFChars(env, fileName, fName);
    
    if (0L != retValue)
    {
    	printf("threadDumpJvm: Error generating thread dump: %s\n", errText);
    }
        
	return((jint) retValue);
}

Remember – the file capture will not work here; it simply creates an empty file and the thread dump goes to the original STDOUT device.

Here is the command I used to create a Windows DLL using Microsoft Visual C++ 6.0:

cl -I. -I%JAVA_HOME%\include -I%JAVA_HOME%\include\win32 -LD libthreaddump_win32.c -Felibthreaddump.dll

That’s it. All the tools needed to request a thread dump any time you like. I used these tools to diagnose problems with an ATG application cluster, specifically to research problems being reported by the ATG ServerMonitor component. The ATG ServerMonitor issues warning and error log messages for various reasons, such as the JVM being low on memory or an application request thread executing for an extended period of time. In a future post I’ll discuss how I extended the ATG ServerMonitor to capture thread dumps under these conditions.


ATG Repository Distributed Caches, Part 5

October 20, 2007

This will be the final post in the series about ATG Repository cache invalidation. As described in part 1 of this series, ATG has a serious problem with cache invalidation in large clusters. The solution I selected implemented a reliable multicast group, based on JGroups, for each cluster to use in the distribution of invalidation events.

This solution has been installed for about a month on our production web sites, which consist of 15 clusters ranging in size from 20 to over 120 ATG instances. The JGroups implementation has performed admirably, working seamlessly to distribute invalidation events. During this time we have had a couple of instances of “idle threads”; however, the problem was limited to a single instance and did not affect the remainder of the cluster as it had in the past.

We are viewing this implementation as a complete success.

Our solution to the idle thread cache invalidation problem has been submitted to ATG development. We are hoping they will see fit to include this as a supported option for repository cache invalidation.


ATG Repository Distributed Caches, Part 4

September 2, 2007

Alternative 3

This series of posts, consisting of parts 1, 2, and 3, explores approaches to improving the distribution of repository cache invalidation events in a large ATG cluster. As mentioned the default ATG distributed and distributedJMS cache modes have some serious problems but, thanks to the configurability of PatchBay, distributedJMS offers us some options.

Using distributedJMS cache mode with a third party JMS provider offers a pretty good solution, although as I pointed out in part 2 this can be challenging using JBoss JMS.

While I was thinking about how I would prefer to distribute invalidation events, it occurred to me that IP multicast offered an almost perfect solution. With multicast each event would be sent only once and distributed to all multicast group members. Multicast across subnets can be problematic in most enterprises, as routers are generally not configured to pass the packets. This didn’t concern me, since all instances in each of my clusters run on the same subnet. Really the only downside I could think of to using multicast was that it is unreliable.

Now generally when you say a communication protocol is unreliable people immediately think, “That is totally unacceptable!“. But wait, is it really? Consider the following assertions:

  • Some applications simply don’t require reliability. Take, for example, an application distributing stock quotes every 10 seconds. If you miss a quote, how important is it that it be retransmitted to you? Not very, I’d say.
  • Multicast, or I should say UDP/IP in general, is pretty reliable if firewalls and routers are not involved. On the same subnet a very high percentage of packets will be delivered. In fact, the largest problem with lost UDP packets on the same subnet is application latency, i.e. the application being unable to consume packets fast enough. This is good news, as the problem can be solved at the application layer by adding caches or in-memory queues and processing in background threads.
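The application-latency point above is worth a quick sketch: drain packets onto a bounded in-memory queue as fast as they arrive and process them on a background thread, so the socket buffer never overflows. (The class and queue size here are illustrative, not from any ATG or JGroups code.)

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PacketDrain {
    // Bounded hand-off between the receive loop and the processing thread.
    final BlockingQueue<byte[]> queue = new ArrayBlockingQueue<byte[]>(10000);

    // Called from the tight receive loop: must never block the socket reader.
    boolean enqueue(byte[] packet) {
        return queue.offer(packet); // false == overflow; count it and move on
    }

    // Background thread drains and processes at its own pace.
    Thread startProcessor(final Runnable perPacketWork) {
        Thread t = new Thread("packet-processor") {
            public void run() {
                try {
                    while (true) {
                        byte[] packet = queue.take();
                        perPacketWork.run(); // real code would parse 'packet'
                    }
                } catch (InterruptedException e) {
                    // shutdown requested
                }
            }
        };
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```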

Given the above I felt that most of the time I would be fine distributing invalidation events via multicast, but, in my case, a lost event might result in a stale item in a cache. In eCommerce this could mean telling a customer that a back-ordered product was in stock: an FTC violation and a good way to disappoint customers. Neither is a desirable outcome.

As a result, I began looking for a package to add reliability onto a multicast solution, and what I found was JGroups. This product, which is used for cluster communications in JBoss and JBossCache, is a perfect fit. It offers a highly configurable protocol stack that allows as much (or as little) reliability as needed. In fact, if I have any complaint about JGroups it’s that the stack configuration has too many options. Fortunately the product is distributed with example configurations for various purposes and, IMHO, it would be wise to stick with one of them.
Ok, JGroups sounds like a good solution, but how do we configure it in our ATG cluster? If you are thinking of using the distributedJMS cache mode you’re on my wavelength. As the diagram on the left depicts, we can easily modify how invalidation events are distributed via this cache mode by extending ATG’s message source class atg.adapter.gsa.invalidator.GSAInvalidatorService. The image on the right was taken from the Eclipse Override/Implement Methods tool. By examining this information it is pretty clear that there are two methods that GSA may call to send an invalidation event. This pretty much jives with what we learned in part 2 about the message classes used to communicate invalidation information: atg.adapter.gsa.invalidator.GSAInvalidationMessage and atg.adapter.gsa.invalidator.MultiTypeInvalidationMessage. In fact, since one of these invalidate() methods takes a MultiTypeInvalidationMessage as a parameter, it is reasonable to assume that the other invalidate() method is expected to construct and send a GSAInvalidationMessage.

So, how does this all work? It’s pretty straightforward, really. When the GSA repository sends an invalidation event to the configured message source, our class extension is called via one of the invalidate() methods. Rather than placing the event on a JMS destination as PatchBay expects, we use JGroups to send the event to all group members. We use this same component (the extended message source) to register a JGroups Receiver which will be notified of incoming events. Our receiver code simply places the event on a LocalJMS topic which has been configured with the GSA Invalidation Receiver as a subscriber. The rest, as they say, is out of the box ATG.

One problem I encountered in early testing of this environment is related to two key facts:

  • By default a JGroups subscriber receives all group messages including the ones they send. This is how multicast works by default as well.
  • The ATG repository sends invalidation events before the transaction is committed. This is a real head scratcher, but it is documented to work this way and obviously does. ATG should reconsider this strategy, as it only makes sense to distribute an event after a transaction has successfully committed.

Given this information, here is what I found. The invalidation event was often being received by the instance that originated it before the repository transaction was committed. This created a race condition that sometimes resulted in a deadlock, as the Invalidation Receiver attempted to remove the repository item(s) from cache while another thread was committing them.

This issue was easily avoided by setting an option on the JGroups Channel that disables receipt of a member’s own messages.

One final issue I’d like to point out relates to multi-homed hosts. All of my test machines are multi-homed, and I discovered that sometimes the JGroups connection would be bound to interface A while other connections were bound to interface B. This resulted in the group being partitioned into two separate groups with the same name. Since the two groups were on different subnets (and our routers do not pass multicast packets), they didn’t know about each other.

There are a couple of solutions to this problem, but the simplest one I found was to configure the JGroups UDP layer with the option receive_on_all_interfaces="true". This allows a connection to receive packets on all interfaces configured on the machine.
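For reference, in a JGroups 2.x-era XML stack definition that option sits as an attribute on the UDP protocol element; something like the fragment below (the multicast address and port are placeholders, not our production values, and the rest of the stack is elided):

```
<config>
  <UDP mcast_addr="228.8.8.8"
       mcast_port="45566"
       receive_on_all_interfaces="true"/>
  <!-- ...remainder of the protocol stack (PING, NAKACK, etc.)... -->
</config>
```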

I currently have this configuration running in a test environment of 10 ATG instances and it is working very well. I can’t duplicate the volume or size of our production clusters in test so it remains to be seen how this will work under load. I’ll report back when I have more information but, for now, I remain optimistic.


ATG Repository Distributed Caches, Part 3

September 1, 2007

Alternative 2

In part 1 and part 2 of this series I discuss problems with, and potential solutions to, ATG distributed repository cache options. As a second alternative I considered completely replacing the cache used by the ATG Repository. I know, at first this sounds crazy, but if you think about it, why shouldn’t you be able to plug a new cache into ATG’s Repository?

If you spend even a few minutes researching this you will find, as I did, that ATG has not designed the repository cache to be replaced. At the time the Repository was designed this was probably a reasonable approach, but now that we have standards like JSR-107 JCACHE it seems rather limiting. My plan, the grand scheme as it were, was to convince my upper management to convince ATG’s upper management that supporting a plug-in repository cache would be in both ATG’s and their customers’ best interest.

Before initiating this plan I wanted to find a top notch enterprise cache that could be used as a replacement and get the people behind it involved as a partner. This was, for me at least, an easy selection. I have long been a fan of Tangosol (now Oracle) Coherence and this is the product I wanted to use as a distributed repository cache.

I had several virtual conversations with Cameron Purdy, Founder and CEO of Tangosol, and now something-or-the-other with Oracle. I’d met Cameron several years ago and I had briefly asked if any ATG customers were using Coherence. Apparently not. My hope was to get some of Cameron’s folks onboard and try to convince ATG to partner with them to offer Coherence as a plug in replacement for the ATG Repository cache.

I think Cameron was interested in this idea but, I suspect, he has so many things on his radar these days that this just hasn’t bubbled up to a visible position.

I still think this is a good idea but I need a solution now and things on this front are moving way too slowly so I set out to look at other alternatives.


ATG Repository Distributed Caches, Part 2

August 26, 2007

Alternative 1

In part 1 of this series I described briefly how ATG’s distributed cache invalidation works and the options it provides. I then described a serious production problem my company was encountering using the distributed cache-mode for large clusters.

As I mentioned in part 1, ATG’s newest cache-mode option, distributedJMS, appeared to offer a good alternative to the use of TCP connections for the distribution of cache invalidation events. The main problem with this approach is that, by default, it is based on ATG SQLJMS, which offers only polled, persistent destinations. If you are using ATG’s DPS module the configuration for distributedJMS cache mode is already in place. Otherwise you can follow the configuration examples in the ATG Repository User’s Guide.

A very important property, gsaInvalidatorEnabled, of component /atg/dynamo/Configuration must be set to true for distributedJMS cache invalidation to work.


Distribution of events via JMS is supported by a PatchBay message source/sink combination, which offers us the opportunity to override the definitions and use a 3rd party JMS provider. The advantage of using a real JMS provider is that message distribution is event driven rather than polled, and in-memory destinations may be used, avoiding disk/database I/O. The figure to the left depicts how JMS is used in event distribution. For each item descriptor defined with a cache mode of distributedJMS, ATG’s repository will route all invalidation events to a PatchBay message source defined by the class atg.adapter.gsa.invalidator.GSAInvalidatorService. The Nucleus component used by the repository may be set via the invalidatorService property on GSARepository. By default this component is located at /atg/dynamo/service/GSAInvalidatorService, but you can place it anywhere you like.

The actual cache invalidation takes place in the message sink, which is defined by the ATG class atg.adapter.gsa.invalidator.GSAInvalidationReceiver. This component receives events of the following types, resolves the name of the supplied repository component, and issues an invalidation request for the appropriate item descriptor/repository item(s).

  • atg.adapter.gsa.invalidator.GSAInvalidationMessage – defines a single repository item that should be flushed from the cache.
  • atg.adapter.gsa.invalidator.MultiTypeInvalidationMessage – defines one or more repository items that should be flushed from the cache. All the items defined in this message must belong to the same repository.

The default PatchBay configuration for these components looks like the following:

<dynamo-message-system>
  <patchbay>
    <!-- DAS Messages -->
    <message-source>
      <nucleus-name>/atg/dynamo/service/GSAInvalidatorService</nucleus-name>
      <output-port>
        <port-name>GSAInvalidate</port-name>
        <output-destination>
          <provider-name>sqldms</provider-name>
          <destination-name>sqldms:/sqldms/DASTopic/GSAInvalidator</destination-name>
          <destination-type>Topic</destination-type>
        </output-destination>
      </output-port>
    </message-source>
    <message-sink>
      <nucleus-name>/atg/dynamo/service/GSAInvalidationReceiver</nucleus-name>
      <input-port>
        <port-name>GSAInvalidate</port-name>
        <input-destination>
          <provider-name>sqldms</provider-name>
          <destination-name>sqldms:/sqldms/DASTopic/GSAInvalidator</destination-name>
          <destination-type>Topic</destination-type>
        </input-destination>
      </input-port>
    </message-sink>
  </patchbay>
</dynamo-message-system>

Ok, so my first thought was to modify this configuration to use JBoss as the JMS provider. I considered one of the fine stand-alone JMS providers like Fiorano or Sonic, and I think these would have worked just fine. We are currently still running on DAS but expect to move to JBoss over the next year, so using JBoss seemed like a natural fit. I promptly overrode the above configuration like this:

<dynamo-message-system>
  <patchbay>
    <provider>
      <provider-name>JBoss</provider-name>
      <topic-connection-factory-name>ConnectionFactory</topic-connection-factory-name>
      <queue-connection-factory-name>ConnectionFactory</queue-connection-factory-name>
      <xa-topic-connection-factory-name>XAConnectionFactory</xa-topic-connection-factory-name>
      <xa-queue-connection-factory-name>XAConnectionFactory</xa-queue-connection-factory-name>
      <supports-transactions>true</supports-transactions>
      <supports-xa-transactions>true</supports-xa-transactions>
      <username></username>
      <password></password>
      <client-id></client-id>
      <initial-context-factory>/my/utils/jms/J2EEInitialContextFactory</initial-context-factory>
    </provider>

    <message-source xml-combine="replace">
      <nucleus-name>/atg/dynamo/service/GSAInvalidatorService</nucleus-name>
      <output-port>
        <port-name>GSAInvalidate</port-name>
        <output-destination>
          <provider-name>JBoss</provider-name>
          <destination-name>/topic/GSAInvalidator</destination-name>
          <destination-type>Topic</destination-type>
        </output-destination>
      </output-port>
    </message-source>

    <message-sink xml-combine="replace">
      <nucleus-name>/atg/dynamo/service/GSAInvalidationReceiver</nucleus-name>
      <input-port>
        <port-name>GSAInvalidate</port-name>
        <input-destination>
          <provider-name>JBoss</provider-name>
          <destination-name>/topic/GSAInvalidator</destination-name>
          <destination-type>Topic</destination-type>
        </input-destination>
      </input-port>
    </message-sink>
  </patchbay>
</dynamo-message-system>

Notice that you have to define a component that is used to obtain an initial context from the 3rd party JMS provider. ATG’s documentation covers this in detail so I won’t go into it here.

After setting up this configuration with a properly configured JBoss server I was distributing invalidation events and things were looking great. That’s always the time a nasty problem arises and this situation was no different.

I had tested this configuration but, of course, for our production environment we wanted to run a cluster of JBoss instances to provide high availability. The problem I encountered is that JBoss supports two different JMS providers:

  1. JBossMQ – offers a highly available singleton JMS service and is the out of the box configuration for all JBoss 4.2 and earlier versions. This implementation supports Java 1.4.
  2. JBoss Messaging – offers a highly available distributed message service but requires Java 1.5+. May be configured in JBoss 4.2 and will be the out of the box configuration in the next JBoss release.

We currently run ATG 7.2 under Java 1.4 and I wanted to keep our JBoss servers at the same level if possible, so I decided to use JBoss 4.0.5 and JBossMQ. The first problem I encountered was that even though JBossMQ supports high availability, it does so with the assistance of its clients. JBossMQ expects all its clients to register a JMS ExceptionListener and to handle connection failures by reopening the connection and re-creating all JMS objects when a failure occurs. Clearly this wasn’t going to work for ATG PatchBay – I needed transparent fail over.

My next approach was to use JBoss 4.2 with JBoss Messaging. This required the JBoss servers to run on Java 1.5, but I figured I could live with that until we moved to ATG 2007.1. Of course this didn’t work, as the JBoss 4.2 client jars were compiled for Java 1.5 and all my ATG instances were running under 1.4. This was starting to look like more trouble than it was worth, but I pressed on and ran all the JBoss client jars through Retroweaver and deployed them under Java 1.4. This looked promising until I connected to the JBoss instance and pulled back an InitialContext: the stub that was returned required Java 1.5. I may have been able to work around this, but I gave up on JBoss 4.2.

Now a reasonable person would have given up on JBoss at this point and perhaps purchased SonicMQ. Instead I set about writing a JMS mapping layer that would sit between PatchBay and JBossMQ and perform transparent fail over. What I did was use a decorator pattern to wrap every JBoss JMS class with my own that knew how to recreate itself in the event of a fail over. This wasn’t difficult but it involved a fair amount of coding.
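The decorator idea is simple to picture: wrap each JMS object in a stand-in that holds a factory for recreating the real thing, and on a connection failure rebuild the delegate and retry. The sketch below uses a hypothetical Sender interface in place of the javax.jms types (the real layer wraps Connection, Session, publishers, and so on the same way):

```java
public class FailoverSender {
    // Minimal stand-ins for a JMS publisher and its factory; purely
    // illustrative - the real code decorates the javax.jms classes.
    interface Sender {
        void send(String message); // throws RuntimeException on broker failure
    }
    interface SenderFactory {
        Sender create();
    }

    private final SenderFactory factory;
    private Sender delegate;

    FailoverSender(SenderFactory factory) {
        this.factory = factory;
        this.delegate = factory.create();
    }

    // Decorated send(): on failure, recreate the delegate and retry once,
    // making the fail over transparent to the caller.
    void send(String message) {
        try {
            delegate.send(message);
        } catch (RuntimeException failed) {
            delegate = factory.create(); // reconnect to a surviving broker
            delegate.send(message);      // single retry; rethrow if this fails
        }
    }
}
```

The same pattern repeats for every wrapped JMS type, which is why it involved a fair amount of coding despite not being difficult.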

I actually got this approach working and it appeared to work very well but then I had another idea and I set this option aside for the time being.

By the way, the JBossMQ transparent fail over layer is not specific to PatchBay; if anyone has a need for this I can probably arrange to give you the code.

That wraps up part 2 of this series. Stay tuned for my second alternative presented in part 3.