MQ Poison Messages and Backout Queues
In this post, Connor Smith and I give a brief overview of poison messages, how the various IBM MQ clients handle them, and what you need to do as an application writer.
WebSphere Application Server (WAS) &
Paul Titheridge has done an excellent write-up of how WAS handles rollbacks, and I won’t repeat it here. In summary, he concludes that you really need both the backout threshold and the backout queue name defined on your original queue to avoid poison messages being repeatedly read in and creating a loop, just as you do for JMS.
Poison Messages
A poison message is a message that a receiving application is unable to process. The following are some possible reasons for this…
- The IBM MQ client code itself fails to handle the message due to e.g. a bad header in the message.
- The message causes an error in the underlying system e.g. in the XML parser.
- The message has bad data in it which the application is not expecting and causes an exception in the application.
The above examples show that the error caused by a poison message is not necessarily something that the application itself should have foreseen.
Transactional rollbacks
Once a message has been found to be poisoned, the next question is what to do about it. If the message is GOT from the queue within a transaction and the failure happens while the message is being processed within the same transaction, then the message will automatically roll back onto the queue and be available to be re-processed.
Figure 1: Transaction rollback
The figure above shows the result of just such a transaction rollback. The message is removed from the queue, processed until an exception occurs, and then rolled back onto the queue, ready to be read again. There are a number of caveats to this method…
- The message must be read from the queue under the transaction.
- The message must be processed under that transaction.
- The exception that causes the message to fail processing must not be caught. If the exception is caught, then it is up to the coder to figure out what to do next (options include throwing another exception and causing the message to roll back!).
This mechanism works fine if we have a failure that won’t happen again. However, if the message is poisoned such that it fails to get processed every time it is read from the queue, then we need to think about a way to stop processing the message and deal with it differently. This could be handled by a try-catch block in some circumstances, but let’s assume that we failed to put those in correctly, or have chosen, or been forced, to use other methods.
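To make the loop concrete, here is a minimal sketch of the rollback behaviour described above. It is a toy in-memory model, not the IBM MQ client API: the `Message` class, the `consume` function and the `max_reads` guard are all hypothetical names invented for the illustration, and `consume` plays the part of whatever owns the transaction (the container or the calling code).

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    body: str
    backout_count: int = 0  # stands in for the BackoutCount field of the message


def consume(queue, handler, max_reads):
    """Toy transaction owner: an error raised by the handler rolls the message back."""
    reads = 0
    while queue and reads < max_reads:  # max_reads only exists to stop the demo
        msg = queue.popleft()           # GET under the (simulated) transaction
        reads += 1
        try:
            handler(msg)                # process under the same transaction
        except Exception:
            msg.backout_count += 1      # assumption: models the count bump on backout
            queue.appendleft(msg)       # rollback: the message is on the queue again
        # no exception means commit: the message stays removed
    return reads


def handler(msg):
    # Hypothetical application logic that can never handle this payload.
    if msg.body == "poison":
        raise ValueError("cannot parse message")


queue = deque([Message("poison"), Message("ok")])
reads = consume(queue, handler, max_reads=5)
print(reads, queue[0].backout_count, len(queue))
```

In this model the poisoned message is read on every pass, its count keeps climbing, and the healthy message behind it is never reached. Without the artificial `max_reads` guard the loop would not end.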
Backout Count and Thresholds
Each time a message is backed out to the queue, MQ itself increments the BackoutCount field in the message descriptor (MQMD). This means that the next time the message is GOT from the queue, its BackoutCount will be greater than zero.
The next thing to consider is what to do with the knowledge that a newly received message has already been through a failed processing attempt and been rolled back.
IBM MQ gives the administrator the ability to set a field on the queue called the Backout Threshold. This field is a hint to the developer so that they can decide what to do if the BackoutCount of a newly GOT message is equal to or greater than the BackoutThreshold of the queue that they just read the message from. Yes, it’s "just a hint"; we’ll see why in a minute. Also, be aware that the default threshold is zero, which means that the threshold is not set and cannot be used to ascertain whether the message has been rolled back too many times.
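The comparison described above is small enough to sketch directly. The function name `should_back_out` and its parameters are hypothetical; a real application would read the two values from the message and from the queue it was read from.

```python
def should_back_out(backout_count: int, backout_threshold: int) -> bool:
    """Return True when a message has been rolled back too many times."""
    if backout_threshold <= 0:
        # A threshold of zero is the default and means "not set",
        # so it cannot be used to make the decision.
        return False
    return backout_count >= backout_threshold


print(should_back_out(backout_count=3, backout_threshold=3))
print(should_back_out(backout_count=2, backout_threshold=3))
print(should_back_out(backout_count=5, backout_threshold=0))
```

Treating zero as "never back out" is one reading of "not set". An application could equally choose its own fallback limit in that case.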
Backout Queues
IBM MQ lets the MQ administrator define, on the queue, which other queue they expect the developer to place a message on if that message has failed the backout threshold test.
So, if we put those pieces together, we can see that the developer is meant to read a message from a queue and look at the message’s backout count. They should then compare that with the queue’s backout threshold property (which they should have acquired earlier in the code). If the message’s backout count equals or exceeds the queue’s backout threshold, then the developer should retrieve the name of the backout queue, as defined on the queue the message was read from, and PUT the message onto that backout queue.
NOTE: If the administrator has not set a backout queue on the original queue, then the message should be PUT to the Dead Letter Queue.
This sounds like a lot of work, and it is! That is why there is so much confusion in this area. However, although these were the original semantics of IBM MQ, newer messaging protocols and solutions have come along that make this job a little easier for the developer.
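The steps above, including the Dead Letter Queue fallback, can be sketched as a single routing decision. This is a sketch under stated assumptions rather than IBM MQ client code: `QueueAttrs`, `choose_destination` and the queue names `ORDERS.BACKOUT` and `APP.DLQ` are hypothetical, and the attribute values are assumed to have been read from the source queue earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueueAttrs:
    """Hypothetical holder for attributes read from the source queue."""
    backout_threshold: int = 0    # zero means the threshold is not set
    backout_queue_name: str = ""  # blank means no backout queue is defined


def choose_destination(backout_count: int, attrs: QueueAttrs,
                       dead_letter_queue: str) -> Optional[str]:
    """Return the queue to PUT the message to, or None to process it normally."""
    if attrs.backout_threshold <= 0:
        return None  # no usable threshold, so no backout decision is possible
    if backout_count < attrs.backout_threshold:
        return None  # not rolled back often enough yet
    name = attrs.backout_queue_name.strip()
    # Fall back to the dead letter queue when no backout queue is defined.
    return name if name else dead_letter_queue


with_backout_queue = QueueAttrs(backout_threshold=3, backout_queue_name="ORDERS.BACKOUT")
without_backout_queue = QueueAttrs(backout_threshold=3)

print(choose_destination(1, with_backout_queue, "APP.DLQ"))
print(choose_destination(3, with_backout_queue, "APP.DLQ"))
print(choose_destination(3, without_backout_queue, "APP.DLQ"))
```

Returning `None` keeps the normal processing path separate from the requeue path, so the caller only performs the extra PUT when a destination comes back.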