What is 'Reliable' Messaging?

Customers often say 'I want reliable messaging'. At that point I ask lots of questions, the main one being 'what happens if you lose that message?'. Only then can I start to work out what *they* mean by 'reliable'.

In this blog I'd like to cover the messaging terms 'persistence', 'reliability', 'transactionality' and 'acknowledgements'. These terms all come up when talking about messaging, and the problem is how easily they get used interchangeably. Alongside them sit the more specification-like terms: at-most-once, at-least-once and once-and-once-only.


There are many messaging providers in the market: IBM MQ, WSO2 Message Broker and ActiveMQ, to name some of the key players. Some providers aim for speed, some for more complex scenarios, some for usability, some for a small overall message size, some for sheer market presence, and so on. Whichever one you choose, they will all talk about the following qualities of service; these are the standards they all try to adhere to, regardless of whatever else they are trying to achieve.

Reliability

Let's get one thing straight before we go any further: 'reliability' is a catch-all term, not an official one that you'll see in, for example, the JMS API. All of the following characteristics can be thought of as helping you achieve 'reliability', just different levels of it.

NOTE: The JMS specification does put persistent messaging under the banner of 'reliability', but 'reliability' is about far more than just message persistence.

Specification terminology

When specifying a messaging scenario, many people use the terms 'At-<Something>-Once'. Let's look at those terms first.

At-Most-Once

This term specifies that the system being built can tolerate losing the odd message. However, it also says that the solution cannot tolerate the same message being processed twice.

At-Least-Once

This specification says that a message can be processed more than once. We, as the application writers, will either work out which of the copies we want to keep, or simply process the message twice because it doesn't matter. We are, therefore, absolving the messaging provider of all sorts of worries about duplicate messages. However, we are not absolving the messaging provider of losing the message, because we asked for *at least* once.

Once-And-Once-Only

This is the mother of all messaging requirements: the message must be processed at its destination exactly once. So, no duplicates please, and don't lose the message either, as it's important.

Messaging Terminology

OK, so now we know what the specification might look like, let's look at how to express it to the messaging provider in messaging terms.


Acknowledgements

An acknowledgement tells the server that the message has arrived at the client and is now in the client's hands, so it can be deleted from the queue. The acknowledgement mode decides who sends that acknowledgement, and when.

JMS defines three acknowledgement modes:

AUTO_ACKNOWLEDGE

With this mode the messaging provider handles the underlying acknowledgements on behalf of the client application. This means there is a protocol exchange between the provider's client-side code (the client library your application uses to receive messages, rather than your application itself) and the server. This is the most used acknowledgement mode, as it gives you a level of 'reliability' without having to think too much ;-)

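Unfortunately, there is a clear 'window of opportunity' here where messages can be lost, as shown below. With a synchronous receive, the acknowledgement is sent as the call to receive() returns, so if the application fails after that point but before it has finished processing the message, the message is already gone from the queue.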
[Figure: Potential Message Loss in Synchronous AUTO_ACK]


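The above diagram shows the potential loss in the synchronous receive case. The asynchronous case differs subtly: there a message may be received twice, because the acknowledgement is only sent once the message listener's onMessage() returns. If the client fails after processing the message but before that acknowledgement reaches the server, the message is redelivered.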

[Figure: Potential Message Duplication in Asynchronous AUTO_ACK]
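
To make the two windows concrete, here is a toy Python model. It is an assumption for illustration, not any provider's real code: the 'server' is a plain list, and a message stays on it until it is acknowledged. The only difference between the two hypothetical functions is when the acknowledgement happens relative to the processing.

```python
class Crash(Exception):
    """Simulated client failure."""

def run_sync(queue, process, crash_after_receive):
    # Synchronous receive: the provider acknowledges as receive() returns,
    # so the message leaves the server before the application processes it.
    msg = queue.pop(0)
    if crash_after_receive:
        raise Crash()        # acknowledged but never processed: lost
    process(msg)

def run_async(queue, process, crash_before_ack):
    # Asynchronous listener: the provider acknowledges after onMessage()
    # returns, so processing happens while the message is still on the server.
    msg = queue[0]
    process(msg)
    if crash_before_ack:
        raise Crash()        # processed but never acknowledged: redelivered
    queue.pop(0)

def deliver(mode, crash):
    queue, processed = ["order-1"], []
    runner = run_sync if mode == "sync" else run_async
    try:
        runner(queue, processed.append, crash)
    except Crash:
        pass
    # After the client restarts, anything still on the queue is redelivered.
    while queue:
        runner(queue, processed.append, False)
    return processed

print(deliver("sync", crash=True))    # [] : the message was lost
print(deliver("async", crash=True))   # ['order-1', 'order-1'] : a duplicate
```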

CLIENT_ACKNOWLEDGE

In this mode the client application explicitly acknowledges receipt of the message. There is still potential for duplicate messages. One main cause is a failure that happens just after the client has processed the message but before it has acknowledged it, as shown below.
[Figure: Potential Message Duplication in CLIENT_ACK mode]
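
A similar toy sketch of CLIENT_ACK, assuming a hypothetical `ToyServer` that holds messages until the client explicitly acknowledges them. One simplification to be aware of: in real JMS, calling `acknowledge()` on a message acknowledges every message the session has consumed so far, whereas this toy acknowledges one message at a time.

```python
class Crash(Exception):
    """Simulated client failure."""

class ToyServer:
    """Hypothetical server that keeps messages until they are acknowledged."""
    def __init__(self, messages):
        self.unacked = list(messages)

    def receive(self):
        return self.unacked[0] if self.unacked else None

    def acknowledge(self, msg):
        self.unacked.remove(msg)

def consume(server, handled, crash_between_process_and_ack=False):
    while (msg := server.receive()) is not None:
        handled.append(msg)                  # do the work first...
        if crash_between_process_and_ack:
            raise Crash()                    # ...then fail before acknowledging
        server.acknowledge(msg)              # the explicit client acknowledgement

server, handled = ToyServer(["a", "b"]), []
try:
    consume(server, handled, crash_between_process_and_ack=True)
except Crash:
    pass
consume(server, handled)     # client restarts; "a" is redelivered
print(handled)               # ['a', 'a', 'b']
```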

DUPS_OK_ACKNOWLEDGE

This is like AUTO_ACK mode in that it is up to the underlying JMS provider's client code to acknowledge the messages. The difference between DUPS_OK and AUTO_ACK is that the JMS provider is allowed to acknowledge messages lazily, which may mean it acknowledges them in batches. This opens an even larger window of opportunity for duplicate messages.
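
The batching effect can be sketched the same way. This toy consumer is hypothetical and assumes the provider acknowledges only after every `batch_size` messages; it counts how many duplicates appear after a crash part-way through a batch. The bigger the batch, the more messages get redelivered.

```python
class Crash(Exception):
    """Simulated client failure."""

def consume_lazily(queue, handled, batch_size, crash_after=None):
    # `queue` is the server-side list; acknowledging removes messages from
    # its front. Messages are only acknowledged once `batch_size` of them
    # have been processed (a hypothetical lazy policy).
    unacked = 0
    seen = 0
    while unacked < len(queue):
        handled.append(queue[unacked])
        unacked += 1
        seen += 1
        if seen == crash_after:
            raise Crash()              # fail before the batch is acknowledged
        if unacked == batch_size:
            del queue[:unacked]        # one acknowledgement covers the batch
            unacked = 0
    del queue[:unacked]                # acknowledge the final partial batch

def duplicates_after_crash(batch_size, crash_after=2):
    queue = [f"m{i}" for i in range(1, 6)]
    handled = []
    try:
        consume_lazily(queue, handled, batch_size, crash_after)
    except Crash:
        pass
    consume_lazily(queue, handled, batch_size)     # restart without a crash
    return len(handled) - len(set(handled))

print(duplicates_after_crash(batch_size=1))   # 1
print(duplicates_after_crash(batch_size=3))   # 2
```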


Persistence

Persistence refers to the messaging provider serialising the message to disk so that it can survive a server failure. Let's look at non-persistent messaging first.

Non-Persistent Messages

Pretty much everyone knows what this means. When a message is classed as non-persistent, it's saying to the messaging provider 'hey, this is a lovely message, but don't worry if you lose it somewhere'. The usual scenario for this is when another message is coming along any time now, so 'if you lose this one, Mr Provider, it won't worry me, because I've got another one with a more recent update coming anyhow.'

Be aware: the messaging provider could interpret this as declaring 'I don't care too much about this message'. Sure, most messaging providers will try their best to get your message where it's going but, when the going gets tough, this message might just be the first one to go. So, be careful what you wish for. 'What can go wrong?' you may think. Well, consider things like server memory overruns, or SLAs not being met for the persistent messages, and much more: that is where your non-persistent messages may well just disappear, gone forever!

Another brief side note here. Lots of customers get confused, believing that once a message is 'on the queue' it won't get lost if the messaging server fails, even though it is non-persistent. If a message is classed as non-persistent, the chances are high that the messaging provider never wrote it to disk (in order to increase performance). That, of course, means the message will disappear if the server fails.
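
In JMS the choice is made through the delivery mode, `DeliveryMode.PERSISTENT` or `DeliveryMode.NON_PERSISTENT`, set on the producer with `MessageProducer.setDeliveryMode` or passed to `send()`. The toy broker below is a hypothetical illustration of the consequence described above: persistent messages go to a 'disk' list that survives a restart, while non-persistent ones live only in memory.

```python
class ToyBroker:
    """Hypothetical broker: persistent messages are kept on 'disk',
    non-persistent ones only in memory."""
    def __init__(self):
        self.disk = []      # survives a restart
        self.memory = []    # lost on restart

    def put(self, body, persistent):
        (self.disk if persistent else self.memory).append(body)

    def restart(self):
        self.memory = []    # a server failure wipes in-memory state

    def contents(self):
        return self.disk + self.memory

broker = ToyBroker()
broker.put("price=101", persistent=False)   # already 'on the queue'...
broker.put("trade-42", persistent=True)
broker.restart()
print(broker.contents())                    # ['trade-42']
```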

Persistent Messaging

Customers often tell me they want 'persistence' because they believe that with 'persistent messaging' they won't lose messages. Unfortunately, this isn't quite true. We have seen how message loss can occur when using AUTO_ACK mode, and that will happen regardless of whether the message was persistent or not!

Another area where messages can get lost is an ESB-like environment. Very often, in a message-driven sequence in an ESB, a message is taken from the queue and then processed. Unfortunately, most ESB providers acknowledge the message within the node that takes it off the queue. So, as the message progresses along the sequence, the ESB could fail and the message is lost, persistence or no persistence, as seen below.
[Figure: Potential Message Loss in an ESB Environment]
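
A minimal sketch of that ESB failure, assuming a hypothetical flow in which the input node removes (and so acknowledges) the message before the later steps run. Once the input node has taken the message, the only copy lives in the flow's memory.

```python
class Crash(Exception):
    """Simulated ESB failure part-way through a sequence."""

def run_sequence(input_queue, output_queue, steps, fail_at=None):
    # The input node takes the message off the queue and acknowledges it
    # straight away, before any of the later steps have run.
    msg = input_queue.pop(0)
    for i, step in enumerate(steps):
        if i == fail_at:
            raise Crash()        # the only copy of msg was in this flow
        msg = step(msg)
    output_queue.append(msg)

inq, outq = ["order-7"], []
try:
    run_sequence(inq, outq, [str.upper, lambda m: m + ":enriched"], fail_at=1)
except Crash:
    pass
print(inq, outq)   # [] [] : the message is on neither queue
```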


Transactional messaging

Regardless of which acknowledgement mode is used, for many applications the processing of a message is not confined to the consumer that initially receives it. Very often consumers forward the message on to other systems or threads to complete the processing. Consider the ESB flow above, where removing the message from the queue is just the beginning of the processing.

At this point some people get confused: many people think of transactions as something databases do and have never associated them with messaging before. (By the way: in JMS, if a session is transacted, the acknowledgement mode is ignored; messages are acknowledged when the transaction commits.)

If we now review our previous scenario we can see that the message is under a transaction all the way through the flow.
[Figure: Messages Under A Transaction in an ESB Flow]

From the persistent message on the input queue to the persistent message on the output queue, the message is under one transaction. If the system fails, we will at least have a transaction log available to us or, preferably, a commit failure and a rolled-back message back on the input queue. If we also make sure that any work done while processing the message in the sequence (like altering a database) is under the same transaction, then we know that all of that work will be rolled back as well.
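
The all-or-nothing behaviour can be sketched in a few lines. In JMS a transacted session is created with `createSession(true, Session.SESSION_TRANSACTED)` and finished with `commit()` or `rollback()`; spanning a database as well needs a transaction manager coordinating both resources. The toy below only mimics the outcome, assuming the whole state (input queue, database, output queue) can be staged as a copy and applied on success. It is not how a real transaction manager works, and `run_in_transaction` and `flow` are hypothetical names.

```python
import copy

class Crash(Exception):
    """Simulated failure part-way through the flow."""

def run_in_transaction(state, work):
    # Stage every change on a copy; apply it only if the work completes.
    staged = copy.deepcopy(state)
    try:
        work(staged)
    except Crash:
        return False           # rollback: the staged copy is discarded
    state.update(staged)       # commit: all changes land together
    return True

def flow(s, fail=False):
    msg = s["input"].pop(0)            # get from the input queue
    s["db"][msg] = "processed"         # database update
    if fail:
        raise Crash()
    s["output"].append(msg.upper())    # put to the output queue

state = {"input": ["order-9"], "db": {}, "output": []}
print(run_in_transaction(state, lambda s: flow(s, fail=True)), state)
# False {'input': ['order-9'], 'db': {}, 'output': []}
print(run_in_transaction(state, flow), state)
# True {'input': [], 'db': {'order-9': 'processed'}, 'output': ['ORDER-9']}
```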

Relating Solution Specification to Messaging Constructs

Probable Mappings from Specification requirements to ACK_Modes and Persistence

At-Most-Once

We can infer that this is saying 'we can lose a message', in which case we are saying this can be a non-persistent message (and thereby, hopefully, gain performance). At-Most-Once is certainly declaring that duplicates are not allowed. We have seen that duplicates are a potential problem in both AUTO_ACK (asynchronous and synchronous) and CLIENT_ACK, so we need to code around the possibility of receiving duplicate messages, whichever acknowledgement mode we use. The alternative is to ensure that the message handling is done under a transaction, and that any changes we make to external entities (e.g. a database) are done under the same transaction.
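
One common way to 'code around' duplicates is an idempotent consumer that remembers which messages it has already handled. The sketch below is hypothetical: it keys on a message ID (in JMS, `getJMSMessageID()` is one candidate, although a message re-sent by the producer gets a new ID, so an application-level key can be safer) and keeps the seen IDs in memory. A real version would need to store them durably, ideally in the same transaction as the handler's own work; otherwise a crash between the two still lets a duplicate through.

```python
def make_idempotent(handler):
    """Wrap `handler` so a message whose ID was seen before is skipped.
    The seen-ID set lives in memory here, which is a simplification."""
    seen = set()

    def on_message(message_id, body):
        if message_id in seen:
            return False          # duplicate: ignore it
        handler(body)
        seen.add(message_id)
        return True

    return on_message

applied = []
consumer = make_idempotent(applied.append)
for mid, body in [("ID:1", "debit 10"), ("ID:1", "debit 10"), ("ID:2", "credit 5")]:
    consumer(mid, body)
print(applied)   # ['debit 10', 'credit 5']
```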

At-Least-Once

This time we cannot infer that we can lose messages. If this is confirmed (after further conversations with the customer ;-) then we have to use persistent messages. This level of specification says that we can have duplicate messages, so DUPS_OK is fine as the acknowledgement mode. If, for instance, processing the duplicates becomes burdensome, then CLIENT_ACK would be a more sensible choice. Alternatively, AUTO_ACK with asynchronous delivery could be appropriate. Of course, in the ESB scenario above, transactions are essential.

Once-And-Once-Only

Clearly this is saying that we don't want to lose messages, so persistent messaging is a must. Given that duplicates are not allowed either, transactions are the obvious method of acknowledgement.
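
Pulling the three mappings together, here is a hypothetical helper that encodes the reasoning above as two questions: can we lose a message, and can we process one twice? The returned strings simply restate this post's recommendations. The case where both are acceptable isn't discussed here, so the sketch refuses it rather than guess.

```python
def choose_settings(may_lose, may_duplicate):
    """Map a requirement to (persistence, acknowledgement approach),
    following the probable mappings described in the text."""
    if may_lose and not may_duplicate:        # at-most-once
        return ("non-persistent",
                "any ack mode plus duplicate detection, or transactions")
    if may_duplicate and not may_lose:        # at-least-once
        return ("persistent",
                "DUPS_OK (or CLIENT_ACK / async AUTO_ACK; transactions in an ESB)")
    if not may_lose and not may_duplicate:    # once-and-once-only
        return ("persistent", "transactions")
    raise ValueError("losing and duplicating both allowed: not covered here")

print(choose_settings(may_lose=False, may_duplicate=False))
# ('persistent', 'transactions')
```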


Client persistence

Yes, it gets more confusing, I'm afraid...

Although we may well have declared our message as 'persistent' and put it under a transaction, many providers regard persistence as the message being persisted on the server. So there is a network gap, between our client and our server, where our message isn't persistent. This is why, for instance, IBM MQ was originally designed with only local clients in mind, i.e. clients running in, or very near to, the same memory space as the MQ server, minimising the distance between server and client.

To get around this problem, there are now clients available that persist the message in the same location as the client. This means the client and the persistence mechanism are usually in the same memory space, which 'almost' guarantees that if the messaging client acknowledges the message and you asked for persistence, then you've got it: the message will be safely stored on your local client in case of failure. The client will then attempt to send the message to the server (usually retrying, and all hopefully under a transaction) as and when the client and server are both available and able to connect to each other.
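
A minimal store-and-forward sketch of such a client, using Python's built-in `sqlite3` as the local store. The `LocalOutbox` class is hypothetical, not the API of any real MQ client: `send()` only returns once the message is committed locally, and `forward()` delivers stored messages when the server is reachable. The demo uses an in-memory database; to survive a client crash you would pass a file path. Note that a crash between a successful transmit and the local delete would resend that message, so on its own this gives at-least-once delivery.

```python
import sqlite3

class LocalOutbox:
    """Hypothetical client-side store-and-forward queue."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox (id INTEGER PRIMARY KEY, body TEXT)")

    def send(self, body):
        with self.db:    # commits on success, rolls back on an exception
            self.db.execute("INSERT INTO outbox (body) VALUES (?)", (body,))

    def forward(self, transmit):
        # Deliver stored messages in order; stop at the first failure.
        rows = self.db.execute("SELECT id, body FROM outbox ORDER BY id").fetchall()
        for row_id, body in rows:
            try:
                transmit(body)
            except ConnectionError:
                return   # server unavailable: keep the rest for a retry
            with self.db:
                self.db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))

    def pending(self):
        return [b for (b,) in self.db.execute("SELECT body FROM outbox ORDER BY id")]

outbox = LocalOutbox()
outbox.send("trade-1")
outbox.send("trade-2")

def server_down(body):
    raise ConnectionError("server unreachable")

outbox.forward(server_down)
print(outbox.pending())              # ['trade-1', 'trade-2']

delivered = []
outbox.forward(delivered.append)     # server is back
print(delivered, outbox.pending())   # ['trade-1', 'trade-2'] []
```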

Summary

In this blog I have tried to get across some of the complexities behind the simple phrase 'reliable messaging'. It can be very easy to read an API and feel that you have achieved your goals. In-depth analysis of the application, its messages and its interactions with external systems all need to be part of designing your solution.


