Thoughts on Ordered Messaging

Customers often say that they want 'ordered messaging', and many messaging providers say that they will give it to you. You'll see concepts like 'ordered queues' discussed in many products. However, as always with one-liners, the devil is in the details.

When is a timestamp created?

Timestamps are at the heart of ordering. They answer the core question behind any ordered system: 'which came first?'

Nearly all messaging systems can timestamp a message, but it's important to understand *when* the timestamp is created.

JMS defines the timestamp as being set in the messaging provider's client when the message is sent, before the send call returns to the application. You can also switch timestamps off entirely for speed with MessageProducer.setDisableMessageTimestamp, although this is only a hint that the provider may ignore. AMQP defines only the wire protocol, so it doesn't define who sets the timestamp or when, or whether anyone sets it at all.

Take this a step further. In a multi-broker environment, the timestamp may be set when the message arrives on a queue. So which timestamp should we be ordering by? Is it the timestamp set when the message was put on the first queue, or the timestamp set by the broker where it finally landed?
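
To make the question concrete, here is a toy simulation. It is not any broker's API. Two messages pass through two hypothetical brokers, "broker-a" and "broker-b", and each broker records its own arrival time. All names and times are made up for illustration. Ordering by the client timestamp, the first broker's timestamp, or the last broker's timestamp does not always give the same answer.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    body: str
    client_ts: float  # set by the producer's client at send time (JMS-style)
    broker_ts: dict = field(default_factory=dict)  # broker name -> arrival time


def hop(msg: Message, broker: str, arrival_time: float) -> Message:
    """Record when a (hypothetical) broker received the message."""
    msg.broker_ts[broker] = arrival_time
    return msg


# Two messages sent 10 ms apart; the first is delayed on its second hop.
m1 = Message("order-created", client_ts=100.000)
m2 = Message("order-cancelled", client_ts=100.010)
hop(m1, "broker-a", 100.002)
hop(m2, "broker-a", 100.012)
hop(m1, "broker-b", 100.050)
hop(m2, "broker-b", 100.020)

msgs = [m1, m2]
by_client = [m.body for m in sorted(msgs, key=lambda m: m.client_ts)]
by_first_broker = [m.body for m in sorted(msgs, key=lambda m: m.broker_ts["broker-a"])]
by_last_broker = [m.body for m in sorted(msgs, key=lambda m: m.broker_ts["broker-b"])]

print(by_client)        # ['order-created', 'order-cancelled']
print(by_first_broker)  # ['order-created', 'order-cancelled']
print(by_last_broker)   # ['order-cancelled', 'order-created']
```

Only the last ordering puts the cancellation before the creation. Which of the three is "right" is a judgement call that the messaging system cannot make for you.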

At this point, then, some judgement calls have to be made about what we mean by order and by timestamp. The true answer may well be in the body of the message. If the body contains a timestamp, then perhaps that timestamp defines the order of the message. In such cases the messaging system usually can't help. Whatever the reason for wanting ordered messages, it's up to the application to order by the timestamp in the message body and not by the timestamp in the message's metadata.
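
Here is a minimal sketch of the application doing that ordering itself. It assumes JSON bodies with a hypothetical "created_at" field in ISO-8601 format. Neither the field name nor the format comes from any messaging standard; they stand in for whatever the application actually puts in its messages.

```python
import json
from datetime import datetime

# Each received message: (metadata timestamp set by the messaging system, raw body).
# "created_at" is a hypothetical application-level field, not a broker feature.
received = [
    (1_700_000_005.0, '{"id": "b", "created_at": "2023-11-14T22:13:21+00:00"}'),
    (1_700_000_006.0, '{"id": "a", "created_at": "2023-11-14T22:13:19+00:00"}'),
    (1_700_000_007.0, '{"id": "c", "created_at": "2023-11-14T22:13:22+00:00"}'),
]


def body_timestamp(raw_body: str) -> datetime:
    """Parse the application's own creation time out of the message body."""
    return datetime.fromisoformat(json.loads(raw_body)["created_at"])


def order_by_body(messages):
    """Order by the body timestamp, ignoring the metadata timestamp entirely."""
    ordered = sorted(messages, key=lambda m: body_timestamp(m[1]))
    return [json.loads(body)["id"] for _, body in ordered]


print(order_by_body(received))  # ['a', 'b', 'c']
```

Ordering by the metadata timestamp would give b, a, c instead. Note that this only reorders messages that have already arrived. A message that turns up late can still arrive after its successors have been handled.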

The many-many problem

Let's assume that the timestamp produced by, say, JMS is good enough to order by. There is then another problem to consider: multiple producers and consumers.

Single Producer, Single Consumer

If one producer and one consumer communicate via a queue, and the queue is designated as unordered (whether deliberately or by default), the chances are still quite high that the messages will be consumed in the order in which they were put on the queue.

This is because most messaging systems store messages in a literal queue under the hood, and that queue is usually FIFO. If the messaging provider offers an ordered queue in this scenario, then ordered messaging is pretty much guaranteed, as long as neither client (producer or consumer) is multi-threaded.
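
Python's standard queue.Queue works as a stand-in for that underlying FIFO. It is only an analogy for a broker's internal queue, not a messaging API. With one producer thread and one consumer thread, messages come out in exactly the order they went in.

```python
import queue
import threading

SENTINEL = None  # end-of-stream marker (illustrative convention)


def run_spsc(n: int) -> list:
    """Single producer, single consumer over a FIFO queue."""
    q = queue.Queue()  # FIFO by definition
    consumed = []

    def producer():
        for i in range(n):
            q.put(i)
        q.put(SENTINEL)  # tell the consumer there is nothing more

    def consumer():
        while (item := q.get()) is not SENTINEL:
            consumed.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed


print(run_spsc(5))  # [0, 1, 2, 3, 4]
```

The guarantee comes from there being exactly one thread calling put and one calling get. If you add a second producer thread, the interleaving of put calls is decided by the scheduler rather than by the application.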

Multiple Producers, Single Consumer

If this scenario is scaled out to, say, two producers and one consumer, then client-created timestamps give some understanding of message order. However, suppose each of the two producers produces one message (let's call them Message 1 and Message 2!). If producer two's message arrives at the messaging system before producer one's message, and the consumer is being fed messages as they arrive, then the consumer has no way to tell that the message it just received is out of sync and that it should have waited for Message 1 to arrive. Therefore, although the messaging provider itself delivers messages in order in this scenario, multiple producers mean that ordering is lost across the complete set of messages.
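
The two-producer case can be simulated as follows. The assumptions are hypothetical: each message carries its client-side creation time plus a made-up transit delay to the broker, and the broker hands messages to the single consumer in arrival order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sent:
    msg_id: str
    producer: str
    created: float  # client-side timestamp, in seconds
    delay: float    # hypothetical transit time to the broker


def creation_order(messages):
    """The order in which the producers actually created the messages."""
    return [m.msg_id for m in sorted(messages, key=lambda m: m.created)]


def delivery_order(messages):
    """The broker feeds the single consumer in arrival order."""
    return [m.msg_id for m in sorted(messages, key=lambda m: m.created + m.delay)]


msgs = [
    Sent("Message 1", "producer-1", created=0.000, delay=0.050),
    Sent("Message 2", "producer-2", created=0.010, delay=0.005),
]
print(creation_order(msgs))  # ['Message 1', 'Message 2']
print(delivery_order(msgs))  # ['Message 2', 'Message 1']
```

When Message 2 is delivered, the consumer holds only Message 2, and nothing in it says that Message 1 exists. Sorting by timestamp afterwards only helps if the consumer somehow knows how long to wait.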

Some level of message ordering can be regained on a per-producer basis if each producer identifies itself by tagging its messages with a unique ID.
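
Here is a minimal sketch of a consumer-side resequencer built on that idea. It goes one step beyond a producer ID: it assumes (hypothetically) that each message also carries a per-producer sequence number starting at 0. The class and its method names are illustrative and do not belong to any product's API.

```python
from collections import defaultdict


class PerProducerResequencer:
    """Releases each producer's messages in sequence order.

    Assumes every message carries a producer ID and a per-producer
    sequence number starting at 0. It says nothing about order
    *between* producers.
    """

    def __init__(self):
        self.next_seq = defaultdict(int)  # producer ID -> next expected sequence number
        self.pending = defaultdict(dict)  # producer ID -> {seq: payload} waiting to be released

    def receive(self, producer_id, seq, payload):
        """Buffer the message and return any payloads that are now in order."""
        buf = self.pending[producer_id]
        buf[seq] = payload
        released = []
        while self.next_seq[producer_id] in buf:
            released.append(buf.pop(self.next_seq[producer_id]))
            self.next_seq[producer_id] += 1
        return released


r = PerProducerResequencer()
print(r.receive("P1", 1, "p1-second"))  # [] (still waiting for P1's seq 0)
print(r.receive("P2", 0, "p2-first"))   # ['p2-first']
print(r.receive("P1", 0, "p1-first"))   # ['p1-first', 'p1-second']
```

As written, a lost message stalls that producer's stream forever. A practical version would need a timeout or a gap-handling policy, and that is again an application decision.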

Single Producer, Multiple Consumers

I'm sure that by now you get the gist... In this case the consumers may well receive the producer's messages in the correct order; however, an individual consumer almost certainly won't see all of the messages. So although each consumer receives its messages in order, the messages as a whole may be processed out of order because, say, one consumer is faster than another. For example:

  1. Consumer 1 starts processing message 1.
  2. Meanwhile, consumer 2 processes message 2.
  3. Consumer 2 finishes more quickly than consumer 1 and consumes message 3.
  4. Consumer 2 finishes processing message 3.
  5. Consumer 1 finishes processing message 1.

Therefore, in this scenario the messages finished processing in the order 2, 3, 1, which is definitely not what might have been expected!
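
The walkthrough above can be reproduced with a small discrete-event sketch of competing consumers. Whenever a consumer is free, it takes the next message from one FIFO queue. The processing times are made up: message 1 is slow and messages 2 and 3 are quick.

```python
import heapq


def completion_order(durations, n_consumers):
    """Simulate competing consumers pulling from one FIFO queue.

    durations[i] is the (hypothetical) processing time of message i + 1.
    Returns message numbers in the order they finish processing.
    """
    # Heap of (time the consumer becomes free, consumer index).
    free_at = [(0.0, c) for c in range(n_consumers)]
    heapq.heapify(free_at)
    finished = []  # (finish time, message number)
    for msg_no, duration in enumerate(durations, start=1):
        start, consumer = heapq.heappop(free_at)  # earliest-free consumer takes the next message
        end = start + duration
        finished.append((end, msg_no))
        heapq.heappush(free_at, (end, consumer))
    return [msg_no for _, msg_no in sorted(finished)]


print(completion_order([10.0, 2.0, 2.0], n_consumers=2))  # [2, 3, 1]
```

Each consumer still takes its messages in queue order. The reordering comes entirely from the different processing times.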

Clustered Servers

With a clustered message broker, the problem is even more obvious. Even in the simplest case, where two message producers put messages to a cluster of message brokers, the ability to get ordered messaging is significantly reduced, because each broker only sees, and can only order, the messages that arrive at it.

For even basic message ordering to happen, either the brokers need a central message store that acts as a lock for the cluster, or there needs to be a lock across the multiple local stores. Either way, the solution is going to add latency to the system, which may or may not be acceptable. Only some messaging providers offer this facility, and I'll leave that discussion for another day!
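
Here is a minimal in-process sketch of the 'central store acting as a lock' idea. It uses a hypothetical cluster-wide sequencer that every broker must go through before accepting a message. Threads stand in for brokers and a threading.Lock stands in for the cluster lock. A real cluster would need a distributed lock or consensus, which this does not attempt.

```python
import threading


class CentralSequencer:
    """Hypothetical cluster-wide sequence allocator guarded by one lock.

    Every broker must take the lock to accept a message. That yields a
    single total order, at the cost of serialising all brokers.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._next = 0
        self.log = []  # (global seq, broker, message) in accepted order

    def accept(self, broker, message):
        with self._lock:  # the serialisation point, where the latency cost lives
            seq = self._next
            self._next += 1
            self.log.append((seq, broker, message))
            return seq


def run_cluster(per_broker=1000):
    sequencer = CentralSequencer()

    def broker(name):
        for i in range(per_broker):
            sequencer.accept(name, f"{name}-{i}")

    threads = [threading.Thread(target=broker, args=(n,)) for n in ("broker-a", "broker-b")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sequencer


s = run_cluster()
seqs = [seq for seq, _, _ in s.log]
print(seqs == list(range(2000)))  # True: one gap-free global order
```

Every accept call in both "brokers" queues behind the same lock. That serialisation is the price of a single global order.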

Message Proxying

All of these scenarios assume that the producer connects directly to the messaging system. Often this is not the case, and the system that actually connects to the messaging system is a proxy service of some kind, such as an API manager, an ESB or an application in an application server. In these cases only the timestamp within the message body can really be of use for message ordering, as it represents the true creation time of the message. (If AMQP is being used with a non-JMS API, the timestamp within the message body could be extracted and set as the metadata timestamp of the message on the queue. JMS does not allow us that facility, because the provider sets JMSTimestamp itself when the message is sent.)
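
The proxy step described in the parenthesis might look roughly like the sketch below. Everything here is an illustrative assumption. The "created_at" body field is hypothetical, and the plain properties dict stands in for whatever message-properties object a non-JMS AMQP client library exposes; it is not a real library API. The timestamp is expressed as whole POSIX seconds.

```python
import json
from datetime import datetime


def forward(raw_body: str, properties: dict) -> dict:
    """Hypothetical proxy step: copy the body's creation time into the
    outgoing message's metadata timestamp before publishing.

    "created_at" and the properties dict are illustrative assumptions,
    not any client library's API.
    """
    created = datetime.fromisoformat(json.loads(raw_body)["created_at"])
    outgoing = dict(properties)  # don't mutate the caller's dict
    outgoing["timestamp"] = int(created.timestamp())  # POSIX seconds
    return outgoing


props = forward('{"created_at": "2023-11-14T22:13:20+00:00"}',
                {"content_type": "application/json"})
print(props["timestamp"])  # 1700000000
```

The original creation time then travels as metadata, so downstream consumers can order by it without parsing the body.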


Conclusions

We've seen how even the most basic scenarios lead us to the view that the application is going to have to take some responsibility for the ordering of messages. Many messaging systems provide tools for ordering, such as message timestamps and ordered queues. However, scenarios like clustering and multiple consumers and producers complicate the matter substantially. In reality, ordered messaging is always a combination of the application understanding which messages are being processed and when, and the functions available in the message broker of choice.
It could even be that message ordering has nothing to do with processing time, but rather with correlating messages after they have been processed.