Petals Distribution

Reliability aka delivery guarantees for consumer-provider communication

Details

  • Type: New Feature
  • Status: New
  • Priority: Major
  • Resolution: Unresolved
  • Affects Version/s: 4.3.0-beta-1
  • Fix Version/s: 5.4.0
  • Component/s: Components, Container
  • Security Level: Public
  • Description:

    This question relates to many components of Petals, in particular the container and the CDK.

    The main goal is to guarantee to a consumer that its message is handled as intended by the provider (and conversely for out messages and the like).
    The terms reliability and persistence are used more or less as in the JBI specification (annex).

    Several aspects exist:

    1. When the container is not impacted by any problem, the JBI specification provides everything needed: if the message exchange pattern is followed (i.e. DONE, FAULT or IN/OUT messages are exchanged as the pattern dictates), both actors can assume everything is OK; if an error is set, something went wrong and the actors are notified of it.
    2. When the container has a problem (say, a crash), messages shouldn't be lost, so that when it comes back up, execution can continue as intended.

    With such guarantees, service consumers and providers can focus on their business logic and do not have to handle such failures themselves.
    Of course, this may not be attainable in some corner cases; these have to be discussed here.

    For each of these aspects, there is work to do:

    1. We need to ascertain that along the whole path an exchange takes (i.e. from send to accept), errors are caught as needed, and the exchange is correctly set in error and returned to the sender. This means that:
      • The CDK should send back an error if it drops messages (see PETALSCDK-135 and PETALSCDK-90).
      • We should verify everything in the container (an issue will be created if needed).
    2. From the moment a send() call on the DeliveryChannel returns, the message should be considered safely persisted until accept() is called on the other side.
      • We should take distributed scenarios into account (i.e. persistence must happen both on the sender side before the transport is involved, and on the receiver side after it). If possible, double persistence shouldn't happen in the local case.
      • A choice should be made between these two options (or it could be made configurable, why not; I'm not even sure these questions aren't already answered by the JBI specification):
        • the execution from "send()" to "the message is stored in the DeliveryChannel queue" is more or less atomic, and persisting only in the DeliveryChannel queue (see PETALSESBCONT-339) is enough; if that doesn't happen, the send method throws an exception and the sender has to deal with it.
        • as soon as send is called, the message is persisted, so that if something goes wrong before it can be stored in the receiver's DeliveryChannel (assuming that is persisted as in PETALSESBCONT-339), it can be resent, where relevant, once the problem is solved. In that case send would return as soon as persistence has occurred (not sure why this behaviour would be preferred; we should discuss that).
      • Some preliminary discussions on persistence in the delivery channel are in PETALSESBCONT-339 (which should maybe be rejected and split into multiple issues because it's a mess...).
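
    To make the two options for send() easier to compare, here is a minimal sketch. It is an illustration under stated assumptions, not Petals or JBI code: DurableStore, ReceiverQueue, sendAtomic, sendPersistFirst and redeliver are all hypothetical names, messages are plain strings, and the "durable" store is an in-memory list standing in for real persistence.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: none of these types are Petals or JBI APIs.
class SendPersistenceSketch {

    /** Stand-in for a durable store; a real one would write to disk. */
    static class DurableStore {
        final List<String> entries = new ArrayList<>();
        void persist(String message) { entries.add(message); }
        void remove(String message) { entries.remove(message); }
    }

    /** Stand-in for the receiver's (assumed persistent) delivery channel queue. */
    static class ReceiverQueue {
        final Queue<String> queue = new ArrayDeque<>();
        boolean available = true;
        void enqueue(String message) {
            if (!available) {
                throw new IllegalStateException("receiver queue unavailable");
            }
            queue.add(message);
        }
    }

    /** Option 1: send returns only once the message is in the receiver queue, else it throws. */
    static void sendAtomic(String message, ReceiverQueue receiver) {
        receiver.enqueue(message); // any failure propagates to the sender
    }

    /** Option 2: persist on the sender side first, then return whatever happens to delivery. */
    static void sendPersistFirst(String message, DurableStore outbox, ReceiverQueue receiver) {
        outbox.persist(message);
        try {
            receiver.enqueue(message);
            outbox.remove(message); // delivered: the sender-side copy is no longer needed
        } catch (IllegalStateException e) {
            // Not delivered: the message stays in the outbox for a later redelivery.
        }
    }

    /** Resends everything still in the outbox once the problem is solved. */
    static void redeliver(DurableStore outbox, ReceiverQueue receiver) {
        for (String message : new ArrayList<>(outbox.entries)) {
            receiver.enqueue(message);
            outbox.remove(message);
        }
    }

    public static void main(String[] args) {
        DurableStore outbox = new DurableStore();
        ReceiverQueue receiver = new ReceiverQueue();
        receiver.available = false;
        sendPersistFirst("m1", outbox, receiver); // returns normally, message kept in outbox
        receiver.available = true;
        redeliver(outbox, receiver);
        System.out.println("delivered: " + receiver.queue);
    }
}
```

    In this sketch, option 1 leaves failure handling to the sender, while option 2 needs a sender-side store plus a redelivery step, which is the double persistence the local case would ideally avoid.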
  • Environment:
    -

Issue Links

Activity

Victor NOËL added a comment - Thu, 2 Jul 2015 - 18:02:32 +0200

I agree yes, that's what I meant by "When the container is not impacted by any problem, the JBI specification provides everything needed".

Nevertheless, there are currently situations where messages are lost and that could be avoided (for example, messages are dropped in the CDK, and if the container crashes, exchanges in the queue are not easily restorable).
That's what this issue is about: the question discussed here sits at a lower level than the MEP, and the MEP should rely on it.
Also, I guess you understood that this issue is an attempt to clear up the mess of the PETALSESBCONT-339 issue.

I don't understand the question "but which difference between both ?".

As for send versus sendSync, yes, that's a good question. When using the word "send", I was actually also considering sendSync in the discussion, but things are indeed a bit different:

  • in case of a crash, there is nothing to do: the sender has been stopped, so it cannot receive the message when it is back online, but it also never considered that things were OK since it never received its response.
  • if something goes wrong without the sender being stopped, then it's the same as for the rest I think, no? An error is present in the returned exchange when applicable; otherwise, sendSync will either block until the message is back or time out.
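
The three outcomes a synchronous sender can observe in the second case (a reply, an error on the returned exchange, or a timeout) can be sketched as below. This is only an illustration, assuming the pending reply is modelled as a standard CompletableFuture; Outcome and awaitReply are hypothetical names and not part of the JBI DeliveryChannel API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: models a blocking send with a timeout, not the JBI API itself.
class SendSyncOutcomeSketch {

    enum Outcome { RESPONSE, ERROR, TIMEOUT }

    /** Blocks until the reply arrives, fails, or the timeout elapses. */
    static Outcome awaitReply(CompletableFuture<String> reply, long timeoutMs) {
        try {
            reply.get(timeoutMs, TimeUnit.MILLISECONDS);
            return Outcome.RESPONSE;
        } catch (ExecutionException e) {
            // The exchange came back set in error.
            return Outcome.ERROR;
        } catch (TimeoutException e) {
            // Nothing came back in time: the sender never assumed success.
            return Outcome.TIMEOUT;
        } catch (InterruptedException e) {
            // Restore the interrupt flag and report the wait as failed.
            Thread.currentThread().interrupt();
            return Outcome.ERROR;
        }
    }

    public static void main(String[] args) {
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(new RuntimeException("provider failed"));
        System.out.println(awaitReply(CompletableFuture.completedFuture("out"), 100));
        System.out.println(awaitReply(failed, 100));
        System.out.println(awaitReply(new CompletableFuture<>(), 10));
    }
}
```

In every branch the sender learns that it cannot assume success unless a reply actually arrived, which is why a crash of the sender itself needs no extra handling in this case.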
Christophe DENEUX added a comment - Thu, 2 Jul 2015 - 17:44:25 +0200 - edited

In my mind, the delivery guarantee is linked to the MEP, and it matters when the consumer does not expect a reply from the provider other than an ACK telling it that its message has been taken into account:

  • for the InOut or InOptionalOut MEPs, it makes no sense because a reply is expected. If no reply arrives, it's a timeout, so the consumer can manage the situation,
  • for the InOnly and RobustInOnly MEPs: the delivery guarantee seems to make sense, but which difference between both ?

In your 2nd point, you talk about "From the moment a send() call on the DeliveryChannel returns", but what about DeliveryChannel.sendSync(...) ?


People

Dates

  • Created:
    Thu, 2 Jul 2015 - 17:08:10 +0200
    Updated:
    Mon, 17 Apr 2023 - 12:27:45 +0200