[#PETALSESBCONT-327] Improve the caching of streams (*Source in the content of messages) - Petals Link JIRA

Details

Type: Task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 4.0.0, 4.1.0, 4.2.0, 4.2.1, 4.2.2, 4.2.3, 4.2.4
Fix Version/s: 5.0.0
Component/s: Logging, Micro-kernel, Monitoring, Persistence, Router
Security Level: Public

Description:
Exchanges routed by the container sometimes contain messages whose content is stored as a stream.

Generally, the content of a message is of one of the following types:

DOMSource: in-memory representation of an XML document; can be read multiple times without problems.

StreamSource: XML document readable from a stream; can only be read ONCE.

SAXSource: XML document readable from a stream or a reader (in an InputSource); can only be read ONCE.

StAXSource: XML document readable from an XMLStreamReader or an XMLEventReader; can only be read ONCE.
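
The difference between a read-once Source and a re-readable one can be illustrated with a small sketch. The class and method names below (ReadOnceDemo, serialize, and so on) are hypothetical and are not part of Petals; only the standard JAXP classes are real.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

class ReadOnceDemo {

    /** Serializes a Source to a String with an identity transform. */
    static String serialize(Source source) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(source, new StreamResult(out));
        return out.toString();
    }

    /** Reads a StreamSource once, then reports whether its stream is exhausted. */
    static boolean streamIsExhaustedAfterOneRead(String xml) throws Exception {
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        serialize(new StreamSource(in));
        return in.read() == -1; // nothing is left for a second reader
    }

    /** Reads a DOMSource twice and reports whether both reads gave the same result. */
    static boolean domGivesSameResultTwice(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        DOMSource source = new DOMSource(doc);
        return serialize(source).equals(serialize(source));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order><item>1</item></order>";
        System.out.println("stream exhausted: " + streamIsExhaustedAfterOneRead(xml));
        System.out.println("DOM re-readable:  " + domGivesSameResultTwice(xml));
    }
}
```

The same reasoning applies to SAXSource and StAXSource whenever they wrap a stream or reader that cannot be rewound.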

Petals itself sometimes needs to read the content of a message, for the following reasons:

retry delivery when it fails (in RouterServiceImpl with SourcesForkerUtil).

monitoring and persistence (in RouterMonitorServiceImpl with SourcesForkerUtil).

logging (in PetalsPayloadDumperFileHandler, in RMIClient).

Currently, this problem is handled by "forking" the streams: the original stream is read in order to create one or more streams from it.
One of the forks takes the place of the original stream, and the others are used either for reading directly (as in PetalsPayloadDumperFileHandler or RMIClient) or for restoring a consumed stream in the message (as in RouterServiceImpl or RouterMonitorServiceImpl, with SourcesForkerUtil).

The forking of the stream (used in PetalsPayloadDumperFileHandler, RMIClient and SourcesForkerUtil) is implemented in com.ebmwebsourcing.easycommons.xml.SourceHelper, using com.ebmwebsourcing.easycommons.stream.InputStreamForker, by caching the stream: an in-memory copy is created and the new streams are then created from it. Currently only StreamSource and SAXSource are supported, because it is not possible to replace the stream used in a StAXSource (though this may not be needed if we directly replace the Source in the NormalizedMessage instead of replacing the stream in the Source).
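
The general idea of forking by caching can be sketched as follows. This is only an illustration under assumptions: it is not the actual InputStreamForker code, and the names InMemoryForkSketch, fork and readAll are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class InMemoryForkSketch {

    /** Drains the original stream into memory and returns independent readers over that copy. */
    static InputStream[] fork(InputStream original, int count) throws IOException {
        ByteArrayOutputStream cache = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int read;
        while ((read = original.read(buffer)) != -1) {
            cache.write(buffer, 0, read);
        }
        byte[] cached = cache.toByteArray();
        InputStream[] forks = new InputStream[count];
        for (int i = 0; i < count; i++) {
            // Each fork has its own read position but shares the same byte array.
            forks[i] = new ByteArrayInputStream(cached);
        }
        return forks;
    }

    /** Reads a stream fully and decodes it as UTF-8. */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        InputStream original = new ByteArrayInputStream("<a/>".getBytes(StandardCharsets.UTF_8));
        InputStream[] forks = fork(original, 2);
        System.out.println(readAll(forks[0]));
        System.out.println(readAll(forks[1]));
    }
}
```

In this sketch the whole payload is held in memory once. Forking an already forked stream would copy it again, which is the kind of uncontrolled duplication this issue is concerned with.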

The forking/restoring of the message content (used in RouterServiceImpl and RouterMonitorServiceImpl) is implemented in SourcesForkerUtil: for each message whose content is a stream, a fork of that content (a Source) is stored in a static map (i.e. global state) and restored when needed.

Open questions:

Is it useful to fork streams by copying them into memory, possibly multiple times and without any control over it, instead of simply transforming them to an in-memory representation?

One solution is to simply transform them once to an in-memory representation such as DOMSource, instead of faking a stream backed by an in-memory copy.

Another is to use a forker that actually uses streams without using too much memory (but this is complex to implement; we have one implementation, in org.ow2.easywsdl.wsdl.util.InputStreamForker, that apparently cannot support high load).

Is it a good idea to have to manage shared state in the form of a static in SourcesForkerUtil?

This implies knowing when to close the streams (currently the streams are in-memory copies, so closing them is not performance critical, but that may change if the implementation of the forker is replaced by something else).

It also implies that this shared state is a terrible bottleneck right in the middle of the router, which performs resource-intensive operations such as reading streams and copying data.

A solution could be to remove it, but for now we have a problem: the streams must be accessed after an exchange has been sent, and in a remote context sending an exchange consumes the streams of its content.
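
The first option raised in the open questions, converting stream-based content to a DOMSource once, could look roughly like this. It is a minimal sketch using only standard JAXP calls; the names ToDomSourceSketch, toDomSource and rootName are hypothetical and do not come from Petals.

```java
import java.io.StringReader;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class ToDomSourceSketch {

    /** Consumes the given Source once and returns a re-readable DOMSource. */
    static DOMSource toDomSource(Source source) throws TransformerException {
        if (source instanceof DOMSource) {
            return (DOMSource) source; // already in memory, nothing to convert
        }
        DOMResult result = new DOMResult();
        // An identity transform copies the XML into a DOM tree.
        TransformerFactory.newInstance().newTransformer().transform(source, result);
        return new DOMSource(result.getNode(), source.getSystemId());
    }

    /** Returns the name of the root element held by the DOMSource. */
    static String rootName(DOMSource source) {
        Node node = source.getNode();
        Node root = (node instanceof Document) ? ((Document) node).getDocumentElement() : node;
        return root.getNodeName();
    }

    public static void main(String[] args) throws TransformerException {
        Source stream = new StreamSource(new StringReader("<order><item/></order>"));
        DOMSource dom = toDomSource(stream);
        // The DOM tree can be inspected as many times as needed.
        System.out.println(rootName(dom));
        System.out.println(rootName(dom));
    }
}
```

The trade-off is that the whole document is parsed and held as a tree, which typically costs more memory than the raw bytes.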

Environment: -

Issue Links

Depends

This issue blocks:
~~PETALSESBCONT-323~~ Remove unneeded dependencies from the system classpath
PETALSDISTRIB-149 Improve handling of Source with respect to their mutability

Activity

Victor NOËL added a comment - Fri, 19 Jun 2015 - 17:24:34 +0200 - edited

See also PETALSESBCONT-339 for a related topic.

In particular, messages containing big objects, big attachments, etc. should maybe not be stored in the queue but outside of it.

This is because the caching of streams and attachments is already meant to store to disk and/or memory, so some kind of compatible system must be found (for example, when big messages are stored to disk, a reference to them could be stored in the queue instead; more work must be done to devise a good strategy).

Victor NOËL added a comment - Thu, 16 Jul 2015 - 13:57:54 +0200

Removed unrelated things from the summary.

Also, we should note that since ~~PETALSESBCONT-345~~ was (partially) resolved, everything is broken: a provider that uses the IN message will consume it, and it then won't be possible to serialize it, for instance.

I'm really wondering whether the best option is not to use DOMSource all the time, because we keep so many copies of a Source in memory with the stream forker anyway that it becomes useless and counter-productive!

In Apache Camel, they (optionally, there is a flag to activate it) cache all the stream-like content of messages and, potentially, they do it on disk! This relates in a way to PETALSESBCONT-339.

Victor NOËL added a comment - Thu, 16 Jul 2015 - 15:18:09 +0200

Solution to be implemented:

1. We will provide our own implementation of Source, derived from StreamSource but reusable and in-memory, inspired by the Apache-licensed BytesSource from Apache Camel.
2. By default, we will use an in-memory Source implementation (BytesSource or DOMSource) in NormalizedMessage (enforced by setContent).
3. We will remove all the forks that are scattered around the code.

Point 2 will be improved over time, for special cases such as a Source with a systemId pointing to a URL (hence a stream is used, but only by the Transformer, and the Source can be reused) and so on. It won't be done right away because care must be taken that the Source is not modified afterwards; it's complex!
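
A reusable, in-memory Source derived from StreamSource, as described in the solution above, might be sketched like this. The class name ReusableBytesSource is hypothetical; this is neither the Petals BytesSource nor the Apache Camel one, only an illustration of the idea of handing out a fresh stream on every call.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.stream.StreamSource;

final class ReusableBytesSource extends StreamSource {

    private final byte[] data;

    ReusableBytesSource(byte[] data, String systemId) {
        this.data = data.clone(); // private copy, so callers cannot change it later
        setSystemId(systemId);
    }

    /** Returns a new stream over the cached bytes on every call. */
    @Override
    public InputStream getInputStream() {
        return new ByteArrayInputStream(data);
    }

    /** Replacing the content would defeat reusability, so it is rejected. */
    @Override
    public void setInputStream(InputStream inputStream) {
        throw new UnsupportedOperationException("content is fixed at construction");
    }

    @Override
    public void setReader(Reader reader) {
        throw new UnsupportedOperationException("content is fixed at construction");
    }

    public static void main(String[] args) throws Exception {
        ReusableBytesSource source =
                new ReusableBytesSource("<a/>".getBytes(StandardCharsets.UTF_8), null);
        // Two independent reads both start at the first byte.
        System.out.println((char) source.getInputStream().read());
        System.out.println((char) source.getInputStream().read());
    }
}
```

Because the class still is a StreamSource, code that already accepts a StreamSource can consume it without changes, and each consumer gets its own read position.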

Victor NOËL added a comment - Tue, 21 Jul 2015 - 17:40:23 +0200

This is committed, and in particular we made the following choices:

By default, the Node of a DOMSource is made read-only by exploiting a not-so-public API of the JVM implementation of Node.
petals-commons became a module whose content ends up in the system classloader (it only contains BytesSource for now), see ~~PETALSESBCONT-323~~.

What remains to be done:

Support StreamSource and SAXSource with just a systemId pointing to a URL (and not the stream itself, which can be read only once).
The question of the modifiability of DOMSource is still open; for now, they should be read-only, and in any case cloned when obtained from a CDK Exchange with getContentAsDocument.
Create issues to remove the Persistence and Monitoring components from Petals.
Maybe improve some components to take advantage of this new way of doing things (Camel, for example, will be improved thanks to BytesSource).
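
The cloning mentioned in the list above, for a DOMSource obtained with getContentAsDocument, amounts to handing out a deep copy of the document. The sketch below only illustrates that idea with the standard DOM API; DefensiveCopySketch and defensiveCopy are hypothetical names, and the actual CDK code is not shown in this issue.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class DefensiveCopySketch {

    /** Returns a deep copy, so that changes to the copy do not reach the original. */
    static Document defensiveCopy(Document original) {
        return (Document) original.cloneNode(true);
    }

    public static void main(String[] args) throws Exception {
        Document original =
                DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        original.appendChild(original.createElement("order"));

        Document copy = defensiveCopy(original);
        copy.getDocumentElement().setAttribute("status", "modified");

        // The original document is left untouched by the change made on the copy.
        System.out.println(original.getDocumentElement().hasAttribute("status"));
    }
}
```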

Victor NOËL added a comment - Wed, 22 Jul 2015 - 13:28:28 +0200

Camel was updated to take advantage of BytesSource.

Victor NOËL added a comment - Wed, 22 Jul 2015 - 15:15:02 +0200

StreamSource and SAXSource are now accepted when possible.

Victor NOËL added a comment - Wed, 22 Jul 2015 - 15:26:22 +0200

Modifiability of DOMSource is covered in PETALSDISTRIB-149.

People

Assignee: Victor NOËL
Reporter: Victor NOËL
Watchers: 0

Dates

Created: Tue, 21 Apr 2015 - 11:05:19 +0200
Updated: Wed, 22 Jul 2015 - 15:26:33 +0200
Resolved: Wed, 22 Jul 2015 - 15:26:33 +0200
