Exchanges that are routed by the container sometimes contain messages whose content is stored as a stream.
Generally, the content of a message is of one of the following types:
- DOMSource: in-memory representation of an XML document, can be read multiple times without problems.
- StreamSource: XML document readable from a stream, can only be read ONCE.
- SAXSource: XML document readable from a stream or a reader (in an InputSource), can only be read ONCE.
- StAXSource: XML document readable from an XMLStreamReader or an XMLEventReader, can only be read ONCE.
Petals sometimes needs to read the content of a message for the following reasons:
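The difference between re-readable and read-once sources can be illustrated with a minimal sketch that uses only the JDK. The class and method names (SourceReadDemo, readsTheSameTwice, ...) are hypothetical and are not part of Petals; the sketch serializes the same Source twice with an identity transformation and reports whether both reads give the same result.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of Petals.
public class SourceReadDemo {

    static final String XML = "<a><b>1</b></a>";

    /** Serializes a Source to a String with an identity transformation. */
    static String serialize(Source source) throws TransformerException {
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(source, new StreamResult(out));
        return out.toString();
    }

    /** Returns true only if the Source can be read twice with the same result. */
    static boolean readsTheSameTwice(Source source) {
        try {
            String first = serialize(source);
            String second = serialize(source);
            return first.equals(second);
        } catch (TransformerException e) {
            // Typically raised by the second read of an exhausted stream.
            return false;
        }
    }

    /** A read-once source backed by a byte stream. */
    static Source streamSource() {
        return new StreamSource(
                new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
    }

    /** A re-readable source backed by an in-memory DOM tree. */
    static Source domSource() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            return new DOMSource(doc);
        } catch (Exception e) {
            throw new IllegalStateException("Unable to parse the sample document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("DOMSource re-readable: " + readsTheSameTwice(domSource()));
        System.out.println("StreamSource re-readable: " + readsTheSameTwice(streamSource()));
    }
}
```

The second read of the StreamSource is expected to fail because the underlying stream is already exhausted, whereas the DOMSource can be serialized as often as needed.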
- retry delivery when it fails (in RouterServiceImpl with SourcesForkerUtil).
- monitoring and persistence (in RouterMonitorServiceImpl with SourcesForkerUtil).
- logging (in PetalsPayloadDumperFileHandler, in RMIClient).
Currently, this problem is handled by "forking" the streams: the original stream is read in order to create one or more streams from it.
One of the forks takes the place of the original stream, and the others are used either for direct reading (as in PetalsPayloadDumperFileHandler or RMIClient), or for restoring a consumed stream in the message (as in RouterServiceImpl or RouterMonitorServiceImpl, with SourcesForkerUtil).
The forking of the stream (used in PetalsPayloadDumperFileHandler, RMIClient and SourcesForkerUtil) is implemented in com.ebmwebsourcing.easycommons.xml.SourceHelper, using com.ebmwebsourcing.easycommons.stream.InputStreamForker. It caches the stream, i.e. it creates an in-memory copy of it, and the new streams are then created from that copy. Currently only StreamSource and SAXSource are supported, because it is not possible to replace the stream used in a StAXSource (although this may not be necessary if we directly replace the Source in the NormalizedMessage instead of replacing the stream in the Source).
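The caching approach described above can be sketched as follows. This is not the actual code of SourceHelper or InputStreamForker: the class InMemoryForkSketch and its methods are hypothetical, and the sketch assumes a StreamSource backed by a byte stream (not by a Reader).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch of in-memory forking, not the Petals implementation.
public class InMemoryForkSketch {

    /** Drains and closes the stream, returning its whole content. */
    static byte[] readAll(InputStream original) {
        ByteArrayOutputStream cache = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        try (InputStream in = original) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                cache.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return cache.toByteArray();
    }

    /**
     * Forks a StreamSource: the original stream is consumed once, one
     * in-memory copy takes its place in the given source, and a second
     * copy is returned in a new StreamSource.
     */
    static StreamSource forkSource(StreamSource source) {
        InputStream original = source.getInputStream();
        if (original == null) {
            // Assumption of this sketch: only byte-stream sources are handled.
            throw new IllegalArgumentException("Source is not backed by an InputStream");
        }
        byte[] bytes = readAll(original);
        source.setInputStream(new ByteArrayInputStream(bytes));
        StreamSource copy = new StreamSource(new ByteArrayInputStream(bytes));
        copy.setSystemId(source.getSystemId());
        return copy;
    }
}
```

Both resulting streams share the same byte array, so the content is held in memory once per fork operation, but forking again an already forked source copies it again.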
The forking/restoring of the message content (used in RouterServiceImpl and RouterMonitorServiceImpl) is implemented in SourcesForkerUtil. For each message whose content is a stream, a fork of the content (a Source) is stored in a static map (hence global state), and the content is restored from it when needed.
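The store/restore mechanism can be pictured with the following minimal sketch. It is not the code of SourcesForkerUtil: the Message class is a hypothetical stand-in for a JBI NormalizedMessage, the names fork, restore and release are illustrative, and as a simplification the map keeps the cached bytes rather than a forked Source.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch of a static fork registry, not the Petals implementation.
public class ForkRegistrySketch {

    /** Hypothetical stand-in for a JBI NormalizedMessage. */
    static final class Message {
        private Source content;
        Message(Source content) { this.content = content; }
        Source getContent() { return content; }
        void setContent(Source content) { this.content = content; }
    }

    // Global shared state: one cached copy of the content per message.
    private static final Map<Message, byte[]> FORKS = new ConcurrentHashMap<>();

    static byte[] readAll(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Caches the content if it is a byte-stream source; other sources are left alone. */
    static void fork(Message message) {
        if (!(message.getContent() instanceof StreamSource)) {
            return;
        }
        StreamSource source = (StreamSource) message.getContent();
        if (source.getInputStream() == null) {
            return; // Reader-backed sources are out of scope for this sketch.
        }
        byte[] bytes = readAll(source.getInputStream());
        FORKS.put(message, bytes);
        source.setInputStream(new ByteArrayInputStream(bytes));
    }

    /** Replaces a possibly consumed content with a fresh source built from the cache. */
    static void restore(Message message) {
        byte[] bytes = FORKS.get(message);
        if (bytes != null) {
            message.setContent(new StreamSource(new ByteArrayInputStream(bytes)));
        }
    }

    /** Must be called once the message is done, otherwise the map keeps growing. */
    static void release(Message message) {
        FORKS.remove(message);
    }

    /** Consumes the current content stream and returns it as text (demo helper). */
    static String consume(Message message) {
        StreamSource source = (StreamSource) message.getContent();
        return new String(readAll(source.getInputStream()), StandardCharsets.UTF_8);
    }
}
```

A typical sequence would be fork before sending, restore after a failed delivery, and release once the exchange is finished. The need for the explicit release call is one visible cost of keeping this state in a static map.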
Open questions:
- Is it useful to fork streams by copying them into memory, possibly multiple times and without any control over it, instead of simply transforming them into an in-memory representation?
- One solution is to transform them once into an in-memory representation such as DOMSource, instead of faking a stream backed by an in-memory copy.
- Another one is to use a forker that actually streams the data without using too much memory (but this is complex to implement; we have one implementation, org.ow2.easywsdl.wsdl.util.InputStreamForker, which apparently cannot cope with heavy load).
- Is it a good idea to have to manage shared state in the form of a static map in SourcesForkerUtil?
- This implies knowing when to close the streams (currently the streams are in-memory copies, so closing them is not critical for performance, but that may change if the forker implementation is replaced by something else).
- It also implies that this shared state is a terrible bottleneck right in the middle of the router, where resource-intensive operations such as reading streams and copying data are performed.
- A solution could be to remove it, but for now this is a problem: the streams must be accessed after an exchange has been sent, and in a remote context sending an exchange consumes the streams of its content.
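For the first open question, the option of transforming the content once into a DOMSource could look like the minimal sketch below. The class ToDomSketch and the method toDomSource are hypothetical names; the sketch relies only on the standard JAXP identity transformation and says nothing about how the result would be put back into the NormalizedMessage.

```java
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;

// Hypothetical sketch, not part of Petals.
public class ToDomSketch {

    /**
     * Reads the given Source once and returns a re-readable DOMSource.
     * A Source that is already a DOMSource is returned unchanged.
     */
    static DOMSource toDomSource(Source source) {
        if (source instanceof DOMSource) {
            return (DOMSource) source;
        }
        try {
            DOMResult result = new DOMResult();
            // The identity transformation consumes the stream exactly once.
            TransformerFactory.newInstance().newTransformer().transform(source, result);
            return new DOMSource(result.getNode(), source.getSystemId());
        } catch (TransformerException e) {
            throw new IllegalStateException("Unable to buffer the message content", e);
        }
    }
}
```

With this approach the content is held in memory once, as a DOM tree, and every later reader (retry, monitoring, logging) can use the same DOMSource without any fork or restore step. The trade-off is that a DOM tree usually takes more memory than the raw bytes of the document.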