Process massive amounts of XML

Stream massive amounts of XML to XQuery expressions

The XQJ API allows you to stream XML content directly to the XQuery engine via external variables. This content can then either be queried or stored in the database.

The XQDynamicContext interface (which both XQExpression and XQPreparedExpression extend) provides methods for binding XML to XQuery variables as streams.

[SNAPSHOT]: XQDynamicContext (streaming XML content)

public interface XQDynamicContext
{
  void bindDocument(QName qn, Reader r, String baseURI, XQItemType t);
  void bindDocument(QName qn, InputStream is, String baseURI, XQItemType t);
  void bindDocument(QName qn, XMLStreamReader xsr, XQItemType xqit);
  void bindDocument(QName qn, Source src, XQItemType xqit);

  // All other methods of XQDynamicContext ...
}
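The four overloads differ only in the form in which the XML arrives. As a JDK-only sketch (the class `BindInputs` and its methods are hypothetical helpers, not part of XQJ), this is one way to open a file in each of the four forms; none of them parses the whole file when it is created, so each could be handed to the matching `bindDocument` overload.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import javax.xml.stream.*;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: opens a file in each of the four forms accepted by
// the XQDynamicContext.bindDocument(...) overloads. Each object reads on demand.
public class BindInputs {

    // For bindDocument(QName, Reader, String baseURI, XQItemType).
    // Assumes the file is UTF-8; the InputStream form lets the parser
    // detect the encoding from the XML declaration instead.
    public static Reader reader(Path p) throws IOException {
        return Files.newBufferedReader(p, StandardCharsets.UTF_8);
    }

    // For bindDocument(QName, InputStream, String baseURI, XQItemType).
    public static InputStream stream(Path p) throws IOException {
        return new BufferedInputStream(Files.newInputStream(p));
    }

    // For bindDocument(QName, XMLStreamReader, XQItemType).
    // Note: XMLStreamReader.close() does not close the underlying stream.
    public static XMLStreamReader staxReader(Path p)
            throws IOException, XMLStreamException {
        return XMLInputFactory.newInstance().createXMLStreamReader(stream(p));
    }

    // For bindDocument(QName, Source, XQItemType).
    public static Source source(Path p) {
        return new StreamSource(p.toFile());
    }
}
```

Which form to choose mostly depends on what the application already has: a raw byte stream (network, file) suits the InputStream overload, while an existing StAX pipeline can pass its XMLStreamReader straight through.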

However, by default (immediate binding mode) XQJ consumes the input source as soon as bindDocument() is called, i.e. it reads the whole input there and then and stores the data in memory, for example as a DOM Document object. That becomes a problem if the XML content you plan to use is 100 gigabytes.

To get around this, and to make sure XQJ does not act like a greedy child presented with its favourite flavoured cookie, you can switch it to deferred binding mode.
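The difference between consuming input up front and streaming it can be seen without XQJ at all. The sketch below is plain JDK code, not XQJ, and the class name `DomVersusStream` is hypothetical. It counts elements twice: once by building a DOM tree (the text above describes immediate binding as storing a DOM Document) and once with a StAX pull parser, which only holds a small buffer and the current event. Both print 4 for the sample document; only the DOM version's memory use grows with the size of the input.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.*;
import org.w3c.dom.Document;

// Not XQJ code: a JDK-only illustration of whole-document versus
// streaming consumption of the same XML input.
public class DomVersusStream {

    // Builds the complete tree in memory before anything can be counted.
    public static int countElementsDom(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(in);
        // "*" matches every element, including the document element.
        return doc.getElementsByTagName("*").getLength();
    }

    // Pulls one parse event at a time; memory use stays roughly constant.
    public static int countElementsStax(InputStream in) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        int count = 0;
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
        } finally {
            r.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<root><item/><item><sub/></item></root>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(countElementsDom(new ByteArrayInputStream(xml)));  // 4
        System.out.println(countElementsStax(new ByteArrayInputStream(xml))); // 4
    }
}
```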

Enabling deferred binding

// Create a NEW XQStaticContext object (based on the current static context)
XQStaticContext properties = conn.getStaticContext();

// Set its binding mode property to deferred (i.e. streaming).
properties.setBindingMode(XQConstants.BINDING_MODE_DEFERRED);

// 1. Properties can be set in the context of a single XQExpression:
XQExpression xqe = conn.createExpression(properties);

// 2. or in the context of a single XQPreparedExpression:
String xqueryString = "declare variable $massiveXML external; $massiveXML";
XQPreparedExpression xqpe = conn.prepareExpression(xqueryString, properties);

// 3. or, to make a global change affecting all NEW expressions:
conn.setStaticContext(properties);

The following is a full working example of streaming XML data from an HTTP URL to an XQuery processor via a bound variable:

DeferredBinding.java

import javax.xml.xquery.*;
import java.io.*;
import java.net.URL;
import javax.xml.namespace.QName;
import net.cfoster.sedna.xqj.SednaXQDataSource;

public class DeferredBinding {
  public static void main(String[] args) throws IOException, XQException {
    XQDataSource xqs = new SednaXQDataSource();
    xqs.setProperty("serverName", "localhost");
    xqs.setProperty("databaseName", "test");

    XQConnection conn = xqs.getConnection("SYSTEM", "MANAGER");
    try {
      // Deferred binding: the bound stream is not read into memory at bind time.
      XQStaticContext properties = conn.getStaticContext();
      properties.setBindingMode(XQConstants.BINDING_MODE_DEFERRED);
      XQExpression xqe = conn.createExpression(properties);

      String surl = "http://www.w3.org/TR/2007/REC-xquery-20070123/xquery.xml";
      URL url = new URL(surl);
      InputStream in = url.openStream();
      try {
        // Bind the HTTP response stream to $x (null baseURI and type: driver defaults).
        xqe.bindDocument(new QName("x"), in, null, null);

        String xqueryString = "declare variable $x external; $x";
        XQResultSequence rs = xqe.executeQuery(xqueryString);
        while (rs.next())
          System.out.println(rs.getItemAsString(null));
      } finally {
        // In deferred mode the stream must stay open until the query has run.
        in.close();
      }
    } finally {
      conn.close();
    }
  }
}
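Trying deferred binding against a genuinely huge document does not require one on disk or on a web server. As a hedged sketch (the `GeneratedXmlStream` class is a hypothetical test helper, not part of XQJ or Sedna), an InputStream can synthesize `<items><item>0</item><item>1</item>...</items>` piece by piece, so only the current fragment is ever held in memory:

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

// Hypothetical test helper: produces an arbitrarily large XML document on
// demand. Only the fragment currently being read exists in memory.
public class GeneratedXmlStream extends InputStream {
    private final long itemCount;      // number of <item> elements to emit
    private long nextItem = 0;         // index of the next <item> to emit
    private boolean rootClosed = false;
    private byte[] chunk = "<items>".getBytes(StandardCharsets.UTF_8);
    private int pos = 0;               // read position within chunk

    public GeneratedXmlStream(long itemCount) {
        this.itemCount = itemCount;
    }

    @Override
    public int read() {
        if (!ensureData()) return -1;  // end of document
        return chunk[pos++] & 0xFF;
    }

    @Override
    public int read(byte[] b, int off, int len) {
        Objects.checkFromIndexSize(off, len, b.length);
        if (len == 0) return 0;
        if (!ensureData()) return -1;
        int n = Math.min(len, chunk.length - pos);
        System.arraycopy(chunk, pos, b, off, n);
        pos += n;
        return n;
    }

    // Moves on to the next fragment once the current one is used up;
    // returns false when the closing </items> has been fully read.
    private boolean ensureData() {
        while (pos >= chunk.length) {
            String next;
            if (nextItem < itemCount) {
                next = "<item>" + nextItem++ + "</item>";
            } else if (!rootClosed) {
                next = "</items>";
                rootClosed = true;
            } else {
                return false;
            }
            chunk = next.getBytes(StandardCharsets.UTF_8);
            pos = 0;
        }
        return true;
    }
}
```

With a deferred-mode expression it could be bound exactly as the URL stream is in DeferredBinding.java, e.g. `xqe.bindDocument(new QName("x"), new GeneratedXmlStream(1_000_000_000L), null, null)`. Whether memory really stays flat also depends on the query itself and on how the driver handles the stream, so it is worth measuring rather than assuming.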

End of section summary

Now that you're comfortable sending huge amounts of XML data to the XQuery engine, how about processing huge amounts of XML data coming back from the XQuery processor?

The next section discusses streaming XQuery Result Sequences with StAX (Streaming API for XML) and SAX (Simple API for XML).