Chapter 6. General XQuery extension functions

Table of Contents

1. Serialization
2. XSL Transformation
3. Dynamic evaluation
4. Pattern-matching
5. Date and Time
5.1. Differences with W3C specifications
5.2. Cast Extensions
5.3. Additional constructors
5.4. Additional accessors
6. Error handling
7. Miscellaneous

These general-purpose functions belong to the namespace denoted by the predefined "x:" prefix, which refers to the namespace "com.qizx.functions.ext".

1. Serialization

Serialization, the process of converting XML nodes into a stream of characters, is defined in the W3C specifications, but there is no standard function for performing it.

x:serialize can output a document or a node into XML, HTML, XHTML or plain text, to a file or to the default output stream.

x:serialize( $node as node(), $options as element(options) )
  as xs:string?

Description: Serializes the node and all its content into text. The output can be a file (see options below).

Parameter $node: an XML tree to be serialized to text.

Parameter $options: an element bearing options in the form of attributes (see below).

Returned value: The path of the output file if specified, otherwise the serialized result.

The options argument (which may be absent) has the form of an element of name "options" whose attributes are used to specify different options. For example:

x:serialize( $doc,
             <options output="out\doc.xml"
                      encoding="ISO-8859-1" indent="yes"/>)

This mechanism is similar to XSLT's xsl:output specification and is very convenient, since the options can be computed or extracted from an XML document.
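As a minimal sketch of computed options, the query below builds the options element from a variable before calling x:serialize. The variable $pretty and the sample doc element are illustrative assumptions; the option names (encoding, indent) are the ones used in the example above.

```xquery
(: $pretty is a hypothetical flag, not part of Qizx :)
declare variable $pretty as xs:boolean := true();

let $options :=
    <options encoding="UTF-8"
             indent="{ if ($pretty) then 'yes' else 'no' }"/>
return
    (: no output/file option is given, so the text is returned as a string :)
    x:serialize( <doc><item>1</item></doc>, $options )
```

Because no output file is named, the call should return the serialized text rather than a file path.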

Table 6.1. Implemented serialization options

| option name | values | description |
|---|---|---|
| method | XML (default), XHTML, HTML, or TEXT | Output method. |
| output / file | a file path | Output file. If this option is not specified, the generated text is returned as a string. |
| version | default "1.0" | Version generated in the XML declaration. No validity check. |
| standalone | "yes" or "no" | No check is performed. |
| encoding | must be the name of an encoding supported by the JRE | The name supplied is generated in the XML declaration. If different from UTF-8, it forces the output of the XML declaration. |
| indent | "yes" or "no" (default "no") | Output indented. |
| indent-value (extension) | integer value | Specifies the number of space characters used for indentation. |
| omit-xml-declaration | "yes" or "no" (default "no") | Controls the output of an XML declaration. |
| include-content-type | "yes" or "no" (default "no") | For XHTML and HTML methods, if the value is "yes", a META element specifying the content type is added at the beginning of element HEAD. |
| escape-uri-attributes | "yes" or "no" (default "yes") | For XHTML and HTML methods, escapes URI attributes (i.e. specific HTML attributes whose value is a URI). |
| doctype-public | the public ID in the DOCTYPE declaration | Triggers the output of the DOCTYPE declaration. Must be used together with the doctype-system option. |
| doctype-system | the system ID in the DOCTYPE declaration | Triggers the output of the DOCTYPE declaration. |
| auto-dtd (extension) | "yes" or "no" (default "yes") | If the node is a document node and this document has DTD information, outputs a DOCTYPE declaration. |

Sources of DTD information for auto-dtd:

  • A document stored in an XML Library may have properties storing this information (dtd-system-id and dtd-public-id), initially set by import.

  • A parsed document gets DTD information from the XML parser.

  • A constructed node has no DTD information.

2. XSL Transformation

The x:transform function invokes an XSLT stylesheet on a node and can retrieve the results of the transformation as a tree, or let the stylesheet output the results.

This is a useful feature when one wants to transform a document (for example extracted from the XML Libraries) or a computed fragment of XML into different output formats like HTML, XSL-FO, etc.

This example transforms the document $doc and writes the result into the file out\doc.xml:

x:transform( $doc, "ssheet1.xsl",
             <parameters param1="one" param2="two"/>,
             <options output-file="out\doc.xml" indent="yes"/>)

The next example returns a new document tree. Suppose we have this very simple stylesheet which renames the element "doc" into "newdoc":

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                 version ="1.0" >
  <xsl:template match="doc">
     <newdoc><xsl:apply-templates/></newdoc>
  </xsl:template>
</xsl:stylesheet>

The following XQuery expression:

x:transform( <doc>text</doc>, "ssheet1.xsl", <parameters/> )

returns:

<newdoc>text</newdoc>

x:transform( $source as node(),
             $stylesheet-URI as xs:string, 
             $xslt-parameters as element(parameters) 
             [, $options as element(options)] )
  as node()?

Transforms the source tree through an XSLT stylesheet. If no output file is explicitly specified in the options, the function returns a new tree.

Parameter $source: an XML tree to be transformed. It does not need to be a complete document.

Parameter $stylesheet-URI: the URI of an XSLT stylesheet. Stylesheets are cached and reused for consecutive transformations.

Parameter $xslt-parameters: an element holding parameter values to pass to the XSLT engine. The parameters are specified in the form of attributes. The name of an attribute matches the name of an xsl:param declaration in the stylesheet (namespaces can be used). The value of the attribute is passed to the XSLT transformer.

Parameter $options: [optional argument] an element holding options in the form of attributes (see below).

Returned value: if the path of an output file is not specified in the options, the function returns a new document tree which is the result of the transformation of the source tree. Otherwise, it returns the empty sequence.
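The two return modes can be sketched side by side. This reuses the ssheet1.xsl stylesheet from the example above; the output path /tmp/newdoc.xml is a hypothetical location chosen for illustration.

```xquery
let $source := <doc>text</doc>

(: no options: the transformed tree is returned :)
let $tree := x:transform( $source, "ssheet1.xsl", <parameters/> )

(: output-file given: the result goes to the file, nothing is returned :)
let $none := x:transform( $source, "ssheet1.xsl", <parameters/>,
                          <options output-file="/tmp/newdoc.xml" indent="yes"/> )
return
    ( $tree, empty($none) )
```

Following the description above, this should yield the newdoc tree followed by true, since the second call returns the empty sequence.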

Table 6.2. XSLT transform options

| option name | values | description |
|---|---|---|
| output-file | An absolute file path. | Output file. If this option is not specified, the generated tree is returned by the function, otherwise the function returns an empty sequence. |
| XSLT output properties (instruction xsl:output): version, standalone, encoding, indent, omit-xml-declaration, etc. | | These options are used by the stylesheet for outputting the transformed document. They are ignored if no output-file option is specified. |
| Specific options of the XSLT engine (Saxon or default XSLT engine) | | An invalid option may cause an error. |

About the efficiency of the connection with XSLT

The connection with an XSLT engine uses generic JAXP interfaces, and thus must copy XML trees passed in both directions. This is not as efficient as it could be and can even cause memory problems if the size of processed documents is larger than a few dozen megabytes, depending on the available memory size.

3. Dynamic evaluation

The following functions allow dynamically compiling and executing XQuery expressions.

function x:eval( $expression as xs:string )
  as xs:any

Compiles and evaluates a simple expression provided as a string.

The expression is executed in the context of the current query: it can use global variables, functions and namespaces of the current static context. It can also use the current item '.' if defined in the evaluation context.

However, there is no access to the local context: for example, if x:eval is invoked inside a function, the arguments and local variables of that function are not visible.

Parameter $expression: a simple expression (cannot contain prologue declarations).

Returned value: evaluated value of the expression.

Example:

declare variable $x := 1;
declare function local:fun($p as xs:integer) { $p * 2 };

let $expr := "1 + $x, local:fun(3)"
return x:eval($expr)

This should return the sequence (2, 6).
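A second sketch shows the current item '.' being used inside the evaluated string. The global variable $threshold and the sample numbers are assumptions made for illustration.

```xquery
declare variable $threshold := 10;

(: the test is only known at run time, e.g. read from a configuration document :)
let $test := ". > $threshold"
return
    (5, 12, 20)[ x:eval($test) ]
```

Under the rules above, the evaluated expression sees the global $threshold and the current item of the predicate, so this should select 12 and 20. The local variable $test itself would not be visible from inside the evaluated string.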

4. Pattern-matching

The following functions match the string-value of nodes (elements and attributes) with a pattern.

Example 1: this expression returns true if the value of the attribute @lang matches the SQL-style pattern:

x:like( "en%", $node/@lang )

Example 2: this expression returns true if the content of the element 'NAME' matches the pattern:

$p/NAME[ x:like( "Theo%" ) ]

function x:like( $pattern as xs:string [, $context-nodes as node()* ])
  as xs:boolean

Returns true if the pattern matches the string-value of at least one node in the node sequence argument.

Parameter $pattern: a SQL-style pattern: the wildcard '_' matches any single character, the wildcard '%' matches any sequence of characters.

Parameter $context-nodes: optional sequence of nodes. The function checks sequentially the string-value of each node against the pattern. If absent, the argument defaults to '.', the current item. This makes sense inside a predicate, as in Example 2 above.

Returned value: a boolean.

function x:ulike( $pattern as xs:string [, $context-nodes as node()* ])
  as xs:boolean

This function is very similar to x:like, except that the pattern has syntax à la Unix ("glob pattern"). The character '?' is used instead of '_' (single character match), and '*' instead of '%' (multi-character match).

Note: these functions, as well as the standard fn:matches function and the full-text functions, are automatically recognized by the query optimizer, which uses library indexes to speed up their execution whenever possible.
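The correspondence between the two pattern syntaxes can be sketched as follows. The sample people element is an assumption, and the sketch assumes that a pattern must match the whole string-value, as in SQL.

```xquery
let $p :=
    <people>
        <NAME>Theodore</NAME>
        <NAME>Theo</NAME>
        <NAME>Anna</NAME>
    </people>
return (
    (: '%' and '*' both stand for any sequence of characters :)
    $p/NAME[ x:like("Theo%") ],
    $p/NAME[ x:ulike("Theo*") ],

    (: '_' and '?' both stand for exactly one character :)
    x:like( "Th_o", $p/NAME ),
    x:ulike( "Th?o", $p/NAME )
)
```

The first two expressions should select the same NAME elements. The last two should both be true, because at least one node in the sequence ("Theo") matches.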

5. Date and Time

5.1. Differences with W3C specifications

Qizx is compliant with the W3C Recommendation. The only differences at present are extensions of the cast operation: Qizx can directly cast date, time, dateTime and durations to and from double values representing seconds, and keeps the extended "constructors" that build date, dateTime, etc, from numeric components like days, hours, minutes, etc.

5.2. Cast Extensions

In order to make computations easier, Qizx can:

  • Cast xdt:yearMonthDuration to numeric values: this yields the number of months. The following expression returns 13:

    xdt:yearMonthDuration("P1Y1M") cast as xs:integer

  • Conversely, cast a numeric value representing months to xdt:yearMonthDuration. The following expression holds true:

    xdt:yearMonthDuration(13) = xdt:yearMonthDuration("P1Y1M")

  • Cast xdt:dayTimeDuration to double: this yields the number of seconds. The following expression returns 7201:

    xdt:dayTimeDuration("PT2H1S") cast as xs:double

  • Conversely, cast a numeric value representing seconds to xdt:dayTimeDuration.

  • Cast xs:dateTime to double. This returns the number of seconds elapsed since "the Epoch", i.e. 1970-01-01T00:00:00Z. If the timezone is not specified, it is considered to be UTC (GMT).

  • Conversely, cast a numeric value representing seconds from the origin to a dateTime with GMT timezone.

  • Cast from/to the xs:date type in a similar way (like a dateTime with time equal to 00:00:00).

    xs:date("1970-01-02") cast as xs:double = 86400

  • Cast from/to the xs:time type in a similar way (seconds from 00:00:00).

    xs:time("01:00:00") cast as xs:double = 3600
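As an illustration of why these casts make computations easier, the sketch below measures the time between two instants as a plain number of seconds and converts it back. The two sample instants are assumptions chosen so that the arithmetic is easy to check.

```xquery
let $start := xs:dateTime("1970-01-01T00:00:00Z")
let $end   := xs:dateTime("1970-01-02T01:00:00Z")

(: elapsed time as a plain number of seconds: 86400 + 3600 :)
let $elapsed := ($end cast as xs:double) - ($start cast as xs:double)

return (
    $elapsed,
    (: back to a duration, using the reverse cast :)
    $elapsed cast as xdt:dayTimeDuration,
    (: seconds since the Epoch back to a dateTime in GMT :)
    $elapsed cast as xs:dateTime
)
```

Under the rules listed above, $elapsed should be 90000, which corresponds to a duration of one day and one hour.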

5.3. Additional constructors

These constructors allow date, time, dateTime objects to be built from numeric components (this is quite useful in practice).

function xs:date( $year as xs:integer,
                  $month as xs:integer,
                  $day as xs:integer )
  as xs:date

Builds an xs:date from a year, a month, and a day in integer form. The implicit timezone is used.

For example xs:date(1999, 12, 31) returns the same value as xs:date("1999-12-31").

function xs:time( $hour as xs:integer,
                  $minute as xs:integer,
                  $second as xs:double )
  as xs:time

Builds an xs:time from an hour and a minute as integers, and seconds as a double. The implicit timezone is used.

function xs:dateTime( $year as xs:integer, $month as xs:integer, $day as xs:integer, 
                      $hour as xs:integer, $minute as xs:integer, $second as xs:double 
                      [, $timezone as xs:double] )
  as xs:dateTime

Builds an xs:dateTime from the six components that constitute date and time.

A timezone can be specified as a signed number of hours (ranging from -14 to 14). If it is absent, the implicit timezone is used.
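A short sketch of the three constructors, with sample values chosen for illustration:

```xquery
(
    (: same value as xs:date("1999-12-31"), per the text above :)
    xs:date(1999, 12, 31),

    (: 23:59:59.5 in the implicit timezone; seconds are a double :)
    xs:time(23, 59, 59.5),

    (: noon on 1 January 2000, with a timezone of +2 hours :)
    xs:dateTime(2000, 1, 1, 12, 0, 0, 2)
)
```

Going by the description of the timezone argument, the last call should be equivalent to xs:dateTime("2000-01-01T12:00:00+02:00").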

5.4. Additional accessors

These functions are kept for compatibility. They are slightly different from the standard functions:

  • they accept several date/time and duration types for the argument (so for example we have get-minutes instead of get-minutes-from-time, get-minutes-from-dateTime, etc.),

  • but they do not accept untypedAtomic (node contents): such an argument should be cast to the proper type before being used. So the standard function might be as convenient here.

function get-seconds( $moment )
  as xs:double?

Returns the "second" component of an xs:time, xs:dateTime, or xs:duration.

Can replace fn:seconds-from-dateTime, fn:seconds-from-time, fn:seconds-from-duration, except that the returned type is double instead of decimal, and an argument of type xdt:untypedAtomic is not valid.

function get-all-seconds( $duration )
  as xs:double?

Returns the total number of seconds from an xs:duration. This does not take into account months and years, as explained above.

For example get-all-seconds(xs:duration("P1YT1H")) returns 3600.

function get-minutes( $moment )
  as xs:integer?

Returns the "minute" component of an xs:time, xs:dateTime, or xs:duration.

function get-hours( $moment )
  as xs:integer?

Returns the "hour" component of an xs:time, xs:dateTime, or xs:duration.

function get-days( $moment )
  as xs:integer?

Returns the "day" component of an xs:date, xs:dateTime, xs:gDay, xs:gMonthDay, or xs:duration.

function get-months( $moment )
  as xs:integer?

Returns the "month" component of an xs:date, xs:dateTime, xs:gYearMonth, xs:gMonth, xs:gMonthDay, or xs:duration.

function get-years( $moment )
  as xs:integer?

Returns the "year" component of an xs:date, xs:dateTime, xs:gYear, xs:gYearMonth, or xs:duration.

function get-timezone( $moment )
  as xs:duration?

Returns the "timezone" component from any date/time type and xs:duration.

The returned value is like timezone-from-* except that the returned type is xs:duration, not xdt:dayTimeDuration.
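The accessors can be sketched on a value read from a node, which must be cast explicitly because untypedAtomic arguments are not accepted. The event element and its start attribute are assumptions made for illustration. The expected values in the comments follow from the descriptions above and have not been checked against a Qizx installation.

```xquery
let $event := <event start="2005-03-15T10:20:30.5Z"/>

(: attribute values are untypedAtomic, so the value is cast explicitly first :)
let $start := xs:dateTime($event/@start)

return (
    get-years($start),      (: 2005 :)
    get-months($start),     (: 3 :)
    get-days($start),       (: 15 :)
    get-hours($start),      (: 10 :)
    get-minutes($start),    (: 20 :)
    get-seconds($start),    (: 30.5, as a double :)

    (: the example from the text: the year part is ignored, leaving 3600 :)
    get-all-seconds(xs:duration("P1YT1H"))
)
```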

6. Error handling

XQuery currently has no mechanism to handle run-time errors.

Actually, the language is such that error handling is not absolutely mandatory: many errors need not be recovered from (for example type errors), and the doc() function, which can generate a dynamic error, is now protected by a new function, doc-available().

However, extensions (namely the Java binding mechanism) can generate errors. It is not possible to provide a protective companion function like doc-available() for every feature.

Qizx provides a try/catch construct, which is a syntax extension. This construct has several purposes.

try { expr } catch($error) { fallback-expr }

The try/catch extended language construct first evaluates the body expr. If no error occurs, then the result of the try/catch is the return value of this expression.

If an error occurs, the local variable $error receives a string value which is the error message, and fallback-expr is evaluated (with possible access to the error message). The resulting value of the try/catch is in this case the value of this fallback expression. An error in the evaluation of the fallback-expression is not caught.

The type of this expression is the type that encompasses the types of both arguments.

Important

The body (first expression) is guaranteed to be evaluated completely before exiting the try/catch, unless an error occurs. In other words, lazy evaluation, which is used in most Qizx expressions, does not apply here.

This is especially important when functions with side-effects are called in the body. If such functions generate errors, these errors are caught by the try/catch, as one can expect. Otherwise lazy evaluation could produce strange effects.

Example: tries to open a document and, if the document cannot be opened, returns an element error whose msg attribute contains the error message.

try {
    doc("unreachable.xml")
}
catch($err) {
    <error msg="{$err}"/>
}
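The construct can also supply a fallback value for each item of a sequence. The sketch below assumes that a failed cast is reported as an error that try/catch intercepts. The input strings and the fallback value -1 are illustrative.

```xquery
for $s in ("12", "abc")
return
    try {
        xs:integer($s)
    }
    catch($err) {
        (: fallback value; $err holds the error message as a string :)
        -1
    }
```

Under that assumption the query should return 12 followed by -1: the first cast succeeds, the second fails and is replaced by the fallback.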

7. Miscellaneous

function x:parse($xml-text)
  as node()?

Parses a string representing an XML document and returns a node built from that parsing. This can be useful for converting a string from any origin into a node.

Note that function x:eval could be used too (and it is more powerful, since any kind of node can be built with it), but there are some syntax differences: for example in x:eval, the curly braces { and } have to be escaped by duplicating them.

Parameter $xml-text: a well-formed XML document as a string.

Returned value: A node of the Data Model if the string could be correctly parsed; the empty sequence if the argument was the empty sequence. An error is raised if there is a parsing error.
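A minimal sketch of the three cases described above (a successful parse, an empty argument, and a parsing error). The sample strings are assumptions, and the last expression relies on the try/catch extension from the Error handling section.

```xquery
let $text := "<doc><item>1</item><item>2</item></doc>"
let $tree := x:parse($text)
return (
    (: navigate the parsed tree like any other node :)
    count( $tree//item ),

    (: the empty sequence is passed through :)
    empty( x:parse(()) ),

    (: malformed input raises an error, which can be intercepted :)
    try { x:parse("<doc><item>1</doc>") }
    catch($err) { <error msg="{$err}"/> }
)
```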

function x:in-range( $value, $low-bound as item(), $high-bound as item() )
  as xs:boolean

function x:in-range( $value, $low-bound as item(), $high-bound as item(), 
                     $low-included as xs:boolean,
                     $high-included as xs:boolean )
  as xs:boolean

Returns true if at least one item from the sequence $value belongs to the range defined by other parameters.

This function is used typically to optimize a predicate in a Library query, for example //object[ x:in-range(@weight, 1, 10) ] which is equivalent to //object[@weight >= 1 and @weight <= 10].

The reason for this function is that the query optimizer is not able to detect such a double test in all situations. The function could become useless in later versions of Qizx, after improvement of the query optimizer.

Parameter $value: any sequence of items. Items must be comparable to the bounds, otherwise a type error is raised.

Parameters $low-bound, $high-bound: lower and upper bounds of the range. They must be of compatible types.

Parameter $low-included: if $low-included is equal to true(), the comparison used is $low-bound <= $value, otherwise $low-bound < $value. If absent, <= is assumed.

Parameter $high-included: if $high-included is equal to true(), the comparison used is $value <= $high-bound, otherwise $value < $high-bound. If absent, <= is assumed.

Returned value: True if at least one item from the sequence $value belongs to the range defined by $low-bound, $high-bound.
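The effect of the two optional flags can be sketched on a small constructed tree. The objects element is an assumption made for illustration. A constructed tree does not benefit from Library indexes, so this shows only the semantics, not the optimization.

```xquery
let $objects :=
    <objects>
        <object weight="1"/>
        <object weight="5"/>
        <object weight="10"/>
    </objects>
return (
    (: both bounds included: same as @weight >= 1 and @weight <= 10 :)
    $objects/object[ x:in-range(@weight, 1, 10) ],

    (: lower bound included, upper bound excluded: @weight >= 1 and @weight < 10 :)
    $objects/object[ x:in-range(@weight, 1, 10, true(), false()) ],

    (: both bounds excluded: @weight > 1 and @weight < 10 :)
    $objects/object[ x:in-range(@weight, 1, 10, false(), false()) ]
)
```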