Sedna LogoBackground Top
 
Home  |  Getting Started  |  Documentation  |  Demo  |  Download  |  Support 

2.2 XQuery Options and Extensions

2.2.1 Controlling Serialization

Serialization is the process of converting XML nodes evaluated by XQuery into a stream of characters. In Sedna serialization is carried out when the result of a query is returned to the user. You can control the serialization by setting the serialization parameters specified in [5]. Currently, Sedna supports the following serialization parameters:




Parameter name Values Description






indent ”yes” or ”no” Output indented
(default yes)



cdata-section-elementsElement list Text within specified elements
e.g. ’name;data’appears within CDATA section



To set a serialization parameter, use the output option in a query prolog. The output option is in the Sedna namespace (http://www.modis.ispras.ru/sedna) which is the predefined namespace in Sedna so you can omit its declaration. The value of the output option must have the following structure ”parameter-name=value; parameter-name=value”. Consider the following examples:

declare namespace se = "http://www.modis.ispras.ru/sedna";  
declare option se:output "indent=yes; cdata-section-elements=script";  
<x><script>K&amp;R C</script><b>element</b></x>

As mentioned above, you may omit the Sedna namespace declaration:

declare option se:output "indent=yes; cdata-section-elements=script";  
<x><script>K&amp;R C</script><b>element</b></x>

This query is evaluated as follows:

<x>  
  <script><![CDATA[K&R C]]></script>  
  <b>element</b>  
</x>

2.2.2 Value index-scan Functions

In the current version of Sedna, query executor does not use indices automatically. Use the following functions to enforce executor to employ indices.

index-scan ($title as xs:string,  
            $value as xdt:anyAtomicType,  
            $mode as xs:string) as node()*

The index-scan function scans the index with the $title name and returns the sequence of nodes which keys are equal (less than, greater than, greater or equal, less or equal) to the search value $value. A Sedna error is raised if the search value can not be cast to the atomic type of the index. The $mode parameter of the xs:string type is used to set the type of the scan. The value of the parameter must be equal to one of the following: ’EQ’ (equal), ’LT’(less than), ’GT’ (greater than), ’GE’ (greater or equal), ’LE’ (less or equal).

index-scan-between ($title as xs:string,  
                    $value1 as xdt:anyAtomicType,  
                    $value2 as xdt:anyAtomicType,  
                    $range as xs:string) as node()*

The index-scan-between scans the index with the $title name and returns the sequence of nodes which keys belong to the interval (segment, left half-interval, right half-interval) between the first $value1 and second $value2 search values. A Sedna error is raised if the search values can not be cast to the atomic type of the index. This function provides the $range parameter of the xs:string type to set the type of the scan. The value of the string must be equal to one of the following: ’INT’ (interval), ’SEG’ (segment), ’HINTR’ (right half-interval), ’HINTL’ (left half-interval).

For example, to select the names of people who live in the London city employing the ”people” index defined in section 2.5.3, use the following expression:

index-scan("people", "London", "EQ")/name

2.2.3 Full-Text Search Functions

Please read section 2.5.4 before reading this section.

In the current version of Sedna, query executor does not use full-text indices automatically. Use the following functions to enforce executor to employ indices.

ftindex-scan($title as xs:string,  
             $query as xs:string,  
             $options as xs:string) as node()*

The ftindex-scan function scans the full-text index with the $title name and returns the sequence of items which satisfy the $query. If dtSearch [15] is used as full-text search backend, use dtSearch request language [16] to specify the query. DtSearch options dtsSearchAnyWords or dtsSearchAllWords may be specified in $options.

For example, you can employ the ”articles” index defined in section 2.5.4 to select the titles of articles that contain word ”apple” but not ”pear”:

ftindex-scan("articles", "apple and not pear")/title

If native full-text indices are used, the following constructs can be used as parts of query:

  • phrases: several words in single or double quotes will be searched as a phrase
  • binary operators: parts of the query separated by whitepace are treated as conjunction, disjunction is specified by ’OR’. For example: query ’apple juice’ will return only nodes contaitinig both words. while query ’apple OR juice’ will return nodes containing any of these words.
  • stemming: if index was created with stemtype=both option, tilde must be appended to the word in order to use stemming, for example in the query ’apple juice~’ stemming will be used only for the second word.
  • contains: to search for words inside some tag, use CONTAINS; the query ’title CONTAINS word’ will return nodes in which word ’word’ occurs as part of tag ’title’.

All keywords (like CONTAINS and OR) must be upper-cased.

The ftscan function returns those items of the input sequence $seq which satisfy the query $query. The function does not use indices and can be applied to any sequence of nodes, even those that are not indexed. The query $query is evaluated over the text representation constructed according to the $type and $customization_rules parameters. The values of the parameters are the same as those used when a full-text index is created (see section 2.5.4 for details).

ftscan($seq as node()*,  
       $query as xs:string,  
       $type as xs:string,  
       $customization_rules as xs:string) as node()*

For example, you can select the titles of articles that contain word ”apple” but not ”pear” without using indices and using special customization rules as follows:

ftscan(doc("foo")/library//article,  
       "apple and not pear",  
       "customized-value",  
       (("b","string-value"),("a","delimited-value")))/title

2.2.4 SQL Connection

SQL Connection allows access to relational databases from XQuery using SQL. The resulting relations are represented on-the-fly as sequences of XML-elements representing rows. These elements have sub-elements corresponding with the columns returned by the SQL query and thus can be easy processed in XQuery. All functions dealing with access to SQL data are located in the namespace http://modis.ispras.ru/Sedna/SQL which is referred as sql in the following function declarations and examples.

Connections

In order to execute SQL queries on a RDBMS, you should first establish a connection to it using one of the sql:connect functions:

function sql:connect($db-url as xs:string) as xs:integer  
function sql:connect($db-url as xs:string,  
                     $user as xs:string) as xs:integer  
function sql:connect($db-url as xs:string,  
                     $user as xs:string,  
                     $password as xs:string) as xs:integer  
function sql:connect($db-url as xs:string,  
                     $user as xs:string,  
                     $password as xs:string,  
                     $options as element()*) as xs:integer

These functions attempt to establish a database connection to the given URL using a user name and password if specified. They return a connection handle which could be then passed to sql:execute, sql:prepare, sql:close, sql:rollback, and sql:commit functions. If connection could not be established, a Sedna error is raised.

All arguments of the sql:connect functions except for $db-url are optional:

  • $db-url is the URL of the database to which a connect is established. It should be one of the following form:
    odbc:<driver name>:[//<server>[/<database>][;]][<options>]

    “;” after <database> or <server> is required if there are some driver options following it. Driver options must be in the following form:

    <option>=<value>{;<option>=<value>}

    List of available options depends on the ODBC driver used. One of the common options is “Port” which is used to specify the port on which the database server is configured to listen. For example:

    odbc:MySQL ODBC 3.51 Driver://localhost/somedb;Port=1234

  • $user is your user name for the session.
  • $password is your password for the session.
  • $options is an optional sequence of connection options. Connection options are elements of the form:
    <sql:option name="<option-name>" value="<option-value>"/>

    The only connection option available for the moment is manual-commit which enables manual commit mode if its value is on.

To disconnect from the database, you can use the following function:

function sql:close($connection as xs:integer) as element()?

It closes database connection associated with connection handle $connection. A Sedna error is raised if operation cannot be completed.

Executing Queries

When a database connection is established you can start executing queries. Two types of query execution are supported: direct query execution and prepared query execution.

Direct Queries

Simple SQL queries are executed as the following XQuery example shows:

declare namespace sql="http://modis.ispras.ru/Sedna/SQL";  
let $connection :=  
   sql:connect("odbc:MySQL ODBC 3.51 Driver://localhost/somedb",  
               "user", "pass")  
return  
   sql:execute($connection,  
               "SELECT * FROM people WHERE first = ’Peter’");

The result will be something like this:

<tuple first="Peter" last="Jackson" city="Wellington"/>

There are two functions for direct query execution:

function sql:execute($connection as xs:integer,  
                     $statement as xs:string) as element()*  
function sql:execute($connection as xs:integer,  
                     $statement as xs:string,  
                     $query-options as element()*) as element()*

These functions execute a SQL query and return a sequence of elements representing the query result. SQL query can be as both a query statement and an update statement. In case of query statement, the result sequence contains an element named ’row’ for each row of the query result. Each element contains as many children attributes as there are non-NULL fields in the corresponding result-row. Each attribute has the name of a row field. Fields with NULL values are not included. In case of update statement, empty sequence is returned.A Sedna error is raised on an erroneous statement.

The sql:execute have the following arguments:

  • $connection is a connection handle, returned by sql:connect function;
  • $statement is a string containing SQL statement to be executed;
  • $query-options is a sequence of optional query parameters.

Update queries can be executed using the sql:exec-update function:

function sql:exec-update($connection as xs:integer,  
      $statement as xs:string) as xs:integer  
function sql:exec-update($connection as xs:integer,  
      $statement as xs:string,  
      $query-options as element()*) as xs:integer

these functions are similar to sql:execute, but return the number of rows affected by an update query (instead of an empty sequence returned by sql:execute for update-queries). Function arguments are same as for sql:execute. The behaviour of this function is undefined for non-update queries.

Prepared Statements

Sometimes it is more convenient or more efficient to use prepared SQL statements instead of direct query execution. In most cases, when a SQL statement is prepared it will be sent to the DBMS right away, where it will be compiled. This means that the DBMS does not have to compile a prepared statement each time it is executed.

Prepared statements can take parameters. This allows using the same statement and supply it with different values each time you execute it, as in the following XQuery example:

declare namespace sql="http://modis.ispras.ru/Sedna/SQL";  
let $connection :=  
   sql:connect("odbc:MySQL ODBC 3.51 Driver://localhost/somedb",  
               "user", "pass")  
let $statement :=  
   sql:prepare($connection,  
               "INSERT INTO people(first, last) VALUES (?, ?)")  
return (sql:execute($statement, "John", "Smith"),  
        sql:execute($statement, "Matthew", "Barney"))

this XQuery code inserts two rows into table people and returns an empty sequence.

To use prepared statements, first you need to create a prepared statement handle using the sql:prepare function:

function sql:prepare($connection as xs:integer,  
                     $statement as xs:string) as xs:integer  
function sql:prepare($connection as xs:integer,  
                     $statement as xs:string,  
                     $query-options as element()*) as xs:integer

these functions prepare a SQL statement for later execution and returns a prepared statement handle which can be used in the sql:execute and sql:exec-update functions. A Sedna error is raised on an erroneous statement.

The sql:prepare functions have the following arguments:

  • $connection is a connection handle, created by sql:connect function;
  • $statement is a string containing a SQL statement that may contain one or more ’?’ - IN parameter placeholders;
  • $query-options is a sequence of optional query parameters.

There are two prepared statement execution functions, similar to direct query execution:

function sql:execute($prepared-statement as xs:integer,  
                     $param1 as item()?,  
                     ...) as element()*

this function is similar to sql:execute for direct queries and returns a sequence of elements representing the query result.

The sql:execute function have the following arguments:

  • $prepared-statement is a prepared statement handle created by $sql:prepare;
  • $param1, ... are parameters for parametrized statements. The number of parameters specified must exactly match the number of parameters of the prepared statement. NULL values are represented as empty sequences ().

To execute a prepared update statement you may use exec-update function:

function sql:exec-update($prepared-statement as xs:integer,  
                         $param1 as item()?,  
                         ...) as xs:integer

This function is similar to sql:execute, but returns the number of rows affected by an update query (instead of an empty sequence returned by sql:execute for update-queries). Function arguments are the same as for sql:execute. The behavior of this function is undefined for non-update queries.

Transactions

The default commit mode of connection is auto-commit, meaning that all updates will be committed automatically. If this is not desired behaviour, you can pass manual-commit option to sql:connect when you create a connection handle.

In manual commit mode you can specify when updates will be committed or rolled back:

declare namespace sql="http://modis.ispras.ru/Sedna/SQL";  
let $connection :=  
  sql:connect("odbc:MySQL ODBC 3.51 Driver://localhost/testdb",  
              "user-name",  
              "user-password",  
              <sql:option name="manual-commit" value="1"/>)  
return  
  for $person in doc("auction")/person  
  return (  
    sql:execute($connection, "<do something with person>"),  
    if (fn:all-is-ok($connection, $person)) then  
      (  
        sql:execute($connection, "<do something with person>"),  
        sql:commit($connection)  
      )  
    else  
      sql:rollback($connection))

There are two functions for specifying transaction boundaries - sql:commit and sql:rollback (transactions are started automatically by queries, these functions only close them):

function sql:commit($connection as xs:integer) as element()?

sql:commit function commits all changes made during the last transaction in the database connection specified by connection handle $connection and closes transaction. A Sedna error is raised if operation cannot be completed.

Function sql:rollback rolls back all changes made during the last transaction in the database connection specified by the connection handle $connection and closes transaction. A Sedna error is raised if operation cannot be completed.

function sql:rollback($connection as xs:integer) as element()?

2.2.5 External Functions

External function is a notion defined in the XQuery specification [3] as follows: ”External functions are functions that are implemented outside the query environment”. Support for external functions allows you to extend XQuery by implementing functions in other languages.

Sedna provides a server programming API to write external functions in the C/C++ language. External functions in Sedna are limited to dealing with sequences of atomic values. External functions are compiled and linked in the form of shared libraries (i.e. .dll files in Windows or .so files in Linux/FreeBSD and .dylib in Mac OS) and loaded by the server on demand. Although the Sedna XQuery executor evaluates queries in a lazy manner, all external function calls are evaluated in an eager manner.

Using External Functions

To use an external function you need to declare this function in prologue with external keyword instead of function body. Then it may be used normally:

declare function se:f($a as xs:integer) as $xs:integer external;  
f(10)

Creating External Functions

External functions must be written in C/C++. To implement a new XQuery function func you should write the following C (or C++) functions: func, func_init and func_deinit. When executor decides that it needs to use an external function, first it initializes this function by calling func_init, after that it will call func to compute results as many times as needed. When some external function is not needed anymore, executor calls func_deinit (which probably will free any memory allocated by func_init). Each one of the three functions receives an SEDNA_EF_INIT1 structure as a parameter. This structure has several fields that are initialized by executor before any func_init or func_deinit calls:

typedef struct sedna_ef_init  
{  
    void *(*sedna_malloc)(size_t);  
    void (*sedna_free)(void *);  
    SEDNA_SEQUENCE_ITEM *item_buf;  
} SEDNA_EF_INIT;

The fields of this structure may be used in your implementation:

  • sedna_malloc is a pointer to a malloc function which must be used to allocate memory for function results, this memory will be automatically freed by the query executor. It may also be used to allocate memory for internal use, such memory must be freed manually using the sedna_free function.
  • sedna_free is a pointer to free function that releases memory allocated using sedna_malloc function.
  • item_buf is a pointer to a preallocated SEDNA_SEQUENCE_ITEM which may be used to store results (this allows to avoid using sedna_malloc function when result is a single atomic non-string value)

func, func_init and func_deinit must have specific signatures:

  • func() (required) – computes external function results. This function has the following signature:
    SEDNA_SEQUENCE_ITEM *func(SEDNA_EF_INIT *init,  
                              SEDNA_EF_ARGS *args,  
                              char * error_msg_buf);

    • init is a pointer to the SEDNA_EF_INIT structure which was passed to func_init function (if written);
    • args is a pointer to the SEDNA_EF_ARGS structure which contains all function arguments;
    • error_msg_buf is a pointer to the string buffer used for specifying error message if function invocation fails. Maximum message length is SEDNA_ERROR_MSG_BUF_SIZE bytes, including the null character ’\0’.
  • func_init() (optional) – the initialization function. It can be used to allocate any memory required by the main function. This function has the following signature:
    void func_init(SEDNA_EF_INIT *init, char * error_msg_buf);

    • init is a pointer to the SEDNA_EF_INIT structure (the pointer to this structure will be passed then to func and func_deinit functions);
    • error_msg_buf is a pointer to the string buffer used for specifying error message if function invocation fails. Maximum message length is SEDNA_ERROR_MSG_BUF_SIZE, including the null character ’\0’.
  • func_deinit() (optional) – the deinitialization function. It should deallocate any memory allocated by the initialization function. This function has the following signature:
    void func_init(SEDNA_EF_INIT *init, char * error_msg_buf);

    • init is a pointer to the SEDNA_EF_INIT structure which was passed to func_init function (if written);
    • error_msg_buf is a pointer to the string buffer used for specifying error message if function invocation fails. Maximum message length is SEDNA_ERROR_MSG_BUF_SIZE, including the null character ’\0’.

When func, func_init or func_deinit is being executed error_msg_buf contains an empty string. If function succeedes, it should leave this value empty. In case of error a non-empty string (error description) must be placed in error_msg_buf (if you place an empty string in error_msg_buf executor assumes that function execution was successful).

Each shared library must also export an null-terminated array with the names of the XQuery functions defined by this library:

char const *ef_names[] = {"func", NULL};

The file sedna_ef.h defines several types for representing function arguments and results:

  • SEDNA_ATOMIC_TYPE – represents an atomic type, defined as:
    typedef enum sedna_atomic_type {  
         SEDNATYPE_integer,  
         SEDNATYPE_float,  
         SEDNATYPE_double,  
         SEDNATYPE_string  
    } SEDNA_ATOMIC_TYPE;

  • SEDNA_ATOMIC_VALUE – represents an atomic value, defined as:
    typedef int     SEDNA_integer;  
    typedef float   SEDNA_float;  
    typedef double  SEDNA_double;  
    typedef char   *SEDNA_string;  
    typedef struct sedna_atomic_value {  
        SEDNA_ATOMIC_TYPE type;  
        union {  
            SEDNA_integer   val_integer;  
            SEDNA_float     val_float;  
            SEDNA_double    val_double;  
            SEDNA_string    val_string;  
        };  
    } SEDNA_ATOMIC_VALUE;

    Memory for values that are pointers (i.e. SEDNA_string) MUST be allocated using the malloc function passed in the SEDNA_EF_INIT structure.

  • SEDNA_SEQUENCE_ITEM – represents a node in a linked list of atomic values, defined as:
    typedef struct sedna_sequence_item {  
        SEDNA_ATOMIC_VALUE data;  
        struct sedna_sequence_item *next;  
    } SEDNA_SEQUENCE_ITEM;

    Linked lists are used to represent sequences of atomic values. An empty sequence is presented by a NULL pointer. If func needs to return a sequence of values, memory for nodes MUST be allocated using the malloc function passed in SEDNA_EF_INIT structure.

  • SEDNA_EF_ARGS – represents an array of arguments passed to a function, defined as:
    typedef struct sedna_ef_args {  
        int length;  
        SEDNA_SEQUENCE_ITEM **args;  
    } SEDNA_EF_ARGS;

Location of External Function Libraries

Compiled libraries must be placed in the directory lib that is (1) in the same directory where the directory data with database data is located or (2) in the directory <db_name>_files where database data are stored2 . Libraries that are database-independent should be placed in (1). Libraries that are database-specific should be placed in (2). Overloaded functions are not allowed. If two libraries located in (1) and (2) contain functions with the same name, a function from the library in (2) is called. If libraries in the same directory (1 or 2) contain functions with the same name, it is not specified which one is called.

There is a sample external function code available in the folder:

     [win:] INSTALL_DIR\examples\api\external-functions\c\  
     [nix:] INSTALL_DIR/examples/api/external-functions/c/

where INSTALL_DIR refers to the directory where Sedna is installed.

2.2.6 Runtime Properties

The se:get-property function provides a method for applications to determine in runtime the current values of system parameters, configurable limits, environment information. The name argument specifies the system variable to be queried. The function is defined within the predefined Sedna namespace (se prefix) as follows:

se:get-property($name as xs:string) as item()

The available names are as follows:

  • $user - retrieves string which contains current user name