Range Indexes and Empty Elements

Author: Dave Cassel  |  Category: Software Development

I’ve started a couple posts lately, only to find them more complex than expected. So to get myself back on the board, tonight you get a pretty simple one, based on an error one of my colleagues encountered recently. Suppose we have an int range element index set up on <count/> in our MarkLogic database.
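To reproduce this, you first need the index. Below is a minimal sketch that creates it with the MarkLogic Admin API. It assumes <count/> is in no namespace and that you run it with admin privileges against the database that will hold /test.xml (xdmp:database() with no argument returns the current database's ID). In newer MarkLogic releases the index function also accepts an optional sixth argument that controls how invalid values are handled, as the comment at the end of this post notes. Check the Admin API reference for your version before relying on it.

```xquery
xquery version "1.0-ml";

(: Sketch: add an xs:int range element index on <count/> (no namespace)
   to the current database, then save the updated configuration. :)
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
let $index  := admin:database-range-element-index(
                 "int",      (: scalar type of the index :)
                 "",         (: namespace URI: <count/> has none :)
                 "count",    (: element local name :)
                 "",         (: collation: only meaningful for string indexes :)
                 fn:false()  (: range value positions: not needed here :)
               )
return admin:save-configuration(
  admin:database-add-range-element-index($config, xdmp:database(), $index)
)
```

With the index in place, try inserting a document whose <count/> element is empty.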

xdmp:document-insert(
  "/test.xml",
  <doc>
    <count/>
  </doc>
)

This triggers an error:

[1.0-ml] XDMP-RANGEINDEX: xdmp:eval("xdmp:document-insert(&#10; &quot;/test.xml&quot;,&#10; <doc>&#…", (), <options xmlns="xdmp:eval"><database>13528717381355350321</database><root>/Users/dcassel/gi…</options>) -- Range index error: int fn:doc("/test.xml")/doc/count: XDMP-CAST: (err:FORG0001) Invalid cast: xs:untypedAtomic("") cast as xs:int

So what’s going on? When you set up a range index on an element, MarkLogic casts every new value of that element to the index’s type as it adds the value to the index. The value of the <count/> element above is “” (empty string), which has no valid interpretation as an xs:int, so the cast fails and the insert is rejected. You’ll encounter the same problem with any non-string element or attribute range index (I see it a lot with dates that don’t match the xs:date format, YYYY-MM-DD).
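One way to catch this before it reaches the index is to test the value with the standard XQuery castable expression, which performs the same xs:untypedAtomic-to-xs:int check that fails in the error above, but returns a boolean instead of raising an error. The helper name local:is-valid-int is hypothetical (it is not a MarkLogic built-in), and this is a sketch rather than a complete validation layer.

```xquery
xquery version "1.0";

(: Hypothetical helper: true when the element's atomized value can be
   cast to xs:int, i.e. when the cast done by an xs:int range index
   would succeed. :)
declare function local:is-valid-int($e as element()) as xs:boolean
{
  fn:data($e) castable as xs:int
};

(
  local:is-valid-int(<count/>),           (: "" is not an integer: false :)
  local:is-valid-int(<count>42</count>),  (: valid integer: true :)
  local:is-valid-int(<count>n/a</count>)  (: not a number: false :)
)
```

A check like this lets you decide, per value, whether to substitute a default or drop the element before calling xdmp:document-insert.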

The Solution

There are basically two choices if you don’t have an actual value to put in an element. The first is to assume a sensible default. For an element called count, that’s probably zero. The other approach is to skip the element altogether. While that keeps the insert simple, it makes an update a little more complex, because you need to check whether to insert a new element or replace an existing one:

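(: Set <count> under $doc to $new-ct. $doc is the <doc> element itself
   (for example fn:doc("/test.xml")/doc), not the document node. :)
declare function local:update-count($doc as element(doc), $new-ct as xs:integer)
  as empty-sequence()
{
  if (fn:exists($doc/count)) then
    (: <count> is already present: replace it with the new value :)
    xdmp:node-replace($doc/count, <count>{$new-ct}</count>)
  else
    (: <count> was skipped at insert time: add it as a new child :)
    xdmp:node-insert-child($doc, <count>{$new-ct}</count>)
};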
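For comparison, the default-value approach needs no existence check. Here is a minimal sketch, assuming a hypothetical helper local:count-element (not part of the original post or of MarkLogic) that falls back to zero when no value is available. The parameter is typed xs:integer rather than xs:int because an integer literal such as 7 is an xs:integer and would not be accepted by an xs:int parameter.

```xquery
xquery version "1.0";

(: Hypothetical helper: build the <count> element, substituting the
   default 0 when $value is empty, so an xs:int range index always
   receives a castable value. :)
declare function local:count-element($value as xs:integer?) as element(count)
{
  <count>{ ($value, 0)[1] }</count>
};

(
  local:count-element(()),  (: no value yet: <count>0</count> :)
  local:count-element(7)    (: real value: <count>7</count> :)
)
```

In MarkLogic you would build the document as <doc>{ local:count-element(()) }</doc> and pass it to xdmp:document-insert. Because every document then carries a <count> element, an update can always use xdmp:node-replace, and the branch in local:update-count is no longer needed.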

Besides complexity, the other factor to consider is whether a missing value is likely to be provided later. If the value might never be provided (for instance, you have a lot of possible elements, but usually only a few get filled), then you’re better off skipping them: the empty elements don’t serve a purpose. If an element is likely to get a value eventually, populating it with a reasonable default keeps your code a little simpler.

One Response to “Range Indexes and Empty Elements”

  1. ankit kakkar Says:

    ML 6.0 has an “invalid values” parameter for creating range indexes. It lets you either ignore the error and ingest the data, or reject the document.
