Wildcards in MarkLogic date queries

Author: Dave Cassel  |  Category: Software Development

I have another reader question today, also from Amit. To summarize, he’s using the Search API and has set up a range constraint like so:

  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="date">
      <range type="xs:date" facet="false">
        <element ns="http://www.marklogic.com/app/meta" name="Date"/>
      </range>
    </constraint>
  </options>

So far, so good. But the query that he wants to run is “date:1980-01-??”; that is, find documents that have a Date element with a year of 1980, the month January, and any day within that month. The problem Amit is having is either one of syntax or his approach, depending on how we look at it.

Revising the Query

The simplest solution here is to change the query a bit. The goal is to find dates in January 1980. We can do so by changing the query from this:

date:1980-01-??

to this:

date GE 1980-01-01 AND date LE 1980-01-31

The grammar section of the search:search() documentation shows how the GE (greater-than-or-equal-to) and LE (less-than-or-equal-to) operators are defined.

Taking this approach, the only complexity would be knowing the last day of the month in order to properly specify the query. Two options come to mind here:

  1. Use the functx:last-day-of-month() function.
  2. Write the query this way: “date GE 1980-01-01 AND date LT 1980-02-01”.

Either way, it’s pretty straightforward to set the query up this way.
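
To make the two options concrete, here is a minimal sketch of the date arithmetic. It is written in Python purely to illustrate the logic and is not MarkLogic code; the function names (month_bounds, inclusive_query, half_open_query) are made up for this example. The strings it builds follow the GE/LE/LT form shown above.

```python
import calendar
from datetime import date
from typing import Tuple


def month_bounds(year: int, month: int) -> Tuple[date, date, date]:
    """Return (first day, last day, first day of the following month)."""
    first = date(year, month, 1)
    # monthrange() returns (weekday of the 1st, number of days in the month).
    last = date(year, month, calendar.monthrange(year, month)[1])
    # After December, roll over to January of the next year.
    next_first = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return first, last, next_first


def inclusive_query(name: str, year: int, month: int) -> str:
    """Option 1: closed range, which needs the last day of the month."""
    first, last, _ = month_bounds(year, month)
    return f"{name} GE {first.isoformat()} AND {name} LE {last.isoformat()}"


def half_open_query(name: str, year: int, month: int) -> str:
    """Option 2: half-open range, which only needs the first of the next month."""
    first, _, next_first = month_bounds(year, month)
    return f"{name} GE {first.isoformat()} AND {name} LT {next_first.isoformat()}"
```

The second option avoids leap-year handling entirely, which is one reason to prefer it when the query string is assembled by hand.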

A Custom Constraint

MarkLogic’s Search API allows you to write a custom constraint, which means you can parse the query yourself and interpret it accordingly. (I did a post on custom facets a while back.) Using this technique, Amit could write a custom constraint that accepts searches like “date:1980-01-??” and translates them into a query like this:

  cts:and-query((
    cts:element-range-query(xs:QName("meta:Date"), ">=", $begin-date),
    cts:element-range-query(xs:QName("meta:Date"), "<=", $end-date)
  ))

That works out to exactly the same query as the Revising the Query approach above, but it pushes the work of finding the range of dates into the search code, instead of having to do it at a higher level.

How Flexible Is This?

The challenge in this approach is that a user might infer more capability than one would probably want to provide in the implementation. For instance, if I can do this search: “date:1980-01-??”, does that mean I can do this one as well: “date:1980-??-01”?

That’s a different thing to achieve because we’re no longer trying to find a range of consecutive values, which range element indexes are very good at. To deal with this problem, we have two choices: restrict the type of search or use a different kind of search.

If we don’t need to support finding non-consecutive ranges, then we can simply reject any search string that has numbers to the right of the question marks (1980-??-?? is okay; 1980-??-01 is not). Any string that ends with a bunch of question marks can be parsed to construct a range query.
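
As a rough sketch of that parsing rule (again in Python just to show the logic, not as the constraint's actual XQuery), the hypothetical function below accepts a YYYY-MM-DD string in which the month and/or day may be “??”, rejects digits to the right of a wildcard, and returns the begin and end dates for the range query. I am assuming wildcards always cover a whole field; partial fields such as “198?” are not handled.

```python
import calendar
import re
from datetime import date
from typing import Tuple

# YYYY-MM-DD, where the month and/or the day may be the wildcard "??".
_PATTERN = re.compile(r"^(\d{4})-(\d{2}|\?\?)-(\d{2}|\?\?)$")


def wildcard_to_range(text: str) -> Tuple[date, date]:
    """Turn a trailing-wildcard date into an inclusive (begin, end) pair."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a recognized date pattern: {text!r}")
    year, month, day = match.groups()

    # 1980-??-01 asks for non-consecutive dates, so refuse it here.
    if month == "??" and day != "??":
        raise ValueError("digits to the right of a wildcard are not supported")

    y = int(year)
    if month == "??":
        # Whole year: 1980-??-??
        return date(y, 1, 1), date(y, 12, 31)

    m = int(month)
    if day == "??":
        # Whole month: 1980-01-??
        return date(y, m, 1), date(y, m, calendar.monthrange(y, m)[1])

    # No wildcard at all: a single day.
    exact = date(y, m, int(day))
    return exact, exact
```

The two dates returned would play the role of $begin-date and $end-date in the cts:and-query shown earlier.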

Supporting Non-consecutive Dates

If we do want to support non-consecutive ranges, we need to go about it differently. We’ll still need to do the custom constraint. We can’t do wildcard searches against a date element range index; rather, we will need a string element range index (we might also want the xs:date index). Once we have that, we can make a call like this:

  let $dates := cts:element-value-match(
    xs:QName("meta:Date"),
    "1980-??-01",
    ("type=string", "collation=http://marklogic.com/collation/codepoint"))

That will give us a list of matching date strings. We can then use that as part of a query:

  cts:element-range-query(xs:QName("meta:Date"), "=", $dates)

I would certainly expect this to be slower than a consecutive-date range constraint, but this would do the job.
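
The two-step shape of this approach (first expand the wildcard into the values that actually occur, then query for exactly those values) can be sketched outside MarkLogic. The Python below is only an analogy: a plain list stands in for the string range index, a dict stands in for the documents, and the function names are invented for the example. Like the MarkLogic pattern above, fnmatchcase treats “?” as exactly one character.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List


def match_values(lexicon: Iterable[str], pattern: str) -> List[str]:
    """Step 1: list the distinct indexed values that match the wildcard."""
    return sorted({value for value in lexicon if fnmatchcase(value, pattern)})


def find_documents(docs: Dict[str, str], dates: Iterable[str]) -> List[str]:
    """Step 2: keep documents whose date equals any of the matched values."""
    wanted = set(dates)
    return [doc_id for doc_id, doc_date in docs.items() if doc_date in wanted]


# Hypothetical sample data: document URI -> value of its Date element.
SAMPLE_DOCS = {
    "/a.xml": "1980-01-01",
    "/b.xml": "1980-01-15",
    "/c.xml": "1980-03-01",
    "/d.xml": "1981-03-01",
}
```

The cost grows with the number of values the pattern expands to, which is part of why I would expect this to be slower than a single consecutive range.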


3 Responses to “Wildcards in MarkLogic date queries”

  1. Geert Says:

    Instead of creating a custom constraint, you could generate named buckets dynamically. A pattern like 1980-01-?? would result in 1980-01-??. Calculate that just before you call search:search, inject that into the search options, and done.

    Not sure how well it works for 1980-??-01, but worked well for me for 1980 and 1980-01 kind of patterns, even 198001-198012 kind of patterns..

    :-)

  2. Geert Says:

    Sorry, bucket xml got lost in the html, my comment should read:

    Instead of creating a custom constraint, you could generate named buckets dynamically. A pattern like 1980-01-?? would result in <bucket name="1980-01-??" ge="1980-01-01" lt="1980-02-01">1980-01-??</bucket>. Calculate that just before you call search:search, inject that into the search options, and done.

    Not sure how well it works for 1980-??-01, but worked well for me for 1980 and 1980-01 kind of patterns, even 198001-198012 kind of patterns..

    :-)

  3. Dave Cassel Says:

    Good thought, Geert. There would still need to be some code to translate the query into the bucket, so the question is partially about where you want the parsing code to live. Either way should work well.
