A trick with cts:near-query

Author: Dave Cassel  |  Category: Software Development

A reader recently emailed me a question about an old post on the MarkLogic developer’s mailing list. The goal was to run a query such that an element’s value and an attribute on that element both matched. The problem was that the queries the asker built would match one element’s value and a different element’s attribute value, resulting in a false positive. In this post, I present a solution using cts:near-query(). Let’s take a look:

Sample Data

<mydoc>
  <data>
    <carList>
      <car color="green">M3</car>
      <car color="blue">beetle</car>
      <car color="yellow">fiat</car>
      <car color="red">911</car>
    </carList>
  </data>
</mydoc>
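
Every search in this post starts from fn:collection("test"), so the sample document has to live in a collection with that name. Here is a minimal sketch for loading it. The URI is an arbitrary placeholder, not something from the original question, and the default permissions are just an assumption about a scratch setup.

```xquery
xquery version "1.0-ml";

(: Insert the sample document into the "test" collection.
   "/cars/sample.xml" is a hypothetical URI; any URI will do. :)
xdmp:document-insert(
  "/cars/sample.xml",
  <mydoc>
    <data>
      <carList>
        <car color="green">M3</car>
        <car color="blue">beetle</car>
        <car color="yellow">fiat</car>
        <car color="red">911</car>
      </carList>
    </data>
  </mydoc>,
  xdmp:default-permissions(),
  "test"
)
```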

Attempted Solution

cts:search(
  fn:collection("test")/mydoc,
  cts:element-query(
    xs:QName("carList"), 
    cts:and-query(( 
      cts:element-attribute-value-query(xs:QName("car"), xs:QName("color"), "blue"), 
      cts:element-value-query(xs:QName("car"), "M3") 
    )) 
  )
)

This search didn’t work, in that it produced false positives: results that matched the query, but weren’t what the developer was looking for. Specifically, the developer wanted a query that would match if and only if there was a blue M3 car. This query got a hit on the sample document because looking under the carList element (the element-query), there was a blue car and there was an M3 car. However, those two hits are on different cars.

You might think about changing the cts:element-query to target car instead of carList. Interestingly, that works for the cts:element-attribute-value-query, but not for the cts:element-value-query. Why? Because cts:element-query specifies an element that other queries will look inside of. A cts:element-attribute-value-query may target an attribute on the specified element, but in this case the cts:element-value-query would be looking for a car within a car — that doesn’t work.
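
For reference, the car-scoped variant described above would look something like the sketch below. It asks for the green M3, which does exist in the sample data, so if the explanation above holds, it should still come back empty.

```xquery
xquery version "1.0-ml";

(: The element-query now targets car instead of carList.
   The attribute query can match on car itself, but the
   element-value-query ends up looking for a car nested
   inside a car, which the sample document does not have. :)
cts:search(
  fn:collection("test")/mydoc,
  cts:element-query(
    xs:QName("car"),
    cts:and-query((
      cts:element-attribute-value-query(xs:QName("car"), xs:QName("color"), "green"),
      cts:element-value-query(xs:QName("car"), "M3")
    ))
  )
)
```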

Revised Solution

Here’s an approach that will work.

cts:search(
  fn:collection("test")/mydoc,
  cts:near-query(
    (
      cts:element-value-query(xs:QName("car"), "M3"),
      cts:element-attribute-value-query(xs:QName("car"), xs:QName("color"), "green")
    ),
    0
  )
)

cts:near-query is like cts:and-query, with the additional requirement that the matching parts of the document be within a specified distance of each other. In this case, I’m requiring a distance of zero, which brings up the question of how near-query measures distance.

Near-query Distance

cts:near-query measures distance in words, so what matters is the word position of each match. Think first about the words in elements. In our sample document, “M3”, in the first car element, is in position zero. The word “beetle”, in the second car element, is in position one. Notice that the elements themselves do not affect the position counts, nor do words in the attributes.

Based on my testing, the position for an attribute word is the position value of the next element word. In our sample data, the color attribute of the first car element (“green”) has position zero, same as the “M3” word in the car element. If there were attributes on carList, their positions would also be the same as that of “M3”, being the next element word. The color value on the second car (“blue”) has a position of one, ensuring a distance greater than zero between that attribute value’s position (one) and the element word’s position (zero) before it.

We can take advantage of this with near-queries, at least in a case like this. By setting up a cts:near-query that requires an attribute value and an element value with zero distance between them, we target an element value and attributes on that same element.
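
One way to poke at this without loading anything is cts:contains(), which evaluates a query against a node you hand it. The sketch below builds the sample document in memory and checks two near-queries: one pairing “M3” with the color on the same car, one pairing it with the color on the next car. The variable names are only for illustration. If positions are assigned as described above, the first check should be satisfied and the second should not, but treat that as something to verify on your own server rather than a guarantee.

```xquery
xquery version "1.0-ml";

let $doc :=
  <mydoc>
    <data>
      <carList>
        <car color="green">M3</car>
        <car color="blue">beetle</car>
        <car color="yellow">fiat</car>
        <car color="red">911</car>
      </carList>
    </data>
  </mydoc>

(: Value and attribute on the same car element. :)
let $same-car :=
  cts:near-query(
    (
      cts:element-value-query(xs:QName("car"), "M3"),
      cts:element-attribute-value-query(xs:QName("car"), xs:QName("color"), "green")
    ),
    0
  )

(: Value on the first car, attribute on the second car. :)
let $different-cars :=
  cts:near-query(
    (
      cts:element-value-query(xs:QName("car"), "M3"),
      cts:element-attribute-value-query(xs:QName("car"), xs:QName("color"), "blue")
    ),
    0
  )

(: Returns two booleans, one per query. :)
return (
  cts:contains($doc, $same-car),
  cts:contains($doc, $different-cars)
)
```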

Performance

With the near-query based search, we can now get the correct results. One thing remains: make sure we get correct results quickly. As it stands, we are relying on the filtering stage to get this query right (remember that MarkLogic runs a search in two steps: index resolution, then filtering). If we try to speed this up by passing the “unfiltered” option to cts:search(), we’ll start getting false positives again. That happens because the database settings this near-query needs are off by default. Try running the query unfiltered:

cts:search(
  fn:collection("test")/mydoc,
  cts:near-query(
    (
      cts:element-value-query(xs:QName("car"), "M3"),
      cts:element-attribute-value-query(xs:QName("car"), xs:QName("color"), "blue")
    ),
    0
  ),
  "unfiltered"
)

This query runs the index resolution step, finding that our sample document has a car element with the value “M3” and a car element with a “blue” color attribute. The default indexes lack position information, so the query would rely on filtering (loading the document and checking the locations of the hits) to get rid of this false positive. Since we’re running unfiltered, it slips through.

We can fix that by turning on two database settings: element value positions and attribute value positions. With them on, the index resolution step can see that the “M3” and “blue” values are too far apart, and the candidate match is correctly dropped.
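
Both settings can be flipped in the Admin UI, or scripted with the Admin API. Here is a minimal sketch of the scripted route. The database name "Documents" is a placeholder, so substitute your own. Expect existing content to need reindexing before unfiltered results reflect the new position data.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "Documents" is a placeholder database name. :)
let $db := xdmp:database("Documents")
let $config := admin:get-configuration()

(: Turn on the two position settings the near-query relies on. :)
let $config := admin:database-set-element-value-positions($config, $db, fn:true())
let $config := admin:database-set-attribute-value-positions($config, $db, fn:true())

return admin:save-configuration($config)
```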


One Response to “A trick with cts:near-query”

  1. Pete Williams Says:

    Dave, nice trick using cts:near-query. For databases that have the ‘word
    searches’ index enabled, here’s an alternative approach that works with
    cts:element-query:

    cts:search(
      fn:collection("test")/mydoc,
      cts:element-query(
        xs:QName("car"),
        cts:and-query((
          cts:element-attribute-value-query(xs:QName("car"), xs:QName("color"), "blue"),
          cts:word-query("M3")
        ))
      )
    )

    Keep in mind, the use of cts:word-query in place of cts:element-value-query
    may not return expected results as you will now match keywords/phrases in the
    car element, and not necessarily the entire car element value. It all depends
    on your application/dataset if this is acceptable or not.

    To support unfiltered searches, this approach would additionally require the
    ‘word positions’, ‘element word positions’ and ‘attribute value positions’
    indexes to be enabled.
