XQuery: finding values in one sequence that aren’t in another

Author: Dave Cassel  |  Category: Software Development

During a recent working session, a question came up about how to quickly find all the values in one sequence that aren’t in another. A little poking around the web turns up what seems to be the standard approach: filter one sequence against the other, then apply fn:distinct-values() to the result.
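Here is a minimal sketch of that filter-and-dedupe pattern. I wrote it from the description above, so it is not copied from any particular library, and the sample values in $seq1 and $seq2 are hypothetical. It returns the distinct values of $seq1 that do not appear in $seq2:

```xquery
xquery version "1.0-ml";

(: Hypothetical sample data. :)
let $seq1 := ("a", "b", "c", "c")
let $seq2 := ("b", "d")

(: For each item in $seq1, the general comparison . = $seq2 is true
   if the item equals ANY item in $seq2, so each item of $seq1 can
   trigger a scan of $seq2. Items with no match survive the filter,
   and fn:distinct-values then removes the duplicates. :)
return fn:distinct-values($seq1[fn:not(. = $seq2)])
```

With this sample data the result should be the values "a" and "c". The order of fn:distinct-values output is not guaranteed.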

I found that approach in a couple of places, including functx’s implementation. If I’m thinking about it right, we traverse $seq2 for every element in $seq1. With n items in $seq1 and m items in $seq2, that is O(n·m) work, or O(n^2) when the sequences are about the same length (I’m not positive about that). Here’s a different approach I thought of, making use of a couple of operators available in MarkLogic.

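The ! operator (the simple map operator from XQuery 3.0, which MarkLogic supports) evaluates the expression on its right once for each item in the sequence on its left, with . bound to the current item. Thus,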

$seq1 ! map:entry(fn:string(.), fn:true())

is the same as

for $item in $seq1
return map:entry(fn:string($item), fn:true())
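The equivalence leaves one detail implicit. Each map:entry call returns a one-entry map, so both expressions produce a sequence of maps rather than a single map. The following minimal sketch merges them with map:new, MarkLogic’s constructor that combines a sequence of maps into one. The sample sequence is hypothetical:

```xquery
xquery version "1.0-ml";

(: Hypothetical sample data. :)
let $seq1 := ("apple", "banana", "cherry")

(: $seq1 ! map:entry(...) yields a sequence of three one-entry maps;
   map:new combines them into a single map with three keys. :)
let $map1 := map:new($seq1 ! map:entry(fn:string(.), fn:true()))

(: Number of keys in the merged map: 3 for this sample. :)
return map:count($map1)
```

The fn:string() call matters because MarkLogic map keys are strings.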

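The map-minus operator ($map2 - $map1) returns a map containing the entries in $map2 that are not in $map1. Because every entry’s value is fn:true(), the comparison effectively comes down to the keys, which are the string values of the items in the original sequences.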
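The pieces above can be combined into a minimal end-to-end sketch of the map approach. It assumes MarkLogic’s 1.0-ml dialect, with map:new (which merges the one-entry maps from map:entry into a single map), map:keys, and the - operator on maps. The sample sequences are hypothetical. Following the description above, it returns values in $seq2 that are not in $seq1. Swap the operands of - to get the other direction.

```xquery
xquery version "1.0-ml";

(: Hypothetical sample data. :)
let $seq1 := ("a", "b", "c")
let $seq2 := ("b", "c", "d", "e")

(: Key each sequence by its string value; fn:true() is only a
   placeholder value, since just the keys matter here. :)
let $map1 := map:new($seq1 ! map:entry(fn:string(.), fn:true()))
let $map2 := map:new($seq2 ! map:entry(fn:string(.), fn:true()))

(: Map difference: entries of $map2 whose keys are not in $map1.
   map:keys returns the surviving keys as strings, in no guaranteed
   order. :)
return map:keys($map2 - $map1)
```

With this sample data the result should be "d" and "e", in some order. One design note: because map keys are strings, this version compares string values, while the general comparison in the filter version compares typed values. For plain string sequences like these, the two agree.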

Comparing the two methods, the distinct-values approach runs in about 4.2-4.4 seconds on my laptop; the map approach runs in about 0.24 seconds. Using maps takes a little more code to set things up, but not much, and it seems to be worth it for a long sequence.


2 Responses to “XQuery: finding values in one sequence that aren’t in another”

  1. Rajesh S Says:

    Very good logic. Thanks

  2. ggeemaa Says:

    How do I store the result of an XQuery as a list?

      let $i := doc("s1.xml")//PNR
      for $d in distinct-values($i/@adsUniqueID),
          $n in distinct-values($i[@adsUniqueID = $d]/owner/office/@IATA)
      return
        <result adsUniqueID="{$d}" owner_cityCode="{$n}"/>
    

    The above query will produce the output as:

      <result adsUniqueID="1234" owner_cityCode="HIM"/>
      <result adsUniqueID="5678" owner_cityCode="HIN"/>
    

    Instead I would like to have the result as:
    adsUniqueID owner_cityCode
    1234 HIM
    5678 HIN

    (or)

    (1234,HIM)
    (5678,HIN)

    Can anybody provide a solution for this?
