XQuery: finding values in one sequence that aren’t in another
Author: Dave Cassel | Category: Software Development
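During a recent working session, a question came up about how to quickly find all the values in one sequence that aren’t in another. A little poking around the web turns up what seems to be the standard approach: keep the items of $seq1 that aren’t equal to any item in $seq2, then remove duplicates with fn:distinct-values.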
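Here is a minimal sketch of that filter-based approach, written along the lines of functx:value-except and assuming MarkLogic’s 1.0-ml dialect. The function name local:value-except-filter is hypothetical, and whether $seq2 really gets rescanned for every item depends on the query optimizer.

```xquery
xquery version "1.0-ml";

(: Filter-based difference, in the style of functx:value-except.
   For each item of $seq1, the general comparison (. = $seq2) may scan
   all of $seq2, so the cost grows roughly with count($seq1) * count($seq2). :)
declare function local:value-except-filter(
  $seq1 as xs:anyAtomicType*,
  $seq2 as xs:anyAtomicType*
) as xs:anyAtomicType*
{
  fn:distinct-values($seq1[fn:not(. = $seq2)])
};

(: Sample call with made-up data: yields "a" and "c". :)
local:value-except-filter(("a", "b", "c", "a"), ("b", "d"))
```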
I found that approach in a couple of places, including functx’s implementation (functx:value-except). If I’m thinking about it right, for every item in $seq1 we may traverse all of $seq2, which makes it O(n*m) for sequences of lengths n and m, or roughly O(n^2) when they’re about the same size (I’m not positive about that). Here’s a different approach I thought of, using the ! (simple map) operator together with MarkLogic’s map-difference operator.
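A minimal sketch of the map-based approach, assuming MarkLogic’s 1.0-ml dialect (which provides map:new, map:entry, map:keys, and the - operator on maps). The function name local:value-except-map is hypothetical. Building each map is one pass over its sequence, so, assuming map lookups are close to constant time, the whole thing should scale roughly linearly rather than as n*m.

```xquery
xquery version "1.0-ml";

(: Map-based difference. Each sequence becomes a map whose keys are the
   string values of its items (map keys are strings, hence fn:string).
   Duplicates collapse into a single key automatically. :)
declare function local:value-except-map(
  $seq1 as item()*,
  $seq2 as item()*
) as xs:string*
{
  let $map1 := map:new($seq1 ! map:entry(fn:string(.), fn:true()))
  let $map2 := map:new($seq2 ! map:entry(fn:string(.), fn:true()))
  (: Every value is fn:true(), so the difference depends only on the keys. :)
  return map:keys($map1 - $map2)
};

(: Sample call with made-up data: yields "a" and "c", in no guaranteed order. :)
local:value-except-map(("a", "b", "c", "a"), ("b", "d"))
```

One design difference worth knowing: because map keys are strings, this version compares items by their string values (so the integer 1 and the string "1" end up as the same key) and returns strings, whereas the filter version compares typed values. The order of map:keys isn’t guaranteed either, so sort the result if order matters.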
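The ! operator is XQuery 3.0’s simple map operator: it evaluates the expression on its right once for each item in the sequence on its left, with that item as the context item (.). Thus,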
$seq1 ! map:entry(fn:string(.), fn:true())
is the same as
for $item in $seq1 return map:entry(fn:string($item), fn:true())
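To check that the two forms really are interchangeable, here’s a small sketch (MarkLogic 1.0-ml dialect assumed, sample data made up) that builds a map both ways and compares the resulting keys:

```xquery
xquery version "1.0-ml";

(: Made-up sample data; "y" appears twice but becomes a single key. :)
let $seq1 := ("x", "y", "y", "z")

(: Simple map operator: the right-hand side runs once per item, with the item as "." :)
let $via-bang := map:new($seq1 ! map:entry(fn:string(.), fn:true()))

(: Equivalent FLWOR: the item is bound to $item instead of the context item. :)
let $via-flwor := map:new(
  for $item in $seq1 return map:entry(fn:string($item), fn:true())
)

(: Sort the keys before comparing, since map:keys has no guaranteed order. :)
return fn:deep-equal(
  (for $k in map:keys($via-bang) order by $k return $k),
  (for $k in map:keys($via-flwor) order by $k return $k)
)
```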
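The map-minus operator ($map2 - $map1) returns a map containing the entries in $map2 that are not in $map1. MarkLogic map keys are strings, which is why each item goes through fn:string before it becomes a key.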
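A small illustration of map difference, again assuming MarkLogic’s 1.0-ml dialect and made-up sample data. Every value is fn:true(), so the result depends only on which keys each map holds:

```xquery
xquery version "1.0-ml";

(: Made-up sample maps; every value is fn:true(). :)
let $map1 := map:new(("b", "d") ! map:entry(., fn:true()))
let $map2 := map:new(("a", "b", "c") ! map:entry(., fn:true()))

(: $map2 - $map1 keeps the entries of $map2 that are not in $map1.
   With identical values everywhere, that comes down to comparing keys. :)
let $diff := $map2 - $map1

(: Sort the keys for a stable display order: "a", "c". :)
return for $k in map:keys($diff) order by $k return $k
```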
Comparing the two methods on my laptop, the distinct-values approach runs in about 4.2-4.4 seconds; the map approach runs in about 0.24 seconds. Using maps takes a little more setup code, but not much, and it seems well worth it for long sequences.
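If you want to try this on your own data, a self-contained check like the following confirms that both approaches return the same values before you time them. It assumes MarkLogic’s 1.0-ml dialect, and the sequences are made-up illustration data, not the ones behind the timings above; timing each approach in its own query keeps the measurements separate.

```xquery
xquery version "1.0-ml";

(: Made-up test data: "1".."5000" minus the even numbers.
   Sizes are illustrative only. :)
let $seq1 := (1 to 5000) ! fn:string(.)
let $seq2 := (1 to 5000)[. mod 2 eq 0] ! fn:string(.)

(: Filter-based (distinct-values) approach. :)
let $by-filter := fn:distinct-values($seq1[fn:not(. = $seq2)])

(: Map-based approach: build two maps, subtract, read the keys. :)
let $by-map := map:keys(
  map:new($seq1 ! map:entry(fn:string(.), fn:true()))
  - map:new($seq2 ! map:entry(fn:string(.), fn:true()))
)

(: Both counts should be 2500, and the sorted results should match. :)
return (
  fn:count($by-filter),
  fn:count($by-map),
  fn:deep-equal(
    (for $v in $by-filter order by $v return $v),
    (for $v in $by-map order by $v return $v)
  )
)
```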