Every now and then I get caught by this little gotcha, so I figured I’d share it. Hopefully, by writing about it, I’ll remember to do this right. Let’s start with something simple, shall we?
let $seq := (1 to 100) return $seq[10]
Simple, as promised. I create a sequence of numbers from one to one hundred and I ask for the tenth one. I run this and I get 10 as a result. So far, so good.
Now, suppose that I want to get a random element of this sequence. MarkLogic Server provides an xdmp:random() function, so this should be easy, too:
let $seq := (1 to 100) return $seq[xdmp:random(99) + 1]
Randomly generate a number from 0 to 99, add one to get us into the 1 to 100 range, and return the value with that index. I run this one and I get… the empty sequence. I run it again and I get two values. I run it again and get one. What’s going on?
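To see what’s going on, let’s run this in CQ using the Profile button. (The profiled query below uses xdmp:random(9) rather than xdmp:random(99), so it picks from the first ten positions, but the behavior is the same.)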
| expression | count |
|---|---|
| let $seq := 1 to 100 return $seq[xdmp:random(9) + 1] | 1 |
| xdmp:random(9) + 1 | 100 |
| xdmp:random(9) | 100 |
| $seq[xdmp:random(9) + 1] | 1 |
| 1 to 100 | 1 |
What we see is the xdmp:random() expression getting called 100 times. Yet if you run Profile on the first implementation ($seq[10]), you’ll see that $seq[10] is evaluated just once and that “10” doesn’t show up as an expression.
When I put a constant in the index operator ([]), XQuery knows exactly which element I want, so no extra work is required. But when I put an expression there, it evaluates the expression once for each element in the sequence and checks whether that element’s index matches the result. With xdmp:random() in there, each element gets compared against a different random number, which is why zero, one, or several elements can come back. That per-element evaluation lets us do complicated things, like returning the elements whose indexes are divisible by three, but it comes at the cost of evaluating that expression more often than you might expect.
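Here is a rough sketch of both behaviors, assuming MarkLogic’s 1.0-ml dialect, where the xdmp prefix is predeclared. The variable names $every-third and $random-hits are mine, and the FLWOR loop is only my approximation of what the random predicate does, not a description of the server’s internals.

```xquery
xquery version "1.0-ml";

let $seq := (1 to 100)

(: Boolean predicate: evaluated once per item, keeping the items
   whose position is a multiple of three (3, 6, ..., 99). :)
let $every-third := $seq[position() mod 3 = 0]

(: Approximately what $seq[xdmp:random(99) + 1] does: a fresh random
   number is generated for, and compared against, every position. :)
let $random-hits :=
  for $item at $pos in $seq
  where $pos = xdmp:random(99) + 1
  return $item

(: The first count is always 33; the second varies from run to run. :)
return (fn:count($every-third), fn:count($random-hits))
```

Run it a few times and the second number should wander around, which is the same symptom as in the original query.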
So what should we do instead? Happily, there is a simple solution:
let $seq := (1 to 100) let $index := xdmp:random(99) + 1 return $seq[$index]
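This approach returns one value every time it’s called, and Profile shows us that each expression is evaluated only once. Because $index is bound once by the let clause, every element’s position is compared against the same number.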
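If you do this in more than one place, the pattern can be wrapped in a small function. This is just a sketch: local:random-item is a name I made up, and the explicit xs:unsignedLong cast is there because xdmp:random() is documented as taking an xs:unsignedLong maximum.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper, not part of MarkLogic: returns one random
   item from $seq, or the empty sequence if $seq is empty. :)
declare function local:random-item($seq as item()*) as item()?
{
  let $n := fn:count($seq)
  return
    if ($n eq 0) then ()
    else
      (: Bind the index once, outside the predicate, so that
         xdmp:random() is called a single time. :)
      let $index := xdmp:random(xs:unsignedLong($n - 1)) + 1
      return $seq[$index]
};

local:random-item(1 to 100)
```

Computing the upper bound from fn:count() also means the hard-coded 99 no longer has to be kept in sync with the sequence length.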
Moral of the story: if you have an expression as a sequence index, make sure it’s not doing more work than you intend. Profiling, as always, is your friend.