Fixing a slow XQuery function

Author: Dave Cassel | Category: Software Development
Sometimes an application is slower to respond than we'd like. There are a variety of reasons this can happen: hardware, indexes, algorithms. MarkLogic offers some tools to investigate the software side of code that runs too slowly. The rest of this post is a case study investigating slow responses on a project I worked on last year. I'll mention some of the tools you can use and some of the problems to look for.
Last year I was involved in a project where I was seeing page-load times over 10 seconds. Since I'm used to seeing sub-second responses even for complex queries, it was clear something had gone very wrong. I fixed a couple of smaller things before locking onto the big problem. The major problem turned out to require a pretty important change, affecting the data structure for the whole application!
Looking for problems
I'd load a page and not see any data for 10+ seconds. I started my investigation with Chrome's Developer Tools (View -> Developer -> Developer Tools), going to the Network tab to see specifically which URL was taking so long. That pointed me to an AJAX request hitting an XQuery module. It turned out that module just did a little processing of the HTTP parameters, then passed them into a function (gs:search) in a library module (search-lib.xqy). So far, so good; that's a good practice that allowed me to isolate the functionality I needed to work with. The next tool I turned to was Query Console. I set the Content Source to the application's database and used an import statement to make the library function available:
import module namespace gs = "http://marklogic.com/project/search" at "/app/ws/lib/search-lib.xqy";
Then I called the function, passing in some realistic parameters, which I'd also gotten from the Network tab in the browser. I called it once just to make sure I had the parameters right, then turned to the profiler. (Note that this means I was working with a warm cache. Going directly to the profiler would have given the results you'd see with a cold cache.) To run the profiler in either MarkLogic 6 or 7, click on the Profile tab before clicking the Run button. The profiler gives you a table showing the location of each expression (file, line number), the Count (how many times that expression was evaluated), Shallow % (the percentage of the run time spent evaluating this particular expression), and Deep % (the percentage of the run time spent evaluating this expression plus any expressions it kicked off, such as when a function calls another function). That pointed me toward a function called gs:doc-compliance-map(). In the particular search that I ran, it was called over 300 times and took up a significant percentage of the total run time.
Pagination isn't just for the front-end code; it makes life easier on the server, too. After all, results past the first page or two may well never be looked at, so why make the server do the work? For this application at this stage, the extra work turned out to be acceptable, and fixing it was relegated to the future.
Takeaway: limit search requests to the amount of data you’re going to show. You can always come back for more.
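To illustrate that takeaway, here's a minimal sketch of page-limited search. The query, page number, and page size are all placeholders, not the project's actual values; cts:search() returns results lazily, so fn:subsequence() keeps the server from materializing the whole result set.

```xquery
xquery version "1.0-ml";

(: Illustrative values only: $query stands in for the app's real criteria :)
let $query := cts:word-query("compliance")
let $page := 1
let $page-size := 10
let $start := ($page - 1) * $page-size + 1
return
  (: only materialize the results the user will actually see :)
  fn:subsequence(cts:search(fn:doc(), $query), $start, $page-size)
```

When the caller asks for page two, only $start changes; the server still does a single index-resolved search.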
The profiler had narrowed the problem down to the gs:doc-compliance-map() function. Before we go any further, let’s take a look at that function.
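The original gist is no longer embedded in this post, so here is a hedged reconstruction of what gs:doc-compliance-map() looked like, inferred from the discussion that follows. The element names (section, subsection, compliance) come from the text; the term values are placeholders.

```xquery
(: Reconstruction, not the project's actual code. Note the call to
   gs:highlight() and the three separate passes over the same document,
   each doing an fn:contains() string check. :)
declare function gs:doc-compliance-map(
  $uri as xs:string,
  $query as cts:query
) as map:map
{
  let $doc := gs:highlight($uri, $query)
  let $map := map:map()
  let $_ := map:put($map, "alpha",
    fn:count($doc//(section|subsection)[fn:contains(compliance, "alpha")]))
  let $_ := map:put($map, "beta",
    fn:count($doc//(section|subsection)[fn:contains(compliance, "beta")]))
  let $_ := map:put($map, "gamma",
    fn:count($doc//(section|subsection)[fn:contains(compliance, "gamma")]))
  return $map
};
```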
On taking a closer look, I saw that gs:doc-compliance-map() called gs:highlight(). Since the Deep % for gs:doc-compliance-map() was higher than its Shallow %, I figured that was worth a look. Here’s gs:highlight:
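The gist for gs:highlight is also missing, so this is a sketch reconstructed from the description below (a cts:search() loop wrapping matches in b tags), using the same hypothetical signature as above:

```xquery
(: Reconstruction of the described behavior, not the original source :)
declare function gs:highlight(
  $uri as xs:string,
  $query as cts:query
) as node()*
{
  for $result in cts:search(
                   fn:doc(),
                   cts:and-query((cts:document-query($uri), $query)))
  return
    (: wrap each search match in a <b> tag :)
    cts:highlight($result, $query, <b>{$cts:text}</b>)
};
```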
Takeaway: Use the profiler to determine where code is slow; it's the right tool for the job. Focus on expressions where the Shallow % is high, and let a high Deep % lead you into the functions they call.
Let's think about what gs:highlight is doing. It loops over the results of cts:search() and calls cts:highlight() on each of those results, doing in-text replacements to wrap search matches in <b> tags. We're passing two things to cts:search(): the first says that we're only interested in results in the document at a particular URI; the second says that the document needs to match some passed-in search criteria. Given that we're not using fragmentation in this database, a document equals a fragment, so limiting the results to a particular URI means there is at most one result and no need to loop over them. Calling cts:search() would make sense if we needed to ensure that the document at that URI matches the search, but in this case we already know it does: up in the gs:search() function (not shown), this was one of the URIs found by the original search! So far, what we need is more simply expressed like this:
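Continuing with the same hypothetical names as the sketches above, the search-plus-loop collapses to a single call:

```xquery
(: Sketch: no cts:search and no loop, because we already know
   the document at $uri matches $query :)
declare function gs:highlight(
  $uri as xs:string,
  $query as cts:query
) as node()
{
  cts:highlight(fn:doc($uri), $query, <b>{$cts:text}</b>)
};
```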
But we can go a step further. The highlight function marks up the document based on a search, surrounding hits with <b> tags. But looking at the doc-compliance-map() function, nothing is done with the <b> tags! It turns out that we can simplify the doc-compliance-map function itself by replacing the call to gs:highlight() with fn:doc(), simply retrieving the document.
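Inside gs:doc-compliance-map(), that change is a one-liner; sketched with the same placeholder names as above:

```xquery
(: before: a marked-up copy whose <b> tags nothing ever reads :)
let $doc := gs:highlight($uri, $query)
(: after: simply retrieve the document :)
let $doc := fn:doc($uri)
return $doc
```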
Takeaway: Sometimes slowness is due to work that is simply unnecessary. If it doesn’t contribute to what you’re trying to accomplish, skip it.
The next problem I noticed with the code was that gs:doc-compliance-map() loops across the same doc repeatedly. The actual intent of the code is to find out how many of the section and subsection elements have text matching each of three specific terms. I consolidated the loops into one. I also extracted the terms into a variable so that the list can be adjusted more easily if necessary (the original version had the actual terms as hard-coded strings).
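A sketch of the consolidated version, again with placeholder term values and the hypothetical names used above; one pass over the document now feeds all three counts:

```xquery
(: The term list is now data, not three hard-coded loops :)
declare variable $gs:COMPLIANCE-TERMS as xs:string* := ("alpha", "beta", "gamma");

declare function gs:doc-compliance-map($uri as xs:string) as map:map
{
  let $map := map:map()
  let $_ :=
    (: single pass over the section and subsection elements :)
    for $elem in fn:doc($uri)//(section|subsection)
    for $term in $gs:COMPLIANCE-TERMS
    where fn:contains($elem/compliance, $term)
    return map:put($map, $term, fn:sum((map:get($map, $term), 1)))
  return $map
};
```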
Takeaway: If you’ve got multiple loops over the same data, and the loops don’t have interdependencies, you can probably consolidate them.
After making that adjustment, I ran some tests and saw that 1) the results had stayed the same and 2) it ran faster. Making progress. By this point, I was getting familiar with the code, which led me to …
Problem #4 (the big one)
Having taken care of some easy things that made a difference, I'd zeroed in on what was really causing the problem. Take a look at line 12 of the previous gist and you'll see fn:contains(). Why? It turns out that there is one compliance element per section or subsection; if more than one value was needed, the values were concatenated in the element with a space in between. The app therefore had to make a "contains" check instead of looking for elements with a specific value. Done this way, the application depends on string comparisons. There's a better way.
What we really want to do is take advantage of MarkLogic’s indexes. By setting up an element range index on the compliance element, we can use cts:element-values() and cts:frequency() to get the counts quickly. No string comparisons, just a simple index lookup. Let’s see what that looks like:
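The gist is missing here too, so this is a sketch of the index-backed version, assuming an element range index on the compliance element and the same hypothetical function name as above:

```xquery
(: Requires: an element range index on compliance, and content reprocessed
   so each compliance element holds a single value :)
declare function gs:doc-compliance-map($uri as xs:string) as map:map
{
  let $map := map:map()
  let $_ :=
    for $value in cts:element-values(
                    xs:QName("compliance"),
                    (),
                    "item-frequency",
                    cts:document-query($uri))
    return
      (: counts come straight from the lexicon, no document loads :)
      map:put($map, $value, cts:frequency($value))
  return $map
};
```

The cts:document-query() constrains the lexicon lookup to the one document we care about, and "item-frequency" makes cts:frequency() report per-fragment counts.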
This change does require reprocessing the current content to split compliance elements that hold multiple values, but it's definitely worthwhile.
The revised function gets the values within a single document and uses the item-frequency option to find the counts. In this case, the fact that we're not factoring in the section and subsection elements is fine; given the document structure, we don't need to. If we did, we'd use a path range index instead of a simple element range index.
Comparing this version to the original, I saw approximately a 90% cut in run time for my test data. No other change came close.
Takeaway: Make use of indexes; if you can avoid loading documents and searching through them, your code will run much faster.
That last takeaway is connected to a big part of making MarkLogic searches fast: if you aren't yet familiar with the index resolution and filtering stages of a search, it's worth taking the time to read about them.