A custom facet for the Search API

Author: Dave Cassel  |  Category: Software Development

I recently got to teach the MarkLogic Essentials class for MarkLogic University. To illustrate custom facets, I implemented one on the fly with the class watching. No pressure. :) The good news is that it (almost) worked on the first try; there was just one little problem with the display.

This post shows the custom facet that I built. You can find more information and other examples in the MarkLogic Server Search Developer’s Guide.

What is a custom facet?

Regular facets are based on the values in a range index on an element or an attribute, and those values can be displayed nicely next to search results. Every now and then, you might want to build a facet based on something that you can’t build an index on, because it’s not in an element or an attribute. You have to be careful about this, because generating facet values needs to be fast. Often, a better idea than a custom facet will be modifying your data to get the values you want into an element or an attribute. But if that’s not an option, and you can still do the needed calculations quickly, a custom facet may be what you need.
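
For contrast, here is roughly what a regular, index-backed facet looks like in Search API options. This is a minimal sketch, not something from the class application: the genre element and its range index are assumptions made up for illustration, and "love" is an arbitrary search term.

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Assumption: a hypothetical <genre> element, in no namespace, with an
   xs:string element range index configured on the database using the
   collation named below. :)
let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="genre">
      <range type="xs:string" facet="true"
             collation="http://marklogic.com/collation/">
        <element ns="" name="genre"/>
      </range>
    </constraint>
  </options>
(: Return only the facet portion of the search response. :)
return search:search("love", $options)/search:facet
```

No custom functions are involved here; the Search API reads the values and counts straight from the index.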

A facet on collections

I decided to do a sample custom facet based on collections that were defined in the database. The sample app for the class was a Top Songs search application. There were two collections set up in the data, capturing the favorite songs of two users: the “danny” and “jason” collections.

The first step is to set up the options that will be passed to the Search API:

<options xmlns="http://marklogic.com/appservices/search">
  <constraint name="favorites">
    <custom facet="true">
      <parse apply="parse" ns="http://marklogic.com/MLU/facet"
        at="/modules/facet-lib.xqy" />
      <start-facet apply="start" ns="http://marklogic.com/MLU/facet"
        at="/modules/facet-lib.xqy" />
      <finish-facet apply="finish" ns="http://marklogic.com/MLU/facet"
        at="/modules/facet-lib.xqy" />
    </custom>
  </constraint>
</options>

The parse, start-facet, and finish-facet elements within the custom constraint specify the namespace, localname, and location of the functions that will do the work of the facet. I put all three into a module called facet-lib.xqy.
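
As a rough sketch of how these options get used, they can be passed as the second argument to search:search(). The search text below is an arbitrary example, and the sketch assumes facet-lib.xqy is already installed at the path named in the options.

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="favorites">
      <custom facet="true">
        <parse apply="parse" ns="http://marklogic.com/MLU/facet"
          at="/modules/facet-lib.xqy"/>
        <start-facet apply="start" ns="http://marklogic.com/MLU/facet"
          at="/modules/facet-lib.xqy"/>
        <finish-facet apply="finish" ns="http://marklogic.com/MLU/facet"
          at="/modules/facet-lib.xqy"/>
      </custom>
    </constraint>
  </options>
(: "love" is an arbitrary example term; "favorites:danny" exercises the
   custom constraint. :)
let $response := search:search("love favorites:danny", $options)
(: Pull out just the facet produced by the custom functions. :)
return $response/search:facet[@name = "favorites"]
```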

Parse

When you type a search string for a MarkLogic Server application, that string eventually needs to get converted to some kind of query. Even a simple text search gets converted to a cts:word-query. In your parse function, you will convert the part of the string that represents the constraint/facet to some kind of query. Let’s take an example. Suppose that you type “favorites:danny” in the search box. Here “favorites” matches the name of our constraint, and “danny” is the particular value that we want to select. In our example, we want to turn that into a cts:collection-query and pass “danny” as the value. Let’s take a look at some code.

declare function facet:parse(
  $constraint-qtext as xs:string,
  $right as schema-element(cts:query))
as schema-element(cts:query)
{
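  (: Build the collection query from the text to the right of the joiner.
     Wrapping it in <root> and selecting /* yields the query as an element. :)
  <root>{cts:collection-query(fn:string($right//cts:text))}</root>/*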
};

The signature of the function is mandated by the Search API, including the fact that I need to return a schema-element(cts:query). I’ve found that simply calling a query constructor, such as cts:collection-query(), doesn’t match what I need, so the <root/>/* bit is just a little trickery to make sure my types match up.
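
The wrapper trick can be tried on its own. The following is a small sketch based on my understanding that a query constructor returns a cts:query value, and that placing that value inside an element constructor serializes it to its XML form; the three items it returns let you check both claims in Query Console.

```xquery
xquery version "1.0-ml";

(: The constructor returns a cts:query value rather than an XML element. :)
let $query := cts:collection-query("danny")
(: Putting the value inside an element constructor serializes it to XML;
   /* then selects the serialized query element. :)
let $as-element := <root>{$query}</root>/*
return (
  $query instance of element(),       (: is the raw constructor result an element? :)
  $as-element instance of element(),  (: is the unwrapped result an element? :)
  $as-element                         (: the serialized query itself :)
)
```

Going the other way, cts:query($as-element) turns the XML form back into a query value.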

The $constraint-qtext parameter will have the name of the constraint and the joiner; in our case, it has “favorites:”. The $right parameter has what was to the right of the joiner. In our example, it has “danny”. However, note that it’s not a string, it’s a cts:query. I have to say, I’ve not yet seen a case where $right holds something other than a cts:word-query wrapped around a cts:text element that holds the string, but I imagine there’s some good reason for this parameter not to simply be a string. In any case, fn:string($right//cts:text) gets the text (“danny”), which we then pass to cts:collection-query().
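
To see the extraction step in isolation, here is a sketch that uses a hand-built stand-in for $right. The element below simply follows the shape described above; it is not captured from a real Search API call.

```xquery
xquery version "1.0-ml";

(: Hand-built stand-in for $right: a word query wrapped around a cts:text
   element, as described in the text. :)
let $right :=
  <cts:word-query xmlns:cts="http://marklogic.com/cts">
    <cts:text>danny</cts:text>
  </cts:word-query>
(: Pull the plain string out of the query element. :)
let $collection := fn:string($right//cts:text)
(: Return the extracted string followed by the query built from it. :)
return ($collection, cts:collection-query($collection))
```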

Start

To produce a facet, the search API needs to determine a list of values, calculate their frequencies, and put them into the standard format that we see in the output from search:search(). The first two steps, finding the values and their frequencies, are the role of the function identified in the start-facet element for your facet. In our collections facet, we need to produce a list of the collections and the counts that go with them. To get the right counts, we need to account for whatever query the user has entered. The collections-related part of the query, if any, will have been parsed by our parse function above. Here’s my start function:

declare function facet:start(
  $constraint as element(search:constraint),
  $query as cts:query?,
  $facet-options as xs:string*,
  $quality-weight as xs:double?,
  $forests as xs:unsignedLong*)
as item()*
{
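  (: List the collections that contain matches for the user's query;
     "concurrent" is added to the lexicon options as the documentation advises. :)
  for $coll in cts:collections((), ($facet-options, "concurrent"), $query, $quality-weight, $forests)
  (: Record each collection URI along with its frequency. :)
  return <collection name="{$coll}" count="{cts:frequency($coll)}"/>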
};

Again, the signature is specified by the Search API. The $constraint parameter gives us access to the <constraint name="favorites"> element that we added to the Search API options. That can be handy if we want to add some annotations to the constraint and take advantage of them here.

The $query parameter is the complete query from the user, as parsed by the Search API. $facet-options come from the constraint definition, though the documentation tells us to add “concurrent” to any lexicon calls. I’ve never used the $quality-weight or $forests parameters, but I can pass them right along to my call. I use the cts:collections() function to get the list of collections (note that the collections lexicon setting on the database must be on). For each collection, I return a collection element, including a count attribute populated with a call to cts:frequency().
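
The lexicon call can also be run by itself, outside the Search API, to see what the start function has to work with. This is a minimal sketch: the word query is an arbitrary stand-in for whatever the user searched for, and it assumes the collection lexicon is enabled on the database.

```xquery
xquery version "1.0-ml";

(: Assumption: the collection lexicon is enabled on the database.
   "love" is an arbitrary stand-in for the user's parsed query. :)
let $query := cts:word-query("love")
(: Only collections containing matches for $query are returned. :)
for $coll in cts:collections((), (), $query)
(: cts:frequency() reports the count that goes with each lexicon value. :)
return fn:concat($coll, " (", cts:frequency($coll), ")")
```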

Finish

Finally, we get to the finish function. The role of this function is simply to put the values generated in the start function into the format expected in Search API results. Here’s my finish function:

declare function facet:finish(
  $start as item()*,
  $constraint as element(search:constraint),
  $query as cts:query?,
  $facet-options as xs:string*,
  $quality-weight as xs:double?,
  $forests as xs:unsignedLong*)
as element(search:facet)
{
  element search:facet {
    attribute name {$constraint/@name},
    for $range in $start
    return element search:facet-value{
      attribute name { fn:string($range/@name) },
      attribute count { fn:string($range/@count) },
      fn:string($range/@name)
    }
  }
};

Pretty straightforward: I take the collection elements that I generated in the start function, which are passed in through the $start parameter, and format them into a search:facet element with a search:facet-value element for each collection.
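
The formatting step can be tried without the Search API by feeding it hand-built input. In this sketch the collection elements stand in for the output of the start function, and the counts are invented purely for illustration.

```xquery
xquery version "1.0-ml";
declare namespace search = "http://marklogic.com/appservices/search";

(: Hand-built stand-ins for what the start function returns.
   The counts are made up for illustration. :)
let $start := (
  <collection name="danny" count="12"/>,
  <collection name="jason" count="7"/>
)
return
  element search:facet {
    (: In the real function the name comes from $constraint/@name. :)
    attribute name { "favorites" },
    for $coll in $start
    return element search:facet-value {
      attribute name { fn:string($coll/@name) },
      attribute count { fn:string($coll/@count) },
      fn:string($coll/@name)
    }
  }
```

The result is a single search:facet element containing one search:facet-value per collection, which is the shape that appears in search:search() output.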

The documentation points out you can actually skip implementing the start function, in which case the finish function is responsible for the lexicon calls. I had to think a little bit about the value of doing them separately, but I think the value comes from the “concurrent” parameter that we used in the start function. Using that, and some good structure inside the Search API itself, MarkLogic Server will be able to kick off processing of one facet and move on to the next while waiting for the first to complete. If you do all the work in the finish function, MarkLogic Server will have to wait for the function call to complete.
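
One possible variant, offered as a sketch rather than a measured improvement: return the lexicon result from the start function untouched, and do the iteration and the cts:frequency() calls in the finish function. The assumption here, which I have not verified, is that looping over the values inside the start function may force the lexicon call to resolve before that function returns. These two declarations would replace facet:start and facet:finish in the library module; the signatures are unchanged.

```xquery
(: Variant sketch: drop-in replacements inside the facet library module. :)
declare function facet:start(
  $constraint as element(search:constraint),
  $query as cts:query?,
  $facet-options as xs:string*,
  $quality-weight as xs:double?,
  $forests as xs:unsignedLong*)
as item()*
{
  (: Return the lexicon result as-is, without iterating over it here. :)
  cts:collections((), ($facet-options, "concurrent"), $query,
    $quality-weight, $forests)
};

declare function facet:finish(
  $start as item()*,
  $constraint as element(search:constraint),
  $query as cts:query?,
  $facet-options as xs:string*,
  $quality-weight as xs:double?,
  $forests as xs:unsignedLong*)
as element(search:facet)
{
  element search:facet {
    attribute name { $constraint/@name },
    (: $start now holds the collection URIs themselves. :)
    for $coll in $start
    return element search:facet-value {
      attribute name { $coll },
      attribute count { cts:frequency($coll) },
      $coll
    }
  }
};
```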

Constraints and Facets

Sometimes in this post I’ve used the word “constraint” and other times I’ve said “facet”. Let me take a moment to clarify that. A constraint is a tool for narrowing a search. A facet is a constraint where a list of values is generated and presented to the user. While I’ve described the process of making a custom facet, you can also make a custom constraint that is not a facet. There are two differences. First, you would specify facet="false" when you set up the constraint in the Search API options. Second, you would not define the start or finish functions. Just leave them out of the Search API options.
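
As a sketch of the constraint-only case, the options shrink to a single parse element. This reuses the parse function from this post and assumes the library module is installed at the same path.

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="favorites">
      <!-- facet="false": no start-facet or finish-facet elements -->
      <custom facet="false">
        <parse apply="parse" ns="http://marklogic.com/MLU/facet"
          at="/modules/facet-lib.xqy"/>
      </custom>
    </constraint>
  </options>
(: The constraint still narrows the search, but no facet values are produced for it. :)
return search:search("favorites:jason", $options)
```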

The library

That’s pretty much it. I made a library module with the following lines at the top, and then the parse, start, and finish definitions.

xquery version "1.0-ml";
module namespace facet="http://marklogic.com/MLU/facet";
import module namespace search = "http://marklogic.com/appservices/search" at
  "/MarkLogic/appservices/search/search.xqy";

One more note: there’s nothing magical about the names of the functions — you can call them whatever you want, in whatever namespace you want. That can be useful when your functions live in the same module as some other code, or if you have one library module that implements more than one custom facet.

I have a post rattling around my head about unparsing custom facets, something that doesn’t seem to be well documented. Stay tuned.

Responses to “A custom facet for the Search API”

  1. Amit Gope Says:

    Hi David,

    your article made an interesting read. It was very informative. But i am fairly new with the search api and had some other doubt. Would it be possible for you drop your email id so that i can ask my doubt?

  2. Dave Cassel Says:

    Hi Amit, you can find my contact info on the About page.