Unparsing a custom facet

Author: Dave Cassel  |  Category: Software Development

In many search applications, when we show the results of a user’s search, we also want to display the search that was done by putting the query string into a text box, especially if the search is built up from an advanced search screen. In some cases, we might want to modify the search before displaying it. For instance, part of your UI might be a control that removes a constraint from the current search. It’s pretty straightforward to convert the query string to cts:query elements, remove the particular constraint, and then convert the cts:query elements back to a query string. App Builder applications do this. But in order for that to work, your custom constraint needs to contain the information needed to unparse it.

Parsing and Unparsing

In this post, I’m going to show how to make your custom facets unparsable. Here “unparsable” means that a parsed query can be converted back to a string, not that the query can’t be parsed. The MarkLogic Server documentation has some examples but little description of what needs to be done. I worked out the requirements by digging into the Search API code that ships with MarkLogic Server.

In my last post, I showed how to construct a custom facet with the MarkLogic Server Search API. The functions I wrote for the custom facet included a parse function. That function converted part of the query string, such as “favorites:danny”, into a cts:query element. To go the other direction, I can pass a parsed query to search:unparse() and get back the query string. Let’s look at a set of Search API options:

declare variable $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="favorites">
      <custom facet="true">
        <parse apply="parse" ns="http://marklogic.com/MLU/facet"
          at="/modules/facet-lib.xqy" />
        <start-facet apply="start" ns="http://marklogic.com/MLU/facet"
          at="/modules/facet-lib.xqy" />
        <finish-facet apply="finish" ns="http://marklogic.com/MLU/facet"
          at="/modules/facet-lib.xqy" />
      </custom>
    </constraint>
    <constraint name="genre">
      <range type="xs:string" facet="true">
        <element ns="http://marklogic.com/MLU/top-songs" name="genre"/>
      </range>
    </constraint>
    <return-query>true</return-query>
  </options>;

These options include the custom facet I defined in my previous post, as well as a range facet on the genre element. With the <return-query>true</return-query> option, the results of a search:search() call include the parsed query. To see how a query string is interpreted without running a search, I can call search:parse() directly. Suppose I call search:parse("night genre:rock", $options), looking for rock songs that include the word night. The Search API converts that to:

<cts:and-query strength="20" qtextjoin="" xmlns:cts="http://marklogic.com/cts">
  <cts:word-query qtextref="cts:text">
    <cts:text>night</cts:text>
  </cts:word-query>
  <cts:element-range-query qtextpre="genre:" qtextref="cts:annotation" operator="=">
    <cts:element xmlns:_1="http://marklogic.com/MLU/top-songs">_1:genre</cts:element>
    <cts:annotation qtextref="following-sibling::cts:value"/>
    <cts:value xsi:type="xs:string" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">rock</cts:value>
  </cts:element-range-query>
</cts:and-query>

We got an and-query consisting of a word-query for the word “night” and an element-range-query for the genre. If I pass this result to search:unparse(), I get my query string back. In other words, search:unparse(search:parse($query-str, $options)) returns the same string as $query-str, although the whitespace may differ. Now, what happens if I try that with my custom facet? search:parse("night favorites:danny", $options) =>

<cts:and-query strength="20" qtextjoin="" xmlns:cts="http://marklogic.com/cts">
  <cts:word-query qtextref="cts:text">
    <cts:text>night</cts:text>
  </cts:word-query>
  <cts:collection-query>
    <cts:uri>danny</cts:uri>
  </cts:collection-query>
</cts:and-query>

So far so good: “night” gets parsed to a word-query again, while “favorites:danny” becomes a collection query, based on the parse function that I wrote. Let’s pass those results to search:unparse():

search:unparse(search:parse("night favorites:danny", $options))
SEARCH-NONANNOTATED: (err:FOER0000) Non-annotated query
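For comparison, the half of the round trip that does work can be checked in Query Console with a small main module. This is a sketch with two simplifications:

- It keeps only the genre constraint from the options above, so it runs without /modules/facet-lib.xqy installed.
- It imports the Search API from its standard location in a MarkLogic install.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Only the genre range constraint; the custom favorites constraint is
   left out so this module does not depend on facet-lib.xqy. :)
declare variable $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="genre">
      <range type="xs:string" facet="true">
        <element ns="http://marklogic.com/MLU/top-songs" name="genre"/>
      </range>
    </constraint>
  </options>;

let $query-str := "night genre:rock"
(: parse the query text into annotated cts:query XML :)
let $parsed := search:parse($query-str, $options)
(: follow the annotations back to query text :)
return search:unparse($parsed)
```
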

Annotating a query

To see what’s happening, compare two elements from the examples above:

- the cts:element-range-query element that search:parse() generated in the first example;
- the cts:collection-query element it generated in the second.

The first has @qtextpre and @qtextref attributes; the second has neither. These attributes tell the Search API how to go from the parsed query back to a string representation of the query, the query text.

The @qtextpre attribute supplies the literal text that comes first, here the constraint name and joiner (“genre:”). The @qtextref attribute tells the Search API to look at the cts:annotation element inside the query for the value to the right of the colon. That element’s own @qtextref in turn points at the following cts:value.

To make the favorites facet unparsable, its query needs similar attributes. I’d modify my custom facet’s parse function to add them, so that search:parse("favorites:danny", $options) would produce this instead:

<cts:collection-query qtextpre="favorites:" qtextref="cts:uri">
  <cts:uri>danny</cts:uri>
</cts:collection-query>

The @qtextpre attribute holds the name of the constraint as well as the joiner. The @qtextref attribute tells the Search API where to find the value for the rest of the query text. In this case, it should look in the cts:uri element inside the cts:collection-query.
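A parse function that produces these annotations might look like the sketch below. It is not the parse function from the previous post. It is written as a hypothetical local:parse-favorites in a standalone main module so it can run in Query Console; in the real setup it would live in facet-lib.xqy under the http://marklogic.com/MLU/facet namespace.

The sketch assumes the usual custom-constraint parse signature:

- $constraint-qtext holds the constraint name plus joiner (“favorites:”).
- $right is the parsed query for the text after the joiner.

To stand in for what the Search API would pass, it feeds the function the output of search:parse("danny").

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Hypothetical parse function for the "favorites" constraint.
   $constraint-qtext: constraint name plus joiner, e.g. "favorites:"
   $right: parsed query for the text after the joiner :)
declare function local:parse-favorites(
  $constraint-qtext as xs:string,
  $right as schema-element(cts:query))
as schema-element(cts:query)
{
  let $name := fn:string($right//cts:text)
  return
    (: qtextpre is the literal prefix; qtextref="cts:uri" points
       search:unparse() at the string value of the cts:uri child :)
    <cts:collection-query qtextpre="{$constraint-qtext}" qtextref="cts:uri">
      <cts:uri>{$name}</cts:uri>
    </cts:collection-query>
};

(: stand-in for the right-hand side of "favorites:danny" :)
let $parsed := local:parse-favorites("favorites:", search:parse("danny"))
return search:unparse($parsed)
```
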

The annotation attributes

@qtextconst

This attribute provides the Search API with a simple, one-step approach to going back to the query string. You can build the string that should represent the query right as you parse it. This is the easiest way to handle unparsing.

<cts:collection-query qtextconst="favorites:danny">
  <cts:uri>danny</cts:uri>
</cts:collection-query>
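In a parse function, the constant can be computed from the constraint text and the value. The sketch below makes the following assumptions:

- local:parse-favorites is a hypothetical name, written as a standalone main module for Query Console.
- It uses the usual custom-constraint parse signature, where $constraint-qtext is the constraint name plus joiner and $right is the parsed query for the text after it.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Hypothetical parse function for "favorites" that stores the finished
   query text in @qtextconst at parse time. :)
declare function local:parse-favorites(
  $constraint-qtext as xs:string,
  $right as schema-element(cts:query))
as schema-element(cts:query)
{
  let $name := fn:string($right//cts:text)
  return
    <cts:collection-query qtextconst="{fn:concat($constraint-qtext, $name)}">
      <cts:uri>{$name}</cts:uri>
    </cts:collection-query>
};

(: stand-in for the right-hand side of "favorites:danny" :)
let $parsed := local:parse-favorites("favorites:", search:parse("danny"))
return search:unparse($parsed)
```
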

@qtextpre, @qtextpost

These two attributes are used as constant prefixes and suffixes, sandwiched around a value calculated from the query itself. The prefix is likely to be a constraint name and the joiner. The suffix is most likely to be a quotation mark, if the value includes a space, but could potentially be some other marker used to indicate the end of the constraint value.

<cts:collection-query qtextpre="favorites:" qtextref="cts:uri">
  <cts:uri>danny</cts:uri>
</cts:collection-query>
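When a favorites value can contain a space, the prefix and suffix can carry the quotation marks, so the unparsed text still reads as a single value. The sketch below makes these assumptions:

- local:parse-favorites is an illustrative name, written as a standalone main module.
- It uses the usual custom-constraint parse signature ($constraint-qtext, $right).
- A quoted phrase like "danny boy" reaches the parse function as a single cts:text.

It adds @qtextpost only when quotes are needed.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Hypothetical parse function that wraps multi-word values in quotes
   via @qtextpre/@qtextpost. :)
declare function local:parse-favorites(
  $constraint-qtext as xs:string,
  $right as schema-element(cts:query))
as schema-element(cts:query)
{
  let $name := fn:string($right//cts:text)
  (: a value with a space needs quotes to stay one value :)
  let $quote := if (fn:contains($name, " ")) then '"' else ""
  return
    element cts:collection-query {
      attribute qtextpre { fn:concat($constraint-qtext, $quote) },
      attribute qtextref { "cts:uri" },
      if ($quote ne "") then attribute qtextpost { $quote } else (),
      element cts:uri { $name }
    }
};

(: a quoted phrase, so $right is expected to hold a single cts:text :)
let $parsed := local:parse-favorites("favorites:", search:parse('"danny boy"'))
return search:unparse($parsed)
```
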

@qtextref

This attribute is pretty interesting. Its purpose is to tell the Search API where to find some part of the query text it is trying to generate. It accepts several different values, which might lead the unwary to think they can put an arbitrary XPath expression here for the Search API to follow. Not quite: the Search API checks the attribute against a fixed list of strings. If the value isn’t one of them, it raises an error saying that it “can’t follow ref to” whatever the attribute points at.

I’m going to break the valid values into two sets:

- values that act as XPath expressions selecting the node whose string value should be used;
- values that tell the Search API to dive deeper and recurse.

Find the string

My custom facet example above uses one of these: cts:uri. The Search API finds the cts:uri element inside the cts:collection-query element and takes its string value. The following-sibling values are most likely to be used on cts:annotation elements within a query (see below).

  • cts:text
  • cts:uri
  • following-sibling::cts:text
  • following-sibling::cts:value
  • following-sibling::cts:uri
  • following-sibling::cts:region

Recurse

For each of these values, the Search API will look for a matching element under the current element and process its @qtextpre, @qtextref and @qtextpost attributes.

  • cts:annotation
  • schema-element(cts:query)

Remember that these are literal string values for the @qtextref attribute; no substitutions are allowed. For instance, you could not specify a schema-element other than cts:query.

I asked Micah, one of my colleagues in Engineering, about why someone would use @qtextpre, @qtextref, and @qtextpost instead of the simpler @qtextconst. He told me “the reason … is so that if the query itself gets manipulated (say, changing the value of the <cts:uri>) it all flows through to unparse.” So this system allows for manipulation of the parsed query itself. I haven’t needed this level of flexibility yet for the custom facets I’ve written, but if you’ve seen a good example, I’d be interested in hearing about it in a comment.
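A small experiment illustrates Micah’s point. The sketch below is a standalone main module; local:set-uri and the value "sam" are hypothetical. It copies two annotated collection-queries, swaps in a different cts:uri, and unparses each copy:

- With @qtextpre and @qtextref, the edited URI is expected to flow through to the query text.
- With @qtextconst, the stored string comes back unchanged.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Hypothetical helper: copy a collection-query, keeping its
   annotation attributes but replacing the collection URI. :)
declare function local:set-uri(
  $q as element(cts:collection-query),
  $uri as xs:string)
as element(cts:collection-query)
{
  element cts:collection-query { $q/@*, element cts:uri { $uri } }
};

let $by-ref :=
  <cts:collection-query qtextpre="favorites:" qtextref="cts:uri">
    <cts:uri>danny</cts:uri>
  </cts:collection-query>
let $by-const :=
  <cts:collection-query qtextconst="favorites:danny">
    <cts:uri>danny</cts:uri>
  </cts:collection-query>
return (
  (: prefix + ref: unparse reads the edited cts:uri :)
  search:unparse(local:set-uri($by-ref, "sam")),
  (: constant: unparse returns the stored string as-is :)
  search:unparse(local:set-uri($by-const, "sam"))
)
```
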
